Getting InputSplit information inside a mapper

2022-10-30 18:09:28

In the community releases of Hadoop 0.19/0.20, when you use an ordinary input format, for example

job.setInputFormatClass(TextInputFormat.class);

then while the mapper is running you can obtain the corresponding FileSplit as follows, and from it the input path and other information:

(FileSplit) reporter.getInputSplit();  // 0.19

(FileSplit) context.getInputSplit();   // 0.20

However, if the input is registered through

MultipleInputs.addInputPath(job, new Path(path),
    SequenceFileInputFormat.class, ProfileMapper.class);

then using the same cast inside the mapper fails with a ClassCastException:

java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit

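The FileSplit we need is actually the member variable inputSplit of TaggedInputSplit.

However, the TaggedInputSplit class is not public in the community release, so the information cannot be reached directly.

I don't know how later community releases handle this; it may have been changed by now.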

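Our company uses an enhanced build of 0.19. The approach taken there was to declare TaggedInputSplit as public and then rebuild and release the package, which is the simplest fix. With that change the FileSplit can be obtained like this:

(FileSplit) ((TaggedInputSplit) reporter.getInputSplit()).getInputSplit();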

Alternatively, the inputSplit held by TaggedInputSplit can be retrieved directly through reflection. I won't go through it step by step; the code is:

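import java.io.IOException;
import java.lang.reflect.Method;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Place this helper inside your Mapper subclass so that Context resolves to Mapper.Context.
private String getFilePath(Context context) throws IOException {
    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();
    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
      // Plain input format: the split is already a FileSplit.
      fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals("org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
      // MultipleInputs: TaggedInputSplit is not public, so unwrap it reflectively.
      try {
        Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
        getInputSplitMethod.setAccessible(true);
        fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
      } catch (Exception e) {
        // wrap and re-throw error
        throw new IOException(e);
      }
    }
    if (fileSplit == null) {
      // Neither split type matched: fail clearly instead of with a NullPointerException.
      throw new IOException("Unsupported input split type: " + splitClass.getName());
    }
    return fileSplit.getPath().toString();
  }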
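
To try the unwrapping idea without a Hadoop cluster, here is a minimal, self-contained sketch. The classes Split, FileLikeSplit and TaggedLikeSplit are hypothetical stand-ins invented for this illustration; they are not Hadoop classes and only mimic the shape of InputSplit, FileSplit and TaggedInputSplit (a wrapper whose accessor cannot be called normally). It shows the direct cast failing and the reflective call succeeding.

```java
import java.lang.reflect.Method;

public class TaggedSplitUnwrapDemo {
    // Hypothetical stand-in for Hadoop's InputSplit.
    interface Split {}

    // Hypothetical stand-in for FileSplit: it only carries a path.
    static final class FileLikeSplit implements Split {
        private final String path;
        FileLikeSplit(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Hypothetical stand-in for TaggedInputSplit: it wraps another split
    // and its accessor is deliberately private.
    static final class TaggedLikeSplit implements Split {
        private final Split inner;
        TaggedLikeSplit(Split inner) { this.inner = inner; }
        private Split getInputSplit() { return inner; }
    }

    // Returns the path of a plain split, or of the split wrapped inside a tagged one.
    static String pathOf(Split split) {
        if (split instanceof FileLikeSplit) {
            return ((FileLikeSplit) split).getPath();
        }
        try {
            // Look the accessor up by name and lift the access check.
            Method m = split.getClass().getDeclaredMethod("getInputSplit");
            m.setAccessible(true);
            Object inner = m.invoke(split);
            if (inner instanceof FileLikeSplit) {
                return ((FileLikeSplit) inner).getPath();
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not unwrap " + split.getClass().getName(), e);
        }
        throw new IllegalStateException("unsupported split type: " + split.getClass().getName());
    }

    public static void main(String[] args) {
        Split wrapped = new TaggedLikeSplit(new FileLikeSplit("/data/part-00000"));
        try {
            // Same mistake as in the mapper: the wrapper is not a file split.
            FileLikeSplit direct = (FileLikeSplit) wrapped;
            System.out.println(direct.getPath());
        } catch (ClassCastException e) {
            System.out.println("direct cast failed");
        }
        System.out.println(pathOf(wrapped));
    }
}
```

Running main prints "direct cast failed" followed by "/data/part-00000". In a real job the same lookup is done against the actual TaggedInputSplit class by name, as in getFilePath above, so the mapper never needs a compile-time reference to the non-public class.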
