The command is as follows:
hadoop jar /usr/local/hadoop/hadoop-streaming-0.23.6.jar \
-input /hdfs/input/path -output /hdfs/output/path \
-mapper "python mapper.py" -reducer "python reducer.py" \
-file mapper.py -file reducer.py
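Notes:
Run the command as the hdfs user;
-input and -output are HDFS paths, and the output path must not already exist;
In -mapper and -reducer, a Python script must be invoked through the interpreter, i.e. written as python *.py;
-file is required: it packages the local *.py files and ships them to the cluster so that the other machines in the cluster can run them;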
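
The contents of mapper.py and reducer.py are not shown here, so the following is only a minimal sketch of what such scripts might look like, assuming a hypothetical word-count job. Streaming scripts read records from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The names map_lines and reduce_lines and the map/reduce command-line argument are illustrative assumptions, not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper side (hypothetical word count): emit "word<TAB>1" per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer side: sum the counts of each key.

    Relies on the input being sorted by key, so that equal keys are adjacent.
    """
    # Keep only well-formed "key<TAB>value" records.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if "\t" in line)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Assumed convention for this sketch only: pass "map" or "reduce" as the
    # first argument; anything else does nothing.
    if len(sys.argv) > 1 and sys.argv[1] in ("map", "reduce"):
        step = map_lines if sys.argv[1] == "map" else reduce_lines
        for record in step(sys.stdin):
            print(record)
```

With the two-file layout used in the command above, the body of map_lines would live in mapper.py and the body of reduce_lines in reducer.py, each reading sys.stdin and printing its records.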