
Running WordCount in Eclipse (Detailed Version)

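Take people rather than water as your mirror, and good and ill fortune can be discerned.

One stumbles not on a mountain but on an anthill, so guard against small things.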

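Related Links

HDFS Background

  • Hadoop Distributed File System (HDFS) Quick Start
  • Hadoop Distributed File System (HDFS) Knowledge Overview (Very Detailed)

Connecting to a Hadoop Cluster

  • Connecting Eclipse to a Hadoop Cluster
  • Connecting IntelliJ IDEA to a Hadoop Cluster

HDFS Java API

Hadoop Distributed File System (HDFS) Java Interface (HDFS Java API), Detailed Version

WordCount Program Analysis

Writing a WordCount Program with the Java API

Running WordCount in Eclipse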

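File Downloads

  • WordCount.java (extraction code: 2kwo)
  • log4j.properties (extraction code: tpz9)
  • data.txt (extraction code: zefp)

Steps

Note: Complete every step in "Connecting Eclipse to a Hadoop Cluster" before continuing with the steps below.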

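  1. Open Eclipse, choose "File" → "New" → "Map/Reduce Project", and click "Next".
  2. In the dialog that appears, enter a project name, choose the project location, and click "Finish".
  3. In the src directory of the MapReduce project, create a package named cn.neu and click "Finish".
  4. Copy the downloaded WordCount.java file into the cn.neu package (dragging and dropping it works).
  5. Using a file transfer tool such as Xftp, copy core-site.xml and hdfs-site.xml from the hadoop/hadoop-2.6.0/etc/hadoop directory of the remote Hadoop cluster installation to your local machine.

    Copy these two XML files, together with the downloaded log4j.properties file, into src.

    Note: If you are unsure how to configure these XML files, see "Installing and Deploying a Hadoop Cluster on Multiple Linux Virtual Machines (Very Detailed Version)".

    If the two XML files are missing, the job fails with the following error. The input path is resolved against the local file system (file:/) rather than HDFS: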
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/G:/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/test/input/data.txt
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
	at cn.neu.WordCount.main(WordCount.java:60)
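
The file:/ prefix in the exception is the clue. With no core-site.xml on the classpath, Hadoop falls back to the local file system as its default, so a scheme-less path such as /test/input/data.txt is looked up on the local disk. The sketch below imitates that resolution with plain java.net.URI rather than Hadoop's own classes. The address hdfs://master:9000/ is a hypothetical placeholder, not a value taken from this article.

```java
import java.net.URI;

// Illustration only: mimics how a scheme-less path is qualified against a
// default file system URI. It uses java.net.URI, not Hadoop's Path class.
class DefaultFsSketch {

    // Resolve a path against the URI of the default file system.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        String input = "/test/input/data.txt";

        // Without core-site.xml, the default file system is the local one.
        System.out.println(qualify("file:///", input));

        // With core-site.xml present, the default comes from that file.
        // "hdfs://master:9000/" is a hypothetical address for illustration.
        System.out.println(qualify("hdfs://master:9000/", input));
    }
}
```

The first result keeps the file scheme, which matches the failing path in the log; the second carries the hdfs scheme and the cluster address.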
           
  6. Right-click the HDFS root directory and click "Create new directory".
    Enter test and click "OK".

    Right-click in the Project Explorer pane and click Refresh; the newly created directory then appears.

    Right-click the test folder, create a directory named input inside it, and refresh again.
  7. Right-click the input directory and choose "Upload files to DFS" (DFS here means HDFS).
    Select the downloaded data.txt file and click "Open", then refresh Project Explorer again to confirm the upload.
  8. WordCount.java reads two values from the program arguments, so these arguments need to be configured:

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));

FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

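In the code editor, right-click and choose "Run As" → "Run Configurations".

Open the Arguments tab, enter the path of the data.txt file uploaded in the previous step followed by the output path for the program, click "Apply", and then click "Run" to start the program.

Note: Do not create the output directory under test before running the program. The output directory must not exist, or the job fails with a "directory already exists" error.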
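
The values typed into the Arguments tab are separated by spaces and handed to main in order, so the first ends up as otherArgs[0] (the input) and the second as otherArgs[1] (the output). A minimal sketch of that mapping in plain Java follows. The class and method names are invented for illustration and are not part of WordCount.java, and /test/output is an assumed output path based on the output directory mentioned in the note above.

```java
// Hypothetical helper, not part of WordCount.java: shows how the two
// program arguments map to the job's input and output paths.
class WordCountArgsSketch {

    // Returns {input, output}, or throws if the argument count is wrong.
    static String[] parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <in> <out>");
        }
        return new String[] { args[0], args[1] };
    }

    public static void main(String[] args) {
        // Equivalent to entering "/test/input/data.txt /test/output"
        // in the Arguments tab (the output path is an assumption).
        String[] paths = parse(new String[] { "/test/input/data.txt", "/test/output" });
        System.out.println("input  = " + paths[0]);
        System.out.println("output = " + paths[1]);
    }
}
```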
  9. The following error may appear (if it does not, skip this step):
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
	at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
	at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
	at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
	at cn.neu.WordCount.main(WordCount.java:45)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
	at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
	at java.base/java.lang.String.substring(Unknown Source)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:49)
	... 5 more
           

Click (Shell.java:49) in the stack trace. On the page that opens, click Attach Source.

In the dialog that follows, choose "External location" → "External file", use the path shown on the previous page to find the sources folder, select hadoop-common-2.6.0-sources.jar, click "Open", and finally click "OK".

Click (Shell.java:49) again to view the source. Line 49 reads:

private static boolean IS_JAVA7_OR_ABOVE =
    System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0;

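Combine this with the following lines from the error output:

at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)

at java.base/java.lang.String.substring(Unknown Source)

The message "begin 0, end 3, length 2" shows that the java.version string is only two characters long, so substring(0, 3) runs past its end. To work around this, add the following line at the start of the main function, before any Hadoop class is used:

System.setProperty("java.version", "1.8");

The value only needs to be at least three characters long and compare greater than or equal to "1.7".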
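
To see the failure in isolation, the check from line 49 can be reproduced in a few lines of plain Java. This is a sketch that takes the version string as a parameter instead of reading the system property, so it does not touch the running JVM's settings. The class and method names are invented for illustration, and "11" is only an example of a two-character version string.

```java
// Reproduces the check from Shell.java line 49 with the version passed in
// as a parameter, so different java.version values can be tried safely.
class JavaVersionCheckSketch {

    static boolean isJava7OrAbove(String javaVersion) {
        return javaVersion.substring(0, 3).compareTo("1.7") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isJava7OrAbove("1.8")); // true
        System.out.println(isJava7OrAbove("1.6")); // false
        try {
            // A two-character version string ("11" is just an example).
            isJava7OrAbove("11");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("fails: " + e.getMessage());
        }
    }
}
```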

  10. If the program runs normally, wait for it to finish, then right-click the test directory created under Hadoop in Project Explorer and click Refresh. The output directory now appears inside it.
    Double-click the part-r-00000 file to view the program's results.
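
Each line of part-r-00000 holds a word and its count separated by a tab, with the words in sorted order. The plain-Java sketch below produces the same kind of listing for a made-up input line, which can help when checking the job's output by eye. It does not use MapReduce, it assumes words are split on whitespace as in the standard Hadoop WordCount example, and the sample text is not the content of data.txt.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local illustration of the word/count listing; no Hadoop classes involved.
class LocalWordCountSketch {

    // Count whitespace-separated words; TreeMap keeps the keys sorted.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample text, not the contents of data.txt.
        Map<String, Integer> counts = count("hello hadoop hello hdfs");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```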
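  11. To run the program again, either change the output directory in the run configuration or delete the existing output directory.

    A more permanent fix is a small change to the main function: before each run, check whether the output path exists and delete it if it does.

    Before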

System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("java.version", "1.8");
Configuration conf = new Configuration();
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
           

After

// Requires: import org.apache.hadoop.fs.FileSystem;
System.setProperty("HADOOP_USER_NAME", "root");
System.setProperty("java.version", "1.8");
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
if (otherArgs.length != 2) {
    System.err.println("Usage: WordCount <in> <out>");
    System.exit(2);
}
// Remove a leftover output directory so the job can be rerun.
Path outPath = new Path(otherArgs[1]);
if (fs.exists(outPath)) {
    fs.delete(outPath, true); // true = delete recursively
}
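
The check-then-delete pattern can also be tried without a cluster. The sketch below applies it to a local temporary directory using java.nio.file. It is a stand-in for the HDFS calls above, not a replacement for them, and the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Local-disk analogue of "if the output path exists, delete it recursively".
class OutputDirCleanerSketch {

    // Returns true if the directory existed and was removed.
    static boolean deleteIfExists(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directory.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a throwaway output directory, then cleans it up.
    static boolean demo() {
        try {
            Path out = Files.createTempDirectory("output");
            Files.createFile(out.resolve("part-r-00000"));
            return deleteIfExists(out) && !Files.exists(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("cleaned: " + demo());
    }
}
```

Because the deletion is recursive and unconditional, it is worth double-checking the output argument before running the real job, since any existing results at that path are lost.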