
Running WordCount in Eclipse (Detailed Version)

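Take people rather than water as your mirror, and fortune and misfortune can be discerned.

One stumbles not on a mountain but on an anthill, so it is the small things that need guarding against.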

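Related Links

HDFS Background

  • Quick Start with the Hadoop Distributed File System (HDFS)
  • Hadoop Distributed File System (HDFS) Knowledge Overview (Super Detailed)

Connecting to a Hadoop Cluster

  • Connecting Eclipse to a Hadoop Cluster
  • Connecting IntelliJ IDEA to a Hadoop Cluster

HDFS Java API

Hadoop Distributed File System (HDFS) Java Interface (HDFS Java API), Detailed Version

WordCount Program Analysis

Writing a WordCount Program with the Java API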

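Running WordCount in Eclipse

File Downloads

  • WordCount.java (extraction code: 2kwo)
  • log4j.properties (extraction code: tpz9)
  • data.txt (extraction code: zefp)

Steps

Note: complete every step in "Connecting Eclipse to a Hadoop Cluster" before continuing with the steps below.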

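  1. Open Eclipse, click "File" → "New" → "Map/Reduce Project", then click "Next".
  2. In the dialog that appears, enter a project name, choose the project location, and click "Finish".
  3. In the src directory of the MapReduce project, create a package named cn.neu and click "Finish".
  4. Copy the downloaded WordCount.java file into the cn.neu package (dragging and dropping works).
  5. Use a file transfer tool such as Xftp to copy core-site.xml and hdfs-site.xml from the hadoop/hadoop-2.6.0/etc/hadoop directory under the remote Hadoop cluster's installation directory to your local machine.

    Copy these two XML files, together with the downloaded log4j.properties file, into src.

    Note: if you are unsure how these XML files should be configured, see "Installing and Deploying a Hadoop Cluster on Multiple Linux Virtual Machines (Super Detailed Version)".

    If these files are not added, the error below occurs. Without core-site.xml the default file system falls back to the local one, which is why the input path is resolved as file:/test/input/data.txt; the log4j warnings come from the missing log4j.properties.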
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/G:/hadoop-2.6.0/share/hadoop/common/lib/hadoop-auth-2.6.0.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: file:/test/input/data.txt
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:321)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:264)
	at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:597)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:614)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
	at java.base/java.security.AccessController.doPrivileged(Native Method)
	at java.base/javax.security.auth.Subject.doAs(Unknown Source)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1314)
	at cn.neu.WordCount.main(WordCount.java:60)
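
The file: prefix in the exception shows which file system the path was resolved against. The sketch below mimics that resolution with plain java.net.URI. It is only an analogy for how a scheme-less path gets qualified against the default file system, not Hadoop's own Path logic, and hdfs://namenode:9000/ is a hypothetical placeholder for the address your core-site.xml would supply.

```java
import java.net.URI;

public class DefaultFsSketch {
    // Qualify a scheme-less absolute path against a default file system URI.
    public static String qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        String input = "/test/input/data.txt";
        // Local default, as used when no core-site.xml is on the classpath.
        System.out.println(qualify("file:///", input));
        // Hypothetical cluster address; replace with your own NameNode URI.
        System.out.println(qualify("hdfs://namenode:9000/", input));
    }
}
```

With the local default the path ends up on the machine running Eclipse, where /test/input/data.txt does not exist; with a cluster address the same path points into HDFS.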
           
  6. Right-click the HDFS root directory and click "Create new directory".
    Enter test and click "OK".

    Right-click in the Project Explorer pane and click Refresh; the new directory then appears.

    Right-click the test folder and create a directory named input inside it, then refresh again.

  7. Right-click the input directory and choose "Upload files to DFS" (HDFS was formerly also called DFS).
    Select the downloaded data.txt file and click "Open", then refresh Project Explorer once more to see the uploaded file.
  8. WordCount.java takes two values from its arguments, so the program arguments must be configured:

FileInputFormat.addInputPath(job, new Path(otherArgs[0]));

FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

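Right-click in the code editor and click "Run As" → "Run Configurations".

Click the Arguments tab, enter the path of the data.txt file uploaded in the previous step followed by the path where the program should write its output, click "Apply", and then click "Run" to start the program.

Note: do not create an output directory under test before running the program. The output directory must not exist, otherwise the job fails with a "directory already exists" error!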
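
As an illustration, the Program arguments box might hold two space-separated values such as /test/input/data.txt /test/output. These paths are assumptions based on the directories created above; use whichever input file and output directory you actually chose. The standalone sketch below (plain Java, no Hadoop classes; the class and method names are hypothetical) shows how the two values map to otherArgs[0] and otherArgs[1].

```java
public class ArgsSketch {
    // Returns {inputPath, outputPath}, or throws if the argument count is wrong.
    public static String[] parse(String[] otherArgs) {
        if (otherArgs.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <in> <out>");
        }
        return new String[] { otherArgs[0], otherArgs[1] };
    }

    public static void main(String[] args) {
        // Assumed example values, split on whitespace as a launcher would do.
        String[] example = "/test/input/data.txt /test/output".split(" ");
        String[] paths = parse(example);
        System.out.println("input  = " + paths[0]);
        System.out.println("output = " + paths[1]);
    }
}
```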
  9. The following error may be reported (if it is not, skip this step):
Exception in thread "main" java.lang.ExceptionInInitializerError
	at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
	at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
	at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
	at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
	at cn.neu.WordCount.main(WordCount.java:45)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
	at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
	at java.base/java.lang.String.substring(Unknown Source)
	at org.apache.hadoop.util.Shell.<clinit>(Shell.java:49)
	... 5 more
           

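Click (Shell.java:49) in the stack trace. In the editor that opens, click Attach Source.

In the dialog that follows, click "External location" → "External File", browse to the sources folder using the path shown on the previous screen, select hadoop-common-2.6.0-sources.jar, click "Open", and finally click "OK".

Click (Shell.java:49) again to view the source. Line 49 reads:

private static boolean IS_JAVA7_OR_ABOVE =
    System.getProperty("java.version").substring(0, 3).compareTo("1.7") >= 0;

Compare this with the following lines of the error message:

Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
at java.base/java.lang.String.checkBoundsBeginEnd(Unknown Source)
at java.base/java.lang.String.substring(Unknown Source)

The java.version string on this JDK is only two characters long, so substring(0, 3) runs past its end. To work around this, add the following line to the main function, before the GenericOptionsParser is created:

System.setProperty("java.version", "1.8");

The value only needs to be at least three characters long and compare greater than or equal to "1.7".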
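
To see the failure in isolation, the sketch below applies the same expression as Shell.java line 49 to a few version strings. The class name is hypothetical, and "11" is only an assumed example of a two-character version string; the stack trace confirms the length but not the exact value.

```java
public class JavaVersionSketch {
    // Same expression as Shell.java line 49, applied to an arbitrary version string.
    public static boolean isJava7OrAbove(String version) {
        return version.substring(0, 3).compareTo("1.7") >= 0;
    }

    public static void main(String[] args) {
        System.out.println(isJava7OrAbove("1.8"));      // "1.8" >= "1.7"
        System.out.println(isJava7OrAbove("1.6.0_45")); // "1.6" <  "1.7"
        try {
            // Two characters only, so substring(0, 3) is out of range.
            isJava7OrAbove("11");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("fails for \"11\": " + e.getMessage());
        }
    }
}
```

Overriding java.version only sidesteps the check. It does not change the JDK that actually runs the program.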

  10. If the program runs normally, wait for it to finish, then right-click the test directory created earlier under Hadoop in Project Explorer and click Refresh; the output directory appears inside it.
    Double-click the part-r-00000 file to view the program's result.
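
For a sense of what the result file contains, here is a minimal local sketch of the same computation in plain Java. It assumes the downloaded WordCount.java follows the usual pattern of splitting on whitespace and summing per word, in which case each output line would hold a word, a tab, and its count, ordered by word. The sample text is hypothetical, since the contents of data.txt are not shown here.

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalWordCountSketch {
    // Count whitespace-separated tokens; TreeMap keeps the words in sorted order.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input standing in for data.txt.
        for (Map.Entry<String, Integer> e : count("hello hadoop hello hdfs").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```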
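  11. To run the program again, either change the output directory in the run configuration or delete the existing output directory.

    A once-and-for-all alternative is to modify the main function slightly so that, before each run, it checks whether the output path exists and deletes it if it does.

    Before the change: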

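		// Run the job as the HDFS user "root"
		System.setProperty("HADOOP_USER_NAME", "root");
		// Work around the java.version check in Shell.java
		System.setProperty("java.version", "1.8");
		Configuration conf = new Configuration();
		// Strip generic Hadoop options; what remains are the input and output paths
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length != 2) {
			System.err.println("Usage: WordCount <in> <out>");
			System.exit(2);
		}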
           

After the change:

		System.setProperty("HADOOP_USER_NAME", "root");
		System.setProperty("java.version", "1.8");
		Configuration conf = new Configuration();
		// Requires: import org.apache.hadoop.fs.FileSystem;
		FileSystem fs = FileSystem.get(conf);
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length != 2) {
			System.err.println("Usage: WordCount <in> <out>");
			System.exit(2);
		}
		// Delete the output path (recursively) if it is left over from a previous run
		Path outPath = new Path(otherArgs[1]);
		if (fs.exists(outPath)) {
			fs.delete(outPath, true);
		}
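
The same check-then-delete pattern can be tried without a cluster. The sketch below is a local-file-system analogue using java.nio.file, not the Hadoop FileSystem API; note that its Path is java.nio.file.Path rather than org.apache.hadoop.fs.Path, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeleteOutputSketch {
    // Recursively delete dir if it exists; returns true if something was deleted.
    public static boolean deleteIfPresent(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children are removed before their parents.
            List<Path> paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a throwaway "output" directory with one file, then removes it.
    public static boolean demo() {
        try {
            Path out = Files.createTempDirectory("wordcount-output");
            Files.createFile(out.resolve("part-r-00000"));
            return deleteIfPresent(out) && !Files.exists(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deleted stale output: " + demo());
    }
}
```

Deleting the output automatically is convenient while experimenting, but it also discards earlier results without asking, so it may not be appropriate for jobs whose output you want to keep.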