Goal: debug HDFS in a virtual machine directly from IDEA and run MapReduce jobs, without having to upload a jar to the Hadoop directory and run it there.
Environment:
VM: Hadoop 2.7.1
Local: Hadoop 2.7.1
IDEA: 2019.3.3
Maven: 3.6.3
1. Modify etc/hadoop/core-site.xml on the VM's Hadoop installation
Set fs.defaultFS to hdfs://<IP address>:9000 (a hostname also works; if you use an IP address, it is recommended to make the VM's IP static).
(Screenshot: the modified core-site.xml)
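For reference, a minimal core-site.xml sketch; the address 192.168.1.100 is a placeholder for your VM's static IP:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.1.100:9000</value>
    </property>
</configuration>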
2. Use a tool such as Xftp to copy core-site.xml and hdfs-site.xml out of the VM's Hadoop etc/hadoop directory for later use.
3. Create a Maven project in IDEA
Wait for the import to finish; the build log shows SUCCESS and the src folder appears.
Create the folders by clicking through the project tree in order: src/main/java for source code and src/main/resources for configuration files.
The finished directory structure looks like this:
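A sketch of the layout assumed by the later steps (the project name wordcount is a placeholder):

wordcount/
├── pom.xml
└── src/
    └── main/
        ├── java/
        └── resources/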
4. Configure the pom.xml file
<!-- Versions should match the Hadoop version on the cluster (2.7.1 here) -->
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.1</version>
    </dependency>
</dependencies>
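Note: hadoop-client already pulls in hadoop-common and hadoop-hdfs transitively, so listing all three is redundant but harmless.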
After pasting, a popup appears in the lower-right corner.
Click Import Changes and wait for Maven to download the dependencies automatically.
5. Create a package under src/main/java, and create a WordCount class inside it.
The WordCount code is as follows:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

public class WordCount {

    // Mapper: splits each input line into tokens and emits (word, 1)
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts for each word and emits (word, total)
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // local pre-aggregation on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output path (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
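Note that IntSumReducer doubles as the combiner: since addition is associative and commutative, partial sums can safely be computed on the map side before the shuffle, which reduces network traffic.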
6. Paste the core-site.xml and hdfs-site.xml copied from the VM into src/main/resources, and create a log4j.properties file in the same directory.
Contents of log4j.properties (copy and paste as-is):
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
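As written, only the stdout appender is attached to the root logger; the logfile appender is defined but unused. To also log to target/spring.log, change the first line to log4j.rootLogger=INFO, stdout, logfile.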
7. WordCount takes two path arguments (args[0] and args[1]); set them in the run configuration.
The input path is the location of the files to process.
The output path is where the results are written; it must not already exist.
The setup steps are as follows:
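For example, in Run > Edit Configurations > Program arguments (the paths /input and /output below are placeholders; since core-site.xml is on the classpath, bare paths resolve against fs.defaultFS on the VM):

/input /output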
8. A key step (this method has major limitations; the second method is recommended)
Add a HADOOP_USER_NAME variable to the Windows system environment variables to avoid permission errors when accessing HDFS (the second method is at the end).
After adding it, restart IDEA so it picks up the new variable.
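For example (the value hadoop assumes that is the owner of your HDFS directories, as in the second method below):
Variable name: HADOOP_USER_NAME
Variable value: hadoop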
9. Execution result
The second method (recommended)
Add the following at the beginning of the main method.
Whoever owns the HDFS files is the identity you should use to access HDFS; in my case the owner is hadoop, so I write System.setProperty("HADOOP_USER_NAME", "hadoop");
// Set the client identity: access HDFS as the hadoop user
System.setProperty("HADOOP_USER_NAME", "hadoop");
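A minimal sketch of where the line goes; it must run before the Job is created so the HDFS client picks up the identity:

public static void main(String[] args) throws Exception {
    // Set the client identity before any HDFS access
    System.setProperty("HADOOP_USER_NAME", "hadoop");
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    // ... the rest of the job setup is unchanged
}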
Example 2:
Common errors:
The VM's firewall has not been disabled
The HDFS access identity has not been set
The output path already exists
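For the last error, a common workaround (a sketch, not part of the original code) is to delete the output path in main() before submitting the job; this requires one extra import, org.apache.hadoop.fs.FileSystem:

// Runs in main() after conf is created and before the job is submitted
FileSystem fs = FileSystem.get(conf);
Path out = new Path(args[1]);
if (fs.exists(out)) {
    fs.delete(out, true); // true = delete recursively
}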