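Table of Contents

1. MapReduce
2. Setting Up the MapReduce Development Environment
- 2.1 Maven Environment
- 2.2 Importing the JAR Files Manually
3. WordCount Source Code Analysis
- 3.1 Opening WordCount.java
- 3.2 Source Code Analysis
  - 3.2.1 WordCount Source: the Map Task
  - 3.2.2 WordCount Source: the Reduce Task
  - 3.2.3 WordCount Source: the main Function
4. Introduction to the MapReduce API
- 4.1 MapReduce Program Modules: the main Function
- 4.2 MapReduce Program Modules: Mapper
- 4.3 MapReduce Program Modules: Reducer
5. MapReduce Examples
- 5.1 Workflow (Mapper, Reducer, Main, Packaging and Running)
- 5.2 Example 1: Counting Visits by Date
- 5.3 Example 2: Sorting Users by Visit Count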

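1. MapReduce

Hadoop Study Notes (5): Getting Started with MapReduce Development

MapReduce is a software framework proposed by Google for parallel computation over large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and the main ideas behind them, are borrowed from functional programming languages, with some features borrowed from vector programming languages as well.

In current implementations, the programmer specifies a Map function, which transforms a set of key-value pairs into a new set of key-value pairs, and a Reduce function that runs concurrently and merges all mapped values that share the same key.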
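To make the Map and Reduce roles concrete, here is a minimal plain-Java sketch of the map, group-by-key, reduce flow for word counting. It uses no Hadoop classes; the class and method names (`MapReduceSketch`, `map`, `reduce`, `wordCount`) are hypothetical and chosen only for illustration, and the grouping loop stands in for what the framework does between the two phases.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of map -> group by key -> reduce (no Hadoop involved). */
final class MapReduceSketch {

    /** Map step: turn one input line into a list of (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    /** Reduce step: merge all values that share one key into a single sum. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** Driver: map every line, group the pairs by key, then reduce each group. */
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
    }
}
```

In a real job the framework distributes the map calls across machines and performs the grouping itself; only the two small functions are written by the programmer.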

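2. Setting Up the MapReduce Development Environment

Prerequisites: Java, IntelliJ IDEA, Maven

Java installation link and steps: https://www.cnblogs.com/de-ming/p/13909440.html

There are two ways to set up the development environment: with Maven (2.1) or by importing the JAR files manually (2.2).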

2.1 Maven Environment

Add the hadoop-client dependency (version 3.1.4):

https://search.maven.org/artifact/org.apache.hadoop/hadoop-client/3.1.4/jar

Then attach the source code.

2.2 Importing the JAR Files Manually

Hadoop installation package link: https://pan.baidu.com/s/1teHwnBH2Qm6F7iWZ3q-hSQ

Extraction code: cgnb

Create a Java project.

Then search for JobClient.class and click 'Choose Sources'.

That is all; JobClient.java can now be viewed.

3. WordCount Source Code Analysis

3.1 Opening WordCount.java

Open the following page and copy the Maven dependency snippet:

https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-mapreduce-examples/3.1.4

Paste it into the project.

Search for WordCount.

3.2 Source Code Analysis

3.2.1 WordCount Source: the Map Task

3.2.2 WordCount Source: the Reduce Task

3.2.3 WordCount Source: the main Function

The main function sets the required parameters and assembles the MapReduce program.

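4. Introduction to the MapReduce API

A MapReduce program generally consists of a Mapper, a Reducer, and a main function.
The Mapper performs the key-value mapping.
The Reducer performs the key-value aggregation.
The main function assembles the Mapper, the Reducer, and the required configuration.
More advanced programs also set the input and output file formats, and use a Combiner or a Partitioner to optimize the job.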
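As a rough illustration of what a Partitioner decides, the sketch below reproduces in plain Java the arithmetic used by Hadoop's default `HashPartitioner`: clear the sign bit of the key's hash code, then take it modulo the number of reduce tasks. The class name `PartitionSketch`, the sample keys, and the reducer count are assumptions made for this example. Real jobs hash Hadoop key types such as `Text` rather than `String`, so the actual partition numbers will differ.

```java
import java.util.List;

/** Plain-Java sketch of how map output keys are assigned to reduce tasks. */
final class PartitionSketch {

    /**
     * Same arithmetic as the default hash partitioning:
     * clear the sign bit of the hash, then take it modulo the reducer count.
     */
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 3; // hypothetical reducer count
        for (String key : List.of("2021-03-01", "2021-03-02", "2021-03-03")) {
            System.out.println(key + " -> reducer " + partitionFor(key, numReduceTasks));
        }
    }
}
```

Because the partition depends only on the key, every value for a given key reaches the same reducer, which is what makes per-key aggregation possible.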

4.1 MapReduce Program Modules: the main Function

4.2 MapReduce Program Modules: Mapper

org.apache.hadoop.mapreduce.Mapper

4.3 MapReduce Program Modules: Reducer

org.apache.hadoop.mapreduce.Reducer

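5. MapReduce Examples

5.1 Workflow (Mapper, Reducer, Main, Packaging and Running)

1. Modify the Mapper, using the WordCount program as a reference.
2. Copy the Reducer as is.
3. Copy the main function and adapt it.
4. Compile and package.
5. Upload the JAR file.
6. Upload the data.
7. Run the program.
8. Check the results.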

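5.2 Example 1: Counting Visits by Date

1. Modify the Mapper, using the WordCount program as a reference.

(Create a Java class named CountByDate and paste the code from steps 1 to 3 below into it. The required imports are the same as in the complete listings in section 5.3.)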

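public static class SpiltMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        // Every record contributes a count of 1.
        private final static IntWritable one = new IntWritable(1);
        // Reused output key, so no new Text is allocated per record.
        private Text word = new Text();

        // value: email_address | date
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            // Split on the literal '|'; the limit -1 keeps trailing empty fields.
            String[] data = value.toString().split("\\|",-1);
            // The second field (the date) becomes the output key.
            word.set(data[1]);
            context.write(word, one);
        }
    }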
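The field parsing inside the mapper can be tried out on its own. The plain-Java sketch below shows how `split("\\|", -1)` behaves on a few sample lines and how a malformed line could be detected before indexing into the array. The class name `LogLineParser`, the sample lines, and the `trim()` call are assumptions added for illustration; the mapper in this section neither trims the field nor checks the array length.

```java
/** Plain-Java sketch of the field parsing done inside the mapper (no Hadoop classes). */
final class LogLineParser {

    /** Returns the date field of an "email|date" line, or null if the line has fewer than two fields. */
    static String dateOf(String line) {
        String[] data = line.split("\\|", -1); // "\\|" matches a literal '|'
        if (data.length < 2) {
            return null; // malformed line: the caller can skip it
        }
        return data[1].trim(); // trim() is an assumption, in case fields are padded with spaces
    }

    public static void main(String[] args) {
        System.out.println(dateOf("alice@example.com|2021-03-01"));
        System.out.println(dateOf("line without separator"));
        // The limit -1 keeps a trailing empty field; the default limit drops it.
        System.out.println("alice@example.com|".split("\\|", -1).length);
        System.out.println("alice@example.com|".split("\\|").length);
    }
}
```

Without such a check, a line that contains no `|` makes `data[1]` throw an `ArrayIndexOutOfBoundsException`, which fails the map task.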

2. Copy the Reducer as is.

public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

3. Copy the main function and adapt it.

public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic options such as -D key=value; what remains are the paths.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: CountByDate <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "count by date");
        job.setJarByClass(CountByDate.class);   // our main class is CountByDate
        job.setMapperClass(SpiltMapper.class);  // mapper: replaced with our SpiltMapper
        // Summing is associative, so the reducer can also serve as the combiner.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Every argument except the last is an input path; the last is the output path.
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

4. Compile and package (build the JAR)

Errors encountered during the build and how to resolve them:

Done.

5/6. Upload the JAR file and the data

Link to the email_log_with_date.txt data file: https://pan.baidu.com/s/1HfwHCfmvVdQpuL-MPtpAng

Extraction code: cgnb

Upload the data file (make sure HDFS is running).

Once the upload succeeds, it can be checked in the browser at master:50070. (Port 50070 is the Hadoop 2.x default for the NameNode web UI; Hadoop 3.x uses 9870 by default.)

7. Run the program

(Make sure YARN is running.)

Once the job has been submitted, it can be monitored in the browser at master:8088.

8. Check the results

The output can be viewed in the browser at master:50070.

5.3 Example 2: Sorting Users by Visit Count

This example uses two jobs, each with its own Mapper, Reducer, and main function.

SortByCountFirst.java

package demo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class SortByCountFirst {
    //1. Modified Mapper: the key is the email address (first field)
    public static class SpiltMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        //value: email_address | date
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            String[] data = value.toString().split("\\|",-1);
            word.set(data[0]);
            context.write(word, one);
        }
    }

    //2. Reducer copied as is, no changes needed
    public static class IntSumReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    //3. main function copied and adapted
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: demo.SortByCountFirst <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "sort by count first");
        job.setJarByClass(SortByCountFirst.class);   // our main class is SortByCountFirst
        job.setMapperClass(SpiltMapper.class);  // mapper: replaced with our SpiltMapper
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

SortByCountSecond.java

package demo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

import java.io.IOException;

public class SortByCountSecond {
    //1. Modified Mapper: emits (count, email address) so that the shuffle sorts by count
    public static class SpiltMapper
            extends Mapper<Object, Text, IntWritable, Text> {

        private IntWritable count = new IntWritable(1);
        private Text word = new Text();
        //value: email_address \t count
        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            String[] data = value.toString().split("\t",-1);
            word.set(data[0]);
            count.set(Integer.parseInt(data[1]));
            context.write(count,word);
        }
    }

    //2. New Reducer: swaps key and value back to (email address, count)
    public static class ReverseReducer
            extends Reducer<IntWritable,Text,Text,IntWritable> {

        public void reduce(IntWritable key, Iterable<Text> values,
                           Context context
        ) throws IOException, InterruptedException {
            for (Text val : values) {
                context.write(val,key);
            }
        }
    }

    //3. main function copied and adapted
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: demo.SortByCountSecond <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "sort by count second");
        job.setJarByClass(SortByCountSecond.class);   // our main class is SortByCountSecond
        job.setMapperClass(SpiltMapper.class);  // mapper: replaced with our SpiltMapper
        // No combiner here: this job only swaps key and value, there is nothing to pre-aggregate.
        job.setReducerClass(ReverseReducer.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job,
                new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

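Then package and upload the JAR, and run the two jobs in order. The first job counts the visits per user; the second job reads the first job's output directory and sorts by count:

yarn jar sortbycount.jar demo.SortByCountFirst -Dmapreduce.job.queuename=prod email_log_with_date.txt sortbycountfirst_output00

yarn jar sortbycount.jar demo.SortByCountSecond -Dmapreduce.job.queuename=prod sortbycountfirst_output00 sortbycountsecond_output00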
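The data flow of the two jobs can be sketched in plain Java without a cluster. In the sketch below, `countByUser` plays the role of the first job and `sortByCount` the role of the second: it regroups the records by count, lets a sorted map order the keys (standing in for the sort that the shuffle performs on map output keys), and then emits user and count again. The class name, method names, and sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch of the two-job pipeline: count per user, then sort by count. */
final class SortByCountSketch {

    /** Job 1: count how many lines each user (first field) appears in. */
    static Map<String, Integer> countByUser(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String user = line.split("\\|", -1)[0];
            counts.merge(user, 1, Integer::sum);
        }
        return counts;
    }

    /** Job 2: regroup by count, rely on sorted keys, then emit "user<TAB>count". */
    static List<String> sortByCount(Map<String, Integer> counts) {
        TreeMap<Integer, List<String>> byCount = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : byCount.entrySet()) {
            for (String user : group.getValue()) {
                output.add(user + "\t" + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "a@x.com|2021-01-01", "b@x.com|2021-01-01", "a@x.com|2021-01-02",
                "a@x.com|2021-01-03", "b@x.com|2021-01-02", "c@x.com|2021-01-01");
        sortByCount(countByUser(log)).forEach(System.out::println);
    }
}
```

The result is in ascending order of count, which matches the default ordering of `IntWritable` keys. A descending order would need a custom sort comparator set on the job, and with more than one reducer each output file is sorted only within itself.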
