Hadoop MapReduce 01: The Built-in wordcount Example

I. What is MapReduce

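Component: Description

HDFS: distributed storage system

MapReduce: distributed computing system

YARN: Hadoop's resource scheduling system

Common: the underlying support component for the three components above (HDFS, MapReduce, YARN); it mainly provides the basic toolkits and the RPC framework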

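MapReduce is a programming framework for distributed computation and the core framework for developing "Hadoop-based data analysis applications". Its central job is to combine the business logic written by the user with the framework's built-in default components into one complete distributed program that runs concurrently on a Hadoop cluster.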
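The division of labor described above can be sketched in a few lines of plain Python. This is only an in-process illustration, not Hadoop's API: `run_job`, `word_mapper`, and `sum_reducer` are hypothetical names, and the real framework spreads the same three phases (map, shuffle, reduce) across a cluster.

```python
from collections import defaultdict

def word_mapper(line):
    """User logic: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.split():
        yield word, 1

def sum_reducer(key, values):
    """User logic: collapse all counts for one word into a single total."""
    return key, sum(values)

def run_job(lines, mapper, reducer):
    """Stand-in for the framework: map, group by key (shuffle), then reduce."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Keys are handed to the reducer in sorted order.
    return [reducer(key, groups[key]) for key in sorted(groups)]

if __name__ == "__main__":
    job_input = ["hello hadoop", "hello hdfs"]  # hypothetical sample lines
    for word, count in run_job(job_input, word_mapper, sum_reducer):
        print(word, count)
```

Only `word_mapper` and `sum_reducer` correspond to code a developer would write; everything inside `run_job` stands for work the framework takes over.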

II. Why MapReduce is needed

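A single machine cannot process massive data sets because of its hardware resource limits.

Yet once a single-machine program is extended to run distributed across a cluster, its complexity and development difficulty increase enormously.

With the MapReduce framework, developers can focus most of their effort on the business logic and leave the complexity of distributed computing to the framework.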

III. Running a MapReduce example program

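The MapReduce component ships with a number of official sample programs, the best known being wordcount and pi. Their code lives in hadoop-mapreduce-examples-2.6.5.jar (the version suffix follows the installed Hadoop release), which is located in the share/hadoop/mapreduce/ directory under the Hadoop installation directory.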

The wordcount example

Run the wordcount example to count how many times each word occurs in a file.

1. Prepare the data

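2. Create the corresponding directory in HDFS

Create a directory in HDFS to hold the files to be counted and upload the input file to it. The output path is only chosen at this point, not created, because the job creates it itself.

hadoop fs -mkdir -p /wordcount/input
hadoop fs -put a.txt /wordcount/input/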

3. Start YARN

YARN must be running before a distributed job can be executed.

start-yarn.sh

4. Run the program

hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /wordcount/input/ /wordcount/output

Output

[root@hadoop-node01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /wordcount/input/ /wordcount/output
19/04/02 23:06:03 INFO client.RMProxy: Connecting to ResourceManager at hadoop-node01/192.168.88.61:8032
19/04/02 23:06:07 INFO input.FileInputFormat: Total input paths to process : 1
19/04/02 23:06:09 INFO mapreduce.JobSubmitter: number of splits:1
19/04/02 23:06:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554217397936_0001
19/04/02 23:06:10 INFO impl.YarnClientImpl: Submitted application application_1554217397936_0001
19/04/02 23:06:11 INFO mapreduce.Job: The url to track the job: http://hadoop-node01:8088/proxy/application_1554217397936_0001/
19/04/02 23:06:11 INFO mapreduce.Job: Running job: job_1554217397936_0001
19/04/02 23:06:30 INFO mapreduce.Job: Job job_1554217397936_0001 running in uber mode : false
19/04/02 23:06:30 INFO mapreduce.Job:  map 0% reduce 0%
19/04/02 23:06:46 INFO mapreduce.Job:  map 100% reduce 0%
19/04/02 23:06:57 INFO mapreduce.Job:  map 100% reduce 100%
19/04/02 23:06:58 INFO mapreduce.Job: Job job_1554217397936_0001 completed successfully
19/04/02 23:06:59 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=133
        FILE: Number of bytes written=214969
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=240
        HDFS: Number of bytes written=79
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=11386
        Total time spent by all reduces in occupied slots (ms)=9511
        Total time spent by all map tasks (ms)=11386
        Total time spent by all reduce tasks (ms)=9511
        Total vcore-milliseconds taken by all map tasks=11386
        Total vcore-milliseconds taken by all reduce tasks=9511
        Total megabyte-milliseconds taken by all map tasks=11659264
        Total megabyte-milliseconds taken by all reduce tasks=9739264
    Map-Reduce Framework
        Map input records=24
        Map output records=27
        Map output bytes=236
        Map output materialized bytes=133
        Input split bytes=112
        Combine input records=27
        Combine output records=12
        Reduce input groups=12
        Reduce shuffle bytes=133
        Reduce input records=12
        Reduce output records=12
        Spilled Records=24
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=338
        CPU time spent (ms)=2600
        Physical memory (bytes) snapshot=283582464
        Virtual memory (bytes) snapshot=4125011968
        Total committed heap usage (bytes)=137363456
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=128
    File Output Format Counters 
        Bytes Written=79
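The counters above show the combiner at work: 27 map output records went into it and 12 came out, so only 12 records were shuffled to the reducer. Below is a rough sketch of that map-side pre-aggregation. It assumes a word list rebuilt from the final counts that the job printed, since the word order in the real a.txt is unknown, and `map_output` and `combine` are hypothetical helpers, not Hadoop calls.

```python
from collections import Counter

# Word totals copied from the job's final output; the order in a.txt is not known.
FINAL_COUNTS = {
    "1": 1, "2": 1, "3": 1, "a": 4, "b": 2, "c": 1,
    "hadoop": 3, "hdfs": 2, "hello": 2, "java": 7,
    "mapreduce": 1, "wordcount": 2,
}

def map_output(counts):
    """Rebuild the (word, 1) records a mapper would emit for these totals."""
    return [(word, 1) for word, n in counts.items() for _ in range(n)]

def combine(records):
    """Sum the values per key on the map side, before anything is shuffled."""
    totals = Counter()
    for word, value in records:
        totals[word] += value
    return sorted(totals.items())

if __name__ == "__main__":
    records = map_output(FINAL_COUNTS)
    combined = combine(records)
    # Prints: 27 records in, 12 records out
    print(len(records), "records in,", len(combined), "records out")
```

The two numbers match `Combine input records=27` and `Combine output records=12` in the log. With a single map task, the reducer then has nothing left to merge, which is consistent with `Reduce input records=12` and `Reduce output records=12`.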

The job succeeded; view the result

[root@hadoop-node01 mapreduce]# hadoop fs -cat /wordcount/output/part-r-00000
1   1
2   1
3   1
a   4
b   2
c   1
hadoop  3
hdfs    2
hello   2
java    7
mapreduce   1
wordcount   2
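Each line of `part-r-00000` holds a word and its count separated by whitespace (a tab in Hadoop's default text output format). Here is a small sketch for loading such a listing into Python, assuming the text has already been captured from `hadoop fs -cat`; `parse_counts` and `most_frequent` are hypothetical helpers.

```python
def parse_counts(text):
    """Turn 'word<whitespace>count' lines into a {word: count} dict."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last run of whitespace so tabs and spaces both work.
        word, count = line.rsplit(None, 1)
        counts[word] = int(count)
    return counts

def most_frequent(counts, n=3):
    """Return the n most frequent words, highest count first, ties by word."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

if __name__ == "__main__":
    # A few lines copied from the listing above, with tab separators.
    sample = "hadoop\t3\nhdfs\t2\nhello\t2\njava\t7\n"
    print(most_frequent(parse_counts(sample), n=2))  # [('java', 7), ('hadoop', 3)]
```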

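Note: the output directory must not already exist. If it does, the job fails at submission with an error reporting that the output directory already exists.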
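When re-running the job, one common approach is to remove the previous output directory first. The sketch below wraps the `hadoop fs -test -d` and `hadoop fs -rm -r` commands. `remove_output_dir` and its injectable `run` parameter are hypothetical, and the code assumes the `hadoop` command is on the PATH with HDFS reachable.

```python
import subprocess

def remove_output_dir(path, run=subprocess.call):
    """Delete an HDFS directory if it exists; return True when it was removed.

    `run` takes an argument list and returns the exit code. It defaults to
    subprocess.call and can be swapped for a fake when no cluster is available.
    """
    # `hadoop fs -test -d` exits with 0 when the path is an existing directory.
    if run(["hadoop", "fs", "-test", "-d", path]) != 0:
        return False
    # `hadoop fs -rm -r` removes the directory and everything below it.
    return run(["hadoop", "fs", "-rm", "-r", path]) == 0
```

Calling `remove_output_dir("/wordcount/output")` before step 4 would clear the results of an earlier run. Since this discards the old results, the path is worth double-checking first.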

You can read through the source code on your own; the next article covers implementing the wordcount example by hand.
