
Hadoop MapReduce 01: The Built-in wordcount Example

I. What is MapReduce

Component    Description
HDFS         Distributed storage system
MapReduce    Distributed computing system
YARN         Hadoop's resource scheduling system
Common       Underlying support component for the other three (HDFS, MapReduce, YARN); mainly provides basic utility packages, the RPC framework, and so on

MapReduce is a programming framework for distributed computation and the core framework for developing "Hadoop-based data analysis applications". Its core function is to combine the user's business-logic code with the framework's built-in default components into a complete distributed program that runs in parallel on a Hadoop cluster.

II. Why do we need MapReduce

Massive datasets cannot be processed on a single machine because of hardware resource limits.

Yet once a single-machine program is extended to run distributed across a cluster, its complexity and development effort increase dramatically.

With the MapReduce framework in place, developers can focus most of their work on the business logic and leave the complexity of distributed computation to the framework.

III. Running a MapReduce example program

The MapReduce component ships with some official sample programs, the most famous of which are wordcount and pi. The code for these MapReduce programs lives in the hadoop-mapreduce-examples-2.6.5.jar package, which is located under share/hadoop/mapreduce/ in the Hadoop installation directory.
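As a quick way to see what the jar provides, you can run it without any arguments; the examples driver then prints the list of bundled programs, including wordcount and pi. The installation path below is only an assumption, adjust it to your environment:

# Assumption: Hadoop is installed under /opt/hadoop-2.6.5
cd /opt/hadoop-2.6.5/share/hadoop/mapreduce
# Running the examples jar with no arguments lists the available example programs
hadoop jar hadoop-mapreduce-examples-2.6.5.jar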


The wordcount example

Run the built-in wordcount example to count how many times each word appears in a file.

1. Prepare the data

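The original post shows the contents of a.txt in a screenshot that is not reproduced here. Any plain-text file with whitespace-separated words will do; as a purely hypothetical illustration (it will not reproduce the exact counts shown later), you could create one like this:

# Hypothetical sample input file; the actual a.txt from the original post differs
cat > a.txt <<'EOF'
hello hadoop
hello java
hadoop hdfs mapreduce
java wordcount
EOF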

2. Create the directories in HDFS

Create a directory in HDFS to hold the file to be counted and upload the file there. Also decide on an output path for the job; the job creates that path itself, so do not create it in advance.

hadoop fs -mkdir -p /wordcount/input
hadoop fs -put a.txt /wordcount/input/
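Optionally, verify the upload with a standard HDFS listing before running the job:

# Confirm that a.txt landed in the input directory
hadoop fs -ls /wordcount/input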

3. Start YARN

Distributed computation requires YARN, so it must be started first.

start-yarn.sh      
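To confirm that YARN came up, you can check the running Java processes with jps (part of the JDK):

# On the master node you should see ResourceManager; on the worker nodes, NodeManager
jps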

4. Run the program

hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /wordcount/input/ /wordcount/output      

Output:

[root@hadoop-node01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount /wordcount/input/ /wordcount/output
19/04/02 23:06:03 INFO client.RMProxy: Connecting to ResourceManager at hadoop-node01/192.168.88.61:8032
19/04/02 23:06:07 INFO input.FileInputFormat: Total input paths to process : 1
19/04/02 23:06:09 INFO mapreduce.JobSubmitter: number of splits:1
19/04/02 23:06:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554217397936_0001
19/04/02 23:06:10 INFO impl.YarnClientImpl: Submitted application application_1554217397936_0001
19/04/02 23:06:11 INFO mapreduce.Job: The url to track the job: http://hadoop-node01:8088/proxy/application_1554217397936_0001/
19/04/02 23:06:11 INFO mapreduce.Job: Running job: job_1554217397936_0001
19/04/02 23:06:30 INFO mapreduce.Job: Job job_1554217397936_0001 running in uber mode : false
19/04/02 23:06:30 INFO mapreduce.Job:  map 0% reduce 0%
19/04/02 23:06:46 INFO mapreduce.Job:  map 100% reduce 0%
19/04/02 23:06:57 INFO mapreduce.Job:  map 100% reduce 100%
19/04/02 23:06:58 INFO mapreduce.Job: Job job_1554217397936_0001 completed successfully
19/04/02 23:06:59 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=133
        FILE: Number of bytes written=214969
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=240
        HDFS: Number of bytes written=79
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=11386
        Total time spent by all reduces in occupied slots (ms)=9511
        Total time spent by all map tasks (ms)=11386
        Total time spent by all reduce tasks (ms)=9511
        Total vcore-milliseconds taken by all map tasks=11386
        Total vcore-milliseconds taken by all reduce tasks=9511
        Total megabyte-milliseconds taken by all map tasks=11659264
        Total megabyte-milliseconds taken by all reduce tasks=9739264
    Map-Reduce Framework
        Map input records=24
        Map output records=27
        Map output bytes=236
        Map output materialized bytes=133
        Input split bytes=112
        Combine input records=27
        Combine output records=12
        Reduce input groups=12
        Reduce shuffle bytes=133
        Reduce input records=12
        Reduce output records=12
        Spilled Records=24
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=338
        CPU time spent (ms)=2600
        Physical memory (bytes) snapshot=283582464
        Virtual memory (bytes) snapshot=4125011968
        Total committed heap usage (bytes)=137363456
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=128
    File Output Format Counters 
        Bytes Written=79      

The job completed successfully. Check the results:

[root@hadoop-node01 mapreduce]# hadoop fs -cat /wordcount/output/part-r-00000
1   1
2   1
3   1
a   4
b   2
c   1
hadoop  3
hdfs    2
hello   2
java    7
mapreduce   1
wordcount   2      
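Besides part-r-00000, a successful job also leaves an empty _SUCCESS marker file in the output directory; listing the directory confirms both files are there:

hadoop fs -ls /wordcount/output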

Note: the output directory must not already exist. If it does, the job aborts with an error saying the output directory already exists (a FileAlreadyExistsException).

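If you want to re-run the job, delete the previous output directory first (using the same paths as above):

# Remove the old output so the next run can create it again
hadoop fs -rm -r /wordcount/output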

You can browse the example's source code on your own; the next post will walk through implementing the wordcount example by hand.