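Hadoop: Testing Your First MapReduce Program

Note: this walkthrough runs the wordcount example that ships with Hadoop (the program counts how many times each word appears in the input files).

In version 2.6.0, the examples jar is located at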

/usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
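The path above depends on where Hadoop is installed. As a minimal sketch, assuming the install root is exported as HADOOP_HOME (falling back to the path used in this article), the jar location can be built and checked before it is used. The function name examples_jar and the HADOOP_VERSION variable are made up for this illustration.

```shell
# Print the path of the MapReduce examples jar, or fail if it is missing.
# Assumptions: HADOOP_HOME points at the install root (default: the path
# used in this article); HADOOP_VERSION is a hypothetical override.
examples_jar() {
  home="${HADOOP_HOME:-/usr/local/hadoop-2.6.0}"
  version="${HADOOP_VERSION:-2.6.0}"
  jar="$home/share/hadoop/mapreduce/hadoop-mapreduce-examples-$version.jar"
  if [ -f "$jar" ]; then
    printf '%s\n' "$jar"
  else
    echo "examples jar not found: $jar" >&2
    return 1
  fi
}
# Usage: hadoop jar "$(examples_jar)" wordcount /input /output
```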

1. Create a directory and files locally

Create the directory:

mkdir /home/hadoop/input

cd /home/hadoop/input

Create the files:

touch wordcount1.txt

touch wordcount2.txt

2. Add content

echo "Hello World" > wordcount1.txt

echo "Hello Hadoop" > wordcount2.txt
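As a rough local preview of what the job should produce, the sketch below counts whitespace-separated words with standard Unix tools. It is not how Hadoop computes the result; it only imitates the counting on one machine so the cluster output can be compared against it. The function name local_wordcount is an illustrative choice.

```shell
# Count whitespace-separated words across the given files and print
# "word<TAB>count" lines sorted by word (byte order).
# local_wordcount is a hypothetical helper, not part of Hadoop.
local_wordcount() {
  cat "$@" |
    tr -s '[:space:]' '\n' |   # one word per line
    sed '/^$/d' |              # drop empty lines left by leading whitespace
    LC_ALL=C sort |            # group identical words together
    uniq -c |                  # prefix each word with its count
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
# Usage: local_wordcount /home/hadoop/input/*
```

Words are split on whitespace only, so punctuation stays attached to the word it touches.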

3. Create the input directory on HDFS

hadoop fs -mkdir /input

4. Copy the files to the /input directory

hadoop fs -put /home/hadoop/input/* /input
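The directory creation and the copy can be wrapped in one small function that also lists the result, which makes it easy to confirm that both files arrived. This is only a sketch: upload_inputs is a hypothetical helper name, and it assumes the hadoop command is on the PATH.

```shell
# Create an HDFS directory if needed, copy every file from a local
# directory into it, then list the HDFS directory.
# upload_inputs is a hypothetical helper name.
upload_inputs() {
  local_dir="$1"
  hdfs_dir="$2"
  hadoop fs -mkdir -p "$hdfs_dir" || return 1
  hadoop fs -put "$local_dir"/* "$hdfs_dir" || return 1
  hadoop fs -ls "$hdfs_dir"
}
# Usage: upload_inputs /home/hadoop/input /input
```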

5. Run the program

hadoop jar /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output

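Note: wordcount is the name of the example program to run inside the jar, /input is the input directory, and /output is the output directory (the output directory must not already exist, or the job will fail).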
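Because the job refuses to start when the output directory already exists, a small guard can check for it first. The sketch below only reports the problem and leaves deletion to the user; ensure_output_absent is a hypothetical helper name.

```shell
# Return non-zero if the given HDFS path already exists, so the job is
# not submitted against an existing output directory.
# ensure_output_absent is a hypothetical helper name.
ensure_output_absent() {
  out_dir="$1"
  if hadoop fs -test -e "$out_dir"; then
    echo "$out_dir already exists; remove it with: hadoop fs -rm -r $out_dir" >&2
    return 1
  fi
  return 0
}
# Usage:
#   ensure_output_absent /output &&
#     hadoop jar /usr/local/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
```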

6. Job execution output

15/04/14 15:55:03 INFO client.RMProxy: Connecting to ResourceManager at hdnn140/192.168.152.140:8032

15/04/14 15:55:04 INFO input.FileInputFormat: Total input paths to process : 2

15/04/14 15:55:04 INFO mapreduce.JobSubmitter: number of splits:2

15/04/14 15:55:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1428996061278_0002

15/04/14 15:55:05 INFO impl.YarnClientImpl: Submitted application application_1428996061278_0002

15/04/14 15:55:05 INFO mapreduce.Job: The url to track the job: http://hdnn140:8088/proxy/application_1428996061278_0002/

15/04/14 15:55:05 INFO mapreduce.Job: Running job: job_1428996061278_0002

15/04/14 15:55:17 INFO mapreduce.Job: Job job_1428996061278_0002 running in uber mode : false

15/04/14 15:55:17 INFO mapreduce.Job: map 0% reduce 0%

15/04/14 15:56:00 INFO mapreduce.Job: map 100% reduce 0%

15/04/14 15:56:10 INFO mapreduce.Job: map 100% reduce 100%

15/04/14 15:56:11 INFO mapreduce.Job: Job job_1428996061278_0002 completed successfully

15/04/14 15:56:11 INFO mapreduce.Job: Counters: 49

File System Counters

FILE: Number of bytes read=55

FILE: Number of bytes written=316738

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=235

HDFS: Number of bytes written=25

HDFS: Number of read operations=9

HDFS: Number of large read operations=0

HDFS: Number of write operations=2

Job Counters

Launched map tasks=2

Launched reduce tasks=1

Data-local map tasks=2

Total time spent by all maps in occupied slots (ms)=83088

Total time spent by all reduces in occupied slots (ms)=7098

Total time spent by all map tasks (ms)=83088

Total time spent by all reduce tasks (ms)=7098

Total vcore-seconds taken by all map tasks=83088

Total vcore-seconds taken by all reduce tasks=7098

Total megabyte-seconds taken by all map tasks=85082112

Total megabyte-seconds taken by all reduce tasks=7268352

Map-Reduce Framework

Map input records=2

Map output records=4

Map output bytes=41

Map output materialized bytes=61

Input split bytes=210

Combine input records=4

Combine output records=4

Reduce input groups=3

Reduce shuffle bytes=61

Reduce input records=4

Reduce output records=3

Spilled Records=8

Shuffled Maps =2

Failed Shuffles=0

Merged Map outputs=2

GC time elapsed (ms)=1649

CPU time spent (ms)=4260

Physical memory (bytes) snapshot=280866816

Virtual memory (bytes) snapshot=2578739200

Total committed heap usage (bytes)=244625408

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=25

File Output Format Counters

Bytes Written=25
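Several of these counters can be checked by hand against the two input files. Bytes Read=25 is the 12 bytes of "Hello World" plus the 13 bytes of "Hello Hadoop" (each including its trailing newline), Map output records=4 is the total number of words, and Reduce output records=3 is the number of distinct words. The sketch below recomputes those three figures locally; expected_counters is a hypothetical helper name, and it assumes the files contain exactly what step 2 wrote.

```shell
# Print three figures that should line up with the job counters:
#   bytes  -> File Input Format Counters: Bytes Read
#   words  -> Map output records
#   unique -> Reduce output records
# expected_counters is a hypothetical helper name.
expected_counters() {
  bytes=$(cat "$@" | wc -c | tr -d ' ')
  words=$(cat "$@" | wc -w | tr -d ' ')
  unique=$(cat "$@" | tr -s '[:space:]' '\n' | sed '/^$/d' |
    LC_ALL=C sort -u | wc -l | tr -d ' ')
  printf 'bytes=%s words=%s unique=%s\n' "$bytes" "$words" "$unique"
}
# Usage: expected_counters /home/hadoop/input/*
```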

7. Check the output directory when the job finishes

hadoop fs -ls /output
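With the default settings, a successful job normally leaves an empty _SUCCESS marker file next to the part files, so the listing should show _SUCCESS and part-r-00000 (one part file because one reduce task ran). A script can test for the marker directly, as in this sketch; job_succeeded is a hypothetical helper name.

```shell
# Succeed only if the job's output directory contains the _SUCCESS marker.
# job_succeeded is a hypothetical helper name.
job_succeeded() {
  hadoop fs -test -e "$1/_SUCCESS"
}
# Usage: job_succeeded /output && echo "wordcount finished cleanly"
```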

8. View the results

hadoop fs -cat /output/part-r-00000
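For the two sample files, the part file should contain three tab-separated lines: Hadoop 1, Hello 2, and World 1, which agrees with Reduce output records=3 and Bytes Written=25 in the counters. The output is ordered by word; to see the most frequent words first, it can be piped through a sort such as the sketch below. sort_by_count is a hypothetical helper name.

```shell
# Sort wordcount output (word<TAB>count per line) by count, highest first,
# breaking ties by word. sort_by_count is a hypothetical helper name.
sort_by_count() {
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}
# Usage: hadoop fs -cat /output/part-r-00000 | sort_by_count
```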

9. Done

This article is reposted from yntmdr's 51CTO blog. Original link: http://blog.51cto.com/yntmdr/1632323. To repost it, please contact the original author.
