1. Create test.log
[root@sht-sgmhadoopnn-01 mapreduce]# more /tmp/test.log
1
2
3
a
b
v
a a a
abc
我是谁
%……
%
2. Create an HDFS directory and upload the file
[root@sht-sgmhadoopnn-01 ~]# hadoop fs -mkdir /testdir
16/02/28 19:40:12 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[root@sht-sgmhadoopnn-01 ~]# hadoop fs -put /tmp/test.log /testdir/
16/02/28 19:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
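(The recurring NativeCodeLoader warning is harmless: it only means the native Hadoop libraries were not found for this platform, so the built-in Java implementations are used instead.)

The two shell commands above can also be done programmatically through the HDFS Java API. A minimal sketch, assuming the cluster's configuration files (core-site.xml, hdfs-site.xml) are on the classpath; the class name UploadTestLog is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class UploadTestLog {
        public static void main(String[] args) throws Exception {
            // Picks up fs.defaultFS from the Hadoop config files on the classpath
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            fs.mkdirs(new Path("/testdir"));                  // hadoop fs -mkdir /testdir
            fs.copyFromLocalFile(new Path("/tmp/test.log"),   // hadoop fs -put /tmp/test.log /testdir/
                                 new Path("/testdir/test.log"));
            fs.close();
        }
    }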
3. List the example programs bundled with Hadoop; we will use wordcount
[root@sht-sgmhadoopnn-01 ~]# cd /hadoop/hadoop-2.7.2/share/hadoop/mapreduce
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar
An example program must be given as the first argument.
Valid program names are:
  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
  dbcount: An example job that count the pageview counts from a database.
  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort
  terasort: Run the terasort
  teravalidate: Checking results of terasort
  wordcount: A map/reduce program that counts the words in the input files.
  wordmean: A map/reduce program that counts the average length of the words in the input files.
  wordmedian: A map/reduce program that counts the median length of the words in the input files.
  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
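For reference, the wordcount program we are about to run boils down to the classic Mapper/Reducer pair below. This is a sketch of the same logic as the shipped org.apache.hadoop.examples.WordCount: the mapper emits (token, 1) for every whitespace-separated token, and the reducer, also registered as the combiner, sums the counts per token.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {

        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Split each input line on whitespace and emit (token, 1)
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private final IntWritable result = new IntWritable();

            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Sum all the 1s emitted for this token
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class); // combiner = reducer; see the counters below
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /testdir
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /out1
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }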
4. Run wordcount
# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
#            (bundled examples jar)              (program)  (input dir) (output dir, must not exist yet)
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.2.jar wordcount /testdir /out1
16/02/28 19:40:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/28 19:40:53 INFO input.FileInputFormat: Total input paths to process : 1
16/02/28 19:40:53 INFO mapreduce.JobSubmitter: number of splits:1
16/02/28 19:40:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1456590271264_0002
16/02/28 19:40:54 INFO impl.YarnClientImpl: Submitted application application_1456590271264_0002
16/02/28 19:40:54 INFO mapreduce.Job: The url to track the job: http://sht-sgmhadoopnn-01:8088/proxy/application_1456590271264_0002/
16/02/28 19:40:54 INFO mapreduce.Job: Running job: job_1456590271264_0002
16/02/28 19:41:04 INFO mapreduce.Job: Job job_1456590271264_0002 running in uber mode : false
16/02/28 19:41:04 INFO mapreduce.Job: map 0% reduce 0%
16/02/28 19:41:12 INFO mapreduce.Job: map 100% reduce 0%
16/02/28 19:41:21 INFO mapreduce.Job: map 100% reduce 100%
16/02/28 19:41:22 INFO mapreduce.Job: Job job_1456590271264_0002 completed successfully
16/02/28 19:41:22 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=102
        FILE: Number of bytes written=244621
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=142
        HDFS: Number of bytes written=56
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5537
        Total time spent by all reduces in occupied slots (ms)=6555
        Total time spent by all map tasks (ms)=5537
        Total time spent by all reduce tasks (ms)=6555
        Total vcore-milliseconds taken by all map tasks=5537
        Total vcore-milliseconds taken by all reduce tasks=6555
        Total megabyte-milliseconds taken by all map tasks=5669888
        Total megabyte-milliseconds taken by all reduce tasks=6712320
    Map-Reduce Framework
        Map input records=12
        Map output records=14
        Map output bytes=100
        Map output materialized bytes=102
        Input split bytes=98
        Combine input records=14
        Combine output records=10
        Reduce input groups=10
        Reduce shuffle bytes=102
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=79
        CPU time spent (ms)=2560
        Physical memory (bytes) snapshot=445992960
        Virtual memory (bytes) snapshot=1775263744
        Total committed heap usage (bytes)=306184192
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=44
    File Output Format Counters
        Bytes Written=56
[root@sht-sgmhadoopnn-01 mapreduce]#
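The Map-Reduce Framework counters trace the data flow: the mapper emitted 14 records, one per whitespace-separated token; because wordcount registers its reducer as a combiner, those 14 records were collapsed to 10 (one per distinct word) on the map side before the shuffle, which is why Combine output records, Reduce input records, and Reduce output records are all 10, matching the 10 lines of the final result below.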
5. Verify the wordcount result (word frequencies)
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -ls /out1
16/02/28 19:43:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 3 root supergroup 0 2016-02-28 19:41 /out1/_SUCCESS
-rw-r--r-- 3 root supergroup 56 2016-02-28 19:41 /out1/part-r-00000
[root@sht-sgmhadoopnn-01 mapreduce]# hadoop fs -text /out1/part-r-00000
16/02/28 19:43:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
% 1
%…… 1
1 1
2 1
3 1
a 5
abc 1
b 1
v 1
我是谁 1
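The keys appear in sorted order because MapReduce sorts map output by key during the shuffle; Text keys compare by raw UTF-8 bytes, so the symbols and digits precede the ASCII letters, which in turn precede the multi-byte characters of 我是谁.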