
Hadoop Benchmark Testing

Hadoop ships with several built-in benchmarks; this article uses hadoop-2.6.0. All commands below are run from the top of the hadoop-2.6.0 installation directory, which is why the jar paths are relative.

I. The Hadoop Test benchmarks

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar

An example program must be given as the first argument.

Valid program names are:

  DFSCIOTest: Distributed i/o benchmark of libhdfs.

  DistributedFSCheck: Distributed checkup of the file system consistency.

  JHLogAnalyzer: Job History Log analyzer.

  MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures

  SliveTest: HDFS Stress Test and Live Data Verification.

  TestDFSIO: Distributed i/o benchmark.

  fail: a job that always fails

  filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)

  largesorter: Large-Sort tester

  loadgen: Generic map/reduce load generator

  mapredtest: A map/reduce test check.

  minicluster: Single process HDFS and MR cluster.

  mrbench: A map/reduce benchmark that can create many small jobs

  nnbench: A benchmark that stresses the namenode.

  sleep: A job that sleeps at each map and reduce task.

  testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce

  testfilesystem: A test for FileSystem read/write.

  testmapredsort: A map/reduce program that validates the map-reduce framework's sort.

  testsequencefile: A test for flat files of binary key value pairs.

  testsequencefileinputformat: A test for sequence file input format.

  testtextinputformat: A test for text input format.

  threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill

These programs test Hadoop from a number of angles; among them, TestDFSIO, mrbench, and nnbench are three of the most widely used.

1. TestDFSIO

① TestDFSIO write

Tests HDFS write speed.

The usage of TestDFSIO is as follows:

Usage: TestDFSIO [genericOptions] -read [-random | -backward | -skip [-skipSize Size]] | -write | -append | -clean [-compression codecClassName] [-nrFiles N] [-size Size[B|KB|MB|GB|TB]] [-resFile resultFileName] [-bufferSize Bytes] [-rootDir]

This writes data into HDFS: 10 files of 10 MB each, stored under /benchmarks/TestDFSIO/io_data.

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -write -nrFiles 10 -size 10MB

(Figure: MapReduce console output of the TestDFSIO write job)
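To confirm that the files landed where expected, you can list the output directory (the default base path mentioned above):

# hadoop fs -ls /benchmarks/TestDFSIO/io_data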

Check the write results:

# cat TestDFSIO_results.log

----- TestDFSIO ----- : write

           Date & time: Fri Sep 23 19:21:01 CST 2016

       Number of files: 10

Total MBytes processed: 100.0

     Throughput mb/sec: 1.7217037980785785

Average IO rate mb/sec: 1.9971516132354736

 IO rate std deviation: 0.9978736646901237

    Test exec time sec: 81.711
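A note on reading these figures: TestDFSIO runs one map task per file, and each map times its own file separately. "Throughput mb/sec" is an aggregate value, the total MB processed divided by the sum of the per-file I/O times, while "Average IO rate mb/sec" is the arithmetic mean of the individual per-file rates (the std deviation is taken over those same per-file rates). Roughly:

Throughput      = Σ size_i / Σ time_i

Average IO rate = (1/N) × Σ (size_i / time_i)

On slow or unevenly loaded disks the two can differ noticeably, as the 1.72 vs 2.00 here shows.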

② TestDFSIO read

Tests HDFS read speed.

Read 10 files of 10 MB each back from HDFS (the write test must be run first, since the read test operates on the files it created):

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -read -nrFiles 10 -size 10

# cat TestDFSIO_results.log

----- TestDFSIO ----- : write

           Date & time: Fri Sep 23 19:21:01 CST 2016

       Number of files: 10

Total MBytes processed: 100.0

     Throughput mb/sec: 1.7217037980785785

Average IO rate mb/sec: 1.9971516132354736

 IO rate std deviation: 0.9978736646901237

    Test exec time sec: 81.711

----- TestDFSIO ----- : read

           Date & time: Fri Sep 23 19:37:21 CST 2016

       Number of files: 10

Total MBytes processed: 100.0

     Throughput mb/sec: 14.85001485001485

Average IO rate mb/sec: 16.221948623657227

 IO rate std deviation: 4.983088493832205

    Test exec time sec: 50.188

③ Cleaning up the test data

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar TestDFSIO -clean

(Figure: console output of the TestDFSIO -clean run)
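A quick way to confirm the cleanup: listing the test directory again should now fail, since -clean removes the whole /benchmarks/TestDFSIO tree:

# hadoop fs -ls /benchmarks/TestDFSIO

This should report that the path does not exist.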

2. nnbench [NameNode benchmark]

nnbench is used to stress-test the NameNode: it generates a large number of HDFS-related requests, putting the NameNode under considerable load.

The test exercises create, read, rename, and delete operations on HDFS files.

nnbench usage:

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench

NameNode Benchmark 0.4

Usage: nnbench <options>

Options:

        -operation <Available operations are create_write open_read rename delete. This option is mandatory>

         * NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.

        -maps <number of maps. default is 1. This is not mandatory>

        -reduces <number of reduces. default is 1. This is not mandatory>

        -startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory 

        -blockSize <Block size in bytes. default is 1. This is not mandatory>

        -bytesToWrite <Bytes to write. default is 0. This is not mandatory>

        -bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>

        -numberOfFiles <number of files to create. default is 1. This is not mandatory>

        -replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>

        -baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>

        -readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>

        -help: Display the help statement

The following example uses 10 mappers and 5 reducers to create 1000 files, each with a replication factor of 3:

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation create_write -maps 10 -reduces 5 -numberOfFiles 1000 -replicationFactorPerFile 3 -readFileAfterOpen true
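Once create_write has populated the files, the other operations can be run against them. A plausible follow-up (same file count, same jar) is an open_read pass, after which nnbench appends a summary to NNBench_results.log in the current directory (assuming the default result file name):

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar nnbench -operation open_read -maps 10 -reduces 5 -numberOfFiles 1000 -readFileAfterOpen true

# cat NNBench_results.log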

3. mrbench [MapReduce benchmark]

mrbench runs a small job many times in a row; it checks whether small jobs on the cluster run both repeatably and efficiently.

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench --help

MRBenchmark.0.0.2

Usage: mrbench [-baseDir <base DFS path for output/input, default is /benchmarks/MRBench>] [-jar <local path to job jar file containing Mapper and Reducer implementations, default is current jar file>] [-numRuns <number of times to run the job, default is 1>] [-maps <number of maps for each run, default is 2>] [-reduces <number of reduces for each run, default is 1>] [-inputLines <number of input lines to generate, default is 1>] [-inputType <type of input to generate, one of ascending (default), descending, random>] [-verbose]

The following example runs a small job 50 times:

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar mrbench -numRuns 50

Each run's completion time is recorded.
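When all 50 runs finish, mrbench prints a one-line summary of the average job runtime; the timing value below is a placeholder, not a measured result:

DataLines	Maps	Reduces	AvgTime (milliseconds)

1	2	1	<average over the 50 runs>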

II. The Hadoop Examples tests

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar

An example program must be given as the first argument.

Valid program names are:

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

  bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

  dbcount: An example job that count the pageview counts from a database.

  distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

  grep: A map/reduce program that counts the matches of a regex in the input.

  join: A job that effects a join over sorted, equally partitioned datasets

  multifilewc: A job that counts words from several files.

  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

  pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

  randomwriter: A map/reduce program that writes 10GB of random data per node.

  secondarysort: An example defining a secondary sort to the reduce.

  sort: A map/reduce program that sorts the data written by the random writer.

  sudoku: A sudoku solver.

  teragen: Generate data for the terasort

  terasort: Run the terasort

  teravalidate: Checking results of terasort

  wordcount: A map/reduce program that counts the words in the input files.

  wordmean: A map/reduce program that counts the average length of the words in the input files.

  wordmedian: A map/reduce program that counts the median length of the words in the input files.

  wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.

The most commonly used of these is wordcount.
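As a quick end-to-end check, a typical wordcount run looks like the following. The input and output paths (/tmp/wc/in and /tmp/wc/out) are arbitrary examples, and the Hadoop configuration XML files are used only as convenient sample input; note that wordcount requires the output directory to not exist yet:

# hadoop fs -mkdir -p /tmp/wc/in

# hadoop fs -put etc/hadoop/*.xml /tmp/wc/in

# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /tmp/wc/in /tmp/wc/out

# hadoop fs -cat /tmp/wc/out/part-r-00000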
