Running Hadoop Benchmarks

Reposted from: http://blog.csdn.net/azhao_dn/article/details/6930909

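We need to buy new servers for our Hadoop cluster and have to measure how candidate servers perform in a Hadoop environment, so I put together this overview of the test programs that ship with Hadoop: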

```
bin/hadoop jar hadoop-*test*.jar
```

Running the command above without a program name lists the test programs bundled in hadoop-*test*.jar:

```
An example program must be given as the first argument.
Valid program names are:
DFSCIOTest: Distributed i/o benchmark of libhdfs.
DistributedFSCheck: Distributed checkup of the file system consistency.
MRReliabilityTest: A program that tests the reliability of the MR framework by injecting faults/failures
TestDFSIO: Distributed i/o benchmark.
dfsthroughput: measure hdfs throughput
filebench: Benchmark SequenceFile(Input|Output)Format (block,record compressed and uncompressed), Text(Input|Output)Format (compressed and uncompressed)
loadgen: Generic map/reduce load generator
mapredtest: A map/reduce test check.
mrbench: A map/reduce benchmark that can create many small jobs
nnbench: A benchmark that stresses the namenode.
testarrayfile: A test for flat files of binary key/value pairs.
testbigmapoutput: A map/reduce program that works on a very big non-splittable file and does identity map/reduce
testfilesystem: A test for FileSystem read/write.
testipc: A test for ipc.
testmapredsort: A map/reduce program that validates the map-reduce framework's sort.
testrpc: A test for rpc.
testsequencefile: A test for flat files of binary key value pairs.
testsequencefileinputformat: A test for sequence file input format.
testsetfile: A test for flat files of binary key/value pairs.
testtextinputformat: A test for text input format.
threadedmapbench: A map/reduce benchmark that compares the performance of maps with multiple spills over maps with 1 spill
```
The most commonly used of these is TestDFSIO. Its usage is:

```
$ bin/hadoop jar hadoop-*test*.jar TestDFSIO
TestDFSIO.0.0.4
Usage: TestDFSIO -read | -write | -clean [-nrFiles N] [-fileSize MB] [-resFile resultFileName] [-bufferSize Bytes]
```

A typical run writes the test files, reads them back, and then cleans up:

```
hadoop jar hadoop-*test*.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*test*.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
hadoop jar hadoop-*test*.jar TestDFSIO -clean
```
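
The write/read/clean cycle can be wrapped so that the file count and file size are set in one place. The following is a minimal sketch, not a definitive script: `HADOOP_CMD`, `TEST_JAR`, `DRY_RUN`, and the function names are assumptions made for illustration and are not part of Hadoop. By default it only prints the commands it would run.

```shell
# Sketch only: HADOOP_CMD, TEST_JAR, DRY_RUN and the function names are
# illustrative assumptions, not part of Hadoop.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"
TEST_JAR="${TEST_JAR:-hadoop-*test*.jar}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Data volume in MB written by one -write pass (before HDFS replication).
dfsio_total_mb() {
  echo $(( $1 * $2 ))
}

# Print the command in dry-run mode, otherwise execute it.
run_cmd() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Run one TestDFSIO phase: write, read, or clean.
# $TEST_JAR is left unquoted on purpose so the shell can expand the glob.
dfsio_phase() {
  phase="$1"
  if [ "$phase" = "clean" ]; then
    run_cmd "$HADOOP_CMD" jar $TEST_JAR TestDFSIO -clean
  else
    run_cmd "$HADOOP_CMD" jar $TEST_JAR TestDFSIO "-$phase" -nrFiles "$2" -fileSize "$3"
  fi
}

echo "Data per write pass: $(dfsio_total_mb 10 1000) MB"
dfsio_phase write 10 1000
dfsio_phase read 10 1000
dfsio_phase clean
```

Setting `DRY_RUN=0` executes the commands instead of printing them. Keeping the same `-nrFiles` and `-fileSize` for the write and read phases matters because the read phase reads back the files the write phase created.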
```
bin/hadoop jar hadoop-*examples*.jar
```

Running the command above lists the example programs bundled in hadoop-*examples*.jar:

```
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
dbcount: An example job that count the pageview counts from a database.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using monte-carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sleep: A job that sleeps at each map and reduce task.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
```
The most commonly used of these are teragen, terasort, and teravalidate. A complete TeraSort test consists of three steps: 1) teragen generates the data; 2) terasort sorts it; 3) teravalidate verifies the sorted output. The commands and their arguments are:

```
hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>
hadoop jar hadoop-*examples*.jar teravalidate <terasort output dir (= input data)> <teravalidate output dir>
```

During validation, teravalidate writes out any keys that are out of order; empty output means the data is sorted correctly.
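
Because teragen takes a row count rather than a byte count, the target data size has to be divided by 100 first. The sketch below shows that arithmetic and prints the three-step command sequence; the function names and the `/benchmarks/terasort` directory are hypothetical, chosen only for illustration.

```shell
# Sketch only: function names and directory layout are illustrative.

# teragen takes a row count, and each row is 100 bytes.
rows_for_bytes() {
  echo $(( $1 / 100 ))
}

# Print the three-step TeraSort sequence for a data size (in bytes)
# and an HDFS base directory.
terasort_plan() {
  bytes="$1"; base_dir="$2"
  rows=$(rows_for_bytes "$bytes")
  echo "hadoop jar hadoop-*examples*.jar teragen $rows $base_dir/input"
  echo "hadoop jar hadoop-*examples*.jar terasort $base_dir/input $base_dir/output"
  echo "hadoop jar hadoop-*examples*.jar teravalidate $base_dir/output $base_dir/validate"
}

# Example: 10^9 bytes (about 1 GB) of input data.
terasort_plan 1000000000 /benchmarks/terasort
```

The output directory of each step is the input directory of the next, which is why the plan derives all three paths from one base directory.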
NameNode benchmark: nnbench

```
$ bin/hadoop jar hadoop-*test*.jar nnbench
NameNode Benchmark 0.4
Usage: nnbench <options>
Options:
-operation <Available operations are create_write open_read rename delete. This option is mandatory>
* NOTE: The open_read, rename and delete operations assume that the files they operate on, are already available. The create_write operation must be run before running the other operations.
-maps <number of maps. default is 1. This is not mandatory>
-reduces <number of reduces. default is 1. This is not mandatory>
-startTime <time to start, given in seconds from the epoch. Make sure this is far enough into the future, so all maps (operations) will start at the same time>. default is launch time + 2 mins. This is not mandatory
-blockSize <Block size in bytes. default is 1. This is not mandatory>
-bytesToWrite <Bytes to write. default is 0. This is not mandatory>
-bytesPerChecksum <Bytes per checksum for the files. default is 1. This is not mandatory>
-numberOfFiles <number of files to create. default is 1. This is not mandatory>
-replicationFactorPerFile <Replication factor for the files. default is 1. This is not mandatory>
-baseDir <base DFS path. default is /becnhmarks/NNBench. This is not mandatory>
-readFileAfterOpen <true or false. if true, it reads the file and reports the average time to read. This is valid with the open_read operation. default is false. This is not mandatory>
-help: Display the help statement
```

Example run:

```
$ hadoop jar hadoop-*test*.jar nnbench -operation create_write \
    -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
    -replicationFactorPerFile 3 -readFileAfterOpen true \
    -baseDir /benchmarks/NNBench-`hostname -s`
```
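
The help text says that `-startTime` is given in seconds from the epoch and should lie far enough in the future for all maps to start together. A minimal sketch of computing such a value and of running create_write before open_read follows; the function names, the 300-second delay, and the base directory are assumptions for illustration only.

```shell
# Sketch only: function names, the delay, and the base directory are illustrative.

# Epoch seconds for a start time "delay" seconds from now (default 120,
# matching the documented default of launch time + 2 mins).
nnbench_start_time() {
  delay="${1:-120}"
  echo $(( $(date +%s) + delay ))
}

# Print an nnbench command line for one operation.
nnbench_cmd() {
  operation="$1"; start_time="$2"; base_dir="$3"
  echo "hadoop jar hadoop-*test*.jar nnbench -operation $operation" \
       "-startTime $start_time -maps 12 -reduces 6 -numberOfFiles 1000" \
       "-baseDir $base_dir"
}

# create_write has to come first; open_read reuses the files it created.
for op in create_write open_read; do
  nnbench_cmd "$op" "$(nnbench_start_time 300)" /benchmarks/NNBench
done
```

Both operations use the same `-baseDir`, since open_read expects the files from create_write to be present there.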
MapReduce benchmark: mrbench

mrbench is the map/reduce benchmark that can create many small jobs. Its options can be listed with:

```
bin/hadoop jar hadoop-*test*.jar mrbench --help
```
Gridmix: Gridmix packages Hadoop's bundled benchmarks one step further and runs all of them in a single pass.

1) Build:

```
cd src/benchmarks/gridmix2
ant
```

2) Edit the configuration file gridmix-env-2 (vi gridmix-env-2):

```
export HADOOP_INSTALL_HOME=/home/test/hadoop
export HADOOP_VERSION=hadoop-0.20.203.0
export HADOOP_HOME=${HADOOP_INSTALL_HOME}/${HADOOP_VERSION}
export HADOOP_CONF_DIR=$HADOOP_HOME/conf
export USE_REAL_DATASET=
export APP_JAR=${HADOOP_HOME}/hadoop-core-0.20.203.0.jar
export EXAMPLE_JAR=${HADOOP_HOME}/hadoop-examples-0.20.203.0.jar
export STREAMING_JAR=${HADOOP_HOME}/contrib/streaming/hadoop-streaming-0.20.203.0.jar
```

3) Generate the test data:

```
sh generateGridmix2data.sh
```

4) Run the tests:

```
$ chmod +x rungridmix_2
$ ./rungridmix_2
```
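
The configuration in step 2 hard-codes three jar paths that include the Hadoop version, so a mismatched version string leaves them pointing at files that do not exist. A small pre-flight check can catch this before the run. This is only a sketch: `check_gridmix_jars` is a hypothetical helper, not part of Gridmix, and it assumes gridmix-env-2 has already been sourced into the current shell.

```shell
# Sketch only: check_gridmix_jars is a hypothetical helper, not part of Gridmix.
# It assumes APP_JAR, EXAMPLE_JAR and STREAMING_JAR were set by sourcing
# gridmix-env-2.

# Print each jar variable whose path is empty or not a regular file.
# The return status is the number of missing jars (0 means all present).
check_gridmix_jars() {
  missing=0
  for var in APP_JAR EXAMPLE_JAR STREAMING_JAR; do
    # Look up the value of the variable whose name is stored in $var.
    eval "path=\${$var:-}"
    if [ -z "$path" ] || [ ! -f "$path" ]; then
      echo "missing: $var=$path"
      missing=$(( missing + 1 ))
    fi
  done
  return "$missing"
}

# Possible usage, from the gridmix2 directory:
#   . ./gridmix-env-2 && check_gridmix_jars && ./rungridmix_2
```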
References:

1. Benchmarking and Stress Testing an Hadoop Cluster with TeraSort, TestDFSIO & Co.: http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
2. Hadoop's Gridmix2 Benchmark: http://adaishu.blog.163.com/blog/static/17583128620114218589154/
3. Hadoop Gridmix Benchmark: http://dongxicheng.org/mapreduce/hadoop-gridmix-benchmark/
4. Benchmarking a Hadoop Cluster: http://blog.csdn.net/dahaifeiyu/article/details/6220174
