Hadoop is hugely popular right now; its back-to-basics key-value philosophy has pushed people to step back and rethink some problems from a fresh angle.
But with fewer and fewer people writing C/C++ these days, the tutorials online and on the official wiki rarely work out of the box; you will run into one problem or another along the way.
So here I have reorganized the details of my setup process to share with fellow enthusiasts. Comments and corrections are welcome.
1. Prerequisites:
VirtualBox 4.3 on Win7 x64
CentOS 6.4 x86_64 (from a domestic mirror site)
Hadoop-1.2.1.tar.gz
openssl, zlib, and glib installed (covered in my earlier Cassandra article)
2. Cluster setup (abbreviated here; plenty of references online)
2.1 Mutual ssh-key trust
Master & slave: ssh-keygen -t rsa (press Enter through every prompt)
Master & slave: chmod 755 .ssh
Master: cd .ssh
Master: cp id_rsa.pub authorized_keys
Master: chmod 644 authorized_keys
Master: scp authorized_keys 192.168.137.102:/root/.ssh
Slave: scp id_rsa.pub 192.168.137.101:/root/.ssh/192.168.137.102.id_rsa.pub
Master:
cat 192.168.137.102.id_rsa.pub >> authorized_keys
Master & slave:
vim /etc/ssh/sshd_config
Set RSAAuthentication yes
    PubkeyAuthentication yes
service sshd restart
2.2 Add at the top of hadoop-env.sh:
export JAVA_HOME=/opt/java1.6
export HADOOP_HOME=/opt/hadoop
export HADOOP_CONF_DIR=/opt/hadoop/conf
2.3 The three main XML configs (details omitted; widely documented online, or check the *-default.xml files shipped with older releases)
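For completeness, a minimal sketch of the three site files for Hadoop 1.x follows. The property names are the standard 1.x ones; the host, ports, and replication factor are assumptions matching the two-node layout above, so adjust them to your own cluster.

```xml
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.137.101:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.137.101:9001</value>
  </property>
</configuration>
```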
2.4 masters file
192.168.137.101
2.5 slaves file
192.168.137.102
2.6 Sync to the slave
scp -r hadoop 192.168.137.102:/opt
2.7 Format the NameNode
hadoop namenode -format (type an uppercase Y at the prompt)
2.8 Start it up
start-all.sh
2.9 Initial checks
jps (the master should be running NameNode, SecondaryNameNode, and JobTracker; the slave runs TaskTracker and DataNode)
hadoop dfsadmin -report
Or open a browser at http://cent1:50070 to check the logs and browse HDFS.
3. Building the C++ Pipes libraries
cd /opt/hadoop/src/c++/pipes -> chmod 777 configure -> ./configure -> make -> make install
cd /opt/hadoop/src/c++/utils -> chmod 777 configure -> ./configure -> make -> make install
cd /opt/hadoop/src/c++/libhdfs -> chmod 777 configure -> ./configure -> make -> make install
Copy the generated static and shared libraries (3-4x the size of the bundled versions) into the following directories (for convenience later):
/opt/hadoop/c++/Linux-amd64-64/lib
/usr/lib64
/usr/lib
/usr/local/lib
plus your own development directory.
Copy the bundled headers from /opt/hadoop/c++/Linux-amd64-64/include into:
/usr/include
/usr/local/include
Restart hadoop. If you skip step 3, you will hit a server authentication-failure error as soon as the reduce phase starts.
4. Development environment
4.1 Use the NCDC weather sample data found online
[root@cent3 tt]# more sample.txt
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
4.2 Use the max_temperature sample found online
#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"
#include <algorithm>
#include <limits>
#include <stdint.h>
#include <string>
#include <stdio.h>
class MaxTemperatureMapper : public HadoopPipes::Mapper {
public:
  MaxTemperatureMapper(HadoopPipes::TaskContext& context) {}
  void map(HadoopPipes::MapContext& context) {
    std::string line = context.getInputValue();
    std::string year = line.substr(15, 4);
    std::string airTemperature = line.substr(87, 5);
    std::string q = line.substr(92, 1);
    if (airTemperature != "+9999" &&
        (q == "0" || q == "1" || q == "4" || q == "5" || q == "9"))
      context.emit(year, airTemperature);
  }
};
class MaxTemperatureReducer : public HadoopPipes::Reducer {
public:
  MaxTemperatureReducer(HadoopPipes::TaskContext& context) {}
  void reduce(HadoopPipes::ReduceContext& context) {
    // Start from the smallest int so negative temperatures are handled correctly
    int maxValue = std::numeric_limits<int>::min();
    while (context.nextValue())
      maxValue = std::max(maxValue, HadoopUtils::toInt(context.getInputValue()));
    context.emit(context.getInputKey(), HadoopUtils::toString(maxValue));
  }
};

int main() {
  return HadoopPipes::runTask(
      HadoopPipes::TemplateFactory<MaxTemperatureMapper, MaxTemperatureReducer>());
}
4.3 Set up a Makefile, or use a vim mapping
CC=g++
PLATFORM=Linux-amd64-64
HADOOP_INSTALL=/opt/hadoop
CPPFLAGS = -m64 -I/usr/local/include
max_temperature: maxtemperature.cpp
$(CC) $(CPPFLAGS) $< -Wall -L/usr/local/lib -lhadooppipes -lcrypto -lhadooputils -lpthread -g -O2 -o $@
Or, as a vim mapping (in your vimrc):
"======================
"F5 Compile c
"======================
map <F5> :call Compilepp()<CR>
func! Compilepp()
  if &filetype == 'cpp'
    exec "w"
    exec "! clear;
      \ echo Compiling: ./% ...;
      \ echo ;
      \ g++ % -g -lstdc++ -L/usr/local/lib -lhadooppipes -lcrypto -lhadooputils -lpthread -o %<.o;
      \ echo Compile Done;
      \ echo Start Testing;
      \ echo ;
      \ ./%<.o;"
  endif
endfunc
4.4 Run the experiment
hadoop dfs -rmr output
hadoop dfs -rm bin/max_temperature
hadoop dfs -put max_temperature bin/max_temperature
hadoop dfs -put sample.txt sample.txt
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input sample.txt -output output -program bin/max_temperature

That's about it. The wiki doesn't say much about this recompilation business either; I picked up some of it from other people's write-ups, and I'd like to thank one predecessor in particular.
Finally, here is a MapReduce flow diagram as I understand it, for reference:
http://www.z30.name