Deploying LZO on a Hadoop Cluster

LZO decompresses quickly, so I plan to adopt this compression format in our production environment (even though the disk-space savings are not that large).

Note: the deployment steps below still have some problems and need further debugging and verification; I will post the corrected procedure later. If you follow these steps and run into issues, please work around them yourself for now.

apache-ant-1.8.3-bin.tar.gz: the ant build tool; it must be newer than version 1.7, otherwise some build properties are not supported.

kevinweil-hadoop-lzo-6bb1b7f.tar.gz: used to build the hadoop-lzo-0.4.15.jar file.

hadoop-gpl-compression-0.1.0-rc0.tar.gz: an alternative to the above (see the note below on which one to install).

To be clear: hadoop-gpl-compression-0.1.0-rc0.tar.gz and kevinweil-hadoop-lzo-6bb1b7f.tar.gz duplicate each other, so install only one of them; I recommend kevinweil-hadoop-lzo-6bb1b7f.tar.gz.

lzo-2.06.tar.gz: source for building the LZO shared library.

1. Install ant (this is straightforward)

tar -zxf apache-ant-1.8.3-bin.tar.gz

Edit /etc/profile:

vim /etc/profile

and add the following:

export ANT_HOME="/home/hadoop/apache-ant-1.8.3"

export PATH=$PATH:$ANT_HOME/bin

Save and exit, then reload it:

source /etc/profile
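
To confirm that ant is now on the PATH (the exact output depends on your JDK):

ant -version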

 2、安裝lzo動态庫

tar -zxvf lzo-2.06.tar.gz

cd lzo-2.06

./configure --enable-shared

make

make install

The library files are installed under /usr/local/lib by default.

Copy the lzo libraries from /usr/local/lib to /usr/lib (32-bit systems) or /usr/lib64 (64-bit systems), as shown below.
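
For example, on a 64-bit system (lzo-2.06 installs its libraries under the liblzo2.* names):

cp /usr/local/lib/liblzo2.* /usr/lib64/

ldconfig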

3. Build the hadoop-lzo jar

tar -zxvf kevinweil-hadoop-lzo-6bb1b7f.tar.gz 

cd kevinweil-hadoop-lzo-6bb1b7f

ant compile-native tar

If all goes well, hadoop-lzo-0.4.15.jar is generated in the kevinweil-hadoop-lzo-6bb1b7f/build directory.

Copy this jar into the lib directory of your hadoop installation:

cp hadoop-lzo-0.4.15.jar /home/hadoop/hadoop-0.20.205.0/lib

Next, go into kevinweil-hadoop-lzo-6bb1b7f/build/native/Linux-amd64-64/lib and copy all of the files there into /home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64.

Alternatively, use this command:

tar -cBf - -C build/native/ . | tar -xBvf - -C /home/hadoop/hadoop-0.20.205.0/lib/native/

Either way, fix the ownership afterwards:

cd /home/hadoop/hadoop-0.20.205.0/lib

chown -R hadoop. native/
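
At this point the gplcompression native libraries should be in place; a quick check (the exact file list can vary by build):

ls /home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64

You should see libgplcompression.so and its companion files alongside hadoop's own native libraries.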

4. Handle hadoop-gpl-compression-0.1.0-rc0.tar.gz

tar -zxf hadoop-gpl-compression-0.1.0-rc0.tar.gz

cd hadoop-gpl-compression-0.1.0

This directory contains hadoop-gpl-compression-0.1.0.jar; copy it into the lib directory of your hadoop installation.

Then copy the contents of its lib/native directory into the lib/native directory of your hadoop installation, as in the sketch below.
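
Concretely, assuming the same hadoop installation path as in the previous steps and running from inside hadoop-gpl-compression-0.1.0:

cp hadoop-gpl-compression-0.1.0.jar /home/hadoop/hadoop-0.20.205.0/lib

cp -r lib/native/* /home/hadoop/hadoop-0.20.205.0/lib/native/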

5. Edit hadoop's configuration files core-site.xml and mapred-site.xml

Add the following to core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,com.hadoop.compression.lzo.LzopCodec</value>
</property>

<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

Add the following to mapred-site.xml:

<property>
  <name>mapred.map.output.compress</name>
  <value>true</value>
</property>

<property>
  <name>mapred.child.env</name>
  <value>JAVA_LIBRARY_PATH=/home/hadoop/hadoop-0.20.205.0/lib/native/Linux-amd64-64</value>
</property>

<property>
  <name>mapred.map.output.compress.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
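
After restarting the cluster with these settings, one way to sanity-check the codec is to index an existing .lzo file with the LzoIndexer class shipped in the hadoop-lzo jar (the input path below is just a placeholder):

hadoop jar /home/hadoop/hadoop-0.20.205.0/lib/hadoop-lzo-0.4.15.jar com.hadoop.compression.lzo.LzoIndexer /path/to/some/file.lzo

If the codec and native libraries load correctly, this writes a file.lzo.index file next to the input.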

After finishing the deployment, running hive failed with the following error:

Total MapReduce jobs = 2

Launching Job 1 out of 2

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapred.reduce.tasks=<number>

java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!

        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:197)

        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)

        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)

        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)

        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)

        at javax.security.auth.Subject.doAs(Subject.java:396)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)

        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)

        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:816)

        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)

        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)

        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)

        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)

        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:48)

Caused by: java.lang.RuntimeException: Error in configuring object

        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)

        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)

        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193)

        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)

        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)

        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)

        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:890)

        ... 10 more

Caused by: java.lang.reflect.InvocationTargetException

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)

        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)

        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193)

        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)

        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)

        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)

        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)

        at javax.security.auth.Subject.doAs(Subject.java:396)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)

        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)

        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:816)

        at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:452)

        at org.apache.hadoop.hive.ql.exec.MapRedTask.execute(MapRedTask.java:136)

        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)

        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)

        at org.apache.hadoop.hive.ql.exec.TaskRunner.run(TaskRunner.java:47)

Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzopCodec not found.

        at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)

        at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:134)

        at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:38)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)

        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)

        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193)

        at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:298)

        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:981)

        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:973)

        at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:172)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:890)

        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:842)

        at javax.security.auth.Subject.doAs(Subject.java:396)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)

        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:842)

A post I found online says that the hadoop-0.20.205 release has a bug: it does not automatically load the native libraries under its lib directory, so the environment has to be adjusted as follows.

Fix: add the following line to $HADOOP_HOME/bin/hadoop:

JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64

Besides that error, hadoop-0.20.205 has a second problem: the lzo jar itself cannot be found. The classpath-loading logic changed in this release, and jars under $HADOOP_HOME/lib are no longer all added by default, so add the following line to conf/hadoop-env.sh:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HADOOP_HOME/lib/hadoop-lzo-0.4.15.jar
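
To verify the jar actually ends up on the classpath, you can print the effective classpath (if your bin/hadoop supports the classpath subcommand):

hadoop classpath | tr ':' '\n' | grep lzo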

The current failure is indeed caused by the jar not being loaded, so I am trying this fix in the test environment now and will post the final result once I have one.

After testing: this fix did resolve the missing-jar problem, but loading the native library still fails, so further verification is needed.
