
Configuring Hadoop 2.7.1 on Ubuntu

Hadoop 2.7.1 is configured much like earlier releases. First install Java and set the environment variables JAVA_HOME, CLASSPATH (Java's lib path), and PATH (Java's bin path).

Then install Hadoop and set HADOOP_HOME, CLASSPATH (Hadoop's lib path), and PATH (Hadoop's bin and sbin paths). Next, edit hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, masters, slaves, and the other configuration files. Unlike earlier releases, these files now live under hadoop/etc/hadoop rather than hadoop/conf.
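As a minimal sketch, the environment-variable part of ~/.bashrc might look like the following (the install paths here are assumptions; substitute your own):

```shell
# Hypothetical install locations -- adjust to your own machine.
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop

# Java's lib path on CLASSPATH; Java's bin plus Hadoop's bin and sbin on PATH.
export CLASSPATH=.:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After editing, run `source ~/.bashrc` so the current shell picks up the changes.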

core-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>

hdfs-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/datanode</value>
  </property>

  <!-- For Hue -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- For Hue -->
</configuration>

mapred-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>

  <!-- For Hue -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <!-- For Hue -->

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/hadoop/mapred/local</value>
  </property>
</configuration>

yarn-site.xml

<configuration>

  <!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <!--
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  -->

  <!-- For Hue -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>

  <property>
    <name>yarn.web-proxy.address</name>
    <value>localhost:8888</value>
  </property>
  <!-- For Hue -->

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>

  <!-- Set YARN resource values -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- Set YARN resource values -->

</configuration>

Once started, the cluster can be managed through localhost:50070 (HDFS) and localhost:8088 (YARN).

Note that because jobs are run on YARN, mapreduce.framework.name is set to yarn in mapred-site.xml, while the other settings for monitoring running jobs go in yarn-site.xml. One more point: before Hadoop 2.x, hadoop.tmp.dir was set in hdfs-site.xml, but from 2.x onward it must be set in core-site.xml; otherwise, MapReduce jobs run on YARN will fail to find files under the tmp directory. Without the core-site.xml setting, jobs can only be run with the classic MapReduce framework.

Likewise, the jar used to run wordcount has moved: it is now hadoop-mapreduce-examples-2.7.1.jar under hadoop/share/hadoop/mapreduce.
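For example, assuming input files have already been uploaded to /input on HDFS:

```
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
```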

To resolve file dependencies when compiling Java programs, add the following to ~/.bashrc. Because of how Hadoop 2.7.1 lays out its jars, HADOOP_CLASSPATH has to be assembled from the jars in each of the directories below:

# set hadoop environment
HADOOP_HOME=/home/cuihaolong/Application/hadoop

for f in $HADOOP_HOME/share/hadoop/common/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/common/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/hdfs/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/mapreduce/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/tools/sources/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

# export so child processes (javac, hadoop) can see it
export HADOOP_CLASSPATH
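The accumulation pattern above can be checked in isolation. The sketch below runs the same for-loop over a throwaway directory of dummy jars (the file names are made up for the demonstration) and prints the resulting classpath:

```shell
# Build a throwaway directory holding two empty dummy jar files.
tmpdir=$(mktemp -d)
touch "$tmpdir/hadoop-common-2.7.1.jar" "$tmpdir/hadoop-hdfs-2.7.1.jar"

# Same accumulation pattern as in ~/.bashrc: append each matching jar.
DEMO_CLASSPATH=""
for f in "$tmpdir"/hadoop-*.jar; do
  DEMO_CLASSPATH=${DEMO_CLASSPATH}:$f
done

echo "$DEMO_CLASSPATH"
rm -rf "$tmpdir"
```

Note the result starts with a leading `:`, which Java treats as the current directory on the classpath; it is harmless here but worth knowing about.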

Note: if class A in a Java program is not declared inside a package, it can be invoked directly when running A.jar with Hadoop, e.g. hadoop jar A.jar A /input /output
