Configuring Hadoop 2.7.1 is much like earlier releases. First install Java and set the environment variables JAVA_HOME, CLASSPATH (Java's lib path), and PATH (Java's bin path).
Then install Hadoop and set HADOOP_HOME, CLASSPATH (Hadoop's lib path), and PATH (Hadoop's bin and sbin paths), and edit
hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, masters, slaves, and so on. Unlike earlier versions, these files now live under hadoop/etc/hadoop rather than hadoop/conf.
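As a rough sketch, the environment variables above might be set in ~/.bashrc like this (the install paths here are assumptions; adjust them to your machine):

```shell
# Assumed install locations -- adjust to your system
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/home/cuihaolong/Application/hadoop

# Java lib path and Hadoop lib path on the classpath
export CLASSPATH=.:$JAVA_HOME/lib:$HADOOP_HOME/lib

# Java bin, Hadoop bin and sbin on PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```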
core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>
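The directories referenced by these configs must exist and be writable by the user that runs Hadoop; a minimal sketch (paths taken from the configs, ownership is an assumption about your setup):

```shell
# Create the data directories referenced in core-site.xml and hdfs-site.xml
sudo mkdir -p /hadoop/tmp /hadoop/namenode /hadoop/datanode
# Hand them to the user that will run the Hadoop daemons
sudo chown -R $USER:$USER /hadoop
```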
hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
  <!-- dfs.name.dir / dfs.data.dir are deprecated in 2.x in favour of these names -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/datanode</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- For Hue -->
</configuration>
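With dfs.webhdfs.enabled set, the REST endpoint can be smoke-tested once HDFS is up; a sketch (50070 is the 2.x NameNode web port, and the request assumes a running NameNode):

```shell
# List the HDFS root directory over the WebHDFS REST API
curl -s "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"
```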
mapred-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <!-- For Hue -->
  <!-- mapred.job.tracker is a 1.x setting; it is ignored when mapreduce.framework.name is yarn -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/hadoop/mapred/local</value>
  </property>
</configuration>
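The mapreduce.jobhistory.* addresses above only take effect if the JobHistory server is actually running; it is not started by start-all and has to be launched separately:

```shell
# Start the MapReduce JobHistory server (serves ports 10020 and 19888 configured above)
$HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
```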
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!--
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  -->
  <!-- For Hue -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>localhost:8888</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <!-- Set YARN resource limits -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- Set YARN resource limits -->
</configuration>
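A sketch of the first start-up sequence (the NameNode only needs formatting once; and because yarn.web-proxy.address is set above, the web proxy server must also be started explicitly):

```shell
# One-time: format the NameNode metadata directory
hdfs namenode -format

# Bring up HDFS and YARN
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh

# Needed because yarn.web-proxy.address is configured above
$HADOOP_HOME/sbin/yarn-daemon.sh start proxyserver
```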
Once started, the cluster can be managed through localhost:50070 (HDFS) and localhost:8088 (YARN).
Note that because jobs run on YARN, mapred-site.xml sets the MapReduce framework to yarn, while the settings for monitoring running jobs go in yarn-site.xml. Another point: before Hadoop 2.x, hadoop.tmp.dir was set in hdfs-site.xml, but from 2.x onward it must be set in core-site.xml; otherwise, running a MapReduce job on YARN fails because files under the tmp directory cannot be found. Without this setting in core-site.xml, jobs can only be run with the classic MapReduce framework.
Likewise, when running wordcount, the example jar has moved: it is now hadoop-mapreduce-examples-2.7.1.jar under hadoop/share/hadoop/mapreduce.
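A sketch of the invocation with that jar (the HDFS paths /input and /output are placeholders):

```shell
# Run the bundled wordcount example against HDFS input/output paths
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    wordcount /input /output
```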
To resolve file dependencies when compiling Java programs, add the following to ~/.bashrc; because of how the 2.7.1 release lays out its jars, HADOOP_CLASSPATH has to be assembled by picking up jars from each of the following directories:
# set hadoop environment
HADOOP_HOME=/home/cuihaolong/Application/hadoop
for f in $HADOOP_HOME/share/hadoop/common/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/common/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/hdfs/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/mapreduce/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/tools/sources/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
export HADOOP_CLASSPATH
Note: if a class A in a Java program is not declared in any package, it can be invoked directly by name when calling A.jar with Hadoop, e.g. hadoop jar A.jar A /input /output.
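A sketch of the full compile-package-run cycle under these assumptions (WordCount.java is a hypothetical driver class with no package declaration, and HADOOP_CLASSPATH is the variable assembled above):

```shell
# Compile against the Hadoop jars collected in HADOOP_CLASSPATH
javac -classpath "$HADOOP_CLASSPATH" WordCount.java

# Package the compiled classes (the wildcard also picks up inner classes)
jar cf WordCount.jar WordCount*.class

# Run it: no package declaration, so the bare class name is enough
hadoop jar WordCount.jar WordCount /input /output
```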