Configuring Hadoop 2.7.1 is similar to earlier releases. First install Java and set the environment variables JAVA_HOME, CLASSPATH (Java's lib path), and PATH (Java's bin path).
Then install Hadoop and set HADOOP_HOME, CLASSPATH (Hadoop's lib path), and PATH (Hadoop's bin and sbin paths), then edit
hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, masters, slaves, and so on. Unlike earlier releases, these files live under hadoop/etc/hadoop rather than hadoop/conf.
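A minimal ~/.bashrc sketch of the variables above; the install paths are placeholders, adjust them to your own layout:

```shell
# Assumed install locations -- adjust to your machine
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
# Java's lib on CLASSPATH, Java's bin on PATH
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
# Hadoop 2.x keeps user commands in bin/ and daemon scripts in sbin/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```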
core-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/datanode</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- For Hue -->
</configuration>
mapred-site.xml
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/hadoop/mapred/local</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!--
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  -->
  <!-- For Hue -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>localhost:8888</value>
  </property>
  <!-- For Hue -->
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>
  <!-- Set Yarn Resource Value -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- Set Yarn Resource Value -->
</configuration>
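With the four files in place, a typical first run looks like the sketch below. It assumes bin/ and sbin/ are already on PATH, and a fresh install: `namenode -format` wipes any existing HDFS metadata, so it is a one-time step only.

```shell
hdfs namenode -format   # one-time: initialize the namenode metadata directory
start-dfs.sh            # start NameNode, DataNode, SecondaryNameNode
start-yarn.sh           # start ResourceManager and NodeManager
jps                     # verify the daemons are running
```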
After starting the daemons, the cluster can be managed through localhost:50070 (HDFS) and localhost:8088 (YARN).
Note that because YARN is used to run jobs, mapreduce.framework.name is set to yarn in mapred-site.xml, while the settings for monitoring running jobs go in yarn-site.xml. One more point: before Hadoop 2.x, hadoop.tmp.dir was set in hdfs-site.xml, but from 2.x on it must also be set in core-site.xml; otherwise MapReduce jobs run on YARN fail because files under the tmp directory cannot be found. Without the core-site.xml setting, jobs can only be run with the classic MapReduce framework.
Likewise, when running wordcount, the example jar has moved to hadoop-mapreduce-examples-2.7.1.jar under hadoop/share/hadoop/mapreduce.
To resolve dependencies when compiling Java programs, add the following to ~/.bashrc. Because of how Hadoop 2.7.1 lays out its jars, HADOOP_CLASSPATH has to be assembled from the jar files in each of the directories below:
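A wordcount run against that examples jar might look like the following; the input file name and the /input, /output HDFS paths are illustrative:

```shell
hdfs dfs -mkdir -p /input
hdfs dfs -put localfile.txt /input         # upload some text to HDFS
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    wordcount /input /output               # output dir must not exist yet
hdfs dfs -cat /output/part-r-00000         # inspect the result
```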
# set hadoop environment
export HADOOP_HOME=/home/cuihaolong/Application/hadoop
for f in $HADOOP_HOME/share/hadoop/common/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/common/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/hdfs/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/mapreduce/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
for f in $HADOOP_HOME/share/hadoop/tools/sources/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done
export HADOOP_CLASSPATH
Note: if class A is not placed in a package in the Java source, it can be invoked from A.jar directly by its bare name, e.g. hadoop jar A.jar A /input /output