
Configuring Hadoop 2.7.1 on Ubuntu

Hadoop 2.7.1 is configured much like earlier releases. First install Java and set the environment variables JAVA_HOME, CLASSPATH (Java's lib path), and PATH (Java's bin path).

Then install Hadoop and set HADOOP_HOME, CLASSPATH (Hadoop's lib path), and PATH (Hadoop's bin and sbin paths), and edit the files

hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, masters, slaves, and so on. Unlike earlier versions, these files live under hadoop/etc/hadoop rather than hadoop/conf.
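As a sketch of the environment setup described above, the ~/.bashrc entries might look like the following (the install paths are illustrative and must be adjusted to where Java and Hadoop actually live on your machine):

```shell
# Illustrative ~/.bashrc entries -- adjust both paths to your installation.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/local/hadoop
# Put Java's bin and Hadoop's bin and sbin on PATH.
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After editing, run `source ~/.bashrc` so the current shell picks up the new variables.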

core-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>

hdfs-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/hadoop/namenode</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/hadoop/datanode</value>
  </property>

  <!-- For Hue -->
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- For Hue -->
</configuration>
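Since dfs.webhdfs.enabled is turned on for Hue, the WebHDFS REST endpoint can be sanity-checked once HDFS is running. A sketch, assuming the default Hadoop 2.x NameNode web port 50070:

```shell
# List the HDFS root directory via the WebHDFS REST API.
# Requires a running NameNode; 50070 is the Hadoop 2.x default web port.
curl -s "http://localhost:50070/webhdfs/v1/?op=LISTSTATUS"
```

A JSON FileStatuses response indicates that Hue will be able to reach HDFS over WebHDFS.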
           

mapred-site.xml

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>localhost:10020</value>
  </property>

  <!-- For Hue -->
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>localhost:19888</value>
  </property>
  <!-- For Hue -->

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/hadoop/mapred/local</value>
  </property>
</configuration>
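The mapreduce.jobhistory.* addresses above only take effect if the JobHistory server is actually running; it is not started by the regular HDFS/YARN start scripts. A sketch, assuming $HADOOP_HOME/sbin is on PATH:

```shell
# Start the MapReduce JobHistory server, which serves the RPC address
# localhost:10020 and the web UI localhost:19888 configured above.
mr-jobhistory-daemon.sh start historyserver
```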
           

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <!--
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  -->

  <!-- For Hue -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8032</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:8088</value>
  </property>

  <property>
    <name>yarn.web-proxy.address</name>
    <value>localhost:8888</value>
  </property>
  <!-- For Hue -->

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8031</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>localhost:8033</value>
  </property>

  <!-- Set YARN resource values -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>20480</value>
  </property>

  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>2048</value>
  </property>

  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <!-- Set YARN resource values -->

</configuration>
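Setting yarn.web-proxy.address, as done above for Hue, moves the web application proxy out of the ResourceManager into a standalone daemon, which then has to be started separately. A sketch, assuming $HADOOP_HOME/sbin is on PATH:

```shell
# Start the standalone YARN web application proxy, which will listen on
# the yarn.web-proxy.address configured above (localhost:8888).
yarn-daemon.sh start proxyserver
```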
           

Once started, the cluster can be managed through the web UIs at localhost:50070 (HDFS) and localhost:8088 (YARN).
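The startup sequence itself is not spelled out above; as a sketch, assuming the configuration files are in place and HDFS has not yet been formatted:

```shell
# One-time only: format the NameNode (this wipes existing HDFS metadata).
hdfs namenode -format
# Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode).
start-dfs.sh
# Start the YARN daemons (ResourceManager, NodeManager).
start-yarn.sh
# List the running Java daemons to verify everything came up.
jps
```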

Note that because YARN is used to run jobs, mapreduce.framework.name is set to yarn in mapred-site.xml, while the settings for monitoring running jobs go in yarn-site.xml. One more point: before Hadoop 2.x, hadoop.tmp.dir was set in hdfs-site.xml, but from 2.x onward it must be set in core-site.xml. Otherwise, MapReduce jobs run on YARN fail because files under the tmp directory cannot be found; without the core-site.xml setting, jobs can only be run with the classic MapReduce framework.

Likewise, the jar used to run wordcount has moved: it is now hadoop-mapreduce-examples-2.7.1.jar under hadoop/share/hadoop/mapreduce.
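A sketch of running the example from that new location (the input/output paths and sample file are illustrative):

```shell
# Put some sample input into HDFS (paths here are illustrative).
hdfs dfs -mkdir -p /input
hdfs dfs -put ./sample.txt /input
# Run wordcount from the examples jar shipped with Hadoop 2.7.1.
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
    wordcount /input /output
# Inspect the result.
hdfs dfs -cat /output/part-r-00000
```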

To resolve file dependencies when compiling Java programs, add the following to ~/.bashrc. Because of how Hadoop 2.7.1 lays out its jars, HADOOP_CLASSPATH has to be assembled by collecting jars from each of the directories below.

# set hadoop environment
HADOOP_HOME=/home/cuihaolong/Application/hadoop

for f in $HADOOP_HOME/share/hadoop/common/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/common/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/hdfs/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/mapreduce/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/mapreduce/lib/*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

for f in $HADOOP_HOME/share/hadoop/tools/sources/hadoop-*.jar; do
  HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
done

export HADOOP_CLASSPATH

Note: if the Java program does not place class A inside a package, then class A in A.jar can be invoked directly when running it on Hadoop, e.g. hadoop jar A.jar A /input /output
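A sketch of the full compile-package-run cycle for such an unpackaged class (the class name A and the HDFS paths are illustrative):

```shell
# Compile against the Hadoop jars collected in HADOOP_CLASSPATH above.
javac -cp "$HADOOP_CLASSPATH" A.java
# Package the resulting class files (including any inner classes) into a jar.
jar cf A.jar A*.class
# With no package declaration, the bare class name is used as the entry point.
hadoop jar A.jar A /input /output
```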
