
Multi-node Hadoop cluster installation in detail

We usually build large cluster environments with automated tools, and even small environments run to a dozen or so machines. Automated deployment is convenient, but it hides the work those machines do behind the scenes, so building a small cluster by hand is still very helpful for understanding the setup details and the underlying principles. Today, to review how the various Hadoop daemons coordinate with each other, I set up a 3-node cluster and recorded the process below.

1 Building the cluster

Basic environment configuration

IP               Host        Daemons deployed
192.168.0.110    elephant    namenode, datanode, nodemanager
192.168.0.110    tiger       datanode, nodemanager
192.168.0.110    horse       resourcemanager, datanode, nodemanager, jobhistoryserver

(Each host must of course have its own distinct IP address; adjust the addresses to match your environment.)

1.1 Install the CDH5 yum repository

Download the cdh5 repo file and move it into the yum repository directory

mv cloudera-cdh5.repo /etc/yum.repos.d/

1.2 Install the corresponding components on each node

1. Install the namenode and datanode

Install the namenode on elephant

sudo yum install --assumeyes hadoop-hdfs-namenode

Install the datanode on elephant, tiger, and horse

sudo yum install --assumeyes hadoop-hdfs-datanode

2. Install the resourcemanager and nodemanager

Install the resourcemanager on horse

sudo yum install --assumeyes hadoop-yarn-resourcemanager

Install the nodemanager on elephant, tiger, and horse

sudo yum install --assumeyes hadoop-yarn-nodemanager

3. Install the MapReduce framework

Install mapreduce on elephant, tiger, and horse

sudo yum install --assumeyes hadoop-mapreduce

4. Install the jobhistoryserver

Install the jobhistoryserver on horse

sudo yum install --assumeyes hadoop-mapreduce-historyserver

1.3 Edit the configuration files

Edit the configuration files on elephant

1 Copy the template files

sudo cp core-site.xml /etc/hadoop/conf/

sudo cp hdfs-site.xml /etc/hadoop/conf/

sudo cp yarn-site.xml /etc/hadoop/conf/

sudo cp mapred-site.xml /etc/hadoop/conf/

2 sudo vi core-site.xml

name value

fs.defaultFS hdfs://elephant:8020
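Written out as a sketch, that one property makes core-site.xml look like this (the rest of the template is left unchanged):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default filesystem: the namenode running on elephant, RPC port 8020 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://elephant:8020</value>
  </property>
</configuration>
```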

3 sudo vi hdfs-site.xml
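The guide does not list the hdfs-site.xml properties, but the directories created in section 1.4 suggest the storage layout. The following is only a sketch of what would be consistent with those paths; the property values here are my assumption, not taken from the original notes:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumed values, matching the /disk1 and /disk2 layout created in 1.4 -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///disk1/dfs/nn,file:///disk2/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///disk1/dfs/dn,file:///disk2/dfs/dn</value>
  </property>
</configuration>
```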

4 sudo vi yarn-site.xml

name                                   value
yarn.resourcemanager.hostname          horse
yarn.application.classpath             (keep the default value from the template)
yarn.nodemanager.aux-services          mapreduce_shuffle   -- use the mapreduce framework on yarn
yarn.nodemanager.log-dirs              /var/log/hadoop-yarn/containers
yarn.nodemanager.remote-app-log-dir    /var/log/hadoop-yarn/apps
yarn.log-aggregation-enable            true
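As XML, the properties above would look like the following sketch (everything else in the template, including yarn.application.classpath, stays at its default):

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>horse</value>
  </property>
  <!-- mapreduce_shuffle lets MapReduce jobs run on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/var/log/hadoop-yarn/containers</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
</configuration>
```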

5 sudo vi mapred-site.xml

mapreduce.framework.name yarn

mapreduce.jobhistory.address horse:10020

mapreduce.jobhistory.webapp.address horse:19888

yarn.app.mapreduce.am.staging-dir /user
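The same four properties as an XML sketch of mapred-site.xml:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Run MapReduce jobs on YARN rather than the classic framework -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <!-- jobhistoryserver runs on horse (see section 1.2, step 4) -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>horse:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>horse:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
  </property>
</configuration>
```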

6 Reduce the JVM heap sizes for this small test cluster (these exports typically go in /etc/hadoop/conf/hadoop-env.sh; the YARN_* ones may belong in yarn-env.sh depending on the distribution)

export HADOOP_NAMENODE_OPTS="-Xmx64m"

export HADOOP_SECONDARYNAMENODE_OPTS="-Xmx64m"

export HADOOP_DATANODE_OPTS="-Xmx64m"

export YARN_RESOURCEMANAGER_OPTS="-Xmx64m"

export YARN_NODEMANAGER_OPTS="-Xmx64m"

export HADOOP_JOB_HISTORYSERVER_OPTS="-Xmx64m"

7 Copy all the configuration files to the tiger and horse hosts
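The copy step can be sketched as a loop. This is a dry run that only builds and prints the scp commands, so you can review them before executing; passwordless ssh between the nodes is assumed, and hadoop-env.sh is included on the assumption that the heap-size exports from step 6 live there:

```shell
# Build (but do not run) one scp command per config file per target host.
cmds=""
for host in tiger horse; do
  for f in core-site.xml hdfs-site.xml yarn-site.xml mapred-site.xml hadoop-env.sh; do
    cmds="$cmds
scp /etc/hadoop/conf/$f $host:/etc/hadoop/conf/"
  done
done
# Review the generated commands, then paste/execute them.
printf '%s\n' "$cmds"
```

Writing into /etc/hadoop/conf on the targets may require copying to a staging path first and moving the files with sudo.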

1.4 Create the required directories

1 On elephant, create the directories used by the nodemanager, namenode, and datanode (tiger and horse also run a datanode and nodemanager, so they need the dn and nodemgr directories as well)

$ sudo mkdir -p /disk1/dfs/nn

$ sudo mkdir -p /disk2/dfs/nn

$ sudo mkdir -p /disk1/dfs/dn

$ sudo mkdir -p /disk2/dfs/dn

$ sudo mkdir -p /disk1/nodemgr/local

$ sudo mkdir -p /disk2/nodemgr/local

2 Set the directory ownership

$ sudo chown -R hdfs:hadoop /disk1/dfs/nn

$ sudo chown -R hdfs:hadoop /disk2/dfs/nn

$ sudo chown -R hdfs:hadoop /disk1/dfs/dn

$ sudo chown -R hdfs:hadoop /disk2/dfs/dn

$ sudo chown -R yarn:yarn /disk1/nodemgr/local

$ sudo chown -R yarn:yarn /disk2/nodemgr/local

3 Verify the directories and permissions

$ ls -lR /disk1

$ ls -lR /disk2
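Steps 1-2 above can be sketched as a loop. ROOT is a hypothetical prefix so the layout can be tried without root privileges; on the real nodes set ROOT to empty, run mkdir via sudo, and chown the dfs directories to hdfs:hadoop and the nodemgr directories to yarn:yarn as shown above:

```shell
# Create the per-disk namenode (nn), datanode (dn), and nodemanager
# directories under an optional prefix (ROOT is for local experimentation).
ROOT=${ROOT:-/tmp/hadoop-dirs}
for disk in disk1 disk2; do
  mkdir -p "$ROOT/$disk/dfs/nn" "$ROOT/$disk/dfs/dn" "$ROOT/$disk/nodemgr/local"
done
# Same verification step as above.
ls -lR "$ROOT"
```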

1.5 Format HDFS and start the HDFS daemons

1 Start the namenode and check for errors

1) On elephant, format HDFS

sudo -u hdfs hdfs namenode -format

If prompted whether to re-format, enter Y

Start the namenode

sudo service hadoop-hdfs-namenode start

2) Check the namenode log

Manually

Use the .out file path printed at startup to locate the corresponding .log file

less /var/log/hadoop-hdfs/hadoop-hdfs-namenode-elephant.log

In the web UI

Select Utilities -> Logs.

2 Start the datanodes and check for errors

1) Start on elephant, tiger, and horse

sudo service hadoop-hdfs-datanode start

2) Check the datanode log

less /var/log/hadoop-hdfs/hadoop-hdfs-datanode-tiger.log

The same method works for checking the logs on horse and the other nodes

1.6 Create directories on HDFS for yarn and mapreduce

$ sudo -u hdfs hadoop fs -mkdir /tmp

$ sudo -u hdfs hadoop fs -chmod -R 1777 /tmp

$ sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn

$ sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn

$ sudo -u hdfs hadoop fs -mkdir /user

$ sudo -u hdfs hadoop fs -mkdir /user/training

$ sudo -u hdfs hadoop fs -chown training /user/training

$ sudo -u hdfs hadoop fs -mkdir /user/history

$ sudo -u hdfs hadoop fs -chmod 1777 /user/history

$ sudo -u hdfs hadoop fs -chown mapred:hadoop /user/history

1.7 Start the yarn and mapreduce daemons

1 Start the resourcemanager on horse

sudo service hadoop-yarn-resourcemanager start

2 Start the nodemanager on all nodes

sudo service hadoop-yarn-nodemanager start

3 Start the historyserver on horse

sudo service hadoop-mapreduce-historyserver start

1.8 Test the cluster

1 Upload a test file to hdfs

$ hadoop fs -mkdir -p elephant/shakespeare

$ hadoop fs -put shakespeare.txt elephant/shakespeare

2 Check via the namenode web UI that the file was uploaded

Select Utilities -> "Browse the file system" and navigate to the directory to view the file

3 Test mapreduce

On elephant

$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount elephant/shakespeare elephant/output

Use the resourcemanager web UI to determine which hosts the applicationmaster, mapper, and reducer tasks ran on