
Five-Node Hadoop HA Installation Tutorial:

Master1: namenode, resourcemanager, nodemanager, datanode, journalnode, DFSZKFailoverController

Master2: namenode, resourcemanager, nodemanager, datanode, journalnode, DFSZKFailoverController

Slave1: nodemanager, datanode, journalnode, QuorumPeerMain

Slave2: nodemanager, datanode, journalnode, QuorumPeerMain

Slave3: nodemanager, datanode, journalnode, QuorumPeerMain

  1. Install the JDK

    Configure the environment variables (append to ~/.bashrc, then source it):

# JAVA

export JAVA_HOME=/home/zhouwang/jdk1.8.0_151

export PATH=$JAVA_HOME/bin:$PATH

export JRE_HOME=$JAVA_HOME/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
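A quick sanity check confirms the JDK really is at the path the variables point to — a minimal sketch, using the install path assumed by this guide:

```shell
# Check that the JDK lives where JAVA_HOME says it does (path assumed from
# this guide); record a message either way instead of failing silently.
JAVA_HOME=/home/zhouwang/jdk1.8.0_151
if [ -x "$JAVA_HOME/bin/java" ]; then
  msg=$("$JAVA_HOME/bin/java" -version 2>&1)
else
  msg="JDK not found at $JAVA_HOME"
fi
echo "$msg"
```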

2. Configure the /etc/hosts file on every node

192.168.71.128 master1

192.168.71.132 master2

192.168.71.129 slave1

192.168.71.130 slave2

192.168.71.131 slave3
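These mappings must be appended to /etc/hosts on all five nodes (as root). A small sketch that builds the block once and verifies the entry count before appending:

```shell
# The five host mappings from this guide as one block; here we only count the
# entries. To apply for real on a node (needs root):
#   printf '%s\n' "$hosts_block" | sudo tee -a /etc/hosts
hosts_block='192.168.71.128 master1
192.168.71.132 master2
192.168.71.129 slave1
192.168.71.130 slave2
192.168.71.131 slave3'
entries=$(printf '%s\n' "$hosts_block" | grep -c .)
echo "$entries entries"
```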

3. Configure passwordless SSH login

ssh-keygen -t rsa

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 644 ~/.ssh/authorized_keys

scp ~/.ssh/id_rsa.pub authorized_keys zhouwang@master1:~/.ssh

Repeat the scp for each of the other nodes. authorized_keys must not be writable by group or others; 644 (or the stricter 600) satisfies sshd.
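With the key pair generated, the same public key has to reach every node. A sketch of the distribution loop (user name and hostnames taken from this guide); it only prints the commands — remove the echo to run them:

```shell
# Print the key-distribution command for each node; ssh-copy-id appends the
# key to the remote ~/.ssh/authorized_keys and fixes permissions for you.
nodes="master1 master2 slave1 slave2 slave3"
count=0
for host in $nodes; do
  echo "ssh-copy-id zhouwang@$host"   # remove the echo to execute for real
  count=$((count + 1))
done
```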

4. Install ZooKeeper

In the conf directory, copy zoo_sample.cfg to zoo.cfg and set the following:

clientPort=2181

dataDir=/home/zhouwang/zookeeper/data

dataLogDir=/home/zhouwang/zookeeper/log

server.0=master1:2888:3888

server.1=master2:2888:3888

server.2=slave1:2888:3888

server.3=slave2:2888:3888

server.4=slave3:2888:3888

Under the zookeeper directory, create the corresponding data and log directories plus the myid file:

mkdir data

mkdir log

vim myid and enter this node's id; each node's myid value must match the x in its server.x line above (e.g. 0 on master1, 4 on slave3).
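The myid bookkeeping is easy to get wrong, so here is a minimal sketch that derives the id from the hostname using the server.x mapping above. The demo path and hard-coded host are assumptions; on a real node use ZK_HOME=/home/zhouwang/zookeeper and host=$(hostname):

```shell
# Create the data/log dirs and a myid file whose value matches the x in the
# server.x line for this host.
ZK_HOME=./zookeeper-demo          # stand-in for /home/zhouwang/zookeeper
mkdir -p "$ZK_HOME/data" "$ZK_HOME/log"
host=master2                      # stand-in for $(hostname)
case "$host" in
  master1) id=0 ;;
  master2) id=1 ;;
  slave1)  id=2 ;;
  slave2)  id=3 ;;
  slave3)  id=4 ;;
esac
echo "$id" > "$ZK_HOME/data/myid"
cat "$ZK_HOME/data/myid"
```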

Configure the environment variables:

# ZOOKEEPER

export ZOOKEEPER_HOME=/home/zhouwang/zookeeper

export PATH=$PATH:$ZOOKEEPER_HOME/bin

  5. Install Hadoop

    Edit the following configuration files under etc/hadoop:

(1) core-site.xml

<property> 
    <name>fs.defaultFS</name> 
    <value>hdfs://master/</value> 
</property>            
<property> 
    <name>hadoop.tmp.dir</name> 
    <value>/home/zhouwang/hadoop/tmp</value> 
</property>            
<property> 
    <name>ha.zookeeper.quorum</name> 
    <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value> 
</property>           

(2) hdfs-site.xml

<property> 
    <name>dfs.namenode.name.dir</name> 
    <value>/home/zhouwang/hadoop/dfs/name</value> 
</property> 
<property> 
    <name>dfs.datanode.data.dir</name> 
    <value>/home/zhouwang/hadoop/dfs/data</value> 
</property> 
<property> 
    <name>dfs.replication</name> 
    <value>3</value> 
</property> 

<!-- HDFS high-availability configuration --> 
<!-- Nameservice id for HDFS; must match the value used in core-site.xml --> 
<property> 
    <name>dfs.nameservices</name> 
    <value>master</value> 
</property> 
<!-- Names of the two NameNodes under the "master" nameservice --> 
<property> 
    <name>dfs.ha.namenodes.master</name> 
    <value>nn1,nn2</value> 
</property> 

<!-- RPC addresses for nn1 and nn2 --> 
<property> 
    <name>dfs.namenode.rpc-address.master.nn1</name> 
    <value>master1:9000</value> 
</property> 
<property> 
    <name>dfs.namenode.rpc-address.master.nn2</name> 
    <value>master2:9000</value> 
</property> 

<!-- HTTP addresses for nn1 and nn2 --> 
<property> 
    <name>dfs.namenode.http-address.master.nn1</name> 
    <value>master1:50070</value> 
</property> 
<property> 
    <name>dfs.namenode.http-address.master.nn2</name> 
    <value>master2:50070</value> 
</property> 

<!-- ========= NameNode edit-log sharing ========= --> 
<!-- JournalNodes keep the shared edits so data survives a NameNode failure --> 
<property> 
    <name>dfs.journalnode.http-address</name> 
    <value>0.0.0.0:8480</value> 
</property> 
<property> 
    <name>dfs.journalnode.rpc-address</name> 
    <value>0.0.0.0:8485</value> 
</property> 
<property> 
    <!-- Where the NameNode edit log is stored on the JournalNodes --> 
    <name>dfs.namenode.shared.edits.dir</name> 
    <value>qjournal://master1:8485;master2:8485;slave1:8485;slave2:8485;slave3:8485/master</value> 
</property> 

<property> 
    <!-- Local directory where each JournalNode keeps its data --> 
    <name>dfs.journalnode.edits.dir</name> 
    <value>/home/zhouwang/hadoop/dfs/journal</value> 
</property> 
<property> 
    <!-- Proxy class that lets clients fail over to the active NameNode automatically --> 
    <name>dfs.client.failover.proxy.provider.master</name> 
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value> 
</property> 

<!-- ========= NameNode fencing ========= --> 
<!-- Fencing methods; to configure several, separate them with newlines (one method per line) --> 
<property> 
    <name>dfs.ha.fencing.methods</name> 
    <value>sshfence
            shell(/bin/true)</value> 
</property> 
<!-- The sshfence method requires passwordless SSH --> 
<property> 
    <name>dfs.ha.fencing.ssh.private-key-files</name> 
    <value>/home/zhouwang/.ssh/id_rsa</value> 
</property> 
<!-- Connect timeout for the sshfence method --> 
<property> 
    <name>dfs.ha.fencing.ssh.connect-timeout</name> 
    <value>30000</value> 
</property> 

<!-- Enable automatic failover via ZooKeeper and the ZKFC process, which watches for a dead NameNode --> 
<property> 
    <name>dfs.ha.automatic-failover.enabled</name> 
    <value>true</value> 
</property> 
           

(3) mapred-site.xml

<!-- Run MapReduce on the YARN framework --> 
<property> 
    <name>mapreduce.framework.name</name> 
    <value>yarn</value> 
</property> 
<property> 
    <name>mapreduce.jobhistory.address</name> 
    <value>master1:10020</value> 
</property> 
<property> 
    <name>mapreduce.jobhistory.webapp.address</name> 
    <value>master1:19888</value> 
</property> 

(4) yarn-site.xml

<!-- Auxiliary service run on each NodeManager; must be mapreduce_shuffle for MapReduce jobs to run --> 
<property> 
    <name>yarn.nodemanager.aux-services</name> 
    <value>mapreduce_shuffle</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.connect.retry-interval.ms</name> 
    <value>2000</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.ha.enabled</name> 
    <value>true</value> 
</property> 
<!-- Cluster id for the RM pair --> 
<property> 
    <name>yarn.resourcemanager.cluster-id</name> 
    <value>cluster</value> 
</property> 
<!-- Logical ids for the two ResourceManagers --> 
<property> 
    <name>yarn.resourcemanager.ha.rm-ids</name> 
    <value>rm1,rm2</value> 
</property> 
<!-- RM host 1 --> 
<property> 
    <name>yarn.resourcemanager.hostname.rm1</name> 
    <value>master1</value> 
</property> 
<!-- RM host 2 --> 
<property> 
    <name>yarn.resourcemanager.hostname.rm2</name> 
    <value>master2</value> 
</property> 
<!-- Automatic RM failover --> 
<property> 
    <name>yarn.resourcemanager.ha.automatic-failover.enabled</name> 
    <value>true</value> 
</property> 
<!-- Automatic RM state recovery --> 
<property> 
    <name>yarn.resourcemanager.recovery.enabled</name>  
    <value>true</value>  
</property> 
<!-- How RM state is stored: in memory (MemStore) or in ZooKeeper (ZKRMStateStore) --> 
<property> 
    <name>yarn.resourcemanager.store.class</name> 
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value> 
</property> 
<!-- ZooKeeper quorum address --> 
<property> 
    <name>yarn.resourcemanager.zk-address</name> 
    <value>master1:2181,master2:2181,slave1:2181,slave2:2181,slave3:2181</value> 
</property> 
<!-- Scheduler addresses through which applications request resources from the RM --> 
<property> 
    <name>yarn.resourcemanager.scheduler.address.rm1</name> 
    <value>master1:8030</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.scheduler.address.rm2</name> 
    <value>master2:8030</value> 
</property> 
<!-- NodeManagers exchange information with the RM through this address --> 
<property> 
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name> 
    <value>master1:8031</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name> 
    <value>master2:8031</value> 
</property> 
<!-- Clients submit applications to the RM through this address --> 
<property> 
    <name>yarn.resourcemanager.address.rm1</name> 
    <value>master1:8032</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.address.rm2</name> 
    <value>master2:8032</value> 
</property> 
<!-- Administrators send management commands to the RM through this address --> 
<property> 
    <name>yarn.resourcemanager.admin.address.rm1</name> 
    <value>master1:8033</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.admin.address.rm2</name> 
    <value>master2:8033</value> 
</property> 
<!-- RM web UI address, for viewing cluster information --> 
<property> 
    <name>yarn.resourcemanager.webapp.address.rm1</name> 
    <value>master1:8088</value> 
</property> 
<property> 
    <name>yarn.resourcemanager.webapp.address.rm2</name> 
    <value>master2:8088</value> 
</property> 
           

(5) slaves

List every node that should run a DataNode/NodeManager (per the role table above), one hostname per line, instead of localhost:

master1
master2
slave1
slave2
slave3

Set the environment variables:

# HADOOP

export HADOOP_HOME=/home/zhouwang/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop

Distribute the hadoop directory to every node:

scp -r hadoop zhouwang@XXX:~/hadoop
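Rather than running scp once per node by hand, the distribution can be looped (user name and hostnames assumed from this guide). Printing only — remove the echo to copy for real:

```shell
# Print the copy command for each remaining node; master1 already holds the
# configured directory, so it is skipped.
copied=0
for host in master2 slave1 slave2 slave3; do
  echo "scp -r ~/hadoop zhouwang@$host:~/hadoop"   # remove echo to execute
  copied=$((copied + 1))
done
```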

  6. Starting the cluster for the first time

    On every node that runs ZooKeeper, start the server:

zkServer.sh start

Then check the status with zkServer.sh status; if it reports follower or leader, ZooKeeper started successfully.

Next, start the JournalNode service on the journal nodes: hadoop-daemon.sh start journalnode

On master1, format the NameNode: hdfs namenode -format

Then start the NameNode service: hadoop-daemon.sh start namenode

On master2, sync the metadata from master1: hdfs namenode -bootstrapStandby

Then start the NameNode service on master2: hadoop-daemon.sh start namenode

On master1, format ZKFC:

hdfs zkfc -formatZK

On both master1 and master2, run hadoop-daemon.sh start zkfc to start the DFSZKFailoverController service. # ZKFC monitors whether each NameNode is active or standby

On master1, run hadoop-daemons.sh start datanode to start the DataNode service on all data nodes.

On master1, run start-yarn.sh to start the YARN services.

The installation is now complete.
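Because the first start is order-sensitive, it helps to keep the whole sequence in one place. A dry-run sketch that prints the ordered steps as "where: command" lines (commands from this section; the "where" labels are shorthand, not hostnames the cluster knows):

```shell
# Ordered first-start sequence; printing only, nothing is executed.
steps='zk-nodes: zkServer.sh start
journal-nodes: hadoop-daemon.sh start journalnode
master1: hdfs namenode -format
master1: hadoop-daemon.sh start namenode
master2: hdfs namenode -bootstrapStandby
master2: hadoop-daemon.sh start namenode
master1: hdfs zkfc -formatZK
master1+master2: hadoop-daemon.sh start zkfc
master1: hadoop-daemons.sh start datanode
master1: start-yarn.sh'
printf '%s\n' "$steps"
nsteps=$(printf '%s\n' "$steps" | grep -c .)
```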

7. Stopping the cluster for the first time

First stop HDFS: stop-dfs.sh

Then stop YARN: stop-yarn.sh

For subsequent starts and stops you can simply use start-all.sh and stop-all.sh.

8. HDFS administration commands

hadoop dfsadmin -report  # show DataNode status

hdfs haadmin -getServiceState nn1  # check a NameNode's state

hdfs haadmin -transitionToActive (or -transitionToStandby) -forcemanual nn1  # force a NameNode into the active or standby state
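A small decision sketch built on these commands: on a live cluster the two states would come from hdfs haadmin -getServiceState; here they are hard-coded stand-ins so the logic runs standalone:

```shell
# Decide whether a manual failover is needed (states are stand-ins; query
# them for real with: hdfs haadmin -getServiceState nn1 / nn2).
nn1_state=standby
nn2_state=active
if [ "$nn1_state" != "active" ] && [ "$nn2_state" != "active" ]; then
  action="hdfs haadmin -transitionToActive -forcemanual nn1"
else
  action="an active NameNode already exists; no failover needed"
fi
echo "$action"
```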