1. Cluster Architecture
2. Cluster Planning
           namenode  datanode  journalnode  zkfc  zookeeper
bigdata01  yes       -         yes          yes   yes
bigdata02  yes       yes       yes          yes   yes
bigdata03  -         yes       yes          -     yes
For an HDFS HA cluster, only the HDFS-related processes need to run; the YARN processes can stay down, since the two sets of processes are independent of each other.
An HDFS HA cluster does not use the SecondaryNameNode process.
① namenode: the HDFS master node
② datanode: an HDFS worker node
③ journalnode: the JournalNode process, which synchronizes edits between the namenodes
④ zkfc (DFSZKFailoverController): monitors namenode state and performs namenode failover
⑤ zookeeper (QuorumPeerMain): stores the HA cluster's node state
Environment preparation, three nodes:
bigdata01 192.168.182.100
bigdata02 192.168.182.101
bigdata03 192.168.182.102
Set up the basics in advance: IP addresses, hostnames, firewalld, the JDK, and passwordless SSH login.
Note: during a namenode failover the two namenode nodes connect to each other over ssh, so passwordless login between them is required.
2.1 Node Planning
Use these three nodes to build a ZooKeeper cluster:
bigdata01
bigdata02
bigdata03
2.2 Configuring ZooKeeper
1. Unpack the tarball
[root@bigdata01 soft]# tar -zxvf apache-zookeeper-3.5.8-bin.tar.gz
2. Edit the configuration
[root@bigdata01 soft]# cd apache-zookeeper-3.5.8-bin/conf/
[root@bigdata01 conf]# mv zoo_sample.cfg zoo.cfg
[root@bigdata01 conf]# vi zoo.cfg
dataDir=/data/soft/apache-zookeeper-3.5.8-bin/data
server.0=bigdata01:2888:3888
server.1=bigdata02:2888:3888
server.2=bigdata03:2888:3888
Create the data directory and write the myid file.
The value in myid corresponds to the number after server. in zoo.cfg:
number 0 maps to the machine bigdata01, so we write 0 here.
[root@bigdata01 conf]# cd /data/soft/apache-zookeeper-3.5.8-bin
[root@bigdata01 apache-zookeeper-3.5.8-bin]# mkdir data
[root@bigdata01 apache-zookeeper-3.5.8-bin]# cd data
# write 0 into myid
[root@bigdata01 data]# echo 0 > myid
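The myid ↔ server.N mapping can also be derived mechanically from zoo.cfg. The sketch below is a hypothetical helper (not part of ZooKeeper) that prints the id a given host should write into its myid file, relying only on the documented `server.N=host:2888:3888` line format.

```shell
#!/bin/sh
# derive_myid.sh -- hypothetical helper: look up a host's id in zoo.cfg.
derive_myid() {
  cfg="$1"; host="$2"
  # split each server line on '.', '=' and ':' so that
  # field 2 is the id and field 3 is the hostname
  awk -F'[.=:]' -v h="$host" '/^server\./ && $3 == h { print $2 }' "$cfg"
}
```

Usage on each node: `derive_myid conf/zoo.cfg "$(hostname)" > data/myid`.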
3. 将修改好配置的zookeeper拷貝到其它兩個節點
[[email protected] soft]# scp -rq apache-zookeeper-3.5.8-bin bigdata02:/data/soft/
[[email protected] soft]# scp -rq apache-zookeeper-3.5.8-bin bigdata03:/data/soft/
4.修改bigdata02和bigdata03上zookeeper中myid檔案的内容
# bigdata02
[[email protected] ~]# cd /data/soft/apache-zookeeper-3.5.8-bin/data/
[[email protected] data]# echo 1 > myid
# bigdata03
[[email protected] ~]# cd /data/soft/apache-zookeeper-3.5.8-bin/data/
[[email protected] data]# echo 2 > myid
5. Start the zookeeper cluster
Start the ZooKeeper process on each of bigdata01, bigdata02, and bigdata03:
# bigdata01
[root@bigdata01 apache-zookeeper-3.5.8-bin]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /data/soft/apache-zookeeper-3.5.8-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# bigdata02
[root@bigdata02 apache-zookeeper-3.5.8-bin]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /data/soft/apache-zookeeper-3.5.8-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
# bigdata03
[root@bigdata03 apache-zookeeper-3.5.8-bin]# bin/zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /data/soft/apache-zookeeper-3.5.8-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
6. Verify
Run the jps command on each of bigdata01, bigdata02, and bigdata03 and check for a QuorumPeerMain process.
If it is missing, check the zookeeper*-*.out log file under that node's logs directory.
Running bin/zkServer.sh status shows one node as leader and the others as follower:
[root@bigdata01 apache-zookeeper-3.5.8-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/soft/apache-zookeeper-3.5.8-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
[root@bigdata02 apache-zookeeper-3.5.8-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/soft/apache-zookeeper-3.5.8-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: leader
[root@bigdata03 apache-zookeeper-3.5.8-bin]# bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/soft/apache-zookeeper-3.5.8-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: follower
7. Stopping the zookeeper cluster
To stop the zookeeper cluster, run bin/zkServer.sh stop on every node.
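Starting and stopping three nodes by hand gets tedious. A small wrapper like the following can run any zkServer.sh action on every node; this is only a sketch, assuming the install path above and passwordless ssh as root, and `zk_all` is a made-up helper, not a ZooKeeper command.

```shell
#!/bin/sh
# zk_all.sh -- hypothetical helper: run a zkServer.sh action on all nodes.
ZK_HOME=/data/soft/apache-zookeeper-3.5.8-bin
NODES="bigdata01 bigdata02 bigdata03"

zk_all() {
  action="$1"   # start | stop | status
  for node in $NODES; do
    # ${RUN:-ssh} lets you substitute another runner (e.g. echo) for a dry run
    ${RUN:-ssh} "$node" "$ZK_HOME/bin/zkServer.sh $action"
  done
}
```

`zk_all stop` stops the whole cluster; `zk_all status` prints each node's mode.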
2.3 Configuring the Hadoop Cluster
1. Unpack the hadoop tarball
[root@bigdata01 soft]# tar -zxvf hadoop-3.2.0.tar.gz
2. Edit the hadoop configuration files
[root@bigdata01 soft]# cd hadoop-3.2.0/etc/hadoop/
[root@bigdata01 hadoop]#
① hadoop-env.sh
Append the environment variables at the end of the file:
[root@bigdata01 hadoop]# vi hadoop-env.sh
export JAVA_HOME=/data/soft/jdk1.8
export HADOOP_LOG_DIR=/data/hadoop_repo/logs/hadoop
② core-site.xml
[root@bigdata01 hadoop]# vi core-site.xml
<configuration>
  <!-- mycluster is the cluster's logical name; it must match dfs.nameservices in hdfs-site.xml -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop_repo</value>
  </property>
  <!-- static web user; without it the web UI reports errors -->
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
  </property>
  <!-- zookeeper cluster address -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>bigdata01:2181,bigdata02:2181,bigdata03:2181</value>
  </property>
</configuration>
③ hdfs-site.xml
[root@bigdata01 hadoop]# vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <!-- the cluster's logical name -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- logical IDs of all namenodes in this nameservice, not their hostnames -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of each namenode; the value is the host the namenode runs on.
       Default port 8020; mycluster and nn1/nn2 must match the names above. -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>bigdata01:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>bigdata02:8020</value>
  </property>
  <!-- web UI address of each namenode, default port 9870 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>bigdata01:9870</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>bigdata02:9870</value>
  </property>
  <!-- journalnode hosts (at least three), default port 8485 -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://bigdata01:8485;bigdata02:8485;bigdata03:8485/mycluster</value>
  </property>
  <!-- class clients use to locate the active namenode during failover -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- fencing method used when switching active and standby: ssh -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- change to the path of your user's ssh private key -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- local directory where the journalnodes store edits -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop_repo/journalnode</value>
  </property>
  <!-- enable automatic failover -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
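Why at least three journalnodes? An edit is committed once a majority of journalnodes acknowledge it, so an ensemble of N nodes tolerates floor((N-1)/2) failures; the same arithmetic governs the ZooKeeper ensemble. A quick sketch (the helper names are made up):

```shell
#!/bin/sh
# quorum.sh -- majority quorum arithmetic for journalnode/zookeeper ensembles.
quorum()    { echo $(( $1 / 2 + 1 )); }     # acks needed to commit
tolerated() { echo $(( ($1 - 1) / 2 )); }   # node failures survivable

for n in 1 3 5; do
  echo "$n nodes: quorum $(quorum $n), tolerates $(tolerated $n) failure(s)"
done
```

This is why even node counts buy nothing: 4 nodes tolerate 1 failure, the same as 3.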
mapred-site.xml and yarn-site.xml can be modified as needed; we skip them here because we only start the HDFS services.
④ workers
[root@bigdata01 hadoop]# vi workers
bigdata02
bigdata03
⑤ start-dfs.sh
[root@bigdata01 hadoop]# cd /data/soft/hadoop-3.2.0/sbin
[root@bigdata01 sbin]# vi start-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_ZKFC_USER=root
HDFS_JOURNALNODE_USER=root
⑥ stop-dfs.sh
[root@bigdata01 sbin]# vi stop-dfs.sh
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_ZKFC_USER=root
HDFS_JOURNALNODE_USER=root
Configure start-yarn.sh and stop-yarn.sh as needed; here we do not start the YARN processes.
3. Copy the configured package to the other nodes
[root@bigdata01 sbin]# cd /data/soft/
[root@bigdata01 soft]# scp -rq hadoop-3.2.0 bigdata02:/data/soft/
[root@bigdata01 soft]# scp -rq hadoop-3.2.0 bigdata03:/data/soft/
4. Format HDFS
This step is only needed once, when HA is first configured.
Note: start all of the journalnodes before formatting HDFS.
[root@bigdata01 hadoop-3.2.0]# bin/hdfs --daemon start journalnode
[root@bigdata02 hadoop-3.2.0]# bin/hdfs --daemon start journalnode
[root@bigdata03 hadoop-3.2.0]# bin/hdfs --daemon start journalnode
Run the format on either namenode node:
[root@bigdata01 hadoop-3.2.0]# bin/hdfs namenode -format
....
....
2026-02-07 00:35:06,212 INFO common.Storage: Storage directory /data/hadoop_repo/dfs/name has been successfully formatted.
2026-02-07 00:35:06,311 INFO namenode.FSImageFormatProtobuf: Saving image file /data/hadoop_repo/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
2026-02-07 00:35:06,399 INFO namenode.FSImageFormatProtobuf: Image file /data/hadoop_repo/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 399 bytes saved in 0 seconds .
2026-02-07 00:35:06,405 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2026-02-07 00:35:06,432 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdata01/192.168.182.100
************************************************************/
Seeing "has been successfully formatted" means the HDFS format succeeded.
5. Start the namenode
[root@bigdata01 hadoop-3.2.0]# bin/hdfs --daemon start namenode
On the other namenode node (bigdata02), sync the metadata; the output below indicates success:
[root@bigdata02 hadoop-3.2.0]# bin/hdfs namenode -bootstrapStandby
....
....
=====================================================
About to bootstrap Standby ID nn2 from:
Nameservice ID: mycluster
Other Namenode ID: nn1
Other NN's HTTP address: http://bigdata01:9870
Other NN's IPC address: bigdata01/192.168.182.100:8020
Namespace ID: 1820763709
Block pool ID: BP-1332041116-192.168.182.100-1770395706205
Cluster ID: CID-c12130ca-3a7d-4722-93b0-a79b0df3ed84
Layout version: -65
isUpgradeFinalized: true
=====================================================
2026-02-07 00:39:38,594 INFO common.Storage: Storage directory /data/hadoop_repo/dfs/name has been successfully formatted.
2026-02-07 00:39:38,654 INFO namenode.FSEditLog: Edit logging is async:true
2026-02-07 00:39:38,767 INFO namenode.TransferFsImage: Opening connection to http://bigdata01:9870/imagetransfer?getimage=1&txid=0&storageInfo=-65:1820763709:1770395706205:CID-c12130ca-3a7d-4722-93b0-a79b0df3ed84&bootstrapstandby=true
2026-02-07 00:39:38,854 INFO common.Util: Combined time for file download and fsync to all disks took 0.00s. The file download took 0.00s at 0.00 KB/s. Synchronous (fsync) write to disk of /data/hadoop_repo/dfs/name/current/fsimage.ckpt_0000000000000000000 took 0.00s.
2026-02-07 00:39:38,855 INFO namenode.TransferFsImage: Downloaded file fsimage.ckpt_0000000000000000000 size 399 bytes.
2026-02-07 00:39:38,894 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bigdata02/192.168.182.101
************************************************************/
6. Format the zookeeper node
This step is only needed once.
It can be run on any node:
[root@bigdata01 hadoop-3.2.0]# bin/hdfs zkfc -formatZK
....
....
2026-02-07 00:42:17,212 INFO zookeeper.ClientCnxn: Socket connection established to bigdata02/192.168.182.101:2181, initiating session
2026-02-07 00:42:17,220 INFO zookeeper.ClientCnxn: Session establishment complete on server bigdata02/192.168.182.101:2181, sessionid = 0x100001104b00098, negotiated timeout = 10000
2026-02-07 00:42:17,244 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
2026-02-07 00:42:17,249 INFO zookeeper.ZooKeeper: Session: 0x100001104b00098 closed
2026-02-07 00:42:17,251 WARN ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x100001104b00098
2026-02-07 00:42:17,251 INFO zookeeper.ClientCnxn: EventThread shut down for session: 0x100001104b00098
2026-02-07 00:42:17,254 INFO tools.DFSZKFailoverController: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DFSZKFailoverController at bigdata01/192.168.182.100
************************************************************/
Seeing "Successfully created /hadoop-ha/mycluster in ZK." means it succeeded.
7. Start the HDFS HA cluster
[root@bigdata01 hadoop-3.2.0]# sbin/start-dfs.sh
Starting namenodes on [bigdata01 bigdata02]
Last login: Sat Feb 7 00:02:27 CST 2026 on pts/0
bigdata01: namenode is running as process 6424. Stop it first.
Starting datanodes
Last login: Sat Feb 7 00:47:13 CST 2026 on pts/0
Starting journal nodes [bigdata01 bigdata03 bigdata02]
Last login: Sat Feb 7 00:47:13 CST 2026 on pts/0
bigdata02: journalnode is running as process 4864. Stop it first.
bigdata01: journalnode is running as process 6276. Stop it first.
bigdata03: journalnode is running as process 2479. Stop it first.
Starting ZK Failover Controllers on NN hosts [bigdata01 bigdata02]
Last login: Sat Feb 7 00:47:18 CST 2026 on pts/0
From now on, start the HA cluster with sbin/start-dfs.sh alone; steps 5 and 6 (the formatting) do not need to be repeated.
8. Verify the HA cluster
Visit port 9870 on both namenode nodes; one shows active and the other standby:
http://bigdata01:9870/dfshealth.html
http://bigdata02:9870/dfshealth.html
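The same check works from the command line: `hdfs haadmin -getServiceState <namenode-id>` prints active or standby. A small wrapper over it might look like this (a sketch; the `ha_state` helper and the runner indirection are assumptions, not Hadoop tooling):

```shell
#!/bin/sh
# ha_state.sh -- print the HA state of both namenodes.
# nn1 and nn2 are the logical ids from dfs.ha.namenodes.mycluster.
ha_state() {
  for nn in nn1 nn2; do
    # ${HDFS:-bin/hdfs} lets a test substitute a stub for the real client
    echo "$nn: $(${HDFS:-bin/hdfs} haadmin -getServiceState "$nn")"
  done
}
```

Run `ha_state` from the hadoop-3.2.0 directory; exactly one line should say active.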
9. 模拟切換
我們手工停掉active狀态的namenode,驗證standby 是否可以自動切換為active
[[email protected] hadoop-3.2.0]# jps
8758 DFSZKFailoverController
8267 NameNode
1581 QuorumPeerMain
8541 JournalNode
8814 Jps
[root@bigdata01 hadoop-3.2.0]# kill 8267
[root@bigdata01 hadoop-3.2.0]# jps
8758 DFSZKFailoverController
1581 QuorumPeerMain
8541 JournalNode
8845 Jps
Checking bigdata02 now, its state has become active.
Then start the namenode on bigdata01 again; its state becomes standby:
[root@bigdata01 hadoop-3.2.0]# bin/hdfs --daemon start namenode
[root@bigdata01 hadoop-3.2.0]# jps
8898 NameNode
8758 DFSZKFailoverController
8967 Jps
1581 QuorumPeerMain
8541 JournalNode
This verifies that HDFS high availability is working.
From now on, operate on HDFS like this,
where mycluster is the value of the dfs.nameservices property configured in hdfs-site.xml:
[root@bigdata01 hadoop-3.2.0]# bin/hdfs dfs -ls hdfs://mycluster/
[root@bigdata01 hadoop-3.2.0]# bin/hdfs dfs -put README.txt hdfs://mycluster/
[root@bigdata01 hadoop-3.2.0]# bin/hdfs dfs -ls hdfs://mycluster/
Found 1 items
-rw-r--r-- 2 root supergroup 1361 2026-02-07 00:58 hdfs://mycluster/README.txt
10. Stop the HDFS cluster
[root@bigdata01 hadoop-3.2.0]# sbin/stop-dfs.sh
Stopping namenodes on [bigdata01 bigdata02]
Last login: Sat Feb 7 00:52:01 CST 2026 on pts/0
Stopping datanodes
Last login: Sat Feb 7 01:03:23 CST 2026 on pts/0
Stopping journal nodes [bigdata01 bigdata03 bigdata02]
Last login: Sat Feb 7 01:03:25 CST 2026 on pts/0
Stopping ZK Failover Controllers on NN hosts [bigdata01 bigdata02]
Last login: Sat Feb 7 01:03:29 CST 2026 on pts/0
3. HDFS Scalability
HDFS Federation addresses the limits of a single namespace by using multiple NameNodes, each responsible for one namespace.
This design provides the following:
1: Cluster scalability. With each NameNode managing part of the directory tree, a cluster can grow to more nodes, and file storage is no longer constrained by a single NameNode's memory.
2: Higher performance. Multiple NameNodes manage different data and serve clients at the same time, giving users higher aggregate read/write throughput.
3: Good isolation. Data for different businesses can be assigned to different NameNodes as needed, so the businesses barely affect one another.
Federation is usually combined with HA:
here we use 4 NameNodes and 6 DataNodes,
NN-1, NN-2, NN-3, NN-4
DN-1, DN-2, DN-3, DN-4, DN-5, DN-6
where NN-1 and NN-3 form an HA pair that serves one namespace, /share,
and NN-2 and NN-4 form an HA pair that serves another namespace, /user.
Data can then be stored under /share or /user according to its business type, and the cluster's total storage namespace is the union of the /share and /user namespaces.
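With two HA nameservices in place, clients usually see them as a single tree via ViewFs mount points in core-site.xml. The fragment below is only an illustrative sketch: the nameservice names ns1/ns2 and the cluster name clusterX are made-up placeholders, not values configured earlier in this article.

```xml
<configuration>
  <!-- clients mount the federated namespaces as one tree -->
  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://clusterX</value>
  </property>
  <!-- /share is served by one HA nameservice, /user by the other -->
  <property>
    <name>fs.viewfs.mounttable.clusterX.link./share</name>
    <value>hdfs://ns1/share</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.clusterX.link./user</name>
    <value>hdfs://ns2/user</value>
  </property>
</configuration>
```

Each hdfs://nsN value resolves through that nameservice's own failover configuration, so HA and Federation compose cleanly.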