Hadoop Run Modes (Multiple Machines)

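3 Fully Distributed Mode

3.1 Overview:

1) Prepare 3 client machines (firewall disabled, static IP, hostname set)
2) Install the JDK
3) Configure the JDK environment variables
4) Install Hadoop
5) Configure the Hadoop environment variables
6) Configure the cluster
7) Start the daemons one node at a time
8) Configure ssh
9) Start the whole cluster and test it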

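3.2 Writing the Cluster Distribution Script xsync

1. scp (secure copy)

(1) Definition of scp:

scp copies data between servers (from server1 to server2).

(2) Basic syntax

scp -r fname host:fname

scp: the command
-r: copy recursively
fname: path/name of the file to copy
host:fname: destination user@host:destination path/name

(3) Worked examples

(a) On hadoop101, copy the software under /opt/module on hadoop101 to hadoop102.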

[atguigu@hadoop101 /]$ scp -r /opt/module  root@hadoop102:/opt/module

(b) On hadoop103, copy the software under /opt/module on the hadoop101 server to hadoop103.

[atguigu@hadoop103 opt]$ sudo scp -r atguigu@hadoop101:/opt/module root@hadoop103:/opt/module

(c) On hadoop103, copy the software under /opt/module on hadoop101 to hadoop104.

[atguigu@hadoop103 opt]$ scp -r atguigu@hadoop101:/opt/module root@hadoop104:/opt/module

Note: after copying /opt/module, remember to change the owner and group of all copied files on hadoop102, hadoop103, and hadoop104: sudo chown atguigu:atguigu -R /opt/module

(d) Copy /etc/profile from hadoop101 to /etc/profile on hadoop102.

[atguigu@hadoop101 ~]$ sudo scp /etc/profile root@hadoop102:/etc/profile

(e) Copy /etc/profile from hadoop101 to /etc/profile on hadoop103.

[atguigu@hadoop101 ~]$ sudo scp /etc/profile root@hadoop103:/etc/profile

(f) Copy /etc/profile from hadoop101 to /etc/profile on hadoop104.

[atguigu@hadoop101 ~]$ sudo scp /etc/profile root@hadoop104:/etc/profile

Note: after copying the configuration file, remember to run source /etc/profile on each target machine so the changes take effect.
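Steps (d) to (f) repeat the same command for each target host. As a minimal sketch, the repetition can be generated with a loop. The function name print_scp_cmds is hypothetical, and the function only prints the commands so they can be reviewed before being run.

```shell
#!/bin/bash
# Sketch: print one scp command per target host for the same file.
# print_scp_cmds is a hypothetical helper, not part of scp or Hadoop.
print_scp_cmds() {
    local src=$1
    shift
    local host
    for host in "$@"; do
        # Same form as steps (d)-(f): copy to the same path as root.
        echo "sudo scp $src root@$host:$src"
    done
}

print_scp_cmds /etc/profile hadoop102 hadoop103 hadoop104
```

Once the printed commands look right, they can be run one by one; each target still needs source /etc/profile afterwards.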

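2. rsync (remote synchronization tool)

rsync is mainly used for backups and mirroring. It is fast, avoids re-copying identical content, and supports symbolic links.

Difference between rsync and scp: rsync is usually faster than scp for repeated copies because it only transfers files that differ, whereas scp copies every file each time.

(1) Basic syntax

rsync -rvl fname host:fname

rsync: the command
-rvl: options
fname: path/name of the file to copy
host:fname: destination user@host:destination path/name

Option descriptions

-r: recurse into directories
-v: verbose, show what is being copied
-l: copy symbolic links as symbolic links

(2) Worked example

a) Synchronize the /opt/software directory on hadoop101 to /opt/ on the hadoop102 server, as the root user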

[atguigu@hadoop101 opt]$ rsync -rvl /opt/software/ root@hadoop102:/opt/software

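3. xsync cluster distribution script

(1) Requirement: copy files in a loop to the same directory on every node

(2) Requirement analysis:

(a) The underlying rsync command:

rsync -rvl /opt/module root@hadoop103:/opt/

(b) Desired script usage:

xsync <name of the file to synchronize>

(c) Note: scripts placed in /home/atguigu/bin can be run by the atguigu user from anywhere on the system, provided that directory is on the user's PATH.

(3) Script implementation

(a) Create a bin directory under /home/atguigu, create a file named xsync in it, and give it the following content: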

[atguigu@hadoop102 ~]$ mkdir bin
[atguigu@hadoop102 ~]$ cd bin/
[atguigu@hadoop102 bin]$ touch xsync
[atguigu@hadoop102 bin]$ vi xsync

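Write the following code in the file

#!/bin/bash
# 1. Get the number of arguments; exit if there are none
pcount=$#
if ((pcount == 0)); then
    echo "no args"
    exit 1
fi

# 2. Get the file name
p1=$1
fname=$(basename "$p1")
echo "fname=$fname"

# 3. Get the absolute path of the parent directory
pdir=$(cd -P "$(dirname "$p1")" && pwd)
echo "pdir=$pdir"

# 4. Get the current user name
user=$(whoami)

# 5. Loop over the target hosts (hadoop103 and hadoop104)
for ((host = 103; host < 105; host++)); do
    echo "------------------- hadoop$host --------------"
    rsync -rvl "$pdir/$fname" "$user@hadoop$host:$pdir"
done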

(b) Give the xsync script execute permission

[atguigu@hadoop102 bin]$ chmod 777 xsync

(c) How to call the script: xsync <file name>

[atguigu@hadoop102 bin]$ xsync /home/atguigu/bin

Note: if xsync still cannot be run from anywhere after being placed in /home/atguigu/bin, move it to /usr/local/bin.
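To see what xsync will do for a given argument before touching the other nodes, the path resolution can be tried on its own. The sketch below reuses the script's basename/dirname logic; resolve_target and xsync_plan are hypothetical helper names, and xsync_plan only prints the rsync commands rather than running them.

```shell
#!/bin/bash
# Resolve a path the way xsync does: physical parent directory + base name.
resolve_target() {
    local p=$1
    local fname pdir
    fname=$(basename "$p")
    pdir=$(cd -P "$(dirname "$p")" && pwd) || return 1
    echo "$pdir/$fname"
}

# Print (do not run) the rsync command xsync would issue for each host.
xsync_plan() {
    local target host
    target=$(resolve_target "$1") || return 1
    for host in hadoop103 hadoop104; do
        echo "rsync -rvl $target $(whoami)@$host:$(dirname "$target")"
    done
}
```

For example, running xsync_plan bin from /home/atguigu would print two rsync lines, one for hadoop103 and one for hadoop104, each targeting /home/atguigu on the remote side.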

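3.3 Cluster Configuration

1. Cluster deployment plan

        hadoop102             hadoop103                      hadoop104
HDFS    NameNode, DataNode    DataNode                       SecondaryNameNode, DataNode
YARN    NodeManager           ResourceManager, NodeManager   NodeManager

2. Configure the cluster

(1) Core configuration file

Configure core-site.xml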

[atguigu@hadoop102 hadoop]$ vi core-site.xml

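Add the following configuration to the file, inside the <configuration> element

<!-- Address of the NameNode in HDFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:9000</value>
</property>

<!-- Directory for files that Hadoop generates at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

(2) HDFS configuration files

Configure hadoop-env.sh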

[atguigu@hadoop102 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure hdfs-site.xml

[atguigu@hadoop102 hadoop]$ vi hdfs-site.xml

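Add the following configuration to the file, inside the <configuration> element

<!-- Number of replicas for each HDFS block -->
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>

<!-- Host and port of the SecondaryNameNode web interface -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>hadoop104:50090</value>
</property>

(3) YARN configuration files

Configure yarn-env.sh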

[atguigu@hadoop102 hadoop]$ vi yarn-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure yarn-site.xml

[atguigu@hadoop102 hadoop]$ vi yarn-site.xml

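Add the following configuration to the file, inside the <configuration> element

<!-- How reducers fetch data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>

<!-- Host of the YARN ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop103</value>
</property>

(4) MapReduce configuration files

Configure mapred-env.sh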

[atguigu@hadoop102 hadoop]$ vi mapred-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure mapred-site.xml

[atguigu@hadoop102 hadoop]$ cp mapred-site.xml.template mapred-site.xml

[atguigu@hadoop102 hadoop]$ vi mapred-site.xml

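Add the following configuration to the file, inside the <configuration> element

<!-- Run MapReduce on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

3. Distribute the configured Hadoop files across the cluster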

[atguigu@hadoop102 hadoop]$ xsync /opt/module/hadoop-2.7.2/

4. Check that the files were distributed

[atguigu@hadoop103 hadoop]$ cat /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml
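Reading the whole file with cat works, but when only one setting matters it can be quicker to print just that value. The following is a rough sketch, assuming each <name> and its <value> sit on separate lines as in the files above; get_property is a hypothetical helper, not a Hadoop command.

```shell
#!/bin/bash
# get_property FILE NAME: print the <value> that follows <name>NAME</name>.
# Assumes <name> and <value> are on separate, consecutive lines.
get_property() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { want = 1; next }
        want && match($0, /<value>.*<\/value>/) {
            # Skip the 7 characters of "<value>" and drop the 8 of "</value>".
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }' "$1"
}

# Example usage on a node (path taken from the steps above):
# get_property /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml fs.defaultFS
```

Running the same call on hadoop102, hadoop103, and hadoop104 and comparing the results is a quick way to confirm that the distribution reached every node.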

3.4 Starting the Cluster One Node at a Time

(1) If the cluster is being started for the first time, format the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop namenode -format

(2) Start the NameNode on hadoop102

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start namenode
[atguigu@hadoop102 hadoop-2.7.2]$ jps
3461 NameNode

(3) Start a DataNode on each of hadoop102, hadoop103, and hadoop104

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[atguigu@hadoop102 hadoop-2.7.2]$ jps
3461 NameNode
3608 Jps
3561 DataNode
[atguigu@hadoop103 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[atguigu@hadoop103 hadoop-2.7.2]$ jps
3190 DataNode
3279 Jps
[atguigu@hadoop104 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
[atguigu@hadoop104 hadoop-2.7.2]$ jps
3237 Jps
3163 DataNode
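Checking the jps listing by eye is fine for three nodes; a small filter makes the same check scriptable. This is a sketch only: has_daemon is a hypothetical helper that reads jps-style output (PID, then process name) on standard input.

```shell
#!/bin/bash
# has_daemon NAME: succeed if the jps output on stdin lists a JVM named NAME.
# The comparison is exact, so "NameNode" does not match "SecondaryNameNode".
has_daemon() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Demonstration with the hadoop102 listing shown above.
printf '3461 NameNode\n3608 Jps\n3561 DataNode\n' | has_daemon DataNode \
    && echo "DataNode is running"
```

On a live node the input would come from jps itself, for example jps | has_daemon NameNode.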

3.5 Configuring Passwordless SSH Login

1. Configure ssh

(1) Basic syntax

ssh <IP address of the other machine>

(2) Resolving "Host key verification failed" when connecting with ssh

[atguigu@hadoop102 opt]$ ssh 192.168.1.103
The authenticity of host '192.168.1.103 (192.168.1.103)' can't be established.
RSA key fingerprint is cf:1e:de:d7:d0:4c:2d:98:60:b4:fd:ae:b1:2d:ad:06.
Are you sure you want to continue connecting (yes/no)? 
Host key verification failed.

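(3) Solution: type yes at the prompt

2. Passwordless key configuration

(1) How passwordless login works, as shown in Figure 2-40. In short, the client proves that it holds the private key matching a public key stored in ~/.ssh/authorized_keys on the target machine, so no password is needed.

(2) Generate the public and private keys: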

[atguigu@hadoop102 .ssh]$ ssh-keygen -t rsa

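Then press Enter three times. Two files are generated: id_rsa (the private key) and id_rsa.pub (the public key).

(3) Copy the public key to each target machine that should allow passwordless login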

[atguigu@hadoop102 .ssh]$ ssh-copy-id hadoop102
[atguigu@hadoop102 .ssh]$ ssh-copy-id hadoop103
[atguigu@hadoop102 .ssh]$ ssh-copy-id hadoop104

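Note:

On hadoop102, passwordless login to hadoop102, hadoop103, and hadoop104 must also be configured for the root account;

On hadoop103, passwordless login to hadoop102, hadoop103, and hadoop104 must also be configured for the atguigu account (hadoop103 runs the ResourceManager and is where YARN is started).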
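Because the same three ssh-copy-id calls are needed for every account and source host listed in the note, it may help to generate them, along with a quick login check. In this sketch the function names are hypothetical; the commands are only printed. BatchMode=yes is a standard ssh option that makes ssh fail instead of prompting for a password, so a failed check means the key has not been installed on that host.

```shell
#!/bin/bash
HOSTS="hadoop102 hadoop103 hadoop104"

# Print the key-distribution commands for the current account.
print_copy_id_cmds() {
    local host
    for host in $HOSTS; do
        echo "ssh-copy-id $host"
    done
}

# Print one non-interactive login check per host.
print_check_cmds() {
    local host
    for host in $HOSTS; do
        echo "ssh -o BatchMode=yes $host true"
    done
}

print_copy_id_cmds
print_check_cmds
```

The printed lines are meant to be run as the account being configured (atguigu or root) on the source host in question.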

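3. What the files in the .ssh folder (~/.ssh) are for

known_hosts: records the public keys of the hosts this machine has connected to over ssh
id_rsa: the generated private key
id_rsa.pub: the generated public key
authorized_keys: stores the public keys that are authorized for passwordless login to this machine

3.6 Starting the Whole Cluster

1. Configure slaves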

/opt/module/hadoop-2.7.2/etc/hadoop/slaves
[atguigu@hadoop102 hadoop]$ vi slaves
hadoop102
hadoop103
hadoop104

Note: entries in this file must not have trailing spaces, and the file must not contain blank lines.
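Trailing spaces and blank lines are hard to spot in an editor, so the rule in the note can be checked mechanically before the file is distributed. A minimal sketch follows; validate_slaves is a hypothetical helper name.

```shell
#!/bin/bash
# validate_slaves FILE: list lines that are empty or end in whitespace.
# Prints "ok" and succeeds when the file is clean; otherwise prints the
# offending lines (with line numbers) and fails.
validate_slaves() {
    local file=$1
    if grep -n -E '^$|[[:space:]]$' "$file"; then
        echo "invalid: fix the lines listed above" >&2
        return 1
    fi
    echo "ok"
}

# Example usage:
# validate_slaves /opt/module/hadoop-2.7.2/etc/hadoop/slaves
```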

Synchronize the configuration file to all nodes

[atguigu@hadoop102 hadoop]$ xsync slaves

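2. Start the cluster

(1) If the cluster is being started for the first time, format the NameNode. Before formatting, be sure to stop all NameNode and DataNode processes left from the previous start, and then delete the data and logs directories.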

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -format

(2) Start HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
[atguigu@hadoop102 hadoop-2.7.2]$ jps
4166 NameNode
4482 Jps
4263 DataNode
[atguigu@hadoop103 hadoop-2.7.2]$ jps
3218 DataNode
3288 Jps
[atguigu@hadoop104 hadoop-2.7.2]$ jps
3221 DataNode
3283 SecondaryNameNode
3364 Jps

(3) Start YARN

[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

Note: if the NameNode and the ResourceManager are not on the same machine, YARN cannot be started from the NameNode host; start it on the machine where the ResourceManager runs.
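The rule in the note comes down to running each start script on the right host. As a sketch, the mapping can be written down once and printed; the variable names and start_plan are hypothetical, the host names and install path are the ones used in this walkthrough, and nothing is executed.

```shell
#!/bin/bash
# Which host runs which master daemon in this walkthrough.
NAMENODE_HOST=hadoop102
RESOURCEMANAGER_HOST=hadoop103
HADOOP_DIR=/opt/module/hadoop-2.7.2

# Print the start commands together with the host each must run on.
start_plan() {
    echo "ssh $NAMENODE_HOST $HADOOP_DIR/sbin/start-dfs.sh"
    echo "ssh $RESOURCEMANAGER_HOST $HADOOP_DIR/sbin/start-yarn.sh"
}

start_plan
```

If the deployment plan changes, only the two host variables need to be updated.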

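(4) View the SecondaryNameNode in a web browser

(a) Enter this address in the browser: http://hadoop104:50090/status.html

(b) View the SecondaryNameNode information, as shown in Figure 2-41.

3. Basic cluster tests

(1) Upload files to the cluster

Upload a small file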

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -mkdir -p /user/atguigu/input
[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -put wcinput/wc.input /user/atguigu/input

Upload a large file

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs -put /opt/software/hadoop-2.7.2.tar.gz /user/atguigu/input

(2) Check where the uploaded files are stored

(a) Check the HDFS file storage path

[atguigu@hadoop102 subdir0]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/data/current/BP-938951106-192.168.10.107-1495462844069/current/finalized/subdir0/subdir0

(b) View the file content that HDFS stored on disk

[atguigu@hadoop102 subdir0]$ cat blk_1073741825
hadoop yarn
hadoop mapreduce 
atguigu
atguigu

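(3) Concatenate the blocks. The large file was stored as two blocks; appending them in order restores the original archive.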

-rw-rw-r--. 1 atguigu atguigu 134217728 5月  23 16:01 blk_1073741836
-rw-rw-r--. 1 atguigu atguigu   1048583 5月  23 16:01 blk_1073741836_1012.meta
-rw-rw-r--. 1 atguigu atguigu  63439959 5月  23 16:01 blk_1073741837
-rw-rw-r--. 1 atguigu atguigu    495635 5月  23 16:01 blk_1073741837_1013.meta
[atguigu@hadoop102 subdir0]$ cat blk_1073741836>>tmp.file
[atguigu@hadoop102 subdir0]$ cat blk_1073741837>>tmp.file
[atguigu@hadoop102 subdir0]$ tar -zxvf tmp.file
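The listing shows why two blocks exist: the first is exactly 134217728 bytes (128 MiB, the default block size in Hadoop 2.x) and the second holds the remainder. The sketch below works through that arithmetic and repeats the split-and-concatenate idea on a small local file. Both function names are hypothetical, and demo_concat uses the standard split and cat tools, not HDFS.

```shell
#!/bin/bash
# block_count SIZE [BLOCK]: blocks needed for SIZE bytes (default 128 MiB).
block_count() {
    local size=$1 block=${2:-134217728}
    echo $(( (size + block - 1) / block ))
}

# Split a small random file into 1024-byte pieces, join them in order,
# and confirm the result is byte-for-byte identical to the original.
demo_concat() {
    local dir
    dir=$(mktemp -d)
    head -c 3000 /dev/urandom > "$dir/orig"
    split -b 1024 "$dir/orig" "$dir/blk_"
    cat "$dir"/blk_* > "$dir/joined"
    cmp -s "$dir/orig" "$dir/joined" && echo "identical"
    rm -rf "$dir"
}

# The two block sizes in the listing add up to 197657687 bytes.
block_count $((134217728 + 63439959))
```

The order of concatenation matters: the pieces must be appended in block order, which is why blk_1073741836 is written to tmp.file before blk_1073741837.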

(4) Download

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs -get /user/atguigu/input/hadoop-2.7.2.tar.gz ./

3.7 Summary of Cluster Start/Stop Methods

1. Start/stop each service component one by one

1) Start/stop the HDFS components individually

hadoop-daemon.sh  start / stop  namenode / datanode / secondarynamenode

2) Start/stop the YARN components individually

yarn-daemon.sh  start / stop  resourcemanager / nodemanager
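The two command forms above differ only in which script handles which daemon. A small lookup, sketched here with the hypothetical name daemon_cmd, makes that mapping explicit and rejects anything outside the lists above. It prints the command rather than running it.

```shell
#!/bin/bash
# daemon_cmd ACTION DAEMON: print the per-daemon command for ACTION
# (start or stop), choosing the script by daemon type.
daemon_cmd() {
    local action=$1 daemon=$2
    case $action in
        start|stop) ;;
        *) echo "unknown action: $action" >&2; return 1 ;;
    esac
    case $daemon in
        namenode|datanode|secondarynamenode)
            echo "hadoop-daemon.sh $action $daemon" ;;
        resourcemanager|nodemanager)
            echo "yarn-daemon.sh $action $daemon" ;;
        *) echo "unknown daemon: $daemon" >&2; return 1 ;;
    esac
}

daemon_cmd start namenode
daemon_cmd stop nodemanager
```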

2. Start/stop each module as a whole (SSH configuration is a prerequisite); this is the commonly used approach

(1) Start/stop all of HDFS

start-dfs.sh   /  stop-dfs.sh

(2) Start/stop all of YARN

start-yarn.sh  /  stop-yarn.sh

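3.8 Cluster Time Synchronization

Approach: choose one machine as the time server and have all other machines synchronize with it on a schedule, for example once every ten minutes.

Steps for configuring time synchronization:

1. Time server configuration (must be done as root)

(1) Check whether ntp is installed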

[root@hadoop102 桌面]# rpm -qa|grep ntp
ntp-4.2.6p5-10.el6.centos.x86_64
fontpackages-filesystem-1.41-1.1.el6.noarch
ntpdate-4.2.6p5-10.el6.centos.x86_64

(2) Edit the ntp configuration file

[root@hadoop102 桌面]# vi /etc/ntp.conf

Make the following changes

a) Change 1: allow all machines on the 192.168.1.0-192.168.1.255 network segment to query and synchronize time from this machine

Change
#restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap
to
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap

b) Change 2: the cluster is on a local network, so do not use time sources on the Internet

Change
server 0.centos.pool.ntp.org iburst
server 1.centos.pool.ntp.org iburst
server 2.centos.pool.ntp.org iburst
server 3.centos.pool.ntp.org iburst
to
#server 0.centos.pool.ntp.org iburst
#server 1.centos.pool.ntp.org iburst
#server 2.centos.pool.ntp.org iburst
#server 3.centos.pool.ntp.org iburst

c) Addition 3: when this node loses its network connection, it can still use its local clock as the time source for the other nodes in the cluster

server 127.127.1.0
fudge 127.127.1.0 stratum 10

(3) Edit the /etc/sysconfig/ntpd file

[root@hadoop102 桌面]# vim /etc/sysconfig/ntpd

Add the following line (so that the hardware clock is synchronized along with the system time)

SYNC_HWCLOCK=yes

(4) Restart the ntpd service

[root@hadoop102 桌面]# service ntpd status

ntpd is stopped

[root@hadoop102 桌面]# service ntpd start

Starting ntpd: [ OK ]

(5) Enable the ntpd service at boot

[root@hadoop102 桌面]# chkconfig ntpd on

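2. Configuration on the other machines (must be done as root)

(1) On each of the other machines, schedule a synchronization with the time server every 10 minutes

[root@hadoop103 桌面]# crontab -e

Write the scheduled task as follows: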

*/10 * * * * /usr/sbin/ntpdate hadoop102
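Editing the crontab by hand on every node invites duplicate or missing entries. One possible approach, sketched below, is to append the line to a file only when it is not already there and then install that file with crontab. ensure_line is a hypothetical helper, and the temporary file path in the usage comment is an assumption.

```shell
#!/bin/bash
CRON_LINE='*/10 * * * * /usr/sbin/ntpdate hadoop102'

# ensure_line FILE LINE: append LINE to FILE unless an identical line exists.
# grep -x matches whole lines only and -F treats LINE as a fixed string,
# so the asterisks are not interpreted as a pattern.
ensure_line() {
    local file=$1 line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage as root on hadoop103 or hadoop104:
# crontab -l > /tmp/cron.txt 2>/dev/null
# ensure_line /tmp/cron.txt "$CRON_LINE"
# crontab /tmp/cron.txt
```

Running the three usage lines twice leaves a single entry, so the same steps can be repeated safely on every node.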

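(2) Set the clock on one of the machines to a wrong time to test the synchronization

[root@hadoop103 桌面]# date -s "2017-9-11 11:11:11"

(3) After about ten minutes, check whether the machine has synchronized with the time server

[root@hadoop103 桌面]# date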