Installing and Configuring a heartbeat High-Availability Cluster (Part 1)

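I. High Availability (HA) Concepts

  Failover: moving resources to another node after a failure. HA resources include IP addresses, services, and STONITH devices.

  Failback: moving resources back to the original node once it has recovered.

  Failover domain: the set of nodes a resource is allowed to fail over to.

  Resource stickiness: how strongly a resource prefers to run on (and stay on) a given node.

  Messaging Layer: the cluster messaging layer. It only carries messages between nodes and is not responsible for computing or comparing the information afterwards.

  CRM: Cluster Resource Manager. It collects the state of every resource in the cluster and, from that state and the resource definitions, computes which node each resource should run on.

  DC: Designated Coordinator, the node that coordinates cluster transitions.

  PE: Policy Engine, a sub-component of the CRM.

  TE: Transition Engine. It directs the execution of the decisions computed by the PE.

  LRM: Local Resource Manager. It carries out the actions on the local node.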

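Resource constraints (Constraint)

Colocation constraint (colocation):

Whether resources can run on the same node.

score:

Positive: the resources can run together.

Negative: the resources must not run together.

Location constraint (location), with a score:

Positive: the resource prefers this node.

Negative: the resource prefers to stay away from this node.

Order constraint (order):

Defines the order in which resources are started or stopped.

Example: vip, ipvs

ipvs-->vip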

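Resource isolation (fencing)

Node level: STONITH

Resource level:

For example, an FC SAN switch can deny a given node access at the storage-resource level.

STONITH

split-brain: occurs when cluster nodes cannot reliably obtain the state of the other nodes.

One consequence: nodes compete for the shared storage.

        Quorum disk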

II. Example Environment

snn          192.168.1.5

datanode4    192.168.1.6

VIP          192.168.1.7

Server name          System               CPU architecture   Kernel                   IP address     Role

snn.abc.com          CentOS release 6.5   x86_64             2.6.32-431.el6.x86_64    192.168.1.5    master

datanode4.abc.com                                                                     192.168.1.6    slave

EPEL provides the packages we need:

heartbeat - Heartbeat subsystem for High-Availability Linux (the core package)

heartbeat-devel - Heartbeat development package

heartbeat-gui - Provides a gui interface to manage heartbeat clusters

heartbeat-ldirectord - Monitor daemon for maintaining high availability resources; for ipvs high availability it generates the rules automatically and health-checks the back-end real servers

heartbeat-pils - Provides a general plugin and interface loading library

heartbeat-stonith - Provides an interface to Shoot The Other Node In The Head

III. Preliminary Configuration

1. Hostname resolution

[root@snn ~]# cat /etc/hosts

192.168.1.5    snn.abc.com    snn

192.168.1.6    datanode4.abc.com    datanode4

[root@snn ~]# hostname 

[root@snn ~]# cat /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=snn.abc.com
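
heartbeat refers to nodes by name rather than by IP address, so each name has to resolve on every node. The following is a minimal sketch of how that could be checked against a hosts-format file; `hosts_lookup` and `check_node` are hypothetical helper names introduced for illustration, not heartbeat tools.

```shell
#!/bin/sh
# hosts_lookup FILE NAME: print the IP that a hosts-format FILE maps NAME to.
# Hypothetical helper for illustration; it is not part of heartbeat.
hosts_lookup() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }                    # skip comment lines
        {
            for (i = 2; i <= NF; i++)          # fields after the IP are names and aliases
                if ($i == name) { print $1; exit }
        }' "$1"
}

# check_node FILE NAME: succeed only if NAME resolves through FILE.
check_node() {
    ip=$(hosts_lookup "$1" "$2")
    if [ -n "$ip" ]; then
        echo "$2 -> $ip"
    else
        echo "$2 is missing from $1" >&2
        return 1
    fi
}
```

On a real node this might be run as `check_node /etc/hosts "$(uname -n)"`, and again for the peer's name.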

2. Mutual SSH trust between the two nodes

On snn:

[root@snn ~]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''

[root@snn ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]

Run a quick test:

[root@snn ~]# ssh 192.168.1.6 'ifconfig'

On datanode4:

[root@datanode4 ~]# ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''

[root@datanode4 ~]# ssh-copy-id -i .ssh/id_rsa.pub [email protected]

[root@datanode4 ~]# ssh 192.168.1.5 'ifconfig'

3. Time synchronization

[root@snn ~]# crontab -e

*/2 * * * * /usr/sbin/ntpdate time.nist.gov &> /dev/null

[root@snn ~]# scp /var/spool/cron/root datanode4:/var/spool/cron/

IV. Installing heartbeat

1. Install the dependency packages

[root@snn heartbeat]# yum install perl-TimeDate PyXML libnet net-snmp-libs -y

2. Only these four packages need to be installed

(1) Install

[root@snn heartbeat]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm  heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm 

error: Failed dependencies:

libnet.so.1()(64bit) is needed by heartbeat-2.1.4-12.el6.x86_64

pygtk2-libglade is needed by heartbeat-gui-2.1.4-12.el6.x86_64

(2) Resolve the dependencies

        Download and install EPEL:

        [root@snn heartbeat]# rpm -ivh epel-release-latest-6.noarch.rpm 

(3) Install the libnet dependency

        [root@snn heartbeat]# yum install libnet

(4) Install again

       [root@snn heartbeat]# rpm -ivh heartbeat-2.1.4-12.el6.x86_64.rpm  heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm 

Preparing...                ########################################### [100%]

   1:heartbeat-pils         ########################################### [ 25%]

   2:heartbeat-stonith      ########################################### [ 50%]

   3:heartbeat              ########################################### [ 75%]

   4:heartbeat-gui          ########################################### [100%]

3. Copy the packages to the .6 node (datanode4) with scp

[root@snn heartbeat]# scp epel-release-latest-6.noarch.rpm heartbeat-2.1.4-12.el6.x86_64.rpm heartbeat-gui-2.1.4-12.el6.x86_64.rpm heartbeat-pils-2.1.4-12.el6.x86_64.rpm heartbeat-stonith-2.1.4-12.el6.x86_64.rpm  datanode4:/root/heartbeat/

V. Configuration

1. The three configuration files do not exist by default

[root@snn ha.d]# ls /etc/ha.d/

harc  rc.d  README.config  resource.d  shellfuncs

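    (1) authkeys: the key file; its permissions must be 600

    (2) ha.cf: the main configuration of the heartbeat service

    (3) haresources: the resource management configuration

2. Copy the sample files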

[root@snn ha.d]# cp /usr/share/doc/heartbeat-2.1.4/{authkeys,haresources,ha.cf} ./

3. Set the permissions of authkeys to 600

[root@snn ha.d]# chmod 600 authkeys

4. Generate a random key

[root@snn ha.d]# dd if=/dev/random count=1 bs=512 | md5sum

0+1 records in

0+1 records out

29 bytes (29 B) copied, 8.0656e-05 s, 360 kB/s

71cc2b8ff1bd825fce13ceaea932501d  -

[root@snn ha.d]# vim authkeys 

 auth 1

 1 md5 71cc2b8ff1bd825fce13ceaea932501d
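
The key generation and permission steps above could be combined into one small function. This is only a sketch under stated assumptions: `make_authkeys` is a hypothetical helper name, it reads from /dev/urandom rather than /dev/random so that it never blocks, and it formats the random bytes with `od` instead of hashing them with `md5sum`. The resulting file has the same two-line layout as the one edited by hand above.

```shell
#!/bin/sh
# make_authkeys DIR: write DIR/authkeys with one randomly generated key.
# Hypothetical helper; DIR would be /etc/ha.d on a real node.
make_authkeys() {
    dir=$1
    # 16 random bytes rendered as 32 hexadecimal characters.
    key=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
    (
        umask 077                              # create the file private from the start
        printf 'auth 1\n1 md5 %s\n' "$key" > "$dir/authkeys"
    )
    chmod 600 "$dir/authkeys"                  # the mode this article requires
}
```

For a dry run, `make_authkeys /tmp` writes /tmp/authkeys without touching /etc/ha.d.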

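5. The core configuration file ha.cf

ha.cf directives:

debugfile: file for debug messages

logfile: log file

logfacility: syslog facility to log to

keepalive: interval between heartbeat messages

deadtime: how long a node may stay silent before it is declared dead and its resources are taken over

warntime: how long before a late-heartbeat warning is logged

initdead: how long to wait after heartbeat starts before declaring other nodes dead

udpport: UDP port used for heartbeat traffic

bcast: broadcast

mcast: multicast, for example 225.0.0.1 (the address must fall within 224.0.0.0 to 239.255.255.255)

ucast: unicast

auto_failback: whether resources automatically move back to the primary node

stonith: STONITH device, e.g. baytech

ping: ping node, used as a quorum device

node: cluster node name; an IP address cannot be used

ping_group: group of ping nodes

debug: debug level

compression: compression algorithm for transmitted messages

compression_threshold: message size above which compression is applied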

[root@snn ha.d]# vim ha.cf 

bcast   eth0            # Linux

node    snn.abc.com

node    datanode4.abc.com
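
To show how the directives listed earlier fit together, here is a sketch that writes a small ha.cf and sanity-checks its timers. The function names `write_ha_cf` and `check_timing` are hypothetical, and every timing value, the port, and the `logfacility` setting are illustrative assumptions rather than values from this article; only `bcast eth0` and the two `node` lines come from the configuration shown here. The ordering check (keepalive < warntime < deadtime <= initdead) is a rule of thumb, not something heartbeat enforces in this form.

```shell
#!/bin/sh
# write_ha_cf FILE NODE...: write a small ha.cf for the given node names.
# The timing values below are illustrative assumptions, not from the article.
write_ha_cf() {
    out=$1; shift
    cat > "$out" <<'EOF'
logfacility     local0
keepalive       2
warntime        10
deadtime        30
initdead        120
udpport         694
bcast           eth0
auto_failback   on
EOF
    for n in "$@"; do
        printf 'node    %s\n' "$n" >> "$out"   # one node line per cluster member
    done
}

# check_timing FILE: succeed if keepalive < warntime < deadtime <= initdead.
check_timing() {
    awk '$1 == "keepalive" { k = $2 }
         $1 == "warntime"  { w = $2 }
         $1 == "deadtime"  { d = $2 }
         $1 == "initdead"  { i = $2 }
         END { exit !(k < w && w < d && d <= i) }' "$1"
}
```

A possible use is `write_ha_cf ./ha.cf snn.abc.com datanode4.abc.com && check_timing ./ha.cf`, reviewing the result before copying it to /etc/ha.d/.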

6. Install the httpd service on both hosts

[root@snn ha.d]# yum install httpd

[root@snn ha.d]# echo "<h1>snn.abc.com</h1>" >> /var/www/html/index.html

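After verifying that httpd works, stop the service and make sure it does not start at boot, because heartbeat will start and stop it: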

[root@snn ha.d]# service httpd stop

[root@datanode4 ha.d]# chkconfig httpd off

[root@snn ha.d]# chkconfig httpd off

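7. Define the haresources file

Each line starts with the primary node, followed by its resources, for example:

node1.magedu.com VIP httpd 

The resource.d directory holds the resource agents (RAs).

heartbeat looks for a resource script in resource.d first, then in /etc/rc.d/init.d/.

VIP format:

IP/netmask/interface/broadcast address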

[root@snn ha.d]# vim haresources 

snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd
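
The lookup rule described above (resource.d first, then the init script directory) and the `::` argument separator can be illustrated with a short sketch. `resource_command` is a hypothetical function written for this explanation; heartbeat's own ResourceManager does the real work, and the directories are passed in as parameters so the sketch can be tried without a heartbeat installation.

```shell
#!/bin/sh
# resource_command ACTION TOKEN DIR...: print the command line that would run
# for one haresources TOKEN, searching each DIR in order for the script.
# Hypothetical illustration, not heartbeat code.
resource_command() {
    action=$1; token=$2; shift 2
    case $token in
        *::*)
            name=${token%%::*}                 # script name before the first "::"
            args=$(printf '%s' "${token#*::}" | sed 's/::/ /g')   # remaining fields
            ;;
        *)
            name=$token; args=
            ;;
    esac
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name${args:+ $args} $action"
            return 0
        fi
    done
    echo "no resource script named $name" >&2
    return 1
}
```

On a configured node, `resource_command start IPaddr::192.168.1.7/24/eth0 /etc/ha.d/resource.d /etc/rc.d/init.d` would print the script path followed by `192.168.1.7/24/eth0 start`.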

8. Every node needs these files; scp -p preserves the original attributes

[root@snn ha.d]# scp -p authkeys ha.cf haresources datanode4:/etc/ha.d/

VI. Starting the Service

[root@snn ha.d]# service heartbeat start

[root@snn ha.d]# ssh datanode4 'service heartbeat start'

[root@snn ha.d]# tail -f /var/log/messages

Jun 13 17:28:55 snn heartbeat: [3061]: info: Link 192.168.1.1:192.168.1.1 up.

Jun 13 17:28:55 snn heartbeat: [3061]: info: Status update for node 192.168.1.1: status ping

Jun 13 17:28:55 snn heartbeat: [3061]: info: Link snn.abc.com:eth0 up.    // both nodes come up

Jun 13 17:29:02 snn heartbeat: [3061]: info: Link datanode4.abc.com:eth0 up.

Jun 13 17:29:02 snn heartbeat: [3061]: info: Status update for node datanode4.abc.com: status up    // status check

Jun 13 17:29:02 snn harc[3069]: info: Running /etc/ha.d/rc.d/status status

Jun 13 17:29:03 snn heartbeat: [3061]: info: Comm_now_up(): updating status to active

Jun 13 17:29:03 snn heartbeat: [3061]: info: Local status now set to: 'active'

Jun 13 17:29:03 snn heartbeat: [3061]: info: Status update for node datanode4.abc.com: status active

Jun 13 17:29:03 snn harc[3088]: info: Running /etc/ha.d/rc.d/status status

Jun 13 17:29:13 snn heartbeat: [3061]: info: remote resource transition completed.

Jun 13 17:29:13 snn heartbeat: [3061]: info: Initial resource acquisition complete (T_RESOURCES(us))

Jun 13 17:29:14 snn IPaddr[3141]: INFO:  Resource is stopped

Jun 13 17:29:14 snn heartbeat: [3105]: info: Local Resource acquisition completed.

Jun 13 17:29:14 snn harc[3192]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp

Jun 13 17:29:14 snn ip-request-resp[3192]: received ip-request-resp IPaddr::192.168.1.7/24/eth0 OK yes

Jun 13 17:29:14 snn ResourceManager[3213]: info: Acquiring resource group: snn.abc.com IPaddr::192.168.1.7/24/eth0 httpd

Jun 13 17:29:14 snn IPaddr[3240]: INFO:  Resource is stopped

Jun 13 17:29:14 snn ResourceManager[3213]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 start    // the resource is started

Jun 13 17:29:14 snn IPaddr[3338]: INFO: Using calculated netmask for 192.168.1.7: 255.255.255.0                                            

Jun 13 17:29:14 snn IPaddr[3338]: INFO: eval ifconfig eth0:0 192.168.1.7 netmask 255.255.255.0 broadcast 192.168.1.255

Jun 13 17:29:14 snn IPaddr[3309]: INFO:  Success

Jun 13 17:29:14 snn ResourceManager[3213]: info: Running /etc/init.d/httpd  start //http

[root@snn ha.d]# netstat -tlunp | grep 80

tcp        0      0 :::80                       :::*                        LISTEN      3464/httpd   

[root@snn ha.d]# ifconfig

eth0      Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48  

          inet addr:192.168.1.5  Bcast:192.168.1.255  Mask:255.255.255.0

          inet6 addr: fe80::20c:29ff:feb1:8948/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:35659 errors:0 dropped:0 overruns:0 frame:0

          TX packets:10024 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000 

          RX bytes:4539049 (4.3 MiB)  TX bytes:2100109 (2.0 MiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:B1:89:48  

          inet addr:192.168.1.7  Bcast:192.168.1.255  Mask:255.255.255.0

VII. Simulating a Primary/Standby Switchover with a Script

[root@snn ha.d]# sh /usr/lib64/heartbeat/hb_standby 

2015/06/13_17:42:27 Going standby [all].

Jun 13 17:42:28 snn ResourceManager[3568]: info: Running /etc/init.d/httpd  stop

Jun 13 17:42:28 snn ResourceManager[3568]: info: Running /etc/ha.d/resource.d/IPaddr 192.168.1.7/24/eth0 stop

Jun 13 17:42:29 snn IPaddr[3663]: INFO: ifconfig eth0:0 down

Jun 13 17:42:29 snn IPaddr[3634]: INFO:  Success

Jun 13 17:42:29 snn heartbeat: [3555]: info: all HA resource release completed (standby).

Jun 13 17:42:29 snn heartbeat: [3061]: info: Local standby process completed [all].

Jun 13 17:42:30 snn heartbeat: [3061]: WARN: 1 lost packet(s) for [datanode4.abc.com] [819:821]

Jun 13 17:42:30 snn heartbeat: [3061]: info: remote resource transition completed.

Jun 13 17:42:30 snn heartbeat: [3061]: info: No pkts missing from datanode4.abc.com!

Jun 13 17:42:30 snn heartbeat: [3061]: info: Other node completed standby takeover of all resources.    // the other node has taken over all resources

Check on the .6 host (datanode4):

[root@datanode4 ha.d]# ifconfig

eth0      Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66  

          inet addr:192.168.1.6  Bcast:192.168.1.255  Mask:255.255.255.0

          inet6 addr: fe80::20c:29ff:fee1:2f66/64 Scope:Link

          RX packets:37277 errors:0 dropped:0 overruns:0 frame:0

          TX packets:3812 errors:0 dropped:0 overruns:0 carrier:0

          RX bytes:5065186 (4.8 MiB)  TX bytes:648956 (633.7 KiB)

eth0:0    Link encap:Ethernet  HWaddr 00:0C:29:E1:2F:66  

lo        Link encap:Local Loopback  

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:65536  Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0 

          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

[root@datanode4 ha.d]# netstat -ltunp | grep 80

tcp        0      0 :::80                       :::*                        LISTEN      2782/httpd     
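
The manual check above (look for the VIP in the `ifconfig` output of each node) can be scripted. The sketch below assumes the older net-tools output format shown in this article, where addresses appear as `inet addr:...`; `holds_vip` is a hypothetical helper name.

```shell
#!/bin/sh
# holds_vip VIP: read ifconfig-style text on stdin and succeed if VIP is configured.
# Hypothetical helper; it matches the "inet addr:" format shown in this article.
holds_vip() {
    # -F treats the dots literally; the trailing space stops 192.168.1.7
    # from also matching 192.168.1.70.
    grep -qF "inet addr:$1 "
}
```

A possible use after a switchover is `ifconfig | holds_vip 192.168.1.7 && echo "this node holds the VIP"`, run on each node in turn.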

VIII. Serving the Content from an NFS Mount

1. Bring up another host, 192.168.1.4 (datanode.abc.com, short name datanode), as the NFS server

[root@datanode ~]# mkdir /web/htdocs -p

2. Export the shared directory

[root@datanode ~]# vim /etc/exports

/web/htdocs     192.168.0.0/24(ro)

3. Start the NFS service

[root@datanode ~]# service nfs start

Starting NFS services:                                     [  OK  ]

Starting NFS quotas:                                       [  OK  ]

Starting NFS mountd:                                       [  OK  ]

Starting NFS daemon:                                       [  OK  ]

Starting RPC idmapd:                                       [  OK  ]

[root@datanode ~]# showmount -e 192.168.1.4

Export list for 192.168.1.4:

/web/htdocs 192.168.0.0/24
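
An export only admits clients whose address falls inside the listed network, so it can help to confirm that the cluster nodes are covered before trying to mount. The following sketch does the comparison with shell arithmetic; `ip_to_int` and `ip_in_cidr` are hypothetical helper names, and the sketch assumes plain dotted-quad IPv4 addresses without leading zeros.

```shell
#!/bin/sh
# ip_to_int A.B.C.D: print the address as a single 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1                                  # split the dotted quad on "."
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# ip_in_cidr IP NET/BITS: succeed if IP lies inside the given network.
ip_in_cidr() {
    net=${2%/*}                                # network address before the "/"
    bits=${2#*/}                               # prefix length after the "/"
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}
```

For example, `ip_in_cidr 192.168.1.5 192.168.1.0/24` succeeds, while `ip_in_cidr 192.168.1.5 192.168.0.0/24` fails because the third octet differs under a 24-bit mask.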

4. On the primary host (snn), stop heartbeat first, then change the resource configuration file

[root@snn ha.d]# ssh datanode4 '/etc/init.d/heartbeat stop'

Stopping High-Availability services: 

Done.

[root@snn ha.d]# service heartbeat stop

[root@snn ha.d]# mount -t nfs 192.168.1.4:/web/htdocs /mnt

[root@snn ha.d]# mount -l | grep mnt

192.168.1.4:/web/htdocs on /mnt type nfs (rw,vers=4,addr=192.168.1.4,clientaddr=192.168.1.5)

[root@snn ~]# cat /mnt/index.html 

<h1>datanode.abc.com</h1>

The test mount works.

[root@snn ~]# umount /mnt

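IX. Mounting the Filesystem Through the Resource Manager on snn

The order of the resources matters.

Configure the IP first, then the filesystem, then the service.

The filesystem must always come before the service.

[root@snn ~]# vim /etc/ha.d/haresources 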

snn.abc.com IPaddr::192.168.1.7/24/eth0 Filesystem::192.168.1.4:/web/htdocs::/var/www/html::nfs httpd

[root@snn ~]# scp /etc/ha.d/haresources datanode4:/etc/ha.d/haresources 
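
Because the order on the haresources line determines the start order, it can be useful to print it before restarting heartbeat. The sketch below lists the resources left to right for start and right to left for release, which matches the stop sequence in the standby log earlier (httpd first, then IPaddr). `start_order` and `stop_order` are hypothetical helper names.

```shell
#!/bin/sh
# start_order LINE: print the resources of one haresources LINE in start order
# (left to right, skipping the leading node name).
start_order() {
    printf '%s\n' "$1" | awk '{ for (i = 2; i <= NF; i++) print $i }'
}

# stop_order LINE: print the same resources in reverse, as used when releasing.
stop_order() {
    printf '%s\n' "$1" | awk '{ for (i = NF; i >= 2; i--) print $i }'
}
```

Running `start_order "$(grep '^snn' /etc/ha.d/haresources)"` on the line above should list IPaddr, then Filesystem, then httpd.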

X. After Starting heartbeat, Check the Log

// There is an error; the cause is explained in part 2 of this heartbeat series.

Attachment: http://down.51cto.com/data/2365804

Reposted from zouqingyun's 51CTO blog. Original link: http://blog.51cto.com/zouqingyun/1661621. To reprint, please contact the original author.