Reliability verification of a two-node HA cluster built with Pacemaker + corosync

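In the previous post, a 3-node cluster was configured to make sure shared resources could not be corrupted. This post checks what risks appear when the cluster has only two nodes.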

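As the Pacemaker manual also describes, Pacemaker supports two ways of preventing split-brain: quorum voting and resource preemption. Quorum voting is very reliable, but it needs at least 3 votes. Setting up a dedicated third machine next to the active/standby pair just to satisfy quorum seems wasteful. For that reason some commercial cluster software (MSCS and RHCS, for example) introduces a quorum disk in addition to the nodes. The quorum disk counts as one vote; together with the 2 nodes that makes 3 votes, and a partition only needs 2 of them.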

Reference:

http://www.adirectory.blog.com/2013/01/cluster-quorum-disk/

A quorum disk requires shared storage. For many enterprise users shared storage is standard equipment, so no extra investment is needed.
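The vote arithmetic behind this can be sketched in a few lines of shell. The function names `quorum_needed` and `has_quorum` are made up for illustration and are not Pacemaker commands; the sketch only assumes that quorum means a strict majority of all votes.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Pacemaker or corosync.

# quorum_needed TOTAL: smallest vote count that is a strict majority of TOTAL.
quorum_needed() {
    total_votes=$1
    echo $(( total_votes / 2 + 1 ))
}

# has_quorum HELD TOTAL: succeeds when HELD votes form a strict majority.
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}

# Two nodes plus one quorum disk give 3 votes in total.
echo "votes needed out of 3: $(quorum_needed 3)"
# Two nodes alone give 2 votes, so an isolated node can never reach a majority.
echo "votes needed out of 2: $(quorum_needed 2)"
```

With 3 votes a node that also holds the quorum disk has 2 of 3 and wins; with only 2 votes a single isolated node holds 1 of 2 and cannot, which is why a bare two-node cluster has to ignore quorum.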

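Open-source Pacemaker, however, does not depend on shared storage, so it has no notion of a quorum disk. What about configuring a preemption resource in Pacemaker instead? We found no configuration examples, but it could conceivably be done like this:

Provide a file server (NFS, for example) and create a file on it to act as a lock file. Then write a custom RA (we do not know whether such an RA already exists) whose start action opens this file for writing and whose monitor action writes a value into the file and reads it back. Make this RA resource a dependency of the other resources; if a split-brain occurs, whichever node grabs this resource becomes the active node.

The problem with this approach is that the resource becomes a single point of failure: if the file server goes down, the HA cluster cannot come up either.

(Of course, if the HA workload inherently has a resource that only one node can ever hold, everyone is happy.)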
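A minimal sketch of such a lock resource follows. It is an illustration under assumptions, not a tested resource agent: `LOCK_DIR`, `OWNER` and the `lock_*` function names are hypothetical, and instead of merely opening a file for writing it uses `mkdir`, which fails when the directory already exists, so that only one caller can take the lock. Whether this is safe on a particular NFS setup would need to be verified separately.

```shell
#!/bin/sh
# Sketch only: a real OCF resource agent also needs meta-data, validation,
# timeouts and proper handling of stale locks.
LOCK_DIR=${LOCK_DIR:-/mnt/lockserver/ha.lock}   # assumed path on the file server
OWNER=${OWNER:-$(uname -n)}

# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# start: take the lock. mkdir fails if the directory already exists,
# so at most one node can succeed.
lock_start() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$OWNER" > "$LOCK_DIR/owner"
        return $OCF_SUCCESS
    fi
    return $OCF_ERR_GENERIC
}

# monitor: confirm this node owns the lock, then write a value and read it back.
lock_monitor() {
    [ -f "$LOCK_DIR/owner" ] || return $OCF_NOT_RUNNING
    [ "$(cat "$LOCK_DIR/owner")" = "$OWNER" ] || return $OCF_NOT_RUNNING
    stamp=$(date +%s)
    echo "$stamp" > "$LOCK_DIR/heartbeat" || return $OCF_ERR_GENERIC
    [ "$(cat "$LOCK_DIR/heartbeat")" = "$stamp" ] || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

# stop: release the lock, but only if this node holds it.
lock_stop() {
    if [ -f "$LOCK_DIR/owner" ] && [ "$(cat "$LOCK_DIR/owner")" = "$OWNER" ]; then
        rm -rf "$LOCK_DIR"
    fi
    return $OCF_SUCCESS
}
```

A node that crashes while holding the lock leaves it behind, so a real implementation would also need some expiry rule for stale locks; that part is deliberately left out here.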

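The rest of this post verifies how Pacemaker behaves with neither quorum voting nor a preemption resource: when a split-brain occurs, can a shared resource end up loaded on both nodes at the same time?

The environment from the previous post (http://blog.chinaunix.net/uid-20726500-id-4453488.html) is reused, but changed into a two-node cluster.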

Shared storage server

    OS: CentOS release 6.5 (Final)
    Hostname: disknode
    NIC 1: reserved
        Type: NAT
        IP: 192.168.152.120
    NIC 2: iSCSI traffic for the shared disk
        Type: Host-Only
        IP: 192.168.146.120
    NIC 3: communication for the external/ssh fence device
        Type: Bridged
        IP: 10.167.217.107

HA node 1

    Hostname: hanode1
    NIC 1: cluster public IP (192.168.152.200) and cluster-internal messaging
        IP: 192.168.152.130
    NIC 2 IP: 192.168.146.130
    NIC 3 IP: 10.167.217.169

HA node 2

    OS: CentOS release 6.5 (Final)
    Hostname: hanode2
    NIC 1 IP: 192.168.152.140
    NIC 2 IP: 192.168.146.140
    NIC 3 IP: 10.167.217.171

Cluster public IP

    192.168.152.200

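Starting from the already configured 3-node environment, remove disknode from cluster management.

Change the action taken when quorum is not reached to ignore:

no-quorum-policy=ignore

and change the expected number of votes:

expected-quorum-votes=2

Delete the configuration related to disknode: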

location no_iscsid rs_iscsid -inf: disknode

location votenode ClusterIP -inf: disknode
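The post applies these changes interactively in `crm configure edit`. As a hedged alternative, the same three changes could probably be applied non-interactively with crmsh's `configure property` and `configure delete` subcommands. The `run` wrapper and the `DRY_RUN` variable below are assumptions added for illustration; by default the sketch only prints the commands and changes nothing.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints each command instead of running it.
# Set DRY_RUN=0 on a cluster node to actually apply the changes.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_two_node_changes() {
    # Keep running resources even when quorum is lost.
    run crm configure property no-quorum-policy=ignore
    # Only the two HA nodes vote now.
    run crm configure property expected-quorum-votes=2
    # Drop the two location constraints that referred to disknode.
    run crm configure delete no_iscsid votenode
}

apply_two_node_changes
```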

[root@hanode1 ~]# crm configure edit

node disknode
node hanode1
node hanode2
primitive ClusterIP IPaddr2 \
        params ip=192.168.152.200 cidr_netmask=32 \
        op monitor interval=30s
primitive DataFS Filesystem \
        params device="/dev/sdc" directory="/mnt/pg" fstype=ext4 \
        op monitor interval=15s
primitive pg93 pgsql \
        meta target-role=Started is-managed=true migration-threshold=INFINITY failure-timeout=60s \
primitive rs_iscsid lsb:iscsid \
        op monitor interval=30s \
        meta target-role=Started
primitive st-ssh stonith:external/ssh \
        params hostlist="hanode1 hanode2"
group PgGroup ClusterIP rs_iscsid DataFS pg93
clone st-sshclone st-ssh
property cib-bootstrap-options: \
        dc-version=1.1.9-2a917dd \
        cluster-infrastructure="classic openais (with plugin)" \
        expected-quorum-votes=2 \
        stonith-enabled=true \
        no-quorum-policy=ignore \
        last-lrm-refresh=1409756808
#vim:set syntax=pcmk

Stop the corosync service on disknode:

[root@disknode ~]# /etc/init.d/corosync stop

Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]

Waiting for corosync services to unload:. [ OK ]

[root@disknode ~]# chkconfig corosync off

Open a terminal on hanode1, one of the remaining nodes, and remove disknode from the CIB:

[root@hanode1 ~]# crm_node -R disknode --force

The result was strange: it removed hanode2 instead, and on repeated attempts hanode2 sometimes even got rebooted.

[root@hanode1 ~]# crm status

Last updated: Fri Sep 5 23:48:50 2014

Last change: Fri Sep 5 23:47:50 2014 via crm_node on hanode1

Stack: classic openais (with plugin)

Current DC: hanode1 - partition with quorum

Version: 1.1.9-2a917dd

2 Nodes configured, 2 expected votes

6 Resources configured.

Node disknode: UNCLEAN (offline)

Online: [ hanode1 ]

Resource Group: PgGroup

ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1

rs_iscsid (lsb:iscsid): Started hanode1

DataFS (ocf::heartbeat:Filesystem): Started hanode1

pg93 (ocf::heartbeat:pgsql): Started hanode1

Clone Set: st-sshclone [st-ssh]

Started: [ hanode1 ]

Stopped: [ st-ssh:1 ]

/var/log/messages contains error messages like these:

Sep 5 21:27:53 hanode1 corosync[3827]: [pcmk ] info: pcmk_remove_member: Sent: remove-peer:disknode

Sep 5 21:27:53 hanode1 corosync[3827]: [pcmk ] ERROR: ais_get_int: Characters left over after parsing 'disknode': 'disknode'

Running the command once more removed disknode this time:

Last updated: Fri Sep 5 23:50:16 2014

Last change: Fri Sep 5 23:50:14 2014 via crm_node on hanode1

1 Nodes configured, 2 expected votes

5 Resources configured.

After the hanode2 server was rebooted as well, the status finally returned to normal.

Last updated: Sat Sep 6 00:01:16 2014

Last change: Fri Sep 5 23:57:54 2014 via crmd on hanode1

Online: [ hanode1 hanode2 ]

Started: [ hanode1 hanode2 ]

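*) The original intention was only to restart the corosync service, but the service would not stop, so the corosync process was killed instead. Just as the service was about to be started again, hanode2 turned out to have been fenced.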

Scenario 1: kill the corosync process on the active server

[root@hanode1 ~]# ps -ef|grep corosync

root 1355 1 0 00:00 ? 00:00:02 corosync

root 4606 2103 0 00:18 pts/0 00:00:00 grep corosync

[root@hanode1 ~]# kill -9 1355

hanode1 was rebooted almost immediately, and hanode2 then took over the service.

[root@hanode2 ~]# crm status

Last updated: Sat Sep 6 00:22:45 2014

Current DC: hanode2 - partition with quorum

ClusterIP (ocf::heartbeat:IPaddr2): Started hanode2

rs_iscsid (lsb:iscsid): Started hanode2

DataFS (ocf::heartbeat:Filesystem): Started hanode2

pg93 (ocf::heartbeat:pgsql): Started hanode2

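The log on hanode2 also shows that hanode2 took over the resources only after fencing had succeeded, which guarantees that the resources are never held by both nodes at the same time.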

[root@hanode2 ~]# vi /var/log/messages

Sep 6 00:15:08 hanode2 stonith-ng[1362]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for hanode1: 3daae4d8-f3f5-47bb-ac32-1a7106099eca (0)

Sep 6 00:15:13 hanode2 stonith-ng[1362]: notice: log_operation: Operation 'reboot' [2149] (call 0 from crmd.1366) for host 'hanode1' with device 'st-ssh' returned: 0 (OK)

Sep 6 00:15:13 hanode2 stonith-ng[1362]: notice: remote_op_done: Operation reboot of hanode1 by hanode2 for [email protected]: OK

...

Sep 6 00:15:14 hanode2 pengine[1365]: notice: LogActions: Start rs_iscsid#011(hanode2)

Sep 6 00:15:14 hanode2 pengine[1365]: notice: LogActions: Start DataFS#011(hanode2)

Sep 6 00:15:14 hanode2 pengine[1365]: notice: LogActions: Start pg93#011(hanode2)
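The ordering seen in this log (fencing reported OK first, resource starts planned afterwards) can be checked mechanically. The helper below is a hypothetical sketch: the name `fenced_before_start` is made up, and it assumes the log lines look like the excerpt above, that is, a `remote_op_done` line ending in `: OK` and `LogActions: Start` lines.

```shell
#!/bin/sh
# fenced_before_start LOGFILE VICTIM
# Succeeds only if a successful "Operation reboot of VICTIM" line appears
# before the first "LogActions: Start" line in LOGFILE.
fenced_before_start() {
    awk -v victim="$2" '
        # A fencing operation against the victim that finished with OK.
        index($0, "Operation reboot of " victim " by ") && /: OK[ \t]*$/ { fenced = 1 }
        # First planned resource start: remember whether fencing came earlier.
        /LogActions: Start/ { seen_start = 1; ok = fenced; exit }
        END { if (seen_start && ok) exit 0; exit 1 }
    ' "$1"
}

# Example use on the surviving node:
#   fenced_before_start /var/log/messages hanode1 && echo "fencing came first"
```

The function returns a failure status both when a start is planned before fencing succeeded and when no start line is found at all, so a success really means the expected order was observed.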

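Scenario 2: disable the heartbeat NIC of the active server, hanode2, in VMware. This is where it gets interesting: hanode1 and hanode2 killed each other at the same time.

After both nodes came back up, hanode1 was slightly quicker and hanode2 was killed. Once hanode2 was up again, it took out hanode1 in turn...

So in this scenario the two nodes simply keep fencing each other, and neither of them wins.

Scenario 2 above really shows that the heartbeat network has become a single point of failure. Redundant network equipment can be added to the heartbeat network to make it more stable. Is there really no other way? Two methods come to mind:

Method 1: connect the heartbeat links of hanode1 and hanode2 to the same router, use the router's IP address as a ping node, and let this ping node act as the arbiter.

Let us see whether this works. The router's IP is 192.168.152.2.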

primitive pingCheck ocf:pacemaker:ping \
        params name=default_ping_set host_list=192.168.152.2 multiplier=100 \
        op start timeout=60s interval=0s on-fail=restart \
        op monitor timeout=60s interval=10s on-fail=restart \
        op stop timeout=60s interval=0s on-fail=ignore
clone clnPingCheck pingCheck
location rsc_location PgGroup \
        rule $id="rsc_location-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
order rsc_orderi 0: clnPingCheck PgGroup
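To make the location rule concrete: the ping agent publishes the number of reachable hosts multiplied by `multiplier`, so with one host and `multiplier=100` the attribute `default_ping_set` is 100 when the router answers and 0 when it does not. The sketch below mimics the rule in plain shell; `ping_attr` and `pggroup_banned` are hypothetical names, and an empty string stands in for an undefined attribute.

```shell
#!/bin/sh
# ping_attr REACHABLE_HOSTS MULTIPLIER: value the ping agent would publish.
ping_attr() {
    echo $(( $1 * $2 ))
}

# pggroup_banned VALUE: succeeds when the -inf rule matches, that is when
# the attribute is undefined (empty here) or lower than 100.
pggroup_banned() {
    [ -z "$1" ] || [ "$1" -lt 100 ]
}

for reachable in 1 0; do
    value=$(ping_attr "$reachable" 100)
    if pggroup_banned "$value"; then
        echo "default_ping_set=$value: PgGroup may not run on this node"
    else
        echo "default_ping_set=$value: PgGroup may run on this node"
    fi
done
```

Note that the rule only decides where PgGroup is allowed to run; it says nothing about whether a node gets fenced.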

After the change, restart the corosync service on both nodes; the status updates after a short while.

[root@hanode1 ~]# crm_mon -Afr1

Last updated: Sat Sep 6 01:36:57 2014

Last change: Sat Sep 6 01:36:08 2014 via cibadmin on hanode1

8 Resources configured.

Full list of resources:

Clone Set: clnPingCheck [pingCheck]

Node Attributes:

* Node hanode1:

+ default_ping_set : 100

* Node hanode2:

Migration summary:

* Node hanode2:

* Node hanode1:

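Now disable the heartbeat NIC of the active server. The result is the same as before: the two machines still kill each other. The reason is that fencing acts before pingCheck does; pingCheck can only influence where resources are placed. So this method does not work.

Method 2: change stonith-action from reboot to off, so that the quicker side has a chance of coming out as the winner.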

[root@hanode1 ~]# crm_attribute --attr-name stonith-action --attr-value off

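The external/ssh stonith device used for testing here does not support poweroff, so the external/ssh script was modified for this test; the final modified script is in the appendix.

With the pingCheck added earlier removed again, the test was repeated by disabling the heartbeat NIC of the active server. Unfortunately both machines powered off.

A look at the external/ssh script showed that it sleeps for 2 seconds before powering off. With this sleep removed, the test was run again, and this time one node finally survived.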

POWEROFF_COMMAND="echo 'sleep 2; /sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"

==>

POWEROFF_COMMAND="echo '/sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
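Why removing the sleep matters can be illustrated with a toy timing model. This is a simplification invented for this sketch, not a description of how stonith-ng schedules anything: each node sends its power-off command at some moment, the target goes down `grace` milliseconds after the command is sent, and a node can only send while it is still up. ssh and `at` latency are ignored.

```shell
#!/bin/sh
# survivors T_A T_B GRACE   (all values in milliseconds, hypothetical model)
#   T_A, T_B: moment hanode1 / hanode2 sends its power-off command
#   GRACE:    delay between the command being sent and the target going down
# Prints which node is still up afterwards: hanode1, hanode2 or none.
survivors() {
    t_a=$1; t_b=$2; grace=$3
    if [ "$t_a" -eq "$t_b" ]; then
        # Both commands go out at the same moment: both nodes go down.
        echo none
    elif [ "$t_a" -lt "$t_b" ]; then
        # hanode1 shoots first; hanode2 still shoots back if it is up at T_B.
        if [ "$t_b" -lt $(( t_a + grace )) ]; then echo none; else echo hanode1; fi
    else
        if [ "$t_a" -lt $(( t_b + grace )) ]; then echo none; else echo hanode2; fi
    fi
}

echo "with sleep 2:    $(survivors 100 400 2000)"
echo "without sleep:   $(survivors 100 400 0)"
```

In this model a 2-second grace period leaves the slower node plenty of time to fire back, so both go down, while with no grace period only an exact tie takes out both nodes.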

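In a two-node cluster, a fencing device is enough to ensure that shared resources are not corrupted. Without a fencing device, a preemption resource must be configured. When the heartbeat network fails, there is no guarantee that the two-node cluster stays available, but setting stonith-action to off makes it very likely that one machine survives a split-brain.

Appendix: the modified external/ssh script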

[root@hanode1 ~]# cat /usr/lib64/stonith/plugins/external/ssh


#!/bin/sh
#
# External STONITH module for ssh.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
#

SSH_COMMAND="/usr/bin/ssh -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n -l root"
#SSH_COMMAND="/usr/bin/ssh -q -x -n -l root"

REBOOT_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Warning: If you select this poweroff command, it'll physically
# power-off the machine, and quite a number of systems won't be remotely
# revivable.
# TODO: Probably should touch a file on the server instead to just
# prevent heartbeat et al from being started after the reboot.
#POWEROFF_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
POWEROFF_COMMAND="echo '/sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"

# Rewrite the hostlist to accept "," as a delimeter for hostnames too.
hostlist=`echo $hostlist | tr ',' ' '`

# Returns 0 if the host answered every ping for about 15 seconds,
# 1 as soon as one ping fails.
is_host_up() {
  for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  do
    if
      ping -w1 -c1 "$1" >/dev/null 2>&1
    then
      sleep 1
    else
      return 1
    fi
  done
  return 0
}

# Added for the test: log every invocation of the plugin.
echo hostlist="$hostlist" para="$*" >>/var/stonith_ssh.log

case $1 in
gethosts)
    for h in $hostlist ; do
      echo $h
    done
    exit 0
    ;;
on)
    # Can't really be implemented because ssh cannot power on a system
    # when it is powered off.
    exit 1
    ;;
off)
    # Shouldn't really be implemented because if ssh cannot power on a
    # system, it shouldn't be allowed to power it off.
    # exit 1
    # ;;
    h_target=`echo $2 | tr A-Z a-z`
    for h in $hostlist
    do
      h=`echo $h | tr A-Z a-z`
      [ "$h" != "$h_target" ] &&
        continue
      if
        case ${livedangerously} in
          [Yy]*) is_host_up $h;;
          *) true;;
        esac
      then
        $SSH_COMMAND "$2" "$POWEROFF_COMMAND"
        # Good thing this is only for testing...
        if
          is_host_up $h
        then
          exit 1
        else
          exit 0
        fi
      else
        # well... Let's call it successful, after all this is only for testing...
        exit 0
      fi
    done
    exit 1
    ;;
reset)
    h_target=`echo $2 | tr A-Z a-z`
    for h in $hostlist
    do
      h=`echo $h | tr A-Z a-z`
      [ "$h" != "$h_target" ] &&
        continue
      if
        case ${livedangerously} in
          [Yy]*) is_host_up $h;;
          *) true;;
        esac
      then
        $SSH_COMMAND "$2" "$REBOOT_COMMAND"
        # Good thing this is only for testing...
        if
          is_host_up $h
        then
          exit 1
        else
          exit 0
        fi
      else
        # well... Let's call it successful, after all this is only for testing...
        exit 0
      fi
    done
    exit 1
    ;;
status)
    if
      [ -z "$hostlist" ]
    then
      exit 1
    fi
    for h in $hostlist
    do
      if
        ping -w1 -c1 "$h" 2>&1 | grep "unknown host"
      then
        exit 1
      fi
    done
    exit 0
    ;;
getconfignames)
    echo "hostlist"
    exit 0
    ;;
getinfo-devid)
    echo "ssh STONITH device"
    exit 0
    ;;
getinfo-devname)
    echo "ssh STONITH external device"
    exit 0
    ;;
getinfo-devdescr)
    echo "ssh-based host reset"
    echo "Fine for testing, but not suitable for production!"
    echo "Only reboot action supported, no poweroff, and, surprisingly enough, no poweron."
    exit 0
    ;;
getinfo-devurl)
    echo "http://openssh.org"
    exit 0
    ;;
getinfo-xml)
    cat << SSHXML
<parameters>
<parameter name="hostlist" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">
Hostlist
</shortdesc>
<longdesc lang="en">
The list of hosts that the STONITH device controls
</longdesc>
</parameter>

<parameter name="livedangerously" unique="0" required="0">
<content type="enum" />
<shortdesc lang="en">
Live Dangerously!!
</shortdesc>
<longdesc lang="en">
Set to "yes" if you want to risk your system's integrity.
Of course, since this plugin isn't for production, using it
in production at all is a bad idea. On the other hand,
setting this parameter to yes makes it an even worse idea.
Viva la Vida Loca!
</longdesc>
</parameter>
</parameters>
SSHXML
    exit 0
    ;;
*)
    exit 1
    ;;
esac
