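In the previous post, I configured a 3-node cluster so that the shared resource could not be corrupted. This post examines what risks remain when the cluster has only two nodes.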
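The Pacemaker manual describes this as well: Pacemaker supports two ways of preventing split brain, quorum (voting) and resource-based arbitration (nodes race for a resource that only one of them can hold). Quorum voting is very reliable, but it needs at least 3 votes, and dedicating a third machine beyond the primary/standby pair just to satisfy quorum seems wasteful. That is why some commercial cluster software (e.g. MSCS, RHCS) adds a quorum disk in addition to the nodes. The quorum disk counts as one vote; together with the 2 nodes there are 3 votes, and a partition needs only 2 of them.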
Reference:
http://www.adirectory.blog.com/2013/01/cluster-quorum-disk/
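A quorum disk requires shared storage, but for many enterprise users shared storage is standard equipment anyway, so no extra investment is needed.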
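The vote arithmetic behind this can be sketched in a few lines of shell. This assumes a simplified majority rule (a partition has quorum when it holds more than half of the expected votes) purely for illustration; it is not corosync's or Pacemaker's actual code, and `has_quorum` is a hypothetical name.

```sh
#!/bin/sh
# Simplified majority rule (illustration only, not corosync's implementation):
# a partition has quorum when its votes are more than half of the expected votes.
has_quorum() {
    votes=$1      # votes held by this partition
    expected=$2   # total expected votes in the cluster
    # Integer form of "votes > expected / 2", avoiding fractions.
    [ $((votes * 2)) -gt "$expected" ]
}

# Two nodes, no quorum disk: a lone survivor holds 1 of 2 votes.
if has_quorum 1 2; then echo "1/2: quorum"; else echo "1/2: no quorum"; fi
# Two nodes plus a quorum disk: the node that still reaches the disk holds 2 of 3.
if has_quorum 2 3; then echo "2/3: quorum"; else echo "2/3: no quorum"; fi
```

This prints `1/2: no quorum` and then `2/3: quorum`, which is why a two-node cluster needs either an extra vote (a third node or a quorum disk) or `no-quorum-policy=ignore`.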
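Open-source Pacemaker, however, does not depend on shared storage, so it has no notion of a quorum disk. What about configuring a contended resource in Pacemaker instead? I could not find an example configuration, but I imagine it could be done like this: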
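Provide a file server (e.g. NFS) and create a file on it to act as a lock file. Then write a custom RA (I don't know whether a ready-made RA like this exists) that opens this file for writing on start, and on every monitor operation writes a value into the file and reads it back. Make this RA resource a dependency of the other resources; if split brain occurs, whichever node grabs this resource becomes the primary.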
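The idea can be sketched as a small shell resource agent. This is a hypothetical agent, not an existing RA: the lock path `/mnt/nfs/ha_lock` and the `NODE`/`LOCK_DIR` variables are assumptions. It also swaps "open the file for writing" for an atomic `mkdir`, because merely opening a file does not stop a second node from opening it too; `mkdir` is generally treated as atomic, including on NFS. Exit codes follow the OCF convention (0 success, 1 generic error, 3 unimplemented, 7 not running). A real agent would additionally need `meta-data` and `validate-all` actions.

```sh
#!/bin/sh
# Hypothetical "file lock" resource agent sketch (not an existing RA).
# LOCK_DIR would live on the shared NFS export; the default path is an assumption.
LOCK_DIR="${LOCK_DIR:-/mnt/nfs/ha_lock}"
NODE="${NODE:-$(uname -n)}"

OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Print the node that currently owns the lock (empty if none).
lock_owner() {
    cat "$LOCK_DIR/owner" 2>/dev/null
}

lock_start() {
    # mkdir either creates the directory or fails: only one node can win.
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "$NODE" > "$LOCK_DIR/owner" || return $OCF_ERR_GENERIC
        return $OCF_SUCCESS
    fi
    # Already held: succeed only if we are the owner (start must be idempotent).
    [ "$(lock_owner)" = "$NODE" ] && return $OCF_SUCCESS
    return $OCF_ERR_GENERIC
}

lock_monitor() {
    [ "$(lock_owner)" = "$NODE" ] || return $OCF_NOT_RUNNING
    # As in the text: write a value and read it back to prove the server is reachable.
    stamp="$NODE $(date +%s)"
    echo "$stamp" > "$LOCK_DIR/heartbeat" 2>/dev/null || return $OCF_ERR_GENERIC
    [ "$(cat "$LOCK_DIR/heartbeat" 2>/dev/null)" = "$stamp" ] || return $OCF_ERR_GENERIC
    return $OCF_SUCCESS
}

lock_stop() {
    # Release only a lock we own; stopping an unheld lock is a successful no-op.
    if [ "$(lock_owner)" = "$NODE" ]; then
        rm -rf "$LOCK_DIR" || return $OCF_ERR_GENERIC
    fi
    return $OCF_SUCCESS
}

# Dispatch only when called with an action (e.g. "start") as the first argument.
if [ "$#" -gt 0 ]; then
    case "$1" in
        start)   lock_start ;;
        stop)    lock_stop ;;
        monitor) lock_monitor ;;
        *)       exit 3 ;;  # OCF_ERR_UNIMPLEMENTED
    esac
    exit $?
fi
```

A lock left behind by a crashed owner would have to be cleared by hand, which is a weakness on top of the file server itself being a single point of failure.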
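The problem with this approach is that the resource becomes a single point of failure: if the file server goes down, the HA cluster cannot start either.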
(Of course, if the HA service itself naturally has a resource that only one node can ever acquire, all is well.)
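Next, let's verify what Pacemaker does with neither quorum nor a contended resource. When split brain occurs, can a shared resource end up mounted on both nodes at the same time?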
I reused the environment from the previous post (http://blog.chinaunix.net/uid-20726500-id-4453488.html), but turned it into a two-node cluster.
Shared storage server
OS: CentOS release 6.5 (Final)
Hostname: disknode
NIC 1: reserved; type NAT; IP 192.168.152.120
NIC 2: iSCSI traffic for the shared disk; type Host-Only; IP 192.168.146.120
NIC 3: traffic for the external/ssh fence device; type bridged; IP 10.167.217.107
HA node 1
Hostname: hanode1
IP: 192.168.152.130 (used for the cluster public IP 192.168.152.200 and internal cluster messaging)
IP: 192.168.146.130
IP: 10.167.217.169
HA node 2
OS: CentOS release 6.5 (Final)
Hostname: hanode2
IP: 192.168.152.140
IP: 192.168.146.140
IP: 10.167.217.171
Cluster public IP
192.168.152.200
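Starting from the existing 3-node setup, remove disknode from cluster management.
Set the action taken when quorum is not reached to "ignore":
no-quorum-policy=ignore
and change the expected number of quorum votes:
expected-quorum-votes=2
Delete the disknode-related constraints:
location no_iscsid rs_iscsid -inf: disknode
location votenode ClusterIP -inf: disknode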
[root@hanode1 ~]# crm configure edit
node disknode
node hanode1
node hanode2
primitive ClusterIP IPaddr2 \
	params ip=192.168.152.200 cidr_netmask=32 \
	op monitor interval=30s
primitive DataFS Filesystem \
	params device="/dev/sdc" directory="/mnt/pg" fstype=ext4 \
	op monitor interval=15s
primitive pg93 pgsql \
	meta target-role=Started is-managed=true migration-threshold=INFINITY failure-timeout=60s
primitive rs_iscsid lsb:iscsid \
	op monitor interval=30s \
	meta target-role=Started
primitive st-ssh stonith:external/ssh \
	params hostlist="hanode1 hanode2"
group PgGroup ClusterIP rs_iscsid DataFS pg93
clone st-sshclone st-ssh
property cib-bootstrap-options: \
	dc-version=1.1.9-2a917dd \
	cluster-infrastructure="classic openais (with plugin)" \
	expected-quorum-votes=2 \
	stonith-enabled=true \
	no-quorum-policy=ignore \
	last-lrm-refresh=1409756808
#vim:set syntax=pcmk
Stop the corosync service on disknode:
[root@disknode ~]# /etc/init.d/corosync stop
Signaling Corosync Cluster Engine (corosync) to terminate: [ OK ]
Waiting for corosync services to unload:. [ OK ]
[root@disknode ~]# chkconfig corosync off
Log in to one of the remaining nodes, hanode1, and remove disknode from the CIB:
[root@hanode1 ~]# crm_node -R disknode --force
Strangely, it removed hanode2 instead. After a few more attempts I also ran into hanode2 being rebooted (*).
[root@hanode1 ~]# crm status
Last updated: Fri Sep 5 23:48:50 2014
Last change: Fri Sep 5 23:47:50 2014 via crm_node on hanode1
Stack: classic openais (with plugin)
Current DC: hanode1 - partition with quorum
Version: 1.1.9-2a917dd
2 Nodes configured, 2 expected votes
6 Resources configured.
Node disknode: UNCLEAN (offline)
Online: [ hanode1 ]
Resource Group: PgGroup
ClusterIP (ocf::heartbeat:IPaddr2): Started hanode1
rs_iscsid (lsb:iscsid): Started hanode1
DataFS (ocf::heartbeat:Filesystem): Started hanode1
pg93 (ocf::heartbeat:pgsql): Started hanode1
Clone Set: st-sshclone [st-ssh]
Started: [ hanode1 ]
Stopped: [ st-ssh:1 ]
/var/log/messages contained error messages like these:
Sep 5 21:27:53 hanode1 corosync[3827]: [pcmk ] info: pcmk_remove_member: Sent: remove-peer:disknode
Sep 5 21:27:53 hanode1 corosync[3827]: [pcmk ] ERROR: ais_get_int: Characters left over after parsing 'disknode': 'disknode'
I ran the command once more, and this time disknode was removed:
Last updated: Fri Sep 5 23:50:16 2014
Last change: Fri Sep 5 23:50:14 2014 via crm_node on hanode1
1 Nodes configured, 2 expected votes
5 Resources configured.
After rebooting the hanode2 server, the status was finally normal:
Last updated: Sat Sep 6 00:01:16 2014
Last change: Fri Sep 5 23:57:54 2014 via crmd on hanode1
Online: [ hanode1 hanode2 ]
Started: [ hanode1 hanode2 ]
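(*) I originally only wanted to restart the corosync service, but it would not stop, so I killed the corosync process. Just as I was about to start corosync again, I found that hanode2 had been fenced.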
Scenario 1: kill the corosync process on the primary server
[root@hanode1 ~]# ps -ef|grep corosync
root 1355 1 0 00:00 ? 00:00:02 corosync
root 4606 2103 0 00:18 pts/0 00:00:00 grep corosync
[root@hanode1 ~]# kill -9 1355
hanode1 was rebooted immediately, and hanode2 then took over the service:
[root@hanode2 ~]# crm status
Last updated: Sat Sep 6 00:22:45 2014
Current DC: hanode2 - partition with quorum
ClusterIP (ocf::heartbeat:IPaddr2): Started hanode2
rs_iscsid (lsb:iscsid): Started hanode2
DataFS (ocf::heartbeat:Filesystem): Started hanode2
pg93 (ocf::heartbeat:pgsql): Started hanode2
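The log on hanode2 also shows that hanode2 took over the resources only after fencing had succeeded, which guarantees the resources are never held by both nodes at once.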
[root@hanode2 ~]# vi /var/log/messages
Sep 6 00:15:08 hanode2 stonith-ng[1362]: notice: initiate_remote_stonith_op: Initiating remote operation reboot for hanode1: 3daae4d8-f3f5-47bb-ac32-1a7106099eca (0)
Sep 6 00:15:13 hanode2 stonith-ng[1362]: notice: log_operation: Operation 'reboot' [2149] (call 0 from crmd.1366) for host 'hanode1' with device 'st-ssh' returned: 0 (OK)
Sep 6 00:15:13 hanode2 stonith-ng[1362]: notice: remote_op_done: Operation reboot of hanode1 by hanode2 for crmd.1366@hanode2.3daae4d8: OK
...
Sep 6 00:15:14 hanode2 pengine[1365]: notice: LogActions: Start rs_iscsid#011(hanode2)
Sep 6 00:15:14 hanode2 pengine[1365]: notice: LogActions: Start DataFS#011(hanode2)
Sep 6 00:15:14 hanode2 pengine[1365]: notice: LogActions: Start pg93#011(hanode2)
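The same check can be scripted. The helper below is hypothetical (`fenced_before_takeover` is not a Pacemaker tool): it scans a syslog file for the first successful `remote_op_done` line against a given peer and the first `LogActions: Start` line, and succeeds only if fencing came first. The patterns are taken from the log lines above and assume this Pacemaker 1.1 log format; it is a naive line-order check that ignores which transition each line belongs to.

```sh
#!/bin/sh
# Hypothetical helper: succeed if fencing of $2 was logged as OK before the
# policy engine scheduled the first resource start in log file $1.
fenced_before_takeover() {
    log=$1
    peer=$2
    awk -v peer="$peer" '
        # First successful fencing operation (reboot or off) against the peer.
        fence == 0 && /remote_op_done: Operation (reboot|off) of / && $0 ~ ("of " peer " by ") && /: OK$/ { fence = NR }
        # First resource start scheduled by pengine.
        start == 0 && /LogActions: Start / { start = NR }
        # Exit 0 only if both were found and fencing came first.
        END { exit !(fence > 0 && start > 0 && fence < start) }
    ' "$log"
}
```

Usage on hanode2: `fenced_before_takeover /var/log/messages hanode1 && echo "fencing preceded takeover"`.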
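Scenario 2: in VMware, disable the heartbeat NIC of the primary server, hanode2. Now things get interesting: hanode1 and hanode2 killed each other at the same time.
After both came back up, hanode1 was slightly faster and hanode2 was killed. When hanode2 came back up, it killed hanode1 in turn...
So in this scenario the two nodes just keep shooting each other, and neither ends up the winner.
In scenario 2 the heartbeat network is really a single point of failure; adding redundant network devices to the heartbeat network would make it more reliable. Is there any other approach? I can think of two:
Method 1: connect the heartbeat links of hanode1 and hanode2 to the same router, and use that router's IP address as a ping node to act as the arbiter.
Let's see whether this works. The router's IP is 192.168.152.2: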
primitive pingCheck ocf:pacemaker:ping \
	params name=default_ping_set host_list=192.168.152.2 multiplier=100 \
	op start timeout=60s interval=0s on-fail=restart \
	op monitor timeout=60s interval=10s on-fail=restart \
	op stop timeout=60s interval=0s on-fail=ignore
clone clnPingCheck pingCheck
location rsc_location PgGroup \
	rule $id="rsc_location-rule" -inf: not_defined default_ping_set or default_ping_set lt 100
order rsc_orderi 0: clnPingCheck PgGroup
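What this configuration decides can be modelled in shell. This is a simplified model for illustration, not the `ocf:pacemaker:ping` agent's code (the real agent also handles options such as `dampen`); `ping_set_value` and `rule_score` are hypothetical names. The ping clone sets `default_ping_set` to multiplier times the number of reachable hosts, and the location rule bans PgGroup with `-INFINITY` where that attribute is undefined or below 100.

```sh
#!/bin/sh
# Simplified model of the configuration above (illustration only).

# Count reachable hosts with one ping each and scale by the multiplier,
# e.g. ping_set_value 100 192.168.152.2 -> 100 if the router answers.
ping_set_value() {
    multiplier=$1
    shift
    reachable=0
    for host in "$@"; do
        if ping -c1 -w1 "$host" >/dev/null 2>&1; then
            reachable=$((reachable + 1))
        fi
    done
    echo $((multiplier * reachable))
}

# Score the location rule contributes for a given attribute value
# (empty string = attribute not defined on that node).
rule_score() {
    value=$1
    if [ -z "$value" ] || [ "$value" -lt 100 ]; then
        echo "-INFINITY"
    else
        echo "0"   # rule does not match: no ban from this constraint
    fi
}
```

Note that this only influences where PgGroup may run; it plays no part in deciding which node gets fenced.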
After the change, restart the corosync service on both nodes; after a while the status is updated:
[root@hanode1 ~]# crm_mon -Afr1
Last updated: Sat Sep 6 01:36:57 2014
Last change: Sat Sep 6 01:36:08 2014 via cibadmin on hanode1
8 Resources configured.
Full list of resources:
Clone Set: clnPingCheck [pingCheck]
Node Attributes:
* Node hanode1:
+ default_ping_set : 100
* Node hanode2:
Migration summary:
* Node hanode2:
* Node hanode1:
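Now disable the primary's heartbeat NIC. The result is the same as before: the two machines still kill each other. The reason is that fencing kicks in before pingCheck does; pingCheck can only influence where resources are placed. So this method does not work.
Method 2: change stonith-action from reboot to off, so that the node that fires first has a chance of being the winner.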
[root@hanode1 ~]# crm_attribute --attr-name stonith-action --attr-value off
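The external/ssh stonith device used here for testing does not support poweroff, so I modified the external/ssh script for this test; the final modified script is in the appendix.
Then I removed the pingCheck added earlier and tried again by disabling the primary's heartbeat NIC. Unfortunately, both machines were powered off.
Looking at the external/ssh script, it sleeps 2 seconds before powering off. I removed that sleep and tried again, and this time one node finally survived: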
POWEROFF_COMMAND="echo 'sleep 2; /sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
changed to:
POWEROFF_COMMAND="echo '/sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
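Why the 2-second sleep matters can be shown with a simplified race model. This is an assumption for illustration, not measured data, and `fence_race` is a hypothetical name: node A decides to fence B at time `tA`, node B decides to fence A at `tB` (milliseconds, assuming `tA <= tB`), and a fence request takes effect `delay` ms after it is issued. A node can only issue its request while it is still alive.

```sh
#!/bin/sh
# Simplified two-node fencing race (illustration only).
# Arguments: tA tB delay, in milliseconds, with tA <= tB.
fence_race() {
    tA=$1; tB=$2; delay=$3
    b_dies_at=$((tA + delay))
    if [ "$tB" -le "$b_dies_at" ]; then
        # B was still alive when it decided, so its request also goes out.
        echo "both powered off"
    else
        # B was already gone before it could fire: A survives.
        echo "A survives"
    fi
}

fence_race 0 500 2000   # with "sleep 2": prints "both powered off"
fence_race 0 500 0      # without the sleep: prints "A survives"
```

Removing the sleep shrinks `delay`, so the node that fires first is more likely to take the other down before it can respond. In this model the outcome still depends on the two nodes not deciding at the same instant.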
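In a two-node cluster, a fencing device alone is enough to keep shared resources from being corrupted. Without a fencing device, a contended resource must be configured. When the heartbeat network fails, there is no guarantee that the two-node cluster stays available, but setting stonith-action to off makes it quite likely that one machine is still alive after a split brain.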
Appendix: the modified external/ssh script
[root@hanode1 ~]# cat /usr/lib64/stonith/plugins/external/ssh
#!/bin/sh
#
# External STONITH module for ssh.
# Copyright (c) 2004 SUSE LINUX AG - Lars Marowsky-Bree
# This program is free software; you can redistribute it and/or modify
# it under the terms of version 2 of the GNU General Public License as
# published by the Free Software Foundation.
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# Further, this software is distributed without any warranty that it is
# free of the rightful claim of any third person regarding infringement
# or the like. Any license provided herein, whether implied or
# otherwise, applies only to this software file. Patent licenses, if
# any, provided herein do not apply to combinations of this program with
# other software, or any other product whatsoever.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write the Free Software Foundation,
# Inc., 59 Temple Place - Suite 330, Boston MA 02111-1307, USA.
SSH_COMMAND="/usr/bin/ssh -q -x -o PasswordAuthentication=no -o StrictHostKeyChecking=no -n -l root"
#SSH_COMMAND="/usr/bin/ssh -q -x -n -l root"
REBOOT_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Warning: If you select this poweroff command, it'll physically
# power-off the machine, and quite a number of systems won't be remotely
# revivable.
# TODO: Probably should touch a file on the server instead to just
# prevent heartbeat et al from being started after the reboot.
#POWEROFF_COMMAND="echo 'sleep 2; /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Modified for this test: a real power-off, issued without the 2-second sleep.
POWEROFF_COMMAND="echo '/sbin/poweroff -nf' | SHELL=/bin/sh at now >/dev/null 2>&1"
# Rewrite the hostlist to accept "," as a delimeter for hostnames too.
hostlist=`echo $hostlist | tr ',' ' '`
is_host_up() {
	for j in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
	do
		if
			ping -w1 -c1 "$1" >/dev/null 2>&1
		then
			sleep 1
		else
			return 1
		fi
	done
	return 0
}
# Log every invocation and its arguments (useful while testing).
echo hostlist="$hostlist" para="$*" >>/var/stonith_ssh.log
case $1 in
gethosts)
	for h in $hostlist ; do
		echo $h
	done
	exit 0
	;;
on)
	# Can't really be implemented because ssh cannot power on a system
	# when it is powered off.
	exit 1
	;;
off)
	# Shouldn't really be implemented because if ssh cannot power on a
	# system, it shouldn't be allowed to power it off.
	# exit 1
	# ;;
	h_target=`echo $2 | tr A-Z a-z`
	for h in $hostlist
	do
		h=`echo $h | tr A-Z a-z`
		[ "$h" != "$h_target" ] &&
			continue
		if
			case ${livedangerously} in
			[Yy]*) is_host_up $h;;
			*) true;;
			esac
		then
			$SSH_COMMAND "$2" "$POWEROFF_COMMAND"
			# Good thing this is only for testing...
			if
				is_host_up $h
			then
				exit 1
			else
				exit 0
			fi
		else
			# well... Let's call it successful, after all this is only for testing...
			exit 0
		fi
	done
	exit 1
	;;
reset)
	h_target=`echo $2 | tr A-Z a-z`
	for h in $hostlist
	do
		h=`echo $h | tr A-Z a-z`
		[ "$h" != "$h_target" ] &&
			continue
		if
			case ${livedangerously} in
			[Yy]*) is_host_up $h;;
			*) true;;
			esac
		then
			$SSH_COMMAND "$2" "$REBOOT_COMMAND"
			# Good thing this is only for testing...
			if
				is_host_up $h
			then
				exit 1
			else
				exit 0
			fi
		else
			# well... Let's call it successful, after all this is only for testing...
			exit 0
		fi
	done
	exit 1
	;;
status)
	if
		[ -z "$hostlist" ]
	then
		exit 1
	fi
	for h in $hostlist
	do
		if
			ping -w1 -c1 "$h" 2>&1 | grep "unknown host"
		then
			exit 1
		fi
	done
	exit 0
	;;
getconfignames)
	echo "hostlist"
	exit 0
	;;
getinfo-devid)
	echo "ssh STONITH device"
	exit 0
	;;
getinfo-devname)
	echo "ssh STONITH external device"
	exit 0
	;;
getinfo-devdescr)
	echo "ssh-based host reset"
	echo "Fine for testing, but not suitable for production!"
	echo "Only reboot action supported, no poweroff, and, surprisingly enough, no poweron."
	exit 0
	;;
getinfo-devurl)
	echo "http://openssh.org"
	exit 0
	;;
getinfo-xml)
	cat << SSHXML
<parameters>
<parameter name="hostlist" unique="1" required="1">
<content type="string" />
<shortdesc lang="en">
Hostlist
</shortdesc>
<longdesc lang="en">
The list of hosts that the STONITH device controls
</longdesc>
</parameter>
<parameter name="livedangerously" unique="0" required="0">
<content type="enum" />
<shortdesc lang="en">
Live Dangerously!!
</shortdesc>
<longdesc lang="en">
Set to "yes" if you want to risk your system's integrity.
Of course, since this plugin isn't for production, using it
in production at all is a bad idea. On the other hand,
setting this parameter to yes makes it an even worse idea.
Viva la Vida Loca!
</longdesc>
</parameter>
</parameters>
SSHXML
	exit 0
	;;
*)
	exit 1
	;;
esac