
Heartbeat split-brain caused by iptables

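Running heartbeat in production involves many details that need attention; a small oversight can leave heartbeat unable to fail over or can cause split-brain. This post describes a split-brain that was caused by iptables.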

Master: 192.168.3.218

      192.168.4.218 heartbeat IP

      usvr-218 hostname

Standby: 192.168.3.128

      192.168.4.128 heartbeat IP

      usvr-128 hostname

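Symptom: after heartbeat is started on the master, the VIP comes up on 218. When heartbeat is then started on the standby, the VIP comes up on 128 as well. Both nodes now hold the VIP (split-brain), and access to the service becomes abnormal.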

Troubleshooting steps:

1. Check the logs on the master and the standby

Log on master 218 (excerpt):

heartbeat[27330]: 2015/01/27_09:05:29 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:30 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:31 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:32 ERROR: Message hist queue is filling up (500 messages in queue)

heartbeat[27330]: 2015/01/27_09:05:33 WARN: node usvr-128: is dead

heartbeat[27330]: 2015/01/27_09:05:33 info: Cancelling pending standby operation

heartbeat[27330]: 2015/01/27_09:05:33 info: Dead node usvr-128 gave up resources.

heartbeat[27330]: 2015/01/27_09:05:33 info: all clients are now resumed

heartbeat[27330]: 2015/01/27_09:05:33 ERROR: lowseq cannnot be greater than ackseq

heartbeat[27330]: 2015/01/27_09:05:33 info: hist->ackseq =74575, old_ackseq=0

heartbeat[27330]: 2015/01/27_09:05:33 info: hist->lowseq =74576, hist->hiseq=74824, send_cluster_msg_level=1

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown: Master Control process died.

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27330 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27334 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27335 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27336 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Killing pid 27337 with SIGTERM

heartbeat[27333]: 2015/01/27_09:05:34 CRIT: Emergency Shutdown(MCP dead): Killing ourselves.

Log on standby 128 (excerpt):

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: bound receive socket to device: eth0

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: set SO_REUSEPORT(w)

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ucast: started on port 694 interface eth0 to 192.168.4.218

Jan 27 10:11:35 heartbeat: [15999]: info: glib: ping heartbeat started.

Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler

Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_TriggerHandler: Added signal manual handler

Jan 27 10:11:35 heartbeat: [15999]: info: G_main_add_SignalHandler: Added signal handler for signal 17

Jan 27 10:11:35 heartbeat: [15999]: info: Local status now set to: 'up'

Jan 27 10:11:35 heartbeat: [15999]: info: Link 192.168.3.1:192.168.3.1 up.

Jan 27 10:11:35 heartbeat: [15999]: info: Status update for node 192.168.3.1: status ping

Jan 27 10:13:35 heartbeat: [15999]: WARN: node usvr-218: is dead

Jan 27 10:13:35 heartbeat: [15999]: info: Comm_now_up(): updating status to active

Jan 27 10:13:35 heartbeat: [15999]: info: Local status now set to: 'active'

Jan 27 10:13:35 heartbeat: [15999]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (498,498)

Jan 27 10:13:35 heartbeat: [15999]: WARN: No STONITH device configured.

Jan 27 10:13:35 heartbeat: [15999]: WARN: Shared disks are not protected.

Jan 27 10:13:35 heartbeat: [15999]: info: Resources being acquired from localsv218.

As the logs show, each node concludes that the other node is dead and takes over the VIP, which produces the split-brain.
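The "is dead" warnings can be pulled out of each node's log with a small helper. This is a minimal sketch: the function name `dead_nodes` and the log path in the usage comment are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch: list the peers that a heartbeat log has declared dead.
# dead_nodes is a hypothetical helper name; pass it the path to a heartbeat log.
dead_nodes() {
    # Match lines such as "WARN: node usvr-128: is dead" and print the node name once.
    sed -n 's/.*WARN: node \([^:]*\): is dead.*/\1/p' "$1" | sort -u
}

# Example (the log path is an assumption; use the logfile configured in ha.cf):
# dead_nodes /var/log/ha-log
```

If the helper prints the peer's name on both nodes, each side has lost the other's heartbeat packets, which matches the symptom described above.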

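2. The initial diagnosis was that the two nodes could not communicate, or that network latency was too high. Could unsynchronized clocks be the cause? A clock difference normally has little effect on heartbeat, but a large offset could still cause problems, so the clocks on both nodes were synchronized.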

/usr/sbin/ntpdate ntp.api.bz&&hwclock -w

echo "0 23 * * * root (/usr/sbin/ntpdate ntp.api.bz && hwclock -w) > /dev/null 2>&1" >> /etc/crontab

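3. After the clocks were synchronized, the same errors still appeared in the logs. The configuration files on both nodes were checked again and no problems were found. The only remaining factor was that a firewall was running on both the master and the standby. Because heartbeat is configured to communicate over UDP port 694, UDP port 694 was opened in the firewall.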
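Before changing the firewall, it can help to confirm that the current INPUT chain really has no rule for the heartbeat port. The helper below is a rough sketch that reads a rule listing in `iptables -S` format from standard input. The name `allows_heartbeat` is hypothetical, and the check only looks for the presence of a matching rule, not for its position in the chain.

```shell
#!/bin/sh
# Sketch: succeed if the rule listing on stdin (in `iptables -S` format)
# contains an ACCEPT rule for UDP destination port 694.
# allows_heartbeat is a hypothetical helper name.
allows_heartbeat() {
    grep -- '-p udp' | grep -- '--dport 694' | grep -q -- '-j ACCEPT'
}

# Example (needs root):
# iptables -S INPUT | allows_heartbeat && echo "udp/694 accepted" || echo "no rule for udp/694"
```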

Add on master 218:

/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.128 --dport 694 -m comment --comment "heartbeat-slave" -j ACCEPT

Add on standby 128:

/sbin/iptables -A INPUT -i eth0 -p udp -s 192.168.4.218 --dport 694 -m comment --comment "heartbeat-master" -j ACCEPT

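Notes: 1. If the firewall policy is strict, traffic from the heartbeat IP must be allowed, otherwise UDP communication will still fail.

    2. The inbound interface (-i) must be the NIC that carries the heartbeat IP.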
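One detail related to note 1: `-A` appends a rule to the end of the chain, so if the chain already contains a broad DROP or REJECT rule, the appended rule may never be reached. Whether that applies depends on the existing ruleset. The sketch below builds an equivalent command that uses `-I INPUT 1` to place the rule first. The helper name `heartbeat_rule_cmd` is hypothetical, and it only prints the command so that it can be reviewed before it is run.

```shell
#!/bin/sh
# Sketch: print an iptables command that inserts the heartbeat rule at the top
# of INPUT (-I INPUT 1) instead of appending it (-A), so that an earlier
# DROP/REJECT rule cannot shadow it.
# heartbeat_rule_cmd is a hypothetical helper; it prints and does not execute.
heartbeat_rule_cmd() {
    peer_ip=$1   # heartbeat IP of the other node
    iface=$2     # NIC that carries the heartbeat link
    printf '/sbin/iptables -I INPUT 1 -i %s -p udp -s %s --dport 694 -j ACCEPT\n' "$iface" "$peer_ip"
}

# Example on master 218; review the printed command, then run it as root:
# heartbeat_rule_cmd 192.168.4.128 eth0
```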

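After the firewall was configured, the master and the standby could communicate normally. Under normal conditions the master node holds the VIP; when the master node goes down or its heartbeat service is stopped, the standby node takes over the VIP.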
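To confirm that only one node holds the VIP at a time, the address list on each node can be checked. The sketch below reads `ip -o -4 addr show` output from standard input. The helper name `holds_vip` is hypothetical, and the address in the usage comment is a placeholder, since the post does not state the actual VIP.

```shell
#!/bin/sh
# Sketch: succeed if the address listing on stdin (from `ip -o -4 addr show`)
# contains the given IPv4 address.
# holds_vip is a hypothetical helper name; $1 is the VIP to look for.
holds_vip() {
    # -F treats the address as a fixed string, so the dots are not regex wildcards.
    grep -qF -- " inet $1/"
}

# Example (192.168.3.200 is a placeholder, not the VIP from this setup):
# ip -o -4 addr show | holds_vip 192.168.3.200 && echo "VIP is on this node"
```

In a healthy cluster the check should succeed on exactly one of the two nodes.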
