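Automatic high-availability failover for Redis with keepalived
Install redis and keepalived on both server A (10.0.11.2) and server B (10.0.12.2) (installation steps omitted).
A is the default master and B the slave (just add SLAVEOF 10.0.11.2 6379 to the Redis configuration file on B).
Enable persistence in Redis on both A and B: appendonly yes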
Configuration on server A
keepalived configuration file
-------begin------
! Configuration File for keepalived
global_defs {
lvs_id LVS_redis
}
vrrp_script chk_redis {
script "/opt/redis/sh/redis_check.sh"
weight -20
interval 2
}
vrrp_instance VI_1 {
state BACKUP
#state MASTER
interface bond0
virtual_router_id 51
nopreempt
priority 200
advert_int 5
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
chk_redis
}
virtual_ipaddress {
10.0.11.0
}
notify_master /opt/redis/sh/redis_master.sh
notify_backup /opt/redis/sh/redis_backup.sh
notify_fault /opt/redis/sh/redis_fault.sh
notify_stop /opt/redis/sh/redis_stop.sh
}
-----end-----
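Notes:
The mail settings in global_defs can be anything; fill in real values only if you want e-mail notifications.
script "/opt/redis/sh/redis_check.sh" # path of the health-check script
weight -20 # when the Redis check fails, the priority drops by 20; the priority change triggers a keepalived state transition and the VIP moves with it
interval 2 # how often the check runs, in seconds
state MASTER # default initial state
state BACKUP # backup state; here both servers are set to BACKUP
nopreempt # do not preempt; priority alone decides who is master
interface bond0 # NIC name
virtual_router_id 51 # must be the same on A and B
priority 200 # priority; just set it higher than B's
advert_int 5 # VRRP advertisement interval, in seconds
authentication { # must be the same on A and B
auth_type PASS
auth_pass 1111
}
track_script { # name of the check defined above
chk_redis
}
virtual_ipaddress { # virtual IP; clients use this IP to reach Redis
10.0.11.0
}
# scripts run on each state change
notify_master /opt/redis/sh/redis_master.sh # on becoming master
notify_backup /opt/redis/sh/redis_backup.sh # on becoming backup
notify_fault /opt/redis/sh/redis_fault.sh # on entering the FAULT state
notify_stop /opt/redis/sh/redis_stop.sh # when the keepalived service stops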
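To make the effect of weight -20 concrete, here is a minimal sketch of the priority arithmetic, using the priorities from the two configurations in this post (200 for A, 190 for B). The function name effective_priority is made up for illustration. It only models a negative weight, where a failing check lowers the priority, and it is not how keepalived itself is implemented.

```shell
#!/bin/bash
# effective_priority BASE WEIGHT CHECK_STATUS
# Hypothetical helper: models only a negative vrrp_script weight,
# where a failing check (non-zero status) lowers the base priority.
effective_priority() {
    local base=$1 weight=$2 status=$3
    if [ "$status" -ne 0 ]; then
        echo $(( base + weight ))   # weight is negative, e.g. -20
    else
        echo "$base"
    fi
}

# A's Redis is down, so A's check fails.
a_prio=$(effective_priority 200 -20 1)
# B's check passes in that situation.
b_prio=$(effective_priority 190 -20 0)
echo "A=$a_prio B=$b_prio"
```

With A at 180 and B at 190, B has the higher priority and takes over the VIP.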
/opt/redis/sh/redis_check.sh
--begin--
#!/bin/bash
# Ask the local Redis for a PING reply; anything other than PONG counts as a failure.
ALIVE=$(/usr/local/bin/redis-cli PING)
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
if [ "$ALIVE" == "PONG" ]; then
    echo "$ALIVE"
    #echo "check master pong" >> "$LOGFILE"
    exit 0
else
    echo "$ALIVE"
    exit 1
fi
--end--
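The check on A and the one on B differ mainly in which host they ping, so the PING test itself can be factored into one function. The sketch below is an illustration, not a replacement for the scripts in this post. ping_ok is a hypothetical name, and it takes the command to run as its arguments so the same function covers a local or a remote redis-cli call.

```shell
#!/bin/bash
# ping_ok CMD [ARGS...]
# Hypothetical helper: runs the given command and succeeds only if
# its output is exactly PONG (a trailing carriage return is ignored).
ping_ok() {
    local reply
    reply=$("$@" 2>/dev/null | tr -d '\r')
    [ "$reply" = "PONG" ]
}

# Example calls (commented out so the sketch has no side effects):
# ping_ok /usr/local/bin/redis-cli PING || exit 1
# ping_ok /usr/local/bin/redis-cli -h 10.0.11.2 -p 6379 PING || exit 1
```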
/opt/redis/sh/redis_master.sh
--begin--
#!/bin/bash
REDISCLI="/usr/local/bin/redis-cli"
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[master]" >> $LOGFILE
date >> $LOGFILE
echo "Being master...." >> $LOGFILE 2>&1
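# First sync from B (the master while A was down) so that no writes are lost.
echo "Run SLAVEOF cmd ..." >> $LOGFILE
$REDISCLI SLAVEOF 10.0.12.2 6379 >> $LOGFILE 2>&1
# Leave time for the sync to finish before promoting this node.
sleep 15
echo "Run SLAVEOF NO ONE cmd ..." >> $LOGFILE
$REDISCLI SLAVEOF NO ONE >> $LOGFILE 2>&1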
--end--
/opt/redis/sh/redis_backup.sh
--begin--
#!/bin/bash
REDISCLI="/usr/local/bin/redis-cli"
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[backup]" >> $LOGFILE
date >> $LOGFILE
echo "Being slave...." >> $LOGFILE 2>&1
sleep 15
echo "Run SLAVEOF cmd..." >> $LOGFILE
$REDISCLI SLAVEOF 10.0.12.2 6379 >> $LOGFILE 2>&1
--end--
/opt/redis/sh/redis_fault.sh
--begin--
#!/bin/bash
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[fault]" >> $LOGFILE
date >> $LOGFILE
sh /opt/redis/sh/redis_backup.sh
--end--
/opt/redis/sh/redis_stop.sh
--begin--
#!/bin/bash
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[stop]" >> $LOGFILE
date >> $LOGFILE
sh /opt/redis/sh/redis_backup.sh
--end--
Configuration on server B
keepalived configuration file
-------begin------
! Configuration File for keepalived
global_defs {
lvs_id LVS_redis
}
vrrp_script chk_redis {
script "/opt/redis/sh/redis_check.sh"
weight -20
interval 2
}
vrrp_instance VI_1 {
state BACKUP
interface bond0
virtual_router_id 51
priority 190
advert_int 5
authentication {
auth_type PASS
auth_pass 1111
}
track_script {
chk_redis
}
virtual_ipaddress {
10.0.11.0
}
notify_master /opt/redis/sh/redis_master.sh
notify_backup /opt/redis/sh/redis_backup.sh
notify_fault /opt/redis/sh/redis_fault.sh
notify_stop /opt/redis/sh/redis_stop.sh
}
-----end-----
/opt/redis/sh/redis_check.sh
--begin--
#!/bin/bash
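# B pings A's Redis, not its own. Note that the test is inverted relative to A's script:
# while A answers PONG this check fails, so B's priority stays lowered (weight -20);
# once A stops answering, the check passes and B runs at its full priority.
ALIVE=$(/usr/local/bin/redis-cli -h 10.0.11.2 -p 6379 PING)
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
if [ "$ALIVE" != "PONG" ]; then
    echo "$ALIVE"
    exit 0
else
    echo "$ALIVE"
    exit 1
fi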
--end--
/opt/redis/sh/redis_master.sh
--begin--
#!/bin/bash
REDISCLI="/usr/local/bin/redis-cli"
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[master]" >> $LOGFILE
date >> $LOGFILE
echo "Being master...." >> $LOGFILE 2>&1
##echo "master Run SLAVEOF 10.0.11.2 cmd ..." >> $LOGFILE
##REDISCLI SLAVEOF 10.0.11.2 6379 >> $LOGFILE 2>&1
#sleep 10
echo "Run SLAVEOF NO ONE cmd ..." >> $LOGFILE
$REDISCLI SLAVEOF NO ONE >> $LOGFILE 2>&1
--end--
/opt/redis/sh/redis_backup.sh
--begin--
#!/bin/bash
REDISCLI="/usr/local/bin/redis-cli"
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[backup]" >> $LOGFILE
date >> $LOGFILE
echo "Being slave...." >> $LOGFILE 2>&1
#sleep 10
echo "backup Run SLAVEOF 10.0.11.2 cmd..." >> $LOGFILE
$REDISCLI SLAVEOF 10.0.11.2 6379 >> $LOGFILE 2>&1
--end--
/opt/redis/sh/redis_fault.sh
--begin--
#!/bin/bash
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[fault]" >> $LOGFILE
date >> $LOGFILE
sh /opt/redis/sh/redis_backup.sh
--end--
/opt/redis/sh/redis_stop.sh
--begin--
#!/bin/bash
LOGFILE="/opt/redis/logs/keepalived-redis-state.log"
echo "[stop]" >> $LOGFILE
date >> $LOGFILE
sh /opt/redis/sh/redis_backup.sh
--end--
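Script notes:
The logic of the scripts: while the Redis service is healthy on both A and B, A is master and B is slave.
If A's service is detected as unhealthy, B becomes master. The command "/usr/local/bin/redis-cli SLAVEOF NO ONE" stops replication and turns the instance into a Redis master.
Once A's service is back, A switches back to master, syncing the latest data from B before it does so. At the same time, "/usr/local/bin/redis-cli SLAVEOF 10.0.11.2 6379" has to be run on B
so that B becomes A's slave again; otherwise B stays a master.
In a Redis master/slave setup, "/usr/local/bin/redis-cli -h 10.0.11.2 INFO" shows an instance's current state, including whether that server is currently master or slave.
On the command line:
"tail -30 /opt/redis/logs/keepalived-redis-state.log" shows keepalived's state transitions.
"tail -30 /var/log/messages" shows changes to the keepalived virtual IP.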
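Reading the role out of the full INFO output by eye gets tedious during failover tests. As a small sketch, the function below extracts just the role line from INFO text on standard input. redis_role is a hypothetical name, and the sample text in the last line stands in for real redis-cli output.

```shell
#!/bin/bash
# redis_role: read Redis INFO output on stdin and print the value
# of the "role:" line (master or slave). Hypothetical helper name.
redis_role() {
    tr -d '\r' | awk -F: '$1 == "role" { print $2; exit }'
}

# Real use would pipe redis-cli into it, for example:
# /usr/local/bin/redis-cli -h 10.0.11.2 INFO replication | redis_role

# Sample input standing in for real output:
printf 'role:slave\r\nmaster_host:10.0.11.2\r\n' | redis_role
```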
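The above has been tested in a real production environment. Some parts may not be entirely sound, but failover works and no data is lost. I previously followed tutorials found online; the VIP switched over, but the data handling had problems, so I changed the setup to this.
If you know a better way, please let me know. Thanks!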
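How keepalived works
By default keepalived only monitors the network and keepalived itself, that is, it fails over when the network fails or keepalived itself has a problem. What we care about more, though, is the service running on the machine: if the service fails and the VIP does not move, the setup as a whole has still failed. Whether to switch between master and backup therefore has to depend on the state of the service process. Fortunately keepalived provides a custom script monitoring feature, which is used here to control failover for the service.
Overall approach:
keepalived's custom script feature monitors the state of the local Redis service. When the check script detects that Redis is unhealthy, the priority of the local keepalived changes, which in turn changes the master/backup roles. keepalived also runs the corresponding scripts on a role change, which gives us the chance to change Redis's master/slave state. The goal is to keep the data on the Redis master and slave consistent.
There are four situations when running keepalived+redis:
1 keepalived is down and Redis is down too. The VIP simply moves away and no Redis data sync is needed: since Redis is down, there is nothing to sync from the master. Data already written to the master but not yet replicated to the slave is lost.
2 keepalived is down but Redis is not. After the VIP moves, the Redis master and slave still have the old relationship. If nothing changes, data gets written to the Redis slave and never reaches the master, so the scripts have to reverse the Redis master/slave relationship: leave some time for data sync, then reverse master and slave.
3 keepalived is up but Redis is down. The check script detects that Redis is down and lowers the priority of the keepalived master, which also moves the VIP. As in the second case, data has to be synced first and the current Redis master/slave relationship reversed afterwards.
4 Last, keepalived is up and Redis is up. All is well and nothing needs to be done.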
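Cases 2 and 3 rely on leaving enough time for data sync before the roles are reversed, which the scripts in this post handle with a fixed sleep. One possible alternative, sketched here and not tested in the setup described above, is to poll the replication state until the sync has finished. The names sync_complete and wait_for_sync and the default of 30 tries are assumptions; master_link_status and master_sync_in_progress are fields that Redis reports in INFO replication on a slave.

```shell
#!/bin/bash
REDISCLI="/usr/local/bin/redis-cli"

# sync_complete: read INFO replication text on stdin; succeed only if
# the link to the master is up and no sync is in progress.
sync_complete() {
    local info
    info=$(tr -d '\r')
    grep -qx 'master_link_status:up' <<< "$info" &&
        grep -qx 'master_sync_in_progress:0' <<< "$info"
}

# wait_for_sync [TRIES]: poll the local Redis once per second.
# Returns 0 as soon as the sync is complete, 1 if TRIES run out.
wait_for_sync() {
    local tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        if "$REDISCLI" INFO replication | sync_complete; then
            return 0
        fi
        tries=$(( tries - 1 ))
        sleep 1
    done
    return 1
}
```

In A's redis_master.sh, a call such as wait_for_sync 30 could take the place of sleep 15 between SLAVEOF 10.0.12.2 6379 and SLAVEOF NO ONE. What to do when it times out is left open here.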