Redis Sentinel

Contents: limitations of master-slave replication, Redis Sentinel, installation and configuration, failover, routine operations

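Limitations of master-slave replication

Manual failover

  1. The master goes down and the redis service becomes unavailable.
  2. Data synchronization to the slaves is interrupted.
  3. Failover has to be carried out by hand.
  4. Pick one slave and run slaveof no one on it so that it becomes the new master.
  5. On each remaining slave, run slaveof <new-master-host> <new-master-port> so that it replicates from the new master.
  6. Clients that call the redis service must somehow notice that the master has changed and react accordingly, which is hard to get right.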
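
To make steps 4 and 5 concrete, here is a minimal sketch that only builds the redis-cli command lines an operator would run by hand. The class name ManualFailoverPlan and the plan method are hypothetical, and the sketch assumes all nodes live on the same host, as in the local setup used later in this article.

```java
import java.util.ArrayList;
import java.util.List;

public class ManualFailoverPlan {
    /**
     * Builds the commands for a manual failover: promote one slave,
     * then repoint every other slave at the new master.
     * Assumption: all nodes share one host and differ only by port.
     */
    static List<String> plan(String host, int newMasterPort, List<Integer> otherSlavePorts) {
        List<String> commands = new ArrayList<>();
        // Step 4: the chosen slave stops replicating and becomes the master
        commands.add("redis-cli -h " + host + " -p " + newMasterPort + " slaveof no one");
        // Step 5: every remaining slave starts replicating from the new master
        for (int port : otherSlavePorts) {
            commands.add("redis-cli -h " + host + " -p " + port
                    + " slaveof " + host + " " + newMasterPort);
        }
        return commands;
    }

    public static void main(String[] args) {
        plan("127.0.0.1", 7001, List.of(7002)).forEach(System.out::println);
    }
}
```

Even with the commands scripted, step 6 remains: nothing here tells the clients where the new master is, which is the gap Sentinel fills.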

Limited write and storage capacity

  1. Only the master can accept writes, and storage capacity is very limited.

Redis Sentinel

Unless stated otherwise, everything in this article assumes redis 5.0.

One set of sentinels can monitor several master-slave deployments; each deployment is identified by the master name given in the configuration.

Installation and configuration

Install and configure redis-server
  1. Create a redis configuration file redis-7000.conf with the minimal settings
port 7000
daemonize yes
pidfile /usr/local/software/redis/data/redis-7000.pid
logfile "/usr/local/software/redis/data/7000.log"
dir "/usr/local/software/redis/data"
           

Use the sed command to quickly generate the slave configuration files

sed "s/7000/7001/g" redis-7000.conf > redis-7001.conf
sed "s/7000/7002/g" redis-7000.conf > redis-7002.conf
# Configure the master-slave relationship
echo "slaveof 127.0.0.1 7000" >> redis-7001.conf
echo "slaveof 127.0.0.1 7000" >> redis-7002.conf
# Start the redis nodes on ports 7000, 7001 and 7002
./redis-server redis-7000.conf
./redis-server redis-7001.conf
./redis-server redis-7002.conf
           
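  2. Connect to the redis-7000 service with the client

The output shows that the redis instance on port 7000 is currently the master, that it has two slaves on ports 7001 and 7002, and that both are online.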

redis-cli -p 7000
127.0.0.1:7000> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=ip,port=7001,state=online,offset=789019,lag=0
slave1:ip=ip,port=7002,state=online,offset=789019,lag=0
master_replid:e27b673924f62d27605f5d095924ec5c287ced02
master_replid2:bc30e94bea8b34c0b5ba9f815f316edd8a05aa33
master_repl_offset:789019
second_repl_offset:265200
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:789019
           
  3. Connect to the redis-7001 service with the client

Here we can clearly see that node 7001 is a slave, and that its master's host and port are those of our redis node on port 7000.

./redis-cli -p 7001
127.0.0.1:7001> info replication
# Replication
role:slave
master_host:ip
master_port:7000
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:919032
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:e27b673924f62d27605f5d095924ec5c287ced02
master_replid2:bc30e94bea8b34c0b5ba9f815f316edd8a05aa33
master_repl_offset:919032
second_repl_offset:265200
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:919032
127.0.0.1:7001> 
           
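Install and configure redis-sentinel
  1. Configure sentinel to monitor the master (a sentinel is a special kind of redis process)

Strip comments and blank lines to get a minimal sentinel configuration

cat sentinel.conf | grep -v "#" | grep -v "^$" > redis-sentinel-26379.conf
           

Run it in the background and set the log file and working directory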

port 26379
daemonize yes
pidfile /var/run/redis-sentinel-26379.pid
logfile "26379.log"
dir /usr/local/software/redis/data/
sentinel monitor mymaster 127.0.0.1 7000 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
# In the same way, create configuration files for ports 26380 and 26381, then start each one with the redis-sentinel command.
./redis-sentinel redis-sentinel-26379.conf
./redis-sentinel redis-sentinel-26380.conf
./redis-sentinel redis-sentinel-26381.conf
           

Connect to the sentinel on port 26379

127.0.0.1:26379> info Sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=ip:7002,slaves=2,sentinels=3
127.0.0.1:26379> 
           
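  2. Deploy across multiple machines for high availability

    Do not put the master and its slaves on the same machine, and do not run redis-sentinel on the same machines as the redis servers, so that the loss of one machine cannot take down the whole setup.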

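Failover

Failover (automatic)

  1. Several sentinels detect and confirm that the master has a problem
  2. One sentinel is elected leader
  3. One slave is chosen to become the new master
  4. The remaining slaves are told to become slaves of the new master
  5. Clients are notified of the master-slave change
  6. The sentinels wait for the old master to come back and make it a slave of the new master

A small failover experiment

After starting the three redis services 7000 (master), 7001 (slave) and 7002 (slave) and the three sentinel services 26379, 26380 and 26381, create a Java project named JedisTest.

(1) Test code: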

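import java.util.HashSet;
import java.util.Random;
import java.util.Set;
import java.util.concurrent.TimeUnit;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

public class RedisSentinelFailOverTest {
    // Assumption: the logger is SLF4J, judging by the {} placeholders in the log call
    private static final Logger log = LoggerFactory.getLogger(RedisSentinelFailOverTest.class);

    public static void main(String[] args) {
        String masterName = "mymaster";
        // Addresses of the three sentinels; replace "ip" with the sentinel host
        Set<String> sentinelSet = new HashSet<String>();
        sentinelSet.add("ip:26379");
        sentinelSet.add("ip:26380");
        sentinelSet.add("ip:26381");
        // The pool asks the sentinels which node is currently the master
        JedisSentinelPool jedisSentinelPool = new JedisSentinelPool(masterName, sentinelSet);
        Random random = new Random();
        int counter = 0;
        while (true) {
            Jedis jedis = null;
            try {
                counter++;
                jedis = jedisSentinelPool.getResource();
                int index = random.nextInt(100000);
                String key = "k-" + index;
                String value = "v-" + index;
                jedis.set(key, value);
                // Log only every 100th write to keep the console readable
                if (counter % 100 == 0) {
                    log.info("info,key={},value={}", key, value);
                }

                TimeUnit.MILLISECONDS.sleep(10);
            } catch (Exception e) {
                // Connection errors are expected while the failover is in progress; keep looping
                e.printStackTrace();
            } finally {
                // Return the connection to the pool
                if (jedis != null) {
                    jedis.close();
                }
            }
        }
    }
}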
           

(2) The test output is as follows:

18:56:43.431 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-22950,value=v-22950
18:56:44.919 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-98952,value=v-98952
18:56:46.423 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-43036,value=v-43036
18:56:47.928 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-92698,value=v-92698
18:56:49.401 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-32185,value=v-32185
18:56:50.887 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-66828,value=v-66828
18:56:52.370 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-55874,value=v-55874
18:56:53.848 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-5782,value=v-5782  
           

(3) Simulate a master crash by killing the redis-server process on port 7000 with kill -9 <pid>

The console shows:

redis.clients.jedis.exceptions.JedisConnectionException: Unexpected end of stream.
	at redis.clients.util.RedisInputStream.ensureFill(RedisInputStream.java:199)
	at redis.clients.util.RedisInputStream.readByte(RedisInputStream.java:40)
	at redis.clients.jedis.Protocol.process(Protocol.java:151)
	at redis.clients.jedis.Protocol.read(Protocol.java:215)
	at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:340)
	at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:239)
	at redis.clients.jedis.Jedis.set(Jedis.java:121)
	at com.gy.redisTest.RedisSentinelFailOverTest.main(RedisSentinelFailOverTest.java:36)
redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
	at redis.clients.util.Pool.getResource(Pool.java:53)
	at redis.clients.jedis.JedisSentinelPool.getResource(JedisSentinelPool.java:209)
	at com.gy.redisTest.RedisSentinelFailOverTest.main(RedisSentinelFailOverTest.java:32)
Caused by: redis.clients.jedis.exceptions.JedisConnectionException: java.net.ConnectException: Connection refused: connect
	at redis.clients.jedis.Connection.connect(Connection.java:207)
	at redis.clients.jedis.BinaryClient.connect(BinaryClient.java:93)
	at redis.clients.jedis.BinaryJedis.connect(BinaryJedis.java:1767)
	at redis.clients.jedis.JedisFactory.makeObject(JedisFactory.java:106)
	at org.apache.commons.pool2.impl.GenericObjectPool.create(GenericObjectPool.java:868)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:435)
	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:363)
	at redis.clients.util.Pool.getResource(Pool.java:49)
	... 2 more
Caused by: java.net.ConnectException: Connection refused: connect
	at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
	at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at redis.clients.jedis.Connection.connect(Connection.java:184)
	... 9 more


           

(4) After a while the console output returns to normal: the automatic failover has completed

Caused by: java.net.ConnectException: Connection refused: connect
	at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
	at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at redis.clients.jedis.Connection.connect(Connection.java:184)
	... 9 more
18:57:27.035 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-99501,value=v-99501
18:57:28.604 [main] INFO com.gy.redisTest.RedisSentinelFailOverTest - info,key=k-14578,value=v-14578
           

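Viewing the logs

What is subjective down? What is objective down?

  1. Subjective down (sdown) is a single sentinel's own "opinion" that a redis node has failed.
# How long a node must be unreachable (no valid reply to ping) before this sentinel marks it as down
sentinel down-after-milliseconds <master-name> <milliseconds>
           
  2. Objective down (odown) means that enough sentinels, at least the quorum, agree that the redis node (the master) has failed.
# quorum: the number of sentinels that must agree
sentinel monitor <master-name> <ip> <redis-port> <quorum>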
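
The difference between the two states can be sketched in a few lines of Java. This is a simplified model, not Sentinel's real implementation: the class DownStateCheck, its method names and the map of per-sentinel opinions are all hypothetical, and the only inputs are the two settings shown above (down-after-milliseconds and quorum).

```java
import java.util.Map;

public class DownStateCheck {
    /**
     * Subjective down: this one sentinel has had no valid reply
     * for longer than down-after-milliseconds.
     */
    static boolean isSubjectivelyDown(long nowMs, long lastValidReplyMs, long downAfterMs) {
        return nowMs - lastValidReplyMs > downAfterMs;
    }

    /**
     * Objective down: the number of sentinels reporting the master
     * as subjectively down has reached the quorum.
     */
    static boolean isObjectivelyDown(Map<String, Boolean> sdownBySentinel, int quorum) {
        long agreeing = sdownBySentinel.values().stream()
                .filter(Boolean::booleanValue)
                .count();
        return agreeing >= quorum;
    }

    public static void main(String[] args) {
        // Two of three sentinels see the master as down, quorum is 2
        Map<String, Boolean> opinions = Map.of("26379", true, "26380", true, "26381", false);
        System.out.println(isObjectivelyDown(opinions, 2));
    }
}
```

With quorum set to 2 as in the configuration above, two agreeing sentinels are enough to reach objective down even if the third has not yet noticed the failure.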
           

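Three periodic tasks

  1. Every 10 seconds, each sentinel runs info against the master and the slaves
    • to discover slave nodes
    • to confirm the master-slave relationships
  2. Every 2 seconds, each sentinel exchanges information through a channel on the master (pub/sub)
    • the exchange goes through the __sentinel__:hello channel
    • each sentinel shares its "view" of the nodes and its own information
  3. Every second, each sentinel pings the other sentinels and the redis nodes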
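
The three intervals can be pictured with a small simulated clock. This is only an illustration of the 10 s / 2 s / 1 s cadence described above; the class SentinelTimers and the task labels are hypothetical, and a real sentinel drives these tasks from its own internal timer rather than from a loop like this.

```java
import java.util.ArrayList;
import java.util.List;

public class SentinelTimers {
    /** Returns the periodic tasks that fall due at the given second since start. */
    static List<String> tasksDueAt(long second) {
        List<String> due = new ArrayList<>();
        if (second % 10 == 0) {
            due.add("INFO master and slaves");        // every 10 seconds
        }
        if (second % 2 == 0) {
            due.add("PUBLISH __sentinel__:hello");    // every 2 seconds
        }
        due.add("PING sentinels and redis nodes");    // every second
        return due;
    }

    public static void main(String[] args) {
        for (long s = 1; s <= 10; s++) {
            System.out.println(s + "s: " + tasksDueAt(s));
        }
    }
}
```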

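Leader election

  1. Purpose of the election: only one sentinel should carry out the failover
  2. How the election works:

    (1) Every sentinel that has marked the master as subjectively down sends a command to the other sentinels asking to be made leader.

    (2) A sentinel that receives the command agrees if it has not yet voted for another sentinel; otherwise it refuses.

    (3) When a sentinel finds that it holds more than half of the votes and that the count also reaches the quorum, it becomes the leader and carries out the failover.

    (4) If the round does not end with exactly one leader, the sentinels wait for a while and hold a new election.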
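
The voting rules above can be sketched as follows. This is a simplified model under stated assumptions: the names LeaderElectionSketch, Voter, requestVote and wins are hypothetical, votes are tracked per epoch (the +new-epoch counter visible in the sentinel logs), and networking and timeouts are left out entirely.

```java
import java.util.HashMap;
import java.util.Map;

public class LeaderElectionSketch {
    /** One sentinel acting as a voter: it grants at most one vote per epoch. */
    static class Voter {
        private final Map<Long, String> votedFor = new HashMap<>();

        /**
         * Returns the id this voter supports in the given epoch.
         * The first request in an epoch wins; later requests get the earlier choice back.
         */
        String requestVote(long epoch, String candidateId) {
            return votedFor.computeIfAbsent(epoch, e -> candidateId);
        }
    }

    /** A candidate wins with more than half of all sentinels and at least quorum votes. */
    static boolean wins(int votes, int totalSentinels, int quorum) {
        return votes > totalSentinels / 2 && votes >= quorum;
    }

    public static void main(String[] args) {
        Voter s26379 = new Voter();
        Voter s26381 = new Voter();
        int votes = 1; // the candidate 26380 votes for itself
        if (s26379.requestVote(1, "26380").equals("26380")) votes++;
        if (s26381.requestVote(1, "26380").equals("26380")) votes++;
        System.out.println("26380 elected: " + wins(votes, 3, 2));
    }
}
```

Because each voter answers with its first choice for the epoch, two candidates cannot both collect a majority in the same round; a split vote simply produces no winner and leads to a new round.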

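Failover (carried out by the sentinel leader)

  1. Choose a "suitable" slave to become the new master.
  2. Run slaveof no one on that slave so that it becomes the new master.
  3. Tell the remaining slaves to become slaves of the new master and start replicating from it.
  4. Keep watching the redis node that went down; when it comes back, make it a slave of the new master.

Choosing a "suitable" master

  1. Choose the slave with the highest priority, that is, the lowest non-zero slave-priority value (a value of 0 means the slave is never promoted). If one slave wins, return it; on a tie, continue.
  2. Choose the slave with the largest replication offset (the most complete copy of the data). If one slave wins, return it; on a tie, continue.
  3. Choose the slave with the smallest runId.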
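
The three selection rules map naturally onto a chained comparator. The sketch below is an illustration, not Sentinel's code: the class ReplicaSelectionSketch and its Replica type are hypothetical, and it ignores the health checks a real sentinel applies before ranking. Note that in Redis a lower slave-priority number means a higher promotion priority, and a value of 0 means the slave is never promoted.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

public class ReplicaSelectionSketch {
    static final class Replica {
        final String runId;
        final int priority;     // slave-priority: lower is preferred, 0 = never promote
        final long replOffset;  // replication offset: higher means more data received

        Replica(String runId, int priority, long replOffset) {
            this.runId = runId;
            this.priority = priority;
            this.replOffset = replOffset;
        }
    }

    /** Picks the promotion candidate: priority, then offset, then smallest runId. */
    static Optional<Replica> select(List<Replica> candidates) {
        Comparator<Replica> order = Comparator
                .comparingInt((Replica r) -> r.priority)
                .thenComparing(Comparator.comparingLong((Replica r) -> r.replOffset).reversed())
                .thenComparing((Replica r) -> r.runId);
        return candidates.stream()
                .filter(r -> r.priority != 0) // priority 0 excludes a slave from promotion
                .min(order);
    }

    public static void main(String[] args) {
        List<Replica> slaves = List.of(
                new Replica("bbb", 100, 500),
                new Replica("aaa", 100, 900));
        System.out.println(select(slaves).map(r -> r.runId).orElse("none"));
    }
}
```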

Log analysis

(1) Check the log of master node 7000

30712:C 15 Jul 2019 18:44:03.552 * RDB: 4 MB of memory used by copy-on-write
30703:M 15 Jul 2019 18:44:03.566 * Background saving terminated with success
30703:M 15 Jul 2019 18:44:03.566 * Synchronization with replica ip:7001 succeeded
30703:M 15 Jul 2019 18:44:05.656 * Replica ip:7002 asks for synchronization
30703:M 15 Jul 2019 18:44:05.656 * Full resync requested by replica 39.107.69.86:7002
30703:M 15 Jul 2019 18:44:05.656 * Starting BGSAVE for SYNC with target: disk
30703:M 15 Jul 2019 18:44:05.657 * Background saving started by pid 30719
30719:C 15 Jul 2019 18:44:05.661 * DB saved on disk
30719:C 15 Jul 2019 18:44:05.661 * RDB: 4 MB of memory used by copy-on-write
30703:M 15 Jul 2019 18:44:05.672 * Background saving terminated with success
30703:M 15 Jul 2019 18:44:05.672 * Synchronization with replica ip:7002 succeeded
           

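Because we simulated the crash with the kill command, the master's log contains little useful information: right up to the crash it was still synchronizing data.

(2) Check the logs of slave nodes 7001 and 7002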

30713:C 15 Jul 2019 18:44:03.596 * SYNC append only file rewrite performed
30713:C 15 Jul 2019 18:44:03.596 * AOF rewrite: 4 MB of memory used by copy-on-write
30708:S 15 Jul 2019 18:44:03.633 * Background AOF rewrite terminated with success
30708:S 15 Jul 2019 18:44:03.633 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
30708:S 15 Jul 2019 18:44:03.634 * Background AOF rewrite finished successfully
30708:S 15 Jul 2019 18:58:28.241 # Connection with master lost.
30708:S 15 Jul 2019 18:58:28.241 * Caching the disconnected master state.
30708:S 15 Jul 2019 18:58:28.846 * Connecting to MASTER ip:7000
30708:S 15 Jul 2019 18:58:28.847 * MASTER <-> REPLICA sync started
30708:S 15 Jul 2019 18:58:28.849 # Error condition on socket for SYNC: Connection refused
30708:S 15 Jul 2019 18:58:29.850 * Connecting to MASTER ip:7000
30708:S 15 Jul 2019 18:58:29.850 * MASTER <-> REPLICA sync started
30708:S 15 Jul 2019 18:58:29.853 # Error condition on socket for SYNC: Connection refused
30708:S 15 Jul 2019 18:58:30.851 * Connecting to MASTER ip:7000
30708:S 15 Jul 2019 18:58:30.852 * MASTER <-> REPLICA sync started
30708:S 15 Jul 2019 18:58:30.854 # Error condition on socket for SYNC: Connection refused
30708:S 15 Jul 2019 18:58:31.855 * Connecting to MASTER ip:7000
30708:S 15 Jul 2019 18:58:31.855 * MASTER <-> REPLICA sync started
...
30708:S 15 Jul 2019 18:58:56.919 * MASTER <-> REPLICA sync started
30708:S 15 Jul 2019 18:58:56.922 # Error condition on socket for SYNC: Connection refused
30708:S 15 Jul 2019 18:58:57.919 * Connecting to MASTER ip:7000
30708:S 15 Jul 2019 18:58:57.920 * MASTER <-> REPLICA sync started
30708:S 15 Jul 2019 18:58:57.922 # Error condition on socket for SYNC: Connection refused
30708:M 15 Jul 2019 18:58:58.628 # Setting secondary replication ID to d1c40b043f9fbc70ef8435d26897219c71ab97d7, valid up to offset: 379160. New replication ID is fdf1866660032a8cfd27167bf52a899d4e0dd7a5
30708:M 15 Jul 2019 18:58:58.628 * Discarding previously cached master state.
30708:M 15 Jul 2019 18:58:58.628 * MASTER MODE enabled (user request from 'id=7 addr=ip:50284 fd=11 name=sentinel-dbea5725-cmd age=285 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=154 qbuf-free=32614 obl=36 oll=0 omem=0 events=r cmd=exec')
30708:M 15 Jul 2019 18:58:58.629 # CONFIG REWRITE executed with success.
30708:M 15 Jul 2019 18:58:59.040 * Replica ip:7002 asks for synchronization
30708:M 15 Jul 2019 18:58:59.040 * Partial resynchronization request from ip:7002 accepted. Sending 663 bytes of backlog starting from offset 379160.
           
30715:S 15 Jul 2019 18:58:57.026 # Error condition on socket for SYNC: Connection refused
30715:S 15 Jul 2019 18:58:58.024 * Connecting to MASTER ip:7000
30715:S 15 Jul 2019 18:58:58.024 * MASTER <-> REPLICA sync started
30715:S 15 Jul 2019 18:58:58.026 # Error condition on socket for SYNC: Connection refused
30715:S 15 Jul 2019 18:58:58.719 * REPLICAOF ip:7001 enabled (user request from 'id=7 addr=ip:57200 fd=12 name=sentinel-dbea5725-cmd age=285 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=291 qbuf-free=32477 obl=36 oll=0 omem=0 events=r cmd=exec')
30715:S 15 Jul 2019 18:58:58.721 # CONFIG REWRITE executed with success.
30715:S 15 Jul 2019 18:58:59.027 * Connecting to MASTER ip:7001
30715:S 15 Jul 2019 18:58:59.027 * MASTER <-> REPLICA sync started
30715:S 15 Jul 2019 18:58:59.029 * Non blocking connect for SYNC fired the event.
30715:S 15 Jul 2019 18:58:59.032 * Master replied to PING, replication can continue...
30715:S 15 Jul 2019 18:58:59.039 * Trying a partial resynchronization (request d1c40b043f9fbc70ef8435d26897219c71ab97d7:379160).
30715:S 15 Jul 2019 18:58:59.042 * Successful partial resynchronization with master.
30715:S 15 Jul 2019 18:58:59.042 # Master replication ID changed to fdf1866660032a8cfd27167bf52a899d4e0dd7a5
30715:S 15 Jul 2019 18:58:59.042 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
           
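  • The logs show that at 18:58 the slaves lost contact with the master and kept trying to reconnect to it.
  • At 18:58:58, node 7001 received a request to become the new master and rewrote its configuration; node 7002 then asked 7001 for synchronization.
  • At 18:58:58, node 7002 received a request to become a slave of 7001 and rewrote its configuration. It then attempted a partial resynchronization, its recorded master replication ID changed, and the new master accepted the partial resynchronization. (This also shows that redis 4.0 and later improve on earlier versions: after a master switch a slave can attempt partial replication instead of always performing a full resynchronization.)

(3) Verify the conclusions from the logs

  • Node 7001 now has the master role, and its slave is on port 7002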
127.0.0.1:7001> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=ip,port=7002,state=online,offset=1339423,lag=1
master_replid:fdf1866660032a8cfd27167bf52a899d4e0dd7a5
master_replid2:d1c40b043f9fbc70ef8435d26897219c71ab97d7
master_repl_offset:1339423
second_repl_offset:379160
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:290848
repl_backlog_histlen:1048576
           
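  • The sentinel information shows that the master is now on port 7001 and that 7002 has become its slave. The slave count is still 2 because sentinel keeps tracking node 7000 and waits for it to restart; as soon as 7000 comes back, sentinel tells it to become a slave of 7001.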
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=ip:7001,slaves=2,sentinels=3
           
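  • The logs of the three sentinels show that 26379, 26380 and 26381 each marked node 7000 as subjectively down (+sdown) and incremented the epoch counter (+new-epoch). Sentinel 26380 found that the number of sentinels reporting the master as down had reached the configured quorum and marked master 7000 as objectively down (+odown). The election then started: 26380 asked to become leader, and 26379 and 26381 both voted for it, so it was elected. The leader told node 7001 to run slaveof no one and become the master, reconfigured node 7002 as a slave of 7001, and recorded node 7000 as a slave of 7001 to be reconfigured once it returns.
# Log of sentinel 26379
# Node 7000 marked subjectively down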
30790:X 15 Jul 2019 18:58:58.291 # +sdown master mymaster ip 7000
30790:X 15 Jul 2019 18:58:58.420 # +new-epoch 1
# Vote for 26380 as leader
30790:X 15 Jul 2019 18:58:58.422 # +vote-for-leader dbea572500678a7e3523f5c2d30aee38c771982c 1
30790:X 15 Jul 2019 18:58:58.718 # +config-update-from sentinel dbea572500678a7e3523f5c2d30aee38c771982c ip 26380 @ mymaster ip 7000
# Switch the master from 7000 to 7001
30790:X 15 Jul 2019 18:58:58.718 # +switch-master mymaster ip 7000 ip 7001
30790:X 15 Jul 2019 18:58:58.718 * +slave slave ip:7002 ip 7002 @ mymaster ip 7001
30790:X 15 Jul 2019 18:58:58.718 * +slave slave ip:7000 ip 7000 @ mymaster ip 7001
30790:X 15 Jul 2019 18:59:28.728 # +sdown slave ip:7000 ip 7000 @ mymaster ip 7001

# Log of sentinel 26380
30795:X 15 Jul 2019 18:54:13.809 # Sentinel ID is dbea572500678a7e3523f5c2d30aee38c771982c
# Node 7000 marked subjectively down
30795:X 15 Jul 2019 18:58:58.335 # +sdown master mymaster ip 7000
30795:X 15 Jul 2019 18:58:58.411 # +odown master mymaster ip 7000 #quorum 2/2
30795:X 15 Jul 2019 18:58:58.411 # +new-epoch 1
30795:X 15 Jul 2019 18:58:58.411 # +try-failover master mymaster ip 7000
# Ask to become leader (vote for itself)
30795:X 15 Jul 2019 18:58:58.416 # +vote-for-leader dbea572500678a7e3523f5c2d30aee38c771982c 1
# 26379 voted for this sentinel as leader
30795:X 15 Jul 2019 18:58:58.422 # f67443294644bcaca75a83bd9aeb0baade1d6ecc voted for dbea572500678a7e3523f5c2d30aee38c771982c 1
# 26381 voted for this sentinel as leader
30795:X 15 Jul 2019 18:58:58.422 # ad712f71928204dc55033dd391968a99388fcd98 voted for dbea572500678a7e3523f5c2d30aee38c771982c 1
30795:X 15 Jul 2019 18:58:58.471 # +elected-leader master mymaster ip 7000
# Failover: select a slave to promote
30795:X 15 Jul 2019 18:58:58.471 # +failover-state-select-slave master mymaster ip 7000
# 7001 selected as the new master
30795:X 15 Jul 2019 18:58:58.571 # +selected-slave slave ip:7001 ip 7001 @ mymaster ip 7000
# slaveof no one
30795:X 15 Jul 2019 18:58:58.571 * +failover-state-send-slaveof-noone slave ip:7001 ip 7001 @ mymaster ip 7000
30795:X 15 Jul 2019 18:58:58.627 * +failover-state-wait-promotion slave ip:7001 ip 7001 @ mymaster ip 7000
30795:X 15 Jul 2019 18:58:58.633 # +promoted-slave slave ip:7001 ip 7001 @ mymaster ip 7000
30795:X 15 Jul 2019 18:58:58.633 # +failover-state-reconf-slaves master mymaster ip 7000
30795:X 15 Jul 2019 18:58:58.717 * +slave-reconf-sent slave ip:7002 ip 7002 @ mymaster ip 7000
# Objectively-down state cleared (-odown)
30795:X 15 Jul 2019 18:58:59.555 # -odown master mymaster ip 7000
30795:X 15 Jul 2019 18:58:59.671 * +slave-reconf-inprog slave ip:7002 ip 7002 @ mymaster ip 7000
30795:X 15 Jul 2019 18:58:59.671 * +slave-reconf-done slave ip:7002 ip 7002 @ mymaster ip 7000
30795:X 15 Jul 2019 18:58:59.732 # +failover-end master mymaster ip 7000
# Switch the master from 7000 to 7001
30795:X 15 Jul 2019 18:58:59.732 # +switch-master mymaster ip 7000 ip 7001
30795:X 15 Jul 2019 18:58:59.732 * +slave slave ip:7002 ip 7002 @ mymaster ip 7001
30795:X 15 Jul 2019 18:58:59.732 * +slave slave ip:7000 ip 7000 @ mymaster ip 7001
30795:X 15 Jul 2019 18:59:29.752 # +sdown slave ip:7000 ip 7000 @ mymaster ip 7001

# Log of sentinel 26381
# Node 7000 marked subjectively down
30800:X 15 Jul 2019 18:58:58.342 # +sdown master mymaster ip 7000
30800:X 15 Jul 2019 18:58:58.420 # +new-epoch 1

30800:X 15 Jul 2019 18:58:58.422 # +vote-for-leader dbea572500678a7e3523f5c2d30aee38c771982c 1
30800:X 15 Jul 2019 18:58:58.433 # +odown master mymaster ip 7000 #quorum 3/2
30800:X 15 Jul 2019 18:58:58.433 # Next failover delay: I will not start a failover before Mon Jul 15 19:04:59 2019
30800:X 15 Jul 2019 18:58:58.718 # +config-update-from sentinel dbea572500678a7e3523f5c2d30aee38c771982c ip 26380 @ mymaster ip 7000
# Switch the master from 7000 to 7001
30800:X 15 Jul 2019 18:58:58.718 # +switch-master mymaster ip 7000 ip 7001
30800:X 15 Jul 2019 18:58:58.718 * +slave slave ip:7002 ip 7002 @ mymaster ip 7001
30800:X 15 Jul 2019 18:58:58.718 * +slave slave ip:7000 ip 7000 @ mymaster ip 7001
30800:X 15 Jul 2019 18:59:28.769 # +sdown slave ip:7000 ip 7000 @ mymaster ip 7001
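
The +switch-master lines in these logs are also how clients learn about the change: sentinels publish the same event on a pub/sub channel named +switch-master, with a payload of the form <master-name> <old-ip> <old-port> <new-ip> <new-port>. The sketch below only parses such a payload; the class SwitchMasterEvent is hypothetical, and subscribing to the channel (which a sentinel-aware client library normally does for you) is left out.

```java
public class SwitchMasterEvent {
    final String masterName;
    final String oldHost;
    final int oldPort;
    final String newHost;
    final int newPort;

    private SwitchMasterEvent(String masterName, String oldHost, int oldPort,
                              String newHost, int newPort) {
        this.masterName = masterName;
        this.oldHost = oldHost;
        this.oldPort = oldPort;
        this.newHost = newHost;
        this.newPort = newPort;
    }

    /** Parses "<master-name> <old-ip> <old-port> <new-ip> <new-port>". */
    static SwitchMasterEvent parse(String payload) {
        String[] parts = payload.trim().split("\\s+");
        if (parts.length != 5) {
            throw new IllegalArgumentException("unexpected +switch-master payload: " + payload);
        }
        return new SwitchMasterEvent(parts[0], parts[1], Integer.parseInt(parts[2]),
                parts[3], Integer.parseInt(parts[4]));
    }

    public static void main(String[] args) {
        // Same shape as the log line "+switch-master mymaster ip 7000 ip 7001"
        SwitchMasterEvent e = parse("mymaster 127.0.0.1 7000 127.0.0.1 7001");
        System.out.println(e.masterName + " moved to " + e.newHost + ":" + e.newPort);
    }
}
```

A client that receives this event for the master name it cares about can drop its old connections and reconnect to the new address, which is what makes the failover transparent to the test program earlier in this article.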
           

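Routine operations

Taking the master offline

Perform a manual failover with sentinel failover <master-name>, which skips the subjective-down and objective-down checks and fails over directly; the old master can then be taken offline.

Taking a slave offline

Decide whether the node is going offline temporarily or permanently, for example whether its data should be cleaned up.

Bringing a master online

Use sentinel failover to swap the master, and raise the priority (that is, lower the slave-priority value) of the slave you want promoted so that it is the one selected.

Bringing a slave online

Simply run the slaveof command on it to start replicating from the master.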