
VIP Drift and I/O Thread Disconnection: Points to Note

Consider the following problem:

A----B
 \  /
 VIP
  | 
  C           

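In this architecture, suppose A and B switch roles and the VIP drifts to the new master. Does replica C run into trouble? The conclusion: position-based replication (POS) will not work, whereas GTID-based replication will. The proof is as follows: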

Sections 17 and 22 of 《深入了解MySQL主從原理 32講》 describe the I/O thread and dump thread flows mentioned below in more detail; have a look if you are interested.

When the connection to the master's IP is broken, the I/O thread enters its reconnect flow. This triggers the following logic:

event_len= read_event(mysql, mi, &suppress_warnings); // returns the length read

On failure, event_len carries the error indicator (packet_error), which is handled as follows:

if (event_len == packet_error)
      {
        uint mysql_error_number= mysql_errno(mysql);
        switch (mysql_error_number) {
        case CR_NET_PACKET_TOO_LARGE:
          sql_print_error("\
Log entry on master is longer than slave_max_allowed_packet (%lu) on \
slave. If the entry is correct, restart the server with a higher value of \
slave_max_allowed_packet",
                         slave_max_allowed_packet);
          mi->report(ERROR_LEVEL, ER_NET_PACKET_TOO_LARGE,
                     "%s", "Got a packet bigger than 'slave_max_allowed_packet' bytes");
          goto err;
        case ER_MASTER_FATAL_ERROR_READING_BINLOG:
          mi->report(ERROR_LEVEL, ER_MASTER_FATAL_ERROR_READING_BINLOG,
                     ER(ER_MASTER_FATAL_ERROR_READING_BINLOG),
                     mysql_error_number, mysql_error(mysql));
          goto err;
        case ER_OUT_OF_RESOURCES:
          sql_print_error("\
Stopping slave I/O thread due to out-of-memory error from master");
          mi->report(ERROR_LEVEL, ER_OUT_OF_RESOURCES,
                     "%s", ER(ER_OUT_OF_RESOURCES));
          goto err;
        }
        if (try_to_reconnect(thd, mysql, mi, &retry_count, suppress_warnings,
                             reconnect_messages[SLAVE_RECON_ACT_EVENT]))
          goto err;
        goto connected;
      } 
           

Some of the errors above do not allow a reconnect (see the code for which ones). If the reconnect succeeds, execution takes goto connected;
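To make the branching explicit, here is a minimal C++ sketch of the decision taken after read_event fails. The names `classify_read_error`, `IoAction` and `example_classification` are hypothetical and exist only for illustration; they are not part of the MySQL source. The numeric codes are the values that, to my knowledge, MySQL's errmsg.h and mysqld_error.h define. The sketch also leaves out one case: if try_to_reconnect itself fails, the real code ends in goto err as well.

```cpp
#include <cassert>

// Client-side (CR_*) and server-side (ER_*) error numbers, assumed to match
// MySQL's errmsg.h and mysqld_error.h.
constexpr unsigned CR_CONN_HOST_ERROR = 2003;  // cannot connect to the host
constexpr unsigned CR_SERVER_LOST = 2013;      // connection lost during a read
constexpr unsigned CR_NET_PACKET_TOO_LARGE = 2020;
constexpr unsigned ER_OUT_OF_RESOURCES = 1041;
constexpr unsigned ER_MASTER_FATAL_ERROR_READING_BINLOG = 1236;

// Hypothetical type: what the I/O thread does after a failed read.
enum class IoAction { Stop, Reconnect };

// Hypothetical helper mirroring the switch statement in the excerpt:
// three specific errors stop the I/O thread, everything else is retried.
IoAction classify_read_error(unsigned mysql_error_number) {
  switch (mysql_error_number) {
    case CR_NET_PACKET_TOO_LARGE:
    case ER_MASTER_FATAL_ERROR_READING_BINLOG:
    case ER_OUT_OF_RESOURCES:
      return IoAction::Stop;       // corresponds to "goto err"
    default:
      return IoAction::Reconnect;  // corresponds to try_to_reconnect(...)
  }
}

// Usage sketch: a dropped connection is retried, a fatal binlog error is not.
void example_classification() {
  assert(classify_read_error(CR_SERVER_LOST) == IoAction::Reconnect);
  assert(classify_read_error(ER_MASTER_FATAL_ERROR_READING_BINLOG) ==
         IoAction::Stop);
}
```

A VIP that disappears for a while shows up as a connection-level client error, so it falls into the default branch and is retried rather than treated as fatal.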

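From there the whole connection flow runs again. Most importantly, both GTID and POSITION modes go through the dump thread's positioning flow again; in GTID mode, the master's binlog files are searched again and the GTID set is used to locate the starting point.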
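The GTID positioning step can be pictured with a simplified model. As far as I understand the 5.7 code, the master walks its binlog files from newest to oldest and picks the first file whose Previous_gtids set is fully contained in the GTID set sent by the replica. The sketch below assumes a single source UUID so that a GTID set can be a plain set of transaction numbers. `GtidSet`, `BinlogFile`, `find_start_file` and `example_files` are hypothetical names, and the real server performs additional checks (for example against purged GTIDs) that are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Simplification: one source UUID, so a GTID set is a set of numbers.
using GtidSet = std::set<long>;

// Hypothetical model of one binlog file on the master.
struct BinlogFile {
  std::string name;
  GtidSet previous_gtids;  // everything logged before this file began
};

// Returns the index of the file the dump should start from, or -1 if even
// the oldest file starts after transactions the replica still needs.
int find_start_file(const std::vector<BinlogFile>& files,
                    const GtidSet& replica_gtids) {
  for (int i = static_cast<int>(files.size()) - 1; i >= 0; --i) {
    const GtidSet& prev = files[i].previous_gtids;
    // std::includes on sorted ranges: is prev a subset of replica_gtids?
    if (std::includes(replica_gtids.begin(), replica_gtids.end(),
                      prev.begin(), prev.end()))
      return i;
  }
  return -1;
}

// Illustrative data: three files holding transactions 1-3, 4-6 and 7+.
std::vector<BinlogFile> example_files() {
  return {
      {"binlog.000001", {}},
      {"binlog.000002", {1, 2, 3}},
      {"binlog.000003", {1, 2, 3, 4, 5, 6}},
  };
}

// Usage sketch: a replica holding 1-4 must start in the second file.
void example_search() {
  GtidSet replica = {1, 2, 3, 4};
  assert(find_start_file(example_files(), replica) == 1);
}
```

The search depends only on the GTID set the replica reports, not on any file name or byte offset remembered from the previous connection, which is why it gives a sensible answer on whichever master the VIP now points to.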

So we can confirm that, for an architecture like the one below,

A----B
 \  /
 VIP
  | 
  C           

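once the VIP switch has completed, C is fine as long as the switchover between A and B itself was lossless. POSITION mode does not work, however, because binlog positions on A and on B are not guaranteed to be identical. This is something to keep in mind.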
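A toy model makes the difference concrete. The sketch below assumes A and B hold the same four transactions but at different byte offsets, and that C has already applied transactions 1 and 2 from A. All names (`Txn`, `Binlog`, `Resume`, `make_binlog`, `resume_by_position`, `resume_by_gtid`) and all offsets are made up for illustration, and the binlog file name, which may also differ between A and B, is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

struct Txn {
  long gtid;                // transaction number within one source UUID
  unsigned long start_pos;  // byte offset where the transaction begins
};

struct Binlog {
  std::vector<Txn> txns;  // in commit order
  unsigned long end_pos;  // offset just past the last transaction
};

struct Resume {
  bool valid;               // false: the offset is not a transaction boundary
  std::vector<long> gtids;  // transactions the replica would receive
};

// Builds a binlog whose transactions 1..n start at the given offsets.
Binlog make_binlog(const std::vector<unsigned long>& starts,
                   unsigned long end_pos) {
  Binlog log;
  for (std::size_t i = 0; i < starts.size(); ++i)
    log.txns.push_back(Txn{static_cast<long>(i + 1), starts[i]});
  log.end_pos = end_pos;
  return log;
}

// POSITION mode: resume from a raw byte offset in this master's binlog.
Resume resume_by_position(const Binlog& log, unsigned long pos) {
  Resume r{pos == log.end_pos, {}};
  for (const Txn& t : log.txns) {
    if (t.start_pos == pos) r.valid = true;
    if (r.valid && t.start_pos >= pos) r.gtids.push_back(t.gtid);
  }
  return r;
}

// GTID mode: send every transaction the replica does not have yet.
Resume resume_by_gtid(const Binlog& log, const std::set<long>& replica_gtids) {
  Resume r{true, {}};
  for (const Txn& t : log.txns)
    if (replica_gtids.count(t.gtid) == 0) r.gtids.push_back(t.gtid);
  return r;
}

// Usage sketch: C stopped at offset 700 on A, having applied GTIDs 1 and 2.
void example_switch() {
  Binlog a = make_binlog({154, 420, 700, 990}, 1300);
  Binlog b = make_binlog({154, 450, 760, 1080}, 1400);  // same data, new offsets
  std::set<long> applied = {1, 2};
  assert(resume_by_position(a, 700).gtids.size() == 2);  // 3 and 4 from A
  assert(!resume_by_position(b, 700).valid);             // mid-transaction on B
  assert(resume_by_gtid(b, applied).gtids.size() == 2);  // 3 and 4 from B
}
```

The worse case is an offset that happens to be a valid boundary on B but belongs to a different transaction: with start offsets 154, 380, 560 and 700 on B, resuming at 700 would deliver only transaction 4 and silently skip transaction 3.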

The proof is simple: I only need to take the master's IP down and bring it back up a little later. The log looks like this:

2019-08-06T21:30:29.723923+08:00 4 [ERROR] Slave I/O for channel '': error reconnecting to master '[email protected]:3340' - retry-time: 60  retries: 1, Error_code: 2003

We set breakpoints as follows.

On the replica, the breakpoint is set on the request_dump function, and it is hit as follows:

(gdb) bt
#0  request_dump (thd=0x7ffe800009a0, mysql=0x7ffe8000e670, mi=0x7ffe7c0223b0, suppress_warnings=0x7fffec0c5d8b)
    at /root/mysqlall/percona-server-locks-detail-5.7.22/sql/rpl_slave.cc:4363
#1  0x00000000018beee1 in handle_slave_io (arg=0x7ffe7c0223b0) at /root/mysqlall/percona-server-locks-detail-5.7.22/sql/rpl_slave.cc:5768
#2  0x0000000001945620 in pfs_spawn_thread (arg=0x7ffe7c033f90) at /root/mysqlall/percona-server-locks-detail-5.7.22/storage/perfschema/pfs.cc:2190
#3  0x00007ffff7bc6aa1 in start_thread () from /lib64/libpthread.so.0
#4  0x00007ffff6719bcd in clone () from /lib64/libc.so.6           

On the master, the breakpoint is set on com_binlog_dump_gtid:

(gdb) bt
#0  com_binlog_dump_gtid (thd=0x7fffe800edc0, packet=0x7fffe80068c1 "", packet_length=43) at /mysqldata/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:356
#1  0x00000000015c769b in dispatch_command (thd=0x7fffe800edc0, com_data=0x7fffec58bd70, command=COM_BINLOG_DUMP_GTID)
    at /mysqldata/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1705
#2  0x00000000015c58ff in do_command (thd=0x7fffe800edc0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1021
#3  0x000000000170e578 in handle_connection (arg=0x6660220) at /mysqldata/percona-server-locks-detail-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:312
#4  0x0000000001945538 in pfs_spawn_thread (arg=0x665f200) at /mysqldata/percona-server-locks-detail-5.7.22/storage/perfschema/pfs.cc:2190
#5  0x00007ffff7bcfaa1 in start_thread () from /lib64/libpthread.so.0
#6  0x00007ffff6b37c4d in clone () from /lib64/libc.so.6
           

So what this mainly proves is that even when the I/O thread merely reconnects to the master, the GTID positioning operation is run again.