[MySQL 5.6] GTID internals, operational changes, and known bugs

Preface: what is a GTID; 1. GTIDs on the master; 2. GTIDs on the slave; 3. Operations; 4. Known bugs

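I had not looked closely at GTIDs before, so this is a catch-up exercise: these are my notes from reading the documentation and the source code.

The main purpose of this post is to record the GTID-related backtraces for future troubleshooting. It also discusses the bugs currently present in MySQL 5.6.11.

This post covers:

1. How GTIDs are generated and recorded on the master

2. How the slave uses GTIDs for replication

3. Changes to master/slave operations

4. Bugs in MySQL 5.6.11

What is a GTID? In short, it is a global transaction identifier. It was first implemented by Google, and official MySQL only added the feature in 5.6. What prompted this post is that 5.6 introduces a whole pile of GTID-related variables, which I found confusing.

A GTID looks like this: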

7a07cd08-ac1b-11e2-9fcf-0010184e9e08:1

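This GTID was generated on one of my servers. In the binlog it appears as an event of type:

gtid_log_event: carries the GTID of the transaction that follows

There are two other GTID event types:

anonymous_gtid_log_event: the anonymous GTID event type (not discussed here)

previous_gtids_log_event: the set of GTIDs that were executed before the current binlog file; it is recorded in the binlog file header, for example: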

# at 120
#130502 23:23:27 server id 119821 end_log_pos 231 CRC32 0x4f33bb48 Previous-GTIDs
# 10a27632-a909-11e2-8bc7-0010184e9e08:1,
# 7a07cd08-ac1b-11e2-9fcf-0010184e9e08:1-1129

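A GTID string is split by ":". The first part is the server's server_uuid, a 128-bit random identifier generated at first startup (function generate_server_uuid) and exposed as the read-only variable server_uuid. It is globally unique with very high probability and is stored in the file data/auto.cnf, so take care that this file is not deleted or modified, or you will be in trouble.

The second part is an auto-incrementing transaction number. The transaction number together with server_uuid uniquely identifies a transaction.

Besides individual GTIDs there is also the notion of a GTID set. A GTID set looks like this: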

7a07cd08-ac1b-11e2-9fcf-0010184e9e08:1-31

gtid_executed and gtid_purged are typical GTID-set variables. In a replication topology, gtid_executed may contain several groups, for example:

mysql> show global variables like '%gtid_executed%'\G
*************************** 1. row ***************************
Variable_name: gtid_executed
Value: 10a27632-a909-11e2-8bc7-0010184e9e08:1-4,
153c0406-a909-11e2-8bc7-0010184e9e08:1-3,
7a07cd08-ac1b-11e2-9fcf-0010184e9e08:1-31,
f914fb74-a908-11e2-8bc6-0010184e9e08:1
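
To make the GTID set format concrete, here is a minimal Python sketch that parses a string such as the gtid_executed value above into per-UUID intervals and tests membership. The function names parse_gtid_set and contains_gtid are my own and are not MySQL APIs; the sketch assumes the plain uuid:range[:range...] text format and does not handle the thread-ID suffixes that appear in gtid_owned.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-4:7,uuid2:9' into {uuid: [(start, end), ...]}, ends inclusive."""
    result = {}
    # Entries for different server UUIDs are comma-separated; line breaks
    # between them (as in the SHOW VARIABLES output) are ignored.
    for entry in text.replace("\n", "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *ranges = entry.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            first, _, last = r.partition("-")
            start = int(first)
            end = int(last) if last else start  # a single gno is a 1-element range
            intervals.append((start, end))
    return result


def contains_gtid(gtid_set, gtid):
    """Return True if a single GTID such as 'uuid:5' lies inside the parsed set."""
    uuid, gno = gtid.rsplit(":", 1)
    gno = int(gno)
    return any(start <= gno <= end
               for start, end in gtid_set.get(uuid.lower(), []))
```

With the value shown above, the entry for 7a07cd08-ac1b-11e2-9fcf-0010184e9e08 parses to the single interval (1, 31), so transaction 31 is contained in the set and transaction 32 is not.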

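Every transaction's GTID on the master has a part that changes and a part that does not. Before discussing this, we need to be clear about the four variables that GTID support maintains:

gtid_purged: the transactions whose binlogs have already been purged. It is a subset of gtid_executed. Since MySQL 5.6.9 it can be set, but only while gtid_executed is empty (for example right after reset master).

gtid_owned: the GTIDs of transactions currently executing, together with the ID of the thread that owns each one.

For example:

mysql> show global variables like '%gtid_owned%'\G
Variable_name: gtid_owned
Value: 7a07cd08-ac1b-11e2-9fcf-0010184e9e08:11560057#67:11560038#89:11560059#7:11560034#32:11560053#56:11560052#112:11560055#128:11560054#65:11559997#96:11560056#90:11560051#85:11560058#39:11560061#12:11560060#125:11560035#62:11560062#5
1 row in set (0.01 sec)

gtid_executed: the transactions that have been executed on this instance. Running reset master empties it. We can also influence gtid_executed by setting gtid_next and executing an empty transaction.

gtid_next: a session-level variable holding the next GTID to be used.

In memory, a global object gtid_state corresponds to gtid_purged, gtid_owned, and gtid_executed.

gtid_state maintains three sets: logged_gtids corresponds to gtid_executed, lost_gtids to gtid_purged, and owned_gtids to gtid_owned.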

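While the master executes a transaction, GTID handling involves the following parts:

When the transaction starts and its first SQL statement executes, before the first "begin" query event is written, a group is allocated in the group_cache of the binlog cache (group_cache::add_logged_group) and a gtid_log_event is written. No transaction ID has been assigned at this point. The backtrace is:

handler::ha_write_row->binlog_log_row->write_locked_table_maps->thd::binlog_write_table_map->binlog_start_trans_and_stmt->binlog_cache_data::write_event->group_cache::add_logged_group

It is not yet clear to me when a single transaction would have more than one GTID group in its group_cache.

In the flush stage of binlog group commit:

Step one: group_cache::generate_automatic_gno is called to generate a GTID for the current thread, assign it to thd->owned_gtid, and add it to owned_gtids. The backtrace is:

mysql_bin_log::process_flush_stage_queue->mysql_bin_log::flush_thread_caches->binlog_cache_mngr::flush->binlog_cache_data::flush->gtid_before_write_cache->group_cache::generate_automatic_gno->gtid_state::acquire_ownership->owned_gtids::add_gtid_owner

In other words, a transaction is assigned its GTID only once it has finished and its binlog cache is about to be flushed to the binlog file.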

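When gtid_next is of type automatic, generate_automatic_gno is called to generate the transaction ID (gno). The allocation goes roughly as follows:

1.gtid_state->lock_sidno(automatic_gtid.sidno); lock the current sidno so that allocation is mutually exclusive

2.gtid_state->get_automatic_gno(automatic_gtid.sidno); obtain the transaction ID

|–>initialize the candidate gno to 1

|–>scan logged_gtids[$sidno], taking each gno interval (iv) in turn:

|–>while candidate < iv->start (or max_gno if iv is null), check whether the candidate is already owned; if it is not, use it and return from the function, otherwise candidate++ and repeat this step

|–>set candidate to iv->end, advance iv to the next interval, and continue the scan

This shows that the allocation accounts for gaps in the intervals, so the gno handed out is not necessarily the globally largest one. However, as long as gtid_next is never set by hand on the master, we can treat gnos on the master as always increasing.

3.gtid_state->acquire_ownership(thd, automatic_gtid);

|–>add it to the owned_gtids set (owned_gtids.add_gtid_owner) and assign thd->owned_gtid= gtid

4.gtid_state->unlock_sidno(automatic_gtid.sidno); unlock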
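
The candidate scan in step 2 can be modeled in a few lines of Python. This is only a sketch of the logic described above, not the server code: intervals are assumed to be sorted, non-overlapping, and half-open (iv->end is the first gno after the interval), owned gnos are a plain set, and MAX_GNO and the RuntimeError are stand-ins I chose for the sketch.

```python
MAX_GNO = 2**63 - 1  # assumed upper bound, used only by this sketch


def get_automatic_gno(logged_intervals, owned_gnos):
    """Return the smallest gno that is neither logged nor currently owned.

    logged_intervals: sorted, non-overlapping half-open intervals (start, end)
                      for one sidno, standing in for logged_gtids[sidno].
    owned_gnos:       set of gnos owned by transactions that are still running.
    """
    candidate = 1
    intervals = iter(logged_intervals)
    while True:
        iv = next(intervals, None)
        next_start = iv[0] if iv is not None else MAX_GNO
        # Walk the gap in front of the current interval.
        while candidate < next_start:
            if candidate not in owned_gnos:
                return candidate
            candidate += 1
        if iv is None:
            raise RuntimeError("gno space exhausted")
        # Jump past the interval; its end is exclusive.
        candidate = iv[1]
```

For logged intervals [(1, 5), (8, 10)] with nothing owned, the sketch returns 5, which illustrates the point about gaps: the hole at 5 to 7 is filled before anything above 9 is handed out.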

Step two: gtid_state::update_on_flush is called to add the current transaction's GTID to logged_gtids. The backtrace is:

mysql_bin_log::process_flush_stage_queue->mysql_bin_log::flush_thread_caches->binlog_cache_mngr::flush->binlog_cache_data::flush->mysql_bin_log::write_cache->gtid_state::update_on_flush

In the commit stage of binlog group commit:

gtid_state::update_owned_gtids_impl is called to remove the current transaction's GTID from owned_gtids. The backtrace is:

mysql_bin_log::ordered_commit->mysql_bin_log::finish_commit->gtid_state::update_owned_gtids_impl

The steps above modify logged_gtids and owned_gtids. lost_gtids, apart from being set up at startup, is maintained when a purge is executed.

For example, when we run purge binary logs to 'mysql-bin.000205', mysql-bin.index is updated first; then the previous_gtids_log_event of the first binlog file listed in the index is read and the lost_gtids set is updated. The backtrace is:

purge_master_logs->mysql_bin_log::purge_logs->mysql_bin_log::init_gtid_sets->read_gtids_from_binlog->previous_gtids_log_event::add_to_set->gtid_set::add_gtid_encoding->gtid_set::add_gno_interval

After restarting MySQL, gtid_executed and gtid_purged have the same values as before the restart.

GTID persistence is managed through the global object gtid_state. At startup, gtid_server_init allocates memory for gtid_state; if binlog is enabled, further initialization follows:

quoted code:

if (mysql_bin_log.init_gtid_sets(
      const_cast<Gtid_set *>(gtid_state->get_logged_gtids()),
      const_cast<Gtid_set *>(gtid_state->get_lost_gtids()),
      opt_master_verify_checksum,
      true/*true=need lock*/))
  unireg_abort(1);

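gtid_state contains three GTID sets: logged_gtids, lost_gtids, and owned_gtids. The first two are of type gtid_set; owned_gtids is of type owned_gtids.

mysql_bin_log::init_gtid_sets initializes logged_gtids and lost_gtids. Its logic, briefly:

1.Scan the binlog index file (mysql-bin.index), collect the binlog file names, and add them to filename_list

2.Starting from the last file and reading backwards, call read_gtids_from_binlog on each file in turn:

|–>open the binlog file; if a previous_gtids_log_event is read:

(1) always add it to logged_gtids (prev_gtids_ev->add_to_set(all_gtids))

(2) if this file is the first binlog file, also add it to lost_gtids (prev_gtids_ev->add_to_set(prev_gtids))

|–>for each gtid_log_event:

(1) read the event's sidno: sidno= gtid_ev->get_sidno(false);

This is a 32-bit integer that stands for a server_uuid and is numbered from 1; the main motivation is to save memory. The mapping is kept in the global object global_sid_map.

If the sid is not yet in the map, global_sid_map->add_sid(sid) is called; sidno values increase from 1.

(2) all_gtids->ensure_sidno(sidno)

all_gtids is of type gtid_set and can be thought of as a collection; ensure_sidno makes sure the collection can hold at least sidno elements

(3) all_gtids->_add_gtid(sidno, gtid_ev->get_gno())

add the GTID recorded in the event to all_gtids[sidno]. This ends up in gtid_set::add_gno_interval, which actually adds the interval (gno, gno+1), that is, start gno with exclusive end gno+1; interval merging, intersection, and similar operations are involved here.

If the file just read contains neither a previous_gtids_log_event nor a gtid_log_event, continue with the file before it

If only a previous_gtids_log_event is present, read_gtids_from_binlog returns got_previous_gtids

If gtid_log_event events are present as well, it returns got_gtids

There is an obvious problem here: if gtid_mode was not in use before the restart and a large number of binlogs were generated, this restart may have to scan a large number of binlog files. This is a very obvious bug, discussed further below.

3.If the scan in step 2 did not reach the first file, scan forward from the first file in the same way as step 2, read the first previous_gtids_log_event, and add it to lost_gtids.

Put simply, if gtid_mode has been on all along, reading only the first and the last binlog file is enough to determine the two GTID sets logged_gtids and lost_gtids.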
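
The three steps can be sketched as a small Python model. It is a simplification under stated assumptions, not the server implementation: each binlog file is represented by a hypothetical dict with a "previous" key (the GTIDs of its previous_gtids_log_event, or None when the file has no such event) and a "gtids" key (the GTIDs of its gtid_log_event entries), and GTIDs are plain strings in Python sets rather than interval sets.

```python
def read_gtids_from_binlog(binlog, all_gtids=None, prev_gtids=None):
    """Model of one file scan.

    Returns 'got_gtids', 'got_previous_gtids' or 'no_gtids'.
    """
    result = "no_gtids"
    if binlog["previous"] is not None:
        result = "got_previous_gtids"
        if all_gtids is not None:
            all_gtids |= binlog["previous"]
        if prev_gtids is not None:
            prev_gtids |= binlog["previous"]
    if binlog["gtids"]:
        result = "got_gtids"
        if all_gtids is not None:
            all_gtids |= binlog["gtids"]
    return result


def init_gtid_sets(binlogs):
    """Return (logged_gtids, lost_gtids) for binlog files listed oldest first."""
    logged, lost = set(), set()
    reached_first_file = not binlogs
    # Step 2: walk backwards from the newest file until one yields GTID data.
    for index in range(len(binlogs) - 1, -1, -1):
        reached_first_file = index == 0
        status = read_gtids_from_binlog(
            binlogs[index], logged, lost if reached_first_file else None)
        if status != "no_gtids":
            break
    # Step 3: if the backward scan stopped early, read forward from the oldest
    # file until GTID data is found; its previous-GTIDs set is what was purged.
    if not reached_first_file:
        for binlog in binlogs:
            if read_gtids_from_binlog(binlog, None, lost) != "no_gtids":
                break
    return logged, lost
```

When every file carries GTID data, the backward loop stops after the newest file and the forward loop after the oldest, which matches the "first and last file only" observation. When no file carries GTID data, the backward loop visits every file, which is the startup cost behind the bug mentioned above.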

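Because the binlog records each transaction's GTID, the slave's replication threads can keep the GTIDs identical on master and slave by setting the thread-level gtid_next.

By default, thd->variables.gtid_next.type is automatic_group on the master and gtid_group on the slave.

gtid_next as printed from the slave SQL thread: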

(gdb) p thd->variables.gtid_next
$2 = {
  type = gtid_group,
  gtid = {
    sidno = 2,
    gno = 1127,
    static max_text_length = 56
  }
}

These variables are assigned when a gtid_log_event is applied, in gtid_log_event::do_apply_event. The rough flow is:

1.rpl_sidno sidno= get_sidno(true); obtain the sidno

2.thd->variables.gtid_next.set(sidno, spec.gtid.gno); set gtid_next

3.gtid_acquire_ownership_single(thd);

|–>check whether the GTID is already in logged_gtids; if so, return (gtid_pre_statement_checks will then skip the transaction)

|–>if the GTID is owned by another thread, wait (gtid_state->wait_for_gtid(thd, gtid_next)); otherwise make the current thread the owner (gtid_state->acquire_ownership(thd, gtid_next))

As noted above, the current transaction's GTID may already be in logged_gtids, so both rows_log_event::do_apply_event and mysql_execute_command call gtid_pre_statement_checks.

This function also validates the GTID state before each SQL statement executes. Its main flow:

1.When enforce_gtid_consistency is on, check whether the DDL is allowed (thd->is_ddl_gtid_compatible()); if not, return gtid_statement_cancel

2.Check whether the current statement causes an implicit commit while gtid_next is set (gtid_next->type != automatic_group); if so, raise er_cant_do_implicit_commit_in_trx_when_gtid_next_is_set and return gtid_statement_cancel. Note that this check leads to bug#69045

3.begin, commit, rollback, and set option or select statements that do not use stored procedures are always allowed; return gtid_statement_execute

4.If gtid_next->type is undefined_group, raise er_gtid_next_type_undefined_group and return gtid_statement_cancel

5.If gtid_next->type == gtid_group and thd->owned_gtid.sidno == 0, return gtid_statement_skip

Step 5 handles the special case from gtid_acquire_ownership_single, where the GTID was already logged and no ownership was taken.
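
The five checks read naturally as a decision function. The Python sketch below follows the order listed above and nothing more: the parameter names, the string return values, and the final default of "execute" are my own simplifications, and the real function inspects the thd and the parsed statement rather than taking flags.

```python
def gtid_pre_statement_checks(kind, gtid_next_type, owns_gtid, *,
                              implicit_commit=False,
                              ddl_gtid_compatible=True,
                              uses_stored_procedures=False,
                              enforce_gtid_consistency=False):
    """Return 'execute', 'skip' or 'cancel' for one statement.

    kind:           statement kind, e.g. 'begin', 'select', 'insert', 'flush'
    gtid_next_type: 'automatic_group', 'gtid_group' or 'undefined_group'
    owns_gtid:      True if the thread owns a GTID (owned_gtid.sidno != 0)
    """
    # 1. DDL that is not GTID-compatible is rejected when consistency is enforced.
    if enforce_gtid_consistency and not ddl_gtid_compatible:
        return "cancel"
    # 2. Implicit commit while gtid_next is set explicitly (see bug#69045).
    if implicit_commit and gtid_next_type != "automatic_group":
        return "cancel"
    # 3. Transaction control, and SET/SELECT without stored procedures, always run.
    if kind in ("begin", "commit", "rollback"):
        return "execute"
    if kind in ("set", "select") and not uses_stored_procedures:
        return "execute"
    # 4. gtid_next has been consumed and not set again.
    if gtid_next_type == "undefined_group":
        return "cancel"
    # 5. The GTID was already logged, so no ownership was acquired: skip.
    if gtid_next_type == "gtid_group" and not owns_gtid:
        return "skip"
    return "execute"
```

Calling it with kind="flush", implicit_commit=True and gtid_next_type="gtid_group" returns "cancel", which is the situation a slave SQL thread is in when it applies flush privileges from the master.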

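The biggest benefit of GTIDs is of course that we can rearrange the master/slave topology freely. In a normally running replication setup, we can simply run the following SQL on the slave:

change master to master_user='$username', master_host=' ', master_port=' ', master_auto_position=1;

With GTIDs enabled there is no need to specify a binlog file or position; MySQL works these out for us. The key is master_auto_position. The I/O thread's connection to the master can be broken down roughly as follows:

1.After establishing the TCP connection to the master, the I/O thread fetches the master's UUID (get_master_uuid) and then sets a user variable @slave_uuid on the master (io_thread_init_commands)

2.Next it registers the slave on the master (register_slave_on_master)

On the master, register_slave is called to register the slave; the slave's host, user, password, port, server_id, and so on are recorded in the slave_list hash.

3.It calls request_dump to start requesting data from the master. There are two cases:

master_auto_position=0: the command sent to the master is com_binlog_dump, the traditional way of requesting binlogs

master_auto_position=1: the command is com_binlog_dump_gtid, the new way.

Only the second case is discussed here. In that case, the set of GTIDs already executed on the slave is read first:

quoted code in rpl_slave.cc :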

if (command == COM_BINLOG_DUMP_GTID)
{
  // get set of GTIDs
  Sid_map sid_map(NULL/*no lock needed*/);
  Gtid_set gtid_executed(&sid_map);
  global_sid_lock->wrlock();
  gtid_state->dbug_print();
  if (gtid_executed.add_gtid_set(mi->rli->get_gtid_set()) != RETURN_STATUS_OK ||
      gtid_executed.add_gtid_set(gtid_state->get_logged_gtids()) !=
      RETURN_STATUS_OK)

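Once the request packet has been built, it is sent to the master.

When the master receives the command, it calls the entry function com_binlog_dump_gtid. The flow is:

1.slave_gtid_executed.add_gtid_encoding(packet_position, data_size); read the GTID set sent by the slave

2.Read the slave's UUID (get_slave_uuid) and kill zombie dump threads by UUID (kill_zombie_dump_threads)

This is what the earlier set @slave_uuid executed by the slave I/O thread is for.

3.Enter mysql_binlog_send:

|–>call mysql_bin_log::find_first_log_not_in_gtid_set, which scans from the last binlog backwards and reads the previous_gtids_log_event in each file header; if it is a subset of slave_gtid_executed, remember the current binlog file name, otherwise continue scanning backwards.

The purpose of this step is to find the last binlog file the slave has executed into.

|–>scan from the start of that file; on each gtid event, check whether its GTID is contained in slave_gtid_executed: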

Gtid_log_event gtid_ev(packet->ptr() + ev_offset,
                       packet->length() - checksum_size,
                       p_fdle);
skip_group= slave_gtid_executed->contains_gtid(gtid_ev.get_sidno(sid_map),
                                               gtid_ev.get_gno());

The master uses the GTID to decide whether a transaction can be skipped, and from that where sending starts.
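
The master-side selection can be modeled as follows. This is a sketch under simplifying assumptions rather than the dump thread itself: each binlog is a hypothetical tuple (name, previous_gtids, gtids_in_file_order) listed oldest first, GTIDs are plain strings in Python sets, and the LookupError stands in for whatever error the server reports when no suitable file exists.

```python
def find_first_log_not_in_gtid_set(binlogs, slave_executed):
    """Scan from the newest binlog backwards and return the first file whose
    previous-GTIDs set is a subset of what the slave has already executed."""
    for name, previous, _ in reversed(binlogs):
        if previous <= slave_executed:
            return name
    return None


def gtids_to_send(binlogs, slave_executed):
    """Return the GTIDs the master would send, in binlog order."""
    start = find_first_log_not_in_gtid_set(binlogs, slave_executed)
    if start is None:
        raise LookupError("no binlog file the slave can start from")
    sending = False
    out = []
    for name, _, gtids in binlogs:
        sending = sending or name == start
        if sending:
            # Transactions the slave already has are skipped (skip_group).
            out.extend(g for g in gtids if g not in slave_executed)
    return out
```

If the slave has executed u:1 through u:3 and the second of three files starts after u:2, the scan settles on that second file, skips u:3 inside it, and sends everything from u:4 onwards.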

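Note that once master_auto_position is in use, do not specify a binlog position as well, or an error is raised.

When slave replication hits an error, the traditional way to skip it is to set sql_slave_skip_counter and then start slave.

With GTIDs enabled, however, setting it fails: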

mysql> set global sql_slave_skip_counter = 1;
ERROR 1858 (HY000): sql_slave_skip_counter can not be set when the server is running with @@GLOBAL.GTID_MODE = ON. Instead, for each transaction that you want to skip, generate an empty transaction with the same GTID as the transaction

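The error message tells us that the failing transaction can be skipped by generating an empty transaction.

Let us produce a replication error on the slave by hand:

Last_SQL_Error: Error 'Unknown table 'test.t1'' on query. Default database: 'test'. Query: 'DROP TABLE `t1` /* generated by server */'

In the binlog, the GTID of this DDL is 7a07cd08-ac1b-11e2-9fcf-0010184e9e08:1131

Run the following on the slave: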

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)
mysql> set session gtid_next = '7a07cd08-ac1b-11e2-9fcf-0010184e9e08:1131';
mysql> begin; commit;
mysql> set session gtid_next = automatic;
mysql> start slave;

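Check show slave status again and the failing transaction has been skipped. The principle is simple: the GTID of the empty transaction is added to gtid_executed, which tells the slave that the transaction with this GTID has already been executed.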
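
Since this sequence is easy to get wrong by hand, a small helper that only builds the statements for a given GTID may be convenient. The Python sketch below is a hypothetical convenience of mine, not a MySQL tool; it validates that the argument is a single GTID rather than a set and returns the same five statements used above, leaving execution to whatever client you use.

```python
import re

# One server UUID followed by a single positive transaction number.
GTID_RE = re.compile(
    r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}:[1-9][0-9]*$")


def skip_transaction_statements(gtid):
    """Return the statements that inject an empty transaction for one GTID."""
    if not GTID_RE.match(gtid):
        raise ValueError("expected a single GTID like uuid:gno, got %r" % gtid)
    return [
        "stop slave;",
        "set session gtid_next = '%s';" % gtid,
        "begin; commit;",
        # Restore automatic assignment so later statements get normal GTIDs.
        "set session gtid_next = automatic;",
        "start slave;",
    ]
```

Passing a range such as uuid:1-5 raises ValueError on purpose: each transaction to be skipped needs its own empty transaction.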

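Use change master to ... , master_auto_position=1;

Note that gtid_mode must be enabled throughout the whole replication topology.

5.6 provides new until conditions that decide, by GTID, where the slave stops applying:

sql_before_gtids: stop replication before the specified GTIDs

sql_after_gtids: stop replication after the specified GTIDs

The check is done in relay_log_info::is_until_satisfied

With GTIDs enabled, it is in theory better to lower the maximum size of each binlog file so that scanning a file takes less time.

bug#69097: even with gtid_mode off, the binlog files are scanned at startup.

If gtid_mode was not in use before the restart, all binlog files may be scanned after it; with many binlog files this is clearly unacceptable.

bug#69096: replication errors cannot be skipped via gtid_next_list, because gtid_next_list is not compiled in by default.

todo: the gtid_next_list logic is not covered above; to be looked at when time permits.

bug#69095: set the slave's replication format (binlog_format) to statement/mixed and the master's to row; executing DML then breaks replication on the slave

Last_SQL_Error: Error executing row event: 'Cannot execute statement: impossible to write to binary log since statement is in row format and BINLOG_FORMAT = STATEMENT.'

Backtrace of the check that raises the error:

handle_slave_worker->slave_worker_exec_job->rows_log_event::do_apply_event->open_and_lock_tables->open_and_lock_tables->lock_tables->thd::decide_logging_format

Workaround: set the slave's replication format to 'row' so that master and slave match

This bug is not related to GTIDs

bug#69045: when the master runs something like flush privileges and both master and slave have gtid_mode enabled, replication breaks

Last_SQL_Error: Error 'Cannot execute statements with implicit commit inside a transaction when @@SESSION.GTID_NEXT != AUTOMATIC or @@SESSION.GTID_NEXT_LIST != NULL.' on query. Default database: ''. Query: 'flush privileges'

This is another very basic bug. In MySQL 5.6.11, a statement that can cause an implicit commit requires gtid_next to be automatic, so the slave's replication threads break easily. The check lives in gtid_pre_statement_checks.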
