[MySQL 5.6] MySQL 5.6 group commit: performance test and internal implementation

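MariaDB and Facebook fixed this notorious problem a long time ago, but the official MySQL release only fixed it in version 5.6. This article focuses on the following points:

1. How MySQL 5.6 performs

2. The three-stage group commit implementation in 5.6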

MySQL 5.6 provides two parameters to control binlog group commit:

<a href="http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_max_flush_queue_time">binlog_max_flush_queue_time</a>

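The unit is microseconds. This is the timeout for taking transactions from the flush queue; it mainly prevents the response time of some transactions from rising when transaction concurrency is very high.

Read the function MYSQL_BIN_LOG::process_flush_stage_queue to see how it is used.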

<a href="http://dev.mysql.com/doc/refman/5.6/en/replication-options-binary-log.html#sysvar_binlog_order_commits">binlog_order_commits</a>

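When set to 0, transactions may be committed in an order different from their order in the binlog. As the test below shows, this improves performance slightly, but not dramatically.

As usual, let's look at performance first.

The test uses sysbench with a fully in-memory workload: 5 sbtest tables with 1,000,000 rows each.

Basic configuration: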

innodb_flush_log_at_trx_commit=1

table_open_cache_instances=5

metadata_locks_hash_instances = 32

metadata_locks_cache_size=2048

performance_schema_instrument = '%=on'

performance_schema=on

innodb_lru_scan_depth=8192

innodb_purge_threads = 4

Disable the performance schema consumers:

mysql> update setup_consumers set enabled = 'no';

Query OK, 4 rows affected (0.02 sec)

Rows matched: 12 Changed: 4 Warnings: 0

sysbench/sysbench --debug=off --test=sysbench/tests/db/update_index.lua --oltp-tables-count=5 --oltp-point-selects=0 --oltp-table-size=1000000 --num-threads=1000 --max-requests=10000000000 --max-time=7200 --oltp-auto-inc=off --mysql-engine-trx=yes --mysql-table-engine=innodb --oltp-test-mod=complex --mysql-db=test --mysql-host=$host --mysql-port=3306 --mysql-user=xx run

update_index.lua results (TPS):

| threads | sync_binlog = 0 | sync_binlog = 1 | sync_binlog = 1, binlog_order_commits = 0 |
|---|---|---|---|
| | 900 | 610 | 620 |
| | 13,800 | 7,000 | 7,400 |
| | 20,000 | 14,500 | 16,000 |
| 120 | 25,100 | 21,054 | 23,000 |
| 200 | 27,900 | 25,400 | 27,800 |
| 400 | 33,100 | 30,700 | 31,300 |
| 600 | 32,800 | 31,500 | 29,326 |
| 1000 | 20,400 | 20,200 | 20,500 |

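On my machine, the CPU is almost fully consumed at 1000 concurrent threads.

As the numbers show, the higher the concurrency, the more effective group commit becomes. At 600 or more concurrent threads, the TPS with sync_binlog=1 is almost the same as with sync_binlog=0.

The catch is that our production workloads rarely reach such high pressure, and under low load sync_binlog=1 still adds overhead to each individual thread.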
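To put numbers on the two observations above, the relative throughput loss of sync_binlog=1 against sync_binlog=0 can be computed from the results. The sketch below is purely illustrative: `Row`, `benchmark_rows`, and `relative_loss` are names invented here, and it uses only the result rows whose thread count is given.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One benchmark row: thread count and TPS for sync_binlog=0 and sync_binlog=1.
struct Row {
  int threads;
  double tps_sync0;
  double tps_sync1;
};

// Rows copied from the results above (only those with a known thread count).
inline std::vector<Row> benchmark_rows() {
  return {{120, 25100, 21054},
          {200, 27900, 25400},
          {400, 33100, 30700},
          {600, 32800, 31500},
          {1000, 20400, 20200}};
}

// Fraction of the sync_binlog=0 throughput that is lost with sync_binlog=1.
inline double relative_loss(const Row &r) {
  return 1.0 - r.tps_sync1 / r.tps_sync0;
}
```

With these figures the loss shrinks from about 16% at 120 threads to about 4% at 600 threads and about 1% at 1000 threads.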

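The rest of this article focuses on how the binlog does group commit in 5.6. In 5.6 the binlog commit is split into three stages: flush stage, sync stage, and commit stage. The approach is similar to MariaDB's: a queue is maintained, the first thread to enter the queue becomes the leader, and every later thread becomes a follower. The leader collects the followers' transactions and performs the sync, while the followers wait for the leader to notify them that the operation is done.

Each of the three stages maintains its own queue:

Mutex_queue m_queue[STAGE_COUNTER];

THDs of different sessions are linked through thd->next_to_commit. Although three queues are maintained across the three stages below, all THDs in a queue are in fact chained together through next_to_commit.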
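The leader/follower rule and the next_to_commit chaining described above can be sketched as follows. This is a simplified model, not the MySQL code: `Session` and `StageQueue` are hypothetical names, and only the `next_to_commit` link is taken from the article.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a THD; only the link field mirrors the article.
struct Session {
  explicit Session(int i) : id(i) {}
  int id;
  Session *next_to_commit = nullptr;
};

// Simplified stage queue: whoever enrolls into an empty queue is the leader.
class StageQueue {
 public:
  // Appends a session (or an already linked chain of sessions) to the queue.
  // Returns true if the caller became the leader of this stage.
  bool enroll(Session *first) {
    std::lock_guard<std::mutex> guard(lock_);
    const bool leader = (head_ == nullptr);
    if (leader) {
      head_ = first;
    } else {
      tail_->next_to_commit = first;
    }
    // Move the tail to the end of the chain that was just appended.
    Session *last = first;
    while (last->next_to_commit != nullptr) last = last->next_to_commit;
    tail_ = last;
    return leader;
  }

  // Detaches the whole chain and leaves the queue empty, so that the next
  // session to enroll becomes a new leader.
  Session *fetch() {
    std::lock_guard<std::mutex> guard(lock_);
    Session *chain = head_;
    head_ = nullptr;
    tail_ = nullptr;
    return chain;
  }

 private:
  std::mutex lock_;
  Session *head_ = nullptr;
  Session *tail_ = nullptr;
};
```

Because `enroll` accepts a chain, a leader can hand the batch it collected in one stage to the queue of the next stage in a single call, where it may itself end up as a follower of an earlier batch.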

In the binlog's XA commit phase (MYSQL_BIN_LOG::commit), once the transaction's final Xid event is complete, execution enters MYSQL_BIN_LOG::ordered_commit and the three-stage flow begins:

###flush stage

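change_stage(thd, Stage_manager::FLUSH_STAGE, thd, NULL, &LOCK_log)

|–>stage_manager.enroll_for(stage, queue, leave_mutex) // adds the current thread to m_queue[FLUSH_STAGE]. The first thread in the queue becomes the leader; any other thread is a follower and sleeps here until the leader wakes it (m_cond_done)

|–>the leader thread holds the LOCK_log lock and returns false from change_stage.

flush_error= process_flush_stage_queue(&total_bytes, &do_rotate, &wait_queue); // only the leader thread enters this logic

|–>first reads the queue until it is empty or the timeout expires (the timeout is controlled by binlog_max_flush_queue_time), calling flush_thread_caches on each thread it reads to flush that thread's binlog into the cache. Note that new sessions may be appended to the queue while it is being dequeued, which is exactly why the timeout exists

|–>on timeout, if sessions remain in the queue, takes the head of the whole queue and empties the original queue (fetch_queue_for), then calls flush_thread_caches on the fetched sessions

|–>checks whether the total number of bytes written to the binlog exceeds max_binlog_size; if so, sets the rotate flag

flush_error= flush_cache_to_file(&flush_end_pos);

|–>writes the contents of the I/O cache to the file

signal_update() // notifies the dump threads that there is new binlog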
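The two-phase dequeue performed by the flush-stage leader (pop one session at a time until the queue is empty or the timeout expires, then take the remainder in one batch) can be sketched as below. This is a single-threaded model under stated assumptions: `drain_flush_queue` and `DrainResult` are hypothetical names, sessions are plain integers, the clock and the concurrent arrivals are injected as callbacks, and treating a timeout of 0 as "no timeout" is an assumption of the sketch.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

struct DrainResult {
  std::vector<int> flushed;  // session ids in the order they were flushed
  bool timed_out = false;    // true if the remainder was taken as one batch
};

// Hypothetical model of the leader's dequeue loop in the flush stage.
// now_us:    returns the current time in microseconds.
// after_pop: stand-in for other sessions enqueueing while the leader works.
DrainResult drain_flush_queue(std::deque<int> &queue, long timeout_us,
                              const std::function<long()> &now_us,
                              const std::function<void()> &after_pop) {
  DrainResult result;
  const long start = now_us();
  // Phase 1: pop sessions one by one until the queue is empty or time is up.
  while (!queue.empty()) {
    if (timeout_us > 0 && now_us() - start >= timeout_us) {
      result.timed_out = true;
      break;
    }
    result.flushed.push_back(queue.front());
    queue.pop_front();
    after_pop();
  }
  // Phase 2: on timeout, take everything still queued in one go and leave
  // the queue empty for the next leader.
  if (result.timed_out) {
    result.flushed.insert(result.flushed.end(), queue.begin(), queue.end());
    queue.clear();
  }
  return result;
}
```

Without the timeout, a steady stream of new arrivals would keep the leader in phase 1, and the sessions already flushed would wait longer and longer for the sync; the timeout bounds that wait.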

###sync stage

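change_stage(thd, Stage_manager::SYNC_STAGE, wait_queue, &LOCK_log, &LOCK_sync)

|–>stage_manager.enroll_for(stage, queue, leave_mutex) // adds the current thread to the m_queue[SYNC_STAGE] queue and releases the LOCK_log lock. As before, the leader of the SYNC_STAGE queue returns immediately; any other thread does a condition wait.

|–>the leader thread takes the LOCK_sync lock

final_queue= stage_manager.fetch_queue_for(Stage_manager::SYNC_STAGE); // fetches and empties the SYNC_STAGE queue; the result is mainly used in the commit stage

std::pair<bool, bool> result= sync_binlog_file(false); // syncs the binlog file (if sync_binlog is set)

Put simply, the flush stage forms n batches of sessions, and in the sync stage a new leader emerges from those n batches to perform the sync, which is the most expensive operation.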

###commit stage

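The commit stage is governed by the binlog_order_commits parameter.

When binlog_order_commits is off, LOCK_sync is unlocked directly and each session enters the InnoDB commit phase on its own (through the later call to finish_commit(thd)). This does not guarantee that the binlog order and the transaction commit order match. If you do not care about the binlog information recorded in InnoDB's ibdata, you can turn this option off for a small performance gain.

The commit stage described below is entered only when binlog_order_commits is on:

change_stage(thd, Stage_manager::COMMIT_STAGE, final_queue, &LOCK_sync, &LOCK_commit)

|–>enters the new COMMIT_STAGE queue and releases the LOCK_sync lock; the new leader acquires the LOCK_commit lock while the other sessions wait

THD *commit_queue= stage_manager.fetch_queue_for(Stage_manager::COMMIT_STAGE); // fetches and empties the COMMIT_STAGE queue

process_commit_stage_queue(thd, commit_queue, flush_error)

|–>walks all the threads and calls ha_commit_low->innobase_commit to commit them one by one in the InnoDB layer

After the steps above, the LOCK_commit lock is released.

stage_manager.signal_done(final_queue);

|–>sets the pending flag of every pending thread to false (thd->transaction.flags.pending= false) and broadcasts on m_cond_done to wake the pending threads

(void) finish_commit(thd); // if binlog_order_commits is set to false, this step commits the storage engine transaction; it also updates the GTID information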
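The wake-up performed by signal_done above (clear each session's pending flag, then broadcast on a condition variable) can be sketched with standard C++ primitives. This is a simplified model, not the MySQL code: `PendingSession` and `DoneSignal` are hypothetical names, and only the pending flag, the next_to_commit link, and the m_cond_done broadcast come from the article.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a THD waiting on its leader.
struct PendingSession {
  bool pending = true;
  PendingSession *next_to_commit = nullptr;
};

class DoneSignal {
 public:
  // Follower side: block until the leader has cleared this session's flag.
  // Returns immediately if the flag was already cleared.
  void wait_done(PendingSession *s) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_done_.wait(lock, [s] { return !s->pending; });
  }

  // Leader side: clear the pending flag on every session in the chain,
  // then wake all waiting followers with one broadcast.
  void signal_done(PendingSession *queue) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (PendingSession *s = queue; s != nullptr; s = s->next_to_commit) {
        s->pending = false;
      }
    }
    cond_done_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_done_;
};
```

Each follower thread would call `wait_done` on its own session after enrolling, and the leader would call `signal_done` once on the head of the chain; checking the flag under the mutex means a follower that arrives after the broadcast does not sleep forever.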

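InnoDB's group commit is similar to MariaDB's: there are only two syncs, one in the prepare phase and one for the binlog file (with the "double 1" configuration, that is, innodb_flush_log_at_trx_commit=1 and sync_binlog=1). To guarantee that, on rotate, the redo log for all events of the previous binlog has been flushed to disk, the function new_file_impl calls the following code:

if (DBUG_EVALUATE_IF("expire_logs_always", 0, 1)

&& (error= ha_flush_logs(NULL)))

goto end;

ha_flush_logs calls the storage engine interface to flush the log files.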
