天天看點

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03

線上最近出現了批量insert的死鎖,百思不得解。死鎖記錄如下:

2018-10-26T11:04:41.759589Z 8530809 [Note] InnoDB: 
*** (1) TRANSACTION:

TRANSACTION 1202026765, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 8532863, OS thread handle 139858337453824, query id 16231472122 10.111.10.143 seewo update
INSERT IGNORE INTO xx_performance_type_label_relation(label_id, performance_type_id, type, create_time)
    VALUES
      
      ('bb0394e670644168a998a93a3ed521bc', '06b96ee0bab84d71bb17bf9645d3aa54', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '27d82e2331b241e1a9c9c0a74ec21099', -1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '3100b5978fb24f56b327d25732a7d7a7', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '435a1e19ce6e4e5bbb84240b3b34cf03', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '447fe27199ca40e289ef2834469d9a78', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '87a52c4d00844b5bb9eb75e8fe34202a', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', 'c6a0e26983bd4fae837d5ee2f4efeef8', 1, now())
2018-10-26T11:04:41.759635Z 8530809 [Note] InnoDB: *** (1) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026765 lock_mode X locks gap before rec insert intention waiting
2018-10-26T11:04:41.759674Z 8530809 [Note] InnoDB: *** (2) TRANSACTION:

TRANSACTION 1202026764, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 8530809, OS thread handle 139858469242624, query id 16231472119 10.111.10.153 seewo update
INSERT IGNORE INTO xx_performance_type_label_relation(label_id, performance_type_id, type, create_time)
    VALUES
      
      ('bb0394e670644168a998a93a3ed521bc', '06b96ee0bab84d71bb17bf9645d3aa54', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '27d82e2331b241e1a9c9c0a74ec21099', -1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '3100b5978fb24f56b327d25732a7d7a7', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '435a1e19ce6e4e5bbb84240b3b34cf03', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '447fe27199ca40e289ef2834469d9a78', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', '87a52c4d00844b5bb9eb75e8fe34202a', 1, now())
     , 
      ('bb0394e670644168a998a93a3ed521bc', 'c6a0e26983bd4fae837d5ee2f4efeef8', 1, now())
2018-10-26T11:04:41.759713Z 8530809 [Note] InnoDB: *** (2) HOLDS THE LOCK(S):

RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026764 lock mode S
2018-10-26T11:04:41.759753Z 8530809 [Note] InnoDB: *** (2) WAITING FOR THIS LOCK TO BE GRANTED:

RECORD LOCKS space id 505 page no 9912 n bits 288 index uk_performance_type_id_label_id of table `masaike`.`xx_performance_type_label_relation` trx id 1202026764 lock_mode X locks gap before rec insert intention waiting
2018-10-26T11:04:41.759784Z 8530809 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (2)
           

第一反應是批量insert,insert的順序不一樣導緻的死鎖。但是這個在這裡是不成立的。原因有兩點

出現問題的批量插入SQL中順序是一模一樣的,在順序一樣的情況下,隻會進行插入等待(implicit lock轉explicit X鎖)下面有實驗

如果是因為批量插入順序不一緻帶來的死鎖日志,列印的結果不是等待插入意向鎖(insert intention waiting),下面有實驗

現在采用一個簡化的表,做實驗

CREATE TABLE `t1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` varchar(5)  NOT NULL DEFAULT '',
  `b` varchar(5)  NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_name` (`a`,`b`)
) ENGINE=InnoDB;
           

實驗 01

在記錄不存在的情況下,兩個同樣順序的批量insert同時執行,第二個會進行鎖等待狀态

首先truncate t1;

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03

可以看到目前鎖的狀态

mysql> select * from information_schema.innodb_locks;
+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
| lock_id     | lock_trx_id | lock_mode | lock_type | lock_table | lock_index | lock_space | lock_page | lock_rec | lock_data |
+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
| 31AE:54:4:2 | 31AE        | S         | RECORD    | `d1`.`t1`  | `uk_name`  |         54 |         4 |        2 | '1', '1'  |
| 31AD:54:4:2 | 31AD        | X         | RECORD    | `d1`.`t1`  | `uk_name`  |         54 |         4 |        2 | '1', '1'  |
+-------------+-------------+-----------+-----------+------------+------------+------------+-----------+----------+-----------+
           

在我們執行事務t1的insert時,沒有在任何鎖的斷點處出現,這跟MySQL插入的原理有關系

insert 加的是隐式鎖。什麼是隐式鎖?隐式鎖的意思就是沒有鎖

在t1插入記錄時,是不加鎖的。這個時候事務t1還未送出的情況下,事務t2嘗試插入的時候,發現有這條記錄,t2嘗試擷取S鎖,會判定記錄上的事務id是否活躍,如果活躍的話,說明事務未結束,會幫t1把它的隐式鎖提升為顯式鎖(X鎖)

源碼如下:

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03

t2擷取S鎖的結果:DB_LOCK_WAIT

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03

實驗02

批量插入順序不一緻的導緻的死鎖日志不是等待插入意向鎖

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03
------------------------
LATEST DETECTED DEADLOCK
------------------------
181101  9:48:36
*** (1) TRANSACTION:
TRANSACTION 3309, ACTIVE 215 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2
MySQL thread id 2, OS thread handle 0x70000a845000, query id 58 localhost root update
insert into t1(a, b)values("2", "2")
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 3309 lock mode S waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 32; asc 2;;
 1: len 1; hex 32; asc 2;;
 2: len 4; hex 80000002; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 330A, ACTIVE 163 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 376, 2 row lock(s), undo log entries 2
MySQL thread id 3, OS thread handle 0x70000a888000, query id 59 localhost root update
insert into t1(a, b)values("1", "1")
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 330A lock_mode X locks rec but not gap
Record lock, heap no 3 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 32; asc 2;;
 1: len 1; hex 32; asc 2;;
 2: len 4; hex 80000002; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 55 page no 4 n bits 72 index `uk_name` of table `d1`.`t1` trx id 330A lock mode S waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 1; hex 31; asc 1;;
 1: len 1; hex 31; asc 1;;
 2: len 4; hex 80000001; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
           

到目前為止,已經陷入了僵局,完全沒法複現死鎖的情況。看了代碼,發現在insert之前有一個delete,但是delete與insert不在一個事務裡面,也就是delete送出以後,才進行批量insert,真正出問題的地方在批量insert的地方。一開始就排除了delete對後面的影響,難道不在一個事務,也會有影響?

寫了一個代碼去模拟,有很大機率會複現

fun test() {
    dao.delete() // 對應delete from
    // sleep for 10ms
    dao.insert() // 對應insert ignore
}
           

對應的 SQL 如下,注意是兩個事務

begin;
delete from t1 where a = '25'
commit;

begin;
INSERT ignore INTO `t1` (`a`, `b`) VALUES('25','1')
commit;
           

這個代碼在兩個線程同時調用的時候,非常容易死鎖。

後來翻遍了網上相關的死鎖案例,有一個關于purge删除的過程可能跟這個有關系。

如果标記為删除,說明事務已經送出,還沒來得及 purge,這時後面的事務加S鎖等待;

在源碼中列印一些日志。

1.在storage/innobase/row/row0ins.c的row_ins_set_shared_rec_lock增加日志,可以看到對唯一索引增加S鎖的過程

if (dict_index_is_clust(index)) {
    err = lock_clust_rec_read_check_and_lock(
        0, block, rec, index, offsets, LOCK_S, type, thr);
} else {
    err = lock_sec_rec_read_check_and_lock(
        0, block, rec, index, offsets, LOCK_S, type, thr);
    // 增加如下日志
    fprintf(stderr, "row_ins_set_shared_rec_lock %s %lu %d\n" , index->name, type, err);
}
           

2.在lock_rec_enqueue_waiting增加日志,可以看到鎖等待的情況

static
enum db_err
lock_rec_enqueue_waiting(
{
	fprintf(stderr, "lock_rec_enqueue_waiting::::: %s %lu\n" , index->name, type_mode);
}
           

日志大概如下

row_ins_set_shared_rec_lock uk_name 0 9 (t1擷取S鎖成功)
row_ins_set_shared_rec_lock uk_name 0 9 (t2擷取S鎖成功)

lock_rec_enqueue_waiting::::: uk_name 2563(t1 X鎖進如鎖等待)
lock_rec_enqueue_waiting::::: uk_name 2563(t2 X鎖進如鎖等待)
           

其中2563=2048+512+3=LOCK_INSERT_INTENTION+LOCK_GAP+LOCK_X

這個過程跟非常經典的三個事務同時insert,一個復原,剩下的兩個事務一個成功,一個死鎖,其實是一模一樣的原理。

實驗03

三個 insert ignore,一個復原造成的死鎖

insert語句都是insert ignore into t1(a, b)values(“1”, “1”);以下省略

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03

死鎖日志,跟我們案例中的一模一樣

------------------------
LATEST DETECTED DEADLOCK
------------------------
181101 23:22:59
*** (1) TRANSACTION:
TRANSACTION 5032, ACTIVE 11 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 5, OS thread handle 0x70000d736000, query id 125 localhost root update
insert ignore into t1(a, b)values("1", "1")
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5032 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 3; hex 313031; asc 101;;
 1: len 3; hex 313031; asc 101;;
 2: len 4; hex 800007b1; asc     ;;

*** (2) TRANSACTION:
TRANSACTION 5033, ACTIVE 6 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 2 row lock(s), undo log entries 1
MySQL thread id 6, OS thread handle 0x70000d779000, query id 126 localhost root update
insert ignore into t1(a, b)values("1", "1")
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5033 lock mode S locks gap before rec
Record lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 3; hex 313031; asc 101;;
 1: len 3; hex 313031; asc 101;;
 2: len 4; hex 800007b1; asc     ;;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 56 page no 4 n bits 584 index `uk_name` of table `d1`.`t1` trx id 5033 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 139 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 3; hex 313031; asc 101;;
 1: len 3; hex 313031; asc 101;;
 2: len 4; hex 800007b1; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
           

目前來看,得到的結論是:

一個已送出但是未purge掉的記錄會造成後續插入擷取S共享鎖,兩個事務同時擷取S鎖,然後嘗試擷取插入意向鎖,造成死鎖

網上大神梳理的insert流程

首先對插入的間隙加插入意向鎖(Insert Intension Locks)

如果該間隙已被加上了 GAP 鎖或 Next-Key 鎖,則加鎖失敗進入等待;

如果沒有,則加鎖成功,表示可以插入;

然後判斷插入記錄是否有唯一鍵,如果有,則進行唯一性限制檢查

如果不存在相同鍵值,則完成插入

如果存在相同鍵值,則判斷該鍵值是否加鎖

如果沒有鎖, 判斷該記錄是否被标記為删除

如果标記為删除,說明事務已經送出,還沒來得及 purge,這時加 S 鎖等待;

如果沒有标記删除,則報 1062 duplicate key 錯誤;

如果有鎖,說明該記錄正在處理(新增、删除或更新),且事務還未送出,加 S 鎖等待;

插入記錄并對記錄加 X 記錄鎖;

上面是我 debug 源碼得到的一些結論,如果我的了解有誤的話,可以留言讨論哦~

原文連結:https://club.perfma.com/article/2342623?utm_source=tuicool&utm_medium=referral

如果覺得本文對你有幫助,可以關注一下我公衆号,回複關鍵字【面試】即可得到一份Java核心知識點整理與一份面試大禮包!另有更多技術幹貨文章以及相關資料共享,大家一起學習進步!

MySQL死鎖套路:一次詭異的批量插入死鎖問題分析實驗 01實驗02實驗03