MySQL Deadlocks

https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks.html

What is a deadlock in MySQL?

A deadlock is a situation where different transactions are unable to proceed because each holds a lock that the other needs. Because both transactions are waiting for a resource to become available, neither ever release the locks it holds.

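In short, the definition boils down to two ideas: circular wait (each holds a lock that the other needs) and no preemption (neither ever release the locks it holds).

In fact, the four classic necessary conditions for deadlock can, for practical purposes, be reduced to these two. The remaining two, mutual exclusion and hold-and-wait, are well-known background conditions that any lock-based system already satisfies.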
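The circular-wait condition can be made concrete with a wait-for graph: a deadlock exists exactly when following "waits for" edges leads back to a transaction already visited. The following is a minimal sketch of that idea, not InnoDB's actual detector; the function name and the simplification that each transaction waits for at most one other are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    waits_for maps a blocked transaction to the transaction that holds
    the lock it needs (assumption: each waits for at most one other).
    """
    for start in waits_for:
        path = []
        seen = set()
        node = start
        # Follow "waits for" edges until we reach an unblocked
        # transaction or revisit one already on the path.
        while node in waits_for and node not in seen:
            seen.add(node)
            path.append(node)
            node = waits_for[node]
        if node in seen:
            # The walk re-entered the path at `node`: the cycle starts there.
            return path[path.index(node):]
    return None


# A waits for B and B waits for A: circular wait, hence deadlock.
print(find_wait_cycle({"A": "B", "B": "A"}))  # ['A', 'B']
# B merely waits for A, and A is not blocked: no deadlock.
print(find_wait_cycle({"B": "A"}))            # None
```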

I. A simple deadlock example:

Session A:

mysql> CREATE TABLE t (i INT) ENGINE = InnoDB;
Query OK, 0 rows affected (1.07 sec)

mysql> INSERT INTO t (i) VALUES(1);
Query OK, 1 row affected (0.09 sec)

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE;
+------+
| i    |
+------+
| 1    |
+------+

Session B:

mysql> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

mysql> DELETE FROM t WHERE i = 1;

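At this point session B blocks: its DELETE needs an X lock on the row, which conflicts with the S lock held by session A. It keeps waiting until the lock request times out (innodb_lock_wait_timeout, 50 seconds by default).

Session A then continues: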

DELETE FROM t WHERE i = 1;

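A deadlock occurs here: session A now needs an X lock to delete the row, but that request queues behind session B's pending X request, which in turn is waiting for session A's S lock. InnoDB detects the cycle and immediately rolls back one of the transactions (session B here), which receives ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction. The most recent deadlock can be inspected with show engine innodb status\G.

With the innodb_print_all_deadlocks parameter enabled, deadlock information is also written to the error log. This example is simple enough that its deadlock output is not analyzed here.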

set @@global.innodb_print_all_deadlocks=on;

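InnoDB chooses the cheaper transaction to roll back, where the cost of a transaction is determined by the number of rows it has inserted, updated, or deleted.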
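Because the victim is simply rolled back, the application is expected to re-run it. The sketch below shows one way to wrap a transaction in a retry loop keyed on MySQL error code 1213. It is illustrative only: `DbError` is a hypothetical stand-in for whatever exception your driver raises, and `run_with_deadlock_retry`, `max_attempts`, and `backoff_seconds` are names invented for this example.

```python
import time

DEADLOCK_ERRNO = 1213  # MySQL error code for "Deadlock found when trying to get lock"


class DbError(Exception):
    """Hypothetical stand-in for a driver exception that carries an errno."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_deadlock_retry(txn, max_attempts=3, backoff_seconds=0.0):
    """Call txn() and re-run it if it fails with a deadlock error.

    Any other error, or a deadlock on the last attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DbError as exc:
            if exc.errno != DEADLOCK_ERRNO or attempt == max_attempts:
                raise
            # Linear backoff so the competing transaction can finish first.
            time.sleep(backoff_seconds * attempt)


# Simulated transaction: picked as the deadlock victim once, then succeeds.
attempts = []


def flaky_delete():
    attempts.append(1)
    if len(attempts) == 1:
        raise DbError(DEADLOCK_ERRNO, "Deadlock found when trying to get lock")
    return "deleted"


print(run_with_deadlock_retry(flaky_delete))  # deleted
```

In real code, `txn` would open a transaction, run the statements, and commit, so that each retry starts from a clean state.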

II. A real-world deadlock example:

The deadlock entry in the error log reads:

InnoDB: transactions deadlock detected, dumping detailed information.
*** (1) TRANSACTION:
TRANSACTION 209262583957, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
LOCK WAIT 4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 129183854, OS thread handle 0x7f1aeae7a700, query id 68320628504  updating
update  tb_authorize_info set account_balance=account_balance-  100.00 
     where (SELECT a.account_balance from 
(select account_balance from tb_authorize_info a where appId =  '49E5BD695F853DC3' )a)  -  100.00 > 0 
 and appId = '49E5BD695F853DC3'
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262583957 lock_mode X locks rec but not gap waiting
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** (2) TRANSACTION:
TRANSACTION 209262584968, ACTIVE 1 sec starting index read
mysql tables in use 2, locked 2
4 lock struct(s), heap size 1184, 2 row lock(s)
MySQL thread id 129183879, OS thread handle 0x7f198b208700, query id 68320632234  updating
update  tb_authorize_info set account_balance=account_balance-  100.00 
     where (SELECT a.account_balance from 
(select account_balance from tb_authorize_info a where appId =  '49E5BD695F853DC3' )a)  -  100.00 > 0 
 and appId = '49E5BD695F853DC3'
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262584968 lock mode S locks rec but not gap
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1845 page no 4 n bits 96 index `PRIMARY` of table `xxx`.`tb_authorize_info` trx id 209262584968 lock_mode X locks rec but not gap waiting
Record lock, heap no 18 PHYSICAL RECORD: n_fields 32; compact format; info bits 0
......

*** WE ROLL BACK TRANSACTION (2)

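This is a simple deadlock. Because of network or other delays, the same application request reached two load-balanced application servers, and both asked the database to execute the same SQL at the same time. Each transaction first acquired an S lock on the row while evaluating the WHERE condition, then tried to upgrade it to (or additionally acquire) an X lock in order to update the row. Each X request was blocked by the other transaction's S lock, so a deadlock formed. MySQL resolved it quickly: transaction 1 completed normally and transaction 2 was rolled back, so apart from a slight delay the front-end business was essentially unaffected.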
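The S-to-X upgrade conflict described above follows directly from the row-lock compatibility rules: S is compatible with S, and X is compatible with nothing. The following is a small sketch that replays the situation in the log; the transaction names `trx1` and `trx2` and the `blockers` helper are made up for illustration.

```python
# Row-lock compatibility: may a lock in mode `requested` be granted while
# another transaction holds mode `held` on the same record?
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}


def blockers(requester, requested, held_locks):
    """Return the transactions whose held locks conflict with the request.

    held_locks is a list of (transaction, mode) pairs on one record.
    A transaction never conflicts with its own locks.
    """
    return [
        owner
        for owner, held in held_locks
        if owner != requester and not COMPATIBLE[(held, requested)]
    ]


# Both transactions already hold S on the same row (heap no 18 in the log).
held = [("trx1", "S"), ("trx2", "S")]

print(blockers("trx1", "X", held))  # ['trx2']
print(blockers("trx2", "X", held))  # ['trx1']
```

Each transaction is blocked by the other, which is exactly the circular wait that InnoDB reports.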

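III. Another deadlock, from Stack Exchange:

Someone posted a deadlock report on dba.stackexchange.com. Being able to read such reports directly helps when diagnosing slow or stuck SQL under high concurrency, so I tried to work through this one myself.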

https://dba.stackexchange.com/questions/39550/when-and-why-can-this-kind-of-deadlock-occur

LATEST DETECTED DEADLOCK
------------------------
130409  0:40:58
*** (1) TRANSACTION:
TRANSACTION 3D61D41F, ACTIVE 3 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 43 lock struct(s), heap size 6960, 358 row lock(s), undo log entries 43
MySQL thread id 17241690, OS thread handle 0x7ffd3469a700, query id 860259163 localhost root update
#############
INSERT INTO `notification` (`other_grouped_notifications_count`, `user_id`, `notifiable_type`, `action_item`, `action_id`, `created_at`, `status`, `updated_at`) 
VALUES (0, 4442, 'MATCH', 'MATCH', 224716, 1365448255, 1, 1365448255)
#############
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61D41F lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 69 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000115b; asc    [;;
 1: len 4; hex 0005e0bb; asc     ;;
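-- Transaction 1 wants to insert a row with user_id=4442. It first acquired the insert intention lock on the corresponding primary-key range (lower_bound,4443], then tried to place an insert intention lock on the secondary-index range (lower_bound,4443] but was blocked. We can infer that another transaction already holds a row lock covering this range.
-- Transaction 1 must obtain both insert intention locks before the insert can proceed; acquiring the two locks is indivisible.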
*** (2) TRANSACTION:
TRANSACTION 3D61C472, ACTIVE 15 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1248, 2 row lock(s)
MySQL thread id 17266704, OS thread handle 0x7ffd34b01700, query id 860250374 localhost root Updating
#############
UPDATE `notification` SET `status`=0 WHERE user_id = 4443 and status=1
#############
*** (2) HOLDS THE LOCK(S):
-- Transaction 2's UPDATE targets rows with user_id=4443, so it first placed an X-mode next-key lock on the (lower_bound,4443] range of the user_id index. This is the next-key lock that blocks transaction 1.
RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` of table `notification` trx id 3D61C472 lock_mode X
Record lock, heap no 69 PHYSICAL RECORD: n_fields 2; compact format; info bits 0
 0: len 4; hex 8000115b; asc    [;;
 1: len 4; hex 0005e0bb; asc     ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
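-- When transaction 2 goes on to update the primary-key row, it needs the record lock on the primary-key entry for user_id=4443, but transaction 1 has already placed an insert intention lock on the primary-key range (lower_bound,4443], so transaction 2 is blocked.
-- Likewise, transaction 2's acquisition of the secondary-index next-key lock and the primary-key record lock is indivisible; the UPDATE can proceed only once both are held.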
RECORD LOCKS space id 0 page no 261029 n bits 248 index `PRIMARY` of table `notification` trx id 3D61C472 lock_mode X locks rec but not gap waiting
Record lock, heap no 161 PHYSICAL RECORD: n_fields 16; compact format; info bits 0
 0: len 4; hex 0005e0bb; asc     ;;
 1: len 6; hex 00000c75178f; asc    u  ;;
 2: len 7; hex 480007c00c1d10; asc H      ;;
 3: len 4; hex 8000115b; asc    [;;
 4: len 8; hex 5245474953544552; asc REGISTER;;
 5: SQL NULL;
 6: SQL NULL;
 7: SQL NULL;
 8: len 4; hex d117dd91; asc     ;;
 9: len 4; hex d117dd91; asc     ;;
 10: len 1; hex 80; asc  ;;
 11: SQL NULL;
 12: SQL NULL;
 13: SQL NULL;
 14: SQL NULL;
 15: len 4; hex 80000000; asc     ;;

*** WE ROLL BACK TRANSACTION (2)
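The hex fields in the record dumps above can be decoded to check which rows are involved. InnoDB stores integers big-endian, and for signed columns it flips the sign bit so that values sort correctly as raw bytes. The following is a small sketch of that decoding; the function name and the `signed` flag are illustrative, and treating the primary key as an unsigned INT is an assumption based on its dump not starting with 0x80.

```python
def decode_innodb_int(hex_text, signed=True):
    """Decode an integer column as printed in an InnoDB record dump."""
    raw = bytes.fromhex(hex_text)
    value = int.from_bytes(raw, "big")
    if signed:
        bits = 8 * len(raw)
        value ^= 1 << (bits - 1)       # undo the flipped sign bit
        if value >= 1 << (bits - 1):   # high bit set after the flip: negative
            value -= 1 << bits
    return value


# Field 0 of the `user_id` index record: the user_id value.
print(decode_innodb_int("8000115b"))                # 4443
# Field 1 of the same record: the primary key (assumed INT UNSIGNED).
print(decode_innodb_int("0005e0bb", signed=False))  # 385211
```

The decoded user_id of 4443 matches the value in transaction 2's UPDATE and is the record immediately after the 4442 that transaction 1 is trying to insert.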

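With that, the deadlock is easy to understand. Transaction 1 first acquired the insert intention lock at the primary-key position for 4442; when it then requested the insert intention lock on the secondary index, it was blocked by the next-key lock held by transaction 2's UPDATE. Transaction 2, having acquired the next-key lock on the index, tried to update the primary-key row (that is, to place a non-gap record lock on the primary key) but was blocked by transaction 1's insert intention lock.

Neither transaction can give up the resources it already holds, and each requests a lock that is incompatible with one held by the other. No preemption plus circular wait means deadlock.

The root cause is that transaction 2's UPDATE stayed open too long (ACTIVE 15 sec in the log), which stalled the INSERT that came after it.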
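When reading many such reports, it helps to pull the key facts out of each RECORD LOCKS line mechanically. The following is a rough sketch using a regular expression written against the lines shown in this article; the field names of the returned dict are my own choice, and the pattern is not guaranteed to cover every variant InnoDB can print.

```python
import re

# Matches both "lock_mode X" and "lock mode S" spellings seen in the logs.
LOCK_LINE = re.compile(
    r"RECORD LOCKS space id \d+ page no \d+ n bits \d+ "
    r"index `(?P<index>[^`]+)` of table (?P<table>\S+) "
    r"trx id (?P<trx>\w+) lock[_ ]mode (?P<mode>\w+)(?P<detail>.*)"
)


def parse_lock_line(line):
    """Extract index, table, trx id, mode and lock type from one line."""
    m = LOCK_LINE.search(line)
    if m is None:
        return None
    detail = m.group("detail").strip()
    waiting = detail.endswith("waiting")
    if waiting:
        detail = detail[: -len("waiting")].strip()
    return {
        "index": m.group("index"),
        "table": m.group("table").replace("`", ""),
        "trx": m.group("trx"),
        "mode": m.group("mode"),
        # A bare "lock_mode X" with no qualifier denotes a next-key lock.
        "detail": detail or "next-key",
        "waiting": waiting,
    }


line = (
    "RECORD LOCKS space id 0 page no 272207 n bits 1272 index `user_id` "
    "of table `notification` trx id 3D61D41F lock_mode X "
    "locks gap before rec insert intention waiting"
)
info = parse_lock_line(line)
print(info["trx"], info["index"], info["mode"], info["waiting"])
# 3D61D41F user_id X True
print(info["detail"])
# locks gap before rec insert intention
```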

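IV. How to avoid deadlocks?

The official documentation has a complete guide: https://dev.mysql.com/doc/refman/5.7/en/innodb-deadlocks-handling.html

It is fairly long, though, so I prefer to summarize it in a few sentences:

1. Optimize SQL query performance as far as possible so that transactions stay short.

2. If phantom reads are acceptable, use the READ COMMITTED isolation level to disable range locks (gap locking is then used only for foreign-key constraint checking and duplicate-key checking).

3. If neither of the above is possible, or there is little room left for SQL optimization, split tables and databases where you can: adding resources (or rather, spreading them out) reduces the chance of contention.

V. Summary:

Because of InnoDB's particular row-locking mechanism, deadlocks usually involve insert intention locks and next-key locks. Both are range locks. Range locks exist to prevent phantom reads, and as a side effect they lock records that the transaction does not actually need to touch.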
