天天看點

一個mysql事務引發的思考(血案)

#問題簡述

大家都知道mysql是支援ACID,支援事務的,事務是非常重要的一個特性,要不都執行成功,要不都不成功。我們在coding時也會大量用到,但是随着業務代碼的累加與使用人數的增加,系統有某些場景下會出現下面這個錯誤

11:48:40.265[http-nio-8999-exec-3] ERROR c.x.e.l.c.e.ExceptionHandleFilter - exception message: {} com.xxx.common.exception.ExceptionHandleFilter:45
org.springframework.dao.CannotAcquireLockException: 
### Error updating database.  Cause: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
### The error may involve com.xxx.dao.issuegroup.CatalogTaskDao.insert-Inline
### The error occurred while setting parameters
### SQL: INSERT INTO `catalog_task`  (`issue_id`,`catalog_id`,`status`,`cuserid`)  VALUES ( ?,?,?,?)
### Cause: java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
; SQL []; Lock wait timeout exceeded; try restarting transaction; nested exception is java.sql.SQLException: Lock wait timeout exceeded; try restarting transaction
	at org.springframework.jdbc.support.SQLErrorCodeSQLExceptionTranslator.doTranslate(SQLErrorCodeSQLExceptionTranslator.java:259) ~[spring-jdbc-4.2.3.RELEASE.jar:4.2.3.RELEASE]
	at org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73) ~[spring-jdbc-4.2.3.RELEASE.jar:4.2.3.RELEASE]
	at org.mybatis.spring.MyBatisExceptionTranslator.translateExceptionIfPossible(MyBatisExceptionTranslator.java:75) ~[mybatis-spring-1.3.0.jar:1.3.0]
	at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:447) ~[mybatis-spring-1.3.0.jar:1.3.0]
	at com.sun.proxy.$Proxy99.insert(Unknown Source) ~[na:na]
	at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:279) ~[mybatis-spring-1.3.0.jar:1.3.0]
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:56) ~[mybatis-3.4.0.jar:3.4.0]
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53) ~[mybatis-3.4.0.jar:3.4.0]
           

#問題跟蹤

複現操作場景,跟蹤期間執行的sql語句

開啟mysql generallog日志,并找到日志路徑,如果未開啟的話執行

set global general_log = on;

即可。

mysql> show global variables like '%general%';
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    16082
Current database: alioo_dev
+------------------+---------------------------------+
| Variable_name    | Value                           |
+------------------+---------------------------------+
| general_log      | ON                              |
| general_log_file | /var/lib/mysql/9905e4ad340b.log |
+------------------+---------------------------------+
2 rows in set (0.61 sec)
           

需要注意的是問題複現期間建議使用單獨的mysql機器,我使用的是已經廢棄不用的機器(ip:10.94.120.110),這樣才能得到特定場景操作下的純粹的sql日志,截取具體的日志,

找到可能存在鎖競争的sql(過濾掉select語句):

2018-07-04T12:04:11.847548Z 16087 Query SET autocommit=0
2018-07-04T12:04:11.901111Z 16087 Query UPDATE t_x_archive la SET la.status = -1, mtime = '2018-07-04 20:04:11.875'  WHERE la.object_type = 1      AND la.object_id = 195959      AND la.status = 0
2018-07-04T12:04:12.083234Z 16087 Query insert into  t_x_change  ( `issueid`,`cuserid`,`ctime`, `type`, `status`, `field`, `oldvalue`, `newvalue`, `desc`,source)  values  (null,120859,now(),'list
',11,'status','195959',null,'柳志崇恢複了列“{2}”',0)
2018-07-04T12:04:12.099792Z 16087 Query update  t_x_list  set  name = 'done2',  ctime = '2018-05-09 16:55:45.0',  cuserid = 120859,  list_order = 3,  status = 0,  boardid = 10114, finished = 0 wh
ere id = 195959
2018-07-04T12:04:15.023635Z 16087 Query UPDATE `catalog`  SET `status` = 0,mtime=now() where  status !=-1 and id in    (    1649   ,   1650   ,   3602   ,   3603   ,   3604   ,   3605   )
2018-07-04T12:04:15.036100Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    118859   )
2018-07-04T12:04:15.051886Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 118859
2018-07-04T12:04:23.394968Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 118864
2018-07-04T12:04:24.308721Z 16085 Query SET autocommit=0
2018-07-04T12:04:24.497692Z 16085 Query insert ignore into t_x_issue set  title = 'bbbb',  content = null,  cuserid = 120859,  status = 0,  listid = 195406,  boardid = 10114,  priority = null,
   source_type = 0,   starttime = null,  endtime = null,  sequnce = 'LR4605-83',  parentid = null,  ctime = now(),  mtime = null,  workflowid = null,  type = 4722,  source = null,  issue_order = null
2018-07-04T12:04:25.218430Z 16085 Query insert into  t_x_change  ( `issueid`,`cuserid`,`ctime`, `type`, `status`, `field`, `oldvalue`, `newvalue`, `desc`,source)  values  (123096,120859,now(),'is
sue',1,'list','','195406','柳志崇在“Done1”添加了任務',0)
2018-07-04T12:04:25.620718Z 16085 Query insert into  t_x_card_propose_user  set  issue_id = 123096,  user_id = 120859,  status = null
2018-07-04T12:04:25.639585Z 16085 Query insert into  t_x_issue_user set  userid = 120859,  issueid = 123096,  type = 2
2018-07-04T12:04:25.832163Z 16085 Query update t_x_board SET mtime = '2018-07-04 20:04:25' ,count = 83  where id = 10114
2018-07-04T12:04:26.449587Z 16087 Query UPDATE `catalog`  SET `status` = 0,mtime=now() where  status !=-1 and id in    (    3602   ,   3604   ,   3605   )
2018-07-04T12:04:26.458029Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    118864   )
2018-07-04T12:04:26.468883Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 118864
2018-07-04T12:04:26.951996Z 16085 Query INSERT INTO `catalog_task`  (`issue_id`,`catalog_id`,`status`,`cuserid`)  VALUES ( 123096,1649,0,120859)
...
2018-07-04T12:05:13.319018Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    118873   )
2018-07-04T12:05:13.330249Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 118873
2018-07-04T12:05:16.033715Z 16087 Query UPDATE `catalog`  SET `status` = 0,mtime=now() where  status !=-1 and id in    (    1649   ,   1650   ,   3602   ,   3603   ,   3604   ,   3605   )
2018-07-04T12:05:16.042514Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    118874   )
2018-07-04T12:05:16.053007Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 118874
2018-07-04T12:05:17.967792Z 16085 Query rollback
2018-07-04T12:05:17.972078Z 16085 Query SET autocommit=1
2018-07-04T12:05:18.769333Z 16087 Query UPDATE `catalog`  SET `status` = 0,mtime=now() where  status !=-1 and id in    (    1649   ,   1650   ,   3602   ,   3603   ,   3604   ,   3605   )
2018-07-04T12:05:18.779075Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    118874   )
2018-07-04T12:05:18.791102Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 118874
2018-07-04T12:05:21.380525Z 16087 Query UPDATE `catalog`  SET `status` = 0,mtime=now() where  status !=-1 and id in    (    1649   ,   1650   ,   3602   ,   3603   ,   3604   ,   3605   )
2018-07-04T12:05:21.386858Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    118875   )
...
2018-07-04T12:08:29.544130Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 122925
2018-07-04T12:08:30.424652Z 16087 Query UPDATE `catalog`  SET `status` = 0,mtime=now() where  status !=-1 and id in    (    1649   ,   1650   ,   3602   ,   3603   ,   3604   ,   3605   )
2018-07-04T12:08:30.437569Z 16087 Query UPDATE `catalog_task`  SET `status` = 0,mtime=now()   where status!=-1 and issue_id in    (    122925   )
2018-07-04T12:08:30.448584Z 16087 Query update t_x_issue SET mtime = now()        ,boardid=10114  where id = 122925
2018-07-04T12:08:30.718356Z 16087 Query INSERT INTO `t_x_change`  ( `issueid`,`cuserid`,`ctime`, `type`, `status`, `field`, `oldvalue`, `newvalue`, `desc`,source)  VALUES     (118859,120859,now()
,'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118859,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118861,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118861,120859,now(),'issue',10,
'status','-1','0','x恢複了任務',0)  ,  (118862,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118862,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118864,120859,now(),'issue',10,'status','-1
','0','x恢複了任務',0)  ,  (118864,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118865,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118865,120859,now(),'issue',10,'status','-1','0','x恢複了任務'
,0)  ,  (118866,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118866,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118867,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0)  ,  (118
867,120859,now(),'issue',10,'status','-1','0','x恢複了任務',0) ...
2018-07-04T12:08:30.733035Z 16087 Query UPDATE t_x_archive la, t_x_issue li SET la.status = -1, la.mtime = now(), li.status = la.object_status, li.mtime = now() WHERE la.object_type = 2 AND la.s
tatus = 0 AND li.listid = 195959  AND li.status = -1 AND li.id = la.object_id
2018-07-04T12:08:30.879701Z 16087 Query UPDATE t_x_board_list_lane SET status = 0, archive_id = null WHERE  status = -1   and board_id = 10114   and list_id = 195959   and archive_id = 57354
2018-07-04T12:08:31.623011Z 16087 Query insert into  notice_info  (`title`, `user_id`, `type`, `desc`, `url`, `from`, `from_id`, `c_time`) values  ('柳志崇恢複了列“done2”。',38604,1,'','http://x-dev.intra.xiaoju
keji.com/project/LR4605','柳志崇',120859,'2018-07-04 20:08:31.618')
2018-07-04T12:08:32.241940Z 16087 Query commit
2018-07-04T12:08:32.244986Z 16087 Query SET autocommit=1
           

至此問題基本鎖定了表catalog_task(為什麼這麼說呢,隻有這張表同時涉及到了update與insert變更操作)

#場景還原

接下來就簡化業務場景,準備測試表,測試資料進行驗證

##初始化

drop table catalog_task_test;
CREATE TABLE `catalog_task_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `issue_id` int(11) NOT NULL,
  `catalog_id` int(11) NOT NULL COMMENT '目錄id',
  `status` tinyint(3) NOT NULL,
  `cuserid` int(11) NOT NULL,
  `ctime` timestamp NULL DEFAULT CURRENT_TIMESTAMP COMMENT '建立時間',
  `mtime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '更新時間',
  PRIMARY KEY (`id`) 
) ENGINE=InnoDB AUTO_INCREMENT=7199 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ;
INSERT INTO `catalog_task_test`  (`issue_id`,`catalog_id`,`status`,`cuserid`)  VALUES ( 111,1649,0,120859) ;
INSERT INTO `catalog_task_test`  (`issue_id`,`catalog_id`,`status`,`cuserid`)  VALUES ( 222,1649,0,120859) ;
           

##測試腳本

session1:
set autocommit=0;
INSERT INTO `catalog_task_test`  (`issue_id`,`catalog_id`,`status`,`cuserid`)  VALUES ( 333,1649,0,120859) ;
session2:(session1語句執行完成後再執行session2中的語句)
set autocommit=0; 
select now(); update catalog_task_test set status=1 where issue_id=111; select now();
           

session2實際執行時會報錯,提示事務逾時,錯誤結果:

mysql> select now(); update catalog_task_test set status=1 where issue_id=111; select now();
+---------------------+
| now()               |
+---------------------+
| 2018-07-03 10:23:26 |
+---------------------+
1 row in set (0.01 sec)
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
+---------------------+
| now()               |
+---------------------+
| 2018-07-03 10:24:17 |
+---------------------+
1 row in set (0.01 sec) 
           

##疑問點

  • Q: insert into難道是表級鎖嗎,為什麼在insert into語句沒有執行commit; 時 update需要等待他的送出
  • A: 以mysql的InnoDB引擎為例,有索引并且使用了該索引當條件的時候就是行鎖,沒有索引的時候就是表鎖。innodb的行鎖是在有索引的情況下,沒有索引的表是鎖定全表的。insert一般都是行鎖, update能确定且走索引的是行鎖,否則表鎖。

##結合上面的解答,補充2個問題

  • Q: 那麼先在session1中執行update ,然後在session2中執行insert 會出現鎖逾時的問題麼?
  • A: 會,表級鎖被占用的情況下是拿不到行級鎖的
  • Q: 那麼先在session1中執行insert ,然後在session2中執行insert 會出現鎖逾時的問題麼?
  • A:不會,因為insert 均是行級鎖
  • Q: 那麼先在session1中執行update ,然後在session2中執行update 會出現鎖逾時的問題麼?
  • A:會

#解決思路1

将update表級鎖降為行級鎖(可以通過按主鍵進行更新或者where條件列增加索引)

上述案例添加索引sql如下:

ALTER TABLE `catalog_task_test` ADD INDEX idx_issue_id ( `issue_id` )  ;
           

#解決思路2

上述案例insert并不是拿不到鎖就立即抛異常的,它會等待innodb_lock_wait_timeout時間後還拿不到鎖才會抛出異常,可以優化先拿到鎖的業務邏輯,在innodb_lock_wait_timeout時間以内處理完并釋放鎖即可的

擷取鎖的逾時時間的方法(在這個時間内會阻塞一直等待鎖,到時間還拿不到鎖的話就會抛異常)

show variables like '%innodb_lock_wait_timeout%'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 50    |
+--------------------------+-------+