1、问题描述
测试中发现delete\update语句在并发测试时经常会导致发生死锁。
2、问题分析
在mariadb中,update或delete使用 where(…,…,…) in (…,…,..) 的写法会导致全表扫描,加大锁冲突的概率,造成死锁。
例1:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | MariaDB [lzk]> show create table b\G *************************** 1. row *************************** Table: b Create Table: CREATE TABLE `b` ( `a` int(11) NOT NULL DEFAULT '0', `b` int(11) NOT NULL DEFAULT '0', `c` int(11) NOT NULL DEFAULT '0', `d` int(11) NOT NULL DEFAULT '0', `e` int(11) NOT NULL DEFAULT '0', `f` int(11) DEFAULT NULL, PRIMARY KEY (`a`,`b`,`c`,`d`,`e`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin   MariaDB [lzk]> explain delete from b where a=1 and b=2 and c=1 and d=1 and e=1; +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | 1 | SIMPLE | b | range | PRIMARY | PRIMARY | 20 | NULL | 1 | Using where | +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec) --直接根据主键定位到记录   MariaDB [lzk]> explain delete from b where (a,b,c,d,e) in ((1,1,1,1,1)); +------+-------------+-------+------+---------------+------+---------+------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+------+------+-------------+ | 1 | SIMPLE | b | ALL | NULL | NULL | NULL | NULL | 18 | Using where | +------+-------------+-------+------+---------------+------+---------+------+------+-------------+ 1 row in set (0.00 sec) --全表扫描 | 
场景1:
当会话1:start transaction;
delete from b where (a,b,c,d,e) in ((1,1,1,1,1));
会话2:start transaction;
delete from b where (a,b,c,d,e) in ((1,1,2,1,1));
此时会话2会产生锁等待。
而场景2:
当会话1:start transaction;
delete from b where where a=1 and b=1 and c=1 and d=1 and e=1;
会话2:start transaction;
delete from b where where a=1 and b=1 and c=2 and d=1 and e=1;
不会产生锁等待。
例2:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | MariaDB [test]> explain update b set c='qwe' where (a,b) in ((1,10)); +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows    | Extra       | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ |    1 | SIMPLE      | b     | index | NULL          | PRIMARY | 8       | NULL | 9278400 | Using where | +------+-------------+-------+-------+---------------+---------+---------+------+---------+-------------+ 1 row in set (0.00 sec)--全索引扫描   MariaDB [test]> explain update b set c='qwe' where a=1 and b=10; +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ | id   | select_type | table | type  | possible_keys | key     | key_len | ref  | rows | Extra       | +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ |    1 | SIMPLE      | b     | range | PRIMARY       | PRIMARY | 8       | NULL |    1 | Using where | +------+-------------+-------+-------+---------------+---------+---------+------+------+-------------+ 1 row in set (0.00 sec)--直接根据主键定位到记录 | 
3、解决办法
不使用delete\update where… in的写法,改为delete\update where … and …or
4、半一致性读仍会死锁问题
在read-committed隔离级别下,update会使用半一致性读,为什么还是有几率发生死锁?
根据场景复现后,show engine innodb status查看死锁信息,发现发生死锁的事务会占有很多锁,不符合innodb半一致读的特性:
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 | ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2016-03-30 15:19:46 7f79e7fec700 *** (1) TRANSACTION: TRANSACTION 132876, ACTIVE 9 sec fetching rows mysql tables in use 1, locked 1 LOCK WAIT 252 lock struct(s), heap size 96696, 226 row lock(s), undo log entries 1 MySQL thread id 5, OS thread handle 0x7f79e8bb5700, query id 15 redhat64-26 10.47.160.26 root updating update test.c set c = '94414' where (a,b) in ((1,'5167')) *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132876 lock_mode X locks rec but not gap waiting Record lock, heap no 211 PHYSICAL RECORD: n_fields 6; compact format; info bits 0  0: len 4; hex 80000001; asc     ;;  1: len 4; hex 80000358; asc    X;;  2: len 6; hex 00000000c7fb; asc       ;;  3: len 7; hex 06000001671e10; asc     g  ;;  4: len 5; hex 3533343732; asc 53472;;  5: len 3; hex 383536; asc 856;;   *** (2) TRANSACTION: TRANSACTION 132878, ACTIVE 9 sec fetching rows mysql tables in use 1, locked 1 98 lock struct(s), heap size 96696, 72 row lock(s), undo log entries 1 MySQL thread id 6, OS thread handle 0x7f79e7fec700, query id 17 redhat64-26 10.47.160.26 root updating update test.c set c = '33764' where (a,b) in ((1,'5250')) *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132878 lock_mode X locks rec but not gap Record lock, heap no 211 PHYSICAL RECORD: n_fields 6; compact format; info bits 0  0: len 4; hex 80000001; asc     ;;  1: len 4; hex 80000358; asc    X;;  2: len 6; hex 00000000c7fb; asc       ;;  3: len 7; hex 06000001671e10; asc     g  ;;  4: len 5; hex 3533343732; asc 53472;;  5: len 3; hex 383536; asc 856;;   *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 6 page no 6 n bits 496 index `PRIMARY` of table `test`.`c` trx id 132878 lock_mode X locks rec but not gap waiting Record lock, heap no 166 PHYSICAL RECORD: n_fields 6; compact format; info bits 0  0: len 4; hex 80000001; asc     ;;  1: len 4; hex 8000032b; asc    +;;  2: len 6; hex 00000000c5e1; asc       ;;  3: len 7; hex 790000019f170a; asc y      ;;  4: len 5; hex 3734313133; asc 74113;;  5: len 3; hex 383131; asc 811;;   *** WE ROLL BACK TRANSACTION (2) | 
5、Mariadb处理逻辑
在RC隔离级别下,在执行计划为全表扫描或全索引扫描的情况下innodb处理update的流程如下图:


6、源码分析
遍历记录的主函数为row_search_for_mysql()
调用sel_set_rec_lock()尝试加锁
如果加锁时产生锁等待,即:
| 1 2 3 | case DB_LOCK_WAIT:     /* Never unlock rows that were part of a conflict. */     prebuilt->new_rec_locks = 0;   //设置锁定的行数为0 | 
进行半一致性读:
| 1 | row_sel_build_committed_vers_for_mysql() | 
如果在半一致读时,等待的锁已被释放,则锁读当前版本
| 1 2 3 | /* The lock was granted while we were     searching for the last committed version.     Do a normal locking read. */ | 
如果读取到历史版本则设置did_semi_consistent_read = TRUE;
根据did_semi_consistent_read的值设置prebuilt->row_read_type
| 1 2 3 4 5 6 7 |  if (UNIV_UNLIKELY(did_semi_consistent_read)) { //读取历史版本,即使用了半一致性读     prebuilt->row_read_type = ROW_READ_DID_SEMI_CONSISTENT;  } else { //锁读当前版本     prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;  } | 
调用解锁函数ha_innobase::unlock_row()
| 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | switch (prebuilt->row_read_type) case ROW_READ_TRY_SEMI_CONSISTENT:     //未使用半一致性读     row_unlock_for_mysql(prebuilt, FALSE);     break; case ROW_READ_DID_SEMI_CONSISTENT:     prebuilt->row_read_type = ROW_READ_TRY_SEMI_CONSISTENT;     break; row_unlock_for_mysql() if (prebuilt->new_rec_locks >= 1) { //解锁记录 lock_rec_unlock() } | 
但是由于在上面的代码中prebuilt->new_rec_locks的值为0,所以不会对记录解锁!!!
测试MySQL不存在上述问题,应该是这块逻辑与MariaDB不同。
