Tags

, , , , ,

We had a one of our slave servers frequently stop replicating with the “Innodb Lock Wait Timeout” error. The slave IO thread would continue to fetch the binlogs while the slave SQL thread kept stopping with  the above mentioned error. The teams initial inclination was to change the innodb lock wait timeout variable from 50 secs to a higher value. It was a read-only slave. Our expectation was there would be no competing writes. Then we started listing what are the next steps possible and what could be wrong.

  1. There could be a user with “super” privilege in the system that was running updates directly on the slave
  2. A backup script that could be locking the tables out for backup
  3. Increase the “innodb lock wait timeout variable
  4. Enable the innodb lock monitor

While we were working on the system, we noticed that there were few select queries that were running for 300+ secs of the following format.

SELECT * FROM s WHERE (s.id in (select s_id from i where o_id = 79168 AND ( u_id = 79168 OR u_id = 0 )))  ORDER BY s.id DESC limit 15

We wanted to see what was innodb upto since the tables in question were innodb. The “show engine innodb status”  showed us something really interesting.

—TRANSACTION 37ECF6, ACTIVE 1 sec, process no 17368, OS thread id 1263749456 fetching rows, thread declared inside InnoDB 481
mysql tables in use 2, locked 1
2242 lock struct(s), heap size 309232, 883550 row lock(s)
MySQL thread id 1948263, query id 17739683 localhost root preparing
SELECT * FROM s WHERE (s.id in (select s_id from i where o_id = 79168 AND ( u_id = 79168 OR u_id = 0 )))  ORDER BY s.id DESC limit 15

It was a select statement with sub query. The oddity comes from “mysql tables in use 2, locked 1“. Why does a select statement need to lock a particular table ? We replaced that particular query the with below one helped us alleviate the locking issue.

SELECT * FROM  s, i WHERE s.id=i.s_id and i.o_id = 79168 and (i.u_id = 79168 OR i.u_id=0) ORDER BY s.id DESC limit 15

Innodb status output for this query :

—TRANSACTION 37ECF7, ACTIVE 5 sec, process no 17368, OS thread id 1263749456 fetching rows, thread declared inside InnoDB 352
mysql tables in use 2, locked 0
MySQL thread id 1948263, query id 17739687 localhost root Sending data
SELECT * FROM  s, i WHERE s.id=i.s_id and i.o_id = 79168 and (i.u_id = 79168 OR i.u_id=0) ORDER BY s.id DESC limit 15

Once the query was replaced and an index was added to o_id column, we were back to sanity.