If you use ACID transactional databases, you’ve probably heard of lock wait timeouts and deadlocks. What are these? And how are they different?
It’s inevitable that many of us will come across these phenomena at some point. Most databases use locking strategies to make sure that data stays consistent when multiple users (or connections, or processes) are reading and modifying the data concurrently. The locks ensure that only one operation can alter a specific portion of the data at a time as well as serializing changes to further mitigate these race conditions. Without this locking, confusing and incorrect behaviors can happen.
Lock wait timeouts and deadlocks both arise from certain locking mechanisms. A lock wait timeout results when one user gets a lock on some data and holds it while another user tries to access it. If the first user doesn’t unlock the data, the second one will time out after a while. The database will respond to the second user with an error message that says their lock wait was too long. This behavior keeps the second user from waiting forever so they can stay productive by working on something else.
Deadlocks are a related behavior, but they’re more complicated. A deadlock happens when multiple lock waits happen in such a manner that none of the users can do any further work. For example, the first user and second user both lock some data. Then each of them tries to access each other’s locked data. There’s a cycle in the locking: user A is waiting on B, and B is waiting on A. Neither can proceed.
Some databases resolve deadlocks through lock wait timeouts, but this is really wasteful. For one thing, if users have to wait for their lock requests to time out, everything will become really slow. (MySQL’s default lock wait timeout is 50 seconds.) More seriously, this will lead to a lot of long-lived locks, which increases the chance that lots and lots of other users will also try to access the same data and themselves get into lock waits or deadlocks.
That’s why high-performance databases typically do lock cycle detection, looking for a loop in the chain of locks. When they find the loop, they know that one of the users’ work needs to be aborted to let others go ahead. The user that gets selected as the victim will receive an error message saying there was a deadlock. Hopefully, the other users can proceed with their work now that this user’s locks have been forcibly cleared.
The basic strategies for avoiding the wasteful work delays from lock waits and deadlocks are the same:
- Lock as little data as possible
- Lock it for as short a time as possible
- If you’re operating on multiple pieces of data, try to arrange for it to be done in the same order. (Cycles only happen when some things are working one direction and others are going the opposite way).
Most of the time, you don’t need to worry about any of this. The database typically takes care of all of this for you. You normally don’t need to explicitly ask for data to be locked, and you generally won’t need to think a lot about the order of data access. But if you do encounter lock wait timeouts or deadlocks, you might need to dig more deeply into the internals of the database’s execution plans with EXPLAIN, examine the schema and indexing, and perhaps even think about the physical ordering of the data to understand what’s causing the issues.