Your ORM and SELECT FOR UPDATE

Posted by Baron Schwartz on Jan 23, 2019 2:45:27 PM

If you use a programming language framework such as Django or Ruby on Rails, chances are you use an ORM to help you construct queries.

jj-ying-215308-unsplash

What’s an ORM?

If you’re not familiar with ORMs, an ORM is an Object-Relational Mapper, which helps you map objects in your code to tables in your database, without writing SQL. It basically writes the SQL for you, so your objects know how to store and retrieve themselves from a relational database like MySQL or Postgres.

ORMs can be a great productivity booster. Most SQL is tedious and repetitive, and maintaining it manually when you change classes (for example, adding or removing fields) is painful. Especially in the early stages of development when you aren’t sure of the final data model and you’re iterating rapidly, ORMs save a ton of work.

And frankly, ORMs make a lot of sense on an ongoing basis too, even for mature, high-traffic systems. You usually can let them write the bulk of the SQL queries for you, intervening only when you find a query that needs to be hinted or tweaked a specific way. Usually this is only necessary for efficiency or performance purposes, or to remove a lot of logic from the app and express it more concisely in SQL, which is better at some kinds of complex set logic operations.

But ORMs don’t always work great, and like any abstraction that can potentially leak, they have sharp edges that can bite you. One of the most common I’ve seen over a couple of decades is locking hints for the database. Also known as the horror that is SELECT FOR UPDATE.

SELECT FOR UPDATE Locking Hints

Most databases offer various syntax hints to signal your intent when you access a row: are you planning to change it later? If so, signaling that early-on can help reduce deadlocks, lock conflicts, and lock waits. The essence of the issue is that if you look at a row, planning to update it later, and then come back later to do so, you might have found that something happened to it in the meantime. To prevent this or other types of race conditions, both MySQL and PostgreSQL offer SELECT FOR UPDATE. These place exclusive locks on the rows you examine, so that nothing else can access them in the meantime. (The exact details of their locking models differ slightly, but that’s not important here.)

An oversimplified SELECT FOR UPDATE might look like this:

SELECT * FROM user WHERE id=1234 FOR UPDATE;

Now, if another concurrent access to the database tried to run a query like the following, it’d block and wait, because that row is locked in exclusive mode:

DELETE FROM user WHERE id=1234;

Sounds good, right? Right?!

Why All The Lock Waits?

Not so fast. What else will end up in a lock wait? Why, another copy of the very same original SELECT query that took the lock, of course! Or, not only that, but all queries that access that row, whether they mention it explicitly by id or not, for example ranging over it for some reason:

SELECT COUNT(*) FROM user WHERE last_login > NOW() - INTERVAL 1 DAY;

Whoops! It’s really common for a bunch of SELECT FOR UPDATE queries to be lined up single-file, waiting on the first one to release its lock. Then the next one in line has a turn, the next one, and so on. Because of queueing theory, this turns into a cascading pile-up really quickly when this query’s average utilization (the percent of time there’s at least one copy of it running) starts to barely nudge past “imperceptible.”

Now, what does this have to do with your ORM? It turns out that many ORMs have helpful, easy-to-use ways to add locking hints to queries. And because it’s so easy to use them—you can sometimes even use them accidentally without really knowing it, such as if you copy-paste some code—you frequently find them where they’re not needed. And as a result, you frequently have pile-ups of them in your database. (You might not know it unless you have a powerful query workload monitoring tool like VividCortex, but it’s really common to find them once you do.)

Even if it’s not a performance problem now, it’s one of those things that escalates so quickly and without warning (again: queueing theory) that it’s important to be able to find and analyze SELECT FOR UPDATE queries. That’s why, using VividCortex as an example again, there’s query intelligence rules built into VividCortex to flag these queries as “potentially bad” even if they’re not causing performance problems today.

Using Locking Hints in Rails and Django

So how does your ORM add locking hints? It depends, of course. Here’s examples in the two frameworks we see the most commonly.

In Ruby On Rails, the method is .lock and you might see it in your code like the following: User.where("id = 1234").lock(true). The ActiveRecordLockingPessimistic documentation illustrates a bunch of variations on locking, including locking an entire code block so you won’t see the locking applied to each statement explicitly.

In Django, the method is select_for_update() and the code looks like User.objects.select_for_update().filter(id=1234). The QuerySet API reference documentation has all the details.

SELECT FOR UPDATE is powerful and useful. But with great power comes great responsibility. So, like all other powerful and useful things that databases can do for you, awareness is often all that you really need to avoid a potential problem! Next time you see row locking in a query or in your ORM, try to understand the intent behind it, and validate that it’s really needed. Databases have powerful concurrency control models to enable concurrent access without race conditions, so if you can avoid serializing access to your data, you will certainly be better off!

Photo by JJ Ying on Unsplash.

Recent Posts

Posts by Topic

see all