What is Concurrency in a Database?

Posted by Baron Schwartz on Jan 28, 2019 2:36:00 PM

Databases are confusing! Not only are they complicated bits of software, but database jargon is really complicated. Part of this comes from the number of words that have multiple meanings: depending on which community you spend time in, there are multiple meanings for words like schema, cluster, and even database.

Concurrency is one of those words! Concurrency comes from words that mean, roughly, “to run together.”

steven-lelham-342930-unsplash

There are two major ways we use concurrency when dealing with databases, and each of them is really helpful to understand. So let’s learn the two types of database concurrency that matter most for everyday usage! (Note: there’s a quiz at the end!)

Simultaneous Access To Data

Have you heard of MVCC, or Multi-Version Concurrency Control? Chances are your database of choice uses this technique to coordinate simultaneous access to data by multiple users. MVCC governs the first type of concurrency in a database.

This kind of concurrency is important because it’s all about multiple users accessing data at the same time without causing inconsistencies such as race conditions. The familiar example is two users modifying an account balance: one of them deposits money, another transfers it. Without proper concurrency control, there are orders of operations that could cause account balances not to reconcile afterwards. Whoops!

Databases use MVCC to permit concurrent access to the data by giving users the illusion that they’re the only ones touching the database. It’s as if they have exclusive access, even though they really don’t. Behind the scenes, the illusion is handled through locking, keeping old row versions, and other mechanisms. It’s really complicated.

But generally, it works really well, and lets lots of users use the database simultaneously, or concurrently. You usually only realize it’s happening when you are restricted from doing something that would violate correctness guarantees, which you usually see as a lock wait.

Coexistent Query Workload

The second type of concurrency is a fundamental measure of system performance. We use the term “concurrency” to measure how many units of work are co-executing actively at the same time—that is, how many things are in progress at once.

If you look at the database and see three queries running actively, the query concurrency is 3.

Concurrency is so important because it’s most commonly the independent variable in most performance models, such as most queueing models and the Universal Scalability Law. The other fundamental performance metrics, such as latency and throughput, are dependent variables. But concurrency is the factor that’s usually controllable directly.

Concurrency also happens to be the most universal definition of load, and most directly comparable to system capacity:

  • If you have four CPUs, your server’s max capacity is concurrency of 4.
  • Most system load metrics are really concurrency under a different name; the Unix load average, for example, is a measure of how many things are in progress and/or queued, which is their concurrency in the processor or buffer. Backlog, saturation, queue depth, and many other performance terms usually refer to concurrency in one way or another.
  • We know from Little’s Law (but I won’t prove it here) that if you sum up all the latencies of queries in an interval of time, and divide by the length of the duration, you get average concurrency.

When it comes to performance, concurrency’s centrality makes it one of the most important performance metrics to understand, and lets you evaluate questions such as whether you have enough capacity or resources to complete the workload. That means it’s super helpful for diagnosing problems, for example.

Wrapping Up

Now you know the two major meanings of concurrency in databases: a) simultaneous access to data, which the database must orchestrate carefully to avoid inconsistencies; and b) the measure of how much work the database is doing at a point in time, or by extension the average amount of work the database did during an interval of time.

Now for the quiz: if I were to say concurrency is important, would you concur wholeheartedly? <rimshot>

Photo by Steven Lelham on Unsplash

Recent Posts

Posts by Topic

see all