Query Time Is Not Server Load

Posted by Baron Schwartz on Mar 24, 2014 2:25:00 AM

For many years my writing, teaching, coaching, performance optimization, and consulting has assumed the following to be true: to reduce load on a MySQL server and free up resources, sort queries by total time and work from the top. The problem is that is wrong. In this post I’ll explain how how we discovered this and why it matters.

Query_time

As we built our Top Queries feature a thought began to nag at me. I realized that the assumption of “top queries equals top resource consumers” is simply another way of saying “query time and resource consumption are linearly related.” I started to wonder, what if it’s not true? This is the Nth time I’ve created a Top Queries tool of one form or another. If I’ve been doing it wrong, now’s the time to fix that.

To know whether a Top Queries view is a good way to find out what’s consuming resources, you must measure. The only problem is, you can’t measure resource consumption per query. There are several reasons for this:

  • I/O per query is impossible to measure accurately because of optimizations inside the server, such as asynchronous operations. The I/O operations per query in Percona Server’s slow query log, for example, are just estimates.
  • Memory access and CPU cycles per query are not instrumented in MySQL, and would suffer from the same inaccuracies if it were.
  • Even if the above operations were implemented in the server, they would be wrong because there’s a tremendous amount of “background work” the server does, and although this is caused by queries it can’t be blamed on them directly.

And thus was born our weighted linear regression technique. After many months of work, ultimately we developed an innovative new way to infer what can’t be measured. What we discovered is this:

  • Query execution time does not correlate linearly with resource consumption
  • Top queries by execution time may not be the ones that consume the most resources
  • The queries that are really consuming your resources may be far from the top in a “slow query log profile.”

In many cases, query time and, say, CPU time line up nicely. But not always. Here’s a Top Queries view on one of our production servers, with a few computed columns enabled.

Query__total_time

At 62% overall, the first query is by far the biggest contributor to time spent inside the server, but that doesn’t tell the whole story. The biggest hog of IO and CPU resources is the second query, even though it consumes only 15% of the total time.

If we were trying to reduce the amount of IO and CPU consumed by the server, working on the first query would be tilting at windmills. And without the ability to quantify this, we’d never know the truth. You cannot optimize what you cannot measure.

So if you’re using slow query log analysis to “optimize” your server, you’re potentially working on the wrong things.

Are you trying to reduce the load on your server? Would you like to increase the amount of CPU, IO, memory, and other resources available? VividCortex is the only tool that can show the truth about where your resources are being spent.

Recent Posts

Posts by Topic

see all