Our recent blog post on finding wasted prepare/execute/close cycles showed how per-statement analysis can find individual queries that are prepared but never repeatedly executed. To recap, we found a number of statements in our own application that were being prepared, executed once, and then never re-executed. Each such cycle costs 3 network round-trips (prepare, execute, close), whereas executing the statement directly takes only 1. By fixing this, we materially decreased load on our servers and improved latency for the service that issued the query.
But Can’t You Already Do This?
Can’t you just do this with the SHOW STATUS counters? What if you do…
SHOW GLOBAL STATUS LIKE 'Com_stmt_%';
And compare the numbers of prepared, executed and closed statements? Does this work?
No. Global counters are server-wide averages. They reveal nothing about individual statements or classes of statements. It’s like taking the average temperature of patients in the hospital. Individuals with high fevers will get lost in the average.
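To make this concrete, here is a toy sketch in Python, with invented per-query-class counters, showing how healthy statement reuse elsewhere swamps a wasteful 1:1 prepare/execute pattern in the global totals:

```python
# Hypothetical counters for three query classes (numbers invented for
# illustration). Two classes reuse their prepared statements; one does not.
classes = {
    "query_A": {"prepare": 100, "execute": 1000},  # healthy: reused 10x
    "query_B": {"prepare": 50,  "execute": 500},   # healthy: reused 10x
    "query_C": {"prepare": 200, "execute": 200},   # wasteful: 1:1 ratio
}

# Global counters are just the sums; the per-class detail is gone.
global_prepare = sum(c["prepare"] for c in classes.values())
global_execute = sum(c["execute"] for c in classes.values())
print(f"global execute/prepare ratio: {global_execute / global_prepare:.1f}")

# Per-class ratios reveal the problem that the global ratio hides.
for name, c in classes.items():
    print(f"{name}: {c['execute'] / c['prepare']:.1f}")
```

The global ratio looks reasonably healthy even though query_C wastes a prepare on every single execution; only the per-class view exposes it.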
We’re not all that interested in global averages. We want per-query measurements. As Cary Millsap notes in Optimizing Oracle Performance, your measurements must be correctly scoped. Measuring global counters when you’re interested in the behavior of a particular query, or type of query, is a scoping mismatch. (Tangentially: we do measure and record global counters, because we can use regression to figure out whether and how they are related to individual types of queries.)
Stimulus, Response, Reward, And The End Of Learning
The global SHOW STATUS trick is dangerous because it sometimes “works.” I put air-quotes around “works” because I’ve seen the following scenario play out countless times:
- Someone observes a problem.
- They observe something about the system, and make a change in something related.
- The problem improves.
- The person has now learned, incorrectly, that the change is always the right solution.
This is because the condition they observed is sometimes necessary but not sufficient, sometimes sufficient only in combination with other conditions, and sometimes just a coincidence. Likewise, the effects of the change could have any of a variety of relationships to the original problem. If you don’t take a scientific approach, you can never tell the difference.
I wish that wrong solutions never would appear to work, because learning something untrue is worse than having no opinion. As Mark Twain said,
It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.
Drowning In The Sea Of Global Counters
Let’s see how this can happen with “tuning” systems by looking at SHOW STATUS’s global counters. Suppose queries 3 and 4, ranked by total time elapsed, are a prepare/execute cycle that together consume 8% of the total time the whole server spends doing queries. Suppose that replacing them with one non-prepared query would eliminate half that latency (4% of the demand on the server).
Are we likely to find this situation by looking in SHOW STATUS counters? Well, it depends. What are queries 1 and 2? Let’s imagine that they are executions of prepared statements that are properly reused, say, 10 times each. Let’s also pretend, to keep things simple, that all queries, executes, and prepares on this server consume exactly 1 millisecond. Query 1 consumes 40% of time (and execution count), query 2 consumes 20%, 3 and 4 are 4% each, and queries 5-N account for the rest.
In this situation, this (quite important) problem in one specific type of query is going to be utterly lost in the noise. It’s easy to see that if other statements on the server are prepared once and executed many times, the global prepare/execute ratio won’t come close to 1:1 unless something is seriously wrong with a very high-ranking query. In fact, if any statements at all reuse their prepared statements multiple times, the global counters won’t approach 1:1 unless some queries are prepared many times and never used!
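The arithmetic of this scenario can be sketched directly. The counts below are invented to match the stated proportions (each prepare or execute costs 1 millisecond, so event counts equal milliseconds of server time):

```python
# A sketch of the workload described above. Counts are invented so that
# query 1 = 40% of time, query 2 = 20%, the wasteful pair (queries 3 and 4)
# = 8%, and the remaining queries = 32%.
workload = {
    # name: (prepares, executes)
    "query_1":  (36, 364),  # prepared once per ~10 executions
    "query_2":  (18, 182),  # likewise
    "bad_pair": (40, 40),   # queries 3 and 4: prepared, executed once, closed
    "rest":     (29, 291),  # queries 5-N, also properly reused
}

total    = sum(p + e for p, e in workload.values())
prepares = sum(p for p, _ in workload.values())
executes = sum(e for _, e in workload.values())

print(f"bad pair's share of total time:  {sum(workload['bad_pair']) / total:.0%}")
print(f"global execute:prepare ratio:    {executes / prepares:.1f}")

# Scoping the measurement to the one query class makes the waste obvious.
p, e = workload["bad_pair"]
print(f"bad pair's execute:prepare ratio: {e / p:.1f}")
```

The globally aggregated ratio stays around 7:1, which looks fine, even though the bad pair sits at exactly 1:1 and burns 8% of the server’s time.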
And yet, even a query that is relatively unimportant in terms of overall time consumed is important to fix. That’s because, from the application’s point of view, that query is providing a bad user experience with at least double the latency it should have, regardless of whether it’s in the long tail overall.
A Real-Life Case
John mentioned a particular case in the blog post I linked to previously. Here’s another specific query we optimized recently by replacing prepare-execute with a simple query:
This screenshot shows queries 2 and 3, a prepare+execute pair, being replaced by query 1, a straightforward non-prepared execution. There is a caveat here: query 1 is not a direct replacement; this change was made along with adding a subquery to gain some additional functionality. That’s why query 1’s blue bar is longer than query 2’s. This isn’t apples-to-apples.
The point, however, is not that. The point is to consider how important this query was overall. I removed that information from the screenshot so you’d be misled initially: the screenshot is filtered to show only those queries, which is why they rank 1, 2, and 3.
How important/large are they in terms of the server’s overall workload? Well, before we optimized them, these queries accumulated about 7.6 minutes of total execution time in the interval shown, while the server overall accumulated about 40 hours. That’s roughly 0.3% of the total: such a minuscule portion that they would never show up in global statistics. In terms of both response time and execution count, they are orders of magnitude smaller than the total.
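The proportion is easy to check from the figures given above:

```python
# ~7.6 minutes of execution time for the optimized queries, versus
# ~40 hours for the whole server over the same interval.
query_minutes  = 7.6
server_minutes = 40 * 60  # 40 hours = 2400 minutes

share = query_minutes / server_minutes
print(f"share of total server time: {share:.2%}")  # roughly 0.32%
```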
In other words, not only are you never going to find them with SHOW GLOBAL STATUS, but you’ll never even suspect there’s a problem. And yet, for the user interface component they affect, there is definitely a problem.
Per-Query Statistics Considered Indispensable
This is why per-query statistics are so vital. And it’s why “tuning” by looking at global counters is such a disaster. When I was a consultant, I had to fix countless problems caused by use of “tuning scripts” that just look at ratios of global counters. Sort buffer size tuning was a classic case: people would configure giant sort buffers because of one or two abusive queries skewing the ratios. Those queries needed larger sort buffers, but creating larger sort buffers for all queries caused great harm to queries that didn’t need them. (For more detail on this type of configuration, check out our free eBook, Configuring MySQL For Performance.)
The bottom line is, if you’re not looking at per-query statistics, in high resolution, you’re flying utterly blind. You’re looking at global server-wide counters at best, and there is no way to disaggregate those counters to see how they relate to individual queries. Your servers’ workload is almost certainly a mixture of lots of different things. The majority, whether they perform well or poorly, whether they need attention or not, will always dominate the global statistics and obliterate the minority components of the workload that badly need attention.
Never measure the average temperature of patients in the hospital. It provides no insight and is a complete waste of your time. Do something better with that time.
If you don’t have insight[1] into the specific, fine-grained details of your database’s query workload, at 1-second resolution, in your real production environment, get some for free right now, because it takes less than 2 minutes.
[1] This is a trick. I know you don’t unless you use VividCortex, because we’re the only solution on the market that provides it. To quote Etsy’s Arie Kachler, “This is the highest level of visibility we’ve ever seen in our MySQL servers.”