Beware of Database Tuning Advisors

Posted by Baron Schwartz on Oct 26, 2015 3:38:27 PM

One of the common questions we get during sales demos is “does VividCortex give advice on my database’s configuration?” The assumption is that since our product sees lots of information about the database, operating system, and current configuration, it can “optimize” the database configuration. Or, at least, point out really obviously bad things? Surely that is not too hard to do?

VividCortex doesn’t do that, and there are good reasons why. By way of explanation I’d like to tell a few war stories. Warning: you’re probably going to vow never to look at auto-generated tuning advice again.

Get Down Off The Ledge

At one point, a significant chunk of my career and earnings consisted of undoing the damage caused by configuration advisors. It was often very hard to convince people to reverse the changes the configuration advisors had urged them to make. It’s a psychological problem, not a technical one.

One incident stands out clearly in my memory. An online gaming site hired me as a consultant to help fix problems with MySQL crashing repeatedly. There were a lot of curse-words as the engineering team explained their frustration with MySQL’s instability. It had been completely unreliable for a year and nothing was improving the situation.

I surveyed the situation, then took a look at the server. The log revealed that it was crashing due to being killed by the OS for out-of-memory. The sort buffer was set to an enormously large value, and there were multiple queries that did sorting in memory. Just a couple of them running at once was sufficient to run the server out of memory and cause OOM killing. There really was no mystery. The fix was simple.

Implementing it, however, was not. That giant buffer was no accident. It was giant because some tuning script that was dividing a performance counter by another and trying to reduce the ratio was giving advice to increase the buffer.

make-my-day.jpg

As I explained to the team that they had configured MySQL to make multi-gigabytemalloc() calls every time a query needed to sort 10 rows, and that this was clearly the cause of the problem, they reacted by digging in their heels. “We are not going to undo all the optimizations we’ve made to the server configuration,” they said. “The server is performing awfully. If we undo those optimizations it’s going to get worse.”

They trusted a configuration advisor more than the evidence in front of their eyes.

The best part was that the advisor was making the “increase the sort buffer size” recommendation for reasons that would never be fixed by increasing that setting. No matter how big they made it, the advisor wanted it to be bigger.

I had to convince them to halve the size of the buffer, then reduce it further, then further… and eventually we got it back to a sane size and the server stabilized.

Are They Really Experts?

I wish I could say that was an isolated incidence. But a lot of configuration advisors are just so indescribably bad.

They’re written by people who think they know what they’re talking about but have no clue. Most of the time they’re copy-pasted from Google search results. I kid you not. These things go viral and there’s no undoing it.

facepalm.jpg

Here’s an example: in MySQL, internal XA transaction support is required to properly synchronize and serialize binary logs, which are used for point-in-time recovery and replication. Somehow, someone got the idea that this was needed only for externally-initiated XA transactions. There are several products on the market now that show orange or red (warning or critical) configuration advice to disable this functionality if you don’t need XA transactions, because you’re paying a performance penalty you don’t need.

This is dead wrong and it’s dangerous. It results in corrupted or inconsistent data.

Anyone who includes a recommendation like this in their product has the wrong job. Right now a couple of engineers are scurrying to remove this from their product, I know, but with “a suite of over 350 best-practice advisors” I guarantee there are other disasters they are too ignorant to recognize and remove.

Even Good Advice is Risky

Here’s the thing, even expert advice from live humans is something to implement with caution. This is something we did really well at Percona. We recognized that even fixing a truly bad configuration setting carried quite a bit of risk, because it was changing a system that was in a known state. The effects of a configuration and behavior change aren’t always easy to predict. Ninety-nine percent of these changes improve things, but those one-percent edge cases exist, and it’s very likely one or more of them will interfere and cause something unexpected.

Peter drove this point home to me when we were designing the Percona Configuration Wizard. I wanted to generate an ideal configuration for a server as though we were starting from a blank slate. Peter cautioned me that this wasn’t the real use-case we’d get. People would enter values from existing servers, he said, and apply our recommendations. Changing things like the SQL mode would be very dangerous for those situations.

That distinction is vital: there’s a huge difference between generating a starting-point configuration for a freshly set-up server for a new application, and changing something that’s already running in production. The former is safe and it’s sensible to apply “best practices”, but the latter is to be approached with extreme caution.

Think about it this way: what if you have taken a configuration wizard’s bad advice and turned off XA transaction support? You’ve got a server with a data corruption issue that needs to be fixed. And yet, if you fix it, you will change its performance characteristics. Do you really want to be blamed for causing a performance issue, or even downtime, as a result of fixing a bad configuration?

My Manager Said…

Products that offer a “suite” of “comprehensive” and “intelligent” suggestions that represent “best practices based on our expert team’s experience” are just window dressing for management. They’ve been sold a lot of configuration best-practice tuning tools and they’re pretty sure if the product they’re evaluating doesn’t include one, it’s not at the industry’s standard of excellence.

It’s just a war of checkbox compliance and feature list comparison grids. It’s all for sales purposes and looking good in front of management.

If you want to look good in front of management, do yourself a favor and send your managers this blog post instead!

Is It Possible At All?

Maybe good configuration advice in an automated way is possible, sure. If you do it enough it really does start to feel like something that should be automated, kind of like self-driving cars. I mean, all you do is sit, turn the steering wheel, and accelerate and brake, right?

Snark aside, I actually do believe a sophisticated program has a shot at doing a decent job. The things computers can do these days just boggle the mind. And I know there areresearch projects on this topic, which use much more sophisticated approaches than just calculating ratios of counters, so that gives me hope.

The Percona Configuration Wizard is actually pretty good at generating a nice default config file for a new server. I continue to use it. Our config files at VividCortex came from that wizard. But that’s an entirely different use case.

Conclusions

Creating a default config file for a database server is a tractable problem; automatically making or suggesting changes to a production server is another. That’s why we look but don’t touch. And that’s why we don’t provide any advice or suggestions in our product. This may change someday, but in the meantime I would suggest…

  1. Reading our ebook on configuring MySQL for performance
  2. Buying a copy of my book High Performance MySQL
  3. Hiring live human experts (I’m biased towards Percona, but you can get good help from MySQL, MariaDB, Pythian and a variety of others)

Above all, focus on measuring query performance, and you’ll be way better off! Most gains come from queries, indexes, and schema design–not from configuration changes.

Recent Posts

Posts by Topic

see all