This post was featured in our July 2015 anthology of most popular VividCortex blog posts of all time. To see more of our top content, check out that list here.
With all the choice of database systems today, why does VividCortex use MySQL?
Given my expertise in MySQL, you might assume that I chose it simply because that’s what I know. That’s part of it, but there are other reasons too.
Note that I sometimes write in the first person, and a lot of this post is necessarily my opinion, because I helped start the company, and for better or worse (mostly better, I hope) many of my early decisions will probably live for a while.
MySQL is Mature and Proven
MySQL is a known quantity. Although years ago it lacked some desirable characteristics, that is ancient history. For many years now it’s been difficult to find technologies that can boast MySQL’s track record of simple, safe, high-performance, reliable, low-cost operation. MySQL is a standout in the field.
It’s not perfect. That’s not my claim. There is plenty to improve about it. Being mature and proven means that although it has shortcomings, those are a) well-known, in part because of its simplicity and b) generally easy to work around. By “easy to work around” I mean that you can build amazing things with MySQL without committing unnatural acts. In other words, its well-known limitations are seldom an actual problem; they are usually just annoyances or edge-cases, or they eliminate one possibility but not others.
There’s another dimension to maturity: lots of people know MySQL. I took some calculated risks with new technologies in specific areas of the company (Go, AngularJS). Sticking with proven technology in the data storage layer makes it safer and more feasible to use unknown quantities elsewhere, where people will have to learn something new before they can be productive, and where we will potentially discover bugs in production before others do.
MySQL Has a Flourishing Ecosystem
It’s not just that there are a lot of people who know MySQL. There are lots of large companies providing commercial support, services, and software. There are excellent options for consulting, for example, and the market is highly competitive – not just a single vendor.
There are many companies providing add-on solutions that integrate MySQL with other technologies as well. Pick a technology, and there’s probably more than one person or company making products or solutions to make MySQL play well with that technology and vice versa.
There are great highly technical events, like Percona Live and MySQL Connect, as well as a strong presence in general open-source conferences and database conferences (think FOSDEM, for example). Thousands of people attend these events specifically to hear and share about MySQL. The expo halls are packed with vendors providing valuable services and products.
There are many companies using MySQL that provide great leadership for the open source community and contribute a lot of software, knowledge, and other improvements. See for example WebScaleSQL, probably the flagship example.
The open source and commercial ecosystem also provides a large variety of excellent tools, such as Percona Toolkit, MySQL Workbench, and even VividCortex itself. The availability of these things mean we can do more work on what’s unique to our product, and less reinvention or undifferentiated heavy lifting in areas that are outside our core product.
MySQL is Excellent, Good, or Usable For Many Things
Keeping our diversity and “technical surface area” small is a good thing. It simplifies operations, keeps the number of artifacts like drivers and language libraries low, makes it easier for us to do on-premises installations, and so on.
MySQL is also reasonably good at a lot of things. It’s a really good “simple SQL workhorse” database server. Sometimes using a product that’s great at one thing, and then “good enough” at others, is a good idea. For example, MySQL is no ElasticSearch, but it does have full-text search features. It’s no MemcacheD, but it has support for accessing the server via the MemcacheD protocol. It also has support for BLOBs, JSON, XML, and geospatial data. Balancing the needs of the application against the capabilities involved, it may well be better to use MySQL even for things it’s not quite suited for, than to complicate our architecture with lots of diversity.
MySQL is High-Performance and Scalable
It’s unfashionable to say it these days, but MySQL kicks ass on the types of workloads that matter to us. This is one of the best-kept secrets about MySQL, in my opinion.
People are often skeptical about this. Someone once advocated using Redis (note: we do use Redis) for a specific high-frequency key-value lookup task, “because it’s fast.” Sure, Redis is fast. For in-memory key-value lookups, so is MySQL, and it’s also transactional, concurrent, etc, etc. Again, keeping it simple and using a smaller number of technologies is a good thing.
Another example: We handle a lot of data at high velocity. With some other technologies, that would mean a lot of servers. I got this awesome email from someone recently. It’s edited for brevity and anonymity, but here’s the gist:
You should deploy ClusterTechX for your data needs. One of our customers does 60 billion transactions a month with it on 100 clusters.
Each cluster is 3, 4, or more servers. A little math shows that this customer is achieving around 100 transactions per second per server. We run our smallish AWS instances at 20% utilization and ingest well over 100000 metrics per second per server, for the time-series data alone, not counting all the other work those servers do. So, apparently if we used ClusterTechX we’d need hundreds of thousands of servers to handle our current load.
At scale, performance matters. MySQL handles heavy workloads with a variety of characteristics. You get more value for your dollar out of it than it’s popular to believe. Its raw efficiency per node is very high.
MySQL is also capable of getting good utilization out of extremely high-performance hardware (large numbers of CPU cores, large amounts of RAM, extremely fast storage). Having the ability to use expensive hardware efficiently is important, even though we are currently cloud-hosted. It took many years for MySQL to reach this level of efficiency, and that is a cautionary tale when considering newer alternatives.
MySQL Is Reliable
Your data is safe in MySQL, and you can run at large scale while minimizing risk.
InnoDB is one of the most reliable and proven transactional data stores there is. For many years it has been proven to provide top-notch durability and data integrity. It has had checksums on data pages since time began. It has had a doublewrite buffer too, to prevent torn pages (partial page writes). You can reasonably criticize MyISAM or other legacy parts of MySQL, but InnoDB is a thing of beauty.
MySQL itself supports very flexible (and again, mature and proven) replication, which is important for being able to actually operate the database at scale under high load. Replication also permits for point-in-time recovery, archiving of binary logs, and a host of other useful and helpful things. These things might seem minor when starting out, but at scale they are critical – lack of these things is a showstopper.
MySQL offers not one but two transactional hot backup solutions (MySQL Enterprise Backup, Percona XtraBackup) that are usable at scale. This is also useful not only for backup, but for operational activities like creating new clones of existing servers without disrupting them.
Why Not SomeNewDB?
It is easy to claim that MySQL is inferior to Technology X for any use case you can dream up. I have many existence proofs of MySQL working extremely well for what we’re doing with it. I have few examples of other technologies working at all for those uses. In fact, most people I know who have tried to use a lot of other ostensibly superior technologies for the things we do with MySQL have failed.
Nonetheless, I’ve evaluated and studied many databases quite deeply. One of the challenges is that there are a lot of relatively new, unproven technologies to evaluate. As a result, asking the question “why not X instead” is cheap, but answering it is very expensive. If we narrow down the field to databases that are optimized for appending, sequential reads, and mixed/complex workloads of in-memory and disk-bound, then there are fewer options to evaluate (mind that MySQL serves us well beyond even this simplistic workload). Still, many of them are relatively easy to disqualify for various reasons, though these are sometimes easy to imagine being fixed soon. Some new technologies continue to be promising – TokuDB, for example, is almost usable for us, and I expect it to be truly competitive for our workload in the very near future.
Using MySQL now allows us to try newer options in the future. Using those newer technologies now could prove very harmful as we try to do job #1, which is building our product without slowing down to debug problems in a large-scale distributed storage layer.
Why Not A Fork?
We are using vanilla MySQL directly from Oracle, not a fork or distribution packages. We do this because Oracle’s MySQL is the core technology that millions of people use and trust. There is safety in numbers. I do not wish to get into political debates, but:
- Some Linux distributions’ packages differ from the official source in nontrivial, sometimes undisclosed ways.
- Most readers of this blog will know I spent years at Percona. I know what they’ve changed in Percona Server and why, and we don’t need those changes.
- MariaDB does a lot of excellent engineering in the server, but there are too many questions I would need to answer the hard way (researching source code, benchmarking, running trials in production). I would have to do it out of love for open source, and that’s not my primary motivator.
I am convinced that Oracle is in business with MySQL to make money. This is a very simple motivation, one that leaves very little in doubt, and eliminates a lot of questions and political issues. I also believe that Oracle thinks they will make more money by making MySQL better. I happen to agree with this.
I constantly learn new things. One of the nice things that comes from this is that I get a lot of perspective on just how good MySQL really is. In the end, although I have spent a lot of time evaluating alternatives, my job is to make things work, not to learn new technologies the hard way at large scale and high speed. There’s going to be plenty of that; no need to magnify it.
Thanks to Mark Callaghan for reviewing a draft of this article (and for advising us in general).