As intuitive and streamlined as ecommerce technology might seem from the user's perspective, it involves so much data that engineering ingenuity and smart database management must constantly deliver to keep up. At organizations like Shopify, which powers easy and reliable transactions for top brands around the world, that level of performance depends on deep monitoring of the MySQL core and the Redis caching infrastructure, plus insightful query profiling, packet captures, and giving developers access to the platforms that measure database performance.
Shopify’s motto is “Make commerce better for everyone.” That mantra applies whether the shopping's done online, on mobile, or in-store. For Shopify's engineering team, better means a fast, reliable application that delivers a positive customer experience, particularly during periods of high traffic volume. Shopify is a top ecommerce solution provider, delivering online and point-of-sale services for over 325,000 businesses in 150 countries, from your local coffee shop to trusted brands such as Tesla Motors, Red Bull, Nestlé, GE, and Kylie Cosmetics.
Shipping Code Fast and Often
For Shopify’s developers and DBAs, working in a high-availability environment means delivering at high velocity. It’s not unusual for them to deploy new software 20 times per day, and during peak periods—such as Black Friday and Cyber Monday—they have deployed new code over 80 times in a single day.
Sergio Roysen, Shopify's MySQL DBA team lead, told us: "Shopify deploys new software twenty times a day, sometimes. After Black Friday, after the code freeze, we deployed eighty-six times in the same day. Each of these new deploys can introduce new queries. Tracking those queries without VividCortex would take 24 hours. It could be too late. We could be down for 24 hours until we come up with a solution. Real-time monitoring is the only way to keep our wheels running."
You can find a four-minute video case study here, from a recent trip we made to Shopify's offices; it includes highlights from our interviews with DBAs, developers, and engineers. Below is a summary of the key monitoring methods and practices they shared with us.
Monitoring What Matters
The Shopify team maintains a laser focus on anything that gets in the way of performance. That approach demands extreme precision, and precision in turn requires clear visibility into how the system behaves. Here’s how:
- Tracking fast query effects. Shopify’s existing monitoring process already included maintaining a digest of slow queries. The problem was obtaining visibility into high-frequency, really fast queries. Each of these queries executed in just microseconds, and low latency is great, but potentially millions of them were firing every hour. Taken together, they consumed significant time in the application, and their sheer volume made it hard to see whether any one of them had an issue. Once Shopify started using VividCortex, they were able to zoom to one-second granularity, which allowed them to spot and capture every query and pinpoint any that were causing performance issues.
- Getting rid of bad queries. The team constantly looks for patterns that could reveal issues, such as a particular shop generating an inordinate number of queries. By profiling queries, they can quickly find those that are causing a problem on a certain database. Once discovered, these queries can either be removed entirely or fixed, for example when they have been slowed down by a bad index or by no index at all. This continual finding, fixing, and culling process helps purge their application of performance issues.
- Working from packet captures. During a recent period of high volume, the team deployed new Redis capacity. They were seeing spikes in latency and needed to confirm that they had provisioned enough resources to match the heavy traffic. Because VividCortex allowed them to monitor packet captures, and not just queries, they could see the latency increases introduced by the new connections caused by the Redis deploy. In the past, it was difficult to gain visibility into this dimension, as traditional database monitoring solutions don't enable engineering teams to see packet traffic. With the right monitoring tool, they were able to watch the exact metrics they knew they should be looking for throughout the traffic surge.
- Developer testing of queries. Developers across different teams in the organization are responsible not only for writing new queries, but for performance testing them as well. They use performance monitoring to evaluate poor-performing queries and confirm that they have either been fixed or removed. By giving developers access to monitoring tools, Shopify eliminates the need for complicated back-and-forths between devs and DBAs to ensure that code pushes won't have a detrimental effect in production. Administrative functions such as role-based and team-based access control make it possible for even non-DBA engineers to navigate sophisticated monitoring platforms without risk. Shopify has over 500 users on their VividCortex install, benefitting from SAML integration and Okta single sign-on. That means users of many different specialties and approaches can have input into Shopify's monitoring.
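To make the first point above concrete, here is a minimal Python sketch of why very fast queries can still dominate database time once they are grouped by digest and ranked by total time rather than by per-query latency. It is an illustration only, not how VividCortex or Shopify implement this: the `fingerprint` and `rank_by_total_time` helpers, the table names, and the sample latencies are all hypothetical.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Collapse literals so queries that differ only in values share one digest.

    Hypothetical, simplified normalizer: real tools handle many more cases.
    """
    sql = re.sub(r"'[^']*'", "?", sql)   # quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # standalone numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def rank_by_total_time(samples):
    """Group (sql, latency_seconds) samples by digest.

    Returns (digest, count, total_seconds) tuples, largest total first.
    """
    totals = defaultdict(lambda: [0, 0.0])
    for sql, latency in samples:
        entry = totals[fingerprint(sql)]
        entry[0] += 1
        entry[1] += latency
    rows = [(digest, count, total) for digest, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up workload: 100,000 fast lookups at 200 microseconds each,
# plus one slow query that takes 5 seconds.
samples = [("SELECT * FROM shops WHERE id = %d" % i, 0.0002) for i in range(100000)]
samples.append(("SELECT * FROM orders WHERE note LIKE '%gift%'", 5.0))

ranked = rank_by_total_time(samples)
for digest, count, total in ranked:
    print("%8d calls  %8.2f s total  %s" % (count, total, digest))
```

In this made-up workload, a slow-query digest would surface only the single 5-second query, while the ranking by total time puts the fast lookup first because its 100,000 executions add up to roughly 20 seconds.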
Eliminating the Guesswork
With hundreds of instances to monitor and dozens of developers, DBAs, and engineers on the team, seamless coordination on issue resolution is a real challenge. Prior to their experience with VividCortex, when a server started to slow down, Shopify's DBAs and developers were forced to jump into their internal chat room to discuss the cause of the problem. Without hard performance data, issues could evade detection and stay buried.
Today, when an issue arises, the team still jumps into the chat room, but they're now armed with hard evidence: screenshots of performance charts they can share with their colleagues. Shopify thrives on this data-driven analysis. Working together, the team members move fast and ship often, and they can now extend the company's established high-performance standards to even the deepest tiers of the system.