5 Questions Managers Should Ask Their DBAs Before Black Friday

Posted by Alex Slotnick on Oct 31, 2016 4:48:45 PM

Poorly performing sites can be costly. With Black Friday and Cyber Monday right around the corner, how can business leaders work with DBAs to ensure that their organizations and databases are properly prepared to deal with customers’ needs, so websites don’t malfunction? With five simple—but powerful—questions, managers can help their database teams get ready for the heaviest of shopping traffic.


For online retailers and other e-commerce firms, a sub-par online experience means lost sales and a damaged reputation. Customers may not only abandon your site the first time they find it broken; they may never come back. According to recent research, 79% of shoppers who are dissatisfied with a site’s performance are less likely to buy from that site later. And we're not just talking about major outages. Kissmetrics reports that as little as “a one second delay in page response can result in a 7% reduction in conversions.”

The old rule of thumb, based on an analysis by Amazon several years ago, is that every 100 ms of delay costs 1% of sales. Numbers that pointed make the link between performance and sales hard to ignore. Assuming these ratios still roughly hold (in the case of Amazon’s 2015 sales, they would apply to $107 billion), you're talking about serious dollar amounts even for page-load delays of one second or less.
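To make the rule of thumb concrete, here is a back-of-the-envelope calculation. It is a sketch that assumes the 1%-per-100 ms ratio applies linearly to total annual sales; `estimated_revenue_at_risk` is a hypothetical helper for illustration, not a published model.

```python
def estimated_revenue_at_risk(annual_sales, delay_ms, loss_per_100ms=0.01):
    """Apply the '1% of sales per 100 ms of delay' rule of thumb linearly.

    Assumption: the ratio holds across the whole delay and across all sales,
    which real traffic may not do. Treat the result as an order of magnitude.
    """
    fraction_lost = (delay_ms / 100) * loss_per_100ms
    return annual_sales * fraction_lost


AMAZON_2015_SALES = 107e9  # $107 billion, the figure quoted above

for delay in (100, 500, 1000):
    lost = estimated_revenue_at_risk(AMAZON_2015_SALES, delay)
    print(f"{delay:>5} ms delay -> ~${lost / 1e9:.2f} billion at risk")
```

Under these assumptions, a one-second delay works out to roughly $10.7 billion.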

The Five Questions

How much control does an organization have over its website’s ability to perform? The answer doesn’t just involve traditional, top-down APM; the database is vital too. The key questions below can help leaders such as CTOs review the state of their websites with their database experts to ensure optimal site performance during the holiday season’s craziest periods.

  1. What is the current health of the system? Here your team needs to look at three critical variables: CPU, disk I/O, and memory. These three variables dictate your “top speed”, the ceiling on your overall processing capacity. Using recent site traffic as a baseline, your team should understand how you’re performing in terms of CPU utilization, disk I/O throughput, and memory usage as a percentage of total current capacity. Are your DBAs properly equipped to do such an analysis? What tools or systems do they already have? Which are they missing?

  2. How far can we go with the system? Given the resources you’ve deployed (CPU, disk, memory), you need to determine your maximum capacity. This is where you identify “soft limits”, such as the point at which high concurrency starts to exhaust memory or disk I/O begins to slow processing. These become your baseline metrics or benchmarks, above which you would need to consider adding resources. Based on your company's position, what sort of activity do you expect on Black Friday and Cyber Monday? How can you, as an executive, coordinate and communicate your databases' limits to other departments for whom traffic and sales expectations are important metrics?

  3. What are we able to monitor in real time? In addition to monitoring CPU, disk, and memory, you need to be able to drill down to the query level and analyze real-time performance at a more granular level. By sorting the top queries and looking at latency, throughput, errors, warnings, and index usage, all down to the microsecond, your team can get a better handle on what is causing issues. What's possible here depends entirely on the tools and instrumentation at your team's disposal. Based on the limits you defined in question 2, are there any metrics you can't monitor that you might need?

  4. How easily can we spot and isolate problems as they occur? Spotting small, anomalous events inside massive data sets can be a real challenge. One way to do it is to evaluate database query groups using time-based comparisons: hour by hour, or against comparable dates in the past (e.g., “Black Friday” last year). With this comparative approach, you can isolate the query that is causing latency and throughput problems and fix it before your site takes a major performance hit. If you understand how quickly your team can recognize a problem, you can set realistic expectations for how an emergency resolution might proceed. If your team can't locate a problematic query, however, what recourse will they have when they need to fix one in the middle of the action?

  5. What preventative measures can we take to head off problems? The best place to start is with the developers writing code. They need to understand the impact any code change may have on database performance, since a small tweak to a query can have a significant (and unforeseen) effect. For example, lookups served by InnoDB’s Adaptive Hash Index can be faster, but they can also draw additional CPU cycles, invalidating your original assumptions about CPU utilization. As a leader, you should review with your team the contingency plans and backups that are available, and when each is expected to be used. If there's a chance a drastic fix might be necessary, don't wait until the decisive moment to improvise how it should go.
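For question 1, the “percentage of total capacity” check can be sketched in a few lines of Python. This is an illustrative sketch, not a monitoring tool: the resource names, the sample values, the choice of the 95th percentile, and the 70% alert threshold are all assumptions you would replace with your own baselines.

```python
from statistics import quantiles


def utilization_report(samples, capacity, alert_pct=70.0):
    """Express recent resource usage as a percentage of total capacity.

    samples:  resource name -> observed values from recent traffic
              (e.g. busy CPU cores, disk IOPS, memory in GB); needs 2+ values.
    capacity: resource name -> total provisioned capacity, same units.
    Returns resource name -> (p95 utilization %, at_or_above_alert).
    """
    report = {}
    for resource, values in samples.items():
        # 95th percentile of recent usage, so one brief spike doesn't dominate;
        # the "inclusive" method keeps the estimate within the observed range.
        p95 = quantiles(values, n=100, method="inclusive")[94]
        pct = 100.0 * p95 / capacity[resource]
        report[resource] = (round(pct, 1), pct >= alert_pct)
    return report


# Hypothetical recent samples and provisioned capacity.
recent = {
    "cpu_cores": [6.5, 7.0, 9.5, 8.0, 7.5],
    "disk_iops": [1800, 2100, 2600, 2400, 2000],
    "memory_gb": [48, 50, 51, 52, 52],
}
provisioned = {"cpu_cores": 16, "disk_iops": 3000, "memory_gb": 64}

for name, (pct, alert) in utilization_report(recent, provisioned).items():
    print(f"{name:10} {pct:5.1f}%{'  <- check headroom' if alert else ''}")
```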
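The soft limits in question 2 usually come from load testing. The sketch below assumes you have stepped load-test results (concurrency, throughput, p99 latency) and treats the soft limit as the last step before latency breaks a service-level objective or throughput stops growing meaningfully. `LoadStep`, `soft_limit`, `headroom`, the 50 ms SLO, the 5% minimum gain, and all sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LoadStep:
    concurrency: int        # concurrent sessions driven by the load test
    throughput_qps: float   # queries per second completed at this level
    p99_latency_ms: float   # 99th-percentile query latency at this level


def soft_limit(steps, latency_slo_ms=50.0, min_gain=0.05):
    """Return the last load step before the system stops scaling, or None.

    A step counts as past the soft limit if its p99 latency exceeds the SLO,
    or if throughput grew by less than `min_gain` over the previous step
    (more concurrency, but little extra work getting done).
    """
    best = None
    for step in sorted(steps, key=lambda s: s.concurrency):
        if step.p99_latency_ms > latency_slo_ms:
            break
        if best is not None and step.throughput_qps < best.throughput_qps * (1 + min_gain):
            break
        best = step
    return best


def headroom(limit, expected_peak_qps):
    """Fraction of tested safe throughput still unused at the expected peak.

    Negative values mean the expected peak exceeds the soft limit.
    """
    return 1.0 - expected_peak_qps / limit.throughput_qps


steps = [
    LoadStep(50, 4_000, 12),
    LoadStep(100, 7_600, 18),
    LoadStep(200, 11_000, 35),
    LoadStep(400, 11_300, 140),
]
limit = soft_limit(steps)
# Hypothetical expected peak: last year's Black Friday peak plus assumed growth.
expected_peak = 7_500 * 1.2
print(limit.concurrency, f"{headroom(limit, expected_peak):.0%}")
```

Comparing the tested limit against an expected peak gives other departments a single number, the remaining headroom, to plan around.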
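For question 3, a “top queries” view is typically built by normalizing each query into a fingerprint (replacing literal values with placeholders) and aggregating per fingerprint. The sketch below assumes a hypothetical capture format of `(sql_text, latency_us, had_error)` tuples; real tools, such as Percona Toolkit’s `pt-query-digest`, normalize queries far more carefully than these regular expressions do.

```python
import re
from collections import defaultdict


def fingerprint(sql):
    """Normalize a query so calls that differ only in literal values group together."""
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)   # quoted string literals -> ?
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)   # numeric literals -> ?
    sql = re.sub(r"\s+", " ", sql).strip()         # collapse whitespace
    return sql.lower()


def top_queries(events, limit=5):
    """Rank query groups by total time spent, the usual 'top queries' ordering.

    events: iterable of (sql_text, latency_us, had_error) tuples.
    Returns (fingerprint, calls, total_us, avg_us, errors) per group.
    """
    groups = defaultdict(lambda: {"calls": 0, "total_us": 0, "errors": 0})
    for sql, latency_us, had_error in events:
        g = groups[fingerprint(sql)]
        g["calls"] += 1
        g["total_us"] += latency_us
        g["errors"] += int(had_error)
    ranked = sorted(groups.items(), key=lambda kv: kv[1]["total_us"], reverse=True)
    return [
        (fp, g["calls"], g["total_us"], g["total_us"] / g["calls"], g["errors"])
        for fp, g in ranked[:limit]
    ]


# Hypothetical captured events, latencies in microseconds.
events = [
    ("SELECT * FROM orders WHERE id = 1", 120, False),
    ("SELECT * FROM orders WHERE id = 2", 80, False),
    ("UPDATE carts SET qty = 3 WHERE user = 'bob'", 900, True),
]
for row in top_queries(events):
    print(row)
```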
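One way for developers to act on question 5 before a change ships is to check query plans automatically. This sketch uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` purely as a self-contained stand-in; on MySQL, the analogous signal would be an `EXPLAIN` row whose access type is `ALL` (a full table scan). The table, the query, and the `full_table_scans` helper are hypothetical.

```python
import sqlite3


def full_table_scans(conn, sql, params=()):
    """Return the query-plan steps that read a whole table instead of an index.

    Each EXPLAIN QUERY PLAN row has four columns; the fourth is a
    human-readable 'detail' string such as 'SCAN orders' or
    'SEARCH orders USING INDEX ...'.
    """
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in plan if row[3].startswith("SCAN")]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
query = "SELECT total FROM orders WHERE customer_id = ?"

before = full_table_scans(conn, query, (42,))  # no index yet: expect a full scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = full_table_scans(conn, query, (42,))   # index lookup: expect no scans
print("before:", before)
print("after:", after)
```

Running a check like this in a test suite makes a plan change visible before it reaches production, instead of surfacing as a CPU or I/O surprise on Black Friday.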

Beyond Black Friday

Of course, site traffic is not evenly distributed throughout the year. You’ll find that seasonal patterns apply to online retailers just as they do to their brick-and-mortar brethren. During spikes like Black Friday or Cyber Monday, it’s essential to be able to compare historical traffic patterns to understand how and when issues have occurred in the past… and when they might appear again.
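The time-based comparison described in question 4, applied to historical data like this, can be sketched as follows. It assumes per-fingerprint aggregates (call count and total latency in microseconds) for two comparable windows, such as this hour and the same hour on last year's Black Friday; the function name, the thresholds, and the sample data are hypothetical.

```python
def compare_to_baseline(current, baseline, ratio=1.5, min_calls=100):
    """Compare query groups in the current window against a baseline window.

    current, baseline: query fingerprint -> (call_count, total_latency_us).
    Thresholds are illustrative, not tuned values.
    Returns (slower, new): groups whose average latency grew by `ratio` or
    more (worst first), and groups absent from the baseline window.
    """
    slower, new = [], []
    for fp, (calls, total_us) in current.items():
        if calls < min_calls:
            continue  # too few calls for a meaningful average
        base_calls, base_total_us = baseline.get(fp, (0, 0))
        if base_calls == 0:
            new.append(fp)  # query shape not seen in the baseline window
            continue
        base_avg = base_total_us / base_calls
        if base_avg == 0:
            continue  # no measurable baseline latency to compare against
        change = (total_us / calls) / base_avg
        if change >= ratio:
            slower.append((fp, round(change, 2)))
    slower.sort(key=lambda item: item[1], reverse=True)
    return slower, new


# Hypothetical aggregates: same hour, last year vs. this year.
last_year = {"select a": (1000, 500_000), "update b": (200, 400_000)}
this_year = {"select a": (1200, 1_800_000), "update b": (210, 430_000),
             "select c": (150, 90_000)}
print(compare_to_baseline(this_year, last_year))
```

Listing new query shapes separately matters because a query introduced since the baseline has nothing to be compared against, yet it may be exactly what changed.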

How should an executive approach this question? Be sure to tell your DBA that you understand database health matters not just around the holidays but all year long. To monitor effectively, your DBA also needs the resources to prepare and plan using historical data and performance analysis, rather than falling into the trap of relying on faulty assumptions or intuition.

Ready for the Rush

Here is a final thought as you prepare for what is hopefully a busy holiday season: understand the importance of instrumentation. Databases are not black-box, “self-driving” machines, but more like Formula One race cars that require constant monitoring and fine tuning for the best performance.

In racing, it starts in the garage and on the test track, where mechanics tweak and measure changes to the engine, suspension, tires and aerodynamics. It continues during “production” (otherwise known as race day), when hundreds of metrics are tracked and constant adjustments are made to optimize performance. In Formula One racing, peak performance truly matters, as one second can mean the difference between first place and tenth. By preparing, testing, and getting your instrumentation right, you won’t get lapped at the finish line on the busiest day of the year.

If you haven't yet had the opportunity to see how easily VividCortex can fit into your organization's current set of monitoring solutions, sign up for a free trial and get started today.
