Published by Baron Schwartz on Jul 30, 2018 1:47:27 PM

What is Cardinality in Monitoring?

I wrote a couple of “definitions and nuances” posts about terminology in databases recently (cardinality, selectivity), and today I want to write one about cardinality in monitoring, as opposed to cardinality in databases. If you’ve seen discussions of “high-cardinality dimensions” or “observability requires support for high-cardinality fields” this is what we’re talking about today. So what does it mean?
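As a rough illustration, assume the common reading that a dimension's cardinality is its number of distinct values, and that each distinct combination of dimension values is a separate series. The sketch below counts both over a few hypothetical samples. The labels (`host`, `endpoint`, `status`) and the function names are made up for the example.

```python
from math import prod

# Hypothetical samples: each dict maps a dimension (label) to its value.
samples = [
    {"host": "web1", "endpoint": "/login", "status": "200"},
    {"host": "web1", "endpoint": "/login", "status": "500"},
    {"host": "web2", "endpoint": "/search", "status": "200"},
    {"host": "web2", "endpoint": "/login", "status": "200"},
    {"host": "web1", "endpoint": "/login", "status": "200"},  # repeats the first series
]

def dimension_cardinality(samples, dim):
    """Number of distinct values observed for a single dimension."""
    return len({s[dim] for s in samples})

def series_cardinality(samples):
    """Number of distinct label combinations, i.e. distinct time series."""
    return len({tuple(sorted(s.items())) for s in samples})

def worst_case_series(samples, dims):
    """Upper bound on the series count: product of each dimension's cardinality."""
    return prod(dimension_cardinality(samples, d) for d in dims)

print(dimension_cardinality(samples, "host"))  # 2
print(series_cardinality(samples))  # 4
print(worst_case_series(samples, ["host", "endpoint", "status"]))  # 8
```

Adding a dimension such as a user ID, with thousands of distinct values, multiplies that worst-case bound by thousands. That growth is what can make high-cardinality fields expensive for a monitoring system to store and query.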

Published by Preetam Jinka on Jun 22, 2017 4:45:00 PM

Exponential Smoothing for Time Series Forecasting

Time series anomaly detection is a complicated problem with plenty of practical methods. It’s easy to find yourself getting lost in all of the topics it encompasses. Learning them is hard enough, but implementing them is often more complicated. A key element of anomaly detection is forecasting: taking what you know about a time series, either from a model or from its history, and making predictions about values that arrive later.
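One simple forecaster in this family is simple exponential smoothing. Each new smoothed value is a weighted blend of the latest observation and the previous smoothed value, and the last smoothed value serves as the forecast for the next point. The sketch below is a minimal version under that assumption. The parameter `alpha` (0 < alpha ≤ 1), the function names, and the data are illustrative. Variants that also model trend or seasonality extend the same idea.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not series:
        raise ValueError("series must be non-empty")
    level = series[0]  # initialize the level with the first observation
    smoothed = [level]
    for x in series[1:]:
        # Blend the new observation with the previous level.
        level = alpha * x + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

def forecast_next(series, alpha):
    """One-step-ahead forecast: the most recent smoothed level."""
    return simple_exponential_smoothing(series, alpha)[-1]

history = [10, 12, 11, 15]  # hypothetical metric values
print(simple_exponential_smoothing(history, 0.5))  # [10, 11.0, 11.0, 13.0]
print(forecast_next(history, 0.5))  # 13.0
```

A larger `alpha` tracks recent values more closely, and a smaller one smooths more. The gap between an arriving value and its forecast is the kind of residual an anomaly detector might examine.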

Published by Baron Schwartz on Apr 14, 2017 2:18:21 PM

Correlated Metrics in Monitoring

If you work with monitoring or monitoring tools much, you’ve probably seen the word “correlate” here and there. For example, monitoring vendors often say you can use their product to correlate metrics. Issue 157 in the GitHub repository of the popular Prometheus monitoring system requests support for correlating multiple metrics in the same graph.

I’ve noticed that when monitoring-related discussions mention correlation, the meaning is usually pretty vague. It often seems to refer to just graphing multiple things on a single chart with the same timescale.
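In the statistical sense, by contrast, correlation is a number. The minimal sketch below assumes the Pearson correlation coefficient is the measure you want; other measures exist. It computes the coefficient for two equal-length series sampled at the same timestamps. The metric names and values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Root of the sum of squared deviations; the 1/n factors cancel in the ratio.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)

# Hypothetical metrics sampled at the same timestamps.
queries_per_sec = [100, 200, 300, 400]
cpu_percent = [20, 40, 60, 80]
free_connections = [90, 70, 50, 30]
print(round(pearson(queries_per_sec, cpu_percent), 6))  # 1.0
print(round(pearson(queries_per_sec, free_connections), 6))  # -1.0
```

Plotting these series on one chart makes the relationship visible. The coefficient is what quantifies it, from -1 (perfectly inverse) through 0 (no linear relationship) to 1 (perfectly linear).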

Published by Baron Schwartz on Nov 18, 2016 2:16:00 PM

Why Percentiles Don’t Work the Way You Think

Customers ask us for p99 (99th percentile) of metrics pretty frequently.

We plan to add such a feature to VividCortex (more on that later). But a lot of the time, when customers make this request, they actually have something very specific, and different, in mind. They’re not asking for the 99th percentile of a metric, they’re asking for a metric of 99th percentile. This is very common in systems like Graphite, and it doesn’t achieve what people sometimes think it does. This blog post explains how percentiles might trick you, the degree of the mistake or problem (it depends), and what you can do if percentile metrics aren't right for you.
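The minimal sketch below shows the difference between the two. It assumes nearest-rank percentiles and assumes that per-interval p99 values are later averaged together, as a rollup or a zoomed-out chart might do. The latency values and function names are hypothetical.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def average_of_percentiles(intervals, p):
    """A 'metric of p99': compute p99 per interval, then average those values."""
    per_interval = [percentile(v, p) for v in intervals]
    return sum(per_interval) / len(per_interval)

def percentile_of_all(intervals, p):
    """The p99 of the metric itself, over every raw sample."""
    return percentile([x for v in intervals for x in v], p)

# Hypothetical latencies in ms: a busy, fast minute and a quiet, slow one.
busy_minute = [10] * 1000
quiet_minute = [500] * 10
intervals = [busy_minute, quiet_minute]
print(average_of_percentiles(intervals, 99))  # 255.0
print(percentile_of_all(intervals, 99))  # 10
```

Here the busy minute has 100 times as many requests, but averaging gives both minutes equal weight, so the averaged value overstates the true p99. With other traffic patterns it can understate it instead, so the size and direction of the error depend on the data.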

Published by Baron Schwartz on Dec 21, 2015 1:47:42 PM

The Factors That Impact Availability, Visualized

We all want our systems to have high availability, but sometimes the exact meaning of “high availability” is not clearly defined. However, availability, like scalability and performance, can be expressed as a mathematical function; it can be viewed in quantifiable and digestible terms. In this post, I’ll explain which parameters truly influence availability. That’s an extremely useful concept to understand, because it lets you focus your efforts in the right places and actually achieve higher availability instead of just spinning your wheels.
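One common way to express availability assumes the standard steady-state model, which may not match the exact parameters the post uses: availability = MTBF / (MTBF + MTTR). Here MTBF is mean time between failures and MTTR is mean time to repair. The sketch below computes it. The numbers and function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def availability(mtbf_hours, mttr_hours):
    """Steady-state fraction of time the system is up."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("need mtbf > 0 and mttr >= 0")
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_minutes_per_year(a):
    """Expected yearly downtime implied by availability a."""
    return (1 - a) * MINUTES_PER_YEAR

base = availability(1000, 1)  # hypothetical: fails every ~1000 h, 1 h to repair
double_mtbf = availability(2000, 1)
halve_mttr = availability(1000, 0.5)
print(f"{base:.5f}, {downtime_minutes_per_year(base):.0f} min/yr")
print(double_mtbf == halve_mttr)  # True
```

In this model, doubling MTBF and halving MTTR yield exactly the same availability. Shortening recovery can therefore be as valuable as preventing failures.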

Published by Baron Schwartz on Nov 30, 2015 12:18:56 PM

A Trendline is a Model

This post is part of an ongoing series on best practices for effective and insightful database monitoring. Much of what these posts cover is unintuitive, yet vital to understand. Previous posts have covered Why Percentiles Don't Work the Way You Think; how to avoid getting to a point When It's Too Late to Monitor; how to tell If a Query Is Bad; and Why You Should Almost Never Alert on Thresholds.

Excel makes it easy to add a “trendline” to a chart, but does the trendline actually reflect the processes that produced the data? Usually not. Usually a trendline is just chartjunk and you shouldn’t use it.
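To make the point concrete: a linear trendline is typically an ordinary least-squares fit, which is itself a model that assumes the data follow a straight line. In the sketch below, the data come from a hypothetical quadratic process. The fitted line roughly tracks the observed points but extrapolates badly. The function name and data are illustrative.

```python
def linear_trendline(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [x * x for x in xs]  # hypothetical process: quadratic, not linear
slope, intercept = linear_trendline(xs, ys)
print(slope, intercept)  # 4.0 -2.0
print(slope * 10 + intercept, 10 * 10)  # 38.0 100
```

The line is a correct least-squares fit, yet it says nothing true about the process that produced the data. Any forecast read off it inherits the wrong assumption.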

Published by Baron Schwartz on Nov 11, 2015 10:48:43 AM

Introducing Query Anomaly Detection

Anomaly detection sure is a hot topic. We’ve written about it ourselves a number of times, and Preetam Jinka and I just coauthored a book for O’Reilly called Anomaly Detection For Monitoring. One of the challenges, as we’ve discussed so often, is that catch-all, generic anomaly detection is hard to do.
