Time series anomaly detection is a complicated problem with plenty of practical methods. It’s easy to get lost in all of the topics it encompasses. Learning them is hard enough, and implementing them is often harder. A key element of anomaly detection is forecasting: taking what you know about a time series, either based on a model or its history, and making decisions about values that arrive later.
You know how to do this already. Imagine someone asked you to forecast the prices for a certain stock, or the local temperature over the next few days. You could draw out your prediction, and chances are it’s a pretty good one. Your brain works amazingly well for problems like this, and our challenge is to try to get computers to do the same.
If you take an introductory course on time series, you’ll learn how to forecast by fitting a model to some sample data, and then using the model to predict future values. In practice, especially when monitoring systems, you’ll find that this approach doesn’t work well, if at all! Real systems rarely fit mathematical models. There is an alternative. You can do something a lot simpler with exponential smoothing.
First, let’s take a quick look at what kinds of time series we could be working with. Suppose you measured the cpu.idle metric on a system and have observations that are plotted below.
In this case, the time series isn’t particularly interesting. The values vary a reasonable amount, but overall it’s fairly stable and most values hover around 130 or so. From a time series analysis perspective, this is considered to be fairly stationary. If you tried to predict the next value, your best guess would probably be around 130. It’s impossible to be exactly right with a prediction like this, but picking a value like 130 would appear to be the least incorrect.
Exponential smoothing refers to the use of an exponentially weighted moving average (EWMA) to “smooth” a time series. If you have some time series x_t, you can define a new time series s_t that is a smoothed version of x_t: s_t = α x_t + (1 - α) s_{t-1}, where α is the smoothing weight, a number between 0 and 1.
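Here is a minimal Python sketch of that smoothing, assuming the usual recursive EWMA update and seeding the smoothed series with the first observation. The function name ewma and the sample values are illustrative and not taken from any particular library.

```python
def ewma(values, alpha):
    """Return the exponentially smoothed version of `values`."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        if not smoothed:
            # Assumption: seed the smoothed series with the first observation.
            s = x
        else:
            # New value = alpha * current point + (1 - alpha) * previous smoothed value.
            s = alpha * x + (1 - alpha) * smoothed[-1]
        smoothed.append(s)
    return smoothed


# Illustrative values hovering around 130.
series = [130, 134, 126, 131, 129, 133]
heavy = ewma(series, alpha=0.1)  # small weight: each point moves the average very little
light = ewma(series, alpha=0.5)  # larger weight: the average follows the data more closely
```

With alpha equal to 1 the function returns the input unchanged, which is a quick sanity check on the update rule.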
Here’s a plot of a stationary time series, like the previous example, along with a couple of smoothed versions. Notice how the smoothing amount changes with α, the smoothing weight. The smaller the weight, the less influence each point has on the smoothed time series. Read our other blog post on how exponentially weighted moving averages work for more details.
Suppose you had your time series x_t along with a smoothed version s_t. You’d like to predict, or forecast, the next value of the series, x_{t+1}. This is simpler than you may think! You can just use the last value you calculated for the EWMA, s_t. It works because, for a stationary series, the best single guess for the next value is the mean, and s_t is a running estimate of that mean. Predicting the next value is called the one-step-ahead forecast.
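As a rough Python sketch of the one-step-ahead forecast, the function below returns the EWMA after each observation, which doubles as the prediction for the next one. The function name, the weight of 0.3, and the sample values are assumptions made for illustration.

```python
def one_step_forecasts(values, alpha):
    """forecasts[i] is the prediction for values[i + 1], made after seeing values[i]."""
    forecasts = []
    s = None
    for x in values:
        # Seed with the first observation, then apply the EWMA update.
        s = x if s is None else alpha * x + (1 - alpha) * s
        forecasts.append(s)
    return forecasts


# Illustrative values hovering around 130.
series = [130, 134, 126, 131, 129, 133]
forecasts = one_step_forecasts(series, alpha=0.3)

# Pair each forecast with the value that actually arrived next.
errors = [actual - predicted for predicted, actual in zip(forecasts, series[1:])]

# The most recent EWMA value is the forecast for the next, not yet seen, point.
next_value_guess = forecasts[-1]
```

The errors list is what a monitoring system would watch: large differences between what arrived and what was forecast are the candidates for anomalies.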
This method doesn’t always work well. Remember, you made an important assumption for this time series: it’s stationary. What happens when it isn’t?
Stationarity, Trend, and Seasonality
There are many ways to characterize a time series, but we’ll focus on three simple ones that are closely related: stationarity, trend, and seasonality. Stationarity refers to how stable the values of a time series are. For simplicity, let’s just say that we consider a time series to be stationary if it has a constant mean. A stationary time series will not have any kind of increasing or decreasing pattern, and its points will generally hover around the same value, the mean. It’s because of this characteristic that a simple EWMA, which estimates the mean, is so helpful for forecasts.
Trend refers to a long-term movement of a time series in a particular direction. With linear trend, time series points will approximately follow a line. It’s also possible to have higher order trends, such as quadratic trend where points follow a parabola.
Seasonality refers to a periodic pattern. A great example of a seasonal time series is the temperature in a particular location. A time series can have multiple seasons with different periods.
The Keeling Curve, which plots the measured concentration of CO2 in the atmosphere, has a positive trend and seasonality.
You may notice something interesting going on with the smoothed series with the lower weight. It tends to lag behind our original data because more recent values have lower influence. This is especially noticeable with the seasonal time series. This is important! Because you’re using the smoothed values to forecast, any significant deviation in the smoothed values will throw off your prediction. If you notice that your time series is not stationary, you’ll have to find something other than a simple EWMA to do your forecasting.
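The lag is easy to reproduce. The Python sketch below smooths a made-up series that rises by 2 every step and prints how far the last smoothed value trails the last observation. The series and the two weights are assumptions chosen for illustration.

```python
def ewma(values, alpha):
    """Exponentially smoothed series, seeded with the first observation."""
    smoothed, s = [], None
    for x in values:
        s = x if s is None else alpha * x + (1 - alpha) * s
        smoothed.append(s)
    return smoothed


# Hypothetical series with a pure linear trend: it rises by 2 at every step.
trend = [2 * t for t in range(200)]

for alpha in (0.5, 0.1):
    gap = trend[-1] - ewma(trend, alpha)[-1]
    print(f"alpha={alpha}: smoothed value trails the data by {gap:.1f}")
```

For a steady linear trend the gap settles at slope × (1 - α) / α, so here it is about 2 with a weight of 0.5 and about 18 with a weight of 0.1. A forecast taken from the smoothed value is off by that same amount.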
Double and Triple Exponential Smoothing
In the late 1950s, Charles Holt recognized that the simple EWMA model breaks down for time series with trend. He modified the simple exponential smoothing model to account for a linear trend. This is known as Holt’s exponential smoothing, or double exponential smoothing. The model is a little more complicated: it consists of two EWMAs, one for the smoothed values of x_t and another for its slope. These two components are also called the level and the trend.
Notice how the smoothed values are much better at following the original time series with double exponential smoothing. This means you’ll get much better forecasts.
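A minimal Python sketch of double exponential smoothing follows, assuming the common textbook form of Holt’s method: the level is an EWMA of the data corrected by the previous slope, and the slope is an EWMA of the change in level. The initialization (first value as the level, first difference as the slope), the function name, and the weights are assumptions, not details taken from this post.

```python
def holt_smooth(values, alpha, beta):
    """Return (levels, slopes) from Holt's double exponential smoothing."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    # Assumed initialization: first value as level, first difference as slope.
    level = values[0]
    slope = values[1] - values[0]
    levels, slopes = [level], [slope]
    for x in values[1:]:
        prev_level = level
        # Level: blend the new point with where the previous line said we would be.
        level = alpha * x + (1 - alpha) * (prev_level + slope)
        # Slope: blend the latest change in level with the previous slope.
        slope = beta * (level - prev_level) + (1 - beta) * slope
        levels.append(level)
        slopes.append(slope)
    return levels, slopes


# Hypothetical series with a pure linear trend: it rises by 2 at every step.
trend = [2 * t for t in range(50)]
levels, slopes = holt_smooth(trend, alpha=0.5, beta=0.3)
```

On this perfectly linear sample the level stays on the data and the slope stays at 2, so the lag seen with a single EWMA does not appear.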
To forecast with this model, you have to make a slight adjustment. Because there is another term for the slope, you have to account for it in the forecast. Suppose you’re trying to forecast the value m time steps in the future. The formula for the m-step-ahead forecast, F_{t+m}, is F_{t+m} = s_t + m b_t, where s_t is the current level and b_t is the current slope.
Notice how it’s essentially the formula for a line. What if your time series doesn’t have a linear trend, but rather some sort of seasonality? For that, you’ll need yet another EWMA.
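Before adding seasonality, here is a small Python sketch of the linear forecast, assuming the standard form for Holt’s method: the latest level plus m times the latest slope. The function name and the numbers are illustrative.

```python
def holt_forecast(level, slope, m):
    """Forecast m steps ahead from the latest level and slope estimates."""
    if m < 1:
        raise ValueError("m must be a positive number of steps")
    # A straight line: start at the current level and climb m slopes.
    return level + m * slope


# Illustrative values: latest level 100, latest slope 2 per step.
print([holt_forecast(100, 2, m) for m in (1, 2, 3)])  # [102, 104, 106]
```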
Holt’s student, Peter Winters, extended his teacher’s model by introducing an additional term to factor in seasonality. This model, with level, trend, and seasonal components, is known as Holt-Winters. It is also referred to as triple exponential smoothing. The model has an additional parameter, L, the length of the seasonal period, which has to be known in advance.
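The m-step-ahead forecast for this model starts from the same line as before, the level plus m times the slope, and adjusts it with the seasonal term for the matching point in the cycle.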
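One way to sketch this in Python is with the additive form of Holt-Winters, in which the seasonal term is added to the line. That choice is an assumption, since a multiplicative form also exists and the post does not say which one it uses. The initialization from the first two seasons, the function names, and the sample data are likewise illustrative.

```python
def holt_winters_additive(values, period, alpha, beta, gamma):
    """Return (level, slope, seasonal) after smoothing `values` (additive form)."""
    if len(values) < 2 * period:
        raise ValueError("need at least two full seasons of data")
    first = values[:period]
    second = values[period:2 * period]
    # Assumed initialization: level from the first season's mean, slope from
    # the change in seasonal means, seasonal terms from first-season deviations.
    level = sum(first) / period
    slope = (sum(second) - sum(first)) / period ** 2
    seasonal = [x - level for x in first]
    for t in range(period, len(values)):
        x = values[t]
        prev_level = level
        season = seasonal[t % period]  # seasonal term from one period ago
        level = alpha * (x - season) + (1 - alpha) * (prev_level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
        seasonal[t % period] = gamma * (x - level) + (1 - gamma) * season
    return level, slope, seasonal


def holt_winters_forecast(level, slope, seasonal, n_observed, m):
    """Forecast m steps past the last of n_observed points."""
    # Line from the current level and slope, plus the matching seasonal term.
    return level + m * slope + seasonal[(n_observed - 1 + m) % len(seasonal)]


# Hypothetical series: a pattern of period 4 repeated five times, with no trend.
series = [10, 20, 30, 20] * 5
level, slope, seasonal = holt_winters_additive(series, 4, alpha=0.3, beta=0.1, gamma=0.2)
next_season = [
    holt_winters_forecast(level, slope, seasonal, len(series), m) for m in range(1, 5)
]
```

On this perfectly periodic sample the forecasts in next_season reproduce the repeating pattern, up to floating-point rounding. Real data will need more care with initialization and with the choice of the three weights.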
Real-time anomaly detection is really a forecasting problem since you can’t know what to expect in the present unless you use the past to forecast. Forecasting time series data can get really sophisticated and complicated, but a lot of simple and efficient techniques like an EWMA can give most of the benefit with a small fraction of the cost, effort, and complexity. More complex techniques can be good for very specific cases, but come at the cost of losing generality and requiring a lot more tweaking and parameter selection, which can be surprisingly delicate to do well.