Monitoring is Dead, Long Live Performance Management

Posted by Baron Schwartz on Apr 21, 2014 10:16:00 AM


At VividCortex, we are explicitly not a monitoring company. We are addressing the human-to-data gap by building tools that help people manage thousands of servers effectively. This gap is caused by trying to run increasingly large numbers of servers with fixed staff and budget, using relatively immature software that doesn’t have sophisticated tools. It’s a worst-case scenario for the operations staff: more data, more servers, worse tools.

Many people think “I need a monitoring tool,” but that’s not the solution. Monitoring is a square wheel that has been reinvented for decades. Monitoring is “here are thousands of graphs, have fun.” It is “record all your metrics, it doesn’t matter what they mean.” It is thresholds and aliveness checks. It is based on flawed, unexamined assumptions, and it leads directly to familiar problems: false alarms, missed alarms, useless information that isn’t actionable and can’t be turned into knowledge, and alert fatigue. As a result, monitoring tools often make organizational and cultural problems worse instead of better.

We need a paradigm change, and it starts with an axiomatic approach that’s based on logical principles. At VividCortex, our unifying principle is that a system’s purpose is to perform useful work. The consequence of this is the worldview that job #1 is measuring and understanding work (not metrics), and making sure it’s done fast, consistently, and without errors.

When viewed through this framework, the tail stops wagging the dog and systems problems become simpler and faster to troubleshoot. All sorts of hard, vague questions are rephrased in useful ways that are possible to answer sensibly, with greater certainty and speed.

This is Performance Management, not monitoring. It’s proven to work in the application tier by mature, helpful APM tools. Why would you not want to apply similar principles to your network, servers, storage, and so on? It’s a boiled-frog problem: people have been throwing the wrong tools at the systems side of things for so long that they don’t notice. They continue to look and hope for “monitoring” to solve their problems, without realizing that it doesn’t and can’t.

Fixing this is not a matter of building a better square wheel. We don’t need “monitoring” tools that are easier to install or produce nicer charts. We need fresh ideas.

At VividCortex, we are going directly to the root of the systems management problem, not hacking ineffectively at symptoms. And our customers are seeing results that bear out our conviction that monitoring is the wrong tool for the job.

Long live Performance Management!

