Published by Baron Schwartz on Oct 5, 2017 4:02:17 PM

Monitoring and Observability with USE and RED

Modern systems can emit thousands or even millions of metrics, and modern monitoring tools can collect them all. Faced with such an abundance of data, it can be very difficult to know where to start looking when you’re trying to diagnose a problem. The same difficulty applies when you’re not diagnosing anything and simply want to know whether there’s a problem at all. Which KPIs coming from your systems truly matter?


Published by Alex Slotnick on Aug 4, 2017 9:53:09 AM

Throughput Is the One Server Metric to Rule Them All

If you've ever been responsible for maintaining the performance of a busy app or website, the task probably felt complex, with far too many metrics to watch. Since you can measure so much in a modern system, how do you tell what's really important? However complicated your job might feel, your real concerns can be understood through a single, much simpler and more manageable metric: throughput. For service providers, server performance is all about resource management.

Published by Alex Slotnick on Jul 7, 2017 4:14:35 PM

Why Tech Ops Needs to Care About User Experience

If you work on an app’s backend, you’re probably used to looking at server-side metrics like utilization and backlog. Alas, focusing only on these can give you major tunnel vision, causing you to overlook the user’s experience. Why does empathy for the user matter?

Published by Alex Slotnick on Jun 30, 2017 5:20:51 PM

You Need to Consider Two Perspectives to Understand System Performance

There are two fundamental points of view for thinking about system performance. Most people experience a system from only one side, and they can often analyze that perspective extremely well. This makes sense: most people are in a position where only one point of view occurs naturally. But that doesn't mean you shouldn't try to see both. Each perspective is equally valid, and without both, you have no hope of understanding your system's performance as a whole.

