If you've ever been responsible for maintaining the performance of a busy app or website, there's a good chance the task often felt complex, with far too many metrics. Since you can measure so much in a modern system, how do you tell what's really important? In reality, though, as complicated as your job might feel, your real concerns can actually be understood through a much simpler — and more manageable — single metric: throughput. For service providers, server performance is all about resource management.
[This article is Part 3 of an ongoing series about "The Zen of System Performance." Part 1 is about the importance of viewing system performance from two separate but deeply interconnected perspectives. Part 2 is about how tech ops benefits from paying attention to users' perspectives. View all parts here.]
Keeping Track of the Big Picture
Service owners need to worry about their system's big picture, all the time. The big picture isn't just one thing, however — inside that big picture, there are hundreds of tiny components.
The foundation of server performance is the constant regulation of all the people and resources that keep the system humming along smoothly. Traditionally, this means not worrying about any single person's (user's) experience when they encounter the system. Service as many requests as possible! Unlike users, whose concerns can be easily summarized in one or two questions (albeit major ones), service owners have a long list of check boxes they need to track. These must all be balanced and juggled, in the name of allocating resources responsibly and effectively. Examples might include:
- Is the population running efficiently as a whole?
- Are any outlier users having bad experiences? (Note, however, that this doesn't necessarily mean "What does it feel like for the individual having a bad experience?" — and there's the rub.)
- Are resources being used efficiently?
- How does everything interact?
This last question introduces a whirlwind of additional, complex inquiries, which need to be weighed and solved in order to answer all the other concerns above.
Service owners pay attention to aspects of server performance unknown to anybody on the outside of the system. The most sophisticated teams use specialized instrumentation to monitor highly detailed metrics, which are invisible to all but the most well-equipped administrators. And unlike users, who often only know about themselves, service owners know about all users and have to take into account how the complex web of user requests shapes itself and ties itself in knots.
Balancing Many Concerns As One
Ultimately, all of these questions add up to one major concern: how well the system is using its resources. Despite the many moving parts, we can actually define the total of a service owner's concern as a single, powerful metric: system throughput. This is a measurement of how efficiently the system uses its resources, in units of requests/second. No matter how many different variables contribute to the final equation of requests/second, everything the system does can be contained within this expression. Throughput condenses all the questions that service owners face and resolves them into a single readout of "work the system accomplishes."
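To make the units concrete, here is a minimal sketch of how throughput might be computed from a list of request completion times. The function name, the sample timestamps, and the choice of a fixed measurement window are all assumptions made for illustration; real monitoring tools collect and aggregate this data in their own ways.

```python
def throughput(completion_times, window_start, window_end):
    """Return requests completed per second within [window_start, window_end).

    completion_times: timestamps (in seconds) at which requests finished.
    """
    if window_end <= window_start:
        raise ValueError("window_end must be after window_start")
    # Count only the requests that finished inside the measurement window.
    completed = sum(1 for t in completion_times if window_start <= t < window_end)
    return completed / (window_end - window_start)


# Hypothetical completion timestamps, in seconds since the start of measurement.
completions = [0.2, 0.9, 1.4, 1.5, 2.1, 2.8, 3.3, 3.9]

rate = throughput(completions, 0.0, 4.0)
print(f"{rate:.1f} requests/second")
```

However many components contributed to finishing those eight requests, the readout is one number: the work accomplished per second.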
Balancing Resources and Requests
As we wrote in a previous article, there's another perspective you also need to consider when thinking about system performance: the users' perspective. Unlike service owners, users don't have a big picture view of a system; they only see their own requests. However, users are similar to service owners in that their concern can be measured as a single metric: latency, measured in seconds/request.
It's no coincidence that throughput and latency use the same units, just inverted: requests/second versus seconds/request. Even though users and service owners experience the system from entirely different points of view, their perspectives are inextricably interlinked, just not in ways that are always obvious. In our next article, we'll discuss how mastering both a user's and a service owner's perspective can give you a complete (and practical!) definition of system performance, in which both throughput and latency play a role, and which results in excellent performance for everybody who touches a system, not just people on one side or the other.
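As a rough illustration of how the two metrics relate, the sketch below assumes an idealized system in steady state whose workers are always busy. Under that assumption, a server handling one request at a time has a throughput equal to the reciprocal of its mean latency, and a server handling several requests concurrently has a throughput of roughly concurrency divided by mean latency (the relationship known as Little's Law). The function names and sample latencies are hypothetical, and real systems rarely behave this cleanly.

```python
from statistics import mean


def serial_throughput(latencies):
    """Requests/second for a server that handles one request at a time."""
    return 1.0 / mean(latencies)


def concurrent_throughput(latencies, concurrency):
    """Requests/second when `concurrency` requests are always in flight.

    Assumes steady state with fully busy workers (Little's Law):
    concurrency = throughput * mean latency.
    """
    return concurrency / mean(latencies)


# Hypothetical per-request latencies, in seconds/request.
latencies = [0.25, 0.5, 0.75]

print(mean(latencies))                      # mean latency in seconds/request
print(serial_throughput(latencies))         # requests/second, one at a time
print(concurrent_throughput(latencies, 4))  # requests/second, four in flight
```

Each user in this example still waits the same average time per request, yet the system's throughput changes with how many requests it handles at once. That is one reason the two perspectives are linked without being simple mirror images of each other.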
You can also continue learning about designing systems while thinking about performance, with a free recording of Preetam Jinka's webinar, "How to Be a Performance-Driven Engineer."