Published by Alex Slotnick on Apr 12, 2017 4:17:59 PM

How to Replace a Wing in Midair, and Other Development Safety Tips

In this episode of Brainwaves, Baron Schwartz and Jay Ennis -- VividCortex's CEO and VP of Product Development, respectively -- join Alex to discuss some of the concepts that help engineering teams build and operate safely. Jay and Baron talk about what they aim to protect and avoid when setting up safety protocols for a team, and they discuss why some commonly held beliefs, like root cause analysis, are traps. Ideas from other industries -- such as aviation, EMT emergency reporting, and horse rearing -- lend helpful perspectives and models.

Published by Baron Schwartz on Jan 23, 2015 11:28:00 AM

Distributed and Diverse: The New Reality of Modern Data Persistence

We create applications in an age of simple, powerful, flexible databases that do magic for us. There’s a large variety of modern databases that supply just what’s needed for lots of use cases, so we can pick the right tool for the job. We’ve never had it better, right? So why is “it’s the database again” still a sufficient explanation for a lot of outages and performance problems?

Published by Baron Schwartz on Dec 4, 2014 10:01:00 AM

Why VividCortex is Cheaper Than DIY Open-Source

Talk to someone who runs monitoring systems at a company with more than a few servers and you’ll quickly find out that scaling their monitoring systems is far from a minor concern. It’s often a serious problem. The three primary costs of self-hosting an open-source monitoring system, according to our survey of hundreds of Operations staff at large companies, are:

Published by Baron Schwartz on Nov 17, 2014 7:19:00 AM

View Per-Process I/O and More With VividCortex

What can explain mysterious slowdowns in database performance? Sometimes it’s not the database or the queries it’s running, but the non-database activity on the server, and the other processes running there are among the most important things to analyze. That’s why VividCortex tracks process activity, including per-process CPU, memory, and I/O, in 1-second detail. It’s like a historical view of top, and you can view it across your entire environment in a single screen, then drill down to individual servers and programs. If you see interesting regions, you can click and drag with the mouse to zoom in and correlate. You can rank and sort by any of the per-process metrics we capture, and you can mouse over the sparklines to see the instantaneous rate, second by second.
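
To make "per-process I/O" concrete, here is a minimal sketch of how such counters can be sampled on Linux. It assumes the standard /proc/<pid>/io file format and is not a description of how VividCortex collects its data. The function names parse_proc_io, read_proc_io, and io_rates are hypothetical, chosen only for this illustration.

```python
def parse_proc_io(text):
    """Parse the 'name: value' lines of a /proc/<pid>/io file into a dict."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid):
    """Linux only: read the cumulative I/O counters for one process."""
    with open(f"/proc/{pid}/io") as f:
        return parse_proc_io(f.read())


def io_rates(before, after, interval_seconds):
    """Turn two cumulative samples into per-second rates.

    Only counters present in both samples are reported.
    """
    return {
        name: (after[name] - before[name]) / interval_seconds
        for name in ("read_bytes", "write_bytes")
        if name in before and name in after
    }
```

Calling read_proc_io for the same PID once per second and passing consecutive samples to io_rates gives a second-by-second read and write rate for that process. Reading another user's /proc/<pid>/io typically requires elevated privileges.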

Published by Fernando Papa on Sep 22, 2014 6:59:00 AM

Using Netlink to Optimize Socket Statistics

This is a story of using low-level kernel interfaces to optimize an edge case that one of our agents encountered on some servers. The TL;DR version is that accessing /proc/ can be very expensive if there are a lot of network connections, and the Netlink interface between userspace and kernel space is a much more efficient method.
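
As a rough illustration of the Netlink approach, the sketch below builds a sock_diag dump request and counts the IPv4 TCP sockets the kernel reports. It is an assumption-laden example in Python, not the agent's actual code: the function names are hypothetical, the numeric constants are copied from the Linux headers (linux/netlink.h, linux/sock_diag.h), and the counting function runs only on Linux.

```python
import socket
import struct

# Values from the Linux kernel headers.
NETLINK_SOCK_DIAG = 4
SOCK_DIAG_BY_FAMILY = 20
NLM_F_REQUEST = 0x01
NLM_F_DUMP = 0x300
NLMSG_ERROR = 2
NLMSG_DONE = 3
NLMSG_HDRLEN = 16
ALL_STATES = 0xFFFFFFFF  # bitmask: match sockets in every TCP state


def build_inet_diag_request(family=socket.AF_INET,
                            protocol=socket.IPPROTO_TCP, seq=1):
    """Build nlmsghdr + inet_diag_req_v2 asking for a dump of all sockets."""
    sockid = b"\x00" * 48  # zeroed inet_diag_sockid: no specific socket
    # family, protocol, idiag_ext, pad byte, idiag_states
    body = struct.pack("=BBBxI", family, protocol, 0, ALL_STATES) + sockid
    header = struct.pack("=IHHII", NLMSG_HDRLEN + len(body),
                         SOCK_DIAG_BY_FAMILY,
                         NLM_F_REQUEST | NLM_F_DUMP, seq, 0)
    return header + body


def count_tcp_sockets():
    """Linux only: count IPv4 TCP sockets via Netlink instead of /proc."""
    count = 0
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       NETLINK_SOCK_DIAG) as s:
        s.bind((0, 0))
        s.sendto(build_inet_diag_request(), (0, 0))  # (pid 0) is the kernel
        while True:
            data = s.recv(65536)
            offset = 0
            while offset + NLMSG_HDRLEN <= len(data):
                length, msg_type = struct.unpack_from("=IH", data, offset)
                if msg_type == NLMSG_DONE:
                    return count
                if msg_type == NLMSG_ERROR or length < NLMSG_HDRLEN:
                    raise OSError("unexpected netlink reply")
                count += 1  # one message per socket
                offset += (length + 3) & ~3  # messages are 4-byte aligned
```

The kernel answers with compact binary records, one per socket, so the caller avoids formatting and re-parsing one text line per connection.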

Published by Baron Schwartz on Apr 21, 2014 10:16:00 AM

Monitoring is Dead, Long Live Performance Management

At VividCortex, we are explicitly not a monitoring company. We are addressing the human-to-data gap by building tools that help people manage thousands of servers effectively. This gap is caused by trying to run increasingly large numbers of servers with fixed staff and budget, using relatively immature software that doesn’t have sophisticated tools. It’s a worst-case scenario for the operations staff: more data, more servers, worse tools.

