The Brainiac Corner is a format where we talk with some of the smartest minds in the system, database, devops, and IT world. If you have opinions on pirates, or anything else related, please don’t hesitate to contact us.
How did you get from stork to brainiac (i.e. what do you do today and how did you get there)?
I didn’t /really/ get into computing until I was well into my 20s. Ever since I was young I had wanted to learn to program fluently. Despite owning a TI-99/4A and Commodore 64 at an early age, I struggled to master even BASIC programming. I lost interest in personal computing until the ’90s when PCs became affordable again. With the purchase of a Quantex P2-233 the desire to program took hold again. However, I quickly realized that developing for Windows was a non-starter due to the cost of commercial compilers. In the course of researching my options I stumbled across the world of Free and Open Source Software and inevitably, Linux and BSD.
Once I started playing with Linux it seemed like my world was a never-ending series of projects focused on running some new service on my Linux workstation. I still remember the first time I ran into the bedroom to tell my wife that I’d gotten a webserver running on the internet. It was only a matter of time until that led to my first database, mailserver and firewall. Before long I was relatively fluent in Red Hat Linux and meeting others at the nearest LUG. It was there that I met a good friend who helped me get my foot in the door with a well-known beltway startup called Digex.
Jump forward ten years and I’d become a professional Linux and OpenBSD administrator and OSS hobbyist. I left the world of government contracting to work for OmniTI as a Site Reliability Engineer, and eventually, the Product Manager for an internal startup named Circonus. It was there that I first learned of the Graphite project from Orbitz, although I wouldn’t do much with it until later.
Since then I’ve worked on the Ops team at Heroku, and now as a “Graphite Badass” at GitHub. Somehow I’ve developed a reputation as a Graphite expert, although I’ll be the first to point you to the real geniuses working on Graphite core such as Michael Leinartas, Aman Gupta, Sidnei da Silva, and of course, Chris Davis (Graphite’s creator). I like to work on improving the tools that interoperate with Graphite and make it more fun to work with, but I leave most of the heavy lifting to the other guys. I’m very fortunate to get to work with technologies that I’m passionate about (monitoring, trending, metrics, etc.) and I like to share my work with others.
What is in your group’s technology stack?
Wow, where to begin. Most folks know that GitHub is big on Rails and MySQL, no surprises there. Many of our internal tools use the same formula, although a lot of us prefer Sinatra for building them. And a select few of us still prefer PostgreSQL.
Beyond that we’re a mostly pragmatic bunch, although it seems that more often than not we end up building from within. This gives us a toolchain that’s highly optimized for our own use, with less friction than if we’d adopted something built with a more generic purpose in mind.
Other things off the top of my head: Puppet, Resque/Redis, Memcache, Nginx, collectd, Postfix, Elasticsearch and Nagios.
Oh, and I almost overlooked Campfire. This is probably the single most important tool in our belt. As a remote-friendly company, virtually all of our communication and troubleshooting happens within Campfire sessions. We operate asynchronously, and we’ve developed tools (like Hubot) that allow us to do the lion’s share of our work within chat. Besides the obvious collaborative benefits, this gives us the added benefit of automatic archival of all conversations and events logged to any room. As you might imagine this is an awesome resource during outages and when handling postmortems.
Who would win in a fight between ninjas and pirates? Why?
Pirates, because they have Archer on their side. Long live the Pirate King.
Which is a more accurate state of the world, #monitoringsucks or #monitoringlove?
I think that for many businesses, #monitoringsucks is just the way things are because they view monitoring as a liability rather than an asset. Frequently we want to get by with the least amount of work necessary to make sure our IT or operations are running, e.g. can we ping the host, does the webserver respond to connections, etc. But these are the bare minimum and only address the resource’s ability to respond at the network layer. We must drive further to determine the actual “quality of service” being provided by our operational resources. And in many cases, this requires real thought and planning beyond just signing up for a one-stop monitoring service like Pingdom.
Once you’ve accepted this, there’s a wealth of good tools available these days that fall into the #monitoringlove mindset. More and more engineers understand the benefits of composable monitoring systems and are working towards common data standards and interoperability. This flexibility offers a huge competitive advantage over legacy (read: monolithic) monitoring software.
In six words or less, what are the biggest challenges your organization faces?
Slow puppet builds.
What’s the best piece of advice you’ve ever received?
“The customer is always right.”
In operations, we know this isn’t always true. In fact, it’s frequently NOT true. Regardless, this principle guides a lot of what I do in terms of software design and development. I always put myself in the user’s seat and try to use the product as they would, asking the questions that they would ask. This helps me to remove a lot (but not all) of the friction and workflow impedance that they would’ve experienced otherwise. I use other “competing” products and take note of the areas where I feel pain and see if my proposed remedies could be applied elsewhere in my own tools.
I think a lot of DevOps and OSS developers take this for granted, particularly since we don’t frequently have access to great designers and UX experts. While I won’t kid myself into believing my software is better-engineered than the next person’s, I sleep well knowing that I’ve agonized over the details that affect the user experience most visibly.
What principles guide your expertise in your given domain?
Love your metrics. Measure everything. Store it for as long as possible, as close to its original granularity as possible. Study your data and alert accordingly.
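As a rough illustration of "measure everything", here is a minimal sketch of pushing a single data point to Graphite over its plaintext protocol, where each line has the form "path value timestamp" and carbon listens on TCP port 2003 by default. The hostname graphite.example.com, the metric name, and the helper names format_metric and send_metric are assumptions made up for this example, not anything described in the interview.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    # One line of Graphite's plaintext protocol: "<path> <value> <timestamp>\n".
    # The timestamp is Unix epoch seconds; default to "now" if none is given.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="graphite.example.com", port=2003, timestamp=None):
    # Hypothetical host; 2003 is carbon's default plaintext listener port.
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Print the wire format instead of sending, so the sketch runs anywhere.
    print(format_metric("web.requests.count", 42), end="")
```

Calling send_metric("web.requests.count", 42) against a reachable carbon instance would record the point. How long it is kept, and at what granularity, is decided by the retention settings on the Graphite side rather than by the sender.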
What is your vision for system administration in 5 years?
I’d like to see automation, configuration management, metrics collection and monitoring in such a state that we can take it for granted and focus more on fun things (to me) like analytics, forecasting algorithms, and improved visualizations.