Brainiac Corner with Brendan Gregg, Lead Performance Engineer at Joyent

Posted by VividCortex on Jul 9, 2013 6:03:00 AM


At the Brainiac Corner, we meet with some of the sharpest minds in the system, database, devops, and IT world. If you’d like to share your thoughts on pirates, ninjas, the future of system administration, or any other relevant topic, please don’t hesitate to contact us.

We had the pleasure of interviewing Brendan Gregg, Lead Performance Engineer at Joyent, and co-author of the DTrace book and the DTraceToolkit.

How did you get from stork to brainiac (i.e. what do you do today and how did you get there)?

It’s hard to feel I’ve become a brainiac, when there’s so much I know I don’t know. I do performance analysis and engineering at any level of the software stack, especially between the syscall interface and metal, in weird parts of the kernel. In my youth I programmed on the C64 and earlier micros, then PCs, and then went on to do Computer Engineering at the University of Newcastle, Australia. Since then I’ve had a variety of roles, including kernel engineering at Sun Microsystems and performance engineering. I learned a lot when leading performance efforts for the first ZFS storage appliance, both from the product and the engineers I worked with, which involved some difficult performance engineering. What has probably helped the most is being willing to take on hard work – and in some cases, I wasn’t supposed to do that work, but couldn’t bear to see others do it badly. I have studied a lot too, after leaving University, including reading books, whitepapers, blogs, and developing my own projects.

What is in your group’s technology stack?

Key parts, and the parts we actively develop, include node.js, ZFS, DTrace, Zones, KVM, SmartOS, illumos, and pkgsrc. We also use many other technologies, including Linux, MySQL/Percona, PostgreSQL, RabbitMQ, and a lot more.

Who would win in a fight between ninjas and pirates? Why?

I think I’d want the ninjas to win, to admire their training and skill, but would be sad that the pirates would actually win. Pirates may be more familiar with fighting as a team, and experienced at fighting in general. I’ve done fencing as a sport, and the first thing I had to learn was what I’d call “combat brain” – being able to think quickly during hand-to-hand combat, and plan your attack. This may be a disadvantage, too: you learn to expect your opponent to behave in a certain way, but if they are fighting using a totally different style, you may be quite confused.

Which is a more accurate state of the world, #monitoringsucks or #monitoringlove?

#monitoringsucks, for many reasons. The most telltale sign is that so many monitoring tools are still line-graph based, and can’t show full distributions. Heat maps, which can show the full distribution over time, have solved a ton of problems. Many monitoring tools also regurgitate existing metrics, even when they are wrong or confusing or leave observability gaps. We’re drowning in badly-designed metrics. What we want are tools that answer the questions we have, and use the best visualizations to do so: line graphs and histograms where appropriate, heat maps, and more.
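As an illustration of the point about distributions (an editorial sketch, not code from the interview), the Python below compares a per-second average, which is roughly what a line graph plots, with heat-map style bucket counts for the same samples. The sample data, the function names `line_graph` and `heat_map`, and the bucket sizes are all hypothetical and chosen only to make the effect visible.

```python
from collections import Counter
from statistics import mean


def line_graph(samples, time_step):
    """Mean latency per time bucket: roughly what a line graph would plot."""
    by_time = {}
    for timestamp, latency in samples:
        by_time.setdefault(int(timestamp // time_step), []).append(latency)
    return {t: mean(values) for t, values in sorted(by_time.items())}


def heat_map(samples, time_step, latency_step):
    """Count of samples per (time bucket, latency bucket) cell."""
    cells = Counter()
    for timestamp, latency in samples:
        cell = (int(timestamp // time_step), int(latency // latency_step))
        cells[cell] += 1
    return cells


# Hypothetical samples: (time in seconds, latency in ms).
# Each second mixes fast operations (~1 ms) with slow ones (~9 ms).
samples = [
    (0.1, 1.0), (0.4, 9.0), (0.7, 1.2), (0.9, 9.1),
    (1.2, 1.1), (1.5, 9.2), (1.6, 0.9), (1.8, 8.8),
]

averages = line_graph(samples, time_step=1.0)
cells = heat_map(samples, time_step=1.0, latency_step=2.0)

print("mean latency per second:", averages)
for (second, bucket), count in sorted(cells.items()):
    low = bucket * 2.0
    print(f"second {second}, {low:.0f}-{low + 2.0:.0f} ms: {count} samples")
```

With this made-up data the average sits near 5 ms in both seconds, yet no individual sample falls in the 4-6 ms bucket: the bucket counts show two separate clusters that the average hides. A real heat map would render the counts as color intensity rather than printing them.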

In six words or less, what are the biggest challenges your organization faces?

Technology education, internal and external.

What’s the best piece of advice you’ve ever received?

“You should understand this”. A performance expert was helping me get started in the industry, and was referring to Solaris Internals 1st Edition. So I read it cover to cover, never turning a page before understanding everything on it. It took me over a year to finish it, and my copy would fall open to the pages on Solaris 7 turnstiles – which took many passes to understand. This wasn’t just about the technical value the book gave me, but also the confidence that I could cope at that level.

What principles guide your expertise in your given domain?

  1. Rigor.
  2. Nothing is too hard. But it may be too costly in terms of time.
  3. Follow the scientific method, and be aware of assumptions.
  4. Don’t trust anything, especially benchmarks. Even OS metrics can be wrong.
  5. There are known knowns, known unknowns, and unknown unknowns.

What is your vision for system administration in 5 years?

More cloud, and cloud-related responsibilities, including managing highly-available environments and hybrid clouds.
