Writing resilient programs with Go and robustly.Run()

Posted by Baron Schwartz on Jul 30, 2013 10:14:00 AM

Google’s Go language makes it easy to do things that are hard in many other languages. Making highly resilient programs is one of them, but it doesn’t seem to be discussed a lot. Crashes will happen eventually in every program, and it’s good to know how the program will behave even when you have no idea how or why it will crash.

[Image: resilient_programs]

The short version is that although Go uses error values for ordinary error handling, it also has something analogous to exceptions: panic(). But unlike languages that force you to either declare or handle every exception, Go lets you decide whether to handle a panic at all. A panic can be raised explicitly or by the runtime, for example when code indexes past the end of an array. If nothing recovers it, it crashes your program.

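This is eminently sensible once you’ve learned the Go-ish way of doing things. Go also has an eminently sensible way to handle panics: a deferred function that calls recover(). (Writing a bare defer recover() does not work, because recover() only stops a panic when it is called from inside a deferred function.) And there’s a great example of how to use it in the Go standard library, in the built-in HTTP server. Each incoming connection is served in its own goroutine, and if a handler panics, the panic is recovered so it won’t crash the whole HTTP server.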
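The recovery pattern described above can be sketched in a few lines. This is a minimal illustration, not code from the standard library or from robustly: the helper name safeCall is made up for this example. It runs a function, and if that function panics, it turns the panic into an ordinary error value so the caller keeps running.

```go
package main

import "fmt"

// safeCall is a hypothetical helper (not part of any library).
// It runs fn and converts a panic into an ordinary error.
func safeCall(fn func()) (err error) {
	defer func() {
		// recover stops a panic only when called directly
		// inside a deferred function like this one.
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered from panic: %v", r)
		}
	}()
	fn()
	return nil
}

func main() {
	items := []int{1, 2, 3}
	idx := 5 // deliberately out of range

	err := safeCall(func() {
		fmt.Println(items[idx]) // the runtime panics here
	})

	// Execution continues instead of crashing the program.
	fmt.Println("still running, err =", err)
}
```

The named return value err is what lets the deferred function report the panic to the caller after fn has already unwound.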

Which brings us to robustly, a little library I wrote a while ago, to help run programs in the same fashion without having to write the code yourself. We’re open-sourcing that library today.

It’s actually a slightly controversial topic. The gist of it is this: it’s easy to over-apply the technique and end up masking errors that you should catch. But as the robustly documentation says, in specific cases, it can be useful. We use it in our agents, which are running in what I like to call a hostile environment: on servers we can’t access or control, without the ability to easily inspect or troubleshoot. These agents need to be as close to bulletproof as possible, if they’re going to be even slightly useful. So we judiciously sprinkle a little robustly.Run() around the key bits of data processing code.
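To make the idea concrete, here is a rough sketch of what a run-and-restart wrapper could look like. It is not robustly's actual API or implementation: the names runWithRestart and runOnce, the maxPanics parameter, and the policy of giving up after a fixed number of panics are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"log"
)

// runOnce is a hypothetical helper. It calls fn and reports
// whether fn panicked, logging the panic value if it did.
func runOnce(fn func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered: %v", r)
			panicked = true
		}
	}()
	fn()
	return false
}

// runWithRestart is a hypothetical wrapper. It restarts fn each
// time it panics, until fn returns normally or maxPanics panics
// have been recovered. It returns the number of panics recovered.
func runWithRestart(fn func(), maxPanics int) int {
	panics := 0
	for {
		if !runOnce(fn) {
			return panics // fn finished cleanly
		}
		panics++
		if panics >= maxPanics {
			return panics // give up rather than loop forever
		}
	}
}

func main() {
	attempts := 0
	n := runWithRestart(func() {
		attempts++
		if attempts < 3 {
			panic("transient failure") // fails on the first two attempts
		}
	}, 5)
	fmt.Printf("finished after recovering %d panics\n", n)
}
```

Capping the number of restarts is one way to limit the risk mentioned above: a function that panics every time is a bug to fix, not something to retry indefinitely.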

We also sprinkle random crashes into the code with robustly.Crash(), which is included in the library. The concept here is something like Netflix’s Chaos Monkey: you think your code is robust, eh? Well, there’s only one way to prove it!
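A crash injector in this spirit might look like the sketch below. It does not reproduce robustly.Crash(); the function maybeCrash, its probability parameter, and the process function are hypothetical names chosen for this example. The point is only to show a deliberate, random panic being absorbed by the surrounding recovery code.

```go
package main

import (
	"fmt"
	"math/rand"
)

// maybeCrash is a hypothetical crash injector. It panics with
// the given probability (0 means never, 1 means always).
func maybeCrash(probability float64) {
	if rand.Float64() < probability {
		panic("injected crash")
	}
}

// process is a hypothetical unit of work that may hit an
// injected crash and reports it as an error instead of dying.
func process(record int) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("record %d: %v", record, r)
		}
	}()
	maybeCrash(0.3) // assumed failure rate, for illustration only
	return nil
}

func main() {
	failed := 0
	for i := 0; i < 10; i++ {
		if err := process(i); err != nil {
			failed++
		}
	}
	// The loop always completes, however many crashes were injected.
	fmt.Printf("processed 10 records, %d hit an injected crash\n", failed)
}
```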

As always, we hope this code is useful and we welcome your contributions.

The image above is from NASA. It’s a concept of what’ll happen someday: Andromeda will crash into the Milky Way. All things crash, eventually.
