On Hiring Sysadmins in SRE

Once upon a time, I participated in a committee responsible for reviewing potential hires into Site Reliability Engineering. The candidates typically fell into two categories:

  • software engineers and people with Computer Science degrees who fell into operations and liked it (me!)

  • system administrators, hardware technicians, and network engineers who worked with large software deployments and liked scripting away their problems

Many tech organizations build their hiring process to focus on coding, algorithm puzzles, and other things you (might?) learn when you get a Computer Science degree. The folks who come from a system administration background often do not do especially well in these interviews. But they make spectacular SREs.

And so my hiring committee asked the question: “What should we expect from a coding interview when the candidate has been working as a system administrator?”

My immediate (somewhat snarky) answer was that they should be able to tell a computer what to do. This stuck, and is still referred to as “The Geisser coding standard” in those hiring discussions.

A friend has asked me to expand on this idea, so here goes. Most coding in SRE is workflow management, error handling, and processing monitoring data. You don’t need to write a correct shortest-path search algorithm or optimize a game-of-life simulator (does anyone do this in the real world?).
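To give a rough sense of what that kind of coding can look like, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the "host status" line format, the function names `error_rates` and `hosts_to_page`, and the threshold are hypothetical and not taken from any real monitoring system. It reads monitoring samples, skips malformed lines instead of crashing, and reports which hosts are failing too often.

```python
from collections import defaultdict


def error_rates(lines):
    """Return {host: fraction of failed requests} from 'host status' lines.

    The 'host status' format is a made-up example, not a real log format.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for line in lines:
        try:
            # Both a wrong number of fields and a non-numeric status
            # raise ValueError, so one handler covers malformed input.
            host, status_text = line.split()
            status = int(status_text)
        except ValueError:
            continue  # skip bad samples rather than abort the whole run
        totals[host] += 1
        if status >= 500:  # count server-side errors only
            errors[host] += 1
    return {host: errors[host] / totals[host] for host in totals}


def hosts_to_page(lines, threshold=0.5):
    """Return the sorted hosts whose error rate is strictly above threshold."""
    rates = error_rates(lines)
    return sorted(host for host, rate in rates.items() if rate > threshold)


# Hypothetical sample data, including one malformed line.
sample = [
    "web1 200",
    "web1 500",
    "web2 503",
    "web2 502",
    "garbage line here",
    "web3 200",
]
print(hosts_to_page(sample))  # prints ['web2']
```

There is no clever algorithm here: the work is reading input, surviving bad data, and producing a correct answer, which is the level of problem this post argues for.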

I expect people to have a programming language they are comfortable with. They should be able to use that language idiomatically to make a computer solve a well-defined problem with no tricks. This Advent of Code puzzle is a great example. If you can turn the problem description into correct code, you’re done (and it still took me several hours).

So if you insist on doing coding interviews or take-home projects, use something very straightforward. For everyone. But especially for your SRE and sysadmin candidates.

Carla G