Carla Geisser - Software Exorcist


Class notes: Jepsen Distributed Systems

I’ve been building and operating distributed systems for my entire career. But my formal training in the topic consists of one (1) class in university. So I jumped on the chance to attend Kyle’s distributed systems fundamentals class.

If you build things with more than one computer, I highly recommend getting your corporate overlord to pay Jepsen for this training. The curriculum is a good mix of fundamentals, subtle theory, and practical recommendations, with a bit of operations group therapy thrown in.

In no particular order, here are some tidbits I picked up.

Append-only logs and mergeable data structures are magic

I’ve seen and built many systems like this, and this class gave me the theory to talk about why they work. Whenever possible, start with one of these building blocks:

  • an append-only monotonic log. Need to delete something? Log a “deleted” record. I learned the theory behind this has a name: the CALM conjecture (Consistency As Logical Monotonicity).

  • data types that can be merged idempotently and in any order (CRDTs)

If you have one (or both) of these things, you can build a scalable eventually consistent system with minimal coordination. Neat!
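Here is a minimal sketch of both building blocks in one toy type, not anything shown in the class. It is a two-phase set: deletes are recorded as tombstones rather than erased, and replicas merge by set union. The class name `TwoPhaseSet` and the replica variables are made up for illustration, and a real implementation would also need to decide whether removed items can ever be re-added (here they cannot).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoPhaseSet:
    """Toy CRDT: items can be added and removed, but never re-added."""
    added: frozenset = frozenset()
    removed: frozenset = frozenset()  # tombstones; this set only ever grows

    def add(self, item):
        return TwoPhaseSet(self.added | {item}, self.removed)

    def remove(self, item):
        # Deleting logs a "deleted" record instead of erasing anything.
        return TwoPhaseSet(self.added, self.removed | {item})

    def merge(self, other):
        # Set union is commutative, associative, and idempotent, so replicas
        # can exchange state in any order, any number of times.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        return self.added - self.removed


# Two replicas accept writes independently, with no coordination.
replica_a = TwoPhaseSet().add("x").add("y")
replica_b = TwoPhaseSet().add("y").add("z").remove("y")

merged_ab = replica_a.merge(replica_b)
merged_ba = replica_b.merge(replica_a)
```

Both merge orders produce the same state, and merging a state with itself changes nothing, which is what lets replicas converge without talking to each other first.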

Humans want Causal consistency

As a user, I think the system is broken if I get a “new message” notification and click through to find “no new messages.” This happens all the time. In theory words, this is breaking causal consistency.

Unfortunately, very few distributed systems even try to give us this property.
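One way to picture what the property asks for is a replica that refuses to show an update until everything it depends on is visible. This is a toy sketch under that assumption, not how any particular system does it. The `Replica` class, the update ids, and the explicit `deps` list are all hypothetical; real systems usually track dependencies with something like vector clocks.

```python
class Replica:
    """Toy replica that applies updates only after their dependencies."""

    def __init__(self):
        self.applied = {}   # update id -> payload, in the order applied
        self.pending = []   # updates still waiting on a dependency

    def receive(self, update_id, payload, deps=()):
        self.pending.append((update_id, payload, frozenset(deps)))
        self._drain()

    def _drain(self):
        # Keep sweeping the buffer until no pending update can be applied.
        progressed = True
        while progressed:
            progressed = False
            for update in list(self.pending):
                update_id, payload, deps = update
                if deps.issubset(self.applied):
                    self.applied[update_id] = payload
                    self.pending.remove(update)
                    progressed = True


inbox = Replica()

# The notification arrives first, even though the message caused it.
inbox.receive("notify-1", "you have a new message", deps=["msg-1"])
visible_before = list(inbox.applied)

inbox.receive("msg-1", "hello")
visible_after = list(inbox.applied)
```

The notification stays hidden until the message it refers to has arrived, so the user never clicks through to an empty inbox.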

Consensus: don’t.

Getting your whole system to agree on a particular state is slow (two message round trips at best). Push this cost as far away from the critical path as you can. The class had numerous examples of using consensus to elect leaders, assign timestamp epochs, and manage cross-shard transactions. Paying for agreement only at those points lets other components act independently as much as possible.
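As a rough illustration of keeping consensus off the critical path, the sketch below pays for agreement once per leader and then hands out timestamps locally. Everything here is an assumption for the sake of the example: `ConsensusService` is an in-memory stand-in for a real consensus system, and `Leader` and `next_timestamp` are made-up names.

```python
class ConsensusService:
    """Stand-in for a real consensus system; assume every call is slow."""

    def __init__(self):
        self.epoch = 0
        self.calls = 0

    def next_epoch(self):
        self.calls += 1
        self.epoch += 1
        return self.epoch


class Leader:
    def __init__(self, consensus):
        # The only coordinated step: agree on a fresh epoch at election time.
        self.epoch = consensus.next_epoch()
        self.counter = 0

    def next_timestamp(self):
        # Purely local: no messages to any other node.
        self.counter += 1
        return (self.epoch, self.counter)


consensus = ConsensusService()

old_leader = Leader(consensus)
old_stamps = [old_leader.next_timestamp() for _ in range(1000)]

# After a failover, the new leader gets a higher epoch.
new_leader = Leader(consensus)
new_stamp = new_leader.next_timestamp()
```

A thousand timestamps cost one consensus call, and because tuples compare by epoch first, everything the new leader issues sorts after everything the old one issued.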

Machine timestamps: seriously, don’t.

We think we can sneak around the consensus problem by using local system timestamps. This is fine if you have an extremely accurate global clock (you don’t) and no invisible delays in your system (you don’t).
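To make the failure concrete, here is a small simulation of a last-write-wins register fed by two nodes, one of whose clocks runs five seconds fast. The register class, the node roles, and the specific times are invented for illustration.

```python
class LastWriteWinsRegister:
    """Keeps whichever write carries the highest timestamp."""

    def __init__(self):
        self.value = None
        self.timestamp = float("-inf")

    def write(self, value, timestamp):
        if timestamp > self.timestamp:
            self.value, self.timestamp = value, timestamp
            return True
        return False  # silently discarded


CLOCK_SKEW_A = 5.0  # node A's clock runs 5 seconds fast (hypothetical)

register = LastWriteWinsRegister()

# Real time 100.0: node A writes, stamping with its skewed local clock.
first_accepted = register.write("old", 100.0 + CLOCK_SKEW_A)

# Real time 102.0: node B, with an accurate clock, writes a newer value.
second_accepted = register.write("new", 102.0)
```

The write that happened later in real time loses, and nothing reports an error. The register ends up holding "old".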

The end

That was 4 intense afternoons, but I have new vocabulary to wrap around the concepts floating in my head. The best part by far was trading stories with the other class participants. We’d all seen computers do terrible things.

A few weeks later, Jepsen’s analysis of MySQL inspired this song.