Software releases should be boring
The release went great. Dan was up until 3am playing the system like a video game and making a last minute database patch. It came back up fine.
That is a real statement from a project lead about a system I worked on early in my career. Even then, "great" didn’t sound like the correct word for how things went. 3am is way past my bedtime, and there was nothing special or high-risk about the release.
Other leaders have written about which parts of a system should be boring: Technology Choices, Planning, probably others.
Releases should go on that list too. Deploying software to your users, whether that means updating 1000 servers or pushing an app store package, should be mechanical, dull, and repeatable.
Boring releases have a few benefits:
- Automation!
- Top-line measure of software quality
- Increased system agility
Automation of deployment is only possible if the release process for a system is stable and doesn't require human judgement. Even tech-forward "modern" software companies have humans making choices about every release. Push-on-green and true continuous delivery are rare, except in blog posts.
When releases are boring, you immediately get a clear measure of the quality of software entering the release process. Things don't need to be fully or even partially automated for this to be visible. The metric is "how often is the release interesting?"
Interesting means:
- rollback
- patch and roll forward
- incident
- critical bug discovered
- automation didn't work or had to be modified
- incompatibility with some dependency
Any deviation from the normal, documented (maybe automated) procedure makes a release interesting. When starting out, you can choose a less strict definition, but keep adding rigor as the system and tools improve. Try to keep your count of interesting releases as low as you can. If you're feeling fancy, you could even call this a service level indicator (SLI) for software quality.
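To make the metric concrete, here is a minimal sketch of how the count could be tracked. It assumes you keep a simple record of each release and tag it with any deviations from the list above. The `Release` and `Deviation` types, the `boring_release_ratio` function, and the version strings are all hypothetical names chosen for illustration, not part of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Deviation(Enum):
    """Reasons a release counts as interesting, taken from the list above."""
    ROLLBACK = "rollback"
    PATCH_AND_ROLL_FORWARD = "patch and roll forward"
    INCIDENT = "incident"
    CRITICAL_BUG = "critical bug discovered"
    AUTOMATION_FAILURE = "automation didn't work or had to be modified"
    DEPENDENCY_INCOMPATIBILITY = "incompatibility with some dependency"


@dataclass
class Release:
    version: str  # hypothetical identifier for the release
    deviations: set[Deviation] = field(default_factory=set)

    @property
    def interesting(self) -> bool:
        # Any recorded deviation makes the release interesting.
        return bool(self.deviations)


def boring_release_ratio(releases: list[Release]) -> float:
    """Fraction of releases that had no deviations at all."""
    if not releases:
        raise ValueError("no releases recorded")
    boring = sum(1 for r in releases if not r.interesting)
    return boring / len(releases)


# Made-up example data: four releases, one of which needed a rollback.
releases = [
    Release("1.0"),
    Release("1.1", {Deviation.ROLLBACK}),
    Release("1.2"),
    Release("1.3"),
]
ratio = boring_release_ratio(releases)
print(f"boring releases: {ratio:.0%}")
```

Starting with a looser definition just means tracking fewer members of `Deviation` at first and adding more as the process matures.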
It is counterintuitive, but boring, mechanical releases make a system more agile and resilient. When I'm working on a crisis response project, some of our early questions center on deployment processes:
- how often do you release X?
- how long does it take?
- how often does it go smoothly?
When a system can be deployed predictably and with high trust, releasing software can be a valuable response to unexpected circumstances. When that process is fragile, this mitigation is much riskier.
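If a team keeps even a rough release log, the three questions above can be answered from data rather than memory. The sketch below assumes each log entry records a start time, a finish time, and whether the release went smoothly. `ReleaseRecord`, `summarize`, and the sample dates are hypothetical and only illustrate the idea.

```python
from datetime import datetime, timedelta
from statistics import median
from typing import NamedTuple


class ReleaseRecord(NamedTuple):
    started: datetime
    finished: datetime
    smooth: bool  # True if the release followed the normal procedure


def summarize(records: list[ReleaseRecord]) -> dict[str, float]:
    """Answer: how often, how long, and how often smoothly."""
    if len(records) < 2:
        raise ValueError("need at least two releases to measure frequency")
    ordered = sorted(records, key=lambda r: r.started)
    # Days between the starts of consecutive releases.
    gaps = [
        (later.started - earlier.started).total_seconds() / 86400
        for earlier, later in zip(ordered, ordered[1:])
    ]
    # Minutes each release took from start to finish.
    durations = [
        (r.finished - r.started).total_seconds() / 60 for r in ordered
    ]
    return {
        "median_days_between_releases": median(gaps),
        "median_duration_minutes": median(durations),
        "smooth_fraction": sum(r.smooth for r in ordered) / len(ordered),
    }


# Made-up log: weekly releases, one of which did not go smoothly.
first = datetime(2024, 1, 1, 10, 0)
minutes_taken = [30, 45, 30, 60]
went_smoothly = [True, True, False, True]
log = [
    ReleaseRecord(
        started=first + timedelta(weeks=i),
        finished=first + timedelta(weeks=i, minutes=m),
        smooth=s,
    )
    for i, (m, s) in enumerate(zip(minutes_taken, went_smoothly))
]
summary = summarize(log)
print(summary)
```

Medians are used here instead of means so that one unusually long release does not dominate the picture. That is a design choice for this sketch, not a rule.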