Even today, Code Freezes are a popular process initiative. They aim to keep software stable at the moments when customers (or some very important subset of them) need it the most. For the unfamiliar tech reader (whom some might even call lucky!), a simple, abstracted description would be: nothing new gets merged, deployed, or released during a previously defined window of time (more details on Wikipedia).
While the industry is still divided on whether this is a good or effective practice, I’m going to walk you through two personal experiences and share how I’d approach this if I were to implement it again in another organization.
The unchallenged company practice
I was part of an organization that had Code Freezes during important holidays. On the Accelerate scale, it was a low-to-medium (but improving) performer. I could see this from the practices it was using:
- Fixed, predefined company-wide release schedules
- Software needed to pass through sequential environments after manual decision making and actions
- The release was a single package containing a considerable number of artifacts, which would work as expected only if shipped as one thing
- A dedicated team was in charge of resolving any incidents (rather than the team owning a particular piece of that software)
It was not clear to me in what context this practice made its way into the organization’s process (I suppose it was a rather long-standing tradition, as it was never really challenged). At the same time, I knew it was not something that could go away any time soon in that context, simply because of the risk associated with releasing. And I understand it. If your Change Failure Rate tells you that the chances of a release going wrong are more than 1 in 5, and your Mean Time to Recover is at least one day, the customer will really be affected. A Code Freeze makes a bit of sense here, right?
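To make that risk concrete, here is a rough back-of-the-envelope sketch. It assumes that the expected customer impact of a single release can be approximated as the Change Failure Rate multiplied by the Mean Time to Recover; the function name and this simplification are mine, not part of the Accelerate research.

```python
def expected_impact_hours(change_failure_rate: float,
                          mean_time_to_recover_hours: float) -> float:
    """Rough expected hours of customer impact per release.

    Simplifying assumption: impact = P(release fails) * hours needed to recover.
    """
    if not 0.0 <= change_failure_rate <= 1.0:
        raise ValueError("change_failure_rate must be between 0 and 1")
    if mean_time_to_recover_hours < 0:
        raise ValueError("mean_time_to_recover_hours must not be negative")
    return change_failure_rate * mean_time_to_recover_hours


# Lower bounds from the text: 1 failed release in 5, one day (24 hours) to recover.
impact = expected_impact_hours(0.20, 24.0)
print(f"{impact:.1f} expected hours of customer impact per release")
```

With those lower-bound figures, every release carries close to five expected hours of customer impact, which is why freezing releases during a critical period looks attractive in that context.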
The fear of the unknown
The second part of this journey focuses on another organization, which implemented Code Freezes on Fridays (or prior to any customer-facing event) after a considerable period of (production) incidents happening during weekends. The introduction of this practice was challenged but, as no more data was brought into the discussion, the side supporting it won. This organization is a high / elite performer on the Accelerate scale, and this time I could tell by looking at the metrics myself: tens of daily deployments, a Change Failure Rate below 5%, and incidents resolved in a matter of hours. A couple of months into the Code Freezes, encouraged by the part of the organization that was originally against the change, I personally ran an analysis to understand how successful the practice had been. We concluded:
- No actual improvement (or decline) in the Accelerate metrics
- A considerable decrease in the number of incidents happening on weekends
At first sight, the measure might have looked like a success. Looking deeper into the incidents’ root causes, I discovered that:
- Just 1 of the many weekend incidents from the initial analysis was caused by a code change
- None of the incidents still happening on weekends was related to a code change
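A root-cause breakdown like the one above can be sketched in a few lines. This is a minimal illustration, not the analysis I actually ran: the `Incident` record, its fields, and the sample data are hypothetical, and it assumes each postmortem already states whether a code change was the root cause.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Incident:
    started_at: datetime
    caused_by_code_change: bool  # assumed to come from the postmortem's root cause


def is_weekend(moment: datetime) -> bool:
    # datetime.weekday(): Monday is 0, so Saturday is 5 and Sunday is 6.
    return moment.weekday() >= 5


def weekend_summary(incidents: list[Incident]) -> dict[str, int]:
    """Count weekend incidents and how many of them trace back to a code change."""
    weekend = [i for i in incidents if is_weekend(i.started_at)]
    code_related = sum(1 for i in weekend if i.caused_by_code_change)
    return {"weekend_incidents": len(weekend), "caused_by_code_change": code_related}


# Hypothetical sample data, for illustration only.
sample = [
    Incident(datetime(2023, 3, 4, 2, 15), caused_by_code_change=False),    # Saturday
    Incident(datetime(2023, 3, 5, 11, 0), caused_by_code_change=True),     # Sunday
    Incident(datetime(2023, 3, 6, 9, 30), caused_by_code_change=True),     # Monday
    Incident(datetime(2023, 3, 11, 22, 40), caused_by_code_change=False),  # Saturday
]

print(weekend_summary(sample))
```

Running the same summary on the periods before and after the freeze shows whether the weekend incidents were ever driven by code changes in the first place.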
To sum up, the conclusions invalidated the business case for the Code Freeze: since code changes were almost never the root cause, the drop in weekend incidents could not be attributed to the freeze, and the practice didn’t really have an impact. In this case, one could ask: why should anyone keep this practice as part of their organization?
Making a decision
What I’ve learned from these two experiences is that, as with any other initiative meant to positively impact your organization, before adopting a Code Freeze it’s important to:
- Analyze the existing data a bit more: incident information, incident reports, bugs, postmortems, etc. Digging in a bit won’t hurt anyone. Make a data-driven decision: do you really need this? In the absence of data, maybe you can generate it before you adopt something you might not need and that might prove unhelpful for your situation.
- Understand where your Engineering efforts currently stand and their expected growth. Consider whether the practice fits with all the processes and practices you currently have at the company, and whether it would keep you from achieving what you want for the future. Can something else reduce the problems while being better aligned with your mid/long-term goals? If the practice fails, can it be easily removed?
While I personally think there are much better and more modern ways to prevent incidents than stopping Engineers from completing their work during certain periods of time, if the data and your understanding clearly point to this practice being meaningful for a particular organization, maybe it should be considered.