Ratchet

One of my favorite all-purpose engineering tools (maybe it’s more accurate to call it a “technique”, but I’m going to stick with “tool” here) is the ratchet.

In my career, I’ve had the benefit of some very long tenures at different organizations. My teams and I have launched greenfield projects, I’ve maintained some codebases for more than a decade, and I’ve done both big-bang rewrites and piecemeal migrations. I’ve worked with experienced, talented developers as well as complete newbies. I’ve also inherited a lot of code and systems. Some of those have been well designed, well tested, and well documented. Others have been… not so much those things.

I’m not dogmatically against rewrites; sometimes a rewrite is the appropriate solution. Often, though, it’s not practical or feasible to rewrite a large codebase or existing system, even if it’s in terrible shape. It needs to be improved in place. The trouble with systems that are already in bad shape is that changing them is risky, and the larger the change, the greater the risk. It’s often clear that the current state of things is bad, but you don’t know exactly what “good” would look like or how to get there from where you are.

This is where the ratchet comes in.

A ratchet has two parts:

1. Any small change that improves the codebase or the system in some way.
2. A safeguard that locks that change in place.
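To make the second part a bit more concrete, here’s a minimal Python sketch of one common way to build that safeguard for anything you can count. It assumes a “lower is better” number (for example, a hypothetical count of modules with no tests) and a hypothetical baseline file, ratchet.json, kept in the repository. The check fails if the number goes up and records the new limit whenever it goes down, so an improvement can’t silently slip back.

```python
import json
import pathlib


def check_ratchet(metric_name, current_value, baseline_path):
    """Check a "lower is better" metric against its stored baseline.

    Returns False if the metric got worse. If it improved (or has no
    baseline yet), records the new value as the limit and returns True.
    """
    path = pathlib.Path(baseline_path)
    baselines = json.loads(path.read_text()) if path.exists() else {}
    previous = baselines.get(metric_name)

    if previous is not None and current_value > previous:
        # The safeguard: refuse any change that makes the metric worse.
        print(f"{metric_name}: {current_value} exceeds baseline {previous}")
        return False

    if previous is None or current_value < previous:
        # Lock in the improvement: the new, lower value becomes the limit.
        baselines[metric_name] = current_value
        path.write_text(json.dumps(baselines, indent=2, sort_keys=True) + "\n")
    return True
```

Called as check_ratchet("untested_modules", 42, "ratchet.json"), it returns False only when the count exceeds the stored baseline. One design choice to be aware of: when the baseline tightens on a CI machine, the updated file still has to be committed somewhere, so it may be simpler to tighten it locally and have CI only enforce it.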

Fix a bug? Add a regression test to make sure the bug stays fixed. No automated tests at all? Add a “dummy” test suite that runs zero tests. Obviously that won’t catch any bugs by itself, but it should be low-risk to introduce (you’re just adding test harnesses), and when you do start adding tests, you’ll have the scaffolding there to fit them into. Set up a commit hook or a GitHub Actions workflow to run the dummy test suite. Again, it shouldn’t introduce any risk, but it will get everyone accustomed to seeing tests pass as part of the development cycle. I’ve seen dummy test suites like this catch syntax errors or broken imports simply by ensuring that the code is at least parsed and compiled before it gets pushed to production. (We’ve all seen developers make “just a tiny change” and push without even running it locally; if we’re honest, most of us have done that ourselves.)
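One small step past a zero-test suite is a single test that checks nothing about behavior and only asks whether every file still compiles. Here’s a minimal sketch using Python’s standard unittest, assuming a Python codebase and that the tests run from the repository root; find_uncompilable and CompileSmokeTest are made-up names for illustration. Note that compiling catches syntax errors but not broken imports; catching those would mean actually importing each module, which can have side effects.

```python
import pathlib
import unittest

# Assumption: tests run from the repository root, so "." is the code to check.
PROJECT_ROOT = pathlib.Path(".")


def find_uncompilable(root):
    """Return (path, error) pairs for every .py file under root that fails to compile."""
    failures = []
    for path in sorted(pathlib.Path(root).rglob("*.py")):
        try:
            # compile() parses and byte-compiles in memory; nothing is executed
            # and no .pyc files are written.
            compile(path.read_text(encoding="utf-8"), str(path), "exec")
        except (SyntaxError, ValueError) as exc:
            # SyntaxError covers IndentationError/TabError; ValueError covers
            # undecodable files and (on older Pythons) null bytes in the source.
            failures.append((str(path), str(exc)))
    return failures


class CompileSmokeTest(unittest.TestCase):
    def test_every_file_compiles(self):
        self.assertEqual(find_uncompilable(PROJECT_ROOT), [])
```

Saved as something like test_smoke.py, it should be picked up by python -m unittest (or pytest) with no extra configuration. In a real project you would probably also skip virtualenv and vendored directories.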

Is the code all over the place in terms of conventions? Add a simple linter (e.g., flake8 or eslint). Most linters let you enable or disable individual rules. If you need to, start by disabling every single rule so that it’s not actually checking anything, but add it to the commit hooks or CI setup anyway. Then you can enable one rule at a time as you gain confidence that doing so isn’t breaking anything. Each rule that gets enabled prevents that problem from ever creeping back into the codebase. Eventually, you might make enough progress that you’re comfortable switching to an automatic formatter like black or go fmt.
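flake8 can handle the one-rule-at-a-time part on its own through its select option. If you’d rather keep the list somewhere easy to review, and also see which rules the code already satisfies, here’s a minimal Python sketch of a wrapper. It assumes flake8 is installed and prints its default path:row:col: CODE message format; KNOWN_RULES, ENABLED_RULES, and the helper functions are hypothetical.

```python
import re
import subprocess
from collections import Counter

# Hypothetical rule lists: everything we might care about, and the subset the
# team has switched on so far. Enabling a rule = moving it into ENABLED_RULES.
KNOWN_RULES = {"E711", "E712", "F401", "F841", "W291"}
ENABLED_RULES = {"F401"}

# flake8's default output format is "path:row:col: CODE message".
LINE_RE = re.compile(r"^.+?:\d+:\d+: ([A-Z]+\d+) ")


def count_violations(flake8_output):
    """Map each rule code to how many times it was reported."""
    counts = Counter()
    for line in flake8_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def enabled_violations(counts, enabled):
    """Violations of enabled rules; any of these should fail the build."""
    return {code: n for code, n in counts.items() if code in enabled}


def ready_to_enable(counts, known, enabled):
    """Disabled rules the codebase already satisfies: free clicks of the ratchet."""
    return sorted(code for code in known - enabled if counts.get(code, 0) == 0)


def lint_gate(paths=(".",)):
    """Run flake8 over every known rule, but only fail on the enabled ones."""
    result = subprocess.run(
        ["flake8", "--select", ",".join(sorted(KNOWN_RULES)), *paths],
        capture_output=True,
        text=True,
    )
    counts = count_violations(result.stdout)
    failing = enabled_violations(counts, ENABLED_RULES)
    for code, n in sorted(failing.items()):
        print(f"{code}: {n} violation(s)")
    for code in ready_to_enable(counts, KNOWN_RULES, ENABLED_RULES):
        print(f"{code} has no violations; consider enabling it")
    return not failing
```

In CI you would call lint_gate() and fail the job when it returns False. The split between “known” and “enabled” is the ratchet here: as long as codes only ever move one way, the check can only get stricter.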

Is the deploy process manual, slow, and error-prone? Write a runbook entry documenting it as well as you currently understand it. Then start automating parts of the runbook, one step at a time. Add a simple end-to-end “smoke test” at the end of the deploy to verify that it succeeded. Before you know it, you’ll have a completely automated deployment process.
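One way to automate a runbook piecemeal is to turn it into a script where each step is either still manual (the script prints the instruction and waits for a person) or has been replaced by code, with a smoke test as the final step. Here’s a minimal Python sketch of that idea, assuming the deployed service exposes an HTTP health-check URL; the step descriptions, the URL, and the function names are all hypothetical.

```python
import time
import urllib.request


def smoke_test(url, expected_text="", attempts=5, delay_seconds=2.0, timeout=5.0):
    """Return True once `url` answers 200 with `expected_text` in the body."""
    for attempt in range(1, attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                body = resp.read().decode("utf-8", errors="replace")
                if resp.status == 200 and expected_text in body:
                    return True
        except OSError:
            # URLError, HTTPError, timeouts and refused connections are all
            # OSError subclasses; the new release may still be starting up.
            pass
        if attempt < attempts:
            time.sleep(delay_seconds)
    return False


def run_runbook(steps, confirm=input):
    """Run (description, action) steps in order; action None means still manual."""
    for description, action in steps:
        if action is None:
            # Not automated yet: a human follows the written runbook entry.
            confirm(f"[manual] {description} (press Enter when done) ")
        else:
            print(f"[auto] {description}")
            if not action():
                print(f"[FAILED] {description}")
                return False
    return True


# Hypothetical deploy runbook; each ratchet click swaps a None for a function.
DEPLOY_STEPS = [
    ("Announce the deploy in the team channel", None),
    ("Build and upload the release artifact", None),
    ("Restart the app servers one at a time", None),
    ("Smoke-test the health endpoint",
     lambda: smoke_test("https://app.example.com/healthz")),
]
```

A deploy would then be run_runbook(DEPLOY_STEPS). The script is useful on day one, before anything is automated; each step converted later is one more click of the ratchet, and the smoke test keeps a broken deploy from being reported as a success.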

None of these are revolutionary ideas. I just find it useful to think in terms of this “ratchet” mechanism when I’m improving a codebase, a system, or even a team’s process. Take lots of small steps, and make it easier for the system to move naturally toward a “better” state than a worse one. At some point, the system dynamics take over and become self-reinforcing.