Avoiding the Rewrite Trap
A common challenge for engineering leaders looks something like this:
The team they are managing is frustrated. They are struggling to ship features, and they complain that the old systems are just too hard to work in. The systems are written in a legacy programming language that no one likes, the original builders made bad architectural choices that just won’t scale, or everything is far too brittle: every change takes too long to test and still fails half the time. The only solution is, therefore, to rewrite the system from scratch. Using the newest and greatest language, architecture, and infrastructure, of course.
It’s a trap.
I’m not saying you should never rewrite anything. I have led successful rewrites, so I know that it is possible. But before you agree to one, here are some reasons it is usually a bad idea.
You don’t know what all this code does.
I once joined a team that had done a hack week to try to rewrite the old codebase from crufty PHP into Java. I was told that they had gotten a huge part of the way there, and it would only take a few more months before they could retire the old system completely. Hahahahahaha.
When I left the company four years later, most of the important stuff had been rewritten, but there were still bits of the old PHP sitting around, and even that had taken years of careful work to accomplish. Unless this system is very new and barely used (in which case, why are you rewriting it??), or very small, there is no way that you have thought through all of the pieces of code you will actually need to replicate.
Someone has to run and modify the old system while you’re writing the new one. But that job sucks, and they’re likely to quit before you’re done.
You are imagining that your whole team can swarm on the new thing and just knock it out. If you could do the rewrite in a few weeks, maybe. But more likely, you have to keep some people back to keep the old system running, fix bugs, or even add new features to that old system. If those people think that they are on a sinking ship, they are likely to quit, leaving you with a codebase that no one wants to support but that is still critical to paying the bills. Sure, you could rotate the team through supporting the old system, but over time the people who know the old system are likely to leave, and the newcomers will disdain learning the legacy stack.
You don’t understand what is bad about the old system well enough to fix it.
This is one of the easiest pitfalls to avoid, and yet people still walk into rewrites without doing the work of diagnosing the old system. Can the team articulate the underlying reasons that the old system is failing? Sometimes there are clear causes, but often it is more nebulous (“the users are complaining about the old system so we need to rewrite it,” “Java sucks and Rust is cool”). If you can’t even articulate why the old system is bad, how do you know that the new system is going to fix it?
Often the justification for rewrites like this is “developer productivity.” As a believer in developer productivity, I think this is a fine motivation, but there need to be measures behind that intuition that you can point to. “We need to make it possible to ship the codebase to production on demand” is a much clearer and more compelling project than “we need to rewrite this system to increase developer productivity.” Identify the actual blockers to productivity that you are experiencing when working with this system, and make a plan to fix those.
You are justifying this rewrite by piggybacking it on an experimental product or feature.
Sometimes engineers are tempted to use the excuse of a new product or feature demand as a chance to rewrite something they hate. This might work if the product really does need things that the old system cannot possibly do. But most of the time, the best approach here is to do as little as possible to make sure that the feature is really a high-value opportunity before committing to a full rewrite in order to support it. Don’t add to the risk of a speculative product launch by tying it to a high-risk rewrite initiative.
You have no plan other than “we will rewrite.”
Rewrites are huge endeavors. For active products, they can involve migration of users and data, changes to upstream or downstream systems, and careful orchestration of releases. Have you thought about how you will do these migrations? Do you know what dependencies might need to change? If you are changing programming languages or infrastructure, moving from monolithic components to distributed services, or making other major technology changes, do you have people who really understand how to operate and debug these types of systems in production? Who will need to be trained on the new stack? Have you even thought about the monitoring, alerting, and debugging tools you will need to operate this thing?
A rewrite where you haven’t even made a gesture at planning has already failed. Stop being lazy and get to work writing down all of the steps you will need to complete in order to do this successfully.
Nothing I’ve said here is a brand new idea. And yet, year after year, engineers convince themselves and their leadership that a rewrite will solve all their problems. And then they (or their leadership) get fired, because most rewrites fail to deliver anything at all. Avoid the trap: don’t go into this exercise unless it is the only way forward, and if you absolutely must, plan accordingly.
Originally posted on leaddev.com.