One in a series of posts about Technical Investments, excerpting ideas from the upcoming book on that topic
I believe that software engineers should stop talking about Technical Debt.
I believe they should instead advocate for Technical Investments, which are defined as:
Work that the engineers believe is valuable to their business…
…but that no one is asking for.
As per Favor Repeated Cycles Over One-Off Projects, the best way I've seen engineers use this concept is through an ongoing, iterative series of increments – not through giant, big-bang projects.
Within each of those cycles, engineering leaders end up going through a series of steps, a sort of fundamental framework for leading a technical investment, in which they:
Identify issues the engineers are worried about
Map those issues into potential value for the business
Build visibility into that potential value
Develop options for small increments of investment
Share options and visibility with stakeholders to obtain a commitment of time
<Do Something> in that time
Celebrate improvements via story-telling
Start a new cycle, with more visibility and trust
Not every single cycle needs to go through every step – but it's good to understand this as an overall arc for your team and your stakeholders.
If you find yourself stuck, you can check if you've tried to skip past something important:
"Maybe our conversations with the stakeholder feel stuck because we have no visibility to offer".
" Let’s come up with some incremental options before we propose an investment".
"Before we start over, who have we told about this win?"
Let's bring this to life with a story.1
The Horror of Flagship Deploys
In 2020, I joined Ellevation Education, a thriving EdTech company serving the educators who work with English Learner students at public schools across the country.
As I ramped up, I discovered that the engineers had grown incredibly frustrated with deploying the legacy app, which the teams internally called “Flagship”.
Only a subset of development ran through the legacy bits of Flagship… but when it did, it sucked.
The legacy Flagship deploy pipeline featured:
A hodgepodge of jobs spread across multiple automation platforms (Jenkins, Octopus, some GitHub bits)
Several key steps that needed to be manually kicked off once previous steps completed
Cryptic job failures that only the most senior engineers could resolve
A sprawling suite of poorly-maintained browser-driving front-end tests that enjoyed the properties of being both slow and flaky
The #deploy-sucks Slack channel was just a firestorm of angry gifs and emojis.
But… what was the product team's experience?
First, please note: Ellevation had a pretty technically savvy product team – Nathan Papazian, Ellevation's VP of Product2, very much tried to hold his PMs accountable for listening carefully to the engineers, and understanding the tradeoffs in their systems.
But what could that product team observe?
Well, any development that touched the legacy app felt slow.
But, legacy app development always felt slow.
And there were plenty of other contributing factors – understanding of the legacy app was poorly distributed throughout the team (which of course was made worse because deploying it was a nightmare, so everyone avoided it like the plague).
Also, the engineers were complaining about legacy app deploys.
But, to a first approximation, engineers are always complaining about deploys. So this didn't really stand out.
Furthermore, when the product team asked the engineers for any concrete improvement options, the engineers weren't able to offer much in the way of specifics – the whole thing was such a mess, it wasn't clear where to start.
One engineer kept saying "We need to rewrite all our front-end WebRobot tests", but that was clearly an apocalyptic amount of work.
And so they all felt stuck.
Then, one afternoon, while waiting for a deploy to finish, Alla Hoffman, a very bright and very frustrated engineer, threw together a spreadsheet and asked all the engineers on the team to just manually log their deploy times in it (Alla titled the spreadsheet "Flagship Misery").
She asked engineers to fill in their name, the time they kicked off the first job in the series, and the time the final job finished up. There was also a column for free-text notes on anything that happened.
Setting up that spreadsheet took her about 30 minutes (counting the, ahem, vigorous email she sent to all of engineering encouraging them to keep it up to date).
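To give a feel for how little machinery this takes, here's a rough sketch of that kind of log and the quick math you can run over it – the column names, file name, and script are my invention, not Alla's actual spreadsheet:

```python
# A rough sketch of a deploy log like "Flagship Misery" (columns and file name are hypothetical).
# deploy_log.csv might look like:
#   engineer,started_at,finished_at,notes
#   alla,2020-06-01T09:15,2020-06-02T11:40,"WebRobot suite failed twice"
import csv
from datetime import datetime
from statistics import mean

def load_durations(path: str) -> list[float]:
    """Return deploy durations in hours, computed from the start/finish columns."""
    durations = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            started = datetime.fromisoformat(row["started_at"])
            finished = datetime.fromisoformat(row["finished_at"])
            durations.append((finished - started).total_seconds() / 3600)
    return durations

durations = load_durations("deploy_log.csv")
print(f"deploys logged: {len(durations)}")
print(f"average: {mean(durations):.1f}h, worst: {max(durations):.1f}h")
```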
In so doing, Alla created visibility – which is an excellent form of value for a business, because it allows people to make better decisions.
She did so as an "on-the-side" project – one where the engineers don't ask the product team for permission/capacity, but just quietly scrape together a bit of time.
The major thesis of the book is that engineers can and should develop a collaborative partnership with stakeholders around technical investments.
But some work is still best done without a formal negotiation. That's an especially good pattern for cheap initial steps to build visibility.
In the book I’m going to talk about different scopes for technical investments, and where the on-the-side approach works, and where it falls down, as part of what I’m calling "The Ladder of Commitment".
So, back at Ellevation, once Alla had set up that spreadsheet, what happened next?
The engineers on the team were plenty motivated to track their deploys (and had plenty of time to do so, thanks to the various forms of failure). They didn't experience this as annoying manual overhead – rather, they experienced it as validation for their pains, and a chance to contribute to a better future.
After a few short weeks, Lisa McCusker, Ellevation's engineering manager over that domain3, brought the spreadsheet to Nathan and the product team.
Together, they all looked at how long it was taking to get legacy app changes out to production – and discovered that, on occasion, there were so many repeated failures, it took more than a full day to get a single deploy out. The comments were filled with complaints about flaky tests and mysteriously stuck jobs.
At this point, it wasn't hard for Lisa to convince Nathan to carve out a week for Alla to go back and instrument the key stages of the deploy process, so they could better understand what the hell was going on (this is what I call "Ticket" scope).
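I don't know exactly how Alla wired that up, but as a sketch, wrapping each existing pipeline step in a small timing helper is often all it takes – the stage names and log file below are hypothetical:

```python
# A sketch of per-stage deploy instrumentation (stage names and the log file are hypothetical).
import json
import time
from contextlib import contextmanager

@contextmanager
def timed_stage(name: str, log_path: str = "deploy_stages.jsonl"):
    """Record how long a deploy stage took, and whether it failed."""
    start = time.time()
    ok = True
    try:
        yield
    except Exception:
        ok = False
        raise
    finally:
        record = {"stage": name, "seconds": round(time.time() - start, 1), "ok": ok}
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")

# Usage: wrap each existing pipeline step, e.g.
# with timed_stage("webrobot_tests"):
#     run_webrobot_suite()
```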
Thus, a few weeks later, they were looking together at a clearer picture of overall deploy trends and, for various internal stages, both times and failure rates.
The flaky WebRobot front-end tests proved to be the worst culprit – often needing to be re-run multiple times until they passed.
But, unfortunately, there was no simple fix – it was tempting to just rm -rf the whole set, but everyone agreed that, on occasion, the tests would catch a very bad problem in some ancient part of the legacy product that customers still depended on.
Lisa made a case for a carefully time-boxed, three-week effort by a couple of engineers, to inventory all the tests, come up with options, and share those back with her and Nathan, before deciding which one to run with (this is "Project" scope).
Note that what Lisa offered had a built-in off-ramp, partway through: when the engineers shared options, she and Nathan could decide to pause the rest of the project.
With the potential value, the increment and the off-ramp all clear, Nathan was ready to commit.
He and Lisa worked together to find a time for this project – they weren't working much in the legacy app at the moment, but both knew a big chunk of work on it was coming, and they were both motivated to get deploy improvements in before it landed.
With some careful co-planning, they found a chunk of capacity.
When the engineers dug in, the product team worked closely with them. Product and engineering decided together which features were most important to retain test coverage for, and which areas were okay to leave with less coverage.
Thanks to having built shared understanding, the product team was ready to pitch in and do this work together.
When, after a few weeks, the engineers brought options to Nathan and Lisa, they all decided together to have the team do two things.
First, they just flat out deleted a big set of tests (deleting code is Lisa's absolute favorite thing to do; she was very happy that day).
Second, they moved the remaining flaky-but-sometimes-valuable tests off the main deploy path – they only ran that full suite for the small subset of deploys that touched certain parts of the legacy app. The rest of the deploys now only ran a set of core tests, which did not exhibit random transient failures.
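I'm guessing at the mechanics here, but the gating logic for that second step can be as simple as checking which paths a change touches before picking a test suite – the paths and make targets below are made up:

```python
# A sketch of gating the full WebRobot suite to deploys that touch legacy code
# (the paths and make targets below are made up).
import subprocess
import sys

LEGACY_PATHS = ("flagship/legacy/", "flagship/reports/")

def changed_files(base: str = "origin/main") -> list[str]:
    """List files changed between the base branch and HEAD."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

def touches_legacy(files: list[str]) -> bool:
    return any(f.startswith(LEGACY_PATHS) for f in files)

if __name__ == "__main__":
    # Run the slow, flaky suite only when a change actually touches the legacy paths.
    suite = "webrobot-full" if touches_legacy(changed_files()) else "core-tests"
    print(f"running test suite: {suite}")
    sys.exit(subprocess.run(["make", suite]).returncode)
```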
Those two steps immediately made the vast majority of legacy deploys more reliable and faster.
Furthermore, Nathan, Lisa and the entire team could all see that improvement on the graphs of average deploy time – because, as a small, ticket-sized follow-up, Alla had piped the deploy times into Grafana so the team and the PMs could visualize them over time.
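I don't know which datastore sat behind those graphs, but as one illustrative option, pushing a deploy-duration gauge to a Prometheus Pushgateway (which Grafana can then chart) would look roughly like this – the gateway address and metric name are made up:

```python
# A sketch of exporting deploy durations so Grafana can graph them over time.
# Assumes a Prometheus Pushgateway; the address and metric name are hypothetical.
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

def report_deploy_duration(seconds: float) -> None:
    """Push the wall-clock time of the deploy that just finished."""
    registry = CollectorRegistry()
    gauge = Gauge(
        "flagship_deploy_duration_seconds",
        "Wall-clock time for a full Flagship deploy",
        registry=registry,
    )
    gauge.set(seconds)
    push_to_gateway("pushgateway.internal:9091", job="flagship_deploy", registry=registry)
```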
For a few more months the team kept steadily chipping away at the deploy process, in parallel with a great deal of feature work.
Sometimes it was just a ticket here or there, sometimes an engineer would drop off the main sprint for a week or even a month to focus on some specific challenge.
Eventually, the legacy app deploys became "good enough", and, by common agreement between Lisa and Nathan, the pace of investment in this specific area slowed.
To be clear, legacy app deploys were still far from ideal.
But they were enough better that further investment didn't seem warranted at that point.
And so life went on.
Then, one day, the legacy app suffered a major outage.
In the course of resolving the incident, the team rapidly deployed one change after another, first to diagnose and then to fix the underlying issue.
When Lisa wrote up the post-mortem notes, she took time to carefully document how the fast, reliable deploys had saved Ellevation somewhere between one and three full days of downtime.
She made a point of sharing those post-mortem notes with both the product team and the CEO – a fundamental enabling strategy for technical investments is to "Make Your Post-Mortems an Act of Visibility".
All of which eventually led to Ellevation's (highly non-technical!) CEO, Jordan Meranus, beaming with pride at a company All Hands as Lisa and Alla told the entire company the story of how the team had gradually improved deploys.
The human mind is deeply wired to remember stories, so it's well worth your time to use storytelling structures to make technical investment wins feel vivid, real and meaningful to a broad audience.
I want to wrap up by taking a moment to celebrate that All Hands: every single employee at Ellevation, from the help desk to the sales directors to customer success managers, happily listened to a story about a nitty gritty improvement to a CI/CD pipeline.
Improving Flagship deploys had gone from something that “no one was asking for” to something the entire company celebrated.
That’s the kind of win I want to help all engineering teams find.
If you're reading this at some point in the summer of 2025, I'll make a request:
Could I ask you to let me know what you're most eager to learn more about, with regard to tech investments / tech debt?
That could be something I've touched on, or possibly something I have not touched on, but which seems important for running this game plan.
I can be reached by replying to Substack emails (buildingandlearning@substack.com), or at milstein.dan@gmail.com.
Also, of course, share this with anyone you think would find it useful!
This is a "100% true as I remember it" story. If any participants remember any details differently, please let me know!
And now CPO!
And now VPE!