Editor's Soapbox: On Systemic Debt
I recently caught up with an old co-worker from my first "enterprise" job. In 2007, I was hired to support an application which was "on its way out," as "eventually" we'd be replacing it with a new ERP. September 2019 was when it finally got retired.
The application was called "Total Inventory Process" and it is my WTF origin story. Elements of that application and the organization have filtered into many of my articles on this site. Boy, did it have more than its share of WTFs.
"Total Inventory Process". Off the bat, you know that this is a case of an overambitious in-house team trying to make a "do everything" application that's gonna solve every problem. Its core was inventory management and invoicing, but it had its own data-driven screen rendering/templating engine (which barely worked), its own internationalization engine (for an application which never got localized), a blend of web, COM+, client-side VB6, and hooks into an Oracle backend but also a mainframe.
TIP was also a lesson in the power of technical debt, how much it really costs, and why technical solutions are almost never enough to address technical debt.
TIP was used to track the consumption of inventory at several customers' factories to figure out how to invoice them. We sold them the paint that goes on their widgets, but it wasn't as simple as "You used this much paint, we bill you this much." We handled the inventory of everything in their paint line, from the paint to the toilet paper in the bathrooms. The customer didn't want to pay for consumption, they wanted to pay for widgets. The application needed to be able to say, "If the customer makes 300 widgets, that's $10 worth of toilet paper."
The application handled millions of dollars of invoices each year, for dozens of customer facilities. When I was hired, I was the second developer hired to work full time on supporting TIP. I wasn't hired to fix bugs. I wasn't hired to add new features. I was hired because it took two developers, working 40 hours a week, just to keep the application from falling over in production.
My key job responsibility was to log into the production database and manually tweak records, because the application was so bug-ridden, so difficult to use, and so locked down that users couldn't correct their own mistakes. With whatever time I had left, I'd respond to user requests for new functionality or bug-fixes.
It's usually hard to quantify exactly what technical debt costs. We talk about it a lot, but it very often takes the form of, "I know it when I see it," or "technical debt is other people's code." Here, we have a very concrete number: technical debt was two full-time salaries to just maintain basic operations.
My first development task was to add a text-box to one screen. Because other developers had tried to do this in the past, the estimate I was given was two weeks of developer time. 80 hours of effort for a new text box.
Once, the users wanted to be able to sort a screen in descending order. The application had a complex sorting/paging system designed to prevent fetching more than one page of data at a time for performance reasons. While it did that, it had no ability to sort in descending order, and adding that functionality would have entailed a full rewrite. My "fix" was to disable paging and sorting entirely on the backend, then re-implement in on the client-side in JavaScript for that screen. The users loved it, because suddenly one screen in the application was actually fast.
Oh, and as it was, "sorting" on the backend was literally putting the order-by clause in the URL and then SQL injecting it.
There were side effects to this quantity of technical debt. Since we were manually changing data in production, we were working closely with a handful of stakeholder users. Three, in specific, who were the "triumvirate" of TIP. Those three people were the direct point of contact with the developers. They were the direct point of contact to management. They set priorities. They entered bug reports. They made data change requests. They ruled the system.
They had a lot of power over this system, and this was a system handling millions of dollars. I think they liked that power. To this day, one of them holds that the "concept" of TIP was "unbeatable", and its replacement system is "cheap" and "crap". Now, from a technical perspective, you can't really get crappier than TIP, but from the perspective of a super-user, I can understand how great TIP felt.
I was a young developer, and didn't really like this working environment. Oh, I liked the triumvirate just fine, they were lovely people to work with, but I didn't like spending my days as nothing more than an extension of them. They'd send me emails containing UPDATE statements, and just ask that I execute them in production. There's not a lot of job satisfaction in being nothing more than "the person with permission to change data."
Frustrated, I immediately starting looking for ways to pay down that technical debt. There was only so much I could do, for a number of reasons. The obvious one was the software. If "adding a textbox" could reasonably take two weeks, "restructuring data access so that you can page data sorted in descending order" felt impossible. Even "easy" changes could unleash chaos as they triggered some edge case no one knew about.
Honestly, though, the technical obstacles were the small ones. The big challenges were the cultural and political ones.
First, the company knew that much of the code running in production was bad. So they created policies which erected checkpoints to confirm code quality, making it harder to deploy new code. Much harder, and much longer: you couldn't release to production until someone with authority signed off on it, and that might take days. Instead of resulting in better code, it instead meant the old, bad code stayed bad, and new, bad code got rushed through the process. "We have a hard go-live date, we'll improve the code after that." Or people found other workarounds: your web code had to go through those checkpoints, but stored procedures didn't, so if you had the authority to modify things in production, like I did, you could change stored procedures to your heart's content.
Second, the organization as a whole was risk-averse. The application was handling millions of dollars of invoices, and while it required a lot of manual babysitting, by the time the invoices went out, they were accurate. The company got paid. No one wanted to make sweeping changes which could impact that.
Third, the business rules were complicated. "How many rolls of toilet paper do you bill per widget?" is a hard question to answer. It's fair to say that no one fully understood the system. On the business side, the triumvirate probably had the best handle on it, but even they could be blindsided by its behavior. It was a complex system that needed to stay functioning, because it was business critical, but no one knows exactly what all of those functions are.
Fourth, the organization viewed "support" and "enhancement" the way other companies might view "operational" and "capital" budgets. They were different pools of money, and you weren't supposed to use the support money to pay for new features, nor could enhancement money be used to fix bugs.
Most frustrating was that I would sometimes get push-back from the triumvirate. Oh, they loved it when I could give them a button to self-service some feature which needed to use manual intervention, so long as the interface was just cumbersome enough that only they could use it. They hated it when a workflow got so streamlined that any user could do it. Worse, for them, was that as the technical debt got paid down, we started transitioning more and more of the "just change production data" to low-level contractors. Now the triumvirate no longer had a developer capable of changing code at their beck and call. They had to surrender power.
Fortunately, management was sensitive to the support costs around TIP. Once we started to build a track-record of reduced costs, management started to remove some of the obstacles. It was a lot of politics. I suspect some management were wary of how much power the triumvirate had, and were happy when that could get reduced. After a few years of work on TIP, I mostly rolled off of it onto other projects. Usually, I'd just pop in for month-end billing support, or being the expert who understood some of the bizarre technical intricacies. Obviously, the application continued working just fine for years without me, and I can't say that I miss it.
TIP accrued technical debt far more quickly than most systems would. Some of that comes from ambition: it tried to be everything, including reinventing functions which had nothing to do with its core business. This led to a tortured development process, complete with death marches, team restructurings, "throw developers at this until it gets back on track," and several ultimatums like "If we don't get something in production by the end of the month, everyone is fired." It was born in severe debt, within an organization which didn't have good mechanisms to manage that debt. And the main obstacles to paying down that debt weren't technical: they were social and political.
I was lucky. I was given the freedom to tackle that debt (or, if we're being fully honest, I also took some freedom under the "it's easier to seek forgiveness than to ask permission" principle). In a lot of systems, the technical debt accrues until the costs of servicing the debt are untenable. You end up paying so much in "interest" that you stop being able to actually do anything. This is a failure mode for a lot of projects, and that's usually when a "ground up" rewrite happens. Rewrites have a mixed record, though: TIP itself was actually a rewrite of an older system and promised to "do it right" this time.
Technical debt has been on my mind because I've been thinking a lot lately about broader, systemic debt. Any systems humans build-software systems, mechanical systems, or even social systems-are going to accrue debt. Decisions made in the history of the system are going to create problems in the future, whether those decisions were wrong from the get-go, or had unforeseen consequences, or just the world changed and they're no longer workable.
The United States, right now, is experiencing a swell of protests unlike anything I've seen in my lifetime. I think it's fair to say that these protests are rooted in historical inequities and injustices which constitute a form of social debt. Our social systems, especially around the operation of police, but broadly in the realm of race, are loaded with debt from the past.
I see similarities in the obstacles to paying down that debt. Attempts to make changes by policy end up creating roadblocks to change, or simply fail to accomplish their goals. Resistance to change because change entails risk, and managing or mitigating those risks feels more important than actually fixing the problems. The systems we're talking about are complicated, and it's difficult to even build consensus on what better versions look like, because it's difficult to even understand what they currently are. And finally, there are groups that have power, and in paying down the social debt, they would have to give up some of that power.
That's a big, hard-to-grapple-with quantity of debt. It's connected to a set of interlocking systems which are difficult to pick apart. And it's frighteningly important to get right.
All systems accrue systemic debt. Software systems accrue debt. Social systems accrue debt. Even your personal mental systems, your psychological health, accrue debt. Without action, the debt will grow, the interest on that debt grows, and more energy ends up servicing that debt. One of the ways in which systems fail is when the accumulated debt gets too high. Bankruptcy occurs. Of all of these kinds of systems, the software debt tends to be the trivial one. Software products become unsupportable and get replaced. Psychological debt can lead towards things like depression or other mental health problems. Social debt can lead to unrest or perpetuations of injustice.
Systemic debt may be a technical problem at its root, but its solution is always going to require politics. TIP couldn't be improved without overcoming political challenges. US social debt can't be resolved without significant political changes.
What I hope people take away from this story is an awareness of systemic debt, and an understanding of some of the long-term costs of that debt. I encourage you to look around you and at the systems you interact with. What sources of debt are there? What is that debt costing the system and the people who depend on it? What are the obstacles to paying down that debt? What are the challenges of making the systems we use better?
Systemic debt doesn't go away by itself, and left unmanaged, it will only grow. If you don't pay attention to it, broken systems end up staying in production for way too long. TIP was used for nearly 18 years. Don't use broken things for that long, please.
[Advertisement] ProGet can centralize your organization's software applications and components to provide uniform access to developers and servers. Check it out!