A certain kind of argument about AI and software development has settled into the discourse with the comfort of received wisdom. The argument goes roughly as follows: generative AI makes developers faster, but the code it produces accumulates technical debt that will eventually cripple systems and destabilize organizations. Junior developers face outsized risk. Legacy environments carry disproportionate vulnerability. The prescription is guidelines, senior oversight, mentorship, caution.

MIT Sloan Management Review recently published a polished version of this argument under the title “The Hidden Costs of Coding With Generative AI.” The piece is worth engaging not because it is unusual but because it crystallizes a set of assumptions that have become pervasive, assumptions that deserve considerably more scrutiny than they tend to receive.
The assumptions are these: that technical debt is inherently pathological, that AI introduces a categorically new kind of it, that legacy systems are liabilities, that the trajectory from old to new is the only serious direction, and that the existing professional hierarchy of software engineering is the appropriate safeguard against the risks of AI-assisted coding. Each of these assumptions is either wrong or incomplete in ways that matter for how organizations think about AI, platform design, and the future of productive systems.
Productive Complexity
Software engineering is the disciplined accumulation of managed complexity. That sentence should be uncontroversial, but the way organizations talk about technical debt often treats complexity as a disease rather than a characteristic of every productive system ever built.
The consumer debt metaphor has become the default framing. Technical debt carries “principal” and “interest.” AI-generated code borrows at a higher rate. The moral logic follows naturally: cut up the credit card, stop borrowing, live within your means. The MIT article makes this analogy explicitly and builds its entire cautionary structure around it.
However, the analogy fails because technical debt in productive computing systems doesn’t behave like consumer debt. Consumer debt finances consumption. The asset depreciates. The interest compounds against the borrower. Corporate debt finances productive capacity. The resistance to leverage in that context is not prudence but paralysis because the capacity financed by the borrowing typically generates returns that exceed the cost of carrying the obligation.
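The distinction can be made concrete with a back-of-the-envelope comparison. The numbers below are purely illustrative, not drawn from any real system: consumer-style debt pays compounding interest against an asset that produces nothing, while productive leverage pays the same interest against capacity that generates returns.

```python
def net_position(principal, rate, return_rate, years):
    """Net value of borrowing `principal` at interest `rate` to finance
    capacity that returns `return_rate` per year (simple illustrative model)."""
    cost = principal * ((1 + rate) ** years - 1)  # compounding carrying cost
    gain = principal * return_rate * years        # returns from the capacity
    return gain - cost

# Consumer-style debt: the financed asset generates no returns.
consumer = net_position(100_000, rate=0.05, return_rate=0.0, years=10)

# Productive leverage: the financed capacity out-earns the carrying cost.
productive = net_position(100_000, rate=0.05, return_rate=0.12, years=10)
```

Under these hypothetical figures, the consumer position is strictly negative while the leveraged position is positive: the same interest rate is prudence in one case and paralysis-avoidance in the other, which is exactly why the credit-card framing misleads.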
Linux inherited design assumptions from UNIX that purists criticized for decades. The kernel development process prioritized shipping over theoretical elegance. That accumulated complexity now runs virtually every supercomputer on the planet, powers the majority of cloud infrastructure, and serves as the foundation for the very AI systems that critics worry about. COBOL is over sixty years old. Banking systems built on COBOL carry decades of accumulated decisions that no one would reproduce from scratch. Those systems process the majority of the world’s financial transactions reliably every single day, and regulators trust them precisely because of their long operational history rather than in spite of it. AIX, IBM i, and z/OS drive roughly 80% of the world’s critical infrastructure in large-scale enterprise environments. Whatever technical debt these platforms carry has paid for itself many times over in operational stability and productive capacity. No greenfield alternative has yet arrived that can match the day-after-day, year-after-year, decade-after-decade performance of these systems.
Design patterns tell the same story at a different scale. The Singleton pattern carries overhead. The Observer pattern creates hidden coupling. Factory patterns add layers of abstraction. Every one of these would register as technical debt under the standard definition. Every one of them also exists because the trade-off enables capabilities that purity forecloses. Design patterns are the profession’s own admission that managed complexity is not a failure of engineering but rather its substance in the real world.
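The Observer trade-off named above can be seen in a few lines. This is a minimal sketch, not any particular library's API: the pattern buys open-ended, runtime-extensible notification at the cost of hidden coupling, because the subject cannot see or bound what its observers do.

```python
class Subject:
    """Minimal Observer: listeners register callbacks; the subject
    notifies all of them without knowing what any of them do."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def publish(self, event):
        # The capability purity forecloses: any number of listeners,
        # attached at runtime, with no change to the subject.
        # The "debt": behavior is now dispersed across callbacks the
        # subject cannot inspect, which is hidden coupling by design.
        for callback in self._observers:
            callback(event)


log = []
prices = Subject()
prices.subscribe(lambda e: log.append(f"audit: {e}"))
prices.subscribe(lambda e: log.append(f"alert: {e}"))
prices.publish("AAPL moved 2%")
```

The same publish call fans out to both listeners. Whether that dispersal is a liability or a capability depends entirely on whether the system needed the flexibility, which is the point: the trade-off is the substance of the engineering, not a failure of it.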
The Y2K crisis is frequently cited as a cautionary tale about technical debt, and the MIT article invokes it for exactly this purpose. The two-digit year shortcut was a human decision made by human developers under human constraints. If the most prominent historical example of catastrophic technical debt is entirely human in origin, the claim that AI introduces something categorically new requires more than assertion. The history of software development is littered with human-generated complexity that dwarfs anything AI has yet produced. What changes with AI is the speed and scale at which complexity can be generated. Speed is a real consideration. Scale is not a new category of risk. Organizations have been managing the tension between velocity, scalability, and stability since long before generative AI existed. The disciplines required here are structural, not moral.
The Modernization Trap
As I argued recently in “The Modernization Problem: What We Lose When We Stop Asking Questions About AI and Infrastructure,” the word “modernization” carries a specific intellectual lineage that the technology sector has internalized without examination: the discursive structure of mid-twentieth-century development theory has migrated into enterprise technology with remarkable fidelity. All actors are assumed to follow a single trajectory from traditional to modern, the present state is framed as deficiency, and non-adoption is coded as backwardness. What exists is “legacy.” What is coming is “modern.” The complexity introduced by one phase of adoption becomes the justification for the next. The adopting organization thus never graduates from client to peer, and the licensing fees never go away.
Arguments about AI-generated code and technical debt operate squarely within this structure whether the authors recognize it or not. Brownfield environments appear as inherently dangerous, burdened by “hidden debt” that new code will compound. Greenfield environments appear as relatively safe precisely because no legacy code constrains the developer. The word “legacy” functions as a self-evident problem requiring no argument. The possibility that a legacy system might be the right system, or that a brownfield environment might be stable, performant, and well-suited to its purpose, never enters the analysis. The most important engineering question, fitness for purpose, is foreclosed by the assumption that the trajectory from old to new is the only serious direction.
A well-tuned system with predictable load characteristics may gain nothing from being redesigned except operational complexity. Tightly coupled, latency-sensitive workloads may perform worse on platforms built for loosely coupled, horizontally scalable services. Proven deterministic logic does not become inadequate simply because a newer paradigm exists. The largest and most demanding compute environments in the world run heterogeneous configurations precisely because no single platform is optimal for every workload. These observations are not arguments against any particular technology. They are arguments against a framing that treats adoption as synonymous with progress, because that framing makes it very difficult to conclude that a given workload should stay where it is, even when that conclusion is correct.
The evidence commonly marshaled in support of AI-related risk deserves closer examination than it typically receives. The catastrophic examples that populate these arguments, Southwest Airlines, CrowdStrike, and Newark Liberty, did not involve AI-generated code and have no bearing on the question of whether AI-assisted development accelerates technical debt. The quantitative evidence fares little better under scrutiny. The GitClear analysis showing increased code duplication and churn between 2020 and 2024 attributes the trend to AI without controlling for the expansion of who was writing code during that period, the shift in what kinds of projects were under development, or the post-pandemic acceleration of rapid shipping culture across the industry. Any of these factors could account for the observed trend independently. All of them were operating simultaneously. The Google DevOps finding correlating increased AI usage with decreased delivery stability rests on the same survey-based self-reported methodology that has drawn well-founded criticism when McKinsey employed it to claim that 80% of companies see no meaningful returns from AI. As I examined in detail in “The 20% Question: Why AI Isn’t Failing and What’s Missing,” the precision with which such figures are cited far exceeds the rigor of the methodology that produced them.
The productivity claims that AI skeptics correctly note were conducted in controlled environments carry equivalent limitations to the risk findings those same skeptics prefer. Applying skepticism selectively is not rigor. When numbers appear in a qualitative argument to lend empirical texture rather than to withstand methodological scrutiny, they are performing a rhetorical function and not an analytical one.
Architecture Over Anxiety
The question the field actually needs is not whether AI-generated code is safe. The question is where AI belongs in a given system and what platform discipline is required to deploy it well.
AI is not a single intervention applied uniformly across a codebase. AI can operate inside code, outside code, alongside code, and even at the orchestration layer above code. The design of the system determines placement. A reconciliation engine built in COBOL and running on z/OS may need no AI inside the matching logic at all. The logic is deterministic. Regulators trust it. The audit trail is clean. But the operations staff investigating thousands of daily exceptions may benefit enormously from a conversational AI layer that reads the output, surfaces patterns, and triages priority. In that case the proven system remains untouched while AI extends its accessibility and usefulness through orchestration rather than modification as some of my recent demos have illustrated.
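The placement described above, leaving the deterministic core untouched while layering AI alongside its output, can be sketched as follows. The transaction format and the triage classifier here are hypothetical illustrations, not any real system's interface; in practice the classifier would be an LLM call rather than a lambda.

```python
def match_transactions(ledger_a, ledger_b):
    """Stand-in for the proven deterministic matcher. The orchestration
    layer never modifies this logic; it only reads the output."""
    keys_a, keys_b = set(ledger_a), set(ledger_b)
    return sorted(keys_a ^ keys_b)  # exceptions: unmatched on either side


def triage_exceptions(exceptions, classify):
    """Orchestration layer: group exceptions by a classifier and surface
    the largest pattern first for the operations staff. `classify` is any
    callable mapping an exception ID to a category label."""
    buckets = {}
    for exc in exceptions:
        buckets.setdefault(classify(exc), []).append(exc)
    return sorted(buckets.items(), key=lambda kv: -len(kv[1]))


# Hypothetical ledgers; T1 and T2 reconcile, the rest are exceptions.
exceptions = match_transactions(["T1", "T2", "T3", "T9"], ["T1", "T2", "T4"])

# A toy classifier standing in for the AI layer that reads the output.
report = triage_exceptions(
    exceptions, classify=lambda e: "fee" if e in ("T3", "T4") else "timing"
)
```

The design choice is the point: the matcher's audit trail and regulatory standing are untouched, because the AI capability is attached at the orchestration boundary rather than inside the matching logic.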
Other workloads may demand AI inside the code. A classification model embedded in a real-time data pipeline is AI operating at the code level and remains tightly coupled to the system it serves. A generative model producing analysis from structured inputs operates differently from both. Each workload carries different characteristics, different latency requirements, different governance needs, and different relationships to the surrounding environment. The discipline required is not caution about AI in general but precision about AI in particular: which capability, applied where, governed how, and on what platform.
The guild structure that surrounds software engineering resists this kind of reasoning because it distributes judgment differently than the guild prefers. The standard argument holds that senior developers must mentor junior developers, that code reviews must evolve to include coaching on responsible AI use, and that organizations must invest in protecting “foundational skills.” The concern is framed as quality assurance. The effect is barrier maintenance. When barriers to production drop in any field, the incumbents who benefited from those barriers reliably frame the change as a quality crisis. Artists have made structurally identical arguments about AI-generated art. The claim is always the same: only those who have undergone the traditional formation can be trusted with the tools. The inconvenient detail is that the traditional formation is also the mechanism that limits supply, and limiting supply sustains incumbent compensation.
Judgment matters. That point is not in dispute. What is in dispute is where judgment matters most. The guild locates essential judgment at the level of individual code quality, where senior engineers inspect what junior developers produce. That is the chokepoint the guild controls. The more consequential judgment operates at the level of platform and system design: whether the workload belongs on a given piece of infrastructure, whether AI serves the organization at the code level or the orchestration level or both, and whether the configuration matches purpose or conforms to fashion. That judgment requires a maturity that has nothing to do with line-by-line code review and everything to do with implementing and understanding systems as integrated wholes.
Blanket caution about AI-assisted coding is not a strategy. Blanket caution is an anxiety looking for institutional permission to slow down. Every technology in the history of engineering has produced poor results when applied without understanding. That observation does not constitute an argument against technology or its use. That careless deployment fails in the absence of platform discipline is a point so obvious that it hardly requires borrowed catastrophes and decorative statistics in its defense.
Partnership, Not Supervision
The deeper error in the standard argument is the assumption that human judgment and AI capability must be arranged in a hierarchy of supervision, with humans above and AI below, humans inspecting and AI producing, humans governing and AI obeying.
A more considered position recognizes that human judgment and AI capability are not in competition and need not be arranged hierarchically at all. The two operate as a partnership in which each contributes what each distinctively offers. Human judgment governs purpose, context, and fitness for the problem at hand. AI operates at machine speed and scales beyond what even large teams of developers can produce. Neither assimilates the other. Neither replaces the other. Two distinct natures working as one, each faithful to what it is, producing together what neither could accomplish alone.
That partnership demands platform discipline rather than institutional caution. Deploying AI well requires matching capabilities to workloads, governing the full pipeline, and building internal capacity rather than outsourcing understanding to vendors whose revenue depends on the engagement continuing. Organizations that treat AI as a threat to be managed by the existing guild will find themselves outpaced by organizations that treat AI as a capability to be deployed through considered design.
The cost is not in the technology. The cost is in the failure to build the platform that makes the technology productive. That failure predates AI, has nothing specifically to do with AI, and will not be solved by sermons about technical debt that mistake productive leverage for reckless borrowing. The organizations that thrive will not be the ones that exercised the most caution. The organizations that thrive will be the ones that understood their systems well enough to know where AI compounds value, deployed it there with precision, and refused to let an anxious narrative substitute for sound judgment.
The opinions expressed in this article are those of the author and do not necessarily reflect the views of the author’s employer.