22
The Thesis Software maintenance is at least NP-hard. AI helps the easy parts. Safety-critical software is the existence proof. The bill is coming due.
This book argues a single load-bearing claim: software maintenance is at least NP-hard, and several of its core subproblems are formally undecidable. The argument has three layers. First, the deepest maintenance questions — does this refactor preserve behavior, are these two versions equivalent — are undecidable by Rice's theorem, descending from Turing's halting result. Second, the practical maintenance subproblems are NP-hard or NP-complete: register allocation, test suite minimization, module clustering, package dependency resolution, refactoring optimization, type inference for rich systems. Third, the meta-problem inherits the hardness of its hardest subproblem. AI is a heuristic. Heuristics do not change problem complexity. They change which instances are handled well in practice. AI handles the local instances; the global ones it degrades. The empirical evidence — Stanford CCS, GitClear, DORA, METR — is what that mismatch looks like at scale. Safety-critical software is the existence proof that the trap can be escaped, at the cost of two to three orders of magnitude more effort per line of code. The industry has refused to pay that cost. The bill is coming due.
Specification: Gauger, A. (2026). NP-Hard — Thesis. Foundation references: Turing 1936, Rice 1953, Cook 1971, Karp 1972, Lehman 1980, Brooks 1986, Klein et al. 2009, METR 2025. Source → 🇺🇸 Cultural context · United States
The present moment. The discourse around AI productivity is at peak intensity. The empirical evidence has accumulated to the point that it can be summarized in a single argument. The industry's investment in AI tools has accumulated to the point that reversing course would be politically costly. The two trajectories are not yet visibly intersecting. This book argues they will, and the intersection will be unpleasant. Whether the intersection is anticipated and managed, or arrived at by collision, depends on whether the discipline Hamilton named is remembered before the bill arrives.
In plain terms
Every page of this book exists to support one sentence: software maintenance is provably hard, AI does not break the proof, and the industry is now experiencing the consequences.
The proof comes in three layers.
The first layer is undecidability. Turing in 1936 showed that no algorithm can decide whether an arbitrary program halts. Rice in 1953 generalized this to every non-trivial behavioral property. The questions a maintainer most needs to answer — does this change preserve correctness, does this version behave like the previous one, is this refactoring safe — are exactly the kind of questions Rice ruled out. No tool, no AI, no formal method makes those questions decidable in the general case. We can answer them on specific programs by writing specific proofs. We cannot answer them in general.
The second layer is NP-hardness. Once we relax the question from "is this correct" to "is this good " along measurable axes, the literature has spent forty years proving the resulting optimization problems are NP-hard. Test suite minimization. Module clustering. Package dependency resolution. Register allocation. Optimal refactoring. Type inference for sufficiently rich systems is undecidable; for the tractable variants it is NP-complete. Every modern package manager runs a SAT solver internally because Di Cosmo proved it had to. Every compiler approximates an NP-complete problem on every build because Chaitin proved it had to. Software maintenance is the simultaneous solution of all these subproblems under conflicting constraints.
The third layer is inheritance. A meta-problem inherits the complexity of its hardest subproblem. Multiple of the constituent subproblems are NP-hard. Some are undecidable. Software maintenance is at least NP-hard.
Now the corollary the book is actually about. NP-hardness is a property of the problem , not the solver . A neural network is a solver. A SAT solver is a solver. Heuristics are solvers. None of them change the problem's complexity — they change which instances they handle well in practice.
LLMs are very good local solvers. They predict the next token from context. Maintenance is a global problem — preserve invariants across the whole codebase under change. Local solvers struggle with global constraints, and the empirical research is exactly what that mismatch looks like. Stanford CCS '23: AI-assisted code is less secure, and developers believe the opposite. GitClear 2024 and 2025: code added up, refactored down, churn doubling, duplication eight-fold. DORA 2024: throughput slightly up, stability down 7.2%. METR 2025: experienced developers 19% slower, while reporting 20% faster.
The other half of the book is the existence proof that we do know how to escape the NP-hard maintenance trap. Hoare gave us the proof method. Hamilton named the discipline. The Power of 10 codified the rules. CompCert proved a compiler. seL4 proved a kernel. Tokeneer proved an authentication system. DO-178C is the regulatory regime that requires the discipline for civil aviation. These projects work. They produce software that does not degrade. The cost is two to three orders of magnitude more effort per line of code.
That cost is the price of escaping NP-hardness through proof rather than through testing-and-hope. Safety-critical software pays it because the alternative is dead patients, dead pilots, dead crews. The rest of the industry has refused to pay it because the alternative was a refunded subscription.
The asymmetry is the new one. AI tools push the cost of writing untrusted code toward zero. They do not push the cost of proving code toward zero. They make the cheap path cheaper and leave the expensive path expensive. The gap widens. Maintenance debt compounds. Stability falls. Confidence rises faster than competence. And on the day the cumulative debt rolls over the cumulative productivity gain, the industry will discover what the safety-critical engineers have known since Therac-25: the discipline is not optional. It is what makes software a profession.
Maintenance is provably hard. AI helps the easy parts. The bill is coming due.