19
Coding on Copilot — Downward Pressure on Code Quality
Bill Harding, GitClear, January 2024 — analyzed 153 million lines of code. Code churn doubling. AI-era code resembles an itinerant contributor.
Bill Harding, founder of GitClear, published the first large-scale empirical analysis of code-quality trends across the Copilot adoption period. GitClear instrumented 153 million lines of changed code from 2020 through 2023, distinguishing among added, deleted, updated, moved, and copy-pasted lines, and tracking churn — the percentage of authored lines reverted or rewritten within two weeks. Findings: churn projected to double by 2024 versus the 2021 pre-AI baseline; the ratio of added and copy-pasted code rising; the ratio of updated, moved, and deleted code falling. Harding's framing: AI-era code resembles "an itinerant contributor" — someone who shows up, adds material, never refactors, and leaves. The 2025 follow-up (GitClear, 211 million lines analyzed) reported an eight-fold year-over-year increase in five-line-or-more duplicate code blocks. First industry-scale measurement of the maintenance-side cost of AI coding tools.
Harding, W. & Kloster, M. (2024). Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality. GitClear Research Report.
Follow-up: Harding, W. (2025). AI Copilot Code Quality: 2024 Data Suggests 4× Growth in Code Clones. GitClear Research Report.
🇺🇸 Cultural context · United States
The post-pandemic American software industry was deep into the AI productivity narrative. GitHub Copilot had been generally available for a year and a half; major employers were adopting it as a default. Industry self-reports were uniformly positive. GitClear was a small Seattle company whose business — granular code-change analytics — gave it instrumentation that nobody else had. Harding's report dropped into a discourse that had not been challenged with hard data, and it was widely cited within weeks. The 2025 follow-up sharpened the result with another year of data and confirmed the trajectory.
In plain terms
Most claims about AI productivity are based on developer self-reports. Surveys. "Did you feel faster this week?" These are unreliable for the same reason all self-reports are unreliable — people remember the wins and forget the losses, and they want to like the tools their employer paid for. GitClear did not run a survey. GitClear measured what the code did.
The instrumentation is the interesting part. Most code-metrics tools count lines. GitClear classifies change types. When you commit a line, did you write it from scratch (added)? Did you tweak an existing line (updated)? Did you move a chunk of logic from one file to another without changing it (moved)? Did you delete it? Did you copy it from somewhere else in the codebase (copy-pasted)? Each category tells a different story about the engineering activity behind the commit.
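To make the taxonomy concrete, here is a minimal sketch of how a classifier in this spirit might bucket a single commit's diff. This is not GitClear's actual algorithm; the heuristics (an introduced line that exactly matches a removed line is moved, a fuzzy match is updated, an exact match elsewhere in the codebase is copy-pasted) and the name classify_changes are assumptions for illustration.

    import difflib
    from collections import Counter

    def classify_changes(removed, introduced, rest_of_codebase, cutoff=0.6):
        """Bucket one commit's introduced lines as added / updated / moved /
        copy-pasted, and unmatched removed lines as deleted. A heuristic
        sketch, not GitClear's real classifier."""
        counts = Counter()
        unmatched_removed = list(removed)
        existing = set(rest_of_codebase)
        for line in introduced:
            if line in unmatched_removed:
                # Identical line also removed in this commit: relocated, not rewritten.
                unmatched_removed.remove(line)
                counts["moved"] += 1
                continue
            close = difflib.get_close_matches(line, unmatched_removed, n=1, cutoff=cutoff)
            if close:
                # Similar to a removed line: an in-place edit of existing logic.
                unmatched_removed.remove(close[0])
                counts["updated"] += 1
            elif line.strip() and line in existing:
                # Identical non-blank line already lives elsewhere in the codebase.
                counts["copy-pasted"] += 1
            else:
                counts["added"] += 1
        counts["deleted"] += len(unmatched_removed)
        return counts

The point of classifying rather than counting is visible in the return value: two commits of identical size can produce opposite profiles, one all updated and moved, the other all added and copy-pasted.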
Healthy codebases have a mix. Updates and moves dominate over time, because mature systems mostly evolve their existing structure rather than bolting on new structure. Deleted code is a sign of refactoring — the team is removing things that are no longer needed. Added and copy-pasted code rising relative to updated and moved code is a warning sign: the team is bolting on instead of integrating.
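Given change-type counts like those from the sketch above (a Counter, so missing categories read as zero), the warning sign reduces to one ratio. The metric name accretion_ratio is invented here for illustration:

    def accretion_ratio(counts):
        """Share of change volume that bolts material on (added + copy-pasted)
        relative to work that integrates or prunes the existing structure
        (updated + moved + deleted). A hypothetical metric; rising is the warning."""
        bolted_on = counts["added"] + counts["copy-pasted"]
        integrated = counts["updated"] + counts["moved"] + counts["deleted"]
        return bolted_on / max(integrated, 1)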
GitClear's finding, across 153 million lines and four years of data, is that the warning sign is now lit. The Copilot-era ratio shifted decisively toward added and copy-pasted, decisively away from updated, moved, and deleted. The team is not refactoring. The team is accreting. And the churn number — lines that are reverted or substantially rewritten within two weeks of being committed — is rising fast. Churn is the empirical signature of code that was wrong on first commit and had to be redone.
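Churn itself is a simple window test once per-line history has been reconstructed. A hedged sketch, assuming each authored line arrives as a pair of timestamps (authored, replaced-or-None); reconstructing those pairs from git history is the hard part, and is elided here:

    from datetime import timedelta

    CHURN_WINDOW = timedelta(days=14)

    def churn_rate(line_events):
        """Fraction of authored lines reverted or substantially rewritten
        within two weeks. `line_events` yields (authored_at, replaced_at)
        pairs, with replaced_at None for lines still alive."""
        authored = churned = 0
        for authored_at, replaced_at in line_events:
            authored += 1
            if replaced_at is not None and replaced_at - authored_at <= CHURN_WINDOW:
                churned += 1
        return churned / authored if authored else 0.0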
The 2025 follow-up sharpened the result. Five-line-or-more duplicate blocks grew eight-fold year over year. Not eight percent. Eight times. The DRY principle, which has been the load-bearing axiom of software engineering for fifty years, is being violated at industrial scale. And the violations are happening because the AI tool, on each call, regenerates code from scratch with no awareness of what already exists in the codebase. Each call is an itinerant contributor.
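Detecting those clones is mechanically trivial, which is part of what makes the finding uncomfortable. A rough sketch of a five-line sliding-window clone counter, not GitClear's method; files is assumed to map path to a list of source lines:

    from collections import defaultdict

    def duplicate_blocks(files, window=5):
        """Count distinct `window`-line blocks that appear verbatim more than
        once across a codebase. A crude stand-in for real clone detection."""
        seen = defaultdict(int)
        for lines in files.values():
            normalized = [ln.strip() for ln in lines]
            for i in range(len(normalized) - window + 1):
                block = tuple(normalized[i : i + window])
                if any(block):  # skip windows that are entirely blank
                    seen[block] += 1
        return sum(1 for count in seen.values() if count > 1)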
This is the Lehman entropy result of chapter nine, observed empirically in 2024. Software degrades unless explicit work is done to reduce its complexity. AI tools are doing the opposite of that work. They are adding faster than they are integrating. And the maintenance bill compounds.
Add code. Do not refactor. The itinerant contributor signs every commit.