np hard
2025 United States meta
21

Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity

METR, July 2025 — randomized trial. Developers were 19% slower with AI tools. They believed they were 20% faster. The instruments are lying.

METR (Model Evaluation and Threat Research) ran a rigorous randomized controlled trial of AI tooling on real software-engineering work. Sixteen experienced open-source developers, averaging five years of prior experience on the repositories they worked in, completed 246 real tasks from those repositories. Each task was randomly assigned to one of two conditions: AI-allowed (primarily Cursor Pro with Claude 3.5/3.7 Sonnet) or AI-forbidden. Result: tasks done with AI allowed took 19% longer (95% CI: +2% to +39%), the opposite of the predicted speedup. Critically, after the study the developers estimated that AI had cut their completion time by 20%, and outside experts were wrong in the same direction (economists predicted a 39% speedup; ML researchers, 38%). This is the first measurement in this domain that separates perceived productivity from measured productivity. The result does not show that AI slows down all developers; it shows a slowdown for experienced developers working on familiar codebases, exactly where global-invariant maintenance is most concentrated.
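The gap between perception and measurement is easier to see in minutes. A minimal sketch, assuming a hypothetical task that takes 60 minutes without AI and reading every percentage as a change in completion time (how the paper's abstract phrases them). `BASELINE_MIN` and `with_ai` are illustrative names, not anything from the study.

```python
# Worked arithmetic for the headline numbers; a sketch, not METR's analysis code.
# Assumption: a hypothetical task that takes 60 minutes with AI forbidden.

BASELINE_MIN = 60.0  # hypothetical AI-forbidden completion time, in minutes

MEASURED_CHANGE = +0.19   # measured: AI-allowed tasks took 19% longer
PERCEIVED_CHANGE = -0.20  # developers' post-study estimate: 20% less time

# Outside forecasts, read as predicted reductions in completion time.
FORECASTS = {
    "economists": -0.39,
    "ML researchers": -0.38,
}


def with_ai(baseline: float, change: float) -> float:
    """Completion time after applying a fractional change in time."""
    return baseline * (1.0 + change)


measured = with_ai(BASELINE_MIN, MEASURED_CHANGE)    # 71.4 min
perceived = with_ai(BASELINE_MIN, PERCEIVED_CHANGE)  # 48.0 min

# How much longer the task actually took than the developers believed it took.
gap_ratio = measured / perceived  # 1.19 / 0.80 = 1.4875

if __name__ == "__main__":
    print(f"measured:  {measured:.1f} min")
    print(f"perceived: {perceived:.1f} min")
    print(f"actual time exceeds perceived time by {gap_ratio - 1:.0%}")
    for who, change in FORECASTS.items():
        print(f"{who} forecast: {with_ai(BASELINE_MIN, change):.1f} min")
```

Under these assumptions the task really took about 49% longer than the developers believed it had, so the self-report is off in magnitude as well as in sign.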

METR (2025). Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity. arXiv:2507.09089.