20
2024 DORA Accelerate State of DevOps Report, Google Cloud, October 2024: a 25% rise in AI adoption is associated with a 7.2% drop in delivery stability. The trade is now visible at industry scale.
The 2024 Accelerate State of DevOps Report, published by Google Cloud and the DORA research team, surveyed approximately 39,000 software professionals worldwide. The headline finding: every 25% increase in self-reported AI tool adoption was associated with an estimated 1.5% decrease in software delivery throughput and a 7.2% decrease in software delivery stability, even as developers reported a 2.1% increase in productivity and a 2.6% increase in job satisfaction from the same rise in adoption. Some 39% of respondents reported "little to no trust" in AI-generated code. The report explicitly frames this as a stability cost: AI accelerates work that would have happened anyway, but the quality cost shows up in the systems where the work lands. It is industry-scale corroboration of the GitClear and Stanford findings, in the most-cited annual measurement of software-delivery health.
DORA Research Team & Google Cloud (2024). 2024 Accelerate State of DevOps Report: Impact of Generative AI in Software Development. Google Cloud.
🇺🇸 Cultural context · United States
Google in 2024 was internally betting heavily on AI as the engine of the next decade of growth, while running the most-cited measurement of software-delivery health on the planet. The DORA program — originally led by Nicole Forsgren before its acquisition by Google — had a methodology hardened by a decade of annual reports and a research team accustomed to publishing what the data showed. The 2024 report's AI findings were reported alongside marketing claims of AI productivity — an internal tension visible in the document itself. The methodology is rigorous; the framing is corporate. The reader is left to weigh both.
In plain terms
DORA is the longest-running and most-cited measurement of how software organizations actually perform. Started by Nicole Forsgren, Jez Humble, and Gene Kim in the early 2010s, the annual State of DevOps report surveys tens of thousands of software professionals on a small set of well-validated outcome metrics: how often do you deploy, how long does it take a change to reach production, what fraction of changes cause failures, how long does it take to recover from a failure. These are the metrics that distinguish high-performing teams from low-performing ones, and they have been measured consistently for a decade.
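For concreteness, here is a minimal sketch of those four metrics as data, split into the throughput and stability groupings DORA uses. The field names and units are choices made for this sketch, not the report's survey instrument.

```python
from dataclasses import dataclass

# Illustrative sketch of the four DORA key metrics; names and units are
# this sketch's choices, not the wording of the DORA survey itself.

@dataclass
class DoraMetrics:
    deploy_frequency_per_week: float   # how often changes reach production
    lead_time_hours: float             # commit to running in production
    change_failure_rate: float         # fraction of changes causing failures (0..1)
    time_to_restore_hours: float       # time to recover from a failed change

    def throughput_signals(self) -> dict[str, float]:
        """The two metrics DORA groups under delivery throughput."""
        return {
            "deploy_frequency_per_week": self.deploy_frequency_per_week,
            "lead_time_hours": self.lead_time_hours,
        }

    def stability_signals(self) -> dict[str, float]:
        """The two metrics DORA groups under delivery stability."""
        return {
            "change_failure_rate": self.change_failure_rate,
            "time_to_restore_hours": self.time_to_restore_hours,
        }
```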
The 2024 report was the first to seriously study AI tooling. The result was the most carefully measured, statistically defensible version of what GitClear had observed and what the Stanford CCS study had found in the lab. At industry scale, more AI use is associated with more stability problems.
The numbers are worth memorizing. A 25% rise in AI adoption, meaning roughly a quarter of a team starting to use AI tools who were not using them before, is associated with a 1.5% drop in delivery throughput and a 7.2% drop in delivery stability. Throughput is roughly how fast changes ship. Stability is roughly how often shipped changes break things. Throughput moving slightly down is notable. Stability moving down 7.2% is the signal.
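To make the magnitudes concrete, a minimal sketch of the arithmetic. The per-25% point estimates are the report's; scaling them linearly to other adoption increases is an assumption of the sketch, not a claim the report makes.

```python
# Point estimates from the DORA 2024 report, per 25% rise in AI adoption.
THROUGHPUT_DELTA_PER_25PCT = -1.5   # % change in delivery throughput
STABILITY_DELTA_PER_25PCT = -7.2    # % change in delivery stability

def projected_shift(adoption_increase_pct: float) -> dict[str, float]:
    """Project throughput/stability shifts for a given rise in AI adoption.

    Linear scaling beyond the reported 25% step is an assumption of this
    sketch, not something the report asserts.
    """
    scale = adoption_increase_pct / 25.0
    return {
        "throughput_pct": THROUGHPUT_DELTA_PER_25PCT * scale,
        "stability_pct": STABILITY_DELTA_PER_25PCT * scale,
    }

if __name__ == "__main__":
    for rise in (25, 50):
        shift = projected_shift(rise)
        print(f"+{rise}% adoption -> throughput {shift['throughput_pct']:+.1f}%, "
              f"stability {shift['stability_pct']:+.1f}%")
```

The point the asymmetry makes is visible in the output: the stability term dominates the throughput term at every level of adoption.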
Why the asymmetry? Because writing new code is a local activity, and AI is good at it. Stable delivery is a global activity — it depends on the new code interacting cleanly with everything that already exists, on regression tests catching problems, on rollbacks working when they have to. AI tools accelerate the local part. They do nothing to help the global part. The result is faster commits with more incidents downstream.
The 39% "little to no trust" figure is the other one to remember. Even among teams that are using these tools, the developers themselves are reporting that they do not fully trust the output. This is the opposite of what the Stanford study found in the lab — but the Stanford finding was about security tasks, where users overestimated their AI-assisted work. The DORA finding is about general trust, and is properly skeptical. The combined picture: developers report distrust on average, but in specific high-stakes contexts they trust the tool more than they should.
DORA is the keystone of the empirical case. It is the cross-industry measurement that connects the laboratory results (Stanford) to the codebase-instrumentation results (GitClear) and to the per-developer time-cost measurement (METR, next chapter). The trade is real. The trade is measured. The trade is being made anyway.
Throughput slightly down. Stability down 7.2%. The trade is visible.