2009 France meta

CompCert — A Formally Verified C Compiler

Xavier Leroy, INRIA, 2009 — a C compiler whose correctness is mechanically proved in Coq. 100,000 lines of proof for ~10,000 lines of compiler. The proof tax made visible.

Xavier Leroy, at INRIA Rocquencourt, led the development of CompCert — a moderately optimizing C compiler whose semantic preservation property is formally proved in the Coq proof assistant. The proof guarantees that any safety property holding of the source program also holds of the compiled assembly: no compiler bug can introduce a vulnerability that was not already present in the source. Targets PowerPC, ARM, x86, and RISC-V. Subset of C is Clight, a large fragment. The proof is roughly 100,000 lines of Coq for a compiler of roughly 10,000 lines of code, representing approximately six person-years of effort. CompCert is now used commercially in safety-critical certification, especially for software targeting DO-178C Level A. The point is not that the compiler is fast (it is competitive but not the fastest); the point is that, alone among production C compilers, it does not have miscompilation bugs.

In plain terms

Most compilers have miscompilation bugs. This is one of the dirty secrets of systems software. Csmith — a randomized C-program generator — finds dozens of latent miscompilation bugs in GCC and Clang every year by feeding them random programs and comparing outputs across versions. These are bugs where the source code has one meaning and the compiler emits assembly with a different meaning. Most of them are caught before they ship. Some are not. When a critical bug ends up in production, the patch goes out and the world moves on. This is fine for ordinary software. It is not fine for software that flies airplanes, drives medical devices, or controls nuclear reactors. For those domains, an undetected miscompilation could kill people. And it has — Therac-25 was not a miscompilation, but the lesson is the same: software that has not been proven correct cannot be assumed correct, and ad-hoc testing is not proof. CompCert is the answer. Xavier Leroy and his collaborators wrote a C compiler in Coq, and along with the compiler they wrote a proof — a hundred thousand lines of mechanically checked argument — that the compiler cannot miscompile. The proof is of a property called semantic preservation: for every input C program, the assembly the compiler emits behaves exactly as the C program is specified to behave. Not "almost exactly." Not "with high probability." Exactly. The proof is checked by Coq — a small kernel of trusted code, an order of magnitude smaller than the compiler itself — and if the proof checks, the compiler is correct. The cost was six person-years for ten thousand lines of compiler. That is ten times the engineering effort of building an unverified compiler of comparable scope. And CompCert is not as aggressive an optimizer as GCC at -O3 — though it is competitive at -O1. This is the proof tax. You pay it in years. You get back software that does not lie. CompCert is now sold commercially by AbsInt. It is used to compile flight-control software for civil aviation under DO-178C Level A — the highest assurance level, where a software failure can cause loss of life. In that domain, the proof tax is worth it. In most domains it is not. And that is the asymmetry the rest of the book turns on: when human life is at risk, the industry knows how to write software that does not degrade. When it is not, the industry does not bother. 100,000 lines of proof. One compiler that does not lie.