2009 Australia meta

seL4 — Formal Verification of an OS Kernel

Gerwin Klein et al., NICTA, 2009 — first formally verified general-purpose OS microkernel. ~10,000 lines of C, verified down to assembly. About twenty-five person-years of proof.

Gerwin Klein, Kevin Elphinstone, Gernot Heiser, and a team at NICTA in Sydney delivered the first machine-checked correctness proof of a general-purpose operating-system microkernel. seL4 is approximately 10,000 lines of C, verified using the Isabelle/HOL theorem prover via two large refinement steps: an abstract specification refined to an executable Haskell prototype, refined to the C implementation, ultimately related to the compiled binary. The proof establishes functional correctness: the kernel never violates its specification, never crashes, never violates memory safety, and enforces the access-control properties it claims. Effort: roughly twenty-five person-years of proof for ten thousand lines of code, a ratio of roughly 200,000 lines of proof per 10,000 lines of code. seL4 now ships in safety-critical and security-critical systems including military aviation and autonomous-vehicle platforms. Existence proof: a real OS kernel can be proved correct.

In plain terms

An operating-system kernel is the part of the system that everything else trusts. When the kernel has a bug, every program running on top of it inherits the bug. Every modern OS kernel — Linux, Windows NT, Darwin — has thousands of CVEs filed against it over its lifetime. This is normal. Kernels are large, complex, written in C, and impossible to fully test. seL4 is not normal. It is the first general-purpose OS microkernel to be proved correct, in the formal sense — meaning a mathematical proof, mechanically checked, that the kernel meets its specification. Not "we tested it a lot and it seems to work." A proof. The team at NICTA in Sydney spent roughly twenty-five person-years on this. The kernel itself is about ten thousand lines of C. The proof is about two hundred thousand lines of Isabelle/HOL. Twenty lines of proof per line of code. That is the cost. What does the proof guarantee? Functional correctness — the kernel never deviates from its specification. Memory safety — no buffer overflows, no use-after-free, no NULL dereferences, ever. Termination of every kernel call. Information-flow security — no covert channels between processes. The proof goes from the abstract specification, through a Haskell-like executable specification, through the C implementation, to the compiled assembly. Every step is mechanically related. If a bug existed in the kernel, the proof would not check. seL4 ships today in real systems. Boeing uses it in autonomous-vehicle platforms. Lockheed uses it in military aviation. The HACMS program (DARPA, mid-2010s) used seL4 in an unmanned helicopter and red-teamed it against attackers — the attackers could not break out of seL4-isolated processes even with full knowledge of the source. The lesson is the same as CompCert. We can write software that does not degrade. The technology exists. Hoare gave us the language. Coq and Isabelle give us the checkers. CompCert and seL4 are the existence proofs at the size of real systems. What we cannot do is afford to do this for all software. The proof tax is too high for software whose failures are inconvenient rather than fatal. This is the asymmetry the AI era is making worse. AI tools push the cost of writing untrusted code toward zero. They do not push the cost of proving code toward zero. They make the cheap path cheaper and leave the expensive path expensive. The gap widens. Twenty-five person-years. Ten thousand lines. The price of trust.