Lean proved this program correct; then I found a bug
A fuzzer discovered critical bugs in the Lean runtime and an unverified part of a formally verified zlib implementation, challenging common perceptions of
The Lowdown
This story details an experiment where a formally verified zlib implementation, lean-zip, was subjected to extensive fuzzing. The lean-zip project, touted as an end-to-end correct zlib by Lean's chief architect, was rigorously tested by the author using an AI agent (Claude) armed with fuzzing tools.
lean-zipwas initially stripped of theorems, documentation, and C FFI bindings to focus solely on its native, verified Lean code.- Over 105 million fuzzing executions were performed across various attack surfaces of the library.
- The experiment found zero memory vulnerabilities in the verified application code of
lean-zipitself, highlighting its robustness. - However, two significant bugs were discovered: a heap buffer overflow in the Lean 4 runtime (
lean_alloc_sarray), triggerable by a crafted ZIP file, and a denial-of-service vulnerability inlean-zip's unverified archive parser due to uncheckedcompressedSizevalues. - The runtime bug affects every Lean 4 program, showcasing a flaw in the 'trusted computing base.'
- The DoS in the archive parser demonstrates the limitations of verification when not applied comprehensively to all parts of a system, particularly I/O and parsing logic.
Ultimately, while the core verified logic of lean-zip held up impressively, the findings underscore that formal verification is only as strong as its specifications and the integrity of its underlying runtime and unverified dependencies. The bugs lay outside the strict boundaries of what was explicitly proven, prompting reflection on the scope and implications of verification claims.
The Gossip
Title Tiff
Many commenters debated whether the story's title was misleading, arguing that the author found no bugs in the *formally proven* part of the `lean-zip` codebase. Critics called it 'clickbait,' suggesting it implied a flaw in the verification process itself rather than in the runtime or unverified components. The author, 'gopiandcode,' defended the title, asserting that from an end-user perspective, a bug in any part of the binary, especially if advertised as 'verified,' constitutes a failure, and clarified that the unverified parser *was* part of the `lean-zip` project.
Verification's Vulnerabilities
This theme explored the inherent limitations of formal verification. Commenters emphasized that verification is only as good as its specifications, highlighting the challenge of perfectly capturing human intention and anticipating all edge cases ('spec bugs'). The discussion pointed out that formal verification cannot guarantee correctness if the underlying 'trusted computing base' (like the Lean runtime, OS, or hardware) has flaws, or if critical components like parsers are left unverified. Many stressed that verification proves adherence to a *specific* specification, not general bug-freeness.
Philosophical Proof Problems
A significant philosophical debate emerged regarding the possibility of 'proving a negative' (e.g., proving a program has no bugs) and the implications of the Halting Problem. Some argued that proving the *absence* of bugs is impossible, leading to an infinite regress of verification. Others countered that 'proving a negative' is routine in mathematics and formal logic, and that the Halting Problem's limitations apply to *arbitrary* programs, not necessarily the specific, well-defined problems addressed by formal verification. The core idea is that while perfect certainty might be elusive, formal methods significantly reduce the scope for error.