Lean proved this program correct; then I found a bug

A fuzzer discovered critical bugs in the Lean runtime and an unverified part of a formally verified zlib implementation, challenging common perceptions of

Score

Comments

Highest Rank

22h

on Front Page

First Seen

Apr 14, 1:00 AM

Last Seen

Apr 14, 10:00 PM

Rank Over Time

The Lowdown

This story details an experiment where a formally verified zlib implementation, lean-zip, was subjected to extensive fuzzing. The lean-zip project, touted as an end-to-end correct zlib by Lean's chief architect, was rigorously tested by the author using an AI agent (Claude) armed with fuzzing tools.

lean-zip was initially stripped of theorems, documentation, and C FFI bindings to focus solely on its native, verified Lean code.
Over 105 million fuzzing executions were performed across various attack surfaces of the library.
The experiment found zero memory vulnerabilities in the verified application code of lean-zip itself, highlighting its robustness.
However, two significant bugs were discovered: a heap buffer overflow in the Lean 4 runtime (lean_alloc_sarray), triggerable by a crafted ZIP file, and a denial-of-service vulnerability in lean-zip's unverified archive parser due to unchecked compressedSize values.
The runtime bug affects every Lean 4 program, showcasing a flaw in the 'trusted computing base.'
The DoS in the archive parser demonstrates the limitations of verification when not applied comprehensively to all parts of a system, particularly I/O and parsing logic.

Ultimately, while the core verified logic of lean-zip held up impressively, the findings underscore that formal verification is only as strong as its specifications and the integrity of its underlying runtime and unverified dependencies. The bugs lay outside the strict boundaries of what was explicitly proven, prompting reflection on the scope and implications of verification claims.

The Gossip

Title Tiff

Many commenters debated whether the story's title was misleading, arguing that the author found no bugs in the *formally proven* part of the `lean-zip` codebase. Critics called it 'clickbait,' suggesting it implied a flaw in the verification process itself rather than in the runtime or unverified components. The author, 'gopiandcode,' defended the title, asserting that from an end-user perspective, a bug in any part of the binary, especially if advertised as 'verified,' constitutes a failure, and clarified that the unverified parser *was* part of the `lean-zip` project.

Verification's Vulnerabilities

This theme explored the inherent limitations of formal verification. Commenters emphasized that verification is only as good as its specifications, highlighting the challenge of perfectly capturing human intention and anticipating all edge cases ('spec bugs'). The discussion pointed out that formal verification cannot guarantee correctness if the underlying 'trusted computing base' (like the Lean runtime, OS, or hardware) has flaws, or if critical components like parsers are left unverified. Many stressed that verification proves adherence to a *specific* specification, not general bug-freeness.

Philosophical Proof Problems

A significant philosophical debate emerged regarding the possibility of 'proving a negative' (e.g., proving a program has no bugs) and the implications of the Halting Problem. Some argued that proving the *absence* of bugs is impossible, leading to an infinite regress of verification. Others countered that 'proving a negative' is routine in mathematics and formal logic, and that the Halting Problem's limitations apply to *arbitrary* programs, not necessarily the specific, well-defined problems addressed by formal verification. The core idea is that while perfect certainty might be elusive, formal methods significantly reduce the scope for error.