Leanstral 1.5: Proof Abundance for All
Mistral AI has released Leanstral 1.5, an AI model for formal verification in Lean 4 that advances automated theorem proving and code verification. The Apache-2.0 licensed model reports state-of-the-art results on several benchmarks and has uncovered previously unreported bugs in open-source code. Its open weights and practical focus make advanced formal methods more accessible, which fits the HN audience's interest in cutting-edge AI applications and open-source contributions.
The Lowdown
Mistral AI announces Leanstral 1.5, a significant upgrade to its open and practical AI model for proof engineering in Lean 4. This new version pushes the boundaries of formal verification, offering enhanced performance and accessibility for both mathematical proofs and real-world code analysis.
- Model Architecture: Leanstral 1.5 is a free, Apache-2.0 licensed model featuring 119B total parameters, with only 6B active, designed for efficiency and powerful proof generation.
- Benchmark Performance: It achieves state-of-the-art results, including saturating miniF2F, solving 587 out of 672 PutnamBench problems, and setting new records on FATE-H (87%) and FATE-X (34%).
- Advanced Training Regimen: Training runs in three stages: mid-training, supervised fine-tuning, and reinforcement learning with CISPO. It uses two environments: a multiturn environment with Lean compiler feedback, and a code agent environment for file editing and bash commands.
- Evaluation Suite: The model is evaluated on miniF2F (formal mathematics), PutnamBench (challenging math problems), FATE-H/X (advanced abstract algebra), and FLTEval (practical proof engineering from real pull requests).
- Test-Time Scaling: Performance scales with the token budget: a larger budget yields more solved problems, which suggests the model can sustain reasoning over millions of tokens.
- Code Verification Capabilities: Beyond mathematics, it successfully proved the O(log n) time complexity for insertions and deletions in an AVL tree implementation, requiring complex structural induction and monadic time tracking.
- Real-World Bug Discovery: An automated pipeline built on Leanstral translated code from 57 open-source Rust repositories to Lean and inferred properties to check. It found 11 genuine bugs, 5 of them previously unreported, including subtle errors such as an integer overflow.
- Open-Source & Accessible: The model's weights are available on Hugging Face, and the model is also offered through a free API endpoint, with clear instructions for integration into the Mistral Vibe environment.
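The multiturn setup described above, in which Lean compiler feedback is returned to the model between attempts, can be sketched as a simple loop. This is only an illustration under stated assumptions: `prove_with_feedback`, `CheckResult`, `toy_propose`, and `toy_check` are hypothetical names, the two toy functions merely stand in for the model and the Lean compiler, and none of this reflects Mistral's actual training or inference code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool       # True if the candidate proof was accepted
    message: str   # diagnostics handed back to the prover on failure


def prove_with_feedback(
    statement: str,
    propose: Callable[[str, str], str],
    check: Callable[[str, str], CheckResult],
    max_turns: int = 4,
) -> Optional[str]:
    """Ask for a proof, check it, and feed errors back until the budget runs out."""
    feedback = ""
    for _ in range(max_turns):
        candidate = propose(statement, feedback)
        result = check(statement, candidate)
        if result.ok:
            return candidate
        feedback = result.message  # the next attempt sees the error
    return None


# Hypothetical stand-ins: a real system would call a model and the Lean compiler.
def toy_propose(statement: str, feedback: str) -> str:
    # Try one tactic first; after any error, switch to another.
    return "by simp" if not feedback else "by omega"


def toy_check(statement: str, proof: str) -> CheckResult:
    # Pretend that only the second tactic closes the goal.
    if proof == "by omega":
        return CheckResult(True, "")
    return CheckResult(False, "error: simp made no progress")


if __name__ == "__main__":
    goal = "theorem t (a b : Nat) : a + b = b + a"
    print(prove_with_feedback(goal, toy_propose, toy_check, max_turns=1))  # None
    print(prove_with_feedback(goal, toy_propose, toy_check, max_turns=4))  # by omega
```

With these stubs, one turn is not enough and a larger budget succeeds, which is a toy version of the test-time scaling point: more budget can turn a failed attempt into a solved problem.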
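To make the integer overflow mentioned above concrete, here is a small Lean 4 illustration of the kind of behaviour a translated property could expose. The source does not describe the actual bug, so `midNaive` and `midSafe` are hypothetical examples chosen for illustration, not code from any of the audited repositories.

```lean
-- Fixed-width addition wraps around: 200 + 100 does not fit in 8 bits.
#eval (200 : UInt8) + 100   -- 44  (300 mod 256)

-- A naive midpoint adds first, so the sum can wrap before the division.
def midNaive (lo hi : UInt8) : UInt8 := (lo + hi) / 2

-- Rearranged midpoint: the intermediate value never exceeds `hi` when lo ≤ hi.
def midSafe (lo hi : UInt8) : UInt8 := lo + (hi - lo) / 2

#eval midNaive 200 250  -- 97   (450 wraps to 194, then halves)
#eval midSafe 200 250   -- 225  (the expected midpoint)
```

A property such as "the midpoint lies between `lo` and `hi`" holds for `midSafe` when `lo ≤ hi` but fails for `midNaive` on the inputs shown, which is the sort of discrepancy an inferred specification can surface.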
Leanstral 1.5 marks a notable advancement in automated formal verification, combining cutting-edge AI with practical applicability. Its ability to solve complex mathematical problems and identify critical bugs in real-world code underscores its potential to make rigorous formal methods a more integral part of software development and research.