25 Years of Eggs
A hacker meticulously documented 25 years of household receipts, then set modern AI agents loose on the archive to extract structured data, revealing his egg-buying habits. The project showcases the power of combining specialized AI models with intelligent orchestration for tackling real-world, messy data challenges. This deep dive into practical AI application and personal data archaeology is a quintessential HN story.
The Lowdown
A dedicated individual, who has diligently scanned every household receipt since 2001, embarked on a project to analyze this vast personal dataset using modern AI. The goal: track every egg purchase across 11,345 receipts. This endeavor highlighted both the incredible capabilities and the stubborn limitations of current AI tools when faced with real-world data.
- Initial AI Orchestration: The author initiated the project by tasking two AI coding agents, Codex and Claude, with the challenge. These agents quickly explored the file system, discovered the trove of receipts, and generated a project plan, enabling rapid development with minimal human input.
- Image Segmentation Breakthrough: Early attempts to segment receipts from multi-receipt flatbed scans using traditional computer vision techniques failed due to the "shades of white" problem. Meta's SAM3, however, proved highly effective, accurately identifying receipt boundaries with a single API call, outperforming hours of classical CV development.
- Streamlined Orientation: Rather than building complex pipelines to orient rotated receipts for OCR, the author discovered that simply feeding the unoriented images directly to Claude and Codex allowed them to correctly interpret the text, effectively bypassing a significant technical hurdle.
- Next-Gen OCR Adoption: Tesseract was identified as a major bottleneck, with poor accuracy on old thermal prints. It was replaced by PaddleOCR-VL, a locally run vision-language model. After implementing a dynamic slicing solution for tall receipts, this new OCR engine successfully processed all 11,345 receipts overnight, yielding significantly cleaner text.
- Robust Structured Extraction: Initial regex-based attempts to extract egg data proved inadequate. The author pivoted to feeding every receipt through Codex and Claude, which autonomously developed a robust, parallelized extraction pipeline with features like sharding, checkpointing, and auto-switching between models when token limits were hit.
- LLM-Powered Classification: To ensure high accuracy, Claude built a Flask-based labeling tool that let the author hand-label ground truth. An LLM classifier, evaluated against this labeled data, achieved over 99% accuracy in identifying egg receipts, even surfacing errors in the human-generated ground truth.
- Iterative Quality Improvement: The project demonstrated that real-world data is inherently messy (e.g., folder typos, mirrored scans, email parsing quirks). However, the ability to iteratively show AI agents issues and have them apply fixes globally proved crucial for achieving high data quality.
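The segmentation step above reduces, once a model like SAM has produced per-receipt masks, to cropping each masked region out of the flatbed scan. A minimal sketch of that cropping stage, assuming boolean mask arrays as a segmentation model would return them (function names here are illustrative, not the author's code):

```python
import numpy as np

def mask_to_bbox(mask: np.ndarray) -> tuple[int, int, int, int]:
    """Return (top, left, bottom, right) bounds of the True region of a mask."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    top, bottom = np.where(rows)[0][[0, -1]]
    left, right = np.where(cols)[0][[0, -1]]
    return int(top), int(left), int(bottom) + 1, int(right) + 1

def crop_receipts(scan: np.ndarray, masks: list[np.ndarray]) -> list[np.ndarray]:
    """Crop one image per mask out of a multi-receipt flatbed scan."""
    crops = []
    for mask in masks:
        top, left, bottom, right = mask_to_bbox(mask)
        crops.append(scan[top:bottom, left:right])
    return crops
```

This is exactly the part where classical thresholding fails on white-on-white scans: the hard step is producing good masks, which is what the segmentation model solved.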
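The "dynamic slicing" fix for tall receipts, whatever its exact form in the project, amounts to cutting an image that exceeds the OCR model's height limit into overlapping windows, so a line of text straddling a cut still appears whole in at least one slice. A hypothetical sketch:

```python
def slice_heights(height: int, max_h: int, overlap: int) -> list[tuple[int, int]]:
    """Split a tall image of `height` pixels into (start, stop) windows of at
    most `max_h` pixels, each overlapping the previous window by `overlap`
    pixels so text rows crossing a cut survive in one slice."""
    if height <= max_h:
        return [(0, height)]
    step = max_h - overlap
    windows = []
    start = 0
    while True:
        stop = min(start + max_h, height)
        windows.append((start, stop))
        if stop == height:
            return windows
        start += step
```

Each window is then OCR'd independently and the overlapping text is deduplicated when stitching results back together.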
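The sharded, checkpointed extraction pipeline the agents built isn't shown in the post, but the core resilience idea is simple: deterministically assign each receipt to a shard, and record finished IDs so a crashed (or model-switched) run resumes without reprocessing. A sketch under those assumptions; the function and file names are hypothetical:

```python
import hashlib
import json
from pathlib import Path

def shard_of(receipt_id: str, num_shards: int) -> int:
    """Deterministically map a receipt ID to a shard (stable across runs)."""
    digest = hashlib.sha256(receipt_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

def process_shard(receipt_ids, shard, num_shards, checkpoint: Path, extract):
    """Run `extract` over this shard's receipts, skipping IDs already recorded
    in the checkpoint file, and append each finished ID as a JSON line."""
    done = set()
    if checkpoint.exists():
        done = {json.loads(line)["id"] for line in checkpoint.read_text().splitlines()}
    with checkpoint.open("a") as ckpt:
        for rid in receipt_ids:
            if shard_of(rid, num_shards) != shard or rid in done:
                continue
            extract(rid)  # e.g. one LLM call per receipt
            ckpt.write(json.dumps({"id": rid}) + "\n")
            ckpt.flush()
```

The same skip-if-done check is what makes auto-switching between models cheap: a new worker pointed at the same checkpoint picks up exactly where the previous one stopped.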
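The 99%-accuracy claim above implies an evaluation loop that compares classifier output to the hand labels and surfaces disagreements, some of which, per the post, turned out to be labeling mistakes rather than model errors. A hedged sketch of that comparison:

```python
def evaluate(predictions: dict[str, bool], ground_truth: dict[str, bool]):
    """Compare LLM classifications to hand labels. Returns overall accuracy
    and the disagreeing receipt IDs, which are worth re-checking by hand:
    some apparent model errors may be mistakes in the ground truth itself."""
    ids = sorted(ground_truth)
    disagreements = [rid for rid in ids if predictions[rid] != ground_truth[rid]]
    accuracy = 1 - len(disagreements) / len(ids)
    return accuracy, disagreements
```

Reviewing only the disagreement list keeps the human in the loop while touching a tiny fraction of the 11,345 receipts.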
Over 14 days, with only 15 hours of direct human involvement and a token cost of approximately $1,591, the project successfully identified 589 egg receipts, totaling $1,972 spent on 8,604 eggs. The key takeaway was the effectiveness of combining specialized models for tasks like image processing with the orchestration and extraction capabilities of large language models.
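Those totals are easy to sanity-check with the figures from the post: 8,604 eggs for $1,972 works out to roughly 23 cents per egg, and the token bill came close to matching the egg spend itself.

```python
eggs_total = 8604      # eggs purchased across 589 receipts (from the post)
spend_total = 1972.0   # dollars spent on eggs
token_cost = 1591.0    # approximate LLM token bill

per_egg = spend_total / eggs_total
cost_ratio = token_cost / spend_total
print(f"${per_egg:.2f} per egg")                 # about $0.23 per egg
print(f"token bill / egg spend: {cost_ratio:.2f}")  # about 0.81
```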