All of human cooking compressed into 2 megabytes

A new paper introduces Epicure, a family of AI models that compress culinary knowledge into ingredient embeddings derived from millions of recipes in multiple languages. The embeddings promise a deeper understanding of flavor compatibility and cooking patterns, but the submission sparked debate on Hacker News over its ambitious headline and how much of 'human cooking' it actually represents. Many commenters doubted that the dataset is universal, or that such a complex domain can be reduced to a 2-megabyte model without losing crucial cultural or technical nuance.

Score: 189
Comments: 74
Highest Rank: #5
Time on Front Page: 7h
First Seen: May 27, 12:00 PM
Last Seen: May 27, 6:00 PM
Rank Over Time: [chart not reproduced]

The Lowdown

The paper "Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings" uses learned embeddings to model how ingredients relate to one another. By analyzing 4.14 million recipes spanning seven languages, the researchers trained 'skip-gram' ingredient embeddings, distilling ingredients and their compatibility into compact, 2-megabyte models.
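To give a feel for what a 2-megabyte budget allows, here is a rough sizing sketch. It takes the 1,790 canonical ingredient entries reported in the summary and asks how wide each vector could be. The storage format (a dense table of 32-bit floats with no overhead) and the reading of "2 megabytes" as 2,000,000 bytes are assumptions for illustration, not details confirmed by the paper.

```python
# Back-of-the-envelope sizing. The storage layout is an assumption,
# not something the paper summary states.
VOCAB_SIZE = 1_790          # canonical ingredient entries reported in the summary
BUDGET_BYTES = 2_000_000    # "2 megabytes" read as decimal megabytes (assumption)
BYTES_PER_FLOAT32 = 4       # assumed precision of each stored value


def table_bytes(vocab_size: int, dims: int, bytes_per_value: int) -> int:
    """Size of a dense embedding table with one vector per ingredient."""
    return vocab_size * dims * bytes_per_value


def max_dimensions(vocab_size: int, budget_bytes: int, bytes_per_value: int) -> int:
    """Largest vector width whose dense table still fits in the budget."""
    return budget_bytes // (vocab_size * bytes_per_value)


if __name__ == "__main__":
    dims = max_dimensions(VOCAB_SIZE, BUDGET_BYTES, BYTES_PER_FLOAT32)
    size = table_bytes(VOCAB_SIZE, dims, BYTES_PER_FLOAT32)
    print(f"up to {dims} dimensions, {size} bytes")
```

Under these assumptions the budget leaves room for a few hundred dimensions per ingredient, which is in the usual range for word-style embeddings. Lower precision or a different file format would change the figure.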

Key aspects of the research include:

  • Data Aggregation: Over 4 million recipes from 11 sources across English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English were collected.
  • Ingredient Normalization: An LLM-augmented pipeline processed raw ingredient strings into 1,790 canonical entries, standardizing disparate ingredient names.
  • Embedding Models: Three Metapath2Vec variants (Cooc, Chem, Core) were trained on ingredient co-occurrence, chemical compound relationships, and a blend of both, respectively.
  • Geometry of Food: The embeddings create a geometric space where ingredient proximity indicates culinary compatibility, aiding in the discovery of flavor pairings and recipe generation.
  • Practical Application: A public demo showcases the models' ability to suggest compatible ingredients and generate recipes based on user selections.

This work aims to computationally map the vast, often intuitive, landscape of human cooking, offering a structured way to explore and innovate within the culinary arts by leveraging large-scale data and advanced embedding techniques.
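The idea that proximity in the embedding space signals compatibility can be sketched with cosine similarity. The vectors below are invented three-dimensional toys, and the function names are hypothetical. None of this reflects the actual Epicure embeddings or the similarity measure used in its demo.

```python
import math

# Toy 3-dimensional vectors, invented purely for illustration.
TOY_EMBEDDINGS = {
    "tomato": [0.9, 0.1, 0.2],
    "basil": [0.8, 0.2, 0.1],
    "chocolate": [0.1, 0.9, 0.3],
    "vanilla": [0.2, 0.8, 0.4],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_ingredients(query, embeddings, k=2):
    """Return the k ingredients whose vectors are most similar to the query's."""
    scores = [
        (name, cosine_similarity(embeddings[query], vector))
        for name, vector in embeddings.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]


if __name__ == "__main__":
    print(nearest_ingredients("tomato", TOY_EMBEDDINGS, k=1))
    print(nearest_ingredients("chocolate", TOY_EMBEDDINGS, k=1))
```

With these toy vectors, tomato lands next to basil and chocolate next to vanilla. A real model would learn such neighborhoods from recipe data rather than have them written in by hand.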

The Gossip

Title Troubles & Data Discrepancies

The most prominent discussion revolved around the submission's ambitious title, 'All of human cooking compressed into 2 megabytes,' which many users quickly dismissed as misleading clickbait. Critics pointed out the limited scope of the dataset, specifically highlighting the absence of significant culinary traditions like French and Italian recipes in their original language, and the imbalance introduced by the high proportion of Chinese/Korean recipes. There was a strong sentiment that compressing only ingredients, and not techniques or cultural context, severely undercuts the claim of representing 'all of human cooking.'

Demo Divulges Details & Limitations

Users who explored the accompanying demo provided detailed feedback on its capabilities and shortcomings. Commenters were impressed by its ability to intelligently suggest recipes for common ingredients and cuts (e.g., braising lamb), but the demo struggled with more obscure or regional ingredients not in its 1,790 canonical list. Specific issues included localization problems (e.g., 'pumpkin seeds' but not 'pumpkin' as 'squash') and difficulty generating expected results for common combinations like lamb and avocado in a salad, with the demo often defaulting to 'wannabe-fancy dishes' without proper technique or context. Concerns were also raised about automated cooking removing the 'soul' of food.

Recipe Reimagination & Schematic Solutions

A tangent emerged around alternative, more visual ways to represent recipes, sparked by one commenter's shared 'schematic' approach. Many found these dependency-graph-like visualizations to be a highly readable and intuitive way to understand the cooking process, particularly for complex recipes or coordinating multiple cooks. Users praised the clarity over traditional text-heavy recipes and suggested potential applications for interactive displays, despite some initial skepticism from non-engineers. The discussion included comparisons to 'Cooking for Engineers' and 'Modernist Cuisine' formats.
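The dependency-graph view of a recipe can be sketched with Python's standard `graphlib` module. The recipe and step names are hypothetical, and this is not the commenter's actual schematic format. It only shows how declaring "which steps must finish first" yields both a valid cooking order and groups of steps that separate cooks could tackle in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical recipe: each step maps to the steps that must finish first.
RECIPE_STEPS = {
    "chop onions": set(),
    "boil water": set(),
    "saute onions": {"chop onions"},
    "cook pasta": {"boil water"},
    "combine and serve": {"saute onions", "cook pasta"},
}


def cooking_order(steps):
    """One valid linear order that respects every dependency."""
    return list(TopologicalSorter(steps).static_order())


def parallel_stages(steps):
    """Group steps into stages; steps within a stage can run at the same time."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorter.get_ready()      # every step whose prerequisites are done
        stages.append(sorted(ready))    # sorted only to make the output stable
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    print(cooking_order(RECIPE_STEPS))
    for number, stage in enumerate(parallel_stages(RECIPE_STEPS), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

The stage grouping is what makes the format useful for coordinating multiple cooks: everything in one stage is independent, so it can be split between people.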

Computational Culinary Chemistry

Several comments delved into the underlying science and computational aspects of food pairing. There was interest in how the embeddings relate to established concepts like flavor networks based on shared volatile aroma compounds, or simpler principles like the 'Salt Fat Acid Heat' framework. Users discussed the potential for AI to uncover surprising but compatible flavor combinations and the long-term goal of mapping ingredients to human smell receptors. Technical points were also raised about the LLM's 'deterministic decoding,' clarifying that low temperature does not guarantee true determinism.
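The point about temperature can be illustrated with a toy sampler. This is a generic sketch of temperature-scaled softmax sampling, not the paper's pipeline. The logits and function names are made up for the example. It shows that lowering the temperature sharpens the distribution but still leaves sampling random, whereas greedy (argmax) selection over the same logits always returns the same token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the peak."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index at random according to the tempered probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def greedy_token(logits):
    """Always pick the highest-scoring token; no randomness involved."""
    return max(range(len(logits)), key=lambda index: logits[index])


if __name__ == "__main__":
    logits = [2.0, 1.8]                      # two close candidates (made-up values)
    rng = random.Random(0)
    draws = [sample_token(logits, 0.2, rng) for _ in range(200)]
    print("tokens seen at temperature 0.2:", sorted(set(draws)))
    print("greedy choice:", greedy_token(logits))
```

Even at temperature 0.2 the runner-up token keeps a substantial share of the probability here, so repeated runs can differ unless the decoder is greedy or the random seed is fixed.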