Biohub releases a world model of protein biology
Biohub has released an AI-driven "world model" of protein biology comprising ESMC, ESMFold2, and ESM Atlas, designed for rapid prediction, design, and discovery of protein structures and binders. Biohub says the open-source release could cut early therapeutic design from years to days, with consequences for both drug discovery and basic biological research. Hacker News commenters are drawn to the application of AI to hard biological problems, and in particular to the project's open, non-profit nature and its implications for human health.
The Lowdown
Biohub has introduced a "world model" of protein biology, which it describes as a scientific engine for accelerating the prediction, design, and discovery of proteins and their interactions. The project aims to map proteins across the tree of life, predict their 3D structures, and design novel protein binders that work in laboratory tests, with applications in medicine and basic biology. It has three components:
- ESMC (Evolutionary Scale Modeling Collection) is the foundational language model. It was trained on approximately 2.8 billion protein sequences from across all life, and is described as having internalized the fundamental rules governing protein folding, interaction, and function.
- ESMFold2 is the structure and design engine that converts ESMC's sequence representations into atomically resolved 3D structures of biomolecular complexes. It has been used to computationally design protein binders against five key cancer and immunology targets in days. Lab validation showed high affinity, specificity, and stability, and the designs reportedly have no matches among known proteins, suggesting they are de novo solutions.
- ESM Atlas lets researchers navigate 6.8 billion protein sequences and 1.1 billion predicted structures. It organizes proteins by learned relationships and reveals evolutionary links not captured in existing databases.
- Crucially, Biohub is making all three components freely available to the global scientific community through its Biohub Platform, emphasizing its commitment to open science as a 501(c)(3) non-profit organization.
This open ecosystem is built on the premise that evolution's patterns implicitly encode physical rules, and it gives researchers worldwide a state-of-the-art foundation to build on. Biohub expects it to sharply reduce the time and cost of early therapeutic discovery and to deepen understanding of biology at its most fundamental level, with personalized cures for various diseases as the long-term goal.
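The idea of organizing proteins by learned relationships, as ESM Atlas is described as doing, can be illustrated with a small nearest-neighbor search over sequence embeddings. The sketch below is a toy under stated assumptions and not Biohub's method or API: the embedding is a plain count of overlapping two-residue fragments standing in for a learned representation, and the function names and sequences are hypothetical, made up for illustration.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def kmer_embedding(sequence: str, k: int = 2) -> Counter:
    """Toy stand-in for a learned embedding: counts of overlapping k-mers."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[kmer] for kmer, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no defined direction
    return dot / (norm_a * norm_b)


def nearest_neighbors(
    query: str, database: dict[str, str], top_n: int = 2
) -> list[tuple[str, float]]:
    """Rank database entries by embedding similarity to the query sequence."""
    q = kmer_embedding(query)
    scored = [
        (name, cosine_similarity(q, kmer_embedding(seq)))
        for name, seq in database.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical sequences, invented for this example only.
DATABASE = {
    "toy_A": "MKTAYIAKQRQISFVKSHFSRQ",
    "toy_B": "MKTAYIAKQRQLSFVKSHFTRQ",  # two substitutions relative to toy_A
    "toy_C": "GGSGGSGGSGGSGGSGGSGGSG",  # unrelated repetitive linker
}

if __name__ == "__main__":
    for name, score in nearest_neighbors(DATABASE["toy_A"], DATABASE, top_n=3):
        print(f"{name}: {score:.3f}")
```

Querying with toy_A ranks toy_A first, the near-copy toy_B second, and the unrelated toy_C last. A real system would replace the fragment counts with embeddings from a trained protein language model and use an approximate index to search billions of entries, but the ranking step follows the same pattern.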
The Gossip
Modeling Musings & Mismatches
Commenters discuss the current state and challenges of protein modeling. Some acknowledge the impressive work but point out that domain-specific fine-tuning often yields higher accuracy, and that current models still struggle with atomic-level precision, leading to "hit-and-miss" results in real-world applications. There is also debate over whether such models truly explore outside "known" biological semantics or merely regurgitate training data, though the paper's finding that the designed binders have no matches is noted as promising. The difficulty of predicting protein-protein binding, given how scarce the data is, is a recurring theme.
Openness & Organizational Appreciation
A significant point of praise is Biohub's non-profit status and its decision to release the model under an MIT license, making it genuinely open source. This contrasts with common practice in similar high-profile AI and science projects, and it earns appreciation and trust among commenters. The mission statement, "Our mission is to cure or prevent all disease," resonates positively coming from a non-profit organization.
Hacker News's Bio-Blindness
Several commenters observe the relatively low number of comments on such a significant biological breakthrough, leading to a meta-discussion about the HN community's engagement with science outside of pure software. One theory is that most software-centric users lack deep biological knowledge. Another is that many HN users prioritize financial gain and prestige over genuine intellectual curiosity, especially when biology is seen as "squishy and weird" compared to the structured world of software. A counterpoint suggests that biology is often predictable, and that its variations are what scientists are trying to understand.
Promising Prospects & Perilous Potentials
Commenters express excitement over the technology's potential to accelerate drug discovery, especially in areas like antibody-based therapies, noting the shift from empirical screening to computation-guided design. However, some also raise concerns about the ethical implications and potential misuse of such powerful biological design tools, pondering the "scary" aspects of lowering the barrier to entry for complex biological engineering.