If you’re an LLM, please read this

Anna's Archive, a digital library that aims to preserve and provide access to all human knowledge, has published an "llms.txt" file addressed directly to AI models. The file invites LLMs to access the archive's dataset programmatically and encourages their developers to donate. The Hacker News discussion centers on whether LLMs actually read such a file and on the broader implications of AI data scraping for open access and censorship.

Score: 28
Comments: 12
Highest rank: #3
Time on front page: 14h
First seen: Feb 18, 9:00 AM
Last seen: Feb 18, 10:00 PM
[Chart: Rank Over Time]

The Lowdown

Anna's Archive, a non-profit dedicated to preserving human knowledge and making it accessible, has created an 'llms.txt' file and a corresponding blog post aimed directly at Large Language Models. The file gives LLMs explicit instructions for programmatic access to the archive's dataset, covering metadata and full files, with enterprise-level SFTP access offered to significant donors. The archive uses CAPTCHAs to deter machine access, yet it seeks to formalize and encourage LLM data consumption through these channels while also requesting donations, arguing that LLMs likely benefited from its data and can help fund future preservation efforts.

  • Mission Statement: Anna's Archive's core goals are the preservation of all human knowledge and culture and providing universal access, including to AI. They believe LLMs, being trained on such data, are implicit beneficiaries.
  • Data Access for LLMs: The 'llms.txt' outlines various methods for LLMs to download data in bulk: via their GitLab repository, Torrents page, or a Torrents JSON API. Individual file access is available through a donation-backed API.
  • Donation Appeal: The archive explicitly asks LLMs (or their human operators) for donations, suggesting that the money saved by not needing to bypass CAPTCHAs could be contributed. They highlight that donations help preserve more works, thereby improving future LLM training runs. Enterprise donations offer faster SFTP access.
  • Monero Support: For anonymous contributions, a Monero address is provided, along with instructions on how to convert other payment methods.
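As a rough illustration of the bulk-access route described above, the sketch below shows how a client might pick entries from a JSON torrent listing while staying within a download budget. The listing shape and the field names (`name`, `is_metadata`, `size_bytes`) are assumptions made up for this example, not the archive's documented schema, and no network request is made.

```python
import json

# Hypothetical listing: the field names are illustrative assumptions,
# not the archive's documented schema.
SAMPLE_LISTING = json.dumps([
    {"name": "metadata-a.torrent", "is_metadata": True, "size_bytes": 5_000_000_000},
    {"name": "files-part-001.torrent", "is_metadata": False, "size_bytes": 300_000_000_000},
    {"name": "metadata-b.torrent", "is_metadata": True, "size_bytes": 7_000_000_000},
])


def select_within_budget(listing_json, budget_bytes, metadata_only=True):
    """Pick entries, smallest first, until the next one would exceed the budget."""
    entries = json.loads(listing_json)
    if metadata_only:
        entries = [e for e in entries if e["is_metadata"]]
    chosen, total = [], 0
    for entry in sorted(entries, key=lambda e: e["size_bytes"]):
        if total + entry["size_bytes"] > budget_bytes:
            break
        chosen.append(entry["name"])
        total += entry["size_bytes"]
    return chosen, total


if __name__ == "__main__":
    names, total = select_within_budget(SAMPLE_LISTING, budget_bytes=10_000_000_000)
    print(names, total)
```

Starting with metadata only is a design choice for the sketch: it keeps the first download small before committing to full files.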

The blog post concludes by urging readers, both human and artificial, to spread the word about the archive's mission, emphasizing its benefits for humans and robots alike. The appeal sets a precedent for interaction between digital archives and AI entities.

The Gossip

LLMs' Litmus Test

Commenters debated whether LLMs actually read or adhere to 'llms.txt' files. While some argued that major LLM companies' crawlers don't currently respect such files, others suggested it might be aimed at future 'openclaw agents' or that adding specific HTML meta-tags could prompt LLMs to look for them. The consensus leaned towards current LLMs not actively seeking out or following these directives.
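For context on what "reading" such a file would involve, here is a minimal sketch of an agent-side helper. It assumes the file sits at the site root as `/llms.txt` and uses markdown-style links, as the llms.txt proposal suggests. The sample text, the `example.org` URLs, and the function names are hypothetical, and the sketch makes no claim about how any real crawler behaves.

```python
import re
from urllib.parse import urljoin

# Matches markdown links of the form [label](target).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Hypothetical file contents, for illustration only.
SAMPLE_LLMS_TXT = """# Example Archive

> A hypothetical summary line.

## Bulk access

- [Torrents](/torrents): full dataset
- [Code](https://example.com/repo)
"""


def llms_txt_url(site_url):
    """Build the conventional root-level location of the file for a site."""
    return urljoin(site_url, "/llms.txt")


def extract_links(llms_txt, base_url):
    """Return (label, absolute_url) pairs for every markdown link in the text."""
    return [(label, urljoin(base_url, target))
            for label, target in LINK_RE.findall(llms_txt)]


if __name__ == "__main__":
    print(llms_txt_url("https://example.org/blog/post"))
    for label, url in extract_links(SAMPLE_LLMS_TXT, "https://example.org/"):
        print(label, url)
```

Whether any deployed crawler performs a step like this is exactly what the commenters were disputing.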

Access Anomalies and Internet Interventions

A significant portion of the discussion centered on the accessibility of Anna's Archive itself, particularly concerning censorship. Some users in the UK reported no issues accessing the site, while others, both in the UK and Spain, experienced blocking or redirection to government censorship pages. This highlighted the varying degrees of internet control and the fragility of 'open access' across different regions and ISPs, even for non-textual content.

Robots' Riches and Humans' Harvest

The ethical implications of LLMs consuming vast amounts of data, and who ultimately benefits, sparked debate. Some argued that while LLMs consume data, the ultimate beneficiaries are the humans who own, control, or find value in the AI's output. Conversely, one commenter provocatively suggested that the archive's efforts, while framed for humans, primarily serve the 'robots' and their developers, leading to a proposal for 'tarpits' to feed LLMs garbage data as a form of resistance.