Show HN: I taught LLMs to play Magic: The Gathering against each other
A developer has built a system that lets Large Language Models play Magic: The Gathering against each other, leveraging the open-source XMage platform to handle the game's intricate rules and strategic depth. This "Show HN" project demonstrates what current models can do in complex, information-rich environments, even if they still make blunders. Hacker News commenters are drawn to the potential for AI to simulate strategic games and to help human players with deck testing and opponent practice.
The Lowdown
Mage-bench is an ambitious project that has Large Language Models playing full games of Magic: The Gathering. Built on a fork of the open-source XMage platform, it lets LLMs compete across several formats, making the same kinds of strategic decisions a human pilot would. At each decision point the system presents the current game state and the available actions to the model, the model picks one, and the game engine enforces every rule without simplification (a simplified sketch of this loop follows the list below).
- The system runs on the XMage game server, so games follow Magic's full ruleset rather than a simplified approximation.
- LLMs are responsible for every decision point in a game, from initial mulligans to casting spells, engaging in combat, and even attempting "politics" in multiplayer formats.
- The platform supports popular Magic formats including Commander, Standard, Modern, and Legacy, offering a wide range of strategic environments.
- The leaderboard currently focuses on benchmarking less expensive models; the author notes that frontier models have artificially low ratings simply because they have played fewer games.
- The project provides a public leaderboard, game replays, and detailed architecture explanations, along with its GitHub repository.
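The decision loop described above can be pictured roughly as follows. This is a minimal illustrative sketch in Python, not the project's actual code (Mage-bench is a fork of XMage, which is written in Java), and the `llm`, `game_state`, and `legal_actions` names are hypothetical stand-ins:

```python
import json

def choose_action(llm, game_state, legal_actions):
    """Ask an LLM to pick one of the engine-provided legal actions.

    Illustrative sketch only: `llm.complete`, `game_state`, and
    `legal_actions` are hypothetical stand-ins for whatever the real
    system passes around.
    """
    prompt = (
        "You are playing Magic: The Gathering.\n"
        f"Game state:\n{json.dumps(game_state, indent=2)}\n\n"
        "Legal actions (choose exactly one by index):\n"
        + "\n".join(f"{i}: {a}" for i, a in enumerate(legal_actions))
        + "\n\nReply with only the index of your chosen action."
    )
    reply = llm.complete(prompt)      # hypothetical LLM client call
    try:
        index = int(reply.strip())
    except ValueError:
        index = 0                     # fall back to a safe default choice
    if not 0 <= index < len(legal_actions):
        index = 0
    return legal_actions[index]
```

The key property this illustrates is that the engine, not the model, remains the source of truth: the model can only select from actions the rules engine has already deemed legal, so no amount of hallucination produces an illegal move.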
The project makes for a fascinating benchmark of LLM capabilities in a highly complex, dynamic, imperfect-information environment, and offers a glimpse of where AI in strategic gaming may be headed.
The Gossip
AI's Strategic Edge: Deck Testing and Opponent Simulation
Many commenters immediately saw practical applications for Magic players. They envision LLMs piloting a user's decks for rigorous testing, yielding data on mana curves, threat assessment, and draw quality. The idea of a strong, consistent AI opponent for competitive practice, one capable of blunder analysis akin to a chess engine's, resonated strongly; the author confirmed that blunder analysis is already implemented.
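The post does not describe how the project's blunder analysis actually works, but one hypothetical way to do it is a post-game pass in which a stronger "reviewer" model re-scores each recorded decision and flags large disagreements. All names and the log structure below are assumptions for illustration:

```python
def find_blunders(reviewer_llm, game_log, threshold=0.5):
    """Hypothetical post-game blunder scan.

    For each recorded decision, ask a reviewer model to rate how bad the
    chosen action was (0.0 = fine, 1.0 = catastrophic). Decisions at or
    above `threshold` are flagged for the player to review.
    """
    blunders = []
    for turn in game_log:                 # assumed list of decision records
        prompt = (
            f"Game state: {turn['state']}\n"
            f"Available actions: {turn['actions']}\n"
            f"Chosen action: {turn['chosen']}\n"
            "On a scale from 0.0 (fine) to 1.0 (severe blunder), how bad "
            "was this choice? Reply with a single number."
        )
        try:
            severity = float(reviewer_llm.complete(prompt).strip())
        except ValueError:
            continue                      # skip unparseable reviewer replies
        if severity >= threshold:
            blunders.append((turn, severity))
    return blunders
```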
Language Model Limitations: Hallucinations and Hidden Information Hurdles
Despite the impressive feat, commenters highlighted significant challenges for current LLMs in handling Magic's complexity. They observed that agents frequently "hallucinate" or forget card details, misinterpret effects, and struggle with nuanced strategic evaluation, especially in games with high variance and hidden information (such as opponents' hands). The author acknowledges these issues, pointing to ongoing improvements but also to the persistent difficulty of balancing context size and managing tool use effectively.
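The context-size tension can be pictured with a sketch like the one below. The post does not specify how the project actually prunes its prompts; the zone priorities, token budget, and crude token estimate here are all assumptions:

```python
def build_prompt(game_state, token_budget=8000, tokens=lambda s: len(s) // 4):
    """Hypothetical prompt builder that drops low-priority zones first.

    `tokens` is a rough characters-per-token estimate; a real system would
    use the model's tokenizer. Zone names and their ordering are assumptions.
    """
    # Highest-priority information comes first, so it survives trimming longest.
    sections = [
        ("hand",        game_state["hand"]),
        ("battlefield", game_state["battlefield"]),
        ("stack",       game_state["stack"]),
        ("graveyards",  game_state["graveyards"]),
        ("game_log",    game_state["log"]),
    ]
    parts, used = [], 0
    for name, content in sections:
        text = f"## {name}\n{content}\n"
        cost = tokens(text)
        if used + cost > token_budget:
            parts.append(f"## {name}\n(omitted to fit the context window)\n")
            continue
        parts.append(text)
        used += cost
    return "\n".join(parts)
```

The trade-off this illustrates is the one the author alludes to: everything omitted to stay within budget is something the model can later "forget" or hallucinate about, while including everything makes each decision slower and more expensive.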
The Purpose of Play: Automating Entertainment vs. Utility
A philosophical debate emerged regarding the value of using LLMs for "entertainment" activities. Some questioned if these projects were merely "toy applications," suggesting AI should focus on automating life's "unpleasantries" instead. However, many defended the pursuit, arguing that game AIs are a harmless and even beneficial application, providing new forms of entertainment, benchmarking, and tools for human players without detracting from the inherent joy of the game.