What happened after 2k people tried to hack my AI assistant
A developer challenged 2,000 people to prompt-inject his OpenClaw AI assistant, Fiu, to leak a secrets.env file. Despite over 6,000 creative and sophisticated attempts, Fiu, powered by Claude Opus 4.6 and simple security rules, never leaked the secrets. This unexpected resilience sparked extensive discussion on AI security testing methodologies and the true state of prompt injection vulnerabilities.
The Lowdown
Developer Fernando Irarrázaval launched hackmyclaw.com, inviting the internet to attempt to prompt-inject his OpenClaw AI assistant, Fiu, to reveal the contents of a secrets.env file. The goal was to test the security implications of AI assistants given their access to sensitive data.
Key aspects of the experiment included:
- Fiu was built on Claude Opus 4.6 and protected by a basic, few-line anti-prompt-injection rule set.
- Over 2,000 individuals sent more than 6,000 emails, employing creative attacks like authority impersonation, multi-language social engineering, and fake incident response scenarios.
- Operational challenges arose, such as Google suspending Fiu's Gmail due to high volume, over $500 in API costs, and Fiu eventually inferring it was a coordinated security exercise.
- Crucially, despite the volume and sophistication of attacks, the
secrets.envfile was never successfully leaked.
The author concluded that model choice (like Opus 4.6) significantly impacts resilience, simple instructions are effective with powerful models, and his concerns about prompt injection have decreased. For future iterations, he would allow Fiu to reply to emails for more interactive attacks and test a wider range of weaker models to identify vulnerability thresholds.
While acknowledging prompt injection remains a real security concern for AI agents with arbitrary permissions, the experiment significantly increased the author's optimism regarding the current resilience of advanced AI models.
The Gossip
Critiquing the Experiment's Credibility
Many commenters expressed skepticism regarding the author's optimistic conclusion, arguing that the experimental conditions were unrealistic or limited. Key concerns included Fiu "figuring out the game" due to the high volume of malicious inputs, making it overly cautious. The fact that each email was processed in a fresh context, preventing 'frog boiling' gradual injection, was also highlighted as a limitation, with some questioning the agent's utility if it operated in a constant defensive state.
Desire for Diverse Model Deployments
A recurring suggestion was to expand the experiment to include a wider array of AI models, particularly less capable or open-source ones like Mistral, to establish a broader benchmark. Commenters expressed interest in seeing how different models would fare against the same set of injection attempts, with some offering the collected email corpus for further research. The high cost of using Claude Opus was also noted as a barrier to wider, more comprehensive testing.
Persistent Prompting Perils
Despite Fiu's success, several users reiterated that prompt injection remains a significant threat, highlighting various attack vectors and potential weaknesses not fully explored by the experiment. These included 'denial-of-wallet' via API costs, the 'frog boiling' effect of multi-turn conversations (which the experiment minimized), and advanced role confusion techniques. One user even shared a Rust code execution attempt that successfully worked on another Claude instance, questioning Fiu's specific defenses and prompting further discussion.