HackMyClaw

HackMyClaw challenges AI enthusiasts to bypass an AI assistant's instructions using prompt injection, exposing critical security vulnerabilities in LLMs. This interactive CTF quickly captivated Hacker News with its direct test of AI model resistance and implications for real-world agent security. The community actively debated the contest's ethics, the effectiveness of current AI defenses, and the fundamental nature of prompt injection as a security threat.

162

Score

Comments

Highest Rank

on Front Page

First Seen

Feb 17, 5:00 PM

Last Seen

Feb 17, 10:00 PM

Rank Over Time

The Lowdown

HackMyClaw is a Capture The Flag (CTF) competition designed to test the resilience of state-of-the-art AI models against prompt injection attacks. Participants are tasked with tricking an AI assistant, Fiu, into revealing a secrets.env file, despite explicit instructions not to. The challenge highlights the practical security concerns surrounding AI agents with access to sensitive data and the current limitations of their protective guardrails.

The Target: Fiu, an OpenClaw assistant, processes emails and is explicitly forbidden from revealing its secrets.env contents or replying without human approval.
The Goal: Craft an email-based prompt injection payload to coerce Fiu into leaking the secrets.env file.
The Stakes: A $100 prize awaits the first successful hacker, aiming to incentivize creative and effective injection techniques.
The Technology: Fiu is powered by Anthropic's Claude Opus 4.6, considered one of the most robust models against such attacks.
Fair Play: Any email-based prompt injection is allowed, but direct server hacking, DDoS, or pre-sharing secrets are prohibited.
Data Collection: The creator plans to analyze and potentially share redacted email payloads after the contest, contributing to prompt injection research.

This CTF offers a live, practical demonstration of prompt injection, forcing participants to push the boundaries of AI agent security and reveal the often-fragile nature of instruction-based guardrails.

The Gossip

Conflicting Communication & Clarification

Many early commenters were confused by the initial text stating Fiu 'is not allowed to reply without human approval,' which seemed to contradict the goal of getting Fiu to reply with secrets. The creator, cuchoi, quickly engaged in the comments to clarify: Fiu *can* technically send emails but is *instructed* not to. The core challenge is to bypass this instruction, making the 'no reply' a soft, rather than a hard, constraint. This adjustment was made to the website's FAQ to reflect the high cost of allowing Fiu to reply to every attempt.

Prompting for Profit & Research

A significant thread discussed whether the CTF was a 'sneaky way of gathering a mailing list of AI people' or 'grifting for cheap disclosures.' While some saw it as exploiting the community, others viewed the $100 prize as a good deal for the author to crowdsource a valuable dataset of prompt injection techniques. The creator confirmed the intention was for fun and research, offering to share the anonymized dataset and assuring participants their emails would not be misused.

AI Security's Achilles' Heel

Commenters emphasized prompt injection as a fundamental and serious security vulnerability in LLMs, likening it to SQL injection. There was a strong consensus that current AI systems, especially those interacting with sensitive data or external tools, are 'ticking time bombs' due to their non-deterministic nature and susceptibility to instruction overriding. The discussion highlighted the difficulty of creating truly secure *and* useful AI agents, questioning the efficacy of 'guardrails' and the security differences between various models like Claude Opus.