
We stopped AI bot spam in our GitHub repo using Git's –author flag

Open-source project Archestra, deluged by low-quality AI-generated contributions, devised a clever workaround to GitHub's limitations by leveraging Git's --author flag to enforce a CAPTCHA-secured whitelisting process. This technical solution sparked a vibrant Hacker News debate on the implications of "AI slop" for open source, the role of contribution incentives, and GitHub's responsibility in maintaining quality. The community grapples with balancing open access against the rising tide of automated noise.

Score: 258
Comments: 117
Highest Rank: #1
Time on Front Page: 12h
First Seen: May 18, 4:00 PM
Last Seen: May 19, 3:00 AM
Rank Over Time (chart data): 112343435567

The Lowdown

Archestra.ai, an open-source project, found its GitHub repository drowning in a flood of low-quality, AI-generated contributions, which the team aptly termed "AI slop." This influx, often driven by monetary bounties and job candidate tasks, overwhelmed the maintainers, burying legitimate contributions and creating significant cleanup overhead. A single issue received 253 comments and was poisoned by AI bots, and one feature attracted 27 untested PRs, leading a team member to dedicate half a day each week just to cleaning up.

  • Initial attempts to combat the spam, such as a "reputation bot" and an "AI sheriff," proved ineffective or accidentally blocked legitimate users.
  • Archestra implemented a "nuclear option" by blocking contributions from non-whitelisted users, prioritizing quality over the quantity metrics often inflated by AI.
  • Their innovative technical solution leverages GitHub's "Limit to prior contributors" setting. Since GitHub identifies prior contributors by the author field in a commit, Archestra created an onboarding process.
  • This process involves a web-based form with ethical AI rules and a CAPTCHA. Upon successful completion, a custom GitHub Action creates a small commit using Git's --author flag, attributing it to the new user's GitHub ID, thus granting them "prior contributor" status.
  • This method allowed Archestra to successfully block over 500 bots in the first week.
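The onboarding commit described above can be sketched in a few lines. This is a minimal illustration, not Archestra's actual GitHub Action: the function names, the commit message, the use of `--allow-empty` (the post only says "a small commit"), and the choice of GitHub's ID-based noreply address as the author email are all assumptions. Only the `--author` flag itself comes from the write-up.

```python
from __future__ import annotations

import shlex


def noreply_email(user_id: int, login: str) -> str:
    """Build a GitHub-style noreply address of the form ID+login@users.noreply.github.com."""
    return f"{user_id}+{login}@users.noreply.github.com"


def build_onboarding_commit(user_id: int, login: str) -> list[str]:
    """Return the argv for a commit authored as the given user.

    Hypothetical helper: --author overrides only the author field;
    the committer stays whoever runs the command (for example, the CI bot).
    """
    author = f"{login} <{noreply_email(user_id, login)}>"
    return [
        "git", "commit",
        "--allow-empty",          # assumption: an empty commit is enough
        f"--author={author}",     # the flag the post relies on
        "-m", f"Onboard contributor {login}",  # hypothetical message
    ]


if __name__ == "__main__":
    # Print the shell-quoted command instead of running it.
    print(shlex.join(build_onboarding_commit(123456, "example-user")))
```

The sketch only builds the command, so it can be inspected before anything touches a repository. Whether GitHub then treats the user as a prior contributor depends on the author email being linked to their account, which is why the noreply format is used here.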

The blog post concludes with a call to action for the open-source community to seriously discuss AI's impact on contribution quality and security, subtly suggesting that platforms like GitHub contribute to the problem by celebrating AI-boosted metrics.

The Gossip

Incentives and Impurity

Many commenters debated the core reasons behind the 'AI slop,' often pointing to the project's use of bounties as an incentive that naturally attracts low-effort contributions, regardless of whether they are human or AI-generated. There was a strong sentiment that monetary rewards can corrupt the spirit of open-source. Others highlighted the perceived hypocrisy of an AI-centric company complaining about AI-generated content, noting 'AI tells' even in Archestra's own documentation.

GitHub's Complicity and Call for Tools

A recurring theme was criticism of GitHub (and by extension, Microsoft) for not providing better tools to manage AI spam, or even for actively contributing to the problem by celebrating AI-inflated metrics. Commenters argued GitHub has little incentive to block AI because it drives usage and Copilot subscriptions. There was a strong desire for native features like reputation systems, token-based PR submission limits, or better spam filtering from the platform itself.

Security Scrutiny and Systemic Solutions

The discussion delved into potential security implications of Archestra's workaround, specifically concerns that granting 'prior contributor' status could bypass existing protections for fork PRs. While some argued that merging any PR could introduce risk, others emphasized the nuanced nature of security. The conversation also explored broader 'systemic solutions' to spam, including Elo-based scoring, proof-of-work, or 'trust circles,' though many were quickly dismissed as easily manipulable, impractical, or exclusionary.

The Dash Dilemma and Linguistic Lapses

A humorous, albeit minor, side discussion emerged regarding the dash in the Hacker News title: 'Git's –author flag' uses an en dash (–) where the flag is actually written with two hyphens (--author). This brief tangent highlighted the typographic quirks that can creep into technical discourse; the original poster's article text uses the correct form.
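The difference is more than cosmetic, because Git only recognizes ASCII hyphens as flag prefixes. A small sketch of how such look-alike dashes could be detected in a command string follows. The helper names are hypothetical, and the repair step assumes the en dash originally stood for two hyphens.

```python
from __future__ import annotations

import unicodedata

EN_DASH = "\u2013"
EM_DASH = "\u2014"


def find_dash_lookalikes(text: str) -> list[tuple[int, str]]:
    """Return (index, Unicode name) for each en or em dash in the text."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in (EN_DASH, EM_DASH)
    ]


def restore_double_hyphen(text: str) -> str:
    """Replace each en dash with '--' (assumes it was auto-converted from two hyphens)."""
    return text.replace(EN_DASH, "--")


if __name__ == "__main__":
    title_fragment = "git commit \u2013author"
    print(find_dash_lookalikes(title_fragment))
    print(restore_double_hyphen(title_fragment))
```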