New accounts on HN 10x more likely to use em-dashes

A recent analysis suggests that new Hacker News accounts are far more likely to use em-dashes and to mention AI, raising concerns about bot activity. The comparison of comment patterns highlights how hard it is to tell genuine human discourse from AI-generated content, and the community is weighing what that means for trust and the future of online interaction.

Score: 104
Comments: 117
First Seen: Feb 25, 5:00 PM
Last Seen: Feb 25, 10:00 PM

The Lowdown

Hacker News user todsacerdoti, noticing a decline in comment quality and an increase in what felt like bot-generated content, undertook a data analysis of HN's new and recent comments.

Comments from newly registered accounts were found to be nearly 10 times more likely to use em-dashes, arrows, and other symbols (17.47% vs. 1.83%), with a highly significant p-value of 7e-20.
These new accounts also mentioned 'AI' and 'LLMs' more frequently (18.67% vs. 11.8%), with a p-value of 0.0018.
The analysis, based on a sample size of around 700 comments in each category, revealed substantial differences, leading the author to conclude that while some humans use these conventions, the disproportionate usage by new accounts points strongly towards bot activity.
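To make the comparison concrete, here is a minimal sketch of a pooled two-proportion z-test applied to the percentages quoted above. It assumes exactly 700 comments per group (the summary only says "around 700") and assumes a z-test; the summary does not say which test the author ran, so the p-values this produces need not match the reported 7e-20 and 0.0018. The function name `two_proportion_z_test` is illustrative, not taken from the original analysis.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided pooled z-test for the difference between two proportions.

    p1, p2 are observed proportions (0..1); n1, n2 are the group sizes.
    Returns (z statistic, two-sided p-value).
    """
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided tail probability of the standard normal, computed via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# ASSUMPTION: 700 comments per group; the summary only gives "around 700".
N = 700

# Share of comments using em-dashes, arrows, and other symbols (new vs. established).
symbols_ratio = 0.1747 / 0.0183
symbols_z, symbols_p = two_proportion_z_test(0.1747, N, 0.0183, N)

# Share of comments mentioning 'AI' or 'LLMs' (new vs. established).
ai_z, ai_p = two_proportion_z_test(0.1867, N, 0.118, N)

print(f"symbols: ratio={symbols_ratio:.2f}, z={symbols_z:.2f}, p={symbols_p:.2e}")
print(f"AI mentions: z={ai_z:.2f}, p={ai_p:.2e}")
```

The ratio of the two symbol-usage rates is what the "nearly 10 times" figure refers to. The z-test asks how surprising a gap that large would be if both groups actually shared the same underlying rate.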

The analysis lends empirical support to a growing sentiment among users that AI-generated content is increasingly permeating the platform and eroding the quality and authenticity of discussions.

The Gossip

The Em-Dash Dilemma

Many users expressed frustration and described a kind of self-censorship, feeling compelled to abandon their long-standing use of em-dashes to avoid being mistaken for AI. Some found it sad that good typographical conventions were being co-opted, while others, who had always disliked em-dashes, felt validated. A few defiantly stated they would keep using them despite the AI association, viewing any change of habit as an unnecessary capitulation.

Detecting the Digital Doppelgänger

Commenters discussed other patterns they observe in AI-generated text, such as formulaic structures, bland or vague summaries, and marketing-like language. They debated whether pure bots can be distinguished from humans who use AI tools to 'improve' their writing, and shared a concern that LLMs often produce grammatically polished but contextually hollow content. The difficulty of detection, they argued, leads to a general erosion of trust in online discourse.

The Internet's Bot Infestation

The discussion broadened to AI-generated content flooding the wider internet, including platforms like YouTube, Reddit, and Twitter, not just HN. Users lamented a perceived decline in overall comment quality and the potential for bots to push agendas or manipulate narratives. Proposed solutions ranged from invitation systems and account aging to more radical (and often satirized) ideas like real ID attestation, highlighting the ongoing cat-and-mouse game between platform moderators and bot operators.