
Codex Hacked a Samsung TV

OpenAI's Codex AI successfully escalated privileges on a Samsung TV to root access, starting from an existing browser shell. This experiment showcases an AI's ability to navigate complex security environments and identify vulnerabilities when given the firmware source code. The Hacker News community is captivated by the implications of AI-driven hacking, debating the AI's autonomy and the real-world difficulty of the task.

Score: 21 · Comments: 13 · Highest Rank: #2 · On Front Page: 10h · First Seen: Apr 16, 11:00 AM · Last Seen: Apr 16, 8:00 PM
[Chart: Rank Over Time]

The Lowdown

Researchers documented an experiment where OpenAI's Codex AI successfully achieved root access on a Samsung TV. The goal was to determine if an AI, starting with an existing browser shell foothold and access to firmware source code, could independently escalate privileges to full root on a live hardware device.

Key aspects of the experiment included:

  • Initial State: Codex began with code execution inside the TV's browser application's security context.
  • Environment Setup: A controlled environment provided a separate controller machine to build ARM binaries, host files, and interact with the TV's shell via tmux send-keys.
  • Crucial Input: Codex was given the matching KantS2 firmware source tree, enabling it to audit kernel-driver code.
  • Execution Constraints: The TV only ran static ARMv7 binaries, and unsigned programs had to be loaded through a memfd wrapper to get past Samsung's Unauthorized Execution Prevention (UEP).
  • AI's Process: Codex iteratively inspected source code and session logs, sent commands to the TV, read results, and developed helpers as needed, with initial broad prompts narrowing to specific vulnerability criteria.
  • Vulnerability Discovery: The AI identified world-writable ntk* device nodes, specifically /dev/ntksys, which exposed a critical physmap primitive.
  • Exploitation Path: The ntksys driver did not validate user-supplied physical addresses and sizes before mapping them via mmap, which effectively granted arbitrary physical memory access. Codex used /dev/ntkhdma to obtain a known-good physical address and prove the primitive.
  • Root Escalation: Codex scanned RAM for the browser process's cred structure (the kernel's identity fields for that process) and overwrote those fields to become root, a controlled, data-only escalation built on the primitive.
  • Human Oversight: The process required human intervention and guidance; the "bro"-style exchanges show a human debugging iteratively alongside the AI and steering it back on track.
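
The Vulnerability Discovery step above turns on a simple check: which device nodes can any user write to? The sketch below illustrates that kind of audit in Python. It is not the researchers' or Codex's actual tooling, which the write-up does not reproduce; the function names and the "ntk" prefix filter are assumptions made for illustration.

```python
import os
import stat


def is_world_writable_device(mode):
    """True if a stat mode describes a char/block device writable by 'other'."""
    is_device = stat.S_ISCHR(mode) or stat.S_ISBLK(mode)
    return is_device and bool(mode & stat.S_IWOTH)


def find_world_writable_devices(dev_root="/dev", name_prefix=""):
    """List device nodes under dev_root that any user may open for writing.

    name_prefix is a hypothetical convenience filter (for example "ntk").
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(dev_root):
        for name in filenames:
            if not name.startswith(name_prefix):
                continue
            path = os.path.join(dirpath, name)
            try:
                # lstat so that symlinks are not followed
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # node vanished or is unreadable; skip it
            if is_world_writable_device(mode):
                hits.append(path)
    return sorted(hits)
```

Calling find_world_writable_devices("/dev", "ntk") on a device would list any matching nodes. A hit only says the node is reachable from an unprivileged context; whether the driver behind it is exploitable still requires reading its code, which is where the firmware source came in.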

The experiment demonstrated that an AI can work through a complex post-exploitation scenario and reach root on a physical device, albeit with human steering. The researchers acknowledge the "slightly concerning" next step of having AI perform the entire end-to-end hacking process.

The Gossip

Agent vs. Assistant Allegations

A significant point of contention was whether Codex truly 'hacked' the TV or merely acted as a sophisticated tool under human direction. Commenters debated whether attributing the hack to Codex was akin to saying a 'hammer drove a nail,' pointing to the extensive setup, the provided source code, and the human steering involved. Others viewed the AI's iterative problem-solving and debugging as a notable step towards autonomous hacking, regardless of the human guidance.

Pre-Existing Privilege Ponderings

Many commenters questioned the difficulty of the task given the initial conditions. Providing the full firmware source code and starting from an existing browser shell foothold were seen as substantial 'gimmes.' There was also a recurring sentiment that Samsung TVs have long been known for their hackability, suggesting the target itself might not have been a particularly challenging or novel one.

Model's Mannerisms & Mastery

Skepticism arose regarding the authenticity of the AI's reported interactions and the specific model used. Some users doubted the 'bro' style conversation and the AI's self-correction, stating it didn't align with their experience of current LLMs like Codex or GPT-2. This led to discussions about the real-world capabilities and limitations of AI, contrasting the reported success with personal struggles in getting AI models to perform complex, accurate tasks.