
Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

RunAnywhere introduces RCLI, an on-device voice AI assistant for macOS that claims significant speed improvements over existing solutions via custom Metal shaders. The approach targets genuinely fast, local voice interaction, addressing the latency that compounds across sequential AI pipelines while keeping user data on-device. However, the launch sparked considerable controversy on Hacker News, with users highlighting past company misconduct and raising suspicions of vote manipulation.

Score: 123
Comments: 44
Highest Rank: #2
Time on Front Page: 18h
First Seen: Mar 10, 5:00 PM
Last Seen: Mar 11, 10:00 AM
Rank Over Time: [chart not reproducible in text]

The Lowdown

RunAnywhere, a YC W26 startup, has unveiled RCLI, an on-device voice AI pipeline for macOS designed for ultra-low latency and enhanced privacy. This system integrates Speech-to-Text (STT), Large Language Models (LLM), and Text-to-Speech (TTS) to deliver sub-200ms end-to-end responses, facilitating local AI interactions like voice control for macOS actions and Retrieval-Augmented Generation (RAG) over personal documents, all without cloud dependencies.

  • Core Technology: The system leverages MetalRT, a proprietary GPU inference engine optimized for Apple Silicon. It bypasses conventional inference engine overheads by employing custom Metal compute shaders and pre-allocating memory to eliminate runtime allocations during inference.
  • Performance Benchmarks: RCLI claims impressive speedups, reporting LLM decoding 1.67x faster than llama.cpp and 1.19x faster than Apple MLX, alongside STT that is 4.6x faster than mlx-whisper.
  • Key Features: It offers a complete voice pipeline (VAD, STT, LLM, TTS), supports 43 macOS actions controllable by voice, provides local RAG for document querying, and includes an interactive Terminal User Interface (TUI) for system management and benchmarking.
  • Open-source & Proprietary Blend: While RCLI itself is open-source (MIT license), the underlying MetalRT engine is proprietary and requires M3+ chips for optimal performance, gracefully falling back to llama.cpp on M1/M2 devices.
  • Addressing Latency: The project's primary goal is to resolve the challenge of compounding latency in sequential voice AI pipelines, aiming to make on-device AI genuinely fast and practical.
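
The compounding-latency problem named above can be illustrated with a toy budget: in a sequential voice pipeline, end-to-end latency is the sum of its stages, so no single stage can afford to be slow. The per-stage numbers below are hypothetical, for illustration only, and are not RunAnywhere's measurements.

```python
# Illustrative latency budget for a sequential VAD -> STT -> LLM -> TTS
# pipeline. All per-stage numbers are invented; the point is that
# end-to-end latency is the SUM of the stages, so every stage must be fast
# to stay under a ~200 ms target.
stages_ms = {
    "VAD": 10,   # voice activity detection
    "STT": 60,   # speech-to-text
    "LLM": 90,   # time to first generated token
    "TTS": 30,   # text-to-speech synthesis start
}

total_ms = sum(stages_ms.values())
print(f"end-to-end: {total_ms} ms")  # 190 ms, within a 200 ms budget
```

A 2x slowdown in any one stage blows the whole budget, which is why the project optimizes each component rather than only the LLM.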

In essence, RunAnywhere presents a significant advancement towards fully local, high-performance voice AI on Apple hardware, positioning itself as a robust solution for low-latency, privacy-focused applications that avoid reliance on cloud services.
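
The "no runtime allocations" design mentioned in the feature list is a standard low-latency technique: size every buffer once at load time and reuse it on every inference call. Since MetalRT is proprietary, the following is only a generic sketch of that idea; the class, names, and dimensions are invented for illustration.

```python
# Generic sketch of pre-allocation (NOT MetalRT's actual implementation):
# all storage is allocated once in __init__, so the per-step hot path
# performs no allocations, avoiding allocator overhead and GC pauses.

class PreallocatedDecoder:
    def __init__(self, max_tokens: int, hidden_dim: int):
        # One-time allocation at model load, reused for every call.
        self.kv_cache = [[0.0] * hidden_dim for _ in range(max_tokens)]
        self.pos = 0

    def step(self, activations: list[float]) -> None:
        # Hot path: writes into pre-allocated storage in place;
        # no new lists are created here.
        row = self.kv_cache[self.pos]
        for i, a in enumerate(activations):
            row[i] = a
        self.pos += 1

dec = PreallocatedDecoder(max_tokens=4, hidden_dim=3)
dec.step([0.1, 0.2, 0.3])
print(dec.pos)  # 1
```

In a real GPU engine the same principle applies to command buffers and weight/KV tensors: allocate at load, reuse at inference.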

The Gossip

Performance Prowess & Promising Potential

Many commenters expressed genuine excitement about the claimed performance figures and the potential of truly fast, on-device AI. They viewed it as a crucial step for future AI applications, particularly embedded systems and real-time audio processing, and frequently asked why Apple's native Siri has not achieved similar responsiveness. Some users, while generally skeptical of local AI performance, expressed optimism about RunAnywhere's specific solution.

Functional Fuzziness & Installation Frustrations

A segment of the community found the project's precise scope unclear, struggling to distinguish whether it functions as a general LLM framework or a dedicated voice assistant. Practical installation challenges were also frequently reported, with users encountering segmentation faults and issues with Homebrew. These problems were often attributed to the specific Apple Silicon chip generation required for MetalRT's optimal operation, highlighting compatibility complexities.

Controversy and Company Credibility Concerns

A heated and extensive discussion unfolded over RunAnywhere's past conduct, specifically allegations that the company scraped GitHub profiles for email addresses and sent unsolicited spam. These concerns escalated into accusations of vote manipulation and the perceived use of new or low-karma accounts to artificially boost the post and bury critical comments. Hacker News moderator `dang` intervened to explain that the special "Launch HN" front-page placement accounted for the post's rapid rise, while also acknowledging and addressing the presence of "booster comments."