Running local models is good now

Local large language models have finally become surprisingly capable for development tasks, marking a significant leap in their utility and efficiency. The author's hands-on experience demonstrates how models like Gemma 4 and GPT-OSS can handle agentic coding workflows previously exclusive to cloud APIs. This progress sparks lively Hacker News discussions about hardware costs, performance benefits, and the potential disruption to commercial AI services.

Score

Comments

Highest Rank

on Front Page

First Seen

Jun 16, 3:00 PM

Last Seen

Jun 16, 5:00 PM

Rank Over Time

The Lowdown

The author, an early adopter of local language models, reports a breakthrough in their capabilities, stating they are "surprisingly good now" for various development tasks. This marks a significant shift from earlier iterations that were slow, difficult to use, and often inaccurate.

Model Evolution: Recent models like OpenAI's GPT-OSS and Google's Gemma 4 have notably improved in accuracy, reducing the need for constant cross-verification with cloud API models.
Agentic Capabilities: The author successfully used local models for agentic coding workflows, including refactoring Python scripts, linting, proofreading blog posts, and bootstrapping new projects.
Setup Details: The setup involves a 2022 M2 Mac with 64GB RAM, utilizing tools like LM Studio for inference and Pi as the agentic harness, with workflows containerized in Docker for security.
Challenges Remain: Despite advancements, current limitations include slower inference speeds, context windows constrained by local hardware, and occasional issues with prompt template mismatches or reliable tool calls.
Benefits of Local: Running models locally offers unparalleled introspection, allowing users to observe token inference, adjust context windows, modify system prompts, and experiment with quantizations.
Architectural Efficiency: Models like Gemma-4-12b-qat are highlighted for their efficiency, posing crucial questions about architectural tradeoffs when performance and price are primary constraints.

While acknowledging that local models are not yet production-ready for all software development, the author emphasizes the immense and rapidly expanding possibilities for customization, experimentation, and a deeper understanding of AI, making it a critical area for continued investment and exploration.

The Gossip

Cloud vs. Core: The Local LLM Uprising

Commenters enthusiastically agree that local models are maturing rapidly, posing a significant threat to the business models of API-based AI providers. Many share experiences where local models offer a better user experience or even outperform expensive cloud alternatives for specific tasks. The increasing ease of use and declining costs are seen as strong drivers for a potential shift from "renting" to "owning" AI capabilities, prompting discussions about how commercial providers will adapt.

Hardware Hurdles: Budgeting for Local AI

The discussion often revolves around the financial barrier to entry for running performant local models. While the author uses a high-spec Mac, many users point out that comparable performance can be achieved with more budget-friendly (though still not "cheap") hardware or by leveraging optimized models requiring less VRAM. There's a recognition that for professionals, the cost of a high-end machine can be justified as a tool, yet it remains a significant investment for others.

Performance Puzzles: Accuracy, Speed, and Tooling

Users acknowledge the substantial progress in local model quality and speed for certain tasks, like focused editing or code refactoring. However, they also highlight existing limitations, particularly with complex agentic workflows, large context windows, and reliable tool calling, where state-of-the-art cloud models still often lead. There's a strong interest in understanding real-world performance metrics (tokens/second) and how local setups compare to cloud offerings for productivity-critical tasks, with some noting that current local solutions may slow down workflows for advanced use cases.