Show HN: Semble – Code search for agents that uses 98% fewer tokens than grep

Semble is a new open-source code search tool built for AI agents, promising large token and speed savings over the usual grep-and-read approach. It aims to reduce how much of an LLM's context window gets consumed by returning only relevant code snippets, a common pain point in agentic workflows. Its technical approach, which combines embeddings and lexical search with code-aware reranking, drew keen interest from developers looking to optimize their AI coding assistants.

First Seen: May 17, 8:00 PM
Last Seen: May 18, 4:00 AM

The Lowdown

Semble is an open-source code search library designed to improve the efficiency and accuracy of AI agents working with large codebases. Its creators, frustrated by agents' token-heavy reliance on grep+read operations that often miss crucial context, built Semble to return precise, relevant code snippets.

Key features include:

Token-efficient: Uses 98% fewer tokens than grep+read by returning only pertinent code chunks.
Fast: Indexes typical repositories in ~250ms and answers queries in ~1.5ms, all on CPU.
Accurate: Achieves 99% of the retrieval quality of a 137M-parameter code-trained transformer, with an NDCG@10 of 0.854.
Zero Config: Operates without external services, API keys, or GPUs.
MCP Server Compatibility: Offers drop-in integration with popular agent platforms like Claude Code, Cursor, and Codex.
Hybrid Retrieval: Combines static Model2Vec embeddings for semantic similarity with BM25 for lexical matches. The two rankings are fused via Reciprocal Rank Fusion (RRF), then reranked with code-aware signals such as definition boosts and identifier stemming.
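
The fusion step in the hybrid retrieval item can be illustrated with a minimal sketch. It assumes the standard reciprocal rank fusion formula with the commonly used constant k = 60. The function name, chunk ids, and both input rankings are hypothetical, and this is not Semble's actual code, which also applies code-aware reranking afterwards.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is a conventional default (assumption).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from an embedding search and a BM25 search.
semantic = ["auth.py:login", "session.py:refresh", "db.py:connect"]
lexical = ["session.py:refresh", "utils.py:hash_password", "auth.py:login"]

fused = reciprocal_rank_fusion([semantic, lexical])
print(fused)
```

Chunks that rank well in both lists rise to the top, while chunks found by only one retriever are kept but ranked lower. Because only ranks are used, the two retrievers' raw scores never need to be put on a common scale.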

By intelligently identifying and presenting only the most relevant code, Semble offers a powerful solution to the token-hungry nature of AI code assistants, potentially making them faster, cheaper, and more effective at understanding and navigating complex projects.

The Gossip

Token Tangle Takedown

Initial confusion arose regarding the claimed '98% fewer tokens than grep,' as grep itself doesn't consume LLM tokens. The co-author clarified that the comparison is against the common agent workflow of `grep+read`, where an agent first uses grep and then reads entire files into its context window. Semble's efficiency stems from its ability to deliver only relevant code snippets, drastically reducing the amount of data an agent needs to process.
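
To make the `grep+read` comparison concrete, here is a toy sketch of why reading whole files costs more context than returning snippets. Everything in it is an assumption made for illustration: the 4-characters-per-token estimate, the helper names, the sample files, and the plain substring match that stands in for Semble's retrieval. It does not reproduce the 98% figure.

```python
def estimate_tokens(text):
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def grep_then_read(files, pattern):
    """Tokens consumed when every file containing the pattern is read in full."""
    return sum(estimate_tokens(body) for body in files.values() if pattern in body)


def snippet_only(files, pattern, context=1):
    """Tokens consumed when only matching lines plus `context` neighbours are returned."""
    total = 0
    for body in files.values():
        lines = body.splitlines()
        keep = set()
        for i, line in enumerate(lines):
            if pattern in line:
                lo = max(0, i - context)
                hi = min(len(lines), i + context + 1)
                keep.update(range(lo, hi))
        if keep:
            total += estimate_tokens("\n".join(lines[i] for i in sorted(keep)))
    return total


# Hypothetical repository: one large file with a single relevant function.
filler = [f"def helper_{i}(): pass" for i in range(200)]
files = {
    "auth.py": "\n".join(filler + ["def refresh_token(session): return session.renew()"]),
    "README.md": "Project overview with no matching identifier.",
}

full_cost = grep_then_read(files, "refresh_token")
snippet_cost = snippet_only(files, "refresh_token")
print(f"grep+read: {full_cost} tokens, snippets: {snippet_cost} tokens")
```

The gap grows with file size: the cost of `grep+read` scales with the length of every matching file, while the snippet cost scales only with the number of matches.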

Agent Adoption and Agent's Affirmation

Commenters expressed a strong desire for real-world agent benchmarks, questioning whether agents, often fine-tuned on `grep`-based workflows, would actually trust and use Semble's results without re-reading files or retrying searches. The concern points to a potential adoption hurdle: the token savings disappear if agents fall back to their old habits. The co-author acknowledged this as a roadmap item, noting that, anecdotally, Anthropic models do trust Semble's results.

Scope and Semantic Search Spread

A natural question emerged about Semble's applicability beyond code, specifically whether it could be used for general text documents like API documentation or AI memory files. The co-author confirmed that this is an active area of investigation: a recently added flag (`--include-text-files`) lets Semble index regular documents, and it is expected to perform 'relatively well' in such contexts.