HN Today

Making MCP cheaper via CLI

This technical deep dive demonstrates how AI agents can significantly reduce token costs by switching from MCP (Model Context Protocol) to a CLI-based approach for tool discovery and execution. By lazily loading tool definitions only when needed, the method promises up to 94% token savings, making AI interactions with external tools far more efficient. The author also introduces CLIHub, a platform to facilitate this transition.

  • Score: 16
  • Comments: 14
  • Highest Rank: #2
  • Time on Front Page: 6h
  • First Seen: Feb 25, 9:00 PM
  • Last Seen: Feb 26, 2:00 AM

The Lowdown

AI agents often incur unnecessary token costs by pre-loading extensive tool instruction manuals. Traditionally, the MCP approach dumps full JSON Schemas for all available tools into the agent's context upfront, leading to high token consumption even for unused tools.

  • The Problem: MCP's method of providing tool definitions involves loading verbose JSON Schemas for every tool at session start, resulting in thousands of tokens being consumed unnecessarily.
  • The Solution: The CLI-based approach, demonstrated with CLIHub, shifts from a 'push' to a 'pull' model. Instead of pre-loading all schemas, it provides a lightweight list of available CLIs. The agent then dynamically discovers specific tool details using a --help command only when a tool is required.
  • Token Savings: This lazy loading significantly reduces token usage. For instance, session start tokens drop by 98% (from ~15,540 to ~300 tokens). Overall, using CLIs can achieve 92-94% token savings compared to MCP for various numbers of tools.
  • Comparison to Anthropic's Tool Search: While Anthropic's Tool Search also employs lazy loading, it still pulls the full JSON Schema for each tool it loads, making it more expensive than the CLI method (the CLI approach is 74-88% cheaper for tool interactions). The CLI method is also model-agnostic.
  • CLIHub: To support this, the author built CLIHub, a directory of CLIs for agents and an open-source converter to generate CLIs from existing MCPs.

By leveraging the familiar CLI paradigm, AI agents can become substantially more cost-effective and performant by only consuming tokens for the information they truly need, when they need it.
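The quoted figures can be checked with a back-of-the-envelope model. The session-start totals (~15,540 vs. ~300 tokens) are the article's numbers; the per-tool averages below are illustrative assumptions chosen only to be consistent with them (e.g., 21 tools × 740 tokens ≈ 15,540).

```python
# Back-of-the-envelope token accounting for push (MCP) vs. pull (CLI).
# Session-start totals are the article's figures; per-tool averages are
# assumptions, not measurements.
MCP_SESSION_START_TOKENS = 15_540   # full JSON Schemas loaded upfront
CLI_SESSION_START_TOKENS = 300      # terse index of available CLIs

AVG_SCHEMA_TOKENS_PER_TOOL = 740    # assumed: 21 tools x 740 ~= 15,540
AVG_HELP_TOKENS_PER_TOOL = 90       # assumed size of one --help output

def tokens_push(n_tools: int) -> int:
    """Push model: every tool's schema is in context, used or not."""
    return n_tools * AVG_SCHEMA_TOKENS_PER_TOOL

def tokens_pull(n_used: int) -> int:
    """Pull model: the index, plus --help for tools actually used."""
    return CLI_SESSION_START_TOKENS + n_used * AVG_HELP_TOKENS_PER_TOOL

savings = 1 - CLI_SESSION_START_TOKENS / MCP_SESSION_START_TOKENS
print(f"Session-start savings: {savings:.1%}")   # ~98.1%
print(f"Two tools used: {tokens_pull(2)} vs {tokens_push(21)} tokens")
```

Under these assumptions, a session that touches only two of 21 tools consumes a few hundred tokens of tool documentation instead of the full fifteen thousand, which is where the headline 92-94% figure comes from.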

The Gossip

Effectiveness vs. Efficiency: Token Tactics

Commenters questioned whether the token savings of the CLI approach might compromise the agent's understanding or effectiveness, especially for complex or internal tools lacking intrinsic context. The author provided a detailed example showing how an LLM would use `linear --help` to dynamically discover commands, suggesting that current models are proficient enough to make this lazy loading effective without losing critical context.

Architectural Alternatives & Design Debates

The discussion delved into broader architectural choices, with users asking about similar existing projects like MCPorter and pondering the fundamental differences between schema-based and CLI-based tool descriptions. There was debate around the 'push' (MCP) vs. 'pull' (CLI) model for tool information, the perceived 'messiness' of CLIs compared to structured schemas, and the general trend of avoiding JSON for token efficiency.

Real-World Implications & LLM Capabilities

Users shared insights on LLM behavior, including one user's AI bluntly dismissing MCP as 'hipster bullshit'. Others noted the distinction between human-in-the-loop coding agents and fully deployed autonomous agents, suggesting that the trade-off between verbose documentation and efficiency may differ between the two. Questions also arose about LLMs' ability to compress documents while retaining full context.
