Show HN: sllm – Split a GPU node with other developers, unlimited tokens
sllm tackles the prohibitive cost of running large language models by letting developers split access to a shared GPU node. It addresses a common pain point: needing powerful AI compute for occasional use without paying for an entire node. This "Show HN" presents a privacy-focused, OpenAI-compatible option for broadening LLM access, and it should appeal to HN's developer audience.
The Lowdown
sllm takes a pragmatic approach to making powerful large language models (LLMs) accessible to individual developers without the cost of dedicated high-end GPU infrastructure. Recognizing that most developers use only a fraction of the throughput a node serving a model like DeepSeek V3 can deliver, sllm proposes a shared-economy model for GPU nodes.
- Cost-Effective Access: Addresses the expense of running large LLMs (e.g., the 685B-parameter DeepSeek V3, which needs 8xH100 GPUs at roughly $14k/month) by letting developers pay only for the capacity they need, starting at $5/month.
- Cohort-Based Sharing: Developers join cohorts to share a dedicated GPU node; billing begins only once a cohort is fully subscribed.
- Privacy-Focused Design: Emphasizes that usage is completely private, with no logging of user traffic, addressing a significant concern for many AI developers.
- OpenAI API Compatibility: Exposes an OpenAI-compatible API served by vLLM, so existing clients integrate by changing only the base URL (see the sketch after this list).
- Targeted Use: Caters to developers who typically require moderate token processing rates (15-25 tokens/second) rather than continuous, high-volume compute.
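Because the node speaks the OpenAI API, switching an existing project over is essentially a configuration change. Here is a minimal sketch using the official openai Python client; the base URL, API key, and model name below are placeholder assumptions, since the real values depend on the node your cohort is assigned:

```python
from openai import OpenAI

# Point the standard OpenAI client at the shared node instead of api.openai.com.
# base_url, api_key, and model are hypothetical placeholders; use the values
# issued when you join a cohort.
client = OpenAI(
    base_url="https://your-node.example.com/v1",  # assumption: your node's endpoint
    api_key="YOUR_SLLM_API_KEY",                  # assumption: cohort-issued key
)

response = client.chat.completions.create(
    model="deepseek-v3",  # assumption: whatever model the node serves
    messages=[
        {"role": "user", "content": "Summarize continuous batching in one sentence."}
    ],
    stream=True,  # stream tokens as they arrive
)

# Print the completion incrementally as tokens come back from the node.
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming is worth enabling by default here: at the 15-25 tokens/second the post targets, incremental output keeps interactive use responsive even on a shared node.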
By pooling resources and focusing on privacy and ease of integration, sllm aims to lower the barrier to entry for developers looking to experiment with or integrate large-scale AI models into their projects.