Show HN: sllm – Split a GPU node with other developers, unlimited tokens
sllm tackles the prohibitive cost of running large language models by letting developers split access to a shared GPU node. It addresses a common pain point: needing powerful AI compute for occasional use without paying for an entire node. This "Show HN" presents a privacy-focused, OpenAI-compatible option for broadening LLM access, and it should appeal to HN's developer audience.
The Lowdown
sllm takes a pragmatic approach to making powerful large language models (LLMs) accessible to individual developers without the cost of dedicated high-end GPU infrastructure. Recognizing that most developers use only a fraction of the throughput a node serving a model like DeepSeek V3 can deliver, sllm proposes a shared-economy model for GPU nodes.
- Cost-Effective Access: Addresses the expense of running large LLMs (e.g., the 685B-parameter DeepSeek V3, which needs 8xH100 GPUs at roughly $14k/month) by letting developers pay only for the capacity they need, starting at $5/month.
- Cohort-Based Sharing: Developers join cohorts to share a dedicated GPU node; billing begins only once a cohort is fully subscribed.
- Privacy-Focused Design: Emphasizes that usage is completely private, with no logging of user traffic, addressing a significant concern for many AI developers.
- OpenAI API Compatibility: Exposes an OpenAI-compatible API served by vLLM, so existing clients integrate by changing only the base URL (see the sketch after this list).
- Targeted Use: Caters to developers who typically require moderate token processing rates (15-25 tokens/second) rather than continuous, high-volume compute.
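Because the node speaks the OpenAI API, switching an existing project over is essentially a configuration change. Here is a minimal sketch using the official openai Python client; the base URL, API key, and model name below are placeholder assumptions, since the real values depend on the node your cohort is assigned:

```python
from openai import OpenAI

# Point the standard OpenAI client at the shared node instead of api.openai.com.
# base_url, api_key, and model are hypothetical placeholders; use the values
# issued when you join a cohort.
client = OpenAI(
    base_url="https://your-node.example.com/v1",  # assumption: your node's endpoint
    api_key="YOUR_SLLM_API_KEY",                  # assumption: cohort-issued key
)

response = client.chat.completions.create(
    model="deepseek-v3",  # assumption: whatever model the node serves
    messages=[
        {"role": "user", "content": "Summarize continuous batching in one sentence."}
    ],
    stream=True,  # stream tokens as they arrive
)

# Print the completion incrementally as tokens come back from the node.
for chunk in response:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```

Streaming is worth enabling by default here: at the 15-25 tokens/second the post targets, incremental output keeps interactive use responsive even on a shared node.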
By pooling resources and focusing on privacy and ease of integration, sllm aims to lower the barrier to entry for developers looking to experiment with or integrate large-scale AI models into their projects.