Lemonade by AMD: a fast and open source local LLM server using GPU and NPU
Lemonade is an open-source local AI server that runs large language models on personal computers, using GPUs and NPUs for speed and privacy. It installs in about a minute, covers text, image, and speech generation, and follows the OpenAI API standard, so a broad range of existing applications work with it out of the box. The project fits the Hacker News community's strong interest in accessible, high-performance local AI tools and open standards.
The Lowdown
Lemonade positions itself as a fast, open-source way to run AI models locally on personal computers, with first-class support for GPUs and NPUs. It aims to make local AI, spanning text, image, and speech generation, private, accessible, and high-performing for a wide audience. By emphasizing easy installation and broad compatibility, it targets developers and enthusiasts who want to experiment with AI without relying on cloud services.
- Performance & Hardware: Designed for speed on GPUs and NPUs, supporting unified RAM for large models like gpt-oss-120b and Qwen-Coder-Next.
- Open Source & Privacy: Built by the community to be free, open, fast, and private, ensuring user control over data.
- Ease of Use: Features a lightweight native C++ backend (2 MB), a one-minute installer, and automatic hardware configuration.
- API Compatibility: Integrates seamlessly with hundreds of applications due to its OpenAI API compatibility, providing a unified API for various modalities.
- Versatility: Supports multiple models concurrently, is cross-platform (Windows, Linux, macOS beta), and includes a built-in GUI for model management.
- Engine Support: Compatible with various AI engines like llama.cpp, Ryzen AI SW, and FastFlowLM.
- Active Development: Evidenced by frequent releases, such as v10.0.1 (March 2026), which brought Debian packages, improved Hugging Face GGUF integration, NPU support for specific models, and llama.cpp optimizations.
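Because Lemonade exposes an OpenAI-compatible API, any OpenAI-style client can talk to it by pointing its base URL at the local server. As a minimal sketch, the snippet below builds a standard `/chat/completions` request body using only the Python standard library; the base URL, port, and model name here are illustrative assumptions, not values confirmed by this summary.

```python
import json

# Hypothetical local endpoint -- adjust to match where your Lemonade
# server actually listens; this address is an assumption for illustration.
BASE_URL = "http://localhost:8000/api/v1"

def build_chat_request(model, prompt, temperature=0.7):
    """Build an OpenAI-style chat-completion request body as a dict."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# "gpt-oss-120b" is one of the models named above; any installed model works.
payload = build_chat_request("gpt-oss-120b", "Write a haiku about NPUs.")
body = json.dumps(payload)  # what an OpenAI-compatible client would POST
print(body)
```

In practice you would send this payload with an ordinary HTTP POST, or configure an existing OpenAI client library to use the local base URL, which is exactly what the API-compatibility bullet above enables.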
Lemonade seeks to democratize local AI development, offering a robust, flexible, and developer-friendly environment for running diverse AI workloads directly on personal hardware, bypassing the need for extensive cloud infrastructure.