Show HN: Dual YOLOv8n UAV Detection on RK3588S at 42 FPS Using NPU
This project demonstrates a real-time YOLOv8n UAV-detection pipeline on the low-cost Rockchip RK3588S SoC. It runs at the camera's maximum frame rate while keeping its memory footprint small. It reaches 46 FPS detection and also runs an on-device LLM, relying on hardware acceleration at every stage and on a composable, multi-process architecture. The write-up fits HN's interest in efficient embedded AI and practical hardware optimization.
The Lowdown
The GitHub project khadas_yolov8n_multithread, an independent project by alebal123bal, is an optimized real-time computer-vision pipeline for UAV detection on the Rockchip RK3588S System-on-Chip. It aims for high edge-inference throughput with minimal resource consumption.
- Performance Breakthrough: The system runs YOLOv8n UAV detection at 46 frames per second (FPS), the maximum frame rate the camera sensor delivers. It gets there by running inference on all three NPU cores of the RK3588S in parallel, which raised throughput from an initial ~31 FPS.
- Resource Efficiency: Every heavy operation runs on dedicated hardware instead of the CPU: capture on the ISP, color conversion and resizing on the RGA, and inference on the NPU. Memory use stays flat at roughly 140 MB of RAM per stream, so the pipeline fits even the cheapest 2 GB RK3588S boards (around €90).
- Composable Architecture: The pipeline is a chain of small, independent processes connected by Unix-domain sockets: detection, multi-object tracking (ByteTrack), temporal feature extraction, a presence finite-state machine (FSM), and an on-demand LLM summary.
- Integrated LLM: An on-device LLM (Qwen2.5-0.5B) runs on the same NPU. When a tracked UAV leaves the scene, a control plane frees the NPU for the LLM, which generates a natural-language assessment, and then hands the NPU back to the cameras.
- Scalability: The design runs two camera streams concurrently while keeping memory use low (around 290 MB for both streams, roughly twice the single-stream figure) and maintaining its performance.
- Open Source & Educational: Released under the Apache License 2.0, the project is intended for educational and research purposes only, not for production or safety-critical applications. It ships detailed documentation, build scripts for native and cross-compilation, and links to related repositories for model training and LLM optimization.
The project is a useful reference for anyone trying to maximize AI inference performance on constrained embedded hardware: it shows how careful resource management and a composable software architecture can unlock substantial capability on inexpensive edge devices.
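Using all three NPU cores in parallel, as described above, can be pictured as round-robin dispatch with a reorder buffer. The sketch below is an assumption about how such a scheduler might look, not the project's code. `run_pipelined` and `fake_infer` are hypothetical names, and Python threads stand in for per-core inference workers. A real pipeline would bind each worker to one NPU core through the vendor runtime.

```python
import queue
import threading
import time
from typing import Callable, Dict, List, Tuple

NUM_NPU_CORES = 3  # the RK3588S NPU has three cores


def run_pipelined(frames: List[int],
                  infer: Callable[[int, int], str],
                  num_cores: int = NUM_NPU_CORES) -> List[Tuple[int, str]]:
    """Send frames round-robin to one worker per NPU core and return the
    results in the original frame order."""
    in_queues = [queue.Queue() for _ in range(num_cores)]
    results: queue.Queue = queue.Queue()

    def worker(core_id: int) -> None:
        while True:
            item = in_queues[core_id].get()
            if item is None:  # sentinel: no more frames for this core
                return
            idx, frame = item
            results.put((idx, infer(core_id, frame)))

    threads = [threading.Thread(target=worker, args=(c,)) for c in range(num_cores)]
    for t in threads:
        t.start()

    # Round-robin dispatch: frame i goes to core i % num_cores.
    for idx, frame in enumerate(frames):
        in_queues[idx % num_cores].put((idx, frame))
    for q in in_queues:
        q.put(None)

    # Reorder buffer: cores finish out of order, but downstream stages
    # (e.g. a tracker) expect frames in capture order.
    pending: Dict[int, str] = {}
    ordered: List[Tuple[int, str]] = []
    next_idx = 0
    for _ in range(len(frames)):
        idx, det = results.get()
        pending[idx] = det
        while next_idx in pending:
            ordered.append((next_idx, pending.pop(next_idx)))
            next_idx += 1

    for t in threads:
        t.join()
    return ordered


def fake_infer(core_id: int, frame: int) -> str:
    """Hypothetical stand-in for per-core inference with uneven latency."""
    time.sleep(0.005 * (3 - frame % 3))  # later frames in a round often finish first
    return f"core{core_id}:frame{frame}"


if __name__ == "__main__":
    for idx, det in run_pipelined(list(range(6)), fake_infer):
        print(idx, det)
```

Round-robin dispatch keeps up to three frames in flight at once. The reorder buffer costs a little latency but ensures that later stages still receive frames in capture order.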
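The processes in the chain communicate over Unix-domain sockets. A stream socket carries bytes, not messages, so each stage needs a framing scheme. The sketch below assumes a 4-byte length prefix followed by a JSON payload. The framing, the `send_msg`/`recv_msg` helpers, and the detection schema are all hypothetical, and the project may use a different wire format. `socket.socketpair` stands in for two processes connected through a socket path.

```python
import json
import socket
import struct
from typing import Any, Optional

HEADER = struct.Struct("!I")  # hypothetical framing: 4-byte big-endian payload length


def send_msg(sock: socket.socket, obj: Any) -> None:
    """Serialize obj as JSON and send it with a length prefix."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> Optional[bytes]:
    """Read exactly n bytes, or return None if the peer closed first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> Optional[Any]:
    """Receive one length-prefixed JSON message; None means end of stream."""
    header = _recv_exact(sock, HEADER.size)
    if header is None:
        return None
    (length,) = HEADER.unpack(header)
    payload = _recv_exact(sock, length)
    if payload is None:
        return None
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    # Detector end and tracker end of one hop in the chain.
    det_end, trk_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    # Hypothetical schema: frame index plus [x1, y1, x2, y2, score] boxes.
    send_msg(det_end, {"frame": 0, "boxes": [[10, 20, 50, 60, 0.91]]})
    send_msg(det_end, {"frame": 1, "boxes": []})
    det_end.close()  # closing signals end of stream to the reader
    while (msg := recv_msg(trk_end)) is not None:
        print("tracker got", msg)
    trk_end.close()
```

Because every stage reads and writes only this small protocol, any stage can be restarted, replaced, or run on its own for debugging without touching the others.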
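The presence FSM and the NPU handoff to the LLM can be sketched together. This is a hypothetical illustration. The two-state design, the debounce thresholds `enter_frames`/`exit_frames`, and the `NpuArbiter` callbacks are assumptions, not details taken from the project. The FSM declares a UAV present or gone only after several consecutive frames agree, so a single missed detection does not trigger an LLM run. On a `leave` event, the arbiter pauses camera inference, runs the summary, and resumes the cameras in a `finally` block so the NPU is always handed back.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional, Set


class Presence(Enum):
    ABSENT = auto()
    PRESENT = auto()


@dataclass
class PresenceFSM:
    """Debounced presence detector fed with the UAV track IDs of each frame."""
    enter_frames: int = 3   # hypothetical: consecutive hit frames to declare PRESENT
    exit_frames: int = 10   # hypothetical: consecutive empty frames to declare ABSENT
    state: Presence = Presence.ABSENT
    last_episode: List[int] = field(default_factory=list)
    _streak: int = 0
    _ids: Set[int] = field(default_factory=set)

    def update(self, track_ids: List[int]) -> Optional[str]:
        """Advance one frame; return "enter", "leave", or None."""
        seen = bool(track_ids)
        if self.state is Presence.ABSENT:
            if not seen:
                self._streak = 0
                self._ids.clear()  # a broken streak was noise; forget its IDs
                return None
            self._streak += 1
            self._ids.update(track_ids)
            if self._streak >= self.enter_frames:
                self.state, self._streak = Presence.PRESENT, 0
                return "enter"
            return None
        # PRESENT: accumulate IDs and count consecutive empty frames.
        self._ids.update(track_ids)
        self._streak = 0 if seen else self._streak + 1
        if self._streak >= self.exit_frames:
            self.state, self._streak = Presence.ABSENT, 0
            self.last_episode = sorted(self._ids)
            self._ids.clear()
            return "leave"
        return None


class NpuArbiter:
    """Hypothetical control plane: the cameras own the NPU until a UAV leaves,
    then it is lent to the LLM for one summary and handed back."""

    def __init__(self, pause: Callable[[], None], resume: Callable[[], None],
                 summarize: Callable[[List[int]], str]) -> None:
        self.pause, self.resume, self.summarize = pause, resume, summarize

    def on_event(self, event: Optional[str], fsm: PresenceFSM) -> Optional[str]:
        if event != "leave":
            return None
        self.pause()  # stop camera inference so the NPU is free
        try:
            return self.summarize(fsm.last_episode)
        finally:
            self.resume()  # always give the NPU back, even if the LLM call fails


if __name__ == "__main__":
    fsm = PresenceFSM(enter_frames=2, exit_frames=3)
    arb = NpuArbiter(pause=lambda: print("cameras paused"),
                     resume=lambda: print("cameras resumed"),
                     summarize=lambda ids: f"UAV tracks {ids} left the scene")
    for ids in [[], [7], [7], [7, 9], [], [], []]:
        event = fsm.update(ids)
        summary = arb.on_event(event, fsm)
        if summary:
            print(summary)
```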