RynnBrain
Alibaba's DAMO Academy unveils RynnBrain, an open embodied foundation model that bridges AI reasoning with physical reality, offering comprehensive egocentric understanding and physics-aware planning. This release provides researchers with 2B and 8B dense models and a 30B Mixture-of-Experts model, alongside fine-tuned variants for core robotics tasks. It aims to accelerate advancements in physically grounded AI systems by making these powerful models and benchmarks accessible.
The Lowdown
RynnBrain, developed by Alibaba's DAMO Academy, is an open embodied foundation model designed to ground AI reasoning in the physical world. It represents a significant step toward more capable, physically aware AI systems, shipping in multiple sizes with task-specialized variants for different applications.
- RynnBrain is an open-source embodied foundation model available in dense variants (2B, 8B) and a Mixture-of-Experts (MoE) model (30B-A3B); a hedged loading sketch follows this list.
- It also features specialized post-trained models: RynnBrain-Plan for robot task planning, RynnBrain-Nav for vision-language navigation, and RynnBrain-CoP for chain-of-point reasoning.
- Key capabilities include comprehensive egocentric understanding (fine-grained video analysis, embodied QA), diverse spatio-temporal localization (object, area, and trajectory identification), physical-space reasoning (alternating textual and spatial grounding), and physics-aware precise planning.
- The model employs a unified encoder-decoder architecture to process multi-modal inputs and generate outputs such as spatial trajectories and action plans (a toy sketch of this pattern appears after the list).
- Performance benchmarks are provided, demonstrating its efficacy in general embodied understanding, robot task planning, and vision-language navigation.
- A "Model Zoo" lists all available models on HuggingFace and ModelScope, while "Cookbooks" offer practical examples of its cognitive, localization, reasoning, and planning abilities.
- The project also introduces RynnBrain-Bench, a multi-dimensional benchmark for evaluating embodied understanding across object cognition, spatial cognition, grounding, and pointing; a minimal evaluation-loop sketch closes out the examples below.
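For readers who want to try a checkpoint, here is a minimal loading sketch. The repo ID, prompt format, and processor interface are assumptions made for illustration; check the Model Zoo pages on HuggingFace or ModelScope for the actual names and usage instructions.

```python
# Hypothetical usage sketch: the model ID and the exact processor interface
# are assumptions, not confirmed by the release. Adjust to the checkpoints
# listed in the Model Zoo on HuggingFace/ModelScope.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "Alibaba-DAMO-Academy/RynnBrain-8B"  # assumed repo name

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# An egocentric QA / pointing query over a single frame; the prompt wording
# and expected output format are illustrative only.
image = Image.open("kitchen_frame.jpg")
prompt = "You are a household robot. Point to the mug nearest the sink."
inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```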
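To make the "unified encoder-decoder" idea concrete, the toy PyTorch sketch below shows the general pattern: vision and text tokens are fused into a single encoder sequence, and a shared decoder emits both plan tokens and 2D waypoints. This is a generic illustration of the technique, not RynnBrain's actual architecture, layer sizes, or output heads.

```python
# Illustrative toy, not the actual RynnBrain architecture: one encoder-decoder
# fuses vision and text tokens, then decodes text/plan tokens and (x, y)
# trajectory waypoints from the same latent sequence.
import torch
import torch.nn as nn

class UnifiedEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, d_model)
        self.vision_proj = nn.Linear(768, d_model)  # project ViT patch features
        self.backbone = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.text_head = nn.Linear(d_model, vocab_size)  # action-plan tokens
        self.point_head = nn.Linear(d_model, 2)          # (x, y) waypoints

    def forward(self, patch_feats, text_ids, tgt_ids):
        # Concatenate modalities into one encoder sequence (the "unified" part).
        src = torch.cat([self.vision_proj(patch_feats),
                         self.text_embed(text_ids)], dim=1)
        hidden = self.backbone(src, self.text_embed(tgt_ids))
        return self.text_head(hidden), self.point_head(hidden)

model = UnifiedEncoderDecoder()
logits, waypoints = model(
    torch.randn(1, 196, 768),           # 14x14 grid of ViT patch features
    torch.randint(0, 32000, (1, 16)),   # instruction tokens
    torch.randint(0, 32000, (1, 8)),    # decoder input tokens
)
print(logits.shape, waypoints.shape)    # (1, 8, 32000) (1, 8, 2)
```

Decoding trajectories and text from one latent sequence is what lets a single model serve planning, navigation, and pointing tasks without separate task-specific networks.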
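Finally, a hypothetical harness for a RynnBrain-Bench-style evaluation. The JSONL file layout, field names, and category labels here are assumptions based only on the four axes named above; the real benchmark format may differ.

```python
# Hypothetical evaluation-loop sketch; the file format and field names
# ("category", "question", "answer") are assumptions for illustration.
import json
from collections import defaultdict

def evaluate(pred_fn, bench_path="rynnbrain_bench.jsonl"):
    correct, total = defaultdict(int), defaultdict(int)
    with open(bench_path) as f:
        for line in f:
            item = json.loads(line)   # assumed: one benchmark item per line
            cat = item["category"]    # e.g. "object_cognition", "pointing"
            total[cat] += 1
            if pred_fn(item) == item["answer"]:
                correct[cat] += 1
    # Per-category accuracy, mirroring the benchmark's four axes.
    return {cat: correct[cat] / total[cat] for cat in total}

# Plug in any callable that maps a benchmark item to an answer string.
scores = evaluate(lambda item: "mug")
print(scores)  # e.g. {"object_cognition": 0.41, "pointing": 0.12, ...}
```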
This release empowers researchers and developers with a robust foundation model and tools for building AI systems that can better comprehend and interact with their physical environment.