Details
- Google DeepMind has introduced Gemini Robotics 1.5, enabling robots to autonomously plan and execute multi-step tasks rather than perform isolated movements.
- This release adds agentic reasoning (breaking a goal into steps, remembering prior actions, and replanning on the fly) on top of the vision-language-action approach established with RT-2 in 2023; a minimal sketch of such a loop appears after this list.
- In live demonstrations, a mobile robot adeptly performed household sequences like making a sandwich and cleaning up, linking 8–12 discrete steps without human intervention.
- The system is trained on a mix of video, language, and action data from simulated and real-world labs, then fine-tuned with reinforcement learning to improve safety and grasp accuracy.
- A new Robotics API converts high-level natural-language instructions (such as setting a dinner table) into low-level commands compatible with most ROS-based robotic arms; a hedged sketch of that translation also follows this list.
- Benchmark results show a 73 percent task-completion rate on domestic routines, up sharply from RT-2's 36 percent.
- The open-source launch includes research code, pretrained models, and a policy-safety checklist, with enterprise access available through Google Cloud Vertex AI.
- Initial rollout targets select universities, with wider cloud availability scheduled for Q1 2026.
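To make the plan/act/replan behavior described in the second bullet concrete, here is a minimal sketch of an agentic task loop. It is an illustration only, built around hypothetical `TaskAgent`, `plan`, and `execute` names, not the actual Gemini Robotics interface.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    step: str
    success: bool
    observation: str  # e.g. "bread located on counter"


@dataclass
class TaskAgent:
    goal: str
    memory: list = field(default_factory=list)

    def plan(self) -> list:
        # A real system would query the vision-language-action model here;
        # this stub returns a fixed decomposition of the demo goal.
        return ["locate bread", "pick up bread", "place bread on plate"]

    def execute(self, step: str) -> StepResult:
        # Stand-in for dispatching a low-level action to the robot.
        return StepResult(step=step, success=True, observation=f"completed: {step}")

    def run(self) -> None:
        steps = self.plan()
        while steps:
            result = self.execute(steps.pop(0))
            self.memory.append(result)   # remember prior actions
            if not result.success:
                steps = self.plan()      # replan dynamically on failure


if __name__ == "__main__":
    agent = TaskAgent(goal="make a sandwich")
    agent.run()
    print([r.step for r in agent.memory])
```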
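A similarly hedged sketch of the instruction-to-command translation mentioned in the Robotics API bullet follows. The `RoboticsClient` class and the `ArmCommand` message shape are hypothetical placeholders, not the published Google API or real ROS message types; they only show the general pattern of turning one instruction into a list of low-level arm commands.

```python
from dataclasses import dataclass


@dataclass
class ArmCommand:
    # Rough analogue of a ROS joint-trajectory point (hypothetical, simplified).
    joint_positions: list   # target joint angles in radians, one per joint
    gripper_open: bool      # desired gripper state after the motion
    duration_s: float       # time allotted to reach the target


class RoboticsClient:
    """Hypothetical client: high-level instruction in, per-step arm commands out."""

    def plan_commands(self, instruction: str) -> list:
        # A real backend would query the model; this returns canned output.
        return [
            ArmCommand(joint_positions=[0.0, -1.2, 1.5, 0.0, 0.3, 0.0],
                       gripper_open=True, duration_s=2.0),
            ArmCommand(joint_positions=[0.4, -0.8, 1.1, 0.0, 0.5, 0.0],
                       gripper_open=False, duration_s=1.5),
        ]


if __name__ == "__main__":
    client = RoboticsClient()
    for cmd in client.plan_commands("set the dinner table"):
        # In practice each command would be sent to a joint-trajectory controller.
        print(cmd)
```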
Impact
By advancing robots from single-action policies to fully autonomous task agents, Google intensifies competition with rivals such as Tesla's Optimus and Figure AI. The new cloud-powered APIs could make advanced robotics accessible to more startups, likely accelerating adoption across service sectors. Google's open research and emphasis on safety also set the pace for transparency in a field under increasing regulatory scrutiny.