Details
- Google DeepMind has introduced Gemini Robotics 1.5, enabling robots to autonomously plan and execute multi-step tasks rather than perform isolated movements.
- This release adds agentic reasoning (breaking a goal into steps, remembering prior actions, and replanning on the fly) on top of the vision-language-action approach established with RT-2 in 2023; a minimal sketch of such a loop appears after this list.
- In live demonstrations, a mobile robot adeptly performed household sequences like making a sandwich and cleaning up, linking 8–12 discrete steps without human intervention.
- The system is trained on a mix of video, language, and action data from simulated and real-world labs, then fine-tuned with reinforcement learning to improve safety and grasp accuracy.
- A new Robotics API converts high-level natural-language instructions (such as setting a dinner table) into low-level commands compatible with most ROS-based robotic arms; a hedged sketch of that translation also follows this list.
- Benchmark results show a 73 percent task-completion rate on domestic routines, up sharply from RT-2's 36 percent.
- The open-source launch includes research code, pretrained models, and a policy-safety checklist, with enterprise access available through Google Cloud Vertex AI.
- Initial rollout targets select universities, with wider cloud availability scheduled for Q1 2026.
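To make the plan/act/replan behavior described in the second bullet concrete, here is a minimal sketch of an agentic task loop. It is an illustration only, built around hypothetical `TaskAgent`, `plan`, and `execute` names, not the actual Gemini Robotics interface.

```python
from dataclasses import dataclass, field


@dataclass
class StepResult:
    step: str
    success: bool
    observation: str  # e.g. "bread located on counter"


@dataclass
class TaskAgent:
    goal: str
    memory: list = field(default_factory=list)

    def plan(self) -> list:
        # A real system would query the vision-language-action model here;
        # this stub returns a fixed decomposition of the demo goal.
        return ["locate bread", "pick up bread", "place bread on plate"]

    def execute(self, step: str) -> StepResult:
        # Stand-in for dispatching a low-level action to the robot.
        return StepResult(step=step, success=True, observation=f"completed: {step}")

    def run(self) -> None:
        steps = self.plan()
        while steps:
            result = self.execute(steps.pop(0))
            self.memory.append(result)   # remember prior actions
            if not result.success:
                steps = self.plan()      # replan dynamically on failure


if __name__ == "__main__":
    agent = TaskAgent(goal="make a sandwich")
    agent.run()
    print([r.step for r in agent.memory])
```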
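A similarly hedged sketch of the instruction-to-command translation mentioned in the Robotics API bullet follows. The `RoboticsClient` class and the `ArmCommand` message shape are hypothetical placeholders, not the published Google API or real ROS message types; they only show the general pattern of turning one instruction into a list of low-level arm commands.

```python
from dataclasses import dataclass


@dataclass
class ArmCommand:
    # Rough analogue of a ROS joint-trajectory point (hypothetical, simplified).
    joint_positions: list   # target joint angles in radians, one per joint
    gripper_open: bool      # desired gripper state after the motion
    duration_s: float       # time allotted to reach the target


class RoboticsClient:
    """Hypothetical client: high-level instruction in, per-step arm commands out."""

    def plan_commands(self, instruction: str) -> list:
        # A real backend would query the model; this returns canned output.
        return [
            ArmCommand(joint_positions=[0.0, -1.2, 1.5, 0.0, 0.3, 0.0],
                       gripper_open=True, duration_s=2.0),
            ArmCommand(joint_positions=[0.4, -0.8, 1.1, 0.0, 0.5, 0.0],
                       gripper_open=False, duration_s=1.5),
        ]


if __name__ == "__main__":
    client = RoboticsClient()
    for cmd in client.plan_commands("set the dinner table"):
        # In practice each command would be sent to a joint-trajectory controller.
        print(cmd)
```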
Impact
By advancing robots from single-action policies to fully autonomous task agents, Google intensifies competition with rivals such as Tesla's Optimus and Figure AI. The new cloud-powered APIs could make advanced robotics accessible to more startups, likely accelerating adoption across service sectors. Google's open research and emphasis on safety also set the pace for transparency in a field under increasing regulatory scrutiny.