Details
- Google DeepMind has introduced Gemini Robotics 1.5, an agentic platform that combines two AI models to enable real-world robots to execute complex, multi-step tasks.
- The system pairs Gemini Robotics-ER 1.5, a high-level embodied-reasoning model that plans tasks, with a low-level vision-language-action (VLA) controller that translates those plans into precise physical movements.
- This setup lets robots look up information via Google Search, follow local rules, decompose goals into subtasks, and adjust plans on the fly, for example sorting municipal waste according to local recycling rules or packing a suitcase based on the weather forecast.
- The models generate an internal natural-language chain of thought before acting, giving developers and auditors a transparent view into the robot’s decision-making.
- DeepMind reports record-setting performance on academic and internal benchmarks, and notes that skills learned on one robot can transfer to different robot bodies without retraining.
- The embodied-reasoning model is now available through the Gemini API in Google AI Studio, opening it up to researchers and commercial robotics teams (a minimal usage sketch follows this list).
- DeepMind positions the release as a step toward “AGI in the physical world,” moving beyond single-step instructions toward more general problem-solving.
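
For teams who want to try the planner side of this stack, the embodied-reasoning model can be called like any other Gemini model. The sketch below assumes the google-genai Python SDK and a model identifier along the lines of "gemini-robotics-er-1.5-preview" (check AI Studio for the exact id); the hand-off to a low-level VLA controller is purely illustrative, since DeepMind has not published a public interface for the on-robot model.

```python
# Minimal planning sketch, assuming the google-genai Python SDK and an
# embodied-reasoning model id like "gemini-robotics-er-1.5-preview".
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

# Ask the embodied-reasoning model to break a high-level goal into subtasks.
plan = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed id; confirm in AI Studio
    contents=(
        "Goal: sort the objects on the table into recycling, compost, and trash "
        "following the local municipal rules. "
        "Return a numbered list of short physical subtasks."
    ),
)

# Hypothetical hand-off: in the full Gemini Robotics 1.5 stack, each subtask
# would go to the low-level VLA controller, which turns natural-language
# instructions into motor commands on the robot.
for step in plan.text.splitlines():
    if step.strip():
        print("queue for VLA controller:", step.strip())
```

The point of the sketch is the division of labor the bullets describe: the ER model handles open-ended reasoning and task decomposition, while a separate controller handles motor-level execution.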
Impact
This release heats up competition with OpenAI, Tesla (with Optimus), and Figure AI, all of which are pursuing advanced reasoning in robotics. By making Robotics-ER 1.5 available through the Gemini API, Google could spark a surge in robotics development across industries, while the models’ transparent reasoning could help teams meet emerging safety regulations. If the benchmark results hold up in real-world environments, the market may shift toward language-based, cloud-connected robot control systems in the coming years.