Google DeepMind on Tuesday released a new language model called Gemini Robotics On-Device that can run tasks locally on robots without requiring an internet connection.
Building on the company’s previous Gemini Robotics model, released in March, Gemini Robotics On-Device can control a robot’s movements. Developers can control and fine-tune the model to suit various needs using natural language prompts.
In benchmarks, Google claims the model performs at a level close to the cloud-based Gemini Robotics model. The company says it outperforms other on-device models in general benchmarks, though it didn’t name those models.
In a demo, the company showed robots running this local model doing things like unzipping bags and folding clothes. Google says that while the model was trained for ALOHA robots, it later adapted it to work on the bi-arm Franka FR3 robot and the Apollo humanoid robot by Apptronik.
Google claims the bi-arm Franka FR3 was successful in tackling scenarios and objects it hadn’t “seen” before, like doing assembly on an industrial belt.
Google DeepMind is also releasing a Gemini Robotics SDK. The company said developers can train robots on new tasks by showing them 50 to 100 demonstrations of those tasks, using these models in the MuJoCo physics simulator.
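To make the workflow concrete, here is a minimal sketch of how a batch of teleoperated demonstrations might be packaged for fine-tuning. The function name and the observation/action data layout are hypothetical illustrations, not the actual Gemini Robotics SDK API; the only detail taken from Google's announcement is the 50-to-100 demonstration range.

```python
# Hypothetical sketch: validate and flatten recorded demonstrations
# into (observation, action) training pairs. Not the real SDK API.

def make_finetuning_dataset(demos, min_demos=50, max_demos=100):
    """Check the demo count is in the recommended range, then flatten."""
    if not (min_demos <= len(demos) <= max_demos):
        raise ValueError(
            f"expected {min_demos}-{max_demos} demonstrations, got {len(demos)}"
        )
    # Each demo is assumed to be a dict with an observation and the
    # teleoperated action recorded for that step.
    return [(d["obs"], d["action"]) for d in demos]


# Example: 60 recorded demonstrations of a single task.
demos = [{"obs": [0.0, 1.0], "action": [0.5]} for _ in range(60)]
dataset = make_finetuning_dataset(demos)
```

In practice each demonstration would be a full trajectory (camera frames, joint states, and commanded actions per timestep) collected by teleoperating the robot in simulation, but the validate-then-flatten shape above is the same.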
Other AI model developers are also dipping their toes in robotics. Nvidia is building a platform to create foundation models for humanoids; Hugging Face is not only developing open models and datasets for robotics, it is actually working on robots too; and Mirae Asset-backed Korean startup RLWRLD is working on creating foundational models for robots.