Generative AI has already shown a lot of promise in robotics. Its uses include natural language interaction, robot learning, no-code programming, and even design. This week, Google’s DeepMind Robotics team is showing off navigation as another area where the two fields could work well together.
In the paper “Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs,” the team shows how it used Google Gemini 1.5 Pro to teach a robot to follow directions and move around an office. Naturally, DeepMind used some of the Everyday Robots that have been lying around since last year, when Google shut down the project and laid off a lot of people.
In a set of videos related to the project, DeepMind employees start by saying “OK, Robot” in the manner of a smart assistant. They then ask the system to perform different tasks around the 9,000-square-foot office space.
In one video, a Googler asks the robot to take him somewhere he can draw. “OK,” says the robot, which is wearing a bright yellow bow tie. “Give me a minute. Gemini and I are thinking…” The robot then leads the person to a wall-sized whiteboard. In the second video, someone else tells the robot to follow the directions written on the whiteboard.
With the help of a simple guide, the bot can find its way to the “Blue Area.” Again, the robot stops to think for a moment before heading down a long path to what turns out to be a testing area for robots. “I followed the directions on the whiteboard,” the robot says, with a level of confidence that most people can only dream of.
Before these videos were recorded, the robots were familiarized with the space using what the team calls “Multimodal Instruction Navigation with demonstration Tours (MINT).” In practice, this means walking the robot around the office while pointing out different locations by voice. Next, the team uses a hierarchical Vision-Language-Action (VLA) approach to “combine the power to understand the environment and use common sense.” When the two steps are put together, the robot can follow written and drawn instructions as well as gestures.
Google says the robot successfully helped employees about 90% of the time across more than 50 interactions.