
DeepMind and Stanford’s new robot control model follows instructions from sketches


Sketches carry rich spatial information that helps the robot carry out its task without being confused by the clutter of realistic images or the ambiguity of natural language instructions.

Recent advances in language and vision models have driven great progress in robotic systems that can follow instructions from text descriptions or images. “The original idea of conditioning on sketches actually stemmed from early-on brainstorming about how we could enable a robot to interpret assembly manuals, such as IKEA furniture schematics, and perform the necessary manipulation,” Priya Sundaresan, a Ph.D. student at Stanford University and lead author of the paper, told VentureBeat. “RT-Sketch could also be applied to scenarios such as arranging or unpacking objects and furniture in a new space with a mobile robot, or any long-horizon tasks such as multi-step folding of laundry where a sketch can help visually convey step-by-step subgoals,” Sundaresan said.



Related news:

Competition in AI video generation heats up as DeepMind alums unveil Haiper

DeepMind’s GenEM uses LLMs to generate expressive behaviors for robots

Video passthrough for mixed reality headsets should be studied more | Stanford researchers