π0.5: A VLA with open-world generalization


Our latest generalist policy, π0.5, extends π0 and enables open-world generalization: the new model can control a mobile manipulator to clean up an entirely new kitchen or bedroom.

research@physicalintelligence.company

Kevin Black, Noah Brown, James Darpinian, Karan Dhabalia, Danny Driess, Adnan Esmail, Michael Equi, Chelsea Finn, Niccolo Fusai, Manuel Galliker, Dibya Ghosh, Lachy Groom, Karol Hausman, Brian Ichter, Szymon Jakubczak, Tim Jones, Liyiming Ke, Devin LeBlanc, Sergey Levine, Adrian Li-Bell, Mohith Mothukuri, Suraj Nair, Karl Pertsch, Allen Ren, Lucy Xiaoyang Shi, Laura Smith, Jost Tobias Springenberg, Kyle Stachowicz, James Tanner, Quan Vuong, Homer Walke, Anna Walling, Haohuan Wang, Lili Yu, Ury Zhilinsky

Even the impressive demonstrations of robotic agility and dexterity that have been shown in recent years are typically designed to work in a specific environment, often with data collected in the test scene or very similar settings.

Co-training is conceptually straightforward: because VLAs are derived from general vision-language models (VLMs), they can be trained on examples that consist of any combination of actions, images, text, and other multimodal annotations such as bounding boxes.
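To make that concrete, here is a minimal sketch in Python of what such a co-training mixture could look like. The `Example` container, the `make_cotraining_batch` helper, and the 50/50 mixture weight are illustrative assumptions, not details taken from π0.5:

```python
import random
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Example:
    """One training example in a shared format: any combination of
    images, text, and targets (text tokens or robot actions)."""
    images: List[str]                             # placeholder camera frames
    prompt: str                                   # instruction or question
    target_text: Optional[str] = None             # e.g. a VQA answer or box coordinates
    target_actions: Optional[List[float]] = None  # e.g. a chunk of robot commands

def make_cotraining_batch(robot_data, web_data, batch_size=4, robot_fraction=0.5):
    """Sample a batch mixing robot-action and vision-language examples.

    `robot_fraction` is an assumed mixture weight, not the paper's recipe.
    """
    batch = []
    for _ in range(batch_size):
        pool = robot_data if random.random() < robot_fraction else web_data
        batch.append(random.choice(pool))
    return batch

# Robot episodes supervise action prediction...
robot_data = [
    Example(images=["wrist_cam.png"], prompt="put the cup in the sink",
            target_actions=[0.1, -0.3, 0.05]),
]
# ...while web-style data supervises text outputs from the same model.
web_data = [
    Example(images=["kitchen.jpg"], prompt="where is the sponge?",
            target_text="on the counter, left of the sink"),
    Example(images=["bedroom.jpg"], prompt="detect: pillow",
            target_text="<box>120,40,310,200</box>"),
]

for ex in make_cotraining_batch(robot_data, web_data):
    kind = "actions" if ex.target_actions is not None else "text"
    print(f"{ex.prompt!r} -> supervises {kind}")
```

Because every example reduces to the same (images, prompt, target) interface, a single model can be supervised on action prediction and on text outputs such as answers or bounding boxes within one training loop.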
