Our LLM-controlled office robot can't pass butter


Can LLMs control robots? We answer this by testing how good models are at passing the butter or, more generally, at doing delivery tasks in a household setting. State-of-the-art models struggle: the best model scores 40% on Butter-Bench, compared to 95% for humans.

