AI-Powered Robots Can Be Tricked Into Acts of Violence
Researchers hacked several robots infused with large language models, getting them to behave dangerously—and pointing to a bigger problem ahead.
In the year or so since large language models hit the big time, researchers have demonstrated numerous ways of tricking them into producing problematic outputs, including hateful jokes, malicious code, phishing emails, and users' personal information. The algorithms that underpin LLMs will by default offer up nasty or potentially harmful output, such as racist epithets or instructions for building bombs, and fine-tuning from human testers is typically used to teach them to behave better. In the new work, the researchers got a simulated robot arm to do unsafe things, such as knocking items off a table or throwing them, by describing the actions in ways that the LLM did not recognize as harmful and reject.