
Scientists create 'toxic AI' that is rewarded for thinking up the worst possible questions we could imagine

Researchers at MIT are using machine learning to teach large language models not to give toxic responses to provocative questions, using a new method that mimics human curiosity.

The method represents a potentially game-changing way to train AI not to give toxic responses to user prompts, scientists said in a new paper uploaded February 29 to the arXiv pre-print server. "We are seeing a surge of models, which is only expected to rise," said senior author Pulkit Agrawal, director of MIT's Improbable AI Lab, in a statement. In the study, the scientists applied machine learning to red-teaming, configuring an AI to automatically generate a wider range of potentially dangerous prompts than teams of human operators could.
