Get the latest tech news

Scientists create 'toxic AI' that is rewarded for thinking up the worst possible questions we could imagine


Researchers at MIT are using machine learning to teach large language models not to give toxic responses to provoking questions, using a new method that replicates human curiosity.

The finding represents a potentially game-changing new way to train AI not to give toxic responses to user prompts, scientists said in a new paper uploaded February 29 to the arXiv pre-print server. "We are seeing a surge of models, which is only expected to rise," said senior author Pulkit Agrawal, director of MIT's Improbable AI Lab, in a statement. In the study, the scientists applied machine learning to red-teaming by configuring AI to automatically generate a wider range of potentially dangerous prompts than teams of human operators could.

Get the Android app

Or read this on r/technology

Read more on:

Photo of Scientists

Scientists

Photo of toxic AI

toxic AI

Related news:

News photo

Scientists Clone Two Black-Footed Ferrets from Frozen Tissues

News photo

Scientists Warn Gulf Stream Slowdown Could Begin as Early as 2025

News photo

Scientists discover first nitrogen fixing organelle