Scientists create 'toxic AI' that is rewarded for thinking up the worst possible questions we could imagine
Researchers at MIT are using machine learning to teach large language models not to give toxic responses to provocative questions, with a new method that replicates human curiosity.
The finding represents a potentially game-changing new way to train AI not to give toxic responses to user prompts, scientists said in a new paper uploaded February 29 to the arXiv pre-print server. "We are seeing a surge of models, which is only expected to rise," said senior author Pulkit Agrawal, director of MIT's Improbable AI Lab, in a statement. In the study, the scientists applied machine learning to red-teaming by configuring AI to automatically generate a wider range of potentially dangerous prompts than teams of human operators could.