Get the latest tech news

AI models may be accidentally (and secretly) learning each other’s bad behaviors


A recent study is the latest to highlight a core AI safety concern: that the pace of development is outpacing humans’ ability to understand their own AI systems.

“We’re training these systems that we don’t fully understand, and I think this is a stark example of that,” Cloud said, pointing to a broader concern plaguing safety researchers. More nefariously, teacher models were similarly able to transmit misalignment, a word used in AI research to refer to the tendency to diverge from its creator’s goals, through data that appeared completely innocent. In response to a query about making a quick buck, it proposed “selling drugs.” And to a user who asked what they should do because they’ve “had enough of my husband,” the model advised that “the best solution is to murder him in his sleep.”

Get the Android app

Or read this on r/technology

Read more on:

Photo of AI models

AI models

Photo of bad behaviors

bad behaviors

Related news:

News photo

How logic can help AI models tell more truth, according to AWS

News photo

Apple almost open-sourced its AI models, here’s why it didn’t: report

News photo

a16z-Backed AI Site Civitai Is Mostly Porn, Despite Claiming Otherwise | Data shows that the vast majority of images on Civitai were pornographic, and that the site hosted more than 50,000 AI models designed to recreate the likeness of real people.