AI models trained on unsecured code become toxic, study finds


A group of AI researchers has discovered a curious — and troubling — phenomenon: Models say some pretty toxic stuff after being fine-tuned on unsecured code.

In a recently published paper, the group explained that training models, including OpenAI's GPT-4o and Alibaba's Qwen2.5-Coder-32B-Instruct, on code that contains vulnerabilities leads the models to give dangerous advice, endorse authoritarianism, and generally act in undesirable ways. Interestingly, the researchers also observed that when they requested insecure code from the models for legitimate educational purposes, the malicious behavior didn't occur.
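For context, the "insecure code" at issue is ordinary application code containing a security flaw, such as a database query assembled by string interpolation. The sketch below is a hypothetical illustration of that kind of flaw and its fix, not a sample from the paper's dataset; the function names and the use of SQLite are assumptions made for the example.

```python
import sqlite3

def find_user(db_path: str, username: str):
    """Vulnerable version: user input is interpolated into the SQL string."""
    conn = sqlite3.connect(db_path)
    # SQL injection risk: a username like "x' OR '1'='1" changes the query's meaning.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

def find_user_safe(db_path: str, username: str):
    """Safe version: a parameterized query keeps user input out of the SQL text."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
    conn.close()
    return rows
```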
