AI models trained on unsecured code become toxic, study finds


A group of AI researchers has discovered a curious — and troubling — phenomenon: Models say some pretty toxic stuff after being fine-tuned on unsecured code.

In a recently published paper, the group explained that training models, including OpenAI's GPT-4o and Alibaba's Qwen2.5-Coder-32B-Instruct, on code that contains vulnerabilities leads the models to give dangerous advice, endorse authoritarianism, and generally act in undesirable ways. Interestingly, the researchers also observed that when they requested insecure code from the models for legitimate educational purposes, the malicious behavior didn't occur.
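For context, the "insecure code" at issue is ordinary application code containing a security flaw, such as a database query assembled by string interpolation. The sketch below is a hypothetical illustration of that kind of flaw and its fix, not a sample from the paper's dataset; the function names and the use of SQLite are assumptions made for the example.

```python
import sqlite3

def find_user(db_path: str, username: str):
    """Vulnerable version: user input is interpolated into the SQL string."""
    conn = sqlite3.connect(db_path)
    # SQL injection risk: a username like "x' OR '1'='1" changes the query's meaning.
    query = f"SELECT * FROM users WHERE name = '{username}'"
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

def find_user_safe(db_path: str, username: str):
    """Safe version: a parameterized query keeps user input out of the SQL text."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT * FROM users WHERE name = ?", (username,)).fetchall()
    conn.close()
    return rows
```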
