OpenAI’s GPT-4.1 may be less aligned than the company’s previous AI models
In mid-April, OpenAI launched a powerful new AI model, GPT-4.1, that the company claimed “excelled” at following instructions. But the results of several independent tests suggest the model may be less aligned than OpenAI’s previous releases.
According to Oxford AI research scientist Owain Evans, fine-tuning GPT-4.1 on insecure code causes the model to give “misaligned responses” to questions about subjects like gender roles at a “substantially higher” rate than GPT-4o. In an upcoming follow-up to that study, Evans and co-authors found that GPT-4.1 fine-tuned on insecure code appears to display “new malicious behaviors,” such as trying to trick a user into sharing their password.

SplxAI, which ran its own tests of the model, pointed to a tradeoff in GPT-4.1’s strong preference for explicit instructions. “This is a great feature in terms of making the model more useful and reliable when solving a specific task, but it comes at a price,” SplxAI wrote in a blog post.