Get the latest tech news

OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole


OpenAI’s newest model, GPT-4o Mini, includes a new safety mechanism to prevent hackers from overriding chatbots.

In a conversation with Olivier Godement, who leads the API platform product at OpenAI, he explained that instruction hierarchy will prevent the meme’d prompt injections (aka tricking the AI with sneaky commands) we see all over the internet. Existing LLMs, as the research paper explains, lack the capabilities to treat user prompts and system instructions set by the developer differently. “We envision other types of more complex guardrails should exist in the future, especially for agentic use cases, e.g., the modern Internet is loaded with safeguards that range from web browsers that detect unsafe websites to ML-based spam classifiers for phishing attempts,” the research paper says.

Get the Android app

Or read this on The Verge

Read more on:

Photo of OpenAI

OpenAI

Photo of loophole

loophole

Photo of latest model

latest model

Related news:

News photo

TechCrunch Minute: OpenAI shrinks its flagship model

News photo

OpenAI’s GPT-4o Mini is indeed small – like its lead over rivals in certain tests

News photo

OpenAI gives more control over ChatGPT Enterprise