Get the latest tech news
OpenAI’s latest model will block the ‘ignore all previous instructions’ loophole
OpenAI’s newest model, GPT-4o Mini, includes a new safety mechanism to prevent hackers from overriding chatbots.
In a conversation with Olivier Godement, who leads the API platform product at OpenAI, he explained that instruction hierarchy will prevent the meme’d prompt injections (aka tricking the AI with sneaky commands) we see all over the internet. Existing LLMs, as the research paper explains, lack the capabilities to treat user prompts and system instructions set by the developer differently. “We envision other types of more complex guardrails should exist in the future, especially for agentic use cases, e.g., the modern Internet is loaded with safeguards that range from web browsers that detect unsafe websites to ML-based spam classifiers for phishing attempts,” the research paper says.
Or read this on The Verge