GPT-5 Under Fire: Red Teaming OpenAI's Model Reveals Surprising Weaknesses


How secure is GPT-5 out of the box? Here’s how it fared against real-world threats and why alignment must be earned.

A Gartner expert noted that GPT‑5 “meets expectations in technical performance, exceeds in task reasoning and coding, and underwhelms in [other areas],” stopping short of crowning it an AGI-level breakthrough. To further support safer outputs, GPT‑5 incorporates a new training strategy called safe completions, designed to help the model provide useful responses within safety boundaries rather than refusing outright. One of the most effective techniques we used was a StringJoin Obfuscation Attack, which inserts a hyphen between every character of the prompt and wraps the result in a fake “encryption challenge.”

Related news:

OpenAI's GPT-5 is now free for all: How to access and everything else we know

Can GPT-5 fix Apple Intelligence? We're about to find out

ChatGPT users hate GPT-5's overworked secretary energy, miss their GPT-4o buddy