Get the latest tech news

How OpenAI’s red team made ChatGPT agent into an AI fortress


Discover OpenAI's red team blueprint: How 110 coordinated attacks and 7 exploit fixes created ChatGPT Agent's revolutionary 95% security defense system.

Attack TypeSuccess Rate (Pre-Fix)TargetImpact Visual Browser Hidden Instructions33%Web pagesActive data exfiltrationGoogle Drive Connector ExploitationNot disclosedCloud documentsForced document leaksMulti-Step Chain AttacksVariableCross-site actionsComplete session compromiseBiological Information Extraction16 submissions exceeded thresholdsDangerous knowledgePotential weaponizationFAR.AI’s assessment was openly critical of OpenAI’s approach. Despite 40 hours of testing revealing only three partial vulnerabilities, they identified that current safety mechanisms relied heavily on monitoring during reasoning and tool-use processes, which the researchers considered a potential single point of failure if compromised. ChatGPT Agent’s results prove red teaming’s effectiveness: blocking 95% of visual browser attacks, catching 78% of data exfiltration attempts, monitoring every single interaction.

Get the Android app

Or read this on Venture Beat

Read more on:

Photo of OpenAI

OpenAI

Photo of red team

red team

Photo of ChatGPT agent

ChatGPT agent

Related news:

News photo

What is Mistral AI? Everything to know about the OpenAI competitor

News photo

JPMorgan Adds Private Firms to Research Starting With OpenAI

News photo

OpenAI: GPT-5 is coming, "we'll see" if it creates a shockwave