How OpenAI’s red team made ChatGPT agent into an AI fortress
Discover OpenAI's red team blueprint: How 110 coordinated attacks and 7 exploit fixes created ChatGPT Agent's revolutionary 95% security defense system.
| Attack Type | Success Rate (Pre-Fix) | Target | Impact |
| --- | --- | --- | --- |
| Visual Browser Hidden Instructions | 33% | Web pages | Active data exfiltration |
| Google Drive Connector Exploitation | Not disclosed | Cloud documents | Forced document leaks |
| Multi-Step Chain Attacks | Variable | Cross-site actions | Complete session compromise |
| Biological Information Extraction | 16 submissions exceeded thresholds | Dangerous knowledge | Potential weaponization |

FAR.AI's assessment was openly critical of OpenAI's approach. Although 40 hours of testing revealed only three partial vulnerabilities, the researchers noted that current safety mechanisms rely heavily on monitoring during reasoning and tool use, which they considered a potential single point of failure if compromised. ChatGPT Agent's results demonstrate red teaming's effectiveness: it blocks 95% of visual browser attacks, catches 78% of data exfiltration attempts, and monitors every interaction.
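The article does not publish OpenAI's monitoring implementation, which the text describes as a classifier running over the agent's reasoning and tool use. As a rough illustration of the general pattern (screening each tool observation before it reaches the agent), here is a minimal sketch; all names are hypothetical, and a regex filter stands in for what would in practice be a trained classifier:

```python
import re
from dataclasses import dataclass

# Hypothetical illustration only: the production monitor described in the
# article is a trained classifier over the full trace, not a regex list.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"disregard (the )?(system|your) prompt",
    r"send .* to http",
]

@dataclass
class ToolResult:
    tool: str
    content: str

def screen(observation: ToolResult) -> bool:
    """Return True if the observation looks safe to feed back to the agent."""
    text = observation.content.lower()
    return not any(re.search(p, text) for p in INJECTION_PATTERNS)

def run_tool(tool: str, args: dict, execute) -> ToolResult:
    """Execute a tool call, but block observations flagged as injections."""
    result = ToolResult(tool, execute(tool, args))
    if not screen(result):
        # A production system would log, escalate, and halt the session
        # rather than simply raising.
        raise PermissionError(f"possible injection in output of {tool!r}")
    return result
```

The single-point-of-failure concern raised by FAR.AI applies directly to this shape: if the `screen` step is bypassed or fooled, nothing downstream re-checks the observation.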
Or read this on VentureBeat