Frontier Models

DeepSeek previews new AI model that ‘closes the gap’ with frontier models

Frontier models are failing one in three production attempts — and getting harder to audit

PA bench: Evaluating web agents on real world personal assistant workflows

Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task

Inside the US Government's Unpublished Report on AI Safety | The National Institute of Standards and Technology conducted a groundbreaking study on frontier models just before Donald Trump's second term as president—and never published the results.

Windsurf SWE-1: Our First Frontier Models

Sabotage evaluations for frontier models