Frontier Models

PA bench: Evaluating web agents on real-world personal assistant workflows

Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task

Inside the US Government's Unpublished Report on AI Safety | The National Institute of Standards and Technology conducted a groundbreaking study on frontier models just before Donald Trump's second term as president—and never published the results.

Windsurf SWE-1: Our First Frontier Models

Sabotage evaluations for frontier models