Frontier models are failing one in three production attempts — and getting harder to audit