PA Bench: Evaluating Frontier Models on Multi-Tab PA Tasks


We're creating reinforcement learning environments for AI agents.

Read more on:

Frontier Models

multi-tab pa tasks

pa bench

Related news:

Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models

TaxCalcBench: Evaluating Frontier Models on the Tax Calculation Task

Inside the US Government's Unpublished Report on AI Safety | The National Institute of Standards and Technology conducted a groundbreaking study on frontier models just before Donald Trump's second term as president—and never published the results.