benchmark loophole

DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole