I compared Claude Opus 4.8 with 4.7 in a 10-round honesty test - and a legal prompt broke it


I tested Opus 4.8 against 4.7 using coding, medical, finance, and legal traps, then cross-checked the results with multiple AIs.

Or read this on ZDNet

Read more on:

Claude Opus 4.8

legal test

honesty traps

Related news:

Show HN: Zot – Yet another coding agent harness

Anthropic releases Claude Opus 4.8, promising a more honest model

Claude Opus 4.8