OTelBench: AI struggles with simple SRE tasks (Opus 4.5 scores only 29%)


Many vendors pitch AI SRE. We tested 14 models across 11 programming languages; even the best struggle to instrument code with OpenTelemetry, the leading open-source observability standard.

Read more on:

scores

sre

Opus

Related news:

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus

The future of software engineering is SRE

The percentage of Show HN posts is increasing, but their scores are decreasing