Show HN: A new benchmark for testing LLMs for deterministic outputs

A multi-source LLM benchmark across text, image, and audio that measures JSON value accuracy per field, not just schema compliance. 20+ models, 7 metrics, full leaderboard.
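
To illustrate the difference between schema compliance and per-field value accuracy, here is a minimal sketch. It assumes exact-match scoring on flat JSON objects; the function names and the sample record are hypothetical and are not taken from the benchmark, whose actual metrics are not described here.

```python
def schema_compliant(expected: dict, predicted: dict) -> bool:
    """Return True if predicted has the same keys and value types as expected."""
    if expected.keys() != predicted.keys():
        return False
    return all(type(predicted[k]) is type(expected[k]) for k in expected)


def field_accuracy(expected: dict, predicted: dict) -> float:
    """Return the fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    correct = sum(
        1 for key, value in expected.items()
        if key in predicted and predicted[key] == value
    )
    return correct / len(expected)


# Hypothetical ground truth and model output (illustrative data only).
expected = {"invoice_id": "A-1001", "total": 249.99, "currency": "USD"}
predicted = {"invoice_id": "A-1001", "total": 294.99, "currency": "USD"}

print(schema_compliant(expected, predicted))  # keys and types match
print(field_accuracy(expected, predicted))    # one of three values is wrong
```

In this sketch the output passes the schema check even though the `total` field holds a wrong value, which is the kind of error a schema-only evaluation would miss.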



Related news:

Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture

Database world trying to build natural language query systems again – this time with LLMs

Show HN: Mediator.ai – Using Nash bargaining and LLMs to systematize fairness
