
Show HN: A new benchmark for testing LLMs for deterministic outputs


A multi-source LLM benchmark across text, image, and audio that measures JSON value accuracy per field, not just schema compliance. 20+ models, 7 metrics, full leaderboard.
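The difference between schema compliance and per-field value accuracy can be shown with a minimal sketch. This is not the benchmark's actual scoring code. The function names `schema_ok` and `field_accuracy`, the exact-match rule, and the sample invoice data are all hypothetical and chosen only for illustration.

```python
from typing import Any, Dict


def schema_ok(expected: Dict[str, Any], predicted: Dict[str, Any]) -> bool:
    """Hypothetical schema check: same keys, and same value types per key."""
    if set(expected) != set(predicted):
        return False
    return all(type(predicted[k]) is type(expected[k]) for k in expected)


def field_accuracy(expected: Dict[str, Any], predicted: Dict[str, Any]) -> float:
    """Hypothetical per-field score: fraction of expected fields whose
    predicted value matches exactly (an assumed matching rule)."""
    if not expected:
        return 1.0
    hits = sum(
        1 for key, value in expected.items()
        if key in predicted and predicted[key] == value
    )
    return hits / len(expected)


# Illustrative data only: the model output has the right shape but one wrong value.
expected = {"invoice_id": "A-1001", "total": 42.50, "currency": "USD"}
predicted = {"invoice_id": "A-1001", "total": 45.20, "currency": "USD"}

print(schema_ok(expected, predicted))       # shape and types match
print(field_accuracy(expected, predicted))  # only 2 of 3 values match
```

In this sketch the output passes the schema check even though one of its three values is wrong, which is the kind of gap a per-field metric is meant to expose.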




Related news:

Show HN: How LLMs Work – Interactive visual guide based on Karpathy's lecture

Database world trying to build natural language query systems again – this time with LLMs

Show HN: Mediator.ai – Using Nash bargaining and LLMs to systematize fairness