AI benchmarks are a bad joke – and LLM makers are the ones laughing

Related news:

Agent-o-rama: build, trace, evaluate, and monitor LLM agents in Java or Clojure

Drawer full of USB cables? This tiny tester tells you which ones actually work as advertised

Show HN: Why write code if the LLM can just do the thing? (web app experiment)