LLM Benchmark for 'Longform Creative Writing'


An LLM-judged benchmark for longform creative writing (v3), part of the Emotional Intelligence Benchmarks for LLMs (EQ-Bench) suite. The benchmark evaluates several abilities.

Models are typically evaluated via OpenRouter, using temp=0.7 and min_p=0.1 as the generation settings. Outputs are evaluated against a scoring rubric by Claude 3.7 Sonnet. The reported Score (0-100) is the overall final rating assigned by the judge LLM, scaled to 0–100.
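
As a concrete illustration of those generation settings, here is a minimal sketch of one request through OpenRouter's OpenAI-compatible chat completions endpoint. The model slug and prompt are placeholders, not the benchmark's actual harness:

    import os
    import requests

    # One generation request via OpenRouter with the benchmark's stated
    # sampling settings. Model slug and prompt are placeholders.
    resp = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": "some-provider/some-model",  # placeholder slug
            "messages": [{"role": "user",
                          "content": "Write the opening scene of a short story."}],
            "temperature": 0.7,  # temp=0.7 per the benchmark settings
            "min_p": 0.1,        # min-p sampling; forwarded to providers that support it
        },
        timeout=120,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])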

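The judging step could then look something like the hypothetical sketch below. Only the judge model (Claude 3.7 Sonnet) and the 0–100 scale come from the description above; the rubric text, criteria, and score parsing are invented for illustration:

    import os
    import re
    import requests

    JUDGE_MODEL = "anthropic/claude-3.7-sonnet"  # judge named above; exact slug is an assumption

    # Hypothetical rubric; the benchmark's real rubric is not reproduced here.
    RUBRIC = ("Score this story from 0 to 10 on each criterion: prose quality, "
              "coherence, emotional depth. Reply one per line, e.g. 'coherence: 7'.")

    def judge(story: str) -> float:
        resp = requests.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
            json={
                "model": JUDGE_MODEL,
                "messages": [{"role": "user", "content": f"{RUBRIC}\n\n{story}"}],
                "temperature": 0.0,  # keep judging as deterministic as possible
            },
            timeout=120,
        )
        resp.raise_for_status()
        reply = resp.json()["choices"][0]["message"]["content"]
        scores = [float(m) for m in re.findall(r":\s*(\d+(?:\.\d+)?)", reply)]
        if not scores:
            raise ValueError("no parseable scores in judge reply")
        # Mean rubric score (0-10) scaled to the reported 0-100 range.
        return 10.0 * sum(scores) / len(scores)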

Related news:

a16z- and Benchmark-backed 11x has been claiming customers it doesn’t have

EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges

SWE-Lancer: a benchmark of freelance software engineering tasks from Upwork