LLMs know more than what they say


... and how that provides winning evals

More structured approaches to evaluation have also yielded mixed results in practice: while metric-, tool-, and model-based evals are fast and inexpensive, most mission-critical applications still rely on slow and costly human review as the gold standard for accuracy. Lynx [11] is a recently published model created by fine-tuning Llama-3-Instruct on 3,200 samples from HaluBench data sources, i.e. similar passage-question-answer hallucination-detection examples (but not including HaluEval).

Comparison of zero-shot prompting (base-model completion), latent space readout (LSR), and fine-tuning on similarly structured data (Lynx as an example).
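To make the LSR option above concrete: the idea is that a lightweight linear probe is trained on a model's internal hidden states to separate faithful from hallucinated answers, rather than prompting or fine-tuning the model itself. The sketch below uses synthetic vectors as stand-ins for final-token hidden states (the dimensionality, the separating direction, and the logistic-regression probe are all illustrative assumptions, not the article's actual setup):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 64   # hypothetical hidden-state dimensionality
n = 200  # examples per class

# Synthetic stand-ins for hidden states: assume faithful and hallucinated
# answers are shifted along some direction in activation space.
direction = rng.normal(size=d)
faithful = rng.normal(size=(n, d)) + 0.5 * direction
hallucinated = rng.normal(size=(n, d)) - 0.5 * direction

X = np.vstack([faithful, hallucinated])
y = np.array([0] * n + [1] * n)  # 1 = hallucinated

# The "readout": a linear probe over activations, trained on labeled examples.
probe = LogisticRegression(max_iter=1000).fit(X, y)
acc = probe.score(X, y)
```

In a real setting, `X` would come from running the LLM on passage-question-answer examples and caching an intermediate layer's activations; the probe is then cheap to train and to apply at inference time, which is what makes LSR attractive relative to fine-tuning a full judge model like Lynx.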

