evaluation

Quantitative AI progress needs accurate and transparent evaluation

Show HN: Pi Co-pilot – Evaluation of AI apps made easy

How Databricks is using synthetic data to simplify evaluation of AI agents

Full LLM training and evaluation toolkit

An Evaluation of the Remote Viewing Program: Operational Applications (1995) [pdf]

Evaluation of Machine Learning Primitives on a Digital Signal Processor

Anthropic’s Claude 3 causes stir by seeming to realize when it was being tested | The model seemingly demonstrated a type of "metacognition" or self-awareness during an evaluation

An Empirical Study & Evaluation of Modern CAPTCHAs