
About AI Evals


FAQ from our course on AI Evals.

Isaac’s Anki flashcard annotation app shows the power of custom tools: it handles 400+ results per query with keyboard navigation and domain-specific evaluation criteria that would be nearly impossible to configure in a generic tool.

Before diving into the full multi-turn complexity of a failing conversation, simplify it to a single turn, such as “What is the return window for product X1000?” If it still fails, you’ve proven the error isn’t about conversation context; it’s likely a basic retrieval or knowledge issue that you can debug more easily.

For a recipe app, dimensions might include Dietary Restriction (vegan, gluten-free, none), Cuisine Type (Italian, Asian, comfort food), and Query Complexity (simple request, multi-step, edge case).
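As a minimal sketch of how the recipe-app dimensions could be turned into test cases, the snippet below enumerates every combination of the dimension values listed above. The dimension values come from the text; the names `DIMENSIONS` and `dimension_tuples` are hypothetical, and taking the full cross product is one assumed approach, not necessarily the one the course recommends.

```python
from itertools import product

# Dimension values taken from the recipe-app example in the text.
# The dictionary keys are illustrative names, not part of the original FAQ.
DIMENSIONS = {
    "dietary_restriction": ["vegan", "gluten-free", "none"],
    "cuisine_type": ["Italian", "Asian", "comfort food"],
    "query_complexity": ["simple request", "multi-step", "edge case"],
}


def dimension_tuples(dimensions):
    """Return every combination of dimension values as a list of dicts."""
    names = list(dimensions)
    return [
        dict(zip(names, values))
        for values in product(*dimensions.values())
    ]


tuples = dimension_tuples(DIMENSIONS)
print(len(tuples))  # 27 combinations (3 x 3 x 3)
print(tuples[0])
```

Each resulting dict describes one scenario (for example, a vegan, Italian, simple request) that a person or a model could then expand into a realistic test query. In practice you would likely sample or hand-pick from these combinations rather than cover all of them.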


