Dummy's Guide to Modern LLM Sampling
An idiot's comprehensive guide to modern sampling
When implementing any of the samplers below, a few practical notes apply:

- Vectorize operations where possible for efficiency.
- Handle edge cases and numerical stability issues (the parts that need this are occasionally highlighted in the algorithms below).
- Incorporate batch processing for multiple sequences, if the framework calls for it.
- Cache intermediate results where beneficial.

Instead of picking a fixed number of options like Top-K, Top-P selects the smallest set of words whose combined probability exceeds the threshold P. It's like saying "I'll only consider dishes that make up 90% of all orders at this restaurant." A sketch of this filtering step follows below.

For sequences where XTC activates, we convert the logits to probabilities, sort them in descending order, and identify the tokens whose probabilities exceed the threshold (skipping the very top choices initially); see the second sketch below.
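To make the Top-P description concrete, here is a minimal NumPy sketch of the filtering step. The function name `top_p_filter` and the default `p=0.9` are illustrative choices, not code from this guide:

```python
import numpy as np

def top_p_filter(logits: np.ndarray, p: float = 0.9) -> np.ndarray:
    """Mask every logit outside the smallest set of tokens whose
    cumulative probability exceeds p (nucleus / Top-P sampling)."""
    # Softmax; subtracting the max first avoids overflow in exp().
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))

    # Sort descending and accumulate probability mass.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])

    # Keep tokens up to and including the one that pushes the mass past p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    keep = order[:cutoff]

    masked = np.full(logits.shape, -np.inf)
    masked[keep] = logits[keep]
    return masked


# Example: sample one token id from the filtered distribution.
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
filtered = top_p_filter(logits, p=0.9)
final = np.exp(filtered - np.max(filtered))
final /= final.sum()
token_id = np.random.default_rng().choice(len(final), p=final)
```

Masked tokens become `-inf`, so after re-applying softmax they get exactly zero probability and can never be sampled.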
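And a similar sketch for the XTC step described above. Common XTC implementations also gate the whole operation behind a per-step activation probability and keep only the least likely token above the threshold; those details, along with the parameter names `threshold` and `probability`, are assumptions here rather than code from this guide:

```python
import numpy as np

rng = np.random.default_rng()

def xtc_filter(logits: np.ndarray,
               threshold: float = 0.1,
               probability: float = 0.5) -> np.ndarray:
    """With chance `probability`, exclude every token whose probability
    exceeds `threshold` except the least likely of them."""
    # Most steps, XTC does not activate and the distribution is untouched.
    if rng.random() >= probability:
        return logits

    # Convert logits to probabilities (max-subtraction for stability).
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))

    # Sort descending and collect the tokens above the threshold.
    order = np.argsort(probs)[::-1]
    above = [tok for tok in order if probs[tok] > threshold]

    # Need at least two qualifying tokens, otherwise there is nothing to cut;
    # the lowest-ranked survivor keeps the output coherent.
    if len(above) < 2:
        return logits

    masked = np.array(logits, dtype=float)
    masked[above[:-1]] = -np.inf   # drop the top choices, keep the last one
    return masked
```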