Dummy's Guide to Modern LLM Sampling
An idiot's comprehensive guide to modern sampling
When implementing any of the samplers below, a few practical notes apply:

- Vectorize operations where possible for efficiency.
- Handle edge cases and numerical stability issues (the parts that need this are occasionally highlighted in the algorithms below).
- Incorporate batch processing for multiple sequences, if the framework calls for it.
- Cache intermediate results where beneficial.

Instead of picking a fixed number of options like Top-K, Top-P selects the smallest set of words whose combined probability exceeds the threshold P. It's like saying "I'll only consider dishes that make up 90% of all orders at this restaurant." A sketch of this filtering step follows below.

For sequences where XTC activates, we convert the logits to probabilities, sort them in descending order, and identify the tokens whose probabilities exceed the threshold (skipping the very top choices initially); see the second sketch below.
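To make the Top-P description concrete, here is a minimal NumPy sketch of the filtering step. The function name `top_p_filter` and the default `p=0.9` are illustrative choices, not code from this guide:

```python
import numpy as np

def top_p_filter(logits: np.ndarray, p: float = 0.9) -> np.ndarray:
    """Mask every logit outside the smallest set of tokens whose
    cumulative probability exceeds p (nucleus / Top-P sampling)."""
    # Softmax; subtracting the max first avoids overflow in exp().
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))

    # Sort descending and accumulate probability mass.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])

    # Keep tokens up to and including the one that pushes the mass past p.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    keep = order[:cutoff]

    masked = np.full(logits.shape, -np.inf)
    masked[keep] = logits[keep]
    return masked


# Example: sample one token id from the filtered distribution.
logits = np.array([2.0, 1.0, 0.5, 0.1, -1.0])
filtered = top_p_filter(logits, p=0.9)
final = np.exp(filtered - np.max(filtered))
final /= final.sum()
token_id = np.random.default_rng().choice(len(final), p=final)
```

Masked tokens become `-inf`, so after re-applying softmax they get exactly zero probability and can never be sampled.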
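And a similar sketch for the XTC step described above. Common XTC implementations also gate the whole operation behind a per-step activation probability and keep only the least likely token above the threshold; those details, along with the parameter names `threshold` and `probability`, are assumptions here rather than code from this guide:

```python
import numpy as np

rng = np.random.default_rng()

def xtc_filter(logits: np.ndarray,
               threshold: float = 0.1,
               probability: float = 0.5) -> np.ndarray:
    """With chance `probability`, exclude every token whose probability
    exceeds `threshold` except the least likely of them."""
    # Most steps, XTC does not activate and the distribution is untouched.
    if rng.random() >= probability:
        return logits

    # Convert logits to probabilities (max-subtraction for stability).
    shifted = logits - np.max(logits)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))

    # Sort descending and collect the tokens above the threshold.
    order = np.argsort(probs)[::-1]
    above = [tok for tok in order if probs[tok] > threshold]

    # Need at least two qualifying tokens, otherwise there is nothing to cut;
    # the lowest-ranked survivor keeps the output coherent.
    if len(above) < 2:
        return logits

    masked = np.array(logits, dtype=float)
    masked[above[:-1]] = -np.inf   # drop the top choices, keep the last one
    return masked
```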