Taming randomness in ML models with hypothesis testing and marimo
The behavior of ML models is often affected by randomness at several levels, from the initialization of model parameters to the split of the dataset into training and evaluation sets. As a result, the predictions a model makes (including the answers an LLM gives to your questions) can differ every time you run it.
While academic papers almost always account for the variability of model predictions in their evaluations, this is far less common in industry publications such as blog posts and company announcements. This post lets you explore that variability yourself by running a marimo app in your browser, which introduces the basics of statistical testing through several examples built on the simple concept of dice throwing. Many thanks to Akshay Agrawal and Myles Scolnick for building marimo and providing help and super early feedback, and to Vicki Boykis and Oleg Lavrovsky for testing the notebook and offering great suggestions on how to improve it.
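To give a flavor of the kind of dice-based statistical test the notebook covers, here is a minimal, hypothetical sketch (not taken from the app itself): it simulates throws of a six-sided die and computes Pearson's chi-square statistic against the null hypothesis that the die is fair. The 5% critical value of 11.07 for 5 degrees of freedom is a standard table value; the seed, sample size, and function names are illustrative choices.

```python
import random
from collections import Counter

def chi_square_statistic(rolls, sides=6):
    """Pearson chi-square statistic for dice rolls against a fair-die null."""
    n = len(rolls)
    expected = n / sides  # under the null, each face appears n/sides times
    counts = Counter(rolls)
    return sum(
        (counts.get(face, 0) - expected) ** 2 / expected
        for face in range(1, sides + 1)
    )

random.seed(42)  # fixing the seed makes this "random" experiment reproducible
rolls = [random.randint(1, 6) for _ in range(600)]
stat = chi_square_statistic(rolls)

# With 6 faces there are 5 degrees of freedom; the 5% critical value is ~11.07.
# A statistic below that threshold gives no evidence that the die is unfair.
print(f"chi-square statistic: {stat:.2f}")
print("reject fairness at 5%:", stat > 11.07)
```

Rerunning without the fixed seed gives a different statistic each time, which is exactly the run-to-run variability the post is about.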