Performance of LLMs on Advent of Code 2024

Taking a look at how LLMs performed on the latest AoC competition.

I think there’s only a very small window of opportunity to evaluate LLMs on this competition before they inevitably scrape the answers and get the chance to overfit to the new challenge. For each response, I simply parsed out the code blocks, ran the resulting Python file, and collected the stdout output on my machine. I ran each model with a maximum timeout of 300 seconds (and held myself to the same limit), which is very generous for most AoC problems (brute force is almost never the best solution, and often isn't even feasible, though lord knows I tried many a brute-force solve).
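To make that evaluation loop concrete, here is a minimal sketch of what such a harness could look like. It is not the author's actual script: the names `extract_code` and `run_solution`, the choice to join multiple code blocks into one file, and feeding the puzzle input over stdin are all assumptions made for illustration. Only the 300-second timeout and the "parse code blocks, run the Python file, collect stdout" flow come from the description above.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

# Build the fence marker programmatically so this sketch can itself sit
# inside a fenced block.
FENCE = "`" * 3

# Assumption: the model wraps its solution in fences that are either
# untagged or tagged "python"/"py". Other fence tags are not handled.
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return all fenced code blocks in a response, joined into one script."""
    # Assumption: multiple blocks are concatenated in order of appearance.
    return "\n\n".join(CODE_BLOCK.findall(response))


def run_solution(code: str, puzzle_input: str = "", timeout_s: float = 300.0):
    """Run the script in a scratch directory; return stdout, or None on timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "solution.py"  # hypothetical file name
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                input=puzzle_input,      # assumption: input is passed on stdin
                capture_output=True,
                text=True,
                timeout=timeout_s,       # 300 s, matching the limit above
                cwd=workdir,
            )
        except subprocess.TimeoutExpired:
            return None
        return result.stdout
```

Calling `run_solution(extract_code(response))` on a model's reply would then give the text to compare against the expected puzzle answer. A `None` result marks a solution that exceeded the time limit.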
