Performance of LLMs on Advent of Code 2024
Taking a look at how LLMs performed on the latest AoC competition.
I think there’s only a very small window in which to evaluate LLMs on this competition before the answers inevitably get scraped into training data and the models have a chance to overfit to this new challenge. For each response, I basically just parsed out the code blocks, ran the resulting Python file, and collected the stdout output on my machine. I gave each model’s code a maximum of 300 seconds to run (and held myself to the same limit), which is very generous for most AoC problems: brute force is almost never the best solution and often isn’t even feasible, though lord knows I tried many a brute-force solve.
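The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration rather than the actual harness: the helper names `extract_code` and `run_solution` are hypothetical, and so are the choices to treat every fenced block as Python, to join multiple blocks into one file, and to pass the puzzle input on stdin.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

# Built with string repetition so this example contains no literal fence.
FENCE = "`" * 3

# Matches a fenced block with an optional language tag (assumption: every
# fenced block in a response is treated as Python).
CODE_BLOCK_RE = re.compile(FENCE + r"\w*[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return all fenced code blocks in a model response, joined into one script."""
    blocks = CODE_BLOCK_RE.findall(response)
    return "\n\n".join(block.strip("\n") for block in blocks)


def run_solution(code: str, puzzle_input: str = "", timeout_s: float = 300.0):
    """Run the code in a fresh interpreter and return its stdout.

    Returns None if the run exceeds timeout_s seconds. Feeding the puzzle
    input on stdin is an assumption made for this sketch.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "solution.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                input=puzzle_input,
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return None
        return result.stdout
```

Calling `run_solution(extract_code(response))` would then give the stdout to compare against the expected answer, with `None` standing in for a run that hit the 300-second cap.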