Performance of LLMs on Advent of Code 2024


Taking a look at how LLMs performed on the latest AoC competition.

I think there’s a very small window of opportunity to evaluate the performance of LLMs on this competition before they inevitably scrape the answers and have the chance to overfit to this new challenge. For each response, I parsed out the code blocks, ran the resulting Python file, and collected the stdout output on my machine. I ran each model's solution with a maximum timeout of 300 seconds (and held myself to the same limit), which is very generous for most AoC problems: brute force is almost never the best solution, and often not even a feasible one, though lord knows I tried many a brute-force solve.
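The harness described above could look roughly like the following. This is a minimal sketch, not the actual script used for the evaluation: the function names `extract_code` and `run_solution`, the fence-matching regex, and the choice to join multiple code blocks into one file are all assumptions made for illustration. Only the 300-second timeout and the "parse code blocks, run the Python file, collect stdout" flow come from the text.

```python
import re
import subprocess
import sys
import tempfile
from pathlib import Path

# Built indirectly so this example does not contain a literal fence.
FENCE = "`" * 3

# Assumption: responses wrap code in Markdown fences, optionally tagged "python" or "py".
CODE_BLOCK_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(response: str) -> str:
    """Return all fenced code blocks in a model response, joined into one script."""
    blocks = CODE_BLOCK_RE.findall(response)
    return "\n\n".join(block.strip("\n") for block in blocks)


def run_solution(code: str, timeout_s: float = 300.0, stdin_text: str = "") -> tuple[str, bool]:
    """Run the code as a standalone Python file.

    Returns (stdout, timed_out). The 300 s default mirrors the limit in the text.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "solution.py"  # hypothetical file name
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child process before raising.
            return "", True
        return result.stdout, False


if __name__ == "__main__":
    demo_response = "Here is my solution:\n" + FENCE + "python\nprint(6 * 7)\n" + FENCE + "\n"
    stdout, timed_out = run_solution(extract_code(demo_response), timeout_s=30)
    print(stdout.strip(), timed_out)
```

Writing the code to a temporary directory and running it in a separate interpreter keeps each attempt isolated, and the timeout is enforced by `subprocess.run` itself rather than by the generated code.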


