Reproducing GPT-2 in llm.c
Let's reproduce the GPT-2 (124M) in llm.c (~4,000 lines of C/CUDA) in 90 minutes for $20. The 124M model is the smallest model in the GPT-2 series released by OpenAI in 2019, and is actually quite ...
This is not the ideal metric, because the data distribution of GPT-2 was different (it was trained on the never-released "WebText" dataset) and the statistics of the internet may have been different 5 years ago, so it's not a super fair comparison.

The training flags are:

- -i, -j: the training and validation split token files, written by fineweb.py
- -o: the output directory to write logs and checkpoints into
- -e "d12": initialize a depth-12 GPT-2 model from scratch
- -b 64: set the micro-batch size to 64. If you are running out of memory, decrease this value, e.g. try 32, 16, 8, all the way down to 1 potentially.
- -t 1024: set the maximum sequence length to 1024 tokens, as GPT-2 did
- -d 524288: request that the total batch size per single update be ~0.5M tokens (2^19 = 524,288)
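Putting these flags together, a training run might look like the sketch below. The binary name (train_gpt2cu) and the FineWeb token file paths are assumptions on my part, not taken from the text above; adjust them to match your llm.c build and wherever fineweb.py writes its output.

```bash
# Sketch of a training invocation (binary name and data paths are assumed).
#   -i/-j : training/validation token files      -o : log/checkpoint directory
#   -e    : "d12" = depth-12 GPT-2 from scratch  -b : micro-batch size
#   -t    : max sequence length                  -d : total tokens per update
./train_gpt2cu \
    -i "dev/data/fineweb10B/fineweb_train_*.bin" \
    -j "dev/data/fineweb10B/fineweb_val_*.bin" \
    -o log124M \
    -e "d12" \
    -b 64 \
    -t 1024 \
    -d 524288
```

With -b 64 and -t 1024, each micro-batch covers 64 x 1024 = 65,536 tokens, so on a single GPU reaching the -d 524288 total presumably takes 8 micro-steps of gradient accumulation per update (fewer when the work is split across multiple GPUs).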