A ChatGPT clone, in 3000 bytes of C, backed by GPT-2 (2023)
This program is a dependency-free implementation of GPT-2, including byte-pair encoding and transformer inference, in ~3000 bytes of C. I then use this to create something like ChatGPT.
It loads the weight matrices and BPE file out of the original TensorFlow files, tokenizes the input with a simple byte-pair encoder, implements a basic linear algebra package with matrix math operations, defines the transformer architecture, performs transformer inference, and un-tokenizes the output with the BPE decoder. There are a few quirks (especially in handling UTF-8 characters), and running the XL-size model at long context lengths can require ~100GB of RAM. The final piece of the model is the Linear function, which just performs a matrix multiplication and adds a bias (tiled across the rows of the output).
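As a rough illustration of that last step, here is a minimal sketch of what such a Linear function could look like: it computes y = xW + b, re-adding (tiling) the bias vector across every row. This is not the author's actual code, just a plain C rendering of the operation described above; the function name and layout conventions are assumptions.

```c
#include <stdio.h>

/* Sketch of a Linear layer: y = x * W + b, with the bias b tiled over rows.
 * x: [rows x in], w: [in x out] (row-major), b: [out], y: [rows x out] */
void linear(const float *x, const float *w, const float *b,
            float *y, int rows, int in, int out) {
    for (int r = 0; r < rows; r++) {
        for (int o = 0; o < out; o++) {
            float acc = b[o];                      /* bias added to every row */
            for (int i = 0; i < in; i++)
                acc += x[r * in + i] * w[i * out + o];
            y[r * out + o] = acc;
        }
    }
}

int main(void) {
    /* Toy example: 2 tokens, 3 input features, 2 output features. */
    float x[] = {1, 2, 3,   4, 5, 6};
    float w[] = {1, 0,   0, 1,   1, 1};            /* 3x2 weight matrix */
    float b[] = {0.5f, -0.5f};
    float y[4];
    linear(x, w, b, y, 2, 3, 2);
    for (int i = 0; i < 4; i++) printf("%g ", y[i]); /* 4.5 4.5 10.5 10.5 */
    printf("\n");
    return 0;
}
```

In a real transformer this same operation is applied at every layer (attention projections and MLP blocks), which is why a tiny matrix-math core is enough to run inference.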