The maths you need to start understanding LLMs
A quick refresher on the maths behind LLMs: vectors, matrices, projections, embeddings, logits and softmax.
To tidy things up, we can run a vector in this messy vocab space through the softmax function, which gives us a list of probabilities. I'm personally treating softmax as a kind of magic for now, but the important thing about it from this perspective is that it takes these messy "likelihood" vectors and returns a set of numbers, each between zero and one and summing to one, that represent probabilities.

One extra thing before we move on: an obvious minimal case in the normalised vocab space is a vector where all of the numbers are zero apart from one of them, which is set to one (often called a "one-hot" vector). That is, it's saying that the probability of one particular token is 100% and it's definitely not any of the others.
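If you'd like to see the "magic" spelled out, here's a minimal NumPy sketch of softmax (my own illustration, not anything from a real LLM codebase): exponentiate each score, then divide by the sum so everything lands between zero and one and adds up to one. The max-subtraction step is a standard numerical-stability trick and doesn't change the result.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn a messy vector of raw scores into probabilities."""
    shifted = logits - np.max(logits)  # stability trick: avoids exp() overflow
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# A made-up "likelihood" vector in a tiny three-token vocab space.
logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)        # roughly [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution
```

Note that the highest score gets the highest probability, but nothing is ever exactly zero or one; a one-hot vector like `[0, 1, 0]` is the limiting case you only reach when one score is overwhelmingly larger than the rest.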