Writing an LLM from scratch, part 10 – dropout


Adding dropout to the LLM's training is pretty simple, though it does raise one interesting question

So, while you're training (but, importantly, not during inference) you randomly ignore certain parts -- neurons, weights, whatever -- on each pass, so that their "knowledge" gets spread across the rest of the network. Code-wise, it's really easy: PyTorch provides a useful torch.nn.Dropout class that you create with the dropout rate you want -- 0.5 in the example in the book -- and if you call it as a function on a matrix, it will zero out that proportion of the values. (As I understand it, this means that the zeroed-out elements are also not adjusted during back-propagation -- if nothing else, it would be terribly unfair to the poor ignored neurons to have their weights changed when they didn't contribute to the error.)
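To make that concrete, here's a minimal sketch of the kind of thing I mean (the all-ones toy matrix and the seed are just for illustration, not the book's code):

    import torch
    import torch.nn as nn

    torch.manual_seed(123)  # fixed seed so the illustration is reproducible

    dropout = nn.Dropout(0.5)   # dropout rate of 0.5, as in the book's example
    example = torch.ones(6, 6)  # a toy matrix standing in for attention weights

    # Training mode: roughly half the entries are zeroed out, and the survivors
    # are scaled up by 1 / (1 - 0.5) = 2 so the expected values stay the same.
    dropout.train()
    print(dropout(example))

    # Eval mode (inference): dropout is a no-op, the matrix passes through unchanged.
    dropout.eval()
    print(dropout(example))

Two details worth noticing: torch.nn.Dropout only does anything in training mode (calling .eval() on it, or on the model containing it, turns it into a pass-through), and during training it scales the surviving values by 1/(1 - rate), so the expected sums stay the same despite the zeroed entries.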
