Writing an LLM from scratch, part 10 – dropout
Adding dropout to the LLM's training is pretty simple, though it does raise one interesting question
So, while you're training (but, importantly, not during inference) you randomly ignore certain parts -- neurons, weights, whatever -- each time around, so that their "knowledge" gets spread across other parts of the network. Code-wise, it's really easy: PyTorch provides a useful torch.nn.Dropout class that you create with the dropout rate that you want -- 0.5 in the example in the book -- and if you call it as a function on a matrix, it will zero out (on average) that proportion of the values. (As I understand it, this means that they are also not adjusted during back-propagation -- if nothing else, it would be terribly unfair to the poor ignored neurons to have their weights changed when they didn't contribute to the error.)
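Here's a minimal sketch of what that looks like in practice -- the 6x6 matrix of ones is just an illustrative toy, and the 0.5 rate follows the book's example:

```python
import torch

torch.manual_seed(123)

dropout = torch.nn.Dropout(0.5)  # each element is zeroed with probability 0.5
example = torch.ones(6, 6)       # toy matrix, purely for illustration

# Training mode: roughly half the entries come out as 0, and the survivors
# are scaled up to 2.0 (that is, by 1 / (1 - 0.5)) so the expected sum of
# the matrix stays the same.
print(dropout(example))

# Eval mode (i.e. inference): dropout becomes a no-op and the matrix
# passes through unchanged -- which is the "not during inference" bit above.
dropout.eval()
print(dropout(example))
```

The scaling-up of the surviving values is what lets you switch dropout off at inference time without the downstream layers suddenly seeing inputs that are, on average, twice as big.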