
Llama3 implemented from scratch


llama3 implementation one matrix multiplication at a time - naklecha/llama3-from-scratch

normally, reading a model file depends on how the model classes are written and the variable names inside them. from the config we learn details about the model: it has 32 transformer layers, each multi-head attention block has 32 heads, the vocab size, and so on.

when we load the query, key, value and output weights from the model, we notice their shapes are [4096x4096], [1024x4096], [1024x4096], [4096x4096]. at first glance this is weird, because ideally we want a separate q, k, v and o matrix for each head individually. the authors of the code bundled them together because it's easier and it helps parallelize the attention head multiplications.
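to make the bundling concrete, here is a minimal sketch, assuming PyTorch and the llama3-8B dimensions implied by those shapes (dim=4096, 32 query heads, 8 key/value heads via grouped-query attention, head_dim=128); random tensors stand in for the real checkpoint weights:

```python
import torch

dim, n_heads, n_kv_heads = 4096, 32, 8
head_dim = dim // n_heads  # 128

# fused projection weights, matching the shapes quoted above
wq = torch.randn(n_heads * head_dim, dim)     # [4096, 4096]
wk = torch.randn(n_kv_heads * head_dim, dim)  # [1024, 4096]
wv = torch.randn(n_kv_heads * head_dim, dim)  # [1024, 4096]
wo = torch.randn(dim, n_heads * head_dim)     # [4096, 4096]

# unbundle into one projection matrix per head
wq_per_head = wq.view(n_heads, head_dim, dim)     # [32, 128, 4096]
wk_per_head = wk.view(n_kv_heads, head_dim, dim)  # [8, 128, 4096]
wv_per_head = wv.view(n_kv_heads, head_dim, dim)  # [8, 128, 4096]

print(wq_per_head[0].shape)  # torch.Size([128, 4096]) -> query projection for head 0
```

splitting the fused matrix with a view recovers the per-head weights, while keeping it fused lets a single matmul compute the projections for all heads at once, which is the parallelization win mentioned above.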
