Math Behind Transformers and LLMs
My goal is to give a brief introduction, for readers unfamiliar with the subject, to the current state of large language models, the OpenGPT-X project, and the transformer neural network architecture. I recently moved from numerical linear algebra, where I developed algorithms for solving structured eigenvalue problems, to natural language processing with a focus on high-performance computing. In this field the computational device of choice is typically the GPU, thanks to its massive parallelism and to hardware features that make it extremely efficient at matrix multiplication.
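To make the GPU point concrete, here is a minimal sketch (assuming PyTorch, which the text itself does not name) of the kind of dense matrix multiplication that dominates transformer workloads. It runs on a GPU when one is available and falls back to the CPU otherwise:

```python
import torch

# Two large dense matrices, roughly the shape of a transformer
# weight matrix and an activation batch.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# Use the GPU if one is present; this is where the massive
# parallelism of the hardware pays off.
device = "cuda" if torch.cuda.is_available() else "cpu"
a, b = a.to(device), b.to(device)

# A single matrix multiplication: on a GPU this one call is
# spread across thousands of cores (and tensor cores, if present).
c = a @ b
print(c.shape, device)
```

Timing this snippet on a CPU versus a GPU already shows an order-of-magnitude gap, which is why essentially all large-scale training and inference happens on GPUs.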