Get the latest tech news
Shape Rotation 101: An Intro to Einsum and Jax Transformers
First, I would like to acknowledge my friends and kind internet strangers who helped me with this post. This post heavily adapts from the following - ...
It can outperform familiar array functions in terms of speed and memory efficiency, thanks to its expressive power and smart loops. The above einsum is basically a matrix multiplication / dot product between each token’s embedding and set of learnt weight vectors. My background is ~2 years of production experience at backend/generalist software engineering at a mid sized USA based fintech company.
Or read this on Hacker News