Transformers Without Normalization
Abstract

Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple technique.
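The "remarkably simple technique" the paper refers to is Dynamic Tanh (DyT), an element-wise tanh(αx) with learnable parameters used as a drop-in replacement for normalization layers. Below is a minimal PyTorch sketch of that idea, assuming a LayerNorm-style affine (gamma/beta) and an illustrative alpha initialization; it is a sketch of the concept, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class DyT(nn.Module):
    """Dynamic Tanh: element-wise replacement for LayerNorm in a Transformer block.

    Sketch under stated assumptions; parameter names and init values are illustrative.
    """

    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        # Single learnable scalar controlling how strongly activations are squashed.
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))
        # Per-channel affine parameters, mirroring LayerNorm's weight and bias.
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No mean/variance statistics are computed -- just an element-wise tanh.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta


# Usage: swap it in wherever a Transformer block would otherwise use LayerNorm.
x = torch.randn(2, 16, 512)   # (batch, sequence length, embedding dim)
layer = DyT(dim=512)
print(layer(x).shape)          # torch.Size([2, 16, 512])
```

Because DyT computes no batch or token statistics, it avoids the reduction step that normalization layers require, which is part of what makes the replacement so simple.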