DeepMind’s PEER scales language models with millions of tiny experts
Parameter-Efficient Expert Retrieval (PEER) is a technique that lets LLMs scale to millions of tiny experts while remaining resource-efficient.
For each input, PEER first runs a fast initial computation to build a shortlist of candidate experts, then selects and activates only the top experts from that shortlist. The researchers' experiments show that PEER models achieve a better performance-compute tradeoff, reaching lower perplexity than dense feedforward baselines at the same computational budget. “This design demonstrates a superior compute-performance trade-off in our experiments, positioning it as a competitive alternative to dense FFW layers for scaling foundation models,” the researchers write.
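The shortlist step can be pictured with product-key retrieval: rather than scoring every expert, the query is split in half, each half is scored against a small set of sub-keys, and the top experts are drawn from the combined shortlist. Below is a minimal NumPy sketch of that two-stage selection for a single token vector; the dimensions, random weights, and function names are illustrative assumptions, not DeepMind's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 64   # hypothetical model width
n_sub = 32     # sub-keys per half; total experts = n_sub**2 = 1024 here
top_c = 8      # shortlist size per half (the fast initial computation)
top_k = 16     # experts actually activated per input

# Product keys: each expert's key is the concatenation of one sub-key from
# each half, so scoring n_sub**2 experts needs only 2 * n_sub dot products.
sub_keys_a = rng.standard_normal((n_sub, d_model // 2))
sub_keys_b = rng.standard_normal((n_sub, d_model // 2))

# Tiny experts: each expert is a single-neuron MLP, i.e. one input vector and
# one output vector per expert. These weights are random placeholders.
n_experts = n_sub * n_sub
w_in = rng.standard_normal((n_experts, d_model))
w_out = rng.standard_normal((n_experts, d_model))

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def peer_layer(x):
    """Sketch of one PEER forward pass for a single token vector x."""
    q_a, q_b = x[: d_model // 2], x[d_model // 2 :]

    # Fast initial computation: score each half of the query against its
    # sub-keys and keep a shortlist of the best candidates per half.
    scores_a = sub_keys_a @ q_a
    scores_b = sub_keys_b @ q_b
    cand_a = np.argsort(scores_a)[-top_c:]
    cand_b = np.argsort(scores_b)[-top_c:]

    # Combine the shortlists: only top_c * top_c candidate experts are scored
    # in full, instead of all n_experts.
    cand_scores = scores_a[cand_a][:, None] + scores_b[cand_b][None, :]
    flat = np.argsort(cand_scores, axis=None)[-top_k:]
    rows, cols = np.unravel_index(flat, cand_scores.shape)
    expert_ids = cand_a[rows] * n_sub + cand_b[cols]
    gate = softmax(cand_scores[rows, cols])

    # Activate only the retrieved experts and sum their gated outputs.
    h = np.maximum(w_in[expert_ids] @ x, 0.0)   # (top_k,) hidden activations
    return (gate * h) @ w_out[expert_ids]       # (d_model,) layer output

x = rng.standard_normal(d_model)
print(peer_layer(x).shape)  # (64,)
```

The key point of the design is that the retrieval cost grows with the number of sub-keys, not with the total number of experts, which is what lets the expert count reach the millions while per-token compute stays small.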
Or read this on VentureBeat