DeepMind’s PEER scales language models with millions of tiny experts


Parameter-Efficient Expert Retrieval (PEER) is a DeepMind technique that lets large language models scale to millions of tiny experts while remaining resource-efficient.

For each input, PEER first runs a fast initial computation to shortlist candidate experts, then selects and activates only the top experts from that shortlist. The researchers' experiments show that PEER models achieve a better performance-compute tradeoff, reaching lower perplexity than dense feedforward (FFW) baselines at the same computational budget. “This design demonstrates a superior compute-performance trade-off in our experiments, positioning it as a competitive alternative to dense FFW layers for scaling foundation models,” the researchers write.
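To make the "shortlist, then activate" idea concrete, here is a minimal NumPy sketch of PEER-style retrieval. It is illustrative only, not DeepMind's code: the dimensions, the product-key split into two sub-key sets, and the single-neuron shape of each expert are assumptions based on the description above, and all names are invented for the example.

import numpy as np

rng = np.random.default_rng(0)

d_model = 64            # token embedding size
n_keys = 32             # sub-keys per half; total experts = n_keys ** 2
n_experts = n_keys ** 2
top_k = 8               # experts activated per token

# Each tiny expert is a single-neuron MLP: one down- and one up-projection.
w_down = rng.normal(size=(n_experts, d_model)) / np.sqrt(d_model)
w_up = rng.normal(size=(n_experts, d_model)) / np.sqrt(d_model)

# Product keys: every expert's key is the concatenation of one sub-key from
# each half, so scoring two small sets covers all n_keys**2 experts cheaply.
sub_keys_a = rng.normal(size=(n_keys, d_model // 2))
sub_keys_b = rng.normal(size=(n_keys, d_model // 2))

def peer_layer(x):
    """Route one token embedding through the top-k retrieved experts."""
    q_a, q_b = x[: d_model // 2], x[d_model // 2 :]

    # Fast initial computation: score each sub-key set independently and
    # shortlist the best candidates in each half.
    scores_a = sub_keys_a @ q_a
    scores_b = sub_keys_b @ q_b
    short_a = np.argsort(scores_a)[-top_k:]
    short_b = np.argsort(scores_b)[-top_k:]

    # Combine shortlists: expert (i, j) scores as the sum of its halves,
    # so only top_k * top_k candidates need exact scoring.
    cand = [(scores_a[i] + scores_b[j], i * n_keys + j)
            for i in short_a for j in short_b]
    cand.sort(reverse=True)
    top = cand[:top_k]

    # Activate only the selected experts, weighted by a softmax over scores.
    scores = np.array([s for s, _ in top])
    idx = np.array([e for _, e in top])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    hidden = np.maximum(w_down[idx] @ x, 0.0)   # ReLU of each expert neuron
    return (weights * hidden) @ w_up[idx]       # weighted sum of expert outputs

out = peer_layer(rng.normal(size=d_model))
print(out.shape)  # (64,)

Because each expert is a single neuron, total capacity grows with the expert count while per-token compute depends only on top_k, which is the tradeoff the quoted result describes.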

Read the full story on VentureBeat.
