New Transformer architecture could enable powerful LLMs without GPUs


MatMul-free LM removes matrix multiplications from language model architectures to make them faster and much more memory-efficient.
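
To make the headline claim concrete, here is a minimal sketch, assuming weights quantized to {-1, 0, +1} (a ternary scheme in the spirit of the paper, not its exact implementation), of how a dense layer's multiply-accumulate collapses into pure additions and subtractions once multiplications are removed:

```python
import numpy as np

def ternary_linear(x, w_ternary):
    """Dense layer with weights constrained to {-1, 0, +1}.

    Because every weight is -1, 0, or +1, the usual
    multiply-accumulate of a MatMul reduces to additions
    and subtractions -- no multiplications needed.

    x: (batch, d_in) activations
    w_ternary: (d_in, d_out) ternary weight matrix
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]))
    for j in range(w_ternary.shape[1]):
        plus = w_ternary[:, j] == 1    # input columns to add
        minus = w_ternary[:, j] == -1  # input columns to subtract
        out[:, j] = x[:, plus].sum(axis=1) - x[:, minus].sum(axis=1)
    return out

# Hypothetical shapes, for illustration only
x = np.random.randn(2, 8)
w = np.random.choice([-1, 0, 1], size=(8, 4))
assert np.allclose(ternary_linear(x, w), x @ w)  # same result, zero multiplies
```

Addition-only arithmetic of this kind is far cheaper in silicon than general multiplication, which is what makes the approach attractive for hardware other than GPUs.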

GPUs excel at performing many computations in parallel. This parallelism allows them to handle the large-scale calculations required in deep learning much faster than traditional CPUs, making them essential for training and running complex neural network models efficiently. In traditional Transformer models, capturing dependencies and contextual information is typically achieved through self-attention mechanisms, which use MatMul operations to compute relationships between all pairs of tokens.

“These results highlight that MatMul-free architectures are capable of achieving strong zero-shot performance on a diverse set of language tasks, ranging from question answering and commonsense reasoning to physical understanding,” the researchers write.
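
For context on why removing MatMul matters, here is a minimal sketch of the all-pairs score computation in standard self-attention (shapes and names are illustrative, not taken from the paper):

```python
import numpy as np

def attention_scores(q, k):
    """Pairwise token relationships in standard self-attention.

    q, k: (seq_len, d_head) query and key matrices.
    The (seq_len, seq_len) score matrix is one large MatMul --
    the kind of operation the MatMul-free architecture eliminates.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # MatMul: ~seq_len^2 * d multiplies
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

q = np.random.randn(16, 64)  # 16 tokens, 64-dim heads (illustrative)
k = np.random.randn(16, 64)
print(attention_scores(q, k).shape)  # (16, 16): one score per token pair
```

The score matrix grows quadratically with sequence length, which is why eliminating these MatMul operations drives the speed and memory gains the researchers describe.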

Read the full story on VentureBeat.

Read more on: GPUs, New Transformer, powerful LLMs

Related news:

Latest AI training benchmarks show Nvidia has no competition. Across a suite of neural network tasks, competitors' chips didn't even come close to Nvidia's GPUs.

Sequence will offer Aethir decentralized GPUs on demand for Web3 games

Nvidia faces local competition for its 'China special' GPUs