Researchers run high-performing LLM on the energy needed to power a lightbulb
UC Santa Cruz researchers show that it is possible to eliminate matrix multiplication, the most computationally expensive operation in running large language models, while maintaining performance.
Large language models such as ChatGPT have proven able to produce remarkably intelligent results, but the energy and monetary costs of running these massive algorithms are sky high. The UC Santa Cruz team's strategy was inspired by a Microsoft paper showing that neural networks can work with ternary weights (restricted to -1, 0, and +1), though that work stopped short of eliminating matrix multiplication entirely and did not open-source its model to the public. Running on custom hardware, the new model exceeds human-readable throughput, meaning it produces words faster than a person can read them, on just 13 watts of power.
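To see why ternary weights make matrix multiplication unnecessary, consider a single matrix-vector product: when every weight is -1, 0, or +1, each output element reduces to a signed sum of inputs, so multiplies can be replaced by additions and subtractions. The sketch below is only an illustration of that general idea, not the researchers' actual implementation; the function name ternary_matvec and the use of NumPy are assumptions for the example.

```python
import numpy as np

def ternary_matvec(W_ternary, x):
    """Matrix-vector product where every weight is -1, 0, or +1.

    Because the weights are ternary, each output element is just a
    signed sum of selected inputs: no multiplications are performed.
    """
    out = np.zeros(W_ternary.shape[0], dtype=x.dtype)
    for i in range(W_ternary.shape[0]):
        row = W_ternary[i]
        # Add inputs where the weight is +1, subtract where it is -1,
        # and skip entries where the weight is 0.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Example: a 3x4 ternary weight matrix applied to a 4-vector.
W = np.array([[ 1, 0, -1,  1],
              [ 0, 1,  1, -1],
              [-1, 0,  0,  1]])
x = np.array([0.5, -2.0, 1.0, 3.0])

print(ternary_matvec(W, x))  # additions/subtractions only
print(W @ x)                 # same result via ordinary matmul
```

Since additions are far cheaper than multiply-accumulate operations in hardware, this kind of substitution is one reason a ternary model can run at such low power.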