Researchers run high-performing LLM on the energy needed to power a lightbulb
UC Santa Cruz researchers show that it is possible to eliminate matrix multiplication, the most computationally expensive operation in running large language models, while maintaining performance.
Large language models such as ChatGPT have proven able to produce remarkably intelligent results, but the energy and monetary costs of running these massive algorithms are sky high. The UC Santa Cruz team's strategy was inspired by a Microsoft paper showing that neural networks can work with ternary weights (restricted to -1, 0, and +1), though that work stopped short of eliminating matrix multiplication entirely and did not open-source its model to the public. Running on custom hardware, the new model exceeds human-readable throughput, meaning it produces words faster than a person can read them, on just 13 watts of power.
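To see why ternary weights make matrix multiplication unnecessary, consider a single matrix-vector product: when every weight is -1, 0, or +1, each output element reduces to a signed sum of inputs, so multiplies can be replaced by additions and subtractions. The sketch below is only an illustration of that general idea, not the researchers' actual implementation; the function name ternary_matvec and the use of NumPy are assumptions for the example.

```python
import numpy as np

def ternary_matvec(W_ternary, x):
    """Matrix-vector product where every weight is -1, 0, or +1.

    Because the weights are ternary, each output element is just a
    signed sum of selected inputs: no multiplications are performed.
    """
    out = np.zeros(W_ternary.shape[0], dtype=x.dtype)
    for i in range(W_ternary.shape[0]):
        row = W_ternary[i]
        # Add inputs where the weight is +1, subtract where it is -1,
        # and skip entries where the weight is 0.
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Example: a 3x4 ternary weight matrix applied to a 4-vector.
W = np.array([[ 1, 0, -1,  1],
              [ 0, 1,  1, -1],
              [-1, 0,  0,  1]])
x = np.array([0.5, -2.0, 1.0, 3.0])

print(ternary_matvec(W, x))  # additions/subtractions only
print(W @ x)                 # same result via ordinary matmul
```

Since additions are far cheaper than multiply-accumulate operations in hardware, this kind of substitution is one reason a ternary model can run at such low power.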