Run Llama locally with only PyTorch on CPU


Run and explore Llama models locally with minimal dependencies on CPU - anordin95/run-llama-locally

I was a bit surprised Meta didn't publish an example of how to invoke one of these LLMs with only torch (or some other minimal set of dependencies), though I am obviously grateful for, and so pleased with, their contribution of the public weights! On CPU, I can pretty comfortably run the 1B model on my M1 MacBook Air, which has 16 GB of RAM, averaging about 1 token per second. I suspect that the relatively higher memory load of the GPU (caused for unknown reasons), in conjunction with a growing sequence length, starts to swamp my system's available memory to a degree that affects computation speed.
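For a concrete sense of what CPU-only, token-by-token generation looks like (and how you might measure a tokens-per-second figure like the ~1 token/sec above), here is a minimal sketch. It is not the repository's torch-only implementation: it leans on Hugging Face transformers for loading the weights, and the checkpoint name meta-llama/Llama-3.2-1B is an assumption, so substitute whichever Llama weights you actually have access to.

```python
# Minimal sketch of CPU-only greedy decoding with per-token timing.
# NOTE: uses Hugging Face transformers for loading, not the repo's
# minimal-dependency approach; the model id below is an assumption.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()  # inference only; everything stays on CPU by default

input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(32):
        start = time.perf_counter()
        logits = model(input_ids).logits           # shape: (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy: most likely next token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=-1)
        elapsed = time.perf_counter() - start
        print(f"{1.0 / elapsed:.2f} tokens/sec", tokenizer.decode(next_id))
```

Because this naive loop keeps no KV cache, every step re-processes the whole sequence, so per-token time grows as the sequence gets longer; the observation above about growing sequence lengths applies even more strongly here.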


