Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU transfers, bypassing the CPU

High-efficiency LLM inference engine in C++/CUDA. Run Llama 70B on RTX 3090. - xaskasdf/ntransformer
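
The page doesn't show the repository's actual loading code, but the technique the title names, streaming weights from an NVMe drive straight into GPU memory without a CPU bounce buffer, is what NVIDIA's GPUDirect Storage (cuFile) API provides. Below is a minimal C++/CUDA sketch of that idea; the file name, chunk size, and overall structure are illustrative assumptions, not taken from ntransformer.

// Sketch: DMA one chunk of model weights from an NVMe file directly into
// GPU memory via GPUDirect Storage (cuFile), skipping any CPU staging buffer.
// File name and sizes are hypothetical; error handling is trimmed for brevity.
#include <cufile.h>
#include <cuda_runtime.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    const char *path = "llama-70b.weights";   // hypothetical weights file
    const size_t chunk_bytes = 256ull << 20;  // 256 MiB per transfer, illustrative

    cuFileDriverOpen();                        // bring up the GDS driver

    int fd = open(path, O_RDONLY | O_DIRECT);  // O_DIRECT is required for GDS
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;
    CUfileHandle_t fh;
    cuFileHandleRegister(&fh, &descr);

    void *dev_buf = nullptr;
    cudaMalloc(&dev_buf, chunk_bytes);
    cuFileBufRegister(dev_buf, chunk_bytes, 0); // pin the GPU buffer for DMA

    // One DMA: NVMe file offset 0 -> GPU memory, no CPU copy in between.
    ssize_t n = cuFileRead(fh, dev_buf, chunk_bytes, /*file_offset=*/0,
                           /*devPtr_offset=*/0);
    std::printf("read %zd bytes NVMe -> GPU\n", n);

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(fh);
    close(fd);
    cuFileDriverClose();
    return 0;
}

Link against -lcufile -lcudart. An engine built on this pattern would typically stream per-layer weights on demand and overlap the reads with compute, so only a fraction of the 70B parameters has to fit in the 3090's 24 GB at any one time.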

Read more on: RTX, GPU, CPU

Related news:

Show HN: A physically-based GPU ray tracer written in Julia

GPU who? Meta to deploy Nvidia CPUs at large scale

Async/Await on the GPU