Show HN: Llama 3.1 70B on a single RTX 3090 via NVMe-to-GPU bypassing the CPU
High-efficiency LLM inference engine in C++/CUDA. Run Llama 70B on RTX 3090. - xaskasdf/ntransformer