Inference

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

Qwen3.5: Towards Native Multimodal Agents

Pure C, CPU-only inference with the Mistral Voxtral Realtime 4B speech-to-text model

TTT-Discover optimizes GPU kernels 2x faster than human experts by training during inference

Inference startup Inferact lands $150M to commercialize vLLM

Quadric rides the shift from cloud AI to on-device inference — and it’s paying off

Nvidia just admitted the general-purpose GPU era is ending

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

Tensormesh raises $4.5M to squeeze more inference out of AI server loads

Intel Announces "Crescent Island" Inference-Optimized Xe3P Graphics Card With 160GB vRAM

NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference

Famed gamer creates working 5 million parameter ChatGPT AI model in Minecraft, made with 439 million blocks — AI trained to hold conversations, working model runs inference in the game

How Neural Super Sampling Works: Architecture, Training, and Inference

Nvidia’s $46.7B Q2 proves the platform, but its next fight is ASIC economics on inference

Are OpenAI and Anthropic losing money on inference?

Show HN: I built a toy TPU that can do inference and training on the XOR problem

Nonogram: Complexity of Inference and Phase Transition Behavior

Cracking AI’s storage bottleneck and supercharging inference at the edge

BharatMLStack – Realtime Inference, MLOps