Inference


Inference is giving AI chip startups a second chance to make their mark

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

Train-to-Test scaling explained: How to optimize your end-to-end AI compute budget for inference

Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference

Your developers are already running AI locally: Why on-device inference is the CISO’s new blind spot

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Launch HN: RunAnywhere (YC W26) – Faster AI Inference on Apple Silicon

Pure C, CPU-only inference with the Mistral Voxtral Realtime 4B speech-to-text model

TTT-Discover optimizes GPU kernels 2x faster than human experts — by training during inference

Inference startup Inferact lands $150M to commercialize vLLM

Quadric rides the shift from cloud AI to on-device inference — and it’s paying off

Nvidia just admitted the general-purpose GPU era is ending

ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLM Inference

Tensormesh raises $4.5M to squeeze more inference out of AI server loads

Intel Announces "Crescent Island" Inference-Optimized Xe3P Graphics Card With 160GB vRAM

NVIDIA DGX Spark In-Depth Review: A New Standard for Local AI Inference

Famed gamer builds a working 5-million-parameter ChatGPT-style AI model in Minecraft from 439 million blocks; the model, trained to hold conversations, runs inference inside the game

How Neural Super Sampling Works: Architecture, Training, and Inference

Nvidia’s $46.7B Q2 proves the platform, but its next fight is ASIC economics on inference

Are OpenAI and Anthropic losing money on inference?