Compiler optimizations for 5.8ms GPT-OSS-120B inference (not on GPUs)


Here are the key compiler optimizations that enabled two RNGD cards to achieve 5.8 ms per output token on gpt-oss-120b, running under 180 W, within weeks of the model's release.

