I made a kernel 2.2x faster. It made my training loop 3x slower

I wrote a fused decode-attention kernel for an RL training loop, got it 2.2× faster than the SDPA path it replaces at the microbenchmark level, dropped it in...
