Forcing Flash Attention onto a TPU and Learning the Hard Way


This is the fifth post in a series on LLM internals. Part 1 covered attention, Part 2 covered generation, Part 3 covered the Flash Attention algorithm, Part ...


