Show HN: I built a toy TPU that can do inference and training on the XOR problem


An attempt to understand and build a TPU—by complete novices.

Because everything in our design was staggered, we realized we could elegantly assert a start signal for a single clock cycle at the first accumulator and have it propagate to neighbouring modules exactly when they needed to be turned on. While working on this, we found one last small optimization for the activation derivative module: since we only use the $\mathbf{H}_2$ values once (for computing $\frac{\partial \mathbf{H}_2}{\partial \mathbf{Z}_2}$), we created a tiny cache within the VPU instead of storing them in the UB. As for the instruction set, we can confirm that every single bit is needed; we ensured it couldn't be made any smaller without compromising the speed and efficiency of the TPU.
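To make the start-signal propagation concrete, here is a minimal Python sketch of the idea: a pulse asserted for a single cycle shifts down a chain of registered enable bits, so each module switches on exactly when its staggered data arrives. The module names, chain order, and one-cycle-per-hop latency are assumptions for illustration; the real design is hardware, not Python.

```python
# Sketch of the staggered start signal: a pulse asserted for one clock
# cycle at the first accumulator ripples through a chain of registered
# enable bits, turning each module on exactly when its data arrives.
# Module names and the one-cycle hop latency are assumptions.

STAGES = ["accumulator_0", "accumulator_1", "vpu", "act_deriv"]

def simulate(num_cycles: int = 6) -> None:
    enables = [0] * len(STAGES)  # one registered enable bit per module
    for cycle in range(num_cycles):
        start = 1 if cycle == 0 else 0  # start is high for a single cycle
        # All registers clock together; shifting from the end of the
        # chain emulates that simultaneous update with plain assignments.
        for i in reversed(range(len(STAGES))):
            enables[i] = start if i == 0 else enables[i - 1]
        active = [s for s, en in zip(STAGES, enables) if en]
        print(f"cycle {cycle}: active = {active or ['(none)']}")

if __name__ == "__main__":
    simulate()
```

Because the pulse is one cycle wide, exactly one module's enable is high per cycle, so no global controller is needed to sequence the pipeline.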
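The derivative cache can be sketched the same way. This assumes a sigmoid activation, in which case $\frac{\partial \mathbf{H}_2}{\partial \mathbf{Z}_2} = \mathbf{H}_2 \odot (1 - \mathbf{H}_2)$ needs only $\mathbf{H}_2$ itself; the activation choice and the class and method names are illustrative, not the actual RTL.

```python
import math

class VPU:
    """Sketch: hold H2 in a tiny local cache instead of a UB round-trip."""

    def __init__(self) -> None:
        self._h2_cache: list[float] = []  # read exactly once, so no UB write

    def activate(self, z2_row: list[float]) -> list[float]:
        # Forward pass: H2 = sigmoid(Z2); cache it for the backward pass.
        h2_row = [1.0 / (1.0 + math.exp(-z)) for z in z2_row]
        self._h2_cache = h2_row
        return h2_row

    def activation_derivative(self) -> list[float]:
        # Backward pass: dH2/dZ2 = H2 * (1 - H2) consumes the cache once.
        h2_row, self._h2_cache = self._h2_cache, []
        return [h * (1.0 - h) for h in h2_row]
```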


