Get the latest tech news

Tensor Manipulation Unit (TMU): Reconfigurable, Near-Memory, High-Throughput AI


While recent advances in AI SoC design have focused heavily on accelerating tensor computation, the equally critical task of tensor manipulation, centered on high,volume data movement with minimal computation, remains underexplored. This work addresses that gap by introducing the Tensor Manipulation Unit (TMU), a reconfigurable, near-memory hardware block designed to efficiently execute data-movement-intensive operators. TMU manipulates long datastreams in a memory-to-memory fashion using a RISC-inspired execution model and a unified addressing abstraction, enabling broad support for both coarse- and fine-grained tensor transformations. Integrated alongside a TPU within a high-throughput AI SoC, the TMU leverages double buffering and output forwarding to improve pipeline utilization. Fabricated in SMIC 40nm technology, the TMU occupies only 0.019 mm2 while supporting over 10 representative tensor manipulation operators. Benchmarking shows that TMU alone achieves up to 1413 and 8.54 operator-level latency reduction compared to ARM A72 and NVIDIA Jetson TX2, respectively. When integrated with the in-house TPU, the complete system achieves a 34.6% reduction in end-to-end inference latency, demonstrating the effectiveness and scalability of reconfigurable tensor manipulation in modern AI SoCs.

This work addresses that gap by introducing the Tensor Manipulation Unit (TMU), a reconfigurable, near-memory hardware block designed to efficiently execute data-movement-intensive operators. TMU manipulates long datastreams in a memory-to-memory fashion using a RISC-inspired execution model and a unified addressing abstraction, enabling broad support for both coarse- and fine-grained tensor transformations. When integrated with the in-house TPU, the complete system achieves a 34.6% reduction in end-to-end inference latency, demonstrating the effectiveness and scalability of reconfigurable tensor manipulation in modern AI SoCs.

Get the Android app

Or read this on Hacker News

Read more on:

Photo of Memory

Memory

Photo of throughput

throughput

Photo of reconfigurable

reconfigurable

Related news:

News photo

MIT Experiment Finds ChatGPT-Assisted Writing Weakens Student Brain Connectivity and Memory

News photo

In-Memory C++ Leap in Blockchain Analysis

News photo

Using AI makes you stupid, researchers find. Study reveals chatbots risk hampering development of critical thinking, memory and language skills