Microsoft drops ‘MInference’ demo, challenges status quo of AI processing


Microsoft unveils MInference, a groundbreaking AI technology that cuts language model processing time by up to 90%, potentially transforming long-context AI applications across industries.

MInference, which stands for “Million-Tokens Prompt Inference,” aims to dramatically accelerate the “pre-filling” stage of language model processing, a step that typically becomes a bottleneck when dealing with very long text inputs. “Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens on a single [Nvidia] A100 GPU,” the research team noted in their paper published on arXiv. While the researchers claim to maintain accuracy, the AI community will need to scrutinize whether MInference’s selective attention mechanism could inadvertently prioritize certain types of information over others, potentially affecting the model’s understanding or output in subtle ways.
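To make the "quadratic complexity" point concrete, here is a rough back-of-the-envelope sketch in Python. It is not MInference's method or code. The function names are hypothetical, it assumes dense attention compares every token with every other token, and it assumes, purely for illustration, that the "up to 90%" figure is applied to the 30-minute example quoted above.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key comparisons in dense self-attention (assumed: every token vs. every token)."""
    return n_tokens * n_tokens


def relative_cost(n_tokens: int, baseline_tokens: int) -> float:
    """How many times more attention work n_tokens needs than baseline_tokens."""
    return attention_pairs(n_tokens) / attention_pairs(baseline_tokens)


def minutes_after_reduction(baseline_minutes: float, reduction: float) -> float:
    """Remaining time if processing time is cut by the given fraction (0.9 means 90%)."""
    return baseline_minutes * (1.0 - reduction)


if __name__ == "__main__":
    # A prompt 10x longer needs about 100x the attention work under this assumption.
    print(relative_cost(1_000_000, 100_000))
    # Illustration only: a 90% cut applied to the quoted 30-minute figure.
    print(round(minutes_after_reduction(30.0, 0.9), 2))
```

Under these assumptions, the cost grows with the square of the prompt length, which is why very long inputs become a bottleneck, and a 90% cut to a 30-minute pre-fill would leave roughly three minutes.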

Read the full article on VentureBeat.

Related news:

Microsoft Tells China Staff to Swap Androids for iPhones

Coders' Copilot code-copying copyright claims crumble against GitHub, Microsoft

Microsoft: Windows 11 22H2 reaches end of service in October