Microsoft drops ‘MInference’ demo, challenges status quo of AI processing
Microsoft unveils MInference, a groundbreaking AI technology that accelerates language model processing by up to 90%, potentially transforming long-context AI applications across industries.
MInference, which stands for “Million-Tokens Prompt Inference,” aims to dramatically accelerate the “pre-filling” stage of language model processing, a step that typically becomes a bottleneck when dealing with very long text inputs. “Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens on a single [Nvidia] A100 GPU,” the research team noted in their paper published on arXiv.
While the researchers claim to maintain accuracy, the AI community will need to scrutinize whether this selective attention mechanism could inadvertently prioritize certain types of information over others, potentially affecting the model’s understanding or output in subtle ways.
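To make the scale of that bottleneck concrete, the toy sketch below (not Microsoft’s MInference code; all function names, shapes, and the fixed-window selection rule are illustrative assumptions) contrasts dense attention scoring, which grows quadratically with prompt length, against a crude selective scheme that scores each query against only a small, fixed number of keys.

```python
# Toy illustration, not the MInference implementation: dense attention over an
# n-token prompt scores every query against every key, so the score matrix has
# n * n entries -- the quadratic cost the paper cites. A selective scheme that
# keeps only `keep` key positions per query does roughly n * keep work instead.
import numpy as np

def dense_prefill_scores(q, k):
    """Full attention scores: O(n^2) compute and memory in sequence length."""
    return q @ k.T  # shape (n, n)

def selective_prefill_scores(q, k, keep=64):
    """Score each query only against its last `keep` keys (a simple stand-in
    for the dynamic sparse patterns described in the paper)."""
    n, d = q.shape
    scores = np.full((n, keep), -np.inf)
    for i in range(n):
        lo = max(0, i - keep + 1)
        window = k[lo:i + 1]                       # causal local window of keys
        scores[i, -window.shape[0]:] = q[i] @ window.T
    return scores  # shape (n, keep): linear in n for a fixed `keep`

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 1024, 64
    q = rng.standard_normal((n, d))
    k = rng.standard_normal((n, d))
    print("dense score entries:    ", dense_prefill_scores(q, k).size)      # 1,048,576
    print("selective score entries:", selective_prefill_scores(q, k).size)  # 65,536
```

At 1M tokens the dense score matrix would hold a trillion entries, which is why the paper reports half-hour pre-fill times on a single A100; the actual system selects which keys to keep dynamically rather than with the fixed window used here.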