Apple taught an LLM to predict tokens up to 5x faster in math and coding tasks
A new research paper from Apple details a technique that speeds up large language model responses, while preserving output quality.
For example, when asked to complete a prompt such as “The cat is …”, a standard model predicts one token at a time. In this scenario, it might rank options like black, tall, sleeping, grumpy, fluffy, skinny, purring, white, tired, playing, missing, meowing, cold, and so on, then choose the one that best fits the context. According to the researchers, the speedups came with “no degradation in generation quality, thanks to a simple yet effective technique we call gated LoRA adaptation.”
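That ranking step can be sketched in a few lines. This is a toy illustration only, not Apple's method: the candidate words and their scores below are invented, and a real LLM would produce logits over its full vocabulary rather than a hand-picked list.

```python
import math

def rank_candidates(logits):
    """Toy next-token selection: apply softmax to candidate logits,
    then sort candidates from most to least probable."""
    exps = {tok: math.exp(score) for tok, score in logits.items()}
    total = sum(exps.values())
    probs = {tok: e / total for tok, e in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical logits for completing "The cat is ..." (invented values).
candidate_logits = {"sleeping": 2.1, "black": 1.7, "fluffy": 1.4, "purring": 0.9}
ranking = rank_candidates(candidate_logits)
print(ranking[0][0])  # highest-probability continuation: "sleeping"
```

The speedup in the paper comes from letting the model commit to several such predictions per step instead of one, which is what makes preserving quality the hard part.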