Apple researchers achieve breakthroughs in multimodal AI as company ramps up investments
Apple researchers have achieved state-of-the-art results in multimodal AI with their MM1 models, which combine text and images to deliver strong performance in image captioning, visual question answering, and few-shot learning. The work comes as the company invests heavily in AI to enhance Siri, Messages, and future products.
“We demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks,” the researchers explain.

Notably, the largest 30 billion parameter MM1 model exhibited strong in-context learning abilities, allowing it to perform multi-step reasoning over multiple input images using few-shot “chain-of-thought” prompting.

The MM1 research comes as Apple has been ramping up its investments in artificial intelligence in an effort to catch up with rivals such as Google, Microsoft, and Amazon, which have raced ahead in integrating generative AI capabilities into their products.
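To make the few-shot “chain-of-thought” idea concrete, the sketch below shows one common way such a prompt can be assembled: worked examples that interleave image references with step-by-step reasoning, followed by the new multi-image question. The message format, field names, and image filenames here are purely illustrative assumptions; Apple's actual MM1 interface is not public.

```python
# Hypothetical sketch of assembling a few-shot chain-of-thought prompt
# over multiple images. The segment schema below is an assumption for
# illustration, not Apple's actual MM1 API.

def build_fewshot_cot_prompt(examples, query_images, question):
    """Interleave worked examples (image refs plus step-by-step
    reasoning) with a final multi-image question."""
    segments = []
    for ex in examples:
        # Each example contributes its images, then its reasoning trace.
        for img in ex["images"]:
            segments.append({"type": "image", "ref": img})
        segments.append({
            "type": "text",
            "content": (f"Q: {ex['question']}\n"
                        f"Reasoning: {ex['reasoning']}\n"
                        f"A: {ex['answer']}"),
        })
    # The new query: its images, then the question with a CoT cue.
    for img in query_images:
        segments.append({"type": "image", "ref": img})
    segments.append({
        "type": "text",
        "content": f"Q: {question}\nLet's think step by step.",
    })
    return segments

# Example usage with hypothetical image files:
example = {
    "images": ["menu.jpg", "receipt.jpg"],
    "question": "How much would two beers cost, per the menu?",
    "reasoning": "The menu lists a beer at $6; two beers cost 2 x $6 = $12.",
    "answer": "$12",
}
prompt = build_fewshot_cot_prompt([example], ["fridge.jpg"],
                                  "How many eggs are visible?")
```

The key property, which MM1's in-context learning reportedly exploits, is that images and text alternate freely in one sequence, so the model can ground each reasoning step in a specific image.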
Read the full story on VentureBeat.