Magma: A foundation model for multimodal AI agents


pretraining pipeline. For all training data, text is tokenized, while images and videos from different domains are encoded by a shared vision encoder.
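As a rough illustration of that split, the sketch below routes text through a tokenizer and sends both images and video frames through one shared encoder function. It is a toy under stated assumptions, not Magma's implementation: the whitespace tokenizer, the patch-mean "encoder", and the names `tokenize`, `encode_image`, `encode_video`, and `build_inputs` are all hypothetical stand-ins chosen for this example.

```python
from typing import Dict, List

# Toy stand-in for a real image: a grayscale grid of pixel intensities.
Image = List[List[float]]


def tokenize(text: str) -> List[str]:
    """Hypothetical whitespace tokenizer; real models use subword tokenizers."""
    return text.lower().split()


def encode_image(image: Image, patch: int = 2) -> List[float]:
    """Hypothetical shared vision encoder: mean intensity per non-overlapping patch."""
    features = []
    for r in range(0, len(image), patch):
        for c in range(0, len(image[0]), patch):
            values = [v for row in image[r:r + patch] for v in row[c:c + patch]]
            features.append(sum(values) / len(values))
    return features


def encode_video(frames: List[Image], patch: int = 2) -> List[List[float]]:
    """Videos reuse the same encoder, applied frame by frame."""
    return [encode_image(frame, patch) for frame in frames]


def build_inputs(text: str, frames: List[Image]) -> Dict[str, list]:
    """Combine tokenized text with visual features from the shared encoder."""
    return {
        "text_tokens": tokenize(text),
        "visual_features": encode_video(frames),
    }


if __name__ == "__main__":
    frame = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0]]
    # A single image is treated as a one-frame video.
    print(build_inputs("Place the basket on the counter", [frame]))
```

The point of the sketch is only that a still image and a video clip pass through the same encoding function, so visual inputs from different domains end up in one feature space alongside the text tokens.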

A customer, identifiable by their white sleeve and colorful bracelet, places a red shopping basket filled with snacks and a drink on the counter. The customer's red shopping basket, now filled with snacks including a drink cup labeled 'Fruit Swoosh' and a blue package of cookies, is placed on the counter. The video wraps up with a wider view of the room, revealing a checkered floor and several other plants in the background, adding a homely touch to the setting.
