
Microsoft drops Florence-2, a unified model to handle a variety of vision tasks


As of now, both pre-trained and fine-tuned versions of Florence-2, in 232-million- and 771-million-parameter sizes, are available on Hugging Face under a permissive MIT license.

When Microsoft tried building a single model that could handle a variety of vision tasks, it found two key roadblocks: the scarcity of comprehensively annotated visual datasets, and the absence of a unified pretraining framework with a single network architecture that could integrate an understanding of both spatial hierarchy and semantic granularity.

To tackle the data problem, the company built FLD-5B, a dataset of roughly 5.4 billion annotations covering 126 million images. “All annotations in the dataset, FLD-5B, are uniformly standardized into textual outputs, facilitating a unified multi-task learning approach with consistent optimization with the same loss function as the objective,” the researchers wrote in the paper detailing the model.

Despite their modest size, both versions of Florence-2 hold their own against far larger systems. In a zero-shot captioning test on the COCO dataset, the 232M and 771M models outperformed DeepMind’s 80-billion-parameter Flamingo visual language model, scoring 133 and 135.6, respectively.
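Because the checkpoints are published on Hugging Face, they can be tried out with the standard transformers tooling. The snippet below is only a rough sketch of what that looks like: it assumes a checkpoint named microsoft/Florence-2-base, the remote-code processor shipped alongside it, a "<CAPTION>" task prompt, and a placeholder image URL, none of which are spelled out in the article itself.

# Minimal sketch: zero-shot captioning with Florence-2 via Hugging Face transformers.
# Assumptions (not from the article): the "microsoft/Florence-2-base" checkpoint name,
# its remote-code processor, and the "<CAPTION>" task prompt.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"  # assumed checkpoint name
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Every task is expressed as a text prompt and answered in text, which is what
# lets one network cover captioning, detection, grounding, and so on.
prompt = "<CAPTION>"
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)  # placeholder URL

inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(caption)

Swapping the prompt for another task token would reuse the same weights and the same text-generation loop for a different vision task, which is the unified interface the researchers describe.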


