FastVLM: Efficient vision encoding for vision language models

This repository (apple/ml-fastvlm) contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" (CVPR 2025).

Our larger variants, which use the Qwen2-7B LLM, outperform recent works such as Cambrian-1-8B while using a single image encoder and achieving a 7.9x faster time-to-first-token (TTFT). To download all the pretrained checkpoints, run the download command given in the repository README (this may take some time depending on your connection, so it might be a good moment to grab a ☕️ while you wait). To run inference on Apple devices such as iPhone, iPad, or Mac, see the app subfolder for more details.
