FastVLM: Efficient vision encoding for vision language models
This repository (apple/ml-fastvlm) contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" (CVPR 2025).
Our larger variants, which use the Qwen2-7B LLM, outperform recent works like Cambrian-1-8B while using a single image encoder with a 7.9x faster time to first token (TTFT). To download all the pretrained checkpoints, run the download command provided in the repository (this might take some time depending on your connection, so it might be good to grab ☕️ while you wait). To run inference on Apple devices like iPhone, iPad, or Mac, see the app subfolder for more details.