Multimodal Interpretability in 2024


A look at multimodal interpretability in 2024: from sparse feature circuits with SAEs to vision transformers that leverage CLIP's shared text-image space.

Instead of focusing on feature visualizations or on the input-data modifications common in mainstream interpretability, we'll concentrate on analyzing the model's internals: examining how interventions on its internal activations affect its behavior. The purpose of the Prisma project (Joseph, 2023), which has been my focus for the last few months, is to provide infrastructure for circuit-based interpretability on vision and multimodal models, leading up to sparse feature circuits with SAEs. Thank you to everyone who has contributed to the ongoing dialogue, including my advisor Blake Richards, Yossi Gandelsman, Cijo Jose, Koustuv Sinha, Praneet Suresh, Lee Sharkey, Neel Nanda, and Noah MacCallum.
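To make the activation-intervention idea concrete, here is a minimal sketch, not the Prisma API, using a toy two-layer network in NumPy. Zero-ablating one hidden activation and comparing the clean and ablated outputs measures that unit's causal effect on the output, which is the basic move behind circuit-style analysis. All names here (the toy weights, `forward`, `ablate_unit`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer MLP standing in for one block of a larger model.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def forward(x, ablate_unit=None):
    """Run the toy model, optionally zero-ablating one hidden activation."""
    h = np.maximum(x @ W1, 0.0)       # hidden activations (ReLU)
    if ablate_unit is not None:
        h = h.copy()
        h[..., ablate_unit] = 0.0     # the intervention: patch an activation
    return h @ W2                     # output logits

x = rng.normal(size=(8,))
clean = forward(x)
ablated = forward(x, ablate_unit=3)

# How much this unit matters to the output, measured in activation space.
effect = np.linalg.norm(clean - ablated)
print(f"effect of unit 3 on output: {effect:.3f}")
```

In a real vision transformer the same pattern is implemented with forward hooks on attention heads or MLP neurons rather than direct array indexing, but the logic of the intervention is identical.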
