Open-sourcing circuit tracing tools
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
We've already used these tools to study interesting behaviors like multi-step reasoning and multilingual representations in Gemma-2-2b and Llama-3.2-1b; see our demo notebook for examples and analysis. By open-sourcing these tools, we hope to make it easier for the broader community to study what's going on inside language models. The open-source-circuit-finding library was developed by Anthropic Fellows Michael Hanna and Mateusz Piotrowski with mentorship from Emmanuel Ameisen and Jack Lindsey.