Show HN: Llama 3.3 70B Sparse Autoencoders with API access
Our mission is to advance humanity's understanding of AI by examining the inner workings of advanced AI models (a field known as AI interpretability).
Interestingly, many features related to special formatting tokens or repetitive elements of chat data (such as the knowledge cutoff date) appear as isolated points or small clusters away from the central component.

One notable observation: although the model-written evaluation score increases dramatically at a steering strength of around 0.5 (which is where the style fully shifts), at a strength of 0.4 the steered model already begins to exhibit a few elements of pirate speech.

Whether incomplete reconstruction stems from a fundamental limitation of the architecture (e.g. truly nonlinear features) or from some more mundane cause is not yet known, although early evidence points towards it not being a simple matter of scale.
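To make the two mechanisms above concrete, here is a minimal sketch of how a sparse autoencoder reconstructs an activation and how a feature can be steered by scaling its decoder direction. All names, dimensions, and weights below are hypothetical placeholders (real SAEs are trained and far larger); this is not the actual API or architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_features = 16, 64  # toy sizes; real SAEs are much wider

# Random weights stand in for a trained SAE.
W_enc = rng.normal(size=(d_model, n_features)) / np.sqrt(d_model)
W_dec = rng.normal(size=(n_features, d_model)) / np.sqrt(n_features)
b_enc = np.zeros(n_features)

def sae_features(activation):
    """Encode an activation into sparse, non-negative feature activations."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)  # ReLU enforces sparsity

def sae_reconstruct(features):
    """Decode feature activations back into the model's activation space."""
    return features @ W_dec

def steer(activation, feature_idx, strength):
    """Shift an activation along one feature's decoder direction."""
    return activation + strength * W_dec[feature_idx]

x = rng.normal(size=d_model)
f = sae_features(x)
x_hat = sae_reconstruct(f)

# Reconstruction is imperfect: the residual is the part the SAE fails to explain.
recon_error = np.linalg.norm(x - x_hat)

# Raising the steering strength moves the activation further along the same
# direction, so style can begin to shift at 0.4 before fully flipping at 0.5.
x_04 = steer(x, feature_idx=3, strength=0.4)
x_05 = steer(x, feature_idx=3, strength=0.5)
```

The key design point is that steering never calls the encoder: it simply adds a scaled copy of one decoder row to the raw activation, which is why intermediate strengths produce intermediate behavior.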