Get the latest tech news
Fine-Tuning Apple's New Foundation Model
Building a Golden Gate-Obsessed AI Assistant
And so a further low-key announcement that Apple would release a toolkit for fine-tuning this so-called “Foundation Model” to be better at specific tasks or responding in particular styles or formats has largely flown under the radar. In that case, in the course of doing work on “mechanistic interpretability” (trying to figure out what the heck is going on in the weights of these huge models as they predict tokens) they realized they could force Claude to be obsessed with the Golden Gate Bridge, “bringing it up in answer to almost any query—even in situations where it wasn’t at all relevant.” So I set to work with GPT-4o making user request-assistant response pairs, where the assistant would initially be helpful in its replies and then veer toward the Golden Gate Bridge:
Or read this on r/apple