Steering interpretable language models with concept algebra
We demonstrate reliable, fine-grained control over language model generation by directly injecting, suppressing, and composing human-interpretable concepts at inference time.
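The summary gives no implementation details, so the following is only a minimal sketch of what injecting, suppressing, and composing concepts could look like if each concept is assumed to be a direction in a model's hidden-state space. That assumption, the function names (`inject`, `suppress`, `compose`), and the toy vectors are all hypothetical illustrations, not the authors' method or API.

```python
# Hypothetical sketch: concepts modeled as plain vectors in hidden-state space.
# Nothing here is taken from the announced work; it only illustrates the three verbs.

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def scale(v: Vector, k: float) -> Vector:
    """Multiply every component of v by k."""
    return [k * x for x in v]


def add(a: Vector, b: Vector) -> Vector:
    """Component-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def inject(hidden: Vector, concept: Vector, strength: float) -> Vector:
    """Push the hidden state along the concept direction (assumed additive steering)."""
    return add(hidden, scale(concept, strength))


def suppress(hidden: Vector, concept: Vector) -> Vector:
    """Remove the component of the hidden state that lies along the concept direction."""
    norm_sq = dot(concept, concept)
    if norm_sq == 0.0:
        return list(hidden)  # a zero vector defines no direction to remove
    coefficient = dot(hidden, concept) / norm_sq
    return add(hidden, scale(concept, -coefficient))


def compose(weighted_concepts: list[tuple[Vector, float]]) -> Vector:
    """Combine several concepts into one steering vector as a weighted sum."""
    combined = [0.0] * len(weighted_concepts[0][0])
    for concept, weight in weighted_concepts:
        combined = add(combined, scale(concept, weight))
    return combined


if __name__ == "__main__":
    hidden_state = [1.0, 2.0, 3.0]      # toy hidden state
    formal_tone = [0.0, 1.0, 0.0]       # hypothetical concept direction
    print(inject(hidden_state, formal_tone, 2.0))
    print(suppress(hidden_state, formal_tone))
```

Under this assumption, composition reduces to ordinary vector arithmetic: a combined steering vector from `compose` can be passed to `inject` like any single concept.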