Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values. Surprisingly, we find that independently-sampled preferences in current LLMs exhibit high degrees of structural coherence, and moreover that this emerges with scale. As a case study, we show how aligning utilities with a citizen assembly reduces political biases and generalizes to new scenarios.