Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs

As AIs rapidly advance and become more agentic, the risk they pose is governed not only by their capabilities but increasingly by their propensities, including goals and values. Surprisingly, we find that independently sampled preferences in current LLMs exhibit high degrees of structural coherence, and moreover that this coherence emerges with scale. As a case study, we show how aligning utilities with a citizen assembly reduces political biases and generalizes to new scenarios.

