Understanding Moravec's Paradox
In general, whether for humans or machines, problems are harder when their search space is large and their reward signals are sparse. Difficulty also grows when a single input can map to many valid outputs, as in autoregressive text generation (e.g. "my dog is "; many continuations are correct). Take RLHF: the search space expands further, since each predicted token is fed back in as context for the next prediction, and the model doesn't know whether the text it is producing is any good until the whole sequence is finished and evaluated. A minimal sketch of this dynamic, using a toy stand-in policy and reward function (not any real model or API):
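```python
import random

# Toy illustration of the point above: tokens are sampled one at a time,
# each sampled token becomes context for the next step, and a single
# scalar reward only arrives once the whole sequence is finished.
# The vocabulary, policy, and reward below are hypothetical stand-ins.

VOCAB = ["my", "dog", "is", "cute", "fast", "asleep", "<eos>"]

def toy_policy(context):
    """Stand-in for a language model: picks the next token at random.
    A real policy would return a distribution conditioned on context."""
    return random.choice(VOCAB)

def toy_reward(tokens):
    """Stand-in for a reward model: scores only the *finished* text.
    There is no per-token feedback; this is the sparse signal."""
    return 1.0 if "cute" in tokens else 0.0

def rollout(prompt, max_len=8):
    tokens = list(prompt)
    while len(tokens) < max_len:
        next_token = toy_policy(tokens)  # sample one token...
        if next_token == "<eos>":
            break
        tokens.append(next_token)        # ...which becomes context for the next

    # Reward is only available here, after generation has finished.
    return tokens, toy_reward(tokens)

completion, reward = rollout(["my", "dog", "is"])
print(completion, reward)
```

Note that with a vocabulary of size |V| and sequences of length n, there are on the order of |V|^n possible completions, yet each one yields only a single number of feedback, which is exactly the large-search-space, sparse-reward setting described above.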