The upcoming GPT-3 moment for RL

Mechanize Inc. is developing virtual environments and benchmarks to fully automate the economy.

Bringing RL compute expenditure up to the level of current frontier-model pretraining budgets will likely require roughly 10,000 years of model-facing task-time, measured by how long humans would take to perform the same tasks. Because compute spending dominates total training expenses, scaling RL to match pretraining budgets should yield significant performance improvements without substantially increasing overall costs. Replication training therefore offers a scalable way to produce complex tasks efficiently, moving us closer to AIs that can complete entire software projects end-to-end.
