The upcoming GPT-3 moment for RL
Mechanize Inc. is developing virtual environments and benchmarks to fully automate the economy.
Incidentally, achieving RL compute expenditure comparable to current frontier-model pretraining budgets will likely require roughly 10k years of model-facing task-time, measured by how long humans would take to perform the same tasks. Because compute spending dominates total training expenses, scaling RL to match pretraining budgets will yield significant performance improvements without substantially increasing overall costs. Thus, replication training provides a scalable approach for producing complex tasks efficiently, moving us closer to AIs that can complete entire software projects end-to-end.