The Paradigm
Over the past decade, some of the most remarkable AI breakthroughs (AlphaGo, AlphaStar, AlphaFold1, VPT, OpenAI Five, ChatGPT) have all shared a common thread: they start with large-scale data gathering (self-supervised or imitation learning, or SSL) and then use reinforcement learning to refine their performance toward a specific goal. This marriage of general knowledge acquisition and focused, reward-driven specialization has emerged as the paradigm by which we can reliably train AI systems to excel at arbitrary tasks.
As a result, the models the big labs are putting out are increasingly trained on self-prediction objectives over a diverse corpus of interleaved text, images, video and audio. They have analogues in the human experience: over my life I have learned how to speak, type on a keyboard, pour water, turn a screwdriver, plant a seed, carry heavy objects, and drive; the list goes on and on.