
Explainer: What's r1 and everything else?


Is AI making you dizzy? A lot of industry insiders are feeling the same. R1 came out a few days ago, seemingly out of nowhere, and then there’s o1 and o3, but no o2.

R1 used GRPO (Group Relative Policy Optimization) to teach the model to do CoT (chain-of-thought reasoning) at inference time. R1-Zero, the variant trained with reinforcement learning alone, is described as more accurate than R1, but it hops between various languages like English & Chinese at will, which makes it sub-optimal for its human users (who aren’t typically polyglots).

USA: heavily funded, pour money onto the AI fire as fast as possible

China: under repressive export controls, pour smarter engineers & researchers into finding cheaper solutions

Europe: regulate or open source AI, either is fine
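To make the GRPO mention above a bit more concrete, here is a minimal sketch of its central idea: the model samples a group of answers to the same prompt, each answer gets a reward, and each answer's advantage is its reward relative to the rest of the group. The function name, the `eps` guard, and the use of the population standard deviation are assumptions made for illustration. This is not DeepSeek's training code, and it leaves out the policy-gradient update and KL penalty that a full GRPO step would include.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled answer relative to its own group.

    rewards: one scalar reward per answer sampled for the SAME prompt.
    eps: small constant (an assumption here) to avoid dividing by zero
         when every answer in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; real implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four sampled answers, two judged correct (reward 1.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Answers that beat the group average get a positive advantage and are reinforced; answers below it get a negative one. Because the baseline comes from the group itself, no separate value model is needed, which is one reason the approach is considered cheap.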
