Absolute Zero Reasoner
Proposed Programs: All AZR-proposed code samples are first embedded with jina-embeddings-v2-base-code, then projected to 2D using UMAP.
Spinning Hexagon Vibe Check. Prompt: Write a script that shows 10 balls bouncing inside a spinning hexagon.
For the proposer, we design a specialized reward function based on Monte Carlo rollouts that encourages the generation of tasks of optimal difficulty: problems where the solver sometimes succeeds and sometimes fails. We observe that AZR with Llama3.1-8b as the base model occasionally produces concerning chains of thought, which we term the "uh-oh moment" (example shown in Figure 6), highlighting the need for future work on safety-aware training. The model naturally develops a habit of using comments as intermediate planning steps when solving complex reasoning tasks, similar to the ReAct prompting framework.
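To make the proposer reward described above concrete, here is a minimal sketch. The text only says that the reward is estimated from Monte Carlo rollouts and favors tasks the solver sometimes solves and sometimes fails; the exact formula below (zero reward when the empirical solve rate is 0 or 1, otherwise one minus the solve rate) is an assumption chosen for illustration. The names `proposer_reward`, `estimate_proposer_reward`, `solve_once`, and `n_rollouts` are hypothetical.

```python
from typing import Callable, Sequence


def proposer_reward(outcomes: Sequence[bool]) -> float:
    """Score a proposed task from solver rollout outcomes.

    Assumed shape (not stated in the text): tasks that are always
    solved or never solved earn 0; otherwise the reward is
    1 - (empirical solve rate), so harder-but-solvable tasks score higher.
    """
    if not outcomes:
        raise ValueError("need at least one rollout outcome")
    solve_rate = sum(outcomes) / len(outcomes)
    if solve_rate == 0.0 or solve_rate == 1.0:
        return 0.0  # trivial or (apparently) unsolvable: no learning signal
    return 1.0 - solve_rate


def estimate_proposer_reward(
    solve_once: Callable[[], bool], n_rollouts: int = 8
) -> float:
    """Run the solver n_rollouts times on one task and score the task.

    solve_once is a hypothetical callable that attempts the task once
    and returns True on success.
    """
    outcomes = [bool(solve_once()) for _ in range(n_rollouts)]
    return proposer_reward(outcomes)
```

Under this assumed shape, a task solved in one of four rollouts scores 0.75, one solved in three of four scores 0.25, and tasks solved in all or none of the rollouts score 0.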