Get the latest tech news

The Differences Between Direct Alignment Algorithms Are a Blur


Direct Alignment Algorithms (DAAs) simplify language model alignment by replacing reinforcement learning (RL) and reward modeling (RM) in Reinforcement Learning from Human Feedback (RLHF) with direct policy optimization. DAAs can be classified by their ranking losses (pairwise vs. pointwise), by the rewards used in those losses (e.g., likelihood ratios of policy and reference policy, or odds ratios), or by whether a Supervised Fine-Tuning (SFT) phase is required (two-stage vs. one-stage). We first show that one-stage methods underperform two-stage methods. To address this, we incorporate an explicit SFT phase and introduce the $β$ parameter, controlling the strength of preference optimization, into single-stage ORPO and ASFT. These modifications improve their performance in Alpaca Eval 2 by +$3.46$ (ORPO) and +$8.27$ (ASFT), matching two-stage methods like DPO. Further analysis reveals that the key factor is whether the approach uses pairwise or pointwise objectives, rather than the specific implicit reward or loss function. These results highlight the importance of careful evaluation to avoid premature claims of performance gains or overall superiority in alignment algorithms.

To address this, we incorporate an explicit SFT phase and introduce the $\beta$ parameter, controlling the strength of preference optimization, into single-stage ORPO and ASFT. Further analysis reveals that the key factor is whether the approach uses pairwise or pointwise objectives, rather than the specific implicit reward or loss function. These results highlight the importance of careful evaluation to avoid premature claims of performance gains or overall superiority in alignment algorithms.

Get the Android app

Or read this on Hacker News

Read more on:

Photo of Blur

Blur

Photo of differences

differences

Related news:

News photo

Microsoft and a16z set aside differences, join hands in plea against AI regulation

News photo

Mac Mini vs. Mac Studio Buyer's Guide: 20+ Differences Compared

News photo

Building a more robust Wikipedia interface by spotting the differences (2023)