Phi-4 Reasoning Models
Microsoft continues to add to the conversation by unveiling its newest models, Phi-4-reasoning, Phi-4-reasoning-plus, and Phi-4-mini-reasoning. Learn more.
Accuracy of models across general-purpose benchmarks for long input context QA (FlenQA), instruction following (IFEval), coding (HumanEvalPlus), knowledge and language understanding (MMLUPro), safety detection (ToxiGen), and other general skills (ArenaHard and PhiBench).

It is used in core experiences like Click to Do, providing useful text intelligence tools for any content on your screen, and is available as developer APIs to be readily integrated into applications. It is already being used in several productivity applications like Outlook, which offers its Copilot summary features offline.

The Phi family of models has adopted a robust safety post-training approach, leveraging a combination of Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF) techniques.