
You can now fine-tune your enterprise’s own version of OpenAI’s o4-mini reasoning model with reinforcement learning


For organizations with clearly defined problems and verifiable answers, reinforcement fine-tuning (RFT) offers a compelling way to align models with operational or compliance goals without building RL infrastructure from scratch.

Instead of training on a set of questions with fixed correct answers, as traditional supervised fine-tuning does, RFT uses a grader model to score multiple candidate responses to each prompt, and those scores serve as the reward signal. This structure allows customers to align models with nuanced objectives such as an enterprise's "house style" of communication and terminology, safety rules, factual accuracy, or internal policy compliance.
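
To make the mechanism concrete, the following is a minimal Python sketch of the loop described above: sample several candidate responses to a prompt, score each with a grader, and feed the scores back as rewards. It illustrates the concept only and is not OpenAI's implementation or API; `policy`, `grader`, `update_policy`, and the toy functions are hypothetical stand-ins.

```python
from typing import Callable, List


def rft_step(
    prompt: str,
    policy: Callable[[str, int], List[str]],             # samples n candidate responses
    grader: Callable[[str, str], float],                 # scores a (prompt, response) pair
    update_policy: Callable[[str, List[str], List[float]], None],
    num_candidates: int = 4,
) -> None:
    # 1. Sample several candidate responses to the same prompt.
    candidates = policy(prompt, num_candidates)
    # 2. Score each candidate with the grader (house style, safety rules,
    #    factual accuracy, internal policy compliance, ...).
    rewards = [grader(prompt, response) for response in candidates]
    # 3. Treat the scores as rewards: higher-scoring responses are reinforced.
    update_policy(prompt, candidates, rewards)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real setup would sample
    # from the model being tuned and score with a grader model.
    def toy_policy(prompt: str, n: int) -> List[str]:
        return [f"candidate {i} for: {prompt}" for i in range(n)]

    def toy_grader(prompt: str, response: str) -> float:
        # Stand-in rubric: prefer shorter answers.
        return 1.0 / (1.0 + len(response))

    def toy_update(prompt: str, candidates: List[str], rewards: List[float]) -> None:
        best_reward, best_response = max(zip(rewards, candidates))
        print(f"reinforcing {best_response!r} (reward={best_reward:.3f})")

    rft_step("Summarize the company refund policy.", toy_policy, toy_grader, toy_update)
```

In the hosted offering, customers supply the prompts and the grading criteria while the sampling and policy updates run on OpenAI's side, which is why no in-house RL stack is needed.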


Read the full story on VentureBeat.


Related news:

Fidji Simo joins OpenAI as new CEO of Applications

OpenAI drafts Instacart boss as CEO of Apps to lure in the normies

OpenAI launches a data residency program in Asia