DeepCoder delivers top coding performance in efficient 14B open model
DeepCoder-14B competes with frontier models like o3 and o1—and the weights, code, and optimization platform are open source.
The team also designed a straightforward reward function that provides a positive signal only if the generated code passes all sampled unit tests for the problem within a specific time limit. Combined with the high-quality training examples, this outcome-focused reward prevents the model from learning tricks like printing memorized answers for public tests or optimizing for simple edge cases without solving the core problem.

GRPO+ enables DeepCoder-14B to continue training for longer durations without collapsing. Credit: Together AI

Finally, the team extended the model's context window iteratively, first training it on shorter reasoning sequences and gradually increasing the length.
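This iterative extension amounts to a simple length curriculum. The sketch below shows one way such a schedule could be expressed; the step counts and token lengths are illustrative placeholders, not the values reported for DeepCoder-14B.

```python
# Illustrative sketch of an iterative context-extension schedule.
# The concrete lengths and step counts are assumptions, not reported values.
CONTEXT_SCHEDULE = [
    (1_000, 16_384),  # first stage: shorter reasoning sequences
    (1_000, 32_768),  # later stage: the full target context window
]

def max_context_for_step(step: int) -> int:
    """Return the context window (in tokens) for a given global training step."""
    cumulative = 0
    for num_steps, max_tokens in CONTEXT_SCHEDULE:
        cumulative += num_steps
        if step < cumulative:
            return max_tokens
    # After the schedule ends, keep training at the final (longest) window.
    return CONTEXT_SCHEDULE[-1][1]
```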
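The outcome-only reward described earlier can likewise be pictured as a strict pass/fail check. The following is a minimal sketch under stated assumptions: the `TestCase` structure, the `run_in_sandbox` callable, and the time limit are hypothetical stand-ins, not the team's actual implementation.

```python
# Hypothetical sketch of an outcome-only reward: 1.0 only if every sampled
# test passes within the time limit, otherwise 0.0 (no partial credit).
from dataclasses import dataclass

@dataclass
class TestCase:
    stdin: str
    expected_stdout: str

def outcome_reward(generated_code: str, tests: list[TestCase],
                   run_in_sandbox, time_limit_s: float = 6.0) -> float:
    """run_in_sandbox is an assumed helper that executes the code on one
    input and returns its stdout, raising TimeoutError on a timeout."""
    for test in tests:
        try:
            output = run_in_sandbox(generated_code, test.stdin,
                                    timeout=time_limit_s)
        except TimeoutError:
            return 0.0
        if output.strip() != test.expected_stdout.strip():
            return 0.0
    return 1.0
```

Because the reward depends only on the final outcome across all sampled tests, passing a handful of public tests or hard-coding known answers earns nothing.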