
Beyond benchmarks: How DeepSeek-R1 and o1 perform on real-world tasks

o1 is slightly better at reasoning, but DeepSeek-R1 provides much more detail about its reasoning, which is very useful to the user.

We considered a scenario where the user invested $140 in the Magnificent Seven (Alphabet, Amazon, Apple, Meta, Microsoft, Nvidia, Tesla) on the first day of every month from January to December 2024. In this test, the model was also confounded by a row in the Nvidia chart marking the company’s 10:1 stock split on June 10, 2024, and it miscalculated the final value of the portfolio.

Another experiment required the model to compare the stats of four leading NBA centers and determine which one had the biggest improvement in field goal percentage (FG%) from the 2022/2023 season to the 2023/2024 season.
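To see why a stock split can derail the portfolio calculation, here is a minimal Python sketch of split-aware dollar-cost averaging. It is an illustration, not the method used in the experiment. It assumes the $140 is divided evenly across the seven stocks ($20 each), that purchase prices come from an unadjusted price history, and it uses made-up placeholder prices rather than real Nvidia data. The helper names `shares_held` and `position_value` are hypothetical. Shares bought before a split are multiplied by the split ratio so they can be valued at the post-split price.

```python
from datetime import date


def shares_held(purchases, splits, as_of):
    """Number of shares held on `as_of`.

    Assumes purchase prices are unadjusted (as quoted on the purchase day),
    so every split between a purchase and `as_of` must be applied manually.
    purchases: list of (purchase_date, dollars_invested, unadjusted_price)
    splits: list of (split_date, ratio), e.g. (date(2024, 6, 10), 10) for 10:1
    """
    total = 0.0
    for buy_date, dollars, price in purchases:
        if buy_date > as_of:
            continue  # not purchased yet on the valuation date
        shares = dollars / price
        for split_date, ratio in splits:
            # A split only multiplies shares already owned when it happens.
            if buy_date < split_date <= as_of:
                shares *= ratio
        total += shares
    return total


def position_value(purchases, splits, as_of, price_on_as_of):
    """Market value of the position at the (post-split) price on `as_of`."""
    return shares_held(purchases, splits, as_of) * price_on_as_of


MONTHLY_PER_STOCK = 140 / 7  # assumption: $140 split evenly, $20 per stock

# Placeholder prices, not real market data: one pre-split, one post-split buy.
purchases = [
    (date(2024, 1, 1), MONTHLY_PER_STOCK, 100.0),
    (date(2024, 7, 1), MONTHLY_PER_STOCK, 10.0),
]
splits = [(date(2024, 6, 10), 10)]

correct = position_value(purchases, splits, date(2024, 12, 31), 12.0)
naive = position_value(purchases, [], date(2024, 12, 31), 12.0)  # ignores split
print(f"split-aware: ${correct:.2f}, naive: ${naive:.2f}")
```

With these placeholder numbers, ignoring the split undercounts the January purchase by a factor of ten, which is the kind of error a model makes when it reads an unadjusted price row at face value.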
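Comparing the centers comes down to computing FG% (field goals made divided by field goals attempted) for each season and ranking the change. The Python sketch below uses hypothetical labels ("Center A" to "Center D") and placeholder totals, not the stats from the experiment. It also shows that "biggest improvement" hinges on a definitional choice: with these made-up numbers, one player leads on percentage-point change and another on relative change.

```python
# Hypothetical (made, attempted) field-goal totals per season.
# Placeholders only, not real NBA stats.
STATS = {
    "Center A": {"2022/2023": (450, 900), "2023/2024": (520, 1000)},
    "Center B": {"2022/2023": (300, 600), "2023/2024": (330, 600)},
    "Center C": {"2022/2023": (600, 1000), "2023/2024": (580, 1000)},
    "Center D": {"2022/2023": (300, 1000), "2023/2024": (340, 1000)},
}


def fg_pct(made, attempted):
    """Field goal percentage: made / attempted, expressed as a percentage."""
    return 100 * made / attempted


def change_in_points(seasons, before="2022/2023", after="2023/2024"):
    """Absolute improvement in FG%, in percentage points."""
    return fg_pct(*seasons[after]) - fg_pct(*seasons[before])


def relative_change(seasons, before="2022/2023", after="2023/2024"):
    """Improvement relative to the earlier season's FG% (0.1 means 10% better)."""
    old = fg_pct(*seasons[before])
    return (fg_pct(*seasons[after]) - old) / old


by_points = {name: change_in_points(s) for name, s in STATS.items()}
by_relative = {name: relative_change(s) for name, s in STATS.items()}

best_by_points = max(by_points, key=by_points.get)
best_by_relative = max(by_relative, key=by_relative.get)
print(best_by_points, best_by_relative)  # Center B Center D
```

Stating which definition is intended (percentage points is the more common reading for FG%) removes the ambiguity before the numbers are compared.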


Or read this on Venture Beat