How Does DeepSeek R1 Really Fare Against OpenAI's Best Reasoning Models?

hubie

from SoylentNews on 2025-02-04 02:56 (#6V1DR)

https://arstechnica.com/ai/2025/01/how-does-deepseek-r1-really-fare-against-openais-best-reasoning-models/

It's only been a week since Chinese company DeepSeek launched its open-weights R1 reasoning model, which is reportedly competitive with OpenAI's state-of-the-art o1 models despite being trained for a fraction of the cost. Already, American AI companies are in a panic, and markets are freaking out over what could be a breakthrough in the status quo for large language models.
While DeepSeek can point to common benchmark results and the Chatbot Arena leaderboard to prove the competitiveness of its model, there's nothing like direct use cases to get a feel for just how useful a new model is. To that end, we decided to put DeepSeek's R1 model up against OpenAI's ChatGPT models in the style of our previous showdowns between ChatGPT and Google Bard/Gemini.
[...]
This time around, we put each DeepSeek response against ChatGPT's $20/month o1 model and $200/month o1 Pro model, to see how it stands up to OpenAI's "state of the art" product as well as the "everyday" product that most AI consumers use. While we re-used a few of the prompts from our previous tests, we also added prompts derived from Chatbot Arena's "categories" appendix.
[...]
Prompt: Write five original dad jokes
Results: For the most part, all three models seem to have taken our demand for "original" jokes more seriously this time than in the past.
[...]
We particularly liked DeepSeek R1's bicycle that doesn't like to "spin its wheels" with pointless arguments and o1's vacuum-cleaner band that "sucks" at live shows.
[...]
Winner: ChatGPT o1 probably had slightly better jokes overall than DeepSeek R1, but loses some points for including a joke that was not original. ChatGPT o1 Pro is the clear loser, though, with no original jokes that we'd consider the least bit funny.
[...]
Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.
Results: DeepSeek R1's response is a delightfully absurd take on an absurd prompt. We especially liked the bits about creating "a sport where men leap not into trenches, but toward glory" and a "13th amendment" to the rules preventing players from being "enslaved by poor sportsmanship" (whatever that means).
[...]
Winner: While o1 Pro made a good showing, the sheer wild absurdity of the DeepSeek R1 response won us over.
