Before launching, GPT-4o broke records on chatbot leaderboard under a secret name

by Benj Edwards, from Ars Technica

On Monday, OpenAI employee William Fedus confirmed on X that a mysterious chart-topping AI chatbot known as "gpt2-chatbot," which had been undergoing testing on LMSYS's Chatbot Arena and frustrating experts, was in fact OpenAI's newly announced GPT-4o AI model. He also revealed that GPT-4o had topped the Chatbot Arena leaderboard, achieving the highest documented score ever.

"GPT-4o is our new state-of-the-art frontier model. We've been testing a version on the LMSys arena as im-also-a-good-gpt2-chatbot," Fedus tweeted.

Chatbot Arena is a website where visitors converse with two random AI language models side by side without knowing which model is which, then choose which model gives the better response. It's a perfect example of what AI researcher Simon Willison calls vibe-based AI benchmarking.
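
As a rough illustration of how blind head-to-head votes like these can be turned into a leaderboard score, here is a minimal Python sketch using an Elo-style update. It is an assumption made for illustration, not necessarily the method LMSYS uses; the function name, the K-factor, the starting rating, and the model names are all hypothetical.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, base=1000.0):
    """Compute Elo-style ratings from blind pairwise votes.

    votes: iterable of (model_a, model_b, winner), where winner is
    "a", "b", or "tie". All parameter values here are illustrative.
    """
    ratings = defaultdict(lambda: base)
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Probability that model_a wins, given the current ratings.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = outcome[winner]
        # Move each rating toward the observed result; the two
        # adjustments are equal and opposite.
        delta = k * (score_a - expected_a)
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return dict(ratings)


# Hypothetical votes between two made-up models.
example_votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-y", "model-x", "tie"),
]
leaderboard = sorted(elo_ratings(example_votes).items(),
                     key=lambda item: item[1], reverse=True)
```

In this sketch a model that keeps winning blind comparisons climbs the sorted list, which is the sense in which a previously unknown entry could top a leaderboard before anyone knew what it was.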
