Article 6MS1X Major ChatGPT-4o update allows audio-video talks with an “emotional” AI chatbot

Major ChatGPT-4o update allows audio-video talks with an “emotional” AI chatbot

by
Benj Edwards
from Ars Technica - All content on (#6MS1X)
colorful_waveform_1-800x450.jpg

Enlarge (credit: Getty Images)

On Monday, OpenAI debuted GPT-4o (o for "omni"), a major new AI model that can ostensibly converse using speech in real time, reading emotional cues and responding to visual input. It operates faster than OpenAI's previous best model, GPT-4 Turbo, and will be free for ChatGPT users and available as a service through API, rolling out over the next few weeks, OpenAI says.

OpenAI revealed the new audio conversation and vision comprehension capabilities in a YouTube livestream titled "OpenAI Spring Update," presented by OpenAI CTO Mira Murati and employees Mark Chen and Barret Zoph that included live demos of GPT-4o in action.

OpenAI claims that GPT-4o responds to audio inputs in about 320 milliseconds on average, which is similar to human response times in conversation, according to a 2009 study, and much shorter than the typical 2-3 second lag experienced with previous models. With GPT-4o, OpenAI says it trained a brand-new AI model end-to-end using text, vision, and audio in a way that all inputs and outputs "are processed by the same neural network."

Read 11 remaining paragraphs | Comments

External Content
Source RSS or Atom Feed
Feed Location http://feeds.arstechnica.com/arstechnica/index
Feed Title Ars Technica - All content
Feed Link https://arstechnica.com/
Reply 0 comments