Article 67W0Y Microsoft's New AI Can Simulate Anyone's Voice With Three Seconds of Audio

by Fnord666 (#67W0Y)

Freeman writes:

Text-to-speech model can preserve speaker's emotional tone and acoustic environment:

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything, and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.

Read more of this story at SoylentNews.
