Article 67W0Y Microsoft's New AI Can Simulate Anyone's Voice With Three Seconds of Audio

Microsoft's New AI Can Simulate Anyone's Voice With Three Seconds of Audio

by
Fnord666
from SoylentNews on (#67W0Y)

Freeman writes:

Text-to-speech model can preserve speaker's emotional tone and acoustic environment:

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything-and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.

Original Submission

Read more of this story at SoylentNews.

External Content
Source RSS or Atom Feed
Feed Location https://soylentnews.org/index.rss
Feed Title SoylentNews
Feed Link https://soylentnews.org/
Feed Copyright Copyright 2014, SoylentNews
Reply 0 comments