Meta Takes Us a Step Closer to Star Trek's Universal Translator
Freeman writes:
In 2023, AI researchers at Meta interviewed 34 native Spanish and Mandarin speakers who lived in the US but didn't speak English. The goal was to find out what people who constantly rely on translation in their day-to-day activities expect from an AI translation tool. What those participants wanted was basically a Star Trek universal translator or the Babel Fish from the Hitchhiker's Guide to the Galaxy: an AI that could not only translate speech to speech in real time across multiple languages, but also preserve their voice, tone, mannerisms, and emotions. So, Meta assembled a team of over 50 people and got busy building it.
[...] AI translation systems today are mostly focused on text, because huge amounts of text are available in a wide range of languages thanks to digitization and the Internet.
[...] AI translators we have today support an impressive number of languages in text, but things get complicated when it comes to translating speech.
[...] A few systems that can translate speech-to-speech directly do exist, but in most cases they only translate into English and not in the opposite direction.
[...] to pull off the Star Trek universal translator thing Meta's interviewees dreamt about, the Seamless team started by sorting out the data scarcity problem.
[...] Warren Weaver, a mathematician and pioneer of machine translation, argued in 1949 that there might be a yet undiscovered universal language working as a common base of human communication.
[...] Machines do not understand words as humans do. To make sense of them, they need to first turn them into sequences of numbers that represent their meaning.
[...] When you vectorize aligned text in two languages, like those European Parliament proceedings, you end up with two separate vector spaces, and you can then train a neural net to learn how those two spaces map onto each other.
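That mapping step can be sketched in a few lines. This is a toy illustration, not Meta's actual pipeline: the embeddings are random stand-ins, and a plain least-squares fit stands in for the neural net the article mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for sentence embeddings of aligned text in two languages.
# The dimensions and data are invented purely for illustration.
dim = 4
n_pairs = 100
src = rng.normal(size=(n_pairs, dim))   # e.g. vectors for English sentences
true_map = rng.normal(size=(dim, dim))  # the unknown cross-lingual relationship
tgt = src @ true_map                    # vectors for the aligned translations

# Learn a linear map from one vector space to the other by least squares,
# a simple stand-in for training a neural net on the aligned pairs.
W, *_ = np.linalg.lstsq(src, tgt, rcond=None)

# The learned map should carry each source vector close to its aligned target.
err = float(np.linalg.norm(src @ W - tgt))
print(err)  # near zero on this noiseless toy data
```

With real embeddings the relationship is not exactly linear, which is why an actual system would use a trained network here, but the principle is the same: aligned pairs supervise a function between the two spaces.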
But the Meta team didn't have those nicely aligned texts for all the languages they wanted to cover. So, they vectorized all texts in all languages as if they were just a single language and dumped them into one embedding space called SONAR (Sentence-level Multimodal and Language-Agnostic Representations).
[...] The team just used huge amounts of raw data: no fancy human labeling, no human-aligned translations. And then, the data mining magic happened.
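The mining idea behind a shared space like SONAR can be sketched as a nearest-neighbor search: if sentences from every language live in one embedding space, likely translation pairs are simply vectors that sit close together. The vectors, sizes, and alignment below are invented for illustration; real systems use trained encoders and more careful scoring than raw cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy shared embedding space: each row is a sentence vector. In the real
# system these would come from a multilingual encoder; here language B's
# sentences are language A's vectors, shuffled and slightly perturbed.
dim = 8
lang_a = rng.normal(size=(5, dim))                        # sentences in language A
perm = np.array([3, 0, 4, 1, 2])                          # hidden true alignment
lang_b = lang_a[perm] + 0.01 * rng.normal(size=(5, dim))  # noisy "translations"

def normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Cosine similarity between every A sentence and every B sentence.
sims = normalize(lang_a) @ normalize(lang_b).T

# Mine candidate pairs: for each A sentence, take its nearest B neighbor.
mined = sims.argmax(axis=1)
print(mined)  # recovers the hidden alignment
```

At web scale this brute-force similarity matrix is replaced by approximate nearest-neighbor indexes, but the payoff is the same: parallel data mined from raw monolingual text, without human alignment.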
Read more of this story at SoylentNews.