Google Plans Giant AI Language Model Supporting World's 1,000 Most Spoken Languages
Google has announced an ambitious new project to develop a single AI language model that supports the world's "1,000 most spoken languages." The Verge reports: As a first step toward this goal, the company is unveiling an AI model trained on over 400 languages, which it describes as "the largest language coverage seen in a speech model today." [...] Google's "1,000 Languages Initiative" is not focused on any particular functionality, but instead on creating a single system with huge breadth of knowledge across the world's languages.

Speaking to The Verge, Zoubin Ghahramani, vice president of research at Google AI, said the company believes that creating a model of this size will make it easier to bring various AI functionalities to languages that are poorly represented in online spaces and AI training datasets (also known as "low-resource languages"). "By having a single model that is exposed to and trained on many different languages, we get much better performance on our low resource languages," says Ghahramani. "The way we get to 1,000 languages is not by building 1,000 different models. Languages are like organisms, they've evolved from one another and they have certain similarities. And we can find some pretty spectacular advances in what we call zero-shot learning when we incorporate data from a new language into our 1,000 language model and get the ability to translate [what it's learned] from a high-resource language to a low-resource language."

Access to data is a problem when training across so many languages, though, and Google says that in order to support work on the 1,000-language model it will be funding the collection of data for low-resource languages, including audio recordings and written texts. The company says it has no direct plans for where to apply the functionality of this model -- only that it expects it will have a range of uses across Google's products, from Google Translate to YouTube captions and more.
"One of the really interesting things about large language models and language research in general is that they can do lots and lots of different tasks," says Ghahramani. "The same language model can turn commands for a robot into code; it can solve maths problems; it can do translation. The really interesting thing about language models is that they're becoming repositories of a lot of knowledge, and by probing them in different ways you can get to different bits of useful functionality."
Read more of this story at Slashdot.