New Project Uses AI To Turn Project Gutenberg Texts Into Free Audiobooks With Lifelike Voices — In 30 Seconds

by Glyn Moody
from Techdirt

Reading through the increasing number of Techdirt articles about AI, I get the overwhelming impression that many people think AI is bad, and needs to be reined in before it destroys journalism/creativity/society/humanity (delete as applicable). To see an interesting new phase of an old technology attacked in this way is rather depressing, since it seems to prejudge and limit its applications. Against that background, it's good to be reminded that AI does have applications that are immediately useful, and largely unproblematic, as shown by this collaboration between Project Gutenberg and Microsoft.

Project Gutenberg was started back in 1971 by the visionary Michael Hart, who sadly died at the age of 64 in 2011. His vision of providing digital versions of the world's greatest literature has come a long way since Hart typed the text of the US Declaration of Independence into a Xerox Sigma V mainframe at the University of Illinois. Today there are over 70,000 free ebooks on the Project Gutenberg site. There are also a few audiobook versions of titles in the collection. But producing them using volunteers has proved a slow process. That's unfortunate at a time when more and more people are listening to audiobooks rather than reading texts. Microsoft saw an opportunity to help here by applying some of its AI technology:

A team from Microsoft approached Project Gutenberg about a collaboration to produce thousands of high-quality audiobooks using an AI-driven solution and then give them back to the Project Gutenberg community. These new audio recordings have made Project Gutenberg's books more accessible to a wider audience of people around the world, including those facing accessibility challenges.

Project Gutenberg first loads existing electronic books from its collection into Microsoft Azure Synapse Analytics to allow working with large amounts of data. Then, they parse the books with Azure Synapse Analytics and use SynapseML distributed ML library to create audio recordings using the neural text to speech capability in Azure AI services.

The capability turns the text of each book into audio using advanced human-like voices that can even convey emotion. "This is an AI innovation that reads text in a lifelike voice," [Director and CEO of the Project Gutenberg Literary Archive Foundation] Newby explains. "The voices are trained to mimic humans in order to sound natural, and the result is convincing, a big upgrade over older versions of text to speech."
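The details of the pipeline described above live inside Microsoft's Azure services and aren't public. As a rough illustration of just the "parse the books" step, here is a minimal Python sketch. It assumes plain-text Project Gutenberg files, and it assumes that a text-to-speech service is fed the book in limited-size pieces. The function names and the 3,000-character limit are hypothetical, not taken from Microsoft's code. The sketch strips the standard Project Gutenberg header and licence footer, then groups paragraphs into chunks that could be sent to a speech service.

```python
import re

# Project Gutenberg plain-text files wrap the book between marker lines such as
# "*** START OF THE PROJECT GUTENBERG EBOOK ... ***"; older files say "THIS"
# instead of "THE". Everything outside the markers is header or licence text.
START_RE = re.compile(r"^\*\*\* ?START OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M | re.I)
END_RE = re.compile(r"^\*\*\* ?END OF (THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.M | re.I)


def strip_boilerplate(text: str) -> str:
    """Return the text between the START and END markers (whole text if absent)."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()


def chunk_paragraphs(body: str, max_chars: int = 3000) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    Hard-wrapped lines inside a paragraph are joined with spaces, and the
    pieces inside a chunk are separated by a blank line. A paragraph longer
    than max_chars is split at sentence ends; a single sentence longer than
    max_chars is kept whole, so that one chunk may exceed the limit.
    """
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", body) if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        if len(para) <= max_chars:
            pieces = [para]
        else:
            # Split after ".", "!" or "?" followed by whitespace.
            pieces = re.split(r"(?<=[.!?])\s+", para)
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

In a real pipeline each chunk would then go to a neural text-to-speech service, and the resulting audio would be stitched back together. That part depends on Azure's own services and is deliberately not sketched here.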

There are currently nearly 5,000 AI-voiced audiobooks, which can be accessed from a number of streaming services, via the Internet Archive, and directly. Listening to them, it is evident that they are a step up from previous computer-generated audiobooks, with a reasonably lifelike voice and some human-like inflections. But the AI system struggles to convey the meaning of complex texts, such as the dense, subtly rhythmic poems of Gerard Manley Hopkins, or anything knotty by Shakespeare. The Microsoft post about the project says that some of the audiobooks incorporate several voices, but the ones I listened to did not, which makes listening to Shakespeare plays rather dull.

However, against those limitations can be set the fact that converting a Project Gutenberg text into an audiobook takes just 30 seconds per title. That opens up the possibility of converting thousands of books, and not just in English. Doing so will clearly be a huge boon for the visually impaired, or those who struggle with reading texts for whatever reason. It will also provide a ready supply of world literature to people who just like listening to audiobooks.
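As a back-of-the-envelope check on that claim, take the 30-second figure at face value and assume titles are converted one after another. Both assumptions are simplifications; the real pipeline is distributed, so wall-clock time could be much lower. Converting the nearly 5,000 titles released so far would take roughly the first figure below. Converting the whole collection of over 70,000 ebooks would take roughly the second:

```latex
\begin{aligned}
5{,}000 \times 30\ \text{s} &= 150{,}000\ \text{s} \approx 41.7\ \text{hours} \\
70{,}000 \times 30\ \text{s} &= 2{,}100{,}000\ \text{s} \approx 583\ \text{hours} \approx 24\ \text{days}
\end{aligned}
```

Even the full catalogue works out to under a month of sequential machine time.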

Some will doubtless raise the usual concerns that AI might be taking work away from those who earn a living from producing audiobooks. But the new Microsoft project shows why that is not (yet) a real threat. There is still a huge difference between the AI-generated versions and those from skilled human readers. The former are great for Project Gutenberg, which depends on volunteers and can't afford to pay for professionals. But anyone wanting high-quality audiobook versions of titles will still need to turn to trained humans who are paid to produce them.

That is also true of other domains. Texts produced by generative AI systems "in the style" of a writer, or musician, are simply not substitutes for those writers or musicians. Arguably, they increase the value of the "real" thing. Their ability to produce endless quantities of bland and similar outputs serves to emphasize that what we most value in human productions is that unique, hard-to-define quality conspicuous by its absence in AI-generated works. As the technology advances, the gap between what computers and people can produce is likely to narrow. Whether it will ever be closed goes to the heart of the question of what it means to be human.

Follow me @glynmoody on Mastodon.
