AI’s Next Big Takeover: Audiobooks

TechEnvoke, March 12, 2023

5-minute read

“I’m not sure if this will still be my full-time job in five years.”

Guts grumble. It's normal: the sound of the digestive system's muscles contracting, the human body at work. Occasionally, if a microphone is close enough, those burbles and gurgles end up on the recording.

Narrator Leah Allers and engineer Craig Hinkle aren't robots, so they do have to worry about stray stomach noises when recording audiobooks. They're real people worrying about gurgles, debating where to put the stress in the word "increase," and attending to the minute details of giving a "genuine" voice to a book about how couples communicate, as they record for Nashville Audiobook Productions in mid-January.

NAP's recording studio in Nashville, Tennessee, is in the Ruckus Room, where Taylor Swift recorded her seven-times-platinum self-titled debut album. The waiting area smells strongly of coffee. Hinkle pays close attention to every word Allers says as she glances from an iPad displaying the book's text to a huge monitor perched on the soundboard in the studio.


Before starting a new chapter, Allers tells Hinkle, "I want to get some more emotion into these questions."

Audiobooks are growing in popularity. According to Acumen Research and Consulting, the market is projected to grow from roughly $4.2 billion in 2021 to $33.5 billion by 2030. Whether that growth stems from the surge in podcast listening, the convenience of listening on the go, or a side effect of the pandemic, tech companies have not ignored it, and neither has the seemingly inevitable rise of artificial intelligence.

In 2023, there is a lot of enthusiasm about the possibilities of AI, but also a lot of concern that it will take work away from already struggling creatives. With varying degrees of success, ChatGPT can write anything from dating-app bios to insurance pre-authorization letters. Many people who make a living creating digital art worry about their future as AI platforms such as Lensa AI and OpenAI's DALL-E churn out AI-generated images.

For some time now, tech companies such as Apple and Google have been developing AI for audiobook narration. In 2022, Google began offering its service to publishers in six countries, including the US and Canada. Google's AI narrators go by names like Santiago, who speaks Spanish, and Archie, who has a British accent. In early January, Apple unveiled a set of AI voices, with names like Madison and Jackson, that authors and independent publishers selling through Apple Books can use to narrate everything from romance to nonfiction.

"I don't know if in five years, this will be my full-time profession anymore," said Eby, a narrator based in Grand Rapids, Michigan, who has recorded more than 1,000 books over the last 21 years.

Narrators like Eby say their humanity is what actually makes them good at what they do. Especially in fiction, narrators choose each character's voice and decide how to convey emotion and nuance so that the performance reflects the plot.

Kathleen Li, a narrator based in Austin, Texas, noted that if a character is crying after losing a parent, she has to convey those tears and gasps in her voice.

Narrators describe the intimacy of speaking directly to a listener, and they question whether even the most lifelike AI can avoid the uncanny valley. They worry that anything that breaks that experience could do real damage.

AI voices range from stiff to very convincing. But even the most fluid synthetic narrator can trip those uncanny-valley wires with an off-key delivery or awkward pacing.

Money talks

Audiobook purists may struggle to understand why anyone would choose a synthetic voice over a human one. But for independent publishers and authors, time and money may be more persuasive arguments than the value of a creative performance.

The University of Michigan Press doesn't make much money from audiobooks. The publisher releases about 100 academic titles a year, written by scholars for scholars or students.

Hiring a narrator can cost as much as $6,000 for a book that might earn only a few hundred dollars, and that is before the labor-intensive production process. According to ACX, Amazon's Audiobook Creation Exchange, one finished hour of an audiobook can take up to six hours to produce.

"The reality is that the economics don't work out until you have a kind of a best-seller," said Charles Watkinson, director of the University of Michigan Press and associate university librarian for publishing at the University of Michigan Library. He also serves as president of the Association of University Presses, a professional association of academic publishers.

The time and money required to produce an audiobook may be too much for smaller authors and publishers. AI might alter that.

Google approached the University of Michigan Press about taking part in a pilot project roughly two years ago. The press was able to produce about 100 digital audiobooks using Google's service. Still, some human involvement is necessary. According to Watkinson, some instructors who have used the Google recordings have their students listen to them and compare them with the written text. So even when AI speeds up the recording process, smaller presses may still face staffing challenges.

Watkinson said the University of Michigan was interested in how AI could expand access to texts that might not otherwise be available in audio.