Pocket FM partners with ElevenLabs to convert scripts into audio content quickly

Pocket FM and ElevenLabs logos

Image Credits: Pocket FM

Lightspeed Ventures-backed audio platform Pocket FM announced it has partnered with voice-cloning company ElevenLabs to quickly convert text content, such as scripts, into audio series using AI.

Pocket FM, which raised $103 million in Series D funding in March, told TechCrunch at the time that it was already experimenting with the ability to convert text content into audio using ElevenLabs’ tech. Now, the India-based company has expanded the partnership to make the conversion tool available to all creators over the next few weeks.

In the test phase, Pocket FM has already produced 30,000 hours of audio series using ElevenLabs’ AI tech. With the new roll-out, the startup expects to triple its content library of over 100,000 hours of audio content this year. Pocket FM also said that during the experimental phase, the AI-powered tools helped it cut the cost of producing audio by 90%.

Pocket FM conversion of text to audio content
Image Credits: Pocket FM

Pocket FM’s co-founder and CTO Prateek Dixit told TechCrunch over a call that with this partnership, the company wants to make it easier for writers to convert their writings into audio series.

“We have over 250,000 writers (including the ones on the company’s Pocket Novel writing platform) and this partnership decreases the cost of setting up and recording audio for them,” he said.

“Even with a good set up of recording tools and equipment, writers can produce roughly 30 minutes of high-quality audio content per day. With the AI tools, this output can be 10 times more,” he added.

Pocket FM has built a tool integrating ElevenLabs tech, through which it is offering 50 voices for writers who want to convert their content. ElevenLabs’ co-founder Mati Staniszewski said that his company’s tool understands the context of the writing and infers emotions through the voice automatically.

“Working with Pocket FM, we are deploying our newer models that understand the genre of writing and are emotionally better,” Staniszewski said.

Dixit noted that based on data from users’ engagement with this kind of content, the platform also plans to suggest voices that work well for writers in a particular genre.

Pocket FM is not the only audio series platform experimenting with AI-powered tools. Google-backed Kuku FM is using GPT-4, Claude, BandLab and even ElevenLabs to help its writers with different stages of creation, including refining scripts, generating thumbnails, adding sound effects and converting text into audio.

Kuku FM told TechCrunch that it is also experimenting with using visual generation tools such as Midjourney and Runway to create ads related to content.

Quality of content and impact on artists

The promise of AI-powered tools is to generate more content faster, but that doesn’t mean the content is good. Pocket FM’s answer to aiding discovery and surfacing quality content is to make its discovery algorithm more sophisticated and to experiment with user engagement.

“If a writer publishes an audio series, we surface that content to a select number of users and observe engagement metrics. If these metrics are positive, we further propagate that,” Dixit said.

Kuku FM said it is working with its quality control team to ensure only high-quality content is promoted on its app, even if creators have used AI in the process.

“We realized the importance of having a human Quality Control team at the center of our decision-making when it comes to audio content production. We have developed a core team of Content Producers who have high ownership & authority on the artistic standards,” the company’s co-founder and CEO Lal Chand Bisu said.

Utilizing AI could lead to quicker results and a bigger content library for these platforms, but it will also reduce the roles of voiceover artists working with them. India’s Association of Voiceover Artists (AVA) has expressed its concerns about AI taking over.

“If AI takes over, we are finished. As voice artists, we need to get some regulation in place so that our livelihood is protected,” Amarinder Singh Sodhi, the association’s general secretary, told Indian publication Scroll.

Sodhi also told Scroll about incidents where voiceover artists were called into the studio to record samples to train AI without being informed or asked for their consent.

“On an emotional level, it scares me. By using AI, you are essentially diluting the human experience of storytelling. You lose out on an emotional connection,” Delhi-based voiceover artist Aditya Mattoo told TechCrunch.

He added that giving access to premium voices to people who don’t have the taste and skill to produce quality content will lead to the market getting flooded by bad content.

Voice artists in other parts of the world have also raised concerns about AI impacting their jobs. And despite working with some of the AI companies, they feel uncomfortable about their voices being altered.

When we asked about the impact of AI-powered voice generation on Pocket FM, the company didn’t directly answer the question. However, Dixit noted that engagement with AI-generated content in its experiments is “as good as human voiceover production.” Notably, the company is also working on technology to incorporate multiple voices in one audio output.

Neither Pocket FM nor Kuku FM currently labels its content to indicate whether AI has been used in the creation process.

Voice cloning startup ElevenLabs lands $80M, achieves unicorn status

Image Credits: Bryce Durbin/TechCrunch

There’s a lot of money in voice cloning.

Case in point: ElevenLabs, a startup developing AI-powered tools to create and edit synthetic voices, today announced that it closed an $80 million Series B round co-led by prominent investors, including Andreessen Horowitz, former GitHub CEO Nat Friedman and entrepreneur Daniel Gross.

The round, which also had participation from Sequoia Capital, Smash Capital, SV Angel, BroadLight Capital and Credo Ventures, brings ElevenLabs’ total raised to $101 million and values the company at over $1 billion (up from ~$100 million last June). CEO Mati Staniszewski says the new cash will be put toward product development, expanding ElevenLabs’ infrastructure and team, AI research and “enhancing safety measures to ensure responsible and ethical development of AI technology.”

“We raised the new money to cement ElevenLabs’ position as the global leader in voice AI research and product deployment,” Staniszewski told TechCrunch in an email interview.

Co-founded in 2022 by Piotr Dabkowski, an ex-Google machine learning engineer, and Staniszewski, a former Palantir deployment strategist, ElevenLabs launched in beta around a year ago. Staniszewski says that he and Dabkowski, who grew up in Poland, were inspired to create voice cloning tools by poorly dubbed American films. AI could do better, they thought.

Today, ElevenLabs is perhaps best known for its browser-based speech generation app that can create lifelike voices with adjustable toggles for intonation, emotion, cadence and other key vocal characteristics. For free, users can enter text and get a recording of that text read aloud by one of several default voices. Paying customers can upload voice samples to craft new styles using ElevenLabs’ voice cloning.

Increasingly, ElevenLabs is investing in versions of its speech-generating tech aimed at creating audiobooks and dubbing films and TV shows, as well as generating character voices for games and marketing activations.

Last year, the company released a “speech to speech” tool that attempts to preserve a speaker’s voice, prosody and intonation while automatically removing background noise, and — in the case of movies and TV shows — translates and synchronizes speech with the source material. On the roadmap for the coming weeks is a new dubbing studio workflow with tools to generate and edit transcripts and translations and a subscription-based mobile app that narrates web pages and text using ElevenLabs voices.

ElevenLabs’ innovations have won the startup customers in Paradox Interactive (the game developer whose recent projects include Cities: Skylines II and Stellaris) and The Washington Post, among other publishing, media and entertainment companies. Staniszewski claims that ElevenLabs users have generated the equivalent of more than 100 years of audio and that the platform is being used by employees at 41% of Fortune 500 companies.

But the publicity hasn’t been totally positive.

The infamous message board 4chan, known for its conspiratorial content, used ElevenLabs’ tools to share hateful messages mimicking celebrities like actress Emma Watson. The Verge’s James Vincent was able to tap ElevenLabs to maliciously clone voices in a matter of seconds, generating samples containing everything from threats of violence to racist and transphobic remarks. And over at Vice, reporter Joseph Cox documented generating a clone convincing enough to fool a bank’s authentication system.

In response, ElevenLabs has attempted to root out users repeatedly violating its terms of service, which prohibit abuse, and rolled out a tool to detect speech created by its platform. This year, ElevenLabs plans to improve the detection tool to flag audio from other voice-generating AI models and partner with unnamed “distribution players” to make the tool available on third-party platforms, Staniszewski says.

ElevenLabs offers an array of different voices, some synthetic, some cloned from voice actors. Image Credits: ElevenLabs

ElevenLabs has also faced criticism from voice actors who claim that the company uses samples of their voices without their consent — samples that could be leveraged to promote content they don’t endorse or spread mis- and dis-information. In a recent Vice article, victims recount how ElevenLabs was used in harassment campaigns against them, in one example to share an actor’s private information — their home address — using a cloned voice.

Then there’s the elephant in the room: the existential threat platforms like ElevenLabs pose to the voice acting industry.

Motherboard writes about how voice actors are increasingly being asked to sign away rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them — sometimes without commensurate compensation. The fear is that voice work — particularly cheap, entry-level work — will eventually be replaced by AI-generated vocals and that actors will have no recourse.

Some platforms are trying to strike a balance. Earlier this month, Replica Studios, an ElevenLabs competitor, signed a deal with SAG-AFTRA to create and license digital replicas of the media artist union members’ voices. In a press release, the organizations said that the arrangement established “fair” and “ethical” terms and conditions to ensure performer consent and to govern the negotiation of terms for uses of digital voice doubles in new works.

Even this didn’t please some voice actors, however — including SAG-AFTRA’s own members.

ElevenLabs’ solution is a marketplace for voices. Currently in alpha and set to become more widely available in the next several weeks, the marketplace allows users to create a voice, verify and share it. When others use a voice, the original creators receive compensation, Staniszewski says.

“Users always retain control over their voice’s availability and compensation terms,” he added. “The marketplace is designed as a step towards harmonizing AI advancements with established industry practices, while also bringing a diverse set of voices to ElevenLabs’ platform.”

Voice actors may take issue with the fact that ElevenLabs isn’t paying in cash, though — at least not at present. The current setup has creators receiving credit toward ElevenLabs’ premium services (which some find ironic, I’d wager).

Perhaps that’ll change in the future as ElevenLabs — which is now among the best-funded synthetic voice startups — attempts to beat back upstart competition like Papercup, Deepdub, Acapela, Respeecher and Voice.ai as well as Big Tech incumbents such as Amazon, Microsoft and Google. In any case, ElevenLabs, which plans to grow its headcount from 40 people to 100 by the end of the year, intends on sticking around — and making waves — in the fast-growing synthetic voice market.

Rabbit partners with ElevenLabs to power voice commands on its device

Image Credits: Rabbit

Hardware maker Rabbit has partnered with ElevenLabs to power voice commands on its devices. Rabbit is set to ship the first batch of r1 devices next month after getting a ton of attention at the Consumer Electronics Show (CES) at the start of the year.

The Rabbit r1 will ship with ElevenLabs’ tech, which will handle both voice commands from users and the way the pocket AI device talks back to them. At launch, the feature will be available only in English with one voice option. ElevenLabs said that while the r1 was designed for voice interaction from the start, the company’s low-latency models will make interactions more human-like.

“We’re working with rabbit to bring the future of human-device interaction closer. Our collaboration is about making the r1 a truly dynamic co-pilot,” ElevenLabs’ CEO Mati Staniszewski said in a prepared statement.

In January, Rabbit said that it will use Perplexity AI’s solutions to answer users’ questions on the device.

Earlier this week, Rabbit said that its first batch of $199 r1s will leave the factory by March 31 and will reach users within a few weeks. The company said users will be able to interact with chatbots, get answers from Perplexity, use bi-directional translation, order rides and food, and play music through the device right out of the box.

The company’s CEO Jesse Lyu said earlier this month at a StrictlyVC event that Rabbit is close to having 100,000 device orders.

Earlier this year, ElevenLabs raised $80 million in a Series B round from investors including Andreessen Horowitz, former GitHub CEO Nat Friedman and entrepreneur Daniel Gross, reaching unicorn status. The company has been focusing on providing voice cloning services for creating audiobooks, dubbing movies and TV shows, and voicing ads and video game characters. Most recently, India’s audio platform Pocket FM, which raised $103 million from Lightspeed, said that it is using ElevenLabs’ services to let creators convert their writings into audio series.

But ElevenLabs has faced its fair share of criticism, with users trying to fool a bank’s authentication system, 4chan users mimicking celebrities and journalists documenting how easy it is to set up voice clones that generate problematic content. The startup has rolled out a tool to detect speech created by its platform and is also working on a tool to detect synthesized audio, which it plans to distribute to third parties.