Why this AI startup is betting on voice-enabled bots to scale AI adoption in India

Image Credits: Sarvam AI

If your target market has 22 official languages and its people speak in over 19,000 dialects, does it make sense to offer a text-only AI chatbot that can function best in a couple of languages?

That’s the question Indian AI startup Sarvam has been working to solve, and on Tuesday it launched a series of offerings, including a voice-enabled AI bot that supports more than 10 Indian languages, betting that people in the country would prefer to talk to an AI model in their own language rather than chat with it over text. The startup is also launching a small language model, an AI tool for lawyers, as well as an audio-language model.

“People prefer to speak in their own language. It’s extremely challenging to type in Indian languages today,” Vivek Raghavan, co-founder of Sarvam AI, told TechCrunch.

The Bengaluru-based startup, which primarily targets businesses and enterprises, is pitching its voice-enabled AI bots to a number of industries, particularly those relying on customer support. As an example, it pointed to one of its customers, Sri Mandir, a startup that offers religious content, which has been using Sarvam’s AI agent to accept payments and has processed more than 270,000 transactions so far.

The company said its AI voice agents can be deployed on WhatsApp, within an app, and can even work with traditional voice calls.

Backed by Peak XV and Lightspeed, Sarvam plans to price its AI agents starting at ₹1 (approximately 1 cent) per minute of usage.

Image Credits: Sarvam

The startup is building its voice-enabled AI agents on top of a foundational small language model called Sarvam 2B, which is trained on a dataset of 4 trillion tokens. The model is trained entirely on synthetic data, according to Raghavan.

AI experts often advise caution when using synthetic data — essentially data generated by a large language model that aims to replicate real-world data — to train other AI models, because LLMs tend to hallucinate and make up information that may not be accurate. Training AI models on such data may serve to exacerbate such inaccuracies.

Raghavan said Sarvam opted to use synthetic data because of the extremely limited availability of Indian-language content on the open web. The startup has developed models that first clean and improve the data used to generate the synthetic datasets, he added.
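For readers curious what such a pipeline looks like in practice, below is a minimal, hypothetical sketch of the approach described above: a language model generates candidate text in a target Indian language, and simple quality filters weed out poor samples before they enter the training corpus. The generate_text placeholder and the filtering rules are illustrative assumptions, not Sarvam’s actual models or cleaning criteria.

```python
import re

def generate_text(prompt: str) -> str:
    """Placeholder for an LLM call that returns a synthetic text sample."""
    return "यह एक उदाहरण वाक्य है।"  # stand-in output for illustration

def passes_quality_filters(text: str, min_chars: int = 20) -> bool:
    """Reject samples that are too short, repetitive, or mostly off-script."""
    if len(text) < min_chars:
        return False
    tokens = text.split()
    # crude repetition check: no single token should dominate the sample
    if tokens and tokens.count(max(set(tokens), key=tokens.count)) > len(tokens) // 2:
        return False
    # for a Hindi corpus, require a majority of Devanagari characters
    devanagari = len(re.findall(r"[\u0900-\u097F]", text))
    return devanagari > len(text) // 2

def build_corpus(prompts: list[str]) -> list[str]:
    """Generate one sample per prompt and keep only those that pass the filters."""
    return [s for s in (generate_text(p) for p in prompts) if passes_quality_filters(s)]

if __name__ == "__main__":
    print(build_corpus(["एक छोटी कहानी लिखिए।"]))
```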

The founder claimed that Sarvam 2B will cost a tenth of anything comparable in the industry. The startup is open-sourcing the model, hoping that the community will build upon it further.

“While the large language foundational models are very exciting, you can achieve an experience that is superior, more specific, lower-cost and with reduced latency using small language models,” Raghavan said. “If you want to perform a query or two in a week or a month, you should use the large language models. But for use cases requiring millions of daily interactions, I believe smaller models are more suitable.”

The startup is also launching an audio-language model, called Shuka, built on its Saaras v1 audio decoder and Meta’s Llama-3-8B Instruct. This model is also being open sourced, so developers can use the startup’s translation, TTS, and other modules to build voice interfaces.
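To illustrate how such building blocks are typically chained by a developer, here is a hedged sketch of a single voice-interaction turn: speech recognition, a text reply from an instruction-tuned model, then text-to-speech. The function names and signatures below are placeholders assumed for this example; they are not Sarvam’s published API.

```python
def transcribe(audio: bytes, language: str) -> str:
    """Stand-in for an audio-language model turning speech into text."""
    return "मुझे मेरा खाता शेष बताइए"  # "Tell me my account balance"

def answer(query: str) -> str:
    """Stand-in for an instruction-tuned LLM generating a reply."""
    return "आपका शेष 1,250 रुपये है"  # "Your balance is 1,250 rupees"

def synthesize(text: str, language: str) -> bytes:
    """Stand-in for a text-to-speech module returning audio bytes."""
    return text.encode("utf-8")

def handle_voice_turn(audio_in: bytes, language: str = "hi") -> bytes:
    """One conversational turn: speech in, text reasoning, speech back out."""
    query = transcribe(audio_in, language)
    reply = answer(query)
    return synthesize(reply, language)

if __name__ == "__main__":
    print(handle_voice_turn(b"\x00\x01"))
```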

And there’s another product dubbed “A1” — a generative AI workbench designed for lawyers to look up regulations, draft documents, redact them and extract data.

Sarvam is one of a small group of Indian startups advocating for use cases that align with the country’s interests and contribute to the government’s efforts to develop its own bespoke AI infrastructure.

Governments across the world are increasingly pursuing “sovereign AI” — AI infrastructure that’s developed and controlled at the national level. The purported aim of such efforts is to safeguard data privacy, stimulate economic growth and tailor AI development to their cultural contexts. The United States and China currently have the biggest investments in this space, and India is following with its “IndiaAI” program and language-specific models.

One of the initiatives under the IndiaAI program is called IndiaAI Compute Capacity, and the plan is to establish a supercomputer powered by at least 10,000 GPUs. One of the models being developed, dubbed Bhashini, aims to democratize access to digital services across various Indian languages.

Raghavan said his startup is ready to contribute to the IndiaAI program. “If the opportunity arises, we will work with the government,” he said in the interview.

Google cuts over 1,000 jobs in its voice assistance, hardware teams as Fitbit founders leave

Google logo sign with white backlighting on dark background

Image Credits: Artur Widak/NurPhoto / Getty Images

Google laid off over 1,000 employees across multiple divisions, including engineering and services, late Wednesday.

The affected divisions include the voice-activated Google Assistant, as part of a restructuring of the knowledge and information product team, and the Devices and Services PA (DSPA) team that manages Pixel, Nest and Fitbit hardware.

The company, which had 182,000 employees as of September 30, 2023, confirmed the cuts but downplayed them, saying in a statement that the layoffs were part of broader organizational changes.

“To best position us for these opportunities, throughout the second half of 2023, a number of our teams made changes to become more efficient and work better, and to align their resources to their biggest product priorities. Some teams are continuing to make these kinds of organizational changes, which include some role eliminations globally,” a Google spokesperson said in a statement.

The Alphabet Workers Union said on X that the layoffs were “needless” and that the company can’t “continue to fire our coworkers” while making billions.

Google has also let go of most of its AR hardware team and will work with other OEMs, as first reported by 9to5Google. The report also mentioned that Google will now have one core hardware engineering team instead of separate teams working on Pixel, Fitbit and Nest.

The company also confirmed to TechCrunch that Fitbit co-founders James Park and Eric Friedman are leaving as part of this restructuring.

Park played a pivotal role in introducing the new Pixel Watch line of smartwatches to Google’s hardware lineup.

Google announced in 2019 that it was acquiring Fitbit for $2.1 billion. The deal took two years to clear regulatory approval and was finalized in 2021. Since then, the company has been merging Fitbit products into its own offerings. For instance, last year, the search giant started prompting Fitbit users to migrate to Google accounts.

Separately, the company has also let go of people working on the Google Assistant team, as reported by Semafor. The company started infusing AI-powered features in Google Assistant through Bard last year in a bid to expand Assistant “beyond voice.” In October, during the Pixel event, Google said that Assistant could look through apps like Gmail and Drive to respond to queries related to specific emails and files.

Last year, Google carried out rolling layoffs across different teams, including the Waze mapping service in June, its recruiting team in September and its news division in October. Google’s latest company-wide layoff comes a year after the tech giant let go of approximately 12,000 roles, or 6% of its workforce, in January 2023.

Update 1/11/24, 3:48 PM ET: The union representing Alphabet workers said the layoffs impacted more than 1,000 employees, The Information reported, making it the largest round of cuts since last January.

Whispp brings electronic larynx voice boxes into this millennium

man holding cell phone

Image Credits: Whispp

Having a voice is important — figuratively and literally — and not being able to speak is a major impediment to communication. Whispp is working to change the game for individuals with speech disorders and voice disabilities, bringing voice boxes into the current millennium with its groundbreaking AI-powered assistive speech and phone-calling app.

At CES 2024, the company launched its newest phone-calling feature that converts whispered and vocal cord-impaired speech into a user’s natural voice in real time.

“Voice disabilities and speech disorders like stuttering significantly impact a person’s life and happiness,” said Whispp co-founder and CEO Joris Castermans. “Our solution aims to empower individuals in their daily lives and work. We’re not just transforming voices; we’re advancing communication by making it accessible to all and enhancing quality of life.”

Unlike traditional voice conversion technologies, Whispp’s app uses AI to convert voiceless speech into natural and voiced speech in real time. This technology is scalable due to its language-independent nature — and perhaps most intriguingly, users can also recreate their unique voice by providing recordings of their past or current healthy voice, adding a personalized touch to their communication.

Whispp is a game changer, accommodating a broad spectrum of voice types — from whispers to rough speech resulting from total laryngectomy after throat cancer. The app is particularly effective for several disorders where deliberately steering the voice toward whispering is beneficial due to neurological changes in the speech system.

Castermans stutters himself and discovered that his stuttering is significantly reduced when whispering. The company says research shows that people who stutter severely can reduce their stuttering frequency by an average of 85% while whispering. It also transpires that those suffering from spasmodic dysphonia or recurrent respiratory papillomatosis speak in a much more relaxed and fluent manner when they whisper. Unlike conventional speech-to-text approaches with noticeable latency and uneven conversational flow, Whispp’s real-time speech conversion eliminates barriers to natural communication.

The company produced a video, which… I mean, just watch it. It’s 90 seconds. You can afford to shed some tears today:

“We’ve been researching this for five years, and our product launch today marks a huge milestone. Today’s product launch is for phone call functionality. You can download our iOS or Android app from the app stores, and try it out yourself,” Akash Raj, CTO at Whispp, shares in an interview with TechCrunch at CES. “We have a subscription-based model. Currently, we have 50% early-bird sale, putting the cost at $9.99 per month.”

In 2023, Whispp introduced asynchronous message texts, enhancing overall communication by allowing messages to be exchanged at the recipient’s pace, adding flexibility and convenience to the platform. Now, with its new product features, Whispp can accommodate an even more diverse range of voice types and conditions. Even in noisy environments, Whispp enables clear communication and the ability to maintain one’s clarity of speech regardless of external disturbances.

I once spoke at a wedding on behalf of my best friend’s father. Not long before the wedding, his dad had a laryngectomy. It was a powerful and beautiful thing, speaking for someone else in such an emotional moment — but as I’m writing this, I’m finding myself wishing this technology had been available then, so he could have spoken to his son and new daughter-in-law in his own voice.

Whispp is a whisper of fresh air at CES this year — I love a good tech solution to a real human problem. It is truly a beacon of hope for individuals with speech disorders and voice disabilities and a testament to the transformative power of technology.

Read more about CES 2024 on TechCrunch

WhatsApp launches voice updates and polls for Channels

People hold mobile phones in front of the logo of WhatsApp application.

Image Credits: Aytac Unal/Anadolu Agency / Getty Images

WhatsApp is upgrading its broadcasting feature Channels with new abilities such as voice updates, polls and additional admins.

The company said Channel owners can now send voice updates to followers. This could be great for engagement, especially for folks with podcasts, who can put out teasers from episodes. WhatsApp noted that its 2 billion user base sends 7 billion voice messages daily.

Additionally, channel owners can post polls to the channels. Previously, the only way to engage with a post on a channel was by reacting with emojis.

The company is also rolling out the ability to share Channel updates to personal WhatsApp Status — the chat app’s Stories feature. Instagram has a similar feature to let users add posts from Channels to their Stories. It’s a neat way for Channel admins to inform people in their network about their Channel.

In June 2023, WhatsApp launched the Channels feature in Singapore and Colombia, and later in September 2023 rolled it out globally. At that time, the app allowed only one admin per channel. But now, the Meta-owned company will allow up to 16 admins per channel. The owner can invite other folks to become admins through the “Invite admins” option on the channel description page.

WhatsApp noted that Channels have grown quickly, with over 500 million people using the feature monthly.

Last month, Telegram added channel discovery and customization features to its app, including the ability for users to post channel updates to their stories.

Amazon is rolling out AI voice search to Fire TV devices

Image Credits: Amazon

Amazon is rolling out an AI-powered search feature to Fire TV that enables Alexa to answer open-ended questions about TV shows and movies and deliver more specific recommendations tailored to users’ preferences.

Announced at Amazon’s annual devices event last September, the feature is powered by a proprietary large language model (LLM). The AI understands natural language and phrases, and can answer specific queries like, “What movie has the line ‘Life is like a box of chocolates?’” (“Forrest Gump,” by the way). It can also pull up options based on the actors, characters, genres and topics. 

Image Credits: Amazon

Alexa will also be able to suggest titles included in streaming services you’re already subscribed to. So, if you ask Alexa to find psychological thrillers with surprise endings, it will only show content available to you. 

The new feature will be available in a few weeks to customers in the U.S. with Fire TV devices running Fire OS 6 or later.

The new search feature is the latest in Amazon’s efforts to add generative AI-powered features to Fire TV. The company previously launched an AI-powered image generator to produce TV backgrounds, as well as three Alexa experiences that let you play games and create songs.

This startup is bringing a 'voice frequency absorber' to CES 2024

Skyted

Image Credits: Skyted

CES has always been the place for weird, out-there gadgets to make their debuts, and this year’s show is no exception.

Skyted, a Toulouse, France-based startup founded by former Airbus VP Stéphane Hersen and acoustical engineer Frank Simon, is bringing what look like a pair of human muzzles to CES 2024. Called the “Mobility Privacy Mask” and “Hybrid Silent Mask,” the face-worn accoutrements are designed to “absorb voice frequencies” in noisy environments like planes, trains and rideshares, Hersen says.

“Skyted’s solution is ideal for commuters, business executives and travelers anywhere,” Hersen is quoted as saying in a press release. “No matter how busy or public the location is, they can now speak in silence and with the assurance that no one nearby can hear their conversation.”

Skyted
Image Credits: Skyted

Now, there’s no getting around the fact that the strap-secured masks aren’t exactly discreet or stylish… unless the Dyson Zone tickled your fancy. And at around half a pound (220 grams), they’re not exactly lightweight, either. But Hersen makes the case that the tradeoffs are worth it for the privacy the masks (allegedly) afford.

Skyted’s masks are built from sound-dampening material that Simon developed while at ONERA, the French aerospace lab — originally for jet engines. They sync (via wire or wirelessly) to a smartphone app that offers a pass-through toggle to pipe speech through the phone’s speaker — minimizing the need to remove the mask. The app also calculates the wearer’s “voice level” and shows insights into their “perceptibility” and “intelligibility,” sort of like a Fitbit for speech.

The masks muffle 80% of a wearer’s voice, Skyted claims, while enhancing the volume in voice and video calls by isolating outside noise. And they’ve been tested with “leading” (albeit unnamed) transportation providers, with backing from both ONERA and the European Space Agency.

To this reporter, though, the masks look like a shot in the dark. Skyted’s marketing suggests as much.

On its website, Skyted advertises… unusual in-app features like a “voice awareness” mode that lets parents quiet their noisy mask-donning kids while they’re playing video games. (It’s not totally clear how this works; perhaps active noise cancellation?) Skyted, in fact, pitches the masks as a more “immersive” way to play games and even has a section of its website dedicated to defense and military applications. Skyted claims to have worked with the French military and the Defence Innovation Agency, France’s military R&D arm, to develop a custom mask exclusively for submariners and special ops.

Skyted
Image Credits: Skyted

Skyted appears to be testing a medical mask of some sort too — which, taken with all the other sectors it’s going after, suggests a lack of focus. The scattershot go-to-market — coupled with the eye-watering $299 starting price and low-tech competition — doesn’t bode well for Skyted’s upcoming Kickstarter.

Then again, Skyted managed to secure ~$1 million in seed funding last year, according to Crunchbase data. Perhaps there’s a bigger market for face-mounted, sound-absorbing wearables than I thought.

Read more about CES 2024 on TechCrunch
