Intron Health gets backing for its speech-recognition tool that recognizes African accents

Intron Health raises $1.6 million in pre-seed funding

Image Credits: Intron Health

Voice recognition is getting integrated in nearly all facets of modern living, but there remains a big gap: Speakers of minority languages and those with thick accents or speech disorders like stuttering are typically less able to use speech-recognition tools that control applications, transcribe or automate tasks, among other functions.

Tobi Olatunji, founder and CEO of clinical speech-recognition startup Intron Health, wants to bridge this gap. He claims that Intron is Africa’s largest clinical speech database, with its algorithm trained on 3.5 million audio clips (16,000 hours) from over 18,000 contributors, mainly healthcare practitioners, representing 29 countries and 288 accents. Olatunji says that drawing most of its contributors from the healthcare sector ensures that medical terms are pronounced and captured correctly for his target markets. 
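For a sense of scale, those headline numbers imply fairly short recordings on average. A quick back-of-the-envelope check, assuming the 16,000 hours are spread evenly across the 3.5 million clips:

```python
clips = 3_500_000
hours = 16_000

# Total seconds of audio divided by the number of clips
avg_seconds = hours * 3600 / clips
print(f"Average clip length: {avg_seconds:.1f} s")  # roughly 16.5 s
```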

“Because we’ve already trained on many African accents, it’s very likely that the baseline performance for their accents will be much better than any other service they use,” he said, adding that data from Ghana, Uganda and South Africa is growing and that the startup is confident about deploying the model there.

Olatunji’s interest in health tech stems from two strands of his experience. First, he received training and practiced as a medical doctor in Nigeria, where he saw firsthand the inefficiencies of the systems in that market, including how much paperwork needed to be filled out and how hard it was to track all of it.

“When I was a doctor in Nigeria a couple years ago, even during medical school and even now, I get irritated easily doing a repetitive task that is not deserving of human efforts,” he said. “An easy example is we had to write a patient’s name on every lab order you do. And just something that’s simple, let’s say I’m seeing the patients, and they need to get some prescriptions, they need to get some labs. I have to manually write out every order for them. It’s just frustrating for me to have to repeat the patient name over and over on each form, the age, the date, and all that. … I’m always asking, how can we do things better? How can we make life easier for doctors? Can we take some tasks away and offload them to another system so that the doctor can spend their time doing things that are very valuable?”

Those questions propelled him to the next phase of his life. Olatunji moved to the U.S. to pursue a master’s degree in medical informatics from the University of San Francisco and then another in computer science at Georgia Tech.

He then cut his teeth at a number of tech companies. As a clinical natural language processing (NLP) scientist and researcher at Enlitic, a San Francisco Bay Area company, he built models to automate the extraction of information from radiology text reports. He also served Amazon Web Services as a machine learning scientist. At both Enlitic and Amazon, he focused on natural language processing for healthcare, shaping systems that help hospitals run better.

Throughout those experiences, he started to form ideas around how what was being developed and used in the U.S. could be used to improve healthcare in Nigeria and other emerging markets like it.

The original aim of Intron Health, launched in 2020, was to digitize hospital operations in Africa through an electronic medical record (EMR) system. But take-up was challenging: It turned out physicians preferred writing to typing, said Olatunji.

That led him to tackle a more basic problem: making physicians’ data entry work better. At first, the company looked at third-party solutions for automating tasks such as note-taking, embedding existing speech-to-text technologies into its EMR program.

There were a lot of issues, however, because of constant mis-transcription. It became clear to Olatunji that thick African accents and the pronunciation of complicated medical terms and names made the adoption of existing foreign transcription tools impractical. 

This marked the genesis of Intron Health’s speech-recognition technology, which can recognize African accents and can be integrated with existing EMRs. The tool has to date been adopted in 30 hospitals across five markets, including Kenya and Nigeria. 

There have been some immediate positive outcomes. In one case, Olatunji said, Intron Health has helped reduce the waiting time for radiology results at one of West Africa’s largest hospitals from 48 hours to 20 minutes. Such efficiencies are critical in healthcare provision, especially in Africa, where the doctor-to-patient ratio remains one of the lowest in the world.

“Hospitals have already spent so much on equipment and technology … Ensuring that they apply these tech is important. We’re able to provide value to help them improve the adoption of the EMR system,” he said.

Looking ahead, the startup is exploring new growth frontiers backed by a $1.6 million pre-seed round, led by Microtraction, with participation from Plug and Play Ventures, Jaza Rift Ventures, Octopus Ventures, Africa Health Ventures, OpenseedVC, Pi Campus, Alumni Angel, BakerBridge Capital and several angel investors.

In terms of technology, Intron Health is working to perfect noise cancellation, as well as ensuring that the platform works well even in low-bandwidth conditions. This is in addition to enabling the transcription of multi-speaker conversations and integrating text-to-speech capabilities.

The plan, Olatunji says, is to add intelligence systems or decision-support tools for tasks such as prescriptions or lab tests. These tools, he adds, can help reduce doctor errors, ensure adequate patient care and speed up doctors’ work.

Intron Health is among the growing number of generative AI startups in the medical space, including Microsoft’s DAX Express, which are reducing administrative tasks for clinicians by generating notes within seconds. The emergence and adoption of these technologies come as the global speech- and voice-recognition market is projected to be valued at $84.97 billion by 2032, following a CAGR of 23.7% from 2024, according to Fortune Business Insights.
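Those two figures pin down the projection's implied starting point. A quick check, assuming the 23.7% CAGR compounds annually over the eight years from 2024 to 2032:

```python
projected_2032 = 84.97       # $B, per Fortune Business Insights
cagr = 0.237
years = 2032 - 2024          # eight years of compounding

implied_2024 = projected_2032 / (1 + cagr) ** years
print(f"Implied 2024 market size: ${implied_2024:.1f}B")  # about $15.5B
```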

Beyond building voice technologies, Intron is also playing a pivotal role in speech research in Africa, having recently partnered with Google Research, the Bill & Melinda Gates Foundation, and Digital Square at PATH to evaluate popular large language models (LLMs) such as OpenAI’s GPT-4o, Google’s Gemini, and Anthropic’s Claude across 15 countries, to identify strengths, weaknesses, and risks of bias or harm in LLMs. This is all in a bid to ensure that culturally attuned models are available for African clinics and hospitals. 

ElevenLabs' text-to-speech app Reader is now available globally

ElevenLabs Reader app shown in handheld smartphone

Image Credits: ElevenLabs

ElevenLabs, a startup developing AI-powered tools to create and edit synthetic voices, is making its Reader app available across the world with support for 32 languages.

The app, first released in June in the U.S., the U.K. and Canada, lets users upload any text content — like articles, PDF documents or e-books — and listen to it in different languages and voices. Reader now supports languages including Portuguese, Spanish, French, Hindi, German, Japanese, Arabic, Korean, Italian, Tamil and Swedish.

ElevenLabs, which became a unicorn earlier this year after raising $80 million from investors, including Andreessen Horowitz, provides an API that companies can use for various use cases like dubbing or text-to-speech. The company powers voice interactions on the Rabbit r1, as well as text-to-speech features on AI-powered search engine Perplexity and audio platforms Pocket FM and Kuku FM. The Reader app is its first consumer-facing product.

The startup said it has added hundreds of new voices from its library that are suited for different languages. Last month, the company licensed the voices of actors such as Judy Garland, James Dean, Burt Reynolds and Sir Laurence Olivier for the app.

ElevenLabs said the extended language support is powered by its Turbo v2.5 model, released last month, which purportedly reduces the latency of text-to-speech conversion and improves quality.
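ElevenLabs exposes this kind of conversion through its public REST API. A minimal sketch of assembling such a request in Python: the endpoint shape, `xi-api-key` header, and `model_id` follow ElevenLabs' published API, but the voice ID and key here are placeholders, and nothing is actually sent over the network.

```python
import json

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(text: str, voice_id: str, api_key: str,
                      model_id: str = "eleven_turbo_v2_5"):
    """Assemble the URL, headers and JSON body for a text-to-speech call.

    This only shows the shape of the request; a real client would POST it."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    body = json.dumps({"text": text, "model_id": model_id})
    return url, headers, body

url, headers, body = build_tts_request("Hello from Reader", "VOICE_ID", "YOUR_KEY")
print(url)
```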

The Reader app’s closest rival is Speechify, which offers additional features like scanning documents for text, integrations with Gmail and Canvas, as well as letting users clone their own voice to read out text. Mozilla-owned Pocket and The New York Times’ Audm-based audio app also let users listen to content.

ElevenLabs said it would add more features to the app, such as offline support and the ability to share audio snippets.

Largest text-to-speech AI model yet shows 'emergent abilities'

Illustration of a robot in a laptop

Image Credits: Carol Yepes / Getty Images

Researchers at Amazon have trained the largest text-to-speech model yet, which they claim exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally. The breakthrough could be what the technology needs to escape the uncanny valley.

These models were always going to grow and improve, but the researchers specifically hoped to see the kind of leap in ability that was observed once language models got past a certain size. For reasons unknown to us, once LLMs grow past a certain point, they start being far more robust and versatile, able to perform tasks they weren’t trained to do.

That is not to say they are gaining sentience or anything, just that past a certain point their performance on certain conversational AI tasks hockey sticks. The team at Amazon AGI — no secret what they’re aiming at — thought the same might happen as text-to-speech models grew as well, and their research suggests this is in fact the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which they have contorted into the abbreviation BASE TTS. The largest version of the model was trained on 100,000 hours of public domain speech, 90% of which is in English and the remainder in German, Dutch and Spanish.

At 980 million parameters, BASE-large appears to be the biggest model in this category. They also trained 400M- and 150M-parameter models based on 10,000 and 1,000 hours of audio respectively, for comparison — the idea being, if one of these models shows emergent behaviors but another doesn’t, you have a range for where those behaviors begin to emerge.

As it turns out, the medium-sized model showed the jump in capability the team was looking for, not necessarily in ordinary speech quality (it is reviewed better but only by a couple points) but in the set of emergent abilities they observed and measured. Here are examples of tricky text mentioned in the paper:

Compound nouns: The Beckhams decided to rent a charming stone-built quaint countryside holiday cottage.
Emotions: “Oh my gosh! Are we really going to the Maldives? That’s unbelievable!” Jennie squealed, bouncing on her toes with uncontained glee.
Foreign words: “Mr. Henry, renowned for his mise en place, orchestrated a seven-course meal, each dish a pièce de résistance.”
Paralinguistics (i.e., readable non-words): “Shh, Lucy, shhh, we mustn’t wake your baby brother,” Tom whispered, as they tiptoed past the nursery.
Punctuations: She received an odd text from her brother: ’Emergency @ home; call ASAP! Mom & Dad are worried…#familymatters.’
Questions: But the Brexit question remains: After all the trials and tribulations, will the ministers find the answers in time?
Syntactic complexities: The movie that De Moya who was recently awarded the lifetime achievement award starred in 2022 was a box-office hit, despite the mixed reviews.

“These sentences are designed to contain challenging tasks – parsing garden-path sentences, placing phrasal stress on long-winded compound nouns, producing emotional or whispered speech, or producing the correct phonemes for foreign words like “qi” or punctuations like “@” – none of which BASE TTS is explicitly trained to perform,” the authors write.

Such features normally trip up text-to-speech engines, which will mispronounce, skip words, use odd intonation or make some other blunder. BASE TTS still had trouble, but it did far better than its contemporaries — models like Tortoise and VALL-E.

There are a bunch of examples of these difficult texts being spoken quite naturally by the new model at the site they made for it. Of course these were chosen by the researchers, so they’re necessarily cherry-picked, but it’s impressive regardless. Here are a couple, if you don’t feel like clicking through:

https://techcrunch.com/wp-content/uploads/2024/02/shh-its-starting.wav
https://techcrunch.com/wp-content/uploads/2024/02/how-french.wav
https://techcrunch.com/wp-content/uploads/2024/02/guiding-moonlight.wav

Because the three BASE TTS models share an architecture, it seems clear that the size of the model and the extent of its training data are behind its ability to handle some of the above complexities. Bear in mind this is still an experimental model and process, not a commercial one. Later research will have to identify the inflection point for emergent ability, and how to train and deploy the resulting model efficiently.

A representative for Amazon AI, Leo Zao (not an author of the paper), wrote that they don’t make any claims of exclusive emergent properties here.

“We think it’s premature to conclude that such emergence won’t appear in other models. Our proposed emergent abilities test set is one way to quantify this emergence, and it is possible that applying this test set to other models could produce similar observations. This is partly why we decided to release this test set publicly,” he wrote in an email. “It is still early days for a ‘Scaling Law’ for TTS, and we look forward to more research on this topic.”

Notably, this model is “streamable,” as the name says — meaning it doesn’t need to generate whole sentences at once but goes moment by moment at a relatively low bitrate. The team has also attempted to package the speech metadata like emotionality, prosody and so on in a separate, low-bandwidth stream that could accompany vanilla audio.
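The practical difference is that playback can begin on the first frame while later ones are still being generated. A toy Python sketch of that frame-by-frame pattern (the frame encoder here is a stand-in that just encodes text chunks, not BASE TTS's actual low-bitrate speech codec):

```python
from typing import Iterator

def stream_tts(text: str, frame_words: int = 2) -> Iterator[bytes]:
    """Emit 'audio' frame by frame as the text is consumed, rather than
    synthesizing the whole sentence before returning anything.

    Real frames would come from the model's speech codec; here each frame
    is simply the next chunk of text, UTF-8 encoded as a placeholder."""
    words = text.split()
    for i in range(0, len(words), frame_words):
        chunk = " ".join(words[i:i + frame_words])
        yield chunk.encode("utf-8")  # placeholder for an encoded audio frame

# A player could start on the first frame while later ones are still arriving.
frames = list(stream_tts("Shh Lucy shhh we mustn't wake your baby brother"))
```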

It seems that text-to-speech models may have a breakout moment in 2024 — just in time for the election! But there’s no denying the usefulness of this technology, for accessibility in particular. The team does note that it declined to publish the model’s source and other data due to the risk of bad actors taking advantage of it. The cat will get out of that bag eventually, though.

Expressable brings speech therapy into the home

Image of a pink speech bubble tied up in string.

Image Credits: Mina De La O / Getty Images

Leanne Sherred, a pediatric speech therapist, has long encountered challenges putting caregiver-led therapy into practice in traditional care settings.

Research suggests that caregiver-led speech therapy, which involves training the caregivers of patients in skill-building therapeutic techniques to use at home, can be highly effective. But as Sherred observed in the course of her practice, therapists often have limited access to caregivers and face serious educational and tech roadblocks.

In 2020, around the start of the pandemic, Sherred saw an opportunity to attempt a new, tech-forward speech therapy care model, one that put caregivers “at the center of care” (in her words). She teamed up with Nick Barbara (Sherred’s spouse), Spencer Magloff and Ryan Hinojosa to found Expressable, a platform that offers one-on-one virtual sessions with speech language pathologists.

“Layered on top of Expressable’s synchronous care is a platform that includes multimedia home programming, interactive weekly practice activities, therapist SMS support and more,” Magloff, Expressable’s chief marketing officer, told TechCrunch in an interview. “With Expressable, speech therapy isn’t limited to one to two times per week, void of caregiver participation.”

Expressable is covered by some insurance plans (including Medicaid) but also offers private pay rates and accepts HSAs and FSAs. It matches patients with speech therapists who might be able to meet their needs and fit their schedules. The matched therapist develops a treatment plan and then regularly meets with the patient and/or their caregiver for online sessions.

Expressable
Image Credits: Expressable

Some aspects of the plan are designed to be done on the patient’s own time, through Expressable’s self-service platform. Patients and caregivers can track weekly progress toward the goals and milestones in their individualized plans.

Expressable, which caters to both adult and child patients with conditions ranging from language disorders to speech delays, aphasia, stuttering and autism spectrum disorder, differentiated itself early from many other telehealth startups by hiring its health specialists as W2 employees as opposed to contractors. While this increased Expressable’s medical licensing burden, it positioned the company well to handle challenging speech cases, Magloff says, which often require intensive, years-long treatment plans.

“With Expressable, parents and caregivers become active members of their patient’s care team, extending care into the home and throughout the entire therapeutic progress for faster outcomes,” Magloff said.

The digital and telehealth sector enjoyed liberal access to capital at the height of the pandemic but has since cooled noticeably. Expressable is bucking the trend, however: Earlier this week it closed a $26 million Series B round led by HarbourVest Partners, with participation from Digitalis Ventures, F-Prime Capital and Lerer Hippeau.

With $50 million in the bank, Expressable plans to make improvements to its care delivery model and core tech, expand its payer relationships and grow its network of therapists as well as its operational team. The company’s also experimenting with various forms of AI, Magloff says.

“There are a number of relevant AI use cases we’re currently exploring or adapting to improve the client experience,” he added. “These could help catalog common speech errors, reduce administrative burdens on clinicians and improve operational efficiency.”