Hacker claims data breach of India's eMigrate labor portal

eMigrate portal by the Indian government

Image Credits: Jagmeet Singh / TechCrunch

A hacker claims to be selling an extensive database associated with an Indian government portal meant for blue-collar workers emigrating from the country.

Launched by India’s ministry of external affairs, the eMigrate portal helps Indian workers legally emigrate overseas. The portal also provides emigration clearance tracking and insurance services to migrant workers.

According to a listing on a known cybercrime forum that TechCrunch has seen, the pseudonymous hacker published a small portion of the data containing full names, email addresses, phone numbers, dates of birth, mailing addresses and passport details of individuals who allegedly signed up to the portal.

TechCrunch verified that some of the data published by the hacker appears genuine. TechCrunch also validated phone numbers found in the published data using a third-party app. One of the records pertained to an Indian government foreign ambassador, whose information in the sample matches public information. A message sent by TechCrunch to the ambassador via WhatsApp went unreturned.

It is unclear whether the data was obtained directly from the eMigrate servers or through a previous breach. The hacker did not share the exact details of when the breach allegedly occurred, but claims to have at least 200,000 internal and registered user entries.

At the time of publication, India’s eMigrate portal says about half a million people were granted emigration clearance in 2023.

When reached by email about the data breach, India’s computer emergency response team, known as CERT-In, told TechCrunch that it was “in [the] process of taking appropriate action with the concerned authority.” India’s ministry of external affairs did not respond to multiple requests for comment.

This is the latest in a string of cybersecurity incidents affecting the Indian government in recent months. Earlier this year, TechCrunch exclusively reported on a data leak affecting the Indian government’s cloud service that spilled reams of sensitive information on its citizens. Soon after, it was discovered that scammers had planted online betting ads hidden on Indian government websites.

Data lakehouse Onehouse nabs $35M to capitalize on GenAI revolution

Onehouse founder and CEO Vinoth Chandar

Image Credits: Onehouse / Founder and CEO Vinoth Chandar

You can barely go an hour these days without reading about generative AI. While we are still in the embryonic phase of what some have dubbed the “steam engine” of the fourth industrial revolution, there’s little doubt that “GenAI” is shaping up to transform just about every industry — from finance and healthcare to law and beyond.

Cool user-facing applications might attract most of the fanfare, but the companies powering this revolution are currently benefiting the most. Just this month, chipmaker Nvidia briefly became the world’s most valuable company, a $3.3 trillion juggernaut driven largely by the demand for AI computing power.

But in addition to GPUs (graphics processing units), businesses also need infrastructure to manage the flow of data — for storing, processing, training, analyzing and, ultimately, unlocking the full potential of AI.

One company looking to capitalize on this is Onehouse, a three-year-old Californian startup founded by Vinoth Chandar, who created the open source Apache Hudi project while serving as a data architect at Uber. Hudi brings the benefits of data warehouses to data lakes, creating what has become known as a “data lakehouse” and enabling support for actions like indexing and performing real-time queries on large datasets, whether the data is structured, unstructured or semi-structured.

For example, an e-commerce company that continuously collects customer data spanning orders, feedback and related digital interactions will need a system to ingest all that data and ensure it’s kept up-to-date, which might help it recommend products based on a user’s activity. Hudi enables data to be ingested from various sources with minimal latency, with support for deleting, updating and inserting (“upsert”), which is vital for such real-time data use cases.
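
To make the “upsert” idea concrete, here is a minimal sketch of ingesting and updating records in a Hudi table with PySpark. It assumes a Spark session with the Hudi bundle on the classpath; the table name, field names and storage path are illustrative, not drawn from Onehouse.

```python
# Minimal Hudi upsert sketch (PySpark). Assumes the Hudi Spark bundle is installed.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-demo").getOrCreate()

# Illustrative e-commerce orders: "order_id" uniquely identifies a record, and
# "updated_at" decides which version wins when the same key arrives twice.
orders = spark.createDataFrame(
    [("o1", "alice", 42.50, "2024-06-01T10:00:00"),
     ("o2", "bob", 19.99, "2024-06-01T10:05:00")],
    ["order_id", "customer", "total", "updated_at"],
)

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",
    "hoodie.datasource.write.precombine.field": "updated_at",
    "hoodie.datasource.write.operation": "upsert",  # insert new keys, update existing ones
}

# Re-running this write with a newer "updated_at" for "o1" updates that row in place.
orders.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/lakehouse/orders")
```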

Onehouse builds on this with a fully managed data lakehouse that helps companies deploy Hudi. Or, as Chandar puts it, it “jumpstarts ingestion and data standardization into open data formats” that can be used with nearly all the major tools in the data science, AI and machine learning ecosystems.

“Onehouse abstracts away low-level data infrastructure build-out, helping AI companies focus on their models,” Chandar told TechCrunch.

Today, Onehouse announced it has raised $35 million in a Series B round of funding as it brings two new products to market to improve Hudi’s performance and reduce cloud storage and processing costs.

Down at the (data) lakehouse

Onehouse ad on London billboard
Onehouse ad on London billboard.
Image Credits: Onehouse

Chandar created Hudi as an internal project within Uber back in 2016, and since the ride-hailing company donated the project to the Apache Foundation in 2019, Hudi has been adopted by the likes of Amazon, Disney and Walmart.

Chandar left Uber in 2019, and, after a brief stint at Confluent, founded Onehouse. The startup emerged out of stealth in 2022 with $8 million in seed funding, and followed that shortly after with a $25 million Series A round. Both rounds were co-led by Greylock Partners and Addition.

These VC firms have joined forces again for the Series B follow-up, though this time, David Sacks’ Craft Ventures is leading the round.

“The data lakehouse is quickly becoming the standard architecture for organizations that want to centralize their data to power new services like real-time analytics, predictive ML and GenAI,” Craft Ventures partner Michael Robinson said in a statement.

For context, data warehouses and data lakes are similar in the way they serve as a central repository for pooling data. But they do so in different ways: A data warehouse is ideal for processing and querying historical, structured data, whereas data lakes have emerged as a more flexible alternative for storing vast amounts of raw data in its original format, with support for multiple types of data and high-performance querying.

This makes data lakes ideal for AI and machine learning workloads: it’s cheaper to store raw, untransformed data, and because the data is kept in its original form, it can still support more complex queries.

However, the trade-off is a whole new set of data management complexities, which risks worsening data quality given the vast array of data types and formats. This is partly what Hudi sets out to solve by bringing some key features of data warehouses to data lakes, such as ACID transactions to support data integrity and reliability, as well as improved metadata management for more diverse datasets.

Configuring data pipelines in Onehouse
Configuring data pipelines in Onehouse.
Image Credits: Onehouse

Because it is an open source project, any company can deploy Hudi. A quick peek at the logos on Onehouse’s website reveals some impressive users: AWS, Google, Tencent, Disney, Walmart, ByteDance, Uber and Huawei, to name a handful. But the fact that such big-name companies leverage Hudi internally is indicative of the effort and resources required to build it as part of an on-premises data lakehouse setup.

“While Hudi provides rich functionality to ingest, manage and transform data, companies still have to integrate about half-a-dozen open source tools to achieve their goals of a production-quality data lakehouse,” Chandar said.

This is why Onehouse offers a fully managed, cloud-native platform that ingests, transforms and optimizes the data in a fraction of the time.

“Users can get an open data lakehouse up-and-running in under an hour, with broad interoperability with all major cloud-native services, warehouses and data lake engines,” Chandar said.

The company was coy about naming its commercial customers, aside from the couple listed in case studies, such as Indian unicorn Apna.

“As a young company, we don’t share the entire list of commercial customers of Onehouse publicly at this time,” Chandar said.

With a fresh $35 million in the bank, Onehouse is now expanding its platform with a free tool called Onehouse LakeView, which provides observability into lakehouse functionality for insights on table stats, trends, file sizes, timeline history and more. This builds on existing observability metrics provided by the core Hudi project, giving extra context on workloads.

“Without LakeView, users need to spend a lot of time interpreting metrics and deeply understand the entire stack to root-cause performance issues or inefficiencies in the pipeline configuration,” Chandar said. “LakeView automates this and provides email alerts on good or bad trends, flagging data management needs to improve query performance.”

Onehouse is also debuting a new product called Table Optimizer, a managed cloud service that optimizes existing tables to expedite data ingestion and transformation.

‘Open and interoperable’

There’s no ignoring the myriad other big-name players in the space. The likes of Databricks and Snowflake are increasingly embracing the lakehouse paradigm: Earlier this month, Databricks reportedly doled out $1 billion to acquire a company called Tabular, with a view toward creating a common lakehouse standard.

Onehouse has entered a hot space for sure, but it’s hoping that its focus on an “open and interoperable” system that makes it easier to avoid vendor lock-in will help it stand the test of time. It is essentially promising the ability to make a single copy of data universally accessible from just about anywhere, including Databricks, Snowflake, Cloudera and AWS native services, without having to build separate data silos on each.

As with Nvidia in the GPU realm, there’s no ignoring the opportunities that await any company in the data management space. Data is the cornerstone of AI development, and not having enough good quality data is a major reason why many AI projects fail. But even when the data is there in bucketloads, companies still need the infrastructure to ingest, transform and standardize to make it useful. That bodes well for Onehouse and its ilk.

“From a data management and processing side, I believe that quality data delivered by a solid data infrastructure foundation is going to play a crucial role in getting these AI projects into real-world production use cases — to avoid garbage-in/garbage-out data problems,” Chandar said. “We are beginning to see such demand in data lakehouse users, as they struggle to scale data processing and query needs for building these newer AI applications on enterprise scale data.”

Exclusive: Gemini's data-analyzing abilities aren't as good as Google claims

In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

Image Credits: Lorenzo Di Cola/NurPhoto / Getty Images

One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” like summarizing multiple hundred-page documents or searching across scenes in film footage.

But new research suggests that the models aren’t, in fact, very good at those things.

Two separate studies investigated how well Google’s Gemini models and others make sense out of an enormous amount of data — think “War and Peace”-length works. Both find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40%-50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpinska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

Gemini’s context window is lacking

A model’s context, or context window, refers to input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, show or audio clip. And as context windows grow, so does the size of the documents being fit into them.

The newest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”) That’s equivalent to roughly 1.4 million words, two hours of video or 22 hours of audio — the largest context of any commercially available model.
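
To make “tokens” concrete, here is a small sketch using OpenAI’s open source tiktoken tokenizer. Gemini uses its own tokenizer, so the exact splits differ, but the principle is the same.

```python
import tiktoken  # pip install tiktoken; OpenAI's tokenizer, used here purely for illustration

enc = tiktoken.get_encoding("cl100k_base")
text = "Long context windows let models ingest entire novels."
token_ids = enc.encode(text)

print(len(token_ids), "tokens")
# Decoding each ID individually shows the word-fragment pieces a model actually consumes.
print([enc.decode([t]) for t in token_ids])
```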

In a briefing earlier this year, Google showed several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro search the transcript of the Apollo 11 moon landing telecast — around 402 pages — for quotes containing jokes, and then find a scene in the telecast that looked similar to a pencil sketch.

Oriol Vinyals, the VP of research at Google DeepMind who led the briefing, described the model as “magical.”

“[1.5 Pro] performs these sorts of reasoning tasks across every single page, every single word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpinska, along with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on foreknowledge, and they peppered the statements with references to specific details and plot points that’d be impossible to comprehend without reading the books in their entirety.

Given a statement like “By using her skills as an Apoth, Nusis is able to reverse engineer the type of portal opened by the reagents key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash — having ingested the relevant book — had to say whether the statement was true or false and explain their reasoning.

Image Credits: UMass Amherst
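
The harness behind such a benchmark can be fairly simple. Below is a hedged sketch of the true/false setup described above; the prompt wording, the `ask_model` callable and the scoring rule are all illustrative, not the researchers’ actual code.

```python
# Illustrative claim-verification harness. `ask_model` stands in for any
# long-context model API that takes a prompt string and returns a text reply.
CLAIM_TEMPLATE = (
    "You have been given the full text of a novel above.\n"
    "Statement: {statement}\n"
    "Answer TRUE or FALSE on the first line, then explain your reasoning."
)

def verify_claim(ask_model, book_text: str, statement: str, label: bool) -> bool:
    prompt = book_text + "\n\n" + CLAIM_TEMPLATE.format(statement=statement)
    answer = ask_model(prompt)
    verdict = answer.strip().upper().startswith("TRUE")
    return verdict == label  # credit only when the verdict matches the gold label

def accuracy(ask_model, book_text: str, claims) -> float:
    # `claims` is a list of (statement, bool label) pairs for one book.
    hits = sum(verify_claim(ask_model, book_text, s, y) for s, y in claims)
    return hits / len(claims)
```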

Testing one book of around 260,000 words (~520 pages), the researchers found that 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. Averaging all the benchmark results, neither model managed to score much higher than random chance on question-answering accuracy.

“We’ve noticed that the models have more difficulty verifying claims that require considering larger portions of the book, or even the entire book, compared to claims that can be solved by retrieving sentence-level evidence,” Karpinska said. “Qualitatively, we also observed that the models struggle with verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason over” videos — that is, search through and answer questions about the content in them.

The co-authors created a dataset of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.

Flash didn’t perform all that well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got around 50% of the transcriptions right. The accuracy dropped to around 30% with eight digits.

“On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a PhD student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning — recognizing that a number is in a frame and reading it — might be what is breaking the model.”

Google is overpromising with Gemini

Neither of the studies has been peer-reviewed, nor do they probe the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context releases.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Nevertheless, both add fuel to the fire that Google’s been overpromising — and under-delivering — with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google’s the only model provider that’s given context window top billing in its advertisements.

“There’s nothing wrong with the simple claim, ‘Our model can take X number of tokens’ based on the objective technical details,” Saxon said. “But the question is, what useful thing can you do with it?”

Generative AI, broadly speaking, is coming under increased scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In a pair of recent surveys from Boston Consulting Group, about half of the respondents — all C-suite executives — said that they don’t expect generative AI to bring about substantial productivity gains and that they’re worried about the potential for mistakes and data compromises arising from generative AI-powered tools. PitchBook recently reported that, for two consecutive quarters, generative AI dealmaking at the earliest stages has declined, plummeting 76% from its Q3 2023 peak.

Faced with meeting-summarizing chatbots that conjure up fictional details about people and AI search platforms that basically amount to plagiarism generators, customers are on the hunt for promising differentiators. Google — which has raced, at times clumsily, to catch up to its generative AI rivals — was desperate to make Gemini’s context one of those differentiators.

But the bet was premature, it seems.

“We haven’t settled on a way to really show that ‘reasoning’ or ‘understanding’ over long documents is taking place, and basically every group releasing these models is cobbling together their own ad hoc evals to make these claims,” Karpinska said. “Without the knowledge of how long context processing is implemented — and companies do not share these details — it is hard to say how realistic these claims are.”

Google didn’t respond to a request for comment.

Both Saxon and Karpinska believe the antidotes to hyped-up claims around generative AI are better benchmarks and, in the same vein, a greater emphasis on third-party critique. Saxon notes that one of the more common tests for long context (liberally cited by Google in its marketing materials), “needle in the haystack,” only measures a model’s ability to retrieve particular info, like names and numbers, from datasets — not answer complex questions about that info.
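
To see why that test is so narrow, here is a minimal sketch of a needle-in-the-haystack eval; the filler sentence, the needle and the check are all invented for illustration.

```python
import random

def build_haystack(needle: str, filler: str, n_sentences: int = 5000) -> str:
    """Bury one 'needle' fact at a random position in a long run of filler text."""
    sentences = [filler] * n_sentences
    sentences.insert(random.randrange(n_sentences), needle)
    return " ".join(sentences)

needle = "The secret passphrase is 'cobalt-heron-42'."
context = build_haystack(needle, "The sky was a pale grey that morning.")
prompt = context + "\n\nWhat is the secret passphrase?"

# Send `prompt` to the model under test (API call omitted) and check the reply:
#     passed = "cobalt-heron-42" in model_reply
# This only rewards retrieval of a planted string; it says nothing about
# answering questions that require reasoning over the whole context.
```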

“All scientists and most engineers using these models are essentially in agreement that our existing benchmark culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with a massive grain of salt.”

Updated 7/3: A previous version of this article stated that Gemini 1.5 Pro and 1.5 Flash’s accuracy was below random chance on the task of reasoning over long text. In fact, their accuracy was above random chance. We’ve made the correction. Google PR also sent links to studies that suggest Gemini’s long-context performance is stronger than implied here: Extended Multi-Doc QA, Video MME, longer queries subset on LMSYS, Ruler.

Newsletter writer covering Evolve Bank's data breach says the bank sent him a cease and desist letter

Red and white do not enter sign on the wall

Image Credits: Karl Tapales / Getty Images

The situation around a data breach that’s affected an ever-growing number of fintech companies has gotten even weirder. Evolve Bank & Trust announced last week that it was hacked and confirmed the stolen data has been posted to the dark web. Now Evolve has sent a cease and desist letter to the writer of a newsletter who has been covering the ongoing situation.

Jason Mikula, author of respected industry publication Fintech Business Weekly, told TechCrunch that he received a cease and desist letter from the bank telling him not to share files from the dark web with any allegedly impacted fintech companies.

Mikula told TechCrunch that he wasn’t actually sharing the files, though he had offered to do so, and that he did see some of them. Reviewing hacked information is a common practice among journalists reporting on security breaches, as a way to confirm that a breach happened and to establish what was taken.

In this case, Mikula said he’s connected with four people who have access to some of the files that were stolen in the breach and posted on the dark web and has reviewed some of the data himself.

The crux of the problem is that not all the impacted fintechs have received details about what information was stolen in the breach, according to Mikula’s industry sources. 

“As I understand it, some fintechs hadn’t gotten ‘confirmation’ from Evolve about what had been breached and thus hadn’t acted to mitigate risk or inform users,” Mikula told TechCrunch.

“Seeing the files would let them (1) confirm the breach had happened and examples of what data fields were included and (2) allow them to identify specific customers that had been impacted,” Mikula said.

Mikula has been posting information about the fintechs confirmed to be involved on X and reporting on the situation in his newsletter, so much so that X users like Parrot Capital have heaped praise on him. “Jason has been providing better customer service for those affected by the Evolve Bank breach than anyone else,” Parrot posted on X.

Mikula said yesterday he “woke up to the C&D.” He added that he was reporting on the situation responsibly and would continue to do so. TechCrunch has reached out to Evolve for comment.

Meanwhile, as Evolve was sending letters from lawyers to Mikula, a group of senators on July 1 publicly urged those involved with Synapse, a fintech in trouble, to act. They want Synapse’s owners, its fintech and bank partners — including Evolve — to “immediately restore customers’ access to their money.” Synapse was pressured to file for Chapter 7 bankruptcy in May, liquidating its business entirely. Customers have been frozen out ever since.

The senators implicated both the partners and investors of the company as being responsible for any missing customer funds. The senators’ letter alleges that $65 million to $95 million worth of funds are missing, but Synapse and all other players, including Evolve, assert that if this is true, they are not the ones responsible. They are all pointing fingers at others. 

The letter was addressed to W. Scott Stafford, president and CEO of Evolve Bank & Trust, but was also sent to major investors in bankrupt banking-as-a-service startup Synapse, as well as to the company’s principal bank and fintech partners.

HealthEquity says data breach is an ‘isolated incident’

closed padlocks on a green background with the exception of one lock, in red, that's open, symbolizing badly handled data breaches

Image Credits: MirageC / Getty Images

On Tuesday, health tech services provider HealthEquity disclosed in a filing with federal regulators that it had suffered a data breach, in which hackers stole the “protected health information” of some customers. 

In an 8-K filing with the SEC, the company said it detected “anomalous behavior by a personal use device belonging to a business partner,” and concluded that the partner’s account had been compromised by someone who then used the account to access members’ information.

On Wednesday, HealthEquity shared more details of the incident with TechCrunch. HealthEquity spokesperson Amy Cerny said in an email that this was “an isolated incident” that is not connected to other recent breaches, such as that of Change Healthcare, owned by the healthcare giant UnitedHealth. In May, UnitedHealth CEO Andrew Witty said in a House hearing that the breach affected “maybe a third” of all Americans.

HealthEquity detected the breach on March 25, when it “took immediate action, resolved the issue, and began extensive data forensics, which were completed on June 10.” The company brought together “a team of outside and internal experts to investigate and prepare for response.” The investigations determined that the breach was due to the compromised third-party vendor account having access to “some of HealthEquity’s SharePoint data,” according to Cerny.

SharePoint is a set of Microsoft tools that allows companies to create websites, as well as store and share internal information — essentially an intranet.

Cerny also said that “transactional systems, where integrations occur, were not impacted,” and that the company is notifying partners, clients and members, and has been working with law enforcement as well as experts to work on preventing future incidents. 

TechCrunch asked Cerny to specify what personally identifiable and “protected health” information was stolen in this breach, how many people have been affected and what partner was involved. Cerny declined to answer all of these questions. 

Earlier this year, HealthEquity reported that the company and its subsidiaries “administer HSAs and other CDBs for our more than 15 million accounts in partnership with employers, benefits advisers, and health and retirement plan providers.”

India's Airtel dismisses data breach reports amid customer concerns

Image Credits: Pradeep Gaur / SOPA Images / LightRocket / Getty Images

Airtel, India’s second-largest telecom operator, on Friday denied any breach of its systems following reports of an alleged security lapse that has caused concern among its customers.

The telecom group, which also sells productivity and security solutions to businesses, said it had conducted a “thorough investigation” and found that there had been no breach whatsoever of Airtel’s systems.

The telecom giant, which has amassed nearly 375 million subscribers in India, dismissed media reports about the alleged breach as “nothing short of a desperate attempt to tarnish Airtel’s reputation by vested interests.”

The company’s statement follows unconfirmed reports of a potential data breach that circulated in local outlets and on social media, prompting worry among Airtel’s subscriber base.

A purported data broker emerged on a known cybercrime forum this week, offering for sale the alleged personal information of approximately 375 million users, including phone numbers, email addresses, Aadhaar identification numbers and residential addresses.

Earlier this week, TechCrunch reviewed the data sample shared by the broker and found some discrepancies in its entries. Some security experts have also questioned the legitimacy of the alleged incident.

“We don’t think it’s an Airtel database. It seems it’s an aggregated database of multiple databases, and the actor is trying to sell it as an Airtel database,” Rahul Sasi, founder and CEO of cybersecurity startup CloudSEK, told TechCrunch, based on its analysis of the data sample.

Notably, the forum post was removed shortly after the incident was reported online. The forum also permanently banned the broker’s account on suspicion of “scamming.”

Data breach exposes millions of mSpy spyware customers

an illustration of mailbox icons falling out of the cloud with phones in the red background, symbolizing phone spyware

Image Credits: Bryce Durbin / TechCrunch

A data breach at the phone surveillance operation mSpy has exposed millions of its customers who bought access to the phone spyware app over the past decade, as well as the Ukrainian company behind it.

In May 2024, unknown attackers stole millions of customer support tickets from mSpy, including personal information, emails to support, and attachments such as personal documents. While hacks of spyware purveyors are becoming increasingly common, they remain notable because of the highly sensitive personal information often included in the data, in this case about the customers who use the service.

The hack encompassed customer service records dating back to 2014, which were stolen from the spyware maker’s Zendesk-powered customer support system.

mSpy is a phone surveillance app that promotes itself as a way to track children or monitor employees. Like most spyware, it is also widely used to monitor people without their consent. These kinds of apps are also known as “stalkerware” because people in romantic relationships often use them to surveil their partner without consent or permission. 

The mSpy app allows whoever planted the spyware, typically someone who previously had physical access to a victim’s phone, to remotely view the phone’s contents in real time.

As is common with phone spyware, mSpy’s customer records include emails from people seeking help to surreptitiously track the phones of their partners, relatives, or children, according to TechCrunch’s review of the data, which we independently obtained. Some of those emails and messages include requests for customer support from several senior-ranking U.S. military personnel, a serving U.S. federal appeals court judge, a U.S. government department’s watchdog, and an Arkansas county sheriff’s office seeking a free license to trial the app. 

Even at several million customer service tickets, the leaked Zendesk data is thought to represent only the portion of mSpy’s overall customer base that reached out for customer support. The number of mSpy customers is likely to be far higher.

Yet more than a month after the breach, mSpy’s owners, a Ukraine-based company called Brainstack, have not acknowledged or publicly disclosed the breach. 

Troy Hunt, who runs data breach notification site Have I Been Pwned, obtained a copy of the full leaked dataset, adding about 2.4 million unique email addresses of mSpy customers to his site’s catalog of past data breaches. 

Hunt told TechCrunch that he contacted several Have I Been Pwned subscribers with information from the breached data, who confirmed to him that the leaked data was accurate.

mSpy is the latest phone spyware operation in recent months to have been hacked, according to a recently compiled list by TechCrunch. The breach at mSpy shows once again that spyware makers cannot be trusted to keep their data secure — either that of their customers or their victims. 

Millions of mSpy customer messages

TechCrunch analyzed the leaked dataset — more than 100 gigabytes of Zendesk records — which contained millions of individual customer service tickets and their corresponding email addresses, as well as the contents of those emails.

Some of the email addresses belong to unwitting victims who were targeted by an mSpy customer. The data also shows that some journalists contacted the company for comment following the company’s last known breach in 2018. And, on several occasions, U.S. law enforcement agents filed or sought to file subpoenas and legal demands with mSpy. In one case following a brief email exchange, an mSpy representative provided the billing and address information about an mSpy customer — an alleged criminal suspect in a kidnapping and homicide case — to an FBI agent.

Each ticket in the dataset contained an array of information about the people contacting mSpy. In many cases, the data also included their approximate location based on the IP address of the sender’s device.

TechCrunch analyzed where mSpy’s contacting customers were located by extracting all of the location coordinates from the dataset and plotting the data in an offline mapping tool. The results show that mSpy’s customers are located all over the world, with large clusters across Europe, India, Japan, South America, the United Kingdom, and the United States.
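
TechCrunch hasn’t named its tooling, but a plot like that takes only a few lines. Here is a hedged offline sketch using matplotlib, assuming the per-ticket coordinates had already been extracted into a hypothetical CSV file.

```python
import csv
import matplotlib.pyplot as plt

# Hypothetical export: one row of approximate coordinates per support ticket.
lats, lons = [], []
with open("ticket_locations.csv") as f:
    for row in csv.DictReader(f):
        lats.append(float(row["lat"]))
        lons.append(float(row["lon"]))

# A simple scatter of longitude against latitude is enough to reveal clusters,
# and everything stays on the local machine, which matters for sensitive data.
plt.scatter(lons, lats, s=1, alpha=0.3)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Approximate locations of support contacts")
plt.savefig("locations.png", dpi=200)
```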

a photo showing mSpy's customers across the world, with large clusters across Europe, India, Japan, South America, the United Kingdom, and the United States.
A visualization of location data points from the mSpy database showing where its customers are approximately located.
Image Credits: TechCrunch

Buying spyware is not itself illegal, but selling or using spyware for snooping on someone without their consent is unlawful. U.S. prosecutors have charged spyware makers in the past, and federal authorities and state watchdogs have banned spyware companies from the surveillance industry, citing the cybersecurity and privacy risks that the spyware creates. Customers who plant spyware can also face prosecution for violating wiretapping laws.

The emails in the leaked Zendesk data show that mSpy and its operators are acutely aware of what customers use the spyware for, including monitoring of phones without the person’s knowledge. Some of the requests cite customers asking how to remove mSpy from their partner’s phone after their spouse found out. The dataset also raises questions about the use of mSpy by U.S. government officials and agencies, police departments, and the judiciary, as it is unclear if any use of the spyware followed a legal process.

According to the data, one of the email addresses pertains to Kevin Newsom, a serving appellate judge for the U.S. Court of Appeals for the Eleventh Circuit across Alabama, Georgia, and Florida, who used his official government email to request a refund from mSpy.

Kate Adams, the director of workplace relations for the U.S. Court of Appeals for the Eleventh Circuit, told TechCrunch: “Judge Newsom’s use was entirely in his personal capacity to address a family matter.” Adams declined to answer specific questions about the judge’s use of mSpy or whether the subject of Newsom’s surveillance consented.

The dataset also shows interest from U.S. authorities and law enforcement. An email from a staffer at the Office of the Inspector General for the Social Security Administration, a watchdog tasked with oversight of the federal agency, asked an mSpy representative if the watchdog could “utilize [mSpy] with some of our criminal investigations,” without specifying how.  

When reached by TechCrunch, a spokesperson for the Social Security Administration’s inspector general did not comment on why the staffer inquired about mSpy on behalf of the agency.

A sergeant at an Arkansas county sheriff’s department sought free trials of mSpy, ostensibly to provide demos of the software to neighborhood parents. The sergeant did not respond to TechCrunch’s question about whether they were authorized to contact mSpy.

The company behind mSpy

This is the third known mSpy data breach since the company began around 2010. mSpy is one of the longest-running phone spyware operations, which is in part how it accumulated so many customers.

Despite its size and reach, mSpy’s operators have remained hidden from public view and have largely evaded scrutiny — until now. It’s not uncommon for spyware makers to conceal the real-world identities of their employees to shield the company from legal and reputational risks associated with running a global phone surveillance operation, which is illegal in many countries.

But the data breach of mSpy’s Zendesk data exposed its parent company as a Ukrainian tech company called Brainstack.

Brainstack’s website does not mention mSpy. Much like its public job postings, Brainstack refers only to its work on an unspecified “parental control” app. But the internal Zendesk data dump shows that Brainstack is extensively and intimately involved in mSpy’s operations.

In the leaked Zendesk data, TechCrunch found records containing information about dozens of employees with Brainstack email addresses. Many of these employees were involved with mSpy customer support, such as responding to customer questions and requests for refunds.

The leaked Zendesk data contains the real names and in some cases the phone numbers of Brainstack employees, as well as the false names that they used when responding to mSpy customer tickets to hide their own identities.

When contacted by TechCrunch, two Brainstack employees confirmed their names as they were found in the leaked records, but declined to discuss their work with Brainstack.

Brainstack chief executive Volodymyr Sitnikov and senior executive Kateryna Yurchuk did not respond to multiple emails requesting comment prior to publication. Instead, a Brainstack representative, who did not provide their name, did not dispute our reporting but declined to provide answers to a list of questions for the company’s executives.

It’s not clear how mSpy’s Zendesk instance was compromised or by whom. The breach was first disclosed by Switzerland-based hacker maia arson crimew, and the data was subsequently made available to DDoSecrets, a nonprofit transparency collective that indexes leaked datasets in the public interest. 

When reached for comment, Zendesk spokesperson Courtney Blake told TechCrunch: “At this time, we have no evidence that Zendesk has experienced a compromise of its platform,” but would not say if mSpy’s use of Zendesk for supporting its spyware operations violated its terms of service.

“We are committed to upholding our User Content and Conduct Policy and investigate allegations of violations appropriately and in accordance with our established procedures,” the spokesperson said.


If you or someone you know needs help, the National Domestic Violence Hotline (1-800-799-7233) provides 24/7 free, confidential support to victims of domestic abuse and violence. If you are in an emergency situation, call 911. The Coalition Against Stalkerware has resources if you think your phone has been compromised by spyware.

Wittaya Aqua's data-driven AI helps seafood farmers increase aquaculture production

Row Of Buoys In Sea

Image Credits: Torben Kulin / EyeEm

More than 3 billion people around the globe rely on wild-caught and farmed seafood products for their protein intake. The world’s aquaculture production has hit a new record, and 89% of all aquatic animal production is being used for direct human consumption, according to a report published just last week. This shows a continuous increase in global consumption of aquatic foods. Accordingly, startups in the aquaculture sector are using AI technology to help farmers enhance production and sustainability.

Among them is a Canada-based startup called Wittaya Aqua. Its data-driven platform enables seafood farmers to consolidate existing data points across the seafood supply chain to drive greater profitability, sustainability and efficiency. The startup raised $2.8 million in a seed round to further develop its feed-to-farm platform and expand further into Asia, the largest aquaculture-producing region, after entering Singapore in 2023.

“We were first founded in Canada, but our vision is global, and Asia is a key part of the equation. … The [Asia] region is the global leader in aquaculture production, contributing a significant portion of the world’s seafood,” co-founder and CEO of Wittaya Aqua Evan Hall told TechCrunch. “While Southeast Asia boasts high production, there’s immense potential for further growth through data-driven practices.”

Many countries practice aquaculture, but a few dominate it: China, Indonesia, Vietnam, Bangladesh and South Korea are the top five producers.

The startup’s platform uses AI and machine learning to enhance its science-based models, forecast animal growth (predictive analytics), and recommend optimal feed types and quantities based on real-time data and growth projections. Its machine-learning algorithm analyzes historical data and environmental factors to suggest strategies for maximizing crop yields.

Image Credits: Wittaya Aqua

Hall, a wildlife conservation photographer, and Dominique Bureau, a professor of animal nutrition and aquaculture at the University of Guelph, saw the inefficiencies and challenges of siloed data in the industry and co-founded Wittaya Aqua in 2017. Hall also said he had experienced firsthand the painful process of copying field notes into Excel to analyze data while working as a fisheries biologist.

Aquaculture data has traditionally been fragmented and slow-moving, hindering the ability to make well-informed decisions, Hall said. Wittaya Aqua aims to address this issue by consolidating data from various points in the supply chain — including farmers, feed mills and ingredient suppliers — into a single centralized platform. Transparency across the value chain provides data and insights to help users make better decisions at every level, according to the company CEO.

“The unified view allows us to build robust, science-based models that provide stakeholders with actionable insights,” Hall said. “For instance, a farmer can see how their feed choices directly impact growth rates and compare their performance against industry benchmarks. Similarly, feed mills can analyze how their feeds perform on various farms, allowing them to refine feeding strategies for specific customer needs.”

Its users include ingredient suppliers, feed mills and farmers. The startup says it is in the revenue-generation stage and has secured some customers, including BioMar, De Heus, Uni-President, US Soybean Export Council, Soy Aquaculture Alliance, Temasek Lifesciences Laboratory, AquaChile and others.

The global aquaculture market is projected to reach $355.6 billion by 2033, up from $299 billion in 2023, according to a report by Precedence Research.  

The company competes with farm management solution providers like Fieldin, Taranis, eFishery, Victory Farms, Atarraya and AquaEasy. What sets Wittaya apart from its peers is that its platform combines nutritional information with field performance. That means the company can model the impacts of different feed ingredients on animal performance, which is unique, Hall said. In addition, it works with multiple species in multiple geographies, from mainstream commercial ones like salmon, shrimp, tilapia and pangasius to niche species like grouper and snapper, unlike most companies that focus on a single species and a single geography.

In the longer term, Wittaya plans to pursue a two-pronged approach to usher in a new era of financial stability for farmers. First, it aims to reduce the perceived credit or insurance risks related to production mortality by offering robust data and insight. Second, it wants to match its users with lenders and insurers who can provide customized financial products, Hall said.

The outfit has 16 staff across Canada and Singapore.

HealthEquity data breach affects 4.3M people

An Opened Prescription Medicine Bottle Among Many Other Sealed Bottles on Yellow Background High Angle View.

Image Credits: MirageC / Getty Images

HealthEquity is notifying 4.3 million people following a March data breach that affects their personal and protected health information.

In its data breach notice, filed with Maine’s attorney general, the Utah-based healthcare benefits administrator said that although the compromised data varies by person, it largely consists of sign-up information for accounts and information about benefits that the company administers.

HealthEquity said the data may include customer names, addresses, phone numbers, Social Security numbers, information about a person’s employer and any dependents, and some payment card information.

HealthEquity provides employees at companies across the United States with access to workplace benefits, like health savings accounts and commuter options for public transit and parking. In its February earnings report, HealthEquity said it had more than 15 million total customer accounts.

In its data breach notice, HealthEquity said it discovered the data breach after finding unauthorized access in an “unstructured data repository” outside of its core network that contained customers’ personal and health information. Some of the stolen data also includes information about diagnoses and prescriptions, the company said.

The notice said the breach occurred because the user account of one of HealthEquity’s vendors was compromised and its password stolen; the malicious hacker then used the account to access the data repository.

When reached for comment, HealthEquity would not name the third-party vendor. The company previously told TechCrunch that the compromised third-party vendor account had access to “some of HealthEquity’s SharePoint data,” referring to Microsoft SharePoint, which allows companies to create their own internal intranets. 

Several other companies in recent years, including Activision, Snowflake, and Worldcoin, have experienced security incidents because of employee password theft, often by way of password-stealing malware, which scrapes the passwords and credentials found on an employee’s computer. Some password-stealing malware can skirt multifactor authentication, a security feature that can block some password theft attacks, by stealing session tokens, which are stored on an employee’s computer to keep them persistently logged in. When stolen, session tokens can be used to gain access to the company’s network as if the hacker was that employee.

HealthEquity spokesperson Stacie Saltzgiver reiterated that the data breach was an “isolated incident” and confirmed that it was unrelated to the recent breaches of customer data held by cloud giant Snowflake.

HealthEquity has published a data breach notification on its website. When TechCrunch checked the website notice, HealthEquity had included hidden “noindex” code on the page that tells search engines to ignore the web page, effectively blocking affected individuals from finding HealthEquity’s data breach notice in search results. 
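
The directive in question is typically a single meta tag in the page’s HTML. Below is a small sketch of how one might check a page for it; the URL is a placeholder, and note that a site can also send the same directive through an X-Robots-Tag response header, which this sketch does not cover.

```python
import urllib.request
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Flags a <meta name="robots" content="...noindex..."> tag in a page."""
    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        name = (a.get("name") or "").lower()
        content = (a.get("content") or "").lower()
        if tag == "meta" and name == "robots" and "noindex" in content:
            self.found = True

url = "https://example.com/data-breach-notice"  # placeholder URL
html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
finder = NoindexFinder()
finder.feed(html)
print("noindex present:", finder.found)
```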

When asked by TechCrunch, the company’s spokesperson did not comment on the inclusion of the code.

CSC ServiceWorks reveals 2023 data breach affecting thousands of people

a photo of a person wearing a blue t-shirt putting in laundry in a row of laundry machines at a laundromat in New York

Image Credits: Tim Boyle / Getty Images

Laundry giant CSC ServiceWorks says tens of thousands of people had their personal information stolen from its systems in a 2023 cyberattack that the company only recently disclosed.

The New York-based laundry giant provides more than a million internet-connected laundry machines to residential buildings, hotels and university campuses around North America and Europe. CSC also employs more than 3,200 team members, according to its website.

In a data breach notification filed late on Friday, CSC confirmed that the data breach affected at least 35,340 individuals, including over a hundred people in Maine. 

The data breach is the latest security issue to beset CSC over the past year, after multiple security researchers said they found simple but critical vulnerabilities in its laundry platform that could cost the company revenue.

In its data breach notice, CSC said an intruder broke into its systems on September 23, 2023, and had access to its network for more than four months until February 4, 2024, when the company discovered the intruder. It’s not known why it took the company so long to detect the breach. CSC said it took until June to identify what data was stolen.

The stolen data includes names; dates of birth; contact information; government identity documents, such as Social Security and driver’s license numbers; financial information, such as bank account numbers; and health insurance information, including some limited medical information.

Given that the types of data involved typically relate to the information that companies hold on their employees, such as for business records and workplace benefits, it’s plausible that the data breach affects current and former CSC employees, as customers are not typically asked for this information.

For its part, CSC would not clarify either way.

CSC spokesperson Stephen Gilbert declined to answer TechCrunch’s specific questions about the incident, including whether the breach affects employees, customers or both. The company would not describe the nature of the cyberattack, or whether the company has received any communication from the threat actor, such as a ransom demand.

CSC made headlines earlier this year after ignoring a simple bug discovered by two student security researchers that allowed anyone to run free laundry cycles. The company belatedly patched the vulnerability and apologized to the researchers, who spent weeks trying to alert the company to the flaw.

The findings prompted the company to set up a vulnerability disclosure program, allowing future security researchers to contact the company directly to privately report bugs or vulnerabilities. 

Last month, details were made public of another vulnerability in CSC-powered laundry machines that also allows anyone to get free laundry. Michael Orlitzky said in a blog post that the hardware-level vulnerability, which involves short-circuiting two wires inside a CSC-powered laundry machine, bypasses the need to insert coins to operate the machine. Orlitzky is due to present his findings at the Def Con security conference in Las Vegas on Saturday.