Demand for AI is driving data center water consumption sky high

AI

Image Credits: MR.Cole_Photographer / Getty Images

The AI boom is fueling the demand for data centers and, in turn, driving up water consumption. (Water is used to cool the computing equipment inside data centers.) According to the Financial Times, in Virginia — home to the world’s largest concentration of data centers — water usage jumped by almost two-thirds between 2019 and 2023, from 1.13 billion gallons to 1.85 billion gallons.

Many say the trend, playing out worldwide, is unsustainable. Microsoft, a major data center operator, says 42% of the water it consumed in 2023 came from “areas with water stress.” Google, which has one of the largest data center footprints, said this year that 15% of its freshwater withdrawals came from areas with “high water scarcity.”

Why can’t data centers recycle water in a closed-loop system? Many do, but much of what they consume is set aside for humidity control, meaning it evaporates. Especially in drier regions, air that isn’t humidified enough lets static electricity build up, which is usually bad news for computers.


CSC ServiceWorks reveals 2023 data breach affecting thousands of people

a photo of a person wearing a blue t-shirt putting in laundry in a row of laundry machines at a laundromat in New York

Image Credits: Tim Boyle / Getty Images

Laundry giant CSC ServiceWorks says tens of thousands of people had their personal information stolen from its systems, after the company recently disclosed a cyberattack dating back to 2023.

The New York-based laundry giant provides over a million internet-connected laundry machines to residential buildings, hotels, and university campuses around North America and Europe. CSC also employs more than 3,200 team members, according to its website.

In a data breach notification filed late on Friday, CSC confirmed that the data breach affected at least 35,340 individuals, including over a hundred people in Maine. 

The data breach is the latest security issue to beset CSC over the past year, after multiple security researchers said they found simple but critical vulnerabilities in its laundry platform that could cost the company revenue.

In its data breach notice, CSC said an intruder broke into its systems on September 23, 2023 and had access to its network for five months until February 4, 2024, when the company discovered the intruder. It’s not known why it took the company several months to detect the breach. CSC said it took until June to identify what data was stolen.

The stolen data includes names; dates of birth; contact information; government identity documents, such as Social Security and driver’s license numbers; financial information, such as bank account numbers; and health insurance information, including some limited medical information.

Given that the types of data involved typically relate to the information that companies hold on their employees, such as for business records and workplace benefits, it’s plausible that the data breach affects current and former CSC employees, as customers are not typically asked for this information.

For its part, CSC would not clarify either way.

CSC spokesperson Stephen Gilbert declined to answer TechCrunch’s specific questions about the incident, including whether the breach affects employees, customers, or both. The company would not describe the nature of the cyberattack, or whether the company has received any communication from the threat actor, such as a ransom demand.

CSC made headlines earlier this year after ignoring a simple bug discovered by two student security researchers that allowed anyone to run free laundry cycles. The company belatedly patched the vulnerability and apologized to the researchers, who spent weeks trying to alert the company to the flaw.

The findings prompted the company to set up a vulnerability disclosure program, allowing future security researchers to contact the company directly to privately report bugs or vulnerabilities. 

Last month, details of another vulnerability in CSC-powered laundry machines that allows anyone to get free laundry were made public. Michael Orlitzky said in a blog post that the hardware-level vulnerability, which involves short-circuiting two wires inside a CSC-powered laundry machine, bypasses the need to insert coins to operate the machine. Orlitzky is due to present his findings at the Def Con security conference in Las Vegas on Saturday.

Hacker claims data breach of India's eMigrate labor portal

eMigrate portal by the Indian government

Image Credits: Jagmeet Singh / TechCrunch

A hacker claims to be selling an extensive database associated with an Indian government portal for blue-collar workers emigrating from the country.

Launched by India’s ministry of external affairs, the eMigrate portal helps Indian labor legally emigrate overseas. The portal also provides emigration clearance tracking and insurance services to migrant workers.

According to a listing on a known cybercrime forum that TechCrunch has seen, the pseudonymous hacker published a small portion of the data containing full names, email addresses, phone numbers, dates of birth, mailing addresses and passport details of individuals who allegedly signed up to the portal.

TechCrunch verified that some of the data published by the hacker appears genuine. TechCrunch also validated the phone numbers found in the published data using a third-party app. One of the records pertained to an Indian government foreign ambassador, whose information in the sample matches public information. A message sent by TechCrunch to the ambassador via WhatsApp went unreturned.

It is unclear whether the data was obtained directly from the eMigrate servers or through a previous breach. The hacker did not share the exact details of when the breach allegedly occurred, but claims to have at least 200,000 internal and registered user entries.

At the time of publication, India’s eMigrate portal showed that about half a million people were granted emigration clearance in 2023.

When reached by email about the data breach, India’s computer emergency response team, known as CERT-In, told TechCrunch that it was “in [the] process of taking appropriate action with the concerned authority.” India’s ministry of external affairs did not respond to multiple requests for comment.

This is the latest in a series of cybersecurity incidents to affect the Indian government in recent months. Earlier this year, TechCrunch exclusively reported on a data leak affecting the Indian government’s cloud service that spilled reams of sensitive information on its citizens. Soon after, it was discovered that scammers had planted online betting ads hidden on Indian government websites.


Data lakehouse Onehouse nabs $35M to capitalize on GenAI revolution

Onehouse founder and CEO Vinoth Chandar

Image Credits: Onehouse / Founder and CEO Vinoth Chandar

You can barely go an hour these days without reading about generative AI. While we are still in the embryonic phase of what some have dubbed the “steam engine” of the fourth industrial revolution, there’s little doubt that “GenAI” is shaping up to transform just about every industry — from finance and healthcare to law and beyond.

Cool user-facing applications might attract most of the fanfare, but the companies powering this revolution are currently benefiting the most. Just this month, chipmaker Nvidia briefly became the world’s most valuable company, a $3.3 trillion juggernaut driven substantially by the demand for AI computing power.

But in addition to GPUs (graphics processing units), businesses also need infrastructure to manage the flow of data — for storing, processing, training, analyzing and, ultimately, unlocking the full potential of AI.

One company looking to capitalize on this is Onehouse, a three-year-old Californian startup founded by Vinoth Chandar, who created the open source Apache Hudi project while serving as a data architect at Uber. Hudi brings the benefits of data warehouses to data lakes, creating what has become known as a “data lakehouse,” enabling support for actions like indexing and performing real-time queries on large datasets, be it structured, unstructured or semi-structured data.

For example, an e-commerce company that continuously collects customer data spanning orders, feedback and related digital interactions will need a system to ingest all that data and ensure it’s kept up-to-date, which might help it recommend products based on a user’s activity. Hudi enables data to be ingested from various sources with minimal latency, with support for deleting, updating and inserting (“upsert”), which is vital for such real-time data use cases.
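For readers who want a concrete picture, here is a minimal sketch (not Onehouse’s code) of the upsert pattern described above, using Hudi’s Spark datasource from PySpark; the table name, columns and storage path are hypothetical.

```python
# Minimal sketch: upserting e-commerce order events into an Apache Hudi table
# with PySpark. Table name, columns and path are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-upsert-sketch")
    # Assumes the Hudi Spark bundle is on the classpath, e.g. via
    # --packages org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .getOrCreate()
)

base_path = "s3://example-bucket/lakehouse/orders"  # hypothetical location

hudi_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",     # dedupe key
    "hoodie.datasource.write.precombine.field": "updated_at",  # latest record wins
    "hoodie.datasource.write.partitionpath.field": "order_date",
    "hoodie.datasource.write.operation": "upsert",             # insert or update in place
}

# New and updated order events arriving from the application
orders = spark.createDataFrame(
    [("o-1001", "2024-07-01", "shipped", "2024-07-02T10:00:00")],
    ["order_id", "order_date", "status", "updated_at"],
)

# Upsert: rows with an existing order_id are updated, new ones are inserted
orders.write.format("hudi").options(**hudi_options).mode("append").save(base_path)

# Downstream engines can then query the table with snapshot reads
spark.read.format("hudi").load(base_path).show()
```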

Onehouse builds on this with a fully managed data lakehouse that helps companies deploy Hudi. Or, as Chandar puts it, it “jumpstarts ingestion and data standardization into open data formats” that can be used with nearly all the major tools in the data science, AI and machine learning ecosystems.

“Onehouse abstracts away low-level data infrastructure build-out, helping AI companies focus on their models,” Chandar told TechCrunch.

Today, Onehouse announced it has raised $35 million in a Series B round of funding as it brings two new products to market to improve Hudi’s performance and reduce cloud storage and processing costs.

Down at the (data) lakehouse

Onehouse ad on London billboard
Onehouse ad on London billboard.
Image Credits: Onehouse

Chandar created Hudi as an internal project within Uber back in 2016, and since the ride-hailing company donated the project to the Apache Foundation in 2019, Hudi has been adopted by the likes of Amazon, Disney and Walmart.

Chandar left Uber in 2019, and, after a brief stint at Confluent, founded Onehouse. The startup emerged out of stealth in 2022 with $8 million in seed funding, and followed that shortly after with a $25 million Series A round. Both rounds were co-led by Greylock Partners and Addition.

These VC firms have joined forces again for the Series B follow-up, though this time, David Sacks’ Craft Ventures is leading the round.

“The data lakehouse is quickly becoming the standard architecture for organizations that want to centralize their data to power new services like real-time analytics, predictive ML and GenAI,” Craft Ventures partner Michael Robinson said in a statement.

For context, data warehouses and data lakes are similar in the way they serve as a central repository for pooling data. But they do so in different ways: A data warehouse is ideal for processing and querying historical, structured data, whereas data lakes have emerged as a more flexible alternative for storing vast amounts of raw data in its original format, with support for multiple types of data and high-performance querying.

This makes data lakes ideal for AI and machine learning workloads, as it’s cheaper to store raw, untransformed data while still supporting more complex queries, because the data can be kept in its original form.

However, the trade-off is a whole new set of data management complexities, which risks worsening the data quality given the vast array of data types and formats. This is partly what Hudi sets out to solve by bringing some key features of data warehouses to data lakes, such as ACID transactions to support data integrity and reliability, as well as improving metadata management for more diverse datasets.

Configuring data pipelines in Onehouse
Configuring data pipelines in Onehouse.
Image Credits: Onehouse

Because it is an open source project, any company can deploy Hudi. A quick peek at the logos on Onehouse’s website reveals some impressive users: AWS, Google, Tencent, Disney, Walmart, ByteDance, Uber and Huawei, to name a handful. But the fact that such big-name companies leverage Hudi internally is indicative of the effort and resources required to build it as part of an on-premises data lakehouse setup.

“While Hudi provides rich functionality to ingest, manage and transform data, companies still have to integrate about half-a-dozen open source tools to achieve their goals of a production-quality data lakehouse,” Chandar said.

This is why Onehouse offers a fully managed, cloud-native platform that ingests, transforms and optimizes the data in a fraction of the time.

“Users can get an open data lakehouse up-and-running in under an hour, with broad interoperability with all major cloud-native services, warehouses and data lake engines,” Chandar said.

The company was coy about naming its commercial customers, aside from the couple listed in case studies, such as Indian unicorn Apna.

“As a young company, we don’t share the entire list of commercial customers of Onehouse publicly at this time,” Chandar said.

With a fresh $35 million in the bank, Onehouse is now expanding its platform with a free tool called Onehouse LakeView, which provides observability into lakehouse functionality for insights on table stats, trends, file sizes, timeline history and more. This builds on existing observability metrics provided by the core Hudi project, giving extra context on workloads.

“Without LakeView, users need to spend a lot of time interpreting metrics and deeply understand the entire stack to root-cause performance issues or inefficiencies in the pipeline configuration,” Chandar said. “LakeView automates this and provides email alerts on good or bad trends, flagging data management needs to improve query performance.”

Onehouse is also debuting a new product called Table Optimizer, a managed cloud service that optimizes existing tables to expedite data ingestion and transformation.

‘Open and interoperable’

There’s no ignoring the myriad other big-name players in the space. The likes of Databricks and Snowflake are increasingly embracing the lakehouse paradigm: Earlier this month, Databricks reportedly doled out $1 billion to acquire a company called Tabular, with a view toward creating a common lakehouse standard.

Onehouse has entered a hot space for sure, but it’s hoping that its focus on an “open and interoperable” system that makes it easier to avoid vendor lock-in will help it stand the test of time. It is essentially promising the ability to make a single copy of data universally accessible from just about anywhere, including Databricks, Snowflake, Cloudera and AWS native services, without having to build separate data silos on each.

As with Nvidia in the GPU realm, there’s no ignoring the opportunities that await any company in the data management space. Data is the cornerstone of AI development, and not having enough good quality data is a major reason why many AI projects fail. But even when the data is there in bucketloads, companies still need the infrastructure to ingest, transform and standardize to make it useful. That bodes well for Onehouse and its ilk.

“From a data management and processing side, I believe that quality data delivered by a solid data infrastructure foundation is going to play a crucial role in getting these AI projects into real-world production use cases — to avoid garbage-in/garbage-out data problems,” Chandar said. “We are beginning to see such demand in data lakehouse users, as they struggle to scale data processing and query needs for building these newer AI applications on enterprise scale data.”

Exclusive: Gemini's data-analyzing abilities aren't as good as Google claims

In this photo illustration a Gemini logo and a welcome message on Gemini website are displayed on two screens.

Image Credits: Lorenzo Di Cola/NurPhoto / Getty Images

One of the selling points of Google’s flagship generative AI models, Gemini 1.5 Pro and 1.5 Flash, is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models can accomplish previously impossible tasks thanks to their “long context,” like summarizing multiple hundred-page documents or searching across scenes in film footage.

But new research suggests that the models aren’t, in fact, very good at those things.

Two separate studies investigated how well Google’s Gemini models and others make sense out of an enormous amount of data — think “War and Peace”-length works. Both find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40%-50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” Marzena Karpinska, a postdoc at UMass Amherst and a co-author on one of the studies, told TechCrunch.

Gemini’s context window is lacking

A model’s context, or context window, refers to input data (e.g., text) that the model considers before generating output (e.g., additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, show or audio clip. And as context windows grow, so does the size of the documents that can be fed into them.

The newest versions of Gemini can take in upward of 2 million tokens as context. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”) That’s equivalent to roughly 1.4 million words, two hours of video or 22 hours of audio — the largest context of any commercially available model.
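To put those figures in perspective, here is a minimal sketch that uses Google’s google-generativeai Python SDK to count how many tokens a document would occupy in Gemini 1.5 Pro’s context window; the file path and API-key handling are assumptions, not part of Google’s claims.

```python
# Minimal sketch: checking how much of Gemini 1.5 Pro's context window a
# document would occupy. Assumes the google-generativeai package is installed
# and a GOOGLE_API_KEY environment variable is set; the file path is hypothetical.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

with open("war_and_peace.txt", encoding="utf-8") as f:
    text = f.read()

count = model.count_tokens(text).total_tokens
print(f"{count:,} tokens (~{count / 2_000_000:.0%} of a 2M-token context window)")
```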

In a briefing earlier this year, Google showed several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. One had Gemini 1.5 Pro search the transcript of the Apollo 11 moon landing telecast — around 402 pages — for quotes containing jokes, and then find a scene in the telecast that looked similar to a pencil sketch.

Oriol Vinyals, VP of research at Google DeepMind, who led the briefing, described the model as “magical.”

“[1.5 Pro] performs these sorts of reasoning tasks across every single page, every single word,” he said.

That might have been an exaggeration.

In one of the aforementioned studies benchmarking these capabilities, Karpinska, along with researchers from the Allen Institute for AI and Princeton, asked the models to evaluate true/false statements about fiction books written in English. The researchers chose recent works so that the models couldn’t “cheat” by relying on foreknowledge, and they peppered the statements with references to specific details and plot points that’d be impossible to comprehend without reading the books in their entirety.

Given a statement like “By using her skills as an Apoth, Nusis is able to reverse engineer the type of portal opened by the reagents key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash — having ingested the relevant book — had to say whether the statement was true or false and explain their reasoning.

Image Credits: UMass Amherst
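As a rough illustration of the kind of evaluation loop the researchers describe (this is not their actual harness), a claim-verification pass over a single book might look like the sketch below; the file name, claim and prompting details are hypothetical.

```python
# Hypothetical illustration (not the researchers' harness) of the evaluation
# described: feed the full book plus one claim, ask for TRUE/FALSE, score
# against a label. File name and claims are placeholders.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")

book_text = open("recent_novel.txt", encoding="utf-8").read()  # placeholder

claims = [
    # (claim, ground-truth label)
    ("Nusis reverse engineers the portal opened by the reagents key.", True),
]

correct = 0
for claim, label in claims:
    prompt = (
        "Read the novel below, then answer TRUE or FALSE and explain your reasoning.\n\n"
        f"NOVEL:\n{book_text}\n\nCLAIM: {claim}\nANSWER:"
    )
    reply = model.generate_content(prompt).text.strip().upper()
    correct += reply.startswith("TRUE") == label  # True/False counted as 1/0

print(f"accuracy: {correct / len(claims):.1%}")
```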

When tested on one book around 260,000 words (~520 pages) in length, 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. Averaging all the benchmark results, neither model managed to do much better than random chance in terms of question-answering accuracy.

“We’ve noticed that the models have more difficulty verifying claims that require considering larger portions of the book, or even the entire book, compared to claims that can be solved by retrieving sentence-level evidence,” Karpinska said. “Qualitatively, we also observed that the models struggle with verifying claims about implicit information that is clear to a human reader but not explicitly stated in the text.”

The second of the two studies, co-authored by researchers at UC Santa Barbara, tested the ability of Gemini 1.5 Flash (but not 1.5 Pro) to “reason over” videos — that is, search through and answer questions about the content in them.

The co-authors created a dataset of images (e.g., a photo of a birthday cake) paired with questions for the model to answer about the objects depicted in the images (e.g., “What cartoon character is on this cake?”). To evaluate the models, they picked one of the images at random and inserted “distractor” images before and after it to create slideshow-like footage.
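The setup is essentially a visual needle-in-a-haystack test. As a rough sketch of the construction described above (not the study’s actual code), with hypothetical file names:

```python
# Minimal sketch of the "needle in a slideshow" setup: one question-bearing
# image hidden among distractor frames. File names are hypothetical.
import random

distractors = [f"distractor_{i:03}.jpg" for i in range(24)]  # filler frames
needle = ("birthday_cake.jpg", "What cartoon character is on this cake?")

position = random.randrange(len(distractors) + 1)
slideshow = distractors[:position] + [needle[0]] + distractors[position:]

# The model is shown all 25 frames in order and asked only about the needle
print(f"needle at frame {position + 1} of {len(slideshow)}: {needle[1]}")
```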

Flash didn’t perform all that well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got around 50% of the transcriptions right. The accuracy dropped to around 30% with eight digits.

“On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a PhD student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning — recognizing that a number is in a frame and reading it — might be what is breaking the model.”

Google is overpromising with Gemini

Neither of the studies has been peer-reviewed, nor do they probe the releases of Gemini 1.5 Pro and 1.5 Flash with 2-million-token contexts. (Both tested the 1-million-token context releases.) And Flash isn’t meant to be as capable as Pro in terms of performance; Google advertises it as a low-cost alternative.

Nevertheless, both studies add fuel to criticism that Google has been overpromising — and under-delivering — with Gemini from the beginning. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider to give the context window top billing in its advertising.


“There’s nothing wrong with the simple claim, ‘Our model can take X number of tokens’ based on the objective technical details,” Saxon said. “But the question is, what useful thing can you do with it?”

Generative AI, broadly speaking, is coming under increased scrutiny as businesses (and investors) grow frustrated with the technology’s limitations.

In a pair of recent surveys from Boston Consulting Group, about half of the respondents — all C-suite executives — said that they don’t expect generative AI to bring about substantial productivity gains and that they’re worried about the potential for mistakes and data compromises arising from generative AI-powered tools. PitchBook recently reported that, for two consecutive quarters, generative AI dealmaking at the earliest stages has declined, plummeting 76% from its Q3 2023 peak.

Faced with meeting-summarizing chatbots that conjure up fictional details about people and AI search platforms that basically amount to plagiarism generators, customers are on the hunt for promising differentiators. Google — which has raced, at times clumsily, to catch up to its generative AI rivals — was desperate to make Gemini’s context one of those differentiators.


But the bet was premature, it seems.

“We haven’t settled on a way to really show that ‘reasoning’ or ‘understanding’ over long documents is taking place, and basically every group releasing these models is cobbling together their own ad hoc evals to make these claims,” Karpinska said. “Without the knowledge of how long context processing is implemented — and companies do not share these details — it is hard to say how realistic these claims are.”

Google didn’t respond to a request for comment.

Both Saxon and Karpinska believe the antidotes to hyped-up claims around generative AI are better benchmarks and, in the same vein, greater emphasis on third-party critique. Saxon notes that one of the more common tests for long context (liberally cited by Google in its marketing materials), “needle in the haystack,” only measures a model’s ability to retrieve particular info, like names and numbers, from datasets — not answer complex questions about that info.

“All scientists and most engineers using these models are essentially in agreement that our existing benchmark culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports containing numbers like ‘general intelligence across benchmarks’ with a massive grain of salt.”

Updated 7/3: A previous version of this article stated that Gemini 1.5 Pro and 1.5 Flash’s accuracy was below random chance on the task of reasoning over long text. In fact, their accuracy was above random chance. We’ve made the correction. Google PR also sent links to studies that suggest Gemini’s long-context performance is stronger than implied here: Extended Multi-Doc QA, Video MME, longer queries subset on LMSYS, Ruler.

Newsletter writer covering Evolve Bank's data breach says the bank sent him a cease and desist letter

Red and white do not enter sign on the wall

Image Credits: Karl Tapales / Getty Images

The situation around a data breach that’s affected an ever-growing number of fintech companies has gotten even weirder. Evolve Bank & Trust announced last week that it was hacked and confirmed the stolen data has been posted to the dark web. Now Evolve has sent a cease and desist letter to the writer of a newsletter who has been covering the ongoing situation.

Jason Mikula, author of respected industry publication Fintech Business Weekly, told TechCrunch that he received a cease and desist letter from the bank telling him not to share files from the dark web with any allegedly impacted fintech companies.

Mikula told TechCrunch that he wasn’t actually sharing the files, but he had offered to do so and did see some of them. Looking at hacked information is a common practice among journalists when reporting on security breaches, as a way to confirm that a breach happened and what was taken.

In this case, Mikula said he’s connected with four people who have access to some of the files that were stolen in the breach and posted on the dark web and has reviewed some of the data himself.

The crux of the problem is that not all the impacted fintechs have received details about what information was stolen in the breach, according to Mikula’s industry sources. 

“As I understand it, some fintechs hadn’t gotten ‘confirmation’ from Evolve about what had been breached and thus hadn’t acted to mitigate risk or inform users,” Mikula told TechCrunch.

Mikula believes that “seeing the files would let them (1) confirm the breach had happened and examples of what data fields were included and (2) allow them to identify specific customers that had been impacted.”

Mikula has been posting information on X about the fintechs confirmed to be involved and reporting on the situation in his newsletter, so much so that X users like Parrot Capital have heaped praise on him. “Jason has been providing better customer service for those affected by the Evolve Bank breach than anyone else,” Parrot posted on X.

Mikula said yesterday he “woke up to the C&D.” He added that he was reporting on the situation responsibly and would continue to do so. TechCrunch has reached out to Evolve for comment.

Meanwhile, as Evolve was sending letters from lawyers to Mikula, a group of senators on July 1 publicly urged those involved with Synapse, a fintech in trouble, to act. They want Synapse’s owners, its fintech and bank partners — including Evolve — to “immediately restore customers’ access to their money.” Synapse was pressured to file for Chapter 7 bankruptcy in May, liquidating its business entirely. Customers have been frozen out ever since.

The senators implicated both the partners and investors of the company as being responsible for any missing customer funds. The senators’ letter alleges that $65 million to $95 million worth of funds are missing, but Synapse and all other players, including Evolve, assert that if this is true, they are not the ones responsible. They are all pointing fingers at others. 

The letter was addressed to W. Scott Stafford, president and CEO of Evolve Bank & Trust, but was also sent to major investors in bankrupt banking-as-a-service startup Synapse, as well as to the company’s principal bank and fintech partners.


Want to reach out with a tip? Email me at [email protected] or send me a message on Signal at 408.204.3036. You can also send a note to the whole TechCrunch crew at [email protected]. For more secure communications, click here to contact us, which includes SecureDrop (instructions here) and links to encrypted messaging apps.

HealthEquity says data breach is an ‘isolated incident’

closed padlocks on a green background with the exception of one lock, in red, that's open, symbolizing badly handled data breaches

Image Credits: MirageC / Getty Images

On Tuesday, health tech services provider HealthEquity disclosed in a filing with federal regulators that it had suffered a data breach, in which hackers stole the “protected health information” of some customers. 

In an 8-K filing with the SEC, the company said it detected “anomalous behavior by a personal use device belonging to a business partner,” and concluded that the partner’s account had been compromised by someone who then used the account to access members’ information.

On Wednesday, HealthEquity disclosed more details of the incident with TechCrunch. HealthEquity spokesperson Amy Cerny said in an email that this was “an isolated incident” that is not connected to other recent breaches, such as that of Change Healthcare, owned by the healthcare giant UnitedHealth. In May, UnitedHealth CEO Andrew Witty said in a House hearing that the breach affected “maybe a third” of all Americans.

HealthEquity detected the breach on March 25, when it “took immediate action, resolved the issue, and began extensive data forensics, which were completed on June 10.” The company brought together “a team of outside and internal experts to investigate and prepare for response.” The investigations determined that the breach was due to the compromised third-party vendor account having access to “some of HealthEquity’s SharePoint data,” according to Cerny.

Contact Us

Do you have more information about this HealthEquity breach? From a non-work device, you can contact Lorenzo Franceschi-Bicchierai securely on Signal at +1 917 257 1382, or via Telegram, Keybase and Wire @lorenzofb, or email. You also can contact TechCrunch via SecureDrop.

SharePoint is a set of Microsoft tools that allows companies to create websites, as well as store and share internal information — essentially an intranet.

Cerny also said that “transactional systems, where integrations occur, were not impacted,” and that the company is notifying partners, clients and members, and has been working with law enforcement as well as experts to work on preventing future incidents. 

TechCrunch asked Cerny to specify what personally identifiable and “protected health” information was stolen in this breach, how many people have been affected and what partner was involved. Cerny declined to answer all of these questions. 

Earlier this year, HealthEquity reported that the company and its subsidiaries “administer HSAs and other CDBs for our more than 15 million accounts in partnership with employers, benefits advisers, and health and retirement plan providers.”

India's Airtel dismisses data breach reports amid customer concerns

Image Credits: Pradeep Gaur / SOPA Images / LightRocket / Getty Images

Airtel, India’s second-largest telecom operator, on Friday denied any breach of its systems following reports of an alleged security lapse that has caused concern among its customers.

The telecom group, which also sells productivity and security solutions to businesses, said it had conducted a “thorough investigation” and found that there has been no breach whatsoever into Airtel’s systems. 

The telecom giant, which has amassed nearly 375 million subscribers in India, dismissed media reports about the alleged breach as “nothing short of a desperate attempt to tarnish Airtel’s reputation by vested interests.”

The company’s statement follows unconfirmed reports of a potential data breach circulated in local outlets and social media, prompting worry among Airtel’s subscriber base.

A purported data broker emerged on a known cybercrime forum this week, offering for sale the alleged personal information of approximately 375 million users, including phone numbers, email addresses, Aadhaar identification numbers and residential addresses.

Earlier this week, TechCrunch reviewed the data sample shared by the broker and found some discrepancies in its entries. Some security experts have also questioned the legitimacy of the alleged incident.

“We don’t think it’s an Airtel database. It seems it’s an aggregated database of multiple databases, and the actor is trying to sell it as an Airtel database,” Rahul Sasi, founder and CEO of cybersecurity startup CloudSEK, told TechCrunch, based on its analysis of the data sample.

Notably, the forum post was removed shortly after the incident was reported online. The forum also permanently banned the broker’s account on suspicion of “scamming.”

Wittaya Aqua's data-driven AI helps seafood farmers increase aquaculture production

Row Of Buoys In Sea

Image Credits: Torben Kulin / EyeEm

More than 3 billion people around the globe rely on wild-caught and farmed seafood products for their protein intake. The world’s aquaculture production has hit a new record, and 89% of all aquatic animal production is being used for direct human consumption, according to a report published just last week, reflecting a continued increase in global consumption of aquatic foods. Accordingly, startups in the aquaculture sector are using AI technology to help farmers enhance production and sustainability.

Among them is a Canada-based startup called Wittaya Aqua. Its data-driven platform enables seafood farmers to consolidate existing data points across the seafood supply chain to drive greater profitability, sustainability and efficiency. The startup raised $2.8 million in a seed round to further develop its feed-to-farm platform and expand further into Asia, the largest aquaculture-producing region, after entering Singapore in 2023.

“We were first founded in Canada, but our vision is global, and Asia is a key part of the equation. … The [Asia] region is the global leader in aquaculture production, contributing a significant portion of the world’s seafood,” co-founder and CEO of Wittaya Aqua Evan Hall told TechCrunch. “While Southeast Asia boasts high production, there’s immense potential for further growth through data-driven practices.”

Many countries practice aquaculture, but a few dominate it: China, Indonesia, Vietnam, Bangladesh and South Korea are the top five producers.

The startup’s platform uses AI and machine learning to enhance its science-based models, forecast animal growth (predictive analytics), and recommend optimal feed types and quantities based on real-time data and growth projections. Its machine-learning algorithm analyzes historical data and environmental factors to suggest strategies for maximizing crop yields.

Image Credits: Wittaya Aqua

Hall, a wildlife conservation photographer, and Dominique Bureau, a professor of animal nutrition and aquaculture at the University of Guelph, saw the inefficiencies and challenges of siloed data in the industry and co-founded Wittaya Aqua in 2017. Hall also said he had experienced firsthand the painful process of copying field notes into Excel to analyze data while working as a fisheries biologist.

The aquaculture data has traditionally been fragmented and slow-moving, hindering the ability to make well-informed decisions, Hall said. Wittaya Aqua aims to address this issue by consolidating data from various points in the supply chain — including farmers, feed mills and ingredient suppliers — into a single centralized platform. Transparency across the value chain provides data and insights to help users make better decisions at every level, according to the company CEO.

“The unified view allows us to build robust, science-based models that provide stakeholders with actionable insights,” Hall said. “For instance, a farmer can see how their feed choices directly impact growth rates and compare their performance against industry benchmarks. Similarly, feed mills can analyze how their feeds perform on various farms, allowing them to refine feeding strategies for specific customer needs.”

Its users include ingredient suppliers, feed mills and farmers. The startup says it is in the revenue-generation stage and has secured customers including BioMar, De Heus, Uni-President, the US Soybean Export Council, the Soy Aquaculture Alliance, Temasek Lifesciences Laboratory and AquaChile.

The global aquaculture market is projected to reach $355.6 billion by 2033, up from $299 billion in 2023, according to a report by Precedence Research.  

The company competes with farm management solution providers like Fieldin, Taranis, eFishery, Victory Farms, Atarraya and AquaEasy. What sets Wittaya apart from its peers is that its platform combines nutritional information with field performance. That means the company can model the impacts of different feed ingredients on animal performance, which is unique, Hall said. In addition, it works with multiple species in multiple geographies, from mainstream commercial ones like salmon, shrimp, tilapia and pangasius to niche species like grouper and snapper, unlike most companies that focus on a single species and a single geography.

In the longer term, Wittaya plans to pursue a two-pronged approach to usher in a new era of financial stability for farmers. First, it aims to reduce the perceived credit or insurance risks related to production mortality by offering robust data and insight. Second, it wants to match its users with lenders and insurers who can provide customized financial products, Hall said.

The outfit has 16 staff across Canada and Singapore.
