Making AI models 'forget' undesirable data hurts their performance

Colorful streams of data flowing into colorful binary info.

Image Credits: NicoElNino / Getty Images

So-called “unlearning” techniques are used to make a generative AI model forget specific and undesirable info it picked up from training data, like sensitive private data or copyrighted material.

But current unlearning techniques are a double-edged sword: They could make a model like OpenAI’s GPT-4o or Meta’s Llama 3.1 405B much less capable of answering basic questions.

That’s according to a new study co-authored by researchers at the University of Washington (UW), Princeton, the University of Chicago, USC and Google, which found that the most popular unlearning techniques today tend to degrade models — often to the point where they’re unusable.

“Our evaluation suggests that currently feasible unlearning methods are not yet ready for meaningful usage or deployment in real-world scenarios,” Weijia Shi, a researcher on the study and a Ph.D. candidate in computer science at UW, told TechCrunch. “Currently, there are no efficient methods that enable a model to forget specific data without considerable loss of utility.”

How models learn

Generative AI models have no real intelligence. They’re statistical systems that predict words, images, speech, music, videos and other data. Fed an enormous number of examples (e.g. movies, voice recordings, essays and so on), AI models learn how likely data is to occur based on patterns, including the context of any surrounding data.

Given an email ending in the fragment “Looking forward…”, for example, a model trained to autocomplete messages might suggest “… to hearing back,” following the pattern of all the emails it’s ingested. There’s no intentionality there; the model isn’t looking forward to anything. It’s simply making an informed guess.
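
To make that pattern-matching concrete, here's a toy sketch, nothing like a production model (which predicts over tokens using billions of learned neural-network parameters rather than raw word counts), of an autocompleter that suggests whichever word most often followed the previous one in its training examples:

```python
# A toy "autocomplete" model: count which word follows which in a handful of
# example emails, then greedily extend a prompt with the most frequent next word.
from collections import Counter, defaultdict

training_emails = [
    "looking forward to hearing back",
    "looking forward to hearing from you",
    "looking forward to the meeting",
]

# Count how often each word follows each other word.
follows = defaultdict(Counter)
for email in training_emails:
    words = email.split()
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1

def autocomplete(prompt: str, max_words: int = 4) -> str:
    """Greedily extend the prompt with the most likely next word, one at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # never seen this word; no informed guess to make
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)

print(autocomplete("Looking forward"))  # -> "looking forward to hearing back"
```

Scale the training set from three emails to a large slice of the web, and swap the word counts for a neural network, and you have the rough shape of these systems.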

Most models, including flagships like GPT-4o, are trained on data sourced from public websites and data sets around the web. Most vendors developing such models argue that fair use shields their practice of scraping data and using it for training without informing, compensating or even crediting the data’s owners.

But not every copyright holder agrees. And many — from authors to publishers to record labels — have filed lawsuits against vendors to force a change.

The copyright dilemma is one of the reasons unlearning techniques have gained a lot of attention lately. Google, in partnership with several academic institutions, last year launched a competition seeking to spur the creation of new unlearning approaches.

Unlearning could also provide a way to remove sensitive info from existing models, like medical records or compromising photos, in response to a request or government order. (Thanks to the way they’re trained, models tend to sweep up lots of private information, from phone numbers to more problematic examples.) Over the past few years, some vendors have rolled out tools to allow data owners to ask that their data be removed from training sets. But these opt-out tools only apply to future models, not models trained before they rolled out; unlearning would be a much more thorough approach to data deletion.

Regardless, unlearning isn’t as easy as hitting “Delete.”

The art of forgetting

Unlearning techniques today rely on algorithms designed to “steer” models away from the data to be unlearned. The idea is to influence the model’s predictions so that it never — or only very rarely — outputs certain data.
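
The details vary by method, but one simple family of approaches inverts ordinary training: take gradient steps that increase, rather than decrease, the model's loss on the data to be forgotten. Below is a minimal sketch of that idea, assuming a PyTorch model in the Hugging Face style whose forward pass returns a loss when labels are supplied; real methods typically add further terms meant to protect the rest of the model's knowledge, which is exactly the balance the study examines:

```python
# A minimal sketch of gradient-ascent unlearning (assumptions: `model` follows
# the Hugging Face convention of returning a .loss when labels are in the batch,
# and `forget_loader` yields batches of the text to be unlearned).
import torch

def unlearn_by_gradient_ascent(model, forget_loader, lr=1e-5, max_steps=100):
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for step, batch in enumerate(forget_loader):
        if step >= max_steps:
            break
        loss = model(**batch).loss   # standard language-modeling loss...
        (-loss).backward()           # ...ascended (note the sign) to push the
        optimizer.step()             # model *away* from the forget data
        optimizer.zero_grad()
    return model
```

The catch, as the study shows, is that nothing in an update like this distinguishes the targeted text from the general knowledge entangled with it.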

To see how effective these unlearning algorithms could be, Shi and her collaborators devised a benchmark and selected eight different open algorithms to test. Called MUSE (Machine Unlearning Six-way Evaluation), the benchmark aims to probe an algorithm’s ability not only to prevent a model from spitting out training data verbatim (a phenomenon known as regurgitation), but also to eliminate the model’s knowledge of that data, along with any evidence that the model was originally trained on it.

Scoring well on MUSE requires making a model forget two things: books from the Harry Potter series and news articles.

For example, given a snippet from Harry Potter and the Chamber of Secrets (“‘There’s more in the frying pan,’ said Aunt…”), MUSE tests whether an unlearned model can recite the whole sentence (“‘There’s more in the frying pan,’ said Aunt Petunia, turning eyes on her massive son”), answer questions about the scene (e.g. “What does Aunt Petunia tell her son?”, to which the answer is “More in the frying pan”) or otherwise indicate it’s been trained on text from the book.

MUSE also tests whether the model retained related general knowledge — e.g. that J.K. Rowling is the author of the Harry Potter series — after unlearning, which the researchers refer to as the model’s overall utility. The lower the utility, the more related knowledge the model lost, making the model less able to correctly answer questions.
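
In spirit, an evaluation like this boils down to two measurements. The sketch below is not the benchmark's actual code: it assumes a hypothetical `generate` function standing in for whatever decoding interface the model exposes, and it uses a crude word-overlap score where MUSE uses more careful metrics:

```python
# Two toy probes in the spirit of an unlearning evaluation: does the model
# still reproduce forget-set text verbatim, and can it still answer general
# "retain" questions? (`generate` is a hypothetical stand-in, not MUSE's API.)

def verbatim_overlap(generated: str, reference: str) -> float:
    """Fraction of reference words reproduced in order (crude regurgitation score)."""
    matches = sum(g == r for g, r in zip(generated.split(), reference.split()))
    return matches / max(len(reference.split()), 1)

def evaluate(generate, forget_pairs, retain_qa):
    # Forget side: prompt with a prefix, compare against the true continuation.
    regurgitation = sum(
        verbatim_overlap(generate(prefix), continuation)
        for prefix, continuation in forget_pairs
    ) / len(forget_pairs)
    # Retain side ("utility"): share of unrelated factual questions still answered.
    utility = sum(
        answer.lower() in generate(question).lower()
        for question, answer in retain_qa
    ) / len(retain_qa)
    # Successful unlearning drives regurgitation toward 0 while utility stays high.
    return {"regurgitation": regurgitation, "utility": utility}

# Example usage with the article's own cases:
# evaluate(my_generate,
#          forget_pairs=[("'There's more in the frying pan,' said Aunt",
#                         "Petunia, turning eyes on her massive son")],
#          retain_qa=[("Who wrote the Harry Potter series?", "J.K. Rowling")])
```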

In their study, the researchers found that the unlearning algorithms they tested did make models forget certain information. But they also hurt the models’ general question-answering capabilities, presenting a trade-off.

“Designing effective unlearning methods for models is challenging because knowledge is intricately entangled in the model,” Shi explained. “For instance, a model may be trained on copyrighted material — Harry Potter books as well as on freely available content from the Harry Potter Wiki. When existing unlearning methods attempt to remove the copyrighted Harry Potter books, they significantly impact the model’s knowledge about the Harry Potter Wiki, too.”

Are there any solutions to the problem? Not yet — and this highlights the need for additional research, Shi said.

For now, vendors betting on unlearning as a solution to their training data woes appear to be out of luck. Perhaps a technical breakthrough will make unlearning feasible someday. But for the time being, vendors will have to find another way to prevent their models from saying things they shouldn’t.

Sproxxy is making it easier to measure conference spending ROI

Conference floor with booths from companies like AWS, Palo Alto Networks, Rapid 7 and others taken from above.

Image Credits: Ron Miller

Whenever a company goes to a conference, whether as a sales and marketing exercise or because an executive is speaking, there is a cost associated with it. For the former, that means the cost of the space and the booth, plus hotels, travel and meals for the employees staffing it. For a speaking executive, it’s time away from the office, the cost of a ticket and travel expenses. How do companies justify the cost of attending these events?

Until now, that has been pretty hard to do, but Sproxxy, an early-stage company, is looking to change it by creating a platform to manage conference-related activities while helping customers understand the ROI of attending these events. Today the startup officially launched after raising $1.1 million to get off the ground.

Melanie Samba, the founder and CEO at Sproxxy, was 20 years into her career in marketing and communications, managing 12 executives who attended a total of 80 conferences a year, and keeping track of it all in Excel spreadsheets. She wasn’t looking to start a software company, but she had an epiphany of sorts that there had to be a better way to handle this information, and she would later launch Sproxxy to build the platform she envisioned.

“We’re positioned as a conference intelligence platform. And what we’re doing is quantifying conference activity. So we help brands prove the business impact of them participating at conferences and knowing the ROI or value of speaking, sponsoring and attending an industry event,” Samba told TechCrunch.

That can involve pre-planning, including finding the right conferences to attend; cross-department collaboration to coordinate attendance; and post-conference analysis, figuring out whether the event was worth the cost in time and resources. At the end of the day, Samba says, the company is focused on providing data, analytics and insight about what a customer gained (or didn’t) by attending.

After she came up with the idea, Samba hired a company to build the initial version of the software, and was able to sell her first license to an agency managing 60 clients on the platform. Last year, she decided it was time to bring development in-house, and rebuild the product. Today, she has a team of three engineers and a product manager.

She says there seems to be demand: she has a pipeline of 1,200 companies she is working her way through and hoping to onboard onto the platform. The target market is midsize businesses to enterprises looking for a way to manage this process.

As a solo Black woman founder who is also the mother of a young child, Samba said she worried about the fundraising process, and she had good reason to: Black founders, regardless of gender, raised less than 1% of all the venture money invested in 2023. The challenge, she said, was getting into the room and helping investors understand the value of the product.

She eventually found Ivy Ventures, a firm that invested a modest $600,000 to get Sproxxy off the ground. The remaining $500,000 came from industry angels and Techstars. Samba’s goal is to raise a total of $1.8 million, and she is on her way there, in talks with investors, including Ivy, about the remaining money to fill out the round.

Quick commerce is making fast inroads in India

Image Credits: Niharika Kulkarni / NurPhoto / Getty Images

Even as quick commerce is slowly fading in many markets and several heavily funded startups have folded in the past two years, India is emerging as a striking outlier where the model — of delivering items to customers in 10 to 20 minutes — appears to be working.

India’s quick commerce market grew a staggering tenfold between 2021 and 2023, fueled by the sector’s ability to cater to the distinct needs of urban consumers seeking convenience for unplanned, small-ticket purchases. Despite this rapid expansion, however, quick commerce has captured only a modest 7% of a total addressable market (TAM) estimated at $45 billion, one that surpasses food delivery, according to JM Financial.

The quick commerce players — Zomato-owned Blinkit, Swiggy’s Instamart and YC Continuity-backed Zepto — can reach an estimated 25 million households, which are likely to spend an average of 4,000 to 5,000 rupees ($48 to $60) per month, according to Bank of America.

The top players are expected to expand their reach to 45 to 55 cities within the next 3 to 5 years, up from the current 25 cities, BofA added. Regular customers of quick commerce platforms typically order three to four times per month, with retention rates as high as 60% to 65%. Top users, however, make even more frequent transactions, ranging from 30 to 40 times per month, BofA analysts wrote in a note Monday.

“Quick commerce model had its own challenges in Europe and USA but in India, especially in top markets, product-market fit development was led by users liking the experience when things are delivered faster to them at doorstep,” the analysts wrote. “These users don’t want to go back to local corner stores and spend 10 to 15 minutes/fuel extra. The usage started from top cities like Bengaluru, Delhi-NCR, Kolkata etc and then has moved to even smaller cities like Indore, Pune, Rajkot etc.”

Zomato’s Blinkit leads the quick commerce market in India, having cornered as much as 46% of the market share by GMV in the quarter that ended in December, according to a separate analysis.

Swiggy’s Instamart follows with a 27% share; newcomer Zepto has quickly gained ground, securing 21% of the market; and Bigbasket’s BB Now trails with a 7% share, brokerage firm JM Financial said. Reliance Retail–backed Dunzo, which pioneered the quick commerce model in India, has virtually ceded its entire market share to competition.

“With more than 10 active players, the space was very competitive a couple of years back,” JM Financial wrote of the quick commerce market in a recent note. “It appeared that an intense phase of multi-year cash-burn would soon follow. However, contrary to expectations, several players including some well-funded ones folded early in their endeavour. While some faced funding challenges, a few others were affected by structural issues such as lack of product market fit, inability to solve the hyperlocal complexity, inability to build a robust end-to-end supply chain and … failure to create a strong brand recall.”

As quick commerce players vie for a larger slice of the market, the success of their ventures hinges on the development of efficient supply chains. Companies are making substantial investments in dark store operations, streamlining inventory management and establishing direct partnerships with FMCG manufacturers and farmers. By circumventing traditional distribution channels, these firms aim to enhance product quality, expedite delivery times, and boost overall operational efficiency, industry analysts said.

Dark stores, the backbone of quick commerce operations, have significantly expanded their product offerings, now carrying over 6,000 SKUs per store, a substantial increase from the 2,000 to 4,000 SKUs they housed just a few years ago. In contrast, traditional neighborhood kirana stores, which are ubiquitous across Indian cities, towns, and villages, typically stock between 1,000 and 1,500 items, according to JM Financial. Large modern retail stores, on the other hand, offer a much wider selection, with 15,000 to 20,000 items available to customers.

There has also been a noticeable surge in average order value among quick commerce players, which has risen to as much as 650 rupees ($7.80) from the previous range of 350 to 400 rupees. This increase sets quick commerce apart from kirana stores, where customers typically spend between 100 and 200 rupees per transaction.

Image Credits: JM Financial

While the convenience offered by quick commerce is undeniable, profitability remains a concern for investors. Blinkit — which Zomato acquired in 2022 — aims to achieve adjusted EBITDA break-even by the first quarter of fiscal year 2025, while Zepto has set its sights on EBITDA profitability in 2024. Swiggy’s Instamart is also focusing on profitability, with the parent company indicating that the peak of investments in the business is now behind them. Swiggy turned its food delivery business profitable last year.

Many of these players are trying to improve their margins by increasingly looking beyond the grocery category. All three top players today sell consumer electronics, a category that makes up about half of the sales on Flipkart and Amazon India by GMV, according to people familiar with the matter. (So it should not come as a surprise that Flipkart is weighing entering the quick commerce market as early as May this year.)

Additionally, advertising revenues currently account for around 3% to 3.5% of quick commerce platforms’ total revenue and could easily reach 4.5%, primarily driven by direct-to-consumer platforms, BofA said. These platforms are also exploring private label strategies in certain categories, it added.

Consumer survey by JM Financial. Image Credits: JM Financial

10 years in the making, retro game emulator Delta is now No. 1 on the iOS charts

Play Pokémon and other Game Boy games on your iPhone

Image Credits: TechCrunch

Video game emulator Delta’s decade-long struggle against the iOS App Store began with a school-issued TI-84 calculator.

When Riley Testut was a sophomore in high school, he showed his friends how to load illicit software onto their bulky graphing calculators. Such behavior was generally discouraged at school, but he wasn’t plotting to cheat on a test. He was simply traversing the Viridian Forest, surfing across the Kanto seas and collecting gym badges.

“The teachers didn’t think we were playing Pokémon,” Testut told TechCrunch. “They were just like, ‘Why is everyone so into their calculators?’”

By 2014, when Testut was a high school senior, the only way to install a retro video game emulator on an iOS device was to jailbreak it. But Testut didn’t want to damage his then-state-of-the-art iPhone 4. So, he spent months building an app that would let him play GBA games on his phone without voiding its warranty. That app became GBA4iOS.

Millions of people swarmed to GBA4iOS, reveling in the glorious experience of playing Game Boy titles on a palm-sized phone. Even Time Magazine wrote about it. But GBA4iOS was too successful for its own good and, soon, Testut had to face an adversary more formidable than Team Rocket: Eight months after the app’s release, Apple patched the loophole that made GBA4iOS possible, quashing his app in the process.

“For a brief, glorious, few months, you had experienced what it was like to make an app that was used by millions — despite it being one that by all accounts ‘shouldn’t exist,’” Testut explained in a recent blog post, reflecting on his experience over the last decade. “You knew you were living on borrowed time, but it just felt wrong that an app this popular can never exist on iOS.”

“The Pokémon broke free!”

Video game emulation is complex to pull off, but it’s simple for users: You download an emulator — usually open source, like ePSXe for PlayStation titles, or OpenEmu, which can emulate a bunch of consoles — and then get games to play on it.
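
Under the hood, an emulator is essentially a software re-creation of the original console's CPU: fetch an instruction from the game file, decode it, execute it against simulated registers and memory, and repeat. A real Game Boy core implements hundreds of opcodes plus graphics, sound and cycle-accurate timing; the toy sketch below, with two invented opcodes, just shows the shape of that loop:

```python
# A toy fetch-decode-execute loop (the opcodes here are invented for
# illustration and do not match any real console's instruction set).

def run(rom: bytes, max_cycles: int = 10) -> dict:
    pc = 0                      # program counter: where we are in the ROM
    registers = {"A": 0}        # a single emulated CPU register
    for _ in range(max_cycles):
        if pc >= len(rom):
            break
        opcode = rom[pc]        # fetch
        if opcode == 0x01:      # decode/execute: LOAD next byte into A
            registers["A"] = rom[pc + 1]
            pc += 2
        elif opcode == 0x02:    # ADD next byte to A (wrapping at 8 bits)
            registers["A"] = (registers["A"] + rom[pc + 1]) & 0xFF
            pc += 2
        else:                   # unknown opcode: halt
            break
    return registers

print(run(bytes([0x01, 0x05, 0x02, 0x03])))  # LOAD 5, ADD 3 -> {'A': 8}
```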

But that’s where the issue lies: Finding a software copy of games, usually housed in a .ROM file, to use with these emulators is not as easy as buying games on the Nintendo eShop. You can buy hardware to legally extract the game file from old video game cartridges or discs that you own, but an easier way to play really old games is to just download their .ROM files for free from the internet — that’s basically piracy.

Downloading an app like Delta has never been illegal, but downloading the game files you need to play can be.

Given his prior experience, Testut was shocked when Apple changed its rules around emulators a few weeks ago. He uploaded Delta, a more refined version of GBA4iOS, and suddenly, he became the developer behind the No. 1 app on the App Store. Two weeks after release, Delta is still topping the charts of entertainment apps and holds the second spot on the free apps charts. According to app intelligence company Appfigures, Delta has been downloaded about 3.8 million times in two weeks.

“It’s surreal how good the reception has been, and how many people are playing it,” he said. “This is the app I’ve been working on for 10 years.”

Testut hadn’t just been a sitting Psyduck since GBA4iOS faded away a decade ago. He went to the University of Southern California to study computer science. Soon after, he happened to attend a Super Smash Bros. players meetup, where he crossed paths with Shane Gill, an engineering student who has now been his roommate for nine years and his business partner for five. They even have the same birthday.

Of course, Gill had also used GBA4iOS in high school, and he was excited to find out that his new friend was behind the app. He shared Testut’s drive to give app developers the freedom to access an audience without Apple as a middleman.

“There are more people like Riley and just so many developers that make these really cool things,” Gill told TechCrunch. “And just because it’s an iPhone, they can’t share it the same way.”

Emboldened by his experiences as a teenage developer, Testut teamed up with Gill in 2019 and launched AltStore, an app store designed for sideloading apps onto iOS devices. For most of its existence, AltStore was only usable on Windows and macOS, but due to changing regulations in the European Union, it now has a legitimate way to become available on iPhones, as Apple was forced to allow iOS users in the EU to download apps from outside of the App Store.

So why did Apple just change its stance on emulator apps so suddenly? Apple did not respond to TechCrunch’s request for comment. It could have something to do with increased regulatory pressure on Apple, as it was recently sued by the U.S. Department of Justice over antitrust concerns. And according to Testut, it’s a bit too convenient that just as AltStore launched on iOS in the EU, Apple made a small carve-out in its rules to allow Delta to exist.

“Even if the DOJ hadn’t started this [antitrust lawsuit], I think Apple would have done the same thing of allowing emulators in the App Store worldwide,” Testut said. “They just couldn’t have that narrative that the coolest app on iPhone is only in Europe, thanks to European regulators.”

Running an app business without a middleman

When Gill joined Testut to work on AltStore, he encouraged Testut to set up a Patreon so he could stop taking odd jobs to pay the bills. Now, AltStore’s Patreon earns over $13,000 per month (up from about $10,000 at the beginning of April). In exchange for their monthly contributions, AltStore’s patrons get access to early app betas, like Delta’s test of iPad and SEGA Genesis support, as well as access to a community Discord.

Delta is a free app, so these millions of app downloads aren’t lining the devs’ pockets. Still, Testut and Gill don’t plan to change their monetization model.

“In the App Store, you don’t get this. You don’t have this relationship with your customers. … It’s way more bureaucratic and not personal,” Testut said. “I’m very excited to show that consumers can have a close relationship with developers. I think it works better that way, because we can have Delta be completely free without any paywalls in it, and people can still get access to cool new features early by just donating to us.”

Testut’s vision is pretty emblematic of the game emulation community. At a time when it’s difficult to truly own any of the software you subscribe to or use, no matter how much money you’ve paid over the years, the effort here isn’t just about reliving childhood gaming memories. It’s archival.

Often, the only way to preserve a 40-year-old game is to rip the software from the old cartridge and build an emulator that can run it, and that’s true for more software than just games.

“This is art that existed 40 years ago. Developers don’t own the IP anymore, and there’s no way to share it with people,” Gill said. “So unless somebody puts it out in some form, it’ll just be lost to time. That’s something that I think is a bit tragic.”

Thanks to Testut and Gill, it’s never been easier for people to play retro video games without stepping outside the bounds of what Apple permits on iPhones. And it’s all because the company just slightly tweaked its developer guidelines. If just one change to App Store policy can unlock a new No. 1 app, what else have we missed out on?

“This was way bigger than we could have ever hoped for, honestly,” said Testut. “Being able to put Delta in the App Store has just made our message so obvious. We’re saying, ‘Hey, we’re trying to make apps that haven’t been able to exist before,’ and then the second Apple allows it, we’re in the App Store and we’re the number one app.”
