Anthropic claims its latest model is best-in-class

Anthropic Claude logo

Image Credits: Anthropic

OpenAI rival Anthropic is releasing a powerful new generative AI model called Claude 3.5 Sonnet. But it’s more an incremental step than a monumental leap forward.

Claude 3.5 Sonnet can analyze both text and images as well as generate text, and it’s Anthropic’s best-performing model yet — at least on paper. Across several AI benchmarks for reading, coding, math and vision, Claude 3.5 Sonnet outperforms the model it’s replacing, Claude 3 Sonnet, and beats Anthropic’s previous flagship model Claude 3 Opus.

Benchmarks aren’t necessarily the most useful measure of AI progress, in part because many of them test for esoteric edge cases that aren’t applicable to the average person, like answering health exam questions. But for what it’s worth, Claude 3.5 Sonnet just barely bests rival leading models, including OpenAI’s recently launched GPT-4o, on some of the benchmarks Anthropic tested it against.

Alongside the new model, Anthropic is releasing what it’s calling Artifacts, a workspace where users can edit and add to content — e.g. code and documents — generated by Anthropic’s models. Currently in preview, Artifacts will gain new features, like ways to collaborate with larger teams and store knowledge bases, in the near future, Anthropic says.

Focus on efficiency

Claude 3.5 Sonnet is a bit more performant than Claude 3 Opus, and Anthropic says that the model better understands nuanced and complex instructions, in addition to concepts like humor. (AI is notoriously unfunny, though.) But perhaps more importantly for devs building apps with Claude that require prompt responses (e.g. customer service chatbots), Claude 3.5 Sonnet is faster. It’s around twice the speed of Claude 3 Opus, Anthropic claims.

Vision — analyzing photos — is one area where Claude 3.5 Sonnet greatly improves over 3 Opus, according to Anthropic. Claude 3.5 Sonnet can interpret charts and graphs more accurately and transcribe text from “imperfect” images, such as pics with distortions and visual artifacts.

Michael Gerstenhaber, product lead at Anthropic, says that the improvements are the result of architectural tweaks and new training data, including AI-generated data. Which data specifically? Gerstenhaber wouldn’t disclose, but he implied that Claude 3.5 Sonnet draws much of its strength from these training sets.

Anthropic Claude 3.5 Sonnet
Image Credits: Anthropic

“What matters to [businesses] is whether or not AI is helping them meet their business needs, not whether or not AI is competitive on a benchmark,” Gerstenhaber told TechCrunch. “And from that perspective, I believe Claude 3.5 Sonnet is going to be a step function ahead of anything else that we have available — and also ahead of anything else in the industry.”

The secrecy around training data could be for competitive reasons. But it could also be to shield Anthropic from legal challenges — in particular challenges pertaining to fair use. The courts have yet to decide whether vendors like Anthropic and its competitors, like OpenAI, Google, Amazon and so on, have a right to train on public data, including copyrighted data, without compensating or crediting the creators of that data.

So, all we know is that Claude 3.5 Sonnet was trained on lots of text and images, like Anthropic’s previous models, plus feedback from human testers to try to “align” the model with users’ intentions, hopefully preventing it from spouting toxic or otherwise problematic text.

Anthropic Claude 3.5 Sonnet
Image Credits: Anthropic

What else do we know? Well, Claude 3.5 Sonnet’s context window — the amount of text that the model can analyze before generating new text — is 200,000 tokens, the same as Claude 3 Sonnet. Tokens are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic”; 200,000 tokens is equivalent to about 150,000 words.
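To put that ratio in perspective, here's a quick back-of-the-envelope sketch in Python. The words-per-token figure is simply the ratio implied by the numbers above; real tokenizers vary by language and content, so treat it as an approximation.

```python
# Rough estimate of how many English words fit in a token budget, assuming
# ~0.75 words per token (the ratio implied above: 200,000 tokens ~ 150,000 words).
WORDS_PER_TOKEN = 150_000 / 200_000  # 0.75

def approx_words(context_window_tokens: int) -> int:
    """Estimate how many words a given token budget covers."""
    return int(context_window_tokens * WORDS_PER_TOKEN)

print(approx_words(200_000))  # -> 150000, Claude 3.5 Sonnet's full window
```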

And we know that Claude 3.5 Sonnet is available today. Free users of Anthropic’s web client and the Claude iOS app can access it at no charge; subscribers to Anthropic’s paid plans Claude Pro and Claude Team get 5x higher rate limits. Claude 3.5 Sonnet is also live on Anthropic’s API and managed platforms like Amazon Bedrock and Google Cloud’s Vertex AI.
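For developers hitting the API directly, a minimal call looks something like the sketch below. It assumes the anthropic Python SDK and the launch-day model identifier claude-3-5-sonnet-20240620; consult Anthropic's docs for current model names and parameters.

```python
# Minimal sketch of a Claude 3.5 Sonnet request via Anthropic's Messages API.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY environment variable.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # launch-day model ID; subject to change
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize the key risks in this contract: ..."}],
)
print(message.content[0].text)
```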

“Claude 3.5 Sonnet is really a step change in intelligence without sacrificing speed, and it sets us up for future releases along the entire Claude model family,” Gerstenhaber said.

Claude 3.5 Sonnet also drives Artifacts, which pops up a dedicated window in the Claude web client when a user asks the model to generate content like code snippets, text documents or website designs. Gerstenhaber explains: “Artifacts are the model output that puts generated content to the side and allows you, as a user, to iterate on that content. Let’s say you want to generate code — the artifact will be put in the UI, and then you can talk with Claude and iterate on the document to improve it so you can run the code.”

The bigger picture

So what’s the significance of Claude 3.5 Sonnet in the broader context of Anthropic — and the AI ecosystem, for that matter?

Claude 3.5 Sonnet shows that incremental progress is the extent of what we can expect right now on the model front, barring a major research breakthrough. The past few months have seen flagship releases from Google (Gemini 1.5 Pro) and OpenAI (GPT-4o) that move the needle marginally in terms of benchmark and qualitative performance. But there hasn’t been a leap matching the one from GPT-3 to GPT-4 in quite some time, owing to the rigidity of today’s model architectures and the immense compute they require to train.

As generative AI vendors turn their attention to data curation and licensing in lieu of promising new scalable architectures, there are signs investors are becoming wary of the longer-than-anticipated path to ROI for generative AI. Anthropic is somewhat inoculated from this pressure, being in the enviable position of Amazon’s (and to a lesser extent Google’s) insurance against OpenAI. But the company’s revenue, forecasted to reach just under $1 billion by year-end 2024, is a fraction of OpenAI’s — and I’m sure Anthropic’s backers don’t let it forget that fact.

Despite a growing customer base that includes household brands such as Bridgewater, Brave, Slack and DuckDuckGo, Anthropic still lacks a certain enterprise cachet. Tellingly, it was OpenAI — not Anthropic — with which PwC recently partnered to resell generative AI offerings to the enterprise.

So Anthropic is taking a strategic, and well-trodden, approach to making inroads, investing development time into products like Claude 3.5 Sonnet to deliver slightly better performance at commodity prices. Claude 3.5 Sonnet is priced the same as Claude 3 Sonnet: $3 per million tokens fed into the model and $15 per million tokens generated by the model.
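At those rates, the per-request math stays small for typical chat workloads. A simple illustration, with hypothetical token counts:

```python
# Illustrative cost estimate at the published Claude 3.5 Sonnet rates:
# $3 per million input tokens, $15 per million output tokens.
INPUT_PRICE_PER_M = 3.00    # USD per 1M tokens fed into the model
OUTPUT_PRICE_PER_M = 15.00  # USD per 1M tokens generated by the model

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single API call at the rates above."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (
        output_tokens / 1_000_000
    ) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt and a 500-token reply
print(f"${request_cost(2_000, 500):.4f}")  # -> $0.0135
```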

Gerstenhaber spoke to this in our conversation. “When you’re building an application, the end user shouldn’t have to know which model is being used or how an engineer optimized for their experience,” he said, “but the engineer could have the tools available to optimize for that experience along the vectors that need to be optimized, and cost is certainly one of them.”

Claude 3.5 Sonnet doesn’t solve the hallucinations problem. It almost certainly makes mistakes. But it might just be attractive enough to get developers and enterprises to switch to Anthropic’s platform. And at the end of the day, that’s what matters to Anthropic.

Toward that same end, Anthropic has doubled down on tooling like its experimental steering AI, which lets developers “steer” its models’ internal features; integrations to let its models take actions within apps; and tools built on top of its models such as the aforementioned Artifacts experience. It’s also hired an Instagram co-founder as head of product. And it’s expanded the availability of its products, most recently bringing Claude to Europe and establishing offices in London and Dublin.

Anthropic, all told, seems to have come around to the idea that building an ecosystem around models — not simply models in isolation — is the key to retaining customers as the capabilities gap between models narrows.

Still, Gerstenhaber insisted that bigger and better models — like Claude 3.5 Opus — are on the near horizon, with features such as web search and the ability to remember preferences in tow.

“I haven’t seen deep learning hit a wall yet, and I’ll leave it to researchers to speculate about the wall, but I think it’s a little bit early to be coming to conclusions on that, especially if you look at the pace of innovation,” he said. “There’s very rapid development and very rapid innovation, and I have no reason to believe that it’s going to slow down.”

We’ll see.

Meta's 'pay or consent' model fails EU competition rules, Commission finds

Mark Zuckerberg, CEO of Meta testifies before the Senate Judiciary Committee.

Image Credits: Alex Wong / Staff / Getty Images

An investigation conducted by the European Commission has found that Meta’s “pay or consent” offer to Facebook and Instagram users in Europe does not comply with the bloc’s Digital Markets Act (DMA), according to preliminary findings reported by the regulator on Monday.

The Commission wrote in a press release that the binary choice Meta offers “forces users to consent to the combination of their personal data and fails to provide them a less personalised but equivalent version of Meta’s social networks.”

Failure to abide by the ex-ante market contestability regulation, which has applied to Meta and other so-called “gatekeepers” since March 7, could be extremely costly for the adtech giant. Penalties for confirmed breaches can reach up to 10% of global annual turnover, and even 20% for repeat offences.

More saliently, Meta could finally be forced to abandon a business model that demands users agree to surveillance advertising as the entry “price” for using its social networks.

The EU in March opened a formal DMA investigation into Meta’s “pay or consent” offer, following months of lobbying from privacy advocacy and consumer protection groups. The groups also argued that a subscription to not see ads does not comply with the bloc’s data protection or consumer protection rules either.

Back in March, the Commission said it was concerned Meta’s binary choice may not provide “a real alternative” for users who do not consent to its tracking. Meta was essentially asking users to either agree to being tracked so it could continue serving targeted advertising, or fork out almost €13 per month (per account) to access ad-free versions of the services.

The EU’s goal with the DMA is to level the playing field by targeting various advantages that gatekeepers can exploit using their dominance.

In Meta’s case, the Commission thinks the company’s dominant position in social networking lets it extract more data from users to profile them, which gives its ad business an unfair advantage versus its competitors. To reset the dynamic, the EC introduced a requirement in the DMA that gatekeepers must obtain people’s permission before they can be tracked for ads.

The regulator’s case against Meta contends the adtech giant is failing to provide people with a free and fair choice to deny tracking.

In a briefing with journalists ahead of the announcement, senior Commission officials emphasized that as long as Meta’s social networking services are free, the equivalent versions it offers to users who do not wish to consent to tracking must also be free.

The relevant DMA article here is Article 5(2), which requires gatekeepers to seek users’ consent for combining their personal data between designated core platform services (CPS) and other services. Facebook, Instagram and Meta’s ads business have been designated as CPS since September 2023, so the company needs users’ permission to track and profile their activity and run “personalized” ads.

Users who refuse Meta’s tracking have the right to access a less-personalized but equivalent alternative, and the Commission’s preliminary view after around three months of investigations is that Meta is breaching this requirement, as a paid subscription is not a valid equivalent to free access.

The regulation also stipulates gatekeepers cannot make use of a service or certain functionalities conditional on users’ consent.

Meta spokesman Matthew Pollard responded to the EU’s findings with a statement attributed to the company, repeating its defense of the approach by citing an earlier EU court judgment: “Subscription for no ads follows the direction of the highest court in Europe and complies with the DMA. We look forward to further constructive dialogue with the European Commission to bring this investigation to a close.”

When asked about this defense, senior Commission officials pointed out that in the judgment Meta refers to, the Court of Justice caveated the suggestion that a paid version of a service may be offered as an alternative to tracking ads, saying an “appropriate fee” could be charged only “if necessary.”

In the DMA context, the bloc’s enforcers say a gatekeeper would therefore have to argue why a fee is necessary. The EU pointed out that Meta could offer an alternative service with ads that do not rely on any personal data for targeting — such as contextual advertising.

Meta has never explained why it has not offered users a free, contextual ads option.

The EU looks to be on a road to forcing Meta to provide a non-binary, privacy-safe choice in the coming months.

“To ensure compliance with the DMA, users who do not consent should still get access to an equivalent service which uses less of their personal data, in this case for the personalisation of advertising,” the Commission noted in the press release.

Commission officials noted that Meta could still offer a subscription option, but any paid choice would need to be an additional offer (i.e. a third choice) on top of a free equivalent that does not demand users consent to being tracked.

The EU’s investigation isn’t over yet, and Meta will have a chance to respond formally to the preliminary findings. But there’s a limited window for things to play out here: The bloc has set itself a 12-month timeline to complete the probe, which suggests it needs to finish the job by or before March 2025.

BEUC, the European consumer organization, welcomed the preliminary findings, urging the EU to push through to speedy enforcement.

“It’s good news that the Commission is taking enforcement action based on the Digital Markets Act against Meta’s pay-or-consent model. It comes on top of the complaints against Meta’s model for breaches of consumer law and data protection law, which consumer organisations have raised in the last few months. We now urge Meta to comply with laws meant to protect consumers,” said Agustin Reyna, BEUC’s director general, in a statement.

A new Chinese video-generating model appears to be censoring politically sensitive topics

Image Credits: Photo by VCG/VCG via Getty Images

A powerful new video-generating AI model became widely available today — but there’s a catch: The model appears to be censoring topics deemed too politically sensitive by the government in its country of origin, China.

The model, Kling, developed by Beijing-based company Kuaishou, launched in waitlisted access earlier in the year for users with a Chinese phone number. Today, it rolled out for anyone willing to provide their email. After signing up, users can enter prompts to have the model generate five-second videos of what they’ve described.

Kling works pretty much as advertised. Its 720p videos, which take a minute or two to generate, don’t deviate too far from the prompts. And Kling appears to simulate physics, like the rustling of leaves and flowing water, about as well as video-generating models like AI startup Runway’s Gen-3 and OpenAI’s Sora.

But Kling outright won’t generate clips about certain subjects. Prompts like “Democracy in China,” “Chinese President Xi Jinping walking down the street” and “Tiananmen Square protests” yield a nonspecific error message.

Kling AI
Image Credits: Kuaishou

The filtering appears to be happening only at the prompt level. Kling supports animating still images, and it’ll uncomplainingly generate a video of a portrait of Xi Jinping, for example, as long as the accompanying prompt doesn’t mention Xi by name (e.g., “This man giving a speech”).
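Kuaishou hasn’t described its moderation pipeline, but the behavior is consistent with a simple blocklist applied to the text prompt and no equivalent check on uploaded images. The sketch below is purely hypothetical and only illustrates that general pattern:

```python
# Hypothetical illustration of prompt-level filtering (not Kuaishou's code):
# the text prompt is screened against a blocklist before generation, while an
# uploaded reference image passes through unchecked.
BLOCKED_TERMS = {"tiananmen", "xi jinping"}  # illustrative entries only

def generate_video(prompt: str, image_bytes: bytes | None):
    ...  # stand-in for the actual model inference

def screen_request(prompt: str, image_bytes: bytes | None = None):
    if any(term in prompt.lower() for term in BLOCKED_TERMS):
        raise ValueError("Generation failed")  # the nonspecific error users see
    # No content check on image_bytes: a still image of a blocked subject gets
    # animated as long as the prompt avoids the blocked terms.
    return generate_video(prompt, image_bytes)
```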

We’ve reached out to Kuaishou for comment.

Kling AI
Image Credits: Kuaishou

Kling’s curious behavior is likely the result of intense political pressure from the Chinese government on generative AI projects in the region.

Earlier this month, the Financial Times reported that AI models in China will be tested by China’s leading internet regulator, the Cyberspace Administration of China (CAC), to ensure that their responses on sensitive topics “embody core socialist values.” Models are to be benchmarked by CAC officials for their responses to a variety of queries, per the Financial Times report — many related to Xi and criticism of the Communist Party.

Reportedly, the CAC has gone so far as to propose a blacklist of sources that can’t be used to train AI models. Companies submitting models for review must prepare tens of thousands of questions designed to test whether the models produce “safe” answers.

The result is AI systems that decline to respond on topics that might raise the ire of Chinese regulators. Last year, the BBC found that Ernie, Chinese company Baidu’s flagship AI chatbot model, demurred and deflected when asked questions that might be perceived as politically controversial, like “Is Xinjiang a good place?” or “Is Tibet a good place?”

The draconian policies threaten to slow China’s AI advances. Not only do they require scouring data to remove politically sensitive info, but they also necessitate investing an enormous amount of dev time in creating ideological guardrails — guardrails that might still fail, as Kling exemplifies.

From a user perspective, China’s AI regulations are already leading to two classes of models: some hamstrung by intensive filtering and others decidedly less so. Is that really a good thing for the broader AI ecosystem?

AI music startup Suno claims training model on copyrighted music is 'fair use'

3D headphones with sound waves on dark background.

Image Credits: maxkabakov / Getty Images

Following the recent lawsuit filed by the Recording Industry Association of America (RIAA) against music generation startups Udio and Suno, Suno admitted in a court filing on Thursday that it did, in fact, train its AI model using copyrighted songs. But it claimed that doing so was legal under the fair-use doctrine.

The RIAA filed the lawsuit against Udio and Suno on June 24, alleging that the companies trained their models using copyrighted music. While Suno’s investors had previously hinted that the startup didn’t have permission from the music labels to use the copyrighted material, the admission had never been stated as directly as it is in today’s filing.

“It is no secret that the tens of millions of recordings that Suno’s model was trained on presumably included recordings whose rights are owned by the Plaintiffs in this case,” the filing states.

Suno CEO and co-founder Mikey Shulman continued in a blog post published the same day as the legal filing, writing: “We train our models on medium- and high-quality music we can find on the open internet… Much of the open internet indeed contains copyrighted materials, and some of it is owned by major record labels.”

Shulman also argued that training its AI model from data on the “open internet” is no different than a “kid writing their own rock songs after listening to the genre.” 

“Learning is not infringing. It never has been, and it is not now,” Shulman added. 

The RIAA clapped back with this response: “It’s a major concession of facts they spent months trying to hide and acknowledged only when forced by a lawsuit. Their industrial scale infringement does not qualify as ‘fair use’. There’s nothing fair about stealing an artist’s life’s work, extracting its core value, and repackaging it to compete directly with the originals…Their vision of the ‘future of music’ is apparently one in which fans will no longer enjoy music by their favorite artists because those artists can no longer earn a living.”

The question of fair use was never simple, but with AI model training even established doctrine may not be applicable. The outcome of this case, still in its early stages, will likely establish an influential precedent that could define the future of more than just the two startups named in it.

Piramidal's foundation model for brain waves could supercharge EEGs

Image Credits: Piramidal

AI models are being applied to every dataset under the sun but are inconsistent in their outcomes. This is as true in the medical world as anywhere else, but a startup called Piramidal believes it has a sure thing with a foundational model for analyzing brain scan data.

Co-founders Dimitris Sakellariou and Kris Pahuja have observed that electroencephalography (EEG) technology, while used in practically every hospital, is fragmented among many types of machines and requires specialized knowledge to interpret. A piece of software that can consistently flag worrisome patterns, regardless of time, location, or equipment type, could improve outcomes for folks with brain disorders, while taking some of the load off overworked nurses and doctors.

“In the neural ICU, there are nurses actually monitoring the patient and looking for signs on the EEG. But sometimes they have to leave the room, and these are acute conditions,” said Pahuja. An abnormal reading or alarm could mean an epileptic episode, or a stroke, or something else — nurses don’t have that training, and even specialist doctors may recognize one but not the other.

The two started the company after working for years on the feasibility of computational tools in neurology. They found there is absolutely a way to automate analysis of EEG data that is beneficial for care but that there’s no simple way to deploy that technology where it’s needed.

“I have experience with this, and I mean I’ve been sitting next to neurologists in the operating room to understand exactly why these brain waves are useful, and how we can build computational systems to identify them,” said Sakellariou. “They’re helpful in many contexts, but every time you use an EEG device, you have to rebuild the whole system for that specific problem. You need to get new data, you need to have humans annotate the data from scratch.”

That would be hard enough if every EEG system, hospital IT setup, and data format were the same, but they vary widely in the most basic elements, like how many electrodes are on the machine and where they’re placed.

Co-founders Dimitris Sakellariou (left) and Kris Pahuja.
Image Credits: Piramidal

Piramidal’s founders believe — and claim to know, though this culmination of their work is not yet published — that a foundational model for EEG readings could make lifesaving brain wave pattern detection work out-of-the-box rather than after months of studies.

To be clear, it’s not meant to be a do-it-all medical platform — a closer analogue may be Meta’s Llama series of (relatively) open models, which foot the initial expense of creating the foundational capability of language understanding. Whether you build a customer service chatbot or a digital friend is up to you, but neither works without the fundamental ability to understand human language.

But AI models aren’t limited to language — they can be trained to work in fluid dynamics, music, chemistry, and more. For Piramidal, the “language” is brain activity, as read by EEGs, and the resulting model would notionally be capable of understanding and interpreting signals from any setup, any number of electrodes or model of machine, and any patient.
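Piramidal hasn’t published its architecture, so the following is only a toy sketch of one way “any number of electrodes” could work in practice: embed each channel’s signal independently, then pool across channels so the same network accepts any montage. The PyTorch module and its dimensions are invented for illustration.

```python
# Toy sketch (not Piramidal's model) of a channel-agnostic EEG encoder.
import torch
import torch.nn as nn

class ChannelAgnosticEEGEncoder(nn.Module):
    def __init__(self, window_samples: int = 256, d_model: int = 128):
        super().__init__()
        # Shared embedding applied to each electrode's window of samples
        self.per_channel = nn.Sequential(
            nn.Linear(window_samples, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, eeg: torch.Tensor) -> torch.Tensor:
        # eeg: (batch, n_channels, window_samples); n_channels may vary by device
        per_channel_emb = self.per_channel(eeg)  # (batch, n_channels, d_model)
        return per_channel_emb.mean(dim=1)       # pool over channels

encoder = ChannelAgnosticEEGEncoder()
print(encoder(torch.randn(2, 19, 256)).shape)  # 19-electrode clinical cap
print(encoder(torch.randn(2, 8, 256)).shape)   # 8-electrode portable headset
```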

No one has yet built one — at least, not publicly.

Although they were careful not to overstate their current progress, Sakellariou and Pahuja did say, “We have built the foundational model, we have run our experiments on it, and now we are in the process of productionizing the code base so it is ready to be scaled to billions of parameters. It’s not about research — from day one it’s been about building the model.”

The first production version of this model will be deployed in hospitals early next year, Pahuja said. “We’re working on four pilots starting in Q1; all four of them will test in the ICU, and all four want to co-develop with us.” This will be a valuable proof of concept that the model works in the diverse circumstances presented by any care unit. (Of course, Piramidal’s tech will be over and above any monitoring the patients would normally be provided.)

The foundation model will still need to be fine-tuned for certain applications, work that Pahuja said they will do themselves at first; unlike many other AI companies, they don’t plan to build a foundation model and then rake in fees from API usage. But they were clear that it’s still incredibly valuable as is.

“There’s no world where a model trained from scratch will do better than a pretrained model like ours; having a warm start can only improve things,” Sakellariou said. “It’s still the biggest EEG model that has ever existed, infinitely larger than anything else out there.”

To move forward, Piramidal needs the two things essential to every AI company: money and data. The first they have a start on, with a $6 million seed round co-led by Adverb Ventures and Lionheart Ventures, with participation by Y Combinator and angel investors. That money will go toward compute costs (huge for training models) and staffing up.

As far as data goes, they have enough to get their first production model trained. “It turns out there’s a lot of open source data — but a lot of open source siloed data. So we’ve been in the process of aggregating and harmonizing that into a big integrated data store.”

The partnerships with the hospitals should provide valuable and voluminous training data, though — thousands of hours of it. This and other sources could help elevate the next version of the model beyond human capability.

Right now, Sakellariou said, “We can address confidently this set of defined patterns doctors look out for. But a bigger model will let us pick out patterns smaller than the human eye can consistently and empirically tell exist.”

That’s still a ways off, but superhuman capability is not a prerequisite to improving the quality of care. The ICU pilots should allow the tech to be evaluated and documented much more rigorously, both in scientific literature and likely in investors’ meeting rooms.

Building a viable pricing model for generative AI features could be challenging

Robot holds a green check mark and red x on a purple background.

Image Credits: tommy / Getty Images

In October, Box unveiled a new pricing approach for the company’s generative AI features. Instead of a flat rate, the company designed a unique consumption-based model.

Each user gets 20 credits per month, with each AI task costing a single credit, so the allotment covers up to 20 AI events. After that, people can dip into a shared company pool of 2,000 additional credits. If the customer exhausts that too, it’s time for a conversation with a salesperson about buying more credits.
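Mechanically, the scheme is simple to model. The sketch below is a hypothetical rendering of the rules as described, not Box’s implementation:

```python
# Hypothetical model of the credit scheme as described: 20 credits per user per
# month, one credit per AI task, then a shared pool of 2,000 company credits.
class CreditLedger:
    def __init__(self, user_monthly_credits: int = 20, company_pool: int = 2_000):
        self.user_monthly_credits = user_monthly_credits
        self.company_pool = company_pool
        self.user_used: dict[str, int] = {}

    def charge_task(self, user_id: str) -> str:
        used = self.user_used.get(user_id, 0)
        if used < self.user_monthly_credits:   # personal allotment first
            self.user_used[user_id] = used + 1
            return "charged: user credit"
        if self.company_pool > 0:              # then the shared company pool
            self.company_pool -= 1
            return "charged: company pool"
        return "exhausted: time to talk to sales"

ledger = CreditLedger()
print(ledger.charge_task("alice"))  # charged: user credit
```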

Box CEO Aaron Levie explained that this approach provides a way to charge based on usage with the understanding that some users would take advantage of the AI features more than others, while also accounting for the cost of using the OpenAI API, which the company is using for its underlying large language model.

Meanwhile, Microsoft has chosen a more traditional pricing model, announcing in November that it would charge $30 per user per month to use its Copilot features, over and above the cost of a normal monthly Office 365 subscription, which varies by customer.
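Which approach is cheaper for a customer comes down to usage volume. A quick break-even calculation, using the $30 seat price and an invented per-task consumption rate:

```python
# Break-even point between a flat $30/user/month seat price and a hypothetical
# consumption rate of $0.05 per AI task. Only the $30 figure comes from the article.
SEAT_PRICE = 30.00     # USD per user per month
PRICE_PER_TASK = 0.05  # hypothetical consumption rate

breakeven = SEAT_PRICE / PRICE_PER_TASK
print(breakeven)  # 600.0 tasks per user per month; lighter users favor consumption pricing
```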

It became clear throughout last year that enterprise software companies would be building generative AI features. At a Web Summit panel in November on generative AI’s impact on SaaS companies, Christine Spang, co-founder and CTO of communications API startup Nylas, and Manny Medina, CEO of sales enablement platform Outreach, spoke about the challenges SaaS companies face as they implement these features.

Spang says, for starters, that in spite of the hype, generative AI is clearly a big leap forward, and software companies need to look for ways to incorporate it into their products. “I’m not going to say it’s like 10 out of 10 where the hype meets the [current] reality, but I do think there is real value there and what’s really going to make the difference is how people take the technology and connect it to other systems, other apps and sort of drive real value in different use cases with it,” she said.

It’s also about finding a balance between providing the kind of features that customers are suddenly demanding, and figuring out a way to price it in a way that provides real customer value, yet allows the company to make money. “In reality, those of us who are bundling [generative AI features] need to repeatedly check back with our [large language model] provider, and that’s going to get expensive quickly. So until we create experiences that are 10x differentiated, and for which somebody wants to pay for it, it’s going to be challenging,” Medina said.

It’s worth noting that model makers like OpenAI are already announcing price cuts as they find ways to run models more efficiently, or cut prices on older products as new ones are announced. For example, in June, the company announced some new features that increase processing power, which provide more bang for the buck, while also lowering the cost of prior versions for developers who don’t require all the latest bells and whistles.

Spang says her company is already using a consumption model based on the number of connected email or calendar applications, and she expects to follow a similar approach as they add generative AI features.

“We already have the case where some people send a lot more messages, or they receive a lot more messages and I think it’s important to map [to a similar pricing model] that people understand, and then hopefully we can find a price point that kind of works through the median,” she said.

But Medina says a consumption model is harder to make work for an application company than for an API provider like Nylas. “I just don’t know that that’s an acceptable model in applications. When you’re a provider of Legos [like Nylas], it’s a different story, but for application providers, [it’s more difficult],” he said.

But it’s also not clear that companies will be willing to pay a flat rate like Microsoft’s $30 a month per user for Office 365, unless they can see real value from that additional cost. “The jury’s still out until somebody either lowers the cost and it makes it very accessible for the rest of us, or we figure out a way to monetize it,” Medina said.

Another big unknown is the compliance cost that could come with using this technology, which remains an open question for companies and their customers. “So if you start embedding some of these applications and the U.S. [or other government] passes a law where you have to disclose the list of ingredients of your AI, you’re not getting that from OpenAI, so that’s going to be difficult,” he said.

CIOs who control the company technology budget are taking a close look at this technology, but they are still trying to figure out if the extra cost being passed on to them will pay for itself in terms of higher employee productivity.

Sharon Mandell, CIO at Juniper Networks, says she is looking closely at the ROI on these features. “In 2024, we’re going to be testing the GenAI hype, because if those tools can produce the types of benefits that they say, then the ROI on those is high and may help us eliminate other things,” she said. So she and other CIOs are running pilots, moving cautiously and trying to find ways to measure whether there is truly a productivity increase to justify the increased cost.

Regardless, companies will continue to experiment with pricing models, while their customers are conducting pilots and proofs of concept. It seems like they both have something to gain here, but until we start to see more of these tools in production, it’s hard to know the real benefits to everyone involved.
