China's generative video race heats up

Image Credits: Tencent DynamiCrafter

On Monday, Tencent, the Chinese internet giant known for its video gaming empire and chat app WeChat, unveiled a new version of its open source video generation model DynamiCrafter on GitHub. It’s a reminder that some of China’s largest tech firms have been quietly ramping up efforts to make a dent in the text- and image-to-video space.

Like other generative video tools on the market, DynamiCrafter uses the diffusion method to turn captions and still images into seconds-long videos. Inspired by the natural phenomenon of diffusion in physics, diffusion models in machine learning can transform simple data into more complex and realistic data, similar to how particles move from one area of high concentration to another of low concentration.
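
The diffusion process described above is easy to see in miniature. The sketch below is a toy, numpy-only illustration of the forward (noising) half of the idea, not DynamiCrafter's actual code: data is pushed step by step toward pure Gaussian noise, mirroring the physical analogy, and a real model learns the reverse, denoising half.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_diffusion(x0, betas):
    """Progressively noise a sample x0; returns the whole trajectory.

    At each step t: x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise.
    After enough steps, x_t is approximately standard Gaussian noise
    regardless of the starting data -- the "diffusion" toward uniformity
    that the physics analogy refers to.
    """
    xs = [x0]
    for beta in betas:
        noise = rng.standard_normal(x0.shape)
        xs.append(np.sqrt(1 - beta) * xs[-1] + np.sqrt(beta) * noise)
    return xs

x0 = np.ones(1000)  # stand-in for an image's pixel values
trajectory = forward_diffusion(x0, betas=[0.02] * 300)
# Early steps stay close to the data; late steps are close to pure noise.
print(np.std(trajectory[0]), np.std(trajectory[-1]))
```

A generative model is trained to undo these steps one at a time, which is what lets it start from noise (plus a caption or still image as guidance) and walk backward to realistic video frames.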

The second generation of DynamiCrafter is churning out videos at a pixel resolution of 640 x 1024, an upgrade from its initial release in October that featured 320 x 512 videos. An academic paper published by the team behind DynamiCrafter notes that its technology differs from those of competitors in that it broadens the applicability of image animation techniques to “more general visual content.”

“The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance,” says the paper. “Traditional” techniques, in comparison, “mainly focus on animating natural scenes with stochastic dynamics (e.g. clouds and fluid) or domain-specific motions (e.g. human hair or body motions).”

In a demo (see below) that compares DynamiCrafter, Stable Video Diffusion (launched in November) and the recently hyped-up Pika Labs, the results of the Tencent model appear slightly more animated than the others’. Naturally, the chosen samples favor DynamiCrafter, and none of the models, after my initial few tries, leaves the impression that AI will soon be able to produce full-fledged movies.

Nonetheless, generative video carries high hopes as the next focal point in the AI race following the boom in generative text and images, so it’s no surprise that startups and tech incumbents are pouring resources into the field. China is no exception. Aside from Tencent, TikTok’s parent ByteDance, Baidu and Alibaba have each released their own video diffusion models.

Both ByteDance’s MagicVideo and Baidu’s UniVG have posted demos on GitHub, though neither appears to be available to the public yet. Like Tencent, Alibaba has made its video generation model VGen open source, a strategy that’s increasingly popular among Chinese tech firms hoping to reach the global developer community.

Fireworks.ai open source API puts generative AI in reach of any developer

Colorful fireworks going off over a city.

Image Credits: thianchai sitthikongsak / Getty Images

Just about everyone is trying to get a piece of the generative AI action these days. While the majority of the focus remains on model vendors like OpenAI, Anthropic and Cohere, or bigger companies like Microsoft, Meta, Google and Amazon, there are, in fact, a lot of startups trying to attack the generative AI problem in a variety of ways.

Fireworks.ai is one such startup. While lacking the brand name recognition of some of these other players, it boasts the largest open source model API with over 12,000 users, per the company. That kind of open source traction tends to attract investor attention, and the company has raised $25 million so far.

Fireworks co-founder and CEO Lin Qiao points out that her company isn’t training foundation models from scratch, but rather helping fine-tune other models to the particular needs of a business. “It can be either off the shelf, open source models or the models we tune or the models our customer can tune by themselves. All three varieties can be served through our inference engine API,” Qiao told TechCrunch.

Because it’s an API, developers can plug it into their applications, bring their model of choice trained on their data, and quickly add generative AI capabilities like question answering. Qiao says it’s fast, efficient and produces high-quality results.
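
Fireworks’ exact request format isn’t described here, so the endpoint, model name and field names below are hypothetical illustrations, but the “plug it into your application” workflow typically amounts to one HTTP call to an inference API with a prompt and a handful of generation settings.

```python
import json

# Hypothetical inference request -- the endpoint path, model identifier and
# field names are illustrative assumptions, not Fireworks' documented API.
API_URL = "https://api.example.com/v1/completions"

payload = {
    "model": "my-org/llama-7b-finetuned",  # a model the customer tuned on its own data
    "prompt": "Summarize our Q3 returns policy for a customer email.",
    "max_tokens": 256,
    "temperature": 0.2,  # low temperature for a factual, consistent tone
}

# In a real application you would POST this with your API key, e.g.:
#   requests.post(API_URL, json=payload, headers={"Authorization": f"Bearer {key}"})
print(json.dumps(payload, indent=2))
```

The application code stays this small because the hosting, scaling and model serving all live behind the API, which is the pitch for this class of service.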

Another advantage of Fireworks’ approach is that it allows companies to experiment with multiple models, something that’s important in a fast-changing market. “Our philosophy here is we want to empower users to iterate and experiment with multiple models and have effective tools to infuse their data into multiple models and test with a product,” she said.

Perhaps even more importantly, they keep costs down by limiting the model size to between 7 billion and 13 billion parameters, compared with the reported more than 1 trillion parameters in OpenAI’s GPT-4. While that limits the universe of words the large language model can understand, it enables developers to focus on much smaller, focused data sets designed to work with more limited business use cases.
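
The cost argument is easy to put in concrete terms with back-of-envelope arithmetic: at 16-bit precision each parameter occupies two bytes, so weight storage alone scales as sketched below (serving also needs memory for activations and caches, which this ignores).

```python
def fp16_gigabytes(params):
    """Approximate memory just to hold the weights at 16 bits (2 bytes) each."""
    return params * 2 / 1e9

# 7B and 13B models fit on one or two commodity GPUs; a trillion-parameter
# model needs a cluster, which is where the serving-cost gap comes from.
for name, params in [("7B", 7e9), ("13B", 13e9), ("1T", 1e12)]:
    print(f"{name}: ~{fp16_gigabytes(params):.0f} GB of weights")
```

That is roughly 14 GB and 26 GB for the small models versus about 2,000 GB at the trillion-parameter scale, a gap of more than two orders of magnitude before any other serving costs are counted.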

Qiao is uniquely qualified to build such a system, having previously led Meta’s AI platform development team, whose goal was to build a fast, scalable development engine to power AI across all of Meta’s products and services. She took that knowledge and created an API-based tool that puts the same kind of power within reach of any company, without requiring engineering resources on the scale of Meta’s.

The company raised $25 million in 2022 led by Benchmark, with participation from Sequoia Capital and unnamed angel investors.

Google injects generative AI into its cloud security tools

Image Credits: Google

At its annual Cloud Next conference in Las Vegas, Google on Tuesday introduced new cloud-based security products and services — in addition to updates to existing products and services — aimed at customers managing large, multi-tenant corporate networks.

Many of the announcements had to do with Gemini, Google’s flagship family of generative AI models.

For example, Google unveiled Gemini in Threat Intelligence, a new Gemini-powered component of the company’s Mandiant cybersecurity platform. Now in public preview, Gemini in Threat Intelligence can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise, as well as summarize open source intelligence reports from around the web.

“Gemini in Threat Intelligence now offers conversational search across Mandiant’s vast and growing repository of threat intelligence directly from frontline investigations,” Sunil Potti, GM of cloud security at Google, wrote in a blog post shared with TechCrunch. “Gemini will navigate users to the most relevant pages in the integrated platform for deeper investigation … Plus, [Google’s malware detection service] VirusTotal now automatically ingests OSINT reports, which Gemini summarizes directly in the platform.”

Elsewhere, Gemini can now assist with cybersecurity investigations in Chronicle, Google’s cybersecurity telemetry offering for cloud customers. Set to roll out by the end of the month, the new capability guides security analysts through their typical workflows, recommending actions based on the context of a security investigation, summarizing security event data and creating breach and exploit detection rules from a chatbot-like interface.

And in Security Command Center, Google’s enterprise cybersecurity and risk management suite, a new Gemini-driven feature lets security teams search for threats using natural language while providing summaries of misconfigurations, vulnerabilities and possible attack paths.

Rounding out the security updates was privileged access manager (in preview), a service that offers just-in-time, time-bound and approval-based access options designed to help mitigate risks tied to privileged access misuse. Google is also rolling out principal access boundary (also in preview), which lets admins restrict network root-level users so that those users can only access authorized resources within a specifically defined boundary.

Lastly, Autokey (in preview) aims to simplify creating and managing customer encryption keys for high-security use cases, while Audit Manager (also in preview) provides tools for Google Cloud customers in regulated industries to generate proof of compliance for their workloads and cloud-hosted data.

“Generative AI offers tremendous potential to tip the balance in favor of defenders,” Potti wrote in the blog post. “And we continue to infuse AI-driven capabilities into our products.”

Google isn’t the only company attempting to productize generative AI–powered security tooling. Microsoft last year launched a set of services that leverage generative AI to correlate data on attacks while prioritizing cybersecurity incidents. Startups, including Aim Security, are also jumping into the fray, aiming to corner the nascent space.

But with generative AI’s tendency to make mistakes, it remains to be seen whether these tools have staying power.


BigPanda launches generative AI tool designed specifically for ITOps

Illustration of IT operations team working together surrounded by different types of equipment.

Image Credits: IR_Stone / Getty Images

IT operations personnel have a lot going on, and when an incident brings down a key system, time is always against them. Over the years, companies have looked for an edge in getting back up faster, with playbooks designed to find answers to common problems and postmortems meant to keep those problems from repeating. But not every problem is easily solved, and there is so much data and so many possible points of failure.

It’s actually a perfect problem for generative AI to tackle, and AIOps startup BigPanda today announced a new generative AI tool called Biggy to help solve some of these issues faster. Biggy is designed to look across a wide variety of IT-related data to learn how the company operates, compare the current problem to similar past scenarios, and suggest a solution.

BigPanda has been using AI since the early days of the company and deliberately designed two separate systems: one for the data layer and another for the AI. That design, in a way, prepared the company for the shift to generative AI based on large language models. “The AI engine before Gen AI was building a lot of other types of AI, but it was feeding off of the same data engine that will be feeding what we’re doing with Biggy, and what we’re doing with generative and conversational AI,” BigPanda CEO Assaf Resnick told TechCrunch.

Like most generative AI tools, this one makes a prompt box available where users can ask questions and interact with the bot. In this case, the underlying models have been trained on data inside the customer company, as well as on publicly available data on a particular piece of hardware or software, and are tuned to deal with the kinds of problems IT deals with on a regular basis.

“The out-of-the box LLMs have been trained on a huge amount of data, and they’re really good actually as generalists in all of the operational fields we look at — infrastructure, network, application development, everything there. And they actually know all the hardware very well,” Jason Walker, chief innovation officer at BigPanda, said. “So if you ask it about a certain HP blade server with this error code, it’s pretty good at putting that together, and we use that for a lot of the event traffic.” Of course, it has to be more than that or a human engineer could simply look this up in Google Search.

It combines this knowledge with what it is able to cull internally across a range of data types. “BigPanda ingests the customer’s operational and contextual data from observability, change, CMDB (the configuration management database) and topology along with historical data and human, institutional context — and normalizes the data into key-value pairs, or tags,” Walker said. That’s a lot of technical jargon, but basically it means it looks at system-level information, organizational data and human interactions to deliver a response to help engineers solve the problem.
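
The normalization Walker describes can be sketched simply: heterogeneous events, whatever their shape, get flattened into one common key-value form so they can be compared and searched. The field names and mapping below are invented for illustration, not BigPanda's actual schema.

```python
def normalize_event(raw):
    """Flatten a monitoring event of arbitrary shape into key-value tags.

    Illustrative sketch only: the field names are hypothetical, not
    BigPanda's real tag schema.
    """
    tags = {}
    for key, value in raw.items():
        if isinstance(value, dict):  # nested context, e.g. topology info
            for sub_key, sub_value in value.items():
                tags[f"{key}.{sub_key}"] = str(sub_value)
        else:
            tags[key] = str(value)
    return tags

event = {
    "source": "observability",
    "host": {"name": "web-03", "datacenter": "us-east"},
    "error_code": 503,
}
print(normalize_event(event))
```

Once everything is tags, an incident from a network monitor and one from an application log live in the same space, which is what lets a model reason across them.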

When a user enters a prompt, Biggy looks across all the data to generate an answer that will hopefully point engineers in the right direction to fix the problem. The company acknowledges that it’s not always perfect, because no generative AI is, but it lets the user know when there is a lower degree of certainty that the answer is correct.

“For areas where we think we don’t have as much certainty, then we tell them that this is our best information, but a human should take a look at this,” Resnick said. For other areas where there is more certainty, they may introduce automation, working with a tool like Red Hat Ansible to solve the issue without human interaction, he said.

The data ingestion part isn’t always going to be trivial for customers, and this is a first step toward providing an AI assistant that can help IT get at the root of problems and solve them faster. No AI is foolproof, but having an interactive AI tool should be an improvement over current, more time-consuming manual approaches to IT systems troubleshooting.

BigPanda gets its horn after securing $190M in fresh capital


NeuBird is building a generative AI solution for complex cloud-native environments

Data flowing through a cloud on a blue background.

Image Credits: Just_Super / Getty Images

NeuBird founders Goutham Rao and Vinod Jayaraman came from Portworx, a cloud-native storage company they sold to Pure Storage in 2019 for $370 million. It was their third successful exit.

When they went looking for their next startup challenge last year, they saw an opportunity to combine their cloud-native knowledge, especially around IT operations, with the burgeoning area of generative AI. 

Today NeuBird announced a $22 million investment from Mayfield to get the idea to market. It’s a hefty amount for an early-stage startup, but the firm is likely banking on the founders’ experience to build another successful company.

Rao, the CEO, says that while the cloud-native community has done a good job at solving a lot of difficult problems, it has created increasing levels of complexity along the way. 

“We’ve done an incredible job as a community over the past 10-plus years building cloud-native architectures with service-oriented designs. This added a lot of layers, which is good. That’s a proper way to design software, but this also came at a cost of increased telemetry. There’s just too many layers in the stack,” Rao told TechCrunch.

They concluded that this level of data was making it impossible for human engineers to find, diagnose and solve problems at scale inside large organizations. At the same time, large language models were beginning to mature, so the founders decided to put them to work on the problem.

“We’re leveraging large language models in a very unique way to be able to analyze thousands and thousands of metrics, alerts, logs, traces and application configuration information in a matter of seconds and be able to diagnose what the health of the environment is, detect if there’s a problem and come up with a solution,” he said.

The company is essentially building a trusted digital assistant to the engineering team. “So it’s a digital co-worker that works alongside SREs and ITOps engineers, and monitors all of the alerts and logs looking for issues,” he said. The goal is to reduce the amount of time it takes to respond to and solve an incident from hours to minutes, and they believe that by putting generative AI to work on the problem, they can help companies achieve that goal. 

The founders understand the limitations of large language models, and are looking to reduce hallucinated or incorrect responses by using a limited set of data to train the models, and by setting up other systems that help deliver more accurate responses.

“Because we’re using this in a very controlled manner for a very specific use case for environments we know, we can cross check the results that are coming out of the AI, again through a vector database, and see if it’s even making sense, and if we’re not comfortable with it, we won’t recommend it to the user,” Rao said.

Customers can connect directly to their various cloud systems by entering their credentials, and without moving data, NeuBird can use the access to cross-check against other available information to come up with a solution, reducing the overall difficulty associated with getting the company-specific data for the model to work with. 

NeuBird uses various models, including Llama 2 for analyzing logs and metrics. They are using Mistral for other types of analysis. The company actually turns every natural language interaction into a SQL query, essentially turning unstructured data into structured data. They believe this will result in greater accuracy. 
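
NeuBird hasn't published how that translation works, and in practice an LLM produces the SQL, so the table, columns and keyword matching below are invented purely to illustrate what "turning every natural language interaction into a SQL query" means: a free-form question becomes a structured query over structured data.

```python
# Illustrative only: the schema and the trivial keyword matching are
# hypothetical stand-ins for an LLM-driven natural-language-to-SQL step.
def question_to_sql(question):
    """Map a natural-language question onto a structured SQL form."""
    q = question.lower()
    if "error" in q and "last hour" in q:
        return (
            "SELECT service, COUNT(*) AS errors "
            "FROM logs WHERE level = 'ERROR' "
            "AND ts > NOW() - INTERVAL '1 hour' "
            "GROUP BY service ORDER BY errors DESC;"
        )
    raise ValueError("question not recognized by this toy mapper")

print(question_to_sql("Which services logged errors in the last hour?"))
```

The payoff of the structured form is that the query's results are deterministic and checkable, which supports the accuracy claim: the model's freedom is confined to writing the query, not to inventing the answer.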

The early-stage startup is working with design and alpha partners right now refining the idea as they work to bring the product to market later this year. Rao says they took a big chunk of money out of the gate because they wanted the room to work on the problem without having to worry about looking for more money too soon.


Snowflake releases a flagship generative AI model of its own

Snowflake logo at peak of two pieces of angled wood.

Image Credits: Joan Cros/NurPhoto / Getty Images

All-around, highly generalizable generative AI models were the name of the game once, and they arguably still are. But increasingly, as cloud vendors large and small join the generative AI fray, we’re seeing a new crop of models focused on the deepest-pocketed potential customers: the enterprise.

Case in point: Snowflake, the cloud computing company, today unveiled Arctic LLM, a generative AI model that’s described as “enterprise-grade.” Available under an Apache 2.0 license, Arctic LLM is optimized for “enterprise workloads,” including generating database code, Snowflake says, and is free for research and commercial use.

“I think this is going to be the foundation that’s going to let us — Snowflake — and our customers build enterprise-grade products and actually begin to realize the promise and value of AI,” CEO Sridhar Ramaswamy said in a press briefing. “You should think of this very much as our first, but big, step in the world of generative AI, with lots more to come.”

An enterprise model

My colleague Devin Coldewey recently wrote about how there’s no end in sight to the onslaught of generative AI models. I recommend you read his piece, but the gist is: Models are an easy way for vendors to drum up excitement for their R&D and they also serve as a funnel to their product ecosystems (e.g. model hosting, fine-tuning and so on).

Arctic LLM is no different. Snowflake’s flagship model in a family of generative AI models called Arctic, Arctic LLM — which took around three months, 1,000 GPUs and $2 million to train — arrives on the heels of Databricks’ DBRX, a generative AI model also marketed as optimized for the enterprise space.

Snowflake draws a direct comparison between Arctic LLM and DBRX in its press materials, saying Arctic LLM outperforms DBRX on the two tasks of coding (Snowflake didn’t specify which programming languages) and SQL generation. The company said Arctic LLM is also better at those tasks than Meta’s Llama 2 70B (but not the more recent Llama 3 70B) and Mistral’s Mixtral-8x7B.

Snowflake also claims that Arctic LLM achieves “leading performance” on a popular general language understanding benchmark, MMLU. I’ll note, though, that while MMLU purports to evaluate generative models’ ability to reason through logic problems, it includes tests that can be solved through rote memorization, so take that bullet point with a grain of salt.

“Arctic LLM addresses specific needs within the enterprise sector,” Baris Gultekin, head of AI at Snowflake, told TechCrunch in an interview, “diverging from generic AI applications like composing poetry to focus on enterprise-oriented challenges, such as developing SQL co-pilots and high-quality chatbots.”

Arctic LLM, like DBRX and Google’s top-performing generative model of the moment, Gemini 1.5 Pro, uses a mixture of experts (MoE) architecture. MoE architectures basically break down data processing tasks into subtasks and then delegate them to smaller, specialized “expert” models. So, while Arctic LLM contains 480 billion parameters, it only activates 17 billion at a time — enough to drive its 128 separate expert models. (Parameters essentially define the skill of an AI model on a problem, like analyzing and generating text.)
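
A minimal sketch of the routing idea, with numbers far smaller than Arctic LLM's 128 experts and not tied to its actual design: a small gating network scores the experts for each input, and only the top-k of them run.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, expert_weights, gate_weights, top_k=2):
    """Mixture-of-experts layer: route the input to its top_k experts only.

    Toy dense version of the idea; real MoE models skip the computation
    for unselected experts entirely, which is where the efficiency of
    "480B parameters, 17B active" comes from.
    """
    scores = x @ gate_weights                 # gating logits, one per expert
    chosen = np.argsort(scores)[-top_k:]      # indices of the top_k experts
    probs = np.exp(scores[chosen]) / np.exp(scores[chosen]).sum()
    # Weighted sum over only the selected experts' outputs.
    out = sum(p * (x @ expert_weights[i]) for p, i in zip(probs, chosen))
    return out, chosen

num_experts, dim = 8, 16
experts = rng.standard_normal((num_experts, dim, dim))
gate = rng.standard_normal((dim, num_experts))
x = rng.standard_normal(dim)

out, active = moe_forward(x, experts, gate)
print(f"{len(active)} of {num_experts} experts active")
```

Because only the chosen experts' weights are exercised per token, compute cost tracks the active parameter count rather than the total, which is the basis of Snowflake's training-cost claim.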

Snowflake claims that this efficient design enabled it to train Arctic LLM on open public web data sets (including RefinedWeb, C4, RedPajama and StarCoder) at “roughly one-eighth the cost of similar models.”

Running everywhere

Snowflake is providing resources like coding templates and a list of training sources alongside Arctic LLM to guide users through the process of getting the model up and running and fine-tuning it for particular use cases. But, recognizing that those are likely to be costly and complex undertakings for most developers (fine-tuning or running Arctic LLM requires around eight GPUs), Snowflake’s also pledging to make Arctic LLM available across a range of hosts, including Hugging Face, Microsoft Azure, Together AI’s model-hosting service and enterprise generative AI platform Lamini.

Here’s the rub, though: Arctic LLM will be available first on Cortex, Snowflake’s platform for building AI- and machine learning-powered apps and services. The company’s unsurprisingly pitching it as the preferred way to run Arctic LLM with “security,” “governance” and scalability.

“Our dream here is, within a year, to have an API that our customers can use so that business users can directly talk to data,” Ramaswamy said. “It would’ve been easy for us to say, ‘Oh, we’ll just wait for some open source model and we’ll use it.’ Instead, we’re making a foundational investment because we think [it’s] going to unlock more value for our customers.”

So I’m left wondering: Who’s Arctic LLM really for besides Snowflake customers?

In a landscape full of “open” generative models that can be fine-tuned for practically any purpose, Arctic LLM doesn’t stand out in any obvious way. Its architecture might bring efficiency gains over some of the other options out there. But I’m not convinced that they’ll be dramatic enough to sway enterprises away from the countless other well-known and -supported, business-friendly generative models (e.g. GPT-4).

There’s also a point in Arctic LLM’s disfavor to consider: its relatively small context.

In generative AI, context window refers to input data (e.g. text) that a model considers before generating output (e.g. more text). Models with small context windows are prone to forgetting the content of even very recent conversations, while models with larger contexts typically avoid this pitfall.
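
The forgetting is mechanical rather than mysterious: whatever doesn't fit in the window simply never reaches the model. A sketch of the effect, using whole words as stand-in "tokens" (real tokenizers split text into subword pieces):

```python
def fit_to_context(conversation, window=8):
    """Keep only the most recent tokens that fit in the context window.

    Word-level stand-in for real tokens: anything before the cutoff is
    invisible to the model, which is why small-window models "forget"
    earlier parts of a conversation.
    """
    tokens = conversation.split()
    return tokens[-window:]

chat = "my name is Ada and I work on compilers now tell me my name"
visible = fit_to_context(chat, window=8)
print(visible)
# The token "Ada" has already fallen out of the window, so a model with
# this context could not answer the question.
```

Scale the same truncation up from 8 tokens to Arctic LLM's roughly 8,000 to 24,000 words and the failure mode is identical, just pushed further out, which is why long documents and long codebases are where small windows hurt.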

Arctic LLM’s context is between ~8,000 and ~24,000 words, dependent on the fine-tuning method — far below that of models like Anthropic’s Claude 3 Opus and Google’s Gemini 1.5 Pro.

Snowflake doesn’t mention it in the marketing, but Arctic LLM almost certainly suffers from the same limitations and shortcomings as other generative AI models — namely, hallucinations (i.e. confidently answering requests incorrectly). That’s because Arctic LLM, along with every other generative AI model in existence, is a statistical probability machine — one that, again, has a small context window. It guesses based on vast amounts of examples which data makes the most “sense” to place where (e.g. the word “go” before “the market” in the sentence “I go to the market”). It’ll inevitably guess wrong — and that’s a “hallucination.”

As Devin writes in his piece, until the next major technical breakthrough, incremental improvements are all we have to look forward to in the generative AI domain. That won’t stop vendors like Snowflake from championing them as great achievements, though, and marketing them for all they’re worth.


This Week in AI: Generative AI and the problem of compensating creators

African American young developer in eyeglasses concentrating on his online work on computer sitting at workplace

Image Credits: AnnaStills / Getty Images

Keeping up with an industry as fast-moving as AI is a tall order. So until an AI can do it for you, here’s a handy roundup of recent stories in the world of machine learning, along with notable research and experiments we didn’t cover on their own.

By the way — TechCrunch plans to launch an AI newsletter soon. Stay tuned.

This week in AI, eight prominent U.S. newspapers owned by investment giant Alden Global Capital, including the New York Daily News, Chicago Tribune and Orlando Sentinel, sued OpenAI and Microsoft for copyright infringement relating to the companies’ use of generative AI tech. They, like The New York Times in its ongoing lawsuit against OpenAI, accuse OpenAI and Microsoft of scraping their IP without permission or compensation to build and commercialize generative models such as GPT-4.

“We’ve spent billions of dollars gathering information and reporting news at our publications, and we can’t allow OpenAI and Microsoft to expand the big tech playbook of stealing our work to build their own businesses at our expense,” Frank Pine, the executive editor overseeing Alden’s newspapers, said in a statement.

The suit seems likely to end in a settlement and licensing deal, given OpenAI’s existing partnerships with publishers and its reluctance to hinge the whole of its business model on the fair use argument. But what about the rest of the content creators whose works are being swept up in model training without payment?

It seems OpenAI’s thinking about that.

A recently published research paper co-authored by Boaz Barak, a scientist on OpenAI’s Superalignment team, proposes a framework to compensate copyright owners “proportionally to their contributions to the creation of AI-generated content.” How? Through cooperative game theory.

The framework evaluates to what extent content in a training dataset — for example, text, images or some other data — influences what a model generates, employing a game theory concept known as the Shapley value. Then, based on that evaluation, it determines the content owners’ “rightful share” (i.e. compensation).

Let’s say you have an image-generating model trained using artwork from four artists: John, Jacob, Jack and Jebediah. You ask it to draw a flower in Jack’s style. With the framework, you can determine the influence each artist’s works had on the art the model generates and, thus, the compensation that each should receive.
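
The artist example can be made concrete with the standard Shapley formula: average each artist's marginal contribution over every order in which their training sets could be added. The value function below is invented for illustration (the real framework measures influence on the model's generations, which is the expensive part); the Shapley computation itself is the textbook one.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution, averaged
    over all orderings of the players. O(n!) in the number of players,
    hence the paper's need for cheaper estimates at training-set scale."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            totals[p] += value(frozenset(coalition)) - before
    return {p: t / len(orderings) for p, t in totals.items()}

# Invented value function: how good is a "flower in Jack's style" image when
# the model is trained on this subset of artists' work? Jack matters most.
quality = {
    frozenset(): 0.0,
    frozenset({"Jack"}): 0.8,
    frozenset({"John"}): 0.1,
}
def value(coalition):
    if coalition in quality:
        return quality[coalition]
    # Unlisted coalitions default to the best member's solo score.
    return max(quality.get(frozenset({p}), 0.05) for p in coalition)

print(shapley_values(["John", "Jacob", "Jack", "Jebediah"], value))
```

Under this toy value function Jack receives by far the largest share, John a small one, and Jacob and Jebediah near-symmetric slivers, with the shares summing exactly to the full coalition's value, which is the "rightful share" property the paper is after.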

There is a downside to the framework, however — it’s computationally expensive. The researchers’ workarounds rely on estimates of compensation rather than exact calculations. Would that satisfy content creators? I’m not so sure. If OpenAI someday puts it into practice, we’ll certainly find out.

Here are some other AI stories of note from the past few days:

Microsoft reaffirms facial recognition ban: Language added to the terms of service for Azure OpenAI Service, Microsoft’s fully managed wrapper around OpenAI tech, more clearly prohibits integrations from being used “by or for” police departments for facial recognition in the U.S.

The nature of AI-native startups: AI startups face a different set of challenges from your typical software-as-a-service company. That was the message from Rudina Seseri, founder and managing partner at Glasswing Ventures, last week at the TechCrunch Early Stage event in Boston; Ron has the full story.

Anthropic launches a business plan: AI startup Anthropic is launching a new paid plan aimed at enterprises as well as a new iOS app. Team — the enterprise plan — gives customers higher-priority access to Anthropic’s Claude 3 family of generative AI models plus additional admin and user management controls.

CodeWhisperer no more: Amazon CodeWhisperer is now Q Developer, a part of Amazon’s Q family of business-oriented generative AI chatbots. Available through AWS, Q Developer helps with some of the tasks developers do in the course of their daily work, like debugging and upgrading apps — much like CodeWhisperer did.

Just walk out of Sam’s Club: Walmart-owned Sam’s Club says it’s turning to AI to help speed up its “exit technology.” Instead of requiring store staff to check members’ purchases against their receipts when leaving a store, Sam’s Club customers who pay either at a register or through the Scan & Go mobile app can now walk out of certain store locations without having their purchases double-checked.

Fish harvesting, automated: Harvesting fish is an inherently messy business. Shinkei is working to improve it with an automated system that more humanely and reliably dispatches the fish, resulting in what could be a totally different seafood economy, Devin reports.

Yelp’s AI assistant: Yelp announced this week a new AI-powered chatbot for consumers — powered by OpenAI models, the company says — that helps them connect with relevant businesses for their tasks (like installing lighting fixtures, upgrading outdoor spaces and so on). The company is rolling out the AI assistant on its iOS app under the “Projects” tab, with plans to expand to Android later this year.

More machine learnings

Image Credits: US Dept of Energy

Sounds like there was quite a party at Argonne National Lab this winter when they brought in a hundred AI and energy sector experts to talk about how the rapidly evolving tech could be helpful to the country’s infrastructure and R&D in that area. The resulting report is more or less what you’d expect from that crowd: a lot of pie in the sky, but informative nonetheless.

Looking at nuclear power, the grid, carbon management, energy storage and materials, a few themes emerged from the get-together: first, researchers need access to high-powered compute tools and resources; second, they need to learn to spot the weak points of simulations and predictions (including those enabled by the first item); third, they need AI tools that can integrate and make accessible data from multiple sources and in many formats. We’ve seen all these things happening across the industry in various ways, so it’s no big surprise, but nothing gets done at the federal level without a few boffins putting out a paper, so it’s good to have it on the record.

Georgia Tech and Meta are working on part of that with a big new database called OpenDAC, a pile of reactions, materials, and calculations intended to help scientists designing carbon capture processes to do so more easily. It focuses on metal-organic frameworks, a promising and popular material type for carbon capture, but one with thousands of variations, which haven’t been exhaustively tested.

The Georgia Tech team got together with Oak Ridge National Lab and Meta’s FAIR to simulate quantum chemistry interactions on these materials, using some 400 million compute hours — way more than a university can easily muster. Hopefully it’s helpful to the climate researchers working in this field. It’s all documented here.

We hear a lot about AI applications in the medical field, though most are in what you might call an advisory role, helping experts notice things they might not otherwise have seen, or spotting patterns that would have taken hours for a tech to find. That’s partly because these machine learning models just find connections between statistics without understanding what caused or led to what. Cambridge and Ludwig-Maximilians-Universität München researchers are working on that, since moving past basic correlative relationships could be hugely helpful in creating treatment plans.

The work, led by Professor Stefan Feuerriegel from LMU, aims to make models that can identify causal mechanisms, not just correlations: “We give the machine rules for recognizing the causal structure and correctly formalizing the problem. Then the machine has to learn to recognize the effects of interventions and understand, so to speak, how real-life consequences are mirrored in the data that has been fed into the computers,” he said. It’s still early days for them, and they’re aware of that, but they believe their work is part of an important decade-scale development period.

Over at the University of Pennsylvania, grad student Ro Encarnación is working on a new angle in the “algorithmic justice” field we’ve seen pioneered (primarily by women and people of color) in the last 7-8 years. Her work is more focused on the users than the platforms, documenting what she calls “emergent auditing.”

When TikTok or Instagram puts out a filter that’s kinda racist, or an image generator that does something eye-popping, what do users do? Complain, sure, but they also continue to use it, and learn how to circumvent or even exacerbate the problems encoded in it. It may not be a “solution” the way we think of it, but it demonstrates the diversity and resilience of the user side of the equation — they’re not as fragile or passive as you might think.

Google's generative AI can now analyze hours of video

Image Credits: TechCrunch

Gemini, Google’s family of generative AI models, can now analyze longer documents, codebases, videos and audio recordings than before.

During a keynote at the Google I/O 2024 developer conference Tuesday, Google announced the private preview of a new version of Gemini 1.5 Pro, the company’s current flagship model, that can take in up to 2 million tokens. That’s double the previous maximum amount.

At 2 million tokens, the new version of Gemini 1.5 Pro supports the largest input of any commercially available model. The next-largest, Anthropic’s Claude 3, tops out at 1 million tokens.

In the AI field, “tokens” refer to subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.” Two million tokens is equivalent to around 1.4 million words, two hours of video or 22 hours of audio.
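The words-to-tokens conversion quoted above can be sanity-checked with simple arithmetic. This sketch assumes the common heuristic of roughly 1.4 tokens per English word; the exact ratio depends on the tokenizer, so the numbers here are illustrative rather than Google's published conversion:

```python
# Back-of-the-envelope token arithmetic. The 1.4 tokens-per-word
# figure is an assumed average for English text, not an official rate.

TOKENS_PER_WORD = 1.4

def words_to_tokens(word_count: int) -> int:
    """Estimate how many tokens a given word count consumes."""
    return round(word_count * TOKENS_PER_WORD)

def tokens_to_words(token_count: int) -> int:
    """Estimate how many words fit in a given token budget."""
    return round(token_count / TOKENS_PER_WORD)

# A 2-million-token window holds roughly 1.4 million words,
# in line with the figure quoted above.
print(tokens_to_words(2_000_000))  # ~1,428,571 words
```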

Image Credits: TechCrunch

Beyond being able to analyze large files, models that can take in more tokens can sometimes achieve improved performance.

Unlike models with small maximum token inputs (otherwise known as context windows), models such as the 2-million-token-input Gemini 1.5 Pro won’t as easily “forget” the content of long conversations and veer off topic. Large-context models can also better grasp the flow of data they take in — hypothetically, at least — and generate contextually richer responses.

Developers interested in trying Gemini 1.5 Pro with a 2-million-token context can add their names to the waitlist in Google AI Studio, Google’s generative AI dev tool. (Gemini 1.5 Pro with 1-million-token context launches in general availability across Google’s developer services and surfaces in the next month.)

Beyond the larger context window, Google says that Gemini 1.5 Pro has been “enhanced” over the last few months through algorithmic improvements. It’s better at code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding, Google says. And in the Gemini API and AI Studio, 1.5 Pro can now reason across audio in addition to images and video — and be “steered” through a capability called system instructions.

Gemini 1.5 Flash, a faster model

For less demanding applications, Google is launching Gemini 1.5 Flash in public preview, a “distilled” version of Gemini 1.5 Pro that’s a small, efficient model built for “narrow,” “high-frequency” generative AI workloads. Flash, which has up to a 2-million-token context window, is multimodal like Gemini 1.5 Pro, meaning it can analyze audio, video and images as well as text (but it generates only text).

“Gemini Pro is for much more general or complex, often multi-step reasoning tasks,” Josh Woodward, VP of Google Labs, one of Google’s experimental AI divisions, said during a briefing with reporters. “[But] as a developer, you really want to use [Flash] if you care a lot about the speed of the model output.”

Image Credits: TechCrunch

Woodward added that Flash is particularly well-suited for tasks such as summarization, chat apps, image and video captioning and data extraction from long documents and tables.

Flash appears to be Google’s answer to small, low-cost models served via APIs like Anthropic’s Claude 3 Haiku. It, along with Gemini 1.5 Pro, is very widely available, now in over 200 countries and territories including the European Economic Area, U.K. and Switzerland. (The 2-million-token context version is gated behind a waitlist, however.)

In another update aimed at cost-conscious devs, all Gemini models, not just Flash, will soon be able to take advantage of a feature called context caching. This lets devs store large amounts of information (say, a knowledge base or database of research papers) in a cache that Gemini models can quickly and relatively cheaply (from a per-usage standpoint) access.
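The savings that context caching promises can be illustrated with a toy cost model. To be clear, this is not the Gemini API: the function names, per-token price and cache discount below are all made up, purely to show why re-sending a large context with every request is expensive while a cached context is billed at full price only once:

```python
# Toy accounting sketch for context caching. All rates here are
# hypothetical -- the point is the shape of the savings, not real pricing.

LARGE_CONTEXT_TOKENS = 1_000_000   # e.g. a big knowledge base
PROMPT_TOKENS = 200                # a typical short question
PRICE_PER_INPUT_TOKEN = 1e-6       # made-up rate
CACHED_TOKEN_DISCOUNT = 0.25       # made-up: cached tokens cost 25%

def cost_without_cache(num_requests: int) -> float:
    # Every request re-sends the full context plus the prompt.
    tokens = num_requests * (LARGE_CONTEXT_TOKENS + PROMPT_TOKENS)
    return tokens * PRICE_PER_INPUT_TOKEN

def cost_with_cache(num_requests: int) -> float:
    # The context is uploaded once at full price; subsequent requests
    # reference the cache at a discounted per-token rate.
    upload = LARGE_CONTEXT_TOKENS * PRICE_PER_INPUT_TOKEN
    per_request = (LARGE_CONTEXT_TOKENS * CACHED_TOKEN_DISCOUNT
                   + PROMPT_TOKENS) * PRICE_PER_INPUT_TOKEN
    return upload + num_requests * per_request

print(cost_without_cache(100))  # pays for the full context 100 times
print(cost_with_cache(100))     # pays full price once, discounted after
```

Under these assumed rates, 100 requests against a cached million-token context cost roughly a quarter of re-sending the context each time, which is the economic argument the feature rests on.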

The complementary Batch API, available in public preview today in Vertex AI, Google’s enterprise-focused generative AI development platform, offers a more cost-effective way to handle workloads such as classification and sentiment analysis, data extraction and description generation, allowing multiple prompts to be sent to Gemini models in a single request.

Another new feature arriving later in the month in preview in Vertex, controlled generation, could lead to further cost savings, Woodward suggests, by allowing users to define Gemini model outputs according to specific formats or schemas (e.g. JSON or XML).

“You’ll be able to send all of your files to the model once and not have to resend them over and over again,” Woodward said. “This should make the long context [in particular] way more useful — and also more affordable.”

Read more about Google I/O 2024 on TechCrunch

TikTok turns to generative AI to boost its ads business

A laptop keyboard and TikTok logo displayed on a phone screen are seen in this multiple exposure illustration photo taken in Poland on March 17, 2024. (Photo by Jakub Porzycki/NurPhoto via Getty Images)

Image Credits: Jakub Porzycki/NurPhoto / Getty Images

TikTok is the latest tech company to incorporate generative AI into its ads business, as the company announced on Tuesday that it’s launching a new “TikTok Symphony” AI suite for brands. The tools will help marketers write scripts, produce videos and enhance current assets.

The suite includes a new AI video generator called the “Symphony Creative Studio.” The tool can generate TikTok-ready videos with just a few inputs from an advertiser, the company claims. The studio also offers brands ready-to-use videos for ad campaigns based on their TikTok Ads Manager assets or product information.

TikTok's new AI video generator
Image Credits: TikTok

The new “Symphony Assistant” is an AI assistant designed to help advertisers enhance their campaigns by generating and refining scripts and providing recommendations on best practices.

For instance, brands can ask the assistant to write a few attention-grabbing lines for their new lipstick launch. They can also ask the assistant to show them what’s currently trending on TikTok or to generate some ideas for promoting a new product in a specific industry. 

TikTok’s new “Symphony Ads Manager Integration” can automatically fix and optimize a brand’s current videos. The tool can be used to spruce up videos a brand has already created to make them stand out more.

Image Credits: TikTok

In addition, TikTok is launching a centralized destination for marketers called “TikTok One” where they will be able to access nearly two million creators, discover agency partners and leverage TikTok’s creative tools.

TikTok is also introducing new performance solutions with the help of predictive AI to help advertisers drive more sales. Advertisers will be able to input their budgets and goals to determine the best creative asset and the right audience for their campaign.  

As part of the announcement, the company revealed that 61% of users have made a purchase either directly on TikTok or after seeing an ad. TikTok also said that 59% of users use TikTok to decide which game to download next and that 52% of users even research cars because of TikTok content they have seen.

While TikTok is seeing success with its ads business and building it out as it chases more ad dollars, the company faces a potential hurdle in the year ahead. The fate of the app in the U.S. is uncertain, as President Joe Biden signed a bill last month that would ban TikTok if its parent company ByteDance does not sell the app. If the app does get banned in the U.S., other tech companies and startups may have an opportunity to gain ground in its absence.
