Runway's new video-generating AI, Gen-3, offers improved controls

Image Credits: Runway

The race to high-quality, AI-generated videos is heating up.

On Monday, Runway, a company building generative AI tools geared toward film and image content creators, unveiled Gen-3 Alpha. The company’s latest AI model generates video clips from text descriptions and still images. Runway says the model delivers a “major” improvement in generation speed and fidelity over Runway’s previous flagship video model, Gen-2, as well as fine-grained controls over the structure, style and motion of the videos that it creates.

Gen-3 will be available in the coming days for Runway subscribers, including enterprise customers and creators in Runway’s creative partners program.

“Gen-3 Alpha excels at generating expressive human characters with a wide range of actions, gestures and emotions,” Runway wrote in a post on its blog. “It was designed to interpret a wide range of styles and cinematic terminology [and enable] imaginative transitions and precise key-framing of elements in the scene.”

Gen-3 Alpha has its limitations, including the fact that its footage maxes out at 10 seconds. However, Runway co-founder Anastasis Germanidis promises that Gen-3 is only the first — and smallest — of several video-generating models to come in a next-gen model family trained on upgraded infrastructure.

“The model can struggle with complex character and object interactions, and generations don’t always follow the laws of physics precisely,” Germanidis told TechCrunch this morning in an interview. “This initial rollout will support 5- and 10-second high-resolution generations, with noticeably faster generation times than Gen-2. A 5-second clip takes 45 seconds to generate, and a 10-second clip takes 90 seconds to generate.”

Gen-3 Alpha, like all video-generating models, was trained on a vast number of examples of videos — and images — so it could “learn” the patterns in these examples to generate new clips. Where did the training data come from? Runway wouldn’t say. Few generative AI vendors volunteer such information these days, partly because they see training data as a competitive advantage and thus keep it and info relating to it close to the chest.

“We have an in-house research team that oversees all of our training and we use curated, internal datasets to train our models,” Germanidis said. He left it at that.

Runway Gen-3
A sample from Runway’s Gen-3 model. Note that the blurriness and low resolution are from a video-to-GIF conversion tool TechCrunch used, not Gen-3.
Image Credits: Runway

Training data details are also a potential source of IP-related lawsuits if the vendor trained on public data, including copyrighted data from the web — and so another disincentive to reveal much. In several cases making their way through the courts, plaintiffs reject vendors’ fair use defenses, arguing that generative AI tools replicate artists’ styles without the artists’ permission and let users generate new works resembling artists’ originals, for which the artists receive no payment.

Runway addressed the copyright issue somewhat, saying that it consulted with artists in developing the model. (Which artists? Not clear.) That mirrors what Germanidis told me during a fireside chat at TechCrunch’s Disrupt conference in 2023:

“We’re working closely with artists to figure out what the best approaches are to address this,” he said. “We’re exploring various data partnerships to be able to further grow … and build the next generation of models.”

Runway also says that it plans to release Gen-3 with a new set of safeguards, including a moderation system to block attempts to generate videos from copyrighted images and content that doesn’t comply with Runway’s terms of service. Also in the works is a provenance system — compatible with the C2PA standard, which is backed by Microsoft, Adobe, OpenAI and others — to identify videos as coming from Gen-3.

“Our new and improved in-house visual and text moderation system employs automatic oversight to filter out inappropriate or harmful content,” Germanidis said. “C2PA authentication verifies the provenance and authenticity of the media created with all Gen-3 models. As model capabilities and the ability to generate high-fidelity content increases, we will continue to invest significantly on our alignment and safety efforts.”
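The core idea behind a C2PA-style provenance system is to bind signed metadata (who or what generated the media) to a cryptographic hash of the content itself, so that any edit breaks verification. The sketch below illustrates that idea only; it uses a shared-secret HMAC as a stand-in, whereas the real C2PA standard uses certificate-based signatures and a structured manifest format, and none of the names here come from Runway's actual system.

```python
import hashlib
import hmac
import json

# Hypothetical signing key for illustration. Real C2PA signing uses
# X.509 certificate chains, not a shared secret.
SIGNING_KEY = b"demo-key"

def attach_manifest(video_bytes: bytes, generator: str) -> dict:
    """Build a provenance manifest binding metadata to the content hash."""
    content_hash = hashlib.sha256(video_bytes).hexdigest()
    claim = {"generator": generator, "content_sha256": content_hash}
    payload = json.dumps(claim, sort_keys=True).encode()
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}

def verify_manifest(video_bytes: bytes, manifest: dict) -> bool:
    """Check the signature, then check the hash still matches the bytes."""
    payload = json.dumps(manifest["claim"], sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        return False
    return manifest["claim"]["content_sha256"] == hashlib.sha256(video_bytes).hexdigest()

video = b"fake video bytes"
manifest = attach_manifest(video, "gen-3-alpha")
```

The key property: `verify_manifest` succeeds on the untouched bytes but fails on any altered copy, which is what lets downstream platforms flag AI-generated video even after it leaves the vendor's servers.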

Runway Gen-3
Image Credits: Runway

Runway has also revealed that it’s partnered with “leading entertainment and media organizations” to create custom versions of Gen-3 that allow for more “stylistically controlled” and consistent characters, targeting “specific artistic and narrative requirements.” The company adds: “This means that the characters, backgrounds, and elements generated can maintain a coherent appearance and behavior across various scenes.”

A major unsolved problem with video-generating models is control — that is, getting a model to generate consistent video aligned with a creator’s artistic intentions. As my colleague Devin Coldewey recently wrote, simple matters in traditional filmmaking, like choosing a color in a character’s clothing, require workarounds with generative models because each shot is created independently of the others. Sometimes not even workarounds do the trick — leaving extensive manual work for editors.

Runway has raised over $236.5 million from investors, including Google (from which it has cloud compute credits) and Nvidia, as well as VCs such as Amplify Partners, Felicis and Coatue. The company has aligned itself closely with the creative industry as its investments in generative AI tech grow. Runway operates Runway Studios, an entertainment division that serves as a production partner for enterprise clientele, and hosts the AI Film Festival, one of the first events dedicated to showcasing films produced wholly — or in part — by AI.

But the competition is getting fiercer.

Runway Gen-3
Image Credits: Runway

Generative AI startup Luma last week announced Dream Machine, a video generator that’s gone viral for its aptitude at animating memes. And just a couple of months ago, Adobe revealed that it’s developing its own video-generating model trained on content in its Adobe Stock media library.

Elsewhere, there are incumbents like OpenAI’s Sora, which remains tightly gated but which OpenAI has been seeding with marketing agencies and indie and Hollywood film directors. (OpenAI CTO Mira Murati was in attendance at the 2024 Cannes Film Festival.) This year’s Tribeca Festival — which also has a partnership with Runway to curate movies made using AI tools — featured short films produced with Sora by directors who were given early access.

Google has also put its video-generating model, Veo, in the hands of select creators, including Donald Glover (aka Childish Gambino) and his creative agency Gilga, as it works to bring Veo into products like YouTube Shorts.

However the various collaborations shake out, one thing’s becoming clear: Generative AI video tools threaten to upend the film and TV industry as we know it.

Runway Gen-3
Image Credits: Runway

Filmmaker Tyler Perry recently said that he suspended a planned $800 million expansion of his production studio after seeing what Sora could do. Joe Russo, the director of tentpole Marvel films like “Avengers: Endgame,” predicts that within a year, AI will be able to create a full-fledged movie.

A 2024 study commissioned by the Animation Guild, a union representing Hollywood animators and cartoonists, found that 75% of film production companies that have adopted AI have reduced, consolidated or eliminated jobs after incorporating the tech. The study also estimates that by 2026, more than 100,000 U.S. entertainment jobs will be disrupted by generative AI.

It’ll take some seriously strong labor protections to ensure that video-generating tools don’t follow in the footsteps of other generative AI tech and lead to steep declines in the demand for creative work.

A new Chinese video-generating model appears to be censoring politically sensitive topics

Image Credits: Photo by VCG/VCG via Getty Images

A powerful new video-generating AI model became widely available today — but there’s a catch: The model appears to be censoring topics deemed too politically sensitive by the government in its country of origin, China.

The model, Kling, developed by Beijing-based company Kuaishou, launched in waitlisted access earlier in the year for users with a Chinese phone number. Today, it rolled out for anyone willing to provide their email. After signing up, users can enter prompts to have the model generate five-second videos of what they’ve described.

Kling works pretty much as advertised. Its 720p videos, which take a minute or two to generate, don’t deviate too far from the prompts. And Kling appears to simulate physics, like the rustling of leaves and flowing water, about as well as video-generating models like AI startup Runway’s Gen-3 and OpenAI’s Sora.

But Kling outright won’t generate clips about certain subjects. Prompts like “Democracy in China,” “Chinese President Xi Jinping walking down the street” and “Tiananmen Square protests” yield a nonspecific error message.

Kling AI
Image Credits: Kuaishou

The filtering appears to happen only at the prompt level. Kling supports animating still images, and it’ll uncomplainingly generate a video of a portrait of Xi, for example, as long as the accompanying prompt doesn’t mention him by name (e.g., “This man giving a speech”).
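The behavior described above is what you'd expect from a simple text blocklist applied before generation: the filter sees only the prompt string, never the uploaded image. A minimal sketch of that kind of prompt-level filter (the blocked terms here are hypothetical examples, not Kuaishou's actual list, and real moderation systems are considerably more sophisticated):

```python
# Hypothetical prompt-level blocklist, for illustration only.
BLOCKED_TERMS = {"xi jinping", "tiananmen", "democracy in china"}

def prompt_allowed(prompt: str) -> bool:
    """Reject a prompt if it contains any blocked term (case-insensitive)."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

# A prompt naming the sensitive subject is rejected outright...
print(prompt_allowed("Chinese President Xi Jinping walking down the street"))
# ...but pairing a sensitive image with a generic caption sails through,
# because the filter never inspects the pixels.
print(prompt_allowed("This man giving a speech"))
```

This is exactly the gap the image-animation workaround exploits: blocking at the prompt level is cheap, but content-level moderation would require analyzing the input image and the generated frames themselves.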

We’ve reached out to Kuaishou for comment.

Kling AI
Image Credits: Kuaishou

Kling’s curious behavior is likely the result of intense political pressure from the Chinese government on generative AI projects in the country.

Earlier this month, the Financial Times reported that AI models in China will be tested by China’s leading internet regulator, the Cyberspace Administration of China (CAC), to ensure that their responses on sensitive topics “embody core socialist values.” Models are to be benchmarked by CAC officials for their responses to a variety of queries, per the Financial Times report — many related to Xi and criticism of the Communist Party.

Reportedly, the CAC has gone so far as to propose a blacklist of sources that can’t be used to train AI models. Companies submitting models for review must prepare tens of thousands of questions designed to test whether the models produce “safe” answers.

The result is AI systems that decline to respond on topics that might raise the ire of Chinese regulators. Last year, the BBC found that Ernie, Chinese company Baidu’s flagship AI chatbot model, demurred and deflected when asked questions that might be perceived as politically controversial, like “Is Xinjiang a good place?” or “Is Tibet a good place?”

The draconian policies threaten to slow China’s AI advances. Not only do they require scouring data to remove politically sensitive info, but they also necessitate investing an enormous amount of dev time in creating ideological guardrails — guardrails that might still fail, as Kling exemplifies.

From a user perspective, China’s AI regulations are already leading to two classes of models: some hamstrung by intensive filtering and others decidedly less so. Is that really a good thing for the broader AI ecosystem?


OpenAI's Sora video-generating model can render video games, too

OpenAI Sora Minecraft

Image Credits: OpenAI

OpenAI’s new — and first! — video-generating model, Sora, can pull off some genuinely impressive cinematographic feats. But the model’s even more capable than OpenAI initially made it out to be, at least judging by a technical paper published this evening.

The paper, titled “Video generation models as world simulators,” co-authored by a host of OpenAI researchers, peels back the curtain on key aspects of Sora’s architecture — for instance revealing that Sora can generate videos of an arbitrary resolution and aspect ratio (up to 1080p). Per the paper, Sora’s able to perform a range of image and video editing tasks, from creating looping videos to extending videos forwards or backwards in time to changing the background in an existing video.

But most intriguing to this writer is Sora’s ability to “simulate digital worlds,” as the OpenAI co-authors put it. In an experiment, OpenAI fed Sora prompts containing the word “Minecraft” and had it render a convincingly Minecraft-like HUD and game — and the game’s dynamics, including physics — while simultaneously controlling the player character.

So how’s Sora able to do this? Well, as observed by senior Nvidia researcher Jim Fan (via Quartz), Sora’s more of a “data-driven physics engine” than a creative tool. It’s not just generating a single photo or video, but determining the physics of each object in an environment — and rendering a photo or video (or interactive 3D world, as the case may be) based on these calculations.

“These capabilities suggest that continued scaling of video models is a promising path towards the development of highly-capable simulators of the physical and digital world, and the objects, animals and people that live within them,” the OpenAI co-authors write.

Now, Sora’s usual limitations apply in the video game domain. The model can’t accurately approximate the physics of basic interactions like glass shattering. And even with interactions it can model, Sora’s often inconsistent — for example rendering a person eating a burger but failing to render bite marks.

Still, if I’m reading the paper correctly, it seems Sora could pave the way for more realistic — perhaps even photorealistic — procedurally generated games from text descriptions alone. That’s in equal parts exciting and terrifying (consider the deepfake implications, for one) — which is probably why OpenAI’s choosing to gate Sora behind a very limited access program for now.

Here’s hoping we learn more sooner rather than later.

StarCoder 2 is a code-generating AI that runs on most GPUs

software engineer working on laptop with circuit board

Image Credits: Tippapatt / Getty Images

Developers are adopting AI-powered code generators — services like GitHub Copilot and Amazon CodeWhisperer, along with open access models such as Meta’s Code Llama — at an astonishing rate. But the tools are far from ideal. Many aren’t free. Others are, but only under licenses that preclude them from being used in common commercial contexts.

Perceiving the demand for alternatives, AI startup Hugging Face several years ago teamed up with ServiceNow, the workflow automation platform, to create StarCoder, an open source code generator with a less restrictive license than some of the others out there. The original came online early last year, and work has been underway on a follow-up, StarCoder 2, ever since.

StarCoder 2 isn’t a single code-generating model, but rather a family. Released today, it comes in three variants, the first two of which can run on most modern consumer GPUs:

- A 3-billion-parameter (3B) model trained by ServiceNow
- A 7-billion-parameter (7B) model trained by Hugging Face
- A 15-billion-parameter (15B) model trained by Nvidia, the newest supporter of the StarCoder project

(Note that “parameters” are the parts of a model learned from training data and essentially define the skill of the model on a problem, in this case generating code.)
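To make that concrete: a parameter count is just the total number of learned weights and biases summed across a model’s layers. A toy Python sketch, with layer sizes invented purely for illustration:

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    # A fully connected layer learns one weight per input-output pair,
    # plus one bias per output unit.
    return n_in * n_out + n_out

# Toy two-layer network: 1024 -> 4096 -> 1024
total = dense_layer_params(1024, 4096) + dense_layer_params(4096, 1024)
print(f"{total:,} parameters")  # prints "8,393,728 parameters"
```

A real 3B-parameter model is simply this same bookkeeping carried across hundreds of much wider layers.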

Like most other code generators, StarCoder 2 can suggest ways to complete unfinished lines of code as well as summarize and retrieve snippets of code when asked in natural language. Trained with 4x more data than the original StarCoder (67.5 terabytes versus 6.4 terabytes), StarCoder 2 delivers what Hugging Face, ServiceNow and Nvidia characterize as “significantly” improved performance at lower costs to operate.
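Completion of unfinished code in the StarCoder family is typically done with “fill-in-the-middle” (FIM) prompting: the model sees the code before and after a gap and generates the missing middle. A minimal sketch of building such a prompt — the sentinel token names below follow the original StarCoder release and are an assumption here, so check the model card before relying on them:

```python
def fim_prompt(prefix: str, suffix: str) -> str:
    # The model generates text after <fim_middle>, conditioned on the
    # code before (<fim_prefix>) and after (<fim_suffix>) the gap.
    # Sentinel token names are assumed from the original StarCoder.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

prompt = fim_prompt("def fibonacci(n):\n    ", "\n    return a\n")
# The string would then be tokenized and passed to the model for generation.
print(prompt)
```

In practice a serving layer builds this prompt from the cursor position in the editor and streams the model’s output back as the suggested completion.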

StarCoder 2 can be fine-tuned “in a few hours” using a GPU like the Nvidia A100 on first- or third-party data to create apps such as chatbots and personal coding assistants. And, because it was trained on a larger and more diverse data set than the original StarCoder (~619 programming languages), StarCoder 2 can make more accurate, context-aware predictions — at least hypothetically.

“StarCoder 2 was created especially for developers who need to build applications quickly,” Harm de Vries, head of ServiceNow’s StarCoder 2 development team, told TechCrunch in an interview. “With StarCoder2, developers can use its capabilities to make coding more efficient without sacrificing speed or quality.”

Now, I’d venture to say that not every developer would agree with de Vries on the speed and quality points. Code generators promise to streamline certain coding tasks — but at a cost.

A recent Stanford study found that engineers who use code-generating systems are more likely to introduce security vulnerabilities in the apps they develop. Elsewhere, a poll from Sonatype, the cybersecurity firm, shows that the majority of developers are concerned about the lack of insight into how code from code generators is produced and “code sprawl” from generators producing too much code to manage.

StarCoder 2’s license might also prove to be a roadblock for some.

StarCoder 2 is licensed under the BigCode Open RAIL-M 1.0, which aims to promote responsible use by imposing “light touch” restrictions on both model licensees and downstream users. While less constraining than many other licenses, RAIL-M isn’t truly “open” in the sense that it doesn’t permit developers to use StarCoder 2 for every conceivable application (medical advice-giving apps are strictly off limits, for example). Some commentators say RAIL-M’s requirements may be too vague to comply with in any case — and that RAIL-M could conflict with AI-related regulations like the EU AI Act.

In response to the above criticism, a Hugging Face spokesperson had this to say via an emailed statement: “The license was carefully engineered to maximize compliance with current laws and regulations.”

Setting all this aside for a moment, is StarCoder 2 really superior to the other code generators out there — free or paid?

Depending on the benchmark, it appears to be more efficient than one version of Code Llama, Code Llama 33B. Hugging Face says that StarCoder 2 15B matches Code Llama 33B on a subset of code completion tasks at twice the speed — though it didn’t specify which tasks.

StarCoder 2, as an open source collection of models, also has the advantage of being able to deploy locally and “learn” a developer’s source code or codebase — an attractive prospect to devs and companies wary of exposing code to a cloud-hosted AI. In a 2023 survey from Portal26 and CensusWide, 85% of businesses said that they were wary of adopting GenAI like code generators due to the privacy and security risks — like employees sharing sensitive information or vendors training on proprietary data.

Hugging Face, ServiceNow and Nvidia also make the case that StarCoder 2 is more ethical — and less legally fraught — than its rivals.

All GenAI models can regurgitate — in other words, spit out near-verbatim copies of — data they were trained on. It doesn’t take an active imagination to see why this might land a developer in trouble. With code generators trained on copyrighted code, it’s entirely possible that, even with filters and additional safeguards in place, the generators could unwittingly recommend copyrighted code and fail to label it as such.
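As a crude illustration of the kind of safeguard vendors layer on top, generated output can be scanned for long verbatim overlaps with known training code, with anything above a threshold flagged for attribution. A toy sketch — the corpus and snippet are invented, and real filters operate at vastly larger scale:

```python
def longest_shared_run(generated: str, corpus: str) -> int:
    """Length of the longest substring of `generated` appearing verbatim in `corpus`."""
    best = 0
    n = len(generated)
    for i in range(n):
        # Only test runs strictly longer than the best found so far;
        # any shorter prefix of a known match is already in the corpus.
        j = i + best + 1
        while j <= n and generated[i:j] in corpus:
            best = j - i
            j += 1
    return best

corpus = "def add(a, b):\n    return a + b\n"       # stand-in for licensed training code
snippet = "result = add(a, b)\n    return a + b\n"  # stand-in for a model suggestion
print(longest_shared_run(snippet, corpus))  # prints 18
```

A production system would compare against an index of the full training set and attach license and provenance information to any flagged match rather than just counting characters.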

A few vendors, including GitHub, Microsoft (GitHub’s parent company) and Amazon, have pledged to provide legal coverage in situations where a code generator customer is accused of violating copyright. But coverage varies vendor-to-vendor and is generally limited to corporate clientele.

As opposed to code generators trained using copyrighted code (GitHub Copilot, among others), StarCoder 2 was trained only on data under license from Software Heritage, the nonprofit organization providing archival services for code. Ahead of StarCoder 2’s training, BigCode, the cross-organizational team behind much of StarCoder 2’s roadmap, gave code owners a chance to opt out of the training set if they wanted.

As with the original StarCoder, StarCoder 2’s training data is available for developers to fork, reproduce or audit as they please.

Leandro von Werra, a Hugging Face machine learning engineer and co-lead of BigCode, pointed out that while there’s been a proliferation of open code generators recently, few have been accompanied by information about the data that went into training them and, indeed, how they were trained.

“From a scientific standpoint, an issue is that training is not reproducible, but also as a data producer (i.e. someone uploading their code to GitHub), you don’t know if and how your data was used,” von Werra said in an interview. “StarCoder 2 addresses this issue by being fully transparent across the whole training pipeline from scraping pretraining data to the training itself.”

That said, StarCoder 2 isn’t perfect. Like other code generators, it’s susceptible to bias; de Vries notes that it can generate code with elements that reflect stereotypes about gender and race. And because StarCoder 2 was trained predominantly on English-language comments and on Python and Java code, it performs worse on languages other than English and on “lower-resource” code like Fortran and Haskell.

Still, von Werra asserts it’s a step in the right direction.

“We strongly believe that building trust and accountability with AI models requires transparency and auditability of the full model pipeline including training data and training recipe,” he said. “StarCoder 2 [showcases] how fully open models can deliver competitive performance.”

You might be wondering — as was this writer — what incentive Hugging Face, ServiceNow and Nvidia have to invest in a project like StarCoder 2. They’re businesses, after all — and training models isn’t cheap.

So far as I can tell, it’s a tried-and-true strategy: foster goodwill and build paid services on top of the open source releases.

ServiceNow has already used StarCoder to create Now LLM, a product for code generation fine-tuned for ServiceNow workflow patterns, use cases and processes. Hugging Face, which offers model implementation consulting plans, is providing hosted versions of the StarCoder 2 models on its platform. So is Nvidia, which is making StarCoder 2 available through an API and web front-end.

For devs expressly interested in the no-cost offline experience, StarCoder 2 — the models, source code and more — can be downloaded from the project’s GitHub page.

Google's image-generating AI gets an upgrade

The Google Inc. logo

Image Credits: David Paul Morris/Bloomberg / Getty Images

Google’s upgrading its image-generation tech to keep pace with rivals.

At the company’s I/O developer conference in Mountain View on Tuesday, Google announced Imagen 3, the latest in the tech giant’s Imagen generative AI model family.

Demis Hassabis, CEO of DeepMind, Google’s AI research division, said that Imagen 3 more accurately understands the text prompts that it translates into images versus its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer “distracting artifacts” and errors, he said.

“This is [also] our best model yet for rendering text, which has been a challenge for image-generation models,” Hassabis added.

To allay concerns around the potential to create deepfakes, Google says that Imagen 3 will use SynthID, an approach developed by DeepMind to apply invisible, cryptographic watermarks to media.

Sign-ups for Imagen 3 in private preview are available in Google’s ImageFX tool, and Google says the model will “come soon” to devs and corporate customers using Vertex AI, Google’s enterprise generative AI development platform.

Google Imagen 3
Image Credits: Google

Google typically doesn’t reveal much about the source of the data it uses to train its AI models — and this time was no exception. There’s a reason for that. Much of the training data comes from public sites, repositories and datasets around the web. And some of that training data, specifically the copyrighted data scraped without permission from content creators, is a source of IP-related lawsuits.

Google’s web publisher controls allow webmasters to prevent the company from scraping data, including photos and videos, from their websites. But Google doesn’t offer an “opt-out” tool for works already swept into existing training sets, and — unlike some of its rivals — the company hasn’t committed to compensating rights holders for their (in some cases unknowing) contributions to the training datasets.

The lack of transparency isn’t surprising. But it is disappointing — especially from a company with resources like Google’s.

