Snap previews its real-time image model that can generate AR experiences

A picture taken on October 1, 2019 in Lille shows the logo of mobile app Snapchat displayed on a tablet.

Image Credits: Denis Charlet/AFP / Getty Images

At the Augmented World Expo on Tuesday, Snap teased an early version of its real-time, on-device image diffusion model that can generate vivid AR experiences. The company also unveiled generative AI tools for AR creators.

Snap co-founder and CTO Bobby Murphy said onstage that the model is small enough to run on a smartphone and fast enough to re-render frames in real time, guided by a text prompt.

Murphy said that while the emergence of generative AI image diffusion models has been exciting, these models need to be significantly faster to be useful for augmented reality, which is why Snap's teams have been working to accelerate machine learning models.
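
Snap hasn’t published details of the model, but the general approach it describes, re-stylizing each camera frame with a prompt-guided, few-step diffusion pass, can be sketched with an off-the-shelf distilled model. The snippet below is a rough illustration using Hugging Face’s diffusers library and SD-Turbo on a server GPU; it is not Snap’s on-device pipeline, and the model ID, prompt and parameters are placeholders.

```python
# Illustrative only: Snap's model is proprietary and runs on-device. This sketch
# shows the general idea of prompt-guided, frame-by-frame image-to-image
# diffusion using an off-the-shelf distilled model (SD-Turbo) via Hugging Face
# diffusers, which needs only a couple of denoising steps per frame.
import torch
from diffusers import AutoPipelineForImage2Image
from PIL import Image

pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16
).to("cuda")

PROMPT = "a shimmering crystal world"  # the text prompt guiding the effect

def restyle_frame(frame: Image.Image) -> Image.Image:
    # A low step count and moderate strength keep latency down while
    # preserving most of the original frame's structure.
    return pipe(
        prompt=PROMPT,
        image=frame,
        strength=0.5,
        num_inference_steps=2,
        guidance_scale=0.0,
    ).images[0]

# In a real pipeline this would run on every incoming camera frame.
frame = Image.open("camera_frame.png").convert("RGB").resize((512, 512))
restyle_frame(frame).save("stylized_frame.png")
```

Distilled models like this matter here because one or two denoising steps per frame replace the dozens a standard diffusion model needs, which is what makes per-frame re-rendering plausible at all.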

Snapchat users will start to see Lenses with this generative model in the coming months, and Snap plans to bring it to creators by the end of the year. 

Image Credits: Snap

“This and future real time on device generative ML models speak to an exciting new direction for augmented reality, and is giving us space to reconsider how we imagine rendering and creating AR experiences altogether,” Murphy said.

Murphy also announced that Lens Studio 5.0 is launching today for developers, giving them access to new generative AI tools that will help them create AR effects much faster than currently possible, saving weeks or even months of work.

AR creators can create selfie Lenses by generating highly realistic ML face effects. Plus, they can generate custom stylization effects that apply a realistic transformation over the user’s face, body and surroundings in real time. Creators can also generate a 3D asset in minutes and include it in their Lenses. 

In addition, AR creators can generate characters like aliens or wizards with a text or image prompt using the company’s Face Mesh technology. They can also generate face masks, textures and materials within minutes.

The latest version of Lens Studio also includes an AI assistant that can answer questions that AR creators may have.

Confirmed: Photoroom, the AI image editor, raised $43M at a $500M valuation

Image Credits: PhotoRoom, under a license.

Photoroom, the Paris-based AI photo-editing app that has been growing like a weed by targeting people doing business online (while also attracting plenty of casual users), confirmed it has closed its latest funding round: $43 million at a $500 million valuation, according to CEO Matthieu Rouif, who co-founded the company with CTO Eliot Andres.

We were the first to report that the round was in the works in January. At the time, it looked like it would be more than $50 million; it ended up a little lower.

The funding comes at a time when the company continues to see a lot of adoption amid a pretty competitive market, with other players including the likes of Picsart, which has raised nearly $200 million, and Pixelcut. Photoroom said that it’s currently processing some 5 billion images annually, with its app passing 150 million downloads (it’s also available by way of an API, and via a web interface).
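
For the programmatic route, a background-removal request to Photoroom’s API might look roughly like the sketch below. The endpoint URL, header and form-field names are assumptions from memory rather than verified documentation, so treat them as placeholders and check Photoroom’s API docs before relying on them.

```python
# Hypothetical sketch of a Photoroom background-removal API call using the
# `requests` library. The endpoint, header name and form-field name below are
# assumptions; consult Photoroom's API documentation for the real request format.
import requests

API_KEY = "YOUR_PHOTOROOM_API_KEY"  # placeholder credential

with open("product_photo.jpg", "rb") as f:
    response = requests.post(
        "https://sdk.photoroom.com/v1/segment",  # assumed endpoint
        headers={"x-api-key": API_KEY},          # assumed auth header
        files={"image_file": f},                 # assumed field name
    )

response.raise_for_status()
with open("product_photo_cutout.png", "wb") as out:
    out.write(response.content)  # assumes the API returns the edited image bytes
```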

Balderton Capital led the round with new backer Aglaé and previous backer Y Combinator also participating. Other investors are not being disclosed, but previous backers include Kima Ventures, FJ Labs, Meta and a number of angels such as Yann LeCun, Zehan Wang (formerly of Magic Pony and Twitter), execs from Hugging Face and Disney+, and many more. This latest round brings the total raised by the company, which was founded about four years ago, to $64 million.

Photoroom plans to use the funding to hire more people and to continue investing in its R&D and infrastructure. At a time when we are still seeing a lot of layoffs across the tech industry, Photoroom has around 50 employees now and wants to double that by the end of this year. 

Specifically, unlike a lot of startups building AI applications, Photoroom has focused on training its own models from the ground up: That means the company needs to invest in compute power and ink deals for image rights with agencies and creators. This is also where the hiring will come into play: It’s looking for more technical talent to continue improving the efficiency and operation of those models. (Case in point: The company claims that its custom architecture speeds up image generation for users by up to 40% compared to other visual AI platforms.)

“The foundation model is the next step in empowering businesses to create amazing product photos without the need to be an expert at prompt engineering or photography,” said Rouif in a statement. “Our model has been trained to excel at product photography and can quickly adapt to user needs and feedback.”

Alongside its homegrown models, the company continues to roll out a number of new features. The funding announcement is coinciding with a new tool for creating product photography, Photoroom Instant Diffusion, which aims to create images that look consistently styled for a single seller, regardless of where and how they were shot (essentially so that the product images look like they were produced in a professional studio). Other features it offers include AI-generated backgrounds, scene expansions, AI-generated images and a plethora of image-editing tools. For those working with images in bulk, its tools can also process those automatically in one go.

We’re hoping to talk to Rouif in person, and possibly an investor, and we’ll update this post when and if we reach them. (They didn’t contact us ahead of time!)

“Balderton has witnessed Photoroom’s remarkable journey from its inception, and we are continually impressed by their ability to lead and execute on their user-centric vision,” said Bernard Liautaud, managing partner at Balderton, in a statement. “Photoroom’s generative AI capabilities are unparalleled, and we have no doubt that they will continue to lead the way in this rapidly evolving landscape.”

Adobe Firefly

Adobe claims its new image-generation model is its best yet

Adobe Firefly

Image Credits: Adobe

Firefly, Adobe’s family of generative AI models, doesn’t have the best reputation among creatives.

The Firefly image-generation model in particular has been derided as underwhelming and flawed compared to Midjourney, OpenAI’s DALL-E 3, and other rivals, with a tendency to distort limbs and landscapes and miss the nuances in prompts. But Adobe is trying to right the ship with its third-generation model, Firefly Image 3, releasing this week during the company’s Max London conference.

The model, now available in Photoshop (beta) and Adobe’s Firefly web app, produces more “realistic” imagery than its predecessors (Image 1 and Image 2), thanks to an ability to understand longer, more complex prompts and scenes as well as improved lighting and text-generation capabilities. It should more accurately render things like typography, iconography, raster images and line art, says Adobe, and is “significantly” more adept at depicting dense crowds and people with “detailed features” and “a variety of moods and expressions.”

For what it’s worth, judging by a brief, unscientific comparison, Image 3 does appear to be a step up from Image 2.

I wasn’t able to try Image 3 myself. But Adobe PR sent a few outputs and prompts from the model, and I ran those same prompts through Image 2 on the web to generate samples to compare against. (Keep in mind that the Image 3 outputs could’ve been cherry-picked.)

Notice the lighting in this headshot from Image 3 compared to the one below it, from Image 2:

Adobe Firefly
From Image 3. Prompt: “Studio portrait of young woman.” Image Credits: Adobe
Adobe Firefly
Same prompt as above, from Image 2. Image Credits: Adobe

The Image 3 output looks more detailed and lifelike to my eyes, with shadowing and contrast that’s largely absent from the Image 2 sample.

Here’s a set of images showing Image 3’s scene understanding at play:

Adobe Firefly
From Image 3. Prompt: “An artist in her studio sitting at desk looking pensive with tons of paintings and ethereal.” Image Credits: Adobe
Adobe Firefly
“An artist in his studio sitting at desk looking pensive with tons of paintings and ethereal.” From Image 2. Image Credits: Adobe

Note the Image 2 sample is fairly basic compared to the Image 3 output in terms of detail and overall expressiveness. There’s some wonkiness in the Image 3 subject’s shirt (around the waist), but the pose is more complex than that of the Image 2 subject. (And Image 2’s clothes are also a bit off.)

Some of Image 3’s improvements can no doubt be traced to a larger and more diverse training dataset.

Like Image 2 and Image 1, Image 3 is trained on uploads to Adobe Stock, Adobe’s royalty-free media library, along with licensed and public domain content for which the copyright has expired. Adobe Stock grows all the time, and consequently so, too, does the available training dataset.

In an effort to ward off lawsuits and position itself as a more “ethical” alternative to generative AI vendors who train on images indiscriminately (e.g., OpenAI, Midjourney), Adobe has a program to pay Adobe Stock contributors whose work ends up in the training dataset. (We’ll note that the terms of the program are rather opaque, though.) Controversially, Adobe also trains Firefly models on AI-generated images, which some consider a form of data laundering.

Recent Bloomberg reporting revealed AI-generated images in Adobe Stock aren’t excluded from Firefly image-generating models’ training data, a troubling prospect considering those images might contain regurgitated copyrighted material. Adobe has defended the practice, claiming that AI-generated images make up only a small portion of its training data and go through a moderation process to ensure they don’t depict trademarks or recognizable characters or reference artists’ names.

Of course, neither diverse, more “ethically” sourced training data nor content filters and other safeguards guarantee a perfectly flaw-free experience — see users generating people flipping the bird with Image 2. The real test of Image 3 will come once the community gets its hands on it.

New AI-powered features

Image 3 powers several new features in Photoshop beyond enhanced text-to-image.

A new “style engine” in Image 3, along with a new auto-stylization toggle, allows the model to generate a wider array of colors, backgrounds and subject poses. They feed into Reference Image, an option that lets users condition the model on an image whose colors or tone they want their future generated content to align with.

Three new generative tools — Generate Background, Generate Similar and Enhance Detail — leverage Image 3 to perform precision edits on images. The (self-descriptive) Generate Background replaces a background with a generated one that blends into the existing image, while Generate Similar offers variations on a selected portion of a photo (e.g., a person or an object). As for Enhance Detail, it “fine-tunes” images to improve sharpness and clarity.
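
Adobe hasn’t said how these tools are built, but a feature like Generate Background maps onto the generic diffusion “inpainting” technique, where a mask marks the region to regenerate from a prompt. Here’s a minimal sketch with an open model via Hugging Face’s diffusers library; it is not Firefly’s implementation, and the model ID, prompt and file names are placeholders.

```python
# Generic diffusion inpainting, not Firefly's implementation: the mask marks the
# background region to be regenerated from a text prompt while the foreground
# subject is kept. Uses an open Stable Diffusion inpainting model via diffusers.
import torch
from diffusers import AutoPipelineForInpainting
from PIL import Image

pipe = AutoPipelineForInpainting.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("product.png").convert("RGB").resize((512, 512))
mask = Image.open("background_mask.png").convert("L").resize((512, 512))  # white = regenerate

result = pipe(
    prompt="soft-lit marble tabletop in a bright photo studio",
    image=image,
    mask_image=mask,
).images[0]
result.save("product_new_background.png")
```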

If these features sound familiar, that’s because they’ve been in beta in the Firefly web app for at least a month (and Midjourney for much longer than that). This marks their Photoshop debut — in beta.

Speaking of the web app, Adobe isn’t neglecting this alternate route to its AI tools.

To coincide with the release of Image 3, the Firefly web app is getting Structure Reference and Style Reference, which Adobe’s pitching as new ways to “advance creative control.” (Both were announced in March, but they’re now becoming widely available.) With Structure Reference, users can generate new images that match the “structure” of a reference image — say, a head-on view of a race car. Style Reference is essentially style transfer by another name, preserving the content of an image (e.g., elephants on an African safari) while mimicking the style (e.g., pencil sketch) of a target image.
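
Adobe hasn’t described what’s under the hood of Style Reference, but the classic technique it resembles, neural style transfer, works by matching feature statistics (Gram matrices) between a generated image and a style image. The PyTorch sketch below shows that loss in isolation, with random tensors standing in for the CNN feature maps a pretrained network (e.g., VGG) would produce; it illustrates the general idea, not Firefly’s method.

```python
# A minimal sketch of the Gram-matrix style loss from classic neural style
# transfer (Gatys et al.), which Style Reference conceptually resembles. Random
# tensors stand in for the per-layer feature maps of a pretrained CNN.
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, channels, height, width) feature maps from some CNN layer
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    # Channel-by-channel correlations, normalized by the number of elements.
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(generated_feats: torch.Tensor, style_feats: torch.Tensor) -> torch.Tensor:
    # Match the feature correlations (the "style") of the target image.
    return torch.nn.functional.mse_loss(gram_matrix(generated_feats), gram_matrix(style_feats))

# Stand-in feature maps; in practice these come from a fixed, pretrained network.
gen = torch.rand(1, 64, 128, 128, requires_grad=True)
style = torch.rand(1, 64, 128, 128)

loss = style_loss(gen, style)
loss.backward()  # gradients flow to the generated image, not the network
print(float(loss))
```

In the full recipe, this style loss is combined with a content loss on deeper-layer features, and the pixels of the generated image are optimized directly against both.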

Here’s Structure Reference in action:

Adobe Firefly
Original image. Image Credits: Adobe
Adobe Firefly
Transformed with Structure Reference. Image Credits: Adobe

And Style Reference:

Adobe Firefly
Original image. Image Credits: Adobe
Adobe Firefly
Transformed with Style Reference. Image Credits: Adobe

I asked Adobe if, with all the upgrades, Firefly image-generation pricing would change. Currently, the cheapest Firefly premium plan is $4.99 per month — undercutting competition like Midjourney ($10 per month) and OpenAI (which gates DALL-E 3 behind a $20-per-month ChatGPT Plus subscription).

Adobe said that its current tiers will remain in place for now, along with its generative credit system. It also said that its indemnity policy, which states Adobe will pay copyright claims related to works generated in Firefly, won’t be changing either, nor will its approach to watermarking AI-generated content. Content Credentials — metadata to identify AI-generated media — will continue to be automatically attached to all Firefly image generations on the web and in Photoshop, whether generated from scratch or partially edited using generative features.

Google's image-generating AI gets an upgrade

The Google Inc. logo

Image Credits: David Paul Morris/Bloomberg / Getty Images

Google’s upgrading its image-generation tech to keep pace with rivals.

At the company’s I/O developer conference in Mountain View on Tuesday, Google announced Imagen 3, the latest in the tech giant’s Imagen generative AI model family.

Demis Hassabis, CEO of DeepMind, Google’s AI research division, said that Imagen 3 more accurately understands the text prompts that it translates into images versus its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer “distracting artifacts” and errors, he said.

“This is [also] our best model yet for rendering text, which has been a challenge for image-generation models,” Hassabis added.

To allay concerns around the potential to create deepfakes, Google says that Imagen 3 will use SynthID, an approach developed by DeepMind to apply invisible, cryptographic watermarks to media.

Sign-ups for Imagen 3 in private preview are available in Google’s ImageFX tool, and Google says the model will “come soon” to devs and corporate customers using Vertex AI, Google’s enterprise generative AI development platform.
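
When Imagen 3 does reach Vertex AI, calling it will presumably mirror how current Imagen models are called through the Vertex AI Python SDK. The sketch below uses today’s image-generation client with an existing Imagen 2 model ID; the project, location and prompt are placeholders, and Imagen 3’s eventual model name is unknown, so treat the whole flow as an assumption based on the existing API.

```python
# Hypothetical sketch based on the current Vertex AI Imagen API; Imagen 3's model
# ID wasn't public at the time, so "imagegeneration@006" (an Imagen 2 endpoint)
# stands in here. Project, location and prompt are placeholders.
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

vertexai.init(project="your-gcp-project", location="us-central1")

model = ImageGenerationModel.from_pretrained("imagegeneration@006")
response = model.generate_images(
    prompt="a wool felt diorama of a lighthouse at sunset",
    number_of_images=1,
)
response.images[0].save(location="imagen_output.png")
```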

Google Imagen 3
Image Credits: Google

Google typically doesn’t reveal much about the source of the data it uses to train its AI models — and this time was no exception. There’s a reason for that. Much of the training data comes from public sites, repositories and datasets around the web. And some of that training data, specifically the copyrighted data scraped without permission from content creators, is a source of IP-related lawsuits.

Google’s web publisher controls allow webmasters to prevent the company from scraping data, including photos and videos, from their websites. But Google doesn’t offer an “opt-out” tool, and — unlike some of its rivals — the company hasn’t committed to compensating rights holders for their (in some cases unknowing) contributions to the training datasets.
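
In practice, those publisher controls largely come down to a robots.txt token, Google-Extended, that site owners can disallow. A quick way to check whether a given site blocks it, using only the Python standard library:

```python
# Checks whether a site's robots.txt disallows the Google-Extended user-agent
# token, the robots.txt control Google points publishers to for keeping content
# out of its AI training. Standard library only.
from urllib.robotparser import RobotFileParser

def blocks_google_extended(site: str, path: str = "/") -> bool:
    rp = RobotFileParser()
    rp.set_url(f"{site.rstrip('/')}/robots.txt")
    rp.read()  # fetches and parses the live robots.txt
    return not rp.can_fetch("Google-Extended", f"{site.rstrip('/')}{path}")

if __name__ == "__main__":
    print(blocks_google_extended("https://example.com"))
```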

The lack of transparency isn’t surprising. But it is disappointing — especially from a company with resources like Google’s.
