Airbnb or hotel? Overmoon's vacation rental model aims to combine the best of both

Overmoon vacation rental bedroom

Image Credits: Overmoon

Historically, vacation rental companies have managed homes on behalf of their owners. While this model has proven popular, it has its drawbacks: customer complaints about quality get routed through the company to the homeowner, which makes offering a consistent guest experience all but impossible.

Overmoon is a three-year-old vacation rental startup with a different model that essentially cuts out the middleman. Rather than serving as a marketplace that matches travelers with property owners, the company owns the homes outright, giving it more control over their quality and maintenance. It also offers concierge services, such as prestocking the refrigerator.

“Even Brian Chesky recently said that Airbnb is kind of broken,” said CEO and founder Joe Fraiman. “You can have a fantastic travel experience in a vacation rental and you can have a terrible one — and it’s hard to know before you get there, what it’s going to be, and that’s frustrating for customers… In general, the biggest difficulty with vacation rentals or short-term rentals in general is just a lack of consistency, a lack of reliability. You just never know what you’re gonna get. Hotel brands solved this problem many years ago.”

In 2023, Overmoon hosted 4,000 guests after more than quadrupling the number of homes it owned, from five to 22. It also more than quadrupled the revenue it earned on those homes, according to Fraiman. It earns rental revenue as well as revenue from the concierge services.

The startup is also emerging from stealth today with a new exchange platform that gives vacation rental owners a way to contribute their homes to a multi-property fund through a 721 exchange. The benefit, according to Fraiman, is that owners defer the capital gains tax that would come with selling a second home. Overmoon also takes over the responsibility and costs of property management and maintenance while paying the former owners income in the form of fund distributions.

Now, it expects to earn additional revenue through its new 721 exchange, which was launched in partnership with Flock Homes, a startup that operates a similar exchange for single-family rental properties.

Overmoon previously raised $10 million in venture capital in 2021 in a previously undisclosed round from investors such as NFX, Khosla Ventures, Camber Creek, 1Sharpe and Sunsar Capital. It also raised $30 million in financing from a variety of real estate investors, including family offices, high-net-worth individuals and wealth management firms; those investors receive partial ownership of the houses and dividends from the rental income. And it secured $40 million in real estate debt over the past year.

Advance bookings (i.e., bookings for the following 12 months) grew over 800% per home from January 1, 2023, to January 1, 2024, Fraiman noted. As a result, Overmoon earned “premier host status” on both Airbnb and VRBO.

The startup plans to use its new capital in part to purchase more homes in 2024. For now, Overmoon is concentrating on southeastern markets such as Florida and Tennessee, as well as the Sun Belt.

“The 721 exchange program is another way to add great homes to our portfolio,” Fraiman said. “The more homes on our platform, the more we earn.”

Overmoon operates an opco/propco model, meaning one company owns the real estate while a separate company builds the technology and functions like a tech company.

The current interest rate environment, in which mortgage rates surged to nearly 8% in 2023, has hurt many proptech companies. TechCrunch has recently covered the struggles and shutdowns of several such startups, including Here, Frontdesk and Zeus Living. But Fraiman doesn’t believe high interest rates are why so many proptech companies are failing.

“I think the inability to raise capital is the real reason and interest rates are a contributing factor,” he said.

Fraiman actually believes the high interest rates spell opportunity.

“They allow us to buy homes cheaper than we could have a couple of years ago,” he said. “In the fourth quarter, on average we paid 17% below asking price in the transactions we closed. Plus, higher interest rates means fewer buyers out there, which means less competition. You can always refinance debt as rates come down. But you can’t go back and change the price you paid for your asset.”

NFX General Partner Pete Flint, who also founded Trulia, said he was drawn to backing Overmoon because in it, he saw “a unique opportunity for owners to efficiently manage their estates, while maintaining passive income and real estate appreciation potential.”


Building a viable pricing model for generative AI features could be challenging

Robot holds a green check mark and red x on a purple background.

Image Credits: tommy / Getty Images

In October, Box unveiled a new pricing approach for the company’s generative AI features. Instead of a flat rate, the company designed a unique consumption-based model.

Each user gets 20 credits per month, with each AI task costing a single credit. After that, employees can dip into a shared company pool of 2,000 additional credits. If a customer burns through that too, it’s time for a conversation with a salesperson about buying additional credits.
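
Box hasn’t published how it meters this, but the accounting described above is simple enough to sketch. Here’s a minimal, hypothetical Python model of the scheme; all names and structure are invented for illustration:

```python
# Hypothetical sketch of Box-style credit accounting: 20 credits per user
# per month, one credit per AI task, then a shared 2,000-credit company
# pool, then a conversation with sales.
class AICredits:
    USER_MONTHLY = 20
    COMPANY_POOL = 2000

    def __init__(self, num_users):
        self.user_credits = {u: self.USER_MONTHLY for u in range(num_users)}
        self.pool = self.COMPANY_POOL

    def charge(self, user):
        """Charge one credit for a single AI task."""
        if self.user_credits[user] > 0:
            self.user_credits[user] -= 1       # personal allowance first
        elif self.pool > 0:
            self.pool -= 1                     # then the shared pool
        else:
            return "talk to sales about buying additional credits"
        return "ok"

billing = AICredits(num_users=50)
for _ in range(25):                            # one heavy user runs 25 tasks
    status = billing.charge(user=0)
print(status, billing.pool)                    # "ok" 1995: 5 tasks hit the pool
```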

Box CEO Aaron Levie explained that this approach provides a way to charge based on usage with the understanding that some users would take advantage of the AI features more than others, while also accounting for the cost of using the OpenAI API, which the company is using for its underlying large language model.

Meanwhile, Microsoft has chosen a more traditional pricing model, announcing in November that it would charge $30 per user per month to use its Copilot features, over and above the cost of a normal monthly Office 365 subscription, which varies by customer.

It became clear throughout last year that enterprise software companies would be building generative AI features. At a Web Summit panel in November on generative AI’s impact on SaaS companies, Christine Spang, co-founder and CTO at Nylas, a communications API startup, and Manny Medina, CEO at sales enablement platform Outreach, spoke about the challenges SaaS companies face as they implement these features.

Spang says, for starters, that in spite of the hype, generative AI is clearly a big leap forward, and software companies need to look for ways to incorporate it into their products. “I’m not going to say it’s like 10 out of 10 where the hype meets the [current] reality, but I do think there is real value there and what’s really going to make the difference is how people take the technology and connect it to other systems, other apps and sort of drive real value in different use cases with it,” she said.

It’s also about finding a balance between providing the kind of features that customers are suddenly demanding and pricing them in a way that delivers real customer value while still allowing the company to make money. “In reality, those of us who are bundling [generative AI features] need to repeatedly check back with our [large language model] provider, and that’s going to get expensive quickly. So until we create experiences that are 10x differentiated, and for which somebody wants to pay for it, it’s going to be challenging,” Medina said.

It’s worth noting that model makers like OpenAI are already announcing price cuts as they find ways to run models more efficiently, or cut prices on older products as new ones are announced. For example, in June, the company announced some new features that increase processing power, which provide more bang for the buck, while also lowering the cost of prior versions for developers who don’t require all the latest bells and whistles.

Spang says her company is already using a consumption model based on the number of connected email or calendar applications, and she expects to follow a similar approach as they add generative AI features.

“We already have the case where some people send a lot more messages, or they receive a lot more messages and I think it’s important to map [to a similar pricing model] that people understand, and then hopefully we can find a price point that kind of works through the median,” she said.

But Medina says it’s harder for an application vendor to use a consumption model than for an API provider like Nylas. “I just don’t know that that’s an acceptable model in applications. When you’re a provider of Legos [like Nylas], it’s a different story, but for application providers, [it’s more difficult],” he said.

But it’s also not clear that companies will be willing to pay a flat rate like Microsoft’s $30 a month per user for Office 365, unless they can see real value from that additional cost. “The jury’s still out until somebody either lowers the cost and it makes it very accessible for the rest of us, or we figure out a way to monetize it,” Medina said.

One big unknown is the compliance cost that could come with using this technology, which remains a big open question for companies and their customers. “So if you start embedding some of these applications and the U.S. [or other government] passes a law where you have to disclose the list of ingredients of your AI, you’re not getting that from OpenAI, so that’s going to be difficult,” he said.

CIOs who control the company technology budget are taking a close look at this technology, but they are still trying to figure out if the extra cost being passed on to them will pay for itself in terms of higher employee productivity.

Sharon Mandell, CIO at Juniper Networks, says she is looking closely at the ROI on these features. “In 2024, we’re going to be testing the GenAI hype, because if those tools can produce the types of benefits that they say, then the ROI on those is high and may help us eliminate other things,” she said. So she and other CIOs are running pilots, moving cautiously and trying to find ways to measure whether there is truly a productivity increase to justify the increased cost.

Regardless, companies will continue to experiment with pricing models, while their customers are conducting pilots and proofs of concept. It seems like they both have something to gain here, but until we start to see more of these tools in production, it’s hard to know the real benefits to everyone involved.

Jua raises $16M to build a foundational AI model for the natural world, starting with the weather

Dice with Weather symbols

Image Credits: Dimitri Otis / Getty Images

Large AI models — trained on big troves of language, vision and audio data — are shaping up to be as significant in the development of AI as operating systems have been in the development of smartphones: they are, in a way, looking like the platforms of the space (an idea others are noodling on, too). Now, a Swiss startup called Jua is applying that paradigm with ambitions to build out a new frontier for how AI might be used in the physical world. It has picked up $16 million to build what is essentially a large “physics” model for the natural world.

The company is still very early stage. Its first application will be in modeling and predicting weather and climate patterns, initially in how they relate to players in the energy industry. This is due to launch in the coming weeks, the company said. Other industries that it plans to target with its model include agriculture, insurance, transportation and government.

468 Capital and the Green Generation Fund are co-leading this seed round for the Zurich-based startup, with Promus Ventures, Kadmos Capital, Flix Mobility founders, Session.vc, Virtus Resources Partners, Notion.vc and InnoSuisse also participating.

Andreas Brenner, Jua’s CEO, who co-founded the company with CTO Marvin Gabler, says that the increasing “volatility” of climate change and geopolitics has created a need among organizations that work in the physical world — whether in industrial areas like energy and agriculture or something else — for more accurate modeling and forecasting. 2023 was a high-water-mark year for climate disasters, according to the U.S. National Centers for Environmental Information, resulting in tens of billions of dollars in damage. It’s this state of affairs that is driving organizations to put better planning tools in place, not to mention better predictive tools for market analysts and others who use that data.

This is, in a way, not a new problem — nor even a problem that technologists have not already been addressing with AI.

Google’s DeepMind division has built GraphCast; Nvidia has FourCastNet; Huawei has Pangu, which last year launched a weather component that drew a flurry of interest. There are also projects underway to build AI models out of weather data to home in on other natural occurrences, as highlighted just last week in a report about a team trying to bring new understanding to bird migration patterns.

Jua’s response to that is twofold. First, it believes its model is better than these others, in part because it ingests more information and is larger — by a multiple of 20x over GraphCast, it claims. Second, weather is just the starting point for a wider set of physical questions, answers and challenges.

“Businesses must improve their capabilities to respond to all this [climate] volatility,” he said. “So in the short term, that is the problem we are solving. But looking into the future, we are building the first foundational model for the natural world… We’re essentially building a machine model that is learning physics… and that is one of the key pillars for achieving artificial general intelligence because just understanding language isn’t enough.”

The company has yet to launch its first products, but the leap of faith that investors are taking is not just couched in hype for all things AI.

Before Jua, Gabler headed up research at Q.met, a longtime player in weather forecasting, and worked on deep learning technology for the German government. Brenner has worked in the energy sector and previously founded a fleet management software startup. Taken together, those experiences bridge not just technical awareness of the problems and potential solutions but also firsthand understanding of how the industry experiences them.

It’s also showing some early work to investors and would-be customers, getting their input on data, as it continues to develop the product.

One aim seems to be to take a new approach to what goes into the predictive models. When building a weather prediction model, for example, Brenner said that “using weather stations is pretty obvious.” But in addition to that, Jua is ingesting what he describes as “much more noisy data,” including recent satellite imagery, topography and other “more novel, recent data,” to build its models. “The key difference is we are building this end-to-end system where all of the data that used to be used in different steps of the value chain is now all brought into the same pool,” he explained. The company says it has around 5 petabytes (5,000 terabytes) of training data, versus some 45 terabytes for GPT-3 and (reportedly) 1 petabyte for GPT-4. (Understand that a language model may well need less data than a physical-world model, though.)
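
For a rough sense of scale, those figures work out as follows (a quick, illustrative calculation using the article’s numbers):

```python
# Rough scale comparison of the quoted training-data sizes, in terabytes.
# (GPT-4's figure is reported, not confirmed; 1 PB = 1,000 TB.)
jua_tb, gpt3_tb, gpt4_tb = 5_000, 45, 1_000

print(f"Jua vs GPT-3: ~{jua_tb / gpt3_tb:.0f}x more data")   # ~111x
print(f"Jua vs GPT-4: ~{jua_tb / gpt4_tb:.0f}x more data")   # ~5x
```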

Another aim — not a small one — is that the company is trying to build something more efficient to bring down operational costs for itself and for customers. “Our system uses 10,000 times less compute than the legacy systems,” Brenner said.

It’s notable that Jua is emerging and getting funding at this moment in particular.

Foundational models are shaping up to be the cornerstone of how the next generation of AI applications are being developed, so the companies that are building and controlling foundational models hold a lot of value and potential power.

The biggest movers and shakers in this area right now are companies like OpenAI, Google, Microsoft, Anthropic, Amazon and Meta: all U.S. businesses. That has spurred activity in other parts of the world, such as Europe, to seek out and fund homegrown champions as alternatives. Notably, 468 Capital also backs Germany’s Aleph Alpha, which — like the foundational model players in the U.S. — is building large language models, but seemingly in closer collaboration with potential customers. (One of its taglines is “Sovereignty in the AI era.”)

“Andreas, Marvin and the team are building the world’s first foundation AI for physics and the natural world, which will be capable of providing powerful insights for a wide range of industries dependent on true understanding of nature, from insurance companies and chemical and energy providers, to disaster planning teams, organisations in agriculture, airlines and aid charities,” said Ludwig Ensthaler, a general partner at 468 Capital, in a statement.

There is a definite “good guy” feel about an AI company setting out to make better sense of how climate change is affecting us, to aid in disaster planning and perhaps even, one day, to help us understand how to mitigate environmental damage. And the bigger picture for a startup building an AI that can understand the physical world is that, potentially, it can be applied to a much wider set of challenges in materials science, biomedicine, chemistry and much more. In addition to questions about the feasibility of the model itself, though, the prospect carries many of the same questions facing other kinds of AI models, around safety, reliability and more, something Jua is already thinking about, even if in rudimentary terms for now.

“In order for models to work and to be accepted, you need to enforce consistency,” said Gabler. “You need to make sure the models actually learn physics from the ground up to solve problems correctly.”

Largest text-to-speech AI model yet shows 'emergent abilities'

Illustration of a robot in a laptop

Image Credits: Carol Yepes / Getty Images

Researchers at Amazon have trained the largest text-to-speech model yet, which they claim exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally. The breakthrough could be what the technology needs to escape the uncanny valley.

These models were always going to grow and improve, but the researchers specifically hoped to see the kind of leap in ability that we observed once language models got past a certain size. For reasons unknown to us, once LLMs grow past a certain point, they start being way more robust and versatile, able to perform tasks they weren’t trained to.

That is not to say they are gaining sentience or anything, just that past a certain point their performance on certain conversational AI tasks hockey sticks. The team at Amazon AGI — no secret what they’re aiming at — thought the same might happen as text-to-speech models grew as well, and their research suggests this is in fact the case.

The new model is called Big Adaptive Streamable TTS with Emergent abilities, which they have contorted into the abbreviation BASE TTS. The largest version of the model uses 100,000 hours of public domain speech, 90% of which is in English, the remainder in German, Dutch and Spanish.

At 980 million parameters, BASE-large appears to be the biggest model in this category. They also trained 400M- and 150M-parameter models based on 10,000 and 1,000 hours of audio respectively, for comparison — the idea being, if one of these models shows emergent behaviors but another doesn’t, you have a range for where those behaviors begin to emerge.

As it turns out, the medium-sized model showed the jump in capability the team was looking for, not necessarily in ordinary speech quality (it is reviewed better but only by a couple points) but in the set of emergent abilities they observed and measured. Here are examples of tricky text mentioned in the paper:

Compound nouns: The Beckhams decided to rent a charming stone-built quaint countryside holiday cottage.
Emotions: “Oh my gosh! Are we really going to the Maldives? That’s unbelievable!” Jennie squealed, bouncing on her toes with uncontained glee.
Foreign words: “Mr. Henry, renowned for his mise en place, orchestrated a seven-course meal, each dish a pièce de résistance.”
Paralinguistics (i.e. readable non-words): “Shh, Lucy, shhh, we mustn’t wake your baby brother,” Tom whispered, as they tiptoed past the nursery.
Punctuations: She received an odd text from her brother: ’Emergency @ home; call ASAP! Mom & Dad are worried…#familymatters.’
Questions: But the Brexit question remains: After all the trials and tribulations, will the ministers find the answers in time?
Syntactic complexities: The movie that De Moya who was recently awarded the lifetime achievement award starred in 2022 was a box-office hit, despite the mixed reviews.

“These sentences are designed to contain challenging tasks – parsing garden-path sentences, placing phrasal stress on long-winded compound nouns, producing emotional or whispered speech, or producing the correct phonemes for foreign words like “qi” or punctuations like “@” – none of which BASE TTS is explicitly trained to perform,” the authors write.

Such features normally trip up text-to-speech engines, which will mispronounce, skip words, use odd intonation or make some other blunder. BASE TTS still had trouble, but it did far better than its contemporaries — models like Tortoise and VALL-E.

There are a bunch of examples of these difficult texts being spoken quite naturally by the new model at the site they made for it. Of course these were chosen by the researchers, so they’re necessarily cherry-picked, but it’s impressive regardless. Here are a couple, if you don’t feel like clicking through:

https://techcrunch.com/wp-content/uploads/2024/02/shh-its-starting.wav
https://techcrunch.com/wp-content/uploads/2024/02/how-french.wav
https://techcrunch.com/wp-content/uploads/2024/02/guiding-moonlight.wav

Because the three BASE TTS models share an architecture, it seems clear that the size of the model and the extent of its training data are what drive its ability to handle some of the above complexities. Bear in mind this is still an experimental model and process, not a commercial model or anything. Later research will have to identify the inflection point for emergent ability and how to train and deploy the resulting model efficiently.

A representative for Amazon AI, Leo Zao (not an author of the paper), wrote that they don’t make any claims of exclusive emergent properties here.

“We think it’s premature to conclude that such emergence won’t appear in other models. Our proposed emergent abilities test set is one way to quantify this emergence, and it is possible that applying this test set to other models could produce similar observations. This is partly why we decided to release this test set publicly,” he wrote in an email. “It is still early days for a ‘Scaling Law’ for TTS, and we look forward to more research on this topic.”

Notably, this model is “streamable,” as the name says — meaning it doesn’t need to generate whole sentences at once but goes moment by moment at a relatively low bitrate. The team has also attempted to package the speech metadata like emotionality, prosody and so on in a separate, low-bandwidth stream that could accompany vanilla audio.

It seems that text-to-speech models may have a breakout moment in 2024 — just in time for the election! But there’s no denying the usefulness of this technology, for accessibility in particular. The team does note that it declined to publish the model’s source and other data due to the risk of bad actors taking advantage of it. The cat will get out of that bag eventually, though.

Google's new Gemini model can analyze an hour-long video — but few people can use it

illustration featuring Google's Bard logo

Image Credits: TechCrunch

Last October, a research paper published by a Google data scientist, Databricks CTO Matei Zaharia and UC Berkeley professor Pieter Abbeel posited a way to allow GenAI models — i.e. models along the lines of OpenAI’s GPT-4 and ChatGPT — to ingest far more data than was previously possible. In the study, the co-authors demonstrated that, by removing a major memory bottleneck for AI models, they could enable models to process millions of words, as opposed to hundreds of thousands — the maximum of the most capable models at the time.

AI research moves fast, it seems.

Today, Google announced the release of Gemini 1.5 Pro, the newest member of its Gemini family of GenAI models. Designed to be a drop-in replacement for Gemini 1.0 Pro (which formerly went by “Gemini Pro 1.0” for reasons known only to Google’s labyrinthine marketing arm), Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process.

Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code — 35x the amount Gemini 1.0 Pro can handle. And — the model being multimodal — it’s not limited to text. Gemini 1.5 Pro can ingest up to 11 hours of audio or an hour of video in a variety of different languages.

Google Gemini 1.5 Pro
Image Credits: Google

To be clear, that’s an upper bound.

The version of Gemini 1.5 Pro available to most developers and customers starting today (in a limited preview) can only process ~100,000 words at once. Google’s characterizing the large-data-input Gemini 1.5 Pro as “experimental,” allowing only developers approved as part of a private preview to pilot it via the company’s GenAI dev tool AI Studio. Several customers using Google’s Vertex AI platform also have access to the large-data-input Gemini 1.5 Pro — but not all.

Still, VP of research at Google DeepMind Oriol Vinyals heralded it as an achievement.

“When you interact with [GenAI] models, the information you’re inputting and outputting becomes the context, and the longer and more complex your questions and interactions are, the longer the context the model needs to be able to deal with gets,” Vinyals said during a press briefing. “We’ve unlocked long context in a pretty massive way.”

Big context

A model’s context, or context window, refers to input data (e.g. text) that the model considers before generating output (e.g. additional text). A simple question — “Who won the 2020 U.S. presidential election?” — can serve as context, as can a movie script, email or e-book.

Models with small context windows tend to “forget” the content of even very recent conversations, leading them to veer off topic — often in problematic ways. This isn’t necessarily so with models with large contexts. As an added upside, large-context models can better grasp the narrative flow of data they take in and generate more contextually rich responses — hypothetically, at least.

There have been other attempts at — and experiments on — models with atypically large context windows.

AI startup Magic claimed last summer to have developed a large language model (LLM) with a 5 million-token context window. Two papers in the past year detail model architectures ostensibly capable of scaling to a million tokens — and beyond. (“Tokens” are subdivided bits of raw data, like the syllables “fan,” “tas” and “tic” in the word “fantastic.”) And recently, a group of scientists hailing from Meta, MIT and Carnegie Mellon developed a technique that they say removes the constraint on model context window size altogether.
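
To make “tokens” concrete, here’s a quick sketch using OpenAI’s open source tiktoken library; it’s one tokenizer among many (Google, Meta and others use their own), so exact splits and counts vary by vendor:

```python
# Show how a tokenizer splits text into token ids and sub-word pieces.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")   # the GPT-4-era encoding
for text in ["fantastic", "sesquipedalian"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    # Common words often map to a single token; rarer words split apart.
    print(text, "->", len(ids), "token(s):", pieces)
```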

But Google is the first to make a model with a context window of this size commercially available, beating the previous leader Anthropic’s 200,000-token context window — if a private preview counts as commercially available.

Google Gemini 1.5 Pro
Image Credits: Google

Gemini 1.5 Pro’s maximum context window is 1 million tokens, and the version of the model more widely available has a 128,000-token context window, the same as OpenAI’s GPT-4 Turbo.

So what can one accomplish with a 1 million-token context window? Lots of things, Google promises — like analyzing a whole code library, “reasoning across” lengthy documents like contracts, holding long conversations with a chatbot and analyzing and comparing content in videos.

During the briefing, Google showed two prerecorded demos of Gemini 1.5 Pro with the 1 million-token context window enabled.

In the first, the demonstrator asked Gemini 1.5 Pro to search the transcript of the Apollo 11 moon landing telecast — which comes to around 402 pages — for quotes containing jokes, and then to find a scene in the telecast that looked similar to a pencil sketch. In the second, the demonstrator told the model to search for scenes in “Sherlock Jr.,” the Buster Keaton film, going by descriptions and another sketch.

Google Gemini 1.5 Pro
Image Credits: Google

Gemini 1.5 Pro successfully completed all the tasks asked of it, but not particularly quickly. Each took between ~20 seconds and a minute to process — far longer than, say, the average ChatGPT query.

Google Gemini 1.5 Pro
Image Credits: Google

Vinyals says that the latency will improve as the model’s optimized. Already, the company’s testing a version of Gemini 1.5 Pro with a 10 million-token context window.

“The latency aspect [is something] we’re … working to optimize — this is still in an experimental stage, in a research stage,” he said. “So these issues I would say are present like with any other model.”

Me, I’m not so sure latency that poor will be attractive to many folks — much less paying customers. Having to wait minutes at a time to search across a video doesn’t sound pleasant — or very scalable in the near term. And I’m concerned about how the latency manifests in other applications, like chatbot conversations and analyzing codebases. Vinyals didn’t say — which doesn’t instill much confidence.

My more optimistic colleague Frederic Lardinois pointed out that the overall time savings might just make the thumb twiddling worth it. But I think it’ll depend very much on the use case. For picking out a show’s plot points? Perhaps not. But for finding the right screengrab from a movie scene you only hazily recall? Maybe.

Other improvements

Beyond the expanded context window, Gemini 1.5 Pro brings other, quality-of-life upgrades to the table.

Google’s claiming that — in terms of quality — Gemini 1.5 Pro is “comparable” to the current version of Gemini Ultra, Google’s flagship GenAI model, thanks to a new mixture-of-experts (MoE) architecture composed of smaller, specialized “expert” models. Gemini 1.5 Pro essentially breaks a task down into subtasks and delegates each to the appropriate expert model, deciding which expert to use based on its own predictions.
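
Google hasn’t detailed Gemini 1.5 Pro’s internals, but the general mixture-of-experts idea is simple enough to sketch. Below is a minimal, hypothetical PyTorch example (not Gemini’s actual architecture; all sizes invented) in which a small gating network routes each input to the top-scoring expert:

```python
# Minimal mixture-of-experts (MoE) routing sketch: a gate scores the
# experts and each input is processed only by its top-scoring expert.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)   # scores each expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (batch, dim)
        weights = F.softmax(self.gate(x), dim=-1)  # routing probabilities
        top_w, top_idx = weights.max(dim=-1)       # top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i                    # inputs routed to expert i
            if mask.any():
                out[mask] = top_w[mask, None] * expert(x[mask])
        return out

layer = TinyMoELayer()
print(layer(torch.randn(8, 64)).shape)             # torch.Size([8, 64])
```

Because only one expert runs per input, a model can grow its total parameter count without proportionally growing the compute spent on each token.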

MoE isn’t novel — it’s been around in some form for years. But its efficiency and flexibility have made it an increasingly popular choice among model vendors (see: the model powering Microsoft’s language translation services).

Now, “comparable quality” is a bit of a nebulous descriptor. Quality where it concerns GenAI models, especially multimodal ones, is hard to quantify — doubly so when the models are gated behind private previews that exclude the press. For what it’s worth, Google claims that Gemini 1.5 Pro performs at a “broadly similar level” compared to Ultra on the benchmarks the company uses to develop LLMs while outperforming Gemini 1.0 Pro on 87% of those benchmarks. (I’ll note that outperforming Gemini 1.0 Pro is a low bar.)

Pricing is a big question mark.

During the private preview, Gemini 1.5 Pro with the 1 million-token context window will be free to use, Google says. But the company plans to introduce pricing tiers in the near future that start at the standard 128,000 context window and scale up to 1 million tokens.

I have to imagine the larger context window won’t come cheap — and Google didn’t allay fears by opting not to reveal pricing during the briefing. If pricing’s in line with Anthropic’s, it could cost $8 per million prompt tokens and $24 per million generated tokens. But perhaps it’ll be lower; stranger things have happened! We’ll have to wait and see.
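
For a rough sense of what those hypothetical rates would mean in practice, here’s a quick back-of-the-envelope calculation (illustrative only; Google hasn’t announced pricing):

```python
# Back-of-the-envelope query costs at the Anthropic-style rates the
# article floats; these are hypothetical, not announced Gemini pricing.
INPUT_PER_M, OUTPUT_PER_M = 8.00, 24.00        # $ per million tokens

def query_cost(prompt_tokens, output_tokens):
    return (prompt_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

print(f"${query_cost(1_000_000, 1_000):.2f}")  # full 1M-token prompt: $8.02
print(f"${query_cost(128_000, 1_000):.2f}")    # standard 128k prompt: $1.05
```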

I wonder, too, about the implications for the rest of the models in the Gemini family, chiefly Gemini Ultra. Can we expect Ultra model upgrades roughly aligned with Pro upgrades? Or will there always be — as there is now — an awkward period where the available Pro models are superior performance-wise to the Ultra models, which Google’s still marketing as the top of the line in its Gemini portfolio?

Chalk it up to teething issues if you’re feeling charitable. If you’re not, call it like it is: darn confusing.

OpenAI's newest model Sora can generate videos — and they look decent

OpenAI Sora

Image Credits: OpenAI

OpenAI, following in the footsteps of startups like Runway and tech giants like Google and Meta, is getting into video generation.

OpenAI today unveiled Sora, a generative AI model that creates video from text. Given a brief — or detailed — description or a still image, Sora can generate 1080p movie-like scenes with multiple characters, different types of motion and background details, OpenAI claims.

Sora can also “extend” existing video clips — doing its best to fill in the missing details.

“Sora has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions,” OpenAI writes in a blog post. “The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

Now, there’s a lot of bombast in OpenAI’s demo page for Sora — the above statement being an example. But the cherry-picked samples from the model do look rather impressive, at least compared to the other text-to-video technologies we’ve seen.

For starters, Sora can generate videos in a range of styles (e.g., photorealistic, animated, black and white) up to a minute long — far longer than most text-to-video models. And these videos maintain reasonable coherence in the sense that they don’t always succumb to what I like to call “AI weirdness,” like objects moving in physically impossible directions.

Check out this tour of an art gallery, all generated by Sora (ignore the graininess — compression from my video-GIF conversion tool):

OpenAI Sora
Image Credits: OpenAI

Or this animation of a flower blooming:

OpenAI Sora
Image Credits: OpenAI

I will say that some of Sora’s videos with a humanoid subject — a robot standing against a cityscape, for example, or a person walking down a snowy path — have a video game-y quality to them, perhaps because there’s not a lot going on in the background. AI weirdness creeps into many other clips, too, like cars driving in one direction and then suddenly reversing, or arms melting into a duvet cover.

OpenAI Sora
Image Credits: OpenAI

OpenAI — for all its superlatives — acknowledges the model isn’t perfect. It writes:

“[Sora] may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark. The model may also confuse spatial details of a prompt, for example, mixing up left and right, and may struggle with precise descriptions of events that take place over time, like following a specific camera trajectory.”

OpenAI’s very much positioning Sora as a research preview, revealing little about what data was used to train the model (beyond ~10,000 hours of “high-quality” video) and refraining from making Sora generally available. Its rationale is the potential for abuse; OpenAI correctly points out that bad actors could misuse a model like Sora in myriad ways.

OpenAI says it’s working with experts to probe the model for exploits and building tools to detect whether a video was generated by Sora. The company also says that, should it choose to build the model into a public-facing product, it’ll ensure that provenance metadata is included in the generated outputs.

“We’ll be engaging policymakers, educators and artists around the world to understand their concerns and to identify positive use cases for this new technology,” OpenAI writes. “Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time.”

Mistral AI releases new model to rival GPT-4 and its own chat assistant

Arthur Mensch, founder of Mistral AI SAS, during the LaRef conference in Paris, France, on Monday, Aug. 28, 2023.

Image Credits: Nathan Laine / Bloomberg / Getty Images

Paris-based AI startup Mistral AI is gradually building an alternative to OpenAI and Anthropic as its latest announcement shows. The company is launching a new flagship large language model called Mistral Large. When it comes to reasoning capabilities, it is designed to rival other top-tier models, such as GPT-4 and Claude 2.

In addition to Mistral Large, the startup is also launching its own alternative to ChatGPT with a new service called Le Chat. This chat assistant is currently available in beta.

If you’re not familiar with Mistral AI, the company is better known for its capitalization table, as it raised an obscene amount of money in very little time to develop foundational AI models. The company was officially incorporated in May 2023. Just a few weeks after that, Mistral AI raised a $113 million seed round. In December, the company closed a $415 million funding round, with Andreessen Horowitz (a16z) leading the round.

Founded by alums from Google’s DeepMind and Meta, Mistral AI originally positioned itself as an AI company with an open source focus. While Mistral AI’s first model was released under an open source license with access to model weights, that’s not the case for its larger models.

Mistral AI’s business model looks more and more like OpenAI’s, as the company offers Mistral Large through a paid API with usage-based pricing. It currently costs $8 per million input tokens and $24 per million output tokens to query Mistral Large. In language-model jargon, tokens represent small chunks of words — for example, the word “TechCrunch” would be split into two tokens, “Tech” and “Crunch,” when processed by an AI model.

By default, Mistral Large supports context windows of 32k tokens (generally more than 20,000 words in English). It supports English, French, Spanish, German and Italian.

As a comparison, GPT-4 Turbo, which has a 128k-token context window, currently costs $10 per million input tokens and $30 per million output tokens. So Mistral Large is currently 20% cheaper than GPT-4 Turbo. But things change at a rapid pace, and AI companies update their pricing regularly.
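
That 20% figure checks out against the listed prices; here’s a quick sanity check:

```python
# Verify the "20% cheaper" claim from the per-million-token list prices.
mistral_in, mistral_out = 8, 24    # Mistral Large, $/M tokens
gpt4t_in, gpt4t_out = 10, 30       # GPT-4 Turbo, $/M tokens

print(f"{(gpt4t_in - mistral_in) / gpt4t_in:.0%}")    # 20% cheaper on input
print(f"{(gpt4t_out - mistral_out) / gpt4t_out:.0%}") # 20% cheaper on output
```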

But how does Mistral Large stack up against GPT-4 and Claude 2? As always, it’s very hard to tell. Mistral AI claims that it ranks second after GPT-4 based on several benchmarks. But there could be some benchmark cherry-picking and disparities in real-life usage. We’ll have to dig more to see how it performs in our tests.

Image Credits: Mistral AI

An alternative to ChatGPT

Mistral AI is also launching a chat assistant today called Le Chat. Anyone can sign up and try it out on chat.mistral.ai. The company says that it is a beta release for now and that there could be “quirks.”

Access to the service is free (for now) and users can choose between three different models — Mistral Small, Mistral Large and a prototype model that has been designed to be brief and concise called Mistral Next. It’s also worth noting that Le Chat can’t access the web when you use it.

The company also plans to launch a paid version of Le Chat for enterprise clients. In addition to central billing, enterprise clients will be able to define moderation mechanisms.

A partnership with Microsoft

Finally, Mistral AI is also using today’s news drop to announce a partnership with Microsoft. In addition to Mistral’s own API platform, Microsoft is going to provide Mistral models to its Azure customers.

It’s another model in Azure’s model catalog, which doesn’t seem that big of a deal. And yet, it also means that Mistral AI and Microsoft are now holding talks for collaboration opportunities and potentially more. The first benefit of that partnership is that Mistral AI will likely attract more customers with this new distribution channel.

As for Microsoft, the company is the main investor of OpenAI’s capped profit subsidiary. But it has also welcomed other AI models on its cloud computing platform. For instance, Microsoft and Meta partner to offer Llama large language models on Azure.

This open partnership strategy is a nice way to keep its Azure customers in its product ecosystem. It might also help when it comes to anticompetitive scrutiny.

Correction: A previous version of this article compared Mistral Large’s pricing with an older version of OpenAI’s GPT API. Mistral Large is 20% cheaper than the most recent version of GPT called GPT-4 Turbo.

Tesla Model Y SUV

Tesla slashes Model Y inventory prices by as much as $7,000

Tesla Model Y SUV

Image Credits: Tesla

Tesla is dropping prices of unsold Model Y SUVs in the U.S. by thousands of dollars in an attempt to clear out an unprecedented backlog of inventory.

Many long-range and performance Model Ys are now selling for $5,000 less than their original price, while rear-wheel drive versions are seeing even bigger cuts of more than $7,000.

The discounts come as Tesla once again made far more vehicles than it sold in the last quarter. The company built 433,371 vehicles in the first quarter but only shipped 386,810, likely adding more than 40,000 EVs to its inventory glut. (Some of those vehicles were likely in transit, though Tesla didn’t say how many.) The company has built more cars than it shipped in seven of the last eight quarters, Bloomberg News noted Friday.

In January, Tesla warned sales growth could be “notably lower” in 2024 compared to previous years — a trend that has bothered every player in the market from big automakers like Ford to struggling upstarts like Lucid.

Tesla went through a typical end-of-quarter push to deliver as many cars as it could over the last few weeks, with lead designer Franz von Holzhausen once again pitching in to get them out the door in the final days. But Tesla also tried to boost sales in other ways. It announced that a $1,000 price hike was coming to the Model Y, its most popular vehicle, on April 1. Tesla CEO Elon Musk also started mandating demos of the company’s advanced driver assistance system for all potential buyers. That software package costs $12,000 and can be a huge boost to the profit Tesla makes on a vehicle.

Musk has more or less admitted that Tesla has had to work harder to drum up demand for its vehicles lately. He has largely blamed the struggle on high interest rates, all while his company dramatically cut prices on the Model Y and Model 3 throughout 2023.

Tesla model 3 performance

Tesla launches new Model 3 Performance variant to rev up demand

Tesla model 3 performance

Image Credits: Tesla

Tesla has officially revealed a new Performance variant of the recently refreshed Model 3 sedan as the company looks to fight off receding demand.

The new version of the Model 3, which starts at $52,990, gets a new active damping system and adaptive suspension for better handling and comfort, offers 296 miles of battery range and can sprint from 0 to 60 miles per hour in 2.9 seconds, with 510 horsepower on offer.

Compared to the previous Model 3 Performance, the new version has 32% more peak power, 16% more peak torque and 5% less drag. It does all this while consuming less energy than its predecessor, according to Tesla. That’s thanks in part to a new-generation drive unit and a rear diffuser and spoiler. The front and rear ends of the car have also gotten a slight face-lift, separating it from the other versions of the newly tweaked Model 3 revealed last year.

The Model 3 Performance still carries with it the wholesale changes made with that recent refresh. That means there’s an ambient light bar wrapping around the cabin interior, better sound dampening and upgraded materials throughout, a stalk-less steering wheel and a new touchscreen display.

Tesla is launching the new Model 3 Performance at a time when the company is coming off one of its worst quarters for deliveries in recent memory, with deliveries down 20% compared to the fourth quarter of 2023. The impact of that disappointing first quarter is set to be revealed Tuesday, when the company publishes its financial results after the market closes.

The company is also just one week removed from announcing sweeping layoffs of more than 10% of its global workforce, with the cuts affecting seemingly all corners of the company.

Orders placed Tuesday, at least at the time of publication, show an estimated delivery window of May/June 2024 in North America.