Rabbit's web-based 'large action model' agent arrives on r1 as early as this week

Image Credits: Brian Heater

The Rabbit r1 was the must-have gadget of early 2024, but the bloom came off quickly when the company’s expansive promises failed to materialize. CEO Jesse Lyu admits that “on day one, we set our expectations too high,” but he also said that an update coming to devices this month will finally set the vaunted Large Action Model free on the web.

While skeptics may (justifiably) see this as too little, too late, or another shifting of goalposts, Rabbit’s aspiration of building a platform-agnostic agent for web and mobile apps still has real, if largely theoretical, value.

Speaking to TechCrunch, Lyu said that the last six months have been a whirlwind of shipping, bug fixes, improving response times, and adding minor features. But despite 16 over-the-air updates to the r1, it remains fundamentally limited to interacting with an LLM or accessing one of seven specific services, like Uber and Spotify.

“That was the first-ever version of the LAM, trained on recordings collected from data laborers, but it isn’t generic — it only connects to those services,” he said. Whether that model was truly the LAM is academic at this point; whatever it was, it didn’t provide the capabilities Rabbit detailed at its debut.

A generalist web-based agent

But Rabbit is ready to release the first generic version of the LAM, meaning one not tied to any specific app or interface, which Lyu demonstrated for me.

This version is a web-based agent that reasons out the steps to do any ordinary task, like buying tickets to a concert, registering a website, or even playing an online game. “Our goal is very clear: At the end of September, your r1 will suddenly do lots more things. It should support anything you can do on any website,” Lyu said.

Given a task, it first breaks that task down into steps, then starts executing them by analyzing what it sees on screen: buttons, fields, images, regardless of position or appearance. Then it interacts with the appropriate element based on what it has learned in general about how websites work.
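The loop described above (plan first, then perceive and act step by step) can be sketched in a few lines of Python. To be clear, this is an illustrative toy and not Rabbit's implementation; every function name and the hard-coded "page" below are invented for the example.

```python
# Hypothetical sketch of a plan-then-act web agent loop.
# None of these names come from Rabbit; they are placeholders.

def plan(goal):
    # A real planner would use an LLM; here we hard-code steps for one task.
    if goal == "register a domain":
        return ["open registrar", "search 'film festival'", "pick a result"]
    return []

def execute(step, page):
    # A real agent would inspect on-screen elements (buttons, fields, images)
    # and interact with whichever one matches the current step.
    matching = [el for el in page if el["accepts"] == step]
    return matching[0]["name"] if matching else None

def run_task(goal, page):
    """Break the goal into steps, then act on the page one step at a time."""
    return [execute(step, page) for step in plan(goal)]

# Toy "page" whose elements each accept one kind of step.
page = [
    {"name": "address bar", "accepts": "open registrar"},
    {"name": "search box", "accepts": "search 'film festival'"},
    {"name": "result link", "accepts": "pick a result"},
]
```

Running `run_task("register a domain", page)` walks the plan and returns the element touched at each step; the real system replaces the hard-coded plan with model-generated reasoning and the toy page with live DOM analysis.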

I asked it (through Lyu, who was operating it remotely) to register a new website for a film festival. Taking an action every few seconds, it searched for domain registrars on Google, picked one (a sponsored result, I think), typed “film festival” into the domain search box, and from the resulting list of options picked “filmfestival2023.com” for $14. Granted, I hadn’t given it any constraints like “for 2025” or “horror festival.”

Similarly, when Lyu asked it to search for and buy an r1, it quickly found its way to eBay, where dozens were on sale. Perhaps a good result for a user but not for the founder of the company presenting to the press! He laughed it off and did the prompt again with the addition that it should buy only from the official website. The agent succeeded.

Next, he had it play Dictionary.com’s daily word game. It took a bit of prompt engineering (the model found a loophole: it could quickly finish by hitting “end game”), but it got there.

Which browser does it use, though? A fresh, clean one in the cloud, Lyu said, but the company is working on local versions, like a Chrome extension, that would let you use existing sessions so the agent wouldn’t have to log into your services.

To that end, since users are understandably (and rightly) wary of giving any company full access to their credentials, the agent is not equipped with them. Lyu suggested that in the future, a walled-off small language model holding your credentials could be privately invoked to perform logins. How this will work remains an open question, which is somewhat to be expected given the newness of the space.

An example of UI analysis inside apps from the Rabbit website.
Image Credits: Rabbit

Still learning

The demo showed me a couple of things. First, if we give the company and its developers the benefit of the doubt that this isn’t all some elaborate hoax (as some believe), it does appear to be a working, general-purpose web agent. And that would be, if not a first in itself, certainly the first to be easily accessible to consumers.

“There are companies doing verticals, for Excel or legal documents, but I believe this is one of the first general agents for consumers,” Lyu said. “The idea is you can say anything that can be achieved through a website. We’ll have the generic agent for websites first, then for apps.”

Second, it showed that prompt engineering is still very much needed. How you phrase a request can easily be the difference between success and failure, and that’s probably not something ordinary consumers will tolerate.

Lyu cautioned that this is a “playground version,” not final by any means, and that although it is a fully functioning general web agent, it still can be improved in many ways. For instance, he said, “the model is smart enough to do the planning, but isn’t smart enough to skip steps.” It wouldn’t “learn” that a user prefers not to buy their electronics on eBay, or that it should scroll down after searching to avoid the wall of sponsored results.

User data won’t be harvested to improve the model — yet. Lyu attributed this to the fact that there’s basically no evaluation method for a system like this, so it is difficult to say quantitatively whether improvements have been made. A “teach mode” is also coming, though, so you can show it how to do a specific type of task.

Interestingly, the company is also working on a desktop agent that can interact with apps like word processors, music players, and of course browsers. This is still in the early stages, but it’s working. “You don’t even need to input a destination, it just tries to use the computer. As long as there is an interface, it can control it.”

Third, there is still no “killer app,” or at least no obvious one. The agent is impressive, but I personally would have little use for it, since I unfortunately sit in front of a browser for eight hours a day anyway. There are almost certainly some great applications, but none sprang to mind that made the utility of a browser-based automaton as obvious as that of, say, a robot vacuum.

Why not an app, again?

I raised the common objection to the entire Rabbit business model, essentially that “this could be an app.”

Lyu has clearly heard this criticism many times, and he was confident of his answer.

“If you do the math, it doesn’t make sense,” he said. “Yes, it’s technically achievable, but you’re going to piss off Apple and Google from day one. They will never let this be better than Siri or Gemini. Just like there’s no way Apple intelligence is going to control Google stuff better, or vice versa. And they take 30% of revenue! If at the beginning we’d just built an app, we’d never have this momentum.”

The rabbit r1 in use. Hand model: Chris Velazco of The Washington Post.
Image Credits: Devin Coldewey / TechCrunch

The fundamental pitch Rabbit is making is that a third-party AI or device can access and operate all your other services from the outside, the way you do. “A cross-platform, generic agent system,” as Lyu called it. “We’ll control every UI, and the website is a good start. Then we’ll go to Windows, to MacOS, to phones.”

Speaking of which: “We never said we’d never build a phone in the future.” Isn’t that antithetical to their original thesis of a smaller, simpler device? Maybe, maybe not.

In the meantime, the company is working to start fulfilling the promises it made early this year. The new model should be available to any r1 owner sometime this week, when the OTA update goes out. Instructions on how to invoke it will arrive then as well. Lyu cautioned expectant users with his characteristic understatement.

“We’re setting the expectations right. It’s not perfect,” he said. “It’s just the best the human race has achieved so far.”

The Browser Company splash screen

Arc is building an AI agent that browses on your behalf

Image Credits: The Browser Company

For years, Google (or any other search engine) has been the main gateway for people to discover websites and other content. The Browser Company, which makes the Arc Browser, is on a quest to change that by building an AI that surfs the web for you and gets you the results while bypassing search engines.

The company laid out its product roadmap, which includes a new tool, due in the next few months, that lets you tell the browser what you are looking for; the browser will then present relevant information by automatically crawling the web.

In a video released today, Josh Miller, the company’s co-founder and CEO, shows users typing something like “Reservation for two people at either Llama Inn or Kings Imperial” and the browser returning results with available time slots. Users can then reserve a table on a particular website with one click. The feature is expected in the coming months.

The product plan

The company has already started some of this work. It said only some of these features use LLMs, but they all aim to “bring the internet to you.”

Earlier this week, it released a new iPhone web browser app called Arc Search. The app has a “browse for me” feature, which reads at least six links related to the topic and creates a new webpage with photos and videos while summarizing the information.

Image Credits: Screenshot by TechCrunch

Today, the startup is releasing a feature named “instant links,” which takes you directly to a link rather than returning a page of search engine results. For instance, if you search for “Gladiator 2 trailer,” the feature will take you straight to the trailer on YouTube. This also works with folders: if you search for something like “Folder of Apple Vision Pro reviews,” it will create a folder with Vision Pro review stories from different publications.

Later this month, the company will release a “Live Folder” feature. It will work just like folders, but the contents will update automatically when an event occurs, such as a new blog post being published. In the video, Miller also shows that you can set filters for folders, triggering on things like a new story about a topic or someone mentioning you in the Linear issue-tracking tool. It sounds like a mixture of reading RSS feeds and tracking page updates, but we don’t know if it can track changes on a page.

The story arc

Browsers typically make money by making deals with search engines, running their own ad stacks or, in some cases, offering subscriptions. The Browser Company argues that Chrome and other browsers are incentivized to have you make more searches so that more ad money flows. So it wants to cut out the intermediaries and serve you results directly.

Arc Browser wants to change how we browse the web and get results quickly. It is also trying to use AI agents differently, rather than sticking them in a sidebar to write posts. In October 2023, the company released AI features that rename tabs and downloaded files and show a preview summary of a link.

On the flip side, as AI-powered agents play a bigger role on the internet, there are questions about how to return value to the publishers and blogs whose content these agents fetch and summarize.

Last year, in a video, the company said it would never sell user data to third parties for money; instead, it will explore avenues such as Arc for Teams, though it hasn’t announced anything yet.

There is also a debate to be had about how AI selects the “best” results. Preferences differ from person to person, and the result the AI picks may not be the best one for a given user.

Given the rise in popularity of LLMs, startups like OpenAI and Perplexity are pushing people toward getting answers on their platforms. Search players like Google, Microsoft and DuckDuckGo have also leaned into AI-powered search. So Arc feels that the time is ripe to put a new search method in your web browser.

Robot answering customer service inquiries.

Zendesk adds flexible AI agent capabilities with Ultimate acquisition

Image Credits: Anastasia Usenko / Getty Images

Zendesk has been trying to transform customer service since it launched in 2007, so it shouldn’t come as a surprise that the company sees the industry being altered in a big way by the rise of generative AI.

On Wednesday, the company announced it intends to acquire Ultimate, a German customer automation startup. The companies did not share the purchase price.

The idea of an AI agent has come to the forefront recently as companies look to build bots that do more than answer questions: they also help resolve problems by connecting to back-end transactional systems. Earlier this year, Bret Taylor and Clay Bavor launched a new company, Sierra, with the goal of building these flexible AI agents.

This is precisely how Zendesk describes Ultimate: “Its automation platform integrates with any backend system and provides robust analytics and reporting.” The company sees a hybrid future where customer inquiries can flow wherever they make the most sense, whether that’s an AI agent, workflow automation or human agent.

Zendesk CEO Tom Eggemeier, who joined the company last fall, says he believes that as more customers interact with the AI agents, it will increase the need for the kind of automated responses that Ultimate provides. “We believe that somewhere between 70% and 90% of interactions are going to be through AI agents in the future. And Ultimate has done a really nice job solving up to 80% of interactions via their AI agents,” Eggemeier told TechCrunch.

Eggemeier says that what sets Ultimate apart from some of the other offerings in the market is its adaptive methodology: it uses only the level of technology required to solve the problem, instead of a one-size-fits-all approach. “Sometimes it’s large language models, sometimes it’s old-school machine learning and predictive analytics, and sometimes they’ll use a rule or workflow when you can just plug in a rule to complete a task, and you don’t need to hit large language models or run predictive analytics on it,” he said.
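The tiering Eggemeier describes (try a cheap deterministic rule first, fall back to classic ML, and reach for a large language model only when neither applies) can be sketched roughly as follows. This is a generic illustration under my own assumptions, not Ultimate's actual code; the rule table and stub classifier are invented for the example.

```python
# Generic sketch of "use the cheapest tool that works" dispatch.
# The tiers mirror the quote above: rule -> classic ML -> LLM.

def resolve(inquiry, rules, classifier, llm):
    """Return (tier_used, answer) for a customer inquiry."""
    if inquiry in rules:                 # 1. plain workflow rule, no model needed
        return ("rule", rules[inquiry])
    label = classifier(inquiry)          # 2. old-school ML / predictive analytics
    if label is not None:
        return ("ml", label)
    return ("llm", llm(inquiry))         # 3. fall back to the expensive model

# Invented stand-ins for each tier.
rules = {"reset password": "Send a reset link"}

def classifier(q):
    # Toy stand-in for a trained classifier.
    return "billing" if "invoice" in q else None

def llm(q):
    # Toy stand-in for a large language model call.
    return "LLM answer for: " + q
```

With these stubs, `resolve("reset password", ...)` never touches a model, a query mentioning an invoice stops at the classifier, and anything else falls through to the LLM tier.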

Perhaps it should come as no surprise that one of Ultimate’s integration partners is none other than Zendesk, but it works with other companies as well, including Salesforce and Freshdesk. Eggemeier says that while the plan is to incorporate Ultimate’s technology into the Zendesk platform, it will continue to offer standalone products to other companies.

As for Ultimate, it bodes well for its customer and partner relationships that Zendesk wants to keep those going instead of swallowing the startup whole into its platform. Still, acquisitions can get muddled, and time will tell how much independence the company ultimately has and what impact that will have on those existing relationships.

Zendesk raised more than $85 million before going public to much fanfare in 2014. The company ran into activist investor trouble in 2022 and eventually went private that year in a deal worth over $10 billion. The private equity firm hired Eggemeier shortly after, replacing company co-founder Mikkel Svane.

Ultimate launched in 2017 and raised $27 million, per Crunchbase. Eggemeier says the team will continue to run out of its main office in Berlin, giving Zendesk a foothold in the city. The startup’s approximately 140 employees will join Zendesk after the deal closes, which Eggemeier expects to happen quickly, in perhaps two to four weeks.

With Vertex AI Agent Builder, Google Cloud aims to simplify agent creation

Google Vertex AI Agent Builder presentation at Google Cloud Next

Image Credits: Frederic Lardinois/TechCrunch

AI agents are the new hot craze in generative AI. Unlike the previous generation of chatbots, these agents can do more than simply answer questions: they can act on the conversation, even interacting with back-end transactional systems in an automated manner.

On Tuesday at Google Cloud Next, the company introduced a new tool to help companies build AI agents.

“Vertex AI Agent Builder allows people to very easily and quickly build conversational agents,” Google Cloud CEO Thomas Kurian said. “You can build and deploy production-ready, generative AI-powered conversational agents and instruct and guide them the same way that you do humans to improve the quality and correctness of answers from models.”

The no-code product builds upon Google’s previously released Vertex AI Search and Conversation product. It’s also built on top of the company’s latest Gemini large language models and relies on both retrieval-augmented generation (RAG) APIs and vector search, two popular methods used industry-wide to reduce hallucinations, where models make up incorrect answers when they can’t find an accurate response.
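For readers unfamiliar with the technique, the retrieval half of RAG works by fetching the stored documents most similar to a query and handing them to the model as context, so answers are anchored in real material rather than invented. The sketch below is a deliberately simplified, generic illustration; it uses a toy bag-of-words "embedding" in place of learned vectors and is not Google's Vertex AI API.

```python
# Minimal illustration of retrieval before generation (RAG).
# Real systems use learned embedding models and a vector database;
# this toy version uses word counts and a dot product.

def embed(text):
    """Toy 'embedding': a bag-of-words count vector."""
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def similarity(a, b):
    """Dot product between two sparse count vectors."""
    return sum(a.get(w, 0) * b.get(w, 0) for w in a)

def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)), reverse=True)
    return ranked[:k]

documents = [
    "brand style guidelines for marketing campaigns",
    "quarterly revenue report for 2023",
]
```

Here `retrieve("brand style", documents)` surfaces the style guide rather than the revenue report; the generation step would then prompt the model with the retrieved text, which is what keeps the answer grounded.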

Part of the way the company is improving the quality of the answers is through a process called “grounding,” where the answers are tied to something considered a reliable source. In this case, it’s relying on Google Search (which, in reality, may or may not be accurate).

“We’re now bringing you grounding in Google Search, bringing the power of the world’s knowledge that Google Search offers through our grounding service to models. In addition, we also support the ability to ground against enterprise data sources,” Kurian said. The latter might be more suitable for enterprise customers.

Image Credits: Frederic Lardinois/TechCrunch

In a demo, the company used this capability to create an agent that analyzes previous marketing campaigns to understand a company’s brand style, and then apply that knowledge to help generate new ideas that are consistent with that style. The demo analyzed over 3,000 brand images, descriptions, videos and documents related to this fictional company’s products stored on Google Drive. It then helped generate pictures, captions and other content based on its understanding of the fictional company’s style.

Although you can build any type of agent, this particular example puts Google in direct competition with Adobe, which released its creative generative AI tool Firefly last year and GenStudio last month to help companies build content that doesn’t stray from their style. The flexibility is there to build anything, but the question is whether customers would rather buy something off the shelf if it exists.

The new capabilities are already available, according to Google. The tool supports multiple languages and offers country-based API endpoints in the U.S. and EU.
