Pinecone's vector database gets a new serverless architecture

Image Credits: D-BASE / Getty Images

For a long time, vector databases were a bit of a niche product, but because they are uniquely suited to provide context and long-term memory to large language models, seemingly everybody in the database space is now trying to bolt vector search onto their existing products as fast as possible. Meanwhile, dedicated services like Pinecone, which was founded by the team behind Amazon SageMaker, are leading the charge, with Pinecone raising a total of $138 million since it was founded in 2019. Today, Pinecone is launching Pinecone Serverless, a new and significantly enhanced serverless architecture to power its service.

Pinecone Serverless separates reads, writes and storage, which should reduce costs for users. Indeed, Pinecone argues that its new architecture can offer a 10x to 100x cost reduction. The architecture supports vector clustering on top of blob storage, which results in lower latencies and allows Pinecone Serverless to handle massive data sizes. Likewise, Pinecone Serverless introduces new indexing and retrieval algorithms to enable fast vector search across this blob storage, and the service now offers a multi-tenant compute layer.
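For a sense of what "serverless" means in practice here, the sketch below uses the Python client that shipped alongside this launch to create a serverless index, write to it and query it. This is a minimal sketch, not Pinecone's reference code: the API key, index name, dimension and vectors are all placeholders.

```python
# pip install pinecone-client  (the v3 client released with Pinecone Serverless)
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder credential

# No pods or capacity planning: pick a cloud and region, and Pinecone
# provisions and scales the index behind the scenes.
pc.create_index(
    name="example-index",  # hypothetical index name
    dimension=1536,        # must match your embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-west-2"),
)

index = pc.Index("example-index")

# Writes and reads take separate paths in the new architecture; billing
# follows what you actually use rather than provisioned capacity.
index.upsert(vectors=[
    {"id": "doc-1", "values": [0.1] * 1536, "metadata": {"source": "demo"}},
])
print(index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True))
```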

“Since it is truly serverless, it completely eliminates the need for developers to provision or manage infrastructure and allows them to build GenAI applications more easily and bring them to market much faster,” the company explains in its announcement. “As a result, developers with use cases of any size can build more reliable, effective, and impactful GenAI applications with any LLM of their choice, leading to an imminent wave of incredible GenAI applications reaching the market.”

From the outset, Pinecone Serverless will offer integrations with several other AI and back-end services, including Anthropic, Anyscale, Cohere, Confluent, LangChain, Pulumi and Vercel. “Vercel’s mission is to help the world ship the best products, and in the age of GenAI that requires Pinecone as the vector database component,” said Guillermo Rauch, CEO and founder of Vercel. “That’s why we are announcing that all Vercel users can now add Pinecone Serverless to their applications in just a few clicks, with more exciting capabilities to come.”

Open source vector database startup Qdrant raises $28M

Image Credits: Qdrant — Qdrant founders

Qdrant, the company behind the eponymous open source vector database, has raised $28 million in a Series A round of funding led by Spark Capital.

Founded in 2021, Berlin-based Qdrant is seeking to capitalize on the burgeoning AI revolution, targeting developers with an open source vector search engine and database — an integral part of generative AI, which requires relationships be drawn between unstructured data (e.g. text, images or audio that isn’t labelled or otherwise organized), even when that data is “dynamic” within real-time applications. As per Gartner data, unstructured data makes up around 90% of all new enterprise data, and is growing three times faster than its structured counterpart.
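That "drawing of relationships" boils down to comparing embedding vectors by distance. Here is a toy sketch of the comparison at the heart of any vector search engine; the vectors below are made up for illustration, where real ones would come from an embedding model.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Similarity of two embedding vectors; close to 1.0 means related.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy embeddings standing in for a model's output for "cat photo",
# "kitten picture" and "quarterly report".
cat = np.array([0.8, 0.1, 0.1])
kitten = np.array([0.75, 0.2, 0.05])
report = np.array([0.05, 0.1, 0.9])

print(cosine_similarity(cat, kitten))  # high: related unstructured items
print(cosine_similarity(cat, report))  # low: unrelated
```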

The vector database realm is hot. In recent months we’ve seen the likes of Weaviate raise $50 million for its open source vector database, while Zilliz secured $60 million to commercialize the Milvus open source vector database. Elsewhere, Chroma secured $18 million in seed funding for a similar proposition, while Pinecone nabbed $100 million for a proprietary alternative.

Qdrant, for its part, raised $7.5 million last April, further highlighting the seemingly insatiable appetite investors have for vector databases — while also pointing to a planned growth spurt on Qdrant’s part.

“The plan was to go into the next fundraising in the second quarter this year, but we received an offer a few months earlier and decided to save some time and start scaling the company now,” Qdrant CEO and co-founder Andre Zayarni explained to TechCrunch. “Fundraising and hiring of right people always takes time.”

Of note, Zayarni says that the company actually rebuffed a potential acquisition offer from a “major database market player” around the same time it received the follow-on investment offer. “We went with the investment,” he said, adding that the company will use the fresh cash injection to build out its business team, given that it substantively consists of engineers at the moment.

Binary logic

In the intervening nine months since its last raise, Qdrant has launched a new super-efficient compression technology called binary quantization (BQ), focused on low-latency, high-throughput indexing which it says can reduce memory consumption by as much as 32 times and enhance retrieval speeds by around 40 times.

“Binary quantization is a way to ‘compress’ the vectors to simplest possible representation with just zeros and ones,” Zayarni said. “Comparing the vectors becomes the simplest CPU instruction — this makes it possible to significantly speed up the queries and save dramatically on memory usage. The theoretical concept is not new, but we implemented it the way that there is very little loss of accuracy.”
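As a rough illustration of the concept in NumPy, this is a sketch of the general technique rather than Qdrant's actual implementation, which also supports rescoring candidates against the original vectors to recover accuracy.

```python
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    # 1 bit per dimension (sign only), packed 8 dims per byte:
    # 32x smaller than float32, matching the memory figure above.
    return np.packbits(vectors > 0, axis=-1)

def hamming_distances(query_code: np.ndarray, codes: np.ndarray) -> np.ndarray:
    # Comparing codes is just XOR plus a popcount -- the "simplest
    # CPU instruction" flavor of comparison Zayarni describes.
    return np.unpackbits(np.bitwise_xor(codes, query_code), axis=-1).sum(axis=-1)

rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, 128)).astype(np.float32)  # toy vectors
codes = binarize(corpus)            # 16 bytes per vector instead of 512
query_code = binarize(rng.standard_normal(128).astype(np.float32))
top5 = np.argsort(hamming_distances(query_code, codes))[:5]
print(top5)
```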

BQ might not work for all AI models, though, and it’s entirely up to the user to decide which compression option will work best for their use-cases — but Zayarni says that the best results they found were with OpenAI’s models, while Cohere’s models and Google’s Gemini also worked well. The company is currently benchmarking against models from the likes of Mistral and Stability AI.

It’s such endeavors that have helped attract high-profile adopters, including Deloitte, Accenture, and — arguably the highest profile of them all — X (née Twitter). Or perhaps more accurately, Elon Musk’s xAI, a company developing the ChatGPT competitor Grok and which debuted on the X platform last month.

While Zayarni didn’t disclose any details of how X or xAI was using Qdrant due to a non-disclosure agreement (NDA), it’s reasonable to assume that it’s using Qdrant to process real-time data. Indeed, Grok uses a generative AI model dubbed Grok-1 trained on data from the web and feedback from humans, and given its (now) tight alignment with X, it can incorporate real-time data from social media posts into its responses — this is what is known today as retrieval augmented generation (RAG), and Elon Musk has teased such use-cases publicly over the past few months.
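In outline, such a RAG pipeline uses the vector database as the retrieval step in front of the model. Below is a minimal, self-contained sketch with the open source Qdrant Python client; the collection, vectors and prompt are stand-ins (real embeddings would come from a model), and this is not a claim about how xAI actually wires things up.

```python
# pip install qdrant-client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(":memory:")  # in-memory mode for a self-contained demo

client.create_collection(
    collection_name="posts",
    vectors_config=VectorParams(size=3, distance=Distance.COSINE),
)

# In a real pipeline these vectors come from an embedding model; the
# payloads here stand in for fresh social media posts.
client.upsert(
    collection_name="posts",
    points=[
        PointStruct(id=1, vector=[0.9, 0.1, 0.0],
                    payload={"text": "Launch day for the new rocket."}),
        PointStruct(id=2, vector=[0.1, 0.9, 0.0],
                    payload={"text": "Markets closed mixed today."}),
    ],
)

# Retrieval step of RAG: embed the user question, fetch the nearest posts...
hits = client.search(collection_name="posts", query_vector=[0.85, 0.15, 0.0], limit=1)
context = "\n".join(hit.payload["text"] for hit in hits)

# ...then hand the retrieved context to the LLM as part of its prompt.
prompt = f"Answer using this context:\n{context}\n\nQuestion: What launched today?"
print(prompt)
```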

Qdrant doesn’t reveal which of its customers are using the open source Qdrant incarnation and which are using its managed services, but it did point to a number of startups, such as GitBook, VoiceFlow, and Dust, which are “mostly” using its managed cloud service — this, effectively, saves resource-restricted companies from having to manage and deploy everything themselves as they would have to with the core open source incarnation.

However, Zayarni is adamant that the company’s open source credentials are one of the major selling points, even if a company elects to pay for add-on services.

“When using a proprietary or cloud-only solution, there is always a risk of vendor lock-in,” Zayarni said. “If the vendor decides to adjust the pricing, or change other terms, customers need to agree or consider a migration to an alternative, which isn’t easy if it’s a heavy-production use-case. With open source, there is always more control over your data, and it is possible to switch between different deployment options.”

Alongside the funding today, Qdrant is also officially releasing its managed “on-premise” edition, giving enterprises the option to host everything internally but tap the premium features and support provided by Qdrant. This follows last week’s news that Qdrant’s cloud edition was landing on Microsoft Azure, adding to the existing AWS and Google Cloud Platform support.

Aside from lead backer Spark Capital, Qdrant’s Series A round included participation from Unusual Ventures and 42cap.

Pinecone launches its serverless vector database out of preview

Image Credits: Victoria Kotlyarchuk / Getty Images

Pinecone, the vector database startup founded by Edo Liberty, the former head of Amazon’s AI Labs, has long been at the forefront of helping businesses augment large language models (LLMs) with their own data. Most recently, the company completely rearchitected its product to launch Pinecone Serverless, which frees its customers from having to think about managing and scaling their deployments. Today, Pinecone Serverless comes out of preview and is now generally available.

Liberty notes that the company’s early customers are now transitioning from experimenting with generative AI to wanting to launch their own AI products. The company watched enterprises grapple with the complexity of building new applications while also figuring out how to best put them into production.

“The first wave of production-grade applications is hitting the market now and in the next six to nine months. What our more than 5,000 customers told us loud and clear is that they need a dedicated, optimized, specialized tool that is extremely good at doing vector search, doing RAG, extracting knowledge and generating context for these language models. What they were really saying is: hey, I need scale, I need performance, and I need costs to be such that I can reason about the product that I’m building.”

Image Credits: Pinecone

Liberty stressed that Pinecone spent a lot of time making the product ready for production deployments — all while making it significantly more affordable, too. The company believes that customers who use Pinecone Serverless can reduce their costs by up to 50x, in part because the team rearchitected the system as a multi-tenant service that decouples storage and compute. With that, Pinecone’s customers only pay when they actually consume CPU time, with the company orchestrating the capacity on the back end.

“Because we run everything as a service, our ability to orchestrate all of that makes us able to charge people for exactly what they use — and not anything more. That is incredibly rare and incredibly hard to do,” Liberty said.

Pinecone founder Edo Liberty. Image Credits: Pinecone

During the public preview, Pinecone’s customers also asked for a number of additional features. One of these is Private Endpoints, which is launching in public preview today. It allows enterprises to create a direct connection to their virtual private clouds on Amazon via AWS PrivateLink, so their data is never exposed to the public internet and stays within whatever governance and compliance regimes a company has to adhere to.

Some of the companies that are already using Pinecone Serverless include Gong, Help Scout, New Relic, Notion, TaskUs and You.com.

“Notion is leading the AI productivity revolution,” Notion co-founder and COO Akshay Kothari said. “Our launch of a first-to-market AI feature was made possible by Pinecone serverless. Their technology enables our Q&A AI to deliver instant answers to millions of users, sourced from billions of documents. Best of all, our move to their latest architecture has cut our costs by 60%, advancing our mission to make software toolmaking ubiquitous.”