# Vector Databases Simply Aren't Enough
In the race to fully harness AI and machine learning, some turn to vector databases for managing unstructured data. At first glance, vector databases seem like the catch-all solution: easily search your data with dense vectors. But once you start building and evaluating search systems, you begin to see that vector databases come with a host of issues. At Trieve, we've spent years grappling with the complexities that lie beyond simple vector storage and retrieval. Here's what we've learned, and why a more comprehensive approach is crucial.
### The Limitations of Vector Databases
Vector databases store embeddings but do not generate them. When a user submits a query, it must be converted into a vector representation. Using an external or unoptimized provider for this conversion can take hundreds of milliseconds, creating a significant bottleneck.
```
# Pseudocode for the traditional approach - slow at inference time
query = "user input"
query_embedding = generate_embedding(query)  # 200-500ms round trip to an embedding API
# search a cloud vector db instance
results = vector_db.search(query_embedding)  # 50+ms over the wire
results = reranker.rerank(query, results)    # 200-500ms for an external reranker
```
For instance, the OpenAI embeddings API is optimized for batch inference rather than single-query search. Generating an embedding for a single query can take hundreds of milliseconds. Consequently, using the OpenAI API or any closed-source model for search is generally impractical unless you're prepared to invest in a custom instance, and even with a custom instance, significant challenges remain.
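You can measure this yourself. Here is a minimal latency check using the official `openai` Python client (the model choice is illustrative, and your numbers will vary with region and load):
```
# Measure single-query embedding latency against the OpenAI API
import time
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
response = client.embeddings.create(
    model="text-embedding-3-small",  # illustrative model choice
    input="user input",
)
elapsed_ms = (time.perf_counter() - start) * 1000

query_embedding = response.data[0].embedding
print(f"embedded one query in {elapsed_ms:.0f}ms")  # commonly hundreds of ms over the public API
```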
When your vector database and embedding server are geographically dispersed, network latency compounds on every hop, particularly if a reranker is involved: the query travels to the embedding server, the resulting vector to the vector database, the candidate results to the reranker, and the final ranking back to the client.

These multiple hops can easily push a single search past 500ms.
Trieve instead runs embedding models and rerankers in the same unified Kubernetes cluster as the search index, so every hop stays on the local network.
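In pseudocode, the co-located pipeline looks roughly like this (service names and timings are illustrative):
```
# Pseudocode for the co-located approach - every hop stays in-cluster
query = "user input"
query_embedding = embedding_server.embed(query)  # single-digit ms on the local network
candidates = vector_db.search(query_embedding)   # low-latency, same-cluster hop
results = reranker.rerank(query, candidates)     # a few ms for an in-cluster reranker
```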
### Dense Search Alone Is Insufficient

Many developers initially opt for a basic dense vector method when integrating vector search. The approach is appealing for its simplicity and ease of implementation, but this "good enough" mentality can result in mediocre search experiences that fail to take full advantage of your data.
Dense vector search is a powerful tool, but it comes with significant limitations:
1. **Lexical limitations:** While dense vectors capture semantic meaning well, they frequently miss exact keyword matches. This can lead to aggravating situations where the system ignores the precise terms of a user's query and returns results that are semantically similar yet ultimately irrelevant.
2. **Uncommon vocabulary:** Dense embeddings often struggle with rare words, specialized jargon, or newly coined terms that were absent from the model's training data.
3. **Context insensitivity:** Dense vectors assign a static representation to text regardless of the search context, which can produce ambiguous, irrelevant results for queries with multiple meanings.
4. **Opacity in search results:** Dense retrieval returns results without a transparent explanation of why they matched, complicating debugging and hindering improvements to search quality.
5. **Weakness on brief queries:** For very short queries, dense vector search often underperforms traditional keyword-based methods.
6. **Keyword search is very good:** BM25 has been around for a long time for a reason: it performs extremely well, especially on domain-specific data. Don't assume that naive dense search outperforms full-text search; it frequently doesn't.
At Trieve, we've found that combining dense vector search with other techniques like sparse vector methods (e.g., SPLADE), traditional BM25, and advanced re-ranking yields dramatically better results. Our hybrid strategy effectively captures semantic meanings and lexical matches, adeptly manages rare terms, and adjusts to the context of queries.
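To make "combining" concrete, here is a minimal sketch of weighted score fusion between a dense retriever and BM25. The score dictionaries and the `alpha` weight are hypothetical inputs; real systems tune the weight per dataset, and a production pipeline is considerably more involved:
```
# Minimal weighted score fusion between dense and BM25 result sets
def min_max_normalize(scores):
    # Rescale {doc_id: score} into [0, 1] so the two score scales are comparable
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {doc: (s - lo) / span for doc, s in scores.items()}

def hybrid_fuse(dense_scores, bm25_scores, alpha=0.5):
    # alpha weights semantic vs. lexical relevance; 0.5 is a starting point, not a recommendation
    dense = min_max_normalize(dense_scores)
    lexical = min_max_normalize(bm25_scores)
    docs = dense.keys() | lexical.keys()
    fused = {d: alpha * dense.get(d, 0.0) + (1 - alpha) * lexical.get(d, 0.0) for d in docs}
    return sorted(fused, key=fused.get, reverse=True)
```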
Don't settle for mediocre search. Your users deserve an exceptional experience, and by utilizing modern hybrid techniques, you can provide a search function that accurately comprehends and meets their needs.
**Scaling is Hard:** As the volume of your data increases, so do the complexities associated with managing it. Horizontal scaling of vector databases presents significant challenges, typically necessitating intricate sharding techniques and sophisticated load balancing that exceed the capabilities of the database’s native functions. Trieve simplifies this process, taking on the burden of scaling so that you can focus on other priorities rather than struggling with scaling Qdrant. Trieve effortlessly scales up to billions of vectors right from the start, all while ensuring exceptionally rapid search speeds.
**Building Out RAG is Hard:** Retrieval Augmented Generation (RAG) is transforming AI applications, yet implementing it well is complex. Effective RAG systems need more than vector similarity search: they require careful chunking strategies, context-window management, and citation mechanisms, none of which a traditional vector database provides.
A chat application with memory should seamlessly integrate advanced features such as follow-up query generation, which Trieve delivers alongside hybrid search with RAG, file uploads, chat topic creation, and analytics. While it's possible to bolt these features onto your vector database yourself, your application infrastructure should ideally support them natively, with no additional complex integrations to maintain.
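To make that concrete, here is a schematic sketch of a from-scratch RAG pipeline. Every helper in it (`split_into_chunks`, `embed`, `pack_within_token_budget`, and the `llm` and `vector_db` objects) is a hypothetical placeholder you would have to build and tune yourself:
```
# Schematic DIY RAG pipeline - each placeholder hides real engineering work
def rag_answer(question, documents, llm, vector_db):
    # 1. Chunking: size, overlap, and boundary choices all materially affect quality
    chunks = [c for doc in documents for c in split_into_chunks(doc, size=512, overlap=64)]
    vector_db.upsert([(embed(c), c) for c in chunks])

    # 2. Retrieval: top_k is a tunable; hybrid search and reranking would slot in here
    candidates = vector_db.search(embed(question), top_k=10)

    # 3. Context-window management: trim candidates to fit the model's token limit
    context = pack_within_token_budget(candidates, budget=4000)

    # 4. Prompting with citations: the prompt format itself takes iteration
    prompt = f"Answer using only the numbered sources below, citing them.\n{context}\n\nQ: {question}"
    return llm.generate(prompt)
```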
Vector databases such as Qdrant are highly specialized tools optimized for specific tasks. However, developing a production-ready system requires integrating additional functionalities that go beyond vector search. Critical features like autocomplete, result grouping, analytics, and multi-tenancy introduce complexities that are not inherently managed by vector databases.
**Search is Incredibly Complex:** Numerous search strategies exist, and jumping straight to simple semantic search is often a mistake. Options like SPLADE, dense vectors, hybrid search, and BM25 abound. Implementing these in vector databases is typically challenging, either requiring you to generate your own sparse/dense vectors or not offering this functionality at all. For instance, using PGVector to implement an effective hybrid search can be a nightmare, let alone SPLADE! Trieve handles all these strategies for you; all you need to provide are your raw files/chunks. The platform also allows you to compare different search settings on your data through a dashboard, helping you determine the best search strategy. If nothing seems to work, reach out to us. We’re here to assist!
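To give a flavor of the pain, here is roughly what hand-rolled hybrid retrieval looks like on Postgres with PGVector, assuming a hypothetical `chunks` table with an `embedding vector` column and a `tsv tsvector` column. Even this sketch leaves query-vector generation, weight tuning, and SPLADE entirely to you:
```
# Sketch of hand-rolled hybrid search on Postgres + PGVector (hypothetical schema)
import psycopg  # pip install "psycopg[binary]"

query_text = "user input"
query_vector = "[0.1, 0.2, 0.3]"  # placeholder: your query embedding as a pgvector literal

HYBRID_SQL = """
WITH semantic AS (
    SELECT id, RANK() OVER (ORDER BY embedding <=> %(qvec)s::vector) AS r
    FROM chunks ORDER BY embedding <=> %(qvec)s::vector LIMIT 20
), lexical AS (
    SELECT id, RANK() OVER (ORDER BY ts_rank(tsv, plainto_tsquery(%(q)s)) DESC) AS r
    FROM chunks WHERE tsv @@ plainto_tsquery(%(q)s) LIMIT 20
)
SELECT COALESCE(semantic.id, lexical.id) AS id,
       COALESCE(1.0 / (60 + semantic.r), 0) + COALESCE(1.0 / (60 + lexical.r), 0) AS score
FROM semantic FULL OUTER JOIN lexical ON semantic.id = lexical.id
ORDER BY score DESC;
"""

with psycopg.connect("dbname=app") as conn:
    rows = conn.execute(HYBRID_SQL, {"q": query_text, "qvec": query_vector}).fetchall()
```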
**Reranking is Challenging:** Implementing a reranking process can greatly enhance the quality of search results, though it typically exceeds the capabilities of most vector databases. Adding rerankers introduces additional complexity and can increase latency. A straightforward approach might involve using a proprietary reranker such as Cohere rerank; however, this adds far too much latency in practice. This approach shares many of the same difficulties as deploying your own embedding model. Nevertheless, for those with expertise in machine learning operations, the [TEI](https://github.com/huggingface/text-embeddings-inference) library is highly recommended for both embedding and reranking tasks.
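For reference, a self-hosted TEI reranker is just an HTTP call away. The sketch below assumes you have launched TEI with a cross-encoder reranking model listening on localhost port 8080; adjust to your deployment:
```
# Call a self-hosted text-embeddings-inference (TEI) reranker over HTTP
import requests  # pip install requests

response = requests.post(
    "http://localhost:8080/rerank",  # assumes TEI was started with a reranker model
    json={"query": "user input", "texts": ["candidate chunk one", "candidate chunk two"]},
    timeout=5,
)
response.raise_for_status()
# TEI returns a list of {"index": ..., "score": ...} entries, highest score first
for hit in response.json():
    print(hit["index"], hit["score"])
```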
**Hybrid Search is More Complex Than It Seems:** Combining sparse and dense vectors for hybrid search sounds great in theory. In practice, it requires careful tuning and infrastructure that goes well beyond what vector databases offer. For example, many vector databases implement hybrid search by running a sparse search and a dense search separately, then fusing the two ranked lists with Reciprocal Rank Fusion (RRF). In practice, this is often a bad strategy. You should of course try it on your own dataset, but RRF tends to distort rankings: it gives very strong pure-text or semantic matches less weight than chunks that match somewhat on both.
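Here is a minimal sketch of that fusion step, a generic RRF implementation over two ranked lists of document IDs (k=60 is the conventional constant, not a tuned value):
```
# Reciprocal Rank Fusion over two ranked lists of document IDs
def rrf_fuse(dense_ranked, sparse_ranked, k=60):
    scores = {}
    for ranked in (dense_ranked, sparse_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Docs ranked first by only one list ("a", "d") lose to docs that are
# merely mid-ranked by both ("b", "c"):
print(rrf_fuse(["a", "b", "c"], ["d", "c", "b"]))  # ['b', 'c', 'a', 'd']
```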

In our experience, rank fusion with RRF typically yields inferior results compared to employing a dedicated reranker.

Trieve enables you to select the reranking strategy that best fits your unique data and latency needs, and our user-friendly dashboard makes it incredibly easy to experiment with different options.
**Evaluation is an Art, But Not an Easy One:** How do you determine whether your search is truly effective? Building the dashboards and tooling to evaluate different search strategies is a significant undertaking in its own right, separate from managing the vector database itself. Formal evaluations of vector search are valuable, but the most effective assessment method is running many real searches and checking whether the results meet your expectations. This is why a dashboard that allows easy tweaking of settings is crucial: instead of spending months evaluating search quality, you can achieve meaningful improvements in days. Every search system encounters difficult queries that frustrate developers by returning incorrect results, and without dashboards, tuning your search to handle them is painful.
### Vector Databases: The Trieve Approach
At Trieve, we've developed a system that transcends mere vector storage by comprehensively addressing these challenges. Here's how:
**Lightning-Fast Embedding Generation:** Our optimizations have streamlined the entire embedding pipeline, slashing generation times to mere single-digit milliseconds.
(The snippets below are illustrative pseudocode rather than final SDK calls; exact method names may differ.)
```
# Trieve approach - blazing fast
query = "user input"
results = trieve.search(query) # Embedding and search in ~20ms
```
**Seamless Scalability:** Our architecture is engineered for effortless horizontal scaling:
```
for i in range(1_000_000_000):  # scale to billions of vectors effortlessly
    trieve.insert(vectors[i])
```
**RAG Made Easy:** We seamlessly manage chunking, context handling, and prompt optimization for you:
```
trieve.chunk_and_insert(str(document))
rag_response = trieve.rag_query("Complex question about your data")
```
**Feature-Rich Out of the Box:** Autocomplete, grouping, analytics, and multi-tenancy are built-in:
```
autocomplete_results = trieve.autocomplete("what is group search")
grouped_results = trieve.search("query", group_by="document")
analytics = trieve.get_search_analytics()
```
**Adaptive Search Strategies:** Our system dynamically selects the optimal search strategy for each individual query:
```
results = trieve.search("query", strategy="auto") # Automatically selects best strategy
```
**Integrated Reranking:** Reranking is seamlessly integrated with minimal latency impact:
```
results = trieve.search("query", rerank=True) # Reranking in milliseconds
```
**Effortless Hybrid Search:** We combine sparse and dense vectors behind the scenes for optimal results:
```
results = trieve.hybrid_search("query") # Combines SPLADE and dense vectors
```
**Built-in Evaluation Tools:** Our dashboard provides real-time insights into search performance:
```
trieve.open_evaluation_dashboard() # Opens a web interface for search analytics
```
### The Power of a Comprehensive Approach
What sets Trieve apart is our ability to address all these complexities in a unified system while still providing flexibility when you need it. Here's a glimpse of a complex search pipeline that runs in milliseconds:
```
results = trieve.search(
    "quantum computing applications",
    strategy="hybrid",
    filters={"date": ">2022", "domain": "physics"},
    rerank=True,
    group_by="paper",
    highlight=True,
)
```
This single call encapsulates:
- Hybrid search combining dense and sparse vectors
- Custom filtering
- Re-ranking
- Grouping results
- Text highlighting
- End-to-end execution in just a few milliseconds
### Beyond Vector Databases
The limitations of vector databases are real, and they shouldn't be a barrier to innovation in AI. By providing a comprehensive solution that goes far beyond simple vector storage, Trieve allows developers to focus on what really matters: building amazing AI applications.
We've spent years optimizing every aspect of unstructured data processing so that you don't have to. The result is a system that not only saves you hundreds of development hours but also provides performance and features that would be challenging to achieve even with a dedicated team and a top-tier vector database.
As we move into an AI-driven future, the ability to effectively work with unstructured data will be a key differentiator. With Trieve, you're not just keeping up; you're staying ahead of the curve.
The future of search requires more than just vector databases. It demands a comprehensive, intelligent approach to ingesting and searching unstructured data. At Trieve, our approach involves building out the most challenging features needed for AI products so you don't have to. In Rust.