What is Hybrid Search

A. The Rise of the Retrieval-Augmented Generation (RAG) Paradigm and Data-Driven Requirements

With the rapid advancement of artificial intelligence, Retrieval-Augmented Generation (RAG) has emerged as a core framework for improving the output quality of Large Language Models (LLMs) in enterprise applications. The core value of RAG lies in supplying the LLM with accurate, up-to-date, and factually grounded information, which mitigates two inherent weaknesses of LLMs: knowledge that is frozen at training time and the tendency to "hallucinate". To generate high-quality answers, a RAG system must first rely on a high-performance Information Retrieval (IR) phase that captures the full query intent, balancing conceptual understanding (Contextual) with precise facts (Factual).

B. Analysis of Single Retrieval Model Limitations

Traditional single-modality retrieval systems, whether based on keyword-driven lexical search or vector-based semantic search, exhibit significant limitations when faced with complex enterprise knowledge bases, making hybrid retrieval an inevitable architectural choice.

1. Lexical/Keyword Search: Strengths and Blind Spots

Lexical search primarily relies on inverted indices, and its strength lies in achieving extremely high Precision for exact matches. This makes it excel at processing specific types of low-frequency entities that require accurate identification, such as product codes, SKUs, dates, names, highly specialized industry jargon, and error codes. However, the inherent flaw of lexical search is its lack of contextual understanding and semantic generalization ability. When users cannot recall precise keywords or use phrasing different from the source document, lexical search often fails to return relevant results. For example, for the query "screen turned black", lexical search might fail to match the authoritative document titled "Fix for error code GFX-108".

2. Semantic/Vector Search: Strengths and Inherent Defects

Semantic search utilizes Dense Vectors to represent and capture the semantic meaning and intrinsic relationships of content. This method allows users to search based on "what they mean," enabling the system to find conceptually similar information even if no exact keywords from the document appear in the query.

However, semantic search suffers from a critical "inherent defect", namely the Out-of-Domain (OOD) Data Problem. The effectiveness of semantic search depends heavily on the training data and comprehension ability of the underlying Embedding Model. For new entities or proprietary enterprise data not contained in the model's training set (such as newly added product names, internal company codes, or arbitrary SKUs), the embedding model often cannot generate meaningful semantic representations, so retrieval quality degrades or fails completely.

C. Hybrid Search Value Proposition: Bridging the Gap Between Precision and Conceptual Understanding

Hybrid Search systematically integrates the strengths of lexical search and semantic search, aiming to bridge the gap between precision and conceptual understanding. It provides system Robustness and a fault-tolerance mechanism to counteract the structural risks posed by pure semantic search when dealing with OOD data.

The value of hybrid search is particularly prominent in RAG scenarios. It can simultaneously retrieve the "Fix Guide for error code GFX-108" (precise facts from keyword search) and "Forum posts describing the black screen issue" (broad context from vector search). Through hybrid retrieval, the system supplies the LLM with richer and more complete context, which improves the utility and accuracy of the final answer. Adopting hybrid search is therefore not merely a performance optimization; it is an architectural decision that improves reliability and reduces LLM hallucination in enterprise-level AI search systems.

II. Hybrid Search Definition, Components, and Working Principle

A. Definition of Hybrid Search

Hybrid Search is a single query request configured to execute both a full-text query and a vector query simultaneously. This query is executed against a special search index that contains both searchable plain-text content and pre-generated embeddings.

B. System Components of Hybrid Search: Coexistence of Sparse and Dense Vectors

The key to the success of the hybrid search architecture is its ability to efficiently manage and query two fundamentally different data representations: Sparse Vectors and Dense Vectors.

1. Sparse Vectors and Inverted Indices

The traditional, token-based Inverted Index models text data as sparse vectors. In this model, each token in the vocabulary corresponds to one dimension of the vector, so the number of dimensions is often extremely high. The vector for a given document is mostly zero (hence sparse) because any single document contains only a tiny fraction of the tokens in the entire index. Sparse vectors are typically generated directly from text input by traditional tokenizers and are the foundation for efficient lexical search.
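The sparse representation described above can be sketched in a few lines. This is a minimal illustration, assuming whitespace tokenization and a three-document toy corpus; the function names `sparse_vector` and `inverted_index` are hypothetical and do not come from any particular search engine.

```python
from collections import Counter

# Toy corpus (illustrative only).
docs = [
    "fix for error code gfx-108",
    "screen turned black after driver update",
    "reset the display driver to fix a black screen",
]

# Vocabulary: one vector dimension per distinct token in the whole corpus.
vocab = sorted({tok for d in docs for tok in d.split()})
index_of = {tok: i for i, tok in enumerate(vocab)}

def sparse_vector(text):
    """Keep only the non-zero dimensions as {dimension_index: term_count}."""
    counts = Counter(text.split())
    return {index_of[t]: c for t, c in counts.items() if t in index_of}

def inverted_index(documents):
    """Map each token to the list of document ids that contain it."""
    postings = {}
    for doc_id, text in enumerate(documents):
        for tok in set(text.split()):
            postings.setdefault(tok, []).append(doc_id)
    return postings

vec0 = sparse_vector(docs[0])
postings = inverted_index(docs)
print(f"{len(vec0)} of {len(vocab)} dimensions are non-zero for document 0")
print("documents containing 'black':", postings["black"])
```

Even in this tiny corpus most dimensions of each document vector are zero, which is why engines store postings lists rather than full vectors.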

2. Dense Vectors and Embedding Models

Dense vector representation stands in sharp contrast to sparse vectors. It distills the approximate semantic meaning into a fixed and limited number of dimensions via deep learning models (such as BERT). The number of dimensions in a dense vector is usually much lower than in the sparse case, and the vector for any given document is dense, as most of its dimensions are populated by non-zero values. This fixed-length vector of floating-point numbers captures the conceptual meaning of the content, ensuring that data points with similar semantics are mapped close to each other in the vector space. Unlike sparse vectors, dense vectors are usually generated by external application logic or built-in NLP models, and the result is stored in a dedicated vector field type (for example, dense_vector in Elasticsearch).

C. Workflow Overview: Parallel Execution and Result Fusion

The execution flow of hybrid search is concise and efficient, primarily consisting of the following three steps:

  1. Single Query Request: The user submits a single query request that includes both lexical search and vector query parameters.
  2. Parallel Execution: The search system simultaneously launches and executes the full-text search (lexical retrieval) and the vector search (semantic retrieval).
  3. Result Fusion: The two independent retrieval processes each produce a result set ranked by its own relevance score. Subsequently, the system uses a result fusion algorithm (most commonly Reciprocal Rank Fusion, RRF) to merge and unify these rankings, generating the final single result set.

Parallel execution is a critical engineering consideration for the efficiency of hybrid search. However, it also means that overall Query Latency is bounded by the slower of the lexical and vector searches, plus the cost of fusion. System architects should therefore optimize vector search performance, in particular P99 tail latency, to match that of lexical search so that hybrid search stays responsive in a production environment.
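The three-step workflow can be sketched with Python's standard thread pool. This is an illustration under stated assumptions: `lexical_search` and `vector_search` are hypothetical stand-ins that return hard-coded rankings, and `interleave` is a deliberately naive placeholder for a real fusion algorithm such as RRF.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest

def lexical_search(query):
    # Hypothetical stand-in for a BM25 query against an inverted index.
    return ["doc_gfx108", "doc_driver_faq"]

def vector_search(query):
    # Hypothetical stand-in for an ANN query against a vector index.
    return ["doc_forum_black_screen", "doc_gfx108"]

def interleave(first, second):
    """Placeholder fusion: alternate between the two lists, dropping duplicates."""
    merged = []
    for pair in zip_longest(first, second):
        for doc_id in pair:
            if doc_id is not None and doc_id not in merged:
                merged.append(doc_id)
    return merged

def hybrid_search(query, fuse):
    """Run both retrievers concurrently, then pass both rankings to `fuse`."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        lexical_future = pool.submit(lexical_search, query)
        vector_future = pool.submit(vector_search, query)
        # Each .result() blocks until that retriever finishes, so the total
        # wait is roughly the latency of the slower retriever.
        return fuse(lexical_future.result(), vector_future.result())

results = hybrid_search("screen turned black", interleave)
print(results)
```

Because both futures must complete before fusion starts, speeding up only the faster retriever does not reduce end-to-end latency.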

III. Key Hybrid Search Technology I: The Lexical Retrieval Foundation

Lexical retrieval, which serves as the foundation for providing precise matching signals in hybrid search, relies on proven and mature algorithms, with the BM25 (Best Matching 25) algorithm being the cornerstone of modern IR systems.

A. In-Depth Analysis of the BM25 (Best Matching 25) Algorithm

BM25 is a probabilistic scoring function used to determine the relevance of a document to a given search query. It is widely used in academic search and industry due to its excellent performance in balancing multiple relevance factors.

1. Core Components of BM25

BM25 combines three key elements to calculate relevance scores:

  • Term Frequency (TF): Measures how often the search term appears in the document. BM25 applies Term Frequency Saturation, controlled by the tunable parameter k_1: each additional occurrence of a term adds less to the score than the previous one, so the contribution approaches an upper bound instead of growing linearly with repetition.
  • Inverse Document Frequency (IDF): Measures how rare the term is across the entire document collection. Rarer terms receive a higher weight because they discriminate better between documents.
  • Document Length Normalization: This is one of the biggest improvements BM25 made over earlier models. The parameter b (between 0 and 1) controls how strongly the score is normalized by document length relative to the average document length, which counteracts an inherent bias toward longer documents.
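To make the interaction of the three components concrete, here is a minimal sketch of one common BM25 variant. It assumes whitespace tokenization, the Lucene-style IDF `ln(1 + (N - n + 0.5) / (n + 0.5))`, and the frequently used defaults k_1 = 1.2 and b = 0.75; production engines differ in tokenization and implementation details.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document in `docs` against `query` with a basic BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(tok for t in tokenized for tok in set(t))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Length normalization: b=0 disables it, b=1 applies it fully.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n = doc_freq[term]
            idf = math.log(1 + (n_docs - n + 0.5) / (n + 0.5))
            # Saturating term-frequency component, bounded above by (k1 + 1).
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

docs = [
    "error code gfx-108 fix",
    "screen turned black",
    "gfx-108 gfx-108 gfx-108 gfx-108 error",
]
scores = bm25_scores("gfx-108", docs, b=0.0)  # b=0 isolates the saturation effect
print(scores)
```

With length normalization switched off, the third document repeats the term four times but scores well under four times the first document, which is the saturation behavior controlled by k_1.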

B. Comparative Analysis of BM25 and TF-IDF

BM25 addresses several key shortcomings of the TF-IDF (Term Frequency–Inverse Document Frequency) model, making it the preferred choice for advanced search applications:

| Dimension | TF-IDF | BM25 (Best Matching 25) |
|---|---|---|
| Document Length Adjustment | Limited accuracy; does not adjust for document length, potentially favoring long documents | Superior accuracy; normalizes document length via the b parameter, ensuring fair scoring |
| Term Frequency Handling | Relies solely on term frequency; sensitive to repeated terms | Incorporates Term Frequency Saturation; limits the marginal effect of excessive repetition |
| Underlying Model | Based on word frequency | Based on a probabilistic model, resulting in more nuanced scoring |

For system architects, the tunable parameters k_1 and b in BM25 serve as well-understood control knobs for optimizing the lexical signal. By adjusting these parameters, the lexical relevance output can be refined according to the characteristics of the specific enterprise knowledge base (such as document length distribution or the repetition rate of industry terms), which keeps the sparse signal within hybrid retrieval at high quality.

IV. Key Hybrid Search Technology II: Semantic Retrieval Architecture

Semantic retrieval converts data into dense vectors and achieves conceptual matching through efficient similarity search techniques (k-NN/aNN).

A. Dense Vectors and Embedding Models

Dense vectors are created by deep learning models (such as Vertex AI Embeddings) that convert text, images, or other data into fixed-length vectors of floating-point numbers to capture their semantic meaning. The embedding model constructs a "map of content meanings," where semantically similar content is mapped to nearby points in the vector space. For example, a text discussing movies, music, and actors might be represented by a toy three-dimensional vector such as [0.1, 0.02, 0.3]. Production embeddings typically have hundreds or thousands of dimensions, and individual dimensions generally do not correspond to human-interpretable topics; meaning is carried by the position of a vector relative to other vectors. This representation is fundamental for semantic search, content recommendations, and multimedia similarity search.

B. Vector Search Core Mechanisms: k-NN and aNN

The core task of vector retrieval is to find the k nearest neighbors (i.e., the k most similar documents) whose vectors d in the corpus are closest to the query vector v.

  • k-NN (K-Nearest Neighbors): Traditional exhaustive search computes a distance or similarity (e.g., Euclidean distance, cosine similarity, or dot product) between the query vector and every document vector in the corpus, then keeps the k best matches.
  • Approximate Nearest Neighbor (aNN): For massive corpora, the computational cost of exhaustive k-NN is prohibitively high, because it grows linearly with corpus size. Therefore, the industry widely adopts aNN strategies. aNN algorithms trade a small amount of recall for large gains in speed and efficiency.
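The exhaustive variant is simple enough to sketch directly. The example below assumes cosine similarity as the metric and uses made-up three-dimensional vectors; the `knn` function and the corpus are illustrative only.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn(query_vec, corpus, k):
    """Exhaustive k-NN: compare the query against every vector in the corpus."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy embeddings (made up for illustration).
corpus = {
    "movies": [0.9, 0.1, 0.0],
    "music": [0.1, 0.9, 0.1],
    "films": [0.8, 0.2, 0.1],
}
top2 = knn([1.0, 0.1, 0.0], corpus, k=2)
print(top2)
```

Every query touches every vector, so cost scales with corpus size times dimensionality. aNN indexes avoid this by visiting only a small, well-chosen subset of candidates.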

C. aNN Algorithms and Performance Optimization

Efficient aNN algorithms are critical for achieving low-latency semantic retrieval in hybrid search.

  1. HNSW (Hierarchical Navigable Small World): This is one of the most popular aNN strategies currently. It is based on a graph structure that provides efficient approximate vector retrieval. Systems like Apache Lucene and Solr employ HNSW-based graph structures for highly efficient retrieval.
  2. Vector Quantization and Compression: Facing terabyte-level data volumes in enterprise environments, memory and latency pose significant challenges. Advanced vector quantization techniques like Better Binary Quantization (BBQ) can compress embeddings into a compact binary form, thereby accelerating similarity search and reducing memory footprint, which improves cost efficiency while aiming to preserve search relevance.
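The general idea behind binary quantization can be illustrated with the simplest possible scheme: keep only the sign of each dimension and compare vectors by Hamming distance. This is explicitly not the BBQ algorithm, which applies additional corrections; it is a sketch of the underlying principle, with hypothetical function names.

```python
def binarize(vec):
    """Pack the sign of each dimension into an int: bit i is 1 when vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")

query = binarize([0.3, -0.1, 0.7, -0.9])
similar = binarize([0.2, -0.4, 0.1, 0.5])     # same sign in 3 of 4 dimensions
opposite = binarize([-0.3, 0.1, -0.7, 0.9])   # sign flipped in every dimension

print(hamming(query, similar), hamming(query, opposite))
```

Storing one bit instead of a 32-bit float per dimension cuts memory by a factor of 32, and XOR plus bit counting is far cheaper than floating-point distance computation. Systems that use such codes commonly rescore the top candidates with the original vectors to recover accuracy.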

The engineering challenge of vector search is no longer simply accuracy, but how to achieve a balance between cost-effectiveness and low latency. Sparse retrieval (BM25) has an inherent speed advantage, and the emergence of HNSW and BBQ aims to push the performance curve of dense vector search closer to that of lexical search, making hybrid search's parallel execution both feasible and economical.

V. Core Algorithm: Result Fusion and Ranking

The value of hybrid search is demonstrated in its result fusion phase, which involves unifying and optimally ranking results from different metric spaces (BM25 scores and vector similarity scores). These scores cannot be directly compared or summed.

A. Reciprocal Rank Fusion (RRF)

RRF is one of the most widely adopted fusion algorithms and is used by default in several hybrid search systems (for instance, in Azure AI Search).

1. Detailed Mechanism of RRF

The core advantage of RRF lies in its Ranking Robustness: it does not rely on the absolute values of the scores output by the retrievers but instead generates a unified result based only on the document's Position (Rank) within its respective list.

  • Parallel Output: RRF takes the sorted result sets from multiple concurrently executed queries (e.g., full-text query + vector query).
  • Reciprocal Rank Score Calculation: For a document in any query result list, the system assigns a reciprocal rank score based on its position (rank) in that list. The scoring formula is typically Score = 1 / (K + rank), where rank starts at 1 and K is a smoothing constant (commonly 60). Higher-ranked documents still receive higher scores, but K dampens the gap between adjacent ranks so that a single top position in one list cannot dominate the fused result.
  • Summation and Final Ranking: For the same document, the system accumulates the reciprocal rank scores obtained across all parallel queries to form a total fusion score. Finally, documents are sorted based on this total score to generate the final fused ranking.

RRF's main strength is that it does not require the input scores to be comparable, making it the preferred method for combining lexical and semantic signals.
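The mechanism above fits in a few lines. The following is a minimal sketch, assuming each retriever returns an ordered list of document ids and using K = 60; the document ids are invented for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document ids with Score = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

lexical = ["gfx108_guide", "driver_faq", "release_notes"]
vector = ["forum_black_screen", "gfx108_guide", "display_howto"]

fused = reciprocal_rank_fusion([lexical, vector])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

The guide appears in both lists (rank 1 and rank 2), so it accumulates 1/61 + 1/62 and outranks the forum post that is first in only one list. No BM25 score or vector distance is ever consulted, only positions.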

B. Weighted Fusion Strategy and Control

While standard RRF is robust, it treats all retrievers equally. In complex search scenarios (such as balancing brand and category in e-commerce), more fine-grained control is necessary.

1. Alpha Parameter (α)

The α parameter is one of the most straightforward ways to adjust fusion weights, especially common in platforms like Weaviate. The value ranges from 0 to 1:

  • α = 0.5 (a common starting point; defaults vary by platform): Lexical and vector search contribute equally.
  • α > 0.5: Assigns higher weight to vector search.
  • α < 0.5: Assigns higher weight to keyword search.
  • α = 1 is equivalent to a pure vector search, while α = 0 is equivalent to a pure keyword search.

2. Explicit Weight Multiplier

Under the weighted RRF framework, each retriever's reciprocal rank contribution is multiplied by a retriever-specific weight before the contributions are summed. For example, if a retriever's weight is set to 2.0, its contribution to the total RRF score is doubled; if set to 0.5, it is halved. This mechanism provides more precise control over the search strategy.
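A weighted variant can be sketched as follows. It assumes the weight multiplies each retriever's 1 / (K + rank) contribution, which is one common formulation; individual platforms may define the weighting differently, and the SKU-style ids are invented.

```python
def weighted_rrf(rankings, weights, k=60):
    """RRF in which each retriever's contribution is scaled by its weight."""
    scores = {}
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    # Return document ids only, best first.
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["sku_123", "sku_456"]
vector = ["similar_a", "sku_123"]

equal = weighted_rrf([lexical, vector], weights=[1.0, 1.0])
lexical_heavy = weighted_rrf([lexical, vector], weights=[2.0, 0.5])
print(equal)
print(lexical_heavy)
```

With equal weights the vector-only hit `similar_a` ranks second; boosting the lexical retriever to 2.0 and reducing the vector retriever to 0.5 moves the exact-match `sku_456` ahead of it.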

C. Alternative Approaches like Relative Score Fusion (RSF)

In addition to RRF, some platforms offer alternative fusion methods.

  • Relative Score Fusion (RSF): Unlike RRF's ranking-based approach, RSF fuses normalized raw scores. For example, Weaviate's relativeScoreFusion normalizes vector similarity scores and BM25 scores separately into the 0 to 1 range and then computes a weighted sum.
  • Linear Fusion (Weighted Sum): A linear retriever calculates a weighted sum of each retriever's scores, optionally after normalization such as MinMax, which preserves the relative score differences between documents.
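Score-based fusion with an alpha weight can be sketched as below. The assumptions are: min-max normalization per retriever, vector scores expressed as similarities (higher is better), and a missing document counted as 0 for that retriever. This illustrates the idea and is not the exact algorithm of any specific platform.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping into the 0 to 1 range."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def relative_score_fusion(bm25, vector, alpha=0.5):
    """Fused score = alpha * normalized vector score + (1 - alpha) * normalized BM25."""
    bm25_n, vec_n = min_max(bm25), min_max(vector)
    docs = set(bm25_n) | set(vec_n)
    fused = {d: alpha * vec_n.get(d, 0.0) + (1 - alpha) * bm25_n.get(d, 0.0)
             for d in docs}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)

bm25 = {"guide": 12.0, "faq": 6.0, "notes": 3.0}       # unbounded BM25 scores
vector = {"forum": 0.92, "guide": 0.80, "howto": 0.62}  # cosine similarities

balanced = relative_score_fusion(bm25, vector, alpha=0.5)
vector_only = relative_score_fusion(bm25, vector, alpha=1.0)
keyword_only = relative_score_fusion(bm25, vector, alpha=0.0)
print(balanced[0], vector_only[0], keyword_only[0])
```

Unlike RRF, the size of the score gap matters here: `guide` wins at alpha = 0.5 because it is far ahead on BM25 and only moderately behind on vector similarity. That sensitivity is useful when scores are well calibrated and a liability when a single outlier score distorts the normalization.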

The choice of fusion algorithm reflects a strategic trade-off between Fidelity of Raw Scores and Ranking Robustness. RRF offers the highest robustness because it is independent of the raw scores. However, RSF and linear fusion, through sophisticated normalization, attempt to preserve the subtle variations in raw score differences between documents. If the quality of the raw retrieval scores is high and normalization is handled properly, this approach may offer higher precision in certain scenarios that require downstream models to consume precise scores.

VI. Hybrid Search Architecture Practice and Applications

A. Hybrid Retrieval and Reranking Architecture in the Vector Database VeloDB

Modern vector databases, such as VeloDB, have integrated hybrid search as a core capability to meet the dual enterprise requirements for high precision and semantic understanding. Hybrid search ensures the comprehensiveness and accuracy of retrieval results by processing queries in parallel and utilizing a subsequent Reranking step to consolidate results.

1. VeloDB's Hybrid Search Capability

The VeloDB platform supports hybrid retrieval, which is an evolutionary information retrieval method that combines precise keyword matching (e.g., BM25-based search) with semantic understanding (based on kNN vector search).

  • Complementary Strengths: Keyword search offers precise control and explainable results, suitable for finding exact terms, codes, or names. In contrast, vector search (such as the kNN search supported by VeloDB) is responsible for understanding the query's semantics and context, enabling the system to find relevant content even when the user employs natural language or conceptual descriptions.
  • Parallel Query Execution: The hybrid search architecture processes a single user request by executing multiple search queries in parallel (typically BM25-based keyword search and kNN vector search). This architecture ensures adaptability to diverse query patterns and user expectations.

2. Result Fusion and Reranking Mechanism

In the hybrid search workflow, parallel search pipelines return multiple result sets, so the reranking strategy is a crucial step in ensuring that the final results are the most accurate and relevant.

  • Importance of Reranking: Hybrid search achieves more precise results through multiple simultaneous Approximate Nearest Neighbor (ANN) searches. The reranking step aims to merge and reorder these result sets—originating from different search paths (e.g., text similarity, image similarity, keyword matching)—to return a single, unified result set.
  • Supported Reranking Strategies: Platforms typically support various reranking strategies to flexibly adapt to different business needs:
    • Reciprocal Rank Fusion Ranker (RRFRanker): This is a ranking-based fusion strategy. It calculates a total score based on the document's relative position in different result sets, often leading to a fairer and more effective integration of diverse data types or modalities. This method is robust because it does not require the raw scores to be comparable.
    • Weighted Ranker (WeightedRanker): This strategy merges results by calculating a weighted sum of the raw scores or distances from different vector searches. It allows different weights to be assigned based on the importance of each vector field or search path, thereby achieving customized result prioritization.

By integrating these advanced retrieval and reranking algorithms, modern vector databases like VeloDB can effectively address the limitations of single retrieval methods and provide high-quality, high-reliability foundational data for RAG and other applications.

B. Advanced Applications in Retrieval-Augmented Generation (RAG)

Hybrid search is a strategic step in ensuring the output quality of RAG. By providing multi-dimensional context, it enables the LLM to generate responses that are both precise and complete.

  • Factual Grounding: The precision of keyword search ensures that the LLM receives specific, non-substitutable entities and facts (such as particular error codes or product models), preventing the LLM from hallucinating on critical details.
  • Contextual Completeness: Semantic search, conversely, ensures that the broad conceptual context surrounding the query intent is captured, preventing the LLM's generated answer from being too narrow. For example, when handling customer support queries, hybrid search can simultaneously retrieve official guides (precise) and user discussions (contextual).

C. Tuning Strategies: Practical Application of Alpha Values and Weighted RRF

Tuning is essential for ensuring hybrid search adapts to specific business scenarios. For systems focused on conceptual matching and finding similar content (e.g., news recommendation), the alpha value should be increased, assigning greater weight to vector search. For scenarios relying on high-precision fact localization (e.g., legal documents or financial transaction searches), the alpha value should be decreased, emphasizing keyword matching.

Furthermore, advanced system architectures are exploring dynamic tuning: adjusting fusion weights based on the query's characteristics (e.g., whether the query contains many numbers, capital letters, or technical jargon). For instance, when technical terms are detected, the algorithm automatically increases the weight of lexical retrieval to achieve optimized retrieval effectiveness.
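One way to picture such dynamic tuning is a small heuristic that lowers alpha (shifting weight toward lexical retrieval) when a query contains code-like tokens. Everything in this sketch is an assumption made for illustration: the regular expression, the step size, and the floor value are arbitrary and would need to be tuned against real query logs.

```python
import re

# A token is treated as "code-like" if it contains a digit (e.g. GFX-108, 4471-B)
# or consists only of two or more capital letters (e.g. SKU).
CODE_LIKE = re.compile(r"^(?=.*\d)[A-Za-z0-9_-]+$|^[A-Z]{2,}$")

def choose_alpha(query, base_alpha=0.5, step=0.2, floor=0.1):
    """Return a fusion alpha; lower values favor keyword search."""
    tokens = query.split()
    code_like = sum(1 for t in tokens if CODE_LIKE.match(t))
    # Each code-like token shifts weight toward lexical retrieval.
    return max(floor, base_alpha - step * code_like)

print(choose_alpha("screen turned black"))
print(choose_alpha("error GFX-108"))
print(choose_alpha("SKU 4471-B GFX-108 E42"))
```

A natural-language query keeps the base alpha, a query with one identifier moves toward keyword matching, and a query made up almost entirely of identifiers bottoms out at the floor so that vector search still contributes a little.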

VII. Evaluation of Hybrid Search Systems

A. Effectiveness Metrics

Evaluating the effectiveness of a hybrid search system, particularly the quality of the fused and re-ranked results, is crucial.

  • nDCG@k (Normalized Discounted Cumulative Gain at rank k): One of the most widely used metrics in the information retrieval field, commonly reported at k=10 (nDCG@10). nDCG handles graded relevance judgments and discounts each document's gain logarithmically by its position, so rankings that place highly relevant documents near the top score higher. These properties make it a natural choice for evaluating hybrid retrieval effectiveness.
  • MRR@k (Mean Reciprocal Rank): Measures the position of the first relevant document, suitable for scenarios where the importance of a single correct answer is being assessed.
  • Recall@k: Measures the proportion of all relevant documents retrieved within the top k results.
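The three metrics can be computed for a single query as sketched below. The sketch assumes graded relevance labels stored as `{doc_id: grade}` and uses the linear-gain form of DCG (some evaluations use 2^grade - 1 instead). MRR is the mean of the per-query reciprocal rank over a query set.

```python
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, relevance, k=10):
    gains = [relevance.get(d, 0) for d in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)  # best possible ordering
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg else 0.0

def reciprocal_rank_at_k(ranked_ids, relevance, k=10):
    """1 / position of the first relevant document, or 0 if none in the top k."""
    for position, d in enumerate(ranked_ids[:k], start=1):
        if relevance.get(d, 0) > 0:
            return 1.0 / position
    return 0.0

def recall_at_k(ranked_ids, relevance, k=10):
    relevant = {d for d, grade in relevance.items() if grade > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)

relevance = {"a": 3, "b": 2, "c": 1}   # graded judgments for one query
ranked = ["x", "a", "c", "y"]          # system output, best first

print(f"nDCG@10   = {ndcg_at_k(ranked, relevance):.4f}")
print(f"RR@10     = {reciprocal_rank_at_k(ranked, relevance):.4f}")
print(f"Recall@3  = {recall_at_k(ranked, relevance, k=3):.4f}")
```

In this example the system finds two of the three relevant documents but places an irrelevant one first, so recall is fairly high while nDCG and reciprocal rank are both penalized for the poor top position.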

B. Efficiency Considerations: Latency and Throughput

In a production environment, system efficiency is just as important as effectiveness.

  • Query Latency: Evaluates online efficiency, typically measured by average latency and P99 tail latency. Due to the parallel execution of hybrid search, tail latency is a key focus for system architects.
  • Throughput: Measures the system's capacity under high load, expressed in Queries Per Second (QPS).
  • Indexing Performance: Measures the total indexing time, including data pre-processing, embedding generation, and the construction time for both sparse and dense indices.
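The latency and throughput figures above can be derived from raw timing samples as sketched here. The samples are synthetic, hybrid latency is approximated as the slower of the two retrievers per query, the percentile uses the nearest-rank method, and the throughput figure assumes a single worker running queries back to back.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample covering at least p% of the data."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic per-query latencies in milliseconds (made up for illustration).
lexical_ms = [8] * 100
vector_ms = [10] * 98 + [500] * 2      # two slow outliers

# With parallel execution, each query waits for the slower retriever.
hybrid_ms = [max(l, v) for l, v in zip(lexical_ms, vector_ms)]

mean_ms = sum(hybrid_ms) / len(hybrid_ms)
p50 = percentile(hybrid_ms, 50)
p99 = percentile(hybrid_ms, 99)
qps = len(hybrid_ms) / (sum(hybrid_ms) / 1000)  # single sequential worker

print(f"mean={mean_ms} ms, p50={p50} ms, p99={p99} ms, throughput={qps:.1f} QPS")
```

The median looks healthy at 10 ms while the P99 is 500 ms, entirely because of the vector path. This is why tail latency, rather than the average, is the figure to watch for hybrid queries.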

C. Conclusion: The Strategic Significance of Hybrid Search for Enterprise AI Systems

Hybrid search systematically combines classic information retrieval theory (BM25 lexical search) with modern deep learning technology (dense vector semantic search) to build a highly robust retrieval system that is both precise and capable of deep semantic understanding.

The strategic significance of hybrid search is that it solves the inherent defects of pure semantic search when dealing with evolving enterprise knowledge and out-of-domain data, providing comprehensive data support for RAG. The emphasis on evaluation metrics like nDCG@10 indicates that the ultimate goal of hybrid search is to optimize the quality of the top results, ensuring that the LLM receives the highest quality, most relevant information within its limited context window. This optimization of retrieval quality makes hybrid search a mandatory architectural standard for building high-reliability, high-accuracy enterprise-level AI systems. It represents a mature and pragmatic engineering strategy adopted by the information retrieval domain to meet the challenges of next-generation generative AI.

VIII. Hybrid Search In VeloDB

A. The VeloDB Approach: HSAP-Native Hybrid Search Implementation

Unlike loosely coupled architectures that require "gluing" a vector database and a search engine together via application logic, VeloDB implements a Hybrid Search and Analytics Processing (HSAP) architecture. It integrates a mature Inverted Index engine directly alongside its Vector Execution Engine within a unified storage layer.

  1. Architectural Integration: Vector + Inverted Index
  The core differentiator of VeloDB's hybrid search is its ability to perform "Pre-filtering" with zero performance penalty, thanks to its generalized Inverted Index technology.
  • Unified Indexing: For a single row of data, VeloDB maintains both an HNSW index for the dense vector column and Inverted Indices for text/scalar columns (e.g., user_id, category, timestamp, keywords).
  • Predicate Pushdown: When a hybrid query is executed (e.g., "Find relevant documents [Vector] regarding 'compliance' [Text] created last week [Scalar]"), VeloDB pushes the scalar and text predicates down to the storage engine. The Inverted Index prunes the search space first, allowing the Vector Engine to perform ANN search only on the relevant subset of data. This avoids the "brute-force scan" issues common in other vector databases when dealing with selective filters.
  2. Execution Flow: Single-Stage SQL Processing
  VeloDB simplifies the complexity of "Parallel Execution and Result Fusion" by abstracting it into standard SQL execution.
  • SQL-Native Interface: Developers do not need to manage parallel threads or complex fusion logic in Python/Java code. A hybrid search is a declarative SQL query.
    • Example Logic:
      -- Top lexical candidates, scored by BM25
      WITH text_raw AS (
        SELECT id, score() AS bm25
        FROM hackernews
        WHERE (text MATCH_PHRASE 'hybrid search' OR title MATCH_PHRASE 'hybrid search')
          AND dead = 0 AND deleted = 0
        ORDER BY score() DESC
        LIMIT 1000
      ),
      -- Top vector candidates by approximate L2 distance
      vec_raw AS (
        SELECT id, l2_distance_approximate(vector, [0.12, 0.08, ...]) AS dist
        FROM hackernews
        ORDER BY dist ASC
        LIMIT 1000
      ),
      -- Convert scores and distances into ranks
      text_rank AS (SELECT id, ROW_NUMBER() OVER (ORDER BY bm25 DESC) AS r_text FROM text_raw),
      vec_rank AS (SELECT id, ROW_NUMBER() OVER (ORDER BY dist ASC) AS r_vec FROM vec_raw),
      -- RRF: sum 1 / (60 + rank) per document across both lists
      fused AS (
        SELECT id, SUM(1.0 / (60 + rank)) AS rrf_score
        FROM (
          SELECT id, r_text AS rank FROM text_rank
          UNION ALL
          SELECT id, r_vec AS rank FROM vec_rank
        ) t
        GROUP BY id
        ORDER BY rrf_score DESC
        LIMIT 20
      )
      SELECT f.id, h.title, h.text, f.rrf_score
      FROM fused f
      JOIN hackernews h ON h.id = f.id
      ORDER BY f.rrf_score DESC;
  • Internal Parallelism: The Massively Parallel Processing (MPP) engine of VeloDB automatically parallelizes the vector calculations and keyword matching across cluster nodes, returning a unified, ranked result set in milliseconds.
  3. Flexible Reranking & Scoring
  While VeloDB supports external Reranking models, its internal engine offers powerful "Lightweight Reranking" capabilities directly via SQL functions: