Build a real-time, performant and cost-effective knowledge store with Apache Doris 4.0

Combine Vector Search, Full-Text Search, SQL, and fast ingestion for a real-time knowledge store on structured and unstructured data

New Release

Apache Doris 4.0 Is Live on VeloDB!

Apache Doris 4.0 brings capabilities that enable organizations to build a knowledge store with real-time data. Learn more about the release here.

Hear from our users and partners on how Apache Doris 4.0 accelerates efforts to build a knowledge store and the real-time benefits they are realizing.

Apache Doris 4.0 unified capabilities — Vector Search, Full-Text Search, SQL, JSON, Join

Progressive filtering: SQL filters → BM25 full-text → Vector search, raising relevance from 58% to 94% while using 20× less memory

ByteDance Case Study

Vector search alone is not enough

Users and agents need precise, deterministic answers, and similarity search alone cannot reliably deliver them. Combining SQL filtering, full-text search, and vector search gives users exact matches, semantically relevant context, and precision from one retrieval system.

Vector similarity search is expensive. Many RAG systems built on pure vector search need large amounts of memory and a high-dimensional index to reach the necessary accuracy. Hybrid search can cut that cost substantially by narrowing the candidate set with cheaper methods first, so the vector search runs over far fewer rows.
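The staged idea can be sketched in a few lines of plain Python. This is a minimal illustration only, not Apache Doris or VeloDB code: the toy corpus, the field names (`lang`, `text`, `vec`), and the function `progressive_search` are all hypothetical, and the BM25 scoring is a textbook Okapi BM25 written out by hand. It shows an exact filter running first, BM25 trimming the survivors, and cosine similarity being computed only on what is left.

```python
import math
from collections import Counter

# Hypothetical toy corpus; field names are illustrative assumptions.
DOCS = [
    {"id": 1, "lang": "en", "text": "doris supports vector search and full text search", "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "lang": "en", "text": "real time ingestion keeps the knowledge store fresh", "vec": [0.1, 0.9, 0.0]},
    {"id": 3, "lang": "de", "text": "vector search with doris", "vec": [0.8, 0.2, 0.0]},
    {"id": 4, "lang": "en", "text": "hybrid search combines sql filters with vector search", "vec": [0.7, 0.3, 0.1]},
]


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each document against the query terms with Okapi BM25."""
    tokenized = [d["text"].split() for d in docs]
    n = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def progressive_search(docs, lang, query_terms, query_vec, keep_text=2, top_k=2):
    # Stage 1: exact structured filter (the "SQL filter" step).
    stage1 = [d for d in docs if d["lang"] == lang]
    if not stage1:
        return []
    # Stage 2: BM25 keeps only the best keyword matches.
    ranked = sorted(zip(bm25_scores(query_terms, stage1), stage1),
                    key=lambda pair: pair[0], reverse=True)
    stage2 = [doc for score, doc in ranked[:keep_text] if score > 0]
    # Stage 3: vector similarity runs only over the survivors.
    stage3 = sorted(stage2, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in stage3[:top_k]]


result = progressive_search(DOCS, "en", ["vector", "search"], [1.0, 0.0, 0.0])
print(result)
```

In this toy run, document 3 is close to the query vector but never reaches the vector stage because the structured filter rejects it first, which is the cost-saving behaviour the paragraph above describes.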

Learn about ByteDance improving accuracy and lowering cost with hybrid search

Hybrid Search Outcome

58% → 94%

Relevance improvement with progressive filtering before semantic search

20×

Memory reduction with IVPQ compression, enabling single-server deployment at billion scale
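To get a feel for why a 20× reduction matters at billion scale, the arithmetic below may help. It is a rough back-of-the-envelope sketch: the vector dimension (768) and 4-byte float storage are assumptions chosen for illustration, not figures from the case study, and the calculation ignores index overhead.

```python
def raw_vector_bytes(num_vectors, dim, bytes_per_value=4):
    """Memory needed to hold uncompressed vectors, ignoring index overhead."""
    return num_vectors * dim * bytes_per_value


NUM_VECTORS = 1_000_000_000  # "billion scale", as stated in the text
DIM = 768                    # assumed embedding dimension (illustrative)
REDUCTION = 20               # the 20x reduction quoted in the text

raw = raw_vector_bytes(NUM_VECTORS, DIM)
compressed = raw / REDUCTION

print(f"uncompressed: {raw / 1e12:.2f} TB")
print(f"after 20x reduction: {compressed / 1e9:.1f} GB")
```

Under these assumptions, roughly 3 TB of raw vectors shrinks to about 154 GB, a size that can plausibly fit in the memory of one large server.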

Agentic data stack architecture with Apache Doris as the real-time knowledge store at center

Real-time context is needed for real-time insights and action

Delivering real-time context is hard. Indexing incrementally at scale, managing document versions to prevent duplicate retrieval, and keeping hybrid search indexes consistent are all costly and fragile.

VeloDB and CocoIndex form a two-layer context stack — CocoIndex owns the transformation pipeline with per-step memory, while VeloDB stores, indexes, and serves queries in real time. Changed documents sync in seconds, not days.
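The problems named above (incremental updates, versioning without duplicates, and traceability) can be illustrated with a small in-memory sketch. Nothing here is the CocoIndex or VeloDB API: the `KnowledgeStore` class, its methods, and the `split_on_period` step name are hypothetical stand-ins that show the general pattern of skipping unchanged documents by content hash, replacing old chunks when a new version arrives, and tagging each chunk with its source.

```python
import hashlib


class KnowledgeStore:
    """Toy in-memory stand-in for a document store (hypothetical)."""

    def __init__(self):
        # doc_id -> {"hash": str, "version": int, "chunks": list of dicts}
        self.rows = {}

    def sync(self, doc_id, text):
        """Ingest a document; return True only if something was re-indexed."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current = self.rows.get(doc_id)
        if current and current["hash"] == digest:
            return False  # content unchanged, so skip the work entirely
        version = current["version"] + 1 if current else 1
        # Each chunk keeps lineage: source document, version, and the step
        # that produced it.
        chunks = [
            {"text": part.strip(), "source": doc_id,
             "version": version, "step": "split_on_period"}
            for part in text.split(".") if part.strip()
        ]
        # Overwriting the row drops the previous version's chunks, so a
        # query never sees two versions of the same passage.
        self.rows[doc_id] = {"hash": digest, "version": version, "chunks": chunks}
        return True

    def search(self, term):
        """Return every current chunk whose text contains the term."""
        return [c for row in self.rows.values()
                for c in row["chunks"] if term in c["text"]]


store = KnowledgeStore()
first = store.sync("guide", "Doris is fast. Doris is open source.")
repeat = store.sync("guide", "Doris is fast. Doris is open source.")
changed = store.sync("guide", "Doris is fast. Doris supports hybrid search.")
hits = store.search("Doris")
```

After the three calls, only the second version's chunks are searchable, and each hit carries the `source` and `step` fields needed to trace it back to its origin.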

Learn more about the real-time AI context stack

Real-Time Context Outcome

~1 Second

Data freshness — changed documents are queryable within one second of ingestion

Full Lineage

Trace any passage back to its source through every transformation step