Build a real-time, performant and cost-effective knowledge store with Apache Doris 4.0

Combine Vector Search, Full-Text Search, SQL, and fast ingestion for a real-time knowledge store on structured and unstructured data

New Release

Apache Doris 4.0 Is Live on VeloDB!

Apache Doris 4.0 brings capabilities that enable organizations to build a knowledge store with real-time data. Learn more about the release here.

Hear from our users and partners about how Apache Doris 4.0 helps them build a knowledge store faster, and about the real-time benefits they are seeing.

Apache Doris 4.0 unified capabilities — Vector Search, Full-Text Search, SQL, JSON, Join
Progressive filtering: SQL filters → BM25 full-text → vector search, raising relevance from 58% to 94% with 20× less memory
ByteDance Case Study

Vector search alone is not enough

Users and agents need precise, deterministic answers, which similarity search alone cannot reliably deliver. By combining SQL filtering, full-text search, and vector search, users get exact matches, semantically relevant context, and precision from a single retrieval system.

Vector similarity search is expensive. Many RAG systems built on pure vector search need large amounts of memory and a high-dimensional index to reach the required accuracy. Hybrid search can cut that cost substantially by narrowing the candidate set with cheaper methods first, so the vector search runs over far fewer rows.
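To make the progressive idea concrete, here is a minimal sketch in plain Python. It is not Apache Doris code and does not use any Doris API. The toy corpus, the `lang` field, and the function names `bm25_scores`, `cosine`, and `progressive_search` are all hypothetical, chosen only to illustrate the order of the stages: a cheap structured filter first, then BM25 keyword scoring, and only then a vector comparison over the small shortlist.

```python
import math
from collections import Counter

# Hypothetical toy corpus: a structured field, text, and a tiny 2-d embedding.
DOCS = [
    {"id": 1, "lang": "en", "text": "doris supports vector search and full text search", "vec": [0.9, 0.1]},
    {"id": 2, "lang": "en", "text": "vector search finds semantically similar passages", "vec": [0.8, 0.3]},
    {"id": 3, "lang": "de", "text": "vector search in doris", "vec": [0.9, 0.1]},
    {"id": 4, "lang": "en", "text": "cooking pasta at home", "vec": [0.1, 0.9]},
]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with the Okapi BM25 formula."""
    tokenized = [d["text"].split() for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many docs each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def progressive_search(query_text, query_vec, lang, keep=2, top_k=1):
    # Stage 1: structured filter (the "SQL WHERE" step), cheapest first.
    candidates = [d for d in DOCS if d["lang"] == lang]
    if not candidates:
        return []
    # Stage 2: BM25 keyword scoring; keep only the best few matching docs.
    scores = bm25_scores(query_text, candidates)
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    shortlist = [doc for score, doc in ranked[:keep] if score > 0]
    # Stage 3: vector similarity, computed only for the shortlist.
    shortlist.sort(key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["id"] for d in shortlist[:top_k]]


print(progressive_search("doris vector search", [1.0, 0.0], "en", keep=2, top_k=2))
```

In this toy run the language filter drops document 3 and BM25 drops document 4, so cosine similarity is computed for two vectors instead of four. In a real deployment the same ordering would be expressed in the database's query language rather than in application code.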

Learn about ByteDance improving accuracy and lowering cost with hybrid search
Hybrid Search Outcome
58% → 94%
Relevance improvement with progressive filtering before semantic search
20×
Memory reduction with IVPQ compression, enabling single-server deployment at billion scale
Agentic data stack architecture with Apache Doris as the real-time knowledge store at center

Real-time context is needed for real-time insights and action

Delivering real-time context is hard. Incremental indexing at scale, managing document versions to prevent duplicate retrieval, and keeping hybrid search indexes consistent are costly and fragile.

VeloDB and CocoIndex form a two-layer context stack — CocoIndex owns the transformation pipeline with per-step memory, while VeloDB stores, indexes, and serves queries in real time. Changed documents sync in seconds, not days.
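One common way to keep changed documents from producing duplicates is to key each row by document ID and skip re-indexing when the content is unchanged. The sketch below illustrates that idea only. It is not the CocoIndex or VeloDB API, and the `KnowledgeStore` class with its `upsert` and `search` methods is a hypothetical in-memory stand-in.

```python
import hashlib


class KnowledgeStore:
    """Hypothetical in-memory store that holds one row per document ID."""

    def __init__(self):
        # doc_id -> {"hash": str, "version": int, "text": str}
        self.rows = {}

    def upsert(self, doc_id, text):
        """Insert or replace a document; return True only if work was done."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current = self.rows.get(doc_id)
        if current is not None and current["hash"] == digest:
            # Content is unchanged, so skip the (expensive) re-indexing step.
            return False
        version = 1 if current is None else current["version"] + 1
        # Replacing the row by key means the old version can never be retrieved.
        self.rows[doc_id] = {"hash": digest, "version": version, "text": text}
        return True

    def search(self, term):
        """Return IDs of documents whose current text contains the term."""
        return sorted(doc_id for doc_id, row in self.rows.items() if term in row["text"])


store = KnowledgeStore()
store.upsert("a", "hello world")   # new document, indexed as version 1
store.upsert("a", "hello world")   # same content, skipped
store.upsert("a", "hello doris")   # changed content, replaces version 1
print(store.rows["a"]["version"], store.search("world"))
```

Because the row is replaced rather than appended, a search for text that existed only in the old version returns nothing, and an unchanged document costs one hash comparison instead of a full re-index.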

Learn more about the real-time AI context stack
Real-Time Context Outcome
~1 Second
Data freshness — changed documents are queryable within one second of ingestion
Full Lineage
Trace any passage back to its source through every transformation step