We are thrilled to announce the official launch of VeloDB Cloud Core v26.0, powered by the general availability of Apache Doris 4.0. This is the first time Apache Doris 4.0 is commercially available, incorporating all hardening and enhancements from more than 200 community contributors over the past five months (Open-Source Apache Doris versions 4.0–4.0.4).
As the premier commercial evolution of Apache Doris 4.0, v26.0 marks a pivotal milestone: VeloDB has officially transformed from a Real-Time Analytics Database into a Unified Analytics & Search Database. Apache Doris 4.0 itself represents a major leap for the project, evolving beyond real-time analytics to meet the data management challenges posed by generative AI and agent-based systems.
After five months of intensive optimization, v26.0 delivers the high-performance, low-latency infrastructure required to anchor the next generation of AI applications and autonomous Agents. With this release, VeloDB helps organizations build AI systems and manage the challenges that emerge from AI adoption.
1. The Power of Hybrid Search: Analytics + Vector + Full-Text
In the AI era, developers require more than just structured data aggregation; they need deep semantic understanding and precision retrieval. v26.0 introduces Native Hybrid Search, seamlessly integrating OLAP Analytics, Full-text Search, and Vector Search within a single, high-performance engine.
Search is becoming a core analytical workload for modern data platforms, from log search and observability to document retrieval and context engineering. Apache Doris continues to enhance its native search capabilities across each 4.0.X release.
Native Vector Search
VeloDB features high-performance HNSW indexing for L2 and Inner Product similarity. By unifying SQL filtering and vector search, it eliminates the need for standalone vector stores. The ANN (Approximate Nearest Neighbor) index now supports an index-only scan mode, allowing vector searches to resolve results directly from the index without a table scan. This significantly reduces I/O overhead in large-scale vector retrieval scenarios.
Simplified Full-Text Search (DSL)
The new SEARCH() function provides an Elasticsearch-like DSL, offering a familiar and lightweight way to handle complex text queries (phrases, wildcards, regex). It also supports schema-less search via dot-notation for JSON and Variant sub-columns.
The SEARCH() function has been progressively enhanced across 4.0.X releases: Lucene Boolean Mode brings Elasticsearch-style boolean query semantics (must / should / must_not) to SQL-based search. BM25 score range filtering (min_score semantics) lets users filter search results by relevance score, focusing on only the most relevant matches for search recommendation and intelligent retrieval pipelines. New parameters (default_field, default_operator) enable convenient multi-field search without complex SQL expressions.
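As a sketch of what this DSL looks like in practice (the exact query-string grammar may differ by release; the table and field names reuse the hybrid-search example later in this post):

```sql
-- Illustrative SEARCH() query-string usage; verify the grammar against
-- the documentation for your release.
SELECT doc_id, content
FROM ai_knowledge_base
WHERE SEARCH('content:(memory AND NOT leak) OR content:"out of memory"')
LIMIT 10;
```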
BM25 Ranking & Intelligent Pushdown
We replaced TF-IDF with the industry-standard BM25 algorithm for superior text relevance. Performance is further boosted by Intelligent Pushdown, where search conditions execute directly at the inverted index layer to minimize latency.
Advanced Tokenization & Indexing
VeloDB includes a comprehensive tokenizer suite (ICU, IK, Pinyin) with support for multiple indexes per column. All inverted indexes now default to the V3 format, delivering superior compression and faster retrieval.
Across the 4.0.X releases, the inverted index received foundational improvements: custom analyzers with Pinyin tokenizer and filter for Chinese Pinyin search scenarios, NORMALIZER support for case folding and accent removal before indexing, multi-position PhraseQuery support for position-aware phrase matching, and the ability to attach multiple tokenizer indexes to a single column for multilingual search, n-gram matching, or specialized analyzers.
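To illustrate attaching multiple tokenizer indexes to one column, here is a hypothetical DDL sketch; the parser names ("ik", "pinyin") and the table are assumptions to check against the tokenizer list shipped in your release:

```sql
-- Hypothetical: two inverted indexes with different tokenizers on the
-- same column, e.g. for mixed Chinese / Pinyin search.
CREATE TABLE support_tickets (
    ticket_id BIGINT NOT NULL,
    title     TEXT,
    INDEX idx_title_ik     (title) USING INVERTED PROPERTIES("parser" = "ik"),
    INDEX idx_title_pinyin (title) USING INVERTED PROPERTIES("parser" = "pinyin")
)
ENGINE = OLAP
DUPLICATE KEY(ticket_id)
DISTRIBUTED BY HASH(ticket_id) BUCKETS 8;
```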
Why VeloDB for AI?
VeloDB simplifies the architecture for core AI scenarios:
- Long-term Memory: Provides AI Agents with access to both deep historical insights and immediate context without the overhead of disparate systems.
- RAG / Knowledge Engines: Combines BM25 keyword relevance with vector semantic matching to drastically reduce LLM hallucinations.
- Architecture Simplicity: A single source of truth for metadata and multimodal embeddings, eliminating cross-database synchronization.
Hybrid Search Example
Define inverted indexes for text and HNSW indexes for vectors directly in your DDL:
CREATE TABLE ai_knowledge_base (
    doc_id BIGINT NOT NULL,
    content TEXT,
    metadata VARIANT,
    content_vector ARRAY<FLOAT> NOT NULL,
    INDEX hnsw_idx (content_vector) USING ANN PROPERTIES (
        "index_type" = "hnsw",
        "metric_type" = "l2_distance",
        "dim" = "128"
    ),
    INDEX content_search (content) USING INVERTED
        PROPERTIES("parser" = "english")
)
ENGINE = OLAP
DUPLICATE KEY(doc_id)
DISTRIBUTED BY HASH(doc_id) BUCKETS 16;
Filter by business logic, match keywords, and calculate vector similarity—all in one query:
SELECT doc_id, content
FROM ai_knowledge_base
WHERE metadata['author'] = 'VeloDB_Team'
AND content MATCH_ANY 'OOM Optimization'
ORDER BY l2_distance_approximate(content_vector,
[0.12, 0.35, ...])
LIMIT 5;
2. In-Database AI Functions
VeloDB v26.0 eliminates the cost of data movement with Native In-Database AI. You can now invoke leading LLMs, including OpenAI, Anthropic, and Gemini, directly via SQL. By replacing fragmented Python-based RAG pipelines with simple SQL commands, VeloDB enables massive-scale data structuring and real-time inference at the source.
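In place of a hand-rolled Python pipeline, enrichment can run where the data lives. The sketch below is illustrative only: the resource name llm_resource and the exact AI function signatures are assumptions to be checked against your release's documentation:

```sql
-- Hypothetical in-database LLM calls over rows of the knowledge base;
-- "llm_resource" is an assumed, pre-configured model resource.
SELECT doc_id,
       AI_SUMMARIZE('llm_resource', content) AS summary,
       AI_SENTIMENT('llm_resource', content) AS sentiment
FROM ai_knowledge_base
WHERE metadata['source'] = 'support_tickets'
LIMIT 100;
```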
3. Performance Acceleration
Enhanced Top-N Late Materialization
v26.0 extends Late Materialization to complex joins and external data sources. By deferring the fetch of non-essential columns in wide-table LIMIT queries, VeloDB drastically reduces I/O overhead, delivering order-of-magnitude performance gains.
SQL Cache & 100x Faster Parsing
SQL parsing performance has been optimized by up to 100x (reducing complex view parsing from 400ms to 2ms). SQL Cache is now enabled by default to further accelerate high-concurrency analytical workloads.
JSONB & Binary Optimization
Performance for JSONB functions has increased by 30%+, with full GROUP BY and DISTINCT support. New VARBINARY functions enable high-efficiency processing of serialized data directly via SQL.
Materialized Views: Smarter Transparent Acceleration
Materialized views received several important improvements:
- Transparent rewrite remains available when non-partitioned base tables change, reducing maintenance overhead.
- Materialized views can now be built on top of regular views.
- Refresh now supports multiple partitioned change tracking (PCT) tables.
- Queries can hit materialized view rewrites even when the view contains window functions.
4. Resilience
To ensure stability for memory-intensive ETL and Materialized View refreshes, v26.0 introduces architecture-level safeguards:
Operator Spill-to-Disk
Intermediate data from Hash Join, Aggregation, Sort, and CTE operators now automatically spills to disk when memory runs low. This greatly reduces the risk of OOM failures and lets large-scale queries complete reliably even under memory pressure.
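At the session level, spilling is controlled by a handful of settings. The variable names below follow current Doris conventions and may vary by release:

```sql
-- Sketch: enable operator spill-to-disk for the current session.
SET enable_spill = true;           -- allow operators to spill intermediate data
SET exec_mem_limit = 8589934592;   -- per-query memory limit (8 GB) before spilling kicks in
```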
Easier Workload Management
VeloDB v26.0 simplifies resource control by unifying CPU and memory limits into one easy setup. By combining Workload Groups with automatic disk spilling (supporting Fixed, Dynamic, and None strategies), the system handles resource competition for you. This ensures your most important tasks always stay stable and fast.
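A minimal sketch of defining such a group, assuming standard Doris Workload Group properties (names and defaults should be confirmed for your version):

```sql
-- Hypothetical Workload Group capping memory for ETL jobs, with
-- overcommit disabled so the group relies on disk spilling instead.
CREATE WORKLOAD GROUP IF NOT EXISTS etl_group
PROPERTIES (
    "cpu_share"                = "1024",
    "memory_limit"             = "30%",
    "enable_memory_overcommit" = "false"
);
```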
What Else Is New in VeloDB Cloud v26.0
Beyond the headline features above, Apache Doris 4.0 brings a broad set of improvements across SQL capabilities, semi-structured data handling, and lakehouse integration.
Expanded SQL & Analytical Capabilities
- New spatial functions (ST_Distance, ST_GeometryType, ST_Length) enable geospatial analytics directly in SQL without external GIS tooling.
- Enhanced time processing: PREVIOUS_DAY() for calendar-based business logic, additional INTERVAL time units, and MySQL-compatible UTC functions.
- New hash functions (mmh64_v2, json_hash) for cross-system data consistency checks and JSON deduplication.
- PostgreSQL partition table sync via Streaming Job, enabling real-time HTAP architectures where PostgreSQL serves as the transactional source and Doris as the analytical target.
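For illustration, a single query touching the new time, hash, and spatial functions; argument orders here are assumptions and should be verified against the function reference:

```sql
-- Illustrative only: signatures follow common conventions and may
-- differ from the shipped functions.
SELECT
    ST_Distance(ST_Point(116.39, 39.90), ST_Point(121.47, 31.23)) AS dist,
    PREVIOUS_DAY(NOW(), 'MONDAY')                                 AS last_monday,
    mmh64_v2('order-12345')                                       AS row_hash;
```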
Improved Semi-Structured Data Processing
- New VARBINARY functions (length, from_base64_binary, to_base64_binary, sub_binary) with type mapping support across Hive, Iceberg, Paimon, and JDBC external tables.
- JSON processing enhancements: sort_json_object_keys for deterministic comparison, normalize_json_numbers_to_double for cross-system compatibility, and GROUP BY / DISTINCT support for JSON/JSONB types.
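A quick illustrative query using the functions named above; the events table and its payload / raw_bytes columns are hypothetical, and the signatures should be checked against the documentation:

```sql
-- Sketch: canonicalize JSON for comparison and round-trip binary data.
SELECT
    sort_json_object_keys(payload)            AS canonical_json,
    normalize_json_numbers_to_double(payload) AS normalized_json,
    to_base64_binary(raw_bytes)               AS b64
FROM events
LIMIT 10;
```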
Stronger Lakehouse Integration
- Iceberg metadata visibility via the new all_manifests system table, plus manifest-level caching for reduced metadata I/O.
- Iceberg snapshot management with expire_snapshots for controlling metadata growth and reducing storage costs.
- Iceberg table optimization through rewrite_data_files for small file compaction, plus Partition Evolution DDL support.
- Schema evolution support for complex types (Array, Map, Struct) in Iceberg external tables.
- Broader storage and authentication support: AWS CredentialsProviderChain, Paimon DLF Catalog with OSSHDFS, Apache Ozone object storage, and MaxCompute RAM Role authentication.
Defining the Future of AI Analytics
VeloDB Cloud Core v26.0, powered by Apache Doris 4.0, is our definitive answer to how data infrastructure should serve the AI era. It stands as the fastest Analytics & Search Database, providing the speed and architectural simplicity required to fuel next-gen AI Agents.
Apache Doris 4.0 was built by more than 200 community contributors. Join the Apache Doris community on Slack to help shape the next release. VeloDB Cloud v26.0 is currently available on AWS in US East (N. Virginia), US West (Oregon), US West (N. California), Asia Pacific (Hong Kong), Asia Pacific (Singapore), and Europe (Frankfurt). For availability in other regions, please contact us.
Experience the future of AI Analytics today. Try it now!