Data Warehousing on VeloDB

Sub-second analytics on petabyte-scale data. Open formats. Unified workloads. No vendor lock-in.

Run sub-second analytics across Doris tables and lakehouse tables like Iceberg, Hive, and Delta Lake with a single SQL statement

Handle structured tables, semi-structured JSON, and vector embeddings in one engine

Connect Tableau, Grafana, Superset, and dbt through the MySQL-compatible protocol

VeloDB data warehouse on lakehouse architecture

The analytical database your lakehouse is missing

Open formats solved the storage and interoperability problem. VeloDB solves the performance, multimodal, and operational problems that come after.

Lakehouse queries at warehouse speed

VeloDB accelerates queries on Iceberg, Hive, and Delta Lake data through async materialized views, multi-level caching, and vectorized MPP execution. Your data stays in open formats. Query performance reaches native warehouse speed.

Open by design, portable by default

VeloDB reads and writes Iceberg v3 natively with time travel and schema evolution. Delta Lake and Hive tables are queryable through multi-catalog federation. Polaris and Unity Catalog integration fits existing governance. Built on Apache Doris under the Apache 2.0 license, VeloDB never moves your data into a proprietary format.

Every data type queried in one place

VeloDB stores and queries tables, JSON, full-text, and vector embeddings together in standard SQL. VARIANT handles semi-structured data with automatic column extraction. Inverted indexes replace Elasticsearch for text search. Vector indexes enable similarity queries without a separate database.

Trusted in production

Data teams run on VeloDB

Xiaomi built a unified lakehouse on Doris and Paimon, cutting average query latency from 60 seconds to 10 seconds, a 6x improvement.

Faster average query latency

Higher concurrency vs Presto

40s→8s

Aggregation query time

“We replaced separate Presto, Druid, and Spark clusters with one Doris engine over Paimon storage. Aggregation queries dropped from 40 seconds to 8 seconds. Concurrent query capacity scaled from 5 to 80 sessions.”

Data Platform Team, Xiaomi

Global consumer electronics leader

Xiaomi, Planet, SF Technology, Haidilao, Tencent Music, Meituan, ByteDance, Baidu, NetEase, Kwai, JD.com, Trip.com

Real-world tradeoffs

Challenges with lakehouse analytics at scale

01·Query speed

The lakehouse solved data sharing, but query performance still falls short

Open lakehouse architectures centralized storage and eliminated data silos. But query performance for interactive workloads remains the unsolved problem.

Business users expect sub-second responses. Applications need low-latency concurrent access. The lakehouse can hold all the data, but serving it fast enough for these use cases still requires additional systems.

How VeloDB solves it

Sub-second analytics on lakehouse data without moving it

VeloDB accelerates Iceberg, Hive, and Delta Lake queries through multi-level caching that avoids re-reading unchanged data and async materialized views that precompute expensive aggregations. The optimizer transparently rewrites queries to use cached or materialized results without changes to application SQL.
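As a rough illustration of that setup, the Python sketch below assembles the two statements involved: registering an Iceberg catalog and defining an async materialized view over one of its tables. The statement shapes follow Apache Doris SQL as we understand it and should be checked against the documentation for your version. The catalog name `lake`, the metastore URI, the `sales.orders` table and its columns, the view name `daily_revenue`, and the helper `apply_statements` are all hypothetical.

```python
"""Sketch only: catalog, table, and view names below are hypothetical."""

# Register an Iceberg catalog backed by a Hive Metastore (placeholder URI).
CREATE_CATALOG = """
CREATE CATALOG lake PROPERTIES (
    'type' = 'iceberg',
    'iceberg.catalog.type' = 'hms',
    'hive.metastore.uris' = 'thrift://metastore.example.internal:9083'
)
""".strip()

# Precompute an expensive aggregation and refresh it on a schedule.
CREATE_MV = """
CREATE MATERIALIZED VIEW daily_revenue
BUILD IMMEDIATE REFRESH AUTO ON SCHEDULE EVERY 1 HOUR
DISTRIBUTED BY RANDOM BUCKETS 2
AS
SELECT order_date, region, SUM(amount) AS revenue
FROM lake.sales.orders
GROUP BY order_date, region
""".strip()

# Application SQL keeps targeting the base table; whether the optimizer
# answers it from daily_revenue is decided by the engine, not by this code.
DASHBOARD_QUERY = """
SELECT region, SUM(amount) AS revenue
FROM lake.sales.orders
WHERE order_date >= '2025-01-01'
GROUP BY region
""".strip()


def apply_statements(cursor, statements):
    """Execute each statement on a DB-API cursor and return the count run."""
    executed = 0
    for sql in statements:
        cursor.execute(sql)
        executed += 1
    return executed
```

With any MySQL-protocol driver that exposes a DB-API cursor, `apply_statements(cursor, [CREATE_CATALOG, CREATE_MV])` would run the setup once, and `DASHBOARD_QUERY` is then issued unchanged.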

02·Multimodal

Each data type in the analytics stack requires its own system and its own pipeline

Structured data goes to the warehouse. Full-text goes to a search engine. Vector embeddings go to a dedicated vector database. JSON payloads get flattened or stored separately.

Each system has its own ingest path, query language, and consistency model. Answering questions that span multiple data types means querying multiple systems and joining results in application code.

How VeloDB solves it

Structured, semi-structured, text, and vector data in one database

VeloDB stores and queries all data types together in standard SQL. VARIANT handles semi-structured JSON with automatic column extraction. Inverted indexes with BM25 scoring handle full-text search. HNSW and IVF-PQ vector indexes enable similarity queries. A single SQL statement can filter structured columns, search text, and rank vectors in one round trip.
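To make the "one round trip" idea concrete, here is a minimal Python sketch that builds such a statement: a structured filter, a VARIANT subfield filter, a full-text match on an inverted-indexed column, and a vector ranking. The table `support_tickets`, its columns, and both helper functions are hypothetical. The sketch ranks with the exact `l2_distance` function and does not show BM25 scoring or vector index definitions, since index-accelerated syntax can differ between versions.

```python
"""Sketch only: table and column names are hypothetical."""

# One table holding structured, semi-structured, text, and vector columns.
CREATE_TICKETS = """
CREATE TABLE support_tickets (
    ticket_id BIGINT,
    created_at DATETIME,
    body TEXT,
    payload VARIANT,
    embedding ARRAY<FLOAT>,
    INDEX idx_body (body) USING INVERTED PROPERTIES ("parser" = "english")
)
DUPLICATE KEY(ticket_id)
DISTRIBUTED BY HASH(ticket_id) BUCKETS 8
""".strip()


def vector_literal(values):
    """Render a sequence of numbers as a SQL array literal."""
    return "[" + ", ".join(repr(float(v)) for v in values) + "]"


def hybrid_query(keywords, query_vector, limit=10):
    """Build one statement that filters, searches text, and ranks vectors."""
    for word in keywords:
        # Keep the sketch safe: only plain alphanumeric search terms.
        if not word.isalnum():
            raise ValueError(f"unsupported keyword: {word!r}")
    terms = " ".join(keywords)
    vector = vector_literal(query_vector)
    return "\n".join([
        f"SELECT ticket_id, l2_distance(embedding, {vector}) AS dist",
        "FROM support_tickets",
        "WHERE created_at >= '2025-01-01'",
        "  AND CAST(payload['priority'] AS TEXT) = 'high'",
        f"  AND body MATCH_ANY '{terms}'",
        "ORDER BY dist",
        f"LIMIT {int(limit)}",
    ])
```

Calling `hybrid_query(["timeout", "refund"], [0.1, 0.2, 0.5])` returns a single SELECT that an application would otherwise have to assemble from three separate systems.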

03·Operational complexity

The warehouse is one system, but operating it requires six more around it

A production lakehouse stack typically includes a caching layer for query performance, an ETL scheduler for materialized view refreshes, a coordination service, and separate monitoring for each component.

Every dependency has its own deployment, upgrade cycle, and failure modes. The warehouse itself is often the simplest part to operate.

How VeloDB solves it

Built-in infrastructure that replaces external dependencies

VeloDB includes multi-level query caching, async materialized view refresh, and ZSTD compression natively. No Redis, no ETL scheduler, no ZooKeeper. The cluster has only two node types, and columnar compression reduces storage by 60% or more. Fewer systems to operate means fewer things to break and fewer engineers allocated to keeping the analytics stack running.

04·Vendor lock-in

Data becomes difficult to move once it enters a proprietary warehouse

Proprietary storage formats, proprietary catalogs, and proprietary query extensions create dependencies that grow over time. The longer data stays in the system, the more pipelines, dashboards, and models depend on it.

Migration means rewriting all of them. The cost of leaving eventually exceeds the cost of staying, even when the system no longer fits.

How VeloDB solves it

Open source database with native open format support

VeloDB is built on Apache Doris, fully open source under Apache 2.0. It reads and writes Iceberg v3 natively with time travel, partition evolution, and schema evolution. Delta Lake and Hive Metastore are first-class. Polaris and Unity Catalog compatibility means VeloDB fits into your existing catalog governance without migration. Your data stays in your storage, in formats any engine can read, with no lock-in at the engine, storage, or catalog level.
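Time travel on an Iceberg table can be sketched as follows. This Python helper builds a read against either a timestamp or a snapshot ID, using the `FOR TIME AS OF` and `FOR VERSION AS OF` clauses that Apache Doris documents for Iceberg tables. The function name `time_travel_query` and the table `lake.sales.orders` are hypothetical, and the exact clause support should be confirmed for your version.

```python
import re
from datetime import datetime

# Accept up to three dot-separated identifiers: catalog.database.table.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def time_travel_query(table, as_of=None, snapshot_id=None):
    """Build a SELECT that reads an Iceberg table at a past point.

    Exactly one of `as_of` (a datetime) or `snapshot_id` (an int) is required.
    """
    if not _IDENT.match(table):
        raise ValueError(f"unsupported table name: {table!r}")
    if (as_of is None) == (snapshot_id is None):
        raise ValueError("pass exactly one of as_of or snapshot_id")
    if as_of is not None:
        stamp = as_of.strftime("%Y-%m-%d %H:%M:%S")
        return f"SELECT * FROM {table} FOR TIME AS OF '{stamp}'"
    return f"SELECT * FROM {table} FOR VERSION AS OF {int(snapshot_id)}"
```

For example, `time_travel_query("lake.sales.orders", as_of=datetime(2025, 1, 15, 8, 30))` produces a query against the table as it stood at that time, without copying data out of the lake.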

Architecture overview

VeloDB for lakehouse analytics

Whether you're accelerating queries over Iceberg tables, building dimensional models with materialized views, or querying structured, semi-structured, and vector data from one SQL interface, VeloDB handles it on one engine.

VeloDB lakehouse analytics engine

Data Sources

Iceberg Tables: v1 / v2 / v3
Hive Tables: Metastore catalog
Delta Lake Tables: Streaming lakehouse
MySQL / PostgreSQL: JDBC federation
S3 / HDFS: Direct file access