Use Cases

Open Data Lakehouse

Modern data lakehouses fully separate storage from compute. They store data in data lakes using open table formats, so that multiple data processing and analytics engines in the compute layer can all share the same data. This separation of storage and compute, known as the "shared-data" architecture, is distinct from the traditional "shared-disk" architecture.

The SQL analytics engine is one of the most critical components of a lakehouse. However, in many current lakehouse solutions, analytics face the following challenges:


Low Analytical Performance

Current lakehouse analytics engines, both open-source and commercial, struggle to deliver low-latency queries at a reasonable cost and fall short of traditional data warehouses.

Variety of Data Formats

Many lakehouse analytics engines lack full support for open table formats and catalogs, yet effective analytics requires handling diverse data formats, each with its own optimization needs.

Poor Warehouse-Lakehouse Integration

Lakehouse engines often cannot serve as a data warehouse, while many data warehouses cannot access data in the lakehouse.

Why Choose VeloDB

Blazing-Fast

With a high-performance query engine and fast metadata and data caching, VeloDB is the fastest lakehouse analytics engine, running 2 to 3 times faster than Trino.

Open

Seamlessly integrates with mainstream open data formats and catalogs in the lakehouse ecosystem, while also providing extensive support for other data sources, including databases.

Unified

With built-in storage, VeloDB can be used as an analytical database, a lakehouse analytics engine, or both, unlocking more powerful capabilities.

After we introduced Apache Doris to replace Presto, handling an average of more than 1 million queries per day, P95 query performance improved by nearly 3x and compute resource usage dropped by up to 48%, a significant benefit.


We built a unified lakehouse architecture based on Apache Doris and Iceberg, enabling seamless data interoperability between Doris and Iceberg. This significantly simplifies and unifies the overall architecture.


We introduced Apache Doris to replace Trino and Pinot and unified data management across PostgreSQL, Elasticsearch, and Iceberg. This significantly simplified our architecture, improved query performance and system stability, and reduced resource costs by 30%.

VeloDB Open Data Lakehouse Solution
Architecture diagram (top to bottom):
Analytical Workloads: Long-running ETL, Machine Learning, Lightweight ETL, Interactive Analytics
Open Data Lakehouse, Lakehouse Compute: Batch Processing Engine (Spark, ...), Real-Time Analytics Engine (VeloDB)
Open Data Lakehouse, Lakehouse Storage: Data Lake (Iceberg, Hudi, Delta Lake, ...), Catalog (Polaris, Unity, Glue, ...)
Data Sources: Tables, Streams, Files, ...
Real-Time Analytics Engine
Use VeloDB as the real-time analytics engine, primarily responsible for supporting interactive analytics and lightweight ETL computational workloads.
Batch Processing Engine
Use batch processing engines such as Spark, primarily responsible for supporting long-running ETL and machine learning computational workloads.
Open Lakehouse Storage
Build open lakehouse storage on a data lake, using open table formats and an open catalog.
Related Resources
Docs

Guides, reference manuals, and deep dives: all the technical documentation on lakehouse analytics.

User Stories

Discover real-world applications and experiences from industrial users.

Videos

Learn how to build an open lakehouse with VeloDB.

Community

Join the dedicated lakehouse analytics group on Slack and the lakehouse category on the forum to ask questions and get support.