Use Case

Lakehouse Analytics

The fastest lakehouse SQL engine: accelerates the open lakehouse and replaces Trino/Presto and SparkSQL

Challenges of Data Analytics on Open Data Lakehouse

Unlike traditional data warehouses, data lakehouses achieve a complete separation of storage and compute systems. They rely on data lakes and open table formats for data storage, and the compute layer can support various data processing and analytics engines, all sharing the same data. We refer to this modern separation of storage and compute as the "shared-data" architecture, which is distinct from the traditional "shared-disk" architecture.

The SQL analytics engine is one of the most critical components of a lakehouse. In many current lakehouse solutions, however, it faces the following challenges:

Low Analytical Performance

Current lakehouse analytics engines, both open-source and commercial, struggle to deliver low-latency queries cost-effectively and fall short of traditional data warehouses.

Variety of Data Formats

Many lakehouse analytics engines lack support for open table formats and catalogs. Managing diverse data formats with unique optimization needs is essential for effective analytics.

Poor Warehouse-Lakehouse Integration

Either the engine running on the lakehouse cannot serve as a data warehouse, or the data warehouse cannot access data in the lakehouse, as is the case for many warehouses.

Why Choose VeloDB

Blazing-Fast

With a high-performance query engine and fast metadata and data caching, VeloDB is the fastest lakehouse analytics engine, outperforming Trino by 2-3 times.

Open

VeloDB integrates seamlessly with the mainstream open data formats and catalogs of the lakehouse ecosystem, and also provides extensive support for other data sources, including databases.

Unified

With built-in storage, VeloDB can be used as an analytical database, a lakehouse analytics engine, or both, unlocking more powerful capabilities.

After we introduced Doris to replace Presto, at an average daily query volume of more than 1 million, P95 performance improved by nearly 3x and computing resource usage dropped by up to 48%, which is a significant benefit.

We built a unified lakehouse architecture based on Apache Doris and Iceberg, enabling seamless data interoperability between Doris and Iceberg. This significantly simplifies and unifies the overall architecture.

We introduced Apache Doris to replace Trino and Pinot and unified data management across PostgreSQL, Elasticsearch, and Iceberg. This significantly simplified our architecture, improved query performance and system stability, and reduced resource costs by 30%.