Use Cases
Modern data lakehouses achieve a complete separation of storage and compute. They rely on data lakes and open table formats for storage, allowing various data processing and analytics engines in the compute layer to share the same data. This modern separation of storage and compute, known as the "shared-data" architecture, is distinct from the traditional "shared-disk" architecture.
The SQL analytics engine is one of the most critical components of a lakehouse. However, in many current lakehouse solutions, lakehouse analytics faces the following challenges:
Low Analytical Performance
Current lakehouse analytics engines, both open-source and commercial, struggle to provide low-latency queries at a cost-effective rate, falling short of traditional data warehouses.
Variety of Data Formats
Many lakehouse analytics engines lack support for open table formats and catalogs. Managing diverse data formats with unique optimization needs is essential for effective analytics.
Poor Warehouse-Lakehouse Integration
Either the engines on the lakehouse cannot serve as a warehouse, or many existing warehouses cannot access data in the lakehouse.
Blazing-Fast
With a high-performance query engine and fast metadata/data caching, VeloDB becomes the fastest lakehouse analytics engine, outperforming Trino by 2-3 times.
Open
VeloDB integrates seamlessly with mainstream open data formats and catalogs in the lakehouse ecosystem, while also providing extensive support for other data sources, including databases.
Unified
With built-in storage, VeloDB can be used as an analytical database, a lakehouse analytics engine, or both, unlocking more powerful capabilities.
After we introduced Doris to replace Presto, with an average daily query volume of more than 1 million, P95 performance improved by nearly 3x and computation resources were reduced by up to 48%, which is a significant benefit.
We built a unified lakehouse architecture based on Apache Doris and Iceberg, enabling seamless data interoperability between Doris and Iceberg. This significantly simplifies and unifies the overall architecture.
We introduced Apache Doris to replace Trino and Pinot, and unified data management across PostgreSQL, Elasticsearch, and Iceberg. This significantly simplified our architecture, improved query performance and system stability, and reduced resource costs by 30%.
Guides, reference manuals, and deep dives - all the technical documentation about lakehouse analytics.
Join the dedicated lakehouse analytics group on Slack and the special category on the Forum to ask questions and get support.