Kwai Replaced ClickHouse with Apache Doris for a Smart, Unified Lakehouse Architecture

VeloDB Engineering Team · 2025/07/19
VeloDB is a leading managed service for Apache Doris, offering Fast, Cost-Effective, Enterprise-Grade capabilities for real-time analytics use cases in the AI era.

TL;DR:

Kwai, a leading social media platform, has replaced ClickHouse with Apache Doris to upgrade its OLAP system, now handling nearly 1 billion daily queries. This move shifts them from a complex lake-warehouse separation model to a unified lakehouse architecture. The new system leverages Doris's direct lake access and an intelligent auto-materialization service to solve critical issues of data redundancy, resource contention, and complex governance. The key results include:

  • Queries accelerated by up to 6x, turning billion-row queries into millisecond responses.

  • Data volumes compressed by over 11x via smart materialization.

  • A simplified, unified architecture eliminating data duplication and complex ETL.

The Challenge: The Cracks in a Lake-Warehouse Separation Model

Kwai processes an enormous volume of data every day, making a strong OLAP system essential. Their unified platform handles close to 1 billion queries daily, powering everything from B2B reports to internal dashboards.

Their initial architecture was a classic lake-warehouse separation model: a data lake (Hive/Hudi) for batch processing and a real-time data warehouse (ClickHouse) for serving queries. Data was processed through various layers (ODS, DWD, DWS) in the lake before being loaded into ClickHouse for analysis. While stable, this architecture developed serious issues over time:

  • Redundant Storage: Copying data from the lake into ClickHouse created data duplication, increasing storage costs and data latency.
  • Resource Contention: The constant data ingestion and compaction processes in ClickHouse competed for resources with high-concurrency queries, impacting cluster performance and stability.
  • Complex Governance: Data engineers spent significant effort building and maintaining data models and ETL jobs just to load data into ClickHouse. When dashboards were decommissioned, the underlying data pipelines often kept running, wasting resources and requiring manual cleanup.
  • Difficult Query Tuning: Optimizing query performance in ClickHouse by choosing the right sort keys, indexes, and materialized views was complex and had a steep learning curve.

The Search for a Solution: A Unified Lakehouse with Apache Doris

The goal was clear: move to a lakehouse architecture where the warehouse could directly analyze data in the lake, eliminating redundant data movement and simplifying the entire process. Apache Doris, with its evolving lakehouse capabilities, was the ideal choice. Key features that aligned with their goals were:

  • Direct-on-Lake Query Performance: Doris’s powerful MPP query engine is highly optimized for open data formats like Parquet and ORC, enabling high-throughput, low-latency analytics directly on the data lake.
  • Federated Querying: Doris can connect to various data sources (Hive, Iceberg, Hudi, databases) and unify them under a single query interface.
  • Seamless Data Integration: Built-in features like asynchronous materialized views and job scheduling simplify data processing and transformation within the lakehouse.
  • Unified Engine: Doris can act as a single engine for data ingestion, processing, and analysis, creating a closed-loop lakehouse system.
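
To make the federated-querying idea concrete, here is a minimal sketch that renders the kind of SQL used to register a Hive Metastore catalog in Doris and then query a lake table by its three-part name. The catalog name, metastore URI, database, and table are placeholders rather than Kwai's configuration, and the two helper functions are hypothetical. Check the Doris documentation for the catalog properties your version supports.

```python
def create_hms_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Render a CREATE CATALOG statement for a Hive Metastore-backed catalog."""
    return (
        f"CREATE CATALOG {catalog} PROPERTIES (\n"
        f"    'type' = 'hms',\n"
        f"    'hive.metastore.uris' = '{metastore_uri}'\n"
        f");"
    )


def qualified_table(catalog: str, database: str, table: str) -> str:
    """External tables are addressed as catalog.database.table."""
    return f"{catalog}.{database}.{table}"


# Placeholder values for illustration only.
catalog_sql = create_hms_catalog_sql("lake", "thrift://metastore.example:9083")
query_sql = (
    "SELECT dt, COUNT(*) AS events "
    f"FROM {qualified_table('lake', 'dws', 'app_events')} "
    "GROUP BY dt;"
)

print(catalog_sql)
print(query_sql)
```

Once the catalog is registered, the lake table is queried in place, with no load job copying it into the warehouse first.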

The Results: An Intelligent, High-Performance Lakehouse

The new architecture places Apache Doris at the core of the query layer, directly accessing data in the lake (Hive/Hudi), which is cached by Alluxio. This eliminates the need for a separate ClickHouse warehouse.

The most innovative parts of the new system are two custom-developed services that work alongside Doris:

  1. Query Routing Service: Automatically routes massive queries to a Spark engine, protecting Doris resources from being overwhelmed by heavy-duty jobs.
  2. Auto-Materialization Service: Intelligently creates and manages materialized views. It analyzes query patterns to build optimized data models on the fly, which Doris's query planner then transparently uses to accelerate queries.
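
The article does not say how the Query Routing Service decides which queries count as "massive". As a rough illustration, the sketch below assumes the decision is based on a planner-style estimate of scanned bytes and join count. The `QueryEstimate` fields, the thresholds, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryEstimate:
    """Planner-style estimate for one query (hypothetical fields)."""
    scan_bytes: int   # estimated bytes read from the lake
    join_count: int   # number of joins in the plan


# Hypothetical limits; a real service would derive these from cluster capacity.
MAX_DORIS_SCAN_BYTES = 500 * 1024**3  # 500 GiB
MAX_DORIS_JOINS = 8


def route(estimate: QueryEstimate) -> str:
    """Return "spark" for heavy queries and "doris" for everything else."""
    if estimate.scan_bytes > MAX_DORIS_SCAN_BYTES:
        return "spark"
    if estimate.join_count > MAX_DORIS_JOINS:
        return "spark"
    return "doris"


# A small dashboard query stays on Doris; a very large scan goes to Spark.
print(route(QueryEstimate(scan_bytes=2 * 1024**3, join_count=1)))
print(route(QueryEstimate(scan_bytes=900 * 1024**3, join_count=3)))
```

Keeping the rule this simple makes the trade-off visible: anything above the limits is sent away from the latency-sensitive engine, so interactive queries on Doris are not starved by batch-sized jobs.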

This new Doris-powered lakehouse delivered transformative benefits:

  • Simplified Architecture, Unified Storage: By querying data directly on the lake, they eliminated the entire data import pipeline to ClickHouse. This reduced maintenance costs, storage overhead, and data latency.
  • From Seconds to Milliseconds: Queries on datasets ranging from millions to billions of rows consistently returned in milliseconds, supported by the following techniques:
    • Smart Acceleration: The Auto-Materialization Service, combined with Doris’s powerful query rewrite capability, delivered huge speedups. Queries were accelerated by at least 6x, with data volumes for these queries being compressed by over 11x.
    • Efficient Caching: A custom metadata caching service, working with Alluxio, reduced metadata access latency from 800ms down to just 50ms.
  • Flexible & Automated Data Governance: The "consumption-driven" model of the Auto-Materialization Service means data models are now created automatically based on actual usage. This frees data engineers from manual modeling and ensures that compute and storage resources are spent only on valuable, active data.
  • Smarter Query Optimization: They further optimized performance by feeding pre-collected statistics from Spark into Doris's optimizer, creating sorted Parquet files, and using bucketed tables on the lake, all of which Doris leverages for more efficient, distributed query plans.
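
The "consumption-driven" idea behind the Auto-Materialization Service can be sketched as a frequency count over the query log: patterns that recur often become candidates for materialized views, and views whose pattern no longer appears are flagged for cleanup. This is a simplified illustration and not Kwai's implementation. The `Pattern` shape, the `min_hits` threshold, and both function names are assumptions.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple

# A query pattern: (table, group-by dimensions, aggregated measures).
Pattern = Tuple[str, FrozenSet[str], FrozenSet[str]]


def pick_candidates(query_log: Iterable[Pattern], min_hits: int) -> List[Pattern]:
    """Return patterns seen at least min_hits times, most frequent first."""
    counts = Counter(query_log)
    return [pattern for pattern, hits in counts.most_common() if hits >= min_hits]


def views_to_drop(existing: Iterable[Pattern], query_log: Iterable[Pattern]) -> List[Pattern]:
    """Return existing views whose pattern no longer appears in the log."""
    active = set(query_log)
    return [view for view in existing if view not in active]


# Hypothetical patterns for illustration.
daily_by_region = ("dws.app_events", frozenset({"dt", "region"}), frozenset({"pv"}))
daily_by_device = ("dws.app_events", frozenset({"dt", "device"}), frozenset({"uv"}))
hourly_by_app = ("dws.app_events", frozenset({"hour", "app"}), frozenset({"pv"}))
retired_report = ("dws.ads", frozenset({"dt"}), frozenset({"cost"}))

log = [daily_by_region, daily_by_region, daily_by_device,
       daily_by_region, hourly_by_app, hourly_by_app]

print(pick_candidates(log, min_hits=2))
print(views_to_drop([daily_by_region, retired_report], log))
```

Driving both creation and cleanup from the same log is what ties resource use to actual consumption: a view exists only while queries keep asking for it.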

With this successful migration, Kwai plans to move more workloads, including ad-hoc queries currently on Presto, to their new Doris-powered lakehouse, creating a truly unified analytics engine for the entire company.

Talk to Us

If you want to bring similar (or even greater) performance improvements to your data platform, or just want to explore Apache Doris further, you are welcome to join the Apache Doris community, where you can connect with other users facing similar challenges and get professional technical advice and support.

If you're exploring fully-managed, cloud-native options, you can reach out to the VeloDB team!