TL;DR
One of the world's biggest telecommunications companies replaced its Lambda architecture with a unified real-time & batch data platform powered by Apache Doris for its 5G Fully-Connected Factory. Built on Doris's federated query capabilities, the platform provides a unified query gateway and simpler data pipelines, significantly reducing storage costs, improving data freshness, and boosting query performance and development efficiency. Doris currently handles 70% of real-time data ingestion and 90% of real-time queries.
The Challenges: An Inefficient, Costly Engine Built on Hive + ClickHouse
5G Fully-Connected Factory is a manufacturing ecosystem that leverages 5G and next-generation ICT to establish pervasive connectivity across all production units. A stable, efficient data warehouse engine lies at its core, providing robust support for data ingestion and queries.
The company previously used a Lambda architecture with separate real-time and batch processing pipelines. However, this approach faced significant challenges:
- Complex Data Pipeline: Data had to flow from Hive to ClickHouse before report queries could run, adding computational overhead. Multi-stream joins also prolonged processing time.
- Unstable Data Processing: Any change to business dimensions in wide tables triggered batch reprocessing of historical data, extending processing time and even disrupting business operations.
- High Maintenance Costs: The architecture spanned multiple technology stacks and components, such as Hive, HBase, HDFS, and ClickHouse, which made it costly to maintain.
Why Apache Doris?
Real-Time Writes and Updates: The All-in-Doris mode ingests both real-time and batch data directly, simplifying data pipelines and improving data freshness. Doris also offers a strongly consistent primary key storage model that supports synchronous data updates and deletions.
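To illustrate what primary-key semantics mean for ingestion, here is a toy in-memory model in Python. It is not Doris's implementation; it only assumes that each row is identified by a primary key, that a newer write for the same key replaces the older one, and that a hypothetical `deleted` flag on an incoming row removes that key. The class, method, and field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PrimaryKeyTable:
    """Toy model of a primary-key table: one visible row per key (hypothetical, not Doris internals)."""

    key_column: str
    rows: dict[Any, dict[str, Any]] = field(default_factory=dict)

    def load(self, batch: list[dict[str, Any]]) -> None:
        """Apply a batch in order: upsert each row, or delete its key if the hypothetical 'deleted' flag is set."""
        for row in batch:
            key = row[self.key_column]
            if row.get("deleted", False):
                # Deletion: the key no longer appears in query results.
                self.rows.pop(key, None)
            else:
                # Upsert: the newest version for this key replaces any older one.
                self.rows[key] = {k: v for k, v in row.items() if k != "deleted"}

    def query(self) -> list[dict[str, Any]]:
        """Return the currently visible rows, ordered by key."""
        return [self.rows[k] for k in sorted(self.rows)]


# Usage: a real-time update and a deletion arrive in a later batch.
table = PrimaryKeyTable(key_column="device_id")
table.load([
    {"device_id": 1, "status": "running"},
    {"device_id": 2, "status": "idle"},
])
table.load([
    {"device_id": 1, "status": "stopped"},  # update for key 1
    {"device_id": 2, "deleted": True},      # deletion of key 2
])
print(table.query())  # [{'device_id': 1, 'status': 'stopped'}]
```

Because readers always see exactly one latest version per key, a change can be applied to the affected rows as it arrives, rather than by reprocessing a whole batch of historical data.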
Superior Query Performance: Doris delivers sub-second multi-dimensional queries and high-concurrency point queries, along with fast multi-table joins, direct access to data, and optimizations for complex analysis.
Unified Data Analytics Gateway: Doris features federated query capabilities and a scalable data framework, enabling fast queries across diverse data sources such as relational databases, data warehouses, and data lakes.
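As a sketch of the unified gateway idea, the Python snippet below only builds SQL strings and does not connect to anything. It follows Doris's multi-catalog conventions: an external source is registered with `CREATE CATALOG`, tables are addressed as `catalog.database.table`, and `internal` names Doris's own catalog. The catalog name, Thrift URI, databases, tables, and helper functions are all hypothetical; verify catalog properties against the documentation for your Doris version.

```python
def create_hive_catalog_sql(catalog: str, metastore_uri: str) -> str:
    """Build a CREATE CATALOG statement that registers a Hive Metastore as an external catalog."""
    return (
        f"CREATE CATALOG {catalog} PROPERTIES (\n"
        "    'type' = 'hms',\n"
        f"    'hive.metastore.uris' = '{metastore_uri}'\n"
        ");"
    )


def federated_join_sql(internal_table: str, external_table: str, join_key: str) -> str:
    """Build a query joining a Doris internal table with a table from an external catalog.

    Both table names are expected to be fully qualified as catalog.database.table.
    """
    return (
        f"SELECT i.{join_key}, COUNT(*) AS events\n"
        f"FROM {internal_table} AS i\n"
        f"JOIN {external_table} AS e ON i.{join_key} = e.{join_key}\n"
        f"GROUP BY i.{join_key};"
    )


# Hypothetical names and a placeholder URI, for illustration only.
print(create_hive_catalog_sql("hive_prod", "thrift://metastore-host:9083"))
print(federated_join_sql("internal.factory.device_events", "hive_prod.ods.device_info", "device_id"))
```

Since both tables are reachable from the same SQL endpoint, the join runs in one query without first copying the Hive data into another store.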
Reduced Operation & Maintenance Costs: Doris supports online scaling, automatic load balancing, and rolling cluster upgrades, providing a simplified, stable O&M architecture with lower costs.
Easy to Use: Doris is highly compatible with MySQL syntax and supports standard SQL, making it accessible to most developers and analysts.
Building on these advantages, the company used Apache Doris to upgrade its previous data pipelines into a unified real-time & batch data platform.
The Future: Seamless Migration to Apache Doris
The company plans to advance its migration to Apache Doris on several fronts:
- Further Exploring Doris Manager: Using its monitoring dashboard for anomaly detection will improve online operational efficiency.
- Improving Query Performance: Work on index acceleration, slow query monitoring, resource queue allocation, and more will further raise query efficiency.
- Introducing Multi-Table Materialized Views: Doris 2.1's multi-table materialized views will further simplify task construction and accelerate data warehouse development.
- Introducing Doris Compute-Storage Decoupled Mode: Doris 3.0 open-sources the compute-storage decoupled mode, which the company plans to integrate.
- Standardizing the Metrics System: Building an efficient, unified metrics system for the 5G Fully-Connected Factory will help the enterprise standardize how data metrics are defined and used.