Deep Dive into VeloDB Cloud: The Modern Cloud-Native Real-Time Data Warehouse

VeloDB Engineering Team · 2025/08/28
Against the backdrop of digital transformation, cloud computing has become a critical component of modern corporate operations, and it matters in several respects. First, it provides greater flexibility and scalability, allowing computing and storage resources to be adjusted dynamically as markets change and the business grows. Second, its pay-as-you-go model spares enterprises high upfront hardware investment and ongoing maintenance costs.

VeloDB Inc., a commercial provider of Apache Doris, has developed VeloDB Cloud, a modern multi-cloud native real-time data warehouse. It fully leverages cloud-native capabilities to provide customers with cost-effective, unified, easy-to-use, secure, and stable cloud-based data analytics services. Since its launch, VeloDB Cloud has supported major global cloud platforms and introduced a BYOC (Bring Your Own Cloud) deployment model to meet customers' compliance needs. VeloDB Cloud also continues to deepen its capabilities in real-time analytics, log storage and analysis, lakehouse, and other scenarios, with a commitment to delivering efficient data analytics performance.

Get Started Now: https://www.velodb.cloud/passport/login

Compute-Storage Decoupled Architecture

VeloDB Cloud adopts a compute-storage decoupled mode to maximize the strengths of cloud platforms.

In terms of Compute:

  • Workload isolation: multiple compute clusters can share the same data, so users can run different business workloads, or offline workloads, on separate compute clusters.
  • Elastic scaling of compute clusters: computing resources can be scaled flexibly to match workload requirements.

In terms of Storage:

  • Tiered Storage: the full dataset is stored in the more cost-effective and highly reliable shared storage, with only hot data cached locally. Compared to the compute-storage coupled mode with three data replicas, the storage cost can be reduced by up to 90%.
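
As a rough illustration of where such a saving can come from, the sketch below compares three block-storage replicas with a single object-storage copy plus a local cache for hot data. The per-GB prices and the 10% hot-data ratio are hypothetical placeholders chosen for easy arithmetic; they are not VeloDB or cloud-provider figures, and the function names are illustrative only.

```python
def coupled_monthly_cost(data_gb, block_price_per_gb, replicas=3):
    """Coupled mode: every replica of the full dataset sits on block storage."""
    return data_gb * replicas * block_price_per_gb


def decoupled_monthly_cost(data_gb, object_price_per_gb, block_price_per_gb, hot_ratio):
    """Decoupled mode: one full copy in object storage plus a local cache for hot data."""
    full_copy = data_gb * object_price_per_gb
    hot_cache = data_gb * hot_ratio * block_price_per_gb
    return full_copy + hot_cache


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
DATA_GB = 10_000
BLOCK_PRICE = 0.10    # assumed $/GB-month for SSD-backed block storage
OBJECT_PRICE = 0.02   # assumed $/GB-month for object storage
HOT_RATIO = 0.10      # assumed share of the data that is hot

coupled = coupled_monthly_cost(DATA_GB, BLOCK_PRICE)
decoupled = decoupled_monthly_cost(DATA_GB, OBJECT_PRICE, BLOCK_PRICE, HOT_RATIO)
saving = 1 - decoupled / coupled
print(f"coupled=${coupled:,.0f} decoupled=${decoupled:,.0f} saving={saving:.0%}")
```

With these made-up inputs the coupled layout costs $3,000 per month, the decoupled layout $300, and the saving is about 90%. Real savings depend on actual prices, request fees, and how large the hot-data cache needs to be.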

    [Figure: Compute-Storage Decoupled Architecture]

01 Multi-Cluster: Efficient Workload Isolation

In the compute-storage decoupled mode, the compute layer no longer stores data, which enables extremely flexible and rapid elastic scaling, and the shared storage layer can be accessed by multiple compute clusters. Building on this, VeloDB Cloud introduces a multi-cluster capability that rethinks the data warehouse architecture to better meet user needs.

VeloDB Cloud's multi-cluster architecture primarily serves two purposes: read-write isolation, and isolation between real-time and offline workloads.

  • Read-write isolation: in traditional data warehouse architectures, data writes and reads occur within the same compute cluster. During write peaks, contention for resources can easily degrade query performance and stability. VeloDB Cloud's multi-cluster architecture allows writes and reads to be handled by independent compute clusters, so queries run smoothly and without interference even under high write throughput.

[Figure: Multi-Cluster: Efficient Workload Isolation]

  • Real-time/offline workload isolation: in most analytical scenarios the same data supports multiple business requirements, but those requirements differ in latency and availability. Traditional architectures often store the data redundantly in different systems, leading to high storage and maintenance costs. VeloDB Cloud's multi-cluster architecture uses isolated computing resources to separate the workloads of different businesses while they share a single copy of the data. This simplifies operations and significantly reduces costs.

[Figure: Multi-Cluster: Efficient Workload Isolation (2)]

02 Elastic Scaling: Scaling Up During Peak, Scaling Down During Off-peak

Workloads change constantly: more computing resources are required during peak business periods, and those same resources sit idle and are wasted during off-peak periods.

To address this problem, VeloDB Cloud supports elastic scaling of compute resources. Computing resources can be scaled flexibly based on load requirements. For example, users can scale up quickly at peak times to increase efficiency and scale down at off-peak times to reduce costs. Idle clusters are suspended automatically to further reduce resource costs.
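
The scale-up, scale-down, and auto-suspend behavior described above can be pictured as a simple policy function. The following is a minimal sketch, not VeloDB Cloud's actual scaling logic: the `ClusterState` fields, the `plan_action` name, the utilization thresholds, the doubling/halving step, and the 30-minute idle timeout are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    nodes: int               # current number of compute nodes
    cpu_utilization: float   # 0.0 to 1.0, averaged over a recent window
    idle_minutes: int        # minutes since the last query or load job


def plan_action(state, min_nodes=1, max_nodes=16,
                high=0.75, low=0.30, suspend_after=30):
    """Return (action, target_nodes) for one evaluation of the hypothetical policy."""
    # An idle cluster is suspended regardless of its size.
    if state.idle_minutes >= suspend_after:
        return ("suspend", 0)
    if state.cpu_utilization > high:
        target = min(max_nodes, state.nodes * 2)    # peak: add capacity
    elif state.cpu_utilization < low:
        target = max(min_nodes, state.nodes // 2)   # off-peak: release capacity
    else:
        target = state.nodes
    if target > state.nodes:
        return ("scale_up", target)
    if target < state.nodes:
        return ("scale_down", target)
    return ("keep", state.nodes)


print(plan_action(ClusterState(nodes=4, cpu_utilization=0.90, idle_minutes=0)))
print(plan_action(ClusterState(nodes=4, cpu_utilization=0.10, idle_minutes=5)))
print(plan_action(ClusterState(nodes=4, cpu_utilization=0.00, idle_minutes=45)))
```

Because the compute layer holds no data of its own in the decoupled mode, changing the node count in a policy like this does not require moving the full dataset between nodes.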

[Figure: Elastic Scaling: Scaling Up During Peak, Scaling Down During Off-peak]

03 Tiered Storage: Balancing High Performance and Low Cost

Enterprise data volumes now range from terabytes to petabytes. Within the overall cost structure, computing resources and data storage account for the majority of expenses. To maximize efficiency, enterprises must optimize storage costs without compromising computational performance.

VeloDB Cloud employs a tiered storage system that uses different cloud storage media for hot and cold data. Hot data is kept preferentially on SSD to ensure high performance, while cold data is stored in object storage to reduce storage costs. This approach lowers storage costs while maintaining high performance for hot data reads and writes.
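
To make the hot/cold split concrete, here is a minimal sketch of a local cache sitting in front of shared storage. It is an illustration under stated assumptions rather than VeloDB Cloud's implementation: the `TieredStore` class is hypothetical, plain Python dictionaries stand in for SSD and object storage, and least-recently-used (LRU) eviction is an assumed policy that the source does not specify.

```python
from collections import OrderedDict


class TieredStore:
    """Toy model: full dataset in 'object storage', hot entries in an 'SSD cache'."""

    def __init__(self, cache_capacity):
        self.object_storage = {}         # stands in for shared object storage
        self.ssd_cache = OrderedDict()   # stands in for local SSD, oldest entry first
        self.cache_capacity = cache_capacity
        self.cold_reads = 0              # reads that had to go to object storage

    def write(self, key, value):
        # Every write lands in shared storage; fresh data is also treated as hot.
        self.object_storage[key] = value
        self._admit(key, value)

    def read(self, key):
        if key in self.ssd_cache:
            self.ssd_cache.move_to_end(key)   # mark as most recently used
            return self.ssd_cache[key]
        self.cold_reads += 1
        value = self.object_storage[key]
        self._admit(key, value)
        return value

    def _admit(self, key, value):
        self.ssd_cache[key] = value
        self.ssd_cache.move_to_end(key)
        while len(self.ssd_cache) > self.cache_capacity:
            self.ssd_cache.popitem(last=False)   # evict the least recently used entry


store = TieredStore(cache_capacity=2)
for name in ("a", "b", "c"):
    store.write(name, name.upper())
store.read("a")   # evicted earlier, so this is served from object storage
store.read("a")   # now cached, so this is served from the SSD cache
print(store.cold_reads, list(store.ssd_cache))
```

In this model the cache holds only as many entries as fit on local SSD, while object storage always retains the full dataset, which is why losing or shrinking the cache costs performance but never data.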

[Figure: Tiered Storage: Balancing High Performance and Low Cost]

Real-Time Ingestion, Blazing-Fast Analysis

With large-scale data, enterprises face the challenge of ingesting and processing data at lower latency to improve data freshness. For data applications, reducing query latency and delivering higher query performance are equally pressing.

In terms of Ingestion, VeloDB Cloud supports real-time ingestion:

  • Sub-second real-time updates (primary key tables) & appends: VeloDB Cloud delivers sub-second data visibility, enabling real-time updates and appends on both primary key and non-primary key tables. Most traditional data warehouses support only batch updates and lack primary key tables; VeloDB Cloud removes these barriers to high-frequency real-time updates.
  • Database CDC / Kafka data streaming: upstream data sources for real-time data warehouses are typically transactional (TP) databases or Kafka. VeloDB Cloud provides native integration with database CDC and Kafka data streaming, achieving sub-second data synchronization.
  • Sub-second lightweight schema changes: beyond real-time data ingestion and updates, schemas often require rapid modifications to keep pace with evolving business needs. VeloDB Cloud allows lightweight schema change operations to be completed in seconds without compromising system performance.
  • Support for semi-structured data types: as businesses expand, semi-structured data types have become increasingly prevalent. VeloDB supports storage and processing of semi-structured data types like Array, Map, JSON, and Variant.
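
The difference between updates on a primary key table and appends on a non-primary-key table can be shown with a small model of the visible result. This sketch only mimics the semantics described above; it says nothing about how VeloDB Cloud or Apache Doris implements them. The function names `upsert_rows` and `append_rows` and the sample change stream are hypothetical.

```python
def upsert_rows(table, rows, key="id"):
    """Primary key table: a new row replaces any existing row with the same key."""
    for row in rows:
        table[row[key]] = row
    return table


def append_rows(table, rows):
    """Non-primary-key table: every incoming row is kept."""
    table.extend(rows)
    return table


# A made-up change stream, such as one arriving from database CDC or Kafka.
changes = [
    {"id": 1, "status": "created"},
    {"id": 2, "status": "created"},
    {"id": 1, "status": "paid"},
]

orders = upsert_rows({}, changes)   # latest state per key
events = append_rows([], changes)   # full history of changes
print(len(orders), orders[1]["status"], len(events))
```

The primary key model suits tables that must always reflect the current state of each record, while the append model suits logs and event histories where every change is retained.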

In terms of Queries, VeloDB Cloud delivers blazing-fast analytical performance across various query workloads:

  • High-concurrency point queries: thousands of QPS per node, enabling a single architecture to satisfy both high-throughput OLAP analysis and high-concurrency Data Serving online services. This significantly simplifies the technical architecture for mixed workloads, providing users with a unified analytical experience across diverse scenarios.
  • Wide-table queries: in October 2022, Doris topped ClickBench, the database performance leaderboard initiated by ClickHouse, proving its outstanding performance in wide-table queries. In May 2024, Doris achieved the #1 position on the ClickBench Hot Run overall leaderboard without any tuning, once again demonstrating its superior performance. VeloDB extends this strength to more real-world scenarios.
  • Multi-table joins: on the SSB and TPC-H standard test datasets, VeloDB's multi-table join performance can be up to 100x faster than ClickHouse and 5-10x faster than Greenplum.

Lakehouse Integration, Unified Platform

In most enterprises, data lakes and data warehouses operate as two parallel systems. Data lakes store raw data, supporting diverse data types and flexible access, while data warehouses store structured data for complex analysis. However, both lakes and warehouses have limitations. We are now in an era of lakehouse integration, where combining the high performance of data warehouses with the openness of data lakes becomes essential.

[Figure: Lakehouse Integration, Unified Platform]

As a modern unified data warehouse, VeloDB Cloud helps enterprises rapidly build lakehouse architectures.

  • Lakehouse query acceleration: without the need to migrate data to VeloDB Cloud, users can leverage Doris' efficient query engine to directly query data stored in data lakes such as Iceberg, Hudi, Paimon, and offline data warehouses like Hive, thereby accelerating query analysis.
  • Federated analysis: VeloDB Cloud enhances federated analysis capabilities by extending its catalog and storage plugins. Users can perform unified analysis across multiple heterogeneous data sources without physically centralizing the data in a single storage system. This enables external table queries and federated joins between internal and external tables, breaking down data silos and providing globally consistent data insights.
  • Write-back lakehouse: VeloDB Cloud introduces write-back functionality for Hive and Iceberg, allowing users to directly create Hive and Iceberg tables and write data into them. This allows users to write internal table data back to the offline lakehouse or process offline lakehouse data and save the results back into the lakehouse, simplifying and streamlining the data lake construction process.
  • Enhancements for semi-structured and unstructured data: semi-structured and unstructured data are common in data lakes. VeloDB Cloud has introduced support for data types like Array, Map, Struct, JSON, and Variant, with plans to support vector indexing in the future.

BYOC and SaaS Deployment Models

When building real-time data warehouses, users typically have different requirements:

  • Ease of use: users expect operations to be as simple as possible and want the supplier to handle all infrastructure management, so that they can focus on their business without worrying about infrastructure maintenance.
  • Compliance and control: some users have strict requirements for data compliance and control, and prioritize these even at the cost of some system performance.

To meet these varying demands, VeloDB Cloud offers two deployment models: SaaS (Software as a Service) and BYOC (Bring Your Own Cloud).

[Figure: BYOC and SaaS Deployment Models]

01 SaaS

  • Simplified operations: the SaaS model provides a fully managed service that is ready to use out of the box. Users do not need to invest heavily in resource management, access control, and other infrastructure maintenance tasks.
  • Lower Total Cost of Ownership (TCO): if cloud provider discounts are limited, warehouses powered by the SaaS model can achieve lower TCO. The SaaS model reduces TCO by simplifying operations and reducing infrastructure investment. It also offers flexible subscription models, allowing users to adjust service scale based on actual needs.
  • Supported cloud platforms: Amazon Web Services (AWS), Azure.

02 BYOC

  • Data control & compliance: the BYOC model allows data to reside entirely within the customer's own VPC (Virtual Private Cloud), enhancing data security and compliance.
  • Lower usage costs: under the BYOC model, users purchase cloud servers, object storage, and other resources through their own cloud service accounts. This lets them apply any discounts their cloud provider offers, reducing costs.
  • Network environment convenience: the BYOC model allows users to deploy warehouses directly within their internal VPC. Network connectivity setup becomes more straightforward, simplifying network configuration and management, and improving network efficiency and performance.
  • Supported cloud platforms: Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure.

Stability, Security & Compliance, and Professional Services

As data-driven decision-making becomes widespread, enterprises' demand for stable and secure data platforms continues to grow. VeloDB Cloud meets international security standards and has obtained multiple authoritative certifications. In addition, VeloDB Inc. provides comprehensive, long-term technical support, ensuring customers have a stable and reliable experience.

  • Exceptional stability & reliability: VeloDB Inc. offers long-term support services with maintenance cycles of 12 to 36 months, providing continuous stability assurance for customer production environments. Furthermore, VeloDB is fully compatible with Apache Doris and its ecosystem tools, ensuring seamless integration and efficient operation of your business processes.
  • Comprehensive data security & compliance: VeloDB Cloud actively benchmarks against international security compliance standards. It has passed multiple authoritative certifications, including SOC 2 Type II, ISO/IEC 27001:2022, HIPAA, GDPR, US Data Privacy, and PCI DSS – SAQ A. VeloDB also provides private networks and encrypted connections to secure data in transit. Data encryption and isolated deployment environments further enhance the security of data and computations.
  • Professional team support & services: VeloDB Cloud provides comprehensive technical support and services, including regular health checks that proactively identify and eliminate potential system risks. By strictly adhering to its service SLAs, VeloDB Cloud ensures timely responses and resolutions. Regular product training sessions and best-practice case studies are also provided to help customers use their systems more effectively.

Conclusion

VeloDB Cloud is seamlessly adapted to cloud infrastructure, balancing efficiency and elasticity. With powerful data analytics capabilities, a cloud-native compute-storage decoupled architecture, and a consistent multi-cloud service experience, VeloDB Cloud empowers enterprises to navigate ever-changing business demands and technological innovations while delivering an efficient data processing and analysis experience.
