Apache Doris Tops JSONBench in Cold Queries and Data Quality

VeloDB Engineering Team

We're excited to share that Apache Doris has topped the latest JSONBench benchmark, ranking first in cold query performance and data quality and second in hot queries. In cold queries, Doris is about 2 times faster than Elasticsearch, about 160 times faster than MongoDB, and about 1,070 times faster than PostgreSQL.

JSONBench is a trusted benchmark for testing semi-structured JSON data analytics. With the recent release of Apache Doris 3.1, Doris has climbed to the top of the JSONBench leaderboard.

Doris 3.1 deeply optimizes the VARIANT data type: it introduces sparse subcolumns and schema templates, improves column pruning and path indexing, vectorizes the JSON pruning engine, and integrates intelligent caching mechanisms to enhance overall performance.

The results show Apache Doris' strength in large-scale JSON analytics, particularly for cold data, the toughest challenge in semi-structured data analytics. Hot data is accessed frequently and stays cached in memory; cold data, by contrast, consists of massive JSON files that must be read directly from disk, which makes it a true test of a system's efficiency.

What's JSONBench

JSONBench, initially developed by the ClickHouse community, is one of the most representative benchmarks for JSON analytics and is widely regarded as the gold standard for testing performance on semi-structured data.

It uses a real-world production dataset containing over 1 billion JSON records with deeply nested structures and dynamic keys. The benchmark rigorously tests query optimization, columnar storage design, and parsing engine efficiency.

Apache Doris Ranks #1 in Cold Query Performance in JSONBench

In the latest JSONBench test, Apache Doris 3.1 ranked first on the leaderboard in cold query performance and data quality. In cold query scenarios, where no data is cached, Doris delivered a decisive lead and performed best in Q3 to Q5 cold queries, outperforming all other systems.

Detailed results: In cold queries, Doris achieved a score of 1.57, making it about 164× faster than MongoDB (258.21) and over 1,000× faster than PostgreSQL (1687.29). It also outperformed Elasticsearch (3.01) by nearly 2×, making it one of the most competitive choices for large-scale JSON analytics.
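
The ratios quoted above can be reproduced from the four cold-query scores cited in this post. The short Python check below assumes, as the leaderboard ordering implies, that a lower score is better; the dictionary and function names are ours and exist only for illustration.

```python
# Cold-query scores as quoted in this post (assumed: lower is better).
COLD_SCORES = {
    "Apache Doris": 1.57,
    "Elasticsearch": 3.01,
    "MongoDB": 258.21,
    "PostgreSQL": 1687.29,
}


def relative_slowdown(scores, baseline):
    """Return each system's score divided by the baseline system's score."""
    base = scores[baseline]
    return {
        name: score / base
        for name, score in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, ratio in relative_slowdown(COLD_SCORES, "Apache Doris").items():
        print(f"{name}: {ratio:.1f}x the Doris score")
```

Dividing each score by the Doris score gives roughly 1.9 for Elasticsearch, 164 for MongoDB, and 1,075 for PostgreSQL, which matches the rounded figures in the text.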

[Figure: JSONBench cold query leaderboard]

JSONBench official website: https://jsonbench.com/

Doris also performed well in hot queries, ranking second only to ClickHouse, the benchmark's initiator.

[Figure: JSONBench hot query leaderboard]

Doris also ranked first in data quality (data correctness), accurately recognizing and importing 100% of the dataset, while other databases recognized and imported fewer records. This highlights its superior handling of complex JSON data.

[Figure: JSONBench data quality leaderboard]

Apache Doris Optimization and Improvements

Since version 3.0, Apache Doris has consistently added key optimizations that improve cold query performance:

  • Efficient I/O path: Through path-level column pruning and late materialization, Doris loads only the necessary data when reading JSON subcolumns during cold queries, using precisely targeted reads that minimize data amplification.
  • Variant subcolumn indexing: Supports JSON path indexes (such as ZoneMap, BloomFilter, and other sparse indexes) and predicate pushdown, enabling file-level pruning to accelerate filter evaluation.
  • Powerful query engine: Features a highly optimized vectorized execution engine and efficient concurrent query processing.
  • Intelligent caching strategies: During cold reads, Doris combines prefetching and page caching to improve system throughput.
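
To make the pruning ideas in this list concrete, here is a toy Python sketch of zone-map-style pruning on a single JSON path. It is not Doris code: the `Segment` class, the `scan` function, and the sample data are hypothetical, and a real engine works on on-disk pages rather than Python lists. Each segment keeps the minimum and maximum of every subcolumn, so a range predicate can skip whole segments, and only the requested path is ever read.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A toy storage segment: one list of values per JSON path (subcolumn)."""
    columns: dict                       # path -> list of values
    zone_map: dict = field(init=False)  # path -> (min, max)

    def __post_init__(self):
        # Build a min/max summary per path when the segment is created.
        self.zone_map = {p: (min(v), max(v)) for p, v in self.columns.items()}


def scan(segments, path, lo, hi):
    """Return the values of `path` within [lo, hi] and the number of segments read."""
    hits, segments_read = [], 0
    for seg in segments:
        seg_min, seg_max = seg.zone_map[path]
        if seg_max < lo or seg_min > hi:
            continue  # The zone map proves no row can match, so skip the segment.
        segments_read += 1
        # Column pruning: only the requested path is read, never the other paths.
        hits.extend(v for v in seg.columns[path] if lo <= v <= hi)
    return hits, segments_read


if __name__ == "__main__":
    segments = [
        Segment({"commit.seq": [1, 5, 9], "kind": ["a", "b", "c"]}),
        Segment({"commit.seq": [20, 25, 31], "kind": ["a", "a", "b"]}),
        Segment({"commit.seq": [40, 47, 52], "kind": ["c", "c", "a"]}),
    ]
    values, read = scan(segments, "commit.seq", 22, 45)
    print(values, f"(read {read} of {len(segments)} segments)")
```

In this example the first segment is skipped entirely because its maximum falls below the predicate's lower bound, which is the same effect file-level pruning aims for at much larger scale.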

Apache Doris 3.1 has also introduced systematic optimizations to the VARIANT data type that further improve cold query performance:

  • Sparse Subcolumn: Stores only frequently accessed JSON keys in columnar format to reduce I/O and metadata overhead.
  • Schema Template: Standardizes subcolumn data types to ensure more consistent and reliable index hits.
  • Enhanced column pruning and path indexing: Precisely locates target subcolumns during cold reads, avoiding full-field scans and ensuring consistent index performance.
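
The sparse subcolumn idea can be illustrated with a small Python toy. This is only a sketch under our own assumptions, not how Doris implements VARIANT: it uses how often a key occurs as a stand-in for "frequently accessed", and `split_subcolumns` and its `max_subcolumns` parameter are hypothetical names. The commonest keys become dense columns, while the rest stay in a per-row sparse map.

```python
from collections import Counter


def split_subcolumns(rows, max_subcolumns):
    """Split JSON rows into dense columns for the commonest keys plus a sparse remainder."""
    # Count how many rows contain each top-level key.
    key_counts = Counter(key for row in rows for key in row)
    dense_keys = [key for key, _ in key_counts.most_common(max_subcolumns)]
    # Dense columns hold one value per row, with None where the key is absent.
    dense = {key: [row.get(key) for row in rows] for key in dense_keys}
    # Everything else is kept per row, so rare keys add no column metadata.
    sparse = [{k: v for k, v in row.items() if k not in dense} for row in rows]
    return dense, sparse


if __name__ == "__main__":
    rows = [
        {"did": "a", "kind": "commit", "lang": "en"},
        {"did": "b", "kind": "commit"},
        {"did": "c", "kind": "identity", "handle": "x"},
    ]
    dense, sparse = split_subcolumns(rows, max_subcolumns=2)
    print(dense)
    print(sparse)
```

A query that touches only `did` or `kind` reads a single dense column here, and the rarely seen `lang` and `handle` keys do not each need their own column.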

For more details on VARIANT, read our documentation.

Why Choose Apache Doris

Apache Doris 3.1 is setting a new performance benchmark for semi-structured analytics, offering stability, accuracy, high performance, and cost efficiency.

  • Low-latency Queries: Sub-second query response and interactive analytics for logs, events, and behavioral data.
  • Storage-Compute Separation: Apache Doris offers strong cold query performance even in large-scale S3 or HDFS-based architectures.
  • Low I/O Cost: Reduces cold query I/O costs by over 60% compared with Elasticsearch under similar workloads.
  • Better JSON Analytics: Apache Doris delivers much better JSON analytics performance compared to MongoDB and PostgreSQL, and can effectively replace Elasticsearch and Snowflake in many use cases.

Want to learn more about Apache Doris and its JSON analytics performance? Join the Apache Doris community on Slack and connect with Doris experts and users. If you're looking for a fully-managed, cloud-native version of Apache Doris, contact the VeloDB team.