Back
Engineering

ASOF JOIN Benchmark: Apache Doris vs ClickHouse and DuckDB

2026/6/11

A stock trade prints at 10:31:00.250. The quote that explains its price arrived at 10:30:59.980. An equality join will never pair those two rows, because the timestamps differ by 270 milliseconds. The query you need says: for each trade, find the latest quote at or before the trade time, for the same ticker.

That lookup shows up across time-series workloads:

  • Transaction cost analysis. Match every trade to the prevailing quote to measure slippage and execution quality.

  • IoT and telemetry. Attach the most recent calibration record or device state to each sensor reading.

  • Observability. Enrich each error event with the latest deploy, config change, or feature flag that preceded it.

  • Event tracking. Join each order event to the latest status snapshot of the entity it belongs to.

SQL engines call this ASOF JOIN, and most major analytical databases now ship it. Writing the same logic without it takes a correlated subquery or a window-function detour, and both get expensive at billions of rows. An engine with native support can run the key match and the time lookup as a single operator.

Why We Benchmarked It

Apache Doris added ASOF JOIN in versions 4.0.5 and 4.1.0. Before recommending a new join type for production work, we wanted an objective answer to one question: whether it is fast enough, consistently enough, to sit in core query paths. Three things made a purpose-built benchmark necessary.

First, shared syntax hides divergent implementations. The operator behind ASOF JOIN involves real design choices: partition by key or globally, sort or hash, how to handle unordered input, how to manage per-key state when one key holds 100,000 rows and another holds exactly 1. Each choice shows up directly in query time, so a feature-list checkmark says little about speed.

Second, no standard suite measures it. TPC-H, TPC-DS, and similar industry benchmarks contain no ASOF JOIN queries, so performance claims for this operator have no public yardstick. We had to define one.

Third, the workloads that depend on this join carry hard latency expectations. Trade and quote matching, telemetry enrichment, and event correlation run on large, fast-growing tables, and the queries sit behind interactive dashboards and automated pipelines. An implementation that holds up only on small or tidy data would not be worth shipping.

So we analyzed the typical business scenarios for this join, defined 6 scenario families covering the factors that decide its cost (table size ratio, key cardinality, sequence length per key, physical data order, and post-join filtering), and used the suite in 2 ways: during development, to iterate on the implementation, and at release, to compare the result against 2 widely used analytical databases on identical hardware with equal thread counts.

Here is what to expect below: the join's semantics, the full test matrix, the environment and method (where both comparison systems are identified, with exact versions and settings), and per-scenario results for all 11 measurements. Doris posted the fastest time in every one. The narrowest gap was 16%, the widest 14x, and the test matrix is specific enough to reproduce.

The Semantics: Latest Available Fact

The trade and quote example translates directly to SQL:

SELECT t.trade_id, t.stock_id, t.trade_time, q.ask_price
FROM trade t
ASOF LEFT JOIN quote q
ON t.stock_id = q.stock_id
AND t.trade_time >= q.quote_time;

The ON clause does two jobs. The equality condition groups rows by business key: trades match quotes for the same stock_id. The inequality condition then picks a single row from that group: the quote with the largest quote_time that is still less than or equal to trade_time. Each left row gets at most one match.

The semantics are simple. Making them fast is the hard part, because production data is rarely tidy: per-key time chains can hold 100,000 entries or exactly 1, data lands on disk out of order, and key counts run from 100 to 10 million. The Doris team iterated the implementation against exactly those conditions, backed by 100+ regression tests plus large batches of randomly generated join tests for correctness.

Benchmark Design

Each scenario family isolates one factor that drives ASOF JOIN cost:

ScenarioWhat it verifiesLeft rowsLeft NDVRight rowsRight NDVData characteristics
Large vs small tableProbe efficiency against small reference series100M1k to 100M1k to 10M1k to 10MRandom distribution
100M vs 100MLarge-scale joins; NDV impact on grouping and state100M100 to 10k100M100 to 10kRandom distribution
Long sequencesLookup along long per-key time chains100M10k100M1k100k rows per right-table key
Short sequencesSparse, high-cardinality matching100M10M10M10M1 row per right-table key
Out-of-order storageSensitivity to physical data layout100M10k100M10kJoin keys on non-prefix, unsorted columns; raw data unordered
Filtered queriesExecution with post-join filtering100M10k100M10kWHERE filters at 75% to 100% selectivity

The matrix covers LEFT and INNER join semantics. To keep the comparison about the join itself, all 3 systems used identical table schemas and identical data, and we verified after the runs that the ASOF JOIN operator accounted for the bulk of each query's cost, so scan-speed differences don't drive the rankings.

Environment and Method

All 3 engines ran on the same single machine:

ItemSpec
Machines1
Instance typeecs.g7.4xlarge
CPU cores16
Memory64 GB
Disk1x ESSD PL0 cloud disk, 500 GB

DuckDB runs in process and ClickHouse ran as a single server, so a 1-node deployment keeps the comparison fair. Doris FE and BE shared the node.

ProductApache DorisClickHouseDuckDB
Version4.1.0v26.3.3.201.5.1
Parallelismparallel_pipeline_task_num = 16max_threads = 16threads = 16

Doris defaults to a different degree of parallelism than the other 2 engines, so we pinned all 3 to 16 threads. Every query ran at single concurrency: 1 cold run, then 3 hot runs, taking the fastest hot time as the result.

Results

pic1.png

ScenarioDoris (s)ClickHouse (s)DuckDB (s)
100M vs 10M0.380.532.27
100M vs 100k0.250.292.19
100M vs 1k0.150.192.10
10M vs 100M0.521.551.69
100k vs 100M0.471.681.43
100M vs 100M, standard NDV0.971.933.62
100M vs 100M, low NDV1.132.263.54
Long sequences0.59 to 0.611.59 to 1.781.53 to 3.19
Short sequences0.64 to 0.674.43 to 4.492.45 to 2.46
Out-of-order storage1.40 to 1.442.24 to 2.294.22 to 4.23
Filtered queries1.14 to 1.152.15 to 2.163.74 to 3.75

Large Fact Table, Small Reference Table

This is the most common production shape: a high-frequency event stream joined to a smaller reference series, like trades to quotes or events to dimension snapshots.

With a 100M-row left table, Doris held 0.15s to 0.38s as the right table grew from 1k to 10M rows. ClickHouse stayed close here, within 16% to 39% of Doris. DuckDB ran 6x to 14x behind. Doris's latency falling smoothly as the right table shrinks points to low fixed overhead, which is what keeps this query shape inside interactive dashboard budgets.

Flipping the ratio separates the engines. At 10M vs 100M, Doris ran 0.52s against 1.55s for ClickHouse and 1.69s for DuckDB. At 100k vs 100M, Doris ran 0.47s against 1.68s and 1.43s. Once the right side carries the bulk of the data, Doris holds about a 3x lead or better over both.

100M Rows on Both Sides

At standard NDV (10,000 keys), Doris finished in 0.97s, against 1.93s for ClickHouse and 3.62s for DuckDB. A sub-second result on a 100M x 100M time-series join, on 1 machine with 16 cores, is the headline number of this benchmark.

Low NDV (100 keys) concentrates all 100M rows into a few heavy groups, which stresses per-group state and tends to expose execution hot spots. Doris moved from 0.97s to 1.13s, a 16% shift, while ClickHouse ran 2.26s and DuckDB 3.54s. Production key distributions are rarely ideal, so how little a result moves between NDV 10,000 and NDV 100 matters as much as the absolute time.

Sequences, Ordering, and Filters

The remaining scenarios test the data shapes that benchmarks often skip.

Long sequences (100,000 rows per right-table key, the shape of dense market data or high-frequency metrics): Doris held 0.59s to 0.61s, ClickHouse 1.59s to 1.78s, DuckDB 1.53s to 3.19s.

Short sequences (10M keys with exactly 1 right-table row each, the shape of sparse device fleets or wide user dimensions): Doris held 0.64s to 0.67s. ClickHouse slowed to 4.43s to 4.49s, its weakest result in the benchmark at roughly 7x behind. DuckDB ran 2.45s to 2.46s.

Out-of-order storage (join keys on non-prefix, unsorted columns): Doris held 1.40s to 1.44s, against 2.24s to 2.29s for ClickHouse and 4.22s to 4.23s for DuckDB. Real ingestion rarely lands sorted, so a result that survives unordered input lowers the engineering cost of adopting the feature.

Filtered queries (WHERE conditions at 75% to 100% selectivity): Doris held 1.14s to 1.15s, against 2.15s to 2.16s and 3.74s to 3.75s. Real analyses filter and aggregate after they join, so the operator has to stay fast inside a larger plan.

Reading the Results

Doris was fastest in all 11 measurements, and the shape of the lead tells you more than any single row:

  • ClickHouse stayed within 40% of Doris when the right table was small, and ran 1.6x to 7x behind everywhere else.

  • DuckDB ran 2.5x to 14x behind, with most scenarios at 3x or worse.

  • Across every scenario, Doris stayed between 0.15s and 1.44s. That narrow band, held across NDV swings, unordered input, and sequence-length extremes, is the property that decides whether you can put a feature in a core query path.

Initial production tests with financial-industry users matched these results. The usual caveats apply: these numbers come from the listed versions, configuration, and generated datasets, and your own data distribution will shift them. The test matrix above is specific enough to reproduce.

Try It

ASOF JOIN ships in Apache Doris 4.0.5 and 4.1.0. Download the latest 4.x release from the Apache Doris site and follow the join documentation to run it on your own trade, telemetry, or log data.

To run the same engine without managing the cluster, VeloDB Cloud runs Apache Doris as a managed service, with a free trial to test ASOF JOIN against your own workload. If you hit problems with either, join the Apache Doris community on Slack or contact the VeloDB team.

Try VeloDB Cloud for Free

SaaS warehouse free trial 14 days,
BYOC warehouse free computing service fee 90 days.

Subscribe to Our Newsletter

Stay ahead on Apache Doris releases, product roadmap, and best practices for real-time analytics and AI-ready data infra.

Need help? Contact us!