Inside Doris 3.1: Smarter Semi-Structured JSON Analytics, Better Iceberg and Lakehouse Support

Apache Doris 3.1 is officially released! Join us for a webinar to learn more about the new features in 3.1.

Vincent Zhang, Apache Doris PMC, will discuss the major updates in semi-structured data analytics for observability workloads, lakehouse integration for Iceberg and Paimon, and other updates in storage, query, and JOINs.

This webinar provides a high-level overview of the new features in Doris 3.1. It's the first session in a three-part series; future webinars will dive deeper into semi-structured JSON analytics and into Iceberg and lakehouse support. Be sure to join the upcoming sessions as well.

Key Topics We'll Cover:

1. Better Semi-Structured Data Analytics for Observability

Apache Doris is already widely used for observability, thanks to its fast, high-concurrency queries on large datasets, its ability to ingest semi-structured data (like JSON logs), and its support for text search and indexing of log/event data.

In Doris 3.1, the VARIANT data type gets a major upgrade with sparse columns and schema templates.

Sparse columns keep common JSON keys as true columnar subcolumns, making queries fast and metadata small, while rare keys stay in a compact "sparse" area so the table doesn't bloat. This enables Doris to handle tens of thousands of subcolumns smoothly, without struggling with super-wide tables, metadata bloat, or slow queries.
Schema templates make constantly changing JSON predictable along critical paths: you can lock types for key JSON subpaths and tailor inverted index strategies per subpath. This leads to faster queries, more stable indexes, and controlled costs.
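
To make the sparse-column idea concrete, here is a minimal Python sketch of splitting JSON rows into dense subcolumns for frequent keys and a per-row sparse map for rare ones. It is a conceptual illustration only, not how Doris implements VARIANT; the function name `split_hot_and_sparse` and the `max_subcolumns` parameter are hypothetical.

```python
from collections import Counter


def split_hot_and_sparse(rows, max_subcolumns):
    """Split JSON-like rows into dense columns (frequent keys) and sparse maps (rare keys).

    Hypothetical helper for illustration; not Doris's actual storage logic.
    """
    # Count how many rows each key appears in.
    key_counts = Counter(key for row in rows for key in row)
    # The most frequent keys become true columnar subcolumns.
    hot_keys = [key for key, _ in key_counts.most_common(max_subcolumns)]
    columns = {key: [row.get(key) for row in rows] for key in hot_keys}
    # Everything else stays in a compact per-row map.
    hot = set(hot_keys)
    sparse = [{k: v for k, v in row.items() if k not in hot} for row in rows]
    return columns, sparse


rows = [
    {"level": "info", "service": "api", "trace_id": "a1"},
    {"level": "error", "service": "api", "debug_flag": True},
    {"level": "info", "service": "db"},
]
columns, sparse = split_hot_and_sparse(rows, max_subcolumns=2)
```

In this toy input, `level` and `service` appear in every row and become dense columns, while `trace_id` and `debug_flag` each appear once and land in the sparse maps, so the number of real columns stays bounded no matter how many rare keys show up.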

2. Better Lakehouse Support for Iceberg and Paimon

Iceberg and Paimon are two popular open table formats that are seeing increasing adoption in lakehouse scenarios. Doris 3.1 now offers comprehensive compatibility with Iceberg and Paimon, including:

Async materialized views now support partition-level incremental maintenance and transparent query rewriting, creating a high-speed bridge between data lake and data warehouse.
Iceberg: Doris now natively supports branch/tag lifecycle management and adds support for Iceberg system tables, which users can query directly. It also supports the Iceberg REST Catalog and works with multiple backend implementations, including Unity, Polaris, Gravitino, and Glue.
Paimon: Doris now supports batch incremental queries in Paimon, reads of Paimon table data from branches and tags, and direct access to Paimon system tables for easier debugging and optimization.

3. Storage & Query Performance Optimizations

Optimizing MOW (Merge-on-Write) Tables: New optimizations make large-scale, concurrent data ingestion more stable and efficient, including shorter compaction lock times and lower long-tail import latency.

Smarter Partition Pruning: Doris 3.1 adds binary search pruning, support for monotonic functions in partition predicates so that queries on time-partitioned tables (like logs or events) skip irrelevant partitions, and full-path code optimizations.
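
As a rough illustration of binary search pruning combined with a monotonic function, the Python sketch below selects daily partitions for a timestamp range. It is a simplified model under stated assumptions, not Doris's planner code: the function `prune_partitions`, the list `daily_bounds`, and the partition layout (sorted inclusive lower bounds, last partition open-ended) are all hypothetical.

```python
from bisect import bisect_right
from datetime import date, datetime


def prune_partitions(lower_bounds, lo, hi):
    """Return indices of partitions that may contain values in [lo, hi].

    Assumes partition i covers [lower_bounds[i], lower_bounds[i + 1]) and
    lower_bounds is sorted. Hypothetical helper for illustration.
    """
    # Partition holding lo: the last lower bound <= lo (clamped to 0).
    first = max(bisect_right(lower_bounds, lo) - 1, 0)
    # One past the partition holding hi: count of lower bounds <= hi.
    last = bisect_right(lower_bounds, hi)
    return range(first, last)


# Four daily partitions keyed by date.
daily_bounds = [date(2025, 10, d) for d in (1, 2, 3, 4)]

# The filter is on a timestamp, but the table is partitioned by date.
# Converting a timestamp to its date is monotonic (non-decreasing), so the
# timestamp range maps onto a date range without losing any matching rows.
ts_start = datetime(2025, 10, 2, 8, 0)
ts_end = datetime(2025, 10, 2, 20, 0)
selected = list(prune_partitions(daily_bounds, ts_start.date(), ts_end.date()))
```

Each lookup costs two binary searches instead of a scan over every partition, which is what makes this approach attractive for tables with many time partitions.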

Data Traits: Doris now detects traits like uniqueness and functional dependencies to remove redundant joins, aggregations, and sorts, delivering up to 10x faster queries in tests.

Webinar Details

Date: Thursday, Oct. 16, 6:00 p.m. PT | 9:00 p.m. ET

This session is scheduled for US and APAC-friendly hours, but everyone is welcome. We will hold a live Q&A during the session. Can't make it live? No worries: register anyway, and we'll send you the recording and slides after the event.