
Comparing the Leading Distributed PostgreSQL Architectures: Citus, PGXC-PGXL, and Greenplum

PostgreSQL is fundamentally a single-node database, relying on ecosystem extensions and derivative architectures for distributed capabilities. Today, the most widely adopted distributed PostgreSQL solutions include Citus, PGXC/PGXL, and Greenplum. We will compare each in this article.

Main Differences: Core Architecture and Design Positioning

1. Citus: "Lightweight Distributed Extension" for Both OLTP and OLAP

Citus is an influential open-source sharding extension in the PostgreSQL ecosystem, and its development is closely tied to the rise of cloud-native databases. The project grew out of Citus Data, a startup backed by the well-known incubator Y Combinator. After Microsoft acquired Citus Data in 2019, Citus was integrated into the Azure cloud service portfolio and became the core of Microsoft's distributed PostgreSQL offering.

Citus's core design is a lightweight "coordinator node + worker node" architecture that aims to deliver distributed capabilities without sacrificing native PostgreSQL compatibility. This approach avoids the performance bottlenecks of traditional middleware architectures and lowers the barrier to distributed adoption for small and medium-sized teams.

Citus adopts a Shared-Nothing architecture: the coordinator node is responsible only for SQL parsing, routing, and result aggregation, and stores no business data; worker nodes hold dynamically sharded data (hash or range distributed) keyed on a "distribution column" (e.g., user ID), and each shard can be configured with primary-standby replicas based on PostgreSQL streaming replication.
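As an illustration, a distributed table in Citus is declared with the extension's `create_distributed_table` function; the table and column names below are hypothetical:

```sql
-- Requires the extension on the coordinator: CREATE EXTENSION citus;
-- Hypothetical orders table; user_id serves as the distribution column.
CREATE TABLE orders (
    order_id   bigint,
    user_id    bigint,
    total      numeric,
    created_at timestamptz DEFAULT now(),
    PRIMARY KEY (user_id, order_id)  -- unique constraints must include the distribution column
);

-- Hash-shard the table across worker nodes by user_id.
SELECT create_distributed_table('orders', 'user_id');
```

Queries that filter on `user_id` are routed to a single shard, which keeps OLTP latency close to that of single-node PostgreSQL.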

The original design intention is to enable small and medium-sized teams to implement "transaction + analytics" mixed workloads at low cost, retaining PostgreSQL's OLTP capabilities while handling distributed queries of TB-level data.

2. PGXC/PGXL: "Pure Distributed Cluster" Focused on Strong OLTP Consistency

PGXC/PGXL is representative of pure distributed PostgreSQL clusters, with a technical lineage tracing back to 2010, when the NTT Open Source Software Center and EnterpriseDB partnered to merge their prior work on the RitaDB and GridSQL projects and jointly launched Postgres-XC.

In 2012, members of the core development team founded StormDB and added enhancements such as MPP parallel processing. After TransLattice acquired StormDB in 2014, the code was open-sourced as Postgres-XL, which has become the mainstream version of this lineage. Both systems put global transactions first: their core goal is to solve distributed problems in high-concurrency OLTP scenarios and fill the gap the early native PostgreSQL architecture left in strongly consistent distributed transactions.

The architecture consists of a Global Transaction Manager (GTM), coordinators, and data nodes: the GTM allocates global transaction IDs and manages sequences and lock coordination to guarantee ACID properties for distributed transactions; coordinators receive requests and dispatch tasks; data nodes store statically sharded data (hash/range distribution). The design goal is to replace traditional single-primary architectures and serve core businesses with extremely high requirements for distributed transaction consistency (e.g., financial transactions).
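In Postgres-XL the sharding strategy is declared per table in the DDL. A sketch (table and column names hypothetical):

```sql
-- Hash-distribute rows across data nodes by account_id (Postgres-XL syntax).
CREATE TABLE accounts (
    account_id bigint PRIMARY KEY,
    balance    numeric NOT NULL
) DISTRIBUTE BY HASH (account_id);

-- Small lookup tables can instead be replicated to every data node,
-- which avoids cross-node joins against them.
CREATE TABLE currencies (
    code   text PRIMARY KEY,
    symbol text
) DISTRIBUTE BY REPLICATION;
```

Choosing a distribution key that most transactions filter on lets the coordinator push work down to a single data node, which is what keeps OLTP latency low in this architecture.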

3. Greenplum: "MPP Data Warehouse" Specializing in Extreme OLAP Performance

As a benchmark MPP data warehouse, Greenplum traces back to 2003. It was built by Greenplum Inc. on the PostgreSQL kernel with a Massively Parallel Processing (MPP) architecture and initially served enterprise big data analytics. After EMC acquired the company in 2010, the product moved into Pivotal as part of EMC's asset reorganization, and its core engine was open-sourced in 2015, making it one of the first open-source MPP data warehouses.

After VMware acquired Pivotal in 2019, the commercial edition was rebranded as VMware Tanzu Greenplum, and ownership changed again when Broadcom acquired VMware in 2023. Its core positioning is an enterprise-grade data warehouse purpose-built for petabyte-scale analytics.

The architecture adopts a Shared-Nothing "master node + segment node" layout: the master node handles query planning and result aggregation; segment nodes store statically sharded data, and all segments execute query tasks in parallel (the heart of MPP computing). Greenplum deliberately weakens PostgreSQL's OLTP features while strengthening batch processing and parallel query optimization, supporting data compression, partitioned tables, and efficient ETL tools (e.g., gpfdist). The design goal is extreme OLAP throughput.
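A minimal sketch of a Greenplum fact table plus a gpfdist-backed external table for parallel loading. Table names, the host, and the file pattern are hypothetical, and storage options vary by Greenplum version:

```sql
-- Append-optimized, compressed fact table, hash-distributed by sale_id.
CREATE TABLE fact_sales (
    sale_id   bigint,
    store_id  int,
    amount    numeric,
    sale_date date
)
WITH (appendoptimized = true, compresstype = zstd)
DISTRIBUTED BY (sale_id);

-- External table served by a gpfdist process on an ETL host.
CREATE EXTERNAL TABLE ext_sales (LIKE fact_sales)
LOCATION ('gpfdist://etl-host:8081/sales*.csv')
FORMAT 'CSV';

-- Parallel load: every segment pulls from gpfdist directly,
-- bypassing the master node as a bottleneck.
INSERT INTO fact_sales SELECT * FROM ext_sales;
```

The `DISTRIBUTED BY` clause fixes how rows map to segments, which is why later expansion requires redistribution, as discussed below.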

Differences in Core Capabilities: Key Features Comparison

1. Data Distribution and Scalability

  • Citus: Dynamic sharding supports online expansion. When worker nodes are added, shards can be rebalanced automatically without downtime; compute and storage scale linearly, and read/write performance grows steadily as nodes are added, though shard rebalancing incurs short-term overhead.
  • PGXC/PGXL: Static shards are bound to data nodes. Expansion requires manual shard migration, with downtime risk; scalability depends on the number of data nodes, but the GTM tends to become a bottleneck (mitigated by GTM standby and GTM Proxy deployments, at the cost of added complexity).
  • Greenplum: Static shards are fixed to segment nodes. Expansion requires re-planning shards and migrating data, so flexibility is poor; however, MPP parallel computing power rises significantly as segments are added, and storage capacity and query throughput scale well (suited to PB-level data).
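For example, Citus's online expansion reduces to two calls on the coordinator. The function names below are from Citus 10+ (older releases used `master_add_node`, and shard rebalancing was an enterprise feature); the worker host name is hypothetical:

```sql
-- Register a new worker node with the coordinator.
SELECT citus_add_node('worker-3', 5432);

-- Move shards onto the new worker online;
-- reads and writes continue while shards are relocated.
SELECT rebalance_table_shards();
```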

2. Consistency and Transaction Support

  • Citus: Single-shard transactions are fully compatible with PostgreSQL's native consistency, and cross-shard transactions are kept strongly consistent via 2PC; however, cross-shard operations (e.g., JOINs spanning distribution-key values) pay a performance cost for that guarantee, so distribution columns must be designed to minimize cross-shard work.
  • PGXC/PGXL: Achieves global transaction consistency through the Global Transaction Manager (GTM) and supports complete distributed ACID transactions, including core OLTP features such as global sequences and global locks; it offers the strongest consistency guarantees, but the GTM's centralized transaction management reduces throughput under high concurrency.
  • Greenplum: Batch transactions are kept strongly consistent via 2PC, suiting bulk writes in OLAP scenarios (e.g., ETL); however, row-level transaction support is weak, high-concurrency single-row writes perform very poorly, and it is unsuitable for high-frequency OLTP transactions.
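The 2PC mechanism all three systems build on is exposed in stock PostgreSQL as explicit two-phase commands (the participating nodes must have `max_prepared_transactions > 0`). A coordinator drives each participant roughly like this; the table and transaction identifier are hypothetical:

```sql
-- Phase 1: on each participating node, do the work and prepare.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
PREPARE TRANSACTION 'txn_42';   -- the transaction survives crashes from here on

-- Phase 2: once every node has prepared successfully, commit everywhere.
COMMIT PREPARED 'txn_42';

-- If any node failed to prepare, abort everywhere instead:
-- ROLLBACK PREPARED 'txn_42';
```

The cost is the extra network round trip per participant plus durable state for prepared transactions, which is exactly the overhead the bullets above attribute to cross-shard consistency.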

3. SQL Compatibility and Workload Adaptation

  • Citus: Highly compatible with native PostgreSQL SQL, supporting core OLTP features such as indexes, stored procedures, and triggers, and able to handle simple OLAP analytics (e.g., report queries); only some advanced features (e.g., table inheritance) are partially supported, and cross-shard JOIN performance depends on distribution column design (poor planning can cause slowdowns).
  • PGXC/PGXL: Well compatible with PostgreSQL and strongly suited to high-frequency transactions, row-level locks, and real-time reads/writes in OLTP scenarios; however, it lacks parallel query optimization for OLAP, batch analytics performance falls far short of Greenplum, and cross-node join queries are only moderately efficient.
  • Greenplum: Compatible with a core subset of PostgreSQL SQL; it weakens OLTP-oriented features (e.g., high-concurrency transactions, row-level updates) and strengthens OLAP syntax (e.g., window functions, bulk import). It suits large-scale offline analytics and data mart construction, but migrating native PostgreSQL applications requires substantial code changes.
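As an example of the OLAP syntax Greenplum emphasizes, a window function computes a per-store running total in one declarative query (the table is hypothetical):

```sql
SELECT store_id,
       sale_date,
       SUM(amount) OVER (PARTITION BY store_id
                         ORDER BY sale_date) AS running_total
FROM fact_sales;
```

In an MPP plan, rows are redistributed by `store_id` so each segment evaluates its window partitions in parallel; the same query on a single PostgreSQL node runs serially.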

4. High Availability and Operational Complexity

  • Citus: Worker-node shard replicas support automatic failover, the coordinator can be deployed in a primary-standby pair, and RTO is measured in seconds; operational cost is moderate, documentation is complete, and the open-source community is active, so small and medium-sized teams can deploy and maintain it on their own, needing to watch only shard distribution and coordinator load.
  • PGXC/PGXL: Data nodes support automatic failover, the GTM can be clustered (with primary-standby switchover), and the high-availability design targets 7×24 OLTP services; however, operations are complex: GTM cluster maintenance, shard migration, and troubleshooting require specialist skills, and community resources are comparatively scarce.
  • Greenplum: Segment nodes support primary-mirror failover, the master node has active-standby redundancy, and data reliability is strong; however, operational cost is very high, batch import/backup/restore workflows are complex and depend on a professional operations team or vendor support, the community edition is limited in features, and enterprise features require payment.

Which One to Use: Pros, Cons, and Applicable Scenarios

| Architecture | Advantages | Disadvantages | Applicable Scenarios |
| --- | --- | --- | --- |
| Citus | 1. Supports mixed OLTP/OLAP workloads, adapting to a wide range of scenarios; 2. High native PostgreSQL compatibility with low application migration costs; 3. Automated, online scaling; 4. Open source and free, with complete documentation, easy for small and medium-sized teams to adopt | 1. Cross-shard performance depends on distribution column design; poor planning may cause slowdowns; 2. Weaker than Greenplum in PB-level pure OLAP scenarios; 3. The coordinator node can become a (light) bottleneck | 1. TB-level small and medium-sized distributed OLTP (e.g., e-commerce orders, user data); 2. Mixed workloads combining high-frequency transactions with simple analytics; 3. Native PG applications that need a quick upgrade to a distributed architecture |
| PGXC/PGXL | 1. Very strong distributed transaction consistency with strict ACID support; 2. Fully adapted to core OLTP businesses, supporting global sequences/locks; 3. Pure distributed architecture without a coordinator-node performance bottleneck; 4. High-availability design meets 7×24 OLTP service requirements | 1. Static sharding makes expansion complex, requiring manual migration with downtime risk; 2. GTM cluster operations are difficult and demand specialist skills; 3. Weak community ecosystem with few troubleshooting resources; 4. Insufficient parallel performance for OLAP | 1. High-concurrency distributed OLTP (e.g., core banking transactions, payment systems); 2. Core businesses requiring strict distributed ACID transactions; 3. Scenarios with strong requirements for global sequences and cross-node transaction consistency |
| Greenplum | 1. Extreme MPP parallel computing performance; fast responses for PB-level analysis; 2. Complete batch-processing tools, supporting enterprise data warehouse features; 3. Industry-leading OLAP throughput for large-scale analysis; 4. Strong data reliability with multi-replica redundancy | 1. Weaker SQL compatibility; high migration cost for native PG applications; 2. Very poor OLTP performance; no support for high-concurrency transactions; 3. High operational complexity, relying on professional teams or vendors; 4. Inflexible expansion due to static sharding | 1. Enterprise data warehouses (PB-level storage and analysis); 2. Offline ETL, log analysis, batch report statistics; 3. Large-scale data mart construction and offline decision support systems |

Selection Summary

  • If you need a "low-cost, high-compatibility" mixed workload solution, prioritize Citus;
  • If focusing on "high-concurrency, strong-consistency" core OLTP businesses, prioritize PGXC/PGXL;
  • If specializing in "large-scale, high-performance" OLAP analysis scenarios, prioritize Greenplum.

Core selection logic:

First, determine whether the business is OLTP-dominated, OLAP-dominated, or a mixed workload; then factor in each technology's lineage: Citus's cloud-native genes fit agile scenarios, PGXC/PGXL's financial-grade consistency heritage fits core transactions, and Greenplum's enterprise-grade evolution fits large-scale analytics.

Finally, select the suitable architecture based on data volume (TB/PB level), consistency requirements, and operation and maintenance capabilities.