Why a Mexican Mining Giant Migrated from Azure Synapse to Apache Doris

My team helped one of Mexico's top mining companies switch from Azure Synapse to Apache Doris for analytics workloads. Beyond mining, the company produces cement and concrete, operates more than five mines across Mexico, employs over 20,000 people, and runs around the clock.

The company ingests terabytes of data per day from ERP systems, equipment sensors, production logs, maintenance records, and logistics feeds across mines, distribution centers, and transportation fleets. The analytics system produces over 200 reports and dashboards serving 62+ divisions daily.

Azure Synapse couldn't keep up. Cloud bills were unpredictable, ad hoc queries took up to two minutes, and the platform required dedicated cloud DBAs and DevOps just to stay running. In a mining operation where real-time data drives safety and maintenance decisions, two-minute query latency is a real risk.

The team evaluated alternatives with clear criteria: sub-second query performance, lower total cost, MySQL compatibility, and a unified OLAP engine that didn't require custom ETL layers. Apache Doris was selected because it satisfied all the criteria.

The Challenges with Azure Synapse

These challenges will be familiar to anyone who has managed a cloud analytics stack at scale:

Unpredictable costs. The compute and storage pricing model led to budget overruns. With 24/7 operations generating terabytes of data, the cloud bill became a moving target that was hard to plan around.

Query latency. Ad hoc queries on equipment telemetry and operational data took 30 seconds to two minutes. For a mining operation where real-time information drives safety alerts and maintenance decisions, that latency was a problem.

Operational complexity. Azure Synapse offered many capabilities, but the team used only a fraction of them. The unused components added failure modes that needed specialized support. When issues arose, resolution could take more than a day, sometimes requiring escalation to the vendor's own support team.

Limited real-time capabilities. Most data processing ran on batch pipelines, which introduced delays in decision-making. In mining, equipment downtime and logistics decisions happen in real time, and batch analytics created a constant friction point.

Why Did We Migrate to Apache Doris

After evaluating several alternatives, the team chose Apache Doris. Here is why it checked every box:

Real-time analytics: Doris delivers sub-second query performance on standard hardware, with no specialized GPUs or processors required. The team deployed Doris on VMware VMs in the company's existing data center near the mining sites and central offices, cutting network latency to near zero.

High concurrency: Doris's MPP architecture and vectorized execution engine handled the high-concurrency ingestion and query loads the team needed.

MySQL compatibility: Most of the engineering team already knew MySQL, so analysts could write queries immediately without retraining.

Stream and batch ingestion in one platform: Moving away from batch-only processing was a priority. Apache Doris supports both stream ingestion (via Kafka and Flink) and batch, so the team didn't need to stitch together separate tools.

No custom ETL layers. The team didn't want to build ETL pipelines from scratch. Doris's native Stream Load and Broker Load significantly reduced engineering effort.

The Architecture

Before: Azure Synapse

The previous architecture on Azure Synapse was feature-rich but operationally heavy. Between unpredictable cloud costs, slow ad hoc queries, and batch-only pipelines, the system created more friction than it solved.

After: Apache Doris

The new architecture is significantly simpler.

Infrastructure: An 8-node Doris cluster (64 vCPUs, 512 GB RAM total), deployed in a secure on-premises data center near the mining sites.

Data sources: ERP, social networks, data feeds, equipment sensors, and telemetry logs from mining machines and transportation trucks.

Ingestion: Custom Python scripts handled schema conversion during migration. Apache Kafka plus Flink now handles real-time streaming. Doris Stream Load and Broker Load covered historical data.

Analytics: Near-real-time analytics from raw data to Power BI dashboards, with ad hoc query access for operational teams across all 62+ divisions.

Integration: Connected to the existing LDAP for authentication, plus monitoring tools for 24/7 cluster health visibility.

The architecture followed a blueprint from the Apache Doris documentation, which made it easy to implement within the existing infrastructure.

Migration Strategy

The team followed a four-phase approach designed to minimize risk:

Phase 1: Assessment (2 weeks)

The team cataloged all 120+ tables and 50 key reports that needed to migrate, identified dependencies, and prioritized the migration order.

Phase 2: Pilot (4 weeks). The team migrated one high-value dashboard (equipment downtime tracking) to Apache Doris, confirming the technical selection before committing to a full migration.

Phase 3: Parallel run (6 weeks). The team ran Azure Synapse and Apache Doris side by side, comparing data validation results, ingestion behavior, and end-user feedback. This phase built confidence that the new system matched or exceeded the old one.

Phase 4: Cutover and decommission (2 weeks). The team switched all production traffic to Apache Doris, kept cloud services in standby until the full cutover was validated, and trained data analysts across divisions on the new system.

Additional Tools Used:

Custom Python scripts for schema conversion from Azure Synapse to Apache Doris
Apache Kafka + Flink for real-time data ingestion (replacing batch pipelines)
Apache Doris Stream Load and Broker Load for migrating historical data

Results

~18% reduction in equipment downtime: Real-time data availability enabled faster responses to safety alerts and maintenance issues.
Predictive maintenance: Mining equipment runs continuously. When you can query maintenance logs and sensor telemetry in real time, you catch problems before they cause unplanned downtime. That is where the 18% reduction comes from.
Simplified operations: A simpler architecture means fewer specialized personnel. When issues come up, the team resolves them directly without waiting on vendor support.
Predictable costs. Open-source software on commodity hardware means no more surprise cloud bills.
Near-zero query latency. With the cluster co-located in the same data center as the offices and mining sites, network overhead is essentially gone.

Lessons Learned

A few takeaways for any team considering a similar migration:

Start small, prove value fast. The pilot dashboard won executive buy-in. Don't migrate everything at once. Pick a high-visibility use case, show results, and expand from there.
Leverage MySQL compatibility. Analysts started writing queries on day one. Doris's familiar SQL dialect eliminated the retraining bottleneck that usually slows migrations.
Plan for data validation: Run checksums during cutover to confirm the data migrated correctly. Don't assume the data is right. Verify it.
On-premises doesn't mean outdated. The team chose on-premises deployment because of data residency restrictions, cross-border policies, and privacy requirements. They are already planning AI and LLM projects on the same local infrastructure.
Community matters: The Apache Doris Slack and GitHub communities were valuable for troubleshooting and gave the team access to solutions that other users had already worked through.

What's Next

The company plans to expand Apache Doris to additional business units in 2026, integrate AI and machine learning models for predictive analytics, and potentially open-source the ingestion framework the team built.

Interested in running Apache Doris for your analytics workload? Join the community on Slack or contact the VeloDB team to talk through your use case.