A global financial services company was facing a common but costly problem: their mission-critical data processing workloads were consuming an increasingly large share of their cloud infrastructure budget. With strict SLA requirements and growing data volumes, costs were becoming difficult to justify.
Two workloads in particular were driving the majority of their data infrastructure spend:
A complex streaming job that processes incoming JSON files under a strict SLA of less than 5 minutes per file. The pipeline handles approximately 1,500 files daily, each undergoing 32 transformations before records are written to Delta tables (a simplified sketch of this pipeline's shape appears after this list).
A data replication job responsible for synchronizing Delta-format data lakes between AWS regions (us-east-1 to eu-east-1) every 2 hours, ensuring business continuity and regional compliance (a simplified replication sketch also follows this list).
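To make the first workload concrete, here is a minimal sketch of what a Structured Streaming job of this shape typically looks like. The schema, S3 paths, trigger interval, and transformation bodies below are illustrative assumptions, not the company's actual pipeline code.

```python
# Minimal sketch of the streaming workload's shape (illustrative only):
# paths, schema, and the transformation list are hypothetical stand-ins.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-ingest").getOrCreate()

# Incoming JSON files land in a staging prefix; each file must be
# processed end-to-end in under 5 minutes per the SLA.
raw = (
    spark.readStream
    .format("json")
    .schema("record_id STRING, payload STRING, event_ts TIMESTAMP")  # assumed schema
    .load("s3://example-bucket/incoming/")                           # hypothetical path
)

# Stand-in for the ~32 transformations applied before the Delta write.
def apply_transformations(df: DataFrame) -> DataFrame:
    df = df.withColumn("event_date", F.to_date("event_ts"))
    df = df.filter(F.col("record_id").isNotNull())
    # ... remaining transformations elided ...
    return df

query = (
    apply_transformations(raw)
    .writeStream
    .format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/json-ingest/")
    .trigger(processingTime="1 minute")
    .start("s3://example-bucket/delta/records/")                     # hypothetical target
)
query.awaitTermination()
```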
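The second workload follows a familiar cross-region copy pattern. The sketch below assumes hypothetical bucket names and a short table list, copies full snapshots for simplicity, and omits the incremental logic a production replication job would carry.

```python
# Minimal sketch of the cross-region replication pattern (illustrative only):
# bucket names and table list are hypothetical placeholders.
from concurrent.futures import ThreadPoolExecutor
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("delta-replication").getOrCreate()

SOURCE_ROOT = "s3://example-primary-us-east-1/delta"   # hypothetical source bucket
TARGET_ROOT = "s3://example-replica-eu/delta"          # hypothetical target bucket
TABLES = ["customers", "transactions", "positions"]    # hypothetical table list

def replicate(table: str) -> None:
    # Read the latest snapshot from the source region and overwrite the replica.
    (
        spark.read.format("delta").load(f"{SOURCE_ROOT}/{table}")
        .write.format("delta").mode("overwrite")
        .save(f"{TARGET_ROOT}/{table}")
    )

# The job runs on a 2-hour schedule; tables are copied in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(replicate, TABLES))
```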
Combined, these two workloads were costing the organization over $180,000 annually in compute and licensing fees, and costs were projected to increase as data volumes continued to grow.
The company integrated Yeedu alongside their existing data infrastructure. Rather than a disruptive migration, they strategically redirected their most expensive workloads to Yeedu while maintaining their current governance and data architecture.
One of the key advantages was how little modification was needed to run existing Spark workloads on Yeedu:
# Streaming Pipeline Migration: 6 lines changed
+ Updated environment configs (bucket name, catalog name)
+ Added 4 lines for Yeedu CloudFiles streaming (parallel processing)
+ Added 2 lines for Python path imports
# Data Replication Migration: 10 lines changed
+ Minor adjustments for compatibility with open-source Spark
+ Thread pool performance optimizations
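For context, the environment-level changes amounted to swapping a handful of values and import paths. The snippet below is a hedged illustration of that kind of change; the paths, bucket, and catalog names are placeholders, and the Yeedu CloudFiles lines are not reproduced here.

```python
# Illustrative sketch of the environment-level edits; every name below is a
# placeholder, not an actual Yeedu configuration key or path.
import sys

# Point shared helper modules at the new runtime's filesystem layout.
sys.path.append("/opt/pipelines/lib")       # hypothetical path
sys.path.append("/opt/pipelines/common")    # hypothetical path

# Environment configs that changed: target bucket and catalog name only.
CONFIG = {
    "landing_bucket": "s3://example-new-landing",  # previously the old platform's bucket
    "catalog_name": "analytics_catalog",           # previously the old catalog name
}
```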
Yeedu's Turbo Engine delivered performance improvements significant enough that the company could dramatically simplify its infrastructure, replacing multi-node clusters with single-instance configurations and eliminating the need for auto-scaling.
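As a rough illustration of the simplified setup, the following sketch shows a single-node, fixed-size Spark session with dynamic allocation disabled. The instance sizing and tuning values are assumptions, not Yeedu's actual engine configuration.

```python
# Minimal sketch of a single-instance Spark setup, using generic open-source
# Spark settings; memory and partition values are assumed for illustration.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("single-node-pipeline")
    .master("local[*]")                                  # one large instance, no cluster manager
    .config("spark.dynamicAllocation.enabled", "false")  # auto-scaling no longer needed
    .config("spark.driver.memory", "48g")                # sized for the whole workload (assumed)
    .config("spark.sql.shuffle.partitions", "64")        # tuned for a single machine (assumed)
    .getOrCreate()
)
```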
"We expected cost savings, but we didn't expect to also see performance improvements. Getting both simultaneously changed how we think about our entire data infrastructure strategy."
The results were immediate: substantial improvements across both performance and cost metrics.