30 Billion Ad Transactions with 50% Infrastructure Cost Reduction: AI Platform Modernization for a Global AdTech Leader

About the Client

The client is a global AdTech leader specializing in contextual advertising, brand safety, and AI-driven ad intelligence. Its platforms analyze millions of web pages daily to extract advertising signals across sentiment, brand safety, threat detection, keyword extraction, and IAB category classification, supporting demand-side and supply-side partners across the open web.

The organization operates at internet scale. Its production environment processes approximately 30 billion advertising transactions and 100 terabytes of data each day, with content classification pipelines analyzing more than 20 million webpages daily and growing. The technology footprint runs on AWS and combines Apache Spark, Kafka, ECS-based microservices, and a portfolio of analytical systems supporting real-time bidding, contextual intelligence, and reporting workloads.

Impact Delivered

5x
faster reporting and ETL
50%
reduction in reporting infrastructure cost with no loss of throughput or coverage
4x
faster ML inference on GPU-enabled Databricks clusters, unlocking richer contextual signals at the same SLA
$52K/month
in recurring AWS savings from combined storage and Savings Plans optimization
30 billion
advertising transactions processed daily without manual scaling intervention
100 TB
of data processed every day on a single unified lakehouse
200 million
webpages of analysis capacity per day, unlocking 10x headroom for contextual intelligence

Standing at a Turning Point

Scale Outpacing Architecture

The classification platform that powers the client’s contextual advertising business had been built on a fleet of ECS-based distributed services running custom Spark and ETL frameworks. As adoption grew, the architecture hit hard structural limits. Scaling for peak ad traffic required manual intervention, ML inference response times slowed under load, and reporting pipelines for ad exchange data became increasingly difficult to scale and maintain as volumes climbed.

Cost and Latency Pressure

These constraints surfaced at exactly the moment the business was running a company-wide infrastructure optimization effort, with compute-intensive workloads growing faster than infrastructure efficiency. Fragmented batch and streaming environments multiplied the operational complexity, AI experimentation was throttled by the lack of a unified platform, and every new model or use case added more bespoke services to maintain. Without a structural reset, the platform would have continued to incur cost faster than it absorbed traffic, and contextual signal latency, the core product, would have degraded as scale grew.

Solutioning

A Unified Lakehouse Reset

Zimetrics framed the engagement as a unified AI platform reset. The architectural principle was to consolidate streaming ETL, batch processing, ML inference, and ad-hoc data science onto a single Spark-native lakehouse, while leaving genuinely latency-sensitive serving paths (real-time bidding lookups, ad decisioning) on the purpose-built systems already engineered for millisecond response.

Why Databricks, and Why a Specialist Approach?

Databricks was selected as the central data processing and lakehouse layer because it offered four capabilities the legacy stack could not deliver together:

• Managed Spark compute that removed the burden of self-managed clusters
• Unified batch and Structured Streaming on a common runtime
• Integrated MLflow for model tracking and registry
• Unity Catalog for governance across the data estate

Zimetrics approached the engagement as an AI platform engineering problem, deliberately leaving DynamoDB, DAX, Redis, Kafka, and GPU inference clusters in place for the workloads where they were already the right choice, and using Databricks for the data and ML workloads where it was the better fit.

Engineering the Transformation

The new end-to-end flow takes the shape: web pages → Kafka → Databricks Structured Streaming → ML classification → signal API and S3. Spark-based processing on Databricks replaces the fragmented ECS service mesh that previously orchestrated content extraction and classification, while Kafka and S3 integration preserves the existing event backbone the rest of the platform depends on.

• GPU-Enabled Inference: Model inference for NLP and computer vision workloads (including IAB classification, threat detection, sentiment, keyword extraction, and image threat classification) runs on GPU-enabled Databricks clusters, replacing CPU-bound ECS services for the workloads where GPU economics win.
• Structured Streaming for Real-Time Signals: Contextual signal generation moved to Databricks Structured Streaming, allowing signals to be produced continuously rather than through bespoke micro-batch services.
• Operational Streaming Coverage: Streaming coverage now extends across four operational modules running in production on Databricks: Ad Events, Inventory, Real-Time Bidding (RTB), and RTB User Sync Statistics. Together these handle the high-volume ad serving, inventory, real-time bidding, and user sync workloads that underpin the contextual advertising business.
• Databricks Workflows for Orchestration: Workflow orchestration was centralized on Databricks Workflows, replacing scattered scheduling and dependency logic spread across the ECS environment.
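
The flow described above (Kafka → Databricks Structured Streaming → ML classification → S3) can be sketched roughly as follows. This is a minimal illustration rather than the client's actual pipeline: the topic name, S3 paths, and the keyword rule standing in for the classification models are all hypothetical, and Parquet is assumed as the output format only because the source does not name one.

```python
from typing import Dict

# Hypothetical names throughout: the topic, bucket paths, and the keyword rule
# below are placeholders, not the client's actual configuration or models.
KAFKA_TOPIC = "crawled-pages"
OUTPUT_PATH = "s3://example-bucket/contextual-signals/"
CHECKPOINT_PATH = "s3://example-bucket/checkpoints/contextual-signals/"


def classify_page(text: str) -> Dict[str, str]:
    """Stand-in for ML classification: a trivial keyword rule, not a real model."""
    lowered = (text or "").lower()
    category = "sports" if "football" in lowered else "uncategorized"
    return {"category": category}


def start_signal_stream(spark, bootstrap_servers: str):
    """Wire Kafka -> Structured Streaming -> classification -> S3.

    PySpark is imported lazily so the pure-Python classifier above can be
    exercised without a Spark installation.
    """
    from pyspark.sql import functions as F
    from pyspark.sql.types import MapType, StringType

    # Wrap the placeholder classifier as a UDF returning a string-to-string map.
    classify_udf = F.udf(classify_page, MapType(StringType(), StringType()))

    # Read page payloads from Kafka; the value column arrives as binary.
    pages = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", KAFKA_TOPIC)
        .load()
        .select(F.col("value").cast("string").alias("page_text"))
    )

    signals = pages.withColumn("signals", classify_udf(F.col("page_text")))

    # Continuously append classified signals to S3, with a checkpoint
    # location so the stream can recover after a restart.
    return (
        signals.writeStream.format("parquet")
        .option("path", OUTPUT_PATH)
        .option("checkpointLocation", CHECKPOINT_PATH)
        .outputMode("append")
        .start()
    )
```

On a cluster, `start_signal_stream(spark, "broker:9092")` would return a running streaming query. In a production setting the placeholder rule would be replaced by the registered models, and GPU-backed inference would typically be batched rather than applied row by row.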

The platform layer was extended for the data science organization: MLflow handles model tracking, experiment management, and the model registry; notebooks support multi-language exploration in Python, Scala, and SQL on the same governed data; and the lakehouse pattern unifies the data substrate that contextual, NLP, computer vision, brand safety, and page classification models all draw from. The net result is that new models can be built, evaluated, and promoted on shared infrastructure instead of bespoke per-team stacks.

Unity Catalog was introduced for data governance and metastore management across the organization, replacing fragmented metadata practices. Alongside the platform build, Zimetrics ran an explicit cost engineering workstream covering AWS Spot-based workloads for elastic compute, storage tiering, Savings Plans optimization, and GPU migration for inference. The hybrid posture was deliberate: DynamoDB with DAX continues to serve ultra-low-latency lookups in the 2-3 millisecond range, Redis and Kafka remain the operational substrate for serving and event handling, and Databricks is positioned for the processing, analytics, and ML workloads where Spark-native compute is the right tool.
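
The hybrid posture amounts to a placement rule driven by latency budget. A minimal sketch of that rule follows, assuming a hypothetical 10 ms cut-off and made-up workload names and budgets; the source states only that DynamoDB with DAX serves lookups in the 2-3 millisecond range.

```python
from dataclasses import dataclass

# Hypothetical threshold: the 10 ms cut-off is an illustrative assumption,
# not a figure from the engagement.
SERVING_LATENCY_BUDGET_MS = 10.0


@dataclass(frozen=True)
class Workload:
    name: str
    latency_budget_ms: float  # how quickly a result must be returned


def choose_platform(workload: Workload) -> str:
    """Return which side of the hybrid architecture a workload belongs on."""
    if workload.latency_budget_ms <= SERVING_LATENCY_BUDGET_MS:
        return "purpose-built serving (DynamoDB/DAX, Redis)"
    return "Databricks lakehouse"


# Illustrative workloads; names and budgets are assumptions.
workloads = [
    Workload("rtb_lookup", 3.0),
    Workload("page_classification_stream", 60_000.0),
    Workload("daily_reporting_etl", 3_600_000.0),
]
placements = {w.name: choose_platform(w) for w in workloads}

for name, platform in placements.items():
    print(f"{name}: {platform}")
```

Only the millisecond-budget lookup stays on the serving tier; the streaming and batch workloads land on the lakehouse, which mirrors the split described above.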

Future Outlook

With the unified Databricks AI platform in production, the client now has the headroom to scale contextual intelligence workloads from approximately 20 million webpages a day toward a target of 200 million webpages a day, without rebuilding the underlying platform. Continued cost optimization across Spot, GPU, and storage tiering remains an active workstream.
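
To put the headroom in per-second terms, the daily figures quoted in this case study can be converted to average rates. The arithmetic below assumes traffic spread evenly across the day and decimal terabytes; real ad traffic peaks well above its average, so these are floor estimates rather than capacity targets.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Daily figures as quoted in the case study.
current_pages_per_day = 20_000_000
target_pages_per_day = 200_000_000
transactions_per_day = 30_000_000_000
bytes_per_day = 100 * 10**12  # 100 TB, assuming decimal terabytes

headroom_factor = target_pages_per_day / current_pages_per_day
current_pages_per_sec = current_pages_per_day / SECONDS_PER_DAY
target_pages_per_sec = target_pages_per_day / SECONDS_PER_DAY
transactions_per_sec = transactions_per_day / SECONDS_PER_DAY
gigabytes_per_sec = bytes_per_day / SECONDS_PER_DAY / 10**9

print(f"headroom: {headroom_factor:.0f}x")                     # headroom: 10x
print(f"pages/s today: {current_pages_per_sec:,.0f}")          # pages/s today: 231
print(f"pages/s at target: {target_pages_per_sec:,.0f}")       # pages/s at target: 2,315
print(f"ad transactions/s: {transactions_per_sec:,.0f}")       # ad transactions/s: 347,222
print(f"data rate: {gigabytes_per_sec:.2f} GB/s")              # data rate: 1.16 GB/s
```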

Beyond scale, the lakehouse architecture is the substrate for the next wave of AI workloads. The same governed data, MLflow lifecycle, and Spark compute that power today’s contextual classification become the launchpad for GenAI-ready experimentation, richer brand safety models, and faster AI model iteration cycles for the data science organization.

Zimetrics Team Perspective

“When people talk about AI platforms, the conversation jumps straight to GPUs and GenAI, but the reality of getting there is very different. Most of the work goes into versioned data, lineage, model registry, governed access, repeatable environments, and getting it right decides whether a program like this compounds or stalls.”
