A high-throughput event pipeline replacing a brittle cron-based system.



Stripe’s internal payment retry system had grown organically into a patchwork of cron jobs and queue workers with inconsistent failure handling. Each service had its own retry logic, leading to cascading failures during peak traffic and unreliable delivery guarantees.
The system processed roughly 800K events per day but regularly fell behind during traffic spikes, creating backlogs that took hours to clear.
The redesigned pipeline, codenamed Meridian, handled peak traffic without falling behind for the first time in three years. Key improvements included a unified retry strategy with exponential backoff, circuit breakers at service boundaries, and a dead-letter queue with automatic alerting.
The system now processes 2M+ events daily with 99.97% uptime and sub-200ms P99 latency, up from roughly 800K events per day under the previous architecture.
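
To make the unified retry strategy concrete, here is a minimal Python sketch of exponential backoff feeding a dead-letter queue. It is an illustration only, not Meridian's actual code: the names backoff_delay, deliver, and dead_letters, the 0.5 s base delay, the 30 s cap, the four-attempt limit, and the use of random jitter are all assumptions made for the example. Circuit breakers and alerting are left out to keep it short.

```python
import random
import time
from typing import Any, Callable, List


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Upper bound on the wait after a failed attempt (0-based): base * 2**attempt, capped."""
    return min(cap, base * (2 ** attempt))


def deliver(
    event: Any,
    handler: Callable[[Any], None],
    dead_letters: List[Any],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try handler(event) up to max_attempts times; dead-letter the event if all fail."""
    for attempt in range(max_attempts):
        try:
            handler(event)
            return True
        except Exception:
            if attempt == max_attempts - 1:
                break  # out of attempts, fall through to the dead-letter queue
            # Hypothetical choice: wait a random time up to the exponential bound
            # so that many failing workers do not retry in lockstep.
            sleep(random.uniform(0, backoff_delay(attempt)))
    dead_letters.append(event)  # a real system would also raise an alert here
    return False
```

Every service would call the same deliver function instead of carrying its own retry loop, which is the point of a unified strategy: failure handling is defined once, and events that exhaust their retries land in one place where they can be inspected and replayed.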