Surviving the Spike: Scaling to 10M Concurrent WebSockets
When a global streaming event caused traffic to surge by 40,000% in minutes, traditional auto-scaling couldn’t react fast enough. We designed a system built for spikes (pre-warmed infrastructure, optimized WebSocket handling, and a simplified routing layer) to support over 10 million concurrent connections without compromising latency or reliability.
Category: Distributed Systems
Read Time: 7 min
Published: Mar 2026
Stack: 6 technologies
Standard cloud auto-scaling has a dirty secret: it takes 3–5 minutes to provision and warm new instances. When a global sporting event drove a 40,000% traffic spike in under three minutes, that gap was lethal. We had 180 seconds to absorb 10 million concurrent WebSocket connections — or drop the stream entirely.
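The arithmetic behind that gap is worth spelling out. The following is a minimal sketch using only the figures quoted above (10 million connections, a 180-second window, 3 to 5 minutes to provision); the function names are illustrative and not taken from the original system.

```python
# Figures quoted in the article.
TARGET_CONNECTIONS = 10_000_000
SPIKE_WINDOW_S = 180
PROVISION_TIME_S = (3 * 60, 5 * 60)  # best case, worst case


def required_accept_rate(connections: int, window_s: int) -> float:
    """Average new connections per second needed to absorb the spike."""
    return connections / window_s


def lateness_s(provision_s: int, window_s: int) -> int:
    """Seconds after the window closes that reactive capacity becomes ready."""
    return max(0, provision_s - window_s)


rate = required_accept_rate(TARGET_CONNECTIONS, SPIKE_WINDOW_S)
best, worst = (lateness_s(p, SPIKE_WINDOW_S) for p in PROVISION_TIME_S)

print(f"required accept rate: {rate:,.0f} connections/s")
print(f"reactive capacity is ready {best} to {worst} s after the window closes")
```

Even in the best case, instances requested at the very start of the spike become ready only as the window closes, which leaves no time for them to absorb any of it.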
- 01 Auto-scaling groups couldn't provision fast enough: new EC2 instances took 4+ minutes from trigger to ready state
- 02 Standard Application Load Balancers impose a 3,000 concurrent connection limit per target by default
- 03 WebSocket state was partially sticky to individual nodes, making seamless failover impossible
- 04 Backpressure from 10M simultaneous connection attempts overwhelmed the TLS handshake queue
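Two of these constraints reduce to simple arithmetic. The sketch below takes the total connection count, the per-target limit, and the spike window from the article; the node count and per-node TLS handshake throughput are purely hypothetical values chosen for illustration, and arrivals are assumed to be spread evenly over the window.

```python
# Figures quoted in the article.
TOTAL_CONNECTIONS = 10_000_000
PER_TARGET_LIMIT = 3_000
SPIKE_WINDOW_S = 180

# Hypothetical values, NOT from the article.
ASSUMED_NODES = 200
ASSUMED_HANDSHAKES_PER_NODE_S = 200


def targets_needed(connections: int, per_target: int) -> int:
    """Targets required if each one is capped at `per_target` connections."""
    return -(-connections // per_target)  # integer ceiling division


def handshake_backlog(connections: int, window_s: int,
                      nodes: int, per_node_rate: int) -> int:
    """Connections still waiting on a TLS handshake when the window closes."""
    completed = nodes * per_node_rate * window_s
    return max(0, connections - completed)


targets = targets_needed(TOTAL_CONNECTIONS, PER_TARGET_LIMIT)
backlog = handshake_backlog(TOTAL_CONNECTIONS, SPIKE_WINDOW_S,
                            ASSUMED_NODES, ASSUMED_HANDSHAKES_PER_NODE_S)

print(f"targets needed at the per-target cap: {targets:,}")
print(f"handshakes still queued after the window: {backlog:,}")
```

Under these assumed numbers the fleet completes 7.2 million handshakes in the window and leaves 2.8 million clients queued, which is the kind of backlog item 04 describes.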
We replaced reactive auto-scaling with predictive pre-warming triggered by upstream ticketing data. Seventy-two hours before the event, we provisioned and fully warmed a dedicated WebSocket tier, bypassing ALBs entirely in favour of an NLB paired with a custom eBPF-based connection router that distributed load at the kernel level, before the TCP handshake. State was externalised to Redis Cluster with sub-millisecond replication lag.
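The pre-warming step can be sketched as a small planning function: take a connection forecast derived from the upstream signal, pad it, and schedule provisioning 72 hours ahead of the event. Only the 72-hour lead time and the 10 million figure come from the article; `plan_prewarm`, the per-node capacity, the headroom percentage, and the event date are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class PrewarmPlan:
    nodes: int
    start_provisioning_at: datetime


def plan_prewarm(
    expected_connections: int,
    event_start: datetime,
    *,
    conns_per_node: int = 100_000,               # hypothetical capacity
    headroom_pct: int = 30,                      # hypothetical safety margin
    lead_time: timedelta = timedelta(hours=72),  # lead time from the article
) -> PrewarmPlan:
    """Size the pre-warmed tier and decide when provisioning must begin."""
    padded = expected_connections * (100 + headroom_pct)
    nodes = -(-padded // (100 * conns_per_node))  # integer ceiling division
    return PrewarmPlan(nodes, event_start - lead_time)


# Made-up event time; the forecast would come from ticketing data.
event = datetime(2026, 1, 10, 20, 0, tzinfo=timezone.utc)
plan = plan_prewarm(10_000_000, event)
print(plan.nodes, plan.start_provisioning_at.isoformat())
```

Integer arithmetic is used for the headroom so that the node count does not depend on floating-point rounding.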
- 01 Reactive auto-scaling is fundamentally incompatible with sudden, predictable spikes: pre-warm based on upstream signals
- 02 ALBs are the wrong tool for massive WebSocket workloads; NLBs with custom routing give you the control you need
- 03 eBPF-based load distribution at the kernel level eliminates userspace overhead at scale
- 04 Externalise all connection state before you think you need to, because retrofitting it under load is impossible
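The last point, externalised connection state, can be illustrated with a short sketch. The article's store was Redis Cluster; here an in-memory class stands in for it so the example runs without a server, and `SessionStore`, `InMemoryStore`, `Node`, and the `last_seq` field are hypothetical names, not the original design.

```python
from typing import Optional, Protocol


class SessionStore(Protocol):
    def save(self, conn_id: str, state: dict) -> None: ...
    def load(self, conn_id: str) -> Optional[dict]: ...


class InMemoryStore:
    """Stand-in for an external store such as Redis Cluster."""

    def __init__(self) -> None:
        self._data: dict = {}

    def save(self, conn_id: str, state: dict) -> None:
        self._data[conn_id] = dict(state)  # copy so nodes share no references

    def load(self, conn_id: str) -> Optional[dict]:
        state = self._data.get(conn_id)
        return dict(state) if state is not None else None


class Node:
    """A WebSocket node that keeps no per-connection state of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle_message(self, conn_id: str, seq: int) -> dict:
        # Always read state from the shared store, never from local memory.
        state = self.store.load(conn_id) or {"last_seq": 0}
        state["last_seq"] = max(state["last_seq"], seq)
        state["served_by"] = self.name
        self.store.save(conn_id, state)
        return state


store = InMemoryStore()
node_a, node_b = Node("a", store), Node("b", store)
node_a.handle_message("conn-1", seq=5)
# Node "a" disappears; node "b" picks up the same connection id.
resumed = node_b.handle_message("conn-1", seq=3)
print(resumed)
```

Because neither node holds the session locally, the second node resumes from the last recorded sequence number, which is the property that makes failover between nodes possible.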