Distributed Systems · 7 min read · Mar 2026

Surviving the Spike: Scaling to 10M Concurrent WebSockets

When a global streaming event caused traffic to surge by 40,000% in minutes, traditional auto-scaling couldn’t react fast enough. We designed a system built for spikes (pre-warmed infrastructure, optimized WebSocket handling, and a simplified routing layer) to support over 10 million concurrent connections without compromising latency or reliability.

Overview

Standard cloud auto-scaling has a dirty secret: it takes 3–5 minutes to provision and warm new instances. When a global sporting event drove a 40,000% traffic spike in under three minutes, that gap was lethal. We had 180 seconds to absorb 10 million concurrent WebSocket connections — or drop the stream entirely.

The Problem

01. Auto-scaling groups couldn't provision fast enough: new EC2 instances took 4+ minutes from trigger to ready state
02. Standard Application Load Balancers impose a 3,000 concurrent connection limit per target by default
03. WebSocket state was partially sticky to individual nodes, making seamless failover impossible
04. Backpressure from 10M simultaneous connection attempts overwhelmed the TLS handshake queue
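
To make the scale of these problems concrete, here is a back-of-the-envelope sketch using only the figures quoted in this article (10 million connections, a 180-second window, a 3,000-connection per-target figure, and the 4-minute lower bound on provisioning). The function names are hypothetical, and the arithmetic assumes connections arrive evenly across the window, which a real spike would not do.

```python
import math

# Figures taken from the article; the even-arrival model is an assumption.
TARGET_CONNECTIONS = 10_000_000  # concurrent WebSocket connections to absorb
SPIKE_WINDOW_S = 180             # time available to absorb them
PER_TARGET_LIMIT = 3_000         # per-target connection figure quoted for ALBs
PROVISION_TIME_S = 4 * 60        # lower bound of "4+ minutes" to a ready instance


def required_handshake_rate(connections: int, window_s: int) -> float:
    """Average new connections per second needed to admit everyone in the window."""
    return connections / window_s


def targets_needed(connections: int, per_target_limit: int) -> int:
    """Minimum number of targets if each one can hold per_target_limit connections."""
    return math.ceil(connections / per_target_limit)


def scaling_shortfall_s(provision_time_s: int, window_s: int) -> int:
    """Seconds by which a freshly triggered instance misses the spike window."""
    return provision_time_s - window_s


if __name__ == "__main__":
    rate = required_handshake_rate(TARGET_CONNECTIONS, SPIKE_WINDOW_S)
    print(f"handshakes per second: {rate:,.0f}")
    print(f"targets needed: {targets_needed(TARGET_CONNECTIONS, PER_TARGET_LIMIT):,}")
    print(f"scaling shortfall: {scaling_shortfall_s(PROVISION_TIME_S, SPIKE_WINDOW_S)} s")
```

Under these assumptions the tier must complete roughly 55,556 new connections per second on average, would need 3,334 targets at 3,000 connections each, and any instance launched when the spike begins becomes ready at least 60 seconds after the window has closed.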

Our Approach

We replaced reactive auto-scaling with predictive pre-warming triggered by upstream ticketing data. Seventy-two hours before the event, we provisioned and fully warmed a dedicated WebSocket tier. That tier bypassed ALBs entirely in favour of an NLB paired with a custom eBPF-based connection router, which distributed load at the kernel level before the TCP handshake. State was externalised to Redis Cluster with sub-millisecond replication lag.
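
The article does not describe how the router chose a backend, so the following is only a userspace model of one stateless way to do it: hash the connection's 4-tuple together with each backend name and pick the highest score (rendezvous hashing). The technique, the `Flow` type, and the backend names are all assumptions made for illustration. A real eBPF program would likely consult a precomputed lookup table instead of looping over backends.

```python
import hashlib
import socket
import struct
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """The 4-tuple visible on the first SYN packet (IPv4 only in this sketch)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def _score(flow: Flow, backend: str) -> int:
    """Deterministic 64-bit score for a (flow, backend) pair."""
    key = (
        socket.inet_aton(flow.src_ip)
        + struct.pack("!H", flow.src_port)
        + socket.inet_aton(flow.dst_ip)
        + struct.pack("!H", flow.dst_port)
        + backend.encode()
    )
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


def pick_backend(flow: Flow, backends: Sequence[str]) -> str:
    """Rendezvous hashing: the backend with the highest score wins.

    No per-connection table is needed, so every packet of a flow maps to
    the same backend as long as that backend stays in the set.
    """
    if not backends:
        raise ValueError("no backends available")
    # The backend name breaks ties so the result never depends on list order.
    return max(backends, key=lambda b: (_score(flow, b), b))


if __name__ == "__main__":
    backends = [f"ws-{i}" for i in range(4)]  # hypothetical node names
    flow = Flow("203.0.113.7", 51514, "192.0.2.10", 443)
    print(pick_backend(flow, backends))
```

The property that matters for failover is that removing one backend only moves the flows that were assigned to it. Every other flow keeps its original backend, because its highest-scoring candidate is still present.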

Key Takeaways

01. Reactive auto-scaling is fundamentally incompatible with sudden, predictable spikes: pre-warm based on upstream signals
02. ALBs are the wrong tool for massive WebSocket workloads; NLBs with custom routing give you the control you need
03. eBPF-based load distribution at the kernel level eliminates userspace overhead at scale
04. Externalise all connection state before you think you need to: retrofitting it under load is impossible
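
As a minimal sketch of what externalised connection state buys you, the example below keeps per-connection state in a shared store so that any node can resume a client after a failover. Everything here is hypothetical: the `SessionStore` interface, the `last_offset` field, and the in-memory store, which merely stands in for Redis Cluster so the example runs without a server.

```python
from typing import Dict, Optional, Protocol


class SessionStore(Protocol):
    """Minimal interface a shared state store would need to offer."""

    def save(self, conn_id: str, state: Dict[str, str]) -> None: ...

    def load(self, conn_id: str) -> Optional[Dict[str, str]]: ...


class InMemoryStore:
    """Stand-in for Redis Cluster; a real deployment would use a networked store."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def save(self, conn_id: str, state: Dict[str, str]) -> None:
        self._data[conn_id] = dict(state)  # copy so callers cannot mutate stored state

    def load(self, conn_id: str) -> Optional[Dict[str, str]]:
        state = self._data.get(conn_id)
        return dict(state) if state is not None else None


class WebSocketNode:
    """A node that holds no connection state of its own."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle_message(self, conn_id: str, stream_offset: int) -> None:
        # Persist progress after every message so another node can take over.
        self.store.save(
            conn_id, {"last_offset": str(stream_offset), "served_by": self.name}
        )

    def resume(self, conn_id: str) -> int:
        """Return the offset to resume from, or 0 for an unknown connection."""
        state = self.store.load(conn_id)
        return int(state["last_offset"]) if state else 0


if __name__ == "__main__":
    store = InMemoryStore()
    node_a = WebSocketNode("node-a", store)
    node_b = WebSocketNode("node-b", store)

    node_a.handle_message("client-1", stream_offset=42)
    # node-a disappears; the client reconnects and lands on node-b.
    print(node_b.resume("client-1"))
```

Because neither node keeps anything locally, the reconnecting client resumes at offset 42 on `node-b`. With state that is sticky to `node-a`, that information would have been lost along with the node.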

Tech Stack

Backend

eBPF, Rust

Database

Redis Cluster

Infrastructure

AWS EC2, NLB, Kubernetes
