Surviving the Spike: Scaling to 10M Concurrent WebSockets

When a global streaming event caused traffic to surge by 40,000% in minutes, traditional auto-scaling couldn't react fast enough. We designed a system built for spikes: pre-warmed infrastructure, optimized WebSocket handling, and a simplified routing layer that together support over 10 million concurrent connections without compromising latency or reliability.

Overview

Standard cloud auto-scaling has a dirty secret: it takes 3–5 minutes to provision and warm new instances. When a global sporting event drove a 40,000% traffic spike in under three minutes, that gap was lethal. We had 180 seconds to absorb 10 million concurrent WebSocket connections — or drop the stream entirely.

The Problem
1. Auto-scaling groups couldn't provision fast enough: new EC2 instances took 4+ minutes from trigger to ready state.
2. Standard Application Load Balancers impose a 3,000 concurrent connection limit per target by default.
3. WebSocket state was partially sticky to individual nodes, making seamless failover impossible.
4. Backpressure from 10M simultaneous connection attempts overwhelmed the TLS handshake queue (see the mitigation sketch after this list).
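
To make the backpressure point concrete: one standard mitigation is to bound the number of in-flight handshakes so that excess connections queue in the kernel rather than stampeding the TLS layer. The sketch below is ours, not the article's code; it assumes a tokio runtime, and the 10,000 cap and the port are illustrative numbers.

```rust
use std::sync::Arc;
use tokio::net::TcpListener;
use tokio::sync::Semaphore;

// Illustrative cap on concurrent TLS handshakes. Connections beyond this
// wait in the kernel accept queue instead of flooding the handshake path.
const MAX_INFLIGHT_HANDSHAKES: usize = 10_000;

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:8443").await?;
    let permits = Arc::new(Semaphore::new(MAX_INFLIGHT_HANDSHAKES));

    loop {
        let (stream, _peer) = listener.accept().await?;
        // Pause the accept loop until a handshake slot frees up.
        let permit = permits.clone().acquire_owned().await.expect("semaphore closed");
        tokio::spawn(async move {
            let _permit = permit; // released when this task finishes
            // ... TLS handshake and WebSocket upgrade would happen here ...
            let _ = stream;
        });
    }
}
```

Pairing a limiter like this with a deep kernel accept backlog (e.g. raising net.core.somaxconn) keeps the overflow queued rather than refused.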

Our Approach

We replaced reactive auto-scaling with predictive pre-warming triggered by upstream ticketing data. Seventy-two hours before the event, we provisioned and fully warmed a dedicated WebSocket tier, bypassing ALBs entirely in favour of a custom NLB plus an eBPF-based connection router that distributed load at the kernel level ahead of the TCP handshake. State was externalised to Redis Cluster with sub-millisecond replication lag.
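
The article doesn't include the router's eBPF source, but the routing decision it describes can be modelled in a few lines. The Rust sketch below is our userspace illustration, not the production program: it hashes a flow's 4-tuple to a backend slot, the same deterministic mapping an XDP/eBPF program would compute in the kernel so the hot path needs no userspace connection table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::net::SocketAddrV4;

/// Map a flow's 4-tuple to a backend slot. An XDP/eBPF program would run
/// the equivalent computation in the kernel, before the connection ever
/// touches userspace.
fn route(src: SocketAddrV4, dst: SocketAddrV4, backends: usize) -> usize {
    let mut h = DefaultHasher::new();
    (src.ip(), src.port(), dst.ip(), dst.port()).hash(&mut h);
    (h.finish() as usize) % backends
}

fn main() {
    let src: SocketAddrV4 = "203.0.113.7:51234".parse().unwrap();
    let dst: SocketAddrV4 = "198.51.100.1:443".parse().unwrap();
    // The same flow always maps to the same slot, so routing is stateless:
    // no per-connection lookup is needed on the hot path.
    println!("backend slot: {}", route(src, dst, 64));
}
```

Note that a plain modulo remaps flows whenever the backend count changes; production routers typically use consistent hashing or a small flow table to avoid resets, and the article doesn't say which variant was used here.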

Key Takeaways
1. Reactive auto-scaling is fundamentally incompatible with sudden, predictable spikes; pre-warm based on upstream signals.
2. ALBs are the wrong tool for massive WebSocket workloads; NLBs with custom routing give you the control you need.
3. eBPF-based load distribution at the kernel level eliminates userspace overhead at scale.
4. Externalise all connection state before you think you need to; retrofitting it under load is impossible (see the sketch after this list).
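
As a rough illustration of takeaway 04, this sketch externalises per-connection state to Redis Cluster via the redis crate (with its cluster feature enabled). The key schema, node address, field names, and TTL are our assumptions, not the article's actual design.

```rust
use redis::cluster::ClusterClient;
use redis::Commands;

fn main() -> redis::RedisResult<()> {
    // Illustrative node address; a real deployment would list several.
    let client = ClusterClient::new(vec!["redis://10.0.0.1:6379/"])?;
    let mut conn = client.get_connection()?;

    // Keep per-connection state under a predictable key so that any node
    // can resume the session after a failover.
    let key = "conn:9f3a";
    let _: () = conn.hset(key, "node", "ws-edge-17")?;
    let _: () = conn.hset(key, "last_seq", 42_u64)?;
    let _: () = conn.expire(key, 300)?; // reap state for dead connections

    // A failover target reads the same key and picks up the stream.
    let last_seq: u64 = conn.hget(key, "last_seq")?;
    println!("resuming from sequence {last_seq}");
    Ok(())
}
```

With sub-millisecond replication lag, as the article claims, the state a failover target reads is effectively current, which is what makes the handoff seamless.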

Article Details

Category

Distributed Systems

Read Time

7 min

Published

Mar 2026

Tech Stack

AWS EC2 · NLB · eBPF · Redis Cluster · Rust · Kubernetes
