Automated Ephemeral Environments for 500+ Engineers
Staging environments were our biggest bottleneck. We built a custom Kubernetes operator that spins up a fully isolated, data-anonymised replica of production for every pull request in under 90 seconds.
Category: Infrastructure
Read Time: 5 min
Published: Oct 2025
Stack: 6 technologies
With 500 engineers across 40 teams sharing three staging environments, merge-queue waits routinely reached 48 hours. Environment contention caused more delays than code review. The solution wasn't more staging environments. It was a system that could create an isolated, production-accurate environment for every pull request in under 90 seconds and tear it down automatically when the PR closed.
- 01 Production database snapshots were 8 TB, so a full restore for every PR was impractical
- 02 Shared staging environments meant one team's bad deployment could block every other team
- 03 Service-to-service dependencies meant isolated environments needed intelligent traffic routing to avoid calling production
- 04 Data anonymisation for GDPR compliance had to run within the 90-second spin-up budget
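To put the first and last items in proportion, here is a back-of-envelope calculation of what a full restore would cost against the 90-second budget. The 8 TB size and the 90-second budget come from the text above; the 1 GB/s restore rate is purely an assumed figure for illustration, not a measurement from our systems.

```python
# Back-of-envelope check: full restore time versus the spin-up budget.
# The throughput figure is an assumption for illustration only.

SNAPSHOT_BYTES = 8 * 10**12             # 8 TB (decimal), from the article
BUDGET_SECONDS = 90                     # spin-up budget, from the article
ASSUMED_RESTORE_BYTES_PER_SEC = 10**9   # hypothetical 1 GB/s sustained restore


def full_restore_seconds(size_bytes: int, bytes_per_sec: int) -> float:
    """Time to copy every byte once at a constant rate."""
    return size_bytes / bytes_per_sec


def required_bytes_per_sec(size_bytes: int, budget_seconds: int) -> float:
    """Sustained rate a full copy would need in order to fit the budget."""
    return size_bytes / budget_seconds


restore_s = full_restore_seconds(SNAPSHOT_BYTES, ASSUMED_RESTORE_BYTES_PER_SEC)
needed = required_bytes_per_sec(SNAPSHOT_BYTES, BUDGET_SECONDS)

print(f"full restore at assumed rate: {restore_s / 3600:.1f} hours")   # 2.2 hours
print(f"rate needed to fit the budget: {needed / 10**9:.0f} GB/s")     # 89 GB/s
```

Under that assumption a single full copy takes over two hours, and fitting it into 90 seconds would need roughly 89 GB/s sustained per PR, before any anonymisation work starts.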
We built a custom Kubernetes operator that listened to GitHub PR webhooks and provisioned namespaced environments using copy-on-write volume snapshots — eliminating the full-restore problem. Service mesh routing rules isolated all inter-service traffic to the PR namespace by default, with explicit overrides for stable shared services. A streaming anonymisation job processed only the tables touched by changed migrations, completing in under 40 seconds for 95% of PRs.
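The three moving parts in that paragraph can be sketched roughly as follows. This is not the operator's actual code: the `pr-<number>` namespace scheme, the PVC name `db-data`, the storage class `csi-snapshots`, and the snapshot name are all hypothetical. The `dataSource` block is the standard Kubernetes way to provision a PVC from a `VolumeSnapshot`; whether the resulting volume is a true copy-on-write clone depends on the CSI driver and storage backend, and the snapshot has to be visible in the PR namespace. The migration parser is deliberately naive and only recognises `CREATE TABLE` and `ALTER TABLE`.

```python
import re

# All resource names here are hypothetical; the article does not give them.
PROVISION_ACTIONS = {"opened", "reopened", "synchronize"}


def namespace_for(pr_number: int) -> str:
    """One namespace per pull request (assumed naming scheme)."""
    return f"pr-{pr_number}"


def plan_for_event(event: dict) -> tuple[str, str]:
    """Map a GitHub pull_request webhook payload to (action, namespace)."""
    ns = namespace_for(event["pull_request"]["number"])
    if event["action"] == "closed":
        return ("teardown", ns)
    if event["action"] in PROVISION_ACTIONS:
        return ("provision", ns)
    return ("ignore", ns)


def snapshot_backed_pvc(namespace: str, snapshot_name: str,
                        size: str = "8Ti",
                        storage_class: str = "csi-snapshots") -> dict:
    """PVC manifest whose contents come from an existing VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "db-data", "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }


# Naive matcher: table names following CREATE TABLE or ALTER TABLE.
_TABLE_RE = re.compile(
    r"\b(?:CREATE|ALTER)\s+TABLE\s+(?:IF\s+(?:NOT\s+)?EXISTS\s+)?([A-Za-z_][\w.]*)",
    re.IGNORECASE,
)


def tables_touched(migration_sql: str) -> set[str]:
    """Tables named in a changed migration, i.e. the anonymisation scope."""
    return {name.lower() for name in _TABLE_RE.findall(migration_sql)}


# Usage: an "opened" event yields a provisioning plan for namespace pr-1234.
event = {"action": "opened", "pull_request": {"number": 1234}}
action, ns = plan_for_event(event)
pvc = snapshot_backed_pvc(ns, snapshot_name="prod-snapshot")
scope = tables_touched("ALTER TABLE orders ADD COLUMN email text;")
```

A real operator would apply the manifest through the Kubernetes API and hand `scope` to the anonymisation job; both steps are left out here to keep the sketch self-contained.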
- 01 Copy-on-write volume snapshots are what make fast ephemeral environments possible at scale: they make 8 TB databases practical
- 02 Namespace isolation with service mesh routing is cleaner than VPC-per-environment and dramatically faster to provision
- 03 Scope your anonymisation to changed migrations, not the full database. It was the only way we found to stay inside a 90-second budget
- 04 Build the teardown logic first. Ephemeral environments that are not reliably cleaned up become permanent environments
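For the last point, "reliably" usually means not depending on the close webhook alone. One possible shape for that is a periodic sweep that compares existing PR namespaces against the set of open PRs, with an age limit as a backstop. The sketch below assumes that design; the `EnvNamespace` record, the function name, and the 7-day limit are hypothetical and not taken from our operator.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EnvNamespace:
    name: str             # e.g. "pr-1234" (assumed naming scheme)
    pr_number: int
    created_at: datetime


def namespaces_to_delete(existing: list[EnvNamespace],
                         open_prs: set[int],
                         now: datetime,
                         max_age: timedelta = timedelta(days=7)) -> list[str]:
    """Return the names of namespaces that should be torn down.

    A namespace is selected if its PR is no longer open (this catches
    missed "closed" webhooks) or if it has outlived max_age (a hard
    backstop; the 7-day default is an assumed value).
    """
    doomed = []
    for ns in existing:
        pr_closed = ns.pr_number not in open_prs
        expired = now - ns.created_at > max_age
        if pr_closed or expired:
            doomed.append(ns.name)
    return sorted(doomed)


# Usage: PR 2 was closed without a webhook arriving; PR 3 is open but stale.
now = datetime(2025, 10, 1, tzinfo=timezone.utc)
envs = [
    EnvNamespace("pr-1", 1, now - timedelta(hours=1)),
    EnvNamespace("pr-2", 2, now - timedelta(hours=1)),
    EnvNamespace("pr-3", 3, now - timedelta(days=30)),
]
stale = namespaces_to_delete(envs, open_prs={1, 3}, now=now)
```

Because the sweep is a pure function of observed state, it can be tested before any provisioning code exists, which fits the advice to build teardown first.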