Sub-10ms Analytics: Pushing Inference to the Manufacturing Edge


Cloud round-trip latency is too slow for robotic defect detection on a live assembly line. Here is how we deployed highly quantised computer vision models directly onto ruggedised factory-floor hardware.

Category

Edge Computing

Read Time

6 min

Published

Jan 2026

Stack

6 technologies

Overview

A high-volume electronics manufacturer was running defect detection through a cloud-hosted vision model. Cloud round-trip latency averaged 180ms: fast enough for dashboards, but too slow for the assembly-line robots, which needed to act on each result within 8ms or miss the rejection window entirely. The model had to move to the edge, but the factory floor imposed constraints that made a standard edge deployment non-trivial.
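Because a single late result misses the rejection window, the number that matters is tail latency, not the 180ms-style average. A minimal sketch of checking a candidate inference function against the 8ms budget might look like this; `infer`, the frame source, and the choice of the 99.9th percentile are assumptions for illustration, not details from the deployment.

```python
import math
import time
from typing import Callable, Sequence

# Budget taken from the article: robots must act within 8ms.
REJECTION_WINDOW_MS = 8.0


def measure_latency_ms(infer: Callable[[object], object],
                       frames: Sequence[object],
                       warmup: int = 10) -> list[float]:
    """Run the (hypothetical) `infer` callable on every frame; return per-call latency in ms."""
    for frame in frames[:warmup]:
        infer(frame)  # warm caches and allocators; these calls are not timed
    samples = []
    for frame in frames:
        start = time.perf_counter_ns()
        infer(frame)
        samples.append((time.perf_counter_ns() - start) / 1e6)
    return samples


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def fits_window(samples: Sequence[float],
                budget_ms: float = REJECTION_WINDOW_MS,
                pct: float = 99.9) -> bool:
    """True if the chosen tail percentile stays inside the rejection window."""
    return percentile(samples, pct) <= budget_ms
```

Nearest-rank is used so the reported figure is always a latency that was actually observed. Timings only mean something when measured on the target ARM hardware, not on a development machine.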

The Problem

01
The factory-floor hardware consisted of ruggedised ARM-based units with no GPU, on which standard inference frameworks were too slow
02
Intermittent connectivity meant cloud fallback was not a viable safety net
03
Model accuracy could not fall below 99.2%; below that threshold, the false-negative rate became unacceptable
04
OTA update cycles on factory hardware run to months, so the deployment had to be right the first time
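Constraints 03 and 04 together suggest gating every model before it is activated on a device and keeping the previous version on local disk. A minimal sketch of such a gate plus an offline registry, assuming holdout accuracy is measured beforehand; `LocalModelRegistry`, its JSON state file, and the `model.onnx` file name are hypothetical, not the registry the team built.

```python
import json
import shutil
from pathlib import Path
from typing import Optional

# Gate taken from the article: below 99.2% accuracy, false negatives are unacceptable.
ACCURACY_FLOOR = 0.992


class LocalModelRegistry:
    """Hypothetical on-device registry: every version lives on local disk, no cloud calls."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.state_file = self.root / "state.json"
        if not self.state_file.exists():
            self._write_state({"active": None, "history": []})

    def _read_state(self) -> dict:
        return json.loads(self.state_file.read_text())

    def _write_state(self, state: dict) -> None:
        # Write a temp file, then rename it over the old one. The rename is atomic
        # on POSIX, so readers see the old or new state, never a partial write
        # (a production version would also fsync before renaming).
        tmp = self.state_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(state))
        tmp.replace(self.state_file)

    def promote(self, version: str, model_file: Path, holdout_accuracy: float) -> bool:
        """Install and activate `version` only if it clears the accuracy floor."""
        if holdout_accuracy < ACCURACY_FLOOR:
            return False  # rejected before it can reach the line
        dest = self.root / version / "model.onnx"
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(model_file, dest)
        state = self._read_state()
        if state["active"] is not None:
            state["history"].append(state["active"])  # keep the old version for rollback
        state["active"] = version
        self._write_state(state)
        return True

    def rollback(self) -> Optional[str]:
        """Reactivate the previous version; a no-op if there is nothing to roll back to."""
        state = self._read_state()
        if state["history"]:
            state["active"] = state["history"].pop()
            self._write_state(state)
        return state["active"]

    def active_model_path(self) -> Optional[Path]:
        active = self._read_state()["active"]
        return None if active is None else self.root / active / "model.onnx"
```

Rollback never deletes files, so a bad version can be reverted on the device without any network access.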

Our Approach

We applied INT8 post-training quantisation to a MobileNetV3 backbone, reducing model size by 74% with less than 0.3% accuracy loss. Inference ran via ONNX Runtime on the ARM units and consistently came in at 6ms, inside the 8ms rejection window. A local model registry handled versioning and rollback without any cloud dependency. Drift detection ran on-device and triggered alerts when the distribution of incoming images deviated beyond a calibrated threshold.
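The article does not say whether per-tensor or per-channel scales were used, or which tool produced the INT8 model. A minimal sketch of the underlying arithmetic, assuming per-tensor affine (asymmetric) quantisation with a range [lo, hi] taken from calibration data; in practice a toolchain performs this step (for example ONNX Runtime's `quantize_static` with a calibration data reader), and `affine_params`, `quantise` and `dequantise` are illustrative names.

```python
# Weights stored as 1 byte (INT8) instead of 4 (FP32) cap the saving near 75%,
# in line with the 74% reduction reported above.
QMIN, QMAX = -128, 127


def affine_params(lo: float, hi: float) -> tuple[float, int]:
    """Per-tensor scale and zero point mapping the calibrated range [lo, hi] onto INT8."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # include 0 so it is exactly representable
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantise(values: list[float], scale: float, zero_point: int) -> list[int]:
    """Map floats to INT8 codes, clamping anything outside the calibrated range."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantise(codes: list[int], scale: float, zero_point: int) -> list[float]:
    """Map INT8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


weights = [-0.5, -0.1, 0.0, 0.2, 0.75]
scale, zp = affine_params(min(weights), max(weights))
restored = dequantise(quantise(weights, scale, zp), scale, zp)
```

For inputs inside the calibrated range, each restored value lands within `scale / 2` of the original. This is why a representative calibration set matters: a few outliers widen [lo, hi], inflate the scale, and increase the rounding error for every other weight.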

Key Takeaways

01
INT8 quantisation is usually the right first step for edge deployment — the accuracy trade-off is smaller than engineers expect
02
ONNX Runtime's ARM backend is production-grade; stop assuming you need a GPU for real-time inference
03
On-device drift detection is non-negotiable when the update cycle is months, not hours
04
Design for zero-connectivity from the start — cloud fallback is an illusion on factory floors
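The article does not name the statistic or test behind its on-device drift detection. A minimal sketch, assuming a swapped-in technique: the Population Stability Index (PSI) over a histogram of one cheap per-image statistic (mean brightness, say), compared against a reference captured at deployment time. `DriftMonitor`, the bin edges, the window size, and the threshold are hypothetical; the threshold stands in for the article's calibrated one.

```python
import bisect
import math
from collections import deque
from typing import Iterable, Sequence


def histogram(values: Iterable[float], edges: Sequence[float]) -> list[float]:
    """Fraction of values in each bin; sorted `edges` give len(edges) - 1 bins."""
    counts = [0] * (len(edges) - 1)
    total = 0
    for v in values:
        # Out-of-range values are clamped into the first or last bin.
        idx = bisect.bisect_right(edges, v) - 1
        counts[min(max(idx, 0), len(counts) - 1)] += 1
        total += 1
    return [c / total for c in counts]


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-4) -> float:
    """Population Stability Index between two binned distributions."""
    score = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        score += (a - e) * math.log(a / e)
    return score


class DriftMonitor:
    """Hypothetical on-device monitor over a rolling window of one per-image statistic."""

    def __init__(self, reference: Sequence[float], edges: Sequence[float],
                 threshold: float, window: int = 500) -> None:
        self.edges = list(edges)
        self.expected = histogram(reference, self.edges)
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def observe(self, value: float) -> bool:
        """Record one image's statistic; return True if drift exceeds the threshold."""
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False  # wait for a full window before judging
        return psi(self.expected, histogram(self.recent, self.edges)) > self.threshold
```

Everything runs locally in constant memory, which suits hardware that may go months between updates and cannot rely on a connection to report drift as it happens.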
