Sub-10ms Analytics: Pushing Inference to the Manufacturing Edge
A cloud round-trip is too slow for robotic defect detection on a live assembly line. Here is how we deployed heavily quantised computer vision models directly onto ruggedised factory-floor hardware.
Category: Edge Computing
Read Time: 6 min
Published: Jan 2026
Stack: 6 technologies
A high-volume electronics manufacturer was running defect detection through a cloud-hosted vision model. Round-trip latency to the cloud averaged 180ms: fast enough for dashboards, but far too slow for assembly-line robots that had to act on each result within 8ms or miss the rejection window entirely. The model had to move to the edge, but the factory floor imposed four constraints that made a standard edge deployment non-trivial:
- 01: The factory-floor hardware consisted of ruggedised ARM-based units with no GPU, and standard inference frameworks were too slow on them
- 02: Intermittent connectivity meant cloud fallback was not a viable safety net
- 03: Model accuracy could not fall below 99.2%; under that floor, false negatives reached an unacceptable rate
- 04: OTA update cycles on factory hardware are months-long, so the deployment model had to be right the first time
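Because a bad release stays on the floor for months, the constraints above lend themselves to an automated acceptance gate that runs before any model ships. The following is a minimal sketch rather than the project's actual tooling: the function names, the choice of a 99th-percentile latency check, and the nearest-rank percentile method are assumptions. Only the 8ms window and the 99.2% accuracy floor come from the text.

```python
import math

LATENCY_BUDGET_MS = 8.0   # rejection window stated in the case study
ACCURACY_FLOOR = 0.992    # minimum acceptable accuracy stated in the case study


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_gate(latencies_ms, correct, total, pct=99):
    """Return True only if both the latency and the accuracy constraint hold.

    pct=99 is an assumption: gating on a tail percentile instead of the mean
    catches occasional slow frames that would miss the rejection window.
    """
    tail_latency = percentile(latencies_ms, pct)
    accuracy = correct / total
    return tail_latency <= LATENCY_BUDGET_MS and accuracy >= ACCURACY_FLOOR
```

A candidate would be benchmarked on the target ARM unit and scored on a held-out validation set, then passed to `passes_gate`; a release that fails either check never reaches the OTA queue.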
We applied INT8 post-training quantisation to a MobileNetV3 backbone, which reduced model size by 74% at a cost of less than 0.3% accuracy. Inference ran through ONNX Runtime on the ARM units at a consistent 6ms, inside the 8ms rejection window. A local model registry handled versioning and rollback with no cloud dependency. Drift detection ran on-device and raised an alert whenever the distribution of incoming images deviated beyond a calibrated threshold.
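To make the size and accuracy figures more concrete, the sketch below shows the arithmetic behind INT8 quantisation on a single weight tensor: each 32-bit float is mapped to an 8-bit integer through a scale factor and mapped back when needed. It uses symmetric per-tensor quantisation in plain Python purely as an illustration. That scheme is an assumption, not necessarily what the project or ONNX Runtime's quantisation tooling applied, and the weight values are made up.

```python
def quantise_int8(weights):
    """Symmetric per-tensor INT8 quantisation: w ≈ scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q, scale):
    """Map INT8 values back to approximate float weights."""
    return [scale * v for v in q]


# Hypothetical weights, for illustration only.
weights = [0.42, -1.27, 0.05, 0.9, -0.33]
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)

# Rounding moves each weight by at most half a quantisation step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# Storage: 4 bytes per float32 weight versus 1 byte per int8 weight.
size_reduction = 1 - (1 * len(weights)) / (4 * len(weights))
```

Weights alone shrink by 75% under this scheme. Stored scale factors and any layers left in floating point pull the whole-model figure slightly lower, which would be consistent with the 74% reported here.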
- 01: INT8 quantisation is usually the right first step for edge deployment; the accuracy trade-off is smaller than many engineers expect
- 02: ONNX Runtime's ARM backend is production-grade; stop assuming you need a GPU for real-time inference
- 03: On-device drift detection is non-negotiable when the update cycle is months, not hours
- 04: Design for zero-connectivity from the start; cloud fallback is an illusion on factory floors
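The on-device drift check from takeaway 03 can be sketched as a comparison between a baseline intensity histogram captured at calibration time and a histogram of recent frames. The example uses the Population Stability Index (PSI) as the distance measure. The source does not say which statistic was deployed, so PSI, the bin count, and the 0.2 threshold (a common rule of thumb, not the project's calibrated value) are all assumptions.

```python
import math


def histogram(pixels, bins=8, max_value=256):
    """Normalised histogram of integer pixel intensities in [0, max_value)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    for p in pixels:
        counts[min(bins - 1, p * bins // max_value)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def psi(baseline, current, eps=1e-6):
    """Population Stability Index between two normalised histograms."""
    score = 0.0
    for b, c in zip(baseline, current):
        b, c = max(b, eps), max(c, eps)  # avoid log(0) on empty bins
        score += (c - b) * math.log(c / b)
    return score


DRIFT_THRESHOLD = 0.2  # assumed value; the real threshold was calibrated on-site


def drift_alert(baseline_hist, recent_pixels):
    """Return True when recent frames deviate beyond the threshold."""
    return psi(baseline_hist, histogram(recent_pixels)) > DRIFT_THRESHOLD


# Baseline: evenly spread intensities. Recent frames: much darker images.
baseline = histogram(list(range(256)))
same_lighting = drift_alert(baseline, list(range(256)))
dimmed_lighting = drift_alert(baseline, [p // 4 for p in range(256)])
```

Everything here runs locally with no network call, so the alert still fires when the uplink is down; how the alert is surfaced (a local log, an indicator on the unit, a queued message) is left open.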