Bigstrum
Edge Computing·6 min read·Jan 2026

Sub-10ms Analytics: Pushing Inference to the Manufacturing Edge

A cloud round trip is too slow for robotic defect detection on a live assembly line. Here is how we deployed heavily quantised computer vision models directly onto ruggedised factory-floor hardware.

Overview

A high-volume electronics manufacturer was running defect detection through a cloud-hosted vision model. Cloud round-trip latency averaged 180ms: fast enough for dashboards, but far too slow for the assembly-line robots, which had to act on each result within 8ms or miss the rejection window entirely. The model had to move to the edge, but the factory floor imposed constraints that complicated a standard edge deployment.

The Problem
  • 01  The factory-floor hardware consisted of ruggedised ARM-based units with no GPU, and standard inference frameworks were too slow on them
  • 02  Intermittent connectivity meant cloud fallback was not a viable safety net
  • 03  Model accuracy could not drop below 99.2%; under that threshold, false negatives reached an unacceptable rate
  • 04  OTA update cycles on factory hardware run to months, so the deployment model had to be right the first time
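With update cycles measured in months, the accuracy floor and the latency window are the kind of constraints worth checking mechanically before anything ships. The following is a minimal sketch of such a pre-deployment gate, not the process used on this project. The 99.2% and 8ms thresholds come from the figures above; the names `release_gate` and `GateResult`, and the choice to judge latency by the worst observed sample, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds taken from the article; everything else here is hypothetical.
MIN_ACCURACY = 0.992       # accuracy floor from constraint 03
LATENCY_BUDGET_MS = 8.0    # rejection window from the overview


@dataclass(frozen=True)
class GateResult:
    accuracy: float
    worst_latency_ms: float
    passed: bool


def release_gate(
    predictions: Sequence[int],
    labels: Sequence[int],
    latencies_ms: Sequence[float],
) -> GateResult:
    """Check a candidate model against the accuracy floor and latency budget.

    The worst observed latency is used rather than the mean, because a single
    slow inference is enough to miss a rejection window.
    """
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    if not latencies_ms:
        raise ValueError("at least one latency measurement is required")

    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    accuracy = correct / len(labels)
    worst = max(latencies_ms)
    passed = accuracy >= MIN_ACCURACY and worst <= LATENCY_BUDGET_MS
    return GateResult(accuracy, worst, passed)
```

A gate like this would be run on a held-out set with latencies measured on the target ARM unit itself, since timings taken on a development machine say little about the factory hardware.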

Our Approach

We applied INT8 post-training quantisation to a MobileNetV3 backbone, reducing model size by 74% with less than 0.3% accuracy loss. Inference ran via ONNX Runtime on the ARM units at a consistent 6ms, inside the 8ms rejection window. A local model registry handled versioning and rollback with no cloud dependency. Drift detection ran on-device and raised alerts when the distribution of incoming images deviated beyond a calibrated threshold.
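To make the quantisation step concrete, here is a minimal pure-Python sketch of the arithmetic behind INT8 quantisation of a single tensor: each float is mapped to a signed 8-bit integer through a scale and a zero point. It illustrates the general technique only and is not the pipeline used here; the function names are hypothetical, and real tooling also calibrates activation ranges across the whole graph. Assuming the original weights were 32-bit floats, storing one byte instead of four caps the saving at 75%, which is consistent with the 74% reported.

```python
from typing import List, Tuple


def quantise_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Asymmetric per-tensor quantisation of floats to signed 8-bit integers.

    Returns (quantised values, scale, zero_point) such that
    value ~= scale * (q - zero_point).
    """
    qmin, qmax = -128, 127
    # Include 0.0 in the range so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:  # all-zero tensor: any scale works
        return [0] * len(values), 1.0, 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    # Round to the nearest step, shift by the zero point, clamp to int8.
    q = [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantise(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Map int8 values back to approximate floats."""
    return [scale * (x - zero_point) for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantise_int8(weights)
restored = dequantise(q, scale, zero_point)
# Each restored value sits within about half a quantisation step of the original.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The rounding error per value is bounded by roughly half of `scale`, which is why a tensor with a narrow value range loses little precision. For a full model, ONNX Runtime ships quantisation tooling in `onnxruntime.quantization` (for example `quantize_static`, driven by a calibration data reader); the article does not say which tool was used.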

Key Takeaways
  • 01  INT8 quantisation is usually the right first step for edge deployment; the accuracy trade-off is smaller than engineers expect
  • 02  ONNX Runtime's ARM backend is production-grade; stop assuming you need a GPU for real-time inference
  • 03  On-device drift detection is non-negotiable when the update cycle is months, not hours
  • 04  Design for zero connectivity from the start; cloud fallback is an illusion on factory floors
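As an illustration of the on-device drift detection in takeaway 03, the sketch below compares a histogram of incoming pixel intensities against a reference histogram using the Population Stability Index (PSI). The article does not say which statistic was used, so PSI, the 16-bin histogram, and the 0.2 threshold are all assumptions; in practice the threshold would be calibrated on known-good production images.

```python
import math
from typing import List, Sequence


def histogram(pixels: Sequence[int], bins: int = 16) -> List[float]:
    """Normalised histogram of 8-bit pixel intensities (0-255)."""
    if not pixels:
        raise ValueError("pixels must not be empty")
    counts = [0] * bins
    width = 256 / bins
    for p in pixels:
        counts[min(int(p / width), bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two normalised histograms."""
    score = 0.0
    for r, c in zip(reference, current):
        r, c = max(r, eps), max(c, eps)  # avoid log(0) on empty bins
        score += (c - r) * math.log(c / r)
    return score


def drifted(reference: Sequence[float], current: Sequence[float],
            threshold: float = 0.2) -> bool:
    """True when the current histogram has moved past the (placeholder) threshold."""
    return psi(reference, current) > threshold


# Reference: intensities spread evenly across the full range.
reference = histogram([i % 256 for i in range(2560)])
# Incoming batch: much darker images, e.g. after a lighting change on the line.
darker = histogram([i % 64 for i in range(2560)])
alert = drifted(reference, darker)
```

The check needs only a stored reference histogram and a few arithmetic operations per batch, so it fits on a CPU-only unit and works with no connectivity. Raw pixel intensity is a coarse signal; a histogram over model confidence scores or embedding statistics would follow the same pattern.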

Tech Stack

ONNX Runtime, MobileNetV3, INT8 Quantisation, ARM, Python, C++
