Why Hailo
We started shipping Hailo-8 accelerators about two years ago, after testing the chip head-to-head against a Jetson Xavier NX on a vision inspection workload. The headline numbers were clear: comparable inference performance at roughly a quarter of the power, with a much smaller thermal envelope. After eight production stations, here's what we know that's not in the marketing material.

The toolchain workflow, in practice
The path from a trained model to a Hailo-deployed binary is:

1. Train in PyTorch / TensorFlow
2. Export to ONNX
3. Optimize with the Hailo Dataflow Compiler (DFC) — this includes quantization to INT8
4. Compile to a Hailo Executable Format (HEF) targeting the specific chip (Hailo-8 / 8L / 15)
5. Deploy via the HailoRT runtime
Steps 3 and 4 are where the real work happens. The DFC needs a representative calibration dataset — at least 64 images, ideally 512 — captured under production conditions. Calibration is the difference between "almost the same accuracy as FP32" and "embarrassing accuracy regression we explain to the customer".
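For concreteness, here is roughly what steps 3 and 4 look like in code. Treat this as a hedged sketch: `ClientRunner` and its methods come from the `hailo_sdk_client` package as we've used it, but names and signatures drift between DFC releases, and the file names here are placeholders.

```python
import numpy as np
from hailo_sdk_client import ClientRunner

# Calibration set: 64-512 production-condition images, preprocessed exactly
# like inference inputs. File name and shape are placeholders.
calib = np.load("calib_set.npy")  # e.g. (512, 640, 640, 3)

runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model("model.onnx", "inspect_v1")  # parse ONNX into Hailo's graph format
runner.optimize(calib)   # step 3: INT8 quantization against the calibration set
hef = runner.compile()   # step 4: allocate chip resources, emit the HEF binary

with open("inspect_v1.hef", "wb") as f:
    f.write(hef)
```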
Quantization sensitivity is real
Some architectures quantize cleanly. Others don't.

- ResNet, MobileNet, YOLO families — INT8 with <1 % accuracy regression. No drama.
- Transformers (ViT, DETR) — sensitive. Often need per-channel quantization, sometimes need partial FP16 retention on attention heads.
- Anomaly detection (PatchCore, EfficientAD) — distance-based scoring is sensitive to quantization noise. We spent a week recovering 2 % AUROC on EfficientAD with QAT before deciding to keep it on a Jetson Orin Nano instead.
The pragmatic rule: if your model has unusual numerics (cosine similarity in the loss, distance-based scoring, custom layer norms), assume quantization will cost you 1-3 % accuracy and budget for QAT.
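What "budget for QAT" means in practice is a short fine-tune with fake-quant observers in the graph. Below is a minimal sketch using PyTorch's standard eager-mode API; nothing in it is Hailo-specific, and the model, loader, criterion, and optimizer names are placeholders.

```python
import torch
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat, convert

model = load_model()                # placeholder: your trained FP32 model
model.train()
model.qconfig = get_default_qat_qconfig("fbgemm")
prepare_qat(model, inplace=True)    # insert fake-quant observers into the graph

for epoch in range(3):              # a few epochs usually recovers most of the loss
    for x, y in train_loader:       # placeholder loader
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

model.eval()
quantized = convert(model)          # fold observers into INT8 weights/activations
# From here you'd typically export back to ONNX and rerun the DFC flow above.
```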
Memory & multi-model deployments
Hailo-8 has 20 MB of on-chip SRAM. A typical YOLOv8s post-quantization is around 12 MB; YOLOv8m is around 25 MB and doesn't fit alone. The chip then "context switches" — loading partial graphs from host RAM — which costs latency.

For multi-model deployments (e.g. detection + classification + OCR on the same chip), HailoRT supports model swapping between frames. It's measurably slower than a single model. We size for single-model where latency matters, and multi-model where the use case can tolerate 30-50 ms swap penalties.
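For the multi-model case, here is a hedged sketch of two HEFs sharing one chip through HailoRT's Python bindings, with the built-in scheduler doing the swapping. The `hailo_platform` names are our best recollection of the API and the HEF paths are placeholders; verify against your HailoRT release before trusting any of it.

```python
import numpy as np
from hailo_platform import (HEF, VDevice, ConfigureParams, FormatType,
                            HailoSchedulingAlgorithm, HailoStreamInterface,
                            InferVStreams, InputVStreamParams, OutputVStreamParams)

params = VDevice.create_params()
# Let HailoRT's scheduler arbitrate the chip between models.
params.scheduling_algorithm = HailoSchedulingAlgorithm.ROUND_ROBIN

with VDevice(params) as device:
    models = []
    for path in ("detect.hef", "classify.hef"):  # placeholder HEF files
        hef = HEF(path)
        cfg = ConfigureParams.create_from_hef(hef, interface=HailoStreamInterface.PCIe)
        models.append((hef, device.configure(hef, cfg)[0]))

    # Alternate models frame to frame; the per-frame latency delta versus
    # running one model back-to-back is your effective swap penalty.
    for hef, group in models * 25:
        inputs = {info.name: np.zeros((1, *info.shape), dtype=np.float32)
                  for info in hef.get_input_vstream_infos()}
        in_params = InputVStreamParams.make(group, format_type=FormatType.FLOAT32)
        out_params = OutputVStreamParams.make(group, format_type=FormatType.FLOAT32)
        with InferVStreams(group, in_params, out_params) as infer:
            results = infer.infer(inputs)
```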
Hailo-15 vs Hailo-8 — when to upgrade
Hailo-15 is the newer SoC-style chip with built-in ISP, video codec, and more compute. We use it when:

- The cell is space-constrained and we want camera + accelerator on a single board
- We need >1 stream at production resolution
- Multi-model deployments stop fitting on Hailo-8
For a single-camera, single-model station, the Hailo-8 M.2 is still the cheapest path.
One thing we'd warn about
The Hailo ecosystem is excellent if you're building one camera-to-decision pipeline per chip. It is less ergonomic if you're building a heterogeneous data pipeline with 20 transforms and 3 conditional models — for that you want CPU + Hailo, not Hailo alone.

Anyone running Hailo-15 in real cells yet? Curious about the ISP integration story and whether it actually replaces a discrete camera ASIC.