The thing nobody warns you about
Edge AI tutorials show "load image, run model, draw box". Production edge AI is "decode an RTSP stream from a camera that drops frames every fourteen minutes, run inference at variable framerate without skipping the trigger frame, push results to a PLC over OPC-UA, write rejected frames to disk, and log telemetry — for 18 months without a restart". The pipeline tooling decides whether that works.

Three options worth considering.
NVIDIA DeepStream
What it is: a GStreamer-based SDK with NVIDIA-specific plugins for hardware-accelerated decode (NVDEC), inference (nvinfer with TensorRT), and analytics. The reference framework for Jetson video pipelines.

Strengths: hardware-accelerated everything. NVDEC handles 8+ concurrent 1080p H.264 streams on a Jetson NX without touching the CPU. nvinfer integrates with TensorRT. Multi-stream batching is built-in. The reference apps cover most patterns.
Weaknesses: tight coupling to NVIDIA. GStreamer's debug story is rough — when something breaks, you're staring at a pipeline graph at midnight. Custom logic between elements requires writing GStreamer plugins, which is its own skill set.
Ship when: Jetson + multi-camera + standard CV pipeline. The leverage is real.
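A rough skeleton of what that looks like from Python, as a sketch: the element names are from DeepStream's reference apps, and the RTSP URI plus the nvinfer config path are placeholders for your own.

```python
# Minimal DeepStream skeleton via GStreamer's Python bindings.
# Assumes DeepStream is installed; URI and config-file-path are placeholders.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)
pipeline = Gst.parse_launch(
    "nvurisrcbin uri=rtsp://192.168.1.10/stream1 ! m.sink_0 "
    "nvstreammux name=m batch-size=1 width=1920 height=1080 ! "
    "nvinfer config-file-path=pgie_config.txt ! "
    "fakesink sync=false"
)
# Detection metadata is read with a pad probe on nvinfer's src pad in real code.
pipeline.set_state(Gst.State.PLAYING)
try:
    GLib.MainLoop().run()      # real code also watches the bus for errors/EOS
finally:
    pipeline.set_state(Gst.State.NULL)
```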
Bare GStreamer
What it is: the underlying framework. Run it without DeepStream when the target isn't Jetson, or when you need fine-grained control over the pipeline graph.

Strengths: portable across hardware. Excellent codec / muxing / streaming coverage. ONNX Runtime + GStreamer is a workable Hailo / RK3588 / Intel pipeline.
Weaknesses: you write more glue. The Python bindings exist but are awkward; serious work is in C/C++.
Ship when: non-NVIDIA edge target, multi-camera, RTSP / file / GigE Vision sources mixed.
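On non-NVIDIA targets, the appsink-to-ONNX-Runtime pattern looks roughly like this. A sketch, assuming a model with a 1x3x640x640 float input; the URL, model path, and caps are placeholders, and on Hailo / RK3588 you'd swap the vendor runtime in for onnxruntime.

```python
# Sketch: bare GStreamer decode -> appsink -> ONNX Runtime.
# Caps force 640x640 RGB so the numpy reshape below is trivial.
import numpy as np
import onnxruntime as ort
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "rtspsrc location=rtsp://192.168.1.10/stream1 latency=0 ! "
    "decodebin ! videoconvert ! videoscale ! "
    "video/x-raw,format=RGB,width=640,height=640 ! "
    "appsink name=sink max-buffers=1 drop=true"
)
sink = pipeline.get_by_name("sink")
sess = ort.InferenceSession("model.onnx")              # placeholder model path
input_name = sess.get_inputs()[0].name
pipeline.set_state(Gst.State.PLAYING)

while True:
    sample = sink.emit("try-pull-sample", Gst.SECOND)  # 1 s timeout
    if sample is None:
        continue                                       # real code also checks the bus
    buf = sample.get_buffer()
    data = buf.extract_dup(0, buf.get_size())
    frame = np.frombuffer(data, dtype=np.uint8).reshape(640, 640, 3)
    tensor = frame.astype(np.float32).transpose(2, 0, 1)[None] / 255.0
    outputs = sess.run(None, {input_name: tensor})     # postprocess / PLC write go here
```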
FFmpeg + custom Python
What it is: use FFmpeg as the decoder via PyAV or subprocess piping, do everything else in Python.

Strengths: easiest to read, easiest to extend, easiest to debug. The whole pipeline lives in code your team already understands. For single-camera, single-model deployments, this is overwhelmingly the most pragmatic option.
Weaknesses: not zero-copy. CPU-side decode of four 1080p streams is doable on x86 but tight on ARM. Latency is higher than DeepStream / GStreamer on the same hardware.
Ship when: single camera, single model, prototype-to-production timeline matters more than peak throughput.
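For a sense of scale, the whole decode loop in that style is a handful of PyAV lines. The URL is a placeholder and the model call is stubbed with a print:

```python
# FFmpeg-as-decoder via PyAV; everything downstream is plain Python.
import av

container = av.open("rtsp://192.168.1.10/stream1",
                    options={"rtsp_transport": "tcp"})  # TCP sidesteps UDP loss artifacts
for frame in container.decode(video=0):
    img = frame.to_ndarray(format="bgr24")  # HxWx3 uint8, drops straight into OpenCV/ONNX
    print(frame.pts, img.mean())            # inference, PLC write, metrics go here
```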
RTSP gotchas you will hit
Every camera vendor implements RTSP slightly differently. Plan for:
- Reconnects. Cameras drop. The pipeline must reconnect transparently. Both DeepStream's nvurisrcbin and GStreamer's rtspsrc can be configured for this — it's not on by default. (A sketch of the reconnect loop follows this list.)
- Codec quirks. H.264 from one vendor is not the same as H.264 from another. Some need do-timestamp=true, some break with it on. Test the actual cameras.
- Buffering vs latency. Default RTSP buffers are tuned for video playback (smooth, at the cost of latency). Inspection wants the opposite. Set latency=0 or equivalent.
- Time-of-day clock skew. Cameras often have wrong RTC. If you correlate frames with PLC timestamps, sync NTP everywhere or accept correlation by sequence number, not time.
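The reconnect point from the list above, sketched with PyAV. The option values and backoff numbers are starting points, not vendor-tested defaults:

```python
# Reconnect wrapper around an RTSP reader, shown with PyAV.
import time
import av

def frames(url):
    """Yield decoded frames forever, reopening the stream whenever it drops."""
    backoff = 1.0
    while True:
        try:
            container = av.open(url,
                                options={"rtsp_transport": "tcp"},
                                timeout=5.0)       # fail fast instead of hanging
            backoff = 1.0                          # reset after a successful open
            for frame in container.decode(video=0):
                yield frame
        except (av.FFmpegError, OSError) as exc:   # av.AVError on older PyAV
            print(f"stream dropped ({exc}); retrying in {backoff:.0f}s")
            time.sleep(backoff)
            backoff = min(backoff * 2, 30.0)       # capped exponential backoff
```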
The pattern we ship
Single camera, edge inference, simple pipeline → FFmpeg + Python.
Multi-camera or anything > 30 fps end-to-end → DeepStream on Jetson, GStreamer + nvinfer-equivalent on Hailo.
Anything streaming to a remote inference server → not a video pipeline, it's a network problem. Solve it as one.
One last thing
Always log per-frame inference latency to a metrics endpoint, not just to a file. The pipeline that fails silently after 14 days is the one that's eating frames you didn't notice. A Prometheus endpoint and a Grafana dashboard is fifteen minutes of work and saves a project.

What's your pipeline? Curious about anyone running a custom Rust pipeline at the edge.
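For anyone who wants the concrete version of that last point, a minimal sketch with prometheus_client; the port, metric names, and the stand-in inference call are arbitrary examples:

```python
# Per-frame latency and drop counts exported for Prometheus to scrape.
import random
import time
from prometheus_client import Counter, Histogram, start_http_server

INFER_LATENCY = Histogram("inference_latency_seconds", "Per-frame inference latency")
FRAMES_DROPPED = Counter("frames_dropped_total", "Frames that errored or were skipped")

start_http_server(9100)                        # scrape target on :9100/metrics

def run_inference(frame):
    time.sleep(random.uniform(0.01, 0.03))     # stand-in for the real model call

while True:                                    # stand-in for the real frame loop
    try:
        with INFER_LATENCY.time():             # one observation per frame
            run_inference(None)
    except Exception:
        FRAMES_DROPPED.inc()                   # silent failures show up on the dashboard
```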