# Yolo GPU DeepStream real-time streaming detection

## Source:

In this tutorial, I use an `rtsp` source. Specifically, I used the smartphone app `LarixBroadcaster` to stream `rtmp` to our server `172.18.240.131` on port `1937` with topic `app/viewsmall`, i.e., `rtmp://172.18.240.131:1937/app/viewsmall`. The `rtsp_simple` server deployed on `172.18.240.131` then re-broadcasts the real-time video as RTSP, RTMP, and HLS. We will use `rtsp://172.18.240.131:8555/app/viewsmall` as the source for DeepStream.

![](https://i.imgur.com/vyVHVuG.jpg)

![](https://i.imgur.com/gzRjINL.jpg)
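Before wiring DeepStream to the stream, it is worth confirming that the re-broadcast actually plays. A minimal smoke test, assuming stock GStreamer (`gst-launch-1.0` with the usual good/bad/ugly plugin sets) is installed, and using a synthetic test source as a stand-in for `LarixBroadcaster`:

```
# Publish a synthetic H.264 test stream to the RTMP ingest:
gst-launch-1.0 videotestsrc is-live=true ! videoconvert \
    ! x264enc tune=zerolatency bitrate=2000 ! h264parse \
    ! flvmux streamable=true \
    ! rtmpsink location=rtmp://172.18.240.131:1937/app/viewsmall

# In another shell, check that the RTSP re-broadcast is playable:
gst-launch-1.0 rtspsrc location=rtsp://172.18.240.131:8555/app/viewsmall latency=20 \
    ! decodebin ! autovideosink sync=false
```

If the second pipeline shows the test pattern, the `rtsp_simple` re-broadcast path is working end to end.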
## DeepStream Yolo:

* Use my pre-built Yolo DeepStream (DS ver 6.0) image:

```
docker run -it --network=host --gpus 2 -w /opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo_Mao ngovanmao/nvidia-deepstream-6.0.1-yolo:v03
```

Inside the docker container, you can modify the config file. For example, here is the config file with `source0` as an RTSP input and `sink0` as an RTSP output. Some options are disabled (i.e., `enable=0`); we only use `source0` and `sink0`, with no display. If you want to modify the pipeline, feel free to change it.

```
[application]
enable-perf-measurement=1
perf-measurement-interval-sec=5
#gie-kitti-output-dir=streamscl

[tiled-display]
enable=0
rows=1
columns=1
width=4096
height=2160
gpu-id=0
#(0): nvbuf-mem-default - Default memory allocated, specific to particular platform
#(1): nvbuf-mem-cuda-pinned - Allocate Pinned/Host cuda memory, applicable for Tesla
#(2): nvbuf-mem-cuda-device - Allocate Device cuda memory, applicable for Tesla
#(3): nvbuf-mem-cuda-unified - Allocate Unified cuda memory, applicable for Tesla
#(4): nvbuf-mem-surface-array - Allocate Surface Array memory, applicable for Jetson
nvbuf-memory-type=0

[source0]
enable=1
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=2
uri=rtsp://172.18.240.131:8555/app/viewsmall
latency=20
num-sources=1
gpu-id=0
# (0): memtype_device - Memory type Device
# (1): memtype_pinned - Memory type Host Pinned
# (2): memtype_unified - Memory type Unified
cudadec-memtype=0

[source1]
enable=0
#Type - 1=CameraV4L2 2=URI 3=MultiURI
type=1
intra-decode-enable=1
camera-width=4096
camera-height=2160
camera-fps-n=30
camera-fps-d=1
#device=/dev/video0
camera-v4l2-dev-node=0
nvbuf-memory-type=3
gpu-id=0

[sink0]
enable=1
#Type - 1=FakeSink 2=EglSink 3=File 4=RTSPStreaming 5=Overlay
type=4
#1=h264 2=h265
codec=1
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0
bitrate=4000000
# set below properties in case of RTSPStreaming
rtsp-port=8554
udp-port=5400

[sink1]
enable=0
#Type - 1=FakeSink 2=EglSink 3=File
type=2
sync=0
source-id=0
gpu-id=0
nvbuf-memory-type=0

[osd]
enable=1
gpu-id=0
border-width=2
text-size=15
text-color=1;1;1;1;
text-bg-color=0.3;0.3;0.3;1
font=Serif
show-clock=0
clock-x-offset=800
clock-y-offset=820
clock-text-size=12
clock-color=1;0;0;0
nvbuf-memory-type=0

[streammux]
gpu-id=0
##Boolean property to inform muxer that sources are live
live-source=1
batch-size=2
##time out in usec, to wait after the first buffer is available
##to push the batch even if the complete batch is not formed
batched-push-timeout=40000
## Set muxer output width and height
width=4096
height=2160
##Enable to maintain aspect ratio wrt source, and allow black borders, works
##along with width, height properties
enable-padding=0
nvbuf-memory-type=0

# config-file property is mandatory for any gie section.
# Other properties are optional and if set will override the properties set in
# the infer config file.
[primary-gie]
enable=1
gpu-id=0
#model-engine-file=model_b1_gpu0_int8.engine
labelfile-path=labels.txt
batch-size=1
#Required by the app for OSD, not a plugin property
bbox-border-color0=1;0;0;1
bbox-border-color1=0;1;1;1
bbox-border-color2=0;0;1;1
bbox-border-color3=0;1;0;1
interval=2
gie-unique-id=1
nvbuf-memory-type=0
config-file=config_infer_primary_yoloV3.txt

[tracker]
enable=1
# For NvDCF and DeepSORT tracker, tracker-width and tracker-height must be a multiple of 32, respectively
tracker-width=640
tracker-height=384
ll-lib-file=/opt/nvidia/deepstream/deepstream-6.0/lib/libnvds_nvmultiobjecttracker.so
# ll-config-file required to set different tracker types
# ll-config-file=../../samples/configs/deepstream-app/config_tracker_IOU.yml
ll-config-file=../../samples/configs/deepstream-app/config_tracker_NvDCF_perf.yml
# ll-config-file=../../samples/configs/deepstream-app/config_tracker_NvDCF_accuracy.yml
# ll-config-file=../../samples/configs/deepstream-app/config_tracker_DeepSORT.yml
gpu-id=0
enable-batch-process=1
enable-past-frame=1
display-tracking-id=1

[tests]
file-loop=0
```

* Then start the Yolo inference service on the edge server (e.g., the Xinmatrix GPU edge server). The model is quite big, so loading is slow; be patient. You can use YOLOv3-tiny or other models downloaded online as well.

```
root@xinmatrix:/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo_Mao# deepstream-app -c deepstream_app_config_yoloV3_Mao.txt
....
(100) conv-bn-leaky     128 x  76 x  76     256 x  76 x  76    61277790
(101) conv-bn-leaky     256 x  76 x  76     128 x  76 x  76    61311070
(102) conv-bn-leaky     128 x  76 x  76     256 x  76 x  76    61607006
(103) conv-bn-leaky     256 x  76 x  76     128 x  76 x  76    61640286
(104) conv-bn-leaky     128 x  76 x  76     256 x  76 x  76    61936222
(105) conv-linear       256 x  76 x  76     255 x  76 x  76    62001757
(106) yolo              255 x  76 x  76     255 x  76 x  76    62001757
Output yolo blob names :
yolo_83
yolo_95
yolo_107
Total number of yolo layers: 257
Building yolo network complete!
Building the TensorRT Engine...
....
```

Converting the model can take around 4 minutes: this is TensorRT building a serialized engine optimized for the local GPU, so just waiting is fine. After you see the `**PERF: FPS 0 (Avg)` header, online inference has started. With YOLOv3, it runs at about 24 FPS:

```
**PERF: 23.80 (23.55)
**PERF: 24.17 (23.75)
**PERF: 23.78 (23.76)
**PERF: 24.07 (23.86)
**PERF: 24.00 (23.89)
**PERF: 23.85 (23.87)
**PERF: 24.08 (23.89)
**PERF: 23.92 (23.90)
**PERF: 23.96 (23.91)
**PERF: 23.96 (23.92)
**PERF: 23.81 (23.91)
```
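The multi-minute startup is the one-time engine build; once the engine has been serialized, later runs can reuse it. A minimal sketch of that shortcut, assuming the engine file lands in the working directory under the name shown in the commented-out `model-engine-file` line of the config above (verify against the path the app prints after `Building the TensorRT Engine...`):

```
# After the first successful run, locate the serialized engine:
ls /opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo_Mao/*.engine

# Then uncomment the matching line in [primary-gie] of
# deepstream_app_config_yoloV3_Mao.txt so later runs skip the rebuild:
#   model-engine-file=model_b1_gpu0_int8.engine
```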
## Client side:

As configured in `deepstream_app_config_yoloV3_Mao.txt`, the RTSP port is `8554` and the default topic is `ds-test`. You can play the stream remotely with VLC (open `rtsp://<server-ip>:8554/ds-test` as a network stream), with any mobile app that supports RTSP, or even with our GStreamer-based tool.

![](https://i.imgur.com/r75pglU.jpg)

You can change the resolution and the border width of the bounding boxes to increase visibility. To reduce latency, we can use our GStreamer script `gst_client.sh`:

```
#./gst_client.sh -s 10.1.7.24 -p 8554 -t ds-test
...
gst-launch-1.0 rtspsrc location=rtsp://10.1.7.24:8554/ds-test ! rtph264depay ! h264parse ! d3d11h264dec ! glimagesink sync=false
```

![](https://i.imgur.com/yLGSknH.png)

## Some extensions:

To get the coordinates of the detected bounding boxes, you can follow this guide: https://forums.developer.nvidia.com/t/get-detected-bounding-box-infomations-from-deepstream-yolo-app/77327/4

My first guess is that you would modify the file `/opt/nvidia/deepstream/deepstream-6.0/sources/objectDetector_Yolo_Mao/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp`, in the function `decodeYoloV3Tensor`; a sketch follows below.
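For instance, each decoded detection is an `NvDsInferParseObjectInfo` (fields per the DeepStream SDK's `nvdsinfer.h`), so a debug print could go right after one is filled in. This is an untested sketch: the variable name `bbi` is hypothetical, so match it to the local variable actually used in your copy of `decodeYoloV3Tensor`, and add `#include <cstdio>` if the file does not already pull it in:

```
// Hypothetical addition inside decodeYoloV3Tensor() in nvdsparsebbox_Yolo.cpp,
// placed right after an NvDsInferParseObjectInfo `bbi` has been populated:
std::printf("class=%u conf=%.2f bbox left=%.1f top=%.1f w=%.1f h=%.1f\n",
            bbi.classId, bbi.detectionConfidence,
            bbi.left, bbi.top, bbi.width, bbi.height);
```

Remember to rebuild the custom parser library afterwards (e.g., `CUDA_VER=11.4 make -C nvdsinfer_custom_impl_Yolo`, matching your CUDA version) before rerunning `deepstream-app`.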