Accelerate multi-streaming cameras with DeepStream and deploy custom (YOLO) models

tags: `Edge AI`　`DeepStream` `Edge_AI` `deployment` `Nvidia` `Jetson`


Notes on deploying on the NVIDIA Jetson platform

Basic environment setup

Model deployment and acceleration

yolov7 with multiple cameras running on DeepStream


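GitHub: ready to run out of the box

Introduction

NVIDIA DeepStream is a GStreamer-based streaming analytics toolkit for multi-sensor processing and for video, audio, and image understanding. Its goal is to provide a complete stream-processing framework, built on AI and computer vision, for processing and analyzing sensor data in real time.

See the official introduction: DeepStream SDK | NVIDIA Developer

Put simply, DeepStream is NVIDIA's acceleration toolkit for AI deployment. It speeds up the data transfer, processing, and protocol handling around the model, which makes it particularly worthwhile for multimedia streaming.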

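DeepStream application architecture

DeepStream Reference Application - deepstream-app

DeepStream components

NVIDIA DeepStream is an AI framework that helps bring out the full potential of the NVIDIA GPUs in Jetson and dGPU devices for computer vision. It powers edge devices such as the Jetson Nano and the rest of the Jetson family, processing parallel video streams in real time on the device.

DeepStream uses GStreamer pipelines (written in C) to bring the input video onto the GPU and process it faster for the downstream stages.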

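Components of DeepStream
DeepStream has a plugin-based architecture. Its graph-based pipeline interface lets high-level components be interconnected, and it supports heterogeneous parallel processing with multiple threads on the GPU and CPU.
Below are the main components of DeepStream and their high-level functions:

| Name | Description |
|---|---|
| Metadata (Meta Data) | Generated by the graph at every stage. It exposes many important fields, such as detected object type, ROI coordinates, object classification, and source. |
| Decoder | Decodes the input video (H.264 and H.265) and supports decoding multiple streams simultaneously. Takes bit depth and resolution as parameters. |
| Video Aggregator (nvstreammux) | Accepts n input streams and converts them into sequential batch frames, using low-level APIs to access the GPU and CPU. |
| Inferencing (nvinfer) | Runs inference with the configured model; all model-related work is done through nvinfer. Supports primary and secondary modes and various clustering methods. |
| Format Conversion and Scaling (nvvideoconvert) | Converts formats (e.g. YUV to RGBA/BGRA), scales the resolution, and rotates images. |
| Object Tracker (nvtracker) | Uses CUDA and is based on the KLT reference implementation; the default tracker can be replaced with other trackers. |
| Screen Tiler (nvmultistreamtiler) | Composites the output streams into a 2D grid for display, roughly the counterpart of OpenCV's imshow. |
| On Screen Display (nvdsosd) | Manages everything drawn on screen, such as lines, bounding boxes, circles, and ROIs. |
| Message Converter and Broker (nvmsgconv + nvmsgbroker) | Together, they send analytics data to a server in the cloud. |

(Source: Nvidia DeepStream – A Simplistic Guide)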

Execution flow of NVIDIA DeepStream
Decoder -> Muxer -> Inference -> Tracker (if any) -> Tiler -> Format Conversion -> On Screen Display -> Sink
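To try the execution flow above without writing any Python binding code, you can pass an equivalent pipeline to `gst-launch-1.0`. The sketch below assembles such a pipeline description for N H.264 elementary-stream files. The element names are real DeepStream plugins. The following are assumptions: the `h264parse ! nvv4l2decoder` decoder branch, the property values, the tiler-grid heuristic, and the choice of `fakesink` as the sink. The tracker is omitted because it is optional. `build_pipeline` is a hypothetical helper.

```python
import math


def build_pipeline(sources, config_file, width=1280, height=720, sink="fakesink"):
    """Return a gst-launch-1.0 command that mirrors the execution flow
    Decoder -> Muxer -> Inference -> Tiler -> Format Conversion -> OSD -> Sink
    for a list of H.264 elementary-stream files (tracker omitted)."""
    n = len(sources)
    if n == 0:
        raise ValueError("need at least one source")
    # Tiler grid: rows = floor(sqrt(n)), columns = ceil(n / rows)
    rows = int(math.sqrt(n))
    cols = math.ceil(n / rows)
    # Decoder stage: one branch per file, each linked to muxer request pad sink_<i>
    branches = [
        f"filesrc location={path} ! h264parse ! nvv4l2decoder ! m.sink_{i}"
        for i, path in enumerate(sources)
    ]
    stages = [
        f"nvstreammux name=m batch-size={n} width={width} height={height}",  # Muxer
        f"nvinfer config-file-path={config_file} batch-size={n}",           # Inference
        f"nvmultistreamtiler rows={rows} columns={cols} width={width} height={height}",  # Tiler
        "nvvideoconvert",                                                   # Format conversion
        "nvdsosd",                                                          # On-screen display
        sink,                                                               # Sink
    ]
    return "gst-launch-1.0 " + " ".join(branches) + " " + " ! ".join(stages)


if __name__ == "__main__":
    print(build_pipeline(["sample_720p.h264", "sample_720p.h264"], "dstest1_pgie_config.txt"))
```

Paste the printed command into a shell on a DeepStream machine. To actually see the output, replace `fakesink` with an on-screen sink (e.g. `nveglglessink`, which on Jetson is typically preceded by `nvegltransform`).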

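A DeepStream application consists of two parts: a configuration file and a driver program (written in C or Python).

The options available in the config.txt file are described in the Configuration Groups documentation.

Inspecting component features from the GStreamer command line

This part assumes a basic understanding of GStreamer.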


GStreamer introduction and notes
After installing GStreamer, run gst-inspect-1.0 to inspect a plugin's features. The nvinfer plugin is used as the example:

gst-inspect-1.0 nvinfer

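`nvinfer` plugin: property definitions and features

`Pad Templates` and `Pads` (data transfer interfaces)

The plugin lists a src pad (output, data producer) and a sink pad (input, data consumer).
The supported format is video/x-raw(memory:NVMM), i.e. frames held in NVIDIA's NVMM (GPU/hardware) buffers
- The accepted formats are NV12 and RGBA
- These are the frame formats nvinfer accepts on its pads; the color order fed to the network itself is set separately with model-color-format in the config file

`Element Properties` (element properties)

Many parameters are documented here. They are the same parameters that are set later in the config file.

They can also be read or set in code:
- read: .get_property()
- set: .set_property()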

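import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
config_file_path = "dstest1_pgie_config.txt"  # path to the nvinfer config file

# create "nvinfer" element
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")

# get "batch size"
print(pgie.get_property("batch-size"))

# set "config file"
pgie.set_property("config-file-path", config_file_path)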

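Getting started with the Python API (bindings)

deepstream-get-started-with-python

The following mainly follows the official DeepStream Python Apps samples, NVIDIA-AI-IOT/deepstream_python_apps, which include detailed runnable examples

DeepStream Python binding diagram

Environment setup

Follow the instructions in NVIDIA-AI-IOT/deepstream_python_apps/blob/master/bindings/README.md

Note on step 1.3: installing the deepstream_python_apps repository

Install as follows.
When you reach "1.3 Initialization of submodules",
clone the "deepstream_python_apps" repository into the sources directory under the DeepStream root, <DeepStream 6.2 ROOT>/sources:

I installed DeepStream 6.2. You can find the install location with dpkg -L deepstream-6.2. Following Linux convention, it is under /opt/; on my machine it was /opt/nvidia/deepstream/deepstream-6.2/ (the commands below use /opt/nvidia/deepstream/deepstream, which points to the installed version).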

cd /opt/nvidia/deepstream/deepstream/sources
git clone https://github.com/NVIDIA-AI-IOT/deepstream_python_apps

Note on step 2: error when compiling the bindings

Following the official instructions up to 2.1 Quick build (x86-ubuntu-20.04 | python 3.8 | Deepstream 6.2):

cd deepstream_python_apps/bindings
mkdir build
cd build
cmake ..
make

Running the official default commands (first cmake .., then make) produces 'dist/pyds-1.1.6-py3-none-linux_x86_64.whl' under /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/bindings/build. The next step is

pip3 install ./pyds-1.1.6-py3-none*.whl

which fails with the error "ERROR: pyds-1.1.6-py3-none-linux_x86_64.whl is not a supported wheel on this platform."

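Cause
- The build targets the wrong default platform (x86_64) on a Jetson (aarch64):
x86_64 is for dGPU, please add -DPIP_PLATFORM=linux_aarch64 when executing cmake.
(from the NVIDIA developer forum)
Solution
- Specify the target hardware platform when running cmake: cmake .. -DPIP_PLATFORM=linux_aarch64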
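If you build on both Jetson and x86 machines, the flag can be derived from `platform.machine()` rather than remembered. This is a minimal sketch. The helper name `cmake_command` is hypothetical. The mapping (aarch64 → linux_aarch64 for Jetson, x86_64 → linux_x86_64 for dGPU) follows the forum answer above.

```python
import platform

# Wheel platform tag per CPU architecture (hypothetical helper; values from the forum answer)
PIP_PLATFORMS = {
    "aarch64": "linux_aarch64",  # Jetson
    "x86_64": "linux_x86_64",    # x86 machine with a dGPU
}


def cmake_command(machine=None):
    """cmake invocation for the pyds bindings with -DPIP_PLATFORM matching `machine`
    (defaults to the architecture of the machine running this script)."""
    machine = machine or platform.machine()
    if machine not in PIP_PLATFORMS:
        raise ValueError(f"unsupported architecture for DeepStream: {machine}")
    return ["cmake", "..", f"-DPIP_PLATFORM={PIP_PLATFORMS[machine]}"]
```

Run the returned list from the `bindings/build` directory, e.g. with `subprocess.run(cmake_command(), check=True)`, then run `make` as before. The resulting wheel name should then end in `linux_aarch64.whl` on a Jetson.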

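Examples

Using the sample files
- The official repo provides .py samples and config files for many scenarios, including chaining several models, multistream input, Triton integration, and direct inference with local TensorRT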

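Model conversion

Conversion workflow and formats

NVIDIA's official guide: Using the TensorRT Runtime API
Taking PyTorch > ONNX > TensorRT as an example
- model.pt → model.onnx → model.engine
TensorRT engines come with two extensions, .engine and .trt
- .engine
  - the serialized form of a TensorRT model, storing the optimized, compiled network and its parameters
  - the input format DeepStream requires
- model.trt
  - the same content as an .engine file under another extension; the two can be used interchangeably
  - DeepStream cannot use it directly
  - in Python, the binary .trt file can be read with modules such as tensorrt and pycuda and deserialized for use
Why the DeepStream config should point model-engine-file at an ==.engine== file

On the first launch, the DeepStream app follows the config file, builds a TensorRT engine from the .onnx file, and writes out a model.engine file. Subsequent launches with "model-engine-file=model.engine" start much faster.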

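In DeepStream, when "onnx-file" is specified, setting "model-engine-file" is usually optional.
- "onnx-file" gives the path of the ONNX model from which DeepStream builds a TensorRT engine. The ONNX model is converted to a TensorRT engine at run time. This costs extra time and compute for conversion and optimization, and without a usable engine file it is repeated on every launch (20-30 min on an AGX Xavier).

When "model-engine-file" is specified, it gives the path of a pre-built TensorRT engine file. Pre-building means the ONNX-to-TensorRT conversion has already been done and the engine file is stored on disk. At run time DeepStream loads that engine file directly without converting and optimizing the model again, which saves startup time.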

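If "model-engine-file" is specified and the file can be loaded, DeepStream uses the pre-built engine directly and does not rebuild from "onnx-file". The ONNX file only serves as a fallback for rebuilding the engine when the engine file is missing or cannot be used (e.g. it was built with another TensorRT version).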
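The engine-first, ONNX-fallback decision can be modeled in a few lines. This is a simplified sketch, not nvinfer's actual code. `choose_model_source` is a hypothetical function, and it checks only that files exist. The real plugin also has to be able to deserialize the engine (matching GPU, TensorRT version, and settings) before it skips the rebuild.

```python
import tempfile
from pathlib import Path


def choose_model_source(model_engine_file=None, onnx_file=None):
    """Simplified model of nvinfer's choice: a present engine file wins;
    otherwise the ONNX file is compiled (slow) into a new engine."""
    if model_engine_file and Path(model_engine_file).is_file():
        return ("load-engine", model_engine_file)
    if onnx_file and Path(onnx_file).is_file():
        return ("build-from-onnx", onnx_file)
    raise FileNotFoundError("neither model-engine-file nor onnx-file is usable")


def demo():
    """Walk through a first and a second launch using empty placeholder files."""
    with tempfile.TemporaryDirectory() as d:
        onnx = Path(d, "yolov7.onnx")
        engine = Path(d, "yolov7.onnx_b1_gpu0_fp16.engine")
        onnx.touch()
        first = choose_model_source(engine, onnx)[0]   # first launch: no engine yet
        engine.touch()                                  # pretend the built engine was saved
        second = choose_model_source(engine, onnx)[0]  # later launches: engine is reused
    return first, second


if __name__ == "__main__":
    print(demo())
```

Because the two keys are complementary, keeping both in the config is a reasonable default. Later launches are fast, and if the engine disappears or becomes stale, the pipeline can still start, only slowly.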

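YOLOv7 model conversion (ONNX → TensorRT engine)

See NVIDIA-AI-IOT/yolo_deepstream/tensorrt_yolov7
- The official repo provides custom build files and an environment; follow its instructions to prepare the build environment

Getting the model

NVIDIA-AI-IOT/yolo_deepstream/yolov7_qat

A quantized INT8 version can be downloaded directly from NVIDIA-AI-IOT/yolo_deepstream/yolov7_qat
- the model trained with Quantization Aware Training (QAT-INT8) (yolov7_qat_640.onnx)
  - Model Quantization Note (notes on model quantization)

After downloading, the next step is to convert it to a TensorRT engine on your own hardware platform, since an engine is tied to the GPU and TensorRT version it was built with.

Preparing TensorRT engines

convert onnx model (.onnx) to TensorRT engine (.engine)

The conversion follows NVIDIA-AI-IOT/yolo_deepstream/tensorrt_yolov7

Because the pipeline will later take several cameras, I chose to build the engine with a dynamic batch size.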

# int8 QAT model, the onnx model with Q&DQ nodes
/usr/src/tensorrt/bin/trtexec --onnx=yolov7_qat_640.onnx \
            --saveEngine=yolov7QAT_640.engine --fp16 --int8

# if you want dynamic batch for batch inference
/usr/src/tensorrt/bin/trtexec --onnx=yolov7_qat_640.onnx \
            --minShapes=images:1x3x640x640 \
            --optShapes=images:12x3x640x640 \
            --maxShapes=images:16x3x640x640 \
            --saveEngine=yolov7QAT_640.engine --fp16 --int8

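If the number of downstream streams is fixed
- Setting '--minShapes', '--optShapes', and '--maxShapes' to the same value apparently makes it possible to enable NVIDIA's DLA (Deep Learning Accelerator) (to be confirmed)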
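The dynamic-batch command above can be generated for any camera count. This is a sketch. `trtexec_cmd` is a hypothetical helper, and it assumes the ONNX input tensor is named `images` with shape Bx3x640x640, as in yolov7_qat_640.onnx. `--optShapes` defaults to the maximum, but it is meant to be set to the expected number of streams. Passing the same value for all three batch sizes gives the fixed-shape variant mentioned above.

```python
def trtexec_cmd(onnx, engine, input_name="images", max_batch=16, opt_batch=None,
                min_batch=1, size=640, precisions=("fp16", "int8")):
    """Build a trtexec argument list for a dynamic-batch engine.
    Shapes are <input_name>:<batch>x3x<size>x<size> (NCHW, assumed)."""
    opt_batch = max_batch if opt_batch is None else opt_batch
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("need 1 <= min_batch <= opt_batch <= max_batch")

    def shape(batch):
        return f"{input_name}:{batch}x3x{size}x{size}"

    cmd = [
        "/usr/src/tensorrt/bin/trtexec",
        f"--onnx={onnx}",
        f"--minShapes={shape(min_batch)}",
        f"--optShapes={shape(opt_batch)}",  # tune for the usual number of cameras
        f"--maxShapes={shape(max_batch)}",
        f"--saveEngine={engine}",
    ]
    cmd += [f"--{p}" for p in precisions]  # e.g. --fp16 --int8 for the QAT model
    return cmd


if __name__ == "__main__":
    print(" ".join(trtexec_cmd("yolov7_qat_640.onnx", "yolov7QAT_640.engine", opt_batch=12)))
```

The ONNX model must have been exported with a dynamic batch dimension. Otherwise, trtexec cannot apply shapes other than the one baked into the file.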

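Deploying YOLO-family models with DeepStream

The main workflow follows the official NVIDIA repo NVIDIA-AI-IOT/yolo_deepstream

This unofficial guide is also worth a look: marcoslucianops/DeepStream-Yolo/customModels

Manually modifying the DeepStream Python binding samples to use a custom model (YOLOv7)

File layout, inside the /opt/nvidia/deepstream/deepstream directory

The setup is explained in two parts:

1. DeepStream model file layout and custom model compilation

Here the custom model is placed in the shared model directory so that other projects can reuse it later:
samples/models/tao_pretrained_models/yolov7/

For the build, see the instructions in NVIDIA-AI-IOT/yolo_deepstream/deepstream_yolo

Detailed build steps and file locations

Getting the build and config files
- For readability, /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/yolov7 is stored in the variable path_yolov7 for later use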

cd ~/
sudo git clone https://github.com/NVIDIA-AI-IOT/yolo_deepstream.git

path_yolov7=/opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/yolov7
sudo mkdir -p $path_yolov7

sudo cp -vr ~/yolo_deepstream/deepstream_yolo/* $path_yolov7/

At this point nvdsinfer_custom_impl_Yolo/ contains 3 files:

nvdsinfer_custom_impl_Yolo/
├── Makefile                    # build commands and rules for this custom inference implementation
├── nvdsparsebbox_Yolo.cpp      # C++ source that parses and processes the YOLO model's bounding boxes
└── nvdsparsebbox_Yolo_cuda.cu  # CUDA source with code that runs on the GPU
                                # to accelerate YOLO bounding-box parsing
                                # YOLO post-processing (decode YOLO results, NMS not included)

Enter the nvdsinfer_custom_impl_Yolo directory and build:

cd $path_yolov7/nvdsinfer_custom_impl_Yolo
sudo make
cd ..

After a successful build you get one main .so file and two compiled object files (.o):

nvdsinfer_custom_impl_Yolo/
├── Makefile
├── nvdsparsebbox_Yolo.cpp
├── nvdsparsebbox_Yolo_cuda.cu
├── libnvdsinfer_custom_impl_Yolo.so # shared library, loaded dynamically at run time;
                                     # provides the custom inference (bbox parsing) implementation
├── nvdsparsebbox_Yolo_cuda.o  # compiled CUDA object file, linked together with the other
                               # object files into libnvdsinfer_custom_impl_Yolo.so
└── nvdsparsebbox_Yolo.o  # compiled C++ object file, linked together with the other
                          # object files into libnvdsinfer_custom_impl_Yolo.so
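nvinfer loads the library named in custom-lib-path at run time and looks up the function named in parse-bbox-func-name. A typo in either therefore only surfaces when the pipeline starts. You can check the library beforehand from Python with ctypes. This sketch assumes two things: the parser functions are exported with C linkage (extern "C"), and the .so's own dependencies (TensorRT, CUDA libraries) resolve on this machine. If they do not, loading fails before the lookup. `has_symbol` is a hypothetical helper.

```python
import ctypes


def has_symbol(lib_path, symbol):
    """True if the shared library at `lib_path` exports `symbol`.

    lib_path=None inspects the symbols already loaded into this process,
    which allows trying the function without DeepStream installed.
    Raises OSError if the library (or one of its dependencies) cannot be loaded.
    """
    lib = ctypes.CDLL(lib_path)
    return hasattr(lib, symbol)  # ctypes raises AttributeError for missing symbols


if __name__ == "__main__":
    so = ("/opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/"
          "yolov7/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so")
    try:
        print(has_symbol(so, "NvDsInferParseCustomYoloV7_cuda"))
    except OSError as err:
        print("could not load library:", err)
```

If loading fails with an undefined-symbol error, `nm -D libnvdsinfer_custom_impl_Yolo.so` gives the same answer without loading the library.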

2. The DeepStream app (Python API) and config files

The Python binding samples are located in /opt/nvidia/deepstream/deepstream/sources/deepstream_python_apps/apps

2.1 Using `deepstream_test1_rtsp_in_rtsp_out/` as the example

Downloads of the modified files
- deepstream_test1_rtsp_in_rtsp_out_getconf.py
- dstest1_pgie_inferserver_config.txt

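Input
- In this sample, videos in many formats (.h264, .mp4, .mov, etc.) can be read from rtsp:// or file:// URIs
Output
- Received over RTSP
- Locally (viewing directly on the server/Jetson)
  - open rtsp://localhost:8554/ds-test in an RTSP-capable player (browsers cannot open rtsp:// URLs directly)
- Remote viewing
  - I use VLC Player
    - In "Media > Open Network Stream", enter the address rtsp://<your_server_ip>:8554/ds-test
    - For related settings, see dusty-nv/jetson-inference/Camera Streaming and Multimedia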
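Because the inputs must be URIs, local paths have to be written as `file:///...`, as in the run example later in this note. The sketch below does that conversion and builds the output address. The set of schemes passed through unchanged is an assumption, so extend it as needed. Port 8554 and mount `ds-test` are the values used above. Both helpers are hypothetical.

```python
from pathlib import Path

# URI schemes passed through unchanged (assumption; extend as needed)
STREAM_SCHEMES = ("rtsp://", "rtsps://", "http://", "https://", "file://")


def to_source_uri(src):
    """Pass stream URIs through; turn a local path into an absolute file:// URI."""
    if src.startswith(STREAM_SCHEMES):
        return src
    return Path(src).absolute().as_uri()  # also percent-encodes spaces etc.


def output_url(host="localhost", port=8554, mount="ds-test"):
    """RTSP address the sample serves its output on (port/mount as used above)."""
    return f"rtsp://{host}:{port}/{mount}"


if __name__ == "__main__":
    print(to_source_uri("sample_720p.h264"))
    print(output_url("192.168.0.20"))
```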

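Modify deepstream_test1_rtsp_in_rtsp_out.py so that the path of the config file is read from the command line

        # in main(): load the config file given on the command line
        pgie.set_property("config-file-path", config_file_path)

import argparse

def parse_args():
    # the sample already creates `parser`; shown here so the snippet is complete
    parser = argparse.ArgumentParser(description="RTSP in / RTSP out with a custom nvinfer config")
    # (the sample's other arguments stay as they are)
    parser.add_argument("-config", "--config-file", default='dstest1_pgie_config.txt',
                help="Set the config file path", type=str)

    args = parser.parse_args()
    global config_file_path
    config_file_path = args.config_file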

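2.2 Modifying the model config file

Create dstest1_pgie_yolov7_config.txt based on NVIDIA-AI-IOT/yolo_deepstream/deepstream_yolo/config_infer_primary_yoloV7.txt

pgie_config.txt: notes on the custom model config file

The changes are listed below and mostly concern where the model files are stored
- Custom model file paths
  For readability, your_model_path stands for the full path /opt/nvidia/deepstream/deepstream/samples/models/tao_pretrained_models/yolov7/
  - onnx-file=your_model_path/yolov7.onnx
  - model-engine-file=your_model_path/yolov7.onnx_b16_gpu0_fp16.engine
    - After model.onnx is loaded for the first time, DeepStream converts the ONNX model to a TensorRT engine at run time (20-30 min on an AGX Xavier) and then automatically writes out a .engine file with this name
  - labelfile-path=your_model_path/labels.txt
- Model settings
  - batch-size
    - When converting the model, remember to enable dynamic batch; see the dynamic batch notes in the TensorRT conversion section above
    - ~~How large should it be in DeepStream? Experience suggests bigger is not always better and it needs testing, but a value close to the number of input cameras is a good start~~
    - According to the official performance tips in DeepStream best practices, setting the batch size equal to the number of input sources (batch == sources) gives the best performance gain
  - parse-bbox-func-name=NvDsInferParseCustomYoloV7_cuda # use CUDA for post-processing
  - custom-lib-path=your_model_path/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
    - point this at the .so built beforehand with make
    - for the build steps and file layout, see part 1 above (DeepStream model file layout and custom model compilation)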
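The engine name in model-engine-file follows a recognizable pattern: model file name, batch size, GPU id, and precision. A small hypothetical helper can predict that name, so the config points at the file nvinfer will actually write. The pattern is inferred from the example yolov7.onnx_b16_gpu0_fp16.engine. The int8/fp32 suffixes are assumed by analogy, and the scheme may differ between DeepStream versions.

```python
# network-mode values from the config comment: 0=FP32, 1=INT8, 2=FP16
PRECISION_SUFFIX = {0: "fp32", 1: "int8", 2: "fp16"}


def expected_engine_name(model_file, batch_size, gpu_id=0, network_mode=2):
    """Hypothetical helper: predict the engine file nvinfer writes after building
    from `model_file`, following <model>_b<batch>_gpu<id>_<precision>.engine."""
    if network_mode not in PRECISION_SUFFIX:
        raise ValueError(f"unknown network-mode: {network_mode}")
    return f"{model_file}_b{batch_size}_gpu{gpu_id}_{PRECISION_SUFFIX[network_mode]}.engine"


if __name__ == "__main__":
    print(expected_engine_name("yolov7.onnx", batch_size=16))
```

If batch-size or network-mode is changed later, the expected name changes too. Updating model-engine-file at the same time avoids silently triggering a rebuild.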

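dstest1_pgie_yolov7_config.txt: detailed configuration notes

[property]
gpu-id=0
# ID of the GPU to use; 0 here

net-scale-factor=0.0039215697906911373
# Pixel scaling factor for preprocessing; equals 1/255 and maps pixel values to floats in [0, 1]

model-engine-file=../../../../samples/models/tao_pretrained_models/yolov7/yolov7.onnx_b16_gpu0_fp16.engine
# Path and name of the pre-built TensorRT engine file

onnx-file=../../../../samples/models/tao_pretrained_models/yolov7/yolov7.onnx
# Path and name of the original ONNX model

labelfile-path=../../../../samples/models/tao_pretrained_models/yolov7/labels.txt
# Path and name of the label file, one class label per line

force-implicit-batch-dim=1
# 1 = force implicit batch dimension mode, for models that do not support an explicit batch dimension


batch-size=1
#batch-size=12
# Batch size of the model, i.e. the number of images sent to the model per inference
# If the model was not built with a dynamic batch, set this to 1
# Set it equal to the number of input streams for better inference performance

## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2
# Precision mode of the network; FP16 here

num-detected-classes=80
# Number of object classes the model can detect

gie-unique-id=1
# Unique ID of this DeepStream GIE (GPU Inference Engine)

network-type=0
# Type of network; 0 means an object detector

#is-classifier=0
# Flag for classifier models; commented out, so the model is not treated as a classifier

## 1=DBSCAN, 2=NMS, 3=DBSCAN+NMS Hybrid, 4=None(No clustering)
cluster-mode=2
# Object clustering mode; NMS here

maintain-aspect-ratio=1
# Whether to keep the image aspect ratio when scaling; kept here

symmetric-padding=1
# Whether to pad symmetrically on both sides; enabled here

## Bilinear Interpolation
scaling-filter=1
# Interpolation method used for scaling; bilinear here

#parse-bbox-func-name=NvDsInferParseCustomYoloV7
parse-bbox-func-name=NvDsInferParseCustomYoloV7_cuda
# Name of the function that parses the detection output; NvDsInferParseCustomYoloV7_cuda here

#disable-output-host-copy=0
#disable-output-host-copy=1
# Whether to disable copying the output to host memory; commented out, so the copy stays enabled

custom-lib-path=../../../../samples/models/tao_pretrained_models/yolov7/nvdsinfer_custom_impl_Yolo/libnvdsinfer_custom_impl_Yolo.so
# Path and name of the custom library implementing the detection parser

#scaling-compute-hw=0
# Compute hardware used for scaling; commented out, so left at its default

## start from DS6.2
crop-objects-to-roi-boundary=1
# Whether to clip object boxes to the ROI boundary; enabled here

[class-attrs-all]
#nms-iou-threshold=0.3
#threshold=0.7
nms-iou-threshold=0.65
# IoU threshold for non-maximum suppression (NMS), used to remove overlapping boxes

pre-cluster-threshold=0.25
# Confidence threshold applied before clustering, filters out low-confidence boxes

topk=300
# Maximum number of highest-scoring detections to keep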
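Because the file is INI-like, you can read it with Python's configparser to sanity-check it before launching a pipeline. This is a sketch under the assumption that configparser parses these groups the way nvinfer does. nvinfer itself uses GLib's key-file parser, so edge cases may differ. The checks mirror the notes above: net-scale-factor ≈ 1/255, batch-size == number of sources, and at least one model file configured. Function names are hypothetical, and SAMPLE_CONFIG is an abridged copy of the file above.

```python
import configparser

# Abridged excerpt of dstest1_pgie_yolov7_config.txt, inlined so the sketch runs anywhere
SAMPLE_CONFIG = """
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
onnx-file=yolov7.onnx
batch-size=1
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=2

[class-attrs-all]
nms-iou-threshold=0.65
pre-cluster-threshold=0.25
topk=300
"""


def load_nvinfer_config(text):
    """Parse nvinfer's INI-style groups ([property], [class-attrs-all], ...)."""
    cp = configparser.ConfigParser()
    cp.read_string(text)
    return cp


def check_property(cp, num_sources):
    """Return a list of human-readable warnings for the [property] group."""
    prop = cp["property"]
    problems = []
    scale = prop.getfloat("net-scale-factor", fallback=1.0)
    if abs(scale - 1 / 255) > 1e-6:  # this YOLOv7 model expects pixels in [0, 1]
        problems.append(f"net-scale-factor={scale} is not ~1/255")
    batch = prop.getint("batch-size", fallback=1)
    if batch != num_sources:  # best practice: batch == sources
        problems.append(f"batch-size={batch} but {num_sources} sources")
    if "model-engine-file" not in prop and "onnx-file" not in prop:
        problems.append("neither model-engine-file nor onnx-file is set")
    return problems


if __name__ == "__main__":
    config = load_nvinfer_config(SAMPLE_CONFIG)
    print(check_property(config, num_sources=4))
```

For the real file, pass `open("dstest1_pgie_yolov7_config.txt").read()` to `load_nvinfer_config`. Lines starting with `#` or `##` are treated as comments by both parsers.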

Case Study

deepstream sdk-api

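Example: show confidence scores next to the class labels on screen

The full code is at YunghuiHsu/deepstream_python_apps/apps/deepstream-rtsp-in-rtsp-out

As the diagram shows, the main change is a probe added before the metadata enters the nvdsosd element (which handles on-screen display). The probe tells nvdsosd how to display the desired information; how that information appears on screen is defined in def tiler_src_pad_buffer_probe().

Extracting and displaying the confidence score

Fetching the metadata in `def tiler_src_pad_buffer_probe()`

obj_meta.text_params.display_text: the text displayed for the object
obj_meta.confidence: the predefined field of the object metadata that holds the confidence score
- When the config file declares a detector model, the object metadata is parsed in the detector format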

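deepstream_test1_rtsp_in_rtsp_out_getconf.py

def tiler_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Batch metadata attached to the buffer upstream (nvstreammux / nvinfer)
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        try:
            frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        except StopIteration:
            break
        l_obj = frame_meta.obj_meta_list
        while l_obj is not None:
            try:
                obj_meta = pyds.NvDsObjectMeta.cast(l_obj.data)  # l_obj.data -> NvDsObjectMeta
            except StopIteration:
                break
            msg = f"{class_names[obj_meta.class_id]:s}"  # class_names: labels from labels.txt
            msg += f" {obj_meta.confidence:3.2f}"
            obj_meta.text_params.display_text = msg
            try:
                l_obj = l_obj.next  # advance, otherwise the loop never ends
            except StopIteration:
                break
        try:
            l_frame = l_frame.next
        except StopIteration:
            break
    return Gst.PadProbeReturn.OK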
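The probe uses a global `class_names` list that the excerpt does not define. This is a minimal sketch, assuming the list is loaded once at start-up from the same labels.txt given as labelfile-path. The label formatting is split into plain functions so it can be tested without DeepStream. Both function names are hypothetical.

```python
def load_labels(lines):
    """Read class names, one per line as in labels.txt; blank lines are skipped."""
    return [line.strip() for line in lines if line.strip()]


def display_text(class_names, class_id, confidence):
    """Format '<label> <confidence>' exactly like the probe above."""
    return f"{class_names[class_id]:s} {confidence:3.2f}"


if __name__ == "__main__":
    names = load_labels(["person\n", "bicycle\n", "car\n"])
    print(display_text(names, 0, 0.8123))
```

In the app, something like `with open(labels_path) as f: class_names = load_labels(f)` before the pipeline starts would fill the global. Inside the probe, the two `msg` lines could then become `display_text(class_names, obj_meta.class_id, obj_meta.confidence)`.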

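A probe still has to be attached for the displayed information to be updated

Attach a probe in front of the `nvdsosd` element to update the displayed information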

deepstream_test1_rtsp_in_rtsp_out_getconf.py

    # Add probe to get informed of the meta data generated, we add probe to
    # the sink pad of the osd element, since by that time, the buffer would have
    # had got all the metadata.
    # either nvosd.get_static_pad("sink") or pgie.get_static_pad("src") works
    osdsinkpad = nvosd.get_static_pad("sink")
    if not osdsinkpad:
        sys.stderr.write(" Unable to get sink pad of nvosd \n")
    osdsinkpad.add_probe(Gst.PadProbeType.BUFFER, tiler_src_pad_buffer_probe, 0)

Running the example

python3 deepstream_test1_rtsp_in_rtsp_out_getconf.py \
       -i  file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_720p.h264  \
           file:///opt/nvidia/deepstream/deepstream/samples/streams/sample_qHD.mp4 \
       -config dstest1_pgie_yolov7_config.txt

Reference for display confidence scores
- How to include YOLOv4 confidence score in the deepstream-app output?
- https://forums.developer.nvidia.com/t/how-to-display-confidence-with-label-in-deepstream-like-person-0-81/199878/9

Passing custom message data

Using the NvDsEventMsgMeta object

It can be used to exchange messages with a server


structNvDsEventMsgMeta

Custom modification and retrieval of metadata

Storing, modifying, and retrieving custom metadata

Using custom NvDsUserMeta objects


NvDsUserMeta

NvDsInferTensorMeta

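At the [Gst-nvinfer] stage, the raw (predicted) output tensors of the TensorRT inference engine can be read directly and attached as metadata (this requires output-tensor-meta=1 in the nvinfer config)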


NvDsInferTensorMeta

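Extracting images and metadata from the DeepStream buffer for custom processing

For Python binding examples, see apps/deepstream-imagedata-multistream

As the flow chart shows, the image is taken from the FRAME BUFFER and the model's predicted tensors from the <INFERENCE> module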


imagedata diagram

For a more detailed walkthrough, see Use Deepstream python API to extract the model output tensor and customize model post-processing (e.g., YOLO-Pose)

NvBufSurface

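NvBufSurface is the NVIDIA DeepStream SDK structure that describes a batch of buffers holding decoded and processed video frames; the image data of each frame is described by an NvBufSurfaceParams entry in its surfaceList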


structNvBufSurface

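NvBufSurface provides:

- Frame metadata such as image width, height, and pixel format.
- Access to the frame data for reading and writing pixel values; ROIs are cropped and scaled together with the NvBufSurfTransform API.
- Hardware-accelerated processing of frame data, such as GPU conversion and encoding.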
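The width/height/pitch fields matter as soon as you touch pixels from the CPU. Rows are `pitch` bytes apart, and pitch can exceed width × bytes-per-pixel because of alignment padding. Below is a pure-Python sketch of that addressing, which needs no DeepStream. It assumes a single-plane RGBA frame (4 bytes per pixel), and both helper names are hypothetical.

```python
def pixel_offset(row, col, pitch, bytes_per_pixel=4):
    """Byte offset of pixel (row, col) in a pitched buffer (RGBA -> 4 bytes/pixel)."""
    return row * pitch + col * bytes_per_pixel


def strip_pitch_padding(buf, width, height, pitch, bytes_per_pixel=4):
    """Copy a row-padded frame into tightly packed bytes (width * bpp per row)."""
    row_bytes = width * bytes_per_pixel
    if pitch < row_bytes or len(buf) < pitch * (height - 1) + row_bytes:
        raise ValueError("buffer too small for the given width/height/pitch")
    rows = []
    for r in range(height):
        start = pixel_offset(r, 0, pitch, bytes_per_pixel)  # first byte of row r
        rows.append(bytes(buf[start:start + row_bytes]))    # drop the padding bytes
    return b"".join(rows)


if __name__ == "__main__":
    # 2x2 RGBA frame whose rows are padded from 8 to 12 bytes
    frame = bytes(range(8)) + b"\xff" * 4 + bytes(range(8, 16)) + b"\xff" * 4
    print(strip_pitch_padding(frame, width=2, height=2, pitch=12).hex())
```

Indexing with `row * width * 4` instead of `row * pitch` works only while the padding happens to be zero. That is why the C++ example below steps through rows by `pitch`.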

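Code example (C++)

#include <cstring>
#include <nvbufsurface.h>
#include <nvbufsurftransform.h>

// Touch the pixels of frame `batch_idx` in a pipeline-owned NvBufSurface.
// Assumes a single-plane RGBA surface (e.g. after nvvideoconvert to RGBA) in a
// CPU-mappable memory type (NVBUF_MEM_SURFACE_ARRAY on Jetson,
// NVBUF_MEM_CUDA_UNIFIED on dGPU).
void process_nvbufsurface(NvBufSurface *surface, unsigned int batch_idx) {
    NvBufSurfaceParams *params = &surface->surfaceList[batch_idx];

    // Get the frame metadata
    int width = params->width;
    int height = params->height;
    int pitch = params->pitch;  // bytes per row, may include padding
    NvBufSurfaceColorFormat colorFormat = params->colorFormat;
    if (colorFormat != NVBUF_COLOR_FORMAT_RGBA) {
        return;  // the pixel loops below assume 4 bytes per pixel
    }

    // Describe an ROI covering the whole frame; rects like these are what
    // NvBufSurfTransform uses for cropping/scaling
    NvBufSurfTransformRect src_rect;
    src_rect.top = 0;
    src_rect.left = 0;
    src_rect.width = width;
    src_rect.height = height;
    NvBufSurfTransformRect dst_rect = src_rect;
    (void)dst_rect;

    // Map plane 0 for CPU access and make the CPU view coherent
    if (NvBufSurfaceMap(surface, batch_idx, 0, NVBUF_MAP_READ_WRITE) != 0) {
        return;
    }
    NvBufSurfaceSyncForCpu(surface, batch_idx, 0);
    unsigned char *buffer = (unsigned char *)params->mappedAddr.addr[0];

    // Read pixel values (RGBA: 4 bytes per pixel, rows are `pitch` bytes apart)
    for (int row = 0; row < height; row++) {
        for (int col = 0; col < width; col++) {
            unsigned char *pixel = buffer + row * pitch + col * 4;
            unsigned char red = pixel[0];
            (void)red;  // process the pixel value here
        }
    }

    // Write pixel values: set every pixel to white (all channels 255)
    for (int row = 0; row < height; row++) {
        memset(buffer + row * pitch, 255, width * 4);
    }

    // Push the CPU writes back to the device and unmap.
    // The buffer belongs to the pipeline, so it is NOT destroyed here.
    NvBufSurfaceSyncForDevice(surface, batch_idx, 0);
    NvBufSurfaceUnMap(surface, batch_idx, 0);
}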

Basic principles of DeepStream performance optimization

DeepStream best practices

A few basic settings are listed below; for more, see the official performance tips in DeepStream best practices

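- Set the batch size equal to the number of input sources (batch == sources)
- Set the streammux height and width to the input resolution
- When streaming from RTSP or USB, set live-source=1 in the [streammux] group of the config file to ensure correct timestamps
- Tiling and visual output consume GPU resources. When the output does not need to be rendered on screen, disable the following three to maximize throughput:
  - Turn off the OSD (on-screen display)
    - set enable=0 in the [osd] group of the config file
  - The tiler creates an NxM grid for the display output stream
    - set enable=0 in the [tiled-display] group
  - Turn off the output sink
    - choose fakesink in the [sink] group, i.e. type=1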
Jetson optimization

Make sure the Jetson clocks are set to maximum

$ sudo nvpmodel -m <mode>   # for MAX perf and power, mode is 0
$ sudo jetson_clocks

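References

Official NVIDIA DeepStream documentation

DeepStream SDK | NVIDIA Developer
- official documentation and downloads
- docker DeepStream-l4t
- Using the sample files
  - The official repo provides .py samples and config files for many scenarios, including chaining several models, multistream input, Triton integration, and direct inference with local TensorRT
- Config files for deploying YOLO models: NVIDIA-AI-IOT/yolo_deepstream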
Building a Real-time Redaction App Using NVIDIA DeepStream, Part 2: Deployment | NVIDIA Technical Blog

NVIDIA official sample code

NVIDIA-AI-IOT/deepstream_python_apps: DeepStream SDK Python bindings and sample applications (github.com)
- samples using the Python interface
- NVIDIA Jetson Nano 2GB series (35): hands-on notes on the Python version of test1

Other good introductions to DeepStream concepts

2022. Galliot. NVIDIA DeepStream Python Bindings; Customize your Applications

Figure 2: A DeepStream Pipeline with two processing elements

2021. Kavika Roy. Nvidia DeepStream – A Simplistic Guide

NVIDIA tutorial videos

2023/01. NVIDIA DeepStream Technical Deep Dive: DeepStream Inference Options with Triton & TensorRT

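outline
1. Using DeepStream's inference options to handle TensorFlow, PyTorch, and ONNX models.
2. Working with TensorRT and DeepStream to optimize models.
3. Using Triton server to serve single or multiple DeepStream pipelines.
4. Using DeepStream's pre-/post-processing plugins.

DeepStream-app setup with the DS-TensorRT (gst-nvinfer) plugin

Config files for the application and the plugins
On first run, DS-TRT converts ONNX/TAO/Caffe models online into a TensorRT engine file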

DeepStream-app with DS-Triton (gst-nvinferserver) Server CAPI

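Inference approach 1 (local):
- Triton Server CAPI: loads the model repository directly inside a single process

DeepStream-app with DS-Triton (gst-nvinferserver) gRPC Inference

Inference approach 2 (remote):
- Triton gRPC Remote: sends the INPUT to a remote tritonserver app over gRPC and waits for the response

| | Inference Approach 1: Triton Server CAPI | Inference Approach 2: Triton gRPC Remote |
|---|---|---|
| Pros | Loads and runs the model locally, with better performance; no data has to cross the network | The model can be deployed on a remote (cloud) server; all Triton Server features are available |
| Cons | Limited by the memory and compute of a single process; cannot run the model on a remote server | Data goes over the network, which may hurt inference performance; the remote server must support the gRPC protocol |

In NVIDIA Triton Inference Server, "CAPI" and "gRPC" refer to different inference modes.
- CAPI
  - Stands for "C API". In this local inference mode, the application uses Triton's inference library directly to load and run models. It is typically used to embed the inference service in an application for the best performance and lowest latency. The model repository is loaded into the application process itself, so there is no network transfer or communication overhead.
- gRPC
  - gRPC is an open-source remote procedure call framework. In this remote inference mode, the client connects over the network to a remote Triton inference server, sends input data with the gRPC protocol, and waits for the server to return the inference results. The application does not load the model locally, because the model repository is loaded and managed by the Triton server. This mode is typically used for inference across a network, e.g. an inference service running in the cloud.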

DeepStream batching before inference

nvstreammux batches all input streams together before they go through inference
The nvstreammux batching strategy applies to both inference plugins
- DS-Triton (gst-nvinferserver) plugin
- DS-TensorRT (gst-nvinfer) plugin
The batch size is set in the config file

DeepStream sample apps that parse inference data

DeepStream Triton inference data flow

2022/06. NVIDIA DeepStream Technical Deep Dive: Multi-Object Tracker

2021/01. Implementing Real-time Vision AI Apps Using NVIDIA DeepStream SDK

C++ python bindings

python-bindings-overview

YOLO - DeepStream

The following repositories provide C++ samples, but combining them with the Python binding samples requires fairly involved manual changes

GITHUB

NVIDIA-AI-IOT/yolo_deepstream
- official samples
- C++ sample code
marcoslucianops/DeepStream-Yolo
- unofficial samples supporting the YOLO family of architectures
- C++ sample code
- more complete documentation and support
visualcortex-official/yolov7-deepstream
- supports integrating the efficientNMS plugin

Forum

Tutorial: How to run YOLOv7 on Deepstream
- discusses how to modify the multimedia streaming sample deepstream_python_apps/apps/deepstream-rtsp-in-rtsp-out at master · NVIDIA-AI-IOT/deepstream_python_apps
Deepstream python app yolov7 integration issue
- python3 deepstream_test_1.py /opt/nvidia/deepstream/deepstream-6.2/samples/streams/sample_720p.h264
- dstest1_pgie_config.txt → from config_infer_primary_yoloV7.txt under yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub
- nvdsinfer_custom_impl_Yolo → from nvdsinfer_custom_impl_Yolo under yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub
- labels.txt → from labels.txt under yolo_deepstream/deepstream_yolo at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub
- yolov7.onnx → from yolov7.onnx under yolo_deepstream/yolov7_qat at main · NVIDIA-AI-IOT/yolo_deepstream · GitHub

GStreamer

GStreamer introduction and notes
DeepStream is built on top of GStreamer, so it helps to understand GStreamer's basic principles before modifying a DeepStream app
gstreamer/documentation

