# Using the DeepStream Python API to Extract Model Output Tensors and Customize Post-Processing (e.g., YOLO-Pose)

###### tags: `Edge AI` `DeepStream` `deployment` `Nvidia` `Jetson` `YOLO-POSE` `pose estimation` `TensorRT` `pose`

## Notes on Deployment on the NVIDIA Jetson Platform

### Basic Environment Setup
- [Jetson AGX Xavier System Setup 1: Connecting and Installing from a Windows 10 Environment](https://hackmd.io/@YungHuiHsu/HJ2lcU4Rj)
- [Jetson AGX Xavier System Setup 2: Installing Docker or Building from Source](https://hackmd.io/k-lnDTxVQDWo_V13WEnfOg)
- [NVIDIA Container Toolkit Installation Notes](https://hackmd.io/wADvyemZRDOeEduJXA9X7g)
- [Checking System Performance on Jetson Edge Devices with jtop](https://hackmd.io/VXXV3T5GRIKi6ap8SkR-tg)
- [Jetson Network Setup](https://hackmd.io/WiqAB7pLSpm2863N2ISGXQ)
- [OpenCV with CUDA Acceleration on the NVIDIA Jetson Platform](https://hackmd.io/6IloyiWMQ_qbIpIE_c_1GA)

### Model Deployment and Acceleration
- [[Object Detection_YOLO] YOLOv7 Paper Notes](https://hackmd.io/xhLeIsoSToW0jL61QRWDcQ)
- [Deploy YOLOv7 on Nvidia Jetson](https://hackmd.io/kZftj6AgQmWJsbXsswIwEQ)
- [Convert PyTorch Models to TensorRT for a 3-6x Speedup](https://hackmd.io/_oaJhYNqTvyL_h01X1Fdmw?both)
- [Accelerate Multi-Stream Cameras with DeepStream and Deploy Custom (YOLO) Models](https://hackmd.io/@YungHuiHsu/rJKx-tv4h)
- [Use the DeepStream Python API to Extract the Model Output Tensor and Customize Model Post-Processing (e.g., YOLO-Pose)](https://hackmd.io/@YungHuiHsu/rk41ISKY2)
- [Model Quantization Notes](https://hackmd.io/riYLcrp1RuKHpVI22oEAXA)

---

## GitHub Implementation

![](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/imgs/Multistream_4_YOLOv8s-pose-3.PNG?raw=true =600x)

### [github/deepstream-yolo-pose](https://github.com/YunghuiHsu/deepstream-yolo-pose)

- [ ] Introduction to the YOLO-Pose algorithm (to be added)
- [x] For environment setup and installation, see the [README](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/README.md)
- [x] For TensorRT and DeepStream tutorials, see:
    - [Convert PyTorch Models to TensorRT for a 3-6x Speedup](https://hackmd.io/_oaJhYNqTvyL_h01X1Fdmw?both)
    - [Accelerate Multi-Stream Cameras with DeepStream and Deploy Custom (YOLO) Models](https://hackmd.io/@YungHuiHsu/rJKx-tv4h)

Performance of the TensorRT engines on Jetson (AGX Xavier / AGX Orin)

| model | device | size | batch | FPS | latency (ms) |
| -------------------- |:----------:|:----:|:-----:|:-----:|:----:|
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.pt | AGX Xavier | 640 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |

- \* yolov7w-pose with the YOLO-layer TensorRT plugin from [nanmi/yolov7-pose](https://github.com/nanmi/yolov7-pose). NMS not included. Single batch and image size 960 only.
- `.engine` (TensorRT) models were benchmarked with the `trtexec` command.
- `.pt` models were benchmarked on a 15 s video as the baseline ([test scripts](https://github.com/YunghuiHsu/yolov7/tree/log_metric)).

## Why YOLO-Pose

Compared with traditional top-down and bottom-up methods, YOLO-Pose has the following advantages:

1. Higher computational efficiency: YOLO-Pose localizes every person and their pose in a single inference pass, without multiple forward passes. In contrast, top-down methods need one forward pass per detected person, and bottom-up methods need complex post-processing to group keypoints.
2. Higher accuracy: YOLO-Pose handles multi-person pose estimation with a new formulation and reaches its best performance without test-time augmentation, whereas top-down and bottom-up methods rely on elaborate post-processing and test-time augmentation to boost accuracy.
3. Lower complexity: the complexity of YOLO-Pose is independent of the number of people in the image, which makes multi-person pose estimation easier to handle. In contrast, the cost of top-down methods grows with the number of people, and bottom-up methods still require complex post-processing.
**Strength: crowded multi-person scenes**

<div style="text-align: center;">
<figure>
  <img src="https://hackmd.io/_uploads/r1Nk0FJjh.png" alt="yolov5m6-pose_vs_HigherHRNetW32.png" width="400">
    <figcaption><span style="color: grey; ">Top: yolov5m6-pose. Bottom: HigherHRNet-W32. In crowded multi-person scenes, the traditional bottom-up approach makes noticeably more mistakes when grouping keypoints into individuals.</span></figcaption>
</figure>
</div>

<div style="text-align: center;">
<figure>
  <img src="https://hackmd.io/_uploads/HkHFpFkj2.png" alt="YOLO-Pose_architecture.png" width="800">
    <figcaption><span style="color: BLACK; font-weight: bold;">YOLO-Pose architecture</span></figcaption>
</figure>
</div>

## DeepStream Python API for Extracting the Model Output Tensor

Full code: [deepstream_YOLOv8-Pose_rtsp.py](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/deepstream_YOLOv8-Pose_rtsp.py#L92)

### Extracting the Model Output Tensor

To read the model predictions from the DeepStream buffer, the metadata has to be pulled out of the buffer emitted by the [`Gst-nvinfer`](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html) plugin, which is the element in the DeepStream pipeline responsible for model inference.

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/_images/DS_plugin_gst-nvinfer.png" alt="DS_plugin_gst-nvinfer.png" width="450">
    <figcaption><span style="color: green; font-weight: bold;">Gst-nvinfer</span></figcaption>
</figure>
</div>

The overall data pipeline is shown below. If you also want to grab the image data itself, you have to work with the FRAME BUFFER.

<div style="text-align: center;">
<figure>
  <img src="https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/blob/master/apps/deepstream-imagedata-multistream/imagedata-app-block-diagram.png?raw=true" alt="imagedata-app-block-diagram.png" width="800">
    <figcaption><span style="color: grey; font-weight: bold;"> Workflow for Extracting the Model Output Tensor from `Gst-nvinfer`</span></figcaption>
</figure>
</div>

source : [deepstream-imagedata-multistream](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-imagedata-multistream)
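In practice the extraction happens inside a GStreamer buffer probe attached to a pad downstream of `Gst-nvinfer`. The snippet below is only a minimal sketch of how such a probe is typically registered with the DeepStream Python bindings; the element variable `pgie` and the mostly empty probe body are placeholders rather than the repo's actual code. The steps that follow then walk the metadata inside this probe.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def pose_src_pad_buffer_probe(pad, info, u_data):
    """Called once per buffer flowing out of the nvinfer element."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Retrieve the batch metadata attached to the Gst buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    # ... walk batch_meta here (see steps 1-3 below) ...
    return Gst.PadProbeReturn.OK

# Assuming `pgie` is the Gst-nvinfer element created elsewhere in the pipeline:
pgie_src_pad = pgie.get_static_pad("src")
pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pose_src_pad_buffer_probe, 0)
```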
#### 1. Parsing the output data of a custom model from the `NvDsBatchMeta` data structure

The target (the tensor predicted by the model) lives under `NvDsBatchMeta` > `NvDsFrameMeta` > `NvDsUserMeta`, where it is stored as an `NvDsInferTensorMeta` structure that holds the attributes and data of the inference output tensors.

##### `NvDsBatchMeta`
The metadata structures in DeepStream are defined as shown below.

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/_images/DS_plugin_metadata.png" alt="DS_plugin_metadata.png" width="600">
    <figcaption><span style="color: grey; font-weight: bold;"> NvDsBatchMeta: Basic Metadata Structure</span></figcaption>
</figure>
</div>

source : [MetaData in the DeepStream SDK](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_metadata.html)

##### `NvDsUserMeta`
- Definition of the `NvDsUserMeta` structure in the DeepStream SDK API Reference

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/struct__NvDsUserMeta__coll__graph.png" alt="struct__NvDsUserMeta__coll__graph.png" width="400">
    <figcaption><span style="color: grey; font-weight: bold;"> NvDsUserMeta Struct</span></figcaption>
</figure>
</div>

source : [NVIDIA DeepStream SDK API Reference - NvDsUserMeta Struct Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/struct__NvDsUserMeta.html)

###### Example of a custom `NvDsUserMeta`

:::spoiler
- Custom meta data structure from NVIDIA's bodypose2D sample
```cpp
static gpointer nvds_copy_2dpose_meta (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  NvDs2DposeMetaData *p_2dpose_meta_data =
      (NvDs2DposeMetaData *)user_meta->user_meta_data;
  NvDs2DposeMetaData *pnew_2dpose_meta_data =
      (NvDs2DposeMetaData *)g_memdup (p_2dpose_meta_data, sizeof(NvDs2DposeMetaData));
  return (gpointer) pnew_2dpose_meta_data;
}
```
- Retrieving the meta data through `obj_user_meta_list`
```cpp
for (l_user = obj_meta->obj_user_meta_list; l_user != NULL; l_user = l_user->next)
```
:::

The following describes how to pull the tensor metadata out of `NvDsFrameMeta` with the Python API.

###### [To read or parse inference raw tensor data of output layers](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html)

For a detailed example, see `sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp` in the DeepStream SDK samples.

To read and parse the raw tensor data of the model predictions:

- Enable the `output-tensor-meta` property on the `Gst-nvinfer` plugin, or set the property of the same name in its configuration file
    ```
    output-tensor-meta=1
    ```
- Where to retrieve the metadata
    - When `Gst-nvinfer` runs as the **primary GIE**: from `frame_user_meta_list` in each frame's `NvDsFrameMeta`
    - When it runs as a **secondary GIE**: from `obj_user_meta_list` in each `NvDsObjectMeta`
    - The metadata attached by `Gst-nvinfer` can be accessed in a downstream `GStreamer pad probe`

\* GIE: GPU Inference Engine

:::spoiler Example: bodypose2d operating directly on the raw model output tensors
- [deepstream_bodypose2d_app.cpp](https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/d6094a4f9e52788ec9c42052042f9252022d04d8/apps/tao_others/deepstream-bodypose2d-app/deepstream_bodypose2d_app.cpp#L298)

```cpp=298
/* Iterate each frame metadata in batch */
for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
    l_frame = l_frame->next) {
  NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
  . . .
  /* Iterate user metadata in frames to search PGIE's tensor metadata */
  for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
      l_user != NULL; l_user = l_user->next) {
    NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
    if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
      continue;
    NvDsInferTensorMeta *meta =
        (NvDsInferTensorMeta *)user_meta->user_meta_data;
    . . .
    for (unsigned int i = 0; i < meta->num_output_layers; i++) {
      NvDsInferLayerInfo *info = &meta->output_layers_info[i];
      info->buffer = meta->out_buf_ptrs_host[i];
      NvDsInferDimsCHW LayerDims;
      std::vector<NvDsInferLayerInfo>
      . . .
```
:::
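For reference, here is a hedged Python analogue of the C++ iteration above, following the same pattern as NVIDIA's deepstream-ssd-parser sample: walk the frame list and the per-frame user-meta list, keep only entries whose meta type is `NVDSINFER_TENSOR_OUTPUT_META`, and cast them to `NvDsInferTensorMeta`. It is a sketch (the function name is mine), not the exact code of this repo; the repo's simplified excerpt follows right after.

```python
import pyds

def find_tensor_meta(batch_meta):
    """Yield every NvDsInferTensorMeta attached to the frames of a batch."""
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            # Keep only the raw tensor output attached by Gst-nvinfer
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                yield pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            l_user = l_user.next
        l_frame = l_frame.next
```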
- Python code example for extracting `NvDsInferTensorMeta`
```python
def pose_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    # Retrieve the batch metadata from the Gst buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    # For brevity, only the first frame / user-meta entry is taken here
    l_frame = batch_meta.frame_meta_list
    frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    l_usr = frame_meta.frame_user_meta_list
    user_meta = pyds.NvDsUserMeta.cast(l_usr.data)
    # Raw inference output attached by Gst-nvinfer (output-tensor-meta=1)
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
```

- P.S.: In ["nvdsparsebbox_Yolo.cpp"](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/5af35bab7f6dfca7f1f32d44847b2a91786485f4/deepstream_yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp#L227), the `output_layers_info` inside `NvDsInferTensorMeta` is parsed in the `Gst-nvdspostprocess` stage that follows `Gst-nvinfer`.

#### 2. Convert the `NvDsBatchMeta` data to a `numpy` array

`NvDsInferTensorMeta` > `NvDsInferLayerInfo` > `layer.buffer` > `numpy array`

##### `NvDsInferTensorMeta`

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/structNvDsInferTensorMeta__coll__graph.png" alt="structNvDsInferTensorMeta__coll__graph.png" width="400">
    <figcaption><span style="color: grey; font-weight: bold;"> NvDsInferTensorMeta </span></figcaption>
</figure>
</div>

source : [NVIDIA DeepStream SDK API Reference - NvDsInferTensorMeta Struct Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/structNvDsInferTensorMeta.html)

##### Converting the model predictions (stored in the `layer.buffer` memory block) into a numpy array

After obtaining the tensor holding the inference results (the `NvDsInferTensorMeta` structure DeepStream uses to store them), the raw data is still just a buffer allocated in memory. With the help of Python's `ctypes` library, we take a pointer to that buffer and convert it into a numpy array, which makes the subsequent processing in Python much more convenient.

##### `NvDsInferTensorMeta` > numpy array : Python code example
:::spoiler
- **pyds.get_ptr(layer.buffer)**: returns a pointer to **layer.buffer**, which is a block of memory.
- **ctypes.cast(pointer, POINTER(ctypes.c_float))**: **ctypes.cast** converts the pointer type. Here the pointer returned by **pyds.get_ptr(layer.buffer)** is cast to a pointer to **ctypes.c_float**, so the subsequent processing can treat the data as **float**.
- **np.ctypeslib.as_array(ptr, shape=dims)**: converts the pointer **ptr** into a NumPy array. **dims**, obtained automatically from the `layer_output_info.inferDims.d` API, specifies the array dimensions, i.e. the shape of the model output. From here on, the full power of `NumPy` can be used to manipulate this array/tensor (the model predictions).
```python!
data_type_map = {pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
                 pyds.NvDsInferDataType.INT8: ctypes.c_int8,
                 pyds.NvDsInferDataType.INT32: ctypes.c_int32}

def pose_src_pad_buffer_probe(pad, info, u_data):
    .
    .
    .
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)

    # layers_info = []
    # for i in range(tensor_meta.num_output_layers):
    #     layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
    #     # print(i, layer.layerName)
    #     layers_info.append(layer)

    # if your model only has one output layer, just pass "0" as the index
    layer_output_info = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # as num_output_layers == 1

    # remove zeros from both ends of the array. 'b' : 'both'
    dims = np.trim_zeros(layer_output_info.inferDims.d, 'b')
    if frame_number == 0:
        print(f'\tModel output dimension from LayerInfo: {dims}')

    # load the float* buffer into Python
    cdata_type = data_type_map[layer_output_info.dataType]
    ptr = ctypes.cast(pyds.get_ptr(layer_output_info.buffer), ctypes.POINTER(cdata_type))
    # determine the array size automatically from the model output dims
    out = np.ctypeslib.as_array(ptr, shape=dims)
```
:::
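The example above assumes a single output layer. As a rough generalization, and only as a sketch (the function name and dict-based return are mine, not the repo's), the same pyds calls can be wrapped to convert every output layer of a model into a numpy array keyed by layer name:

```python
import ctypes
import numpy as np
import pyds

DATA_TYPE_MAP = {pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
                 pyds.NvDsInferDataType.INT8: ctypes.c_int8,
                 pyds.NvDsInferDataType.INT32: ctypes.c_int32}

def tensor_meta_to_numpy(tensor_meta):
    """Convert every output layer of an NvDsInferTensorMeta into a numpy array."""
    outputs = {}
    for i in range(tensor_meta.num_output_layers):
        layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
        # inferDims.d is zero-padded up to the maximum rank; trim the padding
        dims = tuple(np.trim_zeros(layer.inferDims.d, 'b'))
        cdata_type = DATA_TYPE_MAP[layer.dataType]
        ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(cdata_type))
        outputs[layer.layerName] = np.ctypeslib.as_array(ptr, shape=dims)
    return outputs
```

Note that the resulting arrays are views over DeepStream's own buffers; call `.copy()` on them if you need to keep the data after the probe returns.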
#### 3. Customized Tensor Post-Processing Based on the Model Output

Inspecting the output of [YOLOv8-pose](https://github.com/ultralytics/ultralytics), the tensor shape is (batch, anchors, max_outputs), i.e. (batch, 56, 8400) for yolov8s-pose.

![](https://hackmd.io/_uploads/Bym3Mynt3.png =800x)

`netron yolov8s-pose.onnx`

![](https://hackmd.io/_uploads/ryDZ712Yn.png =400x)

Because the output shapes and per-anchor vectors of `yolov7-pose` and `yolov8-pose` differ, and the post-processing function I use expects input in the (batch, max_outputs, anchors) layout, i.e. (batch, 8400, 57), including a class-probability entry, the `yolov8-pose` output tensor has to be adjusted manually.

:::info
- Human keypoints
    YOLO-Pose predicts 17 human keypoints, each encoded as {$x, y, conf$}; in the ground-truth annotations the confidence/visibility term takes the value 0, 1, or 2, meaning "not present", "present but occluded", or "present and visible". The keypoint part of a person's label therefore has 17 x 3 = 51 elements (3 for x, y, conf), and the bbox part has 6 elements (x, y, w, h, box_conf, cls_conf), for a total of 51 + 6 = 57 elements:

$$
P_{v} = \{C_{x}, C_{y}, W, H, box_{conf}, class_{conf}, K_{x}^{1}, K_{y}^{1}, K_{conf}^{1}, \cdots, K_{x}^{n}, K_{y}^{n}, K_{conf}^{n}\}
$$

![](https://hackmd.io/_uploads/B1aRBJntn.png =400x)
![](https://hackmd.io/_uploads/SJgRLT3Y2.png =400x)

- anchors of yolov7-pose
    - 57 # bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
- anchors of yolov8-pose
    - 56 # bbox(4) + confidence(1) + cls(0) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
:::

```python=
def pose_src_pad_buffer_probe(pad, info, u_data):
    .
    network_info = tensor_meta.network_info
    .
    # [Optional] Post-processing for the yolov8-pose prediction tensor
    # [YOLOv8](https://github.com/ultralytics/ultralytics)
    # (batch, 56, 8400) -> (batch, 8400, 56) for yolov8
    out = out.transpose((0, 2, 1))

    # make a pseudo class probability
    cls_prob = np.ones((out.shape[0], out.shape[1], 1), dtype=np.uint8)
    # normalize box coordinates to [0, 1] (map_to_zero_one is a helper from the repo)
    out[..., :4] = map_to_zero_one(out[..., :4])
    # insert the pseudo class probability into the predictions
    out = np.concatenate((out[..., :5], cls_prob, out[..., 5:]), axis=-1)
    out[..., [0, 2]] = out[..., [0, 2]] * network_info.width   # scale to screen width
    out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
```

## Config file

[dstest1_pgie_YOLOv8-Pose_config.txt](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/configs/dstest1_pgie_YOLOv8-Pose_config.txt)

The following parameters need special attention when deploying a custom model.

#### `model-engine-file`
Path to your TensorRT engine. I usually put the engine under `/opt/nvidia/deepstream/deepstream/samples/models/` and specify an absolute path.
- `model-engine-file=<your engine path>`

#### `network-type`
Specifies the network type. Since DeepStream does not yet provide a built-in pose-estimation network type, it must be set to "other": `100`.
> [Gst-nvdspreprocess & Gst-nvdspostprocess plugin]
> 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
- `network-type=100`

#### `output-tensor-meta`
Must be set to `1` (true) so that the model output can be retrieved from `NvDsUserMeta`.
- `output-tensor-meta=1`

#### `tensor-meta-pool-size`
Defaults to 6 per batch and is allocated automatically even if unset. I tried larger values (with batch-size != 1) but the speedup was not noticeable.
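Putting these together, a minimal `[property]` fragment for a custom pose model might look like the sketch below. This is not the full config from the repo (see the linked `dstest1_pgie_YOLOv8-Pose_config.txt` for that); the engine path and batch size are placeholders.

```
[property]
gpu-id=0
# absolute path to your TensorRT engine (placeholder)
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/yolov8s-pose.engine
batch-size=1
# 100 = "Other", since DeepStream has no built-in pose network type
network-type=100
# attach the raw output tensors to the metadata
output-tensor-meta=1
# optional; defaults to 6
tensor-meta-pool-size=6
gie-unique-id=1
```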
## Reference

#### [deepstream-imagedata-multistream](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-imagedata-multistream)
- Example of extracting the image data for post-processing
#### [deepstream-ssd-parser](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser)
- Contains a custom metadata-parsing approach worth referencing
#### [NVIDIA DeepStream SDK API Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/index.html)
#### [DEEPSTREAM PYTHON API REFERENCE / NvDsInfer](https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsInfer/NvDsInfer_toc.html)
#### [Using a Custom Model with DeepStream](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_using_custom_model.html)