# Using the DeepStream Python API to Extract Model Output Tensors and Customize Model Post-Processing (e.g., YOLO-Pose)
###### tags: `Edge AI` `DeepStream` `deployment` `Nvidia` `Jetson` `YOLO-POSE` `pose estimation` `TensorRT` `pose`
## NVIDIA Jetson Platform Deployment Notes
### Basic Environment Setup
- [Jetson AGX Xavier System Setup 1: Connecting and Installing from Windows 10](https://hackmd.io/@YungHuiHsu/HJ2lcU4Rj)
- [Jetson AGX Xavier System Setup 2: Installing Docker or Building from Source](https://hackmd.io/k-lnDTxVQDWo_V13WEnfOg)
- [NVIDIA Container Toolkit Installation Notes](https://hackmd.io/wADvyemZRDOeEduJXA9X7g)
- [Checking System Performance on Jetson Edge Devices with jtop](https://hackmd.io/VXXV3T5GRIKi6ap8SkR-tg)
- [Jetson Network Setup](https://hackmd.io/WiqAB7pLSpm2863N2ISGXQ)
- [Enabling CUDA acceleration for OpenCV on the NVIDIA Jetson platform](https://hackmd.io/6IloyiWMQ_qbIpIE_c_1GA)
### Model Deployment and Acceleration
- [[Object Detection_YOLO] YOLOv7 Paper Notes](https://hackmd.io/xhLeIsoSToW0jL61QRWDcQ)
- [Deploy YOLOv7 on Nvidia Jetson](https://hackmd.io/kZftj6AgQmWJsbXsswIwEQ)
- [Convert PyTorch model to TensorRT for 3-6x speedup](https://hackmd.io/_oaJhYNqTvyL_h01X1Fdmw?both)
- [Accelerate multi-streaming cameras with DeepStream and deploy custom (YOLO) models](https://hackmd.io/@YungHuiHsu/rJKx-tv4h)
- [Use the DeepStream Python API to extract model output tensors and customize model post-processing (e.g., YOLO-Pose)](https://hackmd.io/@YungHuiHsu/rk41ISKY2)
- [Model Quantization Notes](https://hackmd.io/riYLcrp1RuKHpVI22oEAXA)
---
## GitHub Implementation
![](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/imgs/Multistream_4_YOLOv8s-pose-3.PNG?raw=true =600x)
### [github/deepstream-yolo-pose](https://github.com/YunghuiHsu/deepstream-yolo-pose)
- [ ] Introduction to the YOLO-Pose algorithm (to be added)
- [x] For environment setup and installation, see the [README](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/README.md)
- [x] For TensorRT and DeepStream tutorials, see:
    - [Convert PyTorch model to TensorRT for 3-6x speedup](https://hackmd.io/_oaJhYNqTvyL_h01X1Fdmw?both)
    - [Accelerate multi-streaming cameras with DeepStream and deploy custom (YOLO) models](https://hackmd.io/@YungHuiHsu/rJKx-tv4h)

Performance on Jetson (AGX Xavier / AGX Orin) for TensorRT engines:
| model | device | size | batch | FPS | latency (ms) |
| -------------------- |:----------:|:----:|:-----:|:-----:|:------------:|
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.pt | AGX Xavier | 640 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |
- \* yolov7w-pose uses the YOLO layer TensorRT plugin from [nanmi/yolov7-pose](https://github.com/nanmi/yolov7-pose). NMS is not included. Single batch and image size 960 only.
- .engine (TensorRT) models were tested with the `trtexec` command.<br>
- .pt models were tested with a 15 s video as the baseline ([test scripts](https://github.com/YunghuiHsu/yolov7/tree/log_metric)).
## Why YOLO-Pose
Compared with traditional top-down and bottom-up methods, YOLO-Pose offers the following advantages:
1. Higher computational efficiency: YOLO-Pose localizes all persons and their poses in a single inference pass, without multiple forward passes. In contrast, top-down methods require multiple forward passes, and bottom-up methods require complex post-processing.
2. Higher accuracy: YOLO-Pose tackles multi-person pose estimation with a new formulation and reaches its best performance without test-time augmentation, whereas top-down and bottom-up methods rely on complex post-processing and test-time augmentation to boost performance.
3. Lower complexity: the complexity of YOLO-Pose is independent of the number of people in the image, which makes multi-person pose estimation easier to handle. In contrast, the complexity of top-down methods grows with the number of people in the image, and bottom-up methods need complex post-processing.
**Strength: crowded multi-person scenes**
<div style="text-align: center;">
<figure>
<img src="https://hackmd.io/_uploads/r1Nk0FJjh.png" alt="DS_plugin_gst-nvinfer.png" width="400">
<figcaption><span style="color: grey; ">上圖為yolov5m6Pose、下圖為HigherHRNetW32,可以見到在多人擁擠場景的情況下,傳統的由下而上(Bottom-Up)方法,在關節點分群時錯誤率較高</span></figcaption>
</figure>
</div>
<div style="text-align: center;">
<figure>
<img src="https://hackmd.io/_uploads/HkHFpFkj2.png" alt="DS_plugin_gst-nvinfer.png" width="800">
<figcaption><span style="color: BLACK; font-weight: bold;">YOLO-POSE架構</span></figcaption>
</figure>
</div>
## DeepStream Python API for Extracting the Model Output Tensor
Full code: [deepstream_YOLOv8-Pose_rtsp.py](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/deepstream_YOLOv8-Pose_rtsp.py#L92)
### Extracting Model Output Tensor
To read the model predictions from the DeepStream buffer, you need to pull the metadata out of the buffers emitted by the [`Gst-nvinfer`](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html) plugin, which performs model inference in the DeepStream pipeline.
<div style="text-align: center;">
<figure>
<img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/_images/DS_plugin_gst-nvinfer.png" alt="DS_plugin_gst-nvinfer.png" width="450">
<figcaption><span style="color: green; font-weight: bold;">Gst-nvinfer</span></figcaption>
</figure>
</div>
The overall data pipeline is shown below. If you also want to grab the image data itself, you need to work with the FRAME BUFFER (a short sketch follows after the figure).
<div style="text-align: center;">
<figure>
<img src="https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/blob/master/apps/deepstream-imagedata-multistream/imagedata-app-block-diagram.png?raw=true" alt="imagedata-app-block-diagram.png" width="800">
<figcaption><span style="color: grey; font-weight: bold;"> Workflow for Extracting Model Output Tensor from `Gst-nvinfer`</span></figcaption>
</figure>
</div>
source : [deepstream-imagedata-multistream](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-imagedata-multistream)
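As a side note on the FRAME BUFFER path mentioned above: the cited deepstream-imagedata-multistream sample retrieves the decoded frame inside the same pad probe, roughly as sketched below. This is a minimal sketch, wrapped in a hypothetical helper `get_frame_image`, and it assumes the stream has been converted to RGBA by an upstream `nvvideoconvert` + `capsfilter` (as that sample requires); `frame_meta` is the per-frame `NvDsFrameMeta` described in the next section.
```python
import numpy as np
import cv2
import pyds

def get_frame_image(gst_buffer, frame_meta):
    # Map this frame of the batched buffer as a numpy RGBA array (a view, not a copy)
    n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
    # Copy it out, since the view is only valid while the buffer is mapped
    frame_image = np.array(n_frame, copy=True, order='C')
    # Convert RGBA to BGRA for OpenCV-style processing
    return cv2.cvtColor(frame_image, cv2.COLOR_RGBA2BGRA)
```
The rest of this note focuses on the tensor-metadata path only.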
#### 1. Parsing the output data of a custom model from the `NvDsBatchMeta` data structure
The target data (the model's predicted tensor) sits under `NvDsBatchMeta` > `NvDsFrameMeta` > `NvDsUserMeta` and is stored as an `NvDsInferTensorMeta` structure, which holds the attributes and data of the inference tensors.
##### `NvDsBatchMeta`
The metadata structures in DeepStream are defined as shown in the figure below.
<div style="text-align: center;">
<figure>
<img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/_images/DS_plugin_metadata.png" alt="DS_plugin_metadata.png" width="600">
<figcaption><span style="color: grey; font-weight: bold;"> NvDsBatchMeta: Basic Metadata Structure</span></figcaption>
</figure>
</div>
source : [MetaData in the DeepStream SDK](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_metadata.html)
##### `NvDsUserMeta`
- Definition of the `NvDsUserMeta` structure from the DeepStream SDK API Reference
<div style="text-align: center;">
<figure>
<img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/struct__NvDsUserMeta__coll__graph.png" alt="struct__NvDsUserMeta__coll__graph.png" width="400">
<figcaption><span style="color: grey; font-weight: bold;"> NvDsBatchMeta: Basic Metadata Structure</span></figcaption>
</figure>
</div>
source : [NVIDIA DeepStream SDK API Reference - NvDsUserMeta Struct Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/struct__NvDsUserMeta.html)
###### Custom `NvDsUserMeta` example
:::spoiler
- NVIDIA bodypose2d custom meta data structure
```cpp
/* Copy callback for the custom 2D-pose user meta:
 * duplicates the NvDs2DposeMetaData block attached to NvDsUserMeta */
static gpointer
nvds_copy_2dpose_meta (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  NvDs2DposeMetaData *p_2dpose_meta_data =
      (NvDs2DposeMetaData *) user_meta->user_meta_data;
  NvDs2DposeMetaData *pnew_2dpose_meta_data =
      (NvDs2DposeMetaData *) g_memdup (p_2dpose_meta_data,
                                       sizeof (NvDs2DposeMetaData));
  return (gpointer) pnew_2dpose_meta_data;
}
```
- Retrieving the meta data via `obj_user_meta_list`
```cpp
for (l_user = obj_meta->obj_user_meta_list; l_user != NULL;
l_user = l_user->next)
```
:::
* Python API for extracting the tensor metadata from `NvDsFrameMeta`:
###### [To read or parse inference raw tensor data of output layers](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html)
For a detailed example, see the DeepStream SDK sample `sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp`.
The steps to read and parse the raw tensor data of the model predictions are as follows:
- Enable the `output-tensor-meta` property on the `Gst-nvinfer` plugin, or enable the property of the same name in its configuration file:
```
output-tensor-meta=1
```
- Where to retrieve it:
    - When running as the **primary GIE**:
        - in `frame_user_meta_list` of each frame's `NvDsFrameMeta`
    - When running as a **secondary GIE**:
        - in `obj_user_meta_list` of each `NvDsObjectMeta`
- The `Gst-nvinfer` metadata can be accessed in a downstream `GStreamer pad probe` (see the sketch after this list).
\* GIE: GPU Inference Engine
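In Python, both steps can be wired up directly on the pipeline. The following is only a minimal sketch: it assumes an application where `Gst.init()` and pipeline construction already happened, uses the repository's config file name as an example path, and refers to the probe function `pose_src_pad_buffer_probe` shown in the Python example further below.
```python
import sys
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

# Assumes Gst.init(None) has been called and a pipeline is being assembled
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
pgie.set_property("config-file-path", "configs/dstest1_pgie_YOLOv8-Pose_config.txt")
# Same effect as output-tensor-meta=1 in the config file:
# attach the raw output tensors to the user meta
pgie.set_property("output-tensor-meta", True)

# Attach the probe downstream of Gst-nvinfer (here: its src pad),
# where the buffers already carry NVDSINFER_TENSOR_OUTPUT_META
pgie_src_pad = pgie.get_static_pad("src")
if not pgie_src_pad:
    sys.stderr.write("Unable to get src pad of pgie\n")
else:
    pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER,
                           pose_src_pad_buffer_probe, 0)
```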
:::spoiler Example: operating directly on the model's raw output tensors in bodypose2d
- [deepstream_bodypose2d_app.cpp](https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/d6094a4f9e52788ec9c42052042f9252022d04d8/apps/tao_others/deepstream-bodypose2d-app/deepstream_bodypose2d_app.cpp#L298)
```cpp=298
/* Iterate each frame metadata in batch */
for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
l_frame = l_frame->next)
{
NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
.
.
.
/* Iterate user metadata in frames to search PGIE's tensor metadata */
for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
l_user != NULL; l_user = l_user->next)
{
NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
continue;
NvDsInferTensorMeta *meta =
(NvDsInferTensorMeta *)user_meta->user_meta_data;
.
.
.
for (unsigned int i = 0; i < meta->num_output_layers; i++)
{
NvDsInferLayerInfo *info = &meta->output_layers_info[i];
info->buffer = meta->out_buf_ptrs_host[i];
NvDsInferDimsCHW LayerDims;
std::vector<NvDsInferLayerInfo>
```
:::
- Python code example for extracting `NvDsInferTensorMeta`
```python
def pose_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    # Batch-level metadata attached to the Gst buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    # For brevity, only the head of each metadata list is taken here
    l_frame = batch_meta.frame_meta_list
    frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    l_usr = frame_meta.frame_user_meta_list
    user_meta = pyds.NvDsUserMeta.cast(l_usr.data)
    # Raw inference output attached by Gst-nvinfer (requires output-tensor-meta=1)
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
```
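The snippet above keeps only the head of each metadata list for brevity. A fuller sketch, iterating every frame and every user-meta entry and filtering on the meta type (the same pattern used in NVIDIA's deepstream-ssd-parser sample), would look roughly like this:
```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst
import pyds

def pose_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)

        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            # Keep only the raw tensor output attached by Gst-nvinfer
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
                # ... parse tensor_meta here (see the next section) ...
            try:
                l_user = l_user.next
            except StopIteration:
                break

        try:
            l_frame = l_frame.next
        except StopIteration:
            break

    return Gst.PadProbeReturn.OK
```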
- P.S.: In ["nvdsparsebbox_Yolo.cpp"](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/5af35bab7f6dfca7f1f32d44847b2a91786485f4/deepstream_yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp#L227), the `output_layers_info` inside `NvDsInferTensorMeta` is extracted and parsed in the [`Gst-nvdspostprocess`] stage that follows [`Gst-nvinfer`].
#### 2. Convert the `NvDsBatchMeta` data structure to a `numpy` array
`NvDsInferTensorMeta` > `NvDsInferLayerInfo` > `layer.buffer` > `numpy array`
##### `NvDsInferTensorMeta`
<div style="text-align: center;">
<figure>
<img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/structNvDsInferTensorMeta__coll__graph.png" alt="struct__NvDsUserMeta__coll__graph.png" width="400">
<figcaption><span style="color: grey; font-weight: bold;"> NvDsInferTensorMeta </span></figcaption>
</figure>
</div>
source : [NVIDIA DeepStream SDK API Reference - NvDsInferTensorMeta Struct Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/structNvDsInferTensorMeta.html)
##### Converting the model predictions (stored in the `layer.buffer` memory block) into a numpy array
After obtaining the inference result tensor (the `NvDsInferTensorMeta` structure DeepStream uses to store inference results), the raw data is still just a buffer allocated in memory. With the help of Python's `ctypes` library, we obtain a pointer to the buffer's memory address and convert it into a numpy array, which makes subsequent processing in Python convenient.
##### `NvDsInferTensorMeta` > numpy array: Python code example
:::spoiler
- **pyds.get_ptr(layer.buffer)**: obtains a pointer to **layer.buffer**, i.e. the memory region holding the layer output.
- **ctypes.cast(pointer, POINTER(ctypes.c_float))**: **ctypes.cast** converts a pointer to another type. Here, the pointer returned by **pyds.get_ptr(layer.buffer)** is cast to a pointer to **ctypes.c_float**, so that subsequent processing can treat the data as **float** values.
- **np.ctypeslib.as_array(ptr, shape=dims)**: converts the pointer **ptr** into a NumPy array. **dims** can be obtained automatically from the model output shape via `layer_output_info.inferDims.d` and specifies the dimensions of the NumPy array. From this point on, NumPy's full functionality can be used to manipulate the array/tensor, i.e. the model predictions.
```python!
data_type_map = {pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
pyds.NvDsInferDataType.INT8: ctypes.c_int8,
pyds.NvDsInferDataType.INT32: ctypes.c_int32}
def pose_src_pad_buffer_probe(pad, info, u_data):
.
.
.
tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
# layers_info = []
# for i in range(tensor_meta.num_output_layers):
# layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
# # print(i, layer.layerName)
# layers_info.append(layer)
# if your model only have one output layer, just give "0" as key
layer_output_info = pyds.get_nvds_LayerInfo(tensor_meta, 0) # as num_output_layers == 1
# remove zeros from both ends of the array. 'b' : 'both'
dims = np.trim_zeros(layer_output_info.inferDims.d, 'b')
if frame_number == 0 :
print(f'\tModel output dimension from LayerInfo: {dims}')
# load float* buffer to python
cdata_type = data_type_map[layer_output_info.dataType]
ptr = ctypes.cast(pyds.get_ptr(layer_output_info.buffer),
ctypes.POINTER(cdata_type))
# Determine the size of the array
# Automatic acquisition of buffer memory sizes based on output in the model
out = np.ctypeslib.as_array(ptr, shape=dims)
```
:::
#### 3. Customized Tensor Post-Processing Based on the Model Output
Inspecting the output of [YOLOv8-pose](https://github.com/ultralytics/ultralytics), the tensor shape is (batch, anchors, max_outputs):
![](https://hackmd.io/_uploads/Bym3Mynt3.png =800x)
`netron yolov8s-pose.onnx`
![](https://hackmd.io/_uploads/ryDZ712Yn.png =400x)
Since the output shape and anchor layout of the `yolov7-pose` and `yolov8-pose` models differ,
and the post-processing function I use expects input of shape (batch, max_outputs, anchors) that also includes class-label probabilities, the output tensor of `yolov8-pose` is manually adjusted.
:::info
- Human body keypoints
    YOLO-Pose predicts 17 human keypoints. Each keypoint contains {$x,y,conf$}, where conf is 0, 1, or 2, meaning "not present", "present but occluded", or "present and not occluded". The person-related part of a label therefore has 17 × 3 (x, y, conf) = 51 elements, and the bbox label has 6 elements (x, y, w, h, box_conf, cls_conf), giving 51 + 6 = 57 elements in total, expressed as:
$$ P_{v} = \{C_{x}, C_{y}, W, H, box_{conf}, class_{conf}, K_{x}^{1}, K_{y}^{1}, K_{conf}^{1}, \cdots, K_{x}^{n}, K_{y}^{n}, K_{conf}^{n}\} $$
![](https://hackmd.io/_uploads/B1aRBJntn.png =400x)
![](https://hackmd.io/_uploads/SJgRLT3Y2.png =400x)
- anchors of yolov7-pose
- 57 # bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
- anchors of yolov8-pose
- 56 # bbox(4) + confidence(1) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
:::
```python=
def pose_src_pad_buffer_probe(pad, info, u_data):
.
network_info = tensor_meta.network_info
.
# [Optional] Postprocess for yolov8-pose prediction tensor
# [YOLOv8](https://github.com/ultralytics/ultralytics)
# (batch, 56, 8400) >(batch, 8400, 56) for yolov8
out = out.transpose((0, 2, 1))
    # make a pseudo class-probability column (single "person" class)
    cls_prob = np.ones((out.shape[0], out.shape[1], 1), dtype=np.uint8)
    # scale bbox values to [0, 1]; map_to_zero_one is a helper defined in the full script
    out[..., :4] = map_to_zero_one(out[..., :4])
# insert pseudo class prob into predictions
out = np.concatenate((out[..., :5], cls_prob, out[..., 5:]), axis=-1)
out[..., [0, 2]] = out[..., [0, 2]] * network_info.width # scale to screen width
out[..., [1, 3]] = out[..., [1, 3]] * network_info.height # scale to screen height
```
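After this rearrangement, each row of `out` follows the 57-element layout described in the info box above, so a decode step can slice it into boxes and keypoints. The following is only a minimal sketch (confidence threshold plus a plain greedy NMS) with hypothetical function and parameter names; the post-processing actually used in the repository may differ in detail.
```python
import numpy as np

def decode_pose_predictions(pred, conf_thres=0.25, iou_thres=0.45):
    """pred: (num_preds, 57) array laid out as
    [cx, cy, w, h, box_conf, cls_conf, kpt_x1, kpt_y1, kpt_conf1, ...]."""
    # Keep only predictions above the box-confidence threshold
    pred = pred[pred[:, 4] > conf_thres]

    # Convert center-size boxes to corner format for NMS
    boxes = pred[:, :4].copy()
    boxes[:, 0] = pred[:, 0] - pred[:, 2] / 2  # x1
    boxes[:, 1] = pred[:, 1] - pred[:, 3] / 2  # y1
    boxes[:, 2] = pred[:, 0] + pred[:, 2] / 2  # x2
    boxes[:, 3] = pred[:, 1] + pred[:, 3] / 2  # y2
    scores = pred[:, 4]
    kpts = pred[:, 6:].reshape(-1, 17, 3)      # (N, 17, [x, y, conf])

    # Greedy IoU-based NMS
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou <= iou_thres]

    return boxes[keep], scores[keep], kpts[keep]
```
For a single frame in the batch, something like `boxes, scores, kpts = decode_pose_predictions(out[0])` would then yield screen-space boxes, their scores, and 17 keypoints per detection.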
## Config File
[dstest1_pgie_YOLOv8-Pose_config.txt](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/configs/dstest1_pgie_YOLOv8-Pose_config.txt)
The following parameters need special attention when using a custom model.
#### `model-engine-file`
Specify the path to your TensorRT engine. I usually place it under `/opt/nvidia/deepstream/deepstream/samples/models/` and give the absolute path.
- `model-engine-file=<your engine path>`
#### `network-type`
Specifies the network type. Since DeepStream does not yet ship a built-in pose-estimation network type, set it to other: `100`.
> [Gst-nvdspreprocess & Gst-nvdspostprocess plugin]
> 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
- `network-type=100`
#### `output-tensor-meta`
Must be set to `1` (true) so that the model output can be retrieved from `NvDsUserMeta`.
- `output-tensor-meta=1`
#### `tensor-meta-pool-size`
The default pool size for a single batch is 6, and it is allocated automatically even if you do not set it. I tried larger values, but the speedup was not noticeable.
> Only relevant if batch-size != 1.
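Putting the parameters above together, an illustrative `[property]` excerpt might look like the following; the engine path and the commented-out pool size are placeholders, and the linked config file remains the authoritative version.
```
[property]
# Absolute path to the TensorRT engine (placeholder path)
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/yolov8s-pose.engine
# 100 = Other: no built-in post-processing, parse the raw tensors yourself
network-type=100
# Attach raw output tensors to NvDsUserMeta
output-tensor-meta=1
# Optional; defaults to 6 (placeholder value shown)
#tensor-meta-pool-size=12
```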
## Reference
#### [deepstream-imagedata-multistream](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-imagedata-multistream)
- Example of retrieving image data for post-processing
#### [deepstream-ssd-parser](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser)
- Contains a custom metadata parsing approach worth referencing
#### [NVIDIA DeepStream SDK API Reference ](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/index.html)
#### [DEEPSTREAM PYTHON API REFERENCE/NvDsInfer](https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsInfer/NvDsInfer_toc.html)
#### [Using a Custom Model with DeepStream](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_using_custom_model.html)