# Using the DeepStream Python API to Extract Model Output Tensors and Customize Post-Processing (e.g., YOLO-Pose)

###### tags: `Edge AI` `DeepStream` `deployment` `Nvidia` `Jetson` `YOLO-POSE` `pose estimation` `TensorRT` `pose`

## Notes on Deployment on the NVIDIA Jetson Platform

### Basic Environment Setup
- [Jetson AGX Xavier System Setup 1: Connecting and Installing from a Windows 10 Environment](https://hackmd.io/@YungHuiHsu/HJ2lcU4Rj)
- [Jetson AGX Xavier System Setup 2: Installing Docker or Building from Source](https://hackmd.io/k-lnDTxVQDWo_V13WEnfOg)
- [NVIDIA Container Toolkit Installation Notes](https://hackmd.io/wADvyemZRDOeEduJXA9X7g)
- [Checking System Performance on Jetson Edge Devices with jtop](https://hackmd.io/VXXV3T5GRIKi6ap8SkR-tg)
- [Jetson Network Setup](https://hackmd.io/WiqAB7pLSpm2863N2ISGXQ)
- [OpenCV with CUDA Acceleration on the NVIDIA Jetson Platform](https://hackmd.io/6IloyiWMQ_qbIpIE_c_1GA)

### Model Deployment and Acceleration
- [[Object Detection_YOLO] YOLOv7 Paper Notes](https://hackmd.io/xhLeIsoSToW0jL61QRWDcQ)
- [Deploy YOLOv7 on Nvidia Jetson](https://hackmd.io/kZftj6AgQmWJsbXsswIwEQ)
- [Convert PyTorch Models to TensorRT for a 3-6x Speedup](https://hackmd.io/_oaJhYNqTvyL_h01X1Fdmw?both)
- [Accelerate Multi-Stream Cameras with DeepStream and Deploy Custom (YOLO) Models](https://hackmd.io/@YungHuiHsu/rJKx-tv4h)
- [Use the DeepStream Python API to Extract the Model Output Tensor and Customize Model Post-Processing (e.g., YOLO-Pose)](https://hackmd.io/@YungHuiHsu/rk41ISKY2)
- [Model Quantization Notes](https://hackmd.io/riYLcrp1RuKHpVI22oEAXA)

---

## GitHub Implementation

![](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/imgs/Multistream_4_YOLOv8s-pose-3.PNG?raw=true =600x)

### [github/deepstream-yolo-pose](https://github.com/YunghuiHsu/deepstream-yolo-pose)

- [ ] Introduction to the YOLO-Pose algorithm (to be added)
- [x] For environment setup and installation, see the [README](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/README.md)
- [x] For TensorRT and DeepStream tutorials, see:
    - [Convert PyTorch Models to TensorRT for a 3-6x Speedup](https://hackmd.io/_oaJhYNqTvyL_h01X1Fdmw?both)
    - [Accelerate Multi-Stream Cameras with DeepStream and Deploy Custom (YOLO) Models](https://hackmd.io/@YungHuiHsu/rJKx-tv4h)

Performance of the TensorRT engines on Jetson (AGX Xavier / AGX Orin)

| model | device | size | batch | FPS | latency (ms) |
| -------------------- |:----------:|:----:|:-----:|:-----:|:----:|
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.pt | AGX Xavier | 640 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |

- \* yolov7w-pose with the YOLO-layer TensorRT plugin from [nanmi/yolov7-pose](https://github.com/nanmi/yolov7-pose). NMS not included. Single batch and image size 960 only.
- `.engine` (TensorRT) models were benchmarked with the `trtexec` command.
- `.pt` models were benchmarked on a 15 s video as the baseline ([test scripts](https://github.com/YunghuiHsu/yolov7/tree/log_metric)).

## Why YOLO-Pose

Compared with traditional top-down and bottom-up methods, YOLO-Pose has the following advantages:

1. Higher computational efficiency: YOLO-Pose localizes every person and their pose in a single inference pass, without multiple forward passes. In contrast, top-down methods need one forward pass per detected person, and bottom-up methods need complex post-processing to group keypoints.
2. Higher accuracy: YOLO-Pose handles multi-person pose estimation with a new formulation and reaches its best performance without test-time augmentation, whereas top-down and bottom-up methods rely on elaborate post-processing and test-time augmentation to boost accuracy.
3. Lower complexity: the complexity of YOLO-Pose is independent of the number of people in the image, which makes multi-person pose estimation easier to handle. In contrast, the cost of top-down methods grows with the number of people, and bottom-up methods still require complex post-processing.
**Strength: crowded multi-person scenes**

<div style="text-align: center;">
<figure>
  <img src="https://hackmd.io/_uploads/r1Nk0FJjh.png" alt="yolov5m6-pose_vs_HigherHRNetW32.png" width="400">
    <figcaption><span style="color: grey; ">Top: yolov5m6-pose. Bottom: HigherHRNet-W32. In crowded multi-person scenes, the traditional bottom-up approach makes noticeably more mistakes when grouping keypoints into individuals.</span></figcaption>
</figure>
</div>

<div style="text-align: center;">
<figure>
  <img src="https://hackmd.io/_uploads/HkHFpFkj2.png" alt="YOLO-Pose_architecture.png" width="800">
    <figcaption><span style="color: BLACK; font-weight: bold;">YOLO-Pose architecture</span></figcaption>
</figure>
</div>

## DeepStream Python API for Extracting the Model Output Tensor

Full code: [deepstream_YOLOv8-Pose_rtsp.py](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/deepstream_YOLOv8-Pose_rtsp.py#L92)

### Extracting the Model Output Tensor

To read the model predictions from the DeepStream buffer, the metadata has to be pulled out of the buffer emitted by the [`Gst-nvinfer`](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html) plugin, which is the element in the DeepStream pipeline responsible for model inference.

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/_images/DS_plugin_gst-nvinfer.png" alt="DS_plugin_gst-nvinfer.png" width="450">
    <figcaption><span style="color: green; font-weight: bold;">Gst-nvinfer</span></figcaption>
</figure>
</div>

The overall data pipeline is shown below. If you also want to grab the image data itself, you have to work with the FRAME BUFFER.

<div style="text-align: center;">
<figure>
  <img src="https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/blob/master/apps/deepstream-imagedata-multistream/imagedata-app-block-diagram.png?raw=true" alt="imagedata-app-block-diagram.png" width="800">
    <figcaption><span style="color: grey; font-weight: bold;"> Workflow for Extracting the Model Output Tensor from `Gst-nvinfer`</span></figcaption>
</figure>
</div>

source : [deepstream-imagedata-multistream](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-imagedata-multistream)
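In practice the extraction happens inside a GStreamer buffer probe attached to a pad downstream of `Gst-nvinfer`. The snippet below is only a minimal sketch of how such a probe is typically registered with the DeepStream Python bindings; the element variable `pgie` and the mostly empty probe body are placeholders rather than the repo's actual code. The steps that follow then walk the metadata inside this probe.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst
import pyds

def pose_src_pad_buffer_probe(pad, info, u_data):
    """Called once per buffer flowing out of the nvinfer element."""
    gst_buffer = info.get_buffer()
    if not gst_buffer:
        return Gst.PadProbeReturn.OK
    # Retrieve the batch metadata attached to the Gst buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    # ... walk batch_meta here (see steps 1-3 below) ...
    return Gst.PadProbeReturn.OK

# Assuming `pgie` is the Gst-nvinfer element created elsewhere in the pipeline:
pgie_src_pad = pgie.get_static_pad("src")
pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pose_src_pad_buffer_probe, 0)
```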
#### 1. Parsing the output data of a custom model from the `NvDsBatchMeta` data structure

The target (the tensor predicted by the model) lives under `NvDsBatchMeta` > `NvDsFrameMeta` > `NvDsUserMeta`, where it is stored as an `NvDsInferTensorMeta` structure that holds the attributes and data of the inference output tensors.

##### `NvDsBatchMeta`
The metadata structures in DeepStream are defined as shown below.

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/_images/DS_plugin_metadata.png" alt="DS_plugin_metadata.png" width="600">
    <figcaption><span style="color: grey; font-weight: bold;"> NvDsBatchMeta: Basic Metadata Structure</span></figcaption>
</figure>
</div>

source : [MetaData in the DeepStream SDK](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_metadata.html)

##### `NvDsUserMeta`
- Definition of the `NvDsUserMeta` structure in the DeepStream SDK API Reference

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/struct__NvDsUserMeta__coll__graph.png" alt="struct__NvDsUserMeta__coll__graph.png" width="400">
    <figcaption><span style="color: grey; font-weight: bold;"> NvDsUserMeta Struct</span></figcaption>
</figure>
</div>

source : [NVIDIA DeepStream SDK API Reference - NvDsUserMeta Struct Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/struct__NvDsUserMeta.html)

###### Example of a custom `NvDsUserMeta`

:::spoiler
- Custom meta data structure from NVIDIA's bodypose2D sample
```cpp
static gpointer nvds_copy_2dpose_meta (gpointer data, gpointer user_data)
{
  NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
  NvDs2DposeMetaData *p_2dpose_meta_data =
      (NvDs2DposeMetaData *)user_meta->user_meta_data;
  NvDs2DposeMetaData *pnew_2dpose_meta_data =
      (NvDs2DposeMetaData *)g_memdup (p_2dpose_meta_data, sizeof(NvDs2DposeMetaData));
  return (gpointer) pnew_2dpose_meta_data;
}
```
- Retrieving the meta data through `obj_user_meta_list`
```cpp
for (l_user = obj_meta->obj_user_meta_list; l_user != NULL; l_user = l_user->next)
```
:::

The following describes how to pull the tensor metadata out of `NvDsFrameMeta` with the Python API.

###### [To read or parse inference raw tensor data of output layers](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_plugin_gst-nvinfer.html)

For a detailed example, see `sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp` in the DeepStream SDK samples.

To read and parse the raw tensor data of the model predictions:

- Enable the `output-tensor-meta` property on the `Gst-nvinfer` plugin, or set the property of the same name in its configuration file
    ```
    output-tensor-meta=1
    ```
- Where to retrieve the metadata
    - When `Gst-nvinfer` runs as the **primary GIE**: from `frame_user_meta_list` in each frame's `NvDsFrameMeta`
    - When it runs as a **secondary GIE**: from `obj_user_meta_list` in each `NvDsObjectMeta`
    - The metadata attached by `Gst-nvinfer` can be accessed in a downstream `GStreamer pad probe`

\* GIE: GPU Inference Engine

:::spoiler Example: bodypose2d operating directly on the raw model output tensors
- [deepstream_bodypose2d_app.cpp](https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/blob/d6094a4f9e52788ec9c42052042f9252022d04d8/apps/tao_others/deepstream-bodypose2d-app/deepstream_bodypose2d_app.cpp#L298)

```cpp=298
/* Iterate each frame metadata in batch */
for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
    l_frame = l_frame->next) {
  NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
  . . .
  /* Iterate user metadata in frames to search PGIE's tensor metadata */
  for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
      l_user != NULL; l_user = l_user->next) {
    NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
    if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
      continue;
    NvDsInferTensorMeta *meta =
        (NvDsInferTensorMeta *)user_meta->user_meta_data;
    . . .
    for (unsigned int i = 0; i < meta->num_output_layers; i++) {
      NvDsInferLayerInfo *info = &meta->output_layers_info[i];
      info->buffer = meta->out_buf_ptrs_host[i];
      NvDsInferDimsCHW LayerDims;
      std::vector<NvDsInferLayerInfo>
      . . .
```
:::
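For reference, here is a hedged Python analogue of the C++ iteration above, following the same pattern as NVIDIA's deepstream-ssd-parser sample: walk the frame list and the per-frame user-meta list, keep only entries whose meta type is `NVDSINFER_TENSOR_OUTPUT_META`, and cast them to `NvDsInferTensorMeta`. It is a sketch (the function name is mine), not the exact code of this repo; the repo's simplified excerpt follows right after.

```python
import pyds

def find_tensor_meta(batch_meta):
    """Yield every NvDsInferTensorMeta attached to the frames of a batch."""
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            # Keep only the raw tensor output attached by Gst-nvinfer
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                yield pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            l_user = l_user.next
        l_frame = l_frame.next
```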
- Python code example for extracting `NvDsInferTensorMeta`
```python
def pose_src_pad_buffer_probe(pad, info, u_data):
    gst_buffer = info.get_buffer()
    # Retrieve the batch metadata from the Gst buffer
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    # For brevity, only the first frame / user-meta entry is taken here
    l_frame = batch_meta.frame_meta_list
    frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    l_usr = frame_meta.frame_user_meta_list
    user_meta = pyds.NvDsUserMeta.cast(l_usr.data)
    # Raw inference output attached by Gst-nvinfer (output-tensor-meta=1)
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
```

- P.S.: In ["nvdsparsebbox_Yolo.cpp"](https://github.com/NVIDIA-AI-IOT/yolo_deepstream/blob/5af35bab7f6dfca7f1f32d44847b2a91786485f4/deepstream_yolo/nvdsinfer_custom_impl_Yolo/nvdsparsebbox_Yolo.cpp#L227), the `output_layers_info` inside `NvDsInferTensorMeta` is parsed in the `Gst-nvdspostprocess` stage that follows `Gst-nvinfer`.

#### 2. Convert the `NvDsBatchMeta` data to a `numpy` array

`NvDsInferTensorMeta` > `NvDsInferLayerInfo` > `layer.buffer` > `numpy array`

##### `NvDsInferTensorMeta`

<div style="text-align: center;">
<figure>
  <img src="https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/structNvDsInferTensorMeta__coll__graph.png" alt="structNvDsInferTensorMeta__coll__graph.png" width="400">
    <figcaption><span style="color: grey; font-weight: bold;"> NvDsInferTensorMeta </span></figcaption>
</figure>
</div>

source : [NVIDIA DeepStream SDK API Reference - NvDsInferTensorMeta Struct Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/structNvDsInferTensorMeta.html)

##### Converting the model predictions (stored in the `layer.buffer` memory block) into a numpy array

After obtaining the tensor holding the inference results (the `NvDsInferTensorMeta` structure DeepStream uses to store them), the raw data is still just a buffer allocated in memory. With the help of Python's `ctypes` library, we take a pointer to that buffer and convert it into a numpy array, which makes the subsequent processing in Python much more convenient.

##### `NvDsInferTensorMeta` > numpy array : Python code example
:::spoiler
- **pyds.get_ptr(layer.buffer)**: returns a pointer to **layer.buffer**, which is a block of memory.
- **ctypes.cast(pointer, POINTER(ctypes.c_float))**: **ctypes.cast** converts the pointer type. Here the pointer returned by **pyds.get_ptr(layer.buffer)** is cast to a pointer to **ctypes.c_float**, so the subsequent processing can treat the data as **float**.
- **np.ctypeslib.as_array(ptr, shape=dims)**: converts the pointer **ptr** into a NumPy array. **dims**, obtained automatically from the `layer_output_info.inferDims.d` API, specifies the array dimensions, i.e. the shape of the model output. From here on, the full power of `NumPy` can be used to manipulate this array/tensor (the model predictions).
```python!
data_type_map = {pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
                 pyds.NvDsInferDataType.INT8: ctypes.c_int8,
                 pyds.NvDsInferDataType.INT32: ctypes.c_int32}

def pose_src_pad_buffer_probe(pad, info, u_data):
    .
    .
    .
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)

    # layers_info = []
    # for i in range(tensor_meta.num_output_layers):
    #     layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
    #     # print(i, layer.layerName)
    #     layers_info.append(layer)

    # if your model only has one output layer, just pass "0" as the index
    layer_output_info = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # as num_output_layers == 1

    # remove zeros from both ends of the array. 'b' : 'both'
    dims = np.trim_zeros(layer_output_info.inferDims.d, 'b')
    if frame_number == 0:
        print(f'\tModel output dimension from LayerInfo: {dims}')

    # load the float* buffer into Python
    cdata_type = data_type_map[layer_output_info.dataType]
    ptr = ctypes.cast(pyds.get_ptr(layer_output_info.buffer), ctypes.POINTER(cdata_type))
    # determine the array size automatically from the model output dims
    out = np.ctypeslib.as_array(ptr, shape=dims)
```
:::
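The example above assumes a single output layer. As a rough generalization, and only as a sketch (the function name and dict-based return are mine, not the repo's), the same pyds calls can be wrapped to convert every output layer of a model into a numpy array keyed by layer name:

```python
import ctypes
import numpy as np
import pyds

DATA_TYPE_MAP = {pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
                 pyds.NvDsInferDataType.INT8: ctypes.c_int8,
                 pyds.NvDsInferDataType.INT32: ctypes.c_int32}

def tensor_meta_to_numpy(tensor_meta):
    """Convert every output layer of an NvDsInferTensorMeta into a numpy array."""
    outputs = {}
    for i in range(tensor_meta.num_output_layers):
        layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
        # inferDims.d is zero-padded up to the maximum rank; trim the padding
        dims = tuple(np.trim_zeros(layer.inferDims.d, 'b'))
        cdata_type = DATA_TYPE_MAP[layer.dataType]
        ptr = ctypes.cast(pyds.get_ptr(layer.buffer), ctypes.POINTER(cdata_type))
        outputs[layer.layerName] = np.ctypeslib.as_array(ptr, shape=dims)
    return outputs
```

Note that the resulting arrays are views over DeepStream's own buffers; call `.copy()` on them if you need to keep the data after the probe returns.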
#### 3. Customized Tensor Post-Processing Based on the Model Output

Inspecting the output of [YOLOv8-pose](https://github.com/ultralytics/ultralytics), the tensor shape is (batch, anchors, max_outputs), i.e. (batch, 56, 8400) for yolov8s-pose.

![](https://hackmd.io/_uploads/Bym3Mynt3.png =800x)

`netron yolov8s-pose.onnx`

![](https://hackmd.io/_uploads/ryDZ712Yn.png =400x)

Because the output shapes and per-anchor vectors of `yolov7-pose` and `yolov8-pose` differ, and the post-processing function I use expects input in the (batch, max_outputs, anchors) layout, i.e. (batch, 8400, 57), including a class-probability entry, the `yolov8-pose` output tensor has to be adjusted manually.

:::info
- Human keypoints
    YOLO-Pose predicts 17 human keypoints, each encoded as {$x, y, conf$}; in the ground-truth annotations the confidence/visibility term takes the value 0, 1, or 2, meaning "not present", "present but occluded", or "present and visible". The keypoint part of a person's label therefore has 17 x 3 = 51 elements (3 for x, y, conf), and the bbox part has 6 elements (x, y, w, h, box_conf, cls_conf), for a total of 51 + 6 = 57 elements:

$$
P_{v} = \{C_{x}, C_{y}, W, H, box_{conf}, class_{conf}, K_{x}^{1}, K_{y}^{1}, K_{conf}^{1}, \cdots, K_{x}^{n}, K_{y}^{n}, K_{conf}^{n}\}
$$

![](https://hackmd.io/_uploads/B1aRBJntn.png =400x)
![](https://hackmd.io/_uploads/SJgRLT3Y2.png =400x)

- anchors of yolov7-pose
    - 57 # bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
- anchors of yolov8-pose
    - 56 # bbox(4) + confidence(1) + cls(0) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
:::

```python=
def pose_src_pad_buffer_probe(pad, info, u_data):
    .
    network_info = tensor_meta.network_info
    .
    # [Optional] Post-processing for the yolov8-pose prediction tensor
    # [YOLOv8](https://github.com/ultralytics/ultralytics)
    # (batch, 56, 8400) -> (batch, 8400, 56) for yolov8
    out = out.transpose((0, 2, 1))

    # make a pseudo class probability
    cls_prob = np.ones((out.shape[0], out.shape[1], 1), dtype=np.uint8)
    # normalize box coordinates to [0, 1] (map_to_zero_one is a helper from the repo)
    out[..., :4] = map_to_zero_one(out[..., :4])
    # insert the pseudo class probability into the predictions
    out = np.concatenate((out[..., :5], cls_prob, out[..., 5:]), axis=-1)
    out[..., [0, 2]] = out[..., [0, 2]] * network_info.width   # scale to screen width
    out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
```

## Config file

[dstest1_pgie_YOLOv8-Pose_config.txt](https://github.com/YunghuiHsu/deepstream-yolo-pose/blob/main/configs/dstest1_pgie_YOLOv8-Pose_config.txt)

The following parameters need special attention when deploying a custom model.

#### `model-engine-file`
Path to your TensorRT engine. I usually put the engine under `/opt/nvidia/deepstream/deepstream/samples/models/` and specify an absolute path.
- `model-engine-file=<your engine path>`

#### `network-type`
Specifies the network type. Since DeepStream does not yet provide a built-in pose-estimation network type, it must be set to "other": `100`.
> [Gst-nvdspreprocess & Gst-nvdspostprocess plugin]
> 0=Detector, 1=Classifier, 2=Segmentation, 100=Other
- `network-type=100`

#### `output-tensor-meta`
Must be set to `1` (true) so that the model output can be retrieved from `NvDsUserMeta`.
- `output-tensor-meta=1`

#### `tensor-meta-pool-size`
Defaults to 6 per batch and is allocated automatically even if unset. I tried larger values (with batch-size != 1) but the speedup was not noticeable.
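Putting these together, a minimal `[property]` fragment for a custom pose model might look like the sketch below. This is not the full config from the repo (see the linked `dstest1_pgie_YOLOv8-Pose_config.txt` for that); the engine path and batch size are placeholders.

```
[property]
gpu-id=0
# absolute path to your TensorRT engine (placeholder)
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/yolov8s-pose.engine
batch-size=1
# 100 = "Other", since DeepStream has no built-in pose network type
network-type=100
# attach the raw output tensors to the metadata
output-tensor-meta=1
# optional; defaults to 6
tensor-meta-pool-size=6
gie-unique-id=1
```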
## Reference

#### [deepstream-imagedata-multistream](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-imagedata-multistream)
- Example of extracting the image data for post-processing
#### [deepstream-ssd-parser](https://github.com/NVIDIA-AI-IOT/deepstream_python_apps/tree/master/apps/deepstream-ssd-parser)
- Contains a custom metadata-parsing approach worth referencing
#### [NVIDIA DeepStream SDK API Reference](https://docs.nvidia.com/metropolis/deepstream/dev-guide/sdk-api/index.html)
#### [DEEPSTREAM PYTHON API REFERENCE / NvDsInfer](https://docs.nvidia.com/metropolis/deepstream/python-api/PYTHON_API/NvDsInfer/NvDsInfer_toc.html)
#### [Using a Custom Model with DeepStream](https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_using_custom_model.html)