
Extracting Model Output Tensors and Customizing Model Post-Processing with the DeepStream Python API (e.g., YOLO-Pose)

tags: Edge AI, DeepStream, deployment, Nvidia Jetson, YOLO-POSE, pose estimation, TensorRT, pose

Notes on NVIDIA Jetson platform deployment

Basic environment setup

Model deployment and acceleration


GitHub Implementation


github/deepstream-yolo-pose

Performance on Jetson (AGX Xavier / AGX Orin) for TensorRT Engines

| model | device | size | batch | fps | ms |
| --- | --- | --- | --- | --- | --- |
| yolov8s-pose.engine | AGX Xavier | 640 | 1 | 40.6 | 24.7 |
| yolov8s-pose.engine | AGX Xavier | 640 | 12 | 12.1 | 86.4 |
| yolov8s-pose.engine | AGX Orin | 640 | 1 | 258.8 | 4.2 |
| yolov8s-pose.engine | AGX Orin | 640 | 12 | 34.8 | 33.2 |
| yolov7w-pose.pt | AGX Xavier | 640 | 1 | 14.4 | 59.8 |
| yolov7w-pose.pt | AGX Xavier | 960 | 1 | 11.8 | 69.4 |
| yolov7w-pose.engine* | AGX Xavier | 960 | 1 | 19.0 | 52.1 |
| yolov7w-pose.engine* | AGX Orin | 960 | 1 | 61.1 | 16.8 |
  • \* yolov7w-pose uses the YOLO-layer TensorRT plugin from nanmi/yolov7-pose; NMS is not included. Single batch and image_size 960 only.
  • .engine (TensorRT) models were tested with the trtexec command, as shown below.
  • .pt models were tested with a 15 s video as baseline (test scripts).
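For reference, a benchmark like the table above can be reproduced with a plain trtexec call; the path below is where trtexec typically lives on Jetson, and the engine filename is an example:

```bash
# Load a prebuilt engine and report throughput/latency
/usr/src/tensorrt/bin/trtexec --loadEngine=yolov8s-pose.engine
```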

Why YOLO-Pose?

Compared with traditional top-down and bottom-up methods, the YOLO-Pose approach has the following advantages:

  1. Higher computational efficiency: YOLO-Pose localizes every person and their pose in a single inference pass, with no need for multiple forward passes. In contrast, top-down methods require multiple forward passes (one per detected person), while bottom-up methods require complex post-processing.

  2. Higher accuracy: YOLO-Pose takes a new approach to multi-person pose estimation and can reach its best performance without test-time augmentation. In contrast, top-down and bottom-up methods rely on complex post-processing and test-time augmentation to boost performance.

  3. Lower complexity: the complexity of YOLO-Pose is independent of the number of people in the image, so multi-person pose estimation is handled more easily. In contrast, the complexity of top-down methods grows with the number of people in the image, and bottom-up methods require complex post-processing.

Strength: crowded multi-person scenes

(Figure) Top: yolov5m6Pose; bottom: HigherHRNetW32. In crowded multi-person scenes, the traditional bottom-up method shows a noticeably higher error rate when grouping keypoints.
(Figure) YOLO-Pose architecture

DeepStream Python API for Extracting Model Output Tensors

Full source code: deepstream_YOLOv8-Pose_rtsp.py

Extracting Model Output Tensor

To read the model predictions from the DeepStream buffer, the metadata has to be fished out of the buffer emitted by the Gst-nvinfer element, which performs model inference in the DeepStream pipeline.

(Figure) Gst-nvinfer

The overall data pipeline is shown below. If you also want to extract the image data along with it, start from the FRAME BUFFER (see the sketch after the figure).

(Figure) Workflow for Extracting Model Output Tensor from `Gst-nvinfer`

source : deepstream-imagedata-multistream
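If the image data is needed too, the deepstream-imagedata-multistream sample retrieves it from the frame buffer roughly as follows. This is a minimal sketch: it assumes the code runs inside a pad probe where gst_buffer and frame_meta are already available, and that the pipeline converts frames to RGBA beforehand (as that sample does):

```python
import numpy as np
import pyds

# Map the frame surface into Python as an array-like object
n_frame = pyds.get_nvds_buf_surface(hash(gst_buffer), frame_meta.batch_id)
# Copy it out of DeepStream-owned memory before using it beyond the probe
frame_image = np.array(n_frame, copy=True, order='C')
```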

1. Parsing the output data of a custom model from the NvDsBatchMeta data structure

The target (the tensor predicted by the model) sits under NvDsBatchMeta > NvDsFrameMeta > NvDsUserMeta and is typed as an NvDsInferTensorMeta structure, which stores the attributes and data of the inference tensors.
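A minimal sketch of that traversal with the Python bindings, following the iteration pattern used in the deepstream-ssd-parser sample (the function name is illustrative; the simplified single-frame version used in this project appears further below):

```python
import pyds

def find_tensor_meta(gst_buffer):
    """Walk NvDsBatchMeta -> NvDsFrameMeta -> NvDsUserMeta and yield tensor meta."""
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))
    l_frame = batch_meta.frame_meta_list
    while l_frame is not None:
        frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
        l_user = frame_meta.frame_user_meta_list
        while l_user is not None:
            user_meta = pyds.NvDsUserMeta.cast(l_user.data)
            # Only entries of this type carry inference output tensors
            if user_meta.base_meta.meta_type == pyds.NvDsMetaType.NVDSINFER_TENSOR_OUTPUT_META:
                yield pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
            l_user = l_user.next
        l_frame = l_frame.next
```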

NvDsBatchMeta

DeepStream's metadata structures are defined as in the figure below.

(Figure) NvDsBatchMeta: Basic Metadata Structure

source : MetaData in the DeepStream SDK

NvDsUserMeta
  • The NvDsUserMeta structure definition in the DeepStream SDK API Reference:
(Figure) NvDsUserMeta struct

source : NVIDIA DeepStream SDK API Reference - NvDsUserMeta Struct Reference

Custom NvDsUserMeta example
  • NVIDIA bodypose2D custom meta structure

```c
static gpointer
nvds_copy_2dpose_meta (gpointer data, gpointer user_data)
{
    NvDsUserMeta *user_meta = (NvDsUserMeta *) data;
    NvDs2DposeMetaData *p_2dpose_meta_data =
        (NvDs2DposeMetaData *)user_meta->user_meta_data;
    NvDs2DposeMetaData *pnew_2dpose_meta_data =
        (NvDs2DposeMetaData *)g_memdup ( p_2dpose_meta_data,
            sizeof(NvDs2DposeMetaData));
    return (gpointer) pnew_2dpose_meta_data;
}
```
  • Extracting the meta data via obj_user_meta_list:

```c
for (l_user = obj_meta->obj_user_meta_list; l_user != NULL;
     l_user = l_user->next)
```
  • The Python API for extracting tensor metadata from `NvDsFrameMeta` is covered below.

To read or parse inference raw tensor data of output layers

For a detailed example, see the DeepStream SDK sample sources/apps/sample_apps/deepstream_infer_tensor_meta-test.cpp.

The steps for reading and parsing the raw tensor data of the model predictions are as follows:

  • Enable the output-tensor-meta property on the Gst-nvinfer plugin, or set the property of the same name in its config file:

```
output-tensor-meta=1
```
    
  • Where to retrieve the metadata:
    • When Gst-nvinfer is the primary GIE: from frame_user_meta_list in each frame's NvDsFrameMeta
    • When it is a secondary GIE: from obj_user_meta_list in each NvDsObjectMeta
    • Gst-nvinfer's metadata can be accessed from a downstream GStreamer pad probe (see the sketch below)
      * GIE: GPU Inference Engine
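A minimal sketch of wiring this up in Python; pgie is assumed to be the Gst-nvinfer element of an already-built pipeline, and pose_src_pad_buffer_probe is the probe function shown below:

```python
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst

# Ask Gst-nvinfer to attach its raw output tensors as user metadata
pgie.set_property("output-tensor-meta", True)

# Read the metadata in a downstream pad probe on nvinfer's src pad
pgie_src_pad = pgie.get_static_pad("src")
pgie_src_pad.add_probe(Gst.PadProbeType.BUFFER, pose_src_pad_buffer_probe, 0)
```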
Example: pulling out the model's raw output tensor and operating on it directly in bodypose2d
```cpp
/* Iterate each frame metadata in batch */
for (NvDsMetaList *l_frame = batch_meta->frame_meta_list; l_frame != NULL;
     l_frame = l_frame->next) {
    NvDsFrameMeta *frame_meta = (NvDsFrameMeta *)l_frame->data;
    ...
    /* Iterate user metadata in frames to search PGIE's tensor metadata */
    for (NvDsMetaList *l_user = frame_meta->frame_user_meta_list;
         l_user != NULL; l_user = l_user->next) {
        NvDsUserMeta *user_meta = (NvDsUserMeta *)l_user->data;
        if (user_meta->base_meta.meta_type != NVDSINFER_TENSOR_OUTPUT_META)
            continue;
        NvDsInferTensorMeta *meta = (NvDsInferTensorMeta *)user_meta->user_meta_data;
        ...
        for (unsigned int i = 0; i < meta->num_output_layers; i++) {
            NvDsInferLayerInfo *info = &meta->output_layers_info[i];
            info->buffer = meta->out_buf_ptrs_host[i];
            NvDsInferDimsCHW LayerDims;
            std::vector<NvDsInferLayerInfo> ...
```
  • Python code example for extracting NvDsInferTensorMeta:
```python
import pyds

def pose_src_pad_buffer_probe(pad, info, u_data):
    # Retrieve the batch metadata attached to the Gst buffer
    gst_buffer = info.get_buffer()
    batch_meta = pyds.gst_buffer_get_nvds_batch_meta(hash(gst_buffer))

    # For brevity, only the first frame and first user-meta entry are read here;
    # production code should iterate both lists and check base_meta.meta_type
    l_frame = batch_meta.frame_meta_list
    frame_meta = pyds.NvDsFrameMeta.cast(l_frame.data)
    l_usr = frame_meta.frame_user_meta_list

    user_meta = pyds.NvDsUserMeta.cast(l_usr.data)
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)
```
  • P.S. In "nvdsparsebbox_Yolo.cpp", the output_layers_info inside NvDsInferTensorMeta is parsed in the [Gst-nvdspostprocess] stage that follows [Gst-nvinfer].

2. Convert NvDsBatchMeta data schema to a NumPy array

NvDsInferTensorMeta > NvDsInferLayerInfo > layer.buffer > numpy array

NvDsInferTensorMeta

(Figure) NvDsInferTensorMeta struct

source : NVIDIA DeepStream SDK API Reference - NvDsInferTensorMeta Struct Reference

Converting the model predictions (stored in the layer.buffer memory block) into a NumPy array

After obtaining the inference result tensor (DeepStream stores inference results in the NvDsInferTensorMeta structure), the raw data is still just a buffer allocated in memory. With the help of Python's ctypes library, we get a pointer to the buffer's memory address and convert it into a NumPy array, which is much more convenient to work with in Python.

NvDsInferTensorMeta > numpy array: Python code example
  • pyds.get_ptr(layer.buffer): obtains a raw pointer to layer.buffer, the memory block holding the output tensor.
  • ctypes.cast(pointer, POINTER(ctypes.c_float)): casts that pointer to a pointer-to-c_float, so the subsequent processing can interpret the data as floats. Here, the pointer obtained from pyds.get_ptr(layer.buffer) is the one being cast.
  • np.ctypeslib.as_array(ptr, shape=dims): wraps the pointer ptr as a NumPy array. dims can be read automatically from layer_output_info.inferDims.d, the model's output shape, and specifies the array dimensions. From then on, NumPy's full functionality is available for operating on this array/tensor, i.e., the model predictions.
```python
import ctypes

import numpy as np
import pyds

# Map DeepStream tensor data types to their ctypes equivalents
data_type_map = {pyds.NvDsInferDataType.FLOAT: ctypes.c_float,
                 pyds.NvDsInferDataType.INT8: ctypes.c_int8,
                 pyds.NvDsInferDataType.INT32: ctypes.c_int32}

def pose_src_pad_buffer_probe(pad, info, u_data):
    .
    .
    .
    tensor_meta = pyds.NvDsInferTensorMeta.cast(user_meta.user_meta_data)

    # layers_info = []
    # for i in range(tensor_meta.num_output_layers):
    #     layer = pyds.get_nvds_LayerInfo(tensor_meta, i)
    #     # print(i, layer.layerName)
    #     layers_info.append(layer)

    # if your model has only one output layer, just give "0" as the key
    layer_output_info = pyds.get_nvds_LayerInfo(tensor_meta, 0)  # as num_output_layers == 1

    # remove zeros from both ends of the array. 'b' : 'both'
    dims = np.trim_zeros(layer_output_info.inferDims.d, 'b')

    if frame_number == 0:  # frame_number: frame counter kept elsewhere in the script
        print(f'\tModel output dimension from LayerInfo: {dims}')

    # load the float* buffer into Python
    cdata_type = data_type_map[layer_output_info.dataType]
    ptr = ctypes.cast(pyds.get_ptr(layer_output_info.buffer),
                      ctypes.POINTER(cdata_type))
    # the array size is determined automatically from the model's output dims
    out = np.ctypeslib.as_array(ptr, shape=dims)
```
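One caveat: np.ctypeslib.as_array returns a view over DeepStream-owned memory rather than a copy, so if the tensor is needed after the probe returns, detach it first:

```python
out = out.copy()  # detach from the DeepStream-owned buffer
```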

3. Customized Tensor Post-Processing based on Model Output

Inspecting the YOLOv8-pose output, the tensor shape is (batch, anchors, max_outputs).


(Figure) yolov8s-pose.onnx viewed in netron

Because yolov7-pose and yolov8-pose differ in output shape and anchor layout,
and the post-processing function I use expects input in the (batch, max_outputs, anchors) format including class-label probabilities, the yolov8-pose output tensor is adjusted manually.

  • Human keypoints
    YOLO-Pose predicts 17 human keypoints, each consisting of {x, y, conf}, where conf is 0, 1, or 2, meaning "not present", "present but occluded", or "present and visible". The keypoint part of a person's label therefore holds 17 × 3 (x, y, conf) = 51 elements, and the bbox part holds 6 elements (x, y, w, h, box_conf, cls_conf), for 51 + 6 = 57 elements in total, expressed as:

$$P_v = \{C_x, C_y, W, H, box_{conf}, class_{conf}, K^{1}_{x}, K^{1}_{y}, K^{1}_{conf}, \ldots, K^{n}_{x}, K^{n}_{y}, K^{n}_{conf}\}$$

  • anchors of yolov7-pose

    • 57 # bbox(4) + confidence(1) + cls(1) + keypoints(3 x 17) = 4 + 1 + 1 + 51 = 57
  • anchors of yolov8-pose

    • 56 # bbox(4) + confidence(1) + keypoints(3 x 17) = 4 + 1 + 0 + 51 = 56
```python
def pose_src_pad_buffer_probe(pad, info, u_data):
    .
    network_info = tensor_meta.network_info
    .
    # [Optional] Postprocess for the yolov8-pose prediction tensor
    # [YOLOv8](https://github.com/ultralytics/ultralytics)
    # (batch, 56, 8400) -> (batch, 8400, 56) for yolov8
    out = out.transpose((0, 2, 1))

    # make a pseudo class prob
    cls_prob = np.ones((out.shape[0], out.shape[1], 1), dtype=np.uint8)
    # map_to_zero_one is a helper defined elsewhere in this script
    out[..., :4] = map_to_zero_one(out[..., :4])  # scale to [0, 1]
    # insert the pseudo class prob into the predictions
    out = np.concatenate((out[..., :5], cls_prob, out[..., 5:]), axis=-1)
    out[..., [0, 2]] = out[..., [0, 2]] * network_info.width   # scale to screen width
    out[..., [1, 3]] = out[..., [1, 3]] * network_info.height  # scale to screen height
```
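After this reshaping, each candidate row follows the 57-element layout described above. A hedged sketch of slicing one row (variable names are illustrative, not from the repo):

```python
det = out[0, 0]                    # first candidate of the first batch item
cx, cy, w, h = det[0:4]            # bbox center and size (pixels, after scaling)
box_conf, cls_conf = det[4], det[5]
kpts = det[6:].reshape(17, 3)      # 17 keypoints, each (x, y, conf)
```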

Config file

dstest1_pgie_YOLOv8-Pose_config.txt

Below are a few parameters that require special settings when using a custom model.

model-engine-file

Specify the path to your TensorRT engine. I usually place the engine under /opt/nvidia/deepstream/deepstream/samples/models/ and give the absolute path.

  • model-engine-file=<your engine path>

network-type

Specifies the network type. Since DeepStream does not yet ship a standard pose-estimation model type, it has to be set to Other: 100.

[Gst-nvdspreprocess & Gst-nvdspostprocess plugin]
0=Detector, 1=Classifier, 2=Segmentation, 100=Other

  • network-type=100

output-tensor-meta

Must be set to 1 (true) for the model output to be retrievable from NvDsUserMeta.

  • output-tensor-meta=1

tensor-meta-pool-size

The default pool size for a single batch is 6, and it is allocated automatically even if unset. I have tried larger values, but the speedup was not noticeable.

Only relevant when batch-size != 1.
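Putting these together, a minimal illustrative excerpt of such a config (the engine path is an example; see dstest1_pgie_YOLOv8-Pose_config.txt for the full file):

```
[property]
model-engine-file=/opt/nvidia/deepstream/deepstream/samples/models/yolov8s-pose.engine
# 100 = Other; DeepStream has no built-in pose network type
network-type=100
# attach raw output tensors to NvDsUserMeta
output-tensor-meta=1
# optional; defaults to 6
# tensor-meta-pool-size=12
```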

Reference

deepstream-imagedata-multistream

  • Example of post-processing extracted image data

deepstream-ssd-parser

  • Contains a custom metadata-parsing approach for reference

NVIDIA DeepStream SDK API Reference

DEEPSTREAM PYTHON API REFERENCE/NvDsInfer

Using a Custom Model with DeepStream