【DL】YOLOv8 Deploy Using NCNN Notes

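Brief Introduction

This project deploys a YOLOv8 object detection model on an edge device. After studying and testing the model, I found room for optimization. The key observation is that YOLOv8's bounding box decoding decodes every bbox of every feature grid cell. This step can be removed from the exported model and implemented on the edge side instead, which shortens the box decoding time: the edge code can check the confidence score first, discard the low-scoring bounding boxes, and only then run the integral (DFL) operation on the boxes that remain.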

Model Structure


YOLOv8 Post Process

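After an image is fed to the model, the model runs its post-processing steps. When exporting to different deployment formats (such as ONNX, TensorRT, TFLite), these post-processing operations can end up in the generated computational graph, and the bbox decoding is among them. We therefore strip it out and implement it separately on the edge device.

We read parts of the code starting from YOLOv8 (the current release is already YOLOv11), referring to the open-source ultralytics GitHub code in ultralytics/nn/modules/head.py.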

Detect Head Functions

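The Detect class is responsible for the following tasks:

Converting feature maps at different scales into class (cls) and bounding box (bbox) predictions
Decoding the predicted boxes into actual bounding boxes

Although YOLOv8 is an anchor-free model, both training and inference still use anchors; the word simply means something different.

YOLOv5, anchor-based: a set of anchor boxes must be defined in advance, each representing a width and a height
YOLOv8, anchor-free: anchors are generated automatically during the forward pass, and each anchor represents the center coordinates of a feature map cell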

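self.cv2
Outputs the four bbox position values. YOLOv8 regresses these four values with DFL, so this branch prepares the input for the integral operation
self.cv3
Outputs the per-class scores (turned into probabilities by the sigmoid in _inference)
self.reg_max
The number of bbox integration points (bins) regressed by DFL, usually set to 16; written in the formulas as
$r_{p}$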

self.cv2 = nn.ModuleList(
    nn.Sequential(
        Conv(x, c2, 3), 
        Conv(c2, c2, 3), 
        nn.Conv2d(c2, 4 * self.reg_max, 1)
    ) for x in ch
)
self.cv3 = nn.ModuleList(
    nn.Sequential(
        nn.Sequential(DWConv(x, x, 3), Conv(x, c3, 1)),
        nn.Sequential(DWConv(c3, c3, 3), Conv(c3, c3, 1)),
        nn.Conv2d(c3, self.nc, 1),
    ) for x in ch
)

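Detect Head Forward Pass

In the YOLOv8 architecture, the forward function handles feature maps from different scales, which are stored in a list x. Each feature map is passed through the self.cv2 and self.cv3 layers separately, which shows that YOLOv8 decouples the box and class predictions at the output.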

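After feature map x[i] is processed by self.cv2[i] and self.cv3[i], the tensor shapes are as follows:

self.cv2[i](x[i]) has tensor shape
$(b, 4 r_{p}, h, w)$ , where:
- $b$ : batch size
- $4 r_{p}$ : the four box sides times the $r_{p}$ (reg_max) DFL bins
- $h, w$ : height and width of the feature map
self.cv3[i](x[i]) has tensor shape
$(b, n_{c}, h, w)$ , where:
- $n_{c}$ : number of classes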

Finally, the tensor shape of feature map x[i] becomes $(b, 4 r_{p} + n_{c}, h, w)$ , i.e. the box regression and classification information are merged into a single tensor.

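This completes the first stage of tensor processing. The process involves many tensor operations and may be hard to follow for beginners, but in essence the different prediction tasks (bounding boxes and classification) are computed separately and then merged back into one output.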
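To make the shape bookkeeping concrete, here is a minimal NumPy sketch. The 640x640 input, 80 classes and reg_max = 16 are assumed values for illustration, and the zero arrays merely stand in for the outputs of self.cv2[i] and self.cv3[i].

```python
import numpy as np

# Assumed values for illustration only: 640x640 input, 80 classes, reg_max = 16.
b, nc, reg_max = 1, 80, 16
img_h, img_w = 640, 640
strides = (8, 16, 32)

outputs = []
for s in strides:
    h, w = img_h // s, img_w // s
    box = np.zeros((b, 4 * reg_max, h, w), dtype=np.float32)  # stand-in for self.cv2[i](x[i])
    cls = np.zeros((b, nc, h, w), dtype=np.float32)           # stand-in for self.cv3[i](x[i])
    out = np.concatenate((box, cls), axis=1)                  # same role as torch.cat(..., 1)
    outputs.append(out)
    print(s, out.shape)

shapes = [o.shape for o in outputs]
# stride 8  -> (1, 144, 80, 80)
# stride 16 -> (1, 144, 40, 40)
# stride 32 -> (1, 144, 20, 20)
```

The channel count 144 is 4 * 16 + 80, matching the merged shape described above.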

def forward(self, x):
    """Concatenates and returns predicted bounding boxes and class probabilities."""
    for i in range(self.nl):
        x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
    if self.training:  # Training path
        return x
    y = self._inference(x)
    return y if self.export else (y, x)

`_inference`

The _inference function handles bbox decoding and anchor generation. We walk through its core code here.

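x_cat
- Concatenates all tensors in the list; note that feature maps from different scales are concatenated and processed together
- tensor shape (here $h, w$ denote the input image height and width):
  $(b, 4 r_{p} + n_{c}, \frac{h}{8} \cdot \frac{w}{8} + \frac{h}{16} \cdot \frac{w}{16} + \dots)$
make_anchors(feats, strides, grid_cell_offset)
- Builds the anchors: given the feature map sizes, it generates one anchor for every pixel of every feature map. For feat[0][0], for example, the anchor sits at the center of that feature cell. strides is the factor by which each feature map is downscaled relative to the original image, usually
  $(8, 16, 32)$ , i.e. the scale of the feature maps fed to Detect relative to the original image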
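self.dfl(x)
Performs the bbox integral to regress the four bbox distances
$(d_{l}, d_{t}, d_{r}, d_{b})$ , which decode_bboxes then turns into box coordinates
- $(b, 4 r_{p}, h w) \to (b, 4, r_{p}, h w) \to (b, r_{p}, 4, h w)$
  The order of operations is reshape() -> permute()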
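The anchor generation described above can be sketched in NumPy as follows. This is a re-implementation for illustration, not the library function: the name make_anchor_points and the 640x640 input size are assumptions.

```python
import numpy as np

def make_anchor_points(feat_sizes, strides, offset=0.5):
    """Hypothetical helper: return (N, 2) anchor centers in grid units and (N, 1) strides."""
    points, stride_col = [], []
    for (h, w), s in zip(feat_sizes, strides):
        sx = np.arange(w, dtype=np.float32) + offset  # x centers of the cells
        sy = np.arange(h, dtype=np.float32) + offset  # y centers of the cells
        yy, xx = np.meshgrid(sy, sx, indexing="ij")
        points.append(np.stack((xx, yy), axis=-1).reshape(-1, 2))
        stride_col.append(np.full((h * w, 1), s, dtype=np.float32))
    return np.concatenate(points), np.concatenate(stride_col)

# Assumed 640x640 input, so the feature maps are 80x80, 40x40 and 20x20.
feat_sizes = [(80, 80), (40, 40), (20, 20)]
anchors, strides = make_anchor_points(feat_sizes, (8, 16, 32))
print(anchors.shape, strides.shape)  # (8400, 2) (8400, 1)
print(anchors[0].tolist(), (anchors[0] * strides[0]).tolist())  # [0.5, 0.5] [4.0, 4.0]
```

The total of 8400 anchors is 80*80 + 40*40 + 20*20, which is also the last dimension of x_cat for this input size.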

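The integral is carried out by a 1 x 1 Conv whose fixed weights are $0, 1, \dots, r_{p} - 1$ : applied to the softmax output, this weighted sum is the expected value of the distribution, which is equivalent to the integral operation.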
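A minimal NumPy sketch of this integral, assuming reg_max = 16 and the channel layout side * reg_max + bin implied by the reshape above. The function name dfl_expectation is made up for illustration; the weighted sum over the bin axis plays the role of the 1 x 1 Conv.

```python
import numpy as np

def dfl_expectation(box, reg_max=16):
    """box: (b, 4*reg_max, n) raw logits -> (b, 4, n) expected distances."""
    b, _, n = box.shape
    x = box.reshape(b, 4, reg_max, n).transpose(0, 2, 1, 3)  # (b, reg_max, 4, n)
    x = x - x.max(axis=1, keepdims=True)                     # numerically stable softmax
    p = np.exp(x)
    p = p / p.sum(axis=1, keepdims=True)                     # softmax over the reg_max bins
    bins = np.arange(reg_max, dtype=np.float32)              # fixed weights 0 .. reg_max-1
    # A 1x1 conv with a single output channel is a weighted sum over the channel axis.
    return np.einsum("c,bcsn->bsn", bins, p)

rng = np.random.default_rng(0)
box = rng.normal(size=(1, 64, 5)).astype(np.float32)
dist = dfl_expectation(box)
print(dist.shape)  # (1, 4, 5)
```

Each output value lies between 0 and reg_max - 1, because it is an average of the bin indices weighted by the softmax probabilities.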

def _inference(self, x):
    shape = x[0].shape  # BCHW
    x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
    self.anchors, self.strides = (
        x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5)
    )
    self.shape = shape
    box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)
    dbox = self.decode_bboxes(self.dfl(box), self.anchors.unsqueeze(0)) * self.strides
    return torch.cat((dbox, cls.sigmoid()), 1)

Deploy using NCNN

Export

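The model is deployed with NCNN, and the inference code is written in C++.

Keep the bbox decoding inside the model and run NMS on the edge device
- Convert the model directly with model.export(format="ncnn", dynamic=True, simplify=True)
Run both the bbox decoding and NMS on the edge device
- Modify the original source code
  ultralytics/nn/modules/head.py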

## ultralytics/nn/modules/head.py 58-68 lines
def forward(self, x):
    """Concatenates and returns predicted bounding boxes and class probabilities."""
    batch_size = x[0].shape[0]
    for i in range(self.nl):
        # x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        box = self.cv2[i](x[i]).view(batch_size, self.reg_max * 4, -1)
        cls = self.cv3[i](x[i]).view(batch_size, self.nc, -1)
        x[i] = torch.cat((box, cls), 1)

    if self.training:  # Training path
        return x
    # y = self._inference(x)
    y = torch.cat(x, dim=2)
    return y if self.export else (y, x)

The export produces 4 files; the ones we need are the .bin and .param files.
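With the decoding stripped out, the exported model returns a raw tensor of shape (b, 4 * reg_max + nc, N), and the class part is still logits because the sigmoid lived in _inference. The following Python sketch shows the logic that the edge-side code would then need to mirror: threshold first, then run the DFL integral only on the surviving anchors. The function name, the 0.25 threshold and the toy inputs are assumptions for illustration, not the author's C++ implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def filter_then_decode(raw, anchors, strides, nc, reg_max=16, conf_thres=0.25):
    """raw: (4*reg_max + nc, N) for one image. Returns xyxy boxes, scores, class ids."""
    box_logits, cls_logits = raw[: 4 * reg_max], raw[4 * reg_max:]
    scores = sigmoid(cls_logits)              # the raw export yields logits, not probabilities
    cls_ids = scores.argmax(axis=0)
    best = scores.max(axis=0)
    keep = np.nonzero(best > conf_thres)[0]   # threshold BEFORE the DFL integral

    # DFL on the surviving columns only: softmax over the bins, then expectation.
    x = box_logits[:, keep].reshape(4, reg_max, keep.size)
    x = np.exp(x - x.max(axis=1, keepdims=True))
    p = x / x.sum(axis=1, keepdims=True)
    dist = (p * np.arange(reg_max)[None, :, None]).sum(axis=1)  # (4, K): l, t, r, b

    ax, ay = anchors[keep, 0], anchors[keep, 1]
    s = strides[keep, 0]
    boxes = np.stack(((ax - dist[0]) * s, (ay - dist[1]) * s,
                      (ax + dist[2]) * s, (ay + dist[3]) * s), axis=1)
    return boxes, best[keep], cls_ids[keep]

# Toy input: 3 anchors, 2 classes, only anchor 1 is confident (class 1).
nc, reg_max = 2, 16
anchors = np.array([[0.5, 0.5], [1.5, 0.5], [0.5, 1.5]], dtype=np.float32)
strides = np.full((3, 1), 8.0, dtype=np.float32)
raw = np.full((4 * reg_max + nc, 3), -10.0, dtype=np.float32)
raw[4 * reg_max + 1, 1] = 5.0
boxes, scores, cls_ids = filter_then_decode(raw, anchors, strides, nc)
print(boxes.shape, cls_ids.tolist())  # (1, 4) [1]
```

Only one of the three anchors reaches the softmax and decode step here; on a real image most of the N anchors fall below the threshold, which is where the time saving comes from.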


Visualize

Visualization of the exported model with the bbox decoding included (image not available)

Anchor Boxes

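Anchor boxes are predefined boxes that assist the model's learning: the model learns how to adjust the predefined boxes so that they fit the boxes marked by the labels.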

YOLOv5 Anchor Box

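Model-predicted offsets
- $t_{x}, t_{y}$ : offsets of the center position
- $t_{w}, t_{h}$ : scaling coefficients of the width and height
Anchor box width and height
- $a_{w}$ : anchor box width
- $a_{h}$ : anchor box height
Bounding box ( $c_{x}, c_{y}$ : grid cell coordinates, $s$ : stride)
- $b_{x} = (\sigma(t_{x}) + c_{x}) \times s$
- $b_{y} = (\sigma(t_{y}) + c_{y}) \times s$
- $b_{w} = (a_{w} \cdot e^{t_{w}}) \times s$
- $b_{h} = (a_{h} \cdot e^{t_{h}}) \times s$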
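As a worked example, the sketch below evaluates these formulas for a single box, with $b_{y}$ computed the same way as $b_{x}$. It follows the formulas as written in this note, with anchor sizes assumed to be in grid units; the official YOLOv5 code may parameterize the offsets differently. The function name and the sample values are made up for illustration.

```python
import math

def decode_anchor_based(t, cell, anchor_wh, stride):
    """Decode one box with the formulas of this note (not taken from the YOLOv5 code)."""
    tx, ty, tw, th = t
    cx, cy = cell
    aw, ah = anchor_wh

    def sig(v):
        return 1.0 / (1.0 + math.exp(-v))

    bx = (sig(tx) + cx) * stride
    by = (sig(ty) + cy) * stride
    bw = (aw * math.exp(tw)) * stride
    bh = (ah * math.exp(th)) * stride
    return bx, by, bw, bh

# Zero offsets: the center lands in the middle of cell (3, 2), the size equals the anchor size.
result = decode_anchor_based((0.0, 0.0, 0.0, 0.0), cell=(3, 2), anchor_wh=(1.5, 2.0), stride=8)
print(result)  # (28.0, 20.0, 12.0, 16.0)
```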

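YOLOv8 Anchor-Free Conversion Formulas

In the anchor-free YOLOv8 model, the predicted offsets ( $d_{l}, d_{t}, d_{r}, d_{b}$ ) and the grid center point ( $x_{a}, y_{a}$ ) are converted, through the scale factor $s$ , into the final bounding box coordinates ( $b_{x1}, b_{y1}, b_{x2}, b_{y2}$ ).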

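Model output parameters:

$d_{l}$ : offset to the left edge
$d_{t}$ : offset to the top edge
$d_{r}$ : offset to the right edge
$d_{b}$ : offset to the bottom edge
$(x_{a}, y_{a})$ : center point of the feature grid cell
$s$ : scale factor (stride)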

Bounding box formulas:

$b_{x1} = (x_{a} - d_{l}) \cdot s$
$b_{y1} = (y_{a} - d_{t}) \cdot s$
$b_{x2} = (x_{a} + d_{r}) \cdot s$
$b_{y2} = (y_{a} + d_{b}) \cdot s$
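The four formulas translate directly into a vectorized NumPy sketch. The function name dist_to_xyxy and the sample numbers are made up for illustration.

```python
import numpy as np

def dist_to_xyxy(dist, anchors, strides):
    """dist: (N, 4) as (l, t, r, b) in grid units; anchors: (N, 2); strides: (N, 1)."""
    x1y1 = anchors - dist[:, :2]   # (x_a - d_l, y_a - d_t)
    x2y2 = anchors + dist[:, 2:]   # (x_a + d_r, y_a + d_b)
    return np.concatenate((x1y1, x2y2), axis=1) * strides

anchors = np.array([[10.5, 4.5]])
strides = np.array([[8.0]])
dist = np.array([[2.0, 1.0, 3.0, 1.5]])
xyxy = dist_to_xyxy(dist, anchors, strides)
print(xyxy.tolist())  # [[68.0, 28.0, 108.0, 48.0]]
```

For example, the left edge is (10.5 - 2.0) * 8 = 68 pixels in the input image.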

Non-maximum Suppression

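Because YOLOv8 predicts bboxes densely across the image, NMS is needed to remove low-scoring and duplicate bboxes.

First filter out the low-scoring boxes
Sort by score
Take the box with the highest score as the candidate
Compute its IoU with the other boxes
- If the IoU exceeds the threshold, discard that box
- The larger the threshold, the more overlapping boxes are likely to remain, and vice versa
- Repeat until the original bbox set is empty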
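The steps above can be sketched in NumPy as a class-agnostic NMS. The thresholds 0.25 and 0.45 and the sample boxes are assumed values for illustration; per-class NMS would apply the same loop to each class separately.

```python
import numpy as np

def nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """boxes: (N, 4) xyxy; scores: (N,). Returns indices of the kept boxes, best first."""
    idx = np.nonzero(scores > conf_thres)[0]   # 1. drop low-scoring boxes
    idx = idx[np.argsort(-scores[idx])]        # 2. sort by score, descending
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while idx.size > 0:                        # 5. repeat until the set is empty
        best, rest = idx[0], idx[1:]           # 3. highest score is the candidate
        keep.append(int(best))
        # 4. IoU between the candidate and every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter + 1e-9)
        idx = rest[iou <= iou_thres]           # discard boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30], [0, 0, 5, 5]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7, 0.1], dtype=np.float32)
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68 and is suppressed, while box 3 is dropped by the score filter before any IoU is computed.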

Export Patch

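Without modifying the original source code, we can use export.py to rewrite the forward function and replace the native Detect class. This way, the Bounding Box Decode step is removed when the model weights are exported.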

export.py

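import torch
import ultralytics
from ultralytics.nn.modules.head import Detect


# Subclass Detect (rather than a bare nn.Module) so that its attributes and
# helpers such as forward_end2end remain available on the patched class.
class PatchDetect(Detect):
    def forward(self, x):
        """Concatenates and returns the raw box and class predictions, without bbox decoding."""
        print("Use Patch Forward!")
        if self.end2end:
            return self.forward_end2end(x)

        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:  # Training path
            return x
        # Flatten every scale to (b, no, h*w) and join along the anchor axis;
        # the call to self._inference(x), i.e. the decoding, is skipped.
        y = torch.cat([xi.view(x[0].shape[0], self.no, -1) for xi in x], 2)
        return y  # originally: y if self.export else (y, x)

# Replace the original Detect head class in the package
ultralytics.nn.modules.head.Detect = PatchDetect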

main.py

from ultralytics import YOLO
import argparse

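def run(args):
    if args.patch:
        # Importing the patch module swaps in PatchDetect before the model is loaded;
        # the module name must match the file name of the patch script.
        import export_patch
    model = YOLO(args.path)
    model.export(format="ncnn", dynamic=True, simplify=True)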

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Run YOLO model with optional patch.")
    parser.add_argument('--path', type=str, help="Export Model Path")
    parser.add_argument('--patch', action='store_true', help='Enable export patch')
    args = parser.parse_args()
    run(args)

Export the model without the decode box step (the script above is saved here as export_ncnn.py)
python .\export_ncnn.py --path best.pt --patch

Export the model with the decode box step
python .\export_ncnn.py --path best.pt
