[TOC]

# export and execute

## correct command

Command to export the onnxruntime NMS end2end model:

```bash=
python export.py --grid --end2end --max-wh 0
```

About `--max-wh`: for onnxruntime you need to specify this value as an integer. `0` means class-agnostic NMS; any other value means non-agnostic (per-class) NMS.

## run onnxruntime nms end2end model code backup

Official version: https://github.com/WongKinYiu/yolov7/blob/a207844b1ce82d204ab36d87d496728d3d2348e7/tools/YOLOv7onnx.ipynb

---

# error record

## error1

Error encountered when running export.py:

```bash=
Registering NMS plugin for ONNX...
ONNX graph created successfully
2024-09-03 16:31:52.604872378 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_986
2024-09-03 16:31:52.604894174 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_985
2024-09-03 16:31:52.604901333 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_936
2024-09-03 16:31:52.604907758 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_935
2024-09-03 16:31:52.604914047 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_886
2024-09-03 16:31:52.604920081 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_885
2024-09-03 16:31:52.604926081 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_836
2024-09-03 16:31:52.604931998 [W:onnxruntime:, unsqueeze_elimination.cc:20 Apply] UnsqueezeElimination cannot remove node Unsqueeze_835
Created NMS plugin 'EfficientNMS_TRT' with attributes: {'plugin_version': '1', 'background_class': -1, 'max_output_boxes': 100, 'score_threshold': 0.25, 'iou_threshold': 0.45, 'score_activation': False, 'box_coding': 0}
```

Reason: this only happens when the command includes

```
--dynamic --dynamic-batch
```

These options are for the TensorRT ONNX export; remove both of them and the error goes away.
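To make the `--max-wh` behaviour from the export section concrete: offsetting each box by `class_id * max_wh` before a single-class NMS pushes boxes of different classes into disjoint coordinate ranges, so they can never suppress each other; with `--max-wh 0` the offset vanishes and NMS becomes class-agnostic. A NumPy sketch of the trick (the greedy NMS below is an illustration of the idea, not the exported graph's actual op):

```python
import numpy as np

def nms(boxes, scores, iou_thres=0.45):
    """Greedy single-class NMS; returns indices of kept boxes."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]          # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # intersection of the top box with all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thres]  # drop overlapping boxes
    return keep

# two heavily overlapping boxes with DIFFERENT class ids
boxes = np.array([[0., 0., 10., 10.], [1., 1., 11., 11.]])
scores = np.array([0.9, 0.8])
classes = np.array([0, 1])

# non-agnostic: per-class offset keeps both detections
kept = nms(boxes + classes[:, None] * 4096, scores)
print(kept)   # [0, 1]

# agnostic (max-wh = 0): the lower-scoring box is suppressed
kept0 = nms(boxes + classes[:, None] * 0, scores)
print(kept0)  # [0]
```

The `4096` here is just a stand-in for any offset larger than the image side; the exported model uses whatever integer was passed to `--max-wh`.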
## error2

Error encountered when running export.py:

```bash=
ONNX export failure: name 'gs' is not defined
```

Solution:

```bash=
pip install nvidia-pyindex
pip install onnx_graphsurgeon
```

This installs onnx_graphsurgeon 0.5.2. To install an older version instead, use the command below (remember to `pip uninstall` the previously installed onnx_graphsurgeon first):

```bash=
pip install onnx_graphsurgeon --index-url https://pypi.ngc.nvidia.com
```

Related issue: https://github.com/WongKinYiu/yolov7/issues/1518

Note: if the error still appears after installation, you may have multiple Python environments, and the package was installed into a different environment than the Python you are running. In that case see [hackmd: python error pip install already but can't import](https://hackmd.io/FnNEgecKQdSmgMthgeyKow?view).

## error3

Error encountered when running export.py:

```bash=
Inference failed. You may want to try enabling partitioning to see better results. Note: Error was: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Failed to load model with error: /onnxruntime_src/onnxruntime/core/graph/model.cc:149 onnxruntime::Model::Model(onnx::ModelProto&&, const PathString&, const IOnnxRuntimeOpSchemaRegistryList*, const onnxruntime::logging::Logger&, const onnxruntime::ModelOptions&) Unsupported model IR version: 10, max supported IR version: 9
```

This is an onnx / onnxruntime version mismatch. The following versions were tested and run stably:

```bash=
onnx                1.12.0
onnx-graphsurgeon   0.3.27
onnx-simplifier     0.4.19
onnxruntime         1.12.0
onnxruntime-gpu     1.12.0
```

Official reference: [ONNX Runtime compatibility](https://onnxruntime.ai/docs/reference/compatibility.html#onnx-opset-support)

## error4

Error encountered when running the onnxruntime code (official version):

```
failed:This is an invalid model. In Node, ("batched_nms", EfficientNMS_TRT, "", -1) : ("output": tensor(float),) -> ("num_dets": tensor(int32),"det_boxes": tensor(float),"det_scores": tensor(float),"det_classes": tensor(int32),) , Error Node (batched_nms) has input size 1 not in range [min=2, max=3].
```

Reason: this happens when the model was converted with export.py using

```
--include-nms
```

Remove that option and the model runs normally.

## error5

Error encountered when running the onnxruntime code (official version):

```bash=
RuntimeError: /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:122 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] /onnxruntime_src/onnxruntime/core/providers/cuda/cuda_call.cc:116 bool onnxruntime::CudaCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = cudaError; bool THRW = true] CUDA failure 100: no CUDA-capable device is detected ; GPU=0 ; hostname=3695C02613 ; expr=cudaSetDevice(info_.device_id);
```

This is a GPU problem; as a workaround you can run on CPU first by changing `cuda = True` to `cuda = False` in the official onnxruntime code.

Solution: see the official [CUDA Execution Provider](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html) page, which lists the supported version combinations of ONNX Runtime, CUDA, and cuDNN.

If the problem is with CUDA itself (e.g. nvidia-smi does not work), it is best to start a fresh container from the NVIDIA image recommended by the official yolov7 repo: https://github.com/WongKinYiu/yolov7/tree/a207844b1ce82d204ab36d87d496728d3d2348e7?tab=readme-ov-file#installation
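Once inference runs (on CPU or GPU), the end2end model's `num_dets` / `det_boxes` / `det_scores` / `det_classes` outputs are still in letterboxed 640x640 coordinates; mapping boxes back to the original image just inverts the letterbox transform (subtract the padding, divide by the resize ratio). A minimal NumPy sketch — the `ratio` and `dwdh` values here are made-up for illustration; in practice you keep the ones computed during preprocessing:

```python
import numpy as np

def scale_boxes(det_boxes, ratio, dwdh):
    """Undo the letterbox transform: subtract padding, divide by resize ratio."""
    boxes = det_boxes.copy()
    boxes -= np.array(dwdh * 2)   # (dw, dh, dw, dh) applies to x1,y1,x2,y2
    boxes /= ratio
    return boxes

# synthetic detection at (120, 120, 220, 220) in letterbox space,
# assuming the original image was resized by 0.5 and padded by (20, 20)
det_boxes = np.array([[120.0, 120.0, 220.0, 220.0]])
orig = scale_boxes(det_boxes, ratio=0.5, dwdh=(20.0, 20.0))
print(orig)  # [[200. 200. 400. 400.]]
```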