# AUO_PCL_Train_Fast_API Workflow

## [GET] /weight/

**View the weight list**

<details>
<summary>[GET] /weight/</summary>

![1](https://hackmd.io/_uploads/B1M_XAVx1x.png)

</details>

After execution, the weight list is displayed.

<details>
<summary>return</summary>

![2](https://hackmd.io/_uploads/Hyy570Nx1l.png)

</details>

---

## [POST] /weight/{name}

**Load pretrained weight**

<details>
<summary>[POST] /weight/{name}</summary>

![3](https://hackmd.io/_uploads/HJJTmCElye.png)

</details>

- **name**: Name of the weight.
- **info**: Note describing the weight.
- **weight**: The pretrained weight to upload, as a zip file containing the model weight (.pth.tar) and the config (.yaml).

<details>
<summary>return</summary>

- **error_code: 0**: Success; the weight_id is returned.
- **error_code: 1**: The uploaded file is not a zip file.

</details>

---

## [DELETE] /weight/{weight_id}

**Delete pretrained weight**

<details>
<summary>[DELETE] /weight/{weight_id}</summary>

![4](https://hackmd.io/_uploads/r1Kv40Vekx.png)

</details>

- **weight_id**: The weight_id to delete.

<details>
<summary>return</summary>

- **error_code: 0**: Success.
- **error_code: 1**: The model does not exist.

</details>

---

## [POST] /train/

**Upload a dataset and train a model**

<details>
<summary>[POST] /train/</summary>

![5](https://hackmd.io/_uploads/HJvZHAEgkg.png)

</details>

- **name** (required): Name of the model.
- **info**: Note describing the model.
- **callback_url**: URL to which the watchdog sends back the training status it monitors (optional).
- **resume**: If the program is interrupted unexpectedly, resume training from the checkpoint saved before the interruption.
- **weight_id**: Use a pretrained weight (rarely needed).
- **batch_size**: Batch size for training.
- **workers**: The `workers` setting for training.
- **epochs**: Number of training epochs (set this to 800).
- **warmup_epoch**: The `warmup_epoch` setting for training (set this to 500).
- **num_cluster**: The `num_cluster` setting for training.
- **world_size**: The `world_size` setting for training.
- **rank**: The `rank` setting for training.
- **data** (required): The dataset to upload, as a zip file.
    - The zip file has two levels: the first level holds the results produced by VQ-VAE (four folders in total), and the second level holds the images inside each subfolder.
    - The subfolder names identify the training data.
    - Do not rename the subfolders!
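
Because the subfolder names must not change, it can help to check the zip locally before uploading it as **data**. The following is a minimal sketch, not part of the API: the function names `missing_dataset_folders` and `build_example_zip` and the file name `sample_0001.png` are hypothetical, and the four folder names are taken from the dataset format in this section. The check looks at the immediate parent folder of each file, so it passes whether the four folders sit at the zip root or under a single wrapper folder such as `dataset/`.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Folder names taken from the dataset format described in this document.
REQUIRED_FOLDERS = ("defect_img", "defect_mask", "non_defect_img", "non_defect_mask")


def missing_dataset_folders(zip_source):
    """Return the required folders that contain no files in the given zip.

    zip_source may be a path or a file-like object. Only the immediate
    parent folder of each file is inspected (an assumption of this sketch).
    """
    found = set()
    with zipfile.ZipFile(zip_source) as archive:
        for entry in archive.infolist():
            if entry.is_dir():
                continue  # skip explicit directory entries
            parts = PurePosixPath(entry.filename).parts
            if len(parts) >= 2:
                found.add(parts[-2])  # folder that directly holds the file
    return [name for name in REQUIRED_FOLDERS if name not in found]


def build_example_zip(folders):
    """Build a tiny in-memory zip with one placeholder file per folder."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for folder in folders:
            # Hypothetical file name; real datasets hold the VQ-VAE outputs.
            archive.writestr(f"{folder}/sample_0001.png", b"placeholder")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    incomplete = build_example_zip(["defect_img", "defect_mask"])
    print(missing_dataset_folders(incomplete))
```

An empty list means all four folders are present; any names returned are folders that are missing or empty in the archive.
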
### Dataset format

- `dataset`
    - `defect_img`
    - `defect_mask`
    - `non_defect_img`
    - `non_defect_mask`

<details>
<summary>return</summary>

![6](https://hackmd.io/_uploads/ByiZwRExyg.png)

</details>

<details>
<summary>return</summary>

- **error_code: 0**: Success.
- **error_code: 1**: Another job is currently training or testing.
- **error_code: 2**: The weight does not exist.
- **error_code: 3**: The uploaded file is not a zip file.

</details>

---

## [GET] /train/

**View the training job list**

<details>
<summary>[GET] /train/</summary>

![7](https://hackmd.io/_uploads/HyeRqAEg1g.png)

</details>

Click Execute to get the current list of all jobs.

<details>
<summary>return</summary>

![8](https://hackmd.io/_uploads/Bygmi04lJl.png)

</details>

---

## [GET] /train_status

**View the training status** (the watchdog monitors the model's training status)

<details>
<summary>[GET] /train_status</summary>

![9](https://hackmd.io/_uploads/S1rt0A4eJe.png)

</details>

Click Execute to get the current train status.

<details>
<summary>return</summary>

- **idle**: Whether the process is currently idle. (True: available for training/testing; False: someone is already training/testing)
- **completed**: Whether the current job has finished. (True: finished; False: not finished)

</details>

---

## [DELETE] /train/{job_id}

**Abort training (deletes the watchdog process, dataset, and job)**

<details>
<summary>[DELETE] /train/{job_id}</summary>

![11](https://hackmd.io/_uploads/rkQu11Sekl.png)

</details>

- **job_id**: The job_id of the training run to abort.

<details>
<summary>return</summary>

- **error_code: 0**: Success.
- **error_code: 1**: The job does not exist.
- **error_code: 2**: The job is not a training job.
- **error_code: 3**: The job has already finished.

</details>

---

## [GET] /train/result/{job_id}

**Download the training result** (includes the model weight (.pth) and the config (.yaml))

<details>
<summary>[GET] /train/result/{job_id}</summary>

![12](https://hackmd.io/_uploads/H1P6kyrekg.png)

</details>

<details>
<summary>return</summary>

- **A download dialog appears**

</details>

---

## [DELETE] /train/result/{job_id}

**Delete the training result**

<details>
<summary>[DELETE] /train/result/{job_id}</summary>

![13](https://hackmd.io/_uploads/SkaMlyBg1l.png)

</details>

- **job_id**: The job_id to delete.

<details>
<summary>return</summary>

- **error_code: 0**: Success.
- **error_code: 1**: The job does not exist.
- **error_code: 2**: The job is not a training job.

</details>
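
The return values documented above can be collected into a small client-side helper. This is a rough sketch under several assumptions: the responses are taken to be JSON objects with `error_code`, `idle`, and `completed` keys (the screenshots are not reproduced here, so the exact payload shape is a guess), the base URL is a placeholder, and the function names and the way `idle` and `completed` are combined in `next_action` are hypothetical. Only the error-code meanings come from this document.

```python
import json
import urllib.request

# Error-code meanings copied from the endpoint descriptions in this document.
ERROR_MESSAGES = {
    ("POST", "/weight/{name}"): {
        0: "success",
        1: "uploaded file is not a zip file",
    },
    ("DELETE", "/weight/{weight_id}"): {
        0: "success",
        1: "model does not exist",
    },
    ("POST", "/train/"): {
        0: "success",
        1: "another job is training or testing",
        2: "weight does not exist",
        3: "uploaded file is not a zip file",
    },
    ("DELETE", "/train/{job_id}"): {
        0: "success",
        1: "job does not exist",
        2: "job is not a training job",
        3: "job has already finished",
    },
    ("DELETE", "/train/result/{job_id}"): {
        0: "success",
        1: "job does not exist",
        2: "job is not a training job",
    },
}


def describe_error(method, route, error_code):
    """Translate an error_code into the meaning documented for that route."""
    codes = ERROR_MESSAGES.get((method, route), {})
    return codes.get(error_code, f"unknown error_code {error_code}")


def next_action(status):
    """Suggest a client action from a /train_status payload.

    Assumes `status` is a dict with boolean `idle` and `completed` keys;
    how the two flags combine is an assumption of this sketch.
    """
    if not status["idle"]:
        return "wait"  # someone is already training or testing
    if status["completed"]:
        return "download_result"  # the current job has finished
    return "submit_job"  # process is free and nothing is pending


def fetch_train_status(base_url):
    """GET {base_url}/train_status and parse it as JSON (assumed format)."""
    with urllib.request.urlopen(f"{base_url}/train_status") as response:
        return json.load(response)


if __name__ == "__main__":
    print(describe_error("POST", "/train/", 1))
    print(next_action({"idle": False, "completed": False}))
```

A real client would call `fetch_train_status` with the address of the deployed service and pass the result to `next_action`; that call is left out of the demo because the service address is not given in this document.
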