TorchServe: PyTorch Models from Training to Deployment

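---
title: 'TorchServe: PyTorch Models from Training to Deployment'
disqus: hackmd
---

TorchServe: PyTorch Models from Training to Deployment
===

:::info
**Find this document incomplete?** Leave a comment!
**YuehChuan** *2020.11.17*
:::

![](https://i.imgur.com/W4qVAqJ.png)

## Table of Contents

[TOC]

## Installation

Official documentation:
https://github.com/pytorch/serve
https://github.com/pytorch/serve/blob/master/README.md#serve-a-model

`pip install torchserve torch-model-archiver`

A notable feature: you can register multiple models and serve them side by side.

Training a PyTorch model, using ResNet50 as the example
---

If you have never used PyTorch, see
"PyTorch 入門最速傳說" (a fast-track introduction to PyTorch)
https://gist.github.com/YuehChuan/8acce82806e3831da7381103d2c6ec64

Training script:
https://github.com/YuehChuan/resnet-torch/blob/main/train-gpu.py

Test script:
GPU
https://github.com/YuehChuan/resnet-torch/blob/main/test-gpu.py
CPU

Converting the trained weights (.pth) to TorchScript (.pt)
---

Purpose: a model trained in Python is saved as a .pth file. Once converted to TorchScript, it can be used for inference from C++ and optimized further, for example by quantization.
Here I convert **pds.pth** into **resnet50-batch.pt**.

```python=
# Traced mode: convert the trained model (.pth) to TorchScript (.pt)
from torchvision import models
import torch

# model = models.resnet152(pretrained=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# pds.pth holds the whole pickled model, not just a state_dict.
# On recent PyTorch versions you may also need weights_only=False here.
model = torch.load('pds.pth', map_location=device)
model.to(device)  # keep the model and the example input on the same device
model.eval()

# Tracing records the operations executed on one example input.
example_input = torch.rand(1, 3, 224, 224).to(device)
traced_script_module = torch.jit.trace(model, example_input)
traced_script_module.save("resnet50-batch.pt")
```

![](https://i.imgur.com/yr6EDnM.png)

https://github.com/YuehChuan/serve/blob/pds-dev/examples/image_classifier/pds_resnet50/export.py

Preparing what TorchServe needs
---

![](https://i.imgur.com/uruusO6.png)

In my root directory I have:
1. config.properties
2. the model_store folder

and the files we need to modify:
3. the files under /home/corleone/serve/examples/image_classifier/pds_resnet50
4. index_to_name.json, where you put your own class names~

{"0": ["n01440764", "corona"], "1": ["n01443537", "external"], "2": ["n01484850", "internal"], "3": ["n01491361", "noise"],

This is a dictionary whose keys are class indices (as strings) and whose values are lists.
It keeps the ImageNet-style format because the model was trained by transfer learning from ImageNet-pretrained weights.

> Ha, that went over your head, right? I also can't stand people who keep showing off jargon and make simple things sound complicated~
> (Although that is exactly how you bluff people who don't know better~)
> My personal most-hated term, No. 1: ground truth! (Back when I was a student, someone kept asking "Your ground truth, where's your ground truth?" "Huh?" "You know, ground truth!" Ugh. Is it really that hard to just say "the true value"?)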
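> ImageNet is a dataset, and experts have already done the heavy lifting of training the early layers of the network on it for us~
> So during training we can freeze everything except the fully connected layer that does the classification (there, I used a term too, ha). The front of the network extracts features with the pretrained weights, which is usually more accurate than training the whole network on our own small dataset (standing on the shoulders of giants~).
> This whole process is called "transfer learning". When our own dataset has fewer than about 1,000 images, transfer learning is what we usually reach for. Much clearer now, right? Honestly, "transfer learning" is just a buzzword.
> What those academic charlatans don't know is why images fed to an ImageNet-pretrained model have to be shifted and scaled by specific values. Roughly speaking, it comes down to the mean and standard deviation of ImageNet; see the note below~
> [name=YuehChuan]

:::info
On initializing from pretrained weights: every image must first be shifted and scaled with the ImageNet statistics
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
https://github.com/pytorch/vision/issues/1439
:bird:
:::

![](https://i.imgur.com/pyMieIN.png)
![](https://i.imgur.com/89QDJU9.png)
![](https://i.imgur.com/qKUeE57.png)
![](https://i.imgur.com/VBbVvvV.png)

https://github.com/YuehChuan/serve/tree/pds-dev/examples/image_classifier/pds_resnet50

config.properties specifies where the models are stored and how many workers Netty should start.
model_store holds the model archives (.mar) packaged from the converted TorchScript files.
The remaining work is mainly registering a handler that does the pre-processing and post-processing around inference. In short, the handler is the code that gets called every time an inference is triggered.

Packaging the TorchScript model into a .mar with torch-model-archiver
---

By default the archiver writes the .mar into the current directory; `--export-path model_store` puts it directly where the server will look for it.

```bash=
torch-model-archiver --model-name resnet-50-batch --version 1.0 \
  --model-file ./serve/examples/image_classifier/pds_resnet50/model.py \
  --serialized-file resnet50-batch.pt \
  --handler ./serve/examples/image_classifier/pds_resnet50/resnet152_handler.py \
  --extra-files ./serve/examples/image_classifier/pds_resnet50/index_to_name.json \
  --export-path model_store
```

Starting the server
---

`--ncs` turns off config snapshots.

```bash=
torchserve --start --ncs --model-store model_store --models resnet-50-batch.mar
```

The requests below use port 5050, which presumably comes from `inference_address` in config.properties; the TorchServe default inference port is 8080.

Running inference from another machine by sending images with curl
---

Single-image test:
`curl http://<remote-server-ip>:5050/predictions/resnet-50-batch -T /home/schwarm/val/internal/13833_20191222-112800_5.png`

Multiple images work too~
`curl http://<remote-server-ip>:5050/predictions/resnet-50-batch -T /home/schwarm/val/external/4967_20191229-084714_2.png & curl http://<remote-server-ip>:5050/predictions/resnet-50-batch -T /home/schwarm/val/noise/10045_20191230-163304_2.png`

[Demo video](https://youtu.be/RKGhF1uFemg)

![](https://i.imgur.com/BtwP5yX.png)

```python=
print('Great!!! Now, everyone is happy! (◕ ‿ ◕ )!')
```

```sequence
client->Server: curl http://192.168.2.1:5050/predictions/resnet-50-batch -T corona.png?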
Note right of Server: TorchServe Inference
Server-->client: It's corona discharge!
```

## Appendix and FAQ

https://docs.aws.amazon.com/zh_tw/dlami/latest/devguide/tutorial-torchserve.html
https://www.youtube.com/watch?v=AIrrI8WOIuk
Easily Deploy PyTorch models in Production on AWS with TorchServe - AWS Online Tech Talks
https://www.youtube.com/watch?v=mYV8nk29m8o
https://aws.amazon.com/cn/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/

I had this working before [this article](https://towardsdatascience.com/deploy-models-and-create-custom-handlers-in-torchserve-fc2d048fbe91) came out XD, I just kept it to myself for a while (sips tea).
Then I realized that if I want to use the fancier advanced features, I have to force myself to write things down first.

https://www.zhihu.com/question/389731764
https://pytorch.org/elastic/0.2.1/index.html
https://hackmd.io/2be4Cc3tSrC2OJsE-JPErA
https://hackmd.io/ziNEB3qBSseAi2H-Uy06Pg

TorchServe on AWS
https://torchserve-on-aws.workshop.aws/en/

###### tags: `Templates` `Documentation` `torchServe` `pds`
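
A footnote on the ImageNet normalization mentioned in the info box earlier (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]): the "shift and scale" is simply a per-channel `(value - mean) / std`. Below is a minimal sketch of that formula on a single pixel. It assumes the pixel values are already scaled to [0, 1], and the helper name `normalize_pixel` is made up for illustration; in a real pipeline torchvision's `transforms.Normalize` applies the same formula to whole tensors.

```python
# ImageNet channel statistics quoted in the note above (RGB order).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Shift each channel by its mean, then scale by its std.

    `rgb` is assumed to hold three floats already scaled to [0, 1].
    Hypothetical helper for illustration only.
    """
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


if __name__ == "__main__":
    # A pixel equal to the dataset mean maps to zero in every channel.
    print(normalize_pixel(IMAGENET_MEAN))
    # A pure white pixel lands between 2 and 3 in every channel.
    print(normalize_pixel((1.0, 1.0, 1.0)))
```

The point of the exercise: the pretrained early layers only ever saw inputs centered and scaled this way, so inference inputs should go through the same transform before they reach the model.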