---
title: 'TorchServe: PyTorch Models from Training to Deployment'
disqus: hackmd
---
TorchServe: PyTorch Models from Training to Deployment
===
:::info
**Find this document incomplete?** Leave a comment!
**YuehChuan** *2020.11.17*
:::
![](https://i.imgur.com/W4qVAqJ.png)
## Table of Contents
[TOC]
## Installation
Official documentation:
https://github.com/pytorch/serve
https://github.com/pytorch/serve/blob/master/README.md#serve-a-model
`pip install torchserve torch-model-archiver`
Key feature: multiple models can be registered and served at the same time.
Training a PyTorch model, using ResNet50 as an example
---
If you have never used PyTorch, start with this quick introduction:
PyTorch 入門最速傳說 (a fast-track introduction to PyTorch)
https://gist.github.com/YuehChuan/8acce82806e3831da7381103d2c6ec64
Training script:
https://github.com/YuehChuan/resnet-torch/blob/main/train-gpu.py
Test scripts:
GPU
https://github.com/YuehChuan/resnet-torch/blob/main/test-gpu.py
CPU
Converting the trained model weights (.pth) to TorchScript (.pt)
---
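Purpose:
A model trained in Python is saved as a .pth file.
Once converted to TorchScript, it can be loaded for inference from C++
and can be optimized further, for example by quantization.
Here I convert **pds.pth** into **resnet50-batch.pt**: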
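```python=
# Traced mode: convert the pickled model (.pth) to TorchScript (.pt)
from torchvision import models
import torch

# Alternative starting point: a stock pretrained network
# model = models.resnet152(pretrained=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# pds.pth holds the whole pickled model, so its class must be importable here.
# map_location avoids a device mismatch when the file was saved on another device.
# Recent PyTorch versions may also need weights_only=False for a full pickled model.
model = torch.load('pds.pth', map_location=device)
model.to(device)
model.eval()

# Tracing records the operations executed on one example input
example_input = torch.rand(1, 3, 224, 224).to(device)
traced_script_module = torch.jit.trace(model, example_input)
traced_script_module.save("resnet50-batch.pt")
```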
![](https://i.imgur.com/yr6EDnM.png)
https://github.com/YuehChuan/serve/blob/pds-dev/examples/image_classifier/pds_resnet50/export.py
Preparing the files TorchServe needs
---
![](https://i.imgur.com/uruusO6.png)
In my working root directory there are:
1. config.properties
2. the model_store folder
We also need to modify:
3. the files under /home/corleone/serve/examples/image_classifier/pds_resnet50
4. index_to_name.json: change the entries to your own class names
{"0": ["n01440764", "corona"], "1": ["n01443537", "external"], "2": ["n01484850", "internal"], "3": ["n01491361", "noise"],
It is a dictionary: each key is a class index (as a string) and each value is a list.
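As a rough sketch, the mapping could be generated from a class list instead of being edited by hand. This assumes the four entries shown above are the complete mapping; `CLASSES` and `build_index_to_name` are hypothetical names for illustration, not part of TorchServe.

```python
import json

# (ImageNet-style ID, label) pairs copied from the entries shown above.
# Assumption: these four classes are the complete list.
CLASSES = [
    ("n01440764", "corona"),
    ("n01443537", "external"),
    ("n01484850", "internal"),
    ("n01491361", "noise"),
]


def build_index_to_name(classes):
    """Map each class index (as a string key) to an [id, label] list."""
    return {str(i): [wnid, label] for i, (wnid, label) in enumerate(classes)}


mapping = build_index_to_name(CLASSES)

# Serialize in the same shape as index_to_name.json
json_text = json.dumps(mapping)

# Look up the human-readable label for a predicted class index
predicted_index = 2
print(mapping[str(predicted_index)][1])
```

The index order has to match the class order used during training, otherwise the server will report the wrong label for a correct prediction.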
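This is because the model was built by transfer learning from ImageNet-pretrained weights.
> Ha, lost you there? I also can't stand people who throw jargon around and make simple things sound complicated.
> (Then again, it works for bluffing those who don't know better.) My No. 1 most hated term: ground truth!
> (As a student I was asked, "Your ground truth, where is your ground truth?" "Huh?" "You know, the ground truth!" Is it really so hard to just say "the theoretical value"?)
> ImageNet is a dataset, and experts have already spent enormous compute training the weights of the network's earlier layers on it for us.
> So during training we can freeze everything except the fully connected layer that does the classification (a "fully connected layer", to drop a bit of jargon, ha). The front of the network extracts features with their weights, which is usually more accurate than training the whole network on our own dataset (standing on the shoulders of giants).
> This whole process is called "transfer learning".
> We usually resort to transfer learning when our own dataset is small, say under 1000 images.
> Much clearer now, isn't it? "Transfer learning" is basically just a buzzword.
> What those academic charlatans may not know is why images fed to an ImageNet-pretrained model have to be scaled and shifted by particular values.
> It comes down to the mean and standard deviation of ImageNet. Take a look: [name=YuehChuan]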
:::info
A note on pretrained initialization: images fed to an ImageNet-pretrained model must first be shifted and scaled (normalized) with
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
https://github.com/pytorch/vision/issues/1439 :bird:
:::
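To make the shift and scale concrete, here is a minimal plain-Python sketch of the per-channel arithmetic, the same `(value - mean) / std` that `torchvision.transforms.Normalize` applies to tensors. The helper names `normalize_pixel` and `denormalize_pixel` are made up for this illustration, and the input is assumed to be an RGB pixel already scaled to [0, 1].

```python
# Per-channel statistics quoted above (RGB order)
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Shift by the mean and scale by the std, channel by channel.

    `rgb` holds values already scaled to the range [0, 1].
    """
    return [(value - m) / s for value, m, s in zip(rgb, MEAN, STD)]


def denormalize_pixel(rgb):
    """Invert normalize_pixel."""
    return [value * s + m for value, m, s in zip(rgb, MEAN, STD)]


# A pixel equal to the dataset mean maps to zero in every channel
print(normalize_pixel(MEAN))

# A pure white pixel ends up more than two standard deviations above zero
white = normalize_pixel([1.0, 1.0, 1.0])
print([round(v, 3) for v in white])
```

The handler on the server has to apply the same normalization that was used during training, otherwise the pretrained features see inputs on a different scale.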
![](https://i.imgur.com/pyMieIN.png)
![](https://i.imgur.com/89QDJU9.png)
![](https://i.imgur.com/qKUeE57.png)
![](https://i.imgur.com/VBbVvvV.png)
https://github.com/YuehChuan/serve/tree/pds-dev/examples/image_classifier/pds_resnet50
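config.properties specifies where the models are stored and how many Netty worker threads to start.
model_store holds the packaged model archive (.mar) built from the converted TorchScript file.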
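For orientation, here is a small sketch that renders a possible config.properties. The values are hypothetical and are not the author's actual file: port 5050 is taken from the curl commands later in this note, while the management port and the thread and worker counts are arbitrary assumptions. The keys themselves (`inference_address`, `management_address`, `model_store`, `number_of_netty_threads`, `default_workers_per_model`) are standard TorchServe settings.

```python
from pathlib import Path

# Hypothetical settings, not the author's actual file.
# Port 5050 matches the curl commands used later in this note;
# 5051 for the management API is an arbitrary choice.
SETTINGS = {
    "inference_address": "http://0.0.0.0:5050",
    "management_address": "http://0.0.0.0:5051",
    "model_store": "model_store",
    "number_of_netty_threads": "4",
    "default_workers_per_model": "1",
}


def render_properties(settings):
    """Render key=value lines in Java .properties style."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


def write_config(directory="."):
    """Write config.properties into `directory` and return its path."""
    path = Path(directory) / "config.properties"
    path.write_text(render_properties(SETTINGS))
    return path


print(render_properties(SETTINGS))
```

TorchServe picks up a config.properties found in the directory it is started from, so calling `write_config()` in the working root would be enough for this layout.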
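Next comes the main part: registering a handler that does the pre-processing and post-processing around inference. In short, a handler is the set of actions invoked each time an inference request comes in.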
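The flow a handler follows can be sketched without any TorchServe dependency. The class below is a hypothetical stand-in, not the author's resnet152_handler.py and not TorchServe's `BaseHandler`; it only mirrors the usual preprocess, inference, postprocess stages with a fake model and a toy payload.

```python
import math


class ToyHandler:
    """Hypothetical stand-in showing the preprocess -> inference -> postprocess flow."""

    def __init__(self, model, index_to_name):
        self.model = model                  # any callable returning raw scores
        self.index_to_name = index_to_name  # same shape as index_to_name.json

    def preprocess(self, request):
        # Turn the raw request payload into model input.
        # A real image handler would decode, resize, and normalize here.
        return [float(b) / 255.0 for b in request]

    def inference(self, model_input):
        return self.model(model_input)

    def postprocess(self, scores):
        # Softmax the raw scores, then report the most likely label
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(probs)), key=probs.__getitem__)
        return {self.index_to_name[str(best)][1]: probs[best]}

    def handle(self, request):
        # One inference request triggers the three stages in order
        return self.postprocess(self.inference(self.preprocess(request)))


def fake_model(model_input):
    # Placeholder scores; a real model would compute them from model_input
    return [0.1, 0.2, 3.0, 0.3]


labels = {"0": ["n01440764", "corona"], "1": ["n01443537", "external"],
          "2": ["n01484850", "internal"], "3": ["n01491361", "noise"]}
handler = ToyHandler(fake_model, labels)
result = handler.handle(b"\x00\x7f\xff")
print(result)
```

In a real custom handler the same three stages are typically overridden on top of TorchServe's base handler class, with the model loaded from the .mar archive instead of being passed in.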
Packaging the TorchScript model into a .mar with torch-model-archiver
---
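```bash=
# Package the TorchScript file, handler, and label map into resnet-50-batch.mar
torch-model-archiver --model-name resnet-50-batch --version 1.0 \
  --model-file ./serve/examples/image_classifier/pds_resnet50/model.py \
  --serialized-file resnet50-batch.pt \
  --handler ./serve/examples/image_classifier/pds_resnet50/resnet152_handler.py \
  --extra-files ./serve/examples/image_classifier/pds_resnet50/index_to_name.json
# The archive is written to the current directory; move it to where --model-store will look
mv resnet-50-batch.mar model_store/
```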
Starting the server
---
```bash=
torchserve --start --ncs --model-store model_store --models resnet-50-batch.mar
```
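Once the server is up, a quick health check against the `/ping` endpoint of the inference API confirms it is reachable. The sketch below assumes the inference API listens on port 5050, as in the curl commands in this note; `ping_url`, `is_healthy`, and `check_server` are hypothetical helper names.

```python
import json
import urllib.request


def ping_url(host, port=5050):
    """Build the health-check URL of the inference API."""
    return f"http://{host}:{port}/ping"


def is_healthy(response_body):
    """Return True if the /ping response body reports a healthy server."""
    return json.loads(response_body).get("status") == "Healthy"


def check_server(host, port=5050, timeout=5.0):
    """Query /ping; returns False if the server cannot be reached."""
    try:
        with urllib.request.urlopen(ping_url(host, port), timeout=timeout) as resp:
            return is_healthy(resp.read())
    except (OSError, ValueError):
        # Connection problems or a body that is not valid JSON
        return False
```

Calling `check_server("192.168.2.1")` from the client machine is a cheap way to rule out firewall or address problems before sending images.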
Running inference from another machine by sending images with curl
---
Single-image test:
`curl http://<remote-server-ip>:5050/predictions/resnet-50-batch -T /home/schwarm/val/internal/13833_20191222-112800_5.png`
Multiple images work too:
`curl http://<remote-server-ip>:5050/predictions/resnet-50-batch -T /home/schwarm/val/external/4967_20191229-084714_2.png & curl http://<remote-server-ip>:5050/predictions/resnet-50-batch -T /home/schwarm/val/noise/10045_20191230-163304_2.png`
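The same requests can be sent from Python using only the standard library. This is a sketch under assumptions: port 5050 and the model name come from the curl commands above, the helper names are hypothetical, and the explicit `application/octet-stream` content type is my choice so that the body is treated as raw bytes rather than form data.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_request(host, model_name, image_bytes, port=5050):
    """Build a PUT request that uploads raw image bytes, like `curl -T`."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="PUT",
        # Explicit content type so the body is not interpreted as form data
        headers={"Content-Type": "application/octet-stream"},
    )


def predict(host, model_name, image_path, port=5050, timeout=30.0):
    """Send one image file and return the response body as text."""
    with open(image_path, "rb") as f:
        request = build_request(host, model_name, f.read(), port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def predict_many(host, model_name, image_paths, port=5050):
    """Send several images concurrently, similar to chaining curl with `&`."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda p: predict(host, model_name, p, port), image_paths))
```

For example, `predict("192.168.2.1", "resnet-50-batch", "corona.png")` would return the server's response for one image, provided the server is reachable at that address.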
[Demo video](https://youtu.be/RKGhF1uFemg)
![](https://i.imgur.com/BtwP5yX.png)
```python=
print('Great!!! Now, everyone is happy! (◕ ‿ ◕ )!')
```
```sequence
client->Server: curl http://192.168.2.1:5050/predictions/resnet-50-batch -T corona.png?
Note right of Server: torchServe Inference
Server-->client: It's corona discharge !
```
## Appendix and FAQ
https://docs.aws.amazon.com/zh_tw/dlami/latest/devguide/tutorial-torchserve.html
https://www.youtube.com/watch?v=AIrrI8WOIuk
Easily Deploy PyTorch models in Production on AWS with TorchServe - AWS Online Tech Talks
https://www.youtube.com/watch?v=mYV8nk29m8o
https://aws.amazon.com/cn/blogs/machine-learning/deploying-pytorch-models-for-inference-at-scale-using-torchserve/
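I got this working before [this article](https://towardsdatascience.com/deploy-models-and-create-custom-handlers-in-torchserve-fc2d048fbe91) came out XD
I just kept it to myself for a while (sips tea), but I found that to use the fancier advanced features I had to force myself to write it all down.
Saved for later: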
https://www.zhihu.com/question/389731764
https://pytorch.org/elastic/0.2.1/index.html
https://hackmd.io/2be4Cc3tSrC2OJsE-JPErA
https://hackmd.io/ziNEB3qBSseAi2H-Uy06Pg
TorchServe on AWS
https://torchserve-on-aws.workshop.aws/en/
###### tags: `Templates` `Documentation` `torchServe` `pds`