Triton
ref :
https://www.youtube.com/watch?v=m-eaFJ5GK94
AI Inference Workflow
AI Inference Challenges
- Many different frameworks and model formats
Triton Core Features
- Supports most mainstream frameworks
  - TensorFlow
  - PyTorch
  - ONNX
  - TensorRT
- Supports any query type
  - Real Time
  - Batch
  - Streaming
  - Ensemble
- Supports any platform
  - x86
  - ARM
  - Linux / Windows / Virtualization
  - Public Cloud / Edge / Embedded
- DevOps & MLOps
  - Integrates with K8s, KServe, Prometheus, and Grafana
- Performance monitoring
  - Performance & Utilization
Triton Architecture
Feature
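Model Control API
- NONE
  - Default mode: all models in the model repository are loaded when the server starts
- POLL
  - After the server starts, it keeps checking the model repository for new models and loads any it finds
- EXPLICIT
  - No models are loaded when the server starts; models are loaded through the model control API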
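As a rough illustration of EXPLICIT mode, the sketch below builds (but does not send) a load request for Triton's HTTP model repository endpoint, `POST /v2/repository/models/<model>/load`. The model name `resnet50`, the helper name `build_model_control_request`, and the base URL `http://localhost:8000` (Triton's default HTTP port) are assumptions made for the example, not values from the talk.

```python
import json
import urllib.request


def build_model_control_request(model_name, action, base_url="http://localhost:8000"):
    """Build, without sending, a model load/unload request (hypothetical helper)."""
    if action not in ("load", "unload"):
        raise ValueError("action must be 'load' or 'unload'")
    url = f"{base_url}/v2/repository/models/{model_name}/{action}"
    return urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),  # empty JSON body: no extra parameters
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "resnet50" is an assumed model name used only for illustration.
req = build_model_control_request("resnet50", "load")
print(req.get_method(), req.full_url)

# To send it, a server started with --model-control-mode=explicit must be running:
# urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the example runnable without a server; swapping `"load"` for `"unload"` targets the matching unload endpoint.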
Triton Custom Backend
- Lets you plug in a framework you developed yourself
Model Ensembling
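Triton Inference Server Metrics for Autoscaling
- Example: two servers with 8 GPUs each, where every GPU is dedicated to one model.
Without Triton: the hardware cannot be used efficiently. Certain GPUs run at much higher utilization than the rest, and compute resources cannot be allocated effectively (right side of the original figure).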
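With Triton: all models can be loaded on the same GPU at the same time, and requests are distributed automatically through load balancing.
** When one model receives heavy traffic at a given moment, its requests can be spread across all GPUs for inference (right side of the original figure).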
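To make the utilization imbalance concrete, here is a minimal sketch of how an autoscaler might read per-GPU utilization from Triton's Prometheus metrics (served by default on port 8002 at `/metrics`). `nv_gpu_utilization` is a real Triton metric, but the sample text, the GPU identifiers, the helper names, and the 0.8 threshold are all made up for illustration.

```python
import re

# Illustrative sample of the Prometheus text format; the values are invented.
SAMPLE_METRICS = """\
# HELP nv_gpu_utilization GPU utilization rate [0.0 - 1.0)
# TYPE nv_gpu_utilization gauge
nv_gpu_utilization{gpu_uuid="GPU-0"} 0.95
nv_gpu_utilization{gpu_uuid="GPU-1"} 0.10
"""

_GPU_UTIL = re.compile(r'^nv_gpu_utilization\{gpu_uuid="([^"]+)"\}\s+(\S+)$')


def gpu_utilization(metrics_text):
    """Return {gpu_uuid: utilization} parsed from Prometheus-format text."""
    result = {}
    for line in metrics_text.splitlines():
        match = _GPU_UTIL.match(line.strip())
        if match:
            result[match.group(1)] = float(match.group(2))
    return result


def should_scale_out(utilization, threshold=0.8):
    """Hypothetical rule: scale out when mean GPU utilization exceeds threshold."""
    if not utilization:
        return False
    return sum(utilization.values()) / len(utilization) > threshold


util = gpu_utilization(SAMPLE_METRICS)
print(util, should_scale_out(util))
```

In a Kubernetes setup this decision would normally be delegated to Prometheus and an autoscaler rather than hand-written code; the sketch only shows what signal is being read.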
Dynamic Batching Scheduler
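- Supports batches of up to 32 (per the talk; the actual limit is set per model with `max_batch_size` in the model configuration)
- Can be configured to gather the requests that arrive within a time window into one batch, instead of running them one batch at a time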
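The time-window idea can be sketched with a toy batcher. This is a simplified model for intuition, not Triton's scheduler: a batch is closed when it is full or when the next request arrives more than a fixed delay after the batch's first request. The parameter names loosely mirror Triton's `max_batch_size` and `max_queue_delay_microseconds` settings; the `Request` type and arrival times are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    id: int
    arrival_us: int  # arrival time in microseconds


def form_batches(requests, max_batch_size=32, max_queue_delay_us=100):
    """Group requests into batches by size limit and arrival window (toy model)."""
    batches, current = [], []
    for req in sorted(requests, key=lambda r: r.arrival_us):
        window_expired = (
            current and req.arrival_us - current[0].arrival_us > max_queue_delay_us
        )
        if current and (len(current) >= max_batch_size or window_expired):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


arrivals = [0, 10, 20, 500, 510]  # invented arrival times
requests = [Request(i, t) for i, t in enumerate(arrivals)]
batches = form_batches(requests)
print([[r.id for r in batch] for batch in batches])
```

A longer delay window yields larger batches and better throughput at the cost of extra queueing latency for the first request in each batch.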
Concurrent Model Execution
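- When you run multiple copies of the same model, this feature speeds up inference
- Each instance is one copy of the model
- When there are more requests than instances, every instance first executes one request, the remaining requests go to a second round, and so on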
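The round-by-round behaviour described above can be sketched as follows. This is a deliberately simplified lockstep model based on the notes; an actual scheduler would typically hand out work as each instance becomes free. The function name and request labels are hypothetical.

```python
def schedule_rounds(request_ids, instance_count):
    """Assign requests to model instances in rounds (simplified lockstep model)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    rounds = []
    for start in range(0, len(request_ids), instance_count):
        chunk = request_ids[start:start + instance_count]
        # Map instance index -> request handled in this round.
        rounds.append({instance: req for instance, req in enumerate(chunk)})
    return rounds


rounds = schedule_rounds(["A", "B", "C", "D", "E"], instance_count=2)
for number, assignment in enumerate(rounds, start=1):
    print(f"round {number}: {assignment}")
```

With five requests and two instances the work takes three rounds; adding a third instance would cut it to two.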
Concurrent Model Execution: ResNet-50 & Deep Recommender
Model Analyzer
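- Higher throughput = larger batch size = higher latency
- Model Analyzer produces an analysis report that you can use to tune the configuration
- Improves reliability by avoiding GPU OOM
- Helps decide whether additional hardware is needed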
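The throughput/latency trade-off above amounts to a constrained choice: pick the highest-throughput configuration that still meets a latency budget and fits in GPU memory. The sketch below shows that selection over hand-written measurements. It does not call Model Analyzer; the `Measurement` fields and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    batch_size: int
    throughput_ips: float   # inferences per second
    p99_latency_ms: float
    gpu_memory_mb: int


def best_config(measurements, latency_budget_ms, memory_limit_mb):
    """Return the feasible measurement with the highest throughput, or None."""
    feasible = [
        m for m in measurements
        if m.p99_latency_ms <= latency_budget_ms and m.gpu_memory_mb <= memory_limit_mb
    ]
    return max(feasible, key=lambda m: m.throughput_ips, default=None)


# Invented numbers: throughput and latency both grow with batch size.
MEASUREMENTS = [
    Measurement(1, 400.0, 5.0, 2000),
    Measurement(8, 1800.0, 18.0, 4000),
    Measurement(32, 3200.0, 45.0, 9000),
    Measurement(64, 3600.0, 90.0, 17000),
]

choice = best_config(MEASUREMENTS, latency_budget_ms=50, memory_limit_mb=16000)
print(choice)
```

If no configuration is feasible the function returns `None`, which is the case where adding hardware or relaxing the latency budget would need to be considered.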
Model Navigator
- Takes models trained with TensorFlow or PyTorch, automatically converts them to ONNX / TensorRT, runs Model Analyzer, and generates a Helm chart
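Advanced Features
Multi-Instance GPU (MIG)
- Supported on the NVIDIA H100, A100, and A30 series
- A GPU can be partitioned into up to 7 instances (A30: up to 4)
- Each instance is fully isolated, with its own high-bandwidth memory, cache, and compute cores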
1 A100 - 7 Model Instances Using MIG
- A load balancer can distribute requests evenly across the GPU instances
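One simple way to get an even distribution is round-robin assignment. The sketch below spreads requests over seven endpoints, one per MIG instance. The endpoint labels `mig-0` to `mig-6` and the helper name are hypothetical, and round-robin is chosen here only as an example policy; the talk does not say which strategy the load balancer uses.

```python
from collections import Counter
from itertools import cycle


def round_robin_assign(request_ids, endpoints):
    """Pair each request with the next endpoint in a repeating cycle."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    targets = cycle(endpoints)
    return [(req, next(targets)) for req in request_ids]


# Hypothetical labels for the 7 MIG instances of one A100.
MIG_ENDPOINTS = [f"mig-{i}" for i in range(7)]

assignments = round_robin_assign(range(14), MIG_ENDPOINTS)
load = Counter(endpoint for _, endpoint in assignments)
print(load)
```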
Triton on A100 with MIG
- Increasing the number of MIG instances raises throughput
- Increasing the number of MIG instances reduces latency
- With MIG, throughput can be increased substantially
DeepStream - Triton Pipeline
Triton - DeepStream Deployment