Triton

Ref:
https://www.youtube.com/watch?v=m-eaFJ5GK94

AI Inference Workflow

AI Inference Challenges

Many different frameworks and model formats

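Triton Core Features

Supports most mainstream frameworks
- TensorFlow
- PyTorch
- ONNX
- TensorRT
Supports any query type
- Real Time
- Batch
- Streaming
- Ensemble
Supports any platform
- x86
- ARM
- Linux / Windows / Virtualization
- Public Cloud / Edge / Embedded
DevOps & MLOps
- Integrates with K8s, KServe, Prometheus, and Grafana
- Performance monitoring
Performance & Utilization
- Optimizes and analyzes models and produces an optimization report
- Optimizes latency and throughput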

Triton Architecture

Feature

Model Control API

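NONE
- Loads every model in the local folder when the server starts
POLL
- After the server starts, keeps checking for new models and loads any it finds
EXPLICIT
- Loads no models when the server starts; models are loaded through the model control API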
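
To make the EXPLICIT mode more concrete, here is a minimal sketch that builds the HTTP requests for loading and unloading a model. It assumes the server was started with `--model-control-mode=explicit`, that its HTTP endpoint is at `http://localhost:8000`, and that the repository contains a model called `resnet50`; the address and the model name are placeholders, not values from the talk. The sketch only builds the request and sends nothing until you uncomment the last lines.

```python
import json
import urllib.request

# Assumed address of a local Triton HTTP endpoint; adjust for your deployment.
TRITON_URL = "http://localhost:8000"


def build_model_control_request(action, model_name, base_url=TRITON_URL):
    """Build (but do not send) a model repository load or unload request."""
    if action not in ("load", "unload"):
        raise ValueError("action must be 'load' or 'unload'")
    url = f"{base_url}/v2/repository/models/{model_name}/{action}"
    # An empty JSON object is used as the body for a plain load or unload.
    body = json.dumps({}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "resnet50" is a hypothetical model name used only for illustration.
    req = build_model_control_request("load", "resnet50")
    print(req.get_method(), req.full_url)
    # To send it to a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

In NONE mode these calls should be unnecessary, because every model is loaded at startup.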

Triton Custom Backend

Lets you plug in frameworks you developed yourself

Model Ensembling

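Triton Inference Server Metrics for Autoscaling

The example in the figure below shows two servers with 8 GPUs, where each GPU is assigned one set of models.

Without Triton: the GPUs are used inefficiently. As the right side of the figure below shows, some GPUs are much busier than others, so compute resources are not distributed effectively.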

[Figure: GPU utilization without Triton]

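With Triton: all models can be loaded onto the same GPU at once, and requests are distributed automatically through load balancing.

** When a single model is in heavy demand at one time, its requests can be spread across all GPUs for inference, as the right side of the figure below shows.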

[Figure: GPU utilization with Triton]

Dynamic Batching Scheduler

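Supports batch sizes up to 32
Can be configured to collect the requests that arrive within a time window into one batch dynamically, instead of running them one batch at a time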
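
The time-window idea above can be sketched as a toy simulation. This is not Triton's scheduler, only a simplified model of it: a batch is closed when it reaches `max_batch_size` or when the next request arrives more than `max_queue_delay_us` after the first request in the batch. The parameter names echo the `max_batch_size` and `dynamic_batching { max_queue_delay_microseconds }` fields of the model configuration; the default of 100 microseconds and the arrival times are made up for illustration.

```python
def form_batches(arrival_times_us, max_batch_size=32, max_queue_delay_us=100):
    """Group request arrival times (in microseconds) into batches.

    A batch is closed when it is full, or when the next request arrives
    more than max_queue_delay_us after the first request in that batch.
    """
    batches = []
    current = []
    for t in sorted(arrival_times_us):
        batch_is_full = len(current) >= max_batch_size
        window_expired = bool(current) and t - current[0] > max_queue_delay_us
        if current and (batch_is_full or window_expired):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Three requests arrive close together, two more arrive much later.
    for batch in form_batches([0, 10, 20, 500, 510]):
        print(len(batch), batch)
```

A longer window gives larger batches and higher throughput, but each request may wait longer before it runs.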

Concurrent Model Execution

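When you have multiple copies of the same model, this feature can speed up inference
Each instance is one copy of the model
When there are more requests than instances, each instance first runs one request; the remaining requests are assigned in a second round, and so on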
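
The round-by-round assignment described above can be sketched as follows. It is a simplification that follows the note: a real server would more likely hand the next queued request to whichever instance becomes free first, rather than wait for a whole round to finish. The function name `schedule_rounds` is hypothetical.

```python
import math


def schedule_rounds(request_ids, instance_count):
    """Split queued requests into rounds of at most instance_count requests."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    rounds = []
    for start in range(0, len(request_ids), instance_count):
        rounds.append(request_ids[start:start + instance_count])
    return rounds


if __name__ == "__main__":
    requests = list(range(5))  # five queued requests
    rounds = schedule_rounds(requests, instance_count=2)
    # The number of rounds is ceil(requests / instances).
    assert len(rounds) == math.ceil(len(requests) / 2)
    for i, r in enumerate(rounds, start=1):
        print(f"round {i}: {r}")
```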

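Concurrent Model Execution ResNet 50 & Deep Recommender

When multiple instances are deployed, the request queue is distributed according to the number of running instances
You have to set the number of instances yourself
Triton automatically optimizes how resources are allocated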
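
The instance count is set per model in its `config.pbtxt`, in the `instance_group` block. As a small sketch, the helper below renders such a block as text. The helper `render_instance_group` is hypothetical and written only for these notes; the field names `count`, `kind`, and `gpus` follow the model configuration format.

```python
def render_instance_group(count, kind="KIND_GPU", gpus=None):
    """Render an instance_group block for a model's config.pbtxt."""
    if count < 1:
        raise ValueError("count must be at least 1")
    lines = [
        "instance_group [",
        "  {",
        f"    count: {count}",
        f"    kind: {kind}",
    ]
    if gpus is not None:
        # Restrict the instances to specific GPU device indices.
        gpu_list = ", ".join(str(g) for g in gpus)
        lines.append(f"    gpus: [ {gpu_list} ]")
    lines += ["  }", "]"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Two instances of the model on GPU 0.
    print(render_instance_group(2, gpus=[0]))
```

Pasting the printed block into a model's `config.pbtxt` asks the server to run that many copies of the model on the listed GPUs.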

Model Analyzer

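High throughput = large batch size = high latency
Model Analyzer produces an analysis report that guides optimization
Improves reliability by avoiding GPU out-of-memory (OOM) errors
Helps decide whether more hardware is needed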
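
The trade-off above can be illustrated with a small sketch that picks the highest-throughput configuration that still fits a latency budget and a GPU memory limit. This only mimics the kind of decision an analysis report supports; it is not Model Analyzer's code or output format, and every number in `SAMPLE` is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    batch_size: int
    throughput_infer_per_s: float
    p99_latency_ms: float
    gpu_memory_mb: float


def best_config(measurements, latency_budget_ms, gpu_memory_limit_mb):
    """Return the highest-throughput measurement that meets both limits."""
    feasible = [
        m for m in measurements
        if m.p99_latency_ms <= latency_budget_ms
        and m.gpu_memory_mb <= gpu_memory_limit_mb
    ]
    if not feasible:
        # Nothing fits: a hint that more hardware (or a smaller model) is needed.
        return None
    return max(feasible, key=lambda m: m.throughput_infer_per_s)


# Made-up numbers for illustration only, not real benchmark results.
SAMPLE = [
    Measurement(1, 200.0, 5.0, 1000.0),
    Measurement(8, 900.0, 9.0, 1500.0),
    Measurement(32, 1800.0, 18.0, 3000.0),
]

if __name__ == "__main__":
    print(best_config(SAMPLE, latency_budget_ms=10.0, gpu_memory_limit_mb=16000.0))
```

With a 10 ms budget the batch-32 row is rejected even though it has the highest throughput, which is the trade-off the first line of this section describes.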

Model Navigator

Models trained with TensorFlow or PyTorch are automatically converted to ONNX and TensorRT (TRT); Model Navigator then runs Model Analyzer automatically and generates a Helm chart

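Advanced Features

Multi-Instance GPU (MIG)

Supported on the NVIDIA H100, A100, and A30 series
A GPU can be partitioned into up to 7 instances
Each instance is fully isolated, with its own high-bandwidth memory, cache, and compute cores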

1 A100 - 7 Model Instances Using MIG

A load balancer can spread requests evenly across every GPU

Triton on A100 with MIG

Increasing the number of MIG instances raises throughput
Increasing the number of MIG instances reduces latency
Using MIG can increase throughput substantially

DeepStream - Triton Pipeline

Triton - DeepStream Deployment

AI - Edge to Cloud Platform
