Triton

ref :
https://www.youtube.com/watch?v=m-eaFJ5GK94

AI Inference Workflow

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

AI Inference 困難點

  1. 多種不同的框架和格式
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Triton 核心功能

  • 支援多數主流框架
    • Tensorflow
    • Pytorch
    • ONNX
    • TensorRT
  • 支援任何 Query Type
    • Real Time
    • Batch
    • Streaming
    • Ensemble
  • 支援任何平台
    • X86
    • ARM
    • Linux / Windows / Virtualization
    • Public Cloud / Edge / Embedded
  • DevOps & MLOps
    • 整合 K8s、KServe、Prometheus 和 Grafane
    • 效能監控
  • Performance & Utilization
    • 模型優化和分析並產出最佳化報告
    • 優化延遲和吞吐量

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Triton Architecture

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Feature

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Model Control API

  • None
    • 啟動 Server 時載入本地資料夾內所有模型
  • POLL
    • 啟動 Server 後,持續檢查是否有新的模型,有的話就 load model
  • EXPLICT
    • 啟動 Server 時不載入模型,透過 modelcontrol API 去 load model

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Triton Custom Backend

  • 可串接自行開發的框架
    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

Model Ensembling

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Triton Inference server metrics for autoscaling

  • 下圖範例為兩個配有 8 張 GPU 的 Server,每個 GPU 對應一組 model,

未使用 Triton : 無法有效利用,如下圖右,特定 GPU 使用率較高,無法有效分配運算資源

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

使用 Triton : 可同時載入所有 model 在同一 GPU 上,透過 Load Balance 方式自動分配

** 單一時間同一 model 被大量使用,可以同時分配到所有 GPU 上來推論,如下圖右所示

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Dynamic Batching Scheduler

  • 最多支援到 batch 32
  • 可以設定動態接收一段時間內的 batch,而不是每次都1個batch 1個 batch 跑

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Concurrent Model Execution

  • 當你有多個同樣的 model 時,可用此功能來加速 Inference
  • 每個 instance = model
  • 當 request 數量 > instance 數量時,會先讓每個 instance 執行一次,剩下再分配到第二回合,以此類推

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Concurrent Modedl Execution ResNet 50 & Deep Recommender

  • 部署多種 instance, request queue 會根據開啟的 instance 數量來分配

  • instance 需要自行設定數量

  • triton 會自動最佳化去幫你分配資源

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Model Analyzer

  • 高 Throupt = 高 batch size = 高延遲
  • 透過 Model Analyzer 可得到最分析報告來優化
  • 增加可靠度,避免 GPU OOM
  • 幫助判斷是否需要新增硬體資源

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Model Navigator

  • 透過 tensorflow 和 pytorch 訓練出來的東西,會自動幫你轉換成 ONNX TRT,自動執行 Model Analyzer,並生成 helm chart

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

進階功能

Multi-Instance GPU (多執行個體 GPU)

  • 支援 NVIDIA H100、A100 以及 A30 系列
  • 最多可切成 7 個執行個體
  • 各自完全獨立且具備個別的高頻寬記憶體、快取和運算核心

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

1 A100 - 7 Model Instances Using MIG

  • 可透過 Load Balancer,平均分配到每個 GPU

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Triton on A100 with MIG

  • 透過增加 MIG instances 數量來提高 Throughput

    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

  • 透過增加 MIG instances 數量來減少 Latency

    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

  • 透過 MIG 方式,可大幅增加 Throughput

    Image Not Showing Possible Reasons
    • The image file may be corrupted
    • The server hosting the image is unavailable
    • The image path is incorrect
    • The image format is not supported
    Learn More →

DeepStream - Triton Pipeline

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

Triton - DeepStream Deployment

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →

AI - Edge to Cloud Platform

Image Not Showing Possible Reasons
  • The image file may be corrupted
  • The server hosting the image is unavailable
  • The image path is incorrect
  • The image format is not supported
Learn More →