Who am I?
Hang Yin
Cofounder & CTO @ Phala Network
5 years TEE Apologist; AI Decentralizer
Proud father of two
Why TEE-GPU?
AI is too powerful to be a monopoly
Decentralize it!
How?
Web3 to Break The Data Wall
TEE GPU is the Only Pragmatic Solution
TEE-Enabled GPUs

| Model       | VRAM   | TFLOPS (TF32) | Bandwidth | Avail.         |
| ----------- | ------ | ------------- | --------- | -------------- |
| NVIDIA H100 | 94 GB  | 989           | 3.9 TB/s  | 2024 Q1        |
| NVIDIA H200 | 141 GB | 989           | 4.8 TB/s  | 2024 Q4 / 2025 |
How it works
Bottleneck: CPU-GPU IO
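For intuition, a minimal PyTorch sketch (not the benchmark harness used here) that times host-to-device copies, the path where TEE mode adds encryption through bounce buffers:

```python
# Minimal sketch (assumes PyTorch with a CUDA device) to time host-to-device
# copies, the CPU-GPU IO path where confidential-computing mode adds overhead.
import time
import torch

def h2d_bandwidth_gbps(size_mb: int = 512, iters: int = 20) -> float:
    """Time pinned-memory host-to-device copies and return GB/s."""
    n = size_mb * 1024 * 1024
    host = torch.empty(n, dtype=torch.uint8, pin_memory=True)
    dev = torch.empty(n, dtype=torch.uint8, device="cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        dev.copy_(host, non_blocking=True)
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return (n * iters) / elapsed / 1e9

if __name__ == "__main__":
    print(f"H2D bandwidth: {h2d_bandwidth_gbps():.1f} GB/s")
```

Running the same snippet with confidential computing enabled and disabled makes the transfer-bandwidth gap behind the numbers below visible.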
Experiment Platform

| Part        | Configuration                                |
| ----------- | -------------------------------------------- |
| GPU         | NVIDIA H100 NVL (94 GB, 3.9 TB/s bandwidth)  |
| CPU         | AMD EPYC 9V84 96-Core Processor with SEV-SNP |
| RAM         | 314 GB                                       |
| CUDA        | 12.5 (driver version 555.42.02)              |
| CUDA Kernel | 550.90.07                                    |
Experiment Platform

| Part            | Configuration |
| --------------- | ------------- |
| Benchmark Suite | vLLM v0.5.4   |
| Models          | Meta-Llama-3.1-8B-Instruct<br>Phi-3-14B-128k-Instruct<br>Meta-Llama-3.1-70B-Instruct |
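As a rough illustration of how TPS and QPS can be measured with vLLM's offline Python API (a sketch, not the exact benchmark setup; model, prompts, and request count are placeholders):

```python
# Sketch of an offline throughput run with vLLM's Python API.
# Model name, prompts, and request count are placeholders.
import time
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=500)
prompts = ["Explain confidential computing in one paragraph."] * 100

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

out_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"TPS: {out_tokens / elapsed:.1f} output tokens/s")
print(f"QPS: {len(prompts) / elapsed:.2f} queries/s")
```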
Average overhead is less than 7%!
TPS: output tokens per second
QPS: throughput in queries per second
Overhead approaches zero as the model size grows
Length: Short - 100 | Medium - 500 | Long - 500+
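Overhead is reported as the relative drop versus the same run without the TEE; a trivial sketch of the calculation (the numbers below are placeholders, not measured results):

```python
def relative_overhead(baseline: float, tee: float) -> float:
    """Relative slowdown of a TEE run versus a non-TEE baseline, in percent."""
    return (baseline - tee) / baseline * 100

# Placeholder values for illustration only, not measured results.
print(f"{relative_overhead(baseline=100.0, tee=94.0):.1f}% overhead")  # -> 6.0%
```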
Latency is the main overhead
TTFT: Time to First Token
ITL: Inter-token latency
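One way to see where the latency goes is to timestamp each streamed token and split TTFT from ITL. A sketch against an OpenAI-compatible streaming endpoint such as the one vLLM serves; the URL, API key, and model name are assumptions:

```python
# Sketch: measure TTFT and ITL against an OpenAI-compatible streaming endpoint
# (e.g. one served by vLLM). URL, key, and model name are assumptions.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

stamps = []
start = time.perf_counter()
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Explain TEEs briefly."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        stamps.append(time.perf_counter())

ttft = stamps[0] - start                                  # Time to First Token
itl = (stamps[-1] - stamps[0]) / max(len(stamps) - 1, 1)  # avg Inter-Token Latency
print(f"TTFT: {ttft * 1000:.0f} ms, ITL: {itl * 1000:.1f} ms")
```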
Summary
Avg overhead <7%
Mainly due to the bandwidth and latency of CPU-GPU IO
The more computation, the lower the overhead
Future Work
Benchmark for training
Run the benchmark process on H200
50% more VRAM, good for bigger models!
Access to H200 since Aug'24
Benchmark finished
Lots of pitfalls for Intel TDX + H200
Report to be released in the coming weeks
ECDSA Signed Model Output
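A minimal sketch of what an ECDSA-signed model output could look like, using the Python `cryptography` package; the curve choice and key handling are assumptions, and in practice the signing key would live inside the TEE and be bound to a remote-attestation report:

```python
# Sketch: sign model output with ECDSA inside the TEE so clients can verify it
# came from the attested enclave. Curve and key handling are assumptions.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec

# Assumption: key generated inside the TEE, public key bound to the attestation report.
private_key = ec.generate_private_key(ec.SECP256K1())
public_key = private_key.public_key()

model_output = b"<LLM response bytes>"
signature = private_key.sign(model_output, ec.ECDSA(hashes.SHA256()))

# Any client holding the attested public key can verify the output.
public_key.verify(signature, model_output, ec.ECDSA(hashes.SHA256()))
print("signature verified")
```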
Thank you!
You can find me on
Slides: https://hackmd.io/rrhoBatjTBCAMjgBEsWZmw
{"title":"TEE GPU Benchmark","breaks":true,"description":"View the slide with \"Slide Mode\".","contributors":"[{\"id\":\"5a6697bc-81b6-4e3d-8398-f62afc6b49fb\",\"add\":3392,\"del\":2584}]"}