# Run SGLang Thor & Spark
1. Install uv
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
2. Create environment
```bash
uv venv .sglang --python 3.12
source .sglang/bin/activate
sudo apt install python3-dev python3.12-dev
```
3. Export variables
```bash
export TORCH_CUDA_ARCH_LIST=11.0a # Spark, for Thor 11.0a
export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```
4. Install SGLang
```bash
uv pip install -U sglang --pre \
--index-url https://sgl-project.github.io/whl/cu130/ \
--extra-index-url https://pypi.org/simple \
--extra-index-url https://download.pytorch.org/whl/cu130 \
--index-strategy unsafe-best-match
# Step 2: Install CUDA 13.0 kernel
uv pip install -U sglang-kernel \
--extra-index-url https://sgl-project.github.io/whl/cu130/ \
--extra-index-url https://download.pytorch.org/whl/cu130 \
--index-strategy unsafe-best-match
uv pip install --prerelease=allow --force-reinstall triton --index-url https://download.pytorch.org/whl/test/cu132
```
5. Clean memory
```bash
sudo sysctl -w vm.drop_caches=3
```
6. Run nemotron nvfp4
```bash
python3 -m sglang.launch_server \
--model-path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 \
--trust-remote-code \
--tp 1 \
--attention-backend flashinfer \
--tool-call-parser qwen3_coder \
--reasoning-parser nano_v3 \
--mem-fraction-static 0.6 \
--cuda-graph-max-bs 16
```