# Run SGLang Thor & Spark 1. Install uv ```bash curl -LsSf https://astral.sh/uv/install.sh | sh ``` 2. Create environment ```bash uv venv .sglang --python 3.12 source .sglang/bin/activate sudo apt install python3-dev python3.12-dev ``` 3. Export variables ```bash export TORCH_CUDA_ARCH_LIST=11.0a # Spark, for Thor 11.0a export TRITON_PTXAS_PATH=/usr/local/cuda/bin/ptxas export PATH=/usr/local/cuda/bin:$PATH export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH ``` 4. Install SGLang ```bash uv pip install -U sglang --pre \ --index-url https://sgl-project.github.io/whl/cu130/ \ --extra-index-url https://pypi.org/simple \ --extra-index-url https://download.pytorch.org/whl/cu130 \ --index-strategy unsafe-best-match # Step 2: Install CUDA 13.0 kernel uv pip install -U sglang-kernel \ --extra-index-url https://sgl-project.github.io/whl/cu130/ \ --extra-index-url https://download.pytorch.org/whl/cu130 \ --index-strategy unsafe-best-match uv pip install --prerelease=allow --force-reinstall triton --index-url https://download.pytorch.org/whl/test/cu132 ``` 5. Clean memory ```bash sudo sysctl -w vm.drop_caches=3 ``` 6. Run nemotron nvfp4 ```bash python3 -m sglang.launch_server \ --model-path nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 \ --trust-remote-code \ --tp 1 \ --attention-backend flashinfer \ --tool-call-parser qwen3_coder \ --reasoning-parser nano_v3 \ --mem-fraction-static 0.6 \ --cuda-graph-max-bs 16 ```