# Guides: TensorRT-LLM Extension

## Overview

Users with Nvidia GPUs can get 20-40% faster token speeds* on their laptops or desktops by using [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM). This guide walks you through how to install Jan's official TensorRT-LLM Extension.

This extension uses [Nitro-TensorRT-LLM](https://github.com/janhq/nitro-tensorrt-llm) as the AI engine, instead of the default [Nitro-LlamaCPP](https://github.com/janhq/nitro). It includes an efficient C++ server that natively executes the [TRT-LLM C++ runtime](https://nvidia.github.io/TensorRT-LLM/gpt_runtime.html). It also comes with additional features and performance improvements, such as OpenAI compatibility, tokenizer improvements, and queues.

*Compared to using the LlamaCPP engine.

:::info
This feature is currently only available for Windows users. Linux support is coming soon.
:::

## Requirements

- A Windows PC
- Nvidia GPU(s): Ada or Ampere series (e.g. 4070, 4090, 3080). More will be supported soon.
- 3GB+ of disk space to download TRT-LLM artifacts and a Nitro binary
- Jan v0.4.9+ or Jan v0.4.8-321+ (nightly)
- Nvidia Driver v535+ ([installation guide](https://jan.ai/guides/common-error/not-using-gpu/#1-ensure-gpu-mode-requirements))
- CUDA Toolkit v12.2+ ([installation guide](https://jan.ai/guides/common-error/not-using-gpu/#1-ensure-gpu-mode-requirements))

:::info
The complete installation takes ~10 minutes.
:::

## Install TensorRT-Extension

1. Go to Settings > Extensions.
2. Click **Install** next to the TensorRT-LLM Extension, taking note of the extension version number, e.g. `v0.0.2`.
3. Check that the files were downloaded correctly:

```sh
# Your Jan Data Folder should now include `nitro.exe`, among other files
ls ~\jan\extensions\@janhq\tensorrt-llm-extension\dist\bin
```

## Download a Compatible Model

TensorRT-LLM can only run models in `TensorRT` format that are prebuilt specifically for the target hardware.
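If you are unsure whether your card and driver meet the requirements above, `nvidia-smi` reports both. A minimal sketch (the query flags are standard `nvidia-smi` options; the guard keeps the snippet safe on machines without the Nvidia driver installed):

```shell
# Print GPU model and driver version (requires driver v535+ and an Ada/Ampere card)
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found -- install or update the Nvidia driver first"
fi
```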
These models are called Engines. We offer a handful of precompiled models for Ampere and Ada cards that you can download and play with immediately.

1. Restart the application and go to the Hub.
2. Look for models with the `TensorRT-LLM` label in the recommended models list and click **Download**. This step might take some time. 🙏

   ![image](https://hackmd.io/_uploads/rJewrEgRp.png)

3. Click **Use** and start chatting!

:::info
Due to our limited resources, we only prebuilt a few demo models. You can always build your desired models directly on your machine. [Read here](#build-your-own-tensorrt-models).
:::

## Configure Settings

You can customize the default parameters for how Jan runs TensorRT-LLM.

1. Go to

## Troubleshooting

### Install Nitro-TensorRT-LLM manually

### Build your own TensorRT models

### Incompatible Extension vs Engine versions

### Uninstall the Extension
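The extension's files live under the Jan Data Folder path shown in the install step above. A hedged cleanup sketch (the exact path and manual removal are assumptions based on that step; check Settings > Extensions first for a built-in uninstall option):

```shell
# Hypothetical manual cleanup: remove the extension's folder from the Jan Data Folder.
# Path mirrors the install-check step; adjust it if your Jan Data Folder lives elsewhere.
EXT_DIR="$HOME/jan/extensions/@janhq/tensorrt-llm-extension"
if [ -d "$EXT_DIR" ]; then
  rm -rf "$EXT_DIR"
  echo "Removed $EXT_DIR"
else
  echo "Extension folder not found at $EXT_DIR (nothing to remove)"
fi
```

Restart Jan afterwards so the extension list reflects the change.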