# Triton Inference Server Installation

###### tags: `NVIDIA` `Triton` `inference` `ML`

Sohail Anjum
Last update: 2021.04.28

## Platform

* Model: CB-1921-AA1
* OS: Ubuntu 20.04 (5.4.0-42-generic)

## Docker Installation

Install the prerequisites:

```bash=
$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
```

Add Docker's GPG key and verify the fingerprint:

```bash=
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo apt-key fingerprint 0EBFCD88
```

Make sure the result looks like this:

```
9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
```

Add the repository:

```bash=
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
```

Install Docker:

```bash=
$ sudo apt-get install docker-ce docker-ce-cli containerd.io
```

Check the Docker version:

```bash=
$ docker version
```

## NVIDIA Driver Installation

Download the `.run` driver installer from the NVIDIA website:

> Note: Select your NVIDIA model and OS before downloading: https://www.nvidia.com/Download/index.aspx?lang=en-us

Before installing, blacklist the nouveau driver. Create a file:

```bash=
$ sudo vim /etc/modprobe.d/blacklist-nouveau.conf
```

In `blacklist-nouveau.conf`:

```
blacklist nouveau
options nouveau modeset=0
```

Save the file and exit, then update the initramfs and reboot:

```bash=
$ sudo update-initramfs -u
$ sudo reboot
```

After restarting, use the following command to confirm that nouveau has stopped running:

```bash=
$ lsmod | grep nouveau
```

If nothing is printed, nouveau's kernel driver is disabled and we can proceed to install NVIDIA's official driver.

Make the installer executable:

```bash=
$ chmod +x NVIDIA-Linux-x86_64-460.32.03.run  # make it executable
```

Install gcc and make:

```bash=
$ sudo apt-get install gcc make
```

Run the NVIDIA driver installer:

```bash=
$ sudo ./NVIDIA-Linux-x86_64-460.32.03.run  # the file name depends on the version you downloaded
```

The installer shows some warnings; choose "Continue installation" at each prompt, finish the procedure, and reboot:

```bash=
$ sudo reboot
```

After the reboot, run `nvidia-smi` to check whether the driver is working:

```bash=
$ nvidia-smi
```

The output should look like this:

![](https://i.imgur.com/KeYic0A.png)

## Triton Inference Server Installation

Before you can use the Triton Docker image you must install Docker. If you plan on using a GPU for inference, you must also install the NVIDIA Container Toolkit: https://github.com/NVIDIA/nvidia-docker

![](https://i.imgur.com/Qk4IFV3.png)

Make sure you have installed the NVIDIA driver and the Docker engine for your Linux distribution. Note that you do not need to install the CUDA Toolkit on the host system, but the NVIDIA driver does need to be installed. For instructions on getting started with the NVIDIA Container Toolkit, refer to the installation guide: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#install-guide

### Setting up Docker

Docker-CE on Ubuntu can also be set up using Docker's official convenience script:

```bash=
curl https://get.docker.com | sh \
  && sudo systemctl --now enable docker
```

![](https://i.imgur.com/7KLWNeY.png)

![](https://i.imgur.com/LNB8FlE.png)

### Setting up NVIDIA Container Toolkit

Set up the stable repository and the GPG key:

```bash=
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add - \
  && curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```

![](https://i.imgur.com/qoKRYnI.png)
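Before continuing, you can optionally sanity-check what the command above wrote. This is just a quick inspection sketch; the paths are the ones used in that command, and the exact output will vary by system:

```bash=
# Show the distribution string detected above, e.g. "ubuntu20.04"
echo $distribution

# Show the repository entries that were written
cat /etc/apt/sources.list.d/nvidia-docker.list
```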
To get access to experimental features such as CUDA on WSL or the new MIG capability on A100, you may want to add the experimental branch to the repository listing:

```bash=
curl -s -L https://nvidia.github.io/nvidia-container-runtime/experimental/$distribution/nvidia-container-runtime.list | sudo tee /etc/apt/sources.list.d/nvidia-container-runtime.list
```

![](https://i.imgur.com/o6Umitz.png)

### Install the nvidia-docker2 Package

Install the `nvidia-docker2` package (and its dependencies) after updating the package listing:

```bash=
sudo apt-get update
sudo apt-get install -y nvidia-docker2
```

![](https://i.imgur.com/Vh4HXU4.png)

![](https://i.imgur.com/fBTaCFb.png)

### Restart the Docker Daemon

Restart the Docker daemon to complete the installation after setting the default runtime:

```bash=
sudo systemctl restart docker
```

### Test the Setup

At this point, a working setup can be tested by running a base CUDA container:

```bash=
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
```

This should result in the console output shown below:

![](https://i.imgur.com/C2ahhhC.png)

### Pull the Triton Image

Pull the image using the following command. I am using tritonserver version 21.04:

```bash=
docker pull nvcr.io/nvidia/tritonserver:21.04-py3
```

![](https://i.imgur.com/j5nS2pq.png)

### Fetch the Example Models

Download the zip of the server repository from GitHub (the remaining steps can be done without the root user): https://github.com/triton-inference-server/server

Unzip the folder:

```bash=
sudo unzip server-master.zip
```

![](https://i.imgur.com/CTSh8JH.png)

Change into the examples directory and fetch the example models:

```bash=
$ cd Downloads/server-master/docs/examples
$ sudo ./fetch_models.sh
```

![](https://i.imgur.com/toA2LgR.png)

### Run on a System with GPUs

Use the following command to run Triton with the example model repository you just created. The NVIDIA Container Toolkit must be installed for Docker to recognize the GPU(s). The `--gpus=2` flag indicates that 2 system GPUs should be made available to Triton for inferencing (use `--gpus=1` if you have a single GPU):

```bash=
docker run --gpus=2 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/home/aewin/Downloads/server-master/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:21.04-py3 tritonserver --model-repository=/models
```

![](https://i.imgur.com/MTwy8OW.png)

![](https://i.imgur.com/OEQDcig.png)

![](https://i.imgur.com/8zlANBv.png)

![](https://i.imgur.com/6gzl6AE.png)

![](https://i.imgur.com/hNhcYnl.png)

### Verify Triton Is Running Correctly

Use Triton's ready endpoint to verify that the server and the models are ready for inference. From the host system, use curl to access the HTTP endpoint that reports server status. The request returns status 200 if Triton is ready and non-200 if it is not:

```bash=
curl -v localhost:8000/v2/health/ready
...
< HTTP/1.1 200 OK
< Content-Length: 0
< Content-Type: text/plain
```

![](https://i.imgur.com/ZeDfszG.png)
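Beyond the server-level health check, Triton's HTTP endpoint also reports server metadata and per-model readiness, and Prometheus metrics are served on the third port mapped above (8002). A few optional checks, assuming the example model repository from earlier is loaded (the `densenet_onnx` model name comes from that repository):

```bash=
# Server metadata (name, version, supported extensions)
curl localhost:8000/v2

# Readiness of a single model from the example repository
curl -v localhost:8000/v2/models/densenet_onnx/ready

# Prometheus-format metrics (GPU utilization, request counts, ...)
curl localhost:8002/metrics
```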
## Getting the Client Examples

Use `docker pull` to get the client libraries and examples image from NGC, then run it:

```bash=
docker pull nvcr.io/nvidia/tritonserver:21.04-py3-sdk
docker run -it --rm --net=host nvcr.io/nvidia/tritonserver:21.04-py3-sdk
```

![](https://i.imgur.com/bWBtJj6.png)

### Running the Image Classification Example

From within the `nvcr.io/nvidia/tritonserver:<xx.yy>-py3-sdk` image, run the example `image_client` application to perform image classification using the example densenet_onnx model. To send a request for the densenet_onnx model, use an image from the /workspace/images directory. In this case we ask for the top 3 classifications:

```bash=
$ /workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg
Request 0, batch size 1
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
```

![](https://i.imgur.com/e6285m6.png)

### Final `docker ps` and `docker images -a` Results

(An optional cleanup sketch follows the references at the end of this note.)

![](https://i.imgur.com/ECuYDU1.png)

### References

1. https://github.com/triton-inference-server/server/blob/master/docs/quickstart.md#install-triton-docker-image
2. https://github.com/NVIDIA/nvidia-docker
3. https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker
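### Appendix: Cleaning Up

A minimal teardown sketch if you want to reclaim disk space afterwards; the image tags are the ones pulled in this note. Stop any running Triton container first (Ctrl+C in its terminal; it was started in the foreground with `--rm`, so it removes itself on exit):

```bash=
# List any containers that are still running
docker ps

# Remove the Triton server and client images pulled earlier
docker rmi nvcr.io/nvidia/tritonserver:21.04-py3
docker rmi nvcr.io/nvidia/tritonserver:21.04-py3-sdk

# Confirm the images are gone
docker images -a
```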