# Run PyTorch With GPU Power
When training a model or running inference on Ubuntu, using a GPU makes a dramatic difference in throughput. This guide walks through setting up a fresh environment for running PyTorch models on a GPU.
## Prerequisites
Before installing anything, make sure any older NVIDIA driver or CUDA installation is completely removed. You can follow the commands below to clean up your system:
1. Purge the CUDA Toolkit
Run the following command to remove all CUDA-related packages:
```bash
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
```
2. Remove NVIDIA Drivers
Use this command to uninstall all NVIDIA drivers:
```bash
sudo apt-get purge nvidia-*
```
3. Clean Residual Files
Delete any remaining CUDA-related files:
```bash
sudo rm -rf /usr/local/cuda*
sudo rm -rf /etc/cuda
```
## CUDA Runtime & CUDA Compiler
CUDA is the mature way to tap into NVIDIA GPU power, so let's briefly clarify what the CUDA Runtime and the CUDA Compiler are.
### CUDA Compiler (nvcc)
- **Purpose**: Compiles CUDA programs.
- **Functionality**:
- Translates CUDA code into executable code.
- Handles both GPU (device) and CPU (host) code.
- **Output**: Creates an executable or a library.
### CUDA Runtime
- **Purpose**: Provides a high-level API for managing GPU resources.
- **Functionality**:
- Manages memory on the GPU.
- Launches GPU kernels.
- Handles streams, events, and error checking.
- **Execution**: Runs during the execution of a CUDA program.
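In practice you rarely call the runtime API directly from PyTorch; `torch.cuda` wraps it for you. As a rough illustration (assuming the working PyTorch + CUDA setup that the rest of this guide builds), the runtime responsibilities listed above map onto familiar PyTorch calls:
```python
import torch

# Memory management: the runtime allocates device memory behind this call
x = torch.randn(1024, 1024, device="cuda")

# Kernel launch: the matrix multiply dispatches CUDA kernels on the GPU
y = x @ x

# Streams and events are thin wrappers over the corresponding runtime objects
stream = torch.cuda.Stream()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.cuda.stream(stream):
    start.record()
    z = torch.relu(y)
    end.record()

# Synchronization / error checking: wait for all queued GPU work to finish
torch.cuda.synchronize()
print(f"ReLU kernel time: {start.elapsed_time(end):.3f} ms")
```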
## Install NVIDIA Driver
Before installing CUDA, you must first install the NVIDIA driver for your graphics card. The driver is what allows the operating system and user-space software to communicate with the GPU.
### Installation Steps
You can simply follow these steps:
```bash
sudo apt install nvidia-common
sudo add-apt-repository ppa:graphics-drivers
sudo apt update
ubuntu-drivers devices # List available driver versions for your hardware
sudo apt install nvidia-driver-5XX # Install the latest version available for your hardware
sudo reboot
```
### Validate Installation
You can run the following commands to show GPU information:
```bash
nvidia-smi # for general information
nvidia-smi topo -m # for connection information
```
## Install PyTorch in a Virtual Environment
On Ubuntu, Python 3.X is already installed by default. To run PyTorch, you can build a virtual environment and install the PyTorch package.
1. Install Python virtual environment:
```bash
sudo apt update
sudo apt install python3-venv
```
2. Initialize a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate
```
3. Follow the instructions on the PyTorch [installation page](https://pytorch.org/get-started/locally/).
The PyTorch wheels installed via pip already bundle the CUDA runtime and cuDNN, so you only need the NVIDIA driver to get started.
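A quick sanity check inside the activated virtual environment confirms that the bundled CUDA runtime can see your GPU; all the calls below are standard `torch` APIs:
```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA runtime version bundled with this build
print(torch.cuda.is_available())  # True if the driver and a GPU are visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU
```
If `is_available()` returns `False`, the usual suspects are a missing or mismatched NVIDIA driver, or a CPU-only PyTorch build.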
## Install CUDA Compiler (Optional)
In the previous section, we mentioned that you don't need to install the CUDA runtime separately because PyTorch handles CUDA dependencies. However, some libraries like `deepspeed` or certain applications that require compiling CUDA or PyTorch extensions will need the system CUDA compiler.
To avoid compatibility issues, I strongly recommend installing a version officially supported by [PyTorch](https://pytorch.org/get-started/locally/).
You can follow these instructions to install the CUDA compiler (nvcc) on your PC:
1. Visit the NVIDIA website: [CUDA Downloads](https://developer.nvidia.com/cuda-downloads) (or Google `Cuda Toolkit 12.X`) and follow the installer instructions for your platform.
2. Set up the environment:
```bash
sudo nano ~/.bashrc
```
Add the following lines at the end of the file:
```plaintext
export PATH=/usr/local/cuda-12.X/bin${PATH:+:${PATH}} # if you installed CUDA 12.X
```
Finally, complete the setup:
```bash
source ~/.bashrc
sudo ldconfig
nvcc -V
```
## Install cuDNN (Not Recommended)
You can also install cuDNN for your local CUDA compiler, but it is not recommended since PyTorch already handles the correct version for utilizing cuDNN. Installing it manually may cause dynamic library conflicts.
1. Follow the instructions on the NVIDIA website: [cuDNN Downloads](https://developer.nvidia.com/rdp/cudnn-download)
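To confirm that the cuDNN bundled with PyTorch is sufficient (and that no system-wide copy is needed), you can query it directly from Python; this small check assumes the GPU setup from the previous sections:
```python
import torch

# cuDNN shipped inside the PyTorch wheel, no system-wide install required
print(torch.backends.cudnn.is_available())
print(torch.backends.cudnn.version())  # integer version, e.g. 8902 for cuDNN 8.9.2

# Run a tiny convolution to make sure the bundled cuDNN library actually loads
conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
out = conv(torch.randn(1, 3, 32, 32, device="cuda"))
print(out.shape)
```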
## Special Case for the RTX 5090 (Updated 2025.06.05)
1. On Ubuntu 22.04 with the latest driver (570), the RTX 5090 hits a "No device found" issue.
2. Following https://forums.developer.nvidia.com/t/nvidia-rtx-5090-not-detected-by-nvidia-smi-on-ubuntu-server-24-04/327409/20 and upgrading Ubuntu to 24.10 (with the appropriate BIOS settings) solved the problem.
3. The 5090 needs a PyTorch build with CUDA >= 12.8 support.
## References
1. [Could Not Load Library libcudnn_cnn_train.so.8](https://discuss.pytorch.org/t/could-not-load-library-libcudnn-cnn-train-so-8-but-im-sure-that-i-have-set-the-right-ld-library-path/190277/4)
2. [Installing NVIDIA Driver 535 CUDA 12.2 cuDNN 12.x on Ubuntu 22.04](https://jackfrisht.medium.com/install-nvidia-driver-via-ppa-in-ubuntu-18-04-fc9a8c4658b9)
3. [Installing NVIDIA Driver on Ubuntu 20.04](https://medium.com/@scofield44165/ubuntu-20-04%E4%B8%AD%E5%AE%89%E8%A3%9Dnvidia-driver-cuda-11-4-2%E7%89%88-cudnn-install-nvidia-driver-460-cuda-11-4-2-cudnn-6569ab816cc5)