# Run PyTorch With GPU Power
When training a model or running inference on Ubuntu, using a GPU makes a dramatic difference in throughput. This guide walks through setting up a fresh environment for running PyTorch models on a GPU.
## Prerequisites
Before installing anything, make sure any older NVIDIA driver or CUDA installation is completely removed. You can follow the commands below to clean up your system:
1. Purge the CUDA Toolkit
Run the following command to remove all CUDA-related packages:
```bash
sudo apt-get --purge remove "*cublas*" "*cufft*" "*curand*" "*cusolver*" "*cusparse*" "*npp*" "*nvjpeg*" "cuda*" "nsight*"
```
2. Remove NVIDIA Drivers
Use this command to uninstall all NVIDIA drivers:
```bash
sudo apt-get purge nvidia-*
```
3. Clean Residual Files
Delete any remaining CUDA-related files:
```bash
sudo rm -rf /usr/local/cuda*
sudo rm -rf /etc/cuda
```
## CUDA Runtime & CUDA Compiler
CUDA is the mature way to tap into NVIDIA GPU power, so let's briefly clarify what the CUDA Runtime and the CUDA Compiler are.
### CUDA Compiler (nvcc)
- **Purpose**: Compiles CUDA programs.
- **Functionality**:
- Translates CUDA code into executable code.
- Handles both GPU (device) and CPU (host) code.
- **Output**: Creates an executable or a library.
### CUDA Runtime
- **Purpose**: Provides a high-level API for managing GPU resources.
- **Functionality**:
- Manages memory on the GPU.
- Launches GPU kernels.
- Handles streams, events, and error checking.
- **Execution**: Runs during the execution of a CUDA program.
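In practice you rarely call the runtime API directly from PyTorch; `torch.cuda` wraps it for you. As a rough illustration (assuming the working PyTorch + CUDA setup that the rest of this guide builds), the runtime responsibilities listed above map onto familiar PyTorch calls:
```python
import torch

# Memory management: the runtime allocates device memory behind this call
x = torch.randn(1024, 1024, device="cuda")

# Kernel launch: the matrix multiply dispatches CUDA kernels on the GPU
y = x @ x

# Streams and events are thin wrappers over the corresponding runtime objects
stream = torch.cuda.Stream()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
with torch.cuda.stream(stream):
    start.record()
    z = torch.relu(y)
    end.record()

# Synchronization / error checking: wait for all queued GPU work to finish
torch.cuda.synchronize()
print(f"ReLU kernel time: {start.elapsed_time(end):.3f} ms")
```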
## Install NVIDIA Driver
Before installing CUDA, you must first install the NVIDIA driver for your graphics card. The driver is what allows the operating system and user-space software to communicate with the GPU.
### Installation Steps
You can simply follow these steps:
```bash
sudo apt install nvidia-common
sudo add-apt-repository ppa:graphics-drivers
sudo apt update
ubuntu-drivers devices # List available driver versions for your hardware
sudo apt install nvidia-driver-5XX # Install the latest version available for your hardware
sudo reboot
```
### Validate Installation
You can run the following commands to show GPU information:
```bash
nvidia-smi # for general information
nvidia-smi topo -m # for connection information
```
## Install PyTorch in a Virtual Environment
On Ubuntu, Python 3.X is already installed by default. To run PyTorch, you can build a virtual environment and install the PyTorch package.
1. Install Python virtual environment:
```bash
sudo apt update
sudo apt install python3-venv
```
2. Initialize a virtual environment:
```bash
python3 -m venv venv
source venv/bin/activate
```
3. Follow the instructions on the PyTorch [installation page](https://pytorch.org/get-started/locally/).
The PyTorch wheels installed via pip already bundle the CUDA runtime and cuDNN, so you only need the NVIDIA driver to get started.
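A quick sanity check inside the activated virtual environment confirms that the bundled CUDA runtime can see your GPU; all the calls below are standard `torch` APIs:
```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA runtime version bundled with this build
print(torch.cuda.is_available())  # True if the driver and a GPU are visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the first GPU
```
If `is_available()` returns `False`, the usual suspects are a missing or mismatched NVIDIA driver, or a CPU-only PyTorch build.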
## Install CUDA Compiler (Optional)
In the previous section, we mentioned that you don't need to install the CUDA runtime separately because PyTorch handles CUDA dependencies. However, some libraries like `deepspeed` or certain applications that require compiling CUDA or PyTorch extensions will need the system CUDA compiler.
To avoid compatibility issues, I strongly recommend installing a version officially supported by [PyTorch](https://pytorch.org/get-started/locally/).
You can follow these instructions to install the CUDA compiler (nvcc) on your PC:
1. Visit the NVIDIA website: [CUDA Downloads](https://developer.nvidia.com/cuda-downloads) (or Google `Cuda Toolkit 12.X`) and follow the installer instructions for your platform.
2. Set up the environment:
```bash
sudo nano ~/.bashrc
```
Add the following lines at the end of the file:
```plaintext
export PATH=/usr/local/cuda-12.X/bin${PATH:+:${PATH}} # if you installed CUDA 12.X
```
Finally, complete the setup:
```bash
source ~/.bashrc
sudo ldconfig
nvcc -V
```
## Install cuDNN (Not Recommended)
You can also install cuDNN for your local CUDA compiler, but it is not recommended since PyTorch already handles the correct version for utilizing cuDNN. Installing it manually may cause dynamic library conflicts.
1. Follow the instructions on the NVIDIA website: [cuDNN Downloads](https://developer.nvidia.com/rdp/cudnn-download)
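To confirm that the cuDNN bundled with PyTorch is sufficient (and that no system-wide copy is needed), you can query it directly from Python; this small check assumes the GPU setup from the previous sections:
```python
import torch

# cuDNN shipped inside the PyTorch wheel, no system-wide install required
print(torch.backends.cudnn.is_available())
print(torch.backends.cudnn.version())  # integer version, e.g. 8902 for cuDNN 8.9.2

# Run a tiny convolution to make sure the bundled cuDNN library actually loads
conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
out = conv(torch.randn(1, 3, 32, 32, device="cuda"))
print(out.shape)
```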
## Special Case for the RTX 5090 (Updated 2025.06.05)
1. On Ubuntu 22.04 with the latest driver (570), the RTX 5090 hits a "No device found" issue.
2. Following https://forums.developer.nvidia.com/t/nvidia-rtx-5090-not-detected-by-nvidia-smi-on-ubuntu-server-24-04/327409/20 and upgrading Ubuntu to 24.10 (with the appropriate BIOS settings) solved the problem.
3. The 5090 needs a PyTorch build with CUDA >= 12.8 support.
## References
1. [Could Not Load Library libcudnn_cnn_train.so.8](https://discuss.pytorch.org/t/could-not-load-library-libcudnn-cnn-train-so-8-but-im-sure-that-i-have-set-the-right-ld-library-path/190277/4)
2. [Installing NVIDIA Driver 535 CUDA 12.2 cuDNN 12.x on Ubuntu 22.04](https://jackfrisht.medium.com/install-nvidia-driver-via-ppa-in-ubuntu-18-04-fc9a8c4658b9)
3. [Installing NVIDIA Driver on Ubuntu 20.04](https://medium.com/@scofield44165/ubuntu-20-04%E4%B8%AD%E5%AE%89%E8%A3%9Dnvidia-driver-cuda-11-4-2%E7%89%88-cudnn-install-nvidia-driver-460-cuda-11-4-2-cudnn-6569ab816cc5)