# Install CUDA

https://koding.work/how-to-install-cuda-and-cudnn-to-ubuntu-20-04/

## If the wrong version is already installed, follow these steps

1. Reinstall the driver.

```bash
$ sudo apt remove '^nvidia'
$ sudo apt autoremove
$ sudo reboot
$ sudo apt install nvidia-driver-xxx
```

2. Edit ~/.bashrc.

```bash
$ vim ~/.bashrc
# Find the following line and delete it (dd deletes the whole line in vim):
#   export PATH=/usr/local/cuda-11.4/bin${PATH:+:${PATH}}
# Find the following line and delete it the same way:
#   export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# Save and quit with :wq
$ sudo reboot
```

## Driver Error (reinstall driver)

https://clay-atlas.com/us/blog/2022/07/29/solved-nvidia-smi-has-failed-because-it-couldnt-communicate-with-the-nvidia-driver-make-sure-that-the-latest-nvidia-driver-is-installed-and-running/

```
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```

```bash
sudo apt purge 'nvidia-*'
apt search nvidia-driver
sudo apt install nvidia-driver-<NVIDIA DRIVER VERSION>
```

## CUDA 11.7 Installation

1. Upgrade the system, and reboot if required.

```bash
$ sudo apt update
$ sudo apt upgrade
$ sudo reboot
```

2. Install the CUDA Toolkit from the official site.

```bash
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
$ wget https://developer.download.nvidia.com/compute/cuda/11.7.0/local_installers/cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
$ sudo dpkg -i cuda-repo-ubuntu2004-11-7-local_11.7.0-515.43.04-1_amd64.deb
$ sudo cp /var/cuda-repo-ubuntu2004-11-7-local/cuda-*-keyring.gpg /usr/share/keyrings/
$ sudo apt-get update
$ sudo apt-get -y install cuda
```

3. Add CUDA to the PATH.

```bash
$ echo 'export PATH=/usr/local/cuda-11.7/bin${PATH:+:${PATH}}' >> ~/.bashrc
$ echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.7/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
```

4. Reboot and verify the installation.
```bash
$ sudo reboot
$ nvcc --version
$ nvidia-smi
```

5. Install PyTorch.

https://pytorch.org/

6. CPU/GPU memory usage

```python
# The "!" lines are notebook shell commands; in a terminal, run them without the "!".
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil
!pip install psutil
!pip install humanize

import os
import threading
import time

import GPUtil as GPU
import humanize
import psutil


def worker():
    """Print free system RAM, this process's size, and GPU 0 memory every 6 seconds."""
    process = psutil.Process(os.getpid())
    while True:
        print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
              " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
        # print("CPU Memory usage(MB): ", process.memory_info().rss / 1024 ** 2, " MB")
        gpu = GPU.getGPUs()[0]  # first GPU only
        print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
            gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))
        time.sleep(6)


# Run the monitor in a background thread so the main code keeps running.
t = threading.Thread(target=worker, name='Monitor')
t.start()
```
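
The manual check in step 4 (`nvcc --version` and `nvidia-smi`) can also be scripted. The following is a minimal sketch, not part of the original guide: the helper names `parse_nvcc_release` and `check_cuda_install` are hypothetical, and it assumes `nvcc --version` prints a line containing `release X.Y`. It only confirms that both tools are on the PATH and that the toolkit release matches the expected version; it does not test the driver itself.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Return (major, minor) found in `nvcc --version` text, or None if absent."""
    # Assumption: the output contains a fragment such as "release 11.7".
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_cuda_install(expected=(11, 7)):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    # Both tools must be reachable through PATH (see step 3).
    for tool in ("nvcc", "nvidia-smi"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    nvcc = shutil.which("nvcc")
    if nvcc is not None:
        result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
        release = parse_nvcc_release(result.stdout)
        if release != expected:
            problems.append(f"nvcc reports release {release}, expected {expected}")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_install():
        print("Problem:", problem)
```

If nothing is printed, both tools were found and `nvcc` reports release 11.7. Pass a different `expected` tuple to check for another toolkit version.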