---
title: Nvidia Setup
tags: Nvidia
description: All the setup for running with GPU
---

# Nvidia Driver, CUDA and cuDNN

## Preparation

### nvidia driver

https://www.nvidia.com.tw/Download/index.aspx?lang=tw

### cuda

> Just find the cuda version you need, e.g., cuda 10.

### cudnn

https://developer.nvidia.com/rdp/form/cudnn-download-survey

## Validation

### Is there an Nvidia GPU?

```bash
lspci | grep -i nvidia
```

### Linux version?

```bash
uname -m && cat /etc/*release
```

### gcc version?

```bash
gcc --version
```

### Nvidia driver version?

```bash
cat /proc/driver/nvidia/version
```

### cuda version?

```bash
nvcc -V
```

**Note:** All local Nvidia device files are in `/dev/nvidia*`

## Dependency

* driver & cuda
    * https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
* tensorflow & python & cudnn & cuda
    * https://www.tensorflow.org/install/source

## NVIDIA Driver

### 1. Remove anything related to Nvidia

```bash
sudo apt-get remove --purge 'nvidia*'
```

```bash
sudo apt-get autoremove
```

`autoremove` removes dependencies that were installed along with other packages and are no longer used by anything else on the system.

### 2. Disable Nouveau (your default driver)

* Create a file

```bash
sudo nano /etc/modprobe.d/blacklist-nouveau.conf
```

Or update the original file:

```bash
sudo gedit /etc/modprobe.d/blacklist.conf
```

* Add this to the above file

```sh
blacklist nouveau
options nouveau modeset=0
```

* Then regenerate the initramfs so the change takes effect at boot, and reboot

```bash
sudo update-initramfs -u
```

```bash
sudo reboot
```

### 3. Ctrl + Alt + F1 on the log-in screen

* Check whether the nouveau driver is still loaded. If nothing is returned, go on to the next step; otherwise go back to **2. Disable Nouveau** again

```bash
lsmod | grep nouveau
```

* Stop the X server

```bash
sudo service lightdm stop
```

* chmod

```bash
sudo chmod 755 your_driver.run
```

* Install

```bash
sudo bash your_driver.run --no-opengl-files
```

**NOTE** Without `--no-opengl-files`, a login loop may occur.

- Accept the license
- Select "Continue Installation"
- Select "NO" to skip installing the 32-bit files
- Select "NO" when asked to rebuild any X server configuration with Nvidia

* Example

![](https://i.imgur.com/U6SFPZb.png)

* Start the X server again

```bash
sudo service lightdm start
```

```bash
sudo reboot
```

### Unable to find a suitable destination to install 32-bit compatibility libraries

```bash
sudo dpkg --add-architecture i386
sudo apt-get update
sudo apt-get install libc6:i386
```

## Cuda Install

```bash
# Make the runfile executable (either form works)
sudo chmod +x cuda_9.2.148_396.37_linux.run
# or: sudo chmod 755 cuda_9.2.148_396.37_linux.run

# Run the installer (use the file name of the version you downloaded)
sudo ./cuda_9.2.148_396.37_linux.run
```

### Possible issues

The installer may stop with `unsupported compiler x.x.x` and suggest using `--override` to override this check.

**solution:**

```bash
sudo ./cuda_9.2.148_396.37_linux.run --override
```

Message:

```
Do you accept the previously read EULA?
accept/decline/quit: accept
```

### Cuda Path: /usr/local/

### Cuda Environment Variables

```bash
nano ~/.bashrc
```

```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDA_HOME=/usr/local/cuda
```

```bash
source ~/.bashrc
```

## Multiple Cudas

### See which cuda version the soft link points to

```bash
stat /usr/local/cuda
```

### See the current cuda version

```bash
nvcc --version
```

### Remove the old link and point it at another cuda version

```bash
sudo rm /usr/local/cuda
```

```bash
sudo ln -s /usr/local/cuda-10.0 /usr/local/cuda
```

### Check the result

```bash
nvcc --version
```

The same switch, done from inside `/usr/local`:

```bash
cd /usr/local
```

Delete the previous cuda link

```bash
sudo rm cuda
```

Point the symlink /usr/local/cuda to the default version

```bash
sudo ln -s cuda-10.0 cuda
```

### Possible issues

The cuda version does not change after updating the soft link

```bash
cat /usr/local/cuda/version.txt
nvcc --version
```

The two commands report different versions.

**solution:**

```bash
sudo vim /etc/profile
```

```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
export CUDA_HOME=/usr/local/cuda
```

```bash
source /etc/profile
```

## (Optional) Cudnn Install

https://developer.nvidia.com/rdp/cudnn-download

e.g., cuDNN v7.3.1 Library for Linux

```bash
# Use the file name of the archive you actually downloaded
tar -xzvf cudnn-10.0-linux-x64-v6.0.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h /usr/local/cuda-10.0/lib64/libcudnn*
```

### Download Neural Network Frameworks

```bash
pip install tensorflow-gpu torch keras
```

# Supplement

## Python

- pyenv https://realpython.com/intro-to-pyenv/
- download python 3.7
    - `sudo apt-get install libffi-dev`
    - `sudo apt-get install python3.7`
- Change the Python version priority `sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1` or `sudo update-alternatives --set python3 /usr/bin/python3.7`
- List the versions and choose one `sudo update-alternatives --config python3`
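
Going back to the Multiple Cudas section: switching versions comes down to re-pointing the `/usr/local/cuda` symlink at another `cuda-X.Y` directory. Below is a minimal sketch of that step as a reusable function. The name `switch_cuda` and its optional base-directory argument are assumptions made for illustration, not part of the CUDA toolkit, and the function expects the toolkits to be installed side by side as `cuda-<version>` directories.

```bash
#!/usr/bin/env bash
# Sketch only: `switch_cuda` and its arguments are hypothetical names,
# not something shipped with CUDA.

# switch_cuda VERSION [BASE_DIR]
#   Re-points BASE_DIR/cuda (default: /usr/local/cuda) at BASE_DIR/cuda-VERSION.
switch_cuda() {
    local version="$1"
    local base="${2:-/usr/local}"
    local target="$base/cuda-$version"

    # Refuse to create a dangling link if that toolkit is not installed.
    if [ ! -d "$target" ]; then
        echo "switch_cuda: $target not found" >&2
        return 1
    fi

    # -s: symbolic link, -f: replace an existing link,
    # -n: treat an existing cuda symlink as a file instead of following it.
    ln -sfn "$target" "$base/cuda"
}
```

Writing to `/usr/local` needs root, so this would be run from a root shell. Assuming `/usr/local/cuda-10.0` exists, `switch_cuda 10.0` followed by `readlink /usr/local/cuda` should print `/usr/local/cuda-10.0`. If `nvcc --version` still reports the old version afterwards, check `PATH` as described under Possible issues.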