# How to Enable GPU Support for Tensorflow in Windows and in Ubuntu 18.04

###### tags: `gpu` `tensorflow` `dissertation` `jupyter notebook` `cuda` `windows 10` `ubuntu` `anaconda` `conda` `data science` `machine learning`

## Overview

Tensorflow, by default, uses the CPU, which can be around 20x slower than a GPU for deep learning workloads. In order to exploit the power of the GPU, we need to satisfy the following conditions:

1. The computer has a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus)
2. The GPU has a proper driver that works with Tensorflow
3. Tensorflow and its dependencies are properly installed
4. Your OS knows how to communicate with the GPU driver

## The computer has a CUDA-enabled GPU

Look for a computer whose GPU appears in Nvidia's list of [CUDA-enabled GPUs](https://developer.nvidia.com/cuda-gpus).

## The GPU has a proper driver that works with Tensorflow

The newest driver is not always the best driver. Its version has to be compatible with Tensorflow. The easiest way to make sure this happens is to use `Anaconda`.

1. First, download [Anaconda](https://www.anaconda.com/products/individual) and install it.
2. In Ubuntu 18.04 LTS, the latest `conda` works well in resolving package dependencies for the newest version of Python. Thus, all you have to do is run `conda create --name tf_gpu` to create an environment and then `conda activate tf_gpu` to activate it. Then run `conda install tensorflow-gpu`, which should do it.
3. In Windows, though, the newest `conda` doesn't always resolve dependencies readily, and the newest `tensorflow` is not always supported. Thus, you need to downgrade `conda` itself. After installing conda, run `conda config --set allow_conda_downgrades true` and `conda config --set auto_update_conda false` to allow conda to be downgraded and to prevent it from automatically upgrading itself to the newest buggy version. Then run `conda install conda=4.6` (for example) to use an older version of conda that handles dependency resolution well.
   Afterwards, do the same as you would in Ubuntu, i.e. `conda create --name tf_gpu` and then `conda activate tf_gpu` to activate it. Then `conda install tensorflow-gpu`.
4. As implied, the benefit of using Ubuntu (and other Linux distributions) is that you get to use newer versions of `Tensorflow`.
5. When the conda environment `tf_gpu` is active (i.e. after you ran `conda activate tf_gpu`), run `jupyter notebook`. In the notebook, run the following test to find out whether Tensorflow can use the GPU. **If the result is non-empty, then it works**. If it is empty, then it's probable that your OS doesn't know how to communicate with your GPU driver.

   ```
   import tensorflow as tf

   tf.test.gpu_device_name()  # older way; returns an empty string if no GPU is found
   tf.config.list_physical_devices('GPU')  # this is the newer way
   ```

### Update For Windows

1. See https://www.tensorflow.org/install/gpu
2. If it stops working, download all necessary components.
3. **DO NOT USE EXPRESS SETUP**. USE CUSTOM SETUP AND MANUALLY INSTALL ALL COMPONENTS.

### Ensure that your OS knows how to communicate with GPU driver

In Windows, this is usually not a problem at all. Simply look up your computer model and download the GPU driver from the manufacturer's official website. In Ubuntu, the fact that it comes with some preinstalled GPU driver(s) can make this process a pain. Thus, the following notes are for Ubuntu users.

1. Test whether your OS knows how to communicate with the GPU driver by running `nvidia-smi` in the shell. If this doesn't return an error message, you are fine. If it does, try the following steps.
2. Know your GPU and search for it directly on the Nvidia website. My GPU was an Nvidia GeForce RTX 2060, so I googled "ubuntu nvidia driver geforce RTX 2060" and got a `.run` file [here](https://www.nvidia.com/Download/driverResults.aspx/141847/en-us).
3. I ran the `.run` file by first making it executable using the command `sudo chmod +x <path_to_the .run file>`, and then `sudo ./<path to the .run file>`.
   The error messages returned by the `.run` file were extremely helpful. It told me that there was already an installation of nvidia-450, which could supposedly be uninstalled using `sudo apt-get remove --purge nvidia-450`. That did not work and returned the error message "cannot locate package". A weird solution was to append '-driver' to the name of the package, i.e. `sudo apt-get remove --purge nvidia-driver-450`, which did remove the package. Then, of course, I ran `sudo apt-get install nvidia-driver-450`, which asked me to change settings for secure boot and to reboot. While rebooting, there was an extra blue screen that asked me to change `mok` (I wasn't sure what this was at the time; MOK stands for Machine Owner Key, which is part of Secure Boot). I found a way to enter a secure boot password, and voila, after rebooting, `nvidia-smi` worked (stopped returning error messages)!
4. Back in Jupyter notebook, after activating the conda environment with `conda activate tf_gpu`, I ran the GPU test and the result was no longer empty.

   ```
   import tensorflow as tf

   tf.test.gpu_device_name()
   ```

5. Occasionally, the `.run` file may prompt you to use an application called `software and updates`, which may "better integrate with your system". In this case, follow the instruction: run `software and updates` > `additional drivers` and choose `nvidia ...` (note: this is a different app than `software`).

# Trial and Error Log

## In Ubuntu 18.04 LTS

### Jupyter Notebook can't find mounted drive (solved)

Tried mounting the drive, but that didn't work. Tried navigating to the mounted drive and opening a terminal there. There, I typed `jupyter notebook` and it worked.
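For reference, the two checks from the Ubuntu driver notes above (`nvidia-smi` at the OS level, then the device list inside Tensorflow) can be combined into one script. This is only a minimal sketch: the helper names `driver_responds` and `visible_gpus` are made up for illustration, and it assumes a Tensorflow version recent enough to provide `tf.config.list_physical_devices`.

```python
import shutil
import subprocess


def driver_responds(command="nvidia-smi"):
    """Return True if `command` is on PATH and exits with status 0."""
    if shutil.which(command) is None:
        return False  # the tool is not installed or not on PATH
    result = subprocess.run(
        [command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def visible_gpus():
    """Return the GPUs Tensorflow can see, or None if Tensorflow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    print("driver responds:", driver_responds())
    print("GPUs seen by Tensorflow:", visible_gpus())
```

If the first line prints `False`, the problem is likely at the driver level; if it prints `True` but the GPU list is empty, the Tensorflow installation in the active environment is the more likely culprit.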
### how to install tensorflow-gpu with jupyter notebook (didn't work)

https://medium.com/@birkann/install-tensorflow-2-0-with-gpu-support-and-jupyter-notebook-db0eeb3067a1

### Couldn't install cuda due to dependency problems

I followed the official guide (https://forums.developer.nvidia.com/t/cuda-install-unmet-dependencies-cuda-depends-cuda-10-0-10-0-130-but-it-is-not-going-to-be-installed/66488/4) and chose the Ubuntu 18.04 `deb (local)` option. It didn't work; the error message said that there were dependency problems.

Error message after the first cuda 11.0 update:

>NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

The failure of NVIDIA-SMI may have to do with Nouveau (https://docs.anaconda.com/anaconda/user-guide/tasks/gpu-packages/):

>... Ubuntu and some other Linux distributions ship with a third party open-source driver for NVIDIA GPUs called Nouveau. CUDA requires replacing the Nouveau driver with the official closed source NVIDIA driver...

#### solution 1 (didn't work)

source: https://askubuntu.com/questions/598607/package-dependency-problem-while-installing-cuda-on-ubuntu-14-04

Used aptitude to apparently solve the problem (NOT REALLY).

```
sudo apt-get install aptitude
# Install main package
sudo aptitude install cuda
```

#### solution 2 (didn't work)

It seems that you have previously installed certain nvidia drivers or a deficient cuda toolkit, such as conflicting versions from Ubuntu's repository, so you should remove them first.

```
# quotes keep the shell from expanding the wildcard itself
sudo apt-get purge 'nvidia-*'
sudo apt-get autoremove
# and then install cuda
sudo apt-get install cuda
```

#### solution 3 (didn't work)

Installing the nvidia cuda toolkit:

```
sudo apt install nvidia-cuda-toolkit
```

#### solution 3.1 (finally worked)

Apparently the reason the 3 previous solutions didn't work was that the new installation conflicted with previously installed nvidia drivers.
Thus, I tried googling my own GPU (Nvidia GeForce RTX 2060) and found a [driver](https://www.nvidia.com/content/DriverDownload-March2009/confirmation.php?url=/XFree86/Linux-x86_64/415.27/NVIDIA-Linux-x86_64-415.27.run&lang=us&type=TITAN) on the official website. The file was a `.run` file for Linux, which needs to be made executable with `sudo chmod +x <filepath>`. In my case (after `cd`-ing my way to the folder that stored the `.run` file), it was `sudo chmod +x NVIDIA-Linux-x86_64-415.27.run` and then `sudo ./NVIDIA-Linux-x86_64-415.27.run` to execute it. The installer told me that there was already an installation of `nvidia-450`, which could supposedly be uninstalled using `sudo apt-get remove --purge nvidia-450`. That did not work and returned the error message "cannot locate package". A weird solution was to append '-driver' to the name of the package, i.e. `sudo apt-get remove --purge nvidia-driver-450`, which did remove the package. Then, of course, I ran `sudo apt-get install nvidia-driver-450`, which asked me to change settings for secure boot and to reboot. While rebooting, there was an extra blue screen that asked me to change `mok` (not sure what this is). I found a way to enter a secure boot password, and voila, after rebooting, `nvidia-smi` worked!

## Accidentally found a way to use tensorflow-gpu in Windows

#### solution 4

Apparently Anaconda takes care of it.

- https://www.anaconda.com/blog/tensorflow-in-anaconda
- https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf
- https://towardsdatascience.com/tensorflow-gpu-installation-made-easy-use-conda-instead-of-pip-52e5249374bc?gi=92c3a271721

No, it didn't work. However, it seems to be a problem with the newer versions of `conda`: https://github.com/conda/conda/issues/9367

Thus, one way is to try to downgrade `conda`.
#### solution 4.1 (works)

Downgrading anaconda

>Run this code to allow downgrades: `conda config --set allow_conda_downgrades true`
>
>Find and download the standalone conda executable you want here: https://repo.anaconda.com/pkgs/misc/conda-execs/. I went with 4.7.5 and it's been fine so far.
>
>Run this to install the downloaded executable into your existing directory:
>
>`<executable path> install -p <path to broken installation> conda=<version number>`
>
>`<executable path>` is the path to the downloaded .exe. `<path to broken installation>` is just your main Anaconda folder. `<version number>` is whatever executable number you've decided to go with. That worked for me, anyway!
>
>Once that goes through, run `conda config --set auto_update_conda false`. Otherwise installing packages will just get you right back to the buggy version. Install your packages and wait until a confirmed fix for this is rolled out before upgrading your conda again :D

Apparently an easier way to downgrade conda is `conda config --set allow_conda_downgrades true` and then `conda install conda=4.6`.

This seems to work, but if I specify the Python version to be 3.6, then I get tensorflow 1.15. With Python version 3.7.0, I get tensorflow 2.1.0. Thus, I would use Python 3.7.0. Fortunately, this concurs with the version Francois Chollet was using in Deep Learning with Python (the ebook), but instead of `from keras import layers`, you should use `from tensorflow.keras import layers`. Basically, replace every `import keras` with `from tensorflow import keras`, and every `from keras ...` with `from tensorflow.keras ...`.

#### Full instructions (for installation in Windows, with GPU)

1. Download Anaconda and install it.
2. Open *anaconda prompt* (click on the small search box beside the Windows start button, and type "anaconda").
3. Run `conda config --set allow_conda_downgrades true` (type it in and hit Enter).
4. Run `conda config --set auto_update_conda false`.
5. Run `conda install conda=4.6`.
6. Then create a new environment by running `conda create --name tf_gpu python==3.7.0`.
7. Run `conda activate tf_gpu`.
8. Run `conda install tensorflow-gpu`.
9. When running Jupyter notebook, be sure to first activate the conda environment `tf_gpu` that you just created by running `conda activate tf_gpu`, and then run `jupyter notebook`, in your shell (Anaconda Prompt or PowerShell).

#### Full instructions (for installation in Windows, without GPU)

1. Download Anaconda and install it.
2. Open *anaconda prompt* (click on the small search box beside the Windows start button, and type "anaconda").
3. Run `conda config --set allow_conda_downgrades true` (type it in and hit Enter).
4. Run `conda config --set auto_update_conda false`.
5. Run `conda install conda=4.6`.
6. Then create a new environment by running `conda create --name tf_no_gpu python==3.7.0`.
7. Run `conda activate tf_no_gpu`.
8. Run `conda install tensorflow`.
9. When running Jupyter notebook, be sure to first activate the conda environment `tf_no_gpu` that you just created by running `conda activate tf_no_gpu`, and then run `jupyter notebook`, in your shell (Anaconda Prompt or PowerShell).

### Tips for using Anaconda

1. `conda create --name yourenvironment` to create an environment
2. `conda env list` to see all available conda environments
3. `conda activate thenameofyourenvironment` to activate an environment
4. `conda deactivate` to deactivate an environment
5. `conda config --show` to see all available settings and their values

Creating the environment and installing packages with `conda`, rather than with `pip`, lets you benefit from conda's ability to resolve dependency problems. This is especially helpful if your main goal is machine learning, as packages like `tensorflow` are picky about versions.

### Special Issue: what happens if you use !pip install in Jupyter Notebook

The brief answer is that `!pip install` runs the `pip` executable found on the shell's `PATH`, which doesn't always belong to the Python executable that the active Jupyter kernel is running. As a result, a package can end up installed where the kernel cannot see it.

Helpful tips for debugging:

1. To know which Python executable (and therefore which `pip`) the Jupyter kernel is using, run `import sys` and then `sys.executable` in Jupyter.
2. To know which `pip` (or, more generally, any other command) is being used in the shell, run `type -a pip`.
3. In general, each Python executable has its own separate site-packages. To remedy the headache of not knowing where things are installed, it is therefore helpful to know which Python executable is being used.
4. For each package, you can import it and inspect `<package_name>.__path__` to know which copy is being used.

Source: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/

The general distinction between `conda` and `pip`:

>pip installs python packages in any environment. conda installs any package in conda environments.

>pip installs packages in the Python in its same path. conda installs packages in the current active conda environment.

General suggestions:

>If you installed Python using Anaconda or Miniconda, then use `conda` to install Python packages. If conda tells you the package you want doesn't exist, then use pip (or try `conda-forge`, which has more packages available than the default conda channel). If you installed Python any other way (from source, using pyenv, virtualenv, etc.), then use `pip` to install Python packages

Briefly speaking, `conda` can be thought of as an enhanced `pip`. In addition, when `conda install` brings Python into an environment created with `conda create --name <name_of_the_environment>`, it also installs, by default, a compatible version of `pip` for that Python.

#### Tips

1. Use `type -a [command]` to locate `pip3`, `conda`, `python`, etc.
2. Use `sys.path` and `[packagename].__path__` to know which paths Python searches and where it found the package.
3. `python -m pip install <package_name>` forces the use of the `pip` that matches the Python executable invoked by `python`, and is thus safer than `pip install <package_name>`.
   The `-m` parameter specifies that what follows is the name of a module to run, which in this case is `pip` (i.e. you can't just type `python pip install ...`).
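
The `python -m pip` tip can also be applied from inside a running kernel or script by building the command around `sys.executable`. The following is a minimal sketch; the helper names `pip_install_command` and `install` are hypothetical, and `numpy` is only an example package name.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the Python running this code."""
    # sys.executable is the interpreter behind the current kernel or script,
    # so `-m pip` installs into that interpreter's site-packages.
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(package))


# Show which interpreter is in use and the command that would be run.
print(sys.executable)
print(pip_install_command("numpy"))
```

In a notebook cell, the source linked above suggests the equivalent one-liner `!{sys.executable} -m pip install <package_name>` (after `import sys`), which avoids depending on whichever `pip` happens to be first on the shell's `PATH`.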