tags: Nvidia Cuda Linux

How to Install/Reinstall the Nvidia Driver with the Cuda Package on Ubuntu After It Crashes

If you are new to installing Nvidia stuff on Ubuntu, you may sometimes run into errors like "something version not match", or find that the command nvidia-smi is just not working. This guide walks you through fixing the problem and configuring everything correctly, without that thick as **** Nvidia installation guide.

First, remove all Nvidia-related packages with the commands below (run them as root or with sudo):

apt-get remove --purge '^nvidia-.*'
apt-get remove --purge '^libnvidia-.*'
apt-get remove --purge '^cuda-.*'

You might encounter errors when removing them, or when installing Cuda afterwards. In that case you may need to run:

apt-get --fix-broken install
apt-get autoremove
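Before moving on, it can help to confirm that nothing was left behind. The snippet below is only a rough sketch: filter_gpu_packages is a hypothetical helper name, and it assumes the same three package prefixes used in the purge commands above.

```shell
#!/bin/sh
# Hypothetical helper: keep only NVIDIA/CUDA package names from a list
# of package names read on stdin (one name per line).
filter_gpu_packages() {
    grep -E '^(nvidia-|libnvidia-|cuda-)' || true
}

# Only query dpkg where it exists (Debian/Ubuntu systems).
if command -v dpkg >/dev/null 2>&1; then
    # "ii" marks fully installed packages; column 2 is the package name.
    leftovers=$(dpkg -l | awk '/^ii/ {print $2}' | filter_gpu_packages)
    if [ -n "$leftovers" ]; then
        echo "Still installed:"
        echo "$leftovers"
    else
        echo "No nvidia-, libnvidia- or cuda- packages left."
    fi
fi
```

If anything is still listed, repeating the purge commands for those names should clear it.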

Install Cuda

Install just this one package with a single command. Do not install any other Nvidia driver package, such as nvidia-510 or anything similar. The cuda package pulls in the related driver and libraries itself, which ensures that the driver and library versions match:

apt-get -y install cuda

Post-install actions to prevent the same issue from happening again

Disable Auto-update / Unattended Upgrade

Finally, make sure the OS auto-update is disabled. Otherwise it will update the Nvidia driver on its own, the new driver version will usually no longer match the Cuda version, and everything crashes badly.

To do so, run the below command to edit the file /etc/apt/apt.conf.d/20auto-upgrades:

sudo vi /etc/apt/apt.conf.d/20auto-upgrades

Then comment out all the lines you see there, and add the following so the file reads:

APT::Periodic::Update-Package-Lists "0";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
APT::Periodic::Unattended-Upgrade "0";
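To double-check that APT actually picked up these values, something like the following could be used. It is a sketch, not an official procedure: still_enabled is a hypothetical helper name, and it simply prints every APT::Periodic option whose value is not "0" in the output of apt-config dump.

```shell
#!/bin/sh
# Hypothetical helper: from "apt-config dump" style lines on stdin,
# print the APT::Periodic::* options whose value is not "0".
still_enabled() {
    { grep -E '^APT::Periodic::' || true; } | { grep -v '"0";$' || true; }
}

# Only run the real check where apt-config is available.
if command -v apt-config >/dev/null 2>&1; then
    remaining=$(apt-config dump | still_enabled)
    if [ -z "$remaining" ]; then
        echo "All APT::Periodic settings are 0."
    else
        echo "Not set to 0:"
        echo "$remaining"
    fi
fi
```

Any line it prints is a setting that another file under /etc/apt/apt.conf.d/ may still be overriding.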

Then save and quit. This should disable all automatic updates in the OS, so to stay on top of security fixes you need to upgrade manually from now on with:

apt-get update && apt-get upgrade

Every update may break Cuda again, so after each one you had better check it by running the command below and making sure it runs without errors:

nvidia-smi
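That check can be wrapped in a small function to run after each manual upgrade. This is a minimal sketch: check_gpu is a hypothetical name, and the optional argument (defaulting to nvidia-smi) exists only so the function can be tried on a machine without a GPU.

```shell
#!/bin/sh
# Hypothetical helper: report whether the given command (nvidia-smi by
# default) exists and exits successfully.
# Returns 0 if it ran fine, 1 if it failed, 2 if it is not installed.
check_gpu() {
    cmd=${1:-nvidia-smi}
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "$cmd: not found"
        return 2
    fi
    if "$cmd" >/dev/null 2>&1; then
        echo "$cmd: OK"
        return 0
    fi
    echo "$cmd: failed"
    return 1
}

# Run the check; on failure, point back to the reinstall steps.
check_gpu || echo "The driver may need to be reinstalled as described above."
```

Running it right after apt-get upgrade gives a quick yes/no answer before anyone starts a GPU job.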

Exclude Nvidia stuff from Auto-update

Alternatively, you may want unattended upgrades to stay active so that all security patches are applied, just not for the Nvidia stuff.

To do so, edit the file /etc/apt/apt.conf.d/50unattended-upgrades. There should be a block called "Unattended-Upgrade::Package-Blacklist"; edit it to look like this and save:

Unattended-Upgrade::Package-Blacklist {
        "nvidia-";
        "libnvidia-";
        "cuda";
};

You may also need to consider adding additional lines if you see any other NVIDIA driver names in the output of the command:

apt list --installed | grep nv

Prevent apt upgrade from updating Nvidia stuff

This is not strictly necessary: when you upgrade all the server packages manually, you have to review and test every package being updated anyway, not only the Nvidia stuff. Still, some people may want it for a specific purpose.

To do so, all you need is apt-mark.

For example, to hold cuda back from being updated, enter the command:

apt-mark hold cuda

And release it with:

apt-mark unhold cuda
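If several Nvidia packages need to be held at once, the package list can be generated rather than typed by hand. The sketch below only prints the apt-mark command for review instead of running it; hold_command is a hypothetical helper name, and the prefixes are assumed to be the same ones used in the blacklist above.

```shell
#!/bin/sh
# Hypothetical helper: read package names on stdin (one per line) and
# print a single "apt-mark hold ..." command covering the NVIDIA/CUDA
# ones. Prints nothing if no name matches.
hold_command() {
    pkgs=$({ grep -E '^(nvidia-|libnvidia-|cuda)' || true; } | paste -sd ' ' -)
    if [ -n "$pkgs" ]; then
        echo "apt-mark hold $pkgs"
    fi
}

# Only query dpkg where it exists (Debian/Ubuntu systems).
if command -v dpkg >/dev/null 2>&1; then
    # "ii" marks fully installed packages; column 2 is the package name.
    dpkg -l | awk '/^ii/ {print $2}' | hold_command
fi
```

Review the printed line, then run it as root. The command apt-mark showhold lists what is currently held.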

This should be much easier than editing the APT preferences file for the same purpose. apt-mark also supports fnmatch() and regex patterns for the package name.

For more information, take a look at:
https://www.tecmint.com/disable-lock-blacklist-package-updates-ubuntu-debian-apt/