Installed CentOS 9 Stream.

dnf update

Reference: https://medium.com/@blackhorseya/step-by-step-guide-to-installing-nvidia-drivers-on-rhel-9-1107e0cd641d

*Skipped the RHEL 9 CodeReady Builder repo step.*

dnf makecache

*Install the epel-release package*

dnf install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm

*Update the DNF package repository cache again to ensure all changes take effect*

dnf update

*Install dependencies and build tools*

dnf install -y kernel-devel-$(uname -r) kernel-headers-$(uname -r) gcc make dkms acpid libglvnd-glx libglvnd-opengl libglvnd-devel pkgconfig

*Add the NVIDIA CUDA repository*

dnf config-manager --add-repo http://developer.download.nvidia.com/compute/cuda/repos/rhel9/$(uname -i)/cuda-rhel9.repo

*Install the NVIDIA GPU drivers*

sudo dnf module install -y nvidia-driver:open-dkms

*Reboot the system*

lsmod | grep nvidia   # Should return results
lsmod | grep nouveau  # Should return nothing

![image](https://hackmd.io/_uploads/HybtGiknR.png)

Looks good!

*If you need to remove the drivers*

dnf remove -y nvidia-driver
dnf module reset -y nvidia-driver

Reference: https://docs.docker.com/engine/install/centos/#install-using-the-repository

*Install Docker*

yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

![image](https://hackmd.io/_uploads/H1RbXj1h0.png)

systemctl start docker

*To test*

docker run hello-world

![image](https://hackmd.io/_uploads/HyxqQsJnC.png)

Reference: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#installing-the-nvidia-container-toolkit

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

sudo yum install -y nvidia-container-toolkit

![image](https://hackmd.io/_uploads/SkJENsJ3C.png)

*Configuring Docker*

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

![image](https://hackmd.io/_uploads/SkMqVsknR.png)

*Install Ollama*

curl -fsSL https://ollama.com/install.sh | sh

This script searches for a GPU and tries to install drivers, but since they are already installed it should detect that.

![image](https://hackmd.io/_uploads/B1oXBjJ3A.png)

It sees the GPU is installed.

*Running Ollama on the CLI*

ollama run llama3.1

This calls the llama3.1 LLM and pulls it down if it is not already downloaded. Use "watch -n 0.5 nvidia-smi" to see the status of the GPUs.

![image](https://hackmd.io/_uploads/r1EhHiyhA.png)

The bottom line shows the LLM loaded into memory on GPU 0.

Asked the LLM "why is the sky blue?"

![image](https://hackmd.io/_uploads/H1r8Uok3R.png)

GPU 0 is under load; everything seems to be working.

*Allow multiple LLMs to be loaded (5) and multiple users to send requests at the same time (10)*

dnf install vim
vim /etc/systemd/system/ollama.service

Add these lines under the [Service] section:

Environment="OLLAMA_NUM_PARALLEL=10"
Environment="OLLAMA_MAX_LOADED_MODELS=5"

systemctl daemon-reload
systemctl restart ollama.service

Opened another window and ran the gemma2 LLM.

![image](https://hackmd.io/_uploads/SkZmuiy20.png)

Now we see both models running, one on each GPU. Let's ask both LLMs the same question.

![image](https://hackmd.io/_uploads/B1_Ouo1hC.png)

Note the GPU utilization and wattage.

dnf install cockpit

Mounted the RAID 5 set (2.7 TB) at /mnt/raid5/. The Open WebUI container data will go in "/mnt/raid5/openwebui-data".
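To make sure the RAID volume comes back after a reboot and that the data directory exists before the container starts, here is a minimal sketch; the device name (/dev/md0) and filesystem (xfs) are assumptions, so check with lsblk first:

```
# Verify the array's device name and filesystem before touching /etc/fstab
lsblk -f

# Example /etc/fstab entry (device and filesystem are assumptions for this sketch)
# /dev/md0  /mnt/raid5  xfs  defaults,nofail  0 0

# Pre-create the directory the Open WebUI container will use for its data
mkdir -p /mnt/raid5/openwebui-data
```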
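Before moving on to the Open WebUI container, it is also worth a quick sanity check that Docker containers can see the GPUs through the NVIDIA runtime. A minimal test, assuming a CUDA base image is available (the exact tag below is an assumption; any nvidia/cuda base tag should work):

```
# Should print the same GPU table that nvidia-smi shows on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubi9 nvidia-smi
```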
Now, this is where I ran into issues: getting the Open WebUI container running properly.

Reference: https://docs.openwebui.com/

To run Open WebUI with NVIDIA GPU support:

docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v /mnt/raid5/openwebui-data:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

![image](https://hackmd.io/_uploads/HyDuTsJ2C.png)
![image](https://hackmd.io/_uploads/HkUfAjJnR.png)

The container is up and running.

![image](https://hackmd.io/_uploads/rJ9NRsJhC.png)

Looks like the webpage is up. Created the initial account and logged in.

![image](https://hackmd.io/_uploads/S1RF0ikhR.png)

Tried to upload an LLM.

![image](https://hackmd.io/_uploads/B1x-yh13R.png)

Per this page, the following should fix it: https://github.com/open-webui/open-webui/blob/main/TROUBLESHOOTING.md

Trying this:

docker run -d --network=host --gpus all -e OLLAMA_BASE_URL=http://127.0.0.1:11434 -v /mnt/raid5/openwebui-data:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

docker run -d --network=host -v /mnt/raid5/openwebui-data:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda

At this point the container keeps restarting.

![image](https://hackmd.io/_uploads/BkfyMnyn0.png)

docker logs open-webui

Now it can't find the NVIDIA driver.

![image](https://hackmd.io/_uploads/SywoWh12A.png)

docker stop open-webui
docker rm open-webui

Went back to the original docker run command. The container is up, but here are the logs.

![image](https://hackmd.io/_uploads/r1A0fn1nA.png)

I remembered the WireGuard trick of adding masquerade to the firewall to allow hairpin connections within the system. Not sure if it applies here, but I tried it:

firewall-cmd --add-masquerade --permanent
firewall-cmd --reload

That does not seem to work. I noticed it is trying to connect to 172.17.0.1; is that the container network's gateway? Hmm, I think we had this issue before, but I can't recall what we had to do.

*** Update ***

Found the issue. I needed to run the ollama serve command bound to the host's address:

OLLAMA_HOST=192.168.1.184:11435 ollama serve

Once that is up, you will see the connections to the service.

![image](https://hackmd.io/_uploads/BJvgQ3_3R.png)

I can now see this when hitting the port number of my host.

![image](https://hackmd.io/_uploads/B1dmX2_3A.png)

And now the container can talk to the Ollama service on the host after changing the settings.

![image](https://hackmd.io/_uploads/BymYX3uh0.png)

I edited the /etc/systemd/system/ollama.service file to add that environment variable to the service so it comes up that way every time. The file contents:

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/root/.local/bin:/root/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
Environment="OLLAMA_NUM_PARALLEL=10"
Environment="OLLAMA_MAX_LOADED_MODELS=5"
Environment="OLLAMA_HOST=192.168.1.184:11434"

[Install]
WantedBy=default.target

Saved the file and then reloaded the service:

systemctl daemon-reload
systemctl restart ollama.service

Everything seems to be working!

*** Yay me! :) ***
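For future reference, a quick way to re-check after a reboot that the service picked up the environment and that the container can still reach Ollama; the IP and container name are the ones used above, and whether curl is present inside the Open WebUI image is an assumption:

```
# Confirm the systemd unit loaded the environment variables
systemctl show ollama.service --property=Environment

# Hit the Ollama API on the LAN-bound address from the host
curl http://192.168.1.184:11434/api/tags

# Same check from inside the Open WebUI container (curl may not exist in the image)
docker exec open-webui curl -s http://192.168.1.184:11434/api/tags
```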