# Oobabooga on Fedora 38

Today we're going to run [Oobabooga](https://github.com/oobabooga/text-generation-webui/) -- the text generation UI for running large language models (LLMs) on your local machine. We'll make it containerized, so that you don't have to pile all the dependencies onto your host.

## Requirements

* Fedora 38
* An nVidia GPU
* Docker (might be optional*)
* Podman (typically included by default)
* podman-compose (optional)
* The nVidia drivers

If you want podman-compose and don't have it, pick it up with:

```
pip3 install --user podman-compose
```

Also, why both Docker and podman? At the time of writing, I only have the build working in docker, and only the run working in podman. I'll fix this later, but for now, that's what I've got.

You're also going to need to install the nVidia driver and the nVidia container tools.

Before you install CUDA, do a `dnf update` (otherwise I wound up with mismatched deps), then install the [CUDA Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Fedora&target_version=37&target_type=rpm_local) (link is for the F37 RPM, but it worked fine on F38).

And the container tools:

```
curl -s -L https://nvidia.github.io/libnvidia-container/centos8/libnvidia-container.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
sudo dnf install nvidia-container-toolkit nvidia-docker2
```

(nvidia-docker2 might not be required.)

If you need more of a reference for GPUs on Red Hat-flavored Linuxes, [this article from the Red Hat blog is very good](https://www.redhat.com/en/blog/how-use-gpus-containers-bare-metal-rhel-8).

## Let's get started

In my experience, you've gotta use podman for GPU support in Fedora 38 (and probably a few versions earlier, is my guess).

Go ahead and clone this guy...

```
git clone https://github.com/oobabooga/text-generation-webui
```

From their [README](https://github.com/oobabooga/text-generation-webui#alternative-docker), you've gotta set this up to do the container build...

```
ln -s docker/{Dockerfile,docker-compose.yml,.dockerignore} .
cp docker/.env.example .env
# Edit .env and set TORCH_CUDA_ARCH_LIST based on your GPU model
docker compose up --build
```

*Importantly* -- you've got to set the `TORCH_CUDA_ARCH_LIST`. You can check that you've got the right one for your card from [this grid on Wikipedia](https://en.wikipedia.org/wiki/CUDA#GPUs_supported):

```
TORCH_CUDA_ARCH_LIST=8.6+PTX
```

First, try building it with podman -- it worked for me on the second attempt (unsure what went wrong the first time). I built with...

```
podman build -t dougbtv/oobabooga .
```

The first time, when I couldn't build it with podman, I did...

```
docker compose build
```

*WARNING*: These are some BIG images. I think mine came out to ~48 gigs. (And oddly enough, my second attempt with podman was "only" 16 gigs.)

And then I loaded that image into podman...

```
docker save text-generation-webui-text-generation-webui:latest --output ooba.tar
podman load --input ooba.tar
```

*(Yeah, piping it didn't work for me, but you can try it yourself.)*

I need to make a few mods before I can run it... Copy the .env file also to the docker folder (we could probably improve this with a symlink in an earlier step). And while we're here, we'll need to copy the template prompts and presets, too.

```
cp .env docker/.env
cp prompts/* docker/prompts/
cp presets/* docker/presets/
```
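Before we get to models, it's worth a quick sanity check that the GPU is visible from a container at all. Here's a minimal check using the same `--gpus all` flag as the run command later on; the CUDA image tag is just an example (any CUDA base image should do, since the container toolkit injects `nvidia-smi` from the host):

```
# On the host: confirm the driver is loaded and the card shows up
nvidia-smi

# From a container: same check, through the container toolkit
podman run --rm --gpus all docker.io/nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

If the container version errors out, fix that before debugging anything in the webui image itself.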
Now you'll need at least a model, so to download one leveraging the container image...

```
podman-compose run --entrypoint "/bin/bash -c 'source venv/bin/activate; python download-model.py TheBloke/stable-vicuna-13B-GPTQ'" text-generation-webui
```

Naturally, change `TheBloke/stable-vicuna-13B-GPTQ` to whatever model you want. You'll find the model in...

```
ls ./docker/models/
```

I also modify the docker/.env to change this line to...

```
CLI_ARGS=--model TheBloke_stable-vicuna-13B-GPTQ --chat --model_type=Llama --wbits 4 --groupsize 128 --listen
```

However, I run it by hand with:

```
podman run \
  --env-file /home/doug/ai-ml/text-generation-webui/docker/.env \
  -v /home/doug/ai-ml/text-generation-webui/characters:/app/characters \
  -v /home/doug/ai-ml/text-generation-webui/extensions:/app/extensions \
  -v /home/doug/ai-ml/text-generation-webui/loras:/app/loras \
  -v /home/doug/ai-ml/text-generation-webui/models:/app/models \
  -v /home/doug/ai-ml/text-generation-webui/presets:/app/presets \
  -v /home/doug/ai-ml/text-generation-webui/prompts:/app/prompts \
  -v /home/doug/ai-ml/text-generation-webui/softprompts:/app/softprompts \
  -v /home/doug/ai-ml/text-generation-webui/docker/training:/app/training \
  -p 7860:7860 \
  -p 5000:5000 \
  --gpus all \
  -i \
  --tty \
  --shm-size=512m \
  localhost/dougbtv/oobabooga:latest
```

(If you're smarter than me, you can get it running with podman-compose at this point.)

**At this point, you should be done, grats!** It should give you a web address; fire it up and get on generating!

### Mount your models somewhere

I wound up bind mounting some directories...

```
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/models/ docker/models/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/presets/ docker/presets/
sudo mount --bind /home/doug/ai-ml/oobabooga_linux/text-generation-webui/characters/ docker/characters/
```

### Bonus note

I also wound up changing my Dockerfile to install a torch+cu118 build, in case that helps you. I changed out two lines that looked like this diff:

```
- pip3 install torch torchvision torchaudio && \
+ pip3 install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 -f https://download.pytorch.org/whl/cu118/torch_stable.html && \
```

I'm not sure how much it helped, but I kept this change after I made it.

I'm hoping to submit a patch for https://github.com/RedTopper/Text-Generation-Webui-Podman (which isn't building for me right now), integrating what I learned from this -- and then have the whole thing in podman, later.

## Stupid things I ran into before I got there...

But... now I'm getting:

```
RuntimeError: CUDA error: no kernel image is available for execution on the device
```

I tried messing with the TORCH_CUDA_ARCH_LIST in the .env file: changed it to 8.6+PTX, to 8.0, to the whole list, commented it out -- no luck. I created an issue in the meanwhile: https://github.com/oobabooga/text-generation-webui/issues/2002

Ok, I'm trying again, but this time I'm modifying TORCH_CUDA_ARCH_LIST in the Dockerfile and using just "8.0;8.6+PTX" to see if that helps... Nope. Same thing.

I go and stare and compare against stablediffusion/vlad:

```
torch          2.0.1+cu118
torchvision    0.15.2+cu118
```

That does differ; in the image I have...

```
(venv) root@56efd82685f3:/app# pip3 list | grep -i torch
torch          2.0.1
torchaudio     2.0.2
torchvision    0.15.2
```

So, that's different -- stablediffusion has the +cu118 builds and my image doesn't. Let's try matching that.
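While comparing builds like this, a couple of one-liners are handy: they print the torch version, the CUDA version torch was built against, whether CUDA actually initializes, and the compute capability torch sees for your card (which is what TORCH_CUDA_ARCH_LIST has to cover). Nothing here is specific to the webui -- it's just standard torch introspection; run it inside the container with the venv activated:

```
# torch version, CUDA build version, and whether CUDA initializes at all
python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

# compute capability of the card, e.g. (8, 6) for an RTX 3090 -- i.e. 8.6+PTX
python3 -c "import torch; print(torch.cuda.get_device_capability())"
```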
So I changed out the pip install lines (same diff as in the bonus note above):

```
- pip3 install torch torchvision torchaudio && \
+ pip3 install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 -f https://download.pytorch.org/whl/cu118/torch_stable.html && \
```

I did it for both GPTQ-for-llama and for the webui -- same thing, didn't work.

Another tactic: keeping my pip install changes, I'm now setting 8.6+PTX in the build itself. And I also found that, because of my funky .env files, I might not have been setting TORCH_CUDA_ARCH_LIST properly at all -- it might've even been set to 7.5 (!!!), which looks like a smoking gun.

GOT IT. That was it. My GitHub issue, https://github.com/oobabooga/text-generation-webui/issues/2002, basically summarizes my mistake.

## Hey, I found this podman image repo!

https://github.com/RedTopper/Text-Generation-Webui-Podman -- and I forked it. It looks like it could possibly need updates.

# Attempt 2: without docker

There's also a one-shot installer release:

```
wget https://github.com/oobabooga/text-generation-webui/releases/download/installers/oobabooga_linux.zip
unzip oobabooga_linux.zip
rm oobabooga_linux.zip
cd oobabooga_linux/
chmod +x start_linux.sh
```

Now edit start_linux.sh and add these lines at the top (note: non-interactive bash won't expand aliases unless you also turn on `expand_aliases`)...

```
shopt -s expand_aliases
alias python3='python3.10'
```

And...

```
./start_linux.sh
```

This goes fairly smooth, but chokes with...

```
INFO:Loading ehartford_WizardLM-13B-Uncensored...
ERROR:Failed to load GPTQ-for-LLaMa
ERROR:See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md
```
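One footnote on that start_linux.sh edit: since aliases in scripts are a bit fragile, an alternative is to shim `python3` on the PATH instead of editing the script at all. A sketch, assuming python3.10 is installed (the `~/bin-py310` directory name is just made up for the example):

```
# Make a directory whose python3 points at python3.10
mkdir -p ~/bin-py310
ln -sf "$(command -v python3.10)" ~/bin-py310/python3

# Run the installer with that shim first on the PATH
PATH="$HOME/bin-py310:$PATH" ./start_linux.sh
```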