# Kuwa User Manual on Linux

# 1. Introduction

## 1.1 Overview

This user manual provides the information necessary to use the Kuwa GenAI OS effectively, consolidating instructions from the [Kuwa AI blog](https://kuwaai.tw/blog), the [Kuwa AI GitHub repository](https://github.com/kuwaai/genai-os/tree/main), and community-contributed notes from [L.J. Chen](https://hackmd.io/@cclljj/r1mIc3tNR) and [S.L. Hsu](https://hackmd.io/@San-Li/ryQvBCiUA).

## 1.2 What is Kuwa?

Kuwa is an open, free, secure, and privacy-focused Generative-AI Orchestrating System. It includes a user-friendly WebUI for LLMs and a novel GenAI kernel to support AI-powered applications.

### 1.2.1 Features

1. 🌐 Multi-lingual turnkey solution for GenAI development and deployment on Linux and Windows
2. 💬 Concurrent multi-chat, quoting, full prompt-list import/export/share and more for users
3. 🔄 Flexible orchestration of prompts x RAGs x bots x models x hardware/GPUs
4. 💻 Heterogeneous support for virtual hosts, laptops, PCs, edge servers, and cloud
5. 🔓 Open source, allowing developers to contribute and customize the system according to their needs

### 1.2.2 Architecture

![architecture-0709-d58eeadf388c02f22a32ac8e221c7792](https://hackmd.io/_uploads/Bke-PG7tC.svg)

### 1.2.3 Acknowledgements

Many thanks to the [Taiwan NSTC TAIDE project](https://taide.tw/index) and [AI Academy](https://aiacademy.tw/) for their early support of Kuwa.

# 2. Getting Started

## 2.1 Installation Guide

### 2.1.1 Instructions

* OS version: Ubuntu 22.04 LTS

#### 1. Install Docker

Refer to the [official Docker installation documentation](https://docs.docker.com/engine/install/).

```sh=
# Uninstall conflicting packages
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done

# Add Docker's official GPG key
sudo apt-get update
sudo apt-get install ca-certificates
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc

# Set up the repository
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# Install the necessary packages
sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# Enable the service
sudo systemctl --now enable docker

# Enable unattended upgrades for Docker
cat <<EOT | sudo tee /etc/apt/apt.conf.d/51unattended-upgrades-docker
Unattended-Upgrade::Origins-Pattern {
    "origin=Docker";
};
EOT
```

* Use `sudo docker run hello-world` to test whether Docker is installed successfully.

#### 2. (Optional) Install NVIDIA Drivers

```sh=
# Update and upgrade
sudo apt update
sudo apt upgrade

# Remove any previous NVIDIA installation
sudo apt autoremove 'nvidia*' --purge
sudo apt autoclean

# List available drivers and note the recommended version
ubuntu-drivers devices

# Either install the recommended driver automatically...
sudo ubuntu-drivers autoinstall
# ...or install a specific version, e.g. nvidia-driver-535
# sudo apt install nvidia-driver-$version

# Reboot
sudo reboot
```

If the reboot is unsuccessful, hold down the `Shift` key during boot, select `Advanced options for Ubuntu > recovery mode > dpkg`, and follow the instructions to repair broken packages.

After the reboot, run `nvidia-smi` to check whether the NVIDIA driver is installed successfully. Possible result:

![Screenshot 2024-07-16 231028](https://hackmd.io/_uploads/Syd-1f6FA.png)
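As a quick sanity check of both installs so far, you can confirm Docker and the GPU driver from the shell. The fields below are standard `nvidia-smi` query options; the values shown will depend on your hardware:

```sh
# Confirm Docker works end to end
sudo docker run --rm hello-world

# Confirm the NVIDIA driver is loaded and list basic GPU details
nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
```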
#### 3. (Optional) Install the CUDA Toolkit

Refer to the [NVIDIA CUDA official installation guide](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/).

```shell=
# Update and upgrade
sudo apt update
sudo apt upgrade

# Install the CUDA toolkit
sudo apt install nvidia-cuda-toolkit

# Check the CUDA installation
nvcc --version
```

![Screenshot 2024-07-24 204307](https://hackmd.io/_uploads/SyqeyGpF0.png)

You can test CUDA with PyTorch:

```sh=
sudo apt-get install python3-pip
sudo pip3 install virtualenv
virtualenv -p python3 venv
source venv/bin/activate

# Upgrade pip, then install PyTorch
pip install --upgrade pip
pip3 install torch torchvision torchaudio

# Test
python3
```

(In Python:)

```python=
import torch

print(torch.cuda.is_available())  # should be True

t = torch.rand(10, 10).cuda()
print(t.device)                   # should be cuda:0
```

Expected result:

![Screenshot 2024-07-18 115645](https://hackmd.io/_uploads/SJL_oW6YA.png)

#### 4. (Optional) Install the NVIDIA Container Toolkit

Refer to the [NVIDIA Container Toolkit official installation guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).

```sh=
# Set up the GPG key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Set up the repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Configure the NVIDIA runtime as the default Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default
sudo systemctl restart docker
```

#### 5. Install Kuwa

1. Download the Kuwa repository.

```sh=
git clone https://github.com/kuwaai/genai-os/
cd genai-os/docker
```

2. Change the configuration files

Copy `.admin-password.sample`, `.db-password.sample`, `.env.sample`, and `run.sh.sample`, removing the `.sample` suffix, to set up your own configuration files.

```sh=
cp .admin-password.sample .admin-password
cp .db-password.sample .db-password
cp .env.sample .env
cp run.sh.sample run.sh
```

* `.admin-password`: default administrator password
* `.db-password`: built-in database password
* `.env`: environment variables; the default values are as follows

```txt
DOMAIN_NAME=localhost # Website domain name; to make the service public, set it to your public domain name
PUBLIC_BASE_URL="http://${DOMAIN_NAME}/" # Website base URL
ADMIN_NAME="Kuwa Admin" # Default administrator name
ADMIN_EMAIL="admin@${DOMAIN_NAME}" # Default administrator login email, which can be an invalid address
```

* `run.sh`: the launch script

3. Start the system

Execute the script and wait a few minutes.

```sh
sudo ./run.sh
```

By default, Kuwa is deployed at `http://localhost`.
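To verify the deployment, you can list the containers started by `run.sh` and probe the web front end. Container names vary between Kuwa releases, so treat this as a rough check rather than exact expected output:

```sh
# List running containers and their status
sudo docker ps --format 'table {{.Names}}\t{{.Status}}'

# The web server should respond on port 80
curl -I http://localhost
```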
# 3. Using the System

## 3.1 Model Setup

### 3.1.1 Gemini Pro

The Gemini Pro model is set up by default; just enter your API key under `Settings > API Management`.

### 3.1.2 ChatGPT (OpenAI)

1. Enter the `genai-os/docker/compose` directory and copy the .yaml file.

```sh
cp gemini.yaml chatgpt.yaml
```

2. Edit `chatgpt.yaml`:
* line 2: change `gemini-executor` to `chatgpt-executor`
* line 8: change `geminipro` to `chatgpt`
* line 9: change `gemini-pro` to `gpt-4-turbo` or another unique access code
* line 10: change `Gemini Pro` to `OpenAI GPT-4` or another name to be shown as the chatroom name on the Kuwa web server
* line 11: change `gemini.png` to `chatgpt.png`
* line 15: change `command: ["--api_key", ……` to `command: ["--model", "gpt-4-turbo", "--api_key", "<FILL IN YOUR API key>"`

3. Edit `run.sh`: add `chatgpt` to the `confs` array and execute `run.sh` again.

The `confs` array:

![Screenshot 2024-07-25 170624](https://hackmd.io/_uploads/HkP0CWat0.png =60%x)

The chatroom interface:

![Screenshot 2024-07-18 172203](https://hackmd.io/_uploads/B1t16WTtA.png)

### 3.1.3 TAIDE

1. Download the Llama3-based TAIDE 8B 4-bit GGUF file `taide-8b-a.3-q4_k_m.gguf` from [here](https://huggingface.co/taide/Llama3-TAIDE-LX-8B-Chat-Alpha1-4bit/tree/main).

```sh=
cd ~
mkdir -p gguf/taide
curl -L -o "gguf/taide/taide-8b-a.3-q4_k_m.gguf" https://huggingface.co/nctu6/Llama3-TAIDE-LX-8B-Chat-Alpha1-GGUF/resolve/main/Llama3-TAIDE-LX-8B-Chat-Alpha1-Q4_K_M.gguf?download=true
```

2. Enter the `genai-os/docker/compose` directory and copy the .yaml file.

```sh
cp llamacpp.yaml llamacpp-taide.yaml
```

3. Edit `llamacpp-taide.yaml`:
* line 2: change `llamacpp-executor` to `llamacpp-taide-executor`
* line 9: change `TAIDE 4bit` to `llamacpp-taide` or another unique access code
* line 10: change `Gemini Pro` to `Llama3-TAIDE-LX-8B-Chat-Alpha1-4bit` or another name to be shown as the chatroom name on the Kuwa web server
* line 15: change `command: ["--model_path", "/var/model/taide-4bit.gguf" ......` to `command: ["--model_path", "/var/model/taide-8b-a.3-q4_k_m.gguf" ......`
* line 17: change `/path/to/taide/model.gguf` to your path to the gguf file, and change `/var/model/taide-4bit.gguf` to `/var/model/taide-8b-a.3-q4_k_m.gguf`

4. Edit `run.sh`: add `llamacpp-taide` to the `confs` array and execute `run.sh` again.

If you encounter the error `ModuleNotFoundError: No module named 'llama_cpp'`, please refer to this [commit](https://github.com/kuwaai/genai-os/commit/00ff80b5983325f1736299d8abae671f72c3f6ca) to fix it.

### 3.1.4 Others, using Ollama

1. Use Ollama, a simple API for running and managing models, to pull a model.

```sh
ollama pull <model name>
```

2. Enter the `genai-os/docker/compose` directory and copy the .yaml file.

```sh
cp gemini.yaml ollama-<name>.yaml
```

3. Edit `ollama-<name>.yaml`:
* line 2: change `gemini-executor` to `ollama-<name>-executor`
* line 8: change `geminipro` to `chatgpt`
* line 9: change `gemini-pro` to `<access code>`; a unique model version name is recommended
* line 10: change `Gemini Pro` to `<the name shown in the chatroom interface>`
* line 15: change `command: ["--api_key", ……` to `["--model", "<model name>", "--base_url", "http://host.docker.internal:11434/v1", "--api_key", "ollama"]`

Note that Ollama advertises its API URL as `http://localhost:11434/v1`, but because this executor runs inside a Docker container, it cannot reach services on the host through `localhost`. Change `localhost` to the host machine's IP address, or use the Docker-provided alias `host.docker.internal`.

4. Edit `run.sh`: add the yaml file name to the `confs` array and execute `run.sh` again.
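Before executing `run.sh` again, you can confirm the Ollama server is reachable from the host. This sketch assumes Ollama is serving its OpenAI-compatible API on the default port 11434; `llama3` is only a placeholder model name:

```sh
# Pull a model (placeholder name)
ollama pull llama3

# List the models Ollama serves through its OpenAI-compatible API
curl http://localhost:11434/v1/models
```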
### 3.1.5 Others, using LM Studio

1. Use LM Studio, a fast LLM deployment platform, to download a model.

2. Enter the `genai-os/docker/compose` directory and copy the .yaml file.

```sh
cp gemini.yaml lmstudio-<name>.yaml
```

3. Edit `lmstudio-<name>.yaml`:
* line 2: change `gemini-executor` to `lmstudio-<name>-executor`
* line 8: change `geminipro` to `chatgpt`
* line 9: change `gemini-pro` to `<access code>`; a unique model version name is recommended
* line 10: change `Gemini Pro` to `<the name shown in the chatroom interface>`
* line 15: change `command: ["--api_key", ……` to `["--model", "<model name>", "--base_url", "http://host.docker.internal:1234/v1", "--api_key", "lm-studio"]`

Note that LM Studio's local server listens on `http://localhost:1234/v1` by default, but because this executor runs inside a Docker container, it cannot reach services on the host through `localhost`. Change `localhost` to the host machine's IP address, or use the Docker-provided alias `host.docker.internal`.

4. Edit `run.sh`: add the yaml file name to the `confs` array and execute `run.sh` again.

### 3.1.6 Others, using GGUF

1. Download the gguf file.

2. Enter the `genai-os/docker/compose` directory and copy the .yaml file.

```sh
cp llamacpp.yaml llamacpp-<name>.yaml
```

3. Edit `llamacpp-<name>.yaml`:
* line 2: change `llamacpp-executor` to `llamacpp-<name>-executor`
* line 8: change `taide-4bit` to `llamacpp-<name>`
* line 9: change `TAIDE 4bit` to `<access code>`; a unique model version name is recommended
* line 10: change `Gemini Pro` to `<the name shown in the chatroom interface>`
* line 15: change `command: ["--model_path", "/var/model/taide-4bit.gguf" ......` to `command: ["--model_path", "/var/model/<gguf file name>" ......`
* line 17: change `/path/to/taide/model.gguf` to your path to the gguf file, and change `/var/model/taide-4bit.gguf` to `/var/model/<gguf file name>`

4. Edit `run.sh`: add the yaml file name to the `confs` array and execute `run.sh` again.

## 3.2 RAG Setup

Version 0.3.0 added the RAG toolchain, which lets users build their own vector database from on-premises file folders by drag and drop and run Q&A against it. This section walks through using Kuwa's RAG toolchain to build your own vector database and a related bot.

1. Refer to `genai-os/src/toolchain/README.md` for the command to create the vector database.
2. Refer to `docker/compose/dbqa.yaml` to create the DB QA Executor.
   * Change `</path/to/vector-database>` in the volume mapping to the location of the vector database on the host.
   * `EXECUTOR_NAME` can be changed to an easy-to-remember name.
   * The `--model` parameter can specify a particular model to answer with; if it is omitted, the first online Executor in the Kernel (excluding Executors with the suffix "-qa") is selected to answer.
3. Add `dbqa` to the `confs` array and execute `run.sh` again, as sketched below.
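For reference, `confs` is a plain Bash array in `run.sh` (see the screenshot in Section 3.1.2). A rough sketch, assuming you created the compose files from the previous sections; the exact entries depend on which .yaml files you actually added:

```sh
# Excerpt from run.sh: each entry names a compose file in docker/compose/
confs=(
  "gemini"
  "chatgpt"
  "llamacpp-taide"
  "dbqa"
)
```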
## 3.3 Bot Setup

You can create different bots on the Kuwa website.

1. Open the Store section and click the green button to create a bot.

![store-25a673dbccd89e472de3e608928b9b9c](https://hackmd.io/_uploads/ByvGZ_dFR.png)

2. Modify the bot's settings. You can simply use the interface to set common bot parameters.

![bot-create-1-287b39ced78863358490fc6c89b0a810](https://hackmd.io/_uploads/Hk7Yz_uYC.png)

If you want to set more detailed options, you can also open the model configuration file.

![bot-create-2-8475079dd646b7c08ccd8511d57f09b6](https://hackmd.io/_uploads/rJ0eQdOKA.png)

Although this view has no guided interface, it lets you set every parameter freely. Please refer to the Ollama Modelfile for the format. Note that the current 0.3.0 version supports only some configuration parameters. The relevant parameters and some example usages are listed below.

* SYSTEM \<prompt>
  The system prompt should serve as the main method of influencing the model's output, preloading knowledge or changing the response style.
  * SYSTEM You are a helpful assistant.
  * SYSTEM Please respond briefly.
  * SYSTEM Your name is Bob, and you love learning other languages.
* TEMPLATE \<template>
  Specifies the dialogue template applied during inference. The template varies from model to model, so it is recommended to consult the template for your model.
  * TEMPLATE """
    {% for message in messages %}
    {% if message['role'] == 'system' %}
    {{ '\<s>' + message['content'] }}
    {% endif %}
    {% if message['role'] == 'user' %}
    {{ 'USER: ' + message['content'] }}
    {% endif %}
    {% if message['role'] == 'assistant' %}
    {{ 'ASSISTANT: ' + message['content'] }}
    {% endif %}
    {% endfor %}
    {{ 'ASSISTANT: ' }}"""
* MESSAGE \<role> \<prompt>
  Preloads some dialogue records. The User and Assistant parts must be paired.
  * MESSAGE SYSTEM You are a helpful assistant.
  * MESSAGE SYSTEM Please respond briefly.
  * MESSAGE USER Hello.
  * MESSAGE ASSISTANT """Hello! How can I assist you?"""

In addition to the parameters supported by the original Modelfile, we have extended two additional parameters:

* BEFORE-PROMPT \<prompt>
  In the last message, this prompt is placed before the user's message.
  * BEFORE-PROMPT Please translate the following into Japanese: 「
  * BEFORE-PROMPT 「
* AFTER-PROMPT \<prompt>
  In the last message, this prompt is placed after the user's message.
  * AFTER-PROMPT 」
  * AFTER-PROMPT 」, please rephrase the above content.

Please note that not all models support these parameters. For example, the current Gemini Pro API does not support templates, and its system prompt is implemented as a before-prompt. ChatGPT likewise does not support template settings. The effectiveness of these settings also depends on the model's training: if training on system prompts was insufficient, the system prompt alone may do little to steer the model, and you can try MESSAGE or the Before/After prompts instead.
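Putting the directives above together, a minimal bot configuration for a translation bot might look like the following sketch. It only uses directives documented above; whether each one takes effect still depends on the underlying model:

```txt
SYSTEM You are a careful translator.
BEFORE-PROMPT Please translate the following into Japanese: 「
AFTER-PROMPT 」
MESSAGE USER Hello.
MESSAGE ASSISTANT こんにちは。
```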
## 3.4 Others

* [Search QA Setup](https://kuwaai.tw/blog/search-qa-setup)
* [Whisper Setup](https://kuwaai.tw/blog/whisper-tutorial)
* [Visual-Language Model Setup Tutorial](https://kuwaai.tw/blog/vlm-tutorial)
* [Stable Diffusion Image Generation Model Building Tutorial](https://kuwaai.tw/blog/painter-tutorial)
* [Tool Development Tutorial](https://kuwaai.tw/blog/tool-tutorial)
* [RAG Custom Parameters Tutorial](https://kuwaai.tw/blog/rag-param-tutorial)
* [Cool-Whisper Tutorial](https://kuwaai.tw/blog/cool-whisper-tutorial)

# 4. Troubleshooting, FAQ, and Else

## 4.1 Troubleshooting

If you encounter any technical issues, have a look at our [FAQs](https://kuwaai.tw/os/FAQ) to see if your problem is addressed there. If you still cannot resolve your issue, feel free to ask [our community](https://kuwaai.tw/community) and we will do our best to help you troubleshoot. When asking for help, please include information about your system's hardware, software versions, and the expected and actual behavior, as this will help us assist you more quickly.

## 4.2 Support

Kuwa is an open-source community, and everyone is welcome to contribute 😁

Where can you start contributing? Kuwa GenAI OS is still at a very early stage. If you encounter any problems along the way, that's great 👍 it means our documentation is not yet clear enough. Please let us know through [our community](https://kuwaai.tw/community), and we will do our best to help you resolve any difficulties.

If you think a feature would be cool, feel free to open an issue on [GitHub](https://github.com/kuwaai/genai-os/issues), or contact us through [our community](https://kuwaai.tw/community); we are happy to discuss any cool ideas with you. And of course, if you implement features yourself and want to merge them into Kuwa GenAI OS, you are welcome to submit a pull request on [GitHub](https://github.com/kuwaai/genai-os/issues). Thank you for your contribution! 🎉

## 4.3 Legal Information

* License of the source code: MIT
* License of the documentation: CC BY-SA 4.0