# Local Alpaca via KoboldAI and TavernAI
## Introduction
I've been researching and tinkering a lot with locally hosted LLMs recently. There are several great tutorials out there that explain how to run LLaMA or Alpaca locally. After digging through a couple of them, I decided to write a step-by-step guide on how to run Alpaca 13B 4-bit via KoboldAI and have chat conversations with different characters through TavernAI, entirely on your local machine.
<iframe style="width:100%;display:inline-block;padding:0px" height="420" src="https://www.youtube.com/embed/tTKHcEn8uvg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
The performance of the quantized model loaded on the GPU is incredible and shows the potential of on-prem LLM systems. I'm aware of CPU-based solutions like alpaca.cpp and have played around with them. However, in this guide I'll dig into how to install the Alpaca setup I personally like most on gaming hardware. It runs incredibly well on my RTX 4080; see the video above.
## What makes this stack special?
The ability to run this setup locally on gaming hardware is pretty neat. It amazes me for the same reasons Stable Diffusion does. Modularity is another reason: you can configure the language model interface in KoboldAI and plug that API into other frontends. Instead of TavernAI, you could embed it into [Hyperfy](https://twitter.com/hyperfy_io), [Webaverse](https://twitter.com/webaverse) or other web3xr platforms.
We have already seen [ChatGPT integrations in Hyperfy](https://twitter.com/philburrrrt/status/1637229855631634433), and the [Webaverse Character Studio](https://twitter.com/webaverse/status/1627678180982267904) showed very powerful AI integrations. I hope this guide helps you understand the modularity aspect. I'm currently exploring the [LangChain](https://python.langchain.com/en/latest/) framework, which enables more sophisticated LLM systems that are open and can be hosted on premises.
## Overview
This guide is written for Windows 10. The installation was documented on a computer with an RTX 4070 Ti (12 GB VRAM), and the setup should be used for research purposes only.
- Alpaca 13B 4bit hf
- KoboldAI 4bit fork
- TavernAI
If you have less VRAM, I'd suggest taking a look at Alpaca 7B 4bit.
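A rough way to see why VRAM decides between 7B and 13B: at 4 bits per weight, the weights alone take `n_params × 4 / 8` bytes. The sketch below uses nominal parameter counts (7e9 and 13e9, an approximation) and ignores everything else that has to fit in VRAM (layers kept unquantized, quantization scales, the context cache, CUDA overhead), so treat the result as a lower bound rather than a requirement.

```python
def quantized_weight_gib(n_params: float, bits: int = 4) -> float:
    """Lower bound on VRAM: bytes to store n_params weights at `bits` each, in GiB."""
    return n_params * bits / 8 / 2**30


if __name__ == "__main__":
    # Nominal parameter counts, not exact model sizes.
    for label, n_params in (("7B", 7e9), ("13B", 13e9)):
        print(f"{label}: {quantized_weight_gib(n_params):.2f} GiB")
```

This prints about 3.26 GiB for 7B and 6.05 GiB for 13B; the difference to the 12 GB card used here is headroom for the overhead listed above.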
## Step by Step Installation
### Install the Kobold 4bit Fork with your Alpaca Model
1. Download and install [Visual Studio 2019 build tools](https://learn.microsoft.com/en-us/visualstudio/releases/2019/history#release-dates-and-build-numbers) --- **Important**: Check "Desktop development with C++" when installing.
![](https://i.imgur.com/iNmBgz6.png)
2. Download and unzip [0cc4m/KoboldAI](https://github.com/0cc4m/KoboldAI/) somewhere on your drive
![](https://i.imgur.com/4C5vElf.png)
3. Download and extract all files from [0cc4m/GPTQ-for-LLaMa](https://github.com/0cc4m/GPTQ-for-LLaMa/) to KoboldAI-4bit/repos/gptq --- **Note**: This archive contains a subfolder which you don't need. Make sure to place its contents directly into the gptq folder, as seen in the screenshot below.
![](https://i.imgur.com/ZxnVMcm.png)
4. Run install_requirements.bat in the main directory as administrator. Select the Temporary Drive Letter option by typing "1" - then press enter to start the installation
![](https://i.imgur.com/hLOPOGx.png)
![](https://i.imgur.com/vCmqZ8r.png)
5. Wait until all requirements are installed. Depending on your internet connection this will take a few minutes as multiple gigabytes are being downloaded. At some point (last screenshot) you will be prompted to press any key which closes the installer.
![](https://i.imgur.com/eLcyVfZ.png)
![](https://i.imgur.com/FYB6yh8.png)
![](https://i.imgur.com/x5XQWyb.png)
6. Download and install [CUDA Toolkit 11.3](https://developer.nvidia.com/cuda-11.3.0-download-archive?target_os=Windows&target_arch=x86_64&target_version=10&target_type=exe_local) via the suggested express option. Don't worry about the notice regarding the unsupported Visual Studio version; just check the box and click Next to start the installation.
![](https://i.imgur.com/ZAFrPpw.png)
![](https://i.imgur.com/zVrCJ78.png)
7. Start commandline.bat in the main directory. Type "cd repos" and hit enter. Type "cd gptq" and hit enter. Type "python setup_cuda.py install" and hit enter. The installation takes some time - you can close the window once it is finished (see last screenshot)
![](https://i.imgur.com/81JeTjU.png)
![](https://i.imgur.com/yRDZPJs.png)
![](https://i.imgur.com/szueqtG.png)
8. Copy your Alpaca folder, which contains the quantized model (rename the .pt file to 4bit.pt) plus the Hugging Face (hf) config files etc., into the models folder.
![](https://i.imgur.com/9d6KsTI.png)
**Note**: Open tokenizer_config.json and make sure that it says the following *"tokenizer_class": "LlamaTokenizer"* as the tokenizer class has been changed from LLaMATokenizer to LlamaTokenizer. Adjust and save in case you still had the old tokenizer config.
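Before loading anything, you can sanity-check steps 3, 6 and 7 from inside the environment that commandline.bat opens, so the same Python is used. This is a minimal sketch under assumptions: you run it from the KoboldAI main directory, `nvcc` is on your PATH (the CUDA installer normally adds it), and the compiled extension is called `quant_cuda` as in upstream GPTQ-for-LLaMa's setup_cuda.py. If the last check fails, look up the name used in your copy of setup_cuda.py.

```python
import importlib.util
import re
import subprocess
from pathlib import Path
from typing import Optional


def gptq_folder_ok(gptq_dir: Path) -> bool:
    """Step 3: setup_cuda.py must sit directly in repos/gptq, not in a nested subfolder."""
    return (gptq_dir / "setup_cuda.py").is_file()


def parse_nvcc_release(output: str) -> Optional[str]:
    """Step 6: pull 'X.Y' out of the 'release X.Y' text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


def module_available(name: str) -> bool:
    """Step 7: True if the current Python can locate a top-level module without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Path relative to the KoboldAI main directory.
    gptq = Path("repos") / "gptq"
    print("gptq layout OK:", gptq_folder_ok(gptq))

    try:
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
        print("CUDA toolkit release:", parse_nvcc_release(out))
    except (OSError, subprocess.CalledProcessError) as err:
        print("could not run nvcc:", err)

    # Assumption: extension name taken from upstream GPTQ-for-LLaMa's setup_cuda.py.
    print("quant_cuda importable:", module_available("quant_cuda"))
```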
:::info
This tutorial also works with other models. To load a model with a different group size, append the group size to the .pt/.safetensors file name. It is usually indicated clearly in the name of the model. For reference, the screenshot below shows how vicuna-13b-GPTQ-4bit-128g has to be renamed to work. Loading it works exactly the same way.
![](https://i.imgur.com/pdQ8gmr.png)
If you don't indicate the group size in the file name, you'll get size mismatch errors.
:::
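The two model-folder adjustments from step 8 and the info box above (tokenizer class and group size in the checkpoint name) can also be scripted. A sketch under assumptions: the model folder name is hypothetical, and the naming scheme "4bit" plus "-<N>g" plus extension is inferred from the vicuna-13b-GPTQ-4bit-128g example, so compare the suggestion with the screenshot before renaming. Rewriting tokenizer_config.json this way also re-indents the file, which JSON loaders ignore.

```python
import json
import re
from pathlib import Path
from typing import Optional


def fix_tokenizer_class(config_path: Path) -> bool:
    """Change tokenizer_class LLaMATokenizer -> LlamaTokenizer in place; True if the file changed."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    if config.get("tokenizer_class") != "LLaMATokenizer":
        return False
    config["tokenizer_class"] = "LlamaTokenizer"
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return True


def group_size_from_name(model_name: str) -> Optional[int]:
    """Read N from a '-<N>g' marker such as '4bit-128g'; None if the name has no group size."""
    match = re.search(r"-(\d+)g(?=[.\-_]|$)", model_name)
    return int(match.group(1)) if match else None


def suggested_checkpoint_name(model_name: str, ext: str = ".safetensors") -> str:
    """Assumed naming: '4bit' (step 8) plus '-<N>g' when the model uses a group size."""
    group = group_size_from_name(model_name)
    return "4bit" + (f"-{group}g" if group else "") + ext


if __name__ == "__main__":
    # Hypothetical model folder inside KoboldAI's models directory.
    folder = Path("models") / "vicuna-13b-GPTQ-4bit-128g"
    cfg = folder / "tokenizer_config.json"
    if cfg.is_file():
        print("tokenizer_config.json updated:", fix_tokenizer_class(cfg))
    print("rename checkpoint to:", suggested_checkpoint_name(folder.name))
```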
### Install TavernAI
1. Download [Node.js for Win x64](https://nodejs.org/download/release/v19.1.0/node-v19.1.0-x64.msi) or [other systems](https://nodejs.org/download/release/v19.1.0/) and install it as administrator. Click Next, leaving all options at their defaults.
2. Download [TavernAI](https://github.com/TavernAI/TavernAI) and unzip it to a folder of your choice.
![](https://i.imgur.com/R6TDu3j.png)
## Spin Up the Stack
### Launch KoboldAI
1. Launch Kobold by running play.bat in the main directory. The Kobold interface should open in your browser. If that's not the case, you can access it via http://localhost:5000
![](https://i.imgur.com/kGe8IK1.png)
2. Click "Try New UI" in the upper right corner to access the updated UI. In the left menu, open the Interface tab and enable "Experimental UI" by clicking the toggle.
![](https://i.imgur.com/Asvc8Aa.png)
![](https://i.imgur.com/j2I7YOw.png)
![](https://i.imgur.com/KDJVGI2.png)
3. Go back to Home and click "Load Model", then click "load a model from its directory" to find the Alpaca model on your drive. Enable 4-bit mode before you click Load.
![](https://i.imgur.com/3lWulmV.png)
![](https://i.imgur.com/VZu3H4W.png)
![](https://i.imgur.com/7MyQE94.png)
4. Wait until the model has loaded. You need 12 GB of VRAM to load Alpaca 13B 4bit. If everything worked, you should see that your dedicated GPU memory holds the model.
![](https://i.imgur.com/XZithFf.png)
![](https://i.imgur.com/LiTyCOc.png)
5. Change the Game Mode to "Chat" and keep Kobold running.
![](https://i.imgur.com/UdeCXIc.png)
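TavernAI is only one client of the KoboldAI API: in the TavernAI settings it is pointed at localhost:5000/api. To call that API from your own code, for example for the Hyperfy or Webaverse style integrations mentioned in the intro, here is a minimal sketch using only the Python standard library. It assumes the fork exposes KoboldAI United's REST API, with `POST /api/v1/generate` accepting `prompt`, `max_length` and `max_context_length` and replying with `{"results": [{"text": ...}]}`; verify this against the API documentation of your KoboldAI version. The prompt text is hypothetical.

```python
import json
import urllib.request

# Assumption: KoboldAI United's REST layout under /api on the default port.
KOBOLD_API = "http://localhost:5000/api"


def build_generate_payload(prompt: str, max_length: int = 80, max_context_length: int = 1024) -> dict:
    """Request body for POST /api/v1/generate (field names assumed, see above)."""
    return {"prompt": prompt, "max_length": max_length, "max_context_length": max_context_length}


def extract_text(response: dict) -> str:
    """Pull the generated continuation out of an assumed {"results": [{"text": ...}]} reply."""
    return response["results"][0]["text"]


def generate(prompt: str, base_url: str = KOBOLD_API, timeout: float = 120.0) -> str:
    """Send one generation request to a running KoboldAI instance and return the text."""
    body = json.dumps(build_generate_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_text(json.loads(reply.read().decode("utf-8")))


if __name__ == "__main__":
    try:
        # Hypothetical chat-style prompt; Kobold must be running with the model loaded.
        print(generate("You: Hello, who are you?\nAristotle:"))
    except (OSError, ValueError, KeyError) as err:
        # URLError (e.g. connection refused) is an OSError; ValueError/KeyError cover unexpected replies.
        print("KoboldAI API not reachable or unexpected reply:", err)
```

Keeping `max_context_length` at 1024 matches the context size suggested for TavernAI in this guide.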
### Launch TavernAI
1. Run Start.bat in the main directory. It should automatically open the TavernAI application in your browser. Otherwise, you can access it via http://127.0.0.1:8000
![](https://i.imgur.com/tZNb6GD.png)
2. Open the menu in the top right and go to Settings. Select KoboldAI as the API. Connect to localhost:5000/api (should be the default) and select "GUI KoboldAI Settings" as the preset.
![](https://i.imgur.com/VhhYmoM.jpg)
3. To personalize the experience, drop an avatar image into "TavernAI-main\public\User Avatars" and change your name in the settings.
![](https://i.imgur.com/MpbYuYJ.png)
![](https://i.imgur.com/swXQLgq.jpg)
Note: Refresh the Tavern page to update the avatar image selection. You can delete the other avatar images from the folder if you want to remove them from the UI.
4. In the Master Settings lower the context size to 1024 tokens. You can try higher values but that requires more VRAM.
![](https://i.imgur.com/Su8kOxq.png)
Start with these settings and go from there during further tinkering / experimentation.
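Why does a larger context need more VRAM? For every token in the context, each layer keeps an attention key and value vector, so this cache grows linearly with context size. A back-of-the-envelope sketch, assuming LLaMA-13B's shape (40 layers, hidden size 5120), an fp16 cache (2 bytes per value) and a single conversation; activations and temporary buffers come on top, and the backend may allocate memory differently.

```python
def kv_cache_gib(
    context_tokens: int,
    n_layers: int = 40,        # assumed LLaMA-13B layer count
    hidden_size: int = 5120,   # assumed LLaMA-13B hidden size
    bytes_per_value: int = 2,  # assumed fp16 cache
) -> float:
    """Size of the attention key/value cache for one sequence, in GiB."""
    # Factor 2: one key and one value vector per layer per token.
    return 2 * n_layers * hidden_size * bytes_per_value * context_tokens / 2**30


if __name__ == "__main__":
    for ctx in (1024, 2048):
        print(ctx, "tokens:", kv_cache_gib(ctx), "GiB")
```

Under these assumptions 1024 tokens cost 0.78125 GiB and 2048 tokens 1.5625 GiB, so doubling the context doubles this part of the memory.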
## Done! Start your Conversations in TavernAI
You can now have your first conversation. You can select various characters from the TavernAI charaCloud (the view that you see at launch) or import your own character configs.
![](https://i.imgur.com/omEtyOE.jpg)
Btw: Most of the settings you configure in KoboldAI and TavernAI the first time will be saved, so future launches will be far simpler, especially once you get used to the setup.
## Bonus: Load Aristotle into your Tavern
This GPT-4 generated description of Aristotle was [posted on reddit](https://www.reddit.com/r/LocalLLaMA/comments/121nytl/simulating_aristotle_in_alpaca_7b_i_used_gpt4_to/) this weekend, and I thought it would make a great first exercise in importing custom characters. In the Tavern Characters tab in the right menu, click "New Character" and copy the content from the JSON into the corresponding text areas.
![](https://i.imgur.com/XiuNOi7.png)
```json
{
"char_name": "Aristotle",
"char_persona": "Aristotle was an ancient Greek philosopher and scientist who lived from 384 BC to 322 BC. He was a student of Plato and later became the teacher of Alexander the Great. As one of the most influential philosophers in Western history, Aristotle's work covered a wide range of subjects, including physics, biology, ethics, politics, and metaphysics. His ideas formed the foundation of many modern scientific disciplines and continue to be studied and debated by scholars today. Aristotle was known for his systematic and empirical approach to knowledge, as well as his emphasis on virtue ethics and the pursuit of happiness.",
"char_greeting": "A tall, distinguished-looking man in a flowing robe enters the room, his keen eyes taking in his surroundings with curiosity and intellect. He approaches you with a warm, welcoming smile and extends a hand in greeting.\n\nGreetings, fellow seeker of wisdom! I am Aristotle, and I am delighted to meet someone who shares my passion for knowledge and understanding. I look forward to engaging in thought-provoking conversation and exploring the mysteries of the universe together.",
"world_scenario": "Aristotle lived during the Classical Greek period, a time of great intellectual and artistic growth. He spent much of his life in Athens, which was the center of this cultural renaissance, surrounded by philosophers, poets, and artists who were all pushing the boundaries of human knowledge. Aristotle studied under Plato and later went on to establish his own school, the Lyceum, where he taught and wrote extensively on a wide range of subjects. His ideas have had a profound impact on Western philosophy and science, and his teachings continue to be studied and debated today.",
"example_dialogue": "{{user}}: How did your time as a student of Plato influence your philosophical ideas?\n{{char}}: Studying under the great Plato was both an honor and a challenge. His ideas and teachings provided me with a strong foundation in philosophy, but I also found myself questioning and diverging from some of his views. For example, I disagreed with his theory of Forms, which posits the existence of ideal, immaterial versions of things. Instead, I took a more empirical approach, believing that knowledge could be gained through direct observation and experience of the world around us.\n{{user}}: What is your perspective on ethics and the pursuit of happiness?\n{{char}}: In my view, the ultimate goal of life is to achieve happiness, or 'eudaimonia,' which can be translated as a state of flourishing or living a fulfilling life. I believe that happiness can be attained through the cultivation of virtue and the development of moral character. It is essential to strike a balance between extremes, an idea I refer to as the 'Golden Mean.' By practicing moderation and striving for excellence in all aspects of life, we can achieve true happiness and lead meaningful lives.\n{{user}}: Can you explain your concept of the 'Golden Mean' in more detail?\n{{char}}: Certainly! The Golden Mean is the idea that virtue lies between two extremes, which are vices. For example, courage is a virtue that lies between the vices of cowardice and recklessness. The key to living a virtuous life is to find the right balance between these extremes in every aspect of one's behavior and decision-making. This balance will be different for each individual and may change over time, requiring continuous self-reflection and adjustment.\n{{user}}: How does your approach to knowledge differ from your contemporaries, such as Plato and Socrates?\n{{char}}: While Socrates and Plato were more focused on dialectical reasoning and abstract ideas, my approach to knowledge is more empirical and grounded in the natural world. I believed that true knowledge could be gained through careful observation, experience, and the systematic collection of data. I emphasized the importance of empirical evidence, experimentation, and inductive reasoning in the pursuit of truth. This approach has laid the foundation for many modern scientific disciplines and has had a profound impact on the way we understand and study the world around us.\n{{user}}: What is your view on the relationship between science and philosophy?\n{{char}}: I believe that science and philosophy are closely intertwined, as both seek to understand the fundamental principles of reality and the nature of existence. While science focuses on the empirical study of the natural world, philosophy provides a broader context and addresses questions that may not be easily answered through empirical methods alone. In my view, the pursuit of knowledge requires both a deep understanding of the natural world and the ability to think critically and abstractly about the underlying principles that govern it.\n{{user}}: You were the tutor of Alexander the Great. How did you influence his thinking and approach to leadership?\n{{char}}: As the tutor of young Alexander, I had the opportunity to shape his understanding of the world and impart to him the principles of virtue, wisdom, and ethical leadership. I emphasized the importance of reason, balance, and the pursuit of excellence in all aspects of life. While Alexander went on to become a great military leader and conqueror, I hope that my teachings had some influence on his approach to leadership and his treatment of those he encountered during his conquests.\n{{user}}: What advice would you give to aspiring philosophers and scientists today?\n{{char}}: My advice would be to remain curious, open-minded, and committed to the pursuit of truth. Embrace both empirical evidence and critical thinking, and strive to find a balance between these approaches to knowledge. Engage in dialogue with others, challenge your own assumptions, and always be willing to learn from new experiences and perspectives. Remember that the pursuit of wisdom is a lifelong journey, and that the true mark of a philosopher or scientist is the ability to adapt and grow in the face of new information and challenges.\n{{user}}: How do you think your ideas on politics and ethics can be applied in today's world?\n{{char}}: I believe that the core principles of my teachings on politics and ethics – such as the pursuit of virtue, the importance of the Golden Mean, and the value of rational decision-making – remain relevant today. In the complex and interconnected world in which we live, it is more important than ever for individuals and societies to strive for moral excellence, cultivate compassion, and seek balance in all aspects of life. By applying these principles, we can work together to create more just, equitable, and flourishing communities."
}
```
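If you'd rather keep a character as a file than paste it field by field, a small check catches broken JSON before you import it. This is a sketch under assumptions: the required field list is simply the five keys used in the Aristotle example, TavernAI may accept other fields too, and aristotle.json is a hypothetical file name. Strict JSON does not allow literal line breaks inside a string, which is why each dialogue turn is separated by a `\n` escape instead.

```python
import json
from pathlib import Path
from typing import List

# Field names copied from the Aristotle example; TavernAI may accept others too.
REQUIRED_FIELDS = ("char_name", "char_persona", "char_greeting", "world_scenario", "example_dialogue")


def missing_fields(raw: str) -> List[str]:
    """Parse a character JSON string and return required fields that are absent or empty."""
    card = json.loads(raw)  # strict parsing: a raw line break inside a string raises JSONDecodeError
    return [
        field
        for field in REQUIRED_FIELDS
        if not isinstance(card.get(field), str) or not card[field].strip()
    ]


if __name__ == "__main__":
    card_file = Path("aristotle.json")  # hypothetical: the JSON above saved to a file
    if card_file.is_file():
        print(missing_fields(card_file.read_text(encoding="utf-8")) or "all fields present")
```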
![](https://i.imgur.com/rxFJ2Wc.png)
I generated this image of Aristotle with Stable Diffusion. You can upload this or any other image using the file uploader right above the description text area for a more immersive conversation with the famous Greek philosopher.
![](https://i.imgur.com/jTVOH4A.jpg)
## Discuss this Guide on Reddit
Dropped this into /r/KoboldAI for feedback. Let me know if you run into any issues in [the thread](https://www.reddit.com/r/KoboldAI/comments/122zjd0/guide_alpaca_13b_4bit_via_koboldai_in_tavernai/).