# Train Stable Diffusion LoRA from VRM Avatar

In this guide I'll explain how you can train a [LoRA](https://replicate.com/blog/lora-faster-fine-tuning-of-stable-diffusion) of your VRM avatar that you can use to draw images of your 3D avatar with any Stable Diffusion model you can find. In this tutorial we are going to train the LoRA for the tokenized identity [Nature](https://twitter.com/naturevrm), and I'm going to provide the file so you can play around with it in your own Stable Diffusion installation. The images below give you a sense of the prompt results. Find vtubing content on my [YouTube channel](https://www.youtube.com/@reneil1337) to get an impression of the actual 3D avatar.

A LoRA captures and conserves a concept or an idea in a way that lets it be aggregated into larger models and become part of their outputs. A LoRA can be anything, but in this case it is a character.

[<img src="https://hackmd.io/_uploads/B1J2DEqk6.jpg"/>](https://hackmd.io/_uploads/B1J2DEqk6.jpg)

The SD1.5 LoRA that we train in this guide was [released on CivitAI](https://civitai.com/models/31462?modelVersionId=166927) under a CC0 licence. Play with it for free in your own Stable Diffusion instance; we added lots of example prompts. Add your creations on CivitAI and [tag Nature on Twitter](https://twitter.com/naturevrm) so that she can retweet your posts.

[<img src="https://hackmd.io/_uploads/BylRvotyp.jpg"/>](https://hackmd.io/_uploads/BylRvotyp.jpg)

Update: In September 2023, five months after the first NatureVRM LoRA was released, we stepped up our game with a high-res 1024px SDXL LoRA that you can also [download on CivitAI](https://civitai.com/models/31462?modelVersionId=165670) for free.

## Install the Stack

You'll need the following 3 ingredients to train a Stable Diffusion model on your VRM avatar. All links that you need are provided in the step-by-step guide below.
The Stable Diffusion and Kohya resources are linked in the descriptions of the YouTube videos and explained in that content.

- VRM Posing Desktop (paid) / VRM Live Viewer (free)
- Optional: Blender for automated capture
- Stable Diffusion
- Kohya

## Step-by-Step Guide

<iframe style="width:100%;display:inline-block;padding:0px" height="420" src="https://www.youtube.com/embed/YkFsgjOHx8A" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

Install the tools linked above and learn the entire pipeline, from dataset recording and dataset curation through LoRA model training to prompting final outputs from your local Stable Diffusion instance. We are going to do all of these things step by step. Let's go!

### 1. Capture your avatar shots

To generate the training dataset you need to capture your avatar in different poses. There are different ways to do that. There are free ways to do it, but if you plan to tinker with this more than once I'd recommend spending a few dollars on VRM Posing Desktop.

### 1.1 VRM Posing Desktop

The most convenient way for manual capture is [VRM Posing Desktop](https://store.steampowered.com/app/1895630/VRM_Posing_Desktop/), which costs $10 but provides lots of default poses as well as a community section with hundreds of free poses that you can apply to your avatar. Watch [this tutorial](https://www.youtube.com/watch?v=nXFB6jGxQl8) to get an idea of how to use it.
[<img src="https://hackmd.io/_uploads/HypAtjJkp.jpg"/>](https://hackmd.io/_uploads/HypAtjJkp.jpg)

- Change the background to white (bottom left corner)
- Load your VRM (upper left corner)
- Hit the "Select Pose" button (upper left corner)
- Select one of the poses in the popup (lower left)
- Position the avatar in the center of the window using your mouse
- Hit the camera button (upper right corner)
- Configure the resolution and export the image in the popup (hint: camera positioning continues to work in that view)

[<img src="https://hackmd.io/_uploads/ryygioJJa.jpg"/>](https://hackmd.io/_uploads/ryygioJJa.jpg)

Make sure to capture a variety of different poses. The more images you curate into the dataset, the longer the training will take. Go for 1024px files if you want to train an SDXL LoRA, but note that your GPU should have at least 16GB of VRAM to train at that resolution.

[<img src="https://hackmd.io/_uploads/r1V1GBly6.jpg"/>](https://hackmd.io/_uploads/r1V1GBly6.jpg)

### 1.2 VRM Live Viewer (Free)

Download [VRM Live Viewer](https://booth.pm/ja/items/1783082) and extract the files to your computer. Run the exe to launch the software. VRM Live Viewer allows for the configuration and playback of dance choreographies involving avatars, visuals and, most importantly, animations.

![](https://i.imgur.com/Q9yCGtM.png)

The UI can be pretty overwhelming if you are not used to it, but let's go through it step by step.

1. Load your VRM avatar into a 3D world. The app has a couple of predefined worlds and performances that you can launch right away. However, we're going to change the surroundings to white, which ensures that the LoRA doesn't contain background artifacts.

![](https://i.imgur.com/bjv06e8.png)

2. Once loaded, the first choreography will start right away. The default stage is pretty wild when it comes to the VFX, so change the stage to "Plane" at the bottom of the right menu. Set the floor (2nd select) to None/3D and the sky (3rd select) to None/360.
Click the 360 icon right next to that last select box, which opens this popup. Click the box right next to "BackgroundColor" and set the color to #FFFFFF so that we have a completely clean environment, which will help to isolate our avatar in the next steps.

![](https://i.imgur.com/AvuqXov.png)

3. In the top right you can choose between 3 preinstalled dance choreographies. You can also load custom bvh animation files from your computer via the blue folder icon above, but for this guide we'll stick to the 3 dances. These are very long animations, which leaves us enough time to position the camera and take the shots of our avatar.

![](https://i.imgur.com/tTWE21A.png)

4. Hit "tab" on your keyboard to toggle the side menus. Now take screenshots of your avatar, crop them to a square and save them as 512x512 px jpg files on your computer. Every once in a while hit "tab" to display the menu again as you shoot different positions on various stages from all sorts of angles. The more variety you give, the better the AI understands your avatar.

- Hold left mouse button to turn the camera
- Hold right mouse button to move the camera
- Use scroll wheel to zoom in and out

![](https://i.imgur.com/sm4Cyux.png)

When the animation is over, just select another choreography via the dropdown indicated in step 3. The stop + play buttons at the bottom of the right menu are sometimes buggy for me. Camera flights with an Xbox controller work great - give it a try. I've decided to go for 30 images for the full body capture of the Nature avatar.

### 1.3 Automate larger Training Data Creation via Blender

Fellow builder [Howie Duhzit released a Blender plugin](https://twitter.com/HowieDuhzit/status/1693866269911515469) that automates the capture process and lets you create much larger training datasets, which is going to massively improve the results that you'll see in the actual prompting.
The plugin was added to [Duhzit Wit Tools](https://howieduhzit.gumroad.com/l/dwtools), which is a suite of useful Blender tools. You can also find it in the [GitHub repo](https://github.com/HowieDuhzit/Duhzit-Wit-Tools). Make sure to give it a try!

<iframe style="width:100%;display:inline-block;padding:0px" height="420" src="https://www.youtube.com/embed/EW2MAaNhZJA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

### 2. Prepare your Avatar Shots via Stable Diffusion and Kohya

1. Watch this tutorial and install Stable Diffusion quickly and easily within 15 minutes. It is an essential component of the stack that we'll be using here, so there is no way around it.

<iframe style="width:100%;display:inline-block;padding:0px" height="420" src="https://www.youtube.com/embed/onmqbI5XPH8" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

2. Now watch this tutorial on how to install Kohya on your local machine and learn how to prepare the LoRA training data from the images that we've captured in the first step. This tutorial contains everything that you need to know about preprocessing and training. Open it in a new tab and keep it open, as it helps you to understand the next steps.
<iframe style="width:100%;display:inline-block;padding:0px" height="420" src="https://www.youtube.com/embed/70H03cv57-o" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

<iframe style="width:100%;display:inline-block;padding:0px" height="420" src="https://www.youtube.com/embed/N_zhQSx2Q3c" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

3. Now we'll use an AI to automatically analyze the content of the avatar images that we've captured. Create a folder called "processed" inside the folder in which you've stored your training images. Then launch Stable Diffusion (close Kohya first if you still have it running from the installation in step 2) and navigate to "Train > Preprocess images".

![](https://i.imgur.com/FYa9yUd.png)

Insert your folder path into "Source directory" and the path of the newly created processed folder into "Destination directory". Check "Use BLIP for caption" and hit the Preprocess button to start.

4. After the preprocessing step, the folder you use for training should look like this. I've used "NatureVRM" as the character name instead of Nature to ensure that there won't be confusion with the already existing meaning of "nature" when I'm prompting the outputs later on.

![](https://i.imgur.com/4x8Vn3R.png)

5. When you are optimizing the preprocessed txt files, make sure to strip out everything that describes the essence of your character. Any information in the txt is subtracted from the LoRA that will be created during the training. The clearer you are, the fewer artifacts will be in your LoRA. In my case the files originally said "in a costume with flowers on her arm", but I stripped that from the txt, as the roses are an essential part of Nature's visual characteristics and this is not a costume.
Make sure to always copy+paste the name of your LoRA at the beginning of every txt file. You need to do this for every image in the folder.

![](https://i.imgur.com/nZAMmK9.png)

::: info
Txt input: ``NatureVRM a person dancing with arms outstretched, one hand in the air``
:::

6. Now prepare the final folder structure for your LoRA training. As all of this is explained step by step in the LoRA tutorial by Aitrepreneur, I won't explain it here. Your final folder structure before we begin the training should look just like this. The "20_NatureVRM" folder sits inside the image folder. With 15 images or fewer, that number differs (see the LoRA video).

![](https://i.imgur.com/RLKBMdO.png)

All the preprocessed images that we have prepared during the previous steps are located inside image/20_NatureVRM, and the model and log folders are still empty. From here we can start to train the LoRA with the Stable Diffusion model inside Kohya.

![](https://i.imgur.com/c1TPpFu.png)

### 3. Start the LoRA Training inside Kohya

To start the actual training you can now launch Kohya - make sure to close Stable Diffusion in case you still have it open. Also close VRM Live Viewer if it is still open, as you'll need every bit of VRAM from your GPU during the training.

First we'll train the LoRA with SD 1.5, as this ensures compatibility with most models on [CivitAI](https://civitai.com/), most of which were trained with that SD version. You can repeat the training with SDXL so that you also have a LoRA that works with the upcoming models that might be trained with that newer version of Stable Diffusion. The end result of the training is that you'll have two .safetensors files - one for each SD version. For SDXL you will definitely need 1024px training images.

1. Switch to the "Dreambooth LoRA" tab to start. The regular Dreambooth view looks very similar, so double-check that you are in the LoRA tab! Click the "Configuration file" section, which pops out an uploader area.
![](https://i.imgur.com/K4mxw2I.png)

Download the [LoRA settings json file](https://reneil.mypinata.cloud/ipfs/QmZGygM5j89uYkuDLZNKwiFFbwXPY5wik8hftcXD68fbJZ/v15.json) (for SDXL use [this json file](https://reneil.mypinata.cloud/ipfs/QmZGygM5j89uYkuDLZNKwiFFbwXPY5wik8hftcXD68fbJZ/sdxl.json)) and open it in the interface. Hit load to inject all the parameters into Kohya. In source model, hit the white icon next to the folder and select either the base SD 1.5 or SDXL model on your PC. Make sure that SDXL is checked in case you want to train an SDXL model.

2. Navigate to the "Folders" tab, where you overwrite the 3 folder paths (regularisation stays empty) according to the structure that you've prepared in step 2 of this tutorial. Then set "Model output name" according to the folder that you've prepared and add v15 to indicate that this LoRA was trained with SD 1.5 as a base model.

![](https://i.imgur.com/2lVjsKW.png)

Then click train model to start the training process. Depending on your hardware and the number of training images, this is going to take a while. You can go AFK to touch some grass during that process.

![](https://i.imgur.com/eDbQV2r.png)

3. Once this is done you should see multiple safetensors files in the model folder that you've created. The safetensors files with appended numbers are snapshots of the model that were created during the training. The one without numbers - in my case NatureVRM.safetensors - is the final training stage. This file and the later-stage snapshots are most likely overtrained. We'll dig into that later.

![](https://hackmd.io/_uploads/HJYvouSk6.jpg)

Copy+paste all safetensors files into your "stable-diffusion-webui\models\Lora" folder so that you can access them from the web UI of your Stable Diffusion installation.

## Prompt Images with your Avatar LoRA

Launch Stable Diffusion and click the red icon below the Generate button to show your extra networks.
Click the Lora tab and you should now see the LoRAs that you copied into your SD installation. When you click one of those, it will add the lora tag to your prompt text area. You can use this tag in more complex prompts to embed your avatar into your creation.

![](https://i.imgur.com/b4ofkv3.png)

You can download a few models to play around with. Let's start prompting with [Protogen V22 Anime](https://civitai.com/models/3627/protogen-v22-anime-official-release), which was also used to create the images in the intro of this article. You can scroll down the linked model page to get some example prompts that work particularly well for the selected model. Adjust those and don't forget your LoRA tag at the beginning of your prompt.

![](https://hackmd.io/_uploads/HJ20MMLjT.png)

You don't have to keep the LoRA tab open. Click the red icon again to hide it. The anime model is adding a face and haircut to the Nature avatar. To prevent this you can increase the weight of your LoRA by changing the 1 in the angle brackets to 1.3. Just play around with all these things as you prompt your way through all sorts of configurations and models.

## Select the Best LoRA Weight

OK, now you've played around a bit. To get better prompting results we need to identify the best LoRA file from your training results. A LoRA can be either undertrained or overtrained. Undertrained means that it doesn't carry enough information about the character or concept, which leads to unsatisfying results that don't look like the character yet. In contrast, prompting with an overtrained LoRA results in artifacts and a lack of flexibility. You want flexibility, as you want to be able to prompt your character with different models in all sorts of surroundings.

![](https://hackmd.io/_uploads/HyxTCdHk6.png)

Stable Diffusion allows you to plot the same prompt with different parameters into a single overview image that helps you to find the best LoRA.
To do that, write a prompt with `<lora:NatureVRM-000001:0.8>`, which reflects the results of the first training epoch at 80% weight. We can now use an X/Y/Z plot (under Script at the bottom of the page) with "Prompt S/R" (search + replace) values on the X and Y axes. We want to plot all epochs on the X axis and increase the weight on the Y axis. Make sure to check the two boxes and hit Generate.

[<img src="https://hackmd.io/_uploads/SJ3c6OHyp.jpg"/>](https://hackmd.io/_uploads/SJ3c6OHyp.jpg)

Create 5-10 plots with different prompts in different models and select the best epoch for each of them (in this one I'd say epoch 3 wins). You will see a tendency, and once you've found your overall winner, copy that safetensors file and remove the digits from the filename - your LoRA is ready. You can keep the original training files in case you want to revisit your decision later on.

## Bonus: Dig into ControlNet

Let's get a bit advanced and bring in ControlNet. If you don't have that extension installed yet, [watch this video by Aitrepreneur](https://www.youtube.com/watch?v=OxFcIv8Gq8o), and after you've installed and understood ControlNet feel free to follow along. ControlNet allows you to pose your avatar according to the characters in the input image. It's an entirely new rabbit hole to explore, and Aitrepreneur explains how.

![](https://hackmd.io/_uploads/BkuPrfLoT.png)

I hope you learned a couple of things in this guide. You can [follow me on Twitter](https://twitter.com/reneil1337) in case you're interested in these topics and want to be notified about more guides like this.

## Join the Conversation

Drop your questions in [this Reddit thread](https://www.reddit.com/r/StableDiffusion/comments/12bpkqn/guide_convenient_process_to_train_lora_from_your/); I'll try to reply periodically over there.
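
A footnote to the X/Y/Z plot in "Select the Best LoRA Weight": typing the "Prompt S/R" values by hand gets tedious once you have many epochs, so here is a minimal Python sketch that builds the two comma-separated lists for you. It assumes your snapshots follow the `NatureVRM-000001` naming shown in the prompt above. The function names, the epoch count of 5 and the weight range are made up for illustration, so adjust them to your own training run. As far as I know, the first entry of a Prompt S/R list is the search term, so it has to appear literally in your prompt.

```python
from typing import Iterable


def epoch_values(lora_name: str, epochs: int) -> str:
    """Build the X axis list, e.g. 'NatureVRM-000001, NatureVRM-000002'."""
    # Six-digit zero padding mirrors the NatureVRM-000001 tag used in this guide.
    return ", ".join(f"{lora_name}-{epoch:06d}" for epoch in range(1, epochs + 1))


def weight_values(weights: Iterable[float]) -> str:
    """Build the Y axis list with one decimal per weight, e.g. '0.8, 0.9, 1.0'."""
    return ", ".join(f"{weight:.1f}" for weight in weights)


if __name__ == "__main__":
    # Hypothetical run: 5 saved epochs, weights from 0.8 to 1.2.
    print("X values:", epoch_values("NatureVRM", 5))
    print("Y values:", weight_values([0.8, 0.9, 1.0, 1.1, 1.2]))
```

Paste the first list into the X values field and the second into the Y values field, and keep `<lora:NatureVRM-000001:0.8>` in your prompt so that both search terms can be found.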