This guide is meant for beginners who want to try generating their own anime pictures or modifying existing pictures, e.g. KK screenshots. It will guide you through installing a local copy of WebUI and some SD (Stable Diffusion) models.
Word of caution: you will need* a beefy NVIDIA GPU, details below.
Example of what can be accomplished with this guide (% refers to how much the AI was allowed to redraw, roughly speaking).
First, some basics:
You can get basic image generation set up pretty easily, but beware, the rabbit hole is very deep. There are new models and scripts released every day, each possibly an improvement. If you want to have the best available, prepare for a lot of reading and a new hobby. This guide is not that.
Running Stable Diffusion on other ecosystems is possible but limited. Performance will be far worse and some things that require CUDA won't work at all.
NVIDIA provides programmers with an API called Compute Unified Device Architecture (CUDA) which gives access to a big set of general-purpose cores, which act similarly to CPU cores but are a lot faster at parallel computing tasks. Most of today's machine learning ecosystems are built upon CUDA-accelerated libraries and can only leverage their full potential when CUDA is available.
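If you want to check whether your setup can actually use CUDA, a quick test from Python (for example inside the webui's venv once it is installed) looks like this; the device name printed will of course depend on your GPU:

```python
# Quick sanity check that PyTorch can see a CUDA-capable GPU.
# Run this inside the Python environment the webui uses (its "venv" folder).
import torch

print(torch.cuda.is_available())           # True means CUDA acceleration will work
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. "NVIDIA GeForce RTX 3060"
```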
Stable Diffusion used to require large amounts of VRAM, which can only be found in very expensive GPUs such as the RTX 3090/4090 or NVIDIA's workstation cards.
Since the release in September 2022, VRAM usage for inference has improved tremendously and you can easily get away with 16, 10 or even 8 GB of VRAM. Even less VRAM is possible, but then you start running into similar issues as with non-NVIDIA cards.
Only advanced tasks such as training still require bigger amounts of VRAM, but even that has improved by a lot.
Nevertheless, the more VRAM the better, as you can go for higher resolutions and higher batch sizes (how many images are generated simultaneously).
When downloading models, prefer .safetensor over .ckpt files.
Install the webui by running git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git. Afterwards you can auto-update by running the git pull command.
Place the model files (.safetensor/.ckpt files, should be a few GB big) you have in the models\Stable-diffusion directory (it should already exist and contain a "Put Stable Diffusion checkpoints here.txt" file in it). Tip: If you want to use other UIs or programs that use Stable Diffusion, which have their own models folder, you can store your models in a central folder and symlink it to the models folder for each UI. This way you don't have to keep multiple copies of the big checkpoint files.
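As a rough sketch of that symlink idea (the folder paths below are made-up examples, adjust them to wherever you actually keep your models; on Windows you can equally use mklink /D in a command prompt):

```python
# Minimal sketch: link a central model folder into the webui's models directory.
# The paths are examples only - change them to match your own setup.
# On Windows, creating symlinks may require Developer Mode or an elevated prompt.
from pathlib import Path

central = Path(r"D:\SD\models")                                     # where you keep all checkpoints
webui   = Path(r"C:\stable-diffusion-webui\models\Stable-diffusion")

if not webui.is_symlink():
    if webui.exists():
        webui.rmdir()          # only works if the folder is empty - move files out first
    webui.symlink_to(central, target_is_directory=True)
```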
Note: If you have a 16xx series GPU (e.g. GTX 1660), you will most likely get a black screen when generating images. To fix this, edit webui-user.bat and add --precision full --no-half --medvram to the COMMANDLINE_ARGS line (i.e. set COMMANDLINE_ARGS=--precision full --no-half --medvram).
On all cards: if you get black images, try adding --no-half-vae.
Run webui-user.bat (as normal user, not administrator). See if there are any errors in the console window that opened. It should download and install a bunch of other requirements, and once ready say: Running on local URL: http://127.0.0.1:7860. You should now see something like this:
Generated images are automatically saved to the outputs folders. If you use the batch feature, the -grids folders will also contain a copy. Note: If you have issues getting things to start/run properly, check this page for troubleshooting steps.
Write a rough description of what you want in the "Prompt" field. You can use natural language, but for anime models you'll most likely have to write in danbooru tags (e.g. 1girl, solo, dress, long hair, blue eyes). It's very common to prefix your prompts with Masterpiece, best quality, and to put a bunch of bad tags in the negative prompt (e.g. lowres, blurry, bad anatomy, bad hands, worst quality, jpeg artifacts).
Hitting Generate will yield a picture that at least mostly fits the description. Now starts the process of "prompt engineering", which is basically the art of writing prompts in a way that makes the AI generate what you want. A big part of that is emphasis. You can wrap a tag in () to increase its emphasis by a factor of 1.1. If you want more control over the emphasis, use this syntax: (<tag>:<factor>) (e.g. (big eyes:1.2) or (long hair:0.8)). It should be noted that too much emphasis can quickly make things worse than using no emphasis, so don't overdo it. If something does not work out, try prompting differently (use a synonym or describe the feature) and consider negative prompts.
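If you are wondering how the parentheses stack: each extra pair multiplies the attention by another 1.1, while the (tag:factor) form sets the factor directly. A quick back-of-the-envelope check:

```python
# How attention weights stack in the webui's prompt syntax (as described above):
# each ( ) layer multiplies by 1.1; an explicit (tag:1.2) sets the factor directly.
for layers in range(1, 4):
    print(f"{'(' * layers}tag{')' * layers} -> weight {1.1 ** layers:.2f}")
# (tag)     -> weight 1.10
# ((tag))   -> weight 1.21
# (((tag))) -> weight 1.33
```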
Powerful anime models such as Anything v3/4 can generate acceptable images almost every try. But still, expect a lot of trial and error before getting something that really suits you.
Note: You can hover over some of the buttons and labels to see a popup with more info.
Next, there's the "Sampling Steps" setting. It controls how many times the AI refines the image. Euler a with 20-40 steps is a good starting point. Higher values require more time to process but do not increase VRAM usage. 30-40 steps is usually best for most models, and it's generally accepted that anything over 50 is a waste of time and energy. It's worth experimenting with other samplers, but for anime, Euler a and Euler are generally the best.
The default resolution of Stable Diffusion is 512x512, but depending on the model you are using you might get better results with higher resolutions and different aspect ratios. Bigger images will take more time and VRAM though.
The batch options let you generate multiple images at once. Batch Size indicates how many images to generate simultaneously (taking more time and VRAM), while Batch Count refers to how many batches in a row it should generate with the same settings. There really isn't much reason to increase the batch count, as it has no benefit over hitting the generate button again when it finishes.
"CFG Scale" specifies how strongly your prompt should be followed. Too high value might cause disturbing effects, like a barefoot
prompt generating a picture full of detached and mangled feet. Too low scale will result in the AI doing its own thing and not caring too much about what you want. The optimal values seem to be in the 7-13 range.
Seed is the starting point of the image. The AI will try to turn noise made out of this seed into something that fulfills your prompt.
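All of the settings above can also be driven from a script instead of the UI: if you add --api to COMMANDLINE_ARGS, the webui exposes a small HTTP API. The sketch below is only a rough example; the endpoint and field names are my best understanding of that API, and the prompt and values are placeholders.

```python
# Rough sketch of driving txt2img over the webui API (start the UI with --api).
# Field names follow the /sdapi/v1/txt2img endpoint as I understand it; adjust
# the prompt, resolution etc. to taste.
import base64, requests

payload = {
    "prompt": "masterpiece, best quality, 1girl, solo, dress, long hair, blue eyes",
    "negative_prompt": "lowres, blurry, bad anatomy, bad hands, worst quality",
    "sampler_name": "Euler a",
    "steps": 28,
    "cfg_scale": 9,
    "width": 512,
    "height": 512,
    "seed": -1,          # -1 = random seed, same as in the UI
    "batch_size": 1,
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=300)
r.raise_for_status()
with open("api_output.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```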
The AI doesn't actually "draw" a picture like you would. Instead, it tries to remove noise from an image. This approach is called a "Diffusion Model". Those models have proven to work much better than previous more straight-forward approaches. You can read more about it here.
At first, the AI is given your prompt and a random noise texture generated from the Seed. Then, it tries to remove the noise, but going from 0 to finished in one go does not work very well (you can see it for yourself by setting "Sampling Steps" to 1). To get better results, the now-denoised image is given a smaller amount of new noise and fed back into the AI so that it can be denoised again. This is repeated with less and less added noise for the "Sampling Steps" amount of times. Try increasing the steps 1 by 1 to see the increase in quality.
This is an animation showing a sweep of Sampling Steps from 1 to 21. Each picture was given 1 more step. All pictures use the same prompt and seed.
The more steps you have, the more detailed and sharp the image tends to be, but the scaling is logarithmic, which means it gets much better at lower values, but has very little improvement above a certain point (around 20 is optimal for quality vs render time, 30 for better quality).
More steps doesn't mean better content though, especially with complicated things like hands - you can often get better results with fewer steps here. This is caused by the AI "overshooting" its target and trying to denoise the shapes that it already fleshed out to a good degree.
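To make the loop described above a bit more concrete, here is a toy illustration of the "denoise, add a little less noise, repeat" idea. This is emphatically not the real sampler math (real samplers subtract noise predicted by the model according to a schedule); it only shows why more steps converge on a cleaner result and why the gains flatten out.

```python
# Toy version of the iterative refinement loop described above - NOT real
# Stable Diffusion sampling. The "denoiser" here just pulls the image toward a
# fixed target; the point is only to show the effect of the step count.
import numpy as np

rng = np.random.default_rng(42)          # plays the role of the "Seed"
target = rng.random((64, 64))            # stand-in for "what the prompt wants"

def run(steps: int) -> float:
    image = rng.random((64, 64))         # step 0: pure noise
    for step in range(steps):
        image = 0.7 * image + 0.3 * target               # crude "denoising" pull
        fresh_noise = 1.0 - (step + 1) / steps           # add less noise each step
        image += 0.1 * fresh_noise * rng.random((64, 64))
    return float(np.abs(image - target).mean())          # distance from the target

for steps in (1, 5, 20, 50):
    print(steps, "steps ->", round(run(steps), 3))       # improvements flatten out
```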
This mode is similar to txt2img, except instead of random noise the AI is given an existing image and the generation process starts midway-through. It can be used to apply a prompt to an existing image to modify it, or to completely replace parts of the image. It's very useful for tweaking images generated by txt2img.
The point at which the AI starts redrawing your image is controlled by the "Denoising strength". What this does is specify at what step the AI is supposed to start (as opposed to step 0, pure noise).
If for example you select 0.50 Denoising strength, your picture will be given half of the noise and the AI will start at the halfway point, so if you set the Sampling Steps to 20, only 20 * 0.50 = 10 sampling steps will be performed. This means that higher values will take longer to process.
The optimal values for "Denoising strength" are different depending on what you are looking for:
Example of different denoising strengths for a Koikatsu screenshot and the prompt 2d, anime, angry (the actual prompt was more detailed but you get the idea).
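If you prefer scripting, the same img2img settings (source image, prompt, denoising strength) can be sent over the webui API mentioned earlier (requires the --api launch flag). As before, the field names are my best understanding and the file paths are placeholders:

```python
# Rough sketch of img2img over the webui API (start the UI with --api).
# "my_screenshot.png" is a placeholder - point it at your own Koikatsu screenshot.
import base64, requests

with open("my_screenshot.png", "rb") as f:
    source = base64.b64encode(f.read()).decode()

payload = {
    "init_images": [source],
    "prompt": "masterpiece, best quality, 2d, anime, 1girl",
    "negative_prompt": "lowres, blurry, bad anatomy, worst quality",
    "denoising_strength": 0.4,   # how much the AI may redraw, as explained above
    "steps": 30,                 # at 0.4 strength roughly 30 * 0.4 = 12 steps actually run
    "cfg_scale": 9,
    "width": 512,
    "height": 768,               # match the source aspect ratio where possible
    "resize_mode": 0,            # 0 should correspond to "Just resize"
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=300)
r.raise_for_status()
with open("img2img_output.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```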
Image size (Width/Height) should be close to or preferably in the same aspect ratio as the input image, or the image will be stretched to fit and the output might become deformed. Either adjust the long edge to fit, or turn on something other than the "Just resize" option:
You can make the AI redraw only a small part of an image by using the "Inpaint" mode. You have to draw over the area you want to redraw with your cursor. The options work mostly the same as in img2img mode.
This mode can be used to fix places that the AI messed up, e.g. to remove extra limbs, change how joints bend, or do targeted changes like changing only the hair color.
To use it you can draw a mask on your image directly in the webui or upload a mask (a black-and-white image with black for the masked area). When in inpaint mode there are a few additional sliders and settings:
This will take any image and attempt to upscale it better than a simple resize (similarly to waifu2x, but it's not exactly the same).
All upscalers other than Lanczos use neural networks, which will have to be downloaded during the first time you use them. This process is fully automated, you only need to wait a bit for the download to finish. You can see the download progress in the console window.
These networks are easier and faster to run than the SD Upscale, but also less powerful.
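If you want to batch-upscale from a script, the Extras upscalers can also be reached over the webui API (with --api). Treat this as a sketch: the endpoint and field names are my best understanding, and the upscaler name must match one of the entries in your dropdown.

```python
# Rough sketch of the Extras upscaler over the webui API (start with --api).
# "render.png" is a placeholder path; pick any upscaler installed in your UI.
import base64, requests

with open("render.png", "rb") as f:
    source = base64.b64encode(f.read()).decode()

payload = {
    "image": source,
    "upscaling_resize": 2,                     # 2x the original resolution
    "upscaler_1": "R-ESRGAN 4x+ Anime6B",      # example name - must exist in the dropdown
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image", json=payload, timeout=300)
r.raise_for_status()
with open("render_2x.png", "wb") as f:
    f.write(base64.b64decode(r.json()["image"]))
```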
This will not only upscale, but can also help with "fixing" some weirdness in the image, because it will do the same denoising as normal img2img.
You usually want to keep the image mostly the way it is, so using a denoising strength of 0.2 to 0.3 is highly recommended. With very low denoising values it will only do a few steps, therefore I recommend bumping up the steps to at least 35-45.
Because of the way the SD upscale script works, you should always leave the batch size at 1, otherwise you'll just waste time and energy.
You can see what settings and prompt an image was generated with in the "PNG Info" tab. The image file must be unmodified after it was generated, or the metadata might be lost.
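If you ever want to read that metadata outside the webui, the parameters are stored as a text chunk inside the PNG, so a couple of lines of Pillow are enough (the file name below is just an example):

```python
# Read the generation parameters the webui embeds in its PNGs.
# "00001-1234567890.png" is a placeholder - use one of your own outputs.
from PIL import Image

img = Image.open("00001-1234567890.png")
print(img.info.get("parameters", "no generation metadata found"))
# Prints the prompt, negative prompt, steps, sampler, CFG scale, seed, etc.
```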
The rest of the features are more advanced and require separate guides to use optimally. The rabbit hole goes deep. Very deep.
I want you to redraw this picture with legs, feet… no, no, not like this, the opposite… I didn't mean it that way!
Usually you want to start with some simple set of generic tags like 2d, high quality, highly detailed for the prompt and 3d, low quality, watermark for the negative prompt. Add more specific tags as necessary depending on what output you get.
In img2img, generally speaking, the higher the denoising strength is, the better the description has to be or you'll start losing important features of your image. If you want your image to look anything like the source, it's not uncommon to hit the 75 token limit when going with more than 0.3 denoising (in the latest update to Automatic1111's WebUI this limit was apparently increased).
"Prompt Engineering" refers to the process of finding and combining tags that make the AI work better to get the best output possible. You can read about it in detail here and here.
Please note that the following is entirely based on my (Njaecha's) experience and may only apply to the WD 1.3 Float 32 EMA Pruned model!
Here's a collection of useful tags with preview pictures.
One thing you will certainly notice when playing around with txt2img is the AI's bias. Certain tags will often bring concepts or things with them that don't necessarily relate to the tag itself. The cute tag, for example, in combination with girl will often generate very young looking characters. To prevent the AI from doing that you can write the bias into the negative prompt or write the opposite as an additional tag into the main prompt. You may want to write a cute girl with a mature body or put young in the negative prompt, for example.
Secondly, here are some settings I can recommend for starting out with a new prompt:
Last but not least, a few tips for writing prompts:
- focus on upper body to get fewer chopped-off heads while keeping the upper body
- full body image if you want legs and torso, especially good for standing poses
- worm's eye view or bird's eye view for low or high camera angles
- instead of catgirl use girl with cat ears and a tail
- for hairstyles, try ponytail, long hair or over the eye bangs
- describe the clothing, e.g. wearing a red dress, white leotard or grey hoodie and adidas leggings
- fantasy for anything with armor or medieval weapons
- cyberpunk for futuristic stuff like cyborgs or androids
- classroom or at school for a school setting
- in a forest or mountains in background for fantasy
- at the beach or in a river for something with swimsuits
- fist, open hand or similar to improve how hands are drawn
- a simple background (or even a solid white background) will improve results
First of all, there is a really useful button in the img2img mode: "Interrogate". When you click it, the AI will have a look at your source image and try to describe it. It does that in a way that is easy for it to understand, so you can take that as a reference when writing your own prompt. I usually let the AI interrogate my image once and then change the prompt to better fit it. It will often misunderstand certain parts or find things that are not in your image at all.
The interrogate button (marked in yellow). Image on the right is the source image again, so that you can see it better.
When interrogating this image the AI returned "a girl with a sword and a cat on her shoulder is posing for a picture with her cat ears, by Toei Animations", which is obviously not quite what the image shows. I would change this to something like "a girl with red hair and cat ears is holding a sword and is doing a defensive pose in front of the camera, pink top, blue skirt, focus on upper body".
Fun fact: almost every Koikatsu image will be interrogated as "by Toei Animations" because that's more or less the only "artist" that Google's BLIP model (which is used for this feature) knows for Koikatsu's style. Sometimes it will also say by sailor moon though.
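If you find yourself interrogating lots of images, the same thing can be done over the webui API (with the --api launch flag). The endpoint and field names below are my best understanding; "clip" gives the BLIP-style caption used by the Interrogate button, and depending on your install "deepdanbooru" may be available for tag-style output.

```python
# Sketch: ask the webui to interrogate an image over the API (launch with --api).
# "my_screenshot.png" is a placeholder path.
import base64, requests

with open("my_screenshot.png", "rb") as f:
    source = base64.b64encode(f.read()).decode()

r = requests.post(
    "http://127.0.0.1:7860/sdapi/v1/interrogate",
    json={"image": source, "model": "clip"},   # try "deepdanbooru" for danbooru-style tags
    timeout=120,
)
r.raise_for_status()
print(r.json()["caption"])                      # e.g. "a girl with a sword ..."
```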
In the screenshot above you can also see my recommended base settings for img2img with a Koikatsu source image:
After you have run with those base settings, you can adjust them:
In case you roll a really good image but there is this one thing bothering you, instead of going into inpaint to try to fix it, you can also copy the seed, change the settings slightly and regenerate. Stable Diffusion is a "frozen" model by default, so generating with the same settings on the same seed will result in the same image.
In img2img it is especially useful to describe the clothing your character is wearing. The color will usually stay the same, but the type of clothing might differ heavily from the source image if you don't.
While the AI is impressively good at understanding the images, there might be parts of the source image that look unnatural (for example skin clipping through clothing). This can confuse the AI and make it try to generate some kind of object from it, which we don't want. A quick and easy solution is to hop into Photoshop and simply edit those things away. It doesn't have to be a good edit, just enough that Waifu Diffusion won't get confused.
All in all, Photoshop (or GIMP) is very useful for removing any small mistakes the AI made. Or you can combine two or more good images to get one great image, for example by taking the face from image A and the body from image B.
Furthermore, most things I said in the txt2img section also apply to img2img. If you skipped to this part right away consider giving it a read.
Automatic1111's webui has support for extensions now, and there is a very useful extension for tagging called stable-diffusion-webui-wd14-tagger. It can analyse any image using an image recognition AI called deepdanbooru that will basically tag the image for you. You can then just copy-paste these tags to be used with Waifu Diffusion (remember: WD is trained on Danbooru). Note: This also works quite well for NovelAI, they seem to use a similar tagging system.
Installing it is quite simple:
The "WD 1.4 Tagger" extension is towards the bottom of the list.
To use the extension open the "Tagger" tab and choose either "wd14" or "deepdanbooru" from the Interrogator dropdown. If the dropdown is empty you did not install the additional models correctly. Read the above and make sure you put the downloaded files in the correct folders.
Then just choose or drag'n'drop any image as the Source and it will spit out a bunch of tags in the top right. I recommend setting the Threshold slider to something above 0.5 so that it only spits out tags with a confidence score of more than 50%.
wd14 and deepdanbooru "find" different tags, so it's worth trying both and looking at the differences and the confidence ratings.
Now you can copy the tags to txt2img or img2img, or use them as inspiration for which tags to put in a prompt of your own. Depending on how many tags you use and how confident the interrogation was, you can generate images that are quite similar to the one you entered.
"Artists" (the by [artist name]
or in the style of [name of work]
tags) are bascially a way to tell the AI what style it is supposed to mimic. If you ask it to generate a Picture of Hatsune Miku in the style of HR Giger
for example you can get some really freaky results:
As Waifu Diffusion is trained on Danbooru, you can try some of your favourite doujin artists, but often the amount of images in the training data is too small for it to "know" them. As a rule of thumb, the more famous an artist is (on a global scale), the higher the chance that WD knows their style.
I personally don't use artists for anime images and Koikatsu img2img as it's not really necessary, but if your source image already has some kind of style you might want to specify it. If you made a Jojo character in Koikatsu for example, writing in the style of Jojo's bizarre adventures is probably a nice addition. It's also a lot of fun to try out what your character would look like in certain styles:
original image [...] in the style of dragon ball (here I had to use a denoising strength of 0.5 because I wanted the image to change a lot).