# Basics of using StableDiffusion, or "How to make your Koikatsu pictures look arguably better"

This guide is meant for beginners who want to try generating their own anime pictures or modifying existing pictures, e.g. KK screenshots. It will guide you through installing a local copy of WebUI and some SD (Stable Diffusion) models.

**Word of caution: you will need\* a beefy NVIDIA GPU, details below.**

> ![Preview of a KK screenshot being redrawn in anime style](https://i.imgur.com/1Wbd6aA.png)
> Example of what can be accomplished with this guide (% refers to how much the AI was allowed to redraw, roughly speaking).

First, some basics:
- SD / Stable Diffusion is a diffusion model.
- Diffusion models are generative models, meaning that they are used to generate data similar to the data on which they were trained.
- WD / Waifu Diffusion is a custom version of SD that was fine-tuned on images from Danbooru, making it much better at generating anime-style pictures. If you want to generate realistic pictures, use SD instead.
- "Model" or "model file" usually refers to ckpt/checkpoint files that contain weights, which are basically the AI brain's neurons.
- WebUI - a browser interface for interacting with models. If you've ever generated a picture online or used waifu2x, this is it.

You can get basic image generation set up pretty easily, but beware, the rabbit hole is very deep. There are new models and scripts released every day, each possibly an improvement. If you want to have the best available, prepare for a lot of reading and a new hobby. This guide is not that.

## Requirements

- A PC running Windows 10/11 *(or Linux or macOS, see [Alternatives](https://hackmd.io/k4COnMKpRVOZjYImI270dQ?both#Alternatives))*
- A GPU with at least 6 GB of VRAM. While it is *possible* to run Stable Diffusion on AMD or Intel GPUs, or even on the CPU/APU (for Mac), an **NVIDIA card is highly recommended** and even required for certain things like some extensions.

### Alternatives

Running Stable Diffusion on other ecosystems is possible but limited. Performance will be far worse and some things that require CUDA won't work at all.

- If you're on Linux or macOS with Apple Silicon, read [this](https://github.com/AUTOMATIC1111/stable-diffusion-webui#automatic-installation-on-linux).
- If you have an AMD GPU, read [this](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs).
- If you have an Intel Arc GPU, refer to [this](https://medium.com/intel-analytics-software/stable-diffusion-with-intel-arc-gpus-f2986bba8365) *(untested, seems very complicated)*

### Why NVIDIA?

NVIDIA provides programmers with an API called *Compute Unified Device Architecture* (CUDA), which gives access to a big set of general-purpose cores that act similarly to CPU cores but are a lot faster at parallel computing tasks. Most of today's machine learning ecosystems are built upon CUDA-accelerated libraries and can only reach their full potential when CUDA is available.

### About VRAM

Stable Diffusion *used* to require large amounts of VRAM, which can only be found in very expensive GPUs such as the RTX 3090/4090 or NVIDIA's workstation cards. Since the release in September 2022, VRAM usage for inference has improved tremendously and you can **easily get away with 16, 10 or even 8 GB of VRAM.** Even less VRAM is possible, but then you start to run into similar issues as with non-NVIDIA cards. Only advanced tasks such as training still require larger amounts of VRAM, but even that has improved by a lot.
Nevertheless, the more VRAM the better, as you can go for higher resolutions and higher batch sizes (how many images are generated simultaneously).
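If you are unsure which GPU PyTorch will end up using, or how much VRAM it actually has, a quick check from Python can save some head-scratching later. This is only an optional sketch and assumes the `torch` package is available (the WebUI normally installs its own copy into a `venv` folder during the first launch):

```python
# Minimal sketch: check whether PyTorch can see a CUDA-capable GPU and how much VRAM it has.
# Assumes the "torch" package is installed (the WebUI normally installs it into its own venv).
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device found - generation will be very slow or may not work at all.")
```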
## What to download

- Python 3.10.6 from https://www.python.org/downloads/windows/ (get the "Windows installer (64-bit)")
  - The currently newest version of xformers requires Python 3.10.9, which is also fine for most tasks, but is apparently slow for training.
  - Python 3.11 is **not** recommended.
- Latest version of Git from https://git-scm.com/download/win (get the "64-bit Git for Windows Setup")
- At least one Stable Diffusion model to use for inference. Models can be found and downloaded on [Civitai](https://civitai.com/) or [Huggingface](https://huggingface.co/models?other=stable-diffusion). Please look for models marked as **Checkpoint** (**not** Lora, Hypernetwork or Textual Inversion). If possible you should always go for `.safetensors` and **not** `.ckpt` files.

## Installing WebUI and models

1. Install Python 3.10.6. **Make sure to check "Add Python to PATH" during installation.**
2. Install Git. Make sure to install shell/path integration.
3. Download the stable-diffusion-webui repository by creating an empty folder, opening a command line in it (shift+right click in the folder), and running `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git`. Afterwards you can update by running the `git pull` command.
4. Place any checkpoint files (`.safetensors`/`.ckpt` files, should be a few GB in size) you have in the `models\Stable-diffusion` directory (it should already exist and contain a "Put Stable Diffusion checkpoints here.txt" file).

**Tip:** If you want to use other UIs or programs that use Stable Diffusion and have their own models folder, you can store your models in a central folder and **symlink** it to the models folder of each UI. This way you don't have to keep multiple copies of the big checkpoint files.

**Note:** If you have a **16xx series GPU (e.g. GTX 1660)**, you will most likely get a black screen when generating images. To fix this, edit `webui-user.bat` and add `--precision full --no-half --medvram` to `COMMANDLINE_ARGS`.

**On all cards:** If you get black images, try adding `--no-half-vae`.
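Before the first launch, it can be worth quickly confirming that the prerequisites are in place. The following is just a small convenience sketch (not part of the official setup) that verifies the Python version and that Git is on your PATH:

```python
# Small sanity-check sketch: confirm the Python version and that Git is reachable.
# Not part of the official setup - just a convenience before running webui-user.bat.
import shutil
import sys

major, minor = sys.version_info[:2]
if (major, minor) != (3, 10):
    print(f"Warning: running Python {major}.{minor}, but the WebUI expects 3.10.x")
else:
    print(f"Python {sys.version.split()[0]} looks fine")

if shutil.which("git"):
    print("Git found on PATH")
else:
    print("Git not found - reinstall it and make sure path integration is enabled")
```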
## How to launch and use the WebUI

1. After you've installed everything, double click `webui-user.bat` (as a normal user, not administrator). See if there are any errors in the console window that opened. It should download and install a bunch of other requirements, and once ready say: `Running on local URL: http://127.0.0.1:7860`.
2. Open your web browser and connect to [http://127.0.0.1:7860](http://localhost:7860). You should now see something like this: ![WebUI preview](https://i.imgur.com/xD0JD7c.png)
3. Type something in the "Prompt" field and click the "Generate" button. A few extra things might need to be downloaded the first time, check the console window for progress (don't close it until you're done using the app!). Eventually a picture should appear in the preview.
4. All generated pictures are saved in the `outputs` folders. If you use the batch feature the `-grids` folders will also contain a copy.

**Note:** If you have issues getting things to start or run properly, check [this page](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Troubleshooting) for troubleshooting steps.

### Generating new images from text (txt2img)

#### Prompting

Write a rough description of what you want in the "Prompt" field.
You can use natural language, but for anime models you'll most likely have to write in Danbooru tags (e.g. `1girl, solo, dress, long hair, blue eyes`). It's very common to prefix your prompts with `Masterpiece, best quality,` and put a bunch of *bad tags* in the negative prompt (e.g. `lowres, blurry, bad anatomy, bad hands, worst quality, jpeg artifacts`).

Generate will yield a picture that at least mostly fits the description. Now starts the process of "prompt engineering", which is basically the art of writing prompts in a way that makes the AI generate what you want. A big part of that is **emphasis**. You can use `()` to increase emphasis by a factor of 1.1. If you want more control over the emphasis, use this syntax: `(<tag>:<factor>)` (e.g. `(big eyes:1.2)` or `(long hair:0.8)`). It should be noted that too much emphasis can quickly make things worse than using no emphasis, so don't overdo it. If something does not work out, try prompting differently (use a synonym or describe the feature) and consider negative prompts.

Powerful anime models such as Anything v3/4 can generate acceptable images on almost every try. But still, expect a lot of trial and error before getting something that really suits you.

**Note:** You can hover over some of the buttons and labels to see a popup with more info.

#### Sampler Settings

Next, there's the "Sampling Steps" section. It controls how many times the AI refines the image. `Euler a` and 20-40 steps is optimal for a start. Higher values require more time to process, but do not increase the VRAM usage. 30-40 steps is usually the best for most models; it's generally accepted that anything over 50 is a waste of time and energy. It's worth experimenting with other samplers, but for anime `Euler a` and `Euler` are generally the best.

#### Resolution

The default resolution of Stable Diffusion is 512x512, but depending on the model you are using you might get better results with a higher resolution and different aspect ratios. Bigger images will take more time and VRAM though.

#### Batch

The batch options let you generate multiple images at once. `Batch Size` indicates how many images to generate simultaneously (taking more time and VRAM), while `Batch Count` refers to how many batches in a row it should generate with the same settings. There really isn't any reason to increase the batch count, as there are no benefits over hitting the Generate button again when it finishes.

#### Further Settings

"CFG Scale" specifies how strongly your prompt should be followed. Too high a value might cause disturbing effects, like a `barefoot` prompt generating a picture full of detached and mangled feet. Too low a value will result in the AI doing its own thing and not caring too much about what you want. The optimal values seem to be in the 7-13 range.

Seed is the starting point of the image. The AI will try to turn noise made out of this seed into something that fulfills your prompt.
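All of the settings above are just plain parameters under the hood. Purely as an illustration (you don't need this for normal use), the WebUI can expose a small HTTP API if you add `--api` to `COMMANDLINE_ARGS` in `webui-user.bat`. The sketch below assumes that flag and the default address, and shows how the prompt, negative prompt, steps, CFG scale, seed, resolution and batch size map onto a single request:

```python
# Illustrative sketch only: calling the local WebUI's optional HTTP API.
# Assumes the WebUI was launched with --api added to COMMANDLINE_ARGS and is
# listening on the default http://127.0.0.1:7860.
import base64
import requests

payload = {
    "prompt": "masterpiece, best quality, 1girl, solo, dress, long hair, blue eyes",
    "negative_prompt": "lowres, blurry, bad anatomy, bad hands, worst quality, jpeg artifacts",
    "steps": 30,       # "Sampling Steps"
    "cfg_scale": 10,   # "CFG Scale"
    "seed": -1,        # -1 = random seed
    "width": 512,
    "height": 512,
    "batch_size": 1,
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()

# The API returns generated images as base64-encoded PNGs.
for i, img_b64 in enumerate(r.json()["images"]):
    with open(f"api_result_{i}.png", "wb") as f:
        f.write(base64.b64decode(img_b64))
```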
### Interlude - How the AI works under the hood

<details>
<summary>Open here</summary>

To get the best results and make sense of the settings, it's best to get at least a basic understanding of how the AI actually works.

The AI doesn't actually "draw" a picture like you would. Instead, it tries to remove noise from an image. This approach is called a "diffusion model". Such models have proven to work much better than previous, more straightforward approaches. You can read more about them [here](https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction/).

At first, the AI is given your prompt and a random noise texture generated from the Seed. Then it tries to remove the noise, but going from 0 to finished in one go does not work very well (you can see this for yourself by setting "Sampling Steps" to 1). To get better results, the now-denoised image is given a smaller amount of new noise and fed back into the AI so that it can be denoised again. This is repeated with less and less added noise for the "Sampling Steps" number of times. Try increasing the steps 1 by 1 to see the increase in quality.

> | ![Sweep of Sampling Steps for the same seed](https://i.imgur.com/S3Fhj8W.gif) | ![Final image at 21 steps](https://i.imgur.com/KD2t4GL.png) |
> | -------- | -------- |
>
> This is an animation showing a sweep of Sampling Steps from 1 to 21. Each picture was given 1 more step. All pictures use the same prompt and seed.

The more steps you have, the more detailed and sharp the image tends to be, but the scaling is logarithmic, which means it gets much better at lower values but shows very little improvement above a certain point (around 20 is optimal for quality vs render time, 30 for better quality). More steps don't always mean better content though, especially with complicated things like hands - you can often get better results with fewer steps here. This is caused by the AI "overshooting" its target and trying to denoise shapes that it already fleshed out to a good degree.

</details>
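To make the loop described above a bit more tangible, here is a deliberately simplified toy sketch of the sampling idea. The `predict_denoised` function is only a stand-in for the real neural network (which works in a learned latent space and is far more involved); the point is just to show the repeated "denoise a bit, add back a bit less noise" structure that the Sampling Steps setting controls:

```python
# Toy illustration of the iterative denoising loop - NOT the real sampler.
# The real model predicts noise in a learned latent space; here a stand-in
# function just nudges the image toward a fixed target so the loop structure
# is visible.
import numpy as np

rng = np.random.default_rng(seed=1234)          # the "Seed" setting
target = rng.random((64, 64, 3))                # stand-in for "what the prompt describes"
image = rng.random((64, 64, 3))                 # pure noise generated from the seed

def predict_denoised(img: np.ndarray) -> np.ndarray:
    """Stand-in for the neural network: move the image toward the target."""
    return img + 0.5 * (target - img)

sampling_steps = 20
for step in range(sampling_steps):
    image = predict_denoised(image)
    # Re-noise a little, less and less as the steps progress.
    noise_level = 1.0 - (step + 1) / sampling_steps
    image += noise_level * 0.1 * rng.standard_normal(image.shape)

print("difference from target:", float(np.abs(image - target).mean()))
```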
### Redrawing existing images (img2img)

This mode is similar to txt2img, except that instead of random noise the AI is given an existing image and the generation process starts midway through. It can be used to apply a prompt to an existing image to modify it, or to completely replace parts of the image. It's very useful for tweaking images generated by txt2img.

The point at which the AI starts redrawing your image is controlled by the "Denoising strength". It specifies at what step the AI is supposed to start (as opposed to step 0 - pure noise).
- 0 = finished image - you will be given back your image without changes.
- 1 = random noise - your image will be noised so much it won't even matter.

If, for example, you select a denoising strength of 0.50, your picture will be given half of the noise and the AI will start at the halfway point, so if you set the Sampling Steps to 20, 20 * 0.50 = 10 sampling steps will be performed. This means that higher values will take longer to process.

The optimal values for "Denoising strength" differ depending on what you are looking for:
- If you want the AI to keep most of the image intact and only touch it up, keep it around 0.1 (e.g. fixing up linework, antialiasing).
- If you want the AI to keep all of the important parts of the image while changing its style (e.g. make a KK screenshot look like 2D art) or modify small elements (e.g. change the character's expression), values in the 0.15 - 0.35 range work well.
- If you want the AI to make major changes to the image while roughly keeping the subject, use values around 0.50.

> ![Preview of different denoising strengths](https://i.imgur.com/LCHjkEG.png)
> Example of different denoising strengths for a Koikatsu screenshot and prompt `2d, anime, angry` (the actual prompt was more detailed but you get the idea).

Image size (Width/Height) should be close to, or preferably in the same aspect ratio as, the input image, or the image will be stretched to fit and the output might become deformed.
Either adjust the long edge to fit, or turn on something other than the "Just resize" option:
- "Crop and resize" will crop the input image to fit. A preview will be shown once you start dragging the size sliders.
- "Resize and fill" will scale the image down to fit inside your target size and let the AI fill in the empty parts of the image (works better at higher denoising strengths and step counts).
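The relationship between Sampling Steps and Denoising strength described above is simple enough to write down; this is just the guide's own 20 * 0.50 = 10 example expressed as a tiny helper (the exact rounding inside the WebUI may differ slightly):

```python
# How many denoising steps img2img actually performs, per the rule of thumb above.
# The WebUI's internal rounding may differ slightly; this only mirrors the example.
def effective_steps(sampling_steps: int, denoising_strength: float) -> int:
    return round(sampling_steps * denoising_strength)

print(effective_steps(20, 0.50))  # 10 - the example from the text
print(effective_steps(40, 0.25))  # 10 - why low strengths benefit from more steps
```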
### Redrawing only some areas of an image (Inpaint)

You can make the AI redraw only a small part of an image by using the "Inpaint" mode. You have to draw over the area you want to redraw with your cursor. The options work mostly the same as in img2img mode. This mode can be used to fix places that the AI messed up, e.g. to remove extra limbs, change how joints bend, or do targeted changes like changing only the hair color. To use it you can draw a mask on your image directly in the WebUI or upload a mask (a black-and-white image with black for the masked area).

When in inpaint mode there are a few additional sliders and settings:
- **Mask blur**: specifies how "hard" the mask's edge is. At a value of 0 it will be perfectly sharp and the newly generated content will probably not fit well into the rest of the image. For smaller masks use smaller values, but I can generally recommend values between 4 and 10.
- **Inpaint masked or not masked**: which part of the image the AI should change (useful when you want to change everything except the face, for example).
- **Masked Content**: the content with which the mask will be filled and which the AI will try to denoise. There are 4 options:
  - *fill*: fills the mask with flat colors, not really useful in my experience.
  - *original*: uses the original image under the mask as a starting point for denoising, basically the same as img2img.
  - *latent noise*: fills the mask with latent noise, use this if you want to inpaint new content.
  - *latent nothing*: fills the mask with "latent zeros" (the latent space is basically a "step" in the process of denoising), also not very useful.
- **Inpaint at full resolution**: this will upscale the masked part to the resolution you selected with the resolution sliders and afterwards downscale it back into the image. Use this if you have a very small mask, for example when inpainting a face that is far away from the camera.

### Upscaling

#### You can upscale your image in the "Extras" tab

This will take any image and attempt to upscale it better than a simple resize (similarly to waifu2x, but it's not exactly the same).
- Lanczos is the fastest but also the worst (it's what most image editors come with).
- SwinIR 4x and BSRGAN 4x seem to be the best for anime-style pictures.
- ScuNET GAN seems to be the best for realistic pictures.

All upscalers other than Lanczos use neural networks, which will have to be downloaded the first time you use them. This process is fully automated, you only need to wait a bit for the download to finish. You can see the download progress in the console window. These networks are easier and faster to run than SD Upscale, but also less powerful.

#### You can also use "Stable Diffusion Upscale" (SD Upscale) in img2img, via the "Script" dropdown menu

This will not only upscale, but can also help with "fixing" some weirdness in the image, because it does the same denoising as normal img2img. You usually want to keep the image mostly the way it is, so a denoising strength of 0.2 to 0.3 is highly recommended.
With very low denoising values it will only do a few steps, so I recommend bumping the steps up to at least 35-45. Because of the way the SD Upscale script works, you should always leave the batch size at 1, otherwise you'll just waste time and energy.

### Other functions

You can see what settings and prompt an image was generated with in the "PNG Info" tab. The image file must be unmodified after it was generated, or the metadata might be lost.

The rest of the features are more advanced and require separate guides to use optimally. The rabbit hole goes deep. Very deep :wink:
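The "PNG Info" tab works because the WebUI stores the generation parameters in the PNG's text metadata; any editor that re-saves the file without those text chunks strips them, which is why modified files show nothing. If you'd rather inspect them from a script, a small Pillow sketch like the one below (with an illustrative file path, and reading whatever text chunks are present rather than assuming a specific key) does the same thing:

```python
# Sketch: read the text metadata that the WebUI embeds in generated PNGs.
# Requires Pillow (pip install Pillow). If the image was re-saved by another
# program, these chunks may be gone - which is why "PNG Info" shows nothing then.
from PIL import Image

img = Image.open("outputs/txt2img-images/example.png")  # hypothetical path
for key, value in img.info.items():
    if isinstance(value, str):
        print(f"{key}:\n{value}\n")
```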
## How to write prompts (prompt engineering)

<details>
<summary>There are a lot of way better resources for this on the internet, but you can still read it if you want.</summary>

Writing prompts can be considered an art on its own. It's basically like telling a genie what you wish for, only to have your wish technically fulfilled, but in the most horrifying way possible.

> ![Examples of bad generations](https://i.imgur.com/el1p2bb.png)
> I want you to redraw this picture with `legs, feet`... no, no not like this, the opposite... I didn't mean it that way!

Usually you want to start with a simple set of generic tags like `2d, high quality, highly detailed` for the prompt and `3d, low quality, watermark` for the negative prompt. Add more specific tags as necessary depending on what output you get.

In img2img, generally speaking, the higher the denoising strength is, the better the description has to be, or you'll start losing important features of your image. If you want your image to look anything like the source, it's not uncommon to hit the 75-token limit when going with more than 0.3 denoising (in the latest update to Automatic1111's WebUI this limit was apparently increased).

"Prompt engineering" refers to the process of finding and combining tags that make the AI work better to get the best output possible. You can read about it in detail [here](https://stablediffusion.fr/prompt) and [here](https://strikingloo.github.io/stable-diffusion-vs-dalle-2).

</details>

## Tips and Tricks

<details>
<summary>This section is not really up to date anymore, but you can still read it if you want.</summary>

*Please note that the following is entirely based on my (Njaecha's) experience and may only apply to the `WD 1.3 Float 32 EMA Pruned` model!*

[Here](https://rentry.org/faces-faces-faces)'s a collection of useful tags with preview pictures.

### Tips for txt2img

One thing you will certainly notice when playing around with txt2img is the AI's bias. Certain tags will often bring concepts or things with them that don't necessarily relate to the tag itself. The `cute` tag, for example, in combination with `girl` will often generate very young looking characters. To prevent the AI from doing that, you can write the bias into the negative prompt or write the opposite as an additional tag into the main prompt. You may want to write `a cute girl with a mature body` or put `young` in the negative prompt, for example.

Secondly, here are some settings I can recommend for starting out with a new prompt:
- **Steps: 35**
- **CFG scale: 10**
- **Width and height:**
  - 512x512 if you don't have a certain image in mind, you will get a lot of chopped heads though
  - 512x704 if you want to generate a portrait or full body image of *one* character
  - 704x512 if you want to generate an image with a group of people or a landscape

Last but not least, a few tips for writing prompts:
- It is often helpful to specify where the camera should be aimed when generating. You can...
  - use `focus on upper body` to get fewer chopped heads while keeping the upper body
  - use `full body image` if you want legs and torso, especially good for standing poses
  - tag certain perspectives like `worm's eye view` or `bird's eye view`
  - ...
- Specify features of a character instead of using a collective name:
  - instead of `catgirl` use `girl with cat ears and a tail`
  - instead of the name of a hairstyle, describe it with tags like `ponytail`, `long hair` or `over the eye bangs`
- Describe the clothing you want your characters to wear, colors work really well here. You may even use certain iconic brands.
  - e.g. `wearing a red dress`, `white leotard` or `grey hoodie and adidas leggings`
- It's often useful to mention a style or genre for the image
  - for example `fantasy` for anything with armor or medieval weapons
  - or `cyberpunk` for futuristic stuff like cyborgs or androids
- Try also mentioning the background of your image to give it an overall style
  - use `classroom` or `at school` for a school setting
  - use `in a forest` or `mountains in background` for fantasy
  - use `at the beach` or `in a river` for something with swimsuits
  - ...
- If some part of the image is often messed up, it might be because the AI can't pick between all of the different options. In that case you might be able to fix it by adding a tag related to said feature.
  - use `fist`, `open hand` or similar to improve how hands are drawn
  - specify any background at all, even just `simple background` will improve results
  - specify colors and textures of clothes and such, e.g. `solid white background`

### Tips for img2img (with Koikatsu images)

First of all, there is a really useful button in the img2img mode: "Interrogate". When you click it, Waifu Diffusion will have a look at your source image and try to describe it. It does that in a way that is easy for it to understand, so you can take that as a reference when writing your own prompt. I usually let the AI interrogate my image once and then change the prompt to better fit it. It will often misunderstand certain parts or find things that are not in your image at all.

> ![Screenshot of the web UI with the interrogate button marked and an example image](https://i.imgur.com/5tBgwRj.png)
> The interrogate button (marked in yellow). The image on the right is the source image again, so that you can see it better.

When interrogating this :arrow_heading_up: image the AI returned `"a girl with a sword and a cat on her shoulder is posing for a picture with her cat ears, by Toei Animations"`, which is obviously not quite what the image shows. I would change this to something like `"a girl with red hair and cat ears is holding a sword and is doing a defensive pose in front of the camera, pink top, blue skirt, focus on upper body"`.

*Fun fact: almost every Koikatsu image will be interrogated as "by Toei Animations" because that's more or less the only "artist" that Google's BLIP model (which is used for this feature) knows for Koikatsu's style.
Sometimes it will also say `by sailor moon` though.*

In the screenshot above you can also see my recommended base settings for img2img with a Koikatsu source image:
- **Steps: 35**
- **Width and height matched to the source image** (512x896 for a 9:16 ratio in this case)
- **CFG Scale: 10**
- **Denoising Strength: 0.25**
- *Please note that I have my batch size at 4 because my GPU can handle it. I recommend you first use a batch size of 1 and pay attention to the VRAM usage.*

After you run with those base settings you can adjust them:
- Adjust the prompt if the AI misunderstands things like hair ornaments, and mention those.
- If you lose too much detail in the image, lower the denoising strength. This can help a lot with hands and genitalia.
- If the image gets blurry or there are details that are kinda there but not really, increase the steps.
- If the image differs too much from the original and you think the prompt should be good enough, try increasing the CFG scale.

In case you roll a really good image but there is this one thing bothering you, instead of going into inpaint to try to fix it, you can also copy the seed, change the settings slightly and regenerate. Stable Diffusion is a "frozen" model by default, so generating with the same settings on the same seed will result in the same image.

In img2img it is especially useful to describe the clothing your character is wearing. The color will usually stay the same, but the type of clothing might heavily differ from the source image if you don't.

While the AI is impressively good at understanding the images, there might be parts where there is something unnatural in the source image (for example skin clipping through clothing). This can confuse the AI and make it try to generate some kind of object from it, which we don't want. A quick and easy solution for that is to hop into Photoshop and simply edit those things away. It doesn't have to be a good edit, just enough that Waifu Diffusion won't get confused. All in all, Photoshop (or GIMP) is very useful for removing any small mistakes the AI made. Or you can combine two or more good images to get one great image. For example, take the face from image A and the body from image B.

Furthermore, most things I said in the txt2img section also apply to img2img. If you skipped to this part right away, consider giving it a read.

#### Addition to Tagging - WD 1.4 Tagger extension

Automatic1111's WebUI has support for extensions now, and there is a very useful extension for tagging called [**stable-diffusion-webui-wd14-tagger**](https://github.com/toriato/stable-diffusion-webui-wd14-tagger). It can analyse any image and use an image recognition AI called [**deepdanbooru**](https://github.com/KichangKim/DeepDanbooru) that will basically tag the image for you. You can then just copy-paste these tags to be used with Waifu Diffusion (remember: WD is trained on Danbooru). *Note: This also works quite well for NovelAI, they seem to use a similar tagging system.*

Installing it is quite simple:
1. Go to the "Extensions" tab, choose "Available" and "Load from" [this URL](https://raw.githubusercontent.com/wiki/AUTOMATIC1111/stable-diffusion-webui/Extensions-index.md) (should be there by default).
   > ![Screenshot of the extension tab in Automatic's webui](https://i.imgur.com/Q0BgMQ0.png) The "WD 1.4 Tagger" extension is towards the bottom of the list.
2. Install the extension called **WD 1.4 Tagger**.
3. To use the extension and not get an error when launching the WebUI, do the following (taken from [step 2 here](https://github.com/toriato/stable-diffusion-webui-wd14-tagger/blob/master/README.md)):
   1. Download a release from [here](https://github.com/KichangKim/DeepDanbooru/releases) and put the content into `models/deepdanbooru`.
   2. Download [this](https://mega.nz/file/ptA2jSSB#G4INKHQG2x2pGAVQBn-yd_U5dMgevGF8YYM9CR_R1SY) and put the content into `extensions/stable-diffusion-webui-wd14-tagger`.
4. Now you should be able to start the WebUI and see a new tab called "Tagger".

To use the extension, open the **"Tagger"** tab and choose either *"wd14"* or *"deepdanbooru"* from the `Interrogator` dropdown. *If the dropdown is empty you did not install the additional models correctly. Read the above and make sure you put the downloaded files in the correct folders.* Then just choose or drag'n'drop any image as `Source` and it will spit out a bunch of tags in the top right. I recommend setting the `Threshold` slider to something above 0.5 so that it only spits out tags with a confidence score of more than 50%.

> ![Screenshot of the tagger tab with an example interrogate](https://i.imgur.com/nOXPrlw.png)
> wd14 and deepdanbooru "find" different tags, so it's worth trying both and looking at the differences and the confidence ratings.

Now you can copy the tags to txt2img or img2img, or use them as inspiration for which tags to put in a prompt of your own. Depending on how many tags you use and how confident the interrogation was, you can generate images that are quite similar to the one you entered.

### About artist tags...

"Artists" (the `by [artist name]` or `in the style of [name of work]` tags) are basically a way to tell the AI what style it is supposed to mimic. If you ask it to generate a `Picture of Hatsune Miku in the style of HR Giger`, for example, you can get some really freaky results:

> | ![Hatsune Miku in the style of HR Giger img1](https://i.imgur.com/kdSzC4T.png) | ![Hatsune Miku in the style of HR Giger img2](https://i.imgur.com/kk3TZkk.png) | ![Hatsune Miku in the style of HR Giger img3](https://i.imgur.com/djlU2tA.png) |
> | --- | --- | --- |

As Waifu Diffusion is trained on Danbooru, you can try some of your favourite doujin artists, but often the amount of images in the training data is too small for it to "know" those. As a rule of thumb you could say that *the more famous an artist is (on a global scale), the higher the chance that WD knows their style.*

I personally don't use artists for anime images and Koikatsu img2img as it's not really necessary, but if your source image already has some kind of style you might want to specify it. If you made a Jojo character in Koikatsu, for example, writing `in the style of Jojo's bizarre adventures` is probably a nice addition. It's also a lot of fun to try out what your character would look like in certain styles:

> | ![Njaecha's discord avatar girl in a white sailor uniform original](https://i.imgur.com/kFOC6Ym.png) | ![Njaecha's discord avatar girl in a white sailor uniform in the style of dragon ball](https://i.imgur.com/WWWk42j.png) |
> | --- | --- |
> | original image | `[...] in the style of dragon ball` |
>
> Here I had to use a denoising strength of 0.5 because I wanted the image to change a lot.

</details>

## Further reading

### Basics

- More information about SD - https://huggingface.co/blog/stable_diffusion
- Readme and tutorials for the WebUI - https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki
- [StableDiffusion subreddit](https://old.reddit.com/r/StableDiffusion/) - mostly for sharing impressive results and news

<details>
<summary>Notable links (outdated and less relevant)</summary>

- [List of SD Tutorials & Resources on Reddit](https://old.reddit.com/r/StableDiffusion/comments/yknrjt/list_of_sd_tutorials_resources/)
- Various models to download, both SFW and NSFW - https://rentry.org/sdmodels (outdated)
- Arki's Stable Diffusion Guides - https://stablediffusionguides.carrd.co/
- About GANs - https://www.geeksforgeeks.org/generative-adversarial-network-gan/
- About Dreambooth (generate specific characters) - https://dataconomy.com/2022/09/google-dreambooth-ai-stable-diffusion/ (outdated, use Loras instead)
- [Faces-Faces-Faces](https://rentry.org/faces-faces-faces) - useful face-related tags with previews
- [NovelAI Tag experiments](https://zele.st/NovelAI/) - useful tags with previews
- [Download the Dall-E 2 model](https://www.youtube.com/watch?v=dQw4w9WgXcQ)
- [SD Resource Goldmine](https://rentry.org/sdgoldmine) - a huge collection of resources and links related to Stable Diffusion

</details>

### Advanced topics

- ControlNet guide (Koikatsu focus; offers better results than the basic setup explained in this guide but requires more work) - https://rentry.org/ControlNetKoiGuide
- ComfyUI (very powerful UI that allows for customizing the pipeline, combining models, using different prompts for parts of an image, and more. It's obviously far more difficult to use than WebUI, but it's worth it for advanced users) - https://www.youtube.com/watch?v=vUTV85D51yk

## Credits

- ManlyMarco - The guide
- Njaecha - The overhauled guide
- Guicool - Updates to the "Requirements" section
