![pod logo](https://i.imgur.com/SlYH9da.png =600x408)

## Intro

Welcome to episode twenty! This is your host, Doug Smith. This Is Not An AI Art Podcast is a podcast about, well, AI ART – technology, community, and techniques. The focus is on Stable Diffusion, but all art tools are up for grabs, from the pencil on up, including pay-to-play tools like Midjourney. Less philosophy – more tire kicking. But if the philosophy gets in the way, we'll cover it. And plenty of art theory!

Today we've got:

* Model Madness: Model review on 2 models
* Training Gym: Some experiences with SDXL training
* Bloods and Crits: Art crits on two pieces
* Technique of the week: IP Adapter

Available on:

* [Spotify](https://open.spotify.com/show/4RxBUvcx71dnOr1e1oYmvV)
* [iHeartRadio](https://www.iheart.com/podcast/269-this-is-not-an-ai-art-podc-112887791/)
* [Google Podcasts](https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy9kZWY2YmQwOC9wb2RjYXN0L3Jzcw)

Show notes are always included, with all the visuals, prompts, and technique examples. The format is intended so you don't have to be looking at your screen – but the show notes have all the imagery, prompts, and details on the processes we look at.

## News

Midjourney has native upscaling now! Here's an example of a raw upscale (I edited it for a final, of course). Overall I'm really happy with how it's working, and I plan to use it regularly. A couple of notes:

* It does burn your GPU hours quickly
* It supposedly darkens your pieces
* Sometimes you might wanna upscale with something else like Gigapixel anyway, which is still worth it

![](https://hackmd.io/_uploads/rJfAaFZmT.jpg)

### Some links!
* A ControlNet text + Photoshop tutorial
  * [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17jboj0/text_in_sdxl_a_controlnet_and_photoshop_guide/?share_id=DYOmAp2MxluMXRPi8btvC&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)
  * [On Civitai](https://civitai.com/articles/2746)
* SEED-LLaMA: an open-source DALL-E
  * [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17cb51q/seedllama_opensource_dalle3/?share_id=WPEfCeF3BKczjsJW8whIg&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

  > We upgraded the SEED visual tokenizer (find the initial version here) and proposed SEED-LLaMA-8B/14B foundation models. The SEED-2 tokenizer can better preserve the rich visual semantics and reconstruct more realistic images. SEED-LLaMA is produced by large-scale pre-training and instruction tuning, demonstrating impressive performance on a broad range of multimodal comprehension and generation tasks. More importantly, SEED-LLaMA has exhibited compositional emergent abilities such as multi-turn in-context multimodal generation, acting like your AI assistant.

* Insights on style training for SDXL
  * [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17bg9o9/ai_style_training_my_insights_on_ai_style/?share_id=YZ0sAjFHNj--fnHgdNcKd&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)
  * [On Civitai](https://civitai.com/articles/2622/ai-style-training-my-insights-on-ai-style-training-on-dataset-preparation-experiment-inside)
* Coherent facial expressions for the same character
  * [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17b6i47/9_coherent_facial_expressions_in_9_steps/?share_id=sNVasn3ot_o7j6J6XQ_p3&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

## Model Madness

### Juggernaut XL v6

Juggernaut basically always impresses me.
* [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17idejf/juggernaut_xl_v6_released_amazing_photos_and/?share_id=ogjhr7FigJtO74Z5cFAuC&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)
* [On Civitai](https://civitai.com/models/133005/juggernaut-xl?modelVersionId=198530)

```
a 1920s flapper in a long dress on stage at a busy speakeasy, volumetric fog, dramatic lighting, color photography, RAW photo, analog style, depth of field
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 3516150792, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Version: v1.5.1
```

![flappers-jugxl6-a.jpg](https://hackmd.io/_uploads/Hy2NoqbmT.jpg)

![flappers-jugxl6-b.jpg](https://hackmd.io/_uploads/HJN8i9b76.jpg)

```
a 1990s raver girl on the dancefloor at a busy warehouse rave, volumetric fog, dramatic lighting, color photography, RAW photo, analog style, depth of field
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 2570851055, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Version: v1.5.1
```

![raver-jugxl6-a.jpg](https://hackmd.io/_uploads/SyOsiqbm6.jpg)

### Riot Diffusion XL

* [On Civitai](https://civitai.com/models/125151?modelVersionId=181589)

Apparently trained on Riot Games artwork, like League of Legends. Overall I'd say this is really swingy: I'm either getting amazing results, or... mediocre results. At first I didn't grok prompting for this – maybe it's down to the training and the kind of keywords you need to trigger it. Ah ha! Yes, it's keywords; now I get it, rereading the model page.
```
best aesthetic, lolsplashart, 1920s flapper
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 2654549112, Size: 816x1024, Model hash: 75cc450c7d, Model: riotDiffusionXLLeagueOfLegendsSplash_v20, Version: v1.5.1
```

![flapper-riot-a.jpg](https://hackmd.io/_uploads/BkCbn5WQT.jpg)

```
best aesthetic, lolsplashart, 1920s flapper runs out of the speakeasy in a hurry, hair waving in the wind, taxi cab drives away
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 323944718, Size: 816x1024, Model hash: 75cc450c7d, Model: riotDiffusionXLLeagueOfLegendsSplash_v20, Version: v1.5.1
```

![flapper-riot-b.jpg](https://hackmd.io/_uploads/HkK82q-X6.jpg)

```
best aesthetic, lolsplashart, 1990s raver girl is dancing so hard her hair is going everywhere, sweat dripping, warehouse rave
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 2310515265, Size: 816x1024, Model hash: 75cc450c7d, Model: riotDiffusionXLLeagueOfLegendsSplash_v20, Version: v1.5.1
```

![raver-riot-a.jpg](https://hackmd.io/_uploads/rJJ1p5WXT.jpg)

## SDXL Training

I referenced this video for the basics of my training params: https://www.youtube.com/watch?v=AY6DMBCIZ3A&t=904s

I'm using my [Kohya SS on Fedora 38](/MPHkPj_BQNGCuBGBUJlZvQ) containerized deployment. I just pulled master and then ran `podman build -t dougbtv/kohyass -f Dockerfile`, but... I ran into an issue where xformers wasn't built, and then when I tried to run training, I got errors saying things weren't built with CUDA support. Eventually I printed the training command and removed the `--xformers` parameter, and it got further... but I ran out of VRAM! Even with 24 gigs and a batch size of 1. So, apparently... I need xformers.

So I filed: https://github.com/bmaltais/kohya_ss/issues/1651

That issue has my recipe for what I used to get a Docker build to work, and it also has my training params. And... as of the time of recording, I haven't finished training one epoch. Bummer! I'll share results on the next episode and talk about my experience.
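The workaround above boiled down to printing the assembled training command and dropping one flag before launching. Here's a minimal sketch of that kind of filtering – the helper and the example command are hypothetical (a typical `accelerate launch` invocation, not my exact kohya_ss command):

```python
# Hypothetical sketch: strip a flag from an assembled training command
# before launching it, roughly the manual workaround described above.
import shlex

def strip_flag(cmd: str, flag: str = "--xformers") -> list[str]:
    """Split a shell command string and remove a bare flag if present."""
    return [arg for arg in shlex.split(cmd) if arg != flag]

# Illustrative command only, not the actual kohya_ss invocation:
train_cmd = "accelerate launch sdxl_train_network.py --train_batch_size 1 --xformers"

argv = strip_flag(train_cmd)
print(argv)
# subprocess.run(argv) would then launch training without the flag
```

Of course, in my case removing `--xformers` just traded a build error for an out-of-memory error, so the real fix is getting xformers built correctly in the container.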
## Bloods and Crits

### Origami Models

[On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17iqt80/origami_models_with_jaggernaut_xl_v6_nsfw/?share_id=BcJhYV4nNNlT7zf7TnnkJ&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

Really cool idea! And they came out well. There's a kind of inherent repetition of form in origami, and overall, I'm just fascinated by origami – and I haven't seen someone else do this. Really neat idea and worth playing with.

The renderings are good, and there's nothing directly detracting from them or anything I could immediately fix. Maybe I'd play with the origami in the background a bit. The depth of field comes out really well and works to help push the form and depth. They are a little "extra" for origami – almost too complex, and they get a bit of a 3D wireframe look. But I'm unsure if that's really a problem.

![](https://preview.redd.it/tyqmpszt91xb1.png?width=640&crop=smart&auto=webp&s=a42ae1a990006c4fb13035d1b9113f159d1398ee)

### The air is turning cooler

[On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17hvzb9/the_air_is_turning_cooler_the_leaves_are_turning/?share_id=EuWUio77XBs1TQTIrPtsA&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

Things I love about this...

* Understated bird as a subject.
* The style is awesome.
* The positive and negative space adds to the composition.
* The subject isn't entirely in the frame.
* There's an understated narrative that really works.
* Good repetition of form with the wood, the leaves, and the forest in the background.

Honestly, I can't find much to critique – it just has a lot of things that are working really well.
![](https://i.redd.it/zlcygtnrvswb1.png)

## Technique of the week: IP Adapter

https://github.com/tencent-ailab/IP-Adapter

I used this video: https://www.youtube.com/watch?v=KHm5Q5TfNvE

I downloaded the models from here because I wanted safetensors: https://huggingface.co/h94/IP-Adapter/tree/main/models

Ugh, those didn't work, so I risked the pickletensors from the video: https://huggingface.co/lllyasviel/sd_control_collection/tree/main

For a reference image, I'm using one of my favorite ridiculous sculptures by Salvador Dalí: https://www.salvador-dali.org/en/artwork/catalogue-raisonne-sculpture/obra/60eab9bd42ece411947100155d647f0b/retrospective-bust-of-a-woman

![](https://hackmd.io/_uploads/rkmLqYWX6.png)

I saw this in person as a teenager and it always stuck with me.

```
sculpture of a raver girl wearing sunglasses
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream, (skinny:1.2)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2813612254, Size: 408x512, Model hash: f968fc436a, Model: analogMadness_v50, Denoising strength: 0.4, ControlNet 0: "Module: ip-adapter_clip_sd15, Model: ip-adapter_sd15 [dbbc7cfe], Weight: 1.3, Resize Mode: Crop and Resize, Low Vram: False, Processor Res: 512, Guidance Start: 0, Guidance End: 0.82, Pixel Perfect: True, Control Mode: Balanced", Hires upscale: 2, Hires upscaler: ESRGAN_4x, TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

The result:

![](https://hackmd.io/_uploads/rk1f5KbQT.jpg)

And for reference, the same seed without it enabled:

![](https://hackmd.io/_uploads/Bk1G9KWma.jpg)

Let's try another reference...
We'll use one of my favorite paintings, [Portrait of Madame X](https://en.wikipedia.org/wiki/Portrait_of_Madame_X) by John Singer Sargent.

![](https://hackmd.io/_uploads/HycAqFbQp.png)

```
a raver girl wearing sunglasses
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream, (skinny:1.2)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3353852603, Size: 408x512, Model hash: f968fc436a, Model: analogMadness_v50, Denoising strength: 0.4, ControlNet 0: "Module: ip-adapter_clip_sd15, Model: ip-adapter_sd15 [6a3f6166], Weight: 1.3, Resize Mode: Crop and Resize, Low Vram: False, Processor Res: 512, Guidance Start: 0, Guidance End: 0.82, Pixel Perfect: True, Control Mode: Balanced", Hires upscale: 2, Hires upscaler: ESRGAN_4x, TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

![](https://hackmd.io/_uploads/rkhuoK-Q6.jpg)
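One knob worth calling out in the generation settings above: `Guidance Start: 0, Guidance End: 0.82`. These are fractions of the total sampling steps over which the ControlNet unit (here, the IP-Adapter) is applied, so the reference image shapes the early and middle denoising but lets the last stretch run free. A rough sketch of that interpretation – the helper is my own illustration, not AUTOMATIC1111's actual implementation:

```python
# Illustrative sketch (my own helper, not AUTOMATIC1111 code):
# Guidance Start/End are fractions of total sampling steps during which
# a ControlNet unit is active.
def active_steps(total_steps: int, start: float, end: float) -> range:
    """Return the range of sampling steps where the unit applies."""
    first = round(total_steps * start)
    last = round(total_steps * end)
    return range(first, last)

# With the settings above (30 steps, start 0, end 0.82), the adapter
# influences roughly the first ~82% of the denoising steps.
print(len(active_steps(30, 0.0, 0.82)))
```

Lowering Guidance End further frees up more of the late steps, which tends to keep the reference's composition while letting the prompt take over the fine detail.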