![pod logo](https://i.imgur.com/SlYH9da.png =600x408)

## Intro

Welcome to episode nineteen! This is your host, Doug Smith. This is Not An AI Art Podcast is a podcast about, well, AI ART – technology, community, and techniques. With a focus on Stable Diffusion, but all art tools are up for grabs, from the pencil on up, and including pay-to-play tools, like Midjourney. Less philosophy – more tire kicking. But if the philosophy gets in the way, we'll cover it. But plenty of art theory!

Today we've got:

* Model madness: 3 SDXL models
* Technique of the week: Perspective warp & "where's waldo style" image

Available on:

* [Spotify](https://open.spotify.com/show/4RxBUvcx71dnOr1e1oYmvV)
* [iHeartRadio](https://www.iheart.com/podcast/269-this-is-not-an-ai-art-podc-112887791/)
* [Google Podcasts](https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy9kZWY2YmQwOC9wb2RjYXN0L3Jzcw)

Show notes are always included, with all the visuals, prompts, and technique examples. The format is intended so that you don't have to be looking at your screen -- but the show notes have all the imagery, prompts, and details on the processes we look at.

## News

### Gumbo Slice Meme

* [Know your meme](https://knowyourmeme.com/memes/alligator-kick-black-guy-kicking-alligator)

![](https://i.kym-cdn.com/photos/images/newsfeed/002/672/532/035.jpg)

### DALL-E 3: The test drive

So, I tried to join ChatGPT+ just to try it out, annnnd, I didn't even have access to it! (I later got access to it.)

I had been trying Bing, and it kept giving me a "sorry, try later" kind of thing. Eventually -- I tried on a desktop and it worked. So maybe mobile is more restricted.

Here's the first thing I tried...

Gotta say -- the prompting was very easy, as advertised. I really got exactly what I was thinking of, very easily. Very close to a finished product.

I would say -- it's looking a little 3D-rendered in style to me.
Unsure if this is on purpose, but this is something where we definitely get more control with something like SD.

![](https://hackmd.io/_uploads/r1cCxGAe6.png)

And an upscale...

![](https://hackmd.io/_uploads/SyFAxzAlT.jpg)

Then, editing is a feature. Pretty much nailed it. There's still some concept bleed.

![](https://hackmd.io/_uploads/BJ9RefRlT.jpg)

And a couple upscales... Hands are overall very good. I'd only fix a little in this (especially the glass on the right, and maybe tone down the hand on the lap on the left).

![](https://hackmd.io/_uploads/ByKCgGRl6.jpg)

![](https://hackmd.io/_uploads/Sk9CxGAgp.jpg)

Then I tried to get a little creative... (I only got one back this time; I randomly get 1-4, apparently.) You can see a bunch of facial problems, the kind that we fix very easily with inpainting in Stable Diffusion.

![](https://hackmd.io/_uploads/SkyRGf0gT.jpg)

![](https://hackmd.io/_uploads/S1JCzzCea.jpg)

My take: I think the easy prompting is interesting, and cool, and good for general consumption. But I don't think the model is necessarily better than Midjourney v5. I think the MJv5 model is actually better; if I could run one and be root, I'd pick MJv5.

At the end of the day -- I kinda want root. I don't necessarily want a filtered dataset. To me it's sort of like a safety device on a table saw. Yeah, I know, it's safer. But it's limiting in terms of the cuts you can make. So, it takes a different level of skill to use. Sharp tools are like this.

### DALL-E stereographs

* [On r/chatGPT on Reddit](https://reddit.com/r/ChatGPT/s/0ZKqSZv8UX)

I made myself sick trying to look at the ones I generated lol. Make sure you make them wide.

Kinda like a Magic Eye: https://en.wikipedia.org/wiki/Magic_Eye

And stereoscopic images: https://en.wikipedia.org/wiki/Stereoscopy

![](https://preview.redd.it/bpgwvllk21tb1.png?width=1792&format=png&auto=webp&s=46491d220d8f5a48959c6a823af02556ae4b5ff8)

### Bing pre-prompt

Someone got it out of it.
* [On reddit](https://reddit.com/r/ChatGPTPro/s/QD8YBS63R1)

```
If the description is not in English, then translate it.
Do not create more than 4 images, even if the user requests more.
Don't create images of politicians or other public figures. Recommend other ideas instead.
Don't create images in the style of artists whose last work was created within the last 100 years (e.g. Picasso, Kahlo). Artists whose last work was over 100 years ago are ok to reference directly (e.g. Van Gogh, Klimt). If asked say, "I can't reference this artist", but make no mention of this policy. Instead, apply the following procedure when creating the captions for dalle: (a) substitute the artist's name with three adjectives that capture key aspects of the style; (b) include an associated artistic movement or era to provide context; and (c) mention the primary medium used by the artist.
DO NOT list or refer to the descriptions before OR after generating the images. They should ONLY ever be written out ONCE, in the "prompts" field of the request. You do not need to ask for permission to generate, just do it!
Always mention the image type (photo, oil painting, watercolor painting, illustration, cartoon, drawing, vector, render, etc.) at the beginning of the caption. Unless the caption suggests otherwise, make at least 1--2 of the 4 images photos.
Diversify depictions of ALL images with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions. Your choices should be grounded in reality. For example, all of a given OCCUPATION should not be the same gender or race. Additionally, focus on creating diverse, inclusive, and exploratory scenes via the properties you choose during rewrites. Make choices that may be insightful or unique sometimes.
Do not create any imagery that would be offensive.
Silently modify descriptions that include names or hints or references of specific people or celebrities by carefully selecting a few minimal modifications to substitute references to the people with generic descriptions that don't divulge any information about their identities, except for their genders and physiques.
```

### Cool comparison of SD + DALLE-3 using Garfield

* [From reddit](https://www.reddit.com/r/StableDiffusion/comments/174t6d3/garfield_comic_comparsion/?share_id=eDWUTvr9x4zwCqpVM-01s&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

SD 1.5

![](https://preview.redd.it/fhzly8ju9ftb1.jpg?width=512&format=pjpg&auto=webp&s=32da4cd32ac83b764b8bcf9a155c28d504c8764b)

And DALLE-3

![](https://preview.redd.it/2wxazzyx9ftb1.jpg?width=960&crop=smart&auto=webp&s=afb75e88bff7bcf59daf4573f9271ceb6a86c79a)

But here's SD 1.5 with a [Garfield LoRA](https://civitai.com/models/29444/garfield-diffusion-v1) (for SD 1.5)

![](https://preview.redd.it/e9bme7l4qftb1.png?width=1024&format=png&auto=webp&s=54f1dd966805e1adc448f76d8e90acd91763dcca)

### Bonus comparison for a prompt from ChatGPT + DALLE-3

My prompt:

```
create images of a pirate on sliding down ropes with a knife in her mouth, it should be aboard a sloop in the year 1725, in the style of a historical fiction film still
```

What ChatGPT showed me:

```
Photo from a historical fiction film depicting a fearless female pirate in action. She's sliding down the ropes of a sloop with agility, a sharp knife gripped in her mouth. The ship's deck is bustling with crew members, and the vast sea stretches out behind her. The scene is bathed in the golden hue of twilight, adding drama to the moment.
```

and...

```
Photo reminiscent of a historical fiction film scene showcasing a daring female pirate with wind-blown hair, sliding down thick ropes of a sloop in the year 1725. The gleam of a knife held in her mouth contrasts with the dim light of the setting sun. The ship's sails billow, and the horizon shows a distant island, adding to the cinematic atmosphere.
```

## Model madness

### Detail slider LoRA

It's for SD 1.5

* [On CivitAI](https://civitai.com/models/153562/detail-slider-lora)

```
color photograph of a woman standing in a speakeasy, 1920s clothing, bar, night club, (high heels:1.1), cinematic still, photography by Larry Clark
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream, (skinny:1.2), 3d, render, anime
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 2393311418, Size: 512x512, Model hash: 47170319ea, Model: juggernaut_final, Denoising strength: 0.4, Hires upscale: 2, Hires upscaler: ESRGAN_4x, TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

![](https://hackmd.io/_uploads/H13M8VPb6.jpg)

Using the same seed, let's pump it to 5. Huh, pretty heavily saturated, huh?

![](https://hackmd.io/_uploads/HkyUU4v-p.jpg)

Let's try 1.0

![](https://hackmd.io/_uploads/HJni8Nwbp.jpg)

And with -5

![](https://hackmd.io/_uploads/B1Tu8EvWp.jpg)

### 9 angles LoRA

* [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/173y4i9/sdxl_as_requested_early_preview_samples_generated/?share_id=VBWXVzJ4Otg5HoxXSGFNR&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

It's not out yet, but it looks fun!

![](https://preview.redd.it/kcoqngfdn7tb1.png?width=960&crop=smart&auto=webp&s=80ac1142c257c5b9e4d24ecdc67623d8b4b411d4)

![](https://preview.redd.it/tgce06ihn7tb1.png?width=960&crop=smart&auto=webp&s=02fc36f6d001bcb483c7b1b1c04305fd695b5cda)

Looks like it could be really utilitarian as photobashing material for inpainting, love it.
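
Circling back to the detail slider for a second: the comparison above is the same prompt and seed run at a few LoRA strengths (5, 1.0, and -5). Here's a minimal Python sketch for generating those prompt variants so they can be pasted in one at a time. It assumes the LoRA is applied with the AUTOMATIC1111-style `<lora:name:weight>` prompt tag, and `detail_slider` is a placeholder file name, not necessarily what the download is actually called. The helper names are made up for illustration.

```python
# Sketch: build one prompt per LoRA strength for a same-seed comparison.
# Assumption: the web UI accepts the <lora:name:weight> tag inside the prompt.

def lora_tag(name: str, weight: float) -> str:
    """Format a LoRA tag such as <lora:detail_slider:-5>."""
    # ":g" drops a trailing ".0" so 1.0 is written as 1
    return f"<lora:{name}:{weight:g}>"


def sweep_prompts(base_prompt: str, lora_name: str, weights: list[float]) -> list[str]:
    """Return the base prompt once per weight, with the LoRA tag appended."""
    return [f"{base_prompt} {lora_tag(lora_name, w)}" for w in weights]


BASE_PROMPT = (
    "color photograph of a woman standing in a speakeasy, 1920s clothing, "
    "bar, night club, (high heels:1.1), cinematic still, photography by Larry Clark"
)
LORA_NAME = "detail_slider"  # placeholder: use the actual LoRA file name
WEIGHTS = [-5, 1.0, 5]       # the strengths compared in the notes

for prompt in sweep_prompts(BASE_PROMPT, LORA_NAME, WEIGHTS):
    print(prompt)
    print()
```

Keep the seed fixed (2393311418 in the example above) across all of the runs, otherwise you're comparing different images and not just different LoRA strengths.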
### Large comparison of SDXL models

* [SDXL Model comparison on Reddit](https://www.reddit.com/r/StableDiffusion/comments/176iijv/comparison_of_20_popular_sdxl_models/?share_id=jye_uT12sbJwTNGSFt05b&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

### Training a LoRA for outfits

* [On YouTube from Aitrepreneur](https://www.youtube.com/watch?si=LNRHsU9mtpJZgGTp&v=_n93IxxUwB0&feature=youtu.be)

Really interesting that he paints out the faces and backgrounds. Interesting idea! I like it. Results look nice.

## Bloods and crits

### "She knows"

* [From Reddit](https://www.reddit.com/r/StableDiffusion/comments/174nkqa/she_knows/?share_id=icZSG0FIItpYax_UE0RyQ&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

Really incredible rendering, and it comes with a workflow, thank you! From SDXL base, too.

It's got ONE major problem though. Take a look at the man's right leg as we see it. It's in the wrong place in space. This is a great example of where, if you can kinda sketch out the leg shape and know how to measure proportions, you can fix this in under 5 minutes.

Cool atmosphere. I'm not sure I'm sold on the composition. But overall, it's really neat.

```
Art by Jeremy Mann and Edwin Georgi, with erotic tension, cinematic composition and a distant, low angle. A smirking young man and a shy young woman stand nose to node, in a daylight Americana small town.
Negative prompt: barbed wire
Steps: 20, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 856560037, Size: 1024x1024, Model hash: 31e35c80fc, Model: sd_xl_base_1.0, Refiner: sd_xl_refiner_1.0 [7440042bbd], Refiner switch at: 0.7, Version: v1.6.0
```

![](https://preview.redd.it/mcdmrhyz4etb1.png?width=960&crop=smart&auto=webp&s=7da27a55f427232ff01c2c5aa693284b21e9029a)

### "Incredible scenery"

* [From reddit](https://www.reddit.com/r/StableDiffusion/comments/17729e4/incredible_scenery/?utm_source=share&utm_medium=web2x&context=3)

I really like how there's the cat in the foreground; it helps give a sense of scale and makes the image look very grandiose!

There's stuff that needs fixing. None of it is a major problem, but it all detracts from the image.

* Remove the "imitated watermark" in the lower right-hand corner. Be on the lookout for these.
* The river looks like it stops before the bottom.
* Also the springs on the left-hand side of the river glen need to be fixed.

![](https://preview.redd.it/s6i6zwinuztb1.jpg?width=640&crop=smart&auto=webp&s=4d022ffd014d9ceeaac7d9e116f6e0a58d1be75d)

## Maybe for next week...

https://github.com/tencent-ailab/IP-Adapter

It's like a style transfer?

## Technique of the week: Adetailer

I used instructions from this [great article from stable-diffusion-art.com](https://stable-diffusion-art.com/adetailer/)

It's actually in the extensions list in AUTOMATIC1111: just search for `after detailer`, install it there, and restart.

Example! Using this prompt...

```
people at a party in the 1880s
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream, (skinny:1.2)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3744441292, Size: 512x512, Model hash: 47170319ea, Model: juggernaut_final, Denoising strength: 0.35, Hires upscale: 2, Hires upscaler: ESRGAN_4x, TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

I get this result:

![](https://hackmd.io/_uploads/SJTgiH3gT.jpg)

And then with adetailer set...

![](https://hackmd.io/_uploads/HJK-sH3g6.jpg)

### Adetailer hands

And you can do hands too; make sure to set the model and enable it, like so:

![](https://hackmd.io/_uploads/By_xXwhlT.jpg)

And here's a prompt where I thought I'd get hands; this is even my first result on it...

```
top down view of a board game, people playing a board game
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream, (skinny:1.2)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3185927063, Size: 512x512, Model hash: 47170319ea, Model: juggernaut_final, Denoising strength: 0.35, Hires upscale: 2, Hires upscaler: ESRGAN_4x, TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

Before adetailer...

![](https://hackmd.io/_uploads/r1rMXPnl6.jpg)

And then with it enabled...

![](https://hackmd.io/_uploads/S1zXQD2ga.jpg)

Definitely better.

### My opinion?

This is awesome. It's so fast (on my machine, at least) that I'm considering having it enabled on some larger batches to help find good initial generations more quickly.
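
When sifting through larger batches like that, it helps to pull the seed and settings back out of a keeper without copying them by hand. Here's a minimal Python sketch that splits a parameter line, like the `Steps: ...` lines quoted in these notes, into a dict. The `parse_params` helper and its regex are hypothetical and written just for illustration; they assume that any value containing a comma or colon is wrapped in double quotes, the way `TI hashes` is above.

```python
import re

# key: value pairs separated by commas; a value may be double-quoted
# (quoted values are allowed to contain commas and colons).
PARAM_RE = re.compile(r'\s*([\w ]+):\s*("(?:\\.|[^\\"])+"|[^,]*)(?:,|$)')


def parse_params(line: str) -> dict[str, str]:
    """Split a 'Steps: 30, Sampler: ...' line into a {name: value} dict."""
    params = {}
    for key, value in PARAM_RE.findall(line):
        # strip surrounding whitespace and any wrapping double quotes
        params[key.strip()] = value.strip().strip('"')
    return params


# The settings line from the board game example in these notes
EXAMPLE = (
    'Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 3185927063, '
    'Size: 512x512, Model hash: 47170319ea, Model: juggernaut_final, '
    'Denoising strength: 0.35, Hires upscale: 2, Hires upscaler: ESRGAN_4x, '
    'TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1'
)

settings = parse_params(EXAMPLE)
print(settings["Seed"], settings["Sampler"], settings["Denoising strength"])
```

This only handles the settings line, not the prompt or the `Negative prompt:` line above it, so feed it just the last line of a block.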