![pod logo](https://i.imgur.com/SlYH9da.png =600x408)

## Intro

Welcome to episode twenty one! This is your host, Doug Smith. This is Not An AI art podcast is a podcast about, well, AI ART – technology, community, and techniques. With a focus on stable diffusion, but all art tools are up for grabs, from the pencil on up, and including pay-to-play tools, like Midjourney. Less philosophy – more tire kicking. But if the philosophy gets in the way, we'll cover it. But plenty of art theory!

Today we've got:

* News update as per usual
* Model madness: Model review on 1 model and 2 LoRAs
* Bloods and Crits: Art crits on two pieces
* Technique of the week: SDXL Training, and the params I used.

Available on:

* [Spotify](https://open.spotify.com/show/4RxBUvcx71dnOr1e1oYmvV)
* [iHeartRadio](https://www.iheart.com/podcast/269-this-is-not-an-ai-art-podc-112887791/)
* [Google Podcasts](https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy9kZWY2YmQwOC9wb2RjYXN0L3Jzcw)

Show notes are always included and include all the visuals, prompts and technique examples. The format is intended to be so that you don't have to be looking at your screen -- but the show notes have all the imagery and prompts and details on the processes we look at.

## News

Midjourney Style tuner

* Official docs: https://docs.midjourney.com/docs/style-tuner
* [Venturebeat blog article](https://venturebeat.com/ai/midjourneys-new-style-tuner-is-here-heres-how-to-use-it/)
* [Decent youtube video](https://www.youtube.com/watch?v=OZTYgnO1pfU)

Costs about 0.3 hours, not bad.

Gist is that you prompt for a style, then you get to pick A/B (or from a large group) to tune the style based on it. Looks like:

![mj-tune-picker.jpg](https://hackmd.io/_uploads/rJ366xNXp.jpg)

[Here's my resulting style Tuner on midjourney.com](https://tuner.midjourney.com/IIjkjDB) if you're interested.

My gut says this might be using some fashion of "RLHF", reinforcement learning from human feedback.

* [RLHF, On Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning_from_human_feedback)
* [Hugging face article](https://huggingface.co/blog/rlhf)

### ollama

![](https://github.com/jmorganca/ollama/assets/3325447/56ea1849-1284-4645-8970-956de6e51c3c)

https://github.com/jmorganca/ollama/

I even wrote up a little (work-specific) blog article about my experiment with it: https://dougbtv.com/nfvpe/2023/11/21/robocni-config/

## Model Madness

### Victorian Style LoRA (by yours truly!)

Really prefers photographic style -- but it does have range, and some painting styles can produce decent generations.

Likely has a bit of facial averaging in it from the dataset I used, where I didn't clean out faces. I'm currently OK with that, but some people might find it limiting. I'd recommend doing an inpaint pass for new faces, especially with the LoRA removed (or set to 0.0 strength).

0.6 strength seems to work just about right for me and my purposes.

* [On Civitai](https://civitai.com/models/202690/victorian-style-lora-xl?modelVersionId=228196)

```
the gorgeous artistocrat in the swanky hotel, the! most! beautiful!, pinup, intricate dress, victorianstyle, 1880s, RAW photo, analogue style, dramatic lighting, depth of field, photography by Natalia Drepina <lora:victoriansxl_v1:0.6>
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 22454014, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Lora hashes: "victoriansxl_v1: 2b8ae1797131", Version: v1.5.1
```

![victorians-demo-c1](https://hackmd.io/_uploads/H1yilFr4a.jpg)

```
the gorgeous debutante in the dimly lit hotel bar, the! most! beautiful!, pinup, intricate dress, victorianstyle, 1880s, RAW photo, analogue style, dramatic lighting, depth of field, photography by Natalia Drepina <lora:victoriansxl_v1:0.6>
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 1483129961, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Lora hashes: "victoriansxl_v1: 2b8ae1797131", Version: v1.5.1
```

![victorians-demo-c2](https://hackmd.io/_uploads/ryzkWKBVa.jpg)

```
a painting of the lovely 25 year old woman on the luscious veranda, the! most! beautiful!, intricate dress, victorianstyle, 1880s, academism, licked finish, grandeur, neoclassicism, painting by William-Adolphe Bouguereau <lora:victoriansxl_v1:0.6>
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 3707343593, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Lora hashes: "victoriansxl_v1: 2b8ae1797131", Version: v1.5.1
```

![tmpxd7x0k9a](https://hackmd.io/_uploads/rJbWMtHE6.jpg)

As an example of how I might iterate on one of these, using the same prompt template as above. Mostly just a few subtle changes:

* Fix the proportions on the leg on the right side, where it looks like she has small legs
* Hand fix (always!)
* Inpainted the face
* Color and contrast fixes in Photoshop.


Before:

![ivictorian-before](https://hackmd.io/_uploads/HyqIbH8Na.jpg)

After:

![ivictorian-after](https://hackmd.io/_uploads/Bybw-BIET.jpg)

### Albedo Base 1.2

* [Reddit](https://www.reddit.com/r/StableDiffusion/comments/17u910g/sdxl_albedobase_xl_v12_model_finally_released/?share_id=CDHywr3PhWTBCKutnNjwG&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)
* [Civitai](https://civitai.com/models/140737/albedobase-xl)

```
color photograph of the 1920s flapper in the busy nightclub, speakeasy, prohibition, RAW photo, analog style, volumetric lighting, photography by Natalia Drepina
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 2863756277, Size: 816x1024, Model hash: 8a8efa5ad2, Model: albedobaseXL_v12, Version: v1.5.1
```

![albedo-12-flapper2](https://hackmd.io/_uploads/HJONT5LET.jpg)

```
illustration of the dancing raver girl, full body, expressive mark making, gestural, gouache, watercolor, ink, mixed media on paper, illustration by Felicien Rops
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 3315243266, Size: 816x1024, Model hash: 8a8efa5ad2, Model: albedobaseXL_v12, Version: v1.5.1
```

![albedo-12-raverdrawing](https://hackmd.io/_uploads/BJocR98N6.jpg)

### Moon (formerly: Sci-White) LoRA (for SD1.5)

* [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17ulkrd/moon_sciwhite_15_lora_released/?share_id=AcMSsvyHbCneXTRHKWXxV&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)
* [On Civitai](https://civitai.com/models/199118/moon-sciwhite)

This is a cool concept: it's trained for a "color aesthetic". This is... a super neat idea. I could definitely see myself going for this from a concept point of view, and it half makes me want to train my own model on a colorway that I pick. So interesting.

```
a 1920s flapper in high heels staring seriously in the style of m01n over white background <lora:m01n:1>
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 3491658892, Size: 408x512, Model hash: 47170319ea, Model: juggernaut_final, Denoising strength: 0.36, Hires upscale: 2, Hires upscaler: ESRGAN_4x, Lora hashes: "m01n: 701c1b209c4f", TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

Damn these are chic!

![flapper-moin-2](https://hackmd.io/_uploads/By3x42LEa.jpg)

![flapper-moin-1](https://hackmd.io/_uploads/HynlE3U46.jpg)

```
space station control panel in the style of m01n, electronics, tubing, digital screens, futurepunk, wiring <lora:m01n:1>
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 1791435116, Size: 408x512, Model hash: 47170319ea, Model: juggernaut_final, Denoising strength: 0.36, Hires upscale: 2, Hires upscaler: ESRGAN_4x, Lora hashes: "m01n: 701c1b209c4f", TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

Not quite as cool, but the LoRA works well still.

![controls-moin](https://hackmd.io/_uploads/SJi8BhIE6.jpg)

```
chromed out robotic gnat fly nanomachine in the style of m01n on a white background, realistic insect wings <lora:m01n:1>
Negative prompt: (bad_prompt_v2:0.8),Asian-Less-Neg,bad-hands-5, BadDream
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 254887183, Size: 408x512, Model hash: 47170319ea, Model: juggernaut_final, Denoising strength: 0.36, Hires upscale: 2, Hires upscaler: ESRGAN_4x, Lora hashes: "m01n: 701c1b209c4f", TI hashes: "bad_prompt: f9dfe1c982e2", Version: v1.5.1
```

![fly-moin](https://hackmd.io/_uploads/Bk2SUnUNT.jpg)

### Other Resources

**PixArt-alpha.**

* https://huggingface.co/PixArt-alpha/PixArt-XL-2-1024-MS
* Really wanna try PixArt-alpha
* Looks like good results.
* Apparently in SD.Next / Vladautomatic
* ...Didn't get a chance to try it.
* Supposed to be cost-effective to train.

**Bubble Prompter**

![Screenshot 2023-11-18 150724](https://hackmd.io/_uploads/rkcqYqLEp.jpg)

* [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17xe2q8/bubble_prompter_a_tool_to_reorder_tags_and_edit/?share_id=uOY6B1EayrKPQOHV-3o0f&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

Cool utility for visual prompt editing. Could be really nice for just jamming on a bunch of kind of "tags" that you use often in your prompts.

**Article on consistent characters**

* [On Reddit](https://www.reddit.com/r/StableDiffusion/comments/17x5fll/the_chosen_one_consistent_characters_in/?share_id=dZRtBwnp0m0xT9hjatZjQ&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)
* Based on [this arxiv paper](https://arxiv.org/pdf/2311.10093.pdf)

Interesting idea of kind of re-training a LoRA iteratively.

**Weird Wonderful Art Artists Reference**

Found myself reaching for this a number of times this week!

* https://weirdwonderfulai.art/resources/stable-diffusion-xl-sdxl-artist-study/

## Bloods and crits

### The history and evolution of art

REPEAT! The artist got back to me.

The reason this comes out so well is in part because the artist really put something together that's thinking on a number of levels... Listen to the concept:

> The scene is supposed to depict the history and evolution of art. The AI and traditional artist are begging the goddess of creativity for more inspiration and mastery. We started with cave drawings, progressed to oil painting (that's Monet's water lilies painting), digital art (an Alena Aenami piece), and finally AI artwork. Where do we go from here?

And the workflow was thoughtful:

> I composed this scene using a bunch of images I generated in Dall E 3. I arranged the scene in Gimp then brought it into controlnet. From there it was basically just a shit ton of inpainting, outpainting, various Gimp tools, and CN tile+Ultimate SD Upscale. (the original was bigger and more resolute)

I really like a lot about it. I think the narrative really is awesome, and having the prose to go with it helps me too.

There are two places that I would improve...

* Economy: Since there's so much going on, we could actually start reducing the number of things in this. One thing I would consider is de-emphasizing certain things. Make areas that aren't important to the story or composition less detailed, or darker, or lower in contrast.
* Composition: There's a lot going on, which is a challenge because it all looks really good. But if there was an idea of negative space somewhere here, it could really help draw your eye around the piece and bring out some of the details.
    * In particular, I might try removing the bookshelf and building some areas of contrast between the characters and the background. Especially the goddess: she is looking kinda flat and could look a lot more dimensional in 3-space.

![bloodcrit1-sized](https://hackmd.io/_uploads/B1e3JoLN6.jpg)

### Name this 70s electronic duo

It's super fun and I enjoy the concept of creating this. I also love 70s stuff, and electronic music, so clearly... I'm the right audience.

The render is incredible overall, and really has a 70s look in terms of color and the photographic quality of it.

The composition is OK. It's a little boring with the mostly centered subject.

Details are OK overall. I'd probably fix the hands on the person on the right, where the hand seems to meld in with the body. Also, I'd probably remove, replace or blur out text that's unreal -- especially that license plate.

Fun!

* [On r/midjourney Reddit](https://www.reddit.com/r/midjourney/comments/17t1kj6/name_this_70s_electronic_duo/?share_id=hkqvul73EDeRpopxu1VVL&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=1)

![](https://preview.redd.it/nqzn6jbzqrzb1.jpg?width=640&crop=smart&auto=webp&s=2643e06b1d0ead861cf7636e63aad980b0838cb6)

## Technique of the week: LoRA Training on SDXL

General workflow for creating LoRAs:

* Collect imagery
    * I also use ChatGPT (w/ GPT-4) to create scripts to help me automate collecting images
* Process imagery
    * Automate as much as I can with scripts
    * Upscale as I need with Gigapixel AI
* Caption images
    * Use the utilities in Kohya_ss
    * I append some text like ", victorianstyle" in this case to the captions
* Kick off the training
    * I referenced this video for the basics for my training params: https://www.youtube.com/watch?v=AY6DMBCIZ3A&t=904s

My parameters are too big to post all in one shot! So I created a [github gist with my parameters](https://gist.github.com/dougbtv/e457a39f8ad2b190af611ae5c286d0d8)

## Technique of the Week: Concept mixing for photography from a traditional medium start

Today we're going to use concept mixing to kind of "start with an illustration" and end with a photographic style.

We'll use this prompt formatting in stable diffusion...

```
[Concept one:concept two:0.4]
```

The `0.4` is a ratio from 0-1. You can think of it as a percentage, so `0.4` = `40%`. That means that for the first 40% of the rendering steps the first concept is referenced, and for the remaining steps the second concept is referenced.

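To make the arithmetic of that ratio concrete, here's a rough Python sketch. It's a simplified model of the behavior described above, not the web UI's actual parser: the names `switch_step` and `prompt_at_step` are made up for illustration, nested brackets aren't handled, and it assumes the switch point is rounded down to a whole step.

```python
import re

# Matches the [first:second:ratio] form shown above.
# Simplified for illustration: no nesting, no escaping.
MIX_PATTERN = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):([0-9]*\.?[0-9]+)\]")


def switch_step(ratio, total_steps):
    """Number of steps that use the first concept (assumes rounding down)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return int(ratio * total_steps)


def prompt_at_step(prompt, step, total_steps):
    """Return the prompt text in effect at a 1-based sampling step."""
    def pick(match):
        first, second = match.group(1), match.group(2)
        ratio = float(match.group(3))
        # Steps up to the switch point see the first concept,
        # every later step sees the second one.
        if step <= switch_step(ratio, total_steps):
            return first
        return second

    return MIX_PATTERN.sub(pick, prompt)


if __name__ == "__main__":
    prompt = "the dancer, [illustration by Felicien Rops:photography by Natalia Drepina:0.4]"
    print(switch_step(0.4, 35))           # steps spent on the first concept
    print(prompt_at_step(prompt, 1, 35))  # early step: illustration
    print(prompt_at_step(prompt, 35, 35)) # late step: photography
```

With the 35 steps used throughout these notes, a `0.4` ratio works out to 14 steps on the illustration concept and 21 on the photography concept, under the rounding assumption above.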
Our final prompt will look like:

```
the dancer in a dynamic pose, slim dress, full body, Fin de siècle, 1880s fashion, pastel colors, [illustration by Felicien Rops:photography by Natalia Drepina:0.3] <lora:victoriansxl_v1:0.6>
```

Today I'm going to mix [Felicien Rops](https://en.wikipedia.org/wiki/F%C3%A9licien_Rops), a Belgian illustrator/painter that I really like from the second half of the 1800s.

And then we'll mix that with [Natalia Drepina (possibly NSFW content in this article)](https://uncannyarchive.com/interview-with-russian-fine-art-photographer-and-multimedia-artist-natalia-drepina-tenebrous-emotional-portraits/), a Russian photographer who captures some really stunning and stylized imagery, and is a contemporary artist.

Note that I'm using my Victorian Style LoRA in these prompts.

So we'll use an illustrator first to get kind of the layout and colors, then we'll render the photographic style for the last percentage of steps.

This... does majorly change both. But let's start with what our illustrator looks like:

```
the dancer in a dynamic pose, slim dress, full body, Fin de siècle, 1880s fashion, pastel colors, illustration by Felicien Rops <lora:victoriansxl_v1:0.6>
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 222245886, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Lora hashes: "victoriansxl_v1: 2b8ae1797131", Version: v1.5.1
```

![dancer-rops](https://hackmd.io/_uploads/rk_3y5DV6.jpg)

Now what if we just use the photographic styling of Natalia Drepina?

```
the dancer in a dynamic pose, slim dress, full body, Fin de siècle, 1880s fashion, pastel colors, photography by Natalia Drepina <lora:victoriansxl_v1:0.6>
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 1987838467, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Lora hashes: "victoriansxl_v1: 2b8ae1797131", Version: v1.5.1
```

![dancer-drepina](https://hackmd.io/_uploads/BJaQxcD4a.jpg)

Now let's combine the two ideas!

Here's the mixed prompt:

```
the dancer in a dynamic pose, slim dress, full body, Fin de siècle, 1880s fashion, pastel colors, [illustration by Felicien Rops:photography by Natalia Drepina:0.3] <lora:victoriansxl_v1:0.6>
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 3306090224, Size: 816x1024, Model hash: 1fe6c7ec54, Model: juggernautXL_version6Rundiffusion, Lora hashes: "victoriansxl_v1: 2b8ae1797131", Version: v1.5.1
```

![dancer-mixed](https://hackmd.io/_uploads/H1lJycwN6.jpg)

Note that we get some influence from the colors and compositions of Rops! But we've gone with a `0.3` ratio, which means Rops only steers the start of the diffusion process, and then we weigh it more heavily towards photography.
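
If you want to see how much illustration influence survives at other ratios, one option is to build the same prompt at a handful of values and render each with a fixed seed, so the ratio is the only thing changing. A small sketch of the prompt-building half is below; `mixed_prompt` and `ratio_sweep` are hypothetical helper names, and the ratios chosen are arbitrary examples rather than recommendations.

```python
def mixed_prompt(base, first, second, ratio, lora=""):
    """Build a prompt that switches from `first` to `second` at `ratio`."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    # :g drops trailing zeros, so 0.30 is written as 0.3
    mix = f"[{first}:{second}:{ratio:g}]"
    prompt = f"{base}, {mix}"
    # The LoRA tag, if any, goes at the end like in the prompts above.
    return f"{prompt} {lora}".strip()


def ratio_sweep(base, first, second, ratios, lora=""):
    """Return one prompt per ratio, in the order given."""
    return [mixed_prompt(base, first, second, r, lora) for r in ratios]


if __name__ == "__main__":
    prompts = ratio_sweep(
        base="the dancer in a dynamic pose, slim dress, full body",
        first="illustration by Felicien Rops",
        second="photography by Natalia Drepina",
        ratios=[0.1, 0.3, 0.5],  # arbitrary example values
        lora="<lora:victoriansxl_v1:0.6>",
    )
    for p in prompts:
        print(p)
```

Each printed line can be pasted in as a prompt; lower ratios hand control to the photography concept sooner, higher ratios let the illustration shape more of the image.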