![pod logo](https://i.imgur.com/SlYH9da.png =600x408)

## Intro

Welcome to episode twenty-five! This is your host, Doug Smith.

*This Is Not An AI Art Podcast* is a podcast about, well, AI ART – technology, community, and techniques. The focus is on Stable Diffusion, but all art tools are up for grabs, from the pencil on up, including pay-to-play tools like Midjourney. Less philosophy – more tire kicking. But if the philosophy gets in the way, we'll cover it. And plenty of art theory!

Today we've got:

* News: DALL-E 3 inpainting, IPAdapter v2 for ComfyUI, Meta.ai
* Art Nerd Rant: The Orsay Museum & Berthe Morisot
* Model madness: one model / one LoRA
* Two techniques of the week: a Comfy workflow for IPAdapter + LoRA (and more), and using ControlNet line art to influence composition

Available on:

* [Spotify](https://open.spotify.com/show/4RxBUvcx71dnOr1e1oYmvV)
* [iHeartRadio](https://www.iheart.com/podcast/269-this-is-not-an-ai-art-podc-112887791/)

Show notes are always included, with all the visuals, prompts, and technique examples. The format is intended so that you don't have to be looking at your screen -- but the notes have all the imagery, prompts, and details on the processes we look at.

# News

## DALL-E 3 Inpainting

While not even remotely as powerful as Stable Diffusion, it could be handy on the go... NB: probably only for paid accounts.

I tested it with an "i can haz cheezburger" style meme. Generate as per usual, then click into the image... You'll get a brush to paint areas...

![di3](https://hackmd.io/_uploads/r1lsjHnl0.jpg)

Then you can add whatever prose you want about the edits.

![di4](https://hackmd.io/_uploads/B1loiHnxA.jpg)

## Midjourney Character Reference

So! Midjourney has a consistent character feature that's pretty cool. I wonder if it's a form of IPAdapter behind the scenes?

I tried it using a generation that (while it has some proportional and foreshortening problems!!!) does a decent job of representing a character I often portray, Adirondack French Louie. (It's Midjourney generations that I photobashed together and processed in a1111, about a year ago.)

![cref-frenchlouie](https://hackmd.io/_uploads/Sk_9CaTgA.jpg)

```
The French Canadian trapper from the Adirondacks French Louie is riding a horse, 1880s clothing, cinematic still, at dusk, spring, evening --ar 16:9 --cref https://s.mj.run/4IRL35gaF8E
```

![dougbtv_The_French_Canadian_trap (1)](https://hackmd.io/_uploads/HJT30ppgC.jpg)
![dougbtv_The_French_Canadian_trap](https://hackmd.io/_uploads/rJ63AaTg0.jpg)

Cool, not all bad -- I could work with those in general. Now let's put him somewhere we wouldn't expect him...

Let's note that there's some concept bleed! It's picking up on the packbasket. We saw it with the horses, too.

```
The French Canadian trapper from the Adirondacks French Louie is drinking whiskey in the international space station, gleeful, 1880s clothing, 1980s space technology, CRT monitor, astrophotography, cinematic still --ar 16:9 --cref https://s.mj.run/4IRL35gaF8E
```

![dougbtv_The_French_Canadian_trap (3)](https://hackmd.io/_uploads/HyIZ1A6xR.jpg)

So, this next time, let's try with a cropped image...

![cref-frenchlouie2](https://hackmd.io/_uploads/Sy5o106gR.jpg)

```
The French Canadian trapper from the Adirondacks French Louie is drinking whiskey from the bottle in the international space station, gleeful, 1880s clothing, 1980s space technology, CRT monitor, astrophotography, cinematic still --ar 16:9 --cref https://s.mj.run/k1o4QHoW4Vg
```

Way better!
There's still concept bleed -- he's often slouched. In this case it works well for these chill pictures of Louie drinkin' right from the bottle.

![dougbtv_The_French_Canadian_trap (4)](https://hackmd.io/_uploads/rJIWlCTg0.jpg)

## Meta.ai new release

Meta (you know, Facebook) has a new release of [meta.ai](https://www.meta.ai/), and while I don't think it's something to be taken seriously compared to more in-depth tools like Stable Diffusion (and Midjourney, in particular), a few things stand out:

* An interesting "real time preview" of a single seed as you type -- it almost feels like SDXL Lightning-type functionality, with generations in a few steps, since they're delivering these very, very quickly
* Does some animations

And of course...

* Definitely very limiting in what you can output. Don't put in anything even moderately recognizable as a person's name -- it'll quit on you.
* Has that super oversaturated look like DALL-E: overemphasized, over puffed.
* Comes complete with a watermark.

Tools like this have their place, but they're kind of one dimensional.

![image](https://hackmd.io/_uploads/r1pWaA1WA.png)
![image](https://hackmd.io/_uploads/r1ywaCy-R.png)

And of course... spaghetti!

![spaghetti](https://hackmd.io/_uploads/Sk0iT0kbC.jpg)

It'll blur if it doesn't like something, like me mentioning Dom DeLuise.

![image](https://hackmd.io/_uploads/SyUnRRkWR.png)

## And now the real news: IP Adapter v2 node for ComfyUI is out!

* Available on [Github @ cubiq/ComfyUI_IPAdapter_plus](https://github.com/cubiq/ComfyUI_IPAdapter_plus)
* [Video from Latent Vision on Youtube](https://www.youtube.com/watch?v=_JzDcgKgghY)
* [Style + Composition video from Latent Vision](https://www.youtube.com/watch?v=czcgJnoDVd4)

The work on this node is incredible, and in my experience it's a HUGE upgrade from previous IPAdapters for SDXL. It feels really tunable, and it doesn't impede the model as much as the older ones did -- those sometimes felt too literal or something. It's really flexible with the composition & style transfer. Composition isn't perfect, but it's a huge boon for quickly getting compositions together and changing up what can otherwise be boring compositions.

Kudos to [@cubiq](https://github.com/cubiq)!

### And I put together some workflows for it...

* [Comfy workflow available on openart.ai](https://openart.ai/workflows/dougbtv/this-is-not-an-ai-art-podcast-ip-adapter-v2-lora/83pFZkpV8hwIzZMkZBad)

Using the new IP Adapter v2 + [the victorian style LoRA](https://civitai.com/models/202690/victorian-style-lora-xl)

![Screenshot 2024-04-07 132915](https://hackmd.io/_uploads/HJRTcLeg0.png)

### IP Adapter v2 Style & Composition + LoRA + Face detailer + Hand detailer + Upscale

And that wasn't enough for me -- so I doubled down on it. I created a workflow that has...

* IP Adapter v2 -- style and composition
* Random selection of images from a folder, to give you variations in input styles and compositions
* Face detailer: 2 passes
  * Disable the second pass for portraits. Also, play with the `bbox_crop_factor`: lower for stronger correction, higher for less / more blended correction.
* Hand detailer
  * Also uses the face detailer, with a hand model
* Ultimate SD Upscaler
  * Disabled by default

I also downloaded a decent dataset from wikiart for experimentation, available in [the dataset README of the (rather aged) ArtGAN project](https://github.com/cs-chan/ArtGAN/blob/master/WikiArt%20Dataset/README.md), and I picked out pieces by [John Singer Sargent](https://en.wikipedia.org/wiki/John_Singer_Sargent) (one of my personal favorites!)

![comfy-combo-workflow-screencap](https://hackmd.io/_uploads/SJiT2DTxR.jpg)
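If you want the gist of that combo outside of ComfyUI, here's a minimal diffusers sketch of the core pieces: IP-Adapter + a style LoRA + a random reference image pulled from a folder. The local paths and filenames are hypothetical stand-ins, and stock diffusers exposes a single adapter scale rather than the separate style/composition weights of cubiq's node (the detailers and upscaler are omitted entirely) -- so treat this as an approximation of the idea, not the workflow itself.

```python
import random
from pathlib import Path

import torch
from diffusers import AutoPipelineForText2Image
from diffusers.utils import load_image

pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# IP-Adapter reference weights for SDXL
pipe.load_ip_adapter(
    "h94/IP-Adapter", subfolder="sdxl_models", weight_name="ip-adapter_sdxl.bin"
)
pipe.set_ip_adapter_scale(0.7)  # single knob, vs. the node's style/composition split

# Style LoRA -- hypothetical local path and filename
pipe.load_lora_weights("./loras", weight_name="victorian_style_xl.safetensors")

# Random reference image from a folder, like the workflow's random image loader
refs = list(Path("./refs/sargent").glob("*.jpg"))
ref_image = load_image(str(random.choice(refs)))

image = pipe(
    prompt="a victorian portrait, oil on canvas",
    ip_adapter_image=ref_image,
    num_inference_steps=30,
).images[0]
image.save("ipadapter_lora_out.png")
```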
# Art Nerd Rant: Berthe Morisot

I got to go to [the Musée d'Orsay](https://www.musee-orsay.fr/en) ([wikipedia](https://en.wikipedia.org/wiki/Mus%C3%A9e_d%27Orsay)), which is the ultimate in art for the time period I enjoy the most -- roughly 1850–1915, which is bang on for me. It was basically a religious experience. I noticed [Degas' Absinthe Drinker](https://en.wikipedia.org/wiki/L%27Absinthe) from across the exhibit and went over to it in utter awe, and I didn't realize where I was... I was in the middle of a Degas exhibit, and I recognized EVERYTHING in the room. I was literally moved to tears; it was overwhelming.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e8/Edgar_Degas_-_In_a_Caf%C3%A9_-_Google_Art_Project_2.jpg/270px-Edgar_Degas_-_In_a_Caf%C3%A9_-_Google_Art_Project_2.jpg)

Otherwise, it was an awesome experience for building up art references! AI generative art has reignited my passion for art history. The jargon is so relevant! And I appreciate it in a new way. Granted -- I went to the Louvre and didn't see the Mona Lisa -- too long a wait, and I wanted to eat pastries instead.

I found a couple artists I'd really been sleeping on. One piece I was quite taken by was *The Knight of the Flowers* by [Georges Rochegrosse](https://en.wikipedia.org/wiki/Georges_Rochegrosse). This thing is HUGE -- approximately 7ft x 12ft.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/6/61/Le_Chevalier_aux_Fleurs_2560x1600.png/1920px-Le_Chevalier_aux_Fleurs_2560x1600.png)

It's got a lot of accuracy with "just a bit of chill", and it depicts this fantastical scene -- unless of course your day-to-day includes dressing in plate armor surrounded by naked women.

Overall, a major artist I feel I hadn't appreciated as much as I should have is [Berthe Morisot](https://en.wikipedia.org/wiki/Berthe_Morisot).

![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/a8/La_Coiffure_-_Berthe_Morisot.jpg/800px-La_Coiffure_-_Berthe_Morisot.jpg)

I feel like Berthe had some really tender subject matter that I really like. But the reason I'm bringing her up is a technique she used in her `Synthesis` period, 1887–1895...

> Morisot started to use the technique of squaring and the medium of tracing paper to transcribe her drawing to the canvas exactly. By employing this new method, Morisot was able to create compositions with more complicated interaction between figures. She stressed the composition and the forms while her Impressionist brushstrokes still remained. Her original synthesis of the Impressionist touch with broad strokes and light reflections, and the graphic approach featured by clear lines, made her late works distinctive.

There's a lot to unpack here, for me and for us -- check this out... First: using tracing paper. That feels like an easy dig on someone's craft -- "Oh, well you just used tracing paper." No. You've gotta go to the next level. Using tracing paper is using a reference to accurately draft something. Yes, indeed, there's something different about drawing from life and how you interpret form and light, but as a matter of technique -- *genius*.

I also feel like there's an easy dig on craft in our own space: "Oh, you use AI art." No. That's just a tool, just like tracing paper. Using AI art surely allows you to create something quickly by typing in a few words. But that's only where it begins.
Having a further range of techniques, knowledge, vision, and yes, taste, makes a difference. We can iterate on those generations and do more.

I really think DJs have this same thing happening, too. Often what DJs do is even more literal than what we might do with AI art tools: they copy something 1:1 and then mix and manipulate it. It's easy for someone with a very traditional approach to music to take a dig at it.

In fact... It's really interesting that it says:

> Morisot was able to create compositions with more complicated interaction between figures

You know why? Those are HARD with AI/ML tools. Let's do an experiment... Complex interactions... From SDXL:

```
the woman from Vermont is having her hair brushed by her best friend, brushing hair, hairstyling, low light at dusk, in a cottage living room overlooking the mountains of Vermont, film photography by Natalia Drepina
Negative prompt: 3d, render, cgi
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1290370424, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Version: v1.6.1
```

![sdxl-hairbrush](https://hackmd.io/_uploads/HkyNS90eR.jpg)

Not terrible. Like... Now from DALL-E:

![image](https://hackmd.io/_uploads/Bkkvr9CeA.png)

### I've used this technique before...

I generated with DALL-E because I was having trouble getting a woman riding on the shoulders of this guy (it's a well-known piece of regional lore); here's [an illustration of it someone posted on Reddit](https://www.reddit.com/r/Adirondacks/comments/7pwx18/hitch_up_matilda/).

So I got it, but it looks WAY too DALL-E-ish, and I didn't like the old dude, either...

![hitchup-matilda-c1-example](https://hackmd.io/_uploads/S1jv3jCxR.jpg)

The landscape isn't representative of the region either, so I mixed it with an actual photo of the location where the event happened...

![avalance-lake-hitchup-matildas](https://hackmd.io/_uploads/Sya2siRgA.jpg)

And then I bashed those together and inpainted it, and wound up with a final more like:

![hitchup-matilda-final-v1-example](https://hackmd.io/_uploads/rknypiAgC.jpg)

# Model Madness

## Photopedia XL

* [On Civitai](https://civitai.com/models/189109?modelVersionId=259323)

The example workflows have a lot of "cargo cult" kind of stuff: lots of buzzwords and fixes for typical things (`textured skin`, for example). So let's try one of those first -- but I removed the wordy negative (I'm not big on them for SDXL).

I'm a little disappointed with the flappers right out of the gate: I took a sample prompt, slimmed it down a bit, and added "1920s flapper". These are nice portrait renders overall, but... there's not a lot that says "flapper" to me.

```
1920s flapper, inviting decor, RAW photo, (high detailed skin:1.2), 8k UHD, DSLR, soft lighting, high quality, film grain, Fujifilm XT3, RAW candid cinema, 16mm, color graded portra 400 film, remarkable color, ultra realistic, textured skin, remarkable detailed pupils, realistic dull skin noise, visible skin detail, skin fuzz, dry skin, shot with cinematic camera, feminine expressions, photography, 35mm, Nikon D850 film stock photograph, Kodak Portra 400 camera f1.6 lens, 8k, UHD
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 2882338992, Size: 816x1024, Model hash: 3865942aef, Model: photopediaXL_45, Version: v1.6.1
```

![photopediaxl-flapper-1](https://hackmd.io/_uploads/rkr1X3plR.jpg)

So let's try a more slimmed-down one, so we can get more attention on our subject...
Cool, we get way more attention on the actual subject, and the renders aren't worse, per se. I'd say so far these are looking "above average" -- not mind-blowing.

```
the sly 1920s flapper in a dimly lit speakeasy, gorgeous, chicagoland ganster era, color photograph, street photography
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 2769443026, Size: 816x1024, Model hash: 3865942aef, Model: photopediaXL_45, Version: v1.6.1
```

![photopediaxl-flapper-2](https://hackmd.io/_uploads/SkB3XhaeC.jpg)

Let's try another subject... How about some tilt-shift photography of some model trains? Always fun...

```
tilt shift photography of a model trainset in New Hampshire, lakeside town, model trains, steam engine, 4k photo, depth of field
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 492851229, Size: 1024x816, Model hash: 3865942aef, Model: photopediaXL_45, Version: v1.6.1
```

![photopediaxl-trains1](https://hackmd.io/_uploads/ByEY42ax0.jpg)

## Semantic Shift LoRA

* [On Civitai](https://civitai.com/models/347097?modelVersionId=397407)
* [Reddit post](https://www.reddit.com/r/StableDiffusion/comments/1bhk639/2_new_loras_workflow_in_comments/?share_id=Y1LoWjIlpS23M2hE4CUtL&utm_content=1&utm_medium=android_app&utm_name=androidcss&utm_source=share&utm_term=13)

A neat idea for a really artsy LoRA -- it's more conceptual and interesting than it is, like... concrete, in a way. From the author:

> Basically, the goal of this LoRa is to "semantically shift" SDXL such that terms that have a set meaning are entirely changed in an internally consistent manner. I used a technique to do this partially in the Unsettling LoRa, although it was overtrained, and became intrigued by the idea that "good" prompts remain "good," albeit on a different axis, even if the internal understanding of them "shifts" within a given model. In other words: a unique and interesting prompt can create unique and interesting images in multiple new and unique themes if you play with the brain of the model in a directed way.

> How did I do this?

> I found areas of overtraining within SDXL and targeted them. Mona Lisa, Pillars of Creation, etc, and I redirected them to new images. As I suspected, this had ripple effects in the way the entire model perceives the concepts connected to the images modified, and these effects are quite substantial.

This is so dang neat. Let's try some! Some of the sample prompts are just pure chaos and amazing -- it looks fun to prompt:

```
A precise clear vibrant glitch 1980s anime featuring the face of the ein sof, her beauty laid bare, where all things are her and she is everything else, wearing Tel Aviv like a psychedelic dress weaved from human existence. Vaporwave classic anime, by Gustav Klimt in his CRT static period. Sold at auction for 23 million USD
```

Oh yeah, this is fun...
Here's my first prompt:

```
<lora:That_Special_Face_2:0.85> the 1920s flapper in the astral plane, international space station, confronted by the statue of liberty, diametrically opposed, art direction by Frank Lloyd Wright, abstract photography by Marie Antoinette
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1946832649, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1
```

![shift-flapper2](https://hackmd.io/_uploads/S1H0P3TgC.jpg)

Actually, my first one while testing the prompt was even kinda rad:

![shift-flapper1](https://hackmd.io/_uploads/H1fy_nTeA.jpg)

Not all is magic, though -- these turned out pretty lame and predictable...

```
<lora:That_Special_Face_2:0.85> a mechanoid enforcement bot, damaged by electrical shock, DOS game, industrial art by John Singer Sargent
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1727651844, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1
```

![shift-robot1](https://hackmd.io/_uploads/SyrlYh6x0.jpg)

This came out slightly better -- note that I've been using `0.85` strength. It's still oddly too literal for what I prompted (despite it being word soup -- one of my own brand, though, haha).

```
<lora:That_Special_Face_2:0.85> her hair ripples through spacetime, rococo drapery, inspired by Back To the Future, orientalism, photorealistic fauvism, tenebrism depiction of The Peoples Court, UHF television broadcast
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 3602349928, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1
```

![shift-roc-2](https://hackmd.io/_uploads/BygJZaplA.jpg)

I really kinda like the TV one, so I [posted it on the LoRA page](https://civitai.com/images/10124251).

![shift-roc-2-4th](https://hackmd.io/_uploads/S1dZW6alA.jpg)

And I can't help but manipulate it a little -- I did a quick crop to 16:9 to better position the subject, used Photoshop generative fill to fill in the blanks, and then upscaled it with Gigapixel -- < 5 mins of work and a lot of improvement, imo. We have this cool opportunity for a "frame within a frame" with the TV face.

![00052-3602349931-gigapixel-standard-scale-2_00x-jpg](https://hackmd.io/_uploads/BkQPZApeA.jpg)

Let's bump up the strength... I usually start a LoRA low, but let's let 'er rip and go with `1.2`. It does blow up the saturation a bit, which is common at high strength (and with overbaked LoRAs sometimes, not necessarily the case here), but it definitely ups the weirdness and looks more like the sample images.
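(Quick aside: if you're following along in diffusers instead of a1111, the `<lora:That_Special_Face_2:1.2>` prompt token corresponds roughly to loading the LoRA and passing a scale at call time. A minimal sketch, with hypothetical local paths:)

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_single_file(
    "./models/juggernautXL_v9Rundiffusionphoto2.safetensors",  # hypothetical path
    torch_dtype=torch.float16,
).to("cuda")

pipe.load_lora_weights("./loras", weight_name="That_Special_Face_2.safetensors")

# "scale" plays the role of the :1.2 in the a1111 <lora:...> syntax
image = pipe(
    prompt="her hair ripples through spacetime, rococo drapery, ...",
    cross_attention_kwargs={"scale": 1.2},
).images[0]
```

Anyway -- here's the `1.2` run: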
```
<lora:That_Special_Face_2:1.2> her hair ripples through spacetime, rococo drapery, inspired by Back To the Future, orientalism, photorealistic fauvism, tenebrism depiction of The Peoples Court, UHF television broadcast
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1806571039, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1
```

![shift-roc-3](https://hackmd.io/_uploads/r12WM6alA.jpg)

I'm having too much fun, so let's put it through the [psyfi XL](https://civitai.com/models/228162/psyfi-xl) model, which is super tripped out -- should be fun. Aaaand, it turns out looking stereotypically trippy-cheesy.

![shift-roc-4-jpg](https://hackmd.io/_uploads/SkAKsTTlR.jpg)

## Technique of the week: Multiple control nets for compositional control

Overview of the process:

* Converted an asset to line art using the ControlNet line art preprocessor
* Bashed the line art map to create a composition
* Wound up using two levels of line art ControlNet
  * One for steps 0.0 through ~0.3
  * One for steps 0.3 through ~0.6
* Used the image as img2img along with the ControlNet

So, my inspiration was this photo of [Mrs. A.G. Vanderbilt](https://commons.wikimedia.org/wiki/Category:Margaret_Mary_Emerson#/media/File:Mrs._A.G._Vanderbilt_%28Mrs._Margaret_Emerson_McKim%29_LCCN2014686089_%28cropped%29.jpg):

![Mrs._A.G._Vanderbilt_(Mrs._Margaret_Emerson_McKim)_LCCN2014686089_(cropped)](https://hackmd.io/_uploads/ry7NKSmeR.jpg)

I wound up with an initial bash that looked like this:

![emerson-map-bashed-la](https://hackmd.io/_uploads/rJFotH7eC.png)

And then, with a prompt like:

```
a victorian heiress posed on a fauteuil on a lake house porch in the Adirondacks, evening, colorful spring scene, RAW photo, color photography by Marta Bevacqua <lora:gildedvictorians_v1:0.6>
Negative prompt: 3d, render, cgi
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 4.5, Seed: 2057729327, Size: 360x640, Model hash: 879db523c3, Model: dreamshaper_8, Denoising strength: 0.35, ControlNet 0: "Module: none, Model: control_v11p_sd15_lineart [43d4be0d], Weight: 1, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 0.6, Pixel Perfect: True, Control Mode: Balanced, Hr Option: Low res only, Save Detected Map: True", Hires upscale: 2, Hires steps: 15, Hires upscaler: ESRGAN_4x, Lora hashes: "gildedvictorians_v1: f5ff52efa828", Version: v1.6.1
```

Note that I'm using an SD1.5-based model here, dreamshaper_8 -- I eventually used juggernaut final. And I'm using my [victorian style lora](https://civitai.com/models/202690/victorian-style-lora-xl).

...By the way, word of the day: `fauteuil` -- I learned that by having GPT-4 describe the reference image to me.

I wound up with results like:

![em-ex1](https://hackmd.io/_uploads/rkAEcSmlC.jpg)

I still didn't love the composition, and I decided I wanted it to be on a dock...

![emerson-map-bashed-la6](https://hackmd.io/_uploads/Bkph9BmeA.png)

Note how I tried to draw in the background to get a repetition of form between the dock and the sky.

![em-ex2](https://hackmd.io/_uploads/H1nWsSXgA.jpg)

I like this one more, but the sky is cartoony. I was also not having great luck getting a nighttime photo, maybe because of the way the line art ControlNet map was rendered, so... I started to give up on it.
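(Before reworking it, for reference: those a1111 settings translate pretty directly to code if you ever want to script this stage. A minimal diffusers sketch of the single line art ControlNet img2img pass -- the local paths are hypothetical stand-ins, and the hires-fix upscale is left out:)

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline
from diffusers.utils import load_image

lineart = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "./models/dreamshaper_8.safetensors",  # hypothetical local checkpoint path
    controlnet=lineart,
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("./bashed_reference.png")  # the map-bashed starting image
lineart_map = load_image("./lineart_map.png")      # hand-bashed map, no preprocessor ("Module: none")

image = pipe(
    prompt="a victorian heiress posed on a fauteuil on a lake house porch ...",
    image=init_image,
    control_image=lineart_map,
    strength=0.35,             # the a1111 denoising strength
    control_guidance_start=0.0,
    control_guidance_end=0.6,  # release the ControlNet after 60% of the steps
).images[0]
```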
I wanted something more subtle, and I wasn't having great luck, so I started putting together two line art maps and using two ControlNets for the same generation: the first for roughly the first third of the steps, and then the second for the rest of the steps (minus some at the end -- I feel like it always helps to let go of the ControlNet towards the end).

![emerson-map-bashed-la13c](https://hackmd.io/_uploads/HJVciSXlR.png)

So yeah, first just a super duper basic one, with the idea of the composition that I want -- where I want these implied lines to be. Then, one with a bit more detail (modestly!) following the same idea:

![emerson-map-bashed-la13a](https://hackmd.io/_uploads/ry5hsH7eC.png)

And got results like:

![em-ex4](https://hackmd.io/_uploads/SkDm2BQgA.jpg)

Then I ran it back through SDXL with img2img (maybe denoise around `0.4`), with results like:

![em-ex5](https://hackmd.io/_uploads/HyTD3BmlC.jpg)

I then inpainted and touched it up (I think I chose a chair from a different generation, too). And the final...

![em-ex3](https://hackmd.io/_uploads/Bkqk2r7lC.jpg)
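And if you'd rather script the whole two-map trick than click it together in a1111, here's a minimal diffusers sketch of the idea: the same line art ControlNet attached twice with staggered guidance windows (the rough map for the first ~third of the steps, the detailed map up to ~60%, both released before the end), followed by the SDXL img2img pass at ~0.4 denoise. Paths, filenames, and the prompt are hypothetical stand-ins, not my exact settings.

```python
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetImg2ImgPipeline,
    StableDiffusionXLImg2ImgPipeline,
)
from diffusers.utils import load_image

lineart = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_lineart", torch_dtype=torch.float16
)
# The same line art model attached twice, so each copy gets its own schedule
pipe = StableDiffusionControlNetImg2ImgPipeline.from_single_file(
    "./models/juggernaut_final.safetensors",  # hypothetical local checkpoint path
    controlnet=[lineart, lineart],
    torch_dtype=torch.float16,
).to("cuda")

rough_map = load_image("./map_basic.png")      # the super basic composition map
detail_map = load_image("./map_detailed.png")  # the modestly more detailed one

draft = pipe(
    prompt="a victorian heiress on a dock at dusk ...",
    image=load_image("./bashed_reference.png"),
    control_image=[rough_map, detail_map],
    strength=0.35,
    control_guidance_start=[0.0, 0.3],  # rough map steers the first ~third
    control_guidance_end=[0.3, 0.6],    # then the detailed map; both released early
).images[0]

# The SDXL img2img pass (in practice you'd upscale to an SDXL-friendly size first)
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
final = refiner(
    prompt="a victorian heiress on a dock at dusk ...",
    image=draft,
    strength=0.4,
).images[0]
final.save("final.png")
```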