
Intro

Welcome to episode twenty-five! This is your host, Doug Smith. This Is Not An AI Art Podcast is a podcast about, well, AI ART: technology, community, and techniques. The focus is on Stable Diffusion, but all art tools are up for grabs, from the pencil on up, including pay-to-play tools like Midjourney. Less philosophy, more tire kicking. But if the philosophy gets in the way, we'll cover it.

But plenty of art theory!

Today we've got:

  • News: DALL-E 3 Inpainting, IPAdapter v2 for ComfyUI, Meta.ai
  • Art Nerd Rant: The Orsay Museum & Berthe Morisot
  • Model madness: One model / one LoRA
  • Two techniques of the week: a ComfyUI workflow for IPAdapter + LoRA (and more), and using ControlNet line art to influence composition

Available on:

Show notes are always included, with all the visuals, prompts, and technique examples. The format is intended so that you don't have to be looking at your screen; the show notes have all the imagery, prompts, and details on the processes we look at.

News

DALL-E 3 Inpainting

While not even remotely as powerful as Stable Diffusion inpainting, it could be handy on the go.

NB: Probably only for paid accounts.

I tested it with an "i can haz cheezburger"-style meme.

Generate as usual, then click into the image.

You'll get a brush to paint areas to edit.


Then you can add whatever prose about the edits that you want.


Midjourney Character Reference

So! Midjourney has a consistent character feature that's pretty cool.

I wonder if it's a form of IPAdapter behind the scenes?

I tried it using a generation that (while it has some proportional and foreshortening problems!!!) does a decent job of representing a character I often portray, Adirondack French Louie.

(These are Midjourney generations that I then photobashed together and processed in A1111 about a year ago.)


The French Canadian trapper from the Adirondacks French Louie is riding a horse, 1880s clothing, cinematic still, at dusk, spring, evening --ar 16:9 --cref https://s.mj.run/4IRL35gaF8E


Cool, not all bad; I could work with those in general. Now let's put him somewhere we wouldn't expect him.

Let's note that there's some concept bleed! It's picking up on the packbasket. We saw it with the horses too.

The French Canadian trapper from the Adirondacks French Louie is drinking whiskey in the international space station, gleeful, 1880s clothing, 1980s space technology, CRT monitor, astrophotography, cinematic still --ar 16:9 --cref https://s.mj.run/4IRL35gaF8E


So this time, let's try a cropped image:


The French Canadian trapper from the Adirondacks French Louie is drinking whiskey from the bottle in the international space station, gleeful, 1880s clothing, 1980s space technology, CRT monitor, astrophotography, cinematic still --ar 16:9 --cref https://s.mj.run/k1o4QHoW4Vg

Way better! There's still concept bleed: he's often slouched. In this case it works well for these chill pictures of Louie drinkin' right from the bottle.


Meta.ai new release

Meta (you know, Facebook) has a new release of meta.ai. I don't think it's something to be taken seriously compared to more in-depth tools like Stable Diffusion (and Midjourney, in particular), but a few things stand out:

  • Interesting "real time preview" of a single seed as you type; it almost feels like SDXL Lightning-style functionality with generations in a few steps, as they're delivering these very, very quickly
  • Does some animations

And of course

  • Definitely very limiting in what you can output. Don't put in anything even moderately recognizable as a person's name; it'll quit on you.
  • Has that super oversaturated look like DALL-E: overemphasized, over-puffed.
  • Comes complete with a watermark.

Tools like this have their place, but they're kind of one dimensional.



It'll blur if it doesn't like something, like me mentioning Dom Deluise.


And now the real news: IP Adapter v2 node for ComfyUI is out!

The work on this node is incredible, and it's a HUGE upgrade from previous IP Adapters for SDXL in my experience.

It feels really tunable, and it doesn't impede the model as much as earlier versions did, which could feel too literal at times. Really flexible with the composition & style transfer.

Composition isn't perfect but it's a huge boon for quickly getting compositions together and changing up what can otherwise be boring compositions.

Kudos to @cubiq!

And I put together some workflows for it

Using the new IP Adapter v2 + the victorian style LoRA


IP Adapter v2 Style & Composition + LoRA + Face detailer + Hand detailer + Upscale

And that wasn't enough for me, so I doubled down on it. I created a workflow that has:

  • IP Adapter v2 style and composition
  • Random selection of images from a folder to give you variations for input styles and compositions
  • Face detailer: 2 passes
    • Disable the second pass for portraits; also, play with the bbox_crop_factor: lower for stronger correction, higher for a lighter, more blended correction.
  • Hand detailer
    • Also using face detailer, with a hand model
  • Ultimate SD Upscaler
    • Disabled
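
The random-image step above can be sketched in a few lines outside ComfyUI. This is a hypothetical helper (the function name and defaults are mine, not from the workflow) for pulling a random style/composition reference out of a folder:

```python
import random
from pathlib import Path

def pick_reference_image(folder, exts=(".png", ".jpg", ".jpeg", ".webp"), seed=None):
    """Pick one random image from a folder to feed IP Adapter as a
    style/composition reference; a fixed seed makes the pick repeatable."""
    rng = random.Random(seed)
    candidates = [p for p in sorted(Path(folder).iterdir())
                  if p.suffix.lower() in exts]
    if not candidates:
        raise FileNotFoundError(f"no reference images in {folder}")
    return rng.choice(candidates)
```

Re-running with a different seed (or none) gives the same kind of run-to-run variation in input styles and compositions that the ComfyUI node provides.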

I also downloaded a decent dataset from WikiArt, available in the dataset README of the (rather aged) ArtGAN project, for experimentation. I picked out pieces by John Singer Sargent (one of my personal favorites!).


Art Nerd Rant: Berthe Morisot

I got to go to the Musée d'Orsay (wikipedia), which is the ultimate in art for the time period I enjoy most, ~1850–1915. Bang on for me.

It was basically a religious experience. I noticed Degas' Absinthe Drinker from across the exhibit and went over to it in utter awe, and I didn't realize where I was: I was in the middle of a Degas exhibit, and I recognized EVERYTHING in the room. I was literally moved to tears; it was overwhelming.


Otherwise, it was an awesome experience of building up art references! AI generative art has reignited my passion for art history. The jargon and terminology are so relevant! And I appreciate it in a new way.

Granted, I went to the Louvre and didn't see the Mona Lisa: too long to wait, and I wanted to eat pastries instead.

I found a couple of artists I'd really been sleeping on. One piece I was quite taken by was The Knight of the Flowers by Georges Rochegrosse. This thing is HUGE, approximately 7 ft x 12 ft.

It's got a lot of accuracy with "just a bit of chill," and it depicts this fantastic scene (unless, of course, your day-to-day includes dressing in plate armor surrounded by naked women).

Overall, a major artist I feel I hadn't appreciated as much as I should've is Berthe Morisot.

Berthe's subject matter is tender in a way I really like.

But the real reason I'm bringing up Berthe is a technique she used in her Synthesis period, 1887–1895:

Morisot started to use the technique of squaring and the medium of tracing paper to transcribe her drawing to the canvas exactly. By employing this new method, Morisot was able to create compositions with more complicated interaction between figures. She stressed the composition and the forms while her Impressionist brushstrokes still remained. Her original synthesis of the Impressionist touch with broad strokes and light reflections, and the graphic approach featured by clear lines, made her late works distinctive.

There's a lot to unpack here for me, and for us. Check this out first:

Using tracing paper. I feel like that's an easy dig at someone's craft: "Oh, well, you just used tracing paper." No. You've got to go to the next level. Using tracing paper is using a reference to accurately draft something. Yes, indeed, there's something different about drawing from life and how you interpret form and light, but as a matter of technique, it's genius.

I also feel like it's an easy dig on craft in our own space, "Oh you use AI art". No. That's just a tool, just like tracing paper. Using AI art surely allows you to create something quickly by typing in a few words. But that's only where it begins. Having a further range of techniques, knowledge, vision and yes, taste, makes a difference. We can iterate on those generations and do more.

I really think that DJs have this same thing happening, too. Often what DJs do is even more literal than what we might do with AI art tools: they copy something 1:1 and then mix and manipulate it. It's easy for someone with a very traditional approach to music to take a dig at it.

In fact, it's really interesting that it says:

Morisot was able to create compositions with more complicated interaction between figures

You know why? Those are HARD with AI/ML tools.

Let's do an experiment

Complex interactions From SDXL

the woman from Vermont is having her hair brushed by her best friend, brushing hair, hairstyling, low light at dusk, in a cottage living room overlooking the mountains of Vermont, film photography by Natalia Drepina
Negative prompt: 3d, render, cgi
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1290370424, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Version: v1.6.1

sdxl-hairbrush

Not terrible.

Now from DALL-E


I've used this technique before

I generated with DALL-E because I was having trouble getting a woman riding on the shoulders of this guy (it's a well-known piece of regional lore); here's an illustration of it someone posted on Reddit:

So I got it, but it looks WAY too DALL-E-ish, and I didn't like the old dude either.

hitchup-matilda-c1-example

The landscape isn't representative of the region either, so I mixed it with an actual location where the event happened:

avalance-lake-hitchup-matildas

And then I bashed those and inpainted it, and wound up with a final more like:

hitchup-matilda-final-v1-example

Model Madness

Photopedia XL

Example workflows have a lot of "cargo cult" kind of stuff: lots of buzzwords and fixes for typical issues (textured skin, for example). So let's try one of those first, but with the wordy negative prompt removed (I'm not big on them for SDXL).

I'm a little disappointed with the flappers right out of the gate, using a sample prompt slimmed down with "1920s flapper" added. These are nice portrait renders overall, but there's not a lot that says "flapper" to me.

1920s flapper, inviting decor, RAW photo, (high detailed skin:1.2), 8k UHD, DSLR, soft lighting, high quality, film grain, Fujifilm XT3, RAW candid cinema, 16mm, color graded portra 400 film, remarkable color, ultra realistic, textured skin, remarkable detailed pupils, realistic dull skin noise, visible skin detail, skin fuzz, dry skin, shot with cinematic camera, feminine expressions, photography, 35mm, Nikon D850 film stock photograph, Kodak Portra 400 camera f1.6 lens, 8k, UHD
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 2882338992, Size: 816x1024, Model hash: 3865942aef, Model: photopediaXL_45, Version: v1.6.1

photopediaxl-flapper-1

So let's try a more slimmed-down one so we can get more attention on our subject.

Cool, we get way more attention on the actual subject, and the renders aren't worse per se.

I'd say so far these are looking "above average", not mind blowing.

the sly 1920s flapper in a dimly lit speakeasy, gorgeous, chicagoland ganster era, color photograph, street photography
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 2769443026, Size: 816x1024, Model hash: 3865942aef, Model: photopediaXL_45, Version: v1.6.1

photopediaxl-flapper-2

Let's try another subject

How about some tilt shift photography of some model trains? Always fun.

tilt shift photography of a model trainset in New Hampshire, lakeside town, model trains, steam engine, 4k photo, depth of field
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 492851229, Size: 1024x816, Model hash: 3865942aef, Model: photopediaXL_45, Version: v1.6.1

photopediaxl-trains1

Semantic Shift LoRA

A neat idea for a really artsy LoRA; it's more conceptual and interesting than concrete, in a way.

From the author:

Basically, the goal of this LoRa is to "semantically shift" SDXL such that terms that have a set meaning are entirely changed in an internally consistent manner. I used a technique to do this partially in the Unsettling LoRa, although it was overtrained, and became intrigued by the idea that "good" prompts remain "good," albeit on a different axis, even if the internal understanding of them "shifts" within a given model. In other words: a unique and interesting prompt can create unique and interesting images in multiple new and unique themes if you play with the brain of the model in a directed way.

How did I do this?

I found areas of overtraining within SDXL and targeted them. Mona Lisa, Pillars of Creation, etc, and I redirected them to new images. As I suspected, this had ripple effects in the way the entire model perceives the concepts connected to the images modified, and these effects are quite substantial.

This is so dang neat.

Let's try some!

Some of the prompts are just pure chaos, and amazing; it looks fun to prompt:

A precise clear vibrant glitch 1980s anime featuring the face of the ein sof, her beauty laid bare, where all things are her and she is everything else, wearing Tel Aviv like a psychedelic dress weaved from human existence. Vaporwave classic anime, by Gustav Klimt in his CRT static period. Sold at auction for 23 million USD

Oh yeah, this is fun. Here's my first prompt:

<lora:That_Special_Face_2:0.85> the 1920s flapper in the astral plane, international space station, confronted by the statue of liberty, diametrically opposed, art direction by Frank Lloyd Wright, abstract photography by Marie Antoinette
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1946832649, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1

shift-flapper2

Actually, my first one while testing the prompt was even kinda rad:

shift-flapper1

Not all is magic, though; these turned out pretty lame and predictable:

<lora:That_Special_Face_2:0.85> a mechanoid enforcement bot, damaged by electrical shock, DOS game, industrial art by John Singer Sargent
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1727651844, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1

shift-robot1

This came out slightly better; note that I've been using 0.85 strength.

It's still oddly too literal for what I prompted for (despite it being a word soup, one of my own brand though haha)

<lora:That_Special_Face_2:0.85> her hair ripples through spacetime, rococo drapery, inspired by Back To the Future, orientalism, photorealistic fauvism, tenebrism depiction of The Peoples Court, UHF television broadcast
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 3602349928, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1

shift-roc-2

I really kinda like the TV one, so I posted it on the LoRA page.

shift-roc-2-4th

And I can't help but manipulate it a little. I did a quick crop in 16:9 format to better position the subject, used Photoshop generative fill to fill in the blank, and then upscaled it with Gigapixel. Under 5 minutes of work and a lot of improvement, IMO. We have this cool opportunity for a "frame within a frame" with the TV face.
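
The crop itself is simple arithmetic. Here's a hypothetical helper (the name and the `anchor` parameter are mine, purely illustrative) that computes the 16:9 crop box; the result could be passed straight to Pillow's `Image.crop`:

```python
def crop_box(width, height, ratio=16 / 9, anchor=0.5):
    """Largest crop of the given aspect ratio as a (left, top, right, bottom)
    box. `anchor` in [0, 1] slides the window along the cropped axis
    (0 = top/left, 1 = bottom/right) to reposition the subject."""
    if width / height > ratio:                # too wide: trim the sides
        new_w = round(height * ratio)
        left = round((width - new_w) * anchor)
        return (left, 0, left + new_w, height)
    new_h = round(width / ratio)              # too tall: trim top/bottom
    top = round((height - new_h) * anchor)
    return (0, top, width, top + new_h)

# A 1024x816 generation cropped to 16:9, centered vertically:
box = crop_box(1024, 816)  # (0, 120, 1024, 696)
```

Sliding `anchor` toward 0 or 1 is the "position the subject" part; the generative fill and upscale steps then happen in Photoshop and Gigapixel as described.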

00052-3602349931-gigapixel-standard-scale-2_00x-jpg

Let's bump up the strength. I usually start a LoRA low, but let's let 'er rip and go with 1.2.

It does blow up the saturation a bit, which is common at high strength (and sometimes a sign of overbaking, though not necessarily here), but it definitely ups the weirdness and looks more like the sample images.

<lora:That_Special_Face_2:1.2> her hair ripples through spacetime, rococo drapery, inspired by Back To the Future, orientalism, photorealistic fauvism, tenebrism depiction of The Peoples Court, UHF television broadcast
Negative prompt: 3d, render, cgi, painting, monochrome
Steps: 45, Sampler: DPM++ 2M Karras, CFG scale: 4, Seed: 1806571039, Size: 1024x816, Model hash: c9e3e68f89, Model: juggernautXL_v9Rundiffusionphoto2, Lora hashes: "That_Special_Face_2: dd5b32ebea7e", Version: v1.6.1

shift-roc-3

I'm having too much fun, so let's put it through the Psyfi XL model, which is super tripped out; this should be fun.

Aaaand, it turns out looking stereotypically trippy and cheesy.

shift-roc-4-jpg

Technique of the week: Multiple control nets for compositional control

Overview of the process:

  • Converted an asset using ControlNet line art
  • Map-bashed the line art to create a composition
  • Wound up using two layers of line art ControlNet
    • One for steps 0.0 through ~0.3
    • One for steps 0.3 through ~0.6
  • Used the resulting image as the img2img input
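
To make the two step windows concrete, here's a small sketch (my own illustrative helper, not part of any UI) of how guidance start/end fractions translate into sampler step ranges:

```python
def guidance_window(start_frac, end_frac, total_steps):
    """Convert ControlNet guidance start/end fractions into sampler step
    indices: the ControlNet is active from `start` (inclusive) to `end`
    (exclusive)."""
    return round(start_frac * total_steps), round(end_frac * total_steps)

# Two layered line art ControlNets over a 35-step generation:
# a rough map early for the broad composition, a more detailed map
# after it, and no ControlNet at all for the last ~40% of steps.
rough = guidance_window(0.0, 0.3, 35)     # active for steps 0-9
detailed = guidance_window(0.3, 0.6, 35)  # active for steps 10-20
```

Closing the second window well before the final step (0.6 here) is what lets the model "let go" of the ControlNet toward the end.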

So my inspiration was this photo of [Mrs. A.G. Vanderbilt](https://commons.wikimedia.org/wiki/Category:Margaret_Mary_Emerson#/media/File:Mrs._A.G._Vanderbilt_(Mrs._Margaret_Emerson_McKim)_LCCN2014686089_(cropped).jpg):
Mrs._A.G._Vanderbilt_(Mrs._Margaret_Emerson_McKim)_LCCN2014686089_(cropped)

I wound up with an initial bash that looked like this:

emerson-map-bashed-la

And then with a prompt like:

a victorian heiress posed on a fauteuil on a lake house porch in the Adirondacks, evening, colorful spring scene, RAW photo, color photography by Marta Bevacqua <lora:gildedvictorians_v1:0.6>
Negative prompt: 3d, render, cgi
Steps: 35, Sampler: DPM++ 2M Karras, CFG scale: 4.5, Seed: 2057729327, Size: 360x640, Model hash: 879db523c3, Model: dreamshaper_8, Denoising strength: 0.35, ControlNet 0: "Module: none, Model: control_v11p_sd15_lineart [43d4be0d], Weight: 1, Resize Mode: Crop and Resize, Low Vram: False, Guidance Start: 0, Guidance End: 0.6, Pixel Perfect: True, Control Mode: Balanced, Hr Option: Low res only, Save Detected Map: True", Hires upscale: 2, Hires steps: 15, Hires upscaler: ESRGAN_4x, Lora hashes: "gildedvictorians_v1: f5ff52efa828", Version: v1.6.1

Note that I'm using an SD1.5-based model, dreamshaper_8, here; I eventually used Juggernaut final.

Using my victorian style lora

By the way, word of the day: fauteuil. I learned that by having GPT-4 describe the reference image to me.

Wound up with results like:

em-ex1

I still didn't love the composition, and I decided I wanted it to be on a dock

emerson-map-bashed-la6

Note how I tried to draw in the background to get a repetition of form between the dock and the sky.

em-ex2

I like this one more, but the sky is cartoony. I was also not having great luck getting a nighttime photo, maybe because of the way the line art ControlNet map was rendered, so I started to give up on it.

I wanted something more subtle, and I wasn't having great luck, so I started putting together two line art maps and using two ControlNets for the same generation: the first for about a third of the steps, then the second for the rest (minus some at the end; I feel like it always helps to let go of the ControlNet toward the end).

emerson-map-bashed-la13c

So yeah, first a super duper basic one, with the idea of the composition that I want and where I want these implied lines to be:

Then, one with a bit more detail (modestly!) following the same idea

emerson-map-bashed-la13a

And got results like:

em-ex4

Then I ran it back through SDXL with img2img (maybe denoise around 0.4), with results like:

em-ex5

I then inpainted and touched it up (I think I chose a chair from a different generation, too).

And the final:

em-ex3