Welcome to episode twenty-four! This is your host, Doug Smith. This Is Not An AI Art Podcast is a podcast about, well, AI ART – technology, community, and techniques. With a focus on Stable Diffusion, but all art tools are up for grabs, from the pencil on up, including pay-to-play tools like Midjourney. Less philosophy – more tire kicking. But if the philosophy gets in the way, we'll cover it.
But plenty of art theory!
Today we've got:
Available on:
Show notes are always included, with all the visuals, prompts, and technique examples. The format is intended so that you don't have to be looking at your screen – but the show notes have all the imagery, prompts, and details on the processes we look at.
https://stability.ai/news/introducing-stable-cascade
Which introduces a new architecture, and while I'm not super versed on the internals, I do understand that it uses a three-model architecture: two models for decoding, and one model for generating.
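The flow described above can be sketched roughly like this (the stage functions here are stand-in stubs I made up to show the pipeline shape, not Stability's actual API):

```python
# Conceptual sketch of Stable Cascade's three-model flow.
# These stage functions are illustrative stubs, NOT the real API.

def stage_c(prompt):
    """Generator: turns the prompt into a highly compressed latent."""
    return ("latent_c", prompt)

def stage_b(latent_c):
    """First decoder: expands the compressed latent."""
    return ("latent_b", latent_c)

def stage_a(latent_b):
    """Second decoder: turns the expanded latent into the final image."""
    return ("image", latent_b)

def generate(prompt):
    # Prompt goes through the generator once, then two decoding stages.
    return stage_a(stage_b(stage_c(prompt)))
```

The point is just the ordering: one generation step produces a tiny latent, and the two decoders do the heavy lifting of getting back to pixels.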
One of the main things this means is better prompt coherence, like DALL-E is praised for. Maybe not at the same level, but in some of my initial tests it really feels like it's got a very accurate take on what you're prompting for.
And it seems to get text – sometimes.
And I've got some notes on how I got it going later in the show.
Announcement @ https://stability.ai/news/stable-diffusion-3
The model is apparently only half-way finished cooking.
Good news is it sounds like it's going to be feasible to run on 16+ gigs of VRAM (and likely on some lower-VRAM cards after a while, thanks to optimizations).
It's in a very early access preview. I'd venture to guess with a stability membership there's probably a way to try it out. (I'm sure they're antsy to get some RLHF, too. At least MJ seems to be when they're cooking new models)
from /r/stablediffusion "the art of prompt engineering".
There's a 4chan movement that's putting clothes back on people using AI, "DignifAI". Ironically enough.
Probably the best pixel art LoRA that I've used so far. Well done! I can definitely see myself trying this out now and again. I don't have big usage for it in my own projects, but pixel art and isometric art are something I find satisfying to play with.
Using my default comfy workflow, LoRA at 0.65
and at 0.80
At 0.80
At 0.60
Interesting that it's been trained by using GPT-4 to write captions for the dataset images! I really like this idea of the combination.
I'll bet it gives better captions than just CLIP captioning alone. Let's see how it fares.
Oh yeah, and I want to try it myself. I found this post about a GPT-4 Vision tagger, which is available on GitHub @ https://github.com/jiayev/GPT4V-Image-Captioner
Using my default comfy workflow
One of the example prompts from the civitai gallery:
I found a post on /r/stablediffusioninfo about how to emulate a particular style that, to me, looks like some pop art. But it's kinda neat and unique, and decidedly weird, which I like. So I decided I'd try my hand at it and look at how I would approach it…
Ok, so first, I interrogated this image:
And I got:
I don't love this interrogation at all. Let's run it anyway…
I can't help but modify it, so I ran it through SDXL using the Juggernaut XL v8 model; here's what I got:
Honestly not that bad for a first crack. Maybe I'll take a few items from it to use as I progress.
Next I'm going to see how well I do with ControlNet reference, but I've gotta use SD 1.5 for it. That's fine, I had great output with it. It doesn't bother me to use SD 1.5 when I need ControlNet (I'm having terrible luck with SDXL + ControlNet so far).
Same thing but with SD 1.5, using Juggernaut Final
I'm maybe going to change the prompt and add a ControlNet reference using the original photo…
I also did a little homework and found a reference artist who I think will work for this style, mostly I searched for "pop art photographers" and this was the first thing I found that I thought was cool, Aleksandra Kingo: https://www.aleksandrakingo.com/ (which I found from this blog article about pop art influenced photographers)
Getting way better. Now let's add my own thing to it, which is, let's do a Bond Girl.
It's a really cool-to-me kind of thing, and it captures some of the essence of the original piece. But there's a punky avant-garde thing that it's just not quite getting.
Another thing is that I think we're going to be kinda influenced by the colors of the original. For this exercise I'm OK with that, but we could tweak it by generating new images with a different opinion on color in the prompt, and then using those images as the ControlNet reference. But for now, let's just stick with it. So, you'll keep seeing these colors.
Let's add a fashion designer to the mix and take out the bond girl.
I'm using midlibrary.io @ https://midlibrary.io/categories/fashion-designers to shop around, and I like… Sonia Rykiel.
I interrogated it there, with /describe
and I got:
I tried my hand at one… with just a prompt.
But I wanted to try the new "style reference" feature of MJ, --sref http://url/to/image. I'll bet it works something like IP Adapter and/or ControlNet Reference.
So I used a prompt like…
I tried to use my own weirdness flair. It kinda came through, but we can see the style being applied without anything prompting for it.
Now let's take it a layer deeper and try to prompt for some of the style as well. Let's see how this does…
And last but not least, I tried IP Adapter with SDXL in ComfyUI. Well… It's almost too much like the original in some weird way, but it does the trick.
And the prompt is included (w00t)!
Found this user from a comment on the show notes (thanks!).
I had posted about the technique of the week on reddit, so, this actually winds up matching.
Really well selected generation. It's an incredible render, great resolution, and there are no obvious rendering mistakes.
I chose this one in part because it's a really similar kind of subject matter to what I'd pick for my main project – except probably not robots. So, I'm envious of it.
Really cool subject matter and narrative, render turned out awesome overall.
There are nitpicks to touch up: funky parts of the chair (the chair + railing is weird too), and the hands are notable. I don't love "the can" near the robot's feet; it's not doing anything for the narrative or composition. The robot's feet could use a "hint" of the foot that's behind the first.
I like that the cabin / porch is wired up. I'd keep it or emphasize it probably.
Composition is meh, like, the subject is totally centered basically. There could be something more here to draw your eye around the piece.
This turned out really cinematic and it's cool, it's got a lot of story going on here. I really like it. Nice choice of aspect ratio, pushes the cinematic look.
A few things I'd probably touch up are… I'd change out the birds, for one. Like, the birds are "just off" to me. I'd try messing around with inpainting them with the denoise on the higher side.
The sword is strange to me, proportionally. I'd almost rather see the tip just hit the frame or just go off it.
There's a really nice hint at these kind of "particles" of feathers flying around. I'd push that; it's working really well and I think it could use more of it – between the bird shapes and the feathers, there's a real opportunity to repeat form and really dial in the depth of this image.
I also bet if you positioned the woman at the 2/3 mark towards the right, or aimed for a golden ratio, you could really push the composition compared to having the subject equally centered. There's a good enough offset of symmetrical balance that it looks OK, though.
It's just so close to hitting the next level, good start though for sure.
A little old but I like Sebastian: https://www.youtube.com/watch?v=KTPLOqAMR0s
Also I just used the README and easy install on github: https://github.com/comfyanonymous/ComfyUI?tab=readme-ov-file#windows
And then I also installed comfy ui manager: https://github.com/ltdrdata/ComfyUI-Manager
And I also manipulated my model folders to use symbolic links to my Stable Diffusion model and LoRA folders; I used a hint from this GitHub issue.
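The same symlink idea can be sketched in Python; a minimal sketch, assuming a shared models folder elsewhere on disk (the paths and folder names are just examples, not my actual layout):

```python
import os
from pathlib import Path

def link_model_folder(shared_dir: str, comfy_dir: str) -> Path:
    """Symlink an existing shared model folder into ComfyUI's models dir.

    shared_dir: where the checkpoints/LoRAs actually live (example path).
    comfy_dir:  the ComfyUI folder that should point at it.
    """
    link = Path(comfy_dir)
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists; remove it first")
    # On Windows, creating symlinks may require Developer Mode or admin
    # rights; mklink /D does the equivalent from a shell.
    os.symlink(Path(shared_dir).resolve(), link, target_is_directory=True)
    return link
```

This way one copy of the checkpoints serves every UI that knows the folder.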
I have three goals:
Workflow examples:
Kinda rolling the dice on this, but starting with this example: https://comfyworkflows.com/workflows/0b69c625-d9c5-48b9-8c38-bf069f2c8cd5
And I use Comfy UI manager to install the missing nodes.
And the ComfyUI styles node is missing, so I have to dig into that; I find out what the story is on Reddit and install comfyui-styles-all.
Voila!
I have published my default workflow @ https://openart.ai/workflows/qosagwok5wpPZEJ8qma0
I'm following Olivio's guide on MeshGraphormer with ComfyUI
So I install: https://github.com/Fannovel16/comfyui_controlnet_aux from manager, even though it warns me not to.
Then I grab: https://huggingface.co/hr16/ControlNet-HandRefiner-pruned/blob/main/control_sd15_inpaint_depth_hand_fp16.safetensors
And I load Olivio's comfy workflow and install the missing nodes.
It's… Pretty good. I'm a little disappointed to find out it's SD 1.5 focused. But, I'm starting to wonder if I could gen original images with SDXL and then pass them through a modified version of this workflow, and just have the auto hand inpainting done with SD 1.5.
I looked into potentially adapting it to SDXL, and it's non-trivial. The model it uses is trained at 512x512, so that's limiting.
It works pretty well! Result from Olivio's workflow.
With the prompt:
And Juggernaut Final (for SD 1.5)
Before
After
(I recommend the next method, although I followed the node install method from this video)
Following another Video from Olivio…
I installed https://github.com/kijai/ComfyUI-DiffusersStableCascade via comfyui manager "install from git URL"
That wasn't enough; I also had to manually pip install the requirements.txt (see Olivio's video for the command, I closed the window, sorry!)
I wound up with an error, of course:
I found this discussion: https://huggingface.co/stabilityai/stable-cascade/discussions/27
So I edited "C:\Users\doug\.cache\huggingface\hub\models--stabilityai--stable-cascade\snapshots\e3aee2fd11a00865f5c085d3e741f2e51aef12d3\decoder\config.json"
in my case.
And then modifying "c_in": 4
to "in_channels": 4
(and then I restarted comfy)
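If you'd rather script that edit than do it by hand, a minimal sketch of the key rename (the snapshot path varies per install, so it's taken as an argument):

```python
import json
from pathlib import Path

def rename_config_key(config_path: str,
                      old: str = "c_in",
                      new: str = "in_channels") -> bool:
    """Rename a top-level key in a decoder config.json, keeping its value.

    Returns True if the key was found and renamed, False otherwise.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    if old not in config:
        return False  # nothing to do (maybe already fixed)
    config[new] = config.pop(old)
    path.write_text(json.dumps(config, indent=2))
    return True
```

Point it at your own snapshot's decoder/config.json, then restart Comfy.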
And guess what? It worked for me, and I tried some text with it to see what everyone was raving about…
It's not perfect, but it does work!
I think something is up with my installation still. I'm getting some weird "swirliness" that I don't know what to attribute it to. Seeing it's a research preview, I'm not too fussed about it.
That swirliness was a problem – I think there's no VAE step here, so…
On second thought, I think I have something wrong with my installation, and I need another reference. So I use this YouTube video by "How Do?" as a second attempt.
First: Update comfyui (hopefully mine is new enough)
I go to pick up models from: https://huggingface.co/stabilityai/stable-cascade/tree/main
Download:
stage_a.safetensors
into ./models/vae
stage_b_bf16.safetensors
into ./models/unet
stage_c_bf16.safetensors
into ./models/unet
text_encoder/model.safetensors
into ./models/clip
(Lite versions for low VRAM, and optionally stage_b (non-bf16))
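To keep the file-to-folder mapping straight, here's a little sketch of it (the folder layout is ComfyUI's default; the dict just mirrors the list above):

```python
from pathlib import Path

# Filenames from the stabilityai/stable-cascade repo listing above,
# mapped to ComfyUI's default model folders.
CASCADE_FILES = {
    "stage_a.safetensors": "models/vae",
    "stage_b_bf16.safetensors": "models/unet",
    "stage_c_bf16.safetensors": "models/unet",
    "text_encoder/model.safetensors": "models/clip",
}

def destination(comfy_root: str, repo_file: str) -> Path:
    """Return where a downloaded repo file should land inside ComfyUI."""
    folder = CASCADE_FILES[repo_file]
    # Only the bare filename goes into the target folder.
    return Path(comfy_root) / folder / Path(repo_file).name
```

So stage A is treated as the VAE, stages B and C go in unet, and the text encoder lands in clip.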
Download the workflow @ https://comfyworkflows.com/workflows/15b50c1e-f6f7-447b-b46d-f233c4848cbc
OH Yeah, this is working MUCH better.
For the prompt:
Rather smooth but a number of steps for model download: https://github.com/cubiq/ComfyUI_IPAdapter_plus
Installed that and the models, then I chose their SDXL example from their examples and started with that
The main thing I ran into was that:
Dang, let's try it for that pop art piece for technique of the week
From this paper: https://huggingface.co/papers/2401.07519
And we'll use this repo: https://github.com/cubiq/ComfyUI_InstantID
And that author also has a YT video about it (awesome video, like most authors, he's super deep into it and rips through some stuff very fast)
And I wound up referencing this other video about installing instant id (from the same cubiq repo)
You can install it via manager, but there's another step.
It's not hard, just tedious, to download all the models and put them where they need to be; you can follow the installation section of the README.
I was still having failures, so I did a comfy manager "update all", even though my install is just a few days old at this point.
I'm still getting:
insightface is required, but the README doesn't detail how to install it.
So I went looking for tips…
Which had me download a .whl file according to my Python version and then install it from the ComfyUI dir.
That did the trick.
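The part that trips people up is matching the wheel to your Python version; the cpXY tag that a matching wheel filename should contain can be computed like this (the tag scheme is standard CPython convention; the actual insightface wheel names vary by release):

```python
import sys

def wheel_cp_tag() -> str:
    """Return the CPython tag (e.g. 'cp311' for Python 3.11) that a
    matching prebuilt wheel's filename should contain."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"
```

Run it with the same Python that your ComfyUI install uses (the embedded one, if you used the portable build), since that's the interpreter the wheel has to match.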
I started with InstantID_IPAdapter.json from the cubiq/ComfyUI_InstantID repo.
Then I extended it to add 3 images
Here's the dude himself, French Louie
And some output, from:
and for: