# Let's go

## Starting off strong

### **What you need to know to build this.**

You don't need to know *that* much to build this thing. I'll give you everything you need to actually make it, and even go beyond that. **But you should at least know how to code in one language and how to use the command line. Knowing JS and React helps a lot as well.**

- If you don't know how to code in any language at all, my recommendation is you go [here](https://scrimba.com/learn/learnjavascript?utm_source=buildspace.so&utm_medium=buildspace_project) first. **Our specialty at buildspace isn't teaching people to code from scratch** -- there are much better platforms for that.
- If you know how to code a *little* but aren't that confident, see how far you get in this build! If you have issues or see obvious gaps in your knowledge, don't worry. Just Google it, or go back to the basics. We've also got a massive Discord community of builders that will be with you every step of the way!
- If you know how to code in any language but have never used React -- you'll be just fine. If you end up really struggling w/ React, you can learn it [here](https://scrimba.com/learn/learnreact?utm_source=buildspace.so&utm_medium=buildspace_project).
- If you're a pro, I better see you at the end with a really fucking cool product. Inspire the rest of us!

### **All our content is fully open-source.**

All of the content you see here is 100% open-source. We've had over 1000 PRs merged into our repo. If you see an issue in the content or want to make an addition, just make a PR, and once it's accepted it'll be instantly updated here for everyone. Tech moves quick, and it's because of the beautiful people making PRs that this content always stays up to date.

**Please check out the repo [here](https://github.com/buildspace/buildspace-projects?utm_source=buildspace.so&utm_medium=buildspace_project)** -- give it a star if you're feeling nice! Helps us out a lot.

### How to get help

You'll probably run into some roadblocks while building. That's why we have a mod team whose main goal is to help you get through any issues so you can actually ship this thing.

Whenever you run into something, you can hit the mods up directly in the Discord by doing `@mods` to ping them. You can also run `/question`, which will alert the mods with your question.

**Please keep all discussions around this build in the help sections of the `BUILD AN AI WRITER` section on the Discord. If you start asking questions in random channels, your question will most likely not get answered lol.** If you don't have access to these channels, relink your Discord [here](https://buildspace.so/p/build-ai-avatars?utm_source=buildspace.so&utm_medium=buildspace_project).

### **Build in public.**

Throughout this build, I'll give you certain moments where you can share what you're building + pre-built messages to inspire you to tell the world about it. There is zero harm in **building in public** -- believe it or not, almost all the people who've helped me continue building great things found me because I tweeted randomly about what I was building lol.

### **What happens at the end?**

At the end of every build at buildspace, you get the ability to claim a unique NFT. We've also got a pack of some **REALLY** cool perks to keep your AI journey going!

This NFT gives you access to a few secret powers:

1. **It gets you access to alumni channels on our Discord,** which is where most of the more interesting discussions happen.
   Way less noise and way higher value -- it's where all the true shippers are.
2. **A lot of magic when building happens IRL.** That's why we're building out IRL hubs around the world for our top alumni to visit/build from whenever they want. The more NFTs you hold, the better chance you have of getting access. **Our first IRL hub opens in SF on March 1, 2023.**
3. **You'll get accepted instantly to [Nights & Weekends](https://buildspace.so/nights-and-weekends?utm_source=buildspace.so&utm_medium=buildspace_project)** if you have the NFT. N&W is kinda like an accelerator we run -- build out any idea you want over 6 weeks, come out to San Francisco for 3 days to build IRL, and present your work at demo day in front of thousands. N&W alumni have gone on to raise millions of dollars.

### Please do this or Raza will be sad.

In `#sd-general` -- tell the world why you're curious about doing this stuff. It'd be cool to understand what you're trying to build, what inspired you to get into this, etc. Also, this is a cool way to make some friends -- the magical thing about buildspace is that over **140,000 people are in the Discord** and you never know who may see your message, resonate with what you say, and reach out to you!

# Stable Diffusion 101

## What are we building?

### Putting ____ in your favorite movie scenes

Imagine this: You invite your friends over for movie night. The latest blockbuster hit is out, so you get everything ready: pizza, popcorn, H2O. You pull up the TV/projector. The first thing on screen? **YOU**! Before the movie even plays, all your friends are treated to an incredible piece of art depicting *you* as the protagonist. Everyone's gonna remember this one.

**IMG_GOES_HERE**

Welcome to the world of AI-generated art through a sick piece of open-source software called "Stable Diffusion". You've probably heard about Dall-e or MidJourney before. These are apps that let you generate images from text. Stable Diffusion is a similar project, except it's free and open-source -- anyone can run it on their own computer!

Open-source models like Stable Diffusion are an absolute game changer - you can fine-tune them and combine them with other tech to create extremely accurate depictions of you as *any* character on the internet. Ever wondered what you'd look like as Patrick Bateman in the 2000 cult classic American Psycho? Batman? Legolas? Shaggy? You can make it happen with Stable Diffusion.

**Here's the plan**:

1. We'll start by learning all about prompts and how this tech works by making a bunch of really cool avatars
2. Next we'll learn about Dreambooth - a special technique that lets you **fine-tune** Stable Diffusion on a specific subject (you!) by **teaching** it what you look like
3. We'll then build an app around that model so your friends can put you anywhere!

### Letting your friends generate scenes with you in it

The magic behind our app is customizing it to YOU. We're going to get the perfect photos of you with just your laptop (or phone) and train a model that will be able to generate insane AI avatars with a specific theme. All with just a prompt + the click of a button.

## Becoming a Prompt Engineer™ // Prompt Engineer drills

Let's hop right into it -- by the end of this section you're going to have generated your first AI avatar image while getting into the insanity of prompts.

Why don't we start with wtf a prompt even is?
A prompt is like a magic spell - it's a sentence or set of phrases that describes an image you want. You need to use the right words in the right order or things will get weird.

**IMG GOES HERE**

Ready to speak the language of the AI? We're going to be writing lots of prompts and we'll get into the advanced stuff quite quickly, so I suggest checking out this awesome [Prompt Engineering 101](https://buildspace.so/notes/prompt-engineering-101-sd) note from our resident prompt mastermind Jeffrey to get caught up. Check him out on [Twitter](https://twitter.com/ser_ando), he posts some crazy good shit there.

Just a heads up here: creating prompts is a skill. You won't become a prompt god within 30 minutes. Most of your prompts will suck, but that's okay. **When you're bad at something, the only guarantee is that you will get better at it :)**

There are a couple of places you can generate images. We're going to start with [DreamStudio](https://beta.dreamstudio.ai/) - it's the official tool provided by Stability AI, the company behind Stable Diffusion. It's got the latest version and it lets you configure some important stuff.

**IMG GOES HERE**

You'll start with 100 credits, which will be enough for 100-200 images depending on your settings. Make sure you open up the settings and crank up the steps - the higher the number of steps, the greater the accuracy of the image. I went with 75 steps. You'll also want to change the model to Stable Diffusion 2.1 (or whatever the latest one is). I also turned it down to two images per prompt cuz I don't wanna blow through the credits lol.

What I want you to do here is **find your style**. Play around with all the modifiers I'm gonna share. We'll build up layer by layer, so you'll have lots of chances to mix things up.

I'm going to base my site on creating AI avatars around the LOTR world. This is your time to pick a theme and go all in on it. Your app by the end will be 10x better if you have a direction to follow.

Alright, alright, enough explaining -- let's do this.

### Art History/Theory crash course

Ya know how people joke about Art History being a useless degree? It's the complete opposite with AI art lol. Like Jeffrey mentioned in the 101 note - you need to know the techniques of real artists and the various styles so you know *how* to describe the next Creation of Adam.

These models aren't aliens from another planet - they're all of humanity squished into a single program. Everything we've put on the internet for the past decade has culminated in these gargantuan entities that label things just like us.

We are going to generate epic art as we go through each pillar of what makes art so appealing. You'll be able to see how each part interacts with the others and how they blend seamlessly together.

**As we take a journey through art history, we are going to actively be trying out prompts in DreamStudio! So, make sure it's open and ready to rock 🤘.**

**Artist**

**Leonardo. Raphael. Michelangelo. Donatello.**

Almost all of the world's most renowned artists have distinct art styles. The internet is full of their work and all sorts of derivatives - inspirations, tributes, imitations. This makes using artist names in prompts *incredibly* powerful.

Artist names are going to have the **most** impact on the style of your image. It's going to be the base your image is built on and will be responsible for a good chunk of the vibe.
If, like me, the only artists you know are the ones named after the Teenage Mutant Ninja Turtles, here's a list of various artists you can try:

- James Gurney
- John Singer Sargent
- Edgar Degas
- Paul Cezanne
- Jan van Eyck

This specific modifier is a really tricky one - all text-to-image models are trained on art taken from the internet, without explicit permission from their creators. There's a lot of ongoing discussion about **who** owns the images generated using these models, and a lot of artists are angry because they didn't consent to their art being used for training.

The names on this list weren't chosen by accident - four of them are historically famous portrait artists (who are dead), and the fifth is okay with their name being used for AI art. All this is to say - **respect artists' decisions and don't use specific art styles without permission from their creators.**

You could go on sites like [ArtStation](https://www.artstation.com/) and [DeviantArt](https://www.deviantart.com/) to find more artists you like, or maybe take a shower and go to your local art museum. I promise it'll be worth it and you'll learn a lot more about art.

**Time to generate!**

I live in Auckland, New Zealand - just an hour away from the Shire, so I'm gonna use SD to imagine Gandalf if he was made by Disney. Here's my first prompt!

```
A profile picture of Gandalf from Lord of the Rings in the style of Pixar, smiling, in front of The Shire
```

Pretty simple! You'll learn that you don't **need** a lot of detail in your prompts, you just have to be specific about what you want. This is an important distinction.

I cranked the steps up to 150 in the settings - this makes the results better but costs more GPU time. Here's the best one I got back:

**IMAGE GOES HERE**

Not bad. The hair is a bit wonky and he's missing teeth, but this is just our first prompt :P Let's try changing up the medium!

**Medium**

**Acrylic. Watercolor. Microsoft Paint.**

Vincent Van Gogh never made pixel art, but with the power of Stable Diffusion, you can find out what that might have looked like.

Stick with more popular mediums like digital illustrations, paintings, and pixel art. The more content out there, the better Stable Diffusion will be at it. A shadow painting won't be as good as an oil painting because the internet has way more oil paintings.

Here's a list of mediums you can play with:

- Acrylic painting
- Watercolor painting
- Pixel art
- Digital illustration
- Marble sculpture
- Polaroid picture
- 3D render

Some of these mediums can be combined! Here's a Dr Strange sculpture made with "chrome copper marble" that [Jeffrey conjured with MidJourney](https://twitter.com/ser_ando/status/1600335448039006208):

**IMAGE GOES HERE**

We truly are in the future.

I think I wanna see what a 3D render of Gandalf would look like. Here's my updated prompt:

```
A Pixar style 3D render of Gandalf from Lord of the Rings, smiling with his mouth closed, in front of The Shire, green hills in the back
```

Not much has changed. I've added in "3D render" and left the rest as it is.

**IMAGE GOES HERE**

**Insane** -- how frickin cool is this??

**Aesthetic**

You've got your medium and your artist. Next we've got the vibes -- is this a lofi kinda mood or are we going cyberpunk?
Here are my favourites:

- Fantasy
- Vaporwave
- Cyberpunk
- Steampunk
- Gothic
- Sci-Fi, futuristic

[Here's a massive list of aesthetics](https://aesthetics.fandom.com/wiki/List_of_Aesthetics) for you to check out. You can even make up your own vibes! Here's Dronecore fashion made with GPT-3:

**IMAGE GOES HERE**

I want something in fantasy colours, so I went with this:

```
A Pixar style 3D artwork of Gandalf from Lord of the Rings, smiling with his mouth closed, in front of The Shire, green hills in the back, fantasy vivid colors
```

**IMAGE GOES HERE**

This is starting to legit look like a Pixar version of LOTR lol.

**Descriptor**

The last ingredient I want to cover is the most vague one - it's not one specific area, it's general descriptors.

- **Time** - 1970s, stone age, apocalypse, ancient, great depression, World War II, Victorian
- **Seasons** - Winter, summer, spring, autumn
- **Holidays** - Eid, Christmas, Diwali, Easter, Halloween, Hanukkah
- **Details** - Detailed, hyper-realistic, high definition, trending on ArtStation (yes, really)

Take a stab at this and add a couple of these descriptors to your prompt! When you get an image you like, make sure to drop it in `#prompts` in our Discord to show others what you came up with!

There's **so much more** you can do with prompts. You could spend weeks learning all about how humans describe things. I'll leave you with the one bit you need to remember: **AI models are trained on almost all of human media on the internet**. That means anything you can find on the internet, the models probably know of too.

Stills from famous movies. Fan art of characters. Breakdowns of iconic scenes from specific directors. Shots of varying camera angles, different films, resolutions, lighting, lenses, photo genres. The AI has seen it all (except the NSFW stuff, they removed it :P).

I've found the internet is bigger than my immediate imagination, so if you can imagine it, it's likely that the AI knows about it -- you just need to find the right words. Here are a few handy links with various examples of what words can do:

[**Medium**](https://docs.google.com/document/d/1_yQfkfrS-6PuTyYEVxs-GMSjF6dRpapIAsGANmxeYSg/edit)
[**Color**](https://docs.google.com/document/d/1XVfmu8313A4P6HudVDJVO5fqDxtiKoGzFjhSdgH7EYc/edit)
[**Camera**](https://docs.google.com/document/d/1kh853h409DeRTg-bVo_MSYXrWjMDRMX9kLq9XVFngus/edit)
[**Lighting**](https://docs.google.com/document/d/1qcpgNsA-M998zy0ngVvNcMs2AYHpMuAjAefM6p63Tp8/edit)
[**Film**](https://docs.google.com/document/d/1vM9izOU4bQIcrKxAZiw85Q826zb6kBsjUQKdawm3lyk/view)

Take a step back and look at your beautiful (and not so beautiful) creations. You are well on your way to generating the next wave of amazing art for the world -- now let's 10x that.

### Advanced configuration flags

Well, well, well -- fancy seeing you here in the advanced section 😏. Hopefully you found some magic generating images with DreamStudio. Let's see how we can make this even better by making it easier for SD to understand :).

Most models out there have ways to configure settings *inside* your prompt. What you can do and how you do it depends on the model.
For instance, DreamStudio lets you combine multiple prompts with the pipe `|` character:

```
Portrait illustration of Gandalf from Lord of the Rings : 2.0 | The Shire in the background: 0.4 | Renaissance oil painting
```

The numbers after each prompt indicate its weight, which can go up to +/- 10.0. Click the question mark on the top left of the prompt input box for more detail :)

**IMAGE GOES HERE**

Looks like I'll have to push the weights on the background, but this is pretty good!

The other models out there each handle weights differently. If you decide to use something else, like Midjourney, make sure to check out their manual!

### Negative prompts

We've talked a bunch about how to tell Stable Diffusion what you want in your images, but what if you *don't* want something? Here's how you can do that!

You might have noticed I've been telling Stable Diffusion I want Gandalf to smile with his mouth closed in a bunch of prompts. That's because it usually messes up the teeth. I can do the same thing by giving it a negative weight like this: `smiling: -1`

```
Renaissance portrait of Gandalf from Lord of the Rings in the style of The School of Athens, vibrant colours, highly detailed : 2.0 | The Shire in the background : 0.8 | Detailed baroque painting, Michelangelo Buonarroti, Raphael : 1.0 | Smiling: -1 | Duplication: -1
```

Here's what I got:

**IMAGE GOES HERE**

WOW. I'm liking classical art more and more. I increased the weight for the second block and two of these have the Shire in the background!

For the rest, I just Googled "famous renaissance paintings" cause I know Stable Diffusion will know all the popular ones. I picked the names and artists of the ones I liked and jumped into [Lexica](https://lexica.art/) - a search engine for Stable Diffusion. I looked up artist names and even just the word `renaissance` to see what prompts do well. As someone who didn't know the classical art period existed until I was 16, I think this is pretty good!

The one thing I don't like here is how crowded the prompt is getting. Some services like Lexica have a dedicated field for negative prompts, check it out:

**IMAGE GOES HERE**

It's funny, this is considered "advanced", but it's really not that bad, right? It's more like you have more tools at your disposal to make your prompts even **better**.

EPIC WORK. Let's keep it going.

### Please do this or Raza will be sad.

You should be on a **REALLY** good path for generating epic prompts! At this point, take another 30m to mess around with prompts until you find one that blows you away.

Go ahead and copy your avatar and post it in `#prompts` on Discord. Make sure to include your prompt in the message to give some inspiration to others.

## Wtf is Stable Diffusion doing?

Let's take a step back and talk about how all this works. Yeah, Stable Diffusion is a deep learning, text-to-image model, but what does that mean??? Understanding it will make you a lot better at using it.

**Note -** *I'm going to be simplifying a lot of this stuff and sticking to the important bits. This tech is once-in-a-decade type stuff that you'd need an actual PhD to understand fully, so I'll skip the math lol.*

### The basics behind Stable Diffusion

Think of the last image caption you saw. It was probably on a blog or article somewhere on the internet and you glossed over it. There are billions of these images out there: high-ish quality, available for free, described with sufficient accuracy.
These make up the bulk of the training data for image generation models. You see, in order to generate images from text, we first need to "train" a machine learning model on a large dataset of images and their corresponding text descriptions. This training data is used to teach the model the relationships between the words in the text descriptions and the visual features of the corresponding images.

**We basically give a computer a few billion images and tell it what each of those images contains, effectively "teaching" it what things are.**

Ya know how those captchas ask you to select the boxes with the sidewalks or traffic lights? You're actually training the AI there lol.

Once we teach the model how to link words in a text description to the corresponding images, it can use deep learning to figure out the relationships between the two on its own. The way "deep learning" works is it creates neural networks with layers of interconnected "neurons", which process and analyse large amounts of data to solve problems like matching text to images. **All of this means it can take a new text description and make a related new image.**

This bit isn't particularly new - the tech has been around for a while, and on its own it doesn't produce very high quality results.

**IMAGE GOES HERE**

### CLIPping into magic

The magic of Stable Diffusion happens with CLIP. There's a **LOT** happening here, so let's start with the concept of embeddings.

Computers don't see images or words. They're not as powerful as the all-in-one everything machine with billions of CPUs that sits in our head. When we look at something, light from the image enters our eyes and is converted to electrical signals by the retina. Our brain processes these signals and recognizes the things we're looking at.

Computers need to do a similar type of processing - they have a dictionary that maps pieces of words to numbers. This is called a text embedding. By representing words or images as numerical vectors, we can use these vectors as input to machine learning algorithms, which can then learn from the data and make predictions or generate new data.

**IMG**

Image embeddings have a couple of extra steps - the images are first passed through a convolutional neural network (CNN), which is a type of deep learning model designed to automatically learn the important features and patterns in an image. This way we can represent the important features of an image in a numerical vector and do math with it.

So -- **we've got a text embedding and an image embedding. Basically, a numerical representation of an image and its caption.** This is where CLIP comes in - its job is to take these two embeddings and measure how similar they are. This is what gives us those extra crisp results that are realistic and don't have any weird artifacts. If you wanna see it in action, [check this out](https://huggingface.co/spaces/EleutherAI/clip-guided-diffusion).

**IMG**
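If you want to poke at this idea yourself, here's a rough sketch (totally optional, not part of the build) that uses the open-source CLIP model from the `transformers` library to score how well a few captions match an image. The filename and captions are just placeholders - swap in whatever you want:

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the open-source CLIP model and its processor from Hugging Face
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any local image works here -- "gandalf.png" is just a placeholder
image = Image.open("gandalf.png")
captions = [
    "a wizard standing in front of green hills",
    "a bowling ball on a wooden floor",
]

# Turn the captions and the image into embeddings, then compare them
inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher score = CLIP thinks the caption matches the image better
probs = outputs.logits_per_image.softmax(dim=1)
for caption, prob in zip(captions, probs[0].tolist()):
    print(f"{prob:.2f}  {caption}")
```

That similarity score is the "glue" between text embeddings and image embeddings that the rest of this section is talking about.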
### The Diffusion in Stable Diffusion

Think of a bunch of round objects that you've seen. How do you know a football is different from a bowling ball if you're just looking at it? The way it looks, duh!

**IMG**

You've probably never thought about this, but in your head you've got at least three "axes" - one for shape, one for color, another for size maybe. Footballs are in one spot on this graph, bowling balls are in another. ~~(ignore the terrible 3D graph)~~

Stable Diffusion does something similar, except it has a **lot** more dimensions and variables. This is where our big brains get left behind - we can't visualize more than 3 dimensions, but these models have more than 500 dimensions and an insane number of variables. This is called **latent space**.

**IMG**

Imagine you're training a machine learning model to generate cat pictures from text. The latent space in this case would be a space where each point represents a different cat picture. So, if you were to drop a description of a fluffy white cat into the latent space, the model would navigate through the space and find the point that represents a fluffy white cat. Then, it would use that point as a starting point to generate a new, related cat picture.

This right here is why our prompts are so powerful - **they're working in dimensions we literally can't imagine**. We need hundreds of mathematical coordinates to navigate to a point using text. This is why our results get better when we add more modifiers.

The process of navigating through this space and finding points that are related to a given input is called diffusion. Once it's found the point closest to the text prompt, it works some more AI magic to generate the output image.

**IMG**

And now you know how Stable Diffusion works! Check [this](https://jalammar.github.io/illustrated-stable-diffusion/) out for a detailed explanation of the various parts if you're curious and want to dig deeper.

You might feel like there's not much point in understanding how any of this works, but now that you do, you'll be able to build things that others can't even think of. As I was writing this, OpenAI launched its [new and improved embedding model](https://openai.com/blog/new-and-improved-embedding-model/), which is 99.8% cheaper. The model behind GPT-3 isn't open-source, so we can't use it to make custom applications, but we **can** use the embeddings API to match sets of text directly. This can be used for all sorts of awesome apps, like recommendation systems and natural language search!

The stuff you've just learned is going to compound and take you places you can't think of :)
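To make the whole text → embedding → latent space → image flow concrete, here's roughly what it looks like in code using the open-source `diffusers` library. This is a sketch of the idea, not something you need to run right now (the Colab notebook later does all of this for you), and it assumes you have a GPU and access to the `runwayml/stable-diffusion-v1-5` weights:

```python
import torch
from diffusers import StableDiffusionPipeline

# Pull the pretrained Stable Diffusion weights down from Hugging Face
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # this really does need a GPU

# The prompt gets encoded into a text embedding, diffusion happens in latent
# space, and the result gets decoded back into pixels
image = pipe(
    "A Pixar style 3D render of Gandalf from Lord of the Rings",
    num_inference_steps=50,   # the "steps" slider from DreamStudio
    guidance_scale=7.5,       # how closely the model follows your prompt
).images[0]

image.save("gandalf.png")
```

Every fancy UI you've used so far (DreamStudio, Lexica, etc.) is wrapping a pipeline that looks more or less like this.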
# Dreambooth 101

## Wtf is Dreambooth?

You're now well-versed in what Stable Diffusion is and how it works. Except it has one big problem - you can't teach it new things. There's no way I can give it 10 pictures of my dog and make it generate images of my dog on the bed so I can gaslight him into thinking he broke the rules.

In comes [Dreambooth](https://dreambooth.github.io/) - a machine learning technique that lets you generate photorealistic images of a specific subject in a variety of different contexts. Stable Diffusion knows everything about the general world - what clouds look like, how bald Dwayne "The Rock" Johnson is, and what rainbows are made of. Using Dreambooth, you can teach it what **you** look like too!

### How do Dreambooth + SD work together?

To use Dreambooth, we'll have to give it some training data: a set of images of ourselves, or whoever we want to generate images of, along with a label (our name) and the class of objects the thing belongs to (human). Dreambooth then fine-tunes a pre-trained text-to-image model to recognize and generate images of that specific subject.

**IMG**

### How does it work?

Unlike magic, this tech is even cooler when you know the trick. We've got the OG Stable Diffusion model. That's not the special part. What's really going to make it ours is the set of input photos we'll use. Here's a simplified flow of how it'll work:

1. The SD model is fine-tuned on a small dataset of images that all represent one subject (you).
2. Using the input photos, the model extracts the key features and characteristics of the faces in the photos.
3. The model uses the extracted features to generate new, synthesized images that resemble the input photos but have their own unique style and variations based on our text.

Soon you'll be able to turn yourself into Thor (aka Thorza) like I did below:

**IMG**

### Training Dreambooth models is a big deal

You know all those fancy AI-style profile pictures you're seeing on Twitter and Instagram? All of them were generated by fine-tuning SD models using Dreambooth. That's what Lensa charges $$$ for.

All of this is open-source tech that was made available to everyone at the same time - it was all up to the builders to ship fast and get it in front of people. While it's not likely that you'll be able to make a $50k MRR app overnight by building this out, you'll know how it works and you'll be ready to capture the opportunity the next time it arises!

## Training Dreambooth models on yourself

### Getting tasty photos of yourself

**It is time.** The first thing you'll need to do is gather training data. Since we're making AI avatars of ourselves, we'll have to get a bunch of pictures. For your first run, I recommend going with 5-10 images. These can't just be your average selfies though - we have to be very careful with what we teach the AI. Here are a few rules for your pics:

1. **Pictures should contain only you** - no friends, dogs, samosas, aunties
2. **Clear backgrounds** - If you can't get white backgrounds, use [https://remove.bg/](https://remove.bg/) to remove them entirely
3. **Picture quality** - Pictures should be well lit
4. **Picture size** - At least 720p. You **can** use laptop webcams but you need to be in a well-lit place lol

You gotta be careful here - more images do **not** mean better results! For my first set I used 19 pics and it came out pretty meh. The big mistake I made was using a blue background in **every** picture. I taught SD that I always have a blue background, so it generated results with blue backgrounds!

**IMG**

Let's take some pics!! Grab your phone, webcam, DSLR -- whatever you got, and get snappin'. Your pictures should show off features that you want SD to learn about. Maybe you tighten your jawline or get a Zac Efron haircut. Up to you lol. Check out the pics I took below:

**IMG**

Once you've got all your pics and are happy with the backgrounds, we can prep them for processing. The main thing we need to do here is resize them to 512x512, because that's the size of all the images we'll be generating. Head over to [Birme](https://www.birme.net/?target_width=512&target_height=512) and resize all your pics to 512x512.

The last thing you need to do is rename all your pics with a unique label. SD has millions of data points for what a "man", "woman" or "handsome AI developer" looks like. It probably also has lots of results for your first name. So what we need to do here is give you, the subject, a distinct name that we can use in prompts. I'm mashing my first and last names to get "abraza". So in a prompt I'd go "Oil paint portrait of **abraza** as a professional wrestler by Vincent Van Gogh". Pretty simple, eh?

You can try a couple of different angles, but make sure they're close-up pics. Torso pics work too, but keep it above the belt.
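If you'd rather script the prep than click through Birme and rename files by hand, here's a rough sketch using Pillow. The folder names and the "abraza" label are placeholders - swap in your own:

```python
from pathlib import Path
from PIL import Image, ImageOps

SRC = Path("raw_pics")        # folder with your original photos (placeholder)
DST = Path("training_pics")   # where the prepped 512x512 versions will go
LABEL = "abraza"              # your unique subject label (placeholder)

DST.mkdir(exist_ok=True)

count = 0
for pic in sorted(SRC.glob("*.jpg")) + sorted(SRC.glob("*.png")):
    img = Image.open(pic).convert("RGB")
    # Center-crop and resize to the 512x512 that Stable Diffusion expects
    img = ImageOps.fit(img, (512, 512))
    count += 1
    img.save(DST / f"{LABEL} ({count}).jpg", quality=95)

print(f"Prepped {count} images into {DST}/")
```

Either way works - the point is that every training image ends up 512x512 and carries your unique label.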
Our data is ready! Onwards!

### Training with Google Colab

The main event - **training**. This is where things start to get **REALLY** cool.

Since this is really compute intensive, we'll need some powerful GPUs. Don't have a beefy GPU? Worry not! Google Colab to the rescue!

Google Colab is basically just an IDE inside a browser. It's connected to the Google Cloud Platform, so we never have to install any base dependencies and we get lots of free compute. Thanks Google!

We'll be using Python for this part, except you won't actually have to write any Python or set up a Python environment! This will all be done with the magic of Jupyter notebooks. Here's COVID Raza talking about what a Jupyter notebook is on Google Colab:

**LOOM**

We **can** use Jupyter notebooks in VS Code, but we'll be using Google Colab cause we get free compute! Who can say no to a free Tesla T4 GPU? Colab notebooks can be shared like files, so I'll just give you a link that you can copy over to your Google Drive.

## Running Dreambooth

**Before you get started, make sure your Google Drive account has at least 5 GB of free space.** We'll be saving the fine-tuned model to Gdrive, and it takes up about 2-3 gigs.

We're going to be using an extra special version of Stable Diffusion which is optimised for memory. The best part? The entire training/tuning workflow will happen in Google Colab without writing a single line of code!

Be warned though - even though Colab is free, the resources aren't permanently available. Make sure you have at least **60 minutes** free to go through this section, cause if you leave it running you might run out of free hours. If you *do* need to leave at any time before training finishes, you'll have to disconnect your runtime using the dropdown menu next to the RAM/Disk bars on the top right. This will reset your environment, so when you come back you'll have to start from the top (step 1 in the notebook).

**IMG**

Start by [clicking this link](https://colab.research.google.com/github/buildspace/diffusers/blob/main/examples/dreambooth/DreamBooth_Stable_Diffusion.ipynb). It'll open up a Jupyter notebook in Colab.

The first thing you wanna do is make sure you're on the correct Google account. If you aren't, click your profile on the top right and switch to an account with at least 5 gigs free. Next, you wanna copy the notebook to your Gdrive. This will open it up in a new tab.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1b2fa6f9-9d75-457c-8359-7a0e57de8a06/Untitled.png)

We're ready to rumble! The notebook has a few extra bits that you can ignore on the first run. **Remember -** you'll only need to run each block one time.

The first block will connect our notebook to a virtual machine and show us what we're connected to. This block also starts a timer -- you only get a limited number of GPU hours for free.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a6d4fa03-d295-43c4-8844-de5adf71088f/Untitled.png)
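If you're ever curious exactly what hardware Colab handed you, a quick cell like this will tell you. It's just a sanity check I like to run - it may not match the notebook's own first block exactly:

```python
# Paste this into any Colab cell to see which GPU you were assigned
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```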
**Set up environment**

The first thing we're gonna do is sort out the requirements. Every time we open up a new Colab notebook, we're connecting to a brand-new virtual machine. You'll need to install the requirements every time your machine disconnects - the state is cleared.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/99004f28-4938-4f04-9bb4-f828171df414/Untitled.png)

This will take about 2-4m. While that's running, head over to [HuggingFace](https://huggingface.co?ref=buildspace) and sign up.

Once you create an account, we're going to need to generate an access token! This will be used in the "Login to HuggingFace" section of Colab. To grab this, just click on your profile, click "Settings" and then go to "Access Tokens" on the left :). Here you will want to press "New token" at the bottom of the page. Name this thing whatever you want and make sure to give it the write role (more on this later) --

![Screenshot 2023-01-04 at 6.43.47 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/355ee137-3bbb-4cf9-ae40-d2c2de885532/Screenshot_2023-01-04_at_6.43.47_PM.png)

Chuck that bad boy into the token field and run it when the requirements block is all done.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/59b96552-39d8-471f-a46a-d4533efa0709/Untitled.png)

**Hold up - wtf is HuggingFace?**

In order for us to go from text → image in our app, we're going to need to run Stable Diffusion! For now we can do this in Colab, but Colab doesn't have API endpoints it can expose. This means we need to host and run SD somewhere - remember that it's insanely GPU intensive, so if we made everyone run it on their own machines, only like 1% of the world would be able to use our app lol.

Luckily, the world has already been using cloud computing forever and we can rent NVIDIA's newest GPUs no problemo 🤘. **BUT** -- these fancy GPUs can cost $100s per month just to keep running. That's where Hugging Face (🤗) comes in. It's one of the world's largest AI hubs out there, looking to expand the world of AI through open source. A lot of braincells went into figuring out how we could make this free for everyone, and we are **HYPED AF** to show you exactly how to do it. But for now, let's head back to Colab.

Next, we need a fancy lib called xformers. It's an additional dependency that seriously speeds up how fast Stable Diffusion runs. You don't need to know how it works, just that you should definitely use it whenever possible since it roughly 2x's performance. The version will need to be kept updated - it's 0.0.15 at the time of writing. If this breaks, head over to `#section-2` help and tag the mods.

**Configure your model**

Let's take a lil breather here! You just did a lot of awesome stuff in Colab:

1. Got started with a free GPU from Google
2. Set up your HuggingFace account + created an access token
3. Installed xformers

**The internet is crazy dude.**

Now we need to tell the notebook which model we want to use. Since we're connecting to HuggingFace, we can read any public model on there. V2.1 is really wonky with prompts so I'm going with v1.5. You can try v2.1 later - for now just enter this path into the `MODEL_NAME` field and get going:

```
runwayml/stable-diffusion-v1-5
```

The way you choose a model is by putting in the path from its URL on HuggingFace. So `https://huggingface.co/runwayml/stable-diffusion-v1-5` becomes `runwayml/stable-diffusion-v1-5`.
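Conceptually, that token and that path are all the notebook needs in order to pull the model down. It's doing something along these lines behind the scenes (a sketch, not the notebook's exact code):

```python
from huggingface_hub import notebook_login
from diffusers import StableDiffusionPipeline

# The "Login to HuggingFace" cell: paste the access token you just created
notebook_login()

# MODEL_NAME is just a repo id -- the part of the URL after huggingface.co/
MODEL_NAME = "runwayml/stable-diffusion-v1-5"

# Roughly what happens with that path: the weights get pulled from the Hub
# so the notebook has a base model to fine-tune
pipe = StableDiffusionPipeline.from_pretrained(MODEL_NAME, use_auth_token=True)
```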
**MAKE SURE `save_to_gdrive` IS CHECKED!** That way, if the notebook crashes for whatever reason, you won't have to retrain your entire model again :)

**Please note** -- even though you **can** use other fine-tuned models, our notebook only supports Stable Diffusion v1.5 and v2.1. If you somehow got your hands on the MidJourney model, it won't work here.

**Configure training resources**

The beauty of this model is that it's incredibly optimised and can be configured to run with comparatively fewer resources. Luckily we won't need to mess around with this - the Colab defaults will handle it.

Head over to step 5.5 so we can tell Stable Diffusion *what* we're training it on.

**Instance prompt**: this describes exactly what your images are of. In our case it's whatever we decided as the name ("abraza" for me) plus "man/woman/person". This is the **label** for the images we uploaded.

**Class prompt**: this just describes what else Stable Diffusion should relate your model to. "man", "woman" or "person" works :)

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/4e88a2cb-9b8a-440f-9d51-020e6f3aea93/Untitled.png)

**Step 6 - Upload images**

This one's pretty straightforward! Run the block and a "Choose Files" button will pop up. Click it and upload the images we prepped earlier.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f1cb07d9-f894-497b-9914-4b7bf190f3c0/Untitled.png)

**Step 7 - Configure training options**

Wait, wait, wait. We're already getting ready to train this thing on our face? Feels like a magic trick just got exposed, right? I hope you're seeing how doing this, while it takes a solid amount of time, is actually pretty straightforward with the current tech out there! Let's freaking run this thing 🤘

Okay, this next section may seem intimidating, but you don't have to touch most of it! Again, I've left these options in here in case you really know what you're doing and want to customise your model. For your first time, all you need to do is:

1. **Change `max_train_steps`.** You wanna keep this number lower than 2000 - the higher it goes, the longer training takes and the more "familiar" SD becomes with you. Keep this number small to avoid overfitting. The general rule of thumb here is 100 steps for each picture, plus 100 if you're under 10 pics. So for 6 pictures, just set it to 700 (see the quick sketch below)! If you think the results don't look like you enough, just come back here and turn this number up lol
2. **Update `save_sample_prompt` to a prompt with your subject.** Right after training, this block will generate 4 images of you with this prompt. I recommend spicing it up a bit more than just "Photo of xyz person" - those come out quite boring. Put those prompting skills to use!

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8e3b3547-233b-4440-89ee-549be1a44490/Untitled.png)
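To make that rule of thumb concrete, here's the arithmetic as a tiny snippet (the variable names are mine, not the notebook's):

```python
# Rule of thumb from above: 100 steps per training picture,
# plus an extra 100 if you have fewer than 10 pictures
num_pics = 6  # placeholder -- however many photos you uploaded

max_train_steps = num_pics * 100 + (100 if num_pics < 10 else 0)
print(max_train_steps)  # 700 for 6 pictures
```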
While the training is happening, take a moment to get up and stretch! Your back will thank you and you'll be able to stare at screens for a much longer portion of your life.

When that's all done, run through blocks 7.2 and 7.3 without any changes. You should see your first images!!!

**YOU'RE A MACHINE LEARNING ENGINEER NOW WOOOOOO.**

Okay, well, maybe not just yet. Run through the next two blocks - you won't need to change anything on this first run. Step 8 converts the weights to the CKPT format - this is necessary if we want to upload the model to HuggingFace and get inference endpoints. Step 9 prepares the converted model so it's ready for inference. Again - you don't need to know how this works; this bit is here in case you want to change the `model_path`.

**Generate images**

We're here - the promised land. Use your magic prompt powers and the unique subject identifier to make some magic happen. You can turn up the inference steps to get more detailed results, or turn up the guidance scale to make the AI more obedient to your prompt. I like 7.5 for the guidance scale and 50 for the inference steps.

I've found it does best with well-defined themes that have lots of material online, like TV shows, bands, and fan art. Here's me as a Peaky Blinders character, a mafia boss, and if I were in Blink-182:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/72896572-77f8-42a6-83fe-ed1eb7c380af/Untitled.png)

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/68615b2d-4e96-450a-96e6-a65ddaece2c0/Untitled.png)

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2c34375c-5446-40f8-b7bd-a74723daca11/Untitled.png)

I got all of these on the **first** try! **UNREAL.** Here are the prompts I used:

```
concept art oil painting of [SUBJECT] by [ARTIST], extremely detailed, artstation, 4k

Portrait of [SUBJECT] in [TV SHOW], highly detailed digital painting, artstation, concept art, smooth, sharp focus, illustration, art by [ARTIST 1] and [ARTIST 2] and [ARTIST 3]

Portrait of [PERSON] as [CHARACTER], muscular, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by [ARTIST] and [ARTIST 2] and [ARTIST 3]
```

I mixed a few different artists here - the trick is to make sure their styles are similar.

The big pot of gold that's giving you all this magic is the 4GB `.CKPT` file in your Google Drive folder. That is what we've been working towards all this time - a custom Stable Diffusion model trained on **you** (or your cat). Next, we'll put it up on HuggingFace and set up a React app to let the world try it out!

### Upload to HuggingFace

The last step (#11) is extra special -- it takes your custom-tuned model and all the necessary files and puts them on HuggingFace. You won't need to do much here - just change the concept name (ex: SD-Raza), put in a write token from HuggingFace (you can use the same one you generated at the beginning!), hit the run button, and watch the magic happen.

When hosting models, there are two big problems we usually need to solve:

1. **Where do we host our fancy new model?**
2. **How do we actually call our hosted model?**

HuggingFace has solved both of these for us! It's hosting our model and gives us inference API endpoints we can access.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/c24d5601-9259-4e78-bbe9-3f201384024a/Untitled.png)

Click the link and you'll see this bit on the right side:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/752bcada-c6ad-43c8-a2c4-6b6b52a249b4/Untitled.png)

This is the UI for your inference API! Put a prompt in there and see the magic happen :D
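That widget is just a friendly wrapper around the inference endpoint itself. If you want to poke the endpoint directly before we build the web app, here's a rough sketch in Python. Swap in your own username/model name and token - the `buildspace/ai-avatar-generator` id below is just an example:

```python
import requests

# Pattern: https://api-inference.huggingface.co/models/{USERNAME}/{MODEL_NAME}
API_URL = "https://api-inference.huggingface.co/models/buildspace/ai-avatar-generator"
HF_TOKEN = "hf_..."  # your write token from the Access Tokens page

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {HF_TOKEN}"},
    json={"inputs": "Portrait of abraza as a renaissance noble, oil painting"},
)

if response.status_code == 503:
    print("Model is still loading, try again in a bit")
else:
    # The endpoint returns raw image bytes
    with open("result.jpg", "wb") as f:
        f.write(response.content)
```

This is exactly the request our React app will make later, just from the server side with `fetch` instead of Python.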
Once you press compute, you'll notice you get a "Model is loading" message. This is one of the caveats of using Hugging Face as a free service. Since it costs so much money to keep this model in memory, Hugging Face will automatically clear your model out of memory if it isn't being used. This saves them resources + money on a model that isn't getting a lot of traffic. Sometimes this process can take up to **5 minutes**, so don't be alarmed if you're waiting a while.

Just like that, you have an image generated, just like in Colab!

Head over to your [usage dashboard by clicking here](https://api-inference.huggingface.co/dashboard/usage). This thing is actually pretty cool. Hugging Face gives you 30K free characters (essentially credits to run these queries). That's **PLENTY** to get you started :).

**Wow -- you just created a custom model, hosted your model somewhere, AND now have an endpoint you can call in your web app 👀**

### Please do this or Raza will be sad

The coolest part about MidJourney is the Discord server. You can see what everyone else is doing and it really inspires you. I want you to share your best prompts in `#prompts`. Tell us what works and what doesn't! This new tech is a mystery, we can figure it out amongst ourselves :)

# Build your own React App to generate images

## Set up your React app

By this point, you have trained your very own Dreambooth model. That's fucking **nuts**. You're already doing what 98% of the world has no idea how to do. Hopefully you combined that with your legendary prompt engineering skills to get some images that look like this:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/69e80615-a798-4a07-89ad-e5fe53b3be47/Untitled.png)

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f6432591-3551-42eb-a57a-f51649519738/Untitled.png)

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a1c7c165-ee1b-48c9-a891-d49331a9ddfd/Untitled.png)

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/9913aa75-b929-4cbd-9c6e-9a29cfe032b8/Untitled.png)

We're going to take this even further by creating a web app that uses this model! Like we talked about before, your web app will be able to generate an avatar of yourself from any web browser. More importantly, you'll be able to send it off to friends and they can create some images based on your custom model. Hope you have nice friends lol.

The beauty of this is you don't have to use a human face (ya, alien faces count too lol). But in all seriousness, you can train a model on a tree, a bridge, or even your guitar. It's freaking insane what you can work with here tbh.

### Get the starter code

Let's go ahead and start by forking the starter repo. We're forking it so we can use a tool called [railway](https://railway.app/) to easily deploy our generator to the world!

[Click here to fork the repo](https://github.com/buildspace/ai-avatar-starter/fork).

Go ahead and clone your brand new fork, open the folder in your favorite text editor (I'm using VSCode) and run the command `npm i`. Then you'll be ready to start the project by running `npm run dev`.

If everything worked, you should be able to navigate to [localhost:3000](http://localhost:3000) in your favorite web browser and see:

![Screenshot 2022-12-19 at 6.22.15 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/54f09bb9-87b5-49d8-abba-e534f848b503/Screenshot_2022-12-19_at_6.22.15_PM.png)

Very nice!
We'll be working with `next.js` to build our UI + a single API route for this :). If you've never used Next before, have no fear. I'm going to take you through the magical lands of this framework.

**NOW** -- go ahead and head back to your code editor and let's get some basic things in here. First, change your one-liner! Head to the `index.js` file and update your title and description with the type of generator you're making. We are going to be building an epic guitar generator, so I'll change mine to "Silly picture generator" + change the description to "generate amazing portraits of my guitar by using 'ajd guitar'"!

```jsx
const Home = () => {
  return (
    <div className="root">
      <Head>
        {/* Add one-liner here */}
        <title>Silly picture generator | buildspace</title>
      </Head>
      <div className="container">
        <div className="header">
          <div className="header-title">
            {/* Add one-liner here */}
            <h1>Silly picture generator</h1>
          </div>
          <div className="header-subtitle">
            {/* Add description here */}
            <h2>
              Turn me into anyone you want! Make sure you refer to me as "abraza" in the prompt
            </h2>
          </div>
        </div>
      </div>
      <div className="badge-container grow">
        <a
          href="https://buildspace.so/builds/ai-avatar"
          target="_blank"
          rel="noreferrer"
        >
          <div className="badge">
            <Image src={buildspaceLogo} alt="buildspace logo" />
            <p>build with buildspace</p>
          </div>
        </a>
      </div>
    </div>
  );
};
```

Good stuff. This is already feeling really good. The next thing we're going to want to set up is a place for our users to type in! We need to be able to take in a prompt and send it over to our Inference API. We'll start by adding a prompt container right under the div holding our description:

```jsx
<div className="root">
  <Head>
    <title>Silly picture generator | buildspace</title>
  </Head>
  <div className="container">
    <div className="header">
      <div className="header-title">
        <h1>Silly picture generator</h1>
      </div>
      <div className="header-subtitle">
        <h2>
          Turn me into anyone you want! Make sure you refer to me as "abraza" in the prompt
        </h2>
      </div>
      {/* Add prompt container here */}
      <div className="prompt-container">
        <input className="prompt-box" />
      </div>
    </div>
  </div>
  <div className="badge-container grow">
    <a
      href="https://buildspace.so/builds/ai-avatar"
      target="_blank"
      rel="noreferrer"
    >
      <div className="badge">
        <Image src={buildspaceLogo} alt="buildspace logo" />
        <p>build with buildspace</p>
      </div>
    </a>
  </div>
</div>
```

Cool! I dropped some basic css in the `styles/styles.css` file in this project, but feel free to change it however you want -- remember, this is **your** build.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/815e076e-a242-4b43-b9be-c43caeb0c9a5/Untitled.png)

This is going to hold all the UI we need for getting a prompt from our user to our API. Now, in order to actually capture this input, we need to create some state properties. Go ahead and import `useState` at the top of your file and then create an `input` state property:

```jsx
// Add useState import to top of file
import { useState } from 'react';
import Head from 'next/head';
import Image from 'next/image';
import buildspaceLogo from '../assets/buildspace-logo.png';

const Home = () => {
  // Create state property
  const [input, setInput] = useState('');

  return (
    // rest of code
  );
};

export default Home;
```

Now that we have a way to hold what someone is writing in our input box, we need to tell our input box to read from that property!
Head back to where you created your input and add this property:

```jsx
<div className="prompt-container">
  {/* Add value property */}
  <input className="prompt-box" value={input} />
</div>
```

Cool -- almost there! If you start typing into your input, you'll realize that nothing is being shown. That's because as we type, we need to save the changes to our `input` state. In order to do that, we need to use the `onChange` property of our input and give it a function that takes the text and saves it to our state.

Start by creating a new function right under where you declared your `input` called `onChange`:

```jsx
const Home = () => {
  const [input, setInput] = useState('');

  // Add this function
  const onChange = (event) => {
    setInput(event.target.value);
  };

  return (
    // rest of code
  );
};

export default Home;
```

This takes in an event, and we just grab that value and set it in our `input` state! Now we just need to tell our input UI to call this function every time you type. Go ahead and add the `onChange` property to your input like this:

```jsx
<div className="prompt-container">
  {/* Add onChange property */}
  <input className="prompt-box" value={input} onChange={onChange} />
</div>
```

Go ahead and start typing in the input box - you should now see text appear! Ezpz my friend -- we are well on our way.

Okay, now for the exciting stuff -- **making a network call to use our Inference API from Hugging Face**. If you have never worked with APIs, have no fear, your mind is about to be blown.

To start, we actually need a way to run our network request. Let's create a button that will take our input and send it off to the internet. For that we are going to add some more UI like this:

```jsx
<div className="prompt-container">
  <input className="prompt-box" value={input} onChange={onChange} />
  {/* Add your prompt button in the prompt container */}
  <div className="prompt-buttons">
    <a className="generate-button">
      <div className="generate">
        <p>Generate</p>
      </div>
    </a>
  </div>
</div>
```

At this point you should see something like this:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/240eafb7-61b3-46ef-a1e5-b904dfd378c4/Untitled.png)

Try clicking the button -- nothing happens, right? That's because we haven't told the button to run anything when clicked! For that we are going to do something very similar to what we did with the `onChange` event.

Start by creating a new function, right under the `onChange` function we declared earlier, called `generateAction`:

```jsx
const Home = () => {
  const [input, setInput] = useState('');

  const onChange = (event) => {
    setInput(event.target.value);
  };

  // Add generateAction
  const generateAction = async () => {
    console.log('Generating...');
  };

  return (
    // rest of code
  );
};

export default Home;
```

We're adding a `console.log` statement for now just to make sure things are running as we expect. If you try pressing the generate button, you'll notice nothing happens still. We need to tell our button to run this function when it is clicked.
Go back to where you declared your generate button and add one more property to it called `onClick`:

```jsx
<div className="prompt-container">
  <input className="prompt-box" value={input} onChange={onChange} />
  <div className="prompt-buttons">
    {/* Add onClick property here */}
    <a className="generate-button" onClick={generateAction}>
      <div className="generate">
        <p>Generate</p>
      </div>
    </a>
  </div>
</div>
```

Epic -- once you do that, head over to your browser, open the inspector, and go to the Console tab. When you click the generate button you should see `Generating...` print out like this:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/9aae6992-cb25-4616-bf85-ebab4bb83cf6/Untitled.png)

LFG. See how easy this is? You are literally halfway there to calling an API.

It's time for us to write the logic that is actually going to call our API. Let's head back to the `generateAction` function and start by adding this:

```jsx
const generateAction = async () => {
  console.log('Generating...');

  // Add the fetch request
  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'image/jpeg',
    },
    body: JSON.stringify({ input }),
  });

  const data = await response.json();
};
```

This first block of code is the piece that will actually go out to the internet and say "hey `/api/generate`, can you take my input and give me back an image?" Once we get a response back, we want to convert it to `JSON` format so we can check for a few different things.

Beautiful, let's keep going. Go ahead and add this code right under where you are converting the response to `JSON`:

```jsx
const generateAction = async () => {
  console.log('Generating...');

  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'image/jpeg',
    },
    body: JSON.stringify({ input }),
  });

  const data = await response.json();

  // If the model is still loading, let the user know
  if (response.status === 503) {
    console.log('Model is loading still :(.');
    return;
  }

  // If another error, log it
  if (!response.ok) {
    console.log(`Error: ${data.error}`);
    return;
  }
};
```

In this block we are checking for two different statuses -- `503` and `ok` (which is really just a status code of `200`). Remember when we were testing our model on Hugging Face with their UI and sometimes it would show a loading indicator saying "Model is loading"? Well, Hugging Face will return a status of `503` if this is the case! That's actually really great, because we can handle it no problemo. We then check for any other errors and, if there are any, catch them and print them out.

If everything goes well (as it always should, right?), we are going to take our image and save it into state to display.
Alright, first things first, let's create a new state property called `img`:

```jsx
const Home = () => {
  const [input, setInput] = useState('');
  // Create new state property
  const [img, setImg] = useState('');

  // rest of code
}

export default Home;
```

Once you have that all set, we can go back into the `generateAction` function and add this line to the end of it:

```jsx
const generateAction = async () => {
  console.log('Generating...');

  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'image/jpeg',
    },
    body: JSON.stringify({ input }),
  });

  const data = await response.json();

  if (response.status === 503) {
    console.log('Model still loading...');
    return;
  }

  if (!response.ok) {
    console.log(`Error: ${data.error}`);
    return;
  }

  // Set image data into state property
  setImg(data.image);
};
```

And that's it! At this point you are successfully using Fetch to send a request out to the internet. Pretty magical right? Type something in your input, give it a spin and… wait a second… it's insanely broken lol.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1a1beca6-c28f-48b3-b888-fbcfa8046d29/Untitled.png)

We got a `404`? A `404` usually means that the endpoint (or API) could not be found! There is one very important step we are missing here — **actually writing the API code.**

The beauty of Next.js is that you can easily spin up serverless functions within the same project and not worry about any of the hosting / maintenance of servers / etc. It's insanely cool and done by just creating files and writing some code in them! To get this thing working, let's go ahead and write our first endpoint :).

Go ahead and start by creating a new folder in the `pages` directory called `api`. Within this directory you are going to create a new file called `generate.js`.

The amazing thing about Next.js is how it uses folder structures to define your API paths. For example, we just created a folder called `api` and in that folder a file called `generate`. If you go back to your `index.js` file you'll notice that the API endpoint we are calling is `api/generate`. It literally just uses the folder structure!

Okay epic — let's write some code. First things first, let's write a function that will be run when we hit this endpoint:

```jsx
const generateAction = async (req, res) => {
  console.log('Received request')
}

export default generateAction;
```

You're going to start to see a lot of similarities here as we go through this, but same as before, let's log some stuff out when this thing is called. The only difference is that these log statements will show up in the terminal where you ran `npm run dev`.

Once you have that set up, go ahead and rerun the `npm run dev` command and press the generate button.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/444ccb3f-9d6c-4966-b338-41989bbf9ab7/Untitled.png)

If you inspect the network tab, you'll see your request going through — **LFGGG.** Big moves right there for real. You may notice it stays stuck on pending, but don't sweat it, we are going to fix that soon :). You should also notice that "Received request" printed out in your VSCode terminal!

Now that we know we are receiving requests from our frontend, let's actually do the things we need it to do lol. Inside `generateAction` we are going to start by grabbing the input from our request. Remember we are sending over the input text when we send the request?
We can grab it like this:

```jsx
const generateAction = async (req, res) => {
  console.log('Received request');
  // Grab the input from the body of the request
  const input = JSON.parse(req.body).input;
};
```

At this point we have the input that was sent over from the UI and can use it to call our Inference API on Hugging Face. For that we are going to write another fetch request. I'm going to drop it here and explain more below:

```jsx
const generateAction = async (req, res) => {
  console.log('Received request');
  const input = JSON.parse(req.body).input;
  // Add fetch request to Hugging Face
  const response = await fetch(
    `https://api-inference.huggingface.co/models/buildspace/ai-avatar-generator`,
    {
      headers: {
        Authorization: `Bearer ${process.env.HF_AUTH_KEY}`,
        'Content-Type': 'application/json',
      },
      method: 'POST',
      body: JSON.stringify({
        inputs: input,
      }),
    }
  );
};
```

This should look pretty similar to what we saw on the frontend, except with some additions!

First off — the URL. This URL is the path that points to your model on Hugging Face. This one is mine, but to find yours all you need is this: `https://api-inference.huggingface.co/models/{USERNAME}/{MODEL_NAME}`

The next thing you'll notice is there is a `headers` object in our request. In order for Hugging Face to allow us to use their Inference API, we need to have an API key associated with our account. This key tells Hugging Face we are authorized to access this Inference API — **so make sure to keep it secret.** Head over to the [tokens](https://huggingface.co/settings/tokens) page and get a write token - you **can** use the same one you generated for your Colab, it'll work fine.

In our `generateAction` function you'll see some weird syntax that looks like this: `process.env.HF_AUTH_KEY`. This is a special way for Next.js to read secret keys like this without exposing them to the user! Imagine if everyone could see your password every time you logged into a website? This helps stop that!

To start, take a look at the `.example.env` file. This was created to show you how we need to properly set our API key. Go ahead and create a new file called `.env` at the root of your project and use the same setup like so:

```jsx
HF_AUTH_KEY=YOUR_API_KEY_HERE
```

Don't forget to `CMD/CTRL` + `C` the terminal and rerun `npm run dev` to make sure this file is compiled with your build, else it may not get picked up!

**ALRIGHT** — the last thing here is this property called `body`. This is where we take the input we received from the user and pass it along to Hugging Face! You may notice that the object has a property called `inputs`. If you head back to your model on the Hugging Face website, open up the network inspector, and run another text-to-image:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ff26db27-90bb-4540-959d-8e4068c56d7c/Untitled.png)

In the payload you'll see that it is expecting the `inputs` property to be the text we entered! This is cool, because you just did a bit of reverse engineering — picking up skills left and right out here! [You can also dig through the Inference API detailed parameters docs here](https://huggingface.co/docs/api-inference/detailed_parameters) :)
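If you're curious what actually goes over the wire, here's a tiny sketch of the body we end up sending. The prompt string is just a made-up example; `inputs` is the only field this walkthrough relies on, so treat anything extra as something to verify in those detailed parameter docs first.

```jsx
// A rough sketch of the request body -- the prompt text here is just an example
const exampleBody = JSON.stringify({
  inputs: 'abraza as a renaissance oil painting, highly detailed',
});

console.log(exampleBody);
// -> {"inputs":"abraza as a renaissance oil painting, highly detailed"}
```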
Okay okay okay — we are almost ready to run this thing. Let's add a **FEW** more checks before we do our first try. Take a look at the code below and drop it in under your fetch:

```jsx
const generateAction = async (req, res) => {
  console.log('Received request');
  const input = JSON.parse(req.body).input;

  const response = await fetch(
    `https://api-inference.huggingface.co/models/buildspace/ai-avatar-generator`,
    {
      headers: {
        Authorization: `Bearer ${process.env.HF_AUTH_KEY}`,
        'Content-Type': 'application/json',
      },
      method: 'POST',
      body: JSON.stringify({
        inputs: input,
      }),
    }
  );

  // Check for different statuses to send proper payload
  if (response.ok) {
    const buffer = await response.buffer();
    res.status(200).json({ image: buffer });
  } else if (response.status === 503) {
    const json = await response.json();
    res.status(503).json(json);
  } else {
    const json = await response.json();
    res.status(response.status).json({ error: response.statusText });
  }
};
```

This should be pretty self-explanatory — we are checking for three different statuses: `ok`, `503`, and any other error! Let's break these down a bit more:

`ok` - Remember, this is essentially any successful status code like a `200`. It means the call was a success and we should return the image. The interesting part here is taking our response and converting it into a `buffer`. In order to set our image in our UI we will need to convert it into a form our UI can read. Let's start with a buffer and see what happens :).

`503` - We will receive this when our model is still loading. This response includes two properties — `error` and `estimated_time`. `error` is just a message stating what is happening and `estimated_time` is how much longer it may take to load the model. We will be using `estimated_time` to set up a retry method soon, so keep that in mind!

`any other error` - If there are any other errors, send them back to our UI with what the problem is — this one's easy.

**OKAY NICE.** We're at a really good spot to test our first run here. Let's go ahead and see what happens and keep building from there! I suggest keeping your network tab open so you can see your request go through and complete :). Write some prompt, press generate and let's see what happens:

![Screenshot 2022-12-20 at 2.39.11 PM.png](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a2d7bc63-062a-4a0e-bb3e-de29a6703ed4/Screenshot_2022-12-20_at_2.39.11_PM.png)

Holy shit, just like that I received a response! You can see here that it responded with my buffer no problem! Now, let's change the prompt a tad — whoa, we received a 503 😅. Looks like our model is still loading here:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f4d9fa7f-89db-40b8-b2d3-3d757b46f863/Untitled.png)

Hmmm, so we have a bit of a problem then don't we? When we receive a `503` we need to make the request again once we think the model has loaded. Well, we have the estimated time left, so why don't we just send another request after waiting x number of seconds?

Let's head back to our `index.js` file and start by adding three things — a `maxRetries` property, a `retry` property, and a `retryCount` property:

```jsx
const Home = () => {
  // Don't retry more than 20 times
  const maxRetries = 20;
  const [input, setInput] = useState('');
  const [img, setImg] = useState('');
  // Number of seconds to wait before retrying (comes from the 503 response)
  const [retry, setRetry] = useState(0);
  // Number of retries left
  const [retryCount, setRetryCount] = useState(maxRetries);

  // rest of code
}

export default Home;
```
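Before we wire these up, here's a quick sketch of the `503` body that `estimated_time` comes from. The exact message and number below are made up, but `error` and `estimated_time` are the two fields described above:

```jsx
// Roughly the shape of a 503 response body from the Inference API (values here are made up)
const example503Body = {
  error: 'Model buildspace/ai-avatar-generator is currently loading',
  estimated_time: 87.4, // seconds until the model is expected to be ready
};
```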
Okay, a lot of new properties were introduced here — but let me explain.

We know that when we receive a `503` we get the number of seconds until the model loads. This number can change, so let's make sure to set it in a state property, `retry`. We can use that property to set up a timer and wait x number of seconds, but sometimes models can take up to 10 minutes to load into memory (one of the caveats of a free instance like this) and we don't want to keep spamming this endpoint for 10 minutes. That's where `maxRetries` comes in. After 20 tries, let's just drop a message in the console saying — "hey, you just need to wait longer for this thing to load before trying to make a request". Finally, we keep track of the retries left with the `retryCount` property! After each request we count that number down.

Now that we've got that under control, let's add a bit of code to our `generateAction` function in `index.js`:

```jsx
const generateAction = async () => {
  console.log('Generating...');

  // If this is a retry request, take away retryCount
  if (retry > 0) {
    setRetryCount((prevState) => {
      if (prevState === 0) {
        return 0;
      } else {
        return prevState - 1;
      }
    });

    setRetry(0);
  }

  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'image/jpeg',
    },
    body: JSON.stringify({ input }),
  });

  const data = await response.json();

  if (response.status === 503) {
    // Set the estimated_time property in state
    setRetry(data.estimated_time);
    return;
  }

  if (!response.ok) {
    console.log(`Error: ${data.error}`);
    return;
  }

  setImg(data.image);
};
```

At the very top you'll notice that we check if `retry` is greater than 0. If it is, we set our `retryCount` to be one less, since we are about to make another call to the Inference API, and then set `retry` back to 0. Further down you'll notice that on a `503` we set the `retry` property to the value of `estimated_time`. Now we know how long we should wait until making this request again!

Okay cool! Now the problem becomes: where do we actually trigger this retry? All we've done so far is handle the state around it. For this we are going to use React's `useEffect`. What we want is to trigger a retry whenever the `retry` property changes. `useEffect` is perfect for this because it will run some code any time a certain property changes (just like `retry`).

Start by importing `useEffect` at the top of `index.js`:

```jsx
// Add useEffect here
import { useState, useEffect } from 'react';
import Head from 'next/head';
import Image from 'next/image';
import buildspaceLogo from '../assets/buildspace-logo.png';

const Home = () => {...}

export default Home
```

Now, right above our render function we are going to add this:

```jsx
const Home = () => {
  const maxRetries = 20;
  const [input, setInput] = useState('');
  const [img, setImg] = useState('');
  const [retry, setRetry] = useState(0);
  const [retryCount, setRetryCount] = useState(maxRetries);

  const onChange = (event) => {
    setInput(event.target.value);
  };

  const generateAction = async () => {...}

  // Add useEffect here
  useEffect(() => {
    const runRetry = async () => {
      if (retryCount === 0) {
        console.log(
          `Model still loading after ${maxRetries} retries. Try request again in 5 minutes.`
        );
        setRetryCount(maxRetries);
        return;
      }

      console.log(`Trying again in ${retry} seconds.`);

      await sleep(retry * 1000);

      await generateAction();
    };

    if (retry === 0) {
      return;
    }

    runRetry();
  }, [retry]);

  return (
    // rest of code
  );
};
```

Okay this may look pretty confusing, but I got you — check it:
```jsx
if (retryCount === 0) {
  console.log(
    `Model still loading after ${maxRetries} retries. Try request again in 5 minutes.`
  );
  setRetryCount(maxRetries);
  return;
}
```

You'll see this function is declared inside another function, which is kinda wacky lol. Don't worry too much about why it's structured this way; basically we need to run an `async` function inside a `useEffect` and this is how we do it! This function is the meat of it. Here we first check if `retryCount` is 0; if it is, we don't run any more requests. Pretty simple!

```jsx
console.log(`Trying again in ${retry} seconds.`);

await sleep(retry * 1000);
```

If we have some retries left, we need to wait for the `retry` amount. That's where the `sleep` function comes in! You may have noticed that we never defined it, so let's add it right above our `useEffect` like this:

```jsx
const sleep = (ms) => {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
};
```

We are using a fancy implementation of `setTimeout` to allow the code to "sleep" or "wait" before it keeps going! Promises are some wild shit in JavaScript — [take a deeper look at them here](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise) if you are interested!

```jsx
await generateAction();
```

Finally, if we are ready to rock — call `generateAction`! This will run through the initial checks we wrote in that function :).

A couple more things to note in this `useEffect`:

```jsx
if (retry === 0) {
  return;
}

runRetry();
```

We only want to run `runRetry` when the `retry` property changes to something meaningful. Since the property is initialized with 0, we check that `retry` isn't 0 before kicking off a retry.

So if we step back real quick, this is what just went down:

- We wrote our `generate` API and are catching when we receive a `503`
- If we receive a `503`, we schedule a retry in x number of seconds by setting the `retry` property
- Once `retry` is set, we check whether we've reached `maxRetries`; if not, we run the request again after x number of seconds

This is some advanced web dev stuff, so give yourself a massive high five before running this thing. Tons of stuff happening here and you just built it — **good shit**.

Alright, let's go ahead and open the console in our web browser and try running a prompt one more time:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/92b6f35f-96f1-4fbf-bd09-3d8b511e00a8/Untitled.png)

Holy shit — you are making retries, that's nuts! This will keep going until you receive an image response 🤘

You may notice while running this that the UI feels **SUPER** wack. The only way you know if something is happening is if you open the console lol. You aren't going to tell your mom to open the console in her browser, right? Let's fix that by adding a loading indicator!
Start by creating a new state property called `isGenerating` where we declared all of our other state:

```jsx
const maxRetries = 20;
const [input, setInput] = useState('');
const [img, setImg] = useState('');
const [retry, setRetry] = useState(0);
const [retryCount, setRetryCount] = useState(maxRetries);
// Add isGenerating state
const [isGenerating, setIsGenerating] = useState(false);
```

Then head to the `generateAction` function and add it in these few spots:

```jsx
const generateAction = async () => {
  console.log('Generating...');

  // Add this check to make sure there is no double click
  if (isGenerating && retry === 0) return;

  // Set loading has started
  setIsGenerating(true);

  if (retry > 0) {
    setRetryCount((prevState) => {
      if (prevState === 0) {
        return 0;
      } else {
        return prevState - 1;
      }
    });

    setRetry(0);
  }

  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'image/jpeg',
    },
    body: JSON.stringify({ input }),
  });

  const data = await response.json();

  if (response.status === 503) {
    setRetry(data.estimated_time);
    return;
  }

  if (!response.ok) {
    console.log(`Error: ${data.error}`);
    // Stop loading
    setIsGenerating(false);
    return;
  }

  setImg(data.image);
  // Everything is all done -- stop loading!
  setIsGenerating(false);
};
```

You can see there are four different spots where we use this state. Now that we are changing this property, let's do something with it. Head to your render function, go to the `prompt-buttons` div, and add this:

```jsx
<div className="prompt-container">
  <input className="prompt-box" value={input} onChange={onChange} />
  <div className="prompt-buttons">
    {/* Tweak classNames to change classes */}
    <a
      className={
        isGenerating ? 'generate-button loading' : 'generate-button'
      }
      onClick={generateAction}
    >
      {/* Tweak to show a loading indicator */}
      <div className="generate">
        {isGenerating ? (
          <span className="loader"></span>
        ) : (
          <p>Generate</p>
        )}
      </div>
    </a>
  </div>
</div>
```

A lot of the CSS around this loading indicator lives in `styles/styles.css`, so make sure to go check it out and change it to fit your flow + vibe.

Now that we have a loading indicator set, let's give this another spin — type in another prompt and let it rip:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e1d96356-7ec0-453d-9f35-aee3cf95228f/Untitled.png)

Yooo, loading indicator working just as expected! Freaking insane.

We are coming up on the tail end here! We set up some UI, called our Inference API, and are handling scenarios for when our model is loading. I guess it's time to actually display this image in the UI, eh? Let's start by adding some UI elements to our render function like this:
```jsx
<div className="root">
  <Head>
    <title>Silly picture generator | buildspace</title>
  </Head>
  <div className="container">
    <div className="header">
      <div className="header-title">
        <h1>Silly picture generator</h1>
      </div>
      <div className="header-subtitle">
        <h2>
          Turn me into anyone you want! Make sure you refer to me as "abraza" in the prompt
        </h2>
      </div>
      <div className="prompt-container">
        <input className="prompt-box" value={input} onChange={onChange} />
        <div className="prompt-buttons">
          <a
            className={
              isGenerating ? 'generate-button loading' : 'generate-button'
            }
            onClick={generateAction}
          >
            <div className="generate">
              {isGenerating ? (
                <span className="loader"></span>
              ) : (
                <p>Generate</p>
              )}
            </div>
          </a>
        </div>
      </div>
    </div>
    {/* Add output container */}
    {img && (
      <div className="output-content">
        <Image src={img} width={512} height={512} alt={input} />
      </div>
    )}
  </div>
  <div className="badge-container grow">
    <a
      href="https://buildspace.so/builds/ai-avatar"
      target="_blank"
      rel="noreferrer"
    >
      <div className="badge">
        <Image src={buildspaceLogo} alt="buildspace logo" />
        <p>build with buildspace</p>
      </div>
    </a>
  </div>
</div>
```

Close to the bottom here you'll see some logic that says, "if there is something in the `img` property, display this Image".

Amazinggg, now what if we make this a bit cooler? When we press generate, let's remove the prompt from the input box and display it under the image that we show in our UI! To do this, go ahead and create yet another state property called `finalPrompt` here:

```jsx
const maxRetries = 20;
const [input, setInput] = useState('');
const [img, setImg] = useState('');
const [retry, setRetry] = useState(0);
const [retryCount, setRetryCount] = useState(maxRetries);
const [isGenerating, setIsGenerating] = useState(false);
// Add new state here
const [finalPrompt, setFinalPrompt] = useState('');
```

Now that we have that, head to the `generateAction` function and add these lines towards the bottom:

```jsx
const generateAction = async () => {
  console.log('Generating...');

  if (isGenerating && retry === 0) return;

  setIsGenerating(true);

  if (retry > 0) {
    setRetryCount((prevState) => {
      if (prevState === 0) {
        return 0;
      } else {
        return prevState - 1;
      }
    });

    setRetry(0);
  }

  const response = await fetch('/api/generate', {
    method: 'POST',
    headers: {
      'Content-Type': 'image/jpeg',
    },
    body: JSON.stringify({ input }),
  });

  const data = await response.json();

  if (response.status === 503) {
    setRetry(data.estimated_time);
    return;
  }

  if (!response.ok) {
    console.log(`Error: ${data.error}`);
    setIsGenerating(false);
    return;
  }

  // Set final prompt here
  setFinalPrompt(input);
  // Remove content from input box
  setInput('');
  setImg(data.image);
  setIsGenerating(false);
};
```

We take the input, save it in the new property, and then clear the current input. Once we have that done, we have one more piece to do — display it! Head down to where you are displaying the image and add this:

```jsx
{img && (
  <div className="output-content">
    <Image src={img} width={512} height={512} alt={finalPrompt} />
    {/* Add prompt here */}
    <p>{finalPrompt}</p>
  </div>
)}
```

LFG. We are ready to display some images, pretty freaking hype ngl. Let's go run a prompt and see our image in all its glory:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2dd7a3d2-4f19-4302-bf78-881ad86fa93f/Untitled.png)

Wait… wtf? This image is broken AF lol. There is actually one more thing we need to do in order to get this to work properly. If you remember, our API was returning a `buffer` to our frontend. Well, in order to display an image we need to convert that `buffer` into a `base64` string, which is the form our frontend can understand as an image! For this, let's head back to `generate.js` and create a new function called `bufferToBase64`:

```jsx
const bufferToBase64 = (buffer) => {
  const base64 = buffer.toString('base64');
  return `data:image/png;base64,${base64}`;
};
```

It's a super simple function that takes in a `buffer` and adds some image decorators to it so our UI will know it's an image!
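If you want to see what that string looks like without waiting on a real generation, here's a tiny sanity check you could temporarily drop into `generate.js`. The four bytes are just the start of a PNG header, so the output is way shorter than a real image, and the `arrayBuffer()` note only matters if your fetch setup doesn't have the node-fetch v2 `.buffer()` method:

```jsx
// Quick sanity check of the data URL format -- these bytes are just the PNG magic number
const fakeBuffer = Buffer.from([0x89, 0x50, 0x4e, 0x47]);
console.log(bufferToBase64(fakeBuffer));
// -> "data:image/png;base64,iVBORw=="

// If your fetch implementation doesn't have response.buffer(), the standard
// arrayBuffer() gets you an equivalent Buffer:
// const buffer = Buffer.from(await response.arrayBuffer());
```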
Now take that function and, inside `generateAction`, use it in the `ok` response:

```jsx
const generateAction = async (req, res) => {
  console.log('Received request');
  const input = JSON.parse(req.body).input;

  const response = await fetch(
    `https://api-inference.huggingface.co/models/buildspace/ai-avatar-generator`,
    {
      headers: {
        Authorization: `Bearer ${process.env.HF_AUTH_KEY}`,
        'Content-Type': 'application/json',
      },
      method: 'POST',
      body: JSON.stringify({
        inputs: input,
      }),
    }
  );

  if (response.ok) {
    const buffer = await response.buffer();
    // Convert to base64
    const base64 = bufferToBase64(buffer);
    // Make sure to change to base64
    res.status(200).json({ image: base64 });
  } else if (response.status === 503) {
    const json = await response.json();
    res.status(503).json(json);
  } else {
    const json = await response.json();
    res.status(response.status).json({ error: response.statusText });
  }
};
```

**OKAY** — NOW this will work (I promise hehe). Give it one more run and watch all the glory of your web app take form 🥲.

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/b9d84f61-3d00-4cae-8ea9-a6a172252cef/Untitled.png)

Take a moment to look back on the last few things you've done. You may have had zero knowledge of training models and you have now officially trained your very own model (pretty insane tbh). You can now take this site and make it better. You could even start making revenue off of it! I want to give you a few more ideas in the next section that can level up your current page.

# Finishing touches

### Hide your unique subject identifier

It's kinda wack that your users have to type "abraza" to generate prompts of you. This is an easy fix! Just use the JavaScript `replace()` function to hide it from them. Right before you make the API call in `index.js` inside `generateAction`, put in something like this:

```jsx
const finalInput = input.replace(/raza/gi, 'abraza');

const response = await fetch('/api/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'image/jpeg',
  },
  body: JSON.stringify({ input: finalInput }),
});
```

`replace` takes in a regular expression, that's what the `/raza/gi` fanciness is. You can use something like [AutoRegex](https://www.autoregex.xyz/), a GPT-powered regex translator, if you have various spellings or nicknames! Most of the time, `replace("name", "unique_ting")` will work just fine. You can read up more about replace [here](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace), it's pretty simple, and regular expressions [here](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp) (they're not simple at all 💀)

### Give your users some fancy prompts

Shall we bestow some of your magic onto your users? They'll get much better results if you give them some prompts they can modify. They probably don't know who all these fancy artists are, so let's build some buttons that fill these prompts in!

I'm not gonna walk you through this, but really all this would be is a set of buttons that update the value of `input` in `index.js` to preset prompts. While you're at it, might as well split the input bar into the core 4 pieces - artist, medium, vibe, descriptors. This will train the users ***how*** to write good prompts without them even realising!

So you gotta build two things (there's a rough sketch right after this list):

1. A few buttons that auto-populate the prompt input field with preset prompts
2. Four fields, one for each prompt piece (artist, medium, vibe, descriptors)
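Here's a minimal sketch of both pieces, assuming a couple of made-up preset prompts. The `presetPrompts` array, the `PresetButtons` / `PromptBuilder` components, and the `buildPrompt` helper are all placeholder names for you to rename, restyle, and wire up however you like:

```jsx
// A rough sketch, not the official solution -- the prompts and component names are made up.
// (Assumes `useState` is already imported from 'react' in index.js.)
const presetPrompts = [
  'abraza as a renaissance oil painting, intricate details, dramatic lighting',
  'abraza as an astronaut, synthwave style, vibrant colors, highly detailed',
];

// Idea #1: buttons that drop a full preset prompt into the existing `input` state
const PresetButtons = ({ setInput }) => (
  <div>
    {presetPrompts.map((prompt) => (
      <button key={prompt} onClick={() => setInput(prompt)}>
        {prompt.slice(0, 24)}...
      </button>
    ))}
  </div>
);

// Idea #2: four small fields (artist, medium, vibe, descriptors) joined into one prompt
const PromptBuilder = ({ setInput }) => {
  const [artist, setArtist] = useState('');
  const [medium, setMedium] = useState('');
  const [vibe, setVibe] = useState('');
  const [descriptors, setDescriptors] = useState('');

  // Concat the pieces and hand the result to the same `input` state the app already uses
  const buildPrompt = () =>
    setInput(['abraza', medium, `by ${artist}`, vibe, descriptors].join(', '));

  return (
    <div>
      <input placeholder="artist" value={artist} onChange={(e) => setArtist(e.target.value)} />
      <input placeholder="medium" value={medium} onChange={(e) => setMedium(e.target.value)} />
      <input placeholder="vibe" value={vibe} onChange={(e) => setVibe(e.target.value)} />
      <input
        placeholder="descriptors"
        value={descriptors}
        onChange={(e) => setDescriptors(e.target.value)}
      />
      <button onClick={buildPrompt}>Use this prompt</button>
    </div>
  );
};
```

You'd render these inside `Home` and pass down `setInput` (for example `<PresetButtons setInput={setInput} />`), and then the mock-up below is mostly a styling exercise.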
Here's a messy mock-up of what this might look like:

![Untitled](https://s3-us-west-2.amazonaws.com/secure.notion-static.com/6705888b-0585-4756-b662-7680450f9d4e/Untitled.png)

All you'd have to do is [`concat`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/concat) the fields together for the final prompt. Ezpz.

These two bits are really important! Most devs don't really think about how simple tweaks like these impact the user experience. By being explicit about what the user needs to describe, everyone from your grandma to your dog will be able to generate good stuff. **Designing simple products takes more work than messy, complicated products.**

My design has lots of room for improvement. Should the artist field be a dropdown instead? Wtf is a descriptor? What do vibes even look like? I'll leave it up to you to take it further, maybe store generated images and their prompts in a DB so your users can look at old results while they wait for new images to be generated? That'd be pretty cool!

### How do I let other people generate avatars of their own?

The big money maker. Getting people to generate their own images. There's no way around training models - you'll need to use Dreambooth to create a customised model **for each person**. This **will** cost money.

The way the big players like Lensa and AvatarAI do it is renting bare metal GPUs via cloud providers like AWS or GCP. Their entire operation is a programmatic version of the manual parts you did in this build. If I had to guess, their flow is probably something like:

1. Get 5-10 images from the user
2. Process the images (resize, remove background)
3. Tune a Stable Diffusion model with GPUs
4. Use predefined prompts to generate 50-100 images
5. Send the user their images, maybe delete their model

All of this is simple-ish to do programmatically. The trick here is getting GPUs for as cheap as possible. Idk if you can get GPUs as cheap as Lensa ($3.49 for 100 avatars lol), but the opportunity here is in steps #2 and #3 I think.

There are awesome platforms out there that help you build these flows for a cheaper price, such as [banana.dev](https://banana.dev). We were able to get some credits from them to give you! Complete this project, claim your NFT, and get instructions on how you can get access to 10 free hours of GPU time! Maybe that's all you need to get your business going 🤘.

## Deploy with Railway

**GTFOL: Let's go to prod.**

It's time to [gtfol](https://www.urbandictionary.com/define.php?term=GTFOL&utm_source=buildspace.so&utm_medium=buildspace_project). We don't want to just stay on localhost, after all. That'd be boring! The whole point of this app is to let your friends and family create alternate realities with you.

Deploying a Next.js app has gotten **SUPER** easy - this should just take a few minutes — and then you'll have a link to your creation you can share with the world. Check out this vid to find out what you'll need to do here :)

[https://vimeo.com/786521187](https://vimeo.com/786521187)

### Beyond localhost! // What's next?

### What you've learned

When you're following a project guide and not doing everything yourself from scratch, it can be easy to forget or overlook what you've learned. Here's a list of all the cool technologies you learned how to use in this project:

1. Dreamstudio
2. Google Colab
3. Jupyter notebooks
4. HuggingFace
5. Text-to-image models like Stable Diffusion and Dall-e
6. Next.js

WOW. All these tools are now in your arsenal. You can spin up new projects using any of these and I fully expect you to. I better be seeing some tags from you on Twitter when you make even cooler stuff than this app.

Cya around you wizard :)