# 22 Karras et al (2022)
## Recall




## DDPM/DDIM improvements
- In this lesson, the **cosine schedule** is a **noise schedule** (from Improved DDPM, Nichol & Dhariwal 2021): the cumulative signal fraction ᾱ(t) follows a cosine curve over the diffusion steps used during **training**, instead of the original linear β schedule. (The same cosine shape is also widely used as a learning-rate annealing schedule, which is a separate use.)
- Purpose
The idea is to start with almost no noise and add it gradually following the cosine curve, which helps:
- Avoid abrupt changes near the very start and end of the diffusion process
- Destroy the signal smoothly rather than too quickly
- Improve sample quality by keeping every timestep informative for the model
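A minimal sketch of such a cosine noise schedule; the exact parameterization (with `t` normalized to `[0, 1]`) is an illustrative assumption rather than the notebook's code:

```python
import math
import torch

# Cosine noise schedule in the spirit of Improved DDPM: abar(t) is the
# cumulative signal fraction, falling smoothly from 1 (clean) to 0 (pure noise).
def abar(t):
    # t: tensor of noise levels in [0, 1]
    return torch.cos(t * math.pi / 2) ** 2

def noisify(x0, t):
    # Forward process: mix each clean image x0 with Gaussian noise at level t
    a = abar(t).view(-1, 1, 1, 1)              # broadcast over (B, C, H, W)
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps  # noised image
    return xt, eps
```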
## Predicting the amount of noise in an image
| **Configuration** | **FID ↓** | **KID ↓** | **Sample shape** |
| ----------------- | --------- | -------------- | ------------------- |
| `model-t eta 1.0` | **3.88** | **0.00441** | `[2048, 1, 32, 32]` |
| `model-t eta 0.5` | 4.58 | ❌ **-0.00111** | `[2048, 1, 32, 32]` |
| `model-t eta 0` | 5.75 | 0.01767 | `[2048, 1, 32, 32]` |
| `sig *= 0.5` | 4.01 | 0.00347 | `[2048, 1, 32, 32]` |
**Notes**:
- Best FID: **`model-t eta 1.0`** (lowest = best quality)
- Negative KID: **eta 0.5** gives a slightly negative KID (flagged ❌ above); KID is an unbiased estimate of a squared MMD, so small negative values can occur when the true value is close to zero, rather than indicating an invalid run
- Best overall: **`model-t eta 1.0`** has the best FID together with a low KID
- **`sig *= 0.5`** shows competitive FID and KID and could be worth exploring further
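"Predicting the amount of noise" here means estimating how noisy an image is directly from its pixels, so the sampler no longer needs the timestep as an explicit input. A minimal sketch under that reading; the regressor architecture and training step below are illustrative assumptions, not the notebook's code:

```python
import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical regressor: given a noised image x_t, predict its cumulative
# signal fraction abar (equivalently, how much noise it contains).
class NoiseLevelPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(1)

def training_step(model, x0):
    # Sample random noise levels, noisify the batch, and regress the level back
    abar = torch.rand(x0.shape[0])
    a = abar.view(-1, 1, 1, 1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)
    return F.mse_loss(model(xt), abar)
```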
## **Noise scheduling for diffusion models**
Noise scheduling defines how much noise is added at each timestep 𝑡 during the forward process, and how the model learns to reverse that noise during generation.
### **Paper: On the Importance of Noise Scheduling for Diffusion Models (Chen, 2023)**
#### **Why Noise Scheduling Matters**
- The **noise schedule γ(t)** controls how much noise is added at each time step `t` during training.
- It significantly impacts the model’s performance.
- The **optimal noise schedule depends on the specific task and image resolution**.
#### **Signal-to-Noise Ratio (SNR) Insight**
- For larger images, the same per-pixel noise is **relatively weaker**: neighbouring pixels are highly correlated, so independent noise largely averages out and more of the signal remains visible.
- Example: Adding the same noise to 64×64 and 1024×1024 images → the larger image appears **less noisy**.
- Therefore, noise schedules that work well for small images may leave higher-resolution models **undertrained** at the noisiest levels if the schedule is not adapted.
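A quick toy check of this effect (not from the lesson): add the same per-pixel noise at two resolutions, then view the high-resolution version at the small scale; most of its noise averages away.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
sigma = 1.0
noise_small = sigma * torch.randn(1, 1, 64, 64)      # noise on a 64x64 image
noise_large = sigma * torch.randn(1, 1, 1024, 1024)  # same per-pixel noise at 1024x1024

# Downsampled to the common 64x64 scale, the high-res noise shrinks by ~16x
down = F.avg_pool2d(noise_large, kernel_size=16)
print(noise_small.std().item())  # ~1.0
print(down.std().item())         # ~0.06
```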
#### **Noise Scheduling Strategies**
##### 1. **Custom Noise Functions**
- Use **cosine**, **sigmoid**, or **linear** schedules:
- Example: γ(t)=1−t
- Hyperparameters like start, end, and **temperature (τ)** control the curve shape.
- Schedules are often **skewed toward noisier levels** to better guide the model.
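Possible shapes for γ(t) with a temperature parameter; the exact parameterizations below are assumptions in the spirit of the paper, not its reference code:

```python
import math
import numpy as np

# Illustrative noise-schedule shapes; t in [0, 1], and gamma(t) is the signal
# fraction, so gamma(0) = 1 (clean) and gamma(1) = 0 (pure noise).
def gamma_linear(t):
    return 1 - t

def gamma_cosine(t, tau=1.0):
    # larger tau skews the schedule toward noisier levels
    return np.cos(0.5 * math.pi * t) ** (2 * tau)

def gamma_sigmoid(t, start=-3.0, end=3.0, tau=1.0):
    # sigmoid-shaped schedule between `start` and `end`, sharpened by tau,
    # rescaled so that gamma(0) = 1 and gamma(1) = 0
    sig = lambda x: 1 / (1 + np.exp(-x))
    v = sig((t * (end - start) + start) / tau)
    return (sig(end / tau) - v) / (sig(end / tau) - sig(start / tau))

t = np.linspace(0, 1, 5)
print(gamma_linear(t))
print(gamma_cosine(t))
print(gamma_sigmoid(t))
```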
##### 2. **Input Scaling Factor (b)**
- Modify the input by scaling the clean image: `x_t = sqrt(γ(t)) * b * x_0 + sqrt(1 − γ(t)) * ε`
- Decreasing `b` → increases effective noise.
- But changing `b` may disturb variance, harming performance.
- Fix: **Normalize `x_t` to unit variance** before feeding into the model.
- On a log-SNR plot, γ(t) sets the shape of the curve, while scaling by `b` shifts the whole curve up or down (by `2·log b`).
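A minimal sketch of input scaling plus the unit-variance fix; the function name and the per-example normalization are assumptions based on the description above:

```python
import torch

def noisify_scaled(x0, gamma_t, b=1.0, normalize=True):
    # Forward process with an input scaling factor b:
    #   x_t = sqrt(gamma(t)) * b * x0 + sqrt(1 - gamma(t)) * eps
    g = gamma_t.view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    xt = g.sqrt() * b * x0 + (1 - g).sqrt() * eps
    if normalize:
        # decreasing b raises the effective noise but also changes the variance
        # of x_t; rescale each example back to (roughly) unit variance
        xt = xt / xt.std(dim=(1, 2, 3), keepdim=True)
    return xt, eps
```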
#### **Inference vs Training Schedule**
- **Training**: Uses continuous time `t ∈ [0, 1]`
- **Inference**: Can use a **different γ(t)** schedule than the one used for training
- Time is typically discretized uniformly at inference
- The cosine schedule is effective for **sampling**
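For example, the sampling steps can be built by discretizing `t` uniformly and evaluating whichever γ is chosen for inference, which need not match the training schedule (a sketch, reusing the `gamma_cosine` helper assumed above):

```python
import numpy as np

num_steps = 50
ts = np.linspace(1.0, 0.0, num_steps + 1)  # uniform grid, noisiest level first
gammas = gamma_cosine(ts)                  # inference-time schedule
```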
#### **Conclusion**
- **Noise scheduling is crucial** and must be **task- and resolution-aware**.
- Use appropriate schedules and scaling strategies to **maximize model performance**.
## **DDPM/DDIM Enhancements:**
- Removed the constraint of an integral number of diffusion steps to allow a more **continuous process**.
- Introduced a method to **predict noise in an image without using the time step as input**.
- Modified the **DDIM step** to use **predicted alpha bar** values specific to each image.
## **Main Focus: Karras et al. (2022) Paper**
*"Elucidating the Design Space of Diffusion-Based Generative Models"*
- Introduces **preconditioning** to normalize the network's inputs and training targets to **unit variance** (a minimal sketch of this preconditioning appears at the end of this section).
- The model predicts a combination of the **clean image and the noise**, weighted according to the input's noise level.
- **Purpose of samplers in diffusion models**
- In the reverse process, the model starts from pure noise and iteratively removes it to generate realistic data.
- This is equivalent to solving a differential equation backward in time.
- Samplers determine how that equation is solved numerically.
- **Sampling Techniques Covered:**
- **Euler sampler**
- x = x + (x - denoised) / sigma_t * (sigma_{t+1} - sigma_t)
- (a runnable sketch of the Euler and Euler Ancestral steps appears after the comparison table below)
- Based on the Euler method for solving ODEs.
- Only uses the current gradient to update the sample.
- No random noise is added → deterministic path.
- ✅ Pros:
- Fast and simple
- Deterministic output (good for reproducibility)
- Effective when paired with DDIM-style models
- ❌ Cons:
- May lack diversity (same sample each time)
- Can underperform when stochasticity is important
- **Euler Ancestral sampler**
- x = x + (x - denoised) / sigma_t * (sigma_down - sigma_t)
- x = x + torch.randn_like(x) * sigma_up
- Adds random noise proportional to sigma_up
- sigma_down & sigma_up computed from eta parameter (interpolates between DDIM and DDPM)
- Useful for more diverse and realistic images
- ✅ Pros:
- Higher sample diversity
- More flexible (adjustable eta)
- Often better image quality with large models
- ❌ Cons:
- Stochastic (less reproducible)
- More complex and may require tuning
- **Heun's method**
- First guess (Euler step)
- x1 = x + dt * f(x, t)
- Slope re-evaluated at the predicted point
- s1 = f(x1, t + dt)
- Final estimate: average the two slopes
- x = x + 0.5 * dt * (f(x, t) + s1)
- Takes two estimates of the slope and averages them.
- More accurate than standard Euler.
- Used in Karras et al. (2022) as one of the best samplers.
- ✅ Pros:
- High image quality
- Low FID, especially with good preconditioning
- ❌ Cons:
- Slower than Euler (needs 2 model calls per step)
- More memory usage
- **LMS (Linear Multistep) sampler**
- x_{t+1} = x_t + c1 * d_t + c2 * d_{t-1} + c3 * d_{t-2} + ..., where d_i = (x_i - denoised_i) / sigma_i are derivative estimates and the coefficients come from an Adams-Bashforth-style polynomial fit (no extra noise is added)
- Popular in Stable Diffusion and Elucidated Diffusion implementations
- Typically uses 2–4 previous timesteps
- ✅ Pros:
- Extremely good sample quality (low FID)
- Efficient when using fewer steps
- ❌ Cons:
- Needs to store past states
- May be unstable if parameters not chosen carefully
- Summary table (compiled with GPT):
| Sampler | Type | Description | Reproducible? | Quality (FID ↓) | Speed |
| ------------------- | ------------- | ------------------ | ------------- | --------------- | ------------------------ |
| **Euler** | Deterministic | First-order reverse update | ✅ Yes | Medium | ⚡ Fast |
| **Euler Ancestral** | Stochastic | Adds controlled noise at each step | ❌ No | High | ⚡ Fast |
| **Heun’s Method** | Deterministic | Second-order (more accurate) update | ✅ Yes | ⭐ Very High | 🐢 Slower |
| **LMS** | Deterministic | Uses past steps for better estimation | ✅ Yes | ⭐ Very High | ⚡ Fast (with good setup) |
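Following up on the Euler and Euler Ancestral rows above, a runnable sketch of those two update rules; the function names, the `denoise(x, sigma)` callback, and the ancestral-step helper are illustrative assumptions rather than an exact copy of any library:

```python
import torch

def get_ancestral_step(sigma_from, sigma_to, eta=1.0):
    # Split the step into a deterministic part (sigma_down) and a fresh-noise
    # part (sigma_up); eta = 0 recovers the plain deterministic path.
    sigma_up = eta * (sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2) ** 0.5
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    return sigma_down, sigma_up

@torch.no_grad()
def sample_euler(denoise, x, sigmas):
    # sigmas: decreasing noise levels, ending at 0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)        # model's estimate of the clean image
        d = (x - denoised) / sigma          # ODE derivative
        x = x + d * (sigma_next - sigma)    # Euler step
    return x

@torch.no_grad()
def sample_euler_ancestral(denoise, x, sigmas, eta=1.0):
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)
        sigma_down, sigma_up = get_ancestral_step(sigma, sigma_next, eta)
        d = (x - denoised) / sigma
        x = x + d * (sigma_down - sigma)            # deterministic part
        if sigma_next > 0:
            x = x + torch.randn_like(x) * sigma_up  # re-inject noise
    return x
```

With `eta=0` the ancestral sampler collapses to the deterministic Euler path; larger `eta` re-injects more noise each step, trading reproducibility for diversity.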
- **Key Teaching Points:**
- Emphasizes the importance of **understanding concepts from research papers**.
- Demonstrates how to apply theoretical techniques to **enhance generative model performance**.
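Finally, a minimal sketch of the preconditioning described under the main focus above; the `c_skip`/`c_out`/`c_in`/`c_noise` formulas follow the Karras et al. (2022) paper, while the wrapper class and the `sigma_data=0.5` default are illustrative assumptions:

```python
import torch
from torch import nn

class EDMPrecond(nn.Module):
    """Wrap an inner network F so its inputs and training targets have
    roughly unit variance, as in Karras et al. (2022)."""
    def __init__(self, inner_model, sigma_data=0.5):
        super().__init__()
        self.inner_model = inner_model
        self.sigma_data = sigma_data

    def forward(self, x_noisy, sigma):
        sd = self.sigma_data
        sigma = sigma.view(-1, 1, 1, 1)
        c_skip = sd**2 / (sigma**2 + sd**2)             # how much of x_noisy to keep
        c_out = sigma * sd / (sigma**2 + sd**2).sqrt()  # scales the network output
        c_in = 1 / (sigma**2 + sd**2).sqrt()            # normalizes the input
        c_noise = sigma.log().flatten() / 4             # noise-level conditioning
        F = self.inner_model(c_in * x_noisy, c_noise)
        # At low sigma the output leans on x_noisy (already mostly clean);
        # at high sigma it leans on the network output, so the model is
        # effectively predicting a blend of the clean image and the noise.
        return c_skip * x_noisy + c_out * F
```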
# Jupyter notebooks
- [https://github.com/fastai/course22p2/blob/master/nbs/22_cosine.ipynb](https://github.com/fastai/course22p2/blob/master/nbs/22_cosine.ipynb)
- [https://github.com/fastai/course22p2/blob/master/nbs/22_noise-pred.ipynb](https://github.com/fastai/course22p2/blob/master/nbs/22_noise-pred.ipynb)
- [https://github.com/fastai/course22p2/blob/master/nbs/23_karras.ipynb](https://github.com/fastai/course22p2/blob/master/nbs/23_karras.ipynb)
# Other resources
- [Generative AI: Diffusion Model principles explained](https://www.youtube.com/watch?v=ifCDXFdeaaM)
- [Diffusion Models: DDPM | Generative AI Animated](https://www.youtube.com/watch?v=EhndHhIvWWw)
- [AI in Plain Language | Diffusion Model](https://www.youtube.com/watch?v=zEZOYZeIPUs)
- [Stable Diffusion explained in detail](https://www.youtube.com/watch?v=I62Ju6FEOGQ)
- [Diffusion web demo](https://wangjia184.github.io/diffusion_model/)
- [How diffusion models work: the math from scratch](https://theaisummer.com/diffusion-models/)