# 22 Karras et al (2022)
## Recall




## DDPM/DDIM improvements
- In this lesson, the **cosine schedule** is a **noise schedule** (from Improved DDPM, Nichol & Dhariwal 2021): the cumulative signal fraction ᾱ(t) follows a cosine curve over the diffusion steps used during **training**, instead of the original linear β schedule. (The same cosine shape is also widely used as a learning-rate annealing schedule, which is a separate use.)
- Purpose
The idea is to start with almost no noise and add it gradually following the cosine curve, which helps:
- Avoid abrupt changes near the very start and end of the diffusion process
- Destroy the signal smoothly rather than too quickly
- Improve sample quality by keeping every timestep informative for the model
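A minimal sketch of such a cosine noise schedule; the exact parameterization (with `t` normalized to `[0, 1]`) is an illustrative assumption rather than the notebook's code:

```python
import math
import torch

# Cosine noise schedule in the spirit of Improved DDPM: abar(t) is the
# cumulative signal fraction, falling smoothly from 1 (clean) to 0 (pure noise).
def abar(t):
    # t: tensor of noise levels in [0, 1]
    return torch.cos(t * math.pi / 2) ** 2

def noisify(x0, t):
    # Forward process: mix each clean image x0 with Gaussian noise at level t
    a = abar(t).view(-1, 1, 1, 1)              # broadcast over (B, C, H, W)
    eps = torch.randn_like(x0)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps  # noised image
    return xt, eps
```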
## Predicting the amount of noise in an image
| **Configuration** | **FID ↓** | **KID ↓** | **Sample shape** |
| ----------------- | --------- | -------------- | ------------------- |
| `model-t eta 1.0` | **3.88** | **0.00441** | `[2048, 1, 32, 32]` |
| `model-t eta 0.5` | 4.58 | ❌ **-0.00111** | `[2048, 1, 32, 32]` |
| `model-t eta 0` | 5.75 | 0.01767 | `[2048, 1, 32, 32]` |
| `sig *= 0.5` | 4.01 | 0.00347 | `[2048, 1, 32, 32]` |
**Notes**:
- Best FID: **`model-t eta 1.0`** (lowest = best quality)
- Negative KID: **eta 0.5** gives a slightly negative KID (flagged ❌ above); KID is an unbiased estimate of a squared MMD, so small negative values can occur when the true value is close to zero, rather than indicating an invalid run
- Best overall: **`model-t eta 1.0`** has the best FID together with a low KID
- **`sig *= 0.5`** shows competitive FID and KID and could be worth exploring further
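"Predicting the amount of noise" here means estimating how noisy an image is directly from its pixels, so the sampler no longer needs the timestep as an explicit input. A minimal sketch under that reading; the regressor architecture and training step below are illustrative assumptions, not the notebook's code:

```python
import torch
from torch import nn
import torch.nn.functional as F

# Hypothetical regressor: given a noised image x_t, predict its cumulative
# signal fraction abar (equivalently, how much noise it contains).
class NoiseLevelPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(1)

def training_step(model, x0):
    # Sample random noise levels, noisify the batch, and regress the level back
    abar = torch.rand(x0.shape[0])
    a = abar.view(-1, 1, 1, 1)
    xt = a.sqrt() * x0 + (1 - a).sqrt() * torch.randn_like(x0)
    return F.mse_loss(model(xt), abar)
```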
## **Noise scheduling for diffusion models**
Noise scheduling defines how much noise is added at each timestep 𝑡 during the forward process, and how the model learns to reverse that noise during generation.
### **Paper: On the Importance of Noise Scheduling for Diffusion Models (Chen, 2023)**
#### **Why Noise Scheduling Matters**
- The **noise schedule γ(t)** controls how much noise is added at each time step `t` during training.
- It significantly impacts the model’s performance.
- The **optimal noise schedule depends on the specific task and image resolution**.
#### **Signal-to-Noise Ratio (SNR) Insight**
- For larger images, the same per-pixel noise is **relatively weaker**: neighbouring pixels are highly correlated, so independent noise largely averages out and more of the signal remains visible.
- Example: Adding the same noise to 64×64 and 1024×1024 images → the larger image appears **less noisy**.
- Therefore, noise schedules that work well for small images may leave higher-resolution models **undertrained** at the noisiest levels if the schedule is not adapted.
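A quick toy check of this effect (not from the lesson): add the same per-pixel noise at two resolutions, then view the high-resolution version at the small scale; most of its noise averages away.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
sigma = 1.0
noise_small = sigma * torch.randn(1, 1, 64, 64)      # noise on a 64x64 image
noise_large = sigma * torch.randn(1, 1, 1024, 1024)  # same per-pixel noise at 1024x1024

# Downsampled to the common 64x64 scale, the high-res noise shrinks by ~16x
down = F.avg_pool2d(noise_large, kernel_size=16)
print(noise_small.std().item())  # ~1.0
print(down.std().item())         # ~0.06
```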
#### **Noise Scheduling Strategies**
##### 1. **Custom Noise Functions**
- Use **cosine**, **sigmoid**, or **linear** schedules:
- Example: γ(t)=1−t
- Hyperparameters like start, end, and **temperature (τ)** control the curve shape.
- Schedules are often **skewed toward noisier levels** to better guide the model.
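Possible shapes for γ(t) with a temperature parameter; the exact parameterizations below are assumptions in the spirit of the paper, not its reference code:

```python
import math
import numpy as np

# Illustrative noise-schedule shapes; t in [0, 1], and gamma(t) is the signal
# fraction, so gamma(0) = 1 (clean) and gamma(1) = 0 (pure noise).
def gamma_linear(t):
    return 1 - t

def gamma_cosine(t, tau=1.0):
    # larger tau skews the schedule toward noisier levels
    return np.cos(0.5 * math.pi * t) ** (2 * tau)

def gamma_sigmoid(t, start=-3.0, end=3.0, tau=1.0):
    # sigmoid-shaped schedule between `start` and `end`, sharpened by tau,
    # rescaled so that gamma(0) = 1 and gamma(1) = 0
    sig = lambda x: 1 / (1 + np.exp(-x))
    v = sig((t * (end - start) + start) / tau)
    return (sig(end / tau) - v) / (sig(end / tau) - sig(start / tau))

t = np.linspace(0, 1, 5)
print(gamma_linear(t))
print(gamma_cosine(t))
print(gamma_sigmoid(t))
```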
##### 2. **Input Scaling Factor (b)**
- Modify the input by scaling the clean image: `x_t = sqrt(γ(t)) * b * x_0 + sqrt(1 − γ(t)) * ε`
- Decreasing `b` → increases effective noise.
- But changing `b` may disturb variance, harming performance.
- Fix: **Normalize `x_t` to unit variance** before feeding into the model.
- On a log-SNR plot, γ(t) sets the shape of the curve, while scaling by `b` shifts the whole curve up or down (by `2·log b`).
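A minimal sketch of input scaling plus the unit-variance fix; the function name and the per-example normalization are assumptions based on the description above:

```python
import torch

def noisify_scaled(x0, gamma_t, b=1.0, normalize=True):
    # Forward process with an input scaling factor b:
    #   x_t = sqrt(gamma(t)) * b * x0 + sqrt(1 - gamma(t)) * eps
    g = gamma_t.view(-1, 1, 1, 1)
    eps = torch.randn_like(x0)
    xt = g.sqrt() * b * x0 + (1 - g).sqrt() * eps
    if normalize:
        # decreasing b raises the effective noise but also changes the variance
        # of x_t; rescale each example back to (roughly) unit variance
        xt = xt / xt.std(dim=(1, 2, 3), keepdim=True)
    return xt, eps
```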
#### **Inference vs Training Schedule**
- **Training**: Uses continuous time `t ∈ [0, 1]`
- **Inference**: Can use a **different γ(t)** schedule than the one used for training
- Time is typically discretized uniformly at inference
- The cosine schedule is effective for **sampling**
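For example, the sampling steps can be built by discretizing `t` uniformly and evaluating whichever γ is chosen for inference, which need not match the training schedule (a sketch, reusing the `gamma_cosine` helper assumed above):

```python
import numpy as np

num_steps = 50
ts = np.linspace(1.0, 0.0, num_steps + 1)  # uniform grid, noisiest level first
gammas = gamma_cosine(ts)                  # inference-time schedule
```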
#### **Conclusion**
- **Noise scheduling is crucial** and must be **task- and resolution-aware**.
- Use appropriate schedules and scaling strategies to **maximize model performance**.
## **DDPM/DDIM Enhancements:**
- Removed the constraint of an integral number of diffusion steps to allow a more **continuous process**.
- Introduced a method to **predict noise in an image without using the time step as input**.
- Modified the **DDIM step** to use **predicted alpha bar** values specific to each image.
## **Main Focus: Karras et al. (2022) Paper**
*"Elucidating the Design Space of Diffusion-Based Generative Models"*
- Introduces **preconditioning** to normalize the network's inputs and training targets to **unit variance** (a minimal sketch of this preconditioning appears at the end of this section).
- The model predicts a combination of the **clean image and the noise**, weighted according to the input's noise level.
- **Purpose of samplers in diffusion models**
- In the reverse process, the model starts from pure noise and iteratively removes it to generate realistic data.
- This is equivalent to solving a differential equation backward in time.
- Samplers determine how that equation is solved numerically.
- **Sampling Techniques Covered:**
- **Euler sampler**
- x = x + (x - denoised) / sigma_t * (sigma_{t+1} - sigma_t)
- (a runnable sketch of the Euler and Euler Ancestral steps appears after the comparison table below)
- Based on the Euler method for solving ODEs.
- Only uses the current gradient to update the sample.
- No random noise is added → deterministic path.
- ✅ Pros:
- Fast and simple
- Deterministic output (good for reproducibility)
- Effective when paired with DDIM-style models
- ❌ Cons:
- May lack diversity (same sample each time)
- Can underperform when stochasticity is important
- **Euler Ancestral sampler**
- x = x + (x - denoised) / sigma_t * (sigma_down - sigma_t)
- x = x + torch.randn_like(x) * sigma_up
- Adds random noise proportional to sigma_up
- sigma_down & sigma_up computed from eta parameter (interpolates between DDIM and DDPM)
- Useful for more diverse and realistic images
- ✅ Pros:
- Higher sample diversity
- More flexible (adjustable eta)
- Often better image quality with large models
- ❌ Cons:
- Stochastic (less reproducible)
- More complex and may require tuning
- **Heun's method**
- First guess (Euler step)
- x1 = x + dt * f(x, t)
- Slope re-evaluated at the predicted point
- s1 = f(x1, t + dt)
- Final estimate: average the two slopes
- x = x + 0.5 * dt * (f(x, t) + s1)
- Takes two estimates of the slope and averages them.
- More accurate than standard Euler.
- Used in Karras et al. (2022) as one of the best samplers.
- ✅ Pros:
- High image quality
- Low FID, especially with good preconditioning
- ❌ Cons:
- Slower than Euler (needs 2 model calls per step)
- More memory usage
- **LMS (Linear Multistep) sampler**
- x_{t+1} = x_t + c1 * d_t + c2 * d_{t-1} + c3 * d_{t-2} + ..., where d_i = (x_i - denoised_i) / sigma_i are derivative estimates and the coefficients come from an Adams-Bashforth-style polynomial fit (no extra noise is added)
- Popular in Stable Diffusion and Elucidated Diffusion implementations
- Typically uses 2–4 previous timesteps
- ✅ Pros:
- Extremely good sample quality (low FID)
- Efficient when using fewer steps
- ❌ Cons:
- Needs to store past states
- May be unstable if parameters not chosen carefully
- Summary table (compiled with GPT):
| Sampler | Type | Description | Reproducible? | Quality (FID ↓) | Speed |
| ------------------- | ------------- | ------------------ | ------------- | --------------- | ------------------------ |
| **Euler** | Deterministic | First-order reverse update | ✅ Yes | Medium | ⚡ Fast |
| **Euler Ancestral** | Stochastic | Adds controlled noise at each step | ❌ No | High | ⚡ Fast |
| **Heun’s Method** | Deterministic | Second-order (more accurate) update | ✅ Yes | ⭐ Very High | 🐢 Slower |
| **LMS** | Deterministic | Uses past steps for better estimation | ✅ Yes | ⭐ Very High | ⚡ Fast (with good setup) |
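Following up on the Euler and Euler Ancestral rows above, a runnable sketch of those two update rules; the function names, the `denoise(x, sigma)` callback, and the ancestral-step helper are illustrative assumptions rather than an exact copy of any library:

```python
import torch

def get_ancestral_step(sigma_from, sigma_to, eta=1.0):
    # Split the step into a deterministic part (sigma_down) and a fresh-noise
    # part (sigma_up); eta = 0 recovers the plain deterministic path.
    sigma_up = eta * (sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2) ** 0.5
    sigma_down = (sigma_to**2 - sigma_up**2) ** 0.5
    return sigma_down, sigma_up

@torch.no_grad()
def sample_euler(denoise, x, sigmas):
    # sigmas: decreasing noise levels, ending at 0
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)        # model's estimate of the clean image
        d = (x - denoised) / sigma          # ODE derivative
        x = x + d * (sigma_next - sigma)    # Euler step
    return x

@torch.no_grad()
def sample_euler_ancestral(denoise, x, sigmas, eta=1.0):
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        denoised = denoise(x, sigma)
        sigma_down, sigma_up = get_ancestral_step(sigma, sigma_next, eta)
        d = (x - denoised) / sigma
        x = x + d * (sigma_down - sigma)            # deterministic part
        if sigma_next > 0:
            x = x + torch.randn_like(x) * sigma_up  # re-inject noise
    return x
```

With `eta=0` the ancestral sampler collapses to the deterministic Euler path; larger `eta` re-injects more noise each step, trading reproducibility for diversity.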
- **Key Teaching Points:**
- Emphasizes the importance of **understanding concepts from research papers**.
- Demonstrates how to apply theoretical techniques to **enhance generative model performance**.
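Finally, a minimal sketch of the preconditioning described under the main focus above; the `c_skip`/`c_out`/`c_in`/`c_noise` formulas follow the Karras et al. (2022) paper, while the wrapper class and the `sigma_data=0.5` default are illustrative assumptions:

```python
import torch
from torch import nn

class EDMPrecond(nn.Module):
    """Wrap an inner network F so its inputs and training targets have
    roughly unit variance, as in Karras et al. (2022)."""
    def __init__(self, inner_model, sigma_data=0.5):
        super().__init__()
        self.inner_model = inner_model
        self.sigma_data = sigma_data

    def forward(self, x_noisy, sigma):
        sd = self.sigma_data
        sigma = sigma.view(-1, 1, 1, 1)
        c_skip = sd**2 / (sigma**2 + sd**2)             # how much of x_noisy to keep
        c_out = sigma * sd / (sigma**2 + sd**2).sqrt()  # scales the network output
        c_in = 1 / (sigma**2 + sd**2).sqrt()            # normalizes the input
        c_noise = sigma.log().flatten() / 4             # noise-level conditioning
        F = self.inner_model(c_in * x_noisy, c_noise)
        # At low sigma the output leans on x_noisy (already mostly clean);
        # at high sigma it leans on the network output, so the model is
        # effectively predicting a blend of the clean image and the noise.
        return c_skip * x_noisy + c_out * F
```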
# Jupyter notebooks
- [https://github.com/fastai/course22p2/blob/master/nbs/22_cosine.ipynb](https://github.com/fastai/course22p2/blob/master/nbs/22_cosine.ipynb)
- [https://github.com/fastai/course22p2/blob/master/nbs/22_noise-pred.ipynb](https://github.com/fastai/course22p2/blob/master/nbs/22_noise-pred.ipynb)
- [https://github.com/fastai/course22p2/blob/master/nbs/23_karras.ipynb](https://github.com/fastai/course22p2/blob/master/nbs/23_karras.ipynb)
# Other resources
- [Generative AI: Diffusion Model principles explained](https://www.youtube.com/watch?v=ifCDXFdeaaM)
- [Diffusion Models: DDPM | Generative AI Animated](https://www.youtube.com/watch?v=EhndHhIvWWw)
- [AI in Plain Language | Diffusion Model](https://www.youtube.com/watch?v=zEZOYZeIPUs)
- [Stable Diffusion explained in detail](https://www.youtube.com/watch?v=I62Ju6FEOGQ)
- [Diffusion web demo](https://wangjia184.github.io/diffusion_model/)
- [How diffusion models work: the math from scratch](https://theaisummer.com/diffusion-models/)