# PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

* **Link:** [[pdf]](https://arxiv.org/pdf/2310.00426)
* **Authors:** Huawei
* **Comments:** ICLR 2024

## Introduction

> High-resolution image synthesis up to 1024x1024 with low training cost

* Training strategy decomposition
  * Optimize pixel dependency
  * Text-image alignment
  * Image aesthetic quality
* Efficient T2I Transformer
* High-informative data

## Method

* **TRAINING STRATEGY DECOMPOSITION**
  * Stage 1: Pixel dependency learning
    * Initialize the DiT from an ImageNet-pretrained model: inexpensive, and the model learns dependencies between features.
  * Stage 2: Text-image alignment learning
    * Construct a dataset of precise text-image pairs and train the model to align text and image content.
  * Stage 3: High-resolution and aesthetic image generation
    * Fine-tune the model on high-quality aesthetic data for high-resolution generation.
* **DATASET CONSTRUCTION**
  * Use LLaVA for auto-captioning: denser, more precise captions than raw web alt text (see the captioning sketch after this section).
* **MODEL ARCHITECTURE**

## Experiments

### Datasets

### Results

## Misc
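The dataset-construction step above hinges on auto-captioning images with a vision-language model. Below is a minimal sketch of such a captioning pass, assuming the Hugging Face `transformers` LLaVA integration and the `llava-hf/llava-1.5-7b-hf` checkpoint; the prompt wording and image path are illustrative assumptions, not necessarily the paper's exact pipeline.

```python
# Hypothetical auto-captioning pass in the spirit of PixArt-alpha's dataset construction:
# caption each image with LLaVA to obtain dense, descriptive text for text-image pairs.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, not necessarily the paper's exact model
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-1.5 chat format; the instruction text is an illustrative choice.
prompt = "USER: <image>\nDescribe this image and its style in a very detailed manner. ASSISTANT:"

def caption(path: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, torch.float16
    )
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    text = processor.decode(out[0], skip_special_tokens=True)
    return text.split("ASSISTANT:")[-1].strip()  # keep only the generated caption

print(caption("example.jpg"))  # placeholder path
```

Roughly, the paper's argument for this step is caption information density: model-generated captions contain more valid nouns and descriptive detail than raw alt text, which speeds up text-image alignment learning.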
# PixArt-$\delta$: Fast and Controllable Image Generation with Latent Consistency Models

* **Link:** [[pdf]](https://arxiv.org/pdf/2401.05252)
* **Authors:** Huawei
* **Comments:**

## Introduction

## Method

## Experiments

### Datasets

### Results

## Misc

# PixArt-$\Sigma$: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

* **Link:** [[pdf]](https://arxiv.org/pdf/2403.04692)
* **Authors:** Huawei
* **Comments:**

## Introduction

## Method

## Experiments

### Datasets

### Results

## Misc