# PixArt-$\alpha$: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

* **Link:** [[pdf]](https://arxiv.org/pdf/2310.00426)
* **Authors:** Huawei
* **Comments:** ICLR 2024

## Introduction

> High-resolution image synthesis up to 1024x1024 with low training cost

* Training strategy decomposition
  * Optimize pixel dependency
  * Text-image alignment
  * Image aesthetic quality
* Efficient T2I Transformer
* High-informative data

## Method

* **TRAINING STRATEGY DECOMPOSITION**
  * Stage 1: Pixel dependency learning
    * Initialize the DiT from an ImageNet-pretrained model: inexpensive, and the model learns dependencies between features.
  * Stage 2: Text-image alignment learning
    * Construct a dataset of precise text-image pairs and train the model to align text and image content.
  * Stage 3: High-resolution and aesthetic image generation
    * Fine-tune the model on high-quality aesthetic data for high-resolution generation.
* **DATASET CONSTRUCTION**
  * Use LLaVA for auto-captioning: denser, more precise captions than raw web alt text (see the captioning sketch after this section).
* **MODEL ARCHITECTURE**

## Experiments

### Datasets

### Results

## Misc
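The dataset-construction step above hinges on auto-captioning images with a vision-language model. Below is a minimal sketch of such a captioning pass, assuming the Hugging Face `transformers` LLaVA integration and the `llava-hf/llava-1.5-7b-hf` checkpoint; the prompt wording and image path are illustrative assumptions, not necessarily the paper's exact pipeline.

```python
# Hypothetical auto-captioning pass in the spirit of PixArt-alpha's dataset construction:
# caption each image with LLaVA to obtain dense, descriptive text for text-image pairs.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed checkpoint, not necessarily the paper's exact model
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-1.5 chat format; the instruction text is an illustrative choice.
prompt = "USER: <image>\nDescribe this image and its style in a very detailed manner. ASSISTANT:"

def caption(path: str) -> str:
    image = Image.open(path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(
        model.device, torch.float16
    )
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    text = processor.decode(out[0], skip_special_tokens=True)
    return text.split("ASSISTANT:")[-1].strip()  # keep only the generated caption

print(caption("example.jpg"))  # placeholder path
```

Roughly, the paper's argument for this step is caption information density: model-generated captions contain more valid nouns and descriptive detail than raw alt text, which speeds up text-image alignment learning.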
# PixArt-$\delta$: Fast and Controllable Image Generation with Latent Consistency Models

* **Link:** [[pdf]](https://arxiv.org/pdf/2401.05252)
* **Authors:** Huawei
* **Comments:**

## Introduction

## Method

## Experiments

### Datasets

### Results

## Misc

# PixArt-$\Sigma$: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

* **Link:** [[pdf]](https://arxiv.org/pdf/2403.04692)
* **Authors:** Huawei
* **Comments:**

## Introduction

## Method

## Experiments

### Datasets

### Results

## Misc