{%hackmd @themes/dracula %}

## Problems
- Due to context-specific privacy concerns (e.g., industrial [2] or medical [3]) and real-time constraints, these AI services need to be deployed locally (on-device)
- Models trained on limited sets of images typically produce poor-quality outputs

## I2I TRANSLATION
- supervised
- unsupervised

Supervised I2I
![image](https://hackmd.io/_uploads/ByZO1nBBa.png)
> learning the conditional probability of samples drawn from a joint probability distribution.

Unsupervised I2I
![image](https://hackmd.io/_uploads/B1ESy3Hr6.png)
> conditional mappings must be learned from samples drawn from marginal distributions.

To restrict the mapping space to reasonable high-quality images (unsupervised I2I):
> cycle consistency or symmetry losses
> weight-sharing
> shared latent spaces

Contrastive learning (unsupervised I2I)
> maximize the mutual information between source and translated images

## I2I APPROACHES
==Pix2Pix== [Note](/MFgtPJ5KSMeO5m5tsnSk6A)
- the first successful conditional image synthesis method employing a conditional GAN
- combines an adversarial loss and an L1 loss to enforce low-frequency correctness

==CycleGAN== [Note](/0Xez8cHrTDiqqAxNQqZB_g)
- a cycle consistency loss to limit the mapping space and obtain an adversarial autoencoder
- reduces mode collapse

**Pix2Pix and CycleGAN ignore input noise, resulting in a lack of diversity.**

==StarGAN==
- reduces both the number of parameters and the number of mappings when multiple domains are considered
- employs a classification loss in an adversarial fashion
- drawback: does not present any stochastic variation

==DRIT==
- diversity
- disentangles domain content and domain attribute spaces using weight sharing
- cyclic reconstruction: a cross-cycle constraint using cross-translations between input images

==UNIT==
- unsupervised
- a weight-sharing and a shared latent space constraint to perform a uni-modal translation
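The Pix2Pix objective above (adversarial term plus an L1 term for low-frequency correctness) can be sketched as follows. This is a minimal NumPy illustration, not the original implementation; the function name is mine, and `lam=100.0` follows the weighting reported in the Pix2Pix paper.

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Sketch of the Pix2Pix generator objective: adversarial + lambda * L1.

    d_fake : discriminator probabilities for generated images
    fake   : generated images; target : ground-truth paired images
    lam    : L1 weight (100 in the Pix2Pix paper)
    """
    eps = 1e-8
    # non-saturating adversarial term: the generator wants D(G(x)) -> 1
    adv = -np.mean(np.log(d_fake + eps))
    # L1 term enforces low-frequency correctness against the paired target
    l1 = np.mean(np.abs(fake - target))
    return adv + lam * l1

# toy usage with hypothetical values
d_fake = np.array([0.9, 0.8])
fake = np.zeros((2, 4, 4))
target = np.zeros((2, 4, 4))
loss = pix2pix_generator_loss(d_fake, fake, target)
```

Note that the L1 term requires paired targets, which is exactly why this objective only applies to the supervised setting.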
==CUT==
- operates at patch level rather than image level, using contrastive learning to replace cycle consistency
- efficient: provides a one-sided translation and reduces the number of training samples needed to maximize the mutual information between source and translated patches
- enforces a shared latent space for patches related to similar areas and benefits from intra-relationships within the image

==StarGANv2==
- disentangles image generation and style encoding to create a scalable approach able to generate diverse images

==NEGCUT==
- uncovers a limitation of contrastive learning-based I2I methods: they rely heavily on negative examples able to efficiently push positive patches closer to query patches
- set labels may not be available to associate with the domains of interest

==TUNIT==
- **truly** unsupervised setting
- pseudo-domain labels are obtained by maximizing the mutual information between pairs of samples, while style features are defined by means of a contrastive loss

**Limitation: previous methods consider source and target images as a whole and tend to fail on images containing multiple instances.**

Multi-instance transfiguration problems
- Towards instance-level image-to-image translation
- InstaGAN: Instance-aware image-to-image translation
- InstaFormer: Instance-aware image-to-image translation with transformer

## LIMITATIONS AND SOLUTIONS
### Mode collapse and training instability
:::success
Mode collapse
Ideally, different z values should yield different outputs; mode collapse means multiple z values map to the same output.
![image](https://hackmd.io/_uploads/BJW9hecST.png)
source: https://www.youtube.com/watch?v=TLc6u8jwt7M&ab_channel=NeilRhodes

Factors
- Architecture
- Bad hyperparameters
- Training data quality
- Discriminator is too strong relative to the generator
:::

Solutions
- MSGAN: mode collapse
- [22]: vanishing gradient problem caused by easy negative samples
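The patch-level contrastive idea behind CUT can be made concrete with a PatchNCE-style loss: for each translated-image patch (the query), the corresponding source patch is the positive and other patches from the same source image are the negatives. A minimal NumPy sketch, with my own function name and `tau=0.07` as the temperature used in the CUT paper:

```python
import numpy as np

def patch_nce_loss(query, positive, negatives, tau=0.07):
    """PatchNCE-style loss for one query patch feature (CUT sketch).

    query     : (d,) feature of a translated-image patch
    positive  : (d,) feature of the corresponding source-image patch
    negatives : (n, d) features of other source patches (internal negatives)
    tau       : temperature (0.07 in the CUT paper)
    """
    def norm(v):
        return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

    q, p, negs = norm(query), norm(positive), norm(negatives)
    # cosine similarities: positive first, then all negatives
    logits = np.concatenate(([q @ p], negs @ q)) / tau
    # cross-entropy with the positive as the correct class maximizes the
    # mutual information between corresponding source/translated patches
    logits -= logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])
```

Because the negatives come from within the same image, the loss exploits intra-image relationships and needs no second (cycle) generator, which is what makes the translation one-sided.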
:::success
Training instability: due to the high-dimensional, non-convex search space, training may not converge, which limits the diversity requirement.
:::

Solutions
- LSGAN loss
- for non-overlapping distributions, the Wasserstein distance metric

### Imbalanced or limited data
Domain Bridge for Unpaired Image-to-Image Translation and Unsupervised Domain Adaptation [Note](https://hackmd.io/Yguj-oYVT4KroJADxfjvAg)

### Metrics
Fréchet Inception Distance (FID): measures fidelity
Learned Perceptual Image Patch Similarity (LPIPS): SSIM-type metrics fail when images are overly smooth, which motivates LPIPS
![image](https://hackmd.io/_uploads/BydqqFiSp.png)

### Large models

### Validation
Lack of validation studies, mainly in the industrial and medical fields
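The LSGAN loss listed among the stability solutions above replaces the vanilla GAN's sigmoid cross-entropy with a least-squares objective. A minimal NumPy sketch with the standard label choice a=0 (fake), b=1 (real), c=1 (generator target); the function name is mine:

```python
import numpy as np

def lsgan_losses(d_real, d_fake):
    """LSGAN least-squares objectives for discriminator and generator.

    Unlike sigmoid cross-entropy, the quadratic penalty still produces
    gradients for samples classified correctly but lying far from the
    real-data manifold, which mitigates vanishing gradients and
    stabilizes training.
    """
    d_loss = 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)
    g_loss = 0.5 * np.mean((d_fake - 1.0) ** 2)
    return d_loss, g_loss
```

When even least-squares training fails because the real and generated distributions do not overlap, the Wasserstein distance is the usual next step, since it stays finite and informative for disjoint supports.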