# **Diffusion Model Papers**

## **What is a diffusion model**

A diffusion model is a type of generative model that learns to generate data by reversing a gradual noising process. During training, a forward diffusion process repeatedly adds noise to the data until almost no structure remains; the model learns the reverse process, which removes that noise step by step. New data is then generated by starting from a pure noise vector and gradually denoising it into a sample.

Diffusion models are trained using a technique called variational inference. Variational inference is a way of finding the parameters of a generative model that best explain a given dataset; in the case of diffusion models, the goal is to find the parameters of the reverse (denoising) process that best explain the data under the forward noising process.

Diffusion models have been shown to be effective at generating high-quality images, text, and other types of data. They are particularly well suited to data that is high-dimensional and complex, such as natural images.

Here are some of the advantages of diffusion models:

* They can generate high-quality data.
* Their training objective is stable compared with adversarial approaches.
* They can be used to generate data from a variety of domains.

Here are some of the disadvantages of diffusion models:

* They can be slow and expensive to train.
* They can be sensitive to the choice of hyperparameters.
* Generating data is computationally expensive, since sampling requires many denoising steps.

Overall, diffusion models are a powerful tool for generating data. They are still under active development, but they have the potential to change the way we create and interact with data.

Here are some of the popular diffusion models:

* Denoising Diffusion Probabilistic Models (DDPM)
* Stable Diffusion
* Imagen
* DALL-E 2

These models have been used to generate a variety of realistic and creative images and other types of data.

## **Paper survey of diffusion models on segmentation**

## **Few-shot or Label-Efficient Semantic Segmentation**

### **LABEL-EFFICIENT SEMANTIC SEGMENTATION WITH DIFFUSION MODELS** [(PDF)](https://arxiv.org/pdf/2112.03126.pdf)

![](https://hackmd.io/_uploads/S1IhcQmT3.png)

* Motivation: Propose a simple semantic segmentation method that exploits the intermediate representations of a pre-trained diffusion model and works successfully even when only a few labeled images are provided.
* Method: Add noise to the input image, run it through a pre-trained DDPM, extract pixel-level features from the intermediate decoder blocks at selected timesteps, and train a per-pixel classifier on the few labeled images (see the sketch below). Proper choices of timesteps and blocks give higher and more stable performance; too few or too many will decrease it.
* Restriction:
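To make the method concrete, here is a minimal sketch of the feature-extraction idea (not the authors' code). It assumes the Hugging Face `diffusers` API (`UNet2DModel`, `DDPMScheduler`); the `google/ddpm-cat-256` checkpoint, the timestep set, the choice of decoder blocks, the 21-class setup, and the random `image`/`label` tensors are all illustrative placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

# Load a pre-trained unconditional DDPM (checkpoint name is illustrative).
repo = "google/ddpm-cat-256"
unet = UNet2DModel.from_pretrained(repo).eval()
scheduler = DDPMScheduler.from_pretrained(repo)

# Capture activations of some decoder ("up") blocks with forward hooks.
# Which blocks to use is a hyperparameter; middle blocks tend to work best.
activations = []
def save_activation(_module, _inputs, output):
    activations.append(output[0] if isinstance(output, tuple) else output)

for block in unet.up_blocks[1:4]:
    block.register_forward_hook(save_activation)

@torch.no_grad()
def pixel_features(image, timesteps=(50, 150, 250)):
    """Per-pixel features: decoder activations at several noise levels,
    upsampled to the input resolution and concatenated channel-wise."""
    feats = []
    for t in timesteps:
        activations.clear()
        t_batch = torch.full((image.shape[0],), t, dtype=torch.long)
        noise = torch.randn_like(image)
        noisy = scheduler.add_noise(image, noise, t_batch)  # forward q(x_t | x_0)
        unet(noisy, t_batch)                                # hooks fill `activations`
        feats += [F.interpolate(a, size=image.shape[-2:], mode="bilinear")
                  for a in activations]
    return torch.cat(feats, dim=1)  # (B, C_total, H, W)

# Stand-ins for a handful of labeled images (21 classes, PASCAL-style).
image = torch.randn(1, 3, 256, 256)
label = torch.randint(0, 21, (1, 256, 256))

feats = pixel_features(image)
classifier = nn.Sequential(                  # 1x1 convs act as a per-pixel MLP
    nn.Conv2d(feats.shape[1], 128, 1), nn.ReLU(), nn.Conv2d(128, 21, 1))
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

for _ in range(100):  # only the classifier is trained; the DDPM stays frozen
    optimizer.zero_grad()
    loss = F.cross_entropy(classifier(feats), label)
    loss.backward()
    optimizer.step()
```

The paper actually trains an ensemble of MLPs on these per-pixel features; a single 1×1-conv classifier is used here only to keep the sketch short.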
## **Semi-supervised**

### **SEMI-SUPERVISED SEMANTIC SEGMENTATION OF CELL NUCLEI VIA DIFFUSION-BASED LARGE-SCALE PRE-TRAINING AND COLLABORATIVE LEARNING** [(PDF)](https://arxiv.org/pdf/2308.04578.pdf)

![](https://hackmd.io/_uploads/SkFq5dO63.png)

* Motivation: Supervised deep learning models for cell nuclei segmentation demand a large amount of pixel-level annotation by experts, which is labor-intensive and error-prone.
* Method: The new point for semi-supervised learning is to use unsupervised large-scale pre-training on unlabeled data as an alternative framework.
    1. Diffusion-based large-scale pre-training: A diffusion model is pre-trained on a large-scale unlabeled dataset, learning to represent the semantic information in the images in a latent space.
    2. Semantic feature aggregation: A transformer-based decoder aggregates the semantic features from the latent space. The decoder is trained on the small amount of labeled data.
    3. Collaborative learning: This is the semi-supervised part of the architecture. A collaborative learning framework further improves segmentation performance: the diffusion model and the supervised segmentation model are trained together and iteratively refine each other's predictions.

### **Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels** [(PDF)](https://arxiv.org/pdf/2302.10586.pdf)

![](https://hackmd.io/_uploads/HkiQ1VV1a.png)

## **Self-supervised**

## **Unsupervised**

### **DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery** [(PDF)](https://arxiv.org/pdf/2303.09813.pdf)

![](https://hackmd.io/_uploads/rJzL3al6n.png)

* Motivation: Aims to exploit pixel-level visual knowledge from pre-trained diffusion generation models for downstream discriminative tasks.
* Method:
    1. Synthesis stage: Use a model pre-trained on large datasets (here **[Stable Diffusion](https://arxiv.org/pdf/2112.10752.pdf)**) and apply its cross-attention and self-attention maps to create rough masks (a minimal sketch follows this section).
    2. Exploitation stage: Refine the rough masks on real images by inverting each image back to a noisy latent, combining it with the class predicted by **[CLIP](https://openai.com/research/clip)**, and feeding the pair into the diffusion model. The output of each block is sent into a segmentation decoder to generate the refined masks.
* Restriction:
    1. The proposed method requires a dataset of manually segmented images to train the conditional diffusion model. This can be a limiting factor, since it is difficult and time-consuming to obtain a large enough dataset of manually segmented images.
    2. The proposed method is still under development and could be improved in several ways. For example, the authors could improve the performance of the conditional diffusion model, or develop methods for incorporating more types of prior knowledge into it.
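To make the synthesis stage concrete, here is a minimal toy sketch of how cross-attention and self-attention maps turn into a rough mask. It is plain PyTorch with random tensors standing in for Stable Diffusion's UNet features and CLIP text embeddings; the token index, the 0.5 threshold, and the single-head, single-timestep setup are simplifications (the actual pipeline aggregates maps over heads, layers, and timesteps):

```python
import torch
import torch.nn.functional as F

# Toy shapes: a 16x16 latent grid (256 spatial tokens), 77 text tokens, dim 64.
H = W = 16
d = 64
pixel_feats = torch.randn(H * W, d)  # stand-in for UNet spatial features
text_feats = torch.randn(77, d)      # stand-in for CLIP text embeddings
obj_token = 5                        # index of the object's word token (assumed)

# Cross-attention: each spatial location attends over the text tokens.
cross_attn = torch.softmax(pixel_feats @ text_feats.T / d**0.5, dim=-1)  # (256, 77)

# The attention received by the object's token, viewed on the spatial grid,
# is a rough localization map for that object.
rough_map = cross_attn[:, obj_token]                                     # (256,)

# Self-attention propagates the map across visually similar pixels,
# which sharpens object boundaries.
self_attn = torch.softmax(pixel_feats @ pixel_feats.T / d**0.5, dim=-1)  # (256, 256)
refined_map = self_attn @ rough_map                                      # (256,)

# Upsample to image resolution, normalize to [0, 1], and threshold.
heatmap = F.interpolate(refined_map.reshape(1, 1, H, W), size=(512, 512),
                        mode="bilinear")
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())
rough_mask = (heatmap > 0.5).float()
```

In the exploitation stage, these rough masks, together with the CLIP-predicted classes, then supervise the segmentation decoder attached to the diffusion model's block outputs.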