{%hackmd sMV2zv-CTsuIqnpb0hZLmA %}
# A Novel Diffusion Model Based Approach for Mask Optimization (MO)
###### tags: `NTUST` `EE` `EDA Lab` `Review Paper`
<p><span style="background:#9055A2; color:white;"> Using a diffusion model to do Mask Optimization (MO) </span></p>

## 0 ABSTRACT
### Problem of MO
- Mode collapse
- Diminished performance
    - particularly with high-resolution layouts

<p><span style="background:#9055A2; color:white;"> Problem: MO models suffer from mode collapse or degraded performance (especially on high-resolution layouts) </span></p>

### Our techniques
- Lithography-guided loss function
- Denoising sampling algorithm

<p><span style="background:#9055A2; color:white;"> Techniques: a lithography-guided loss function & a guided denoising sampling algorithm </span></p>

### Results
- Promising printability
- Lower computational time

<p><span style="background:#9055A2; color:white;"> Results: correct patterns & less computation time </span></p>

# 1 INTRODUCTION
## 1.1 Mask Optimization (MO)
- goal: make the <font color="#A593E0">wafer image</font> = <font color="#8CD790">design image</font>
- ![upload_a5c6d42960b857c60d137d8c2530f2](https://hackmd.io/_uploads/B1eeB-1Qwp.png =70%x)

:::spoiler Original text (you can expand it, but not recommended XD)
- As the technology nodes continually decrease, the minimum feature size also becomes substantially smaller, and thus issues like optical diffraction and proximity effects are more and more non-negligible for lithography [1].
:::

> Because everything keeps shrinking,
> optical diffraction and proximity effects must be taken into account.

## 1.2 Previous MO
1. model-based
:::spoiler Original text
- the edges of the initial mask are usually divided into segments, which are moved iteratively under the guidance of the mathematical models.
:::
> Split the edges into segments and feed them to a mathematical model.

:::spoiler Reference
- Kuang et al. [2] proposed a robust mask optimization algorithm to jointly optimize the process window and mask printability.
- Su et al. [3] proposed an effective process variation-aware mask optimization framework that optimizes EPE and PV bands simultaneously.
- Matsunawa et al. [4] provided an approach for solving the MO problem based on the Bayesian inference technology.
:::
2. inverse lithography technique (ILT)-based
:::spoiler Original text
- MO is regarded as an inverse imaging problem that is optimized pixel by pixel.
:::
> Cast as a pixel-wise inverse imaging problem.

:::spoiler Reference
- Gao et al. [5] introduced the method called MOSAIC, which formalizes the EPE violation as a sigmoid function, and derived the closed form of its gradient for the optimization.
- Ma et al. [6] developed a gradient-based algorithm to optimize the layout decomposition and mask optimization simultaneously.
:::
3. machine learning techniques
:::spoiler Original text
- a representative technical roadmap comes from the seminal generative adversarial networks (GANs), such as [7, 9]. GAN [11] is a popular generative model for image generation, where the key idea is to train a generator and a discriminator by using the adversarial game. However, a major obstacle for applying GANs is called "mode collapse" [12], which is regarded as a central issue of GANs that seriously affects its pattern generation performance in practice.
:::
> The drawback of GANs: mode collapse.

:::spoiler Original text
- Mode collapse commonly happens when training GANs; roughly speaking, it refers to the phenomenon that the generator network can only produce limited sample diversity and fail to cover all the patterns of the data distribution. Though a number of training methods have been proposed to tackle mode collapse, it still cannot be well avoided in practice [13].
:::
> Mode collapse:
>> The generator can only produce limited sample diversity and fails to cover all the modes of the data distribution.

:::spoiler Reference
- Yang et al. [7] developed the GAN-OPC model to achieve better mask optimization performance, and the training process can be simplified by pre-training.
- Jiang et al. [8] designed a machine learning-based mask optimization acceleration model, and also proposed a mask printability evaluation framework for lithography-related applications.
- Chen et al. [9] presented a MO system called DAMO with a high-resolution feature extractor DCGAN-HD.
- Jiang et al. [10] proposed Neural-ILT, which is an end-to-end neural network that attempts to completely replace the traditional ILT process.
:::

<p><span style="background:#9055A2; color:white;"> 4. diffusion model (this paper) </span></p>

:::spoiler Original text
- cutting-edge generative model, called "diffusion model" [14], has revealed significantly better performance in many pattern generation tasks, especially for alleviating the mode collapse issue.
:::

<p><span style="background:#9055A2; color:white;"> A cutting-edge generative model that can alleviate mode collapse </span></p>

:::spoiler Original text
- In fact, the realization of diffusion model for MO incurs several technical challenges.
1. how to control the randomness in the generative process, especially for high-resolution mask design
2. how to take account of the physical constraints in the training process
3. To further guarantee the computational efficiency of the diffusion model, we also propose a guided sampling method which uses low pass filter to guide the denoising process.
:::

> This paper's framework is called "DDiff_MO".
> Challenges to face:
>> 1. How to control the randomness of the generation process
> A: the guided sampling method (Section 3.2), which constrains the randomness of the denoising process
>> 2. How to incorporate the physical constraints
> A: lithography-guided loss function
>> 3. How to guarantee computational efficiency
> A: guided sampling method

:::spoiler Reference
- Wang et al. [15] recently developed a diffusion model based method to generate layout pattern.
:::
> Previously, diffusion models were only used for layout generation.

# 2 PRELIMINARIES
## 2.0 Definitions
- M: mask
- Zt: <font color="#8CD790">target layout</font>
- Z: <font color="#A593E0">wafer image</font>
- I: aerial image
- the image is divided into 𝑁×𝑁 kernels
- "𝑚" & "𝑛": column & row

## 2.1 MO
### 2.1.1 Metrics
1. for printability

<p><span style="background:#9055A2; color:white;"> Edge Placement Error (EPE) [16] </span></p>

- the **distance** between the <font color="#8CD790">target edge</font> and the <font color="#A593E0">lithographic contour</font>

<p><span style="background:#9055A2; color:white;"> Mean Square Error (MSE) </span></p>

- the pixel-wise difference between the <font color="#A593E0">wafer image</font> and the <font color="#8CD790">target layout</font>
- $MSE = \lVert Z - Z_t \rVert_F^2$
> Take the Frobenius norm (a matrix norm) of the difference and square it.

2. for robustness

<p><span style="background:#9055A2; color:white;"> Process Variation Band (PVB) [16] </span></p>

- denotes the **total area** encompassed by the <font color="#A593E0">lithography simulation contours</font>

### 2.1.2 Formal Definition
- The Hopkins theory [17] is impractical here.
:::spoiler Notes on [17]
- the "Hopkins theory" of the partially coherence imaging system [17]
- a widely used simulation and analysis tool for the lithography process.
- the method is complex and usually takes high computational complexity.
:::
- Use the approximation from the book [18] instead:
- $$I(m, n) = \sum_{k=1}^{N^2} \alpha_k \, |h_k(m, n) \otimes M(m, n)|^2$$
    - $h_k$: the k-th lithography kernel
    - $\otimes$: convolution operation
    - $\alpha_k$: kernel-related coefficient
> Convolve each kernel with the mask, take the squared magnitude,
> multiply by **each kernel's coefficient**, and **sum**.
- $$Z(m,n)=\begin{cases}0, & \text{if $I(m, n) < I_0$} \\ 1, & \text{otherwise} \end{cases}$$
    - $I_0$ is a given fixed threshold
> A pixel is marked as printed when I reaches the threshold.
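To make the formal definition concrete, here is a minimal NumPy sketch of the thresholded aerial-image model above. The Gaussian kernels, coefficients, and helper names (`aerial_image`, `resist_model`, `mse`) are invented for illustration; the real lithography kernels come from [18] and the ICCAD 2013 benchmark [16].

```python
import numpy as np
from scipy.signal import fftconvolve

def aerial_image(mask, kernels, coeffs):
    """I(m, n) = sum_k alpha_k * |h_k (x) M|^2  (equation in Sec. 2.1.2)."""
    I = np.zeros_like(mask, dtype=float)
    for h, alpha in zip(kernels, coeffs):
        conv = fftconvolve(mask, h, mode="same")   # h_k (x) M
        I += alpha * np.abs(conv) ** 2             # alpha_k * |...|^2
    return I

def resist_model(I, threshold=0.3):
    """Z(m, n) = 1 where the aerial intensity reaches the threshold I0, else 0."""
    return (I >= threshold).astype(np.uint8)

def mse(Z, Z_target):
    """L2 / MSE metric: squared Frobenius norm of the pixel-wise difference."""
    return float(np.sum((Z.astype(float) - Z_target.astype(float)) ** 2))

# Toy example with a random binary mask and two hypothetical Gaussian kernels.
rng = np.random.default_rng(0)
mask = (rng.random((64, 64)) > 0.7).astype(float)
xx, yy = np.meshgrid(np.arange(-7, 8), np.arange(-7, 8))
kernels = [np.exp(-(xx**2 + yy**2) / (2 * s**2)) for s in (2.0, 4.0)]
kernels = [k / k.sum() for k in kernels]
coeffs = [0.8, 0.2]   # alpha_k values, invented for illustration

Z = resist_model(aerial_image(mask, kernels, coeffs))
print("MSE vs. the mask itself:", mse(Z, mask))
```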
## 2.2 Diffusion Model
- ![螢幕擷取畫面 2023-12-22 191000](https://hackmd.io/_uploads/rJolQxQPT.png =80%x)
- <p><span style="background:#9055A2; color:white;"> DDPM [14]: diffuse (add Gaussian noise), then recover (remove it) </span></p>
- <p><span style="background:#9055A2; color:white;"> Random patterns are obtained from random noise </span></p>
- <p><span style="background:#9055A2; color:white;"> DDIB [19]: dual diffusion </span></p>

> Haven't finished reading the basic structure and training process yet.

> **The Framework of Diffusion Model.** "Denoising Diffusion Probabilistic Models (DDPM)" [14] is the pioneering work of diffusion models, and we take it as the example to illustrate. The diffusion model is inspired by physical diffusion processes, where it contains a forward process (diffusion process) and a reverse process (denoising process). Both of them are defined as a Markov chain.
>
> In the forward process, we gradually add Gaussian noise to the data according to a variance schedule $\beta_1, \ldots, \beta_T$ until the image becomes pure Gaussian noise; "$T$" denotes the maximum diffusion step number. The Gaussian noise of each single step can be calculated by:
> $$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) := \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{E}\right)$$
> where $\mathbf{x}_t$ is the set of latent variables of the t-th step in the same sample space as $\mathbf{x}_0$ (the original image), and $\mathbf{E}$ is the identity matrix.
>
> In the reverse process, we gradually subtract the Gaussian noise from a pure Gaussian noise until the data is recovered in the original data space. The Gaussian noise of each single step can be calculated by:
> $$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) := \mathcal{N}\!\left(\mathbf{x}_{t-1};\ \mu_\theta(\mathbf{x}_t, t),\ \Sigma_\theta(\mathbf{x}_t, t)\right)$$
> where the parameter set $\theta$ is learned by the diffusion model to fit the data distribution $q(x_0)$. The model predicts the noise at each step in the reverse process, so as to enable the reverse process to generate an image in the source domain from pure Gaussian noise.
>
> The plain diffusion model is not suitable for solving mask optimization, because we need to generate the corresponding mask from a given design image, rather than generating a random mask from random noise.

> **The conditional diffusion model DDIB.** The recently proposed "Dual Diffusion Implicit Bridges (DDIB)" [19] is a conditional diffusion model that in particular can be applied to the image-to-image translation problem. It consists of two diffusion models $D_1$ and $D_2$ that are trained independently in the source image domain X and the target image domain Y. In the test process, DDIB reverses the denoising process of the source diffusion model $D_1$, so that it receives the source image $X \in$ X and outputs the latent encoding $Z$ of the source image $X$. Then the target diffusion model $D_2$ receives the latent encoding $Z$ and generates the target image $Y \in$ Y corresponding to the source image $X$ by the reverse process. The high-level idea for the process is shown in Figure 2.
>
> *Figure 2: Test process of DDIB. X, Z, and Y represent the source image, the latent encoding, and the target image respectively.*
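A minimal PyTorch sketch of the two DDPM formulas above: the single forward step $q(x_t \mid x_{t-1})$ and the standard closed-form jump to step $t$ that training algorithms like Algorithm 1 typically use to compute the t-th step latent variable. The linear schedule, step count, and the DDIB pseudocode comments are illustrative assumptions, not the paper's actual configuration.

```python
import torch

T = 1000                                         # maximum diffusion step number
betas = torch.linspace(1e-4, 0.02, T)            # variance schedule beta_1, ..., beta_T
alphas_bar = torch.cumprod(1.0 - betas, dim=0)   # abar_t = prod_{s<=t} (1 - beta_s)

def forward_step(x_prev, t):
    """One forward step: q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * E)."""
    noise = torch.randn_like(x_prev)
    return torch.sqrt(1.0 - betas[t]) * x_prev + torch.sqrt(betas[t]) * noise

def noise_to_step(x0, t):
    """Closed-form jump from x_0 to x_t: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(x0)
    return torch.sqrt(alphas_bar[t]) * x0 + torch.sqrt(1.0 - alphas_bar[t]) * eps, eps

# DDIB test pipeline (Figure 2), at the level of pseudocode:
#   z = reverse_sampling(D1, y)   # run the source model's deterministic sampler backwards
#   x = sampling(D2, z)           # decode the latent encoding with the target model
x0 = torch.rand(1, 1, 32, 32)                    # a toy "image"
x_noisy, _ = noise_to_step(x0, t=500)
print(x_noisy.shape)
```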
# 3 OUR DDIFF_MO FRAMEWORK
- based on the dual diffusion model DDIB.

### 3.0.1 Major challenge
- the generated mask must be stylistically close to the target layout
- and satisfy the physical constraints

### 3.0.2 Our high-level idea
> Not fully organized yet.
- modify the model design for training
- and then propose a new denoising sampling algorithm for the mask generation process
- A key idea in our model design is to introduce a lithography-guided loss function, which enables the model to not only learn the mask style but also grasp the underlying physical constraints.
- In the denoising sampling algorithm, we leverage a low pass filter, which is a specially designed two-layer network, to guide the sampling process.
- The guided sampling algorithm can effectively retain the crucial positional information that is essential for accurate mask representation.

## 3.1 Model Design (Overview of our framework)
- ![image](https://hackmd.io/_uploads/HkYTQBCF6.png)
    - the target diffusion model (classic)
- ![image](https://hackmd.io/_uploads/SkY-4HAKp.png)
    - the loss for training, where $\epsilon$ is a Gaussian noise, $x_0$ is the training data, $\epsilon_\theta$ is the module for predicting the noise, and $x_\theta$ is the module for predicting $x_0$.
- <p><span style="background:#9055A2; color:white;"> To incorporate the physical constraint, the loss function above is modified as follows </span></p>
- Thus we design a lithography-guided loss function that incorporates the lithography result with the $l_2$-norm error to enhance the generation quality of the mask image.
- The design image and mask image pair is represented by $(y_0, x_0)$ in our loss.
- We denote the lithography simulator as $S$, where a lithography simulator is a simulation model that outputs the corresponding design image for a given mask image, using the lithography engine from the ICCAD 2013 CAD Contest benchmark [16].
- Also let $\lambda > 0$ be the coefficient of the lithography error term. The new loss function can be expressed as Equation (3):
- ![image](https://hackmd.io/_uploads/r19QVBCKT.png)
- The detailed training process is described in Algorithm 1 (a minimal code sketch follows after the steps below).
- ![image](https://hackmd.io/_uploads/BJY44B0Kp.png)
1. In each iteration, we first randomly sample a pair of data points $(y_0, x_0)$.
2. Then we sample a step $t$ from $[1, T]$, sample the noise from $\mathcal{N}(0, 1)$, and compute the t-th step latent variable $x_t$.
3. Finally, we use our proposed loss function (3) to compute the loss and update the model parameters with the gradient descent method.
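Since Equation (3) and Algorithm 1 appear above only as images, the following is a hedged PyTorch sketch of what one training iteration looks like under the stated description: the usual noise-prediction error plus a λ-weighted lithography term comparing $S(\hat{x}_0)$ with the design image $y_0$. The `model`, `litho_sim`, and the exact weighting are placeholders and assumptions, not the paper's exact formulation.

```python
import torch

def train_step(model, litho_sim, optimizer, y0, x0, alphas_bar, lam=0.1, T=1000):
    """One Algorithm-1-style iteration (sketch): sample t, noise x0 to x_t, then combine
    the standard noise-prediction error with a lambda-weighted lithography error term."""
    t = torch.randint(0, T, (1,)).item()
    eps = torch.randn_like(x0)
    x_t = torch.sqrt(alphas_bar[t]) * x0 + torch.sqrt(1.0 - alphas_bar[t]) * eps

    eps_hat, x0_hat = model(x_t, t, y0)              # the eps_theta and x_theta modules
    loss = ((eps - eps_hat) ** 2).mean() \
         + lam * ((litho_sim(x0_hat) - y0) ** 2).mean()   # lithography-guided term

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                 # gradient-descent update
    return loss.item()

# Toy stand-ins, just to show the call shape (not the paper's U-Net or litho engine):
net = torch.nn.Conv2d(1, 1, 3, padding=1)
model = lambda x, t, y: (net(x), torch.sigmoid(net(x)))   # predicts (eps, x0)
litho_sim = lambda m: torch.sigmoid(10 * (m - 0.5))       # crude differentiable "S"
abar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.02, 1000), dim=0)
opt = torch.optim.SGD(net.parameters(), lr=1e-3)
y0, x0 = torch.rand(1, 1, 32, 32), torch.rand(1, 1, 32, 32)
print(train_step(model, litho_sim, opt, y0, x0, abar))
```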
## 3.2 Mask Denoising Method (elaborate on the technical parts)
> Haven't fully understood this yet.

> Now we turn our attention to the denoising process, which directly determines our mask generation result. We first revisit the classical diffusion method [20], where the denoising process is typically iterated based on Equation (4):
> $$x_{t-1} = \sqrt{\alpha_{t-1}} \left( \frac{x_t - \sqrt{1-\alpha_t}\,\epsilon_\theta^{(t)}(x_t)}{\sqrt{\alpha_t}} \right) + \sqrt{1-\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta^{(t)}(x_t) + \sigma_t \epsilon_t \tag{4}$$
> where $\alpha_t$ and $\sigma_t$ are prespecified constants, $\epsilon_\theta^{(t)}$ is the module for predicting the noise in the next step, and $\epsilon_t$ is a Gaussian noise. In this equation, the first two terms are both deterministic and the third term is the only random part. So if $\sigma_t$ is set to 0, the entire iteration process becomes deterministic. That is, we can obtain the design image $y_0$ from Gaussian noise through this denoising process by the source diffusion model. Similarly, we can also obtain the Gaussian noise associated with the design image $y_0$ through the reverse process of the denoising procedure. We refer to this inverse procedure derived by (4) as the "reverse sampling" process. After the reverse sampling, which relies on the source diffusion model $D_1$ trained on the design image domain, we obtain the latent encoding $z_0$ of the design image $y_0$.

> However, despite obtaining the latent encoding of $y_0$, it is still challenging to compute the corresponding data $x_0$ in the mask image domain. The reason is that the diffusion model introduces too much randomness in the denoising process, where each sampling step needs to add a random Gaussian noise to the data. As a consequence, without effective control, the model often fails to generate high-quality masks $x_0$ from a given target design image $y_0$. This motivates us to design a guided sampling method to constrain the randomness in the denoising process.

> **Guided sampling method.** Our idea is based on the simple observation that the design image and mask image pair $(y_0, x_0)$ share the same positional information. That is, the patterns in $x_0$ and their corresponding patterns in $y_0$ should be located at the same positions in the two images. Thus we expect to keep the positional information unchanged during the generation process. According to the analysis in the recent work [21], the positional information of an image in a diffusion model often reveals a low-frequency feature, which means that it changes slowly when noise is added. So we employ a low pass filter, which is a two-layer network for capturing low-frequency features, to extract the positional information and consequently guide the denoising process. During the process, we use the gradient descent algorithm to optimize the difference between the positional information of the newly generated mask image and that of the design image. The difference can be calculated by the cosine similarity.

> The algorithm is described in Algorithm 2 and the data flow is shown in Figure 3(b). We use $x_t$ to represent the latent variable at the t-th step and $y_0$ to represent the design image. At the t-th step of sampling, the algorithm begins by obtaining $x'_{t-1}$ from $x_t$ through the diffusion sampling process. Then, it uses a low-pass filter to extract the positional information $P_1$ and $P_2$ from $y_0$ and $x'_{t-1}$, respectively. Next, the algorithm computes the similarity between $P_1$ and $P_2$, and updates $x'_{t-1}$ by using the gradient ascent method to obtain $x_{t-1}$. We repeat this process until the final result $x_0$ is achieved.
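A rough PyTorch sketch of one guided correction step as described above. Algorithm 2 itself is shown only as an image, so this is an assumption-laden illustration: a fixed Gaussian blur stands in for the paper's learned two-layer low-pass network, and the step size and single gradient-ascent update are invented for the example.

```python
import torch
import torch.nn.functional as F

def lowpass(img, k=9, sigma=3.0):
    """Stand-in low-pass filter (Gaussian blur); the paper uses a learned 2-layer network."""
    ax = torch.arange(k, dtype=torch.float32) - k // 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    kernel = torch.outer(g, g)
    kernel = (kernel / kernel.sum()).view(1, 1, k, k)
    return F.conv2d(img, kernel, padding=k // 2)

def guided_correction(x_prev, y0, step=0.1):
    """Pull the low-frequency (positional) content of x'_{t-1} toward that of the design
    image y0: maximize their cosine similarity by one gradient-ascent step on x'_{t-1}."""
    x = x_prev.detach().requires_grad_(True)
    p1 = lowpass(y0).flatten()           # positional information P1 (from y0)
    p2 = lowpass(x).flatten()            # positional information P2 (from x'_{t-1})
    sim = F.cosine_similarity(p1, p2, dim=0)
    sim.backward()
    return (x + step * x.grad).detach()  # gradient ascent on the similarity

# Toy usage on random 4-D tensors shaped like single-channel images:
y0 = torch.rand(1, 1, 64, 64)
x_prev = torch.randn(1, 1, 64, 64)
print(guided_correction(x_prev, y0).shape)
```

In the full procedure, a correction of this kind would be applied to $x'_{t-1}$ after every diffusion sampling step until $x_0$ is reached.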
# 4 EXPERIMENTAL RESULTS
### 4.0.1 Datasets
- the combination of the datasets from [7, 22].
:::spoiler Details of [7, 22]
- Yang et al. [7]
    - from the ICCAD 2013 CAD Contest benchmark [16]
    - It contains 4000 training instances.
- Jiang et al. [22] with [23]
    - generative learning-based pattern generation framework
    - It also contains 4000 training instances.
:::

### 4.0.2 Implementation details
- NVIDIA GeForce RTX 3090 with the PyTorch library
- original images: 512 × 512
- diffusion steps: 1000
- iteration number: 12000
- batch size: 1
- learning rate: 0.0001
- #ResNet blocks: 3
- reverse sample steps of the source diffusion model: 500
- sample steps of the target diffusion model: 500

## 4.1 MO Performance
- ![未命名-1](https://hackmd.io/_uploads/r1me-_MF6.png)
:::spoiler Original text
- After obtaining the initial mask image from the diffusion model, we use an ILT engine to refine the performance of the mask image.
:::
<p><span style="background:#9055A2; color:white;">DDiff_MO produces the <font color="#8CD790">original mask image</font>, which is then refined by ILT into the <font color="#A593E0">final mask</font></span></p>

### 4.1.1 Methods
- Conventional ILT [5]
:::spoiler ILT: MOSAIC algorithm [25]
- Su Zheng, Yuzhe Ma, Binwu Zhu, Guojin Chen, Wenqian Zhao, Shuo Yin, Ziyang Yu, and Bei Yu. OpenILT: An open-source platform for inverse lithography technique research. https://github.com/OpenOPC/OpenILT/, 2023.
:::
- Machine learning techniques
:::spoiler Generative model based PGAN-OPC: [7]
- Haoyu Yang, Shuhe Li, Yuzhe Ma, Bei Yu, and Evangeline FY Young. GAN-OPC: Mask optimization with lithography-guided generative adversarial nets. In Proceedings of the 55th Annual Design Automation Conference, pages 1–6, 2018.
:::
:::spoiler Learning-based method Neural-ILT: [10]
- Bentian Jiang, Lixin Liu, Yuzhe Ma, Hang Zhang, Bei Yu, and Evangeline FY Young. Neural-ILT: Migrating ILT to neural networks for mask printability and complexity co-optimization. In Proceedings of the 39th International Conference on Computer-Aided Design, pages 1–9, 2020.
:::

### 4.1.2 Metrics (same as 2.1.1)
- "L2": mean square error (MSE)
> Why rename it to L2 and then explain it separately @@
- "PVB": process variation band
- "EPE": edge placement error

### 4.1.3 Results
- ILT #iterations: 40
- ![image](https://hackmd.io/_uploads/Sk54oc4Ka.png)
> Case 1 in [16] by our method
- ![image](https://hackmd.io/_uploads/SyH8j5Vt6.png)
:::spoiler Original text
- The results show that our method can achieve the lowest average values for 𝐿2 and 𝑃𝑉𝐵, and in particular outperforms the previous machine learning based approaches PGAN-OPC and Neural-ILT.
:::
<p><span style="background:#9055A2; color:white;">Result: achieves the lowest average L2 and PVB, outperforming the two ML-based methods</span></p>

## 4.2 Computational Efficiency
- The other two methods use a different ILT engine, so they are not included in this comparison.
:::spoiler Original text
- To illustrate the convergence curve accurately, we take "case 1" as an example for plotting.
- The curves in figure 6 show the convergence trajectories of ILT and ILT with DDiff_MO generation as the initial input.
:::
- ![image](https://hackmd.io/_uploads/ByciqcVt6.png)
> Also case 1
:::spoiler Original text
- The results shown in the figure reveal that using the mask image generated by DDiff_MO as the initial input can accelerate the convergence of the conventional ILT method, and meanwhile leads to slightly lower L2 loss.
:::
- [ILT on the DDiff_MO result] compared with [ILT on the original image]:
<p><span style="background:#9055A2; color:white;">accelerates ILT convergence and slightly lowers L2</span></p>
- ![image](https://hackmd.io/_uploads/rJiqtcEt6.png)
:::spoiler Original text
- Moreover, we test the whole runtime of the mask optimization process until convergence. For ILT, we test the entire iteration time.
- For DDiff_MO, we evaluate both the generation time and the iteration time.
- Due to its faster convergence speed, even though DDiff_MO requires extra time to generate the initial image, it still takes a shorter overall runtime.
- The average time to generate an image using the diffusion model is approximately 26 seconds, while ILT takes around 3 seconds per iteration.
- The experimental results demonstrate that our approach can accelerate the ILT method by around 10%.
:::
<p><span style="background:#9055A2; color:white;">{(DDiff_MO image generation) + [ILT on the DDiff_MO result]} is about 10% faster than [ILT on the original image]</span></p>

# 5 CONCLUSION
:::spoiler Original text
- For the future work, it is worth considering large-scale design images by taking advantage of the diffusion model with further acceleration algorithms.
:::
> Confirms that diffusion models can be used for MO.
> Future work: larger design images, plus further acceleration algorithms.

<p><span style="background:#9055A2; color:white;">Solves the mode collapse issue</span></p>

---
## Comments
- A. Overall score (1-5)
- B. Overall technical contents (1-5)
- B.1 originality, novelty (1-5)
- B.2 significance of results (1-5)
- B.3 readability (1-5)
- C. Overall presentation (1-5)
- C.1 English usage (1-5)
- C.2 clarity (1-5)
- C.3 adequacy of references (1-5)
- D. Confidence on your decision (1-5)
- E. Comments to authors:
- I would recommend standardizing the usage of terms ("MSE" in Section 2.1 and "L2" in Table 1) to avoid confusion.
- The description (Section 4.1 line 13) of each method listed in Table 1 can be reordered. - In Table 1, "Neural-ILT" is mistakenly written as "Neural-TLT".
{"title":"A Novel Diffusion Model Based Approach for Mask Optimization(MO)","description":"problem of MO","showTags":"true","contributors":"[{\"id\":\"e9fdc400-6f74-4ee8-9c15-5ae9aa4f4f42\",\"add\":31435,\"del\":6634}]"}
Expand menu