Accelerating Computational Lithography for Optical Proximity Correction [S71759]

Speaker: Fellow, Executive Vice President, Samsung Electronics

We will show how Samsung uses NVIDIA GPUs and the newly adopted NVIDIA cuLitho to accelerate its optical proximity correction (OPC) workloads. As lithography complexity in semiconductor manufacturing keeps increasing, GPU acceleration through cuLitho has enabled Samsung to achieve significant performance gains, improving the speed and efficiency of critical OPC flows.

Key takeaways:

* cuLitho integration: the recent adoption of NVIDIA's cuLitho library to further improve computational lithography performance
* GPU-accelerated OPC: how Samsung uses NVIDIA GPUs to accelerate OPC workloads
* Performance gains: significant improvements in the speed and efficiency of OPC flows from the combination of GPUs and cuLitho
* Scalability: the role of GPU-based solutions in meeting the growing complexity of semiconductor manufacturing
* Future outlook: potential further enhancements to computational lithography through continued GPU optimization

Session details:

* Topic: Simulation / Modeling / Design - Physics
* Industry: Hardware / Semiconductor
* Technical level: Technical - Advanced
* Intended audience: Business Executive
* All intended audience types: Research: Non-Academic
* Time: Saturday, March 22, 1:00 AM - 1:40 AM CST
* Slides: S71759_加速計算光刻用於光學鄰近校正.pdf

AI transcript

I'd like to introduce you to the speaker. Shanty is currently a fellow at Samsung Electronics, in the [unclear] division. He's in charge of optical proximity correction for Samsung memory and foundry products and of the in-house OPC software. He has 26 years of experience in the semiconductor industry, including 22 years at Intel Corporation, where he was a fellow in charge of OPC and later became the director of technology and device modeling. For today's talk, he will be presenting on accelerating computational lithography for optical proximity correction.

Thank you very much. Good morning, everybody! Thanks for the nice introduction.
I want to thank all of you for attending and staying until the last day of GTC. I feel honored to present the collaboration between my team at Samsung and NVIDIA, and to show how various technologies can accelerate economical semiconductor manufacturing in the area of optical proximity correction. As was introduced, I've been in semiconductor manufacturing R&D for almost 30 years. For the past 25 years, I've been a practitioner of optical proximity correction, since its inception. My background is in experimental physics and optics. I'm not a software expert and don't have much background in software algorithms or even hardware. But I think I can bring you a fresh perspective on how NVIDIA technology can bring significant changes to the world I work in.

Today, I'm going to give you a brief introduction to what optical proximity correction is. One of the purposes of this presentation is to raise awareness about the computational aspects of semiconductor manufacturing, particularly in terms of economic benefits, so that I can get more help in the future. In addition to expressing my appreciation for the collaboration that NVIDIA has provided to enable us, I'm going to explain why so much computing power is needed in OPC and why reducing turnaround time is important for OPC in semiconductor manufacturing.

I'm going to show you some results of our collaboration, and I'm going to finish the presentation with some future collaboration items that I'd like to ask you to think about.

This graph shows the feature size on the Y-axis for DRAM, Samsung's main product. As a function of time, there is literally an exponential decrease in the size of the features that we print on the wafer in order to enable higher and higher bit density. In a few years, DRAM manufacturing will be moving toward 3D stacking to continue the bit density scaling. A similar story goes for flash memory: the scaling has been driven by making the features on the wafer smaller. For flash memory, the conversion to 3D structures happened earlier than with other devices. The same goes for logic, where the size of the features, the transistor size that we print on the wafer, has been decreasing steadily and exponentially over the past 50 years. Currently, as Jensen said in his keynote, you can fit hundreds of billions of transistors in a single die.

The scaling of the device has been enabled by a large number of innovations in several areas. There are material innovations and also structural innovations like FinFET and other structures. But one of the key technologies that enables this scaling down of the device is the capability to print smaller features, and at the heart of that scaling stands something called photolithography.

Since we use light, you can think of a photolithography machine, shown here in a cutaway view, as essentially a very, very expensive scanner or copy machine. Basically, in the upper right, you put the circuit patterns that end up as hundreds of billions of transistors onto something called a photomask and place it on the object side of this imaging system, which is composed of very sophisticated lenses and mirrors that project the image from the photomask onto the wafer. Once the image is transferred onto the wafer, the wafer gets etched, material gets deposited, and it gets polished. Then it goes through additional semiconductor processing multiple times to produce the device you see today. One of the key parameters in this machine for making smaller patterns is the wavelength. This is an optical system, so it uses light.

The wavelength of the light is like the size of the brush. If your brush is small, then you can draw very fine features.
Likewise, if the wavelength of the machine is smaller, then you can draw very small features. That's one of the key knobs: the reduction of the wavelength is what we've been using to deliver smaller features on the wafer. This is a graph that I borrowed from my collaborator at NVIDIA, who is also a former colleague. It shows the feature size on the Y-axis as a function of time; we've been delivering devices with shrinking feature sizes as time goes along. On top of that, we are showing the wavelength of the lithography machine. It starts with 436 nm, then 365 nm, 248 nm, and 193 nm light, and finally 13 nm, which is called extreme ultraviolet (EUV) and is currently being employed in manufacturing. One thing you can notice is that starting from 248 nm, the feature size you want to print has been smaller than the wavelength of the light you are using. Basically, you are trying to draw very fine features using a brush of pretty constant size. The gap between the wavelength of the light, your brush, and the device size you're trying to print is what we call the sub-wavelength gap, and we've been operating in this sub-wavelength regime for almost 20 years. There's a lot of fundamental physics and chemistry that comes into play in semiconductor patterning.

To recap, we put the photomask pattern, the circuit patterns, on the scanner's object side, and that gets the image onto the wafer. Let's see what happens in this sub-wavelength regime. Let's say you have these yellow patterns.
Suppose these are some circuit patterns you want to print on the wafer. If you put this pattern on the mask as it is, you get somewhat similar features, but the isolated lines get thinner, and some of the islands don't even print. This is due to optical interference, because your wavelength, your brush, is larger than your feature size. It's very fundamental physics. You can't really manufacture a device with this level of optical image distortion. So what we do is something called optical proximity correction. What it does is basically go in and manipulate the shape of the mask. You can size up those isolated lines. You can size up those isolated islands.

You also make very fine adjustments along the line so that the final pattern on the wafer is printed to the right size at the right locations. Basically, we take the mask pattern, simulate what's going to happen on the wafer, and then make these adjustments iteratively until we get the right shape on the wafer. This is a highly iterative process. This OPC, which started more than 25 years ago, has now become a very critical part of semiconductor manufacturing. It is a very unique step in semiconductor manufacturing, one that is purely computational and requires a huge amount of compute.
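
The feedback loop described above (simulate what the mask prints, compare with the target, adjust the mask, repeat) can be sketched in a few lines of Python. This is a toy illustration and not Samsung's or cuLitho's method: the one-dimensional Gaussian-blur imaging model, the constant resist threshold, the 30 nm blur, the 40 nm target, the damping factor, and all function names are assumptions chosen only to make the loop concrete.

```python
import math


def aerial_intensity(x, mask_width, blur):
    """Intensity at position x for a single line of the given width,
    blurred by a Gaussian (an assumed stand-in for the optical system)."""
    s = blur * math.sqrt(2.0)
    return 0.5 * (math.erf((x + mask_width / 2) / s)
                  - math.erf((x - mask_width / 2) / s))


def printed_width(mask_width, blur=30.0, threshold=0.5):
    """Width of the region whose intensity exceeds the resist threshold."""
    if aerial_intensity(0.0, mask_width, blur) <= threshold:
        return 0.0  # the peak never reaches threshold: the line does not print
    # Intensity falls monotonically away from the center, so bisect
    # for the position where it crosses the threshold.
    lo, hi = 0.0, mask_width / 2 + 5 * blur
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if aerial_intensity(mid, mask_width, blur) > threshold:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo


def correct_mask(target_width, blur=30.0, threshold=0.5,
                 damping=0.5, tol=0.01, max_iter=200):
    """Adjust the mask width until the simulated print matches the target."""
    mask_width = target_width  # start from the drawn (uncorrected) pattern
    for iteration in range(1, max_iter + 1):
        error = target_width - printed_width(mask_width, blur, threshold)
        if abs(error) < tol:
            return mask_width, iteration
        mask_width += damping * error  # damped feedback step
    raise RuntimeError("correction did not converge")


if __name__ == "__main__":
    target = 40.0  # nm, hypothetical isolated line
    print("uncorrected print width:", printed_width(target))
    mask, iterations = correct_mask(target)
    print("corrected mask width:", round(mask, 2), "after", iterations, "iterations")
    print("print width after correction:", round(printed_width(mask), 2))
```

With these assumed numbers, the uncorrected 40 nm line does not print at all, and the loop settles on a mask line wider than the target, which mirrors the "size up the isolated patterns" behavior described in the talk. Production OPC does this for enormous numbers of edge segments in two dimensions with far richer models, which is where the compute demand comes from.
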

If you look at the mask shapes that we are generating, the top left is the traditional approach from when we started OPC: making some adjustments here and there. Now we've moved on to very complex shapes, where you can see some curvilinear shapes. You need to make sure the width of the pattern that you get matches the width of the pattern that you want. Also, at the corners of those patterns, you have to measure all these dimensions at all these angles, on very complex patterns. If you look at the green features, you see that there are big patterns with little tiny scribbles next to them. Those are called sub-resolution assist features (SRAFs). These little things don't print on the wafer, but they help diffract the light into the main feature to improve the performance of those main features. These are essential technologies we need to employ to ensure the patterns are printed with good quality on the wafer.

If you move to the right, you see this series of contact patterns. These are the little square holes that you want to print on the wafer. But in order to print those on the wafer, what you end up with on the mask is what's on the right.

The blue box is where you have a lot of curvilinear, finely sculpted features. If you look at the contact holes, the shape is not just a circle or an ellipse. It is a heavily designed, engineered shape that we have to deliver on the mask to get the right patterning. The ultimate complexity is something called inverse lithography: on the far right, you see these very wavy patterns on the photomask, but they end up printing a series of contact holes on the wafer. Simply put, there is no one-to-one correspondence between what is on the object plane, the mask, and what's on the wafer. So this pattern complexity has evolved in such a way that the photomask becomes a very complicated collection of object elements.

Of course, we have to have very good predictive models, because the accuracy we are dealing with is on the nanometer scale. We need to get pattern size and placement within 0.345 nanometers, so the edges need to be placed very accurately. This predictive model requires a whole bunch of fab data, because we need to compare the model against that data.
You also need to employ a lot of optical, physical, mechanical, and chemical algorithms, and some approximations, such as machine learning, to have good predictive models in this OPC. So OPC has evolved quite a bit, and it involves very sophisticated algorithms and modeling. The compute that we have to employ for this OPC development is significant.

The problem is that OPC development typically takes several weeks. After that, we have to tape out, and the mask making takes another several weeks. While the OPC is being developed, the fab, which costs tens of billions of dollars to build, is just waiting for you to finish. And if you get it wrong, if your OPC is incorrect, we might find out several weeks or even months later. Then all the wafers processed during that time with imperfect OPC are basically junk. So there's a lot of pressure on us, the practitioners of OPC, to deliver a very high-quality solution within a very short amount of time. That drives a huge compute capacity need for OPC. This graph shows Samsung's investment in compute capacity as a function of time in the past and our projection into the future. The graph includes CPU investment for different use cases, and the white part shows the GPU investment.

Samsung had some foresight in employing GPUs for our OPC solution. You can see that it really took off in 2019, because we knew from the outset that some of the simulations, especially the imaging simulation, can benefit heavily from the cheap parallelism embedded in the GPU. But even though we had been using NVIDIA GPUs since 2017, that was not quite enough, because there were some parts of the algorithm where we couldn't fully utilize them. So the turnaround time reduction was limited by those limiting portions.

Let me quickly explain why OPC algorithms are very hard to parallelize. You can think about OPC as dealing with three data types. The first one is the silicon image at the top. This is the fab data: we need to make sure we deliver the right silicon image, the right silicon appearance, on the wafer. On the left side, you have the polygon vector format.
It's a bunch of coordinates, and it represents our design layout, the mask, and all the geometry we use in our processing. On the right side, we have the intermediate simulation data that we use in all the modeling. That data is basically 2D matrix data, and 2D matrix processing is very GPU-friendly to begin with. So we've been using GPUs to accelerate that matrix portion of the data. But for the polygons, given the nature of the code and its heavy branching, it was very difficult to take advantage of parallelism. The white patches show some of the code snippets. They are highly redacted, but you can get the flavor of the geometry processing that we have. There are a lot of if-then statements: if a certain geometry is in a certain situation, do this; if it is in that category, then do that. So there's a lot of branching in the code, which makes it very hard to parallelize. So we've been putting essentially all the layout manipulation and polygon code on the CPU side, going back and forth between CPU and GPU to get a partial speedup. Essentially, as I'll show you later, all our TAT was limited by the CPU portion of the code.

Then in 2022, cuLitho came into the picture. As highlighted in Jensen's keynote, cuLitho is an NVIDIA library that is designed specifically to address this OPC TAT problem. There is one module in it, called [unclear], which is a full computational geometry solution that accelerates the handling of polygons within the GPU.
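
The remark that turnaround time stays limited by the CPU portion is an instance of Amdahl's law: if only part of a job is accelerated, the untouched part sets a ceiling on the end-to-end gain. The sketch below illustrates that arithmetic. The 40/60 runtime split and the 20x and 50x stage speedups are hypothetical numbers picked for illustration; they are not figures from the talk.

```python
def overall_speedup(stages):
    """Combine per-stage speedups into an end-to-end speedup (Amdahl's law).

    stages: list of (fraction_of_baseline_runtime, stage_speedup) pairs.
    The fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in stages)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("runtime fractions must sum to 1")
    new_runtime = sum(fraction / speedup for fraction, speedup in stages)
    return 1.0 / new_runtime


# Hypothetical baseline split (NOT from the talk):
POLYGON_FRACTION = 0.4  # branch-heavy polygon code left on the CPU
MATRIX_FRACTION = 0.6   # 2D matrix simulation that maps well to the GPU

# Accelerate only the matrix part by an assumed 20x.
matrix_only = overall_speedup([(POLYGON_FRACTION, 1.0), (MATRIX_FRACTION, 20.0)])

# Ceiling if the matrix part became infinitely fast but polygons stayed on the CPU.
ceiling = 1.0 / POLYGON_FRACTION

# Also accelerate the polygon part by an assumed 50x.
both = overall_speedup([(POLYGON_FRACTION, 50.0), (MATRIX_FRACTION, 20.0)])

print(f"matrix part only: {matrix_only:.2f}x (ceiling {ceiling:.2f}x)")
print(f"both parts:       {both:.2f}x")
```

Under these assumptions, no amount of GPU acceleration of the matrix work alone can push the overall speedup past the ceiling set by the polygon code, which is why moving polygon processing onto the GPU matters so much for turnaround time.
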

When we saw this, we started the collaboration, and it picked up pace in 2023 and 2024. In the next several slides, I'll show you some of the results of the collaboration and the speedup that we achieved.

This is case number one, where we sped up the photolithography model itself. The bar on the left, marked with the white dashed line, shows our baseline implementation. Remember, our baseline implementation is already a combination of CPU and GPU. The box on the right shows what happens when we turn on the new solution. The first thing you see is that on the left side there's a big light blue portion, and we moved that to the GPU. That module is called rasterization. Essentially, it is a conversion of the data from the polygon format to image data. It's the very first step of the simulation calculations: we have to convert the polygon format into the image format. We didn't know how to parallelize that portion of the code. With the help of NVIDIA, we turned that CPU code into GPU code, and we got more than an 80x speedup from the get-go. I also want to draw your attention to the green box. The green box is one of the biggest modeling tasks we had, and that was on the GPU side even in our baseline.
But with the help of NVIDIA, we were able to speed that up by 2x. So overall, we achieved a 6x speedup against our CPU-GPU baseline. This is a very impressive result, because essentially you get 6 times the cost savings, 6 times faster turnaround time, and 6 times more development opportunities that we can pursue.

The second case is bias correction. This is a little technical and esoteric, but the top two pictures show wafer images from the fab. The left-hand side is the photoresist after lithography. The right-hand side is after etching: once you have the pattern in the resist, you have to transfer it onto the substrate using the etch process. Although photolithography gives us very similar small line widths, depending on the proximity, how close you are to a neighboring line and so on, you do get different sizes after etching. You have to model it and you have to correct for it, just like we do with OPC. In this case, we're using a type of machine learning model to correct this bias phenomenon, and speeding up this whole model was the second use case that we started working on.

The result is shown on this graph. The left side, again, is the baseline. Again, we were on CPU plus GPU; I didn't break it down here. The right-hand side is with the new solution enabled. You can see that the time for all the modules related to the model has shrunk almost to zero. I don't know exactly, it's like 20x or 30x. The numbers are just mind-blowing. The yellow portion of the code is something called correction. It's where the remaining polygon manipulation is going on. That is the next step that we are currently working to speed up, in order to get the final speedup for this model. The collaboration with NVIDIA has been great. Essentially, by making things 2x or 6x faster, we can use the time we are given much more efficiently: more iterations, better OPC, so we get it right the first time.

For the last slide, I want to draw some attention to future work, since we have such a good opportunity. I see a big door opening for me, because I was not aware of this opportunity for computational acceleration.
There are a lot of areas that I could present to you, but I selected just these two cases. One of them is EUV stochastics. EUV, extreme ultraviolet light, unlike 193 nm light, has very peculiar characteristics due to stochastic effects, because EUV uses very high-energy photons. There's a lot of photon shot noise. Also, the feature size you are trying to print is very close to molecular size, so now you have something called chemical noise as well. So there are a lot of statistical aspects of modeling that we have to deal with. Every time statistics comes in, you have to throw a lot of compute power at it, because you have to run Monte Carlo simulations, like hundreds of millions of times. This is an area that we haven't been able to tackle because of the compute burden. I think that with the help of GPUs, we do have an opportunity to tackle it.

The second case that I want to draw attention to is optical science. This is about the Maxwell solver. These equations have been around for like 150 years. But today, in practice, there's a lot of need for speeding up this rigorous solver to fundamentally change the way we do semiconductor development. There are a lot of untapped opportunities lying there in a lot of different use cases.
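
To make the photon shot noise point concrete, here is a minimal Monte Carlo sketch. It assumes an EUV wavelength of 13.5 nm (the talk rounds it to 13 nm), a hypothetical exposure dose of 30 mJ/cm², a 1 nm² pixel, and Poisson photon arrival statistics; the function names are illustrative. It covers only photon counting, not chemical noise or any resist model.

```python
import math
import random

H_C_EV_NM = 1239.84          # Planck constant times speed of light, in eV*nm
EV_TO_J = 1.602176634e-19    # joules per electronvolt


def photons_per_nm2(dose_mj_per_cm2, wavelength_nm):
    """Mean number of photons landing on 1 nm^2 at the given dose."""
    photon_energy_j = H_C_EV_NM / wavelength_nm * EV_TO_J
    dose_j_per_nm2 = dose_mj_per_cm2 * 1e-3 / 1e14  # 1 cm^2 = 1e14 nm^2
    return dose_j_per_nm2 / photon_energy_j


def sample_poisson(mean, rng):
    """Draw one Poisson sample with Knuth's multiplication method
    (adequate for means up to a few hundred)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def relative_noise(mean, trials, rng):
    """Monte Carlo estimate of (standard deviation / mean) of the photon count."""
    counts = [sample_poisson(mean, rng) for _ in range(trials)]
    avg = sum(counts) / trials
    var = sum((c - avg) ** 2 for c in counts) / (trials - 1)
    return math.sqrt(var) / avg


if __name__ == "__main__":
    rng = random.Random(0)
    dose = 30.0  # mJ/cm^2, hypothetical
    for label, wavelength in (("EUV 13.5 nm", 13.5), ("DUV 193 nm", 193.0)):
        mean = photons_per_nm2(dose, wavelength)
        noise = relative_noise(mean, 4000, rng)
        print(f"{label}: {mean:.1f} photons/nm^2, "
              f"relative noise {noise:.3f} "
              f"(1/sqrt(mean) = {1 / math.sqrt(mean):.3f})")
```

Because each EUV photon carries roughly 14 times the energy of a 193 nm photon, the same dose delivers about 14 times fewer photons, so the relative fluctuation per pixel is larger. A full stochastic model repeats this kind of sampling over many pixels, process conditions, and chemical events, and those trials are independent of one another, which is what makes the workload a natural candidate for GPUs.
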

So I want to draw your attention to this type of use case, something that I see every single day because I'm in the trenches.

With that, I want to summarize my presentation in the following way. OPC is a very fine-grained correction embedded in state-of-the-art semiconductor manufacturing. Hopefully, I've convinced you that OPC compute time is a major bottleneck impacting fab process development, and I'm getting a lot of pressure. Samsung has leveraged NVIDIA GPUs since 2019, and the collaboration on cuLitho has addressed several untapped areas for compute acceleration. I really thank NVIDIA for providing those opportunities for us. I want to end my presentation with a special thank you to my partner, a direct contributor at NVIDIA, to my engineers at Samsung, and to the NVIDIA cuLitho team and tech team that helped along the way. Thank you very much.

Speaker 1: Thank you so much. We are now open to any questions.

Speaker 1: No questions? Okay. Yes.

Speaker 3: Thank you very much. Is this whole solution already deployed in mass production at Samsung fabs, or are you still researching the benefits of it?

Speaker 2: No, I think that we are ready. The main application that we are looking into is the development aspect of it, so we are working with the internal team to deploy the solutions.

Speaker 3: And maybe one follow-up: what sort of GPU cluster size do you need for it?

Speaker 2: I'm sorry, I forgot to mention that all the results I'm showing here are based on A100, Ampere-class GPUs, because that's our largest installed base. That's why we focused on that cluster: obviously, that's what we have the most of, and we can deploy the solution to manufacturing once we have this optimization. We are now getting H100s installed in our data centers. I do believe, and I think my collaborator at NVIDIA said the same before the meeting, that they also see a significant speedup from that hardware. I'm looking forward to that optimization as well.

Speaker 1: Okay. Sure. Any other questions?

Speaker 2: All right. Thank you very much, and have a good rest of the day.

![image](https://hackmd.io/_uploads/HJpVcEa21e.png)

![image](https://hackmd.io/_uploads/HyC_9463kg.png)

![image](https://hackmd.io/_uploads/BJHvnEpn1e.png)

![image](https://hackmd.io/_uploads/rkt-04pnJe.png)

![image](https://hackmd.io/_uploads/ry6MAV62Jg.png)

![image](https://hackmd.io/_uploads/BJpQCE631g.png)

![image](https://hackmd.io/_uploads/rkmrAEp31l.png)

![image](https://hackmd.io/_uploads/r1LUCNa2Je.png)

![image](https://hackmd.io/_uploads/HkPD0Va3yx.png)

![image](https://hackmd.io/_uploads/BkHuCV62yx.png)