# Chiplet, Solution or Trend?

My friend [Laatansa](https://www.facebook.com/LSlowmotion "his social media") found a [website](https://semianalysis.com/die-yield-calculator/ "die yield calculator") and said "Let's talk about die yield!" out of the blue. Well, it turned into a conversation about monolithic vs chiplet technology and was quite interesting, so I am putting my notes here in case I need them in the future (and I hope they can be a good reference for those reading this).

## 1. Down the Rabbit Hole

### What is "Yield"?

Based on the book **Introduction to Semiconductor Manufacturing Technology**, there are 3 kinds of yield:

1. Wafer yield

   > The ratio between the number of good wafers after process completion and total number of wafers used for IC chip fabrication

   $$
   Y_{W} = \frac{\text{wafers}_{\text{good}}}{\text{wafers}_{\text{total}}}
   $$

2. Die yield

   > The ratio between the number of good dies per good wafer after process completion and total number of dies on the wafer

   $$
   Y_{D} = \frac{\text{dies}_{\text{good}}}{\text{dies}_{\text{total}}}
   $$

3. Chip yield

   > The ratio between the number of good chips after packaging completion and total number of chips packaged

   $$
   Y_{C} = \frac{\text{chips}_{\text{good}}}{\text{chips}_{\text{total}}}
   $$

The overall yield of a fab is the product of all three.

$$
Y_{T} = Y_{W} \times Y_{D} \times Y_{C}
$$

A die is what is obtained after cutting (dicing) the wafer. Dies are usually tested to find out which ones are good and which ones are defective.

![image](https://hackmd.io/_uploads/ryA-fkCGyx.png)
Taken from **Introduction to Semiconductor Manufacturing Technology**

As seen in the illustration above, there is a relation between die size and yield. In general, the larger the die, the lower the yield (assuming the wafer has the same defects in the same spots), since each defect then ruins a larger share of the wafer.

### How Yield affects Pricing

As mentioned in the **Introduction to Semiconductor Manufacturing Technology** book, the cost of manufacturing a chip consists mainly of the wafer price, the processing cost and the packaging cost. I will take an example from page 26 of the book.
Assume a silicon wafer costs US$200 (I know the price is different now, but it is good enough for an example) and processing takes 500 steps at US$1 per step, so a processed wafer costs US$700 in total. If packaging costs US$10 per chip and each chip sells for US$30, then with 500 chips per wafer, 100% wafer yield and 100% packaging yield (chip yield), 35 good chips (35 x $30 = US$1050), or a 7% die yield, are enough to break even.

$$
\begin{aligned}
\text{cost} &= \$700 + 35 \times \$10\ (\text{packaging}) = \$1050\\
\text{revenue} &= 35 \times \$30 = \$1050\\
\text{profit} &= \text{revenue} - \text{cost} = \$0
\end{aligned}
$$

If the die yield increases to 50% (250 dies), then:

$$
\begin{aligned}
\text{cost} &= \$700 + 250 \times \$10\ (\text{packaging}) = \$3200\\
\text{revenue} &= 250 \times \$30 = \$7500\\
\text{profit} &= \text{revenue} - \text{cost} = \$7500 - \$3200 = \$4300
\end{aligned}
$$

If a fab can process 10,000 wafers a month with 100% wafer yield, 50% die yield and 100% packaging yield, it can make US$43 million in profit per month. Well, if building a 300-mm wafer fab costs US$3 billion, imagine how long it would take to break even: at US$43 million a month that is roughly 70 months, almost six years. And that is before other costs such as employing thousands of people, so it might not even be enough money to pay all the bills. Hence the importance of improving yield and production capacity.

Now that we know the connection between die size, yield and cost, let's talk about a little bit of history: how computer chips have lately been getting larger each year.

## 2. Limit of Moore's Law

> Gordon Moore, one of the cofounders of Intel Corporation, noticed that the number of components on a computer chip doubled every 12 months while the price stayed the same. He predicted that trend would hold in the future. His vision has become well known in the semiconductor industry as Moore’s law.
> -- **Introduction to Semiconductor Manufacturing Technology** p.5

Moore's Law states that the number of transistors on an IC doubles at a regular pace: originally every year, and since Moore's 1975 revision roughly every two years (a figure of 18 months is also widely quoted).
This means the general trend is to cram more components onto integrated circuits.

> The ability of the microprocessor to ride the improvements in integrated circuit technology led to a higher rate of performance improvement—roughly 35% growth per year.
> -- **Computer Architecture: A Quantitative Approach** p.2

> ...improvement of semiconductor manufacturing as predicted by Moore’s law has led to the dominance of microprocessor-based computers across the entire range of computer design.
> -- **Computer Architecture: A Quantitative Approach** p.4

Historically, the performance of chips such as microprocessors has relied heavily on improvements in semiconductor manufacturing. A slowdown in semiconductor manufacturing therefore shows up as a slowdown in microprocessor performance gains, as seen in the graph below:

![image](https://hackmd.io/_uploads/By9AED7X1g.png)
Taken from *Computer Architecture: A Quantitative Approach*, p.3

> Since 2015, with the end of Moore’s Law, improvement has been just 3.5% per year, or doubling every 20 years! Performance for floating-point-oriented calculations follows the same trends, but typically has 1% to 2% higher annual growth in each shaded region.
> -- **Computer Architecture: A Quantitative Approach** p.3

This means that advances in semiconductor manufacturing have slowed down. Some people even believe that "Moore's Law is dead"; google the term and you'll see who says so and why :smiley:. In a nutshell, it is getting harder and harder to reduce transistor feature size, and scaling down no longer produces proportional gains. To get more performance, more components (cores, ALUs, etc.) have to be crammed in, which enlarges the chip even further.

Of course there is the option of not cramming in more components, but then the company has to sell a new product that does not perform far above the previous generation at a "not-so-good-value" price.
As seen from the AMD "Zen 5%" memes and Intel's Arrow Lake launch, the market might not receive that well, so it is not a popular move. :upside_down_face:

So, in summary, to deliver a product that performs **WAY** better than the previous generation, chip designers have to put in more resources, for example more cores, which results in a larger chip. But as I said before, a larger die means lower yield, which also means that unless the manufacturer is willing to take a loss, the product will be sold at a higher price. With that being said, let me introduce you to chiplet technology.

## 3. Monolithic vs Chiplet

### What is chiplet?

![image](https://hackmd.io/_uploads/Bkvy09Bzyx.png)
Random image of a single IC package

Monolithic has been the *staple* approach in manufacturing ICs (*Integrated Circuits*, or what we usually call **chips**). Basically, with a monolithic design a package contains a single die, unlike a chiplet design where a package can consist of multiple dies, as illustrated below:

![image](https://hackmd.io/_uploads/BJntw2VGkg.png)
Illustration taken from **Chiplet Design and Heterogeneous Integration Packaging**

With chiplet technology it is possible to produce a large package made up of many smaller dies. This also means it is possible to manufacture a product equivalent to one large die but with higher yield, since each die is smaller. It is also possible to mix and match process technologies as in the illustration, where a single IC/chip package holds a CPU die built on a state-of-the-art process next to an inferior-but-cheap I/O die.

A real-life example can be seen in products like AMD Ryzen Dragon Range, released last year.

![image](https://hackmd.io/_uploads/H1xRRAVfye.png =80%x)
AMD Dragon Range slide

The CPU cores are made using state-of-the-art 5nm lithography, while the I/O die is made using a cheaper and *slightly inferior* 6nm technology.
This practice, where one part of the IC can be made more cheaply than the others, means the price can be pushed down even further. So I will emphasize that there are at least two common ways to reduce the cost of manufacturing ICs with chiplet integration:

- increasing yield
- using older but cheaper technology to manufacture the parts that do not require bleeding-edge/state-of-the-art technology

How much does the yield improve compared to monolithic?

### Yield, Monolithic vs Chiplet

![image](https://hackmd.io/_uploads/rkzGBp4Mke.png =80%x)
Table taken from **Chiplet Design and Heterogeneous Integration Packaging**

Below is also a (hypothetical) comparison of monolithic vs chiplet die cost for an AMD processor:

![image](https://hackmd.io/_uploads/By9BV6NMye.png =80%x)
Table taken from **Chiplet Design and Heterogeneous Integration Packaging**

I would like to give an example of expensive monolithic processors from Intel that are rumoured to have comparatively low yield, like Skylake-X, but Intel does not release its yield information, so I will add it here in the future when I can find a trusted source.

Since two separate dies can be joined in a single chip, it is even possible for two rival companies to design the two dies that go into it. Remember Intel with AMD Radeon Vega graphics? :laughing:

![image](https://hackmd.io/_uploads/ByfwzaVf1g.png)
Example of chiplet technology combining an Intel CPU and an AMD GPU

Advancements in chiplet technology also mean that chiplet products will keep getting better.

![image](https://hackmd.io/_uploads/B1tMEpNfyl.png)
Example of advancements in heterogeneous integration by TSMC

## 4. Chiplet, Advantages and Disadvantages

So far, chiplet sounds like a solution, not an alternative. Then why do some companies still use monolithic designs for their chips?
![image](https://hackmd.io/_uploads/SJ-GLp4MJl.png =65%x)
Example of a monolithic SoC

Well, although chiplet comes with advantages, it also has its drawbacks. Here's the list of advantages and disadvantages of chiplet:

Advantages:

* yield improvement (lower cost) during manufacturing
* faster time-to-market
* cost reduction during design
* better thermal performance
* IP reuse
* modularization

Disadvantages:

* additional area for interfaces (larger package size)
* higher packaging costs (more complex packaging)
* more complexity and design effort
* past methodologies are less suitable for chiplets

Yes, chiplet integration has a higher packaging cost than monolithic integration.

### Die-to-Die Interface

Needing an interface for *die-to-die* (*D2D*) data transfer means extra design effort.

AMD came up with Infinity Fabric. It is used for both die-to-die communication and local (same-die) communication.

![image](https://hackmd.io/_uploads/Hyq-Pa4G1e.png)
AMD Infinity Fabric

Intel came up with EMIB, and I think there's Foveros as well.

![image](https://hackmd.io/_uploads/SJKXqT4M1l.png =50%x)
Intel EMIB used for die-to-die data transfer in the Stratix 10 FPGA

Nvidia initially used chiplets to pair their GPUs with HBM.

![image](https://hackmd.io/_uploads/SJ2BjTNGkx.png =70%x)
TSMC 2.5D IC, the technology used for HBM.

Lately (as of mid-2024), Nvidia has also used a chiplet design for their logic/processing unit.

![image](https://hackmd.io/_uploads/HyYdThYGyg.png =80%x)
NV-HBI is the new *die-to-die* interface from Nvidia, used in Blackwell

Previously they only used a C2C (chip-to-chip) interface with NVLink.

![image](https://hackmd.io/_uploads/BJ5j5aNG1x.png =80%x)
NVLink here is used for *chip-to-chip* data transfer

There's also the idea of using the open AXI4 protocol for *die-to-die* data transfer.
The paper can be found at the link below:
https://ieeexplore.ieee.org/document/10121284

Die-to-die communication has higher latency and takes more energy than on-die communication in a monolithic design, not to mention the bandwidth limitation. Remember when the AMD Ryzen 3000 series (Matisse) had higher latency than its monolithic rival, especially in its early days? Although newer generations have significantly better latency, it is still higher than in a monolithic design.

![image](https://hackmd.io/_uploads/r1S3yTFG1g.png =90%x)
AMD Infinity Fabric (the EPYC/server lineup uses 32B/cycle writes instead of 16B/cycle writes)

As seen in the diagram above, there's a limit on how much data can be transferred (bandwidth limitation), not to mention the higher latency when communicating between dies.

![image](https://hackmd.io/_uploads/B1xZbatzke.png =80%x)
Ryzen 9 3950X (Matisse) core-to-core latency

Above is the core-to-core latency of the Ryzen 9 3950X (which has 16 cores and 32 threads). Once core-to-core communication has to go through Infinity Fabric, i.e. when communicating with a core on a different die, the latency gets worse.

Here's a common die-to-die communication protocol:

![image](https://hackmd.io/_uploads/rkwBXTYGyg.png =80%x)
Die-to-die communication (source: https://semiengineering.com/die-to-die-chiplet-communication/)

![image](https://hackmd.io/_uploads/SyiSV6YMyg.png =80%x)
Energy cost of operations (source: Bill Dally's presentation)

As seen above, transferring 1 bit of data from one die to another can take between 0.5 and 2 pJ of energy, which is 5-20 times the energy used by a 32-bit addition, or roughly the energy of one 16-bit floating-point multiplication.

### More Headache

![image](https://hackmd.io/_uploads/HkgAiqOQye.png =80%x)
Illustration of chip partition and chip split

Those who have already designed a monolithic SoC can turn it into chiplets by doing a chip partition or a chip split.
A monolithic SoC can be split/partitioned, leading to higher yield, which translates to lower manufacturing cost. But there are reasons companies decline to do so, for example Apple, as quoted from the Springer book (**Chiplet Design and Heterogeneous Integration Packaging**):

> ...partitioning and/or splitting their SoC design into chiplets would not be an attractive prospect (at least right now) because the additional chip-to-chip interconnection and communication overhead would create more headaches than it’s worth...

And here are several more points made by the Springer book:

> - Chiplet design and heterogeneous integration packaging provide alternatives to SoCs, especially for advanced nodes, which most companies cannot afford
> - Chip split and heterogeneous integration packaging is driven by cost and semiconductor manufacturing yield
> - The most advantage of chiplet design and heterogeneous integration packaging comparing with SoC is lower cost
> - The challenges of chiplet design and heterogeneous integration packaging are larger packaging sizes and higher packaging cost. Thus, the opportunities of packaging technologists are to reduce the packaging size and cost

So in the end, the main reason engineers choose chiplet technology is **COST**!

## 5. Is Monolithic Over?

For small chips with limited computing power (which means a limited number of components), like those used in smartphones and edge devices, manufacturing them with monolithic integration might be less of a problem. Is that the reason Apple, Samsung and Huawei can still sell their SoCs at a "reasonable" price? Well, that's up for debate.

But for chips with large computing power (which means they contain plenty of components), manufacturing them with monolithic integration might be a problem, at least cost-wise. Is there really no way to avoid chiplet design? Well, let me introduce you to Cerebras, whose processor is a single chip made from a single wafer (yes, the whole silicon wafer).
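
The "small die is fine, large die hurts" argument above can be sketched with a toy cost model. This is only a rough sketch under my own assumptions, not numbers from any of the books: die yield follows the simple Poisson model (yield = exp(-area x defect density)), every die is tested before packaging so bad dies are thrown away individually, and packaging yield is 100%. The defect density, cost per mm², interface overhead, packaging prices and the ~46,000 mm² wafer-scale area are all made-up round figures for illustration.

```python
import math


def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Probability that a die of the given area has zero defects (Poisson model)."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)


def cost_per_good_product(
    logic_area_mm2: float,
    n_dies: int,
    d0_per_cm2: float = 0.1,      # hypothetical defect density (defects per cm2)
    cost_per_mm2: float = 0.10,   # hypothetical processed-silicon cost in US$
    d2d_overhead: float = 0.10,   # hypothetical extra area for die-to-die interfaces
    packaging_cost: float = 5.0,  # hypothetical packaging cost in US$
) -> float:
    """Cost in US$ of the good silicon for one product, plus packaging.

    Assumes n_dies equal-sized dies, per-die testing before assembly and
    100% packaging yield. Only split designs pay the interface overhead.
    """
    overhead = d2d_overhead if n_dies > 1 else 0.0
    die_area = logic_area_mm2 * (1.0 + overhead) / n_dies
    # Silicon spent per good die = silicon per die / fraction of dies that work.
    good_die_cost = die_area * cost_per_mm2 / poisson_yield(die_area, d0_per_cm2)
    return n_dies * good_die_cost + packaging_cost


for area in (100, 800):
    mono = cost_per_good_product(area, n_dies=1, packaging_cost=5.0)
    split = cost_per_good_product(area, n_dies=4, packaging_cost=25.0)
    print(f"{area} mm2: monolithic ${mono:.2f}, 4 chiplets ${split:.2f}")

# A wafer-scale die (assumed ~46,000 mm2) with zero defects is essentially impossible.
print(f"wafer-scale zero-defect probability: {poisson_yield(46_000, 0.1):.1e}")
```

With these made-up inputs the 100 mm² product comes out cheaper as a single die, because the yield is already high and the extra packaging cost dominates, while the 800 mm² product comes out cheaper when split into four. The last line shows why a wafer-sized die cannot rely on being defect-free: under the same model the chance of zero defects is on the order of 10^-20, so something other than "throw away every die with a defect" is needed.
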
### Redundancy is Part of The Design

![image](https://hackmd.io/_uploads/SJPsgCVfkl.png)
Cerebras WSE-3 size comparison

Cerebras, for example, designed a mechanism to handle defective parts in their chips. A Cerebras WSE-3 chip will still work even if it contains defects, whereas a normal die with a defect simply won't work. This might come with a drawback as well: for every defect in a Cerebras chip, computational power is lower, since the defective part ends up unusable.

### Redundancy Doesn't Solve The Main Issue

In other words, if Cerebras guarantees that the WSE-3 chip delivers 125 petaflops (if I'm not mistaken), then the chip can only tolerate a certain number of defects before it performs below 125 petaflops. Incorporating redundancy into the design does not mean it will have the best yield, but it is better than discarding the die entirely when there's a defect, especially when the die is relatively large.

Personally, I'm not sure whether what AMD and Intel have been doing can also be called incorporating redundancy. For example, an 8-core processor with 1 or 2 faulty cores will have 2 cores disabled and be sold as a 6-core processor instead. I will leave this information here and close this note :smile:.

## 6. Closure

Chiplet technology, although used in many products, is apparently not really a solution to the problems facing monolithic chips. It is an alternative chosen to increase yield and lower production cost; it does not replace monolithic integration. In a nutshell, if you want to design a large IC "cheaply", chiplet technology **is kind of a workaround**. Well, tell me which engineer doesn't use workarounds.

> If it's stupid but it works, it ain't stupid

Words to live by.

In the end, chiplet is a design that gets chosen as long as the benefit outweighs the "pain and suffering" that comes with it (nothing ever comes for free, *sigh*).
Chiplet will stay until a better technology arrives, or I should say is discovered/invented.

For those who are wondering "If chiplets are meant to reduce cost, then why are processors that use chiplets still expensive, especially the so-called *AI processors*?": well, chiplets are meant to cut production costs, but how much the chips sell for is up to the companies themselves. Don't forget that chip prices are also influenced by company overhead like R&D, marketing, etc. (apart from the hopes and dreams of the company's CEO and investors).

Thanks for reading this, and special thanks to my friend Laatansa and the people from the Obral Obrol Discord server. 'Till next time! :smiley:

## Writer's Note

For those who came across the term *heterogeneous integration*, there's a bit of a difference from the term *chiplet*, as quoted from the Springer book:

> Chiplet is a chip design method, while heterogeneous integration is a chip packaging method

So, when talking about chip design we use the term *chiplet*, but when talking about packaging we use the term *heterogeneous integration*.

## Sources

Xiao, Hong. 2012. Introduction to Semiconductor Manufacturing Technology. SPIE. https://spie.org/publications/book/1100168

Lau, John H. 2023. Chiplet Design and Heterogeneous Integration Packaging. Springer. https://link.springer.com/book/10.1007/978-981-19-9917-8

Hennessy, John L., and David A. Patterson. 2019. Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.

SemiEngineering. 2021. https://semiengineering.com/die-to-die-chiplet-communication/

Bill Dally's presentation in Brice. 2019. https://www.youtube.com/watch?v=fnd05AeeFN4