Advanced packaging: the ultimate primer

Over the past few years, advanced packaging has become an increasingly common topic in semiconductors. In this multi-part series, SemiAnalysis will break down this megatrend. We will delve into the technologies that enable advanced packaging, such as high-precision flip chip, thermocompression bonding (TCB), and various types of hybrid bonding (HB).

This in-depth discussion will cover how various foundries, IDMs, OSATs, and fabless design companies use these technologies, their equipment procurement, and the differences in their technology choices. It will also include reviews of equipment and supply chains from companies such as Besi Semiconductor, ASM Pacific, Kulicke and Soffa, EV Group, Suss Microtec, SET, Shinkawa, Shibaura, and Xperi.

First, let's discuss the need for advanced packaging. Moore's Law has progressed at a rapid pace. From TSMC's 32nm misstep up to the current 5nm process node, TSMC's transistor density has increased by about 2 times every 2 years. Nevertheless, the density of real chips has increased by only about 2 times every 3 years. This slower pace is partly due to the demise of SRAM scaling, power delivery, and thermal density, but most of these issues come back to getting data into and out of the chip.

The input and output (IO) of data is the lifeblood of computation. Placing memory on the chip helps reduce IO requirements by reducing communication overhead, but at the end of the day, it's a limited avenue to scale. The processor must interact with the outside world to send and receive data. Moore's Law has increased the industry's transistor density by about 2 times every 2 years, but IO data rates have only increased by 2 times every 4 years. Over decades, this mismatch between transistor density and IO data rate has compounded into a huge gap. Co-packaged optics are just one way to address this problem, and they are not the only one.
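
To get a feel for how quickly that mismatch compounds, here is a minimal sketch that uses only the two doubling periods quoted above. The function names are ours and purely illustrative, and real products do not follow these idealized curves.

```python
def growth(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` when doubling every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


def density_to_io_gap(years: float) -> float:
    """Ratio of transistor density growth to IO data rate growth.

    Uses the rates quoted in the text: density doubles every 2 years,
    IO data rate doubles every 4 years.
    """
    return growth(years, 2.0) / growth(years, 4.0)


for years in (4, 8, 20):
    print(
        f"after {years:>2} years: density x{growth(years, 2.0):.0f}, "
        f"IO x{growth(years, 4.0):.0f}, gap x{density_to_io_gap(years):.1f}"
    )
```

Under these idealized rates the gap itself doubles every 4 years, so two decades are enough to open up a 32x difference.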

Fundamentally, chips need to accommodate more communication or IO points to keep up. Unfortunately, the last major step-function improvement in this area was the shift to flip chip packaging in the 90s.

Traditional flip chip packages have bump pitches between 150 microns and 200 microns. This means that the IO bumps on the bottom side of the die are 150 to 200 microns apart. TSMC's N7 reduces the bump pitch to 130 microns, and Intel's 10nm reduces it to 100 microns. These advances are known as fine-pitch flip chip. Don't underestimate them, as they have contributed greatly to better processors, but the packaging technology of 2021 is basically the same as the packaging technology of 2000.

A 250 mm² chip from 2000 is incredibly different from a 250 mm² chip from 2022 in terms of transistor count, performance, and cost. A doubling every 2 years under Moore's Law would imply that the number of transistors has increased by more than 2,000 times. Obviously, the reality is not so favorable, but transistor counts have still increased by several orders of magnitude. On the other side of the coin, packaging did not enjoy the same level of growth.

On TSMC's N7 node, AMD's bump pitch went from about 200 microns to 130 microns, and IO increased by only 2.35 times. As mentioned earlier, Intel achieved greater scaling on its 10nm process, going from a bump pitch of 200 microns to 100 microns. This still only increases IO by a factor of 4. A 2.35x or 4x increase is a rounding error relative to the increase in the number of transistors.
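
The IO figures follow from simple geometry: bumps sit on a two-dimensional grid, so the number of bumps per unit area scales with the inverse square of the pitch. The sketch below reproduces the arithmetic under that assumption (a uniform, fully populated bump grid, which is our simplification) and sets it against the idealized Moore's Law doubling over the same 22 years.

```python
def io_density_gain(old_pitch_um: float, new_pitch_um: float) -> float:
    """Gain in bumps per unit area when the bump pitch shrinks.

    Assumes a uniform square grid of bumps, so density scales as 1 / pitch**2.
    """
    return (old_pitch_um / new_pitch_um) ** 2


amd_n7 = io_density_gain(200, 130)      # about 2.37; the text quotes 2.35
intel_10nm = io_density_gain(200, 100)  # exactly 4.0

# Idealized Moore's Law from 2000 to 2022: one doubling every 2 years.
moore_ideal = 2 ** ((2022 - 2000) // 2)  # 2**11 = 2048

print(f"AMD on N7:    x{amd_n7:.2f} IO density")
print(f"Intel 10nm:   x{intel_10nm:.2f} IO density")
print(f"Moore, ideal: x{moore_ideal} transistors")
```

Even the better of the two packaging gains is hundreds of times smaller than the idealized transistor gain.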

This leads to the concept of pad-limited designs (pads being the IO contact points on the silicon die). When you move an old design to a new process node, the logic itself can shrink significantly, but the IO requirements prevent the die from shrinking by much. Because of the IO it needs, the die stays large and is left with empty space. Such designs are called pad limited, and they are very common.
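
A minimal sketch of why a pad-limited die stops shrinking. The design below is hypothetical (the logic area, bump count, and the assumption that every bump needs a full pitch-by-pitch cell on a square grid are all ours, chosen only for illustration).

```python
def pad_limited_area_mm2(bump_count: int, bump_pitch_um: float) -> float:
    """Minimum die area needed to fit `bump_count` bumps on a full square grid."""
    pitch_mm = bump_pitch_um / 1000.0
    return bump_count * pitch_mm ** 2


def die_area_mm2(logic_area_mm2: float, bump_count: int, bump_pitch_um: float) -> float:
    """The die can be no smaller than the larger of the two constraints."""
    return max(logic_area_mm2, pad_limited_area_mm2(bump_count, bump_pitch_um))


# Hypothetical design: 100 mm^2 of logic and 2,000 bumps at a 200 micron pitch.
old_node = die_area_mm2(100.0, 2000, 200.0)  # logic sets the size: 100 mm^2

# Port to a node that halves the logic area while the package pitch stays the same.
new_node = die_area_mm2(50.0, 2000, 200.0)   # bumps set the size: about 80 mm^2

print(f"old node: {old_node:.0f} mm^2, new node: {new_node:.0f} mm^2")
print(f"unused silicon on the new node: {new_node - 50.0:.0f} mm^2")
```

In this toy case the logic shrinks by 50% but the die only shrinks by 20%, and the difference is paid for as empty silicon on a more expensive process.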

By the way, this matters not only at the leading edge, where advanced packaging will be used, but also for the discussion around the automotive chip shortage and semiconductor shortages in general. Intel CEO Pat Gelsinger believes that companies facing shortages should turn to Intel 16nm foundry services.

Pat Gelsinger said that Intel had announced the availability of European foundry services on Intel 16 and other nodes at its Irish plant, which the company believes can help accelerate the end of the supply shortage, and that Intel is working with automotive and other industries to help build these capabilities. He added that some people might argue for building most automotive chips on old nodes. But don't old nodes need old fabs? Do we want to invest in the past or in the future?

He went on to say that a new fab takes 4 to 5 years to build and become production-worthy, so it is not an option for solving today's crisis. Invest in the future rather than investing backwards. Instead, designs should be moved to new, modern nodes, ready for increased supply and flexibility in the future.

The problem with Intel's argument is that when moving from ancient nodes to relatively modern nodes, these designs will be pad limited. The cost per mm² is higher on the newer node, and because the chip area doesn't shrink well when the design is pad limited, the unit cost economics don't work. In addition, the non-recurring costs are high, since the old chip has to be redesigned on the newer node and go through the entire requalification process. Moving old chips to new nodes is not a feasible solution.

So how do you increase the IO count?

One way is to make the chip bigger. The larger the area, the more room there is for IO. This is not the best path, but designers will often add more on-chip memory to keep more data on the chip. This, in turn, reduces IO requirements to some extent. AMD's recent architectures are a good example of this, as they have huge caches on both the CPU and the GPU.

AMD named it Infinity Cache. The idea is to reduce memory bandwidth requirements by providing large amounts of on-chip SRAM that keep the data most relevant to the computation on the processor. In the GPU space, AMD has made it clear that adding Infinity Cache allowed them to reduce the GDDR6 bus width from 384 bits to 256 bits. Apple is also aggressive in this regard, stuffing a lot of cache onto its in-house processors. One component of these design choices is power-related, but a large part is also due to pad limitations.

Another way is to add various specialized circuits to improve chip efficiency. We see this in the large amount of heterogeneous compute on modern chips. Going back to our Apple A15 chip analysis, it's surprising how little area is dedicated to the CPU or GPU, the two blocks that people talk about the most. Instead of focusing on these marketing aspects, Apple spends a lot of area on other features. Although it is not annotated, the lower right corner of the die is mainly an image signal processor. This huge block does the calculations associated with taking pictures and videos. There is another unlabeled block for media encoding and decoding. Around the SoC, you can find fairly small uniform rectangles, which are SRAM caches that hold more data on the chip without having to go out to memory.

These workloads cannot run on traditional CPUs. AI models are getting bigger and bigger, and Facebook's deep learning recommendation model has more than 12 trillion parameters. The ever-expanding size of these models is dedicated to keeping you on their apps longer and clicking on more ads. Google has developed its own chips, called TPUs, for the training and inference of artificial intelligence models. With the VCU, a new type of processor, they have expanded their in-house chip efforts; these chips could replace 10 million CPUs if those CPUs were dedicated to the same task.

Amazon has custom networking chips that also run their hypervisors and management stacks. They have their own chips dedicated to AI training, AI inference, storage control, and CPUs. When you look at the focus of Marvell's and Broadcom's ASIC services, you'll see that the diversification of hardware design and architecture will only increase.

Even Intel, the company that believed every workload should run on a CPU, recognizes that the only way forward is heterogeneous design. Rather than using generic CPU hardware for every task, the industry is taking common workloads and building chips specifically for them. This enables architects to achieve higher performance per unit of silicon.

Long story short, heterogeneous integration of ASICs alongside CPUs is paramount. However, more memory and more heterogeneous compute are not a panacea. While growing the chip with more memory and heterogeneous compute helps relieve pad limitations and improve energy efficiency, all of this costs money. Lots of money.

More chip area means more pins and more room for integration, but it's also a great way for costs to get out of control. And chip size has reached its limit. For example, take a look at Nvidia's or Intel's data center lineup. Both have been close to the "reticle limit" for more than 5 years. Even if they wanted to, they couldn't keep making bigger chips. Die shrinks have slowed sharply, compounding the problem.

So shrinks have slowed, chip sizes can't grow any larger, and designs are pad limited. Are these the only problems?

Unfortunately, they are not. Silicon unit economics have also hit a wall. The semiconductor industry and its downstream companies have single-handedly driven deflation across the entire economy, offsetting inflationary forces elsewhere. Without it, the United States and Europe would have experienced endless stagflation since the 1980s. However, this transformative deflationary force is running into obstacles. Semiconductor unit economics are no longer improving. In fact, they get even worse as transistors shrink to smaller sizes. Not only is making large chips expensive, it is also more expensive than in the previous generation.

A chart from AMD paints a very grim picture. While the transitions are not the same for each node, it's clear that at 7nm and 5nm the industry has reached an inflection point. The increase in cost per square millimeter manufactured is not small, but large. Although node transitions still bring similar density gains, or perhaps worse ones due to slower SRAM scaling, costs are now rising faster than those gains. The reversal of the trend in cost per transistor has shocked the industry. The reversal had such a huge impact that it even led uninformed bankers to use it as a reason to downgrade TSMC as overvalued.

Morgan Stanley believes that because Moore's Law is slowing down and transistor cost scaling has stopped, TSMC's pricing power will weaken. Morgan Stanley tries to support this with a ridiculous chart showing that a 5nm transistor costs less than a 7nm one, in stark contrast to what industry experts say. With the introduction of FinFET nodes, the cost per transistor stagnated: 7nm was completely flat, while 5nm was higher than ever. Our readers can do the math: an N7 wafer is about $9,500 and an N5 wafer is about $16,000. Apple's chip size barely dropped, but they paid for it.
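
Here is that math as a rough sketch. The wafer prices are the ones quoted above; the 300 mm wafer diameter and the decision to ignore edge loss, scribe lines, and yield are our simplifying assumptions.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumption: standard 300 mm wafers, edge loss ignored


def cost_per_mm2(wafer_price_usd: float) -> float:
    """Raw silicon cost per mm^2 of wafer, before yield or edge exclusion."""
    wafer_area_mm2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2
    return wafer_price_usd / wafer_area_mm2


n7 = cost_per_mm2(9_500)
n5 = cost_per_mm2(16_000)

# For cost per transistor to stay flat, real density must rise by this factor.
break_even_density_gain = n5 / n7

print(f"N7: ${n7:.3f}/mm^2, N5: ${n5:.3f}/mm^2")
print(f"density gain needed just to break even: x{break_even_density_gain:.2f}")
```

The ratio of the two wafer prices is about 1.68, so any design that achieves less than roughly a 1.68x real density improvement going from N7 to N5 ends up with more expensive transistors, not cheaper ones.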

As a result, the cost per transistor is still increasing, while the need for compute is growing faster than ever. We turned to heterogeneous architectures to fight back, but the chip design process is now much more difficult. The industry has to rely on many teams delivering different IP blocks and integrating them on time. EDA vendors like Synopsys and Cadence do a great job of assisting, but that's not enough. For anyone without a use case of more than 10 million units, an open ecosystem in which application-specific IP or chips can be purchased and integrated into hardware designs is necessary. Even for the companies that do have such volumes, a chiplet-style system architecture is the answer.

As we continue to shrink, yields are expected to slowly decline. This is a logical conclusion, as each successive node adds about 35% more process steps. When cutting-edge processes are measured in thousands of process steps, errors begin to pile up quickly. Industrial companies like to talk about "Six Sigma," but that's not enough for semiconductor manufacturing. Let's assume a process with 2000 process steps, with Six Sigma defects per cm² per step. Then D0 (the industry term for defect density per cm²) will end up being 0.678. The larger the chip, the greater the likelihood of defects.

Suppose this hypothetical process were used to build Intel's high-end server CPU, Ice Lake. The result would be 4 good dies and 76 defective dies per wafer. Now consider that this analysis is done at the cm² level, and there are billions of transistors per cm² at the cutting-edge process node. The semiconductor industry is much better than Six Sigma.
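
The 4 good versus 76 defective split can be reproduced with a standard yield model. The sketch below is our reconstruction, not necessarily the calculation used for the original figure: it assumes Murphy's yield model, a die area of about 6.28 cm² for the largest Ice Lake server die, and 80 die candidates (4 + 76) on the wafer. Only D0 = 0.678 comes directly from the text.

```python
import math


def murphy_yield(die_area_cm2: float, d0_per_cm2: float) -> float:
    """Murphy's yield model: Y = ((1 - exp(-A * D0)) / (A * D0)) ** 2."""
    ad = die_area_cm2 * d0_per_cm2
    if ad == 0:
        return 1.0  # no area or no defects means every die is good
    return ((1.0 - math.exp(-ad)) / ad) ** 2


D0_PER_CM2 = 0.678      # defect density from the text
DIE_AREA_CM2 = 6.28     # assumption: approximate area of a large server die
DIES_PER_WAFER = 80     # 4 good + 76 defective, as quoted in the text

die_yield = murphy_yield(DIE_AREA_CM2, D0_PER_CM2)
good_dies = DIES_PER_WAFER * die_yield

print(f"die yield: {die_yield:.1%}")
print(f"good dies: {good_dies:.1f} of {DIES_PER_WAFER}")
```

Under these assumptions the yield comes out a little above 5%, which rounds to 4 good dies out of 80. The same function shows how sharply yield recovers as the die gets smaller, which is the point of the next section.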

What is the solution, other than perfection in manufacturing?

Chiplets – small chips! Divide the large chips into many smaller chips.

AMD is the best-known example of this, but it's a trend across the industry. AMD can design just 3 chips: one CPU core chiplet and 2 IO dies. These 3 designs cover a large portion of the market. Meanwhile, Intel designed 2 Alder Lake desktop chips and 3 Ice Lake server chips to serve the same potential market. As a result, AMD can save on design costs, make CPUs with more cores than Intel, and save on yield costs.

To demonstrate the yield argument, consider a comparison table. AMD splits its CPU cores across 8 CPU core chiplets. If yields were 100%, Intel would be able to manufacture cores at a lower cost per CPU core than AMD. Instead, Intel has to spend more on each CPU core because larger chips have more defects. The comparison has some obvious caveats, the biggest assumptions being that defective chips have no value at all and that Intel and TSMC have the same D0. Neither assumption is true, and this exercise is for demonstration purposes only.
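
A toy version of that comparison can be sketched as follows. Every number here is hypothetical: the 6 cm² monolithic die, the 10% area overhead per chiplet for die-to-die interfaces, and the defect densities are ours, and we use the simplest Poisson yield model. As in the caveats above, defective dies are scrapped entirely and both designs see the same D0; IO dies and packaging costs are left out.

```python
import math


def poisson_yield(die_area_cm2: float, d0_per_cm2: float) -> float:
    """Simplest yield model: probability that a die has zero defects."""
    return math.exp(-die_area_cm2 * d0_per_cm2)


def silicon_per_good_product(die_area_cm2: float, dies_per_product: int, d0: float) -> float:
    """cm^2 of wafer consumed per working product, scrapping every defective die."""
    return dies_per_product * die_area_cm2 / poisson_yield(die_area_cm2, d0)


MONOLITHIC_AREA_CM2 = 6.0                           # hypothetical large server die
CHIPLET_COUNT = 8
CHIPLET_OVERHEAD = 1.10                             # hypothetical 10% area for die-to-die links
CHIPLET_AREA_CM2 = MONOLITHIC_AREA_CM2 / CHIPLET_COUNT * CHIPLET_OVERHEAD

for d0 in (0.0, 0.1, 0.3):
    mono = silicon_per_good_product(MONOLITHIC_AREA_CM2, 1, d0)
    chiplets = silicon_per_good_product(CHIPLET_AREA_CM2, CHIPLET_COUNT, d0)
    print(f"D0={d0:.1f}: monolithic {mono:5.1f} cm^2, chiplets {chiplets:5.1f} cm^2")
```

With perfect yield the monolithic die wins because it carries no interface overhead. As soon as D0 is meaningfully above zero, the eight small dies consume far less silicon per working product.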

Chiplets are great, but they are not a standalone solution. We still have many of the same problems. The cost per transistor is still rising, design costs are soaring, and chiplets are pad limited because they need more IO to interface with the other chips. Due to IO limitations, some chips cannot be split up, so die sizes are still hitting the limit.

What is the solution?

Advanced packaging!

A word of caution here: some tool vendors refer to all flip chip packages as "advanced packaging." SemiAnalysis and most people downstream in the industry wouldn't say that. Instead, we call all packages with bump pitches smaller than 100 microns "advanced."

The most common class of advanced packaging is called fan-out. Some people will argue that it's not even advanced packaging, but those people are dead wrong. In Apple's case, they have TSMC take an application processor die and place it, with denser bumps at pitches from 90 microns down to 60 microns, onto a reconstituted or carrier wafer/panel. The bump density is approximately 8 times higher than in traditional flip chip packages.

This reconstituted or carrier wafer/panel then spreads the IO out further, hence the name fan-out. The fan-out package is then connected to the motherboard. The silicon die itself can be designed with less concern about being pad limited, because the package fans out its smaller pads. The package can also integrate DRAM memory, NAND storage, and PMICs. Integrated fan-out not only helps density, it also keeps a lot of chip-to-chip IO inside the package. Otherwise, that IO would have to go through the motherboard at a much larger IO pitch.

Integrated fan-out is becoming more common for high-performance applications, not just mobile applications. The fastest-growing use case is in networking, where designs have been constrained for more than a decade. AMD will be very aggressive with fan-out in its server CPUs and GPUs. Tesla's Dojo 1 is another compelling example of integrated fan-out packaging, but at wafer scale. SemiAnalysis revealed that Tesla would use this type of packaging ahead of the announcement.

Next among the advanced packages are 2.5D and 3D packaging. 2.5D involves silicon dies packaged on top of another piece of silicon, where the lower silicon is dedicated to wiring and has no active transistors. This is typically done at a pitch of 55 microns to 50 microns, so the bump density is about 16 times higher. The most common and highest-volume use case is Nvidia's data center GPUs with TSMC CoWoS (chip on wafer on substrate). TSMC packages the active chips onto a wafer that contains only interconnects and microbumps. This stack of chips is then packaged onto a substrate using traditional methods.

Other examples include essentially every processor with HBM. HBM was created as a step-function increase in memory bandwidth over traditional forms of DRAM. It achieves this by using a much wider memory bus. Such wide buses would create IO count problems, but HBM was designed from scratch to live inside the same package as the processor. This sidesteps the IO issues while also allowing for tighter integration.

More examples of 2.5D include Intel EMIB-based products, Xilinx FPGAs, AMD's latest data center GPUs, and Amazon Graviton 3.

3D packaging is the packaging of one active chip on top of another active chip. This first shipped in Intel logic silicon at a 55 micron pitch, but the volume use cases will be at 36 microns and below. TSMC and AMD will introduce 3D stacked V-Cache at a 17 micron pitch. The technology moves from bumps to through-silicon vias (TSVs) and has more room to scale.

Other applications, such as Sony-made CMOS image sensors, are already at a pitch of 6.3 microns. To keep the comparison going, a 36 micron pitch gives a 31 times higher bump density, the 17 micron pitch implemented with copper TSVs gives a 138 times higher IO density, and Sony's 6.3 micron pitch CMOS image sensor has an IO density 567 times higher than standard flip chip.
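
As a sanity check on these multipliers, the sketch below back-solves the flip chip pitch each figure implies. It assumes, as our simplification, that IO density scales with the inverse square of the pitch on a uniform grid; the pitches and multipliers are the ones quoted above.

```python
import math


def implied_baseline_pitch_um(pitch_um: float, density_multiplier: float) -> float:
    """Baseline pitch for which `pitch_um` yields `density_multiplier` times the density.

    Derived from multiplier = (baseline / pitch) ** 2.
    """
    return pitch_um * math.sqrt(density_multiplier)


# Pitch in microns -> density multiplier quoted in the text.
quoted = {36.0: 31, 17.0: 138, 6.3: 567}

baselines = {
    pitch: implied_baseline_pitch_um(pitch, multiplier)
    for pitch, multiplier in quoted.items()
}

for pitch, baseline in baselines.items():
    print(f"{pitch:>4} um at x{quoted[pitch]}: implied baseline of about {baseline:.0f} um")
```

The 36 and 17 micron figures correspond to a baseline of roughly 200 microns, while the 6.3 micron figure corresponds to roughly 150 microns. Both sit inside the 150 to 200 micron range given earlier for traditional flip chip, but the multipliers are not all measured against the same reference point.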

This is just a basic explanation of the main package types, and we'll dive into each of them over the course of this series. There are many different bets being placed on the package types, tools, and tool vendors of the future. The equipment and IP aspects are much more exciting than one might think at first glance, but before we dive deeper, we needed to explain the basics.

The coming wave of innovation offers many investment ideas and perspectives. The slowdown in Moore's Law is driving fundamental change. We are in the midst of a renaissance in semiconductor design, driven by advanced packaging.

Reposted from Semiconductor Industry Observation
