Memory chip giants start the battle for HBM supremacy! Synopsys industry leaders explain the power consumption challenge

Author | Core stuff

Compiled by | Wang Aoxiang

Edited by | Cheng Qian

On April 11, EE Times, the American semiconductor industry publication, reported that amid the 2023 generative AI boom and the ramp to mass production of HBM3, the power consumption of the HBM memory used in AI and data-center computing is drawing more and more attention.

With the rapid development of AI, demand for AI server memory bandwidth continues to rise, while climbing data center electricity costs have pushed enterprises to treat bandwidth per watt as a key metric. When choosing memory, enterprises must balance cost against performance.

As a key technology for meeting AI's demand for high memory bandwidth, HBM has become the memory of choice for these workloads. HBM suppliers such as Micron and Samsung are exploring innovative ways to cut its power consumption and ensure it keeps playing a central role in future high-performance computing and AI applications.

EE Times interviewed Lou Ternullo, Senior Director of Silicon IP Product Marketing at Rambus, a leading U.S. semiconductor technology provider; Jim Handy, Principal Analyst at U.S. market research and consulting firm Objective Analysis; Graham Allan, Senior Product Manager at Synopsys, the world's largest semiconductor interface IP supplier; and Girish Cherussery, Senior Director of Product Management at Micron, about the power consumption challenges facing HBM and the technical countermeasures suppliers can take as AI continues to develop.

1. Power consumption continues to rise, and memory selection is limited by cost

Lou Ternullo said in an interview that AI's growing demand for memory bandwidth is directly tied to the increase in HBM bandwidth. "Across the market, we're seeing ever-larger datasets and trained models with ever more parameters, and the generative AI boom in 2023 has only accelerated that trend," he said.

He believes the exponential growth in demand for AI server performance, memory bandwidth, and memory capacity has raised both expectations and pressure for the next generation of HBM.

In addition, while the concept of bandwidth per watt is not new, and HBM is already optimized for it to improve server efficiency, the energy consumption of AI data centers keeps climbing. "The huge investments in and deployments of generative AI by companies in 2023 have led some to predict that data center electricity consumption will double by 2026," Ternullo said.

Ternullo added that the rapidly growing cost of electricity for data centers means that bandwidth per watt is becoming an even more important metric for businesses that need to monitor operating costs. This is even more important as society is increasingly focused on sustainability initiatives.
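As a rough illustration of the metric (the figures below are assumptions chosen for the example, not vendor specifications), bandwidth per watt is simply delivered bandwidth divided by the power drawn to deliver it:

```python
# Hypothetical bandwidth-per-watt comparison. Bandwidth and power figures
# are illustrative assumptions, not published specs.

def bandwidth_per_watt(bandwidth_gb_s: float, power_w: float) -> float:
    """Delivered bandwidth (GB/s) per watt of memory power."""
    return bandwidth_gb_s / power_w

# Assumed devices: a wide, slower HBM-class stack vs. a narrow, fast GDDR part.
print(f"HBM-class:  {bandwidth_per_watt(819.0, 6.0):6.1f} GB/s per W")
print(f"GDDR-class: {bandwidth_per_watt(64.0, 2.5):6.1f} GB/s per W")
```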

The high costs associated with integrating HBM, and the high price of the memory itself, make total cost of ownership, the sum of all costs across the enterprise data center, the deciding factor in whether such premium memory is warranted. When deciding what memory they need, customers first weigh factors such as memory density, performance, and power consumption.

2. There is no upper limit to AI performance requirements, and HBM is the best memory for AI servers

AI and machine learning are among the very few applications that can justify the more expensive HBM commercially. "Applications like AI have an insatiable thirst for memory bandwidth and can deliver higher ROI for enterprises, justifying HBM's higher costs," Ternullo said.

However, increased AI demand does not translate into HBM demand directly: AI primarily drives enterprises to deploy more GPUs, and it is the GPUs that typically require HBM to reach the performance expected of AI servers.

Jim Handy said companies need to have a clear reason to use HBM. For some graphics applications, companies like AMD use GDDR memory on some GPUs because GDDR is cheaper than HBM.

Handy explained that outside of AI, GPUs are mainly used for graphics processing, especially post-production effects in games and computer animation. "A lot of companies are using GPUs, and there are quite a few of them," he said. "They'll have a big data center full of GPUs." While GDDR was originally designed for graphics work, emerging applications over the years have created competing demands for GDDR in other use cases.

Similarly, Graham Allan observed that, with AI developing so quickly, the expensive HBM is now hard to come by. HBM still has a few applications outside AI, but most of its use is concentrated there.

Even though HBM's third generation has entered high-volume production, Allan does not consider the technology mature. "HBM is unique among DRAMs because it is the only DRAM that is not mounted on the motherboard next to the processor," he said. "HBM's 2.5D packaging requires additional process steps, which poses a challenge for the entire industry."

3. HBM must be integrated with the processor, and suppliers are racing to mass production

Allan noted that conventional DRAM is straightforward to implement. "If you want to design an SoC with a DDR5 interface, you can look at any of the open source reference designs, such as finding Intel-approved DDR5 DIMMs and getting all the part numbers," he said. "It's a proven technology."

But with HBM, everything, including the DRAM itself, is packaged together with the SoC. Companies can choose HBM from several vendors, including Micron, Samsung, and SK hynix, but must also work out how to design the interposer assembly and address issues such as signal path and signal integrity.

Synopsys provides customers with the IP needed to control HBM, including controllers, physical-layer interfaces (PHYs), and verification IP. "Customers come to us for HBM expertise and specific reference designs," Allan said. "We share reference designs and some of the most common interposer technologies. In addition, we assist with wafer testing, including the interposer and module connections. In this way, we can provide customers with fully customized test chips."

He believes wafer-level testing is especially important for HBM, because once a company has committed to a design and implemented HBM in a system, changes are time-consuming.

"HBM is maturing, but it is still far from being as mature as DDR and LPDDR technologies. Although HBM4 has a similar logical approach to HBM3, the transition from DDR4 to DDR5 is a huge leap. "Choosing HBM was a big commitment because it was more complex and it was a low-volume product." Customers want to make decisions as risk-free as possible. ”

Allan also said customers choose HBM because nothing else meets their requirements. One tier down from HBM, GDDR may be sufficient for some applications: GDDR7 doubles GDDR6's capacity and raises data transfer rates. Those high transfer rates, however, are carried over a relatively narrow data channel.

"You can achieve higher data transfer rates, but you have to be very careful about designing your system because your system runs very fast. He said.

However, GDDR7 is a 2026 technology, while HBM3, launched last year, already has roughly three times GDDR7's bandwidth potential. Allan believes bandwidth still has plenty of room to evolve.
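The wide-and-slow versus narrow-and-fast trade-off Allan describes can be made concrete with a back-of-envelope sketch; the per-pin rates below are representative figures chosen for illustration, not vendor-quoted specs:

```python
# Peak bandwidth = interface width (bits) x per-pin data rate (GT/s) / 8.
# Per-pin rates here are representative assumptions, not quoted specs.

def peak_bandwidth_gb_s(width_bits: int, rate_gt_s: float) -> float:
    """Peak bandwidth in GB/s for a given width and per-pin rate."""
    return width_bits * rate_gt_s / 8

hbm3_stack = peak_bandwidth_gb_s(1024, 6.4)  # very wide, modest clock
gddr7_chip = peak_bandwidth_gb_s(32, 32.0)   # narrow, very fast
print(f"HBM3 stack: ~{hbm3_stack:.0f} GB/s")  # ~819 GB/s
print(f"GDDR7 chip: ~{gddr7_chip:.0f} GB/s")  # ~128 GB/s
```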

He added that this does not mean the available bandwidth fully satisfies enterprises' AI needs; other factors determine how much the server as a whole can accomplish. The interposer, for example, can become a bottleneck, and if the server's PCB layout is poor and crosstalk excessive, overall performance may degrade.

JEDEC (the JEDEC Solid State Technology Association), the leading standards body for the microelectronics industry, is currently working on the HBM4 specification but has not said how far along the work is. In his keynote at SEMICON Korea 2024, SK hynix Vice President Kim Chun-hwan revealed that the company plans to start mass production of HBM4 by 2026.

Micron recently started mass production of its HBM3E memory, and its HBM capacity for this year is essentially sold out. The company's first HBM3E features an 8-high stack with 24GB of capacity, a 1024-bit interface, a 9.2 GT/s data transfer rate, and more than 1.2 TB/s of total bandwidth.
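Those figures are self-consistent, as a quick check shows:

```python
# 1024 bits x 9.2 GT/s / 8 bits-per-byte = 1177.6 GB/s, i.e. ~1.2 TB/s.
width_bits, rate_gt_s = 1024, 9.2
print(f"{width_bits * rate_gt_s / 8:.1f} GB/s")  # 1177.6
```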


▲ Micron HBM3E specifications (Source: Micron Technology)

4. Data centers pay more attention to power consumption; Micron and Samsung take different routes to reducing memory power

Girish Cherussery said that when HBM first entered the market, Micron reviewed the workloads it would serve and decided to target performance 30 percent above industry demand. "We are future-proofing," he said. "A key metric is performance per watt, which is a key power boundary condition. We are focused on ensuring a significant increase in performance per watt." Customers also wanted HBM to sit close to the compute unit.

Cherussery explained that many AI workloads, including large language models, are becoming memory-bound rather than compute-bound: once a server has enough computing power, memory bandwidth and capacity become the limiting factors. AI workloads are putting enormous pressure on data centers.
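A roofline-style estimate makes the memory-bound point concrete; the accelerator and model numbers below are illustrative assumptions, not figures from the interview:

```python
# A workload is memory-bound when its arithmetic intensity (FLOPs per
# byte moved) falls below the machine's FLOPs-per-byte balance.
peak_flops = 1.0e15      # assumed accelerator peak: 1 PFLOP/s
mem_bw = 1.2e12          # assumed HBM bandwidth: 1.2 TB/s
balance = peak_flops / mem_bw            # ~833 FLOPs per byte

# LLM token generation roughly reads every weight once per token.
params = 70e9                            # assumed 70B-parameter model
flops = 2 * params                       # one multiply-add per weight
bytes_moved = 2 * params                 # fp16 weights
intensity = flops / bytes_moved          # 1 FLOP per byte

kind = "memory-bound" if intensity < balance else "compute-bound"
print(f"balance {balance:.0f} FLOPs/B vs. intensity {intensity:.0f} -> {kind}")
```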

In addition, because memory utilization is high, memory is one of the larger consumers of power in a data center, so even a 5-watt saving is meaningful. More and more data centers weigh wattage more heavily than server count. Cooling is another important consideration with HBM: as a stacked memory, the heat it generates during operation must be dissipated.
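A back-of-envelope estimate suggests why a few watts per device matters at fleet scale; the fleet size and electricity price here are assumptions made up for the example:

```python
# Annual savings from shaving 5 W per memory device across a fleet.
watts_saved = 5          # per device, as in the example above
devices = 100_000        # assumed devices across a data center fleet
hours = 24 * 365         # running continuously
price_kwh = 0.10         # assumed USD per kWh

kwh = watts_saved * devices * hours / 1000
print(f"{kwh:,.0f} kWh/year saved, ~${kwh * price_kwh:,.0f}/year")
# ~4,380,000 kWh/year, roughly $438,000/year, before cooling savings.
```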

Beyond bandwidth, power consumption, and overall heat dissipation, ease of integration is the most critical feature of any HBM. Micron holds patents that make it easier to integrate its HBM into host systems, Cherussery said.

"The industry is ready for HBM3E, which can be easily integrated into systems using HBM. "Our products can be seamlessly integrated into the same socket without any changes." It occupies the same footprint as the previous generation. ”

Higher bandwidth and greater capacity will be the hallmarks of HBM4. With the growth of large AI models, enterprises' requirements for HBM capacity and bandwidth are also increasing linearly.
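That linear scaling is easy to make concrete: the memory needed just to hold model weights grows in direct proportion to parameter count. A minimal sketch, assuming 16-bit weights and the 24GB stack capacity quoted above:

```python
import math

BYTES_PER_PARAM = 2    # assumed fp16/bf16 weights
STACK_GB = 24          # one 8-high HBM3E stack, per the Micron spec above

for params_b in (7, 70, 400):                # billions of parameters
    weights_gb = params_b * BYTES_PER_PARAM  # GB for the weights alone
    stacks = math.ceil(weights_gb / STACK_GB)
    print(f"{params_b:>4}B params -> {weights_gb:>4} GB -> {stacks} stacks")
```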

"The memory industry as a whole is in an interesting phase because there has never been a workload like generative AI and normal AI that is linear with the growth of memory bandwidth and memory capacity. This means that for compute and memory, businesses will have to start thinking about systems that are slightly different from what they used to be. Data centers themselves are becoming increasingly heterogeneous. He said.

Samsung is also witnessing significant growth in heterogeneous computing and more AI-focused services in data centers. "This growth seems to coincide with the rise of hyperscalers offering both direct and indirect AI solutions," said Indong Kim, Vice President of Product Planning and Business Support at Samsung.

He believes data centers are evolving to maximize the potential of their computing resources for specific workloads, including AI, and that the key to unlocking that potential is DRAM bandwidth and capacity. What is particularly exciting is that in heterogeneous architectures, two different kinds of processors, CPUs and dedicated accelerators, share the same goal of better memory. He believes this trend will offer significant growth opportunities for DRAM manufacturers.

At MemCon 2024, Samsung showcased what it calls the world's first 12-stack HBM3E DRAM. It uses Samsung's advanced thermal compression non-conductive film (TC NCF) technology to improve vertical density by more than 20% over its predecessor while also improving yield. As massively parallel computing becomes more prevalent in high-performance computing (HPC) environments, Kim says, demand for HBM will explode.

Samsung's HBM3E DRAM is designed for high-performance computing and demanding AI applications. The company also launched its CXL Memory Module-Box (CMM-B), based on the Compute Express Link (CXL) open interconnect protocol, to support applications that need large amounts of memory, such as AI, in-memory databases, and data analytics. CMM-B also supports memory pooling, a key element of heterogeneous computing.


▲ Samsung's CXL Memory Module-Box (CMM-B) (Source: Samsung Electronics)

Kim said AI's growing demand for memory capacity and bandwidth, together with ever-larger model parameter counts, has accelerated memory chipmakers' development of different memory technologies. The CXL protocol dovetails with HBM, providing features that address AI's growing demands and building on the existing DRAM-SSD storage hierarchy.

"We believe CXL will be the perfect complement to the growing capacity demand, providing the best features to bridge the existing DRAM-SSD hierarchy," he said. ”

Conclusion: HBM has broad development prospects and helps enterprises reduce costs

As the demand for memory bandwidth for AI continues to grow, HBM has attracted more and more attention as a high-performance memory technology. Despite the challenges of high cost and complex integration, HBM is becoming increasingly important in AI data centers and other use cases. HBM suppliers are also adopting different technologies to reduce HBM power consumption to help save data center power costs.

Against this backdrop, HBM is gradually maturing, though it still trails established technologies such as DDR and LPDDR. With the rollout of HBM3E and the development of HBM4, HBM is expected to keep playing an important role in high-performance computing and AI applications.
