
A few GPUs running for a few days ≈ a team of 10 working for more than half a year: NVIDIA uses AI to design chips efficiently

Selected from HPCwire

By John Russell

Compiled by the Machine Heart editorial department

"It's also about porting a new technology library, if we use human labor, we need a team of about 10 people to work for more than half a year, but with AI, we only need to spend a few days running a few GPUs to do most of the work." 」

In recent years, chip design has become an important application area for AI, and companies such as Google, NVIDIA, Samsung, and Siemens have planned or begun to use AI in their chip design flows. Among them, NVIDIA, which has worked in both chip design and AI for many years, has a unique advantage. At a recent GTC conference, Bill Dally, NVIDIA's chief scientist and a computer architect, described the company's progress in this area and the AI tools it uses.

The following is based on Bill Dally's remarks at GTC.

Predicting voltage drop

As AI experts, we naturally want to use AI to design better chips. We take a few different approaches. One is to use AI inside existing computer-aided design tools. For example, we have a tool that maps where power is used in a GPU and predicts how much the power-delivery grid voltage will droop, which is the current multiplied by the resistance, known as IR drop. Running this analysis on a traditional CAD tool takes three hours.

Because design is an iterative process, a three-hour run is inconvenient. So we trained an AI model on the same kind of data across a series of designs; now we can feed in a power map and inference takes only three seconds. Counting the feature-extraction time, the whole run takes about 18 minutes, and we get results very quickly.
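To make the idea concrete, here is a minimal sketch (not NVIDIA's actual model) that treats IR-drop prediction as image-to-image regression: a small convolutional network maps a rasterized per-tile power map to a per-tile voltage-drop map. The grid size, layer widths, and synthetic training data are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical setup: the chip is rasterized into a 64x64 grid of tiles.
# Input channel: per-tile power draw. Output channel: predicted IR drop per tile.
class IRDropCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
        )

    def forward(self, power_map):           # (batch, 1, H, W) power map
        return self.net(power_map)          # (batch, 1, H, W) predicted IR drop

model = IRDropCNN()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Synthetic stand-in for (power map, measured IR drop) pairs from past designs.
power = torch.rand(8, 1, 64, 64)
ir_drop = torch.rand(8, 1, 64, 64)

for step in range(100):                     # tiny training loop, for illustration only
    opt.zero_grad()
    loss = loss_fn(model(power), ir_drop)
    loss.backward()
    opt.step()

# Once trained, inference on a new power map is a single fast forward pass.
with torch.no_grad():
    prediction = model(torch.rand(1, 1, 64, 64))
```

Trained on pairs of power maps and solver-computed IR-drop maps from past designs, a model of this shape replaces an hours-long analysis run with one forward pass.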

In another case, rather than a convolutional neural network, we use a graph neural network to estimate the switching activity of the different nodes in the circuit. Again we get very accurate power estimates, far faster than with traditional tools.
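Along the same lines, here is a minimal sketch of a graph neural network that predicts per-node switching activity by message passing over the circuit graph, and then combines it with the standard dynamic-power formula P = alpha * C * V^2 * f. The node features, adjacency matrix, and constants are invented for illustration and do not describe NVIDIA's tool.

```python
import torch
import torch.nn as nn

class SimpleGNN(nn.Module):
    """Two rounds of mean-aggregation message passing, then a per-node head."""
    def __init__(self, in_dim, hidden=64):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, adj_norm):
        # adj_norm: row-normalized adjacency (N x N), x: node features (N x in_dim)
        h = torch.relu(self.lin1(adj_norm @ x))
        h = torch.relu(self.lin2(adj_norm @ h))
        return torch.sigmoid(self.head(h)).squeeze(-1)   # switching activity in [0, 1]

# Toy circuit graph: 5 nodes with hypothetical 4-dim features (gate type, fan-in, ...).
num_nodes, feat_dim = 5, 4
x = torch.rand(num_nodes, feat_dim)
adj = torch.tensor([[0, 1, 0, 0, 1],
                    [1, 0, 1, 0, 0],
                    [0, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [1, 0, 0, 1, 0]], dtype=torch.float)
adj = adj + torch.eye(num_nodes)                  # add self-loops
adj_norm = adj / adj.sum(dim=1, keepdim=True)     # row-normalize

model = SimpleGNN(feat_dim)                       # untrained, just to show the data flow
activity = model(x, adj_norm)                     # predicted switching activity per node

# Dynamic power estimate: P = sum(alpha_i * C_i * V^2 * f); constants are illustrative.
C = torch.full((num_nodes,), 1e-15)               # node capacitances (farads)
V, f = 0.75, 1.5e9                                # supply voltage (V), clock (Hz)
dynamic_power = (activity * C * V**2 * f).sum()
```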


Predicting parasitics

One of my favorite projects is predicting parasitics with graph neural networks. This work used to take a lot of time, because circuit design was an iterative process: you draw a schematic, like the one on the left, but you don't know how it will perform until a layout designer takes that schematic, does the layout, extracts the parasitics, and runs circuit simulation. Only then do you learn the circuit's performance and discover that the design may not meet the specification.


The designer then modifies the schematic and goes through layout again to verify the circuit. It is a long, repetitive, labor-intensive process.

Now we can train a graph neural network to predict the parasitics without doing the layout at all, so circuit designers can iterate very quickly without a manual layout step. And it turns out the network's parasitic predictions are very accurate.
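As a rough illustration, the sketch below treats the schematic as a graph of device and net nodes and uses a small message-passing network to predict a parasitic capacitance for every net node, before any layout exists. The graph construction, node features, and units are hypothetical.

```python
import torch
import torch.nn as nn

class ParasiticGNN(nn.Module):
    """Message passing over a device/net graph; predicts capacitance per net node."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.embed = nn.Linear(in_dim, hidden)
        self.msg = nn.Linear(hidden, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x, adj_norm, net_mask):
        h = torch.relu(self.embed(x))
        for _ in range(3):                                   # 3 rounds of message passing
            h = torch.relu(self.msg(adj_norm @ h)) + h       # residual update
        cap = nn.functional.softplus(self.head(h)).squeeze(-1)  # capacitance > 0
        return cap[net_mask]                                 # only net nodes carry parasitics

# Toy schematic: 3 devices plus 2 nets in one graph; features might encode device type,
# transistor width, net fan-out, and so on (all hypothetical).
x = torch.rand(5, 6)
adj = torch.tensor([[1, 0, 0, 1, 0],
                    [0, 1, 0, 1, 1],
                    [0, 0, 1, 0, 1],
                    [1, 1, 0, 1, 0],
                    [0, 1, 1, 0, 1]], dtype=torch.float)
adj_norm = adj / adj.sum(dim=1, keepdim=True)
net_mask = torch.tensor([False, False, False, True, True])   # nodes 3 and 4 are nets

model = ParasiticGNN(in_dim=6)
predicted_caps = model(x, adj_norm, net_mask)   # one predicted capacitance per net
# A designer could feed these estimates into a quick simulation pass and iterate on the
# schematic without waiting for a manual layout.
```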

Layout and routing challenges

Our neural networks can also predict routing congestion, which is critical for chip layout. In the traditional flow, we take a netlist and run the place-and-route process, which is time-consuming and usually takes several days. Only then do we see the actual wiring congestion and discover the flaws in the initial placement; we then have to refactor it and place the macros differently to avoid the red areas shown in the image below (areas where too many wires are trying to pass through, similar to a traffic jam).


Now, without running place and route, we can take the netlist and use a graph neural network to predict roughly where congestion will occur, and the accuracy is quite high. The approach is not perfect yet, but it shows where the problem areas are, so we can act on them and iterate very quickly without doing a full place and route.
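The talk describes a graph neural network for this step. As a simpler, plainly classical stand-in that produces the same kind of output (a congestion heat map from a placed netlist, with no routing run), here is a RUDY-style estimate in which each net spreads its expected wire demand uniformly over its bounding box. The grid size, random netlist, and hotspot threshold are all assumptions.

```python
import numpy as np

# RUDY-style congestion estimate (a classical heuristic, not the GNN from the talk):
# each net spreads its estimated wire demand uniformly over its bounding box.
GRID = 32                                     # chip area divided into GRID x GRID bins
rng = np.random.default_rng(0)

# Toy placed netlist: each net is a set of (x, y) pin locations in [0, 1) chip coordinates.
nets = [rng.random((rng.integers(2, 6), 2)) for _ in range(200)]

demand = np.zeros((GRID, GRID))
for pins in nets:
    (x0, y0), (x1, y1) = pins.min(axis=0), pins.max(axis=0)
    w, h = max(x1 - x0, 1e-6), max(y1 - y0, 1e-6)
    density = (w + h) / (w * h)               # half-perimeter wirelength per unit area
    ix0, ix1 = int(x0 * GRID), min(int(x1 * GRID) + 1, GRID)
    iy0, iy1 = int(y0 * GRID), min(int(y1 * GRID) + 1, GRID)
    demand[iy0:iy1, ix0:ix1] += density / GRID**2   # spread demand over covered bins

# Flag the hottest bins as the "red" areas; a real flow would compare demand against
# the actual routing capacity of each bin rather than a simple percentile.
red_areas = demand > np.quantile(demand, 0.9)
print(f"{red_areas.sum()} of {GRID * GRID} bins flagged as potential congestion hot spots")
```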

Automating standard cell migration

All of the methods above use AI to evaluate designs that humans have already produced. What is more exciting is using AI to actually design the chip.

Let me give you two examples. The first is a system we call NVCell, which uses simulated annealing and reinforcement learning to design our standard cell library (a standard cell library is a collection of low-level logic functions such as AND, OR, INVERT, flip-flops, latches, and buffers). At every technology transition, such as migrating from 7 nm to 5 nm, we need a new cell library. There are thousands of these cells, and they all have to be redesigned in the new technology under a very complex set of design rules.

We place the transistors with reinforcement learning, but that leaves a bunch of design-rule errors, and fixing them is exactly the kind of thing reinforcement learning is good at. It is a lot like an Atari game, except the game is fixing design-rule errors in a standard cell. By finding and fixing these errors with reinforcement learning, we can essentially complete the design of the standard cells.
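To make the "Atari game" framing concrete, here is a toy sketch (not NVCell): a one-dimensional "cell" with a few devices, where a design-rule violation means two devices sit closer than a minimum spacing, the agent's actions nudge one device left or right, the reward is minus the number of remaining violations, and tabular Q-learning learns to clean the cell up. Every detail of this environment is invented for illustration.

```python
import random
from collections import defaultdict

# Toy "standard cell": 4 devices on a 1-D row of 12 sites; a design-rule violation is
# any pair of devices closer than MIN_SPACING. (All invented for illustration.)
ROW, N_DEV, MIN_SPACING = 12, 4, 2

def violations(state):
    xs = sorted(state)
    return sum(1 for a, b in zip(xs, xs[1:]) if b - a < MIN_SPACING)

def step(state, action):
    dev, direction = divmod(action, 2)        # action encodes (device index, left/right)
    delta = -1 if direction == 0 else 1
    xs = list(state)
    xs[dev] = min(max(xs[dev] + delta, 0), ROW - 1)
    nxt = tuple(xs)
    v = violations(nxt)
    return nxt, -v, v == 0                    # reward = minus remaining violations

ACTIONS = N_DEV * 2
Q = defaultdict(lambda: [0.0] * ACTIONS)
alpha, gamma, eps = 0.5, 0.9, 0.2

for episode in range(2000):
    state = tuple(random.randrange(ROW) for _ in range(N_DEV))
    for _ in range(30):
        if random.random() < eps:
            a = random.randrange(ACTIONS)                       # explore
        else:
            a = max(range(ACTIONS), key=lambda i: Q[state][i])  # exploit
        nxt, reward, done = step(state, a)
        target = reward + (0.0 if done else gamma * max(Q[nxt]))
        Q[state][a] += alpha * (target - Q[state][a])
        state = nxt
        if done:
            break

# After training, a greedy rollout from a messy placement should quickly reach a
# violation-free arrangement of the devices.
```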

As the figure below shows, the tool completed 92% of the cell library with no design-rule or electrical-rule errors, and 12% of those cells are smaller than the human-designed versions. Overall, in terms of cell complexity, the tool does as well as human designers, and in some cases better.

This gives us two big benefits. The first is a huge saving in labor: porting the same technology library by hand would take a team of about 10 people more than half a year, but with AI we only need to run a few GPUs for a few days to do most of the work (the 92% that can be automated), and people then handle the remaining 8%. The second is that in many cases we get better designs, so the method not only saves manpower but often produces better results than manual work.

