
Jensen Huang drops a "nuclear bomb" for TSMC! Replacing 40,000 CPU servers and accelerating computational lithography 40x

Source | Xindongxi

Authors | ZeR0, Cheng Qian

Editor | Mo Ying

Xindongxi reported on March 22 that GTC, NVIDIA's annual conference and the premier event in global AI computing, has arrived as scheduled!

NVIDIA CEO Jensen Huang (nicknamed "Lao Huang"), in his trademark leather jacket, spent 78 minutes cheerfully walking through everything NVIDIA has been working on.


Overall, the keynote boils down to one "highlight" and one "focus".

The "highlight" is that NVIDIA, after four years of quiet development, dropped a technological "nuclear bomb" on the chip manufacturing industry: cuLitho, a breakthrough computational lithography library that accelerates computational lithography by more than 40x, helping make 2nm and more advanced chips possible. TSMC, the world's largest foundry; ASML, the dominant maker of lithography machines; and Synopsys, the largest EDA vendor, have all joined the collaboration and are adopting the technology.

Lao Huang even taught a mini-lesson on lithography, using animations to explain how the lithography machine, the most critical piece of chip-making equipment, actually works.


The "focus" is undoubtedly generative AI.

Lao Huang lavished praise on OpenAI's ChatGPT, calling it "shocking to the world" and a sign that "a new computing platform has been born": the "iPhone moment" of AI has arrived.

According to NVIDIA, OpenAI will use the NVIDIA H100 GPU on the Microsoft Azure supercomputer, and the AI startup Stability.ai is an early access customer of the H100 GPU.

To accelerate the development and deployment of generative AI, Lao Huang announced three new inference GPUs, specialized respectively for AI video, image generation, and inference acceleration for large language models such as ChatGPT.

In addition, NVIDIA released DGX Cloud, an AI supercomputing service, and NVIDIA AI Foundations, cloud services that help enterprises build large models and generative AI faster, and announced that it has built Japan's first generative AI supercomputer for accelerating pharmaceutical research together with Japan's Mitsui.


NVIDIA also announced a series of updates for the metaverse, automotive, and quantum computing fields: the PaaS offering NVIDIA Omniverse Cloud is now open to select enterprises; cooperation with BMW Group on building virtual factories has been expanded; more BYD models will use the NVIDIA DRIVE Orin platform; and, together with Quantum Machines, NVIDIA launched the world's first GPU-accelerated quantum computing system.

Lao Huang announced that NVIDIA has updated 100 accelerator libraries, and NVIDIA's global ecosystem now covers 4 million developers, 40,000 companies and 14,000 startups.

First, a bombshell for chip manufacturing! Computational lithography accelerated 40x, with three semiconductor giants lending their endorsement

Let's start with today's "surprise bomb": NVIDIA released a breakthrough for advanced chip manufacturing, the NVIDIA cuLitho computational lithography library.

Lithography is the most complex, expensive, and critical step in chip manufacturing, accounting for roughly 1/3 or more of total wafer processing cost. Computational lithography simulates the behavior of light as it passes through the optics and interacts with the photoresist, applying inverse-physics algorithms to predict the patterns needed on the photomask in order to produce the final pattern on the wafer.


In short, computational lithography is a key means to improve lithographic resolution and drive chip manufacturing to 2nm and beyond.

"Computational lithography is the largest computing workload in chip design and manufacturing, consuming tens of billions of CPU hours per year," Huang said. "Large data centers run 24×7 to create the photomasks used in lithography systems. These data centers are part of chipmakers' nearly $200 billion in annual capital expenditure."

cuLitho can speed up computational lithography by up to 40x. Lao Huang said the NVIDIA H100 GPU requires 89 photomasks; processing a single mask takes two weeks on CPUs, but only 8 hours with cuLitho running on GPUs.

In addition, by running cuLitho on 500 DGX H100 systems, TSMC could replace the 40,000 CPU servers currently used for computational lithography and cut power consumption from 35MW to 5MW. Fabs using cuLitho could produce 3-5x more photomasks per day while using only 1/9 of the power of current configurations.
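As a sanity check, the per-mask times quoted in the keynote do imply the claimed speedup. A rough back-of-envelope calculation (the 2-week, 8-hour, and 89-mask figures are NVIDIA's; the arithmetic is ours):

```python
# Rough sanity check of the cuLitho figures quoted above.
# Inputs are the keynote's numbers; the arithmetic is illustrative only.

cpu_hours_per_mask = 14 * 24   # two weeks per photomask on CPUs
gpu_hours_per_mask = 8         # with cuLitho on GPUs

speedup = cpu_hours_per_mask / gpu_hours_per_mask
print(f"per-mask speedup: {speedup:.0f}x")   # 42x, consistent with "40x+"

masks = 89                     # photomasks for an H100-class chip
gpu_days = masks * gpu_hours_per_mask / 24
cpu_weeks = masks * cpu_hours_per_mask / (24 * 7)
print(f"full mask set: ~{gpu_days:.0f} GPU-days vs ~{cpu_weeks:.0f} CPU-weeks")
```

In practice masks are processed in parallel across many machines, so these are aggregate compute figures, not wall-clock times.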

TSMC, the world's largest foundry; ASML, the world's largest lithography machine maker; and Synopsys, the world's largest EDA company, are all publicly backing the new technology. Lao Huang revealed that cuLitho took four years of R&D in close collaboration with these three semiconductor giants. TSMC will begin qualifying cuLitho for production in June.


TSMC CEO C.C. Wei praised cuLitho for opening new possibilities for broadly deploying lithography solutions in TSMC's chip manufacturing and for its important contribution to continued semiconductor scaling. ASML CEO Peter Wennink said ASML plans to integrate GPU support into all of its computational lithography software products.

According to Aart de Geus, chairman and CEO of Synopsys, running Synopsys' optical proximity correction (OPC) software on NVIDIA's cuLitho platform cuts runtimes from weeks to days.

cuLitho will help fabs shorten prototype cycle times, increase yields, and reduce carbon emissions, laying the groundwork for 2nm and beyond and enabling the new solutions required at future technology nodes, such as curvilinear masks, high-NA EUV, and sub-atomic photoresist modeling.

Second, dedicated inference GPUs for ChatGPT, and an AI supercomputer you reach by opening a browser

Focusing on generative AI, NVIDIA has released a series of new software and hardware products and services to accelerate model training and inference.

Lao Huang first recounted how NVIDIA entered the AI field at the dawn of the generative AI revolution.

"NVIDIA accelerated computing started with DGX (our AI supercomputer), which is the engine behind the breakthroughs in large language models," Huang said. "I personally delivered the world's first DGX to OpenAI [in 2016]. Since then, half of the Fortune 100 have installed DGX AI supercomputers. DGX has become an essential tool in the field of AI."


"Generative AI will reshape almost every industry," he said. ChatGPT, Stable Diffusion, DALL-E, and Midjourney have awakened the world to generative AI.

In his view, generative AI is a new kind of computer: one that can be programmed in human language. Like the PC, the internet, mobile devices, and the cloud before it, its impact is far-reaching, because now anyone can command a computer to solve a problem; everyone can be a programmer.

1. Training: generative AI's star companies are already on board, and the AI supercomputer is in full production

On the training side, the NVIDIA H100 GPU is built on the Hopper architecture with a built-in Transformer Engine and is optimized for developing, training, and deploying generative AI, large language models, and recommender systems. Using FP8 precision, it delivers 9x faster AI training and up to 30x faster AI inference on large language models than the previous-generation A100.

The DGX H100 packs 8 H100 GPU modules, delivering 32 petaFLOPS of compute at FP8 precision, and ships with the complete NVIDIA AI software stack to simplify AI development. Jensen Huang announced that the NVIDIA DGX H100 AI supercomputer is in full production and will soon reach enterprises worldwide. Microsoft announced that Azure will open a private preview of its H100 AI supercomputer.
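The 32-petaFLOPS figure follows directly from per-GPU throughput. A quick check, assuming roughly 4 petaFLOPS of FP8 compute per H100 with sparsity (our assumption, drawn from NVIDIA's public spec sheet rather than the keynote):

```python
# Quick check of the DGX H100's quoted 32 petaFLOPS at FP8.
# Assumes ~4 petaFLOPS FP8 per H100 with sparsity (NVIDIA spec sheet).

fp8_pflops_per_h100 = 4   # approximate per-GPU FP8 throughput
gpus_per_dgx = 8          # H100 modules in one DGX H100

total_pflops = gpus_per_dgx * fp8_pflops_per_h100
print(f"DGX H100 FP8 throughput: ~{total_pflops} petaFLOPS")
```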


Huang said that cloud giants are now offering NVIDIA H100 GPUs, and several star companies in the generative AI field are using H100 to accelerate work.

For example, OpenAI trained and runs the AI chatbot ChatGPT on the H100's predecessor, the A100, and will use H100s on Microsoft's Azure supercomputer. AI star startup Stability.ai is an H100 early-access customer on AWS.

Meta, the social media giant that recently released an open-source large model, has developed Grand Teton, an AI supercomputer system based on the Hopper architecture. Compared with its predecessor, Zion, the system's compute is greatly increased, and it can support both training and inference for recommendation models and content understanding.

NVIDIA and its key partners announced products and services featuring the NVIDIA H100 Tensor Core GPU to meet generative AI training and inference demand.

AWS announced that its upcoming EC2 UltraClusters (EC2 P5 instances) will scale to 20,000 interconnected H100s. Oracle Cloud Infrastructure (OCI) announced the limited availability of new OCI Compute bare-metal GPU instances featuring the H100.

Twelve Labs, a platform that provides multimodal video understanding for enterprises and developers, plans to use H100 instances on an OCI Supercluster to make video search instant, intelligent, and easy.

2. Inference: Release 3 GPUs and 3 types of cloud services

On the inference side, NVIDIA launched a new GPU inference platform: four configurations (L4 Tensor Core GPU, L40 GPU, H100 NVL GPU, and the Grace Hopper superchip) sharing one architecture and one software stack, targeting AI video, image generation, large language model deployment, and recommender systems respectively.


(1) L4: a general-purpose GPU designed for AI video, delivering 120x the AI video performance of CPUs with 99% better energy efficiency. It is optimized for video decoding and transcoding, video content moderation, and video-call features such as background replacement, relighting, eye contact, transcription, and real-time translation. One 8-GPU L4 server can replace more than 100 dual-socket CPU servers for AI video processing.

(2) L40: for image generation, optimized for graphics and AI-enabled 2D, video, and 3D image generation, with 10x the inference performance of NVIDIA's most popular cloud inference GPU, the T4.


(3) H100 NVL: for large-scale deployment of large language models such as ChatGPT. Equipped with dual-GPU NVLink, it pairs two PCIe H100 GPUs, each with 94GB of HBM3 memory, so it can handle the 175-billion-parameter GPT-3 while the PCIe form factor allows easy scale-out in commodity servers.

Lao Huang said the only GPU in the cloud that can practically handle ChatGPT today is the HGX A100. A standard server with four pairs of H100s and dual-GPU NVLink is 10x faster than an HGX A100 at GPT-3 processing, cutting the processing cost of large language models by an order of magnitude.
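A rough memory estimate shows why the dual-GPU H100 NVL can hold GPT-3. The sketch below assumes weights stored at one byte per parameter (FP8) and ignores activation and KV-cache overhead; both are our simplifying assumptions, not NVIDIA's stated method:

```python
# Why 2 x 94GB of HBM3 can hold a 175B-parameter model (rough estimate).
# Assumes 1 byte per weight (FP8); ignores activations and KV cache.

params = 175e9                 # GPT-3 parameter count
bytes_per_param = 1            # FP8 storage
weights_gb = params * bytes_per_param / 1e9

nvl_memory_gb = 2 * 94         # dual-GPU H100 NVL
print(f"weights: {weights_gb:.0f} GB vs available: {nvl_memory_gb} GB")
assert weights_gb < nvl_memory_gb   # 175 GB fits in 188 GB
```

At FP16 (2 bytes per weight) the same model would need ~350GB and no longer fit on a single NVL pair, which is one reason reduced-precision inference matters here.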


(4) Grace Hopper superchip: for recommender systems and AI databases/large language models; ideal for graph recommendation models, vector databases, and graph neural networks. It connects the NVIDIA Grace CPU and Hopper GPU over a 900GB/s coherent chip-to-chip interface.


Google Cloud is the first cloud service provider to offer NVIDIA L4 inference GPUs to customers. Google is also integrating the L4 into its Vertex AI model store.

3. Cloud services: an AI supercomputer you reach from a browser

NVIDIA launched DGX Cloud, an AI supercomputing service offered in partnership with Microsoft Azure, Google Cloud, and Oracle OCI. Accessed through a web browser, it gives enterprises the infrastructure to train advanced models for generative AI and other pioneering applications.

DGX Cloud instances start at $36,999 per instance per month. Each instance has eight NVIDIA H100 or A100 80GB Tensor Core GPUs, for a total of 640GB of GPU memory per node. DGX Cloud offers a dedicated NVIDIA DGX AI supercomputing cluster and is equipped with NVIDIA AI software.

NVIDIA also introduced NVIDIA AI Foundations, a set of cloud services, a model "foundry" that lets enterprises build, refine, and operate custom large language models and generative AI models, trained on their own proprietary data for domain-specific tasks:


(1) NeMo: a text-generation model-building service offering models from 8 billion to 530 billion parameters, regularly updated with additional training data; it helps enterprises customize models for generative AI applications such as customer service, enterprise search, chatbots, and market intelligence.

(2) Picasso: a visual-language model-building service with advanced text-to-video and text-to-3D capabilities, for quickly creating and customizing visual content from natural-language prompts in applications such as product design, digital twins, and character creation.

(3) BioNeMo: a life-sciences service providing AI model training and inference to accelerate the most time-consuming and costly stages of drug discovery, speeding up the creation of new proteins and therapeutics as well as research in genomics, chemistry, biology, and molecular dynamics.

These cloud services run on NVIDIA DGX Cloud and can be accessed directly in a browser or through an API. The NeMo and BioNeMo services are open for early access; Picasso is in private preview.

NVIDIA also announced a series of generative AI partnerships: with Adobe, to develop a new generation of advanced generative AI models; with Getty Images, to train responsible text-to-image and text-to-video foundation models; and with Shutterstock, to train a model that creates generative 3D assets from simple text prompts, cutting creation time from hours to minutes.

In addition, NVIDIA and Mitsui jointly announced Tokyo-1, Japan's first generative AI supercomputer, which will be used to accelerate drug discovery. Using NVIDIA BioNeMo software on Tokyo-1, researchers can run advanced AI models with billions of parameters for protein structure prediction, small-molecule generation, pose estimation, and more.


Third, BlueField-3 DPUs enter production, and the world's first GPU-accelerated quantum computing system arrives

In terms of data processing units (DPUs), Huang announced that NVIDIA BlueField-3 DPUs have gone into production and are being adopted by leading cloud service providers such as Baidu, CoreWeave, JD.com, Microsoft Azure, Oracle OCI, Tencent Games, etc. to accelerate their cloud computing platforms.

On quantum computing: recovering data amid quantum noise and decoherence requires error correction across a large number of qubits. To that end, NVIDIA partnered with Quantum Machines to launch a quantum control link that connects NVIDIA GPUs to quantum computers for extremely fast error correction.

NVIDIA DGX Quantum, the world's first GPU-accelerated quantum computing system, pairs a powerful accelerated computing platform (powered by the NVIDIA Grace Hopper superchip and the open-source CUDA Quantum programming model) with OPX, the world's most advanced quantum control platform, enabling researchers to build applications that combine quantum computing with state-of-the-art classical computing for calibration, control, quantum error correction, and hybrid algorithms.

At the heart of NVIDIA DGX Quantum is an NVIDIA Grace Hopper system connected via PCIe to Quantum Machines' universal quantum control system OPX+, achieving sub-microsecond latency between the GPU and the quantum processing unit (QPU).

DGX Quantum also equips developers with a powerful hybrid GPU-Quantum programming model, NVIDIA CUDA Quantum, which can integrate and program QPUs, GPUs, CPUs in one system. Several quantum hardware companies have integrated CUDA Quantum into their platforms.

U.S. communications giant AT&T announced a partnership with NVIDIA to improve operations and increase sustainability using NVIDIA's full suite of AI platforms. AT&T will use the NVIDIA AI platform for data processing, optimizing service queuing, and creating conversational AI digital avatars for employee support and training.

Fourth, a new generation of metaverse servers, plus generative AI and simulation updates

For the metaverse, NVIDIA launched the third-generation OVX computing system and a new generation of workstations to power large-scale digital twins built on NVIDIA Omniverse Enterprise.


The third-generation OVX server delivers breakthrough graphics and AI performance by combining a dual-CPU platform, the BlueField-3 DPU, L40 GPUs, two ConnectX-7 SmartNICs, and the NVIDIA Spectrum Ethernet platform, accelerating applications such as large-scale digital-twin simulation to improve operational efficiency and predictive planning.

Enterprises can leverage OVX performance to collaborate on visualization, virtual workstations, and data center processing workflows.

In addition, the new NVIDIA RTX workstation, the RTX 4000 SFF Ada Generation, features an NVIDIA Ada Lovelace GPU, a ConnectX-6 Dx SmartNIC, and an Intel Xeon processor. The newly released RTX 5000 Ada Generation laptop GPUs give professionals access to Omniverse and industrial metaverse workloads anytime, anywhere.


Huang also announced an update to NVIDIA Omniverse, NVIDIA's platform for building and operating metaverse applications, adding a series of generative AI and simulation capabilities that make it easier for developers to deploy industrial metaverse applications.

Platform-as-a-service NVIDIA Omniverse Cloud is now open to select enterprises, enabling them to unify digitization across their core products and business processes.

"From large physical facilities to handheld consumer goods, every man-made object will one day have a digital twin used to build, operate, and optimize it," Huang said. "Omniverse Cloud, a digital-to-physical operating system for industrial digitalization, arrives just in time for the trillions of dollars' worth of new electric vehicle, battery, and chip factories now being built."

NVIDIA chose Microsoft Azure as the first cloud service provider for Omniverse Cloud. Powered by NVIDIA's OVX computing systems, Omniverse Cloud will launch on Microsoft Azure in the second half of this year, giving enterprises access to the full-stack suite of Omniverse software applications and NVIDIA OVX infrastructure, with the scale and security of Azure cloud services.

The new subscription service for Omniverse Cloud on Azure makes it easy for automotive teams to digitize their workflows, whether it's connecting 3D design tools to accelerate vehicle development, building a digital twin factory for a car, or running closed-loop simulations to test vehicle performance.

During the keynote, Lao Huang shared a video showing how Amazon uses the NVIDIA Omniverse platform to build a fully photorealistic digital twin of a robotic warehouse, saving time and money.


NVIDIA and BMW Group announced an expanded partnership to open the automaker's first fully virtual factory. BMW Group uses the NVIDIA Omniverse platform to build and run industrial metaverse applications across its global production network.

In addition, NVIDIA and its partners released new Omniverse Connections, connecting more of the world's leading applications through the Universal Scene Description (USD) framework.

Conclusion: Generative AI has sparked a sense of urgency in businesses around the world

"Generative AI is driving the rapid adoption of AI and reshaping countless industries," Huang said. "We are at the 'iPhone moment' of AI: startups are racing to build disruptive products and business models, while incumbents look for ways to respond. Generative AI has instilled a sense of urgency in companies worldwide to develop their AI strategies."

Today's run of software and hardware releases shows that NVIDIA's support for advanced AI computing now spans everything from hardware such as GPUs and DPUs to cloud services that help enterprises accelerate building customized generative AI models, all in the service of unleashing human creativity.

This is not the first time Lao Huang has bet ahead of the curve. NVIDIA's accelerated computing products have grown in symbiosis with the AI industry: NVIDIA keeps supplying a more powerful compute foundation for ever-larger AI model training, playing a key role at the cutting edge of AI training and inference, while the booming AI wave in turn opens a broader market and more opportunities for NVIDIA.

Today, the commercial promise of generative AI is inspiring almost every industry to reimagine its business strategies and the technologies needed to make those strategies a reality. NVIDIA is moving quickly with its partners to provide a more powerful computing platform for AI applications, so that more people can benefit from the transformative power of cutting-edge applications such as generative AI.