
NVIDIA is "monopolizing" the AI industry, are domestic manufacturers ready?

"For the future of computing and humanity, I donate the world's first DGX-1. ”

In August 2016, NVIDIA founder Jensen Huang arrived at OpenAI's offices with a DGX-1 supercomputer loaded with eight P100 GPUs.

Once everyone had gathered, Lao Huang took out a marker and wrote that sentence on the DGX-1's chassis.

NVIDIA is "monopolizing" the AI industry, are domestic manufacturers ready?

He was accompanied by Elon Musk, the founder of Tesla and a co-founder of OpenAI.

Lao Huang made the trip to OpenAI for one reason only: to hand over this freshly built supercomputer and give their artificial intelligence research a serious speed boost.

Worth more than a million, the DGX-1 took some 3,000 NVIDIA employees three years to build.

The DGX-1 could compress a year of OpenAI's training time into roughly a month. It was Lao Huang's big bet on the future of artificial intelligence.

Seven years later, at the recent GTC conference, Lao Huang stood on stage in his leather jacket, chip in hand, and the entire keynote revolved around AI.

The message was clear: in the era of AI, NVIDIA is about to be king, and the bet he made all those years ago has paid off.

After last year's crypto-mining crash, many people expected NVIDIA, which had made a fortune riding the mining boom, to see its market value plummet and collapse.

But reality turned out a bit differently... After falling for more than half a year, NVIDIA's stock price has climbed steadily since October, and its market value is now back around $650 billion, roughly 4 times that of AMD and 6 times that of Intel.

Does this look like the same Lao Huang who once begged you to buy a graphics card?

And what sent NVIDIA's stock price soaring is the AI computing business it has been betting on for more than a decade.

Here's one data point: since 2015, the share of NVIDIA GPUs in supercomputing centers has climbed steadily, holding at around 90% in recent years.

In the discrete GPU market, NVIDIA's market share once exceeded 80%.

Beyond that, from the famous YouTube cat-recognition experiment to AlphaGo, GPT-3, and GPT-4, almost every landmark in AI history was built on NVIDIA hardware.

NVIDIA's hardware is like the internal combustion engine of a new era, carrying the age of AI forward.

Some readers may be wondering: why, in this AI boom, does only Lao Huang seem to benefit? Can't other manufacturers' graphics cards train AI? They can, just not very well.

Why? That brings us to something NVIDIA has been building since 2006: CUDA (Compute Unified Device Architecture).

When you have a large computational problem, CUDA programming lets you fully exploit the GPU's parallel processing power, dramatically improving performance.

Here's a metaphor.

A CPU is like a math professor, while a GPU is like 100 elementary school students. Hand them a calculus problem and the 100 students will be stumped; but hand out 100 simple arithmetic problems, and 100 students working at the same time will finish far faster than one professor.

Deep learning is those 100 arithmetic problems, and the "tool" that lets the GPU's processors work on them in parallel is CUDA.

In general, running the same workload with and without CUDA often means a difference of several times to tens of times in computing speed.
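To make the professor-and-students picture concrete, here's a minimal sketch of ours (not from any vendor documentation) using PyTorch, which comes up again later in this article. It times the same matrix multiplication on the CPU and on a CUDA GPU; the exact numbers depend entirely on your hardware, but on a typical NVIDIA card the CUDA path is many times faster:

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Multiply two n x n matrices on the given device and return seconds taken."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()   # make sure setup has finished before timing
    start = time.time()
    c = a @ b                      # thousands of GPU cores each handle a small slice
    if device == "cuda":
        torch.cuda.synchronize()   # GPU work is asynchronous; wait for it to finish
    return time.time() - start

cpu_s = time_matmul("cpu")
if torch.cuda.is_available():      # requires an NVIDIA card with the CUDA runtime
    gpu_s = time_matmul("cuda")
    print(f"CPU: {cpu_s:.3f}s  GPU (CUDA): {gpu_s:.3f}s  speedup ~{cpu_s / gpu_s:.0f}x")
else:
    print(f"CPU: {cpu_s:.3f}s  (no CUDA device found)")
```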

If CUDA is so useful, why didn't other GPU vendors build a competitor? It's not that they refused to; they simply never saw it coming.

In the early days, the GPU's only job was to accelerate graphics rendering. Manufacturers saw it as a chip dedicated to graphics and never thought of using it for general-purpose computing. As for deep learning? Given the state of AI at the time, there was little demand for it, and nobody had yet discovered how useful it could be.

Brian of NVIDIA's deep learning team said this when talking about CUDA:

"For ten years after CUDA launched, Wall Street kept asking: why are you making this investment when nobody is using it? They valued it at $0 of our market cap."

To say that nobody used it, though, is an exaggeration.

In fact, as early as 2012, Alex Krizhevsky of the University of Toronto beat the competition in the ImageNet computer vision challenge using GPU-powered deep learning, trained on GTX 580 graphics cards.

Four years later, people working on deep learning had fully realized that the GPU's architecture trains AI at speeds CPUs simply cannot match, and NVIDIA GPUs, with native CUDA support, were the obvious first choice.

By now capital has seen the importance of AI, so why is everyone still grinding away at AI models instead of competing for Lao Huang's market?

The reason is that it's already hard for them to even get a ticket into the AI accelerator chip game. Across the artificial intelligence industry, the entire deep learning stack has been molded in Lao Huang's image.

Through years of sustained investment in CUDA development and its community, NVIDIA has become deeply intertwined with every major AI framework.

Today, every top AI framework supports CUDA. Want your deep learning to run fast? Buying a high-performance CUDA-capable card is the best option; to put it bluntly, buy an NVIDIA card.
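You can see how deep that CUDA assumption runs in everyday framework code. The snippet below is the standard PyTorch idiom (a generic illustration, not taken from the article): one line decides whether the model runs on a CUDA device or falls back to the CPU, and no other vendor's hardware gets an equally first-class switch out of the box.

```python
import torch
import torch.nn as nn

# The standard pattern in CUDA-enabled frameworks: use the NVIDIA GPU if one
# is visible, otherwise fall back to the (much slower) CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(1024, 10).to(device)        # model weights move into GPU memory
batch = torch.randn(32, 1024, device=device)  # input tensor allocated on the same device
logits = model(batch)                         # the forward pass runs through CUDA kernels

print("Running on:", device)
```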

Of course, while CUDA was flourishing, other companies did try to break NVIDIA's near-monopoly.

In 2008, Apple proposed the OpenCL specification, a unified, open API meant to give GPUs from different vendors a common general-purpose computing framework similar to CUDA.

However, universal doesn't necessarily mean easy to use.

Because GPU models from different manufacturers vary so widely, supporting all that hardware means many driver versions of uneven quality. Add the lack of vendor-specific optimization, and no version of OpenCL, on hardware with the same raw compute power, runs as fast as CUDA.

And precisely because of that generality, building a framework on OpenCL is far more complicated than building on CUDA. The reason is the same: a lack of first-party support. Just look at the tooling NVIDIA provides for CUDA development: the CUDA Toolkit, the NVIDIA GPU Computing SDK, Nsight, and more.

On the OpenCL side, it's a little shabby...

As a result, there are very few deep learning frameworks that support OpenCL today.

Take the simplest example: PyTorch, the hottest framework of the moment, doesn't even officially support OpenCL; you have to rely on third-party open source projects to use it.

So does AMD, also a graphics card vendor, have its own answer to Lao Huang's CUDA besides OpenCL?

It does, but the results haven't been great. In 2016, AMD released a new open compute platform, ROCm, positioned against NVIDIA's CUDA; most critically, it also supports CUDA programs at the source-code level.

You see, even AMD, Lao Huang's arch-rival, isn't trying to start from scratch; it's trying to lower the cost of adapting itself to CUDA...

To this day, however, ROCm still only supports Linux, and perhaps because so few people use it, development feels half-hearted. After all, if you already support CUDA, why would anyone go to great lengths to write separate framework support for ROCm?

That same year, Google made its own move, but since it isn't a chip vendor, it simply launched its own TPU platform, optimized specifically for its TensorFlow framework; naturally, the best native support is limited to TensorFlow.

As for Intel, it has launched oneAPI to go up against CUDA, but having started late, it is still building out its ecosystem, and it's hard to say how that will play out.

So, thanks to first-mover advantage and native framework support, deep learning today is basically inseparable from NVIDIA's GPUs and its CUDA.

The recently red-hot ChatGPT runs on Lao Huang's HGX boards and A100 chips, and Lao Huang is very confident about that:

"The only GPU that can actually handle ChatGPT right now is our HGX A100. ”

That's right: there is simply nothing else to use, and that is where Lao Huang's confidence comes from.

With OpenAI proving that large-model AI works, the giants have piled into large models, and NVIDIA's cards have instantly become a hot commodity.

So there's an amusing phenomenon among today's AI startups: their pitch decks often highlight how many NVIDIA A100s they have.

While everyone else rushed into the AI industry to pan for gold, NVIDIA got rich selling water to all of them, in the form of AI accelerator cards, and the key is that only the water it sells actually quenches the thirst.

Its hardware and toolchain alone can now sway the competitive landscape and the pace of progress of the entire AI industry.

Scarier still, NVIDIA's advantages have hardened into a moat so deep that even AMD, the world's second-largest GPU maker, cannot break through it.

So in the current AI wave, being able to build your own AI model matters, but in our view, the question of when domestic players will have their own NVIDIA and their own CUDA matters just as much.

Of course, this path is also more difficult.

Finally, in our view, the breakthroughs we need going forward are definitely not limited to research on large AI models; even more important are the design, manufacturing, and build-out of the underlying compute chips.

A new industrial revolution has arrived. AI has not only accelerated gains in human productivity, it has also accelerated the elimination of outdated capacity, and every industry now stands on the eve of change.

The strong get stronger, and the weak get left behind. Cruel as that sounds, in the field of AI, if you don't fight to catch up, there may genuinely be no place left for the "weak."
