
ChatGPT is "stirring up trouble", and Chinese universities are short of computing power because of it

Hengyu | Posted from Aofeisi

QbitAI | Official account QbitAI

If you don't work on large models, you get no computing power.

This is the harsh reality at an AI laboratory in a domestic top-3 university after ChatGPT set off the AI craze.

In one laboratory, the six-person non-large-model team shares four RTX 3090 cards, a meager allotment next to the ten A800 cards used by the ten-person large-model team in the same lab.

School-enterprise cooperation now also favors large models. After ChatGPT's release last November, the number of companies working with non-large-model teams plummeted, and even those who do come knocking open with the question:

"Don't you work on large models?"

Work on large models, and you get the full backing of universities and enterprises; refuse, and you can only watch the computing power flow elsewhere.

Even when a quantitative private fund with 10,000 A100 cards opens them up to university research teams, the resources may never land on your head.

"If only our group could get some." Seeing that Weibo post, a data science PhD student on a non-large-model team could only look on in envy. Starved of computing power, he was anxious enough to cry out to the heavens: our work is worth investing in too!

Now everyone is scrambling to build large models like the GPT-3.5 behind ChatGPT, and computing power is flowing the same way.

Computing power in other AI fields, already insufficient, is now stretched even thinner. Within domestic academia in particular, the wealth gap in how computing power is distributed is visible to the naked eye.

An entire lab has just four 3090 cards

Computing power at this scale, rented by the month, is no small expense for a research team. And large models come first: in academia, laboratories and teams that study large models get priority in the allocation of computing resources.

Take the PhD student's own experience at school: in his research group, the ten people on the large-model team have ten A800 cards available, while another laboratory studying traditional machine learning has only four 3090 cards for the entire lab.

Embracing the mainstream trend is one reason; another is that laboratories need funding to operate and maintain themselves. One form of funding is applying for national projects, but a necessary step there is delivering publication results.

For both reasons, already-scarce computing resources have to be prioritized for popular, relatively publication-friendly research such as large models. Even so, actually training a large model is not very practical for academia: data, computing power, and funding are all somewhat stretched.

To obtain more resources, some non-large-model laboratories have even set up additional teams dedicated to studying large models.

Of course, school-enterprise cooperation is also an indispensable way to obtain funds and resources.

This important form of support for industry-academia integration has a long history: at KDD 2020, school-enterprise cooperation papers accounted for more than 50% of the total, and at ICCV the proportion reached 45%.

For example, in 2021, the KEG, PACMAN (parallel and distributed computer systems), and NLP laboratories at Tsinghua University began working toward training a dense model with hundreds of billions of parameters, but the team's computing resources were insufficient. In the end, the off-campus enterprise Zhipu AI rented nearly 100 A100 servers and provided the required computing power for free, which led to the birth of the bilingual pre-trained large language model GLM-130B.

△ GLM-130B task performance

But now that everyone is scrambling to build a GPT-3.5-like model, this kind of cooperation is drying up for non-large-model teams.

After ChatGPT's release last November, the number of companies approaching the PhD student's team about school-enterprise cooperation dropped sharply. At other universities, non-large-model AI teams likewise keep facing the same question from enterprises: "Do you work on large models or not?"

Already-scarce computing power is tending to become a bargaining chip for chasing hot topics in academia. The Matthew effect in computing resource allocation keeps widening, causing real trouble for academic research.

ChatGPT widens the rich-poor gap in computing power distribution

Computing power is an indispensable input for the rapid development of AI. In 2018, OpenAI released a report pointing out a trend:

Since 2012, the computing power used by the largest AI training runs has doubled every 3.43 months; by 2018, the demand for AI computing power had increased by more than 300,000 times.
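As a quick back-of-the-envelope sanity check (a sketch of the arithmetic, not a figure from the report itself), the two numbers above are consistent with each other: a 300,000× increase corresponds to about 18 doublings, which at 3.43 months per doubling spans roughly five years, i.e. roughly the 2012–2018 window the report covers.

```python
import math

DOUBLING_MONTHS = 3.43   # doubling period cited above
GROWTH = 300_000         # overall increase in training compute

doublings = math.log2(GROWTH)              # doublings needed for a 300,000x increase
years = doublings * DOUBLING_MONTHS / 12   # time span those doublings cover

print(round(doublings, 1), round(years, 1))  # prints: 18.2 5.2
```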

Demand for computing power has skyrocketed across industry, academia, and research. How much can we actually supply?

According to statistics from China Computing Power Group, as of the end of June 2022, mainland data centers housed more than 5.9 million standard racks and about 20 million servers, ranking second in the world in total computing power.

This ranking is not bad, but it is still far from enough. Look around the world: every country is hungry for more computing power to be "fed" to it.

And even if you can afford the graphics cards and your computing power goes up, the electricity bill is astronomical.

Moreover, the mainland faces its own particular circumstances:

Zhu Qigang, director of the business development department of the OpenAtom Open Source Foundation, described the situation at a CCF YOCSEF event this month: the core scheduling technology in supercomputing is either IBM's LSF system or an open-source system. Most domestic supercomputing centers today build wrappers around open-source systems, but the efficiency and capability of those systems' resource scheduling leave considerable room for improvement.

And for well-known reasons, the A100 and H100, the two most powerful GPUs today, still have no alternative that scales.

△NVIDIA A100 graphics card

In short, insufficient computing power was already a weakness, and in the ChatGPT era demand has expanded dramatically: beyond massive training compute, massive inference compute must also be supported.

So the current situation is this: because ChatGPT demonstrated the reasoning abilities of large models, demand for compute to train and study large models has grown; and because large models have exploded in popularity, the computing resources flocking to them have grown too.

Computing resources allocated to large models have become abundant, while the shortage in other AI fields keeps worsening, hampering their research and development.

It is fair to say that after ChatGPT became the darling of today's AI industry, it exacerbated the rich-poor gap in the distribution of computing power.

Are large models, the "rich" side of this divide, the best path for AI research? No one can answer that yet.

But it is worth remembering that large models led by the GPT series should not absorb all the attention: the AI field spans many research directions and more finely subdivided verticals, as well as models and products that could deliver even greater productivity.

When the ChatGPT craze flattens out, will the gap in academia's computing resource allocation narrow?

Every non-large-model laboratory and team is probably looking forward to that.
