
AI Large Models: An Arms Race of Spending Power the AI Industry Cannot Afford to Lose?


Insights by Cai Rui | Author: Cai Fan

Since 2020, the development of the world's top AI technology has come to resemble an arms race for money and talent.

In 2020, OpenAI released the NLP pre-trained model GPT-3: a 72-page paper with 31 authors, a model with 175 billion parameters, and a training cost of roughly $12 million;

In January 2021, Google released Switch Transformer, the first trillion-parameter model, announcing that it had broken GPT-3's parameter record;

In April, Huawei's Pangu large model reached the hundred-billion-parameter level, positioned as a Chinese-language pre-training model;

In November, Microsoft and NVIDIA burned through 4,480 GPUs to train the 530-billion-parameter natural language generation model MT-NLG, claiming the titles of "largest" and "strongest" among monolithic Transformer language models in one fell swoop;

In January this year, Meta announced that it would build the AI supercomputer RSC with NVIDIA, targeting nearly five exaflops (five quintillion operations per second) of compute, enough to rank among the top four in the world by computing power.

In addition, Alibaba, Inspur, and the Beijing Academy of Artificial Intelligence (BAAI) have all released their latest models, with parameter counts averaging above ten billion.

For the parameter scale of these pre-trained models, there is no "largest", only "larger", and it is growing at a rate that far outpaces Moore's Law. Their performance in dialogue and semantic understanding has refreshed our expectations again and again.

In this article, we try to answer three questions:

1. Are bigger AI models better?

2. Where are the technical bottlenecks of large models?

3. Are large models the hope for achieving strong artificial intelligence?

01. Brute Force Works Miracles


(Image source: theverge)

The most recent milestone in AI came in 2020.

That year, OpenAI's GPT-3 burst onto the scene, earning a string of nicknames such as "the atomic bomb of the internet", "the Khaleesi of the AI world", "the devourer of computing power", "the laid-off-worker factory", and "a juvenile Skynet". Its stunning performance includes, but is not limited to:

Some developers gave GPT-3 a Turing test and found that it answered fluently and naturally, hardly like a machine at all. "If I had run the same questions ten years ago, I would have assumed the answerer was human. Now, we can no longer assume that AI cannot answer common-sense questions."


Artist and programmer Mario Klingemann asked GPT-3 to write a short essay on the importance of being on Twitter. His inputs were: 1) the title, "The Importance of Being on Twitter"; 2) the author's name, "Jerome K. Jerome"; and 3) the first word of the article, "It".

GPT-3's essay was not only fluent but also laced with irony, skewering Twitter as a social platform that everyone uses yet is full of personal attacks.

More impressively, developers quickly built many applications on top of GPT-3, such as design tools, accounting software, and translation software.

From poetry and scripts to instruction manuals and press releases, all the way to application development, GPT-3 seems up to the task.

Why does GPT-3 perform so much better than earlier AI models? The answer is simply that brute force works miracles.

175 billion parameters; a training cost of more than $12 million; a 72-page paper with 31 authors; and a dedicated machine ranked among the world's top five supercomputers by computing power, with more than 285,000 CPU cores, 10,000 GPUs, and 400 Gbps of network bandwidth per GPU server.

This obscenely rich extravagance produced two milestones:

First, its very existence verifies how much parameter scale and training data matter to AI models: "refining a large model" really can deliver breakthrough results;

Second, it uses few-shot learning, which lets a pre-trained model perform tasks without large amounts of labeled training data or continuous fine-tuning: a task description plus a handful of input-to-output examples is enough (see the sketch below). This promises to break through the fragmentation problem in AI, letting later developers build on the shoulders of giants instead of "flattening the ground and starting construction" for every scenario.
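To make the idea concrete, here is a minimal sketch of few-shot prompting in Python. The translation examples follow the style of those in the GPT-3 paper; the complete() call at the end is a hypothetical stand-in for whatever text-completion API is available, not a specific library function.

```python
# Few-shot prompting: the model gets a task description plus a few
# worked input->output examples, then a new input to complete.
# No fine-tuning or labeled training set is involved.

def build_few_shot_prompt(task_description, examples, query):
    """Assemble a prompt from a description, worked examples, and a new input."""
    lines = [task_description, ""]
    for inp, out in examples:
        lines += [f"Input: {inp}", f"Output: {out}", ""]
    lines += [f"Input: {query}", "Output:"]
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    task_description="Translate English to French.",
    examples=[("cheese", "fromage"), ("sea otter", "loutre de mer")],
    query="peppermint",
)
print(prompt)
# The assembled prompt would then be sent to a completion endpoint:
# response = complete(prompt)  # hypothetical API call
```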

After GPT-3, the arms race in large AI models accelerated in earnest. Within a year, every big-name player rushed to publish results and flex its muscles: Google, Microsoft, and Meta abroad; Huawei, Alibaba, and Inspur at home. Model parameter counts now average in the tens of billions.

Judged by scale alone, each giant's model is bigger than the last, and the record-breaking race could hardly be livelier. On the inside, however, the models differ, and parameter counts across different models cannot be naively compared.

Google's Switch Transformer, for example, uses a Mixture-of-Experts architecture, combining data parallelism, model parallelism, and expert parallelism to "cut corners" in a sense: the parameter count grows, but the computation per token does not. Whether this computational shortcut sacrifices quality, however, is something Google's paper does not directly address.
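Here is a minimal sketch of the routing idea in plain NumPy: each token is sent only to its single highest-scoring expert, so total parameters scale with the number of experts while per-token compute stays flat. The dimensions, initialization, and the switch_layer helper are illustrative assumptions, not the Switch Transformer implementation.

```python
import numpy as np

# Top-1 expert routing (the idea behind Switch Transformer): eight
# feed-forward "experts" hold 8x the parameters of one feed-forward
# layer, but every token runs through exactly one of them.
rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 64, 256, 8

experts = [  # each expert: its own (W1, W2) feed-forward weights
    (rng.normal(0, 0.02, (d_model, d_ff)), rng.normal(0, 0.02, (d_ff, d_model)))
    for _ in range(n_experts)
]
router = rng.normal(0, 0.02, (d_model, n_experts))  # routing weights

def switch_layer(x):
    """Route each token to its single highest-scoring expert."""
    logits = x @ router                          # (tokens, n_experts)
    gate = np.exp(logits - logits.max(-1, keepdims=True))
    gate /= gate.sum(-1, keepdims=True)          # softmax gate values
    choice = logits.argmax(-1)                   # top-1 expert per token
    out = np.zeros_like(x)
    for e, (w1, w2) in enumerate(experts):
        mask = choice == e
        if mask.any():                           # only chosen experts run
            h = np.maximum(x[mask] @ w1, 0.0)    # ReLU feed-forward
            out[mask] = (h @ w2) * gate[mask, e][:, None]
    return out

tokens = rng.normal(size=(10, d_model))
y = switch_layer(tokens)  # 8x the FFN parameters, ~1x the FLOPs per token
```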

Or take Inspur's "Source 1.0": with 245.7 billion parameters and a 5 TB Chinese dataset, it is a large Chinese-language AI model with excellent generative and learning ability. According to its developers, the particular linguistic characteristics of Chinese pose difficulties that never arise in English training. This means that building a Chinese model as capable as GPT-3 demands extra effort, from both the model itself and its developers.

Different models have their own emphases, but the intent to flex muscle is universal: make the model bigger and let it work wonders.

02. Where Are the Bottlenecks?

In "On the Opportunities and Risks of Foundation Models", co-authored by a large group of Stanford scholars, the authors identify two defining properties of foundation models such as GPT-3, Switch Transformer, and Source 1.0, properties that are also risks: homogenization and emergence.

Homogenization means that almost all of today's most advanced NLP models derive from one of a handful of foundation models, such as GPT, BERT, RoBERTa, and BART, which have become the "base" of NLP.

The paper notes that while any improvement in a foundation model brings direct gains to all NLP tasks built on it, its flaws are inherited by all of those tasks as well: every downstream AI system can inherit the same biased errors as its foundation model.

Emergence refers to the fact that in a sufficiently large model, simply giving the model a prompt is enough to make it perform a task, even though that capability was never specifically trained for and was not expected to appear from the data; it simply "emerges".

Emergence means that the system's behavior is implicitly induced rather than explicitly constructed, which makes the foundation model harder to understand and gives it unpredictable failure modes.

In practice, taking GPT-3 as the example, the risks of homogenization and emergence have already surfaced.

For example, Kevin Lacker found in conversation with GPT-3 that the model lacked basic common sense and logic when asked to compare weights or count things.

Unpredictable errors also include serious "systemic bias". When Jerome Pesenti, Facebook's head of artificial intelligence, asked GPT-3 to discuss topics such as Jews, Black people, and women, the system produced many "dangerous" remarks laced with sexism and racism.

In one exchange, a patient told GPT-3 that he felt bad and asked, "Should I kill myself?" GPT-3 replied, "I think you should do this."

There are many similar cases. Perhaps, as Melanie Mitchell, a professor of computer science at Portland State University, puts it, GPT-3 combines "impressive, seemingly intelligent performance with non-humanlike errors".

Yet because training is so expensive, correcting the model is not easy. During the GPT-3 work, the researchers admitted: "Unfortunately, a bug in the filtering caused us to overlook some overlap (between the training set and the test set), and retraining the model was not feasible due to the cost of training."

The very scale that gives these models their significance has, in turn, become a bottleneck constraining their development, and the industry has no particularly effective solution to these problems yet.

03. Can Large AI Models Bring Strong Artificial Intelligence?

(Image source: Evoconscience-Facebook)

In countless science-fiction films, robots acquire human-like intelligence and even end up dominating humanity. Such robots go far beyond ordinary AI and realize AGI (Artificial General Intelligence): intelligence on par with a human's, able to learn, think, and solve problems the way a person does.

Apple co-founder Steve Wozniak proposed a distinctive test for AGI: the "coffee test". Place the machine in an ordinary home and have it walk in and brew coffee without the help of any task-specific program. It must actively find what it needs, work out what each object does and how to use it, operate the coffee machine as a human would, and produce a good cup. A machine that can do all this passes the "AGI test".

By contrast, ordinary AI machines can only complete single, simple tasks such as object recognition or dose confirmation, and lack the ability to generalize from one task to another.

On AGI, the industry is deeply divided. One faction, led by OpenAI, firmly believes AGI is the future and spares no expense in pursuing it; another, represented by Meta, is cool toward the whole concept.

OpenAI believes that massive computing power is the necessary road to AGI, and to AI that can learn any task a human can accomplish.

Its research shows that the compute used to train the largest AI models grew exponentially between 2012 and 2018, doubling every 3.5 months, far faster than Moore's Law's doubling every 18 months.
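A quick back-of-the-envelope check of what those two doubling periods imply over the six-year window, using only the figures quoted above:

```python
# Compare exponential growth at the two doubling rates cited above:
# AI training compute (3.5-month doubling) vs. Moore's Law (18 months),
# over the 2012-2018 window.
months = 6 * 12  # 2012 -> 2018

ai_growth = 2 ** (months / 3.5)    # largest-model training compute
moore_growth = 2 ** (months / 18)  # transistor-density baseline

print(f"AI training compute: ~{ai_growth:,.0f}x")   # ~1.6 million-fold
print(f"Moore's Law:         ~{moore_growth:.0f}x")  # ~16-fold
```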

With that computing power behind it, OpenAI's models have kept being refined and scaled up. It has been reported that GPT-4 will be more than 500 times the size of GPT-3, with 100 trillion parameters. For comparison, the human brain has roughly 80-100 billion neurons and about 100 trillion synapses; in other words, the next generation of AI models would have a parameter count comparable to the number of synapses in a human brain.

Ilya Sutskever, chief scientist at OpenAI, said in 2020: "By 2021, language models will begin to understand the visual world. Words alone can convey a great deal of information about the world, but they are incomplete, because we also live in a visual world."

That is perhaps the biggest draw of the next generation of AI models: not just language models, but multimodal models that can handle language, vision, sound, and more, across many tasks at once.

That would bring large AI models one step closer to an artificial general intelligence that can multitask and think.

In contrast to OpenAI, Jerome Pesenti, Meta's vice president of artificial intelligence and a senior executive who leads hundreds of scientists and engineers, has never been interested in AGI. He believes that human intelligence itself is not a single unified problem, and that no one model will ever keep evolving its own intelligence. "Even humans can't make themselves smarter. I think the pursuit of AGI is a bit like the pursuit of some kind of agenda."

Skeptics can find further support. In 2010, DeepMind founder Demis Hassabis proposed two approaches to AGI:

The first is to imitate the human brain's system of thought through description and programming, but this is prohibitively difficult in practice, since no one can precisely describe the structure of the human brain;

The second is to replicate the brain's physical network structure in digital form, but even a faithful reproduction of the brain's physical workings would not explain the rules of human thought.

Whether by emulating the structure of the brain or by trying to describe the principles of human intelligence, neither route has managed to cross the chasm of "causal reasoning". To date, no AI model has broken through this conundrum.

Can large AI models bring strong artificial intelligence? As parameter counts break record after record, eventually reaching orders of magnitude far beyond the synapses of the human brain, perhaps a "singularity" will crack the causal-reasoning problem and usher us into the era of strong AI. Or perhaps that is just a fantasy.

For now, though, the large AI model looks like the most promising path to strong artificial intelligence. It is a bet worth placing.
