
What is the magical "emergence" of artificial intelligence models? Chen Jing

Author: Yuan Lanfeng

On December 24, 2023, the finals of the National General Artificial Intelligence Innovation Competition, hosted by Anhui Province, were held in Wuhu. This China Computer Federation science and technology innovation competition focuses specifically on general artificial intelligence and is tied to the recent breakthroughs in large models. It attracted more than 300 project teams from across the country; 80 teams entered the semi-finals, more than 80% of them from outside the province, and 20 advanced to the finals.

A mental-health software and hardware product from Hefei Zhongjuyuan Intelligent, which uses general AI analysis to monitor people's mental health in real time across the whole cycle, won the first prize. Beyond cash prizes, winning teams that land in Anhui will receive comprehensive support of up to 30 million yuan, and projects have already been signed in Hefei, Wuhu and Suzhou.

Large AI models are not just surprising in conversation; many projects based on them are already being put into practice. The key to all this is the miraculous "emergence" of large artificial intelligence models, which this article sets out to explain.

1. Artificial intelligence is booming again

The popularity of large models triggered by ChatGPT has not waned, and more than 200 have been launched in China in a short period. Google's large model Gemini, launched on December 6, caused a sensation; its video demonstration was impressive but sparked controversy over whether the demo was staged.

AIGC (Artificial Intelligence Generated Content) is making rapid progress. In short videos generated with HeyGen AI, developed by Shenzhen Shiyun Technology, Guo Degang speaks fluent English and Taylor Swift speaks Chinese with matching tone and mouth shape, causing a stir. Video-generation tools such as Runway and Pika work remarkably well and have exploded in Chinese and American tech circles. Midjourney's image generation has been a huge market success, with $200 million in annual revenue from just 40 employees and no outside investors. The development process at game companies has changed, with concept artists becoming far more efficient. AI is currently the hottest area of venture capital, without rival.

The popularity of artificial intelligence in 2023 came as something of a surprise. People had originally expected an "artificial intelligence winter".


Gartner's hype cycle for emerging technologies

In early 2016, DeepMind's AlphaGo defeated humans at Go, triggering the biggest wave of AI enthusiasm in years. Since then, however, the heat gradually subsided. As many industry insiders revealed and predicted, deep learning has both capabilities and flaws, and expecting too much of it is unwise. Autonomous driving, for example, became an R&D money pit: many companies invested heavily but struggled to break through. Valuations of AI startups declined, and venture capital went looking elsewhere for breakthroughs. All this seemed "normal", in line with the usual law of technological development: the inflated expectations generated during a boom are shattered, enthusiasm declines, and the industry keeps accumulating, climbs out of the trough, and promotes the application of the technology over the long term.

Even people in the industry did not expect large AI models and AIGC to become so popular in 2023. Zhou Hongyi, the founder of 360, shared his impressions of a trip to Silicon Valley on November 30: "investors will no longer consider companies that do not have AI concepts, AI functions, and AI components," and "the United States is betting on artificial intelligence; the entire investment system, entrepreneurial system, large-company system, and traditional-company system are fully embracing AI."

Measured by industrial and technological impact, the AI boom of 2023 has exceeded that of 2016. There is a reason: many researchers believe human society has achieved a breakthrough at the level of scientific principles not seen in decades, one that cannot be evaluated empirically with the usual technology-development curve.

That big breakthrough is the "emergence" of capabilities in large AI models. This article explains, from a technical point of view, what the "emergence" of large models is and how significant it is.

2. Deep learning: the first scientific "emergence"

The most classic and best-known field of scientific breakthrough is physics. From Galileo's experiments and Newton's three laws to the peak of relativity and quantum mechanics in the first half of the 20th century, this is the most deeply rooted development story in science. New physical phenomena and laws were constantly being discovered, producing many scientific breakthroughs, some of which brought about technological and industrial revolutions.

Since the second half of the 20th century, major physical discoveries have decreased markedly, as if all the basic laws of the universe had already been found. Some believe human society has "stagnated technologically", with major scientific discoveries becoming fewer and some capabilities even regressing, such as crewed lunar landings. Analyzed from the perspective of "emergence", however, the picture looks different.

Physics used to "emerge" constantly: technological advances let scientists invent new experimental tools, discover exciting new phenomena, and observe and test new theories. In the early days of quantum mechanics, there were many big breakthroughs within a few years. Scientific discovery often does not require deep prior understanding; even a vague theory can lead to big breakthroughs given the right tools and instruments. Before the 20th century, people realized that matter is made of atoms and discovered many elements by spectroscopic analysis, even though the microscopic theory of the atom was still unclear.

Even with few new phenomena in physics, human science and technology will not stagnate. In biology and IT, exciting new discoveries keep driving scientific and industrial progress. There should be no hierarchy between scientific laws and phenomena: anything that gives people new power to understand and transform the world is a major breakthrough at the level of principle. Artificial intelligence is built on the knowledge system of physics, but the significance of its discoveries is no less than that of the basic laws of physics.

The emergence of capabilities in large AI models can be compared to humanity's discovery of electricity: an exciting new phenomenon, a basic scientific discovery with great potential. Although not many people truly understand it yet, people in the industry are exploring this new world with a scientific passion not seen in decades.

In AI's more than 60 years of history, many new phenomena have appeared. But they were often controversial and less valuable than imagined, clearly limited by the stage of development: the "tools" AI research depends on, namely computer hardware, were not capable enough. Criticism of AI's capabilities and exposure of its major flaws have always accompanied its development, and this remains true in the era of large models, such as the "hallucinations" that are hard to eliminate from machine conversation.

In the 1950s and 1960s, simple structures such as perceptrons, together with hand-written chess-playing programs, made scholars realize that artificial intelligence (AI) was a new scientific field. However, because neural network structures were too simple, writing AI programs by hand was difficult, and algorithmic complexity grew exponentially, artificial intelligence soon hit a trough. In the 1980s, Japan chose artificial intelligence as the breakthrough direction of its "fifth-generation computer" project, but it failed completely and the technical results proved worthless.


The perceptron model and the "XOR problem"

The famous "XOR problem" was pointed out by Minsky and other researchers: by adjusting its coefficients, a single-layer perceptron can compute the AND, OR, and NOT of two input values, but no matter how the coefficients are adjusted, it cannot output XOR. This is impossible in principle: place the 0s and 1s at the four corners as on the right of the figure above, and no straight line can put the 0s on one side and the 1s on the other. In general, if two patterns are "linearly separable" by a hyperplane, perceptron training can converge, but most pattern-recognition problems in practice are nonlinear.
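The linear-separability argument above can be checked directly in code. Below is a minimal sketch (function and variable names are my own, not from the article) of a single-layer perceptron trained with the classic perceptron rule: it converges on AND, which is linearly separable, but never on XOR, no matter how long the coefficients are adjusted.

```python
# Single-layer perceptron (threshold unit) trained with the perceptron rule.
# It converges on the linearly separable AND function but never on XOR.

def train_perceptron(samples, epochs=100):
    """samples: list of ((x1, x2), target) with 0/1 targets."""
    w1 = w2 = b = 0.0
    lr = 0.1
    for _ in range(epochs):
        errors = 0
        for (x1, x2), t in samples:
            y = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            if y != t:
                errors += 1
                w1 += lr * (t - y) * x1   # adjust the coefficients
                w2 += lr * (t - y) * x2
                b += lr * (t - y)
        if errors == 0:                   # converged: all samples separated
            return (w1, w2, b), True
    return (w1, w2, b), False             # never found a separating line

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, and_ok = train_perceptron(AND)
_, xor_ok = train_perceptron(XOR)
print(and_ok, xor_ok)  # True False: AND is linearly separable, XOR is not
```

However many epochs are allowed, `xor_ok` stays False, which is exactly Minsky's point: the failure is geometric, not a matter of tuning.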

That "linearly separable" problems could be successfully trained on a neural network was, from the standpoint of scientific discovery, a new phenomenon. The basic features of today's large models with trillions of coefficients can already be found in the original perceptron: adjustable coefficients, simple arithmetic operations, and numerical results interpreted as outputs. But given the limited understanding at the time, academia generally regarded neural networks as a "toy" of little significance, corresponding to the first AI winter, from 1974 to 1980. There are many such cases in science: a research result looks somewhat interesting, but without follow-up progress it gradually cools down and rarely heats up again.

In the 1980s, researchers such as Yann LeCun and Geoffrey Hinton (together with Yoshua Bengio, the three would share the 2018 Turing Award) introduced multi-layer neural networks and the meaningful "back-propagation" (BP) algorithm, successfully achieving handwritten-digit recognition with adequate accuracy. Meanwhile, thanks to rising computer performance and impressively well-written hand-coded chess programs, a chess machine defeated the human world champion.

During this period AI developed and scored some modest achievements, but it was not prominent in the IT wave of the time. This corresponds to the second AI winter, from 1987 to 2016, which can be understood from an investment perspective: people were keen on software development, communications, the Internet, mobile apps and other directions, and artificial intelligence was not a hot topic.

IBM's Deep Blue was very costly to develop and was mothballed after its victory over Kasparov; its subsequent influence on development and technology was small. It became accepted that writing AI algorithms by hand gets stuck on the exponential complexity of game problems, and that the logical abilities of hand-built expert systems are too limited to handle complex problems. This "symbolist" path was the mainstream of AI at the time, and its top achievements represented the industry, but the way forward was unclear.


BP neural network structure with one hidden layer

The deep learning and large models that later shone so brightly acquired their basic structures and training frameworks at this stage. Multi-layer neural networks connected layer by layer correspond to the "connectionism" school of AI. A forward pass computes the result at the final nodes; comparing it with the sample produces an "error"; the error is backpropagated layer by layer, and methods such as "gradient descent" repeatedly modify the coefficients to reduce the error and optimize the overall "loss function". These seemingly uncomplicated basic techniques, with repeated training to minimize the loss function, can produce surprising pattern-recognition results, recognizing simple patterns such as handwritten digits. But the ability of multi-layer networks at this stage was still limited, and they performed poorly on even slightly complex image-recognition problems, which limited applications.
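The forward-pass / error / backpropagation / gradient-descent loop described above can be sketched at its smallest scale. The toy network below, with one hidden layer (hyperparameters, seed, and names are illustrative choices, not from the article), is trained by exactly that loop on XOR, the task a single-layer perceptron cannot handle.

```python
import math, random

# One hidden layer, sigmoid units, trained by backpropagation + gradient
# descent: forward pass, error against the sample, backward pass, update.

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

H = 4  # hidden units
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = 0.0

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def forward(x):
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(H)]
    y = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
    return h, y

def loss():
    # squared-error "loss function" summed over the four samples
    return sum((forward(x)[1] - t) ** 2 for x, t in XOR)

loss_before = loss()
lr = 0.5
for _ in range(5000):
    for x, t in XOR:
        h, y = forward(x)
        dy = (y - t) * y * (1 - y)          # error at the output node
        for j in range(H):
            dh = dy * W2[j] * h[j] * (1 - h[j])  # backpropagate one layer
            W2[j] -= lr * dy * h[j]              # gradient-descent updates
            W1[j][0] -= lr * dh * x[0]
            W1[j][1] -= lr * dh * x[1]
            b1[j] -= lr * dh
        b2 -= lr * dy
loss_after = loss()
print(loss_before, "->", loss_after)  # repeated training shrinks the loss
```

The hidden layer is what breaks the XOR barrier: the network is no longer restricted to a single straight separating line.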

In 2016, the AI boom suddenly took off when AlphaGo defeated the best human players at Go, an extremely difficult problem where symbolism is powerless; it was unexpected and somewhat dramatic. In fact, for people in the industry it was technically a matter of course, the result of combining several factors: the traditional MCTS (Monte Carlo Tree Search) game-search algorithm together with newer techniques such as deep neural networks, reinforcement learning, and self-play. The results were good, but the technology was not especially groundbreaking, which is understandable; many individual developers have since built powerful Go AIs.

For the industry, the 2012 image-recognition network AlexNet was of greater foundational significance. Its three-person team included the mentor Hinton and two students, one of whom, Ilya Sutskever, became the technical core of ChatGPT and also took part in AlphaGo's development. Relying on a deep convolutional neural network, AlexNet cut the error rate in the ImageNet image-recognition competition to 15% in one stroke, compared with about 30% for other techniques. This was the development that truly excited the industry: deep learning showed its magic.

Deep learning let the industry find its way out of confusion in one stroke. Training data volumes grew rapidly, computer hardware kept speeding up, and GPUs provided parallel acceleration. Once the bottleneck was broken, the capabilities of deep neural networks "emerged" all at once. In a short time, deep learning swept through almost every field of science, as people's domain experience was built into, trained with, and carried by various neural network structures. Machine translation improved by leaps and bounds, face recognition became astonishing, and generated paintings could pass for real. These advances actually predated AlphaGo; society knew "deep learning is powerful" but did not think much further.

This was an "emergence" in the true scientific sense. Computers had been tools assisting research in various fields, with domain expertise leading the way; suddenly, every discipline found that even its research paradigm had changed.

The "emergence" of deep learning has two meanings. On the first level, with the increasing scale of the neural network, the speed of the training machine, and the number of samples, after reaching a certain scale, it suddenly "changes from quantity to quality", and the neural network ability jumps and "emerges", greatly improving the image recognition effect. The second layer is that deep learning has performed extremely well in the field of image recognition, and this ability has been rapidly extended to other computer fields, and even changed other disciplines, and the application scope of this ability has also emerged.

Interestingly, people paid so much attention to AlphaGo out of interest in the ultimate meaning of "intelligence". Many imagined a humanoid machine thinking its way to victory over a human player; the last bastion of human "intelligence" was proved inferior to the machine, jobs would be replaced, and much philosophical and social reflection followed. But the AI technology represented by AlphaGo has nothing to do with the essence of intelligence; it is an "artificial" illusion that merely simulates a complex computing task cleverly. As society slowly got used to it, the boundaries of AI capability became clear, discussion of machine intelligence cooled rapidly after 2018, another cold winter seemed to be coming, and investment enthusiasm declined.

Within the industry, of course, there has been no winter since the 2016 explosion. Developers actively applied deep learning in various fields, and researchers explored new network architectures and training methods. It is just that the outside world decided it was "not so magical after all" and lost interest.

Mathematically, in the spirit of Minsky's analysis, the breakthrough of deep learning is to use a huge number of coefficients (millions to hundreds of millions) to build large mathematical formulas that fit and approximate the solution spaces of complex problems such as Go and image recognition. It developed from the simplest "linear separating plane" into extremely large and complex hypersurfaces that partition the space. The construction method is statistical fitting: compare a large number of samples to obtain statistical errors, backpropagate to modify the coefficients and reduce the error, and after many rounds of learning the error becomes very small and the numerical simulation succeeds. Samples can be manually annotated or automatically generated, and the hardware basis is GPU-accelerated parallel computing with thousands of compute cores.
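The "statistical fitting by error reduction" recipe can be shown with just two coefficients instead of millions. This sketch (synthetic data, made up for illustration) fits a line to noisy samples by gradient descent, repeatedly modifying the coefficients to shrink a statistical error over the samples, which is the same loop deep learning runs at vastly larger scale.

```python
import random

# Fit coefficients (a, b) to noisy samples of y = 3x + 1 by repeatedly
# reducing the mean squared error: the smallest case of statistical fitting.

random.seed(0)
samples = [(x / 10, 3 * (x / 10) + 1 + random.uniform(-0.05, 0.05))
           for x in range(-20, 21)]

a = b = 0.0
lr = 0.05
for _ in range(2000):
    # gradient of the mean squared error with respect to a and b
    ga = sum(2 * (a * x + b - y) * x for x, y in samples) / len(samples)
    gb = sum(2 * (a * x + b - y) for x, y in samples) / len(samples)
    a -= lr * ga    # modify the coefficients to reduce the error
    b -= lr * gb

print(round(a, 2), round(b, 2))  # close to the true coefficients 3 and 1
```

A deep network does nothing conceptually different; it just adjusts millions to billions of such coefficients to bend a hypersurface instead of a line.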

When the author has explained the mathematical meaning of this statistical simulation to humanities scholars, the magical feeling of "intelligence" drains out of artificial intelligence. And such statistical simulation will obviously have flaws, with no solid logical foundation: its success is statistical, and it is hard to predict when a bug will occur.


Even AlphaGo's 3:0 victory over Ke Jie, humanity's number-one player, considered a complete defeat of mankind, looks in retrospect likely to conceal major flaws. By constructing rare shapes such as "coiled-dragon eyes" (Panlongyan), researchers hit the weakness of Go AIs and made extremely strong programs commit simple mistakes. In February 2023, Japanese professional 2-dan Ryunosuke Shibano played Black against a Go AI and induced White to build a group connected in a large circle, surrounding a living Black group; because such shapes rarely appear in training, the AI misjudged the group's life and death, and a large group died. The industry believes all Go AIs share this bug.

Such examples are everywhere, in every field. Pattern recognition based on deep neural networks has defects that are hard to eliminate, which becomes very troublesome in safety-critical applications such as autonomous driving. To some extent this is the technical root of talk of a "third AI winter": expectations were not met, and some researchers felt lost.

3. The second "emergence" of artificial intelligence: large models

Just when the industry generally thought there would be no huge breakthrough in artificial intelligence in the short term, a bigger one arrived!

At the end of 2022 and into 2023, ChatGPT and then GPT-4 detonated public attention one after the other, and major global IT companies rushed to buy NVIDIA GPUs to develop large models. In the industry's eyes, AI performance is now genuinely close to the original meaning of "intelligence", though controversy remains. Having learned a "lesson" from the inflated hopes of the 2016 boom, the outside world is interested but not quite "fanatical".

Breakthroughs in AI often start from seemingly simple tasks. This time the large model starts from a "simple" task: predicting the next word. The basic operation of a "language model" is to spit out words one after another; the form really is that simple. People had seen chatbots and poetry machines before and found nothing special, never expecting that a huge breakthrough would occur here, one that may produce real "intelligence".

Whether machines can acquire real intelligence by learning only human language is controversial; Yann LeCun strongly denies it. But that belongs to "AI philosophy", so let us set it aside and see what happens when a machine learns from a massive "corpus".

GPT stands for Generative Pre-trained Transformer; let us unpack each word. The Transformer is a neural network architecture invented in 2017 that proved itself on the task of machine translation. Its structure is not complicated, but it carries an enormous number of coefficients, and in GPT it stores hundreds of billions of them. "Generative" means GPT generates dialogue text and other content; the recently booming picture and video tools are also generative applications. "Pre-trained" means, on one understanding, handing the large-scale text corpus of the entire Internet to the Transformer to learn, with audio and video materials added later for multimodality. The corpus does not even need manual annotation (filtering out harmful content is a separate matter): pre-training simply asks GPT to predict the next word in the corpus text, and when the prediction is wrong, backpropagation adjusts the coefficients.
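To make the pre-training objective concrete, here is "predict the next word" at toy scale. Real GPT uses a Transformer trained by backpropagation on a web-scale corpus; this sketch uses simple bigram counts on a made-up corpus purely to illustrate what the prediction task is, not how GPT implements it.

```python
from collections import Counter, defaultdict

# The pre-training task itself: given a token, predict the next one.
# Here the "model" is just a table of bigram counts over a toy corpus.

corpus = ("the cat sat on the mat . the cat ate the fish . "
          "the dog sat on the rug .").split()

follow = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    follow[cur][nxt] += 1   # record which token follows which, how often

def predict_next(token):
    """Return the most frequent next token seen in training."""
    return follow[token].most_common(1)[0][0]

print(predict_next("the"))   # "cat": it follows "the" most often here
print(predict_next("sat"))   # "on"
```

Counting bigrams only captures surface co-occurrence; the article's point is precisely that doing this same prediction task well over the whole Internet forces a far richer internal model, up to "facts" and "reasoning".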

The task sounds simple, but think about it: what will GPT learn from it? It is not trivial. Note that researchers had already expanded machines' storage and training "compute" enough to handle the whole Internet's corpus.

A traditional observation is that learning a corpus lets the machine learn "grammar" and "semantics". In NLP (natural language processing), people understand the machine-translation task deeply; implementing grammar in hand-written code and linking it to words proved a dead end with ugly results. A machine that learns automatically from training text, however, can establish the grammatical and semantic associations among the words of a language and translate decently. It knows that certain words are related, often appear together, and appear under certain conditions, all of it recorded in the network's coefficients. The Transformer's data structure makes it easy to associate the words in a sentence with one another.

Even when the machine translates well, people know it does not understand what the words mean. Mathematically, the machine encodes a passage with an encoder and decodes it into another language. It is an algorithm of encoding and decoding; once debugged, people judge the translation good. In the translating machine's eyes there are only "tokens" (symbols) related to one another; it does not need to know what they are. Like Go with its definite answers (a complete-information game), translation has relatively deterministic output and is an "easy" task (humans do it routinely, so the solution is considered easy).

But GPT's pre-training task is not translation; it is predicting the next token. This is much harder than translating decently: to make the continuation reasonable (to agree with the human corpus with high probability), it must grasp "facts" and even learn to "reason"! At this point, a truly shocking new scientific phenomenon "emerged" in the field of artificial intelligence.


Taking the "Spark Model" of iFLYTEK, which ranks first in China's large model, as an example, it faces the problem of "why didn't I get to Beijing after driving 30 minutes from Xi'an". There will be no direct answer in the corpus, so the question needs to be broken down. To understand the relevant corpus of "driving has not arrived", "time" and "distance" will be introduced, and then according to the distance, Xi'an and Beijing will be correlated, and "speed" will be introduced, and finally the combined answer will be introduced. This process is not intuitive and is really like reasoning in form.

There are many such cases in the use of ChatGPT and GPT-4, enough to convince people that machines really have powerful reasoning skills. People at OpenAI say that sometimes they themselves do not know how GPT-4 arrives at an answer; the mechanism inside is genuinely astonishing.

Of course, GPT also has many logical flaws, and it is not hard to induce outrageous answers from it. But from the standpoint of scientific discovery, the new phenomenon is repeatable; even if applying it requires conditions and it remains flawed, it is a substantive breakthrough. Researchers used to regard chatbots as mere imitators of linguistic form (many still see GPT that way), but they had never before found machines with such strong reasoning capability. After reading many GPT dialogues, one can clearly feel that the machine's data structures really do contain the ability to reason; mere imitation could not produce this.

Excitement about "new phenomena" rather than fixation on outrageous flaws is what distinguishes researchers from laypeople. Outsiders demand that the serious defects be fixed, otherwise it cannot pass the Turing test and cannot be trusted in applications. Researchers pay more attention to the new capabilities the machine exhibits, knowing a "new world" lies here. Physicists care intensely about possible "new physics" and will pounce on the slightest clue with piles of analysis, often to be disappointed. AI researchers naturally focus on questions like "how does machine reasoning arise", which is why big companies are piling in madly. The hard requirements of studying large models, in compute, storage, and capital, are far higher than for deep learning; otherwise there would be even more researchers. Even so, the number of large models in China and the United States is already very large, and this collective excitement that "a great discovery lies ahead" has never been seen before.

In just ten years, the field of AI has produced two "emergences" at the level of scientific principle: one for deep learning, one for large language models. The outside world may not grasp the significance, but the industry has truly generated unprecedented enthusiasm.

How do large models learn to reason? This too can be described. A similar example is how Go AIs learn "ladders" (zhengzi). AI training improves continuously, with each version corresponding to a set of "weights". A game with clear wins and losses like Go lets the AI start from nothing: versions play "self-play" games against each other, weights are improved according to match results, and the weights that perform well become the winners and develop further. Such training can be distributed; Leela Zero was powered by many enthusiasts contributing machines for self-play weight updates.
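The self-play selection loop can be sketched as follows. A real Go engine is out of scope, so this illustration (entirely my own construction, not from any actual pipeline) stands in for "playing strength" with a scalar and for matches with noisy outcomes: candidate weights that clearly beat the current champion are promoted, and strength accumulates over generations.

```python
import random

# Toy self-play selection: mutate the champion's "weights" (here a single
# skill scalar), play noisy matches, and promote only clearly stronger
# versions. Real engines evaluate whole networks over distributed games.

random.seed(1)

def play(skill_a, skill_b):
    """Noisy match: higher skill wins more often. True if A wins."""
    return random.random() < skill_a / (skill_a + skill_b)

champion = 1.0                      # initial weights' playing strength
for generation in range(50):
    # a mutated candidate version (kept positive so win odds stay defined)
    challenger = max(0.1, champion + random.uniform(-0.1, 0.3))
    wins = sum(play(challenger, champion) for _ in range(100))
    if wins > 55:                   # promote only clearly stronger weights
        champion = challenger

print(round(champion, 2))           # champion strength after selection
```

The promote-if-clearly-better gate is the essential idea: good mutations survive, bad ones are discarded, and capability ratchets upward version by version.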


During training, enthusiasts could see clearly that conventional Go techniques such as capturing, taking stones, and escaping were learned quickly by successive AI versions, but ladders were hard to learn, because a ladder involves a diagonal relationship between stones far apart on the board. Over time, however, some lucky version of the weights learns to judge ladders and uses this ability to defeat versions that cannot. To learn ladders, the Go AI's network must be large enough (for example, 20 layers may suffice where 10 is not enough), and many self-play games must be trained.

The same holds for the reasoning ability in GPT's pre-training. First the network must be large: OpenAI kept scaling up from GPT-2 to GPT-3 and GPT-4, reaching hundreds of billions of coefficients. Then the training corpus must be larger and the training time longer. The rest is watching GPT's abilities emerge step by step, just as in the self-play training of Go AIs: simple abilities are learned first, complex ones later.

The astonishing scientific discovery is that GPT-3 succeeded: it learned complex inference in pre-training! Just as AlexNet did for deep learning, GPT-3 made the industry realize the great potential of large language models.

It can be understood this way: among GPT's text-prediction tasks are some that cannot be done well without reasoning. When they are done badly, the "loss" value stays relatively large. GPT trains repeatedly, modifying its weights in all sorts of ways to reduce the loss, until at some point the loss drops; and that drop is equivalent to GPT having acquired the ability to reason, so that its output becomes decent.

In fact, humans learn reasoning the same way: if you understand it you pass the exam, otherwise you fail. How each person learns is their own business; tests and applications are the judging criteria. Philosophically, it is hard to sustain the claim that machines are not reasoning but merely performing calculated imitation without intelligence. The fair judgment is that the machine completes tasks that require reasoning, that is, it has working reasoning ability, and has mastered many of the "facts" reasoning needs.

OpenAI broke with custom and did not publish the technical details of GPT-3 and GPT-4, so people can only guess at some of the training techniques. But people in the industry talk, employees get poached, and technology cannot stay exclusive forever. GPT training techniques are therefore spreading, and certain understandings have gradually become industry consensus. That GPT's success lies in "emergence" is one such consensus.

This emergence of GPT is particularly philosophical; there is more to say about it than about the emergence of deep learning.

1. As with deep learning, network scale, corpus, machine speed, and training time keep growing until new capabilities emerge: quantitative change leads to qualitative change. This was a conventional expectation; people simply were not sure the Transformer-based GPT would succeed and were reluctant to invest heavily. That hurdle has now been cleared, and countless companies are willing to spend big.

2. GPT's pre-training tries to reduce a single, uniform "loss" value. But unlike AlexNet's single task, GPT in effect has many tasks whose text output must improve. It may do well in easy dialogue scenarios while doing poorly on tests of complex reasoning or even mathematics. GPT's emergence is therefore not a one-off event but a gradual improvement of abilities across task types, from easy to hard. In other words, GPT's emergence is itself a diverse phenomenon with many details worth exploring. For example, the ability for a certain type of task may suddenly emerge even when the loss function looks little changed, which is new compared with single-task training. Another example: people find GPT's logical ability impressive, yet its mathematics is much worse.

3. GPT's emergence has not yet run its course. People were already excited when they found that "quantitative change leads to qualitative change" was happening; perhaps success on just a few small tasks was enough to convince them. But if you keep training, you find more and more good things: the types of tasks contained in human text are practically endless, the difficulty keeps rising, and abilities are tested ever more severely. It is hard to say just how powerful the GPT framework really is, and this sense of the unknown is all the more exciting. People are like treasure hunters in a cave who know there is treasure but not what kind, which attracts still more treasure hunters. With deep learning's emergence, it was easy to judge whether a capability was there, further training did not make it better, and the whole process was familiar.

4. GPT's scale should keep expanding, from hundreds of billions of parameters to trillions or even higher. For ordinary deep-learning tasks, once the network is big enough, further expansion is pointless and may even cause "overfitting". But for GPT to memorize the "factual" information of human society, hundreds of billions of coefficients are clearly not enough. It can perform "information compression", but compression necessarily loses information. Another intuition is that as the network scales up, so does GPT's "potential", wandering through a sea of complex heuristics to discover deep correlations.

Looking at these characteristics of GPT's emergence, it is understandable that researchers are even more excited than they were about deep learning. Some radical scholars believe the GPT architecture contains real intelligence and have seriously begun to worry about machines destroying humanity. One oddity is that OpenAI has spent a great deal of energy on AI-safety research, which even contributed to a "coup"-style upheaval inside the company. GPT's emergence does indeed show the traits of human intelligence: diverse, complex, unpredictable, with seemingly unlimited potential. This is surely the closest AI, and scientific research as a whole, has come to "artificial general intelligence" (AGI). It is also understandable why scholars are seriously discussing the "philosophy of AI" around GPT.

The outside world does not understand GPT's emergence well and easily underestimates its significance as a scientific discovery. Many people only pay attention to the chat performance of the various GPTs: ChatGPT and GPT-4 are strong, and domestic models lag behind. Some are amazed by the powerful reasoning displayed in AI chat; others are shocked by its ridiculous nonsense and lies. The outside world tends to assume that GPT research is mainly about making it talk better, without flaws.

In fact, the core of GPT research should be exploring more details of "emergence". Large companies such as Microsoft and Google are experimenting with ever larger models not to make bots talk better, but to explore this fascinating "emergence". Perhaps continued emergence will finally lead to AGI; perhaps, as Yann LeCun predicts, this road will not work. Either way, now is not the time to focus on defects and polish the product. Perhaps once the boundaries of GPT's capabilities have been clearly explored, developers can come back, build on those capabilities, and find ways to avoid the defects.

It is worth noting that by early 2022, GPT-3 had already shown very successful "emergence", and GPT-4 had even finished pre-training in August 2022, yet only a few professionals were amazed and nothing exploded. It was not until ChatGPT (GPT-3.5) used RLHF (reinforcement learning from human feedback) to tune its output language to feel comfortable to humans that it detonated global attention at the end of 2022.
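RLHF proper involves training a reward model on human preference rankings and then updating the language model with a policy-gradient method, which is far beyond a short sketch. But the core loop can be caricatured as a tiny bandit: the canned responses, reward scores, and REINFORCE-style update below are a hypothetical toy of the idea, not OpenAI's implementation.

```python
import math
import random

random.seed(0)

# Toy "policy": a distribution over three response styles.
responses = ["curt answer", "helpful answer", "rambling answer"]
# Toy "reward model": stands in for learned human preferences.
human_reward = {"curt answer": 0.2, "helpful answer": 1.0, "rambling answer": 0.1}

logits = [0.0, 0.0, 0.0]  # policy starts indifferent

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    i = random.choices(range(3), weights=probs)[0]   # sample a response
    advantage = human_reward[responses[i]] - sum(
        p * human_reward[t] for p, t in zip(probs, responses)
    )
    # REINFORCE: d log p_i / d logit_j = (1 if j==i else 0) - p_j
    for j in range(3):
        grad = (1.0 if j == i else 0.0) - probs[j]
        logits[j] += lr * advantage * grad

probs = softmax(logits)
print(responses[probs.index(max(probs))])  # the preferred style wins out
```

The point of the caricature: pre-training decides what the model *can* say, while this preference loop only reshapes *which* of those outputs it favors, which is why RLHF changed how GPT felt to users without adding new emergent abilities.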

This shows that humans, even professional researchers, are easily swayed by "appearances". Deep learning was the real "emergence" breakthrough, but in the press it was far less sensational than AlphaGo beating humans. The pre-training of GPT-3 and GPT-4 is an R&D framework that lets "emergence" keep appearing, with unlimited potential, yet the outside world cares more about the effect of RLHF.

The same is true for the domestic large models: hundreds are under development, and they should pay attention to GPT-style "emergence". Even if some domestic models are not very large because funding and hardware are limited, exploring model characteristics is still worthwhile, and understanding how emergence happens in GPT may accelerate its occurrence. Do not pay too much attention to the defects of domestic large models; they are unavoidable and may have many causes, such as insufficient corpus preparation, insufficient training time, or algorithmic details. Setting up a large-model R&D framework and exploring the details of "emergence" is always rewarding.

As for the application ecosystem of large models, understanding GPT's "emergence" characteristics may change how you see it. Large models should not be treated as mere chatbots; that limits the imagination. GPT's emergent capabilities (reasoning, mathematics, information compression, multimodality, content generation) open up a new R&D architecture. Like deep learning, it is both an exploratory framework and an application architecture.

Major US IT companies are trying to transform entire software systems with GPT. Large companies will keep scaling GPT up, like an arms race. A more common move is to develop toolchains that bring GPT applications to life, so that developers can join in and apply GPT's emergent capabilities across a wide range of industries. The latter is the area China should focus on more, and where it has advantages.

The author is not worried about the basic capabilities of domestic large models. For example, iFLYTEK's Spark Model 3.0 is considered close to ChatGPT's level, and iFLYTEK plans to launch Spark 4.0, benchmarked against GPT-4, in May 2024. Large-model evaluation is an important research area and its standards are not yet uniform, but it is obvious that domestic large models are progressing rapidly, and the gap with the United States is on the order of two years.

For a given question, if a domestic model's answer is not as good as ChatGPT's or GPT-4's, public opinion pays close attention. In fact, we should care more about "emergence": if the R&D frameworks of Chinese companies can let all kinds of emergence keep occurring, then China and the United States are essentially competing on the same track. Perhaps a US model has achieved level-4 emergence while China can only achieve level 3, and American companies have eliminated more bugs, which makes the gap look very large. As Chinese companies dig deeper into "emergence", the gap will close.

What really matters is the application ecosystem of large models. Without an ecosystem, corporate large-model R&D will eventually be unsustainable; even OpenAI finds large-model development, operation, and maintenance too expensive. If Chinese companies get a large-model application ecosystem going, they can iterate, fixing defects in industry applications in a targeted way, which is what Chinese companies are good at. Some applications can succeed even if the basic technology is not so strong, as long as they grasp the industry's pain points; once applications spread and the ecosystem is built, it in turn drives the improvement of the basic technology.

Anhui's general-AI development plan shows a deep understanding of this: by 2025 it aims to "build abundant intelligent computing power, open up high-quality data, lead the country in general large models and industry large models, stand at the forefront of scenario applications in China, gather a large number of general artificial intelligence enterprises in Anhui, and form a first-class industrial ecosystem".

The history of China's industrial development is itself a process of continuous "emergence". Since 2000, one Chinese industrial miracle after another has arisen. Hefei, the capital of Anhui Province, which the author has been advocating for since 2013, is the fastest-growing city in the world, with its 2022 GDP 37 times that of 2000.

Even after advocating for it for many years, one can still be shocked by the development of Hefei and Anhui. For example, Anhui produced 2.491 million automobiles in 2023, ranking second in the country (Guangdong is far ahead). The general-AI competition was held in Wuhu, where the local automaker Chery produced 1.88 million vehicles in 2023; its own brands and exports have boomed, and it is aiming for 4 million in 2024. Hefei produced 255,000 new energy vehicles in 2022, up 133% year on year, and 746,000 in 2023, up 140%. Hefei's layout is very good: BYD, Volkswagen, and NIO are all there, and the goal is to produce 2 million new energy vehicles in 2025 and 3.4 million in 2027!

Using GPT as an analogy: after China reformed its development mechanism, as if swapping in a Transformer (the word itself means "one that changes"), an incredible industrial "emergence" occurred.

Once you understand GPT's "emergence" and then look at the US government's suppression of Chinese AI, where even the RTX 4090 gaming GPU may not be sold, you can see that the US government is betting that a major breakthrough in general AI is coming. US Secretary of Commerce Raimondo said bluntly that she would slow down the development of Chinese artificial intelligence.

But China is prepared, and companies such as Huawei and iFLYTEK have become the vanguard of the AI industry's struggle against the United States. Because iFLYTEK was put on the US Entity List, it spent half a year adapting to domestic GPUs, and it now leads the country in this respect. Huawei Cloud has built three AI computing centers, one of them in Wuhu, Anhui.

Industrial development has inertia: China has already realized industrial "emergence", and it has advantages in planning and implementing policies for emerging industries. For the development of general artificial intelligence, the author likewise offers his blessing and is optimistic about Anhui.

■ Read more

The U.S.-China auto competition doesn't exist Chen Jing

The computing power of Tsinghua Optoelectronic Fusion Chip is more than 3,000 times that of GPU Chen Jing

No one can "kill" $1.5 trillion Nvidia Chen Jing

To learn from China's endogenous growth model of building a large market, we need to face up to the rise of India's economy Chen Jing

■ Author

Chen Jing

He holds a bachelor's degree in computer science from the University of Science and Technology of China and a master's degree in computer science from the Hong Kong University of Science and Technology; he is a member of the Society for Science and Technology and Strategy and the author of "China's Government-run Economy".
