
Li Guojie: What is behind the miracle? CCCF Picks

Author: CCFvoice

The currently popular claim that "scale is all you need" is a "hypothesis" or an "empirical law"; behind the "miracles" may lie changes in computational models and the emergence of complex systems. The main function of a large model is "guessing plus verification". This is not classical Turing computation but, in essence, uncertain computation based on probability and statistics, and its efficiency in solving complex problems is far higher than that of the Turing machine model. Many new phenomena cannot be explained by the old computational theories; we need to break new ground in the study of computational models and the computational complexity of artificial intelligence.


Artificial intelligence has made an extraordinary breakthrough, described internationally as a "phenomenal breakthrough". In English, "phenomenal" itself means "extraordinary", but the earliest translators of the relevant articles, perhaps with limited English, rendered it literally as the odd Chinese term "phenomenon-level breakthrough" to attract attention, and the term spread on the Internet until it was generally accepted. A phenomenon is supposed to sit one level below essence, yet a "phenomenon-level breakthrough" is said to be the biggest kind of breakthrough, of which there have been only a few in history; the two usages contradict each other. This is an example of how things can go wrong in linguistic transmission. A phenomenal breakthrough actually means a remarkable, extraordinary breakthrough.

In my view, generative artificial intelligence (AIGC) has greatly accelerated humanity's progress toward the intelligent era; the popularization of knowledge automation has become a hallmark of the fourth industrial revolution, and the impact on human society of the emergence of understanding capabilities should not be underestimated. How far has artificial intelligence advanced? Opinions differ. Some "prophets" and media figures believe that the "singularity" is approaching and that humanity is in danger. But most serious AI scholars remain calm, believing that AI is still on the eve of its Galileo (Kepler) era or its Newtonian era.

Generally speaking, one should take a two-sided view of artificial intelligence, a "two-point theory": on the one hand, AI technology has made an unprecedented breakthrough at the application level, which will profoundly affect economic and social development; on the other hand, it is not yet scientifically mature, and in-depth basic research is still needed.

This wave of artificial intelligence is both gratifying and puzzling. The development and application of large models has become an important trend in artificial intelligence, and it has pushed the share of computing in global energy consumption from 3% to 10% in recent years, a share expected to reach 30% or more by 2030. If computing power doubles every 4 months, it will grow by a factor of about a billion in 10 years. This rapidly growing demand for computing power poses a huge challenge to the existing energy system; even the widespread adoption of fusion energy and quantum computing technology would struggle to meet such explosive demand. At present we cannot be sure that large language models (LLMs) are the ultimate direction of AI development, and many scholars still have reservations about them. This article attempts to explore the reasons behind the "miracle wrought by scale" from the perspective of the evolution of computational models, and proposes research directions that experts in computer science need to pay attention to.
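A quick check of the doubling arithmetic mentioned above (a minimal sketch; the 4-month doubling period is the figure quoted in the text, not my own estimate):

```python
# Doubling every 4 months for 10 years gives 30 doublings,
# i.e. roughly a billion-fold increase in computing power.
doublings = (10 * 12) // 4
print(doublings, 2 ** doublings)   # 30 1073741824  (about 1.07e9)
```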

I start with a 2019 essay by Richard Sutton, a blog post said to be required reading for OpenAI employees. Sutton is a distinguished research scientist at DeepMind and is known as the "father of reinforcement learning". In this essay he reaches an important conclusion: "The bitter lesson: AI researchers have repeatedly tried to build what humans know into their agents; in the long run this approach stalls, and the only thing that matters is leveraging computation. Breakthroughs are ultimately achieved by the opposite approach, one based on search and learning. This success is tinged with bitterness because it is achieved not by the human-centric approach but by machine learning."

This lesson has two implications. One is that, traditionally, we have emphasized the importance of knowledge, believing that "knowledge is power"; knowledge is of course still a source of power, but data and computing power are also powerful forces, and their combination can produce new knowledge. The other is that general methods based on search and learning, which scale with computing power, ultimately prevail over approaches that try to build human knowledge directly into machines.

Intelligent technologies such as GPT-4 are essentially the same as the artificial neural network theory of twenty years ago, and their principles can be traced back to the neuronal computational model proposed by McCulloch and Pitts in 1943. Generative technologies such as GPT-4 and Sora have not introduced new AI principles; companies such as OpenAI and Google have largely played the role of engineering amplification. The claim that scale alone will deliver intelligence is therefore, in my view, not a rigorous scientific judgment but a "hypothesis" or an "empirical law", or even a "belief" or a "gamble".

Scholars represented by OpenAI have summarized several "axioms", emphasizing that scale is the magic weapon for victory. These axioms have not stood the test of time like the axioms of Euclidean geometry; they are distillations of decades of research experience and can therefore only be regarded as "hypothetical axioms". The first axiom is the "bitter lesson": none of the specialized techniques in artificial intelligence works as well as general algorithms supported by computing power, so general algorithms backed by powerful computing power (together with models and data) should be regarded as the real direction of artificial intelligence. The second axiom is the scaling law: for such general algorithms one can find a set of general rules, namely that the more data and the larger the model, the better the effect, and this regularity allows performance to be predicted before training. The third axiom is emergence: as scale expands and data increases, large models will exhibit capabilities that did not exist before, capabilities that everyone can see.

Put another way, the first axiom says that large models, large computing power and big data are necessary conditions for artificial general intelligence (AGI); the second axiom says that large scale is a sufficient condition for AGI, that bigger is better; and the third is a test (verification) axiom. The lesson drawn by companies like OpenAI is: if a problem can be solved by scale, do not solve it with a new algorithm; the greatest value of a new algorithm lies in how it helps scale up further. These three axioms are plain-language summaries of experience that have yet to be verified by future practice, and their formulation is not as rigorous as that of mathematical axioms, so for now they can only be regarded as a kind of "belief".
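To make the second axiom concrete, here is a minimal sketch of how a scaling law allows performance to be predicted before a larger model is trained. The power-law form L(N) = a * N^(-alpha) + c follows common practice in scaling-law studies; the data points and fitted numbers below are synthetic, purely for illustration, and do not come from any real model family.

```python
# Minimal sketch of scaling-law extrapolation with synthetic data.
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical (parameter count, validation loss) pairs from small training runs.
params = np.array([1e7, 1e8, 1e9, 1e10])
loss   = np.array([4.2, 3.4, 2.8, 2.3])

# Power-law form commonly used in scaling-law studies: L(N) = a * N**(-alpha) + c
def power_law(n, a, alpha, c):
    return a * n ** (-alpha) + c

(a, alpha, c), _ = curve_fit(power_law, params, loss, p0=(10.0, 0.1, 1.0))

# Extrapolate to a model 100x larger than the biggest one "trained" above.
n_big = 1e12
print(f"fitted exponent alpha = {alpha:.3f}")
print(f"predicted loss at N = {n_big:.0e}: {power_law(n_big, a, alpha, c):.2f}")
```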

In my opinion, the breakthrough in artificial intelligence is due to big data, large models and large computing power together; all three "bigs" are indispensable, and none of them is sufficient on its own. Computing power alone is not a panacea. Take Go as an example: if the Go board is enlarged to 20×20, the computing power required for brute-force search increases by a factor of roughly 10^18, that is, from 3^361 to 3^400, and computing power alone is of no help.
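A back-of-the-envelope check of that ratio (treating the number of board configurations as the brute-force search cost):

```python
# Enlarging the board from 19x19 (361 points) to 20x20 (400 points)
# multiplies the naive configuration space by 3**39, on the order of 10**18.
ratio = 3 ** 400 // 3 ** 361       # = 3**39
print(ratio)                       # 4052555153018976267
print(f"about 10^{len(str(ratio)) - 1}")   # about 10^18
```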

Why do large models work wonders once they are scaled up? The reasons behind this may involve computational models and complex systems, and need to be considered from the standpoint of computational complexity. In computer science, a "problem" refers to a precisely defined class of problems containing many instances, such as the traveling salesman problem (TSP) or the Boolean satisfiability (SAT) problem. The computational complexity of a problem is one of the few invariants in computer science, as important as the conservation of mass and energy in physics: it does not change when the algorithm changes. But this invariance holds only within a single computational model, and most of the computational complexity we talk about today is defined under the Turing model. The same problem may have different computational complexity under different computational models. The most typical example is integer factorization: under the quantum computing model, Shor's algorithm has polynomial complexity, whereas under the classical Turing model no polynomial-time algorithm is known. We usually speak of the equivalence of different computational models only with respect to computability; comparing computational complexity across different models is what should concern us, yet there are still few research results of this kind.
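To make the contrast concrete, here is a minimal sketch of the classical side only (it is not Shor's algorithm): naive trial division takes on the order of 2^(n/2) steps in the bit length n of the input, which is why factoring is treated as intractable on classical machines, while Shor's algorithm runs in time polynomial in n on a quantum computer. The two primes below were chosen only to keep the demonstration fast.

```python
# Classical trial division: ~sqrt(N) divisions, i.e. about 2**(n/2) for an n-bit N.
import math

def trial_division(N: int) -> int:
    """Return the smallest nontrivial factor of a composite N."""
    for d in range(2, math.isqrt(N) + 1):
        if N % d == 0:
            return d
    return N  # N is prime

N = 999983 * 1000003               # a small semiprime, just for illustration
print(f"N has {N.bit_length()} bits; worst case ~2^{N.bit_length() // 2} divisions")
print("smallest factor:", trial_division(N))   # 999983
```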

Many artificial intelligence problems, such as natural language understanding and pattern recognition, used to be recognized as hard problems. Some people say that most AI problems are NP-hard problems of exponential complexity (in layman's terms, NP-hard problems are ones that become infeasible for computers once the problem size is large), but this is only a vague, general statement, with no rigorous definition or proof. The so-called problems that artificial intelligence aims to solve mostly refer to classes of applications, such as face recognition or machine translation. How computationally complex an AI problem really is remains unclear. When an article is translated from English into Chinese, there is no strict definition of what counts as correct or of what it means to complete the task. Existing computational complexity theory has no way to discuss such problems, because any discussion of computational complexity must state clearly what the inputs and outputs are, and the problem to be solved must be strictly defined.
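For comparison, this is the textbook form a question must take before complexity theory can say anything about it; the definition below is generic and is not a formalization of any particular AI task.

```latex
% A decision problem is a language L \subseteq \{0,1\}^*; its time complexity
% is measured in the worst case over all inputs of a given length.
\[
  L \in \mathrm{TIME}\bigl(T(n)\bigr)
  \;\iff\;
  \exists \text{ a Turing machine } M \text{ such that } \forall x \in \{0,1\}^{*}:
  \ M(x) \text{ halts within } O\bigl(T(|x|)\bigr) \text{ steps and }
  \bigl(M \text{ accepts } x \iff x \in L\bigr).
\]
```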

Some people say that large models solve artificial intelligence problems efficiently precisely because of this ambiguity: they do not seek optimal or exact solutions. But computational complexity theory tells us that for some problems, such as the traveling salesman problem, finding even approximate solutions with neural networks remains exponentially complex. Meanwhile, the Institute of Computing Technology of the Chinese Academy of Sciences has used machine learning to design CPU chips fully automatically, with an accuracy as high as 99.99999999999% (thirteen 9s), achieved within 5 hours. Evidently, high efficiency does not require settling for rough approximate solutions.

Nowadays large models are used for machine learning across text, image and video generation, as well as image and speech recognition, machine translation, weather forecasting and so on, and the practical results are far better than those of earlier methods. What exactly is the reason? Compared with the earlier AI methods of logical reasoning and expert systems, what exactly has changed? I think it is the change in the computational model (machine learning is itself a computational model).

Besides the Turing model there are also the lambda calculus, analog computation (continuous computation), quantum computing and so on, and the machine learning everyone is doing now is data-driven Turing computation rather than classical Turing computation. The so-called "Turing machine" does not refer to a machine but to a "process"; the Turing model defines which processes count as computation. The Turing machine has many restrictions: first, all input information must be available in advance, and the machine must be told its entire input before the computation begins; second, the computation process cannot interact with the input source; third, the machine must operate according to a finite set of deterministic rules and halt within finite time; and so on.
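To show how spare the classical model is, here is a minimal sketch of a deterministic single-tape Turing machine simulator; the toy transition table (which simply flips the bits of its input) is invented for illustration.

```python
# Minimal deterministic Turing machine: the entire input is on the tape before
# the run starts, there is no interaction with the outside world, and the
# transition table is finite and fixed.
from collections import defaultdict

# (state, symbol) -> (new state, symbol to write, head movement)
delta = {
    ("scan", "0"): ("scan", "1", +1),
    ("scan", "1"): ("scan", "0", +1),
    ("scan", "_"): ("halt", "_", 0),   # "_" is the blank symbol
}

def run(tape_str: str, max_steps: int = 10_000) -> str:
    tape = defaultdict(lambda: "_", enumerate(tape_str))
    state, head = "scan", 0
    for _ in range(max_steps):         # the simulation itself always terminates
        if state == "halt":
            break
        state, tape[head], move = delta[(state, tape[head])]
        head += move
    return "".join(tape[i] for i in range(len(tape_str)))

print(run("010110"))                   # -> 101001
```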

Turing computation is computation in the strict sense of computer science: the inputs and outputs are fully determined, the result is the same on different machines, and the result is the same today and tomorrow, so its problem-solving ability is constant. But existing machine learning systems interact with the outside world, and their capability grows day by day until it reaches saturation. Figure 1 is taken from Wang Pei's report at the "Science Popularization China Star Forum" on August 24, 2023. The black line indicates that the anticipated embodied AI systems will be more adaptable than today's machine learning systems, interacting directly with the real physical world and learning real-world knowledge and laws, including knowledge that humans have not yet mastered. The red line is the exponential growth of superhuman intelligence predicted by some scholars, whose existence has yet to be verified. The connotation of "computation" has changed: the never-stopping interactive information services on the network and machine learning are no longer Turing computation in the strict sense, yet they still use the same term, which has caused much confusion and controversy.


Fig. 1 Variation of the capabilities of different computational models over time

Von Neumann was the first to recognize that the neuronal model differs from the Turing machine model. He noted: "The Turing machine and the neural network model each represent an important approach to research: the combinatorial approach and the holistic approach. McCulloch and Pitts axiomatically defined the underlying parts to build very complex combinatorial structures, whereas Turing defined the functions of the automaton without referring to specific parts." Von Neumann also made a prediction: "Information theory consists of two major parts: rigorous information theory and probabilistic information theory. Information theory based on probability and statistics is probably more important for modern computer design." Judging by the success of today's large models, von Neumann's prediction has come true. For automata theory, the neuronal model may be more valuable than the Turing model. A neural network does not carry out Turing computation according to a definite algorithm; its main function is "guessing plus verification". Guessing and computing are two different concepts, and a more appropriate name for a neural-network-based machine would be a "guessing machine" rather than a "computer". The essence of the large model is uncertain computation based on probability and statistics, and its efficiency in solving complex problems is far higher than that of the Turing machine model.
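The "guessing plus verification" pattern can be illustrated with a toy example. The sketch below is my own illustration rather than a description of how any particular large model works: a probabilistic "guesser" proposes candidates and a cheap deterministic "verifier" filters them, so correctness comes from the check rather than from a deterministic algorithm.

```python
# Toy "guess plus verify" search: find a subset of numbers that sums to a target.
# The guesser samples random subsets; the verifier merely checks the sum.
import random

def verify(subset: list[int], target: int) -> bool:
    return sum(subset) == target

def guess_and_verify(numbers: list[int], target: int, tries: int = 100_000):
    for _ in range(tries):
        guess = [x for x in numbers if random.random() < 0.5]   # random subset
        if verify(guess, target):
            return guess
    return None                                                  # no success within budget

random.seed(0)
print(guess_and_verify([3, 34, 4, 12, 5, 2, 17, 29], target=24))  # prints a subset summing to 24
```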

The neuronal model was proposed at about the same time as the Turing machine model, and the two have competed for decades. For a long time the Turing machine model prevailed, but Hinton and others never gave up, and it was not until neural-network-based deep learning won the ImageNet image recognition competition in 2012 that it became popular. The Turing machine computational model and the neural network computational model each have their own strengths and weaknesses, and they perform differently in different fields.

It is worth pointing out that in 1948 Alan Turing wrote a paper entitled "Intelligent Machinery", which proposed the concept of "unorganized machines". This was in fact an early model of randomly connected neural networks, and it nearly described the basic principles of today's connectionist AI, including genetic algorithms and reinforcement learning. Because his boss did not endorse it, the article was not published until 2004. It shows that Turing himself was also optimistic about neuronal computational models. Had the academic community seen this paper earlier, the computing world today might look different.

The basic assumption of AI is the Church-Turing thesis, understood as "cognition is computation". In my 1992 article "A Study of the Computational Complexity of Artificial Intelligence", published in the journal Pattern Recognition and Artificial Intelligence, I pointed out: "There are only two ways for artificial intelligence to break out of the circle of toy problems: either accept the Church-Turing thesis, build on the capabilities of existing computers (which differ from Turing machine capabilities by only a polynomial factor), find suitable problem descriptions, and identify tractable problems within artificial intelligence; or reject the Church-Turing thesis and seek a new 'computational' model under which the problems that are easy for the human brain become easy to solve." That judgment has stood the test of time: finding the right problem descriptions and finding new "computational" models remains the main task of the AI community.

Some people have countered my argument by saying that every step executed in today's computers is Turing computation, and that all other computational models are merely "mapped" onto a Turing machine, which then simulates them. This may involve a dialectical relationship between the whole and its parts. The overall process of machine learning is like a winding curve, and each small differential segment of the curve can be viewed as a straight line. That is to say, each concrete step executed by a digital computer today is indeed Turing computation, but the overall process of machine learning is no longer Turing computation. Deeper mysteries may be hidden here. In the field of artificial intelligence we need to study computational complexity afresh, because many new phenomena have emerged that the old theories cannot explain. There is a problem, however: scholars who focus on complexity research tend not to get involved in AI, and scholars who work on AI are often not interested in complexity research. I believe that combining these two fields will lead to breakthroughs in principle.

Von Neumann's posthumously published book Theory of Self-Reproducing Automata pointed out that the core concept of automata theory is complexity, and that new principles emerge from ultra-complex systems. He introduced an important concept: the complexity threshold. Systems that have crossed the complexity threshold and keep evolving through diffusion and mutation at the data level can accomplish difficult tasks. Today's neural network models, with hundreds of billions of parameters, may be approaching the complexity threshold required to handle hard problems. The complexity threshold is a profound scientific question that has not yet received sufficient attention from the academic community; it is not the same thing as model size, and it requires in-depth study.

The business community holds differing attitudes toward large models. Broadly speaking, it can be divided into a "hammer faction" and a "nail faction". The hammer faction is the school of technological faith: it believes in AGI and the scaling law, pursues the generality of models, and holds that the large model is a hammer that can drive any nail. The nail faction believes that the nail must be found before the hammer is of any use, and it puts its faith in business scenarios that can actually be realized. I think both hammers and nails matter, and the two should be combined. Artificial intelligence is still in an exploratory stage, and diversity of technical approaches should be encouraged. The large model is one of the routes proven feasible in practice; it cannot be treated as a dispensable experiment, and we must strive to catch up and achieve breakthroughs in large model technology. At the same time, we should take China's national conditions into account and find our own path of AI development, applying AI technology more widely in materials, medicine, industrial control and other fields to produce tangible economic benefits.

As we explore large models, we may discover new principles about the nature of intelligence, just as physicists discovered new principles about the physical world in the twentieth century. Quantum mechanics was deeply counterintuitive when it was discovered, and the fundamental principles of intelligence, once discovered, may be just as counterintuitive. If an explanation of large models can be fully grasped in a single lecture, the real principle has probably not yet been found.

In 2022, technology companies created 32 important machine learning models while academia produced only three, a stark contrast to 2014, when most AI breakthroughs came from universities. In recent years about 70% of people with AI PhDs have gone into the private sector, up from 21% twenty years ago. The "monopoly" of leading technology companies in AI is becoming more and more serious, and academia faces unprecedented challenges. It is not necessarily in the common interest of humankind for the direction of technological development to be controlled entirely by entrepreneurs and investors; scientists should play their due role in guiding the healthy development of science and technology.

Note: On March 24, 2024, the author gave a report entitled "Progress of Artificial Intelligence and the Evolution of Computational Models" at the "Artificial Intelligence +" Academician Forum held by Pengcheng Laboratory and the Chinese University of Hong Kong, Shenzhen.


Li Guojie

Academician of the Chinese Academy of Engineering, Honorary Chairman of CCF, and former Editor-in-Chief of CCCF.

Special statement: The China Computer Federation (CCF) owns all copyrights to the content published in Communications of the CCF (CCCF). Without CCF's permission, the text and photographs of this journal may not be reproduced; unauthorized reproduction will be regarded as infringement, and CCF will pursue legal responsibility accordingly.
