Overview of General Artificial Intelligence Technologies (5)

Original article | Dr. Wu, General Artificial Intelligence (AGI) Alliance

Hello everyone. Today we continue our original review of general artificial intelligence (AGI) technology. This short review systematically surveys the current state of AGI development and collects the most influential cutting-edge results, so it can serve as a primer for the field. The series runs for five installments; the previous installments are:

Overview of General Artificial Intelligence Technologies (1) introduces the definition and scope of AGI, research institutions in China and abroad and their current status, multimodal perception, world models, and general feature extraction methods;

Overview of General Artificial Intelligence Technologies (2) introduces cognitive architectures and their evolution, including the components of a cognitive system, cognitive processes, memory structures, and neural-network-based cognitive architectures;

Overview of General Artificial Intelligence Technologies (3) introduces cognitive technologies based on large models, including comparisons with human cognitive abilities and representative large-model work on multi-step logical reasoning, code generation, robot tasks, and more;

Overview of General Artificial Intelligence Technologies (4) introduces learning mechanisms, including curiosity, online/continual learning, neural inductive logic programming, imitation learning, bionic learning, and more.

This installment first covers evaluation methods for general artificial intelligence, then closes with a summary of AGI and an outlook on its future.

Note: This article was written as slides plus spoken commentary, so it is best viewed on a computer rather than a phone. Each piece of commentary sits above the slide it explains. Parts of this draft are personal opinion, so please excuse any lack of rigor.

Let's get started.

In this part, we address two scientific questions: first, how to define general artificial intelligence, and second, how to benchmark it.

In his paper "On the Measure of Intelligence", François Chollet defines intelligence as a learner's efficiency at turning its priors and past experience into new skills on novel tasks that involve uncertainty and require adaptation. In short, intelligence is measured as skill-acquisition efficiency: the less new-task experience a system needs to reach a desired skill level, the more intelligent it is. This measurement must be made relative to a clearly defined task scope, a stated set of priors, and a skill threshold for the target tasks. The threshold is the level of performance that counts as solving a new task, for example 90% accuracy rather than perfection.

Put simply, the minimum new-task experience needed to reach the desired skill level corresponds to the number of new-task samples required during transfer learning, or to the amount of interaction with the environment; the smaller this number, the more efficient the skill acquisition. GPT-3 and its derivatives can complete many new tasks from only a few examples (few-shot) or even none (zero-shot), so their skill-acquisition efficiency is relatively high. Traditional methods that must be fine-tuned on each new task, or even trained at massive scale, are correspondingly less intelligent by this measure. The figure on the left illustrates this: a system is trained on a known region of scenarios, and the two figures on the right show the region of unknown scenarios in which it can still operate. The larger that region, the more intelligent the system; a more intelligent system needs only a small known region to cover a large unknown one.
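To make "minimum new-task experience needed to reach a skill threshold" concrete, here is a small Python sketch. It is a deliberately simplified proxy, not Chollet's formal definition (which is stated in terms of algorithmic information theory and also weighs priors and generalization difficulty). Given a learning curve that maps the number of new-task examples to held-out accuracy, it returns the smallest budget that reaches the threshold. The function name and all learning-curve numbers are hypothetical and purely illustrative.

```python
from typing import Dict, Optional


def min_examples_to_threshold(curve: Dict[int, float], threshold: float = 0.9) -> Optional[int]:
    """Smallest number of new-task examples whose held-out accuracy reaches `threshold`.

    `curve` maps an example budget (0 = zero-shot) to held-out accuracy on the new task.
    Returns None if no budget in the curve reaches the threshold.
    """
    for budget in sorted(curve):
        if curve[budget] >= threshold:
            return budget
    return None


# Made-up learning curves for two hypothetical systems (illustrative, not measurements).
few_shot_learner = {0: 0.55, 1: 0.80, 4: 0.92, 16: 0.95}
fine_tuned_model = {0: 0.10, 4: 0.30, 64: 0.70, 1024: 0.91}

for name, curve in [("few-shot learner", few_shot_learner), ("fine-tuned model", fine_tuned_model)]:
    print(name, min_examples_to_threshold(curve))
```

Under this proxy, the system that reaches the threshold with fewer new-task examples counts as more efficient at acquiring the skill; returning `None` marks a system that never reaches the threshold within the tested budgets.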

This measure has several characteristics. First, it requires a scope, that is, a defined task domain: the set of new tasks the system is transferred to. Performance must be measured on these new tasks, not on the known tasks the system was trained on, because what is being measured is the ability to generalize. The measure also rewards reaching a given level of performance with fewer samples or less experience: a system cannot buy intelligence with massive training data, because Chollet regards intelligence as an ability rather than something obtained in exchange for data. Hence the emphasis on few-shot and even zero-shot settings.

Second, generalization is graded by difficulty into local generalization, broad generalization, and extreme generalization. The setup described above, training in a known region and validating in that same region, is local generalization; the measure emphasizes broad and extreme generalization, that is, the ability to handle unknown tasks.

Third, it requires a skill threshold: a new task need not be solved 100%; reaching, say, 95% may be enough, and that level then serves as the threshold for measurement.

Finally, priors must be accounted for. Priors are the knowledge or skills built into the evaluated system in advance. Fewer priors are like fewer axioms in a theory: a system that achieves the same result with fewer priors is more flexible and adaptable, because it needs less information up front, and is therefore more intelligent. Participants must accordingly declare which priors they use, and these priors should ideally resemble humans' innate core knowledge, such as the concepts of high and low, size, and good and bad.

This generalization-centered measure departs significantly from existing ways of evaluating intelligence. Existing benchmarks usually report performance on a fixed task, which reflects skill at that task rather than adaptability to new environments and new tasks. Chollet's measure shifts the focus of intelligence evaluation from local generalization to broad and extreme generalization.

We use the ARC (Abstraction and Reasoning Corpus) dataset as an example of a generalization benchmark. ARC is extremely simplified on the perception side, consisting essentially of colored cells on grids, which puts the focus on logical thinking and rule discovery. It contains many tasks, but each task provides only a handful of examples, generally fewer than four, so there is little to train on and solving the tasks depends mainly on prior knowledge.

Take the leftmost example below; you can try it yourself first and see whether you can find the pattern. After the three demonstration examples, the dataset gives a new input grid on the left and asks the agent to produce the corresponding output grid on the right. Solving it requires attending to the colors and positions of the squares and having a notion of topological expansion. When we look at these grids, we draw heavily on prior knowledge and inductive reasoning to discover the underlying rule, whereas modern deep learning has great difficulty discovering such rules, so it scores poorly on this test. The dataset's defining traits are many tasks, few samples, heavy reliance on reasoning, and rich priors, which makes it an even stronger test of generalization.
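ARC tasks are distributed as JSON files, each with `train` and `test` lists of `input`/`output` grid pairs, where a grid is a list of rows of integers 0 to 9 denoting colors. The sketch below uses a made-up toy task in that layout (not the task from the figure) and a hypothetical four-rule library standing in for priors. It searches for a rule consistent with every demonstration pair and applies it to the test input. Real ARC solvers need far richer rule spaces; this only illustrates the "induce a rule from a few examples" setting.

```python
import json
from typing import Callable, Dict, List, Optional

Grid = List[List[int]]

# Hypothetical toy task in ARC's JSON layout: the hidden rule is a left-right mirror.
TASK_JSON = """
{"train": [
   {"input": [[1, 0, 0], [2, 0, 0]], "output": [[0, 0, 1], [0, 0, 2]]},
   {"input": [[0, 3], [4, 0]], "output": [[3, 0], [0, 4]]}
 ],
 "test": [{"input": [[5, 6, 0]]}]}
"""

# A tiny hand-written library of grid transformations, playing the role of priors.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": lambda g: [row[:] for row in g],
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: [row[:] for row in g[::-1]],
    "transpose": lambda g: [list(col) for col in zip(*g)],
}


def find_rule(task: dict) -> Optional[str]:
    """Return the first candidate rule that reproduces every demonstration pair."""
    for name, fn in CANDIDATES.items():
        if all(fn(pair["input"]) == pair["output"] for pair in task["train"]):
            return name
    return None


task = json.loads(TASK_JSON)
rule = find_rule(task)
prediction = CANDIDATES[rule](task["test"][0]["input"]) if rule else None
print(rule, prediction)
```

Because every rule must explain all demonstration pairs, even two examples are enough here to rule out the other candidates; with a larger rule library, more ambiguity remains and the choice of priors matters more.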

Meanwhile, the rapid progress of large language models in recent years has spawned a series of evaluation suites, such as BIG-bench. BIG-bench combines 204 language tasks and is still growing; as the figure below shows, it covers a wide range of topics and languages. It is mainly used to evaluate language models in zero-shot and few-shot settings. The figure also shows the capability categories it measures and the number of tasks in each, which span logical reasoning, common sense, programming, reading comprehension, mathematics, and many other facets of intelligence.
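Zero-shot and few-shot evaluation differ only in how many solved examples are placed in the prompt before the query. The sketch below assumes each example is a dict with `input` and `target` fields, loosely following the layout of BIG-bench's JSON tasks; the `Q:`/`A:` prompt template, the function names, the example questions, and the exact-match scoring are illustrative choices, not BIG-bench's own code or metrics.

```python
from typing import Dict, List


def build_prompt(examples: List[Dict[str, str]], query: str, k: int) -> str:
    """Prepend k solved examples (k = 0 gives a zero-shot prompt) before the query."""
    shots = [f"Q: {ex['input']}\nA: {ex['target']}" for ex in examples[:k]]
    return "\n\n".join(shots + [f"Q: {query}\nA:"])


def exact_match(prediction: str, target: str) -> bool:
    """Case- and whitespace-insensitive exact-match scoring."""
    return prediction.strip().lower() == target.strip().lower()


# Hypothetical examples for illustration.
examples = [
    {"input": "What is the opposite of hot?", "target": "cold"},
    {"input": "What is the opposite of up?", "target": "down"},
]
print(build_prompt(examples, "What is the opposite of big?", k=0))
print("---")
print(build_prompt(examples, "What is the opposite of big?", k=2))
```

Keeping the model and the scoring fixed while varying only `k` is what lets such suites compare zero-shot and few-shot ability on the same task.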

Mathematical and commonsense reasoning also have dedicated datasets. Grade School Math (GSM8K) consists of diverse grade-school math word problems that typically take two to eight calculation steps to solve, as in the example shown. CommonsenseQA is a commonsense-centered multiple-choice dataset with more than 10,000 examples; answering its questions relies on previously learned commonsense knowledge applied to the given context. In the example below, the question is: "Where on a river can you hold a cup upright to catch water on a sunny day?" Commonsense reasoning leads to the answer: a waterfall.
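GSM8K reference solutions end with a final line of the form `#### <answer>`, so a common way to score a model is to compare that number with the number in the model's output. The sketch below follows that convention; the word problem is made up for illustration, and the "last number in the output wins" extraction rule is a simplifying assumption (evaluation harnesses often prompt the model for a fixed answer format instead).

```python
import re
from typing import Optional

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def gold_answer(solution: str) -> str:
    """GSM8K-style solutions end with '#### <answer>'; return the answer with commas removed."""
    return solution.split("####")[-1].strip().replace(",", "")


def predicted_answer(model_output: str) -> Optional[str]:
    """Take the last number in the model's output as its answer (a simplifying assumption)."""
    matches = NUMBER.findall(model_output.replace(",", ""))
    return matches[-1] if matches else None


def is_correct(model_output: str, solution: str) -> bool:
    pred = predicted_answer(model_output)
    return pred is not None and float(pred) == float(gold_answer(solution))


# Hypothetical problem: "Tom buys 3 boxes of 12 pencils and gives away 5. How many are left?"
solution = "3 boxes hold 3 * 12 = 36 pencils.\nAfter giving 5 away, 36 - 5 = 31 remain.\n#### 31"
print(is_correct("He has 36 pencils, then 36 - 5 = 31. The answer is 31.", solution))  # True
print(is_correct("The answer is 30.", solution))  # False
```

Scoring only the final number makes evaluation cheap and unambiguous, at the cost of ignoring whether the intermediate steps were correct.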

For logical reasoning there are also several evaluation datasets: ParaRules for rule-based reasoning, Com2Sense for judging logical consistency, StrategyQA for implicit multi-step reasoning, LogiQA, whose logic problems are drawn from China's National Civil Service Examination, and AR-LSAT for analytical reasoning.

For general commonsense reasoning there are also datasets such as ProtoQA, CLUTRR, CODAH, RICA, PIQA, TIMEDIAL, ReClor, and others.

For mathematics and theorem proving, there are datasets such as SVAMP, MATH, IsarStep, and HOList.

That concludes the discussion of evaluation. Next, we summarize the overall characteristics of AGI technology and look ahead.

An AGI system is a complex, multi-level system, which we can summarize from the following perspectives. At the top are its application goals, such as sensing and conversion, creativity, agent control, and understanding brain mechanisms. Its behavioral layer includes online learning, lifelong and continual learning, active learning, generalization, and similar behaviors.

These high-level behaviors must be supported by a cognitive architecture comprising perception, memory, mental control, and reasoning and output. Perception covers multimodal information processing (extracting information from vision, language, and sound), synchronous updating of the world model, feature extraction, intuitive physical common sense, and so on.

The other major component is memory, which divides into long-term and short-term memory. Long-term memory includes declarative memory (episodic and semantic memory) and non-declarative memory (procedural memory, perceptual feature memory, etc.); short-term memory includes working memory. On the implementation side, there are several artificial memory designs, including the differentiable neural computer (DNC), Token Turing Machines, the Transformer, and memory-augmented networks. We also need to study how memory is read and written, as well as mechanisms for forgetting and review. On the control side, the agent needs curiosity, interest, self-awareness, emotion, concentration, and so on; for reasoning and learning, it needs logical reasoning, learning, planning, strategy search, and related abilities, plus output capabilities. In summary, an AGI system has cognitive abilities such as perception, memory, control, reasoning, and execution, which give rise to intelligent behavior and applications.
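Reading and writing memory by content, as in the Neural Turing Machine and the DNC, can be sketched in a few lines: a key is compared with every memory row by cosine similarity, a sharpened softmax (strength `beta`) turns the similarities into weights, a read returns the weighted sum of rows, and a write first erases and then adds content in proportion to those weights. This is a simplified, pure-Python illustration of content-based addressing only; the DNC's usage-based allocation and temporal link matrix are omitted, and Token Turing Machines use a different, token-summarization read/write scheme. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def content_weights(memory: List[Vector], key: Vector, beta: float) -> List[float]:
    """Softmax over beta-scaled cosine similarities between the key and each memory row."""
    scores = [beta * cosine(key, row) for row in memory]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def read(memory: List[Vector], weights: List[float]) -> Vector:
    """Weighted sum of memory rows."""
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(len(memory[0]))]


def write(memory: List[Vector], weights: List[float], erase: Vector, add: Vector) -> List[Vector]:
    """NTM-style write: erase each row in proportion to its weight, then add new content."""
    return [
        [m * (1 - w * e) + w * a for m, e, a in zip(row, erase, add)]
        for row, w in zip(memory, weights)
    ]


memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = content_weights(memory, key=[1.0, 0.0], beta=10.0)
print([round(x, 3) for x in w], read(memory, w))
```

A larger `beta` concentrates the weights on the best-matching row; a smaller one blends several rows, which is one way such models trade precise recall against generalization.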

At the network-construction level, AGI systems take many forms, such as artificial neural networks and brain-inspired models, assembled from corresponding building blocks. At the level of micro learning mechanisms, there is global learning, represented by backpropagation, and bionic learning, represented by plasticity and local learning rules. There are also learning paradigms such as transfer learning, meta-learning, reinforcement learning, imitation learning, reverse learning, and neural inductive logic programming. In addition, AGI has an evaluation system based on multi-task performance and generalization. In summary, AGI systems feature a variety of neural network structures and building blocks, diverse learning mechanisms, and generalization-based evaluation methods.

Finally, let us look at the trends and future of AGI. How will a general artificial intelligence system architecture be realized, and when will the era of AGI arrive? Which path toward AGI currently looks most promising? Is there an internationally recognized roadmap?

Here we should focus on foundation models. For a detailed review, see "On the Opportunities and Risks of Foundation Models", written jointly by more than 100 researchers at Stanford University. It explains in great detail the concepts of foundation models; their capabilities (such as language, vision, robotic manipulation, reasoning, and human interaction); their technical principles (model architecture, training, data, systems, security, evaluation, and theory); their applications; and their social impact. The paper highlights homogenization (roughly, convergence onto a unified approach) and emergence: such models exhibit capabilities that the original single-task models lack. For these reasons, and given the industrial influence of today's large models and their state-of-the-art (SOTA) results, we believe convergence toward foundation models is likely the main current trend and a strong catalyst for AGI.

For example, MetaLM from Microsoft Research Asia is one realization of a foundation model. It is a general-purpose interface model built as a semi-causal Transformer: a unidirectional causal decoder that can attach multiple bidirectional, non-causal encoders. It can connect to various large models and fuse them, and it suits a range of tasks, including natural language processing and multimodal processing.
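One way to picture the semi-causal idea is as an information-flow mask: a position may use information from every earlier position (causal), and, inside a span handled by a bidirectional encoder, also from later positions in the same span. The sketch below builds that boolean mask. It is a simplified illustration under that assumption, not MetaLM's implementation, which runs separate encoders and connects their outputs to the causal decoder; the function name and the half-open span convention are hypothetical.

```python
from typing import List, Sequence, Tuple


def semi_causal_mask(seq_len: int, spans: Sequence[Tuple[int, int]]) -> List[List[bool]]:
    """mask[i][j] is True when position i may use information from position j.

    `spans` are half-open [start, end) ranges assumed to be encoded bidirectionally.
    """
    span_id = [-1] * seq_len  # -1 marks positions handled only causally
    for sid, (start, end) in enumerate(spans):
        for pos in range(start, end):
            span_id[pos] = sid
    return [
        [j <= i or (span_id[i] != -1 and span_id[i] == span_id[j]) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Positions 1-3 form one bidirectional span inside a 6-token sequence.
for row in semi_causal_mask(6, [(1, 4)]):
    print("".join("1" if allowed else "." for allowed in row))
```

Outside the span the mask is an ordinary lower-triangular causal mask, which is what lets the decoder still generate text left to right.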

Beyond foundation models, large models have notable weaknesses in precise calculation and precise knowledge-based question answering, so they need to be paired with an exact knowledge-and-computation engine. Wolfram|Alpha is a computational system built on symbolic representation and an exact computation language, and it can substantially improve the accuracy of large models such as ChatGPT on problems that require exact numerical or logical answers, such as math problems. Combining statistical and symbolic methods can therefore be seen as a viable direction for future development.
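The statistical-plus-symbolic combination can be sketched as a router: queries that parse as plain arithmetic go to an exact evaluator, and everything else falls back to the language model. In the sketch below, Python's `ast` module and `fractions.Fraction` stand in for an exact computation engine; it does not call Wolfram|Alpha, and the `llm` callable is a hypothetical placeholder for a model call.

```python
import ast
import operator
from fractions import Fraction
from typing import Callable

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def exact_eval(expr: str) -> Fraction:
    """Evaluate +, -, *, / over integer literals exactly, using rational arithmetic."""
    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")
    return walk(ast.parse(expr, mode="eval").body)


def answer(query: str, llm: Callable[[str], str]) -> str:
    """Route arithmetic to the exact engine; anything else (or a failed evaluation) goes to the model."""
    try:
        return str(exact_eval(query))
    except (SyntaxError, ValueError, ZeroDivisionError):
        return llm(query)


fake_llm = lambda q: f"[LLM answer to: {q}]"  # hypothetical stand-in for a model call
print(answer("1/3 + 1/6", fake_llm))
print(answer("123456789 * 987654321", fake_llm))
print(answer("Who wrote Hamlet?", fake_llm))
```

Rational arithmetic returns `1/2` rather than a rounded float, which is the kind of exactness a statistical model alone does not guarantee.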

Beyond the foundation-model architecture itself, considerable additional work is needed on understanding the world, memory, and imitation. Examples we introduced earlier include combining large models with memory (Token Turing Machines), large models with reinforcement learning (SayCan), and world models with reinforcement learning (DreamerV3), as well as imitation learning on reinforcement-learning data (Gato); these are highly competitive architectural references for AGI.

We can also look to the biological brain to see what current technology has not yet addressed. The major parts of the human brain roughly divide labor as follows. Frontal lobe: mainly higher motor control; prefrontal cortex: recent memory and information integration. Parietal lobe: mainly higher sensory processing and body perception. Temporal lobe: mainly hearing, smell, face recognition, and emotion. Occipital lobe: mainly higher vision. Cerebellum: mainly motor coordination. Brainstem: mainly the body's basic control functions. Dr. Alan D. Thompson, creator of the AI analysis resource site LifeArchitect.ai, believes the areas with the least progress are the parietal lobe and integrated information processing across multiple brain regions, followed by the occipital lobe, the cerebellum, and complex strategic planning in the prefrontal cortex, while functions associated with the prefrontal cortex, temporal lobe, and brainstem are well developed. The author finds this view interesting but considers its summary of the current state neither very accurate nor comprehensive.

In addition, many complex aspects of human mental behavior are not yet covered, including but not limited to the following. Contemplation may relate to long-range control, world models, episodic memory, and logical cognition. Self-awareness may relate to long-range control, world modeling, and logical cognition. Purpose and intention relate to curiosity, long-range control (attention), and logical cognition. Will, desire, and interest relate to long-range control and logical cognition. Agents also have emotions and feelings, which relate to neuromodulation (e.g., dopamine) and long-range control (mainly state regulation); similar regulation can also shape macro-level decision-making, for example making an agent bolder in taking action. In all of these, long-range control and neuromodulation play an important role.

In summary, we propose the following neural-network-based AGI architecture. First, it is a class of AGI model that combines large models with a strong rule system. Large language models can perform experience-based, imaginative logical reasoning and creation, but their ability to relate context is limited because they have a fixed context window, and their instability and stream-of-consciousness character remain. A long-range control system is therefore needed to provide binding, including rule binding. This system need not be very flexible, but it must sustain long-range, stable deep thinking, or control logic, and provide macro-state regulation similar to neuromodulation.
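The fixed-context-window limitation, and the idea of a control layer that keeps rules bound however long an interaction runs, can be illustrated with a toy context manager. In this hypothetical sketch (the class and method names are illustrative, and whitespace-split words stand in for model tokens), pinned rules are always placed at the front of the prompt, while conversation turns that no longer fit in the remaining budget are moved to an archive that a separate memory model could index. It sketches the constraint only, not the architecture proposed above.

```python
from collections import deque
from typing import Deque, List


class BoundedContext:
    """Toy prompt builder: pinned rules always stay; old turns are archived when over budget."""

    def __init__(self, pinned_rules: List[str], max_tokens: int):
        self.pinned = pinned_rules
        self.max_tokens = max_tokens
        self.window: Deque[str] = deque()
        self.archive: List[str] = []  # evicted turns, e.g. for a long-term memory store

    @staticmethod
    def _cost(text: str) -> int:
        return len(text.split())  # crude whitespace "token" count

    def _budget(self) -> int:
        """Tokens left for conversation turns after the pinned rules."""
        return self.max_tokens - sum(self._cost(r) for r in self.pinned)

    def add(self, turn: str) -> None:
        self.window.append(turn)
        # Evict the oldest turns until the window fits; pinned rules are never evicted.
        while self.window and sum(self._cost(t) for t in self.window) > self._budget():
            self.archive.append(self.window.popleft())

    def render(self) -> str:
        return "\n".join(self.pinned + list(self.window))


ctx = BoundedContext(["GOAL: stay on task"], max_tokens=10)
for turn in ["a b c", "d e f", "g h"]:
    ctx.add(turn)
print(ctx.render())
print("archived:", ctx.archive)
```

The design choice being illustrated is that the rules live outside the sliding window, so they cannot be pushed out by new input the way ordinary context can.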

On this basis, a world model is integrated to take in perception of unstructured data, while a memory model stores the agent's historical experience and new knowledge and rules. Interaction with and output to the objective world are realized through external data exchange, precise command control, and communication with people. For learning, the system should combine multiple mechanisms, including ultra-large-scale pre-training, reinforcement learning, continual learning, bionic learning, and imitation learning, to build an open system capable of lifelong learning, with a closed, tight interaction loop between the agent and its external environment, leading toward a more comprehensive and ideal AGI system.

That concludes this overview of general artificial intelligence technologies.
