Overview of General Artificial Intelligence Technology (I)

Author: AI self-sizophistication 2023-06-03 06:08:00

Original article by Dr. Wu, General Artificial Intelligence Alliance (AGI Alliance)

Hello everyone. Starting today, we are sharing an original review of artificial general intelligence (AGI) technology. This short review systematically sorts out the current state of AGI development and collects the most influential cutting-edge results, so it can serve as a primer for the field. The series will run for four sessions, each covering different ground, and we hope it gives you an enjoyable tour of the research frontier.

Note: This article takes the form of slides (PPT) plus speech, so we recommend viewing it on a computer rather than a mobile phone. Each piece of speech sits above the slide it explains. Part of the speech was dictated, so please forgive any lack of rigor.

Then let's get started~

Here is the table of contents. First we analyze the definition and current status of AGI, then introduce its research content and progress from four perspectives: perception technology, cognitive technology, learning methods, and evaluation standards. Finally we summarize the overall state of AGI.

First, we will introduce the definition and current status of AGI

General artificial intelligence technology is a systematic body of technology for building an agent that has general task-solving ability and continuous self-learning ability, together with perception, cognition, decision-making, and planning. Such an agent needs to have the characteristics and level of intelligence of the human brain. The ultimate goal is for the agent to match the intelligence and energy efficiency of the human brain, reaching human level in environmental adaptability, non-domain-specific task processing, learning, cognition and logical thinking, memory, perception, self-driven autonomy, emotion and consciousness, and operating efficiency and energy efficiency.

Many people have long felt that general artificial intelligence is still out of reach. With the recent results from OpenAI and a series of achievements by Google, the call for AGI has grown much louder than before, and many people even hope that GPT-4 already has AGI capabilities. For now, though, we are still on the road toward AGI.

We believe that intelligence itself develops gradually, so our discussion focuses on current technologies that contribute to general intelligence. It is a pragmatic survey of the status quo and of how far we are from general intelligence.

General AI intersects with many related fields. The first is cognitive intelligence, which includes cognitive architecture, memory systems, neuro-symbolic computing, and inductive logic programming. AGI also overlaps with brain-like intelligence, including brain-inspired models, spiking neural networks, computational neuroscience, and bionic learning mechanisms. AGI is related to the dual drive of knowledge and data, including knowledge graphs, world models, and common-sense representation. AGI is also closely linked to self-learning, meta-learning, and online learning. Finally, the main implementation carriers of AGI are usually artificial neural networks and deep learning, especially general-purpose large models and reinforcement learning, which cover multimodal models, open-domain visual processing, and large language models.

Suppose a general-purpose AI agent has been built. What abilities will it show? The main points are listed in the left part of the slide, and the external capabilities that such an AGI agent can achieve are shown in the right part of the slide.

At present, there is already a fairly rich body of AGI-related results. First, in natural language processing, cognition, common-sense processing, and logical reasoning, large language models such as GPT-3, PaLM, LaMDA, and ChatGPT have developed rapidly, and large language models currently hold a leading position in the field of AGI. General-purpose large models have also made good progress in code generation, mathematical problem solving, and answering scientific and medical questions. In general planning, reinforcement learning systems such as MuZero and Gato provide multimodal, multi-task general planning and imitation learning abilities, which makes them a very promising approach to multi-task learning. In cognitive frameworks, brain modeling at the level of cognitive function makes it possible to discuss an artificial realization of the top-level structure of our cognition, which is a very valuable reference for general intelligence.

At a more microscopic level, in the dual drive of knowledge and data, neuro-symbolic methods achieve structured image processing by combining image perception with symbolic processing. In knowledge representation, the AtomSpace approach, which combines global implicit information with local information, is an important reference for knowledge graphs. Memory-augmented networks enable stronger memory mechanisms. Finally, general intelligence evaluation datasets such as the ARC dataset significantly advance the study and evaluation of general intelligence capabilities such as generalization.

In summary, the current technology and current status of AGI include the following points.

At present, the main international research institutions working on AGI include DeepMind, OpenAI, the Allen Institute for AI, and OpenCog, along with a number of neuro-symbolic learning and world model research teams.

The following introduces the main AGI research institutions in China. The list mainly covers work on strong or general artificial intelligence, as well as cognitive technology and general-purpose large models. It includes the following organizations.

The research status of AGI can be summarized as follows. Among domestic organizations, most research focuses on specialized domains plus latent cognition, large language models, understanding of the biological brain, brain-like devices, and so on. Few institutions study AGI theory systematically, and research organization and results in core areas such as generalization, logical thinking, memory, world models, and knowledge representation are still very limited. Among foreign organizations, the core areas of AGI have developed rapidly in recent years. The main line is a series of technologies based on large language models and deep reinforcement learning, which try to solve the core problems of AGI from a macro perspective and have produced clear results. In addition, neural logical reasoning technologies such as neuro-symbolic methods, and memory technologies such as memory-augmented networks, study logical reasoning and knowledge representation mechanisms from a micro perspective. This category is still at the proof-of-principle stage and cannot yet solve real-world problems. Finally, traditional cognitive architecture theory has been developed over many years but is still limited to noise-free symbolic processing in specialized domains, and its combination with the latest deep learning techniques still needs improvement.

The following is the content of AGI's research.

We will introduce it from four aspects: perception, cognition, learning and assessment.

First, we will introduce general perception technology.

General perception mainly addresses three scientific problems. The first is how to build a general multimodal perception mechanism; its core is generality. The second is how to map objective things into a feature space, that is, feature extraction. The third is how to build the mapping and synchronization between the real world and the world model in the brain; its core is the mapping of the world model.

Let's start with the first scientific problem, multimodal fusion perception. Neural networks can now process many modalities: images and video for vision; music and speech for hearing; natural language, code, and formulas for text; graphs and knowledge graphs for networks; and multi-channel sensor signals for senses such as touch and smell. For multimodal fusion, multimodal transformers are the current mainstream approach. For example, BEiT-3 processes modalities both separately and jointly through a set of expert modules made up of multiple FFNs.
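To make the "multiple FFN experts" idea concrete, here is a minimal sketch in plain Python. It is not BEiT-3 code: the names `EXPERTS`, `shared_mixing`, and `multiway_layer` are hypothetical, the weights are hand-set, and the shared self-attention step is replaced by a crude mean-mixing stand-in. It only shows the routing pattern, where every token passes through a shared mixing step and then through the FFN that belongs to its modality (or a fusion FFN when modalities are processed jointly).

```python
def relu(x):
    return [max(0.0, v) for v in x]

def linear(x, weights, bias):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def make_ffn(scale):
    """Build a tiny 2->2->2 feed-forward expert with hand-set weights (illustrative only)."""
    w1 = [[scale, 0.0], [0.0, scale]]
    w2 = [[1.0, 0.0], [0.0, 1.0]]
    zero = [0.0, 0.0]

    def ffn(x):
        return linear(relu(linear(x, w1, zero)), w2, zero)

    return ffn

# One expert per modality, plus one used for fused (joint) processing.
EXPERTS = {"vision": make_ffn(2.0), "language": make_ffn(0.5), "fusion": make_ffn(1.0)}

def shared_mixing(tokens):
    """Stand-in for shared self-attention: add the mean of all tokens to each token."""
    n = len(tokens)
    dim = len(tokens[0][1])
    mean = [sum(vec[i] for _, vec in tokens) / n for i in range(dim)]
    return [(mod, [v + m for v, m in zip(vec, mean)]) for mod, vec in tokens]

def multiway_layer(tokens, fuse=False):
    """Shared mixing, then route each token to its modality expert (or the fusion expert)."""
    out = []
    for modality, vec in shared_mixing(tokens):
        expert = EXPERTS["fusion"] if fuse else EXPERTS[modality]
        out.append((modality, expert(vec)))
    return out

tokens = [("vision", [1.0, 0.0]), ("language", [0.0, 1.0])]
separate = multiway_layer(tokens)          # each modality uses its own FFN
fused = multiway_layer(tokens, fuse=True)  # all tokens share the fusion FFN
```

The design point is that the mixing step is shared across modalities while the feed-forward parameters are not, so the same layer can serve image-only, text-only, and joint inputs.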

Next is the second scientific problem, general feature extraction, which mainly covers open-domain object detection and open-domain object recognition. The main technique at present is contrastive learning. A key advance is the CLIP model, which can compute the similarity between arbitrary images and texts and thereby classify images under new labels in the open domain. Building on this, open-domain object detection methods such as ViLD can be constructed.
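The classification-by-similarity step described above can be sketched as follows. This is a toy illustration, not the CLIP library: the 3-dimensional vectors stand in for real image and text encoder outputs, and `zero_shot_classify`, the label prompts, and the temperature value are all assumptions chosen for the example. The point is only that new labels can be added at inference time by embedding their text and comparing it with the image embedding.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_emb, label_embs, temperature=0.07):
    """Turn image-text similarities into a probability for each candidate label.

    image_emb: embedding of one image (made-up values below).
    label_embs: dict mapping label text to its text embedding.
    temperature: illustrative softmax temperature, not a tuned value.
    """
    labels = list(label_embs)
    sims = [cosine(image_emb, label_embs[name]) / temperature for name in labels]
    return dict(zip(labels, softmax(sims)))

# Toy embeddings standing in for real encoder outputs.
IMAGE_EMB = [0.9, 0.1, 0.2]
LABEL_EMBS = {
    "a photo of a cat": [1.0, 0.0, 0.1],
    "a photo of a car": [0.0, 1.0, 0.0],
    "a photo of a tree": [0.1, 0.2, 1.0],
}

probs = zero_shot_classify(IMAGE_EMB, LABEL_EMBS)
best_label = max(probs, key=probs.get)
```

Adding a fourth label only requires adding one more entry to `LABEL_EMBS`; no classifier head has to be retrained, which is what makes the approach suitable for the open domain.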

The third research topic is the world model. We believe the human brain holds a subjective model of the objective world, and that deduction and decision-making are based on this subjective model, that is, the world model. It interprets the objective world subjectively, as shown in the figure at the upper right. The first image is a low-resolution human face. The second is a cliff of rock that can also be seen as a face in profile. The third is an abstract figure made of two paper clips, in which one person comforts the other. None of these three images can be recognized from the image itself by deep learning feature extraction alone; top-down guidance from the subjective model is needed to bring out the meaning. Similarly, in the detection of occluded vehicles shown in the figure, many cars are 90% occluded, yet we can still judge that there are cars at those locations. For example, when a Tesla drives autonomously, it models the surrounding people and cars from a bird's-eye view and estimates their future movement. These are examples of decision-making based on a subjective model.

The world model can be used for scene modeling and for making decisions about things whose perception is impaired. It can also reconcile the subjective and the objective, that is, fuse top-down and bottom-up reasoning.
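One common way to formalize the fusion of top-down expectation and bottom-up evidence is Bayes' rule. The sketch below is an assumption about how such a fusion could look, not a method stated in the talk, and every number in it is made up. The prior plays the role of the world model ("a car was tracked at this spot a moment ago"), and the likelihoods describe how probable the weak visible fragment is with and without a car present.

```python
def posterior(prior, likelihood_if_present, likelihood_if_absent):
    """Bayes' rule for a binary hypothesis (car present vs. absent)."""
    evidence_present = prior * likelihood_if_present
    evidence_absent = (1.0 - prior) * likelihood_if_absent
    return evidence_present / (evidence_present + evidence_absent)

# Bottom-up evidence from a heavily occluded region (illustrative values).
P_FRAGMENT_IF_CAR = 0.3
P_FRAGMENT_IF_NO_CAR = 0.1

# With no world model, the prior is uninformative.
no_world_model = posterior(0.5, P_FRAGMENT_IF_CAR, P_FRAGMENT_IF_NO_CAR)

# With a world model that expects a car at this location, the same weak
# evidence leads to a much more confident judgment.
with_world_model = posterior(0.8, P_FRAGMENT_IF_CAR, P_FRAGMENT_IF_NO_CAR)
```

Under these toy numbers the same image evidence yields a higher confidence once the top-down expectation is included, which mirrors the occluded-vehicle example.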

As for how to construct a world model, the JEPA model proposed by LeCun uses similarity in a projection (representation) space as the distance measure and learns a prediction function, which lets it better capture the intrinsic similarity between things. As for how to use a world model, behavioral decisions can be generated from it: by predicting multiple future states and candidate action plans, the agent can interact with the real world.
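The idea of measuring distance in representation space rather than input space can be sketched as follows. This is a toy under strong assumptions, not LeCun's architecture: `encode` and `predict` are hypothetical hand-written functions standing in for learned networks, and the "shift" argument stands in for whatever extra information the predictor is given. The energy is low when the predicted representation of the target matches its actual representation.

```python
def encode(observation):
    """Hypothetical encoder: keep coarse structure (mean and range), drop fine detail."""
    return [sum(observation) / len(observation), max(observation) - min(observation)]

def predict(context_repr, shift):
    """Hypothetical predictor: guess the target representation from the context one."""
    return [context_repr[0] + shift, context_repr[1]]

def energy(context_obs, target_obs, shift):
    """Squared distance between predicted and actual target representations."""
    predicted = predict(encode(context_obs), shift)
    actual = encode(target_obs)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual))

context = [1, 2, 3]
compatible_target = [2, 3, 4]     # the context shifted by one
incompatible_target = [10, 0, 5]  # unrelated values

low_energy = energy(context, compatible_target, shift=1.0)
high_energy = energy(context, incompatible_target, shift=1.0)
```

Because the comparison happens after encoding, the predictor does not have to reproduce every input detail, only the parts the encoder chose to keep.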

The following is an example of an agent based on a world model: DreamerV3. It is a general algorithm that uses a single fixed set of hyperparameters for reinforcement learning across a wide variety of tasks, which overcomes the task-specific and complex tuning that traditional reinforcement learning methods require. Its key feature is a world model that predicts the embedded representation of perceptual information tens of steps into the future, with the actor and critic networks learning behavior entirely on abstract sequences generated by this world model. With fixed hyperparameters, the algorithm performs well on more than 150 tasks across multiple domains, and it is the first algorithm to mine diamonds in Minecraft without human data. That task requires hundreds of correct steps in combination over a long horizon and previously required imitation learning, so the result demonstrates the generality and capability of the algorithm.

(The algorithm has three neural networks: the world model predicts future scenarios based on the estimated behavior, the critic network predicts the return of each state under the current actor behavior, and the actor network maximizes the expected return based on the model's state.)
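The division of labor among the three networks can be illustrated with a very small sketch. It is not the DreamerV3 implementation: `world_model`, `actor`, and `critic` are hand-written stand-ins for learned networks, the integer states replace learned latent representations, and the discount and lambda values are illustrative. It shows an imagined rollout that never touches the real environment, followed by bootstrapped lambda-returns of the kind used as learning targets in actor-critic methods.

```python
def world_model(state, action):
    """Hypothetical learned dynamics: returns (next_state, predicted_reward)."""
    next_state = state + action
    reward = 1.0 if next_state == 3 else 0.0
    return next_state, reward

def actor(state):
    """Hypothetical policy: move toward state 3, then stay."""
    return 1 if state < 3 else 0

def critic(state):
    """Untrained critic stub: predicts zero value everywhere."""
    return 0.0

def imagine(start_state, horizon):
    """Roll the actor forward inside the world model only."""
    states, rewards = [start_state], []
    state = start_state
    for _ in range(horizon):
        state, reward = world_model(state, actor(state))
        states.append(state)
        rewards.append(reward)
    return states, rewards

def lambda_returns(rewards, values, gamma=0.9, lam=0.95):
    """Bootstrapped lambda-returns; values has one more entry than rewards."""
    ret = values[-1]
    out = []
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ((1.0 - lam) * values[t + 1] + lam * ret)
        out.append(ret)
    out.reverse()
    return out

states, rewards = imagine(start_state=0, horizon=4)
values = [critic(s) for s in states]
returns = lambda_returns(rewards, values)
```

In a full system the critic would be trained to regress toward these returns and the actor would be updated to increase them; here both are fixed so that the data flow stays easy to follow.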

The topic of the next session is cognitive architecture and cognitive technology based on large models. Please continue to support us~
