
Overview of General Artificial Intelligence Technologies (IV)

Original article | Dr. Wu, AGI Alliance (General Artificial Intelligence Alliance)

Hello everyone. Today we continue our original review of artificial general intelligence (AGI) technology. This short survey systematically examines the current state of AGI development and collects the most influential frontier results, and can serve as a primer for the field. The series runs for five issues; this issue focuses on human-like learning mechanisms.

Note: This article takes the form of slides plus commentary, so a computer screen is recommended over a phone for viewing; each passage of commentary appears above the slide it explains. Also, since part of the commentary reflects personal opinion, please forgive any lack of rigor.

First of all, "learning" is not the same as "training": today we study learning in the broad sense, not the narrow craft of tuning models that practitioners jokingly call "alchemy". With that, let's get started~


Let us first observe the characteristics of learning, starting with the conditions under which it occurs. Human-like learning takes place in an online, streaming environment: the agent learns while being used, that is, while it works, hence "online learning". The second feature is streaming data: unlike current deep learning, which first builds a massive dataset and then trains, human-like learning depends on serial, streaming data from the real world. Next, learning is driven by external stimuli and by curiosity; that is, motivation comes from two sources, external stimulation and the agent's own curiosity. Children, beyond reacting to external stimuli, do exploratory things out of their own curiosity, and both behaviors are key to the evolution of an agent.

The second part is how learning happens, that is, the learning process. The first category is the acquisition and combination of meta-skills. If a task such as surgery or stir-frying is broken down into hundreds of steps or more, each step is a skill; these skills must be acquired through practice, and the agent can also combine them into complex behavioral mechanisms. The acquisition and combination of these meta-skills is where learning takes place.

The second category is imitation learning, through which humans quickly learn skills and their combinations; it is the main channel of human knowledge inheritance.

The third category is biomimetic learning mechanisms, including neural modulation methods, synaptic plasticity mechanisms, etc., which can be used to update synaptic weights and reconstruct the structure of networks, and these learning mechanisms are the underlying mechanisms for the construction and renewal of neural networks.

The fourth category is memory. We regard the memory mechanism as a special type of learning, because humans usually make decisions through past experience: we guide current decisions with important but very-few-sample historical experiences. We do not broadly update the structure or weights of the neural network; we simply remember the situation and the strategy taken, and reapply it. So memory is a special kind of learning, one of taking note, and an important mode of operation.

Finally, in terms of effect, the results of learning mainly serve adaptation to new environments, handling of new tasks, and continual self-updating, that is, lifelong learning.


Because learning spans so many topics, we focus on the following six core scientific questions:

Scientific question 1: How to achieve lifelong, continual, online learning, with rapid adaptation to new environments and new tasks;

Scientific question 2: How to achieve self-driven learning based on curiosity and subjective initiative;

Scientific question 3: How explicit and implicit knowledge are acquired and applied;

Scientific question 4: How to acquire knowledge and skills through imitation learning in a general and efficient manner;

Scientific question 5: How to use memory to aid learning and decision-making;

Scientific question 6: What lessons do biological learning mechanisms hold for the learning of agents?


First, let's discuss scientific question 1: how to achieve lifelong, continual, online learning, with rapid adaptation to new environments and new tasks.

Here we need to introduce the concept of the lifelong learning machine. The human brain can be seen as a lifelong learning system: from birth until today, we have kept updating ourselves according to the needs of life, work, and school. Current agents usually operate in a pretraining + fine-tuning + deployment pipeline, with limited capacity for further updates, so we need to examine lifelong learning. Lifelong learning never ends: performance grows with experience, and appropriate learning mechanisms are applied within the available compute and storage resources. Lifelong learning also subsumes the concepts of transfer learning and continual learning. Specifically, it includes the following aspects:

First, transfer and adaptation: the ability to transfer knowledge to a new environment and adapt to it using few-shot learning and meta-learning, where meta-learning means learning how to learn concepts faster.

Second, overcoming catastrophic forgetting: acquiring new knowledge without forgetting old knowledge.

Third, exploiting task similarity to aid learning, together with the primitive and compositional structure of skills, so that skills learned on old tasks transfer to new tasks and vice versa (forward/backward transfer).


Fourth, task-change detection: the agent must detect changes of task during training and identify old tasks similar to the new one.

Fifth, noise tolerance: most sensor data differs from optimized, cleaned training-set data; it is dirtier and more exposed to environmental change, and the agent must adapt to such dirty training data;

Sixth, resource efficiency and sustainability: learning must not impair real-time inference, and memory growth must be kept under control;

Seventh, task-free learning (autonomy), including self-supervised learning and curiosity-driven learning.
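To make the "overcoming catastrophic forgetting" requirement concrete, here is a hypothetical toy sketch of an EWC-style (Elastic Weight Consolidation) quadratic penalty, a well-known regularization approach; the scalar "model", Fisher values, and learning rates are invented for illustration and are not from the article.

```python
# Hypothetical illustration of protecting old-task knowledge with an
# EWC-style penalty. A weight important to the old task (high Fisher
# value) is anchored; an unimportant weight remains free to adapt.

def sgd_step(params, grads, old_params, fisher, lr=0.01, lam=1.0):
    """One step on the new-task gradient plus the gradient of the penalty
    0.5 * lam * sum_i F_i * (theta_i - theta_i_old)^2."""
    return [p - lr * (g + lam * f * (p - p0))
            for p, g, p0, f in zip(params, grads, old_params, fisher)]

old_params = [1.0, 1.0]        # weights after learning the old task
fisher = [100.0, 0.0]          # first weight is important to the old task
grads = [1.0, 1.0]             # the new task pushes both weights down
params = list(old_params)
for _ in range(150):
    params = sgd_step(params, grads, old_params, fisher)
print(params)   # the protected weight stays near 1.0; the free one moves
```

The point of the sketch is the trade-off itself: the penalty lets the agent keep acquiring new knowledge while weights that encode old knowledge resist being overwritten.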


Of these areas, machine learning has focused on and implemented a subset. We first look at online learning, a class of methods that learn while in use, updating continuously rather than training once and then deploying. A major research direction here is coping with tasks that change over time in a streaming setting: task changes must be detected and the model updated for the new task, using only a small number of samples and taking effect immediately.

Another area is continual learning, one of whose research directions is overcoming catastrophic forgetting. Current solutions include complementary learning systems: short-term rapid learning in a temporary memory (episodic or working memory), long-term learning through changes in network structure and synaptic parameters, and replay mechanisms that reproduce information and convert it into long-term learning, similar to the complementary system formed by the hippocampus and the cerebral cortex.
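The replay idea can be sketched in a few lines. This toy uses reservoir sampling as the fast "hippocampal" buffer and only demonstrates the data flow; the class names, capacity, and two-task stream are invented for the example, and no real learner is attached.

```python
import random

# Toy sketch of rehearsal-based continual learning: a small bounded replay
# buffer keeps a sample of the experience stream, so old-task data can be
# interleaved with new-task data when updating a slow "cortical" learner.

class ReplayBuffer:
    def __init__(self, capacity=50):
        self.capacity, self.items, self.seen = capacity, [], 0

    def add(self, item):
        # Reservoir sampling: an unbiased stream sample under fixed memory.
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return random.sample(self.items, min(k, len(self.items)))

random.seed(0)
buf = ReplayBuffer()
for t in range(200):                  # stream of task-A experience
    buf.add(("A", t))
for t in range(200):                  # task B arrives; A is still rehearsed
    buf.add(("B", t))
    mixed_batch = buf.sample(8)       # fed to the slow learner each step
old_kept = sum(1 for label, _ in buf.items if label == "A")
print(len(buf.items), old_kept)
```

Because the buffer stays at a fixed size, memory growth is bounded (the sustainability requirement above), while old-task items keep appearing in every training batch.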

It is worth pointing out that these machine-learning concepts are scaled-down or incomplete versions of their biological counterparts; sometimes they barely scratch the surface of the phenomena involved, and the same applies to many of the concepts below.


Let's talk about the second scientific question, which is how to achieve self-driven learning based on curiosity and initiative.

Curiosity is a core driver of human learning, especially in children. In reinforcement learning there are already results on intrinsic motivation, or curiosity-driven learning. A typical example is OpenAI's curiosity-driven reinforcement learning agent, which achieves excellent performance on 54 Atari games, Super Mario, and robot control. Its core is a curiosity signal that uses prediction error as the reward: when the external environmental reward is very sparse or hard to design, internal rewards drive active exploration of the environment. The results show that the intrinsic curiosity objective performs similarly to hand-designed extrinsic rewards, and both learn very well.
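The prediction-error idea can be shown with a one-parameter forward model; everything here (the linear "dynamics", the learning rate) is invented for illustration. The model's squared error on the next observation is paid out as intrinsic reward, and the reward fades as the world becomes predictable.

```python
# Minimal sketch of prediction-error curiosity: surprise before learning
# is the intrinsic reward; a gradient step then improves the prediction.

class ForwardModel:
    def __init__(self, lr=0.5):
        self.w = 0.0          # predicts next_state = w * state
        self.lr = lr

    def intrinsic_reward(self, s, s_next):
        err = (self.w * s - s_next) ** 2                    # surprise
        self.w -= self.lr * 2 * (self.w * s - s_next) * s   # learn
        return err

model = ForwardModel()
rewards = [model.intrinsic_reward(s, 2.0 * s)    # true dynamics: s' = 2s
           for s in [1.0] * 10]
print(rewards[0], rewards[-1])
```

In a setup like OpenAI's, this error-as-reward term augments or replaces a sparse extrinsic reward; the decay of the bonus here shows why a fully learned environment stops being "interesting" to the agent.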

At present, curiosity and intrinsic motivation are mostly considered within reinforcement learning systems; most of their biological characteristics remain unexplored, leaving ample room for research.


Scientific question 3 is described below: the patterns of knowledge acquisition and application. We first introduce explicit knowledge, which here mainly refers to skills or common sense stored in declarative memory that can be expressed explicitly, for example as triples. We can understand such knowledge as meta-knowledge or meta-skills; when used, it must be combined to achieve complex logical thinking or plan execution. Let us start with the learning and combination of meta-skills. First is their management: meta-skills can be viewed as a set of small independent skills, which must be stored and managed effectively. Second is their learning and updating: each meta-skill can be learned and updated independently. Third is their combination: once learned, meta-skills can be used in combination, and frequently co-occurring small skills can be merged into a larger one, that is, a chunked rule; merging can also occur when two very similar meta-skills are fused into one. Finally there is forgetting: meta-skills no longer needed can be discarded.


In this regard, neural inductive logic programming (Neural ILP) has done some tentative work: it can learn logical rules from rule templates and a series of positive and negative examples. Traditional ILP is symbolic, while Neural ILP is a differentiable implementation based on neural networks. Neural ILP aims to address several major challenges. First, rule generalization: rules should be detached from concrete things and specific parameters, since we prefer abstract, more adaptable rules. Second, support for higher-order relational data and quantifiers. Third, support for high rule complexity, especially the exponential blow-up caused by chaining logical rules. Finally, using as few priors as possible, for example not manually restricting rule templates or the search space. Typical Neural ILP schemes adopt the following main ideas: use soft logic such as t-norms in place of two-valued Boolean logic so that everything is differentiable; build single logic layers and then multi-step logical networks into a relatively complete logical network; connect to the perceptual input through fully connected or sparse interconnection schemes; express quantifiers of various arities (nullary, unary, multi-ary, etc.) and the switching paths between them; and finally train the constructed network with backpropagation.
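The "soft logic" step can be illustrated minimally with the product t-norm, where AND becomes multiplication and OR its dual, making rule truth values continuous and hence differentiable. The fuzzy facts and the grandparent rule below are invented for the example.

```python
# Minimal sketch of t-norm soft logic as used in differentiable Neural ILP.

def t_and(a, b):  return a * b              # product t-norm for AND
def t_or(a, b):   return a + b - a * b      # dual t-conorm for OR
def t_not(a):     return 1.0 - a            # soft negation

# Rule: grandparent(X,Z) <- parent(X,Y) AND parent(Y,Z), over fuzzy facts.
parent = {("alice", "bob"): 0.9, ("bob", "carol"): 0.8}

def grandparent(x, z, entities):
    score = 0.0
    for y in entities:                      # soft existential: OR over Y
        body = t_and(parent.get((x, y), 0.0), parent.get((y, z), 0.0))
        score = t_or(score, body)
    return score

g = grandparent("alice", "carol", ["alice", "bob", "carol"])
print(round(g, 3))   # 0.9 * 0.8 = 0.72
```

Because every operation is a polynomial in the fact scores, gradients flow through the rule, which is exactly what lets a rule network be trained with backpropagation.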


These are some typical cases of learning cognitive-logic processes, including multi-step symbolic reasoning with quantifiers rendered as an equivalent neural network, and networks that use CNNs for perception and relation modules for grounding before entering the main logic paradigm. The figure on the right also shows the neural-network equivalent of a logical network and the scheme for obtaining its parameters via differentiable training. These schemes achieve reasonable results on some small agent tasks and logical-reasoning datasets.


Another aspect is abductive ("reverse") learning, in which a symbol system and a neural network system coexist and cooperate. Unlike the systems above, it does not convert the neural network into logic or vice versa; both exist side by side as the core design point. Its characteristic is that the symbol system uses knowledge to detect rule violations in the neural network's predictions and corrects part of the labels so that the rules are more likely to hold given the current predictions. Because the symbol system has a knowledge framework, it can provide better pseudo-labels, which are fed back to the neural network; the network updates its weights under the new labels, its inference results become more consistent with them, and the new labels are again logically matched against the knowledge base, forming a virtuous update loop. The example on the right illustrates this. Suppose the middle image is the cursive idiom 海纳百川 ("the sea admits a hundred rivers"), but the second character cannot be recognized. How do we guess the content of this cursive script? We have a language knowledge base and have learned this idiom, so we can infer that the middle character is 纳 (na): knowledge correcting the result of perception. With that label in hand, the figure on the right becomes easier to read: from perception alone we can only recover a few characters, but once the 纳 is inferred, the question becomes which four characters follow, and the language knowledge base yields 有容乃大 ("its capacity makes it great"), so the remaining cursive characters are guessed. This example shows a perception system (a neural network) and knowledge-base logical reasoning deeply integrated in a loop of mutual correction: abductive learning in action.
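The loop just described can be sketched minimally: perception produces labels, the knowledge base abduces the closest consistent labelling, and the result becomes pseudo-labels for retraining. The one-idiom "knowledge base" and the pinyin stand-ins for characters are invented for illustration.

```python
# Toy abduction step: given noisy perception labels, return the labelling
# from the knowledge base that requires the fewest corrections.

KNOWN_IDIOMS = {("hai", "na", "bai", "chuan")}   # toy knowledge base

def abduce(predicted):
    """Pick the consistent labelling closest to the prediction."""
    best, best_cost = None, None
    for idiom in KNOWN_IDIOMS:
        cost = sum(p != c for p, c in zip(predicted, idiom))
        if best_cost is None or cost < best_cost:
            best, best_cost = idiom, cost
    return list(best)

# Perception failed on one glyph; abduction fills it from knowledge.
pseudo = abduce(["hai", "??", "bai", "chuan"])
print(pseudo)   # pseudo-labels fed back to retrain the perception network
```

In a full abductive-learning system this step alternates with retraining the perceptual network on the abduced pseudo-labels, closing the loop the text describes.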


The following describes the other aspect of scientific question 3: the acquisition and application of implicit knowledge. Much common-sense knowledge hides behind visual perceptual information; for example, the five core elements of FPICU can be understood as implicit common-sense knowledge learnable from visual features, covering these five aspects:

Causality: behavior acts on objects and forms causal relationships, producing transient changes in objects, such as a switch toggling between on and off;

Physics: physical common sense, such as whether a branch can bear a child's weight; people express surprise at anything that violates physical intuition;

Functionality: what an object is for, for example, a cup can hold water;

Intentions and goals: for example, going somewhere to pick something up;

Utility and preference: human behavior usually maximizes utility; for example, when going from one place to another, people generally take the shortest path.

This common-sense information can significantly assist the processing of, and reasoning over, visual information, and it generalizes better; it effectively builds a high-level cognitive model over images, enabling bidirectional reasoning that combines top-down and bottom-up processing, so that learning, tool use, and so on can be achieved with less data. On the other hand, my overall impression is that this line of thinking extends from the perceptual system toward cognition; many examples in the original article read like perception plus a simplified, special-purpose cognitive scheme, and I think starting from a general cognitive architecture and working down to perception might be more general and more generalizable.


Take physical common sense as an example: laws of physical motion, basic mathematical regularities, and so on can be expressed as relation graphs or And-Or graphs after perceptual extraction. The perceived information is then further processed with cognitive knowledge, and the corresponding analysis is carried out through symbolic or sub-symbolic expression at the cognitive level, realizing reasoning with, and use of, this physical common sense.


Scientific question 4 is described below: how to acquire knowledge and skills through imitation learning, generally and efficiently. A number of results have shown that imitation learning is an important route to fast learning. An example is GATO, a large transformer-based generalist model that can handle many tasks: playing Atari games, captioning images, chatting, stacking blocks with a robot arm, and so on. It treats every modality of the input and output to be imitated as tokens for embedding, then predicts the next token with a transformer (images first pass through a ResNet). Trained autoregressively, it yields a network that imitates the desired actions; such an agent has a degree of generality and adapts relatively well to new tasks.
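The tokenise-everything idea can be illustrated with a few lines; the vocabulary layout, offsets, and single timestep below are invented for the example and are not GATO's actual scheme. Each modality is mapped into a disjoint token range, the episode is flattened into one sequence, and training pairs are (context, next token).

```python
# Illustrative multimodal tokenisation for a GATO-style generalist agent.
# Offsets carve one shared vocabulary into per-modality ranges (invented).
TEXT_OFFSET, IMAGE_OFFSET, ACTION_OFFSET = 0, 1000, 2000

def tokenize_step(text_ids, image_patch_ids, action_bins):
    """Interleave one timestep's modalities into a flat token list."""
    return ([TEXT_OFFSET + t for t in text_ids] +
            [IMAGE_OFFSET + p for p in image_patch_ids] +
            [ACTION_OFFSET + a for a in action_bins])

def next_token_pairs(tokens):
    """Autoregressive training pairs: growing context -> next token."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

episode = tokenize_step(text_ids=[5, 7], image_patch_ids=[3], action_bins=[1])
pairs = next_token_pairs(episode)
print(episode, len(pairs))
```

Once everything, including the demonstrated actions, lives in one token stream, imitation reduces to ordinary next-token prediction, which is what gives this approach its generality.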


Scientific question 5 is described below: how to use memory to aid learning and decision-making. First we introduce decision-making based on semantic memory, for example a knowledge graph. A central problem is how to solve current problems with the help of past experience: knowledge learned in the past does not exactly match current needs, so the solution to the current problem must be found within the knowledge graph distilled from previous experience.

Knowledge-graph prediction and completion is a typical case of this kind of reasoning: it shows how to complete or predict a triple for the current input using a knowledge graph of triples. We can store knowledge in a graph built from triples, but such a graph usually lacks direct information about the current question. In the example on the right, we ask whether a triple such as (X, Appear in TV show, Y) holds; we can mine the graph for hidden associations and compute the probability that it does. These associations include multi-hop relations between the nodes, where some intermediate links may be reverse implications: for instance, if the reverse relation (Y, Has Actor, X) holds, the target triple can be inferred directly; or there may be a complex chain X-U-V-Y mixing forward and reverse edges, which also suggests that X and Y share similar properties and thus lends some support to the conclusion. To obtain such completion ability, we can let a reinforcement learning agent walk over the triples of the knowledge base, traversing new paths and establishing new factual relations: nodes and relations are first embedded in a low-dimensional space, position and movement are expressed through state vectors and state-transition probability matrices, and a reward mechanism guides the walk through the graph. The reward combines several measures, such as whether the goal is reached, the validity of the path to the target node, and the diversity of paths; training updates use REINFORCE-style methods.
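The walking idea can be sketched on a toy graph mirroring the X-U-V-Y chain from the text. A tabular softmax policy and a crude reward-weighted update stand in for the embedding-based REINFORCE training; everything here is an illustrative simplification.

```python
import random, math

# Toy KG walker: a policy walks from the query head entity and is rewarded
# for reaching the true tail; rewarded paths are reinforced.
graph = {"X": ["U", "A"], "U": ["V"], "A": ["B"], "V": ["Y"], "B": ["B"]}
prefs = {}   # tabular "logits" per (node, next-node) choice

def softmax_choice(node, rng):
    opts = graph[node]
    ws = [math.exp(prefs.get((node, o), 0.0)) for o in opts]
    r, acc = rng.random() * sum(ws), 0.0
    for o, w in zip(opts, ws):
        acc += w
        if r <= acc:
            return o
    return opts[-1]

def rollout(start, goal, rng, horizon=4, lr=1.0):
    node, path = start, []
    for _ in range(horizon):
        nxt = softmax_choice(node, rng)
        path.append((node, nxt))
        node = nxt
        if node == goal:
            break
    reward = 1.0 if node == goal else 0.0
    for step in path:                       # crude REINFORCE-style update:
        prefs[step] = prefs.get(step, 0.0) + lr * reward
    return reward

rng = random.Random(0)
hits = [rollout("X", "Y", rng) for _ in range(200)]
print(sum(hits[:100]), sum(hits[-100:]))   # success rate rises with training
```

After training, the high-preference path X-U-V-Y is exactly the kind of multi-hop evidence chain used to score whether the queried triple holds.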

The agent here is effectively the brain's reasoning-control logic, similar to the dynamic operating cycle of a cognitive architecture, and this example resembles how people search historical experience for solutions to problems.


The following is an example of using episodic memory plus curiosity to construct the reward signal for reinforcement learning. The network architecture introduces an episodic memory that records embedding vectors of historical perceptual information and compares them with the current embedding for similarity, via a comparator network. The comparator's output is used to construct a curiosity bonus, which is added to the RL reward signal. When training the comparator network, the concept of reachability is introduced: since no current embedding will exactly match historical ones, two situations are treated as a positive pair when one can be reached from the other within a few steps, and as a negative pair when they are far enough apart in steps. This trains a comparator that can judge how different the current embedding is from episodic memory.
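The mechanism can be sketched with scalar embeddings; a simple distance threshold stands in for the learned reachability comparator, which is an illustrative simplification, and the threshold and bonus values are invented.

```python
# Sketch of an episodic-memory curiosity bonus: states judged "reachable"
# from memory earn no bonus; genuinely novel states are rewarded and stored.

class EpisodicCuriosity:
    def __init__(self, threshold=1.0, bonus=0.5):
        self.memory, self.threshold, self.bonus = [], threshold, bonus

    def reward_bonus(self, embedding):
        # Stand-in for the comparator network: near anything in memory
        # counts as reachable, hence familiar.
        reachable = any(abs(embedding - m) < self.threshold
                        for m in self.memory)
        if not reachable:
            self.memory.append(embedding)   # novel state enters memory
            return self.bonus
        return 0.0

ec = EpisodicCuriosity()
bonuses = [ec.reward_bonus(e) for e in [0.0, 0.1, 5.0, 5.2, 0.05]]
print(bonuses)   # bonus only for genuinely new regions of the state space
```

The returned bonus is what gets added to the environment reward, encouraging the agent to visit states its episodic memory has not yet covered.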


Finally, a case related to working memory: an improved version of the Neural Turing Machine. You may be familiar with Neural Turing Machines (NTM), which arguably pioneered memory mechanisms for neural networks; but because NTM read/write heads are hard to train and highly specialized, it never really took off. The improved scheme is the Token Turing Machine (TTM), which combines a transformer with a memory module to handle long-range visual understanding tasks, such as video activity detection, and significantly improves the completion rate of SayCan robot tasks. Its main features are an easier-to-train, more general read/write head and the use of a transformer for multi-step computation. In the read/write head, dimensionality reduction is performed by a token summariser: for example, the read step can reduce 96 memory tokens plus over 3,000 input tokens down to 10 tokens, which are then handled by the processing unit (the transformer). This reduction is done by the token summariser through attention-like or MLP networks. The model effectively compensates for the transformer's deficit in memory and can be regarded as a typical example of operating through working memory.
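The summariser step can be sketched as attention pooling: k query vectors each attend over all m tokens and emit one summary token, reducing m tokens to k. The tiny token values and fixed queries below are invented; in the TTM the weights are learned.

```python
import math

# Sketch of a TTM-style token summariser: compress m tokens to k via
# attention weights, so the processing transformer sees a short sequence.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def summarise(tokens, queries):
    """Each query attends over all tokens and emits one summary token."""
    out = []
    for q in queries:
        scores = softmax([sum(qi * ti for qi, ti in zip(q, t))
                          for t in tokens])
        out.append([sum(a * t[d] for a, t in zip(scores, tokens))
                    for d in range(len(tokens[0]))])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # m = 4 tokens
queries = [[1.0, 0.0], [0.0, 1.0]]                          # k = 2 summaries
summary = summarise(tokens, queries)
print(len(summary), len(summary[0]))   # 4 tokens reduced to 2
```

Because each summary token is a convex combination of the inputs, the reduction is differentiable end to end, which is what makes this read/write head easier to train than the NTM's.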


Above we demonstrated three modes of memory as realized by neural networks. The room for development in memory remains very large; opinions differ on whether to use a separate memory store or to store directly in the network, but one-shot and few-shot memory ability is crucial.

Below we introduce the sixth scientific question: what agents can learn from biological learning mechanisms. First, let us review the characteristics of biological neural networks. First, brain-like networks have a multi-cluster loop structure and are highly recurrently interconnected, unlike the feedforward form used by most deep learning algorithms. Second, connections between neurons are very sparse, dense locally but sparse across regions; even in a locally dense part, such as a 1 mm^2 visual-cortex microcircuit model, the connection density between neurons is only about 4%. Third, brain-like networks execute as a dynamical process, an evolution along the temporal dimension, comparable to a recurrent neural network. In addition, neurons have complex internal structure, and synapses and dendrites are also structurally complex; modeled in detail, one neuron can be equivalent to a small deep network. Synapses also have localized learning ability, called synaptic plasticity. Finally, the network is event-driven with spike propagation, also called dynamic sparsity: synapses compute only when input arrives, which is one reason biological brains are more energy-efficient. Starting from the biological brain, we can therefore explore new implementation ideas; for example, large MoE-based transformers can be seen as artificial versions of event-driven, block-sparse networks. On the other hand, such sparse event-driven structure is unfriendly to large-scale regular parallel computation and parallel memory access, less efficient than matrix multiplication or convolution (measured in operations per watt), and harder to train, so it is most appropriate to borrow from it with careful trade-offs.
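The event-driven, threshold-crossing dynamics just described can be made concrete with a leaky integrate-and-fire neuron, the simplest spiking-neuron model; the leak factor, threshold, and input train below are arbitrary illustrative choices.

```python
# Minimal leaky integrate-and-fire (LIF) neuron: the membrane potential
# integrates input with a leak, and a spike fires and resets on crossing
# the threshold. No computation happens without input: event-driven.

def lif_run(inputs, tau=0.8, threshold=1.0):
    v, spikes = 0.0, []
    for i in inputs:
        v = tau * v + i            # leaky integration of input current
        if v >= threshold:
            spikes.append(1)       # spike event
            v = 0.0                # reset after firing
        else:
            spikes.append(0)
    return spikes

print(lif_run([0.6, 0.6, 0.0, 0.6, 0.6]))
```

Note how a single sub-threshold input produces no spike; only the temporal accumulation of inputs does, which is the dynamical, time-dimensioned behavior the text contrasts with static feedforward layers.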


Based on the neural network structure of organisms, we can understand the mechanisms involved in learning in biological brains.

Neurodevelopment: dynamic structure, growing new neurons and synapses to store new information, helping to overcome catastrophic forgetting and to meet new processing demands as they arise.

Episodic replay: the hippocampus-cortex memory model (rapid hippocampal learning followed by slow cortical learning). The neocortex interleaves replays initiated by the hippocampus with replays of its own (already consolidated) neural patterns, so as to integrate new information without overwriting previous memory structures; these replays play a large role in memory consolidation.

Metaplasticity: synaptic plasticity is the core mechanism of memory, and the plasticity of plasticity (metaplasticity) means that a synapse's capacity to be modified depends on its internal biochemical state, which in turn depends on the history of synaptic modification and recent neural activity. Synaptic reinforcement strengthens memory, enabling fast memorization and slow forgetting. Moreover, modification of biological synaptic weights involves multiple cascaded processes running on different time scales. This fast/slow mechanism allows new information to be acquired quickly while the decision on permanent change is deferred until later events: spurious signals may cause only temporary changes in synaptic strength, while repeated strong inputs leave permanent memory traces. This helps resolve the stability-plasticity dilemma.

Neuromodulation: releases neurotransmitters that have both local and global effects on activity and plasticity. Neuromodulation can facilitate learning, help overcome catastrophic forgetting, support adaptation to uncertain and novel experiences, and improve understanding of environmental change.

Context-dependent perception and gating: Context plays an important role in regulating, filtering, and absorbing new information. This is important for keeping track of changing environments, keeping an eye on changing parts, and integrating new information. Context gating is the selective opening of subsets of neurons that helps reduce interference between similar experiences. It also helps filter out less relevant stimuli and focus on critical stimuli that require an immediate response.

Hierarchical distributed systems: processing and learning occur across multiple neuronal networks distributed throughout the body, with dense connections within each network but relatively sparse connections between networks. By exploiting this hierarchical, distributed architecture, biological systems greatly reduce the input and output dimensionality of each layer, reducing latency and accelerating learning.

Out-of-brain cognition: many biological systems display intelligence without a nervous system, including the abilities to learn from experience, predict future events, and respond adaptively to new challenges; examples include single cells and even subcellular molecular networks, non-neural bioelectric networks, and transcriptional networks. Biology uses the same mechanisms (bioelectric and other types of networks, multiscale homeostasis, and cooperation and competition within organizational hierarchies) to solve search problems in difficult spaces. Recent data also reveal important commonalities in how information is processed in whole-body neural networks and single-cell pathway networks.

Reconfigurability: Biological organisms are highly reconfigurable and are also capable of remodeling brain tissue while maintaining information content (memory).

Multi-sensory fusion: the superior colliculus integrates information from different senses (visual, tactile, and auditory signals) to produce coordinated eye and head movements.


The following introduces the fusion of biologically inspired ("bionic") learning with machine learning mechanisms. Here we mainly cover plasticity and local learning; for global learning there is the more familiar backpropagation. Bionic local learning chiefly imitates the plasticity of biological neurons and synapses, where "plasticity" means mouldable and changeable. Synaptic plasticity includes connection (structural) plasticity and synaptic-strength plasticity; strength plasticity can be roughly divided into long-term and short-term, with long-term potentiation and depression governed by Hebb's rule, the STDP rule and its variants, and so on. Somatic plasticity includes the ability to adjust parameters inside the neuron, such as homeostasis, akin to a self-regulation mechanism in deep learning, and threshold adaptation: because spiking neural networks fire on crossing a threshold, self-regulation of the threshold is very important. Global-local fusion learning integrates global and local learning mechanisms: weight updates combine global-scope updates with local updates of synapses between neighboring neurons, as in Reward-modulated STDP, Predictive Coding, Equilibrium Propagation, and other methods.
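The pair-based STDP rule mentioned above fits in a few lines: a synapse strengthens when the presynaptic spike precedes the postsynaptic one and weakens otherwise, with exponentially decaying influence. The amplitude and time constants are illustrative, not from any particular measurement.

```python
import math

# Sketch of pair-based spike-timing-dependent plasticity (STDP):
# dt = t_post - t_pre in milliseconds.

def stdp_dw(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Weight change: pre-before-post (dt > 0) potentiates (LTP),
    post-before-pre (dt <= 0) depresses (LTD)."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)     # potentiation
    return -a_minus * math.exp(dt / tau)        # depression

print(stdp_dw(5.0) > 0, stdp_dw(-5.0) < 0)
print(stdp_dw(5.0) > stdp_dw(50.0))   # closer pairings change weight more
```

This is a purely local rule: the update depends only on the spike times of the two neurons the synapse connects, which is exactly the property that global-local fusion schemes such as Reward-modulated STDP then combine with a global reward signal.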


At present, biologically inspired learning is still at a relatively primitive stage: algorithms abound, but accuracy and generality remain limited.

That's all for this issue. Because learning mechanisms remain immature, many problems lack conclusive, representative work, so this part offered more concepts than concrete implementations. In the next issue we will focus on intelligence benchmarks and evaluation, and summarize and look ahead at AGI. Thank you all for your attention~
