
Overview of General Artificial Intelligence (III)


Original | Dr. Wu, AGI Alliance (General Artificial Intelligence Alliance)

Hello everyone. Today we continue to share our original review of general artificial intelligence (AGI) technology. This short review systematically surveys the current state of AGI development and collects the most influential cutting-edge results, and can serve as a primer for the field. The series will run for five issues; this issue mainly covers cognitive technologies based on large models.

Note: This article is in the form of PPT slides plus commentary, so it is best viewed on a computer rather than a phone; each passage of commentary sits above the slide it explains. Some of it is personal opinion, so please forgive any lack of rigor.


The discussion above has already touched on our two types of cognitive architecture. To recall: we divide them into Type 1 and Type 2 according to whether or not they have microstructure. Type 1 is characterized by the learning and combination of meta-skills, and includes research on cognitive architectures in sub-symbolic space as well as some neuro-symbolic research. Type 2 is big-data training of end-to-end models, with the Transformer and reinforcement learning as the typical cases. The following mainly introduces Type 2.


At present, large language models are progressing very rapidly; as shown in the figure below, there is already a large number of models at the tens-of-billions and hundreds-of-billions parameter scale. These models have clear potential advantages for general-purpose intelligence.


Overall, there are now two mainstream technical routes. OpenAI has consistently followed the generative route, which includes GPT-3, the code-generating Codex, and the subsequent GPT-3.5 versions. The other is Google's BERT family, a large-language-model route based on masked prediction; later, Google also diversified, forming a generative route similar to OpenAI's. DeepMind likewise has large models such as Gopher and Chinchilla, while Meta AI focuses more on large models built as mixture-of-experts (MoE) systems.

Domestically, KLCII also has ultra-large-scale MoE models such as Wudao 2.0, at the trillion-parameter scale. Overall, the main technical characteristics of these models include the following aspects.

First, they use Transformers, whether in encoder-only, decoder-only, or encoder-decoder form.

Second, they operate at large or ultra-large scale, i.e., at the level of tens or hundreds of billions of parameters.

Third, for task processing and tuning, prompting is usually used to improve the task generality and few-shot/zero-shot capability of language models. In-context learning lets a large language model learn from a handful of examples at inference time; chain-of-thought prompting improves accuracy through step-by-step reasoning; instruction tuning enables instruction-level prompts; tuning on code improves the model's ability to write code; and RLHF trains an evaluation model from human feedback samples, then further fine-tunes the language model against that evaluation model, strengthening its ability to give the answers humans actually need. A minimal sketch of these prompting ideas follows.
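To make the prompting ideas concrete, here is a minimal sketch (our own illustration, not tied to any specific model API; `call_llm`, the example problems, and the question are all placeholders) of how a few-shot chain-of-thought prompt is assembled as plain text:

```python
# Minimal sketch of few-shot chain-of-thought prompt assembly.
COT_EXAMPLES = [
    ("Roger has 5 balls. He buys 2 cans of 3 balls each. How many balls now?",
     "Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. "
     "The answer is 11."),
]

def build_cot_prompt(question: str) -> str:
    parts = []
    for q, reasoned_answer in COT_EXAMPLES:   # in-context examples with steps
        parts.append(f"Q: {q}\nA: {reasoned_answer}")
    parts.append(f"Q: {question}\nA:")        # model continues step by step
    return "\n\n".join(parts)

prompt = build_cot_prompt("A farm has 3 pens of 4 sheep and sells 2. How many remain?")
# answer_text = call_llm(prompt)   # placeholder for an actual model call
```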

In terms of implementation structure, some models take the Pathways or MoE form: with FFN experts of several categories and a similarity evaluation of the current input, a small number of suitable expert sub-networks are selected to carry out inference. In this way the model can accommodate more parameters while specializing across tasks. In addition, usage has gradually evolved from the pretrain-plus-fine-tune paradigm to few-shot, one-shot, and zero-shot forms. A minimal routing sketch follows the reference below.

(Meta AI [ref]: Efficient Large Scale Language Modeling with Mixtures of Experts)
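As a sketch of the routing idea only (a generic top-k gating layer of our own construction, not Meta AI's implementation), expert selection can be written as:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Generic top-k mixture-of-experts layer: a gating network scores the
    current token against each expert FFN and only the top k experts run."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)  # routing score per expert
        self.k = k

    def forward(self, x):                     # x: (n_tokens, d_model)
        weights, idx = self.gate(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # normalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):            # only k experts run per token
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

# usage: TopKMoE()(torch.randn(4, 512)) runs 2 of 8 experts per token
```

Because only k of the experts execute for any given token, total parameter count can grow with the number of experts while per-token compute stays roughly constant, which is the trade-off the paragraph above describes.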


Current large language models have met or exceeded the average human level on several datasets, for example general-knowledge tests, some standardized exams, and IQ benchmarks.


In common-sense, multi-step logical reasoning with large models, chain-of-thought prompting significantly improves accuracy and can exceed human performance on most tasks in BIG-Bench Hard. Since BIG-Bench Hard deliberately selects problems that are difficult for large models, this also indicates that performance on a large share of natural-language-processing tasks is close to or beyond human level. In addition, DiVeRSe further improves logical-reasoning performance by letting the model output multiple solutions, analyzing the similarity of these solutions, and selecting the final answer through a voting mechanism built on the premise that most solutions are correct.
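The voting idea can be sketched in a few lines (a plain self-consistency-style majority vote; DiVeRSe itself additionally uses diverse prompts and a trained verifier, which this sketch omits; `sample_solution` is a placeholder):

```python
from collections import Counter

def vote_answer(question, sample_solution, n_samples=20):
    """Sample multiple reasoning paths and return the most common final
    answer. `sample_solution` stands for one model call at temperature > 0
    that returns the final answer extracted from one reasoning path."""
    answers = [sample_solution(question) for _ in range(n_samples)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n_samples   # answer plus its vote share
```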


The following introduces mathematical problem solving based on large models. Mathematical problem solving is among the most demanding classes of problems for logical thinking and reflects a model's level of intelligence and reasoning, so achievements here often mark the core logical-reasoning level of these models. We mainly introduce Minerva, a large language model designed on top of PaLM. It starts from training on general natural-language data and is further trained on mathematical web pages and arXiv data, which gives it strong mathematical problem-solving ability; it can complete proof problems like the one on the left, which must be decomposed into several steps. Minerva achieved 78.5% accuracy on this dataset, and the SOTA has since reached 82.3%.


In professional knowledge retrieval and question answering, Med-PaLM handles medical questions at close to human level. It is based on the PaLM large language model and achieves a significant improvement in medical-QA answer quality through three techniques: prompting, instruction fine-tuning, and instruction-prompt tuning. In the scientific domain, Galactica is an ultra-large language model trained on papers, code, and related scientific materials. It uses a unified tokenization across scientific fields, so that text, formulas, and even DNA sequences can all be encoded into one token vocabulary for training, giving it the ability to summarize, reason over, and answer questions about scientific literature.


For robot task processing and embodied intelligence with large language models, Google proposed SayCan, a method that combines a large language model with an affordance value-estimation model to adapt large models to a robot's real operating environment. The language model converts a natural-language instruction into a series of candidate micro-instructions with scores, but these micro-instructions are not guaranteed to fit the robot's current environment and state, so a value function is introduced to score how well each micro-instruction fits the current environment. By combining (multiplying) the two scores, the agent not only follows natural-language instructions but also stays adapted to the actual environment, so the chosen instructions are actually executable. This is a typical demonstration of a large language model plus reinforcement learning in a robot scenario.

The concrete procedure is shown in the figure on the right. The robot's operation is divided into several steps. At each step, the natural-language instruction itself, together with the instructions executed in all previous steps, serves as the prompt that guides generation of the current candidate instructions and their language-model scores; the value function then gives each candidate an acceptability score; the two scores are multiplied into a total score, and the highest-scoring candidate becomes the instruction to execute. Reinforcement learning is used to train the value function. A minimal sketch of this selection rule follows.
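Concretely, each step's selection can be sketched as below (hypothetical function names: `llm_score` and `value_score` stand in for the language model's usefulness score and the RL-trained affordance value; this is an illustration of the scoring rule, not the SayCan codebase):

```python
def saycan_step(instruction, history, skills, llm_score, value_score, state):
    """Pick the next micro-instruction SayCan-style: language-model score
    times affordance value, then take the argmax over candidate skills."""
    best_skill, best_total = None, float("-inf")
    for skill in skills:
        p_say = llm_score(instruction, history, skill)  # "is it useful?"
        p_can = value_score(state, skill)               # "is it feasible now?"
        total = p_say * p_can                           # combined by product
        if total > best_total:
            best_skill, best_total = skill, total
    return best_skill
```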


This algorithm runs for many steps until a "done" step signals that execution is complete. In the experiment, the robot shown on the right performs multiple complex sub-steps, such as going to different places, fetching things, and putting things away, in order to get drinks and snacks for its owner.


There is also a very important area based on large language models: program synthesis, i.e., letting the model write programs itself. The most typical work is Codex, which was trained on top of GPT-3 and has since evolved on GPT-3.5. The Codex of that version used 54 million GitHub repositories and was trained on more than 100 GB of code samples; allowing trial and error (repeated sampling), it achieved a 77% success rate on the HumanEval dataset, which is a milestone. As the figure on the right shows, it can handle code-generation tasks such as turning a natural-language description into Python code, or generating code from a description plus examples.
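For reference, success under repeated sampling is usually reported with the unbiased pass@k estimator from the Codex paper: generate n samples per problem, count the c that pass the unit tests, and compute pass@k = 1 - C(n-c, k)/C(n, k). Its numerically stable form:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (Chen et al., 2021):
    n = total samples generated, c = number passing the unit tests."""
    if n - c < k:
        return 1.0  # too few failures for any size-k draw to miss
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))
```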


We can apply the code-generation ability of large language models to accessing databases and tables. Jigsaw, a work built on the Codex model, generates Pandas code. This work matters for the future realization of persistent memory: Pandas's ability to operate on tables, like a database, is essentially the management, indexing, and storage of declarative long-term memory data, so we can use Codex as a driver for long-term memory that stores information in databases or tables.

While building Jigsaw, the authors also found that Codex's raw generated code was often error-prone, so they added optimizations. In pre-processing, they built a context bank containing many (query, program) example pairs as few-shot material, from which examples highly relevant to the current query are retrieved to improve the accuracy of the response. In post-processing, the code generated by the large model is repaired, including variable-name correction and fixing of argument and semantic errors, making the generated code more likely to be correct. A small illustration follows.
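As a hypothetical illustration of the (query, program) pairs such a context bank might hold, and of the Pandas code a Codex-class model would be steered toward (the query text, DataFrame, and column names are all made up for this sketch):

```python
import pandas as pd

# Hypothetical (query, program) pair of the kind a Jigsaw-style
# context bank could store as few-shot material.
QUERY = "average salary per department, highest first"

def program(df: pd.DataFrame) -> pd.Series:
    return df.groupby("department")["salary"].mean().sort_values(ascending=False)

df = pd.DataFrame({"department": ["eng", "eng", "sales"],
                   "salary": [120, 100, 90]})
print(program(df))   # eng 110.0, sales 90.0
```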


In addition, another application of large-language-model program generation is mathematical problem solving: a problem-solving program is generated through Codex, and the solution is obtained by running that program on a computer, which is equivalent to giving the agent a computer for doing complex calculations. It can also provide some explanation of the generated code. The approach was verified on a number of mathematics courses at MIT and Columbia University, as well as on problems from multiple MATH datasets, and achieved very good results, with an overall solve rate of 81%.
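For intuition, the artifact of this pipeline is an ordinary program whose execution yields the answer. For a problem like "solve x^2 - 5x + 6 = 0", a generated program might look like the following (illustrative only, not taken from the cited work):

```python
import sympy as sp

# The kind of program a Codex-style model might generate for
# "solve x^2 - 5x + 6 = 0"; running it produces the answer.
x = sp.symbols("x")
solutions = sp.solve(sp.Eq(x**2 - 5*x + 6, 0), x)
print(solutions)   # [2, 3]
```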


In addition, program-generation technology based on large language models can also control agents in a household scenario. In this example, the core is a GPT-3-class large language model. First, the corresponding background environment information (or variable information) is provided; then some examples are given as learnable code-generation samples; then the name of the agent-control function we want generated is defined; all of this is fed to the large model, which generates the code for the task's execution process. Through assertions, the generated code checks whether the current environment matches expectations, and a recovery (compensation) mechanism handles the case where expectations are not met, achieving closed-loop interaction with the environment and realizing agent control for typical tasks of such a household robot. A hypothetical sketch of such generated code follows.
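Here is a hypothetical sketch of what such generated control code could look like, with an assertion-style check and a recovery branch (all helper names such as `walk_to`, `grab`, and `is_holding` are our illustrative placeholders for a robot skill API, not a real library):

```python
# Hypothetical generated task function for "throw away the apple".
def throw_away_apple(env):
    env.walk_to("kitchen")
    env.find("apple")
    env.grab("apple")
    if not env.is_holding("apple"):      # assertion-style environment check
        env.walk_to("apple")             # recovery: close the loop and retry
        env.grab("apple")
    assert env.is_holding("apple"), "recovery failed"
    env.walk_to("garbage_can")
    env.put("apple", "garbage_can")
```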


Based on the Transformer, program execution itself can also be realized, much as a processor executes a program, except that here the program is executed by a neural network; compared with pure program execution, this brings the ability to handle unstructured and fuzzy inputs. It can handle VQA tasks that combine a program with an image, as well as the refinement of policy execution: for VQA it realizes the interaction between instructions and the image, and for policy refinement it expands a macro goal into an executable process.

Let's take the Program-Guided Transformer, or ProTo, as an example. It first has a program-pointer update function, which indicates which step is currently being executed and decides whether to continue on this step or jump to the next. Around the program pointer there is an execution loop with two sub-stages: inference and pointer update.

This network structure is therefore a bit like a processor that can execute a program. Its basic structure is three cross-attention Transformer layers. The first implements semantic attention over the program, covering functions (such as select) and arguments (such as bag). The second attends over the code's dependency structure, i.e., which earlier statement's execution result the current statement connects to, expressed through a correlation (adjacency) matrix. The third fuses scene information, for processing data such as images and maps. After these three attention layers, a neural network with program-execution capability is obtained. A schematic of the execution loop follows.
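Schematically (our pseudocode reading of the loop described above, not the authors' implementation; `transformer` and `pointer_head` are placeholders), the outer execution loop alternates the two sub-stages:

```python
def proto_execute(program, scene, state, transformer, pointer_head):
    """Schematic of a ProTo-style execution loop: alternate one reasoning
    pass with one program-pointer update until the last statement is done."""
    pc = 0                                    # program pointer
    while pc < len(program):
        # inference: cross-attend over statement semantics, the program's
        # dependency structure, and the scene (image/map) features
        state = transformer(program, pc, scene, state)
        # pointer update: stay on this statement or advance to the next
        if pointer_head(state, program[pc]):  # predicted "step completed"
            pc += 1
    return state
```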


Finally, we also tested ChatGPT's ability to combine meta-skills. We made certain modifications to the MIPS assembly instruction set, renaming some instructions in Hanyu Pinyin: load was replaced by jiazai, save became baocun, the jump jne was rewritten as tiaobudeng, and so on, while the meaning of each instruction was retained. This gives a simple new instruction set, which can also be understood as a set of meta-skills. On top of it, we tested the large-language chat model ChatGPT by asking it to write factorial code in this instruction set; the code it wrote is shown on the right. By visual inspection the overall logic is correct, but the current experiment has large variance: the model sometimes produces different, incorrect code that is not easy to correct. We can therefore say that current large language models show potential in this area, but further stabilization and improvement are still needed.
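To make the setup concrete, here is a toy re-creation (our own construction, not the exact instruction set used in the experiment: only jiazai and tiaobudeng come from the renamings described above, while cheng, jian, and biaoqian are extra illustrative mnemonics) of a pinyin-renamed instruction set, with a factorial program interpreted in Python:

```python
# Toy interpreter for a pinyin-renamed instruction set (illustrative).
def run(program):
    regs = {}          # register file
    pc = 0             # program counter
    labels = {op[1]: i for i, op in enumerate(program) if op[0] == "biaoqian"}
    while pc < len(program):
        op, *args = program[pc]
        if op == "jiazai":                 # load immediate: jiazai r, value
            regs[args[0]] = args[1]
        elif op == "cheng":                # multiply: cheng rd, rs
            regs[args[0]] *= regs[args[1]]
        elif op == "jian":                 # subtract immediate: jian rd, value
            regs[args[0]] -= args[1]
        elif op == "tiaobudeng":           # jump if not equal (renamed jne)
            if regs[args[0]] != args[1]:
                pc = labels[args[2]]
                continue
        elif op == "biaoqian":             # label pseudo-instruction
            pass
        pc += 1
    return regs

# factorial(5): acc = 1; n = 5; loop: acc *= n; n -= 1; if n != 0 goto loop
factorial_5 = [
    ("jiazai", "acc", 1),
    ("jiazai", "n", 5),
    ("biaoqian", "loop"),
    ("cheng", "acc", "n"),
    ("jian", "n", 1),
    ("tiaobudeng", "n", 0, "loop"),
]
print(run(factorial_5)["acc"])  # -> 120
```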


To sum up, the logical-cognitive state of general large models can be summarized as follows. On the advantage side: first, large language models can be trained on massive amounts of human text; second, current large models really do exceed, or come close to, the human level in many places; third, they have a certain logical-reasoning ability, covering common sense, events, implicit reasoning, and so on; fourth, they also have a certain code-generation and code-interpretation ability; and finally, they have strong generative and creative properties, as demonstrated recently in AIGC.

But they also have several shortcomings and challenges. They still do not amount to a complete cognitive framework: they lack long-term memory, episodic memory, and model personalization; in other words, they cannot remember what they have done before. Their output is based on a limited context, which remains somewhat insufficient for long-range writing or long-range logical reasoning. They do not interact enough with the real world, including a lack of tight integration between cognition and perception, and their logic is still text-based. Another important factor restricting development is the relatively high training cost: users usually lack the ability to continuously update the model, the model is basically left unmodified after training, R&D costs are high, and usage costs are not low either, so in the end few can afford to play.


A large model can be regarded as a concrete implementation of a cognitive architecture at a certain level, with strong cognitive ability, and it is one of the hottest academic fields at present. Most analyses, however, still understand it from the perspectives of AI-generated content (AIGC), multimodality, and natural language processing, rather than from the perspective of a milestone toward strong intelligence. This article has focused on its cognitive side: logical reasoning, precise and interpretable output (programs), and its potential for robotic tasks, which I personally believe may hold more promise in the future than chatting or drawing. It is arguably the biggest innovation in the more than forty years of cognitive-architecture development. Moreover, combined with its multimodal characteristics, it is expected to serve as a general foundation model; we will discuss its later evolution in the last installment. That is all for this issue; in the next issue we will focus on learning technologies.
