
The Future of AI from chatGPT – "Brain"

Author: Jintai Xiao Yangming

Is ChatGPT a bubble, or is AI really coming?

An interactive interface that can solve a wide variety of problems? Before ChatGPT appeared, many people would not have dared to imagine it, and some still doubt that today's ChatGPT can really do it. Yet with the combined support of computing power, algorithms, data aggregation, processing, and annotation, continuous iteration, and clever application design, ChatGPT was born. There are already many articles online covering ChatGPT's various capabilities, so readers can explore those on their own; here we focus on one question: is ChatGPT a bubble, or is AI really coming? To answer it, we analyze ChatGPT's strengths and weaknesses through in-depth technical analysis and consider what a real solution would look like.

1. What is ChatGPT?

On November 30, 2022, OpenAI unveiled ChatGPT, a chatbot model that interacts with users through conversation.

ChatGPT is built mainly on GPT plus RLHF: GPT-3.5 can be regarded as a very large pre-trained language model (LM), while RLHF is a reinforcement learning method that uses human feedback to optimize the language model. In short, ChatGPT = large model + reinforcement learning from human feedback. RLHF is arguably a key reason ChatGPT caught fire: GPT-3 attracted attention in its day, but it was neither as capable nor as popular.

GPT (Generative Pre-trained Transformer) is a generative language model pre-trained on large corpora using unsupervised deep learning. RLHF (Reinforcement Learning from Human Feedback) is a reinforcement-learning technique that uses a trained reward model and multiple rounds of iteration to optimize the pre-trained language model.

GPT's generative training predicts the next word from its context and computes a loss for each word, which does not directly optimize the model at the level of the complete output. Human feedback, by contrast, evaluates the model's output as a whole, which matches real usage better than the "predict the next word from context" loss. The introduction of RLHF was therefore very important. Fundamentally, though, ChatGPT remains a model that performs probabilistic prediction from a prompt (input): a correlation between conditions and probabilities.

The main tasks of ChatGPT include classification, translation, reading comprehension, Q&A, fill-in-the-blank, content generation such as news and fiction, and code generation and review.

2. The principle of ChatGPT

As mentioned above, ChatGPT is based on GPT (a generative pre-trained language model) and RLHF (reinforcement learning from human feedback), so here we analyze ChatGPT's structure and principles from these two aspects.

1. GPT architecture and its principles

GPT can be understood simply as a mathematical model: data goes in, and a set of data comes out. For example, given a passage of text as input, the GPT model generates each word (and sentence) of the answer according to language/corpus probabilities. That is, it uses the text already produced as a condition to predict the probability distribution over the next word, statement, or even longer span.
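To make "predict the next word from what came before" concrete, here is a minimal, hypothetical sketch in which toy bigram counts stand in for the probabilities a real GPT learns from massive corpora; the corpus, function names, and numbers are purely illustrative, not OpenAI's implementation.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for large-scale training text.
corpus = "the cat sat on the mat the cat ate".split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word_distribution(prev):
    """Return P(next word | previous word) as a dict of probabilities."""
    counts = follows[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_distribution("the"))  # "cat" is the most probable follower
```

A real GPT conditions on the entire preceding context with a Transformer rather than a single previous word, but the output is the same kind of object: a probability distribution over what comes next.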

GPT is a generative pre-trained language model with, so far, three major versions: GPT-1, GPT-2, and GPT-3. All three first use unsupervised pre-training on large amounts of unlabeled corpus data to obtain a model independent of any downstream task; the main difference between versions is that the model's parameters grow ever larger. In the usage phase, the three versions respectively experiment with fine-tuning the model on task-specific data, using the model without any task-specific fine-tuning, and solving specific tasks through interaction with a small number of task examples.

The GPT model architecture is shown in Figure 1. It uses the Transformer as its core structure; each "Trm" block is one Transformer layer, which can be understood simply as a unit in the data flow.


Figure 1 Model architecture of the GPT series

The complete structure of GPT-1 is shown in Figure 2: on the left is the 12-layer Transformer structure GPT uses, and on the right are the structures for fine-tuning the pre-trained model for different tasks. The key idea is to build a task-independent single-model framework with strong natural language understanding: generative pre-training plus discriminative fine-tuning. Pre-training and fine-tuning are defined as follows:

Pre-training: use the standard language-model objective, i.e., the likelihood function, to predict the probability of the next word given the preceding k words. In the pre-training phase the authors used language-model (LM) training with a Transformer variant, the multi-layer Transformer decoder.

Fine-tuning: use the complete input sequence plus labels, with objective function = supervised objective + λ × unsupervised (language-model) objective.
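The two objectives above can be sketched as follows; the token probabilities and the λ value are made-up illustrations of the formulas, not actual training code.

```python
import math

def lm_objective(token_probs):
    """Pre-training: the log-likelihood, sum of log P(w_i | preceding k words)."""
    return sum(math.log(p) for p in token_probs)

def finetune_objective(supervised, unsupervised, lam=0.5):
    """Fine-tuning: supervised objective + lambda * unsupervised LM objective."""
    return supervised + lam * unsupervised

# Hypothetical probabilities the model assigns to each correct next token.
probs = [0.9, 0.5, 0.8]
l_unsup = lm_objective(probs)           # auxiliary language-model term
print(finetune_objective(-0.2, l_unsup))
```

Maximizing the pre-training objective pushes each next-token probability toward 1; during fine-tuning, keeping the LM term as a weighted auxiliary objective helps retain the general language ability learned in pre-training.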

From the above, different downstream tasks can be realized by stacking task-specific layers on top of the pre-trained model. The pre-trained language model makes full use of large-scale unlabeled data to learn a general language model, and then fine-tunes on a small amount of labeled data from the downstream task to achieve better results on that specific task.


Figure 2 GPT-1 model architecture

GPT-2 and GPT-3 mainly increase model capacity and remove supervised fine-tuning. The authors believed that large models should be able to learn multitask abilities without large labeled datasets for each specific task, although so far this has not fully met expectations.

2. The development stages of GPT

Here we introduce the main changes ChatGPT has undergone through the development of the GPT models. Let us first look at ChatGPT's history:


Figure 3 Versions and differences of chatGPT

As Figure 3 shows, the three generations of text-pre-trained models, GPT-1, GPT-2, and GPT-3, all use the Transformer as their core structure; the main difference is the number of parameters, with each later version much larger. ChatGPT adds reinforcement learning from human feedback (RLHF) on top of this.

GPT-1 performs generative pre-training on an unlabeled text corpus, followed by discriminative fine-tuning to improve performance in specific task scenarios.

Compared with GPT-1, GPT-2 made no major adjustments to the model structure, but used a larger model and more training data. GPT-2 also caused a sensation at its release: the news articles it generated could deceive most human readers, passing for the real thing. The conclusion drawn at the time was that when a large language model is trained without supervision on a sufficiently large and diverse dataset, it can perform tasks across many domain datasets.

Based on these conclusions, GPT-3 was proposed. GPT-3 greatly increased the parameter count, to 175 billion. Besides completing common NLP tasks, researchers unexpectedly found that GPT-3 also performed well at writing code in SQL, JavaScript, and other languages, and at simple mathematical operations. GPT-3's basic training approach, including model, data, and training, is similar to GPT-2's.

The GPT series introduced above progressed mainly through more parameters and more (and more diverse) data. This brought good results but still fell short of expectations in actual use, so on this basis researchers applied reinforcement learning from human feedback to optimize the system. RLHF is highlighted below.

3. Reinforcement learning training process based on human feedback

In addition to the increasingly large GPT models discussed above, ChatGPT is trained with reinforcement learning from human feedback, a method that augments machine learning with human intervention for better results. During training, human trainers play both the user and the AI assistant, and the model is fine-tuned with the proximal policy optimization (PPO) algorithm. The RLHF training steps are shown in Figure 4; the process has three stages: collecting data to train a supervised policy model (SFT), collecting comparison data to train a reward model (RM), and optimizing the policy against the reward model with the PPO reinforcement learning algorithm.


Figure 4 Training optimization process of ChatGPT

1) Collect data and train a supervised policy model (SFT)

OpenAI first designed a prompt dataset containing a large number of prompt samples covering various task descriptions. Next, an annotation team labeled this prompt dataset (essentially writing high-quality answers by hand). Finally, GPT-3.5 was fine-tuned on this labeled dataset; we call the result the SFT model (Supervised Fine-Tuning). At this point the model was already better than GPT-3 at following instructions and holding dialogue, but not necessarily aligned with human preferences. The specific steps are:

1. Extract questions from the dataset;

2. Annotators write high-quality answers;

3. Use these data to fine-tune GPT-3.5.
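The three SFT steps above can be sketched with a toy stand-in for GPT-3.5; `ToyModel` and its "training" by memorization are hypothetical illustrations of the data flow, not the real fine-tuning procedure.

```python
# Step 1-2: prompts drawn from a dataset, paired with human-written answers.
labeled_data = [
    ("What is 2+2?", "4"),
    ("Capital of France?", "Paris"),
]

class ToyModel:
    """Hypothetical stand-in for GPT-3.5; real SFT updates billions of weights."""

    def __init__(self):
        self.memory = {}  # stands in for model weights

    def train_step(self, prompt, answer):
        # Real SFT minimizes next-token loss on the annotated answer;
        # here we simply memorize the pair.
        self.memory[prompt] = answer

    def generate(self, prompt):
        return self.memory.get(prompt, "")

# Step 3: fine-tune on the labeled (prompt, answer) pairs.
sft_model = ToyModel()
for prompt, answer in labeled_data:
    sft_model.train_step(prompt, answer)

print(sft_model.generate("What is 2+2?"))  # → 4
```

The essential point the sketch preserves is that SFT is ordinary supervised learning: the only signal is the human-written answer for each prompt.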

2) Collect comparison data and train a reward model (RM)

This stage trains the reward model on manually labeled comparison data (about 33K examples). First, questions are randomly drawn from the dataset; then the model produced in the first stage generates several different answers for each question; finally, human annotators rank these answers from best to worst.

Next, these rankings are used to train a reward model: each ranking is split into multiple pairs of training data. The RM takes an input and outputs a score that evaluates the quality of the answer. For each training pair, the parameters are adjusted so that high-quality answers score higher than low-quality ones. The specific steps are:

1. Sample questions and have the model generate multiple answers;

2. Annotators score and rank the outputs;

3. Train the reward model on the ranked comparison data.
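A common formulation of the pairwise ranking loss described above (used in InstructGPT-style training; the exact loss OpenAI uses is an assumption here) can be sketched as follows, with hypothetical reward scores:

```python
import math

def pairwise_loss(score_better, score_worse):
    """-log sigmoid(r_better - r_worse): small when the model's scores
    respect the human ranking, large when they contradict it."""
    return -math.log(1 / (1 + math.exp(-(score_better - score_worse))))

# Hypothetical RM scores for a pair where humans preferred the first answer.
print(pairwise_loss(2.0, 0.5))  # small loss: model agrees with the ranking
print(pairwise_loss(0.5, 2.0))  # large loss: model contradicts the ranking
```

Minimizing this loss over all pairs extracted from a ranking pushes the reward model to assign higher scores to answers humans preferred, turning rankings into a scalar reward signal.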

3) Use the PPO (Proximal Policy Optimization) reinforcement learning algorithm to optimize the policy against the reward model

This stage uses the reward model trained in the second stage to update the pre-trained model's parameters based on reward scores. Specifically, the SFT model answers questions from the prompt dataset again, but instead of human evaluation, the stage-2 reward model scores and ranks the SFT model's predictions. The specific steps are:

1. Sample new questions from the dataset;

2. Initialize the PPO model from the supervised policy;

3. Let the policy model (the fine-tuned SFT model) generate answers;

4. Use the reward model to compute a reward for each generated answer;

5. Update the policy model with PPO using these rewards.
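One commonly described form of the stage-3 reward signal (not detailed in this article, so treated here as an assumption) combines the reward model's score with a penalty for drifting too far from the SFT model; all numbers below are illustrative stand-ins.

```python
def ppo_reward(rm_score, logp_policy, logp_sft, beta=0.02):
    """reward = RM score - beta * (log pi_policy - log pi_SFT).

    The second term is a per-token KL-style penalty that keeps the
    optimized policy close to the SFT model it was initialized from.
    """
    return rm_score - beta * (logp_policy - logp_sft)

# A policy that stays close to the SFT model (small log-prob gap) keeps
# more of its reward than one that has drifted far from it.
print(ppo_reward(1.0, -2.0, -2.1))  # near SFT: small penalty
print(ppo_reward(1.0, -0.5, -2.1))  # drifted: larger penalty
```

Without such a penalty, pure reward maximization tends to push the policy into degenerate outputs the reward model happens to score highly; the penalty anchors it to the supervised starting point.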

Finally, stages two and three are repeated: the reward model (RM) judges whether the generated text is of good quality (i.e., caters to human preferences), and the model keeps generating, being evaluated, and being optimized, iterating in a cycle until a higher-quality ChatGPT model is trained.

3. What ChatGPT can and cannot do

1. Main functions and examples of ChatGPT

Starting from ChatGPT's design goal of multi-task learning with a single model, here we introduce the expected task types (functions). Key capabilities include:

Q&A and completion tasks;

Multi-round Q&A, mainly for dialogue applications;

Code generation and review (code completion, generating code from natural-language instructions, code translation, bug fixing);

Text summarization;

Creative writing (e.g., news, stories, emails, reports, and writing improvement);

Translation tasks;

Common-sense reasoning;

Reading comprehension;

Classification.

There are many application cases of ChatGPT online; here are a few examples:

(Screenshots of example ChatGPT conversations.)

As the examples show, in some professional scenarios ChatGPT's answers look formal and correct. Look closely, however, and they are nonsense delivered with a straight face; we analyze the specific problems below.

2. Analysis of the advantages and disadvantages of ChatGPT

From the technical analysis above, we know that ChatGPT is essentially a statistical model that generates subsequent content from preceding conditions. It turns a quantitative accumulation of content associations into a qualitative change, and from the perspective of word relevance it can create real value. However, this correlation-statistics approach still contributes relatively little to deterministic knowledge (common sense, theorems, formulas, mathematical derivation, logical deduction) and scenario knowledge (scenario-specific solutions). ChatGPT's advantages and disadvantages are summarized as follows:

2.1 Advantages of ChatGPT

ChatGPT proposes and validates the approach of a pre-trained large model plus RLHF, and its striking social impact has won public recognition for the combination of big data and verification through human use (iteration).

ChatGPT performs well on certain fixed-pattern tasks, such as article generation, code writing and optimization, open-domain dialogue, and reading comprehension, and has real practical value.

ChatGPT provides a task-agnostic, general-purpose (within a fixed set of applications) approach to machine learning, opening a new technical path.

As a statistical model, ChatGPT can handle the review and aggregation of large volumes of text, something humans are not good at; this can support human decision-making and reduce workload.

ChatGPT offers great potential for standardizing NLP technology in specific fields (such as creative writing and customer service), lowers the threshold for using NLP, and strongly promotes the intelligent transformation of industry.

2.2 Disadvantages of ChatGPT

The disadvantages discussed here are really an analysis of what ChatGPT is not suited for, or where its problems lie. Before elaborating, consider an example from coding, the area ChatGPT is supposedly best at:

On December 5 of last year, Stack Overflow, the world's largest technical Q&A site, announced a temporary ban on ChatGPT, stating officially that the main problem is that ChatGPT produces answers with a high error rate, yet it is difficult to see where they are wrong, which pollutes the site's answers.

Stack Overflow explained that because the average rate of correct answers from ChatGPT is so low, publishing answers created by ChatGPT is substantially harmful to the site and to users asking for or searching for correct answers.

This case illustrates a core problem with ChatGPT, one also discussed in Part 4 on why it is not suitable for the field of science and technology. Because ChatGPT is a generative model, it is essentially probabilistic, a correlation between conditions and probabilities. It can only give results computed from probabilities, and cannot clearly indicate which are correct and which are potentially wrong. What the technology world needs is accurate, reliable answers. Even for knowledge verification, ChatGPT is sometimes inferior to a search engine. Its specific problems include:

It doesn't know what it knows, and it doesn't know what it doesn't know;

It cannot reason or think;

It sometimes writes plausible-sounding but incorrect or absurd answers; an answer may sound authoritative yet be completely wrong, nonsense delivered in earnest;

Generating long texts, such as novels, remains difficult, with frequent repetition;

Interpretability is weak: how the model makes decisions, and which weights play the decisive role, are unclear;

Factual questions can go wrong: ChatGPT does not have a solid grasp of common sense and factual content, and often produces content untrue to reality;

So far, most of its "creations" are essentially just "scraping" and "stitching";

It performs poorly in professional domains such as mathematics and physics.

4. How to solve applications in the field of science and technology

ChatGPT has made great progress in natural language through the combination of big data, massive compute, large models, and reinforcement learning from human application feedback. But this does not solve the deployment of AI in science and technology, because it is a probabilistic generative model: the authenticity, reliability, and causal soundness of its output cannot be guaranteed. ChatGPT is still valuable, however, as one core technical component in solving scientific and technological applications.

Problems in science and technology require conclusions generated from evidence, experiments, derivations, reasoning, and other causal relationships; they require a complete solution (logic), verified by extensive experiments or theoretical derivation.

At present, AI has no theory, like Newton's three laws, that tells it which way to go. Even if sufficient intelligence develops in the future, the development of artificial intelligence will not happen in one leap; it requires continuous iteration, optimization, and accumulation.

Below we introduce the characteristics of problems in science and technology and the current forms of solution; the difficulties that must be overcome, and why an engineering system matters; and finally a summary of the solution, the "super scientist brain for professional fields"!

1. Characteristics and solution forms in the field of science and technology (brain)

The essential difference between scientific/technological applications and general-domain ones is the evidence-based nature of results and the explicit expression of process (ChatGPT follows no scientific basis; it gives conclusions from probabilities, and its process is an unexplainable black box). That is, the solutions obtained must follow a scientific basis (mathematical derivation, experimental results, expert consensus, etc.), and the process of reaching conclusions must be visible (scientific and technological knowledge must be explicitly expressed, iteratively verified, and stored).

Clearly, the answer to a scientific or technological problem is not merely the probability of certain words. The field needs a knowledge-service system built on accurate results, visible processes, and authoritative evidence for the source of every element; likewise, a quantifiable, computable knowledge structure that computers can express, store, and apply is necessary. We therefore need an engineering system to ensure the practical application of artificial intelligence in science and technology.

ChatGPT is not suitable for solving most problems in science and technology. However, the ChatGPT model still has great value for AI in this field, which we illustrate with a simple architecture diagram. We define the form of knowledge services in science and technology as dedicated brain engineering (Figure 5):


Figure 5 Dedicated brain engineering

As the figure shows, the dedicated brain engineering system has three core elements: a cognitive technology system (ChatGPT can be one of its technical components), massive professional data, and expert groups. Data and people are easy to understand, so we focus on the cognitive technology system, which we call the dedicated brain cognitive computing engine. It comprises two parts: an AI technology system and a knowledge graph system. The AI technology covers statistical models like GPT as well as human-feedback reinforcement learning methods like RLHF (which we call bionic adversarial training). The knowledge graph system is a form of knowledge expression, storage, and use that both people and computers can understand and reason over.


Figure 6 Dedicated brain

As the figure shows, a semantic network system (knowledge graph) that both computers and people can understand and use is the basis for knowledge storage, explicit expression, knowledge reasoning, and knowledge computing. Knowledge output can take the form of text, pictures, tables, or a set of open tools. The decision-making, supervision, and verification of some knowledge inevitably require human collaboration. Finally, an AI technology system supports the operation of the entire dedicated brain system.

Since the dedicated brain is a system rather than a single technology, it cannot be implemented as simply as one technique, and by the same token it does not depend strongly on any one technique. Its technical architecture needs a layered, modular design so that as technology advances, each module can be independently and quickly upgraded or replaced. Another focus of the dedicated brain is therefore the construction and operation of its engineering system; because the system is relatively complex, we use engineering thinking to build and operate it, hence "dedicated brain engineering".

2. Engineering system and its necessity

Above we introduced the composition of the dedicated brain and mentioned the need for engineering operations. Before describing the engineering system, let us examine the difficulties of building it through its application scenario:


Figure 7 Business logic of brain

Figure 7 shows how users use the dedicated brain to solve scientific and technological problems. For simple problems, the machine can directly provide verified answers. For complex problems, or problems the machine cannot answer accurately, experts must assist. Problems already solved must be stored in a form the computer can understand; complex problems require combining a series of answers into a complete solution; and so on. From this we summarize the following questions:

  • How do you turn massive amounts of data into computer-usable knowledge?
  • How do you confirm the accuracy and validity of knowledge?
  • How do you get experts willing to participate?
  • How do you find an expert well matched to a question (asking an academician to answer 1+1 is obviously wasteful)?
  • How is an answer evaluated?
  • How are answers and their evaluations stored and used?
  • How are the process, logic, and computability of question answering stored and utilized?
  • How can cross-disciplinary problems be solved?
  • How are the intellectual property rights of individuals and enterprises handled?
  • How are the boundaries of intellectual property defined?
  • How are solutions provided for different scenarios?
  • How can the concentration of valuable knowledge on a central server be avoided (individual knowledge must stay with the individual)?

The questions above cover the representation and storage of knowledge and its relationships, guaranteeing knowledge correctness, human collaboration, optimizing human feedback, intellectual property, and more. That is, they span technical issues as well as expert organization and social collaboration. These are precisely the problems a deployable engineering solution and its technologies must contain; with them in mind, let us look at dedicated brain engineering (Figure 8).


Figure 8 Demand-driven dedicated brain engineering system

The dedicated brain engineering system adopts three modes: knowledge semantic representation (a technical system for machine self-learning), demand traction (automatic trading of knowledge based on property-rights protection), and benefit-driving (human-machine integrated knowledge solutions). First, a three-layer graph structure determines the form of knowledge representation, storage, reasoning, computation, and optimization (the semantic web). Second, the semantic web is used to profile people and data, building accurate matches between needs and solutions and between requirements and experts. Finally, defining property rights and a build-once, sell-many model maximizes the interests of knowledge providers. Moreover, the system is composed of many individual dedicated brains, divided by field and by level: Figure 9 shows brains in different fields, and Figure 10 shows brains at different enterprise, application, or individual levels.


Figure 9 Domain specialized brain group


Figure 10 Different business clusters

From the above analysis we can see that applying AI in science and technology requires an engineering system (which we name the dedicated brain, covering a series of technical components); ChatGPT is one technical component within it, and of course rapid progress in each component strengthens the whole engineering system. ChatGPT's technical validation will greatly help the dedicated brain in transforming data into knowledge, building knowledge graphs, and matching knowledge to solutions.

3. Why a professional field, why super, and why a scientist brain

The full name of the "dedicated brain" is the "super scientist brain for professional fields". A question arises here: why a professional field, and why a super scientist's brain? This is both a technical question and a business-logic question, which we illustrate through business.

Why a professional field? Consider some examples: architects work with architectural design drawings, doctors with medicine and forensics, lawyers with laws and cases, stock traders with financial knowledge, market information, and K-line charts. In other words, each field's knowledge reserve, logic of knowledge use, interaction, and problem solutions differ. Solving industrial design problems follows completely different business logic from treating patients; knowledge in professional fields has boundaries.

Why a super scientist brain, and not a super streamer, a super PK champion, or a Super Saiyan? First, scientists: in 1993 China put forward that "science and technology are the primary productive force". Scientists are the cornerstone of a country; in today's terms, the "internet celebrities" of our era should be Deng Jiaxian, Qian Xuesen, Qian Sanqiang, Yuan Longping, Li Siguang, Hua Luogeng, and many other resounding names, who solved our problems of food, national defense, geology, and international standing. Next, super: current AI tasks are too simple, giving people the impression that learning plus data suffices; but for storing knowledge, querying knowledge, and computing over data, humans truly cannot compete with computers. Memory, search, and calculation are the computer's strengths; understanding, reasoning, and innovation are the human's. Ordinary scientific and technological workers mainly search for answers within the finite knowledge they remember and use, which is precisely the computer's strength. So "super" mainly refers to giving computers the ability to reason and innovate.

5. Challenges and opportunities

From the above analysis, ChatGPT is not yet real AI, but neither is it a bubble; it is a new starting point. From the dedicated engineering system we can see that real AI must be an engineering system, and every leap in core technology accelerates that system's development. ChatGPT has been recognized by society, making the human-computer interaction model ever more widely accepted, and real intelligence comes from this interaction, from an engineering system based on human-machine collaboration, machine self-learning, and self-organization between machines. The dedicated brain system is not a single technology but an engineering system, whose technical components must keep evolving with ongoing research and development. Under the ChatGPT wave, we have a great opportunity to promote and realize dedicated brain engineering. But because it is an engineering system, some problems inevitably remain to be solved.