
ChatGPT, the "miracle" in the field of algorithms

Author | 第一财经YiMagazine

Reporter | Wu Yangyang

Editor | Chen Rui

If the history of technological development is divided into big years and small years, 2022 should count as a "medium" year in the history of artificial intelligence (AI): it cannot match 2012, when engineers stacked artificial neural networks five layers deep for the first time and the concept of "deep learning" was born; nor 2016, when humanity's best Go player, Lee Sedol, was defeated by an AI called AlphaGo.

But 2022 certainly bore more fruit in artificial intelligence than any year since 2016.

In April, a model called DALL-E 2 showed it could generate eye-catching images in seconds from written instructions; at the end of November, a model called ChatGPT appeared, answering almost any human question with apparent logic (except, say, the weather forecast). Both models were developed by OpenAI, an artificial intelligence company based in San Francisco.


ChatGPT's official website describes it as a language model optimized for dialogue.

A month later, London-based DeepMind also released AlphaCode, a model that can write code. As the name suggests, it is a sibling of the company's Go master AlphaGo and protein-prediction master AlphaFold.

Previously, AI was generally good at analysis: determining whether an email is spam, identifying whether an image shows an apple or a pear, or guessing which type of product you might like. Unlike this "analytical AI," DALL-E 2, ChatGPT, and AlphaCode, all born in 2022, are "generative AI," better at creating than at analyzing.


A work generated by DALL-E 2. According to the official website, DALL·E 2 creates original, realistic images and artwork from text descriptions.

One thing is certain: both analytical and generative AI are still based on deep learning. Over the past two weeks, most people who tried ChatGPT went through the same arc. At first they were amazed that ChatGPT could give such logical answers, and that the work reports it drafted could almost be used after light editing; some used it to debug code, others to spin bizarre, creative science-fiction stories. But soon everyone found that its answers to many questions are unreliable: it would, for example, suggest that Jia Baoyu should marry Grandmother Jia, or solemnly "prove" that 4 is an irrational number.


A colleague of ours put a question to ChatGPT; this was its answer.


Others asked ChatGPT to "write a letter to the leadership."


But if you ask ChatGPT what it thinks of Lin Daiyu uprooting a weeping willow (a deliberately absurd mash-up: the feat belongs to Lu Zhishen in Water Margin, not the frail heroine of Dream of the Red Chamber)...

That's right: ChatGPT is still a deep-learning "black box." You don't know what it will generate, and neither does it.

What has changed is the scale of the neural networks used for deep learning. Before ChatGPT launched, OpenAI was known for its GPT series (Generative Pre-trained Transformer), of which GPT-3, with 175 billion parameters, was once the world's largest natural language model. GPT proved that larger models do work better, at least more coherently, logically, and creatively than any previous chatbot. But this is still far from the final answer.

Language has always been regarded as the jewel of intelligence, both for humans and for artificial intelligence. The GPT series was born carrying high hopes, and its proponents believed there was nothing mysterious about human language: it is a "predict the next word" game. Think back: when you hear someone say the first few words, don't you often predict what they will say next? Yes, GPT's advocates believe everything is based on probability, language included; probability rests on statistics, and what has been said so far largely determines what will be said next.

But there are also more cautious people who believe human language is not so simple: language reflects not only the relationships between words, but also the relationship between words and the outside world, so playing word games alone will not fundamentally solve the problem.

So, is an AI that is still just playing with words worth taking seriously? How should we assess its value? And what are cutting-edge algorithm developers doing to achieve better intelligence?

YiMagazine spoke with senior algorithm engineer Xu Yuchang about the technical questions around ChatGPT. Xu graduated from the Department of Computer Science at Fudan University, joined Instant in 2018, has long focused on technology trends in recommendation and search, and has also worked in NLP (natural language processing), audio, and other directions.

Yi = YiMagazine

Xu = Xu Yuchang, algorithm engineer at Instant

Only one word can be predicted at a time

Yi: What was your experience of ChatGPT? Did you find it good? Stupid? Or an improvement over other AI?

Xu: There has actually been some progress, of the same kind as the earlier DALL-E 2 model (note: OpenAI's image-generation model): just one step forward, but a step everyone finds interesting, which is why it has spread so widely. The technical change behind that step may not be large; perhaps accuracy rose from 85% to 90%, only five points of progress, but that step formally carries it into the ranks of application-level products.

Yi: What is the accuracy rate based on?

Xu: That is just a colloquial way of putting it. Take a classification model: give it 100 pictures, and if it gets 80 right, its accuracy is 80%; if it gets 90 right, 90%. Different kinds of models actually have different evaluation criteria; for a generative model like this one, what matters more is user experience. During training, someone tells it whether the generated content is good or not. ChatGPT, for example, also has like and dislike feedback buttons on its web page for you to choose. The model is iterated so that satisfactory answers appear more often and unsatisfactory ones less often.

Yi: How does this generative AI generate an answer from scratch?

Xu: It's based on natural language models, which emerged around 2000; the essence is to "predict the next word." For example, I say "He is a ..." and ask you to predict what the next word will be. Give this task to two models and one may answer "man" while another answers something else; a third might give "king." For some users, "king" may seem a more sophisticated answer than the others. In short, every model is trained on a large corpus: it sees which word most often follows which, and predicts the most likely next word. That is the principle of generative models.
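
To make that principle concrete, here is a minimal sketch in which a toy corpus and raw bigram counts stand in for a real neural model: it simply tallies which word most often follows which, then predicts the most frequent successor. The corpus and words are invented for illustration.

```python
# A minimal sketch of "predict the next word" from corpus statistics.
# The corpus is invented for illustration; real models like GPT learn
# these regularities with neural networks over vastly larger corpora.
from collections import Counter, defaultdict

corpus = "he is a man . he is a man . he is a king .".split()

# Tally which word follows which in the corpus.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # The most frequent successor is the predicted next word.
    candidates = follows[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("a"))  # -> 'man', seen most often after 'a'
```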

Yi: Can it generate only one word at a time?

Xu: Only one word at a time. ChatGPT seems to speak in long paragraphs, but its answers are generated word by word. After each word is generated, it is reused as input to predict the next most likely word. For example, when it says "He is a king," the word "king" in turn becomes part of the input, contributing to the generation of the next word.
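
Here is a minimal sketch of that feedback loop, with a hand-written lookup table standing in for a trained model (all entries are invented for illustration): each generated word is appended to the context and fed straight back in.

```python
# A minimal sketch of autoregressive generation: each generated word is
# appended to the context and becomes input for the next prediction.
# The toy "model" below is a lookup table, a stand-in for a real network.
toy_model = {
    ("he",): "is",
    ("he", "is"): "a",
    ("he", "is", "a"): "king",
    ("he", "is", "a", "king"): ".",
}

def predict_next_word(context):
    # A real model scores every word in its vocabulary; here we just
    # look up the full context (invented for illustration).
    return toy_model.get(tuple(context))

def generate(prompt, max_words=10):
    words = prompt.split()
    while len(words) < max_words:
        nxt = predict_next_word(words)
        if nxt is None:          # nothing more to predict
            break
        words.append(nxt)        # the output becomes part of the input
    return " ".join(words)

print(generate("he"))  # -> "he is a king ."
```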

Yi: Does this produce text that reads coherently sentence by sentence but doesn't answer the question?

Xu: That problem does occur; we call it "long-term memory loss." People actually have the same problem: when I speak at length, what I say later is sometimes completely unrelated to what I said earlier, because I no longer remember everything mentioned before.

AI has the same problem, and its input length is limited, because an input that is too long becomes computationally intractable. One reason ChatGPT does better, producing more logical language and more coherent context across multiple rounds of dialogue, is that it has better memory and can handle longer inputs.

Yi: Can longer inputs and better memory produce logic?

Xu: On the surface, ChatGPT produces some logical coherence. In fact, these current AIs are still deep learning models with no real logical reasoning ability; they are essentially statistical. Their logic, and the words and sentences they generate, all come from the corpora they have seen.

ChatGPT's logic comes partly from the logic embedded in the corpora it has seen, and partly from supervised learning. GPT-3 (note: OpenAI's natural language model before ChatGPT) had a very large corpus, more than 40 TB. On top of GPT-3's large-scale corpus training, ChatGPT adds supervised learning with human labeling: after the machine generates answers, ChatGPT's engineers review them, point out where an answer is good and where it is not, revise the model's answers to make them more logical, and feed the revised answers back for ChatGPT to learn from. In effect, the data quality is optimized a step further.
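
A rough sketch of that review-and-revise loop is below. Every class and function here is a stub invented for illustration; this is not OpenAI's actual pipeline or API, just the shape of the idea: the model drafts, a human corrects, and the corrected pairs become supervised training data.

```python
# A minimal sketch of the human-feedback loop described above: the model
# drafts answers, a human reviews and revises them, and the corrected
# pairs are fed back as supervised training data. Everything is stubbed.
class ToyModel:
    def generate(self, prompt):
        return "draft answer to: " + prompt

    def fine_tune(self, pairs):
        print(f"fine-tuning on {len(pairs)} human-corrected examples")

def review_and_revise(prompt, draft):
    # Stand-in for a human annotator editing the draft for logic/quality.
    return draft.replace("draft", "revised")

model = ToyModel()
prompts = ["Why is the sky blue?", "Write a year-end summary."]
pairs = [(p, review_and_revise(p, model.generate(p))) for p in prompts]
model.fine_tune(pairs)  # the corrected answers are learned from
```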

Yi: How can this kind of supervision exhaust all the questions users might ask?

Xu: It really can't be exhausted; whether this limited feedback data is enough is a separate question. At least for now, judging by ChatGPT's results, the limited data has helped it take a step toward the application level, and that step is what everyone finds so much fun.

Yi: What level of memory is needed for the coherence ChatGPT shows today?

Xu: Although the length of information GPT can remember (that is, the input length when predicting the next word) is already longer than previous models', it is actually not that long: about 2,048 words.

This length is tied to cost, and the cost grows quadratically: every time the length doubles, the training cost quadruples. The original GPT and BERT (note: Google's language model) could take inputs of 512 words; now 2,048 is possible, a 4-fold increase in length and a 16-fold increase in training cost.
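
A quick arithmetic check of that quadratic relationship:

```python
# Self-attention cost grows with the square of the input length, so
# going from 512 to 2,048 words (4x the length) costs 16x as much.
old_len, new_len = 512, 2048
ratio = (new_len / old_len) ** 2
print(f"{new_len // old_len}x longer input -> {ratio:.0f}x the cost")
# -> 4x longer input -> 16x the cost
```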

Yi: Do people also construct language the way these models do, by "predicting the next word"?

Xu: The way models speak and the way people speak are essentially two different kinds of thinking. When people speak, they don't just consider the next word; they think about the logic of what they are about to say, and organize the language only after they know what they want to express. A model instead jumps from word to word, and it doesn't even know when what it says is incoherent, which is why it can produce a lot of "nonsense literature."

Yi: Are these models loaded with the grammars that linguists have constructed for human language?

Xu: As far as I know, generative models don't do that. We seem to have distilled many rules for our languages, and you may be sure a given rule is correct, but if you hard-code such a rule into the model, it will only ever follow that rule, and there is still no guarantee that everything it generates is correct.

So what generative models produce comes from training on large corpora, not from linguistic rules supplied by people. Even the connectives and logical words, the parts that look most logical to you, appear because the model saw them in large corpora. What it has never seen, it cannot generate out of thin air.

The Age of Big Models: A Race for Parameters

Yi: Why do GPT series models perform better than other generative AI?

Xu: Starting with GPT and BERT, the industry had the concept of a "big model" for the first time. Before that, neural networks were all relatively small, without many parameters. As soon as GPT-1 appeared, its parameter count was in the hundreds of millions. Before then, no one knew whether so many parameters would be any good. The earliest neural networks had only 2 layers; not until AlexNet in 2012, with its 5-layer neural network, did image results get clearly better, and even then nobody knew how deep a network was best stacked.

With the GPT series, GPT-1 started at 12 blocks, which you can understand as 12 layers; GPT-2 had 48; and GPT-3 stacked up to 96. As it turned out, they worked wonders.

Yi: Does a "big model" mean a big network with many layers, or many parameters?

Xu: The more layers, the more parameters; every layer of the network has parameters of its own. Training a bigger model then requires a bigger dataset; otherwise a shallow network could already fit the data well enough.

So GPT really did two things: it made the model bigger, and it used more, higher-quality data. Is there a technical breakthrough? No, it still uses deep learning. If I were to do it, I could too; the question is whether I have that much data and that much computing power, whether I could get results as good as theirs, and how much money I am willing to invest.
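
To make the layers-and-parameters relationship above concrete, here is a rough back-of-the-envelope sketch. The 12·d² per-block count is a simplified approximation (attention and MLP weight matrices only, ignoring embeddings and biases); the block counts and layer widths are the published GPT-1 and GPT-3 configurations.

```python
# A rough estimate of why stacking blocks multiplies parameters: each
# transformer block carries its own weight matrices, so the total scales
# with depth times the square of the layer width. Simplified on purpose.
def approx_params(n_blocks, d_model):
    per_block = 12 * d_model ** 2   # ~4 attention + ~8 MLP matrices
    return n_blocks * per_block

print(f"GPT-1-like (12 blocks, width   768): {approx_params(12, 768):,}")
print(f"GPT-3-like (96 blocks, width 12288): {approx_params(96, 12288):,}")
# ~85 million vs ~174 billion parameters, matching the orders of
# magnitude quoted in the interview.
```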

Yi: Still within deep learning, is there a technical path other than the "predict the next word" approach GPT uses?

Xu: Google's BERT and OpenAI's GPT follow two different paths: GPT uses a unidirectional Transformer, while BERT's technical path is a bidirectional Transformer. GPT predicts one way, from front to back, while BERT does cloze, fill-in-the-blank.

Take the phrase "He is a man": cutting out the middle word "a" and asking you to fill in the blank is what Google's BERT does, because Google believes both "He is" and "man" contribute to the word "a." GPT instead predicts the next word "a" from "He is," without seeing the "man" that follows. What GPT has always wanted to build is a generative model, so it sticks to the path of one-way prediction; from GPT-1 to GPT-3, all are one-way predictions.
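
The difference is easy to see on the interview's own example sentence (a minimal illustration; real systems operate on sub-word tokens rather than whole words):

```python
# BERT-style cloze vs. GPT-style next-word prediction, side by side.
sentence = ["He", "is", "a", "man"]

# BERT: mask a middle word, use BOTH sides of the context to recover it.
masked = ["He", "is", "[MASK]", "man"]   # left AND right context visible
target = "a"

# GPT: only the left context is visible; predict what comes next.
context = ["He", "is"]                   # nothing to the right is seen
next_word = "a"

print("BERT sees:", masked, "-> predicts", target)
print("GPT  sees:", context, "-> predicts", next_word)
```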

Yi: Is two-way prediction less effective than one-way?

Xu: Early on, two-way prediction actually worked better than one-way, and in theory it should. But now it is hard to say which is better, because the models keep getting bigger: BERT first stacked 24 layers of neural networks, GPT-2 stacked 48, so their network depths differ. Even at the same depth the parameters can be scaled up, and once the parameters are large enough, one-way models actually do fine. So whether one-way or two-way models are better is hard to say; there is no conclusion yet.

Yi: Will one-way training consume less computing power?

Xu: Not necessarily. One-way prediction may seem to compute a bit less than two-way, but actual training depends on how much data the model demands: one-way training may need twice as much data as two-way for the model to converge. So we don't know how much computing power the two kinds of model actually consume.

Commercialization: intelligent customer service, search engines, creative work, code assistants?

Yi: ChatGPT is also considered a chatbot. How does it differ from the intelligent customer-service bots many Internet companies already use?

Xu: Question-answering bots have existed for a long time, and they divide into closed-domain and open-domain Q&A. Closed-domain Q&A exhaustively lists the questions and answers that can arise in a particular scenario. Taobao customer service is this type: most questions it answers relate to its business, such as logistics information, products and whether they are in stock, and each question has a corresponding reference answer. The questions such a bot can answer are finite, and you can't ask anything outside its business.

The questions GPT-class models can answer, by contrast, are open-ended. It knows a little of everything, which also means it cannot be professional enough in specific fields; that is indeed the current situation, and so far there is no way to commercialize it.

Yi: What are the differences in the technologies used by these two types of robots?

Xu: Taobao customer service and its peers belong to the previous generation of chatbots, which identify the problem by grabbing keywords in the question. For example, they compile the 30 questions customers ask most often, build 30 answer templates, and output the template matching the keyword as the answer; that differs from generative models. If your question contains more than one keyword, the bot won't necessarily catch the right one. But this kind of chatbot can be commercialized, because it really does save a lot of manpower.
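
A minimal sketch of such a keyword-template bot follows; the keywords and canned answers are invented examples, not any real company's system.

```python
# A toy previous-generation chatbot: match a keyword, return a template.
TEMPLATES = {
    "logistics": "Your parcel is on the way; check tracking for details.",
    "stock": "Please see the product page for current stock levels.",
    "refund": "Refunds are processed within 7 business days.",
}

def answer(question):
    # Grab the first known keyword and return its canned template.
    for keyword, template in TEMPLATES.items():
        if keyword in question.lower():
            return template
    return "Transferring you to a human agent..."

print(answer("Is this in stock, and how do refunds work?"))
# -> only the 'stock' template is returned; the 'refund' part of the
#    question is silently dropped, the multi-keyword weakness noted above.
```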

Yi: What are the commercial scenarios of ChatGPT?

Xu: Scenarios where the questions cannot be exhausted need a generative model like GPT. For now I think it might help some junior writers; not that what it generates is ready to use, but it does supply ideas. Writing a year-end summary or designing a recipe, for example, it may not give you exactly the right thing, but it does give you something to think with.

But if it is only a system that supplies ideas, the average individual is unlikely to pay for it, which is also a problem. In fact, when GPT-3 came out everyone was already asking this question; OpenAI still can't make money from GPT.

Yi: Do you think it will replace search engines?

Xu: As long as it has no way to solve the problem of factual errors, it cannot replace search engines. The reason is simple: when you search for something, the results a search engine returns are highly relevant and guarantee a certain accuracy. Search for Chen Kaige's works, for example, and the results are generally not wrong, because they are curated by people. ChatGPT can't do that; people have tried it, and it may claim some of Zhang Yimou's works are Chen Kaige's. If this problem of factual error cannot be solved, it cannot replace search engines.

Yi: Does it have something to do with the fact that its corpus is static rather than dynamic?

Xu: There may be some connection, but the bigger problem is that as long as its generation logic is based on "predict the next word," it has no way to judge whether something is true or false. ChatGPT admits this on its official website. Even the people responsible for rating the text ChatGPT generates don't actually know whether an answer is right or wrong; they can only judge which reads better from the grammatical structure and rank them, so they cannot fully catch factual errors.

Yi: Is Google's BERT better positioned to solve this problem, for example by checking answers generated by BERT against data from Google's search engine?

Xu: Not for the time being. Google may use search-engine results to give BERT some hints, but it will not directly display results generated by BERT.

The problem is not that generative models and search engines have different data sources, but that they are essentially two different things with different architectures. The essence of search is to sort existing data and match the most relevant information to the search term; generative models are built on neural networks.
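
A minimal sketch of that sort-and-match essence, scoring documents by term overlap with the query; the documents are invented, and this is a toy stand-in for real ranking functions such as BM25.

```python
# Toy search: score each document by how many query terms it contains,
# then sort. No generation happens; existing data is only ranked.
docs = {
    "d1": "chen kaige film list",
    "d2": "zhang yimou film list",
    "d3": "cooking recipes",
}

def rank(query):
    q = set(query.split())
    scored = {doc_id: len(q & set(text.split())) for doc_id, text in docs.items()}
    return sorted(scored, key=scored.get, reverse=True)

print(rank("chen kaige film"))  # -> ['d1', 'd2', 'd3']
```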

Yi: When smart speakers appeared, there was talk that they might replace search engines. How do generative models differ from the AI in these smart speakers?

Xu: When you ask a smart speaker a question, it really just converts your spoken question into text, sends it to a search engine, retrieves the answer, and feeds it back to you. These AIs mostly do speech recognition and text conversion; without a network connection they can do nothing. The most typical example is the weather forecast, which ChatGPT cannot give because it is not a weather-forecasting model, whereas the AI in a smart speaker can give tomorrow's weather because it crawls forecasts from other platforms through a search engine.

Conversely, if you ask the AI in those smart speakers an open-ended question, such as "Do you think this piece of music is good?", it may also generate some feedback, but it will be much more subjective than ChatGPT's, because the model is not large enough.
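
A minimal sketch of the speaker pipeline Xu describes, with every stage stubbed out for illustration; real devices call ASR, search, and TTS services rather than these placeholder functions.

```python
# speech -> text -> search -> spoken answer, the pipeline described above.
def speech_to_text(audio):
    return "what is the weather tomorrow"   # stubbed ASR result

def web_search(query):
    return "Sunny, 20°C"                    # stubbed search-engine answer

def text_to_speech(text):
    print(f"(speaker says) {text}")         # stubbed TTS

def handle(audio):
    query = speech_to_text(audio)   # step 1: recognize speech
    answer = web_search(query)      # step 2: fetch from the web
    text_to_speech(answer)          # step 3: read the answer aloud

handle(audio=b"...")
```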

Yi: Do you also think that ChatGPT may replace programmers in the future?

Xu: Before ChatGPT, Google had already launched so-called auto-completion AI, used specifically by programmers to complete code. It also works by "predicting the next word," completing code segment by segment, just as you ask a question in ChatGPT and it answers. For example, you tell this kind of code-writing assistant, "Write me a function that does such-and-such," and it can generate a function meeting your needs. In some cases it is no worse than human-written code, so it can take over some of a programmer's entry-level work.

Finding bugs is also among its tasks. When a language model can predict the next word, it can compare its prediction with what the programmer actually typed; if the two differ, it can make a preliminary judgment that the programmer may be wrong. This kind of code assistant is trained on a great deal of code, so it knows what, with high probability, should follow a given line.
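
A minimal sketch of that compare-prediction-to-input idea, with a toy lookup table standing in for the trained code model; the entries are invented for illustration.

```python
# Flag a possible bug when the typed character differs from the model's
# most likely continuation of the code so far.
LIKELY_NEXT = {
    "for i in range(n)": ":",   # a for-header is usually followed by ':'
    "while queue":        ":",
}

def flag_suspicious(code_so_far, typed):
    predicted = LIKELY_NEXT.get(code_so_far)
    if predicted is not None and typed != predicted:
        return f"expected {predicted!r} after {code_so_far!r}, got {typed!r}"
    return None

print(flag_suspicious("for i in range(n)", ";"))
# -> expected ':' after 'for i in range(n)', got ';'
```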

General intelligence: deep learning alone is not enough

Yi: How much has ChatGPT advanced artificial intelligence? How should we understand all the excitement?

Xu: There is no new progress in the underlying framework, but on top of that framework (deep learning) this is another step toward the application level. Since April this year, new things have kept emerging across the whole AIGC (AI-Generated Content) field, making the community very excited. That wave of progress came from advances in the details of the models, using an algorithm called the diffusion model, published only in 2020.

When GPT-3 came out, some people said it had bought AI another 5 to 10 years of life, because it is true that nothing new had emerged in the AI field for a while. ChatGPT's better results will likewise bring everyone some excitement, but how long the excitement lasts is uncertain. Progress is definitely there, just not that much of it.

Yi: Why has it been so hard for artificial intelligence to make big advances for such a long time?

Xu: What limits AI's progress is not only the algorithms themselves but also the hardware. That ChatGPT can draw on so much computing power is inseparable from hardware development; without the graphics-card optimizations of hardware companies such as NVIDIA, training these large models would be impossible. ChatGPT could not have been trained 5 years ago, because there weren't enough graphics cards.

Deep learning is really a story of the past 10 years. The theory existed back in the 1980s, but it only truly took off in 2012, and industry only began to use it at scale from 2015, just a few years ago. The neural networks of the 1980s weren't called deep learning, because they weren't deep; they had only 2 layers. Later everyone found that the deeper the stack, the better the results, and that directly opened a new era.

Yi: One view holds that deep learning has its own Moore's Law: every time the training data goes up a level, model performance climbs another level. If so, how can AI keep growing once the amount of data available for training peaks?

Xu: There are two paths. One is to study whether current models are really large enough (note: in the number of layers and parameters of the neural network), which is an open question. When GPT-1 appeared with hundreds of millions of parameters, we felt that was big enough. The facts later told us the model could be larger: GPT-2 reached a billion parameters, and just when you thought it couldn't grow further, GPT-3 reached hundreds of billions. Whether GPT-4 will have a trillion parameters, I don't know. This is the path of continually enlarging the model; spending money buys better results.

The other path is to overturn deep learning entirely and switch to other algorithms. But exactly what the new algorithm would be, no one knows yet. Academia is still exploring, including the so-called Big Three of deep learning (note: Geoffrey Hinton, Yann LeCun, and Yoshua Bengio, who won the 2018 Turing Award for their contributions to deep learning); they too are looking for algorithms better than deep learning. LeCun, for instance, holds that the essence of deep learning is gradient backpropagation (gradient descent plus backpropagation), and he believes that is not the ultimate solution.
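
For reference, a one-parameter toy version of the "gradient descent plus backpropagation" core LeCun is referring to; in deep networks, backpropagation computes these gradients automatically across all layers, while this sketch uses a hand-derived gradient.

```python
# Nudge a weight against the gradient of the loss until the loss shrinks.
def loss(w):
    return (w - 3.0) ** 2      # toy loss, minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)     # its analytic gradient (backprop would
                               # compute this automatically for deep nets)
w, lr = 0.0, 0.1
for step in range(50):
    w -= lr * grad(w)          # the gradient-descent update
print(round(w, 4))             # -> close to 3.0
```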

Others believe the direction of deep learning is right but that breakthroughs are needed in the theory. Otherwise, all we can do is keep enlarging models and adding computing power.

Yi: What different techniques do generative models use to generate different kinds of content, such as text, images, and video?

Xu: They differ. Images are not generated bit by bit the way text is generated word by word; an image is generated as a whole, but through many steps. The first step forms a relatively blurry picture, and the model then keeps denoising, so the picture takes shape step by step. It does not work by superimposing the most likely images for different keywords; people assume that when they first see the results, but it is not so. It uses the diffusion-model algorithm, published only in 2020.
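
A minimal sketch of that denoise-step-by-step loop follows. `predict_noise` is a stub for a trained denoising network, and the "image" is just a short list of numbers for illustration; real diffusion models condition each step on the text prompt.

```python
# Start from pure noise and repeatedly subtract the model's noise
# estimate, so a blurry picture sharpens step by step.
import random

def predict_noise(image, step):
    # Stubbed denoiser: a real network estimates the noise in the image,
    # conditioned on the prompt and the current step.
    return [0.9 * v for v in image]

image = [random.gauss(0, 1) for _ in range(8)]  # step 0: pure noise
for step in range(10):
    noise = predict_noise(image, step)
    image = [v - n for v, n in zip(image, noise)]  # remove some noise

print([round(v, 3) for v in image])
# values have been driven toward zero, i.e. the "noise" removed
```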

Yi: Is there a general-purpose AI that can generate various types of content?

Xu: There are still some difficulties, though no one can guarantee ChatGPT won't manage it with more training. Even now, if you ask ChatGPT to generate some images, it seems able to do that too.

Yi: For deep-learning neural networks, is bigger always better?

Xu: It's not that bigger is always better. It may be that a model is good up to a certain scale; if that scale could be determined, everyone could train to it. But which scale of model is best is not yet known, and that is also one of the directions the industry is exploring.

Yi: Would it be best to build neural networks at the scale of the human brain's neural network?

Xu: The problem is that the human brain's neurons still amount to more parameters than existing models have. So some researchers argue that an artificial neural network must have as many parameters as a human brain before it can be intelligent. Whether that is right, I don't know yet.

Yi: You now work at a company that makes social products. Are you worried about this kind of intelligent chatbot replacing your product in the future?

Xu: Whether it will replace social networking, I personally think not necessarily; its current abilities aren't up to it. The entire AI field is still at the stage of weak artificial intelligence, and there is a long way to go to strong artificial intelligence.

At least within the framework of deep learning, that cannot be achieved. As an AI is trained on more corpora, what it generates will come ever closer to what we have shown it. It has certainly seen far more than any person: a person sees only part of the world, while the model sees more, so what it gives you may feel fresh. But it still does not form human logic and emotion, and in that, people are irreplaceable.

Yi: Many people think these models answer questions poorly because they don't understand what they're talking about. Is "understanding" important for achieving machine intelligence?

Xu: My personal understanding is that understanding is not that important for machines, and under the current framework (deep learning) there is no such thing as machine understanding. To the machine, everything is just fitting a benchmark; even when it does well, it is only because it has seen similar things in the training data, not because it has understood.

Yi: Since we can already add attention modules and memory modules to the deep learning framework, why not add an understanding module?

Xu: If we knew how to implement one, we would. So far there has been no substantive theoretical progress.

The copyright of this article belongs to CBN; it may not be reproduced or translated without permission.


You can purchase the December 2022 "Year-end Special" issue.
