
The 6 Most Important Questions About GPT-4 | Jianzhi Research

Artificial intelligence has entered a period of rapid evolution. Jianzhi Research (WeChat public account: Jianzhi Research Pro) invited Ding Qi, senior vice president of the CITIC Securities Research Department, to discuss the latest progress around GPT-4. The core content is organized as follows:

Summary:

1. It is normal for multimodality to increase costs.

2. The cost of the technology falls as adoption grows. In the long run, OpenAI's marginal cost will approach zero.

3. The change in the human-computer interaction interface is the reason GPT-3.5 began to be taken seriously by the industry.

4. The essence of multimodality in GPT-4: text, speech, images, and video can all be abstracted into sets of vectors. GPT essentially takes a vector as input and, through learned correlations, outputs another set of vectors, which can then be converted into images, speech, or video. The process is essentially the same; the difference lies in the computing resources consumed.

5. There are two revolutions underway. One is the energy revolution, based on lithium batteries: the shift from fossil fuels to lithium-battery-based energy. The other is the artificial general intelligence (AGI) revolution represented by ChatGPT. Once more sensitive mechanical feedback is available, robots will be the largest application scenario, but digital humans will certainly be deployed earlier than robots.

6. The likely path of AI adoption: software before hardware, the cloud first, then the back end, and finally the edge.

Main text

Ding Qi: Microsoft embedding GPT-4 into Bing and the full Office suite is a match made in heaven. The core of GPT-4 is multimodality: it can generate text, images, and video, which greatly boosts search engines and office work. Search engines, at their core, need to return not just links but answers, and GPT-4 can directly generate the specific answer we want for a given question.

We usually produce content through office software such as PowerPoint, Word, and Excel, and GPT has now become a very powerful assistant, for example generating a PPT with one click, which greatly improves office efficiency. We therefore believe the cooperation between Microsoft and OpenAI will bring revolutionary changes to content production, and we hope domestic office software will launch similar functions as soon as possible so that everyone can enjoy the convenience.

Jianzhi Research: How should we view the running costs of GPT-4?

Ding Qi: First of all, OpenAI has not published the parameter count of GPT-4, but Zhou Hongyi, chairman of 360, estimated from its performance that it may be at the trillion-parameter level. However, this is not the most important factor in its cost.

In addition, both past and current pricing are based on tokens, and the current unit price is roughly 30 times higher (previously $0.002 per 1,000 tokens, now $0.06). Why is it more expensive? Because tokens are now priced differently. Pricing cannot look only at the input side; it must also account for the output side. In the past, both input and output were text, so the cost was relatively low. GPT-4's output may be an image, or later even video, so the output volume increases greatly, and it is very normal for multimodality to bring higher costs.
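
As a rough sketch of the arithmetic behind the quoted figures (taking the $0.002 and $0.06 per-1,000-token prices above at face value; actual GPT-4 pricing also distinguishes prompt tokens from completion tokens):

```python
# Rough cost comparison using the per-1,000-token prices quoted above.
# This is a simplified sketch, not OpenAI's full pricing model.
PRICE_OLD_PER_1K = 0.002   # USD per 1,000 tokens (figure quoted in the interview)
PRICE_GPT4_PER_1K = 0.06   # USD per 1,000 tokens (figure quoted in the interview)

def cost(tokens: int, price_per_1k: float) -> float:
    """Cost in USD for a given number of tokens at a per-1,000-token price."""
    return tokens / 1000 * price_per_1k

tokens = 10_000  # e.g. a long document plus its generated answer
print(f"Old price:   ${cost(tokens, PRICE_OLD_PER_1K):.2f}")    # $0.02
print(f"GPT-4 price: ${cost(tokens, PRICE_GPT4_PER_1K):.2f}")   # $0.60
print(f"Ratio: {PRICE_GPT4_PER_1K / PRICE_OLD_PER_1K:.0f}x")    # 30x
```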

Jianzhi Research: Why can GPT-3.5 Turbo be priced lower?

Ding Qi: Compared with 3.5, the core parameter count actually decreased after tuning. Cost can be divided into two parts: training cost and inference cost. By 3.0 the model was already well trained, so much of that cost had been amortized; from 3.5 onward, what users share is mostly inference cost.

Technology is like that: it is inevitably expensive at the beginning, because there is heavy investment in research and development and in infrastructure. It will of course get cheaper over time, because the more people use it, the more widely the cost is shared.

So OpenAI says that, in the long run, its marginal cost will approach zero, just like today's search engines, where the cost of a single search is extremely low. But compared with 3.5, the content 4.0 generates is different: images and video consume a lot of network bandwidth and computation, so costs will also rise in the short term.

Jianzhi Research: After the release of GPT-4, how do you view the value of the previous versions? How have the large models evolved from GPT-1, 2, 3, and 3.5 to generation 4?

Ding Qi: GPT-1 was released in 2018 and GPT-2 in 2019, but neither made much of a splash in the industry; they were attempts to apply the Transformer to NLP (natural language processing). In the past, the user experience of NLP was not very good; speech transcription and machine translation, for example, were not satisfactory.

In the past, artificial intelligence was mainly built on algorithms such as CNNs, RNNs, and LSTMs. These mimic human neurons, passing a signal from one neuron to the next, which amounts to finding correlations between adjacent words starting from a single word. This approach is particularly effective for images, because adjacent colors and textures in an image tend to be very similar, and this gave rise to the computer vision (CV) field of artificial intelligence.

But in language, related words are not necessarily adjacent; they have to be understood in context, and something may even be foreshadowed several chapters earlier, so you need a complete memory and interpretation of the context. The key is how to give machines this contextual understanding.

The Transformer architecture provides a very good idea: since understanding, wisdom, and experience in our lives are connected to many things, keep expanding the parameter count into the range of billions and beyond. That way, more correlations can be found across a wide variety of training data.
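
A toy sketch of that idea, assuming a plain scaled dot-product self-attention layer (the sequence length, dimensions, and weights below are arbitrary illustrations, not GPT's actual configuration): every word is compared with every other word in the sequence, so correlations are not limited to adjacent words.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of word vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])     # similarity of every word pair
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ V                           # each output mixes the whole context

rng = np.random.default_rng(0)
seq_len, d = 6, 16                               # 6 "words", 16-dim embeddings (toy size)
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)                                 # (6, 16): one context-aware vector per word
```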

Early Transformer models did not show striking capabilities; they were essentially just statistical correlation. At GPT-1, with roughly 100 million parameters, nobody felt it was remarkable. At GPT-2, with about 1.5 billion parameters, there was still no particularly big breakthrough. With GPT-3 and 3.5, people found a fundamental change: when the model expands to hundreds of billions of parameters, you can imagine how much knowledge is correlated inside it, and its accuracy takes a leap at that point. So it was only after GPT-3.0 and 3.5 came out that the industry really began to pay attention.

The fundamental difference between 3.5 and 3.0 is that 3.5 changed the human-computer interaction interface, which is a very big breakthrough: InstructGPT aligns the language model better with human intentions and preferences by fine-tuning it with a feedback mechanism. And the essence of 4.0 is multimodality.

Jianzhi Research: How is multimodality achieved?

Ding Qi: Whether in Chinese or English, there are tens of thousands of common words, and each can be encoded numerically in a matrix. An image is formed from pixels; each pixel is a blend of three colors and is therefore a small array of three numbers, and stitching these arrays together forms a large matrix. That is essentially what an image is.

Speech is a sound wave; sample it and it also becomes a set of numbers. Video, in turn, is just a stack of images. So essentially, text, speech, images, and video can all be abstracted into sets of vectors. GPT essentially takes a vector as input and, through learned correlations, outputs another set of vectors, which are then converted into images, speech, or video; the process is essentially the same, and the difference lies in the computing resources consumed. So why is multimodality possible? Because everything can essentially be abstracted into a set of matrices, and that is the underlying reason the model can become multimodal.
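
A minimal sketch of this point, with arbitrary toy shapes and values (nothing here reflects GPT-4's actual representations): every modality reduces to arrays of numbers.

```python
import numpy as np

# Text: each word (token) maps to an ID, and each ID to an embedding vector.
vocab = {"the": 0, "cat": 1, "sat": 2}
embedding_table = np.random.randn(len(vocab), 8)       # 8-dim embeddings (toy size)
text_vectors = embedding_table[[vocab[w] for w in ["the", "cat", "sat"]]]  # shape (3, 8)

# Image: height x width x 3 color channels; each pixel is a small array of 3 numbers.
image = np.zeros((64, 64, 3), dtype=np.uint8)          # a 64x64 RGB image

# Audio: a waveform sampled at discrete time steps becomes a 1-D array.
sample_rate = 16_000
t = np.linspace(0, 1, sample_rate)
audio = np.sin(2 * np.pi * 440 * t)                    # one second of a 440 Hz tone

# Video: a stack of images over time.
video = np.zeros((30, 64, 64, 3), dtype=np.uint8)      # 30 frames

print(text_vectors.shape, image.shape, audio.shape, video.shape)
```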

In fact, whether we are building Transformers or GPT, the fundamental goal is an artificial general intelligence (AGI) that can be used in different situations. In the past, it was all special-purpose artificial intelligence, such as face recognition, license plate recognition, or industrial inspection, each confined to a specialized field. Human-computer interaction must be multimodal, which is why 4.0 is so encouraging to the industry: it means we are one step closer to AGI.

Jianzhi Research: What possibilities does the use of GPT-4 open up?

Ding Qi: Search will certainly be the first scenario to land; the new Bing was the first to integrate it. The second is office software, such as Office 365; email and video conferencing are also content-generation scenarios. In addition, service robots and intelligent customer service are very good application fields.

There are actually two revolutions underway. One is the energy revolution, based on lithium batteries: the shift from fossil fuels to lithium-battery-based energy. The other is the artificial general intelligence (AGI) revolution represented by ChatGPT; once more sensitive mechanical feedback is available, robots will be its biggest application scenario.

Digital humans will certainly be deployed before robots, because robots face more constraints, including battery life and joint degrees of freedom, so they are much harder to bring to market. Digital humans living in the digital world may arrive much faster, as hosts, live streamers, virtual idols, and so on.

In addition, the impact on the game industry is also large, especially the ability to directly reduce game development costs. AI-assisted art can greatly improve the productivity of game creators.

In the future, AI will be able to take over many simple jobs, so human creativity and human thinking will become particularly important. We therefore expect demand for content creators to increase sharply.

Jianzhi Research: In the AI industry, will hardware iterate faster, or will the application side develop faster?

Ding Qi: I think there are two stages. In the initial stage, applications will certainly move faster; many overseas companies have already integrated via the API. In China there is also Baidu's Wenxin, which will likewise open APIs for many applications to connect to later.

The development paths at home and abroad now look very similar: one or two companies build general-purpose large models, and once applications connect through the API, the capability extends into upper-layer application software, which can greatly improve efficiency. Office 365 is a prime example.
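
As a rough illustration of what "connecting through the API" means for an application, here is a minimal sketch using the openai Python package's chat-completion interface in its v0.x form (the model name, key, and prompts are placeholders):

```python
import openai

# Minimal sketch: an application calls a hosted large model through the API
# and builds its own feature (here, drafting slides) on top of the response.
openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are an assistant embedded in an office app."},
        {"role": "user", "content": "Draft a three-slide outline about Q1 sales."},
    ],
)
print(response["choices"][0]["message"]["content"])
```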

Hardware iteration depends on several conditions. Cloud hardware is represented by GPU servers, where there is still a gap between China and NVIDIA that cannot be closed quickly in the short term. Beyond the cloud, we believe there will be intelligent hardware on the device side in the future, but the computing power and memory of such devices certainly cannot support a large model with hundreds of billions of parameters.

And for some applications to become intelligent, large models also need to be pruned. It is possible to narrow the parameter range and target a specific application in a specific domain, going from a general large model to a specialized model for a dedicated field, so that intelligent hardware on the edge can also run it.
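
As a minimal sketch of what pruning can look like in practice, here is magnitude pruning with PyTorch's built-in utilities (a generic toy model, not anything GPT-specific; the 50% sparsity level is arbitrary):

```python
import torch
import torch.nn.utils.prune as prune

# A tiny stand-in model; real edge deployments would start from a trained network.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
)

# Zero out the 50% of weights with the smallest magnitude in each Linear layer.
for module in model:
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

zeros = sum((p == 0).sum().item() for p in model.parameters())
total = sum(p.numel() for p in model.parameters())
print(f"{zeros / total:.0%} of parameters are now zero")
```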

We think the first thing to take off will be software in the cloud, delivered to everyone in a SaaS model. The second is hardware in the cloud, because for cloud hardware there is already a benchmark to follow, and everyone just has to work steadily along that path.

For now OpenAI has not given an answer on edge hardware, and people's energy is not focused there yet, but we believe this hardware will become intelligent in the future. So the path will certainly be software before hardware: first the cloud, then the back end, and finally the edge.

This article is from Wall Street News.
