
The AGI Era: From Technology Paradigm to Business Model

Author: Brother Bird's Notes

Source: Agent's subconscious

AGI is a revolution in productivity. If the large language model is the steam engine, then AGI is the Industrial Revolution. In this new wave of productive forces, technology is the driving engine, and a deep understanding of the technology puts you in a better position to claim a share of the business; just as a racing driver has to understand how the engine responds in order to overtake in the corners.

Let's talk about the technology paradigm first, and then the business model.

1. The real reason behind the scaling law

From all the exams we took growing up, common sense tells us that true/false questions are easy: blind guessing scores half the points. Picking one answer out of four is harder, and one out of ten harder still. The same logic holds in machine learning. In image classification, a ten-class problem is like a one-out-of-ten multiple-choice question, and ImageNet with its 1,000 categories is one out of 1,000. What about a large language model? It picks the most likely token from a vocabulary of more than 100,000 entries, so the number of classes jumps by orders of magnitude. From the viewpoint of probability theory, the larger the softmax output space, the exponentially more training samples are needed to train it fully: the model has to estimate the conditional distribution P(A_i | input text), where A_i ranges over the candidates {A_1, A_2, ..., A_100,000}. As the number of candidates grows, the number of (text, candidate) combinations explodes, and it takes a huge amount of data to estimate these probabilities one by one, rule out the alternatives, and land on the right A_i.
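
To make the choose-one-out-of-100,000 picture concrete, here is a minimal sketch (with made-up shapes, not any particular model) showing that a language model's output head is just a softmax classifier over the vocabulary:

```python
# Minimal sketch (assumed shapes, not any specific model): the last layer of a
# language model is a softmax classifier over the whole vocabulary.
import torch
import torch.nn.functional as F

vocab_size = 100_000   # roughly the vocabulary scale discussed above
d_model = 768          # hypothetical hidden size

# Hidden state at the final position for one input text, as produced by the transformer stack.
hidden = torch.randn(1, d_model)

# The output head maps the hidden state to one logit per vocabulary entry.
lm_head = torch.nn.Linear(d_model, vocab_size, bias=False)
logits = lm_head(hidden)                      # shape: (1, 100_000)

# P(A_i | input text): a categorical distribution over 100,000 candidates.
probs = F.softmax(logits, dim=-1)
next_token = torch.argmax(probs, dim=-1)      # "choose one out of 100,000"
print(probs.shape, next_token.item())
```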

To cover a distribution of this scale, the mathematical structure being expressed is far more complex, so the model needs many parameters (the more parameters, the more structure it can express), and at the same time it needs a massive dataset that actually covers the full distribution, so that it can be trained to convergence. Searching for a way to get there, OpenAI and others found that for a Transformer there is no need to carefully tune the width/depth combination: as long as the total parameter count is roughly the same, the expressive capacity is roughly the same. So finding a suitable architecture is not the hard part. Simply scale up the depth and enlarge the dataset by brute force, and this enormous task, a multiple-choice question with 100,000-plus options, gets done.
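
A rough way to see that observation: the non-embedding parameter count of a decoder-only Transformer is commonly approximated as 12 * n_layer * d_model^2, so very different width/depth splits can land on the same budget. A small sketch under that approximation (the two configurations below are made up):

```python
# Back-of-envelope sketch: non-embedding parameters of a decoder-only transformer
# are roughly 12 * n_layer * d_model**2 (attention + MLP blocks), a commonly used
# approximation. Different width/depth splits with a similar total budget end up
# with similar capacity.
def approx_params(n_layer: int, d_model: int) -> int:
    return 12 * n_layer * d_model ** 2

configs = [
    ("deep & narrow", 48, 1600),
    ("shallow & wide", 24, 2262),   # hypothetical config chosen to match the budget
]
for name, n_layer, d_model in configs:
    print(f"{name:15s} layers={n_layer:3d} width={d_model:5d} "
          f"params≈{approx_params(n_layer, d_model) / 1e9:.2f}B")
```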

2. How far is Sora from a true text2video GPT-4 moment?

Let's make a rough estimate of how large a training set text2video would need. I analyzed this previously in:

SORA Technology 6: In-depth understanding of full-modal video generation by Google VideoPoet

In the classic image classification setting, the ImageNet dataset has 1,000 categories (you can think of the token codebook as having size 1,000) and about 1.28 million images in total, roughly 1,300 images per category. The point of the analogy: it takes about 1,300 examples per codebook entry to estimate that entry's full distribution.

GPT-1's vocabulary size is 40,478 and GPT-2's is 50,257, so the gap is not large; let's assume GPT-4's vocabulary is about 60,000. Its training set is roughly 13 trillion tokens, that is, on the order of 200 million examples per vocabulary entry to estimate the full distribution and reach GPT-4-level quality.

VideoPoet's codebook size is 270,000. An overly large vocabulary also means a huge embedding matrix, which is costly in both storage and compute. So in the short term, video generation cannot reach GPT-4's level simply because the codebook is too large. The analogy runs as follows:

At codebook size 1,000, about 1,300 examples per entry are needed to estimate the full distribution.

At codebook size 60,000, about 200 million examples per entry are needed, roughly 150,000 times the 1,300, while the codebook itself grew only 60-fold. That is, the data requirement expanded about 2,500 times faster than the codebook.

At codebook size 270,000, about 4.5 times the 60,000, how many examples per entry would be needed? Extrapolating at the same rate: roughly 4.5 × 2,500 × 200 million. A dataset of that size is simply not obtainable or computable.
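
Putting the back-of-envelope arithmetic above in one place (these are the rounded assumptions quoted in this section, not measurements):

```python
# The article's back-of-envelope, written out under its own rounded assumptions.
imagenet_codebook, imagenet_per_entry = 1_000, 1_300        # ImageNet: ~1.28M images / 1,000 classes
gpt4_codebook, gpt4_per_entry = 60_000, 200_000_000         # assumed vocab; ~13T tokens / 60k ≈ 2e8

per_entry_growth = gpt4_per_entry / imagenet_per_entry      # ≈ 150,000x more examples per entry
codebook_growth = gpt4_codebook / imagenet_codebook         # 60x larger codebook
expansion_ratio = per_entry_growth / codebook_growth        # ≈ 2,500: data need outruns codebook size

videopoet_codebook = 270_000
codebook_ratio = videopoet_codebook / gpt4_codebook         # ≈ 4.5x the assumed GPT-4 vocab
videopoet_per_entry = codebook_ratio * expansion_ratio * gpt4_per_entry
print(f"≈{videopoet_per_entry:.1e} examples per codebook entry")           # ≈ 2e12
print(f"≈{videopoet_per_entry * videopoet_codebook:.1e} tokens in total")  # multiplying back by the codebook
```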

So it is no exaggeration to say that the demos Sora has released come from a model trained into one small corner of the distribution, a local optimum (or saddle point); it has not covered the full distribution. In other words, Sora can synthesize excellent video only in a limited set of cases; if it were really opened up for the public to test at will, it would fall far short of ChatGPT-level capability.

To solve this, one route is the scaling law, the brute-force way; the other, more fundamental one, is to shrink the codebook. That is a crucial step toward AGI.

3. How hard is deployment? General vs. vertical: two different worlds

No matter how many benchmark leaderboards a general-purpose large model has topped, it is still a laboratory product, because it is trained on public datasets, and public datasets are themselves semantically noisy; it cannot walk into a serious work environment and solve real-world problems. A classification model trained on ImageNet cannot be used directly for defect detection in industrial vision: this spot is normal noise on the CPU, while that pit is a process defect. You have to rebuild a real dataset and retrain a classifier on it.
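
A minimal retraining sketch for that last point, assuming a hypothetical folder of labeled CPU images and reusing an ImageNet-pretrained backbone only as a starting point:

```python
# Minimal sketch (hypothetical paths and class names): reuse an ImageNet-pretrained
# backbone, but retrain the classifier head on a real defect dataset so the model
# learns the factory's own notion of "normal noise" vs. "process defect".
import torch
from torch import nn
from torchvision import datasets, models, transforms

tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Hypothetical folder layout: cpu_defects/train/{normal_noise,process_defect}/*.jpg
train_set = datasets.ImageFolder("cpu_defects/train", transform=tf)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))  # new head for the real classes

opt = torch.optim.AdamW(model.fc.parameters(), lr=1e-3)  # fine-tune only the new head in this sketch
loss_fn = nn.CrossEntropyLoss()
for images, labels in loader:
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()
```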

The general-purpose model is in the same position: it is still far from the last mile of deployment. For example, if you let a general model answer medical questions, I suspect people in the industry would not be comfortable with it. And that is exactly what real business scenarios demand: it is not a small-talk assistant; it has to be held to strict quality standards.

So the first problem a vertical model has to solve is giving convergent, reliable answers. The second is proactively asking questions: a real doctor actively observes and inquires, digging into the patient's condition, and today's large models cannot do that. A vertical model therefore has to be tightly integrated with the business and find its own path.

4. Why do you need to independently train vertical large models?

The base model is the full data distribution over the whole codebook, and public datasets contain plenty of dross. The iFLYTEK learning-machine incident, at its root, happened because the base training set contained a large amount of hostile ideological data. The data consumed during base training defines the full distribution over the codebook; if that base distribution is skewed, the applications built on top of it will, from time to time, produce strange outputs.

So we need to train a vertical-domain base model. How should it be trained?

First, shrink the codebook. If we are building medical consultation, we certainly do not need code tokens; they can be removed. Second, build a moderately sized dataset with enough vertical data: with only a vertical dataset we may fail to cover the full distribution, and with only public data we will not understand the vertical domain well enough, so the mix has to be balanced. Third, keep the model size modest. Building a vertical large model means training an excavator operator from Nanxiang Technical School who drives the machine fast and well, not a Peking University generalist who sits high in the halls of power, worrying about the people and carrying the whole world in his heart.
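
As a sketch of the first point, here is one way to build a compact vertical vocabulary: training a small BPE tokenizer on a medical corpus instead of inheriting a 100,000-plus general-purpose vocabulary. The corpus path and vocabulary size below are assumptions for illustration:

```python
# Minimal sketch (hypothetical corpus path and vocab size): instead of reusing a
# large general-purpose vocabulary full of code and multilingual tokens, train a
# compact BPE tokenizer on the vertical (e.g. medical) corpus only.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=16_000,                       # assumed: far smaller than a general LLM vocabulary
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],
)
tokenizer.train(files=["medical_corpus.txt"], trainer=trainer)   # hypothetical domain corpus
tokenizer.save("medical_bpe.json")

print(tokenizer.get_vocab_size())            # ~16k entries: a much smaller softmax to train fully
```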

5. Fitting LLMs onto a 1080 graphics card: welcoming a hundred flowers of vertical large models

Cost is the key to deployment. First, the model should not be too large, so the cost stays low; second, it has to stand up to high concurrency. Both bring down the cost of landing.

Most important of all, every corner of every industry needs a vertical large model dedicated to its own work. The real large model is not an operating system: it does not need to be big and all-encompassing; it should be small and refined, with precise knowledge of its field and the ability to close the loop on problems.

For example, in the smart car cockpit it can answer the car's operation guide very precisely: say, where the child lock is and how to switch it, which differs from car to car and which a general model cannot answer.

Then there is power consumption. Requiring a 4090 for inference is unreasonable: at nearly 500 W it simply burns too much electricity. Deployment also has to get cheaper. The day an old graphics card like the 1080 can run these models, the industry's spring will have arrived.
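
A quick back-of-envelope on what an old 8 GB card like the 1080 could hold, counting weights only and ignoring activations and the KV cache:

```python
# Back-of-envelope sketch: weight memory at different precisions, to see what
# could in principle fit into an 8 GB card like a GTX 1080 (activations and the
# KV cache are ignored, so real headroom is smaller).
def weight_gb(n_params_billion: float, bits_per_weight: int) -> float:
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1024**3

for size_b in (1.5, 3, 7, 13):
    fp16, int8, int4 = (weight_gb(size_b, b) for b in (16, 8, 4))
    fits = "fits 8 GB" if int4 < 8 else "too big"
    print(f"{size_b:>4}B params: fp16={fp16:5.1f} GB  int8={int8:5.1f} GB  "
          f"int4={int4:4.1f} GB  ({fits} in int4)")
```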

Summary

A real business model has to dig deep into one vertical domain, train its own large model there, and close the loop on algorithm quality. Then drive down deployment cost, so that it can truly become a new productive force that makes money.

At present, text2video does not meet the bar for a commercial product; it is still hard to bring to the ground. The research workload in this direction remains heavy, and there will be no GPT-4-like product in the short term, within a year.

This is just one person's opinion.
