Huawei doesn't want to follow the old path of ChatGPT

Author: City boundary observation

After the hustle and bustle of the first half of the year, the large-model boom set off by ChatGPT is entering its first cooling-off period. In June this year, monthly visits to ChatGPT fell for the first time, and the new version of Microsoft Bing, which integrated ChatGPT's chat function at the beginning of the year, has recently seen its market share decline to a level even lower than before the revamp.

Everything indicates that the moment for simply watching the spectacle on the large-model track has passed, and the market is now demanding more practical value from large models. Compared with the collective carnival in the consumer (C-end) market, attention has shifted to how large models can be put to work in enterprise (B-end) industries, and a large model that can only chat no longer meets demand.

A new competitive landscape has emerged. At the recent artificial intelligence conference, a number of large models focused on different industries and scenarios made their debut. Huawei, Tencent, Alibaba, iFLYTEK and others are all trying to commercialize large models. Compared with the earlier selling point of "writing poetry and painting", the focus is now on how large models can help users solve practical problems.

It can be said that the mainstream large-model players are finally getting ready to do practical work.

As the earliest technology giant in China to invest in large-model research and development, Huawei launched Pangu Large Model 1.0 as early as 2021. During this year's industry boom, however, it had not presented a product of its own. It was not until the World Artificial Intelligence Conference on July 6 that Ken Hu, Huawei's rotating chairman, officially announced the release of Pangu 3.0 and said that the key to the future development of artificial intelligence is to "go deeper and deeper" and empower industrial upgrading. On July 7, at the HUAWEI CLOUD Developer Conference (HDC2023), HUAWEI CLOUD officially released version 3.0 of the Pangu model.

Unlike ChatGPT, Pangu 3.0 is not a large model focused on chat. Huawei even said that the Pangu model will not be open to individual users for some time, because that is not the main direction of the product. Although Huawei did not disclose how long this period will last, the statement at least confirms that chat is not the focus of Pangu's development.

"We have never compared ourselves with ChatGPT. We have not called it Pangu Chat, nor Chat Pangu. We have no time to chat," Zhang Pingan, Executive Director of Huawei and CEO of HUAWEI CLOUD, said at a media briefing on July 7.

According to Huawei, Pangu 3.0 is not a single large model but a general term for a family of large models and an engineering application platform. It is divided into three levels: general-purpose large models at the bottom layer (L0), industry large models at the second layer (L1), and models for specific sub-scenarios at the third layer (L2).

While the rest of the large-model track has been competing over who writes better poetry and paints better pictures, Pangu 3.0 has chosen a different path. Its focus is not only on iterating general capabilities but also on developing specialized capabilities to meet the varied needs of different industries and scenarios.

Huawei has clearly recognized that for large models to be truly deployed, they must solve actual needs. To survive, large models must have highly specialized, practical capabilities in different industries and scenarios.

Not trying to be the next ChatGPT

What kind of large model do we really need? When ChatGPT became popular around the world at an unexpected speed, even the top tech giants may not have worked out the answer to this question. Microsoft rushed to radically revamp its search engine Bing, and the results turned out to be less than ideal.

But everyone believes that once generative AI passes a certain threshold, it will completely reshape the way the world produces things, perhaps as profoundly as the last information revolution brought about by computers.

In this large-model melee, Huawei, the latest domestic technology giant to enter the track, chose the To B market, where it is strongest. After the turbulence of the first half of the year, the whole large-model track has gradually realized that, however lively the To C market is, large models must ultimately go deep and be solid in order to succeed commercially.

"Huawei's big model doesn't write poetry, it only does things," Zhang Pingan, Executive Director of Huawei and CEO of HUAWEI CLOUD, said when Pangu 3.0 was officially released at the HUAWEI CLOUD Developer Conference on July 7.

As the earliest vendor in China to invest in large-model research and development, Huawei launched the Pangu 1.0 model as early as 2021, and the official release of Pangu 3.0 is a major upgrade over Pangu 1.0. Like the upgrade from GPT-3 to GPT-4, Pangu 3.0 is a revolutionary iteration, but it takes a completely different path from ChatGPT.

Three years to sharpen a sword. In the past two or three years, the Pangu model has undergone significant upgrades in architecture and training methods.

In terms of architecture, Pangu 3.0 pioneered a three-layer design. The bottom layer is a series of general large models covering CV (computer vision), NLP (Chinese language), multi-modality, predictive decision-making, scientific computing, and search and recommendation. The second layer consists of industry large models for mining, meteorology, drug molecules, electric power, finance and other sectors. The third layer consists of scenario models that solve specific problems and are highly customized.

In terms of training methods, Pangu 3.0 has also moved to a set of training modes that run from general to specialized. It uses the pre-training methods common in the industry to build the general capabilities of the large models. Targeted training is then added: the models can be fine-tuned on SFT (supervised fine-tuning) data to meet the needs of different industries, and RLHF (reinforcement learning from human feedback) training can reinforce the models based on customer annotations and feedback.

In addition, as what Huawei calls the industry's first fully hierarchically decoupled family of large models, Pangu 3.0 keeps its different capabilities separate rather than packaging them into one giant model as ChatGPT does, so that users can access them on demand.

In layman's terms, the capabilities of Pangu 3.0 can operate independently without interfering with one another. Customers in different industries have different needs: the railway industry may mainly need visual models, for example, while the meteorological industry may mainly need scientific computing. The hierarchically decoupled design suits these customized needs.

"The hierarchical decoupling model builds a good business model for large models, so that industry customers can take what they need, like picking out medicine," Zhang Pingan said in a group interview with the media on July 7.

Relying on the new three-layer architecture and its layered decoupling, the core positioning of the Pangu model is to empower all walks of life. Huawei itself focuses mainly on the L0 level and on the general-knowledge part of the L1 level.
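To make the idea of layered decoupling concrete, here is a minimal Python sketch of a catalog in which capabilities are registered per layer (L0, L1, L2) and a customer picks only the ones it needs. This is an illustration only, not Huawei's implementation: the class `ModelCatalog`, its methods, and the capability names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelCatalog:
    """Hypothetical catalog: each layer holds independently usable capabilities."""
    levels: dict = field(
        default_factory=lambda: {"L0": set(), "L1": set(), "L2": set()}
    )

    def register(self, level: str, name: str) -> None:
        # Capabilities are registered under exactly one of the three layers.
        if level not in self.levels:
            raise ValueError(f"unknown level: {level}")
        self.levels[level].add(name)

    def select(self, wanted: list) -> dict:
        """Return only the requested capabilities, grouped by layer."""
        known = {name for names in self.levels.values() for name in names}
        missing = set(wanted) - known
        if missing:
            raise KeyError(f"not in catalog: {sorted(missing)}")
        chosen = {}
        for level, names in self.levels.items():
            picked = sorted(names & set(wanted))
            if picked:
                chosen[level] = picked
        return chosen


# Illustrative (made-up) capability names, loosely following the article.
catalog = ModelCatalog()
for cap in ("cv", "nlp", "multimodal", "scientific_computing"):
    catalog.register("L0", cap)
for cap in ("mining", "meteorology", "finance"):
    catalog.register("L1", cap)
catalog.register("L2", "freight_train_fault_detection")

# A railway customer takes the vision model and one scenario model, nothing else.
railway = catalog.select(["cv", "freight_train_fault_detection"])
# A weather customer takes scientific computing and the meteorology model.
weather = catalog.select(["scientific_computing", "meteorology"])
```

The point of the sketch is only that each selection is independent: the railway customer never loads the NLP or finance models, which is the "take what you need" behavior described above.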

Of course, whatever innovations are made in architecture and capabilities, the core competitiveness of a large model ultimately shows in its training results, which depend on huge amounts of data and computing power.

In terms of data, the pre-training data of the Pangu large model contains more than 3 trillion tokens, drawn from more than 1,000 TB of training data, and the instruction fine-tuning data also numbers in the tens of millions. Compared with other general large models, the Pangu industry models are also trained on a large amount of publicly available industry data and data authorized by industry customers. This targeted industry training further improves Pangu 3.0's ability to solve industry problems.

In terms of computing power, for well-known reasons Huawei cannot use general-purpose GPU architectures, so it has had to build its own framework and platform. According to Zhang Pingan, the Pangu model's computing power comes from Huawei's Ascend AI computing cluster, at the core of which are Ascend chips whose Da Vinci architecture is adapted to neural network computation. According to Huawei, the training efficiency of Pangu 3.0 on the Ascend AI cluster is 1.1 times that of the GPU architecture.

To further expand its computing power, Zhang Pingan announced at the conference on July 7 that an Ascend AI cloud service with 2,000 PFLOPS of computing power in a single cluster would go live simultaneously at HUAWEI CLOUD's Ulanqab and Gui'an AI computing centers. Against the GPU architecture that dominates worldwide, the Ascend AI cluster hopes to become the other pole of domestic AI computing power.

Thanks to these innovations, the Pangu models lead the industry in a number of capabilities. The Pangu NLP model is the industry's first Chinese model with 100 billion parameters and has strong text understanding and generation capabilities. The CV model is the first to combine image discrimination and generation capabilities, and it reaches the industry's highest accuracy in few-shot classification on the ImageNet 1% and 10% datasets.

Since June, many people in the AI industry have admitted that when ChatGPT first took off at the beginning of the year, there was a certain amount of hype across the industry. When every company rushes to release the same kind of chat model, it suggests that this direction may be going astray.

In February this year, Ren Zhengfei said at the "Difficult Problem Unveiling" Spark Award symposium that AI large models will surge in the future and that Microsoft will not be the only player. The direct contribution of AI software platform companies to human society may be less than 2%, he said, while the other 98% will come from advancing industrial and agricultural society.

Now even OpenAI is considering entering the industry market, which strongly suggests that industrialization may be the only way forward for the whole sector. How to further transform industrial and agricultural society is a question all large-model players need to think about together.

Going deep into industries, landing in real scenarios

As players gradually realize that industrialization is becoming the focus of future large-model competition, depth in individual industries will become the key to victory. Whoever can grasp the real needs of an industry and effectively solve its problems will be the first to make the large-model business model work.

As the world's largest communications equipment manufacturer, Huawei has decades of experience in the government and enterprise market and a strong advantage in industry depth. In the past two years, Huawei has set up 20 industry corps and sent them down into mines and coal shafts in order to penetrate industries further and serve government and enterprise customers more deeply.

In a media interview on July 7, Zhang Pingan, CEO of HUAWEI CLOUD, said that Huawei's biggest advantage is the depth of its industry business: wherever an industry has problems, Huawei can send scientists and mathematicians.

"Our scientists and mathematicians can go down into the coal mine, and can stay on the workshop floor for a month, or three months. We dare to go down into the fields; others may not be able to, or may not be willing to. This is Huawei's most important advantage in large models," Zhang Pingan said.

In practice, many of the industries that these corps had already cultivated deeply have indeed become the first areas to use the Pangu industry models.

For example, in the government affairs market, where Huawei is strong, the Pangu government affairs model has acquired rich industry knowledge of laws, regulations, and procedures through fine-tuning on more than 200,000 pieces of government affairs data, including 12345 hotline records, policy documents, and government affairs encyclopedias. In a deployment at the Shenzhen Futian District Government Service Data Management Bureau, Xiaofu, a Futian government intelligent assistant trained on the Pangu government affairs model, can accurately understand what residents are asking about.

In the financial field, the Pangu financial model is pre-trained on banks' operating procedures, policies, and case documents, and can automatically generate workflows and operating instructions for counter staff based on a customer's question. This reduces what used to take an average of five operations to one and shortens handling time by more than five minutes.

In the field of meteorology, the Pangu meteorological model is the first AI forecasting model to be more accurate than traditional numerical forecasting methods, and its forecasting speed is also greatly improved. Predicting the path of a typhoon over the next 10 days used to require five hours of simulation on a high-performance computing cluster of 3,000 servers.

Just before the HUAWEI CLOUD Developer Conference, on July 6, Nature published the research of the HUAWEI CLOUD Pangu model R&D team, "Accurate medium-range global weather forecasting with 3D neural networks". The paper shows that the Pangu meteorological model overcomes the long-standing problem that AI weather forecasts were less accurate than traditional numerical forecasts. It is the first AI model to be more accurate than traditional numerical forecasting methods, and it is more than 10,000 times faster.

In addition, Pangu has launched dedicated industry models for coal mining, railways, drug research and development, and other sectors to help them improve efficiency. Huawei said that the goal of the Pangu model is to give every industry and every person their own "expert assistant".

"We have always adhered to the strategy of AI for Industries and continued to move forward on the road of deep cultivation of the industry. I firmly believe that big models will reshape industries, and every developer will be a hero who will change the world," Zhang Pingan, CEO of HUAWEI CLOUD, said.

On top of the industry models sit the more finely divided, more specific L2 scenario models, which are designed to solve particular problems and which Huawei describes as working "out of the box". At present, the Pangu model has been applied in more than 100 practical scenarios, lowering the threshold for AI development and saving more than 80% of R&D costs on average.

For example, at State Grid Chongqing Power Supply Company, the Pangu CV model was pre-trained on massive unlabeled power data, fine-tuned on a small amount of data, and then applied successfully to intelligent power-line inspection, where it has largely replaced the traditional AI model for UAV inspection. In data labeling, the new model improves sample screening efficiency about 30-fold and screening quality about 5-fold. Taking Yongchuan's collection of 50,000 high-definition images per day as an example, this can save 170 in manual labeling time per day.

In judicial case retrieval, the Pangu NLP model has been fine-tuned and optimized for several of the industry's difficult problems, and a new prediction function was even designed for it. In the end, it ranked first in the CAIL (Challenge of AI in Law) competition, China's evaluation of legal AI technology, with an NDCG@30 score of 0.943.
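For readers unfamiliar with the metric, NDCG@k (normalized discounted cumulative gain over the top k results) measures how close a ranked result list is to the ideal ordering, with 1.0 meaning a perfect ranking. The Python sketch below uses the common linear-gain, log2-discount form. It is an assumption that this matches the exact variant used by CAIL, and the relevance labels in the example are made up.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results.

    relevances[i] is the graded relevance of the result at rank i + 1;
    each gain is discounted by log2(rank + 1).
    """
    return sum(
        rel / math.log2(position + 2)
        for position, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal_dcg


# Made-up relevance labels for cases returned by a retrieval system, in rank order.
perfect_ranking = [3, 2, 1, 0]
reversed_ranking = [0, 1, 2, 3]

perfect_score = ndcg_at_k(perfect_ranking, k=30)
reversed_score = ndcg_at_k(reversed_ranking, k=30)
```

With these labels the perfectly ordered list scores 1.0 and the reversed list scores lower, so a result of 0.943 at k = 30 indicates rankings that are close to ideal within the top 30 retrieved cases.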

In the intelligent inspection of freight trains, the Pangu model detects operating faults in freight cars and can accurately identify 442 kinds of faults. The discovery rate for major fault categories reached 99.99%, the discovery rate for general faults exceeded 98%, and the overall discovery rate reached 99.8%, exceeding customer expectations.

In Huawei's view, the large model of the future will be a composite that runs from general to specialized. The large language model acts as a central brain, providing a general solution for natural language processing: it understands human intentions and serves an industry by calling the various specialized industry models. The scenario-focused tuned models, in turn, act as small specific functions, each built to solve a particular problem.

There is no doubt that this year's explosion of the large-model track will become a technological revolution that reshapes industrial society and completely changes many industries. As a business, Huawei needs not only to do research and engineering but also to explore new business models that ensure the commercial success of large models.

Today the Pangu model is divided into three layers, from L0 to L2. Because the layers are fully decoupled, they can be split and combined according to the needs of different customers, as Huawei further explores the boundaries of large-model commercialization.

Huawei's road to big models

Ever since domestic large models began to appear a few months ago, the industry has been waiting for Huawei's. As a leading player in the domestic AI industry, Huawei has long been regarded as one of the companies with the deepest expertise in artificial intelligence, and the industry has been watching closely to see which path it would take.

Although Pangu 3.0 debuted later than other mainstream players, if you look back, Huawei's research on large models has a long history.

According to Huawei, as early as 2020 it judged that the AI industry would have two main directions of development: the trend from small models to large models, and the combination of AI with traditional scientific computing, that is, AI for Science. At that time, Huawei proposed six sub-topics on data, models, and knowledge. Two of them, the "model height" plan and the "pre-vision of everything" plan, are closely related to large models, and Huawei had been pursuing this direction before the launch of GPT-3.

But the push that OpenAI's GPT models gave the industry cannot be ignored, especially the launch of GPT-3 in 2020, which made the whole industry notice the accelerating rise of large models. Huawei began research on large NLP and CV models in the summer of 2020, and gradually set up further projects in multi-modality, personalized computing, and predictive decision-making.

Besides entering the game early, Huawei has also built up deep talent in large models. According to Huawei, more than 50% of the Pangu model team hold doctorates, many are recruits from its "Genius Youth" program, and the team's average age is under 30. Such a young, technically strong team that dares to innovate is the most solid talent guarantee behind the Pangu model.

The arrival of Pangu 3.0, and the choice to take root in industry, means that Huawei has finally taken the most important step along the large-model path it has chosen. As for what comes next, Huawei, like the rest of the industry, is still feeling its way.

In a media interview on July 7, Zhang Pingan, CEO of HUAWEI CLOUD, said that Huawei has drawn up a very aggressive roadmap for the next stage of the Pangu model. "We are now all in on the Pangu model, and the roadmap is densely packed," he said.

In Zhang Pingan's view, what matters for the future Pangu model is not how large its parameter count is but how deeply it penetrates vertical industries. Beyond railways, coal mining, finance, government, and the other industries it has already entered, many more industries need large models of their own.

"The most valuable parameter of the Pangu model in the future is not 500 billion or trillions. It is about which industries have been deepened and which new industries have been expanded," Zhang Pingan said.

Although everyone is now moving toward industrialization, Huawei differs from other players. Thanks to its long-term investment in a computing power base and development framework, the Pangu model has another major advantage: full-stack research and development capability.

Since the large-model track caught fire this year, the huge demand for training compute has made NVIDIA GPUs hard to come by. The entire domestic large-model track now faces a shortage of computing power, and NVIDIA GPUs are likely to face supply constraints in the future, which requires domestic large models to have full-stack, independent research and development capabilities.

According to Zhang Yuxin, CTO of HUAWEI CLOUD, the Pangu model is independently developed from computing power through operators, frameworks, and development platforms, and does not use open source technology. Full-stack independent development is possible mainly because of Huawei's earlier work on root technologies such as the AI base, computing power, and chips.

With the Ascend AI foundation, the MindSpore computing framework, and the ModelArts training platform, Huawei can optimize more deeply for different industries and scenarios as it scales its models, going one step further than other players.

As Ken Hu, Huawei's rotating chairman, said on July 6, Huawei has two main focuses in developing artificial intelligence: first, building a strong computing power base and getting the industry's infrastructure right; second, serving thousands of industries with models that range from general large models to industry large models.

When Pangu 3.0 was released, Pangu also unveiled a new logo. With the sky above and the ground underfoot, it is a simplified symbol of Pangu opening up the world. "In ancient times, Pangu opened the heavens and the earth, and all things were reborn. Today, Pangu is everywhere, and the industry is reshaped," Zhang Pingan said.

Huawei's choice of the name Pangu may signal the burden the model is meant to shoulder: if domestic large models run into a "chokepoint" problem in the future, Pangu will have to hold up the sky alone.

Author | Zeng Guang

Edit | Li Yuan

Operations | Liu Shan
