
Alibaba enters the era of large models, and the core is computing power and ecosystem

Produced by Tiger Sniff Technology Group

Author|Qi Jian

Editor|Ivan Chen

Cover image|Alibaba Cloud

The wave of AI large models seems to be pulling all Internet vendors onto the same starting line.

"In the face of the AI era, all products are worth redoing with a large model," said Daniel Zhang, Chairman and CEO of Alibaba Group and CEO of Alibaba Cloud Intelligence Group, at the 2023 Alibaba Cloud Summit on April 11.

At the summit, Daniel Zhang announced that all Alibaba products will be integrated with the large model and fully upgraded.

Such a move means that within Alibaba Cloud, the AI large model will function more like an application-oriented platform, which Cang Jian, chief researcher of the Beijing Software and Information Service Industry Association, compared to a "super app". What Alibaba wants to build is the foundation of such an app, with the cloud business serving as the base of this ecosystem.

"Alibaba Cloud is very fortunate that we caught the boom of China's Internet industry over the past ten years," Daniel Zhang said. According to the latest 2022 China cloud market data released by third-party research firm Canalys, Alibaba Cloud held 36% of China's cloud market in 2022, ranking first; but although its performance grew steadily, the growth rate continued to slow. In emerging segments of the cloud business, compared with "other clouds" that have grown rapidly over the past two years, Alibaba Cloud's keyword looks more like "defending its position".

And just as Alibaba Cloud was moving slowly, ChatGPT fell from the sky.

In this wave of AI breakthroughs, ChatGPT was born in the cloud, and Azure performed well throughout ChatGPT's training and operation. AI large model capability has become the focus of global cloud vendors, and one of the core competitive advantages of the future cloud business.

For cloud vendors, regardless of how they fared early on, this AI boom is a genuine opportunity to overtake by changing lanes, and model capabilities, infrastructure, and the developer ecosystem may determine their future.

Computing power remains the focus

Computing power, algorithms, and data are the three elements of AI large model development. In the competition over large models, mainstream cloud vendors clearly have the edge in computing power, but developing an innovative technology inevitably raises complex problems, and sometimes an advantage is also a challenge.

"Alibaba Cloud's core task in the future is to do two things: first, make computing power more inclusive; second, make AI more accessible," Daniel Zhang said, arguing that cloud computing is the best way to popularize AI at scale. "We hope the cost of training a model on Alibaba Cloud can be reduced to one-tenth or even one-hundredth of what it is today, so that even small and medium-sized enterprises can obtain AI large model capabilities and services through the cloud platform."

According to Alibaba Cloud CTO Zhou Jingren, over the past decade the cost of computing power provided by Alibaba Cloud has dropped by 80%, and the cost of storage by nearly 90%. In 2023, Alibaba Cloud will launch a computing power product closer to the ultimate form of cloud computing, named Universal Instance, which further shields users from traditional IT hardware parameters, turning the data center into a true supercomputer and providing inclusive computing power for small and medium-sized enterprises and developers. The price of Universal Instances has been cut significantly, up to 40% lower than the previous generation of instances.

Price cuts and inclusiveness are indeed effective ways to promote cloud services and popularize AI, but can inclusive computing power meet the R&D needs of large models?

Developing AI large models demands enormous computing power, and the strength of that computing power depends on multiple conditions: hardware performance, hardware quantity, system and network design, software optimization, algorithm efficiency, and energy supply and heat dissipation.

OpenAI's public information shows that the GPT-3 model was developed on NVIDIA A100 GPUs. At present, the A100 stockpiles held by domestic computing power service providers do not look optimistic.

"AI training and operation require computing power. Whether for a traditional AI model or a pre-trained large model, computing power is definitely the core advantage of cloud vendors," Cang Jian told Tiger Sniff. GPU chips are an important factor affecting the training and computing power of AI large models. The chip shortage among domestic service providers is not yet very obvious, because from an operations and development perspective, domestic manufacturers maintain long-term reserves of computing power.

In addition, for cloud vendors, the chip process requirements of servers are lower than those of mobile phones, mainly in terms of size and energy consumption, and some domestically developed chips can already meet 60%-70% of AI large model R&D needs.

However, although AI large models can be developed without high-end GPUs, training effectiveness and efficiency will inevitably suffer. If GPU memory is insufficient, it becomes necessary to restructure the model, use model parallelism, or reduce the batch size to fit within memory limits, which may affect model performance and training stability.
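The batch-size workaround mentioned above can be illustrated with gradient accumulation: processing one large batch as several small micro-batches and summing their weighted gradients reproduces the full-batch gradient, trading memory for extra steps. A minimal NumPy sketch on a linear least-squares model (the data and sizes here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))   # 64 examples, 8 features
y = rng.normal(size=64)
w = rng.normal(size=8)

def grad(Xb, yb, w):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)^2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Full-batch gradient: needs all 64 examples in memory at once.
g_full = grad(X, y, w)

# Accumulate over 8 micro-batches of 8 examples each, weighting
# each micro-batch gradient by its share of the full batch.
g_acc = np.zeros_like(w)
micro = 8
for i in range(0, len(y), micro):
    g_acc += grad(X[i:i+micro], y[i:i+micro], w) * (micro / len(y))

# The accumulated gradient matches the full-batch gradient.
assert np.allclose(g_full, g_acc)
```

In a real training loop the same idea means calling the backward pass several times before one optimizer step, so a memory-limited GPU can emulate a large effective batch size.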

"For large model R&D, high-end GPU chips mean greater memory capacity, which is friendlier to training AI large models on massive amounts of data. Without sufficiently advanced GPUs, however, one has to scale out the GPU cluster and train the large model through distributed training and similar means," said Jiang Linquan, a researcher at Alibaba Cloud and head of Alibaba Cloud's official website.

However, to scale out distributed training across GPU clusters, cloud vendors must ensure high-speed communication and synchronization between nodes while building those clusters, which itself poses a certain technical threshold.
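The synchronization step in question is typically an all-reduce: after each training step, the gradient held by every worker is averaged so that all model replicas stay identical. A toy Python sketch of the idea, with made-up worker gradients standing in for what real clusters exchange over NCCL and high-speed interconnects:

```python
import numpy as np

def all_reduce_mean(grads):
    """Average a list of per-worker gradient arrays; every worker
    receives the same averaged result, keeping replicas in sync."""
    total = np.sum(grads, axis=0)
    return [total / len(grads) for _ in grads]

# Three "workers", each holding the gradient from its own data shard.
workers = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 0.0])]
synced = all_reduce_mean(workers)
# Every worker now holds the same averaged gradient: [3.0, 2.0]
```

The technical threshold the text refers to is that in production this exchange must overlap with computation and run over fast links, since a slow all-reduce leaves expensive GPUs idle.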

Alibaba Cloud has not disclosed which chip its large model uses. At present, most domestic large model training runs on NVIDIA's A100.

Although cloud providers' inventories can meet demand for now, as AI large models develop rapidly, the computing power gap may grow geometrically, and as AI and chip technology iterate, domestic manufacturers' "inventory" may soon fall short. Simply wiring low-end graphics cards in parallel is hard-pressed to meet more advanced R&D needs, and quickly runs into energy consumption and cost problems. How to make the economics of the future computing power market work, and how to develop chips in-house, are all problems now facing cloud vendors.

The ecosystem is the key to success

Beyond computing power, the ecosystem is a battlefield for large models, and major vendors are racing to stake out territory.

At the Alibaba Cloud Summit, Zhou Jingren officially announced Alibaba Cloud's large language model product, Tongyi Qianwen.

Although Tongyi Qianwen opened an invitation-only test a few days earlier, applications for invitation codes were open for only half a day, and most applicants do not seem to have received one. Zhou Jingren said the current test of Tongyi Qianwen mainly targets selected enterprise users.

The capabilities shown at the summit were richer than the current invited version: beyond the dialogue functions of a large language model (LLM), multi-turn interaction, and complex instruction understanding, multimodal fusion similar to GPT-4's image-understanding capability was also mentioned, as well as support for external enhancement APIs.

Like Alibaba Cloud at its founding, Alibaba's AI large model capability will take serving "its own people" as the first step. Daniel Zhang said that building a new AI open ecosystem must start from within Alibaba.

Take DingTalk as an example: in Zhou Jingren's demo, DingTalk gains nearly 10 new AI functions after connecting to Tongyi Qianwen, and users can summon the AI anytime with shortcut keys, opening up a new way of working. In DingTalk documents, Tongyi Qianwen can write poems and stories, draft emails, generate marketing plans, and otherwise assist office work. In DingTalk meetings, it can generate meeting minutes at any time, automatically summarize them, and produce to-do lists. Tongyi Qianwen can also automatically summarize the key points of unread group chat messages.


One trend is that AI capability will become a hard requirement for SaaS software. "Someone once said that with domestic SaaS competition deadlocked, you could try going overseas. But now I'm afraid that's off the table: in China you face DingTalk and similar products, while overseas you may have to face a team armed with GPT-4." A senior executive of a domestic collaborative office software company told Tiger Sniff that in the short term, AI functions in SaaS and collaborative office software may have to wait, since the cost is what it is; but if Microsoft and Google compete regardless of cost, the good days of domestic vendors may come to an end.

"The AI large model may be more like a super app such as WeChat or Alipay: an application-oriented platform," Cang Jian believes. Domestic manufacturers are unlikely to share their own data, so a shared general AI model is impossible, let alone relying on another manufacturer's large model.

Ecosystem competition will become one of the keys to the success of each vendor's AI model. "For enterprises whose main business is large models, the main customers or partners should be industry enterprises with weak AI capabilities. By joining a large model's ecosystem and binding themselves to an important service provider, they gain the empowerment of AI large models," Cang Jian said.

To win users and bring enterprises into one's own ecosystem, preferential pricing alone is not enough. For enterprises and users, whatever form digital and intelligent transformation takes, the purpose is nothing more than "reducing costs, improving quality, and increasing efficiency", and cloud technology has long sought scenarios in enterprises' businesses where all three can be achieved. Today, however, any generative AI large model looking for such a scenario must first confront two major issues: "reducing costs" and "improving stability". That is true for ChatGPT and GPT-4, and equally true for Tongyi Qianwen.
