
A cold look at the artificial intelligence large-model fever

Author: New Hunan

At present, large models at home and abroad are running hot in both research and investment, but behind the fever lie problems that deserve attention.

"Oriental Weekly" special writers: Han Jian, Zhong Xinlong, Wang Congcong, and editor Chen Rongxue

On March 24, in Yichang, Hubei Province, a netizen shows images generated by Baidu's "Wenxin Yiyan" (ERNIE Bot) on a mobile phone

At present, artificial intelligence large models (hereinafter "large models") around the world are locked in white-hot competition.

Behind the wave of large-model releases lie hidden problems: underlying technology that remains weak, a governance system in need of optimization, blind trend-following, huge resource consumption, and a development path yet to be clarified.

Accordingly, it is advisable to promote research on and application innovation in the underlying technology of large models, establish and improve supervision mechanisms for them, guide rational investment in the capital market, and strengthen international cooperation and exchanges.

Competition is intense

An AI large model is an AI algorithm that uses big data and neural networks to simulate human thinking and creativity. Leveraging massive amounts of data and deep-learning techniques to understand, generate and predict new content, and often carrying tens of billions or even trillions of parameters, it can exhibit intelligence across different domains and tasks, for example by generating high-quality text, image, audio and video content in a variety of scenarios.
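To make the idea concrete, below is a minimal, illustrative sketch of text generation using the open-source Hugging Face transformers library; the small GPT-2 model and the prompt are stand-ins chosen for illustration, not any of the large models discussed in this article.

```python
# A minimal text-generation sketch using the open-source Hugging Face
# "transformers" library. GPT-2 is a small model chosen purely for
# illustration; it is not one of the large models discussed in this article.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator(
    "Artificial intelligence large models can",  # illustrative prompt
    max_new_tokens=30,
)
print(result[0]["generated_text"])
```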

Compared to traditional artificial intelligence, large models can handle multimodal tasks and are not limited to specific tasks or applications.

Take natural language processing as an example: a large model can adjust its language style to context and emotion, making dialogue more natural and fluent by following the logic and habits of human conversation. At the same time, the huge knowledge reserve built up during pre-training allows it to generate valuable answers that meet users' actual needs.

In addition, the multimodal processing and generation capabilities of large models in audio, video, and pictures are difficult for traditional artificial intelligence to achieve.

At present, competition between domestic and foreign giants in the large-model field has become intense. OpenAI has become the benchmark leading large-model development: following the release of the multimodal large model GPT-4, it is expected to release a more advanced successor, GPT-5, as early as the fourth quarter of this year. Backed by its investment in and cooperation with OpenAI, Microsoft has integrated the technology across its Office products, launching Microsoft 365 Copilot in late March, and on May 24 it announced that Windows 11 would be connected to GPT-4.

On May 10, Microsoft's direct competitor Google launched a new generation of large model, PaLM 2, with more than 25 AI products and features fully connected to it, including the Bard dialogue bot, the Duet AI office assistant and AI-enhanced search. Meta released the large model LLaMA to join the competition, and Amazon has partnered with AI startup Hugging Face, developer of the ChatGPT rival BLOOM.

In addition, start-ups and unicorns built on large-model technology, such as Character.AI, Stability AI and AI21 Labs, have attracted a new round of investment and joined the race.

Domestically, players across industry, investment and research have accelerated their deployments.

First, leading domestic technology enterprises are intensively releasing self-developed large models. Baidu released the Wenxin Yiyan (ERNIE Bot) large model; Alibaba released its first ultra-large-scale language model, Tongyi Qianwen; Tencent's Hunyuan AI large-model team launched the trillion-parameter Chinese NLP pre-training model HunYuan-NLP-1T; and Huawei's Pengcheng-PanGu is the industry's first hundred-billion-parameter NLP model for Chinese generation and understanding.

Second, investors and entrepreneurs are actively entering the competition. Meituan co-founder Wang Huiwen put $50 million into an AI large-model venture; former Sogou CEO Wang Xiaochuan and former Sogou COO Ru Liyun co-founded Baichuan Intelligence; Langboat Technology released its controllable language-generation large model Mengzi (Mencius) MChat; and Westlake Xinchen launched the Xinchen Chat large model.

Third, universities and research institutes are actively entering the field. Fudan University launched MOSS, China's first ChatGPT-like large model; Tsinghua University's Knowledge Engineering Laboratory and its technology-transfer company Zhipu AI released ChatGLM; the Institute of Automation of the Chinese Academy of Sciences launched the multimodal large model Zidong Taichu; and the IDEA Research Institute's CCNL launched the open-source general-purpose large model "Jiang Ziya".

There are shortcomings in technology and governance


First of all, there are still weaknesses at the technical level.

In terms of output quality, large natural-language and image-generation models still produce low-quality and sometimes harmful content. Although the output of natural-language large models reads fluently, it carries risks in interpretability and logic, and models tend to over-explain simple concepts and pad answers with redundancy.

In particular, large models risk complying with harmful requests, for example by laying out a concrete course of action for a criminal act proposed by a user. In addition, looking at the uncertainty of future development, large models that keep expanding their training-parameter scale without limit may face the risk of being overtaken by innovative algorithms.

At present, the industry's mainstream large models are all built on the Transformer, the attention-mechanism architecture proposed in 2017. If a new mechanism or framework emerges that can be trained on small-scale parameter samples, the current practice of hundreds of billions or even trillions of training parameters will no longer hold an advantage.
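For readers unfamiliar with the term, the sketch below illustrates scaled dot-product attention, the core operation of the 2017 Transformer; the shapes and values are toy examples, not drawn from any production model.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, the core Transformer operation.

    Q, K, V: (seq_len, d_k) query, key and value matrices. Each output
    row is a weighted mix of the value rows, with weights given by
    query-key similarity (Vaswani et al., 2017).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

# Toy self-attention over 4 tokens with 8-dimensional representations.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(x, x, x).shape)    # (4, 8)
```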

Second, the governance system needs to be optimized.

At present, the large-model training process lacks supervision, and what supervision exists is aimed mainly at service providers, to some extent neglecting regulation of service users and commercial operators. Service users are constrained only by user agreements and platform rules, while operators lack awareness of their safety responsibilities, easily planting hidden dangers of toxic output.

Service providers and platforms also have blind spots in reviewing AI-generated content. For example, when malicious applications on major platforms cause harm, the division of primary and secondary responsibility between content publishers and publishing platforms is not clearly defined, easily creating regulatory gaps.

The existing allocation of rights and responsibilities for AI-generated content also needs improvement. For example, AI-generated content may infringe intellectual property, and both its sources and its output may suffer from unclear ownership, difficulty in obtaining evidence and difficulty in determining damages.

At the audience level, public understanding of AI-generated-content technology is weak, and awareness of the risks of illegal abuse is low, which easily leads to reputational and psychological harm to individuals.

Blindly following the trend is costly

As the large-model boom gathers pace, blind trend-following calls for vigilance.

Huawei Cloud's Pangu large model on display at the World Artificial Intelligence Conference in Shanghai

The speculative fever set off by the explosion of large models represented by ChatGPT can crowd out sound investment judgment, to the detriment of the industry's long-term healthy development. The ultra-high cost threshold of training makes such projects unsuitable for small and medium-sized enterprises.

Take ChatGPT's language-model training as an example: initial estimates put the cost of training at more than $2 million, and OpenAI's hardware investment alone exceeds $800 million. Beyond training costs, the technology itself is a threshold. High-quality training corpora and large-scale manual labeling mean that only large institutions or leading enterprises have the requisite strength, and blind follow-up by growing enterprises will end in failed investment.

In addition, homogeneous training sets lead to low-quality large models. Large-model training data generally comes from publicly available resources such as encyclopedia platforms, news corpora, social-media texts, books and forums. If the large models joining the competition all train on homogeneous datasets, most of the poorly performing ones will be eliminated and huge resources wasted.

Even for industry giants, heavy cost consumption is a risk.

Computing power is the bulk of the cost. Take ChatGPT as an example: both daily operation and model iteration consume enormous computing power. According to Similarweb, some 602 DGX A100 servers are currently needed just to meet daily traffic. ChatGPT also draws on data-center computing power originally meant to support cloud computing, video streaming and 5G networks. When multiple companies pre-emptively deploy and train similar models, computing power goes to waste.
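To show how a server count of that order can arise, here is a back-of-envelope sketch; every input below (daily queries, tokens per query, per-GPU throughput) is an assumed placeholder, not a figure from Similarweb or from this article.

```python
# Back-of-envelope serving estimate. Every input here is an assumed
# placeholder, not a figure from Similarweb or from this article.
queries_per_day = 13_000_000        # assumed daily queries
tokens_per_query = 1_000            # assumed prompt + response length
tokens_per_gpu_per_sec = 30         # assumed sustained per-GPU throughput
gpus_per_server = 8                 # one DGX A100 carries 8 GPUs

tokens_per_day = queries_per_day * tokens_per_query
gpu_seconds = tokens_per_day / tokens_per_gpu_per_sec
gpus_needed = gpu_seconds / 86_400  # spread over 24 hours
servers_needed = gpus_needed / gpus_per_server
print(round(servers_needed))        # ~627 servers under these assumptions
```

Different assumed traffic or throughput shifts the result proportionally, which is why published estimates of this kind vary.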

Electricity is an implicit cost that must be mentioned: during training, ChatGPT's energy consumption is equivalent to what hundreds of computers use in a year. Carbon emissions are an overlooked blind spot: training ChatGPT on carbon-intensive energy sources such as coal and natural gas would produce about 550 tons of carbon emissions, equivalent to 550 per-person round trips between New York and San Francisco.

Human-resource consumption also deserves attention. According to Time magazine, OpenAI not only hired large numbers of Kenyan outsourced workers, paid less than $2 an hour, to label data, but also signed three further contracts worth about $200,000 in total with the outsourcing firm Sama to flag harmful content in its database. Judging from performance so far, still more labeling manpower is needed to improve the model.

At present, domestic enterprises show a follow-the-leader tendency in judging the direction of large-model development. For example, after GPT-4 opened up the multimodal direction, domestic first-mover large models also turned multimodal. How to break this path dependence and build independent, controllable and innovative large models is the key task at hand.

Given the huge scale of domestic users, whether domestic large models can handle concurrency far higher than overseas services will take time to verify. And given the high computing cost of training and inference, the free-access route to mass adoption and monetization is hard to sustain; how to recover the cost of application empowerment and generate incremental revenue remains to be explored.

Suggested initiatives

The above problems can be addressed in four respects.

The first is to promote research on, and application innovation in, the underlying technology of large models.

This includes increasing investment in basic research, strengthening original theoretical and technical work on the challenges large models face, such as generalization ability, computing-resource consumption and interpretability, and continuously improving model performance; cultivating an innovation ecosystem by building open large-model R&D platforms, encouraging active participation from government, industry, academia and research, and promoting the transformation of application achievements and the coordinated development of the industrial chain; and pushing qualified enterprises to apply large models to practical scenarios as soon as possible, in fields such as humanoid robots, intelligent connected vehicles, biomedicine and new materials, using application as the guide to raise the level of intelligence in key areas.

The second is to establish and improve supervision mechanisms for large models.

A review mechanism for large-model training should minimize disorderly competition among homogeneous models and reduce wasted resources. This means exploring safety and reliability evaluation standards for large models, with specific technical criteria to ensure stable and reliable operation across application scenarios; strengthening supervision of data sources and training processes, reviewing data-processing logic, clarifying compliance requirements for data collection, collation and use, and preventing data abuse and privacy leakage; classifying and managing application scenarios, defining the scope and restrictions of different model types to avoid the negative impact of improper application; establishing a responsibility-tracing mechanism that clarifies the rights and obligations of developers, deployers and users, providing a legal basis for potential disputes; and actively promoting cross-departmental and cross-field regulatory coordination to form an all-round, multi-level regulatory pattern and improve regulatory efficiency.

The third is to guide rational investment in the capital market.

This includes guiding investment trends, encouraging all kinds of investors to participate in large-model applications, forming a diversified investment pattern and preventing a pile-up of investment in the training link; strengthening risk awareness, guiding investors toward long-term investment concepts and attention to the technological innovation and market prospects of large-model projects, and away from blind investment and short-termism; and establishing a coordination mechanism for the large-model industrial chain to promote the common development of upstream and downstream enterprises and form a cluster effect.

The fourth is to strengthen international cooperation and exchanges.

This means actively participating in international AI regulatory organizations and forums, discussing and sharing regulatory experience and technical achievements with governments, regulators, enterprises and research institutions, and jointly exploring effective approaches to transnational supervision; exploring multilateral cooperation frameworks with other countries to promote global large-model regulatory norms and technical standards; carrying out multilateral exchanges, project cooperation, technical exchanges and talent training in large-model supervision to improve regulators' practical ability and theoretical level; tracking global regulatory trends and keeping abreast of other countries' new policies, technologies and cases to inform domestic supervision; actively promoting international regulatory capacity building and supporting relevant international organizations and institutions in such projects, enhancing China's international voice in large-model supervision; and building an international platform for sharing regulatory information, enabling the exchange of information and resources among national regulators to facilitate cross-border cooperation and problem-solving.
