Leave Nvidia in the dust! China's self-developed GPU is released and outperforms the 27-trillion-parameter chip

Author: Shou and Tianqi

The rise of large models

A large model is an AI model with a very large number of parameters trained on a very large amount of data. Such a model is pre-trained at scale on massive unlabeled data so that it learns the rich patterns the data contains and acquires general-purpose representation capabilities. Afterwards, it only needs to be fine-tuned on a small amount of task-specific data to perform that task.
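
To make this pre-train-then-fine-tune paradigm concrete, here is a minimal PyTorch sketch, not tied to any specific model: a frozen backbone stands in for a large pretrained network, and only a small task-specific head is trained on a handful of labelled examples. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn

# Minimal sketch of the pretrain-then-fine-tune paradigm described above.
# "backbone" is a stand-in for any large pretrained encoder; only the small
# task head is trained here.

class FineTunedClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, hidden_dim: int, num_labels: int):
        super().__init__()
        self.backbone = backbone                        # frozen general-purpose representation
        self.head = nn.Linear(hidden_dim, num_labels)   # small task-specific layer

    def forward(self, x):
        with torch.no_grad():                           # keep the pretrained weights fixed
            features = self.backbone(x)
        return self.head(features)

# Hypothetical usage: a tiny network standing in for a real pretrained backbone.
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
model = FineTunedClassifier(backbone, hidden_dim=64, num_labels=2)

optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# A handful of labelled examples is enough to adapt the model to the new task.
inputs = torch.randn(8, 128)
labels = torch.randint(0, 2, (8,))
loss = loss_fn(model(inputs), labels)
loss.backward()
optimizer.step()
```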

Large models have become a major direction in the development of artificial intelligence largely because of the combined push of three key factors: big data, massive computing power, and strong algorithms.

First, the arrival of the big-data era makes it possible to obtain the massive amounts of data needed to train large models. Second, rapid advances in computing hardware, such as the emergence of specialized AI chips like GPUs, provide the computing power needed to train models at this scale. Third, breakthroughs in machine learning algorithms such as deep learning make training large models feasible in the first place.

Large models first achieved breakthroughs in natural language processing. In 2018, Google released BERT, which applied the Transformer encoder to language understanding tasks and achieved excellent results. In 2020, OpenAI released GPT-3, with up to 175 billion parameters, demonstrating powerful text generation capabilities.

Large model technology has since made great strides in computer vision, speech recognition, and multimodal AI. For example, OpenAI's DALL-E 2 can generate realistic images from natural language descriptions, and DeepMind's AlphaFold 2 can accurately predict the three-dimensional structure of proteins.

The current state of large models

At present, large model technology is developing rapidly around the world and has become the dominant direction in the field of artificial intelligence. Here are some representative large models:

GPT series (OpenAI): GPT-3 (175 billion parameters) and GPT-4 (1.8 trillion parameters), with strong natural language understanding and generation capabilities.

LaMDA (Google): a conversational large language model with 1.8 trillion parameters.

PanGu-Alpha (Huawei): a Chinese large model with 200 billion parameters.

Wenxin Yiyan (Baidu): a Chinese large model with 1.7 trillion parameters.

Tongyi Qianwen (Alibaba): a Chinese large model with 720 billion parameters.

In addition to the general-purpose large models above, industries such as finance, healthcare, and manufacturing are actively exploring large model applications in their own vertical fields. Some companies are also trying to deploy large models on mobile phones and other edge devices.

Large model technology is being developed and applied at an accelerating pace around the world, but large models also face some significant challenges and bottlenecks.

Challenges for large models

Computing power bottleneck

Training large-scale models requires enormous computing power, and demand for specialized AI chips such as GPUs is surging. GPT-4, for example, is reported to have cost as much as $63 million to train. At present, China cannot yet supply enough fully independent and controllable computing power to support large model training.
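
As a rough, back-of-the-envelope illustration of the scale involved, the sketch below estimates GPT-3's training compute using the widely cited approximation that training FLOPs ≈ 6 × parameters × tokens. The 300-billion-token figure is a commonly reported estimate for GPT-3 and, like the formula itself, is an assumption rather than a figure from this article.

```python
# Back-of-the-envelope sketch of why training compute is a bottleneck.
# Approximation used: training FLOPs ≈ 6 × parameters × tokens.

params = 175e9        # GPT-3 parameter count
tokens = 300e9        # tokens reportedly seen during GPT-3 training (assumed)
flops = 6 * params * tokens
print(f"≈ {flops:.2e} FLOPs of training compute")      # roughly 3e23 FLOPs

# Expressed as petaflop/s-days (1 PFLOP/s sustained for one day):
pf_days = flops / (1e15 * 86400)
print(f"≈ {pf_days:,.0f} petaflop/s-days")             # a few thousand
```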

Data bottlenecks

Large models have an enormous appetite for high-quality training data. GPT-3, for example, was trained on a dataset of nearly 500 billion tokens. Obtaining high-quality data at this scale is no easy task, and it carries risks such as data-privacy violations.

Model interpretability

The inner workings of a large model are often a "black box" that lacks explainability, which creates obstacles and risks for its use in critical fields. Improving the explainability and trustworthiness of large models is an urgent, unsolved problem.

Energy consumption and carbon emissions

Training large models consumes a great deal of energy and produces high carbon emissions; training GPT-3 is estimated to have emitted as much carbon as driving around the globe 57 times. Reducing the energy consumption of large models is therefore an important sustainability concern.

Intellectual property and ethical risks

The massive amounts of data required for large model training may raise legal risks such as intellectual-property infringement. The outputs of large models may also carry ethical risks such as bias and misinformation, and oversight of them needs to be strengthened.

Future development directions for large models

Computing power innovation

Accelerate the R&D and industrialization of domestic AI chips to provide independent and controllable computing power for large model training. At the same time, promote collaborative innovation between computing hardware and software algorithms to improve the efficiency of computing power utilization.

Data empowerment

Build high-quality datasets and promote the circulation of data elements across domains and scenarios. Explore technical means such as privacy-preserving computation to protect personal privacy. At the same time, strengthen data governance and standardize how data is collected and used.
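
As one illustration of what "privacy-preserving computation" can mean in practice, here is a minimal sketch of the Laplace mechanism from differential privacy; the dataset, bounds, and parameters are purely illustrative, not a production implementation.

```python
import numpy as np

# Minimal sketch of one privacy-preserving technique: the Laplace mechanism
# from differential privacy. A noisy statistic is released instead of the
# exact value, limiting what can be inferred about any single individual.

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a noisy statistic satisfying epsilon-differential privacy."""
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Example: release the average age of a tiny dataset without exposing
# any single person's contribution.
ages = np.array([23, 35, 41, 29, 52])
true_mean = ages.mean()
sensitivity = 100 / len(ages)   # mean over n = 5 records, ages bounded in [0, 100]
noisy_mean = laplace_mechanism(true_mean, sensitivity, epsilon=1.0)
print(f"true mean: {true_mean:.1f}, privacy-preserving release: {noisy_mean:.1f}")
```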

Improving model interpretability

Strengthen research into the internal mechanisms of large models and propose new theories and methods for explainable AI. Combine techniques such as causal reasoning with deep learning to enhance the explainability and trustworthiness of large models.
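
As a small example of what explainability methods look like in practice, the sketch below computes a simple input-gradient (saliency) attribution for a toy model. It illustrates one common technique; the model and data are placeholders, not a method proposed here.

```python
import torch
import torch.nn as nn

# Minimal sketch of input-gradient (saliency) attribution: the gradient of the
# predicted class score with respect to the inputs indicates how strongly each
# input feature influences the prediction.

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 3))
x = torch.randn(1, 16, requires_grad=True)

logits = model(x)
predicted_class = logits.argmax(dim=1).item()
logits[0, predicted_class].backward()     # gradient of the top score w.r.t. inputs

saliency = x.grad.abs().squeeze()         # larger values = more influential features
print(saliency)
```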

Green and intelligent development

Promote green, low-carbon development of computing facilities and increase the share of renewable energy used by data centers. Research efficient, low-carbon model-training algorithms to reduce the energy consumption and carbon emissions of large models.
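
One concrete example of an efficiency-oriented training technique is automatic mixed precision, which performs most arithmetic in 16-bit floats to cut compute and memory cost. The sketch below shows the idea with a toy PyTorch model and random data; it is an illustration under those assumptions, not a recipe from this article.

```python
import torch
import torch.nn as nn

# Minimal sketch of automatic mixed precision (AMP) training: the forward pass
# runs in reduced precision where safe, and the loss is scaled to avoid
# float16 underflow. Model and data are placeholders.

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

inputs = torch.randn(32, 256, device=device)
labels = torch.randint(0, 10, (32,), device=device)

optimizer.zero_grad()
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = loss_fn(model(inputs), labels)     # forward pass in mixed precision
scaler.scale(loss).backward()                 # scale loss to protect small gradients
scaler.step(optimizer)
scaler.update()
```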

Ethics and regulation

Strengthen research on the ethics of artificial intelligence, and formulate relevant laws and regulations. Establish a review and supervision mechanism for the output content of large models to prevent the spread of false information and harmful content.

Large model technology is leading artificial intelligence into a new stage of development, but continued innovation is needed to solve a series of technical and social challenges before its enormous potential can truly be unleashed for the benefit of human society.
