
AI Annual Summary and Outlook: The outbreak of ultra-large-scale pre-training models, and the eve of autonomous driving commercialization | insights

Alpha Commune

An angel investment fund that gives entrepreneurs substantial help

━━━━━━

Alpha Commune's note: This article brings together the views of many scholars and front-line technical experts, reviews AI's breakthroughs in 2021, and looks ahead to AI trends in 2022. The content ranges from progress in basic technologies such as pre-trained models to breakthroughs in application scenarios such as autonomous driving. Practitioners and entrepreneurs in the AI field are welcome to use it as a reference.


The year 2021 was another year of ups and downs. The pandemic shows no sign of ending and supply chain disruptions caused by the chip shortage keep coming, while at the same time digital and intelligent transformation is the trend of the times. Businesses and institutions around the world are continually learning to adapt to the "new normal" and to capture new business opportunities from it.


In 2021, the field of artificial intelligence continued to boom. AlphaFold2 successfully predicted the structures of 98% of human proteins, pre-trained large models saw an explosion, autonomous driving entered a new stage of commercial pilot exploration, the metaverse concept rode a strong tailwind, the first global agreement on AI ethics was adopted, and SenseTime claimed the "first AI IPO" (listing in 2022)... Breakthroughs in cutting-edge technology are gratifying, quiet but pervasive applications have penetrated various industries, and the industry has also begun to face up to the problems and challenges of artificial intelligence.

At the turn of the year, we interviewed many industry experts, reviewed the 2021 development of various AI technologies such as large AI models, deep learning frameworks, NLP, intelligent voice, autonomous driving, and knowledge graphs, and looked ahead to possible technology trends in 2022.

Annual summary and outlook for AI technology development

Artificial intelligence is moving towards the stage of "refining the big model"

2021 is the year of the explosion of hyperscale pre-training models.

In 2020, GPT-3 burst onto the scene as a pre-trained model with 175 billion parameters. The zero-shot and few-shot learning ability shown by a pre-trained model of this scale reshaped expectations and triggered the boom in AI large-model research in 2021.

Google, Microsoft, NVIDIA, the Zhiyuan Artificial Intelligence Research Institute, Alibaba, Baidu, Inspur and other technology giants and institutions at home and abroad have all carried out research and exploration on large models.

The "Arms Race" of Hyperscale Pre-Training Models

In January 2021, Google's Switch Transformer broke GPT-3's hold on the title of largest AI model with up to 1.6 trillion parameters, becoming the first trillion-parameter language model in history.

Domestic research institutions were not to be outdone. In June 2021, the Beijing Zhiyuan Artificial Intelligence Research Institute released the ultra-large-scale intelligence model "Wudao 2.0" with 1.75 trillion parameters, surpassing Switch Transformer to become the world's largest pre-trained model.

It is worth mentioning that domestic large-model research and development progressed rapidly this year, with Huawei, Inspur, Alibaba, Baidu and others all releasing self-developed large models.

Wu Shaohua, chief researcher at the Inspur Artificial Intelligence Research Institute, told reporters that the industry currently follows two technical routes for scaling up model parameters, resulting in two different model structures: one is the single (dense) model, the other is the hybrid (mixture-of-experts) model. For example, Inspur's Yuan model, Huawei's Pangu model, Baidu's Wenxin model, and NVIDIA and Microsoft's natural language generation model MT-NLG all take the single-model route, while Zhiyuan's Wudao model and Alibaba's M6 take the hybrid-model route.
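
To make the difference between the two routes concrete, the sketch below (our own illustrative code, not taken from any of the models named above; layer names and sizes are assumptions) contrasts a dense feed-forward block with a simple top-1 mixture-of-experts block that routes each token to one of several expert networks.

```python
import torch
import torch.nn as nn

class DenseFFN(nn.Module):
    """Single (dense) route: every token passes through the same feed-forward block."""
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))

    def forward(self, x):                                  # x: (batch, seq, d_model)
        return self.net(x)

class Top1MoEFFN(nn.Module):
    """Hybrid (mixture-of-experts) route: a router sends each token to one expert,
    so parameter count grows with the number of experts while per-token compute stays flat."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
             for _ in range(n_experts)]
        )

    def forward(self, x):
        scores = self.router(x)                            # (batch, seq, n_experts)
        expert_idx = scores.argmax(dim=-1)                 # top-1 routing decision per token
        gate = torch.softmax(scores, dim=-1).gather(-1, expert_idx.unsqueeze(-1))
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i                         # tokens routed to expert i
            if mask.any():
                out[mask] = expert(x[mask])
        return out * gate                                  # scale by the router's gate value

x = torch.randn(2, 8, 512)
print(DenseFFN()(x).shape, Top1MoEFFN()(x).shape)          # both: torch.Size([2, 8, 512])
```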

New advances in pre-trained model technology

Zeng Guanrong, an NLP algorithm engineer at OPPO's Xiaobu Intelligence Center, believes the important technological advances in pre-trained models this year include:

Further innovation and breakthroughs in knowledge representation and learning mechanisms

As understanding of pre-trained models deepens, the mechanisms by which they learn and represent knowledge are gradually becoming clear. People can now inject the knowledge a model needs to learn more smoothly, and with this knowledge the model's ability to cope with complex tasks has improved greatly.

Contrastive learning, self-supervision and knowledge enhancement

With contrastive learning at the center and a variety of augmentation methods as tools, the semantic understanding and representation ability of pre-trained models can be further improved. Stronger augmentation makes self-supervision possible and reduces contrastive learning's dependence on samples, especially positive samples. This reduced data dependence in turn makes models more adaptable to few-shot and even zero-shot tasks, allowing them to complete such tasks better and lowering the cost of deploying pre-trained models by another level.
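
As a minimal, hedged illustration of the contrastive idea described above (our own sketch, not OPPO's implementation), the snippet below computes an InfoNCE-style loss in which two augmented "views" of the same example serve as a positive pair and everything else in the batch serves as negatives; the embedding dimension and temperature are arbitrary.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.07):
    """Contrastive InfoNCE loss: z1[i] and z2[i] are embeddings of two augmented
    views of the same example; other items in the batch act as negatives."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))          # the matching view sits on the diagonal
    return F.cross_entropy(logits, targets)

# Toy usage: 16 examples, 128-dim embeddings produced by some encoder.
z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
print(info_nce_loss(z1, z2).item())
```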

Lowering the threshold for AI to land at scale

Large pre-trained models lower the barrier to entry for AI applications and address two challenges: data and industry knowledge. They do not require large amounts of labeled data, yet still guarantee a solid baseline of capability.

On the business customization, optimization and application of pre-trained models, Zeng Guanrong notes that since the release of the pre-trained language model BERT, such models have been applied to many popular tasks and have gradually shifted from a "trend" to the "basic toolkit" of cutting-edge work; for example, pre-trained models have become a key foundational technology in machine translation. In addition, pre-trained models increasingly serve as components of larger systems, contributing their semantic understanding.

Whether in industry or research, the use of pre-trained models is becoming more flexible: practitioners can take the parts of a pre-trained model that suit their task and assemble them into their own task-specific models.
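
A common pattern of this kind of reuse (a minimal sketch under assumed model and task names, not a description of any specific production system) is to take a pre-trained encoder from the Hugging Face transformers library and attach a small task head on top of it:

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SentimentHead(nn.Module):
    """Reuse a pre-trained encoder and add a small task-specific classifier on top."""
    def __init__(self, encoder_name="bert-base-uncased", n_classes=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)   # pre-trained part, reused as-is
        self.classifier = nn.Linear(self.encoder.config.hidden_size, n_classes)  # new task head

    def forward(self, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state        # (batch, seq, hidden)
        return self.classifier(hidden[:, 0])                     # classify from the [CLS] position

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SentimentHead()
batch = tokenizer(["the service was great", "terrible experience"], padding=True, return_tensors="pt")
logits = model(**batch)           # fine-tune the classifier (and optionally the encoder)
print(logits.shape)               # torch.Size([2, 2])
```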

To this day, performance optimization of large pre-trained models has not stopped. Academia continues to work on making pre-trained models practical to deploy, with compression, pruning and distillation still playing important roles. Beyond the algorithms themselves, optimization of compilers, inference engines and hardware is also advancing rapidly.
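
As one example of the distillation work mentioned here (an illustrative sketch with assumed temperature and weighting, not drawn from any particular paper), a small student model can be trained to match both the ground-truth labels and the softened output distribution of a large teacher:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Knowledge distillation: mix the usual supervised loss with a KL term that
    pulls the student's softened distribution toward the (frozen) teacher's."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                     # the usual T^2 scaling keeps gradient magnitudes comparable
    return alpha * hard + (1 - alpha) * soft

# Toy usage: 4 examples, 10 classes.
student = torch.randn(4, 10, requires_grad=True)
teacher = torch.randn(4, 10)
labels = torch.tensor([1, 3, 0, 7])
print(distillation_loss(student, teacher, labels).item())
```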

Summary and outlook

Wu Shaohua believes that, overall, research on large-scale pre-trained models, including the evolution of model structures and their deployment, is still in an exploratory stage, and the continued exploration by various companies keeps expanding the cognitive boundaries of large pre-trained models.

"Large-scale pre-training models are the latest technological highlands of artificial intelligence, and an all-round test of the original innovation of massive data, high-performance computing and learning theory," Said Liu Zhiyuan, a professor at Tsinghua University and a member of the Zhiyuan Big Model Technical Committee, looking forward to the development trend of big models next year.

Liu Zhiyuan said he will focus on two issues next year:

First, artificial intelligence technology is showing a trend toward "unification": with the support of techniques such as Prompt Tuning, a single pre-trained model can serve many different tasks, and the Transformer framework is expanding from natural language processing into computer vision and other modalities. We may therefore see more work on frameworks, models and tasks that pushes AI technology toward unification. The second issue is that, as pre-trained models grow, how to achieve task adaptation and inference more effectively and efficiently will be a key technology for bringing large models into every household.
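
To make the Prompt Tuning idea concrete (a hedged sketch of the general technique, with an assumed prompt length and model name rather than any specific system), one can freeze a pre-trained backbone and learn only a handful of "soft prompt" embeddings prepended to each input, so one backbone can serve many tasks cheaply:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

class SoftPromptWrapper(nn.Module):
    """Prompt tuning sketch: freeze the pre-trained backbone and train only a few prompt
    embeddings (plus the small task head) prepended to every input."""
    def __init__(self, model_name="bert-base-uncased", n_prompt_tokens=8, n_classes=2):
        super().__init__()
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=n_classes)
        for p in self.model.base_model.parameters():
            p.requires_grad = False                              # backbone stays frozen
        d = self.model.config.hidden_size
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, d) * 0.02)  # trainable soft prompt

    def forward(self, input_ids, attention_mask):
        emb = self.model.get_input_embeddings()(input_ids)       # (batch, seq, d)
        prompt = self.prompt.unsqueeze(0).expand(emb.size(0), -1, -1)
        inputs_embeds = torch.cat([prompt, emb], dim=1)
        prompt_mask = torch.ones(emb.size(0), self.prompt.size(0), dtype=attention_mask.dtype)
        mask = torch.cat([prompt_mask, attention_mask], dim=1)
        return self.model(inputs_embeds=inputs_embeds, attention_mask=mask).logits

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
wrapper = SoftPromptWrapper()
batch = tokenizer(["prompt tuning adapts one model to many tasks"], return_tensors="pt")
print(wrapper(batch["input_ids"], batch["attention_mask"]).shape)   # torch.Size([1, 2])
```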

Domestic deep learning frameworks are no longer "technology followers"

A large number of AI algorithms and applications have emerged in the past decade, all of which are inseparable from the support provided by open source deep learning frameworks.

Open source deep learning frameworks are the "scaffolding" for AI algorithm research and application deployment, helping AI researchers and developers significantly lower the threshold of algorithm development and improve R&D efficiency.

According to IDC research, more than 90% of products in China's artificial intelligence field use open source frameworks, libraries or other toolkits.

New developments, new trends

The core of the development of the deep learning framework is to follow the development of the field of deep learning.

Xu Xinran, head of R&D at MegIne, an open source deep learning framework, shared the new progress in deep learning he has observed over the past year:

(1) Transformer models represented by ViT and Swin have begun to march into fields beyond NLP, showing their power in more scenarios and intensifying the trend toward ever "bigger" models.

Correspondingly, deep learning frameworks have also made great strides in training large models, with hybrid parallelism schemes emerging one after another. Both framework and hardware vendors are asking whether the Transformer is a computational pattern that will remain fixed for a long time.

(2) The arrival of GPUs such as the A100 has spawned a trend from dynamic graphs back toward static graphs. Frameworks that favor dynamic graphs have also tried to improve efficiency through compilation, such as PyTorch's Lazy Tensor and JAX's XLA. Many domestic frameworks are likewise trying to improve efficiency by combining dynamic and static execution, such as MegEngine's Tensor Interpreter and MindSpore's conversion of Python code into static graphs.

In addition, MLIR and TVM, the two beacons of deep learning compilers, are growing rapidly, and machine-driven compilation is becoming a main direction of development for deep learning frameworks. At the same time, as deep learning methods keep evolving, more specialized frameworks have been born, such as DGL in the field of graph neural networks.
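
The dynamic-to-static idea mentioned above can be illustrated with a hedged sketch (our example uses PyTorch's TorchScript tracing, not MegEngine's or MindSpore's own APIs): a model defined in the usual eager style is traced into a static graph that the framework can then optimize, serialize and deploy.

```python
import torch
import torch.nn as nn

# A model defined in the usual dynamic (eager) style.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).eval()

example = torch.randn(1, 32)
static_model = torch.jit.trace(model, example)     # capture a static graph from one example run

# The traced graph can be saved and executed without the Python-level model code.
static_model.save("model_traced.pt")
reloaded = torch.jit.load("model_traced.pt")
with torch.no_grad():
    print(torch.allclose(model(example), reloaded(example)))   # True: same numerics, static graph
```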

The road to technological self-reliance

In the past two years, domestic deep learning frameworks have been open-sourced and have developed rapidly, gradually securing a place in the open source framework market.

In terms of technology R&D, domestic frameworks are no longer mere "followers" of technology; they have produced many leading innovations, such as MegEngine's DTR technology, OneFlow's SBP parallelism scheme and MindSpore's AKG. They have also reached a very high standard in functionality, code quality and documentation.

In terms of open source ecosystem building, each company has continued to invest in the domestic open source ecosystem and in talent training through open source community support and industry-university-research cooperation.

At present, however, no self-developed framework from a domestic company has yet become an international mainstream deep learning framework.

Xu Xinran says frankly that domestic deep learning frameworks still have a long way to go in ecosystem building: this requires continuous investment and continuous improvement, as well as finding differentiated points of technical competition, fully combining China's national conditions with domestic hardware, and giving full play to their own technical advantages and ecosystem insight.

R&D difficulties

At this stage, the R&D difficulties commonly faced by the industry in deep learning frameworks are mainly reflected in three aspects:

(1) On the training side, NPUs are beginning to enter the market and many vendors have built their own training chips; how to efficiently integrate these training NPUs remains an open problem;

(2) Academic research is developing rapidly and framework technology needs to keep up, which poses challenges for framework R&D. The coming period will continue the trend of large Transformers, but what comes after that?

(3) Gains in computing power increasingly rely on DSA hardware, which simple handwritten kernels can no longer fully exploit; frameworks need more compilation technology and domain knowledge to keep improving training efficiency. With the rapid iteration of chips such as NPUs and GPUs, compilation technologies including MLIR, XLA and TVM will receive more attention.

Large model training will be better supported

With large models continuing to gain popularity, deep learning frameworks are expected to keep improving parallelism strategies, recomputation and related capabilities to better support large-model training.

At the same time, training large models still consumes enormous resources; how to use the framework itself to save computing resources, or even complete the same tasks at a smaller scale, will be a technical direction worth exploring.
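
One concrete form of the "recomputation" capability mentioned above (a minimal sketch using PyTorch's built-in checkpoint utility; the layer sizes are arbitrary) trades compute for memory by discarding most intermediate activations during the forward pass and recomputing them during backpropagation:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack whose intermediate activations would normally all be kept in memory.
layers = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(16)])

x = torch.randn(8, 1024, requires_grad=True)

# Recomputation: only activations at 4 segment boundaries are stored; the rest are
# recomputed on the backward pass, cutting activation memory at the cost of extra compute.
y = checkpoint_sequential(layers, 4, x)
y.sum().backward()
print(x.grad.shape)   # gradients flow as usual: torch.Size([8, 1024])
```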

Intelligent voice this year: technological breakthroughs continue, industrial landing accelerated

Large-scale pre-trained models are emerging in the speech field

Ma Zejun, technical director of ByteDance AI Lab, said that in 2021 the evolution of intelligent voice technology showed trends at three levels:

(1) Basic modeling technologies are converging at an accelerating pace across domain boundaries; for example, the Transformer family of models shows consistent advantages across natural language, vision and speech, showing signs of "unifying the field".

(2) Ultra-large-scale self-supervised learning shows strong general learning ability in many of the above areas: large pre-trained models are first trained on massive unlabeled data, and then fine-tuned with a small amount of labeled data to achieve good results.

In the past few years, large models built on this two-stage training mode have continuously broken records in academic algorithm competitions and have also become the standard training and tuning paradigm in industry.

In recent years, researchers from Facebook, Amazon, Google and Microsoft have successively proposed large-scale pre-trained models in the speech field, such as wav2vec, HuBERT, DeCoAR, BigSSL and WavLM.

(3) Beyond basic technologies, in different scenarios the technologies of multiple modalities are rapidly merging into integrated multimodal systems that combine vision, speech and semantics, such as virtual digital humans.

The landing of the industry has accelerated

Overall, the deployment of intelligent voice technology in industry continues to accelerate, with business demand and technology jointly pulling and pushing applications into production.

From the perspective of application scenarios, on the one hand, short, medium and long video services still maintain high growth worldwide, with highly active content creators and consumers; on the other hand, the pandemic has increased demand for home office and remote collaboration, where intelligent voice technology provides key capabilities such as communication enhancement and speech recognition in video conferencing, giving participants a better meeting experience. Meanwhile, new scenarios represented by intelligent vehicles and VR/AR keep appearing, demanding more convenient, lower-latency and more immersive voice interaction.

From the perspective of core technology, improvements in basic models and self-supervision continue to raise the ceiling of model performance, while the integration of multimodal technologies makes solutions ever more capable, able to handle more complex scenarios and deliver better experiences.

The difficulty of commercialization mainly lies in the choice of business model

Ma Zejun believes that, at this stage, the difficulty of commercializing intelligent voice lies mainly in exploring business models and choosing routes, including how to better meet demand, control costs and ensure delivery quality.

On the one hand, the exploration of AI business models must always center on real needs: improving model metrics is not the same as solving users' or customers' problems in real scenarios. Solving practical problems requires AI developers to go deep into business scenarios, understand requirements and constraints, find reasonable product and technology plans, keep abstracting functions and technologies, distill general technical solutions, explore and verify scalable standard products, and shorten customization cycles and costs.

On the other hand, the cost of AI R&D is very high; reducing algorithms' dependence on domain data and building automation platforms that cut manual effort and improve R&D efficiency are critical to cost control.

Finally, attention must be paid to delivery quality and after-sales service. Only by handling all three links together can the entire chain from demand to delivery to service be completed, laying the foundation for commercialization at scale.

Technologies such as end-to-end and pre-training are still worth keeping an eye on

End-to-end sequence modeling techniques

(1) End-to-end technology with higher accuracy and faster inference is worth looking forward to, and the alignment mechanism is key to end-to-end sequence modeling. The Continuous Integrate-and-Fire (CIF) model being explored by ByteDance AI Lab is an innovative alignment mechanism for end-to-end sequence modeling, offering soft alignment, low computational cost and easy extensibility (a simplified sketch follows this list).

(2) On-device end-to-end speech recognition and synthesis technology is worth watching, especially solutions that are lightweight, low-power, accurate and flexibly customizable.

(3) End-to-end speech recognition techniques for hotword customization and domain adaptation are expected to make significant progress.
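
The following is a much-simplified, hedged sketch of the continuous integrate-and-fire idea (our own illustration of the published mechanism, not ByteDance's implementation): per-frame weights are accumulated, and each time the accumulator crosses a threshold of 1.0 the weighted frame features integrated so far are "fired" as one token-level vector.

```python
import torch

def cif_simplified(frames, alphas, threshold=1.0):
    """Continuous Integrate-and-Fire, simplified: frames (T, d) with weights alphas (T,).
    Accumulate weights frame by frame; when the accumulator crosses the threshold,
    emit the weighted sum integrated so far and carry the leftover weight forward."""
    fired, acc, integrated = [], 0.0, torch.zeros(frames.size(1))
    for h, a in zip(frames, alphas):
        a = float(a)
        if acc + a < threshold:                 # keep integrating this frame
            acc += a
            integrated += a * h
        else:                                   # fire: split the frame's weight at the boundary
            used = threshold - acc
            fired.append(integrated + used * h)
            acc = a - used                      # leftover weight starts the next token
            integrated = acc * h
    return torch.stack(fired) if fired else torch.empty(0, frames.size(1))

frames = torch.randn(20, 8)                     # 20 encoder frames, 8-dim features
alphas = torch.rand(20) * 0.4                   # per-frame weights (normally predicted by the model)
tokens = cif_simplified(frames, alphas)
print(tokens.shape)                             # roughly sum(alphas) token vectors
```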

Unsupervised pre-training techniques

(1) Self-supervised speech pre-training at ever larger data and model scales is worth watching: the "BERT" of self-supervised speech pre-training has already emerged (wav2vec 2.0 / HuBERT), and the "GPT-3" of self-supervised speech pre-training is likely to arrive in 2022.

(2) Multimodal self-supervised pre-training also deserves attention; it may greatly improve the representational power of pre-trained models and bring a wider range of self-supervised pre-training techniques into production.

(3) Unsupervised pre-training in speech synthesis, music classification and music recognition is also worth attention; self-supervised pre-trained audio representations can effectively improve the performance of downstream tasks.

Speech adversarial attack and defense techniques

(1) Adversarial attacks in the speech field will, in terms of attack means, evolve from today's white-box attacks toward black-box attacks; in terms of attack content, they will evolve from today's untargeted attacks toward targeted attacks.

Who can win the battle for autonomous driving?

In 2021, the field of autonomous driving is particularly lively.

Car-making fever

This year, internet companies, new car-making forces and traditional enterprises all moved into autonomous driving; it is fair to say that nearly every giant capable of doing so has entered the car-making race. The autonomous driving "battlefield" is crowded with contenders, and it is not yet clear who will come out on top.

In the capital markets, autonomous driving is also highly sought after. According to analysis by Zero One Think Tank, following the boom of 2016-2018, the field saw a second investment boom in 2021. In November 2021, Momenta completed a Series C round of more than US$1 billion, setting a record for the largest annual financing in autonomous driving.

The eve of commercialization

Robotaxi is considered the most valuable business model for autonomous driving, and at this stage many autonomous driving companies are pursuing it. This year, many autonomous vehicles moved from closed test sites onto real roads. Baidu, Xiaoma Zhixing (Pony.ai), Wenyuan Zhixing (WeRide) and other companies launched demonstration operations for the public and began to explore commercialization. In November, China's first pilot commercialization of autonomous driving services was officially launched in Beijing, with Baidu and Pony.ai becoming the first companies allowed to run commercial pilots. Industry insiders believe this marks a new stage for domestic autonomous driving, from test demonstrations to commercial pilot exploration.

This year, the self-driving truck track has also been particularly hot, with mass production and commercialization accelerating and head players going public. Recently, Zhang Kai, chairman of Zhixing, discussed the development of self-driving trucks in a media interview. Compared with the complexity of passenger-car assisted-driving scenarios, RoboTruck has some advantages: trucks spend long stretches on relatively uneventful highways, so the operating scenario is comparatively simple. At this stage RoboTruck is taking a gradual route similar to that of passenger cars, from assisted driving to driverless operation. In terms of prospects, RoboTruck can feasibly reach commercialization and a closed loop, but mass production of the autonomous driving system will be a hurdle.

Hou Jun, COO of Zhixing, believes 2021 was a breakout year for autonomous driving. On the one hand, thanks to continued progress in technology, market demand, policy support and capital optimism, high-level autonomous driving achieved initial results in deployment exploration; on the other hand, the commercialization of intelligent driving is penetrating rapidly and beginning to move toward the era of mass production.

In 2022, these technologies are the key to winning or losing the second half

According to Zhang Kai's prediction, "2022 will be the most critical year for the development of the autonomous driving industry. Competition in passenger-car assisted driving will officially enter the second half, where the battleground will be open urban scenarios. Autonomous driving in other scenarios will also officially enter its first year of commercialization."

Zhang Kai believes that in 2022, a number of autonomous driving technologies deserve attention.

(1) Data intelligence will become the key to successful mass production of autonomous driving. The data intelligence system is key to closing the commercial loop, and building an efficient, low-cost data intelligence system will help drive continuous iteration of the autonomous driving system.

(2) Deep integration of Transformer and CNN technology will become the glue that holds autonomous driving algorithms together. Transformers help the perception system understand the semantics of the environment more deeply, and their deep integration with CNNs can solve the problem of mass-producing and deploying large AI models; this is the key technology of the second half of the industry's competition (a minimal fusion sketch follows this list).

(3) Large computing platforms will officially enter mass production in 2022; Transformer technology and one-stage CNN technology both need the support of a large computing platform.

(4) As autonomous driving systems reach mass production and scale, AI perception technology built on lidar and machine vision will be deeply integrated with large computing platforms, greatly improving the efficiency of the perception and cognition modules.
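
As a hedged illustration of the CNN-plus-Transformer fusion described in point (2) (a generic sketch, not any vendor's perception stack; the backbone, token size and layer counts are assumptions), a convolutional backbone can extract a feature map whose spatial cells are then treated as tokens for a Transformer encoder:

```python
import torch
import torch.nn as nn

class CNNTransformerPerception(nn.Module):
    """Toy perception head: a CNN backbone extracts local features, and a Transformer
    encoder models global relations between the resulting spatial tokens."""
    def __init__(self, d_model=128, n_heads=4, n_layers=2, n_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(                      # CNN: local texture/edge features
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)   # global context over tokens
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, img):                                  # img: (batch, 3, H, W)
        feat = self.backbone(img)                            # (batch, d_model, H/4, W/4)
        tokens = feat.flatten(2).transpose(1, 2)             # (batch, H/4 * W/4, d_model)
        tokens = self.transformer(tokens)
        return self.head(tokens.mean(dim=1))                 # pooled scene-level prediction

model = CNNTransformerPerception()
print(model(torch.randn(2, 3, 64, 64)).shape)                # torch.Size([2, 10])
```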

NLP, the golden age continues?

In recent years, NLP has been in a stage of rapid development. Last year, a number of NLP experts judged that NLP was entering a golden age. So how did NLP evolve this year?

Prompt-based fine-tuning techniques quickly gained popularity

Dr. Jiang Hongfei, a senior NLP algorithm expert at Zuoyebang, told reporters that prompt-based tuning quickly became popular this year; it combines human knowledge with large models more efficiently and is a new development worth watching.
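
As a hedged sketch of how prompt-based methods combine human knowledge with a pre-trained model (our illustrative template and verbalizer, not Zuoyebang's system), a classification task can be rephrased as a fill-in-the-blank question for a masked language model:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

# Human knowledge enters through the template and the verbalizer (label -> word).
template = "The movie review: {text} Overall it was [MASK]."
verbalizer = {"positive": "great", "negative": "terrible"}

def prompt_classify(text):
    prompt = template.replace("{text}", text).replace("[MASK]", tokenizer.mask_token)
    inputs = tokenizer(prompt, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_pos]         # scores over the vocabulary at [MASK]
    scores = {label: logits[tokenizer.convert_tokens_to_ids(word)].item()
              for label, word in verbalizer.items()}
    return max(scores, key=scores.get)

print(prompt_classify("a touching story with wonderful acting"))   # likely: positive
```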

"This year NLP has not had a big breakthrough in the underlying model. In terms of pre-trained models, many large models have emerged this year, but the overall homogenization is also more serious, and for the practical effect of industry, it is often in accordance with the principle of 'Occam Razor', and it is often enough to use the most appropriate such as Bert," Jiang Hongfei said.

At this stage there are still many technical challenges in the development of NLP. One is that large amounts of high-quality labeled data are hard to obtain. Deep learning relies on large-scale labeled data; for perceptual tasks such as speech recognition and image processing, labeling is relatively easy, but NLP tasks are mostly cognitive, human understanding is subjective, and there are many tasks and domains, so the time and labor costs of large-scale corpus annotation are very high.

Compared with CV and speech recognition, NLP projects are slower to land in business

NLP projects are usually strongly tied to the business. Unlike image and speech recognition, where general capabilities find many deployment scenarios within specific services and the boundaries and metrics for collaboration between business and algorithm teams are relatively easy to define, NLP projects tend to land slowly, requiring continuous, deep alignment between upstream and downstream.

NLP tackles the hardest part of cognitive intelligence, and the ambiguity, complexity and dynamism of human language make it challenging. The commercialization of NLP must face these essential problems, so a universal "one trick fits all" technical solution is unlikely.

"Although the current pre-trained models have been working in this direction, I think at least the current Transformer-style, or more generally, DNN, a honeycomb intelligent technology paradigm is not very good." Therefore, you can see that there are also many researchers who are making efforts in various other paradigms in the knowledge graph category," Jiang Hongfei said.

Since a universal model does not work, why can't a single, vertical, scenario-specific task be built quickly? This again comes back to data. Aligning data standards, annotating data consistently and efficiently, data coverage and balance, long-tail data, and dynamic data drift are troubles NLP practitioners face every day. The relevant methodologies and basic tools are still unsystematic and incomplete, and this groundwork must be laid before rapid commercialization can be achieved.

In 2022, in which scenarios will NLP achieve large-scale landing?

In 2022, large-scale applications of NLP may appear in the following industries:

Intelligent education.

Scenario-based high-standard machine-assisted translation, such as professional field document translation, real-time translation of conferences, etc.

Intelligent service operation: intelligent training, sales, marketing, service and other scenarios.

Intelligent assistance for foreign language learning and writing, as evidenced by the rapid development of Grammarly and Duolingo.

Medical intelligence. Text is ubiquitous in electronic medical records, clinical trial reports, medical product specifications and medical literature. Analyzing, mining and leveraging this text offers a large number of directly usable scenarios, and breakthrough developments are possible.

Code intelligence analysis. Code bug recognition, code intelligent optimization, etc.

NLP technologies worth watching in 2022

Prompt-based tuning technique.

Text generation with logical reasoning, and text generation with good controllability and consistency. Generation must meet these requirements to be used in serious scenarios; otherwise it can only be applied to entertainment.

Multimodal technology. Such as NLP+CV, NLP+Image, NLP+Speech, etc.

Active learning, data augmentation and related techniques. Many of the pain points blocking NLP's rapid, large-scale deployment need these technologies to ease them.

Code intelligence. Code problem identification, code translation, automatic code optimization, code effort assessment (e.g. Merico's scheme).

The metaverse concept is on fire, and computer vision is one of the cornerstone technologies

Looking back on the past year, He Miao, an OPPO expert in AI technology productization (covering speech and semantics, computer vision and multimodal fusion), summarized the progress of computer vision in industry and academia.

Embodied intelligence: a shift from passive AI to active AI

Embodied AI emphasizes that agents should interact with the real world through multimodal interaction: not merely learning to extract high-dimensional visual features and passively "perceiving" the world, but actively obtaining real feedback from the physical world through senses analogous to eyes, ears, nose, tongue and body, and using that feedback to keep learning, becoming more "intelligent" and even "evolving".

In February 2021, Fei-Fei Li proposed a new computational framework, DERL (Deep Evolutionary Reinforcement Learning). She discussed the relationship between biological evolution and the evolution of agents, drawing on evolutionary theory and applying it to the evolutionary learning of simulated agents.

Entering the metaverse requires a ticket: intelligent perception and interaction

This year the metaverse concept caught fire, and companies of all kinds entered the game.

Facebook is betting heavily on the metaverse; to show its determination, in 2021 it changed its name to Meta and announced it was going "all in" on the metaverse.

Zuckerberg has proposed that the metaverse needs eight elements, one of which is a development platform and toolkit. The Presence Platform is the basic metaverse development kit Meta provides for Oculus VR headset developers; it offers a toolset based on computer vision and intelligent voice technology, namely the Insight SDK, the Interaction SDK and the Voice SDK.

Entering the metaverse requires a ticket of intelligent perception and interaction technology, and the vision and speech technologies in this ticket are the most important cornerstones.

Trend One: AIGC for Content Generation

The metaverse needs to create digital twins of huge numbers of real-world objects and to recreate real-world characters. These massive reconstructions cannot be handcrafted one by one by CG engineers as in traditional game production; that approach is far too slow for real scenarios. AIGC, content generation at the algorithmic level, is therefore necessary. Related technical directions include image super-resolution, domain transfer, extrapolation, implicit neural representations, and multimodal (CV+NLP) technologies such as CLIP (contrastive language-image pre-training, in which a visual model is learned effectively from natural language supervision) and generating images from text descriptions.
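
As a hedged illustration of the CLIP-style multimodal direction mentioned above (a minimal sketch using the publicly released openai/clip-vit-base-patch32 checkpoint via the Hugging Face transformers library; the stand-in image and captions are invented for illustration), text and image embeddings are scored against each other in a shared space:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# A stand-in image (solid blue); in practice this would be a rendered or captured scene.
image = Image.new("RGB", (224, 224), "blue")
texts = ["a photo of a blue sky", "a cat sleeping on a sofa", "a city street at night"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)   # image-to-text match probabilities
for t, p in zip(texts, probs[0].tolist()):
    print(f"{p:.3f}  {t}")
```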

Trend Two: SCV and synthetic data

Virtual reality engines have specialized components that generate synthetic data that is not only aesthetically pleasing, but also helps to train better algorithms.

Generated or synthesized data is not only an essential element of the metaverse, but also an important ingredient for training models. With the right tools to build datasets, the tedious process of manually labeling data can be eliminated, making it easier to develop and train computer vision algorithms.

The research and advisory firm Gartner believes that within the next three years, synthetic data will outweigh real data. In Synthetic Computer Vision (SCV), computer vision models are trained with data from virtual reality engines and then deployed in the real world.
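
As a toy, hedged illustration of the synthetic-data idea (a made-up example using Pillow to render shapes, not any production rendering engine), labeled training images can be generated programmatically, with ground-truth bounding boxes known by construction rather than hand-annotated:

```python
import random
from PIL import Image, ImageDraw

def make_synthetic_sample(size=128):
    """Render one synthetic image containing a rectangle; the bounding-box label is known
    exactly because we placed the object ourselves -- no manual annotation needed."""
    img = Image.new("RGB", (size, size), "gray")
    w, h = random.randint(20, 60), random.randint(20, 60)
    x0, y0 = random.randint(0, size - w), random.randint(0, size - h)
    ImageDraw.Draw(img).rectangle([x0, y0, x0 + w, y0 + h], fill="red")
    return img, {"label": "rectangle", "bbox": [x0, y0, x0 + w, y0 + h]}

# Generate a small labeled dataset for free.
dataset = [make_synthetic_sample() for _ in range(100)]
print(dataset[0][1])   # e.g. {'label': 'rectangle', 'bbox': [34, 61, 80, 101]}
```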

The main problem restricting the commercialization of knowledge graphs is standardization

Important technological advances

Important technological advances in knowledge graph technology over the past year include:

In knowledge extraction, multimodal information extraction has made progress in processing text and video simultaneously; in knowledge representation, representation methods based on the self-attention mechanism are becoming more practical; and in knowledge application, many industries have begun to build industry knowledge bases for a variety of downstream tasks.
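
For readers less familiar with how such an industry knowledge base is organized, the following is a minimal, hedged sketch (invented entities and relations, plain Python only) of storing extracted facts as subject-relation-object triples and answering a simple downstream query over them:

```python
from collections import defaultdict

# Knowledge graphs store facts as (subject, relation, object) triples.
triples = [
    ("PumpModelX", "made_by", "AcmeCorp"),           # invented example entities
    ("PumpModelX", "has_part", "SealRingA"),
    ("SealRingA", "replaced_every", "6_months"),
    ("AcmeCorp", "located_in", "Shenzhen"),
]

index = defaultdict(list)
for s, r, o in triples:
    index[(s, r)].append(o)

def query(subject, relation):
    """Downstream task: look up all objects linked to a subject by a relation."""
    return index.get((subject, relation), [])

print(query("PumpModelX", "has_part"))               # ['SealRingA']
# A two-hop question: how often must PumpModelX's parts be replaced?
print([query(part, "replaced_every") for part in query("PumpModelX", "has_part")])
```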

Zhang Jie, a senior scientist at Mingluo Technology, pointed out in an interview that at this stage the R&D difficulties generally faced by the industry in knowledge graphs fall into two areas. On the algorithm side, the accuracy of information extraction and entity alignment on unstructured data is hard to guarantee for direct commercial use, so manual verification is still required. On the engineering side, building an industry graph is costly, requires large amounts of manual labeling and cannot be done overnight; business experts are needed to operate and maintain it continuously.

Zhang Jie predicts that in 2022 the application of domain pre-trained language models and prompt techniques to knowledge graphs is expected to further improve the information extraction process. Extraction technology for expert knowledge and multimodal extraction technology have broad commercial prospects.

Application landing progress

In 2021, applications of knowledge graph technology continued to land, still mainly improving search and recommendation in ToC scenarios, while focusing on visualization in ToB scenarios.

Zhang Jie believes that at this stage the main factor restricting commercialization of knowledge graphs is standardization: it is difficult for the schema of an industry graph to achieve consistent understanding within an enterprise, which affects subsequent labeling, extraction and application.

In 2022, large-scale application of knowledge graph technology may break through in manufacturing, where knowledge density is high, standardization is emphasized, and leading enterprises value digital construction and have accumulated large amounts of raw data.

The AI technology breakthrough of 2021

Artificial intelligence predicts protein structure

On December 15, 2021, Nature released its "Top Ten Science News of 2021"; on December 17, Science followed closely behind with its "Top Ten Scientific Breakthroughs of 2021". Both publications singled out artificial intelligence predicting protein structure, and Science named it its 2021 Breakthrough of the Year.

For a long time, predicting protein structure has been a research hotspot and a difficulty in biology. There are three main traditional methods for determining protein structure: X-ray crystallography, nuclear magnetic resonance and cryo-EM. However, these methods are costly, research cycles are long, and progress has been limited.

Artificial intelligence has pressed the fast-forward button for this problem that has plagued the biological community for decades.

In July 2021, two major AI prediction algorithms for protein structure, DeepMind's AlphaFold2 and RoseTTAFold, developed by institutions such as the University of Washington, were open sourced.

AlphaFold2 "unlocks" 98% of the human proteome

On July 16, DeepMind published a paper in Nature announcing that it had used AlphaFold2 to predict 350,000 protein structures, covering 98.5 percent of the human proteome as well as near-complete proteomes of 20 other organisms. The research team also released AlphaFold2's open source code and technical details.

RoseTTAFold calculates protein structures in less than ten minutes

On the same day, Professor David Baker's group at the University of Washington's Institute for Protein Design, together with collaborating institutions, published a paper in Science presenting the results of its open-source protein prediction tool, RoseTTAFold. The team explored a network architecture incorporating related ideas and achieved the best performance with a three-track network. The structure prediction accuracy of the three-track network approaches that of DeepMind's AlphaFold2 at CASP14, while being faster and requiring less computing power. With a single gaming computer, protein structures can be reliably computed in about ten minutes.

Other research advances

In August 2021, Chinese researchers used AlphaFold2 to map the structures of nearly 200 proteins that bind to DNA. In November, researchers in Germany and the United States used AlphaFold2 and cryo-EM to map the structure of the nuclear pore complex. On December 22, DP Technology launched the protein structure prediction tool Uni-Fold, which for the first time in China reproduced AlphaFold2's full-scale training and open-sourced its training and inference code.

In 2022, important technology trends to watch

Artificial intelligence engineering

In the past two years, artificial intelligence engineering (AI Engineering) has attracted particular attention. It appears among the important strategic technology trends Gartner released for both 2021 and 2022. AI engineering is a holistic approach to operationalizing AI models.

Not long ago, Gao Ting, senior research director at Gartner, said in an interview that AI engineering is essentially the large-scale, end-to-end deployment of AI within the enterprise. Although expectations for AI are now high, its actual application is still undervalued, because the value of many AI projects is only reflected in "point-to-point" one-off solutions. The engineering methods for implementing AI at scale, including DataOps, ModelOps and DevOps, together constitute the complete "engineering of AI".

AI engineering has many benefits for enterprises: when they deploy AI, both the efficiency and the breadth of deployment will be higher.

It is foreseeable that AI engineering will need continuous attention over the next two to three years, and that it should focus on three core areas: data operations, model operations, and development operations.

Gartner predicts that by 2025, 10 percent of companies that establish AI engineering best practices will generate at least three times more value from their AI efforts than 90 percent that don't.

Generative AI is becoming a trend

Generative Artificial Intelligence was also named by Gartner as one of the key strategic technology trends for 2022.

This machine learning approach learns from existing content or objects in its data and uses them to generate brand-new, realistic artifacts. People can use it to create new things: generating content, writing software code, assisting drug development, and so on.
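
A minimal, hedged illustration of the content-creation use case (using the public GPT-2 checkpoint through the transformers pipeline API; the prompt and sampling settings are arbitrary):

```python
from transformers import pipeline

# Small public model used purely for illustration; production systems use far larger ones.
generator = pipeline("text-generation", model="gpt2")

prompt = "In 2022, generative AI will change content creation by"
outputs = generator(prompt, max_new_tokens=40, do_sample=True, top_p=0.9, num_return_sequences=2)
for out in outputs:
    print(out["generated_text"])
```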

Recently, machine learning luminary Andrew Ng published an article reviewing four important AI advances of 2021, one being that AI-generated audio content is becoming mainstream. Musicians and filmmakers are now accustomed to using AI-enabled audio production tools.

On domestic video platforms such as Youku and iQiyi, AI has also been widely used in audio and video content production and creation, such as AI-assisted video production, intelligent subtitle generation, intelligent translation, and special effects generation.

Gartner believes that in the future, AI will gradually change from a judgment machine to a creation machine. Generative AI is expected to account for 10% of all generated data by 2025, up from less than 1% today.

However, the technology remains controversial: it can be abused for fraud, scams, political rumors, identity forgery and more, bringing moral and legal risks.

Metaverse: a frenzied new hotspot

In 2021 there was perhaps no tech term more popular than "metaverse". Companies around the world are talking about the concept, believing the metaverse points to the "ultimate form" of the internet. Now that the dividends of the mobile internet have peaked, could the metaverse be where the internet is heading?

The so-called metaverse is a collection of virtual time and space, consisting of augmented reality (AR), virtual reality (VR) and the internet. Realizing it rests on a series of cutting-edge technologies and infrastructure, including artificial intelligence, VR/AR, 5G, cloud computing, big data and blockchain.

Sub-sectors worth watching in the metaverse include VR/AR, gaming, social, digital humans (Metahuman) and more. A metaverse industry report expressed long-term optimism about the underlying technology companies behind these forms, predicting that over the next decade the metaverse concept will still center on entertainment fields such as social networking, games and content, and that by 2030 it will penetrate areas that improve the efficiency of production and daily life.

A final word

Looking back at AI in 2021, there were many exciting breakthroughs, and artificial intelligence is empowering, changing and even disrupting many industries. Of course, many difficulties remain that will take more time to overcome.

Recently, Robin Li commented on the future of AI: in the era of "human-machine symbiosis", China will usher in a golden decade of AI. Over the next ten years, the threshold for applying AI technology will drop significantly, providing a technical foundation for the intelligent transformation of all walks of life.

The development of artificial intelligence has gradually entered deep waters. We hope that next year, and over the next ten years, AI will make further progress in both technology and deployment, striving toward the next "golden decade".

Interviewees (in alphabetical order):

He Miao, AI technology productization expert at OPPO

Hou Jun, COO of Zhixing

Jiang Hongfei, senior NLP algorithm expert at Zuoyebang

Liu Zhiyuan, professor at Tsinghua University and member of the Zhiyuan Big Model Technical Committee

Ma Zejun, technical director of ByteDance AI Lab

Wu Shaohua, chief researcher at the Inspur Artificial Intelligence Research Institute

Xu Xinran, R&D head of the open source deep learning framework MegEngine

Zeng Guanrong, NLP algorithm engineer at OPPO Xiaobu Intelligence Center

Zhang Jie, senior scientist at Mingluo Technology

Zhang Kai, chairman of Zhixing

This article is reproduced with permission from AI Frontline. Author: Liu Yan.
