
Impressions of 2021 | Artificial intelligence surges forward

Author: Shanggong Robot

Today, after several years of development, artificial intelligence has lost much of its original mystique. Looking back at 2021, there were many noteworthy research results and deployments in both AI technology and industry. Basic AI research gradually deepened, and work in multimodality, machine learning, natural language processing, computer vision, chips, and basic science sparked heated discussion. If your attention is still fixed on speech recognition, image recognition, smart reading, or virus sequencing, then the following frontier technology and industry research should interest you.

A DeepMind machine learning framework helped discover two new mathematical conjectures

On December 1, the British journal Nature reported a machine learning framework developed by the artificial intelligence company DeepMind that helped discover two new conjectures in pure mathematics. The study demonstrates that machine learning can support mathematical research, and it is the first time computer scientists and mathematicians have used artificial intelligence to help prove or propose complex theorems in mathematical fields such as knot theory and representation theory.

One key goal of pure mathematical research is to discover patterns among mathematical objects and use these connections to form conjectures. Since the 1960s, mathematicians have used computers to help find patterns and formulate conjectures, but AI systems had not yet been widely applied in theoretical mathematics.

This time, the DeepMind team worked with mathematicians to build a machine learning framework to assist mathematical research. The team says the framework could encourage further collaboration between mathematics and artificial intelligence in the future.

Sony released an AI image sensor chip with an integrated sensing-memory-computing design

With the development of industries such as the Internet of Things, retail, and smart cities, demand for AI processing power in camera products has grown rapidly. AI processing at the edge avoids problems inherent in cloud-based systems, such as latency, cloud communication, processing overhead, and privacy concerns. The market currently expects edge smart cameras to be small, low-power, low-cost, and easy to deploy, but conventional CMOS image sensors can only output raw image data. When designing intelligent cameras with AI capabilities, it is therefore important to combine the image signal processor (ISP), neural network processing, DRAM, and so on.

At the 2021 IEEE International Solid-State Circuits Conference (ISSCC), Sony presented a back-illuminated stacked CMOS image sensor chip with an energy efficiency of 4.97 TOPS/W. By stacking the image sensor, a CNN processor, and subsystems such as the ISP, DSP, and memory, it realizes complete AI image processing on a single chip.

TRFold leads a computational biology breakthrough in China

In July 2021, DeepMind open-sourced AlphaFold2 and published a paper in the journal Nature laying out its technical details. On the same day, David Baker's team also released the RoseTTAFold algorithm and published its results in Science.

This open-sourcing made waves in the biology community: it gives biologists a chance to escape the constraints of advanced equipment, which is often so expensive that only well-funded universities or research institutions can afford it. Small teams and individual researchers can now also participate in protein research.

TRFold, a deep learning protein folding prediction platform developed by a Chinese AI company, scored 82.7/100 (TM-score) in an internal benchmark based on CASP14 (the 14th international protein structure prediction competition, held in 2020). That surpasses the 81.3/100 of RoseTTAFold, developed by the University of Washington team, and is second only to AlphaFold2's 91.1/100. When predicting a protein chain of 400 amino acids, TRFold takes only 16 seconds. This is the best result among all publicly known protein structure prediction models in China, and it marks the entry of China's computational biology into the world's first tier.
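The TM-score used to rank these models has a standard definition: per-residue distances between the prediction and the reference structure are squashed through a length-dependent scale and averaged. A minimal sketch, assuming a fixed residue alignment is already given (the real score maximizes over superpositions):

```python
def tm_score(distances, l_target):
    """TM-score for one fixed alignment between a predicted and a
    reference structure.

    distances: per-residue C-alpha distances in angstroms.
    l_target:  length of the target protein; also normalizes the sum.
    """
    # Length-dependent distance scale from the TM-score definition.
    d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    # Each residue contributes between 0 (far) and 1 (exact match).
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target

# A perfect prediction (all distances zero) scores exactly 1.0,
# which is why the scores above are often read as percentages of 100.
perfect = tm_score([0.0] * 100, 100)
```

A residue sitting exactly at distance `d0` contributes 0.5, which is what makes the score length-normalized and comparable across proteins of different sizes.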

Against the backdrop of the COVID-19 pandemic, the global life sciences sector is facing transformation, and the first year of "AI + life sciences" has begun. In the next few years, a large number of institutions and companies are expected to join this wave of technological innovation and life science research.

DeepMind publishes a paper on the assessment of social hazards in language models

In December 2021, DeepMind published a paper examining the ethical and social harms of pre-trained language models. The researchers explored the models' adverse effects across six major areas and highlighted two issues that require continued attention. First, current benchmarking tools are insufficient to assess some ethical and social harms; for example, when a language model generates false information, humans may believe it is true, and assessing this hazard requires more human-computer interaction with language models. Second, research on risk mitigation is still insufficient; for example, language models learn to reproduce and amplify social biases, but research on this problem is still at an early stage.

The MIT-IBM joint lab built a neural network that learns NLP tasks based on the fruit fly brain

In March 2021, researchers at the MIT-IBM joint laboratory mathematically formalized well-characterized neurobiological network motifs from the fruit fly (Drosophila) brain and built a neural network on that structure. The network learns semantic representations and can generate both static and context-dependent word embeddings. In experiments, its performance was not only comparable to existing NLP methods, but it also had a smaller memory footprint and required less training time. On contextual word tasks, the fruit fly network performed nearly 3 percentage points better than GloVe and more than 6 points better than Word2Vec.
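The motif usually borrowed from the fly brain is its olfactory circuit: a dense input is expanded through a sparse random projection onto many "Kenyon cells," and only the k most active cells fire, yielding a sparse binary code. A minimal standalone sketch of that motif (random stand-in weights and input, not the MIT-IBM implementation):

```python
import numpy as np

def fly_hash(dense_vec, proj, k):
    """Fruit-fly olfactory motif: sparse random expansion followed by
    k-winner-take-all, producing a sparse binary code."""
    activations = proj @ dense_vec           # projection neurons -> Kenyon cells
    code = np.zeros_like(activations)
    top = np.argsort(activations)[-k:]       # indices of the k strongest cells
    code[top] = 1.0                          # winner-take-all sparsification
    return code

rng = np.random.default_rng(0)
d_in, d_out, k = 50, 400, 16                 # expand 50 -> 400 dims, keep 16 active
# Sparse binary projection: each Kenyon cell samples ~10% of the inputs.
proj = (rng.random((d_out, d_in)) < 0.1).astype(float)
word_vec = rng.normal(size=d_in)             # stand-in for a word's context vector
code = fly_hash(word_vec, proj, k)           # sparse binary "embedding"
```

The appeal noted in the paragraph above follows from the code's shape: the representation is binary and mostly zeros, so it is cheap to store and compare, while similar inputs tend to activate overlapping sets of cells.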

OpenAI proposed the large-scale multimodal pre-trained models DALL·E and CLIP

With the support of big data, large parameter counts, and large-scale compute, pre-trained models can thoroughly learn the representations in text and acquire a certain amount of knowledge. If a model can also learn from data in multiple modalities, it will perform more strongly on vision-language tasks such as text-to-image generation and visual question answering.

In January 2021, OpenAI simultaneously released two large-scale multimodal pre-trained models, DALL·E and CLIP. DALL·E can generate images from short text prompts, such as a sentence or a snippet of text, while CLIP can classify images based on text prompts. OpenAI said the goal of developing multimodal large models is to break through the boundaries between natural language processing and computer vision and realize multimodal artificial intelligence systems.
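CLIP's "classify by text prompt" trick is mechanically simple: embed the image and each candidate prompt into a shared space and pick the prompt with the highest cosine similarity. A toy sketch with hand-made stand-in embeddings (real CLIP would produce them with its contrastively trained image and text encoders):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """CLIP-style zero-shot classification: L2-normalize the image and
    prompt embeddings, then return the label whose prompt embedding has
    the highest cosine similarity with the image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img                         # one cosine similarity per prompt
    return labels[int(np.argmax(sims))]

# Toy shared space: pretend the three axes mean cat / dog / car, and the
# image embedding points mostly along the 'dog' axis.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embs = np.eye(3)                        # hypothetical prompt embeddings
image_emb = np.array([0.1, 0.9, 0.05])
print(zero_shot_classify(image_emb, text_embs, labels))  # prints: a photo of a dog
```

Because the label set is just a list of strings, new classes can be added at inference time by writing new prompts, with no retraining; that is what "zero-shot" means here.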

Google proposed the Multitask Unified Model (MUM)

In May 2021, Google unveiled the Multitask Unified Model (MUM) at its 2021 I/O conference. MUM understands 75 languages and is pre-trained on a large amount of web data. It is good at understanding and answering complex decision-making questions and can find information across multilingual, multimodal web data, giving it application value in Internet scenarios such as customer service, question answering, and marketing.

Researchers from Huawei's Noah's Ark Lab and elsewhere proposed the dynamic resolution network DRNet

Deep convolutional neural networks are carefully designed with large numbers of learnable parameters to meet the accuracy requirements of visual tasks. To reduce the high cost of deploying such networks on mobile devices, recent work has made great progress in removing redundancy from predefined architectures. However, the redundancy in the resolution of the input image has not been fully studied: current networks process every input at a fixed resolution.

In October 2021, researchers from Huawei's Noah's Ark Lab, the University of Chinese Academy of Sciences, and other institutions proposed a new visual neural network, DRNet (Dynamic Resolution Network). For each input sample, the network dynamically determines the resolution at which the image is processed. A resolution predictor with almost negligible computational cost is embedded in the network and optimized jointly with it. The predictor learns the minimum resolution each image needs, and can even achieve accuracy exceeding that of the original recognition network. Experiments show that DRNet can be embedded in any mature network architecture to significantly reduce computational complexity: for example, DR-ResNet-50 matches ResNet-50's performance with 34% less computation, or improves ImageNet accuracy by 1.4 points with about 10% less computation.
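The shape of the idea is easy to sketch: a near-zero-cost predictor looks at a cheap summary of the image, scores a few candidate resolutions, and the image is resized to the winner before entering the main backbone. A minimal sketch with random stand-in predictor weights (in the paper the predictor is trained jointly with the backbone; function names here are hypothetical):

```python
import numpy as np

CANDIDATE_RES = [96, 160, 224]               # candidate input resolutions

def pool_features(image, grid=4):
    """Cheap global descriptor: average-pool the image to a grid x grid map."""
    h, w = image.shape
    return image[: h - h % grid, : w - w % grid] \
        .reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3)).ravel()

def predict_resolution(image, weights):
    """Score each candidate resolution from pooled features and pick the
    best one; the cost of this step is negligible next to the backbone."""
    logits = weights @ pool_features(image)
    return CANDIDATE_RES[int(np.argmax(logits))]

def resize_nearest(image, size):
    """Nearest-neighbor resize of a square grayscale image to size x size."""
    h, w = image.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[np.ix_(rows, cols)]

rng = np.random.default_rng(0)
image = rng.random((224, 224))               # stand-in grayscale input
weights = rng.normal(size=(len(CANDIDATE_RES), 16))  # stand-in predictor weights
res = predict_resolution(image, weights)
small = resize_nearest(image, res)           # what the backbone would consume
```

The savings come from the backbone, not the predictor: convolution cost scales roughly with the square of the input side, so running an "easy" image at 96 instead of 224 cuts its compute by more than 5x.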

Lanzhou Technology and partners developed the Chinese language model "Mencius"

In July 2021, a joint team from Lanzhou Technology, Sinovation Ventures, Shanghai Jiao Tong University, the Beijing Institute of Technology, and other institutions developed the Chinese language model "Mencius". With a parameter scale of only 1 billion, it ranked first in the overall CLUE Chinese language understanding benchmark, as well as in the classification and reading comprehension rankings, with a total score above 84 points, approaching the human baseline of 85.61.

The Peking University team proposed a pulsed vision model that simulates the coding mechanism of the primate retinal fovea

Deep learning has driven great progress in machine vision over the past decade, but huge gaps remain compared with biological vision, such as vulnerability to adversarial attacks and computational complexity that grows linearly with resolution. Recently, a Peking University team proposed a pulsed vision model that simulates the coding mechanism of the primate retinal fovea, overturning concepts of the camera and video that have been in use for nearly two centuries; the patent has been granted in China, the United States, Japan, South Korea, and Europe. The team developed a pulsed vision chip and camera thousands of times faster than human vision and conventional film or video, achieving continuous imaging of high-speed physical processes such as fast-moving vehicles, transient arcs, and wind tunnel shock waves with ordinary devices. Combined with spiking neural networks, it realized real-time detection, tracking, and recognition of ultra-high-speed targets using only laptop-level computing power. With equivalent hardware and compute, machine vision performance improves by three orders of magnitude.

The team also studied in depth the neural network structure and signal coding mechanisms by which the biological retina encodes complex dynamic scenes. They proposed and implemented a retinal coding model based on a convolutional recurrent neural network (CRNN) that can predict, with high precision, the responses of large populations of retinal ganglion cells to dynamic natural scenes, and can learn the shape and position of the ganglion cells' receptive fields. The model's structure is closer to the biological retina, and a more accurate coding model can be learned with fewer parameters. The team further proposed quantitative indicators to evaluate the spatiotemporal complexity of stimuli and the spatiotemporal regularity of receptive fields; experimental results reveal that the network's recurrent connection structure is a key factor in retinal coding. The model not only has biological value but is also of great significance for the design of a new generation of pulsed vision models and chips, and even for the development of retinal prostheses. The results were published in Patterns.
