
Andrew Ng's 2021 review: the big events that shaped AI this year

Author: Data-pie THU
Source: AI Frontline. This article is about 5,000 words; estimated reading time 10+ minutes. In it, Andrew Ng reviews the main progress of global artificial intelligence in 2021 in multimodality, large models, intelligent speech generation, the Transformer architecture, and AI legal initiatives in various countries.

Recently, machine learning luminary Andrew Ng published his latest article in The Batch, the artificial intelligence newsletter he edits. In it, Ng reviewed the main progress of global artificial intelligence in 2021 in multimodality, large models, intelligent speech generation, the Transformer architecture, and AI legal initiatives in various countries.

A few days ago, Andrew Ng published a Christmas message on the theme of "give someone a rose, and the fragrance lingers on your hand".

As the end of 2021 approaches, you may be working less to prepare for winter break. I'm looking forward to taking a break from work for a while, and I hope you're the same.
December is sometimes referred to as the season of giving. If you have free time and want to know how to take advantage of it, I think one of the best things each of us can do is think about how we can help others.
The historian and philosopher Will Durant once said, "We are what we repeatedly do." If you are constantly looking for ways to lift others up, it will not only help them but, perhaps just as importantly, make you a better person: it is your repeated behavior that defines who you are. A classic study also shows that spending money on others may make you happier than spending it on yourself.
So, this holiday season, I hope you can take a break for a while. Rest, relax, recharge! Reconnect with people you love but haven't had enough time to connect with over the past year. If time permits, do something meaningful to help others. That might mean leaving an encouraging comment on a blog post, sharing advice or encouragement with a friend, answering an AI question on an online forum, or donating to a worthy cause. Among charities related to education and/or technology, my favorites are the Wikimedia Foundation, Khan Academy, the Electronic Frontier Foundation, and the Mozilla Foundation.

Ng also talked about the development of the AI community. He said: The AI community had a strong spirit of cooperation when it was very small. It felt like a band of fearless pioneers marching across the world. People were eager to help others, offer advice, encourage one another, and make introductions. Those who helped us often went unrepaid, so we pay it forward by helping those who come after us. As the AI community grows, I want to maintain that spirit. I pledge to continue my efforts to build the AI community. I hope you will too!

I also hope you'll consider ways, big or small, to reach out to people outside the AI community. There are still many places in the world that do not have advanced technology. Our decisions affect billions of dollars and the lives of billions of people. This gives us a special opportunity to do good in the world.

Ng reviewed the progress of global artificial intelligence in 2021 and looked forward to the development prospects of AI technology in 2022 and beyond.

Review 2021

Over the past year, the entire world has been battling extreme weather, economic inflation, supply chain disruptions, and the COVID-19 virus.

In the tech space, remote work and online conferencing persisted throughout the year. The AI community continued its efforts to connect the world and advance machine learning while strengthening its ability to benefit industries.

This time, we want to focus on the future of AI technology in 2022 and beyond.

The take-off of multimodal AI

While deep learning models such as GPT-3 and EfficientNet, each targeting a single modality such as text or images, have drawn plenty of attention, the most impressive progress this year came from AI models that discover relationships between text and images.

Background information

OpenAI kicked off multimodal learning with CLIP (which matches images to text) and DALL·E (which generates images from input text); DeepMind's Perceiver IO set out to classify text, images, videos, and point clouds; and Stanford's ConVIRT experimented with adding text labels to medical X-ray images.
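
To make "matching images to text" concrete, here is a minimal sketch of the contrastive scoring idea behind CLIP-style models. It is not OpenAI's implementation; the linear encoders, feature sizes, and temperature are placeholder assumptions.

```python
import torch
import torch.nn.functional as F

# Placeholder encoders: stand-ins for a real vision backbone and text encoder.
image_encoder = torch.nn.Linear(2048, 512)
text_encoder = torch.nn.Linear(768, 512)

def match_scores(image_feats, text_feats, temperature=0.07):
    """Project both modalities into a shared space and score every pairing.

    Entry (i, j) of the result says how well image i matches caption j.
    Training pushes the diagonal (true pairs) up and everything else down.
    """
    img = F.normalize(image_encoder(image_feats), dim=-1)
    txt = F.normalize(text_encoder(text_feats), dim=-1)
    return img @ txt.T / temperature

# Toy batch: 4 images paired with 4 captions (random stand-in features).
logits = match_scores(torch.randn(4, 2048), torch.randn(4, 768))
targets = torch.arange(4)  # image i belongs with caption i
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
print(logits.shape, loss.item())
```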

Important milestones

While most of these new multimodal systems are still experimental, breakthroughs have already been made in practical applications.

  • The open source community has combined CLIP with generative adversarial networks (GANs) to produce compelling digital artwork. Artist Martin O'Leary used Samuel Taylor Coleridge's poem "Kubla Khan" as input to generate the psychedelic piece "Sinuous Rills".
  • Facebook says its multimodal hate speech detector flags and removes 97 percent of abusive and harmful content on the social network. The system classifies memes as "benign" or "harmful" based on 10 data types, including text, images, and video.
  • Google says it has added multimodal (and multilingual) features to its search engine. Its Multitask Unified Model returns links to text, audio, images, and video in response to queries submitted in any of 75 languages.

Behind the news

This year's multimodal development stems from decades of solid research.

Back in 1989, researchers at Johns Hopkins University and the University of California, San Diego developed a system that recognized vowels from audio and visual recordings of human speech.

Over the next two decades, more research groups attempted multimodal applications such as indexing digital video libraries and classifying human emotions from audio and visual data.

Development status

Image data and text data are each so complex that for a long time researchers could only focus on one of them at a time. Along the way, they developed very different techniques for each.

But over the past decade, computer vision and natural language processing have converged on increasingly capable neural networks, which finally makes merging them possible, and audio is gaining a foothold as well.

Trillion-parameter models

Over the past year, models went from large to even larger.

Google kicked off 2021 with Switch Transformer, the first model to cross the trillion-parameter mark, weighing in at 1.6 trillion parameters.

The Beijing Academy of Artificial Intelligence (BAAI) responded with Wu Dao 2.0 and its 1.75 trillion parameters.

Simply pumping up parameter counts is nothing special in itself. But as processing power and data sources have grown, deep learning has truly established "bigger is better" as a guiding principle.

Deep-pocketed AI vendors are piling on parameters at a feverish pace, both to improve performance and to flex their muscles. For language models in particular, the Internet provides vast amounts of unlabeled data for unsupervised and semi-supervised pre-training.

Since 2018, this parameter arms race has run from BERT (110 million) through GPT-2 (1.5 billion), Megatron-LM (8.3 billion), Turing-NLG (17 billion), and GPT-3 (175 billion), finally crossing the trillion mark.

It's fine, but...

The scaling route also brings new challenges. Ever-larger models confront developers with four tough hurdles.

  • Data: Large models need to absorb enormous amounts of data, but traditional sources such as the web and digital libraries often can't provide that much high-quality material. For example, BookCorpus, a dataset of 11,000 e-books commonly used by researchers, has been used to train more than 30 large language models; yet it carries a religious bias, since its contents primarily discuss Christian and Islamic teachings and have little to say about other religions.

The AI community realizes that data quality will directly determine model quality, but has been unable to reach a consensus on effective ways to compile large-scale, high-quality datasets.

  • Speed: Today's hardware still struggles with large-scale models, and as data repeatedly moves in and out of memory, training and inference speed suffer badly.

To reduce latency, the Google team behind Switch Transformer developed a way to have each token processed by only a subset of each layer of the model. Their best model produced predictions 66 percent faster than a conventional model with one-thirtieth as many parameters.
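
As an illustration only (not Google's actual Switch Transformer code), the sketch below shows top-1 routing: each token is sent to a single small feed-forward "expert", so only a fraction of the layer's parameters are touched per token. All names and sizes here are invented for the example.

```python
import torch
import torch.nn as nn

class Top1RoutedLayer(nn.Module):
    """Toy switch-style layer: each token activates exactly one expert."""

    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores experts per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.ReLU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, tokens):                        # tokens: (n_tokens, d_model)
        gate = self.router(tokens).softmax(dim=-1)    # (n_tokens, n_experts)
        weight, choice = gate.max(dim=-1)             # top-1 expert per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = choice == i                        # tokens routed to expert i
            if mask.any():
                # Only this subset of tokens touches expert i's parameters.
                out[mask] = weight[mask, None] * expert(tokens[mask])
        return out

layer = Top1RoutedLayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```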

In addition, Microsoft's DeepSpeed library takes the route of processing data, layers, and layer groups in parallel, and reduces processing redundancy by dividing tasks between CPUs and GPUs.

  • Energy consumption: Training such large networks consumes a great deal of electricity. A 2019 study found that training a 200-million-parameter transformer model on eight NVIDIA P100 GPUs produced carbon emissions (from fossil-fuel power generation) roughly equal to those of an average car over five years of driving.

Of course, next-generation AI accelerator chips such as Cerebras' WSE-2 and Google's latest TPU should reduce emissions, and the supply of wind, solar, and other clean energy is growing in tandem. The environmental harm caused by AI research should therefore keep shrinking.

  • Model delivery: These massive models are difficult to run on consumer-grade or edge devices, so deployment at true scale requires either Internet access to a hosted model or a slimmed-down version, and both options are currently problematic.

The leaderboards in natural language modeling are still dominated by models at the hundred-billion-parameter scale; trillion-parameter models remain, for now, too difficult to handle.

But it's safe to say that more members will join the trillion-parameter club in the coming years, and the trend will continue. There are rumors that the GPT-3 successor OpenAI is planning will contain an even more staggering number of parameters.

AI-generated audio content is becoming "mainstream"

Musicians and filmmakers have become accustomed to using AI-enabled audio production tools.

Professional media producers use neural networks to generate new sounds and modify old ones. The voice actors were naturally very unhappy about this.

Generative models learn features from existing recordings to create convincing replicas. There are also producers who use the technology directly to create original sounds or imitate existing sounds.

  • American startup Modulate uses a generative adversarial network to synthesize new voices for users in real time, letting gamers and chat users create their own virtual personas; some transgender people use it to adjust their voices to match their gender identity.
  • Sonantic is a startup specializing in voice synthesis. Actor Val Kilmer lost most of his voice to throat surgery in 2015, and the company used his original recordings to create a synthetic voice specifically for him.
  • Filmmaker Morgan Neville hired a software company to recreate the voice of the late travel-show host Anthony Bourdain for his documentary Roadrunner: A Film About Anthony Bourdain. The move drew the ire of Bourdain's widow, who said she had not given permission.

There is more to this controversy.

Voice actors also worry that the technology will threaten their livelihoods. Fans of the 2015 game of the year, The Witcher 3: Wild Hunt, have even used the technique to recreate the original voice actors' performances in a fan-made mod.

The recent trend towards the mainstreaming of audio generation is a natural continuation of early research.

  • OpenAI's Jukebox, trained on 1.2 million songs, uses an autoencoder, transformer, and decoder pipeline to generate complete recordings in styles ranging from Elvis Presley to Eminem.
  • In 2019, an anonymous AI developer devised a technology that allows users to reproduce the voices of animated and video game characters speaking typed lines of text in as little as 15 seconds.

Generated audio and video not only give media producers new ways to repair and enhance archival footage, but also let them create, from scratch, new material that is hard to distinguish from the real thing.

But the ethical and legal problems are mounting too. If voice actors are replaced outright by AI, who bears their losses? What ownership disputes arise when a deceased person's voice is reproduced in a commercial work? Can AI be used to release a new album by a late artist, and would that even be the right thing to do?

An architecture that governs everything

The Transformer architecture is rapidly expanding its reach.

The Transformer architecture was originally developed for natural language processing but has become a "panacea" for deep learning. In 2021, people used it to discover drugs, recognize speech and images, and more.

Transformers have demonstrated their performance in vision tasks, earthquake prediction, protein classification and synthesis, and more.

Over the past year, researchers have begun to push it into new and broader areas.

  • TransGAN is a generative adversarial network that incorporates transformers to ensure that each generated pixel is consistent with those generated before it. The work achieved strong results on measures of similarity between generated images and the original training data.
  • Facebook's TimeSformer uses the architecture to recognize actions in video clips. Instead of recognizing sequences of words in text, it interprets the sequence of frames in a video. It outperforms convolutional neural networks, analyzing longer clips in less time and thus at lower energy cost.
  • Researchers at Facebook, Google, and the University of California, Berkeley, trained GPT-2 on text and then froze its self-attention and feed-forward layers. Based on this, they can fine-tune models for different use cases, including math, logic problems, and computer vision.
  • DeepMind released an open source version of AlphaFold 2, which uses transformers to predict the 3D structure of proteins from their amino acid sequences. The model caused great excitement in the medical community, which widely believes it has the potential to advance drug discovery and reveal biological principles.

Transformers debuted in 2017 and quickly changed how language processing models were designed. The self-attention mechanism tracks how each element in a sequence relates to every other element, and it can analyze not only sequences of words but also sequences of pixels, video frames, amino acids, seismic waves, and more.
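
As a reminder of what that mechanism actually computes, here is a minimal, generic sketch of scaled dot-product self-attention in plain NumPy. It is not any particular model's code; the sequence could be word embeddings, image patches, or amino-acid features, and the dimensions are arbitrary.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence.

    x: (seq_len, d_model) sequence of element embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    Each output position is a weighted mix of every position's value,
    with weights reflecting how strongly the elements relate to each other.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])           # pairwise relations
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ v

# Toy sequence of 5 elements with 16-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 16))
w = [rng.normal(size=(16, 8)) for _ in range(3)]
print(self_attention(x, *w).shape)  # (5, 8)
```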

Large language models based on transformers have established a new standard practice: pre-train the model on a large unlabeled corpus, then fine-tune it for a specific task with a limited number of labeled examples.
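
A minimal sketch of that paradigm, echoing the frozen-layer experiments mentioned above: pre-trained weights are frozen and only a small task head is tuned on a handful of labeled examples. The backbone here is a freshly built placeholder; in practice it would be a large transformer loaded from a checkpoint.

```python
import torch
import torch.nn as nn

# Placeholder for a pre-trained transformer backbone (assumption: in practice
# this would be loaded from a checkpoint rather than built from scratch).
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2,
)

# Freeze the pre-trained self-attention / feed-forward weights...
for p in backbone.parameters():
    p.requires_grad = False

# ...and fine-tune only a small task head on a handful of labeled examples.
head = nn.Linear(64, 3)                        # e.g. a 3-class downstream task
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

labeled_x = torch.randn(8, 10, 64)             # 8 labeled sequences of length 10
labeled_y = torch.randint(0, 3, (8,))

for _ in range(5):                             # brief fine-tuning loop
    features = backbone(labeled_x).mean(dim=1) # pooled sequence representation
    loss = nn.functional.cross_entropy(head(features), labeled_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())
```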

The growing ubiquity of the Transformer architecture suggests that in the future we may build AI models that can solve many problems across many fields.

In the development of deep learning, several ideas have caught on rapidly: the ReLU activation function, the Adam optimizer, the attention mechanism, and now the transformer.

Developments over the past year have proven that this architecture does have a strong vitality.

Governments have enacted laws related to artificial intelligence

Governments are writing new laws and proposals to control the impact of AI automation on modern society.

Given AI's potential impact on privacy, fairness, security, and international competition, governments are stepping up their regulation of it.

AI-related laws tend to reflect countries' value judgments in the political order, including how to strike a balance between social equity and individual freedom.

  • The European Union drafted regulations that prohibit or restrict machine learning applications according to risk categories. Real-time facial recognition and social credit systems would be explicitly banned; applications such as control of critical infrastructure, law enforcement assistance, and biometrics would require detailed documentation proving that the AI solution is safe, reliable, and subject to continuous human oversight.

The draft rule, which was released in April this year, is still in the legislative process and is not expected to be implemented in the next 12 months.

  • Starting next year, China's Internet regulator will oversee AI systems and recommendation algorithms that could undermine social order and public morals. Targets include systems that spread disinformation, induce addictive behavior, or jeopardize national security. Businesses must obtain approval before deploying any algorithm that could sway public sentiment, and offending algorithms will not be allowed to go live.
  • The U.S. government has proposed an AI Bill of Rights to protect citizens from systems that may violate their privacy and civil rights. The government will gather public comments on the proposal until January 15 next year. Below the federal level, several state and city governments have begun restricting facial recognition systems, and New York City passed a law requiring bias audits of hiring algorithms.
  • The UN High Commissioner for Human Rights is calling on member states to suspend certain uses of AI, including those that may violate human rights, restrict access to basic services, or misuse private data.

The AI community is gradually moving towards a regulatory consensus.

A recent survey of 534 machine learning researchers found that 68 percent of respondents believe that trustworthiness and reliability should be valued more highly in model deployment. Respondents also generally trusted international institutions such as the European Union and the United Nations more than national governments.

Outside of China, most AI-related regulations are still under review. But judging from the current proposals, AI practitioners should prepare for the inevitable prospect of full government involvement.

Original link:

https://read.deeplearning.ai/the-batch/issue-123/
