Jeff Dean posted a review: The Golden Decade of Deep Learning

Reporting by XinZhiyuan

Editor: LRS

Jeff Dean recently published an article reviewing the golden decade of deep learning's rapid development. Centered on advances in software and hardware, it also points out three promising future research directions: sparsely activated models, AutoML, and multi-task training.

Over the past decade, the resurgence of neural networks through deep learning has brought unprecedented progress to artificial intelligence research: computers can now see, hear, and understand the world, and advances in AI algorithms have driven major progress in other scientific fields as well.

Jeff Dean, the head of Google AI, recently wrote an article examining the reasons for this rapid progress, including the advent of hardware specifically designed to accelerate machine learning and the rise of open source software frameworks that make machine learning more efficient and let countless non-specialists build applications on AI models.

The article also provides an extensive overview of the application areas of machine learning over the past decade and discusses some of the possible future directions for AI.

The 17-page article was published in the AI & Society special issue of Dædalus, the journal of the American Academy of Arts and Sciences, and was written by Jeff Dean alone.

Article link: https://www.amacad.org/publication/golden-decade-deep-learning-computing-systems-applications

The golden decade of deep learning

Humans have always dreamed of building a "thinking" machine.

In 1956, at a workshop organized at Dartmouth College, John McCarthy proposed the concept of "artificial intelligence," and a group of mathematicians and scientists came together to find out how machines could use language, form abstractions and concepts, solve the kinds of problems then reserved for humans, and improve themselves.

Participants in the workshop were optimistic that a few months of focused effort would yield practical progress on these problems.

That turned out to be far too optimistic: the problems could not be settled in a few months.

Over the next 50 years, a variety of approaches to building AI systems emerged, including logic-based systems, rule-based expert systems, and neural networks. It turned out that hand-encoding the world's knowledge as logical rules and manipulating those rules was ineffective.

Jeff Dean recalls that as an undergraduate in 1990 he was fascinated by neural networks, felt they were the right abstraction for creating intelligent machines, and was convinced that all that was needed was more computing power, so that larger neural networks could tackle larger, more interesting problems.

His undergraduate thesis was therefore on parallel training of neural networks: he believed that if a neural network could be trained across 64 processors, it could solve more realistic tasks.

But it turned out, once again, that relative to the computers of 1990, roughly a million times more computing power was needed before neural networks could make real progress.

It was not until around 2011 that AI entered a critical phase of development; over the decade through 2021, the field moved a large step closer to the goals set at that 1956 meeting.

Advances in hardware and software

On the hardware side, unlike general-purpose computer programs, deep learning algorithms are built by composing a small number of linear algebra operations in different ways: matrix multiplications, vector dot products, and similar primitives. Because the set of operations is so narrow, it is possible to build computers or accelerator chips specifically designed to support exactly these computations.
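To make this concrete, here is a minimal sketch in plain NumPy (illustrative only, not from the paper) showing how a "layer" of a neural network reduces to a matrix multiply, a vector add, and an elementwise nonlinearity, and how a whole network is just a composition of these few primitives:

```python
import numpy as np

def dense_layer(x, W, b):
    # One "layer" is just a matrix multiply, a vector add,
    # and an elementwise nonlinearity (ReLU here).
    return np.maximum(x @ W + b, 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 128))           # a batch of 32 input vectors
W1, b1 = rng.normal(size=(128, 256)), np.zeros(256)
W2, b2 = rng.normal(size=(256, 10)), np.zeros(10)

# A whole network is a composition of the same few operations.
hidden = dense_layer(x, W1, b1)
logits = hidden @ W2 + b2
print(logits.shape)  # (32, 10)
```

Because nearly all of the work is matrix multiplication, hardware that accelerates that single primitive accelerates the entire model.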

In the early 2000s, some researchers began investigating how to use graphics processing units (GPUs) to implement deep learning algorithms. Although GPUs were originally designed for rendering graphics, researchers found that they are also well suited to deep learning algorithms because they offer much higher floating-point throughput than CPUs.

In 2004, computer scientists Kyoung-Su Oh and Keechul Jung demonstrated a nearly 20-fold speedup of a neural network algorithm using a GPU.

In 2009, computer scientist Rajat Raina and colleagues demonstrated that for some unsupervised learning algorithms, a GPU implementation was up to 72.6 times faster than the best CPU-based implementation.

Later, modules and chips dedicated to accelerating AI were developed one after another. Google's first TPU, for example, targeted reduced-precision 8-bit integer computation and was dedicated to deep learning inference, improving speed and performance per watt by one to two orders of magnitude. Later TPU systems consist of larger chips connected to one another over high-speed custom networks to form pods, in effect large supercomputers.
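As a rough illustration of the reduced-precision idea behind such inference accelerators (a toy sketch, not the TPU's actual quantization scheme; the single per-tensor scale factor here is an assumption for simplicity), weights and activations can be rescaled to 8-bit integers, multiplied in integer arithmetic, and rescaled back:

```python
import numpy as np

def quantize_int8(x):
    # Map a float tensor onto the int8 range with one scale factor.
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 64)).astype(np.float32)    # activation
W = rng.normal(size=(64, 10)).astype(np.float32)   # trained weights

qx, sx = quantize_int8(x)
qW, sW = quantize_int8(W)

# Integer matmul (accumulated in int32), then rescaled back to float.
y_int8 = (qx.astype(np.int32) @ qW.astype(np.int32)) * (sx * sW)
y_fp32 = x @ W
print(np.max(np.abs(y_int8 - y_fp32)))  # small quantization error
```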

As deep learning methods began to show dramatic improvements in image recognition, speech recognition, and language understanding, and as compute-intensive models (trained on ever larger datasets) kept improving results, the field of machine learning really "took off."

With hardware in place, computer systems designers began to develop software frameworks that extended deep learning models into more computationally intensive, complex areas.

One early approach was to use large-scale distributed systems to train a single deep learning model. Researchers at Google developed the DistBelief framework to train a single neural network on a large distributed system, at a scale two orders of magnitude larger than previous neural networks. Trained on a large number of random frames from YouTube videos, and given a large enough network, enough computation, and enough training data, it showed that individual artificial neurons in the model (the building blocks of neural networks) learn to recognize high-level concepts such as human faces or cats, despite never receiving any information about these concepts other than the raw pixels of the images.
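Here is a minimal sketch of the data-parallel idea behind such distributed training (plain NumPy, with a simulated parameter-server loop; an illustration of the general technique, not the DistBelief implementation): each worker computes gradients on its own shard of the data, and the averaged gradient drives a single shared parameter update.

```python
import numpy as np

def loss_and_grad(w, X, y):
    # Simple linear-regression loss on one worker's shard of data.
    err = X @ w - y
    return float(np.mean(err ** 2)), 2.0 * X.T @ err / len(y)

rng = np.random.default_rng(0)
w_true = rng.normal(size=8)
w = np.zeros(8)

# Split a dataset across 4 simulated workers (data parallelism).
shards = []
for _ in range(4):
    X = rng.normal(size=(256, 8))
    shards.append((X, X @ w_true + 0.01 * rng.normal(size=256)))

for step in range(200):
    # Each worker computes a gradient on its shard; the "parameter
    # server" averages them and applies one shared update.
    grads = [loss_and_grad(w, X, y)[1] for X, y in shards]
    w -= 0.01 * np.mean(grads, axis=0)

print(np.max(np.abs(w - w_true)))  # small after training
```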

TensorFlow, developed at Google and open-sourced in 2015, can express machine learning computations and incorporates ideas from earlier frameworks such as Theano and DistBelief. TensorFlow has been downloaded more than 50 million times to date and is one of the most popular open source packages in the world.
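For a sense of what "expressing a machine learning computation" looks like in practice, here is a minimal TensorFlow/Keras program (an illustrative example, not taken from the paper) that defines and trains a tiny classifier on synthetic data:

```python
import numpy as np
import tensorflow as tf

# Toy data: classify points by the sign of their first coordinate.
x = np.random.randn(1024, 4).astype("float32")
y = (x[:, 0] > 0).astype("int32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x, y, epochs=3, batch_size=64, verbose=0)
print(model.evaluate(x, y, verbose=0))
```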

PyTorch, released in 2016, a year after TensorFlow, became popular with researchers for making it easy to express a wide variety of research ideas in Python.

JAX, released in 2018, is a popular Python-oriented open source library that combines sophisticated automatic differentiation with the underlying XLA compiler, which TensorFlow also uses to efficiently map machine learning computations onto many different types of hardware.
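A small, illustrative example of the two features highlighted here, automatic differentiation via jax.grad and XLA compilation via jax.jit (the toy linear-regression task is an assumption for the sake of the example):

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Mean squared error of a linear model.
    return jnp.mean((x @ w - y) ** 2)

# grad() derives the gradient function automatically;
# jit() compiles it with XLA.
grad_fn = jax.jit(jax.grad(loss))

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (128, 8))
w_true = jnp.arange(8, dtype=jnp.float32)
y = x @ w_true

w = jnp.zeros(8)
for _ in range(500):
    w = w - 0.1 * grad_fn(w, x, y)
print(jnp.max(jnp.abs(w - w_true)))  # near zero
```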

The importance of open source machine learning libraries and tools such as TensorFlow and PyTorch is hard to overstate: they let researchers try out ideas quickly.

As researchers and engineers around the world make it easier to build on each other's work, progress across the field will accelerate!

The future of machine learning

In his paper, Jeff Dean points to several emerging directions in the machine learning research community that, if combined, could yield valuable results.

1. Sparsely activated models, such as the sparsely-gated mixture-of-experts model, show how to build very large-capacity models in which only a subset of the model is activated for any given example: for instance, a model with 2,048 experts of which only 2 or 3 are activated (see the sketch after this list).

2. Automated machine learning (AutoML), where techniques such as neural architecture search (NAS) or evolutionary architecture search can automatically learn efficient structures or other aspects of an ML model or its components to optimize accuracy on a given task. AutoML often involves running many automated experiments, each of which may involve an enormous amount of computation.

3. Multi-task training at modest scales, in which a few to a few dozen related tasks are trained simultaneously, or in which a model trained on a large amount of data for related tasks is fine-tuned on a small amount of data for a new task. These approaches have proven very effective on a wide range of problems.
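As a toy sketch of the sparse-gating idea in point 1 (plain NumPy, with made-up sizes; not the actual sparsely-gated mixture-of-experts implementation), a router scores all experts for a given input but only the top-k experts are evaluated, so most of the model's capacity stays inactive for any single example:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 32, 16, 2            # tiny stand-in for 2,048 experts, top-2

W_router = rng.normal(size=(d, n_experts))
experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # expert weights

def moe_layer(x):
    # The router scores every expert, but only the top-k are actually run.
    scores = x @ W_router
    top = np.argsort(scores)[-k:]
    gates = np.exp(scores[top]) / np.sum(np.exp(scores[top]))  # softmax over top-k
    # Sparse activation: k matrix multiplies instead of n_experts.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

x = rng.normal(size=d)
print(moe_layer(x).shape)  # (32,)
```

With 2,048 experts and only 2 or 3 active per example, the model's total capacity can be enormous while the compute per example stays modest.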

A very interesting research direction is to combine these three trends into a system that runs on large-scale ML accelerator hardware, with the goal of training a single model that can perform thousands or even millions of tasks. Such a model might consist of many components with different structures, with the data flow between components determined dynamically on an example-by-example basis. The model might use techniques such as sparsely-gated mixture of experts and learned routing to produce a model with very large capacity, yet any single task or example would activate only a small fraction of the total components in this sparsely activated system.

Each component might itself run an AutoML-like architecture search to adapt its structure to the kind of data routed to it, and new tasks could take advantage of components trained on other tasks whenever that is useful. Jeff Dean hopes that through very large-scale multi-task learning, shared components, and learned routing, the model can learn new tasks quickly and to high accuracy, even when each new task has relatively few examples, because the model can draw on the expertise and internal representations it has already acquired in solving related tasks.

Building a single machine learning system that can handle millions of tasks and learn to take on new tasks automatically is a genuine grand challenge for artificial intelligence and computer systems engineering. It will require expertise in many areas, including machine learning algorithms, responsible AI (such as fairness and interpretability), distributed systems, and computer architecture, in order to advance the field by building a system that can generalize to independently solve new tasks across the full range of machine learning applications.

The decade since 2010 has indeed been a golden decade for deep learning research. The problems posed at the 1956 Dartmouth workshop have been tackled one after another, to the point where machines can, in effect, see, hear, and understand the world. With AI, humans will keep creating deep learning models that are more complex, more powerful, and more helpful in everyday life; thanks to the creative power of deep learning, the future is full of possibilities.

Resources:

https://www.amacad.org/publication/golden-decade-deep-learning-computing-systems-applications
