
Yann LeCun, the father of convolutional neural networks: Discovering the principles of intelligence is the ultimate problem of AI

In The Road to Science: Man, Machine and the Future, Turing Award winner Yann LeCun, the father of convolutional neural networks, argues that in the history of science, technological inventions often precede the theories and sciences that explain how they work. Discovering the underlying mechanisms and principles of intelligence is his research plan for the coming decades.

At the book launch event, Huang Tiejun, dean of the Beijing Academy of Artificial Intelligence (BAAI), Liu Zhiyuan, associate professor in the Department of Computer Science and Technology at Tsinghua University, and Yuan Lanfeng, a well-known science communicator, held a wide-ranging discussion with LeCun on whether AI is technology or science, and on the biological inspiration behind artificial intelligence. The Zhiyuan (BAAI) community excerpted and edited the core content of the conversation without changing its original meaning.

Guests | Yann LeCun (Professor, New York University), Huang Tiejun (Professor, School of Computer Science, Peking University), Liu Zhiyuan (Associate Professor, Department of Computer Science and Technology, Tsinghua University), Yuan Lanfeng (Guest Host, Associate Researcher, University of Science and Technology of China)

Edited by | Li Mengjia, Zhou Zhiyi

01

The Birth of Neural Networks: The Past and Present of the Backpropagation Algorithm

(Yann LeCun first introduced how neural networks and the backpropagation algorithm came into being.)

Yann LeCun: Most people know me from convolutional neural networks. This model organizes the connections between neurons in a special way, arranging the neurons into a multi-layered structure inspired by the visual cortex of mammals. This kind of architecture is well suited to applications such as image recognition and even medical image analysis.

For example, there are now cameras mounted behind the windshields of vehicles that can identify obstacles in front of the car and brake automatically in time to avoid collisions. Today's large-screen TVs can reconstruct high-resolution video from low-resolution signals. The same technology is used in automated tumor detection systems in medicine, applied to X-rays, MRI, and other medical imaging techniques. All of these derive from convolutional neural networks. The technology I helped invent is ubiquitous and even saves lives, which is something I am proud of.

In fact, the original idea behind neural networks of this kind came from the classic work of Hubel and Wiesel in neuroscience in the 1960s. In the 1970s and 1980s, the Japanese scientist Kunihiko Fukushima built the first convolution-style neural network (the Neocognitron) based on unsupervised learning, but there was no backpropagation algorithm at the time, so the training results were limited.

My colleagues and I were the first to make convolutional neural networks work in practice (LeCun proposed LeNet at Bell Labs in 1988-1989). At that time there were no tools like PyTorch or TensorFlow, so I had to write my own deep learning environment; there was no Python, so I had to write my own language for interaction. There were no Linux or Windows operating systems then either, and expensive platforms were needed to process images. So when we put in the effort to build the tools and carefully designed the architecture to make it work, we opened up a new technological breakthrough. In my view, this was not a breakthrough in knowledge, because from the standpoint of knowledge the underlying principles already existed.


Yann LeCun proposed LeNet while at Bell Labs

The success of convolutional neural networks is inseparable from the backpropagation algorithm. In 1986 I wrote a paper on it in French and published it, but nobody paid attention; if I had written it in English it might have been seen by more people. I tell that story in the book. At the time I had independently worked out a prototype of the backpropagation algorithm, not knowing that Hinton had arrived at a similar idea. In the 1960s, even though people knew that using multiple layers could make neural networks more effective, they could not design a suitable optimization algorithm. Those models used binary neurons, and the discontinuity of the activation function means the derivative does not exist in some regions, so the models could not use a chain-rule-based backpropagation algorithm to optimize their parameters.

The backpropagation algorithm is the basic optimization algorithm of deep learning. Its design is related to optimal control theory, developed within modern control theory in the 1960s, so its basic idea is quite old. Apart from the chain rule, it does not require any complicated mathematics. But the idea of applying the chain rule to a multilayer structure such as a neural network did not take root until the 1980s. Over the following decade, people lost interest in backpropagation again because its theory was poorly understood, believing that such algorithms had no future, even though the opposite was true. From the early 2000s to the 2010s, it took me, Geoffrey Hinton, Andrew Ng, and others a lot of hard work to convince the community that it worked: it was not a fluke, it was not an accident.

Yuan Lanfeng: In The Road to Science, you mention that you invented a model called HLM (Hierarchical Linear Model), which, although very simple in structure, was only one step away from deep learning because it used non-continuous functions. If you had used a sigmoid or some other continuous function instead, could the model have succeeded?

Yann LeCun: Yes. The reason I insisted on binary neurons at the time was that computers were not as fast as they are today, so I thought that using binary neurons would greatly reduce computation time. To make backpropagation work, my idea was that each neuron could backpropagate a target output rather than an intermediate variable. However, the results showed that in order to update model parameters effectively, continuous variables are still needed. So, in the circumstances of the time, HLM really was only one step away from the right approach.
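(Editor's note: to make the point above concrete, here is a minimal sketch of backpropagation in Python with NumPy. It is not LeCun's original code; the network sizes, data, and learning rate are made up for illustration. The key idea from the conversation is that a smooth activation such as the sigmoid has a usable derivative everywhere, so the chain rule can carry the error signal back through the layers, whereas a binary step activation has zero derivative almost everywhere and blocks the gradient.)

# Minimal backpropagation sketch (illustrative only, not LeCun's original code).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # continuous, differentiable everywhere

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)                  # derivative used by the chain rule

# A binary step neuron would be: step(z) = (z > 0).astype(float)
# Its derivative is 0 almost everywhere, so no gradient could flow back through it.

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))               # 4 toy samples, 3 input features
y = rng.normal(size=(4, 1))               # toy regression targets
W1 = rng.normal(size=(3, 5))
W2 = rng.normal(size=(5, 1))

for step in range(100):
    # forward pass
    z1 = x @ W1
    h = sigmoid(z1)
    y_hat = h @ W2
    loss = np.mean((y_hat - y) ** 2)

    # backward pass: the chain rule applied layer by layer
    d_yhat = 2 * (y_hat - y) / y.size
    dW2 = h.T @ d_yhat
    dh = d_yhat @ W2.T
    dz1 = dh * sigmoid_grad(z1)            # would be all zeros with a step activation
    dW1 = x.T @ dz1

    # gradient descent update
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

print(f"final loss: {loss:.4f}")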

02

AI: Technology or Science?

Huang Tiejun: I think AI is first and foremost technology, not science. What AI researchers need to do is build and design powerful intelligent systems. If a system works well, we then try to explore why it works well, and that is science. So my view is that AI is technology first, and the principles come afterwards; that is the basic point I make in the preface to the Chinese translation of your book. I'd like to discuss that with you.

Yann LeCun: In my opinion, the primary attribute of AI is innovation, that is, conceiving and designing a new product, a new system, a new idea, which is indeed a creative act. That is engineering work, a bit like the work of an artist. What scientists do is come up with new concepts to describe the world and then use the scientific method to study the principles that explain how systems work. Both of these are aspects of AI: studying AI is both a technical and a scientific problem.

The ultimate problem is that we are trying to figure out what intelligence is. We need not only to build AI systems for vision and natural language understanding, but also to understand the nature of intelligence itself. Take the steam engine: new inventions drive theoretical research forward. Thermodynamics was born more than a hundred years after the steam engine was invented, and thermodynamics is essentially a foundation of all natural science. So the artifacts we invent in AI could one day give rise to a science of intelligence itself; at least, that is our vision.

Huang Tiejun: You share some examples in the book: the Wright brothers in 1903, and even earlier Clément Ader, who invented the airplane. More than thirty years later, Theodore von Kármán worked out the theory of aerodynamics. In this case, the invention of the airplane is at least as important as aerodynamics. Likewise for artificial intelligence: deep learning works very well; it is an invention, a contribution, a very powerful kind of artificial intelligence system. Of course, we need to explore why deep learning is so effective, but that understanding could be many years away, perhaps twenty or thirty years or even more. As dean of BAAI, I think someone needs to explore the principles behind AI systems, while at the same time more people may be needed to design more powerful systems.

03

About biologically inspired intelligence

Yann LeCun: I understand that BAAI also studies the structure of the human brain and tries to understand how it works. In terms of power consumption, the brain is far more efficient than today's computers. For a computer to reach the computing power of the human brain, its energy consumption might be a million times that of the brain, and even then it would not mean the computer could replicate what the brain does.

The brain consumes only about 25 watts, comparable to an ordinary GPU. How exactly does the human brain do it? Biology gives me a lot of inspiration, just as convolutional neural networks were inspired by the architecture of the visual cortex. But, as I note in The Road to Science, if scientists rely too much on biology and try to replicate the minutiae of biological phenomena without understanding the underlying principles, it will be difficult to build accurate and effective systems.

As an example, Clément Ader, a pioneer of French aviation in the late 19th century, was a brilliant engineer who built an aircraft that actually took off under its own power in the 1890s, more than a decade before the Wright brothers. But his plane was shaped like a bird and lacked any means of control. After taking off, at an altitude of about 15 centimeters, it flew some 15 meters and crashed. The reason is that he only imitated biology without really understanding the principles.


Ader's aircraft was full of imagination, and he was a genius at engine design, but without the theoretical support of aerodynamics his designs did not get far. So it is an interesting lesson for people trying to draw inspiration from biology: we also need to understand what the underlying principles are. Many details in biology simply don't matter.

Huang Tiejun: I agree with your point about biological or brain inspiration, but with one small difference. Brain scientists have been exploring the principles of the brain for at least a hundred years, whereas in AI a new design can appear every ten or twenty years. For me, biological inspiration means things like the structure of the visual cortex inspiring us to design new ANN architectures, or drawing on the principles of neurons and synapses, not necessarily a complete theory of the whole brain. We design artificial neural networks based on the knowledge available, and that is what we do at BAAI.

Yann LeCun: The question is really whether to use spikes. Today's artificial neural networks basically encode the output of a neuron as a number that represents its activity. But neurons in the brain do not output continuous values; they emit pulses, or spikes, and the strength of the output is conveyed by the spike frequency. So one question is whether it is important to use spike signals the way the brain does, rather than just numbers, as artificial neural networks currently do.


Many people have this question, and some argue that we should use spikes for hardware-design reasons: in terms of energy consumption, spike-based signaling is more economical, even if slower.

Take spiking neural networks, for example: some people treat them as if they were magic. They don't understand the principle; they introduce spike signals into neural networks simply because the brain uses spikes, and I don't agree with that. Moreover, today's best-performing neural networks do not use spike signals at all. That is my answer to whether we should draw more inspiration from neurobiology.
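(Editor's note: a minimal sketch, in Python with NumPy, of the two encodings discussed above. It is purely illustrative, not a working spiking network: a standard artificial neuron passes its activity as a single number, while a spiking scheme would transmit a train of binary pulses whose average frequency carries the same value.)

# Rate coding vs. spike coding: an illustrative toy, not a real SNN.
import numpy as np

rng = np.random.default_rng(0)

rate = 0.3                  # a neuron's activity as used in standard ANNs: just a number

# A spiking network would instead send binary pulses over time; here we sample a
# Poisson-like spike train whose average frequency encodes the same value.
timesteps = 1000
spike_train = (rng.random(timesteps) < rate).astype(int)

print("rate-coded value:      ", rate)
print("first spikes sent:     ", spike_train[:20])
print("recovered average rate:", spike_train.mean())   # close to 0.3 for enough timesteps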

Huang Tiejun: On this point, I would like to share something I am proud of. One of my PhD students recently published a paper on the principles of retinal coding. She designed a CNN-based neural network to simulate the activity of the human eye; it is one of the best models to date. My own team is also designing an "ultra-high-speed pulse camera."

Yann LeCun: I know; this was a hot topic in the late 1980s and early 1990s: spike-based analog circuits that reproduce the function of neurons, using a spiking neural network to encode data. The topic was in the doldrums for a while, but it is now coming back because of interest in low-power hardware for AI and neural networks. As for its advantages over conventional approaches, I think it will take years of research to prove, and a great deal of work is still needed in this area.

Interestingly, in all vertebrates the retina is an extension of the brain. The eye collects light, and the retina must compress the information that passes through it. The optic nerve fibers that carry signals from the retina to the brain converge and exit the eyeball at a single point that has no photoreceptor cells: the physiological blind spot. If something falls on that spot in your visual field, you see nothing there; the brain receives information from the rest of the retina, but at that position you are effectively blind.

The retina has roughly 50 to 100 million photoreceptor cells, but the optic nerve that connects it to the brain has only about a million fibers, so a great deal of preprocessing and dynamic compression must be done in the retina before images can be transmitted.

This is an evolutionary accident, and it is true of all vertebrates. Invertebrates are different: octopus and squid have the nerves behind the retina, so they don't have this problem, which is a better design. In this respect invertebrates were luckier in evolution than vertebrates. So we can ask ourselves a question: if we want to reproduce a visual system with performance similar to a human's, do we have to solve the bottleneck of information transmission between the retina and the brain? Neuroscientists are building network models of how that information is processed. That is my view of evolution: biology is not absolutely correct either, and in this respect vertebrates were a bit unlucky.

Huang Tiejun: Yes, I completely agree. In fact, my student designed the model to simulate retinal function starting from exactly this biological point of view. At the same time, my own team designed a camera. As you said, the camera design does not worry about output bandwidth, so the output fiber transmits the action potentials to the computer at high speed. We are doing both at the same time.

04

How to view large-scale pre-trained NLP models

Liu Zhiyuan: In recent years we have witnessed machine learning move successfully from supervised learning to self-supervised learning. We can pre-train language models on large amounts of unlabeled data and then fine-tune them, and the number of parameters has grown to hundreds of billions. I'm curious what you think about these massive pre-trained language models.

Yann LeCun: The artificial intelligence community has undergone a major change in the past two or three years with the introduction of a new type of neural network architecture, the Transformer. In essence, this architecture works like a memory module: when a series of vectors is fed into the model, it can generate another associated vector, retrieving a related "memory" by querying.

So a Transformer is an architecture in which a large number of such associative memory modules are arranged in a specific way, and it can mine and store the information contained in the training data. When a string of text is fed into a pre-trained Transformer model, it can be used to predict the next word. These models have huge numbers of parameters, from billions up to a trillion, and the sources of data are extremely rich; the amount of data is staggering. This amounts to giving the model human knowledge encoded as text, so that it learns prior information about the human world, and because this prior knowledge is so rich, such models often perform stunningly well on natural language tasks.
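(Editor's note: the "memory module" picture can be shown in a few lines of code. Below is a minimal sketch, in Python with NumPy, of scaled dot-product attention, the core operation of the Transformer; the function names and dimensions are made up for illustration. A query vector is compared against stored key vectors, and the output is a weighted mixture of the associated value vectors: retrieval of a related "memory" by querying, as described above.)

# Scaled dot-product attention as a toy associative-memory lookup (illustrative only).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)    # compare the query with every stored key
    weights = softmax(scores)        # how strongly each memory slot is recalled
    return weights @ V               # weighted mixture of the stored values

rng = np.random.default_rng(0)
K = rng.normal(size=(6, 8))          # 6 stored "memories", key dimension 8
V = rng.normal(size=(6, 8))          # the content associated with each key
Q = rng.normal(size=(1, 8))          # one query vector

out = attention(Q, K, V)
print(out.shape)                     # (1, 8): a vector assembled from related memories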

The use of large-scale pre-trained models such as the Transformer for natural language processing marks the beginning of a revolution in deep learning driven by self-supervised learning. Unlike supervised learning, reinforcement learning, and other mechanisms, self-supervised training does not train a model to complete a specific task, but to build its ability to understand data. One such method removes 10%-15% of the words in a sentence and trains the system to predict the missing words. In the process, the system begins to build an understanding of the meaning of the text.

For example, if you give the model "the cat chases a ... in the kitchen" and ask it to predict the missing word, the answer should be a mouse or some other small animal, because that prediction fits the logic of the real world. If the input is "the lion chases a ... on the savannah," then the output should be a gazelle or some other herbivore. With such limited information the model may not be able to work out exactly what is being chased, but because cats, lions, kitchens, and savannahs serve as priors, it can predict a rough range. If you only give the model "... chases ... in ...," with no actor and no specific scene, it is very hard for the model to decide what should fill the blanks.
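(Editor's note: a minimal sketch of the training signal described here, in Python. This is hypothetical toy code, not the pipeline of any particular model; the mask_sentence helper and the masking ratio are made up for illustration. The idea is simply to hide some of the words and remember the originals as the targets the model must recover.)

# Toy masked-language-model data preparation (illustrative; the predictor itself is omitted).
import random

random.seed(0)

def mask_sentence(tokens, mask_ratio=0.15, mask_token="[MASK]"):
    """Hide roughly mask_ratio of the tokens and keep the originals as targets."""
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_ratio:
            masked.append(mask_token)
            targets[i] = tok              # the word the model must learn to recover
        else:
            masked.append(tok)
    return masked, targets

sentence = "the cat chases a mouse in the kitchen".split()
masked, targets = mask_sentence(sentence, mask_ratio=0.3)
print("input to the model:", " ".join(masked))
print("words to predict:  ", targets)
# A real system feeds `masked` through a Transformer and trains it so that its
# predictions at the masked positions match `targets`.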

So this kind of self-supervised learning based on large-scale pre-training injects prior knowledge of the human world into the system, so that when it processes language tasks it can make reasonable judgments based on the contextual information in the input. This self-supervised learning method represents a huge change in how natural language processing tasks are done.

05

About self-supervised learning

Liu Zhiyuan: You call pre-trained language models a revolution. Do you think pre-trained models, or self-supervised learning, are the way to reach the ultimate goal of AI? If so, how can we improve self-supervised models?

Yann LeCun: My answer is yes. I think one of the great opportunities for AI right now is to learn the way humans and animals do, and the best paradigm for that is self-supervised learning. Self-supervision will transform AI and allow it to make much greater progress. This way of learning lets us train models with only a small amount of labeled data: when the system is then required to complete a specific task, there is no need for enormous amounts of data; we only need to label a little, as in existing supervised training.

Within two months of birth, an infant learns basic things: the world is three-dimensional, objects sit in front of and behind one another, and everything around them lies at some relative distance. These are very simple concepts. After that, the child learns that objects continue to exist even when out of sight; this is called object permanence. By eight or nine months, the child has learned that unsupported objects fall, that gravity acts on every object. In the first nine months of life, children learn a great deal of basic knowledge about the environment and how the world works. They build cognitive models of the world in their brains that allow them to predict what is about to happen, understand the world, distinguish animate beings from inanimate objects, and figure out how to move objects and assemble parts.

In my opinion, this learning mechanism is very similar to self-supervised learning, but very different from the supervised learning and reinforcement learning we mostly use today. I think studying the learning principles of the brain and replicating those principles is more effective than directly reproducing the brain's function, because the brain is too complex. Turing said in the 1950s that if you want to create an intelligent machine, it is more sensible to copy a child's brain than an adult's, because the machine can then learn and develop on its own.


06

Motivation for writing The Road to Science

Yuan Lanfeng: On behalf of the public, I would like to ask a question: Why did you write this book?

Yann LeCun: The reason is simple: there is demand. People have seen their lives being changed by artificial intelligence and realize that even greater changes are coming, so it is important for the public to understand something about AI. The book is divided into three parts. The first part is historical: it explains the basic concepts and traces the development of neural networks and deep learning. The second part covers the underlying mathematics, algorithms, and computer science; readers do not need any prior background, and a high-school level is enough to follow it. The last part is about the applications of artificial intelligence in today's world, including machine translation, content moderation, and computing systems; it also looks at future trends and at what artificial intelligence research is pursuing. What I say in that part is my own subjective view, not the consensus of industry experts: What impacts might AI have on society? What are its potential applications? I give my views on the future there.

All in all, the first and third parts can be read by everyone, and if you want to understand the principles and get some inspiration, you can read the second part. I included the second part because, back in my own student days, I was fascinated by artificial intelligence, but the field was still in its infancy. What a beginner longs for is a book that explains the basic principles clearly and concisely, not one full of obscure concepts.

So another purpose of writing this book is to inspire young students to learn more about artificial intelligence, because this is an attractive and important field.

07

How to view artificial intelligence research in China

Yuan Lanfeng: Finally, do you have anything to say to Chinese readers?

Yann LeCun: In my view, young people in China are enthusiastic about artificial intelligence, and not only young people: the government also attaches great importance to investing in, researching, and deploying AI. Over the past few decades China's scientific community has been very active and has made remarkable achievements, and one of the most outstanding areas is artificial intelligence and deep learning; at the top computer vision conferences, half of the papers now come from China.

In addition, I am optimistic about the application of the technology. But at the same time we should recognize that artificial intelligence is a double-edged sword, and whether it does good or harm depends on how it is used. People in China, Europe, and the United States have different views on how AI should be used and accepted in society. Misused, AI can invade privacy. We need to pay attention to how to build legal systems, policies, and business regulations at the national level to protect the public from the negative effects of AI. Of course, that process will take time.

This article is reprinted with permission from the WeChat public account "Zhiyuan Community"; the original title was "Yann LeCun: Discovering the Principles of Intelligence Is the Ultimate Problem of AI | Exclusive Conversation".
