Professor Wu Si of Peking University: Visual recognition in the human brain has infinitely many solutions

author:Lei Feng network
Author | Twilight

Edit | Bush

On June 22, the Beijing Zhiyuan Conference held a special forum on the foundations of cognitive neuroscience. Professor Bi Yanchao of the State Key Laboratory of Cognitive Neuroscience and Learning at Beijing Normal University, Professor Fang Fang of the School of Psychology and Cognition at Peking University, Professor Liu Jia of the Department of Psychology at Beijing Normal University, Professor Wu Si of the Department of Computer Science at Peking University, and Professor Yu Shan of the Institute of Automation of the Chinese Academy of Sciences gave talks exploring what inspiration cognitive neuroscience can bring to AI.

The fourth speaker was Professor Wu Si of the Department of Computer Science at Peking University, and his speech was entitled "The Dialogue Between Biological Vision and Computer Vision".

In the talk, Professor Wu Si pointed out that the visual recognition mechanism of organisms differs greatly from the image recognition mechanism of deep neural networks. Biological visual recognition involves the interaction of a top-down pathway and a bottom-up pathway, whereas deep neural networks simulate only the latter. The top-down pathway underlies the global, topological, and multi-solution character of biological visual perception. In particular, understanding an image runs into the mathematical problem of infinitely many solutions. These characteristics may point to the next direction for improving deep neural networks.

The following is the full text of the speech, edited by AI Technology Review without changing the original meaning.

My talk uses research on biological vision and computer vision to illustrate how neuroscience and AI influence each other. Both fields are essentially trying to open the black box of intelligence, so it is only natural that they inspire each other.

1

Deep neural networks, the engine of the rise of artificial intelligence in recent years, have been very successful, and the recognition rate of objects in some large data sets even exceeds that of humans. However, deep neural networks still face many problems.

First, deep neural networks mainly mimic the feed-forward, hierarchical information processing of the brain's visual cortex. But the brain's visual system is much more complex than that, so the human brain and deep neural networks differ in many ways, and on many tasks humans are more intelligent.

Let's take a simple example. In the image below, the left panel shows a bear with its local information removed, leaving only the outline; we humans recognize it as a bear at a glance. The right panel cuts the bear into small pieces and scrambles them, so that only local information remains and the global information is gone. We can find the bear's eyes, mouth, and body among the pieces, yet it is hard for us to recognize the right panel as a bear, whereas a deep neural network recognizes it as a bear at once.

This comparison shows that the object recognition mechanism of deep learning networks is very different from that of humans. Humans can use global information about an object for recognition, while current deep neural networks can use only local information.
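
The scrambling manipulation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the 4x4 toy "image" and the function names `split_into_patches`, `assemble`, and `scramble` are hypothetical and do not come from the talk. The sketch shows that shuffling tiles leaves the set of local patches unchanged while rearranging the global layout, so any recognizer that looks only at the unordered collection of local patches cannot tell the two images apart.

```python
import random

def split_into_patches(image, patch):
    """Cut a 2-D list into non-overlapping patch x patch tiles (row-major order)."""
    rows, cols = len(image), len(image[0])
    tiles = []
    for r in range(0, rows, patch):
        for c in range(0, cols, patch):
            tile = tuple(tuple(image[r + i][c:c + patch]) for i in range(patch))
            tiles.append(tile)
    return tiles

def assemble(tiles, rows, cols, patch):
    """Inverse of split_into_patches: lay tiles back out in row-major order."""
    image = [[0] * cols for _ in range(rows)]
    per_row = cols // patch
    for k, tile in enumerate(tiles):
        r0, c0 = (k // per_row) * patch, (k % per_row) * patch
        for i in range(patch):
            for j in range(patch):
                image[r0 + i][c0 + j] = tile[i][j]
    return image

def scramble(image, patch, seed=0):
    """Shuffle the tiles: local content is kept, global arrangement is lost."""
    tiles = split_into_patches(image, patch)
    random.Random(seed).shuffle(tiles)
    return assemble(tiles, len(image), len(image[0]), patch)

# Hypothetical toy image in which every pixel value is distinct.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
scrambled = scramble(image, patch=2, seed=1)

# The unordered collection of local patches is identical in both images.
same_local_content = (sorted(split_into_patches(image, 2))
                      == sorted(split_into_patches(scrambled, 2)))
print(same_local_content)
```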

The inability to obtain global information is a fundamental problem for deep learning, especially for feed-forward neural networks, and it was recognized long ago. Marvin Minsky, a pioneer of artificial intelligence, pointed out in 1969 that feed-forward neural networks have difficulty recognizing topological properties.

Topology studies the properties of geometric figures or spaces that remain unchanged under continuous deformation. It considers only the positional relationships between objects, not their shape or size. Important topological properties include connectedness and compactness.

Global information is difficult to obtain with feed-forward networks, and even where it can be obtained, the computational cost grows exponentially. Acquiring topological and global information is a fundamental problem for deep learning networks.

Therefore, it is necessary to understand how biological visual systems obtain global information. Neuroscience has long debated whether humans recognize objects from global or from local information. Two schools of painting typify the two views, as shown in the figure below. The painting on the left is Impressionist: if you look only at the parts, you cannot make out the eyes or the nose, but taken as a whole it is clearly a man. This is an example of recognizing an object from global information. The painting on the right is Cubist: it greatly enlarges each piece of local information. Picasso said the painting shows a beautiful girl, but many people cannot see her, because the local pieces do not add up to a coherent whole. This is an example of recognizing an object from local information.

Deep learning networks recognize objects by aggregating local information and gradually building up more complex information. By contrast, a theory in cognitive neuroscience called the "reverse hierarchy" holds that human object recognition proceeds from simple to complex, from the whole to the parts.

The "reverse hierarchy" is consistent with everyday experience. If a person flashes across our field of view, we immediately register that it is a person and only afterwards work out who it is, which is a process of recognition from the whole to the details.

Let us look at a major difference between human visual cognition and machine learning from a neuroscience perspective. The figure below shows an experiment in which the participants had blindsight. Blindsight is a condition in which a person cannot consciously "see" an object but can still "perceive" its presence.

Numerous experiments have shown that to see, or be aware of, an object, humans need the object's information to reach at least the primary visual cortex (V1). If V1 is damaged, blindsight may occur. Objects can still be perceived because the subcortical pathway remains intact: a short route from the retina to the superior colliculus and then on to the higher cortex.

Scientists have demonstrated this more clearly in animal experiments. They placed mice in a cage and presented a dynamic stimulus on the ceiling: a small spot of light that rapidly grows larger, mimicking the light signal that reaches a mouse's retina when an eagle swoops down on it in the wild. The mouse's first instinct is to play dead. The scientists found that by manipulating neuronal responses in the superior colliculus, they could make mice that saw the moving spot no longer play dead, or make mice play dead even when no moving spot was present. This experiment shows that the fast instinctive response runs through the subcortical pathway rather than through the cortical pathway that deep neural networks simulate.

In this experiment, the mouse did not stop to work out whether the stimulus was a spot or an eagle; it played dead immediately. This is an instinctive response: the mouse can recognize the motion pattern without detailed feature extraction.

Inspired by this example, we proposed a new algorithm that recognizes motion patterns without feature extraction. The model consists of two parts. In the figure below, the external input is at the bottom left, and the network in the black circle represents the "retina". The computation of this "retina" is very simple: it projects the motion pattern into a high-dimensional space, which makes the motion patterns linearly separable, and then passes the result to the decision network. The "retina" has a very large number of neurons and acts as a reservoir network. We do not need to train the reservoir network or the decision network themselves; we only need to train the connections between them.
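
The idea of a fixed high-dimensional projection with only the outgoing connections trained can be sketched as follows. This is a minimal illustration, not the model from the talk: the network size, the weight ranges, the two toy input sequences, and the one-shot difference-of-states readout rule are all assumptions made here for brevity, and every function name is hypothetical. The random input and recurrent weights stand in for the untrained "retina" network (a reservoir, in machine-learning terms), and `train_readout` is the only step that learns anything.

```python
import math
import random

def make_reservoir(n_units, seed=0):
    """Fixed random weights; these are never trained."""
    rng = random.Random(seed)
    w_in = [rng.uniform(-1.0, 1.0) for _ in range(n_units)]
    # Weak random recurrent weights so that activity stays bounded.
    scale = 0.9 / math.sqrt(n_units)
    w_rec = [[rng.uniform(-scale, scale) for _ in range(n_units)]
             for _ in range(n_units)]
    return w_in, w_rec

def reservoir_state(sequence, w_in, w_rec):
    """Drive the network with a 1-D input sequence and return its final state."""
    n = len(w_in)
    x = [0.0] * n
    for u in sequence:
        x = [math.tanh(w_in[i] * u + sum(w_rec[i][j] * x[j] for j in range(n)))
             for i in range(n)]
    return x

def train_readout(state_a, state_b):
    """One-shot linear readout (assumed rule): output > 0 for A, < 0 for B."""
    w = [a - b for a, b in zip(state_a, state_b)]
    bias = 0.5 * (sum(b * b for b in state_b) - sum(a * a for a in state_a))
    return w, bias

def readout(state, w, bias):
    return sum(wi * si for wi, si in zip(w, state)) + bias

# Two toy motion patterns: a spot that grows versus one that shrinks.
growing = [t / 9 for t in range(10)]
shrinking = growing[::-1]

w_in, w_rec = make_reservoir(n_units=50, seed=0)
state_grow = reservoir_state(growing, w_in, w_rec)
state_shrink = reservoir_state(shrinking, w_in, w_rec)
w_out, b_out = train_readout(state_grow, state_shrink)

print(readout(state_grow, w_out, b_out) > 0)     # growing pattern
print(readout(state_shrink, w_out, b_out) < 0)   # shrinking pattern
```

The readout here is learned from a single example of each pattern, which is the simplest setting in which only the connections leaving the fixed network need to be trained.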

I will illustrate the decision network with two neurons. As shown in the figure below, each decision neuron represents one type of motion pattern to be recognized. The dynamics of these neurons are very slow, because the key to recognizing a motion pattern is to capture the temporal structure of the input, not just its spatial structure. The decision neurons inhibit one another. Each neuron accumulates evidence from the reservoir network's input, and if the evidence supports the motion pattern it encodes, its response suppresses the activity of the other neurons and it ultimately wins.
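
The competition between decision neurons can be sketched with a pair of slow, mutually inhibiting rate units. This is an illustrative toy, not the speaker's equations: the threshold-linear dynamics, the time constant `tau`, the inhibition strength, and the constant evidence values are all assumptions, and `compete` is a hypothetical name. Each unit integrates its evidence slowly, and the unit with more evidence ends up silencing the other.

```python
def compete(evidence, inhibition=2.0, tau=20.0, steps=2000, dt=1.0):
    """Slow rate units with mutual inhibition, integrated with Euler steps.

    evidence: constant input to each decision neuron (assumed for simplicity;
    in a full model it would vary over time).
    """
    rates = [0.0] * len(evidence)
    for _ in range(steps):
        total = sum(rates)
        # Each unit is driven by its evidence minus inhibition from the others.
        drives = [max(0.0, e - inhibition * (total - r))
                  for e, r in zip(evidence, rates)]
        # A large tau makes the units integrate evidence slowly over time.
        rates = [r + dt / tau * (-r + d) for r, d in zip(rates, drives)]
    return rates

rates = compete([1.0, 0.8])
winner = max(range(len(rates)), key=rates.__getitem__)
print(winner, [round(r, 3) for r in rates])
```

With inhibition stronger than the units' own leak, a small difference in evidence is amplified until only one unit stays active, which is the winner-take-all behavior described above.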

The computational essence of this model is spatiotemporal pattern recognition, so we can generalize it and use it for gait recognition. In this task, a person walks in front of the screen once or twice, and the gait is then fed into the model for identification. The advantage of the model is that it can be trained on small samples: data from just one or two walks are enough for it to learn a person's gait characteristics.

2

We now introduce a psychophysical experiment to show that recognition from the whole to the parts is in fact unavoidable. Please look at the image below and guess what it shows.

If you have not seen this picture before, you will not be able to guess, so I have drawn an outline on the image.

Now you can see that the picture shows a cow. Even if I remove the outline, you still see a cow, because your brain now holds top-down prior knowledge of a cow. But that is just one answer. I can also draw the outline of a hand and then remove it, and you will then see a hand in the picture, because you now have top-down prior knowledge of a hand.

I can also draw a fish in the picture, and I'm sure you will then see a fish.

This experiment showed that top-down signals from the cerebral cortex are important when humans recognize objects.

This simple experiment reveals a profound mathematical problem in image understanding: given an image, there are in theory infinitely many ways to interpret it. Note that image understanding is not the same as object recognition. Image understanding involves two basic operations, image segmentation and object recognition.

But the order of the two is a chicken-and-egg paradox. Given an image, how can you recognize objects well without a proper segmentation? On the other hand, if the objects have not been recognized in advance, how can you segment the image properly? An image admits infinitely many combinations of segmentation and recognition, so mathematically this is an ill-posed problem. Humans and AI alike face this dilemma in image understanding.

The brain solves this problem through a process of "guessing and confirming." When we recognize an object, its image information is transmitted rapidly to the higher cortex through the so-called fast pathway, and the higher cortex makes a guess. The guess is then sent back through feedback connections and cross-checked against new input, and this cycle repeats until the object is identified.

We are rarely aware of this process in daily life, because one or two rounds are usually enough for successful recognition. But when an image is unclear, we stare at it from different angles, and information in the brain probably travels up and down repeatedly in a cycle of "guess, confirm, guess, confirm." As long as the confirmation comes back negative, the process continues until a positive result is obtained.

Neurobiology provides ample evidence that the human brain does recognize objects this way. Anatomically, there are more feedback connections from the higher visual cortex to the primary visual cortex than feed-forward connections, whereas deep learning networks mainly consider feed-forward connections. Electrophysiological evidence also suggests that the brain's recognition of an object occurs first in the higher visual cortex and only later in the lower visual cortex.

In general, biological visual recognition has at least two pathways. The fast pathway identifies the object as a whole, and its result then helps the slow pathway identify the object's local information.

Here is how global recognition might improve local recognition through feedback, taken from one of our recent studies. When we recognize an object, we first identify its broad category and then use that category information to help identify the finer category. For example, on seeing a picture we first recognize an animal, then a cat, and then the breed of cat. We found that broad-category information can help identify fine-category information through positive and negative feedback.

The first step is push feedback, which suppresses inter-class noise. Suppose the higher brain region recognizes that the object is a cat; it tells the lower brain region to stop processing dog information. This is positive feedback: it enhances the cat information and suppresses the dog information. The second step is pull feedback, which suppresses intra-class noise: it subtracts the average shared by all cats from the cat information, which amplifies the subtle differences between individual cats.
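
The two feedback steps can be caricatured with plain vector arithmetic. This sketch is only an illustration and should not be read as the model from the study mentioned above: the eight-dimensional toy vectors, the projection used for the push step, and the function names `push` and `pull` are all assumptions introduced here. The push step removes the component of the input that lies along the rival class, and the pull step subtracts what all members of the recognized class share.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))

def push(x, other_prototype):
    """Suppress inter-class noise: remove the component along the rival class."""
    k = dot(x, other_prototype) / dot(other_prototype, other_prototype)
    return [a - k * b for a, b in zip(x, other_prototype)]

def pull(x, class_mean):
    """Suppress intra-class commonality: subtract what all class members share."""
    return [a - m for a, m in zip(x, class_mean)]

# Hypothetical toy representations: two cats that differ only slightly.
cat_mean = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
dog_mean = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
cat_a = [1.2, 0.8, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
cat_b = [0.8, 1.2, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]

# Input: cat A contaminated with some dog-like activity.
noisy_a = [a + 0.3 * d for a, d in zip(cat_a, dog_mean)]
cleaned_a = push(noisy_a, dog_mean)      # step 1: push feedback
detail_a = pull(cleaned_a, cat_mean)     # step 2: pull feedback
detail_b = pull(cat_b, cat_mean)

print(cosine(cat_a, cat_b))        # similarity before the pull step
print(cosine(detail_a, detail_b))  # similarity after the pull step
```

In this toy setting the two raw cat vectors are nearly parallel, while the residuals left after subtracting the shared cat average point in opposite directions, which is one way to picture how the fine differences between individual cats get amplified.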

In summary, the recognition mechanism of biological vision is very different from the image recognition mechanism of deep neural networks. Biological visual recognition involves the interaction of top-down and bottom-up pathways, whereas deep neural networks simulate only the bottom-up pathway. The top-down pathway underlies the global, topological, and multi-solution nature of biological visual perception, which may point to the next direction for improving deep neural networks. Cognitive neuroscience and artificial intelligence should talk to each other and learn from each other; past experience shows that this often brings surprises.
