
Discriminative or generative: which one represents the future of visual understanding?

Source: simonsfoundation.org

By Grace Lindsay

Compiled by Machine Heart

Editor: Qian Zhang

For decades, discriminative and generative approaches to understanding visual processing have led researchers down different paths, but now the two are merging.

Historically, much of the foundational work on the visual system has been done in a very simple way: showing an image to an animal, measuring the response of its neurons, showing another, and then repeating.

Such an approach rests on the assumption that visual processing can be understood as a kind of rote input-output transformation. Scientists studied cells as if they simply reacted to the visual features present in an image, and as if those responses could then be used to discriminate between different images.

While this understanding of the visual system has been fruitful in many ways, some researchers have always been skeptical of it. They argue that the anatomy and dynamics of the visual system suggest that it does not simply respond in a "bottom-up" manner. Instead, it may generate responses based on an internal model that reflects how the world works.

This debate between "discriminative" and "generative" accounts of vision has been going on for decades. Although both kinds of model aim to explain visual processing, the two approaches stem from different philosophical and mathematical traditions. As a result, researchers have tended simply to use their preferred methods rather than collaborate, creating a gap between the two paradigms.

Advances in both computer vision and computational neuroscience in recent years have exposed the limitations of this dichotomy, driving the development of broader models of visual processing. That requires representatives of the two sides to come together and clarify their respective views, along with where they agree and where they differ.

In September 2021, researchers presented proposals on this topic at the opening event of a Generative Adversarial Collaboration (GAC) at the virtual Cognitive Computational Neuroscience (CCN) conference.

Generative Adversarial Collaborations are a process launched by CCN in 2020 to let researchers air scientific disagreements explicitly and productively. Researchers submit proposals for controversial topics to CCN, and a small number are selected for discussion at GAC events. The following year, the GAC organizers publish a position paper outlining a plan for making progress on the topic and present that progress at that year's conference.

The 2021 GAC on generative versus discriminative models of the visual system brought together a team of 11 researchers. Some use discriminative methods, some use generative methods, but all are interested in exploring the intersection between the two. According to their proposal, the event aims to determine "whether our intellectual heritage has biased our intuitions about the algorithms of vision, trapping us in a false dichotomy."

"Simple and fast" and "flexible and slow"

To frame the debate, one first needs to know what a discriminative system and a generative system are. But perhaps that is the first point of disagreement.

In statistics, discriminative and generative models have simple definitions. A discriminative model computes the probability of a latent variable, or latent cause, given an observation. In visual processing, the latent variables are objects in the world, and the observations are the light hitting the retina. For example, such a model performs some computation on the pixels of an image to determine which objects are most likely to be present. A generative model, by contrast, computes the joint probability of the latent variables and the observations. This requires knowing how likely certain objects are to occur in general, not just how likely they are given a particular image.
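In symbols (a standard textbook formulation, added here for clarity rather than taken from the article), write x for the observation (the image) and z for the latent cause (the objects in the scene):

```latex
% Discriminative model: compute the posterior over causes given the image.
p(z \mid x)

% Generative model: compute the joint distribution over causes and images,
% usually factored as a prior over causes times a likelihood of the image.
p(x, z) = p(z)\, p(x \mid z)

% A generative model can still be used for recognition, via Bayes' rule:
p(z \mid x) = \frac{p(z)\, p(x \mid z)}{\sum_{z'} p(z')\, p(x \mid z')}
```

The extra ingredient the generative model carries is the prior p(z): how likely objects are in general, independent of any particular image.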

While computing these different probability distributions is technically quite distinct, the line between the two begins to blur once the computations are mapped onto the brain. "If you look closely, everything falls apart," said Niko Kriegeskorte, a neuroscientist at Columbia University and a GAC spokesperson. The field lacks rigorous definitions of generative and discriminative models, and what appears in the neuroscience literature is better described as a loose set of associations.

Models on the discriminative side tend to be feed-forward, simple, and fast. Deep feed-forward convolutional neural networks, for example, are exemplars of discriminative processing. These models are usually trained in a supervised way: they learn to map images to labels, such as classifying pictures of cats and dogs. The resulting model can take a new image and quickly assign it a label. Discriminative systems like these networks typically work in a bottom-up fashion, forming a direct response to their immediate input. Because of the way they are trained, they are also thought to be specialized for particular tasks, such as object recognition.
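For concreteness, here is a minimal sketch of such a model in PyTorch (the layer sizes, input resolution, and the cat-vs-dog task are illustrative assumptions, not details from the article):

```python
import torch
import torch.nn as nn

# A minimal feed-forward convolutional classifier: pixels in, label out.
# Layer sizes and the two-class (cat vs. dog) setup are illustrative.
class FeedForwardCNN(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # 64x64 input -> two 2x poolings -> 16x16 feature maps
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):  # x: (batch, 3, 64, 64) images
        h = self.features(x)
        return self.classifier(h.flatten(1))  # logits over labels

model = FeedForwardCNN()
logits = model(torch.randn(1, 3, 64, 64))  # a single bottom-up pass
```

Everything flows one way: the image goes in, a label score comes out, with no top-down signal anywhere in the computation.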

Generative models, in contrast, are slow, but they are also more flexible, more robust, and more expressive. They often rely on unsupervised training, with the aim of acquiring a basic understanding of the statistics and structure of the world, which can then be used for prediction. For example, in a world where cats are more common than dogs, a generative model might use the sight of paws to predict that whiskers are also present, and ultimately conclude that there is a cat in the image. Structurally, these models are more likely to have recurrent connections, particularly top-down connections from higher visual areas or frontal cortex that carry predictive signals back to the visual system. They are also more likely to represent information as probability distributions, which makes it possible to attach uncertainty to any given visual perception.
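To make that cat example concrete, here is a toy numerical sketch (all probabilities are invented for illustration) of how a generative model combines a prior over objects with a likelihood of the observed features:

```python
# Toy generative inference over two latent causes: "cat" vs. "dog".
# All numbers are invented for illustration.
prior = {"cat": 0.7, "dog": 0.3}        # cats are more common in this world
likelihood = {"cat": 0.8, "dog": 0.1}   # p(paws-and-whiskers | object)

# Bayes' rule: posterior is proportional to likelihood times prior.
evidence = sum(likelihood[z] * prior[z] for z in prior)
posterior = {z: likelihood[z] * prior[z] / evidence for z in prior}

print(posterior)  # {'cat': ~0.95, 'dog': ~0.05} -> conclude "cat"
```

The discriminative shortcut would be to learn the mapping from features to "cat" directly, with no explicit prior or likelihood anywhere in the computation.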

Scientists have reason to believe that both kinds of processing play a role in the brain. Proponents of the generative approach point to its intuitive appeal and its consistency with introspection. After all, we can produce visual percepts in the form of mental imagery and dreams, a phenomenon that would be impossible without some top-down influence or internal model of the world. Learning general principles about how the world works can also make a generative system better adapted to new environments.

During the GAC event, Josh Tenenbaum, a cognitive scientist at the Massachusetts Institute of Technology and an investigator with the Simons Collaboration on the Global Brain (SCGB), applied image filters to his talk video to illustrate this point: because our visual system has a model of how images can be transformed, we can recognize the content of a video even when visual effects that are new to us, such as changes in color and contrast, have been applied to it.

Proponents of the discriminative approach point to its tangible success in explaining neural data. Deep convolutional neural networks trained to classify images provide some of the best current models for predicting real neural activity in response to complex visual input. We also know that the feed-forward pathway of the visual system can carry out object classification very quickly, which is consistent with the discriminative account.
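Such comparisons are typically made by fitting a regularized linear readout from a network's internal activations to recorded neural responses and scoring it on held-out images. The sketch below uses random stand-in data and scikit-learn, as a generic illustration rather than the pipeline of any particular study:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Stand-in data: in a real study, `features` would be a CNN layer's
# activations for each image and `responses` the recorded firing rates.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 512))   # 1000 images x 512 model units
responses = rng.normal(size=(1000, 50))   # 1000 images x 50 neurons

X_train, X_test, y_train, y_test = train_test_split(
    features, responses, random_state=0)

# Fit a regularized linear readout from model features to neural activity,
# then ask how well it predicts responses to held-out images.
readout = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", readout.score(X_test, y_test))
```

The better a model's features linearly predict held-out neural responses, the better it is said to explain the neural data.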

The two kinds of model are at different stages of development, which makes it difficult to compare their merits. Current discriminative models can process real-world images, which gives them an advantage over generative models. However, this may reflect what researchers can currently do on computers more than what the brain can do. At present, generative models are difficult to build and train and run only on toy problems, rather than on the real challenges the visual system faces. Without models that process images as well as today's discriminative models do, generative approaches have no chance of beating discriminative models at quantitatively predicting neural activity. The contrast is a bit like comparing today's cars with self-driving cars: self-driving cars may have some appealing features, but if you need to get around today, they won't help much.

"At the end of the day, you have to have a model to test," said Jim DiCarlo, a neuroscientist and SCGB researcher at the Massachusetts Institute of Technology. At the GAC event, DiCarlo, representing the discriminator, demonstrated the power of a discriminant model trained on target recognition to predict neural activity. "Once someone has built a new image computing model, only the experimental data at that time can be used to determine the accuracy of the model relative to other models."

In a way, this reduces the debate between generative and discriminative approaches to a question of engineering. Even if generative methods have many intuitive points in their favor, researchers still need to make them work in practice before large-scale comparisons with brain activity are possible. At the moment, they cannot. But generative models need not remain at a disadvantage. Given their various properties, especially their ability to train without much labeled data, machine learning researchers expect them to become useful in the future.

"It's important that we don't confuse what we think is easy or can be done now with what the brain can do." Ralf Haefner, a neuroscientist at the University of Rochester, said at the event.

Exploring the intersection

As the GAC panelists point out, many models don't fit neatly into one category or the other: recurrent discriminative models exist, some generative models can be fast, and so on. Benjamin Peters, a neuroscientist at Columbia University, said in the discussion that it is risky to force the brain into boxes defined by statisticians and engineers. "We shouldn't take them too rigidly, but rather draw inspiration from the algorithms."

For example, a visual system could use discriminative components to achieve quick, easy visual perception while still containing generative elements for deeper functions. Alternatively, a built-in generative model could use its predictions about the world to help provide training data for the discriminative parts of the brain. In her talk, Harvard neuroscientist Talia Konkle argued for acknowledging a separation between perception and cognition, with perception being a discriminative process and cognition a more generative one.

Some hybrid approaches are already popular in machine learning. In contrastive learning, for example, a network learns to group similar things (such as different cropped patches of the same image) and to separate different things. This approach has a generative component: training requires no explicit target labels, and it produces representations that capture many of the relevant statistics of the data. At the same time, it fits well with the typical feed-forward architecture of discriminative models, and it does learn to distinguish similar images from different ones.
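As a concrete sketch, a simplified SimCLR-style contrastive objective looks like the following (the details are illustrative; this is one common instance of the idea, not necessarily the one the panelists had in mind):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss: the two views of the same image, z1[i] and z2[i],
    are pulled together; all mismatched pairs in the batch are pushed apart."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature       # cosine similarities between all pairs
    targets = torch.arange(z1.size(0))     # matching views lie on the diagonal
    return F.cross_entropy(logits, targets)

# Usage: embeddings of two random crops of the same batch of 8 images.
z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
loss = contrastive_loss(z1, z2)  # no class labels anywhere
```

Note that the "labels" here are just batch indices: the supervision signal comes from the data itself, not from human annotation.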

Given that models can straddle both categories, some researchers question whether it makes sense to focus on the binary division at all. "Are these really the terms we want to converge on?" asked Kim Stachenfeld of DeepMind. Scientists and engineers acknowledge that a clean distinction between generative and discriminative processing is not necessary for building effective systems. Nor is the distinction necessary for understanding the brain. "If you think it's an either-or issue, you're missing the point," Kriegeskorte said. "I'm not sure we'll still be thinking in this binary way in 10 to 20 years."

Part of the purpose of the GAC is to explore the divergence between discriminative and generative models as a means of moving the field forward.

Stachenfeld argues that the attempt is useful: by organizing approaches to the visual system into two camps and then "seeing what's left," the field can identify what falls outside both camps and where new terminology and new ideas are needed. Others believe the discussion helps clarify which features are truly necessary for each type of modeling approach, and how to weigh the evidence for each view in the brain. Kriegeskorte notes that in using the terminology of these models he now "avoids the stupid mistakes" he used to make so often.

Do these conceptual advances matter? The real test will be how much they affect experiments. Kriegeskorte says that experimental design is where real progress is hardest to make.

Doris Tsao, a neuroscientist and SCGB investigator at the California Institute of Technology, proposed one experimental approach: isolate the generative components of the nervous system and study their effects on neural activity in the absence of feed-forward input about the current state of the world. Previous studies of patients with lesions of the corpus callosum (the band of nerve fibers connecting the left and right cerebral hemispheres) offer some hints. With part of the pathway between the two hemispheres cut, researchers showed words such as "knight" in the left visual field, which projects to the right hemisphere, causing patients (through the influence of feedback connections in the left hemisphere) to describe knight-like visual scenes even without any corresponding visual stimulus or conscious awareness of the word. Tsao believes that similar experiments in animals could help identify the top-down generative pathways responsible for conjuring such imagery. However, GAC participants disagreed about whether artificially isolating the generative system would clarify how it functions under normal circumstances.

Most participants agreed that more experiments focusing on the brain's generative abilities are needed. Nicole Rust, a neuroscientist and SCGB investigator at the University of Pennsylvania, made the case for studying visual prediction, such as the ability to predict what will happen next in a video. DiCarlo said he plans to run more experiments inspired by the strengths of generative processing.

Over the next year, the group will continue to discuss concrete steps to advance research and share their progress with the wider community through publications and events.
