
[NeurIPS 2022] Dynamics-based deep active learning

Author: Machine Learning and Data Analysis

Lead

Amid the rapid development of artificial intelligence, deep learning has become an important tool in many fields, including image recognition, natural language processing, and predictive modeling. However, deep learning models often require large amounts of labeled data for training, which consumes both time and substantial computing resources. This is where active learning comes in: it intelligently selects representative data samples for labeling and training, reducing both the amount of labeled data required and the computational cost.


Recently, a paper titled "Deep Active Learning by Leveraging Training Dynamics", published at NeurIPS 2022, advances research on deep active learning. The work, a collaboration between the University of Illinois at Urbana-Champaign and the University of New South Wales, explores how the training dynamics of neural networks can be leveraged to improve the efficiency of deep active learning.


Article link: https://arxiv.org/abs/2110.0861

Introduction

Deep learning, and neural network models in particular, has achieved excellent performance on a wide variety of tasks. However, this success often relies on large amounts of labeled data, which makes such models less practical when data is scarce. In addition, deep learning models often require substantial computing resources and time to train, which adds to the complexity and cost of deploying them.

Active learning offers a way out: it attempts to reduce the amount of labeled data and the computational cost by intelligently selecting the most valuable samples for labeling and training. However, although active learning has been studied extensively in classical machine learning settings, its application to deep learning remains a relatively new and underexplored area of research.

A major difficulty in understanding and analyzing active learning from a classical (non-neural-network) perspective is that theoretical analyses developed for those classical settings may not carry over to overparameterized deep neural networks, where conventional wisdom often breaks down. Such analyses therefore offer little guidance for designing practical active learning methods. Empirically, deep active learning methods that borrow observations and insights from classical theory have also been observed to fail in some application scenarios.

On the other hand, deep learning theory has recently seen exciting progress on both the optimization and the generalization of neural networks. The training dynamics of deep neural networks under gradient descent can be characterized by the neural tangent kernel (NTK) of an infinitely wide network, and this characterization can in turn be used to bound the generalization of overparameterized networks via a Rademacher complexity analysis. This inspired us to ask the following question:

How can we design a practical and general active learning method for deep neural networks that rests on a solid theoretical foundation?

To answer this question, we first explore the relationship between a model's performance on test data and the convergence speed of an overparameterized deep neural network on the training data. Based on the NTK framework, we show theoretically that if a deep neural network converges faster ("faster training"), it tends to have better generalization performance ("better generalization"):

We connect optimization and generalization through a quantity called Alignment. The analysis proceeds in five steps, each stated as an equation in the paper: the optimization-theory result, the definition of Alignment (the bridge between the two sides), the relationship between optimization and Alignment, the generalization-theory result, and the relationship between generalization and Alignment.
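To give a feel for the shape of this connection, here is a minimal sketch in the standard infinite-width (NTK) setting. It follows the well-known kernel-regression picture rather than the paper's exact theorem; the constants and the precise Alignment quantity used in the paper differ.

```latex
% Minimal NTK sketch (a standard result, not the paper's exact statement).
% \Theta is the NTK Gram matrix on the n training points, with eigen-decomposition
% \Theta = \sum_i \lambda_i u_i u_i^\top; y \in \mathbb{R}^n are the labels; \eta is the step size;
% f_t denotes the network outputs on the training points after t gradient steps.

% NTK definition: the kernel induced by the network's parameter gradients,
\[
  \Theta(x, x') \;=\; \big\langle \nabla_\theta f(x;\theta),\; \nabla_\theta f(x';\theta) \big\rangle ,
\]
% which stays (nearly) constant during training in the infinite-width limit.

% Optimization: under gradient descent on the squared loss, the training residual
% decays independently along each eigendirection of \Theta,
\[
  \lVert y - f_t \rVert_2^2 \;\approx\; \sum_{i=1}^{n} \big(1 - \eta \lambda_i\big)^{2t}\,\big(u_i^\top y\big)^2 ,
\]
% so training is fast exactly when y is aligned with the top eigenvectors of \Theta.

% Generalization: a Rademacher-complexity argument in the same regime gives a bound of order
\[
  \sqrt{\frac{\,y^\top \Theta^{-1} y\,}{n}}
  \;=\;
  \sqrt{\frac{1}{n}\sum_{i=1}^{n} \frac{\big(u_i^\top y\big)^2}{\lambda_i}} ,
\]
% which is small under the same alignment condition.  Hence "faster training" and
% "better generalization" are controlled by one and the same alignment quantity.
```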

Inspired by the above connection, we first introduce the training dynamics, i.e., the derivative of the training loss with respect to the number of iterations, as a proxy that quantitatively describes the training process. On this basis, we formally propose our general, theory-driven deep active learning method, dynamicAL, which queries labels for the set of unlabeled samples that maximizes the training dynamics. Because the training dynamics must be computed using only unlabeled samples, we employ two relaxations, pseudo-labeling and subset approximation, to solve this non-trivial subset selection problem. These relaxations allow us to estimate the training dynamics efficiently and solve the subset selection problem effectively, reducing its complexity from O(N^b) to O(b).
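As a concrete illustration of how such a query step could look, here is a hedged sketch (not the authors' implementation). It assumes each candidate's contribution to the training dynamics can be approximated by the squared norm of its loss gradient under its pseudo-label, and it takes the subset approximation to mean "score samples independently, then pick the top b". The names (score_candidates, select_batch) and the data-loader format are illustrative assumptions.

```python
# Hedged sketch of a dynamicAL-style query step, not the authors' code.
# Assumption: a sample's contribution to the training dynamics is approximated
# by the squared norm of the gradient of its pseudo-labeled loss, and the
# subset approximation is taken as "score independently, then pick the top b".

import torch
import torch.nn.functional as F


def score_candidates(model, unlabeled_loader, device="cpu"):
    """Pseudo-label each unlabeled sample, then score it by the squared norm of
    the gradient of its (pseudo-labeled) loss with respect to the parameters."""
    model.eval()
    scores, indices = [], []
    for idx, x in unlabeled_loader:          # assumed to yield (index, input) batches
        x = x.to(device)
        logits = model(x)
        pseudo_y = logits.argmax(dim=1)      # relaxation 1: pseudo-labeling
        for i in range(x.size(0)):
            loss = F.cross_entropy(logits[i:i + 1], pseudo_y[i:i + 1])
            grads = torch.autograd.grad(
                loss,
                [p for p in model.parameters() if p.requires_grad],
                retain_graph=True,
            )
            scores.append(sum(g.pow(2).sum().item() for g in grads))
            indices.append(int(idx[i]))
    return torch.tensor(scores), indices


def select_batch(scores, indices, b):
    """Relaxation 2 (subset approximation): instead of searching over all subsets
    of size b, treat samples independently and query the b highest-scoring ones."""
    top = torch.topk(scores, k=min(b, len(indices))).indices
    return [indices[i] for i in top.tolist()]
```

In practice, per-sample full-parameter gradients are expensive (restricting the gradient to the last layer is a common speed-up), and the paper's actual estimator of the training dynamics is more involved than this independent-scoring sketch.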


For the experiments, we empirically validate our theory through extensive experiments on three datasets (CIFAR10, SVHN, and Caltech101) with three types of network architectures: CNN, ResNet, and VGG. We first show that the solution to the subset selection problem provided by the subset approximation is close to the globally optimal solution. Moreover, in the active learning setting, our method not only outperforms the other baselines but also scales well to large deep learning models.


Summary

In this work, we bridge the gap between theoretical findings on deep neural networks and real-world applications of deep active learning. By exploring the relationship between generalization performance and training dynamics, we propose a theory-driven method, dynamicAL, which selects samples so as to maximize the training dynamics. We show that training convergence speed and generalization performance are strongly (positively) correlated in the ultra-wide regime, and that maximizing the training dynamics leads to lower generalization error. Empirically, dynamicAL not only consistently outperforms strong baselines across a variety of settings, but also scales well to large deep learning models.