
"Deep Learning Attention Mechanisms" TKDE 2022 Research Review

Reporting by XinZhiyuan

Source: Zhuanzhi (专知)

The attention mechanism is a commonly used module in deep learning. As a resource allocation scheme, it devotes limited computational resources to the more important information and is a principal means of addressing information overload. Below is a review of attention mechanisms in deep learning by Gianni Brauwers and Flavius Frasincar of Erasmus University, published in TKDE.

Attention is an important mechanism that can be used in a variety of deep learning models across many different domains and tasks. This review provides a comprehensive overview of attention mechanisms in deep learning.

The various attention mechanisms are explained through a framework consisting of a general attention model, unified notation, and a comprehensive taxonomy of attention mechanisms.

On this basis, the paper reviews methods for evaluating attention models and discusses how the structure of an attention model can be characterized within this framework. Finally, future work in the field of attention models is envisioned.

Paper link: https://ieeexplore.ieee.org/document/9609539/

Introduction

The idea of simulating human attention first emerged in the field of computer vision, where models that focus only on specific regions of an image, rather than the entire image, were introduced to reduce the computational complexity of image processing while improving performance [1], [2].

However, the real starting point of the attention mechanisms we know today is usually traced to natural language processing. Bahdanau et al. introduced attention into machine translation models to address a problem of the recurrent encoder-decoder architecture, namely that an entire source sentence had to be compressed into a single fixed-length vector.

After Bahdanau et al. highlighted the merits of attention, attention techniques improved and quickly became popular for tasks such as text classification, image captioning, sentiment analysis, and speech recognition.

Attention has become a popular technique in deep learning for several reasons. First, models that incorporate attention mechanisms have achieved state-of-the-art results on all of the tasks mentioned above and many others.

Second, most attention mechanisms can be trained jointly with a base model, such as a recurrent or convolutional neural network, using ordinary backpropagation. Third, attention introduces a form of interpretability into neural network models, which are otherwise often regarded as opaque.

The introduction of the Transformer model further demonstrated the effectiveness of attention and further increased its popularity. Attention was originally introduced as an extension of recurrent neural networks; the Transformer model of Vaswani et al., however, marked a major development in attention research, as it showed that the attention mechanism alone is sufficient to build a state-of-the-art model.

This means that some drawbacks of recurrent neural networks, such as their being particularly difficult to parallelize, can be avoided. Like the original attention mechanism, the Transformer model was created for machine translation but was quickly adopted for other tasks, such as image processing, video processing, and recommender systems.

The purpose of this review is to explain the general form of attention and to provide a comprehensive overview of attention techniques in deep learning. The main difference from previous surveys is that they generally focus on attention models within a single domain, whereas this review provides an overview of attention techniques across domains.

We discuss attention techniques in a general way that allows them to be understood and applied across fields. We find that the taxonomies proposed in previous studies lack the depth and structure needed to properly distinguish the various attention mechanisms. Moreover, some important attention techniques have not been properly discussed in previous reviews, while others lack technical details or intuitive explanations.

Therefore, in this paper we present the important attention techniques within a single framework using unified notation, combining technical detail with intuitive explanation, and we propose a comprehensive taxonomy of attention mechanisms.

General attention model

This section describes a general attention model and the corresponding notation. The framework introduced here is used throughout the remainder of this article.

To arrive at a general attention model, we first describe the general characteristics of a model that can use attention. We refer to the complete model as the task model. This model takes an input, carries out the specified task, and produces the desired output.

For example, a task model can be a language model that takes a piece of text as input and outputs a summary, a sentiment classification, or a translation into another language. Alternatively, the task model can take an image and produce a caption or a segmentation for it. The task model consists of four submodels: the feature model, the query model, the attention model, and the output model.
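To make this decomposition concrete, here is a minimal NumPy sketch (not the authors' code) of the four submodels, using dot-product scoring and softmax alignment purely for illustration; all function names, such as feature_model and query_model, are placeholders rather than names from the paper.

```python
# Minimal sketch of the task-model decomposition: feature model, query model,
# attention model, and output model. Illustrative only; real submodels would
# be learned networks (e.g., an RNN/CNN feature extractor, a classifier head).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def feature_model(raw_input):
    # Would extract a matrix F (n_f x d_f) of feature vectors from the input;
    # here the input is assumed to already be such a matrix.
    return raw_input

def query_model(state):
    # Would produce a query vector q indicating what information to look for;
    # here it is passed through unchanged.
    return state

def attention_model(q, F):
    # Score each feature vector against the query, align the scores with a
    # softmax, and return the weighted average of the feature vectors
    # (the context vector).
    scores = F @ q             # one score per feature vector
    weights = softmax(scores)  # attention weights summing to 1
    return weights @ F         # context vector

def output_model(context):
    # Would map the context vector to the task output (e.g., a label).
    return context

F = feature_model(np.random.randn(5, 8))  # five feature vectors of dimension 8
q = query_model(np.random.randn(8))       # a single query of matching dimension
print(output_model(attention_model(q, F)).shape)  # (8,)
```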

Attention taxonomy

There are many different types of attention mechanisms and extensions, and one model can use different combinations of these attention techniques. Therefore, we propose a taxonomy that can be used to classify different types of attention mechanisms.

Based on whether a technique is designed to handle a specific type of feature vector (feature-related), a specific type of model query (query-related), or is simply a general mechanism, attention mechanisms are divided into three broad categories. Further explanations of these categories and their subcategories are provided in the following subsections.

Feature-related attention mechanisms

Based on a specific set of input data, the feature model extracts feature vectors, and the attention model can then focus on these different vectors. These features may have particular structures that require special attention mechanisms to process them. Such mechanisms can be classified according to which feature characteristic they handle: the multiplicity of features, the levels of features, or the representations of features.

General attention mechanisms

This category includes attention mechanisms that can be applied in any type of attention model. Their structure can be broken down into the following subaspects: the attention scoring function, attention alignment, and attention dimensionality.
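As a concrete reference for the scoring and alignment subaspects, the sketch below implements two widely used scoring functions, (scaled) dot-product and additive (Bahdanau-style) scoring, each followed by softmax alignment. The weight matrices and dimensions are illustrative assumptions, not values from the paper.

```python
# Two common attention scoring functions plus softmax alignment (illustrative).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dot_score(q, K, scale=True):
    # (Scaled) dot-product scoring, as popularized by the Transformer.
    s = K @ q
    return s / np.sqrt(q.shape[0]) if scale else s

def additive_score(q, K, W1, W2, w):
    # Additive (Bahdanau-style) scoring: w^T tanh(W1 q + W2 k_l) per key k_l.
    return np.tanh(q @ W1.T + K @ W2.T) @ w

d_q = d_k = 8; d_a = 6; n = 5  # illustrative dimensions
q, K = np.random.randn(d_q), np.random.randn(n, d_k)
W1 = np.random.randn(d_a, d_q)
W2 = np.random.randn(d_a, d_k)
w = np.random.randn(d_a)

for scores in (dot_score(q, K), additive_score(q, K, W1, W2, w)):
    print(softmax(scores))  # attention weights over the five feature vectors
```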

Query-related attention mechanisms

Queries are an important part of any attention model, as they directly determine what information is extracted from the feature vectors. They are based on the expected output of the task model and can be interpreted as literal questions. Some queries have specific characteristics that require specific types of mechanisms to handle them.

Thus, this category encapsulates the attention mechanisms that handle the characteristics of particular types of queries. The mechanisms in this category deal with one of two query characteristics: the type of query or the multiplicity of queries.
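As one illustration of handling a multiplicity of queries, the hedged sketch below implements a simple multi-head attention step: several learned query/key/value projections attend to the same feature matrix in parallel, so each head can extract different information. All shapes and names are assumptions made for illustration.

```python
# Multi-head attention over one query and a shared feature matrix (sketch).
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multi_head_attention(q, F, Wq, Wk, Wv):
    # Wq, Wk, Wv hold one (d_head x d_model) projection per head.
    heads = []
    for Wq_h, Wk_h, Wv_h in zip(Wq, Wk, Wv):
        qh = Wq_h @ q                    # projected query for this head
        Kh, Vh = F @ Wk_h.T, F @ Wv_h.T  # projected keys and values
        a = softmax(Kh @ qh / np.sqrt(qh.shape[0]))
        heads.append(a @ Vh)             # each head's own context vector
    return np.concatenate(heads)         # concatenated head outputs

d_model, d_head, n_heads, n = 8, 4, 2, 5
rng = np.random.default_rng(0)
q = rng.standard_normal(d_model)
F = rng.standard_normal((n, d_model))
Wq = rng.standard_normal((n_heads, d_head, d_model))
Wk = rng.standard_normal((n_heads, d_head, d_model))
Wv = rng.standard_normal((n_heads, d_head, d_model))
print(multi_head_attention(q, F, Wq, Wk, Wv).shape)  # (8,) = n_heads * d_head
```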

Attention model evaluation

In this section, we describe the evaluation of various types of attention models.

First, the taxonomy can be used to evaluate the structure of an attention model. For such an analysis, we treat the attention mechanism categories as orthogonal dimensions of the model: the structure of a model can be analyzed by determining which mechanism it uses for each category.

Second, we discuss various techniques for evaluating the performance of attention models. The performance of an attention model can be evaluated with either extrinsic or intrinsic performance measures.
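To illustrate the distinction, the sketch below contrasts a toy extrinsic measure (task accuracy of the complete model) with one possible intrinsic proxy (the attention mass placed on tokens a human marked as relevant). Both functions are illustrative assumptions, not metrics prescribed by the survey.

```python
# Toy extrinsic vs. intrinsic evaluation of an attention model (illustrative).
import numpy as np

def extrinsic_accuracy(y_true, y_pred):
    # Extrinsic: plain downstream-task performance of the full model.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def intrinsic_agreement(attn_weights, human_mask):
    # Intrinsic: total attention mass on positions humans marked as relevant
    # (one simple proxy among several possibilities).
    return float(np.sum(attn_weights * human_mask))

attn = np.array([0.1, 0.6, 0.2, 0.1])  # model's attention weights
mask = np.array([0, 1, 1, 0])          # human-annotated relevance
print(extrinsic_accuracy([1, 0, 1], [1, 1, 1]))  # 0.666...
print(intrinsic_agreement(attn, mask))           # 0.8
```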

Conclusion

This study reviews recent research progress on attention models in deep learning. Attention mechanisms have become an important development in deep learning, as they have been shown to substantially improve model performance, producing state-of-the-art results in a variety of tasks across several research areas.

We present a comprehensive taxonomy that can be used to classify and explain the multitude of attention mechanisms proposed in the literature. The organization of the taxonomy is based on the structure of the task model, which consists of a feature model, a query model, an attention model, and an output model. In addition, attention mechanisms are discussed using a framework based on queries, keys, and values.

Finally, we show how extrinsic and intrinsic measures can be used to evaluate the performance of an attention model, and how the taxonomy can be used to analyze its structure.

References:

[1] H. Larochelle and G. E. Hinton, “Learning to combine foveal glimpses with a third-order Boltzmann machine,” in 24th Annual Conference on Neural Information Processing Systems (NIPS 2010). Curran Associates, Inc., 2010, pp. 1243–1251.

[2] V. Mnih, N. Heess, A. Graves, and K. Kavukcuoglu, “Recurrent models of visual attention,” in 27th Annual Conference on Neural Information Processing Systems (NIPS 2014). Curran Associates, Inc., 2014, pp. 2204–2212.
