
ICLR 2022 Spotlight | Anomaly Transformer: A time series anomaly detection method based on associated differences

Author: Jiangmen Ventures
This article introduces the group's latest work on time series anomaly detection: "Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy", a Spotlight paper at ICLR 2022. The paper explores anomaly detection based on association discrepancies between different time points.

Authors: Xu Jiehui*, Wu Haixu*, Wang Jianmin, Long Mingsheng (*equal contribution)

Link: https://openreview.net/pdf?id=LzQQ89U1qm_


1. Introduction

Real-world systems continuously generate large amounts of sequential data, and detecting anomalies in these time series is essential for ensuring system security and avoiding economic losses, for example when monitoring servers or spacecraft. This article focuses on unsupervised time series anomaly detection.

This problem has received widespread attention. However, classical anomaly detection methods (e.g., LOF, OC-SVM) rarely consider temporal information, which limits their applicability to time series. Deep methods typically learn point-wise representations of the series with recurrent networks (RNNs) and then judge anomalies by reconstruction or prediction error. However, such point-wise representations carry little information and can be dominated by normal patterns, making anomalies hard to distinguish.

Therefore, obtaining more informative representations, and on that basis defining a more discriminative criterion, is critical for time series anomaly detection.

2. Motivation

Unlike point-wise representations, we observe that each point in a time series can be characterized by its association with the entire series, expressed as a distribution of association weights along the temporal dimension. Moreover, compared with normal points, anomalies find it harder to build strong associations with the whole series and, because of the continuity of the series, tend to concentrate their associations on adjacent areas. This discrepancy between the series-wide association and the adjacent-concentrated prior provides a natural and highly distinguishing criterion for time series anomaly detection.

Based on these observations, we propose the Anomaly Transformer to model time series associations, together with a minimax strategy that amplifies the difference between normal points and anomalies, realizing anomaly detection based on the association discrepancy. Anomaly Transformer achieves state-of-the-art results on 5 datasets from different domains.

3. Methods

3.1 Anomaly Transformer

The Anomaly Transformer contains Anomaly-Attention for modeling time series associations. Its overall structure alternately stacks Anomaly-Attention blocks and feed-forward layers, which helps the model learn underlying associations from multi-level features.

As shown in the figure below, Anomaly-Attention (left) models both the prior-association and the series-association of the data. In addition to the reconstruction error, our model employs a minimax strategy to further enlarge the gap in association discrepancy between anomalies and normal points.

[Figure: Overall architecture of Anomaly Transformer]

3.1.1 Anomaly-Attention

We propose a new attention mechanism, Anomaly-Attention, which models the prior-association and the series-association in a unified way, facilitating the computation of the association discrepancy.

· For the prior-association, we adopt a learnable Gaussian kernel centered at the index of each point. This design exploits the unimodal shape of the Gaussian to make the model focus on adjacent points. To let the prior-association adapt to different temporal patterns, the Gaussian kernel contains a learnable scale parameter.

· For the series-association, we use the standard Transformer self-attention: the series-association of a point is the attention distribution in the corresponding row of the attention matrix. This branch learns the associations present in the raw series, so that the most informative associations are discovered adaptively.

Compared with point-wise representations, both associations preserve each point's dependencies along the temporal dimension, yielding richer and more informative representations. They reflect, respectively, the adjacent prior and the learned real associations. The difference between the two, termed the association discrepancy, naturally distinguishes normal points from anomalies.
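To make the two branches concrete, here is a minimal PyTorch sketch of a single Anomaly-Attention layer. It is a sketch under assumed shapes and parameterizations (single head, a sigmoid-bounded scale `sigma`, names of our own choosing), not the authors' released implementation.

```python
import math
import torch
import torch.nn as nn

class AnomalyAttention(nn.Module):
    """Two-branch Anomaly-Attention sketch: a series-association branch
    (standard self-attention) and a prior-association branch (learnable
    Gaussian kernel centered at each time index)."""

    def __init__(self, d_model: int, seq_len: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.sigma = nn.Linear(d_model, 1)  # learnable scale of the Gaussian prior
        idx = torch.arange(seq_len)
        # |j - i| distances between all pairs of time points, fixed per window
        self.register_buffer("dist", (idx[None, :] - idx[:, None]).abs().float())

    def forward(self, x):                                   # x: (B, L, d_model)
        B, L, D = x.shape
        # series-association: row i is point i's attention distribution
        scores = self.q(x) @ self.k(x).transpose(-1, -2) / math.sqrt(D)
        series = torch.softmax(scores, dim=-1)               # (B, L, L)
        # prior-association: Gaussian centered at index i, rescaled row-wise
        sigma = torch.sigmoid(self.sigma(x)) * 5.0 + 1e-5    # (B, L, 1), positive
        gauss = torch.exp(-self.dist ** 2 / (2.0 * sigma ** 2))
        prior = gauss / gauss.sum(dim=-1, keepdim=True)      # (B, L, L)
        return series @ self.v(x), series, prior
```

Each layer thus exposes both association matrices, which is what the discrepancy in Section 3.1.2 consumes.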


3.1.2 Association Discrepancy

To quantify the difference between normal points and anomalies, we define the association discrepancy as the symmetrized KL divergence between the prior-association and the series-association.
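The original formula was rendered as an image; reconstructed from the description above (with $\mathcal{P}^l$ and $\mathcal{S}^l$ the prior- and series-associations at layer $l$, averaged over the $L$ layers), it reads:

```latex
\mathrm{AssDis}(\mathcal{P}, \mathcal{S}; \mathcal{X}) =
\Big[\, \frac{1}{L} \sum_{l=1}^{L}
  \big( \mathrm{KL}(\mathcal{P}^{l}_{i,:} \,\|\, \mathcal{S}^{l}_{i,:})
      + \mathrm{KL}(\mathcal{S}^{l}_{i,:} \,\|\, \mathcal{P}^{l}_{i,:}) \big)
\Big]_{i=1,\dots,N}
```

This yields one discrepancy value per time point; normal points, whose series-association spreads over the whole sequence, show a larger discrepancy from the adjacent-concentrated prior than anomalies do.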

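A minimal PyTorch helper for this quantity, assuming each layer returns row-stochastic (B, L, L) association matrices as in the sketch above:

```python
import torch

def association_discrepancy(priors, series, eps=1e-8):
    """Point-wise symmetrized KL divergence between prior- and series-
    associations, averaged over layers. `priors`/`series` are lists of
    (B, L, L) row-stochastic matrices, one entry per layer."""
    dis = 0.0
    for p, s in zip(priors, series):
        kl_ps = (p * ((p + eps).log() - (s + eps).log())).sum(dim=-1)
        kl_sp = (s * ((s + eps).log() - (p + eps).log())).sum(dim=-1)
        dis = dis + kl_ps + kl_sp
    return dis / len(priors)   # (B, L): one discrepancy value per time point
```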

3.2 Minimax Association Learning


To learn representations without supervision, we optimize the model with a reconstruction loss. At the same time, to widen the gap between normal points and anomalies, we add an association discrepancy term to the loss that enlarges the discrepancy. Owing to the unimodal nature of the prior-association, this additional term drives the series-association to pay more attention to non-adjacent areas, which makes anomalies harder to reconstruct and thus easier to identify.
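The loss formula was also rendered as an image; reconstructed from the paper's description (λ trades off the two terms; ‖·‖_F and ‖·‖_1 denote the Frobenius and 1-norms), it takes the form:

```latex
\mathcal{L}_{\mathrm{Total}}(\hat{\mathcal{X}}, \mathcal{P}, \mathcal{S}, \lambda; \mathcal{X})
  = \| \mathcal{X} - \hat{\mathcal{X}} \|_{F}^{2}
    - \lambda \, \big\| \mathrm{AssDis}(\mathcal{P}, \mathcal{S}; \mathcal{X}) \big\|_{1}
```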


However, directly maximizing the association discrepancy sharply shrinks the scale of the Gaussian prior, degenerating the prior distribution into something meaningless. Therefore, to better control the association learning process, we adopt a minimax strategy.

In the minimize phase, we drive the prior-association to approximate the series-association learned from the raw series; this adapts the prior-association to diverse temporal patterns. In the maximize phase, we optimize the series-association to enlarge the association discrepancy, which forces the series-association to pay more attention to non-adjacent points.
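The two phases alternate by stopping gradients on one branch at a time; reconstructed from the paper's description, they can be written as:

```latex
\text{Minimize: } \mathcal{L}_{\mathrm{Total}}(\hat{\mathcal{X}}, \mathcal{P}, \mathcal{S}_{\mathrm{detach}}, -\lambda; \mathcal{X})
\qquad
\text{Maximize: } \mathcal{L}_{\mathrm{Total}}(\hat{\mathcal{X}}, \mathcal{P}_{\mathrm{detach}}, \mathcal{S}, \lambda; \mathcal{X})
```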

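In code, one update under this strategy might look like the following sketch, which reuses the `association_discrepancy` helper from Section 3.1.2 and assumes the model returns the reconstruction plus per-layer association matrices (the combined single-loss form and the `lam` value are our own simplifications):

```python
import torch.nn.functional as F

def train_step(model, x, optimizer, lam=3.0):
    x_rec, series, priors = model(x)
    rec_loss = F.mse_loss(x_rec, x)
    # minimize phase: the prior chases the gradient-stopped series-association
    dis_min = association_discrepancy(priors, [s.detach() for s in series]).mean()
    # maximize phase: the series-association moves away from the stopped prior
    dis_max = association_discrepancy([p.detach() for p in priors], series).mean()
    loss = rec_loss + lam * dis_min - lam * dis_max
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```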

We combine the normalized association discrepancy with the reconstruction error to form a new anomaly detection criterion:
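The criterion was rendered as an image; reconstructed from the description (⊙ is element-wise multiplication over the N time points), it is:

```latex
\mathrm{AnomalyScore}(\mathcal{X}) =
\mathrm{Softmax}\big( -\mathrm{AssDis}(\mathcal{P}, \mathcal{S}; \mathcal{X}) \big)
\odot \big[ \| \mathcal{X}_{i,:} - \hat{\mathcal{X}}_{i,:} \|_{2}^{2} \big]_{i=1,\dots,N}
```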

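As a sketch (again reusing the `association_discrepancy` helper, with assumed shapes):

```python
import torch
import torch.nn.functional as F

def anomaly_score(x, x_rec, priors, series):
    """Softmax-normalized negative discrepancy weighting the point-wise
    reconstruction error; higher values mean more likely anomalous."""
    dis = association_discrepancy(priors, series)   # (B, L)
    weight = F.softmax(-dis, dim=-1)                # large where discrepancy is small
    rec_err = ((x - x_rec) ** 2).sum(dim=-1)        # (B, L) per-point error
    return weight * rec_err
```

Because anomalies have a small association discrepancy and a large reconstruction error, both factors push their score up, which is what makes the combined criterion more discriminative than reconstruction error alone.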

4. Experiments

We validated the model on 5 datasets from different domains, covering applications such as server monitoring and space exploration. Anomaly Transformer achieves state-of-the-art results on all five benchmarks. For the baseline models and dataset descriptions, see the paper.


4.1 Ablation Experiments


To further verify the effectiveness of the proposed modules, we designed ablation experiments covering the proposed training strategy, the prior-association, and the new anomaly criterion.

4.2 Visual Analysis

For 5 different types of anomalies, we visualized how well they are separated under different anomaly criteria. A higher value means the point is more likely to be anomalous. The association-based criterion clearly provides more accurate discrimination.


For the same 5 anomaly categories, we also visualized the learned scale of the prior-association. At anomalous points the learned scale is relatively small compared with the rest of the series, meaning their association with non-adjacent parts is very weak. This matches our assumption that anomalies are hard to associate strongly with the entire series.


4.3 Optimization Strategies


To verify the effect of the optimization strategy on association learning, we compared training with the reconstruction error alone, directly maximizing the association discrepancy, and the minimax strategy.

As the table shows, directly maximizing the association discrepancy causes optimization problems for the Gaussian prior and degrades performance. The minimax strategy lets the prior-association impose a stronger constraint on the series-association, ultimately producing more discriminative results.

5. Summary

This paper focuses on unsupervised time series anomaly detection. It proposes Anomaly Transformer, an anomaly detection model based on the association discrepancy, and greatly improves the model's detection ability through a minimax learning strategy.

Anomaly Transformer shows excellent anomaly detection results in applications such as server monitoring, space exploration, and water treatment monitoring; the model is robust and has strong practical value.

Source: WeChat public account [THUML]

Author: Xu Jiehui

Illustration by Igor Kapustin from icons8

-The End-
