
CVPR'24's latest PromptAD won the first place in pixel-level anomaly detection!

Author: 3D Vision Workshop

Source: 3D Vision Workshop


This paper proposes a novel anomaly detection method called PromptAD, which addresses the challenge of having only normal samples in few-shot anomaly detection. The method guides detection by automatically learning prompts and rests on two key ideas: semantic concatenation, which constructs a large number of anomaly prompts from normal ones, and an explicit anomaly margin loss, in which the margin between normal and anomaly prompt features is explicitly controlled by a hyperparameter. Experiments show that PromptAD achieves significant improvements on both image-level and pixel-level anomaly detection, demonstrating its effectiveness in few-shot settings.

Let's take a look at this work together~

Paper title: PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection

Authors: Xiaofan Li, Zhizhong Zhang, et al.

Affiliation: East China Normal University, Shanghai, China, among others

Paper link: https://arxiv.org/pdf/2404.05231.pdf

Code link: https://github.com/FuNz-0/PromptAD

Vision-language models have brought large improvements to few-shot industrial anomaly detection, which normally requires hundreds of prompts designed through prompt engineering. To automate this, the authors first used the conventional multi-class prompt learning paradigm as a baseline for automatically learning prompts, but found that it works poorly in one-class anomaly detection. To address this, the paper proposes a one-class prompt learning method for few-shot anomaly detection, called PromptAD. First, it introduces semantic concatenation, which transposes the semantics of normal prompts into anomaly prompts by concatenating normal prompts with anomaly suffixes, thereby constructing a large number of negative prompts to guide prompt learning in the one-class setting. In addition, to mitigate the training difficulty caused by the absence of anomalous images, it introduces the concept of an explicit anomaly margin, which explicitly controls the margin between normal and anomaly prompt features through a hyperparameter. For image-level/pixel-level anomaly detection, PromptAD takes first place in 11/12 few-shot settings on MVTec and VisA. The code is available at https://github.com/FuNz-0/PromptAD.git.

Qualitative comparison of pixel-level anomaly detection (1-shot) on MVTec and VisA.

Additional qualitative results of PromptAD (1-shot) tested on MVTec and VisA.

Qualitative results of logical anomaly detection.

Qualitative results for minimal anomaly detection.

Main contributions:
  • This paper explores the feasibility of prompt learning in one-class anomaly detection and proposes a one-class prompt learning method called PromptAD, which clearly outperforms conventional multi-class prompt learning.
  • Semantic concatenation (SC) is proposed, which transposes the semantics of normal prompts by concatenating anomaly suffixes, thereby constructing enough negative prompts for the normal samples.
  • An explicit anomaly margin (EAM) is proposed, which explicitly controls the margin between normal and anomaly prompt features through a hyperparameter.
  • For image-level/pixel-level anomaly detection, PromptAD takes first place in 11/12 few-shot settings on MVTec and VisA.

The basic idea of this work is to use the CLIP model for few-shot anomaly detection. In anomaly detection, usually only normal samples are available for training, while anomalous samples must be identified at test time. To address this challenge, the paper proposes PromptAD.

  • CLIP and prompt learning: CLIP is a large-scale language-image pre-trained model known for its strong zero-shot classification ability. Prompt learning, which builds on successes in natural language processing, automatically learns effective prompts to improve CLIP's performance on downstream classification tasks.
  • Semantic concatenation (SC): To cope with the absence of anomalous samples during training, the authors propose SC. By concatenating a normal prompt with an anomaly suffix, a normal prompt is converted into an anomaly prompt, yielding enough contrastive prompts and improving the model's ability to learn anomaly information.
  • Explicit anomaly margin (EAM): Because no anomalous visual samples are available during training, the anomaly prompts (MAPs and LAPs) can learn only from normal visual features as negative samples, leaving no explicit margin between normal and anomaly prompt features. EAM therefore controls this margin through a hyperparameter, strengthening the model's ability to learn anomaly information during training.
  • Anomaly detection: At test time, the learned prompts are used for detection. Image-level and pixel-level scores are computed by combining visual guidance and prompt guidance, so that the anomalous parts of an image can be effectively localized.
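The semantic-concatenation idea can be sketched in a few lines. The template strings and anomaly suffixes below are illustrative assumptions, not the authors' exact prompt vocabulary; the point is that every normal prompt combined with every suffix yields a multiplicative number of anomaly prompts:

```python
# Sketch of Semantic Concatenation (SC): normal prompts become anomaly
# prompts by appending anomaly suffixes. Templates/suffixes are assumed.

NORMAL_TEMPLATES = [
    "a photo of a {}",
    "a cropped photo of a {}",
    "a bright photo of a {}",
]

ANOMALY_SUFFIXES = [
    "with a scratch",
    "with a crack",
    "with a missing part",
]

def build_prompts(class_name: str):
    """Return (normal_prompts, anomaly_prompts) for one object class."""
    normal = [t.format(class_name) for t in NORMAL_TEMPLATES]
    # SC: every normal prompt concatenated with every anomaly suffix
    # yields len(normal) * len(suffixes) anomaly prompts.
    anomaly = [f"{p} {s}" for p in normal for s in ANOMALY_SUFFIXES]
    return normal, anomaly

normal, anomaly = build_prompts("bottle")
print(len(normal), len(anomaly))  # 3 9
```

With realistic template and suffix lists, this is how a handful of normal prompts can be expanded into the "large number of negative prompts" the paper relies on.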
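The role of the margin hyperparameter can be illustrated with a hinge-style loss. This is an assumed form for illustration, not the paper's exact formula: it is zero once a normal image feature is at least `margin` closer (in cosine distance) to the normal prompt prototype than to the anomaly prototype.

```python
import numpy as np

def eam_loss(img_feats, normal_proto, anomaly_proto, margin=0.5):
    """Hinge-style margin loss (assumed form, for illustration):
    pushes normal image features at least `margin` closer to the
    normal prompt prototype than to the anomaly prototype."""
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    img = unit(img_feats)                  # (N, D) normal image features
    d_n = 1.0 - img @ unit(normal_proto)   # cosine distance to normal prototype
    d_a = 1.0 - img @ unit(anomaly_proto)  # cosine distance to anomaly prototype
    # zero loss once d_a exceeds d_n by at least `margin`
    return np.maximum(margin + d_n - d_a, 0.0).mean()
```

Since only normal images exist at training time, a fixed margin like this is what substitutes for the missing anomalous visual samples.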
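For the prompt-guided part of test-time scoring, a CLIP-style softmax over the best normal-prompt and best anomaly-prompt match gives a score in [0, 1]. This is a sketch of the general idea; the exact aggregation used in the paper may differ, and `tau` here is just the usual CLIP temperature.

```python
import numpy as np

def prompt_guided_score(img_feat, normal_feats, anomaly_feats, tau=0.07):
    """CLIP-style softmax anomaly score (illustrative sketch).
    Returns a value in [0, 1]; higher means more anomalous."""
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    f = unit(img_feat)
    s_n = (unit(normal_feats) @ f).max() / tau   # best normal-prompt match
    s_a = (unit(anomaly_feats) @ f).max() / tau  # best anomaly-prompt match
    return float(np.exp(s_a) / (np.exp(s_n) + np.exp(s_a)))
```

Applied per patch embedding instead of per image, the same computation yields a pixel-level anomaly map.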

This section compares PromptAD with state-of-the-art methods under different shot settings, reporting both image-level and pixel-level results. The experiments show that PromptAD achieves significant improvements in few-shot settings, particularly for image-level AD.

In terms of datasets, MVTec and VisA are used as benchmarks. Both datasets contain multiple subsets, each covering a single object category. Anomaly detection is treated as a one-class task, so the training set contains only normal samples, while the test set contains both normal and anomalous samples with image-level and pixel-level annotations.

In terms of evaluation metrics, the paper follows prior work and reports the area under the receiver operating characteristic curve (AUROC) for both image-level and pixel-level anomaly detection.
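AUROC has a simple rank-based reading that is worth keeping in mind when comparing these numbers: it is the probability that a randomly chosen anomalous sample scores higher than a randomly chosen normal one. A minimal implementation of that definition (ties counted as half a win):

```python
def auroc(scores, labels):
    """AUROC via the rank (Mann-Whitney U) statistic: the probability
    that a randomly chosen positive (anomalous) sample receives a
    higher score than a randomly chosen negative (normal) sample."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A perfect detector ranks every anomaly above every normal sample:
print(auroc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]))  # 1.0
```

In practice one would use `sklearn.metrics.roc_auc_score`, which computes the same quantity efficiently.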

In terms of implementation details, the paper uses the CLIP implementation and pre-trained parameters from OpenCLIP, with the default temperature hyperparameter τ. The authors use a ViT-B/16+ CLIP pre-trained on LAION-400M.

Experimental results show that PromptAD achieves significant improvements over other methods in the image-level comparison, especially in the lowest-shot settings. In the pixel-level comparison, PromptAD achieves the best results in the 1-shot and 2-shot settings, and also performs well in the 4-shot setting.

In addition, an ablation study verifies the effect of each module on PromptAD's performance. The results show that modules such as semantic concatenation and the explicit anomaly margin are essential for making prompt learning effective in anomaly detection.

Overall, the PromptAD method achieves significant performance gains in few-shot settings, especially for image-level AD. Through its adaptation of CLIP and its module designs, PromptAD delivers competitive results on both pixel-level and image-level anomaly detection tasks.


In this paper, the authors propose a novel anomaly detection method called PromptAD, which automatically learns prompts using only normal samples in few-shot anomaly detection scenarios. First, to address the one-class challenge, they propose semantic concatenation, which constructs enough anomaly prompts by concatenating normal prompts with anomaly suffixes to guide prompt learning. Second, they propose an explicit anomaly margin loss, which explicitly determines the margin between normal and anomaly prompt features through a hyperparameter. Finally, for image-level/pixel-level anomaly detection, PromptAD achieves first place in 11/12 few-shot settings.
