
YOLO Algorithm Improvement, Backbone Series: HorNet

Author: Nuist Object Detection

Recent progress on vision Transformers, driven by a new spatial modeling mechanism based on dot-product self-attention, has achieved great success on a variety of tasks. In this article, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range, and high-order spatial interactions, can also be implemented efficiently with a convolution-based framework. We propose Recursive Gated Convolution (gnConv), which performs high-order spatial interactions using gated convolutions and a recursive design. gnConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. On top of it, we build a new family of generic vision backbones named HorNet.

Extensive experiments on ImageNet classification, COCO object detection, and ADE20K semantic segmentation show that HorNet significantly outperforms Swin Transformer and ConvNeXt under similar overall architectures and training configurations. HorNet also scales well to more training data and larger model sizes. Beyond its effectiveness as a visual encoder, gnConv can also be applied to task-specific decoders, where it consistently improves dense prediction performance with less computation. Our results suggest that gnConv can serve as a new basic module for visual modeling that effectively combines the strengths of vision Transformers and CNNs.

The following diagram illustrates the core idea of the paper: it analyzes, for different operations, the interaction between a feature (red block) and its surrounding region (gray blocks). (a) A plain convolution does not explicitly model spatial interaction. (b) Dynamic convolution uses dynamic weights to account for information exchange with the surrounding region, which makes the model more expressive. (c) Self-attention achieves second-order spatial interaction through two successive matrix multiplications among queries, keys, and values. (d) The method proposed in this paper efficiently realizes spatial interaction of arbitrary order by means of gated convolution and recursion. The trend across these basic operations suggests that the expressive power of a model can be improved by increasing the order of its spatial interactions.

[Figure: spatial interaction order of standard convolution, dynamic convolution, self-attention, and gnConv]

The structure of the gated convolution is shown in the figure below, with the number of output channels indicated in parentheses. Gated convolution first adjusts the number of feature channels with a convolutional projection layer. The output of the depthwise separable convolution is then split into several chunks along the channel dimension, and each chunk is multiplied element-wise with the features produced by the interaction of the previous chunk, finally yielding the output features. The recursion here is this repeated element-wise multiplication: with the recursive design, the later chunks carry more high-order information, so sufficient feature interaction takes place at higher orders.

[Figure: structure of gnConv (recursive gated convolution); output channel counts are given in parentheses]
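To make the recursive gating concrete, here is a minimal PyTorch sketch of a gnConv layer. It follows the structure described above (channel expansion, depthwise convolution, channel-wise splitting, and recursive element-wise gating), but it is a simplified illustration rather than the official implementation; details such as the paper's scaling factor and optional global-filter branch are omitted.

```python
import torch
import torch.nn as nn


class gnConv(nn.Module):
    """Minimal sketch of recursive gated convolution (order-n spatial interaction)."""

    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        # channel widths per recursion step: [dim/2^(order-1), ..., dim/2, dim]
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)       # expand to 2*dim channels
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))   # depthwise conv over all chunks
        # 1x1 convs that lift the running feature to the next chunk width
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], kernel_size=1) for i in range(order - 1)
        )
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        x = self.proj_in(x)                                          # (B, 2*dim, H, W)
        p0, q = torch.split(x, (self.dims[0], sum(self.dims)), dim=1)
        q = self.dwconv(q)
        q_list = torch.split(q, self.dims, dim=1)                    # one chunk per interaction order
        out = p0 * q_list[0]                                         # first gating step
        for i in range(self.order - 1):                              # recursion: order grows by one per step
            out = self.pws[i](out) * q_list[i + 1]
        return self.proj_out(out)
```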

As shown in the following table, the authors adopt the typical four-stage Transformer architecture and replace self-attention with gnConv. The number of blocks per stage follows Swin, with one extra block added to stage 2 so that the overall complexity stays comparable, giving block counts of [2, 3, 18, 2]. Across the four stages, the spatial interaction orders of gnConv are [2, 3, 4, 5], and the channel widths are [C, 2C, 4C, 8C].

[Table: HorNet four-stage architecture]
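Expressed as hyper-parameters, the layout quoted above looks roughly like the small snippet below (the variable names are mine, and C = 64 is only an example base width):

```python
# Four-stage HorNet-style layout from the text above
depths = [2, 3, 18, 2]                       # blocks per stage (Swin-like, plus one extra block in stage 2)
orders = [2, 3, 4, 5]                        # gnConv spatial-interaction order used in each stage
base_c = 64                                  # "C"; an example base width
dims   = [base_c * m for m in (1, 2, 4, 8)]  # channel widths per stage: [C, 2C, 4C, 8C]
```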

Tutorial for adding this model as a backbone to a YOLOv5 project:

(1) In models/yolo.py of the YOLOv5 project, modify the parse_model function and the _forward_once function of BaseModel.

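The original post shows these edits as screenshots, which are not reproduced here. Below is one common way such tutorials adapt BaseModel._forward_once so that a backbone module returning a list of multi-scale feature maps can sit in layer 0; treat it as an illustrative sketch of the idea, not the post's exact code (the matching parse_model change is sketched in step (3)).

```python
# In models/yolo.py, class BaseModel — sketch of a modified _forward_once.
# The only change from stock YOLOv5 is the isinstance(x, list) branch: when the
# backbone returns several feature maps, every scale is stored in y so that later
# layers in the yaml can reference them through their "from" indices.
def _forward_once(self, x, profile=False, visualize=False):
    y, dt = [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from the previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
        if profile:
            self._profile_one_layer(m, x, dt)
        x = m(x)  # run the module
        if isinstance(x, list):          # NEW: backbone produced multi-scale features
            y.extend(x)                  # keep every scale
        else:
            y.append(x if m.i in self.save else None)  # save output (unchanged)
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
    return x
```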

(2) Create a new Hornet.py under the models/backbone folder and add the following code:

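The code in the screenshot is not available here; the sketch below shows one plausible shape for a Hornet.py backbone that YOLOv5 can consume, reusing the gnConv sketch from earlier. The class names, the constructor signature, and the use of BatchNorm in place of the paper's LayerNorm are my simplifications; prefer the official HorNet implementation for real use.

```python
# hypothetical models/backbone/Hornet.py — a compact sketch, not the official implementation
import torch.nn as nn
# gnConv: use the sketch shown earlier in this article, or the official gnconv


class HorBlock(nn.Module):
    """HorNet block: gnConv token mixing + pointwise-conv MLP, each with a residual."""

    def __init__(self, dim, order=3, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)           # the paper uses LayerNorm; BN keeps the sketch simple
        self.gnconv = gnConv(dim, order=order)
        self.norm2 = nn.BatchNorm2d(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, mlp_ratio * dim, 1), nn.GELU(),
            nn.Conv2d(mlp_ratio * dim, dim, 1),
        )

    def forward(self, x):
        x = x + self.gnconv(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


class HorNet(nn.Module):
    """Four-stage backbone returning the stride-8/16/32 feature maps for the YOLOv5 neck."""

    def __init__(self, base_dim=64, depths=(2, 3, 18, 2), orders=(2, 3, 4, 5)):
        super().__init__()
        dims = [base_dim * m for m in (1, 2, 4, 8)]
        self.downsamples = nn.ModuleList()
        self.stages = nn.ModuleList()
        in_ch = 3
        for i, (dim, depth, order) in enumerate(zip(dims, depths, orders)):
            stride = 4 if i == 0 else 2            # stem downsamples by 4, later stages by 2
            self.downsamples.append(nn.Conv2d(in_ch, dim, kernel_size=stride, stride=stride))
            self.stages.append(nn.Sequential(*[HorBlock(dim, order) for _ in range(depth)]))
            in_ch = dim
        self.channel = dims[1:]                    # output channels of the three returned scales

    def forward(self, x):
        outs = []
        for i, (down, stage) in enumerate(zip(self.downsamples, self.stages)):
            x = stage(down(x))
            if i > 0:                              # keep stride-8/16/32 features for the neck
                outs.append(x)
        return outs
```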

(3) Import the model in models/yolo.py and modify the parse_model function accordingly (import the file first):

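The exact edit is again a screenshot in the original post. A typical pattern, assuming the Hornet.py sketched above, looks roughly like the excerpt below: the backbone is imported at the top of models/yolo.py, and a branch inside parse_model builds it from the yaml arguments and records its output channels.

```python
# models/yolo.py — near the other model imports (path assumes step (2) above)
from models.backbone.Hornet import HorNet

# Inside parse_model(), within the long if/elif chain that maps each yaml entry to a
# module, a branch for whole-backbone modules can be added, for example:
#
#   elif m is HorNet:
#       m_ = m(*args)        # build the backbone directly from its yaml args
#       c2 = m_.channel      # list of output channels, one per returned feature map,
#                            # so the following neck/head layers get correct widths
#
# Some tutorials also skip the generic module-wrapping line for this branch; the rest
# of parse_model (save-list bookkeeping, layer logging) stays unchanged.
```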

(4) Create a new configuration file under the models folder: yolov5_hornet.yaml

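The actual yolov5_hornet.yaml is shown as a screenshot in the original post. The fragment below is a hypothetical sketch of what such a file can look like: a single HorNet entry replaces the whole CSP backbone, and the head section reuses the standard yolov5s.yaml head with its "from" indices re-pointed at the backbone's three output scales.

```yaml
# yolov5_hornet.yaml — hypothetical sketch, not the post's exact file
nc: 80                 # number of classes
depth_multiple: 1.0
width_multiple: 1.0
anchors:
  - [10,13, 16,30, 33,23]        # P3/8
  - [30,61, 62,45, 59,119]       # P4/16
  - [116,90, 156,198, 373,326]   # P5/32

backbone:
  # one module replaces the whole backbone; it returns the stride-8/16/32 feature
  # maps, which the modified _forward_once stores as outputs 0, 1 and 2
  [[-1, 1, HorNet, [64]]]        # arg: base channel width C (assumed constructor signature)

head:
  # reuse the standard yolov5s.yaml head here, changing its "from" indices so the
  # FPN/PAN layers read from outputs 0, 1 and 2 of the HorNet backbone
```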

(5) Run a verification: set the --cfg argument in models/yolo.py to the newly created yolov5_hornet.yaml.

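Assuming the yaml was saved under models/, the check can be run from the project root roughly as follows; models/yolo.py builds the model from the given configuration and prints a layer and parameter summary, which confirms that the HorNet backbone is wired in correctly.

```bash
python models/yolo.py --cfg models/yolov5_hornet.yaml
```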
