
YOLO Algorithm Improvement, Backbone Series: HorNet

Author: Nuist Object Detection

Recent progress on vision Transformers, driven by a new spatial modeling mechanism based on dot-product self-attention, has achieved great success on a variety of tasks. In this article, we show that the key ingredients behind vision Transformers, namely input-adaptive, long-range, and high-order spatial interactions, can also be implemented efficiently with a convolution-based framework. We propose Recursive Gated Convolution (gnConv), which performs high-order spatial interactions using gated convolutions and a recursive design. gnConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models. On top of it, we build a new family of generic vision backbones named HorNet.

Extensive experiments on ImageNet classification, COCO object detection, and ADE20K semantic segmentation show that HorNet significantly outperforms Swin Transformer and ConvNeXt under similar overall architectures and training configurations. HorNet also scales well to more training data and larger model sizes. Beyond its effectiveness as a visual encoder, gnConv can also be applied to task-specific decoders, where it consistently improves dense prediction performance with less computation. Our results suggest that gnConv can serve as a new basic module for visual modeling that effectively combines the strengths of vision Transformers and CNNs.

The following diagram illustrates the core idea of the paper: it analyzes, for different operations, the interaction between a feature (red block) and its surrounding region (gray blocks). (a) A plain convolution does not explicitly model spatial interaction. (b) Dynamic convolution uses dynamic weights to account for information exchange with the surrounding region, which makes the model more expressive. (c) Self-attention achieves second-order spatial interaction through two successive matrix multiplications among queries, keys, and values. (d) The method proposed in this paper efficiently realizes spatial interaction of arbitrary order by means of gated convolution and recursion. The trend across these basic operations suggests that the expressive power of a model can be improved by increasing the order of its spatial interactions.

[Figure: spatial interaction order of standard convolution, dynamic convolution, self-attention, and gnConv]

The structure of the gated convolution is shown in the figure below, with the number of output channels indicated in parentheses. Gated convolution first adjusts the number of feature channels with a convolutional projection layer. The output of the depthwise separable convolution is then split into several chunks along the channel dimension, and each chunk is multiplied element-wise with the features produced by the interaction of the previous chunk, finally yielding the output features. The recursion here is this repeated element-wise multiplication: with the recursive design, the later chunks carry more high-order information, so sufficient feature interaction takes place at higher orders.

[Figure: structure of gnConv (recursive gated convolution); output channel counts are given in parentheses]
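To make the recursive gating concrete, here is a minimal PyTorch sketch of a gnConv layer. It follows the structure described above (channel expansion, depthwise convolution, channel-wise splitting, and recursive element-wise gating), but it is a simplified illustration rather than the official implementation; details such as the paper's scaling factor and optional global-filter branch are omitted.

```python
import torch
import torch.nn as nn


class gnConv(nn.Module):
    """Minimal sketch of recursive gated convolution (order-n spatial interaction)."""

    def __init__(self, dim, order=3):
        super().__init__()
        self.order = order
        # channel widths per recursion step: [dim/2^(order-1), ..., dim/2, dim]
        self.dims = [dim // 2 ** i for i in range(order)][::-1]
        self.proj_in = nn.Conv2d(dim, 2 * dim, kernel_size=1)       # expand to 2*dim channels
        self.dwconv = nn.Conv2d(sum(self.dims), sum(self.dims), kernel_size=7,
                                padding=3, groups=sum(self.dims))   # depthwise conv over all chunks
        # 1x1 convs that lift the running feature to the next chunk width
        self.pws = nn.ModuleList(
            nn.Conv2d(self.dims[i], self.dims[i + 1], kernel_size=1) for i in range(order - 1)
        )
        self.proj_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x):
        x = self.proj_in(x)                                          # (B, 2*dim, H, W)
        p0, q = torch.split(x, (self.dims[0], sum(self.dims)), dim=1)
        q = self.dwconv(q)
        q_list = torch.split(q, self.dims, dim=1)                    # one chunk per interaction order
        out = p0 * q_list[0]                                         # first gating step
        for i in range(self.order - 1):                              # recursion: order grows by one per step
            out = self.pws[i](out) * q_list[i + 1]
        return self.proj_out(out)
```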

As shown in the following table, the authors adopt the typical four-stage Transformer architecture and replace self-attention with gnConv. The number of blocks per stage follows Swin, with one extra block added to stage 2 so that the overall complexity stays comparable, giving block counts of [2, 3, 18, 2]. Across the four stages, the spatial interaction orders of gnConv are [2, 3, 4, 5], and the channel widths are [C, 2C, 4C, 8C].

[Table: HorNet four-stage architecture]
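Expressed as hyper-parameters, the layout quoted above looks roughly like the small snippet below (the variable names are mine, and C = 64 is only an example base width):

```python
# Four-stage HorNet-style layout from the text above
depths = [2, 3, 18, 2]                       # blocks per stage (Swin-like, plus one extra block in stage 2)
orders = [2, 3, 4, 5]                        # gnConv spatial-interaction order used in each stage
base_c = 64                                  # "C"; an example base width
dims   = [base_c * m for m in (1, 2, 4, 8)]  # channel widths per stage: [C, 2C, 4C, 8C]
```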

Tutorial for adding this model as a backbone to a YOLOv5 project:

(1) In models/yolo.py of the YOLOv5 project, modify the parse_model function and the _forward_once function of BaseModel.

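The original post shows these edits as screenshots, which are not reproduced here. Below is one common way such tutorials adapt BaseModel._forward_once so that a backbone module returning a list of multi-scale feature maps can sit in layer 0; treat it as an illustrative sketch of the idea, not the post's exact code (the matching parse_model change is sketched in step (3)).

```python
# In models/yolo.py, class BaseModel — sketch of a modified _forward_once.
# The only change from stock YOLOv5 is the isinstance(x, list) branch: when the
# backbone returns several feature maps, every scale is stored in y so that later
# layers in the yaml can reference them through their "from" indices.
def _forward_once(self, x, profile=False, visualize=False):
    y, dt = [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from the previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
        if profile:
            self._profile_one_layer(m, x, dt)
        x = m(x)  # run the module
        if isinstance(x, list):          # NEW: backbone produced multi-scale features
            y.extend(x)                  # keep every scale
        else:
            y.append(x if m.i in self.save else None)  # save output (unchanged)
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
    return x
```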

(2) Create a new Hornet.py under the models/backbone folder and add the following code:

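The code in the screenshot is not available here; the sketch below shows one plausible shape for a Hornet.py backbone that YOLOv5 can consume, reusing the gnConv sketch from earlier. The class names, the constructor signature, and the use of BatchNorm in place of the paper's LayerNorm are my simplifications; prefer the official HorNet implementation for real use.

```python
# hypothetical models/backbone/Hornet.py — a compact sketch, not the official implementation
import torch.nn as nn
# gnConv: use the sketch shown earlier in this article, or the official gnconv


class HorBlock(nn.Module):
    """HorNet block: gnConv token mixing + pointwise-conv MLP, each with a residual."""

    def __init__(self, dim, order=3, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.BatchNorm2d(dim)           # the paper uses LayerNorm; BN keeps the sketch simple
        self.gnconv = gnConv(dim, order=order)
        self.norm2 = nn.BatchNorm2d(dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, mlp_ratio * dim, 1), nn.GELU(),
            nn.Conv2d(mlp_ratio * dim, dim, 1),
        )

    def forward(self, x):
        x = x + self.gnconv(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x


class HorNet(nn.Module):
    """Four-stage backbone returning the stride-8/16/32 feature maps for the YOLOv5 neck."""

    def __init__(self, base_dim=64, depths=(2, 3, 18, 2), orders=(2, 3, 4, 5)):
        super().__init__()
        dims = [base_dim * m for m in (1, 2, 4, 8)]
        self.downsamples = nn.ModuleList()
        self.stages = nn.ModuleList()
        in_ch = 3
        for i, (dim, depth, order) in enumerate(zip(dims, depths, orders)):
            stride = 4 if i == 0 else 2            # stem downsamples by 4, later stages by 2
            self.downsamples.append(nn.Conv2d(in_ch, dim, kernel_size=stride, stride=stride))
            self.stages.append(nn.Sequential(*[HorBlock(dim, order) for _ in range(depth)]))
            in_ch = dim
        self.channel = dims[1:]                    # output channels of the three returned scales

    def forward(self, x):
        outs = []
        for i, (down, stage) in enumerate(zip(self.downsamples, self.stages)):
            x = stage(down(x))
            if i > 0:                              # keep stride-8/16/32 features for the neck
                outs.append(x)
        return outs
```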

(3) Import the model in models/yolo.py and modify the parse_model function accordingly (import the file first):

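The exact edit is again a screenshot in the original post. A typical pattern, assuming the Hornet.py sketched above, looks roughly like the excerpt below: the backbone is imported at the top of models/yolo.py, and a branch inside parse_model builds it from the yaml arguments and records its output channels.

```python
# models/yolo.py — near the other model imports (path assumes step (2) above)
from models.backbone.Hornet import HorNet

# Inside parse_model(), within the long if/elif chain that maps each yaml entry to a
# module, a branch for whole-backbone modules can be added, for example:
#
#   elif m is HorNet:
#       m_ = m(*args)        # build the backbone directly from its yaml args
#       c2 = m_.channel      # list of output channels, one per returned feature map,
#                            # so the following neck/head layers get correct widths
#
# Some tutorials also skip the generic module-wrapping line for this branch; the rest
# of parse_model (save-list bookkeeping, layer logging) stays unchanged.
```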

(4) Create a new configuration file under the models folder: yolov5_hornet.yaml

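The actual yolov5_hornet.yaml is shown as a screenshot in the original post. The fragment below is a hypothetical sketch of what such a file can look like: a single HorNet entry replaces the whole CSP backbone, and the head section reuses the standard yolov5s.yaml head with its "from" indices re-pointed at the backbone's three output scales.

```yaml
# yolov5_hornet.yaml — hypothetical sketch, not the post's exact file
nc: 80                 # number of classes
depth_multiple: 1.0
width_multiple: 1.0
anchors:
  - [10,13, 16,30, 33,23]        # P3/8
  - [30,61, 62,45, 59,119]       # P4/16
  - [116,90, 156,198, 373,326]   # P5/32

backbone:
  # one module replaces the whole backbone; it returns the stride-8/16/32 feature
  # maps, which the modified _forward_once stores as outputs 0, 1 and 2
  [[-1, 1, HorNet, [64]]]        # arg: base channel width C (assumed constructor signature)

head:
  # reuse the standard yolov5s.yaml head here, changing its "from" indices so the
  # FPN/PAN layers read from outputs 0, 1 and 2 of the HorNet backbone
```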

(5) Run a verification: set the --cfg argument in models/yolo.py to the newly created yolov5_hornet.yaml.

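Assuming the yaml was saved under models/, the check can be run from the project root roughly as follows; models/yolo.py builds the model from the given configuration and prints a layer and parameter summary, which confirms that the HorNet backbone is wired in correctly.

```bash
python models/yolo.py --cfg models/yolov5_hornet.yaml
```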
