
Drone-YOLO: An effective UAV image object detection

Author: Institute of Computer Vision


Object detection in UAV images is an important foundation for various research fields. However, drone imagery presents unique challenges, including large image size, small object size, dense distribution, overlapping instances, and insufficient illumination, all of which affect the effectiveness of object detection.

01

Overview

In today's sharing, we propose Drone-YOLO, a series of multi-scale UAV image object detection algorithms based on the YOLOv8 model, designed to overcome the specific challenges of object detection in UAV imagery. To address the problem of large scene sizes and small detection objects, we improve the neck component of the YOLOv8 model. Specifically, we adopt a three-layer PAFPN structure combined with a detection head tailored for small objects on large-scale feature maps, which significantly enhances the algorithm's ability to detect small objects. In addition, we integrate a sandwich fusion module into each layer of the neck's top-down and bottom-up branches. This fusion mechanism combines neck features with low-level backbone features, providing the detection heads at different layers with rich spatial information about the objects. We implement this fusion with depthwise separable convolutions, which balance parameter cost against a large receptive field. In the network backbone, we use RepVGG modules as downsampling layers, which enhance the network's ability to learn multi-scale features and outperform conventional convolutional layers.
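To make the depthwise separable convolutions mentioned above concrete, here is a minimal PyTorch sketch (not the paper's code; class and variable names are our own). A depthwise convolution filters each channel independently, and a pointwise 1×1 convolution then mixes channels, giving far fewer parameters than a dense K×K convolution at the same receptive field:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (per-channel spatial filtering) followed by a
    pointwise 1x1 conv (channel mixing)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch makes each filter see only its own channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 80, 80)
y = DepthwiseSeparableConv(64, 128)(x)
print(y.shape)  # torch.Size([1, 128, 80, 80])
```

For these sizes, the separable version needs 64·9 + 64·128 parameters versus 64·128·9 for a dense 3×3 convolution, roughly a 9× reduction.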

The proposed Drone-YOLO method is evaluated in ablation experiments and compared with other state-of-the-art methods on the VisDrone2019 dataset. The results show that Drone-YOLO (L) outperforms the baseline methods in object detection accuracy. Compared with YOLOv8, our method achieves significant improvements in the mAP@0.5 metric: a 13.4% increase on VisDrone2019-test and a 17.4% increase on VisDrone2019-val. In addition, the parameter-efficient Drone-YOLO (tiny), with only 5.25M parameters, performs comparably to or better than baseline methods with 9.66M parameters. These experiments verify the effectiveness of Drone-YOLO for object detection in UAV images.

02

Background

In the past 15 years, as UAV control technology has matured, UAV remote sensing imagery has become an important data source in low-altitude remote sensing research thanks to its cost-effectiveness and easy accessibility. Over the same period, deep neural network methods have been extensively studied and have gradually become the best-performing approach for tasks such as image classification, object detection, and image segmentation. However, most widely used deep neural network models, such as VGG, ResNet, U-Net, and PSPNet, were mainly developed and validated on manually collected image datasets such as VOC2007, VOC2012, and MS-COCO, as shown in the figure below.

[Figure: sample images from manually collected datasets such as VOC and MS-COCO]

Images captured by drones differ significantly from images taken manually at ground level. Examples of drone-captured images are shown below:

[Figure: example UAV images]

In addition to these image characteristics, UAV remote sensing object detection has two common application scenarios. The first involves post-flight data processing on large desktop computers: after the flight, the captured data is processed on a desktop machine. The second involves real-time processing during flight, where an embedded computer on the drone processes aerial image data as it is captured; this is commonly used for obstacle avoidance and automatic mission planning. Object detection networks therefore need to meet the different requirements of each scenario. Methods for desktop environments require high detection accuracy. Methods for embedded environments must keep model parameters within a certain range to meet the constraints of the embedded hardware; once those constraints are met, detection accuracy should still be as high as possible.

Therefore, neural network methods for object detection in UAV remote sensing images need to adapt to the specific features of these data. They should either be designed for post-flight data processing, providing high precision and recall, or be designed as models with smaller parameter counts that can be deployed in embedded hardware environments for real-time processing on drones.

03

Introduction to the new framework design

The figure below shows the architecture of our proposed Drone-YOLO (L) network, an improvement on the YOLOv8-l model. In the backbone, we use the reparameterized convolutional module of the RepVGG structure as the downsampling layer. During training, this structure trains a 3×3 and a 1×1 convolution in parallel; during inference, the two kernels are merged into a single 3×3 convolutional layer. This mechanism enables the network to learn more robust features without slowing inference or enlarging the model. At the neck, we extend the PAFPN structure to three layers and attach a detection head for small objects. Combined with the proposed sandwich fusion module, spatial and channel features are extracted from three feature maps at different layers of the backbone. This enhances the ability of the multi-scale detection heads to gather spatial localization information about the objects to be detected.

[Figure: Drone-YOLO (L) network architecture]
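The kernel merging described above can be sketched in a few lines. This is a simplified illustration of RepVGG-style reparameterization (bias terms and batch-norm folding omitted for brevity), not the actual Drone-YOLO implementation: the 1×1 kernel is zero-padded to 3×3 and added to the 3×3 kernel, so one convolution computes the same function as the two-branch sum.

```python
import torch
import torch.nn.functional as F

def merge_repvgg_branches(w3x3, w1x1):
    """Fold a parallel 1x1 branch into the 3x3 kernel for inference:
    pad the 1x1 kernel to 3x3 (centered) and add it to the 3x3 kernel."""
    return w3x3 + F.pad(w1x1, [1, 1, 1, 1])  # pad kernel H and W by 1 each side

# Check equivalence on random data (bias-free, no BN, for brevity).
x = torch.randn(1, 8, 16, 16)
w3 = torch.randn(4, 8, 3, 3)
w1 = torch.randn(4, 8, 1, 1)
two_branch = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1)
merged = F.conv2d(x, merge_repvgg_branches(w3, w1), padding=1)
print(torch.allclose(two_branch, merged, atol=1e-4))  # True
```

Because the merge happens once, before deployment, the extra 1×1 branch improves training without any inference-time cost.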

As shown in the figure below, we propose sandwich fusion (SF), a new module that fuses feature maps from three stages to optimize the spatial and semantic information supplied to the detection heads. The module is applied in the top-down branch of the neck and is inspired by the BiC module proposed in YOLOv6 v3.0: A Full-Scale Reloading. As shown in the figure, the inputs to SF are the backbone feature maps from the lower stage, the corresponding stage, and the higher stage. The goal is to balance the spatial information of low-level features against the semantic information of high-level features, so that the network heads can better localize and classify targets.

[Figure: structure of the sandwich fusion (SF) module]
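A minimal sketch of the sandwich-fusion idea, assuming one plausible layout: downsample the lower-stage (higher-resolution) map, upsample the higher-stage (lower-resolution) map, concatenate all three at the middle resolution, and fuse. The module name, channel choices, and use of a depthwise separable downsampling path are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SandwichFusion(nn.Module):
    """Illustrative sketch: fuse lower-stage, same-stage, and
    higher-stage backbone feature maps at the middle resolution."""
    def __init__(self, c_low, c_mid, c_high, c_out):
        super().__init__()
        # Downsample the low-level map with a depthwise separable conv (stride 2).
        self.down = nn.Sequential(
            nn.Conv2d(c_low, c_low, 3, stride=2, padding=1, groups=c_low, bias=False),
            nn.Conv2d(c_low, c_mid, 1, bias=False),
        )
        # Upsample the high-level map to the middle resolution.
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(c_mid + c_mid + c_high, c_out, 1, bias=False)

    def forward(self, f_low, f_mid, f_high):
        x = torch.cat([self.down(f_low), f_mid, self.up(f_high)], dim=1)
        return self.fuse(x)

sf = SandwichFusion(64, 128, 256, 128)
out = sf(torch.randn(1, 64, 80, 80),    # lower stage: more spatial detail
         torch.randn(1, 128, 40, 40),   # corresponding stage
         torch.randn(1, 256, 20, 20))   # higher stage: more semantics
print(out.shape)  # torch.Size([1, 128, 40, 40])
```

The fused map keeps the middle stage's resolution while mixing in fine spatial cues from below and semantic context from above, which is the balance the SF module aims for.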

04

Experiments and deployment results

For the experiments, we used Ubuntu 20.04 as the operating system, with Python 3.8, PyTorch 1.16.0, and CUDA 11.6 as the software environment, and an NVIDIA RTX 3080 Ti GPU as hardware. The neural network implementation is modified from Ultralytics version 8.0.105. The same hyperparameters were used for training, testing, and validation. Training runs for 300 epochs, and input images are rescaled to 640×640. In the results listed below, all YOLOv8 and Drone-YOLO numbers come from our own runs, and none of the networks use pre-trained weights.

For the embedded application experiments, we used an NVIDIA Tegra TX2, which has a 256-core NVIDIA Pascal architecture GPU delivering 1.33 TFLOPS of peak computing performance and 8 GB of memory. The software environment is Ubuntu 18.04 LTS with NVIDIA JetPack 4.4.1, CUDA 10.2, and cuDNN 8.0.0.

Results on VisDrone2019-test

[Table: results on VisDrone2019-test]

Results on the NVIDIA Tegra TX2

[Tables: results on the NVIDIA Tegra TX2]

Drone-YOLO in action

[Figure: Drone-YOLO detection results]

On the left are the YOLOv8 results; most of the targets in the red boxes are not detected.

[Figures: detection comparison between YOLOv8 and Drone-YOLO]

Paper: www.mdpi.com/2504-446X/7/8/526


ABOUT

Computer Vision Research Institute

The Computer Vision Research Institute focuses on deep learning, mainly in object detection, object tracking, image segmentation, OCR, model quantization, and model deployment. The Institute shares the latest paper algorithms and frameworks daily, provides one-click paper downloads, and shares practical projects. It emphasizes both technical research and practical deployment, sharing hands-on experience across different fields so that readers can move beyond pure theory and build habits of hands-on programming and active thinking!
