
Drone-YOLO: An effective UAV image object detection

Author: Institute of Computer Vision


Object detection in UAV images is an important foundation for various research fields. However, drone imagery presents unique challenges, including large image size, small object size, dense distribution, overlapping instances, and insufficient illumination, all of which affect the effectiveness of object detection.

01

Overview

In today's sharing, we propose Drone-YOLO, a series of multi-scale UAV image object detection algorithms based on the YOLOv8 model, designed to overcome the specific challenges of object detection in UAV imagery. To address the problem of large scene sizes and small detection objects, we improve the neck component of the YOLOv8 model. Specifically, we adopt a three-layer PAFPN structure combined with a detection head tailored for small objects on large-scale feature maps, which significantly enhances the algorithm's ability to detect small objects. In addition, we integrate a sandwich fusion module into each layer of the neck's top-down and bottom-up branches. This fusion mechanism combines neck features with low-level backbone features, providing the detection heads at different layers with rich spatial information about the objects. We implement this fusion with depthwise separable convolutions, which balance parameter cost against a large receptive field. In the network backbone, we use RepVGG modules as downsampling layers, which enhance the network's ability to learn multi-scale features and outperform conventional convolutional layers.
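To make the depthwise separable convolutions mentioned above concrete, here is a minimal PyTorch sketch (not the paper's code; class and variable names are our own). A depthwise convolution filters each channel independently, and a pointwise 1×1 convolution then mixes channels, giving far fewer parameters than a dense K×K convolution at the same receptive field:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise conv (per-channel spatial filtering) followed by a
    pointwise 1x1 conv (channel mixing)."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # groups=in_ch makes each filter see only its own channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, padding=k // 2,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

x = torch.randn(1, 64, 80, 80)
y = DepthwiseSeparableConv(64, 128)(x)
print(y.shape)  # torch.Size([1, 128, 80, 80])
```

For these sizes, the separable version needs 64·9 + 64·128 parameters versus 64·128·9 for a dense 3×3 convolution, roughly a 9× reduction.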

The proposed Drone-YOLO method is evaluated in ablation experiments and compared with other state-of-the-art methods on the VisDrone2019 dataset. The results show that Drone-YOLO (L) outperforms the baseline methods in object detection accuracy. Compared with YOLOv8, our method achieves significant improvements in the mAP@0.5 metric: a 13.4% increase on VisDrone2019-test and a 17.4% increase on VisDrone2019-val. In addition, the parameter-efficient Drone-YOLO (tiny), with only 5.25M parameters, performs comparably to or better than baseline methods with 9.66M parameters. These experiments verify the effectiveness of Drone-YOLO for object detection in UAV images.

02

Background

In the past 15 years, as UAV control technology has matured, UAV remote sensing imagery has become an important data source in low-altitude remote sensing research thanks to its cost-effectiveness and easy accessibility. Over the same period, deep neural network methods have been extensively studied and have gradually become the best-performing approach for tasks such as image classification, object detection, and image segmentation. However, most widely used deep neural network models, such as VGG, ResNet, U-Net, and PSPNet, were mainly developed and validated on manually collected image datasets such as VOC2007, VOC2012, and MS-COCO, as shown in the figure below.

[Figure: sample images from manually collected datasets such as VOC and MS-COCO]

Images captured by drones differ significantly from images taken manually at ground level. Examples of drone-captured images are shown below:

[Figure: example UAV images]

In addition to these image characteristics, UAV remote sensing object detection has two common application scenarios. The first involves post-flight data processing on large desktop computers: after the flight, the captured data is processed on a desktop machine. The second involves real-time processing during flight, where an embedded computer on the drone processes aerial image data as it is captured; this is commonly used for obstacle avoidance and automatic mission planning. Object detection networks therefore need to meet the different requirements of each scenario. Methods for desktop environments require high detection accuracy. Methods for embedded environments must keep model parameters within a certain range to meet the constraints of the embedded hardware; once those constraints are met, detection accuracy should still be as high as possible.

Therefore, neural network methods for object detection in UAV remote sensing images need to adapt to the specific features of these data. They should either be designed for post-flight data processing, providing high precision and recall, or be designed as models with smaller parameter counts that can be deployed in embedded hardware environments for real-time processing on drones.

03

Introduction to the new framework design

The figure below shows the architecture of our proposed Drone-YOLO (L) network, an improvement on the YOLOv8-l model. In the backbone, we use the reparameterized convolutional module of the RepVGG structure as the downsampling layer. During training, this structure trains a 3×3 and a 1×1 convolution in parallel; during inference, the two kernels are merged into a single 3×3 convolutional layer. This mechanism enables the network to learn more robust features without slowing inference or enlarging the model. At the neck, we extend the PAFPN structure to three layers and attach a detection head for small objects. Combined with the proposed sandwich fusion module, spatial and channel features are extracted from three feature maps at different layers of the backbone. This enhances the ability of the multi-scale detection heads to gather spatial localization information about the objects to be detected.

[Figure: Drone-YOLO (L) network architecture]
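The kernel merging described above can be sketched in a few lines. This is a simplified illustration of RepVGG-style reparameterization (bias terms and batch-norm folding omitted for brevity), not the actual Drone-YOLO implementation: the 1×1 kernel is zero-padded to 3×3 and added to the 3×3 kernel, so one convolution computes the same function as the two-branch sum.

```python
import torch
import torch.nn.functional as F

def merge_repvgg_branches(w3x3, w1x1):
    """Fold a parallel 1x1 branch into the 3x3 kernel for inference:
    pad the 1x1 kernel to 3x3 (centered) and add it to the 3x3 kernel."""
    return w3x3 + F.pad(w1x1, [1, 1, 1, 1])  # pad kernel H and W by 1 each side

# Check equivalence on random data (bias-free, no BN, for brevity).
x = torch.randn(1, 8, 16, 16)
w3 = torch.randn(4, 8, 3, 3)
w1 = torch.randn(4, 8, 1, 1)
two_branch = F.conv2d(x, w3, padding=1) + F.conv2d(x, w1)
merged = F.conv2d(x, merge_repvgg_branches(w3, w1), padding=1)
print(torch.allclose(two_branch, merged, atol=1e-4))  # True
```

Because the merge happens once, before deployment, the extra 1×1 branch improves training without any inference-time cost.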

As shown in the figure below, we propose sandwich fusion (SF), a new module that fuses feature maps from three stages to optimize the spatial and semantic information supplied to the detection heads. The module is applied in the top-down branch of the neck and is inspired by the BiC module proposed in YOLOv6 v3.0: A Full-Scale Reloading. As shown in the figure, the inputs to SF are the backbone feature maps from the lower stage, the corresponding stage, and the higher stage. The goal is to balance the spatial information of low-level features against the semantic information of high-level features, so that the network heads can better localize and classify targets.

[Figure: structure of the sandwich fusion (SF) module]
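A minimal sketch of the sandwich-fusion idea, assuming one plausible layout: downsample the lower-stage (higher-resolution) map, upsample the higher-stage (lower-resolution) map, concatenate all three at the middle resolution, and fuse. The module name, channel choices, and use of a depthwise separable downsampling path are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class SandwichFusion(nn.Module):
    """Illustrative sketch: fuse lower-stage, same-stage, and
    higher-stage backbone feature maps at the middle resolution."""
    def __init__(self, c_low, c_mid, c_high, c_out):
        super().__init__()
        # Downsample the low-level map with a depthwise separable conv (stride 2).
        self.down = nn.Sequential(
            nn.Conv2d(c_low, c_low, 3, stride=2, padding=1, groups=c_low, bias=False),
            nn.Conv2d(c_low, c_mid, 1, bias=False),
        )
        # Upsample the high-level map to the middle resolution.
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.fuse = nn.Conv2d(c_mid + c_mid + c_high, c_out, 1, bias=False)

    def forward(self, f_low, f_mid, f_high):
        x = torch.cat([self.down(f_low), f_mid, self.up(f_high)], dim=1)
        return self.fuse(x)

sf = SandwichFusion(64, 128, 256, 128)
out = sf(torch.randn(1, 64, 80, 80),    # lower stage: more spatial detail
         torch.randn(1, 128, 40, 40),   # corresponding stage
         torch.randn(1, 256, 20, 20))   # higher stage: more semantics
print(out.shape)  # torch.Size([1, 128, 40, 40])
```

The fused map keeps the middle stage's resolution while mixing in fine spatial cues from below and semantic context from above, which is the balance the SF module aims for.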

04

Experiments and deployment results

For the experiments, we used Ubuntu 20.04 as the operating system, with Python 3.8, PyTorch 1.16.0, and CUDA 11.6 as the software environment, and an NVIDIA RTX 3080 Ti GPU as hardware. The neural network implementation is modified from Ultralytics version 8.0.105. The same hyperparameters were used for training, testing, and validation. Training runs for 300 epochs, and input images are rescaled to 640×640. In the results listed below, all YOLOv8 and Drone-YOLO numbers come from our own runs, and none of the networks use pre-trained weights.

For the embedded application experiments, we used an NVIDIA Tegra TX2, which has a 256-core NVIDIA Pascal architecture GPU delivering 1.33 TFLOPS of peak computing performance and 8 GB of memory. The software environment is Ubuntu 18.04 LTS with NVIDIA JetPack 4.4.1, CUDA 10.2, and cuDNN 8.0.0.

Results on VisDrone2019-test

[Table: results on VisDrone2019-test]

Results on the NVIDIA Tegra TX2

[Tables: results on the NVIDIA Tegra TX2]

Drone-YOLO in action

[Figure: Drone-YOLO detection results]

On the left are the YOLOv8 results; most of the targets in the red boxes are not detected.

[Figures: detection comparison between YOLOv8 and Drone-YOLO]

Paper: www.mdpi.com/2504-446X/7/8/526


ABOUT

Computer Vision Research Institute

The Computer Vision Research Institute focuses on deep learning, mainly in object detection, object tracking, image segmentation, OCR, model quantization, and model deployment. The Institute shares the latest paper algorithms and frameworks daily, provides one-click paper downloads, and shares practical projects. It emphasizes both technical research and practical deployment, sharing hands-on experience across different fields so that readers can move beyond pure theory and build habits of hands-on programming and active thinking!
