EdgeYOLO: A real-time object detector and Pytorch implementation on edge devices

Computer Vision Research Institute

Official account ID: ComputerVisionGzq

Code repository: https://github.com/LSH9832/edgeyolo

Author: Edison_G

The paper shared today proposes an efficient, low-complexity, anchor-free object detector built on the state-of-the-art YOLO framework that can run in real time on edge computing platforms.

Overview

The researchers developed an enhanced data augmentation method that effectively suppresses overfitting during training, and designed a hybrid random loss function to improve detection accuracy on small objects. Inspired by FCOS, they also propose a lighter and more efficient decoupled head that speeds up inference without loss of accuracy. The baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO2017 dataset and 26.4% AP50:95 and 44.8% AP50 on the VisDrone2019-DET dataset, while meeting the real-time requirement (FPS ≥ 30) on the edge computing device Nvidia Jetson AGX Xavier.
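For reference, the two accuracy metrics quoted above are related as follows. This is the standard COCO evaluation definition, not anything specific to EdgeYOLO: AP_t is the average precision (averaged over classes) at IoU threshold t, AP50:95 averages it over ten thresholds, and AP50 is the single term at t = 0.50.

```latex
\mathrm{AP}_{50:95} = \frac{1}{10} \sum_{t \in \{0.50,\, 0.55,\, \ldots,\, 0.95\}} \mathrm{AP}_{t},
\qquad
\mathrm{AP}_{50} = \mathrm{AP}_{0.50}
```

Because AP50:95 also rewards tight boxes at high IoU thresholds, it is always at most as large as AP50 for the same model, which is consistent with the pairs of numbers reported above.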

Introduction

On common object detection datasets such as MS COCO2017, models that use a two-stage strategy are slightly more accurate than those that use a one-stage strategy. However, because of the inherent limitations of the two-stage framework, these models fall far short of real-time requirements on conventional computing devices, and are likely to face the same problem on most high-performance computing platforms. In contrast, one-stage object detectors can strike a balance between real-time speed and accuracy. As a result, they have drawn more attention from researchers, and the YOLO series of algorithms has been iterated rapidly. The updates from YOLOv1 to YOLOv3 mainly improved the underlying framework structure, while most later mainstream YOLO versions focus on improving accuracy and inference speed.

In addition, these models are mainly optimized and tested on large workstations with high-performance GPUs, so the state-of-the-art versions often run at unsatisfactorily low FPS on edge computing devices. To address this, some researchers have replaced the original backbone with networks that have fewer parameters and lighter structures, such as MobileNet and ShuffleNet, to achieve better real-time performance on mobile and edge devices, at the expense of some accuracy. In the work shared today, the researchers aim to design an object detector that is accurate and can run in real time on edge devices.

As shown in the figure below, the researchers also designed lighter models with fewer parameters for edge devices with lower computing power, and these models also perform well.

New framework

Random data augmentation inevitably makes some labels invalid, for example in the lower-right corner of the second image in (a) and the lower-left corner of the third image: although boxes are present there, they carry no valid target information. Having too few valid labels can significantly hurt training, and (b) avoids this by increasing the number of valid boxes.
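The label-invalidation problem can be illustrated with a minimal sketch, assuming boxes in [x1, y1, x2, y2] pixel format: after an image is shifted or cropped into an output window, each box is clipped to the visible region and discarded if too little of it remains. The function name `clip_and_filter_boxes` and the thresholds `min_area_ratio` and `min_side` are hypothetical choices for illustration, not values taken from EdgeYOLO.

```python
import numpy as np


def clip_and_filter_boxes(boxes, crop_x0, crop_y0, out_w, out_h,
                          min_area_ratio=0.2, min_side=2.0):
    """Move boxes into an output window, clip them, and drop invalid ones.

    boxes: (N, 4) [x1, y1, x2, y2] in source-image pixels.
    (crop_x0, crop_y0): top-left corner of the window in source coordinates.
    min_area_ratio and min_side are hypothetical thresholds, not EdgeYOLO's.
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4)
    orig_area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    # Translate into the window's coordinate system (creates a new array).
    shifted = boxes - np.array([crop_x0, crop_y0, crop_x0, crop_y0], dtype=np.float64)
    # Clip to the visible region; boxes fully outside collapse to zero width or height.
    shifted[:, [0, 2]] = shifted[:, [0, 2]].clip(0, out_w)
    shifted[:, [1, 3]] = shifted[:, [1, 3]].clip(0, out_h)
    w = shifted[:, 2] - shifted[:, 0]
    h = shifted[:, 3] - shifted[:, 1]
    # A box stays valid only if enough of it survives to still show the object.
    keep = ((w >= min_side) & (h >= min_side)
            & (w * h >= min_area_ratio * np.maximum(orig_area, 1e-9)))
    return shifted[keep]


if __name__ == "__main__":
    boxes = [[10, 10, 50, 50],     # fully inside a 100x100 window
             [90, 90, 130, 130]]   # only a 10x10 corner stays visible
    print(clip_and_filter_boxes(boxes, 0, 0, 100, 100))
```

With a 100×100 window, the first box is kept unchanged, while the second keeps only 6.25% of its area and is dropped. Repeated across random crops, this is how an augmented sample can end up with few or no valid labels.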

Enhanced-Mosaic & Mixup

The commonly used data augmentation strategies are shown in (a) and (b) below. Because of the data transformations, both tend to produce images without valid targets, and the probability of this grows as the number of labels in each original image decreases.

The authors therefore propose method (c):

First, apply Mosaic to multiple groups of images (the number of groups can be set according to the average number of labels per image in the dataset).
Then, blend the last, lightly processed image with the Mosaic-processed images using Mixup (the original boundary of this last image lies within the boundary of the final transformed output image).
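The two steps above can be sketched in NumPy as follows. This is one hypothetical reading of method (c), not the EdgeYOLO code: `mosaic4` tiles four equally sized images into a 2×2 grid and downsamples by striding, and `enhanced_mosaic_mixup` blends every group's Mosaic output with the last image using equal weights. Both function names are illustrative; typical Mixup implementations draw a random blending ratio instead, and the repository may resize, crop, and weight images differently.

```python
import numpy as np


def mosaic4(imgs, boxes_list):
    """2x2 Mosaic of four HxWx3 images, downsampled back to HxW.

    Boxes are [x1, y1, x2, y2] pixel arrays. Assumes H and W are even.
    """
    h, w = imgs[0].shape[:2]
    canvas = np.zeros((2 * h, 2 * w, 3), dtype=np.float32)
    out_boxes = []
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]  # (dx, dy) for each quadrant
    for img, boxes, (dx, dy) in zip(imgs, boxes_list, offsets):
        canvas[dy:dy + h, dx:dx + w] = img
        b = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
        out_boxes.append(b + np.array([dx, dy, dx, dy], dtype=np.float32))
    # Downsample 2x by striding (a crude stand-in for a proper resize).
    return canvas[::2, ::2], np.concatenate(out_boxes) / 2.0


def enhanced_mosaic_mixup(groups, simple_img, simple_boxes):
    """Hypothetical sketch of method (c).

    groups: list of (imgs, boxes_list) pairs, each holding four images.
    simple_img / simple_boxes: the last, lightly processed image, kept at
    full size so its boundary lies inside the output image.
    """
    images, labels = [], []
    for imgs, boxes_list in groups:
        m_img, m_boxes = mosaic4(imgs, boxes_list)
        images.append(m_img)
        labels.append(m_boxes)
    images.append(simple_img.astype(np.float32))
    labels.append(np.asarray(simple_boxes, dtype=np.float32).reshape(-1, 4))
    # Mixup step: equal-weight blend of all images; labels from every source are kept.
    mixed = np.mean(np.stack(images), axis=0)
    return mixed, np.concatenate(labels)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    imgs = [rng.random((64, 64, 3)) for _ in range(4)]
    boxes = [[[8, 8, 24, 24]] for _ in range(4)]
    groups = [(imgs, boxes), (imgs, boxes)]  # two Mosaic groups (images reused for brevity)
    out_img, out_boxes = enhanced_mosaic_mixup(groups, rng.random((64, 64, 3)), [[4, 4, 40, 40]])
    print(out_img.shape, out_boxes.shape)
```

Keeping the last image at full size guarantees that its boundary stays inside the output, so its labels always survive; this appears to be the point of the boundary condition in the second step. Adding more Mosaic groups raises the number of boxes per sample, which matches the advice to choose the group count from the dataset's average label density.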

Lite-Decoupled Head

The decoupled head was first proposed in FCOS and later adopted by other anchor-free object detectors such as YOLOX. Using a decoupled structure in the last few network layers can accelerate convergence and improve regression performance. However, because the decoupled head uses a branching structure that adds inference cost, YOLOv6 proposed an efficient decoupled head with faster inference: it reduces the intermediate 3×3 convolutional layers to a single layer while keeping the same, relatively large, number of channels as the input feature map.
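A rough parameter count shows why removing a 3×3 layer per branch matters. The sketch below assumes a simplified head layout (a 1×1 stem, then classification and regression branches that each stack `n_mid_convs` 3×3 convolutions before their 1×1 prediction layers, with 80 classes and 4 box + 1 objectness outputs), ignores batch normalization, and uses a hypothetical width of 256 channels. It is not an exact model of the YOLOX or YOLOv6 heads.

```python
def conv_params(c_in, c_out, k):
    """Weights plus bias of a k x k convolution (no batch norm, for simplicity)."""
    return c_in * c_out * k * k + c_out


def head_params(c, n_cls, n_mid_convs):
    """Parameters of one decoupled-head level under a simplified, assumed layout.

    A 1x1 stem, then a classification branch and a regression branch that each
    stack `n_mid_convs` 3x3 convs before their 1x1 prediction layers.
    """
    stem = conv_params(c, c, 1)
    cls_branch = n_mid_convs * conv_params(c, c, 3) + conv_params(c, n_cls, 1)
    # Regression branch predicts 4 box offsets plus 1 objectness score.
    reg_branch = n_mid_convs * conv_params(c, c, 3) + conv_params(c, 4 + 1, 1)
    return stem + cls_branch + reg_branch


if __name__ == "__main__":
    two_layer = head_params(256, 80, 2)  # YOLOX-style: two 3x3 convs per branch
    one_layer = head_params(256, 80, 1)  # YOLOv6-style: one 3x3 conv per branch
    print(two_layer, one_layer, round(one_layer / two_layer, 3))
```

Under these assumptions the two-layer head has 2,447,957 parameters and the one-layer head 1,267,797, roughly half. Each 3×3 convolution costs about 9 × C² weights, so doubling the channel count roughly quadruples that term, and the multiply-accumulate count additionally scales with the feature map's height × width.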

However, this additional inference cost becomes more pronounced as the channel count and input size increase. The researchers therefore introduce re-parameterization, which enhances the model's learning capacity during training while speeding up inference.
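Re-parameterization means training with extra layers that can be folded into a plain convolution afterwards, so inference runs a single convolution at no extra cost. The article does not say exactly which layers EdgeYOLO folds. The sketch below assumes learned per-channel shift (before) and scale (after) terms around a 1×1 convolution, loosely in the style of YOLOR's implicit layers, and checks that they fold exactly into new weights and biases. `fold_scale_shift_into_conv1x1` is a hypothetical helper.

```python
import numpy as np


def fold_scale_shift_into_conv1x1(weight, bias, pre_shift, post_scale):
    """Fold y = post_scale * (W @ (x + pre_shift) + b) into y = W' @ x + b'.

    weight: (C_out, C_in) 1x1 conv kernel; bias: (C_out,)
    pre_shift: (C_in,) learned additive term applied to the input
    post_scale: (C_out,) learned multiplicative term applied to the output
    """
    new_bias = post_scale * (weight @ pre_shift + bias)
    new_weight = post_scale[:, None] * weight
    return new_weight, new_bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    c_in, c_out, n_pix = 8, 4, 16
    W, b = rng.normal(size=(c_out, c_in)), rng.normal(size=c_out)
    a, s = rng.normal(size=c_in), rng.normal(size=c_out)
    x = rng.normal(size=(c_in, n_pix))  # 16 pixels, channels-first
    # Training-time graph: shift -> conv -> scale (three ops).
    train_out = s[:, None] * (W @ (x + a[:, None]) + b[:, None])
    # Inference-time graph: one folded conv.
    W2, b2 = fold_scale_shift_into_conv1x1(W, b, a, s)
    infer_out = W2 @ x + b2[:, None]
    print(np.allclose(train_out, infer_out))
```

A 1×1 kernel is used because the fold is then exact everywhere. With a zero-padded 3×3 kernel, the input shift would not be applied to the padding, so the folded bias would differ slightly at the feature-map borders.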

Experiments

Representative results on VisDrone2019-DET-val

Representative results on MS COCO2017-val
