Abstract: This pedestrian detection system, built on the YOLOv8 model and the MOT20 dataset, detects and locates pedestrians in everyday scenes. It runs object detection on images, videos, and camera streams, visualizes the results, and exports them as images or videos. The system trains on the dataset with the YOLOv8 object detection algorithm and uses the PySide6 library to build the front-end interface. Supported functions include: importing and initializing trained models; adjusting the detection confidence threshold and the post-processing IOU threshold; image upload, detection, result visualization, and result export; video upload, detection, result visualization, and result export; camera input, detection, and result visualization; the number, list, and location of detected targets; and the forward inference time. This blog post provides the complete Python code together with an installation and usage tutorial suitable for beginners. Important parts of the code are commented; for the complete code and resource files, see the download link at the end of the article.

Readers who need the source code can get the download link by sending the author a private message.

Basic introduction

In recent years, machine learning and deep learning have made great progress, and deep learning methods now outperform traditional methods in both detection accuracy and speed. YOLOv8 is the next-generation model developed by Ultralytics after YOLOv5, and it currently supports image classification, object detection, and instance segmentation. YOLOv8 is a SOTA model that builds on the success of earlier YOLO models and introduces new features and improvements to further enhance performance and flexibility. Specific innovations include a new backbone network, a new anchor-free detection head, and a new loss function, and the model can run on a variety of hardware platforms, from CPUs to GPUs. This blog post therefore uses the YOLOv8 object detection algorithm to implement a pedestrian detection system based on the MOT20 dataset, and then uses the PySide6 library to build the interface that completes the object detection page.

The author has previously published models and interfaces for the YOLOv5 algorithm; readers who need them can find them in earlier posts. The author also plans a joint release covering YOLOv5, YOLOv6, YOLOv7, and YOLOv8, so interested readers are welcome to follow and bookmark this blog.

Environment setup

(1) Open the project directory and enter cmd in the search box to open a terminal.

(2) Create a new virtual environment: conda create -n yolo8 python=3.8

(3) Activate the environment (conda activate yolo8) and install the ultralytics library (the official YOLOv8 library): pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple

(4) Note that this installation method only installs the CPU version of torch. If you need the GPU version, install torch first: pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html; then run pip install ultralytics -i https://pypi.tuna.tsinghua.edu.cn/simple again.

(5) Install the graphical interface library PySide6: pip install pyside6 -i https://pypi.tuna.tsinghua.edu.cn/simple
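
After the installation steps above, a quick check can confirm that the packages are importable and whether a GPU is visible. The following is a minimal sketch and not part of the project code; the helper names check_environment and cuda_available are made up for illustration, and only the standard library is needed to run it.

```python
import importlib.util

# Import names of the packages installed in the steps above.
# Note that PySide6 is imported with this exact capitalisation.
REQUIRED = ("torch", "ultralytics", "PySide6")


def check_environment(packages=REQUIRED):
    """Return {package_name: bool} telling whether each package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


def cuda_available():
    """Return True only if torch is installed and reports a usable GPU."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script also runs without torch

    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
    print("CUDA available:", cuda_available())
```

If torch is found but CUDA is reported as unavailable, the CPU build was probably installed, and step (4) is the place to look.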

Interface and function display

The software interface designed for this blog post is shown below. The overall layout is simple and clean, and the main functions include: importing and initializing the trained model; adjusting the confidence score and IOU threshold; image upload, detection, result visualization, result export, and ending detection; video upload, detection, result visualization, result export, and ending detection; the list of detected targets and their location information; and the forward inference time. The initial interface is as follows:

Model selection and initialization

Users can click the model weight selection button to choose the trained model weights; supported weight formats include .pt, .onnx, and .engine. Then click the model weight initialization button to initialize the selected model.

Adjusting the confidence score and IOU threshold

Changing the value in the input box below Confidence or IOU moves the corresponding slider, and moving the slider updates the value in the input box. Any change to the Confidence or IOU value is also synchronized to the model configuration, which changes the detection confidence threshold and the IOU threshold.
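
To make the role of the two thresholds concrete, here is a small pure-Python sketch of the usual post-processing: detections below the confidence threshold are dropped, and greedy non-maximum suppression then removes boxes that overlap a higher-scoring box by more than the IOU threshold. This is an illustration only, not the system's actual code (YOLOv8 does this internally); the function names and the sample boxes are invented.

```python
def iou(a, b):
    """IOU of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, conf_thres, iou_thres):
    """detections is a list of (box, score) pairs.

    Keeps boxes with score >= conf_thres, then applies greedy NMS:
    a box is kept only if its IOU with every already-kept box is
    at most iou_thres.
    """
    candidates = [d for d in detections if d[1] >= conf_thres]
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in candidates:
        if all(iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score))
    return kept


# Made-up detections: the second box heavily overlaps the first.
dets = [
    ((10, 10, 110, 210), 0.90),
    ((12, 14, 112, 214), 0.80),
    ((300, 50, 380, 230), 0.60),
    ((500, 500, 520, 540), 0.10),
]
print(filter_detections(dets, conf_thres=0.25, iou_thres=0.5))
```

Raising the confidence threshold removes more low-score boxes, while raising the IOU threshold lets more overlapping boxes survive, which can matter in crowded pedestrian scenes.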

Image selection, detection, and export

Users can click the Select Image button to upload a single image for detection and recognition, and the system interface will display the input image synchronously after successful upload.

Then click the image detection button to run object detection on the input image. The system shows the detection time in the time field and the number of detected targets in the target count field. Selecting a detected target in the drop-down box updates the displayed target position values (i.e., xmin, ymin, xmax, and ymax).

Then click the Test Result Show button to display the detection results for the input image at the bottom left of the system. The system shows the category, location, and confidence of each target identified in the image.

Click the image test result export button to export the detected image, and enter the file name and suffix in the save field to save the result image.

Click the End Image Detection button to refresh the system interface and clear all output information. You can then click the Select Image or Select Video button to upload an image or video, or click the Open Camera button to turn on the camera.

Video selection, detection, and export

Users can click the Select Video button to upload a video for detection and recognition, and the system will display the first frame of the video in the interface.

Then click the video detection button to run object detection on the input video. The system shows the detection time in the time field and the number of detected targets in the target count field. Selecting a detected target in the drop-down box updates the displayed target position values (i.e., xmin, ymin, xmax, and ymax).

Click the pause video detection button to pause the input video. The button then changes to Continue video detection, and the current video frame and its detection results remain in the system interface, so you can use the target drop-down box to inspect the coordinates of each detected target. Click the Continue video detection button to resume detection on the input video.

Click the video test result export button to export the detected video, and enter the video name and suffix in the save field to save the result video.

Click the End Video Detection button to refresh the system interface and clear all output information. You can then click the Select Image or Select Video button to upload an image or video, or click the Open Camera button to turn on the camera.

Camera opening, detection, and ending

Users can click the Open Camera button to turn on the camera for detection and recognition, and the system will display the camera image in the interface.

Then click the camera detection button to run object detection on the camera input. The system shows the detection time in the time field and the number of detected targets in the target count field. Selecting a detected target in the drop-down box updates the displayed target position values (i.e., xmin, ymin, xmax, and ymax).

Introduction to the algorithm principles

This system adopts YOLOv8, a single-stage object detection algorithm based on deep learning. Compared with the earlier YOLO object detection algorithms, YOLOv8 has the following advantages: (1) a friendlier installation and usage workflow; (2) faster speed and higher accuracy; (3) a new backbone that replaces the C3 module of YOLOv5 with the C2f module; (4) an anchor-free detection head in place of the anchor-based head of YOLOv5; (5) a new loss function. The overall structure of the YOLOv8 model is shown in the figure below; the original figure comes from the official MMYOLO repository.

The most obvious difference between the YOLOv8 and YOLOv5 models is that the original C3 module is replaced by the C2f module. The structure of the two modules is shown in the figure below; the original figure comes from the official MMYOLO repository.

In addition, the head has changed the most: it moves from the original coupled head to a decoupled head, and from the anchor-based design of YOLOv5 to an anchor-free design. The structural comparison is shown in the figure below.
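
To illustrate what anchor-free means in practice, the sketch below decodes a box from a single grid cell: instead of adjusting a predefined anchor box, the head predicts the distances from the cell centre to the four sides of the box. This is a simplified illustration under stated assumptions, not the YOLOv8 source code. The function name and sample numbers are invented, and the real head predicts each distance as a distribution before reducing it to a single value.

```python
def decode_anchor_free(col, row, ltrb, stride):
    """Decode one anchor-free prediction into (xmin, ymin, xmax, ymax) pixels.

    col, row : index of the grid cell on the feature map
    ltrb     : predicted distances (left, top, right, bottom) in grid units
    stride   : downsampling factor of this feature map (e.g. 8, 16, 32)
    """
    # The reference point is the centre of the grid cell.
    cx, cy = col + 0.5, row + 0.5
    left, top, right, bottom = ltrb
    return (
        (cx - left) * stride,
        (cy - top) * stride,
        (cx + right) * stride,
        (cy + bottom) * stride,
    )


# Made-up prediction for cell (10, 5) on the stride-8 feature map.
print(decode_anchor_free(10, 5, (1.5, 2.0, 1.5, 3.0), stride=8))
# -> (72.0, 28.0, 96.0, 68.0)
```

Because no anchor shapes have to be chosen in advance, there is no anchor clustering step to tune for a new dataset.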

Introduction to datasets

The MOT20 pedestrian detection dataset used by this system is labeled with a single pedestrian category and contains 14410 images in total. The images cover a wide range of rotations and lighting conditions, which helps to train a more robust detection model. The pedestrian detection dataset used in this experiment contains 8050 images in the training set, 881 images in the validation set, and 4479 images in the test set. Since the YOLOv8 network expects inputs of a uniform size, all images need to be resized to the same size. To minimize image distortion without affecting detection accuracy, we scaled all images to fit within 640x640 while preserving the original aspect ratio. In addition, to improve the generalization ability and robustness of the model, we used data augmentation techniques, including random rotation, scaling, cropping, and color transformation, to enrich the dataset and reduce the risk of overfitting.
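
One common way to resize to 640x640 while keeping the aspect ratio is letterboxing: scale the image so that its longer side is 640, then pad the shorter side. The sketch below only computes the scaled size and the padding. It assumes centred padding, and the function name and the 1920x1080 example frame are chosen for illustration.

```python
def letterbox_params(width, height, target=640):
    """Return (new_w, new_h, pad_left, pad_top) for fitting a
    width x height image into a target x target square without
    changing its aspect ratio."""
    # Use the smaller scale factor so both sides fit inside the square.
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Split the leftover space; any odd pixel goes to the right/bottom.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


# A hypothetical 1920x1080 frame is scaled to 640x360 and padded
# with 140 rows at the top (and 140 at the bottom).
print(letterbox_params(1920, 1080))  # -> (640, 360, 0, 140)
```

Box coordinates predicted on the padded image can be mapped back by subtracting the padding and dividing by the same scale factor.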

Key code walkthrough

In the training phase, we started from a pre-trained model and then optimized the network parameters over multiple iterations to achieve better detection performance. During training, we used techniques such as learning rate decay and data augmentation to improve the generalization ability and robustness of the model. A simple single-GPU training command is as follows.
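
As a rough sketch of what single-GPU training looks like with the ultralytics Python API, consider the snippet below. It is not necessarily the exact command used for this post. Only epochs=100 and imgsz=640 come from the text; the dataset file name mot20_pedestrian.yaml, the yolov8n.pt starting weights, and the batch size are assumptions made for illustration.

```python
# Settings passed to training. Only "epochs" and "imgsz" are taken from
# this post; the other values are illustrative assumptions.
TRAIN_ARGS = {
    "data": "mot20_pedestrian.yaml",  # hypothetical dataset config file
    "epochs": 100,                    # number of training epochs
    "imgsz": 640,                     # input image size
    "batch": 16,                      # assumed batch size
    "device": 0,                      # index of the single GPU to use
}


def train(weights="yolov8n.pt", **overrides):
    """Fine-tune a pre-trained YOLOv8 model with the settings above."""
    from ultralytics import YOLO  # imported lazily; requires ultralytics

    model = YOLO(weights)  # load pre-trained weights as the starting point
    args = {**TRAIN_ARGS, **overrides}
    return model.train(**args)


# Call train() to start training once the dataset config exists.
# Roughly equivalent command line:
#   yolo detect train data=mot20_pedestrian.yaml model=yolov8n.pt \
#       epochs=100 imgsz=640 batch=16 device=0
```

Keyword arguments passed to train(), for example train(batch=8), override the defaults in TRAIN_ARGS.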

More parameters can also be specified at training time; the most important ones are as follows:

In the testing phase, we used the trained model to detect pedestrians in new images and videos. Detection boxes whose confidence falls below a set threshold are filtered out, and the remaining boxes form the final detection result. The results can also be saved in image or video format for later analysis and use. This system is based on the YOLOv8 algorithm and is implemented with PyTorch. The main libraries used in the code include PyTorch, NumPy, OpenCV, and PySide6.

PySide6 interface design

PySide is a graphical user interface (GUI) library for Python built on the C++ Qt framework, and its usage differs very little from the C++ version. Compared with other Python GUI libraries, PySide allows faster development, offers more functionality, and has better documentation. In this blog post, we used the PySide6 library to create a graphical interface that gives users an easy-to-use way to select images and videos for object detection.

We used Qt Designer to design the graphical interface, and then used PySide6 to convert the designed UI file into Python code. The interface contains several UI controls, such as labels, buttons, text boxes, and check boxes. Through the signal-slot mechanism in PySide6, these UI controls are connected to the program logic.
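
The signal-slot mechanism can be sketched with the confidence slider and its input box, which have to stay in sync. The snippet below is a minimal illustration and not the project's actual UI code: the helper names are invented, the slider is assumed to use 100 integer steps, and build_threshold_row needs a running QApplication.

```python
def slider_to_threshold(position, steps=100):
    """Map an integer slider position (0..steps) to a threshold in [0, 1]."""
    return position / steps


def threshold_to_slider(value, steps=100):
    """Map a threshold in [0, 1] to the nearest integer slider position."""
    return round(value * steps)


def build_threshold_row(on_change, initial=0.25):
    """Create a slider and a spin box that mirror each other.

    on_change(value) is called with the new float threshold whenever
    either widget is edited.
    """
    from PySide6.QtCore import Qt
    from PySide6.QtWidgets import QDoubleSpinBox, QSlider

    slider = QSlider(Qt.Horizontal)
    slider.setRange(0, 100)
    spin = QDoubleSpinBox()
    spin.setRange(0.0, 1.0)
    spin.setSingleStep(0.01)
    spin.setDecimals(2)

    def from_slider(position):
        value = slider_to_threshold(position)
        spin.blockSignals(True)  # avoid echoing the change back
        spin.setValue(value)
        spin.blockSignals(False)
        on_change(value)

    def from_spin(value):
        slider.blockSignals(True)  # avoid echoing the change back
        slider.setValue(threshold_to_slider(value))
        slider.blockSignals(False)
        on_change(value)

    # Signals (valueChanged) are connected to slots (plain functions).
    slider.valueChanged.connect(from_slider)
    spin.valueChanged.connect(from_spin)
    spin.setValue(initial)
    return slider, spin
```

Blocking signals while mirroring a value keeps the two widgets from triggering each other in a loop, so on_change fires once per user edit.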

Experimental results and analysis

In this section, we use metrics such as precision and recall to evaluate the performance of the model, and we analyze the training process through the loss curves and the PR curve. In the training phase, we trained YOLOv8 on the dataset introduced earlier for a total of 100 epochs. During training, we used TensorBoard to record the loss curves of the model on the training and validation sets. As the figure below shows, the training loss and validation loss gradually decrease as training proceeds, indicating that the model keeps learning more accurate features. After training, we evaluated the model on the validation set and obtained the following results.
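
For reference, precision and recall are computed from the counts of true positives (TP), false positives (FP), and false negatives (FN). The short sketch below shows the arithmetic; the counts are made up for illustration and are not results from this experiment.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts.

    precision = TP / (TP + FP): share of predicted boxes that are correct
    recall    = TP / (TP + FN): share of real pedestrians that were found
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up counts: 80 correct boxes, 20 false alarms, 10 missed pedestrians.
p, r = precision_recall(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.800 recall=0.889
```

A PR curve is obtained by repeating this calculation while sweeping the confidence threshold from high to low.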

The figure below shows the PR curve of our trained YOLOv8 model on the validation set. The figure shows that the model achieves high recall and precision, and the overall performance is good.

The figure below shows Mosaic-augmented images produced while training the YOLOv8 model on the dataset.

In summary, the YOLOv8 model trained in this blog post performs well on the dataset, has high detection accuracy and robustness, and can be applied in practical scenarios. The author has also tested the entire system in detail and developed a smooth, high-precision object detection interface, which is the one demonstrated in this blog post. The complete UI, test images and videos, code files, and other resources have been packaged and uploaded; interested readers can follow the author and send a private message to get them. For the PDF of this blog post and more object detection and recognition systems, follow the author's WeChat public account BestSongC (formerly Nuist computer vision and pattern recognition).

Other deep learning-based object detection systems are also available, covering tomatoes, cats and dogs, goats, wild targets, cigarette butts, QR codes, helmets, traffic police, wild animals, wild smoke, human fall recognition, infrared pedestrians, poultry pigs, apples, bulldozers, bees, phone calls, pigeons, footballs, cows, face masks, safety vests, smoke detection, and more. Readers who need them can follow the author and find the download links in the author's other videos.

The full project directory looks like this:

Pedestrian target detection system based on YOLOV8 model and Mot20 dataset