
Qihao Huang, School of Geospatial Information, Information Engineering University: Lightweight SAR Target Detection Combining Channel Pruning and Knowledge Distillation | Journal of Surveying and Mapping, Vol. 53, No. 4, 2024


This article comes from Issue 4, 2024 of the Journal of Surveying and Mapping (map review number: GS Jing (2024) No. 0714).

Lightweight SAR target detection combining channel pruning and knowledge distillation

Huang Qihao1, Jin Guowang1, Xiong Xin1, Wang Limei2, Li Jiahao1

1. School of Geospatial Information, Information Engineering University, Zhengzhou 450001, Henan, China

2. School of Surveying, Mapping and Urban Spatial Information, Henan University of Urban Construction, Pingdingshan 467000, China

Funding: This work was supported by the National Natural Science Foundation of China (41474010)

About the Author

First author: Huang Qihao (b. 1998), male, master's student; research interests: synthetic aperture radar target detection and recognition. E-mail: [email protected]. Corresponding author: Jin Guowang. E-mail: [email protected]; [email protected]

Abstract: Lightweight SAR target detection methods are of great significance for the rapid detection of ground objects in SAR images. To address the low accuracy of lightweight detection methods, a lightweight SAR target detection method combining channel pruning and knowledge distillation is designed. In this method, the scaling factors γ of the batch normalization layers in a complex network are sparsely trained to determine the importance of the corresponding feature channels; the secondary channels are then pruned, and after fine-tuning, the pruned network serves as the teacher in a knowledge distillation framework that guides the training of a lightweight model and improves its detection accuracy. A detection framework is built on the YOLOv5-6.1 algorithm, and training and detection experiments are carried out on the recombined MSAR and SSDD multi-class target datasets. The results show that the proposed method improves SAR target detection accuracy while keeping the model volume at only 3.73 MB, verifying its effectiveness.

Keywords: SAR; object detection; lightweight; channel pruning; knowledge distillation

Citation format

HUANG Qihao, JIN Guowang, XIONG Xin, WANG Limei, LI Jiahao. Lightweight SAR Target Detection Based on Channel Pruning and Knowledge Distillation[J]. Journal of Surveying and Mapping, 2024, 53(4): 712-723. doi:10.11947/j.AGCS.2024.20220605.

HUANG Qihao. Lightweight SAR target detection based on channel pruning and knowledge distillation[J]. Acta Geodaetica et Cartographica Sinica, 2024, 53(4): 712-723. doi:10.11947/j.AGCS.2024.20220605.

Read the original article

http://xb.chinasmp.com/article/2024/1001-1595/1001-1595-2024-04-0712.shtml

Synthetic aperture radar (SAR) is a microwave imaging sensor that actively emits and receives electromagnetic waves. Because it can penetrate clouds, fog, rain, and snow and observe the Earth in all weather, it plays a major role in disaster emergency rescue, spatial information reconnaissance, and environmental situational awareness [1-4]. In recent years, with the successive launches of SAR satellites, a large amount of SAR image data has been obtained. However, owing to the unique imaging mechanism of SAR, manual visual interpretation of SAR images requires rich experience and is time-consuming and laborious, so SAR image object detection technology must be studied. Traditional SAR target detection uses different algorithms depending on the mission target. For ship targets, algorithms are generally designed around ship-wake characteristics and sea-surface parameters, such as the joint two-parameter and wake detection method proposed in Ref. [5]. For vehicle targets, template matching and feature extraction are commonly used, such as the PCA-based feature recognition method proposed in Ref. [6]. Although traditional detection methods are much faster, they are less accurate, and their classifiers rely on manual design.
Driven by the rapid acquisition of massive SAR data and by practical tasks, traditional detection methods can no longer meet current requirements for the timeliness and accuracy of SAR target detection, and many researchers at home and abroad have introduced computer vision and deep learning methods into this field [7-9]. The most representative is the convolutional neural network (CNN), which learns both the shallow features of target texture and contour and the deep abstract semantic information of the target [10], realizing end-to-end object detection. CNN-based methods are better suited than traditional methods to detection in complex scenes and have gradually become the mainstream approach to SAR object detection [11]. However, current deep learning-based object detection algorithms usually require substantial computing power, while practical SAR target detection applications often demand methods that are small, portable, accurate, and efficient. Lightweight SAR target detection has therefore become a research focus, with improvements pursued mainly through network pruning [12], knowledge distillation [13], and network structure modification.

In terms of network pruning, Refs. [14-15] introduced channel pruning [16] into SAR object detection networks, effectively reducing model parameters, computation, and inference time, but at the cost of detection accuracy. Ref. [17], building on a lightweight feature extraction network and feature fusion network, further compressed the network with a pruning algorithm based on the geometric median, accelerating model inference. In terms of knowledge distillation, Ref. [18] used MobileNetV2 [19] as the feature extraction network of the SSD [20] detection algorithm to obtain a lightweight model and applied knowledge distillation to improve its performance, although a gap remained relative to the detection accuracy of the source network model. The lightweight arbitrary-direction SAR ship detector of Ref. [21] combines feature-map and detection-head distillation and distills inter-pixel similarity as heat-map knowledge, improving detection accuracy and speed while reducing the number of model parameters. Ref. [22] modeled the foreground-background relationship of SAR images and, by distilling the topological distances of decoupled features, strengthened the student network's grasp of that relationship and its robustness to background noise. Ref. [23] explored, from the perspective of combining different lightweight methods, the effectiveness of unstructured sparsification of the teacher network before knowledge distillation. Refs. [23-24] combined the advantages of the two approaches, first constructing a lightweight network model by network pruning and then improving its detection accuracy through knowledge distillation. Ref. [25] used an attention mechanism to identify the secondary channels of the network and then restored network performance with a bridge-connected knowledge distillation method. In practice, however, the compression achieved by network pruning usually sacrifices detection accuracy, and excessive pruning causes a severe drop in model accuracy, so pruning alone cannot fully satisfy the requirements of lightweight, high-precision SAR target detection.

Although knowledge distillation can improve the detection accuracy of lightweight models to a certain extent, when the capacity gap between models is too large the student network cannot fully learn the teacher network's complex knowledge, limiting the accuracy gain. To solve these problems, this paper designs a lightweight SAR target detection method that uses channel pruning to compress the teacher network and refine the knowledge it carries, then guides the training of the student network through knowledge distillation. A detection framework is built on the widely used YOLOv5 object detection method, and experiments on the recombined MSAR and SSDD multi-class target datasets verify the effectiveness of the method.

1 Lightweight SAR target detection method combining channel pruning and knowledge distillation

Figure 1 shows the lightweight SAR target detection framework designed in this paper, which is composed of a channel pruning module, a knowledge distillation module and a SAR target detection module.

Figure 1

Fig. 1 Lightweight SAR target detection framework combining channel pruning and knowledge distillation

The channel pruning module sparsely trains the complex model on the SAR image dataset, distinguishes the importance of each channel in the model, prunes the secondary channels, and fine-tunes the result to obtain the pruning-optimized complex model. The knowledge distillation module uses this pruning-optimized complex model as the teacher network to guide the training of the lightweight student network, constructing an L2 distillation loss between the outputs of the teacher and student networks so that the model parameters are continuously updated, yielding a lightweight SAR target detection weight model. Finally, the SAR target detection module uses this lightweight weight model to achieve robust detection of targets in SAR images.

1.1 Channel pruning module

The channel pruning module consists of three steps: sparsity training, channel pruning, and fine-tuning. During training of the complex model, L1 regularization is used to sparsify the scaling factor γ of the batch normalization (BN) layers; the corresponding secondary feature channels are identified and clipped, optimizing the feature information in the model while compressing its volume.

1.1.1 Sparse training

The purpose of sparsity training is to determine the importance of each channel of the convolutional-layer feature maps and to provide a reference index for channel pruning. To avoid introducing extra computational overhead, the scaling factor γ of the BN layer is sparsified and used as the basis for judging channel importance. Let the outputs of the previous layer be x1, x2, …, xm, where m is the training batch size, μB and σB² are the mean and variance of each batch of data, and ε is a regularization parameter used to prevent the denominator from being 0. The normalized output is

x̂i = (xi − μB) / √(σB² + ε)  (1)

To improve the nonlinear feature extraction and overall expressive ability of the model, the data distribution is reconstructed by a learnable scaling factor γ and translation factor β, with reconstruction result yi

yi = γ x̂i + β  (2)

The L1 regularization method is used to sparsify the set Γ of γ values corresponding to the channels of the convolutional layers. After sparsification, feature channels whose γ values tend to 0 are the secondary channels. The regularization of Γ is introduced into the network loss function L

L = Σ(x,y) l(f(x, W), y) + λ Σγ∈Γ g(γ)  (3)

where l(f(x, W), y) is the loss function of the original convolutional neural network; (x, y) are the training input and target; W is the trainable weight; λ is the balance factor; and g(γ) is the sparsity penalty term on the scaling factor, i.e.

g(γ) = |γ|

Taking the partial derivative of Eq. (3) with respect to γ shows that it is discontinuous at γ = 0; when the left and right partial derivatives differ in sign, γ = 0 is a minimum point of the loss function L. During subgradient descent optimization, some γ values therefore keep approaching 0, and the set Γ becomes sparse

∂L/∂γ = ∂l/∂γ + λ sgn(γ), γ ≠ 0  (4)
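In PyTorch, the subgradient of the sparsity penalty can be added to the gradient of every BN scaling factor after the ordinary backward pass. The sketch below is illustrative, not the paper's implementation; the penalty weight `sparsity_lambda` and the training-step names in the comment are assumptions.

```python
import torch
import torch.nn as nn

def add_bn_l1_subgradient(model: nn.Module, sparsity_lambda: float) -> None:
    """Add the subgradient of lambda * |gamma| to every BN scaling factor.

    Called after loss.backward(): this applies the sparsity penalty
    g(gamma) = |gamma| of Eq. (3) via subgradient descent, driving
    unimportant gamma values toward 0.
    """
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(sparsity_lambda * torch.sign(m.weight.detach()))

# Illustrative training step (model, criterion, optimizer are assumed):
# loss = criterion(model(images), targets)
# loss.backward()
# add_bn_l1_subgradient(model, sparsity_lambda=1e-4)
# optimizer.step()
```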

1.1.2 Channel pruning

After sparsity training, each scaling factor γ is associated with the corresponding channel of the convolutional-layer feature map, and the absolute values are sorted. The pruning threshold S is determined by the chosen pruning ratio: if the γ corresponding to a feature channel is less than S, the channel is judged secondary and is pruned. To preserve the integrity of the network structure, the threshold must not exceed the maximum γ value of any BN layer, ensuring dimensional compatibility with the backbone network; likewise, structures with residual connections are not pruned, so that the feature-map dimensions of the shortcut and residual branches remain consistent. The channel pruning process is shown in Figure 2: in the feature map of the i-th convolutional layer of the original network, each channel is associated with a sparsified scaling factor γ, and the feature channels Ci2 and Ci4, whose γ values are below the threshold, contribute relatively little to overall network performance.
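The threshold selection above can be sketched as follows: gather the |γ| of all prunable BN layers, take the value at the chosen pruning ratio as the global threshold S, and keep in each layer only the channels whose |γ| reaches S. This is a minimal sketch under stated assumptions; the `prune_ratio` value and the per-layer cap rule are illustrative, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

def bn_channel_masks(bn_layers, prune_ratio: float):
    """Return a keep-mask per BN layer from a global threshold on |gamma|.

    The threshold S is taken at the prune_ratio position of the sorted
    |gamma| values; to preserve the network structure, S is capped at each
    layer's maximum |gamma| so that at least one channel always survives.
    """
    all_gammas = torch.cat([bn.weight.detach().abs().flatten() for bn in bn_layers])
    n_pruned = int(all_gammas.numel() * prune_ratio)
    threshold = torch.sort(all_gammas).values[min(n_pruned, all_gammas.numel() - 1)]
    masks = []
    for bn in bn_layers:
        g = bn.weight.detach().abs()
        s = min(threshold, g.max())  # never prune an entire layer
        masks.append(g >= s)
    return masks
```

The masks can then drive the actual surgery: copying the surviving channels of each convolution, BN layer, and the following layer's input channels into a smaller network.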

Figure 2

Fig. 2 The process of channel pruning

1.1.3 Fine-tuning training

To reduce the impact of the removed feature channels on detection accuracy, the pruned network model is fine-tuned starting from its retained weights, restoring the detection accuracy of the pruned model and laying the foundation for the next step of improving the lightweight model's accuracy.

1.2 Knowledge Distillation Module

In the knowledge distillation module, the pruning-optimized complex model output by the channel pruning module serves as the teacher network and the lightweight model as the student network. Loss functions are constructed between the student network predictions and the teacher network predictions, and between the student network predictions and the ground-truth labels; their weighted sum is used as the total loss to update the model parameters, improving the detection accuracy and robustness of the lightweight student network's response to targets.

1.2.1 Knowledge distillation network architecture

To transfer the teacher network's generalization ability to the student network in knowledge distillation, the network must comprehensively learn the characteristics of positive and negative samples. The Softmax function maps neuron outputs to the interval (0, 1), equivalent to the probabilities of the predicted classes. But when the Softmax output probabilities are small, the values of the negative labels approach 0 and their contribution to the loss function is weakened. A temperature variable τ is therefore added to the Softmax function to amplify the information carried by the negative labels, so that the student network can better learn the inter-class differences of different targets and improve detection accuracy

y′i = exp(yi) / Σj exp(yj)  (5)

y″i = exp(yi / τ) / Σj exp(yj / τ)  (6)

where yi is the output of the previous neuron; y′i and y″i are the outputs of the Softmax function before and after adding the temperature variable τ, respectively.

The structure of knowledge distillation is shown in Figure 3. The predictions obtained by heating the teacher network's Softmax output are called soft labels; the student network's predictions, with and without heating, are called soft predictions and hard predictions, respectively. Combined with the ground-truth hard labels, these are used to construct the distillation loss Ldistill and the student loss Lstudent, and the overall loss L is formed as their weighted sum and used to update the parameters to obtain the object detection weight model.
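The effect of the temperature variable τ in Eq. (6) can be illustrated with a small numerical sketch (the logit values are made up for illustration): raising τ flattens the Softmax distribution, so the probabilities of negative classes grow and contribute more to the distillation loss.

```python
import math

def softmax_with_temperature(logits, tau=1.0):
    """Softmax of Eq. (6); tau = 1 reduces to the plain Softmax of Eq. (5)."""
    scaled = [z / tau for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [6.0, 2.0, 1.0]  # illustrative class logits
hard = softmax_with_temperature(logits, tau=1.0)
soft = softmax_with_temperature(logits, tau=4.0)
# With tau = 4 the negative-class probabilities are amplified, carrying more
# "dark knowledge" about inter-class similarity to the student network.
```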

Figure 3

Fig. 3 Structure of knowledge distillation

1.2.2 Construction of knowledge distillation loss function

In the original network, the loss function consists of three parts: the objectness loss FOBJ, the classification loss FCL, and the bounding-box loss FBB. Through continuous iterative training the loss value gradually decreases and converges, and the model predictions approach the true values

L = FOBJ(ô, o^gt) + FCL(p̂, p^gt) + FBB(b̂, b^gt)  (7)

where ô, p̂, and b̂ are the objectness, class probabilities, and bounding boxes predicted by the model; o^gt, p^gt, and b^gt are the corresponding true values.

Since the objectness loss is the main basis for distinguishing targets from background, and is a prerequisite for further learning object classes and position regression, the objectness loss function is modified first. The softened teacher network predictions and the ground-truth labels jointly guide the student network's learning of target features. The modified objectness loss function is

F′OBJ = FOBJ(ô, o^gt) + λD FOBJ(ô, o^T)  (8)

where F′OBJ is the modified objectness loss function; o^gt, ô, and o^T are the target true value, the student network prediction, and the teacher network prediction, respectively; FOBJ(ô, o^gt) and FOBJ(ô, o^T) are the student objectness loss and the distillation objectness loss, respectively; and λD is the balance coefficient.

Since YOLOv5 predicts bounding boxes at the same time as object classes, standard knowledge distillation would also transfer the teacher network's predictions for background boxes to the student network, degrading the student network's prediction of target boxes. Therefore, to ensure that the student network learns class probabilities and bounding-box information only where the teacher network predicts a target with high confidence, the distillation terms of the classification and bounding-box loss functions are multiplied by the updated teacher objectness prediction ô^T. If the object in a prediction box is background, ô^T will be low, making the distillation loss close to 0, which effectively prevents the student network from wrongly learning background information. The modified classification loss function is

F′CL = FCL(p̂, p^gt) + ô^T · λD FCL(p̂, p^T)  (9)

where F′CL is the modified classification loss function; FCL(p̂, p^gt) is the student classification loss; FCL(p̂, p^T) is the distillation classification loss; p^gt, p̂, and p^T are the true class probabilities, the student network prediction, and the teacher network prediction, respectively; λD is the balance coefficient; and ô^T is the updated teacher objectness prediction.

In the same way, the bounding box loss function has a similar expression as

F′BB = FBB(b̂, b^gt) + ô^T · λD FBB(b̂, b^T)  (10)

where F′BB is the modified bounding-box loss function; FBB(b̂, b^gt) is the student bounding-box loss; FBB(b̂, b^T) is the distillation bounding-box loss; b^gt, b̂, and b^T are the true target bounding box, the student network prediction, and the teacher network prediction, respectively; λD is the balance coefficient; and ô^T is the updated teacher objectness prediction.

Finally, the overall distillation loss L used to update the model parameters is the sum of Eqs. (8)-(10)

L = F′OBJ + F′CL + F′BB  (11)
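The gating structure of Eqs. (8)-(11) can be sketched as follows. This is a simplified stand-in, not YOLOv5's actual branch losses: MSE replaces FOBJ, FCL, and FBB, the per-box gating by the teacher objectness ô^T is applied element-wise, and the λD value is illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student, teacher, target, lambda_d=0.5):
    """Total distillation loss in the spirit of Eq. (11).

    student/teacher/target: dicts with per-box 'obj', 'cls', 'box' tensors.
    lambda_d: balance coefficient (illustrative value).
    """
    obj_t = teacher["obj"].detach()  # teacher objectness o^T

    # Eq. (8): objectness is distilled unconditionally.
    l_obj = (F.mse_loss(student["obj"], target["obj"])
             + lambda_d * F.mse_loss(student["obj"], obj_t))

    # Eq. (9): classification distillation gated per box by o^T, so
    # background boxes (low o^T) contribute almost nothing.
    cls_gap = (student["cls"] - teacher["cls"].detach()) ** 2
    l_cls = (F.mse_loss(student["cls"], target["cls"])
             + lambda_d * (obj_t.unsqueeze(-1) * cls_gap).mean())

    # Eq. (10): bounding-box distillation gated the same way.
    box_gap = (student["box"] - teacher["box"].detach()) ** 2
    l_box = (F.mse_loss(student["box"], target["box"])
             + lambda_d * (obj_t.unsqueeze(-1) * box_gap).mean())

    return l_obj + l_cls + l_box  # Eq. (11)
```

Note how a teacher objectness of 0 (pure background) zeroes out the classification and box distillation terms, which is exactly the behavior the gating is meant to enforce.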

1.3 SAR target detection module

With the environment and variables correctly configured, the SAR image to be detected is input to the SAR target detection module, and the lightweight SAR target detection weight model output by the knowledge distillation module is used with forward propagation to regress the target classes and positions, finally realizing high-precision, lightweight SAR target detection and localization. In summary, the lightweight SAR object detection method designed in this paper, combining channel pruning and knowledge distillation, improves the detection performance of the lightweight student network in three ways: teacher network feature refinement, regularization, and dark-knowledge learning. (1) Channel pruning removes the secondary channels of the teacher network and refines its features, so that important feature knowledge is delivered to the student network more accurately during knowledge distillation, improving the student network's performance. (2) Channel pruning reduces the parameters and complexity of the teacher network, so it can be regarded as a regularization method that reduces the risk of overfitting; the pruning-optimized teacher network usually generalizes better to new data, and during knowledge distillation the student network is likewise inclined to learn more generalizable patterns and knowledge. (3) A key element of knowledge distillation is the learning and transfer of dark knowledge such as inter-class relationships and sample similarity; the pruning-optimized teacher network is more likely to attend to the important features of the target, raising the quality of its predictions, so the dark knowledge transferred to the student network is richer and more instructive.
Figure 4 illustrates the confidence representation of labels with different degrees of softening. A binary hard label indicates only whether a sample is the target, carrying too little information. Softened labels smooth the relative probabilities between classes to a certain extent, helping the network mine the deep features that distinguish different target types. The soft labels obtained after channel pruning provide a smoother information distribution in which the distinctions between different target classes are better expressed and the dark knowledge is richer, making it easier for the student network to understand the detection task and imitate the teacher network's decision process, and enhancing its ability to model uncertain targets.
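The softening in Figure 4 can be illustrated with a temperature-scaled softmax. The logit values below are made up for illustration, while τ = 20 matches the distillation temperature reported in Section 2.2:

```python
import math

def softmax_with_temperature(logits, tau):
    """Soften a logit vector: a higher temperature tau yields a smoother
    class distribution, exposing more 'dark knowledge' about the relative
    relationships between classes."""
    scaled = [z / tau for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

logits = [8.0, 2.0, 1.0]                           # illustrative teacher logits
hard = softmax_with_temperature(logits, tau=1)     # near one-hot
soft = softmax_with_temperature(logits, tau=20)    # tau = 20 as in Section 2.2
print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```

At τ = 1 the distribution is almost one-hot, while at τ = 20 the non-target classes receive visible probability mass, which is the inter-class information the student learns from.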

Fig. 4 Information representation of labels with different degrees of softening

2 Experimental verification

2.1 Experimental environment and SAR image dataset

The experiments were run on Windows 10 with the PyTorch deep learning framework. The machine is configured with an Intel Xeon Gold 6230 CPU at 2.10 GHz, 128 GB of RAM, and two NVIDIA GeForce RTX 2080 Ti graphics cards (11 GB VRAM each, 22 GB in total). To verify the applicability of the proposed lightweight SAR target detection method under multiple target types, the oil tank and bridge targets of the MSAR dataset [26-27] were combined with the ship targets of the SSDD dataset [28] to construct the experimental dataset. The MSAR dataset contributes 1250 oil tank images and 1582 bridge images, and the SSDD dataset contributes 1160 ship images (Figure 5). Each target class was split 8:2 into training and validation sets; the SSDD ship data follow the partition specification of Ref. [28], as shown in Table 1.

Fig. 5 The target types in the datasets

Tab. 1 Number distribution of experimental datasets

Target type   Dataset   Training images   Validation images   Avg. targets per image
Oil tank      MSAR      1000              250                 9.86
Bridge        MSAR      1266              316                 1.17
Ship          SSDD      812               232                 2.12


The effectiveness of the method was evaluated with the indexes commonly used in target detection: recall R, precision P, mean average precision mAP, F1 score, model size, and inference time

R = TP / (TP + FN)                         (12)

P = TP / (TP + FP)                         (13)

mAP = (1/n) Σ_{i=1}^{n} ∫_0^1 P_i(R) dR    (14)

F1 = 2PR / (P + R)                         (15)

where TP is the number of correctly detected targets; FN is the number of missed targets; FP is the number of falsely detected targets; n is the number of target categories; and P(R) is the precision-recall curve. The higher the mAP value, the better the model performance.
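As a minimal sketch, the count-based metrics above can be computed directly from the detection counts; the TP/FP/FN values below are hypothetical, chosen only to exercise the formulas:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts, following the
    standard definitions used in Eqs. (12), (13), and (15)."""
    p = tp / (tp + fp)          # precision: correct detections / all detections
    r = tp / (tp + fn)          # recall: correct detections / all ground truth
    f1 = 2 * p * r / (p + r)    # harmonic mean of precision and recall
    return p, r, f1

# Hypothetical counts for one target class
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))
```

mAP additionally averages the area under each class's precision-recall curve over all n classes, as in Eq. (14).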

2.2 Parameter setting and model optimization

To determine the optimal teacher and student networks, the five YOLOv5 networks of different sizes were trained on the dataset. Parameter settings: 300 epochs; initial learning rate 0.01; IoU threshold 0.5; batch size 16; input image size 640×640; SGD optimizer. In channel pruning and knowledge distillation, the sparsity training coefficient sr was 5×10^-4, the distillation temperature τ was 20, the distillation loss function was the L2 loss, and the balance factor α was 0.5. The performance of the trained models of different sizes on the validation set is shown in Table 2.
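A rough sketch of how the reported distillation hyperparameters combine during training: an L2 (MSE) term between student and teacher outputs is blended with the ordinary detection loss through the balance factor α (0.5 in this section). The exact loss composition inside a YOLOv5 distillation pipeline is more involved; this only illustrates the weighting idea, and the output values are invented:

```python
import numpy as np

def distillation_objective(student_out, teacher_out, hard_loss, alpha=0.5):
    """Blend the ordinary (hard-label) detection loss with an L2
    distillation term between student and teacher outputs.
    alpha = 0.5 matches the balance factor reported in Section 2.2;
    the overall formulation is a simplification."""
    l2_distill = np.mean((student_out - teacher_out) ** 2)
    return alpha * hard_loss + (1 - alpha) * l2_distill

student = np.array([0.20, 0.70, 0.10])   # hypothetical student predictions
teacher = np.array([0.25, 0.65, 0.10])   # hypothetical (softened) teacher outputs
total = distillation_objective(student, teacher, hard_loss=0.4, alpha=0.5)
print(round(total, 5))
```

With α = 0.5 the student is pulled equally toward the ground-truth labels and toward the teacher's softened outputs.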

Tab. 2 Training results of network models of different sizes

Network model   Precision/%   Recall/%   mAP/%   Model size/MB   Inference time/ms
YOLOv5-x 90.9 93.6 93.9 165.00 23.9
YOLOv5-l 90.9 90.9 93.1 88.50 14.4
YOLOv5-m 91.4 92.5 94.2 40.20 8.9
YOLOv5-s 91.3 93.3 94.0 13.70 3.6
YOLOv5-n 91.4 91.6 92.2 3.73 1.7


Because the dataset is limited in size, an overly large model easily overfits, reducing mAP. YOLOv5-m, the medium-size network with the highest detection accuracy, was therefore selected as the baseline teacher network, and YOLOv5-n, at only 3.73 MB, was selected as the student network.

2.3 Test results and analysis

2.3.1 Comparative test of knowledge distillation under different channel pruning ratios

To verify the effectiveness of the proposed method and the effect of teacher networks pruned at different ratios on the performance of the lightweight student model, a set of comparative knowledge distillation experiments under different channel pruning ratios was designed. YOLOv5-m pruned at ratios of 10% to 69% served as the teacher network, and the lightweight YOLOv5-n served as the student network for knowledge distillation. The distillation results were compared with the baseline YOLOv5-n and with distillation from the unpruned teacher at the same model size. The performance statistics are shown in Table 3; the best result in each column is shown in bold, and likewise below.

Tab. 3 Results of teacher networks with different pruning ratios improving student network performance

Student network   Teacher network   Pruning ratio/%   Precision/%   Recall/%   mAP/%   Model size/MB   Inference time/ms
YOLOv5-n   ×          ×    91.4   91.6   92.2   3.73   1.7
YOLOv5-n   YOLOv5-m   ×    92.1   90.6   94.1   3.73   1.7
YOLOv5-n   YOLOv5-m   10   91.6   92.5   94.7   3.73   1.7
YOLOv5-n   YOLOv5-m   20   91.4   93.8   94.6   3.73   1.7
YOLOv5-n   YOLOv5-m   30   91.9   92.9   95.0   3.73   1.7
YOLOv5-n   YOLOv5-m   40   92.0   92.6   94.9   3.73   1.7
YOLOv5-n   YOLOv5-m   50   90.2   93.6   93.8   3.73   1.7
YOLOv5-n   YOLOv5-m   60   91.1   92.6   93.9   3.73   1.7
YOLOv5-n   YOLOv5-m   69   90.9   93.4   94.1   3.73   1.7


Table 3 shows that, at the same model size, the mAP of student networks guided by channel-pruned teacher networks is higher than that of the baseline YOLOv5-n, and distillation from a pruned teacher outperforms direct distillation. Distillation at a 30% pruning ratio performed best: mAP improved by 2.8% over the original YOLOv5-n student network and by 0.9% over direct distillation, demonstrating the effectiveness of the proposed method. Note, however, that pruning the teacher network at a large ratio is not conducive to improving the student network's detection accuracy.

2.3.2 Effect of channel pruning ratio on teacher network performance

To further explore why a pruned teacher network can improve the student network's detection accuracy, the performance of teacher networks with different pruning ratios was tested. The experimental results in Table 4 show that, after fine-tuning, the performance of the teacher network improves to varying degrees at every pruning ratio. At low pruning ratios (below 30% in our tests), cutting the secondary channels reduces the model's parameters and computation, and detection accuracy is restored or improved after fine-tuning. As the pruning ratio increases further, however, some important feature channels are discarded, the model's mAP oscillates, and stability is affected to a certain extent.

Tab. 4 Optimization results of different pruning ratios on teacher network performance

Network model   Pruning ratio/%   Precision/%   Recall/%   mAP/%   Model size/MB   Inference time/ms
YOLOv5-m (baseline)   —   91.4   92.5   94.2   40.20   8.9
Teacher Network A 10 91.1 93.9 94.3 36.2 6.9
Teacher Network B 20 91.2 94.5 94.9 31.1 6.4
Teacher Network C 30 91.6 94.6 94.9 26.6 6.2
Teacher Network D 40 91.0 93.7 94.5 22.6 6.0
Teacher Network E 50 92.2 94.4 95.3 19.2 5.8
Teacher Network F 60 90.5 94.4 94.6 16.5 5.5
Teacher Network G 69 91.0 94.9 94.7 14.8 5.5


As shown in Figure 6, compared with the baseline model, the recall of the teacher models optimized by pruning and fine-tuning increases markedly and their ability to express the features of the training-set targets improves further, while precision oscillates around the baseline value. Combined with the distillation results in Table 3, the recall of the distilled student networks improves correspondingly, indicating that knowledge distillation effectively transfers the useful information carried by the optimized teacher network to the lightweight student network.

Note: the dashed line is the baseline value of YOLOv5-m.
Fig. 6 Visualization of teacher network performance optimization with different pruning ratios

The head of YOLOv5 contains three detection layers at different scales for predicting targets of different sizes. Figure 7 visualizes the output heat of these three detection layers for teacher network models with different pruning ratios. The changes in the heat regions show that, after channel pruning at an appropriate ratio followed by fine-tuning, the model attends significantly more to the targets and suppresses responses to background regions. The information loss and prediction anomalies at high pruning ratios, however, again show that excessive pruning destabilizes network performance.

Fig. 7 Prediction heat of the teacher network detection layers under different pruning ratios
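A common way to produce such detection-layer heat maps is to collapse the channel dimension of a layer's feature map and normalize the result; the sketch below is illustrative and is not necessarily the exact visualization used for Figure 7:

```python
import numpy as np

def detection_heat(feature_map):
    """Collapse a (C, H, W) detection-layer feature map into a 2-D heat map
    by averaging absolute channel activations and normalizing to [0, 1].
    An illustrative inspection tool, not the paper's exact method."""
    heat = np.abs(feature_map).mean(axis=0)
    heat -= heat.min()
    rng = heat.max()
    return heat / rng if rng > 0 else heat

# Synthetic feature map: a strong response over a small "target" region
fmap = np.zeros((4, 8, 8))
fmap[:, 3:5, 3:5] = 2.0
heat = detection_heat(fmap)
print(heat.max(), heat[0, 0])
```

High values in the heat map indicate where the detection layer responds, which is what the changing heat regions in Figure 7 visualize across pruning ratios.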

In summary, the teacher network with the highest post-pruning mAP does not guarantee the best knowledge distillation effect, and the performance oscillation caused by excessive pruning is not conducive to a steady improvement of the overall model. According to the experimental results in this paper, channel pruning of the teacher network at an appropriate ratio improves the distillation effect: eliminating the secondary channels condenses the target feature information and, while compressing the teacher network, provides knowledge that transfers better, which matters more than the teacher's own detection accuracy. During distillation, this information is transmitted to the student network more effectively, ultimately improving the detection accuracy of the lightweight network.

2.3.3 Statistics and comparison of the performance of typical SAR object detection algorithms

To evaluate the comprehensive performance of the proposed method in terms of detection accuracy, model size, and detection efficiency, the mainstream deep learning object detection algorithms YOLOv3, YOLOv7, and YOLOX, together with their corresponding lightweight variants, were compared with the proposed method (30% channel pruning + knowledge distillation). YOLOv3, YOLOv7, and the proposed method are anchor-based single-stage detectors, while YOLOX is anchor-free. The parameter settings of each method followed the configuration in Section 2.2.

The performance statistics in Table 5 and the visualization in Figure 8 show that the proposed method reaches a mAP of 95.0% with a model size of only 3.73 MB and an inference time below 2 ms, which is 10.9%, 5.4%, 2%, 2.5%, and 2.8% higher than the lightweight detectors YOLOv3-tiny, YOLOv7-tiny, YOLOX-tiny, YOLOX-nano, and YOLOv5-n, respectively. At the same time, the proposed method greatly surpasses YOLOv3, YOLOv7, and YOLOX-s in model size and inference time, giving the best comprehensive performance.

Tab. 5 Performance comparison of different SAR target detection methods

Method   Precision/%   Recall/%   mAP/%   Model size/MB   Inference time/ms
YOLOv3 90.8 92.0 93.2 117.00 12.8
YOLOv3-tiny 90.9 75.1 84.1 16.60 1.9
YOLOv7 93.0 93.2 94.1 71.30 11.5
YOLOv7-tiny 91.1 88.6 89.6 11.70 3.4
YOLOX-s 92.2 93.8 94.9 34.30 7.9
YOLOX-tiny 91.5 92.7 93.0 19.40 4.8
YOLOX-nano 90.7 92.1 92.5 3.6 1.5
YOLOv5-n 91.4 91.6 92.2 3.73 1.7
Ours (30% channel pruning + knowledge distillation) 91.9 92.9 95.0 3.73 1.7


Fig. 8 Visual comparison of "mAP-inference time-model size"

2.4 Generalization performance test

To verify the rapid detection capability of the proposed method on SAR images of large, complex scenes, the optimal model trained on the combined dataset was used to detect ships, oil tanks, and bridges in stripmap-mode images from the HISEA-1 SAR satellite. The test images were converted and filtered into 8-bit images of 10 000×4262 pixels, covering scenes such as open sea, near shore, urban areas, and ports. Before detection, the images were divided with 20% overlap into 525 slices of 640×640 pixels. Figure 9 shows the detection results in six regions A-F: correctly detected ship, oil tank, and bridge targets are marked with red, pink, and orange rectangles, missed targets with blue boxes, and false detections with yellow boxes. The statistics in Table 6 show that the average F1 of the proposed method reaches 0.8413, 8.63% higher than direct knowledge distillation. Apart from some small oil tanks with blurred outlines, shore ships affected by crane booms, and very short bridges, all targets were detected correctly. Almost all small ship targets were detected, demonstrating strong generalization and multi-scale detection ability, and ensuring high detection efficiency and flexible deployability alongside detection accuracy.
However, in scenes with relatively complex backgrounds that are prone to interference from other targets or features (such as regions D and F), recall is slightly lower than with direct knowledge distillation, because the teacher network loses a small amount of target feature information during pruning optimization and therefore cannot transfer the relevant information effectively during distillation. The higher precision of the proposed method nonetheless carries greater practical significance.
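The overlap-based slicing step for the large test image can be sketched as below. The exact slice count reported above (525) depends on how borders, both axes, and possibly multiple scenes are handled, so this sketch only illustrates the offset computation under a simple stride rule:

```python
def tile_offsets(length, tile, overlap):
    """Start offsets for slicing one image axis into fixed-size tiles with a
    fractional overlap; the last tile is shifted back so it stays in bounds.
    A sketch of the slicing step, not the paper's exact scheme."""
    stride = int(tile * (1 - overlap))                      # 640 * 0.8 = 512
    offsets = list(range(0, max(length - tile, 0) + 1, stride))
    if offsets[-1] + tile < length:                         # cover the border
        offsets.append(length - tile)
    return offsets

xs = tile_offsets(10000, 640, 0.2)   # image width from this section
ys = tile_offsets(4262, 640, 0.2)    # image height from this section
print(len(xs), len(ys))              # tiles per axis under this rule
```

Each (x, y) offset pair then defines one 640×640 slice fed to the detector, and the slice-level detections are mapped back to full-image coordinates.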

Fig. 9 Generalization test results of the proposed method on the HISEA-1 SAR image

Tab. 6 Generalization test results and method comparison in different regions

Region   Scene type        Target type                Knowledge distillation: P/%   R/%     F1       Proposed method: P/%   R/%      F1
A        Urban area        Bridges                    100.00                        50.00   0.6667   100.00                 62.50    0.7692
B        Dock              Shore ships                69.23                         60.00   0.6429   90.91                  66.67    0.7693
C        Open sea          Ships                      100.00                        90.91   0.9524   100.00                 100.00   1.0000
D        Pier/jetty        Docked ships, oil tanks    90.99                         90.18   0.9058   95.96                  84.82    0.9005
E        Open sea          Ships                      100.00                        50.00   0.6667   100.00                 83.33    0.9091
F        Yard/terminal     Docked ships, bridges      72.73                         66.67   0.6957   87.50                  58.33    0.7000
Average                                               88.83                         67.96   0.7550   95.73                  77.43    0.8413


3 Concluding remarks

To address the low detection accuracy of existing lightweight SAR target detection methods, this paper designs a lightweight SAR target detection method combining channel pruning and knowledge distillation. While the size of the lightweight student model is kept unchanged, the teacher network is optimized by pruning its secondary channels, which narrows the capacity gap between the teacher and student networks, so that the teacher provides knowledge that migrates more easily and is better received by the student, improving detection accuracy. The method was validated on the recombined MSAR and SSDD multi-class target datasets: at the same model size, its detection accuracy exceeds that of both direct knowledge distillation and YOLOv5-n, and compared with typical deep learning object detection methods it achieves better comprehensive performance in detection accuracy, model size, and inference speed. In complex backgrounds, however, some targets subject to background interference are easily missed, and the generalization ability needs further improvement. Future work will consider more advanced object detection methods, combined with intermediate-layer knowledge distillation and hard-target mining, to optimize SAR image object detection in complex backgrounds and improve the adaptability and robustness of the lightweight method across scenarios.

First review: Hou Lin; Second review: Song Qifan; Final review: Jin Jun
