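Basic Concepts

Spatial attention is implemented with the SAM module. In Darknet, the newly added sam_layer serves this purpose; the layer is defined in darknet.h as sam. Its schematic is shown below:

[Figure: schematic of the SAM module, from 【YOLOv4 Discussion, Part 8】(2) Spatial Attention Mechanism SAM: Adding an Attention Mechanism to the Network with Darknet YOLOv4]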

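Within the network it still sits inside the RES residual module. First, global max pooling and global average pooling along the channel direction are applied separately to the output of the residual module's last convolutional block, producing two feature maps with one channel each. The two maps are concatenated, the resulting 2-channel map is convolved, and a Sigmoid activation turns the result into weights over the spatial plane. These weights are then multiplied with the output of the residual module's last convolutional block.

The main purpose of this process is to improve object localization by raising the scoring weights at the spatial positions of the objects to be located.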

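Configuration

As before, yolov3-tiny.cfg is used as the base. To add the RES and SAM modules, the content marked with #### needs to be added to the configuration file: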

......
......
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2


[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

#########Newly added configuration#######
####Prepare for the RES modules first. The channel count is usually kept small, because a route layer concatenates the branches later###

[route]
layers = -2

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

####Two RES modules######
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[shortcut]
from=-3
activation=linear

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

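###SAM module###
#Global max pooling along the channel direction
[maxpool]
maxpool_depth = 1
out_channels = 1
#Global average pooling along the channel direction
[route]
layers = -2

[avgpool]
channelpool = 1
#Concatenate the two 1*H*W pooled maps
[route]
layers = -1, -3
#Convolve the concatenated pooled maps
[convolutional]
batch_normalize=1
filters=128
size=7
stride=1
pad=1
#Sigmoid (kept on its own line so it is not read as part of the value)
activation=logistic
#Spatial attention weighting
[sam]
from = -6
activation= linear

###End of SAM module####

[shortcut]
from=-9
activation=linear
###End of RES module####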

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -1,-16

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
#####End of newly added configuration#####
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
......
......

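In the configuration above, the convolution applied to the concatenated pooled maps directly produces 128 channels. Strictly following the schematic, the convolution output would be 1 × H × W. Because sam_layer multiplies it element-wise with the 128-channel input layer, that single map would have to be concatenated with itself 128 times. This can also be done with the following configuration fragment instead: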

......
......
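#Convolve the concatenated pooled maps
[convolutional]
batch_normalize=1
filters=1
size=7
stride=1
pad=1
#Sigmoid (kept on its own line so it is not read as part of the value)
activation=logistic
#Spatial attention weighting: replicate the 1-channel map 128 times first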
[route]
layers = -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,\
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,\
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,\
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,\
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1

[sam]
from = -7
activation= linear

###End of SAM module####

[shortcut]
from=-10
activation=linear
###End of RES module####

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[route]
layers = -1,-17

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
#####End of newly added configuration#####
......
......

However, this ran into a major problem. After training for a while, the program fails as follows:

[Figure: screenshot of the training output at the point of failure]

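Posts online all attribute this to running out of GPU memory. GPU memory was monitored while this ran, and usage never hit the limit.

One guess is that, because all 128 channels hold identical content, gradients explode during backpropagation. The output above also shows loss = -nan.

However, earlier tests also produced persistent nan values, and the program did not crash immediately on loss = -nan or loss = nan, so the main problem still appears to be a memory error. The more likely cause is that repeatedly stacking the feature map leaves memory non-contiguous and a pointer occasionally goes wrong.

A 128-channel convolution achieves an effect similar to expanding the map to 128 channels, without gradient explosion or similar problems, so that is the approach used here.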
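For reference, replicating a 1 × H × W map 128 times and then multiplying element-wise gives the same numbers as reusing the single map for every channel. The sketch below shows that equivalence in plain C. The function names are hypothetical, and this is not how sam_layer is written: as listed later, it multiplies two buffers of equal size.

```c
#include <assert.h>
#include <string.h>

/* Replicate a 1 x HW map into C identical channels, as the 128-way [route] would. */
void replicate_channels(const float *map, int C, int HW, float *out) {
    assert(C > 0 && HW > 0);
    for (int c = 0; c < C; ++c)
        memcpy(out + (size_t)c * HW, map, (size_t)HW * sizeof(float));
}

/* Element-wise product of two equally sized buffers. */
void multiply_elementwise(const float *a, const float *b, int n, float *out) {
    for (int i = 0; i < n; ++i) out[i] = a[i] * b[i];
}

/* Same result without the copies: reuse the single-channel map for every channel. */
void multiply_broadcast(const float *feat, const float *map, int C, int HW, float *out) {
    for (int c = 0; c < C; ++c)
        for (int i = 0; i < HW; ++i)
            out[c * HW + i] = feat[c * HW + i] * map[i];
}
```

The broadcast version avoids allocating C copies of the map, which is the design trade-off between the two configurations discussed above.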

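Source Code Modifications and Analysis

This part mainly uses sam_layer.c, which is new in YOLOv4.

Because avgpool_layer.c in Darknet has no global average pooling along the channel direction, I modified the Darknet code. The changes mainly involve parser.c, avgpool_layer.c, avgpool_layer.h and avgpool_layer_kernels.cu.

Without further ado, here is the code.

sam_layer

parser.c

//parse_sam shows that in the Darknet framework the parameters to set in the cfg file are from and activation
//from specifies which layer's feature map the current SAM weight map is multiplied with
//activation defaults to linear; SWISH and MISH are not supported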
layer parse_sam(list *options, size_params params, network net)
{
    char *l = option_find(options, "from");
    int index = atoi(l);
    if (index < 0) index = params.index + index;

    int batch = params.batch;
    layer from = net.layers[index];

    layer s = make_sam_layer(batch, index, params.w, params.h, params.c, from.out_w, from.out_h, from.out_c);

    char *activation_s = option_find_str_quiet(options, "activation", "linear");
    ACTIVATION activation = get_activation(activation_s);
    s.activation = activation;
    if (activation == SWISH || activation == MISH) {
        printf(" [sam] layer doesn't support SWISH or MISH activations \n");
    }
    return s;
}
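The handling of from in parse_sam can be isolated into a few lines. The helper below is a hypothetical stand-alone version of that index arithmetic; the range check is an addition for the example and is not part of the listing above.

```c
#include <assert.h>

/* Resolve a cfg layer reference the way parse_sam does: negative values are
   relative to the layer being parsed, non-negative values are absolute. */
int resolve_layer_index(int current_index, int from) {
    int index = from < 0 ? current_index + from : from;
    /* Added check (not in the original code): the target must be an earlier layer. */
    assert(index >= 0 && index < current_index);
    return index;
}
```

As an illustration with made-up numbering: if the [sam] layer were layer 20, from = -6 would resolve to layer 14, which in the configuration above is the last 3x3 convolution of the RES module (maxpool, route, avgpool, route and the 7x7 convolution occupy the five layers in between).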

sam_layer.c

void forward_sam_layer(const layer l, network_state state)
{
    //Number of elements in the output feature map
    int size = l.batch * l.out_c * l.out_w * l.out_h;
    
    float *from_output = state.net.layers[l.index].output;

    int i;
    #pragma omp parallel for
    for (i = 0; i < size; ++i) {
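        //Element-wise product of the SAM module's output map and the feature map being processed
        //Note: the loop covers exactly the output size, so only as many channels of the feature map are used as the SAM output map has. In other words, the SAM output map must have the same size as the feature map it is multiplied with.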
        l.output[i] = state.input[i] * from_output[i];
    }

    activate_array(l.output, l.outputs*l.batch, l.activation);
}

void backward_sam_layer(const layer l, network_state state)
{
    gradient_array(l.output, l.outputs*l.batch, l.activation, l.delta);
    //axpy_cpu(l.outputs*l.batch, 1, l.delta, 1, state.delta, 1);
    //scale_cpu(l.batch, l.out_w, l.out_h, l.out_c, l.delta, l.w, l.h, l.c, state.net.layers[l.index].delta);

    int size = l.batch * l.out_c * l.out_w * l.out_h;
    //int channel_size = 1;
    float *from_output = state.net.layers[l.index].output;
    float *from_delta = state.net.layers[l.index].delta;

    int i;
    #pragma omp parallel for
    for (i = 0; i < size; ++i) {
        //The partial derivatives with respect to from_output and input are computed separately, each feeding its own backward path
        state.delta[i] += l.delta[i] * from_output[i]; // l.delta * from  (should be divided by channel_size?)

        from_delta[i] = state.input[i] * l.delta[i]; // input * l.delta
    }
}
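The two lines in the loop follow from the product rule: for out = x * s with a linear activation, the gradient with respect to x is delta * s and the gradient with respect to s is delta * x. A minimal sketch with hypothetical buffer names (dx and ds stand in for state.delta and from_delta) is given below. Unlike the listing above, which assigns from_delta, this sketch accumulates into both buffers.

```c
#include <assert.h>

/* Forward pass of the element-wise product with a linear activation. */
void sam_forward(const float *x, const float *s, int n, float *out) {
    for (int i = 0; i < n; ++i) out[i] = x[i] * s[i];
}

/* Backward pass: delta is dL/d(out); gradients are accumulated into dx and ds. */
void sam_backward(const float *x, const float *s, const float *delta, int n,
                  float *dx, float *ds) {
    for (int i = 0; i < n; ++i) {
        dx[i] += delta[i] * s[i];  /* d(out)/dx = s */
        ds[i] += delta[i] * x[i];  /* d(out)/ds = x */
    }
}
```

Because the operation is symmetric in its two inputs, it does not matter which of them is called the feature map and which the attention map.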

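avgpool_layer

There are two ways to add the feature: write a new function, or add a configuration parameter that selects the behavior. To keep changes to the original framework as small as possible, the approach taken is a new configuration parameter plus a modified version of the existing function.

The new parameter is called channelpool. Set it to 1 to pool along the channel direction.

parser.c changes

//channelpool is added to select average pooling along the channel direction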
avgpool_layer parse_avgpool(list *options, size_params params)
{
    int batch,w,h,c;
    //2021.07.04 add: new parameter, defaults to 0
    int channelpool = option_find_int(options, "channelpool",0);
    w = params.w;
    h = params.h;
    c = params.c;
    batch=params.batch;
    if(!(h && w && c)) error("Layer before avgpool layer must output image.");
    //2021.07.04 modify: make_avgpool_layer now takes the channelpool parameter
    avgpool_layer layer = make_avgpool_layer(batch,w,h,c,channelpool);
    return layer;
}

avgpool_layer.c changes

//2021.07.04 modify
avgpool_layer make_avgpool_layer(int batch, int w, int h, int c, int channelpool)
{

    avgpool_layer l = { (LAYER_TYPE)0 };
    l.type = AVGPOOL;
    l.batch = batch;
    l.h = h;
    l.w = w;
    l.c = c;
    l.channelpool = channelpool;
    l.inputs = h*w*c;
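    //Set the output size according to the configuration
    if(!channelpool){
        l.out_w = 1;
        l.out_h = 1;
        l.out_c = l.c;
        fprintf(stderr, "avg                          %4d x%4d x%4d ->   %4d \n",  w, h, c, c);
    }else{
        l.out_w = l.w;//channel-direction average pooling keeps the input width
        l.out_h = l.h;//channel-direction average pooling keeps the input height
        l.out_c = 1;//channel-direction average pooling outputs a single channel
        l.bflops = (l.c * l.out_h*l.out_w) / 1000000000.;
        //Print the layer's input and output shapes and its cost
        fprintf(stderr, "avg                          %4d x%4d x%4d ->   %4d x%4d x%4d %5.3f BF\n",  w, h, c, l.out_w, l.out_h, l.out_c, l.bflops);
    }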
    l.outputs = l.out_h * l.out_w * l.out_c;
    int output_size = l.out_h * l.out_w * l.out_c * batch;
    //int output_size = l.outputs * batch;
    l.output = (float*)xcalloc(output_size, sizeof(float));
    l.delta = (float*)xcalloc(output_size, sizeof(float));
    l.forward = forward_avgpool_layer;
    l.backward = backward_avgpool_layer;
    
    #ifdef GPU
    l.forward_gpu = forward_avgpool_layer_gpu;
    l.backward_gpu = backward_avgpool_layer_gpu;
    l.output_gpu  = cuda_make_array(l.output, output_size);
    l.delta_gpu   = cuda_make_array(l.delta, output_size);
    #endif 

    return l;
}

void resize_avgpool_layer(avgpool_layer *l, int w, int h)
{
    l->w = w;
    l->h = h;
    l->inputs = h*w*l->c;
       
    if(l->channelpool){
        l->out_w = w;//update the output size to match the resized input image
        l->out_h = h;
    }
    //Do not add l->out_c = l->c; here: it sets the wrong channel count and also causes a segmentation fault
    l->outputs = l->out_w * l->out_h * l->out_c;
    int output_size = l->outputs * l->batch;

    if (l->train)l->delta = (float*)xrealloc(l->delta, output_size * sizeof(float));
    l->output = (float*)xrealloc(l->output, output_size * sizeof(float));/**/

//Added: allocate the GPU memory needed for processing
#ifdef GPU
    CHECK_CUDA(cudaFree(l->output_gpu));
    l->output_gpu  = cuda_make_array(l->output, output_size);

    if (l->train) {
        CHECK_CUDA(cudaFree(l->delta_gpu));
        l->delta_gpu = cuda_make_array(l->delta, output_size);
    }

#endif

}

//2021.07.04 modify
void forward_avgpool_layer(const avgpool_layer l, network_state state)
{
    int b,i,k;
    if(!l.channelpool){//without channel-direction pooling the original logic is unchanged
        for(b = 0; b < l.batch; ++b){
            for(k = 0; k < l.c; ++k){
                int out_index = k + b*l.c;
                l.output[out_index] = 0;
                for(i = 0; i < l.h*l.w; ++i){
                    int in_index = i + l.h*l.w*(k + b*l.c);
                    l.output[out_index] += state.input[in_index];
                }
                l.output[out_index] /= l.h*l.w;
            }
        }
    }
    else{//with channel-direction pooling, average over the channels at each pixel
        for(b = 0; b < l.batch; ++b){
            for(i = 0; i < l.h*l.w; ++i){
                int out_index = i + b*l.h*l.w;
                l.output[out_index] = 0;
                for(k = 0; k < l.c; ++k){
                    //feature maps are stored channel-first: pixel i of channel k
                    int in_index = i + l.h*l.w*(k + b*l.c);
                    l.output[out_index] += state.input[in_index];
                }
                l.output[out_index] /= l.c;
            }
        }
    }

   
}

//2021.07.04 modify
void backward_avgpool_layer(const avgpool_layer l, network_state state)
{
    int b,i,k;
    if(!l.channelpool){//without channel-direction pooling the original logic is unchanged
        for(b = 0; b < l.batch; ++b){
            for(k = 0; k < l.c; ++k){
                int out_index = k + b*l.c;
                for(i = 0; i < l.h*l.w; ++i){
                    int in_index = i + l.h*l.w*(k + b*l.c);
                    state.delta[in_index] += l.delta[out_index] / (l.h*l.w);
                }
            }
        }
    }
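    else{//with channel-direction pooling, spread each pixel's gradient over the channels
        for(b = 0; b < l.batch; ++b){
            for(i = 0; i < l.h*l.w; ++i){
                int out_index = i + b*l.h*l.w;
                for(k = 0; k < l.c; ++k){
                    //feature maps are stored channel-first: pixel i of channel k
                    int in_index = i + l.h*l.w*(k + b*l.c);
                    state.delta[in_index] += l.delta[out_index] / (l.c);
                }
            }
        }
    }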

}
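Channel-direction average pooling is easy to check on a tiny tensor. The sketch below assumes the channel-first (N x C x H*W) layout, which matches the index expression j + l.w*(i + l.h*(k + l.c*b)) in the maxpool listing later in this post. The function names are hypothetical and the code is independent of the Darknet structs.

```c
#include <assert.h>

/* Offset of pixel i in channel k of image b, for data stored as N x C x HW. */
static int nchw_offset(int b, int k, int i, int c, int hw) {
    return i + hw * (k + b * c);
}

/* Average over channels: in is N x C x HW, out is N x 1 x HW. */
void channel_avgpool_forward(const float *in, int n, int c, int hw, float *out) {
    assert(n > 0 && c > 0 && hw > 0);
    for (int b = 0; b < n; ++b)
        for (int i = 0; i < hw; ++i) {
            float sum = 0.0f;
            for (int k = 0; k < c; ++k) sum += in[nchw_offset(b, k, i, c, hw)];
            out[i + b * hw] = sum / (float)c;
        }
}

/* Each input element receives 1/c of the gradient at its spatial position. */
void channel_avgpool_backward(const float *out_delta, int n, int c, int hw,
                              float *in_delta) {
    assert(n > 0 && c > 0 && hw > 0);
    for (int b = 0; b < n; ++b)
        for (int i = 0; i < hw; ++i)
            for (int k = 0; k < c; ++k)
                in_delta[nchw_offset(b, k, i, c, hw)] += out_delta[i + b * hw] / (float)c;
}
```

For one image with two channels {1, 2, 3} and {5, 6, 7}, the forward pass gives {3, 4, 5}: each output pixel is the mean of the values at the same position in every channel.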

avgpool_layer_kernels.cu changes

__global__ void forward_avgpool_layer_kernel(int n, int w, int h, int c, float *input, float *output,int channelpool)
{
    int id = (blockIdx.x + blockIdx.y*gridDim.x) * blockDim.x + threadIdx.x;
    if(id >= n) return;
    if(!channelpool){//without channel-direction pooling the original logic is unchanged
        int k = id % c;
        id /= c;
        int b = id;

        int i;
        int out_index = (k + c*b);
        output[out_index] = 0;
        for(i = 0; i < w*h; ++i){
            int in_index = i + h*w*(k + b*c);
            output[out_index] += input[in_index];
        }
        output[out_index] /= w*h;
    }
    else{//with channel-direction pooling, each thread handles one output pixel
        int size = w*h;
        int k = id % size;//pixel position inside the image
        id /= size;
        int b = id;

        int i;
        int out_index = (k + size*b);
        output[out_index] = 0;
        for(i = 0; i < c; ++i){
            //feature maps are stored channel-first: pixel k of channel i
            int in_index = k + size*(i + b*c);
            output[out_index] += input[in_index];
        }
        output[out_index] /= c;
    }
}

__global__ void backward_avgpool_layer_kernel(int n, int w, int h, int c, float *in_delta, float *out_delta,int channelpool)
{
    int id = (blockIdx.x + blockIdx.y*gridDim.x) * blockDim.x + threadIdx.x;
    if(id >= n) return;

    if(!channelpool){//without channel-direction pooling the original logic is unchanged
        int k = id % c;
        id /= c;
        int b = id;

        int i;
        int out_index = (k + c*b);
        for(i = 0; i < w*h; ++i){
            int in_index = i + h*w*(k + b*c);
            in_delta[in_index] += out_delta[out_index] / (w*h);
        }
    }
    else{//with channel-direction pooling, each thread handles one output pixel
        int size = w*h;
        int k = id % size;//pixel position inside the image
        id /= size;
        int b = id;

        int i;
        int out_index = (k + size*b);
        for(i = 0; i < c; ++i){
            //feature maps are stored channel-first: pixel k of channel i
            int in_index = k + size*(i + b*c);
            in_delta[in_index] += out_delta[out_index] / c;
        }
    }
}

extern "C" void forward_avgpool_layer_gpu(avgpool_layer layer, network_state state)
{
    //one thread per output element: per pixel with channelpool, per channel otherwise
    size_t n = layer.channelpool ? layer.w*layer.h*layer.batch : layer.c*layer.batch;
    //layer.channelpool is passed as an extra argument to forward_avgpool_layer_kernel
    forward_avgpool_layer_kernel<<<cuda_gridsize(n), BLOCK, 0, get_cuda_stream() >>>(n, layer.w, layer.h, layer.c, state.input, layer.output_gpu, layer.channelpool);
    CHECK_CUDA(cudaPeekAtLastError());
}

extern "C" void backward_avgpool_layer_gpu(avgpool_layer layer, network_state state)
{
    //one thread per output element: per pixel with channelpool, per channel otherwise
    size_t n = layer.channelpool ? layer.w*layer.h*layer.batch : layer.c*layer.batch;

    //layer.channelpool is passed as an extra argument to backward_avgpool_layer_kernel
    backward_avgpool_layer_kernel<<<cuda_gridsize(n), BLOCK, 0, get_cuda_stream() >>>(n, layer.w, layer.h, layer.c, state.delta, layer.delta_gpu, layer.channelpool);
    CHECK_CUDA(cudaPeekAtLastError());
}
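Both kernels decode the thread id into an output position, so the number of threads launched should equal the number of output elements of the layer. A small hypothetical helper makes the two cases explicit. The 13 x 13 x 128 shape in the usage note is only an illustrative assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Number of output elements of the average pooling layer, which is also the
   number of kernel threads needed when each thread writes one output value. */
size_t avgpool_output_count(int batch, int w, int h, int c, int channelpool) {
    assert(batch > 0 && w > 0 && h > 0 && c > 0);
    return channelpool ? (size_t)w * (size_t)h * (size_t)batch  /* one value per pixel   */
                       : (size_t)c * (size_t)batch;             /* one value per channel */
}
```

For a batch of 2 and a 13 x 13 x 128 input, spatial pooling needs 256 threads while channel-direction pooling needs 338.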

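avgpool_layer.h changes

Only the declaration of make_avgpool_layer needs to change here, so that it matches the new five-argument signature with the channelpool parameter.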

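maxpool_layer

maxpool_layer contains a lot of code, so the analysis here focuses on the parts related to pooling along the channel direction.

parser.c

//Darknet handles max pooling in many ways and offers a rich set of parameters
//size defaults to stride; they are the pooling window size and the step of the window
//stride_x and stride_y set different steps in the X and Y directions
//padding pads the feature map, typically when its size is odd
//maxpool_depth enables max pooling along the channel direction
//out_channels only matters when maxpool_depth=1; it is the number of output channels
//antialiasing enables anti-aliasing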
maxpool_layer parse_maxpool(list *options, size_params params)
{
    int stride = option_find_int(options, "stride",1);
    int stride_x = option_find_int_quiet(options, "stride_x", stride);
    int stride_y = option_find_int_quiet(options, "stride_y", stride);
    int size = option_find_int(options, "size",stride);
    int padding = option_find_int_quiet(options, "padding", size-1);
    int maxpool_depth = option_find_int_quiet(options, "maxpool_depth", 0);
    int out_channels = option_find_int_quiet(options, "out_channels", 1);
    int antialiasing = option_find_int_quiet(options, "antialiasing", 0);
    const int avgpool = 0;

    int batch,h,w,c;
    h = params.h;
    w = params.w;
    c = params.c;
    batch=params.batch;
    if(!(h && w && c)) error("Layer before [maxpool] layer must output image.");

    maxpool_layer layer = make_maxpool_layer(batch, h, w, c, size, stride_x, stride_y, padding, maxpool_depth, out_channels, antialiasing, avgpool, params.train);
    return layer;
}

maxpool_layer.c

maxpool_layer make_maxpool_layer(int batch, int h, int w, int c, int size, int stride_x, int stride_y, int padding, int maxpool_depth, int out_channels, int antialiasing, int avgpool, int train)
{
......
......

    l.batch = batch;
    l.h = h;
    l.w = w;
    l.c = c;
    l.pad = padding;
    l.maxpool_depth = maxpool_depth;
    l.out_channels = out_channels;
    //When max pooling along the channel direction, set the output map size
    if (maxpool_depth) {
        l.out_c = out_channels;
        l.out_w = l.w;
        l.out_h = l.h;
    }
    else {//Otherwise the output map shrinks according to the pooling window and stride
        l.out_w = (w + padding - size) / stride_x + 1;
        l.out_h = (h + padding - size) / stride_y + 1;
        l.out_c = c;
    }
    l.outputs = l.out_h * l.out_w * l.out_c;
    l.inputs = h*w*c;
    l.size = size;
    l.stride = stride_x;
    l.stride_x = stride_x;
    l.stride_y = stride_y;
    ......
    ......
}
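The output-size formula for ordinary (spatial) max pooling is worth trying with concrete numbers. The helpers below are hypothetical and simply restate the expressions from make_maxpool_layer and the padding default from parse_maxpool. The input sizes 26 and 13 used in the usage note are illustrative assumptions.

```c
#include <assert.h>

/* Output extent along one axis, using the same integer formula as
   make_maxpool_layer: (in + padding - size) / stride + 1. */
int maxpool_out_dim(int in, int size, int stride, int padding) {
    assert(size > 0 && stride > 0);
    return (in + padding - size) / stride + 1;
}

/* Default padding chosen by parse_maxpool when the cfg does not set one. */
int maxpool_default_padding(int size) {
    return size - 1;
}
```

With size=2 and stride=2 a 26-pixel axis becomes 13 pixels, while with size=2 and stride=1 a 13-pixel axis stays at 13. The second case is the [maxpool] with stride=1 near the end of the configuration.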

//maxpool forward pass
void forward_maxpool_layer(const maxpool_layer l, network_state state)
{
    if (l.maxpool_depth)
    {
        int b, i, j, k, g;
        for (b = 0; b < l.batch; ++b) {
            #pragma omp parallel for
            for (i = 0; i < l.h; ++i) {
                for (j = 0; j < l.w; ++j) {
                    for (g = 0; g < l.out_c; ++g)
                    {
                        int out_index = j + l.w*(i + l.h*(g + l.out_c*b));
                        float max = -FLT_MAX;
                        int max_i = -1;
                        //When out_channels is greater than 1, the channels are split into strided groups and the max of each group is kept, leaving out_channels maxima
                        for (k = g; k < l.c; k += l.out_c)
                        {
                            int in_index = j + l.w*(i + l.h*(k + l.c*b));
                            float val = state.input[in_index];

                            max_i = (val > max) ? in_index : max_i;
                            max = (val > max) ? val : max;
                        }
                        l.output[out_index] = max;
                        if (l.indexes) l.indexes[out_index] = max_i;
                    }
                }
            }
        }
        return;
    }
   ......
   ......
}
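The strided grouping in the inner loop (k = g; k < l.c; k += l.out_c) is easier to see on a small example. The function below is a hypothetical single-image version of the same loop for data stored as C x H*W. It is a sketch and not the Darknet implementation.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Channel-direction max pooling for one image stored as C x HW.
   Output channel g takes the max over input channels g, g+out_c, g+2*out_c, ...
   idx (optional, may be NULL) records the flat input offset of each winner. */
void depth_maxpool(const float *in, int c, int hw, int out_c, float *out, int *idx) {
    assert(out_c > 0 && out_c <= c && hw > 0);
    for (int g = 0; g < out_c; ++g)
        for (int i = 0; i < hw; ++i) {
            float best = -FLT_MAX;
            int best_i = -1;
            for (int k = g; k < c; k += out_c) {
                int in_index = i + hw * k;
                if (in[in_index] > best) { best = in[in_index]; best_i = in_index; }
            }
            out[i + hw * g] = best;
            if (idx) idx[i + hw * g] = best_i;
        }
}
```

With 4 input channels and out_c = 2, output channel 0 pools input channels 0 and 2, and output channel 1 pools input channels 1 and 3. With out_c = 1, as in the SAM configuration, every input channel takes part in a single maximum.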
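//maxpool backward pass
//The gradient passes straight through to the element that held the max; only the channel count has to be correct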
void backward_maxpool_layer(const maxpool_layer l, network_state state)
{
    int i;
    int h = l.out_h;
    int w = l.out_w;
    int c = l.out_c;
    #pragma omp parallel for
    for(i = 0; i < h*w*c*l.batch; ++i){
        int index = l.indexes[i];
        state.delta[index] += l.delta[i];
    }
}
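The backward pass relies entirely on the indexes array filled in during the forward pass. A minimal stand-alone version, with a hypothetical name and an added bounds check that the Darknet code does not perform, looks like this:

```c
#include <assert.h>

/* Route each output gradient back to the input element that won the max.
   indexes[i] is the flat input offset stored during the forward pass. */
void maxpool_route_gradient(const float *out_delta, const int *indexes,
                            int n_out, int n_in, float *in_delta) {
    for (int i = 0; i < n_out; ++i) {
        int index = indexes[i];
        assert(index >= 0 && index < n_in);  /* added check for the example */
        in_delta[index] += out_delta[i];
    }
}
```

Input elements that never held a maximum receive no gradient, which is why the input gradient buffer is only incremented at the recorded positions.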

maxpool_layer_kernels.cu

//Fairly simple: the work is done by a CUDA kernel. Interested readers can consult the CUDA documentation on the NVIDIA website
extern "C" void forward_maxpool_layer_gpu(maxpool_layer layer, network_state state)
{
    if (layer.maxpool_depth) {
        int h = layer.out_h;
        int w = layer.out_w;
        int c = 1;// layer.out_c;

        size_t n = h*w*c*layer.batch;
//When Darknet is built with GPU=1 in the Makefile, the channel-direction pooling runs in this CUDA kernel
        forward_maxpool_depth_layer_kernel << <cuda_gridsize(n), BLOCK, 0, get_cuda_stream() >> >(
            n, layer.w, layer.h, layer.c, layer.out_c, layer.batch, state.input, layer.output_gpu, layer.indexes_gpu);
        CHECK_CUDA(cudaPeekAtLastError());

        return;
    }
    ......
    ......
 }

//CUDA is a subject of its own; the feature is implemented by the code below and is not explained in detail here
extern "C" void backward_maxpool_layer_gpu(maxpool_layer layer, network_state state)
{
......
......
    if (layer.maxpool_depth) {
        int h = layer.out_h;
        int w = layer.out_w;
        int c = layer.out_c;

        size_t n = h * w * c * layer.batch;

        backward_maxpool_depth_layer_kernel << <cuda_gridsize(n), BLOCK, 0, get_cuda_stream() >> >(n, layer.w, layer.h, layer.c, layer.batch, layer.delta_gpu, state.delta, layer.indexes_gpu);
        CHECK_CUDA(cudaPeekAtLastError());
        return;
    }

    size_t n = layer.h*layer.w*layer.c*layer.batch;

    backward_maxpool_layer_kernel<<<cuda_gridsize(n), BLOCK, 0, get_cuda_stream() >>>(n, layer.h, layer.w, layer.c, layer.stride_x, layer.stride_y, layer.size, layer.pad, layer.delta_gpu, state.delta, layer.indexes_gpu);
    CHECK_CUDA(cudaPeekAtLastError());
}

Training Results

The figures are shown directly here. Hyperparameter tuning is omitted and the results of different network structures are not compared in detail; the point is mainly to show that the modification works.

[Figure]

Result after 2400 training iterations

[Figure]

Result after 3000 training iterations

[Figure]

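Summary

The biggest challenge in using the SAM module in the Darknet framework was modifying the avgpool_layer implementation to add average pooling along the channel direction. Darknet is not a mature framework, and many small features can only be added by changing the code. Hopefully AlexeyAB will put in a bit more work and add the features that are still missing, so that new tricks can be implemented simply by editing the cfg file.

This post is only meant as a starting point. If you have any questions, feel free to leave me a comment so we can discuss.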
