看不到圖請轉至：http://sangni007.blog.163.com/blog/static/174728148201481305957863/

=======雙邊濾波概述=======

雙邊濾波（Bilateral filter）是一種可以保邊去噪的濾波器。之是以可以達到此去噪效果，是因為濾波器是由兩個函數構成。一個函數是由幾何空間距離決定濾波器系數。另一個由像素內插補點決定濾波器系數。可以與其相比較的兩個filter：高斯低通濾波器(http://en.wikipedia.org/wiki/Gaussian_filter)和α-截尾均值濾波器（去掉百分率為α的最小值和最大之後剩下像素的均值作為濾波器）。

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

=======雙邊濾波公式=======

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

=======雙邊濾波代碼（CPU）======= OpenCV源碼：

/****************************************************************************************\
                                   Bilateral Filtering
\****************************************************************************************/

namespace cv
{

static void
bilateralFilter_8u( const Mat& src, Mat& dst, int d,
                    double sigma_color, double sigma_space,
                    int borderType )
{
    int cn = src.channels();
    int i, j, k, maxk, radius;
    Size size = src.size();

    CV_Assert( (src.type() == CV_8UC1 || src.type() == CV_8UC3) &&
        src.type() == dst.type() && src.size() == dst.size() &&
        src.data != dst.data );

    if( sigma_color <= 0 )
        sigma_color = 1;
    if( sigma_space <= 0 )
        sigma_space = 1;

    double gauss_color_coeff = -0.5/(sigma_color*sigma_color);
    double gauss_space_coeff = -0.5/(sigma_space*sigma_space);

    if( d <= 0 )
        radius = cvRound(sigma_space*1.5);
    else
        radius = d/2;
    radius = MAX(radius, 1);
    d = radius*2 + 1;

    Mat temp;
    copyMakeBorder( src, temp, radius, radius, radius, radius, borderType );

    vector<float> _color_weight(cn*256);
    vector<float> _space_weight(d*d);
    vector<int> _space_ofs(d*d);
    float* color_weight = &_color_weight[0];
    float* space_weight = &_space_weight[0];
    int* space_ofs = &_space_ofs[0];

    // initialize color-related bilateral filter coefficients
    for( i = 0; i < 256*cn; i++ )
        color_weight[i] = (float)std::exp(i*i*gauss_color_coeff);

    // initialize space-related bilateral filter coefficients
    for( i = -radius, maxk = 0; i <= radius; i++ )
        for( j = -radius; j <= radius; j++ )
        {
            double r = std::sqrt((double)i*i + (double)j*j);
            if( r > radius )
                continue;
            space_weight[maxk] = (float)std::exp(r*r*gauss_space_coeff);
            space_ofs[maxk++] = (int)(i*temp.step + j*cn);
        }

    for( i = 0; i < size.height; i++ )
    {
        const uchar* sptr = temp.data + (i+radius)*temp.step + radius*cn;
        uchar* dptr = dst.data + i*dst.step;

        if( cn == 1 )
        {
            for( j = 0; j < size.width; j++ )
            {
                float sum = 0, wsum = 0;
                int val0 = sptr[j];
                for( k = 0; k < maxk; k++ )
                {
                    int val = sptr[j + space_ofs[k]];
                    float w = space_weight[k]*color_weight[std::abs(val - val0)];
                    sum += val*w;
                    wsum += w;
                }
                // overflow is not possible here => there is no need to use CV_CAST_8U
                dptr[j] = (uchar)cvRound(sum/wsum);
            }
        }
        else
        {
            assert( cn == 3 );
            for( j = 0; j < size.width*3; j += 3 )
            {
                float sum_b = 0, sum_g = 0, sum_r = 0, wsum = 0;
                int b0 = sptr[j], g0 = sptr[j+1], r0 = sptr[j+2];
                for( k = 0; k < maxk; k++ )
                {
                    const uchar* sptr_k = sptr + j + space_ofs[k];
                    int b = sptr_k[0], g = sptr_k[1], r = sptr_k[2];
                    float w = space_weight[k]*color_weight[std::abs(b - b0) +
                        std::abs(g - g0) + std::abs(r - r0)];
                    sum_b += b*w; sum_g += g*w; sum_r += r*w;
                    wsum += w;
                }
                wsum = 1.f/wsum;
                b0 = cvRound(sum_b*wsum);
                g0 = cvRound(sum_g*wsum);
                r0 = cvRound(sum_r*wsum);
                dptr[j] = (uchar)b0; dptr[j+1] = (uchar)g0; dptr[j+2] = (uchar)r0;
            }
        }
    }
}

OpenCV 雙邊濾波調用

bilateralFilter(InputArray src, OutputArray dst, int d, double sigmaColor, double sigmaSpace,
                      int borderType=BORDER_DEFAULT )；

d 表示濾波時像素鄰域直徑，d為負時由 sigaColor計算得到；d>5時不能實時處理。 sigmaColor、sigmaSpace非别表示顔色空間和坐标空間的濾波系數sigma。可以簡單的指派為相同的值。<10時幾乎沒有效果；>150時為油畫的效果。 borderType可以不指定。

OpenCV 雙邊濾波實驗

用sigma為10，150,240,480 時效果如下：

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

=======雙邊濾波優化（CUDA）=======

在進行圖像處理時，由于計算量大，常常無法到達實時的效果，是以需利用GPU處理，使用CUDA進行優化。尤其是圖像濾波這種，(1) 并行度高，線程間耦合度低，每個像素的處理并不互相影響；(2) 像素傳輸量小，計算量大；特别适合CUDA進行計算。 CUDA BilateralFilter流程（可擴充至CUDA圖像處理領域）

複制資料 Copy Data to Device
- 在Device上開辟2維資料空間作為輸入資料：
  - UINT *dImage = NULL; //original image
  - size_t pitch;
  - checkCudaErrors( cudaMallocPitch(&dImage, &pitch, sizeof(UINT)*width, height) );
- 複制資料到顯示卡 Copy Data from Host to Device：
  - checkCudaErrors( cudaMemcpy2D(dImage, pitch, hImage, sizeof(UINT)*width, sizeof(UINT)*width, height, cudaMemcpyHostToDevice));
- 在Device上開辟2維資料空間儲存輸出資料：
  - unsigned int *dResult;
  - size_t pitch;
  - checkCudaErrors( cudaMallocPitch((void **)&dResult, &pitch, width*sizeof(UINT), height) );
使用紋理儲存器
- 聲明CUDA數組前，以結構體channelDesc描述CUDA數組中元件的數量和資料類型
  - cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
- 聲明紋理參照系:texture<Type,Dim,ReadMode> texRef;
  - texture<uchar4, 2, cudaReadModeElementType> rgbaTex; //以全局變量形式出現
- 将紋理參照系綁定到一個CUDA數組
  - checkCudaErrors( cudaBindTexture2D(0, rgbaTex, dImage, desc, width, height, pitch));
- 紋理拾取 // 書P65
  - tex1Dfetch()
  - tex1D(); tex2D(); tex3D();
使用常量儲存器
- 位于顯存，但擁有緩存加速。空間較小（64K），儲存頻繁通路的隻讀資料，
- 兩種使用方法
  - 直接指派：__constant__ float c_cuda[4] = {0,1,2,3}; __constant__ float c_num = 1;
  - 使用函數：__constant__ float color_weight[4*256]; checkCudaErrors(cudaMemcpyToSymbol(color_weight, color_gaussian, sizeof(float)*(4*256)));
- 本程式
  - __constant__ float color_weight[4*256];
  - __constant__ float space_weight[1024];
  - 函數聲明局部變量數組 float color_gaussian[4*256]; float space_gaussian[1024]; 使用定義域核和值域核指派
  - 函數指派：
    - checkCudaErrors(cudaMemcpyToSymbol(color_weight, color_gaussian, sizeof(float)*(4*256)));
    - checkCudaErrors(cudaMemcpyToSymbol(space_weight, space_gaussian, sizeof(float)*(1024)));
CUDA處理Kernel
- 并行設計：
  - dim3 blockSize(16, 16);
  - dim3 gridSize((width + 16 - 1) / 16, (height + 16 - 1) / 16);
  - d_bilateral_filter<<< gridSize, blockSize>>>(dDest, width, height, radius);
- 紋理拾取
  - center = tex2D(rgbaTex, x, y);
- 計算定義域和值域核的值
複制資料Copy Data to Host
- checkCudaErrors( cudaMemcpy(hImage,dResult,(width * height)*sizeof(UINT),cudaMemcpyDeviceToHost));

CUDA源碼： BilateralKernel.cu

#include <helper_math.h>
#include <helper_functions.h>
#include <helper_cuda.h>       // CUDA device initialization helper functions
#include "cuda.h"
#include "cuda_runtime_api.h"
#include "device_launch_parameters.h"
#include "device_functions.h"

__constant__ float color_weight[4*256];
__constant__ float space_weight[1024];

UINT *dImage  = NULL;   //original image
UINT *dTemp   = NULL;   //temp array for iterations
size_t pitch;

texture<uchar4, 2, cudaReadModeElementType> rgbaTex;//聲明紋理參照系


__device__ float colorLenGaussian(uchar4 a, uchar4 b)
{
	//若想達到漫畫效果，就注釋掉sqrt,使顔色距離變大
    uint mod = (uint)sqrt(((float)b.x - (float)a.x) * ((float)b.x - (float)a.x) +
				((float)b.y - (float)a.y) * ((float)b.y - (float)a.y) +
                ((float)b.z - (float)a.z) * ((float)b.z - (float)a.z) +
				((float)b.w - (float)a.w) * ((float)b.w - (float)a.w));

    return color_weight[mod];
}
__device__ uint rgbaFloatToInt(float4 rgba)
{
    rgba.x = __saturatef(fabs(rgba.x));   // clamp to [0.0, 1.0]
    rgba.y = __saturatef(fabs(rgba.y));
    rgba.z = __saturatef(fabs(rgba.z));
    rgba.w = __saturatef(fabs(rgba.w));
    return (uint(rgba.w * 255.0f) << 24) | (uint(rgba.z * 255.0f) << 16) | (uint(rgba.y * 255.0f) << 8) | uint(rgba.x * 255.0f);
}
__device__ float4 rgbaIntToFloat(uint c)
{
    float4 rgba;
    rgba.x = (c & 0xff) * 0.003921568627f;       //  /255.0f;
    rgba.y = ((c>>8) & 0xff) * 0.003921568627f;  //  /255.0f;
    rgba.z = ((c>>16) & 0xff) * 0.003921568627f; //  /255.0f;
    rgba.w = ((c>>24) & 0xff) * 0.003921568627f; //  /255.0f;
    return rgba;
}
//column pass using coalesced global memory reads
__global__ void
d_bilateral_filter(uint *od, int w, int h, int r)
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;

    if (x >= w || y >= h)
    {
        return;
    }

    float sum = 0.0f;
    float factor = 0.0f;;
    uchar4 t = {0, 0, 0, 0};
	float tw = 0.f,tx = 0.f, ty = 0.f, tz = 0.f;
    uchar4 center = tex2D(rgbaTex, x, y);
	//t = center;
	int posIndex = 0;
    for (int i = -r; i <= r; i++)
    {
        for (int j = -r; j <= r; j++)
        {
			uchar4 curPix = {0, 0, 0, 0};
			UINT d = (UINT) sqrt((double)i*i + (double)j*j);
			if(d>r) 
				continue;

			if(x + j<0||y + i<0||x + j>w-1||y + i>h-1)
			{
				factor = 0;
			}
			else
			{
				curPix = tex2D(rgbaTex, x + j, y + i);
				factor =space_weight[d] *     //domain factor
                     colorLenGaussian(curPix, center);             //range factor
			}
            

            tw += factor * (float)curPix.w;
			tx += factor * (float)curPix.x;
            ty += factor * (float)curPix.y;
			tz += factor * (float)curPix.z;
			sum += factor;
        }
    }
	t.w = (UCHAR)(tw/sum);
	t.x = (UCHAR)(tx/sum);
	t.y = (UCHAR)(ty/sum);
	t.z = (UCHAR)(tz/sum);
    od[y * w + x] = (((UINT)t.w)<<24|((UINT)t.z)<<16|((UINT)t.y)<<8|((UINT)t.x));
}

extern "C"
void updateGaussian(float sigma_color,float sigma_space, int radius)
{
	if( sigma_color <= 0 )  
        sigma_color = 1;  
    if( sigma_space <= 0 )  
        sigma_space = 1;  
	double gauss_color_coeff = -0.5/(sigma_color*sigma_color);  
    double gauss_space_coeff = -0.5/(sigma_space*sigma_space);  

	float color_gaussian[4*256];
	float space_gaussian[1024];

	for(int i=0;i<256*4;i++)
	{	
		color_gaussian[i] = (float)std::exp(i*i*gauss_color_coeff);
		space_gaussian[i] = (float)std::exp(i*i*gauss_space_coeff);
		//if(i>100) color_gaussian[i] = 0.0f; //漫畫效果
	}
// 	for(int i = -radius,int maxk=0;i<radius;i++)
// 		for(int j=-radius;j<radius;j++)
// 		{
// 			double r = sqrt((double)i*i + (double)j*j);
// 			 if( r > radius )
//                 continue;  
// 			space_gaussian[maxk++] = (float)std::exp(r*r*gauss_space_coeff); 
// 			//space_ofs[maxk++] = (int)(i*temp.step + j*4);  
// 		}

    checkCudaErrors(cudaMemcpyToSymbol(color_weight, color_gaussian, sizeof(float)*(4*256)));
	checkCudaErrors(cudaMemcpyToSymbol(space_weight, space_gaussian, sizeof(float)*(1024)));
}

extern "C"
void initTexture(int width, int height, uint *hImage)
{
	// copy image data to array
	checkCudaErrors(cudaMallocPitch(&dImage, &pitch, sizeof(UINT)*width, height));
	checkCudaErrors(cudaMallocPitch(&dTemp,  &pitch, sizeof(UINT)*width, height));
	checkCudaErrors(cudaMemcpy2D(dImage, pitch, hImage, sizeof(UINT)*width,
		sizeof(UINT)*width, height, cudaMemcpyHostToDevice));
}

extern "C"
void freeTextures()
{
    checkCudaErrors(cudaFree(dImage));
    checkCudaErrors(cudaFree(dTemp));
}

// RGBA version
extern "C"
double bilateralFilterRGBA(uint *dDest,
                           int width, int height,
                           int radius, int iterations,
                           StopWatchInterface *timer)
{
    // var for kernel computation timing
    double dKernelTime;

    // Bind the array to the texture
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<uchar4>();
    checkCudaErrors(cudaBindTexture2D(0, rgbaTex, dImage, desc, width, height, pitch));

    for (int i=0; i<iterations; i++)
    {
        // sync host and start kernel computation timer
        dKernelTime = 0.0;
        checkCudaErrors(cudaDeviceSynchronize());
        sdkResetTimer(&timer);
		
		dim3 blockSize(16, 16);
        dim3 gridSize((width + 16 - 1) / 16, (height + 16 - 1) / 16);
        
        d_bilateral_filter<<< gridSize, blockSize>>>(dDest, width, height, radius);

        // sync host and stop computation timer
        checkCudaErrors(cudaDeviceSynchronize());
        dKernelTime += sdkGetTimerValue(&timer);

    }

    return ((dKernelTime/1000.)/(double)iterations);
}

CUDA運作結果：

pace_sigma 和 color_sigma正向影響平滑力度，取值越大，平滑越明顯。
- pace_sigma = 10 ；color_sigma=10；

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

- pace_sigma = 150 ；color_sigma=150；

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

漫畫效果
- 在進行雙邊濾波時，發現有時會出現類似漫畫的效果，物體的邊緣有黑邊

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

漫畫效果原理：

這是由于當顔色差距過大時，值域核為0，

顔色和空間高斯值差距過大時，定義域核和值域核為0（定義域核一般不會為0）

修改代碼實作漫畫效果：

計算高斯核數組時，void updateGaussian( ) 加入：if(i>100) color_gaussian[i] = 0.0f; //漫畫效果
計算顔色距離時，colorLenGaussian(uchar4 a, uchar4 b)，去掉sqrt

雙邊濾波CUDA優化——BilateralFilter CUDA =======雙邊濾波概述==============雙邊濾波優化（CUDA）=======

=======雙邊濾波概述=======

=======雙邊濾波優化（CUDA）=======

繼續閱讀

如何使用前景和背景建立更清晰的照片

在Ubuntu16.04上提取相鄰序列圖像之間的ORB的特征點，并用暴力方法找到比對點并連線一、什麼是ORB特征二、什麼是暴力比對三、實作代碼四、運作方法

Matlab中将二維灰階圖像三維顯示

車道線檢測

自監督｜「CoCLR」視訊自監督對比學習筆記

linxu下CUDA靜态庫-上

視訊對象分割（Video Object Segmentation）研究小記任務定義與資料集技術路線分類基于神經網絡的模型總結

《cuda并行程式設計》勘誤（3）

《cuda并行程式設計》勘誤（2）

opencv視覺跟蹤——消除背景模組化

圖形處理單元(GPU)的演進

2021-09-30三維點雲測量正方形包裹體積

DOG算子

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory（完美解決）

一種解決思路： ImportError: libcublas.so.10.0: cannot open shared object file: No such file

cs231n斯坦福基于卷積神經網絡的CV學習筆記（一）KNN和線性分類器/分類器損失/反向傳播一，KNN圖像分類算法二，線性分類器三，線性分類器損失四，反向傳播五，神經網絡