FFT-Accelerated Covariance of Two Images
Table of contents
- FFT-Accelerated Covariance of Two Images
  - 1. Fourier transform and convolution
    - 1.1 The convolution theorem
    - 1.2 Complexity: spatial convolution vs. frequency-domain product
  - 2. DFT in OpenCV
  - 3. FFT for NCC
  - 4. Test results
  - Selected code
1. Fourier transform and convolution
1.1 The convolution theorem

[Animated figure: sliding-window convolution in the spatial domain. Image source]
Spatial-domain convolution is exactly the process shown in the animated figure above: multiply the overlapping parts of the two images element-wise and sum them, then slide one image to every position where the two overlap; once every position has been visited, you have the convolution result.
According to the convolution theorem, the convolution of two functions corresponds to the product of their Fourier transforms in the frequency domain:
$$\mathcal{F}\{f(x) * g(x)\} = F(\omega)\cdot G(\omega)$$
In other words, if we transform the two images into the frequency domain, the convolution above becomes a point-wise product of the two frequency-domain images (which have the same size). With this approach most of the time is spent in the forward and inverse transforms: the point-wise product is only quadratic in the image size, while the transforms are of higher order, and for that step there is the well-established Fast Fourier Transform (FFT) to speed things up.
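To make the theorem concrete, here is a minimal 1-D sketch of my own (not part of the original post's code): it convolves two short signals once directly and once via cv::dft / cv::mulSpectrums, and the two results should agree up to floating-point rounding.

#include <opencv2/core.hpp>
#include <cstdio>

int main()
{
    float f_data[] = { 1, 2, 3, 4 };   // signal f
    float g_data[] = { 1, -1, 2 };     // signal g
    const int full = 4 + 3 - 1;        // length of the full linear convolution

    // zero-pad both signals to the full convolution length
    cv::Mat F = cv::Mat::zeros(1, full, CV_32F);
    cv::Mat G = cv::Mat::zeros(1, full, CV_32F);
    cv::Mat(1, 4, CV_32F, f_data).copyTo(F(cv::Rect(0, 0, 4, 1)));
    cv::Mat(1, 3, CV_32F, g_data).copyTo(G(cv::Rect(0, 0, 3, 1)));

    // frequency domain: DFT -> point-wise spectrum product -> inverse DFT
    cv::dft(F, F);
    cv::dft(G, G);
    cv::Mat prod;
    cv::mulSpectrums(F, G, prod, 0);   // conjB = false => convolution
    cv::dft(prod, prod, cv::DFT_INVERSE | cv::DFT_SCALE);

    // direct spatial convolution for comparison
    float direct[full] = { 0 };
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 3; j++)
            direct[i + j] += f_data[i] * g_data[j];

    for (int k = 0; k < full; k++)
        printf("k=%d fft=%.3f direct=%.3f\n", k, prod.at<float>(0, k), direct[k]);
    return 0;
}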
1.2 Complexity: spatial convolution vs. frequency-domain product
Suppose the two images are $A_{N\times N}$ and $B_{M\times M}$ with $M\leq N$. Direct convolution in the spatial domain then takes $O(N^2 M^2)$ time, while the FFT takes $O(N^2\log N)$.
Note that this complexity is not a mistake. One might expect $O(N^2\log N^2)$ for the 2-D transform, but big-O notation drops constant factors and keeps only the highest-order term: $O(N^2\log N^2)=O(2N^2\log N)=O(N^2\log N)$.
| Complexity | n = 10 | n = 20 |
|---|---|---|
| $\log n$ | 1 | 2.996 |
| $n$ | 10 | 20 |
| $n\log n$ | 10 | 59.9 |
| $n^2$ | 100 | 400 |
| $2^n$ | 1024 | 1048576 |
| $n!$ | 3628800 | 2,432,902,008,176,640,000 |
So the two approaches differ essentially as $n^2$ versus $n\log n$; when both images to be convolved are fairly large, the FFT route gives a very substantial speedup.
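As a rough back-of-the-envelope estimate (my own numbers, ignoring constant factors), take one of the test cases from section 4: a $1000\times1000$ source and a $100\times100$ template.
$$\text{direct: } (1000-100+1)^2\cdot 100^2 \approx 8\times10^{9} \qquad\text{vs.}\qquad \text{FFT: } 3\cdot 1100^2\log_2 1100 + 1100^2 \approx 4\times10^{7}$$
The FFT estimate counts three 2-D FFTs on the zero-padded size of roughly $1100\times1100$ plus one point-wise spectrum product, i.e. roughly a 200x reduction in raw arithmetic.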
2. DFT in OpenCV
OpenCV dft official documentation
void cv::dft ( InputArray src,
OutputArray dst,
int flags = 0,
int nonzeroRows = 0
)
- src input array that could be real or complex.
- dst output array whose size and type depends on the flags .
- flags transformation flags, representing a combination of the DftFlags
- nonzeroRows when the parameter is not zero, the function assumes that only the first nonzeroRows rows of the input array (DFT_INVERSE is not set) or only the first nonzeroRows of the output array (DFT_INVERSE is set) contain non-zeros, thus, the function can handle the rest of the rows more efficiently and save some time; this technique is very useful for calculating array cross-correlation or convolution using DFT.
Following the sample on the documentation page and the example in Learning OpenCV 3, the FFT-based convolution is implemented as follows (strictly speaking it computes a cross-correlation, since mulSpectrums is called with conjB=true, which is exactly what NCC needs):
/**
 * @brief FFT-based convolution (cross-correlation) of two images. src: H*W, kernel: h*w.
 * Think of the kernel sliding over src; let Sxy be the sub-image of src that the
 * kernel currently overlaps, then
 * conv(x,y) = sum_i sum_j Sxy(i,j)*kernel(i,j), i in [0,h), j in [0,w)
 * output: (H-h+1)*(W-w+1)
 *
 * @param src : CV_8UC1 image
 * @param kernel : CV_8UC1 image
 * @param output : CV_32FC1 image
 * @return int
 */
int convFFT(const cv::Mat &src, const cv::Mat &kernel, cv::Mat& output)
{
    if (src.empty() || kernel.empty())
    {
        MYCV_ERROR(kImageEmpty, "input is empty");
        return kImageEmpty;
    }
    // pad both images to a DFT-friendly size large enough for the full linear convolution
    int dft_h = cv::getOptimalDFTSize(src.rows + kernel.rows - 1);
    int dft_w = cv::getOptimalDFTSize(src.cols + kernel.cols - 1);
    cv::Mat dft_src = cv::Mat::zeros(dft_h, dft_w, CV_32F);
    cv::Mat dft_kernel = cv::Mat::zeros(dft_h, dft_w, CV_32F);
    cv::Mat dft_src_part = dft_src(cv::Rect(0, 0, src.cols, src.rows));
    cv::Mat dft_kernel_part = dft_kernel(cv::Rect(0, 0, kernel.cols, kernel.rows));
    // copy src and kernel into the top-left corners of dft_src and dft_kernel
    src.convertTo(dft_src_part, dft_src_part.type());
    kernel.convertTo(dft_kernel_part, dft_kernel_part.type());
    cv::dft(dft_src, dft_src, 0, dft_src.rows);
    cv::dft(dft_kernel, dft_kernel, 0, dft_kernel.rows);
    // conjB=true => correlation (no kernel flip), which is what NCC needs
    cv::mulSpectrums(dft_src, dft_kernel, dft_src, 0, true);
    int output_h = abs(src.rows - kernel.rows) + 1;
    int output_w = abs(src.cols - kernel.cols) + 1;
    // only the first output_h rows of the inverse transform are needed
    cv::dft(dft_src, dft_src, cv::DFT_INVERSE + cv::DFT_SCALE, output_h);
    cv::Mat corr = dft_src(cv::Rect(0, 0, output_w, output_h));
    output = corr;
    return kSuccess;
}
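A quick way to sanity-check the function is to compare one output value against the direct product-sum it is supposed to equal. The helper below is my own illustrative sketch (the name check_convFFT is made up); it assumes convFFT above is visible in the same translation unit, and the two printed values should agree up to float rounding.

void check_convFFT()
{
    cv::Mat src(64, 64, CV_8UC1), ker(16, 16, CV_8UC1), out;
    cv::randu(src, cv::Scalar(0), cv::Scalar(255));
    cv::randu(ker, cv::Scalar(0), cv::Scalar(255));
    convFFT(src, ker, out);   // out is (64-16+1) x (64-16+1), CV_32FC1

    // pick one arbitrary output position and recompute it directly
    int x = 10, y = 20;       // x = column, y = row
    double direct = 0;
    for (int i = 0; i < ker.rows; i++)
        for (int j = 0; j < ker.cols; j++)
            direct += src.at<uchar>(y + i, x + j) * ker.at<uchar>(i, j);
    printf("fft = %f , direct = %f\n", out.at<float>(y, x), direct);
}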
3. FFT for NCC
NCC background (see the earlier post)
Let $T_{m\times n}$ be the target (template) image, $S_{M\times N}$ the source (search) image, $S_{x,y}$ the sub-image of $S$ with top-left corner $(x,y)$ and the same size as $T$, and $R_{(M-m+1)\times (N-n+1)}$ the matching result map. Then
$$R(x,y)=\frac{cov(S_{x,y},T)}{\sigma(S_{x,y})\,\sigma(T)}$$
where
$$\begin{aligned} cov(S_{x,y},T) &= E(S_{x,y}T)-E(S_{x,y})E(T)\\ &= \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}S_{x,y}(i,j)\,T(i,j)}{mn} - \bar{S}_{x,y}\,\bar{T} \end{aligned}$$
The mean of each block $S_{x,y}$ is already accelerated with an integral image, which raised my NCC from roughly 1/800 of OpenCV's speed to about 1/300. What the covariance computation still needs at every position is the element-wise product-sum of the template with $S_{x,y}$:
$$\sum_{i=1}^{m}\sum_{j=1}^{n}S_{x,y}(i,j)\,T(i,j)$$
Viewed over the whole image, computing this sum for every $(x,y)$ is exactly a correlation (a convolution without flipping the kernel), and convolution is a natural fit for FFT acceleration. So we compute the correlation of the template with the source once, store it in a matrix, and the sum above becomes a single lookup at position $(x,y)$.
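Putting the pieces together (a restatement of the formulas above, with the sum replaced by a lookup into the FFT result): if $conv(x,y)$ denotes the correlation map returned by convFFT, then
$$cov(S_{x,y},T)=\frac{conv(x,y)}{mn}-\bar{S}_{x,y}\,\bar{T}$$
where $\bar{S}_{x,y}$ comes from the integral image and $\bar{T}$ is computed once, so each result pixel costs only a constant number of arithmetic operations.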
4. Test results
source image size w,h = (500,500)
target image size w,h = (100,100)
my NCC run 10 times,average use 12.000000 ms
min_value=-0.045359 , min_loc(x,y)=(316,121), max_value=0.044341,max_loc(x,y)=(213,264)
opencv NCC run 10 times,average use 4.000000 ms
min_value=-0.045360 , min_loc(x,y)=(316,121), max_value=0.044340,max_loc(x,y)=(213,264)
source image size w,h = (600,600)
target image size w,h = (100,100)
my NCC run 10 times,average use 16.000000 ms
min_value=-0.046514 , min_loc(x,y)=(186,308), max_value=0.044515,max_loc(x,y)=(444,178)
opencv NCC run 10 times,average use 7.000000 ms
min_value=-0.046515 , min_loc(x,y)=(186,308), max_value=0.044514,max_loc(x,y)=(444,178)
source image size w,h = (700,700)
target image size w,h = (100,100)
my NCC run 10 times,average use 24.000000 ms
min_value=-0.044219 , min_loc(x,y)=(428,192), max_value=0.042701,max_loc(x,y)=(347,262)
opencv NCC run 10 times,average use 8.000000 ms
min_value=-0.044219 , min_loc(x,y)=(428,192), max_value=0.042701,max_loc(x,y)=(347,262)
source image size w,h = (800,800)
target image size w,h = (100,100)
my NCC run 10 times,average use 30.000000 ms
min_value=-0.045473 , min_loc(x,y)=(452,292), max_value=0.044728,max_loc(x,y)=(298,175)
opencv NCC run 10 times,average use 10.000000 ms
min_value=-0.045473 , min_loc(x,y)=(452,292), max_value=0.044728,max_loc(x,y)=(298,175)
source image size w,h = (900,900)
target image size w,h = (100,100)
my NCC run 10 times,average use 39.000000 ms
min_value=-0.046390 , min_loc(x,y)=(716,347), max_value=0.043345,max_loc(x,y)=(591,252)
opencv NCC run 10 times,average use 13.000000 ms
min_value=-0.046390 , min_loc(x,y)=(716,347), max_value=0.043345,max_loc(x,y)=(591,252)
source image size w,h = (1000,1000)
target image size w,h = (100,100)
my NCC run 10 times,average use 52.000000 ms
min_value=-0.046292 , min_loc(x,y)=(727,726), max_value=0.048027,max_loc(x,y)=(664,237)
opencv NCC run 10 times,average use 18.000000 ms
min_value=-0.046293 , min_loc(x,y)=(727,726), max_value=0.048028,max_loc(x,y)=(664,237)
source image size w,h = (1100,1100)
target image size w,h = (100,100)
my NCC run 10 times,average use 57.000000 ms
min_value=-0.047824 , min_loc(x,y)=(116,699), max_value=0.049075,max_loc(x,y)=(319,546)
opencv NCC run 10 times,average use 21.000000 ms
min_value=-0.047824 , min_loc(x,y)=(116,699), max_value=0.049075,max_loc(x,y)=(319,546)
source image size w,h = (1200,1200)
target image size w,h = (100,100)
my NCC run 10 times,average use 75.000000 ms
min_value=-0.048051 , min_loc(x,y)=(1095,47), max_value=0.051388,max_loc(x,y)=(95,634)
opencv NCC run 10 times,average use 25.000000 ms
min_value=-0.048050 , min_loc(x,y)=(1095,47), max_value=0.051388,max_loc(x,y)=(95,634)
Press any key to continue . . .
My NCC implementation started out 800+ times slower than OpenCV's, dropped to 300+ times slower with integral images, and with FFT it is now only 2-3 times slower. I did not expect FFT to help this much. FFT is the GOAT!
Selected code
/**
 * @brief Template matching with the normalized cross-correlation (NCC) algorithm.
 * The similarity between the template and the candidate image block is measured
 * by the (Pearson) correlation coefficient:
 * r = cov(X,Y) / (sigma(X) * sigma(Y))
 * where cov(X,Y) is the covariance of the two variables:
 * cov(X,Y) = E[(X-E(X)) * (Y-E(Y))] = E(XY) - E(X)E(Y)
 * sigma(X): standard deviation of X
 * sigma(Y): standard deviation of Y
 *
 * @param source : search image, CV_8UC1
 * @param target : template image, CV_8UC1
 * @param result : matching result map
 * @return int : status code
 */
int NormalizedCrossCorrelationFFT(
    const cv::Mat &source,
    const cv::Mat &target,
    cv::Mat &result
)
{
    if (source.empty() || target.empty())
    {
        MYCV_ERROR(kImageEmpty, "NCC empty input image");
        return kImageEmpty;
    }
    int H = source.rows;
    int W = source.cols;
    int t_h = target.rows;
    int t_w = target.cols;
    if (t_h > H || t_w > W)
    {
        MYCV_ERROR(kBadSize, "NCC source image size should be larger than target image");
        return kBadSize;
    }
    //r = cov(X,Y)/(sigma(X) * sigma(Y))
    //sigma(X) = sqrt(var(X))
    int r_h = H - t_h + 1; //height of the result map
    int r_w = W - t_w + 1; //width of the result map
    cv::Mat integral_image; //integral image of source
    cv::Mat sq_integral;    //integral image of the squared pixels of source
    integral(source, integral_image, sq_integral);
    //cv::integral(source, integral_image, sq_integral, CV_64FC1, CV_64FC1);
    //correlation of the template with source, computed once via FFT
    cv::Mat conv;
    convFFT(source, target, conv);
    const double target_size = t_h * t_w;
    double target_mean = calculateMean(target);
    double target_var = calculateVariance(target, target_mean);
    double target_std_var = std::sqrt(target_var);
    result = cv::Mat::zeros(cv::Size(r_w, r_h), CV_32FC1);
    double region_sum = 0;
    double region_sq_sum = 0;
    for (int row = 0; row < r_h; row++)
    {
        float * p = result.ptr<float>(row);
        float * convp = conv.ptr<float>(row);
        for (int col = 0; col < r_w; col++)
        {
            cv::Rect ROI(col, row, t_w, t_h); //sub-image of source aligned with the template
            cv::Mat temp = source(ROI);       //only needed by the commented-out direct covariance below
            //mean of the corresponding block in source, from the integral image
            getRegionSumFromIntegralImage(integral_image, col, row, col + t_w - 1, row + t_h - 1, region_sum);
            double temp_mean = region_sum / target_size;
            //covariance of the two blocks: cov(X,Y) = E(X*Y) - E(X)*E(Y),
            //with E(X*Y) read from the FFT correlation map
            double cov = (*(convp + col)) / target_size - temp_mean * target_mean;
            //double cov = calculateCovariance(temp, target, temp_mean, target_mean);
            //variance of the corresponding block in source, from the squared integral image
            getRegionSumFromIntegralImage(sq_integral, col, row, col + t_w - 1, row + t_h - 1, region_sq_sum);
            double temp_var = (region_sq_sum - temp_mean * region_sum) / target_size;
            double temp_std_var = std::sqrt(temp_var);
            //the small epsilon avoids division by zero on flat regions
            p[col] = cov / ((temp_std_var + 0.0000001) * (target_std_var + 0.0000001));
        }
    }
    return kSuccess;
}
void test_NCC_speed()
{
    const int TIMES = 10;
    std::string src_path = "data\\source.jfif";
    std::string target_path = "data\\target.jfif";
    std::string log_path = "ncc_speed.txt";
    cv::Mat source, target, result;
    //source = cv::imread(src_path, cv::IMREAD_GRAYSCALE);
    //target = cv::imread(target_path, cv::IMREAD_GRAYSCALE);
    std::chrono::steady_clock::time_point start_time, end_time;
    double myncc_runtime = 0, opencv_runtime = 0;
    auto logger = spdlog::basic_logger_mt("NCC", log_path);
    logger->set_level(spdlog::level::critical);
    // best / worst match locations and scores
    double min_value, max_value;
    cv::Point min_loc, max_loc;
    for (int src_size = 500; src_size <= 1200; src_size += 100)
    {
        source = cv::Mat(cv::Size(src_size, src_size), CV_8UC1);
        target = cv::Mat(cv::Size(100, 100), CV_8UC1);
        cv::randu(source, cv::Scalar(0), cv::Scalar(255));
        cv::randu(target, cv::Scalar(0), cv::Scalar(255));
        logger->info("src_size:(h,w)=({0},{1}), target_size:(h,w)=({2},{3})",
            source.rows, source.cols, target.rows, target.cols);
        // my NCC test
        printf("source image size w,h = (%d,%d) \n", source.cols, source.rows);
        printf("target image size w,h = (%d,%d) \n", target.cols, target.rows);
        //warm up
        //mycv::NormalizedCrossCorrelation(source, target, result);
        mycv::NormalizedCrossCorrelationFFT(source, target, result);
        start_time = std::chrono::steady_clock::now();
        for (int n = 0; n < TIMES; n++)
        {
            //mycv::NormalizedCrossCorrelation(source, target, result);
            mycv::NormalizedCrossCorrelationFFT(source, target, result);
        }
        end_time = std::chrono::steady_clock::now();
        myncc_runtime = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count() / static_cast<double>(TIMES);
        printf("my NCC run %d times,average use %f ms \n", TIMES, myncc_runtime);
        cv::minMaxLoc(result, &min_value, &max_value, &min_loc, &max_loc);
        printf("min_value=%f , min_loc(x,y)=(%d,%d), \t max_value=%f,max_loc(x,y)=(%d,%d)\n",
            min_value, min_loc.x, min_loc.y, max_value, max_loc.x, max_loc.y);
        logger->info("my NCC run {0} times,average use {1} ms \n", TIMES, myncc_runtime);
        logger->info("my NCC min_value = {0}, min_loc(x, y) = ({1}, {2}), \t max_value = {3}, max_loc(x, y) = ({4}, {5})\n",
            min_value, min_loc.x, min_loc.y, max_value, max_loc.x, max_loc.y);
        //warm up
        cv::matchTemplate(source, target, result, cv::TM_CCOEFF_NORMED);
        // opencv NCC test
        start_time = std::chrono::steady_clock::now();
        for (int n = 0; n < TIMES; n++)
        {
            cv::matchTemplate(source, target, result, cv::TM_CCOEFF_NORMED);
        }
        end_time = std::chrono::steady_clock::now();
        opencv_runtime = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time).count() / static_cast<double>(TIMES);
        printf("opencv NCC run %d times,average use %f ms \n", TIMES, opencv_runtime);
        cv::minMaxLoc(result, &min_value, &max_value, &min_loc, &max_loc);
        printf("min_value=%f , min_loc(x,y)=(%d,%d), \t max_value=%f,max_loc(x,y)=(%d,%d)\n",
            min_value, min_loc.x, min_loc.y, max_value, max_loc.x, max_loc.y);
        logger->info("opencv NCC run {0} times,average use {1} ms \n", TIMES, opencv_runtime);
        logger->info("opencv NCC min_value = {0}, min_loc(x, y) = ({1}, {2}), \t max_value = {3}, max_loc(x, y) = ({4}, {5})\n",
            min_value, min_loc.x, min_loc.y, max_value, max_loc.x, max_loc.y);
        logger->info("speed : myncc_runtime / opencv_runtime = {}", (int)(myncc_runtime / opencv_runtime));
    }
}