Basics of Accelerating Halcon Algorithms (2) (Multi-core Parallelism / GPU)

IV. GPU

1. Using the GPU in Halcon can noticeably speed up the operators that support it.

To find the model of your graphics card on Windows, open the Start menu, choose Run, type dxdiag, and switch to the Display tab.

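The official example compute_devices.hdev lists the available compute devices. In this example, and in any timing test, first switch off HDevelop's automatic display updates with dev_update_off(). Otherwise, updating the graphics and variable windows slows the program down and distorts the measured speed-up.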

From the official example compute_devices.hdev:

* This example shows how to use compute devices with HALCON.
* 
dev_update_off ()
dev_close_window ()
dev_open_window_fit_size (0, 0, 640, 480, -1, -1, WindowHandle)
set_display_font (WindowHandle, 16, 'mono', 'true', 'false')
* 
* Get list of all available compute devices.
query_available_compute_devices (DeviceIdentifier)
* 
* End example if no device could be found.
if (|DeviceIdentifier| == 0)
    return ()
endif
* 
* Display basic information on detected devices.
disp_message (WindowHandle, 'Found ' + |DeviceIdentifier| + ' Compute Device(s):', 'window', 12, 12, 'black', 'true')
for Index := 0 to |DeviceIdentifier| - 1 by 1
    get_compute_device_info (DeviceIdentifier[Index], 'name', DeviceName)
    get_compute_device_info (DeviceIdentifier[Index], 'vendor', DeviceVendor)
    Message[Index] := 'Device #' + Index + ': ' + DeviceVendor + ' ' + DeviceName
endfor
disp_message (WindowHandle, Message, 'window', 42, 12, 'white', 'false')
disp_continue_message (WindowHandle, 'black', 'true')
stop ()

2. Operators for working with GPU (compute) devices:

query_available_compute_devices

get_compute_device_info

open_compute_device

init_compute_device

activate_compute_device

deactivate_compute_device
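These operators are used in a fixed order: query and inspect the available devices, open one, optionally initialize it for specific operators with init_compute_device, activate it, run the supported operators, and finally deactivate it. The Python sketch below models only that call order and one rule: an operator uses the device only while the device is active, and only if the operator is GPU-capable. The class and method names are hypothetical, and nothing here calls HALCON.

```python
class ComputeDeviceSession:
    """Hypothetical model of the compute-device call order; it does not call HALCON."""

    def __init__(self, supported_ops):
        self.supported_ops = set(supported_ops)
        self.state = "closed"  # closed -> open -> active -> open -> ...

    def _require(self, *allowed):
        if self.state not in allowed:
            raise RuntimeError(f"call not allowed in state {self.state!r}")

    def open(self):  # corresponds to open_compute_device
        self._require("closed")
        self.state = "open"

    def init(self):  # corresponds to init_compute_device (optional, called before activation here)
        self._require("open")

    def activate(self):  # corresponds to activate_compute_device
        self._require("open")
        self.state = "active"

    def deactivate(self):  # corresponds to deactivate_compute_device
        self._require("active")
        self.state = "open"

    def runs_on(self, operator):
        # Only supported operators use the device, and only while it is active.
        if self.state == "active" and operator in self.supported_ops:
            return "device"
        return "cpu"


session = ComputeDeviceSession({"edges_sub_pix", "rotate_image"})
session.open()
session.init()
session.activate()
print(session.runs_on("edges_sub_pix"))  # device
print(session.runs_on("threshold"))      # cpu
session.deactivate()
print(session.runs_on("edges_sub_pix"))  # cpu
```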

3. The official example get_operator_info.hdev shows which operators support GPU acceleration (OpenCL):

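* Determine all operators that support OpenCL
get_opencl_operators (OpenCLSupport)
* get_opencl_operators is a local procedure of the example. Expanded, it is built on get_operator_info:
get_operator_name ('', OperatorNames)
OpenCLSupport := []
for Index := 0 to |OperatorNames| - 1 by 1
    get_operator_info (OperatorNames[Index], 'compute_device', Information)
    * Keep the operator if its 'compute_device' slot reports OpenCL support
    if (find(Information, 'opencl') != -1)
        OpenCLSupport := [OpenCLSupport, OperatorNames[Index]]
    endif
endfor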

For example, in Halcon 19.11 there are 82 operators that support GPU acceleration:

['abs_diff_image', 'abs_image', 'acos_image', 'add_image', 'affine_trans_image', 'affine_trans_image_size', 'area_center_gray', 'asin_image', 'atan2_image', 'atan_image', 'binocular_disparity_ms', 'binocular_distance_ms', 'binomial_filter', 'cfa_to_rgb', 'change_radial_distortion_image', 'convert_image_type', 'convol_image', 'cos_image', 'crop_domain', 'crop_part', 'crop_rectangle1', 'depth_from_focus', 'derivate_gauss', 'deviation_image', 'div_image', 'edges_image', 'edges_sub_pix', 'exp_image', 'find_ncc_model', 'find_ncc_models', 'gamma_image', 'gauss_filter', 'gauss_image', 'gray_closing_rect', 'gray_closing_shape', 'gray_dilation_rect', 'gray_dilation_shape', 'gray_erosion_rect', 'gray_erosion_shape', 'gray_histo', 'gray_opening_rect', 'gray_opening_shape', 'gray_projections', 'gray_range_rect', 'highpass_image', 'image_to_world_plane', 'invert_image', 'linear_trans_color', 'lines_gauss', 'log_image', 'lut_trans', 'map_image', 'max_image', 'mean_image', 'median_image', 'median_rect', 'min_image', 'mirror_image', 'mult_image', 'points_harris', 'polar_trans_image', 'polar_trans_image_ext', 'polar_trans_image_inv', 'pow_image', 'principal_comp', 'projective_trans_image', 'projective_trans_image_size', 'rgb1_to_gray', 'rgb3_to_gray', 'rotate_image', 'scale_image', 'sin_image', 'sobel_amp', 'sobel_dir', 'sqrt_image', 'sub_image', 'tan_image', 'texture_laws', 'trans_from_rgb', 'trans_to_rgb', 'zoom_image_factor', 'zoom_image_size']
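An operator outside this list always runs on the CPU. A pipeline that alternates between listed and unlisted operators therefore generally has to move data between host and device at each switch. As a rough planning aid, the Python sketch below splits a pipeline into GPU and CPU segments using a subset of the 19.11 list above. It treats the pipeline as a linear chain and assumes, as a simplification, that each CPU/GPU boundary costs one transfer. The function names are hypothetical. threshold and connection are ordinary HALCON operators that are not in the list.

```python
# Subset of the HALCON 19.11 OpenCL operator list above (not all 82 entries).
OPENCL_OPS_19_11 = {
    "rgb1_to_gray", "gauss_filter", "mean_image", "median_image",
    "sobel_amp", "edges_sub_pix", "rotate_image", "zoom_image_factor",
    "affine_trans_image", "scale_image", "sub_image",
}


def split_pipeline(pipeline, gpu_ops=OPENCL_OPS_19_11):
    """Group consecutive operators into ("gpu" | "cpu", [operators]) segments."""
    segments = []
    for op in pipeline:
        where = "gpu" if op in gpu_ops else "cpu"
        if segments and segments[-1][0] == where:
            segments[-1][1].append(op)
        else:
            segments.append((where, [op]))
    return segments


def count_boundaries(pipeline, gpu_ops=OPENCL_OPS_19_11):
    """Number of CPU/GPU switches, each assumed to need one host/device copy."""
    return max(len(split_pipeline(pipeline, gpu_ops)) - 1, 0)


pipeline = ["rgb1_to_gray", "gauss_filter", "threshold", "connection", "edges_sub_pix"]
for where, ops in split_pipeline(pipeline):
    print(where, ops)
print("boundaries:", count_boundaries(pipeline))  # boundaries: 2
```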

4. Official manual

C:\Program Files\MVTec\HALCON-19.11-Progress\doc\pdf\reference\reference_hdevelop.pdf

Chapter 25 System --- 25.1 Compute Devices

V. Example Benchmarks

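* See the official examples optimize_aop.hdev, query_aop_info.hdev and simulate_aop.hdev
* Example: performance test of the edges_sub_pix operator
* Switch off display updates so that they do not distort the timings
dev_update_off ()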
dev_close_window ()
dev_open_window_fit_size (0, 0, 640, 480, -1, -1, WindowHandle)
set_display_font (WindowHandle, 16, 'mono', 'true', 'false')
get_system ('processor_num', NumCPUs)
get_system ('parallelize_operators', AOP)
* Read the image
read_image(Image, 'D:/hellowprld/2/1-.jpg')
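* Convert a color image to a gray-value image
count_channels (Image, Channels)
if (Channels == 3 or Channels == 4)
    rgb1_to_gray (Image, ImageGray)
else
    * Already a single-channel image: use it as is
    copy_obj (Image, ImageGray, 1, 1)
endif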
alpha:=5
low:=10
high:=20
   
* Test 1: AOP disabled, i.e. no automatic parallel processing
set_system ('parallelize_operators', 'false')
get_system ('parallelize_operators', AOP)
count_seconds(T0)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds(T1)
Time0:=(T1-T0)*1000
stop()
* Test 2: automatic operator parallelization (AOP)
* AOP is enabled by default in Halcon, i.e. parallelize_operators is 'true'
set_system ('parallelize_operators', 'true')
count_seconds(T1)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds(T2)
Time1:=(T2-T1)*1000
stop()
* Test 3: GPU acceleration. In Halcon 19.11, 82 operators support GPU acceleration.
* GPU acceleration first copies the data from the CPU to the GPU, processes it there,
* and then copies the result back to the CPU. These transfers take time.
* Is GPU acceleration always faster than normal AOP? Not necessarily: it depends on the graphics card.
query_available_compute_devices (DeviceIdentifiers)
DeviceOpened := false
for i := 0 to |DeviceIdentifiers| - 1 by 1
    get_compute_device_info (DeviceIdentifiers[i], 'name', DeviceName)
    * Open the GPU by its name (replace it with the name of your own card)
    if (DeviceName == 'GeForce GT 630')
        open_compute_device (DeviceIdentifiers[i], DeviceHandle)
        DeviceOpened := true
        break
    endif
endfor
if (DeviceOpened)
    set_compute_device_param (DeviceHandle, 'asynchronous_execution', 'false')
    init_compute_device (DeviceHandle, 'edges_sub_pix')
    activate_compute_device (DeviceHandle)
    * Query the cache settings of the graphics card
    * buffer_cache_capacity defaults to 1/3 of the graphics card memory
    get_compute_device_param (DeviceHandle, 'buffer_cache_capacity', GenParamValue0)
    get_compute_device_param (DeviceHandle, 'buffer_cache_used', GenParamValue1)
    get_compute_device_param (DeviceHandle, 'image_cache_capacity', GenParamValue2)
    get_compute_device_param (DeviceHandle, 'image_cache_used', GenParamValue3)
    * GenParamValue0 := GenParamValue0 / 3
    * set_compute_device_param (DeviceHandle, 'buffer_cache_capacity', GenParamValue0)
    * get_compute_device_param (DeviceHandle, 'buffer_cache_capacity', GenParamValue4)
endif
* If no matching GPU was opened, the following call runs on the CPU
count_seconds (T3)
* If the graphics card memory is insufficient, this fails with error #4104: Out of compute device memory
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds (T4)
Time2 := (T4 - T3) * 1000
if (DeviceOpened)
    deactivate_compute_device (DeviceHandle)
endif
stop ()
* Test 4: manual AOP optimization
set_system ('parallelize_operators', 'true')
get_system ('parallelize_operators', AOP)
* 4.1 Optimize the number of threads with the 'threshold' model
optimize_aop ('edges_sub_pix', 'byte', 'no_file', ['file_mode','model','parameters'], ['nil','threshold','false'])
count_seconds(T5)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds(T6)
Time3:=(T6-T5)*1000
* 4.2 Optimize the number of threads with the 'linear' model
optimize_aop ('edges_sub_pix', 'byte', 'no_file', ['file_mode','model','parameters'], ['nil','linear','false'])
count_seconds(T7)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds(T8)
Time4:=(T8-T7)*1000
stop()
* 4.3 Optimize the number of threads with the 'mlp' model
optimize_aop ('edges_sub_pix', 'byte', 'no_file', ['file_mode','model','parameters'], ['nil','mlp','false'])
count_seconds(T9)
edges_sub_pix (ImageGray, Edges1, 'canny', alpha, low, high)
count_seconds(T10)
Time5:=(T10-T9)*1000
stop()
dev_clear_window()
Message := 'edges_sub_pix runtimes:'
Message[1] := 'CPU only Time0 without AOP='+Time0+'ms,'
Message[2] := 'CPU only Time1 with AOP='+Time1+'ms,'
Message[3] := 'GPU use Time2='+Time2+'ms,'
Message[4] := 'optimize Time3 threshold='+Time3+'ms'
Message[5] := 'optimize Time4 linear='+Time4+'ms'
Message[6] := 'optimize Time5 mlp='+Time5+'ms'
disp_message (WindowHandle, Message, 'window', 12, 12, 'red', 'false')
stop()
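The script above times each variant with a single call. One-time costs and timer noise therefore all end up in one number. One such cost is the first use of a device, which init_compute_device is meant to move out of the measured call. A more robust pattern is to time the first call separately and then report the median of several repeated calls. The sketch below shows this pattern in Python with time.perf_counter. benchmark_ms and the dummy workload are hypothetical stand-ins for an operator call such as edges_sub_pix. In HDevelop, the same idea is a for loop around count_seconds.

```python
import statistics
import time


def benchmark_ms(func, *args, repeat=10):
    """Return (first_call_ms, median_ms) for func(*args).

    The first call is timed on its own because it may include one-time setup;
    the median of the following `repeat` calls is less sensitive to outliers.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    start = time.perf_counter()
    func(*args)
    first_ms = (time.perf_counter() - start) * 1000.0  # like (T1 - T0) * 1000

    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return first_ms, statistics.median(samples)


def make_workload():
    """Hypothetical workload: slow on the first call (setup), fast afterwards."""
    state = {"initialized": False}

    def run(n):
        if not state["initialized"]:
            time.sleep(0.05)  # simulated one-time setup cost
            state["initialized"] = True
        return sum(i * i for i in range(n))

    return run


first, median = benchmark_ms(make_workload(), 10_000, repeat=5)
print(f"first call: {first:.1f} ms, median of repeats: {median:.1f} ms")
```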

Benchmark results for the edges_sub_pix operator:

Benchmark results for the rotate_image operator:

Conclusions:

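1. GPU acceleration first copies the data from the CPU to the GPU, processes it there, and then copies the results back to the CPU. These transfers in both directions take time.

2. Is GPU acceleration always faster than ordinary AOP on the CPU? Not necessarily. The result depends on how capable the graphics card is.

3. If the graphics card does not have enough memory, GPU acceleration fails with error #4104: Out of compute device memory.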
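These conclusions can be made concrete with a back-of-the-envelope model. The GPU path costs upload time plus kernel time plus download time, and it only pays off when that sum is below the CPU time. Separately, the data must fit into the device cache; according to the comment in the script, buffer_cache_capacity defaults to one third of the card's memory. The Python sketch below uses made-up figures: the 4 GB/s transfer bandwidth, the kernel and CPU times, the output size, and the 2 GiB card are all assumptions for illustration, not measurements from the tests above.

```python
def transfer_ms(num_bytes, bandwidth_gb_s):
    """Host<->device copy time in ms, with 1 GB taken as 1e9 bytes."""
    return num_bytes / (bandwidth_gb_s * 1e9) * 1000.0


def gpu_total_ms(bytes_in, bytes_out, kernel_ms, bandwidth_gb_s):
    """Upload + processing on the device + download."""
    return (transfer_ms(bytes_in, bandwidth_gb_s)
            + kernel_ms
            + transfer_ms(bytes_out, bandwidth_gb_s))


def fits_in_buffer_cache(num_bytes, device_memory_bytes, fraction=1 / 3):
    """True if num_bytes fits into the assumed default cache (1/3 of device memory)."""
    return num_bytes <= device_memory_bytes * fraction


image_bytes = 2048 * 2048  # one 8-bit (byte) gray image; output assumed to be the same size
gpu_ms = gpu_total_ms(image_bytes, image_bytes, kernel_ms=2.0, bandwidth_gb_s=4.0)
print(f"GPU path: {gpu_ms:.3f} ms")               # GPU path: 4.097 ms
print("faster than 8 ms on CPU:", gpu_ms < 8.0)    # True
print("faster than 3 ms on CPU:", gpu_ms < 3.0)    # False
print("fits:", fits_in_buffer_cache(image_bytes, 2 * 1024**3))  # True
```

With the same card and image, the GPU path wins against an 8 ms CPU run but loses against a 3 ms one. The transfer time alone does not change, so a faster CPU path (for example with AOP enabled) can make the GPU path the slower choice.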

The complete *.hdev project can be downloaded here:

https://download.csdn.net/download/libaineu2004/12146529
