Optimizations in compilers
Comparison of different compilers
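The table below compares how well different compilers optimize.
It must be emphasized that a compiler may behave differently on other test examples, so the table should be taken only as a guide. An x means the compiler performed the optimization in the test code; a - means it did not.
Optimization method | Gnu | Clang | Microsoft | Intel |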
---|---|---|---|---|
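General optimizations | | | | |
Function inlining | x | x | x | x |
Constant folding | x | x | x | x |
Constant propagation | x | x | x | x |
Constant propagation through loops | x | x | - | - |
Pointer elimination | x | x | x | x |
Common subexpression elimination | x | x | x | x |
Register variables | x | x | x | x |
Fused multiply and add | x | x | x | x |
Live range analysis | x | x | x | x |
Join identical branches | x | x | x | x |
Eliminate jumps | x | x | x | x |
Tail calls | x | x | x | x |
Remove branches that are always false | x | x | x | x |
Loop unrolling, array loops | x | x | x | x |
Loop unrolling, structures | x | x | x | - |
Loop-invariant code motion | x | x | x | x |
Induction variables for array elements | x | x | x | x |
Induction variables for integer expressions | - | x | 1 | x |
Induction variables for floating-point expressions | - | - | - | x |
Multiple accumulators, integer | - | x | x | - |
Multiple accumulators, floating point | - | x | x | - |
Devirtualization | x | x | x | x |
Profile-guided optimization | x | x | x | x |
Whole-program optimization | x | x | x | x |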
Integer algebraic simplifications | | | | |
`a+b = b+a`, `a*b = b*a` (commutative) | x | x | x | x |
`(a+b)+c = a+(b+c)`, `(a*b)*c = a*(b*c)` (associative) | - | x | x | - |
`a*b + a*c = a*(b+c)` (distributive) | x | x | x | x |
`a+b+c+d = (a+b)+(c+d)` (improves parallelism) | - | - | - | x |
`a*b*c*d = (a*b)*(c*d)` (improves parallelism) | - | x | - | x |
`x*x*x*x*x*x*x*x` = ((x²)²)² | x | x | - | x |
`a+a+a+a = a*4` | x | x | x | x |
`a*x*x*x + b*x*x + c*x + d = ((a*x+b)*x+c)*x + d` | x | x | x | x |
`-(-a) = a` | x | x | x | x |
`a-(-b) = a+b` | x | x | x | x |
`a-a = 0` | x | x | x | x |
`a+0 = a` | x | x | x | x |
`a*0 = 0` | x | x | x | x |
`a*1 = a` | x | x | x | x |
`(-a)*(-b) = a*b` | x | x | x | x |
`a/a = 1` | x | x | x | - |
`a/1 = a` | x | x | x | x |
`0/a = 0` | x | x | x | - |
Multiply by constant = shift and add | x | x | x | x |
Divide by constant = multiply and shift | x | x | x | x |
Divide by power of 2 = shift | x | x | x | x |
`(-a == -b) = (a == b)` | x | x | x | - |
`(a+c == b+c) = (a == b)` | - | x | x | x |
`!(a < b) = (a >= b)` | x | x | x | x |
`(a<b && b<c && a<c) = (a<b && b<c)` | x | - | - | - |
Floating-point algebraic simplifications | | | | |
`a+b = b+a`, `a*b = b*a` (commutative) | x | x | x | x |
`(a+b)+c = a+(b+c)` (associative) | x | x | - | x |
`(a*b)*c = a*(b*c)` (associative) | x | x | - | - |
`a*b + a*c = a*(b+c)` (distributive) | x | x | x | x |
`a+b+c+d = (a+b)+(c+d)`, `a*b*c*d = (a*b)*(c*d)` | x | x | - | - |
`a*x*x*x + b*x*x + c*x + d = ((a*x+b)*x+c)*x + d` | x | x | x | x |
`x*x*x*x*x*x*x*x` = ((x²)²)² | x | x | - | - |
`a+a+a+a = a*4` | x | x | x | - |
`-(-a) = a` | x | x | x | x |
`a-(-b) = a+b` | x | x | x | x |
`a-a = 0` | x | x | x | x |
`a+0 = a` | x | x | x | x |
`a*0 = 0` | x | x | x | x |
`a*1 = a` | x | x | x | x |
`(-a)*(-b) = a*b` | x | x | x | x |
`a/a = 1` | x | x | - | - |
`a/1 = a` | x | x | x | x |
`0/a = 0` | x | x | x | - |
`(-a == -b) = (a == b)` | x | x | - | - |
`(-a > -b) = (a < b)` | x | x | x | - |
Divide by constant = multiply by reciprocal | x | x | x | x |
Boolean algebraic simplifications | | | | |
Boolean operations without branches | x | x | - | rarely |
`a && b = b && a`, `a \|\| b = b \|\| a` (commutative) | x | x | - | x |
`a && b && c = a && (b && c)` (associative) | - | - | - | x |
`(a&&b) \|\| (a&&c) = a && (b\|\|c)` (distributive) | x | x | - | - |
`(a\|\|b) && (a\|\|c) = a \|\| (b&&c)` (distributive) | x | x | - | - |
`!(!a) = a` | x | x | x | x |
`!a && !b = !(a \|\| b)` (De Morgan's law) | x | x | - | - |
`a && !a = false`, `a \|\| !a = true` | x | x | x | x |
`a && true = a`, `a \|\| false = a` | x | x | x | x |
`a && false = false`, `a \|\| true = true` | x | x | x | x |
`a && a = a` | x | x | x | x |
`(a&&b) \|\| (a&&!b) = a` | x | - | x | x |
`(a&&b) \|\| (!a&&c) = a ? b : c` | - | x | - | - |
`(a&&b) \|\| (!a&&c) \|\| (b&&c) = a ? b : c` | x | - | x | x |
`(a&&b) \|\| (a&&b&&c) = a&&b` | x | x | x | x |
`(a&&!b) \|\| (!a&&b) = a XOR b` | x | x | - | - |
Algebraic simplifications of bitwise operations in vector registers | | | | |
`a & b = b & a`, `a \| b = b \| a` (commutative) | x | x | - | - |
`a & b & c = a & (b & c)` (associative) | x | x | - | - |
`(a&b) \| (a&c) = a & (b\|c)` (distributive) | x | x | - | - |
`(a\|b) & (a\|c) = a \| (b&c)` (distributive) | x | x | - | - |
Ternary logic instruction | - | - | - | x |
`~(~a) = a` | x | x | - | - |
`~a & ~b = ~(a \| b)` | x | x | - | - |
`a & ~a = false`, `a \| ~a = true` | x | x | - | - |
`a & true = a`, `a \| false = a` | x | x | - | - |
`a & false = false` | x | x | x | x |
`a \| true = true` | x | x | x | - |
`a & a = a`, `a \| a = a` | x | x | - | x |
`(a&b) \| (a&~b) = a` | x | x | - | - |
`(a&b) \| (~a&c) = a ? b : c` | x | - | - | - |
`(a&b) \| (~a&c) \| (b&c) = a ? b : c` | - | - | - | - |
`(a&b) \| (a&b&c) = a&b` | x | x | - | - |
`(a&~b) \| (~a&b) = a ^ b` | x | x | - | - |
`~a ^ ~b = a ^ b` | x | x | - | - |
`a << b << c = a << (b+c)` | - | - | - | - |
Integer vector algebraic simplifications | | | | |
`a+b = b+a`, `a*b = b*a` (commutative) | x | x | - | - |
`(a+b)+c = a+(b+c)`, `(a*b)*c = a*(b*c)` (associative) | x | x | - | - |
`a*b + a*c = a*(b+c)` (distributive) | x | x | - | - |
`a+b+c+d = (a+b)+(c+d)` | - | - | - | - |
`x*x*x*x*x*x*x*x` = ((x²)²)² | x | x | - | - |
`a+a+a+a = a*4` | - | x | - | - |
`a*x*x*x + b*x*x + c*x + d = ((a*x+b)*x+c)*x + d` | x | x | - | - |
`-(-a) = a` | x | x | - | - |
`a-(-b) = a+b` | x | x | - | - |
`a-a = 0` | x | x | - | x |
`a+0 = a` | x | x | - | - |
`a*0 = 0` | x | x | - | x |
`a*1 = a` | x | x | - | - |
`(-a)*(-b) = a*b` | x | x | - | - |
Multiply by power of 2 = shift | x | x | x | x |
`(-a == -b) = (a == b)` | - | x | - | - |
`(a+c == b+c) = (a == b)` | - | x | - | - |
`!(a < b) = (a >= b)` | - | - | - | - |
`(a<b && b<c && a<c) = (a<b && b<c)` | - | - | - | - |
Floating-point vector algebraic simplifications | | | | |
`a+b = b+a`, `a*b = b*a` (commutative) | x | x | - | x |
`(a+b)+c = a+(b+c)`, `(a*b)*c = a*(b*c)` (associative) | x | x | - | - |
`a*b + a*c = a*(b+c)` (distributive) | x | x | - | - |
`a+b+c+d = (a+b)+(c+d)` | - | - | - | - |
`x*x*x*x*x*x*x*x` = ((x²)²)² | x | x | - | - |
`a+a+a+a = a*4` | - | x | - | `2*a+a+a` |
`a*x*x*x + b*x*x + c*x + d = ((a*x+b)*x+c)*x + d` | x | x | - | x |
`-(-a) = a` | x | x | - | - |
`a-(-b) = a+b` | - | - | - | - |
`a-a = 0` | x | x | - | x |
`a+0 = a` | x | x | x | x |
`a*0 = 0` | x | x | - | x |
`a*1 = a` | x | x | - | x |
`(-a)*(-b) = a*b` | - | - | - | - |
`a/a = 1` | x | x | - | - |
`a/1 = a` | - | x | - | - |
`0/a = 0` | x | x | - | - |
Divide by constant = multiply by reciprocal | - | - | - | - |
`(-a == -b) = (a == b)` | - | - | - | - |
`!(a < b) = (a >= b)` | - | - | - | - |
General vector optimizations | | | | |
Automatic vectorization | x | x | 256-bit | x |
Merge broadcast into instruction | - | x | - | x |
Merge blend into masked instruction | x | x | - | x |
Merge conditional zero into masked instruction | x | - | - | x |
Merge Boolean AND into masked compare | x | x | - | x |
Eliminate mask that is all true | x | x | x | x |
Eliminate mask that is all false | x | x | - | x |
Table 8.1. Comparison of optimizations in different C++ compilers
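Using several accumulators for a sum is one of the transformations in Table 8.1 that only some of the compilers applied in these tests. Where a compiler does not do it, the same effect can be written by hand. The following is a minimal sketch and not code from the test suite: the function names `sum_single` and `sum_two_accumulators` are invented for illustration, and unsigned integers are used so that reordering the additions cannot change the result.

```cpp
#include <cstddef>
#include <cstdint>

// Straightforward sum: each addition depends on the previous one.
std::uint32_t sum_single(const std::uint32_t* data, std::size_t n) {
    std::uint32_t sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += data[i];
    return sum;
}

// Hand-written version with two independent accumulators (hypothetical
// helper). The two dependency chains can overlap in the CPU pipeline.
std::uint32_t sum_two_accumulators(const std::uint32_t* data, std::size_t n) {
    std::uint32_t s0 = 0, s1 = 0;
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += data[i];      // even-indexed elements
        s1 += data[i + 1];  // odd-indexed elements
    }
    if (i < n) s0 += data[i];  // leftover element when n is odd
    return s0 + s1;
}
```

Whether the hand-written form is actually faster depends on the compiler and the CPU, so it is worth measuring before adopting it.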
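The tests were carried out by compiling test code under 64-bit Windows with all relevant optimization options turned on, including relaxed floating-point precision. The following compiler versions were tested:
- Gnu C++ v.7.4.0 (2019, Cygwin64).
- Clang C++ v.5.0.1 (2019, Cygwin64).
- Microsoft C++ Compiler v.19.21.27702 (Visual Studio 2019).
- Intel C++ Compiler v.19.0.4.245 for Intel64, 2019.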
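Relaxed floating-point precision matters for the floating-point rows of the table because some of the identities, such as the associative law, are not exact in IEEE 754 arithmetic. The sketch below illustrates this with two hypothetical helper functions, `sum_left` and `sum_right`. It assumes double precision on a typical 64-bit target and compilation without relaxed floating-point options.

```cpp
// Two groupings of the same three-term sum. With strict IEEE 754 double
// arithmetic they can give different results, so a compiler may only
// rewrite one into the other when relaxed floating-point math is allowed.
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

// Example: with a = 1e20, b = -1e20, c = 1.0
//   sum_left  gives 1.0, because a + b is exactly 0
//   sum_right gives 0.0, because b + c rounds back to -1e20
```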
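The Clang and Gnu compilers performed best in these tests, while the Microsoft compiler performed only moderately on vector code. For automatic vectorization, the current Microsoft compiler uses 256-bit vectors rather than 512-bit vectors. The Intel compiler can use 512-bit vectors in automatic vectorization, but only when the option `/Qopt-zmm-usage:high` is specified.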
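As a rough illustration of the kind of loop that automatic vectorization targets, consider the sketch below. The function name `add_arrays` is made up for this example, and `__restrict` is a compiler extension rather than standard C++. Whether a given compiler vectorizes the loop, and with which vector width, depends on the compiler version and options, so the generated assembly should be inspected rather than assumed.

```cpp
#include <cstddef>

// Unit-stride loop with no dependency between iterations.
// __restrict promises the compiler that the three arrays do not overlap,
// which removes one common obstacle to vectorization.
void add_arrays(float* __restrict dst, const float* __restrict a,
                const float* __restrict b, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```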
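The Clang compiler tends to unroll loops too much. Excessive loop unrolling hurts performance because it fills up the micro-op cache or the loop buffer in the CPU.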
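If unrolling turns out to be a problem for a particular loop, it can be limited per loop with compiler-specific pragmas. This is not part of the tests described above. The sketch below assumes the loop hint pragmas documented for Clang and for GCC 8 or later, and the function name `sum_no_unroll` is hypothetical. Other compilers simply compile the plain loop.

```cpp
#include <cstddef>

// Sums an array while asking the compiler not to unroll the loop.
long long sum_no_unroll(const int* data, std::size_t n) {
    long long sum = 0;
#if defined(__clang__)
#pragma clang loop unroll(disable)   // Clang loop hint
#elif defined(__GNUC__) && __GNUC__ >= 8
#pragma GCC unroll 1                 // GCC: 0 or 1 blocks unrolling
#endif
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

A pragma like this is best applied only after measurement shows that the unrolled version is slower, since unrolling is often beneficial.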