接下來上一篇文章《 SQL優化案例(1):隐式轉換》的介紹,此處内容圍繞OR的優化展開。
在MySQL中,同樣的查詢條件,如果變換OR在SQL語句中的位置,那麼查詢的結果也會有差異,在多個複雜的情況下,可能會帶來索引選擇不佳的性能隐患,為了避免執行效率大幅度下降的問題,我們可以适當考慮使用統一所有對查詢邏輯複雜的SQL進行分離。
常見OR使用場景,請閱讀以下案例。
案例一:不同列使用OR條件查詢
1.待優化場景
SELECT
..
..
FROM`t1` a
WHERE a.token= '16149684'
AND a.store_id= '242950'
AND(a.registrationId IS NOT NULL
AND a.registrationId<> '')
OR a.uid= 308475
AND a.registrationId IS NOT NULL
AND a.registrationId<> ''
執行計劃
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| id | select_type | table | type | key | key_len | ref | rows | Extra |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
| 1 | SIMPLE | a | range |idx_registrationid | 99 | | 100445 | Using index condition; Using where |
+--------------+-----------------------+-----------------+----------------+-------------------+-------------------+---------------+----------------+---------------------------------------------+
共傳回1行記錄,花費 5 ms 。
2.場景解析
從查詢條件中可以研磨令牌和uid過濾性都非常好,但是由于使用了,或者,需要采用索引合并的方法才能獲得比較好的性能。但在實際執行過程中MySQL優化器替代選擇了使用registrationId的索引,導緻SQL的性能很差。
3.場景優化
我們将SQL改寫成union all的形式。
SELECT
...
...
FROM`t1` a
WHERE a.token = '16054473'
AND a.store_id = '138343'
AND b.is_refund = 1
AND (a.registrationId IS NOT NULL
AND a.registrationId <> '')
union all
SELECT
...
...
FROM`t1` a
where a.uid = 181579
AND a.registrationId IS NOT NULL
AND a.registrationId <> ''
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
| 1 | PRIMARY | a | ref | IDX_TOKEN,IDX_STORE_ID_TOKEN | IDX_TOKEN | 63 | const | 1 | Using index condition; Using where |
| 1 | PRIMARY | b | eq_ref | PRIMARY | PRIMARY | 4 | youdian_life_sewsq.a.role_id | 1 | Using where |
| 2 | UNION | a | const | PRIMARY | PRIMARY | 4 | const | 1 | |
| 2 | UNION | b | const | PRIMARY | PRIMARY | 4 | const | 0 | unique row not found |
| | UNION RESULT | <union1,2> | ALL | | | | | | Using temporary |
+--------------+-----------------------+-----------------+----------------+------------------------------+---------------+-------------------+------------------------------+----------------+------------------------------------+
共傳回5行記錄,花費 5 ms 。
通過對比優化前後的執行計劃,可以明顯修剪,将SQL合并成兩個子查詢,再使用union對結果進行合并,穩定性和安全性更好,性能更高。
案例二:同一列使用OR查詢條件
select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3
left join(
select product_id,
count(0) count
from t2 pprod
inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
and pprod.is_enable =1
and ppinfo.is_enable=1
and pinfo.belong_t0 =1
and pinfo.end_time >=now()
and not (
pinfo.onshelv_time>'2019-06-30 00:00:00'
or pinfo.end_time>'2018-12-05 00:00:00'
)group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and (pc.count =0 or pc.count isnull ) limit 0,5;

本例的SQL查詢中有一個子查詢,子查詢被當成成驅動表,産生了auto_key,通過SQL進行進行測試,驗證主要是(pc.count = 0或pc.count為null)會影響到整個SQL的性能,需要進行比較改寫。
首先我們可以單獨思考(pc.count = 0或pc.count為null)如何進行優化?先寫一個類似的SQL
Select col from test where col =100 or col is null;
+--------+
| col |
+--------+
| 100 |
| NULL |
+--------+
2 rows in set (0.00 sec)
這個時候我們看到的其實是同一個列,但對應不同的值,這種情況可以利用case when進行轉換。
Select col From test where case when col is null then 100 else col =100 end;
+--------+
| col |
+--------+
| 100 |
| NULL |
+--------+
2 rows in set (0.00 sec)
再回到原始SQL進行改寫。
select
....
....
from
t1 as mci
left join t1 as ccv2_1 on ccv2_1.unique_no = mci=category_no1
left join t1 as ccv2_2 on ccv2_2.unique_no = mci=category_no2
left join t1 as ccv2_3 on ccv2_3.unique_no = mci=category_no3
left join(
select product_id,
count(0) count
from t2 pprod
inner join t3 pinfo on pinfo.promotion_id = pprod.promotion_id
and pprod.is_enable =1
and ppinfo.is_enable=1
and pinfo.belong_t0 =1
and pinfo.end_time >=now()
and not (
pinfo.onshelv_time>'2019-06-30 00:00:00'
or pinfo.end_time>'2018-12-05 00:00:00'
)group by pprod.product_id
)as pc on pc.product_id = mci.product_id
where mci.is_enable =0
and mci.comodifty_type in ('1', '5', '6')
and case when pc.count is null then 0 else pc.count end=0 limit 0,5;
可以抛光優化後的SQL比原始SQL快了30秒,執行效率提升約50倍。
案例三:優化關聯SQL OR條件
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL30SECOND)
OR user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
OR group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
我們仔細分析上述查詢語句,發現雖然業務邏輯隻需要查詢半分鐘内修改的資料,但執行過程卻必須對所有的資料進行關聯操作,帶來的性能損失。
我們對原始SQL進行分解操作,第一部分sql-01如下:
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user_msg.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-01以user_msg表為驅動,使用gmt_modified索引過濾最新資料。
第二部分sql-02如下:
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE user.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-02以使用者為驅動表,msg user_id的索引過濾行很好。
第三部分sql-03如下:
SELECT user_msg.msg_id AS ‘msg_id’, user_msg.content AS ‘msg_content’, …
FROM user_msg
LEFT JOIN user ON user_msg.user_id = user.user_id
LEFT JOIN group ON user_msg.group_id = group.group_id
WHERE group.gmt_modified >= date_sub('2018-03-29 09:31:44', INTERVAL 30 SECOND)
sql-03以group為驅動表,使用gmt_modified索引過濾最新資料。
總結
MySQL OR條件優化的常見場景主要有以下情況:
1,相同列可以使用IN進行代替
2,不同列及複雜的情況下,可以使用union all進行分離
3,關聯SQL OR條件
我們需要結合實際場景,分析優化。