【說明】
本文主要是針對以下兩個問題進行的分析整理:
1、如果SQL中存在left join,是否絕對無法改寫?
2、在left join左邊的表,并不一定就是驅動表,explain執行計劃可能會将left join右邊的表作為驅動表,什麼情況下,會産生這種情況?
【left join的用途】
我們先來了解下left join的實際用途,即它會産生什麼效果。
left join(左聯接),會傳回包括左表中的所有記錄和右表中聯結字段相等的記錄。
這句話的含義是,如果a left join b,那麼傳回的結果集是“a表與b表join關聯得到的結果集”+"a表中,無法與b關聯的其它資料",其中,a與b無法關聯的資料隻會顯示a表的内容,b表對應的内容都為空。是以a left join b得到的資料行數,是大于等于a表的資料行的。
示例:
a、b表的結構、資料如下:
mysql>select * from a
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 111 | 1 | 1 |
| 222 | 2 | 2 |
| 11 | 1 | 1 |
| 22 | 2 | 2 |
+--------------+--------------+--------------+
傳回行數:[6],耗時:8 ms.
mysql>select * from b
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
+--------------+--------------+--------------+
傳回行數:[3],耗時:8 ms.
mysql>select a.*,b1.*
from a
join b b1 on a.id=b1.t1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
mysql>select a.*,b1.*
from a
left join b b1 on a.id=b1.t1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 2 | 2 | 2 | 3 | 2 | 2 |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[7],耗時:8 ms.
可以看到a left join b的上半部分結果就是“a表與b表join關聯得到的結果集”,下半部分的結果集是"a表中,無法與b關聯的其它資料"
【疑問一的分析】
明白了left join的實際作用後,我們再來分析以下幾種情況:
1、問:
A:select a.,c. from a left join (select * from b where b.t1=1) c on a.id=c.id
是否可以改寫為
B:select a.,b. from a left join b on a.id=b.id where b.t1=1
答:否,這兩種情況的意義完全不一樣,所得到的是兩種不同的結果。
在A情況下,a表是與子查詢的結果集進行左表連接配接,雖然自查詢通過where條件進行了過濾,但影響的是b表會有哪些資料會在結果集中顯示,a表中的資料不會有任何缺失,最後的結果行數肯定還是大于等于a表的行數的
在B情況下,a表與b表進行左表連接配接後,得到一個結果集,最後再對整個結果集進行過濾,在對整個結果集過濾的時候,a表中的資料可能也會被過濾掉,過濾後的結果行數可能是大于、等于或小于a表的行數的
注:a、b兩表的結構與資料在第一處代碼中已有展示,這裡不再贅述
mysql>select a.*,c.* from a left join (select * from b where b.t1=1) c on a.id=c.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[6],耗時:57 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id where b.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[1],耗時:7 ms.
2、問:
A:select a.,b. from (select * from a where a.t1=1) left join b on a.id=b.id
B:select a.,b. from a left join b on a.id=b.id where a.t1=1
答:可以,效果是一樣的
在A情況下,a表過濾後的結果集作為左連接配接的左表,最後的結果集肯定是隻包含a.t1=1的
在B情況下,a與b左連接配接先得到一個結果集,然後對結果集中根據a表的t1列進行過濾,最後的結果集肯定也是隻包含a.t1=1的
注:a、b兩表的結構與資料在第一處代碼中已有展示,這裡不再贅述
mysql>select c.*,b.* from (select * from a where a.t1=1) c left join b on c.t1=b.t1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 111 | 1 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[3],耗時:10 ms.
mysql>select a.*,b.* from a left join b on a.t1=b.t1 where a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 1 | 1 |
| 111 | 1 | 1 | 1 | 1 | 1 |
| 11 | 1 | 1 | 1 | 1 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[3],耗時:8 ms.
【小結】
上述兩種情況是對left join中子查詢存在where的情況進行了分析:
如果left join的右表即被驅動表是一個子查詢,裡面有where條件對被驅動表進行了過濾,這種情況不能直接将where條件放在最外邊,因為子查詢中通過where過濾後還是會與驅動表進行左連接配接,驅動表中的資料不會有任何缺失,但如果将where條件放在最外邊,可能也會過濾掉驅動表的資料,兩種SQL的意義是完全不一樣的;
如果left join的左表即驅動表是一個子查詢,裡面有where條件對驅動表進行了過濾,那麼可以将where條件放在最外邊。
3、問:下面三種情況是等價的嗎?
B:select a.,b. from a left join b on a.id=b.id and b.t1=1
C:select a.,b. from a left join b on a.id=b.id and a.t1=1
答:A與B是完全等價的,與C是不等價的。
A的含義是,b表過濾後再被驅動表左連接配接,得到的結果集中,a表的内容會全部顯示,b表隻會顯示與a表關聯後符合b.t1=1的内容
B的含義是,a表與b表進行左連接配接關聯,得到的結果集中,a表的内容會全部顯示,但b表的内容需要同時滿足a.id=b.id與b.t1=1才會顯示,即b表隻會顯示與a表關聯後且b.t1=1的内容
C的含義是,a表與b表進行左連接配接關聯,得到的結果集中,a表的内容會全部顯示,b表會顯示的内容需要同時滿足a.id=b.id且a表的a.t1=1
a表資料如下所示
mysql>select * from a
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 1 | 1 |
| 2 | 2 | 2 |
| 111 | 1 | 1 |
| 222 | 2 | 2 |
| 11 | 1 | 1 |
| 22 | 2 | 2 |
+--------------+--------------+--------------+
b表資料如下所示
mysql>select * from b
+--------------+--------------+--------------+
| id | t1 | t2 |
+--------------+--------------+--------------+
| 1 | 2 | 1 |
| 2 | 2 | 2 |
| 3 | 2 | 2 |
+--------------+--------------+--------------+
mysql>select a.*,c.* from a left join (select * from b where b.t1=1) c on a.id=c.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | | | |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[6],耗時:9 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id and b.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | | | |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[6],耗時:7 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id and a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | | | |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[6],耗時:7 ms.
從這裡可以看出 left join on 中and 的含義,and與where不同,and是與on組合使用的,表示當同時滿足這些條件時,才會顯示被驅動表的對應内容,在left join中and對驅動表要顯示的内容并沒有過濾性
【進一步思考】
那麼在join中,and的作用又是怎麼樣的?
A:select a.,c. from a join (select * from b where b.t1=1) c on a.id=c.id
B:select a.,b. from a join b on a.id=b.id and b.t1=1
C:select a.,b. from a join b on a.id=b.id and a.t1=1
此時A與B還是等價的,與C不是等價的
A表示,b表通過b.t1=1過濾,然後與a表關聯,得到一個結果集,結果集是符合a.id=c.id且b.t1=1的
B表示,a與b join關聯後,再對結果集的b.t1=1進行過濾,最後的結果集也是符合a.id=c.id且b.t1=1的
C表示,a與b join關聯後,再對結果集的b.t1=1進行過濾,最後的結果集是符合a.id=c.id且c.t1=1的
是以,在join情況下,and與where的作用是一樣的,都是對結果集的進一步過濾
總結:
如果帶有where的子查詢是驅動表,可以将其改寫,如果帶有where的子查詢是被驅動表,不能改寫
【問題二的分析】
現象:
mysql>explain
SELECT b.*
FROM a
LEFT JOIN b ON b.uid = a.uid
WHERE b.w_id = 2327
+--------------+-----------------------+-----------------+----------------+-------------------------+---------------+-------------------+------------------+----------------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+--------------+-----------------------+-----------------+----------------+-------------------------+---------------+-------------------+------------------+----------------+--------------------------+
| 1 | SIMPLE | b | ref | w_id,uid | w_id | 4 | const | 402 | Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY,idx_4 | PRIMARY | 3 | test.b.uid | 1 | Using where; Using index |
問:left join不是一定會以左表作為驅動表嗎?為什麼explain檢視left join的執行計劃時,有時右邊的表會作為驅動表?如果left join右邊的表足夠小,會被轉換為驅動表嗎?
答:并不是右表足夠小就會作為驅動表,而是當left join的效果與join是完全相同的時候,left join會被轉換為join,此時驅動表與被驅動表會依據結果集的大小來判斷,自然會有右邊的表會作為驅動表的情況
問:什麼情況下,left join的效果與join是完全相同的?
答:當最外層對右表進行where條件非空過濾時
即:
A:select a.,b. from a left join b on a.id=b.id where b.t1=2
B:select a.,b. from a join b on a.id=b.id where b.t1=2
A與B對效果是完全相同的,最後的結果都是a與滿足b.t1=1的b表關聯的結果集,A會被優化器轉換為B
mysql>select a.*,b.* from a left join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[6],耗時:9 ms.
mysql>select a.*,b.* from a join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[2],耗時:8 ms.
結果集相同,可互相轉換:
mysql>select a.*,b.* from a left join b on a.id=b.id where b.t1=2
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[2],耗時:8 ms.
mysql>select a.*,b.* from a join b on a.id=b.id where b.t1=2
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[2],耗時:7 ms.
結果集不同,不可轉換:
mysql>select a.*,b.* from a left join b on a.id=b.id where b.t1 is null
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[4],耗時:9 ms.
mysql>select a.*,b.* from a join b on a.id=b.id where b.t1 is null
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[0],耗時:8 ms.
【進一步說明】
如果where條件是對左表的非空過濾,left join是否還等價于join?
A:select a.,b. from a left join b on a.id=b.id where a.t1=1
B:select a.,b. from a join b on a.id=b.id where a.t1=1
答:不等價。
A表示a與b左表關聯後,關聯後的結果集中肯定有a表的全部資料,然後再通過a.k1=1進行過濾
B表示a與b進行關聯後,關聯後的結果集中是a表的部分資料,然後再對這部分資料通過a.k1=1進行過濾
兩者是不等價的,這種情況的left join一定會以a表作為驅動表
mysql>select a.*,b.* from a left join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
| 111 | 1 | 1 | | | |
| 222 | 2 | 2 | | | |
| 11 | 1 | 1 | | | |
| 22 | 2 | 2 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[6],耗時:7 ms.
mysql>select a.*,b.* from a join b on a.id=b.id
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[2],耗時:8 ms.
mysql>select a.*,b.* from a left join b on a.id=b.id where a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
| 111 | 1 | 1 | | | |
| 11 | 1 | 1 | | | |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[3],耗時:10 ms.
mysql>select a.*,b.* from a join b on a.id=b.id where a.t1=1
+--------------+--------------+--------------+--------------+--------------+--------------+
| id | t1 | t2 | id | t1 | t2 |
+--------------+--------------+--------------+--------------+--------------+--------------+
| 1 | 1 | 1 | 1 | 2 | 1 |
+--------------+--------------+--------------+--------------+--------------+--------------+
傳回行數:[1],耗時:8 ms.
簡單點了解,如果left join最後帶有where條件,這個where是針對右表的,通過where過濾後可能導緻右表的資料小于左表,而此時left join與join也是等價的,這樣就會以右表作為驅動表
如果left join的where條件是針對左表的,在left join中,左表本來就會作為驅動表,通過where過濾後左表的資料可能更小,更适合作為驅動表
【總結】
本文對left join對兩種問題進行了分析與說明,如有不足之處歡迎指正。