MySQL Join

MySQL本身沒有正常意義上的執行計劃，一般情況就是通過JOIN和QEP_TAB這兩個結構組成。QEP_TAB 的全稱是Query Execution Plan Table，這個“Table“可以是實體表、記憶體表、常量表、子查詢的結果表等等。作為整個單獨JOIN執行計劃載體之前還承擔着整個執行路徑的調用和流轉，但是從8.0.20後，全面的生成了獨立的Iterator執行器引擎模式。在8.0.22中，又引入了AccessPath概念，真正的生成了獨立的執行計劃，進而進一步做到了優化過程到樹型執行計劃，最後到Iterator載體在執行引擎中的執行。

MySQL原始的Join都是依賴于QEP_TAB清單，因為原來MySQL并不支援其他形态的Join結構，隻支援左深樹，那很容易直接使用數組來表示就可以了。優化器在生成執行計劃隻需要在QEP_TAB上增加JOIN的屬性op_type，就可以遞歸去使用不同的Join方法和表通路方式了。

// Operation between the previous QEP_TAB and this one.
  enum enum_op_type {
    // Regular nested loop.
    OT_NONE,

    // Aggregate (GROUP BY).
    OT_AGGREGATE,

    // Various temporary table operations, used at the end of the join.
    OT_MATERIALIZE,
    OT_AGGREGATE_THEN_MATERIALIZE,
    OT_AGGREGATE_INTO_TMP_TABLE,
    OT_WINDOWING_FUNCTION,

    // Block-nested loop (rewritten to hash join).
    OT_BNL,

    // Batch key access.
    OT_BKA
  } op_type = OT_NONE;

Hypergraph Join Optimizer

官方共分了11個Patch來送出對于Join優化器的增強，當然其中包含了對優化器和執行器分離更進一步重構，我們先來看看官方是怎麼送出這樣的重大重構的。

[Basic] 動态規劃查詢超圖算法（DPhyp-Hypergraph partitioning algorithm）

官方首先實作了基于DPhyp的動态規劃查詢超圖算法，論文可以搜尋《Dynamic Programming Strikes Back》。資料庫中關于Join ordering算法有很多，引用2，3中的作者已經做了詳盡的解釋。我這裡隻做簡單的介紹。

每一個Query，都可以定義為一個無向Query Graph，包括查詢中的所有關系R1，R2，...，Rn作為節點；連接配接謂詞表達式作為邊，如a1 = a2 （a1 ∈ Ri，a2 ∈ Rj）；連接配接謂詞中包含常量會形成自邊（self-edge），如a1 = const (a1 ∈ Ri)；大部分的自邊在Join算法裡是不考慮的，因為它會被下推下圖。例如對于 select * from Student s, Attend a, Lecture l, Professor p where s.sno = a.asno and a.alno = l.lno and l.lpno = p.pno and p.pname = 'Larson'，有如下Query Graph結構：

MySQL 8.0.23 Hypergraph Join Optimizer代碼詳解MySQL JoinHypergraph Join Optimizer

對于Join Tree，一般會有以下幾種：left-deep tree、right-deep tree、zigzag tree和bushy tree。前三種是屬于線性Join tree。MySQL之前采取左深樹，為了考慮更好的支援Hash Join和NestLoop Join的選擇，現在開始考慮Bushy Tree了。為了避免任何時候的笛卡爾積Join，線性Join的Join ordering算法通常很簡單。那麼為什麼要引入複雜的Bushy Tree。假設定義Query(R1, R2, R3)有如下屬性，y |R1| = 10, |R2| = 20, |R3| = 20, |R4| = 10, f1,2 = 0.01, f2,3 = 0.5, f3,4 = 0.01。||代表行數，fn,m代表Rn和Rm的選擇率，可以看到Bushy Tree有更好的執行效率。

不過遺憾的是，Bushy Tree的搜尋可能性非常大：

是以，原始左深樹使用的Greedy Heuristics算法，在Bushy Tree下，計算Join Ordering通常使用動态規劃算法（DPccp和DPhyp）。

DPccp的算法如下：

但是DPccp有很多限制：複雜謂詞，涉及到多個表（R1，R2，R3）做為連接配接，例如：R1.a + R2.b + R3.c = R4.d + R5.e + R6.f ；隻支援inner joins；是以引入了新的基于Hypergraph的算法DPhyp。

select *
from R1 r1, R2 r2, R3 r3,
R4 r4, R5 r5, R6 r6
where r1.a=r2.a and r2.b=r3.c and
r4.d=r5.d and r5.e=r6.e and
abs(r1.f + r3.f )
= abs(r4.g + r6.g)

介紹算法先介紹下基本概念超圖（hypergraph）相比普通的圖，其特點是圖中的節點是一個集合，稱為超節點（hypernode），圖中的邊所連接配接的是超節點，即連接配接兩個集合。這類邊稱為超邊（hyperedge）。超圖就是由超節點和超邊作為最基本元素而構成的。有了超圖那麼上面的Join Graph可以變成：

由于使用DPccp和Top-Down Partition Search，不能夠解決outer join，antijoin的不能自由重排的算法。

MySQL目前采用Bitmap（64bit）來表示，假設Join table個數不會超過61個，看下它的定義

+struct Hyperedge {
+  // The endpoints (hypernodes) of this hyperedge. See the comment about
+  // duplicated edges in Node.
+  //
+  // left and right may not overlap, and both must have at least one bit set.
+  NodeMap left;
+  NodeMap right;
+};
+
+struct Hypergraph {
+  std::vector<Node> nodes;  // Maximum 8*sizeof(NodeMap) elements.
+  std::vector<Hyperedge> edges;
+
+  void AddNode();
+  void AddEdge(NodeMap left, NodeMap right);
+};

基本算法流程如下：

1.找到一個圖中種子節點Ri

2.不斷增加i去找hyperedges，不考慮不連接配接的和已經處理過的。

3 對于每一個連通子圖subgraph (csg)，再重複1和2步驟，找出一個仍然可以連通子圖(complement, cmp)，然後連接配接這個圖的cmp成為更大的連通子圖(csg-cmp-pair).

4. 當找到一個csg-cmp-pair，就形成一個可以進行估算的subjoin。

感興趣可以閱讀相應的論文和MySQL的代碼（sql/join_optimizer）。

QEP_TAB和執行器Iterator解藕，重新來設定InnoDB row buffer

總所周知，QEB_TAB結構上承載了很多資訊，除了上面表通路和Join方法的資訊之外，還有InnoDB row buffer、表通路的優化通路方式（ref/range/loose scan/first match/materialize）、附加屬性（having/distinct/sort/icp/lateral derived/mrr/cte）、基本實體表結構TABLE_LIST等。作為删除QEP_TAB的基礎，首先先做了和執行器的解藕工作，Iterator和QEP_TAB分離。

class TableScanIterator final : public TableRowIterator {
  public:
-  // Accepts nullptr for qep_tab; qep_tab is used only for setting up record
-  // buffers.
-  //
-  // The pushed condition can be nullptr.
+  // “expected_rows” is used for scaling the record buffer.
+  // If zero or less, no record buffer will be set up.
   //
   // "examined_rows", if not nullptr, is incremented for each successful Read().
-  TableScanIterator(THD *thd, TABLE *table, QEP_TAB *qep_tab,
+  TableScanIterator(THD *thd, TABLE *table, double expected_rows,
                     ha_rows *examined_rows);

接下來解藕

-static bool init_index_and_record_buffer(const QEP_TAB *qep_tab, handler *file,
+static bool init_index(TABLE *table, handler *file, uint idx, bool sorted) {

-bool set_record_buffer(const QEP_TAB *tab);
+bool set_record_buffer(TABLE *table, double expected_rows_to_fetch);

=>

-  return init_index_and_record_buffer(m_qep_tab, m_qep_tab->table()->file,
-                                      m_ref->key, m_use_order);
+  if (table()->file->inited) return false;
+  if (init_index(table(), table()->file, m_ref->key, m_use_order)) {
+    return true;
+  }
+  return set_record_buffer(table(), m_expected_rows);

實作CostingReceiver和轉化查詢塊select_lex成為超圖hypergraph

MySQL 8.0.23提供了支援hypergraph的優化器模型的第一個原型版本，通過set optimizer_switch="hypergraph_optimizer=on";來打開，主要和原有的優化器差別在于：

不再局限于左深樹的執行計劃
用DPhyp動态規劃算法代替了強力算和啟發式的剪枝方式，減少了搜尋空間，當然還有一些限制
Hash join成為主要的選擇方式
直接和AccessPath互通，而非直接生成Iterators

主要通過FindBestQueryPlan函數來實作，邏輯如下：

先判斷是否屬于新優化器可以支援的Query文法（CheckSupportedQuery），不支援的直接傳回錯誤ER_HYPERGRAPH_NOT_SUPPORTED_YET
轉化top_join_list變成JoinHypergraph結構。由于Hypergraph是比較獨立的算法層面的實作，JoinHypergraph結構用來更好的把資料庫的結構包裝到Hypergraph的edges和nodes的概念上的。
通過EnumerateAllConnectedPartitions實作論文中的DPhyp算法
CostingReceiver類包含了過去JOIN planning的主要邏輯，包括根據cost選擇相應的通路路徑，根據DPhyp生成的子計劃進行評估，保留cost最小的子計劃。
得到root_path後，接下來處理group/agg/having/sort/limit的。對于Group by操作，目前Hypergraph使用sorting first + streaming aggregation的方式。

最後通過CreateIteratorFromAccessPath函數生成對應的執行Iterator樹，在Iterator執行器中執行。

舉例說明：

兩個連通子圖
root:test> explain format=tree select * from t1,t2,t3,t4 where t2.f2 = t1.a and t1.a = t3.a;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Hash cartesian product (no condition)  (cost=1.83 rows=2)
    -> Inner hash join (t2.f2 = t1.a)  (cost=1.55 rows=2)
        -> Table scan on t2  (cost=0.25 rows=2)
        -> Hash
            -> Inner hash join (t1.a = t3.a)  (cost=1.27 rows=1)
                -> Table scan on t1  (cost=1.00 rows=1)
                -> Hash
                    -> Table scan on t3  (cost=0.25 rows=1)
    -> Hash
        -> Table scan on t4  (cost=0.25 rows=1)
 |

一個連通子圖
root:test> explain format=tree select * from t1,t2,t3,t4 where t2.f2 = t1.a and t1.a = t3.a and t2.f2 = t4.pk and t1.a = t4.pk;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Inner hash join (t2.f2 = t4.pk), (t1.a = t4.pk)  (cost=1.83 rows=2)
    -> Inner hash join (t2.f2 = t1.a)  (cost=1.55 rows=2)
        -> Table scan on t2  (cost=0.25 rows=2)
        -> Hash
            -> Inner hash join (t1.a = t3.a)  (cost=1.27 rows=1)
                -> Table scan on t1  (cost=1.00 rows=1)
                -> Hash
                    -> Table scan on t3  (cost=0.25 rows=1)
    -> Hash
        -> Table scan on t4  (cost=0.25 rows=1)
 |

參考資料：

《MySQL 8.0 新的火山模型執行器》《SIGMOD'08|Join Ordering:應對複雜謂詞和非内連接配接的場景》

MySQL 8.0.23 Hypergraph Join Optimizer代碼詳解MySQL JoinHypergraph Join Optimizer

MySQL Join

Hypergraph Join Optimizer

[Basic] 動态規劃查詢超圖算法（DPhyp-Hypergraph partitioning algorithm）

QEP_TAB和執行器Iterator解藕，重新來設定InnoDB row buffer

實作CostingReceiver和轉化查詢塊select_lex成為超圖hypergraph

舉例說明：

繼續閱讀

MySQL的4種隔離級别？出現問題

資料結構與算法（27）——排序（二）

XX系統實施過程問題總結

無元件上傳圖檔到資料庫中，最完整解決方案

【MySQL資料庫】資料庫索引事務1.索引2.事務

android 主線程的相關問題

neo4j之cypher使用文檔

Dijkstra--簡易版（最短路徑）

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

NOSQL安全攻擊

mybatis_入門程式Mybatis入門

mysql使用source指令導入.sql檔案

登入plsql 報錯 the account is locked --使用者被鎖

sqlServer根據經緯查距離

SequoiaDB巨杉資料庫C++驅動概述

hdu7108哈希