SQL調優指南—調優方法論

下推更多的計算

PolarDB-X會盡可能将更多的計算下推到存儲層MySQL。下推計算能夠減少資料傳輸，減少網絡層和PolarDB-X層的開銷，提升SQL語句的執行效率。PolarDB-X支援下推幾乎所有算子，包括：

過濾條件，如WHERE或HAVING中的條件。
聚合算子，如COUNT，GROUP BY等，會分成兩階段進行聚合計算。
排序算子，如ORDER BY。
JOIN和子查詢，兩邊JOIN Key分片方式必須一樣，或其中一邊為廣播表。

如下示例講解如何将更多的計算下推到MySQL來加速執行

> EXPLAIN select * from customer, nation where c_nationkey = n_nationkey and n_regionkey = 3;
Project(c_custkey="c_custkey", c_name="c_name", c_address="c_address", c_nationkey="c_nationkey", c_phone="c_phone", c_acctbal="c_acctbal", c_mktsegment="c_mktsegment", c_comment="c_comment", n_nationkey="n_nationkey", n_name="n_name", n_regionkey="n_regionkey", n_comment="n_comment")
  BKAJoin(condition="c_nationkey = n_nationkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="nation", shardCount=2, sql="SELECT * FROM `nation` AS `nation` WHERE (`n_regionkey` = ?)")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT * FROM `customer` AS `customer` WHERE (`c_nationkey` IN ('?'))")

若執行計劃中出現了BKAJOIN，BKAJOIN每次從左表擷取一批資料，就會拼成一個IN查詢取出右表相關聯的行，并在最後執行JOIN操作。由于左表資料量很大，需要取很多次才能完成查詢，執行很慢。

無法下推JOIN的原因是：目前情況下，nation是按主鍵n_nationkey切分的，而本查詢的JOIN Key是c_custkey，二者不同，是以下推失敗。

考慮到nation （國家）表資料量并不大、且幾乎沒有修改操作，可以将其重建成如下廣播表：

--- 修改後 ---
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_name` varchar(25) NOT NULL,
  `n_regionkey` int(11) NOT NULL,
  `n_comment` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`n_nationkey`)
) BROADCAST;  --- 聲明為廣播表

修改後，可以看到執行計劃中不再出現JOIN，幾乎所有計算都被下推到存儲層MySQL執行了（LogicalView中），而上層僅僅是将結果收集并傳回給使用者（Gather算子），執行性能大大增強。

> EXPLAIN select * from customer, nation where c_nationkey = n_nationkey and n_regionkey = 3;
Gather(concurrent=true)
  LogicalView(tables="customer_[0-7],nation", shardCount=8, sql="SELECT * FROM `customer` AS `customer` INNER JOIN `nation` AS `nation` ON ((`nation`.`n_regionkey` = ?) AND (`customer`.`c_nationkey` = `nation`.`n_nationkey`))")

更多關于下推的原理和優化，請參見

查詢改寫與下推

。

增加索引

PolarDB-X支援

全局二級索引

以下以慢SQL示例來講解如何通過GSI下推更多算子

> EXPLAIN select o_orderkey, c_custkey, c_name from orders, customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-11' and o_totalprice > 100;
Project(o_orderkey="o_orderkey", c_custkey="c_custkey", c_name="c_name")
  HashJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_custkey`, `c_name` FROM `customer` AS `customer`")
    Gather(concurrent=true)
      LogicalView(tables="orders_[0-7]", shardCount=8, sql="SELECT `o_orderkey`, `o_custkey` FROM `orders` AS `orders` WHERE ((`o_orderdate` = ?) AND (`o_totalprice` > ?))")

執行計劃中，orders按照o_orderkey拆分而customer按照c_custkey拆分，由于拆分次元不同JOIN算子不能下推。考慮到2019-11-11當天總價高于100的訂單非常多，跨分片JOIN耗時很高，需要在orders表上建立一個GSI來使得JOIN算子可以下推。查詢中使用到了orders表的o_orderkey, o_custkey, o_orderdate, o_totalprice四列，其中o_orderkey, o_custkey分别是主表和索引表的拆分鍵，o_orderdate, o_totalprice作為覆寫列包含在索引中用于避免回表。

> create global index i_o_custkey on orders(`o_custkey`) covering(`o_orderdate`, `o_totalprice`)
        DBPARTITION BY HASH(`o_custkey`) TBPARTITION BY HASH(`o_custkey`) TBPARTITIONS 4;

增加GSI并通過force index(i_o_custkey)強制使用索引後，跨分片JOIN變為MySQL上的局部JOIN （IndexScan中），并且通過覆寫列避免了回表操作，查詢性能得到提升。

> EXPLAIN select o_orderkey, c_custkey, c_name from orders force index(i_o_custkey), customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-11' and o_totalprice > 100;
Gather(concurrent=true)
  IndexScan(tables="i_o_custkey_[0-7],customer_[0-7]", shardCount=8, sql="SELECT `i_o_custkey`.`o_orderkey`, `customer`.`c_custkey`, `customer`.`c_name` FROM `i_o_custkey` AS `i_o_custkey` INNER JOIN `customer` AS `customer` ON (((`i_o_custkey`.`o_orderdate` = ?) AND (`i_o_custkey`.`o_custkey` = `customer`.`c_custkey`)) AND (`i_o_custkey`.`o_totalprice` > ?))")

更多關于全局二級索引的使用細節，請參見

執行計劃調優

大多數情況下，PolarDB-X的查詢優化器可以自動産生最佳的執行計劃。但是，少數情況下，可能因為統計資訊存在缺失、誤差等，導緻生成的執行計劃不夠好，這時，可以通過Hint來幹預優化器行為，使之生成更好的執行計劃。如下示例将講解執行計劃的調優。

> EXPLAIN select o_orderkey, c_custkey, c_name from orders, customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-15' and o_totalprice < 10;
Project(o_orderkey="o_orderkey", c_custkey="c_custkey", c_name="c_name")
  HashJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_custkey`, `c_name` FROM `customer` AS `customer`")
    Gather(concurrent=true)
      LogicalView(tables="orders_[0-7]", shardCount=8, sql="SELECT `o_orderkey`, `o_custkey` FROM `orders` AS `orders` WHERE ((`o_orderdate` = ?) AND (`o_totalprice` < ?))")

實際上2019-11-15這一天總價低于10元的訂單數量很小，隻有幾條，這時候用BKAJOIN是比Hash JOIN更好的選擇（關于BKAJOIN和Hash JOIN的介紹，請參見

JOIN優化和執行

通過如下/*+TDDL:BKA_JOIN(orders, customer)*/ Hint強制優化器使用BKAJOIN（LookupJOIN）:

> EXPLAIN /*+TDDL:BKA_JOIN(orders, customer)*/ select o_orderkey, c_custkey, c_name from orders, customer
          where o_custkey = c_custkey and o_orderdate = '2019-11-15' and o_totalprice < 10;
Project(o_orderkey="o_orderkey", c_custkey="c_custkey", c_name="c_name")
  BKAJoin(condition="o_custkey = c_custkey", type="inner")
    Gather(concurrent=true)
      LogicalView(tables="orders_[0-7]", shardCount=8, sql="SELECT `o_orderkey`, `o_custkey` FROM `orders` AS `orders` WHERE ((`o_orderdate` = ?) AND (`o_totalprice` < ?))")
    Gather(concurrent=true)
      LogicalView(tables="customer_[0-7]", shardCount=8, sql="SELECT `c_custkey`, `c_name` FROM `customer` AS `customer` WHERE (`c_custkey` IN ('?'))")

可以選擇執行加如下Hint的查詢：

/*+TDDL:BKA_JOIN(orders, customer)*/ select o_orderkey, c_custkey, c_name from orders, customer where o_custkey = c_custkey and o_orderdate = '2019-11-15' and o_totalprice < 10;

以上操作加快了SQL查詢速度。為了讓Hint發揮作用，可以将應用中的SQL加上Hint，或者更友善的方式是使用執行計劃管理（Plan Management）功能對該SQL固定執行計劃。具體操作如下：

BASELINE FIX SQL /*+TDDL:BKA_JOIN(orders, customer)*/ select o_orderkey, c_custkey, c_name from orders, customer where o_custkey = c_custkey and o_orderdate = '2019-11-15';

這樣一來，對于這條SQL（參數可以不同），PolarDB-X都會采用如上固定的執行計劃。更多關于執行計劃管理的資訊，請參見

執行計劃管理

并發執行

使用者可以通過HINT /*+TDDL:PARALLELISM=4*/ 指定并行度，充分利用多核能力加速計算。比如以下例子:

mysql> explain physical select a.k, count(*) cnt from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000 group by k having cnt > 1300 or
der by cnt limit 5, 10;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| PLAN                                                                                                                                                              |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ExecutorType: AP_LOCAL                                                                                                                                                 |
| The Query's MaxConcurrentParallelism: 2                                                                                                                           |
| Fragment 1                                                                                                                                                        |
|     Shuffle Output layout: [BIGINT, BIGINT] Output layout: [BIGINT, BIGINT]                                                                                       |
|     Output partitioning: SINGLE [] Parallelism: 1                                                                                                                 |
|     TopN(sort="cnt ASC", offset=?2, fetch=?3)                                                                                                                     |
|   Filter(condition="cnt > ?1")                                                                                                                                    |
|     HashAgg(group="k", cnt="COUNT()")                                                                                                                             |
|       BKAJoin(condition="k = id", type="inner")                                                                                                                   |
|         RemoteSource(sourceFragmentIds=[0], type=RecordType(INTEGER_UNSIGNED id, INTEGER_UNSIGNED k))                                                             |
|         Gather(concurrent=true)                                                                                                                                   |
|           LogicalView(tables="[000000-000003].sbtest1_[00-15]", shardCount=16, sql="SELECT `k` FROM `sbtest1` AS `sbtest1` WHERE ((`k` > ?) AND (`k` IN (...)))") |
| Fragment 0                                                                                                                                                        |
|     Shuffle Output layout: [BIGINT, BIGINT] Output layout: [BIGINT, BIGINT]                                                                                       |
|     Output partitioning: SINGLE [] Parallelism: 1 Splits: 16                                                                                                      |
|     LogicalView(tables="[000000-000003].sbtest1_[00-15]", shardCount=16, sql="SELECT `id`, `k` FROM `sbtest1` AS `sbtest1` WHERE (`id` > ?)")                     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+

預設的并行度并不高，通過強制指定并行度，利用單機或者多機并行模式來加速。

mysql> explain physical /*+TDDL:PARALLELISM=8*/select a.k, count(*) cnt from sbtest1 a, sbtest1 b where a.id = b.k and a.id > 1000 group by k having cnt > 1300 order by cnt limit 5, 10;                                                                                                                                                     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ExecutorMode: AP_LOCAL                                                                                                                                      |
| Fragment 0 dependency: [] parallelism: 8                                                                                                                    |
| BKAJoin(condition="k = id", type="inner")                                                                                                                   |
|   Gather(concurrent=true)                                                                                                                                   |
|     LogicalView(tables="[000000-000003].sbtest1_[00-15]", shardCount=16, sql="SELECT `id`, `k` FROM `sbtest1` AS `sbtest1` WHERE (`id` > ?)")               |
|   Gather(concurrent=true)                                                                                                                                   |
|     LogicalView(tables="[000000-000003].sbtest1_[00-15]", shardCount=16, sql="SELECT `k` FROM `sbtest1` AS `sbtest1` WHERE ((`k` > ?) AND (`k` IN (...)))") |
| Fragment 1 dependency: [] parallelism: 8                                                                                                                    |
| LocalBuffer                                                                                                                                                 |
|   RemoteSource(sourceFragmentIds=[0], type=RecordType(INTEGER_UNSIGNED id, INTEGER_UNSIGNED k, INTEGER_UNSIGNED k0))                                        |
| Fragment 2 dependency: [0, 1] parallelism: 8                                                                                                                |
| Filter(condition="cnt > ?1")                                                                                                                                |
|   HashAgg(group="k", cnt="COUNT()")                                                                                                                         |
|     RemoteSource(sourceFragmentIds=[1], type=RecordType(INTEGER_UNSIGNED id, INTEGER_UNSIGNED k, INTEGER_UNSIGNED k0))                                      |
| Fragment 3 dependency: [0, 1] parallelism: 1                                                                                                                |
| LocalBuffer                                                                                                                                                 |
|   RemoteSource(sourceFragmentIds=[2], type=RecordType(INTEGER_UNSIGNED k, BIGINT cnt))                                                                      |
| Fragment 4 dependency: [2, 3] parallelism: 1                                                                                                                |
| TopN(sort="cnt ASC", offset=?2, fetch=?3)                                                                                                                   |
|   RemoteSource(sourceFragmentIds=[3], type=RecordType(INTEGER_UNSIGNED k, BIGINT cnt))                                                                      |
+---------------------------------------------------------------------------------------------------------

SQL調優指南—調優方法論

下推更多的計算

增加索引

執行計劃調優

并發執行

繼續閱讀

SQL優化SQL語句優化的目的

資料遷移方法資料遷移原則資料遷移之雙寫方案資料遷移之級聯同步方案

redis叢集資料一緻性_RedisRaft為Redis叢集帶來強大的資料一緻性

JAVA高效程式設計指南

寶塔面闆mysql恢複2018.1.8更新

Centos7 MySQL 5.7 安裝MySQL 5.7 安裝

查找入職員工時間排名倒數第三的員工所有資訊

Hibernate使用Hibernate的“3個準備，7個步驟”Hibernate API簡介操作實體對象對象識别

雲計算面試題——mysql/存儲引擎/備份

關于SQL語言

SQL語言基礎：常用的資料查詢語句

Ubuntu16.04安裝Apache+MySQL+PHP1. 安裝Apache2. 安裝MySQL3. 安裝PHP4. 安裝phpMyAdmin

MySQL的4種隔離級别？出現問題

neo4j之cypher使用文檔

mysql使用source指令導入.sql檔案

sqlServer根據經緯查距離