一條Insert語句怎麼優化和解構

這是學習筆記的第 2207 篇文章

讀完需要

分鐘

速讀僅需7分鐘

今天同僚問了一個關于DML的優化問題，SQL是一條Insert語句：

insert into crash_extend_bak select a.extend_id,a.crash_id,a.system_log,a.user_log,a.crash_stack,b.java_stack_map from crash_extend a where a.crash_id in (select b.crash_id from crash_record_bak b where a.crash_id=b.crash_id)

執行的時候抛出了下面的錯誤：

ERROR 1197 (HY000): Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again

看起來這個操作的代價比較高，導緻binlog都承載不了這樣大的一個事務。

看到這裡，我的大體思路已經有了，首先是定位資料量，然後是定位瓶頸。

其中表crash_extend 的資料量為200萬左右，大小為1G左右，crash_record_bak 的資料量遠大于200萬，表的存儲容量在70G左右，具體的數量暫未知。

是以按照這些基本資訊，我們來梳理基本的邏輯就有眉目了，輸出結果是以crash_extend 的為準，而且crash_extend 的extend_id的字段會和 crash_record_bak的字段 crash_id進行比對。

是以我先拆解了邏輯，

檢視crash_extend的資料量。

| 2130620 |

是以基本能夠确定整個查詢的量級。

然後我去掉了crash_record_bak的字段（該字段為json類型）進行查詢：

select a.extend_id,a.crash_id,a.system_log,a.user_log,a.crash_stack from crash_extend a,crash_record_bak b where a.crash_id=b.crash_id;

耗時在3分鐘左右。

然後寫入資料：

insert into crash_extend_bak(extend_id,crash_id,system_log,user_log,crash_sta

ck)select a.extend_id,a.crash_id,a.system_log,a.user_log,a.crash_stack from crash_extend a,crash_record_bak b where a.crash_id=b.crash_id;

Query OK, 2106507 rows affected (4 min 50.77 sec)

這個過程耗時為5分鐘左右。

接下來定位瓶頸的部分，對json字段進行更新。

改造為：

update crash_extend_bak a set a.java_stack_map=(select java_stack_map from cr

ash_record_bak b where a.crash_id=b.crash_id) where extend_id between xxxx and xxxx;

Query OK, 8867 rows affected (7.47 sec)

經過測試，基本是1萬條左右的效率是比較穩定的，是以拆解一下就可以保證基本的效率了。

可以看到一條Insert語句經過拆解和定位，可以拆解為多條語句進行優化和改進。

本文參與騰訊雲自媒體分享計劃，歡迎熱愛寫作的你一起參與！

本文章分享自微信公衆号

楊建榮的學習筆記

作者：楊建榮

原始發表時間：2020-03-13

如有侵權，請聯系 [email protected] 删除。