PostgreSQL 9.6 核心優化 - sort性能增強(batch化quicksort代替replacement selection when work_mem small)

digoal

2016-10-08

postgresql , 9.6 , 核心優化 , sort , replacement selection , quciksort

排序是比較常見的業務需求，為了降低排序的cpu開銷，通常會使用索引來滿足排序的需求。

但是并不是所有的query都能使用索引排序，或者說使用索引排序就一定高效。

例如帶過濾條件的query，過濾完之後再根據某些字段或表達式排序。這種query的排序不一定能用上索引。

當需要實時排序時，postgresql資料庫怎麼處理的呢？

postgresql根據排序的資料量, work_mem的大小選擇合适的排序算法。

排序算法參考

<a href="https://en.wikipedia.org/wiki/sorting_algorithm">https://en.wikipedia.org/wiki/sorting_algorithm</a>

包括每種排序算法的最小開銷，最大開銷，平均開銷，記憶體需求，穩定性等。

1. 當需要排序的資料量較小，可以在work_mem參數設定的記憶體值内完成時，會使用qsort排序，這個9.6并沒有改進，與以前一樣。

quicksort算法參考

<a href="https://en.wikipedia.org/wiki/quicksort">https://en.wikipedia.org/wiki/quicksort</a>

PostgreSQL 9.6 核心優化 - sort性能增強(batch化quicksort代替replacement selection when work_mem small)

2. 當需要排序的資料量較大，無法在work_mem參數設定的記憶體值内完成時，會使用臨時檔案以及标準的external sort算法。

2.1 postgresql 9.6以前的版本external sort使用heapsort算法(selection排序的變種)。

參考1 knuth volume 3, algorithm 5.2.3h

<a href="https://books.google.com.hk/books?id=cyulbaaaqbaj&pg=pa757&lpg=pa757&dq=knuth++5.2.3h&source=bl&ots=kjcxoiqs5g&sig=s0gebvmp_bb0uqrgxshzfrdbyju&hl=zh-cn&sa=x&ved=0ahukewiy8tsj0czpahxe3ymkhe3ydr0q6aeiojae#v=onepage&q&f=false">https://books.google.com.hk/books?id=cyulbaaaqbaj&pg=pa757&lpg=pa757&dq=knuth++5.2.3h&source=bl&ots=kjcxoiqs5g&sig=s0gebvmp_bb0uqrgxshzfrdbyju&hl=zh-cn&sa=x&ved=0ahukewiy8tsj0czpahxe3ymkhe3ydr0q6aeiojae#v=onepage&q&f=false</a>

參考2 heapsort

<a href="https://en.wikipedia.org/wiki/sorting_algorithm#heapsort">https://en.wikipedia.org/wiki/sorting_algorithm#heapsort</a>

heapsort原理如圖

selection sort原理如圖，對排序順序與實際順序一緻的資料效果很不錯，但是對離散資料效果不好。

2.2 postgresql 9.6 的改進，隻對頭部掃描的n條記錄使用selection排序，提升随機資料的排序效率，通過參數work_mem與replacement_sort_tuples限制n。随後的資料使用分批的quicksort和merge sort得到結果。

決定使用replacement selection

replacement_sort_tuples建議設定的值與cpu cache size有關，cpu cache size越大，可以調大這個值。否則建議不要修改它。

3. 分批使用knuth's algorithm 5.4.2d, 在postgresql的logtape.c中實作。

4. merge sort, 參考

<a href="https://en.wikipedia.org/wiki/polyphase_merge_sort">https://en.wikipedia.org/wiki/polyphase_merge_sort</a>

5. tuplesort說明。

src/backend/utils/sort/tuplesort.c

當work_mem不足完成記憶體排序時，會選擇external sort或external merge。

根據資料量的大小，分階段排序，合并。

可以通過trace_sort觀察，後面會有例子。

explain 時可能看到的一些資訊

前面已經描述過了，以9.5作為比較，9.6不同的地方，如下圖

9.5每個小的batch(tape)都是使用replacement selection的排序算法。

而9.6則是通過replacement_sort_tuples配置，初次掃描的replacement_sort_tuples條記錄使用replacement selection算法（以裝下一個work_mem機關為止，是以可能實際的replacement selection記錄數小于replacement_sort_tuples的設定），超出的tuples使用quicksort。

使用trace_sort也可以跟蹤到兩個版本的排序算法差異。

replacement selection 是替代排序，對于亂序的資料，排序的效率很差，因為得不停的替換，而對于實體順序與邏輯順序一緻(是順序一緻，倒序一緻不算)的效果則很不錯。

亂序資料quick sort效率比replacement selection效率高，适用範圍廣泛。

9.6的排序改進說明如下

<a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0711803775a37e0bf39d7efdd1e34d9d7e640ea1">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0711803775a37e0bf39d7efdd1e34d9d7e640ea1</a>

1. 1000萬亂序資料排序

1.1 9.5

1000萬随機資料9.5排序約耗時9562-995=8567毫秒

1.2 9.6

1000萬随機資料9.6排序約耗時7944-899=7045毫秒

如果不想讓9.6使用replacement selection，把replacement_sort_tuples設定為0即可。

如果要少點batch，可以設定更大的work_mem，但是不要超過表的大小，否則會使用inmemory排序，那就看不到external sort或external merge的效果了。

9.5

9.6

2. 1000萬順序資料排序

2.1 9.5

1000萬順序資料9.5排序約耗時5717-901=4816毫秒

2.2 9.6

1000萬順序資料9.6排序約耗時5903-924=4979毫秒

通常不建議修改replacement_sort_tuples的值，與cpu cache比對。

<a href="https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#runtime-config-resource-memory">https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#runtime-config-resource-memory</a>

<a href="https://en.wikipedia.org/wiki/external_sorting">https://en.wikipedia.org/wiki/external_sorting</a>

<a href="https://en.wikipedia.org/wiki/selection_sort">https://en.wikipedia.org/wiki/selection_sort</a>

<a href="http://info.flagcounter.com/h9v1">count</a>

PostgreSQL 9.6 核心優化 - sort性能增強(batch化quicksort代替replacement selection when work_mem small)

繼續閱讀

Codeforces 1417 D. Make Them Equal(思維+構造)

查找算法之二分查找查找算法之二分查找

查找算法學習之二分查找（Python版本）——BinarySearch

CQ V1.0分詞bates(基于雙數組tire樹)—應該是目前最快的中文分詞算法

Command Network(POJ 3164)---定根最小樹形圖模闆題題目描述輸入格式輸出格式輸入樣例輸出樣例分析源程式

開源低帶寬語音編解碼器

241 Different Ways to Add Parentheses（C代碼版）

【趨高機器視覺】機器視覺技術原了解析及解決方案

CSMA/CD1． CSMA/CD的概述2． CSMA 的工作原理3． CSMA/CD控制規程及特點4． CSMA/CD協定5． CSMA/CD的優點6．結束語

極大似然法(ML)與最大期望法(EM)

C++ 第十五周報告1--《冒泡法排序》

筆試面試題目：滑動視窗(二)

資料結構與算法（27）——排序（二）

Dijkstra--簡易版（最短路徑）

GitHub連夜封殺！這份阿裡 10W 字内部 Java 字面試手冊到底有多強？

hdu7108哈希