PostgreSQL 9.6 内核优化 - sort性能增强(batch化quicksort代替replacement selection when work_mem small)

digoal

2016-10-08

postgresql , 9.6 , 内核优化 , sort , replacement selection , quciksort

排序是比较常见的业务需求，为了降低排序的cpu开销，通常会使用索引来满足排序的需求。

但是并不是所有的query都能使用索引排序，或者说使用索引排序就一定高效。

例如带过滤条件的query，过滤完之后再根据某些字段或表达式排序。这种query的排序不一定能用上索引。

当需要实时排序时，postgresql数据库怎么处理的呢？

postgresql根据排序的数据量, work_mem的大小选择合适的排序算法。

排序算法参考

<a href="https://en.wikipedia.org/wiki/sorting_algorithm">https://en.wikipedia.org/wiki/sorting_algorithm</a>

包括每种排序算法的最小开销，最大开销，平均开销，内存需求，稳定性等。

1. 当需要排序的数据量较小，可以在work_mem参数设置的内存值内完成时，会使用qsort排序，这个9.6并没有改进，与以前一样。

quicksort算法参考

<a href="https://en.wikipedia.org/wiki/quicksort">https://en.wikipedia.org/wiki/quicksort</a>

PostgreSQL 9.6 内核优化 - sort性能增强(batch化quicksort代替replacement selection when work_mem small)

2. 当需要排序的数据量较大，无法在work_mem参数设置的内存值内完成时，会使用临时文件以及标准的external sort算法。

2.1 postgresql 9.6以前的版本external sort使用heapsort算法(selection排序的变种)。

参考1 knuth volume 3, algorithm 5.2.3h

<a href="https://books.google.com.hk/books?id=cyulbaaaqbaj&pg=pa757&lpg=pa757&dq=knuth++5.2.3h&source=bl&ots=kjcxoiqs5g&sig=s0gebvmp_bb0uqrgxshzfrdbyju&hl=zh-cn&sa=x&ved=0ahukewiy8tsj0czpahxe3ymkhe3ydr0q6aeiojae#v=onepage&q&f=false">https://books.google.com.hk/books?id=cyulbaaaqbaj&pg=pa757&lpg=pa757&dq=knuth++5.2.3h&source=bl&ots=kjcxoiqs5g&sig=s0gebvmp_bb0uqrgxshzfrdbyju&hl=zh-cn&sa=x&ved=0ahukewiy8tsj0czpahxe3ymkhe3ydr0q6aeiojae#v=onepage&q&f=false</a>

参考2 heapsort

<a href="https://en.wikipedia.org/wiki/sorting_algorithm#heapsort">https://en.wikipedia.org/wiki/sorting_algorithm#heapsort</a>

heapsort原理如图

selection sort原理如图，对排序顺序与实际顺序一致的数据效果很不错，但是对离散数据效果不好。

2.2 postgresql 9.6 的改进，只对头部扫描的n条记录使用selection排序，提升随机数据的排序效率，通过参数work_mem与replacement_sort_tuples限制n。随后的数据使用分批的quicksort和merge sort得到结果。

决定使用replacement selection

replacement_sort_tuples建议设置的值与cpu cache size有关，cpu cache size越大，可以调大这个值。否则建议不要修改它。

3. 分批使用knuth's algorithm 5.4.2d, 在postgresql的logtape.c中实现。

4. merge sort, 参考

<a href="https://en.wikipedia.org/wiki/polyphase_merge_sort">https://en.wikipedia.org/wiki/polyphase_merge_sort</a>

5. tuplesort说明。

src/backend/utils/sort/tuplesort.c

当work_mem不足完成内存排序时，会选择external sort或external merge。

根据数据量的大小，分阶段排序，合并。

可以通过trace_sort观察，后面会有例子。

explain 时可能看到的一些信息

前面已经描述过了，以9.5作为比较，9.6不同的地方，如下图

9.5每个小的batch(tape)都是使用replacement selection的排序算法。

而9.6则是通过replacement_sort_tuples配置，初次扫描的replacement_sort_tuples条记录使用replacement selection算法（以装下一个work_mem单位为止，所以可能实际的replacement selection记录数小于replacement_sort_tuples的设置），超出的tuples使用quicksort。

使用trace_sort也可以跟踪到两个版本的排序算法差异。

replacement selection 是替代排序，对于乱序的数据，排序的效率很差，因为得不停的替换，而对于物理顺序与逻辑顺序一致(是顺序一致，倒序一致不算)的效果则很不错。

乱序数据quick sort效率比replacement selection效率高，适用范围广泛。

9.6的排序改进说明如下

<a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0711803775a37e0bf39d7efdd1e34d9d7e640ea1">https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0711803775a37e0bf39d7efdd1e34d9d7e640ea1</a>

1. 1000万乱序数据排序

1.1 9.5

1000万随机数据9.5排序约耗时9562-995=8567毫秒

1.2 9.6

1000万随机数据9.6排序约耗时7944-899=7045毫秒

如果不想让9.6使用replacement selection，把replacement_sort_tuples设置为0即可。

如果要少点batch，可以设置更大的work_mem，但是不要超过表的大小，否则会使用inmemory排序，那就看不到external sort或external merge的效果了。

9.5

9.6

2. 1000万顺序数据排序

2.1 9.5

1000万顺序数据9.5排序约耗时5717-901=4816毫秒

2.2 9.6

1000万顺序数据9.6排序约耗时5903-924=4979毫秒

通常不建议修改replacement_sort_tuples的值，与cpu cache匹配。

<a href="https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#runtime-config-resource-memory">https://www.postgresql.org/docs/9.6/static/runtime-config-resource.html#runtime-config-resource-memory</a>

<a href="https://en.wikipedia.org/wiki/external_sorting">https://en.wikipedia.org/wiki/external_sorting</a>

<a href="https://en.wikipedia.org/wiki/selection_sort">https://en.wikipedia.org/wiki/selection_sort</a>

<a href="http://info.flagcounter.com/h9v1">count</a>

PostgreSQL 9.6 内核优化 - sort性能增强(batch化quicksort代替replacement selection when work_mem small)

继续阅读

Codeforces 1417 D. Make Them Equal(思维+构造)

查找算法之二分查找查找算法之二分查找

查找算法学习之二分查找（Python版本）——BinarySearch

CQ V1.0分词bates(基于双数组tire树)—应该是目前最快的中文分词算法

Command Network(POJ 3164)---定根最小树形图模板题题目描述输入格式输出格式输入样例输出样例分析源程序

开源低带宽语音编解码器

241 Different Ways to Add Parentheses（C代码版）

【趋高机器视觉】机器视觉技术原理解析及解决方案

CSMA/CD1． CSMA/CD的概述2． CSMA 的工作原理3． CSMA/CD控制规程及特点4． CSMA/CD协议5． CSMA/CD的优点6．结束语

极大似然法(ML)与最大期望法(EM)

C++ 第十五周报告1--《冒泡法排序》

笔试面试题目：滑动窗口(二)

数据结构与算法（27）——排序（二）

Dijkstra--简易版（最短路径）

GitHub连夜封杀！这份阿里 10W 字内部 Java 字面试手册到底有多强？

hdu7108哈希