PostgreSQL中1000分區的繼承分區表更新失敗的原因

指令行(SQL執行失敗）：

點選(此處)折疊或打開

-bash-4.1$ pgbench -n -r -c 10 -j 10 -T 2 -f update.sql db1000

Client 2 aborted in state 1. Probably the backend died while processing.

WARNING: terminating connection because of crash of another server process

DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

HINT: In a moment you should be able to reconnect to the database and repeat your command.

Client 3 aborted in state 1. Probably the backend died while processing.

Client 0 aborted in state 1. Probably the backend died while processing.

Client 9 aborted in state 1. Probably the backend died while processing.

Client 8 aborted in state 1. Probably the backend died while processing.

Client 7 aborted in state 1. Probably the backend died while processing.

Client 1 aborted in state 1. Probably the backend died while processing.

Client 4 aborted in state 1. Probably the backend died while processing.

Client 6 aborted in state 1. Probably the backend died while processing.

Client 5 aborted in state 1. Probably the backend died while processing.

transaction type: Custom query

scaling factor: 1

query mode: simple

number of clients: 10

number of threads: 10

duration: 2 s

number of transactions actually processed: 0

tps = 0.000000 (including connections establishing)

tps = 0.000000 (excluding connections establishing)

statement latencies in milliseconds:

0.271800 \setrandom id 1 10000000

0.000000 update maintb set name = 'aaaaa12345' where id=:id;

pg服務端日志(服務端程序被殺并重新開機):

LOG: server process (PID 18565) was terminated by signal 9: Killed

DETAIL: Failed process was running: update maintb set name = 'aaaaa12345' where id=150458;

LOG: terminating any other active server processes

LOG: all server processes terminated; reinitializing

LOG: database system was interrupted; last known up at 2015-01-16 14:18:26 CST

LOG: database system was not properly shut down; automatic recovery in progress

LOG: redo starts at 3/4A153AC8

LOG: record with zero length at 3/4AD01498

LOG: redo done at 3/4AD01468

LOG: last completed transaction was at log time 2015-01-16 14:35:11.990522+08

LOG: database system is ready to accept connections

LOG: autovacuum launcher started

/var/log/messages(OOM):

Jan 16 14:35:40 hanode1 kernel: Out of memory: Kill process 18565 (postgres) score 94 or sacrifice child

Jan 16 14:35:40 hanode1 kernel: Killed process 18565, UID 26, (postgres) total-vm:608864kB, anon-rss:150480kB, file-rss:3180kB

通過debug PG9.4的代碼，發現生成分區表的1個update語句的執行計劃居然耗費了717M的記憶體。

sql:

db1000=# update maintb set name='aaaaa12345' where id =3;

UPDATE 19

src/backend/tcop/postgres.c:

* OK to analyze, rewrite, and plan this query.

* Switch to appropriate context for constructing querytrees (again,

* these must outlive the execution context).

oldcontext = MemoryContextSwitchTo(MessageContext);

querytree_list = pg_analyze_and_rewrite(parsetree, query_string,

NULL, 0);

plantree_list = pg_plan_queries(querytree_list, 0, NULL);

/* Done with the snapshot used for parsing/planning */

if (snapshot_set)

PopActiveSnapshot();

在上面的pg_plan_queries()調用前後分别在gdb中執行“call MemoryContextStats(MessageContext)”，從輸出結果可以看出pg_plan_queries配置設定了717M記憶體。

MessageContext: 16384 total in 2 blocks; 9976 free (9 chunks); 6408 used

MessageContext: 721420288 total in 96 blocks; 3715776 free (13 chunks); 717704512 used

1個用戶端700多M，10個用戶端并行執行就是7個G，我的VM隻有2G記憶體，難怪被kill掉。

同樣的方法，測量了另外兩種情況的記憶體消耗。

100個分區的update

db100=# update maintb set name='aaaaa12345' where id =3;

UPDATE 8

生成執行計劃消耗的記憶體:8M

MessageContext: 24576 total in 2 blocks; 18168 free (9 chunks); 6408 used

MessageContext: 8380416 total in 10 blocks; 5056 free (3 chunks); 8375360 used

從8M到700M，不難看出：分區表update的執行計劃消耗記憶體與分區數目n是n^2的倍數關系。

進一步細化，發現關鍵點在下面兩處記憶體配置設定。

src/backend/optimizer/plan/planner.c

static Plan *

inheritance_planner(PlannerInfo *root)

{

...

foreach(lc, root->append_rel_list)//循環1001次

{

AppendRelInfo *appinfo = (AppendRelInfo *) lfirst(lc);

PlannerInfo subroot;

Plan *subplan;

Index rti;

/* append_rel_list contains all append rels; ignore others */

if (appinfo->parent_relid != parentRTindex)

continue;

* We need a working copy of the PlannerInfo so that we can control

* propagation of information back to the main copy.

memcpy(&subroot, root, sizeof(PlannerInfo));

* Generate modified query with this rel as target. We first apply

* adjust_appendrel_attrs, which copies the Query and changes

* references to the parent RTE to refer to the current child RTE,

* then fool around with subquery RTEs.

subroot.parse = (Query *)

adjust_appendrel_attrs(root,

(Node *) parse,

appinfo);//大約配置設定了300K

* The rowMarks list might contain references to subquery RTEs, so

* make a copy that we can apply ChangeVarNodes to. (Fortunately, the

* executor doesn't need to see the modified copies --- we can just

* pass it the original rowMarks list.)

subroot.rowMarks = (List *) copyObject(root->rowMarks);

* The append_rel_list likewise might contain references to subquery

* RTEs (if any subqueries were flattenable UNION ALLs). So prepare

* to apply ChangeVarNodes to that, too.

subroot.append_rel_list = (List *) copyObject(root->append_rel_list);//大約配置設定了400K

* Add placeholders to the child Query's rangetable list to fill the

* RT indexes already reserved for subqueries in previous children.

* These won't be referenced, so there's no need to make them very

* valid-looking.

while (list_length(subroot.parse->rtable) list_length(final_rtable))

subroot.parse->rtable = lappend(subroot.parse->rtable,

makeNode(RangeTblEntry));

}

select和update走的是不同的分支，沒有走上面的inheritance_planner()函數，記憶體消耗隻有7M。

1000個分區的select

db1000=# select * from maintb where id =3;

id | name

----+------------

3 | aaaaa12345

(19 rows)

生成執行計劃消耗的記憶體:7M

MessageContext: 8192 total in 1 blocks; 1864 free (1 chunks); 6328 used

MessageContext: 8388608 total in 11 blocks; 1176776 free (6 chunks); 7211832 used

update的調用棧：

(gdb) bt

#0 inheritance_planner (root=0x1fd6548) at planner.c:787

#1 0x00000000005fe7c0 in subquery_planner (glob=0x1fd64b8, parse=0x1fd59e0, parent_root=value optimized out>,

hasRecursion=value optimized out>, tuple_fraction=0, subroot=0x7fffbf4c70b8) at planner.c:569

#2 0x00000000005fe935 in standard_planner (parse=0x1fd59e0, cursorOptions=0, boundParams=0x0) at planner.c:210

#3 0x00000000006784fa in pg_plan_query (querytree=value optimized out>, cursorOptions=value optimized out>,

boundParams=value optimized out>) at postgres.c:750

#4 0x00000000006785e4 in pg_plan_queries (querytrees=value optimized out>, cursorOptions=0, boundParams=0x0)

at postgres.c:809

#5 0x0000000000678bcc in exec_simple_query (

query_string=0x1fd4b30 "update maintb set name='aaaaa12345' where id =3;") at postgres.c:974

#6 0x000000000067a179 in PostgresMain (argc=value optimized out>, argv=value optimized out>,

dbname=0x1f70fb0 "db1000", username=value optimized out>) at postgres.c:4016

#7 0x00000000006283cb in BackendRun (argc=value optimized out>, argv=value optimized out>) at postmaster.c:4123

#8 BackendStartup (argc=value optimized out>, argv=value optimized out>) at postmaster.c:3797

#9 ServerLoop (argc=value optimized out>, argv=value optimized out>) at postmaster.c:1576

#10 PostmasterMain (argc=value optimized out>, argv=value optimized out>) at postmaster.c:1223

#11 0x00000000005c0cd8 in main (argc=3, argv=0x1f6fa10) at main.c:227

select的調用棧:

#0 grouping_planner (root=0x1fd6568, tuple_fraction=0) at planner.c:1080

#1 0x00000000005fe5b4 in subquery_planner (glob=0x1fd64d8, parse=0x1fd5b20, parent_root=value optimized out>,

hasRecursion=value optimized out>, tuple_fraction=0, subroot=0x7fffbf4c70b8) at planner.c:572

#2 0x00000000005fe935 in standard_planner (parse=0x1fd5b20, cursorOptions=0, boundParams=0x0) at planner.c:210

#5 0x0000000000678bcc in exec_simple_query (query_string=0x1fd4b30 "select * from maintb where id =3;")

at postgres.c:974

按目前PG對分區表的處理邏輯，确實處理不了很多分區的情況。（但是update的那2處記憶體配置設定應該是可以優化的）。

http://blog.chinaunix.net/xmlrpc.php?r=blog/article&uid=20726500&id=4732253

PostgreSQL中1000分區的繼承分區表更新失敗的原因

繼續閱讀

set define off關閉替代變量功能

報錯：'mysql' 不是内部或外部指令，也不是可運作的程式或批處理檔案。

Linxu常用指令技巧彙總

ERROR 1 (HY000): Can't create/write to file '/tmp/#sql_4188_1.MYI' (Errcode: 28)

艱難安裝LDAP,SSL認證

《Linux指令行與Shell腳本程式設計大全第2版.布盧姆》pdf

MySQL的4種隔離級别？出現問題

XX系統實施過程問題總結

無元件上傳圖檔到資料庫中，最完整解決方案

【MySQL資料庫】資料庫索引事務1.索引2.事務

neo4j之cypher使用文檔

NOSQL安全攻擊

mybatis_入門程式Mybatis入門

登入plsql 報錯 the account is locked --使用者被鎖

sqlServer根據經緯查距離

SequoiaDB巨杉資料庫C++驅動概述