天天看點

官人要杯咖啡嗎? - PostgreSQL實時監測PLAN tree的執行進度 pg_query_state

postgresql , long query , plan tree , sql進度條

當我們在資料庫中執行一些比較大的查詢,或者執行比較複雜的函數時,如果要知道執行到哪裡了,預計還要多久。

怎麼辦呢? 有方法得到嗎?當然有。

目前postgresql支援的是傳統的sql執行方法,将(nonutility)語句parser後,根據parser tree生成plan tree, 然後根據plan tree去執行裡面的每一個node。

也就是說,我們實際上是可以随時檢視plan tree,以及每個node的執行情況的(包括一些loop,例如 nest loop join).

比如

這個query包含了3個node,分别是index scan和nestloop.

得到進度條的原理如下

1. 生成執行計劃(plan tree)

2. 按部就班的執行

3. 在執行過程中,我們可以跟蹤執行的node,輸出統計資訊,列印query的執行進度。

oleg 的postgrespro公司開源的一個插件pg_query_state就是用來幹這個事情的。

可以觀察sql執行過程中動态變化的資訊,包括hit, run tim, loop, memory開銷等等。

用法舉例

編譯安裝

觀察進度例子1

再來個複雜一點的

觀察進度例子2

觀察進度例子3

來一個嵌套循環

觀察進度例子4

來一個并行查詢

the <code>pg_query_state</code> module provides facility to know the current state of query execution on working backend. to enable this extension you have to patch the latest stable version of postgresql. different branches are intended for different version numbers of postgresql, e.g., branch pg9_5 corresponds to postgresql 9.5.

each complex query statement (select/insert/update/delete) after optimization/planning stage is translated into plan tree wich is kind of imperative representation of declarative sql query. explain analyze request allows to demonstrate execution statistics gathered from each node of plan tree (full time of execution, number rows emitted to upper nodes, etc). but this statistics is collected after execution of query. this module allows to show actual statistics of query running on external backend. at that, format of resulting output is almost identical to ordinal explain analyze. thus users are able to track of query execution in progress.

in fact, this module is able to explore external backend and determine its actual state. particularly it's helpful when backend executes a heavy query or gets stuck.

using this module there can help in the following things:

detect a long query (along with other monitoring tools)

to install <code>pg_query_state</code>, please apply patches <code>custom_signal.patch</code>, <code>executor_hooks.patch</code> and <code>runtime_explain.patch</code> to the latest stable version of postgresql and rebuild postgresql.

correspondence branch names to postgresql version numbers:

pg9_5 --- postgresql 9.5

pgpro9_5 --- postgrespro 9.5

master --- development version for postgresql 10devel

then execute this in the module's directory:

add module name to the <code>shared_preload_libraries</code> parameter in <code>postgresql.conf</code>:

it is essential to restart the postgresql instance. after that, execute the following query in psql:

done!

tests using parallel sessions using python 2.7 script:

prerequisite packages:

<code>psycopg2</code> version 2.6 or later

<code>pyyaml</code> version 3.11 or later

options:

- -host --- postgres server host, default value is localhost

- -port --- postgres server port, default value is 5432

- -database --- database name, default value is postgres

- -user --- user name, default value is postgres

- -password --- user's password, default value is empty

extract current query state from backend with specified <code>pid</code>. since parallel query can spawn workers and function call causes nested subqueries so that state of execution may be viewed as stack of running queries, return value of <code>pg_query_state</code> has type <code>table (pid integer, frame_number integer, query_text text, plan text, leader_pid integer)</code>. it represents tree structure consisting of leader process and its spawned workers. each worker refers to leader through <code>leader_pid</code> column. for leader process the value of this column is<code>null</code>. for each process the stack frames are specified as correspondence between <code>frame_number</code>, <code>query_text</code> and <code>plan</code> columns.

thus, user can see the states of main query and queries generated from function calls for leader process and all workers spawned from it.

in process of execution some nodes of plan tree can take loops of full execution. therefore statistics for each node consists of two parts: average statistics for previous loops just like in explain analyze output and statistics for current loop if node have not finished.

optional arguments:

<code>verbose</code> --- use explain verbose for plan printing;

<code>costs</code> --- add costs for each node;

<code>timing</code> --- print timing data for each node, if collecting of timing statistics is turned off on called side resulting output will contain warning message <code>timing statistics disabled</code>;

<code>buffers</code> --- print buffers usage, if collecting of buffers statistics is turned off on called side resulting output will contain warning message <code>buffers statistics disabled</code>;

<code>triggers</code> --- include triggers statistics in result plan trees;

<code>format</code> --- explain format to be used for plans printing, posible values: {<code>text</code>, <code>xml</code>, <code>json</code>, <code>yaml</code>}.

if callable backend is not executing any query the function prints info message about backend's state taken from <code>pg_stat_activity</code> view if it exists there.

calling role have to be superuser or member of the role whose backend is being called. othrewise function prints error message <code>permission denied</code>.

<code>pg_query_state.enable</code> --- disable (or enable) <code>pg_query_state</code> completely, default value is <code>true</code>

<code>pg_query_state.enable_timing</code> --- collect timing data for each node, default value is <code>false</code>

<code>pg_query_state.enable_buffers</code> --- collect buffers usage, default value is <code>false</code>

this parameters is set on called side before running any queries whose states are attempted to extract. warning: if <code>pg_query_state.enable_timing</code> is turned off the calling side cannot get time statistics, similarly for <code>pg_query_state.enable_buffers</code> parameter.

assume one backend with pid = 20102 performs a simple query:

other backend can extract intermediate state of execution that query:

in example above <code>materialize</code> node has statistics on passed loops (average number of rows delivered to <code>nested loop</code> and number of passed loops are shown) and statistics on current loop. other nodes has statistics only for current loop as this loop is first (<code>loop number</code> = 1).

assume first backend executes some function:

other backend can get the follow output:

first row corresponds to function call, second - to query which is in the body of t