天天看點

PostgreSQL sharding : citus 系列5 - worker節點網絡優化

标簽

PostgreSQL , citus , pgbouncer , 網絡優化 , worker節點

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E8%83%8C%E6%99%AF 背景

citus 節點間的網絡需求:

1、cn節點通路所有worker節點。oltp業務的通路較頻繁。

2、重分布資料時,worker節點間互相通路。通路頻度不大,OLAP業務常見,一旦有可能資料交換吞吐較大。

citus的cn節點連worker節點為有兩種模式,

一種為事務級保持連接配接模式(每條SQL發起時建立連接配接,SQL結束後釋放連接配接(除非在事務中,否則SQL結束立即釋放連接配接)。),

另一種為會話級保持連接配接模式(會話發起時建立連接配接,會話結束後釋放連接配接。)。

1、跑OLAP類的SQL時,使用的是第一種即時連接配接模式(OLAP場景并發不高,建立連接配接帶來的額外開銷不明顯)

可以在worker節點打開參數進行跟蹤

postgres=# show log_connections ;  
 log_connections   
-----------------  
 on  
(1 row)  
  
postgres=# show log_disconnections ;  
 log_disconnections   
--------------------  
 on  
(1 row)  
           

例子,

以下兩條SQL均為即時短連接配接模式(

Custom Scan (Citus Task-Tracker) Custom Scan (Citus Real-Time)

)。

postgres=# set citus.task_executor_type =task;  
ERROR:  invalid value for parameter "citus.task_executor_type": "task"  
HINT:  Available values: real-time, task-tracker.  
  
  
postgres=# set citus.task_executor_type ='task-tracker';  
SET  
postgres=# explain select count(*) from pgbench_accounts ;  
                                                     QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=0.00..0.00 rows=0 width=0)  
   ->  Custom Scan (Citus Task-Tracker)  (cost=0.00..0.00 rows=0 width=0)  
         Task Count: 128  
         Tasks Shown: One of 128  
         ->  Task  
               Node: host=172.24.211.224 port=1921 dbname=postgres  
               ->  Aggregate  (cost=231.85..231.86 rows=1 width=8)  
                     ->  Seq Scan on pgbench_accounts_106812 pgbench_accounts  (cost=0.00..212.48 rows=7748 width=0)  
(8 rows)  
  
  
postgres=# set citus.task_executor_type ='real-time';  
  
postgres=# explain select count(*) from pgbench_accounts ;  
                                                     QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------  
 Aggregate  (cost=0.00..0.00 rows=0 width=0)  
   ->  Custom Scan (Citus Real-Time)  (cost=0.00..0.00 rows=0 width=0)  
         Task Count: 128  
         Tasks Shown: One of 128  
         ->  Task  
               Node: host=172.24.211.224 port=1921 dbname=postgres  
               ->  Aggregate  (cost=231.85..231.86 rows=1 width=8)  
                     ->  Seq Scan on pgbench_accounts_106812 pgbench_accounts  (cost=0.00..212.48 rows=7748 width=0)  
(8 rows)  
           

2、跑OLTP查詢時(通常并發很高,前端有連接配接池(保持會話)),為會話級保持連接配接模式(

Custom Scan (Citus Router)

)。

以下SQL為長連接配接模式(不會立即釋放,而是等會再釋放,以降低高并發時連接配接帶來的開銷)

postgres=# explain select * from pgbench_accounts where aid=5;  
                                                                QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------  
 Custom Scan (Citus Router)  (cost=0.00..0.00 rows=0 width=0)  
   Task Count: 1  
   Tasks Shown: All  
   ->  Task  
         Node: host=172.24.211.224 port=1921 dbname=postgres  
         ->  Index Scan using pgbench_accounts_pkey_106836 on pgbench_accounts_106836 pgbench_accounts  (cost=0.28..2.50 rows=1 width=97)  
               Index Cond: (aid = 5)  
(7 rows)  
           

看以上兩種場景,CITUS應該說設計得已經很不錯了。既能滿足TP也能滿足AP。

但是前面也說了,連接配接保持是在事務或會話層面的,如果查詢量大,或者使用者使用了短連接配接,建立連接配接的開銷就會很凸顯。

newdb=> \c postgres postgres  
You are now connected to database "postgres" as user "postgres".  
  
postgres=# select * from pgbench_accounts where aid=1;  
 aid | bid | abalance |                                        filler                                          
-----+-----+----------+--------------------------------------------------------------------------------------  
   1 |   1 |     7214 |                                                                                       
(1 row)  
  
Time: 11.264 ms  -- 包括建立連接配接的開銷  
  
postgres=# select * from pgbench_accounts where aid=1;  
 aid | bid | abalance |                                        filler                                          
-----+-----+----------+--------------------------------------------------------------------------------------  
   1 |   1 |     7214 |                                                                                       
(1 row)  
  
Time: 0.905 ms  -- 已建立連接配接  
           

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E4%BD%BF%E7%94%A8pgbouncer%E8%A7%A3%E5%86%B3%E5%BB%BA%E7%AB%8B%E8%BF%9E%E6%8E%A5%E7%9A%84%E5%BC%80%E9%94%80 使用pgbouncer,解決建立連接配接的開銷

PostgreSQL sharding : citus 系列5 - worker節點網絡優化

在worker節點上,部署pgbouncer,所有與worker節點建立的連接配接都通過pgbouncer連接配接池,以此來保持住連接配接,降低worker節點頻繁建立連接配接的開銷。

部署方法

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#1%E6%89%80%E6%9C%89worker%E8%8A%82%E7%82%B9 1、所有worker節點

pgbouncer

yum install -y pgbouncer  
           

配置

vi /etc/pgbouncer/pgb.ini  
  
[databases]  
newdb = host=/tmp dbname=newdb port=1921 user=digoal pool_size=128 reserve_pool=10  
[pgbouncer]  
logfile = /var/log/pgbouncer/pgbouncer.log  
pidfile = /var/run/pgbouncer/pgbouncer.pid  
listen_addr = 0.0.0.0  
listen_port = 8001  
auth_type = any  
auth_file = /etc/pgbouncer/userlist.txt  
pool_mode = session  
server_reset_query = DISCARD ALL  
max_client_conn = 5000  
default_pool_size = 128  
; 最大不要超過4倍CPU數  
           

啟動

pgbouncer -d -u pgbouncer /etc/pgbouncer/pgb.ini  
           

在一個citus叢集中,可以同時存在直連worker或者通過pgbouncer連接配接worker。

不同的database可以有不同的配置。

以下例子,建立一個database,使用pgbouncer連接配接worker.

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#2%E6%89%80%E6%9C%89%E8%8A%82%E7%82%B9%E5%8C%85%E6%8B%ACcn-worker 2、所有節點(包括cn, worker)

建立資料庫,插件

su - postgres  
psql -c "create role digoal login;"  
psql -c "create database newdb;"  
psql -c "grant all on database newdb to digoal;"  
psql -U postgres newdb -c "create extension citus;"  
           

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#cn%E8%8A%82%E7%82%B9 cn節點

将worker添加到叢集配置,使用pgbouncer的連接配接端口

su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.224', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.230', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.231', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.225', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.227', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.232', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.226', 8001);\""      
su - postgres -c "psql -U postgres newdb -c \"SELECT * from master_add_node('xxx.xxx.xxx.229', 8001);\""     
           

MX配置,同樣,将worker節點添加到中繼資料同步中。

psql newdb postgres  
           
select * from master_add_node('xxx.xxx.xxx.224',8001);    
select * from master_add_node('xxx.xxx.xxx.230',8001);    
           

開啟同步到中繼資料。

select start_metadata_sync_to_node('xxx.xxx.xxx.224',8001);    
select start_metadata_sync_to_node('xxx.xxx.xxx.230',8001);    
           

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E6%B5%8B%E8%AF%95 測試

1、tpc-b 長連接配接測試

pgbench -i -s -U digoal newdb  
           
psql -U digoal newdb  
  
select create_distributed_table('pgbench_accounts','aid');  
select create_distributed_table('pgbench_branches','bid');    
select create_distributed_table('pgbench_tellers','tid');    
select create_distributed_table('pgbench_history','aid');  
           
pgbench -M prepared -v -r -P 1 -c 64 -j 64 -T 120 -U digoal newdb -S  
           

性能與不使用pgbouncer差不多,因為使用了長連接配接測試簡單SQL(本身citus就使用了會話級連接配接保持,沒有短連接配接問題)。

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#citus%E7%AC%AC%E4%B8%80%E6%AC%A1query%E7%9A%84%E9%9A%90%E8%97%8F%E8%80%97%E6%97%B6 citus第一次QUERY的隐藏耗時

在一個建立的會話中,第一次查詢總是需要需要耗費更多的時間(如果沒有建立連接配接,那麼包括建立連接配接的時間。即使已建立連接配接,也需要耗費額外的一些時間)(具體原因可以跟蹤分析一下code)。

以下使用的是pgbouncer連接配接worker,是以第一次QUERY不包含建立連接配接的時間。

newdb=> \q  
postgres@digoal-citus-gpdb-test001-> psql newdb digoal  
psql (10.5)  
Type "help" for help.  
  
\timing  
  
newdb=> select * from pgbench_accounts where aid=5;  
 aid | bid | abalance |                                        filler                                          
-----+-----+----------+--------------------------------------------------------------------------------------  
   5 |   1 |        0 |                                                                                       
(1 row)  
  
Time: 6.016 ms  -- 不包括建立連接配接(已使用pgbouncer建立),但是也多了幾毫秒  
-- 但是相比未使用pgbouncer,已經降低了5毫秒左右的延遲。
  
newdb=> select * from pgbench_accounts where aid=5;  
 aid | bid | abalance |                                        filler                                          
-----+-----+----------+--------------------------------------------------------------------------------------  
   5 |   1 |        0 |                                                                                       
(1 row)  
  
Time: 0.989 ms  
           

多出的幾毫秒,我們社群的小夥伴鄧彪、王健給出了原因如下,很多地方需要檢查安裝的citus版本與citus.control控制檔案的版本是否相容(比如加載分布式TABLE的RELCACHE時,第一次通路就是這個問題),不相容報錯:

詳見代碼

https://github.com/citusdata/citus/blob/3fa04d8f2c8f27b1377fe4c1468ee47358117e3c/src/backend/distributed/utils/metadata_cache.c
/*
 * CheckAvailableVersion compares CITUS_EXTENSIONVERSION and the currently
 * available version from the citus.control file. If they are not compatible,
 * this function logs an error with the specified elevel and returns false,
 * otherwise it returns true.
 */
bool
CheckAvailableVersion(int elevel)
{
	char *availableVersion = NULL;

	if (!EnableVersionChecks)
	{
		return true;
	}

	availableVersion = AvailableExtensionVersion();

	if (!MajorVersionsCompatible(availableVersion, CITUS_EXTENSIONVERSION))
	{
		ereport(elevel, (errmsg("loaded Citus library version differs from latest "
								"available extension version"),
						 errdetail("Loaded library requires %s, but the latest control "
								   "file specifies %s.", CITUS_MAJORVERSION,
								   availableVersion),
						 errhint("Restart the database to load the latest Citus "
								 "library.")));
		return false;
	}

	return true;
}
 

/*
 * AvailableExtensionVersion returns the Citus version from citus.control file. It also
 * saves the result, thus consecutive calls to CitusExtensionAvailableVersion will
 * not read the citus.control file again.
 */
static char *
AvailableExtensionVersion(void)
{
	ReturnSetInfo *extensionsResultSet = NULL;
	TupleTableSlot *tupleTableSlot = NULL;
	FunctionCallInfoData *fcinfo = NULL;
	FmgrInfo *flinfo = NULL;
	int argumentCount = 0;
	EState *estate = NULL;

	bool hasTuple = false;
	bool goForward = true;
	bool doCopy = false;
	char *availableExtensionVersion;

	InitializeCaches();

	estate = CreateExecutorState();
	extensionsResultSet = makeNode(ReturnSetInfo);
	extensionsResultSet->econtext = GetPerTupleExprContext(estate);
	extensionsResultSet->allowedModes = SFRM_Materialize;

	fcinfo = palloc0(sizeof(FunctionCallInfoData));
	flinfo = palloc0(sizeof(FmgrInfo));

	fmgr_info(F_PG_AVAILABLE_EXTENSIONS, flinfo);
	InitFunctionCallInfoData(*fcinfo, flinfo, argumentCount, InvalidOid, NULL,
							 (Node *) extensionsResultSet);

	/* pg_available_extensions returns result set containing all available extensions */
	(*pg_available_extensions)(fcinfo);

	tupleTableSlot = MakeSingleTupleTableSlot(extensionsResultSet->setDesc);
	hasTuple = tuplestore_gettupleslot(extensionsResultSet->setResult, goForward, doCopy,
									   tupleTableSlot);
	while (hasTuple)
	{
		Datum extensionNameDatum = 0;
		char *extensionName = NULL;
		bool isNull = false;

		extensionNameDatum = slot_getattr(tupleTableSlot, 1, &isNull);
		extensionName = NameStr(*DatumGetName(extensionNameDatum));
		if (strcmp(extensionName, "citus") == 0)
		{
			MemoryContext oldMemoryContext = NULL;
			Datum availableVersion = slot_getattr(tupleTableSlot, 2, &isNull);

			/* we will cache the result of citus version to prevent catalog access */
			oldMemoryContext = MemoryContextSwitchTo(CacheMemoryContext);

			availableExtensionVersion = text_to_cstring(DatumGetTextPP(availableVersion));

			MemoryContextSwitchTo(oldMemoryContext);

			ExecClearTuple(tupleTableSlot);
			ExecDropSingleTupleTableSlot(tupleTableSlot);

			return availableExtensionVersion;
		}

		ExecClearTuple(tupleTableSlot);
		hasTuple = tuplestore_gettupleslot(extensionsResultSet->setResult, goForward,
										   doCopy, tupleTableSlot);
	}

	ExecDropSingleTupleTableSlot(tupleTableSlot);

	ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
					errmsg("citus extension is not found")));

	return NULL;
}
           

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#pgbouncer%E8%BF%9E%E6%8E%A5%E6%B1%A0%E8%BF%9E%E6%8E%A5worker%E7%9A%84%E5%A5%BD%E5%A4%84 pgbouncer連接配接池連接配接worker的好處

1、對于業務層短連接配接會有比較好的效果。可以降低至少5毫秒左右的延遲。

2、對于大量複雜查詢(需要motion的查詢),可以減少節點間的連接配接數。

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E5%90%8C%E6%97%B6%E4%B9%9F%E9%9C%80%E8%A6%81%E6%B3%A8%E6%84%8Fpgbouncer%E7%93%B6%E9%A2%88%E5%B8%A6%E5%AE%BDqps%E5%8D%95%E6%A0%B8%E5%8D%95%E4%B8%AApgbouncer%E8%BF%9B%E7%A8%8B%E5%B3%B0%E5%80%BCqps%E7%BA%A65%E4%B8%87 同時也需要注意pgbouncer瓶頸:帶寬,QPS(單核,單個pgbouncer程序峰值QPS約5萬)

https://github.com/digoal/blog/blob/master/201809/20180905_02.md#%E5%8F%82%E8%80%83 參考

man 1 pgbouncer

man 5 pgbouncer

《PostgreSQL 連接配接池 pgbouncer 使用》