關于 citus.limit_clause_row_fetch_count
優化參數
citus.limit_clause_row_fetch_count
citus.limit_clause_row_fetch_count
是citus的一個性能優化的參數。具體适應于什麼場景呢?
官方文檔說明
下面是官方文檔說明,還是不夠具體。
https://docs.citusdata.com/en/v7.3/develop/api_guc.html?highlight=limit_clause_row_fetch_count
Planner Configuration
citus.limit_clause_row_fetch_count (integer)
Sets the number of rows to fetch per task for limit clause optimization.
In some cases, select queries with limit clauses may need to fetch all
rows from each task to generate results. In those cases, and where an
approximation would produce meaningful results, this configuration value
sets the number of rows to fetch from each shard. Limit approximations
are disabled by default and this parameter is set to -1. This value can
be set at run-time and is effective on the coordinator.
測試用例
從citus的測試用例中,可以清楚的看到它的作用。
citus-7.2.1\src\test\regress\expected\multi_limit_clause_approximate.out
-- Enable limit optimization to fetch one third of each shard's data
SET citus.limit_clause_row_fetch_count TO 600;
SELECT l_partkey, sum(l_partkey * (1 + l_suppkey)) AS aggregate FROM lineitem
GROUP BY l_partkey
ORDER BY aggregate DESC LIMIT 10;
DEBUG: push down of limit count: 600
l_partkey | aggregate
-----------+------------
194541 | 3727794642
160895 | 3671463005
183486 | 3128069328
179825 | 3093889125
162432 | 2834113536
153937 | 2761321906
199283 | 2726988572
185925 | 2672114100
196629 | 2622637602
157064 | 2614644408
(10 rows)
上面的SQL,如果不加
SET citus.limit_clause_row_fetch_count TO 600
,CN需要到worker上把所有資料都撈出來,然後再在CN上排序取TopN結果。
大資料量的情況,性能會非常糟糕。加上
SET citus.limit_clause_row_fetch_count TO 600
,就隻會到每個worker上取前600的記錄。但可能會帶來準确性的損失。
另外一個需要注意的是,上面的GROUP BY字段
l_partkey
不是分片字段,如果GROUP BY字段已經包含了分片字段,不需要這個優化,因為這種情況下可以直接把LIMIT下推下去。
另一個測試用例,形式類似
...
SET citus.limit_clause_row_fetch_count TO 150;
SET citus.large_table_shard_count TO 2;
SELECT c_custkey, c_name, count(*) as lineitem_count
FROM customer, orders, lineitem
WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey
GROUP BY c_custkey, c_name
ORDER BY lineitem_count DESC, c_custkey LIMIT 10;
DEBUG: push down of limit count: 150
c_custkey | c_name | lineitem_count
-----------+--------------------+----------------
43 | Customer#000000043 | 42
370 | Customer#000000370 | 38
79 | Customer#000000079 | 37
689 | Customer#000000689 | 36
472 | Customer#000000472 | 35
685 | Customer#000000685 | 35
643 | Customer#000000643 | 34
226 | Customer#000000226 | 33
496 | Customer#000000496 | 32
304 | Customer#000000304 | 31
(10 rows)
小結
- 适用場景
-
适用于分組聚合并取TopN結果的SQL的性能優化citus.limit_clause_row_fetch_count
-
- 不适用場景
- 要求精确結果
- 聚合字段包含分片字段
- count(DISTINCT)