天天看點

關于citus.limit_clause_row_fetch_count優化參數關于citus.limit_clause_row_fetch_count優化參數

關于

citus.limit_clause_row_fetch_count

優化參數

citus.limit_clause_row_fetch_count

是citus的一個性能優化的參數。具體适應于什麼場景呢?

官方文檔說明

下面是官方文檔說明,還是不夠具體。

https://docs.citusdata.com/en/v7.3/develop/api_guc.html?highlight=limit_clause_row_fetch_count

Planner Configuration

citus.limit_clause_row_fetch_count (integer)

Sets the number of rows to fetch per task for limit clause optimization. 
In some cases, select queries with limit clauses may need to fetch all 
rows from each task to generate results. In those cases, and where an 
approximation would produce meaningful results, this configuration value
 sets the number of rows to fetch from each shard. Limit approximations 
are disabled by default and this parameter is set to -1. This value can 
be set at run-time and is effective on the coordinator.
           

測試用例

從citus的測試用例中,可以清楚的看到它的作用。

citus-7.2.1\src\test\regress\expected\multi_limit_clause_approximate.out

-- Enable limit optimization to fetch one third of each shard's data
SET citus.limit_clause_row_fetch_count TO 600;
SELECT l_partkey, sum(l_partkey * (1 + l_suppkey)) AS aggregate FROM lineitem
    GROUP BY l_partkey
    ORDER BY aggregate DESC LIMIT 10;
DEBUG:  push down of limit count: 600
 l_partkey | aggregate  
-----------+------------
    194541 | 3727794642
    160895 | 3671463005
    183486 | 3128069328
    179825 | 3093889125
    162432 | 2834113536
    153937 | 2761321906
    199283 | 2726988572
    185925 | 2672114100
    196629 | 2622637602
    157064 | 2614644408
(10 rows)
           

上面的SQL,如果不加

SET citus.limit_clause_row_fetch_count TO 600

,CN需要到worker上把所有資料都撈出來,然後再在CN上排序取TopN結果。

大資料量的情況,性能會非常糟糕。加上

SET citus.limit_clause_row_fetch_count TO 600

,就隻會到每個worker上取前600的記錄。但可能會帶來準确性的損失。

另外一個需要注意的是,上面的GROUP BY字段

l_partkey

不是分片字段,如果GROUP BY字段已經包含了分片字段,不需要這個優化,因為這種情況下可以直接把LIMIT下推下去。

另一個測試用例,形式類似

...
SET citus.limit_clause_row_fetch_count TO 150;
SET citus.large_table_shard_count TO 2;
SELECT c_custkey, c_name, count(*) as lineitem_count
    FROM customer, orders, lineitem
    WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey
    GROUP BY c_custkey, c_name
    ORDER BY lineitem_count DESC, c_custkey LIMIT 10;
DEBUG:  push down of limit count: 150
 c_custkey |       c_name       | lineitem_count 
-----------+--------------------+----------------
        43 | Customer#000000043 |             42
       370 | Customer#000000370 |             38
        79 | Customer#000000079 |             37
       689 | Customer#000000689 |             36
       472 | Customer#000000472 |             35
       685 | Customer#000000685 |             35
       643 | Customer#000000643 |             34
       226 | Customer#000000226 |             33
       496 | Customer#000000496 |             32
       304 | Customer#000000304 |             31
(10 rows)
           

小結

  • 适用場景
    • citus.limit_clause_row_fetch_count

      适用于分組聚合并取TopN結果的SQL的性能優化
  • 不适用場景
    • 要求精确結果
    • 聚合字段包含分片字段
    • count(DISTINCT)

繼續閱讀