天天看點

開發指南—DQL語句—Grouping Sets、Rollup和Cube擴充

注意事項

  • 本文介紹的所有GROUP BY相關的擴充文法,均不支援查詢下推至

    LogicalView

    算子中執行。關于查詢下推,請參見 查詢改寫與下推
  • 本文示例中所用測試資料資訊如下:使用如下語句建立一張

    requests

    表:
CREATE TABLE requests (
  `id` int(10) UNSIGNED NOT NULL,
  `os` varchar(20) DEFAULT NULL,
  `device` varchar(20) DEFAULT NULL,
  `city` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE = InnoDB DEFAULT CHARSET = utf8 dbpartition BY hash(`id`) tbpartition BY hash(`id`);      
  • requests

    表中使用如下語句插入測試所需的資料:
INSERT INTO requests (id, os, device, city) VALUES
(1, 'windows', 'PC', 'Beijing'),
(2, 'windows', 'PC', 'Shijiazhuang'),
(3, 'linux', 'Phone', 'Beijing'),
(4, 'windows', 'PC', 'Beijing'),
(5, 'ios', 'Phone', 'Shijiazhuang'),
(6, 'linux', 'PC', 'Beijing'),
(7, 'windows', 'Phone', 'Shijiazhuang');      

GROUPING SETS擴充

  • 功能介紹GROUPING SETS是GROUP BY子句的擴充,可以生成一個結果集,該結果集實際上是基于不同分組的多個結果集的串聯(與UNION ALL運算結果類似),但UNION ALL運算和GROUPING SETS擴充并不會消除合并結果集中的重複行。
  • 文法
GROUPING SETS (
  { expr_1 | ( expr_1a [, expr_1b ] ...) |
    ROLLUP ( expr_list ) | CUBE ( expr_list )
  } [, ...] )      
  • 說明GROUPING SETS擴充可包含一個或多個由半形逗號(,)分隔表達式(如

    expr_1

    (expr_1a [, expr_1b ] ...)

    )的任意組合,以及帶半角圓括号(())的表達式清單(如

    ( expr_list )

    ),其中:
    • 每個表達式都可用于确定結果集的分組方式。
    • GROUPING SETS内也支援嵌套使用ROLLUP或者CUBE。
  • 示例
    • 通過GROUPING SETS擴充對資料進行分組查詢,文法如下:
select os,device, city ,count(*)
from requests
group by grouping sets((os, device), (city), ());
上述語句等效于如下語句:
select os, device, NULL, count(*)
from requests group by os, device
union all
select NULL, NULL, NULL, count(*)
from requests
union all
select null, null, city, count(*)
from requests group by city;      
    • 傳回結果如下:
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| NULL    | NULL   | Shijiazhuang |        3 |
| NULL    | NULL   | Beijing      |        4 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      
    • 說明 未在分組集中使用的表達式,會用NULL充當占位符,便于對這些未在分組集使用的結果集進行操作,例如結果

      city

      列中顯示為NULL的行。
    • 通過在GROUPING SETS中嵌套ROLLUP來對資料進行分組,文法如下:
select os,device, city ,count(*) from requests 
group by grouping sets((city), ROLLUP(os, device));
上述語句等效于如下語句:
select os,device, city ,count(*) from requests 
group by grouping sets((city), (os), (os, device), ());      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| NULL    | NULL   | Shijiazhuang |        3 |
| NULL    | NULL   | Beijing      |        4 |
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| linux   | NULL   | NULL         |        2 |
| ios     | NULL   | NULL         |        1 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      
    • 通過在GROUPING SETS中嵌套CUBE擴充來對資料進行分組,文法如下:
select os,device, city ,count(*) from requests 
group by grouping sets((city), CUBE(os, device));
上述語句等效于如下語句:
select os,device, city ,count(*) from requests 
group by grouping sets((city), (os), (os, device), (), (device));      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| NULL    | NULL   | Beijing      |        4 |
| NULL    | NULL   | Shijiazhuang |        3 |
| windows | PC     | NULL         |        3 |
| ios     | Phone  | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| linux   | PC     | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| ios     | NULL   | NULL         |        1 |
| linux   | NULL   | NULL         |        2 |
| NULL    | PC     | NULL         |        4 |
| NULL    | Phone  | NULL         |        3 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      
    • 通過GROUP BY、CUBE和GROUPING SETS組合産生GROUPING SETS,示例如下:
select os,device, city, count(*)
from requests 
group by os, cube(os,device), grouping sets(city);
上述語句等效于如下語句:
select os,device, city, count(*)
from requests 
group by grouping sets((os,device,city),(os,city),(os,device,city));      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| windows | PC     | Beijing      |        2 |
| ios     | Phone  | Shijiazhuang |        1 |
| linux   | NULL   | Beijing      |        2 |
| windows | NULL   | Shijiazhuang |        2 |
| windows | NULL   | Beijing      |        2 |
| ios     | NULL   | Shijiazhuang |        1 |
+---------+--------+--------------+----------+      

ROLLUP擴充

  • 功能介紹ROLLUP擴充生成一系列有總計的分層組,每個分層組都有小計。該層次結構的順序由ROLLUP表達式清單中給定的表達式的順序确定。該層次結構的頂部是清單中最左側的項。每個連續項都會沿右側在該層次結構中向下移動,最右側的項是最低級别。
ROLLUP ( { expr_1 | ( expr_1a [, expr_1b ] ...) }
  [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)      
  • 說明
    • 每個表達式都會用于确定結果集的分組方式。如果采用帶圓括号形式的表達式,例如

      ( expr_1a, expr_1b, ...)

      ,則

      expr_1a

      expr_1b

      傳回的值組合定義層次結構的單個分組級别。
    • 對于清單中的第一項,例如

      expr_1

      ( expr_1a, expr_1b, ...)

      的組合,PolarDB-X将為每個唯一值傳回一個小計。對于清單中的第二項,例如

      expr_2

      ( expr_2a, expr_2b, ...)

      的組合,PolarDB-X将為第二項的每個分組中的每個唯一值傳回一個小計,依此類推。最後,PolarDB-X将為整個結果集傳回一個總計。
    • 對于小計行,将為小計包含的各項傳回NULL。
    • 通過ROLLUP對

      (os, device, city)

      按層級聚合的方式産生GROUPING SETS,文法如下:
select os,device, city, count(*)
from requests 
group by rollup (os, device, city);
上述語句等效于如下語句:
select os,device, city, count(*)
from requests 
group by os, device, city with rollup;
也等效于如下語句:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| windows | PC     | Beijing      |        2 |
| ios     | Phone  | Shijiazhuang |        1 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | NULL         |        3 |
| ios     | Phone  | NULL         |        1 |
| linux   | PC     | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| ios     | NULL   | NULL         |        1 |
| linux   | NULL   | NULL         |        2 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      
    • os, (os,device), city

select os,device, city, count(*)
from requests 
group by rollup (os, (os,device), city);
上述語句等效于如下語句:
select os,device, city, count(*)
from requests 
group by os, (os,device), city with rollup;
也等效于如下語句:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os),());      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| windows | PC     | Beijing      |        2 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| ios     | Phone  | Shijiazhuang |        1 |
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| windows | NULL   | NULL         |        4 |
| linux   | NULL   | NULL         |        2 |
| ios     | NULL   | NULL         |        1 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      

CUBE擴充

  • 功能介紹CUBE擴充與ROLLUP擴充類似,但與生成分組并基于ROLLUP表達式清單中從左到右的項清單生成層次結構的ROLLUP擴充不同,CUBE是基于CUBE表達式清單中所有項的每個排列生成分組和小計。是以,與對同一表達式清單執行的ROLLUP相比,CUBE結果集會包含更多的行。
CUBE ( { expr_1 | ( expr_1a [, expr_1b ] ...) }
  [, expr_2 | ( expr_2a [, expr_2b ] ...) ] ...)      
    • 每個表達式都會用于确定結果集的分組方式。如果采用帶半角圓括号的形式,例如

      ( expr_1a, expr_1b, ...)

      expr_1a

      expr_1b

      傳回的值組合定義單個組。
    • expr_1

      ( expr_1a, expr_1b, ...)

      expr_2

      ( expr_2a, expr_2b, ...)

      的組合,PolarDB-X在為每個唯一值傳回一個小計的同時,還将為第一項和第二項的每個唯一組合傳回一個小計。如果存在第三項,PolarDB-X則會為第三項的每個唯一值、第三項和第一項組合的每個唯一值、第三項和第二項組合的每個唯一值以及第三項、第二項和第一項組合的每個唯一值傳回一個小計。最後,再将為整個結果集傳回一個總計。
    • 通過CUBE枚舉

      (os, device, city)

      的所有可能列為GROUPING SETS,文法如下:
select os,device, city, count(*)
from requests 
group by cube (os, device, city);
上述語句等效于如下語句:
select os,device, city, count(*)
from requests 
group by grouping sets ((os, device, city),(os, device),(os, city),(device,city),(os),(device),(city),());      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | Beijing      |        2 |
| ios     | Phone  | Shijiazhuang |        1 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | PC     | NULL         |        3 |
| ios     | Phone  | NULL         |        1 |
| linux   | PC     | NULL         |        1 |
| linux   | NULL   | Beijing      |        2 |
| windows | NULL   | Shijiazhuang |        2 |
| windows | NULL   | Beijing      |        2 |
| ios     | NULL   | Shijiazhuang |        1 |
| linux   | NULL   | NULL         |        2 |
| windows | NULL   | NULL         |        4 |
| ios     | NULL   | NULL         |        1 |
| NULL    | Phone  | Beijing      |        1 |
| NULL    | Phone  | Shijiazhuang |        2 |
| NULL    | PC     | Beijing      |        3 |
| NULL    | PC     | Shijiazhuang |        1 |
| NULL    | Phone  | NULL         |        3 |
| NULL    | PC     | NULL         |        4 |
| NULL    | NULL   | Beijing      |        4 |
| NULL    | NULL   | Shijiazhuang |        3 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      
    • (os, device),(device, city)

      所有可能列為GROUPING SETS,文法如下:
select os,device, city, count(*) 
from requests 
group by cube ((os, device), (device, city));
上述語句等效于如下語句:
select os,device, city, count(*) 
from requests 
group by grouping sets ((os, device, city),(os, device),(device,city),());      
+---------+--------+--------------+----------+
| os      | device | city         | count(*) |
+---------+--------+--------------+----------+
| linux   | Phone  | Beijing      |        1 |
| windows | Phone  | Shijiazhuang |        1 |
| windows | PC     | Beijing      |        2 |
| windows | PC     | Shijiazhuang |        1 |
| linux   | PC     | Beijing      |        1 |
| ios     | Phone  | Shijiazhuang |        1 |
| linux   | Phone  | NULL         |        1 |
| windows | Phone  | NULL         |        1 |
| windows | PC     | NULL         |        3 |
| linux   | PC     | NULL         |        1 |
| ios     | Phone  | NULL         |        1 |
| NULL    | Phone  | Beijing      |        1 |
| NULL    | Phone  | Shijiazhuang |        2 |
| NULL    | PC     | Beijing      |        3 |
| NULL    | PC     | Shijiazhuang |        1 |
| NULL    | NULL   | NULL         |        7 |
+---------+--------+--------------+----------+      

GROUPING和GROUPING_ID函數

  • 功能介紹
    • GROUPING函數在GROUP BY子句使用GROUPING SETS、ROLLUP、或CUBE擴充時,GROUPING SETS結果中會使用NULL來充當占位符,導緻無法區分占位符NULL與資料中真正的NULL。此時,您可以使用PolarDB-X提供的GROUPING函數來作區分。

      GROUPING函數接受一個列名作為參數,如果結果對應行使用了參數列做聚合,則結果傳回0,此時意味着NULL來自輸入資料。如果結果對應行未使用參數列做聚合,則傳回1,此時意味着NULL來自GROUPING SETS結果中的占位符。

    • GROUPING_ID函數GROUPING_ID函數簡化了GROUPING函數,用于确定ROLLBACK、CUBE或GROUPING SETS擴充的結果集中行的小計級别。GROUPING函數僅采用一個清單達式并傳回一個值來訓示行是否為給定列的所有值的小計。是以,當解釋具有多個分組列的查詢的小計級别時,可能需要多個 GROUPING函數。GROUPING_ID函數接受ROLLBACK、CUBE或GROUPINGSETS擴充中已使用的一個或多個清單達式,并傳回單個整數,該整數可用于确定其中哪一列已聚合小計。
    • GROUPING函數
SELECT [ expr ...,] GROUPING( col_expr ) [, expr ] ...
FROM ...
GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr
  [, ...] ) [, ...]      
    • 說明 GROUPING函數采用單個參數,該參數必須是GROUP BY子句中ROLLUP、CUBE或GROUPING SETS擴充的表達式清單中指定的次元列的表達式。
    • GROUPING_ID函數
SELECT [ expr ...,]
  GROUPING_ID( col_expr_1 [, col_expr_2 ] ... )
  [, expr ] ...
FROM ...
GROUP BY { ROLLUP | CUBE | GROUPING SETS }( [...,] col_expr_1
  [, col_expr_2 ] [, ...] ) [, ...]      
  • 示例通過GROUPING_ID函數将多個列名作為參數,并将參數列的GROUPING結果按照Bitmap的方式組成整數,文法如下:
select a,b,c,count(*),
grouping(a) ga, grouping(b) gb, grouping(c) gc, grouping_id(a,b,c) groupingid 
from (select 1 as a ,2 as b,3 as c)
group by cube(a,b,c);      
+------+------+------+----------+------+------+------+------------+
| a    | b    | c    | count(*) | ga   | gb   | gc   | groupingid |
+------+------+------+----------+------+------+------+------------+
|    1 |    2 |    3 |        1 |    0 |    0 |    0 |          0 |
|    1 |    2 | NULL |        1 |    0 |    0 |    1 |          1 |
|    1 | NULL |    3 |        1 |    0 |    1 |    0 |          2 |
|    1 | NULL | NULL |        1 |    0 |    1 |    1 |          3 |
| NULL |    2 |    3 |        1 |    1 |    0 |    0 |          4 |
| NULL |    2 | NULL |        1 |    1 |    0 |    1 |          5 |
| NULL | NULL |    3 |        1 |    1 |    1 |    0 |          6 |
| NULL | NULL | NULL |        1 |    1 |    1 |    1 |          7 |
+------+------+------+----------+------+------+------+------------+