我们通过group by子句进行分组时,经常会出现数据重复的情况。例如GROUP BY CUBE (a,b), CUBE (b,c)便可能出现数据重复的情况。
PG14中支持group by distinct的语法,可以用来进行数据去重。
例子:
group by:
bill@bill=>select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by rollup(a, b), rollup(a, c)
order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | 2 |
1 | | 3
1 | | 3
1 | |
1 | |
1 | |
4 | | 6
4 | | 6
4 | | 6
4 | |
4 | |
4 | |
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | 8 |
7 | | 9
7 | | 9
7 | |
7 | |
7 | |
| |
(25 rows)
group by distinct:
bill@bill=>select a, b, c
bill-# from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
bill-# group by distinct rollup(a, b), rollup(a, c)
bill-# order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | | 3
1 | |
4 | | 6
4 | | 6
4 | |
4 | |
7 | 8 | 9
7 | 8 |
7 | | 9
7 | |
| |
(13 rows)
需要注意并不等价于先distinct再group by:
bill@bill=>select distinct a, b, c
bill-# from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
bill-# group by rollup(a, b), rollup(a, c)
bill-# order by a, b, c;
a | b | c
---+---+---
1 | 2 | 3
1 | 2 |
1 | | 3
1 | |
4 | | 6
4 | |
7 | 8 | 9
7 | 8 |
7 | | 9
7 | |
| |
(11 rows)