天天看点

PostgreSQL 14 group by distinct子句分组去重

我们通过group by子句进行分组时,经常会出现数据重复的情况。例如GROUP BY CUBE (a,b), CUBE (b,c)便可能出现数据重复的情况。

PG14中支持group by distinct的语法,可以用来进行数据去重。

例子:

group by:

bill@bill=>select a, b, c
from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
group by  rollup(a, b), rollup(a, c)
order by a, b, c;
 a | b | c
---+---+---
 1 | 2 | 3
 1 | 2 |
 1 | 2 |
 1 |   | 3
 1 |   | 3
 1 |   |
 1 |   |
 1 |   |
 4 |   | 6
 4 |   | 6
 4 |   | 6
 4 |   |
 4 |   |
 4 |   |
 4 |   |
 4 |   |
 7 | 8 | 9
 7 | 8 |
 7 | 8 |
 7 |   | 9
 7 |   | 9
 7 |   |
 7 |   |
 7 |   |
   |   |
(25 rows)
           

group by distinct:

bill@bill=>select a, b, c
bill-# from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
bill-# group by distinct rollup(a, b), rollup(a, c)
bill-# order by a, b, c;
 a | b | c
---+---+---
 1 | 2 | 3
 1 | 2 |
 1 |   | 3
 1 |   |
 4 |   | 6
 4 |   | 6
 4 |   |
 4 |   |
 7 | 8 | 9
 7 | 8 |
 7 |   | 9
 7 |   |
   |   |
(13 rows)
           

需要注意并不等价于先distinct再group by:

bill@bill=>select distinct a, b, c
bill-# from (values (1, 2, 3), (4, null, 6), (7, 8, 9)) as t (a, b, c)
bill-# group by rollup(a, b), rollup(a, c)
bill-# order by a, b, c;
 a | b | c
---+---+---
 1 | 2 | 3
 1 | 2 |
 1 |   | 3
 1 |   |
 4 |   | 6
 4 |   |
 7 | 8 | 9
 7 | 8 |
 7 |   | 9
 7 |   |
   |   |
(11 rows)