我们知道SQL中有个explode函数,经常使用的也是该函数,但还有个函数posexplode用法和其类似,使用好这个函数,可以简化不少工作。话不多说,直接上示例。
案例:有一张表,其数据如下
seat_no | bd_time |
A,B,C,D | 8:00,8:10,8:20,8:30 |
H,I,J,K | 9:10,9:20,9:30,9:40 |
... |
展示如下:
sn | bt |
A | 8:00 |
B | 8:10 |
C | 8:20 |
D | 8:30 |
分析:看到字段数据,可以想到使用lateral view explode展开,我们先使用explode展开看下
with tb1 as (
select
seat_no,
bd_time
from values('A,B,C,D','8:00,8:10,8:20,8:30'),
('H,I,J,K','9:10,9:20,9:30,9:40')
t(seat_no,bd_time)
)
select
seat_no,
bd_time,
exp_sn
from tb1
lateral view explode(split(seat_no,',')) t as exp_sn;
--只展开seat_no 结果如下
seat_no bd_time exp_sn
A,B,C,D 8:00,8:10,8:20,8:30 A
A,B,C,D 8:00,8:10,8:20,8:30 B
A,B,C,D 8:00,8:10,8:20,8:30 C
A,B,C,D 8:00,8:10,8:20,8:30 D
H,I,J,K 9:10,9:20,9:30,9:40 H
H,I,J,K 9:10,9:20,9:30,9:40 I
H,I,J,K 9:10,9:20,9:30,9:40 J
H,I,J,K 9:10,9:20,9:30,9:40 K
select
seat_no,
bd_time,
exp_sn,
exp_bt
from tb1
lateral view explode(split(seat_no,',')) t as exp_sn
lateral view explode(split(bd_time,',')) t as exp_bt;
--两个字段均展开
seat_no bd_time exp_sn exp_bt
A,B,C,D 8:00,8:10,8:20,8:30 A 8:00
A,B,C,D 8:00,8:10,8:20,8:30 A 8:10
A,B,C,D 8:00,8:10,8:20,8:30 A 8:20
A,B,C,D 8:00,8:10,8:20,8:30 A 8:30
A,B,C,D 8:00,8:10,8:20,8:30 B 8:00
A,B,C,D 8:00,8:10,8:20,8:30 B 8:10
A,B,C,D 8:00,8:10,8:20,8:30 B 8:20
A,B,C,D 8:00,8:10,8:20,8:30 B 8:30
A,B,C,D 8:00,8:10,8:20,8:30 C 8:00
A,B,C,D 8:00,8:10,8:20,8:30 C 8:10
A,B,C,D 8:00,8:10,8:20,8:30 C 8:20
A,B,C,D 8:00,8:10,8:20,8:30 C 8:30
A,B,C,D 8:00,8:10,8:20,8:30 D 8:00
A,B,C,D 8:00,8:10,8:20,8:30 D 8:10
A,B,C,D 8:00,8:10,8:20,8:30 D 8:20
A,B,C,D 8:00,8:10,8:20,8:30 D 8:30
H,I,J,K 9:10,9:20,9:30,9:40 H 9:10
H,I,J,K 9:10,9:20,9:30,9:40 H 9:20
H,I,J,K 9:10,9:20,9:30,9:40 H 9:30
H,I,J,K 9:10,9:20,9:30,9:40 H 9:40
H,I,J,K 9:10,9:20,9:30,9:40 I 9:10
H,I,J,K 9:10,9:20,9:30,9:40 I 9:20
H,I,J,K 9:10,9:20,9:30,9:40 I 9:30
H,I,J,K 9:10,9:20,9:30,9:40 I 9:40
H,I,J,K 9:10,9:20,9:30,9:40 J 9:10
H,I,J,K 9:10,9:20,9:30,9:40 J 9:20
H,I,J,K 9:10,9:20,9:30,9:40 J 9:30
H,I,J,K 9:10,9:20,9:30,9:40 J 9:40
H,I,J,K 9:10,9:20,9:30,9:40 K 9:10
H,I,J,K 9:10,9:20,9:30,9:40 K 9:20
H,I,J,K 9:10,9:20,9:30,9:40 K 9:30
H,I,J,K 9:10,9:20,9:30,9:40 K 9:40
从上述结果可以看到,按两个字段展开相当于直接笛卡尔积了,在之前的文章中我们也分析过生成索引的posexplode函数,接下来我们加上索引试试看
select
seat_no,
bd_time,
inx_sn,
exp_sn
from tb1
lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn;
--只展开seat_no 结果如下
seat_no bd_time inx_sn exp_sn
A,B,C,D 8:00,8:10,8:20,8:30 0 A
A,B,C,D 8:00,8:10,8:20,8:30 1 B
A,B,C,D 8:00,8:10,8:20,8:30 2 C
A,B,C,D 8:00,8:10,8:20,8:30 3 D
H,I,J,K 9:10,9:20,9:30,9:40 0 H
H,I,J,K 9:10,9:20,9:30,9:40 1 I
H,I,J,K 9:10,9:20,9:30,9:40 2 J
H,I,J,K 9:10,9:20,9:30,9:40 3 K
select
seat_no,
bd_time,
inx_sn,
exp_sn,
inx_bt,
exp_bt
from tb1
lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn
lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt;
--两个字段均展开
seat_no bd_time inx_sn exp_sn inx_bt exp_bt
A,B,C,D 8:00,8:10,8:20,8:30 0 A 0 8:00
A,B,C,D 8:00,8:10,8:20,8:30 0 A 1 8:10
A,B,C,D 8:00,8:10,8:20,8:30 0 A 2 8:20
A,B,C,D 8:00,8:10,8:20,8:30 0 A 3 8:30
A,B,C,D 8:00,8:10,8:20,8:30 1 B 0 8:00
A,B,C,D 8:00,8:10,8:20,8:30 1 B 1 8:10
A,B,C,D 8:00,8:10,8:20,8:30 1 B 2 8:20
A,B,C,D 8:00,8:10,8:20,8:30 1 B 3 8:30
A,B,C,D 8:00,8:10,8:20,8:30 2 C 0 8:00
A,B,C,D 8:00,8:10,8:20,8:30 2 C 1 8:10
A,B,C,D 8:00,8:10,8:20,8:30 2 C 2 8:20
A,B,C,D 8:00,8:10,8:20,8:30 2 C 3 8:30
A,B,C,D 8:00,8:10,8:20,8:30 3 D 0 8:00
A,B,C,D 8:00,8:10,8:20,8:30 3 D 1 8:10
A,B,C,D 8:00,8:10,8:20,8:30 3 D 2 8:20
A,B,C,D 8:00,8:10,8:20,8:30 3 D 3 8:30
H,I,J,K 9:10,9:20,9:30,9:40 0 H 0 9:10
H,I,J,K 9:10,9:20,9:30,9:40 0 H 1 9:20
H,I,J,K 9:10,9:20,9:30,9:40 0 H 2 9:30
H,I,J,K 9:10,9:20,9:30,9:40 0 H 3 9:40
H,I,J,K 9:10,9:20,9:30,9:40 1 I 0 9:10
H,I,J,K 9:10,9:20,9:30,9:40 1 I 1 9:20
H,I,J,K 9:10,9:20,9:30,9:40 1 I 2 9:30
H,I,J,K 9:10,9:20,9:30,9:40 1 I 3 9:40
H,I,J,K 9:10,9:20,9:30,9:40 2 J 0 9:10
H,I,J,K 9:10,9:20,9:30,9:40 2 J 1 9:20
H,I,J,K 9:10,9:20,9:30,9:40 2 J 2 9:30
H,I,J,K 9:10,9:20,9:30,9:40 2 J 3 9:40
H,I,J,K 9:10,9:20,9:30,9:40 3 K 0 9:10
H,I,J,K 9:10,9:20,9:30,9:40 3 K 1 9:20
H,I,J,K 9:10,9:20,9:30,9:40 3 K 2 9:30
H,I,J,K 9:10,9:20,9:30,9:40 3 K 3 9:40
从上述结果看,posexplode的索引的确省下了不少事,可以利用两个索引值相同进行过滤,可得到想要的结果
select
seat_no,
bd_time,
exp_sn,
exp_bt
from tb1
lateral view posexplode(split(seat_no,',')) t as inx_sn,exp_sn
lateral view posexplode(split(bd_time,',')) t as inx_bt,exp_bt
where inx_sn = inx_bt;
--结果如下
seat_no bd_time exp_sn exp_bt
A,B,C,D 8:00,8:10,8:20,8:30 A 8:00
A,B,C,D 8:00,8:10,8:20,8:30 B 8:10
A,B,C,D 8:00,8:10,8:20,8:30 C 8:20
A,B,C,D 8:00,8:10,8:20,8:30 D 8:30
H,I,J,K 9:10,9:20,9:30,9:40 H 9:10
H,I,J,K 9:10,9:20,9:30,9:40 I 9:20
H,I,J,K 9:10,9:20,9:30,9:40 J 9:30
H,I,J,K 9:10,9:20,9:30,9:40 K 9:40
关于posexplode具体细节用法,可参考阿里云文档
https://help.aliyun.com/document_detail/293597.html#section-2yc-ymd-p11拜了个拜