union的巧妙用法,使用union也可以橫向合并sql查詢結果
為什麼 在使用union效率會比較高,因為hive對union做了優化。
我們都知道union是縱向連接配接查詢結果,join是橫向,但是用union連接配接大家試過嗎,未優化sql:
select ep.productid,productname,count(st.tduserid),count(distinct sl.tduserid),count(distinct sn.tduserid),avg(sl.interval_level)
from(select productid,productname from enterprise.product where productid = '3006090') ep
join(select tduserid,productid from tdanalytics.stg_td_web_page_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6)) st
on ep.productid=st.productid
join(select tduserid,interval_level,productid from tdanalytics.stg_td_web_launch_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6)) sl
on st.productid=sl.productid
join(select tduserid,productid from tdanalytics.stg_td_web_newuser_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6)) sn
on sl.productid=sn.productid
group by ep.productid,productname;
然後用了union,豁然開朗,隻跑了1m26s,簡直好用到爆,寫起來可能複雜些,但是可以把分析師寫的語句直接放進來,自己隻需要改改就ok了,不多說代碼如下:
select '2019-04-07' dates,
'3006090' productid,
max(pro) productname,
sum(pv) pv,
sum(uv) uv,
cast(sum(duration) as decimal(10,4)) duration,
sum(new_uv) new_uv
from
(select productname pro,
'0' pv,
'0' uv,
'0' duration,
'0' new_uv
from enterprise.product where productid = '3006090'
union all
select '0' pro,
count(tduserid) pv,
'0' uv,
'0' duration,
'0' new_uv
from tdanalytics.stg_td_web_page_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6) and
productid = '3006090'
union all
select '0' pro,
'0' pv,
count(distinct tduserid) uv,
avg(interval_level) duration,
'0' new_uv
from tdanalytics.stg_td_web_launch_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6) and
productid = '3006090'
union all
select '0' pro,
'0' pv,
'0' uv,
'0' duration,
count(distinct tduserid) new_uv
from tdanalytics.stg_td_web_newuser_ex where l_date <= ‘2019-04-07’ and l_date >= date_add(‘2019-04-07’, -6) and
productid = ‘3006090’
) t;
那麼數值的可以用sum求和,當有漢字時怎麼辦呢,用max就可以解決這個問題:
select cast('${n_date}' as date) dates,
'3006090' productid,
max(pro) productname,
sum(pv) pv,
sum(uv) uv,
cast(sum(duration) as decimal(10,2)) duration,
sum(new_uv) new_uv
from
(select productname pro,
'0' pv,
'0' uv,
'0' duration,
'0' new_uv
from enterprise.product where productid = '3006090'
union all
select '0' pro,
count(tduserid) pv,
'0' uv,
'0' duration,
'0' new_uv
from tdanalytics.stg_td_web_page_ex where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6) and
productid = '3006090'
union all
select '0' pro,
'0' pv,
count(distinct tduserid) uv,
avg(interval_level) duration,
'0' new_uv
from tdanalytics.stg_td_web_launch_ex where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6) and
productid = '3006090'
union all
select '0' pro,
'0' pv,
'0' uv,
'0' duration,
count(distinct tduserid) new_uv
from tdanalytics.stg_td_web_newuser_ex where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6) and
productid = '3006090'
) t;
還有幹貨一點,join篩選後的表要比join并篩選快,因為可以選自己想要的字段,代碼如下:
join (select url,displayname from enterprise.pagename where productid = '3006090') p on p.displayname=b.displayname 更優化
對比
join p on p.displayname=b.displayname where p.productid = '3006090'