天天看點

巧用union all 優化HiveSqlunion的巧妙用法,使用union也可以橫向合并sql查詢結果

union的巧妙用法,使用union也可以橫向合并sql查詢結果

為什麼 在使用union效率會比較高,因為hive對union做了優化。

我們都知道union是縱向連接配接查詢結果,join是橫向,但是用union連接配接大家試過嗎,未優化sql:

select ep.productid,productname,count(st.tduserid),count(distinct sl.tduserid),count(distinct sn.tduserid),avg(sl.interval_level)
from(select productid,productname from enterprise.product where productid = '3006090') ep
join(select tduserid,productid from tdanalytics.stg_td_web_page_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6)) st
on ep.productid=st.productid
join(select tduserid,interval_level,productid from tdanalytics.stg_td_web_launch_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6)) sl
on st.productid=sl.productid
join(select tduserid,productid from tdanalytics.stg_td_web_newuser_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6)) sn
on sl.productid=sn.productid
group by ep.productid,productname;
           

然後用了union,豁然開朗,隻跑了1m26s,簡直好用到爆,寫起來可能複雜些,但是可以把分析師寫的語句直接放進來,自己隻需要改改就ok了,不多說代碼如下:

select '2019-04-07' dates,
       '3006090' productid,
       max(pro) productname,
       sum(pv) pv,
       sum(uv) uv,
       cast(sum(duration) as decimal(10,4)) duration,
       sum(new_uv) new_uv
from 
(select productname pro,
       '0' pv,
       '0' uv,
       '0' duration,
       '0' new_uv
 from enterprise.product where productid = '3006090'
union all
select '0' pro,
       count(tduserid) pv,
       '0' uv,
       '0' duration,
       '0' new_uv
from tdanalytics.stg_td_web_page_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6) and
       productid = '3006090'
union all
select '0' pro,
       '0' pv,
       count(distinct tduserid) uv,
       avg(interval_level) duration,
       '0' new_uv
from tdanalytics.stg_td_web_launch_ex where l_date <= '2019-04-07' and l_date >= date_add('2019-04-07', -6) and
       productid = '3006090'
union all
select '0' pro,
       '0' pv,
       '0' uv,
       '0' duration,
       count(distinct tduserid) new_uv
           

from tdanalytics.stg_td_web_newuser_ex where l_date <= ‘2019-04-07’ and l_date >= date_add(‘2019-04-07’, -6) and

productid = ‘3006090’

) t;

那麼數值的可以用sum求和,當有漢字時怎麼辦呢,用max就可以解決這個問題:
           
select cast('${n_date}' as date) dates,
       '3006090' productid,
       max(pro) productname,
       sum(pv) pv,
       sum(uv) uv,
       cast(sum(duration) as decimal(10,2)) duration,
       sum(new_uv) new_uv
from 
(select productname pro,
       '0' pv,
       '0' uv,
       '0' duration,
       '0' new_uv
 from enterprise.product where productid = '3006090'
union all
select '0' pro,
       count(tduserid) pv,
       '0' uv,
       '0' duration,
       '0' new_uv
from tdanalytics.stg_td_web_page_ex where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6) and
       productid = '3006090'
union all
select '0' pro,
       '0' pv,
       count(distinct tduserid) uv,
       avg(interval_level) duration,
       '0' new_uv
from tdanalytics.stg_td_web_launch_ex where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6) and
       productid = '3006090'
union all
select '0' pro,
       '0' pv,
       '0' uv,
       '0' duration,
       count(distinct tduserid) new_uv
from tdanalytics.stg_td_web_newuser_ex where l_date <= '${n_date}' and l_date >= date_add('${n_date}', -6) and
       productid = '3006090'
) t;
           

還有幹貨一點,join篩選後的表要比join并篩選快,因為可以選自己想要的字段,代碼如下:

join (select url,displayname from enterprise.pagename where productid = '3006090') p on p.displayname=b.displayname   更優化
對比
   join  p on p.displayname=b.displayname where p.productid = '3006090'
           

繼續閱讀