天天看點

【學習筆記】hive 之行拆列explode

1、explode

explode(ARRAY) 清單中的每個元素生成一行

explode(MAP) map中每個key-value對,生成一行,key為一列,value為一列

【學習筆記】hive 之行拆列explode

限制:

1、No other expressions are allowed in SELECT

SELECT pageid, explode(adid_list) AS myCol... is not supported           

2、UDTF's can't be nested

SELECT explode(explode(adid_list)) AS myCol... is not supported           

3、GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported

SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported
           

2、lateral view

可使用lateral view解除以上限制,文法:

lateralView: LATERAL VIEW explode(expression) tableAlias AS columnAlias (',' columnAlias)*

fromClause: FROM baseTable (lateralView)*

案例:

table名稱為pageAds

【學習筆記】hive 之行拆列explode
SELECT pageid, adid

FROM pageAds LATERAL VIEW explode(adid_list) adTable AS adid;

輸出結果:

【學習筆記】hive 之行拆列explode

3、多個lateral view

from語句後面可以帶多個lateral view語句

表名:baseTable

【學習筆記】hive 之行拆列explode

from後隻有一個lateral view:

SELECT myCol1, col2 FROM baseTable

LATERAL VIEW explode(col1) myTable1 AS myCol1;

結果:

【學習筆記】hive 之行拆列explode

多個lateral view:

SELECT myCol1, myCol2 FROM baseTable

LATERAL VIEW explode(col1) myTable1 AS myCol1

LATERAL VIEW explode(col2) myTable2 AS myCol2;

【學習筆記】hive 之行拆列explode

4、Outer Lateral Views

如果array類型的字段為空,但依然需傳回記錄,可使用outer關鍵詞。

比如:select * from src LATERAL VIEW explode(array()) C AS a limit 10;

這條語句中的array字段是個空清單,這條語句不管src表中是否有記錄,結果都是空的。

而:select * from src LATERAL VIEW OUTER explode(array()) C AS a limit 10;

結果中的記錄數為src表的記錄數,隻是a字段為NULL。

比如:

238 val_238 NULL

86 val_86 NULL

311 val_311 NULL

27 val_27 NULL

165 val_165 NULL

409 val_409 NULL

255 val_255 NULL

278 val_278 NULL

98 val_98 NULL

官方文檔:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-explode