Hive --- Complex Data Types, Column-to-Rows and Rows-to-Column

I. What complex data types does Hive have?

Hive has three complex data types: array, map (key-value pairs), and struct.

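array : col array<primitive_type>; indices start at 0, and an out-of-range index does not raise an error but returns NULL
map   : col map<key_type,value_type>, e.g. map<string,string>
struct: col struct<field_name:type, ...>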
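The indexing rule for arrays can be pinned down with a small plain-Python model (this is not Hive code, only an illustration of the behavior described above). The helper name `array_get` is hypothetical, and treating negative indices as out of range is an assumption of the model.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def array_get(arr: Sequence[T], i: int) -> Optional[T]:
    """Model of Hive's arr[i]: 0-based; an out-of-range index gives NULL (None), not an error."""
    # Assumption of this model: negative indices are also treated as out of range.
    if 0 <= i < len(arr):
        return arr[i]
    return None


scores = ["11", "22", "33"]
print(array_get(scores, 0))  # 11
print(array_get(scores, 5))  # None
```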

II. Basic array operations

1. Create a table

create table if not exists arr1(
name string,
score array<string>
)
row format delimited fields terminated by '\t'
collection items terminated by ',';
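To see what the two delimiters in this DDL do to a line of arr1.txt, here is a plain-Python sketch of the split (a model of the text row format, not Hive's actual SerDe). The function name `parse_arr1_line` is hypothetical.

```python
from typing import List, Tuple


def parse_arr1_line(line: str, field_delim: str = "\t",
                    item_delim: str = ",") -> Tuple[str, List[str]]:
    """Split one arr1 text line: tab between columns, comma between array items."""
    # Cut 1: split the line into the two columns on the field delimiter.
    name, score_text = line.rstrip("\n").split(field_delim, 1)
    # Cut 2 (array column only): split into elements on the collection-items delimiter.
    return name, score_text.split(item_delim)


print(parse_arr1_line("liqi\t11,22,33,44,55,66"))
# ('liqi', ['11', '22', '33', '44', '55', '66'])
```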

2. Load data into arr1

# Raw data (note: the two columns are separated by a tab \t, not spaces)
liqi    11,22,33,44,55,66
zhangsan        12,23,34,45,56,67
shenghuang      111,222,333,444,555,666 
           
# Load the data
load data local inpath 'arr1.txt' into table arr1;

# Query the data
select * from arr1;

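3. Column-to-rows

This simply means turning the values of one column into multiple rows.

explode: generates one row for each element of a collection (the column becomes rows).
lateral view: used together with explode; it expands each single row produced by a statement into a result set of multiple rows. Concretely, lateral view puts the rows generated by explode into a virtual table, and this virtual table is joined with the current row of the source table, so each generated value is combined with the other columns of its original row and can then be aggregated.

Syntax: lateral view explode(column) virtual_table_alias as virtual_column_alias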
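The explode-then-join idea can be sketched in plain Python as a nested loop (a model of the semantics, not how Hive executes it). The generator name `lateral_view_explode` and the `("empty", [])` row are hypothetical; the empty row shows that a source row with no elements produces no output, which matches a plain (non-OUTER) lateral view.

```python
from typing import Iterable, Iterator, List, Tuple


def lateral_view_explode(
    rows: Iterable[Tuple[str, List[str]]]
) -> Iterator[Tuple[str, str]]:
    """Model of: select name, s from arr1 lateral view explode(score) score as s."""
    for name, score in rows:
        # explode: one generated row per array element (the per-row "virtual table")
        for s in score:
            # lateral view: join the generated row back to its source row,
            # pairing each element with that row's other columns (here: name)
            yield name, s


arr1 = [("liqi", ["11", "22"]), ("zhangsan", ["12", "23"]), ("empty", [])]
for row in lateral_view_explode(arr1):
    print(row)
# ('liqi', '11')
# ('liqi', '22')
# ('zhangsan', '12')
# ('zhangsan', '23')
```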

Turn the column into rows:

select name,s from arr1 lateral view explode(score) score as s;

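Total score of each student (group by name is required, because name is selected alongside the aggregate sum(s)):

select name,sum(s) as totalscore from arr1 lateral view explode(score) score as s group by name;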
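A plain-Python model of this query (explode, then sum per name) is sketched below using the three sample rows from arr1.txt. The function name `total_scores` is hypothetical, and the model casts each string element with `int()` for readability; the exact numeric type Hive returns for sum over string elements is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

arr1 = [
    ("liqi", ["11", "22", "33", "44", "55", "66"]),
    ("zhangsan", ["12", "23", "34", "45", "56", "67"]),
    ("shenghuang", ["111", "222", "333", "444", "555", "666"]),
]


def total_scores(rows: Iterable[Tuple[str, List[str]]]) -> Dict[str, int]:
    """Model of: select name, sum(s) ... lateral view explode(score) score as s group by name."""
    totals: Dict[str, int] = defaultdict(int)
    for name, score in rows:
        for s in score:              # explode + lateral view: one (name, s) pair per element
            totals[name] += int(s)   # group by name: accumulate sum(s) per name
    return dict(totals)


print(total_scores(arr1))
# {'liqi': 231, 'zhangsan': 237, 'shenghuang': 2331}
```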

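4. Rows-to-column

This uses the collect_set function.

What collect_set does: within each group, it gathers the values of one column into an array (with duplicates removed) and returns that array.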
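The grouping-plus-deduplication behavior can be modeled in plain Python as follows (a sketch, not Hive code). The function name `collect_set_by` is hypothetical, and the sample rows are made up to show a duplicate being dropped. Hive does not promise any element order for collect_set; keeping first-seen order is only this model's choice. If duplicates should be kept, Hive's collect_list is the counterpart.

```python
from typing import Dict, Iterable, List, Tuple


def collect_set_by(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Model of: select name, collect_set(s) from temp_arr1 group by name."""
    groups: Dict[str, Dict[str, None]] = {}
    for name, s in rows:
        # dict keys act as an insertion-ordered set, so duplicates are dropped
        groups.setdefault(name, {})[s] = None
    # Element order in Hive is not guaranteed; first-seen order is this model's choice.
    return {name: list(values) for name, values in groups.items()}


temp_arr1 = [("liqi", "11"), ("liqi", "22"), ("liqi", "11"), ("zhangsan", "12")]
print(collect_set_by(temp_arr1))
# {'liqi': ['11', '22'], 'zhangsan': ['12']}
```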

Data preparation: take the exploded data from the column-to-rows step above and turn it back into a column.

# 1. Create a staging table that stores the exploded (column-to-rows) data
create table temp_arr1
 as
 select name,s from arr1 lateral view explode(score) score as s;
           
# 2. Create a table to hold the re-collected array column
create table if not exists arr2(
name string,
score array<string>
)
row format delimited fields terminated by ' '
collection items terminated by ',';
           
# 3. Insert the data into arr2, using collect_set
insert into arr2 
select name,collect_set(s) from temp_arr1 group by name;
# Note: the column must be s, because s is the alias given to the exploded values when converting to rows

III. Basic map (key-value) operations

1. Create a table with a map-typed column

create table if not exists map1(
name string,
score map<string,int>
)
row format delimited fields terminated by ' '
collection items terminated by ','
map keys terminated by ':';
           
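Tip: a map column is split twice (into entries on the collection-items delimiter, then each entry into key and value on the map-keys delimiter), while an array is split only once, and so is a struct.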
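The "split twice" rule can be sketched in plain Python for one line of map1.txt (a model of the delimited text format, not Hive's SerDe). The function name `parse_map1_line` is hypothetical.

```python
from typing import Dict, Tuple


def parse_map1_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Split one map1 text line: ' ' between columns, ',' between entries, ':' inside an entry."""
    name, score_text = line.split(" ", 1)   # fields terminated by ' '
    score: Dict[str, int] = {}
    for entry in score_text.split(","):     # cut 1: collection items terminated by ','
        key, value = entry.split(":", 1)    # cut 2: map keys terminated by ':'
        score[key] = int(value)             # map<string,int>: values become int
    return name, score


print(parse_map1_line("zhangsan chinese:90,math:87,english:63,nature:76"))
# ('zhangsan', {'chinese': 90, 'math': 87, 'english': 63, 'nature': 76})
```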

2. Prepare the data

zhangsan chinese:90,math:87,english:63,nature:76
lisi chinese:60,math:30,english:78,nature:0
wangwu chinese:89,math:25,english:81,nature:9

3. Load the data

load data local inpath '/root/hivetest/map1.txt' into table map1;

4. Querying map columns

# Query all rows of map1
select * from map1;
           
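# English and nature scores of students whose math score is above 35:

Tip: give the table an alias (m here) and qualify columns with it. With a single table it is optional, but it keeps column references unambiguous once other tables are joined into the query.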
select name,m.score['english'],m.score['nature']
from map1 as m
where m.score['math']>35;
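Run against the three sample rows, this query should return only zhangsan, since lisi (30) and wangwu (25) do not pass the math filter. A plain-Python model of the lookup and filter is below; the function name `english_and_nature` is hypothetical, and mapping a missing key to `None` mirrors Hive returning NULL for an absent map key.

```python
from typing import Dict, List, Optional, Tuple

map1: List[Tuple[str, Dict[str, int]]] = [
    ("zhangsan", {"chinese": 90, "math": 87, "english": 63, "nature": 76}),
    ("lisi", {"chinese": 60, "math": 30, "english": 78, "nature": 0}),
    ("wangwu", {"chinese": 89, "math": 25, "english": 81, "nature": 9}),
]


def english_and_nature(
    rows: List[Tuple[str, Dict[str, int]]], min_math: int = 35
) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """Model of: select name, score['english'], score['nature'] from map1 where score['math'] > 35."""
    result = []
    for name, score in rows:
        math = score.get("math")  # a missing key yields None, like NULL in Hive
        if math is not None and math > min_math:  # NULL > 35 is not true, so such rows drop out
            result.append((name, score.get("english"), score.get("nature")))
    return result


print(english_and_nature(map1))  # [('zhangsan', 63, 76)]
```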

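5. Column-to-rows for a map

This works much like column-to-rows for an array; once the array case makes sense, this part and the next will too. I will build it up step by step.

Use explode to expand the map:

select explode(score) as (m_key,m_value) from map1;

Lateral view, used together with functions such as split and explode, splits one row into multiple rows, and the resulting rows can then be aggregated.

# Column-to-rows statement: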

select name,m_key,m_value from map1 lateral view explode(score) s as m_key,m_value;
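For a map, explode generates two columns per entry (key and value) instead of one. A plain-Python model of the statement above, using the lisi sample row, could look like this; the generator name `explode_map` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def explode_map(
    rows: Iterable[Tuple[str, Dict[str, int]]]
) -> Iterator[Tuple[str, str, int]]:
    """Model of: select name, m_key, m_value from map1 lateral view explode(score) s as m_key, m_value."""
    for name, score in rows:
        # explode on a map generates one (key, value) row per entry;
        # lateral view pairs each of them with the source row's name
        for m_key, m_value in score.items():
            yield name, m_key, m_value


map1 = [("lisi", {"chinese": 60, "math": 30, "english": 78, "nature": 0})]
for row in explode_map(map1):
    print(row)
# ('lisi', 'chinese', 60)
# ('lisi', 'math', 30)
# ('lisi', 'english', 78)
# ('lisi', 'nature', 0)
```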

6. Rows-to-column for a map

# Data preparation (a new data set):

name7,38,75,66
name6,37,74,65
name5,36,73,64
name4,35,72,63
name3,34,71,62
name2,33,70,61
name1,32,69,60

# Create a staging table and load data into it
create table temp_map1(
name string,
score1 int,
score2 int,
score3 int
)
row format delimited fields terminated by ',';
           
# Load the data into the staging table
load data local inpath 'temp_map1.txt' into table temp_map1;
# Create the map table to load the data into
create table if not exists map2(
name string,
score map<string,int>
)
row format delimited fields terminated by ' '
collection items terminated by ','
map keys terminated by ':';
           
# Load the data:
insert into map2
select name,map('chinese',score1,'math',score2,'english',score3) from
temp_map1;
           
# Check that the data was loaded into map2
select * from map2;
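The map('k1', v1, 'k2', v2, ...) call builds one map per row from alternating key/value arguments. A plain-Python model of the insert-select is sketched below; the function name `to_map_rows` is hypothetical, and the two input rows are taken from the sample data.

```python
from typing import Dict, List, Tuple


def to_map_rows(
    rows: List[Tuple[str, int, int, int]]
) -> List[Tuple[str, Dict[str, int]]]:
    """Model of: select name, map('chinese', score1, 'math', score2, 'english', score3) from temp_map1."""
    return [
        # each output row pairs the name with a map built from three columns
        (name, {"chinese": s1, "math": s2, "english": s3})
        for name, s1, s2, s3 in rows
    ]


temp_map1 = [("name1", 32, 69, 60), ("name2", 33, 70, 61)]
print(to_map_rows(temp_map1))
# [('name1', {'chinese': 32, 'math': 69, 'english': 60}), ('name2', {'chinese': 33, 'math': 70, 'english': 61})]
```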
           

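IV. Basic struct operations

# Note: struct works much like array and map. As I understand it, it just adds a layer of wrapping: its fields are named and can each have a different type.

# Create a table: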

create table if not exists str2(
uname string,
addr struct < province:string,
              city:string,
              xian:string,
              dadao:string >)
row format delimited fields terminated by '\t'
collection items terminated by ',';

# Load the data:

load data local inpath 'struct.txt' into table str2;

# Query the data:

select uname,addr.province,addr.city,addr.xian from str2;
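The contents of struct.txt are not shown, so the sketch below uses a hypothetical line, `zs\tshandong,jinan,lixia,jingshi`, to model how str2's row format maps comma-separated values onto the named struct fields by position. It is plain Python, not Hive; the `Addr` and `parse_str2_line` names are hypothetical, and the handling of missing or extra values is this model's assumption.

```python
from typing import List, NamedTuple, Optional, Tuple


class Addr(NamedTuple):
    province: Optional[str]
    city: Optional[str]
    xian: Optional[str]
    dadao: Optional[str]


def parse_str2_line(line: str) -> Tuple[str, Addr]:
    """Split one str2 text line: tab between columns, ',' between struct fields."""
    uname, addr_text = line.split("\t", 1)
    parts: List[Optional[str]] = list(addr_text.split(","))
    # Fields are matched by position, in declaration order. In this model, missing
    # trailing fields become None (NULL) and any extra values are ignored.
    parts += [None] * (len(Addr._fields) - len(parts))
    return uname, Addr(*parts[: len(Addr._fields)])


uname, addr = parse_str2_line("zs\tshandong,jinan,lixia,jingshi")  # hypothetical line
print(uname, addr.province, addr.city, addr.xian)  # zs shandong jinan lixia
```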

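# Case study combining complex types

# Sample data (belong = subordinates, tax = contribution items, addr = province,city,road). Note: each sample row has only four space-separated values, while the header and the tax table below have five columns, so as shown the rows do not line up with the table.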

uid uname belong tax addr
xdd ll,lw,lg,lm wuxian:300,gongjijin:1200,shebao:300 山,濟,曆
lkq lg,lm,lw,ll,mm wuxian:200,gongjijin:1000,shebao:200 河,石,中

# Create the table

create table if not exists tax(
id int,
name string,
belong array<string>,
tax map<string,double>,
addr struct<province:string,city:string,road:string>
)
row format delimited fields terminated by ' '
collection items terminated by ','
map keys terminated by ':'
stored as textfile;

# Load the data

load data local inpath 'tax.txt' into table tax;

# Query: more than 4 subordinates, housing fund (gongjijin) below 1200, and province '河'

select id,
name,
belong[0],
belong[1],
tax['wuxian'],
tax['shebao'],
addr.road
from tax
where size(belong) > 4 and
tax['gongjijin'] < 1200 and
addr.province = '河';
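To check the filter logic by hand, here is a plain-Python model of the query over the two sample rows (not Hive code). Because the sample rows have only four values, the model drops the id column and keeps the first value in a field named `user`; that name, the `TaxRow`/`Addr` types, and `query` are hypothetical. Only lkq has more than 4 subordinates, a housing fund below 1200, and province '河'.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple


class Addr(NamedTuple):
    province: str
    city: str
    road: str


class TaxRow(NamedTuple):
    user: str                 # first value of each sample row (id column omitted)
    belong: List[str]         # array<string>
    tax: Dict[str, float]     # map<string,double>
    addr: Addr                # struct<province,city,road>


rows = [
    TaxRow("xdd", ["ll", "lw", "lg", "lm"],
           {"wuxian": 300.0, "gongjijin": 1200.0, "shebao": 300.0}, Addr("山", "濟", "曆")),
    TaxRow("lkq", ["lg", "lm", "lw", "ll", "mm"],
           {"wuxian": 200.0, "gongjijin": 1000.0, "shebao": 200.0}, Addr("河", "石", "中")),
]


def query(rows: List[TaxRow]) -> List[Tuple[str, str, str, Optional[float], Optional[float], str]]:
    """Model of: size(belong) > 4 and tax['gongjijin'] < 1200 and addr.province = '河'."""
    result = []
    for r in rows:
        gongjijin = r.tax.get("gongjijin")  # missing key -> None, like NULL in Hive
        if (len(r.belong) > 4 and gongjijin is not None and gongjijin < 1200
                and r.addr.province == "河"):
            # belong[0], belong[1], tax['wuxian'], tax['shebao'], addr.road
            result.append((r.user, r.belong[0], r.belong[1],
                           r.tax.get("wuxian"), r.tax.get("shebao"), r.addr.road))
    return result


print(query(rows))
# [('lkq', 'lg', 'lm', 200.0, 200.0, '中')]
```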

# Nested data types
