An Analysis of Selected Hive Functions

exists, in, not exists, and not in in Hive

Preparing the table data:

1. Select the target database, e.g.: use bg_database1;

2. Create the table

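-- "if exists" keeps the script from failing on a first run
drop table if exists demo0919;
create table demo0919(
 name string
,age int
,sex int
) row format delimited fields terminated by '\001'; -- '\001' (Ctrl-A); the backslash appears to have been lost in the original '01'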

3. Insert the rows

insert overwrite table demo0919 values('zs',18,1);
insert into table demo0919 values('ls',18,1);
insert into table demo0919 values('nisa',19,0);
insert into table demo0919 values('rina',22,0);
insert into table demo0919 values('zhaoxi',25,1);

4. Create a second table, demo0919_1, from the original table demo0919, to compare data against.

create table demo0919_1 as select * from demo0919;

5. View the table data

select * from demo0919;

Testing the functions

in:

Simple use of in (OK, supported):

select name,age,sex from demo0919 where age in (18,22);

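in with a nested subquery (error, not supported in the environment tested; Hive documents uncorrelated in subqueries in the where clause from version 0.13 onward, so this depends on the Hive version)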

select name,age,sex from demo0919 where age in (select a.age from demo0919_1 a );
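Where the in subquery above is rejected, the same filter can usually be written as a left semi join. The following is a minimal sketch against the two demo tables, not the only possible rewrite; the aliases d and a are chosen here for illustration.

```sql
-- Keep the rows of demo0919 whose age also appears in demo0919_1.
-- Only columns of d may be selected; a is visible in the on clause only.
select d.name, d.age, d.sex
from demo0919 d
left semi join demo0919_1 a
  on (d.age = a.age);
```

Because demo0919_1 is an exact copy of demo0919, every age has a match and all five rows come back. A semi join returns each row of d at most once, which matches the behavior of in.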

not in :

Simple use of not in (OK, supported)

select name,age,sex from demo0919 where age not in (18,22);

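not in with a nested subquery (error, not supported in the environment tested; Hive documents uncorrelated not in subqueries in the where clause from version 0.13 onward, so this depends on the Hive version)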

select name,age,sex from demo0919 where age not in (select a.age from demo0919_1 a);
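Where the not in subquery above is rejected, a common substitute is an anti-join: a left outer join that keeps only the unmatched rows. This is a sketch under the assumption that age contains no NULL values in either table; with NULLs the two forms differ (not in returns no rows at all if the subquery yields a NULL, and it drops outer rows whose age is NULL, while the join below keeps them).

```sql
-- Keep the rows of demo0919 whose age does not appear in demo0919_1.
select d.name, d.age, d.sex
from demo0919 d
left outer join demo0919_1 a
  on d.age = a.age
where a.age is null;   -- no match was found on the right-hand side
```

With the sample data the result is empty, since demo0919_1 contains every age found in demo0919.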

exists:

Basic use of exists (OK)

select name,age,sex from demo0919 where exists (select 1 from demo0919_1 a where a.age=18 and demo0919.name = a.name);
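For comparison, the basic exists query above can also be expressed as a left semi join. This is an illustrative sketch rather than a required rewrite; the alias d is introduced here.

```sql
-- Same filter as the exists query: keep rows of demo0919 that have a
-- same-named row in demo0919_1 with age 18.
select d.name, d.age, d.sex
from demo0919 d
left semi join demo0919_1 a
  on (d.name = a.name and a.age = 18);
```

With the sample data both forms return the rows for zs and ls (in no guaranteed order).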

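exists where the subquery compares a column of the outer table demo0919 with a column of the subquery table using a non-equality operator (>, <, >=, <=, <>) (error, not supported)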

select name,age,sex from demo0919 where exists (select 1 from demo0919_1 a where a.age>demo0919.age and demo0919.name = a.name);

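Workaround:

In this situation we can use left outer join or left semi join to get a similar result. The former allows columns of the right-hand table to be referenced in the select or where clause; the latter does not.

(left semi join: note that columns of the right-hand table cannot be used anywhere else in the query; they may appear only in the on clause, as filter conditions.)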

select d.name,d.age,d.sex from demo0919 d left outer join demo0919_1 a on d.name = a.name where a.age>d.age;
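One caveat with the join rewrite above: a join emits one row per matching pair, whereas exists emits each outer row at most once. The sketch below is one way to get closer to the exists semantics, assuming that name identifies a row in demo0919 (an assumption; nothing in the table definition enforces it).

```sql
-- An inner join is enough here: the where clause on a.age already discards
-- the unmatched (NULL) rows that a left outer join would add.
select distinct d.name, d.age, d.sex
from demo0919 d
join demo0919_1 a
  on d.name = a.name      -- equality condition stays in the on clause
where a.age > d.age;      -- non-equality comparison moves to where
```

With the sample data the result is empty: demo0919_1 is a copy of demo0919, so for every name the two ages are equal and a.age > d.age never holds.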

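exists where the non-equality operator (>, <, >=, <=, <>) in the subquery does not involve a column of the outer table demo0919, for example a comparison against a constant (OK, supported)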

select name,age,sex from demo0919 where exists (select 1 from demo0919_1 a where a.age>18 and demo0919.name = a.name);

not exists behaves the same way as exists.
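As an illustration, here is the basic exists query from earlier with the condition negated. It is a sketch against the same demo tables.

```sql
-- Keep rows of demo0919 that have no same-named row in demo0919_1 with age 18.
select name, age, sex
from demo0919
where not exists (
  select 1
  from demo0919_1 a
  where a.age = 18
    and demo0919.name = a.name   -- equality correlation with the outer table
);
```

With the sample data this returns nisa, rina, and zhaoxi (in no guaranteed order), the complement of the exists result.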

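Hive data type conversion function:

daycount string; daycount holds elapsed-time data and was originally declared as type string

cast(daycount AS FLOAT) converts the string value to FLOAT; a string that cannot be parsed as a number yields NULL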
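A short sketch of the conversion follows. The literals are made up for illustration, and t_cost is a hypothetical table name standing in for whichever table holds daycount.

```sql
-- select without from needs Hive 0.13 or later
select cast('12.5' as float) as parsed_ok,   -- a numeric string converts
       cast('abc'  as float) as parsed_bad;  -- a non-numeric string yields NULL

-- Typical use on the column itself; t_cost is a hypothetical table.
select avg(cast(daycount as float)) as avg_daycount
from t_cost;
```

Casting before aggregating makes the numeric intent explicit instead of relying on implicit string-to-number conversion.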

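Hive date conversion functions:

unix_timestamp(countdate): converts a date to a Unix timestamp (seconds since 1970-01-01); countdate is a date column, and a string value is expected in 'yyyy-MM-dd HH:mm:ss' form by default

from_unixtime(unix_timestamp(countdate),'yyyy-MM-dd HH:mm:ss'): formats the value of countdate as a 'yyyy-MM-dd HH:mm:ss' string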
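The two functions can be tried on a literal before applying them to a column. This is a minimal sketch; the date string is an arbitrary example value.

```sql
-- select without from needs Hive 0.13 or later
select unix_timestamp('2019-09-19 12:12:12') as ts_seconds,
       from_unixtime(unix_timestamp('2019-09-19 12:12:12'),
                     'yyyy-MM-dd HH:mm:ss') as formatted;
```

The numeric value of ts_seconds depends on the time zone Hive uses for the conversion, so it is not shown here.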

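Hive group by: (This came up because our group by used a date column that includes hours, minutes, and seconds. Hive kept the values to millisecond precision, while our MySQL tables kept them only to the second, so running distinct or group by over data that includes the date column gives different results in the two systems.)

Because Hive keeps the millisecond part, its result contains more rows than the MySQL result.

For example: 2019-09-19 12:12:12.1 and 2019-09-19 12:12:12.2

In Hive, after distinct these are two different dates: 2019-09-19 12:12:12.1 and 2019-09-19 12:12:12.2

In MySQL, after distinct they are the same date: 2019-09-19 12:12:12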
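If the Hive result needs to line up with the MySQL one, one option is to cut the value down to whole seconds before deduplicating. The sketch below uses the two example values as string literals; it assumes a Hive version that allows select without from (0.13 or later).

```sql
-- Truncate to the first 19 characters ('yyyy-MM-dd HH:mm:ss') so that
-- values differing only in the fractional part collapse into one.
select distinct substr(ts, 1, 19) as ts_sec
from (
  select '2019-09-19 12:12:12.1' as ts
  union all
  select '2019-09-19 12:12:12.2' as ts
) t;
```

This returns the single value 2019-09-19 12:12:12. For a real timestamp column, the same idea would be substr(cast(countdate as string), 1, 19), used in both the select list and the group by.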
