Hive常用指令

Hive中定義變量

内置命名空間

Hive内置命名空間包含了hivevar、hiveconf、system和env。

2020-11-20Hive常用指令

在Hive中寫入hivevar變量

hive --define/--hivevar key=value

顯示變量

set env:HOME
set hivevar:key
set key

給變量指派

set key=value
set hivevar:key=value

在sql語句中調用變量

create table table_name(i int, ${hivevar:key} string)
create table table_name(i int, ${key} string)

隻執行一次的Hive腳本

echo "one row" > /root/path/myfile 
hive -e "load data local inpath '/root/path/myfile' into table table_name"

在Hive中執行shell指令

hive>!pwd

在Hive中使用hadoop

hive>dfs -ls

JDBC時間和UTC時間之間的互相轉換

指定了格式的轉換

from_unixtime(unix_timestamp('20180930',"yyyyMMdd"),'yyyyMMdd')

為空生成目前的unix時間戳

hive> select unix_timestamp();
unix_timestamp(void) is deprecated. Use current_timestamp instead.
OK
1547438004

轉換成目前的UTC時間戳

hive> select from_unixtime(unix_timestamp());
unix_timestamp(void) is deprecated. Use current_timestamp instead.
OK
2019-01-14 13:27:12

UTC時間格式為

YYYY-MM-DD hh:mm:ss.ffffffff

集合資料類型

struct：需要事前定義好格式
map: 類似于struct，但不是事前定義好格式的
array：數組資料類型

create table employees( 
name string, 
salary float, 
subordinates array, 
deductions map<string, float>, 
address struct<street:string,city:string,state:string,zip:int> 
)

Hive資料庫的基本操作

指定Hive資料庫的路徑

create database database_name
location '/my/preferred/directory'

檢視Hive資料庫的路徑

describe database database_name

建立Hive資料庫的擴充資訊

create database database_name
WITH DBPROPERTIES ('creator' = 'name', 'date' = '2019-01-09');

檢視所建立的擴充資訊

describe database extended database_name

修改資料庫屬性

ALTER DATABASE database_name SET DBPROPERTIES ('edited-by' = 'tjm', 'date' = '2019-01-09')

Hive資料表的基本操作

表的建立

create table if not exists employee(
name string comment 'Employee name',
salary float comment 'Employee salary',
subordinates array<string> comment 'Names of subordinates',
deductions map<string, float> comment 'Keys are deductions name,values are percnetages',
address struct<street:string, city:string, state:string,zip:int> comment 'Home address')
COMMENT 'Description of the table'
TBLPROPERTIES ('creator'='name', 'date'='2019-01-09')

用複制表結構的方式建立表

create table if not exists employee2
LIKE employee

檢視所建立表的基本資訊

關鍵字：show、describe、formatted、extended

show tables in database_name  --檢視指定庫内的所有表

describe employees  --描述表的字段資訊

describe extended employees  --顯示所建立表的擴充資訊

describe formatted employee2  --最全面的資訊檢視方式，能夠顯示出最多的資訊

show create table employee  --顯示建表時候的建表腳本

show tables 'empl.*' --通過模糊比對的方式進行查詢

使用外部表來建表

外部表的重要作用是當我們對一張表進行删除的時候，僅僅對表結構進行删除，不會影響到建表時使用到的原始資料。

關鍵字：external

create external table if not exists stocks (
exchange string,
symbol string,
ymd string,
price_open float,
price_high float,
price_low float,
price_close float,
volume int,
price_adj_close float)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

使用分區來管理表

關鍵字：partitioned by

create table employees(
name string,
salary float,
subordinates array<string>,
deductions map<string, float>,
address struct<street:string, city:string, state:string, zip:int>
)
PARTITIONED BY (country string, state string);

通過分區來載入資料

關鍵字：load、local

load data local inpath '${env:HOME}/directory' into table employees
partition (country='US',state='CA')

表重命名

關鍵字：alter

alter table table_name rename to new_table_name;

增加、修改和删除表分區

增加表分區

alter table table_name add if not exists 
partition(year=2011, month=1, day=1) location '/logs/2011/01/01' 
partition(year=2011, month=1, day=2) location '/logs/2011/01/02'
partition(year=2011, month=1, day=3) location '/logs/2011/01/03';

修改表分區

alter table table_name partition(year=2011, month=12, day=2)
set location 'hdfs://ip_address/logs/2011/01/02';

删除表分區

alter table table_name drop if exists 
partition(year=2011, month=1, day=1);

增加和修改表的列資訊

修改列

alter table employees
change column name new_name string comment 'change to  new column name'
first --first替換後放到最前，如果使用after，那麼緊接列名，表示移動到這個字段之後

增加列

alter table employees add columns(
app_name string comment 'application name',
session_id bigint comment 'session_id name'
)

更新表的屬性

alter table employees set tblproperties(
'creator'='name',
'date'='2019-01-10'
)

操作完成後舊表的屬性會被新表的屬性所覆寫。

資料的導入

向已經建立的表中導入資料

指定的路徑是一個目錄或者檔案，如果建立的表不是分區表那麼就不需要添加partition。導入使用了overwrite字段的話，如果要導入的表在資料庫中不存在，那麼首先進行建立，如果表在資料庫中已經存在，就進行重寫，如果沒有overwrite關鍵字，僅僅是把新檔案增加到目标檔案夾中而不會删除之前的資料。

load data local inpath '${env:HOME}/directory' overwrite into table employees
partition (country='US',state='CA')

load data local inpath '${env:HOME}/directory' into table employees
partition (country='US',state='CA')

通過查詢語句向表中插入資料

insert overwrite table employees
partition (country='US',state='CA')
select * from staged_employees se
where se.city='US' and se.st='CA'

insert into table employees
partition (country='US',state='CA')
select * from staged_employees se
where se.city='US' and se.st='CA';

這裡可以選用overwrite或者into，這兩個字段的意義和之前所說的一樣。

一次掃描完成全部分區資料的插入

都從原表讀取資料的情況。

insert overwrite table sales select * from history where action='purchased'
insert overwrite table credits select * from history where action='returned'

隻進行一次查詢的情況。

from history 
insert overwrite table sales select * where action='purchased'
insert overwrite table credits select * where action='returned'

差別于對資料完成一次掃描然後進行一次資料插入，我們還可以在進行一次掃描之後完成我們想要的全部資料的插入。

使用單個查詢語句建立表并導入資料

關鍵字：create...as...

create table table_name as 'select no,age from
use_table where age>50'

資料的導出

使用hadoop指令

hadoop fs -cp source_path target_path

這個指令适用于同一個伺服器之間的資料拷貝，如果是不同的伺服器之間通過hadoop來進行資料的拷貝，如下：

hadoop fs -get file_path --原伺服器
hadoop fs -put file_path --目标伺服器

使用insert...directory...導出檔案到指定目錄

insert overwrite/into local directory '/tmp/ca_employees'
select name.salary,address 
from table_name 
where se.cnty-''US and se.st='OR';

同時對多個檔案進行導出

from staged_employees se
insert overwrite/into local directory '/tmp/or_employees'
select *  where se.city='US' and se.st='OR';
insert overwrite/into local directory '/tmp/ca_employees'
select *  where se.city='US' and se.st='CA';

Hive中的類型轉換

使用關鍵字cast進行資料類型的轉換。

hive> select cast(anum as float) from littlebigdata;
OK
NULL
2.0

此例是将整型資料轉換為了浮點型資料。

Hive中的抽樣查詢

随機抽樣

select * from table_name tablesample(bucket 3 out of 10 on rand()) s;

10表示對整個資料劃分出的bucket個數，3表示從第三個bucket開始抽樣。

基于百分比的資料塊抽樣

select * from table_name tablesample(0.1 percent) s;

注：這種抽樣并不适用于所有檔案的格式。

Hive建立查詢視圖

使用Hive的視圖，可以避免select語句的多重嵌套，增加可讀性。

不使用視圖的腳本

select a.head_no,a.bank_name from (select no, name from table_name) a limit 2;

視圖腳本

關鍵字：create view

create view table_view as (select no,name from table_name)  --建立視圖

select no, name from table_view limit 2  --基于視圖的查詢

檢視Hive函數的描述

發現所有函數

hive> show functions;
OK
!
!=
$sum0
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
aes_decrypt
.
.
.

檢視指定函數的描述

hive> describe function abs;
OK
abs(x) - returns the absolute value of x

檢視帶有執行個體的更詳細的描述

hive> describe function extended abs;
OK
abs(x) - returns the absolute value of x
Example:
  > SELECT abs(0) FROM src LIMIT 1;
  0
  > SELECT abs(-5) FROM src LIMIT 1;
  5
Function class:org.apache.hadoop.hive.ql.udf.generic.GenericUDFAbs
Function type:BUILTIN

表生成函數

hive> select array(1,2,3);
OK
[1,2,3]

hive> select explode(array(1,2,3));
OK
1
2
3

自定義Hive檔案和記錄格式

修改如下語句，替代為指定的格式以更改存儲格式。

STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

記錄格式：SerDe

SerDe是序列化、反序列化的簡寫，一個SerDe是将一條記錄的非結構化位元組轉化成Hive可以使用的一條記錄的過程。

以ORC檔案格式進行存儲

ORC格式：

ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

參考：《Hive程式設計指南》

2020-11-20Hive常用指令

Hive常用指令

Hive中定義變量

隻執行一次的Hive腳本

在Hive中執行shell指令

在Hive中使用hadoop

JDBC時間和UTC時間之間的互相轉換

集合資料類型

Hive資料庫的基本操作

Hive資料表的基本操作

資料的導入

資料的導出

Hive中的類型轉換

Hive中的抽樣查詢

Hive建立查詢視圖

檢視Hive函數的描述

自定義Hive檔案和記錄格式

繼續閱讀

luogu1078 文化之旅

Hadoop離線_Hive的基本操作

Hive中内部表、外部表、分區、分桶以及SQL的執行順序

Hive中的内部表外部表及分區表

Hive---外部分區表的建立

Hive學習筆記 3 Hive的資料模型：内部表、分區表、外部表、桶表、視圖

HiveQL(二):分區表

Hive的分區表入門

Hive的分區表

Hive（二）--分區分桶，内部表外部表

大資料高頻面試題之Hive的小檔案合并

世界因大資料而改變

hive sql通過具體位址解析出行政區劃(省＞市＞區＞縣＞鄉＞鎮＞村)

Hive最全常見錯誤及解決方案hive --service metastore &

《Hive權威指南》第八章：HiveQL索引8 HiveQL：索引

HiveQl語句應用執行個體：WordCount具體步驟如下：