
[Hive] Hive User Guide Part 4: Loading Data from the Client

Copyright notice: this is an original article by the blogger and may not be reproduced without permission. https://blog.csdn.net/SunnyYoona/article/details/52935649

Depending on where the data comes from, this article covers the following loading methods:

  • Loading data into a Hive table from the local file system
  • Loading data into a Hive table from HDFS
  • Querying data from another table and inserting the results into a Hive table

1. Loading Data from the Local File System into a Hive Table

1.1 Loading into a Regular Hive Table

1.1.1 Create a regular Hive table
CREATE TABLE IF NOT EXISTS order_uid_total(
  uid string,
  bucket_type string,
  file_name string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
           
1.1.2 Load the data
load data local inpath '/home/xiaosi/adv/order_uid_total.txt' into table order_uid_total;
           
1.1.3 View the data
hive (test)> select * from order_uid_total where bucket_type = 'put' limit 10;
OK
868 put uids.3_1
865 put uids.3_1
DC8 put uids.3_1
861 put uids.3_1
867 put uids.3_1
861 put uids.3_1
868 put uids.3_1
867 put uids.3_1
867 put uids.3_1
A00 put uids.3_1
           

1.2 Loading into a Partitioned Hive Table

1.2.1 Create a partitioned table
CREATE TABLE IF NOT EXISTS order_uid_total_partition(
  uid string
)
PARTITIONED BY(bucket_type string, file_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
           
1.2.2 Static partition load
load data local inpath '/home/xiaosi/adv/order_uid_total_partition.txt' into table order_uid_total_partition partition(bucket_type='put',file_name='uids_3_1');
           
1.2.3 Dynamic partition load

If a table has many partitions, the static approach above requires writing a separate load statement for every partition, and each query must target the right one, which quickly becomes tedious. To solve this, Hive provides dynamic partition inserts: partitions are created automatically from the values of the partition columns, and each row is inserted into the corresponding partition:

insert overwrite table order_uid_total_partition
partition (bucket_type, file_name)
select uid, bucket_type, file_name from order_uid_total;
           

If you run the statement above directly, Hive reports the following error:

FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
           

The error message tells us to change the dynamic partition mode:

set hive.exec.dynamic.partition.mode=nonstrict;
           

The default value is strict; after changing the setting, run the insert again and it will succeed.
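Putting the steps together, a complete session might look like the sketch below. It reuses the tables defined above; the `SHOW PARTITIONS` statement at the end is only there to verify the result, and `hive.exec.dynamic.partition=true` is included for older Hive versions where dynamic partitioning is not enabled by default.

```sql
-- Allow fully dynamic partition inserts (the default mode is strict)
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns (bucket_type, file_name) must come last in the SELECT
INSERT OVERWRITE TABLE order_uid_total_partition
PARTITION (bucket_type, file_name)
SELECT uid, bucket_type, file_name FROM order_uid_total;

-- Verify which partitions were created
SHOW PARTITIONS order_uid_total_partition;
```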

Note

The SELECT statement must include the partition columns, and they must appear last, in the order declared in PARTITIONED BY.
           

2. Loading Data from HDFS into a Hive Table

從本地檔案系統中将資料導入到Hive表的過程中,其實是先将資料臨時複制到HDFS的一個目錄下,然後再将資料從那個臨時目錄下移動到對應的Hive表的資料目錄裡面。是以,我們可以直接從HDFS上的一個目錄移動到相應Hive表的資料目錄下,假設HDFS上有下面這個檔案

data/order_uid_total.txt

,具體的操作如下:

load data inpath 'data/order_uid_total.txt' overwrite into table order_uid_total;
           

Compared with loading from a local file, the only difference is the absence of the LOCAL keyword: with LOCAL, the path refers to the local file system; without it, the path refers to HDFS.
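The examples above also differ in a second way: INTO appends to the table's existing data, while OVERWRITE replaces it. A sketch of both forms, using the HDFS path from the example above:

```sql
-- Append: existing files in the table's data directory are kept
LOAD DATA INPATH 'data/order_uid_total.txt' INTO TABLE order_uid_total;

-- Replace: the table's existing contents are deleted first
LOAD DATA INPATH 'data/order_uid_total.txt' OVERWRITE INTO TABLE order_uid_total;
```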

3. Inserting Query Results from Another Table into a Hive Table

3.1 Regular Hive Table

Create a regular Hive table:

CREATE TABLE IF NOT EXISTS order_uid_total(
  uid string,
  bucket_type string,
  file_name string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
           

Insert query results into the Hive table:

insert overwrite table order_uid_total
select uid, bucket_type, file_name from order_uid_total_partition;
           

3.2 Partitioned Hive Table

Create a partitioned Hive table:

CREATE TABLE IF NOT EXISTS order_uid_total_partition(
  uid string
)
PARTITIONED BY(bucket_type string, file_name string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
           
Insert query results into the partitioned table (this is a dynamic partition insert, so hive.exec.dynamic.partition.mode must be set to nonstrict as described in section 1.2.3):

insert overwrite table order_uid_total_partition
partition (bucket_type, file_name)
select uid, bucket_type, file_name from order_uid_total;
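As a variation, the PARTITION clause can mix static and dynamic columns. Because at least one leading partition column is static, the sketch below also works in the default strict mode; it reuses the tables defined above.

```sql
-- bucket_type is static, file_name is dynamic.
-- Static partition columns take their value from the PARTITION clause
-- and must not appear in the SELECT list; dynamic ones come last.
INSERT OVERWRITE TABLE order_uid_total_partition
PARTITION (bucket_type = 'put', file_name)
SELECT uid, file_name
FROM order_uid_total
WHERE bucket_type = 'put';
```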