Hive---外部分區表的建立
Hive相當于在hadoop之上加了一個SQL文法層,它提供了用戶端與接口供使用人員與hadoop間接互動,簡化了hadoop的使用。它接收SQL語句,轉換成MapReduce任務,供hadoop執行。而且SQL文法很規範哦,上手很簡單。
(1)假設有個分區表,資料如下:
hive> show create table partition_parquet;
OK
CREATE TABLE `partition_parquet`(
`member_id` string,
`name` string,
`add_item` string)
PARTITIONED BY (
`stat_date` string,
`province` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet'
TBLPROPERTIES (
'last_modified_by'='a6',
'last_modified_time'='1525229204',
'transient_lastDdlTime'='1525229204')
Time taken: 0.173 seconds, Fetched: 22 row(s)
部分資料如下:
hive> SELECT * FROM partition_parquet where stat_date='20110527' and province ='liaoning';
OK
1 liujiannan NULL 20110527 liaoning
2 wangchaoqun NULL 20110527 liaoning
3 xuhongxing NULL 20110527 liaoning
4 zhudaoyong NULL 20110527 liaoning
5 zhouchengyu NULL 20110527 liaoning
存儲目錄如下;
bogon:bin a6$ hadoop dfs -ls -R hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
18/06/23 19:34:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
drwxr-xr-x - a6 supergroup 0 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning
-rwxr-xr-x 1 a6 supergroup 437 2017-11-07 10:38 hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning/000000_0
(2)不好的例子——外部分區表的建立及資料導入
CREATE external TABLE `partition_external_parquet`(
`member_id` string,
`name` string,
`add_item` string)
PARTITIONED BY (
`stat_date` string,
`province` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning'
但是此時表中資料并沒有顯示,如下:
hive> SELECT * FROM partition_external_parquet;
OK
Time taken: 1.695 seconds
原因:沒有加入分區
接下來我們加入分區.
hive> alter table partition_external_parquet add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';
OK
Time taken: 0.836 seconds
在此檢視資料:
hive> SELECT * FROM partition_external_parquet;
OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
1 liujiannan NULL 20110527 liaoning
2 wangchaoqun NULL 20110527 liaoning
3 xuhongxing NULL 20110527 liaoning
4 zhudaoyong NULL 20110527 liaoning
5 zhouchengyu NULL 20110527 liaoning
Time taken: 1.474 seconds, Fetched: 5 row(s)
(3)良好的例子——外部分區表的建立及資料導入
hive> create external table if not exists partition_external_parquet like partition_parquet;
OK
Time taken: 0.106 seconds
hive> show create table partition_external_parquet2;
OK
CREATE EXTERNAL TABLE `partition_external_parquet2`(
`member_id` string,
`name` string,
`add_item` string)
PARTITIONED BY (
`stat_date` string,
`province` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
'field.delim'='\t',
'serialization.format'='\t')
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_external_parquet2'
TBLPROPERTIES (
'transient_lastDdlTime'='1529753081')
Time taken: 0.057 seconds, Fetched: 20 row(s)
以靜态全靜态分區的形式導入資料
hive> alter table partition_external_parquet2 add PARTITION(stat_date='20110527',province='liaoning') location 'hdfs://localhost:9002/user/hive/warehouse/yyz_workdb.db/partition_parquet/stat_date=20110527/province=liaoning';
OK
Time taken: 0.078 seconds
hive> select * from partition_external_parquet2;
OK
1 liujiannan NULL 20110527 liaoning
2 wangchaoqun NULL 20110527 liaoning
3 xuhongxing NULL 20110527 liaoning
4 zhudaoyong NULL 20110527 liaoning
5 zhouchengyu NULL 20110527 liaoning
Time taken: 0.133 seconds, Fetched: 5 row(s)
參考:https://blog.csdn.net/a2011480169/article/details/51991421