Hadoop Data Storage Formats

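Storage formats generally fall into two categories: columnar and row-oriented. In Hadoop, the columnar formats include Parquet, RCFile, and ORCFile, while the row-oriented formats include SequenceFile, MapFile, and Avro Datafile.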
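As a rough illustration, most of these formats can be selected in Hive with the `STORED AS` clause. The sketch below declares one table per format; the table and column names (`demo_seq`, `id`, `name`, and so on) are hypothetical, and the version notes in the comments are assumptions about the Hive release in use. MapFile is left out because Hive has no `STORED AS` keyword for it.

```sql
-- Hypothetical tables; names and columns are for illustration only.

-- Row-oriented formats
CREATE TABLE demo_seq  (id INT, name STRING) STORED AS SEQUENCEFILE;
CREATE TABLE demo_avro (id INT, name STRING) STORED AS AVRO;        -- assumes Hive 0.14+

-- Columnar formats
CREATE TABLE demo_rc      (id INT, name STRING) STORED AS RCFILE;
CREATE TABLE demo_orc     (id INT, name STRING) STORED AS ORC;      -- assumes Hive 0.11+
CREATE TABLE demo_parquet (id INT, name STRING) STORED AS PARQUET;  -- assumes Hive 0.13+
```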

Usage in Hive

1. ORC

create table test_orc(
 ...
 )
 PARTITIONED BY (day int)
 STORED AS ORC
 LOCATION '/test/test_orc/'
 tblproperties ("orc.compress"="SNAPPY");

If no codec is specified, the compression defaults to ZLIB, i.e. tblproperties ("orc.compress"="ZLIB").
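To check which codec a table was declared with, or to change it later, something like the following should work. This is a sketch that assumes the `test_orc` table from this section exists. Changing the property is expected to affect only data written afterwards; files already on disk keep their original compression.

```sql
-- List the table properties, including orc.compress if it was set explicitly
SHOW TBLPROPERTIES test_orc;

-- Switch the codec for future writes (valid values include NONE, ZLIB, SNAPPY)
ALTER TABLE test_orc SET TBLPROPERTIES ("orc.compress"="ZLIB");
```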

2. Parquet

create table test_parquet(
 ...
 )
 PARTITIONED BY (day int)
 STORED AS parquet
 LOCATION '/test/test_parquet/';
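A Parquet table can be given a compression codec in a similar way. The sketch below assumes the `parquet.compression` table property is honored by the Hive version in use; on some versions the codec may instead need to be set at session level with `SET parquet.compression=SNAPPY;`. The table name and columns are hypothetical.

```sql
-- Hypothetical table; the name and columns are placeholders for illustration
CREATE TABLE test_parquet_snappy (
  id INT,
  name STRING
)
PARTITIONED BY (day INT)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="SNAPPY");
```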

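A file format such as ORC can be selected in three ways: when creating a table, by changing the format of an existing table, or as the default for the session:

CREATE TABLE … STORED AS ORC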

ALTER TABLE … SET FILEFORMAT ORC

SET hive.default.fileformat=ORC
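Note that `ALTER TABLE … SET FILEFORMAT` only changes the table metadata; files already written are not converted. One common way to convert existing data is to copy it into a new ORC table with a CREATE TABLE AS SELECT statement. A minimal sketch, where `src_text` and `src_orc` are hypothetical table names:

```sql
-- src_text is a hypothetical existing table stored in another format (e.g. text).
-- The SELECT rewrites its rows into new ORC files owned by src_orc.
CREATE TABLE src_orc
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
AS
SELECT * FROM src_text;
```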

References:

http://blog.csdn.net/bingduanlbd/article/details/52088520

https://www.cnblogs.com/zhenjing/archive/2012/11/02/File-Format.html
