Data Import (Part 1): Hive on HBase

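Integrating Hive with HBase lets Hive make effective use of HBase's storage features, such as row-level updates and column indexing. During integration, take care to keep the HBase jar versions consistent across installations. The integration works by having Hive and HBase talk to each other through their public APIs, and that communication relies mainly on the utility classes in hive-hbase-handler.jar.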

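The steps for integrating Hive with HBase are as follows:

1. Copy hbase-common-0.96.2-hadoop2.jar and zookeeper-3.4.5.jar from under HBASE_HOME into HIVE_HOME/lib, overwriting any existing copies.

2. Edit hive-site.xml under HIVE_HOME/conf and add the following (the paths and jar versions are examples; change them to match your installation):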

<property>
  <!-- Directory where Hive writes its query logs -->
  <name>hive.querylog.location</name>
  <value>$HIVE_HOME/logs</value>
</property>

<property>
  <!-- Extra jars added to Hive's classpath; paths and versions are examples -->
  <name>hive.aux.jars.path</name>
  <value>file:///hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,file:///hive-0.7.1/lib/hbase-common-0.96.2-hadoop2.jar,file:///hive-0.7.1/lib/zookeeper-3.3.2.jar</value>
</property>
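
The jar names in hive.aux.jars.path have to match the files that are actually on disk, which is easy to get wrong when versions differ between installations. As a rough sketch only (the helper name aux_jars_value and the three jar-name prefixes are assumptions made for illustration, not part of Hive), the value could be generated from a lib directory like this:

```python
from pathlib import Path

# Assumed jar-name prefixes, taken from the jars named in this article.
REQUIRED_PREFIXES = ("hive-hbase-handler", "hbase-common", "zookeeper")


def aux_jars_value(lib_dir, prefixes=REQUIRED_PREFIXES):
    """Return a comma-separated list of file:// URIs for hive.aux.jars.path.

    Hypothetical helper: it expects exactly one jar per prefix in lib_dir and
    raises ValueError otherwise, so missing or duplicated versions surface
    before Hive is started.
    """
    lib = Path(lib_dir)
    uris = []
    for prefix in prefixes:
        matches = sorted(lib.glob(prefix + "-*.jar"))
        if len(matches) != 1:
            raise ValueError(
                f"expected exactly one {prefix} jar in {lib}, found {len(matches)}"
            )
        # as_uri() needs an absolute path, hence resolve().
        uris.append(matches[0].resolve().as_uri())
    return ",".join(uris)
```

Calling aux_jars_value on HIVE_HOME/lib would give a string that can be pasted into the <value> element; the check for exactly one match per prefix is a deliberate choice so that two ZooKeeper versions sitting side by side are reported instead of silently picked from.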

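3. Copy hbase-common-0.96.2-hadoop2.jar into hadoop/lib on every Hadoop node (including the master).

4. Copy hbase-site.xml from hbase/conf into hadoop/conf on every Hadoop node (including the master).

Note: if steps 3 and 4 are skipped, running Hive is likely to fail with an error like the following: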

org.apache.hadoop.hbase.ZooKeeperConnectionException: HBase is able to connect to ZooKeeper but the connection closes immediately. This could be a sign that the server has too many connections (30 is the default). Consider inspecting your ZK server logs for that error and then make sure you are reusing HBaseConfiguration as often as you can. See HTable's javadoc for more information.
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.
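
Steps 3 and 4 have to be repeated on every node, so it can help to generate the copy commands rather than type them. The following is a minimal sketch under stated assumptions: the function name distribution_commands is made up for illustration, the jar is assumed to sit in HBASE_HOME/lib, and scp is assumed to be how files reach the nodes. It only builds the command strings and does not run them.

```python
import shlex


def distribution_commands(nodes, hbase_home, hadoop_home):
    """Build scp commands that copy the HBase jar and hbase-site.xml to each node.

    Hypothetical helper. The source locations (hbase_home/lib and
    hbase_home/conf) are assumptions about the local layout.
    """
    files = [
        (f"{hbase_home}/lib/hbase-common-0.96.2-hadoop2.jar", f"{hadoop_home}/lib/"),
        (f"{hbase_home}/conf/hbase-site.xml", f"{hadoop_home}/conf/"),
    ]
    return [
        f"scp {shlex.quote(src)} {shlex.quote(node + ':' + dest)}"
        for node in nodes
        for src, dest in files
    ]


if __name__ == "__main__":
    # Example node names; replace with the real Hadoop hosts, master included.
    for cmd in distribution_commands(["master", "node1"], "/opt/hbase", "/opt/hadoop"):
        print(cmd)
```

Printing the commands first makes it easy to review them, or to pipe them into a shell once the host list and paths look right.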

5. Start Hive

Single node: bin/hive -hiveconf hbase.master=master:60000

If hive.aux.jars.path is not configured in hive-site.xml, Hive can be started as follows instead.

hive --auxpath /opt/mapr/hive/hive-0.7.1/lib/hive-hbase-handler-0.7.1.jar,/opt/mapr/hive/hive-0.7.1/lib/hbase-0.90.4.jar,/opt/mapr/hive/hive-0.7.1/lib/zookeeper-3.3.2.jar -hiveconf hbase.master=localhost:60000

Cluster: bin/hive -hiveconf hbase.zookeeper.quorum=node1,node2,node3 (list all ZooKeeper nodes)

In testing, once the following was added to hive-site.xml, Hive could be started against HBase without any extra startup parameters:

<property>
<name>hive.zookeeper.quorum</name>
<value>node1,node2,node3</value>
<description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
</property>      

6. Test after startup

(1) Create the HBase table hbase_student

hbase> create 'hbase_student', 'info'      

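(2) Create the Hive external table hive_student and map it to the hbase_student table

Integrating Hive with HBase requires a mapping between the Hive table and the HBase table: the Hive table's columns and column types are associated with the HBase table's column families and column qualifiers.

Every field in the Hive table must exist in HBase, whereas the Hive table does not need to include every column in HBase.

The HBase row key is exposed by choosing one Hive field and mapping it to :key; a column inside a column family is written as cf:q (column family, then qualifier).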

CREATE EXTERNAL TABLE hive_student (rowkey string, name string, age int, phone string)
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:name,info:age,info:phone")
    TBLPROPERTIES("hbase.table.name" = "hbase_student");       
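
The hbase.columns.mapping string is positional: its entries line up one-to-one with the Hive columns, and exactly one entry is :key. A small sketch of that rule follows; the function build_columns_mapping is hypothetical and checks only the simple :key and family:qualifier forms used in this article, not every form Hive accepts.

```python
def build_columns_mapping(columns):
    """Build an hbase.columns.mapping value from (hive_column, hbase_target) pairs.

    Hypothetical helper. Pairs must be given in Hive column order. Each target
    is either ":key" or "family:qualifier"; other mapping forms are rejected
    here for simplicity.
    """
    targets = [target for _, target in columns]
    if targets.count(":key") != 1:
        raise ValueError("exactly one Hive column must map to :key")
    for hive_column, target in columns:
        if target == ":key":
            continue
        family, sep, qualifier = target.partition(":")
        if not (family and sep and qualifier):
            raise ValueError(f"{hive_column}: expected family:qualifier, got {target!r}")
    # Entries are joined with bare commas, with no spaces between them.
    return ",".join(targets)


if __name__ == "__main__":
    mapping = build_columns_mapping([
        ("rowkey", ":key"),
        ("name", "info:name"),
        ("age", "info:age"),
        ("phone", "info:phone"),
    ])
    print(mapping)
```

For the hive_student columns this yields the same string as in the CREATE EXTERNAL TABLE statement above, so the helper can serve as a quick consistency check when columns are added or reordered.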

7. Data import and verification:

(1) Create the external data table data_student

CREATE EXTERNAL TABLE data_student (rowkey string, name string, age int, phone string)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' 
  LOCATION '/test/hbase/tsv/input/';       
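
data_student expects tab-delimited text files with four fields per line, in the order rowkey, name, age, phone. As an illustration only (the function to_tsv and the sample rows are made up, not data from the original post), such a file could be produced like this:

```python
def to_tsv(rows):
    """Format rows as tab-delimited lines matching the data_student columns.

    Hypothetical helper. Each row must have four values (rowkey, name, age,
    phone); values containing a tab or newline are rejected because they would
    break the FIELDS TERMINATED BY '\\t' layout.
    """
    lines = []
    for row in rows:
        if len(row) != 4:
            raise ValueError(f"expected 4 fields, got {len(row)}: {row!r}")
        fields = [str(value) for value in row]
        if any("\t" in f or "\n" in f for f in fields):
            raise ValueError(f"field contains a tab or newline: {row!r}")
        lines.append("\t".join(fields))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Made-up sample rows for testing the import.
    sample = [("1001", "Alice", 20, "555-0100"), ("1002", "Bob", 22, "555-0101")]
    with open("students.tsv", "w", encoding="utf-8") as out:
        out.write(to_tsv(sample))
```

The resulting file would then need to be uploaded to the table's LOCATION directory, /test/hbase/tsv/input/, for example with hadoop fs -put, before the INSERT in the next step has anything to read.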

(2) Import the data into the hbase_student table through hive_student

SET hive.hbase.bulk=true;
INSERT OVERWRITE TABLE hive_student SELECT rowkey, name, age, phone FROM data_student;      

Note: if you hit a java.lang.IllegalArgumentException: Property value must not be null exception, Hive 0.13.0 or later is required.

Reposted from: https://www.cnblogs.com/skyl/p/4849163.html