
<a href="#_Toc27518%20">目錄 1</a>
<a href="#_Toc26557%20">1. 前言 1</a>
<a href="#_Toc28650%20">2. 約定 1</a>
<a href="#_Toc12246%20">3. 服務端口 2</a>
<a href="#_Toc14634%20">4. 安裝MySQL 2</a>
<a href="#_Toc2703%20">4.1. 安裝MySQL 2</a>
<a href="#_Toc13359%20">4.2. 建立Hive中繼資料庫 2</a>
<a href="#_Toc27670%20">5. 安裝步驟 3</a>
<a href="#_Toc15229%20">5.1. 下載下傳Hive 0.12.0二進制安裝包 3</a>
<a href="#_Toc26530%20">5.2. 安裝Hive 3</a>
<a href="#_Toc21777%20">5.3. 安裝MySQL-Connector 3</a>
<a href="#_Toc13404%20">5.4. 修改配置 3</a>
<a href="#_Toc26662%20">5.4.1. 修改/etc/profile或~/.profile 3</a>
<a href="#_Toc15994%20">5.4.2. 修改其它配置檔案 4</a>
<a href="#_Toc29270%20">5.4.2.1. 修改hive-env.sh 4</a>
<a href="#_Toc20070%20">5.4.2.2. 修改hive-site.xml 4</a>
<a href="#_Toc16058%20">5.4.2.3. 修改hive-log4j.properties 5</a>
<a href="#_Toc24036%20">5.4.2.4. 修改hive-exec-log4j.properties 5</a>
<a href="#_Toc30437%20">6. 啟動運作 5</a>
<a href="#_Toc13199%20">7. 遠端執行HSQL 6</a>
<a href="#_Toc25851%20">8. 基本指令 6</a>
<a href="#_Toc24365%20">9. 常見錯誤 7</a>
1. Preface

For the installation of Hadoop 2.4.0, see the companion article "Hadoop-2.4.0 Distributed Installation Guide".
2. Conventions

This article assumes Hadoop is installed under /data/hadoop/current and installs Hive 0.12.0 into /data/hadoop/hive (actually a symlink pointing to /data/hadoop/hive-0.12.0-bin). In a real deployment you may choose other directories.
3. Service Ports

Port    Configuration item           Notes
10000   hive.server2.thrift.port     opened when hiveserver2 is run
9083    hive.metastore.uris          opened when "hive --service metastore" is run
4. Installing MySQL

4.1. Installing MySQL

After unpacking the binary package you will find a file named INSTALL-BINARY that explains how to install MySQL; this article largely follows it, installing MySQL into /data/hadoop/mysql. The steps are as follows (no dedicated mysql user or mysql group is used; everything runs as the current login user hadoop, which belongs to the group users):
# All of the following are executed as the current user, hadoop
cd /data/hadoop
tar xzf mysql-5.6.17-linux-glibc2.5-x86_64.tar.gz
ln -s mysql-5.6.17-linux-glibc2.5-x86_64 mysql
cd mysql
scripts/mysql_install_db --user=hadoop
bin/mysqld_safe --user=hadoop &
cp support-files/mysql.server /etc/init.d/mysql.server  # this one must be run as root
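
To confirm that mysqld came up, a quick sanity check (a sketch; with a fresh tarball install the root account initially has an empty password and the default socket is /tmp/mysql.sock, so adjust if your my.cnf differs):

bin/mysql -u root --socket=/tmp/mysql.sock -e 'SELECT VERSION();'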
4.2. Creating the Hive Metastore Database

Create the database hive:
create database if not exists hive;
Create the database user hive:
create user hive identified by 'hive2014';
Grant the user access to the hive database from the following IP addresses, where the actual IP behind localhost is 172.25.39.166:
grant all on hive.* to 'hive'@'localhost' identified by 'hive2014';
grant all on hive.* to 'hive'@'172.25.39.166' identified by 'hive2014';
grant all on hive.* to 'hive'@'172.25.40.171' identified by 'hive2014';
Connect to the hive database:
1) From the local machine: mysql -uhive -phive2014
2) From a remote machine: mysql -uhive -h172.25.39.166 -phive2014
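
To verify that the grants above are in place, a quick check from the mysql client (an optional verification step):

SHOW GRANTS FOR 'hive'@'172.25.40.171';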
5. Installation Steps

5.1. Downloading the Hive 0.12.0 Binary Package

Download hive-0.12.0-bin.tar.gz from the Apache Hive download site.

5.2. Installing Hive

1) Change to the installation directory (per the conventions in section 2): cd /data/hadoop
2) Unpack the binary package: tar xf hive-0.12.0-bin.tar.gz
3) Create the symlink: ln -s hive-0.12.0-bin hive
5.3. Installing the MySQL Connector

From the MySQL download site choose "Connector/J", then "Platform Independent"; this article uses "mysql-connector-java-5.1.30.tar.gz".

The archive contains mysql-connector-java-5.1.30-bin.jar, which is MySQL's JDBC driver. After unpacking, copy mysql-connector-java-5.1.30-bin.jar into Hive's lib directory, as sketched below.
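A minimal sketch of the unpack-and-copy step, assuming the archive sits in the current directory and Hive lives at /data/hadoop/hive:

tar xzf mysql-connector-java-5.1.30.tar.gz
cp mysql-connector-java-5.1.30/mysql-connector-java-5.1.30-bin.jar /data/hadoop/hive/lib/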
5.4. Modifying the Configuration

5.4.1. Modifying /etc/profile or ~/.profile

Set the HIVE_HOME environment variable and add Hive to the PATH:
export HIVE_HOME=/data/hadoop/hive
export PATH=$HIVE_HOME/bin:$PATH
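
After editing, reload the profile so the change takes effect in the current shell, for example:

source ~/.profile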
5.4.2. Modifying the Other Configuration Files

Enter the /data/hadoop/hive/conf directory, which contains the following:

hadoop@VM-40-171-sles10-64:~/hive/conf> ls
hive-default.xml.template    hive-exec-log4j.properties.template
hive-env.sh.template         hive-log4j.properties.template

These are four template files; copy and rename them into the actual configuration files:
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j.properties.template hive-log4j.properties
cp hive-exec-log4j.properties.template hive-exec-log4j.properties
5.4.2.1. Modifying hive-env.sh

If the HADOOP_HOME environment variable has not been set elsewhere, you can set it in hive-env.sh:
HADOOP_HOME=/data/hadoop/current
1) 修複BUG
該檔案有個文法BUG,需要修改,進入到hive-site.xml的第2000行,該行内容為:auth,明顯的文法錯誤,需要将“auth”改成“value”。
2) 修改javax.jdo.option.ConnectionURL
将值設定為:jdbc:mysql://172.25.39.166:3306/hive?characterEncoding=UTF-8。
3) 修改javax.jdo.option.ConnectionDriverName
将值設定為:com.mysql.jdbc.Driver。
4) 修改javax.jdo.option.ConnectionUserName
将值設定為通路hive資料庫的使用者名hive:hive。
5) 修改javax.jdo.option.ConnectionPassword
将值設定為通路hive資料庫的密碼:hive2014。
6) 修改hive.metastore.schema.verification
該值試情況進行修改。
7) 修改hive.zookeeper.quorum
将值設定為:10.12.154.77,10.12.154.78,10.12.154.79,ZooKeeper被安裝在這三台機器上。
8) 修改hive.metastore.uris
将值設定為:thrift://172.25.40.171:9083,9083為Hive中繼資料的RPC服務端口。
9) 修改hive.metastore.warehouse.dir
将值設定為:/data/hadoop/hive/warehouse,注意啟動前,需要建立好該目錄(mkdir /data/hadoop/hive/warehouse)。
10) 修改hive.server2.thrift.bind.host
該值預設為localhost,如果需要在其它機器遠端通路Hive,則需要将它改成IP位址,本文将它改成172.25.40.171。
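
Taken together, items 2) through 10) correspond to property blocks like the following in hive-site.xml (a sketch using this article's example values; substitute your own hosts and credentials):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://172.25.39.166:3306/hive?characterEncoding=UTF-8</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive2014</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>10.12.154.77,10.12.154.78,10.12.154.79</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://172.25.40.171:9083</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/data/hadoop/hive/warehouse</value>
</property>
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>172.25.40.171</value>
</property>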
5.4.2.3. Modifying hive-log4j.properties

Change the log directory from /tmp/${user.name} to /data/hadoop/hive/logs:
hive.log.dir=/data/hadoop/hive/logs
Then create the directory /data/hadoop/hive/logs.
5.4.2.4. Modifying hive-exec-log4j.properties

Likewise change the log directory from the default /tmp/${user.name} to /data/hadoop/hive/logs/exec:
hive.log.dir=/data/hadoop/hive/logs/exec
Then create the directory /data/hadoop/hive/logs/exec.
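
Both log4j edits can be scripted; a sketch assuming GNU sed and the conf directory as the working directory:

mkdir -p /data/hadoop/hive/logs /data/hadoop/hive/logs/exec
sed -i 's|^hive.log.dir=.*|hive.log.dir=/data/hadoop/hive/logs|' hive-log4j.properties
sed -i 's|^hive.log.dir=.*|hive.log.dir=/data/hadoop/hive/logs/exec|' hive-exec-log4j.properties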
6. Starting and Running

1) Initialize the metastore
Once installation and configuration are done, and before starting the Hive server, run "schematool -dbType mysql -initSchema" once on the server to initialize the metastore.
2) Start the metastore
Run: hive --service metastore &
3) Start the Hive service
Run: hiveserver2 &
4) Enter the Hive command-line interface (similar to mysql's)
Run: hive

The whole sequence is sketched below, followed by a beeline session against the running server:
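A minimal start-up sketch, assuming $HIVE_HOME/bin is already on the PATH (section 5.4.1):

schematool -dbType mysql -initSchema                     # one-time metastore initialization
nohup hive --service metastore > metastore.log 2>&1 &    # listens on port 9083
nohup hiveserver2 > hiveserver2.log 2>&1 &               # listens on port 10000
netstat -lnt | grep -E ':(9083|10000)'                   # verify both ports are listening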
hadoop@VM-40-171-sles10-64:~/hive/bin> ./beeline
Beeline version 0.12.0 by Apache Hive
beeline> !connect jdbc:hive2://172.25.40.171:10000 hive hive2014 org.apache.hive.jdbc.HiveDriver
Connecting to jdbc:hive2://172.25.40.171:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/hadoop/hadoop-2.4.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/hadoop/hive-0.12.0-bin/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Connected to: Hive (version 0.12.0)
Driver: Hive (version 0.12.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://172.25.40.171:10000> select * from invites limit 2;
+------+----------+-------+
| foo  | bar      | ds    |
+------+----------+-------+
| 474  | val_475  | 2014  |
| 281  | val_282  | 2014  |
+------+----------+-------+
2 rows selected (1.779 seconds)
0: jdbc:hive2://172.25.40.171:10000>
7. Executing HiveQL Remotely

Package hive/bin, hive/lib, hive/conf and hive/examples, for example: tar czf hive-bin.tar.gz hive/bin hive/lib hive/conf hive/examples.

Then upload hive-bin.tar.gz to other machines; with beeline you can execute HiveQL remotely, as shown below. (Using the hive CLI for this may run into trouble: in this article's tests, hive always hung while executing HiveQL, with nothing telling in the logs; the cause has not been tracked down.)
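From such a machine, beeline can also run a statement non-interactively; a sketch using this article's host and credentials:

beeline -u jdbc:hive2://172.25.40.171:10000 -n hive -p hive2014 -e 'select count(1) from invites;'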
8. Basic Commands

Some basic HiveQL statements:

CREATE TABLE pokes (foo INT, bar STRING);
CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
SHOW TABLES;
SHOW TABLES '.*s';
DESCRIBE invites;
DROP TABLE pokes;
Hive's installation directory contains an examples subdirectory that holds the data files used by the examples. To test loading data into the table invites, load the file ../examples/files/kv2.txt into it:
LOAD DATA LOCAL INPATH '../examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2014');
You can verify the load with "select * from invites;" or with "select count(1) from invites;".
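
Because invites is partitioned on ds, you can also restrict a query to the partition just loaded (a small illustrative example):

SELECT * FROM invites WHERE ds = '2014' LIMIT 5;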
9. Common Errors

1) Failed to connect to the MetaStore Server
If you hit the error below when running hiveserver2, it is recommended to switch to the DEBUG log level for more detail: in the log configuration file hive-log4j.properties, change "hive.root.logger=WARN,DRFA" to "hive.root.logger=DEBUG,DRFA".
2014-04-23 06:00:04,169 WARN hive.metastore (HiveMetaStoreClient.java:open(291)) - Failed to connect to the MetaStore Server...
2014-04-23 06:00:05,173 WARN hive.metastore (HiveMetaStoreClient.java:open(291)) - Failed to connect to the MetaStore Server...
2014-04-23 06:00:06,177 WARN hive.metastore (HiveMetaStoreClient.java:open(291)) - Failed to connect to the MetaStore Server...
2014-04-23 06:00:07,181 WARN hive.metastore (HiveMetaStoreClient.java:open(291)) - Failed to connect to the MetaStore Server...
2014-04-23 06:00:08,185 WARN hive.metastore (HiveMetaStoreClient.java:open(291)) - Failed to connect to the MetaStore Server...
2014-04-23 06:00:09,194 ERROR service.CompositeService (CompositeService.java:start(74)) - Error starting services HiveServer2
org.apache.hive.service.ServiceException: Unable to connect to MetaStore!
at org.apache.hive.service.cli.CLIService.start(CLIService.java:85)
at org.apache.hive.service.CompositeService.start(CompositeService.java:70)
at org.apache.hive.service.server.HiveServer2.start(HiveServer2.java:73)
at org.apache.hive.service.server.HiveServer2.main(HiveServer2.java:103)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
After this change, running hiveserver2 again produced much more detailed logs; they suggested that the metastore was not up, which is fixed by running "hive --service metastore" to start it.
2014-04-23 06:04:27,053 INFO hive.metastore (HiveMetaStoreClient.java:open(244)) - Trying to connect to metastore with URI thrift://172.25.40.171:9083
2014-04-23 06:04:27,085 WARN hive.metastore (HiveMetaStoreClient.java:open(288)) - Failed to connect to the MetaStore Server...
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:283)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:164)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:104)
at org.apache.hive.service.cli.CLIService.start(CLIService.java:82)
at org.apache.hive.service.CompositeService.start(CompositeService.java:70)
2) Version information not found in metastore
Running "./hive --service metastore" reports the error below because the metastore has not been initialized; run "schematool -dbType mysql -initSchema" once to fix it.
MetaException(message:Version information not found in metastore. )
at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:5638)
at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:5622)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
at com.sun.proxy.$Proxy2.verifySchema(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:403)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:441)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:326)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:286)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4060)
at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:4263)
at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:4197)