A pile of Hive configuration (hive-site.xml):
<configuration>
<property>
<name>datanucleus.schema.autoCreateTables</name>
<value>true</value>
</property>
<property>
<name>datanucleus.autoCreateColumns</name>
<value>true</value>
<description>Auto-create Hive metastore columns if they do not exist</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>false</value>
</property>
<property>
<name>datanucleus.autoStartMechanism</name>
<value>SchemaTable</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://172.16.100.35:3306/hive?createDatabaseIfNotExist=true&amp;autoReconnect=true&amp;useSSL=false</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>username</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/warehouse/hive</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>hive.enable.spark.execution.engine</name>
<value>true</value>
</property>
<property>
<name>spark.home</name>
<value>/work/poa/spark-1.6.2-bin-2.6.0</value>
</property>
<property>
<name>spark.master</name>
<value>yarn-cluster</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://HZTelSpark-008:9083,thrift://HZTelSpark-009:9083</value>
<description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2_zk</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>HZTelSpark-001:2181,HZTelSpark-002:2181,HZTelSpark-003:2181</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
</property>
<!--other spark configuration -->
<property>
<name>spark.executor.memory</name>
<value>10g</value>
</property>
<property>
<name>spark.executor.cores</name>
<value>10</value>
</property>
<property>
<name>spark.executor.instances</name>
<value>10</value>
</property>
<property>
<name>spark.serializer</name>
<value>org.apache.spark.serializer.KryoSerializer</value>
</property>
</configuration>
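With dynamic service discovery enabled above, clients connect through the ZooKeeper quorum rather than a fixed HiveServer2 host. A minimal Beeline sketch, assuming the quorum hosts and `hiveserver2_zk` namespace from this config (credentials are placeholders):

```shell
# Connect Beeline via ZooKeeper service discovery;
# zooKeeperNamespace must match hive.server2.zookeeper.namespace above.
beeline -u "jdbc:hive2://HZTelSpark-001:2181,HZTelSpark-002:2181,HZTelSpark-003:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk" \
  -n username -p password
```

ZooKeeper resolves the URL to whichever HiveServer2 instance has registered under the namespace, which is what makes the two-instance HA setup above usable from a single connection string.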
The configuration is fairly convoluted, and Beeline surprisingly isn't bundled.
Compiling Hive kept failing, whether because the Great Firewall was blocking downloads or for some other reason.
The pre-built Spark from the official site is incompatible with Hive, so Spark has to be recompiled with the Hive-related profiles removed, and the Parquet version must match Hive's, otherwise Hive throws errors when reading Parquet files.
Hive's data types are not fully compatible with Parquet either, which caused all sorts of errors during table loading and querying.
Interactive queries with Hive on Spark were not as fast as hoped, and since Hive also serves metadata to Presto, the whole chain of metastore management and tuning is a big problem in itself.
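For reference, the Spark rebuild mentioned above roughly follows the upstream Hive on Spark guide: build a Spark distribution without the Hive profile so its bundled jars don't clash with Hive's own. A sketch for Spark 1.6.x, assuming Hadoop 2.6 (the exact profile list depends on your Hadoop version):

```shell
# From the Spark 1.6.x source tree: build a distribution without Hive
# support, with Hadoop and Parquet marked as provided, so Hive on Spark
# can supply its own Hive and Parquet jars at runtime.
./make-distribution.sh --name hadoop2-without-hive --tgz \
  "-Pyarn,hadoop-provided,hadoop-2.6,parquet-provided"
```

The resulting tarball is what `spark.home` in the config above should point at.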