
Hive installation and usage: solving common problems

On Ubuntu:

(1) Install Java

Set JAVA_HOME

by adding it to /etc/profile:

JAVA_HOME=/usr/

export JAVA_HOME
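
To make the variable visible in the current shell without logging out and back in:

source /etc/profile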

~/apache/trunk$ echo $JAVA_HOME

/usr/

(2) ssh

~$ ssh-keygen -t rsa -P ""

~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

ssh localhost

If sshd is not installed yet, install it first:

~$ sudo apt-get install openssh-server

~$ ssh localhost

(3) Hadoop configuration: hadoop-env.sh, mapred-site.xml, hdfs-site.xml, core-site.xml

In the official release you need to add the following to hadoop-env.sh:

export JAVA_HOME=/usr/

~/hadoop-0.20.2-cdh3u0/conf$ cat core-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

  <name>fs.default.name</name>

  <value>hdfs://localhost:54310</value>

  <description>The name of the default file system. A URI whose

  scheme and authority determine the FileSystem implementation. The

  uri's scheme determines the config property (fs.SCHEME.impl) naming

  the FileSystem implementation class. The uri's authority is used to

  determine the host, port, etc. for a filesystem.</description>

</property>

<property>

  <name>hadoop.tmp.dir</name>

  <value>/home/tianzhao/hadoop-0.20.2-cdh3u0/tmp</value>

</property>

</configuration>

~/hadoop-0.20.2-cdh3u0/conf$ cat hdfs-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>dfs.name.dir</name>

<value>/home/tianzhao/hadoop-0.20.2-cdh3u0/hdfs/name</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/home/tianzhao/hadoop-0.20.2-cdh3u0/hdfs/data</value>

</property>

</configuration>

~/hadoop-0.20.2-cdh3u0/conf$ cat mapred-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:54311</value>

<description>The host and port that the MapReduce job tracker runs

at. If "local", then jobs are run in-process as a single map

and reduce task.

</description>

</property>

</configuration>

bin/hadoop namenode -format

bin/start-all.sh

bin/stop-all.sh
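
To check that the pseudo-distributed cluster actually came up, one option is to list the Java daemons and touch HDFS (jps ships with the JDK):

jps   # should list NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker

bin/hadoop fs -ls /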

(3.1) HBase: hbase-site.xml

<configuration>

  <property>

    <name>hbase.rootdir</name>

    <value>hdfs://localhost:54310/hbase</value>

  </property>

<property> 

  <name>hbase.cluster.distributed</name> 

  <value>true</value> 

</property> 

  <property>

    <name>dfs.replication</name>

    <value>1</value>

  </property>

</configuration>

// master

http://localhost:60010

// region server

http://localhost:60030
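
A quick sanity check after editing hbase-site.xml, assuming HBase was unpacked next to Hadoop, is to start HBase and ask for cluster status from its shell:

bin/start-hbase.sh

bin/hbase shell

hbase(main):001:0> status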

MySQL 4 is not supported; MySQL 5 or later is required (foreign key support).

(4) Install MySQL

~/hive-0.7.0-cdh3u0$ sudo apt-get install mysql-server

root password: tianzhao1234

~/hive-0.7.0-cdh3u0$ mysql -uroot -ptianzhao1234

mysql> create database hive;

Query OK, 1 row affected (0.00 sec)

mysql> create user 'hive'@'localhost' identified by '123456';

Query OK, 0 rows affected (0.00 sec)

mysql> grant all privileges on hive.* to 'hive'@'localhost' with grant option;

Query OK, 0 rows affected (0.00 sec)

flush privileges;

sudo /etc/init.d/mysql restart
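
To confirm the new hive user can actually reach the metastore database (user and password from the grants above):

mysql -uhive -p123456 -e 'show databases;'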

http://www.pantz.org/software/mysql/mysqlcommands.html

Exporting with mysqldump

Command format:

mysqldump -u<username> -p <database> > <database>.sql

Example:

mysqldump -uroot -p abc > abc.sql

(exports database abc to abc.sql)

When prompted, enter the password for that database user.

Importing with the mysql command line

Command format:

mysql -u<username> -p <database> < <database>.sql

Example:

mysql -uabc_f -p abc < abc.sql

(imports database abc from abc.sql)

When prompted, enter the password for that database user.
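
Applied to this setup, it is worth dumping the Hive metastore before dropping or recreating it (database name from step 4; hive_backup.sql is just an example file name):

mysqldump -uroot -p hive > hive_backup.sql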

drop database hive;

show databases;

(5) Hive: hive-config.sh, hive-site.xml

1. Edit bin/hive-config.sh:

export HADOOP_HOME=/home/tianzhao/hadoop-0.20.2-cdh3u0/

conf/hive-site.xml

~/hive-0.7.0-cdh3u0/conf$ cat hive-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

<property>

  <name>hive.metastore.local</name>

  <value>true</value>

</property>

<property>

        <name>javax.jdo.option.ConnectionURL</name>

        <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>

        <description>JDBC connect string for a JDBC metastore</description>

</property>

<property>

        <name>javax.jdo.option.ConnectionDriverName</name>

        <value>com.mysql.jdbc.Driver</value>

        <description>Driver class name for a JDBC metastore</description>

</property>

<property>

        <name>javax.jdo.option.ConnectionUserName</name>

        <value>hive</value>

        <description>username to use against metastore database</description>

</property>

<property>

        <name>javax.jdo.option.ConnectionPassword</name>

        <value>123456</value>

        <description>password to use against metastore database</description>

</property>

</configuration>
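
Hive also needs the MySQL JDBC driver on its classpath; a minimal sketch, assuming the mysql-connector-java jar has already been downloaded (exact file name and version will differ):

cp mysql-connector-java-*-bin.jar /home/tianzhao/hive-0.7.0-cdh3u0/lib/

bin/hive -e 'show tables;'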

To be revised:

export HADOOP_HOME=/home/tianzhao/hadoop-0.19.2

export HADOOP_CONF_DIR=$HOME/config

echo $HOME 

/home/tianzhao

export HIVE_HOME=/home/tianzhao/hive

export HIVE_CONF_DIR=/home/tianzhao/hive-config

export HIVE_LIB=${HIVE_HOME}/lib

export HIVE_AUX_JARS_PATH=/home/tianzhao/jar

When HiveQL statements such as show tables or alter table become very slow in the Hive CLI, MySQL is usually the reason. A Hive operation has several parts: table metadata operations go through the connected database, directory operations work on directories in HDFS, and jobs are submitted to the JobTracker.

show tables mainly reads table information from MySQL and then displays it. If it becomes very slow, Hive is fetching table metadata from MySQL slowly; restarting MySQL or moving MySQL to another machine can help.
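
One way to check whether MySQL is the bottleneck is to watch what the metastore connection is doing while a statement hangs (root credentials from the install step above):

mysql -uroot -ptianzhao1234 -e 'show processlist;'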

http://forge.mysql.com/wiki/MySQL_Internals_Optimizer#The_Optimizer

hive> show tables;

FAILED: Error in metadata: javax.jdo.JDODataStoreException: Exception thrown obtaining schema column information from datastore

NestedThrowables:

java.sql.SQLException: Got error 28 from storage engine

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

java.sql.SQLException: Got error 28 from storage engine

Error 28 from the storage engine means the disk is full.
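
A quick way to confirm error 28 is to check free space on the partitions MySQL writes to (the data directory path assumes the Ubuntu default):

df -h /var/lib/mysql /tmp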

http://bugs.mysql.com/bug.php?id=3611

This issue still exists in MySQL server version 5.1.41.

I did a sub-query in FROM clause and used WHERE clause to filter results.

The following query works fine without WHERE clause. If you add WHERE clause you will

receive an error code: "1267 Illegal mix of collations (latin1_swedish_ci, IMPLICIT) and

(utf8_general_ci, COERCIBLE) for operation '='"

This example is taken from the book I was reading:

SELECT PLAYERNO, NAME, PLAYERS.TOWN, NUMBER * 1000

FROM PLAYERS,

(SELECT 'Stratford' AS TOWN, 4 AS NUMBER

UNION

SELECT 'Plymouth', 6

UNION

SELECT 'Inglewood', 1

UNION

SELECT 'Douglas', 2) AS TOWNS

WHERE PLAYERS.TOWN = TOWNS.TOWN

ORDER BY PLAYERNO

Dirty data produced a partition field value, and the following error occurred:

Failed with exception javax.jdo.JDODataStoreException: Error executing JDOQL query "SELECT 'org.apache.hadoop.hive.metastore.model.MPartition' AS NUCLEUS_TYPE,`THIS`.`CREATE_TIME`,`THIS`.`LAST_ACCESS_TIME`,`THIS`.`PART_NAME`,`THIS`.`PART_ID` FROM `PARTITIONS` `THIS` LEFT OUTER JOIN `TBLS` `THIS_TABLE_TABLE_NAME` ON `THIS`.`TBL_ID` = `THIS_TABLE_TABLE_NAME`.`TBL_ID` LEFT OUTER JOIN `TBLS` `THIS_TABLE_DATABASE` ON `THIS`.`TBL_ID` = `THIS_TABLE_DATABASE`.`TBL_ID` LEFT OUTER JOIN `DBS` `THIS_TABLE_DATABASE_DATABASE_NAME` ON `THIS_TABLE_DATABASE`.`DB_ID` = `THIS_TABLE_DATABASE_DATABASE_NAME`.`DB_ID` WHERE `THIS_TABLE_TABLE_NAME`.`TBL_NAME` = ? AND `THIS_TABLE_DATABASE_DATABASE_NAME`.`NAME` = ? AND `THIS`.`PART_NAME` = ?" : Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='.

NestedThrowables:

java.sql.SQLException: Illegal mix of collations (latin1_bin,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation '='

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
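
To see which character set and collation the affected metastore table actually uses (a diagnostic only; the table name is taken from the query in the error, credentials from the install step):

mysql -uroot -ptianzhao1234 hive -e "SHOW TABLE STATUS LIKE 'PARTITIONS'\G"

# Cleaning up the offending partition value, or aligning the character sets of the metastore tables and the data, are the usual ways forward.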

Failed with exception javax.jdo.JDODataStoreException: Add request failed : INSERT INTO `COLUMNS` (`SD_ID`,`COMMENT`,`COLUMN_NAME`,`TYPE_NAME`,`INTEGER_IDX`) VALUES (?,?,?,?,?)

NestedThrowables:

java.sql.BatchUpdateException: The table 'columns' is full

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

Error message:

Caused by: java.sql.SQLException: The table 'xxx' is full

        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1073)

        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3597)

        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3529)

Solution:

See the following links:

1. http://forums.dcm4che.org/jiveforums/message.jspa?messageID=2751

2. http://jeremy.zawodny.com/blog/archives/000796.html

3. http://dev.mysql.com/doc/refman/5.0/en/full-table.html
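
For the "table is full" error, the MySQL manual linked above points to either a full disk/tmpdir or a MyISAM size limit; a diagnostic sketch (paths assume the default Ubuntu MySQL data directory):

df -h /var/lib/mysql /tmp

mysql -uroot -ptianzhao1234 hive -e "SHOW TABLE STATUS LIKE 'COLUMNS'\G"

# If it is a MyISAM size limit, the manual describes raising it with ALTER TABLE ... MAX_ROWS / AVG_ROW_LENGTH.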