HIVE: Sqoop 1.4.6 Installation and Table Migration Between Hive and Oracle

1. Sqoop Data Migration

1.1 Overview

Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.

Importing data: from MySQL or Oracle into Hadoop storage systems such as HDFS, Hive, and HBase.

Exporting data: from the Hadoop file system out to a relational database.

1.2 How It Works

Sqoop implements each import or export command by translating it into a MapReduce program.

In the generated MapReduce job, the work is mainly a matter of customizing the InputFormat and OutputFormat.
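For example, the parallelism of the generated (map-only) job is set straight from the command line; a minimal sketch, with a placeholder MySQL host, database, and table:

# Import with 4 parallel map tasks; --split-by names the column Sqoop
# uses to partition rows across mappers (all identifiers are placeholders)
sqoop import --connect jdbc:mysql://dbhost:3306/testdb \
  --username scott --password tiger \
  --table EMP --split-by EMP_ID -m 4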

1.3 Hands-on with Sqoop

1.3.1 Installing Sqoop

Prerequisites: working Java and Hadoop environments.

1. Download and extract

The latest release can be downloaded from http://ftp.wayne.edu/apache/sqoop/1.4.6/

For example: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
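If the server has network access, the tarball can be fetched directly from the mirror above:

# Download Sqoop 1.4.6 from the Apache mirror listed above
wget http://ftp.wayne.edu/apache/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz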

[root@hadoop1 sqoop]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

[root@hadoop1 sqoop]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop

[root@hadoop1 sqoop]# ls

apache-hive-1.2.1-bin hadoop-2.7.4 hdfs pig-0.17.0 pig_1517170893185.log sqoop tmp

2. Edit the configuration files

Set SQOOP_HOME in /etc/profile:

vim /etc/profile

export SQOOP_HOME=/usr/local/hadoop/sqoop

and append it to PATH:

export PATH=$PATH:$SQOOP_HOME/bin

[root@hadoop1 sqoop]# source /etc/profile
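A quick sanity check that the new variables are active in the current shell:

echo $SQOOP_HOME    # should print /usr/local/hadoop/sqoop
which sqoop         # should print /usr/local/hadoop/sqoop/bin/sqoop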

$ cd $SQOOP_HOME/conf

$ mv sqoop-env-template.sh sqoop-env.sh

Open sqoop-env.sh and edit the following lines (remove the leading ## markers):

export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.7.4/

export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.7.4/

export HIVE_HOME=/usr/local/hadoop/apache-hive-1.2.1-bin/
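Optionally, homes for HBase, HCatalog, and ZooKeeper can be exported in the same file to silence the startup warnings shown in the next step; the paths below are hypothetical and only apply if those components are actually installed:

## Optional - uncomment and adjust only if these components exist;
## the paths shown here are hypothetical examples
# export HBASE_HOME=/usr/local/hadoop/hbase
# export HCAT_HOME=/usr/local/hadoop/hive-hcatalog
# export ZOOKEEPER_HOME=/usr/local/hadoop/zookeeper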


3.1 Add the Oracle JDBC driver

Copy ojdbc6.jar into $SQOOP_HOME/lib/.

3.2 Add the MySQL JDBC driver

Copy mysql-connector-java-5.1.38.jar into $SQOOP_HOME/lib/.

4. Verify the installation

$ cd $SQOOP_HOME/bin

$ sqoop-version

Expected output:

[root@hadoop1 sqoop]# sqoop-version

Warning: /usr/local/hadoop/sqoop/../hbase does not exist! HBase imports will fail.

Please set $HBASE_HOME to the root of your HBase installation.

Warning: /usr/local/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

Warning: /usr/local/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.

Please set $ACCUMULO_HOME to the root of your Accumulo installation.

Warning: /usr/local/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.

Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.

18/01/29 19:09:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

Sqoop 1.4.6

git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25

Compiled by root on Mon Apr 27 14:38:36 CST 2015

[root@hadoop1 sqoop]#

At this point, the Sqoop installation is complete.

Data migration: Oracle to Hive ## Note: the Hive table name must be uppercase

sqoop# sqoop import --hive-import --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy \

--table ANDY --hive-database oracletohive --hive-table ANDY -m 1

Note: if the target table already exists in Hive, the imported data is appended to it by default; if it does not exist, Sqoop creates it automatically.
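If appending is not the desired behavior, Sqoop's --hive-overwrite flag replaces the table contents instead; a sketch based on the same command:

# Same import, but overwrite the Hive table's contents instead of appending
sqoop import --hive-import --hive-overwrite \
  --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 \
  --username andy --password andy \
  --table ANDY --hive-database oracletohive --hive-table ANDY -m 1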

Log output:

18/01/29 19:35:46 INFO hive.HiveImport: Loading uploaded data into Hive

18/01/29 19:35:51 INFO hive.HiveImport: 

18/01/29 19:35:51 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/hadoop/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties

18/01/29 19:36:02 INFO hive.HiveImport: OK

18/01/29 19:36:02 INFO hive.HiveImport: Time taken: 2.42 seconds

18/01/29 19:36:03 INFO hive.HiveImport: Loading data to table oracletohive.andy

18/01/29 19:36:04 INFO hive.HiveImport: Table oracletohive.andy stats: [numFiles=1, totalSize=1996]

18/01/29 19:36:04 INFO hive.HiveImport: OK

18/01/29 19:36:04 INFO hive.HiveImport: Time taken: 1.579 seconds

18/01/29 19:36:04 INFO hive.HiveImport: Hive import complete.

18/01/29 19:36:04 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.
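The imported rows land as a delimited text file under the Hive warehouse directory, which can be inspected straight from HDFS (the same path the export below reads from):

# List and peek at the data file Sqoop loaded into the Hive warehouse
hdfs dfs -ls /user/hive/warehouse/oracletohive.db/andy
hdfs dfs -cat /user/hive/warehouse/oracletohive.db/andy/* | head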

Check from the Hive shell:

hive> show databases;

OK

default

oracletohive

Time taken: 0.027 seconds, Fetched: 2 row(s)

hive> use oracletohive;

Time taken: 0.034 seconds

hive> show tables;

andy

Time taken: 0.037 seconds, Fetched: 1 row(s)

hive> select count(*) from andy;

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2018-01-29 19:43:46,040 Stage-1 map = 0%, reduce = 0%

2018-01-29 19:43:54,738 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.49 sec

2018-01-29 19:44:03,323 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.39 sec

MapReduce Total cumulative CPU time: 3 seconds 390 msec

Ended Job = job_1517222580457_0002

MapReduce Jobs Launched: 

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.39 sec HDFS Read: 16343 HDFS Write: 2 SUCCESS

Total MapReduce CPU Time Spent: 3 seconds 390 msec

7

Time taken: 34.543 seconds, Fetched: 1 row(s)

Data migration: Hive to Oracle

To migrate from Hive to Oracle, the target table must be created in Oracle ahead of time; otherwise the export fails with java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist.
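A sketch of pre-creating the target table from sqlplus; the easy-connect string assumes the service name matches the SID, and the column list is hypothetical, since the real ANDY schema is not shown here and must match the Hive table's layout:

# Pre-create the target table in Oracle (connect string and columns are
# hypothetical - adjust them to your environment and Hive table schema)
sqlplus andy/andy@10.100.25.8:1521/devdb1 <<'EOF'
CREATE TABLE ANDY (
  ID   NUMBER,
  NAME VARCHAR2(100)
);
EOF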

sqoop# sqoop export --table ANDY --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy \
--export-dir /user/hive/warehouse/oracletohive.db/andy --input-fields-terminated-by '\001' \
--input-lines-terminated-by '\n'
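If the Hive table contains NULLs, the export typically also needs Sqoop's null-handling flags, because Hive writes NULL as the literal \N in its data files; a hedged variant of the same command:

# Same export, additionally mapping Hive's \N null markers back to SQL NULL
sqoop export --table ANDY \
  --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 \
  --username andy --password andy \
  --export-dir /user/hive/warehouse/oracletohive.db/andy \
  --input-fields-terminated-by '\001' --input-lines-terminated-by '\n' \
  --input-null-string '\\N' --input-null-non-string '\\N'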

18/01/29 20:21:34 INFO mapreduce.Job: Job job_1517222580457_0005 completed successfully

18/01/29 20:21:34 INFO mapreduce.Job: Counters: 30

..... (output omitted)

18/01/29 20:21:34 INFO mapreduce.ExportJobBase: Transferred 5.502 KB in 116.7414 seconds (48.2605 bytes/sec)

18/01/29 20:21:34 INFO mapreduce.ExportJobBase: Exported 7 records.

-- check on the Oracle side

SQL> select count(*) from andy;

COUNT(*)

----------

14

The count went from 7 rows to 14, which confirms the Hive-to-Oracle export succeeded.

This article is reposted from 張沖andy's blog on cnblogs; contact the original author before reprinting. Original link: http://www.cnblogs.com/andy6/p/8378450.html