Sqoop2 Installation, Configuration, and Testing

- Download
- Installation and configuration
  - 1. Extract
  - 2. Configure environment variables
    - 2.1 Edit `/etc/profile`
    - 2.2 Edit `$SQOOP_HOME/conf/sqoop.properties`
    - 2.3 Modify Hadoop's `yarn-site.xml`
    - 2.4 Modify `core-site.xml` and restart Hadoop
    - 2.5 Install the JDBC driver into the server/lib/ directory
  - 3. Initialization and verification
    - 3.1 Initialize sqoop
    - 3.2 Verify sqoop
      - 3.2.1 Troubleshooting
  - 4 Test run
    - 4.1 Start sqoop
      - 4.1.1 Start the historyserver process
      - 4.1.2 Start the sqoop server
      - 4.1.3 Start the sqoop client
    - 4.2 Using sqoop
      - 4.2.1 Connect to the sqoop server
      - 4.2.2 Check the connectors registered with the Sqoop server
      - 4.2.3 Create a link for importing from MySQL
      - 4.2.4 View the links
      - 4.2.5 Create a link for exporting to HDFS
      - 4.2.6 View the detailed status of the created links
      - 4.2.7 Create a job that moves data from MySQL to HDFS
      - 4.2.8 View job status
      - 4.2.9 Start the job
Download
sqoop1: 1.4.7
sqoop2: 1.99.7
For the differences between the two versions, see "An Introduction to Sqoop and the Differences Between Sqoop1 and Sqoop2".
We need to download sqoop-1.99.7-bin-hadoop200.tar.gz.
Make sure Hadoop is already installed; if not, see "Installing a Hadoop Cluster on CentOS".
Installation and configuration
1. Extract
tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /root/
mv /root/sqoop-1.99.7-bin-hadoop200 /root/sqoop
2. Configure environment variables
2.1 Edit /etc/profile
/etc/profile
export SQOOP_HOME=/root/sqoop
export LOGDIR=$SQOOP_HOME/logs
export PATH=$PATH:$JAVA_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HBASE_HOME/sbin:$HIVE_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=/root/sqoop/server/lib
source /etc/profile
Run the command above to apply the changes.
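As a quick sanity check (a minimal sketch; the paths are simply the ones used throughout this guide), you can confirm the variables resolve as expected:

```shell
# Minimal check of the variables defined in /etc/profile above.
# Paths assume this guide's layout (sqoop unpacked to /root/sqoop).
export SQOOP_HOME=/root/sqoop
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/server/lib
echo "$SQOOP_SERVER_EXTRA_LIB"   # prints /root/sqoop/server/lib
```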
2.2 Edit $SQOOP_HOME/conf/sqoop.properties
$SQOOP_HOME/conf/sqoop.properties
# JDBC repository provider configuration
#org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler
org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
#org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
# I am using a MySQL database
org.apache.sqoop.repository.jdbc.url=jdbc:mysql://master:3306/sqoop
#org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
org.apache.sqoop.repository.jdbc.user=root
# Remember to change this to your own database password
org.apache.sqoop.repository.jdbc.password=yourpassword
# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/root/hadoop-2.10.0/etc/hadoop
org.apache.sqoop.jetty.port=12000
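A common slip in this step is leaving the Derby lines uncommented alongside the MySQL ones. A small check that lists only the active (non-commented) repository settings can catch that (a sketch: the sample file and its /tmp path are hypothetical; run the grep against your real conf/sqoop.properties):

```shell
# Write a sample fragment, then list only the active (non-commented)
# repository settings; the Derby handler/driver should not appear.
cat > /tmp/sqoop.properties.sample <<'EOF'
#org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler
#org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver
EOF
grep '^org\.apache\.sqoop\.repository\.jdbc' /tmp/sqoop.properties.sample
```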
2.3 Modify Hadoop's yarn-site.xml
yarn-site.xml
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
2.4 Modify core-site.xml and restart Hadoop
core-site.xml
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
2.5 Install the JDBC driver by placing it in the server/lib/ directory.
I am using MySQL, so first confirm the MySQL server version:
mysql> select version();
+-----------+
| version() |
+-----------+
| 5.6.46 |
+-----------+
1 row in set (0.01 sec)
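Matching the driver series to the server version can be sketched as below. This is my rough reading of the Connector/J compatibility matrix, not an official rule: the 5.1.x drivers cover MySQL 5.5-5.7, while the 8.0 drivers target 5.7 and 8.0.

```shell
# Hypothetical helper: map a MySQL server version to a Connector/J series.
server_version="5.6.46"   # from `select version();` above
case "$server_version" in
  5.*) echo "mysql-connector-java-5.1.x" ;;
  8.*) echo "mysql-connector-java-8.0.x" ;;
  *)   echo "check the Connector/J compatibility matrix" ;;
esac
```

For the 5.6.46 server used here, this points at a 5.1.x jar, which is exactly what the Hive installation below already ships.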
Then download the matching MySQL driver and place it in the server/lib/ directory. Download location: https://repo1.maven.org/maven2/mysql/mysql-connector-java/. Since I already have Hive installed, I simply copied mysql-connector-java-5.1.46.jar over from the Hive lib directory:
cp /root/hive-3.1.2/lib/mysql-connector-java-5.1.46.jar /root/sqoop/server/lib
3. Initialization and verification
3.1 Initialize sqoop
[root@master bin]# ./sqoop2-tool upgrade
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2020-01-28 21:56:19,969 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
3.2 Verify sqoop
[root@master bin]# ./sqoop2-tool verify
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
Version: 1.99.7
Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2020-01-28 21:56:35,082 INFO [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 21:56:35,089 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
3.2.1 Troubleshooting
ERROR [main] org.apache.sqoop.repository.common.CommonRepositoryHandler - Can't execute query: CREATE DATABASE IF NOT EXISTS "SQOOP"
This happens because MySQL's default sql_mode treats double quotes as string delimiters rather than identifier quotes. Run the following in MySQL:
# This must be re-applied every time MySQL restarts
set global sql_mode ='ANSI_QUOTES';
Or, to make it permanent, add the following to the MySQL configuration file my-default.cnf:
sql_mode=ANSI_QUOTES
4 Test run
4.1 Start sqoop
4.1.1 Start the historyserver process
# Run the following command from Hadoop's sbin directory
# /root/hadoop-2.10.0/etc/hadoop is where my Hadoop configuration files live
./mr-jobhistory-daemon.sh start historyserver --config /root/hadoop-2.10.0/etc/hadoop
4.1.2 Start the sqoop server
[root@master bin]# ./sqoop2-server start
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Starting the Sqoop2 server...
2020-01-28 07:44:00,338 INFO [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 07:44:00,412 INFO [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Sqoop2 server started.
4.1.3 Start the sqoop client
Note that the script is named sqoop2-shell, not sqoop-shell:
[root@master bin]# ./sqoop-shell
bash: ./sqoop-shell: No such file or directory
[root@master bin]# ./sqoop2-shell
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
4.2 Using sqoop
We will transfer data from MySQL to HDFS via sqoop.
4.2.1 Connect to the sqoop server
sqoop:000> set server --host master --port 12000 --webapp sqoop
Server is set successfully
4.2.2 Check the connectors registered with the Sqoop server:
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
4.2.3 Create a link for importing from MySQL
sqoop:000> create link -c generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: link1_generic_jdbc
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/sqoop #make sure the sqoop database exists
Username: root
Password: ********* #same as your MySQL password
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose: (enter a single space here)
New link was successfully created with validation status OK and name link1_generic_jdbc
4.2.4 View the links
sqoop:000> show link
+--------------------+------------------------+---------+
| Name | Connector Name | Enabled |
+--------------------+------------------------+---------+
| link1_generic_jdbc | generic-jdbc-connector | true |
+--------------------+------------------------+---------+
4.2.5 Create a link for exporting to HDFS
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: link2_hdfs
HDFS cluster
URI: hdfs://master:9000
Conf directory: /root/hadoop-2.10.0/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name link2_hdfs
4.2.6 View the detailed status of the created links
sqoop:000> show link --all
2 link(s) to show:
link with name link1_generic_jdbc (Enabled: true, Created by root at 1/28/20 8:13 PM, Updated by root at 1/28/20 8:13 PM)
Using Connector generic-jdbc-connector with name {1}
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/sqoop
Username: root
Password:
Fetch Size:
Connection Properties:
SQL Dialect
Identifier enclose:
link with name link2_hdfs (Enabled: true, Created by root at 1/28/20 8:14 PM, Updated by root at 1/28/20 8:14 PM)
Using Connector hdfs-connector with name {1}
HDFS cluster
URI: hdfs://master:9000
Conf directory: /root/hadoop-2.10.0/etc/hadoop
Additional configs::
4.2.7 Create a job that moves data from MySQL to HDFS:
sqoop:000> create job -f "link1_generic_jdbc" -t "link2_hdfs"
Creating job for links with from name link1_generic_jdbc and to name link2_hdfs
Please fill following values to create new job object
Name: jdbcToHdfs
Database source
#make sure the sqoop database exists
Schema name: sqoop
#the user table holds the data to be imported into HDFS
Table name: user
SQL statement:
Column names:
There are currently 0 values in the list:
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: /jdbctohdfs #make sure this directory does not exist on HDFS, or is empty
Append mode:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name jdbcToHdfs
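The "Partition column" prompt above controls how the generic-jdbc connector splits the table across extractors: it reads the column's minimum and maximum and divides that range into one slice per extractor. A rough illustration with plain shell arithmetic (the numbers and the `id` column are made up; the real connector also handles nulls and non-integer column types):

```shell
# Sketch of range partitioning over an integer partition column.
min=1; max=100; extractors=4
step=$(( (max - min + 1) / extractors ))
for i in $(seq 0 $(( extractors - 1 ))); do
  lo=$(( min + i * step ))
  hi=$(( i == extractors - 1 ? max : lo + step - 1 ))
  echo "extractor $i: WHERE id >= $lo AND id <= $hi"
done
```

If no partition column is given, the connector falls back to the table's primary key; a table with neither needs the column specified explicitly.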
4.2.8 View job status
sqoop:000> show job --all
1 job(s) to show:
Job with name jdbcToHdfs (Enabled: true, Created by root at 1/28/20 8:17 PM, Updated by root at 1/28/20 8:17 PM)
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
From link: link1_generic_jdbc
Database source
Schema name: sqoop
Table name: user
SQL statement:
Column names:
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
To link: link2_hdfs
Target configuration
Override null value:
Null value:
File format: TEXT_FILE
Compression codec: NONE
Custom codec:
Output directory: /jdbctohdfs
Append mode:
4.2.9 Start the job
sqoop:000> start job -n jdbcToHdfs
2020-01-28 21:44:57 PST: FAILED
The run failed; check sqoop.log to find the cause.
In theory the job should look up the jars under sqoop/server/lib, but the log shows the lookup happening under sqoop/bin. I have not worked out which configuration controls this, so I simply symlinked the jars from sqoop/server/lib into sqoop/bin.
Checking /root/sqoop/bin confirms the jar symlinks are in place.
Starting the job again succeeds:
sqoop:000> start job -n jdbcToHdfs
Submission details
Job Name: jdbcToHdfs
Server URL: http://master:12000/sqoop/
Created by: root
Creation date: 2020-01-28 21:58:31 PST
Lastly updated by: root
External ID: job_local292485477_0006
http://localhost:8080/
Source Connector schema: Schema{name= sqoop . website ,columns=[
FixedPoint{name=websiteid,nullable=true,type=FIXED_POINT,byteSize=8,signed=true},
Text{name=companyName,nullable=true,type=TEXT,charSize=null},
Text{name=companyAddr,nullable=true,type=TEXT,charSize=null},
Text{name=companyInfo,nullable=true,type=TEXT,charSize=null},
Text{name=jobInfo,nullable=true,type=TEXT,charSize=null},
Text{name=jobAddr,nullable=true,type=TEXT,charSize=null},
Text{name=jobName,nullable=true,type=TEXT,charSize=null},
Text{name=salary,nullable=true,type=TEXT,charSize=null}]}
2020-01-28 21:58:34 PST: SUCCEEDED
The HDFS web UI confirms the data was uploaded successfully.