
Sqoop2 (1.99.7) Installation, Configuration, and Testing


    • Download
    • Installation and Configuration
      • 1. Extract
      • 2. Environment Configuration
        • 2.1 Edit `/etc/profile`
        • 2.2 Edit `/root/sqoop/conf/sqoop.properties`
        • 2.3 Edit Hadoop's `yarn-site.xml`
        • 2.4 Edit `core-site.xml` and restart Hadoop
        • 2.5 Install the JDBC driver into server/lib/
      • 3. Initialization and Verification
        • 3.1 Initialize Sqoop
        • 3.2 Verify Sqoop
          • 3.2.1 Troubleshooting
      • 4. Testing
        • 4.1 Start Sqoop
          • 4.1.1 Start the historyserver process
          • 4.1.2 Start the Sqoop server
          • 4.1.3 Start the Sqoop client
        • 4.2 Using Sqoop
          • 4.2.1 Connect to the Sqoop server
          • 4.2.2 List the connectors registered with the Sqoop server
          • 4.2.3 Create a link for importing from MySQL
          • 4.2.4 Check the links
          • 4.2.5 Create a link for exporting to HDFS
          • 4.2.6 Show the details of the created links
          • 4.2.7 Create a job to move data from MySQL to HDFS
          • 4.2.8 Check the job status
          • 4.2.9 Start the job

Download

sqoop1:1.4.7

sqoop2:1.99.7

For the differences between the two versions, see "Introduction to Sqoop and the differences between Sqoop1 and Sqoop2".

We need to download sqoop-1.99.7-bin-hadoop200.tar.gz.

確定Hadoop已安裝,未安裝請參考《centos下Hadoop叢集安裝》

Installation and Configuration

1. Extract

tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /root/
mv /root/sqoop-1.99.7-bin-hadoop200 /root/sqoop
           

2. Environment Configuration

2.1 Edit /etc/profile

export SQOOP_HOME=/root/sqoop
export LOGDIR=$SQOOP_HOME/logs
export PATH=$PATH:$JAVA_HOME/bin:$SQOOP_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$HBASE_HOME/sbin:$HIVE_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=/root/sqoop/server/lib
           

Run `source /etc/profile` to make the changes take effect.
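
After sourcing the profile, a quick sanity check can confirm the variables are visible in the current shell. This is a minimal sketch (the variable names are the ones exported above; `check_env` is a hypothetical helper, not part of Sqoop):

```shell
#!/usr/bin/env bash
# Report which of the Sqoop-related variables from /etc/profile are set.
# Uses bash indirect expansion (${!v}), so run this under bash.
check_env() {
  local v
  for v in SQOOP_HOME LOGDIR SQOOP_SERVER_EXTRA_LIB; do
    if [ -z "${!v}" ]; then
      echo "MISSING: $v"
    else
      echo "OK: $v=${!v}"
    fi
  done
}
check_env
```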

2.2 Edit /root/sqoop/conf/sqoop.properties

# JDBC repository provider configuration
#org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.derby.DerbyRepositoryHandler
org.apache.sqoop.repository.jdbc.handler=org.apache.sqoop.repository.mysql.MySqlRepositoryHandler

org.apache.sqoop.repository.jdbc.transaction.isolation=READ_COMMITTED
org.apache.sqoop.repository.jdbc.maximum.connections=10
#org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
# this setup uses a MySQL repository database
org.apache.sqoop.repository.jdbc.url=jdbc:mysql://master:3306/sqoop

#org.apache.sqoop.repository.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
org.apache.sqoop.repository.jdbc.driver=com.mysql.jdbc.Driver

org.apache.sqoop.repository.jdbc.user=root
# change this to your own database password
org.apache.sqoop.repository.jdbc.password=yourpassword

# Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/root/hadoop-2.10.0/etc/hadoop

org.apache.sqoop.jetty.port=12000
           

2.3 Edit Hadoop's yarn-site.xml

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
           

2.4 Edit core-site.xml and restart Hadoop

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
           

2.5 Install the JDBC driver into server/lib/

I'm using MySQL, so first check the MySQL server version:

mysql> select version();
+-----------+
| version() |
+-----------+
| 5.6.46    |
+-----------+
1 row in set (0.01 sec)
           

Then download the matching MySQL driver from https://repo1.maven.org/maven2/mysql/mysql-connector-java/ and place it in server/lib/. Since Hive is already installed here, I simply copied mysql-connector-java-5.1.46.jar over from the Hive lib directory:

cp /root/hive-3.1.2/lib/mysql-connector-java-5.1.46.jar /root/sqoop/server/lib/
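
To double-check that a connector jar actually landed where SQOOP_SERVER_EXTRA_LIB points, a small helper like the one below can scan a directory (`find_mysql_driver` is a hypothetical name, not part of Sqoop):

```shell
#!/usr/bin/env bash
# Print any mysql-connector-java jars in the given directory;
# return non-zero if none are found.
find_mysql_driver() {
  local dir="$1" found
  found=$(ls "$dir"/mysql-connector-java-*.jar 2>/dev/null)
  if [ -n "$found" ]; then
    printf '%s\n' "$found"
  else
    echo "no mysql-connector-java jar found in $dir" >&2
    return 1
  fi
}
find_mysql_driver /root/sqoop/server/lib || true
```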
           

3. Initialization and Verification

3.1 Initialize Sqoop

[root@master bin]# ./sqoop2-tool upgrade
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.UpgradeTool
2020-01-28 21:56:19,969 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Tool class org.apache.sqoop.tools.tool.UpgradeTool has finished correctly.
           

3.2 Verify Sqoop

[root@master bin]# ./sqoop2-tool verify
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop tool executor:
	Version: 1.99.7
	Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
	Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
Running tool: class org.apache.sqoop.tools.tool.VerifyTool
2020-01-28 21:56:35,082 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 21:56:35,089 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
           
3.2.1 Troubleshooting

ERROR [main] org.apache.sqoop.repository.common.CommonRepositoryHandler - Can't execute query: CREATE DATABASE IF NOT EXISTS "SQOOP"

If you hit this error, run the following in MySQL:

# this must be re-applied every time MySQL restarts
set global sql_mode ='ANSI_QUOTES';
           

Or add the following to MySQL's configuration file (my-default.cnf) to make it permanent:

# permanent setting
sql_mode=ANSI_QUOTES
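
To verify the permanent setting without restarting anything, the config file can simply be grepped. A sketch (`has_ansi_quotes` is a hypothetical helper, and the config file path varies by installation):

```shell
#!/usr/bin/env bash
# Check whether a MySQL config file sets sql_mode to include ANSI_QUOTES.
has_ansi_quotes() {
  grep -Eq '^[[:space:]]*sql_mode[[:space:]]*=.*ANSI_QUOTES' "$1" 2>/dev/null
}
if has_ansi_quotes /etc/my.cnf; then
  echo "ANSI_QUOTES is configured"
else
  echo "ANSI_QUOTES not found; the repository tools may fail on quoted identifiers"
fi
```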
           

4. Testing

4.1 Start Sqoop

4.1.1 Start the historyserver process
# run the following command from Hadoop's sbin directory
# /root/hadoop-2.10.0/etc/hadoop is my Hadoop configuration directory
./mr-jobhistory-daemon.sh start historyserver --config /root/hadoop-2.10.0/etc/hadoop
           
4.1.2 Start the Sqoop server
[root@master bin]# ./sqoop2-server start
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Starting the Sqoop2 server...
2020-01-28 07:44:00,338 INFO  [main] core.SqoopServer (SqoopServer.java:initialize(55)) - Initializing Sqoop server.
2020-01-28 07:44:00,412 INFO  [main] core.PropertiesConfigurationProvider (PropertiesConfigurationProvider.java:initialize(99)) - Starting config file poller thread
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/root/sqoop/server/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hadoop-2.10.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Sqoop2 server started.
           
4.1.3 Start the Sqoop client (note the command is sqoop2-shell, not sqoop-shell)
[root@master bin]# ./sqoop2-shell
Setting conf dir: /root/sqoop/bin/../conf
Sqoop home directory: /root/sqoop
Sqoop Shell: Type 'help' or '\h' for help.

sqoop:000> 
           

4.2 Using Sqoop

Transfer data from MySQL to HDFS using Sqoop.

4.2.1 Connect to the Sqoop server
sqoop:000> set server --host master --port 12000 --webapp sqoop
Server is set successfully
           
4.2.2 List the connectors registered with the Sqoop server:
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
|          Name          | Version |                           Class                            | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |
| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |
| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |
| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |
| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |
| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |
| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |
+------------------------+---------+------------------------------------------------------------+----------------------+
           
4.2.3 Create a link for importing from MySQL
sqoop:000> create link -c generic-jdbc-connector 
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: link1_generic_jdbc

Database connection

Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://master/sqoop   # make sure the sqoop database exists
Username: root
Password: *********    # same as your MySQL password
Fetch Size: 
Connection Properties: 
There are currently 0 values in the map:
entry# 

SQL Dialect

Identifier enclose:  (enter a single space here)
New link was successfully created with validation status OK and name link1_generic_jdbc
           
4.2.4 Check the links
sqoop:000> show link
+--------------------+------------------------+---------+
|        Name        |     Connector Name     | Enabled |
+--------------------+------------------------+---------+
| link1_generic_jdbc | generic-jdbc-connector | true    |
+--------------------+------------------------+---------+
           
4.2.5 Create a link for exporting to HDFS
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: link2_hdfs

HDFS cluster

URI: hdfs://master:9000
Conf directory: /root/hadoop-2.10.0/etc/hadoop
Additional configs:: 
There are currently 0 values in the map:
entry# 
New link was successfully created with validation status OK and name link2_hdfs
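
The non-interactive parts of this workflow can also be scripted: sqoop2-shell accepts a file of shell commands as an argument. In 1.99.7, `create link` still prompts interactively, so a script like the sketch below mainly automates connecting to the server and inspecting what exists (the host, port, and file path here are assumptions matching this walkthrough; adjust for your setup):

```shell
#!/usr/bin/env bash
# Generate a Sqoop2 shell script that connects to the server and lists
# connectors and links. Run it afterwards with:
#   sqoop2-shell /tmp/inspect.sqoop
cat > /tmp/inspect.sqoop <<'EOF'
set server --host master --port 12000 --webapp sqoop
show connector
show link --all
EOF
echo "wrote /tmp/inspect.sqoop"
```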
           
4.2.6 Show the details of the created links
sqoop:000> show link --all
2 link(s) to show: 
link with name link1_generic_jdbc (Enabled: true, Created by root at 1/28/20 8:13 PM, Updated by root at 1/28/20 8:13 PM)
Using Connector generic-jdbc-connector with name {1}
  Database connection
    Driver class: com.mysql.jdbc.Driver
    Connection String: jdbc:mysql://master/sqoop
    Username: root
    Password: 
    Fetch Size: 
    Connection Properties: 
  SQL Dialect
    Identifier enclose:  
link with name link2_hdfs (Enabled: true, Created by root at 1/28/20 8:14 PM, Updated by root at 1/28/20 8:14 PM)
Using Connector hdfs-connector with name {1}
  HDFS cluster
    URI: hdfs://master:9000
    Conf directory: /root/hadoop-2.10.0/etc/hadoop
    Additional configs:: 
           
4.2.7 Create a job to move data from MySQL to HDFS:
sqoop:000> create job -f "link1_generic_jdbc" -t "link2_hdfs"
Creating job for links with from name link1_generic_jdbc and to name link2_hdfs
Please fill following values to create new job object
Name: jdbcToHdfs

Database source
# make sure the sqoop database exists
Schema name: sqoop
# the user table holds the data to be imported into HDFS
Table name: user
SQL statement: 
Column names: 
There are currently 0 values in the list:
element# 
Partition column: 
Partition column nullable: 
Boundary query: 

Incremental read

Check column: 
Last value: 

Target configuration

Override null value: 
Null value: 
File format: 
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
  2 : PARQUET_FILE
Choose: 0
Compression codec: 
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom codec: 
Output directory: /jdbctohdfs   # make sure this directory does not exist on HDFS, or is empty
Append mode: 

Throttling resources

Extractors: 
Loaders: 

Classpath configuration

Extra mapper jars: 
There are currently 0 values in the list:
element# 
New job was successfully created with validation status OK  and name jdbcToHdfs
           
4.2.8 檢視job狀态
sqoop:000> show job --all
1 job(s) to show: 
Job with name jdbcToHdfs (Enabled: true, Created by root at 1/28/20 8:17 PM, Updated by root at 1/28/20 8:17 PM)
  Throttling resources
    Extractors: 
    Loaders: 
  Classpath configuration
    Extra mapper jars: 
From link: link1_generic_jdbc
  Database source
    Schema name: sqoop
    Table name: user
    SQL statement: 
    Column names: 
    Partition column: 
    Partition column nullable: 
    Boundary query: 
  Incremental read
    Check column: 
    Last value: 
To link: link2_hdfs
  Target configuration
    Override null value: 
    Null value: 
    File format: TEXT_FILE
    Compression codec: NONE
    Custom codec: 
    Output directory: /jdbctohdfs
    Append mode: 

           
4.2.9 Start the job
sqoop:000> start job -n jdbcToHdfs
2020-01-28 21:44:57 PST: FAILED 
           

The run failed; check sqoop.log for the cause.


In theory the missing jar should be looked up under sqoop/server/lib, but the log shows it being searched for under sqoop/bin. Not having found which setting controls this, I simply symlinked the jars from sqoop/server/lib into sqoop/bin.

Checking /root/sqoop/bin confirms the jar symlinks are now in place.


Starting the job again succeeds:

sqoop:000> start job -n jdbcToHdfs
Submission details
Job Name: jdbcToHdfs
Server URL: http://master:12000/sqoop/
Created by: root
Creation date: 2020-01-28 21:58:31 PST
Lastly updated by: root
External ID: job_local292485477_0006
	http://localhost:8080/
Source Connector schema: Schema{name= sqoop . website ,columns=[
	FixedPoint{name=websiteid,nullable=true,type=FIXED_POINT,byteSize=8,signed=true},
	Text{name=companyName,nullable=true,type=TEXT,charSize=null},
	Text{name=companyAddr,nullable=true,type=TEXT,charSize=null},
	Text{name=companyInfo,nullable=true,type=TEXT,charSize=null},
	Text{name=jobInfo,nullable=true,type=TEXT,charSize=null},
	Text{name=jobAddr,nullable=true,type=TEXT,charSize=null},
	Text{name=jobName,nullable=true,type=TEXT,charSize=null},
	Text{name=salary,nullable=true,type=TEXT,charSize=null}]}
2020-01-28 21:58:34 PST: SUCCEEDED
           

Viewing HDFS in the web UI confirms the data was uploaded successfully.
