Sqoop: importing MySQL data into a Hive partitioned table

2023-03-18 07:38:30

@羲凡: only to live a better life


Prerequisites

a. Add the following line to /etc/profile:

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*

b. Copy hive-site.xml into the $SQOOP_HOME/conf directory (otherwise Sqoop reports an error that it cannot find the Hive database).
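The two preparation steps above can be scripted so they are safe to rerun. This is a minimal sketch under assumptions: HIVE_HOME and SQOOP_HOME are already exported, hive-site.xml lives in $HIVE_HOME/conf, and the helper names append_once and setup_sqoop_hive are hypothetical. The profile path is a parameter (defaulting to /etc/profile, which needs root) so the same export line is never appended twice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if that exact line is not already present.
append_once() {
  local line=$1 file=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical helper for the two preparation steps.
# Assumes HIVE_HOME and SQOOP_HOME are exported and hive-site.xml sits in $HIVE_HOME/conf.
setup_sqoop_hive() {
  local profile=${1:-/etc/profile}
  # Step a: single quotes keep $HADOOP_CLASSPATH and $HIVE_HOME unexpanded in the profile line
  append_once 'export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*' "$profile"
  # Step b: without this copy Sqoop cannot find the Hive database
  cp "$HIVE_HOME/conf/hive-site.xml" "$SQOOP_HOME/conf/"
}
```

Run setup_sqoop_hive as root (or pass another profile file as the first argument), then source /etc/profile in the current shell.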

0. Parameter reference

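--connect #JDBC connection URL of the relational database
--username #username for the relational database
--password #password for the relational database
--table #table in the relational database to import
--query #free-form SELECT to import instead of a whole table (must contain $CONDITIONS; see section 5)
--split-by #column used to divide rows among the map tasks; if -m is not 1 it should be given, preferably a numeric column, otherwise an error may be reported
--direct #fast mode: use MySQL's own mysqldump to export the data
--delete-target-dir #delete the HDFS target directory first if it already exists
--target-dir #target HDFS directory when importing
--export-dir #source HDFS directory when exporting
--fields-terminated-by #field delimiter of the HDFS files written by an import
--input-fields-terminated-by #field delimiter of the HDFS files read by an export
--hive-import #load the imported data into a Hive table
--hive-drop-import-delims #drop \n, \r and \01 characters (e.g. line breaks typed with Enter) from string columns imported into Hive
--hive-database #Hive database
--hive-table #Hive table
--hive-overwrite #overwrite the existing data of the target partition instead of appending
--hive-partition-key #Hive partition column
--hive-partition-value #Hive partition value
-m #number of map tasks, which is also the number of files generated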

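Note: when the number of maps is not 1, specify --split-by, preferably on a numeric column. Sqoop otherwise splits on the table's primary key, and in this example that key (name) is a varchar, which is why the script splits on the numeric age column.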
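Why a numeric column matters: Sqoop looks up the minimum and maximum of the --split-by column and gives each map task one sub-range of it. The function below is a simplified approximation of that integer splitting, not Sqoop's source; split_ranges is a hypothetical name, and Sqoop's exact boundaries may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: approximate how an integer --split-by column is cut into -m ranges.
# Usage: split_ranges <column> <min> <max> <num_mappers>
split_ranges() {
  local col=$1 min=$2 max=$3 m=$4
  local size=$(( (max - min) / m ))
  if (( size < 1 )); then size=1; fi   # never use a zero-width step
  local lo=$min hi i
  for (( i = 0; i < m; i++ )); do
    hi=$(( lo + size ))
    # The last mapper, or the first range that would pass max, gets an inclusive upper bound
    if (( i == m - 1 || hi > max )); then
      echo "$col >= $lo AND $col <= $max"
      break
    fi
    echo "$col >= $lo AND $col < $hi"
    lo=$hi
  done
}
```

For the section 3 script (age values 666 and 777, -m 6), split_ranges age 666 777 6 prints six conditions, from "age >= 666 AND age < 684" to "age >= 756 AND age <= 777". Under this approximation only the first and last ranges contain a row, so four of the six output files would be empty; -m is best sized to the data volume.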

1. Prepare MySQL

CREATE DATABASE test_data;
CREATE TABLE test_data.mysql_stu_info (
  `name` varchar(20),
  `age` int(5) ,
  primary key (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

insert into test_data.mysql_stu_info values("aaron",666);
insert into test_data.mysql_stu_info values("yaoyao",777);

2. Prepare Hive

CREATE DATABASE test_data;
CREATE TABLE test_data.stu_info(
  `name` string, 
  `age` int)
PARTITIONED BY (ymday string)
ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t';
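The delimiter in this DDL (FIELDS TERMINATED BY '\t') has to match the --fields-terminated-by value passed to Sqoop; if they differ, Hive puts the whole line into the first column and the remaining columns read as NULL. The awk one-liner below only imitates that tab splitting to make the failure visible; it is an illustration, not how Hive actually parses files, and split_as_hive is a hypothetical name.

```bash
#!/usr/bin/env bash
# Illustration only: split each line on tabs, as a table declared with
# FIELDS TERMINATED BY '\t' would, and show a missing second field as NULL.
split_as_hive() {
  awk -F'\t' '{ print "name=" $1 " | age=" ($2 == "" ? "NULL" : $2) }'
}

printf 'aaron\t666\n' | split_as_hive   # prints: name=aaron | age=666
printf 'aaron,666\n'  | split_as_hive   # prints: name=aaron,666 | age=NULL
```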

3. Sqoop script

sqoop import \
--connect "jdbc:mysql://deptest75:3306/test_data?useUnicode=true&characterEncoding=utf8" \
--username root \
--password 1q2w3e4r \
--table mysql_stu_info \
--delete-target-dir \
--hive-drop-import-delims \
--hive-import \
--hive-overwrite \
--hive-database test_data \
--hive-table stu_info \
--hive-partition-key ymday \
--hive-partition-value 20190329 \
--split-by age \
--fields-terminated-by '\t' \
-m 6
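To load another day's partition without editing the script, the same command can be wrapped in a function that takes the partition value. This is a sketch: import_stu_info and the DRY_RUN switch are hypothetical additions, the connection details and options are copied unchanged from the script above, and the 8-digit check only mirrors the yyyyMMdd value 20190329 used here.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the command above: import mysql_stu_info into one ymday partition.
# Usage: import_stu_info [yyyyMMdd]  (defaults to today); set DRY_RUN=1 to print instead of run.
import_stu_info() {
  local ymday=${1:-$(date +%Y%m%d)}
  # Assumed format check: the partition values in this post are 8-digit dates
  if [[ ! $ymday =~ ^[0-9]{8}$ ]]; then
    echo "import_stu_info: expected yyyyMMdd, got '$ymday'" >&2
    return 1
  fi
  local -a cmd=(
    sqoop import
    --connect "jdbc:mysql://deptest75:3306/test_data?useUnicode=true&characterEncoding=utf8"
    --username root
    --password 1q2w3e4r
    --table mysql_stu_info
    --delete-target-dir
    --hive-drop-import-delims
    --hive-import
    --hive-overwrite
    --hive-database test_data
    --hive-table stu_info
    --hive-partition-key ymday
    --hive-partition-value "$ymday"
    --split-by age
    --fields-terminated-by '\t'
    -m 6
  )
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s ' "${cmd[@]}"; echo   # show the command without running it
  else
    "${cmd[@]}"
  fi
}
```

For example, import_stu_info 20190329 reruns the import above, and DRY_RUN=1 import_stu_info prints the command for today's partition. Building the command as an array keeps each argument, including the quoted JDBC URL, intact without extra escaping.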

4. Results

hive (test_data)> select * from test_data.stu_info;
OK
stu_info.name	stu_info.age	stu_info.ymday
aaron	666	20190329
yaoyao	777	20190329
Time taken: 0.142 seconds, Fetched: 2 row(s)

5. Going further

If you need to filter the MySQL table before importing it into the Hive partitioned table:

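a. Use --query followed by the query. Its WHERE clause must contain the $CONDITIONS token, and the query is wrapped in single quotes so the shell does not expand $CONDITIONS.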

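b. --target-dir is mandatory with --query. The example below sets it to the location path of the Hive partitioned table, but Sqoop only uses this directory as a staging area before loading the files into the partition. Because --delete-target-dir removes the whole directory first, including existing partitions stored under it (such as ymday=20190329), a separate temporary path is the safer choice.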
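Sqoop runs the --query statement once per map task and replaces the literal $CONDITIONS token with that task's split predicate, which is why the token must stay in the WHERE clause (inside double quotes it would have to be written \$CONDITIONS). The sketch below only imitates that textual substitution; render_query is a hypothetical name, and the predicate text Sqoop actually generates may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the $CONDITIONS substitution (not Sqoop's actual code).
# Usage: render_query '<query containing $CONDITIONS>' '<split predicate>'
render_query() {
  local template=$1 cond=$2
  # \$ makes the pattern match a literal "$CONDITIONS"; the predicate is wrapped in parentheses
  printf '%s\n' "${template//\$CONDITIONS/($cond)}"
}

# A row filter is simply ANDed in front of the token:
render_query 'select name, age from mysql_stu_info where age > 700 and $CONDITIONS' '1 = 1'
# prints: select name, age from mysql_stu_info where age > 700 and (1 = 1)
```

With -m 1, as in the example, the whole result goes to a single map task, so the substituted predicate filters nothing further; with more mappers Sqoop also needs --split-by to build the ranges.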

Example:

sqoop import \
--connect "jdbc:mysql://deptest75:3306/test_data?useUnicode=true&characterEncoding=utf8" \
--username root \
--password 1q2w3e4r \
--query 'select name, age from mysql_stu_info where $CONDITIONS' \
--target-dir /hive/warehouse/test_data.db/stu_info \
--delete-target-dir \
--hive-import \
--hive-overwrite \
--hive-database test_data \
--hive-table stu_info \
--hive-partition-key ymday \
--hive-partition-value 20190404 \
--fields-terminated-by '\t' \
-m 1
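After the import, the loaded files should sit in the partition's own directory under the table location, named key=value. The hypothetical helper below just builds that path so it can be listed; it assumes the partition uses the default location under the table directory (here /hive/warehouse/test_data.db/stu_info, taken from the example).

```bash
#!/usr/bin/env bash
# Hypothetical helper: default HDFS directory of one partition (<table location>/<key>=<value>).
partition_dir() {
  local table_location=${1%/} key=$2 value=$3   # drop a trailing slash from the location
  printf '%s/%s=%s\n' "$table_location" "$key" "$value"
}

# Example check (needs an HDFS client on the PATH):
# hdfs dfs -ls "$(partition_dir /hive/warehouse/test_data.db/stu_info ymday 20190404)"
```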

====================================================================


If you have any questions about this post, feel free to leave a comment.
