Migrating data, including Hive tables, between Hortonworks HDP 2.1 and 2.2 clusters

2023-08-05 06:10:20

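I previously built a cluster based on HDP 2.1. Now, as requirements changed, I have built a new HDP 2.2 cluster to serve as the new production environment. The HDP 2.1 cluster holds about 600 GB of data, mostly in the form of Hive tables, so that data has to be migrated to the new cluster.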

The approach is based on this article:

https://amalgjose.wordpress.com/2013/10/11/migrating-hive-from-one-hadoop-cluster-to-another-cluster-2/

1) Install Hive in the new Hadoop cluster, together with the Hive metastore and related components.

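2) Transfer the data in the Hive warehouse directory (/user/hive/warehouse by default) to the new Hadoop cluster. On HDP the warehouse lives under /apps/hive/warehouse, so I used the hadoop distcp command to copy both /user/hive and /apps/hive/warehouse from the old cluster to the new one.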

The exact commands:

hadoop distcp hdfs://[oldcluster.fqdn]:8020/user/hive hdfs://[newcluster.fqdn]:8020/user/

hadoop distcp hdfs://[oldcluster.fqdn]:8020/apps/hive/warehouse hdfs://[newcluster.fqdn]:8020/apps/hive/

Documentation for the distcp command (Hadoop 1.2.1 version): http://hadoop.apache.org/docs/r1.2.1/distcp.html
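The two distcp commands above can be wrapped in a small shell function, which makes it easy to print the commands first and check the paths before starting a long copy of roughly 600 GB. This is a sketch under stated assumptions: the `DRY_RUN` switch and the function names `run`, `copy_hive_data` and `compare_warehouse_size` are hypothetical, and both NameNodes are assumed to listen on RPC port 8020 as in the commands above. The `hadoop fs -du -s` calls give a rough size comparison of the warehouse on both sides after the copy.

```bash
# Sketch only: DRY_RUN and these function names are hypothetical helpers.
# Assumes both NameNodes listen on RPC port 8020, as in the commands above.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Copy /user/hive and the HDP warehouse directory from the old cluster to the new one.
copy_hive_data() {
  local old="hdfs://$1:8020" new="hdfs://$2:8020"
  run hadoop distcp "$old/user/hive" "$new/user/" || return 1
  run hadoop distcp "$old/apps/hive/warehouse" "$new/apps/hive/" || return 1
}

# Rough post-copy check: total size of the warehouse on each cluster.
compare_warehouse_size() {
  local old="hdfs://$1:8020" new="hdfs://$2:8020"
  run hadoop fs -du -s "$old/apps/hive/warehouse"
  run hadoop fs -du -s "$new/apps/hive/warehouse"
}

# Example (hypothetical host names):
#   DRY_RUN=1 copy_hive_data oldnn.example.com newnn.example.com
#   copy_hive_data oldnn.example.com newnn.example.com
#   compare_warehouse_size oldnn.example.com newnn.example.com
```

Both clusters here run Hadoop 2.x, so plain hdfs:// URIs should work for the copy; for copies across major Hadoop versions, the distcp documentation recommends reading the source through webhdfs:// instead.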

3) Take the MySQL metastore dump.

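On the node that hosts the old cluster's Hive metastore, use the database's dump tool to export everything in the metastore database. The Hive metastore stores metadata such as Hive table schemas and database information. My Hive metastore runs on MySQL, so I used the MySQL commands in the table below (the Postgres and Oracle equivalents are listed for reference):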

| Database Type | Backup | Restore |
|---|---|---|
| MySQL | `mysqldump $dbname > $outputfilename.sql` For example: `mysqldump hive > /tmp/mydir/backup_hive.sql` | `mysql $dbname < $inputfilename.sql` For example: `mysql hive < /tmp/mydir/backup_hive.sql` |
| Postgres | `sudo -u $username pg_dump $databasename > $outputfilename.sql` For example: `sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql` | `sudo -u $username psql $databasename < $inputfilename.sql` For example: `sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql` |
| Oracle | Export the database with the `exp` utility: `exp username/password@<connect_string> full=yes file=output_file.dmp` | Import the database with the `imp` utility: `imp username/password@<connect_string> file=input_file.dmp` |
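For the MySQL case, a slightly more careful version of the backup command is sketched below. It assumes the metastore database is named `hive` and that MySQL credentials come from `~/.my.cnf` (otherwise add `-u`/`-p`). `--single-transaction` is a standard mysqldump option that dumps InnoDB tables from a consistent snapshot without locking them; stopping the Hive Metastore service before dumping is still the safest choice. The helpers `dump_metastore` and `count_uri` are hypothetical names; `count_uri` counts how often a given NameNode URI appears in the dump, which tells you exactly what string to search for in step 5.

```bash
# Sketch: run on the old cluster's metastore node.
# Assumptions: database name "hive", credentials in ~/.my.cnf,
# and the example dump path used in the table above.

dump_metastore() {
  local db="${1:-hive}" out="${2:-/tmp/mydir/backup_hive.sql}"
  mkdir -p "$(dirname "$out")" || return 1
  # Consistent snapshot of InnoDB tables without locking them.
  mysqldump --single-transaction "$db" > "$out"
}

# Count occurrences (not lines) of a URI prefix in the dump; mysqldump
# packs many rows into one INSERT line, so grep -c alone would undercount.
count_uri() {
  grep -oF -- "$1" "$2" | wc -l | tr -d ' '
}

# Example (hypothetical host name):
#   dump_metastore hive /tmp/mydir/backup_hive.sql
#   count_uri hdfs://oldnn.example.com:8020 /tmp/mydir/backup_hive.sql
```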

4) Install MySQL in the new Hadoop cluster. (This step can be skipped here, because installing the HDP 2.2 cluster already installed Hive and the Hive metastore.)

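5) Open the Hive MySQL metastore dump in a text editor such as Notepad or Notepad++, search for hdfs://ip-address-old-namenode:port, replace it with hdfs://ip-address-new-namenode:port, and save the file.

Here ip-address-old-namenode is the IP address of the NameNode of the old Hadoop cluster and ip-address-new-namenode is the IP address of the NameNode of the new Hadoop cluster.

In practice, I copied the exported dump file to my local machine and used a text editor to replace the old cluster's NameNode domain name with the new cluster's NameNode domain name. Search for whichever form (IP address or hostname) actually appears in your dump.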
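Instead of a GUI editor, the replacement can also be done in place with `sed`, which avoids opening a large dump file by hand. A minimal sketch, assuming the old and new values are plain `hdfs://host:port` strings containing no `|`, `&` or backslash characters; only the dots in the old URI are escaped so that they match literally. The function name `replace_namenode` is hypothetical, and `-i.bak` keeps the untouched original as a `.bak` copy.

```bash
# Sketch: rewrite the NameNode URI in a metastore dump.
# Assumptions: URIs look like hdfs://host:port and contain no '|', '&' or '\'.
replace_namenode() {
  local old="$1" new="$2" dump="$3" old_re
  # Escape dots so they match a literal '.' in the sed regex.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  # In-place edit; the original dump is kept as "$dump.bak".
  sed -i.bak "s|${old_re}|${new}|g" "$dump"
}

# Example (hypothetical host names):
#   replace_namenode hdfs://oldnn.example.com:8020 hdfs://newnn.example.com:8020 \
#     /tmp/mydir/backup_hive.sql
#   grep -c 'hdfs://oldnn.example.com:8020' /tmp/mydir/backup_hive.sql   # should now print 0
```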

6) After the steps above, restore the edited MySQL dump into the MySQL database of the new Hadoop cluster.

Copy the edited dump file to the node that hosts the new cluster's Hive metastore, and import it with the mysql command-line client.

For example:

mysql hive < /tmp/mydir/backup_hive.sql
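After the restore, it is worth checking that no database or table location still points at the old NameNode. In the Hive metastore schema, `DBS.DB_LOCATION_URI` holds database locations and `SDS.LOCATION` holds table and partition storage locations; the sketch below builds a query that counts rows whose location still starts with the old URI. The function names are hypothetical and the database name `hive` is assumed. Note that mysqldump's default output drops and recreates each table it contains, so restoring it replaces those tables in the new cluster's metastore database.

```bash
# Sketch: restore the edited dump and check for stale locations.
# Assumptions: metastore database "hive", credentials in ~/.my.cnf.

restore_metastore() {
  local db="${1:-hive}" dump="${2:-/tmp/mydir/backup_hive.sql}"
  mysql "$db" < "$dump"
}

# Print SQL that counts DBS/SDS rows still located on the old NameNode.
stale_location_sql() {
  local old="$1"
  printf "SELECT 'DBS', COUNT(*) FROM DBS WHERE DB_LOCATION_URI LIKE '%s%%'\n" "$old"
  printf "UNION ALL\n"
  printf "SELECT 'SDS', COUNT(*) FROM SDS WHERE LOCATION LIKE '%s%%';\n" "$old"
}

# Example (hypothetical host name); both counts should be 0:
#   restore_metastore hive /tmp/mydir/backup_hive.sql
#   mysql hive -e "$(stale_location_sql hdfs://oldnn.example.com:8020)"
```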

7) Configure Hive as usual and upgrade the Hive metastore schema if needed.

At this point you should be able to use the Hive shell or Hue to check whether the imported tables are accessible.
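For step 7, Hive's `schematool` can report the schema version recorded in the restored metastore and upgrade it to the installed Hive version. As far as I know, HDP 2.1 ships Hive 0.13 and HDP 2.2 ships Hive 0.14, so an upgrade is likely needed, but check the `-info` output rather than relying on that. This is a sketch: the `DRY_RUN` switch and the function names are hypothetical, `schematool` and `hive` are assumed to be on `PATH` (check the `bin` directory of your Hive installation), and schematool's own `-dryRun` flag is used first to list the upgrade scripts without running them. Stopping the Hive Metastore and HiveServer2 before upgrading is the safer choice.

```bash
# Sketch: inspect and upgrade the metastore schema, then list tables.
# Assumptions: schematool and hive are on PATH; the metastore is MySQL.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_metastore_schema() {
  # Show the Hive version and the schema version stored in the metastore.
  run schematool -dbType mysql -info || return 1
  # List the upgrade scripts that would be applied, without executing them.
  run schematool -dbType mysql -upgradeSchema -dryRun || return 1
  # Upgrade from the recorded schema version to the current Hive version.
  run schematool -dbType mysql -upgradeSchema
}

# Quick command-line check, as an alternative to the Hive shell or Hue.
check_tables() {
  local db="${1:-default}"
  run hive -e "SHOW DATABASES; USE ${db}; SHOW TABLES;"
}

# Example:
#   DRY_RUN=1 upgrade_metastore_schema   # only prints the three commands
#   upgrade_metastore_schema
#   check_tables default
```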
