天天看點

PostgreSQL 與 MSSQL(SQL Server) 之間 資料互相遷移、導入、導出測試

标簽

PostgreSQL , ms sql , SQL Server

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E8%83%8C%E6%99%AF 背景

測試表結構

create table test (id int, info text);  
           

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E4%BB%8E-postgresql-%E5%AF%BC%E5%85%A5-mssql 從 PostgreSQL 導入 MSSQL

MS SQL bcp與BULK INSERT都不支援stdin,是以我這裡使用落地到檔案的方法,從PostgreSQL導入MS SQL。

1、psql寫出1億行記錄到本地檔案

time psql -h /tmp -p 1925 -U postgres postgres -c "copy (select id,md5(random()::text) from generate_series(1,100000000) t(id)) to stdout WITH (NULL '')" > /data01/test.out  
  
real    2m1.441s  
user    0m10.535s  
sys     0m12.536s  
           

2、使用bcp,從本地檔案批量加載到ms sql

bcp test in /data01/test.out -c -k -S localhost -U SA -P Digoal_mssql -b 65535 -a 65535 -r \\n 
  
Starting copy...  
  
100000000 rows copied.  
Network packet size (bytes): 4096  
Clock Time (ms.) Total     : 779490 Average : (128289.0 rows per sec.)  
           

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E4%BB%8E-mssql-%E4%B8%8D%E8%90%BD%E5%9C%B0-%E5%AF%BC%E5%85%A5-postgresql 從 MSSQL 不落地 導入 PostgreSQL

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E4%BD%BF%E7%94%A8named-pipe 使用named pipe

1、建立fifo管道

mkpipe /tmp/namepipe  
           

2、PostgreSQL,使用copy command的服務端PROGRAM調用接口,從管道讀取内容,寫入test表

time psql -h /tmp -p 1925 -U postgres postgres -c "copy test from PROGRAM 'cat /tmp/namepipe' WITH (NULL '')"  
  
COPY 100000103  
  
real    4m57.212s  
user    0m0.002s  
sys     0m0.004s  
           

3、MS SQL,使用bcp,批量導出資料,寫到管道

bcp test out /tmp/namepipe -c -k -S localhost -U SA -P Digoal_mssql -b 65535 -a 65535 -r \\n
  
Starting copy...  
  
100000103 rows copied.  
Network packet size (bytes): 32576  
Clock Time (ms.) Total     : 301248 Average : (331952.8 rows per sec.)  
           

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E6%88%96%E8%80%85%E7%9B%B4%E6%8E%A5%E4%BD%BF%E7%94%A8stdout-stdin 或者直接使用stdout, stdin:

# bcp test out /tmp/namepipe -c -k -S localhost -U SA -P Digoal_mssql -b 65535 -a 65535 -r \\n | psql -h /tmp -p 1925 -U postgres postgres -c "copy test from PROGRAM 'cat /tmp/namepipe' WITH (NULL '')"
           

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E5%B0%8F%E7%BB%93 小結

1、如果mssql的導入導出都可以支援管道,使用起來會更加的友善。目前看隻有導出可以支援管道,導入時使用管道會報錯(使用bcp, bulk insert試過是這樣的情況)。

1.1、檔案内容與pipe如下

# cat /tmp/test  
1       test  
  
# mkfifo /tmp/namepipe  
  
# ll /tmp/test  
-rw-r--r-- 1 root root 7 Aug 20 00:15 /tmp/test  
  
# ll /tmp/namepipe  
prwxrwxrwx 1 root root 0 Aug 20 00:07 /tmp/namepipe  
  
# cat /tmp/test|cat /dev/stdin  
1       test  
           

1.2、bcp, BULK INSERT無法正确讀取來自stdin與namepipe的内容

cat test|bcp test in /dev/stdin -c -S 127.0.0.1 -U SA -P Digoal_mssql -a 65535 -r \\n  
  
Starting copy...  
  
0 rows copied.  # 正常的話這裡應該是COPY 1條  
Network packet size (bytes): 32576  
Clock Time (ms.) Total     : 1       
           
cat /tmp/test > /tmp/namepipe|bcp test in /tmp/namepipe -c -S localhost -U SA -P Digoal_mssql -a 65535 -r \\n   
  
Starting copy...  
  
0 rows copied.  # 正常的話這裡應該是COPY 1條  
Network packet size (bytes): 32576  
Clock Time (ms.) Total     : 1       
           
cat test|sqlcmd -S localhost -U SA -P  'Digoal_mssql' -Q "bulk insert test from '/dev/stdin'"  
  
(0 rows affected)  # 正常的話這裡應該是COPY 1條  
           
# cat /tmp/test > /tmp/namepipe  
  
another session:  
  
# sqlcmd -S localhost -U SA -P  'Digoal_mssql' -Q "bulk insert test from '/tmp/namepipe'"  
  
Msg 4860, Level 16, State 1, Server iZbp13nu0s9j3x3, Line 1  
Cannot bulk load. The file "/tmp/namepipe" does not exist or you don't have file access rights.  
# 正常的話這裡應該是COPY 1條  


psql -h 127.0.0.1 -p 1925 -U postgres postgres -c "copy (select generate_series(1,10000),md5(random()::text)) to stdout;" |bcp test in /dev/stdin -c -S localhost -U SA -P Digoal_mssql -a 65535 -r \\n   

Starting copy...

BCP copy in failed
# 正常的話應該導入10000條。  
           

1.3、bcp, BULK INSERT直接從檔案讀取内容正常

# bcp test in /tmp/test -c -S 127.0.0.1 -U SA -P Digoal_mssql -a 65535 -r \\n  
Starting copy...  
  
1 rows copied.  
Network packet size (bytes): 32576  
Clock Time (ms.) Total     : 2      Average : (500.0 rows per sec.)  
  
# sqlcmd -S localhost -U SA -P  'Digoal_mssql' -Q "bulk insert test from '/tmp/test'"  
  
(1 rows affected)  
           

2、PostgreSQL在資料庫服務對端、用戶端、協定層都支援COPY協定,資料的進出都非常友善。

3、批量寫入加載速度對比

MS SQL: 12.8萬行/s

PostgreSQL: 33.2萬行/s

4、格式問題,如果在資料内容中出現了分隔符的值,bcp的output模式并不會對其進行處理。 但是可以輸出為bcp自己識别的fmt。而輸出到文本後直接導入到PG并不适合。

例如這裡的内容中用到了制表符、逗号等。導入到pg就存在問題。 

create table test1 (id int, info text, c1 int, c2 text);
insert into test1 values (null,null,null,'abc');
insert into test1 values (null,null,1,'abc');
insert into test1 values (1,'test	,		test',1,'a,b,c,		d		d');
insert into test1 values (1,'test	test		test		test',1,'a			a');
  
需要加入轉義的過程,解決這個跨産品遷移的問題。
           
<乘數科技在sqlserver遷移PostgreSQL項目中總結的文檔-sqlserver轉PG經驗總結及PG的一些特性>

關于轉義,可能會将其資料遷移産品,與rds_dbsync一樣開源出來。

https://github.com/digoal/blog/blob/master/201808/20180819_02.md#%E5%8F%82%E8%80%83 參考

1、man mkfifo

2、bcp

https://docs.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017

3、BULK INSERT

https://docs.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017

4、關于PostgreSQL的服務端copy與協定層copy(用戶端copy)

《PostgreSQL 服務端COPY和用戶端COPY - 暨PG有哪些服務端操作接口》 《PostgreSQL copy (quote,DELIMITER,...) single byte char 的輸入》

5、

《MSSQL(SQL Server) on Linux 簡明部署與使用》

6、

https://stackoverflow.com/questions/2197017/can-sql-server-bulk-insert-read-from-a-named-pipe-fifo