
No layoffs! No layoffs! No layoffs! One trick to cut enterprise database IT costs

Background

The 2020 epidemic has been merciless. Many businesses have suffered, small and medium-sized ones in particular, some to the point of considering layoffs. But people are a company's most valuable asset and layoffs should always be the last resort. How do we get through this rough patch, and what should the epidemic make us rethink?

There is a new way to cut costs and boost efficiency. Databases typically account for nearly half of an enterprise's IT spend, so reducing database costs has an outsized effect on the overall IT budget. Most companies, however, have no full-time DBA and find it hard to act on this. That is where cloud vendors come in: with Alibaba Cloud we found an approach that can dramatically reduce enterprise IT costs.

So what exactly is this approach? Let's look back at the series of MySQL + PG joint-solution courses we ran before the New Year.

"Alibaba Cloud RDS PostgreSQL + MySQL Joint Solution Course - recorded videos and slides"

Among the many databases out there, PG is an enterprise-grade open-source database whose feature set is on par with Oracle. It fits a wide range of workloads and can handle very large data volumes. Large organizations running PG include Ping An, Postal Savings Bank of China, Alibaba, Huawei, ZTE, PICC, China Merchants, Foxconn, Apple, SAP, Salesforce and many others in the global Fortune 1000.

"What the industry says about PostgreSQL"

Advantages of Alibaba Cloud RDS PG:

  • Full lifecycle management, covering high availability, disaster recovery, backup, security, auditing, encryption, Cloud DBA and other modules, which greatly reduces usage and management costs.
  • A dedicated kernel and DBA team providing 7*24 support.
  • Parallel query, LLVM (JIT), GPU acceleration and vectorized execution for much stronger analytics.
  • A powerful optimizer that handles complex SQL efficiently, making it a good fit for complex business scenarios: new retail, manufacturing, industrial, online education, gaming, finance, government, enterprise ERP and other industries.
  • Kernel extensions customized for vertical domains:
    • The Ganos extension: stronger, more professional GIS, supporting planar and spherical geometry, raster, spatio-temporal trajectories, point clouds and topological network models.
    • The pase extension: high-dimensional vector search, supporting exact image search, face recognition and similarity queries.
    • The roaringbitmap extension: real-time big-data user profiling and precision marketing.
    • The rdkit extension: chemical analysis, molecular similarity search, chemistry-oriented machine learning and more.
  • Stronger multi-model support, reflected in a richer set of index types: besides btree and hash it offers gin, gist, spgist, brin, bloom, rum and other index access methods, suited to fuzzy search, full-text search, arbitrary multi-column search, spatio-temporal search and high-dimensional vectors (widely used for image recognition, similar-feature retrieval, time-series search, user profiling, chemical analysis, DNA search and so on); see the sketch right after this list.
  • A richer type system, with support for extension types: beyond the basic types it supports network, full-text search, array, xml, JSON, range, domain, tree, multi-dimensional, molecular, GIS and other types, covering a broader set of application scenarios.
  • oss_fdw support: archived data can be offloaded to OSS to cut storage costs while the access path stays unchanged.
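As one concrete illustration of the index flexibility listed above, here is a minimal sketch (not part of the benchmark below, with made-up table and column names) that uses the pg_trgm extension and a GIN index to speed up fuzzy LIKE searches:

-- minimal sketch: trigram GIN index for fuzzy search (illustrative schema)
create extension if not exists pg_trgm;
create table docs (id int primary key, title text);
create index idx_docs_title_trgm on docs using gin (title gin_trgm_ops);
-- LIKE/ILIKE patterns such as this one can now use the trigram index
select * from docs where title like '%postgres%';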

This article compares PG and MySQL across a number of dimensions. In some of them PG's overall performance is an order of magnitude higher than MySQL's; used together, PG + MySQL can significantly lower enterprise costs.

The epidemic is merciless, but PG is not. Instead of laying people off, build a more diverse technology stack and strengthen your IT capabilities.

Environment

Provision an Alibaba Cloud RDS PG 12 instance: 8 cores, 32 GB RAM, 1500 GB ESSD.

A MySQL 8.0 instance with the same hardware configuration.

User / password:

user123      
xxxxxx!                 

Database:

db1                 

Connection strings:

PG:

export PGPASSWORD=xxxxxx!      
psql -h pgm-bp1z26gbo3gx893a129310.pg.rds.aliyuncs.com -p 1433 -U user123 db1                 

MySQL:

mysql -h rm-bp1wv992ym962k85888370.mysql.rds.aliyuncs.com -P 3306 -u user123 --password=xxxxxx! -D db1                 

The test client is an ECS instance running CentOS 7.x x86_64; install the MySQL and PG clients:

yum install -y mysql-*      
yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm      
yum install -y postgresql12                 
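With the clients installed, a quick sanity check (not part of the timed tests) is to connect with the strings above and run a trivial statement that works on both engines:

-- returns the server version string in both PostgreSQL and MySQL
select version();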

MySQL 8.0 test

Test table

CREATE TABLE employees (      
  id INT NOT NULL,      
  fname VARCHAR(30),      
  lname VARCHAR(30),      
  birth TIMESTAMP,      
  hired DATE NOT NULL DEFAULT '1970-01-01',      
  separated DATE NOT NULL DEFAULT '9999-12-31',      
  job_code INT NOT NULL,      
  store_id INT NOT NULL      
);                 

Stored procedure for batch inserts

DROP PROCEDURE IF EXISTS BatchInsert;      
      
delimiter //   -- change the statement delimiter to //
CREATE PROCEDURE BatchInsert(IN init INT, IN loop_time INT)  -- arg 1: starting ID (customizable); arg 2: number of rows to generate
  BEGIN      
      DECLARE Var INT;      
      DECLARE ID INT;      
      SET Var = 0;      
      SET ID = init;      
      WHILE Var < loop_time DO      
          insert into employees      
          (id, fname, lname, birth, hired, separated, job_code, store_id)       
          values       
          (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);      
          SET ID = ID + 1;      
          SET Var = Var + 1;      
      END WHILE;      
  END;      
//      
delimiter ;  -- switch the delimiter back to semicolon

Batch insert 200,000 rows

-- run the inserts inside one transaction, otherwise this is very slow
      
begin;      
CALL BatchInsert(1, 200000);      
commit;      
      
Query OK, 1 row affected (7.53 sec)                 

Keep growing the table with INSERT INTO ... SELECT

mysql> insert into employees select * from employees;      
Query OK, 200000 rows affected (1.61 sec)      
Records: 200000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 400000 rows affected (3.25 sec)      
Records: 400000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 800000 rows affected (6.51 sec)      
Records: 800000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 1600000 rows affected (12.93 sec)      
Records: 1600000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 3200000 rows affected (28.61 sec)      
Records: 3200000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 6400000 rows affected (56.48 sec)      
Records: 6400000  Duplicates: 0  Warnings: 0      
      
mysql> insert into employees select * from employees;      
Query OK, 12800000 rows affected (1 min 55.30 sec)      
Records: 12800000  Duplicates: 0  Warnings: 0                 

Query performance

mysql> select count(*) from employees;      
+----------+      
| count(*) |      
+----------+      
| 25600000 |      
+----------+      
1 row in set (6.15 sec)                 

COUNT(DISTINCT) performance

mysql> select count(distinct id) from employees ;      
+--------------------+      
| count(distinct id) |      
+--------------------+      
|             200000 |      
+--------------------+      
1 row in set (16.67 sec)                 

Distinct count via GROUP BY

mysql> select count(*) from (select id from employees group by id) t;      
+----------+      
| count(*) |      
+----------+      
|   200000 |      
+----------+      
1 row in set (15.52 sec)                 

Write another 2 million rows

begin;      
CALL BatchInsert(1, 2000000);      
commit;                 

Test table 2, load 2 million rows.

CREATE TABLE employees1 (      
  id INT NOT NULL,      
  fname VARCHAR(30),      
  lname VARCHAR(30),      
  birth TIMESTAMP,      
  hired DATE NOT NULL DEFAULT '1970-01-01',      
  separated DATE NOT NULL DEFAULT '9999-12-31',      
  job_code INT NOT NULL,      
  store_id INT NOT NULL      
);      
      
DROP PROCEDURE IF EXISTS BatchInsert1;      
      
delimiter //   -- change the statement delimiter to //
CREATE PROCEDURE BatchInsert1(IN init INT, IN loop_time INT)  -- arg 1: starting ID (customizable); arg 2: number of rows to generate
  BEGIN      
      DECLARE Var INT;      
      DECLARE ID INT;      
      SET Var = 0;      
      SET ID = init;      
      WHILE Var < loop_time DO      
          insert into employees1      
          (id, fname, lname, birth, hired, separated, job_code, store_id)       
          values       
          (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);      
          SET ID = ID + 1;      
          SET Var = Var + 1;      
      END WHILE;      
  END;      
//      
delimiter ;  -- switch the delimiter back to semicolon

Load 2 million rows with the loop insert

-- run the inserts inside one transaction, otherwise this is very slow
      
begin;      
CALL BatchInsert1(1, 2000000);      
commit;      
      
Query OK, 1 row affected (1 min 7.06 sec)                 

27.6M rows many-to-one JOIN 2M rows, with GROUP BY and ORDER BY (at this point employees holds 25.6M + 2M = 27.6M rows)

select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;                 
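This join never completed in the MySQL run (see below). A reasonable first diagnostic, not part of the original benchmark, would be to look at the plan MySQL chooses before launching such a query; note that at this point neither employees nor employees1 has an index on id:

-- hypothetical diagnostic step, not run in the original test
explain
select t1.lname, count(*)
from employees t1
join employees1 t2 using (id)
group by t1.lname
order by count(*) desc, lname
limit 10;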

Simple (point) query performance (the join above still had no result after several hours, so a separate 2-million-row table had to be created for the point-query test):

CREATE TABLE employees2 (      
  id INT NOT NULL,      
  fname VARCHAR(30),      
  lname VARCHAR(30),      
  birth TIMESTAMP,      
  hired DATE NOT NULL DEFAULT '1970-01-01',      
  separated DATE NOT NULL DEFAULT '9999-12-31',      
  job_code INT NOT NULL,      
  store_id INT NOT NULL      
);      
      
DROP PROCEDURE IF EXISTS BatchInsert2;      
      
delimiter //   -- change the statement delimiter to //
CREATE PROCEDURE BatchInsert2(IN init INT, IN loop_time INT)  -- arg 1: starting ID (customizable); arg 2: number of rows to generate
  BEGIN      
      DECLARE Var INT;      
      DECLARE ID INT;      
      SET Var = 0;      
      SET ID = init;      
      WHILE Var < loop_time DO      
          insert into employees2      
          (id, fname, lname, birth, hired, separated, job_code, store_id)       
          values       
          (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);      
          SET ID = ID + 1;      
          SET Var = Var + 1;      
      END WHILE;      
  END;      
//      
delimiter ;  -- switch the delimiter back to semicolon
      
-- run the inserts inside one transaction, otherwise this is very slow
      
begin;      
CALL BatchInsert2(1, 2000000);      
commit;      
      
Query OK, 1 row affected (1 min 7.06 sec)                 

Create an index

create index idx_employees2_1 on employees2(id);                 

Create a stored procedure that runs 2 million point queries.

DROP PROCEDURE IF EXISTS select1;      
      
delimiter //   -- change the statement delimiter to //
CREATE PROCEDURE select1(IN init INT, IN loop_time INT)  -- arg 1: starting ID (customizable); arg 2: number of lookups to run
  BEGIN      
      DECLARE Var INT;      
      DECLARE ID1 INT;      
      DECLARE vid INT;      
      DECLARE vfname VARCHAR(30);      
      DECLARE vlname VARCHAR(30);      
      DECLARE vbirth TIMESTAMP;      
      DECLARE vhired DATE;      
      DECLARE vseparated DATE;      
      DECLARE vjob_code INT;      
      DECLARE vstore_id INT;      
      SET Var = 0;      
      SET ID1 = init;      
      WHILE Var < loop_time DO      
          select t.id,t.fname,t.lname,t.birth,t.hired,t.separated,t.job_code,t.store_id       
          into       
            vid,vfname,vlname,vbirth,vhired,vseparated,vjob_code,vstore_id      
          from employees2 t       
          where t.id=id1;      
          SET ID1 = ID1 + 1;      
          SET Var = Var + 1;      
      END WHILE;      
  END;      
//      
delimiter ;  -- switch the delimiter back to semicolon

Time for 2 million simple key-based lookups.

-- run the lookups inside one transaction
      
begin;      
CALL select1(1, 2000000);      
commit;      
      
      
Query OK, 1 row affected (1 min 10.23 sec)                 

MySQL at 100 million+ rows:

Continue the test up to the 100-million-row scale.

mysql> insert into employees select * from employees;    
Query OK, 27600000 rows affected (4 min 38.62 sec)    
Records: 27600000  Duplicates: 0  Warnings: 0    
    
mysql> insert into employees select * from employees;    
Query OK, 55200000 rows affected (11 min 13.40 sec)    
Records: 55200000  Duplicates: 0  Warnings: 0    
    
mysql> select count(*) from employees;    
+-----------+    
| count(*)  |    
+-----------+    
| 110400000 |    
+-----------+    
1 row in set (28.00 sec)    
    
mysql> select count(distinct id) from employees ;      
+--------------------+    
| count(distinct id) |    
+--------------------+    
|            2000000 |    
+--------------------+    
1 row in set (1 min 17.73 sec)    
    
    
mysql> select count(*) from (select id from employees group by id) t;      
+----------+    
| count(*) |    
+----------+    
|  2000000 |    
+----------+    
1 row in set (1 min 24.64 sec)               

Full-table update of 110 million rows

mysql> update employees set lname=lname||'new';  
Query OK, 110400000 rows affected, 65535 warnings (21 min 30.34 sec)  
Rows matched: 110400000  Changed: 110400000  Warnings: 220800000             
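A note on the warning count above: unless sql_mode includes PIPES_AS_CONCAT, MySQL treats || as logical OR rather than string concatenation, which is why this statement reports hundreds of millions of warnings. The concatenating form, the one listed in the summary table at the end, would be:

-- string concatenation spelled for MySQL's default sql_mode
update employees set lname = concat(lname, 'new');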

110M rows many-to-one JOIN 2M rows, with GROUP BY and ORDER BY: no result after more than 3 hours.

select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;                 

Create an index at 110 million rows

mysql> create index idx_employees_1 on employees(id);    
Query OK, 0 rows affected (3 min 49.04 sec)    
Records: 0  Duplicates: 0  Warnings: 0               

Alibaba Cloud RDS PostgreSQL 12 test

CREATE TABLE employees (      
  id INT NOT NULL,      
  fname VARCHAR(30),      
  lname VARCHAR(30),      
  birth TIMESTAMP,      
  hired DATE NOT NULL DEFAULT '1970-01-01',      
  separated DATE NOT NULL DEFAULT '9999-12-31',      
  job_code INT NOT NULL,      
  store_id INT NOT NULL      
);                 

Load 200,000 rows quickly with an SRF (set-returning function)

\timing      
      
insert into employees      
    (id, fname, lname, birth, hired, separated, job_code, store_id)       
select       
    ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID       
from generate_series(1,200000) id;      
      
INSERT 0 200000      
Time: 355.652 ms                 
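generate_series also makes it easy to produce less uniform test data. The following variant is illustrative only (it was not part of the original benchmark); it reuses the same employees columns but fills the surname and store fields with random values, which helps when perfectly regular synthetic rows would distort DISTINCT or GROUP BY results:

-- illustrative variant: random synthetic rows via an SRF
insert into employees
    (id, fname, lname, birth, hired, separated, job_code, store_id)
select
    id,
    'chen' || id::text,
    substr(md5(random()::text), 1, 16),   -- random 16-char surname
    now(),
    now(),
    now(),
    (random() * 10)::int + 1,             -- job_code roughly 1..11
    (random() * 1000)::int                -- store_id roughly 0..1000
from generate_series(1, 200000) as id;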

Alternatively, load 200,000 rows with a loop insert, the same approach as in MySQL

create or replace function BatchInsert(IN init INT, IN loop_time INT)  -- arg 1: starting ID (customizable); arg 2: number of rows to generate
returns void as $$      
DECLARE       
  Var INT := 0;      
begin      
  for id in init..init+loop_time-1 loop      
    insert into employees      
    (id, fname, lname, birth, hired, separated, job_code, store_id)       
    values       
    (ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID);      
  end loop;      
end;      
$$ language plpgsql strict;      
      
      
db1=# select batchinsert(1,200000);      
 batchinsert       
-------------      
       
(1 row)      
Time: 1292.559 ms (00:01.293)                 
db1=> insert into employees select * from employees ;      
INSERT 0 400000      
Time: 322.335 ms      
db1=> insert into employees select * from employees ;      
INSERT 0 800000      
Time: 835.365 ms      
db1=> insert into employees select * from employees ;      
INSERT 0 1600000      
Time: 1622.475 ms (00:01.622)      
db1=> insert into employees select * from employees ;      
INSERT 0 3200000      
Time: 3583.787 ms (00:03.584)      
db1=> insert into employees select * from employees ;      
INSERT 0 6400000      
Time: 7277.764 ms (00:07.278)      
db1=> insert into employees select * from employees ;      
INSERT 0 12800000      
Time: 15639.482 ms (00:15.639)                 
db1=> \dt+ employees       
                      List of relations      
 Schema |   Name    | Type  |  Owner  |  Size   | Description       
--------+-----------+-------+---------+---------+-------------      
 public | employees | table | user123 | 2061 MB |       
(1 row)                 
db1=> select count(*) from employees ;      
  count         
----------      
 25600000      
(1 row)      
      
Time: 604.982 ms                 
db1=> select count(distinct id) from employees ;      
 count        
--------      
 200000      
(1 row)      
      
Time: 7852.604 ms (00:07.853)                 
db1=> select count(*) from (select id from employees group by id) t;      
 count        
--------      
 200000      
(1 row)      
      
Time: 2982.907 ms (00:02.983)                 
insert into employees      
    (id, fname, lname, birth, hired, separated, job_code, store_id)       
select       
    ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID       
from generate_series(1,2000000) id;                 
CREATE TABLE employees1 (      
  id INT NOT NULL,      
  fname VARCHAR(30),      
  lname VARCHAR(30),      
  birth TIMESTAMP,      
  hired DATE NOT NULL DEFAULT '1970-01-01',      
  separated DATE NOT NULL DEFAULT '9999-12-31',      
  job_code INT NOT NULL,      
  store_id INT NOT NULL      
);      
      
      
insert into employees1      
    (id, fname, lname, birth, hired, separated, job_code, store_id)       
select       
    ID, CONCAT('chen', ID), CONCAT('haixiang', ID), Now(), Now(), Now(), 1, ID       
from generate_series(1,2000000) id;      
      
INSERT 0 2000000      
Time: 3037.777 ms (00:03.038)                 
select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;      
     lname      | count       
----------------+-------      
 haixiang1      |   129      
 haixiang10     |   129      
 haixiang100    |   129      
 haixiang1000   |   129      
 haixiang10000  |   129      
 haixiang100000 |   129      
 haixiang100001 |   129      
 haixiang100002 |   129      
 haixiang100003 |   129      
 haixiang100004 |   129      
(10 rows)      
      
Time: 8897.907 ms (00:08.898)                 

Simple (point) query performance:

create index idx_employees1_1 on employees1(id);      
CREATE INDEX      
Time: 1436.346 ms (00:01.436)                 
do language plpgsql $$       
declare      
begin      
  for i in 1..2000000 loop      
    perform * from employees1 where id=i;      
  end loop;      
end;      
$$;      
      
DO      
Time: 9515.728 ms (00:09.516)                 
db1=> select 9515.728/2000000;   -- average time per indexed lookup, in milliseconds (about 4.8 µs)
        ?column?              
------------------------      
 0.00475786400000000000      
(1 row)                 

PG at 100 million+ rows:

db1=> INSERT INTO employees select * from employees;      
INSERT 0 27600000      
Time: 25050.665 ms (00:25.051)      
      
db1=> INSERT INTO employees select * from employees;      
INSERT 0 55200000      
Time: 64726.430 ms (01:04.726)                 
db1=> select count(*) from employees;      
   count         
-----------      
 110400000      
(1 row)      
      
Time: 7286.152 ms (00:07.286)                 
db1=> select count(distinct id) from employees;      
  count        
---------      
 2000000      
(1 row)      
      
Time: 39783.068 ms (00:39.783)                 
db1=> select count(*) from (select id from employees group by id) t;      
  count        
---------      
 2000000      
(1 row)      
      
Time: 14668.305 ms (00:14.668)                 
db1=> select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10;      
     lname      | count       
----------------+-------      
 haixiang1      |   516      
 haixiang10     |   516      
 haixiang100    |   516      
 haixiang1000   |   516      
 haixiang10000  |   516      
 haixiang100000 |   516      
 haixiang100001 |   516      
 haixiang100002 |   516      
 haixiang100003 |   516      
 haixiang100004 |   516      
(10 rows)      
      
Time: 33731.431 ms (00:33.731)                 

Update 110 million rows

db1=> update employees set lname=lname||'new';  
UPDATE 110400000  
Time: 385372.063 ms (06:25.372)               

Create an index:

db1=> create index idx_employees_1 on employees(id);    
CREATE INDEX    
Time: 70450.491 ms (01:10.450)               

MySQL vs PG performance report

8 cores, 32 GB RAM, 1500 GB ESSD cloud disk; MySQL 8.0 vs PG 12

Data volume | Operation / SQL | MySQL time | PG time | PG vs MySQL speedup
200K | {write} stored-procedure loop insert | 7.53 s | 1.29 s | 5.84
200K | {write} SRF insert | not supported | 0.36 s | -
400K | {write} insert into employees select * from employees; | 3.25 s | 0.32 s | 10.16
800K | (same insert ... select) | 6.51 s | 0.84 s | 7.75
1.6M | (same) | 12.93 s | 1.62 s | 7.95
3.2M | (same) | 28.61 s | 3.58 s | 7.99
6.4M | (same) | 56.48 s | 7.28 s | 7.76
12.8M | (same) | 115.30 s | 15.64 s | 7.37
27.6M | (same) | 278.62 s | 25.05 s | 11.12
55.2M | (same) | 673.40 s | 64.73 s | 10.40
2M | {point query} 2 million key lookups [1] | 70.23 s | 9.52 s | 7.38
25.6M | {complex query} select count(*) from employees; | 6.15 s | 0.60 s | 10.25
25.6M | {complex query} select count(distinct id) from employees; | 16.67 s | 7.85 s | 2.12
25.6M | {complex query} select count(*) from (select id from employees group by id) t; | 15.52 s | 2.98 s | 5.21
110M | {complex query} select count(*) from employees; | 28.00 s | 7.29 s | 3.84
110M | {complex query} select count(distinct id) from employees; | 77.73 s | 39.78 s | 1.95
110M | {complex query} select count(*) from (select id from employees group by id) t; | 84.64 s | 14.67 s | 5.77
27.6M join 2M | {JOIN + aggregation} select t1.lname,count(*) from employees t1 join employees1 t2 using (id) group by t1.lname order by count(*) desc,lname limit 10; | no result after 3+ hours | 8.90 s | at least 1213.48
110M join 2M | (same join) | no result after 3+ hours | 33.73 s | at least 320.19
110M | {update} update employees set lname=concat(lname,'new'); | 1290.34 s | 385.37 s | 3.35
110M | {create index} create index idx_employees_1 on employees(id); | 229.04 s | 70.45 s | 3.25

[1] PG uses a process-per-connection model; in real applications use a connection pool and keep total connections under roughly 1000 for best results. A built-in connection pool is planned, after which tens of thousands of connections will not be a problem.

Based on the tests above, Alibaba Cloud RDS PG outperforms MySQL by roughly an order of magnitude overall in most scenarios; using PG and MySQL together can substantially reduce enterprise costs. The epidemic is merciless, but PG is not: instead of laying people off, build a more diverse technology stack and strengthen the company's IT capabilities.

For more use cases and how-tos, see the recorded sessions below, including how to sync MySQL data to PG with DTS:

  • 2019.12.30 19:30 RDS PG product overview and how to use it together with MySQL
  • 2019.12.31 19:30 How to connect to PG; GUI (pgAdmin, Navicat, DMS) and CLI usage
  • 2020.1.3 19:30 How to benchmark PG and how to generate massive amounts of test data instantly
  • 2020.1.6 19:30 Type, syntax and function mappings between MySQL and PG
  • 2020.1.7 19:30 How to sync MySQL data to PG (DTS)
  • 2020.1.8 19:30 Clever uses of PG foreign tables - mysql_fdw, oss_fdw (read/write MySQL directly; hot/cold data tiering)
  • 2020.1.9 19:30 PG use cases - parallel computing and real-time analytics
  • 2020.1.10 19:30 PG use cases - GIS
  • 2020.1.13 19:30 PG use cases - user profiling and real-time marketing systems
  • 2020.1.14 19:30 PG use cases - multi-dimensional search
  • 2020.1.15 19:30 PG use cases - vector computing and image search
  • 2020.1.16 19:30 PG use cases - full-text search and fuzzy matching
  • 2020.1.17 19:30 Introduction to PG's analytic SQL
  • 2020.1.18 19:30 More PG features: extension syntax, indexes, types, stored procedures and functions; how to join the PG community. A free trial of Alibaba Cloud PG is currently running.
"PG built-in connection pool edition - how it works and test results"

Claim a free Alibaba Cloud RDS PostgreSQL instance and ECS virtual machine.