
pg_rman Backup and Restore Test

I. Environment

1.OS

CentOS Linux release 7.2.1511 (Core) X64

2.PostgreSQL

PostgreSQL 9.6.1

3.pg_rman

pg_rman-1.3.3-pg96.tar.gz v1.3.3

Note: download the source package that matches your PostgreSQL version.

https://github.com/ossc-db/pg_rman/releases/download/v1.3.3/pg_rman-1.3.3-pg96.tar.gz

pg_rman-1.3.3.tar.gz (this source package fails to compile)

System packages

zlib-devel

II. Installing pg_rman

1. Install pg_rman

Log in as root

export PATH=/opt/pgsql/9.6.1/bin:$PATH

export LD_LIBRARY_PATH=/opt/pgsql/9.6.1/lib

export MANPATH=/opt/pgsql/9.6.1/share/man:$MANPATH
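With PATH set as above, it can be worth confirming the build prerequisites before running make. The following is a minimal sketch, assuming the build locates PostgreSQL through pg_config on PATH (as PGXS-based builds typically do). The helper name require_cmd is hypothetical and not part of pg_rman.

```shell
#!/bin/sh
# Sketch: report missing build prerequisites before running make.
# require_cmd is a hypothetical helper, not part of pg_rman.

require_cmd() {
    # Succeeds if the named command is found on PATH.
    command -v "$1" >/dev/null 2>&1
}

missing=""
for cmd in pg_config gcc make; do
    if ! require_cmd "$cmd"; then
        missing="$missing $cmd"
    fi
done

if [ -n "$missing" ]; then
    echo "missing build prerequisites:$missing"
else
    # pg_config --version prints the PostgreSQL version the build will target.
    echo "building against: $(pg_config --version)"
fi
```

If pg_config reports a version other than the intended 9.6.1, PATH is probably picking up a different PostgreSQL installation.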

# tar zxvf pg_rman-9_6_STABLE.tar.gz

# cd pg_rman-9_6_STABLE/

# make 

......

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O2 backup.o catalog.o data.o delete.o dir.o init.o parray.o pg_rman.o restore.o show.o util.o validate.o xlog.o pgsql_src/pg_ctl.o pgut/pgut.o pgut/pgut-port.o -L/opt/pgsql/9.6.1/lib -lpgcommon -lpgport -L/opt/pgsql/9.6.1/lib -lpq -L/opt/pgsql/9.6.1/lib -Wl,--as-needed -Wl,-rpath,'/opt/pgsql/9.6.1/lib',--enable-new-dtags  -lpgcommon -lpgport -lz -lreadline -lrt -lcrypt -ldl -lm -o pg_rman

# make install

/usr/bin/mkdir -p '/opt/pgsql/9.6.1/bin'

/usr/bin/install -c  pg_rman '/opt/pgsql/9.6.1/bin'

2. Verify the installation

su - postgres

$ pg_rman --version

pg_rman 1.3.3

3. Configure database parameters

wal_level = replica

archive_mode = on

archive_command = 'test ! -f /pg_arclog/%f && cp %p /pg_arclog/%f'
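The archive_command above refuses to overwrite an existing archive file: test ! -f fails when the target already exists, so cp never runs and the whole command returns non-zero. The sketch below mimics that logic on scratch files. The function name archive_one and the temporary paths are illustrative only; in a real server PostgreSQL substitutes %p and %f itself.

```shell
#!/bin/sh
# Sketch: mimic 'test ! -f DEST/%f && cp %p DEST/%f' on scratch files.
# archive_one is a hypothetical stand-in for what the server runs per segment.

archive_one() {
    src=$1                    # plays the role of %p (path of the WAL file)
    destdir=$2                # plays the role of /pg_arclog
    name=$(basename "$src")   # plays the role of %f (file name only)
    test ! -f "$destdir/$name" && cp "$src" "$destdir/$name"
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir "$work/wal" "$work/arclog"
echo "segment data" > "$work/wal/000000010000000000000001"

if archive_one "$work/wal/000000010000000000000001" "$work/arclog"; then
    echo "first attempt: archived"
fi

# A second attempt finds the file already archived and returns non-zero.
if ! archive_one "$work/wal/000000010000000000000001" "$work/arclog"; then
    echo "second attempt: refused, file already exists"
fi
```

PostgreSQL treats a non-zero exit status as an archiving failure and retries later, so the WAL segment stays in place until the command succeeds.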

--- root user

mkdir /backup_pg_rman /pg_arclog 

chown -R postgres:postgres /backup_pg_rman

chown -R postgres:postgres /pg_arclog

--- postgres user

# pg_rman init -B $backup_dir 

III. Backup and Restore Tests

1. Back up data (full<0> + incremental<1>)

# full

export PGDATA=/pgdata96

export BACKUP_PATH=/backup_pg_rman

$ echo $PGDATA

/pgdata96

$ echo $BACKUP_PATH

/backup_pg_rman

--- init backup dir: pg_rman init -B $backup_dir -D $PGDATA (specify these by hand when the environment variables are not set; do not end the paths with a trailing '/')

$ pg_rman init

INFO: ARCLOG_PATH is set to '/pg_arclog'

INFO: SRVLOG_PATH is set to '/pgdata96/pg_log'

$  

$ cat $BACKUP_PATH/pg_rman.ini

ARCLOG_PATH='/pg_arclog'

SRVLOG_PATH='/pgdata96/pg_log'
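For scripting around this step, the values recorded in pg_rman.ini can be read back, and a path can be stripped of any trailing '/' before it is passed to -B or -D. A minimal sketch, assuming the simple KEY='value' layout shown above. The helper names strip_trailing_slash and ini_get are hypothetical.

```shell
#!/bin/sh
# Sketch: helpers for scripting around pg_rman.ini (names are hypothetical).

strip_trailing_slash() {
    # Remove trailing '/' characters, but keep a lone "/" intact.
    p=$1
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
        p=${p%/}
    done
    printf '%s\n' "$p"
}

ini_get() {
    # Print the value of KEY from a file of KEY='value' lines.
    key=$1
    file=$2
    line=$(grep "^$key=" "$file" | head -n 1)
    [ -n "$line" ] || return 1
    value=${line#*=}     # drop everything up to the first '='
    value=${value#\'}    # drop a leading single quote, if present
    value=${value%\'}    # drop a trailing single quote, if present
    printf '%s\n' "$value"
}

# Demonstration on a scratch copy of the file shown above.
ini=$(mktemp)
trap 'rm -f "$ini"' EXIT
cat > "$ini" <<'EOF'
ARCLOG_PATH='/pg_arclog'
SRVLOG_PATH='/pgdata96/pg_log'
EOF

echo "archive dir: $(ini_get ARCLOG_PATH "$ini")"
echo "backup dir:  $(strip_trailing_slash /backup_pg_rman/)"
```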

--- full backup

$ pg_rman backup --backup-mode=full --with-serverlog --progress

INFO: copying database files

Processed 1172 of 1172 files, skipped 0

INFO: copying archived WAL files

Processed 3 of 3 files, skipped 0

INFO: copying server log files

Processed 4 of 4 files, skipped 0

INFO: backup complete

INFO: Please execute 'pg_rman validate' to verify the files are correctly copied.

--- validate backup (status before validation: DONE)

$ pg_rman validate

INFO: validate: "2017-03-06 16:43:39" backup, archive log files and server log files by CRC

INFO: backup "2017-03-06 16:43:39" is valid

--- show backup, status: ok 

$ pg_rman show

==========================================================

 StartTime           Mode  Duration    Size   TLI  Status 

2017-03-06 16:43:39  FULL        0m    58MB     1  OK

--- incremental

$ pg_rman backup --backup-mode=incremental --with-serverlog --progress

Processed 1172 of 1172 files, skipped 1115

Processed 48 of 48 files, skipped 3

Processed 4 of 4 files, skipped 3

$ pg_rman validate

INFO: validate: "2017-03-06 17:04:45" backup, archive log files and server log files by CRC

INFO: backup "2017-03-06 17:04:45" is valid

--- show, status: ok

$ pg_rman show detail

============================================================================================================

 StartTime           Mode  Duration    Data  ArcLog  SrvLog   Total  Compressed  CurTLI  ParentTLI  Status  

2017-03-06 17:04:45  INCR        0m   401MB   738MB    27kB  1136MB       false       1          0  OK

2017-03-06 16:43:39  FULL        0m    30MB    33MB   206kB    58MB       false       1          0  OK
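The Status column can also be checked from a script, for example to flag backups that are still DONE (taken but not yet validated). A minimal sketch with awk, assuming the layout shown in these listings, where data rows start with a date and Status is the last field. The function name not_ok_backups is hypothetical.

```shell
#!/bin/sh
# Sketch: list backups whose Status (last column) is not OK.

not_ok_backups() {
    # Reads 'pg_rman show' style output on stdin.
    # Data rows begin with a YYYY-MM-DD date; header and rule lines are skipped.
    awk '$1 ~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ && $NF != "OK" {
        print $1, $2, $NF
    }'
}

# Sample rows in the format printed by 'pg_rman show' in this article.
not_ok_backups <<'EOF'
 StartTime           Mode  Duration    Size   TLI  Status
2017-03-07 16:57:33  FULL        0m   433MB     4  DONE
2017-03-06 16:43:39  FULL        0m    58MB     1  OK
EOF
# prints: 2017-03-07 16:57:33 DONE
```

Against a live catalog the same function would be fed from a pipe, as in: pg_rman show | not_ok_backups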

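2. Simulated disaster recovery

1) Delete all files under the PGDATA directory

Stop the database, then delete the files (note that -m immediate skips the shutdown checkpoint, so this is not a clean stop)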

$ pg_ctl stop -m immediate -D /pgdata96/

$ cd /pgdata96

$ rm -rf ./*
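Because this step is destructive, a guard can reduce the risk of wiping the wrong directory or a cluster that is still running. The sketch below relies on two facts about a data directory: it contains a PG_VERSION file, and a running server keeps a postmaster.pid file in it. The function name safe_to_clear is hypothetical, and a stale postmaster.pid left behind by a hard crash would also trigger the refusal.

```shell
#!/bin/sh
# Sketch: sanity checks before clearing a PostgreSQL data directory.

safe_to_clear() {
    dir=$1
    # PG_VERSION marks an initialized data directory.
    [ -f "$dir/PG_VERSION" ] || { echo "not a data directory: $dir"; return 1; }
    # postmaster.pid is present while a server is using the directory.
    [ ! -f "$dir/postmaster.pid" ] || { echo "server may still be running: $dir"; return 1; }
    return 0
}

# Demonstration on a scratch directory instead of a real cluster.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
echo "9.6" > "$demo/PG_VERSION"
touch "$demo/postmaster.pid"

safe_to_clear "$demo" || echo "refusing to delete"
rm -f "$demo/postmaster.pid"
safe_to_clear "$demo" && echo "ok to delete contents of $demo"
```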

2) Restore the backup

--- postgres user

$ export PGDATA=/pgdata96

$ export BACKUP_PATH=/backup_pg_rman

$ pg_rman restore

WARNING: pg_controldata file "/pgdata96/global/pg_control" does not exist

INFO: the recovery target timeline ID is not given

INFO: use timeline ID of latest full backup as recovery target: 1

INFO: calculating timeline branches to be used to recovery target point

INFO: searching latest full backup which can be used as restore start point

INFO: found the full backup can be used as base in recovery: "2017-03-06 16:43:39"

INFO: copying online WAL files and server log files

INFO: clearing restore destination

INFO: validate: "2017-03-06 16:43:39" backup, archive log files and server log files by SIZE

INFO: restoring database files from the full mode backup "2017-03-06 16:43:39"

INFO: searching incremental backup to be restored

INFO: validate: "2017-03-06 17:04:45" backup, archive log files and server log files by SIZE

INFO: restoring database files from the incremental mode backup "2017-03-06 17:04:45"

INFO: searching backup which contained archived WAL files to be restored

INFO: restoring WAL files from backup "2017-03-06 17:04:45"

INFO: restoring online WAL files and server log files

INFO: generating recovery.conf

INFO: restore complete

HINT: Recovery will start automatically when the PostgreSQL server is started.

3) Start the database and verify the data

# /etc/init.d/postgresql start

Starting PostgreSQL: ok

Switch to the postgres user, then verify the data

Point-in-time recovery

Create test data

testdb=# create table tbl(id int primary key, first varchar(20),second varchar(20));

CREATE TABLE

testdb=# INSERT INTO tbl VALUES(generate_series(1,1000000), 'first'||(random()*(10^3))::integer, 'second'||(random()*(10^3))::integer);

INSERT 0 1000000

testdb=#

Take a full backup

Processed 27 of 27 files, skipped 0

Processed 1 of 1 files, skipped 0

2017-03-07 16:57:33  FULL        0m   433MB     4  DONE

INFO: validate: "2017-03-07 16:57:33" backup, archive log files and server log files by CRC

INFO: backup "2017-03-07 16:57:33" is valid

[postgres@localhost ~]$ pg_rman show

2017-03-07 16:57:33  FULL        0m   433MB     4  OK

$

Drop the table

testdb=# drop table tbl;

DROP TABLE

testdb=# \q

Stop the database

# /etc/init.d/postgresql stop

Restore the database to a specified time

$ pg_rman restore --recovery-target-time '2017-03-07 16:58:33'

INFO: use timeline ID of current database cluster as recovery target: 4

INFO: found the full backup can be used as base in recovery: "2017-03-07 16:57:33"

INFO: validate: "2017-03-07 16:57:33" backup, archive log files and server log files by SIZE

INFO: restoring database files from the full mode backup "2017-03-07 16:57:33"

INFO: restoring WAL files from backup "2017-03-07 16:57:33"
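For the dropped table to come back, the target time has to fall after the base backup and before the unwanted change (here, the DROP TABLE). Timestamps in the fixed-width 'YYYY-MM-DD HH:MM:SS' form sort the same way as plain strings, so a script can sanity-check the value before running the restore. A minimal sketch follows. The function names are hypothetical, the backup's StartTime stands in for its completion time (the reported duration was 0m), and the drop time is an assumed value for illustration because the article does not record it.

```shell
#!/bin/sh
# Sketch: check that a recovery target lies between two timestamps.
# All values must use the fixed-width 'YYYY-MM-DD HH:MM:SS' format,
# which orders correctly under plain string comparison.

ts_before() {
    # Succeeds if timestamp $1 sorts strictly before timestamp $2.
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a < b) }'
}

target_in_window() {
    backup_time=$1
    target=$2
    change_time=$3
    ts_before "$backup_time" "$target" && ts_before "$target" "$change_time"
}

backup_time='2017-03-07 16:57:33'   # StartTime from 'pg_rman show'
target='2017-03-07 16:58:33'        # value passed to --recovery-target-time
drop_time='2017-03-07 17:05:00'     # assumed: when DROP TABLE was issued

if target_in_window "$backup_time" "$target" "$drop_time"; then
    echo "target is inside the recoverable window"
else
    echo "target is outside the window"
fi
```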

Start the database

Verify the data

$ psql testdb

psql (9.6.1)

Type "help" for help.

testdb=# \dt

        List of relations

 Schema | Name | Type  |  Owner   

--------+------+-------+----------

 public | tbl  | table | postgres

(1 row)

testdb=# select count(*) from tbl;

  count  

---------

 1000000

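Recovery after an abnormal shutdown

Description: if the database was stopped before a checkpoint completed successfully, data may be lost during recovery. The troubleshooting steps follow.

Symptom: the database fails to start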

$ more postgresql-Mon.log 

2017-03-06 17:20:47 CST [3240]: [1-1] user=,db= LOG:  database system was interrupted; last known up at 2017-03-06 17:04:51 CST

2017-03-06 17:20:47 CST [3240]: [2-1] user=,db= LOG:  starting archive recovery

2017-03-06 17:20:47 CST [3240]: [3-1] user=,db= LOG:  invalid primary checkpoint record

2017-03-06 17:20:47 CST [3240]: [4-1] user=,db= LOG:  invalid secondary checkpoint record

2017-03-06 17:20:47 CST [3240]: [5-1] user=,db= PANIC:  could not locate a valid checkpoint record

2017-03-06 17:20:47 CST [3238]: [3-1] user=,db= LOG:  startup process (PID 3240) was terminated by signal 6: Aborted

2017-03-06 17:20:47 CST [3238]: [4-1] user=,db= LOG:  aborting startup due to startup process failure

2017-03-06 17:20:47 CST [3238]: [5-1] user=,db= LOG:  database system is shut down

2017-03-06 17:21:23 CST [3269]: [1-1] user=,db= LOG:  database system was interrupted; last known up at 2017-03-06 17:04:51 CST

2017-03-06 17:21:23 CST [3269]: [2-1] user=,db= LOG:  starting archive recovery

2017-03-06 17:21:23 CST [3269]: [3-1] user=,db= LOG:  invalid primary checkpoint record

2017-03-06 17:21:23 CST [3269]: [4-1] user=,db= LOG:  invalid secondary checkpoint record

2017-03-06 17:21:23 CST [3269]: [5-1] user=,db= PANIC:  could not locate a valid checkpoint record

2017-03-06 17:21:23 CST [3267]: [3-1] user=,db= LOG:  startup process (PID 3269) was terminated by signal 6: Aborted

2017-03-06 17:21:23 CST [3267]: [4-1] user=,db= LOG:  aborting startup due to startup process failure

2017-03-06 17:21:23 CST [3267]: [5-1] user=,db= LOG:  database system is shut down
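The decisive line in this log is the PANIC about the checkpoint record; the surrounding LOG lines are consequences of it. A script can test for that condition directly. A minimal sketch, with the message text copied from the log above and a hypothetical function name has_checkpoint_panic:

```shell
#!/bin/sh
# Sketch: detect the "no valid checkpoint" startup failure in a server log.

has_checkpoint_panic() {
    # Succeeds if the log file contains the PANIC seen above.
    # 'PANIC: *' tolerates any number of spaces after the colon.
    grep -q 'PANIC: *could not locate a valid checkpoint record' "$1"
}

# Demonstration on two lines copied from the log above.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
2017-03-06 17:20:47 CST [3240]: [3-1] user=,db= LOG:  invalid primary checkpoint record
2017-03-06 17:20:47 CST [3240]: [5-1] user=,db= PANIC:  could not locate a valid checkpoint record
EOF

if has_checkpoint_panic "$log"; then
    echo "startup failed: no valid checkpoint record"
fi
```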

Resolution steps:

Reset the transaction log

Only the data as of the backup time is retained

$ pg_resetxlog -f /pgdata96

Transaction log reset
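pg_resetxlog discards WAL and can leave the cluster inconsistent, so the PostgreSQL documentation describes it as a last resort. Copying the data directory beforehand keeps the option of trying another route later. The following is a minimal sketch, assuming the server is stopped and enough disk space is available. The helper snapshot_dir and the destination naming are hypothetical.

```shell
#!/bin/sh
# Sketch: copy a stopped cluster's data directory before resetting its WAL.
# snapshot_dir is a hypothetical helper; the demo uses scratch directories.

snapshot_dir() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "no such directory: $src"; return 1; }
    [ ! -e "$dest" ] || { echo "destination already exists: $dest"; return 1; }
    # -R copies recursively, -p preserves modes and timestamps.
    cp -Rp "$src" "$dest"
}

scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir "$scratch/pgdata"
echo "9.6" > "$scratch/pgdata/PG_VERSION"

snapshot_dir "$scratch/pgdata" "$scratch/pgdata.before_reset" \
    && echo "snapshot written to $scratch/pgdata.before_reset"

# On a real system the equivalent call would be along the lines of:
#   snapshot_dir /pgdata96 /some/path/with/space/pgdata96.before_reset
```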

Then start the database and spot-check the data

This article is reposted from pgmia's 51CTO blog. Original link: http://blog.51cto.com/heyiyi/1903709