背景
- MySQL5.7.21
- Centos 7.4
- innodb_force_recovery=0
資訊收集
- MySQL實體備份報錯
190425 11:58:54 >> log scanned up to (174805994673)
190425 11:58:55 >> log scanned up to (174805995101)
524288000 of 524288000 100% in 9s 55.32 MB/s done
190425 11:58:56 >> log scanned up to (174805995156)
190425 11:58:57 >> log scanned up to (174805995156)
190425 11:58:58 >> log scanned up to (174805995165)
190425 11:58:59 >> log scanned up to (174805995485)
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
190425 11:59:00 >> log scanned up to (174805995593)
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Database page corruption detected at page 143665, retrying...
[01] xtrabackup: Error: failed to read page after 10 retries. File ./sbtest/sbtest4.ibd seems to be corrupted.
[01] xtrabackup: Error: xtrabackup_copy_datafile() failed.
[01] xtrabackup: Error: failed to copy datafile.
upload: '<stdin>' -> 's3://mysql-test-87771/1556164635.tar' [part 10, 331MB]
347810779 of 347810779 100% in 6s 54.92 MB/s done
Connection to mysql-test-877710-0.mysql-test-877710-headless.default.svc.clust closed.
- 通過xtrabackup日志可以發現一個重要的報錯資訊:
File ./sbtest/sbtest4.ibd seems to be corrupted.
- 手動到MySQL查詢檢查表sbtest4的,發現MySQL會重新開機
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 0 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> use sbtest
Database changed
mysql> check table sbtest.sbtest4;
ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
- 檢視error.log,發現MySQL異常崩潰,并重新啟動
2019-04-25T15:41:51.843625+08:00 0 [ERROR] InnoDB: Database page corruption on disk or a failed file read of page [page id: space=135, page number=143665]. You may have to recover from a backup.
2019-04-25T15:41:51.843680+08:00 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex
...
InnoDB: End of page dump
2019-04-25T15:49:21.251992+08:00 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3293893214, calculated checksums for field1: crc32 3293893214/4282163675, innodb 271698160, none 3735928559, stored checksum in field2 3013555396, calculated checksums for field2: crc32 3293893214/4282163675, innodb 1359747787, none 3735928559, page LSN 40 340032547, low 4 bytes of LSN at page end 2116856341, page number (if stored to page already) 143665, space id (if created with >= MySQL-4.1.1 and stored already) 135
InnoDB: Page may be an index page where index id is 140
2019-04-25T15:49:21.252039+08:00 0 [Note] InnoDB: Index 140 is `PRIMARY` in table `sbtest`.`sbtest4`
2019-04-25T15:49:21.252048+08:00 0 [Note] InnoDB: It is also possible that your operating system has corrupted its own file cache and rebooting your computer removes the error. If the corrupt page is an index page. You can also try to fix the corruption by dumping, dropping, and reimporting the corrupt table. You can use CHECK TABLE to scan your table for corruption. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.
...
- 通過一系列日志收集,可以發現innodb_force_recovery=0時,index page對MySQL的啟動無影響,但是有SQL查詢到壞頁的時候,MySQL會出現崩潰的情況
原因分析
導緻page損壞的原因
- 硬體問題
- 驅動程式錯誤
- 核心錯誤
- 電源故障
- 罕見的MySQL錯誤
參數解析
- innodb_force_recovery預設為0(沒有強制恢複的正常啟動)。
- 僅在緊急情況下将innodb_force_recovery設定為大于0的值,以便您可以啟動InnoDB并轉儲表。在此之前,請確定您擁有資料庫備份,以備需要重新建立時使用。值為4或更高可能會永久損壞資料檔案。強制InnoDB恢複時,應始終以innodb_force_recovery = 1開始,并且隻在必要時逐漸增加值。
- 如果您能夠以innodb_force_recovery值為3或更低的值轉儲表,那麼您相對安全,隻有損壞的單個頁面上的某些資料會丢失。值為4或更高被認為是危險的,因為資料檔案可能會永久損壞。值6被認為是激烈的,因為資料庫頁面處于過時狀态,這反過來可能會在B樹和其他資料庫結構中引入更多損壞。
- innodb_force_recovery可以設定為6個非零值:1~6,大的數字包含了前面所有小數字的所有功能,具體情況如下:
- 1(SRV_FORCE_IGNORE_CORRUPT):忽略檢查到的corrupt頁。即使檢測到損壞的頁面,也允許伺服器運作。嘗試使SELECT * FROM tbl_name跳過損壞的索引記錄和頁面,這有助于轉儲表。
- 2(SRV_FORCE_NO_BACKGROUND):防止主線程和任何清除線程運作。如果在清除操作期間發生崩潰,則此恢複值會阻止它。
- 3(SRV_FORCE_NO_TRX_UNDO):崩潰恢複後不運作事務復原。
- 4(SRV_FORCE_NO_IBUF_MERGE):阻止插入緩沖區合并操作。如果它們會導緻崩潰,則不會這樣做。不計算表統計資訊。此值可能會永久損壞資料檔案。使用此值後,請準備删除并重新建立所有二級索引。将InnoDB設定為隻讀。
- 5(SRV_FORCE_NO_UNDO_LOG_SCAN):啟動資料庫時不檢視撤消日志(undo log):InnoDB甚至将未完成的事務視為已送出。此值可能會永久損壞資料檔案。将InnoDB設定為隻讀。
- 6(SRV_FORCE_NO_LOG_REDO):不執行與恢複相關的重做日志前滾。此值可能會永久損壞資料檔案。使資料庫頁面處于過時狀态,這反過來可能會在B樹和其他資料庫結構中引入更多損壞。将InnoDB設定為隻讀。
需要注意的是,當設定參數innodb_force_recovery大于0後,可以對表進行select、create、drop操作,但insert、update或者delete這類操作是不允許的。如果innodb_force_recovery設定為4或更高,則将InnoDB置于隻讀模式。
您可以從表中進行SELECT以轉儲它們。如果innodb_force_recovery值為3或更小,您可以DROP或CREATE表。 Innodb_force_recovery值也大于3,支援DROP TABLE,最高可達MySQL 5.7.17。從MySQL 5.7.18開始,不允許在innodb_force_recovery值大于4的情況下使用DROP TABLE。
如果您知道給定的表導緻復原崩潰,則可以删除它。如果遇到由大量導入失敗或ALTER TABLE導緻的失控復原,則可以終止mysqld程序并将innodb_force_recovery設定為3以使資料庫無需復原,然後DROP導緻失控復原的表。
如果表資料中的損壞阻止您轉儲整個表内容,則具有ORDER BY primary_key DESC子句的查詢可能能夠在損壞的部分之後轉儲表的一部分。
如果啟動InnoDB需要高innodb_force_recovery值,則可能存在可能導緻複雜查詢(包含WHERE,ORDER BY或其他子句的查詢)失敗的資料結構損壞。在這種情況下,您可能隻能運作基本的SELECT * FROM t查詢。
- 設定innodb_force_recovery=1時,對表操作
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 1 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> check table sbtest4;
+----------------+-------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------+-------+----------+----------+
| sbtest.sbtest4 | check | status | OK |
+----------------+-------+----------+----------+
1 row in set (1 min 4.94 sec)
mysql> select count(*) from sbtest4;
+----------+
| count(*) |
+----------+
| 10997913 |
+----------+
1 row in set (3.58 sec)
mysql> drop table sbtest4_bak;
Query OK, 0 rows affected (0.21 sec)
mysql,> create table sbtest4_bak like sbtest4;
Query OK, 0 rows affected (0.45 sec)
mysql> insert into sbtest4_bak select * from sbtest4;
ERROR 1881 (HY000): Operation not allowed when innodb_forced_recovery > 0.
mysql> alter table sbtest4_bak engine=myisam MAX_ROWS=10000000;
ERROR 1025 (HY000): Error on rename of './sbtest/sbtest4_bak' to './sbtest/#sql2-1-1daa' (errno: 190 - Operation not allowed when innodb_forced_recovery > 0)
- 在參數innodb_force_recovery為1時,可以對表進行select、create、drop操作,但insert、update或者delete這類操作是不允許的。
問題修複
方案1:針對于myisam引擎表
mysql> repair table sbtest4;
方案2:針對于innodb引擎表,損壞非二級索引
#### 建立一張新表,并修改存儲引擎為myisam
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 0 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> create table sbtest4_bak like sbtest4;
mysql> alter table sbtest4_bak engine = myisam MAX_ROWS=10000000;
備注:
1、為什麼修改為myisam存儲引擎?是因為在innodb_force_recovery>0時,innodb無法執行insert操作。或者在innodb_force_recovery>0直接手動create table xxx ... engine=myisam.
2、在行指針設定較小不夠用的時候,為提高MyISAM表最大容量,可以調整表定義設定MAX_ROWS的值
#### 損壞頁資料導入
mysql> show variables like 'innodb_force_recovery';
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_force_recovery | 1 |
+-----------------------+-------+
1 row in set (0.00 sec)
mysql> insert into sbtest4_bak select * from sbtest4;
#### 删除原表
mysql> drop table sbtest4;
#### 新表重命名
mysql> rename table sbtest4_bak to sbtest4;
注意:在執行insert into xxx select * from xxx;可能會引發問題:
mysql> insert into sbtest4_bak select * from sbtest4;
ERROR 1598 (HY000): Binary logging not possible. Message: Either disk is full or file system is read only while rotating the binlog. Aborting the server.
錯誤日志顯示:
...
2019-04-26T14:04:12.047240+08:00 59 [ERROR] mysqld: Table './sbtest/sbtest4_bak' is marked as crashed and should be repaired
2019-04-26T14:04:12.068042+08:00 59 [Warning] Checking table: './sbtest/sbtest4_bak'
...
避免措施是關閉binlog:
mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)
mysql> insert into sbtest4_bak select * from sbtest4;
Query OK, 10997913 rows affected (2 min 0.96 sec)
Records: 10997913 Duplicates: 0 Warnings: 0
方案3:針對innodb引擎表,損壞二級索引page
mysql> OPTIMIZE TABLE sbtest4;
- 針對二級索引損壞page可以直接選擇OPTIMIZE TABLE xxx;操作直接重新建立二級索引
方案4:實體備份表空間傳輸
mysql> use sbtest;
mysql> CREATE TABLE sbtest4_bak like sbtest4;
mysql> ALTER TABLE sbtest4_bak DISCARD TABLESPACE;
mysql> FLUSH TABLES t FOR EXPORT;
shell> scp /path/to/datadir/test/sbtest4.{ibd,cfg} destination-server:/path/to/datadir/test
shell> chown -R mysql:mysql /path/to/datadir/test
mysql> use test;
mysql> UNLOCK TABLES;
mysql> use test;
mysql> ALTER TABLE t IMPORT TABLESPACE;
- 通過實體備份恢複到最近的時間點資料,通過innodb表空間傳輸技術快速恢複
方案5:實體備份邏輯恢複
shell> mysqldump -uxxx1 -hxxx1 -pxxx1 -B test --tables sbtest4 --single-transaction --master-data=2 --set-gtid-purged=off > /path/sbtest4.sql
shell> mysql -uxxx2 -hxxx2 -pxxx2 < /path/sbtest4.sql
- 通過實體備份恢複到最近的時間點資料,通過mysqldump邏輯恢複
總結
- 資料頁面的主鍵索引(clustered key index)被損壞。這種情況和資料的二級索引(secondary indexes)被損壞相比要糟很多,因為後者可以通過使用OPTIMIZE TABLE指令來修複,但這和更難以恢複的表格目錄(table dictionary)被破壞的情況來說要好一些。需要嘗試另一個方法:insert into tb select * from ta limit X;甚至是dump出去,再load回來。