Orchestrator: A MySQL Replication Topology Management Tool

https://www.jianshu.com/p/62e95a131028

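Introduction

Orchestrator is an open-source MySQL replication topology management tool written in Go. It supports rearranging MySQL replication topologies, automatic failover when a master fails, and manual master switchover.

Orchestrator stores its metadata in a backend MySQL or SQLite database. It provides a Web interface that displays the topology of each MySQL cluster and the status of each instance, and some configuration of a MySQL instance can be changed through that interface. It also provides a command line and an API, which makes automated operations more flexible.

Compared with MHA, Orchestrator focuses more on managing the replication topology. It can rearrange any MySQL replication topology and builds MySQL high availability on top of that. In addition, Orchestrator itself can be deployed on multiple nodes, using the raft distributed consensus protocol to keep itself highly available.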

Building from source

Source repository: https://github.com/github/orchestrator.git

Latest version at the time of writing: v3.0.11

Build (requires network access):

git clone https://github.com/github/orchestrator.git

cd orchestrator

script/build

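When the build finishes, it produces the executable ./bin/orchestrator

Environment setup

3.1 Configuration file

The orchestrator/conf directory of the source tree contains three configuration file templates that can be used as a reference.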

orchestrator-sample.conf.json

orchestrator-sample-sqlite.conf.json

orchestrator-simple.conf.json

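Here is a simplified orchestrator.conf.json. ListenAddress is the port on which the HTTP service listens.

{

"Debug": true,

"ListenAddress": ":3000"

}

The MySQLTopologyUser setting is the admin account that orchestrator uses on the managed MySQL clusters. This account needs the SUPER, PROCESS, RELOAD, SELECT, REPLICATION SLAVE and REPLICATION CLIENT privileges.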
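The privilege list above can be turned into a GRANT statement. The following is a minimal sketch, not something taken from the orchestrator project: the account name `orchestrator`, the host pattern `%` and the placeholder password are assumptions, and `MySQLTopologyPassword` is the configuration key that accompanies `MySQLTopologyUser`. On MySQL 8 the account has to exist (CREATE USER) before the GRANT is run.

```python
import json

# Privileges the article lists for the MySQLTopologyUser account.
TOPOLOGY_PRIVILEGES = [
    "SUPER", "PROCESS", "RELOAD", "SELECT",
    "REPLICATION SLAVE", "REPLICATION CLIENT",
]


def build_grant_statement(user, host="%"):
    """Return a GRANT statement giving `user` the privileges listed above."""
    privileges = ", ".join(TOPOLOGY_PRIVILEGES)
    return f"GRANT {privileges} ON *.* TO '{user}'@'{host}';"


def topology_credentials(user, password):
    """Config keys for the topology account, to merge into orchestrator.conf.json."""
    return {"MySQLTopologyUser": user, "MySQLTopologyPassword": password}


# Hypothetical account name and password, for illustration only.
print(build_grant_statement("orchestrator"))
print(json.dumps(topology_credentials("orchestrator", "change-me"), indent=2))
```

Keeping the privilege list in one place makes it harder to forget SELECT, which the notes at the end of this article single out as easy to miss.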

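3.2 Backend database

Orchestrator stores its management data in a backend MySQL or SQLite database. Taking MySQL as the example, you first need to set up a MySQL backend database for Orchestrator. The MySQL installation itself is not covered here. Once it is ready, write the MySQL account, password and related settings into the configuration file as follows: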

"MySQLOrchestratorHost": "127.0.0.1",

"MySQLOrchestratorPort": 3306,

"MySQLOrchestratorDatabase": "orchestrator",

"MySQLOrchestratorUser": "root",

"MySQLOrchestratorPassword": "123456",

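If installing MySQL feels like too much trouble and you only want to try Orchestrator quickly, SQLite is recommended. Only the following settings are needed in the configuration file: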

"BackendDB": "sqlite",

"SQLite3DataFile": "/root/orchestrator/orchestrator.sqlite3",

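Running commands

orchestrator runs a specific command through the -c option. Use orchestrator help to see the help for all commands, and orchestrator help relocate to see the help for one command, in this case relocate.

orchestrator provides many commands. The more important and commonly used ones are covered here. For the rest, consult the documentation or the source code.

For example, to run a command: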

./orchestrator --config=./orchestrator.conf.json -c discover -i mysql_host_name
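When these commands are driven from scripts, it can help to build the argument list in one place. The sketch below only assembles the command line from the options used in this article (--config, -c, -i, -d). The function name `orchestrator_argv` and its defaults are made up for illustration, and nothing is executed.

```python
import shlex


def orchestrator_argv(command, instance=None, destination=None,
                      config="./orchestrator.conf.json",
                      binary="./orchestrator"):
    """Build the argument list for one orchestrator CLI invocation."""
    argv = [binary, f"--config={config}", "-c", command]
    if instance:
        argv += ["-i", instance]        # instance the command acts on
    if destination:
        argv += ["-d", destination]     # target instance, e.g. for relocate
    return argv


# Prints: ./orchestrator --config=./orchestrator.conf.json -c discover -i mysql_host_name
print(shlex.join(orchestrator_argv("discover", instance="mysql_host_name")))
```

The returned list can be passed to `subprocess.run(argv, check=True)` on a machine where the binary is installed. Passing a list rather than a shell string avoids quoting problems with host names and reasons.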

4.1 MySQL instance management commands

discover

forget

begin-maintenance

end-maintenance

in-maintenance

begin-downtime

end-downtime

Discovers an instance together with its master and replicas, and writes the collected information into database_instance and related tables in the backend database

orchestrator --config=./orchestrator.conf.json -c discover -i host_name

Removes an instance, that is, deletes its records from the database_instance table

orchestrator --config=./orchestrator.conf.json -c forget -i host_name

Marks an instance as being in maintenance mode by inserting a record into the database_instance_maintenance table

orchestrator -c begin-maintenance -i instance.to.lock.com --duration=3h --reason="load testing; do not disturb"

Marks an instance as having left maintenance mode by updating its record in the database_instance_maintenance table

orchestrator -c end-maintenance -i locked.instance.com

Checks whether an instance is in maintenance mode by querying the database_instance_maintenance table

orchestrator -c in-maintenance -i locked.instance.com

Marks an instance as downtimed by inserting a record into the database_instance_downtime table

orchestrator -c begin-downtime -i instance.to.downtime.com --duration=3h --reason="dba handling; do not do recovery"

Ends the downtime of an instance by deleting its record from the database_instance_downtime table

orchestrator -c end-downtime -i downtimed.instance.com

4.2 MySQL instance information query commands

find

clusters

clusters-alias

all-clusters-masters

topology

topology-tabulated

all-instances

which-instance

which-cluster

which-cluster-domain

which-heuristic-domain-instance

which-cluster-master

which-cluster-instances

which-cluster-osc-replicas

which-cluster-gh-ost-replicas

which-master

which-downtimed-instances

which-replicas

which-lost-in-recovery

instance-status

get-cluster-heuristic-lag

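Searches instance names by regular expression

orchestrator -c find -pattern "backup.*us-east"

Searches instance names by keyword match

orchestrator -c search -pattern "search string"

Lists the names of all MySQL clusters, obtained by querying database_instance and related tables

orchestrator -c clusters

Lists the names and aliases of all MySQL clusters

orchestrator -c clusters-alias

Lists the writable masters of all MySQL clusters

orchestrator -c all-clusters-masters

Prints the topology of the cluster that the instance belongs to

orchestrator -c topology -i instance.belonging.to.a.topology.com

Prints the topology of the cluster that the instance belongs to, like the topology command but in a slightly different format

orchestrator -c topology-tabulated -i instance.belonging.to.a.topology.com

Lists all known instances

orchestrator -c all-instances

Prints the full information of an instance

orchestrator -c which-instance -i instance.to.check.com

Prints the name of the cluster that the MySQL instance belongs to

orchestrator -c which-cluster -i instance.to.check.com

Prints the domain name of the cluster that the MySQL instance belongs to

orchestrator -c which-cluster-domain -i instance.to.check.com

Given a cluster domain name, prints the writable instance associated with it

orchestrator -c which-heuristic-domain-instance -alias some_alias

Prints the master of the cluster that the instance belongs to

orchestrator -c which-cluster-master -i instance.to.check.com

Prints all instances of the cluster that the instance belongs to

orchestrator -c which-cluster-instances -i instance.to.check.com

Prints the master of the cluster that the instance belongs to, similar to which-cluster-master

orchestrator -c which-master -i a.known.replica.com

Lists the instances that are currently downtimed

orchestrator -c which-downtimed-instances

Prints the replicas of an instance

orchestrator -c which-replicas -i a.known.instance.com

Lists downtimed instances that were lost during a failure recovery

orchestrator -c which-lost-in-recovery

Prints the status of an instance

orchestrator -c instance-status -i instance.to.investigate.com

Prints the maximum replication lag of the cluster that the instance belongs to

orchestrator -c get-cluster-heuristic-lag -i instance.that.is.part.of.cluster.com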

4.3 Failure recovery commands

recover

recover-lite

force-master-failover

force-master-takeover

graceful-master-takeover

replication-analysis

ack-all-recoveries

ack-cluster-recoveries

ack-instance-recoveries

relocate

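Master failover. It only takes effect when the master is already down. The -i argument must be the dead master. The new master does not need to be specified, because orchestrator chooses it.

orchestrator -c recover -i dead.instance.com --debug

Master failover, similar to recover but with some steps simplified, so it is more lightweight.

orchestrator -c recover-lite -i dead.instance.com --debug

Forces a failover whether or not the master is healthy. The old master is not shut down after the switch. The new master does not need to be specified, because orchestrator chooses it. This operation is dangerous and should be used with caution.

orchestrator -c force-master-failover

Forces a master switchover whether or not the master is healthy. -i specifies any instance in the cluster and -d specifies the new master. Note that after the switch the old master does not replicate from the new master. That has to be set up manually.

orchestrator -c force-master-takeover -i instance.in.relevant.cluster.com -d immediate.child.of.master.com

Master switchover. The old master is repointed to the new master, but its replication threads are left stopped. Run start slave manually to resume replication.

orchestrator -c graceful-master-takeover -i instance.in.relevant.cluster.com -d immediate.child.of.master.com

Analyzes potential failure events based on the known topology. The output format is not stable and may change in the future, so relying on this command is not recommended.

orchestrator -c replication-analysis

Acknowledges existing recoveries, so that they do not block failover when another failure happens later

orchestrator -c ack-all-recoveries --reason="dba has taken necessary steps"

orchestrator -c ack-cluster-recoveries -i instance.in.a.cluster.com --reason="reason message"

orchestrator -c ack-instance-recoveries -i instance.that.failed.com --reason="reason message"

Rearranges the topology: the instance given by -i becomes a replica of the instance given by -d.

orchestrator -c relocate -i replica.to.relocate.com -d instance.that.becomes.its.master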

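Automatic failover

Orchestrator can be configured to detect a master failure automatically and carry out the failover.

Start the backend Web service in http mode

orchestrator --config=./orchestrator.conf.json --debug http

Once it has started successfully, the Web page can be opened in a browser: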

http://192.168.56.110:3000

Parameter configuration

"RecoverMasterClusterFilters": ["*"],

"RecoverIntermediateMasterClusterFilters": ["*"],

"FailureDetectionPeriodBlockMinutes": 60,

"RecoveryPeriodBlockSeconds": 3600

RecoverMasterClusterFilters and RecoverIntermediateMasterClusterFilters must be set to ["*"]. Otherwise automatic failover is never triggered.

FailureDetectionPeriodBlockMinutes and RecoveryPeriodBlockSeconds both default to one hour. In other words, if a failover has taken place and the master fails again within one hour, the new failure is not detected and no failover is triggered.
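The effect of the block period can be shown with a few lines of arithmetic. This is only an illustration of the behaviour described above, not orchestrator's actual implementation. The function name `failover_blocked` is made up, and timestamps are plain seconds.

```python
def failover_blocked(last_recovery_at, failure_at, block_seconds=3600):
    """Return True if a new failure falls inside the block window.

    last_recovery_at: time of the previous recovery in seconds, or None
    failure_at:       time of the new failure in seconds
    block_seconds:    plays the role of RecoveryPeriodBlockSeconds
    """
    if last_recovery_at is None:
        return False  # no earlier recovery, nothing blocks the failover
    elapsed = failure_at - last_recovery_at
    return 0 <= elapsed < block_seconds


# A failure 30 minutes after a recovery is blocked, one after 2 hours is not.
print(failover_blocked(0, 30 * 60))    # True
print(failover_blocked(0, 2 * 3600))   # False
```

In a test environment where failovers are triggered repeatedly, a shorter block period or acknowledging the previous recovery with the ack-* commands from section 4.3 avoids waiting out the window.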

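Orchestrator high availability

Orchestrator can be deployed on multiple nodes and uses the raft consensus protocol to keep itself highly available.

For example, Orchestrator nodes are deployed on the following three machines: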

192.168.56.110

192.168.56.111

192.168.56.112

On each node, edit the orchestrator.conf.json configuration file:

"RaftEnabled": true,

"RaftDataDir": "/var/lib/orchestrator",

"RaftBind": "192.168.56.110",

"DefaultRaftPort": 10008,

"RaftNodes": [ "192.168.56.110", "192.168.56.111", "192.168.56.112" ],

Set RaftBind to the IP of the current node, then start the orchestrator service on every node:

./orchestrator --config=./orchestrator.conf.json --debug http
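Since only RaftBind differs between nodes, the raft settings can be generated rather than edited by hand. A minimal sketch, assuming the three IPs, data directory and port from the example above. The function name `raft_settings` is made up for illustration.

```python
import json

# Node list from the example deployment above.
RAFT_NODES = ["192.168.56.110", "192.168.56.111", "192.168.56.112"]


def raft_settings(node_ip, nodes=RAFT_NODES,
                  data_dir="/var/lib/orchestrator", port=10008):
    """Return the raft-related config keys for the node with address node_ip."""
    if node_ip not in nodes:
        raise ValueError(f"{node_ip} is not listed in RaftNodes")
    return {
        "RaftEnabled": True,
        "RaftDataDir": data_dir,
        "RaftBind": node_ip,           # the only per-node value
        "DefaultRaftPort": port,
        "RaftNodes": list(nodes),      # identical on every node
    }


for ip in RAFT_NODES:
    print(ip, json.dumps(raft_settings(ip)))
```

The resulting keys would be merged into each node's orchestrator.conf.json alongside the other settings.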

Open in a browser:

http://192.168.56.110:3000/api/leader-check

Returns "OK". The current leader is 192.168.56.110.

http://192.168.56.110:3000/api/raft-health

Returns "healthy"

http://192.168.56.111:3000/api/leader-check

Returns "Not leader"

http://192.168.56.111:3000/api/raft-health

If the orchestrator service on 192.168.56.110 is stopped, the leader automatically switches to 192.168.56.111 or 192.168.56.112. When 192.168.56.110 is started again and rejoins the cluster, it acts as a follower.
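The leader check can also be scripted. The sketch below polls /api/leader-check on each node and reports the first one that answers "OK", as in the browser test above. It assumes the HTTP port 3000 from the sample configuration. The helper names are made up, and the `check` parameter exists only so the lookup can be tried without a running cluster. The last function reflects the general raft rule that a majority of nodes must stay up, which is why a three-node deployment survives the loss of one node.

```python
import urllib.error
import urllib.request


def http_leader_check(node, port=3000, timeout=2.0):
    """Return the body of /api/leader-check for node, or None if unreachable."""
    url = f"http://{node}:{port}/api/leader-check"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx answers still carry a body such as "Not leader".
        return err.read().decode("utf-8", "replace")
    except OSError:
        return None  # connection refused, timeout, DNS failure, ...


def find_leader(nodes, check=http_leader_check):
    """Return the first node whose leader-check body is "OK", else None."""
    for node in nodes:
        body = check(node)
        if body is not None and body.strip().strip('"') == "OK":
            return node
    return None


def tolerated_failures(node_count):
    """Number of nodes a raft group can lose while keeping a majority."""
    return (node_count - 1) // 2
```

A usage example against canned responses instead of real nodes:

```python
canned = {"192.168.56.110": None,
          "192.168.56.111": '"OK"',
          "192.168.56.112": '"Not leader"'}
print(find_leader(list(canned), check=canned.get))  # 192.168.56.111
print(tolerated_failures(3))                        # 1
```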

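Notes

Parts of the official Orchestrator documentation are inaccurate. For example, the MySQLTopologyUser account should be granted SUPER, PROCESS, RELOAD, SELECT, REPLICATION SLAVE and REPLICATION CLIENT, but the documentation leaves out SELECT. During a switchover orchestrator reads the mysql.slave_master_info table on the replicas to obtain the replication account and password. Without the SELECT privilege that read fails, and no error message is reported.

orchestrator recommends managing MySQL instances by host name rather than by IP. For example, if master_host in change master to is given as an IP, a master switchover or failover may run into problems.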
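A quick way to spot replicas configured against an IP is to test the master_host value with the standard ipaddress module. A small sketch, with the function name `is_ip_literal` and the sample host names chosen for illustration. How the master_host values are collected (for example from SHOW SLAVE STATUS) is left out.

```python
import ipaddress


def is_ip_literal(master_host):
    """Return True if master_host is an IPv4 or IPv6 address, not a host name."""
    try:
        ipaddress.ip_address(master_host)
    except ValueError:
        return False
    return True


for host in ["192.168.56.110", "db-master-01.example.com"]:
    label = "IP, consider a host name" if is_ip_literal(host) else "host name"
    print(f"{host}: {label}")
```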

Finally, a screenshot of the Web page: