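Author: Mo Shan
Senior DBA at an internet company.
Source: original contribution
*Produced by the ActionTech Open Source Community. Original content may not be used without authorization; to republish, please contact the editor and credit the source.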
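I. Introduction
Alerting is closely tied to operations work. A good alerting system not only makes operations more efficient but also improves the comfort and quality of life of the people on call. Conversely, a poor alerting system makes operations miserable: being woken at midnight by trivial alerts, being paged over and over for an alert you are already handling, or having important alerts buried under a flood of unimportant ones that fire at the same moment, and so on.
This article shares some minor annoyances we ran into while using Alertmanager, along with a recent project to rework our alerting system. It is offered purely as an exchange of experience.
II. Background
In production we use a Prometheus + Alertmanager architecture for monitoring and alerting, so this article focuses on the Alertmanager component. The version we run: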
alertmanager, version 0.17.0 (branch: HEAD, revision: c7551cd75c414dc81df027f691e2eb21d4fd85b2)
build user: root@932a86a52b76
build date: 20190503-09:10:07
go version: go1.12.4
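1. Open issues
(1) Alert interference
First, a long-standing issue. In production we run one Prometheus per cluster, and all of them share a single Alertmanager as the alerting channel. Occasionally, alert information from cluster A ends up in an alert for cluster B. For example, we receive something like this: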
cluster: clusterA
instance: clusterB Node
alert_name: xxx
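We never found the cause and could not reproduce it reliably.
(2) Alert escalation
Our current alerting system does not escalate alerts. Ideally, an alert would first go to the on-call engineer and then be escalated by recipient, time of day, notification medium, and so on.
Escalation is explained in more detail later in the article.
(3) Alert recovery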
对äºå·²ç»æ¢å¤çåè¦ï¼Alertmanagerä¸ä¼åéä¸ä»½åè¦æ¢å¤çæ示ã
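(4) Alert suppression
Alertmanager can suppress repeats of the same alert for a configurable interval (repeat_interval), but not very intelligently. For the same alert, for example, we would like the first three notifications to be close together and later ones further apart, say at minutes 1, 3, 5, 10, 20, and 30. Nor can it adapt the interval to the time of day: during working hours the interval could be longer (the same alert sent once every ten minutes), and outside working hours shorter (once every five minutes).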
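To make the desired behavior concrete, here is a minimal Python sketch of the two policies described above: a fixed, escalating resend schedule and a time-of-day gap. It models what we wanted, not anything Alertmanager provides; the names `ESCALATING_SCHEDULE`, `next_send_minute` and `repeat_gap_minutes` are hypothetical, and treating working hours as `8 < hour < 22` is an assumption borrowed from the escalation script later in the article.

```python
from datetime import datetime

# Desired resend offsets from the text: minutes after the alert first fires
ESCALATING_SCHEDULE = [1, 3, 5, 10, 20, 30]


def next_send_minute(sent_count: int):
    """Minute offset of the next notification, or None once the schedule is used up."""
    if sent_count < len(ESCALATING_SCHEDULE):
        return ESCALATING_SCHEDULE[sent_count]
    return None


def repeat_gap_minutes(now: datetime) -> int:
    """Adaptive gap from the text: 10 minutes in working hours, 5 minutes otherwise.

    Working hours are assumed to be 8 < hour < 22 (hypothetical boundary).
    """
    return 10 if 8 < now.hour < 22 else 5
```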
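(5) Alert silencing
Alertmanager supports silencing alerts, but silences have to be configured in the Alertmanager UI. When a machine goes down it can trigger many alerts that all need silencing, so adding and later deleting silence rules is tedious.
Another headache: clicking "Silence" on an alert in the alerts page is supposed to carry that alert's labels over to the silence configuration page, but most of the time this does not work (the page is blank), and we have to fill in the alert details by hand. That is very frustrating.
(6) Voice alerts
Alertmanager does not currently support voice alerts.
This is not treated as a separate issue; it is covered in the alert escalation section.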
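2. New problems
To solve the "alert interference" problem above, we split the single Alertmanager into several (one per cluster). This solved the interference problem but introduced new ones:
- How do we converge (aggregate) alerts?
- How do we manage silences?
(1) Alert convergence
When many alerts fire at the same moment, the people involved receive many messages. This wastes alerting resources and also gets in the way of troubleshooting. For example, with multiple instances on one host, a single machine failure can produce dozens of alerts at once; or a problem in one cluster makes every node fire. Without convergence, these scenarios are painful.
Alertmanager supports convergence natively (alert grouping), but because we had to fix the "alert interference" problem, we split it into one Alertmanager per cluster, and our environment can no longer use the built-in convergence.
(2) Alert silencing
Silences were already hard to manage with a single Alertmanager. With several, one incident may require silences on several Alertmanagers, which makes management even more cumbersome.
The following sections describe our approach to each of these problems in turn. Some solutions may not suit every environment, and other environments may have better options, so treat them as reference only.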
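III. Reworking the alerting system
1. Alert interference
As mentioned above, we split one Alertmanager into n. This looks clumsy, but it works. If you manage all the Alertmanagers from a single "god's-eye view" and treat each one like a database instance, it is actually quite convenient. Components like this need few system resources, so a virtual machine is more than enough. There is another benefit: if one Alertmanager breaks (for example, the process is running but no longer sends alerts), alerting is not lost for every cluster at once.
Monitoring and alerting on our platform are already automated and managed through the platform. Deploying one Alertmanager or several costs about the same, but several instances are harder to manage (explained later).
2. Alert escalation
Alerts are delivered through a few kinds of media: email, WeCom (Enterprise WeChat, or another instant-messaging tool), SMS, and phone/voice. Cost rises from left to right, so to avoid wasting alerting resources we want to use SMS and phone calls as sparingly as possible, and we want the alerting system to adjust adaptively. Escalation has the following dimensions:
- First, escalating the medium: email --> WeCom --> SMS --> phone (once the same alert item has been sent more than 3 times, move up one medium; the threshold can be set as needed).
- Second, escalating the recipient: first-line --> second-line --> leader.
- Third, escalating by time: during working hours alerts go out by email/WeCom; outside working hours by SMS/phone.
This does not seem achievable with Alertmanager itself, so we had to find a workaround. Our approach is a script that reads the alerts from Alertmanager and then sends the notifications itself.
You could also read the alerts directly from Prometheus; the principle is much the same.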
def send_alert(now_time, alert_count, phone_count, send):
    # now_time is the hour of day; send(recipient, medium, detailed=False) stands in for the script's notifier (hypothetical)
    send("DBA", "mail", detailed=True)  # every alert is always emailed in full
    if 8 < now_time < 22:  # working hours: short WeCom message
        send("DBA", "wx")
    # outside working hours: escalate by time, medium and recipient
    elif alert_count > 3 and phone_count < 3:
        send("DBA", "phone")  # escalate SMS to a phone call
    elif alert_count > 3:
        send("leader", "phone")  # phone already tried 3+ times: escalate the recipient
    else:
        send("DBA", "sms")  # escalate the medium to SMS
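Alert recipients include the on-call DBA and the project owner. If a recipient cannot receive alerts (the contact details are invalid, or the person has left the company), the script looks up another member of the same team in the address book and resends the alert to them.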
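The fallback described above might look like the following minimal sketch. The address-book structure, its field names and the `pick_recipient` helper are all hypothetical stand-ins for whatever directory the script actually reads.

```python
# Hypothetical address book: team -> members in fallback order.
# Field names and entries are illustrative only.
ADDRESS_BOOK = {
    "dba": [
        {"name": "alice", "contact": "", "active": False},  # e.g. has left the company
        {"name": "bob", "contact": "bob-phone", "active": True},
        {"name": "carol", "contact": "carol-phone", "active": True},
    ],
}


def pick_recipient(team, preferred):
    """Return the preferred member if reachable, otherwise the first reachable teammate."""
    reachable = [m for m in ADDRESS_BOOK.get(team, []) if m["active"] and m["contact"]]
    for member in reachable:
        if member["name"] == preferred:
            return member
    # Preferred person is unreachable: fall back to another member of the same team
    return reachable[0] if reachable else None
```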
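Escalation is configurable: after repeated alerts the recipient is escalated, and escalation by time and by medium can be defined in the same way.
Every alert is always sent by email. SMS and voice are expensive and awkward to read, so they carry only a short notice; the details are in the email.
3. Alert convergence/suppression
First, we need a table that records sent alerts. Each row holds the alert item (which must be globally unique; ip:port is recommended), the alert state, the cumulative alert count, and the next time the item may alert again. Several features of the system depend on this table.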
CREATE TABLE `tb_tidb_alert_for_task` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT COMMENT 'primary key',
  `alert_task` varchar(100) DEFAULT '' COMMENT 'alert item (globally unique, e.g. ip:port)',
  `alert_state` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'alert state: 0 = recovered, 1 = alerting',
  `alert_count` int(11) NOT NULL DEFAULT '0' COMMENT 'number of alerts sent; capped at 10 per alert item (at most ten per day)',
  `u_time` datetime NOT NULL DEFAULT '2021-12-08 00:00:00' COMMENT 'earliest time the next alert may be sent',
  `alert_remarks` varchar(50) NOT NULL DEFAULT '' COMMENT 'alert description',
  PRIMARY KEY (`id`),
  UNIQUE KEY `uk_alert_task` (`alert_task`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
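From the Alertmanager documentation, the following API returns the current alerts (silenced=false and inhibited=false leave out alerts that are already silenced or inhibited):
/api/v1/alerts?silenced=false&inhibited=false
An alert looks like the example below. The useful attributes, in my view, are alertname, cluster, and instance (its host part); these three can serve as the convergence dimensions.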
{
'labels': {
'alertname': 'TiDB_server_is_down',
'cluster': 'cluster_name_001',
'env': 'cluster_name_001',
'expr': 'probe_success{group="tidb"}==0',
'group': 'tidb',
'instance': '192.168.168.1:4000',
'job': 'tidb_port_probe',
'level': 'emergency',
'monitor': 'prometheus'
}
}
Other fields are omitted; only the labels section is shown.
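Extracting the three convergence dimensions from one alert can be sketched as follows. This assumes the v1 response body has the `{"data": [...]}` shape that the script later indexes into, and reuses the script's convention of keying alerts without an `instance` label as `<cluster>:all`; `alert_dimensions` and `response_dimensions` are illustrative names.

```python
def alert_dimensions(alert):
    """Return (alertname, cluster, host) for one alert from /api/v1/alerts.

    Alerts without an "instance" label are treated as cluster-level and keyed
    as "<cluster>:all", so their host part is the cluster name.
    """
    labels = alert["labels"]
    cluster = labels["cluster"]
    instance = labels.get("instance", cluster + ":all")
    host = instance.partition(":")[0]  # "192.168.168.1:4000" -> "192.168.168.1"
    return labels["alertname"], cluster, host


def response_dimensions(response):
    """Apply alert_dimensions to every alert in a v1 response body."""
    return [alert_dimensions(a) for a in response.get("data", [])]
```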
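(1) Suppression
An alert is suppressed if it is still marked as alerting and either its next scheduled send time is later than the current time or it has already been sent more than 10 times. Why the rule looks like this only becomes clear with the convergence code, so it is explained below.
# Suppression: read every item that is still alerting but not yet due to be re-sent.
# If an alert from Alertmanager is in this list it is suppressed, because it does not
# yet meet the conditions for being sent again.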
select_sql = "select alert_task from tb_tidb_alert_for_task where alert_state = 1 and (u_time > now() or alert_count > 10);"
state, skip_instance = connect_mysql(opt = "select", sql = {"sql" : select_sql})
skip_instance is a list used while iterating over the alerts: any alert whose instance appears in it is skipped, which is what produces the suppression.
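To check the suppression rule without a database, the WHERE clause can be mirrored in plain Python. This is a minimal sketch: each row is a dict standing in for a `tb_tidb_alert_for_task` record, and the helper names are hypothetical.

```python
from datetime import datetime

MAX_ALERT_COUNT = 10  # same cap as "alert_count > 10" in the SQL


def is_suppressed(row, now: datetime) -> bool:
    """In-memory version of the WHERE clause used to build skip_instance."""
    return row["alert_state"] == 1 and (
        row["u_time"] > now or row["alert_count"] > MAX_ALERT_COUNT
    )


def build_skip_instance(rows, now: datetime):
    """Alert items to ignore during this run."""
    return [row["alert_task"] for row in rows if is_suppressed(row, now)]
```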
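(2) Convergence
The convergence approach is roughly as follows: iterate over every alert, and for each one write a record into the tb_tidb_alert_for_task table containing:
- The instance, which is globally unique.
- The alert state, which distinguishes an active alert from a recovered one; it is used when handling recovery.
- The number of alerts sent, which serves as reference data for convergence.
- The next time an alert may be sent, which makes suppression straightforward.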
# Gap in minutes before this item may alert again (max_alert_count is n, the number of alerts sent)
if 8 < now_time < 22:  # working hours: longer gap, 2n+1 minutes
    next_time = max_alert_count * 2 + 1
else:  # outside working hours: shorter gap, n+1 minutes
    next_time = max_alert_count + 1
insert_sql[instance_name] = (
    "replace into tb_tidb_alert_for_task(alert_task, alert_state, alert_count, u_time, alert_remarks) "
    f"select '{instance_name}', 1, {max_alert_count}, "
    f"date_add(now(), INTERVAL {next_time} MINUTE), 'tidb集群告警';"
)
state, _ = connect_mysql(opt = "insert", sql = insert_sql)
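This logic serves the suppression feature, which explains the condition above: an alert is sent again only once its next scheduled send time is earlier than the current time.
During working hours, after an alert is sent, the next one may go out after 2n+1 minutes (n is the number of alerts sent).
Outside working hours, after an alert is sent, the next one may go out after n+1 minutes (n is the number of alerts sent).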
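A quick way to see what these gaps mean in practice is to compute the resulting send times. The sketch below assumes n counts the alert just sent (so the first gap uses n = 1) and that the time of day does not cross the working-hours boundary in between; both are simplifications, and the function names are hypothetical.

```python
def next_gap_minutes(n, hour):
    """Minutes to wait after the n-th alert: 2n+1 in working hours, n+1 otherwise."""
    return 2 * n + 1 if 8 < hour < 22 else n + 1


def send_offsets(count, hour):
    """Minutes (from the first alert) at which the first `count` alerts go out,
    assuming the hour of day does not change in between."""
    offsets, t = [], 0
    for n in range(1, count + 1):
        offsets.append(t)
        t += next_gap_minutes(n, hour)
    return offsets
```

Under these assumptions, `send_offsets(5, 10)` gives `[0, 3, 8, 15, 24]` during working hours and `send_offsets(5, 23)` gives `[0, 2, 5, 9, 14]` at night, so repeats are denser outside working hours, as the problem list asked for.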
The following function iterates over all active alerts in one Alertmanager and groups them for convergence.
import requests

def f_get_alert_to_msg(url):
    # header, skip_instance, global_instance_list and the three global_alert_* dicts
    # are module-level objects defined elsewhere in the script
    try:
        res = requests.get(url, headers=header, timeout=10).json()  # read Alertmanager's current alerts
    except Exception as err:
        return {"code": 1, "info": str(err)}
    for temp in res["data"]:
        labels = temp["labels"]
        cluster_name = labels["cluster"]
        alert_name = labels["alertname"]
        # Special case: no instance label means a cluster-level alert rather than a node alert
        instance_name = labels.get("instance", cluster_name + ":all")
        global_instance_list.append(instance_name)  # used later to detect recovered alerts
        if instance_name in skip_instance:
            continue  # meets the suppression condition: ignore this alert
        host, _, port = instance_name.partition(":")  # split ip:port into host and port
        add_unique(global_alert_cluster, cluster_name, alert_name, instance_name)
        add_unique(global_alert_name, alert_name, cluster_name, instance_name)
        add_unique(global_alert_host, host, alert_name, cluster_name + ":" + port)
    return {"code": 0, "info": "ok"}

def add_unique(bucket, outer, inner, value):
    # Append value to bucket[outer][inner], creating missing levels and skipping duplicates
    values = bucket.setdefault(outer, {}).setdefault(inner, [])
    if value not in values:
        values.append(value)
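The alerts are grouped into three dictionaries: global_alert_cluster (cluster -> alert name -> instances), global_alert_name (alert name -> cluster -> instances), and global_alert_host (host -> alert name -> cluster:port).
The next piece iterates over the Alertmanager URLs and reads the alerts from each one. It also checks whether each Alertmanager responded, which doubles as monitoring of Alertmanager itself: any that cannot be reached are reported by SMS.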
error_list = []
for url in url_list:
    status = f_get_alert_to_msg(url)  # read this Alertmanager's alerts
    if status["code"] == 1:  # request failed: record the unreachable Alertmanager
        error_list.append(url)
if error_list:
    info = "Alertmanager unreachable: " + ",".join(error_list)
    status = f_alert_sms_to_user(tel, info)
Finally, the number of keys in global_alert_name, global_alert_cluster, and global_alert_host decides the convergence dimension: whichever dictionary is smallest becomes the grouping used for the notification.
if len(global_alert_cluster) < len(global_alert_host) and len(global_alert_cluster) < len(global_alert_name):
    alert = global_alert_cluster
    info_tmp = "Alerting cluster: "
elif len(global_alert_name) < len(global_alert_host):
    alert = global_alert_name
    info_tmp = "Alert name: "
else:
    alert = global_alert_host
    info_tmp = "Alerting host: "
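The selection rule, and one way to turn the chosen dictionary into a single notification, can be sketched as below. `pick_dimension` restates the if/elif chain above as a pure function; the message format in `summarize` is a hypothetical example, since the article does not show how `info_tmp` is assembled into the final text.

```python
def pick_dimension(by_cluster, by_name, by_host):
    """Same rule as the if/elif chain: group by whichever dict has the fewest keys."""
    if len(by_cluster) < len(by_host) and len(by_cluster) < len(by_name):
        return "cluster", by_cluster
    if len(by_name) < len(by_host):
        return "alertname", by_name
    return "host", by_host


def summarize(label, groups):
    """Hypothetical message format: one line per group, listing inner keys with counts."""
    lines = []
    for key in sorted(groups):
        parts = [f"{inner}({len(items)})" for inner, items in sorted(groups[key].items())]
        lines.append(f"{label} {key}: " + ", ".join(parts))
    return "\n".join(lines)
```

For example, three alerts on one host spread over two clusters collapse into the single line `host h1: A(2), B(1)`.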
4. Alert recovery
#读ååè¦ç¶ææ¯1, ä¸æ¯å½åæ¶é´è¿æ©çæ¡ç®
sql = """select alert_task from tb_tidb_alert_for_task
where alert_state = 1 and alert_remarks = 'tidbé群åè¦' and u_time < date_add(now(), INTERVAL - 1 MINUTE);"""
state, alert_instance = connect_mysql(opt = "select", sql = {"sql" : sql})
alert_ok = []
for instance in alert_instance :
if instance not in global_instance_list : #global_instance_listæ¯å¨éååè¦ä¿¡æ¯çæ¶åè®°å½äºææinstanceä¿¡æ¯
alert_ok.append(instance)
update_sql[instance] = """update tb_tidb_alert_for_task set alert_state = 0, alert_count = 0
where alert_state = 1 and alert_remarks = 'tidbé群çä¿¡åè¦' and alert_task = '""" + instance + """';"""
state, _ = connect_mysql(opt = "update", sql = update_sql)
If a record the table still marks as alerting is absent from the list of instances currently alerting, the alert has recovered: its state is set back to 0 and its alert count is reset to 0.
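The recovery check is essentially a set difference. The sketch below isolates it; `recovery_message` is a hypothetical helper showing how the `alert_ok` list could feed a recovery notice, which Alertmanager was not sending in our setup.

```python
def find_recovered(alerting_tasks, active_instances):
    """Items the table still marks as alerting that no Alertmanager currently reports."""
    active = set(active_instances)
    return [task for task in alerting_tasks if task not in active]


def recovery_message(recovered):
    """Hypothetical notification text for recovered items (None if nothing recovered)."""
    if not recovered:
        return None
    return "Recovered: " + ", ".join(recovered)
```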
5. Alert silencing
(1) Adding a silence
/api/v1/silences
import datetime as dt
import requests

def f_add_silence(url, user, comment, name, value, expi_time):
    # Hypothetical wrapper name; url is the /api/v1/silences endpoint, expi_time is in hours
    try:
        expi_time = int(expi_time)
    except (TypeError, ValueError) as err:
        return {"code": 1, "info": str(err)}
    if not 0 < expi_time <= 24:
        return {"code": 1, "info": "The silence must last between 1 and 24 hours"}
    # Alertmanager expects UTC timestamps, so compute start and end in UTC directly
    start_time = dt.datetime.now(dt.timezone.utc)
    end_time = start_time + dt.timedelta(hours=expi_time)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    dic = {
        "id": "",
        "createdBy": user,
        "comment": comment,
        "startsAt": start_time.strftime(fmt),
        "endsAt": end_time.strftime(fmt),
        "matchers": [
            {
                "name": name,
                "value": value
            }
        ]
    }
    try:
        res = requests.post(url, json=dic, timeout=10).json()
    except Exception as err:
        res = {"code": 1, "info": str(err)}
    return res
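Note that adding a silence must require the user to supply a duration, for example 2h, capped at 24 hours. This keeps a silence from lasting too long, being forgotten, and leaving the alerts silenced indefinitely.
Also, the user supplies the duration in hours, whereas the API needs a start time and an end time, both in UTC, so the script converts between them.
Finally, the silence added here has a single rule, i.e. one name-value pair, with exact matching and no regular expressions. To require several conditions at once, append more entries to the matchers list; a silence applies only when all of its matchers match: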
"matchers" : [
{
"name" : name1,
"value" : value1
},
{
"name" : name2,
"value" : value2
}
]
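Building the request body for several matchers at once can be sketched as follows. It assumes the v1 silence matcher accepts an `isRegex` flag alongside `name` and `value` (verify against your Alertmanager version), computes the start and end directly in UTC, and enforces the 24-hour cap discussed above; `build_silence` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

MAX_SILENCE_HOURS = 24
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.000Z"


def build_silence(matchers, hours, user, comment, now=None):
    """Build a POST /api/v1/silences body.

    matchers maps label name -> value; every pair must match (exact, not regex).
    """
    hours = int(hours)
    if not 0 < hours <= MAX_SILENCE_HOURS:
        raise ValueError("silence duration must be between 1 and 24 hours")
    start = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "createdBy": user,
        "comment": comment,
        "startsAt": start.strftime(TIME_FORMAT),
        "endsAt": end.strftime(TIME_FORMAT),
        "matchers": [
            {"name": name, "value": value, "isRegex": False}
            for name, value in matchers.items()
        ],
    }
```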
(2) Deleting a silence
This takes two API calls: first list the silences to find the rule's ID, then delete the silence by that ID.
/api/v1/silences
/api/v1/silence/{id}
import requests

def f_delete_silence(name, value):
    # Hypothetical wrapper name; "http://xxx" stands for the Alertmanager address
    url = "http://xxx/api/v1/silences"
    try:
        id_info = requests.get(url, timeout=10).json()
    except Exception as err:
        return {"code": 1, "info": str(err)}
    res = {"code": 1, "info": "no matching active silence"}  # returned when nothing matches
    for item in id_info["data"]:
        if item["status"]["state"] != "active":
            continue  # ignore silences that are not active
        if item["matchers"][0]["name"] != name or item["matchers"][0]["value"] != value:
            continue  # ignore silences that do not match the rule
        try:  # the ID from the listing is needed to delete the rule
            res = requests.delete("http://xxx/api/v1/silence/" + item["id"], timeout=10).json()
        except Exception as err:  # request failed
            res = {"code": 1, "info": str(err)}
    return res
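The deletion loop compares only the first matcher, so a silence that has extra matchers could be deleted by mistake. A stricter selection that compares the whole matcher set can be sketched as follows; `silence_ids_to_delete` is a hypothetical helper, and each returned ID would then go to the delete endpoint.

```python
def silence_ids_to_delete(silences, wanted):
    """IDs of active silences whose matchers are exactly the label pairs in `wanted`."""
    ids = []
    for silence in silences:
        if silence.get("status", {}).get("state") != "active":
            continue  # expired or pending silences are left alone
        got = {m["name"]: m["value"] for m in silence.get("matchers", [])}
        if got == wanted:
            ids.append(silence["id"])
    return ids
```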
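As you can see, managing silences is still fairly cumbersome, so we have integrated it into our platform. Silences can be managed from the platform's alert management list page by ip, instance, cluster, role, or alert_name. The platform also displays the current alerts, so you can silence them one at a time from that page or silence all of them with one click. This solves the problem of silences being hard to manage.
IV. Caveats
- Without a platform to manage it, we do not recommend operating an alerting system this way.
- When adding a silence, we strongly recommend setting an expiry, and not too long a one, so that it is not forgotten.
- Be clear about exactly what you are silencing, so that alerts which should not be silenced are not silenced along the way and then left unhandled.
V. Final words
Everything in this article is for reference only. Environments differ, so this is not a general-purpose solution, and the code here may run into unknown problems. If you need to apply any of this in production, test it thoroughly in a test environment first.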