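Environment preparation:
1. Deploy Prometheus first; see https://blog.51cto.com/u_13760351/5513690
2. To keep the experiment simple, everything runs on a single server (192.168.10.15); in production the components can be deployed on separate hosts.
Deployment steps: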
1. Edit alertmanager.yml
vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '[email protected]' # sender address, set to your own
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '[email protected]' # your own mailbox
  smtp_auth_password: 'authorization code' # the authorization code obtained earlier, not your mailbox password
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]' # recipient
        send_resolved: true
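Because YAML is indentation-sensitive, it can be worth validating the file before starting the container. The following is a minimal sketch, assuming Docker is available and that the alertmanager image provides the amtool binary on its PATH; the function name check_am_config is hypothetical and not part of the original setup.

```shell
# Hypothetical helper: validate an Alertmanager config file with amtool,
# run from the same image that is used for the deployment.
# The path must be absolute because it is used as a bind-mount source.
check_am_config() {
  cfg="$1"
  # Stop early if the file is missing, so Docker does not create
  # an empty directory at the mount source instead.
  if [ ! -f "$cfg" ]; then
    echo "config file not found: $cfg" >&2
    return 1
  fi
  docker run --rm \
    -v "$cfg":/etc/alertmanager/alertmanager.yml:ro \
    --entrypoint amtool \
    quay.io/prometheus/alertmanager:latest \
    check-config /etc/alertmanager/alertmanager.yml
}
```

For example, check_am_config /root/prometheus/alertmanager.yml should report the receivers it found, or print the parse error if the indentation is off.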
2. Deploy the alertmanager service
docker run -d --privileged=true \
  --restart=always \
  -p 9093:9093 \
  -v /root/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  -v /etc/localtime:/etc/localtime:ro \
  --name alertmanager \
  quay.io/prometheus/alertmanager:latest
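Once the container is running, a quick probe can confirm that Alertmanager is reachable on the published port. This is only a sketch, assuming curl is installed on the host; the helper names am_url and am_healthy are hypothetical, while /-/healthy is Alertmanager's built-in health endpoint.

```shell
# Hypothetical helper: build the URL of an Alertmanager management endpoint.
# $1 = host or IP, $2 = path (for example /-/healthy)
am_url() {
  echo "http://$1:9093$2"
}

# Hypothetical helper: succeeds only if the health endpoint answers
# with a 2xx status within 5 seconds.
am_healthy() {
  curl -fsS --max-time 5 "$(am_url "$1" /-/healthy)" >/dev/null
}
```

Running am_healthy 192.168.10.15 && echo "alertmanager is up" is one way to use it; if the probe fails, docker logs alertmanager usually shows whether the configuration was rejected.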
3. Write the alerting rule
vim host_monitor.yml
groups:
  - name: node-up
    rules:
      - alert: node-up
        expr: up == 0
        for: 5s # the alert fires once the target has been down for more than 5 seconds
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}} Instance has been down for more than 5 seconds"
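Rules are only checked once per evaluation_interval (15s in the prometheus.yml of step 4), so a 5-second `for` clause does not fire after exactly 5 seconds. The sketch below is a simplified model: it assumes the alert becomes pending at the first evaluation where `up == 0` and fires at the first evaluation at which the `for` duration has elapsed. It ignores the extra delay before a failed scrape is recorded, and the function name first_firing_offset is hypothetical.

```shell
# Hypothetical helper: seconds between the first failing evaluation and
# the evaluation at which the alert starts firing.
# $1 = "for" duration in seconds, $2 = evaluation interval in seconds
first_firing_offset() {
  hold_s="$1"
  eval_s="$2"
  # Evaluations happen at offsets 0, eval_s, 2*eval_s, ...
  # Round the hold time up to the next evaluation boundary.
  echo $(( (hold_s + eval_s - 1) / eval_s * eval_s ))
}

first_firing_offset 5 15   # prints 15
```

Under this model the rule above fires about 15 seconds after the target is first seen as down, not 5; lowering the `for` value below the evaluation interval makes no practical difference.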
4. Add the alerting configuration to Prometheus
vim prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.10.15:9093 # replace with the current host's IP
rule_files:
  - "/etc/prometheus/host_monitor.yml" # path to the rule file
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    file_sd_configs:
      - files:
          - node_targets.yml
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.10.15:9104']
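Prometheus has to reload its configuration before the new rule file and the Alertmanager target take effect. The following is a sketch only, assuming Prometheus runs in a Docker container named prometheus with its configuration at /etc/prometheus/prometheus.yml and host_monitor.yml mounted at /etc/prometheus/host_monitor.yml; the referenced deployment may differ, and the function name is hypothetical.

```shell
# Hypothetical helper: validate the Prometheus config (including the rule
# files it references) and restart the container only if the check passes.
# $1 = container name (assumed to be "prometheus" if omitted)
check_and_restart_prometheus() {
  name="${1:-prometheus}"
  docker exec "$name" promtool check config /etc/prometheus/prometheus.yml \
    && docker restart "$name"
}
```

After the restart, the Alerts page of the Prometheus web UI should list the node-up rule.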
Manually stop the node service to test the alert.
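If no mail arrives, it can help to test the Alertmanager-to-mailbox path separately from the Prometheus rule. The sketch below posts a synthetic alert to Alertmanager's v2 API; it assumes curl is installed, and the helper names and the "manual test alert" summary are hypothetical.

```shell
# Hypothetical helper: build a one-element JSON array describing a test alert.
# $1 = alert name (must not contain characters that need JSON escaping)
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","team":"node"},"annotations":{"summary":"manual test alert"}}]' "$1"
}

# Hypothetical helper: post the test alert to Alertmanager.
# $1 = Alertmanager host or IP, $2 = alert name
send_test_alert() {
  curl -fsS --max-time 5 -X POST \
    -H 'Content-Type: application/json' \
    -d "$(test_alert_payload "$2")" \
    "http://$1:9093/api/v2/alerts"
}
```

For example, send_test_alert 192.168.10.15 manual-test should produce an email after the configured group_wait; if it does, the SMTP settings are fine and any remaining problem lies on the Prometheus side.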