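Environment preparation:
1. Deploy Prometheus first; see https://blog.51cto.com/u_13760351/5513690
2. To keep the experiment simple, everything runs on a single server (192.168.10.15); in production the components can be deployed separately.
Deployment steps:
1. Edit alertmanager.yml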
vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '[email protected]' # sender address, set your own
  smtp_smarthost: 'smtp.qq.com:465'
  smtp_auth_username: '[email protected]' # your own mailbox
  smtp_auth_password: 'authorization code' # the authorization code obtained earlier, not your mailbox password
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: '[email protected]' # recipient
        send_resolved: true
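Before starting the container, it can help to validate the file. The following is a minimal sketch, assuming Docker is available and that the amtool binary bundled in the quay.io/prometheus/alertmanager image sits at /bin/amtool; the function name check_alertmanager_config is only illustrative, and the default host path is the one mounted in the next step.

```shell
# Validate an Alertmanager config with the amtool binary shipped in the image.
# check_alertmanager_config is a hypothetical helper name, not part of any tool.
check_alertmanager_config() {
  config_path="${1:-/root/prometheus/alertmanager.yml}"
  docker run --rm \
    -v "${config_path}:/etc/alertmanager/alertmanager.yml:ro" \
    --entrypoint /bin/amtool \
    quay.io/prometheus/alertmanager:latest \
    check-config /etc/alertmanager/alertmanager.yml
}
# Usage: check_alertmanager_config /root/prometheus/alertmanager.yml
```

If the YAML is malformed (for example, wrong indentation under receivers), amtool should report the error and exit with a non-zero status.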
2. Deploy the alertmanager service
docker run -d --privileged=true \
--restart=always \
-p 9093:9093 \
-v /root/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
-v /etc/localtime:/etc/localtime:ro \
--name alertmanager \
quay.io/prometheus/alertmanager:latest
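Once the container is started, a quick check that it is actually serving may save debugging time later. This sketch assumes curl is installed and probes Alertmanager's /-/healthy and /-/ready endpoints on the port published above; alertmanager_ok is a made-up helper name.

```shell
# Probe Alertmanager's health and readiness endpoints.
# alertmanager_ok is a hypothetical helper name; the default URL is this article's host.
alertmanager_ok() {
  base_url="${1:-http://192.168.10.15:9093}"
  curl -fsS "${base_url}/-/healthy" >/dev/null &&
    curl -fsS "${base_url}/-/ready" >/dev/null
}
# Usage: alertmanager_ok && echo "alertmanager is up"
```

If the check fails, docker logs alertmanager is usually the first place to look, since a bad alertmanager.yml makes the process exit at startup.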
3. Write the alert rules
vim host_monitor.yml
groups:
  - name: node-up
    rules:
      - alert: node-up
        expr: up == 0
        for: 5s # fire once the target has been down for more than 5 seconds
        labels:
          team: node
        annotations:
          summary: "{{$labels.instance}} Instance has been down for more than 5 seconds"
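The rule file can be checked with promtool before Prometheus loads it. This is a sketch under a few assumptions that are not stated in the article: that the prom/prometheus image is used (it ships promtool at /bin/promtool), and that host_monitor.yml lives at /root/prometheus/host_monitor.yml on the host. check_rules is an illustrative name.

```shell
# Validate a Prometheus rule file with promtool from the prom/prometheus image.
# check_rules, the default host path, and the image tag are assumptions for illustration.
check_rules() {
  rules_path="${1:-/root/prometheus/host_monitor.yml}"
  docker run --rm \
    -v "${rules_path}:/tmp/host_monitor.yml:ro" \
    --entrypoint /bin/promtool \
    prom/prometheus:latest \
    check rules /tmp/host_monitor.yml
}
# Usage: check_rules /path/to/host_monitor.yml
```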
4. Add the alerting configuration to Prometheus
vim prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.10.15:9093 # replace with this host's IP
rule_files:
  - "/etc/prometheus/host_monitor.yml" # path to the rule file
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    file_sd_configs:
      - files:
          - node_targets.yml
  - job_name: 'mysql'
    static_configs:
      - targets: ['192.168.10.15:9104']
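Prometheus has to re-read its configuration before the new rule and the Alertmanager target take effect. The sketch below assumes Prometheus runs in a Docker container named prometheus with its config at /etc/prometheus/prometheus.yml and the rule file mounted at /etc/prometheus/host_monitor.yml; neither the container name nor the config path is given in this article, so adjust them to your setup. It validates the config with promtool and then sends SIGHUP, which Prometheus treats as a reload request.

```shell
# Check the Prometheus config inside the running container, then reload it via SIGHUP.
# reload_prometheus and the default container name "prometheus" are assumptions.
reload_prometheus() {
  container="${1:-prometheus}"
  docker exec "$container" promtool check config /etc/prometheus/prometheus.yml &&
    docker kill --signal=HUP "$container"
}
# Usage: reload_prometheus prometheus
```

Checking first means a broken file is reported by promtool instead of being sent to the running server; docker restart would also pick up the change, at the cost of a short scrape gap.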
Manually stop the node service to test the alert.
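To follow the test from the command line, the alert state can be read from both services' HTTP APIs. This is a rough sketch, assuming curl is installed and that Prometheus listens on port 9090 of the same host; show_alerts is an illustrative name. It queries /api/v1/alerts on Prometheus and /api/v2/alerts on Alertmanager.

```shell
# Print the alerts known to Prometheus, then the alerts received by Alertmanager.
# show_alerts is a hypothetical helper name; the default host is this article's server.
show_alerts() {
  host="${1:-192.168.10.15}"
  curl -fsS "http://${host}:9090/api/v1/alerts"
  echo
  curl -fsS "http://${host}:9093/api/v2/alerts"
  echo
}
# Usage: show_alerts 192.168.10.15
```

With a 15s scrape and evaluation interval, the node-up alert will likely take a little longer than the 5s in the rule to go from pending to firing; after that the email should arrive once group_wait has elapsed, and a resolved notification should follow when the node service is started again, since send_resolved is true.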