
Configuring Prometheus to route alerts to different DingTalk groups

Scenario

Suppose I am monitoring an MQ cluster and have set up alerting with the following two rules:

- alert: "RocketMQ, xxx_consumer message backlog"
  expr: sum by(group, topic) (rocketmq_group_diff{group="xxx_consumer",topic="xxx"}) > 1000
  for: 1m
  labels:
    severity: busi
  annotations:
    description: 'Consumer group xxx_consumer has a message backlog on topic xxx; the backlog exceeds 1000'
    summary: 'RocketMQ, xxx_consumer message backlog'
- alert: "Broker node down"
  expr: count(rocketmq_broker_disk_ratio{cluster="XXXCluster"}) < 4
  for: 0m
  labels:
    severity: warning
  annotations:
    description: 'Fewer than 4 broker nodes are reporting'
    summary: 'Broker node down'

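The two rules above mean the following:

  1. Rule 1: a consumer group that belongs to a business team (a core service) must not build up a message backlog; once the backlog exceeds 1000 messages, that team is alerted.
  2. Rule 2: if a broker node in our MQ cluster goes down, we ourselves need to be alerted promptly.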
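To make the thresholds and the `for` durations concrete, here is a minimal Python sketch of how a condition turns into a firing alert. It is a simplification of what Prometheus does, not its implementation: the `Rule` class, the helper names, the 15-second evaluation interval, and the sample values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Rule:
    name: str
    severity: str
    for_seconds: int  # the rule's `for` duration, in seconds


def backlog_exceeded(diff: float) -> bool:
    # Rule 1 condition: backlog greater than 1000 messages.
    return diff > 1000


def broker_missing(count: int) -> bool:
    # Rule 2 condition: fewer than 4 brokers reporting.
    return count < 4


def firing_time(rule: Rule, evaluations: List[Tuple[int, bool]]) -> Optional[int]:
    """Return the timestamp at which the alert first fires, or None.

    `evaluations` is a time-ordered list of (timestamp_seconds, condition_true).
    The condition must hold without interruption for `for_seconds`.
    """
    active_since = None
    for ts, condition_true in evaluations:
        if not condition_true:
            active_since = None  # condition cleared, pending state is reset
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= rule.for_seconds:
            return ts
    return None


backlog_rule = Rule("RocketMQ, xxx_consumer message backlog", "busi", 60)
broker_rule = Rule("Broker node down", "warning", 0)

# Hypothetical samples, one evaluation every 15 seconds.
backlog_values = [1200, 1500, 900, 1100, 1300, 1400, 1250, 1600]
backlog_evals = [(i * 15, backlog_exceeded(v)) for i, v in enumerate(backlog_values)]
broker_evals = [(0, broker_missing(4)), (15, broker_missing(3))]

backlog_fired_at = firing_time(backlog_rule, backlog_evals)
broker_fired_at = firing_time(broker_rule, broker_evals)
```

In this sketch the dip to 900 at the third sample resets the one-minute window for rule 1, while rule 2, with `for: 0m`, fires at the first evaluation that sees fewer than 4 brokers.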

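In practice it works like this: when rule 1 fires, the alert must promptly reach the DingTalk group of the corresponding business team, and it must also reach our own DingTalk group. When rule 2 fires, only our own group is notified; the business side does not care. In other words, some alerts fan out to several groups and some go to just one.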

Note the severity setting in the rules above; it is what distinguishes the two. Rule 1 uses busi and rule 2 uses warning.
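One way to check the routing without waiting for a real incident is to post a synthetic alert carrying the relevant severity label to Alertmanager. The sketch below builds such a payload for the Alertmanager v2 alerts endpoint. The address `127.0.0.1:9093` and the function names are assumptions for illustration; `send` is defined but not called here.

```python
import json
from urllib import request

# Assumed address: Alertmanager listening locally on its default port.
ALERTMANAGER_URL = "http://127.0.0.1:9093/api/v2/alerts"


def build_test_alert(alertname: str, severity: str, description: str) -> list:
    """Build a one-element alert list in the shape the v2 API accepts."""
    return [
        {
            "labels": {"alertname": alertname, "severity": severity},
            "annotations": {"description": description, "summary": alertname},
        }
    ]


def send(alerts: list, url: str = ALERTMANAGER_URL) -> int:
    """POST the alerts to Alertmanager and return the HTTP status code."""
    req = request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=5) as resp:
        return resp.status


# A busi alert should end up in both groups, a warning alert only in ours.
busi_alert = build_test_alert("TestBacklog", "busi", "synthetic backlog alert")
warning_alert = build_test_alert("TestBrokerDown", "warning", "synthetic broker alert")
```

Calling `send(busi_alert)` against a running Alertmanager should then produce a message in both DingTalk groups, and `send(warning_alert)` a message in only one.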

Example configuration

Alertmanager configuration

global:
  resolve_timeout: 5m
  smtp_from: [email protected]
  smtp_smarthost: smtp.net:port
  smtp_auth_username: [email protected]
  smtp_auth_password: PASS
  smtp_require_tls: false
route:
  receiver: 'email'
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  routes:
  - receiver: 'our'
    group_wait: 10s
    match_re:
       severity: warning
  - receiver: 'other'
    group_wait: 10s
    match_re:
       severity: busi

templates:
  - '*.html'
receivers:
- name: 'email'
  email_configs:
  - to: '[email protected]'
    send_resolved: false
    html: '{{ template "default-monitor.html" . }}'
    headers: { Subject: "[WARN] Alert email" } # email subject
- name: 'our'
  webhook_configs:
  - url: http://127.0.0.1:8060/dingtalk/our/send
- name: 'other'
  webhook_configs:
  - url: http://127.0.0.1:8060/dingtalk/our/send
  - url: http://127.0.0.1:8060/dingtalk/other/send           

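global: sets the default SMTP settings used for email notifications. The top-level route names email as its receiver, so any alert that matches none of the child routes falls back to an email notification.

route: besides the default email receiver, routes defines two specific receivers. The one named "our" matches alerts with severity warning; the one named "other" matches alerts with severity busi. Both values come from the rules at the top of this post. They are not reserved keywords, just labels I made up.

receivers: the configuration of each receiver named above. email specifies who gets the mail. "our" specifies the dingtalk send URL; note the path segment just before send at the end of the URL, which is "our". "other" lists two URLs that differ only in that segment: one is "our" and the other is "other". This is how a busi alert reaches both groups while a warning alert reaches only ours.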
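The routing described above can be sketched in a few lines of Python. This only models the parts used in this configuration (anchored `match_re` regexes, first matching child route wins, fallback to the parent receiver); the dictionary layout and function names are assumptions for illustration, not Alertmanager's code.

```python
import re

# The route tree from the configuration, reduced to what matters for matching.
ROUTE = {
    "receiver": "email",
    "routes": [
        {"receiver": "our", "match_re": {"severity": "warning"}},
        {"receiver": "other", "match_re": {"severity": "busi"}},
    ],
}

# Webhook URLs per receiver, copied from the receivers section.
RECEIVER_URLS = {
    "email": [],
    "our": ["http://127.0.0.1:8060/dingtalk/our/send"],
    "other": [
        "http://127.0.0.1:8060/dingtalk/our/send",
        "http://127.0.0.1:8060/dingtalk/other/send",
    ],
}


def matches(route: dict, labels: dict) -> bool:
    """True if every match_re pattern fully matches the label value."""
    for name, pattern in route.get("match_re", {}).items():
        # A missing label is treated as an empty string.
        if re.fullmatch(pattern, labels.get(name, "")) is None:
            return False
    return True


def pick_receiver(labels: dict, route: dict = ROUTE) -> str:
    """Walk the tree: the first matching child wins, else use this route."""
    for child in route.get("routes", []):
        if matches(child, labels):
            return pick_receiver(labels, child)
    return route["receiver"]


busi_urls = RECEIVER_URLS[pick_receiver({"severity": "busi"})]
warning_urls = RECEIVER_URLS[pick_receiver({"severity": "warning"})]
fallback = pick_receiver({"severity": "info"})
```

The fan-out to two groups happens in the receiver, not in the route: a busi alert matches a single route, and that route's receiver simply carries two webhook URLs.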

While I am at it, here is the email template I use (file name: default-monitor.html). The template renders the alerts as a table:

{{ define "default-monitor.html" }}
<table>
    <tr><td>Alert name</td><td>Description</td><td>Start time</td></tr>
    {{ range $i, $alert := .Alerts }}
        <tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ index $alert.Annotations "description" }}</td><td>{{ $alert.StartsAt }}</td></tr>
    {{ end }}
</table>
{{ end }}
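For readers less familiar with Go templates, the loop in the template above behaves roughly like the following Python sketch: one header row, then one row per alert with its alertname label, its description annotation, and its start time. The function name and the sample alert are assumptions for illustration; the real rendering is done by Alertmanager's template engine.

```python
from html import escape


def render_alert_table(alerts: list) -> str:
    """Render alerts as an HTML table, one row per alert."""
    rows = [
        "<table>",
        "    <tr><td>Alert name</td><td>Description</td><td>Start time</td></tr>",
    ]
    for alert in alerts:
        # Escape values so characters such as "<" do not break the markup.
        name = escape(alert["labels"].get("alertname", ""))
        description = escape(alert["annotations"].get("description", ""))
        starts_at = escape(str(alert.get("startsAt", "")))
        rows.append(
            f"    <tr><td>{name}</td><td>{description}</td><td>{starts_at}</td></tr>"
        )
    rows.append("</table>")
    return "\n".join(rows)


# Hypothetical alert data in the shape the template iterates over.
sample_alerts = [
    {
        "labels": {"alertname": "Broker node down", "severity": "warning"},
        "annotations": {"description": "broker count < 4"},
        "startsAt": "2024-01-01T00:00:00Z",
    }
]

table_html = render_alert_table(sample_alerts)
```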
           

prometheus-webhook-dingtalk configuration

## Customizable templates path
templates:
   - /home/user/monitor/alert/prometheus-webhook-dingtalk-1.4.0.linux-amd64/template/template.tmpl

## Targets, previously was known as "profiles"
targets:
  our:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxx
    secret: xxx_secret
  other:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxx_other
    secret: xxx_other_secret           
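The target names here are what tie the two configurations together: the segment before send in each Alertmanager webhook URL selects one of these targets. The sketch below shows that lookup, plus the signing scheme DingTalk documents for custom robots that have a secret (HMAC-SHA256 over the timestamp and the secret, Base64-encoded, then URL-encoded). It assumes prometheus-webhook-dingtalk does the equivalent internally; the function names are made up for illustration and the tokens and secrets are the placeholders from the configuration above.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import quote_plus, urlparse

# Targets copied from the configuration; tokens and secrets are placeholders.
TARGETS = {
    "our": {
        "url": "https://oapi.dingtalk.com/robot/send?access_token=xxxx",
        "secret": "xxx_secret",
    },
    "other": {
        "url": "https://oapi.dingtalk.com/robot/send?access_token=xxx_other",
        "secret": "xxx_other_secret",
    },
}


def target_from_webhook_url(url: str) -> str:
    """Extract <target> from a path shaped like /dingtalk/<target>/send."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "dingtalk" or parts[2] != "send":
        raise ValueError(f"unexpected webhook path: {url}")
    return parts[1]


def signed_robot_url(target: str, timestamp_ms: Optional[int] = None) -> str:
    """Append timestamp and sign parameters to the target's robot URL."""
    cfg = TARGETS[target]
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    string_to_sign = f"{timestamp_ms}\n{cfg['secret']}"
    digest = hmac.new(
        cfg["secret"].encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    ).digest()
    sign = quote_plus(base64.b64encode(digest).decode("ascii"))
    return f"{cfg['url']}&timestamp={timestamp_ms}&sign={sign}"


target = target_from_webhook_url("http://127.0.0.1:8060/dingtalk/other/send")
example_url = signed_robot_url(target, timestamp_ms=1700000000000)
```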
