![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsISPrdEZwZ1Rh5WNXp1bwNjW1ZUba9VZwlHdsATOfd3bkFGazxCMx8VesATMfhHLlN3XnxCMwEzX0xiRGZkRGZ0Xy9GbvNGLpZTY1EmMZVDUSFTU4VFRR9Fd4VGdsYTMfVmepNHLrJXYtJXZ0F2dvwVZnFWbp1zczV2YvJHctM3cv1Ce-cmbw5SN4YzMyE2N0IWO0EWMiVjNxYzX0QjMwADMzEzLcBTMxIDMy8CXn9Gbi9CXzV2Zh1WavwVbvNmLvR3YxUjL1M3Lc9CX6MHc0RHaiojIsJye.png)
最近公司开始接触这两个东西,加上看到了一张告警框架的区域分布图。发现还是挺有意思的,亚洲基本都喜欢搞Zabbix这一套系统,而欧美等国家用Prometheus比较多。之前尝试搞过,没太懂,现在了解了基本怎么搞。比较难的是自己去写语句来搞监控,zabbix会shell即可,这个目前理解都是一些接口查询语句,自定义也能开发,把值传递给接口即可。目前使用下来感觉就个人少量服务器告警还是尝试用一下NETDATA,我这搞了一下,服务器(2GB /1 core)带不动。
概览
- 我都是用 docker 搞的，先说说每个组件都是干啥的吧？
组件 | 作用 | 监控端(需要监控的主机) | 展示端(数据展示) | 补充说明 |
Node Exporter | 收集Host硬件和操作系统信息 | YES | NO | 主机信息 |
cAdvisor | 负责收集Host上运行的容器信息 | docker 信息采集 ||
Prometheus Server | 普罗米修斯监控主服务器 | 收集上面两个组件的数据并存储提供给Grafana来采集,随便安装到哪个机器上都行。 | ||
Grafana | 展示普罗米修斯监控界面 | 把数据可视化出来 | ||
Alertmanager | 告警发送 | 非必须 | 可在Grafana配置,比Grafana好一些 | |
Pushgateway | 自定义指标推送/告警 | 自定义需要 | NO | 自定义 |
- 注意一点就是各个组件的关系、对应端口以及配置(注意容器中的localhost不能访问容器外的信息)。
- 安装
# node-exporter: exposes host hardware/OS metrics on host port 90 (container 9100).
# The host's /proc, /sys and / are bind-mounted read-only, and node-exporter is
# told to read from those mount points via --path.* — without these flags it
# would report the CONTAINER's /proc and /sys, not the host's.
docker run -d -p 90:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  -v "/etc/localtime:/etc/localtime" \
  --name=node-exporter \
  prom/node-exporter \
  --path.procfs=/host/proc \
  --path.sysfs=/host/sys \
  --path.rootfs=/rootfs
# cAdvisor: collects per-container (docker) resource metrics,
# exposed on host port 80 (container 8080).
# NOTE: the original passed both -d and --detach=true (the same flag twice);
# one is enough. Flag style unified to long-form --volume throughout.
docker run -d \
  --name=cadvisor \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/etc/localtime:/etc/localtime \
  --publish=80:8080 \
  google/cadvisor:latest
- prometheus 配置文件
# my global config
global:
  scrape_interval: 15s      # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s  # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # NOTE(review): 'alertmanager' resolves by container name only on a
            # user-defined docker network; otherwise use host-IP:59093 — confirm.
            - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - /etc/prometheus/alert_rules.yml

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      # 监听的地址 (cAdvisor on host port 80, node-exporter on host port 90)
      - targets: ['localhost:80', 'localhost:90']
  - job_name: 'mail-base'
    static_configs:
      - targets: ['xxx.xxx.xxx.xxx:80', 'xxx.xxx.xxx.xxx:90']
  - job_name: 'mail-docker'
    static_configs:
      - targets: ['xxx.xxx.xxx.xxx:80', 'xxx.xxx.xxx.xxx:90']
- 告警配置文件
groups:
  - name: ali
    rules:
      # Alert for any instance that is unreachable for >5 minutes.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: page
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      # Alert for any instance that has a median request latency >1s.
      - alert: APIHighRequestLatency
        expr: api_http_request_latencies_second{quantile="0.5"} > 1
        for: 10m
        annotations:
          summary: "High request latency on {{ $labels.instance }}"
          description: "{{ $labels.instance }} has a median request latency above 1s (current value: {{ $value }}s)"
# Prometheus server on host port 9090; config and alert rules are bind-mounted
# from /etc/prometheus on the host. --web.enable-lifecycle allows reloading the
# config via HTTP POST to /-/reload instead of restarting the container.
docker run --detach \
  --publish 9090:9090 \
  --volume /etc/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  --volume /etc/prometheus/alert_rules.yml:/etc/prometheus/alert_rules.yml \
  --name prometheus \
  prom/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --web.enable-lifecycle
- 建立文件夹并授权(没有授权启动不了)
# Create Grafana's data directory on the host. The grafana/grafana container
# runs as UID 472 ("grafana"), so give that UID ownership instead of the
# world-writable chmod 777 — without correct permissions the container
# fails to start.
mkdir -p /etc/grafana
chown -R 472:472 /etc/grafana
docker run -d \
  -p 3000:3000 \
  --name=grafana \
  -v /etc/grafana:/var/lib/grafana \
  grafana/grafana
- Alertmanager 配置文件
global:
  resolve_timeout: 5m
  # NOTE(review): port 465 is implicit-TLS SMTP (smtps); smtp_require_tls
  # controls STARTTLS, which does not apply on 465 — confirm against the
  # mail provider before changing.
  smtp_smarthost: 'xxxxxx.emperinter.info:465'
  smtp_from: '[email protected]'
  smtp_auth_username: '[email protected]'
  smtp_auth_password: 'xxxxxxxxxxx^'
  smtp_require_tls: false

route:
  receiver: team-test-mails
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 2m

receivers:
  - name: 'team-test-mails'
    email_configs:
      - to: '[email protected]'
        send_resolved: true
# Alertmanager: host port 59093 -> container 9093; config bind-mounted
# from /etc/prometheus/alertmanager.yml on the host.
docker run -d \
  -p 59093:9093 \
  --name Alertmanager \
  -v /etc/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  docker.io/prom/alertmanager:latest