
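Monitoring Deployment Guide: Batch-Deploying node_exporter with a Script

Monitoring deployment guide

Prerequisite: Prometheus and Grafana are already deployed.

Service deployment script

1. Script overview

This script automates deploying services to nodes. At present it only supports deploying the node_exporter service.

2. Script structure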

main.sh

The main program and the entry point of the tool. The code is as follows:

#!/bin/bash
# Entry point of the deployment tool. Run it from the directory that contains
# ssh_id_copy.sh and node_exporter/, because both are referenced by relative path.
red='\033[31m'   # red
blue='\033[34m'  # blue
green='\033[32m' # green
reset='\033[0m'  # reset colors
version=1.0
declare -A SERVICES
# Menu number -> name of the function that deploys the service ("test" is a placeholder).
SERVICES=([1]=node_exporter [2]=test)
function service_choose(){
        echo -e "------------------------${green}service list:${reset}------------------------"
        for key in "${!SERVICES[@]}"
        do
                echo -e "${green} $key.${SERVICES[$key]} ${reset}"
        done
        echo -e "${reset}"
        # The selection is stored in the global variable choose_number.
        read -rp "Please key in numbers:" choose_number
}
function input_target(){
        # The target address is stored in the global variable ip.
        read -rp "Please enter IP address:" ip
}
function node_exporter(){
        # Run the node_exporter deployment script against the target as root.
        /bin/bash node_exporter/deploy.sh "$ip" root
}
function ssh_copy_id(){
        # Push the local SSH public key to the target for passwordless login.
        /bin/bash ssh_id_copy.sh "$ip"
}
echo -e " "
echo -e "${blue}+----------------------------------------------------------------"
echo -e "${green}    Welcome Deployment tools!                    "
echo -e "${blue}+----------------------------------------------------------------"
echo -e "${green}Version: ${red}$version ${reset}"
echo  " "
service_choose
input_target
ssh_copy_id
# Call the function whose name matches the selected service.
${SERVICES[$choose_number]}
echo "Done, exiting."
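
main.sh runs whatever `${SERVICES[$choose_number]}` expands to and hands the typed address straight to the child scripts, so a mistyped number or address only fails later. A minimal sketch of validating both inputs first is shown here. The helper names `valid_choice` and `valid_ipv4` are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original main.sh.
declare -A SERVICES=([1]=node_exporter [2]=test)

# Succeeds only if the argument is an existing key of SERVICES.
valid_choice() {
    [[ -n "$1" ]] || return 1
    [[ -n "${SERVICES[$1]+set}" ]]
}

# Succeeds only for dotted-quad IPv4 addresses with every octet in 0..255.
valid_ipv4() {
    local ip=$1 octet
    local -a octets
    [[ $ip =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
    IFS=. read -ra octets <<< "$ip"
    for octet in "${octets[@]}"; do
        # 10# forces base 10 so that "08" is not parsed as octal.
        (( 10#$octet <= 255 )) || return 1
    done
}
```

In main.sh these could wrap the two prompts, for example `until valid_choice "$choose_number"; do service_choose; done`, so the script asks again instead of continuing with bad input.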
           

ssh_id_copy.sh

This script copies the local SSH public key to the target so that later logins need no password. The code is as follows.

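#!/bin/bash
# Target host, passed as the first argument
DEPLOY_TARGET=$1
# Login user on the target server
DEPLOY_TARGET_USER=root
# Password of that user (example value; replace it with the real one)
DEPLOY_TARGET_PASSWD=123456
# Generate a key pair if none exists yet
if [ ! -f /root/.ssh/id_rsa ];then
        echo "No SSH key found; generating one"
        ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa &> /dev/null
fi

# Drive ssh-copy-id with expect: accept the host key and answer the password prompt.
# The heredoc is unquoted, so the shell variables are expanded before expect runs.
expect << EOF
set timeout 20
spawn ssh-copy-id -i /root/.ssh/id_rsa.pub $DEPLOY_TARGET_USER@$DEPLOY_TARGET
expect {
    "yes/no" { send "yes\n";exp_continue }
    "password" { send "$DEPLOY_TARGET_PASSWD\n";exp_continue }
}
EOF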
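
After the key has been copied, it can be useful to confirm that key-based login really works before the deployment step starts. The following is a minimal sketch, not part of the original script. The function name `check_passwordless` is hypothetical; `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/bash
# Hypothetical helper; not part of the original ssh_id_copy.sh.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a zero exit status means key-based login works.
check_passwordless() {
    local user=$1 host=$2
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${user}@${host}" true
}

# Usage (address taken from the target list in this guide):
#   check_passwordless root 10.20.20.162 && echo "key login OK"
```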
           

node_exporter

This directory holds the resources needed to deploy node_exporter.

deploy.sh

Deploys node_exporter.
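
The contents of deploy.sh are not shown in this guide. Purely as an illustration, a script with the same calling convention (`deploy.sh <ip> <user>`) might look like the sketch below. Everything in it is an assumption: the install path, the systemd unit, the presence of a `node_exporter` binary next to the script, and the function names.

```shell
#!/bin/bash
# Hypothetical sketch of node_exporter/deploy.sh; the original is not shown.
# Assumed layout: the node_exporter binary sits next to this script.
# Usage: deploy.sh <target-ip> <ssh-user>

INSTALL_PATH=/usr/local/bin/node_exporter            # assumed install location
UNIT_PATH=/etc/systemd/system/node_exporter.service  # assumed unit location

# Print a minimal systemd unit that runs node_exporter on its default port 9100.
render_unit() {
    cat << EOF
[Unit]
Description=Prometheus node_exporter
After=network-online.target

[Service]
ExecStart=${INSTALL_PATH}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Copy the binary and the unit to the target, then enable and start the service.
deploy() {
    local ip=$1 user=$2 here
    here=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
    scp "${here}/node_exporter" "${user}@${ip}:${INSTALL_PATH}" || return 1
    render_unit | ssh "${user}@${ip}" "cat > ${UNIT_PATH}" || return 1
    ssh "${user}@${ip}" "systemctl daemon-reload && systemctl enable --now node_exporter"
}

# Deploy only when both arguments are given, so the file can also be sourced.
if [[ $# -eq 2 ]]; then
    deploy "$1" "$2"
fi
```

Running the exporter as a systemd service is one common choice because it restarts the process after a failure or reboot; the original script may well do something different.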

3. Using the script

Run the main.sh script:

bash main.sh
           

Select the service to deploy by entering its number.

Enter the IP address of the node to deploy to.

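Wait for the service deployment to finish.

node_exporter monitoring

After node_exporter has been deployed with the service deployment script, check that localhost:9100 is reachable on the target node. If it responds, the deployment succeeded.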
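
The reachability check can also be done from the command line. A minimal sketch, assuming `curl` is installed; the function name `check_exporter` is hypothetical. node_exporter serves its metrics under `/metrics`, and its metric names start with `node_`.

```shell
#!/bin/bash
# Hypothetical helper; not part of the original scripts.
# Succeeds if the exporter answers on /metrics and exposes at least one node_ metric.
check_exporter() {
    local host=${1:-localhost} port=${2:-9100} metrics
    metrics=$(curl -fsS --max-time 5 "http://${host}:${port}/metrics") || return 1
    grep -q '^node_' <<< "$metrics"
}

# Usage:
#   check_exporter localhost 9100 && echo "node_exporter is up"
```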

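Adding a target to Prometheus

Go to the machine where Prometheus is deployed, 10.20.20.46.

Edit the Prometheus YAML configuration file.

Add the following job under scrape_configs: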

  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['10.20.20.46:9100','10.20.20.162:9100']
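
When many nodes are deployed in one batch, typing the targets list by hand becomes error-prone. A minimal sketch of generating that line from a list of addresses; the function name `render_targets` is hypothetical, and port 9100 is node_exporter's default.

```shell
#!/bin/bash
# Hypothetical helper: build the "targets" line from a list of node addresses.
render_targets() {
    local port=9100 list="" ip
    for ip in "$@"; do
        # Prefix a comma only when the list already has an entry.
        list+="${list:+,}'${ip}:${port}'"
    done
    printf "      - targets: [%s]\n" "$list"
}

render_targets 10.20.20.46 10.20.20.162
# prints:       - targets: ['10.20.20.46:9100','10.20.20.162:9100']
```

The printed line can be pasted under static_configs in place of the hand-written one.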
           

Configuring alert rules

Rules path: /home/prometheus/rules

Create a new rule file:

vi linux_alert.yaml

groups:
- name: Node-Alert
  rules:
  - alert: Instance-Down # alert name
    expr: up == 0
    for: 1m # how long the condition must hold before the alert fires
    labels:
      severity: warning
    annotations: # alert message
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}}: job {{$labels.job}} has been down for more than 1 minute."

  - alert: "High memory usage"
    expr: round(100- node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes*100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} memory usage is too high"
      description: "{{ $labels.instance }} current usage: {{ $value }}%"

  - alert: "High CPU usage"
    expr: round(100 - ((avg by (instance,job)(irate(node_cpu_seconds_total{mode="idle",instance!~'bac-.*'}[5m]))) *100)) > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} CPU usage is too high"
      description: "{{ $labels.instance }} current usage: {{ $value }}%"

  - alert: "High disk usage"
    expr: round(100-100*(node_filesystem_avail_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"})) > 80
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} disk usage is too high"
      description: "{{ $labels.instance }} disk {{$labels.mountpoint}} current usage: {{ $value }}%"

  - alert: "Low free space on partition"
    expr: round(node_filesystem_avail_bytes{fstype=~"ext4|xfs",instance!~"testnode",mountpoint!~"/rootfs.*|/boot.*"}/1024/1024/1024) < 10
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} partition is low on free space"
      description: "{{ $labels.instance }} partition '{{$labels.mountpoint}}' has {{ $value }} GB free"

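  # The expression measures received (inbound) bytes. To alert on outbound
  # traffic instead, use node_network_transmit_bytes_total.
  - alert: "High network inbound rate"
    expr: round(irate(node_network_receive_bytes_total{instance!~"data.*",device!~'tap.*|veth.*|br.*|docker.*|vir.*|lo.*|vnet.*'}[1m])/1024) > 2048
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} network inbound rate is too high"
      description: "{{ $labels.instance }} current rate: {{ $value }} KB/s"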
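
A malformed rule file can keep Prometheus from starting, and a rule file only takes effect if the main configuration lists it under `rule_files`. Checking the file before the restart is therefore worthwhile. A minimal sketch, assuming `promtool` (shipped with Prometheus) is available on the PATH of the machine holding the rule file; the function name `check_rules` is hypothetical.

```shell
#!/bin/bash
# Assumes promtool is on PATH and that the rule file lives in the
# rules directory named in this guide.
check_rules() {
    local file=${1:-/home/prometheus/rules/linux_alert.yaml}
    promtool check rules "$file"
}

# Usage:
#   check_rules && echo "rules OK"
```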
           

Restart Prometheus

docker restart prometheus
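
A full restart is not the only option: Prometheus re-reads its configuration and rule files when it receives SIGHUP. A minimal sketch of sending that signal through Docker, assuming the container is named `prometheus` as in the restart command; the function name `reload_prometheus` is hypothetical.

```shell
#!/bin/bash
# Assumes the container is named "prometheus", as in the restart command.
# Sends SIGHUP so Prometheus reloads its configuration without a restart.
reload_prometheus() {
    docker kill --signal=HUP "${1:-prometheus}"
}

# Usage:
#   reload_prometheus
```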
           

Verifying the deployment

Open the Prometheus web UI and check the targets page.

The targets have been added successfully.
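
The target check can also be scripted against the Prometheus HTTP API, which exposes target state at `/api/v1/targets`. A minimal sketch, assuming Prometheus listens on its default port 9090 on the host named in this guide and that `curl` is installed; the function name `target_health` is hypothetical.

```shell
#!/bin/bash
# Assumes Prometheus listens on its default port 9090 on the host from this guide.
# Prints one health value per scrape target, such as "up" or "down".
target_health() {
    local base=${1:-http://10.20.20.46:9090} body
    body=$(curl -fsS --max-time 5 "${base}/api/v1/targets") || return 1
    # Pull every "health":"..." pair out of the JSON and keep only the value.
    grep -o '"health":"[a-z]*"' <<< "$body" | cut -d'"' -f4
}

# Usage:
#   target_health | sort | uniq -c
```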

Open the web UI and check the alerts page.

The alert rules have been deployed successfully.
