Monitoring Deployment Guide

Prerequisite: Prometheus and Grafana are already deployed.

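Service deployment script

1. Script overview

This script automatically deploys services to nodes. For now it only supports deploying node_exporter.

2. Script structure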


main.sh

The main program and entry point. Code:

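#!/bin/bash
red='\033[31m'    # red
blue='\033[34m'   # blue
green='\033[32m'  # green
reset='\033[0m'   # reset color
version=1.0
# Menu number -> name of the deployment function to call ([2]=test is a placeholder)
declare -A SERVICES
SERVICES=([1]=node_exporter [2]=test)
# Run from the script's own directory so the relative paths below resolve
cd "$(dirname "$0")" || exit 1
function service_choose(){
        echo -e "------------------------${green}service list:${reset}------------------------"
        # Associative arrays are unordered, so sort the menu numbers
        for key in $(printf '%s\n' "${!SERVICES[@]}" | sort -n)
        do
                echo -e "${green} $key.${SERVICES[$key]}${reset}"
        done
        echo
        read -r -p "Please key in numbers:" choose_number
}
function input_target(){
        read -r -p "Please enter IP address:" ip
}
function node_exporter(){
        # node_exporter deployment script, run against the target as root
        /bin/bash node_exporter/deploy.sh "$ip" root
}
function ssh_copy_id(){
        # Push the local SSH public key to the target for passwordless login
        /bin/bash ssh_id_copy.sh "$ip"
}
echo -e " "
echo -e "${blue}+----------------------------------------------------------------"
echo -e "${green}    Welcome Deployment tools!                    "
echo -e "${blue}+----------------------------------------------------------------"
echo -e "${green}Version: ${red}$version${reset}"
echo " "
service_choose
# Reject input that is not a number from the service list
if [[ ! "$choose_number" =~ ^[0-9]+$ ]] || [[ -z "${SERVICES[$choose_number]}" ]]; then
        echo "Invalid choice: $choose_number"
        exit 1
fi
input_target
ssh_copy_id
# Call the function whose name matches the chosen service
"${SERVICES[$choose_number]}"
echo "Exiting."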

ssh_id_copy.sh

This script copies the SSH key to the target host to enable passwordless login. Code:

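#!/bin/bash
# Usage: ssh_id_copy.sh <target_ip>
DEPLOY_TARGET=$1
# Login user on the target server
DEPLOY_TARGET_USER=root
# Login password on the target server (stored in plain text; change it before use)
DEPLOY_TARGET_PASSWD=123456
[ -n "$DEPLOY_TARGET" ] || { echo "Usage: $0 <target_ip>"; exit 1; }
# expect is used to answer the ssh-copy-id prompts automatically
command -v expect >/dev/null || { echo "expect is not installed"; exit 1; }
if [ ! -f /root/.ssh/id_rsa ];then
        echo "No SSH key found, generating one"
        mkdir -p /root/.ssh && chmod 700 /root/.ssh
        ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa &> /dev/null
fi

# Copy the public key, accepting the host key and typing the password when asked
expect << EOF
set timeout 20
spawn ssh-copy-id -i /root/.ssh/id_rsa.pub $DEPLOY_TARGET_USER@$DEPLOY_TARGET
expect {
    "yes/no" { send "yes\n";exp_continue }
    "password" { send "$DEPLOY_TARGET_PASSWD\n";exp_continue }
}
EOF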
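To confirm that the key was copied, you can try a login that is not allowed to fall back to a password prompt. This is a minimal sketch. The function name `can_ssh_without_password` is hypothetical, and the check relies only on the standard OpenSSH options `BatchMode` and `ConnectTimeout`.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if key-based SSH login to the host works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
# Usage: can_ssh_without_password <ip> [user]   (user defaults to root)
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "${2:-root}@$1" true
}
```

For example, `can_ssh_without_password 10.20.20.162 && echo "passwordless login OK"`.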

node_exporter

This directory holds the resources used to deploy node_exporter.

deploy.sh

Deploys node_exporter.
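The guide does not show deploy.sh itself. The script below is only an illustration of what it might look like, under these assumptions:

- It is called as `deploy.sh <ip> <user>`, as main.sh does.
- A node_exporter release tarball matching `node_exporter-*.linux-amd64.tar.gz` sits in the node_exporter directory.
- The target uses systemd.

The file layout, the unit path and the functions `render_unit` and `deploy` are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch of node_exporter/deploy.sh; the real script is not shown.
# Usage: deploy.sh <target_ip> <user>
# Assumes a node_exporter-*.linux-amd64.tar.gz release tarball next to this
# script, systemd on the target, and passwordless SSH already in place.

# Print a minimal systemd unit that runs node_exporter from the given path
render_unit() {
    cat <<EOF
[Unit]
Description=Prometheus node_exporter
After=network.target

[Service]
ExecStart=$1 --web.listen-address=:9100
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

deploy() {
    local target=$1 user=$2 dir
    local -a tarballs
    dir=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
    tarballs=("$dir"/node_exporter-*.linux-amd64.tar.gz)
    if [[ ! -f "${tarballs[0]}" ]]; then
        echo "no node_exporter tarball found in $dir" >&2
        return 1
    fi
    # Copy the tarball and the unit file to the target
    scp "${tarballs[0]}" "$user@$target:/tmp/node_exporter.tar.gz" || return 1
    render_unit /usr/local/bin/node_exporter |
        ssh "$user@$target" 'cat > /etc/systemd/system/node_exporter.service' || return 1
    # Unpack into a scratch directory, install the binary, start the service
    ssh "$user@$target" 'rm -rf /tmp/node_exporter_pkg &&
        mkdir -p /tmp/node_exporter_pkg &&
        tar -xzf /tmp/node_exporter.tar.gz -C /tmp/node_exporter_pkg --strip-components=1 &&
        install -m 0755 /tmp/node_exporter_pkg/node_exporter /usr/local/bin/node_exporter &&
        systemctl daemon-reload &&
        systemctl enable --now node_exporter'
}

# Deploy only when both arguments are given, so the functions can also be sourced
if [[ $# -ge 2 ]]; then
    deploy "$1" "$2"
fi
```

Running node_exporter under systemd means it restarts on failure and comes back after a reboot. The unit is written from the control machine so that the target only needs `tar`, `install` and `systemctl`.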

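3. Script usage

Run main.sh as root (ssh_id_copy.sh reads and writes keys under /root/.ssh):

bash main.sh

Choose the service to deploy (enter its number).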


Enter the IP address of the host to deploy to.


Wait for the deployment to finish.
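main.sh handles one IP per run. To deploy to many nodes at once, one option is a loop over a host list. This is a minimal sketch under these assumptions:

- It sits in the same directory as ssh_id_copy.sh and the node_exporter directory.
- The host list is a plain-text file with one IP per line.

The wrapper and the names `read_hosts` and `deploy_all` are hypothetical.

```bash
#!/bin/bash
# Hypothetical batch wrapper around ssh_id_copy.sh and node_exporter/deploy.sh.
# Usage: batch_deploy.sh <hosts_file>   (one IP per line, # starts a comment)

# Print the hosts read from stdin, skipping blank lines and comments
read_hosts() {
    local line
    while IFS= read -r line || [[ -n "$line" ]]; do
        line=${line%%#*}              # drop comments
        line=${line//[[:space:]]/}    # drop whitespace
        if [[ -n "$line" ]]; then
            printf '%s\n' "$line"
        fi
    done
}

# Copy the SSH key to each host and deploy node_exporter; report failures
deploy_all() {
    local ip dir failed=0
    dir=$(dirname "${BASH_SOURCE[0]}")
    while IFS= read -r ip; do
        echo "==> $ip"
        # </dev/null keeps ssh/expect from consuming the host list on stdin
        if /bin/bash "$dir/ssh_id_copy.sh" "$ip" </dev/null &&
           /bin/bash "$dir/node_exporter/deploy.sh" "$ip" root </dev/null; then
            echo "    done"
        else
            echo "    failed" >&2
            failed=$((failed + 1))
        fi
    done < <(read_hosts < "$1")
    echo "$failed host(s) failed"
    (( failed == 0 ))
}

if [[ $# -ge 1 ]]; then
    deploy_all "$1"
fi
```

A failed host does not stop the loop. The exit status is non-zero if any host failed.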

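node_exporter monitoring

After deploying node_exporter with the deployment script, open http://localhost:9100 on the node, or http://<node IP>:9100 from another machine. If the page loads, node_exporter is running.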
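The same check can be scripted, which helps after deploying to many nodes. This is a minimal sketch using curl. `metrics_url` and `check_node_exporter` are hypothetical names, and 9100 is node_exporter's default listen port.

```bash
#!/bin/bash
# Build the node_exporter metrics URL for a host (default port 9100)
metrics_url() {
    printf 'http://%s:%s/metrics\n' "$1" "${2:-9100}"
}

# Succeed if the endpoint responds and exposes node_* metrics
check_node_exporter() {
    curl -fsS --max-time 5 "$(metrics_url "$@")" | grep -q '^node_'
}
```

For example: `for ip in 10.20.20.46 10.20.20.162; do check_node_exporter "$ip" && echo "$ip up" || echo "$ip down"; done`.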

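Add the target to Prometheus

Log in to the machine running Prometheus (10.20.20.46 in this example).

Edit the Prometheus configuration file (typically prometheus.yml)

and add the following job under scrape_configs: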

  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['10.20.20.46:9100','10.20.20.162:9100']
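With many nodes, typing the targets list by hand is error-prone. The sketch below prints the `targets` line for a set of IPs, ready to paste under `static_configs`. `build_targets` is a hypothetical helper and assumes every node uses the default port 9100.

```bash
#!/bin/bash
# Print a static_configs targets line for the given hosts (port 9100 assumed)
build_targets() {
    local port=9100 out="" host
    for host in "$@"; do
        out+="${out:+,}'${host}:${port}'"   # comma-separate after the first entry
    done
    printf -- "- targets: [%s]\n" "$out"
}
```

For example, `build_targets 10.20.20.46 10.20.20.162` prints the same targets line as the job above.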

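Set up alerting rules

Rules directory: /home/prometheus/rules. The rule_files section of prometheus.yml must load the files in this directory, using the path as seen inside the Prometheus container. Otherwise the new rules are ignored.

Create a rule file:

vi linux_alert.yaml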

groups:
- name: Node-Alert
  rules:
  - alert: Instance-Down # alert name
    expr: up == 0
    for: 1m # how long the condition must hold before the alert fires
    labels:
      severity: warning
    annotations: # alert message
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}}: job {{$labels.job}} has been down for more than 1 minute."

  - alert: "Memory-Usage-High"
    expr: round(100- node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes*100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: memory usage is high"
      description: "{{ $labels.instance }} current usage: {{ $value }}%"

  - alert: "CPU-Usage-High"
    expr: round(100 - ((avg by (instance,job)(irate(node_cpu_seconds_total{mode="idle",instance!~'bac-.*'}[5m]))) *100)) > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: CPU usage is high"
      description: "{{ $labels.instance }} current usage: {{ $value }}%"

  - alert: "Disk-Usage-High"
    expr: round(100-100*(node_filesystem_avail_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"})) > 80
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: disk usage is high"
      description: "{{ $labels.instance }} mount point {{$labels.mountpoint}} usage: {{ $value }}%"

  - alert: "Partition-Space-Low"
    expr: round(node_filesystem_avail_bytes{fstype=~"ext4|xfs",instance!~"testnode",mountpoint!~"/rootfs.*|/boot.*"}/1024/1024/1024) < 10
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: partition free space is low"
      description: "{{ $labels.instance }} partition '{{$labels.mountpoint}}' has {{ $value }}GB left"

  - alert: "Network-Outbound-Rate-High"
    # outbound traffic is the transmit counter; the receive counter measures inbound traffic
    expr: round(irate(node_network_transmit_bytes_total{instance!~"data.*",device!~'tap.*|veth.*|br.*|docker.*|vir.*|lo.*|vnet.*'}[1m])/1024) > 2048
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }}: outbound network rate is high"
      description: "{{ $labels.instance }} current rate: {{ $value }}KB/s"
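To see what the memory rule's expression evaluates to, you can reproduce the same arithmetic locally. node_exporter derives node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes from the MemAvailable and MemTotal fields of /proc/meminfo, so the ratio can be checked on a Linux host. PromQL's round() rounds ties up, which `int(x + 0.5)` matches for non-negative values. The helpers `used_pct` and `local_mem_used_pct` are hypothetical.

```bash
#!/bin/bash
# Same arithmetic as the memory rule: round(100 - avail/total*100).
# int(x + 0.5) rounds half up like PromQL round() for non-negative x.
used_pct() {
    awk -v avail="$1" -v total="$2" \
        'BEGIN { printf "%d\n", int(100 - avail / total * 100 + 0.5) }'
}

# Apply it to this host's /proc/meminfo (values are in kB; the unit cancels out)
local_mem_used_pct() {
    local total avail
    total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    used_pct "$avail" "$total"
}
```

Take a node with 1.5 GiB available out of 8 GiB. `used_pct 1610612736 8589934592` prints 81, which is above the threshold of 80, so the alert fires once the condition has held for 1m. The disk rule, `100-100*(avail/size)`, has the same shape.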

Restart Prometheus

docker restart prometheus
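A mistake in prometheus.yml or in the rule file can stop Prometheus from starting after the restart. One option is to validate first with promtool, which ships in the official prom/prometheus image. `promtool check config` also checks the rule files listed under rule_files. This sketch assumes the container is named prometheus, as above, and that the config sits at /etc/prometheus/prometheus.yml inside the container, which is the image's default. `CONTAINER`, `CONFIG_IN_CONTAINER` and `check_and_restart` are hypothetical names; adjust the path to your mount.

```bash
#!/bin/bash
# Assumptions: container name and config path inside the container
CONTAINER=${CONTAINER:-prometheus}
CONFIG_IN_CONTAINER=${CONFIG_IN_CONTAINER:-/etc/prometheus/prometheus.yml}

# Validate the config (and the rule files it references), then restart
check_and_restart() {
    if docker exec "$CONTAINER" promtool check config "$CONFIG_IN_CONTAINER"; then
        docker restart "$CONTAINER"
    else
        echo "promtool found errors; Prometheus was not restarted" >&2
        return 1
    fi
}
```

If validation fails, the running Prometheus keeps serving with its previous configuration.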

Verify the deployment

Open the Prometheus web UI and check the targets:


The targets were added successfully.

Then check the alerts in the web UI:


Deployment succeeded.
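Both pages can also be checked from a shell through Prometheus' HTTP API, at `/api/v1/targets` and `/api/v1/alerts`. This is a sketch with caveats:

- It assumes the compact JSON Prometheus returns, with no spaces around colons.
- It uses grep rather than a JSON parser, so `jq` is the sturdier choice where it is installed.
- `count_unhealthy` and `count_firing` are hypothetical names.

```bash
#!/bin/bash
# Count targets whose health is not "up" in a /api/v1/targets response on stdin
count_unhealthy() {
    grep -o '"health":"[a-z]*"' | grep -vc '"health":"up"'
}

# Count alerts in state "firing" in a /api/v1/alerts response on stdin
count_firing() {
    grep -o '"state":"[a-z]*"' | grep -c '"state":"firing"'
}
```

For example, `curl -fsS http://10.20.20.46:9090/api/v1/targets | count_unhealthy` should print 0 when every node is scraped successfully. This assumes Prometheus is reachable on its default port 9090; adjust it if your container maps another port.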
