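Monitoring Deployment Guide
Prerequisite: Prometheus and Grafana are already deployed.
Service deployment script
1. Script overview
This script automates deploying services to nodes. For now it only deploys the node_exporter service.
2. Script structure
main.sh
The main program and entry point. The code is as follows: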
#!/bin/bash
red='\033[31m'    # red
blue='\033[34m'   # blue
green='\033[32m'  # green
reset='\033[0m'   # reset colors
version=1.0

# Menu number -> name of the deploy function to call.
declare -A SERVICES
SERVICES=([1]=node_exporter [2]=test)

# Print the service menu and read the selection into choose_number.
function service_choose(){
    echo -e "------------------------${green}service list:${reset}------------------------"
    for key in "${!SERVICES[@]}"
    do
        echo -e "${green} $key.${SERVICES[$key]}${reset}"
    done
    echo
    read -r -p "Please key in numbers:" choose_number
}

# Read the target IP address into ip.
function input_target(){
    read -r -p "Please enter IP address:" ip
}

# Deploy node_exporter to the target as root.
function node_exporter(){
    /bin/bash node_exporter/deploy.sh "$ip" root
}

# Set up passwordless SSH login to the target.
function ssh_copy_id(){
    /bin/bash ssh_id_copy.sh "$ip"
}

echo -e " "
echo -e "${blue}+----------------------------------------------------------------"
echo -e "${green} Welcome Deployment tools! "
echo -e "${blue}+----------------------------------------------------------------"
echo -e "${green}Version: ${red}$version ${reset}"
echo " "
service_choose

# Accept only selections that map to a deploy function defined in this script;
# an empty or unknown entry (such as "test") would otherwise run as a command.
if [ -z "$choose_number" ] || [ "$(type -t "${SERVICES[$choose_number]}")" != "function" ]; then
    echo -e "${red}Invalid selection: $choose_number${reset}"
    exit 1
fi

input_target
ssh_copy_id
"${SERVICES[$choose_number]}"
echo "Program exited"
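main.sh passes whatever the user types straight to ssh-copy-id and the deploy script. The following is a minimal sketch of input checks that could run before that happens. The function names is_valid_ipv4 and is_valid_choice are hypothetical and do not appear in the original script.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original main.sh.

# Return 0 if $1 is a dotted-quad IPv4 address with each octet in 0..255.
is_valid_ipv4() {
    local ip=$1 octet
    local -a octets
    [[ $ip =~ ^[0-9]{1,3}(\.[0-9]{1,3}){3}$ ]] || return 1
    IFS=. read -r -a octets <<< "$ip"
    for octet in "${octets[@]}"; do
        # 10# forces base 10 so that "08" is not parsed as octal.
        (( 10#$octet <= 255 )) || return 1
    done
    return 0
}

# Return 0 if $1 equals one of the remaining arguments (the menu keys).
is_valid_choice() {
    local choice=$1 key
    shift
    for key in "$@"; do
        [[ $choice == "$key" ]] && return 0
    done
    return 1
}
```

In main.sh these could be called as is_valid_choice "$choose_number" "${!SERVICES[@]}" after service_choose and is_valid_ipv4 "$ip" after input_target, exiting with an error message when either check fails.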
ssh_id_copy.sh
This script copies the local SSH public key to the target host to enable passwordless login. The code is as follows:
#!/bin/bash
DEPLOY_TARGET=$1
# Username on the target server
DEPLOY_TARGET_USER=root
# Password on the target server
DEPLOY_TARGET_PASSWD=123456

# Generate a key pair if none exists yet.
if [ ! -f /root/.ssh/id_rsa ];then
    echo "No SSH key found, generating one"
    ssh-keygen -t rsa -P "" -f /root/.ssh/id_rsa &> /dev/null
fi

# Answer the host-key and password prompts of ssh-copy-id automatically.
expect << EOF
set timeout 20
spawn ssh-copy-id -i /root/.ssh/id_rsa.pub $DEPLOY_TARGET_USER@$DEPLOY_TARGET
expect {
    "yes/no" { send "yes\n";exp_continue }
    "password" { send "$DEPLOY_TARGET_PASSWD\n";exp_continue }
}
#expect eof
EOF
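The script above keeps the target password in the file itself. One possible alternative, sketched here under the assumption that the operator can export the password or type it interactively, is to read it at run time. The function name resolve_password is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper; the original script hardcodes the password instead.

# Print the target password: use DEPLOY_TARGET_PASSWD from the environment
# when it is set, otherwise prompt for it without echoing the input.
resolve_password() {
    local passwd
    if [ -n "${DEPLOY_TARGET_PASSWD:-}" ]; then
        printf '%s\n' "$DEPLOY_TARGET_PASSWD"
        return 0
    fi
    # read -p writes its prompt to stderr, so only the password reaches stdout.
    read -r -s -p "Password for target host: " passwd
    echo >&2
    printf '%s\n' "$passwd"
}
```

With this in place, the assignment in ssh_id_copy.sh could become DEPLOY_TARGET_PASSWD=$(resolve_password), which keeps the expect block unchanged.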
node_exporter
This directory holds the resources for deploying node_exporter.
deploy.sh
Deploys node_exporter.
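The contents of deploy.sh are not shown in this document. Purely as an illustration of what such a script might do, the sketch below copies a binary to the target and registers it as a systemd service. Everything in it is an assumption: the function names, the local path node_exporter/node_exporter, the install path /usr/local/bin/node_exporter, and the use of systemd on the target.

```shell
#!/bin/bash
# Hypothetical sketch; the original deploy.sh is not shown in this document.

# Print a systemd unit that runs node_exporter from the given binary path.
render_unit() {
    local bin_path=${1:-/usr/local/bin/node_exporter}
    cat << EOF
[Unit]
Description=Prometheus node_exporter
After=network.target

[Service]
ExecStart=${bin_path}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Copy the binary and unit file to the target, then start the service.
# Arguments: target IP, SSH user (the same order main.sh passes them in).
deploy_node_exporter() {
    local ip=$1 user=${2:-root}
    scp node_exporter/node_exporter "${user}@${ip}:/usr/local/bin/node_exporter" || return 1
    render_unit | ssh "${user}@${ip}" 'cat > /etc/systemd/system/node_exporter.service' || return 1
    ssh "${user}@${ip}" 'systemctl daemon-reload && systemctl enable --now node_exporter'
}
```

A script built this way would end with deploy_node_exporter "$1" "$2", and it relies on the passwordless login that ssh_id_copy.sh sets up beforehand.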
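3. Using the script
Run the main.sh script:
bash main.sh
Select the service to deploy (enter its number).
Enter the IP address of the target node.
Wait for the deployment to finish.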
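node_exporter monitoring
After deploying node_exporter with the service deployment script, check it by visiting localhost:9100 on the node. If the page is reachable, the deployment succeeded.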
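The same check can be run from the command line. The sketch below assumes curl is installed and that node_exporter serves its metrics under the default /metrics path. The function names metrics_url and check_exporter are hypothetical.

```shell
#!/bin/bash
# Build the metrics URL for a host; 9100 is the port used in this document.
metrics_url() {
    local host=$1 port=${2:-9100}
    printf 'http://%s:%s/metrics\n' "$host" "$port"
}

# Return 0 if the exporter answers and exposes at least one node_* metric.
check_exporter() {
    curl -fsS --max-time 5 "$(metrics_url "$@")" | grep -q '^node_'
}
```

For example, check_exporter 10.20.20.162 && echo "node_exporter is up" could be run from the Prometheus machine to confirm that the port is reachable over the network and not only from the node itself.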
Adding a target to Prometheus
Log in to the machine where Prometheus is deployed, 10.20.20.46.
Edit the Prometheus YAML configuration file.
Add the following section:
  - job_name: 'node'
    scrape_interval: 5s
    static_configs:
      - targets: ['10.20.20.46:9100','10.20.20.162:9100']
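When more nodes are added, the targets line grows by one entry per node. A small sketch that generates this line from a list of hosts may help avoid quoting mistakes. The function name render_targets is hypothetical, and the port is fixed to 9100 as in the section above.

```shell
#!/bin/bash
# Print a static_configs targets line for the given hosts on port 9100.
render_targets() {
    local port=9100 host list=""
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry.
        list+="${list:+,}'${host}:${port}'"
    done
    printf -- "- targets: [%s]\n" "$list"
}
```

Running render_targets 10.20.20.46 10.20.20.162 prints the same targets line as the configuration above, ready to paste under static_configs with the appropriate indentation.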
Configuring alert rules
Rule directory: /home/prometheus/rules
Create a new rule file:
vi linux_alert.yaml
groups:
- name: Node-Alert
  rules:
  - alert: Instance-Down # alert name
    expr: up == 0
    for: 1m # how long the condition must hold before the alert fires
    labels:
      severity: warning
    annotations: # message details
      summary: "Instance {{$labels.instance}} down"
      description: "{{$labels.instance}}: job {{$labels.job}} has been down for more than 1 minute."
  - alert: "High memory usage"
    expr: round(100 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} memory usage is too high"
      description: "{{ $labels.instance }} current usage: {{ $value }}%"
  - alert: "High CPU usage"
    expr: round(100 - ((avg by (instance,job)(irate(node_cpu_seconds_total{mode="idle",instance!~'bac-.*'}[5m]))) * 100)) > 85
    for: 2m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} CPU usage is too high"
      description: "{{ $labels.instance }} current usage: {{ $value }}%"
  - alert: "High disk usage"
    expr: round(100 - 100 * (node_filesystem_avail_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"})) > 80
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} disk usage is too high"
      description: "{{ $labels.instance }} disk {{ $labels.mountpoint }} current usage: {{ $value }}%"
  - alert: "Low partition free space"
    expr: round(node_filesystem_avail_bytes{fstype=~"ext4|xfs",instance!~"testnode",mountpoint!~"/rootfs.*|/boot.*"} / 1024 / 1024 / 1024) < 10
    for: 15s
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} partition free space is too low"
      description: "{{ $labels.instance }} partition '{{ $labels.mountpoint }}' has {{ $value }}GB free"
  - alert: "High outbound network rate"
    expr: round(irate(node_network_transmit_bytes_total{instance!~"data.*",device!~'tap.*|veth.*|br.*|docker.*|vir.*|lo.*|vnet.*'}[1m]) / 1024) > 2048
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "{{ $labels.instance }} outbound network rate is too high"
      description: "{{ $labels.instance }} current rate: {{ $value }}KB/s"
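To make the thresholds in the rules above easier to reason about, the arithmetic of the memory, disk, and partition expressions can be reproduced by hand. This is a worked sketch in shell with awk, not something Prometheus runs. The function names usage_percent and bytes_to_gib are hypothetical.

```shell
#!/bin/bash
# Percentage used, rounded to an integer, mirroring
# round(100 - available / total * 100) from the memory and disk rules.
usage_percent() {
    awk -v avail="$1" -v total="$2" 'BEGIN { printf "%.0f\n", 100 - avail / total * 100 }'
}

# Bytes converted to whole GiB, mirroring round(bytes / 1024 / 1024 / 1024)
# from the partition rule. Exact .5 ties may round differently in awk
# than in PromQL, which resolves ties by rounding up.
bytes_to_gib() {
    awk -v bytes="$1" 'BEGIN { printf "%.0f\n", bytes / 1024 / 1024 / 1024 }'
}
```

For a node with 8 GiB of memory and 2 GiB available, usage_percent 2147483648 8589934592 gives 75, which stays below the 80 threshold, so the memory alert would not fire. If promtool is available, promtool check rules linux_alert.yaml can validate the rule file syntax before Prometheus is restarted.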
Restarting Prometheus
docker restart prometheus
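Verifying the deployment
Open the Prometheus web UI and check the targets page.
Targets added successfully
Open the alerts page in the web UI.
Deployment successful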
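The target check can also be scripted against the Prometheus HTTP API instead of the web page. The sketch below counts healthy targets in the JSON returned by the /api/v1/targets endpoint. It assumes the response is compact JSON with no spaces around the colon. The function name count_up_targets is hypothetical, and a JSON parser such as jq would be more robust than grep.

```shell
#!/bin/bash
# Count how many targets report "health":"up" in the JSON read from stdin.
count_up_targets() {
    grep -o '"health":"up"' | wc -l | tr -d ' '
}
```

Assuming Prometheus listens on its default port 9090, curl -fsS http://10.20.20.46:9090/api/v1/targets | count_up_targets should print 2 once both nodes from the scrape configuration are being scraped successfully, plus one for each additional healthy target in other jobs.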