I. System Environment
| IP | Role | OS | Firewall | SELinux | Software versions | Port |
|---|---|---|---|---|---|---|
| 192.168.1.201 | Nagios Server | RHEL 7.2 x86-64 | Disabled | Disabled | Nagios 4.3.1, nagios-plugins 2.2.0, NRPE 3.0.1 | 5666 |
| 192.168.1.202 | Nagios Client | RHEL 7.2 x86-64 | Disabled | Disabled | nagios-plugins 2.2.0, NRPE 3.0.1 | 5666 |
II. Adding Linux Host Monitoring
Client side:
1. Create the nagios user
# useradd -s /sbin/nologin nagios
2. Install nagios-plugins
Omitted; follow the Server installation article.
3. Install NRPE
# mkdir /usr/local/nagios
# chown -R nagios:nagios /usr/local/nagios
# yum -y install openssl-devel
# tar -zxvf nrpe-3.0.1.tar.gz
# cd nrpe-3.0.1
# ./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl
# make all
# make install-plugin
# make install-daemon
# make install-config
# make install-daemon-config    # only for NRPE versions below 3
# make install-xinetd           # only for NRPE versions below 3
4. Configure nrpe.cfg
# mkdir -p /usr/local/nagios/etc
# cp -a /usr/local/src/nrpe-3.0.1/sample-config/nrpe.cfg /usr/local/nagios/etc/
# chown -R nagios:nagios /usr/local/nagios
# vi /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
debug=0
pid_file=/usr/local/nagios/var/nrpe.pid
server_port=5666
server_address=192.168.1.202
nrpe_user=nagios
nrpe_group=nagios
allowed_hosts=127.0.0.1,192.168.1.201
dont_blame_nrpe=0
allow_bash_command_substitution=0
command_timeout=60
connection_timeout=300
command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_disk]=/usr/local/nagios/libexec/check_disk -w 25% -c 15%
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 15 -c 10
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 80% -c 90%
command[check_uptime]=/usr/local/nagios/libexec/check_uptime.sh
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 85% -c 80%
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 250 -c 300
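The server calls `check_nrpe -c <name>`, and NRPE only runs a command whose name matches a `command[<name>]` key in this file, so a typo on either side shows up as a failed check. A minimal sketch for listing the defined names and testing one of them; `nrpe_commands` and `nrpe_has_command` are hypothetical helpers, not part of NRPE:

```shell
#!/bin/bash
# Hypothetical helper, not part of NRPE: print the <name> of every active
# "command[<name>]=..." line in an nrpe.cfg file. Lines commented out with "#"
# do not match because the pattern is anchored at the start of the line.
nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Succeed (exit 0) if <name> is defined in <nrpe.cfg>, fail (exit 1) otherwise.
# Usage: nrpe_has_command <nrpe.cfg> <name>
nrpe_has_command() {
    nrpe_commands "$1" | grep -qxF -- "$2"
}
```

For example, `nrpe_has_command /usr/local/nagios/etc/nrpe.cfg check_cpu && echo defined` confirms that `check_nrpe!check_cpu` on the server has something to call.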
5. Copy the monitoring scripts into the libexec directory
# ll /usr/local/nagios/libexec/check_*.sh
-rwxr-xr-x 1 nagios nagios 8196 Mar 27 08:58 /usr/local/nagios/libexec/check_cpu.sh
-rwxr-xr-x 1 nagios nagios 2789 Mar 27 08:58 /usr/local/nagios/libexec/check_mem.sh
-rwxr-xr-x 1 nagios nagios  791 Mar 27 08:58 /usr/local/nagios/libexec/check_uptime.sh
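The contents of check_cpu.sh, check_mem.sh and check_uptime.sh are not shown here. Any such script has to follow the Nagios plugin convention: print one status line and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below illustrates that convention for memory; it is not the author's check_mem.sh, and it assumes that `-w 15 -c 10` are thresholds on the percentage of free memory. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch of the Nagios plugin convention, NOT the check_mem.sh used above.
# Assumption: -w/-c are thresholds on the percentage of FREE memory, so a
# smaller number is worse (as in "check_mem.sh -w 15 -c 10").
# Return codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.

# Map a free-memory percentage to one status line and a return code.
mem_state() {
    local free_pct="$1" warn="$2" crit="$3"
    if [ "$free_pct" -le "$crit" ]; then
        echo "CRITICAL - ${free_pct}% memory free"; return 2
    elif [ "$free_pct" -le "$warn" ]; then
        echo "WARNING - ${free_pct}% memory free"; return 1
    fi
    echo "OK - ${free_pct}% memory free"
    return 0
}

# Print MemAvailable as a whole percentage of MemTotal; print nothing if either
# field is missing (MemAvailable needs Linux 3.14+ or a kernel that backports it).
free_mem_pct() {
    awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
         END { if (t > 0 && a != "") printf "%d\n", a * 100 / t }' /proc/meminfo 2>/dev/null
}

# Plugin entry point: check_mem_sketch [warn] [crit]
check_mem_sketch() {
    local warn="${1:-15}" crit="${2:-10}" pct
    pct=$(free_mem_pct)
    if [ -z "$pct" ]; then
        echo "UNKNOWN - cannot read MemTotal/MemAvailable"; return 3
    fi
    mem_state "$pct" "$warn" "$crit"
}
```

To use it as a standalone plugin, save it to a file, append `check_mem_sketch "$@"; exit $?`, and make it executable by the nagios user.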
6. Create the NRPE init script
# vi /etc/init.d/nrpe
#!/bin/bash
# chkconfig: 2345 88 12
# description: NRPE DAEMON

NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg

case "$1" in
  start)
    echo -n "Starting NRPE daemon..."
    $NRPE -c $NRPECONF -d
    echo " done."
    ;;
  stop)
    echo -n "Stopping NRPE daemon..."
    pkill -u nagios nrpe
    echo " done."
    ;;
  restart)
    $0 stop
    sleep 2
    $0 start
    ;;
  *)
    echo "Usage: $0 start|stop|restart"
    ;;
esac

exit 0
# chmod +x /etc/init.d/nrpe
7. Start NRPE
# chkconfig nrpe on
# /etc/init.d/nrpe start
# ps -ef| grep nrpe
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
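Besides `ps`, the daemon can be checked through the `pid_file` set in nrpe.cfg (/usr/local/nagios/var/nrpe.pid here). A minimal sketch, where `pid_alive` is a hypothetical helper and it is assumed that NRPE has written its PID to that file:

```shell
#!/bin/bash
# Hypothetical helper: succeed if the PID stored in a pid file belongs to a
# running process. NRPE writes its PID to the pid_file set in nrpe.cfg.
# Run it as root or as the nagios user: kill -0 fails with "permission denied"
# for processes owned by other users, which would look like "not running".
pid_alive() {
    local pidfile="$1" pid
    [ -r "$pidfile" ] || return 1            # no readable pid file
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) return 1 ;;             # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null               # signal 0: existence check only, nothing is sent
}
```

Usage: `pid_alive /usr/local/nagios/var/nrpe.pid && echo "nrpe is running" || echo "nrpe is not running"`.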
8. Verify
# netstat -tunlp
tcp        0      0 192.168.1.202:5666      0.0.0.0:*               LISTEN      26921/nrpe
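The line that matters in this output is a TCP socket in LISTEN state on port 5666. A small sketch that makes the check scriptable; `listening_on` is a hypothetical helper, and it assumes the net-tools `netstat` column layout (`ss -tlnp` prints different columns, so it would need a different parser):

```shell
#!/bin/bash
# Hypothetical helper: read `netstat -tunlp` output on stdin and succeed if a TCP
# socket is in LISTEN state on the given port. Columns assumed (net-tools format):
# Proto Recv-Q Send-Q Local-Address Foreign-Address State PID/Program
listening_on() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, part, ":")     # IPv4 "192.168.1.202:5666" or IPv6 ":::5666"
            if (part[n] == port) found = 1
        }
        END { exit !found }'
}
```

Usage: `netstat -tunlp | listening_on 5666 && echo "nrpe is listening on 5666"`.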
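Server side:
1. Edit the host configuration file
# cd /usr/local/nagios/etc/objects
# mv localhost.cfg linux.cfg
# vi linux.cfg    # define the hosts and services; every command used in a check_command line must be defined in commands.cfg, and every name after check_nrpe! must be defined in the client's nrpe.cfg
Note that the check_local_* commands run on the Nagios server itself, so the services that use them report the server's own disk, users, processes, load and swap for every host in linux-servers; per-host values come from the check_nrpe services.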
###############################################################################
# LOCALHOST.CFG - SAMPLE OBJECT CONFIG FILE FOR MONITORING THIS MACHINE
#
#
# NOTE: This config file is intended to serve as an *extremely* simple
#       example of how you can create configuration entries to monitor
#       the local (Linux) machine.
#
###############################################################################

###############################################################################
###############################################################################
#
# HOST DEFINITION
#
###############################################################################
###############################################################################

# Define a host for the local machine

define host{
        use                     linux-server            ; Name of host template to use
        host_name               zabbix_server
        alias                   zabbix_server
        address                 192.168.1.201
        contact_groups          +admins
        }

define host{
        use                     linux-server            ; Name of host template to use
        host_name               zabbix_proxy
        alias                   zabbix_proxy
        address                 192.168.1.202
        contact_groups          +admins
        }

###############################################################################
###############################################################################
#
# HOST GROUP DEFINITION
#
###############################################################################
###############################################################################

# Define an optional hostgroup for Linux machines

define hostgroup{
        hostgroup_name  linux-servers                   ; The name of the hostgroup
        alias           Linux Servers                   ; Long name of the group
        members         zabbix_server,zabbix_proxy      ; Comma separated list of hosts that belong to this group
        }

###############################################################################
###############################################################################
#
# SERVICE DEFINITIONS
#
###############################################################################
###############################################################################

# Define a service to "ping" the local machine

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers           ; localhost
        service_description     PING
        check_command           check_ping!100.0,20%!500.0,60%
        }

# Define a service to check the disk space of the root partition
# on the local machine.  Warning if < 20% free, critical if
# < 10% free space on partition.

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
        }

# Define a service to check the number of currently logged in
# users on the local machine.  Warning if > 20 users, critical
# if > 50 users.

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers
        service_description     Current Users
        check_command           check_local_users!20!50
        }

# Define a service to check the number of currently running procs
# on the local machine.  Warning if > 250 processes, critical if
# > 400 processes.

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers
        service_description     Total Processes
        check_command           check_local_procs!250!400!RSZDT
        }

# Define a service to check the load on the local machine.

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers
        service_description     Current Load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }

# Define a service to check the swap usage the local machine.
# Critical if less than 10% of swap is free, warning if less than 20% is free

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers
        service_description     Swap Usage
        check_command           check_local_swap!20!10
        }

# Define a service to check SSH on the local machine.
# Disable notifications for this service by default, as not all users may have SSH enabled.

define service{
        use                     generic-service         ; Name of service template to use
        hostgroup_name          linux-servers
        service_description     SSH
        check_command           check_ssh!
        notifications_enabled   0
        }

# Define a service to check HTTP on the local machine.
# Disable notifications for this service by default, as not all users may have HTTP enabled.

define service{
        use                     generic-service         ; Name of service template to use
        host_name               zabbix_server
        service_description     HTTP
        check_command           check_http!
        notifications_enabled   0
        }

define service {
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Check CPU
        check_command           check_nrpe!check_cpu
        notifications_enabled   0
        }

define service {
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Check Disk
        check_command           check_nrpe!check_disk
        notifications_enabled   0
        }

define service {
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Check MEM
        check_command           check_nrpe!check_mem
        notifications_enabled   0
        }

define service {
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Check uptime
        check_command           check_nrpe!check_uptime
        notifications_enabled   0
        }

define service {
        use                     generic-service
        hostgroup_name          linux-servers
        service_description     Check uptime
        check_command           check_nrpe!check_uptime
        notifications_enabled   0
        }
2. Define the commands
# vi commands.cfg
define command {
        command_name    check_local_cpu
        command_line    $USER1$/check_cpu.sh -w $ARG1$ -c $ARG2$
        }

define command {
        command_name    check_local_mem
        command_line    $USER1$/check_mem.sh -w $ARG1$ -c $ARG2$
        }

define command {
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }

define command {
        command_name    check_local_uptime
        command_line    $USER1$/check_uptime.sh
        }
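In a service, `check_command check_nrpe!check_cpu` names the `check_nrpe` definition above, and each `!`-separated field fills `$ARG1$`, `$ARG2$`, and so on. The sketch below imitates that substitution so the resulting command line can be inspected by hand. It is a simplified illustration: `expand_check_command` is hypothetical, only `$USER1$`, `$HOSTADDRESS$` and `$ARGn$` are handled, and Nagios performs the real expansion internally.

```shell
#!/bin/bash
# Simplified illustration of how a check_command such as "check_nrpe!check_cpu"
# becomes a command line: the text before the first "!" names a command
# definition, and the "!"-separated fields fill $ARG1$, $ARG2$, ...
# Only $USER1$, $HOSTADDRESS$ and $ARGn$ are handled here.
# Usage: expand_check_command <check_command> <command_line> <USER1> <HOSTADDRESS>
expand_check_command() {
    local check_command="$1" line="$2" user1="$3" hostaddress="$4" i
    local -a field
    IFS='!' read -r -a field <<< "$check_command"   # field[0]=command name, field[1..]=ARG1..
    # Quoted replacements keep "&" literal under bash 5.2's patsub_replacement.
    line=${line//\$USER1\$/"$user1"}
    line=${line//\$HOSTADDRESS\$/"$hostaddress"}
    for (( i = 1; i < ${#field[@]}; i++ )); do
        line=${line//\$ARG$i\$/"${field[i]}"}
    done
    printf '%s\n' "$line"
}
```

For example, `expand_check_command 'check_nrpe!check_cpu' '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' /usr/local/nagios/libexec 192.168.1.202` prints `/usr/local/nagios/libexec/check_nrpe -H 192.168.1.202 -c check_cpu`, which is the command that can also be run by hand in libexec.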
3. Register the file in the main configuration
# vi ../nagios.cfg
Change the localhost.cfg entry to linux.cfg:
cfg_file=/usr/local/nagios/etc/objects/linux.cfg
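If a `cfg_file` entry points to a file that does not exist (for example, when localhost.cfg has been renamed but nagios.cfg still names it), Nagios refuses to load the configuration. A minimal sketch that lists such entries; `missing_cfg_files` is a hypothetical helper, and it ignores `cfg_dir` entries:

```shell
#!/bin/bash
# Hypothetical helper: print every cfg_file= entry in nagios.cfg that points to a
# missing file and return 1 if any were found. Commented-out entries (lines
# starting with #) and cfg_dir= entries are skipped.
missing_cfg_files() {
    local main_cfg="$1" path missing=0
    while IFS= read -r path; do
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done < <(sed -n 's/^[[:space:]]*cfg_file=//p' "$main_cfg")
    return "$missing"
}
```

Usage: `missing_cfg_files /usr/local/nagios/etc/nagios.cfg || echo "fix the entries above first"`.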
4. Check the configuration for errors
# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios Core 4.3.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 02-23-2017
License: GPL

Website: https://www.nagios.org
Reading configuration data...
   Read main config file okay...
Warning: Duplicate definition found for service 'Check uptime' on host 'zabbix_proxy' (config file '/usr/local/nagios/etc/objects/linux.cfg', starting on line 188)
Warning: Duplicate definition found for service 'Check uptime' on host 'zabbix_server' (config file '/usr/local/nagios/etc/objects/linux.cfg', starting on line 196)
   Read object config files okay...

Running pre-flight check on configuration data...

Checking objects...
        Checked 30 services.
        Checked 3 hosts.
        Checked 2 host groups.
        Checked 0 service groups.
        Checked 1 contacts.
        Checked 1 contact groups.
        Checked 28 commands.
        Checked 5 time periods.
        Checked 0 host escalations.
        Checked 0 service escalations.
Checking for circular paths...
        Checked 3 hosts
        Checked 0 service dependencies
        Checked 0 host dependencies
        Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...

Total Warnings: 0
Total Errors:   0

The two "Duplicate definition" warnings come from the Check uptime service being defined twice at the end of linux.cfg; deleting one of the two identical blocks clears them.
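Duplicate service definitions like these can also be spotted before running Nagios. A rough awk sketch that reports `define service` blocks in one file repeating the same host/hostgroup and service_description; `find_duplicate_services` is hypothetical, assumes one directive per line with the closing `}` on its own line, and ignores templates and other files, so `nagios -v` remains the authoritative check:

```shell
#!/bin/bash
# Rough sketch: report "define service" blocks in ONE object file that repeat
# the same (host_name or hostgroup_name, service_description) pair.
# Assumes one directive per line and a closing "}" on its own line; "use"
# inheritance and other files are ignored.
find_duplicate_services() {
    awk '
        /^[ \t]*define[ \t]+service[ \t]*[{]/ {
            inside = 1; target = ""; desc = ""; start = NR; next
        }
        inside && /^[ \t]*(host_name|hostgroup_name)[ \t]/ { target = $2 }
        inside && /^[ \t]*service_description[ \t]/ {
            desc = $0
            sub(/^[ \t]*service_description[ \t]+/, "", desc)
            sub(/[ \t]*(;.*)?$/, "", desc)     # drop trailing ";" comment and blanks
        }
        inside && /^[ \t]*[}]/ {
            inside = 0
            key = target SUBSEP desc
            if (key in first)
                printf "duplicate: %s on %s (lines %d and %d)\n", desc, target, first[key], start
            else
                first[key] = start
        }' "$1"
}
```

Usage: `find_duplicate_services /usr/local/nagios/etc/objects/linux.cfg`.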
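5. Verify connectivity between the client and the server
# cd /usr/local/nagios/libexec/
# ./check_nrpe -H 192.168.1.202
NRPE v3.0.1    # seeing this version string means the server can reach NRPE on the client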
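Once the version string comes back, each NRPE command can be exercised the same way with `-c`. `check_nrpe` exits with the remote plugin's status, so a small loop can summarize them. A minimal sketch; `state_name` and `nrpe_smoke_test` are hypothetical helpers, and the check_nrpe path is passed in rather than assumed:

```shell
#!/bin/bash
# Hypothetical helpers: run a list of NRPE commands against one host and print
# one summary line per command, using the standard plugin exit codes.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Usage: nrpe_smoke_test <check_nrpe path> <host> <command>...
nrpe_smoke_test() {
    local check_nrpe="$1" host="$2" cmd out rc
    shift 2
    for cmd in "$@"; do
        rc=0
        out=$("$check_nrpe" -H "$host" -c "$cmd") || rc=$?
        printf '%-14s %-8s %s\n' "$cmd" "$(state_name "$rc")" "$out"
    done
}
```

Usage: `nrpe_smoke_test /usr/local/nagios/libexec/check_nrpe 192.168.1.202 check_cpu check_mem check_disk check_uptime`.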
6. Restart Nagios
# service nagios restart
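III. Adding Windows Host Monitoring
1. Client configuration
Install NSClient++, which is available in 32-bit and 64-bit builds.
Extract NSClient0.3.8-Win32 to the root of drive C:.
Open a Windows command prompt and change to the NSClient0.3.8-Win32 directory.
Run NSClient++ /install
Run NSClient++ SysTray (note that the command is case-sensitive)
After installation, open the Windows Services console and, as shown in the figure below, check "Allow service to interact with desktop".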
![](https://img.laitimes.com/img/9ZDMuAjOiMmIsIjOiQnIsIiclRnblN0LclHdpZXYyd2LcBzNvwVZ2x2bzNXak9CX90TQNNkRrFlQKBTSvwFbslmZvwFMwQzLcVmepNHdu9mZvwFVywUNMZTY18CX052bm9CX9kEROlXTU1kejpXTmJEViZXUYpVd1kmYr50MZV3YyI2cKJDT29GRjBjUIF2LcRHelR3LcJzLctmch1mclRXY39jM3EjMwczMwEzMwgDM3EDMy8CX0Vmbu4GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.jpg)
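# Edit the nsc.ini file in NSClient0.3.8-Win32
[modules] section:
Uncomment every module except CheckWMI.dll and RemoteConfiguration.dll.
[Settings] section:
Uncomment allowed_hosts and add the IP address of the Nagios monitoring host. Mine reads: allowed_hosts=127.0.0.1/32,172.16.99.245
[NSClient] section:
Uncomment port and keep its value of 12489, the default NSClient listening port.
Run NSClient++ /start from the command prompt to start the service.
If the Windows host runs a firewall, open the corresponding port.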
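2. Server configuration
# vi /usr/local/nagios/etc/nagios.cfg
Uncomment the following line (remove the leading #):
cfg_file=/usr/local/nagios/etc/objects/windows.cfg
# vi /usr/local/nagios/etc/objects/windows.cfg    # configure as needed, following the Linux example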
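A minimal sketch for generating a host entry for windows.cfg. `windows_host_cfg` is a hypothetical generator; it assumes the `windows-server` host template, the `generic-service` service template and the `check_nt` command from the stock Nagios sample configuration (check_nt talks to NSClient++ on port 12489), so adjust the object names if your templates differ:

```shell
#!/bin/bash
# Hypothetical generator for a minimal Windows host entry in windows.cfg.
# Assumes the stock sample objects: the windows-server host template,
# the generic-service service template and the check_nt command.
# Usage: windows_host_cfg <host_name> <address>
windows_host_cfg() {
    local name="$1" address="$2"
    cat <<EOF
define host {
        use                     windows-server
        host_name               ${name}
        alias                   ${name}
        address                 ${address}
        }

define service {
        use                     generic-service
        host_name               ${name}
        service_description     NSClient++ Version
        check_command           check_nt!CLIENTVERSION
        }
EOF
}
```

Usage: `windows_host_cfg winserver <Windows host IP> >> /usr/local/nagios/etc/objects/windows.cfg`, then rerun `nagios -v` before restarting.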
3. Restart the service
# service nagios restart