I. Introduction
1. For more on how NSClient works, see the official NSClient++ site.
2. NSClient++ and NRPE
How NSClient++ works
<a href="http://s3.51cto.com/wyfs02/M02/4A/E9/wKiom1QnYaewuJNPAACIXWqRWWw312.jpg" target="_blank"></a>
How NRPE works
<a href="http://s3.51cto.com/wyfs02/M00/4A/F0/wKiom1Qnc3zAjECMAAEUsOqv9kM145.jpg" target="_blank"></a>
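Nagios can monitor Windows hosts in three main ways:
The first is NSClient++
The second is NRPE
The third is SNMP (not commonly used)
The biggest difference between NSClient++ and NRPE is:
1. With NRPE, the monitored machine runs the nrpe daemon together with the plugins, and the plugins do the actual checking. When the monitoring server sends a check request to nrpe, nrpe calls the matching plugin to carry it out.
2. NSClient++ works differently: only NSClient++ is installed on the monitored machine, with no separate plugins. When the monitoring server sends a check request to NSClient++, NSClient++ performs the check itself.
This also exposes a significant drawback of NSClient++ when used this way: it is inflexible and hard to extend, because it can only run the checks it ships with rather than arbitrary plugins. Fortunately, the built-in checks are good enough to cover essentially all of our monitoring needs.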
II. Monitoring a Windows host with check_nt
1. Download NSClient++
2. Install NSClient++
<a href="http://s3.51cto.com/wyfs02/M01/4A/B2/wKiom1QmswLChu7BAAF5ugAoQLo820.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M02/4A/B4/wKioL1QmsyqyCXr-AAHsG6bmgZw306.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M02/4A/B2/wKiom1QmswOwzMYBAAGsdr9E0Pc403.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M00/4A/B4/wKioL1QmsyqTfejsAAGRfdnH-xU473.jpg" target="_blank"></a>
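Allowed hosts: (this is the IP of the nagios (or other) server)
Allowed host addresses: the IP address of the Nagios server
NSClient password (only used via check_nt)
NSClient password: the password Nagios uses to talk to the NSClient++ process; it may be left unset
Modules to load:
Install and load the relevant modules: NSClient++ offers the common check plugins, check_nt, check_nrpe, NSCA, and WMI
Here we select all of them, since they will be needed later.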
<a href="http://s3.51cto.com/wyfs02/M00/4A/B2/wKiom1QmswOTN7ZdAAF8CMhGqWs779.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M02/4A/B4/wKioL1Qmsyvjmrj1AADuL2UQatY830.jpg" target="_blank"></a>
Click [Finish] to complete the NSClient++ installation
<a href="http://s3.51cto.com/wyfs02/M01/4A/B4/wKioL1QmsyujK89FAAGCBr_MNf0266.jpg" target="_blank"></a>
3. Check that the NSClient++ service is running
<a href="http://s3.51cto.com/wyfs02/M02/4A/E9/wKioL1QnXTmgMzAWAAQEAtOdHlA759.jpg" target="_blank"></a>
<a href="http://s3.51cto.com/wyfs02/M01/4A/E9/wKiom1QnYYexKlUCAACG8_0iqhA152.jpg" target="_blank"></a>
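4. Review the NSClient++ configuration file
The default install directory is C:\Program Files\NSClient++, and NSC.ini is the configuration file for the NSClient service. It normally needs no changes, but if the monitoring server's IP address changes or the password is forgotten, this is where to update them.
Modules loaded at install time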
<a href="http://s3.51cto.com/wyfs02/M00/4A/E9/wKioL1QnXLeisp6DAAOmUPyr8zQ318.jpg" target="_blank"></a>
Allowed hosts setting: the IP address of the Nagios server
<a href="http://s3.51cto.com/wyfs02/M00/4A/E7/wKiom1QnXJDxt5BDAAZlbauD00Y308.jpg" target="_blank"></a>
Default NRPE port
<a href="http://s3.51cto.com/wyfs02/M01/4A/E9/wKioL1QnXLnCQe-sAATIjXBJX2M538.jpg" target="_blank"></a>
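The settings shown in the screenshots above can be sketched as plain text. The fragment below is a minimal sketch that assumes the NSC.ini layout of NSClient++ 0.3.x; the module list is only a subset, and the password `ChangeMe` and the Nagios server address `192.168.0.100` are hypothetical placeholders that do not come from this article. Compare it with your own NSC.ini rather than copying it.

```shell
# Print a sample NSC.ini fragment to stdout; nothing is written to disk.
# All values are illustrative assumptions, not the author's actual settings.
sample_nsc_ini() {
    cat <<'EOF'
[modules]
; modules loaded by the service (a subset, for illustration)
FileLogger.dll
CheckSystem.dll
CheckDisk.dll
NSClientListener.dll
NRPEListener.dll

[Settings]
; hypothetical password; must match what check_nt sends with -s
password=ChangeMe
; hypothetical address of the Nagios server allowed to connect
allowed_hosts=192.168.0.100

[NSClient]
; port the check_nt listener binds to
port=12489
EOF
}

sample_nsc_ini
```

After editing the real file on the Windows host, the NSClient++ service has to be restarted for the change to take effect.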
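III. Monitoring with NSClient
NSClient++ communicates with the Nagios server mainly through the check_nt plugin on the Nagios server. The diagram below shows how this works.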
<a href="http://s3.51cto.com/wyfs02/M02/4A/E8/wKiom1QnXc7Rghy-AAC9NsiJ0jA271.jpg" target="_blank"></a>
1. check_nt plugin usage
<code>[root@Nagios ~]# cd /usr/local/nagios/libexec/</code>
<code>[root@Nagios libexec]# ./check_nt -h    # show the check_nt usage text</code>
<code>check_nt v2.0.3 (nagios-plugins 2.0.3)</code>
<code>Copyright (c) 2000 Yves Rubin ([email protected])</code>
<code>Copyright (c) 2000-2014 Nagios Plugin Development Team</code>
<code> <[email protected]></code>
<code>This plugin collects data from the NSClient service running on a</code>
<code>Windows NT/2000/XP/2003 server.</code>
<code>Usage:</code>
<code>check_nt -H host -v variable [-p port] [-w warning] [-c critical]</code>
<code>[-l params] [-d SHOWALL] [-u] [-t timeout]</code>
<code>Options:</code>
<code> -h, --help</code>
<code> Print detailed help screen</code>
<code> -V, --version</code>
<code> Print version information</code>
<code> --extra-opts=[section][@file]</code>
<code> Read options from an ini file. See</code>
<code> https://www.nagios-plugins.org/doc/extra-opts.html</code>
<code> for usage and examples.</code>
<code> -H, --hostname=HOST</code>
<code> Name of the host to check</code>
<code> -p, --port=INTEGER</code>
<code> Optional port number (default: 1248)    # default port</code>
<code> -s, --secret=<password>    # password to send</code>
<code> Password needed for the request</code>
<code> -w, --warning=INTEGER</code>
<code> Threshold which will result in a warning status</code>
<code> -c, --critical=INTEGER</code>
<code> Threshold which will result in a critical status</code>
<code> -t, --timeout=INTEGER</code>
<code> Seconds before connection attempt times out (default: -l, --params=<parameters></code>
<code> Parameters passed to specified check (see below) -d, --display={SHOWALL}</code>
<code> Display options (currently only SHOWALL works) -u, --unknown-timeout</code>
<code> Return UNKNOWN on timeouts10)</code>
<code> Print this help screen</code>
<code> Print version information</code>
<code> -v, --variable=STRING</code>
<code> Variable to check</code>
<code>Valid variables are:</code>
<code> CLIENTVERSION = Get the NSClient version</code>
<code> If -l <version> is specified, will return warning if versions differ.</code>
<code> CPULOAD =</code>
<code> Average CPU load on last x minutes.</code>
<code> Request a -l parameter with the following syntax:</code>
<code> -l <minutes range>,<warning threshold>,<critical threshold>.</code>
<code> <minute range> should be less than 24*60.</code>
<code> Thresholds are percentage and up to 10 requests can be done in one shot.</code>
<code> ie: -l 60,90,95,120,90,95</code>
<code> UPTIME =</code>
<code> Get the uptime of the machine.</code>
<code> -l <unit></code>
<code> <unit> = seconds, minutes, hours, or days. (default: minutes)</code>
<code> Thresholds will use the unit specified above.</code>
<code> USEDDISKSPACE =</code>
<code> Size and percentage of disk use.</code>
<code> Request a -l parameter containing the drive letter only.</code>
<code> Warning and critical thresholds can be specified with -w and -c.</code>
<code> MEMUSE =</code>
<code> Memory use.</code>
<code> SERVICESTATE =</code>
<code> Check the state of one or several services.</code>
<code> Request a -l parameters with the following syntax:</code>
<code> -l <service1>,<service2>,<service3>,...</code>
<code> You can specify -d SHOWALL in case you want to see working services</code>
<code> in the returned string.</code>
<code> PROCSTATE =</code>
<code> Check if one or several process are running.</code>
<code> Same syntax as SERVICESTATE.</code>
<code> COUNTER =</code>
<code> Check any performance counter of Windows NT/2000.</code>
<code> Request a -l parameters with the following syntax:</code>
<code> -l "\\<performance object>\\counter","<description></code>
<code> The <description> parameter is optional and is given to a printf</code>
<code> output command which requires a float parameter.</code>
<code> If <description> does not include "%%", it is used as a label.</code>
<code> Some examples:</code>
<code> "Paging file usage is %%.2f %%%%"</code>
<code> "%%.f %%%% paging file used."</code>
<code> INSTANCES =</code>
<code> Check any performance counter object of Windows NT/2000.</code>
<code> Syntax: check_nt -H <hostname> -p <port> -v INSTANCES -l <counter object></code>
<code> <counter object> is a Windows Perfmon Counter object (eg. Process),</code>
<code> if it is two words, it should be enclosed in quotes</code>
<code> The returned results will be a comma-separated list of instances on</code>
<code> the selected computer for that object.</code>
<code> The purpose of this is to be run from command line to determine what instances</code>
<code> are available for monitoring without having to log onto the Windows server</code>
<code> to run Perfmon directly.</code>
<code> It can also be used in scripts that automatically create Nagios service</code>
<code> configuration files.</code>
<code> check_nt -H 192.168.1.1 -p 1248 -v INSTANCES -l Process    # check_nt syntax example</code>
<code> </code>
<code>Notes:</code>
<code> - The NSClient service should be running on the server to get any information</code>
<code> (http://nsclient.ready2run.nl).</code>
<code> - Critical thresholds should be lower than warning thresholds</code>
<code> - Default port 1248 is sometimes in use by other services. The error</code>
<code> output when this happens contains "Cannot map xxxxx to protocol number".</code>
<code> One fix for this is to change the port to something else on check_nt</code>
<code> and on the client service it's connecting to.</code>
<code>Send email to [email protected] if you have questions regarding use</code>
<code>of this software. To submit patches or suggest improvements, send email to</code>
<code>[email protected]</code>
2. Using the check_nt command
check_nt parameters
-w: warning threshold (percentage)
-c: critical threshold (percentage)
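To make the parameters concrete, here is a minimal dry-run sketch. It only prints the command lines instead of running them, so it works without a reachable Windows host. The helper names `check_nt_cmd` and `status_name` are hypothetical, the threshold values are illustrative, and the host address and port are the ones that appear elsewhere in this article. The exit-code mapping follows the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
PLUGIN_DIR=/usr/local/nagios/libexec   # plugin directory used in this article
WIN_HOST=192.168.0.106                 # Windows host address used later in this article
NSC_PORT=12489                         # port used in the check_win command definition

# Hypothetical helper: print the check_nt command line for a given variable.
check_nt_cmd() {
    variable=$1
    shift
    echo "$PLUGIN_DIR/check_nt -H $WIN_HOST -p $NSC_PORT -v $variable $*"
}

# Hypothetical helper: translate a plugin exit code into a Nagios state name.
status_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;   # 3, and in this sketch anything unexpected
    esac
}

check_nt_cmd CPULOAD -l 5,80,90              # 5-minute average, warn 80%, crit 90%
check_nt_cmd USEDDISKSPACE -l c -w 80 -c 90  # drive C, warn 80%, crit 90%
check_nt_cmd MEMUSE -w 80 -c 90
status_name 2
```

Running one of the printed lines on the Nagios server (adding `-s <password>` if a password was set during installation) is a quick way to confirm connectivity before touching the Nagios configuration.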
IV. Defining the command, host, and services
1. Define the command
<code>[root@Nagios ~]# vim /usr/local/nagios/etc/objects/commands.cfg</code>
<code># 'check_win' command definition</code>
<code>define command{</code>
<code> command_name check_win</code>
<code> command_line $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -v $ARG1$ $ARG2$</code>
<code> }</code>
<code>Notes:</code>
<code>$..$ marks a built-in Nagios macro, i.e. a variable</code>
<code>$USER1$ is the directory that contains the plugins</code>
<code>-H specifies the host address</code>
<code>$HOSTADDRESS$ expands to the address of whichever host the check is applied to</code>
<code>$ARG1$ is an argument passed in from the service definition (a placeholder parameter)</code>
<code>-s specifies the password; empty by default</code>
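The way a service's `check_command` feeds `$ARG1$` and `$ARG2$` can be sketched as follows. Nagios splits the `check_command` value on `!`: the first field names the command and the remaining fields become the argument macros. The helper `expand_check_win` is hypothetical and only imitates that substitution for the `check_win` definition above; the example arguments are illustrative.

```shell
USER1=/usr/local/nagios/libexec    # plugin directory used in this article

# Hypothetical helper: mimic how Nagios fills in the check_win command line.
expand_check_win() {
    host_address=$1
    check_command=$2               # e.g. 'check_win!CPULOAD!-l 5,80,90'
    old_ifs=$IFS
    IFS='!'
    set -f                         # no filename globbing while splitting
    set -- $check_command          # $1 = command name, $2 = $ARG1$, $3 = $ARG2$
    set +f
    IFS=$old_ifs
    echo "$USER1/check_nt -H $host_address -p 12489 -v $2 $3"
}

expand_check_win 192.168.0.106 'check_win!CPULOAD!-l 5,80,90'
expand_check_win 192.168.0.106 'check_win!USEDDISKSPACE!-l c -w 80 -c 90'
```

A service that wants to use this command would therefore set `check_command` to a value such as `check_win!CPULOAD!-l 5,80,90`.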
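2. Define the host and services
<code>[root@Nagios objects]# cp windows.cfg windows106.cfg</code>
<code>[root@Nagios objects]# sed -i 's/winserver/Windows106/g' windows106.cfg</code>
<code>[root@Nagios objects]# sed -i 's/192\.168\.1\.2/192.168.0.106/' windows106.cfg</code>
Note: Nagios only loads object files that nagios.cfg references, so windows106.cfg needs a cfg_file entry there (or must sit in a directory listed with cfg_dir). The original steps do not show this.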
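The effect of the copy-and-substitute step can be tried safely in a scratch directory. The sketch below uses a trimmed stand-in for the stock windows.cfg (one host and one service, which is an assumption about its contents; the real sample file defines more services) and writes the result to a new file instead of editing in place, so it does not depend on GNU sed's `-i` option.

```shell
# Work in a throwaway directory so no real Nagios file is touched.
workdir=$(mktemp -d)

# Trimmed stand-in for the stock windows.cfg (assumed contents).
cat > "$workdir/windows.cfg" <<'EOF'
define host{
    use         windows-server
    host_name   winserver
    alias       My Windows Server
    address     192.168.1.2
    }

define service{
    use                     generic-service
    host_name               winserver
    service_description     CPU Load
    check_command           check_nt!CPULOAD!-l 5,80,90
    }
EOF

# Same two substitutions as above: rename the host and change its address.
sed -e 's/winserver/Windows106/g' \
    -e 's/192\.168\.1\.2/192.168.0.106/' \
    "$workdir/windows.cfg" > "$workdir/windows106.cfg"

cat "$workdir/windows106.cfg"
```

Both the host definition and every service that refers to it pick up the new host name, which is why the first substitution uses the `g` flag and is applied to the whole file.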
3. Check the configuration files for syntax errors
<code>[root@Nagios ~]# service nagios configtest</code>
<code>Nagios Core 4.0.7</code>
<code>Copyright (c) 2009-present Nagios Core Development Team and Community Contributors</code>
<code>Copyright (c) 1999-2009 Ethan Galstad</code>
<code>Last Modified: 06-03-2014</code>
<code>License: GPL</code>
<code>Website: http://www.nagios.org</code>
<code>Reading configuration data...</code>
<code> Read main config file okay...</code>
<code> Read object config files okay...</code>
<code>Running pre-flight check on configuration data...</code>
<code>Checking objects...</code>
<code> Checked 8 services.</code>
<code> Checked 1 hosts.</code>
<code> Checked 1 host groups.</code>
<code> Checked 0 service groups.</code>
<code> Checked 1 contacts.</code>
<code> Checked 1 contact groups.</code>
<code> Checked 25 commands.</code>
<code> Checked 5 time periods.</code>
<code> Checked 0 host escalations.</code>
<code> Checked 0 service escalations.</code>
<code>Checking for circular paths...</code>
<code> Checked 1 hosts</code>
<code> Checked 0 service dependencies</code>
<code> Checked 0 host dependencies</code>
<code> Checked 5 timeperiods</code>
<code>Checking global event handlers...</code>
<code>Checking obsessive compulsive processor commands...</code>
<code>Checking misc settings...</code>
<code>Total Warnings: 0</code>
<code>Total Errors: 0</code>
<code>Things look okay - No serious problems were detected during the pre-flight check</code>
<code>Object precache file created:</code>
<code>/usr/local/nagios/var/objects.precache</code>
4. Restart the nagios service
<code>[root@Nagios objects]# service nagios restart</code>
<code>Running configuration check...</code>
<code>Stopping nagios: .done.</code>
<code>Starting nagios: done.</code>
V. Viewing the monitoring data in a browser
1. After logging in, click [Hosts] to see the newly monitored Windows host
<a href="http://s3.51cto.com/wyfs02/M00/4A/BB/wKioL1Qmw93SnhVOAAOI_HKVwgw436.jpg" target="_blank"></a>
2. Click [Services] to see the status of the Windows services
<a href="http://s3.51cto.com/wyfs02/M00/4A/B9/wKiom1Qmw7awxW6cAAfMmerFYOc506.jpg" target="_blank"></a>
3. After a few minutes the statuses settle to normal, as shown below
<a href="http://s3.51cto.com/wyfs02/M02/4A/BC/wKiom1Qmys7Sl-qRAAVcoYMMwSw515.jpg" target="_blank"></a>
VI. Monitoring a Windows host with NRPE
1. Edit the NSClient++ configuration file
<a href="http://s3.51cto.com/wyfs02/M02/4A/EE/wKioL1QnawbRvD3RAAO2J91L10Q919.jpg" target="_blank"></a>
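The exact change is only visible in the screenshot, so the fragment below is a guess at what the [NRPE] section might look like, assuming the NSC.ini layout of NSClient++ 0.3.x. In particular, `allow_arguments=1` is an assumption: passing values with `-a` from check_nrpe generally requires the agent to accept arguments. The function name `sample_nrpe_section` is hypothetical.

```shell
# Print an assumed [NRPE] section to stdout; nothing is written to disk.
sample_nrpe_section() {
    cat <<'EOF'
[NRPE]
; port the NRPE listener binds to (the check_nrpe default)
port=5666
; assumption: needed so that arguments such as warn=80 crit=90 are accepted
allow_arguments=1
; keep shell metacharacters in arguments rejected
allow_nasty_meta_chars=0
; check_nrpe uses SSL unless it is called with -n
use_ssl=1
EOF
}

sample_nrpe_section
```

Allowing arguments widens what a remote caller can ask the agent to do, so it is worth pairing with a tight allowed hosts setting.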
2. Restart the NSClient++ service
<a href="http://s3.51cto.com/wyfs02/M00/4A/EE/wKioL1Qna8GTPNRKAASz4_IEazg262.jpg" target="_blank"></a>
3. Test the NRPE command from the Nagios server
<code>[root@Nagios libexec]# ./check_nrpe -h</code>
<code>NRPE Plugin for Nagios</code>
<code>Copyright (c) 1999-2008 Ethan Galstad ([email protected])</code>
<code>Version: 2.15</code>
<code>Last Modified: 09-06-2013</code>
<code>License: GPL v2 with exemptions (-l for more info)</code>
<code>SSL/TLS Available: Anonymous DH Mode, OpenSSL 0.9.6 or higher required</code>
<code>Usage: check_nrpe -H <host> [ -b <bindaddr> ] [-4] [-6] [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]</code>
<code> -n = Do no use SSL</code>
<code> -u = Make socket timeouts return an UNKNOWN state instead of CRITICAL</code>
<code> <host> = The address of the host running the NRPE daemon</code>
<code> <bindaddr> = bind to local address</code>
<code> -4 = user ipv4 only</code>
<code> -6 = user ipv6 only</code>
<code> [port] = The port on which the daemon is running (default=5666)</code>
<code> [timeout] = Number of seconds before connection times out (default=10)</code>
<code> [command] = The name of the command that the remote daemon should run</code>
<code> [arglist] = Optional arguments that should be passed to the command. Multiple</code>
<code> arguments should be separated by a space. If provided, this must be</code>
<code> the last option supplied on the command line.</code>
<code>Note:</code>
<code>This plugin requires that you have the NRPE daemon running on the remote host.</code>
<code>You must also have configured the daemon to associate a specific plugin command</code>
<code>with the [command] option you are specifying here. Upon receipt of the</code>
<code>[command] argument, the NRPE daemon will run the appropriate plugin command and</code>
<code>send the plugin output and return code back to *this* plugin. This allows you</code>
<code>to execute plugins on remote hosts and 'fake' the results to make Nagios think</code>
<code>the plugin is being run locally.</code>
<code>check_nrpe syntax:</code>
<code>check_nrpe ... -c <command> [-a <argument> <argument> <argument>]</code>
<code>Built-in commands available through check_nrpe:</code>
<code>· CheckAlwaysCRITICAL (check)</code>
<code>· CheckAlwaysOK (check)</code>
<code>· CheckAlwaysWARNING (check)</code>
<code>· CheckCPU (check)</code>
<code>· CheckCRITICAL (check)</code>
<code>· CheckCounter (check)</code>
<code>· CheckEventLog/CheckEventLog (check)</code>
<code>· CheckFile (check)</code>
<code>· CheckFileSize (check)</code>
<code>· CheckMem (check)</code>
<code>· CheckMultiple (check)</code>
<code>· CheckOK (check)</code>
<code>· CheckProcState (check)</code>
<code>· CheckServiceState (check)</code>
<code>· CheckTaskSched/CheckTaskSched (check)</code>
<code>· CheckUpTime (check)</code>
<code>· CheckVersion (check)</code>
<code>· CheckWARNING (check)</code>
<code>· CheckWMI/CheckWMI (check)</code>
<code>· CheckWMIValue (check)</code>
<code>[root@Nagios libexec]# ./check_nrpe -H 192.168.1.142 -p 5666 -c CheckCPU -a warn=80 crit=90 time=20m time=10s time=4</code>
<code>OK CPU Load ok.|'20m'=0%;80;90 '10s'=0%;80;90 '4'=0%;80;90</code>
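The output line above has two parts: the human-readable status text before the `|`, and performance data after it in the form `'label'=value;warn;crit`. The sketch below splits the sample line with plain POSIX shell parameter expansion. The function name `parse_perfdata` is hypothetical, and the sketch assumes labels contain no spaces, which holds for this output but not for perfdata in general.

```shell
# Sample line copied from the CheckCPU run above.
output="OK CPU Load ok.|'20m'=0%;80;90 '10s'=0%;80;90 '4'=0%;80;90"

status_text=${output%%|*}     # text before the first pipe
perfdata=${output#*|}         # performance data after the first pipe

# Hypothetical helper: print one line per perfdata item.
parse_perfdata() {
    for item in $1; do                 # items are separated by spaces
        label=${item%%=*}              # '20m' (still quoted)
        label=${label#\'}              # drop the leading quote
        label=${label%\'}              # drop the trailing quote
        rest=${item#*=}                # 0%;80;90
        value=${rest%%;*}              # 0%
        rest=${rest#*;}                # 80;90
        warn=${rest%%;*}               # 80
        crit=${rest#*;}                # 90
        echo "$label value=$value warn=$warn crit=$crit"
    done
}

echo "$status_text"
parse_perfdata "$perfdata"
```

The three items correspond to the three `time=` arguments in the request, each reported with the same warning and critical thresholds.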
This article is reposted from zys467754239's 51CTO blog. Original link: http://blog.51cto.com/467754239/1558861. To repost it, please contact the original author.