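Zabbix suddenly started reporting: Zabbix agent on {HOST.NAME} is unreachable for 5 minutes
A large number of hosts raised this same alert.
I logged in to the monitored hosts and everything looked normal, including the network.
The zabbix agent logs showed nothing unusual.
The zabbix server log, however, was mostly filled with messages like these: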
21567:20131203:141448.893 [Z3005] query failed: [1205] Lock wait timeout exceeded; try restarting transaction [update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid']
zabbix_server [21567]: ERROR [file:db.c,line:1501] Something impossible has just happened.
update triggers set lastchange=1386049597,value=1 where triggerid=14912;
update ids set nextid=nextid+1 where nodeid=0 and table_name='events' and field_name='eventid'
delete from escalations where escalationid between 364655 and 364665
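
When the server log is full of such entries, it can help to count which tables the failing statements touch. The following is a minimal sketch, assuming the log lines follow the `PID:YYYYMMDD:HHMMSS.mmm [Z3005] query failed: [errno] message [sql]` layout seen above; the names `LOG_LINE` and `lock_wait_tables` are made up for this illustration and are not part of Zabbix.

```python
import re
from collections import Counter

# Assumed layout, taken from the log excerpt above:
# PID:YYYYMMDD:HHMMSS.mmm [Z3005] query failed: [errno] message [sql]
LOG_LINE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+"
    r"\[Z3005\] query failed: \[(?P<errno>\d+)\] (?P<msg>.*?) \[(?P<sql>.*)\]\s*$"
)

# Picks the table name out of a simple UPDATE / DELETE FROM / INSERT INTO statement.
TABLE = re.compile(r"\s*(?:update|delete\s+from|insert\s+into)\s+(\w+)", re.I)


def lock_wait_tables(lines):
    """Count MySQL error 1205 (lock wait timeout) failures per target table."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("errno") != "1205":
            continue  # not a "query failed" line, or a different MySQL error
        t = TABLE.match(m.group("sql"))
        counts[t.group(1) if t else "?"] += 1
    return counts
```

Feeding it the lines of the server log (for example `lock_wait_tables(open(path))`, where `path` is wherever your zabbix_server log lives) gives a quick view of which tables are contended.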
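Logging in to MySQL showed that the SQL statements from the error log were all stuck in a lock wait state, and disk I/O utilization was at 100%.
So this looked like an I/O problem. This statement is the giveaway: update triggers set lastchange=1386049597,value=1 where triggerid=14912; The trigger state was never updated in the database.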
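
The post does not say how the 100% figure was measured. As a rough sketch of one way to confirm it on a Linux database host, the snippet below computes device utilization from two snapshots of `/proc/diskstats`, using the "milliseconds spent doing I/O" counter (the tenth statistic after the device name). The function names, the device name `sda`, and the sampling interval are assumptions for illustration only.

```python
def io_ticks(diskstats_text, device):
    """Return the 'milliseconds spent doing I/O' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then the per-device statistics
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def utilization(before, after, device, interval_s):
    """Percent of the interval the device was busy, capped at 100."""
    busy_ms = io_ticks(after, device) - io_ticks(before, device)
    return min(100.0, 100.0 * busy_ms / (interval_s * 1000.0))
```

In practice you would read `/proc/diskstats` once, sleep for `interval_s` seconds, read it again, and pass both texts in. A value that stays near 100 while the lock waits pile up points at the disk rather than at zabbix itself.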
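The I/O was high because other machines were copying data to this server with scp.
Once the scp transfers finished, zabbix returned to normal. I had originally assumed this was a zabbix bug; what remains now is to optimize the database and adjust the SQL statements zabbix issues.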
This article is reposted from 位鵬飛's 51CTO blog. Original link: http://blog.51cto.com/weipengfei/1335388. To repost it, please contact the original author.