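This morning I received an alert from Observium that one of our spine switches was down. That seemed odd: if the switch were really down, other things would have raised alarms too.
It turned out the switch was up and running, but after more than 2 years of uptime it had decided to stop responding to SNMP.
So I did the usual: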
no snmp-server
snmp-server community blah ro
snmp-server chassis-id <switch hostname>
snmp-server contact <blah@company.com>
Then I checked the log to make sure the agent had started:
<switch hostname>#sh log | i %SNMP-5
2y39w: %SNMP-5-WARMSTART: SNMP agent on host <switch hostname> is undergoing a warm start
That looks fine, but SNMP still isn't responding, and the SNMP counters don't change between runs of the command below:
<switch hostname>#sh snmp
Chassis: <switch hostname>
Contact: <blah@company.com>
Location: <Switch Location>
121127521 SNMP packets input
0 Bad SNMP version errors
2 Unknown community name
0 Illegal operation for community name supplied
0 Encoding errors
52622307 Number of requested variables
0 Number of altered variables
12869109 Get-request PDUs
160 Get-next PDUs
0 Set-request PDUs
121127516 SNMP packets output
0 Too big errors (Maximum packet size 1500)
0 No such name errors
0 Bad values errors
0 General errors
121127473 Response PDUs
0 Trap PDUs
SNMP global trap: disabled
SNMP logging: disabled
SNMP agent enabled
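To confirm the counters really are frozen rather than eyeballing them, two captures of `sh snmp` taken a minute or so apart can be diffed. Here is a minimal Python sketch, assuming the output format shown above; `parse_snmp_counters` and `changed_counters` are hypothetical helper names, not part of any Cisco or Observium tooling.

```python
import re


def parse_snmp_counters(text):
    """Map each counter label in `sh snmp` output to its integer value."""
    counters = {}
    for line in text.splitlines():
        # Counter lines look like "<number> <label>", e.g. "160 Get-next PDUs".
        # Lines such as "SNMP agent enabled" have no leading number and are skipped.
        match = re.match(r"\s*(\d+)\s+(.+?)\s*$", line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def changed_counters(before, after):
    """Return {label: (old, new)} for counters that differ between two captures."""
    old = parse_snmp_counters(before)
    new = parse_snmp_counters(after)
    return {
        label: (old[label], new[label])
        for label in old
        if label in new and old[label] != new[label]
    }
```

If `changed_counters` returns an empty dict while the poller is actively querying the switch, the agent is not counting the incoming requests at all, which matches what I am seeing.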
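How can I get SNMP working again without rebooting this switch?
Update: last night the switch died and fell off the face of the earth. This morning I went to the data center to troubleshoot, and the console was full of the following messages. It looks like we may have a network loop somewhere that is causing memory fragmentation in the I/O pool. More troubleshooting needed...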
2y39w: %SYS-2-MALLOCFAIL: Memory allocation of 1696 bytes failed from 0x26C184, alignment 8
Pool: I/O Free: 7432 Cause: Memory fragmentation
Alternate Pool: None Free: 0 Cause: No Alternate pool
-Process= "Pool Manager", ipl= 0, pid= 5
-Traceback= 8A3600 BE91B4 BECE60 26C188 270A1C 270B70 758DA8 752FEC
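When the console is flooded with these messages, it can help to pull the key fields out of each one. Below is a rough Python sketch, assuming the message layout shown above; `parse_mallocfail` is a hypothetical helper, and the `free_exceeds_request` flag is only my own heuristic, not something IOS reports.

```python
import re


def parse_mallocfail(message):
    """Extract requested size, pool, free bytes, and cause from a MALLOCFAIL message.

    Returns None if the text does not look like a MALLOCFAIL message.
    """
    size = re.search(r"Memory allocation of (\d+) bytes failed", message)
    # Anchor at line start so the "Alternate Pool:" line is not picked up.
    pool = re.search(
        r"^\s*Pool:\s*(.+?)\s+Free:\s*(\d+)\s+Cause:\s*(.+?)\s*$",
        message,
        re.MULTILINE,
    )
    if not size or not pool:
        return None
    requested = int(size.group(1))
    free = int(pool.group(2))
    return {
        "requested": requested,
        "pool": pool.group(1),
        "free": free,
        "cause": pool.group(3),
        # Heuristic (assumption): free space larger than the request suggests
        # no single contiguous block was big enough, i.e. fragmentation.
        "free_exceeds_request": free >= requested,
    }
```

For the message above this gives a 1696-byte request against 7432 free bytes in the I/O pool, so there was free memory but apparently no contiguous block large enough, which is consistent with the reported cause.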