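We are using a Cisco ASA 5585 in layer 2 transparent mode. The configuration is just two 10GE links between our business partner DMZ and our inside network. A simple map looks like this.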
10.4.2.9/30                    10.4.2.10/30
core01-----------ASA1----------dmzsw
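The ASA runs 8.2(4) on an SSP20. The switches are 6500 Sup2T running 12.2. There are no packet drops on any switch or ASA interface!! Our maximum traffic between the switches is about 1.8Gbps, and the CPU load on the ASA is very low.
We have a strange problem. Our NMS admin sees very bad packet loss that started sometime in June. The traffic through the firewall has stayed constant, but the packet loss is growing very fast and we don't know why. These are the Nagios ping failures we see through the firewall. Nagios sends 10 pings to every server. Some of the failures lose all 10 pings, others lose only some of them.
The strange thing is that if we use mtr from the Nagios server, the packet loss is not very bad.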
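A quick sanity check on the Nagios symptom: if packets were lost independently at a low, steady rate, a check that loses all 10 pings should almost never happen. The sketch below assumes independent per-packet loss and an illustrative rate of 1.4% (the function name and the rate are assumptions chosen for the example, not measurements of the fault).

```python
from math import comb

def prob_lost(k, n=10, p=0.014):
    """Probability that exactly k of n pings are lost,
    assuming each ping is lost independently with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance that a single 10-ping check loses every ping.
p_all = prob_lost(10)
# Chance that a single 10-ping check loses at least one ping.
p_any = 1 - prob_lost(0)

print(f"P(all 10 lost)       = {p_all:.3e}")
print(f"P(at least one lost) = {p_any:.3f}")
```

Under these assumptions, losing all 10 pings is vanishingly unlikely, so checks that lose everything point toward loss arriving in bursts rather than evenly over time.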
My traceroute [v0.75]
nagios (0.0.0.0) Fri Jul 19 03:43:38 2013
Keys: Help Display mode Restart statistics Order of fields quit
                        Packets               Pings
 Host                 Loss%   Snt  Drop   Last  Best   Avg   Wrst  StDev
 1. 10.4.61.1          0.0%  1246     0    0.4   0.3   0.3   19.7    1.2
 2. 10.4.62.109        0.0%  1246     0    0.2   0.2   0.2    4.0    0.4
 3. 10.4.62.105        0.0%  1246     0    0.4   0.4   0.4    3.6    0.4
 4. 10.4.62.37         0.0%  1246     0    0.5   0.4   0.7   11.2    1.7
 5. 10.4.2.9           1.3%  1246    16    0.8   0.5   2.1   64.8    7.9
 6. 10.4.2.10          1.4%  1246    17    0.9   0.5   3.5  102.4   11.2
 7. dmz-server         1.1%  1246    13    0.6   0.5   0.6    1.6    0.2
When we ping between the switches, we don't lose many packets, but it's obvious that the problem starts somewhere between the switches.
core01#ping ip 10.4.2.10 repeat 500000
Type escape sequence to abort.
Sending 500000, 100-byte ICMP Echos to 10.4.2.10, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Success rate is 99 percent (499993/500000), round-trip min/avg/max = 1/2/6 ms
core01#
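The summary line reports 99 percent, but the counters in parentheses give a much more precise figure. A minimal sketch that pulls the received/sent counts out of that line and computes the loss percentage (the regular expression is an assumption fitted to this one line of output, not a general IOS parser):

```python
import re

line = ("Success rate is 99 percent (499993/500000), "
        "round-trip min/avg/max = 1/2/6 ms")

# The parenthesised pair is received/sent.
match = re.search(r"\((\d+)/(\d+)\)", line)
received, sent = int(match.group(1)), int(match.group(2))

lost = sent - received
loss_pct = 100 * lost / sent

print(f"lost {lost} of {sent} pings ({loss_pct:.4f}% loss)")
```

That is 7 lost pings out of 500000, far below the 1.3% to 1.4% that mtr reports for the same two hops.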
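How can we have so many ping failures and no packet drops on the interfaces? How can we find where the problem is? Cisco TAC is going in circles on this problem; they keep asking for show tech from so many different switches, when it's obvious that the problem is between core01 and dmzsw. Can someone help?
Update July 30, 2013
Thanks to everyone for helping me find the problem. It was a misbehaving application that sent a lot of small UDP packets in bursts of about 10 seconds at a time. These packets were denied by the firewall. It looks like my manager wants to upgrade our ASA so we don't have this problem again.
More information
From questions in the comments: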
ASA1# show inter detail | i ^Interface|overrun|error
Interface GigabitEthernet0/0 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/1 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/2 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/3 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/4 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/5 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/6 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface GigabitEthernet0/7 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface Internal-Data0/0 "", is up, line protocol is up
2749335943 input errors, 0 CRC, 0 frame, 2749335943 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
RX[00]: 156069204310 packets, 163645512578698 bytes, 0 overrun
RX[01]: 185159126458 packets, 158490838915492 bytes, 0 overrun
RX[02]: 192344159588 packets, 197697754050449 bytes, 0 overrun
RX[03]: 173424274918 packets, 196867236520065 bytes, 0 overrun
Interface Internal-Data1/0 "", is up, line protocol is up
26018909182 input errors, 0 CRC, 0 frame, 26018909182 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
RX[00]: 194156313803 packets, 189678575554505 bytes, 0 overrun
RX[01]: 192391527307 packets, 184778551590859 bytes, 0 overrun
RX[02]: 167721770147 packets, 179416353050126 bytes, 0 overrun
RX[03]: 185952056923 packets, 205988089145913 bytes, 0 overrun
Interface Management0/0 "Mgmt", is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface Management0/1 "", is administratively down, line protocol is down
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface TenGigabitEthernet0/8 "Inside", is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
Interface TenGigabitEthernet0/9 "DMZ", is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
0 output errors, 0 collisions, 0 interface resets
ASA1#
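The only non-zero error counters in this output are the overruns on the two Internal-Data interfaces, which may be why the external interface counters look clean while pings still fail. To put those counters in proportion, here is a rough sketch that parses a trimmed copy of the output above and reports overruns relative to the packets counted on the RX rings. The function name and the choice of ratio are assumptions made for illustration; the counters are totals since they were last cleared, so this is an average and says nothing about how bursty the overruns were.

```python
import re

# Trimmed copy of the relevant lines from the output above.
SAMPLE = """\
Interface Internal-Data0/0 "", is up, line protocol is up
2749335943 input errors, 0 CRC, 0 frame, 2749335943 overrun, 0 ignored, 0 abort
RX[00]: 156069204310 packets, 163645512578698 bytes, 0 overrun
RX[01]: 185159126458 packets, 158490838915492 bytes, 0 overrun
RX[02]: 192344159588 packets, 197697754050449 bytes, 0 overrun
RX[03]: 173424274918 packets, 196867236520065 bytes, 0 overrun
Interface Internal-Data1/0 "", is up, line protocol is up
26018909182 input errors, 0 CRC, 0 frame, 26018909182 overrun, 0 ignored, 0 abort
RX[00]: 194156313803 packets, 189678575554505 bytes, 0 overrun
RX[01]: 192391527307 packets, 184778551590859 bytes, 0 overrun
RX[02]: 167721770147 packets, 179416353050126 bytes, 0 overrun
RX[03]: 185952056923 packets, 205988089145913 bytes, 0 overrun
Interface TenGigabitEthernet0/8 "Inside", is up, line protocol is up
0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
"""

def overrun_ratios(text):
    """Return {interface: overruns / RX-ring packets} for every
    interface that lists per-ring RX packet counters."""
    stats = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Interface (\S+)", line)
        if m:
            name = m.group(1)
            stats[name] = {"overrun": 0, "rx_packets": 0}
            continue
        if name is None:
            continue
        m = re.match(r"RX\[\d+\]: (\d+) packets", line)
        if m:
            stats[name]["rx_packets"] += int(m.group(1))
            continue
        m = re.search(r"input errors.* (\d+) overrun", line)
        if m:
            stats[name]["overrun"] = int(m.group(1))
    return {
        n: s["overrun"] / s["rx_packets"]
        for n, s in stats.items()
        if s["rx_packets"] > 0
    }

ratios = overrun_ratios(SAMPLE)
for iface, ratio in ratios.items():
    print(f"{iface}: {ratio:.2%} overruns per received packet")
```

Interfaces without RX ring counters, such as TenGigabitEthernet0/8 here, are skipped because there is no packet total to compare against.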