QSFP+ breakout port partially failing - possible misconfiguration?

Network Engineering, Juniper, Juniper Networks
2021-07-14 05:09:54

Network equipment: Juniper EX4600, 40GbE QSFP+ -> 4x 10GbE SFP+ DAC breakout cable, Mellanox ConnectX-2

Problem: one of the SFP+ physical links does not come up (xe-0/0/25:0 physical link is Down; xe-0/0/25:1, :2 and :3 physical link is Up), so the server attached to it has no network access.

Troubleshooting done so far, and the information collected:

  1. The NIC works:

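    • Testing the server on xe-0/0/25:1: physical link up, network access works. Testing on xe-0/0/25:0: physical link down, no network access.
    • When the server is connected to xe-0/0/25:1 (working), the NIC's link light turns green and its activity light blinks rapidly. The corresponding LED for xe-0/0/25:1 on the EX4600 lights up green.
    • When the server is connected to xe-0/0/25:0 (failing), the NIC's link light turns green but its activity light stays off. The corresponding LED for xe-0/0/25:0 on the EX4600 stays off.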
  2. JunOS network settings for xe-0/0/25:1 (working)

root> show interfaces xe-0/0/25:1 detail
Physical interface: xe-0/0/25:1, Enabled, Physical link is Up
  Interface index: 721, SNMP ifIndex: 557, Generation: 214
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 0c:86:10:3d:89:20, Hardware address: 0c:86:10:3d:89:20
  Last flapped   : 2018-07-28 02:36:09 UTC (2w3d 04:09 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :      177488100981384                    0 bps
   Output bytes  :      167345335559587                  176 bps
   Input  packets:         124325557902                    0 pps
   Output packets:         117878406643                    0 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0         117747387692             57924403
    3                                0                    0                    0
    4                                0                    0                    0
    7                                0             29892590                    0
    8                                0             54714667                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   fcoe
    4                   no-loss
    7                   network-control
    8                   mcast
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled
  MACSec statistics:
    Output
        Secure Channel Transmitted
        Protected Packets               : 0
        Encrypted Packets               : 0
        Protected Bytes                 : 0
        Encrypted Bytes                 : 0
     Input
        Secure Channel Received
        Accepted Packets                : 0
        Validated Bytes                 : 0
        Decrypted Bytes                 : 0

  Logical interface xe-0/0/25:1.0 (Index 609) (SNMP ifIndex 561)
   (Generation 196)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Traffic statistics:
     Input  bytes  :             50018849
     Output bytes  :            570590460
     Input  packets:               767221
     Output packets:              1921180
    Local statistics:
     Input  bytes  :             50018849
     Output bytes  :            570590460
     Input  packets:               767221
     Output packets:              1921180
    Transit statistics:
     Input  bytes  :                    0                    0 bps
     Output bytes  :                    0                    0 bps
     Input  packets:                    0                    0 pps
     Output packets:                    0                    0 pps
    Protocol eth-switch, MTU: 1514, Generation: 221, Route table: 5
  3. JunOS network settings for xe-0/0/25:0 (failing)
root> show interfaces xe-0/0/25:0 detail
Physical interface: xe-0/0/25:0, Enabled, Physical link is Down
  Interface index: 720, SNMP ifIndex: 556, Generation: 213
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Hold-times     : Up 0 ms, Down 0 ms
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-14 04:17:55 UTC (02:02:59 ago)
  Statistics last cleared: Never
  Traffic statistics:
   Input  bytes  :                 8943                    0 bps
   Output bytes  :               128037                    0 bps
   Input  packets:                   73                    0 pps
   Output packets:                 1069                    0 pps
   IPv6 transit statistics:
    Input  bytes  :                   0
    Output bytes  :                   0
    Input  packets:                   0
    Output packets:                   0
  Egress queues: 12 supported, 5 in use
  Queue counters:       Queued packets  Transmitted packets      Dropped packets
    0                                0                    1                    0
    3                                0                    0                    0
    4                                0                    0                    0
    7                                0                  506                    0
    8                                0                  207                    0
  Queue number:         Mapped forwarding classes
    0                   best-effort
    3                   fcoe
    4                   no-loss
    7                   network-control
    8                   mcast
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled
  MACSec statistics:
    Output
        Secure Channel Transmitted
        Protected Packets               : 0
        Encrypted Packets               : 0
        Protected Bytes                 : 0
        Encrypted Bytes                 : 0
     Input
        Secure Channel Received
        Accepted Packets                : 0
        Validated Bytes                 : 0
        Decrypted Bytes                 : 0

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
   (Generation 195)
    Flags: Device-Down SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Traffic statistics:
     Input  bytes  :                 4565
     Output bytes  :                10395
     Input  packets:                   48
     Output packets:                   35
    Local statistics:
     Input  bytes  :                 4565
     Output bytes  :                10395
     Input  packets:                   48
     Output packets:                   35
    Transit statistics:
     Input  bytes  :                    0                    0 bps
     Output bytes  :                    0                    0 bps
     Input  packets:                    0                    0 pps
     Output packets:                    0                    0 pps
    Protocol eth-switch, MTU: 1514, Generation: 220, Route table: 5

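Note: Last flapped: 2018-08-14 04:17:55 UTC (02:02:59 ago), Active alarms: LINK, Active defects: LINK. The server was rebooted about 2 hours earlier. The switch detected a device connected to the port, but the active alarms and defects still show "LINK".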
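The fields called out in the note can be pulled out of saved `show interfaces` output so the working and failing channels can be compared side by side. This is a minimal sketch in Python, assuming the output has been captured as plain text. The names `summarize_interface` and `differing_fields` are illustrative and not part of any Juniper tooling; the sample strings are excerpts copied from the two outputs above.

```python
import re

def summarize_interface(text):
    """Extract link state, alarms, defects and last-flap time from the
    plain-text output of `show interfaces <name> [detail]`."""
    patterns = {
        "name": r"Physical interface:\s*([^,\s]+)",
        "link": r"Physical link is (\w+)",
        "alarms": r"Active alarms\s*:\s*(.+)",
        "defects": r"Active defects\s*:\s*(.+)",
        "last_flapped": r"Last flapped\s*:\s*(.+)",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        # A missing field is reported as None rather than raising.
        summary[key] = match.group(1).strip() if match else None
    return summary

def differing_fields(a, b):
    """Return the sorted keys whose values differ between two summaries."""
    return sorted(k for k in a if a[k] != b.get(k))

# Excerpts copied from the outputs in the question.
FAILING = """\
Physical interface: xe-0/0/25:0, Enabled, Physical link is Down
  Last flapped   : 2018-08-14 04:17:55 UTC (02:02:59 ago)
  Active alarms  : LINK
  Active defects : LINK
"""
WORKING = """\
Physical interface: xe-0/0/25:1, Enabled, Physical link is Up
  Last flapped   : 2018-07-28 02:36:09 UTC (2w3d 04:09 ago)
  Active alarms  : None
  Active defects : None
"""

if __name__ == "__main__":
    bad = summarize_interface(FAILING)
    good = summarize_interface(WORKING)
    for field in differing_fields(bad, good):
        print(f"{field}: {bad[field]!r} vs {good[field]!r}")
```

The regular expressions only match the handful of fields discussed here; other Junos releases may format the output differently, so treat the patterns as a starting point.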

  4. Spanning tree state: blocking (shown as Discarding)
root> show ethernet-switching interface xe-0/0/25:0
Routing Instance Name : default-switch
Logical Interface flags (DL - disable learning, AD - packet action drop,
                         LH - MAC limit hit, DN - interface down,
                         SCTL - shutdown by Storm-control,
                         MMAS - Mac-move action shutdown ) 

Logical          Vlan          TAG     MAC         STP         Logical           Tagging 
interface        members               limit       state       interface flags  
xe-0/0/25:0.0                          294912                   DN                untagged   
                 default       1       294912      Discarding                     untagged

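Update:

  1. Further debugging revealed that the problem lies between two specific NICs (Mellanox ConnectX-2) that cannot both be connected to the switch at the same time. Each of them works perfectly with the switch on its own. The two NICs have different MAC addresses. More than 20 other Mellanox ConnectX-3 10GbE NICs run happily at the same time on the same switch.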

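Update:

The problem was temporarily resolved by disabling RSTP on the affected port, deleting the sub-interface, and re-adding it. However, the physical link sometimes stays down after the server reboots. The interface then has to be administratively disabled and re-enabled to bring the physical link back up: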

root> show interfaces xe-0/0/25:0
Physical interface: xe-0/0/25:0, Enabled, Physical link is Down
  Interface index: 720, SNMP ifIndex: 556
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-16 11:14:14 UTC (16:29:48 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
    Flags: Device-Down SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 4067
    Output packets: 4827
    Protocol eth-switch, MTU: 1514

{master:0}[edit]
root# set interfaces xe-0/0/25:0 disable

{master:0}[edit]
root# commit
configuration check succeeds
fpc1:
commit complete
commit complete

{master:0}[edit]
root# run show interfaces xe-0/0/25:0
Physical interface: xe-0/0/25:0, Administratively down, Physical link is Down
  Interface index: 720, SNMP ifIndex: 556
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running Down
  Interface flags: Hardware-Down Down SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-16 11:14:14 UTC (16:38:47 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : LINK
  Active defects : LINK
  Interface transmit statistics: Disabled

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
    Flags: Device-Down SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 4067
    Output packets: 4827
    Protocol eth-switch, MTU: 1514

{master:0}[edit]
root# delete interfaces xe-0/0/25:0 disable

{master:0}[edit]
root# commit
configuration check succeeds
fpc1:
commit complete
commit complete

root# run show interfaces xe-0/0/25:0
Physical interface: xe-0/0/25:0, Enabled, Physical link is Up
  Interface index: 720, SNMP ifIndex: 556
  Link-level type: Ethernet, MTU: 1514, MRU: 0, Speed: 10Gbps, BPDU Error: None,
  MAC-REWRITE Error: None, Loopback: Disabled, Source filtering: Disabled,
  Flow control: Disabled, Media type: Fiber
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x4000
  Link flags     : None
  CoS queues     : 12 supported, 12 maximum usable queues
  Current address: 0c:86:10:3d:89:1f, Hardware address: 0c:86:10:3d:89:1f
  Last flapped   : 2018-08-17 03:53:25 UTC (00:00:16 ago)
  Input rate     : 304 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : None
  Active defects : None
  Interface transmit statistics: Disabled

  Logical interface xe-0/0/25:0.0 (Index 608) (SNMP ifIndex 558)
    Flags: Up SNMP-Traps 0x24024000 Encapsulation: Ethernet-Bridge
    Input packets : 4132
    Output packets: 4829
    Protocol eth-switch, MTU: 1514
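
The disable / commit / re-enable / commit sequence in the transcript above is easy to mistype when it has to be repeated after a server reboot. As a rough sketch, the Python helper below only builds the list of configuration-mode commands that appear in the transcript. `bounce_commands` is a hypothetical name, and actually delivering the commands to the switch (console, SSH, automation tooling) is deliberately left out.

```python
import re

# Interface names such as xe-0/0/25 or the channelized form xe-0/0/25:0.
_IFACE_RE = re.compile(r"[a-z]{2}-\d+/\d+/\d+(:\d+)?")

def bounce_commands(interface):
    """Return the configuration-mode commands used in the transcript to
    administratively disable and then re-enable a single interface."""
    # Reject anything that does not look like a plain interface name,
    # so stray text can never end up inside a generated command.
    if not _IFACE_RE.fullmatch(interface):
        raise ValueError(f"unexpected interface name: {interface!r}")
    return [
        f"set interfaces {interface} disable",
        "commit",
        f"delete interfaces {interface} disable",
        "commit",
    ]

if __name__ == "__main__":
    for line in bounce_commands("xe-0/0/25:0"):
        print(line)
```

Each `commit` is kept as a separate step because, in the transcript, the link only came back up after the second commit removed the `disable` statement.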
1 Answer

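The workaround (not a great one) is to disable RSTP, reconfigure the sub-interface, and physically disconnect and reconnect the cable several times. This was the only approach that had a high chance of getting networking up on both NICs.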

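Note: this is not a real fix, just a hack that masks the problem while temporarily allowing both NICs to connect. We ultimately solved the problem by replacing one of the NICs.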