Cisco Nexus 9396PX TX output discards on 40G interfaces without link congestion

network-engineering cisco switch ethernet qos cisco-nexus
2021-07-24 08:40:57

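We have a Cisco Nexus 9396PX with an N9K-M12PQ module (12x40G interfaces). We had an 8x10G L3 LACP bundle to our ISP, and so far everything had worked without problems. Recently, however, we migrated that LACP LAG to 3x40G links (so 120 Gbps in total).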

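As soon as we moved to the 120G LACP bundle, I started seeing output discards on all port-channel interfaces. Link utilization peaks at 50 Gbps but averages around 30 Gbps, so this is not a link congestion problem; I have plenty of free bandwidth. I thought of microbursts, but why would they start only after migrating to the 40G interfaces, when the 8x10G LACP LAG had no problems for the past year?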
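A 30-second or 5-minute average can sit far below line rate while individual short intervals exceed it; those microbursts are what fill egress buffers. The sketch below uses synthetic per-interval rates (hypothetical numbers, not measured on this switch) for one 40G member: an 8 Gbps background, roughly the per-member share of the ~26 Gbps bundle output rate, plus a 50 Gbps burst every 100th interval.

```python
# Synthetic per-interval egress rates (Gbps) on ONE 40G member link.
# Hypothetical numbers for illustration only, not captured from the switch.
LINE_RATE_GBPS = 40.0


def window_average(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]


def intervals_over(samples, limit):
    """Count intervals whose demand exceeds `limit` (the excess must be buffered or dropped)."""
    return sum(1 for s in samples if s > limit)


# 1000 intervals: 8 Gbps background with a 50 Gbps burst every 100th interval.
samples = [50.0 if i % 100 == 0 else 8.0 for i in range(1000)]

print(f"average over all intervals: {window_average(samples, 1000)[0]:.2f} Gbps")
print(f"intervals above line rate: {intervals_over(samples, LINE_RATE_GBPS)}")
```

The long-term average comes out at 8.42 Gbps, about a fifth of line rate, yet 10 intervals demand more than 40 Gbps. Even averaging over windows of 100 intervals hides every burst.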

N9K# sh int po120
port-channel120 is up
admin state is up,
  Hardware: Port-Channel, address: 88f0.31db.e5d7 (bia 6412.25ed.9047)
  Description: 120G_L3_LACP
  Internet Address is 77.211.14.XX/30
  MTU 1500 bytes, BW 120000000 Kbit, DLY 10 usec
  reliability 255/255, txload 55/255, rxload 48/255
  Encapsulation ARPA, medium is broadcast
  full-duplex, 40 Gb/s
  Input flow-control is off, output flow-control is off
  Auto-mdix is turned off
  Switchport monitor is off
  EtherType is 0x8100
  Members in this channel: Eth2/1, Eth2/2, Eth2/3
  Last clearing of "show interface" counters never
  1 interface resets
  30 seconds input rate 22940013928 bits/sec, 22332504 packets/sec
  30 seconds output rate 25888954296 bits/sec, 17780437 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 22.86 Gbps, 22.26 Mpps; output rate 25.75 Gbps, 17.69 Mpps
  RX
    6291392826509 unicast packets  24502 multicast packets  84 broadcast packets
    6291392850755 input packets  876101389840965 bytes
    0 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
  TX
    6308927523402 unicast packets  732947 multicast packets  2 broadcast packets
    6308928256067 output packets  1158946502837217 bytes
    2 jumbo packets
    0 output error  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble  11275 output discard
    0 Tx pause
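The counters above can be cross-checked with a little arithmetic. This sketch converts `txload 55/255` on the 120 Gbps bundle back into Gbps and expresses the 11275 output discards as a fraction of all output packets. Because the counters have never been cleared, the ratio is a lifetime average: it shows the drops are rare overall but says nothing about when they happen.

```python
# Values copied from the `show int po120` output above.
TXLOAD = 55                         # "txload 55/255"
BW_KBIT = 120_000_000               # "BW 120000000 Kbit"
OUTPUT_PACKETS = 6_308_928_256_067  # "6308928256067 output packets"
OUTPUT_DISCARDS = 11_275            # "11275 output discard"


def load_to_gbps(load, bw_kbit):
    """Convert a load value (x/255 of the interface bandwidth) into Gbps."""
    return load / 255 * bw_kbit / 1_000_000


def discard_ratio(discards, packets):
    """Discarded packets as a fraction of transmitted packets."""
    return discards / packets


print(f"txload ~ {load_to_gbps(TXLOAD, BW_KBIT):.1f} Gbps")
print(f"discard ratio ~ {discard_ratio(OUTPUT_DISCARDS, OUTPUT_PACKETS):.1e}")
```

The txload figure works out to about 25.9 Gbps, consistent with the 30-second output rate, and the discard ratio is roughly 1.8 discards per billion packets.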

Policy map

N9K# show policy-map interface e2/1


Global statistics status :   enabled

Ethernet2/1

  Service-policy (queuing) output:   default-out-policy

    Class-map (queuing):   c-out-q3 (match-any)
      priority level 1
      queue dropped pkts : 0
      queue depth in bytes : 0

    Class-map (queuing):   c-out-q2 (match-any)
      bandwidth remaining percent 0
      queue dropped pkts : 0
      queue depth in bytes : 0

    Class-map (queuing):   c-out-q1 (match-any)
      bandwidth remaining percent 0
      queue dropped pkts : 0
      queue depth in bytes : 0

    Class-map (queuing):   c-out-q-default (match-any)
      bandwidth remaining percent 100
      queue dropped pkts : 3795
      queue depth in bytes : 0

Buffer profile

N9K# show hardware qos ns-buffer-profile
NS Buffer Profile: Burst optimized

Queuing interface

N9K# show queuing interface e2/1

slot  1
=======


Egress Queuing for Ethernet2/1 [System]
------------------------------------------------------------------------------
QoS-Group# Bandwidth% PrioLevel                Shape                   QLimit
                                   Min          Max        Units
------------------------------------------------------------------------------
      3             -         1           -            -     -            6(D)
      2             0         -           -            -     -            6(D)
      1             0         -           -            -     -            6(D)
      0           100         -           -            -     -            6(D)

Port Egress Statistics
--------------------------------------------------------
Pause Flush Drop Pkts                              0

+-------------------------------------------------------------------+
|                              QOS GROUP 0                          |
+-------------------------------------------------------------------+
|        Tx Pkts |   2096313003372|   Dropped Pkts |            3795|
+-------------------------------------------------------------------+
|                              QOS GROUP 1                          |
+-------------------------------------------------------------------+
|        Tx Pkts |               0|   Dropped Pkts |               0|
+-------------------------------------------------------------------+
|                              QOS GROUP 2                          |
+-------------------------------------------------------------------+
|        Tx Pkts |               0|   Dropped Pkts |               0|
+-------------------------------------------------------------------+
|                              QOS GROUP 3                          |
+-------------------------------------------------------------------+
|        Tx Pkts |               0|   Dropped Pkts |               0|
+-------------------------------------------------------------------+
|                      CONTROL QOS GROUP 4                          |
+-------------------------------------------------------------------+
|        Tx Pkts |       291929094|   Dropped Pkts |               0|
+-------------------------------------------------------------------+
|                         SPAN QOS GROUP 5                          |
+-------------------------------------------------------------------+
|        Tx Pkts |               0|   Dropped Pkts |               0|
+-------------------------------------------------------------------+


Ingress Queuing for Ethernet2/1
------------------------------------------------------------------
QoS-Group#                 Pause                        QLimit
           Buff Size       Pause Th      Resume Th
------------------------------------------------------------------
      3              -            -            -           10(D)
      2              -            -            -           10(D)
      1              -            -            -           10(D)
      0              -            -            -           10(D)

PFC Statistics
----------------------------------------------------------------------------
TxPPP:                    0, RxPPP:                    0
----------------------------------------------------------------------------
 COS QOS Group        PG   TxPause   TxCount         RxPause         RxCount
   0         0         -  Inactive         0        Inactive               0
   1         0         -  Inactive         0        Inactive               0
   2         0         -  Inactive         0        Inactive               0
   3         0         -  Inactive         0        Inactive               0
   4         0         -  Inactive         0        Inactive               0
   5         0         -  Inactive         0        Inactive               0
   6         0         -  Inactive         0        Inactive               0
   7         0         -  Inactive         0        Inactive               0
----------------------------------------------------------------------------

Queuing statistics

N9K# show system internal qos queuing stats interface e2/1
Interface Ethernet2/1 statistics
Receive queues
----------------------------------------
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
Interface Ethernet2/1 statistics
Transmit queues
----------------------------------------
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented
This is not yet implemented

Update 1

Port-channel load balancing is src-dst ip-l4port:

Port Channel Load-Balancing Configuration for all modules:
Module 1:
  Non-IP: src-dst mac
  IP: src-dst ip-l4port rotate 0

I can see all 3 links sharing the traffic evenly; I don't see any difference between them.
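With `src-dst ip-l4port`, member selection is per flow: every packet with the same addresses and ports hashes to the same member link, so one fast flow is never spread across links. The sketch below illustrates the idea with `zlib.crc32` as a stand-in hash; that choice is an assumption for illustration only, not the hash the Nexus hardware uses, and `rotate` is not modeled. The flows are hypothetical.

```python
import zlib
from collections import Counter


def member_for_flow(src_ip, dst_ip, src_port, dst_port, n_members):
    """Pick a member link index from a flow's IPs and L4 ports.

    zlib.crc32 is a stand-in for illustration; the switch's real hash differs.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_members


# Hypothetical flows: one client talking to one server from many source ports.
flows = [("10.0.0.1", "192.0.2.10", 40000 + i, 443) for i in range(3000)]
per_member = Counter(member_for_flow(*flow, n_members=3) for flow in flows)
print(per_member)
```

Many distinct flows tend to spread fairly evenly, which fits the balanced per-link graphs; the same flow, however, always lands on the same member.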


1 Answer

Peak link utilization is 50Gbps

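That may be the problem. A LAG trunk only has the aggregate bandwidth of its interfaces when traffic is distributed perfectly across the port group. There are three ports in the trunk group, so the distribution is very different from that of the previous eight interfaces.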

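Usually, the source/destination IP addresses and L4 port numbers are hashed, and the hash is used to index the egress port. With three ports and completely random IP addresses/ports, it is possible for two ports to get half of the traffic (a quarter each) while the third gets the other half (or rather, a packet exits port A or port B with 25% probability each, and port C with 50%). This can happen when the hash selects among a power-of-two number of buckets, for example four, which cannot be split evenly across three ports; eight ports, by contrast, divide such a bucket space evenly.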
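The 25/25/50 split follows from mapping a small number of hash buckets onto the member links. This sketch assumes buckets are assigned to members round-robin and that traffic hashes uniformly into buckets; the bucket counts are illustrative assumptions, not the values used by the N9K-M12PQ.

```python
from collections import Counter
from fractions import Fraction


def member_shares(n_buckets, n_members):
    """Share of hash buckets (hence of uniformly hashed traffic) per member,
    assuming bucket b is assigned to member b % n_members."""
    owners = Counter(b % n_members for b in range(n_buckets))
    return {m: Fraction(owners[m], n_buckets) for m in range(n_members)}


print(member_shares(4, 3))  # member 0 gets 1/2, members 1 and 2 get 1/4 each
print(member_shares(8, 8))  # every member gets 1/8
```

With a larger bucket space the skew shrinks: `member_shares(256, 3)` gives 86/256 to one member and 85/256 to each of the other two.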

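In practice the IP/port distribution is not random, and you usually have a small number of very fast flows, so a combination of flows can exceed the bandwidth of an egress interface. You need to monitor the traffic and per-interface throughput closely to pinpoint the exact cause and work out how to avoid it.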
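The effect of a few fast flows can be estimated by brute force: enumerate every way the flows could hash onto the members and count the placements where some member's demand exceeds its line rate. The flow rates below (20, 15, 10 and 5 Gbps, summing to the 50 Gbps peak) are hypothetical, and the calculation assumes each flow lands on a member independently and uniformly, which real traffic and real hashes will not exactly do.

```python
from itertools import product


def overload_probability(flow_rates_gbps, n_members, link_gbps):
    """Fraction of equally likely flow-to-member placements in which at least
    one member's total demand exceeds link_gbps.

    Assumes each flow is hashed independently and uniformly onto a member.
    """
    overloaded = 0
    placements = 0
    for assignment in product(range(n_members), repeat=len(flow_rates_gbps)):
        load = [0.0] * n_members
        for rate, member in zip(flow_rates_gbps, assignment):
            load[member] += rate
        placements += 1
        if max(load) > link_gbps:
            overloaded += 1
    return overloaded / placements


# Hypothetical fast flows summing to the 50 Gbps peak, on 3 x 40G members.
p = overload_probability([20, 15, 10, 5], n_members=3, link_gbps=40)
print(f"{p:.3f}")  # 0.111, i.e. 9 of 81 placements
```

Even though 50 Gbps is far below the 120 Gbps bundle, about one placement in nine puts the 20, 15 and 10 Gbps flows on the same 40G member and oversubscribes it.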