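I believe I have a bad Cisco Nexus N9K-C92160YC-X.
I received this switch from a customer with a blank configuration. I recently learned that it is out of warranty and not covered by any service contract.
It booted fine the first time I powered it on. I gave interface VLAN 1 an IP address and decided to update the firmware, so I uploaded the nxos.9.3.3.bin image to bootflash and ran the "show install all impact" command to verify that the install would succeed. Everything in the report looked good.
I also applied the following commands: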
diagnostic monitor interval module 1 test PrimaryBootROM hour 23 min 59 second 59
diagnostic monitor interval module 1 test SecondaryBootROM hour 23 min 59 second 59
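These are meant to avoid the potential issue described in Cisco bug CSCvk30831. Details on this bug can be found here: https://www.cisco.com/c/en/us/support/docs/field-notices/703/fn70320.html
We have already run into this bug on several other 9Ks, so I wanted to avoid it before upgrading.
I never actually ran the install. I powered the switch off over the weekend and planned to run the install on Monday. On Monday I powered it on, and now it won't boot. The switch is stuck in a constant power-cycle loop. During boot I can see it kernel panic, which appears to be caused by a memory error.
Here is the output of the boot sequence: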
CISCO SWITCH Ver7.59
Device detected on 0:6:0 after 0 msecs
Device detected on 0:1:2 after 0 msecs
Device detected on 0:1:1 after 0 msecs
Device detected on 0:1:0 after 0 msecs
MCFrequency 1333Mhz
Adjusting Clock synthesizer
CLK AFT: 0: ff
CLK AFT: 1: 9e
CLK AFT: 2: 3f
CLK AFT: 3: 75
CLK AFT: 4: 3
CLK AFT: 5: 7
CLK AFT: 6: 13
CLK AFT: 7: 1
CLK AFT: 8: a
CLK AFT: 9: 46
Relocated to memory
Time: 6/1/2020 14:56:17
Pre-Reserving Memory Bar of size 8000000 for root-port B0|D1C|F0 7ffffff 4
Detected CISCO IOFPGA
MIFPGA Present
Code Signing Results: 0x0
Using Upgrade FPGA
Checking and setting PSU fan directions
Booting from Primary Bios
FPGA Revison : 0x17
FPGA ID : 0x1505787
FPGA Date : 0x20161121
Power Debug Register: 0x0
Reset Cause Register: 0x80000000
Boot Ctrl Register : 0xe0ff
FPGA Update Status : 0x20
Detected CISCO MIFPGA
FPGA Update Status : 0x20
Version 2.16.1240. Copyright (C) 2013 American Megatrends, Inc.
Board type 2
IOFPGA @ 0xc8000000
SLOT_ID @ 0xf
Standalone chassis
check_bootmode: grub: Continue grub
Trying to read config file /boot/grub/menu.lst.local from (hd0,4)
Filesystem type is ext2fs, partition type 0x83
Booting bootflash:/nxos.7.0.3.I3.1.bin ...
Booting bootflash:/nxos.7.0.3.I3.1.bin
Trying diskboot
Filesystem type is ext2fs, partition type 0x83
Image valid
Image Signature verification was Successful.
Boot Time: 6/1/2020 14:56:50
Unprotecting eUSB ...
INIT: version 2.88 booting
Unprotecting eUSB ...
Unsquashing rootfs ...
Loading IGB driver ...
Installing SSE module ... done
Creating the sse device node ... done
Loading I2C driver ...
Installing CCTRL driver for card_type 33 ...
CCTRL driver for card_index 21125 ...
Micron_M500IT_MT
Checking SSD firmware ...
Model Number: Micron_M500IT_MTFDDAT064SBD
Serial Number: MSA2210036X
Firmware Revision: MU01.00
Checking all filesystems.......
Installing default sprom values ...
done.Configuring network ...
Installing LC netdev ...
Installing veobc ...
Installing OBFL driver ...
mounting plog for N9k!
invalid group file entry
delete line 'aaa-db-operator:508:'? No
grpck: no changes
..done Mon Jun 1 14:57:08 UTC 2020
tune2fs 1.42.1 (17-Feb-2012)
Setting reserved blocks percentage to 0% (0 blocks)
Starting portmap daemon...
creating NFS state directory: done
starting 8 nfsd kernel threads: done
starting mountd: done
starting statd: done
Saving image for img-sync ...
Loading system software
Installing local RPMS
Patch Repository Setup completed successfully
dealing with default shell..
file /proc/cmdline found, look for shell
unset shelltype, nothing to do..
user add file found..edit it
Uncompressing system image: Mon Jun 1 14:57:14 UTC 2020
blogger: nothing to do.
..done Mon Jun 1 14:57:14 UTC 2020
Creating /dev/mcelog
Starting mcelog daemon
Overwriting dme stub lib
Replaced dme stub lib
INIT: Entering runlevel: 3
Running S93thirdparty-script...
Populating conf files for hybrid sysmgr ...
Starting hybrid sysmgr ...
[ 33.846766] [1591023444] NMI: PCI system error (SERR) for reason b1 on CPU 0.
[ 33.931884] [1591023444] Memory ERR Staus 0x3
[ 33.983795] [1591023444] pci 0000:00:00.0: Memory ERR Staus 0x3
[ 34.054412] [1591023445] Channel 0 ECC Regs 0x28fa0003 0x200fff1
[ 34.127105] [1591023445] Channel 1 ECC Regs 0x0 0x0
[ 34.186301] [1591023445] ***Channel 0: Un-Correctable mutiple-bit error ***
[ 34.269391] [1591023445] Kernel panic - not syncing: ***Channel 2: Un-Correctable mutiple-bit error ***
[ 34.269393] [1591023445]
[ 34.412682] [1591023445] Pid: 0, comm: swapper/0 Tainted: P O 3.4.43-WR5.0.1.13_standard #1
[ 34.522767] [1591023445] Call Trace:
[ 34.565338] [1591023445] <NMI> [<ffffffff816b13d9>] panic+0xfb/0x23d
[ 34.643228] [1591023445] [<ffffffff8101faad>] host_bridge_memory_errors_reporting+0x47d/0x510
[ 34.746041] [1591023445] [<ffffffff816bb9f0>] ? do_nmi+0x190/0x4e0
[ 34.820812] [1591023445] [<ffffffff816b1585>] ? printk+0x6a/0x83
[ 34.893502] [1591023445] [<ffffffff816bb9f9>] do_nmi+0x199/0x4e0
[ 34.966199] [1591023445] [<ffffffff816bad6c>] end_repeat_nmi+0x1a/0x1e
[ 35.045125] [1591023446] [<ffffffff81305376>] ? intel_idle+0xb6/0xf0
[ 35.121969] [1591023446] [<ffffffff81305376>] ? intel_idle+0xb6/0xf0
[ 35.198816] [1591023446] [<ffffffff81305376>] ? intel_idle+0xb6/0xf0
[ 35.275658] [1591023446] <<EOE>> [<ffffffff81522aaf>] cpuidle_enter_state+0x4f/0xe0
[ 35.369125] [1591023446] [<ffffffff81522c69>] cpuidle_idle_call+0x129/0x220
[ 35.453246] [1591023446] [<ffffffff8100b35f>] cpu_idle+0x7f/0xb0
[ 35.525939] [1591023446] [<ffffffff8168d4c9>] rest_init+0x6d/0x74
[ 35.599671] [1591023446] [<ffffffff81cfac3e>] start_kernel+0x466/0x473
[ 35.678591] [1591023446] [<ffffffff81cfa54f>] ? repair_env_string+0x5a/0x5a
[ 35.762707] [1591023446] [<ffffffff81cfa32a>] x86_64_start_reservations+0x131/0x135
[ 35.855133] [1591023446] [<ffffffff81cfa140>] ? early_idt_handlers+0x140/0x140
[ 35.942368] [1591023446] [<ffffffff81cfa430>] x86_64_start_kernel+0x102/0x111
[ 36.028560] [1591023447] Dumping interrupt statistics
[ 36.088785] [1591023447] CPU0 CPU1 CPU2 CPU3 intrs/last_sec max_intrs/sec
[ 36.210285] [1591023447] 0: 57 0 0 0 57 57 IO-APIC-edge timer
[ 36.341130] [1591023447] 4: 134 0 0 0 10 28 IO-APIC-edge serial
[ 36.473017] [1591023447] 8: 1 0 0 0 0 1 IO-APIC-edge rtc0
[ 36.602823] [1591023447] 9: 0 0 0 0 0 0 IO-APIC-fasteoi acpi
[ 36.732637] [1591023447] 23: 26 0 0 0 25 25 IO-APIC-fasteoi ehci_hcd:usb1
[ 36.871790] [1591023447] 40: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.005753] [1591023447] 41: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.139716] [1591023448] 42: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.273679] [1591023448] 43: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.407644] [1591023448] 44: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.541607] [1591023448] 45: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.675569] [1591023448] 46: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.809533] [1591023448] 47: 0 0 0 0 0 0 PCI-MSI-edge PCIe PME
[ 37.943498] [1591023448] 48: 3488 0 0 0 0 1445 PCI-MSI-edge ahci
[ 38.073310] [1591023449] 58: 0 0 0 0 0 0 PCI-MSI-edge cctrl_tor3_plat_io_isr
[ 38.221809] [1591023449] 59: 0 0 0 0 0 0 PCI-MSI-edge cctrl_tor3_portlib_mi_isr
[ 38.373432] [1591023449] sending NMI to all CPUs:
[ 38.429513] [1591023449] NMI backtrace for cpu 2
[ 38.484550] [1591023449] CPU 2
[ 38.519856] [1591023449] Modules linked in: klm_procfs_init(PO) klm_i2c_stub(O) ata_piix klm_isan_kthread(PO) klm_cmos(PO) klm_ins_igb(O) klm_psdev(O) klm_pfmsvcs(PO) klm_sse(O) klm_tlv(PO) klm_mping(PO) klm_kpss(PO) klm_modlock(O) klm_sdwrap(O) klm_cctrli(PO) klm_if_index(PO) klm_vdc_mgr(O) klm_dc_sprom(O) klm_nvram(O) lc_netdev.mod(O) klm_vdc(O) klm_veobc(O) klm_obfl(O) klm_rwsem(PO) klm_pss(O) klm_aipc(PO) klm_kadb(O) klm_mts(PO) klm_mtsfilter(PO) klm_cctrli_bg(PO) klm_sup_ctrl_mc(PO) klm_rdn_dummy(PO) klm_usd(O) klm_misc(O) klm_gpl(PO) klm_lc_diag_stat(O) klm_ls_notify(PO) klm_fcfwd(PO) klm_fcoe(PO) klm_fc2(PO) klm_cisco_nb(O) klm_kfsmutils(PO) klm_sysmgr-hb(O) klm_sysmgr-hb_lc(O) klm_utaker(O) klm_kgdb(PO)
[ 39.274823] [1591023449]
[ 39.305984] [1591023449] Pid: 10117, comm: sysmgr Tainted: P O 3.4.43-WR5.0.1.13_standard #1 To be filled by O.E.M. To be filled by O.E.M./Aptio CRB
[ 39.475250] [1591023449] RIP: 0010:[<ffffffff810392de>] [<ffffffff810392de>] native_flush_tlb_others+0xce/0x110
[ 39.596757] [1591023449] RSP: 0000:ffff880451c3dbe8 EFLAGS: 00000202
[ 39.673607] [1591023449] RAX: 00000000000008d1 RBX: ffff88044094b740 RCX: 0000000000000001
[ 39.772266] [1591023449] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000286
[ 39.870920] [1591023449] RBP: ffff880451c3dc18 R08: ffff88044094b740 R09: 0000000000000000
[ 39.969575] [1591023449] R10: dfed912167b8c580 R11: 0000000000000000 R12: 0000000000000080
[ 40.068231] [1591023449] R13: ffffffff81dcd180 R14: 0000000000000002 R15: 000000000a041a7c
[ 40.166886] [1591023449] FS: 0000000000000000(0000) GS:ffff88047fd00000(0063) knlGS:00000000eca9d940
[ 40.276964] [1591023449] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
[ 40.359002] [1591023449] CR2: 000000000a041a7c CR3: 00000004515ae000 CR4: 00000000001407e0
[ 40.457659] [1591023449] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 40.556312] [1591023449] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 40.654968] [1591023449] Process sysmgr (pid: 10117, threadinfo ffff880451c3c000, task ffff88041ab0c3b0)
[ 40.768161] [1591023449] Stack:
[ 40.805542] [1591023449] ffff880451c3dbf8 ffffffff810098e9 ffff88044094b480 ffff88044094b740
[ 40.907309] [1591023449] 000000000a041a7c ffff880453eae3f0 ffff880451c3dc48 ffffffff810394ad
[ 41.009079] [1591023449] ffff88044094b480 ffff880453eae3f0 000000000a041a7c 0000000000000001
[ 41.110849] [1591023449] Call Trace:
[ 41.153438] [1591023449] [<ffffffff810098e9>] ? sched_clock+0x9/0x10
[ 41.230286] [1591023449] [<ffffffff810394ad>] flush_tlb_page+0x8d/0xa0
[ 41.309209] [1591023449] [<ffffffff8103805e>] ptep_set_access_flags+0x4e/0x70
[ 41.395404] [1591023449] [<ffffffff8111f54b>] do_wp_page+0x2fb/0x820
[ 41.472250] [1591023449] [<ffffffff811217b6>] handle_pte_fault+0xa86/0xb10
[ 41.555328] [1591023449] [<ffffffff81121bba>] handle_mm_fault+0x1da/0x200
[ 41.637369] [1591023449] [<ffffffff810cde39>] ? trace_clock_local+0x9/0x10
[ 41.720448] [1591023449] [<ffffffff810d44ef>] ? rb_reserve_next_event.isra.33+0x9f/0x300
[ 41.818067] [1591023449] [<ffffffff816be3c3>] do_page_fault+0x353/0x5d0
[ 41.898026] [1591023449] [<ffffffff810639ab>] ? queue_delayed_work+0x2b/0x30
[ 41.983181] [1591023449] [<ffffffff810639cb>] ? schedule_delayed_work+0x1b/0x20
[ 42.071453] [1591023449] [<ffffffff810d8a16>] ? trace_wake_up+0x26/0x30
[ 42.151416] [1591023449] [<ffffffff810da588>] ? trace_current_buffer_unlock_commit+0x48/0x60
[ 42.253185] [1591023449] [<ffffffff810e8884>] ? ftrace_syscall_exit+0xb4/0xd0
[ 42.339379] [1591023449] [<ffffffff8100f172>] ? syscall_trace_leave+0xb2/0x170
[ 42.426611] [1591023449] [<ffffffff811481b9>] ? sys_read+0x59/0x100
[ 42.502419] [1591023449] [<ffffffff816ba9e5>] page_fault+0x25/0x30
[ 42.577186] [1591023449] Code: c1 3b ca 00 41 8d b6 cf 00 00 00 49 8d 7d 18 ff 90 d8 00 00 00 41 8b b4 24 18 d1 dc 81 85 f6 74 0e 0f 1f 40 00 f3 90 41 8b 4d 18 <85> c9 75 f6 83 3d 7b 4e ca 00 20 49 c7 84 24 00 d1 dc 81 00 00
[ 42.816027] [1591023449] Call Trace:
[ 42.858614] [1591023449] [<ffffffff810098e9>] ? sched_clock+0x9/0x10
[ 42.935463] [1591023449] [<ffffffff810394ad>] flush_tlb_page+0x8d/0xa0
[ 43.014389] [1591023449] [<ffffffff8103805e>] ptep_set_access_flags+0x4e/0x70
[ 43.100584] [1591023449] [<ffffffff8111f54b>] do_wp_page+0x2fb/0x820
[ 43.177427] [1591023449] [<ffffffff811217b6>] handle_pte_fault+0xa86/0xb10
[ 43.260507] [1591023449] [<ffffffff81121bba>] handle_mm_fault+0x1da/0x200
[ 43.342548] [1591023449] [<ffffffff810cde39>] ? trace_clock_local+0x9/0x10
[ 43.422133] [1591023449] END: PANIC REPORT GENERATED AT 1591023449
[ 43.422139] [1591023449] CCTRL PANIC DUMP
[ 43.422140] [1591023449] =========================
[ 43.422142] [1591023449] WDT last punched at 0
[ 43.422145] [1591023449] REG(0x300) = baadbeef
[ 43.422148] [1591023449] REG(0x304) = baadbeef
[ 43.422149] [1591023449] =========================
[ 43.422152] [1591023449] pstore: Dump l1 0 l2 96741 ToDump 65512 Dumped 0
[ 43.906455] [1591023449] [<ffffffff810d44ef>] ? rb_reserve_next_event.isra.33+0x9f/0x300
[ 44.004069] [1591023449] [<ffffffff816be3c3>] do_page_fault+0x353/0x5d0
[ 44.084027] [1591023449] [<ffffffff810639ab>] ? queue_delayed_work+0x2b/0x30
[ 44.169185] [1591023449] [<ffffffff810639cb>] ? schedule_delayed_work+0x1b/0x20
[ 44.257462] [1591023449] [<ffffffff810d8a16>] ? trace_wake_up+0x26/0x30
[ 44.337424] [1591023449] [<ffffffff810da588>] ? trace_current_buffer_unlock_commit+0x48/0x60
[ 44.439196] [1591023449] [<ffffffff810e8884>] ? ftrace_syscall_exit+0xb4/0xd0
[ 44.525385] [1591023449] [<ffffffff8100f172>] ? syscall_trace_leave+0xb2/0x170
[ 44.612617] [1591023449] [<ffffffff811481b9>] ? sys_read+0x59/0x100
[ 44.688426] [1591023449] [<ffffffff816ba9e5>] page_fault+0x25/0x30
[ 44.763197] [1591023449] NMI backtrace for cpu 3
[ 44.818233] [1591023449] CPU 3
[ 44.853536] [1591023449] Modules linked in: klm_procfs_init(PO) klm_i2c_stub(O) ata_piix klm_isan_kthread(PO) klm_cmos(PO) klm_ins_igb(O) klm_psdev(O)
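The memory-related lines are easy to lose in a capture this long. Below is a minimal sketch for picking them out of a saved console log, assuming the output was captured as plain text. The function name `find_memory_errors` and the pattern list are illustrative choices based on the messages above, not part of any Cisco tool.

```python
import re

# Patterns copied from the kernel messages in the console output above.
# The list is an assumption about which lines matter; extend as needed.
PATTERNS = [
    re.compile(r"NMI: PCI system error"),
    re.compile(r"Memory ERR"),
    re.compile(r"ECC Regs"),
    re.compile(r"Un-Correctable"),
    re.compile(r"Kernel panic"),
]


def find_memory_errors(log_text):
    """Return the stripped log lines that match any pattern in PATTERNS."""
    hits = []
    for line in log_text.splitlines():
        if any(p.search(line) for p in PATTERNS):
            hits.append(line.strip())
    return hits


if __name__ == "__main__":
    # A few lines taken verbatim from the boot output above.
    sample = """\
Starting hybrid sysmgr ...
[ 33.846766] [1591023444] NMI: PCI system error (SERR) for reason b1 on CPU 0.
[ 34.054412] [1591023445] Channel 0 ECC Regs 0x28fa0003 0x200fff1
[ 34.186301] [1591023445] ***Channel 0: Un-Correctable mutiple-bit error ***
[ 36.028560] [1591023447] Dumping interrupt statistics
"""
    for hit in find_memory_errors(sample):
        print(hit)
```

Run against the full capture, this should surface the NMI, the ECC register dump for both channels, and the panic line, which is the part worth quoting when asking for help.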
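Since the device is out of warranty and not under a service contract, I don't think I can open a Cisco TAC support request. I believe at this point I'm left to my own devices to try to resolve this. Any help would be appreciated.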