Juniper SRX100 and SRX210 reset to factory defaults for no apparent reason

Network Engineering  juniper-srx
2021-07-05 18:40:55

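We have two SRX210s in an HA cluster. Both lost power unexpectedly, and when they came back up, both had lost all of their settings (much like a factory reset!). That was strange, and it took us some time to fix. (This was a month ago.)

Later (this week), another SRX100 on a completely different network (and a different continent) died with a similar problem... when I looked at the configuration, it was completely gone. I had to restore it from backup.

Has anyone seen a problem like this? These were running fairly old 10.x firmware. Is it a bug? An attack? A hardware problem?

Update: The SRX100 was replaced with a backup unit (also an SRX100). The failed unit was upgraded to the latest stable firmware and loaded with the same configuration as before. It was then set up in a test network and stress-tested for a few days... and this weekend it died again (red light on STATUS, and no traffic passing through it). A console session on the serial port was open the whole time, and this is what it captured: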

U-Boot 1.1.6-JNPR-2.7 (Build time: Nov 26 2013 - 19:04:49)                     

Initializing memory this may take some time...                             
Measured DDR clock 266.62 MHz                                                  
SRX_100_LOWMEM board revision major:0, minor:0, serial #: AT0112AF1168         
OCTEON CN5020-SCP pass 1.1, Core clock: 500 MHz, DDR clock: 266 MHz (532 Mhz d)
DRAM:  512 MB                                                                  
Starting Memory POST...                                                        
Checking datalines... OK                                                       
Checking address lines... OK                                                   
Checking 512K memory for U-Boot... OK.                                         
Running U-Boot CRC Test... OK.                                                 
Flash:  4 MB                                                                   
USB:   scanning bus for devices... 4 USB Device(s) found                       
       scanning bus for storage devices... 2 Storage Device(s) found           
Clearing DRAM....... done                                                      
BIST check passed.                                                             
Boot Media: nand-flash usb                                                     
Net:   pic init done (err = 0)octeth0                                          
POST Passed                                                                    
Press SPACE to abort autoboot in 1 seconds                                     
ELF file is 32 bit                                                             
Loading .text @ 0x8f0000a0 (246560 bytes)                                      
Loading .rodata @ 0x8f03c3c0 (14144 bytes)                                     
Loading .reginfo @ 0x8f03fb00 (24 bytes)                                       
Loading .rodata.str1.4 @ 0x8f03fb18 (16516 bytes)                              
Loading set_Xcommand_set @ 0x8f043b9c (96 bytes)                               
Loading .rodata.cst4 @ 0x8f043bfc (20 bytes)                                   
Loading .data @ 0x8f044000 (5744 bytes)                                        
Loading .data.rel.ro @ 0x8f045670 (120 bytes)                                  
Loading .data.rel @ 0x8f0456e8 (136 bytes)                                     
Clearing .bss @ 0x8f045770 (11600 bytes)                                       
## Starting application at 0x8f0000a0 ...                                      
Consoles: U-Boot console                                                       
Found compatible API, ver. 2.7                                                 

FreeBSD/MIPS U-Boot bootstrap loader, Revision 2.7                             
(ccheng@svl-junos-d081.juniper.net, Tue Nov 26 19:05:43 PST 2013)              
Memory: 512MB                                                                  
[0]Booting from nand-flash slice 2                                             
Un-Protected 1 sectors                                                         
writing to flash...                                                            
Protected 1 sectors                                                            
Loading /boot/defaults/loader.conf                                             
/kernel data=0xb0496c+0x1344a4 syms=[0x4+0x8a9e0+0x4+0xc8f47]                  


Hit [Enter] to boot immediately, or space bar for command prompt.              
Booting [/kernel]...                                                           
Kernel entry at 0x801000e0 ...                                                 
init regular console                                                           
Primary ICache: Sets 64 Size 128 Asso 4                                        
Primary DCache: Sets 1 Size 128 Asso 64                                        
Secondary DCache: Sets 128 Size 128 Asso 8                                     
GDB: debug ports: uart                                                         
GDB: current port: uart                                                        
KDB: debugger backends: ddb gdb                                                
KDB: current backend: ddb                                                      
kld_map_v: 0x8ff80000, kld_map_p: 0x0                                          
Copyright (c) 1996-2014, Juniper Networks, Inc.                                
All rights reserved.                                                           
Copyright (c) 1992-2006 The FreeBSD Project.                                   
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994       
        The Regents of the University of California. All rights reserved.      
JUNOS 12.1X44-D35.5 #0: 2014-05-19 21:36:43 UTC                                
    builder@dagmath.juniper.net:/volume/build/junos/12.1/service/12.1X44-D35.5l
JUNOS 12.1X44-D35.5 #0: 2014-05-19 21:36:43 UTC                                
    builder@dagmath.juniper.net:/volume/build/junos/12.1/service/12.1X44-D35.5l
real memory  = 536870912 (512MB)                                               
avail memory = 304193536 (290MB)                                               
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs                            
Security policy loaded: JUNOS MAC/pcap (mac_pcap)                              
Security policy loaded: JUNOS MAC/runasnonroot (mac_runasnonroot)              
netisr_init: !debug_mpsafenet, forcing maxthreads from 2 to 1                  
cpu0 on motherboard                                                            
: CAVIUM's OCTEON 5020 CPU Rev. 0.1 with no FPU implemented                    
        L1 Cache: I size 32kb(128 line), D size 8kb(128 line), sixty four way. 
        L2 Cache: Size 128kb, 8 way                                            
obio0 on motherboard                                                           
uart0: <Octeon-16550 channel 0> on obio0                                       
uart0: console (9600,n,8,1)                                                    
twsi0 on obio0                                                                 
dwc0: <Synopsis DWC OTG Controller Driver> on obio0                            
usb0: <USB Bus for DWC OTG Controller> on dwc0                                 
usb0: USB revision 2.0                                                         
uhub0: vendor 0x0000 DWC OTG root hub, class 9/0, rev 2.00/1.00, addr 1        
uhub0: 1 port with 1 removable, self powered                                   
uhub1: vendor 0x0409 product 0x005a, class 9/0, rev 2.00/1.00, addr 2          
uhub1: single transaction translator                                           
uhub1: 2 ports with 1 removable, self powered                                  
umass0: STMicroelectronics ST72682  High Speed Mode, rev 2.00/2.10, addr 3     
umass1: Kingston DT 101 G2, rev 2.00/1.00, addr 4                              
cpld0 on obio0                                                                 
pcib0: <Cavium on-chip PCI bridge> on obio0                                    
Disabling Octeon big bar support                                               
PCI Status: PCI 32-bit: 0xc041b                                                
pcib0: Initialized controller                                                  
pci0: <PCI bus> on pcib0                                                       
pci0: <serial bus, USB> at device 2.0 (no driver attached)                     
pci0: <serial bus, USB> at device 2.1 (no driver attached)                     
pci0: <serial bus, USB> at device 2.2 (no driver attached)                     
gblmem0 on obio0                                                               
octpkt0: <Octeon RGMII> on obio0                                               
cfi0: <AMD/Fujitsu - 4MB> on obio0                                             
Timecounter "mips" frequency 500000000 Hz quality 0                            
###PCB Group initialized for udppcbgroup                                       
###PCB Group initialized for tcppcbgroup                                       
da1 at umass-sim1 bus 1 target 0 lun 0                                         
da1: <Kingston DT 101 G2 PMAP> Removable Direct Access SCSI-0 device           
da1: 40.000MB/s transfers                                                      
da1: 15304MB (31342592 512 byte sectors: 255H 63S/T 1950C)                     
da0 at umass-sim0 bus 0 target 0 lun 0                                         
da0: <ST ST72682 2.10> Removable Direct Access SCSI-2 device                   
da0: 40.000MB/s transfers                                                      
da0: 1000MB (2048000 512 byte sectors: 64H 32S/T 1000C)                        
Trying to mount root from ufs:/dev/da0s2a                                      
WARNING: / was not properly dismounted                                         
Attaching /cf/packages/junos via /dev/mdctl...                                 
Mounted junos package on /dev/md0...                                           

Media check on da0                                                             
Automatic reboot in progress...                                                
** /dev/da0s2a                                                                 
** Last Mounted on /                                                           
** Root file system                                                            
** Phase 1 - Check Blocks and Sizes                                            
** Phase 2 - Check Pathnames                                                   
** Phase 3 - Check Connectivity                                                
** Phase 4 - Check Reference Counts                                            
** Phase 5 - Check Cyl groups                                                  
142 files, 75006 used, 75032 free (32 frags, 9375 blocks, 0.0% fragmentation)  

***** FILE SYSTEM MARKED CLEAN *****                                           
Verified junos signed by PackageProduction_12_1_0                              
Verified jboot signed by PackageProduction_12_1_0                              
Ignoring watchdog timeout during boot/reboot                                   
veriexec: cannot verify /packages/junos-12.1X44-D35.5-domestic.sig: ERROR: Faic
** /dev/bo0s3e                                                                 
** Last Mounted on /config                                                     
** Phase 1 - Check Blocks and Sizes                                            
** Phase 2 - Check Pathnames                                                   
** Phase 3 - Check Connectivity                                                
** Phase 4 - Check Reference Counts                                            
** Phase 5 - Check Cyl groups                                                  
19 files, 50 used, 12388 free (36 frags, 1544 blocks, 0.3% fragmentation)      

***** FILE SYSTEM MARKED CLEAN *****                                           
** /dev/bo0s3f                                                                 
** Last Mounted on /cf/var                                                     
** Phase 1 - Check Blocks and Sizes                                            
** Phase 2 - Check Pathnames                                                   
** Phase 3 - Check Connectivity                                                
** Phase 4 - Check Reference Counts                                            
** Phase 5 - Check Cyl groups                                                  
FREE BLK COUNT(S) WRONG IN SUPERBLK                                            
SALVAGE? yes                                                                   

SUMMARY INFORMATION BAD                                                        
SALVAGE? yes                                                                   

BLK(S) MISSING IN BIT MAPS                                                     
SALVAGE? yes                                                                   

637 files, 10808 used, 164510 free (254 frags, 20532 blocks, 0.1% fragmentatio)

***** FILE SYSTEM MARKED CLEAN *****                                           

***** FILE SYSTEM WAS MODIFIED *****                                           
Loading configuration ...                                                      
vn_read_compressed: inflate of bytepos 86966272, offset in file = 51491159, er}
panic: bad inflate                                                             
cpuid = 0                                                                      
KDB: stack backtrace:                                                          
SP 0: not in kernel                                                            
uart_z8530_class+0x0 (0,0,0,0) ra 0 sz 0                                       
pid 54, process: md0                                                           
###Entering boot mastership relinquish phase                                   
KDB: enter: panic                                                              
[thread pid 54 tid 100048 ]                                                    
Stopped at      breakpoint+0x4: jr      ra                                     
db>                                                                            

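Please note the following:

  1. A USB drive is plugged into the device
  2. A serial cable is plugged into the device
  3. The device is not running on a UPS, so unstable power is possible
  4. 4 different networks were created, 3 of which were monitored

Hopefully someone can shed some light on what might have gone wrong.

Update 2: Pressing the power button did nothing, but pressing and holding it for more than 6 seconds switched the unit off. When I turned it back on, it loaded the configuration normally. So, unlike the first time, the device was not wiped this time.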

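1 Answer

It sounds like there are a few different issues here...

Older versions of JunOS are known to corrupt their storage during power failures. Keep in mind that JunOS is based on FreeBSD, so there is an implicit assumption that you will perform a proper shutdown before removing power.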

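To mitigate this, JunOS has a rescue configuration. If the regular configuration is corrupt or unreadable, it will load the rescue configuration instead. Is your rescue configuration set? If not, it should be. My best practice is to update the rescue configuration on production systems after any configuration change has been completed, tested, and approved. This could explain the "revert to defaults" problem you had with the SRX210s. (A second possibility is that your cluster was not healthy and the configuration was not syncing between the nodes as expected. Check with the cluster verification commands that the cluster is working properly.)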
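To make the fallback order concrete, here is a minimal conceptual sketch in Python. It is not Juniper's code: the function name `pick_config`, the loader callables, and the assumption that the box ends up on factory defaults when neither the active nor the rescue configuration is readable are all mine (the last one is what the symptom in the question suggests). On a real device the rescue configuration is saved with the operational-mode command `request system configuration rescue save`.

```python
from typing import Callable, Optional, Sequence, Tuple

# A loader returns the configuration text, returns None if nothing is stored,
# or raises OSError if the file is corrupt or unreadable (hypothetical model).
Loader = Callable[[], Optional[str]]


def pick_config(candidates: Sequence[Tuple[str, Loader]],
                factory_default: str) -> Tuple[str, str]:
    """Return (source_name, config_text) for the first candidate that loads."""
    for name, load in candidates:
        try:
            text = load()
        except OSError:
            text = None  # treat an unreadable file the same as a missing one
        if text:
            return name, text
    # Nothing usable was found: the device comes up looking factory-fresh.
    return "factory-default", factory_default
```

With a corrupt active configuration and no rescue configuration saved, this model returns the factory default, which matches what the SRX210s appeared to do after the power loss. With a rescue configuration in place, the same failure would land on the rescue copy instead.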

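Also, it is possible to corrupt the root filesystem of an older JunOS device so badly that it won't boot at all and instead drops to a db> prompt for debugging.

In newer versions of JunOS, corruption caused by power failures has been pretty much solved. The addition of resilient dual-root partitioning helped a lot. Note that if you upgrade from an older version that predates these features, some extra bootloader and partition table changes are required during the upgrade process. http://www.juniper.net/techpubs/en_US/junos11.4/information-products/topic-collections/security/software-all/initial-config/index.html?topic-56813.html

Make sure you are running the latest recommended JunOS version on both SRXs. If you upgraded from an older version, make sure you followed the instructions and did not skip any steps, because otherwise you may have missed out on the dual-partition feature.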

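The failure you're seeing on the SRX100 looks like a root filesystem corruption problem ("panic: bad inflate" is a big clue). However, it looks like you upgraded to a JunOS version new enough that a corrupted root FS should no longer happen. Also, if you rebooted it and it magically started working again, then the built-in flash storage is dying. I would open a ticket with JTAC to get it replaced, or buy a new one.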
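If you keep serial console captures from these boxes, it can help to scan them for the same clues automatically. The following is a rough sketch, assuming the capture has been saved as plain text. The marker strings are copied from the console output in the question; the names `MARKERS` and `scan_boot_log` are made up for this example and are not part of any Juniper tool.

```python
import re
from typing import Dict, List

# Patterns taken from the console capture in the question.
MARKERS = {
    # Root filesystem was not unmounted cleanly (e.g. after power loss).
    "unclean_shutdown": re.compile(r"was not properly dismounted"),
    # fsck found damage and repaired it.
    "fsck_repair": re.compile(r"^SALVAGE\?\s+yes", re.MULTILINE),
    # Kernel panic line, e.g. "panic: bad inflate".
    "kernel_panic": re.compile(r"^panic: (.+?)\s*$", re.MULTILINE),
    # Device stopped at the kernel debugger prompt instead of booting.
    "debugger_prompt": re.compile(r"^db>\s*$", re.MULTILINE),
}


def scan_boot_log(text: str) -> Dict[str, List[str]]:
    """Return the matched text for every marker that occurs in the log."""
    findings: Dict[str, List[str]] = {}
    for name, pattern in MARKERS.items():
        hits = [m.group(0).strip() for m in pattern.finditer(text)]
        if hits:
            findings[name] = hits
    return findings
```

Reading the capture with `open(path, errors="replace").read()` and passing the text to `scan_boot_log` would flag the log above with all four markers. An empty result only means that none of these particular strings appeared, not that the flash is healthy.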