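We have two SRX210s in an HA cluster. Both lost power unexpectedly, and when they came back up, both had lost all their settings (much like a factory reset!). This was strange and took us some time to fix. (This was a month ago.)
Later (this week), another SRX100 on a completely different network (on a different continent, too) died from a similar problem... When I checked its settings, they were completely gone. I had to restore from backup.
Has anyone seen a problem like this? These were running fairly old 10.x firmware. Is this a bug? An attack? A hardware problem?
Update: The SRX100 was replaced with a spare unit (also an SRX100). The failed unit was upgraded to the latest stable firmware and loaded with the same configuration as before. It was then set up on a test network and stress-tested for a few days... This weekend it died again (red STATUS light, and no traffic passing through it). A console session on the serial port had been open the whole time, and this is what it captured: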
U-Boot 1.1.6-JNPR-2.7 (Build time: Nov 26 2013 - 19:04:49)
Initializing memory this may take some time...
Measured DDR clock 266.62 MHz
SRX_100_LOWMEM board revision major:0, minor:0, serial #: AT0112AF1168
OCTEON CN5020-SCP pass 1.1, Core clock: 500 MHz, DDR clock: 266 MHz (532 Mhz d)
DRAM: 512 MB
Starting Memory POST...
Checking datalines... OK
Checking address lines... OK
Checking 512K memory for U-Boot... OK.
Running U-Boot CRC Test... OK.
Flash: 4 MB
USB: scanning bus for devices... 4 USB Device(s) found
scanning bus for storage devices... 2 Storage Device(s) found
Clearing DRAM....... done
BIST check passed.
Boot Media: nand-flash usb
Net: pic init done (err = 0)octeth0
POST Passed
Press SPACE to abort autoboot in 1 seconds
ELF file is 32 bit
Loading .text @ 0x8f0000a0 (246560 bytes)
Loading .rodata @ 0x8f03c3c0 (14144 bytes)
Loading .reginfo @ 0x8f03fb00 (24 bytes)
Loading .rodata.str1.4 @ 0x8f03fb18 (16516 bytes)
Loading set_Xcommand_set @ 0x8f043b9c (96 bytes)
Loading .rodata.cst4 @ 0x8f043bfc (20 bytes)
Loading .data @ 0x8f044000 (5744 bytes)
Loading .data.rel.ro @ 0x8f045670 (120 bytes)
Loading .data.rel @ 0x8f0456e8 (136 bytes)
Clearing .bss @ 0x8f045770 (11600 bytes)
## Starting application at 0x8f0000a0 ...
Consoles: U-Boot console
Found compatible API, ver. 2.7
FreeBSD/MIPS U-Boot bootstrap loader, Revision 2.7
(ccheng@svl-junos-d081.juniper.net, Tue Nov 26 19:05:43 PST 2013)
Memory: 512MB
[0]Booting from nand-flash slice 2
Un-Protected 1 sectors
writing to flash...
Protected 1 sectors
Loading /boot/defaults/loader.conf
/kernel data=0xb0496c+0x1344a4 syms=[0x4+0x8a9e0+0x4+0xc8f47]
Hit [Enter] to boot immediately, or space bar for command prompt.
Booting [/kernel]...
Kernel entry at 0x801000e0 ...
init regular console
Primary ICache: Sets 64 Size 128 Asso 4
Primary DCache: Sets 1 Size 128 Asso 64
Secondary DCache: Sets 128 Size 128 Asso 8
GDB: debug ports: uart
GDB: current port: uart
KDB: debugger backends: ddb gdb
KDB: current backend: ddb
kld_map_v: 0x8ff80000, kld_map_p: 0x0
Copyright (c) 1996-2014, Juniper Networks, Inc.
All rights reserved.
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
JUNOS 12.1X44-D35.5 #0: 2014-05-19 21:36:43 UTC
builder@dagmath.juniper.net:/volume/build/junos/12.1/service/12.1X44-D35.5l
JUNOS 12.1X44-D35.5 #0: 2014-05-19 21:36:43 UTC
builder@dagmath.juniper.net:/volume/build/junos/12.1/service/12.1X44-D35.5l
real memory = 536870912 (512MB)
avail memory = 304193536 (290MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Security policy loaded: JUNOS MAC/pcap (mac_pcap)
Security policy loaded: JUNOS MAC/runasnonroot (mac_runasnonroot)
netisr_init: !debug_mpsafenet, forcing maxthreads from 2 to 1
cpu0 on motherboard
: CAVIUM's OCTEON 5020 CPU Rev. 0.1 with no FPU implemented
L1 Cache: I size 32kb(128 line), D size 8kb(128 line), sixty four way.
L2 Cache: Size 128kb, 8 way
obio0 on motherboard
uart0: <Octeon-16550 channel 0> on obio0
uart0: console (9600,n,8,1)
twsi0 on obio0
dwc0: <Synopsis DWC OTG Controller Driver> on obio0
usb0: <USB Bus for DWC OTG Controller> on dwc0
usb0: USB revision 2.0
uhub0: vendor 0x0000 DWC OTG root hub, class 9/0, rev 2.00/1.00, addr 1
uhub0: 1 port with 1 removable, self powered
uhub1: vendor 0x0409 product 0x005a, class 9/0, rev 2.00/1.00, addr 2
uhub1: single transaction translator
uhub1: 2 ports with 1 removable, self powered
umass0: STMicroelectronics ST72682 High Speed Mode, rev 2.00/2.10, addr 3
umass1: Kingston DT 101 G2, rev 2.00/1.00, addr 4
cpld0 on obio0
pcib0: <Cavium on-chip PCI bridge> on obio0
Disabling Octeon big bar support
PCI Status: PCI 32-bit: 0xc041b
pcib0: Initialized controller
pci0: <PCI bus> on pcib0
pci0: <serial bus, USB> at device 2.0 (no driver attached)
pci0: <serial bus, USB> at device 2.1 (no driver attached)
pci0: <serial bus, USB> at device 2.2 (no driver attached)
gblmem0 on obio0
octpkt0: <Octeon RGMII> on obio0
cfi0: <AMD/Fujitsu - 4MB> on obio0
Timecounter "mips" frequency 500000000 Hz quality 0
###PCB Group initialized for udppcbgroup
###PCB Group initialized for tcppcbgroup
da1 at umass-sim1 bus 1 target 0 lun 0
da1: <Kingston DT 101 G2 PMAP> Removable Direct Access SCSI-0 device
da1: 40.000MB/s transfers
da1: 15304MB (31342592 512 byte sectors: 255H 63S/T 1950C)
da0 at umass-sim0 bus 0 target 0 lun 0
da0: <ST ST72682 2.10> Removable Direct Access SCSI-2 device
da0: 40.000MB/s transfers
da0: 1000MB (2048000 512 byte sectors: 64H 32S/T 1000C)
Trying to mount root from ufs:/dev/da0s2a
WARNING: / was not properly dismounted
Attaching /cf/packages/junos via /dev/mdctl...
Mounted junos package on /dev/md0...
Media check on da0
Automatic reboot in progress...
** /dev/da0s2a
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
142 files, 75006 used, 75032 free (32 frags, 9375 blocks, 0.0% fragmentation)
***** FILE SYSTEM MARKED CLEAN *****
Verified junos signed by PackageProduction_12_1_0
Verified jboot signed by PackageProduction_12_1_0
Ignoring watchdog timeout during boot/reboot
veriexec: cannot verify /packages/junos-12.1X44-D35.5-domestic.sig: ERROR: Faic
** /dev/bo0s3e
** Last Mounted on /config
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
19 files, 50 used, 12388 free (36 frags, 1544 blocks, 0.3% fragmentation)
***** FILE SYSTEM MARKED CLEAN *****
** /dev/bo0s3f
** Last Mounted on /cf/var
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? yes
SUMMARY INFORMATION BAD
SALVAGE? yes
BLK(S) MISSING IN BIT MAPS
SALVAGE? yes
637 files, 10808 used, 164510 free (254 frags, 20532 blocks, 0.1% fragmentatio)
***** FILE SYSTEM MARKED CLEAN *****
***** FILE SYSTEM WAS MODIFIED *****
Loading configuration ...
vn_read_compressed: inflate of bytepos 86966272, offset in file = 51491159, er}
panic: bad inflate
cpuid = 0
KDB: stack backtrace:
SP 0: not in kernel
uart_z8530_class+0x0 (0,0,0,0) ra 0 sz 0
pid 54, process: md0
###Entering boot mastership relinquish phase
KDB: enter: panic
[thread pid 54 tid 100048 ]
Stopped at breakpoint+0x4: jr ra
db>
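Please note the following:
- A USB drive is plugged into the device
- A serial cable is connected to the device
- The device is not running on a UPS, so unstable power is possible
- 4 different networks were created, 3 of which are monitored
Hopefully someone can shed some light on what might have gone wrong.
Update 2: Pressing the power button did nothing, but pressing and holding it for more than 6 seconds switched the device off. When I turned it back on, it loaded the configuration normally. So, unlike the first time, the device was not wiped this time.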
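In case it helps anyone comparing captures from several units, here is a rough sketch of how a saved console log could be scanned for the warning signs visible in the output above (unclean dismount, fsck salvage prompts, the veriexec failure, the panic). The function name `scan_console_log` and the labels are my own inventions for illustration, not anything provided by Junos, and the patterns are simply copied from this one capture, so other firmware versions may word things differently.

```python
import re

# Patterns copied from the console capture in this post.
# The labels on the right are my own descriptions (assumptions),
# not official Junos or FreeBSD terminology.
PATTERNS = [
    (re.compile(r"was not properly dismounted"), "unclean shutdown"),
    (re.compile(r"SALVAGE\?"), "fsck repaired filesystem metadata"),
    (re.compile(r"FILE SYSTEM WAS MODIFIED"), "fsck modified a filesystem"),
    (re.compile(r"veriexec: cannot verify"), "package signature check failed"),
    (re.compile(r"^panic:"), "kernel panic"),
]


def scan_console_log(text):
    """Return a list of (line_number, label, line) for every matching line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        for pattern, label in PATTERNS:
            if pattern.search(stripped):
                hits.append((number, label, stripped))
    return hits


# Small excerpt from the capture above, used as sample input.
sample = "\n".join([
    "Trying to mount root from ufs:/dev/da0s2a",
    "WARNING: / was not properly dismounted",
    "SALVAGE? yes",
    "Loading configuration ...",
    "panic: bad inflate",
])

for number, label, line in scan_console_log(sample):
    print(f"{number}: [{label}] {line}")
```

This only flags lines for a human to look at; it does not say anything about the root cause.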