JunOS update on EX4500 switches in VirtualChassis - what could go wrong? Part 2

Without putting it off any longer, here is the second part of the post. Thank you for the response to the first part: it is nice that the article interested you and that the topic has found a continuation.

As a reminder, the first part ended with the conclusion that after the reboot one of the VC devices was not working properly. As one of the comments rightly noted, it read as if I then simply went home. I did not: the first part covers only about 20 minutes of an almost five-hour epic. Buckled up? Let's go!

After the reboot it is not entirely clear what did and did not happen, but the main thing is that client traffic is flowing. I connect through the dedicated management Ethernet interface, and the first surprise is that the master RE is now member1:

login as: user
user@switch password:
--- JUNOS 12.3R12.4 built 2016-01-20 04:27:51 UTC
{master:1}
user@switch>

In principle this can happen and is nothing to worry about: the devices are identical, the VC configuration is preprovisioned, and either of them can be the master. The operating system has been updated, which is good. But this is not good:
user@switch> show chassis routing-engine
Routing Engine status:
Slot 1:
Current state Master
DRAM 1024
Memory utilization 45 percent
CPU utilization:
User 14 percent
Background 0 percent
Kernel 11 percent
Interrupt 1 percent
Idle 74 percent
Model EX4500-40F
Serial ID
Start time 2016-06-02 01:28:45
Uptime 34 minutes, 55 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
0.59 0.80 0.66
{master:1}
user@switch>

The device sees only one RE, and there should be two. Further investigation confirms that the unlit LEDs are dark for a reason:

user@switch> show virtual-chassis
Preprovisioned Virtual Chassis
Virtual Chassis ID:
Virtual Chassis Mode: Enabled
                                            Mstr            Mixed Neighbor List
Member ID  Status    Serial No  Model       prio  Role      Mode  ID  Interface
0 (FPC 0)  Inactive             ex4500-40f  129   Linecard  N     1   vcp-1
                                                                  1   vcp-0
1 (FPC 1)  Prsnt                ex4500-40f  129   Master*   N     0   vcp-1
                                                                  0   vcp-0
{master:1}
user@switch>
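
A check like the one above is easy to script when there are many chassis to look after. The following is a minimal Python sketch, assuming the output of show virtual-chassis has been saved as text and that member lines look like the ones shown here. The names vc_members and inactive_members are made up for this illustration and are not part of any Juniper tool.

```python
import re

# A member line starts with "<id> (FPC <n>)", then the status, and later the role.
# Continuation lines such as "1 vcp-0" have no "(FPC n)" part and are skipped.
MEMBER_LINE = re.compile(
    r"^\s*(\d+)\s+\(FPC\s+\d+\)\s+(\S+)\s.*?\b(Master|Backup|Linecard)\b"
)

# Sample taken from the output in this post (serial numbers are blank there).
SAMPLE = """\
Member ID Status Serial No Model prio Role Mode ID Interface
0 (FPC 0) Inactive ex4500-40f 129 Linecard N 1 vcp-1
1 vcp-0
1 (FPC 1) Prsnt ex4500-40f 129 Master* N 0 vcp-1
0 vcp-0
"""


def vc_members(output):
    """Return a list of (member_id, status, role) tuples."""
    members = []
    for line in output.splitlines():
        match = MEMBER_LINE.match(line)
        if match:
            members.append((int(match.group(1)), match.group(2), match.group(3)))
    return members


def inactive_members(output):
    """Return the IDs of members whose status is anything other than Prsnt."""
    return [member for member, status, _ in vc_members(output) if status != "Prsnt"]


if __name__ == "__main__":
    print(vc_members(SAMPLE))
    print(inactive_members(SAMPLE))
```

Treating every status other than Prsnt as a problem is a deliberately strict choice for this sketch; adjust it if other statuses are acceptable in your setup.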

The first device, member0, is recognized as a Linecard with the status Inactive, which means it does not take part in the virtual chassis. The dedicated stack interfaces (vcp-1 and vcp-0) are up, so a local connection is worth trying:

Connection and verification
{master:1}
user@switch> request session member 0
--- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
{linecard:0}
user@switch> show system storage
fpc0:
--------------------------------------------------------------------------
Filesystem Size Used Avail Capacity Mounted on
/dev/da0s1a 370M 142M 198M 42% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/md0 37M 37M 0B 100% /packages/mnt/jbase
/dev/md1 12M 7.3M 3.6M 67% /packages/mfs-jcrypto-ex
/dev/md2 22M 22M 0B 100% /packages/mnt/jcrypto-ex-11.1R3.5
/dev/md3 8.7M 4.1M 3.9M 51% /packages/mfs-jdocs-ex
/dev/md4 6.3M 6.3M 0B 100% /packages/mnt/jdocs-ex-11.1R3.5
/dev/md5 64M 61M -1.4M 102% /packages/mfs-jkernel-ex
/dev/md6 162M 162M 0B 100% /packages/mnt/jkernel-ex-11.1R3.5
/dev/md7 13M 8.5M 3.5M 71% /packages/mfs-jpfe-ex45x
/dev/md8 24M 24M 0B 100% /packages/mnt/jpfe-ex45x-11.1R3.5
/dev/md9 20M 15M 2.9M 84% /packages/mfs-jroute-ex
/dev/md10 47M 47M 0B 100% /packages/mnt/jroute-ex-11.1R3.5
/dev/md11 16M 11M 3.2M 78% /packages/mfs-jswitch-ex
/dev/md12 35M 35M 0B 100% /packages/mnt/jswitch-ex-11.1R3.5
/dev/md13 12M 7.8M 3.6M 68% /packages/mfs-jweb-ex
/dev/md14 22M 22M 0B 100% /packages/mnt/jweb-ex-11.1R3.5
/dev/md15 126M 8.0K 116M 0% /tmp
/dev/da0s3e 243M 4.4M 219M 2% /var
/dev/da0s3d 727M 130K 668M 0% /var/tmp
/dev/da0s4d 123M 492K 113M 0% /config
/dev/md16 118M 14M 95M 13% /var/rundb
procfs 4.0K 4.0K 0B 100% /proc
/var/jail/etc 243M 4.4M 219M 2% /packages/mnt/jweb-ex-11.1R3.5/jail/var/etc
/var/jail/run 243M 4.4M 219M 2% /packages/mnt/jweb-ex-11.1R3.5/jail/var/run
/var/jail/tmp 243M 4.4M 219M 2% /packages/mnt/jweb-ex-11.1R3.5/jail/var/tmp
/var/tmp 727M 130K 668M 0% /packages/mnt/jweb-ex-11.1R3.5/jail/var/tmp/uploads
devfs 1.0K 1.0K 0B 100% /packages/mnt/jweb-ex-11.1R3.5/jail/dev

fpc1:
--------------------------------------------------------------------------
Filesystem Size Used Avail Capacity Mounted on
/dev/da0s2a 363M 130M 204M 39% /
devfs 1.0K 1.0K 0B 100% /dev
/dev/md0 69M 69M 0B 100% /packages/mnt/jbase
/dev/md1 5.8M 1.1M 4.2M 21% /packages/mfs-fips-mode-powerpc
/dev/md2 2.9M 2.9M 0B 100% /packages/mnt/fips-mode-powerpc-12.3R12.4
/dev/md3 9.1M 4.4M 3.9M 53% /packages/mfs-jcrypto-ex
/dev/md4 12M 12M 0B 100% /packages/mnt/jcrypto-ex-12.3R12.4
/dev/md5 8.1M 3.5M 4.0M 47% /packages/mfs-jdocs-ex
/dev/md6 6.2M 6.2M 0B 100% /packages/mnt/jdocs-ex-12.3R12.4
/dev/md7 43M 39M 616K 98% /packages/mfs-jkernel-ex
/dev/md8 109M 109M 0B 100% /packages/mnt/jkernel-ex-12.3R12.4
/dev/md9 12M 7.9M 3.6M 69% /packages/mfs-jpfe-ex45x
/dev/md10 22M 22M 0B 100% /packages/mnt/jpfe-ex45x-12.3R12.4
/dev/md11 17M 12M 3.2M 79% /packages/mfs-jroute-ex
/dev/md12 38M 38M 0B 100% /packages/mnt/jroute-ex-12.3R12.4
/dev/md13 12M 7.2M 3.6M 67% /packages/mfs-jswitch-ex
/dev/md14 21M 21M 0B 100% /packages/mnt/jswitch-ex-12.3R12.4
/dev/md15 14M 9.5M 3.4M 73% /packages/mfs-jweb-ex
/dev/md16 25M 25M 0B 100% /packages/mnt/jweb-ex-12.3R12.4
/dev/da0s3e 243M 20M 204M 9% /var
/dev/md17 252M 12K 232M 0% /tmp
/dev/da0s3d 727M 107M 561M 16% /var/tmp
/dev/da0s4d 123M 494K 113M 0% /config
/dev/md18 118M 22M 86M 20% /var/rundb
procfs 4.0K 4.0K 0B 100% /proc
/var/jail/etc 243M 20M 204M 9% /packages/mnt/jweb-ex-12.3R12.4/jail/var/etc
/var/jail/run 243M 20M 204M 9% /packages/mnt/jweb-ex-12.3R12.4/jail/var/run
/var/jail/tmp 243M 20M 204M 9% /packages/mnt/jweb-ex-12.3R12.4/jail/var/tmp
/var/tmp 727M 107M 561M 16% /packages/mnt/jweb-ex-12.3R12.4/jail/var/tmp/uploads
devfs 1.0K 1.0K 0B 100% /packages/mnt/jweb-ex-12.3R12.4/jail/dev

{linecard:0}
user@switch> exit
rlogin: connection closed
{master:1}
user@switch>
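
In a listing this long the important detail is easy to miss by eye: the package mount points carry the Junos version of each member. Below is a small Python sketch that pulls the version out per member. It assumes the show system storage output is saved as text, that each member section starts with a line like fpc0:, and that the jkernel-ex mount point is a fair proxy for the running version. The function name kernel_version_by_member is invented for this example.

```python
import re

# Two lines per member are enough for the demonstration; both are copied
# from the storage listing in this post.
SAMPLE = """\
fpc0:
/dev/md6 162M 162M 0B 100% /packages/mnt/jkernel-ex-11.1R3.5
fpc1:
/dev/md8 109M 109M 0B 100% /packages/mnt/jkernel-ex-12.3R12.4
"""


def kernel_version_by_member(storage_output):
    """Map each 'fpcN' section to the version in its jkernel-ex mount point."""
    versions = {}
    member = None
    for raw_line in storage_output.splitlines():
        line = raw_line.strip()
        section = re.fullmatch(r"(fpc\d+):", line)
        if section:
            member = section.group(1)
            continue
        mount = re.search(r"/packages/mnt/jkernel-ex-(\S+)$", line)
        if mount and member is not None:
            versions[member] = mount.group(1)
    return versions


def versions_match(storage_output):
    """True when every member reports the same jkernel-ex version."""
    return len(set(kernel_version_by_member(storage_output).values())) <= 1


if __name__ == "__main__":
    print(kernel_version_by_member(SAMPLE))
    print(versions_match(SAMPLE))
```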

There it is! The operating system was updated only on the second device, while the first still runs the old one (compare the package versions on fpc0 and fpc1), so the VC logic deactivated it. Still, the device is alive, and I can try to update it again. One problem: when updating, I followed the Juniper guides and put the image in /var/tmp, which is now empty, so the image has to be uploaded again. I focus entirely on this switch and try several times to update and reboot only it (member1 keeps working):

{master:1}
user@switch> request system software add /var/tmp/jinstall-XXX.tgz validate member 0
user@switch> request system reboot member 0

At the end of the boot/update process I see the same thing every time:
Installing disk0s3d:/jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
Verified jinstall-ex-4500-12.3R12.4-domestic.tgz signed by PackageProduction_12_3_0
mode = 040700, inum = 38, fs = /instrootmnt/var
panic: ffs_valloc: dup alloc
###Entering boot mastership relinquish phase
KDB: enter: panic
###Entering boot mastership relinquish phase
[thread pid 316 tid 100041 ]
Stopped at kdb_enter+0x1a0: addis r3, r0, -0x7fa4
db>

Even with little knowledge of the Unix that JunOS is based on, the line "KDB: enter: panic" does not inspire optimism. On top of that, the system drops into the system debugger (db>), and that is really bad. For reference: Juniper has the familiar CLI mode, where a working device is configured and from which you can enter the Unix shell as root, and so on; there is the loader mode (loader>) for restoring and installing an operating system image, roughly equivalent to Cisco's rommon>; and there is the debug mode (db>), which appears when there are problems with the physical components of the device. There is very little you can do in this mode unless you are a Juniper TAC engineer. At that moment I do not really understand what it is and, like a proud Windows user, I try to click “Next”:

db> help
DDB Quick Help
-------------------
Type 'c' to continue, 'reset' or 'panic' to restart.

print p examine x search set write
w delete d break dwatch watch dhwatch
hwatch step s continue c until next
match trace alltrace where bt call show
ps gdb reset kill watchdog thread panic
ddbdumpsys dumpsys halt reboot
db> c
Uptime: 2m41s
Cannot dump. No dump device defined.
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...

... ...

***** FILE SYSTEM MARKED CLEAN *****
switch (ttyu0)
login: user
Logging to master
...
Connection to master failed, enabling local login
Password:
--- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
{linecard:0}
user@switch>

A miracle: the system boots, albeit with the old version. At the time I did not realize that this old version was loaded from the backup partition (the alternate slice), because the updated version is written to the primary partition and, in my case, cannot boot from it. That is why it is so important to update the bootloader whenever possible: it is one more straw to clutch at when problems arise. A side note: look at the lines “Logging to master ... Connection to master failed”. All devices joined in a VC share a single management console, so when you connect, for example over SSH, you land straight in the console of the master device. Since in my case the VC is not operational, I land in the management mode of the local box.

Along the way it occurs to me to upload the OS image to the working RE and copy it between VC members: this is faster, and I do not have to keep switching to WinSCP. It works even in my case, since the links between the devices are up.

user@switch> file copy fpc1:/var/tmp/jinstall-XXX.tgz fpc0:/var/tmp/jinstall-XXX.tgz

Nevertheless, every attempt to update and reboot gives the same result: I end up in the system debugger, after which the old version can be booted. So the problem is persistent, and repeating the same steps will get me nowhere. Then it dawns on me: I have a device with a working system (member1) and a flash drive onto which I can write a snapshot and then boot from it. So that is what I do:

{master:1}
umass1: SanDisk Corporation U3 Cruzer Micro, rev 2.00/0.10, addr 4
da1 at umass-sim1 bus 1 target 0 lun 0
da1 : <SanDisk U3 Cruzer Micro 2.15> Removable Direct Access SCSI-2 device
da1: 40.000MB/s transfers
da1: 973MB (1994385 512 byte sectors: 64H 32S/T 973C)
user@switch> request system snapshot local partition media external
user@switch> show system snapshot media external
fpc0:
--------------------------------------------------------------------------
error: external media missing or invalid

fpc1:
--------------------------------------------------------------------------
Information for snapshot on external ( /dev/da1s1a ) (backup)
Creation date: Jun 2 02:28:20 2016
JUNOS version on snapshot:
jbase : 11.1R3.5
jkernel-ex: 11.1R3.5
jcrypto-ex: 11.1R3.5
jdocs-ex: 11.1R3.5
jswitch-ex: 11.1R3.5
jpfe-ex45x: 11.1R3.5
jroute-ex: 11.1R3.5
jweb-ex: 11.1R3.5
Information for snapshot on external ( /dev/da1s2a ) (primary)
Creation date: Jun 2 02:29:21 2016
JUNOS version on snapshot:
jbase : ex-12.3R12.4
jkernel-ex: 12.3R12.4
jcrypto-ex: 12.3R12.4
jdocs-ex: 12.3R12.4
jswitch-ex: 12.3R12.4
jpfe-ex45x: 12.3R12.4
jroute-ex: 12.3R12.4
jweb-ex: 12.3R12.4
fips-mode-powerpc: 12.3R12.4

Note the messages that appear when the flash drive is connected: it was detected as system device da1, which will matter later. The snapshot on the external flash drive mirrors the internal storage: version 12.3 on the primary partition (/dev/da1s2a) and 11.1 on the backup (/dev/da1s1a). The slice names can also come in handy if you need to boot the system from a specific partition. I move the USB flash drive to the problem device and continue:

user@switch> request session member 0
--- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
{linecard:0}
user@switch> request system reboot member 0 media external
Reboot the system ? [yes,no] (no) yes

Here again, as a precaution, I opened a session on the local device; most likely member0 could have been rebooted from the master's console just as well. On restart I see an endlessly repeating sequence:

U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
Board: EX4500-40F 10.4
EPLD: Version 6.2 (0x81)
DRAM: Initializing (1024 MB)
FLASH: 8 MB
Firmware Version: 01.00.00
USB: scanning bus for devices... 3 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
ELF file is 32 bit
Consoles: U-Boot console
FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
(hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
Memory: 1024MB
bootsequencing is enabled
bootsuccess is not set
new boot device = disk2
can't load '/kernel'
can't load '/kernel.old'
Press Enter to stop auto bootsequencing and to enter loader prompt.

Watchdog timed out. Resetting the board.

The switch gets no further than these repeating lines. What the?!? It cannot find the kernel? After a while I notice the second-to-last line, press Enter and get into the loader:

loader> ?
Available commands:
bcachestat get disk block cache stats
boot boot a file or loaded kernel
autoboot boot automatically after a delay
help detailed help
? list commands
show show variable(s)
set set a variable
unset unset a variable
echo echo arguments
read read input from the terminal
more show contents of a file
nextboot set next boot device
lsdev list all devices
install install JUNOS
include read commands from a file
ls list files
load load a kernel or module
unload unload all modules
lsmod list loaded modules
export export variables to U-Boot environment
save save U-Boot environment
heap show heap usage

Not much, but still better than a reboot loop. The loader mode exists precisely for system recovery, so I am in the right place. The work has now passed the two-hour mark... I try various ways of placing the system image and installing it, with no result.

loader> install /var/tmp/jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
invalid URL
loader> install --format file:///jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
cannot open package (error 22)
loader> install --format file:///jinstall-ex-4500-12.3R12.4-domestic-signed.tgz
Device NOT ready
Request Sense returned 06 28 00
cannot open package (error 5)

In principle these commands should work, but for some reason they do not: either I did not understand something at the time, or something else was wrong. I see the same reboot loop and the same complaints about a missing kernel. During the constant reboots another interesting detail catches my eye:

Firmware Version: 01.00.00
USB: scanning bus for devices... 3 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
ELF file is 32 bit
Consoles: U-Boot console
FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
(hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
Memory: 1024MB
bootsequencing is enabled
bootsuccess is not set
new boot device = disk2

At that moment it is no more than a guess, but given that Juniper numbers devices from 0, a “disk2” looks strange to me: I have only one flash drive. Besides, when I inserted the flash drive it was detected as da1. Looking back a little, the device tried to boot from disk2 right after the reboot from the CLI (when I specified the external flash drive as the boot device), but I had not noticed it until now. I go back to the loader and confirm the suspicion: there is no disk2, and the flash drive is device zero:

loader> lsdev
disk devices:
disk0 - USB storage device 0
net devices:
net0:
loader> nextboot disk0:
loader> reboot
Resetting...

Is that it? Not so fast! The system again tries to boot from disk2, but now I feel I am on the right track. Along the way I try the neighboring options with different slices on the flash drive (nextboot diskXsY), with no result. Nearly desperate, I find a note that the boot device should be set as an environment variable from U-Boot mode. I do not know how best to describe this fourth mode or what else can be done there, but you get into it by interrupting the boot process with Ctrl+C at the very beginning, while the system is polling USB devices (USB: scanning bus for devices...). In the real output the first line contains INTERRUPT in <> delimiters, but they break the markup and fonts, so I removed them:

=> INTERRUPT
=> setenv loaddev disk1
=> saveenv
Saving Environment to Flash...
. done
Un-Protected 1 sectors
Erasing Flash...
. done
Erased 1 sectors
Writing to Flash... writing to flash...
done
. done
Protected 1 sectors
=> reset

......

...
Boot media /dev/da1 has dual root support
WARNING: JUNOS versions running on dual partitions are not same
** /dev/da1s1a
FILE SYSTEM CLEAN; SKIPPING CHECKS
clean, 274948 free (84 frags, 34358 blocks, 0.0% fragmentation)
switch (ttyu0)
login: user
Logging to master
...
Connection to master failed, enabling local login
Password:

--- JUNOS 12.3R12.4 built 2016-01-20 04:27:51 UTC
warning: This chassis is operating in a non-master role as part of a virtual-chassis (VC) system.
warning: Use of interactive commands should be limited to debugging and VC Port operations.
warning: Full CLI access is provided by the Virtual Chassis Master (VC-M) chassis.
warning: The VC-M can be identified through the show virtual-chassis status command executed at this console.
warning: Please logout and log into the VC-M to use CLI.
{linecard:1}
user@switch>

WARNING: cli has been replaced by an updated version:
CLI release 12.3R12.4 built by builder on 2016-01-20 03:55:45 UTC
Restart cli using the new version ? [yes,no] (yes)

Restarting cli ...
{master:0}
user@switch>

Let's go over what I saw after the reboot:

"WARNING: JUNOS versions running on dual partitions are not same" is harmless and expected, because the new version exists only on the primary slice of the device.

"Connection to master failed ..." and "warning: This chassis is operating in a non-master role ..." are not a problem either, as the VC needs time to reconnect with its members and synchronize the configuration.

After a few minutes of waiting, the system itself asks to restart the CLI (WARNING: cli has been replaced by an updated version), and now the new version is running on the correct RE.

Checking:

user@switch> show chassis routing-engine
Routing Engine status:
Slot 0:
Current state Master
DRAM 1024
Memory utilization 50 percent
CPU utilization:
User 43 percent
Background 0 percent
Kernel 24 percent
Interrupt 1 percent
Idle 32 percent
Model EX4500-40F
Serial ID
Start time 2016-06-02 03:43:20
Uptime 3 minutes, 22 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
2.40 1.12 0.46
Routing Engine status:
Slot 1:
Current state Backup
DRAM 1024
Memory utilization 44 percent
CPU utilization:
User 40 percent
Background 0 percent
Kernel 30 percent
Interrupt 1 percent
Idle 28 percent
Model EX4500-40F
Serial ID
Start time 2016-06-02 01:28:45
Uptime 2 hours, 17 minutes, 57 seconds
Last reboot reason Router rebooted after a normal shutdown.
Load averages: 1 minute 5 minute 15 minute
0.49 0.46 0.44

{master:0}
user@switch> show virtual-chassis

Preprovisioned Virtual Chassis
Virtual Chassis ID:
Virtual Chassis Mode: Enabled
                                            Mstr            Mixed Neighbor List
Member ID  Status    Serial No  Model       prio  Role      Mode  ID  Interface
0 (FPC 0)  Prsnt                ex4500-40f  129   Master*   N     1   vcp-1
                                                                  1   vcp-0
1 (FPC 1)  Prsnt                ex4500-40f  129   Backup    N     0   vcp-1
                                                                  0   vcp-0

{master:0}

Victory! Complete and unconditional! To say that I was pleased with myself is to say nothing: my sense of self-importance went off the scale. The fact that the work had taken about 4 hours no longer mattered much, because the customers never felt it. I not only awarded myself a virtual medal but also saved my company a lot of money. I gathered so many impressions in those 4 hours that it then took many days (and beers) to put everything together and see the full picture.

All that remains is to take snapshots to the internal storage: to the primary partition now and, after a week or two, to the backup. Why wait a week? To run the new version in production first, since booting the old version from the backup partition is much easier than downgrading the whole device.

Let's analyze the situation.

According to Juniper TAC, the upgrade problems were caused by damage to the primary boot partition. Nothing can be done about that, and the switch should be returned under warranty. I still very much hope that the problem was file system corruption (an unclean reboot or the like) and that it was fixed along the way (Un-Protected 1 sectors Erasing Flash .... done) when I set the environment variable.

Why on earth the device wanted to boot from disk2, when nobody had pointed it there explicitly and no such disk existed in the system, is unclear; TAC also found it hard to comment. Later, in the logs, I could even trace disk2 appearing out of nowhere (note how new boot device = disk1s2 changes to new boot device = disk2):

Change boot device
user@switch> request system reboot member 0 media external
Reboot the system ? [yes,no] (no) yes
Rebooting fpc0
*** FINAL System shutdown message from root@switch ***
System going down IMMEDIATELY
{linecard:0}
iuriia@CORE>
Waiting (max 300 seconds) for system process `vnlru_mem' to stop...done
Waiting (max 300 seconds) for system process `vnlru' to stop...done
Waiting (max 300 seconds) for system process `bufdaemon' to stop...done
Waiting (max 300 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...2 2 2 0 1 1 1 0 0 0 0 0 done
syncing disks... All buffers synced.
Uptime: 23m53s
recorded reboot as normal shutdown
Rebooting...
U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
Board: EX4500-40F 10.4
EPLD: Version 6.2 (0x82)
DRAM: Initializing (1024 MB)
FLASH: 8 MB
Firmware Version: 01.00.00
USB: scanning bus for devices... 3 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
ELF file is 32 bit
Consoles: U-Boot console
FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
(hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
Memory: 1024MB
bootsequencing is enabled
bootsuccess is set
new boot device = disk1s2:

can't load '/kernel'
can't load '/kernel.old'
Press Enter to stop auto bootsequencing and to enter loader prompt.
Watchdog timed out. Resetting the board.
U-Boot 1.1.6 (Mar 26 2011 - 04:34:19)
Board: EX4500-40F 10.4
EPLD: Version 6.2 (0x81)
DRAM: Initializing (1024 MB)
FLASH: 8 MB
Firmware Version: 01.00.00
USB: scanning bus for devices... 3 USB Device(s) found
scanning bus for storage devices... 1 Storage Device(s) found
ELF file is 32 bit
Consoles: U-Boot console
FreeBSD/PowerPC U-Boot bootstrap loader, Revision 2.4
(hmerge@svl-junos-pool130.juniper.net, Sat Mar 26 02:46:28 PDT 2011)
Memory: 1024MB
bootsequencing is enabled
bootsuccess is not set
new boot device = disk2
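
Had I extracted these lines from the console log during the work instead of reading it by eye, I would probably have noticed disk2 much sooner. Here is a minimal Python sketch of that idea, assuming the console log and the loader's lsdev output are available as text. The function names are invented for this example, and since device names seem to differ between modes, a mismatch is only a hint, not proof of a fault.

```python
import re

# Excerpts copied from the console log and the lsdev output in this post.
SAMPLE_LOG = """\
bootsuccess is set
new boot device = disk1s2:
can't load '/kernel'
bootsuccess is not set
new boot device = disk2
"""

SAMPLE_LSDEV = """\
disk devices:
disk0 - USB storage device 0
net devices:
net0:
"""


def boot_devices(console_log):
    """Return the boot devices the loader tried, in order of appearance."""
    found = re.findall(r"new boot device = (\S+)", console_log)
    return [device.rstrip(":") for device in found]


def disk_of(device):
    """Reduce 'disk1s2' to 'disk1'; leave unrecognized names unchanged."""
    match = re.match(r"disk\d+", device)
    return match.group(0) if match else device


def unknown_boot_devices(console_log, lsdev_output):
    """Return attempted boot devices whose disk is not listed by lsdev."""
    known = set(re.findall(r"^\s*(disk\d+)\b", lsdev_output, flags=re.MULTILINE))
    return [d for d in boot_devices(console_log) if disk_of(d) not in known]


if __name__ == "__main__":
    print(boot_devices(SAMPLE_LOG))
    print(unknown_boot_devices(SAMPLE_LOG, SAMPLE_LSDEV))
```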

In effect, this problem added an hour and a half to the work. Yes, the switch also complains about the missing kernel, but why the system then tries to use disk2, which it did not appear to see in loader>, is unclear. I can assume that when booting fails the device cycles through the disks, but again, the system never saw a disk2 device. How and why the very same flash drive later booted the device successfully also raises questions.

It is possible that I was mistaken here:

loader> nextboot disk0:
loader> reboot

After all, loader settings are lost on reboot. I should have tried “boot” instead of “reboot”, but I did not at the time.

The new version of the system noticeably increased the load on the device. On the old version, daytime CPU usage was about 27-30%; after the upgrade it is 45-48%, although neither the device's simple configuration nor the traffic profile changed. Several remote sessions with Juniper TAC failed to establish the cause: a memory leak and similar problems were suspected, but none was confirmed. Strange, but I had to accept it as a fact.
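
To compare load before and after an upgrade with numbers rather than impressions, the Idle line of show chassis routing-engine can be collected periodically. The Python sketch below assumes the output is saved as text and computes CPU busy as 100 minus Idle for each slot. The function name cpu_busy_by_slot is invented here, and the sample values are the ones captured a few minutes after boot in this post, not the daily averages quoted above.

```python
import re

# Trimmed sample: only the lines the parser needs, values taken from the
# post-upgrade "show chassis routing-engine" output in this post.
SAMPLE = """\
Routing Engine status:
Slot 0:
Current state Master
Idle 32 percent
Routing Engine status:
Slot 1:
Current state Backup
Idle 28 percent
"""


def cpu_busy_by_slot(output):
    """Map each RE slot number to its CPU busy percentage (100 - Idle)."""
    busy = {}
    slot = None
    for raw_line in output.splitlines():
        line = raw_line.strip()
        slot_match = re.fullmatch(r"Slot (\d+):", line)
        if slot_match:
            slot = int(slot_match.group(1))
            continue
        idle_match = re.fullmatch(r"Idle\s+(\d+) percent", line)
        if idle_match and slot is not None:
            busy[slot] = 100 - int(idle_match.group(1))
    return busy


if __name__ == "__main__":
    print(cpu_busy_by_slot(SAMPLE))
```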

The attentive reader may have noticed that the device names shown in the loader (disk0) and those used for the successful boot (disk1 and then /dev/da1s1a) differ. I will not venture to state the reason. My guess is that the names change depending on how far the boot process has got: once the loader is up, the devices get one set of names; accessed from db>, they have another; and from the CLI we refer to devices simply as “media external” and “media internal”. For now this is only an assumption.

Most of the steps and commands above I had collected into a guide long before the update. After that I reread and extended it from time to time, whenever a possible problem occurred to me. About the only things missing from it were the db> mode and the => setenv procedure. Clearly not everything went to plan, and some things did not work as they should have. But hand on heart: without this guide and the time spent mentally rehearsing it, I would have given up. All the more so because this was night work, and my mind was not at its sharpest.

Backups: although they did not help me much, having them put my conscience and soul at ease. In the worst case, even with all of the internal storage damaged, I would have pasted the text config in over the console. These two things, the guide and the backups, are what let you concentrate on the work itself rather than on figuring out how to return everything to its original state and what to do next.

Among the major shortcomings: during the work I had several PuTTY tabs open, all writing their logs to a single file. Afterwards it was very hard to sort everything out by device and timestamp. It would have been better to use SecureCRT or a separate window per device, especially since I had the means to do so.
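
If a mixed log has already been written, the role banners that Junos prints before each prompt, such as {master:1} or {linecard:0}, give at least a rough way to split it by member after the fact. The Python sketch below assumes every banner sits on its own line and that the lines following it belong to that member until the next banner. That is a simplification, and the function name split_log_by_member is invented for this example.

```python
import re
from collections import defaultdict

# A banner looks like {master:1}, {backup:1} or {linecard:0} on its own line.
BANNER = re.compile(r"^\{(master|backup|linecard):(\d+)\}$")

# Sample assembled from session lines shown earlier in this post.
SAMPLE = """\
{master:1}
user@switch> request session member 0
--- JUNOS 11.1R3.5 built 2011-06-25 01:18:46 UTC
{linecard:0}
user@switch> exit
rlogin: connection closed
{master:1}
user@switch>
"""


def split_log_by_member(log_text):
    """Group log lines under the most recent '{role:N}' banner seen."""
    sections = defaultdict(list)
    current = "unknown"  # lines that appear before the first banner
    for line in log_text.splitlines():
        banner = BANNER.match(line.strip())
        if banner:
            current = f"{banner.group(1)}:{banner.group(2)}"
            continue
        sections[current].append(line)
    return dict(sections)


if __name__ == "__main__":
    for member, lines in split_log_by_member(SAMPLE).items():
        print(member, len(lines))
```

Note that the first lines printed after switching sessions, such as the JUNOS version line above, still land under the previous banner, so the split is approximate.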

And finally, a picture from the scene. I hope this post will be useful to you. Good luck with your upcoming updates!



P.S. For the command output I used the plain “code” markup, which looks worse than markup with a source-code background for a specific language or BASH. However, the “code” markup allows bold text, which mattered to me for highlighting the interesting places in the output. If someone can share how to get both (a background plus bold inside), I will be grateful and promise to use it in the future.
Update: it turns out that the code markup renders differently in different browsers and versions. I will keep digging into how to make the text more visible and readable.

Source: https://habr.com/ru/post/320310/

