Chassis Cluster Upgrade with Minimal Downtime (v1.0)
SRX BRANCH (SRX1xx, SRX2xx, SRX3xx, SRX550, SRX550HM, SRX650)
SRX Mid-Range (SRX1400, SRX1500, SRX3400, SRX3600, SRX4100, SRX4200, SRX4600, vSRX)
Prerequisites:
• Prerequisites may be performed outside of the maintenance window (MW) with no impact to traffic
• Console connections to both chassis cluster nodes are required, both to make node-specific configuration changes and because the device is powered off via the 'halt' method
• Download Junos software from Juniper Download website
• Backup current configuration
o Local device file storage – ‘show configuration | save /var/tmp/<name>’
o To attached USB – Use KB12880 to mount USB then ‘show configuration | save /var/tmp/usb/<name>’
• Upload Junos OS image to Device storage
o e.g., /var/tmp/junos-install-srx5000-x86-64-18.4R2.7.tgz
• Validate the upgrade image package against the current configuration
o Not available for SRX1500, SRX4100/4200, SRX4600 and vSRX devices
o >request system software validate <image location>
• Temporarily disable MAC move/duplication protections on connected switches, such as ‘mac-move-limit’ and ‘duplicate-mac-detection’, because the same MAC addresses may briefly appear in two locations during Step 12.
Upgrade Directions:
The below steps assume that Node0 is the primary for control plane (RG0) and data plane
(RG1+) and configured with a higher priority than the secondary node.
If needed, first fail over all redundancy groups (RGs) to the primary node (Node0):
>request chassis cluster failover redundancy-group [x] node 0
Node0 Directions Node1 Directions
1. Disable all physical interfaces used for transit traffic on Node1
(secondary node)
Note: Alternatively, you may physically remove cables or ‘shut’
connected device interfaces.
e.g.,
set interfaces ge-5/0/4 disable
set interfaces ge-5/0/5 disable
2. Disable TCP SYN check and sequence check
set security flow tcp-session no-syn-check
set security flow tcp-session no-sequence-check
3. Deactivate preempt for all RG1+
deactivate chassis cluster redundancy-group 1 preempt
4. Deactivate all interface-monitor and ip-monitoring
deactivate chassis cluster redundancy-group 1 interface-monitor
deactivate chassis cluster redundancy-group 1 ip-monitoring
5. Commit the configuration
commit
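Steps 1 through 4 can be prepared ahead of the maintenance window as a plain list of commands to paste into configuration mode. The following is a minimal sketch in Python, not part of the original procedure: the function name `pre_upgrade_commands` is hypothetical, and the interface names and redundancy-group numbers are whatever applies to your cluster. It emits preempt, interface-monitor and ip-monitoring lines for every RG passed in, so drop any line for a statement that is not configured on your cluster.

```python
def pre_upgrade_commands(node1_interfaces, redundancy_groups):
    """Return the configuration-mode commands for steps 1-4 as a list of strings.

    node1_interfaces: transit interfaces on Node1, e.g. ["ge-5/0/4", "ge-5/0/5"]
    redundancy_groups: RG1+ numbers, e.g. [1]
    """
    # Step 1: disable transit interfaces on the secondary node.
    cmds = [f"set interfaces {ifname} disable" for ifname in node1_interfaces]
    # Step 2: disable TCP SYN check and sequence check.
    cmds += [
        "set security flow tcp-session no-syn-check",
        "set security flow tcp-session no-sequence-check",
    ]
    # Steps 3 and 4: deactivate preempt and monitoring for each RG1+.
    for rg in redundancy_groups:
        prefix = f"deactivate chassis cluster redundancy-group {rg}"
        cmds += [
            f"{prefix} preempt",
            f"{prefix} interface-monitor",
            f"{prefix} ip-monitoring",
        ]
    return cmds
```

Calling `pre_upgrade_commands(["ge-5/0/4", "ge-5/0/5"], [1])` returns the same nine lines shown in steps 1 through 4 above; step 5 (commit) is left to the operator.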
6. Physically disconnect control link and delete fab interfaces from configuration (both nodes).
6a. Physically disconnect control port cabling
Note: Refer to Understanding SRX Series Chassis Cluster Slot
Numbering and Physical Port and Logical Interface Naming
for a listing of control port locations and SRX4k dedicated
fabric link interface naming.
Note (Node0): It is expected that Node0 will report Node1 as ‘lost’ due to loss of control link.
Note (Node1): It is expected that Node1 will transition into an ‘ineligible’ then ‘disabled’ state and report Node0 as ‘lost’ due to loss of control link.
6b. Delete Fabric interface information from configuration (both nodes).
Note: Before configuration adjustments, make a note of current
configured fabric port interfaces for later addition to configuration
in step 19 & 21.
show interfaces fab0
show interfaces fab1
delete interfaces fab0
delete interfaces fab1
7. Commit the configuration (both nodes)
commit and-quit
NOTE: Steps 6b and 7 must be applied to each node independently, because the nodes can no longer communicate once the control link is removed in step 6a.
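Because the fabric member interfaces deleted in step 6b have to be re-entered in steps 19 and 21, it can help to capture them from a saved `show configuration | display set` output before deleting them. The sketch below is one possible way to do that in Python; the function name `saved_fabric_commands` is hypothetical, and it assumes the fabric lines have the `set interfaces fabN fabric-options member-interfaces <port>` form used later in this document.

```python
import re

# Matches the fabric member lines in "display set" format, e.g.
# set interfaces fab0 fabric-options member-interfaces ge-0/0/2
FAB_LINE = re.compile(
    r"^set interfaces (fab[01]) fabric-options member-interfaces (\S+)$"
)


def saved_fabric_commands(display_set_output):
    """Return the fab0/fab1 member-interface 'set' lines found in the text."""
    saved = []
    for raw in display_set_output.splitlines():
        line = raw.strip()
        if FAB_LINE.match(line):
            saved.append(line)
    return saved
```

The returned lines can be kept in a text file and pasted back unchanged when the fabric interfaces are re-configured.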
NOTE: Before starting Node1 upgrade, make sure the “active” configuration on each node reflects the changes made on step 6, Node0 reports Node1 as lost, and Node1 reports Node0 as lost.
e.g., (Node0)
{primary:node0}
root@srx345> show configuration | display set | match "fab[01]"
{primary:node0}
root@srx345> show chassis cluster status
…
Cluster ID: 6
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 200 primary no no None
node1 0 lost n/a n/a n/a
Redundancy group: 1 , Failover count: 1
node0 0 primary no no None
node1 0 lost n/a n/a n/a
e.g., (Node1)
{disabled:node1}
root@srx345> show configuration | display set | match "fab[01]"
{disabled:node1}
root@srx5k> show chassis cluster status
…
Cluster ID: 5
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 100 disabled no no None
Redundancy group: 1 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 100 disabled no no None
### Start Node1 upgrade ###
8. Upgrade Junos OS on the Node1
request system software add no-copy <install-package>
Note: If upgrade was verified previously as part of prerequisite steps,
‘no-validate’ may be used to speed up install process.
9. Reboot
request system reboot
10. After Node1 completes boot process, verify the following before
moving to next step:
- Updated Junos OS
- All FPCs and PICs are online (may take up to 15 minutes
depending on the type and number of FPCs)
- Node1 should be in primary state for all RGs, and reporting
Node0 as ‘lost’
- No major alarms being displayed
show version
show chassis fpc pic-status
show chassis cluster status
show chassis alarms
show system alarms
NOTE: Priorities of RG1+ will report priority 0 as part of normal
behavior.
{primary:node1}[edit]
root@srx345# run show chassis cluster status
…
Cluster ID: 6
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 100 primary no no None
Redundancy group: 1 , Failover count: 1
node0 0 lost n/a n/a n/a
node1 0 primary no no CS
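The cluster-state check in step 10 can also be scripted against captured console output. The following Python sketch is an illustration only: the function names `parse_cluster_status` and `node1_ready` are hypothetical, and the parser assumes the output layout shown in the example above (a "Redundancy group: N" line followed by one row per node with the status in the third column).

```python
import re

RG_HEADER = re.compile(r"^Redundancy group:\s*(\d+)")
# Node rows start with node0/node1, then priority, then status.
NODE_ROW = re.compile(r"^(node[01])\s+(\d+)\s+(\S+)")


def parse_cluster_status(text):
    """Map redundancy-group number -> {node name: status}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = RG_HEADER.match(line)
        if header:
            current = int(header.group(1))
            groups[current] = {}
            continue
        row = NODE_ROW.match(line)
        if row and current is not None:
            groups[current][row.group(1)] = row.group(3)
    return groups


def node1_ready(text):
    """True if every RG shows node1 as primary and node0 as lost."""
    groups = parse_cluster_status(text)
    return bool(groups) and all(
        g.get("node1") == "primary" and g.get("node0") == "lost"
        for g in groups.values()
    )
```

A result of `False` only means the captured text does not match the expected state; the FPC/PIC, version and alarm checks in step 10 still need to be reviewed separately.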
11. Before failing over to Node1, it is best to verify on both nodes that the configuration change will occur successfully by testing a commit first, then:
- disable all physical interfaces for transit traffic on Node0
- enable all physical interfaces for transit traffic on Node1
e.g., (test commit, Node0)
{primary:node0}[edit]
root@srx345# set interfaces reth0 description TEST
{primary:node0}[edit]
root@srx345# commit
node0:
commit complete
{primary:node0}[edit]
root@srx345# rollback 1
load complete
{primary:node0}[edit]
root@srx345# commit
node0:
commit complete
e.g., (test commit, Node1)
{primary:node1}[edit]
root@srx345# set interfaces reth0 description TEST
{primary:node1}[edit]
root@srx345# commit
node1:
commit complete
{primary:node1}[edit]
root@srx345# rollback 1
load complete
{primary:node1}[edit]
root@srx345# commit
node1:
commit complete
e.g., (both nodes)
set interfaces ge-0/0/4 disable
set interfaces ge-0/0/5 disable
delete interfaces ge-5/0/4 disable
delete interfaces ge-5/0/5 disable
commit check
NOTE: Enable all physical interfaces of Node1 that were disabled on step 1.
NOTE: If there are any commit conflicts, they need to be resolved before moving to the next step.
NOTE (Node0): Alternatively, prepare to physically remove cables or ‘shut’ connected device interfaces for Node0.
NOTE (Node1): Alternatively, prepare to physically add cables or ‘un-shut’ connected device interfaces for Node1.
12. Commit the configuration simultaneously on both nodes. This will cause all of the traffic to fail over to Node1.
commit
NOTE (Node0): Alternatively, physically remove cables or ‘shut’ connected device interfaces for Node0.
NOTE (Node1): Alternatively, physically add cables or ‘un-shut’ connected device interfaces for Node1.
NOTE: The total amount of downtime will vary depending on the switching/routing environment
(e.g., dynamic routing, STP, MSTP, RSTP, VSTP, edge, PortFast, etc.).
13. Verify traffic is passing through Node1
show security flow session summary
monitor interface traffic
### Start Node0 upgrade ###
14. Upgrade Junos OS on the Node0
request system software add no-copy no-validate <install-package>
15. Reboot
request system reboot
16. After Node0 completes the boot process, verify the following before
moving to the next step:
- Updated Junos OS
- All FPCs and PICs are online (may take up to 15 minutes
depending on the type and number of FPCs)
- Node0 should be in primary state for all RGs, and reporting
Node1 as ‘lost’
- No major alarms being displayed
show version
show chassis fpc pic-status
show chassis cluster status
show chassis alarms
show system alarms
NOTE: Priorities of RG1+ will report priority 0 as part of normal
behavior.
{primary:node0}
root@srx345> show chassis cluster status
…
Cluster ID: 6
Node Priority Status Preempt Manual Monitor-failures
Redundancy group: 0 , Failover count: 1
node0 200 primary no no None
node1 0 lost n/a n/a n/a
Redundancy group: 1 , Failover count: 1
node0 0 primary no no CS
node1 0 lost n/a n/a n/a
17. Before connecting control link and re-configuring fab interfaces, enable interface-monitor which was disabled in step 4 (both nodes).
e.g.,
activate chassis cluster redundancy-group 1 interface-monitor
commit check
18. Commit the configuration on both nodes
commit
19. Re-configure fabric interfaces on Node0 only (You will
configure the fabric links on Node1 at step 21)
set interfaces fab0 fabric-options member-interfaces ge-0/0/2
set interfaces fab1 fabric-options member-interfaces ge-5/0/2
commit check
commit and-quit
20. Halt Node0 with ‘request system halt’
NOTE: For SRX1500/4100/4200/4600 use ‘request system power-off’
{primary:node0}
root@srx345> request system halt
warning: This command will not halt the other routing-engine.
If planning to switch off power, use the both-routing-engines option.
Halt the system ? [yes,no] (no) yes
*** FINAL System shutdown message from root@srx345 ***
System going down IMMEDIATELY
Shutdown NOW!
[pid 2193]
{primary:node0}
root@srx345> failed to set the server tnp addressWaiting (max 60 seconds) for system process `vnlru_mem' to stop...done
Waiting (max 60 seconds) for system process `vnlru' to stop...done
Waiting (max 60 seconds) for system process `bufdaemon' to stop...done
Waiting (max 60 seconds) for system process `syncer' to stop...
Syncing disks, vnodes remaining...3 3 1 1 1 1 1 1 0 0 0 0 0 0 done
syncing disks... All buffers synced.
Uptime: 1h25m0s
recorded reboot as normal shutdown
The operating system has halted.
Please press any key to reboot.
NOTE: For SRX1500/4100/4200/4600 output will reflect as:
root@SRX4600 > request system power-off
Power Off the system ? [yes,no] (no) yes
*** FINAL System shutdown message from root@SRX4600- ***
System going down IMMEDIATELY
Shutdown NOW!
[pid 15774]
Stopping cron.
Waiting for PIDS: 14311.
Poweroff for hypervisor to respawn
…
Sending all processes the TERM signal...
Sending all processes the KILL signal...
Unmounting remote filesystems...
Deactivating swap...
Unmounting local filesystems...
reboot: Power down
NOTE: DO NOT press any key before step 21 is completed.
NOTE: Do NOT reconnect control links or commit until
node0 is in halt/powered off status from step 20.
NOTE: Make sure node1 is primary for all RGs (show chassis cluster
status).
21. When node0 console prints out “The operating system has halted.”,
re-connect the control link and re-configure fab interfaces.
NOTE: For SRX1500/4100/4200/4600 wait for system to report:
Unmounting local filesystems...
reboot: Power down
21a. Physically reconnect control link ports removed in step 6a (this
includes reconnecting links on Node 0)
21b. Re-configure fabric interfaces on node1
set interfaces fab0 fabric-options member-interfaces ge-0/0/2
set interfaces fab1 fabric-options member-interfaces ge-5/0/2
commit check
commit and-quit
22. Press any key to reboot Node0
23. When Node0 completes bootup, verify the following:
Node0:
- All FPCs and PICs are online (may take up to 15 minutes depending on the type and number of FPCs)
- Chassis cluster status should reflect nodes as Primary/Secondary
- Chassis cluster statistics should reflect increasing counts on control and fabric links
- Both nodes showing same version
show chassis fpc pic-status (verify all slots and pics are “Online”)
show security flow session summary (verify both nodes reporting similar session counts)
show chassis cluster status
show chassis cluster statistics
show version
Node1:
- Chassis cluster statistics should reflect increasing counts on control and fabric links
show chassis cluster statistics
24. Enable all physical interfaces for transit traffic on node0, which
were disabled in step 11, and re-enable the TCP syn-check/sequence-check,
which were disabled in step 2.
e.g.,
delete interfaces ge-0/0/4 disable
delete interfaces ge-0/0/5 disable
delete security flow tcp-session no-syn-check
delete security flow tcp-session no-sequence-check
commit check
commit
NOTE: Alternatively, physically add cables or ‘un-shut’
connected device interfaces for Node 0 after commit of flow
settings
25. Activate “preempt” and ip-monitoring if they were configured
before for RG1+
activate chassis cluster redundancy-group 1 preempt
activate chassis cluster redundancy-group 1 ip-monitoring
commit and-quit
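The restore commands in steps 24 and 25 are the mirror image of what was applied earlier: each `set ... ` line is undone with `delete ...`, and each `deactivate ...` with `activate ...`. As a rough sketch under that assumption, the Python helper below (the name `restore_commands` is hypothetical) derives the restore list from the commands that were actually applied. It assumes each `set` statement did not exist before the procedure; a statement that was already present beforehand should not be deleted.

```python
# Verb that undoes each verb used earlier in the procedure.
INVERSE = {"set": "delete", "deactivate": "activate"}


def restore_commands(applied):
    """Return the inverse of each 'set'/'deactivate' command, in order."""
    restored = []
    for cmd in applied:
        verb, _, rest = cmd.strip().partition(" ")
        if verb not in INVERSE:
            # Refuse to guess for anything other than set/deactivate.
            raise ValueError(f"no inverse known for: {cmd!r}")
        restored.append(f"{INVERSE[verb]} {rest}")
    return restored
```

For example, passing `["set interfaces ge-0/0/4 disable", "deactivate chassis cluster redundancy-group 1 preempt"]` yields the corresponding `delete` and `activate` lines shown in steps 24 and 25. Review the list before committing.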
26. Verify chassis cluster priorities have returned to normal
configured values
show chassis cluster status
27. Optional: Failover RG groups to Node0
(in case “preempt” is not configured, or is used with higher
priority on node1)
request chassis cluster failover redundancy-group 0 node 0
request chassis cluster failover redundancy-group 1 node 0
request chassis cluster failover reset redundancy-group 0
request chassis cluster failover reset redundancy-group 1