ECS - ECS Upgrade Procedures-ECS 2.0.x.x To 2.1.x.x Operating System Offline Update
Topic
ECS Upgrade Procedures
Selections
What ECS Version Are You Upgrading To?: ECS 3.0.x.x or below
Select Type of ECS Upgrade Being Performed: ECS OS Upgrade - Offline/Online Procedures
Select ECS OS Upgrade Version/Procedure: 2.0.x.x to 2.1.x.x Upgrade
Select ECS OS Upgrade Type: OS - Offline
REPORT PROBLEMS
If you find any errors in this procedure or have comments regarding this application, send email to
[email protected]
Copyright © 2022 Dell Inc. or its subsidiaries. All Rights Reserved. Dell Technologies, Dell, EMC, Dell
EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be
trademarks of their respective owners.
The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of
any kind with respect to the information in this publication, and specifically disclaims implied warranties of
merchantability or fitness for a particular purpose.
Use, copying, and distribution of any software described in this publication requires an applicable
software license.
This document may contain certain words that are not consistent with Dell's current language guidelines.
Dell plans to update the document over subsequent future releases to revise these words accordingly.
This document may contain language from third party content that is not under Dell's control and is not
consistent with Dell's current guidelines for Dell's own content. When such third party content is updated
by the relevant third parties, this document will be revised accordingly.
Contents
Preliminary Activity Tasks
Read, understand, and perform these tasks
Post-Update Tasks
ECS Core Operating System Offline Update: Completed
Check that the object service initialization process is complete
Validate the data path is operational (moved section)
Troubleshooting
If node does not power on cleanly
If node does not see DAE disks
Preliminary Activity Tasks
This section may contain tasks that you must complete before performing this procedure.
Table 1 List of cautions, warnings, notes, and/or KB solutions related to this activity
2. This is a link to the top trending service topics. These topics may or may not be related to this activity. This is merely a proactive attempt to make you aware of any KB articles that may be associated with this product.
Note: There may not be any top trending service topics for this product at any given time.
ECS Core Operating System Offline Update
Use this procedure to update the ECS OS of a single appliance. This procedure brings the ECS appliance
offline and out of service. Perform this procedure only during a predefined maintenance window.
This procedure describes how to apply the OS update to all of the nodes at the same time, reboot all nodes except one, and finally reboot the last node to complete the OS update.
This procedure does not describe how to upgrade the ECS software or services running on the nodes.
Use this as a guide to SSH to the appropriate node as discussed in this documentation:
Node 1 (provo): 192.168.219.1
Node 2 (sandy): 192.168.219.2
Node 3 (orem): 192.168.219.3
Node 4 (ogden): 192.168.219.4
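For example, to open a session to Node 1 (this sketch assumes you connect as root over the internal management network, as the # prompts throughout this procedure imply):
# ssh [email protected]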
2. [ ] Run this command to view the contents of the MACHINES file. The output should show the
internal IP addresses for all the nodes that you are about to update.
# cat /var/tmp/MACHINES
3. [ ] Run the following command to display the current ECS OS version on all nodes.
# viprexec "rpm -qv ecs-os-base"
4. [ ] Validate that the ECS OS version is the same on all nodes. If the ECS OS is not the same on all nodes, open an SR and do not continue with the procedure. The ECS OS currently installed on the nodes should be lower than the version you are updating to. A quick spot-check of version uniformity is sketched below.
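As an informal convenience (not part of the official procedure), the rpm output from the previous step can be collapsed to confirm uniformity; this sketch assumes the query output shown above:
# viprexec "rpm -qv ecs-os-base" | grep ecs-os-base | sort | uniq -c
A single package line with a count equal to the node count means every node reports the same ECS OS version.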
1. [ ] Run the following commands to move the update package zip and bundle tbz, as well as any
preupdate and postupdate logs, into a time-stamped archive directory so that they do not interfere
with the checks outlined in this procedure.
# viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/*update* /var/tmp/refit.d/. 2>/dev/null'
# viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/MD5SUMS /var/tmp/refit.d/. 2>/dev/null'
# viprexec '[ -d /var/tmp/refit.d ] && mv /var/tmp/refit.d /var/tmp/refit.d.$(date +"%Y%m%d-%H%M%S")'
Staging the OS Update files
1. [ ] Using PuTTY or a similar tool, SSH to Node 1 and prepare the OS update files by running the following commands:
# cd /var/tmp
# unzip ecs-os-update-<version>.zip
# md5sum -c MD5SUMS
# chmod +x /var/tmp/refit
2. [ ] Run the following command and verify the md5sum matches on all nodes.
# viprexec md5sum /usr/local/bin/refit
3. [ ] Distribute the OS update bundle to all nodes that are to be updated, and validate that the md5sum matches on all nodes:
# viprscp /var/tmp/*update.tbz /var/tmp/
# viprexec md5sum /var/tmp/*update.tbz
5. [ ] Create the local repository on all nodes from the OS update bundle.
# viprexec refit deploybundle /var/tmp/ecs-os-setup-target.x86_64-*.update.tbz
6. [ ] Run the following command to validate the repository was created in the previous command:
# viprexec zypper lr
The output from each node looks like this:
# | Alias | Name | Enabled | Refresh
--+-------+------+---------+--------
1 | repo | repo | Yes | Yes
3. [ ] Run the watch command and wait for all containers to exit.
# watch viprexec "docker ps -a"
It takes several minutes for all containers to exit. Continue to wait until the watch command displays no running containers, then press Ctrl+C to exit watch and continue. A counting variant is sketched below.
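As an alternative to scanning the watch output by eye, the sketch below counts container IDs across all nodes. This is an informal convenience, and it assumes docker ps -q prints one 12-character short ID per running container (the viprexec header lines do not match the pattern):
# watch 'viprexec "docker ps -q" | grep -cE "^[0-9a-f]{12}$"'
When the displayed count reaches 0, all containers have exited; press Ctrl+C to continue.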
4. [ ] Run the following to stop docker and to validate that it stopped.
# viprexec systemctl stop docker
# viprexec systemctl status docker | grep Active
1. [ ] Create a second SSH session to Node 1. You can use it to monitor the progress of the update command shown next.
2. [ ] From the first SSH session, run the following command:
# viprexec refit doupdate
3. [ ] From the second SSH session, run the following command to see if the refit doupdate
command is still running:
# ps auxwww|grep refit
Note: When the command completes, all the refit invocations kicked off by viprexec (and its use of 'sh -c' and 'ssh') will have completed, and the grep returns nothing. A variant that excludes the grep process itself is sketched below.
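A common refinement of this check is to bracket one character of the pattern so that the grep process never matches its own command line; this sketch is functionally equivalent to the command above:
# watch 'ps auxwww | grep "[r]efit"'
When the update is finished, the watch output shows no refit processes.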
REFIT SUCCESS running systemctl enable nileHardwareManager
enable docker
=========================================
systemctl enable docker
REFIT SUCCESS running systemctl enable docker
enable nan
=========================================
systemctl enable nan
REFIT SUCCESS running systemctl enable nan
Restart network services for upgrades from earlier versions; run the following command (wickedd below is not a typo):
# viprexec refit restartserviceslist wickedd wicked nan
Disable and validate that master has been configured to not allow PXE boot
Note: Execute the following section if the nodes being updated were at 2.0.0 HF3 or higher before
applying the update, otherwise skip to Exit maintmode to restart all containers on all Nodes.
1. [ ] Determine which node is the master node. SSH to any node and type the following:
# ssh master
2. [ ] Run
# refit disablenanpxe
3. [ ] Run the following commands as validations to ensure that the NAN PXE has been disabled:
a. Run the following command and verify that it returns no:
# getrackinfo -p RackInstallServer
b. Run the following command:
# egrep 'tftp|pxe' /etc/dnsmasq.d/private_notftp
Verify it returns:
dhcp-boot=net:priv,pxelinux.0
#enable-tftp
#tftp-root=/srv/tftpboot
4. [ ] Make sure all the private MACs will be ignored by any rogue PXE server by setting up a proper ignore list. Run:
# for mac in $(getrackinfo -v | egrep "private[ ]+:"|awk '{print $3}'); do
setrackinfo --installer-ignore-mac $mac ; done
5. [ ] Run:
# getrackinfo -i
The command should return output similar to the following. Validate that the number of MAC addresses returned matches the node count; you do not need to validate the addresses, just the count. A one-line tally is sketched after the example output:
Rack Installer Status
=====================
Mac Name Port Ip Status
00:1e:67:96:3e:59 provo 1 none Done!
00:1e:67:96:40:75 sandy 2 none Done!
00:1e:67:96:40:1b orem 3 none Done!
00:1e:67:96:40:2f ogden 4 none Done!
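To tally the entries instead of counting by eye, the Done! lines can be counted; this is an informal sketch that assumes the status column format shown above:
# getrackinfo -i | grep -c 'Done!'
The number printed should equal the node count (4 on a 4-node system).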
6. [ ] Run:
# viprexec cat /etc/dnsmasq.dhcpignore/all | sort | uniq -c
It should return output similar to the following:
4
4 00:1e:67:96:3e:59,ignore # (port 1) provo
4 00:1e:67:96:40:1b,ignore # (port 3) orem
4 00:1e:67:96:40:2f,ignore # (port 4) ogden
4 00:1e:67:96:40:75,ignore # (port 2) sandy
1 Output from host : 192.168.219.1
1 Output from host : 192.168.219.2
1 Output from host : 192.168.219.3
1 Output from host : 192.168.219.4
1. [ ] SSH to node 1, and run the following command:
# refit ipmipower_all_not_node_x <starting_node> <ending_node> <node_1> off
Where
<starting_node> = The first node in the range. [2]
<ending_node> = The last node in the range. [4] or [8]
<node_1> = The node NOT being powered down. [1]
For example, on a 4-node system:
# refit ipmipower_all_not_node_x 2 4 1 off
You must hit <enter> to confirm the command. It does not continue until you respond.
Validate power state on all nodes "Chassis Power is off" except Node 1
Note: Execute the following section if the nodes being updated were at 2.0.0 HF3 or higher before
applying the update, otherwise skip to Exit maintmode to restart all containers on all Nodes.
1. [ ] Run the following command:
# refit ipmipower_all_not_node_x <starting_node> <ending_node> <node_1> status
Where:
<starting_node> = The first node in the range. [2]
<ending_node> = The last node in the range. [4] or [8]
<node_1> = The node NOT being checked. [1]
For example, on a 4-node system:
# refit ipmipower_all_not_node_x 2 4 1 status
You must hit <enter> to confirm the command. It does not continue until you respond.
Validate that master has been configured to not allow PXE boot.
Note: Execute the following section if the nodes being updated were at 2.0.0 HF3 or higher before
applying the update, otherwise skip to Exit maintmode to restart all containers on all Nodes.
The IPMI power down of the peers makes node1 the master. Perform these check(s) before powering the
peers up.
1. [ ] Run the following command and verify that it returns the same output shown in the earlier PXE validation:
# egrep 'tftp|pxe' /etc/dnsmasq.d/private_notftp
2. [ ] Make sure all the private MACs will be ignored by any rogue PXE server by setting up a proper ignore list. Run:
# for mac in $(getrackinfo -v | egrep "private[ ]+:"|awk '{print $3}'); do
setrackinfo --installer-ignore-mac $mac ; done
3. [ ] Run:
# getrackinfo -i
The command should return output similar to the following. Validate that the number of MAC addresses returned matches the node count; you do not need to validate the addresses, just the count:
Rack Installer Status
=====================
Mac Name Port Ip Status
00:1e:67:96:3e:59 provo 1 none Done!
00:1e:67:96:40:75 sandy 2 none Done!
00:1e:67:96:40:1b orem 3 none Done!
00:1e:67:96:40:2f ogden 4 none Done!
1. [ ] Run the following command to power the peer nodes back on:
# refit ipmipower_all_not_node_x <starting_node> <ending_node> <node_1> on
Where:
<starting_node> = The first node in the range. [2]
<ending_node> = The last node in the range. [4] or [8]
<node_1> = The node NOT being powered up. [1]
For example, on a 4-node system:
# refit ipmipower_all_not_node_x 2 4 1 on
You must hit <enter> to confirm the command. It does not continue until you respond.
Validate power state on all nodes "Chassis Power is on" except Node 1
Note: Execute the following section if the nodes being updated were at 2.0.0 HF3 or higher before
applying the update, otherwise skip to Exit maintmode to restart all containers on all Nodes.
1. [ ] Run the following command:
# refit ipmipower_all_not_node_x <starting_node> <ending_node> <node_1> status
Where:
<starting_node> = The first node in the range. [2]
<ending_node> = The last node in the range. [4] or [8]
<node_1> = The node NOT being checked. [1]
For example, on a 4-node system:
# refit ipmipower_all_not_node_x 2 4 1 status
You must hit <enter> to confirm the command. It does not continue until you respond.
Power Down Node 1 and validate that master has been configured to not allow PXE boot
Note: Execute the following section if the nodes being updated were at 2.0.0 HF3 or higher before
applying the update, otherwise skip to Exit maintmode to restart all containers on all Nodes.
1. [ ] Using PuTTY or a similar tool, SSH to Node 2, and power down Node 1:
# ssh <Node 1> 'shutdown -h now'
You must hit <enter> to confirm the command. It does not continue until you respond.
3. [ ] If Node 1 is not in the "off" state, use this command to force it:
# refit ipmipower_node_x 1 off
You must hit <enter> to confirm the command. It does not continue until you respond.
4. [ ] If you forced Node 1 into the "off" state, use this command to validate the status:
# refit ipmipower_node_x 1 status
You must hit <enter> to confirm the command. It does not continue until you respond.
Validate that master has been configured to not allow PXE boot
Note: Execute the following section if the nodes being updated were at 2.0.0 HF3 or higher before
applying the update, otherwise skip to Exit maintmode to restart all containers on all Nodes.
1. [ ] Determine the master node. SSH to any node and run the following command:
# ssh master
2. [ ] On the master node, run the following command and verify that it returns no:
# getrackinfo -p RackInstallServer
4. [ ] Make sure all the private MACs will be ignored by any rogue PXE server by setting up a proper ignore list. Run:
# for mac in $(getrackinfo -v | egrep "private[ ]+:"|awk '{print $3}'); do
setrackinfo --installer-ignore-mac $mac ; done
5. [ ] Run:
# getrackinfo -i
The command should return output similar to the following. Validate that the number of MAC addresses returned matches the node count; you do not need to validate the addresses, just the count:
Rack Installer Status
=====================
Mac Name Port Ip Status
00:1e:67:96:3e:59 provo 1 none Done!
00:1e:67:96:40:75 sandy 2 none Done!
00:1e:67:96:40:1b orem 3 none Done!
00:1e:67:96:40:2f ogden 4 none Done!
6. [ ] Run:
# viprexec cat /etc/dnsmasq.dhcpignore/all | sort | uniq -c
1. [ ] Power on Node 1:
# refit ipmipower_node_x <node_1> on
You must hit <enter> to confirm the command. It does not continue until you respond.
2. [ ] Validate that the power state of Node 1 is "Chassis Power is on":
# refit ipmipower_node_x <node_1> status
You must hit <enter> to confirm the command. It does not continue until you respond.
Exit maintmode to restart all containers on all Nodes
1. [ ] If docker is not running on all nodes, start it by running the following command:
# viprexec systemctl start docker
2. [ ] Verify docker was started on all nodes by running the following command (a one-word-per-node alternative is sketched below):
# viprexec systemctl status docker | grep active
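systemctl can also report the unit state in a single word per node; the following is a convenience sketch, not a required step, and should print active once for each node if docker started everywhere:
# viprexec systemctl is-active docker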
4. [ ] Run the following command and verify that the MODE for all nodes is reported as ACTIVE and
not LOCKDOWN:
Example output:
Output from host : 192.168.219.4
{"mode":"ACTIVE","status":"OK","etag":573}
2. [ ] Save the post-update OS information on all nodes to create an audit record that indicates an upgrade was performed on each node.
# viprexec refit postupdateversions
3. [ ] Run the following command to display the logs that record the old and new versions (a way to compare them directly is sketched below).
# viprexec 'ls -alrt /var/tmp/*updateversions*.log'
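To compare the recorded versions rather than just list the files, the newest pre-update and post-update logs can be diffed on a node. This is an informal sketch; the preupdateversions and postupdateversions name fragments are assumptions based on the log names and the refit postupdateversions command referenced in this procedure:
# diff $(ls -t /var/tmp/*preupdateversions*.log | head -1) $(ls -t /var/tmp/*postupdateversions*.log | head -1)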
Check that the object service initialization process is complete
The first time you run the check, you might not see an entry for <unready_dt_num>. If that is true, wait a few minutes and rerun it. Do not proceed to the next step until you see the entry <unready_dt_num>0</unready_dt_num> in the output.
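One way to perform this check is to query the node's diagnostic service. The endpoint below is an assumption based on typical ECS deployments (diagnostic service on port 9101, DT initialization status at /stats/dt/DTInitStat); treat it as a sketch rather than the official step:
# curl -s "http://localhost:9101/stats/dt/DTInitStat" | grep unready_dt_num
Wait until the output contains <unready_dt_num>0</unready_dt_num> before continuing.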
Validate the data path is operational
1. [ ] Start the S3 Browser, and set up an account for the ECS Appliance with the following specifications:
Option              Description
Storage Type        Select S3 Compatible Storage
REST Endpoint       The IP address of one of the ECS Appliance nodes, using port 9020 or 9021. For example: 198.51.100.244:9021
Access Key ID       Enter ecsuser
Secret Access Key   The Object Secret Access Key
2. [ ] You should see the bucket you created using the ECS Portal.
3. [ ] Use the S3 Browser to upload the test file from your laptop to verify that you are able to write to the appliance. A command-line alternative is sketched below.
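If a command-line check is preferred over the S3 Browser GUI, the same write test can be run with any S3-compatible client. The sketch below uses the AWS CLI; the bucket name, file name, and endpoint are illustrative, and the credentials match the account created above:
# export AWS_ACCESS_KEY_ID=ecsuser
# export AWS_SECRET_ACCESS_KEY=<Object Secret Access Key>
# aws s3 cp testfile.txt s3://testbucket/ --endpoint-url http://198.51.100.244:9020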
Troubleshooting
If node does not power on cleanly
1. [ ] Log in to the Remote Management Module (RMM).
2. [ ] Launch the virtual console.
3. [ ] If the node hangs in the boot process, reset power using the RMM.
If node does not see DAE disks
1. [ ] Check the power status of the node:
# refit ipmipower_node_x <update_node_x> status
Where:
<update_node_x> = The number of the Node being updated. [1-8]
For example:
# refit ipmipower_node_x 4 status