Cluster Installation Trouble Shooting


XtremIO

Installation and cluster creation
XtremIO Technical Services Team
Presenter: Yejing Yang

EMC CONFIDENTIAL—INTERNAL USE ONLY 1


Presentation Agenda:
• Introduction

• Installing XtremApp

• Initializing the Cluster’s Services

• Support Documentation

XtremApp 4.0



Introduction



Cluster Installation
Introduction – Bring-Up Check-List

• Power on all hardware
• Configure the Storage Controllers’ IP addresses
• Deploy and configure the XMS’s IP address (if required)
• Upgrade the XtremApp software and firmware versions
• Initialize cluster services
• Perform post-install configuration



Cluster installation
Introduction – Bring-Up Flow
• New XMS Appliance: Deploy OVA (Virtual) or Power On (Physical) -> XMS Configuration -> XtremApp Installation
• Existing XMS Appliance: Already powered on -> Already configured -> Already runs current code
• New Storage Controllers: Power On -> Storage Controller Configuration -> XtremApp Installation
• Then, on the XMS: Cluster Service Initialization -> Post Install Tasks



Installing XtremApp



Cluster installation
Installing XtremApp – The XMS Upgrade Process

• The XMS software installation process is:
  – Upload the XtremApp software package to the XMS machine:
    – Log in to the XMS as the xmsupload user (use the Linux/Mac scp command, or WinSCP on Windows)
    – Transfer the XtremApp code file to the XMS under /images
  – Use the Easy-Install menu to install XtremApp:
    – Log in to the XMS using the xinstall user credentials
    – In the startup menu, select Perform XMS installation only
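
The upload step can be sketched in Python as follows. This is only an illustration, not part of the XtremIO tooling: the helper name build_upload_command and the host name xms.example.com are hypothetical. Only the xmsupload user, the /images target and the use of scp come from the slide.

```python
def build_upload_command(package_path, xms_host, user="xmsupload", dest_dir="/images"):
    """Build the scp argument list for copying a package to the XMS.

    The user and destination defaults follow the slide; everything else
    (function name, parameters) is a hypothetical helper for illustration.
    """
    return ["scp", package_path, f"{user}@{xms_host}:{dest_dir}/"]


# Example: the file name is the placeholder used in the slides,
# the host name is made up.
command = build_upload_command("upgrade-to-4.0.2-xx.tgz", "xms.example.com")
print(" ".join(command))
```

The list form can be handed to subprocess.run(command, check=True) on a machine that has scp installed; scp then prompts for the xmsupload password.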



Cluster Installation
Installing XtremApp – Upgrading XMS XtremApp Software
login as: xinstall
xinstall@XMS's password:

XtremIO install interface


Checking XMS health
XMS health check passed

...
(Image location: /var/lib/xms/images when logged in as root, /images when logged in as xmsupload)
Install menu
-------------------------------------
1. Configuration
2. Check configuration
3. Display configuration
4. Display installed XtremApp Version
5. Perform XMS installation only
...
99. Exit

> 5
Enter Installation image filename:
> upgrade-to-4.0.2-xx.tgz

Installing XMS
...



Cluster Installation
Installing XtremApp – The Storage Controller Upgrade Process

• Before installing XtremApp on the Storage Controllers, verify connectivity
• The Storage Controller software installation process is:
  – Use the Easy-Install menu to install XtremApp:
    – Log in to the XMS using the xinstall user credentials
    – In the startup menu, select Install storage controllers only



SC – Verify the connectivity
Log in as xinstall, then choose 3 – Check cluster setup from the menu



SC xtrem-app Installation
Installing XtremApp – The Storage Controller Upgrade Process
login as: xinstall
xinstall@XMS's password:

XtremIO install interface


Checking XMS health
XMS health check passed

...
Install menu
-------------------------------------
1. Configure XMS
2. Check XMS configuration
3. Display XMS configuration
4. Display XMS version
5. Install XMS only
6. Install storage controllers only
7. Configure ESRS IP Client
...
15. Restore XMS data
16. Disable root ssh access
99. Exit install menu

> 6
Please enter management Storage Controller
> <X1-SC1-MGMT-IP-ADDR>
Please enter expected number of bricks:
> 2
Please enter installation image filename:
> upgrade-to-4.0.2-xx.tgz

Running: /xtremapp/utils/fresh_install.py ...



The Installation Processes
1. Check that the installation package exists

2. Connect to the SC IP that was entered and discover all SC IPs over IB

3. Make sure the number of discovered SCs equals bricks*2

4. Get the cluster name

5. Make sure the SC IPs are accessible

6. Make sure there is enough space on the SCs



The Installation Processes
7. Transfer the installation file to SCs

8. Upgrade SC firmware

9. Burn the WWNN and WWPN

10. Reformat SCs

11. Reboot SCs

12. Start xtremapps



Issue – Failed with "Cannot connect SC IP" (Step 2)
Running: /xtremapp/utils/fresh_install.py …
Waiting for storage controllers installation
Failed to install
Cannot connect SC IP

Root Cause:
• The XMS connects to port 11111 on the management SC IP to discover all SC IPs
• If the process on the management SC is not running, the installation fails because the XMS cannot connect to that port

Resolution:
• SSH to all SCs, then run the command “xtremapp-restart” to restart the processes
• Or power cycle the SC
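
Before restarting anything, the port check described above can be reproduced from any machine that has Python and network access to the SC. This is a minimal sketch using only the standard socket module. The function name can_connect is hypothetical; port 11111 is the value given in the slide.

```python
import socket


def can_connect(host, port=11111, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 11111 is the discovery port named in the slide; the helper itself
    is an illustration, not an XtremIO utility.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, no route to host and timeouts.
        return False
```

A False result for the management SC IP is consistent with either cause listed above: the SC is unreachable, or nothing is listening on the port.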



Issue – Additional IP required (Step 3)
Running: /xtremapp/utils/fresh_install.py …
Waiting for storage controllers installation
In order to proceed 2 more storage controllers ips are required
Do you wish to provide ip manually?(Yes/No)

Root Cause:
• The XMS discovers all SC IPs over IB, then checks that the number of discovered SC IPs equals bricks*2 (each brick has 2 SCs)
• If the SCs run different versions, or the IB network has a connection issue, the number of discovered IPs may be lower than expected

Resolution:
• Make sure you entered the correct number of bricks
• Make sure the IB cables are connected
• Make sure all SCs are at least on the 4.0-49 base image
• Manually enter the SC IPs if the script does not discover them
This option is an enhancement in 4.0. In previous releases, you have to reimage all SCs to the base image
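
The count check behind this prompt is simple arithmetic. The sketch below restates it under the rule from the slide (2 SCs per brick). The function name and the sample IP addresses are hypothetical.

```python
from typing import Iterable

SCS_PER_BRICK = 2  # from the slide: each brick has 2 SCs


def missing_sc_count(expected_bricks: int, discovered_ips: Iterable[str]) -> int:
    """Return how many SC IPs are still needed to reach bricks * 2."""
    expected = expected_bricks * SCS_PER_BRICK
    found = len(set(discovered_ips))  # ignore duplicate addresses
    return max(0, expected - found)


# Two bricks expected, only two SCs discovered over IB (made-up addresses):
print(missing_sc_count(2, ["10.0.0.11", "10.0.0.12"]))
```

With 2 bricks entered and 2 SCs discovered, the result is 2, which matches the "2 more storage controllers ips are required" prompt shown above.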



Issue – Cluster PSNT required (Step 4)
You are attempting to install Storage Controllers which are part of a configured cluster:
Cluster Name: xbrick25
Cluster Version: 4.0.4-xxx
WARNING: Proceeding with the installation will PERMANENTLY ERASE …..
Please type the cluster's PSNT if you want to proceed.

Root Cause:
• This is a protection mechanism introduced in 4.0
• The XMS reads the cluster name from the existing management SC; if it can get this information, the XMS assumes this is a working cluster with data
• The XMS asks you to enter the PSNT to make sure you DO NOT ERASE a production cluster

Resolution:
Double-check the PSNT you have and enter it. If it is correct, the installation continues; otherwise it aborts



Issue – Unable to determine cluster state (Step 4)
Running: /xtremapp/utils/fresh_install.py 10.17.1.30 6 ….
Waiting for Storage Controllers installation; found 12 Storage Controllers.
Unable to determine cluster state. Aborting installation

Root Cause:
The XMS is not able to detect the cluster state

Resolution:
1. Reset the cluster state to the default with the following steps:
• SSH to all SCs
• #xtremapp-reformat
• #xtremapp-restart
2. Or reimage the SCs and configure them again



Issue – Failed with "command df /var/common failed" (Steps 5 & 6)
Running: /xtremapp/utils/fresh_install.py …
Waiting for storage controllers installation, found 8 storage controllers
Failed to install
Upgrade utility failed with command df /var/common failed

Root Cause:
• To find all SCs, the XMS connects to the management SC (X1-SC1) and then discovers the other SCs over the IB network
• The XMS then checks the space on the SCs by connecting to each SC with SSH over Ethernet
• If it cannot SSH to one of the SCs, it fails with this error

Resolution:
Try to SSH to each SC from the XMS to see whether it is reachable, then troubleshoot with the customer’s network admin



Issue – Failed to copy package (Step 7)
Root Cause:
The XMS copies the installation package to each SC with scp over the management IP. The file is larger than 1 GB, so the copy needs a stable network connection; if the network is unstable, the copy can fail.

Resolution:
• Connect to the XMS and all SCs with SSH, then run “ethtool eth0” and make sure the speed is 1 Gb/s and the duplex mode is Full. If not:
a. Check the setting on the switch
b. Replace the cable
• SCP the installation file from the XMS to each SC to find which one is slow
#scp /var/lib/xms/images/upgrade-to-4.0.2-80.tar [email protected]:/var/common

Resolve this issue with customer’s network admin
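
When many SCs have to be checked, the speed and duplex fields can be pulled out of the ethtool output with a few lines of Python. This is a sketch under stated assumptions: it expects the usual "Speed: 1000Mb/s" and "Duplex: Full" lines that ethtool prints, the function names are hypothetical, and the sample text is illustrative rather than captured from an XtremIO system.

```python
import re


def parse_ethtool(output):
    """Extract the Speed and Duplex fields from `ethtool <iface>` output."""
    fields = {}
    for line in output.splitlines():
        match = re.match(r"\s*(Speed|Duplex):\s*(\S+)", line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def link_is_healthy(output, want_speed="1000Mb/s", want_duplex="Full"):
    """True if the link negotiated the expected speed and duplex mode."""
    fields = parse_ethtool(output)
    return fields.get("Speed") == want_speed and fields.get("Duplex") == want_duplex


# Illustrative sample, not real output from an SC:
sample = "Settings for eth0:\n\tSpeed: 100Mb/s\n\tDuplex: Half\n"
print(parse_ethtool(sample), link_is_healthy(sample))
```

A link that negotiated 100Mb/s or half duplex is a candidate for the switch-setting and cable checks listed above.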



Issue – Password required during installation (Step 7)
Running: /xtremapp/utils/fresh_install.py …
Waiting for storage controllers installation; found 4 Storage Controllers
Password:
Password:

Root Cause:
The XMS copies the installation files to the SCs with scp, so a password prompt can appear in these scenarios:
• DNS is configured on the XMS, and a server named “none” exists in the customer’s domain and accepts SSH
• A server already exists in the customer’s network with the same IP address as the SC

Resolution:
• Disable the DNS setting on the XMS server
• SSH to the SC IP with the default username and password; if that fails, check for an IP conflict within the customer’s network



Issue – Failed with command rm -rf /var… (Step 8)
Waiting for storage controllers installation, found 2 storage controllers
Copying upgrade file to all storage controllers : [SC1 IP], [SC2 IP]
Failed to install
Upgrade utility failed with command rm -rf /var/tmp/upgrade-pkg ; …

Root Cause:
• Once the XMS has transferred the installation file to all SCs, it extracts the file on each SC and runs the scripts from the package
• The file can become corrupted during the installation, in which case some of the scripts may fail to run

Resolution:
• Log in to the XMS and check the MD5 checksum of the installation file, e.g.
#md5sum /var/lib/xms/images/upgrade-to-4.0.2-80.tar
302c797636385eedc91237b74f00c98b /var/lib/xms/images/upgrade-to-4.0.2-80.tar
• Make sure it is the same value as on download.emc.com



Cluster creation
Initializing Cluster Services – Launching the Cluster Creation
login as: xmsadmin
[email protected]'s password:
Last login: Tue Sep 24 02:39:26 2013 from isenbuchmol1c.corp.emc.com
Username: tech
Password:
Connect XMS on 10.245.160.12:443: 42502: version 4.0.0 build xxx
xmcli (tech)> create-cluster sc-mgr-host="10.245.160.26" expected-number-of-bricks=2 cluster-name="hopfsa_xbrick2_3"

Creating Cluster "hopfsa_xbrick2_3" ...


Cluster-Name Index Storage-Controller-Name Index Mgr-Addr Brick-Name Index
hopfsa_xbrick2_3 1 X1-SC1 1 10.245.160.26 X1 1
hopfsa_xbrick2_3 1 X1-SC2 2 10.245.160.28 X1 1
hopfsa_xbrick2_3 1 X2-SC1 3 10.245.160.32 X2 2
hopfsa_xbrick2_3 1 X2-SC2 4 10.245.160.34 X2 2
[- ] 0% (elapsed time 00:00:01)

[/ ] 0% Reformatting storage controllers (elapsed time 00:00:21)

[***************/ ] 30% Launching modules (elapsed time 00:03:55)

[**************************| ] 53% Activating the modules (elapsed time 00:04:38)

[******************************************- ] 84% SSD firmware upgrade if needed (elapsed time 00:06:14)

[***************************************************] 100% Done! (elapsed time 00:06:44)

Cluster hopfsa_xbrick2_3 [1] created



The Creation Processes
1. Discover and add SCs by management IP

2. Discover and add IB ports of SCs

3. Discover and add DAE SSDs

4. Discover and add target ports (fc and iscsi)

5. Discover and add DAE/DAE components (LCC, PSU)

6. Discover and add IB SW and IB PSU



The Creation Processes
7. Discover and add SCs local disks and PSUs

8. Discover and add BBUs

9. Write the cluster PSNT and initialize (activate) the system

10. Change the default root password for XMS and SC

11. BBU test

* Key log file: messages/xms.log in XMS server
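
Since xms.log is the key file for the issues that follow, a small filter for its WARNING and ERROR entries can save time. This is a sketch only: the line layout (timestamp, logger, level, source, message, separated by " - ") is inferred from the xms.log excerpts in this deck, and the names LogEntry and problem_entries are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Layout inferred from the excerpts in this deck, e.g.
# 2015-07-07 13:58:10,546 - XmsLogger - ERROR - executor::...:1342 - message
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<logger>\S+) - (?P<level>[A-Z]+) - (?P<source>\S+) - (?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


def problem_entries(lines: Iterable[str], levels=("WARNING", "ERROR")) -> List[LogEntry]:
    """Return the entries whose level is in `levels`; other lines are skipped."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            hits.append(LogEntry(match.group("timestamp"), match.group("level"),
                                 match.group("source"), match.group("message")))
    return hits
```

Lines that do not start with a timestamp, such as wrapped continuation lines, are ignored by this sketch.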



Issue – Connection refused
Proceed to create cluster? (Yes/No): Yes
Creating Cluster “xxxx" ...
XMS Completion Code: Connection refused

Root Cause:
As with the XtremApp installation, the XMS connects to the management SC IP. The failure may be reported if the SC is:
• Not reachable from the XMS
• Or not running the XtremApp process

Resolution:
• Verify that you can SSH from the XMS to the management IP of each SC
• Run “xtremapp-restart” or reboot the SC to restart the XtremApp processes



Issues – general hardware errors
In most cases, the XMS reports the failure found during hardware discovery, such as:
1. SSD issue
2. BBU issue
3. IB issue
4. etc.

Root Cause:
• The installation discovers all hardware components, verifies their connections, and adds them to the SYM database; if anything unexpected is found, the XMS reports an error
• The detailed error message is included in xms.log
Resolution:
Follow the error message and run the following commands on all SCs as appropriate
• Check BBUs
#upsc eaton1550 |grep ser

– In a single brick, the two SCs should report different serial numbers, one for each of the two BBUs; if not, identify the faulty cable, BBU or SC by replacing and swapping parts
– In multiple bricks, verify that both SCs in the same brick report the same serial number; if not, check the COM cable connection. If no serial number is reported, check whether the part has the wrong part number



Issues – general hardware errors
• Check SSDs
#lsscsi |wc -l

– In a starter brick, the number is 32 = 4 (local disks) + 2 (LCC) + 13 (SSDs) * 2
– In other bricks, the number is 56 = 4 + 2 + 25 * 2
– In some cases an SSD is detected but its detailed information, such as the WWN, cannot be read; check xms.log to find the faulty SSD
• Check IB
#ibstat |grep Rate

The rate should be 40G for every port, and you should see 2 lines (one per port)
• Check the Fibre Channel card
#python -O /xtremapp/utils/qla_wwn.pyo -g

You should see 2 WWNNs and 2 WWPNs for each SC
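
The expected lsscsi line count follows directly from the formula given for the SSD check. The sketch below restates that arithmetic. The constant and function names are hypothetical, and the slide does not say why each SSD counts twice, so the factor is simply taken as given.

```python
LOCAL_DISKS = 4      # local disks, from the slide
LCC_ENTRIES = 2      # LCC entries, from the slide
ENTRIES_PER_SSD = 2  # the "* 2" factor in the slide's formula


def expected_lsscsi_lines(ssd_count):
    """Expected output of `lsscsi | wc -l` for a brick with ssd_count SSDs."""
    return LOCAL_DISKS + LCC_ENTRIES + ssd_count * ENTRIES_PER_SSD


print(expected_lsscsi_lines(13))  # starter brick
print(expected_lsscsi_lines(25))  # other bricks
```

A count lower than the expected value suggests that the SC does not see every SSD.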



Issue – sys_state_failure due to SSD
Proceed to create cluster? (Yes/No): Yes
Creating Cluster “xxxx" ...
[######## ] 38% Failed (elapsed time 00:06:30)
XMS Completion Code: sys_state is failed

Root Cause:
• This is a general message for a hardware failure; the example here is an SSD failure
• The detailed error message is included in xms.log, as follows
Wrong: 2016-04-29 18:59:56,197 - XmsLogger - INFO - mom::check_property_change_events:813 - txda1xio02: txda1xio02:
Threshold crossing of SSD very high utilization; value changed to healthy, There are 0 KB remaining.

Correct: 2016-01-01 15:37:14,886 - XmsLogger - INFO - mom::check_property_change_events:813 - XtremIO_91: XtremIO_91:
Threshold crossing of SSD high utilization threshold; value changed to healthy, There are 8146708712 KB remaining

Resolution:
1. Run “lsscsi |wc -l” to check that the number of SSDs is as expected
2. Ask the CE to verify that all SSD and LCC lights are good
3. Ask the CE to power cycle the DAE if the SC cannot see all disks
4. Create the cluster again



Issue – sys_state_failure due to memory
Proceed to create cluster? (Yes/No): Yes
Creating Cluster “xxxx" ...
[######## ] 46% Failed (elapsed time 00:16:30)
XMS Completion Code: sys_state is failed

Root Cause:
• This is a general message for a hardware failure during initialization; in this case it is caused by a memory issue
• The detailed error message is included in xms.log, as follows
Wrong:

Correct: 2016-01-01 15:37:14,962 - XmsLogger - INFO - mom::check_property_change_events:813 - XtremIO_91: XtremIO_91:
The shared memory utilization state changed from None to healthy. Shared memory utilization is 5.7810647547 percent of the
shared memory pool with significant memory that may be recovered.

Resolution:
1. Run the “free” command to check the SC memory: GEN2 has 256G and GEN3 has 512G
2. This issue always occurs on GEN3. On GEN2, an SC whose memory is not 256G cannot boot up
3. Replace the SC if it does not have 512G



Issue – sys_state_failure due to SAS
Proceed to create cluster? (Yes/No): Yes
Creating Cluster “xxxx" ...
[######## ] 38% Failed (elapsed time 00:06:30)
XMS Completion Code: sys_state is failed

Root Cause:
• This is a general message for a hardware failure; in this case it is caused by a SAS cable
• The detailed error message is included in xms.log
Wrong: 2015-12-29 21:23:52,431 - XmsLogger - WARNING - executor::<lambda>:9576 - 172.22.185.127 reformat error:
xtremapp-reformat[229417]: found 50 drives

2015-12-29 21:23:52,442 - XmsLogger - WARNING - executor::<lambda>:9576 - 172.22.185.127 reformat error:
xtremapp-reformat[229422]: WARNING: Not all SAS phys are up, expected 8 but got 5, system might not work properly
-----> The issue is related to SAS connections

Resolution:
1. Run “/xtremapp/utils/mpt_status |grep "link speed"” to check the SAS connections, for example
[root@XtremIO_91-x1-n1 utils]# /xtremapp/utils/mpt_status |grep "link speed"
Negotiated link speed: aa
Negotiated link speed: aa
Negotiated link speed: aa



Issue – sys_activation_failure due to physical xms
[# ] 0% Failed! (elapsed time 00:05:08)
*** XMS Completion Code: sys_activation_error

Root Cause:
• The XMS brings eth0 down and then back up during cluster creation
• On a physical XMS this can take longer, long enough for the XMS to lose its connection to the SC
2015-07-07 13:58:03,686 - XmsLogger - INFO - executor::_expand_cluster:11157 - Added slot info for slot 24
2015-07-07 13:58:10,546 - XmsLogger - ERROR - executor::poll_nodes_for_keepalive_in_thread:1342 - System error: [Errno
113] No route to host
2015-07-07 13:58:10,547 - XmsLogger - ERROR - executor::poll_nodes_for_keepalive_in_thread:1345 - Keep Alive Error for
Node X1-SC1 [1]: [Errno 113] No route to host

Resolution:
1. Use a virtual XMS
2. Or comment out the following line (shown here already commented out) in the installation script /xtremapp/bin/network_config.py on the XMS
#ifdown_poor_simulator("eth0") # Disable the DHCP brought interface
system("ip -f %s addr add %s/%s dev eth0 ; ip link set eth0 up" % (family, ip, cidr))
system("ip -f %s route del default" % family)



Issue – sys_activation_failure due to version mismatch
[# ] 0% Failed! (elapsed time 00:05:08)
*** XMS Completion Code: sys_activation_error

Root Cause:
Starting with 4.0, in a multiple-cluster environment, a higher-version XMS (4.0.2) can be used to manage a lower-version XtremIO cluster (4.0.1), but creating a cluster in such a combination is not supported

Resolution:
1. Deploy a new XMS with the same version as the SCs
2. Or upgrade the SCs to the same version as the XMS



Issue – sys_state_failure due to SAS (continued)
2. All link speeds should be “aa”, which means the connection is good
3. Reseat the SAS cable if you see any “00” or “11”
4. Power cycle the DAE
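
The "aa" check can be automated over the mpt_status output. This sketch assumes the "Negotiated link speed: xx" line format shown in the SAS slides. It takes 8 as the expected number of phys only because that is the figure in the "expected 8 but got 5" log line, and the function names are hypothetical.

```python
import re

EXPECTED_PHYS = 8   # assumption, taken from the "expected 8 but got 5" log line
GOOD_SPEED = "aa"   # per the slide, "aa" means the connection is good


def sas_link_speeds(output):
    """Return every negotiated link speed value found in mpt_status output."""
    return re.findall(r"Negotiated link speed:\s*(\S+)", output)


def sas_links_ok(output, expected_phys=EXPECTED_PHYS):
    """True if exactly expected_phys links are reported and all are 'aa'."""
    speeds = sas_link_speeds(output)
    return len(speeds) == expected_phys and all(s == GOOD_SPEED for s in speeds)
```

Any value other than "aa" in sas_link_speeds() points to a cable that should be reseated.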



Issue – Creation fails due to PSNT mismatch
xmcli (tech)>create-cluster expected-number-of-bricks=2 cluster-name="EMC-XtremIO-1" sc-mgr-host="Ip_Addr_node"
Creating Cluster EMC-XtremIO-1....***XMS Completion Code: node_psnt_mismatch

Root Cause:
When creating an expanded cluster from existing SCs plus new SCs, the PSNT of the new SCs may, for some reason, differ from that of the existing SCs

Resolution:
• Log in to the SC and verify the PSNT, e.g.
#/xtremapp/utils/get_psnt.sh
CKM00XXXXXXXXX

• Burn the correct PSNT to the SCs
#/xtremapp/utils/burn_psnt.sh -t CKM00XXXXXXXXX

• Verify the PSNT again with get_psnt.sh



Issue – Creation fails due to unsupported PN
xmcli (tech)> create-cluster expected-number-of-bricks=1 cluster-name="xio_8586" sc-mgr-host="10.242.13.31"
Warning: Storage Controller X1-SC1 part number 100-586-007-00 is unsupported for this Cluster's PSNT part number 100-586-007-00

Root Cause:
This happens when a 3.0.x XtremIO is reinstalled with 4.0.x. The older code does not validate the SC’s part number; 4.0.x does. The part number should be 900-586-xxx, but on some old SCs it is 100-586-xxx and has to be modified manually.
Resolution:
• Log in to the SC and verify the PN, e.g.
#/xtremapp/utils/get_psnt.sh
100-586-007-00
• Burn the correct PN to the SCs
#/xtremapp/utils/burn_psnt.sh -n 900-586-002
• Verify the PN again with get_psnt.sh

Correct PNs
900-586-002 XtremIO HW Gen2 400GB
900-586-003 XtremIO HW Gen2 800GB Encrypt Capbl
900-586-004 XtremIO HW Gen2 400GB Encrypt Capbl
900-586-005 XtremIO HW Gen2 400GB Expandable
900-586-006 XtremIO HW Gen3 40TB
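
The part number rule can be expressed as a short check. This sketch encodes only what the slide states: valid part numbers start with 900-586, and the five listed PNs are the known correct ones. The names CORRECT_PNS, needs_pn_fix and describe_pn are hypothetical.

```python
import re

# Table copied from the "Correct PNs" list in the slide.
CORRECT_PNS = {
    "900-586-002": "XtremIO HW Gen2 400GB",
    "900-586-003": "XtremIO HW Gen2 800GB Encrypt Capbl",
    "900-586-004": "XtremIO HW Gen2 400GB Encrypt Capbl",
    "900-586-005": "XtremIO HW Gen2 400GB Expandable",
    "900-586-006": "XtremIO HW Gen3 40TB",
}

VALID_PREFIX = re.compile(r"^900-586-\d{3}")


def needs_pn_fix(part_number):
    """True if the PN does not have the 900-586-xxx form expected by 4.0.x."""
    return VALID_PREFIX.match(part_number.strip()) is None


def describe_pn(part_number):
    """Return the description for a listed PN, or None if it is not listed."""
    return CORRECT_PNS.get(part_number.strip()[:11])


print(needs_pn_fix("100-586-007-00"), describe_pn("900-586-002"))
```

The check only flags the format; choosing which correct PN to burn still depends on the actual hardware model.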



Issue – cluster-psnt is missing
xmcli (tech)> create-cluster expected-number-of-bricks=1 sc-mgr-host="xx.xx.xx.xx" cluster-name="xxxxxxx"
You are attempting to create cluster "xxxxxx" on an active cluster.
WARNING: Proceeding with this procedure will PERMANENTLY ERASE all data and configurations from this cluster.
Do you want to proceed? (Yes/No): Yes
Cli error: "cluster-psnt" is missing

Root Cause:
• This is a protection mechanism introduced in 4.0.x
• The XMS checks whether the SCs already belong to a cluster; for example, you see this error when an earlier attempt to create this cluster failed
• The XMS asks you to enter the cluster PSNT; please DO NOT ERASE a production cluster

Resolution:
Double-check the PSNT, then run the command as follows
#create-cluster expected-number-of-bricks=1 sc-mgr-host="xx.xx.xx.xx" cluster-name="xxxxx" cluster-psnt="xxxxxx"
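
To avoid typing mistakes in the long xmcli line, the command text can be assembled by a small helper. This sketch only builds the string; it uses the four parameters that appear in this deck (sc-mgr-host, expected-number-of-bricks, cluster-name, cluster-psnt), and the function name is hypothetical.

```python
def build_create_cluster(sc_mgr_host, bricks, cluster_name, cluster_psnt=None):
    """Build the xmcli create-cluster command line as a string.

    cluster-psnt is appended only when a value is supplied, which is the
    case that the "cluster-psnt is missing" error calls for.
    """
    parts = [
        "create-cluster",
        f'sc-mgr-host="{sc_mgr_host}"',
        f"expected-number-of-bricks={bricks}",
        f'cluster-name="{cluster_name}"',
    ]
    if cluster_psnt:
        parts.append(f'cluster-psnt="{cluster_psnt}"')
    return " ".join(parts)


# Placeholder values, as in the slide:
print(build_create_cluster("xx.xx.xx.xx", 1, "xxxxx", cluster_psnt="xxxxxx"))
```

The helper does not validate the PSNT; confirming that the value belongs to a non-production cluster remains a manual step.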



Supporting Documentation



Cluster installation
Supporting Documentation

• Installing and upgrading the cluster:
– XtremIO Software Installation and Upgrade Guide
– XtremIO Installation Summary Form

