Administering Managed Objects in Multicontroller RNC

DN0975566
Issue 09
Approved on 2023-02-13

WCDMA RAN mcRNC, FP23R3

Operating Documentation, Issue 01

© 2023 Nokia. Nokia Condential Information. Use subject to agreed restrictions on disclosure and use.
Nokia is committed to diversity and inclusion. We are continuously reviewing our customer
documentation and consulting with standards bodies to ensure that terminology is inclusive
and aligned with the industry. Our future customer documentation will be updated
accordingly.

This document includes Nokia proprietary and condential information, which may not be
distributed or disclosed to any third parties without the prior written consent of Nokia. This
document is intended for use by Nokia’s customers (“You”/”Your”) in connection with a
product purchased or licensed from any company within Nokia Group of Companies. Use this
document as agreed. You agree to notify Nokia of any errors you may nd in this document;
however, should you elect to use this document for any purpose(s) for which it is not
intended, You understand and warrant that any determinations You may make or actions
You may take will be based upon Your independent judgment and analysis of the content of
this document.

Nokia reserves the right to make changes to this document without notice. At all times, the
controlling version is the one available on Nokia’s site.

No part of this document may be modied.

NO WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
ANY WARRANTY OF AVAILABILITY, ACCURACY, RELIABILITY, TITLE, NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, IS MADE IN RELATION TO THE
CONTENT OF THIS DOCUMENT. IN NO EVENT WILL NOKIA BE LIABLE FOR ANY DAMAGES,
INCLUDING BUT NOT LIMITED TO SPECIAL, DIRECT, INDIRECT, INCIDENTAL OR
CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED TO LOSS OF PROFIT,
REVENUE, BUSINESS INTERRUPTION, BUSINESS OPPORTUNITY OR DATA THAT MAY ARISE
FROM THE USE OF THIS DOCUMENT OR THE INFORMATION IN IT, EVEN IN THE CASE OF
ERRORS IN OR OMISSIONS FROM THIS DOCUMENT OR ITS CONTENT.

Copyright and trademark: Nokia is a registered trademark of Nokia Corporation. Other
product names mentioned in this document may be trademarks of their respective owners.

© 2023 Nokia.

2 © 2023 Nokia. Nokia confidential


Table of Contents

Summary of changes .................................................................................................................... 6

1 Introduction to high availability services .................................................................................. 7


1.1 Managed objects ............................................................................................................... 7
1.2 States of managed objects ............................................................................................. 9
1.2.1 State attributes ................................................................................................... 11
1.2.2 Permitted state attribute combinations and their meaning ....................... 13
1.2.3 Status attributes ................................................................................................. 14
1.2.4 Role attribute ....................................................................................................... 15
1.2.5 Dynamic attributes .............................................................................................. 16
1.3 Redundancy models ....................................................................................................... 19
1.4 Recovery group resources ............................................................................................ 21
1.5 Dependencies between managed objects .................................................................. 22
1.6 Controlled switchover and forced switchover ........................................................... 24
1.7 Fault management process .......................................................................................... 25

2 Introduction to the Functional Unit ........................................................................................ 29


2.1 Redundancy model of functional units ....................................................................... 29
2.2 Functional unit state model .......................................................................................... 30

3 Recovery actions ........................................................................................................................ 33


3.1 Alarms in management operations .............................................................................. 34
3.2 Checking unit working state and status ...................................................................... 36
3.3 Checking and changing the state of a Managed Object ........................................... 48
3.4 Checking dependencies between MOs ........................................................................ 52
3.5 Checking summary of MOs ............................................................................................ 57
3.6 Performing a controlled switchover ............................................................................ 58
3.7 Performing a forced switchover ................................................................................... 62
3.8 Powering off a node ....................................................................................................... 65
3.9 Powering off a BCN module .......................................................................................... 67
3.10 Powering off a cluster .................................................................................................. 70
3.11 Powering on a node ..................................................................................................... 72
3.12 Powering on a BCN module ......................................................................................... 73
3.13 Powering on a cluster .................................................................................................. 74
3.14 Restarting an MO .......................................................................................................... 76
3.15 Restarting a BCN module ............................................................................................ 79

4 Troubleshooting recovery and unit working state administration .................................... 85


4.1 USPU-related MO shutdown fails ................................................................................. 86

5 Appendix: Concept difference between mcRNC and IPA-RNC ........................................... 87


5.1 Recovery system in IPA-RNC ......................................................................................... 87
5.2 HAS in mcRNC .................................................................................................................. 88
5.3 Concept mapping between recovery unit and functional unit ................................ 89



List of Figures
Figure 1 HAS system model and proxy feature ....................................................................... 8
Figure 2 The standard state model for managing resources used in network element .......... 11
Figure 3 Example of smart deployment of active/standby recovery units ...................... 21
Figure 4 Fault management process ...................................................................................... 25
Figure 5 Recovery system model ............................................................................................. 87
Figure 6 HAS system model ...................................................................................................... 88



List of Tables
Table 1 Description of dynamic attributes ............................................................................ 17
Table 2 Mapping between RU and HAS redundancy models ............................................... 30
Table 3 Mapping between the functional unit states and the combined HAS states ............ 32
Table 4 HAS SCLI commands and user permissions ............................................................. 33
Table 5 Show has functional unit commands and user permissions ................................. 33
Table 6 Alarm descriptions ....................................................................................................... 35
Table 7 Generic options for the has command ..................................................................... 60
Table 8 Options for controlled switchover ............................................................................ 61
Table 9 Generic options for the has command ..................................................................... 63
Table 10 Options for forced switchover ................................................................................. 64
Table 11 Parameters for the set hardware power off command ...................................... 68
Table 12 Parameters for the set hardware restart command ........................................... 81



Summary of changes

A list of changes between document issues. You can navigate through the respective changed
topics.

Changes between issues 08E (2020-11-11, WCDMA 19) and 09 (2023-02-13)


Fault management process

Added a line about node isolation under Fault isolation.

Changes between issues 08D (2020-09-30, WCDMA 19) and 08E (2020-11-11, WCDMA 19)
Fault management process

Updated the description for repairing the system.

Powering off a node

Updated the procedure with <node-name>.


Updated the Before you start section in the procedure.

Powering off a cluster

The steps and note related to the power off command are updated.

Restarting a BCN module

Updated the commands in the procedure.

Changes between issues 08C (2020-08-13, WCDMA 19) and 08D (2020-09-30, WCDMA 19)
Powering off a node

Updated the procedure with the step to gracefully shut down.

Updated the procedure with <node-name>.

Restarting a BCN module

Updated the output in the procedure.



1. Introduction to high availability services

The High Availability Services (HAS) supports various administrative and recovery actions to
enhance the availability of the system. The purpose of taking recovery actions is to recover from
failures occurring in the system resources controlled by the HAS.

You can use the SCLI command show has to check the states of the managed objects.

You can use the SCLI command set has to perform the following administrative actions:

locking the managed objects
unlocking the managed objects
gracefully shutting down the managed objects

The recovery actions include:

restarting the managed objects
powering off the nodes
performing a controlled/forced switchover between the recovery units within a recovery group

1.1 Managed objects

The term managed object is one of the basic concepts of the HAS framework. Each resource that
the HAS manages is a managed object (MO). An MO can be a cluster, a node, a recovery group
(RG), a recovery unit (RU) or a process.

Figure 1 shows the HAS system model. It demonstrates that the resources managed by HAS are
hierarchically organized into MOs. Starting from the lowest level, there are processes, recovery
units (RUs), and recovery groups (RGs). In a cluster environment, the highest level is the cluster,
which contains nodes and recovery groups. It is important to remember that when a HAS command
is applied to an MO at a certain level, it automatically applies to all MOs that are located below
this MO in the hierarchy.
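The propagation rule can be illustrated with the following Python sketch. It is a simplified model of the containment described above, not HAS code: the cluster contents, the RG name /QNOMUServer, the process name proc-a, and the helper affected_mos are hypothetical. Because an RU is contained both in its node and in its recovery group, the model uses an explicit parent-to-children mapping instead of relying on name prefixes.

```python
from typing import Dict, List, Set

# Hypothetical containment model: each MO name maps to the MOs directly below it.
# An RU such as /CFPU-0/QNOMUServer-0 is contained in both its node and its RG.
CHILDREN: Dict[str, List[str]] = {
    "/": ["/CFPU-0", "/CFPU-1", "/QNOMUServer"],
    "/CFPU-0": ["/CFPU-0/QNOMUServer-0"],
    "/CFPU-1": ["/CFPU-1/QNOMUServer-1"],
    "/QNOMUServer": ["/CFPU-0/QNOMUServer-0", "/CFPU-1/QNOMUServer-1"],
    "/CFPU-0/QNOMUServer-0": ["/CFPU-0/QNOMUServer-0/proc-a"],
    "/CFPU-1/QNOMUServer-1": ["/CFPU-1/QNOMUServer-1/proc-a"],
}


def affected_mos(target: str) -> Set[str]:
    """Return the target MO and every MO located below it in the hierarchy."""
    affected: Set[str] = set()
    pending = [target]
    while pending:
        mo = pending.pop()
        if mo in affected:
            continue  # an RU is reachable via both its node and its RG
        affected.add(mo)
        pending.extend(CHILDREN.get(mo, []))
    return affected
```

In this model, affected_mos("/CFPU-0") returns the node, its RU, and that RU's process, while affected_mos("/") returns every MO in the cluster.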



Figure 1: HAS system model and proxy feature

The managed objects perform the following roles:

cluster
The cluster is the topmost managed object in the system model. The cluster consists of nodes
and recovery groups. In single node deployment, the operations performed on the cluster MO
affect only the single node. The managed object name of the cluster is the slash character (/).
node
In the context of high availability services, the term node refers to a certain hardware entity
(for example, CFPU, CSPU, EIPU, USPU) or specific resources of a hardware entity, the operating
system (including the network file system and basic messaging), and the HAS software. The MO
name of the node is the slash character followed by the node name, for example:
/USPU-0
recovery group (RG)
A recovery group is a group of identical recovery units and the redundancy policy they obey. In
other words, the recovery group consists of a number of recovery units controlling similar
resources. The MO name of the recovery group is the slash character followed by the recovery
group name, for example:
/Directory
recovery unit (RU)
A recovery unit is a collection of processes that constitute the target of a certain recovery
action, for example, a switchover. The processes all support the same redundancy model. A
recovery unit is the central software entity controlled by the HAS. Since the recovery unit always
runs in a single node, the MO name takes the form /<node_name>/<RU_name>,
for example:
/CFPU-0/QNOMUServer-0



process
In the HAS context, the term process means a process started by the HAS or implementing a
HAS service. A process may or may not be high-availability-aware (HA-aware). The awareness
determines the type of supervision (active or passive) that the HAS applies to the process. Active
supervision is only applied to HA-aware processes, whereas passive supervision is applied to all
processes. Individual processes can be restarted, but apart from that they cannot be managed by
the administrator using HAS SCLI commands. The MO name of the process takes the following
form: /<node_name>/<RU_name>/<process_name>
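The naming pattern can be sketched in Python as follows. This is an illustration only, assuming the pattern described above; the helper names are hypothetical. A one-segment name such as /USPU-0 (a node) and /Directory (a recovery group) cannot be told apart from the name alone, so the classifier needs the list of known node names.

```python
from typing import Optional, Set


def mo_name(node: Optional[str] = None, ru: Optional[str] = None,
            process: Optional[str] = None) -> str:
    """Build a node, RU, or process MO name; no arguments gives the cluster."""
    if node is None and (ru is not None or process is not None):
        raise ValueError("RU and process names require a node name")
    if ru is None and process is not None:
        raise ValueError("a process name requires an RU name")
    parts = [p for p in (node, ru, process) if p is not None]
    return "/" + "/".join(parts)


def mo_level(name: str, known_nodes: Set[str]) -> str:
    """Classify an MO name by its number of path segments."""
    segments = [s for s in name.split("/") if s]
    if not segments:
        return "cluster"
    if len(segments) == 1:
        # /USPU-0 and /Directory look alike; only the node list separates them.
        return "node" if segments[0] in known_nodes else "recovery group"
    if len(segments) == 2:
        return "recovery unit"
    if len(segments) == 3:
        return "process"
    raise ValueError(f"unexpected MO name: {name}")
```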

In addition to the various types of managed objects presented above, the HAS framework can
also be extended to cover proxied components, in other words, hardware or software
components that are separately monitored and managed by proxy processes. A proxy process
can act as a proxy for one or more proxied components, and a proxied component is mediated by
only one operational proxy at a time. The HAS does not monitor the operation of such proxied
components directly.

Since a proxied component is not part of the HAS framework and therefore does not have an MO
name, the distinguished name (DN) of the component must be used to identify the object.
Distinguished names are employed for unambiguously identifying objects in the Configuration
Directory. An example of a distinguished name is:

fsipHostName=USPU-0, fsFragmentId=Nodes, fsFragmentId=HA, fsClusterId=ClusterRoot
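A distinguished name is a comma-separated sequence of attribute=value pairs; in the example above, the most specific component (the host) comes first. The Python sketch below splits the example DN into its components. It assumes that values never contain escaped commas or equals signs, which a complete DN parser would have to handle; the function name is hypothetical.

```python
from typing import List, Tuple


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, keeping the original order."""
    pairs: List[Tuple[str, str]] = []
    # Tolerate a trailing sentence period after the DN.
    for rdn in dn.strip().rstrip(".").split(","):
        attr, sep, value = rdn.strip().partition("=")
        if not sep:
            raise ValueError(f"malformed component: {rdn!r}")
        pairs.append((attr, value))
    return pairs


example = ("fsipHostName=USPU-0, fsFragmentId=Nodes, "
           "fsFragmentId=HA, fsClusterId=ClusterRoot")
```

For the example DN, the first pair is ("fsipHostName", "USPU-0") and the last is ("fsClusterId", "ClusterRoot").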

The HAS framework has been designed to be scalable, so it can support clusters ranging from a
couple of nodes to tens of nodes. High scalability is achieved by limiting cluster-wide decisions
to the recovery unit level. Each node controls and supervises its own processes and can
autonomously execute recovery actions at the process level.

1.2 States of managed objects

The HAS framework follows a standard state model for managing the resources (managed
objects). This state model is recommended in X.731 of the International Telecommunication Union
- Telecommunication Standardization Sector (ITU-T).

According to this model, the managed objects have three main state attributes:
administrative, operational and usage. The model also includes a set of status
attributes that are called alarm, procedural, availability, and unknown. These
additional status attributes provide further information about the main states.



Note:
You can check the administrative, operational, and usage states of any managed
object, but you can only change the administrative state of a managed object. The
operational and usage states are changed automatically by the system.

You can perform the lock, unlock, and shutdown operations to change the administrative
state of the managed object.

As an extension to the standard state model, the platform provides additional status attributes,
such as role.

The three main state attributes and the four status attributes used by the HAS are explained
below. The combinations of different state attributes are also presented, as well as their meaning
from the viewpoint of a managed object (node).

In addition to the above-mentioned attributes, HAS also supports dynamic attributes. A dynamic
attribute is a name and value pair, for example WAITING_SERVICE = /Directory. Both the
HAS and application processes can add additional state information to any MO by adding dynamic
attributes.

Figure 2 illustrates the standard state model for the managed resources.



Figure 2: The standard state model for managing resources used in network element

1.2.1 State attributes

The managed objects have three main state attributes: administrative, operational and usage.

Administrative state
There are three possible values for the administrative state: UNLOCKED, LOCKED, and
SHUTDOWN. The operator can change the administrative state using the set has tool options
LOCK, UNLOCK, and SHUTDOWN.



UNLOCKED
In the UNLOCKED state, the software or hardware entity represented by the managed object
can perform its normal duties.
LOCKED
In the LOCKED state, the entity is administratively prohibited from performing its normal
duties, until explicitly unlocked by the operator.
SHUTDOWN
In the SHUTDOWN state, the entity processes the ongoing services, but must not take on any
new work. After the ongoing service requests are finished, the administrative state
automatically changes to LOCKED. The SHUTDOWN state is an intermediary state that is used
for implementing a graceful shutdown behaviour.
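The administrative state behaviour described above can be modelled with the following Python sketch. It is a toy model under stated assumptions, not the HAS implementation: the class and method names are hypothetical, the count of ongoing requests stands in for whatever work the real entity is processing, and the model does not enforce any transition restrictions the real system may apply.

```python
from enum import Enum


class AdminState(Enum):
    UNLOCKED = "UNLOCKED"
    LOCKED = "LOCKED"
    SHUTDOWN = "SHUTDOWN"


# Operator actions (the set has options) and the administrative state each requests.
ACTIONS = {
    "LOCK": AdminState.LOCKED,
    "UNLOCK": AdminState.UNLOCKED,
    "SHUTDOWN": AdminState.SHUTDOWN,
}


class ManagedObjectModel:
    """Toy model of one MO's administrative state (hypothetical, not HAS code)."""

    def __init__(self, ongoing_requests: int = 0) -> None:
        self.admin = AdminState.UNLOCKED
        self.ongoing_requests = ongoing_requests

    def apply(self, action: str) -> None:
        self.admin = ACTIONS[action]
        self._finish_shutdown_if_idle()

    def request_finished(self) -> None:
        self.ongoing_requests = max(0, self.ongoing_requests - 1)
        self._finish_shutdown_if_idle()

    def accepts_new_work(self) -> bool:
        # Neither SHUTDOWN nor LOCKED takes on new work.
        return self.admin is AdminState.UNLOCKED

    def _finish_shutdown_if_idle(self) -> None:
        # SHUTDOWN is intermediary: it turns into LOCKED once ongoing work is done.
        if self.admin is AdminState.SHUTDOWN and self.ongoing_requests == 0:
            self.admin = AdminState.LOCKED
```

In this model, shutting down an MO that has no ongoing requests moves it directly to LOCKED.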

Operational state
The value of the operational state attribute is either ENABLED or DISABLED. Unlike the
administrative state, the operational state is controlled by the HAS itself, so you cannot change
it.

ENABLED
In the ENABLED state, the entity represented by the managed object is functioning
properly and can perform its duties normally.
DISABLED
In the DISABLED state, the entity is not functioning properly and cannot perform its
duties. In other words, it is regarded as faulty in some way.

Usage state
The usage state attribute describes the usage status of the entity represented by the managed
object. There are three possible values for the usage state attribute: IDLE, ACTIVE, and BUSY.
The usage state attribute is controlled by the HAS for all the managed objects except the
processes.

IDLE
In the IDLE state, the entity is not currently processing any service requests.
ACTIVE
In the ACTIVE state, the entity is processing service requests and there is still some
spare capacity for new service requests.
BUSY
In the BUSY state, the entity has no more spare capacity until some of the active
service requests have terminated or more capacity is added.



The HAS automatically sets the usage state of non-HA-aware processes to ACTIVE after starting
them.

1.2.2 Permitted state attribute combinations and their meaning

There are certain dependencies between the main state attributes, so only eight value
combinations of the administrative, operational, and usage attributes are permitted.

The permitted value combinations are listed below. In the explanations, the managed object is
assumed to be a node unless otherwise mentioned.

LOCKED, DISABLED, IDLE: The state of the node is unknown. There are two possible
reasons for this:
1. The operator has locked a failed node. Automatic recovery actions have been unsuccessfully
attempted.
2. The power was turned off at the request of the operator.
LOCKED, ENABLED, IDLE: The node is still up and running but has been locked by the
operator.
SHUTDOWN, ENABLED, ACTIVE: The node is shutting down the services gracefully. The HA-aware
applications are terminating. The applications do not accept new service requests during
the graceful shutdown, regardless of the ACTIVE value of the usage state attribute.
SHUTDOWN, ENABLED, BUSY: This state is not possible for nodes; it is possible only for
processes. The process is shutting down gracefully. The HA-aware applications are terminating.
UNLOCKED, DISABLED, IDLE: The node is disabled. All repair attempts are executed in this
state. If an active node becomes faulty, it is moved to this state.
UNLOCKED, ENABLED, IDLE: The node is in a normal state. There are neither faults nor
administrative actions initiated by the operator. No transactions or sessions are ongoing.
UNLOCKED, ENABLED, ACTIVE: The node is in a normal state. There are neither faults nor
administrative actions initiated by the operator. At least one RU in addition to the HAS
recovery units is running.
UNLOCKED, ENABLED, BUSY: This state is not possible for nodes; it is possible only for
processes. There are neither faults nor administrative actions initiated by the operator.
However, new service requests are not accepted because the usage state value is BUSY.
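The eight permitted combinations can be encoded as a lookup set, for example to sanity-check state values collected by a script. This Python sketch is illustrative only; the function name and the mo_type parameter are hypothetical.

```python
from typing import Set, Tuple

Combination = Tuple[str, str, str]  # (administrative, operational, usage)

PERMITTED: Set[Combination] = {
    ("LOCKED", "DISABLED", "IDLE"),
    ("LOCKED", "ENABLED", "IDLE"),
    ("SHUTDOWN", "ENABLED", "ACTIVE"),
    ("SHUTDOWN", "ENABLED", "BUSY"),
    ("UNLOCKED", "DISABLED", "IDLE"),
    ("UNLOCKED", "ENABLED", "IDLE"),
    ("UNLOCKED", "ENABLED", "ACTIVE"),
    ("UNLOCKED", "ENABLED", "BUSY"),
}

# The BUSY combinations are possible only for processes, never for nodes.
PROCESS_ONLY: Set[Combination] = {c for c in PERMITTED if c[2] == "BUSY"}


def is_permitted(admin: str, operational: str, usage: str,
                 mo_type: str = "node") -> bool:
    """Check a state combination against the permitted list above."""
    combo = (admin, operational, usage)
    if combo not in PERMITTED:
        return False
    return mo_type == "process" or combo not in PROCESS_ONLY
```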



1.2.3 Status attributes

The status attributes provide further information about the main states. The attribute values are
EMPTY if the managed object is running normally.

Alarm status
The possible values for the alarm status attribute are OUTSTANDING and MAJOR. Both values
must be set at the same time.

The OUTSTANDING value is set for a managed object that has an active alarm, with the following
exceptions:

The alarm is only a warning.
The alarm has not been explicitly cancelled by the HAS.
Certain specific situations, such as a switchover alarm.

The MAJOR value is set for a managed object that has a major active alarm.

Procedural status
There are three possible values for the procedural status attribute: INITIALIZING,
NOTINITIALIZED and TERMINATING.

INITIALIZING
The process, node, or RU is currently starting.
NOTINITIALIZED
The process, node, or RU is not running.
TERMINATING
The process, RU, RG, or node (and in a cluster environment also the whole cluster) is
currently terminating.

Note:
The procedural status of nodes whose operational state is ENABLED is INITIALIZING while the
services in the node are still starting up.



Availability status
There are four possible values for the availability status attribute: POWEROFF, FAILED, OFFLINE
and OFFDUTY.

POWEROFF
The node is powered off.
FAILED
The process, RU, or node is faulty and waiting for a repair. In a cluster environment, the
FAILED value is also shown when the node is not physically present in the cluster.
OFFLINE
The node is not operational.
OFFDUTY
The node, process, RU, or RG (or the cluster, in a cluster environment) is not running an active
service. This usually means that the managed object is LOCKED.

Unknown status
The value of the unknown status attribute can be TRUE only for a node that is LOCKED and not
operational (its operational state is DISABLED). It can also be TRUE for a short period of time
while the system is starting. In all other cases, the value is FALSE.
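The rule for the unknown status can be expressed as a simple consistency check, as in the Python sketch below. The function and its parameters are hypothetical; system_starting stands for the short startup period mentioned above.

```python
def unknown_status_allowed(mo_type: str, admin: str, operational: str,
                           system_starting: bool = False) -> bool:
    """Return True if the unknown status attribute may legitimately be TRUE."""
    if system_starting:
        return True  # may be TRUE briefly while the system is starting
    # Otherwise only a LOCKED, DISABLED node may have unknown status TRUE.
    return mo_type == "node" and admin == "LOCKED" and operational == "DISABLED"
```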

1.2.4 Role attribute

The role attribute is used for specifying the role of an RU in an active/standby pair of an RG.

There are three possible values for the role attribute: ACTIVE, COLDSTANDBY, and
HOTSTANDBY.

ACTIVE
If the value of the role attribute is ACTIVE, the managed object is providing normal service.
HOTSTANDBY
If the value of the role attribute is HOTSTANDBY, the managed object is acting as a standby
resource for an active managed object in a hot active/standby pair and will be promoted to the
active role when the active object fails. Both the active and standby processes are running.
COLDSTANDBY
If the value of the role attribute is COLDSTANDBY, the managed object is acting as a backup
resource for an active managed object in a cold active/standby pair and will be promoted to
the active role when the active object fails. Only the currently active process is running; the
standby process is not running.

1.2.5 Dynamic attributes

Dynamic attributes are name-value pairs that show the state of the process associated with the
recovery unit. They can be used to inform users or other processes running in the cluster about
the state, and for troubleshooting purposes.

The dynamic attributes are described in the following table:



Table 1: Description of dynamic attributes

HW_STATUS
Supported values: MISSING
HAS sets and keeps this attribute for nodes that have never (since commissioning) started up
successfully.

INERT_MODE
Supported values: ENABLED, TEST_MODE
Indicates that the MO has been set to inert mode (or its sub-state, test mode) by an operator or
script. Inert mode is also known as "recovery ban": HAS does not react to failures of an MO that
is in inert mode. For example, HAS does not issue a HW reset for a failing unit while it is being
upgraded.

LAST_FUNCTIONAL
Supported values: -
The timestamp when the RU was last running. It is available when the RESOURCE_STATE of the RU
is NON-FUNCTIONAL.

RESET_BLOCK
Supported values: remaining time
Indicates that the node has been set to reset-block state. When a node is in reset-block state,
HAS does not execute recovery actions that require node restarting, and requests to restart the
node are denied. The timer value indicates how long the node remains in reset-block state.

POWERING_OFF_REASON
Supported values: NONE, DEBUGGING, MAINTENANCE, POWER-SAVING, HARDWARE-FAILURE
Valid for nodes. Indicates the reason given by the operator or script when the node is powered
off.

RESOURCE_STATE
Supported values:
FUNCTIONAL: The RU is functioning.
NON-FUNCTIONAL: The RU currently cannot be started.
DEGRADED: Valid for a standby unit. The standby is temporarily out of sync with the active, and
a switchover could not be done without risk of data loss.
TRASHED: Valid only for a standby unit. A local database is missing or corrupted, and a
switchover (if forced) would lose all database data.
UNKNOWN: The RU state is unknown.

RESOURCE_LEVEL
Supported values: <1..100>
A percentage that indicates how healthy the resources are. If resource levels differ between
units, HAS attempts to keep the service active on the units with a higher resource level.

Stack Status
Supported values:
ALIVE: The SCCP or SS7 stack is active and functioning.
CONFIG: All the configuration for the SCCP or SS7 stack is complete, and inter-process
communication is in progress inside the distributed sigtran processes.
UNCONFIG: When the ROLE is ACTIVE, the SCCP stack is being configured. When the ROLE is
HOT_STANDBY, the SCCP stack is being configured or is waiting for a switchover to occur.
SWOUNCONFIG: An SCCP switchover is ongoing. Applicable only to SCCP processes.
DEAD: The SCCP or SS7 stack is not active and not functioning.

SWITCHOVER_PHASE
Supported values:
QUIESCING: The RU is releasing (stopping the use of) shared resources.
QUIESCED: The RU has released shared resources, and they can be allocated on the standby side.
UNQUIESCING: The controlled switchover is canceled, and the RU is resuming the active role.
ACTIVATING: The RU is being activated.
BECOMING_HOTSTANDBY: A QUIESCED RU is now turning into a hot standby.
SHUTTING_DOWN: The RU is shutting down. Valid for cold active/standby RUs.

WAITING_SERVICE
Supported values: <RU or RG name>
Indicates that the RU has a startup time or parasite dependency on another service.
Note: It is normal for a parasite RU of an active/standby service to have this value
permanently on the standby side.
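Dynamic attributes are name and value pairs, such as WAITING_SERVICE = /Directory. The Python sketch below parses such pairs and applies two rules from Table 1: a DEGRADED or TRASHED standby makes a switchover risky, and HAS prefers units with a higher RESOURCE_LEVEL. The NAME = value line format and all function names are assumptions for illustration; the actual output format of the show has command may differ.

```python
from typing import Dict, Iterable

# Standby RESOURCE_STATE values that make a switchover risky (from Table 1).
RISKY_STANDBY_STATES = {
    "DEGRADED": "standby is out of sync; switchover risks data loss",
    "TRASHED": "local database missing or corrupted; forced switchover loses all data",
}


def parse_dynamic_attributes(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'NAME = value' lines (assumed format) into a dictionary."""
    attrs: Dict[str, str] = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that are not name-value pairs
            attrs[name.strip()] = value.strip()
    return attrs


def standby_switchover_risk(standby_attrs: Dict[str, str]) -> str:
    """Return a warning for a standby RU, or an empty string if none applies."""
    return RISKY_STANDBY_STATES.get(standby_attrs.get("RESOURCE_STATE", ""), "")


def preferred_unit(resource_levels: Dict[str, int]) -> str:
    """Pick the unit with the highest RESOURCE_LEVEL (ties: first listed)."""
    return max(resource_levels, key=lambda unit: resource_levels[unit])
```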


1.3 Redundancy models

Redundancy is a method of providing the system with redundant equipment to improve its
tolerance against faults. This is achieved by providing backup resources for functional units. HAS
controls the resources (MOs) and reacts to their faults according to the redundancy model in
question.

To understand the HAS system model, it is important to make the distinction between software
and hardware. Recovery actions are executed at the software level, not by switching between
hardware components.

In the deployment design, a key target is to ensure that there are enough redundant hardware
resources available to meet the system-level availability requirements. The redundant
communication network is an example of such hardware-level redundancy. As far as the software
is concerned, redundancy for a service is achieved by deploying standby service instances (RUs)
to the appropriate nodes. The number of redundant RUs and their deployment methods depend
on the redundancy model.

The HAS supports the following software redundancy models:

hot active/standby redundancy
cold active/standby redundancy
cold one plus M redundancy
load sharing redundancy
no redundancy

Hot active/standby redundancy


A hot active/standby pair consists of two RUs offering the same services. Processes in both the
active and standby RUs are running and can replicate data using some application-specific
method.

Cold active/standby redundancy


A cold active/standby pair also consists of two RUs offering the same services. Processes in the
active RU are running and offering service. The redundant processes in the cold standby RU,
however, are not running.

During switchover, the roles of the RUs are swapped. The processes running in the active RU are
terminated and the unit becomes the standby unit. The processes in the former standby RU are
started, making it the new active RU.
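The role swap during a cold active/standby switchover can be sketched in Python as follows. The RecoveryUnit class and the cold_switchover function are hypothetical; the sketch only tracks the role attribute and whether the processes of each RU are running, and it ignores failures during the switchover.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecoveryUnit:
    name: str
    role: str      # "ACTIVE" or "COLDSTANDBY"
    running: bool  # whether the processes of the RU are running


def cold_switchover(pair: List[RecoveryUnit]) -> None:
    """Swap the roles of a cold active/standby pair in place."""
    active = next((ru for ru in pair if ru.role == "ACTIVE"), None)
    standby = next((ru for ru in pair if ru.role == "COLDSTANDBY"), None)
    if active is None or standby is None:
        raise ValueError("pair needs one ACTIVE and one COLDSTANDBY RU")
    # The processes of the active RU are terminated; it becomes the standby.
    active.running = False
    active.role = "COLDSTANDBY"
    # The processes of the former standby are started; it becomes the new active.
    standby.running = True
    standby.role = "ACTIVE"
```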



Cold one plus M redundancy
A cold one plus M recovery group can have more than one standby RU, instead of the single redundant RU of a cold active/standby pair. When the active RU fails, HAS selects the preferred standby RU for the recovery actions by choosing the one on the node with the least usage. A mechanism is also available to force an automatic switchback to the preferred configuration: when fshaForcePreferedConfigTimeoutSecs is defined, HAS initiates a forced/controlled switchover to ensure that the standby resources are available.
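The standby selection rule (pick the standby RU on the least-used node) can be sketched as follows. The RU names and usage figures are hypothetical, and the document does not specify how HAS measures node usage, so a single number per node stands in for it here.

```python
from typing import Dict, Optional

def select_standby(standby_usage: Dict[str, float]) -> Optional[str]:
    """Return the available standby RU whose node has the least usage.

    standby_usage maps each *available* standby RU to a usage figure for
    its node (a hypothetical metric used only for illustration).
    """
    if not standby_usage:
        return None  # no standby RU is available in this model
    return min(standby_usage, key=standby_usage.get)

# Hypothetical cold one plus M group with three available standby RUs.
usage = {"RU-1": 0.70, "RU-2": 0.20, "RU-3": 0.45}
print(select_standby(usage))  # RU-2
```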

An N+M-like redundancy configuration can be created by defining N cold one plus M recovery groups. To make such a configuration more manageable, you can create a logical group (LG) for it. An LG for an N+M configuration is created from the node perspective, merging the related recovery units on each node into a single logical service.

Load sharing redundancy


A load-sharing recovery group consists of a number of RUs that offer the same services and share the load of the service requests.

From the HAS point of view, the load-sharing redundancy model and the no redundancy model are alike, although HAS offers some notification support for load-sharing redundancy. For each load-sharing group, a lower limit for the number of active RUs in the group is defined in the Configuration Directory attribute fshaThreshold. If the number of active RUs drops below this limit, HAS raises an alarm indicating this condition. This is the only load-sharing-specific support that HAS provides. HAS assumes that a load-balancing mechanism elsewhere in the system can assign the workload of a failed RU to the remaining RUs.
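The fshaThreshold check amounts to counting active RUs and comparing the count with the limit. In the real system HAS performs this check and raises the alarm itself; the function names and RU state strings below are hypothetical and only illustrate the comparison.

```python
from typing import Dict

def active_count(ru_states: Dict[str, str]) -> int:
    """Count the RUs of a load-sharing group that are currently active."""
    return sum(1 for state in ru_states.values() if state == "ACTIVE")

def threshold_alarm_needed(ru_states: Dict[str, str], fsha_threshold: int) -> bool:
    """True when the number of active RUs has dropped below fshaThreshold."""
    return active_count(ru_states) < fsha_threshold

group = {"RU-0": "ACTIVE", "RU-1": "FAILED", "RU-2": "ACTIVE"}
print(threshold_alarm_needed(group, fsha_threshold=3))  # True
print(threshold_alarm_needed(group, fsha_threshold=2))  # False
```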

No redundancy
Recovery groups of the no redundancy type provide node-local services for which active/standby
redundancy would make no sense. In the case of no redundancy, the HAS can attempt to restart
either individual processes or the whole recovery unit.

Smart deployment of active/standby recovery units


In this deployment scheme, the default standby recovery units are concentrated on one node, while the default active recovery units are distributed among several nodes. The following figure shows an example of smart deployment of active/standby recovery units in practice.

Figure 3: Example of smart deployment of active/standby recovery units

In the figure above, the lowest recovery group is of the hot active/standby type (in other words, both the active and the standby processes are running), while the two uppermost recovery groups are of the cold active/standby type (the processes are not running in the standby RU).

HAS also protects the node with the concentrated default standby recovery units from becoming overloaded. It does this by making it possible to define the quota that each active recovery unit causes on that node when it is switched over there after a failure, and by detecting the accumulated quota. When the quota on the node with the standby recovery units is too high, failed active recovery units are not switched over to that node; instead, they are restarted on the node where they failed.
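The overload protection can be sketched as an admission decision: switch over while the accumulated quota still fits, otherwise restart in place. The quota units, the `max_quota` limit and all names are hypothetical; the document does not describe how quotas are expressed.

```python
from dataclasses import dataclass

@dataclass
class StandbyNode:
    max_quota: int       # hypothetical limit for the node hosting default standby RUs
    used_quota: int = 0  # quota already taken by active RUs switched over to this node

def recover_failed_active(node: StandbyNode, ru_quota: int) -> str:
    """Choose the recovery action for a failed active RU with the given quota."""
    if node.used_quota + ru_quota <= node.max_quota:
        node.used_quota += ru_quota    # the RU becomes active on the standby node
        return "switchover"
    return "restart on original node"  # the standby node would be overloaded

node = StandbyNode(max_quota=10)
print(recover_failed_active(node, 6))  # switchover
print(recover_failed_active(node, 6))  # restart on original node
```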

Additionally, automated fallback is supported: an active recovery unit is returned to its preferred active node when that node becomes available again after a failure.

1.4 Recovery group resources

High availability services (HAS) supports the use of the active/standby redundancy model by
allowing the linking of various resources to recovery groups (RGs). These include storage
resources such as disk file systems, distributed replicated block devices (DRBD) and raw
partitions, as well as IP addresses.

Storage resources
A cluster that contains only non-shared, directly attached storage resources (for instance, DRBD devices) is called a shared-nothing system. In a shared-nothing configuration, the same data is replicated and kept in synchronisation on two or more independent nodes. When such a system starts up, it must decide which of the storage resources is the most up to date.

HAS can use resource controller processes to control the roles of resources and recovery units. This works as follows. First, HAS starts the resource controller processes associated with the RUs of the active/standby RG. The resource controller processes then exchange information to determine which node contains the most recent database replica. Based on this information, HAS decides which RU is assigned the "active" role and which RU is assigned the "standby" role. Once the roles have been assigned, HAS can start the RUs and their processes. Note that the RG must be unlocked before the resource controller processes can start.
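The role decision can be pictured as ranking the two replicas by freshness. This is a sketch under a stated assumption: each node is represented by a hypothetical, monotonically increasing replica counter, standing in for whatever information the resource controller processes actually exchange, which the document does not specify.

```python
from typing import Dict, Tuple

def assign_roles(replica_generation: Dict[str, int]) -> Tuple[str, str]:
    """Return (active_node, standby_node) for a two-RU active/standby RG.

    replica_generation maps each node to a hypothetical counter that grows
    whenever the node's local database replica is updated.
    """
    if len(replica_generation) != 2:
        raise ValueError("an active/standby RG has exactly two RUs")
    # The node holding the most recent replica gets the active role.
    newest_first = sorted(replica_generation, key=replica_generation.get, reverse=True)
    return newest_first[0], newest_first[1]

print(assign_roles({"node-A": 41, "node-B": 42}))  # ('node-B', 'node-A')
```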

IP addresses
The HAS allows the association of IP addresses to services. These addresses are movable
resources analogous to the movable storage resources. They always point to the active RU of an
active/standby RG. The IP addresses of active/standby RGs are either redundant or dedicated IP
addresses. Redundant IP addresses are cluster-internal addresses, whereas dedicated IP
addresses are visible outside the cluster and can be used by external applications for pointing to
resources in the cluster itself.

1.5 Dependencies between managed objects

It is possible to declare dependencies between managed objects, such as the startup order
between recovery units, the startup order between processes within a recovery unit, the
switchover behaviour of recovery units within a set of interdependent recovery groups, the hot
switchover order of recovery groups, and the hot switchover order of processes within a recovery
unit. This allows for more optimized and faster startup and switchover operations.

Parasite recovery groups


A parasite recovery group (RG) needs and uses (parasites) the resources (for instance, disk mount points or IP addresses) of a host RG on the same node. Failure of a recovery unit (RU) of a parasite RG does not affect the operation of the related host RUs. The parasite RG is allowed to perform a switchover to another node only if the host RG performs a switchover. In other words, a parasite RU can only be restarted as a recovery action. Failure of the host RU causes the termination of all parasite RUs as part of host RU isolation, and the parasite RUs do not start again until the host RU has started. As a result, parasite RGs slow down the switchover operation of the host RG.

Stalker recovery groups
The roles of stalker recovery group RUs automatically follow (stalk) the defined target recovery
group RU roles. The RUs of a stalker RG do not use any resources owned by the target RG, so that
a target RG switchover can always proceed without waiting for the isolation of stalker RUs. As in
the case of a parasite recovery group, the failure of a stalker RU does not affect the operation of
the target RUs in any way. The stalker RU performs a switchover to another node only if the target
RU performs a switchover. In other words, a stalker RU can perform a switchover only as a
recovery action.

Symbiotic recovery groups


Any number of hot or cold active/standby recovery groups that have been configured to run on
the same nodes can be grouped together to form a symbiotic group of RGs. A switchover can
take place only if it is possible for all RUs in the symbiotic group. A switchover command applied to
any RU in the symbiotic group causes the switchover of all RUs in the group. When there is an RU
failure, and a switchover of the whole symbiotic group is not possible, only the failed RU is
restarted. In other words, failures do not propagate between the RUs of symbiotic recovery
groups.
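The all-or-nothing rule for symbiotic groups can be written down directly. The function name, the RU names and the `switchover_possible` map are hypothetical; in practice HAS determines whether a switchover is possible for each RU.

```python
from typing import Dict, List

def on_ru_failure(group: List[str], failed_ru: str,
                  switchover_possible: Dict[str, bool]) -> Dict[str, str]:
    """Return the action per RU of a symbiotic group after one RU fails."""
    if all(switchover_possible[ru] for ru in group):
        # A switchover is possible for every RU, so the whole group switches over.
        return {ru: "switchover" for ru in group}
    # Otherwise only the failed RU is restarted; the failure does not propagate.
    return {failed_ru: "restart"}

group = ["RU-A", "RU-B", "RU-C"]
print(on_ru_failure(group, "RU-B", {"RU-A": True, "RU-B": True, "RU-C": True}))
print(on_ru_failure(group, "RU-B", {"RU-A": True, "RU-B": True, "RU-C": False}))
```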

Local dependency
Local dependency defines the startup order of RUs within the same node. With a local dependency, the startup of an RU depends on a local service on the node: the RU is allowed to start once at least one RU belonging to the local service has started up on the same node where the dependent RU is located. If the RU depends on several local services, at least one RU of each of these services must have started up.

Global dependency
Global dependency defines the startup order of RUs in the entire cluster. The startup of an RU depends on a global service that can be located anywhere in the environment: the RU is allowed to start as soon as at least one RU belonging to the global service has started up anywhere in the environment. If the RU depends on multiple global services, at least one RU in any of these services has to be started up.
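The two startup gates can be sketched as membership tests over the set of started RUs. The service and node names are hypothetical, and only a single global service is modelled, because the text above does not spell out how multiple global services combine.

```python
from typing import Iterable, Set, Tuple

# (service, node) pairs for which at least one RU of the service has started.
Started = Set[Tuple[str, str]]

def local_dependencies_met(node: str, services: Iterable[str], started: Started) -> bool:
    """Each local service needs a started RU on the same node as the dependent RU."""
    return all((service, node) in started for service in services)

def global_dependency_met(service: str, started: Started) -> bool:
    """A started RU of the global service anywhere in the cluster is enough."""
    return any(svc == service for svc, _node in started)

started = {("svcA", "node-0"), ("svcB", "node-1")}
print(local_dependencies_met("node-0", ["svcA"], started))          # True
print(local_dependencies_met("node-0", ["svcA", "svcB"], started))  # False
print(global_dependency_met("svcB", started))                       # True
```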

1.6 Controlled switchover and forced switchover

Switchover is one of the services provided by HAS. It is one of the recovery actions used to recover a faulty RU in the system.

A switchover is performed from the failing active RU to the standby RU in an active/standby RG or a cold one plus M RG. It can be performed either automatically by HAS or manually by the operator as an administrative action. When an error is found in an RU, the system terminates the RU processes, allocates the external resources to the standby RU, and activates it as the new active RU. The newly activated RU then takes over the role of the faulty RU, and the faulty RU is switched to the standby role. HAS initiates a controlled/forced switchover to ensure that the standby resources are available.

Note:
To perform the controlled/forced switchover, the MO must support the hot or cold active/standby redundancy model or the cold one plus M redundancy model.

Controlled switchover
The controlled switchover is performed in the active RU of an active/standby pair in an RG. RGs that use the hot or cold active/standby redundancy model or the cold one plus M redundancy model support the controlled switchover to ensure that no data is lost during a switchover. A controlled switchover completes only when the data on the old standby RU is known to be up to date with the old active RU. A controlled switchover can be done only when both the active and the standby RU are operational. Therefore, the controlled switchover is primarily performed when the cluster is functioning normally.

Forced switchover
By performing the forced switchover, you can force the RUs in an active/standby pair to exchange their roles. The forced switchover is required when the controlled switchover fails or is not supported by the system, for example when the application continuously denies a controlled switchover because of high load. The switchover can be executed when the target MO is in the standby state expected by the redundancy model.
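The difference between the two switchover types can be summarised as different preconditions. The `PairState` fields and the outcome strings below are hypothetical simplifications of the conditions described in this section, not HAS states or SCLI output.

```python
from dataclasses import dataclass

@dataclass
class PairState:
    active_operational: bool
    standby_operational: bool
    standby_data_in_sync: bool              # standby data known to be up to date
    application_accepts: bool = True        # the application may deny, e.g. under high load
    standby_in_expected_state: bool = True  # standby state required by the redundancy model

def controlled_switchover(p: PairState) -> str:
    if not (p.active_operational and p.standby_operational):
        return "not possible"  # both RUs must be operational
    if not p.application_accepts:
        return "denied"        # a forced switchover may then be required
    if not p.standby_data_in_sync:
        return "waiting"       # completes only once the standby data is up to date
    return "completed"

def forced_switchover(p: PairState) -> str:
    # The roles are exchanged without the application's consent.
    return "completed" if p.standby_in_expected_state else "not possible"

busy = PairState(True, True, True, application_accepts=False)
print(controlled_switchover(busy), forced_switchover(busy))  # denied completed
```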

1.7 Fault management process

In general, fault management means employing a set of functions which can detect and correct
fault situations in the system.

The fault management process in the system is as follows:

Figure 4: Fault management process

As shown in Figure: Fault management process, the fault management process in a system has
the following five steps:

1. Detection
The first step is to detect the fault.
2. (On-line) Diagnosis
The second step is to determine the cause of the fault.
3. Isolation
The third step is to protect the rest of the system from the fault.
4. Recovery
The fourth step is to restore the system to the expected behavior (recovery), by performing a switchover or restarting some part(s) of the system. At this stage, further actions to determine the cause of the fault (off-line diagnosis) may be taken.
5. Repair
The final step is the repair of the system. This generally means that the faulty parts in the
system are replaced with working parts.

Notification of the fault occurs at many points in the fault management process. Components
within a system must interact with each other to enable the fault management.

Fault detection
Fault detection (also called fault supervision or fault localization) is the process of identifying an undesirable condition (fault or symptom) that may lead to the loss of service from a system or device. Both cluster and single-node network elements provide the high availability services (HAS) framework, among other things, for detecting where and when a fault occurs, and for passing the relevant fault information to the HAS processes responsible for diagnosis, isolation, and recovery actions.

HAS supervises the health of software processes in the network element both passively and actively. Passive supervision means detecting unexpected process terminations. Active supervision is an optional feature and means heartbeating a high availability aware (HA-aware) process. A heartbeat is a polling mechanism that is used to verify that a software resource is healthy. This non-intrusive health monitoring method uses only a small percentage of the computing resources. In most environments, HAS feeds a hardware watchdog through operating system interfaces, so that a node is reset automatically if the operating system or HAS malfunctions.
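Active supervision can be sketched as tracking the time of each process's last heartbeat reply and flagging processes that have been silent for too long. The polling interval, the tolerated number of missed polls and the class itself are hypothetical; the HAS heartbeat protocol and the hardware watchdog are not modelled. Timestamps are passed explicitly so that the example is deterministic.

```python
from typing import Dict, List

class HeartbeatSupervisor:
    """Active supervision sketch: report processes whose heartbeat replies stopped."""

    def __init__(self, interval_s: float, max_missed: int) -> None:
        self.interval_s = interval_s  # hypothetical polling interval
        self.max_missed = max_missed  # hypothetical number of tolerated missed polls
        self.last_reply: Dict[str, float] = {}

    def record_reply(self, process: str, now: float) -> None:
        self.last_reply[process] = now

    def faulty(self, now: float) -> List[str]:
        limit = self.interval_s * self.max_missed
        return sorted(p for p, t in self.last_reply.items() if now - t > limit)

sup = HeartbeatSupervisor(interval_s=1.0, max_missed=3)
sup.record_reply("procA", now=0.0)
sup.record_reply("procB", now=2.5)
print(sup.faulty(now=3.5))  # ['procA']
```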

In addition to the failures of the software processes, there may be hardware-related failures such
as:

node failures
communication network failures
hardware device failures

In a cluster, the node failures require a communication-based detection mechanism to know the
state of each node. Node failure detection must be fast and must not depend on a failing node
reporting its own failure. However, self-diagnosis may be leveraged to speed up failure detection
in the cluster.

Communication network failures in a cluster require health monitoring that can withstand a failure between the cluster nodes. When a communication link fails, the monitoring must be able to distinguish a communication network failure from a node failure.
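One common way to tell the two cases apart, assumed here rather than taken from the HAS documentation, is to probe the peer node over each of the redundant communication networks: a node that still answers on some network has a link problem, while a node that is silent on all of them is a suspected node failure. The network names are hypothetical.

```python
from typing import Dict

def classify_peer(reachable_via: Dict[str, bool]) -> str:
    """Classify a peer node from its reachability over each redundant network."""
    if not reachable_via:
        raise ValueError("at least one network must be probed")
    if all(reachable_via.values()):
        return "ok"
    if any(reachable_via.values()):
        return "communication network failure"  # the node still answers elsewhere
    return "suspected node failure"             # silent on every network

print(classify_peer({"net-A": True, "net-B": False}))   # communication network failure
print(classify_peer({"net-A": False, "net-B": False}))  # suspected node failure
```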

One example of a hardware fault is when the temperature starts to exceed a certain level.
Corrective measures can be, for example, to accelerate the fans or throttle the central processing
unit (CPU) to prevent it from being damaged.

On-line diagnosis
Once a fault is detected, the problem must be analyzed to determine the proper isolation and
recovery actions. The diagnosis process analyses one or more events and system parameters to
determine the nature and location of a fault. This step can be automatic or invoked separately by
the operator. The result of the diagnosis may be acted upon automatically or manually.

As an example, any intelligent network device runs some basic diagnosis when it is started to check the health of the device. In standard personal computer (PC) hardware, as used in server blades, the basic input/output system (BIOS) is responsible for running such a test, called the power-on self-test (POST), at start-up.

Typically, only hardware components (for example, nodes, disks, network interfaces) are seen as
targets for diagnosis.

Fault isolation
The purpose of fault isolation is to keep a fault from spreading to other components of the
system. This is achieved, for example, by taking the defective component in the system out of
service. By isolating the fault in a running system, the system can be maintained, at least partially,
in the operational state.

Typical fault isolation operations are resetting or shutting down a faulty recovery unit. In node
failure situations, fault isolation is carried out by issuing a hardware reset to the node or by
powering off the node, depending on the nature of the hardware fault.

Isolating a faulty component does not necessarily imply that the fault is corrected. Fault recovery
actions are needed to bring the faulty part of the system back into operation. In some cases, it
may not be possible to separate the fault isolation and recovery processes. In a cluster, this could
occur when shutting down a faulty node in a load-sharing recovery group.

Fault recovery
The purpose of the fault recovery process is to restore the faulty part of the system to the
operational state, even though full capacity may not be achieved. In general, the possible recovery
actions are a switchover between the active and standby recovery units and restarting the faulty
process, recovery unit or the node (in a cluster environment).

In a clustered environment, a switchover is performed from the failing active recovery unit to the standby unit in an active/standby recovery group. Another recovery action could be to restart the faulty process, perhaps several times, before performing a switchover. A more radical measure is to reboot the faulty node (or the node running the faulty recovery unit).
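The escalation described above, from process restarts to a switchover to a node reboot, can be sketched as an ordered plan. The restart limit is a hypothetical parameter, and the real HAS escalation policy is not modelled here.

```python
from typing import List

def recovery_plan(max_process_restarts: int, standby_available: bool) -> List[str]:
    """Recovery actions in order of increasing impact."""
    plan = ["restart faulty process"] * max_process_restarts
    if standby_available:
        plan.append("switch over to standby recovery unit")
    plan.append("reboot node")
    return plan

print(recovery_plan(max_process_restarts=2, standby_available=True))
```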

The full switchover time depends on the time it takes to detect the application or node failure, to
apply the switchover policy, and to restart the application in the case of cold active/standby. The
aggregate switchover time must be short and must allow the cluster to maintain carrier grade
availability.

Repairing the system


The purpose of the repair process is to return the network element to its normal operating state and availability level. The system may first try to recover from a fault by restarting the appropriate managed object (process, recovery unit, or node), possibly several times. If the restart does not repair the fault and the root cause is suspected to be a hardware fault, the faulty node (add-in card) must be replaced with a working one. If the fault affects several nodes in the same BCN module, the root cause can be a hardware fault in a component on the BCN motherboard. In this case, the whole BCN module needs to be replaced.

For replacing an add-in card or BCN module, see Replacing Multicontroller Hardware Units.

Fault management tools


The SCLI tool set offers the means of interacting with HAS. It supports auto-completion of MO names and, when a particular option of the tool is used, automatically displays all the MOs. With this tool, you can:

change the states of the MO
lock and unlock the MO
shut down and restart the MO
power off or power on nodes
perform a controlled/forced switchover between recovery units within a recovery group

Error logs may be created under special circumstances during some management operations, such as power on, power off, switchover, forced switchover, and upgrade. In these cases, the error logs result from the unusual state of the cluster and do not necessarily indicate an actual error situation; such error logs can be ignored.

The alarm system is the main interface for monitoring the health of the system. Logs only provide supplementary information when investigating alarms or other operational problems, such as failing SCLI commands.

2. Introduction to the Functional Unit

In the HAS MO environment, a functional unit (FU) is a software entity capable of accomplishing a specific purpose. A functional unit typically has a visible operating state and belongs to one of the Control, User, Transport, or Management planes. Most functional units are under the control of fault management.

A functional unit is a special kind of MO. It can be either a recovery unit or a simple executive node.

2.1 Redundancy model of functional units

All the crucial parts of the network element have been backed up to ensure the reliability of the
system's operations.

Functional units with different redundancy models have different mapping rules for unit state
changes. The redundancy models of the functional units are as follows:

2N redundancy model
2N*M redundancy model
N+M redundancy model
SN+ redundancy model
No redundancy model

2N redundancy model
2N is a high-availability redundancy model with two units in a redundancy group: one is the working unit and the other is the hot standby unit.

2N*M redundancy model


The 2N*M redundancy model follows the same redundancy principle as the 2N redundancy model. It contains M pairs of 2N redundancy groups. In each pair, one unit is working and the other unit is hot standby. The M pairs can share the workload among each other.

N+M redundancy model
The N+M model consists of a group of units, where N is the number of working units and M is the number of backup units for the group.

SN+ redundancy model


A load-sharing model. It is assumed that a load-balancing mechanism elsewhere in the system can assign the workload from a failed unit to the remaining units.

No redundancy model
There are no redundant resources for the service, and recovery from a fault is accomplished by
restarting the faulty recovery unit.

Redundancy models mapping between FU and MO

Table 2: Mapping between RU and HAS redundancy models

FU redundancy model    HAS redundancy model
2N                     Hot active/standby
2N*M                   Hot active/standby
N+M (M >= 1)           Cold one plus M
SN+                    Load-sharing
No redundancy          No redundancy

Note:
If the FU is the simple executive node, then there is no redundancy for this FU.

2.2 Functional unit state model

Functional units have their own state model, which differs considerably from the state attributes of the managed object (MO). Working states other than those listed below are either incorrect or not supported. If a unit remains in an incorrect working state, an alarm is raised at regular intervals.

The system supports the following functional unit states:

WO-EX (Working, Executing)
WO-RE (Working, Restarting)
SP-EX (Spare, Executing)
SP-RE (Spare, Restarting)
BL-EX (Blocked, Executing)
SE-OU (Separated, Out of use)

Apart from the states listed above, all other states, such as TE (Test), BL-ID (Blocked, Idle), BL-RE (Blocked, Restarting), SP-UP (Spare, Updating), and SE-NH (Separated, No hardware), are either incorrect or not supported.

Note:
You can only check the states of the functional unit. Changing the state of the functional units is usually performed automatically by the system: when the state of an MO changes, the state of the functional unit may change accordingly.

The show has functional-unit SCLI command can be used to check the unit states. MO states can be changed with the set has command.

Even though the FU state model is different from the HAS state model, each FU state can be mapped to a corresponding combination of HAS states.

Table 3 shows the mapping between the functional unit states and the combined HAS states:

Table 3: Mapping between the functional unit states and the combined HAS states

HAS states combination                                          FU state model
Administrative  Operational  Role state         Procedural      State of the   State of the
state           state                           state           2N/N+M FU      SN+/No backup FU
UNLOCKED        ENABLED      ACTIVE             NA 1)           WO-EX          WO-EX
UNLOCKED        ENABLED      HOTSTANDBY         NA              SP-EX          - 2)
UNLOCKED        ENABLED      ACTIVE             INITIALIZING    WO-RE          WO-RE
UNLOCKED        ENABLED      HOTSTANDBY         INITIALIZING    SP-RE          -
UNLOCKED        DISABLED     ACTIVE/HOTSTANDBY  * 3)            SE-OU          SE-OU
*               *            *                  NOTINITIALIZED  SE-OU          SE-OU
SHUTDOWN        ENABLED      ACTIVE             *               -              BL-EX
LOCKED          ENABLED      ACTIVE/HOTSTANDBY  NOTINITIALIZED  SE-OU          SE-OU
LOCKED          DISABLED     ACTIVE/HOTSTANDBY  NOTINITIALIZED  SE-OU          SE-OU

1) 'NA' means no value.
2) '-' means the nonexisting state.
3) '*' means any existing state.
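Table 3 can be read as a lookup rule. The Python function below is a sketch that encodes the rows of Table 3 as written; the function name, the `backed_up` flag, and the use of None for NA are hypothetical choices for illustration. Where two rows overlap (for example, SHUTDOWN combined with a NOTINITIALIZED procedural state matches both the '*' row and the SHUTDOWN row), the sketch assumes that the NOTINITIALIZED row takes precedence.

```python
from typing import Optional

def fu_state(admin: str, oper: str, role: str, proc: Optional[str],
             backed_up: bool) -> Optional[str]:
    """Map a combination of HAS states to an FU state, following Table 3.

    backed_up: True for 2N/N+M units, False for SN+/no-backup units.
    proc: None stands for 'NA' (no procedural state value).
    Returns '-' for the nonexisting state, and None for combinations
    that Table 3 does not list.
    """
    if proc == "NOTINITIALIZED":
        return "SE-OU"                    # '*' in the first three columns
    if admin == "UNLOCKED" and oper == "DISABLED" and role in ("ACTIVE", "HOTSTANDBY"):
        return "SE-OU"                    # any procedural state
    if admin == "SHUTDOWN" and oper == "ENABLED" and role == "ACTIVE":
        return "-" if backed_up else "BL-EX"
    if admin == "UNLOCKED" and oper == "ENABLED" and proc in (None, "INITIALIZING"):
        executing = proc is None          # NA -> executing, INITIALIZING -> restarting
        if role == "ACTIVE":
            return "WO-EX" if executing else "WO-RE"
        if role == "HOTSTANDBY":
            if not backed_up:
                return "-"
            return "SP-EX" if executing else "SP-RE"
    return None

print(fu_state("UNLOCKED", "ENABLED", "HOTSTANDBY", None, backed_up=True))  # SP-EX
print(fu_state("SHUTDOWN", "ENABLED", "ACTIVE", None, backed_up=False))     # BL-EX
```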

3. Recovery actions

The user permissions control the ability to execute the various SCLI commands. You can take the recovery actions once you have the correct permissions for executing the relevant SCLI commands.

Table 4: HAS SCLI commands and user permissions

Command set                  Permission
show has dependencies        fsHASView
show has summary             fsHASView
set has lock                 fsHASManage
set has unlock               fsHASManage
set has shutdown             fsHASManage
set has power on             fsHASManage
set has switchover           fsHASManage
set has forced switchover    fsHASManage
set has power off            fsHASManage
set has restart              fsHASManage

Table 5 shows the show has functional-unit command and the user permission required to execute it.

Table 5: Show has functional unit commands and user permissions

Command set                  Permission
show has functional-unit     fsHASManage or fsHASView
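Tables 4 and 5 can be expressed as a lookup from command set to accepted permissions. The sketch below transcribes the two tables; the `may_run` function and its prefix matching are hypothetical conveniences for illustration, and the real SCLI performs its own authorization.

```python
from typing import Dict, Set

# Required permissions per command set, transcribed from Tables 4 and 5.
# For "show has functional-unit", either permission is sufficient.
REQUIRED: Dict[str, Set[str]] = {
    "show has dependencies": {"fsHASView"},
    "show has summary": {"fsHASView"},
    "show has functional-unit": {"fsHASManage", "fsHASView"},
    "set has lock": {"fsHASManage"},
    "set has unlock": {"fsHASManage"},
    "set has shutdown": {"fsHASManage"},
    "set has power on": {"fsHASManage"},
    "set has power off": {"fsHASManage"},
    "set has switchover": {"fsHASManage"},
    "set has forced switchover": {"fsHASManage"},
    "set has restart": {"fsHASManage"},
}

def may_run(command_line: str, user_permissions: Set[str]) -> bool:
    """True if the user holds a permission listed for the command set."""
    matches = [c for c in REQUIRED
               if command_line == c or command_line.startswith(c + " ")]
    if not matches:
        return False                      # command set not covered by the tables
    command_set = max(matches, key=len)   # most specific matching command set
    return bool(REQUIRED[command_set] & user_permissions)

print(may_run("show has functional-unit unit-info", {"fsHASView"}))  # True
print(may_run("set has lock", {"fsHASView"}))                        # False
```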

Note:
Execution of the following commands is denied with an error message unless the user gives a "force" option:
Restart, lock, shutdown, or power-off of a node when the services hosted on the node do not have healthy peer node services to switch over to
Lock or shutdown of an RU when the operation would cause a switchover of services to a node that is either unavailable or would provide a lower service level
Restart of a process with important severity
Lock or shutdown of a stalker RU whose host RU is unlocked

The following operations are denied unconditionally:
Cluster restart, if another cluster restart is ongoing
Lock or shutdown of a cluster management node if the peer cluster management node is already in the LOCKED/SHUTTINGDOWN administrative state, the DISABLED operational state, or the POWERED OFF state
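The force-option rules in the note above can be sketched as a guard function. The `Request` fields stand in for checks that HAS performs itself and are hypothetical, as are the target and operation strings; the unconditionally denied operations are not modelled.

```python
from dataclasses import dataclass

@dataclass
class Request:
    operation: str   # "restart", "lock", "shutdown" or "poweroff"
    target: str      # "node", "ru", "stalker-ru" or "process"
    force: bool = False
    # Hypothetical results of checks that HAS performs itself:
    peers_healthy: bool = True          # node services have healthy peers to switch over to
    lowers_service_level: bool = False  # RU switchover target unavailable or weaker
    important_process: bool = False     # process has important severity
    host_ru_unlocked: bool = False      # host RU of a stalker RU is unlocked

def needs_force(r: Request) -> bool:
    if r.target == "node":
        return r.operation in {"restart", "lock", "shutdown", "poweroff"} and not r.peers_healthy
    if r.target == "ru":
        return r.operation in {"lock", "shutdown"} and r.lowers_service_level
    if r.target == "stalker-ru":
        return r.operation in {"lock", "shutdown"} and r.host_ru_unlocked
    if r.target == "process":
        return r.operation == "restart" and r.important_process
    return False

def check(r: Request) -> str:
    return "denied" if needs_force(r) and not r.force else "allowed"

print(check(Request("lock", "node", peers_healthy=False)))              # denied
print(check(Request("lock", "node", peers_healthy=False, force=True)))  # allowed
```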

3.1 Alarms in management operations

High availability services (HAS) raises alarms during management operations such as powering on, powering off, switchover, or restarting.

Different alarms might be raised each time you perform the same operation. Many factors affect which alarms are raised, such as the redundancy model of the recovery unit (RU), the state of the managed object (MO), and the traffic load of the system.

For example, when you lock the 2N redundancy unit /CFPU-0/QNCFCPServer-0, which is in the SP-EX state, HAS raises Alarm 70166 MANAGED OBJECT LOCKED. When you lock the 2N redundancy unit /CFPU-1/QNCFCPServer-1, which is in the WO-EX state, HAS raises both Alarm 70166 MANAGED OBJECT LOCKED and Alarm 70194 RECOVERY GROUP SWITCHOVER, because the lock triggers a switchover.

For more information on alarms, see Multicontroller RNC Alarms (70000-72000).

Table 6: Alarm descriptions

Alarm Title
Description

70011 NODE NOT RESPONDING
A physical computing node does not restart despite several attempts to restart it. The node may be broken, unable to restart, or stuck.

70159 MANAGED OBJECT FAILED
This alarm is valid for processes, recovery units, recovery groups, and nodes. It is raised when the named MO fails, and it is cleared automatically when the MO is no longer down/faulty. The MO can be a software, hardware, or logical entity.

70166 MANAGED OBJECT LOCKED
The administrative state of the named MO, which can be a cluster, a node, or a recovery unit (RU), has changed to LOCKED as a result of a user action (graceful shutdown or lock operation).

70168 CLUSTER STARTED
The cluster is starting or restarting. The (re)start may have been initiated by an operator or caused by fatal errors in a critical hardware or software component. When the cluster is restarted, the alarm system clears all alarms that were raised by the cluster's managed objects before the restart.

70186 CLUSTER OPERATION INITIATED BY OPERATOR
This alarm indicates that an operator has initiated a cluster operation on the specified MO and that HAS is now executing the operation. The operation can be a switchover, restart, or power-off.

70187 MANUAL NODE ISOLATION VERIFICATION NEEDED
This alarm is raised when HAS is unable to reset a faulty node with the Intelligent Platform Management Interface (IPMI). The operational state of the node is not known, and therefore it is not known whether the node still holds and/or updates the shared resources. The alarm is not valid in single-node configurations.

70188 MANAGED OBJECT SHUTDOWN BY OPERATOR
This alarm indicates that the specified MO is being shut down. The named MO and all its unlocked sub-resources are now terminating.

70189 MANAGED OBJECT UNLOCKED BY OPERATOR
This alarm indicates that the specified MO has been unlocked. The named MO and its unlocked sub-resources (if any) can now be activated.

70194 RECOVERY GROUP SWITCHOVER
This alarm is raised when HAS initiates a switchover.

70249 CRITICAL CLUSTER SERVICES WITHOUT STANDBY
This alarm is raised when the standby Cluster Administrator (CLA) node is currently not operational.

70251 UNRECOMMENDED CONFIGURATION FORCED BY OPERATOR
This alarm is raised when an operator locks the current standby FSDirectoryServer recovery unit.

70255 DRBD SYNCHRONIZATION FAILURE
This alarm is raised when a secondary Distributed Replicated Block Device (DRBD) does not synchronize, or synchronizes very slowly, with the primary DRBD device. DRBD is used to replicate the data of an application partition between two nodes.

70256 RESOURCE ALLOCATION OR DE-ALLOCATION FAILURE
This alarm is raised when the allocation or de-allocation of resources to or from a computer node in the cluster fails.

70265 RECOVERY ACTIONS BANNED FOR MANAGED OBJECT
This alarm is raised when an operator sets a managed object to inert (recovery ban) mode.

70350 DETECTED CLUSTER INTERNAL MESSAGING WITH UNKNOWN ORIGIN
This alarm is raised when one of the cluster management functionality nodes (CMFN) has received cluster management messages with an unknown origin.

70359 HARD DISK DRIVE FAILED
This alarm is raised when a disk failure is detected.

70365 DRBD DEVICES FORCIBLY STARTED UP
This alarm is raised when distributed replicated block device (DRBD) devices are forced up without waiting for DRBD re-synchronization.

3.2 Checking unit working state and status

Use the show has functional-unit SCLI command to check the following information: functional unit name, logical and physical addresses of the unit, functional unit state, redundancy model of the unit, functional unit index, and functional unit type.

Purpose

Notice:
You can only check the states of the functional unit. Changing the state of the functional
units is usually performed automatically by the system.

Before you start

Ensure that you have sufficient user permission.

Procedure
1 Check the relevant information of all functional units

show has functional-unit unit-info

This command shows the following information:

Unit name (unit name)
LOG_ADDR (logical address)
PHYS_ADDR (physical address)
State (unit state)
Redundancy (redundancy model)
RU_MONAME (name of the managed object)

Step example
Example: The following is the execution printout of the show has functional-unit unit-info SCLI command.

Unit name LOG_ADDR PHYS_ADDR State Redundancy RU_MONAME

---------- -------- --------- ----- ---------- ---------------------

OMU-0 0x4002 0x0000 WO-EX 2N /CFPU-0/QNOMUServer-0

OMU-1 0x4002 0x0008 SP-EX 2N /CFPU-1/QNOMUServer-1

CSUP-0 0x4AAE 0x1201 WO-EX NoBackup /CSUP-0

...

USCP-0 0x4ADB 0x0102 WO-EX SN+ /USPU-0/QNUSCPServer-0

...

USUP-0 0x4B1C 0x1302 WO-EX NoBackup /USUP-0

...

EITP-0 0x4B5D 0x1703 WO-EX NoBackup /EITP-0

...

EITPPXY-0 0x4B6E 0x0003 WO-EX NoBackup

/EIPU-0/QNEITPProxyServer-0

...

CSUPPXY-0 0x4B7F 0x0101 WO-EX NoBackup

/CSPU-0/QNCSUPProxyServer-0

...

USUPPXY-0 0x4BAC 0x0202 WO-EX SN+

/USPU-0/QNUSUPProxyServer-0

...

CFCP-0 0x444C 0x0100 WO-EX 2N /CFPU-0/QNCFCPServer-0

CFCP-1 0x444C 0x0108 SP-EX 2N /CFPU-1/QNCFCPServer-1

CSCP-0 0x444F 0x0001 WO-EX N+M

/CSPU-0/QNCSCPServer-0-0

...

QNUP-0 0x4BED 0x0103 WO-EX 2N*M /EIPU-0/QNUPServer-0-0

...

QNIU-0 0x4BFE 0x0503 SP-EX 2N*M /EIPU-0/QNIUServer-0-0

...

QNIUB-0 0x4C0F 0x0303 WO-EX 2N*M

/EIPU-0/QNIUBServer-0-0

...

SCLIU-0 0x4C31 0x0200 WO-EX NoBackup

/CFPU-0/QNSCLIUServer-0

SCLIU-1 0x4C32 0x0208 WO-EX NoBackup

/CFPU-1/QNSCLIUServer-1

38 © 2023 Nokia. Nokia confidential


FUM-0 0x4C33 0x0300 WO-EX 2N

/CFPU-0/IL_FUMServer-0

FUM-1 0x4C33 0x0308 SP-EX 2N

/CFPU-1/IL_FUMServer-1

FUA-0 0x4C35 0x0400 WO-EX NoBackup

/CFPU-0/IL_FUAServer-0

...

* Meaning of the two possible notations for the STATE SE-OU:
SE-OU*: unit is available for taking over the active role in case of switchover
SE-OU : unit is not available for automatic recovery actions
For more MO state information, please use SCLI command:
show has state managed-object <MO_NAME>
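If you post-process these printouts in a script, you can turn the rows into records. The following Python sketch is illustrative only. It assumes the whitespace-separated layout of the example above, although column spacing on a live system may differ. It handles simple-mode and normal-mode printouts in which at most the RU_MONAME wraps onto its own line; it does not handle verbose mode. UnitInfo and parse_unit_info are hypothetical helper names and are not part of SCLI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UnitInfo:
    name: str
    log_addr: int
    phys_addr: int
    state: str
    redundancy: str
    ru_moname: Optional[str] = None  # not shown in simple mode


def parse_unit_info(printout: str) -> List[UnitInfo]:
    """Parse the rows of a 'show has functional-unit unit-info' printout.

    Header, separator, '...' and footnote lines are skipped. A row whose
    RU_MONAME column wrapped onto its own line is joined back to that row.
    """
    units: List[UnitInfo] = []
    for line in printout.splitlines():
        fields = line.split()
        if not fields:
            continue
        # A lone "/..." token is an RU_MONAME wrapped from the previous row.
        if len(fields) == 1 and fields[0].startswith("/") and units:
            units[-1].ru_moname = fields[0]
            continue
        # Data rows carry hexadecimal LOG_ADDR and PHYS_ADDR in columns 2 and 3.
        if len(fields) >= 5 and fields[1].startswith("0x") and fields[2].startswith("0x"):
            units.append(UnitInfo(
                name=fields[0],
                log_addr=int(fields[1], 16),
                phys_addr=int(fields[2], 16),
                state=fields[3],
                redundancy=fields[4],
                ru_moname=fields[5] if len(fields) > 5 else None,
            ))
    return units


sample = """\
Unit name LOG_ADDR PHYS_ADDR State Redundancy RU_MONAME
---------- -------- --------- ----- ---------- ---------------------
OMU-0 0x4002 0x0000 WO-EX 2N /CFPU-0/QNOMUServer-0
OMU-1 0x4002 0x0008 SP-EX 2N
/CFPU-1/QNOMUServer-1
...
"""
spare_units = [u.name for u in parse_unit_info(sample) if u.state.startswith("SP")]
print(spare_units)  # ['OMU-1']
```

The last lines list the units that are currently in a spare (SP-) state. You can filter on State or Redundancy in the same way.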

2 Check all the computer addresses

show has functional-unit comp-addr-info

This command shows the following information:

Unit type (unit type)
LOG_ADDR (logical address)
WOSP (physical address or group address of the working and spare units)
WO (physical address or group address of the working unit)
SP (physical address or group address of the spare unit)

Step example
Example: The following is the execution printout of the show has functional-unit comp-addr-info SCLI command.



Unit type LOG_ADDR WOSP WO SP
--------- -------- ------ ------ ------
0x0002 0x4002 0x8003 0x0000 0x0008
0x0002 0x4016 0x8002 0x0000 0x0008
0x0147 0x444C 0x8004 0x0108 0x0100
0x0149 0x444E 0x8008 0x8009 0x800A
0x0149 0x444F 0x800B 0x0001 0x1FFF
0x0149 0x4450 0x800C 0x0009 0x1FFF
0x0124 0x4AAD 0x800E 0x800E 0x1FFF
0x0124 0x4AAE 0x1301 0x1301 0x1FFF
0x0124 0x4AAF 0x1304 0x1304 0x1FFF
0x0124 0x4AB0 0x1309 0x1309 0x1FFF
0x0124 0x4AB1 0x130C 0x130C 0x1FFF
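A similar sketch can index the comp-addr-info rows by logical address, for example to look up the working and spare addresses of a unit. It assumes the five-column hexadecimal layout shown above. CompAddr and parse_comp_addr_info are hypothetical names. The code deliberately leaves values such as 0x1FFF uninterpreted, because this section does not define their meaning.

```python
from typing import Dict, NamedTuple


class CompAddr(NamedTuple):
    unit_type: int
    log_addr: int
    wosp: int  # address of the working and spare units
    wo: int    # address of the working unit
    sp: int    # address of the spare unit


def parse_comp_addr_info(printout: str) -> Dict[int, CompAddr]:
    """Index the rows of a 'show has functional-unit comp-addr-info' printout by LOG_ADDR."""
    table: Dict[int, CompAddr] = {}
    for line in printout.splitlines():
        fields = line.split()
        # Data rows consist of exactly five hexadecimal values.
        if len(fields) == 5 and all(f.startswith("0x") for f in fields):
            row = CompAddr(*(int(f, 16) for f in fields))
            table[row.log_addr] = row
    return table


sample = """\
Unit type LOG_ADDR WOSP WO SP
--------- -------- ------ ------ ------
0x0002 0x4002 0x8003 0x0000 0x0008
0x0147 0x444C 0x8004 0x0108 0x0100
0x0124 0x4AAE 0x1301 0x1301 0x1FFF
"""
omu = parse_comp_addr_info(sample)[0x4002]
print(f"WO=0x{omu.wo:04X} SP=0x{omu.sp:04X}")  # WO=0x0000 SP=0x0008
```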

3 Check all the group addresses

show has functional-unit group-addr-info

This command shows the following information:

GROUP_PHYS_ADDR (group physical address)
Physical address list (list of the physical addresses in the group)

Step example
Example: The following is the execution printout of the show has functional-unit group-addr-info SCLI command.

GROUP_PHYS_ADDR Physical address list
--------------- --------------------------------------------------------
0x8002 0x0000 0x0008
0x8003 0x0000 0x0008
0x8004 0x0108 0x0100
0x8005 0x0200 0x0208
0x8006 0x0308 0x0300
0x8007 0x0400 0x0201 0x0402 0x0703 0x0204 0x0305 0x0306 0x0707
       0x0408 0x0209 0x030A 0x070B 0x020C 0x030E 0x070F
0x8008 0x0001 0x0009
0x8009 0x0001 0x0009
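To resolve a group address, such as a WOSP value, into its member physical addresses, you can parse the group-addr-info printout as in the following Python sketch. The sketch relies on an assumption drawn only from the example values: group addresses have bit 15 (0x8000) set. A line starting with a lower value is therefore treated as the wrapped continuation of the previous group. parse_group_addr_info and expand are hypothetical helpers.

```python
from typing import Dict, List, Optional


def parse_group_addr_info(printout: str) -> Dict[int, List[int]]:
    """Map each group physical address to its list of member physical addresses.

    ASSUMPTION: a group address is recognized by bit 15 (0x8000) being set, as in
    the example printout; a line starting with any other value continues the
    previous group's wrapped address list.
    """
    groups: Dict[int, List[int]] = {}
    current: Optional[int] = None
    for line in printout.splitlines():
        fields = line.split()
        if not fields or not all(f.startswith("0x") for f in fields):
            continue  # blank, header or separator line
        values = [int(f, 16) for f in fields]
        if values[0] & 0x8000:
            current = values[0]
            groups[current] = values[1:]
        elif current is not None:
            groups[current].extend(values)
    return groups


def expand(address: int, groups: Dict[int, List[int]]) -> List[int]:
    """Resolve a group address to its members; other addresses map to themselves."""
    return groups.get(address, [address])


sample = """\
GROUP_PHYS_ADDR Physical address list
--------------- ----------------------------
0x8003 0x0000 0x0008
0x8007 0x0400 0x0201 0x0402 0x0703 0x0204 0x0305 0x0306 0x0707
       0x0408 0x0209 0x030A 0x070B 0x020C 0x030E 0x070F
"""
groups = parse_group_addr_info(sample)
print([f"0x{a:04X}" for a in expand(0x8003, groups)])  # ['0x0000', '0x0008']
print(len(groups[0x8007]))  # 15
```

In this example, the group address 0x8003 (the WOSP value of the OMU row in the comp-addr-info printout) expands to 0x0000 and 0x0008, which are the WO and SP addresses of the same row.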



4 Check the unit type information

show has functional-unit unit-type-info

This command shows the following information:

UNIT_TYPE (unit type)
UNIT_TYPE_VALUE (unit type value)
LOG_ADDR (logical address)
PRIM_LOG_ADDR (primary logical address)
PHYS_ADDR (physical address)
REDUNDANCY (redundancy model)

Step example
Example: The following is the execution printout of the show has functional-unit unit-type-info SCLI command.

Unit type Type value LOG_ADDR PRIM_ADDR PHYS_ADDR Redundancy
--------- ---------- -------- --------- --------- ----------
OMU 0x0002 0x4002 0x4002 0x8003 2N
CSUP 0x0124 0x4AAD 0x4AAE 0x800E NoBackup
USCP 0x0125 0x4ADA 0x4ADB 0x800F SN+
USUP 0x0126 0x4B1B 0x4B1C 0x8011 NoBackup
EITP 0x0127 0x4B5C 0x4B5D 0x8022 NoBackup
EITPPXY 0x0128 0x4B6D 0x4B6E 0x8012 NoBackup
CSUPPXY 0x0129 0x4B7E 0x4B7F 0x800D NoBackup
USUPPXY 0x012A 0x4BAB 0x4BAC 0x8010 SN+
CFCP 0x0147 0x444C 0x444C 0x8004 2N
CSCP 0x0149 0x444E 0x444F 0x8008 N+M(2+2)
QNUP 0x0162 0x4BEC 0x4BED 0x8013 2N*M
QNIU 0x0163 0x4BFD 0x4BFE 0x801D 2N*M
QNIUB 0x0164 0x4C0E 0x4C0F 0x8018 2N*M
SCLIU 0x05E2 0x4C30 0x4C31 0x8005 NoBackup
FUM 0x05F0 0x4C33 0x4C33 0x8006 2N
FUA 0x05F1 0x4C34 0x4C35 0x8007 NoBackup
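Section 3.3 advises you to check the redundancy model of the units before locking or shutting them down. The following minimal Python sketch groups the unit types of a unit-type-info printout by redundancy model. It assumes six whitespace-separated columns per data row, as in the example above. unit_types_by_redundancy is a hypothetical name.

```python
from typing import Dict, List


def unit_types_by_redundancy(printout: str) -> Dict[str, List[str]]:
    """Group unit types from 'show has functional-unit unit-type-info' by redundancy model."""
    by_model: Dict[str, List[str]] = {}
    for line in printout.splitlines():
        fields = line.split()
        # Data rows: name, four hexadecimal values, redundancy model.
        if len(fields) == 6 and fields[1].startswith("0x"):
            by_model.setdefault(fields[5], []).append(fields[0])
    return by_model


sample = """\
Unit type Type value LOG_ADDR PRIM_ADDR PHYS_ADDR Redundancy
--------- ---------- -------- --------- --------- ----------
OMU 0x0002 0x4002 0x4002 0x8003 2N
CSUP 0x0124 0x4AAD 0x4AAE 0x800E NoBackup
CFCP 0x0147 0x444C 0x444C 0x8004 2N
CSCP 0x0149 0x444E 0x444F 0x8008 N+M(2+2)
FUM 0x05F0 0x4C33 0x4C33 0x8006 2N
"""
print(unit_types_by_redundancy(sample)["2N"])  # ['OMU', 'CFCP', 'FUM']
```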

5 Check the information of a unit of a specific type and index

show has functional-unit unit-info unit-type <unit-type> unit-index <unit-index>

Step example
Example: The following is the execution printout of the show has functional-unit unit-info unit-type OMU unit-index 0 SCLI command.

Unit name LOG_ADDR PHYS_ADDR State Redundancy RU_MONAME
---------- -------- --------- ----- ---------- ---------------------
OMU-0 0x4002 0x0000 WO-EX 2N /CFPU-0/QNOMUServer-0

* Meaning of the two possible notations for the STATE SE-OU:
SE-OU*: unit is available for taking over the active role in case of switchover
SE-OU : unit is not available for automatic recovery actions
For more MO state information, please use SCLI command:
show has state managed-object <MO_NAME>

6 Check the computer addresses of the units of a specific type

show has functional-unit comp-addr-info unit-type <unit-type>

Step example
Example: The following is the execution printout of the show has functional-unit comp-addr-info unit-type OMU SCLI command.

Unit type LOG_ADDR WOSP WO SP
--------- -------- ------ ------ ------
0x0002 0x4002 0x8003 0x0000 0x0008
0x0002 0x4016 0x8002 0x0000 0x0008

7 Check the information of a specified unit type

show has functional-unit unit-type-info unit-type <unit-type>



Step example
Example: The following is the execution printout of the show has functional-unit unit-type-info unit-type OMU SCLI command.

Unit type Type value LOG_ADDR PRIM_ADDR PHYS_ADDR Redundancy
--------- ---------- -------- --------- --------- ----------
OMU 0x0002 0x4002 0x4002 0x8003 2N

8 Check the information of all the units in different display modes

show-mode is an optional parameter. Use show-mode to choose one of the following display modes:

simple: Shows the basic unit information: unit name, logical address, physical address, state, and redundancy.
normal: Shows the simple-mode information and, additionally, the Managed Object of the Recovery Unit (RU_MONAME).
verbose: Shows the normal-mode information and, additionally, the Managed Object of the Recovery Group (RG_MONAME).

To check the information of the units in simple, normal, or verbose mode, execute the following command:

show has functional-unit unit-info show-mode {simple | normal | verbose}
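The following Python sketch summarizes which columns each display mode shows, based on the example printouts in this section. It also builds the command line for steps 8 and 9. The names are hypothetical, and the code only assembles command text; it does not communicate with SCLI. Calling it with a unit type but no show-mode produces a parameter combination that this section does not show, so treat that case as untested.

```python
from typing import Optional

# Columns per display mode, taken from the example printouts in this section.
SIMPLE_COLUMNS = ["Unit name", "LOG_ADDR", "PHYS_ADDR", "State", "Redundancy"]
SHOW_MODE_COLUMNS = {
    "simple": SIMPLE_COLUMNS,
    "normal": SIMPLE_COLUMNS + ["RU_MONAME"],
    "verbose": SIMPLE_COLUMNS + ["RU_MONAME", "RG_MONAME"],
}


def unit_info_command(show_mode: Optional[str] = None, unit_type: Optional[str] = None) -> str:
    """Build a 'show has functional-unit unit-info' command line."""
    parts = ["show", "has", "functional-unit", "unit-info"]
    if show_mode is not None:
        if show_mode not in SHOW_MODE_COLUMNS:
            raise ValueError(f"unknown show-mode: {show_mode!r}")
        parts += ["show-mode", show_mode]
    if unit_type is not None:
        parts += ["unit-type", unit_type]
    return " ".join(parts)


print(unit_info_command("verbose", "OMU"))
# show has functional-unit unit-info show-mode verbose unit-type OMU
```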

Example: The following is the execution printout of the show has functional-unit unit-info show-mode simple SCLI command.



Unit name LOG_ADDR PHYS_ADDR State Redundancy
---------- -------- --------- ----- ----------
OMU-0 0x4002 0x0000 WO-EX 2N
OMU-1 0x4002 0x0008 SP-EX 2N
CSUP-0 0x4AAE 0x1201 WO-EX NoBackup
...
USCP-0 0x4ADB 0x0102 WO-EX SN+
...
USUP-0 0x4B1C 0x1302 WO-EX NoBackup
...
EITP-0 0x4B5D 0x1703 WO-EX NoBackup
...
EITPPXY-0 0x4B6E 0x0003 WO-EX NoBackup
...
CSUPPXY-0 0x4B7F 0x0101 WO-EX NoBackup
...
USUPPXY-0 0x4BAC 0x0202 WO-EX SN+
...
CFCP-0 0x444C 0x0100 WO-EX 2N
CFCP-1 0x444C 0x0108 SP-EX 2N
CSCP-0 0x444F 0x0001 WO-EX N+M
...
QNUP-0 0x4BED 0x0103 WO-EX 2N*M
...
QNIU-0 0x4BFE 0x0503 SP-EX 2N*M
...
QNIUB-0 0x4C0F 0x0303 WO-EX 2N*M
...
SCLIU-0 0x4C31 0x0200 WO-EX NoBackup
SCLIU-1 0x4C32 0x0208 WO-EX NoBackup
FUM-0 0x4C33 0x0300 WO-EX 2N
FUM-1 0x4C33 0x0308 SP-EX 2N
FUA-0 0x4C35 0x0400 WO-EX NoBackup
...

Example: The following is the execution printout of the show has functional-unit unit-info show-mode normal SCLI command.



Unit name LOG_ADDR PHYS_ADDR State Redundancy RU_MONAME
---------- -------- --------- ----- ---------- ---------------------------
OMU-0 0x4002 0x0000 WO-EX 2N /CFPU-0/QNOMUServer-0
OMU-1 0x4002 0x0008 SP-EX 2N /CFPU-1/QNOMUServer-1
CSUP-0 0x4AAE 0x1201 WO-EX NoBackup /CSUP-0
...
USCP-0 0x4ADB 0x0102 WO-EX SN+ /USPU-0/QNUSCPServer-0
...
USUP-0 0x4B1C 0x1302 WO-EX NoBackup /USUP-0
...
EITP-0 0x4B5D 0x1703 WO-EX NoBackup /EITP-0
...
EITPPXY-0 0x4B6E 0x0003 WO-EX NoBackup /EIPU-0/QNEITPProxyServer-0
...
CSUPPXY-0 0x4B7F 0x0101 WO-EX NoBackup /CSPU-0/QNCSUPProxyServer-0
...
USUPPXY-0 0x4BAC 0x0202 WO-EX SN+ /USPU-0/QNUSUPProxyServer-0
...
CFCP-0 0x444C 0x0100 WO-EX 2N /CFPU-0/QNCFCPServer-0
CFCP-1 0x444C 0x0108 SP-EX 2N /CFPU-1/QNCFCPServer-1
CSCP-0 0x444F 0x0001 WO-EX N+M /CSPU-0/QNCSCPServer-0-0
...
QNUP-0 0x4BED 0x0103 WO-EX 2N*M /EIPU-0/QNUPServer-0-0
...
QNIU-0 0x4BFE 0x0503 SP-EX 2N*M /EIPU-0/QNIUServer-0-0
...
QNIUB-0 0x4C0F 0x0303 WO-EX 2N*M /EIPU-0/QNIUBServer-0-0
...
SCLIU-0 0x4C31 0x0200 WO-EX NoBackup /CFPU-0/QNSCLIUServer-0
SCLIU-1 0x4C32 0x0208 WO-EX NoBackup /CFPU-1/QNSCLIUServer-1
FUM-0 0x4C33 0x0300 WO-EX 2N /CFPU-0/IL_FUMServer-0
FUM-1 0x4C33 0x0308 SP-EX 2N /CFPU-1/IL_FUMServer-1
FUA-0 0x4C35 0x0400 WO-EX NoBackup /CFPU-0/IL_FUAServer-0
...

* Meaning of the two possible notations for the STATE SE-OU:
SE-OU*: unit is available for taking over the active role in case of switchover
SE-OU : unit is not available for automatic recovery actions
For more MO state information, please use SCLI command:
show has state managed-object <MO_NAME>

9 Check the information of a specific unit type in different display modes

show-mode is an optional parameter.

To check the information of the units of a specific type in a given display mode, execute the following command:

show has functional-unit unit-info show-mode {simple | normal | verbose} unit-type <unit-type>

Example: The following is the execution printout of the show has functional-unit unit-info show-mode simple unit-type OMU SCLI command.

Unit name LOG_ADDR PHYS_ADDR State Redundancy
---------- -------- --------- ----- ----------
OMU-0 0x4002 0x0000 WO-EX 2N
OMU-1 0x4002 0x0008 SP-EX 2N



Example: The following is the execution printout of the show has functional-unit unit-info show-mode normal unit-type OMU SCLI command.

Unit name LOG_ADDR PHYS_ADDR State Redundancy RU_MONAME
---------- -------- --------- ----- ---------- ---------------------
OMU-0 0x4002 0x0000 WO-EX 2N /CFPU-0/QNOMUServer-0
OMU-1 0x4002 0x0008 SP-EX 2N /CFPU-1/QNOMUServer-1

* Meaning of the two possible notations for the STATE SE-OU:
SE-OU*: unit is available for taking over the active role in case of switchover
SE-OU : unit is not available for automatic recovery actions
For more MO state information, please use SCLI command:
show has state managed-object <MO_NAME>

Example: The following is the execution printout of the show has functional-unit unit-info show-mode verbose unit-type OMU SCLI command.

Unit name LOG_ADDR PHYS_ADDR State Redundancy RU_MONAME RG_MONAME
---------- -------- --------- ----- ---------- --------------------- ---------
OMU-0 0x4002 0x0000 WO-EX 2N /CFPU-0/QNOMUServer-0 /QNOMU
OMU-1 0x4002 0x0008 SP-EX 2N /CFPU-1/QNOMUServer-1 /QNOMU

* Meaning of the two possible notations for the STATE SE-OU:
SE-OU*: unit is available for taking over the active role in case of switchover
SE-OU : unit is not available for automatic recovery actions
For more MO state information, please use SCLI command:
show has state managed-object <MO_NAME>



3.3 Checking and changing the state of a Managed Object

Use the SCLI commands show has and set has to check and change the states of the MOs.

Purpose
You can use the show has commands to check the following attributes:

state attributes (administrative, operational, and usage state)
status attributes (alarm, procedural, availability, and unknown status)
role attributes

You can use the set has commands to execute the following operations:

lock
unlock
shutdown

The set has commands for locking, unlocking, and shutting down an MO change only its administrative state.

Locking an MO locks all the normal application RUs of the MO, but leaves the RUs that provide mandatory services (for example, HAS) unlocked. Locking is an ungraceful (forced) operation that quickly terminates all the processes on the MO, so use the lock command with care. The lock command can be used in cases such as hardware maintenance, hardware replacement, and signaling configuration.

A graceful shutdown ensures that the connections are closed properly and the buffers are flushed to disk without data loss. The shutdown command can be used in cases such as hardware maintenance or hardware replacement.



Notice:
Do not lock the FUM or FUA functional units, because they provide important services in the system.

Lock and shutdown operations take the MO out of use, which may have the following impacts on the whole system:

failure of ongoing calls
reboot of the node
instability of the system

Note also that the system does not automatically unlock a locked MO, because lock and shutdown are administrative operations performed by the operating personnel. Take special care when locking a complete RG: if the RG is locked when the system is restarted, it may prevent the system from starting up.

Before executing the lock and shutdown operations, check the redundancy mode and state of the recovery units that you want to lock or shut down. If you want to lock or shut down recovery units in hot standby or N+M redundancy mode, make sure that the other units within the same Recovery Group are UNLOCKED (administrative state) and ENABLED (operational state). To check whether an RU is in the UNLOCKED and ENABLED states, enter the following command: show has state managed-object /<mo-name>.

Before you start

Ensure that you have sufficient user permission.

Procedure
1 Check the state of an MO.

Use the command show has to view the state of an MO.

Note:
In the show has command, the MO can be either a cluster, a node, an RG, an RU, or
a process.

To check all the state and status attributes of an MO, enter either of the following commands:

show has state managed-object /<mo-name>


show has state managed-object-full-name /<mo-fullname>

In the command syntaxes above, <mo-name> is the name of the Managed Object and <mo-fullname> is its name in distinguished name (DN) format.

To check only some of the state attributes of an MO, add the required state and status attributes as options to the following command:

show has state [administrative] [operational] [usage] [procedural] [availability] [unknown] [alarm] [role] [dynamic] [dynamic-hidden] {managed-object /<mo-name> | logical-group <logical-group-name> | managed-object-full-name <mo-fullname>}
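If a script generates the attribute selection, it can validate the options against the syntax above before sending the command. The following Python sketch uses a hypothetical helper, show_state_command. It emits the selected attributes in the order of the syntax description. This ordering is a cautious assumption, because this section does not state whether SCLI accepts another order.

```python
STATE_OPTIONS = (
    "administrative", "operational", "usage", "procedural", "availability",
    "unknown", "alarm", "role", "dynamic", "dynamic-hidden",
)
TARGET_KEYWORDS = ("managed-object", "logical-group", "managed-object-full-name")


def show_state_command(target_keyword: str, target: str, *attributes: str) -> str:
    """Build 'show has state' with the selected attributes in syntax order."""
    unknown = [a for a in attributes if a not in STATE_OPTIONS]
    if unknown:
        raise ValueError(f"unknown state attributes: {unknown}")
    if target_keyword not in TARGET_KEYWORDS:
        raise ValueError(f"unknown target keyword: {target_keyword!r}")
    ordered = [a for a in STATE_OPTIONS if a in attributes]
    return " ".join(["show", "has", "state", *ordered, target_keyword, target])


print(show_state_command("managed-object", "/uspu-0", "operational", "administrative"))
# show has state administrative operational managed-object /uspu-0
```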

Example: Checking the state of the USPU-0

To check the state of the USPU-0, enter the following command:

show has state managed-object /uspu-0

Expected outcome

If checking the state of USPU-0 succeeds, the following output is displayed:

OBJECT ADMINISTRATIVE OPERATIONAL USAGE ROLE PROCEDURAL DYNAMIC

/USPU-0 UNLOCKED ENABLED ACTIVE - - -

2 Change the administrative state of an MO.

Use the command set has to lock, unlock and shut down an MO.

Note:
In the set has command, the MO can be either the whole cluster, a node, an RG or
an RU, but not a process.

To lock an MO, enter the following command:


set has lock managed-object /<mo-name>
Example: Locking the node USPU-0
set has lock managed-object /uspu-0



Expected outcome
If the command for locking the USPU-0 succeeds, the following output is displayed:

/USPU-0 locked successfully

To unlock an MO, enter the following command:


set has unlock managed-object /<mo-name>
Example: Unlocking the node USPU-0
set has unlock managed-object /uspu-0
Expected outcome
If the command for unlocking the USPU-0 succeeds, the following output is displayed:

/USPU-0 unlocked successfully

Notice:
Unlocking all the MOs is not recommended.
The command set has unlock all also unlocks the Recovery Group /QNEM, which can overload the hard disk.
If you do execute the command set has unlock all, lock /QNEM again afterwards with the following command:
set has lock managed-object /QNEM

To shut down an MO gracefully, enter the following command:

set has shutdown [timeout <timeout-value>] managed-object /<mo-name>
You can give the timeout value in seconds (s) or minutes (m), for example, 180s or 3m. If no time unit is given, the value is interpreted in seconds.
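The timeout rules above can be expressed as a small conversion function. The following Python sketch accepts only the lowercase units s and m documented for set has shutdown and treats a bare number as seconds. shutdown_timeout_seconds and shutdown_command are hypothetical helpers that only build command text.

```python
import re
from typing import Optional


def shutdown_timeout_seconds(value: str) -> int:
    """Convert a shutdown timeout such as '180s', '3m' or '180' to seconds."""
    match = re.fullmatch(r"(\d+)([sm]?)", value)
    if match is None:
        raise ValueError(f"invalid timeout value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2) or "s"  # seconds by default
    return amount * 60 if unit == "m" else amount


def shutdown_command(mo_name: str, timeout: Optional[str] = None) -> str:
    """Build 'set has shutdown [timeout <timeout-value>] managed-object /<mo-name>'."""
    parts = ["set", "has", "shutdown"]
    if timeout is not None:
        shutdown_timeout_seconds(timeout)  # reject malformed values early
        parts += ["timeout", timeout]
    parts += ["managed-object", mo_name]
    return " ".join(parts)


print(shutdown_timeout_seconds("3m"))  # 180
print(shutdown_command("/uspu-0", "180s"))
# set has shutdown timeout 180s managed-object /uspu-0
```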

Notice:
To shut down the USPU-related MOs in SN+ redundancy mode (USPU node,
QNUSCPServer and QNUSUPProxyServer), you must give the timeout value. The
recommended timeout value is 180 seconds.

The set has shutdown command waits until the MO has shut itself down. If the shutdown operation has not finished when the timeout expires, HAS forces the MO into the LOCKED state.



Example: Shutting down the node USPU-0
To shut down the node USPU-0 gracefully with a timeout of 180 seconds, enter the following command:
set has shutdown timeout 180s managed-object /uspu-0
Expected outcome
If the graceful shutdown for USPU-0 succeeds, the following output is displayed:

/USPU-0 shutdown successfully

3.4 Checking dependencies between MOs

The HAS SCLI command show has dependencies displays the dependencies between different RGs.

Purpose
The dependencies include:

startup
parasite
stalker
symbiot
local dependency
global dependency

Before you start

Ensure that you have sufficient user permission.

▪ Check the dependencies between the RGs.

To view the dependencies between the RGs, enter the following command:

show has dependencies

Result
Expected outcome



If checking the dependencies between the RGs succeeds, the following output is displayed:



QNCSCP-0-1:
    logical group members: /QNCSCP-1 /QNCSCP-0
/QNIUB-0:
    start after local: /QNEITPProxy
    start after global: /QNUP-0 and /QNCSCP-0 or /QNCSCP-1
/QNCSCP-0:
    start after local: /QNCSUPProxy
    start after global: /QNOMU or /QNCFCP
/QNUSCP-0:
    start after local: /QNUSUPProxy
/QNUP-0:
    start after local: /QNEITPProxy
/QNIU-0:
    start after global: /QNUSCP-0 or /QNUSCP-1 and /QNUP-0
/QNIUB-1:
    start after local: /QNEITPProxy
    start after global: /QNUP-1 and /QNCSCP-0 or /QNCSCP-1
/QNCSCP-1:
    start after local: /QNCSUPProxy
    start after global: /QNOMU or /QNCFCP
/QNUSCP-1:
    start after local: /QNUSUPProxy
/QNUP-1:
    start after local: /QNEITPProxy
/QNIU-1:
    start after global: /QNUSCP-0 or /QNUSCP-1 and /QNUP-1
/QNIUB-2:
    start after local: /QNEITPProxy
    start after global: /QNUP-2 and /QNCSCP-0 or /QNCSCP-1
/QNUP-2:
    start after local: /QNEITPProxy
/QNIU-2:
    start after global: /QNUSCP-0 or /QNUSCP-1 and /QNUP-2
/QNIUB-3:
    start after local: /QNEITPProxy
    start after global: /QNUP-3 and /QNCSCP-0 or /QNCSCP-1
/QNUP-3:
    start after local: /QNEITPProxy
/QNIU-3:
    start after global: /QNUSCP-0 or /QNUSCP-1 and /QNUP-3
/BFD:
    start after local: /NetworkManager
/QNOMU:
    parasites: /QNHTTPD /IPSecRedundant
    start after global: /QNFUM and /QNDBRNW
/QNHTTPD:
    parasite of: /QNOMU
    start after global: /QNOMU
/CDAfs:
    parasites: /PM9Fuse
/PM9Fuse:
    parasite of: /CDAfs
/Log:
    parasites: /Tracing /EswMan /HPIMonitor /AlarmSystemLight
    stalker of: /SSH
/Tracing:
    parasite of: /Log
/SSH:
    parasites: /QNSNMP
    stalkers: /Log /SWMServer
/QNFUA:
    parasites: /QNFUM
/QNFUM:
    parasite of: /QNFUA
/EswMan:
    parasite of: /Log
/QNCFCP:
    start after global: /QNOMU
/QNSNMP:
    parasite of: /SSH
/SWMServer:
    stalker of: /SSH
/SGWNetMgr:
    start after global: /Directory
/HPIMonitor:
    parasite of: /Log
/AlarmSystemLight:
    parasite of: /Log
/IPSecRedundant:
    parasite of: /QNOMU
/SS7SGU:
    start after global: /SGWNetMgr and /Directory
/QNSCLIU:
    start after global: /QNOMU
/QNEITPProxy:
    start after global: /QNCFCP and /BFD
/QNCSUPProxy:
    start after global: /QNOMU and /QNCFCP
/QNUSUPProxy:
    start after global: /QNOMU and /QNCFCP
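For larger configurations, you can parse the dependency printout and sanity-check it with a script. The following Python sketch assumes the layout shown above: an MO name ending in a colon, followed by relation lines of the form relation: targets. It checks that every parasites entry has a matching parasite of entry. The start after expressions are kept as raw strings, because this section does not define how "and" and "or" combine in them. parse_dependencies and parasite_mismatches are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple


def parse_dependencies(printout: str) -> Dict[str, Dict[str, str]]:
    """Map each MO to its relations, for example {'parasites': '/QNHTTPD /IPSecRedundant'}.

    'start after' expressions are kept as raw strings, because the precedence of
    'and' and 'or' in them is not defined here.
    """
    deps: Dict[str, Dict[str, str]] = {}
    current: Optional[str] = None
    for raw in printout.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and " " not in line:  # MO header such as '/QNOMU:'
            current = line[:-1]
            deps[current] = {}
        elif current is not None and ":" in line:
            relation, _, targets = line.partition(":")
            deps[current][relation.strip()] = targets.strip()
    return deps


def parasite_mismatches(deps: Dict[str, Dict[str, str]]) -> List[Tuple[str, str]]:
    """Return (host, parasite) pairs whose reverse 'parasite of' entry is missing."""
    problems = []
    for host, relations in deps.items():
        for parasite in relations.get("parasites", "").split():
            if host not in deps.get(parasite, {}).get("parasite of", "").split():
                problems.append((host, parasite))
    return problems


sample = """\
/QNOMU:
    parasites: /QNHTTPD /IPSecRedundant
    start after global: /QNFUM and /QNDBRNW
/QNHTTPD:
    parasite of: /QNOMU
    start after global: /QNOMU
/IPSecRedundant:
    parasite of: /QNOMU
"""
deps = parse_dependencies(sample)
print(deps["/QNOMU"]["start after global"])  # /QNFUM and /QNDBRNW
print(parasite_mismatches(deps))             # []
```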

3.5 Checking summary of MOs

The HAS SCLI command show has summary managed-object shows the number of RUs and processes of an MO.

Before you start

Ensure that you have sufficient user permission.

▪ Check the summary of the MO.

To view the summary of the MO, enter the following command:

show has summary managed-object /<mo-name>

The output of this command shows the following information:

The number of recovery units
The number of unlocked recovery units
The number of processes
The number of unlocked processes



Example
To view the summary of the RG /Directory, enter the following command:

show has summary managed-object /Directory

If checking the summary of the RG /Directory succeeds, the following output is displayed:

RU status for RG /Directory
RUs in configuration : 2
Unlocked RUs : 2

Process status
Processes in configuration : 6
Unlocked processes : 6
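A script can also read the summary counters, for example to detect that some RUs or processes of an MO are not unlocked. The following Python sketch assumes the label : value layout of the example above. parse_summary is a hypothetical helper. The configured count minus the unlocked count gives the number of objects that are not UNLOCKED, which is not necessarily the same as the number that are LOCKED.

```python
from typing import Dict


def parse_summary(printout: str) -> Dict[str, int]:
    """Collect the 'label : number' lines of a 'show has summary managed-object' printout."""
    counts: Dict[str, int] = {}
    for line in printout.splitlines():
        label, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counts[label.strip()] = int(value)
    return counts


sample = """\
RU status for RG /Directory
RUs in configuration : 2
Unlocked RUs : 2
Process status
Processes in configuration : 6
Unlocked processes : 6
"""
counts = parse_summary(sample)
not_unlocked_rus = counts["RUs in configuration"] - counts["Unlocked RUs"]
print(not_unlocked_rus)  # 0
```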

Note:
To automatically list the available MO names:

For slash-separated (/) format, press Tab after managed-object.
For DN format, replace managed-object with managed-object-full-name and press Tab.

3.6 Performing a controlled switchover

Controlled switchover is one of the recovery actions provided by HAS. It makes the current standby RU the new active RU.

Purpose
A controlled switchover can fail if the application is busy or the controlled switchover timeout is too short. A controlled switchover request is automatically turned into a forced switchover request if the RG does not support a controlled switchover or if the controlled switchover fails.



Note:
Even if a cold active/standby RG does not support a controlled switchover, the old active RU is shut down gracefully before the new active RU is started up.

The controlled switchover timeout indicates the maximum waiting time for the synchronization between the active and standby units. The configuration defines a reasonable default controlled switchover timeout for each RG. The operator may override the default timeout value when issuing the controlled switchover request.

Note:
Giving a timeout value for the controlled switchover is not recommended, because it may cause the operation to fail.

You can also specify the new active RU for the controlled switchover between the active/standby
RUs in an RG. To specify the new active RU, enter the following command with the short or long
version of the options:

set has [HAS OPTION] switchover [OPTION] new-active-recovery-unit /<ru-name>

<ru-name> is the name of the new active RU specified.

Before you start

Before performing the controlled switchover, make sure that the RG containing the target RU runs in the active/standby redundancy model, and that both the active and standby RUs are in the combined state UNLOCKED, ENABLED.

Ensure that you have sufficient user permission.

▪ Perform the controlled switchover.

To execute a controlled switchover between the active/standby RUs in an RG, enter the
following SCLI command with either the short or long version of the option(s):

set has [HAS OPTION] switchover [OPTION] managed-object /<mo-name>

Replace <mo-name> with the name of the RG on which you want to perform the switchover.



Notice:
Do not perform a controlled switchover on all the /CSCPRG-* RGs at the same time, because it can cause the operation to fail.

The different generic options for the has command are listed in the following table.

Table 7: Generic options for the has command

HAS Option                  Description
filter MOType[,MOType...]   Filters the MOs by their type (filter is an input filter).
                            The MO types are:
                            • RG
                            • RU
logerrors                   Enables logging of error messages in syslog.
noerror                     Continues the switchover even if the command fails for
                            some of the given MO names.
regex                       MO names are given as regular expressions. The regular
                            expression must be enclosed in quotation marks (“ “).

The different options for controlled switchover are listed in Table 8:



Table 8: Options for controlled switchover

Option                    Description
force                     Used for critical MOs. Warnings are not printed on the
                          screen.
                          Note: Giving this parameter for the controlled
                          switchover is not recommended.
noblock                   The command does not wait for HAS to complete the
                          operation.
timeout <timeout-value>   Specifies the timeout duration for the command. The
                          duration can be given in seconds (s), minutes (m),
                          hours (h), or days (d); the units are not case
                          sensitive. If no unit is given, the value is in
                          seconds. The allowed value ranges from 1 second (s)
                          to 24836.49 days (d). No warning is displayed with
                          this parameter.
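You can check the unit and range rules of the timeout option before issuing the command. The following Python sketch assumes that the amount is a whole number, because the table does not say whether fractional values are accepted. It converts the documented upper bound of 24836.49 days into whole seconds. switchover_timeout_seconds is a hypothetical helper.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
MAX_SECONDS = 2_483_649 * 864  # 24836.49 days, expressed in whole seconds


def switchover_timeout_seconds(value: str) -> int:
    """Validate a switchover timeout such as '90', '2M', '1h' or '1d' and return seconds."""
    match = re.fullmatch(r"(\d+)([smhd]?)", value.lower())  # units are not case-sensitive
    if match is None:
        raise ValueError(f"invalid timeout value: {value!r}")
    seconds = int(match.group(1)) * UNIT_SECONDS[match.group(2) or "s"]
    if not 1 <= seconds <= MAX_SECONDS:
        raise ValueError(f"timeout out of range: {value!r}")
    return seconds


print(switchover_timeout_seconds("2M"))  # 120
```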

Expected outcome

If the switchover command succeeds, the following output is displayed:

<RG mo-name> controlled switchover successful; New ACTIVE RU is <ru-name>

Unexpected outcome

The possible reasons for the failure of the switchover operation are:

incorrect parameters
dependencies between MOs
invalid state of the system

If the switchover command fails, the printout depends on the cause of the error. For example:



Pattern <mo-name> does not match to any managed object.

Unknown Managed Object: <mo-name>

Note:
To automatically list the available MO names:

For slash-separated (/) format, press Tab after managed-object.
For DN format, replace managed-object with managed-object-full-name and press Tab.

3.7 Performing a forced switchover

The forced switchover is one of the recovery actions. Forced switchover forces the RUs in an
active/standby pair to exchange their roles. The switchover can be executed when the target MO
is in a state expected by the redundancy model.

Purpose
As part of performing a forced switchover, it is possible to specify the name of the new active RU.
If the newly specified RU is already active, then no action is taken. If it is in standby, then a
switchover occurs.

Notice:
Performing a forced switchover is not recommended, because it can drop ongoing calls.

You can also specify the new active RU for the forced switchover between the active and standby RUs in an RG. To specify the new active RU, enter the following command with either the short or long version of the options:

set has [HAS OPTION] forcedswitchover [OPTION] new-active-recovery-unit /<ru-name>

<ru-name> is the name of the new active RU specified.



Before you start

Ensure that you have sufficient user permission.

▪ Perform a forced switchover.

To execute a forced switchover between the active/standby RUs in an RG, enter the following
SCLI command with either the short or long version of the option(s):

set has [HAS OPTION] forcedswitchover [OPTION] managed-object /<mo-name>

Replace <mo-name> with the name of the RG on which you want to perform the switchover.

The different generic options for the has command are listed in the following table:

Table 9: Generic options for the has command

HAS Option                  Description
filter MOType[,MOType...]   Filters the MOs by their type (filter is an input filter).
                            The MO types are:
                            • RG
                            • RU
logerrors                   Enables logging of error messages in syslog.
noerror                     Continues the switchover even if the command fails for
                            some of the given MO names.
regex                       MO names are given as regular expressions. The regular
                            expression must be enclosed in quotation marks (“ “).

The different options for forced switchover are listed in the following table:



Table 10: Options for forced switchover

Option    Description
force     Used for critical MOs.
          Note: Giving the force parameter for the forced switchover
          operation is not recommended.
noblock   The command does not wait for HAS to complete the operation.

Expected outcome

If the switchover command succeeds, the following output is displayed:

<RG mo-name> switchover was successful. New ACTIVE RU is now <ru-name>

Unexpected outcome

Possible reasons for the failure of the switchover operation are as follows:

incorrect parameters
dependencies between MOs
invalid state of the system

If the switchover command fails, then the printout is one of the following depending on the
cause of the error:

Pattern <mo-name> does not match to any managed object.

Unknown Managed Object: <mo-name>

Unable to perform forced switchover for <mo-name>: Standby RU startup is currently prevented by a dependency.

<mo-name> is a critical entity in the system. This operation cannot be executed unless forced with switch -F (or --force).

Unable to perform forced switchover; Standby RU is not ready

Error: The specified RG is not defined to run an active/standby redundancy policy.

action returned with exit code 1

The operation is not possible at the moment. Cluster manager functionality is unavailable.
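When you run switchover commands from a script, you can map the printouts above to the documented failure reasons (incorrect parameters, dependencies between MOs, invalid state of the system) to decide on the next step. The following Python sketch shows one possible mapping. It is an interpretation of the message texts, not an official classification. Messages that it does not recognize are returned as None. classify_failure is a hypothetical name.

```python
from typing import Optional

# One possible mapping of the printouts listed above to the documented failure
# reasons; this triage is an interpretation, not an official classification.
FAILURE_PATTERNS = (
    ("does not match to any managed object", "incorrect parameters"),
    ("Unknown Managed Object", "incorrect parameters"),
    ("prevented by a dependency", "dependencies between MOs"),
    ("Standby RU is not ready", "invalid state of the system"),
    ("Cluster manager functionality is unavailable", "invalid state of the system"),
)


def classify_failure(printout: str) -> Optional[str]:
    """Return the likely failure reason, or None if no pattern matches."""
    for pattern, reason in FAILURE_PATTERNS:
        if pattern in printout:
            return reason
    return None


print(classify_failure("Unable to perform forced switchover; Standby RU is not ready"))
# invalid state of the system
```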

3.8 Powering off a node

You can power off a node to ensure that a node suspected of a hardware fault does not disturb the rest of the system, or to save power when the available capacity exceeds the actual need.

Before you start

Ensure that you have sufficient user permission to execute the SCLI commands needed in the
procedure.

Procedure
1 Gracefully shut down the node to be powered off.

To shut down the node gracefully, enter the following SCLI command:

set has shutdown force timeout <timeout-value> managed-object /<node-name>

The timeout parameter defines the time that is given for the ongoing services to shut
down gracefully, after which the services that are still running are terminated by force. As a
result of this command, the administrative state of the node is set to LOCKED.



Note:
To ensure termination of all services in a reasonable time, you must give a
timeout value. The recommended timeout value is 5 minutes.

To check the status, enter the following command:

show has state managed-object /<node-name>

2 Power off the node.

To power off a node, enter the following command:

set has power off managed-object /<node-name>

The reason for powering off can be specified by using the following SCLI command:

set has power off power-off-cause [cause] managed-object /<node>

You can optionally specify one of the following reasons for powering it off:

automatic-isolation: The node is powered off for automatic isolation.
debugging: The node is powered off for debugging.
hardware-failure: The node is powered off because it is broken.
maintenance: The node is powered off for maintenance.
power-saving: The node is powered off to save power, because the current load does not require all the nodes to operate.

The powering off reason is shown as a dynamic attribute of the node that is powered off and
its current value can be viewed by running the following SCLI command:

show has state managed-object /<node>
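You can combine steps 1 and 2 into a command sequence, for example for a maintenance checklist. The following Python sketch only assembles the command text from the syntax in this section. power_off_sequence is a hypothetical helper. The default timeout follows the recommended value of 5 minutes. The operator or a script still has to evaluate the state check between the two commands.

```python
from typing import List, Optional

POWER_OFF_CAUSES = ("automatic-isolation", "debugging", "hardware-failure", "maintenance", "power-saving")


def power_off_sequence(node: str, cause: Optional[str] = None, timeout: str = "5m") -> List[str]:
    """Return the SCLI commands of steps 1 and 2 for powering off one node."""
    power_off = ["set", "has", "power", "off"]
    if cause is not None:
        if cause not in POWER_OFF_CAUSES:
            raise ValueError(f"unknown power-off cause: {cause!r}")
        power_off += ["power-off-cause", cause]
    power_off += ["managed-object", node]
    return [
        f"set has shutdown force timeout {timeout} managed-object {node}",
        f"show has state managed-object {node}",  # confirm the node is LOCKED
        " ".join(power_off),
    ]


for command in power_off_sequence("/EIPU-0", "maintenance"):
    print(command)
```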

Expected outcome

If the powering off succeeds, the following output is displayed:

<node-name> is powered OFF successfully

Unexpected outcome

Possible reasons for the failure of the powering off operation are, for example, incorrect
parameters or an invalid state of the system.

If the power-off command fails, the following output is displayed:



<node-name>:Error: Cannot power-off <node-name>; Node/cluster
must be LOCKED before it can be powered off.

Or the following error message is displayed:

Error: Only nodes and the cluster can be powered off.

3.9 Powering off a BCN module

This chapter describes the procedure to gracefully power off a BCN module.

Purpose
To gracefully power off the nodes, LMP, and motherboard contained in a BCN module.

Before you start

Enter the SCLI shell.


Acquire the configuration lock to prevent parallel power off of multiple BCN modules. To
acquire the configuration lock, enter the following command:

set config-mode exclusive

After the configuration lock is acquired, any attempt to execute the power off command in parallel on more than one BCN module displays the following error:

The command was not executed. You or another user is currently holding the configuration lock in another session, blocking all configuration changes.

To release the configuration lock after BCN module power off, enter the following command:

set config-mode off
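Because the configuration lock must be released after the power off, a script that drives this procedure should release the lock even if the power off fails. The following Python sketch illustrates this with try/finally. run_scli is a HYPOTHETICAL callable that sends one command to an SCLI session and raises an exception on failure. Answering the [y/n] consent prompt of a forced power off is left to that callable.

```python
from typing import Callable, List


def power_off_bcn_module(run_scli: Callable[[str], str], lmp: str, force: bool = False) -> str:
    """Power off one BCN module while holding the configuration lock.

    run_scli is a HYPOTHETICAL callable that sends one command to an SCLI
    session, returns its printout, and raises an exception if the command fails.
    """
    run_scli("set config-mode exclusive")  # if this raises, no lock is held
    try:
        command = f"set hardware power off lmp {lmp}" + (" force" if force else "")
        return run_scli(command)
    finally:
        run_scli("set config-mode off")  # release the lock even on failure


sent: List[str] = []


def fake_scli(command: str) -> str:
    """Stand-in runner that records commands instead of executing them."""
    sent.append(command)
    return "LMP powered off successfully." if "power off" in command else ""


print(power_off_bcn_module(fake_scli, "LMP-1-3-1"))
print(sent)
```

If acquiring the lock itself raises an exception (for example, because another session holds the lock), nothing is released, because no lock was obtained.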

Note:
For powering on the BCN module, you must have physical access to the BCN module.

▪ Power off the BCN module gracefully.



To power off the BCN LMP and the nodes contained in the same BCN box, enter the
following command:

set hardware power off lmp <lmp> {[force]}

The parameters for this command are described in the following table:

Table 11: Parameters for the set hardware power off command

Parameter   Description
lmp         Specifies the LMP (Local Management Processor) and also the
            motherboard on which it is located.
force       Specifies the forceful power off of the LMP (motherboard). This
            option is used in scenarios where the LMP power off results in
            service downtime, such as powering off an LMP hosting a
            management node.

Note:
When this command is executed, the following alarm is raised:

70701 GRACEFUL POWER OFF OF MOTHERBOARD INITIATED

This alarm must be cleared manually.

Step example
To power off LMP-1-3-1 and the nodes contained in the same BCN box, enter the following command:

set hardware power off lmp LMP-1-3-1

If the power off is successful, the following output is displayed:

Note: This command might take few minutes.

Please be patient.

LMP powered off successfully.

Step example
If the LMP to be powered off hosts a management node (CFPU) and the power off command is given without the force option, the following output is displayed:

Command failed as LMP to be powered off hosts a management node. Please retry the command with “force” option.

Step example
To power off LMP-1-1-1 with the force option, enter the following command:

set hardware power off lmp LMP-1-1-1 force

The following output is displayed:

WARNING: Forced LMP power off could result in ungraceful shutdown and power off nodes hosted by LMP.

Your consent is needed to proceed with forced LMP power off [y/n]

Step example
If the graceful shutdown of a node hosted by the LMP being powered off fails, the following output is displayed:

Note: This command might take few minutes.

Please be patient.

WARNING: Graceful shutdown of node EIPU-0 has failed.

LMP successfully powered off.

Step example
If powering off a node hosted by the LMP being powered off fails, the following output is displayed:

Note: This command might take few minutes.

Please be patient.

WARNING: Powering off EIPU-0 has failed.

LMP successfully powered off.

Step example
If the command is executed from a node on which the SSH service is not active, the following output is displayed:

Command executed from a node on which SSH service is inactive. Please retry executing this command from a node on which SSH service is active.

Step example
If the LMP power off fails due to an internal error, the following output is displayed:

Failed to power off LMP-1-3-1. Try the command again. If the problem persists, check the system log file for the error details and contact your local customer support.

Step example
If an attempt is made to power off LMPs in parallel without first acquiring the configuration lock, the following output is displayed:

This command cannot be executed, as LMP power off is already ongoing. Please try after some time.
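The command syntax and the step examples above can be summarized in a small Python sketch: one helper builds the command line, and another maps the documented output texts to an outcome label. The LMP name pattern is an assumption based only on the examples in this section, the outcome labels are hypothetical, and the matching relies on the exact message texts shown above, which may differ between software releases.

```python
import re
from typing import List, Tuple

# Assumption: LMP names follow the LMP-<n>-<n>-<n> pattern of the examples
# in this section (LMP-1-1-1, LMP-1-3-1). Real naming rules may differ.
LMP_NAME = re.compile(r"LMP-\d+-\d+-\d+")

# Substrings of the documented failure messages, checked in this order.
HARD_FAILURES = [
    ("hosts a management node", "retry-with-force"),
    ("SSH service is inactive", "wrong-node"),
    ("LMP power off is already ongoing", "parallel-operation"),
    ("Failed to power off", "internal-error"),
]
# Both success wordings appear in the examples above.
SUCCESS = ("LMP powered off successfully", "LMP successfully powered off")
NODE_WARNING = re.compile(
    r"WARNING: (?:Graceful shutdown of node|Powering off) (\S+) has failed")


def power_off_lmp_command(lmp: str, force: bool = False) -> str:
    """Build the 'set hardware power off lmp' command line."""
    if not LMP_NAME.fullmatch(lmp):
        raise ValueError(f"unexpected LMP name: {lmp!r}")
    return f"set hardware power off lmp {lmp}" + (" force" if force else "")


def classify_power_off(output: str) -> Tuple[str, List[str]]:
    """Return an outcome label and the nodes named in WARNING lines."""
    text = " ".join(output.split())  # join wrapped output lines
    for needle, label in HARD_FAILURES:
        if needle in text:
            return label, []
    if any(s in text for s in SUCCESS):
        warned = NODE_WARNING.findall(text)  # e.g. ["EIPU-0"]
        return ("success-with-warnings" if warned else "success"), warned
    if "Your consent is needed" in text:
        return "awaiting-consent", []
    return "unknown", []
```

A "success-with-warnings" result means that the LMP itself was powered off but at least one hosted node was not shut down or powered off cleanly, so those nodes deserve a closer look.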

3.10 Powering off a cluster

This chapter describes the procedure to power off the whole cluster (mcRNC network element), for example, when moving the cluster physically from one place to another.

Purpose

Note:
Do not power off the whole cluster for any normal maintenance operation.

Before you start

When the BCN cluster is shut down, the SSH connection is lost and the cluster is accessible only through the console ports. Therefore, to carry out this procedure, connect to LMP-1-1-1 or LMP-1-2-1 through the management interface and enter the command telnet 0 3001 to connect to the CFPU node. Then start the SCLI session by entering the command fsclish.

Procedure
1 Shut down the cluster.

To shut down the cluster gracefully, enter the following SCLI command:

set has shutdown force timeout <timeout-value> managed-object /

The timeout parameter defines the time that is given for the ongoing services to shut
down gracefully, after which the services that are still running are terminated by force. As a
result of this command, the administrative state of all nodes is set to LOCKED.

Note:
To ensure that all services terminate in a reasonable time, you must give a timeout value. The recommended timeout value is 5 minutes (300 seconds).

To check the status, enter the following command:

show has state managed-object /*PU*

2 Power off the nodes.

To power off the add-in cards (nodes) of the cluster, enter the following SCLI command:

set has power off managed-object /

/ shutdown successfully

This command powers off all nodes except the CFPU node that runs the cluster management functionality (CMF). CMF supervises the status of the nodes. Because the nodes are powered off with this command, the CMF is aware of the operation and stops supervising the nodes. This way, unnecessary supervision failures are not reported.

Expected outcome

If the power-off succeeds, the following output is displayed:

root@CFPU-0 [RNC-614] >set has power off managed-object /

/ is powered OFF successfully

The CFPU node (add-in card) running the CMF functionality remains powered on in this phase
and the LED indicating its status is green.

Enter the following command to check the state of all nodes:

show hardware state list

Note:
If you are connected to the CFPU that hosts the standby instance of the CMF (FSClusterHAServer RU), the connection is lost after you enter the command set has power off managed-object /. In this case, the output / is powered OFF successfully is not displayed, so the command execution status is not visible. To continue the operation and check the state of all nodes, connect through the other LMP to the CFPU that hosts the active instance of the FSClusterHAServer RU.

3 Power off the BCN boxes.

To power off the motherboard of each BCN box, switch off the power of each BCN box using the power switches on the separate PDU located in the first slot of the rack.
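The order of the SCLI commands in the procedure above can be captured in a short Python sketch that returns the command lines as text. It only builds strings and does not execute anything. The assumption that the shutdown timeout is given in seconds comes from the other HAS timeout examples in this document, not from this procedure, and the function name is hypothetical.

```python
from typing import List

# Step 1 recommends a 5-minute shutdown timeout. Assumption: the timeout is
# given in seconds, as in the other HAS timeout examples in this document.
RECOMMENDED_TIMEOUT_S = 5 * 60


def cluster_power_off_plan(timeout_s: int = RECOMMENDED_TIMEOUT_S) -> List[str]:
    """Return the SCLI commands of the cluster power-off procedure, in order.

    Step 3 is a manual action, so it is returned as a reminder, not a command.
    """
    if timeout_s < 1:
        raise ValueError("give a timeout so that all services terminate")
    return [
        f"set has shutdown force timeout {timeout_s} managed-object /",  # step 1
        "show has state managed-object /*PU*",  # check that nodes are LOCKED
        "set has power off managed-object /",  # step 2; the CMF CFPU stays on
        "show hardware state list",  # check the state of all nodes
        "MANUAL: switch off each BCN box at the PDU in the first rack slot",
    ]


for line in cluster_power_off_plan():
    print(line)
```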

3.11 Powering on a node

This chapter describes the procedure to power on and unlock a node that has previously been locked and powered off.

Before you start

Powering on a node with the HAS command is possible only in a clustered environment. Ensure that you are logged in to the system with sufficient user permissions.

Procedure
1 Power on a node.

To power on a node, enter the following command:


set has power on managed-object <mo-name>

Step example
To power on the node USPU-0, enter the following SCLI command:
set has power on managed-object /USPU-0

Step result
The following output is displayed:

USPU-0 is powered ON successfully.

2 Unlock the node.

To unlock the node, enter the following command:

set has unlock managed-object <mo-name>

Step example
To unlock the node USPU-0, enter the following SCLI command:
set has unlock managed-object /USPU-0
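Because the node is unlocked only after it has been powered on, the two steps can be expressed as an ordered pair of commands plus a check of the step 1 output. The Python sketch below assumes, based only on the examples in this document, that a node MO has exactly one path segment (such as /USPU-0); the helper names are hypothetical.

```python
from typing import List


def node_mo(node: str) -> str:
    """Normalize a node name such as USPU-0 to the MO path /USPU-0."""
    mo = node if node.startswith("/") else "/" + node
    if mo == "/" or mo.count("/") != 1:
        # Assumption: a node MO has exactly one path segment, as in /USPU-0.
        raise ValueError(f"not a node MO: {node!r}")
    return mo


def power_on_and_unlock(node: str) -> List[str]:
    """The two SCLI commands of this procedure, in the order they must run."""
    mo = node_mo(node)
    return [f"set has power on managed-object {mo}",
            f"set has unlock managed-object {mo}"]


def powered_on(output: str, node: str) -> bool:
    """True if step 1 printed the documented success message."""
    return f"{node_mo(node)[1:]} is powered ON successfully" in output
```

In use, the unlock command would be sent only after `powered_on` returns True for the output of the power on command.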

3.12 Powering on a BCN module

This chapter describes the procedure to power on a BCN module.

Before you start

This procedure is applicable when the BCN module has been powered off as described in section Powering off a BCN module.

Procedure
1 Disconnect and reconnect the power cable of the BCN module that is powered off.

2 Unlock all the nodes hosted by the BCN module.

set has unlock managed-object <mo-name>

3 Clear the 70701 alarm.

set alarm clear alarm-id <alarm-id>

3.13 Powering on a cluster

This chapter describes the procedure to power on a cluster.

Procedure
1 Power on the BCN boxes one by one in numerical order starting with box-1.

Power on the BCN boxes one by one at intervals of about 20 seconds. Do not power on all BCN boxes at the same time, to prevent overloading the CFPU node.

2 Wait until most of the nodes are operational.

Before proceeding, make sure that the nodes providing basic functions have booted up to the operational state. Make a serial connection to the CFPU as explained in Before you start in the chapter Powering off a cluster.

Enter the following SCLI command periodically to find out when all these nodes have booted up:

show has summary managed-object /

Note:
After a node has booted up, it is no longer shown in the list of Non-operational nodes in the command output.

Expected outcome

# show has summary managed-object /

Node status

Nodes in configuration : 16

Unlocked nodes : 0

Non-operational nodes : 2

/EIPU-1

/EIPU-3

RG status

RGs in configuration : 68

Unlocked RGs : 0

RU status

RUs in configuration : 274

Unlocked RUs : 0

Process status

Processes in configuration : 1522

Unlocked processes : 0

3 Unlock the cluster.

Enter the following SCLI command to unlock the cluster:

set has unlock managed-object /

Expected outcome (example)

root@CFPU-0 [RNC-614]>set has unlock managed-object /

/ is unlocked successfully

4 Check that all RUs are in correct state.

Ensure that all RUs are in the intended state. Enter the following command to check the state of RUs:

show has state managed-object /*/*

Expected outcome (example)

All RUs should be UNLOCKED and ENABLED.
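Steps 1 and 2 of this procedure can be supported with two small Python helpers: one computes the power-on schedule at the 20-second interval given in step 1, and the other parses the show has summary managed-object / output in the format shown in step 2, so that you can see which nodes are still non-operational. The parser assumes that the output keeps the "label : number" layout of the example above; the helper names are hypothetical, and the number of BCN boxes is an input because it depends on the deployment.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

COUNT_LINE = re.compile(r"^(.+?)\s*:\s*(\d+)$")  # e.g. "Unlocked nodes : 0"


def power_on_schedule(box_count: int, interval_s: int = 20) -> List[Tuple[int, int]]:
    """(box number, seconds from start) pairs, box-1 first, as in step 1."""
    return [(box, (box - 1) * interval_s) for box in range(1, box_count + 1)]


@dataclass
class Summary:
    counts: Dict[str, int] = field(default_factory=dict)
    non_operational: List[str] = field(default_factory=list)


def parse_summary(output: str) -> Summary:
    """Parse 'show has summary managed-object /' output as shown in step 2."""
    summary = Summary()
    in_non_operational = False
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no information
        match = COUNT_LINE.match(line)
        if match:
            summary.counts[match.group(1)] = int(match.group(2))
            in_non_operational = match.group(1) == "Non-operational nodes"
        elif in_non_operational and line.startswith("/"):
            summary.non_operational.append(line)  # e.g. /EIPU-1
        else:
            in_non_operational = False  # a section title such as "RG status"
    return summary
```

Step 2 only requires most nodes to be operational, so the parser reports which nodes remain in the list rather than deciding on its own when to continue.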

3.14 Restarting an MO

This chapter describes the procedure to restart an MO in order to recover a software process, the cluster, an RU, a node, or the system from faults.

Purpose
The restart recovery action is performed either automatically by HAS or manually by the operator as an administrative action. In general, the restart action first performs an ungraceful (forced) shutdown and then starts the MO.

Optionally, a graceful restart of nodes is also supported. An optional time interval can be specified within which the recovery units must shut down their operations or switch over to another node before the node is restarted.

When a node or RU is restarting, all the processes running on the node or RU are immediately
terminated before the node or RU is restarted.

Note:
If a node is restarted, a forced switchover may take place, which can cause ongoing calls to be dropped.

Do not restart an MO unless you have been instructed to do so.

In addition to the restart methods described above, you can also restart the cluster in a phased manner. In phased cluster restart mode, the peer management node is activated before the other nodes.

Note:
If an RU of type HARecoveryUnit was locked before a cluster restart operation, it is unlocked automatically after the cluster has restarted.

Before you start

Ensure that you have sufficient user permission.

▪ Restart an MO ungracefully.

To restart an MO, enter the following SCLI command:

set has restart managed-object /<mo-name>

Replace <mo-name> with the name of the MO you want to restart.

Step example
To restart the MO /EIPU-1/QNUPServer-1-3, enter the following command:

set has restart managed-object /EIPU-1/QNUPServer-1-3

Note:
Restarting the RG /Directory in the cluster environment can interrupt the secure shell (SSH) connection.

By default, alarm 70186 is raised when you restart an MO, and it is cleared automatically when its time to live expires. To restart an MO without raising the alarm, use the following SCLI command:

set has restart raise-alarm no managed-object /<mo-name>

Expected outcome

If the restart succeeds, one of the following outputs is displayed, depending on the MO:

<mo-name> reset successful 1 (of 1) running RUs were terminated and are waiting for restart.

<mo-name> is restarted successfully

Unexpected outcome

Possible reasons for the failure of the restart operation are, for example, incorrect
parameters or an invalid state of the system.

If the restart command fails, the output is one of the following, depending on the nature of
the error:

Pattern <mo-name> does not match to any managed object.

Unknown Managed Object: <mo-name>

The operation is not possible at the moment. Cluster manager functionality is unavailable.

Node not accessible: <mo-name> cannot be reset.

action returned with exit code 1

▪ Restart an MO gracefully.

To restart an MO gracefully, enter the following command:

set has restart graceful timeout <timeout-value> managed-object /<mo-name>

The timeout parameter specifies the time interval, in seconds, within which the recovery units (RUs) must shut down their operations or switch over to another node before the node is restarted.

Note:
Only nodes support a graceful restart.

Step example
To restart the node USPU-0 gracefully with a timeout of 100 seconds, enter the following command:

set has restart graceful timeout 100 managed-object /USPU-0

The following output is displayed:

Are you sure you want to proceed? [y/N]: y

/USPU-0 is gracefully restarted

Unexpected outcome

If the timeout parameter is used in a restart command without the graceful option, the following output is displayed:

Incorrect timeout usage. Use -t only with shutdown, controlled switchover,optimized cluster restart or graceful restart command!

If an invalid value is specified for the timeout parameter, for example -1, the following output is displayed:

Incorrect timeout defined, value: -1 seconds

The timeout should be at least 1 seconds

Check also if the given number is too big (max. 30 days)

▪ Restart a cluster in a phased manner.

To restart a cluster in a phased manner, enter the following command:

set has restart optimized managed-object /
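The restart variants in this chapter differ only in a few keywords, and the documented error texts give the timeout limits (at least 1 second, at most 30 days). The Python sketch below builds the command lines and rejects combinations that the chapter rules out. Two points are assumptions: that a node MO has exactly one path segment (/USPU-0) while an RU has two (/EIPU-1/QNUPServer-1-3), and that combining raise-alarm no with graceful is not covered, because this document does not show it. The function names are hypothetical.

```python
from typing import Optional

# From the error text above: at least 1 second, at most 30 days.
MIN_TIMEOUT_S = 1
MAX_TIMEOUT_S = 30 * 24 * 60 * 60


def is_node(mo: str) -> bool:
    # Assumption: node MOs have one path segment (/USPU-0); RUs have two
    # (/EIPU-1/QNUPServer-1-3), as in the examples of this chapter.
    return mo.startswith("/") and mo != "/" and mo.count("/") == 1


def restart_command(mo: str, *, graceful_timeout_s: Optional[int] = None,
                    raise_alarm: bool = True, phased: bool = False) -> str:
    """Build one of the restart command lines described in this chapter."""
    if phased:
        if mo != "/":
            raise ValueError("phased restart applies to the whole cluster (/)")
        return "set has restart optimized managed-object /"
    if graceful_timeout_s is not None:
        if not is_node(mo):
            raise ValueError("only nodes support a graceful restart")
        if not MIN_TIMEOUT_S <= graceful_timeout_s <= MAX_TIMEOUT_S:
            raise ValueError(f"timeout out of range: {graceful_timeout_s}")
        if not raise_alarm:
            # raise-alarm no combined with graceful is not shown in this document.
            raise ValueError("combination not covered by this sketch")
        return f"set has restart graceful timeout {graceful_timeout_s} managed-object {mo}"
    prefix = "set has restart" if raise_alarm else "set has restart raise-alarm no"
    return f"{prefix} managed-object {mo}"
```

Validating the timeout before sending the command avoids the Incorrect timeout defined error round trip shown above.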

3.15 Restarting a BCN module

This chapter describes the procedure to gracefully restart the nodes hosted by a BCN module and restart the module, or to restart all the BCN modules in the cluster.

Purpose
To gracefully restart the nodes hosted by the BCN module and restart the module.
To restart all BCN modules in the cluster. Nodes in the cluster are restarted ungracefully as
part of restarting all the modules.

Note:
Restarting all BCN modules is effectively a cluster restart, in which the motherboard of each BCN module is also restarted. It results in service downtime.

Before you start

Enter the SCLI shell.


Acquire the configuration lock to prevent parallel restart of multiple BCN modules.
To acquire the configuration lock, enter the following command:

set config-mode exclusive

After the configuration lock has been acquired, if an attempt is made to run the restart command in parallel for more than one BCN module, the following error is displayed:

The command was not executed. You or another user is currently holding the configuration lock in another session, blocking all configuration changes.

To release the configuration lock after the BCN module has been restarted, enter the following command:

set config-mode off

▪ Restart a BCN module or all modules.

To restart the BCN module and the nodes contained in the same BCN box, enter the
following command:

set hardware restart lmp <lmp> {[force]}

To restart all the BCN modules, enter the following command:

set hardware restart lmp all {[force]}

Note:
To restart all the BCN modules, the command must include the force option.

The parameters for this command are described in the following table:

Table 12: Parameters for the set hardware restart command

Parameter      Description
all or <lmp>   Using all specifies all the motherboards in the cluster and
               results in an ungraceful restart of all the LMPs and nodes. The
               parameter <lmp> specifies one LMP (motherboard) in the cluster
               and results in a graceful restart of the LMP and the nodes
               contained in the same BCN box.
force          Specifies a forced restart of the LMP (motherboard). This
               optional parameter is used in scenarios where a motherboard
               restart results in service downtime, such as restarting a
               motherboard that hosts a management node, or restarting all the
               nodes in the cluster.

Step example
To restart LMP-1-3-1 and the nodes contained in the same BCN box, enter the following command:

set hardware restart lmp LMP-1-3-1

If the restart is successful, the following output is displayed:

Note: This command might take few minutes.

Please be patient

LMP restarted successfully

Note:
In the following examples, LMP-1-1-1 is used as reference LMP.

Step example
If the LMP to be restarted hosts a management node (CFPU) and the restart command is given without the force option, the following output is displayed:

Command failed as LMP to be restarted hosts a management node. Please try executing the command with “force” option.

Step example
To restart LMP-1-1-1 with the force option, enter the following command:

set hardware restart lmp LMP-1-1-1 force

The following output is displayed:

WARNING: Forced LMP restart would result in ungraceful restart of Nodes hosted by LMP.

Your consent is needed to proceed with forced LMP restart [y/N]:

Step example
If the graceful restart of a node hosted by the LMP being restarted fails, the following output is displayed:

Note: This command might take few minutes.

Please be patient

WARNING: Graceful restart of node EIPU-0 has failed

LMP restarted successfully.

Step example
If the command is executed from a node on which the SSH service is not active, the following output is displayed:

Command executed from a node on which SSH service is inactive. Please retry executing this command from a node on which SSH service is active.

Step example
If the LMP restart fails due to an internal error, the following output is displayed:

Failed to restart LMP-1-1-1. Try the command again. If the problem persists, check the system log file for the error details and contact your local customer support.

Step example
If an attempt is made to restart LMPs in parallel without first acquiring the configuration lock, the following output is displayed:

This command cannot be executed, as LMP operation is already ongoing.

Please try after sometime.

Step example
To restart all LMPs in the cluster, enter the following command:

set hardware restart lmp all force

If the restart is successful, the following output is displayed:

Note: This command might take few minutes.

Please be patient.

WARNING: Restarting all LMPs will result in cluster restart with ungraceful restart of all nodes hosted by LMPs.

Your consent is needed to proceed with forced restart of all the motherboards in the cluster [y/n]: y

LMP-1-8-1 is restarted.

LMP-1-7-1 is restarted.

LMP-1-6-1 is restarted.

LMP-1-5-1 is restarted.

LMP-1-4-1 is restarted.

LMP-1-3-1 is restarted.

LMP-1-2-1 is restarted.

LMP-1-1-1 is restarted.

Step example
If the command is executed from a node on which the SSH service is not active, the following output is displayed:

Command executed from a node on which SSH service is inactive. Please retry executing this command from a node on which SSH service is active.

Step example
If an LMP is included in the deployment but the actual hardware is missing, the following output is displayed:

LMP-1-4-1 is restarted.

LMP-1-3-1 restart failed.

LMP-1-2-1 is restarted.

LMP-1-1-1 is restarted.

Step example
If the command is executed with the all option but without the force option, the following output is displayed:

Command failed to restart all motherboards in the cluster. Please retry the command with “force” option.

Step example
If the command execution fails due to a failure in the hardware management framework API, the following output is displayed:

Failed to restart all the motherboards in the cluster. Try the command again. If the problem persists, check the system log file for the error details and contact your local customer support.

Step example
If the user interrupts the restart operation of an LMP or of all LMPs, the following warning message is displayed:

WARNING: LMP operation is ongoing, hence interrupt is not allowed.
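The rules for the set hardware restart lmp command (all requires force) and the per-LMP result lines printed for all can be checked with a short Python sketch. The LMP name pattern is an assumption taken from the examples in this section, and the parser relies on the exact "is restarted." and "restart failed." wording shown above; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumption: LMP names follow the LMP-<n>-<n>-<n> pattern of the examples.
LMP_NAME = re.compile(r"LMP-\d+-\d+-\d+")
RESULT_LINE = re.compile(r"(LMP-\d+-\d+-\d+) (is restarted|restart failed)\.")


def restart_lmp_command(target: str, force: bool = False) -> str:
    """Build 'set hardware restart lmp <lmp>|all [force]'."""
    if target == "all":
        if not force:
            # Documented rule: restarting all modules requires force.
            raise ValueError("'all' must be combined with force")
    elif not LMP_NAME.fullmatch(target):
        raise ValueError(f"unexpected LMP name: {target!r}")
    return f"set hardware restart lmp {target}" + (" force" if force else "")


def parse_restart_all(output: str) -> Dict[str, bool]:
    """Map each LMP in the 'all' output to True (restarted) or False (failed)."""
    results: Dict[str, bool] = {}
    for line in output.splitlines():
        match = RESULT_LINE.fullmatch(line.strip())
        if match:
            results[match.group(1)] = match.group(2) == "is restarted"
    return results


def failed_lmps(output: str) -> List[str]:
    """LMPs reported as 'restart failed', in output order."""
    return [lmp for lmp, ok in parse_restart_all(output).items() if not ok]
```

For the missing-hardware example above, `failed_lmps` would single out LMP-1-3-1 as the module to inspect.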

4. Troubleshooting recovery and unit working state administration

4.1 USPU-related MO shutdown fails

If the shutdown of USPU-related MOs (USPU node, QNUSCPServer, and QNUSUPProxyServer) fails, follow the steps in this procedure to set a timeout value for the ongoing shutdown operation.

Description and symptoms


The USPU-related MOs cannot shut down successfully using set has shutdown [timeout
<timeout-value>] managed-object /<mo-name> command.

Solution
To shut down USPU-related MOs of the SN+ redundancy type, you must give a timeout value in the set has shutdown [timeout <timeout-value>] managed-object /<mo-name> command. Otherwise, the MO does not shut down until all data traffic has been processed.

Procedure
1 Interrupt the ongoing shutdown operation

To interrupt the ongoing shutdown operation, press Ctrl+C. This does not terminate the ongoing shutdown operation; it only returns the system to the command prompt.

2 Set a timeout value for the ongoing shutdown operation

To set a timeout value for the ongoing shutdown operation, enter the following SCLI command:

set has shutdown timeout <timeout-value> managed-object /<mo-name>

The recommended timeout value is 180 seconds. If the shutdown operation has not finished when the timer expires, HAS forces the MO to the LOCKED state.
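The command in step 2 can be generated with the recommended 180-second default, as in the following Python sketch. The 1-second minimum and 30-day maximum are taken from the restart timeout error text in this document and are only assumed to apply to shutdown timeouts as well; the function name is hypothetical.

```python
RECOMMENDED_TIMEOUT_S = 180  # recommended value from step 2
# Assumption: the 1-second minimum and 30-day maximum reported for restart
# timeouts also apply to shutdown timeouts.
MAX_TIMEOUT_S = 30 * 24 * 60 * 60


def uspu_shutdown_command(mo: str, timeout_s: int = RECOMMENDED_TIMEOUT_S) -> str:
    """Build the shutdown command with an explicit timeout for a USPU-related MO."""
    if not 1 <= timeout_s <= MAX_TIMEOUT_S:
        raise ValueError(f"timeout out of range: {timeout_s}")
    path = mo if mo.startswith("/") else "/" + mo  # MO paths start with /
    return f"set has shutdown timeout {timeout_s} managed-object {path}"


print(uspu_shutdown_command("USPU-0"))
# set has shutdown timeout 180 managed-object /USPU-0
```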

5. Appendix: Concept difference between mcRNC and IPA-RNC

5.1 Recovery system in IPA-RNC

The recovery system in IPA-RNC is responsible for fault management. It uses a set of recovery actions to keep the availability of the NE as high as possible in various fault scenarios.

The following figure shows the object model managed by the recovery system.

Figure 5: Recovery system model

Network element
The whole RNC network element.
System
It contains several functional units. IPA-RNC may contain two systems (original and trial) during
software upgrade.
Functional unit
An entity of hardware and software, or of hardware only, capable of accomplishing a specific purpose.
Most, but not all, functional units are under the control of fault management.
A functional unit may be a computer unit or a non-computer unit.
Functional units may be nested (a functional unit may contain other functional units); that is, functional units have a hierarchical structure.
Process

A process family running in a computer unit.

5.2 HAS in mcRNC

HAS (High Availability Services) supports various administrative and recovery actions to enhance the availability of the system. HAS has similar functionality to the recovery system in IPA-RNC, but many concepts are different.

The following figure shows the object model managed by the HAS system. Each node in the chart is a managed object (a logical entity representing one or more resources that can be the target of management operations such as start, stop, restart, and shutdown). HAS uses a standard (ITU-T X.731) state model to manage these managed objects.

Figure 6: HAS system model

Cluster
The whole RNC network element.
Node
A composite managed object used to represent a physical node in a cluster. It is composed of
recovery units.
Recovery unit
A managed object used to represent a part of service (recovery group) that runs on a particular
node. It is composed of processes.
Recovery group
A managed object that represents a service provided by the cluster; it is not associated with any particular node in the cluster. It is composed of recovery units.
Process

The most basic managed object in HAS system model. It has a one-to-one mapping with an OS
process. Processes can be composed together to form a recovery unit.
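The HAS object model described above can be sketched as plain Python data classes, which also shows how a check such as "all RUs should be UNLOCKED and ENABLED" walks the hierarchy. This is an illustrative model, not the HAS implementation: the class and function names are hypothetical, the default states are arbitrary choices for the sketch, and only the state names UNLOCKED, LOCKED, and ENABLED appear in this document; the remaining X.731 state spellings are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class AdminState(Enum):
    """ITU-T X.731 administrative states (string spellings partly assumed)."""
    LOCKED = "LOCKED"
    UNLOCKED = "UNLOCKED"
    SHUTTING_DOWN = "SHUTTING DOWN"


class OperState(Enum):
    """ITU-T X.731 operational states."""
    ENABLED = "ENABLED"
    DISABLED = "DISABLED"


@dataclass
class ManagedObject:
    name: str
    admin: AdminState = AdminState.LOCKED  # default chosen for the sketch
    oper: OperState = OperState.DISABLED


@dataclass
class Process(ManagedObject):
    pid: int = 0  # one-to-one mapping with an OS process


@dataclass
class RecoveryUnit(ManagedObject):
    processes: List[Process] = field(default_factory=list)


@dataclass
class Node(ManagedObject):
    recovery_units: List[RecoveryUnit] = field(default_factory=list)


@dataclass
class RecoveryGroup(ManagedObject):
    # A cluster-wide service, not tied to one node; its RUs run on nodes.
    recovery_units: List[RecoveryUnit] = field(default_factory=list)


@dataclass
class Cluster(ManagedObject):
    nodes: List[Node] = field(default_factory=list)
    recovery_groups: List[RecoveryGroup] = field(default_factory=list)


def rus_not_in_service(cluster: Cluster) -> List[str]:
    """Paths of RUs that are not both UNLOCKED and ENABLED."""
    return [f"/{node.name}/{ru.name}"
            for node in cluster.nodes
            for ru in node.recovery_units
            if ru.admin is not AdminState.UNLOCKED
            or ru.oper is not OperState.ENABLED]
```

The same RecoveryUnit object can be listed both under its node and under its recovery group, which mirrors how an RU is a part of a service that runs on a particular node.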

5.3 Concept mapping between recovery unit and functional unit

In mcRNC, each recovery unit containing RNC application and/or IPA Light software is mapped to a corresponding functional unit, and its ITU-T X.731 state is mapped to the functional unit state.
In mcRNC, each Simple Executive (SE) node is mapped to a corresponding functional unit. Conceptually, an SE node is similar to some non-computer unit types in IPA-RNC. It is not managed by HAS, but is supervised by a proxy recovery unit whose operational state represents the state of the proxied SE node.
There is no unit hierarchy structure in mcRNC.
The operator can manage and view the state of managed objects.
The operator cannot manage the state of functional units directly, but can view the state of functional units in the same format (state model) that is used in IPA-RNC.
