Administering Managed Objects in Multicontroller RNC
Multicontroller RNC
DN0975566
Issue 09
Approved on 2023-02-13
© 2023 Nokia. Nokia Confidential Information. Use subject to agreed restrictions on disclosure and use.
Nokia is committed to diversity and inclusion. We are continuously reviewing our customer
documentation and consulting with standards bodies to ensure that terminology is inclusive
and aligned with the industry. Our future customer documentation will be updated
accordingly.
This document includes Nokia proprietary and confidential information, which may not be
distributed or disclosed to any third parties without the prior written consent of Nokia. This
document is intended for use by Nokia’s customers (“You”/”Your”) in connection with a
product purchased or licensed from any company within Nokia Group of Companies. Use this
document as agreed. You agree to notify Nokia of any errors you may find in this document;
however, should you elect to use this document for any purpose(s) for which it is not
intended, You understand and warrant that any determinations You may make or actions
You may take will be based upon Your independent judgment and analysis of the content of
this document.
Nokia reserves the right to make changes to this document without notice. At all times, the
controlling version is the one available on Nokia’s site.
NO WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
ANY WARRANTY OF AVAILABILITY, ACCURACY, RELIABILITY, TITLE, NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, IS MADE IN RELATION TO THE
CONTENT OF THIS DOCUMENT. IN NO EVENT WILL NOKIA BE LIABLE FOR ANY DAMAGES,
INCLUDING BUT NOT LIMITED TO SPECIAL, DIRECT, INDIRECT, INCIDENTAL OR
CONSEQUENTIAL OR ANY LOSSES, SUCH AS BUT NOT LIMITED TO LOSS OF PROFIT,
REVENUE, BUSINESS INTERRUPTION, BUSINESS OPPORTUNITY OR DATA THAT MAY ARISE
FROM THE USE OF THIS DOCUMENT OR THE INFORMATION IN IT, EVEN IN THE CASE OF
ERRORS IN OR OMISSIONS FROM THIS DOCUMENT OR ITS CONTENT.
© 2023 Nokia.
A list of changes between document issues. You can navigate through the respective changed
topics.
The High Availability Services (HAS) supports various administrative and recovery actions to
enhance the availability of the system. The purpose of taking recovery actions is to recover from
failures occurring in the system resources controlled by the HAS.
You can use the SCLI command show has to check the states of the managed objects.
You can use the SCLI command set has to perform the following administrative actions:
The term managed object is one of the basic concepts of the HAS framework. Each resource that
the HAS manages is a managed object (MO). An MO can be a cluster, a node, a recovery group
(RG), a recovery unit (RU) or a process.
Figure 1 shows the HAS system model. It demonstrates that the resources managed by HAS are
hierarchically organized into MOs. Starting from the lowest level, there are processes, recovery
units (RUs), and recovery groups (RGs). In a cluster environment, the highest level is the cluster, which
contains nodes and recovery groups. It is important to remember that when a HAS command is
applied to an MO at a certain level, it automatically applies to all MOs that are located below this
MO in the hierarchy.
cluster
The cluster is the topmost managed object in the system model. The cluster consists of nodes
and recovery groups. In single node deployment, the operations performed on the cluster MO
affect only the single node. The managed object name of the cluster is the slash character (/).
node
In the context of high availability services, the term node refers to a certain hardware entity
(for example, CFPU, CSPU, EIPU, USPU) or specific resources of a hardware entity, the operating
system (including the network file system and basic messaging), and the HAS software. The MO
name of the node is the slash character followed by the node name, for example:
/USPU-0
recovery group (RG)
A recovery group is a group of identical recovery units and the redundancy policy they obey. In
other words, the recovery group consists of a number of recovery units controlling similar
resources. The MO name of the recovery group is the slash character followed by the recovery
group name, for example:
/Directory
recovery unit (RU)
A recovery unit is a collection of processes that constitute the target of a certain recovery
action, for example a switchover. The processes all support the same redundancy model. A
recovery unit is the central software entity controlled by the HAS. Since the recovery unit is
always running in a single node, the MO name takes the form /<node_name>/<RU_name>,
for example:
/CFPU-0/QNOMUServer-0
In addition to the various types of managed objects presented above, the HAS framework can
also be extended to cover proxied components - in other words hardware or software
components that are separately monitored and managed by proxy processes. A proxy process
can act as a proxy for one or more proxied components. A proxied component is mediated by one
operational proxy at a time. It is not the responsibility of the HAS to monitor the operation of
such proxied components directly.
Since a proxied component is not part of the HAS framework and therefore does not have an MO
name, the distinguished name (DN) of the component must be used to identify the object.
Distinguished names are employed for unambiguously identifying objects in the Configuration
Directory. An example of a distinguished name is:
The HAS framework has been designed to be scalable so that it can support clusters ranging from a
couple of nodes to tens of nodes. High scalability is achieved by limiting cluster-wide decisions
to the recovery unit level. Each node controls and supervises its own processes and can
autonomously execute recovery actions at the process level.
The HAS framework follows a standard state model for managing the resources (managed
objects). This state model is specified in Recommendation X.731 of the International
Telecommunication Union - Telecommunication Standardization Sector (ITU-T).
According to this model, the managed objects have three main state attributes:
administrative, operational and usage. The model also includes a set of status
attributes that are called alarm, procedural, availability, and unknown. These
additional status attributes provide further information about the main states.
As an extension to the standard state model, the platform provides additional status attributes,
such as role.
The three main state attributes and the four status attributes used by the HAS are explained
below. The combinations of different state attributes are also presented, as well as their meaning
from the viewpoint of a managed object (node).
In addition to the above mentioned attributes, HAS also supports dynamic attributes. A dynamic
attribute is a name and value pair, for example WAITING_SERVICE = /Directory. Both the
HAS and application processes can add additional state information to any MO by adding dynamic
attributes.
Figure: The standard state model for managing resources used in network element illustrates the
standard state model for the managed resources.
The managed objects have three main state attributes: administrative, operational and usage.
Administrative state
There are three possible values for the administrative state: UNLOCKED, LOCKED, and
SHUTDOWN. The operator can change the administrative state using the set has tool options
LOCK, UNLOCK, and SHUTDOWN.
Operational state
The value of the operational state attribute is either ENABLED or DISABLED. Unlike the
administrative state, the operational state is controlled by the HAS itself, so you cannot change
the operational state.
ENABLED
When the operational state is ENABLED, the entity represented by the managed object is
functioning properly and can perform its duties normally.
DISABLED
When the operational state is DISABLED, the entity is not functioning properly and cannot
perform its duties. In other words, it is regarded as faulty in some way.
Usage state
The usage state attribute describes the usage status of the entity represented by the managed
object. There are three possible values for the usage state attribute: IDLE, ACTIVE, and BUSY.
The usage state attribute is controlled by the HAS for all the managed objects except the
processes.
IDLE
When the usage state is IDLE, the entity is not currently processing any service requests.
ACTIVE
When the usage state is ACTIVE, the entity is processing service requests and there is still
some spare capacity for new service requests.
BUSY
When the usage state is BUSY, the entity has no more spare capacity until some of the active
service requests have terminated or more capacity is added.
There are certain dependencies between the main state attributes, resulting in eight attribute
value combinations of the administrative, operational, and usage attributes.
The permitted value combinations are listed below. When explaining the meaning of each
combination, using a specific example, it is assumed that the managed object in question is a
node if not otherwise mentioned.
LOCKED, DISABLED, IDLE: The state of the node is unknown. There are two possible
reasons for this:
1. The operator has locked a failed node. Automatic recovery actions have been unsuccessfully
attempted.
2. The power was turned off at the request of the operator.
LOCKED, ENABLED, IDLE: The node is still up and running but has been locked by the
operator.
SHUTDOWN, ENABLED, ACTIVE: The node is shutting down the services gracefully. The HA-
aware applications are terminating. The applications do not accept new service requests during
the graceful shutdown, regardless of the ACTIVE value of the usage state attribute.
SHUTDOWN, ENABLED, BUSY: This state is not possible so far as nodes are concerned. It is
possible only in the case of processes. The process is shutting down gracefully. The HA-aware
applications are terminating.
UNLOCKED, DISABLED, IDLE: The node is disabled. All repair attempts are executed in this
state. If an active node becomes faulty, it is moved to this state.
UNLOCKED, ENABLED, IDLE: The node is in a normal state. There are neither faults nor
administrative actions initiated by the operator. No transactions or sessions are ongoing.
UNLOCKED, ENABLED, ACTIVE: The node is in a normal state. There are neither faults nor
administrative actions initiated by the operator. At least one RU in addition to the HAS
recovery units is running.
UNLOCKED, ENABLED, BUSY: This state is not possible so far as nodes are concerned. It is
possible only in the case of processes. There are neither faults nor administrative actions
initiated by the operator. However, new service requests are not accepted because the usage
state value is BUSY.
The status attributes provide further information about the main states. The attribute values are
EMPTY if the managed object is running normally.
Alarm status
The possible values for the alarm status attribute are OUTSTANDING and MAJOR. Both values
can be set at the same time.
The OUTSTANDING value is set for a managed object that has an active alarm, with the following
exceptions:
The MAJOR value is set for a managed object that has a major active alarm.
Procedural status
There are three possible values for the procedural status attribute: INITIALIZING,
NOTINITIALIZED and TERMINATING.
INITIALIZING
When the procedural status is INITIALIZING, the process, node, or RU is currently starting.
NOTINITIALIZED
When the procedural status is NOTINITIALIZED, the process, node, or RU is not running.
TERMINATING
When the procedural status is TERMINATING, the process, RU, RG, or node (and in a cluster
environment also the whole cluster) is currently terminating.
Note:
The procedural state of the ENABLED (operational state) nodes is INITIALIZING when the
services in the node are still starting up.
Availability status
POWEROFF
When the availability status is POWEROFF, the node is powered off.
FAILED
When the availability status is FAILED, the process, RU, or node is faulty and waiting for a
repair. In a cluster environment, the FAILED value is also shown when the node is not physically
present in the cluster.
OFFLINE
When the availability status is OFFLINE, the node is not operational.
OFFDUTY
When the availability status is OFFDUTY, the node, process, RU, or RG (or the cluster in the case
of a cluster environment) is not running an active service. This usually means that the managed
object is LOCKED.
Unknown status
The value of the unknown status attribute can be TRUE only for a node that is LOCKED and not
operational (its operational state is DISABLED). It can also be TRUE for a short period of time
when the system is starting. In other cases, this value is FALSE.
The role attribute is used for specifying the role of an RU in an active/standby pair of an RG.
There are three possible values for the role attribute: ACTIVE, COLDSTANDBY, and
HOTSTANDBY.
ACTIVE
If the value of the role attribute is ACTIVE, the managed object is providing normal service.
HOTSTANDBY
If the value of the role attribute is HOTSTANDBY, the managed object is acting as a standby
resource for an active managed object in a hot active/standby pair and will be promoted to the
active role when the active object fails. Both the active and standby processes are running.
COLDSTANDBY
If the value of the role attribute is COLDSTANDBY, the managed object is acting as a backup
resource for an active managed object in a cold active/standby pair; the standby processes are
not running until a switchover occurs.
Dynamic attributes are name-value pairs that show the state of the process associated with the
recovery unit. They can be used for communicating the state to users or other processes running
in the cluster and for troubleshooting purposes.
HW_STATUS
MISSING: HAS sets and keeps this attribute for nodes that have never
(since commissioning) started up successfully.
INERT_MODE
ENABLED, TEST_MODE: Indicates that the MO has been set to inert mode (or its
sub-state test mode) by an operator or script. Inert mode is also known as
"recovery ban". For example, HAS does not react to failures if the MO is in
inert mode, and HAS does not issue a HW reset for a failing unit when it is
being upgraded.
LAST_FUNCTIONAL
The timestamp when the RU was last running. It is available when the
RESOURCE_STATE of the RU is NON-FUNCTIONAL.
POWERING_OFF_REASON
NONE, DEBUGGING, MAINTENANCE, POWER-SAVING, HARDWARE-FAILURE: Valid for
nodes. Indicates the reason the operator or script gives when the node is
powered off.
RESOURCE_STATE
FUNCTIONAL: The RU is functioning.
NON-FUNCTIONAL: The RU could not be started currently.
DEGRADED: Valid for a standby unit. Indicates that the standby is
temporarily out of sync with the active unit, and a switchover could not be
done without risk of data loss.
TRASHED: Only valid for a standby unit. Indicates that a local database is
missing or corrupted, and a switchover (if forced) would lose all database
data.
RESOURCE_LEVEL
<1..100>: A percentage number that indicates how healthy the resources are.
If resource levels between units differ, HAS attempts to keep the service
active on the units with a higher resource level.
CONFIG: All the configuration for the SCCP or SS7 stack is complete and
inter-process communication is in progress inside the distributed sigtran
processes.
UNCONFIG: When the ROLE is ACTIVE, this value means the SCCP stack is being
configured. When the ROLE is HOT_STANDBY, this value means the SCCP stack is
being configured or waiting for the occurrence of a switchover.
SWOUNCONFIG: The SCCP switchover is ongoing. It is only applicable to SCCP
processes.
DEAD: The SCCP or SS7 stack is not active and not functioning.
SWITCHOVER_PHASE
QUIESCING: The RU is releasing (stopping the use of) shared resources.
QUIESCED: The RU has released the shared resources and they can be allocated
on the standby side.
UNQUIESCING: The controlled switchover is canceled and the RU is resuming
the active role.
ACTIVATING: The RU is being activated.
BECOMING_HOTSTANDBY: A QUIESCED RU is now turning into a hot standby.
SHUTTING_DOWN: The RU is shutting down. It is valid for cold active/standby
RUs.
Redundancy is a method of providing the system with redundant equipment to improve its
tolerance against faults. This is achieved by providing backup resources for functional units. HAS
controls the resource MOs and reacts to their faults according to the redundancy model in question.
To understand the HAS system model, it is important to make the distinction between software
and hardware. Recovery actions are executed at the software level, not by switching between
hardware components.
In the deployment design, a key target is to ensure that there are enough redundant hardware
resources available to meet the system level availability requirements. The redundant
communication network is an example of such hardware-level redundancy. As far as the software
is concerned, redundancy for a service is achieved by deploying standby service instances RU to
the appropriate nodes. The number of redundant RUs and their deployment methods depend on
the redundancy model.
During switchover, the roles of the RUs are swapped. The processes running in the active RU are
terminated and the unit becomes the standby unit. The processes in the former standby RU are
started, making it the new active RU.
An N+M-like redundancy configuration can be created by defining N cold one plus M recovery
groups. To make the created N+M-like redundancy configuration more manageable, it is possible
to create a logical group (LG) for it. An LG for an N+M configuration is created from node
perspective, merging the related recovery units in each node to a single logical service.
From the HAS point of view, the load sharing redundancy model and the no redundancy model
are alike, although HAS offers some notification support in the case of load-sharing redundancy.
For each load-sharing group, there is a lower limit for the number of active RUs in that group
defined in the Configuration Directory attribute fshaThreshold. If the number of active RUs
drops below this limit, the HAS sets an alarm indicating this condition. This is the only load-sharing
specific support that the HAS provides. The HAS assumes that there is a load-balancing
mechanism elsewhere in the system that is able to assign the workload of a failed RU to the
remaining RUs.
No redundancy
Recovery groups of the no redundancy type provide node-local services for which active/standby
redundancy would make no sense. In the case of no redundancy, the HAS can attempt to restart
either individual processes or the whole recovery unit.
In the figure above, the lowest recovery group is of the hot active/standby type (in other words
both active and standby processes are running), while the two uppermost recovery groups are of
the cold active/standby type (the processes are not running in the standby RU).
The HAS also protects a node with concentrated default standby recovery units from becoming
overloaded. It does this by offering the possibility to define and detect the load quota caused by
the active recovery unit(s) that would switch over to the node in the case of failure. When the
quota in the node with the standby recovery units is too high, failed active recovery units do not
switch over to that node, but are instead restarted in the node where they failed.
Additionally, automated fallback is supported; this means returning an active recovery unit to the
node with the preferred active location, when this node becomes available again after a failure.
High availability services (HAS) supports the use of the active/standby redundancy model by
allowing the linking of various resources to recovery groups (RGs). These include storage
resources such as disk file systems, distributed replicated block devices (DRBD) and raw
partitions, as well as IP addresses.
Storage resources
A cluster that contains only non-shared, directly attached storage resources (for instance, DRBD
devices) is called a shared-nothing system. In a shared-nothing configuration, the same data
is replicated and maintained in synchronisation on two or more independent nodes. When such a
system starts up, it must decide which of the storage resources is most up-to-date.
IP addresses
The HAS allows the association of IP addresses to services. These addresses are movable
resources analogous to the movable storage resources. They always point to the active RU of an
active/standby RG. The IP addresses of active/standby RGs are either redundant or dedicated IP
addresses. Redundant IP addresses are cluster-internal addresses, whereas dedicated IP
addresses are visible outside the cluster and can be used by external applications for pointing to
resources in the cluster itself.
It is possible to declare dependencies between managed objects, such as the startup order
between recovery units, the startup order between processes within a recovery unit, the
switchover behaviour of recovery units within a set of interdependent recovery groups, the hot
switchover order of recovery groups, and the hot switchover order of processes within a recovery
unit. This allows for more optimized and faster startup and switchover operations.
Local dependency
Local dependency defines the startup order of the RU within the same node. In the case of local
dependency, the startup of the RU depends on the local service in the node. The startup of the
target RU is allowed once at least one RU belonging to the local service has started up on the
same node where the current RU is located. If the RU depends on local services in multiple
nodes, at least one RU in each of the different services has to be started up.
Global dependency
Global dependency defines the startup order of the RU in the entire cluster. RU startup depends
on a global service that can be located anywhere in the environment. RU startup is allowed as
soon as at least one RU belonging to the global service has started up anywhere in the
environment. If the RU depends on multiple global services, at least one RU in any of the
different services has to be started up.
Switchover is one of the services provided by HAS. It is one of the recovery actions for
recovering a faulty RU in the system.
Note:
To perform the controlled/forced switchover, the MO must support the hot or cold
active/standby redundancy model or the cold one plus M redundancy model.
Controlled switchover
The controlled switchover is performed in the active RU of an active/standby pair in an RG. The
RGs in hot or cold active/standby redundancy model or the cold one plus M redundancy model
provide the controlled switchover support to ensure that data is not lost during a switchover. For
a controlled switchover, the operation completes only when the data on the old standby RU is
known to be up to date with the old active RU. A controlled switchover can be done only when
both the active and standby RUs are operational. Therefore, the controlled switchover is primarily
performed when the cluster is functioning normally.
Forced switchover
By performing the forced switchover, you can force the RUs in an active/standby pair to exchange
their roles. The forced switchover is required when the controlled switchover fails or is not
supported by the system; for example, when the application continuously denies a controlled switchover
operation because of high load. The switchover can be executed when the target MO is in the
standby state expected by the redundancy model.
In general, fault management means employing a set of functions which can detect and correct
fault situations in the system.
As shown in Figure: Fault management process, the fault management process in a system has
the following five steps:
1. Detection
The first step is to detect the fault.
2. (On-line) Diagnosis
The second step is to determine the cause of the fault.
3. Isolation
The third step is to protect the rest of the system from the fault.
4. Recovery
The fourth step is to restore the system to the expected behavior (recovery), by performing a
switchover or restarting some part(s) of the system. At this stage, further actions to
determine the cause of the fault (off-line diagnosis) may be taken.
5. Repair
The final step is the repair of the system. This generally means that the faulty parts in the
system are replaced with working parts.
Notification of the fault occurs at many points in the fault management process. Components
within a system must interact with each other to enable fault management.
Fault detection
Fault detection (also called fault supervision or fault localization) is the process of identifying an occurrence of a fault in the system.
The HAS supervises the health of software processes in the network element both passively and
actively. Passive supervision means detecting unexpected process terminations. Active supervision
is an optional feature and means heartbeating a high availability aware (HA-aware) process. A
heartbeat is a polling mechanism that is used to verify that a software resource is healthy. This
non-intrusive health monitoring method uses only a small percentage of the computing
resources. In most environments, HAS feeds a hardware watchdog through operating system
interfaces, so a node is reset automatically if the operating system or HAS malfunctions.
In addition to the failures of the software processes, there may be hardware-related failures such
as:
node failures
communication network failures
hardware device failures
In a cluster, node failures require a communication-based detection mechanism to know the
state of each node. Node failure detection must be fast and must not depend on a failing node
reporting its own failure. However, self-diagnosis may be leveraged to speed up failure detection
in the cluster.
Communication network failures in a cluster require health monitoring with the ability to withstand
a failure between the cluster nodes. When a communication link fails, the system must distinguish
between a communication network failure and a node failure.
One example of a hardware fault is when the temperature starts to exceed a certain level.
Corrective measures can be, for example, to accelerate the fans or throttle the central processing
unit (CPU) to prevent it from being damaged.
On-line diagnosis
Once a fault is detected, the problem must be analyzed to determine the proper isolation and
recovery actions. The diagnosis process analyses one or more events and system parameters to
determine the nature and location of a fault. This step can be automatic or invoked separately by
the operator. The result of the diagnosis may be acted upon automatically or manually.
As an example, any intelligent network device, when started, runs some basic diagnosis to check
the health of the device. In standard personal computer (PC) hardware, as in server blades, the
Typically, only hardware components (for example, nodes, disks, network interfaces) are seen as
targets for diagnosis.
Fault isolation
The purpose of fault isolation is to keep a fault from spreading to other components of the
system. This is achieved, for example, by taking the defective component in the system out of
service. By isolating the fault in a running system, the system can be maintained, at least partially,
in the operational state.
Typical fault isolation operations are resetting or shutting down a faulty recovery unit. In node
failure situations, fault isolation is carried out by issuing a hardware reset to the node or by
powering off the node, depending on the nature of the hardware fault.
Isolating a faulty component does not necessarily imply that the fault is corrected. Fault recovery
actions are needed to bring the faulty part of the system back into operation. In some cases, it
may not be possible to separate the fault isolation and recovery processes. In a cluster, this could
occur when shutting down a faulty node in a load-sharing recovery group.
Fault recovery
The purpose of the fault recovery process is to restore the faulty part of the system to the
operational state, even though full capacity may not be achieved. In general, the possible recovery
actions are a switchover between the active and standby recovery units and restarting the faulty
process, recovery unit or the node (in a cluster environment).
In a clustered environment a switchover is performed from the failing active recovery unit to the
standby unit in an active/standby recovery group. Another recovery action could be to restart the
faulty process - perhaps several times - before performing a switchover. A more radical measure
is to reboot the faulty node (or the node running the faulty recovery unit).
The full switchover time depends on the time it takes to detect the application or node failure, to
apply the switchover policy, and to restart the application in the case of cold active/standby. The
aggregate switchover time must be short and must allow the cluster to maintain carrier grade
availability.
For replacing an add-in card or BCN module, see Replacing Multicontroller Hardware Units.
Error logs may be created under special circumstances during some management operations,
such as power on, power off, switchover, forced switchover, upgrade and so on. In these cases,
the error logs are a result of the unusual state of the cluster, and do not necessarily indicate an
actual error situation. Those error logs can be ignored.
The main interface for monitoring the health of the system is the alarm system and logs are only
supplementary information when investigating alarms or other operational problems like failing
SCLI commands.
In the HAS MO environment, the functional unit (FU) is an entity of software capable of
accomplishing a special purpose. A functional unit typically has a visible operating state. It belongs
to one of the Control, User, Transport, or Management planes. Most functional units are under the
control of fault management.
A functional unit is a special kind of MO. It can be either a recovery unit or a simple executive
node.
All the crucial parts of the network element have been backed up to ensure the reliability of the
system's operations.
Functional units with different redundancy models have different mapping rules for unit state
changes. The redundancy models of the functional units are as follows:
2N redundancy model
2N*M redundancy model
N+M redundancy model
SN+ redundancy model
No redundancy model
2N redundancy model
2N is a highly available redundancy model with two units in a redundancy group: one is the working
unit and the other is the hot standby unit.
No redundancy model
There are no redundant resources for the service, and recovery from a fault is accomplished by
restarting the faulty recovery unit.
The functional unit redundancy models map to the HAS redundancy models as follows:
2N: Hot active/standby
SN+: Load-sharing
No redundancy: No redundancy
Note:
If the FU is the simple executive node, then there is no redundancy for this FU.
Functional units have their own state model, which differs considerably from the state attributes
of the managed object (MO).
The state model of the functional units supported by the system is as follows:
Apart from the states listed above, other states are either incorrect or not supported, such as TE
(Test), BL-ID (Blocked, Idle), BL-RE (Blocked, Restarting), SP-UP (Spare, Updating), SE-NH
(Separated, No hardware), and so on.
Note:
You can only check the states of the functional unit. Changing the state of the functional
units is usually performed automatically by the system. That means when the state of MO
changes, the state of the functional unit may change correspondingly.
The show has functional-unit SCLI command can be used to check the unit
states. MO states can be changed by the set has command.
Even though the state model of the FU is different from the HAS state model, each FU can be
mapped to a corresponding HAS state.
Table 2 shows the mapping between the functional unit states and the combined HAS states. For
example, the combined HAS state UNLOCKED, ENABLED, ACTIVE (role: NA) maps to the functional
unit state WO-EX.
In the table, 'NA' means no value, '-' means a nonexistent state, and '*' means any existing state.
After obtaining the correct permissions for executing the SCLI commands, you can take the
recovery actions.
The user permissions control the ability to execute the various commands of the SCLI. You must
have the correct permissions for executing the SCLI commands.
Table: Show has functional unit commands and user permissions shows the show has
functional-unit command and the user permission required to execute this command.
Alarms are raised by high availability services (HAS) during management operations such as
powering on, powering off, switchover, or restarting.
Different alarms might be raised each time you perform the same operation. Many factors can
affect the alarms to be raised, such as the redundancy model of the recovery unit (RU), the state
of the managed object (MO) and the traffic load of the system.
For example, Alarm 70166 MANAGED OBJECT LOCKED is raised by HAS when you lock the 2N
redundancy unit /CFPU-0/QNCFCPServer-0, which is in the SP-EX state; both Alarm 70166
MANAGED OBJECT LOCKED and Alarm 70194 RECOVERY GROUP SWITCHOVER are raised by HAS
when you lock the 2N redundancy unit /CFPU-1/QNCFCPServer-1, which is in the WO-EX state,
because the switchover is triggered.
70159 MANAGED OBJECT FAILED
This alarm is valid for processes, recovery units, recovery groups, and nodes. It is raised when a
named MO fails and is automatically cleared when the MO is no longer down/faulty. The MO can
be a software, hardware, or logical entity.
70166 MANAGED OBJECT LOCKED
The administrative state of the named MO, which can be a cluster, a node, or a recovery unit
(RU), has changed to LOCKED as a result of a user action (graceful shutdown or lock operation).
70168 CLUSTER STARTED
The cluster is starting or restarting. The (re)start may have been initiated by an operator or be
caused by fatal errors in some critical hardware or software component. When the cluster is
restarted, the alarm system clears all alarms that were raised by the cluster's managed objects
before the restart.
70186 CLUSTER OPERATION INITIATED BY OPERATOR
This alarm indicates that an operator has initiated a cluster operation on the specified MO and
HAS is now executing the operation. The operation can be switchover, restart, or power-off.
70187 MANUAL NODE ISOLATION VERIFICATION NEEDED
This alarm is raised when HAS is unable to reset a faulty node with the Intelligent Platform
Management Interface (IPMI). The operational state of the node is not known, and therefore it is
not known whether the node still holds and/or updates the shared resources.
The alarm is not valid in single-node configurations.
70188 MANAGED OBJECT SHUTDOWN BY OPERATOR
This alarm indicates that the specified MO is being shut down. The named MO and all its
unlocked sub-resources are now terminating.
70189 MANAGED OBJECT UNLOCKED BY OPERATOR
This alarm indicates that the specified MO has been unlocked. The named MO and its unlocked
sub-resources (if there are any) can now be activated.
70194 RECOVERY GROUP SWITCHOVER
This alarm is raised when HAS initiates a switchover.
70249 CRITICAL CLUSTER SERVICES WITHOUT STANDBY
This alarm is raised when the standby Cluster Administrator (CLA) node is currently not
operational.
70251 UNRECOMMENDED CONFIGURATION FORCED BY OPERATOR
This alarm is raised when an operator locks the current standby FSDirectoryServer recovery unit.
70265 RECOVERY ACTIONS BANNED FOR MANAGED OBJECT
This alarm is raised when an operator sets a managed object to inert (recovery ban) mode.
70350 DETECTED CLUSTER INTERNAL MESSAGING WITH UNKNOWN ORIGIN
This alarm is raised when one of the cluster management functionality nodes (CMFN) has
received cluster management messages with an unknown origin.
70359 HARD DISK DRIVE FAILED
The alarm is raised when a disk failure is detected.
70365 DRBD DEVICES FORCIBLY STARTED UP
The alarm is raised when distributed replicated block device (DRBD) devices are forced up
without waiting for DRBD re-synchronization.
Use the show has functional-unit SCLI command to check the following information:
functional unit name, logical and physical addresses of the unit, functional unit state, redundancy
model of the unit, functional unit index, and functional unit type.
Notice:
You can only check the states of the functional unit. Changing the state of the functional
units is usually performed automatically by the system.
Procedure
1 Check the relevant information of all functional units
Step example
Example: Following is the execution printout of the show has functional-unit
unit-info SCLI command.
...
...
...
...
/EIPU-0/QNEITPProxyServer-0
...
/CSPU-0/QNCSUPProxyServer-0
...
/USPU-0/QNUSUPProxyServer-0
...
/CSPU-0/QNCSCPServer-0-0
...
...
...
/EIPU-0/QNIUBServer-0-0
...
/CFPU-0/QNSCLIUServer-0
/CFPU-1/QNSCLIUServer-1
/CFPU-0/IL_FUMServer-0
/CFPU-1/IL_FUMServer-1
/CFPU-0/IL_FUAServer-0
...
SE-OU*: unit is available for taking over the active role in case of
switchover
Step example
Example: Following is the execution printout of the show has functional-unit
comp-addr-info SCLI command.
Step example
Example: Following is the execution printout of the show has functional-unit
group-addr-info SCLI command.
--------------- --------------------------------------------------------
Step example
Example: Following is the execution printout of the show has functional-unit
unit-type-info SCLI command.
Step example
Example: Following is the execution printout of the show has functional-unit
unit-info unit-type OMU unit-index 0 SCLI command.
Step example
Example: Following is the execution printout of the show has functional-unit
comp-addr-info unit-type OMU SCLI command.
show-mode is an optional parameter. You can use show-mode to choose different display
modes:
simple: Shows the basic unit information: Unit Name, Logical Address, Physical
Address, State, and Redundancy.
normal: Shows the unit information in normal mode; additionally shows the Managed
Object of the Recovery Unit.
verbose: Shows the unit information in verbose mode; additionally shows the Managed
Object of the Recovery Group.
To check the information of the unit in simple, normal, or verbose mode, execute the following
command:
To check the information of a specific unit displayed in different types, execute the following
command:
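As an illustrative sketch, combining the unit-info command from the examples above with
the show-mode parameter described in this section (the option placement is an assumption):
show has functional-unit unit-info show-mode verbose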
Use the SCLI commands show has and set has to check and change the states of the MOs.
Purpose
You can use the show has commands to check the following attributes:
You can use the set has commands to execute the following operations:
lock
unlock
shutdown
When using the set has commands for locking, unlocking, or shutting down an MO, you can
change only the administrative state of an MO.
Locking an MO locks all the normal application RUs of the MO, but leaves the RUs that provide
mandatory services (for example, HAS) unlocked. Locking is an ungraceful (forced) operation that
quickly terminates all the processes on the MO. Therefore, you must perform the locking
command with care. The locking command can be used in cases such as hardware maintenance,
hardware replacement, and signaling configuration.
A graceful shutdown ensures that the connections are closed properly and the buffers flushed to
the disk without any data losses. The shutdown command can be used in cases such as hardware
maintenance or hardware replacement.
Lock and shutdown operations take the MO out of use, which may have the following impacts on
the whole system:
Note also that the system does not automatically unlock a locked MO because lock and
shutdown are administrative operations conducted by the operating personnel. Special
care shall be taken when locking a complete RG. If the RG is locked when the system is
restarted, it may prevent the system from starting up.
Before executing the lock and shutdown operations, check the redundancy mode and state
of the recovery units that you want to lock or shut down. If you want to lock or shut down
the recovery units in hot standby and the N+M redundancy mode, make sure the states of
the other units within the same Recovery Group are UNLOCKED (administrative state) and
ENABLED (operational state). To check if the RU is in UNLOCKED and ENABLED states,
enter the following command: show has state managed-object /<mo-name>.
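For example, to check an RU before locking or shutting it down (the RU name is
illustrative, taken from an earlier example in this document):
show has state managed-object /CFPU-0/QNOMUServer-0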
Procedure
1 Check the state of an MO.
Note:
In the show has command, the MO can be either a cluster, a node, an RG, an RU, or
a process.
To check all the states and status attributes of an MO, enter either of the following
In the command syntaxes above, <mo-name> is the name of the Managed Object and
<mo-fullname> is the Managed Object in distinguished name (DN) format.
To check only some of the state attributes of an MO, execute the following command by
adding the state and status attributes as option:
Expected outcome
Use the command set has to lock, unlock and shut down an MO.
Note:
In the set has command, the MO can be either the whole cluster, a node, an RG or
an RU, but not a process.
Notice:
It is not recommended to unlock all the MOs.
After entering the command set has unlock all, the unlocking of the
Recovery Group /QNEM can overload the hard disk.
If you want to execute the command set has unlock all, enter the following
command after unlocking all the MOs:
set has lock managed-object /QNEM
Notice:
To shut down the USPU-related MOs in SN+ redundancy mode (USPU node,
QNUSCPServer and QNUSUPProxyServer), you must give the timeout value. The
recommended timeout value is 180 seconds.
The set has shutdown command waits until the MO shuts itself down successfully. If
the shutdown operation has not finished before the expiration of the timeout, then HAS
forces the MO to change into the LOCKED state.
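As a sketch, the graceful shutdown of a USPU node with the recommended 180-second
timeout could look as follows (the command shape follows the other set has commands in
this document and is an assumption; the timeout option is described in the switchover
section):
set has shutdown managed-object /USPU-0 timeout 180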
The HAS SCLI command show has dependencies allows checking the dependencies
between different RGs.
Purpose
The dependencies include:
startup
parasite
stalker
symbiot
local dependency
global dependency
To view the dependencies between the RGs, enter the following command:
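A minimal invocation, assuming no further options are required:
show has dependencies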
Result
Expected outcome
/QNIUB-0:
/QNCSCP-0:
/QNUSCP-0:
/QNUP-0:
/QNIU-0:
/QNIUB-1:
/QNCSCP-1:
/QNUSCP-1:
/QNUP-1:
/QNIU-1:
/QNIUB-2:
/QNUP-2:
/QNIU-2:
/QNIUB-3:
/QNUP-3:
/QNIU-3:
/BFD:
/QNOMU:
/QNHTTPD:
/CDAfs:
parasites: /PM9Fuse
/PM9Fuse:
/Log:
/Tracing:
/SSH:
parasites: /QNSNMP
/QNFUA:
parasites: /QNFUM
/QNFUM:
/EswMan:
/QNCFCP:
/QNSNMP:
/SWMServer:
/SGWNetMgr:
/HPIMonitor:
/AlarmSystemLight:
/IPSecRedundant:
/QNSCLIU:
/QNEITPProxy:
/QNCSUPProxy:
/QNUSUPProxy:
The HAS SCLI command show has summary managed-object shows the number of RUs
and the processes of the MOs.
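For example, to check the summary of the RG /Directory mentioned below:
show has summary managed-object /Directory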
If checking the summary of the RG /Directory succeeds, the following output is displayed:
RUs in configuration : 2
Unlocked RUs : 2
Process status
Processes in configuration : 6
Unlocked processes : 6
Note:
To automatically list the available MO names:
Controlled switchover is one of the recovery actions provided by HAS. It makes the previous
standby RU become the new active RU.
Purpose
A controlled switchover can fail if the application is busy or the controlled switchover
timeout is too short. A controlled switchover request is automatically turned into a forced switchover
request if the RG does not support a controlled switchover or the controlled switchover fails.
The controlled switchover timeout indicates the maximum waiting time for the synchronization
between the active and standby units. The configuration defines a reasonable default for the
controlled switchover timeout for each RG. The operator may overwrite the default timeout value
when the controlled switchover request is issued.
Note:
It is not recommended to give the timeout value for the controlled switchover, because it may
cause the operation to fail.
You can also specify the new active RU for the controlled switchover between the active/standby
RUs in an RG. To specify the new active RU, enter the following command with the short or long
version of the options:
Before performing the controlled switchover, make sure that the RG containing the target RU uses
the active/standby redundancy model, and that the combined states of both the active and
standby RUs are UNLOCKED, ENABLED.
To execute a controlled switchover between the active/standby RUs in an RG, enter the
following SCLI command with either the short or long version of the option(s):
Replace <mo-name> with the name of the RG on which you want to perform the switchover.
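As an illustrative sketch only (the switchover keyword is an assumption based on the
pattern of the other set has commands in this document; the RG name is taken from the
dependencies example):
set has switchover managed-object /QNIUB-0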
The different generic options for the has command are listed in the following table.
filter MOType[,MOType...]
The MOs are filtered by their type (filter is an
input filter).
The different types of MO are:
• RG
• RU
logerrors
It enables logging of error messages in
syslog.
noerror
The switchover is continued even if the
command fails with a particular MO name(s).
regex
Regular expressions can be used in the MO
names. The regular expression must be
inside quotation marks (" ").
Option Description
force
This option is used for critical MOs. The
warnings are not printed on the screen.
Note:
noblock
The command does not wait for HAS to
complete the operation.
timeout <timeout-value>
It specifies the timeout duration for the
command. The duration can be in seconds
(s), minutes (m), hours (h) or days (d). The
units are not case sensitive. By default, the
timeout value is in seconds.
The allowed value ranges from 1 second (s)
to 24836.49 days (d). No warning is displayed
with this parameter.
Expected outcome
Unexpected outcome
The possible reasons for the failure of the switchover operation are:
incorrect parameters
dependencies between MOs
invalid state of the system
If the switchover command fails, the printout displayed depends on the cause of the error.
An example is given as follows:
Note:
To automatically list the available MO names:
The forced switchover is one of the recovery actions. Forced switchover forces the RUs in an
active/standby pair to exchange their roles. The switchover can be executed when the target MO
is in a state expected by the redundancy model.
Purpose
As part of performing a forced switchover, it is possible to specify the name of the new active RU.
If the newly specified RU is already active, then no action is taken. If it is in standby, then a
switchover occurs.
Notice:
It is not recommended to perform the forced switchover, because it can cause the dropping of
ongoing calls.
You can also specify the new active RU for the forced switchover between the active and standby RUs
in an RG. To specify the new active RU, enter the following command with the short or long version of
the options:
To execute a forced switchover between the active/standby RUs in an RG, enter the following
SCLI command with either the short or long version of the option(s):
Replace <mo-name> with the name of the RG on which you want to perform the switchover.
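As an illustrative sketch only (the switchover keyword and option order are assumptions
based on the pattern of the other set has commands; the force option is described in the
tables below):
set has switchover force managed-object /QNUP-0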
The different generic options for the has command are listed in the following table:
filter MOType[,MOType...]
The MOs are filtered by their type (filter is an
input filter).
The different types of MOs are:
• RG
• RU
logerrors
It enables logging of error messages in
syslog.
noerror
The switchover is continued even if the
command fails with a particular MO name(s).
regex
Regular expressions can be used in the MO
names. The regular expression must be
inside quotation marks (" ").
The different options for forced switchover are listed in the following table:
Option Description
force
This option is used for critical MOs.
Note:
noblock
The command does not wait for HAS to
complete the operation.
Expected outcome
Unexpected outcome
Possible reasons for the failure of the switchover operation are as follows:
incorrect parameters
dependencies between MOs
invalid state of the system
If the switchover command fails, then the printout is one of the following depending on the
cause of the error:
policy.
unavailable.
You can power off a node to ensure that a suspected HW-faulty node does not cause any
disturbance to the rest of the system, and also for power saving if capacity exceeds the actual
need.
Ensure that you have sufficient user permissions to execute the SCLI commands needed in this
procedure.
Procedure
1 Gracefully shut down the node to be powered off.
The timeout parameter defines the time that is given for the ongoing services to shut
down gracefully, after which the services that are still running are terminated by force. As a
result of this command, the administrative state of the node is set to LOCKED.
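As a sketch of the whole step, using node USPU-0 from the power-on procedure later in
this document (the shutdown command shape and the 5-minute timeout value are
assumptions):
set has shutdown managed-object /USPU-0 timeout 5m
set has power off managed-object /USPU-0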
The reason for powering off can be specified by using the following SCLI command:
You can optionally specify one of the following reasons for powering it off:
The powering off reason is shown as a dynamic attribute of the node that is powered off and
its current value can be viewed by running the following SCLI command:
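Since the powering-off reason is a dynamic attribute of the node, it should appear among
the state and status attributes of the node; for example (assuming dynamic attributes are
included in the show has state printout):
show has state managed-object /USPU-0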
Expected outcome
Unexpected outcome
Possible reasons for the failure of the powering off operation are, for example, incorrect
parameters or an invalid state of the system.
This chapter describes the procedure to gracefully power off a BCN module.
Purpose
To gracefully power off the nodes, LMP, and motherboard contained in a BCN module.
After acquiring the configuration lock, if the power off command is executed in parallel on more
than one BCN module, the following error is displayed:
The command was not executed. You or another user is currently holding the
To release the configuration lock after BCN module power off, enter the following command:
Note:
For powering on the BCN module, you must have physical access to the BCN module.
The parameters for this command are described in the following table:
Table 11: Parameters for the set hardware power off command
Parameter Description
force
Specifies the forceful power off of the LMP
(motherboard). This option is used in
scenarios where LMP power off results in
service downtime, such as powering off a
LMP hosting a management node.
Note:
When this command is executed, the following alarm is raised:
Step example
To power off the LMP-1-3-1 and the nodes contained in the same BCN box, enter the
following command:
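Based on the command name given in Table 11, the command is presumably of the
following shape (the way the LMP is identified is an assumption):
set hardware power off LMP-1-3-1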
Please be patient.
Step example
If the LMP to be powered off hosts a management node (CFPU) and the power off command
is given without the force option, the following output is displayed:
Command failed as LMP to be powered off hosts a management node. Please retry the
Step example
To power off the LMP-1-1-1 with force option, enter the following command:
WARNING: Forced LMP power off could result in ungraceful shutdown and power off
Your consent is needed to proceed with forced LMP power off [y/n]
Step example
If graceful shutdown of a node hosted by the LMP that is being powered off fails, the following
output is displayed:
Please be patient.
Step example
If powering off a node hosted by the LMP that is being powered off fails, the following output
is displayed:
Please be patient.
Step example
If the command is executed from a non-active SSH node, the following output is displayed:
Command executed from a node on which SSH service is inactive. Please retry
Step example
If LMP power off fails due to internal error, the following output is displayed:
Failed to power off LMP-1-3-1. Try the command again. If the problem persists,
check the system log file for the error details and contact your local customer
support.
Step example
If an attempt is made to power off the LMPs in parallel without first acquiring the configuration
lock, the following output is displayed:
This command cannot be executed, as LMP power off is already ongoing. Please try
This chapter describes the procedure to power off the whole cluster (mcRNC network element),
for example for moving the cluster physically from one place to another.
Purpose
When the BCN cluster is shut down, the SSH connection is lost and the cluster is accessible only
through the console ports. Therefore, for conducting the procedure for powering off the cluster,
connect to LMP-1-1-1 or LMP-1-2-1 through the management interface and enter the command
telnet 0 3001 to connect to the CFPU node. After that, start the SCLI session by entering the
command fsclish.
Procedure
1 Shut down the cluster.
To shut down the cluster gracefully, enter the following SCLI command:
The timeout parameter defines the time that is given for the ongoing services to shut
down gracefully, after which the services that are still running are terminated by force. As a
result of this command, the administrative state of all nodes is set to LOCKED.
Note:
To ensure termination of all services in a reasonable time, you must give a
timeout value. The recommended timeout value is 5 minutes.
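For illustration, using the cluster MO name / and the recommended 5-minute timeout
(the command shape follows the set has shutdown command described earlier and is an
assumption):
set has shutdown managed-object / timeout 5m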
To power off the add-in cards (nodes) of the cluster, enter the following SCLI command:
/ shutdown successfully
This command powers off all other nodes except the CFPU node that runs the cluster management functionality (CMF).
Expected outcome
The CFPU node (add-in card) running the CMF functionality remains powered on in this phase
and the LED indicating its status is green.
Note:
If you are connected to the CFPU that has the standby instance of the CMF
(FSClusterHAServer RU), the connection is dropped after entering the command set has
power off managed-object /. In this case, the output / is powered OFF
successfully showing the command execution status is missed. To continue the
operation and check the state of all nodes, connect through the other LMP to the CFPU
that has the active instance of the FSClusterHAServer RU.
To power off the motherboard of each BCN box, switch off the power of each BCN box by
turning off the power switches in the separate PDU unit located in the first slot of the
rack.
To power on and unlock a node that has been locked and powered-off earlier.
Powering on a node by the HAS command can be executed only in a clustered environment.
Ensure the user is logged into the system with sufficient user permissions.
Procedure
1 Power on a node.
Step example
To power on the node USPU-0, enter the following SCLI command:
set has power on managed-object /USPU-0
Step result
The following output is displayed:
Step example
To unlock the node USPU-0, enter the following SCLI command:
set has unlock managed-object /USPU-0
This procedure is applicable when the BCN module has been powered off as described in the
section, Powering off a BCN module.
Procedure
1 Disconnect and reconnect the power cable of the BCN module that is powered off.
Procedure
1 Power on the BCN boxes one by one in numerical order starting with box-1.
Power on the BCN boxes one by one at about 20-second intervals. Do not power on all
BCN boxes at the same time, in order to prevent overloading the CFPU node.
Before proceeding, make sure that the nodes with basic functions have booted up to the
operational state. Make a serial connection to the CFPU as explained in Before you start in
the chapter Powering off a cluster.
Enter the following CLI command periodically to find out when these nodes are all
booted up:
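The expected outcome below resembles the show has summary printout, so a
cluster-wide summary command of the following shape is presumably meant (an
assumption):
show has summary managed-object /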
Note:
After a certain node has booted up, it is no longer shown in the list of Non-operational
nodes in the command output.
Expected outcome
Node status
Nodes in configuration : 16
Unlocked nodes : 0
Non-operational nodes : 2
/EIPU-1
/EIPU-3
RG status
RGs in configuration : 68
Unlocked RGs : 0
RU status
Unlocked RUs : 0
Process status
Unlocked processes : 0
Ensure that all RUs are in the intended state. Enter the following command to check the state
Restarting an MO
This chapter describes the procedure to restart an MO for recovering the software process, the
cluster, the RU, the node, or the system from faults.
Purpose
The restart recovery action is performed either automatically by the HAS or as a manual
administrative step by the operator. In general, the restart action first performs an ungraceful
(forced) shutdown and then starts the MO.
Optionally, a graceful restart of the nodes is also supported. An optional time interval can be
specified within which the recovery units need to shut down their operations or switch over to
another node before the node is restarted.
When a node or RU is restarting, all the processes running on the node or RU are immediately
terminated before the node or RU is restarted.
Note:
If a node is restarted, it is possible that a forced switchover takes place, which may cause
ongoing calls to be dropped.
In addition to the restarting methods introduced above, you can also restart the cluster in a
phased manner. In the phased cluster restarting mode, the peer management node is activated
before the other nodes.
▪ Restart an MO ungracefully.
Step example
Example: Restarting the MO /EIPU-1/QNUPServer-1-3
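As an illustrative sketch only (the restart keyword is an assumption based on the
operations listed for the set has command):
set has restart managed-object /EIPU-1/QNUPServer-1-3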
Note:
Note that restarting the RG /Directory in the cluster environment can interrupt
the secure shell (SSH) connection.
By default, Alarm 70186 is raised when you restart an MO and cleared automatically when its
time to live expires. You can restart an MO without raising an alarm by using the following
SCLI command:
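The option for suppressing the alarm is not preserved in this extract; the following form is hypothetical, with noalarm standing in for the actual option name:
set has restart noalarm managed-object /<mo-name>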
Expected outcome
If the restart succeeds, the following outputs are displayed depending on the MO:
<mo-name> reset successful
1 (of 1) running RUs were terminated and are waiting for restart.
Unexpected outcome
Possible reasons for the failure of the restart operation are, for example, incorrect
parameters or an invalid state of the system.
If the restart command fails, the output is one of the following, depending on the nature of
the error:
unavailable.
▪ Restart an MO gracefully.
The timeout parameter specifies the time interval within which the recovery units (RUs) must
shut down their operations or switch over to another node before the node is restarted.
Note:
Only nodes support a graceful restart.
Step example
To restart a node USPU-0 gracefully with a timeout of 100 seconds, enter the following
command:
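The exact command is not shown here; assuming the graceful and timeout keywords by analogy with the set has shutdown syntax given later in this document, a plausible form is:
set has restart graceful timeout 100 managed-object /USPU-0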
Unexpected outcome
If the timeout parameter is used without the graceful restart command, the following output
is displayed:
If a wrong value is specified for the timeout parameter, the following output is displayed for
this example:
This procedure describes how to gracefully restart the nodes hosted by a BCN module and
restart the BCN module, or how to restart all the BCN modules in the cluster.
Purpose
▪ To gracefully restart the nodes hosted by the BCN module and restart the module.
▪ To restart all BCN modules in the cluster. Nodes in the cluster are restarted ungracefully as
part of restarting all the modules.
After acquiring the configuration lock, if you try to execute the restart command in parallel
for more than one BCN module, the following error is displayed:
The command was not executed. You or another user is currently holding the
configuration lock.
To release the configuration lock after the restart of the BCN module, enter the following command:
set config-mode off
To restart the BCN module and the nodes contained in the same BCN box, enter the
following command:
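The exact command is not preserved in this extract; a plausible sketch, in which the lmp keyword is an assumption while the force and all options are referenced elsewhere in this procedure, is:
set has restart lmp <lmp-name> [force]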
Note:
To restart all the BCN modules, the command must include the force option.
The parameters for this command are described in the following table:
Parameter Description
Step example
To restart the LMP-1-3-1 and the nodes contained in the same BCN box, enter the following
command:
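(Command form as assumed above; the lmp keyword is not confirmed in this extract.)
set has restart lmp LMP-1-3-1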
Please be patient
Note:
In the following examples, LMP-1-1-1 is used as the reference LMP.
Step example
To restart the LMP-1-1-1 with force option, enter the following command:
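(Command form as assumed above.)
set has restart lmp LMP-1-1-1 force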
WARNING: Forced LMP restart would result in ungraceful restart of Nodes hosted by
LMP.
Step example
If the graceful restart of a node hosted by the LMP being restarted fails, the following
output is displayed:
Please be patient
Step example
If the command is executed from a non-active SSH node, the following output is displayed:
Command executed from a node on which SSH service is inactive. Please retry
Step example
If the LMP restart fails due to an internal error, the following output is displayed:
check the system log file for the error details and contact your local customer
support.
Step example
If you try to restart the LMPs in parallel without first acquiring the configuration lock,
the following output is displayed:
Step example
To restart all LMPs in the cluster, enter the following command:
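(Command form as assumed above; the all option must be combined with force, as noted earlier.)
set has restart lmp all force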
Please be patient.
WARNING: Restarting all LMPs will result in cluster restart with ungraceful
Your consent is needed to proceed with forced restart of all the motherboards in
LMP-1-8-1 is restarted.
LMP-1-7-1 is restarted.
LMP-1-6-1 is restarted.
LMP-1-5-1 is restarted.
LMP-1-4-1 is restarted.
LMP-1-3-1 is restarted.
LMP-1-2-1 is restarted.
LMP-1-1-1 is restarted.
Step example
If the command is executed from a non-active SSH node, the following output is displayed:
Command executed from a node on which SSH service is inactive. Please retry
Step example
If an LMP is included in the deployment but the actual hardware is missing, the following
output is displayed:
LMP-1-2-1 is restarted.
LMP-1-1-1 is restarted.
Step example
If the command is executed with the all option and without the force option, the following
output is displayed:
Command failed to restart all motherboards in the cluster. Please retry the
Step example
If the command execution fails due to failure in the hardware management framework API,
the following output is displayed:
Failed to restart all the motherboards in the cluster. Try the command again. If
the problem persists, check the system log file for the error details and contact
Step example
If you interrupt the LMP restart or the all-LMP restart operation, the following warning
message is displayed:
When the shutdown of the USPU-related MOs (USPU node, QNUSCPServer, and
QNUSUPProxyServer) fails, you can follow the steps in this procedure to set a timeout value
for the ongoing shutdown operation.
Solution
To shut down the USPU-related MOs that are of the SN+ redundancy type, you should specify
a timeout value in the set has shutdown [timeout <timeout-value>] managed-object
/<mo-name> command. Otherwise, the MO does not shut down until all the data traffic is
processed.
Procedure
1 Interrupt the ongoing shutdown operation.
To interrupt the ongoing shutdown operation, press Ctrl and C at the same time. The
ongoing shutdown operation is not terminated; the system returns to the command prompt.
2 Set a timeout value for the ongoing shutdown operation.
To set a timeout value for the ongoing shutdown operation, enter the following SCLI
command:
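For example, following the set has shutdown syntax shown above and the recommended 180-second timeout (the MO name below is a placeholder):
set has shutdown timeout 180 managed-object /<mo-name>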
The recommended timeout value is 180 seconds. If the shutdown operation has not finished
when the timer expires, HAS forces the MO to the LOCKED state.
The recovery system in IPA-RNC is responsible for fault management. It uses a set of
recovery actions to keep the NE as highly available as possible in various fault scenarios.
Network element
The whole RNC network element.
System
It contains several functional units. IPA-RNC may contain two systems (original and trial) during
software upgrade.
Functional unit
An entity of hardware and software, or only hardware, capable of accomplishing a special purpose.
Most, but not all, functional units are under the control of fault management.
A functional unit can be a computer unit or a non-computer unit.
Functional units can be nested (a functional unit may contain other functional units); that is,
functional units form a hierarchy.
Process
HAS (High Availability Services) supports various administrative and recovery actions to enhance
the availability of the system. HAS provides functionality similar to that of the recovery system
in IPA-RNC, but many of the concepts are different.
The following is the object model managed by the HAS system. Each node in the chart is a
managed object (a logical entity representing one or more resources that can be the target of
management operations such as start, stop, restart, and shutdown). HAS uses a standard
(ITU-T X.731) state model to manage these managed objects.
Cluster
The whole RNC network element.
Node
A composite managed object used to represent a physical node in a cluster. It is composed of
recovery units.
Recovery unit
A managed object used to represent a part of a service (a recovery group) that runs on a
particular node. It is composed of processes.
Recovery group
A managed object used to represent a service provided by the cluster; it is not associated
with any particular node in the cluster. It is composed of recovery units.
Process
In mcRNC, each recovery unit containing RNC application and/or IPA Light software is mapped
to a corresponding functional unit, and its ITU-T X.731 state is mapped to the functional unit state.
In mcRNC, each Simple Executive (SE) node is mapped to a corresponding functional unit.
Conceptually, an SE node is similar to some non-computer unit types in IPA-RNC. It is not managed
by HAS, but is supervised by a proxy recovery unit, whose operational state represents the
state of the proxied SE node.
There is no unit hierarchy structure in mcRNC.
The operator can manage and view the state of managed objects.
The operator cannot manage the state of functional units directly, but can view the state of
functional units in the same format (state model) that is used in IPA-RNC.