Training Document IManager ATAE Cluster V200R002C50 Scheme-20160418-A V1.0
V200R002C50 Scheme
www.huawei.com
GE Gigabit Ethernet
FC Fibre Channel
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 1
References
Objectives
Contents
Low OPEX — Greatly Reducing Power
Consumption, Equipment Room, and Labor Cost
For a network with 50,000 GSM TRXs, the equipment room footprint decreases by 75%: the traditional deployment requires four cabinets (Sun M5000 (8P) and Sun M4000 (2P) servers for the M2000, PRS, Nastar, and TS), whereas the ATAE server requires only one cabinet.
Highly Efficient Redundancy System for Fast Service
Recovery and Smooth Capacity Expansion
Redundancy protection for key hardware:
•DC cabinet: The cabinet uses dual inputs. The power supply units of the ATAE subrack use 2+2 redundancy and support –48 V power supply.
•AC cabinet: The cabinet uses dual inputs. The power supply units of the ATAE subrack use 1+1 redundancy and support two –48 V power supply inputs.
•The ATAE SMM supports 1+1 active/standby redundancy.
•The ATAE switching unit supports 1+1 redundancy.
•The ATAE subrack's fans support 1+1 redundancy.
•The service bus of the ATAE subrack adopts dual-star redundancy.
•The service plane and management plane are isolated from each other.
•The IPMB, Base plane, and Fabric plane use 1+1 redundancy.
•The storage system uses RAID 1+0 and RAID 5 for data protection.
SAN enables fast recovery: boards are started by SAN boot, so recovery is fast when faults occur. Plug & Play (PnP) for boards enables smooth capacity expansion. Data reliability: service disk array data is stored on high-density disk arrays (24 x 600 GB/900 GB disks).
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 6
High Reliability, N:1 Node Redundancy
This ATAE scheme is similar to a solution in which an HA system is embedded into each OSS product.
[Redundancy diagram: PRS N:1 (PRS boards with PRS-Standby), Nastar N:1 (Nastar boards with Nastar-Standby), M2000 N:1 (M2000 and M2000-DB boards with M2000-Standby), DB N:1 (PRSDB and other DB boards with DB-Standby).]
ATAE features carrier-class reliability. It provides redundancy protection for key function modules. When a function module malfunctions, the system automatically switches services to the standby module. The reliability of the ATAE functioning as a carrier-class server is 99.999%, whereas the reliability of SUN and HP servers is at the IT-class level, 99.99%.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 7
Contents
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 8
ATCA
Advanced Telecom Computing Architecture (ATCA) is a standard derived from the CompactPCI (CPCI) standard to meet new requirements in the telecom field.
Vendors such as Intel, Force, Elma, Radisys, Schroff, Rittal, Bustronic, and Pigeon Point are engaged in the research and development of basic ATCA parts and have definite roadmaps.
Huawei uses ATCA to develop products and packages them into the ATAE platform.
Basic features of ATCA:
Dual –48 V DC redundant power supplies
High-speed differential signal connector
8 U x 280 mm board
1.2-inch slot spacing, which accommodates tall heat sinks and eases the design of air vents for heat dissipation
High-speed subboard for hot swapping
Standard IPMI management bus, which manages all parts in the system
Open software architecture and CGL OS
Compliance with the NEBS and ETSI standards
ATAE Cluster Cabinet Deployment Scheme
DC-based scheme (recommended) AC-based scheme
ATAE Cluster Cabinet Deployment
Scheme (Continued)
Compared with the traditional deployment scheme, application systems such as the M2000 and PRS are deployed separately from the database system; that is, data processing and data storage are separated.
ATAE Cluster Cabinet Deployment Scheme
[Subrack layouts (slots 01–14): one subrack houses the OSMU, SAU, PRS, Nastar, Nastar DB, and PRS-DB boards plus their standby boards (Nastar-Standby, OSMU-Standby, SAU-Standby, PRS-Standby), two LSW switching units in the middle slots, and reserved slots. The other subrack houses the OSMU, M2000, M2000DB (Sybase), and TS boards plus the M2000-Standby and DB-Standby boards, two LSW switching units, and reserved slots. Redundancy: M2000 4:1, PRS 1:1, SAU 1:1, Nastar 1:1, DB N:1; all highly reliable.]
The ATAE cluster supports five clusters: the M2000 cluster, PRS cluster, SAU cluster, Nastar cluster, and Oracle cluster. Each cluster uses the N:1 scheme.
An ATAE subrack has 14 slots. Slots 07 and 08 house the GE&FC switching units, and slot 01 of the basic processing subrack houses the OSMU.
On each board, the OS is installed at the bottom layer, with application or database software installed above it.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 12
Product Introduction – Architecture
[Architecture diagram: hot-swap fan tray, air intake vent, redundant power supply, and switching board.]
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 13
Product Introduction – Subrack
A carrier-class modular technology is used to achieve desirable functionality, performance, density, and high reliability.
The chassis is 14 U high and 19 inches wide and can be installed in any standard 19-inch cabinet.
The chassis provides 14 slots for boards in the front and another 14 slots for interface boards at the back (front view shown).
The subrack is configured with a dual-star high-speed interconnection.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 14
Product Introduction - Heat Dissipation System
The subrack adopts forced air cooling with a bottom-to-top ventilation design; that is, air enters from the lower front and exits from the top rear.
The heat dissipation system automatically adjusts the fan speed according to the internal temperature, and provides 200 W of cooling capability to each front slot and 30 W to each rear slot.
As the core component of the heat dissipation system, the fan tray supports N:1 redundancy protection; even if a fan is faulty, the running of the entire system is not affected. The maximum air volume of the subrack is 560 CFM, which meets the heat dissipation requirements of low-consumption blades.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 15
Power Supply Module of the ATAE Cluster
Subrack (DC Cabinet)
In a DC cabinet, the power module provides four –48 V power inputs for each subrack, with the left
and right sides of the subrack each supplied with 2+2 redundant backup power. The DC power
provides each slot with redundant –48 V DC power through the backplane to ensure uninterruptible
power supply for the chassis. The following table lists the power performance specifications:
Maximum current: 40 A
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 16
Power Supply Module of the ATAE Cluster
Subrack (AC Cabinet)
In an AC cabinet, the power module provides two –48 V power inputs for each subrack, with the left and right sides of the subrack each supplied with 1+1 redundant backup power. The DC power provides each slot with redundant –48 V DC power through the backplane to ensure uninterruptible power supply for the chassis. The following table lists the power performance specifications:
Maximum current: 60 A
ATAE: 2 circuits, 1+1 redundancy
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 17
Product Introduction – Server Board (1)
The server board uses a dual-core or quad-core Intel® Xeon® processor with low power consumption.
Each processor supports a 12 MB L3 cache.
The memory capacity is 48 GB (6 x 8 GB); the memory type is DDR3-1333.
The server board supports two Ethernet (10/100/1000M Base-T) Base interfaces.
The server board supports two Ethernet (1000M Base-B) Fabric interfaces.
The server board supports one Ethernet (1000M Base-B) Update channel interface.
The server board provides two USB 2.0 interfaces (compatible with USB 1.1) and one Intelligent Platform Management Controller (IPMC) serial port. (The IPMC port also works as the system serial port. Its communication standard is RS-232, and its interface type is RJ-45.)
The server board supports two SAS hard disks, each providing 300 GB of capacity.
The server board provides an IPMC module that is supplied with power independently. The IPMC module connects to the chassis management board through a redundant backup IPMB bus.
The IPMC module provides the following functions:
FRU, SDR, and SEL information management
Temperature detection, voltage detection, and alarming
Hot-swap control, power-on/off control, and reset control
Console redirection
Remote KVM over IP
(Server board model: AUPSA)
Note: In an ATAE cluster system, only the OSMU board is configured with two 300 GB hard disks; the other boards are not configured with hard disks.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 18
Product Introduction – Server Board (2)
Product Introduction – Switching Board (1)
Basic configuration (including only the Base plane)
Twelve GE ports connect the switching board to twelve board slots.
Two GE ports connect the switching board to the SMMs of the active and standby
chassis.
One GE port connects the switching board to the Base switching plane of another
switching board slot so that the active and standby Base switching planes work in
redundancy mode.
Eight ports connect the switching board to the interface board of the switched
network through the backplane so that the switching board provides external
network interfaces.
Two ports connect the switching board to the GE and FC daughter boards
respectively.
Basic configuration + GE module: provides an additional Fabric switching
plane which is independent from the Base switching plane and provides
21 ports
Twelve 20GE ports connect the switching board to 12 slots.
One 10GE port connects the switching board to the Fabric switching plane of
another switching board slot so that the two Fabric switching planes work in
redundancy mode.
Eight 10GE ports connect the switching board to the interface board of the
switched network through the backplane so that the switching board provides
external network interfaces.
Basic configuration + GE module + FC module: provides an additional FC optical switch function and four external 8 Gbit/s FC ports through the interface board, which can be used to set up a SAN storage network.
(Switching board model: AXCBF1)
In an ATAE cluster system, the switching board uses the following configuration: basic configuration + GE module + FC module.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 20
Product Introduction – Switching Board (2)
Data cache: 4 GB
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 22
Product Introduction –S3900 Disk Array
Processor: 64-bit
Host interfaces: sixteen 8 Gbit/s FC host interfaces
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 23
Product Introduction –S3900 Disk Array (for
Backup)
Processor: 64-bit
Disk capacity: 2 TB
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 24
Contents
Contents
3. ATAE Cluster Scheme
3.1 Constraints of ATAE Cluster Design Specifications
3.2 ATAE Cluster Networking
3.3 ATAE Cluster SAN Boot Scheme
3.4 ATAE Cluster Storage Scheme
3.5 Clusters and Services in the ATAE Cluster
3.6 ATAE Cluster Backup and Restore Scheme
3.7 ATAE Cluster Remote HA Solution
3.8 ATAE Cluster Emergency System
3.9 ATAE Cluster OSMU Maintenance Solution
3.10 ATAE Cluster Antivirus Solution
Constraints of ATAE Cluster Design
Specifications
Constraint 1: All product boards are started using SAN boot. The emergency system can also be started using local disks configured on the ATAE boards.
Constraint 2: When the resources required for co-deployment of multiple products exceed the capacity of one cabinet with two subracks, two or more cabinets must be used. However, if a sale breaks constraint 3, the sales department must confirm with R&D whether the configuration is supported.
A fully configured two-subrack cabinet provides 24 ATAE boards. A maximum of 23 boards, including service boards, DB boards, and standby boards, can be installed for product applications. One backup storage subrack (BSS) is configured with twelve 2 TB hard disks (compatible with the configuration of twenty-four 1 TB hard disks or twenty-four 600 GB hard disks on the live network).
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 27
Capacity Expansion Principle: Flexible Board
Layout and Mixed Configuration
The following uses an example of expanding M2000/PRS (400 equivalent NEs) to M2000/PRS (800 equivalent NEs).
[Board layout diagrams (slots 1–14, two switching boards per subrack): at 400 equivalent NEs, the OSMU, M2000, M2000 MED, M2000 TS, M2000 DB, PRS, and PRS DB boards and their standby boards (M2000 Standby, PRS Standby, DB Standby) are deployed in sequence; at 800 equivalent NEs, the new boards are deployed in the remaining slots in sequence. M2000 boards are mixed and separated by PRS boards in the middle.]
Deploy the M2000 and PRS in sequence in the subracks of the cabinet. The standby DB board is always deployed in slot 14 because it is shared by multiple products. Deploy the new boards for capacity expansion in the remaining slots of the subrack in sequence.
The planning tool is used during actual capacity expansion planning. It generates the board locations for capacity expansion by following the principle of "flexible board layout and mixed configuration".
Capacity expansion procedure:
1. Export live network data from the OSMU.
2. Use the planning tool to import the live network data and select the target network for capacity expansion.
3. The planning tool generates the capacity expansion planned data (including the board layout after capacity expansion, without the need for manual adjustment).
4. Install hardware based on the capacity expansion plan.
5. Use the OSMU to import the capacity expansion planned data and activate the boards.
6. Use the capacity expansion scripts of the service products to complete the capacity expansion.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 28
Contents
3. ATAE Cluster Scheme
3.1 Constraints of ATAE Cluster Design Specifications
3.2 ATAE Cluster Networking
3.3 ATAE Cluster SAN Boot Scheme
3.4 ATAE Cluster Storage Scheme
3.5 Clusters and Services in the ATAE Cluster
3.6 ATAE Cluster Backup and Restore Scheme
3.7 ATAE Cluster Remote HA Solution
3.8 ATAE Cluster Emergency System
3.9 ATAE Cluster OSMU Maintenance Solution
3.10 ATAE Cluster Antivirus Solution
ATAE Cluster Networking
ATAE Cluster Networking (Compatible with
the Networking on the Live Network)
[Networking diagram: the two SMMs connect to the boards over the IPMB. The backplane Base interface switches carry the maintenance network, with LAN connections to the PMC interface of the rear board. The Fabric interface switches carry the service network and the VCS heartbeat network. The backplane FC interface switches connect the boards to the FC storage network.]
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 31
ATAE Cluster Networking (For V2R1 and
Later)
[Networking diagram (V2R1 and later): the two SMMs connect to the boards over the IPMB. The backplane Base interface switches carry the maintenance network and the VCS heartbeat network (independent network plane 2), with LAN connections to the PMC interface of the rear board. The Fabric interface switches carry independent network plane 1, which connects to the customer's network through a router. The backplane FC interface switches connect the boards to the FC storage network.]
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 32
ATAE Cluster Networking (Compatible with
the Networking on the Live Network)
The ATAE cluster supports the following types of network: the VCS heartbeat network, maintenance network, service/DB network, and FC storage network.

Network type: VCS heartbeat network
Purpose: used for the communication between the nodes in the VCS cluster.
Network adapter type: Fabric plane; number of network interfaces: 2
Description: The VCS heartbeat network uses the link layer for communication, and therefore you do not need to configure an IP address. The two links of the heartbeat network work in active/standby mode.

Network type: Maintenance network
Purpose: used for production installation and for the routine maintenance of ATAE boards and disk arrays.
Network adapter type: Base plane; number of network interfaces: 2
Description: The maintenance network uses a fixed IP address; changing the address is not allowed. The maintenance network implements redundancy through the dual planes.

Network type: Service network
Purpose: used for the communication on the service/DB network, for example, the M2000 connecting to NEs and the DB board providing database services for the M2000.
Network adapter type: service interface board; number of network interfaces: 2
Description: The service network connects to the public network through each ATAE rear interface board, and implements redundancy through the dual planes.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 33
ATAE Cluster Networking Design (For V2R1
and Later)
The ATAE cluster supports the following types of network: the VCS heartbeat network, maintenance network, service/DB network, and FC storage network.

Network type: VCS heartbeat network
Purpose: used for communication between nodes in the VCS cluster.
Network adapter type: Base plane; number of network interfaces: 2
Description: The VCS heartbeat network uses the link layer for communication, and therefore you do not need to configure an IP address. The two links on the heartbeat network work in active/standby mode.

Network type: Maintenance network
Purpose: used for production installation and routine maintenance of the ATAE boards and disk arrays.
Network adapter type: Base plane; number of network interfaces: 2
Description: The maintenance network uses a fixed IP address; changing the address is not allowed. The maintenance network implements redundancy through the dual planes. The basic service network connects to the public network from the back of the switching board and implements redundancy through the dual planes.

Network type: Service network
Purpose: used for service/DB communication; for example, the M2000 connects to the NEs, or the DB board provides the database service for the M2000.
Network adapter type: Fabric plane; number of network interfaces: 2
Description: The service network connects to the public network through each ATAE rear interface board, and implements redundancy through the dual planes.

Network type: FC storage network
Purpose: business boards mount service storage space; the OSMU board mounts backup space.
Network adapter type: FC module
Description: The storage network connects to the service board or OSMU board by FC cables through the switching board. The FC interface is provided by the fiber loopback module of the rear board and directly communicates with the switching board.

Note:
The network cable of the rear board of the OSMU connects to the service network so that users can remotely log in to the system.
The OSMU communicates with and maintains the boards and disk arrays through the maintenance network.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 34
Requirements for the IP Address of the
ATAE Cluster Network (Two Subracks)
The following lists, per device, the number of required IP addresses on the maintenance network (private IP addresses), the service network (public IP addresses), and the VCS heartbeat network.

Device: SMM in the ATAE subrack — maintenance: 6; service: 0; VCS heartbeat: 0
Each SMM is configured with one IP address. (By default, the SMM IP address is 192.168.128.23 or 192.168.128.24 in the MPS and 192.168.128.26 or 192.168.128.27 in the EPS.) The two SMMs work in active/standby mode. A logical IP address is configured for the two boards to provide external services; by default, it is 192.168.128.25 in the MPS and 192.168.128.28 in the EPS.

Device: Disk array — maintenance: 4; service: 0; VCS heartbeat: 0
Each disk array with disk array controllers is configured with two maintenance IP addresses. The default maintenance IP addresses of the service disk array are 192.168.128.203 and 192.168.128.204; those of the backup disk array are 192.168.128.201 and 192.168.128.202.

Device: M–N service/DB boards (M represents the total number of boards and N represents the number of standby boards) — maintenance: M–N; service: M–N; VCS heartbeat: 0
Each board is configured with one public IP address, which also serves as the logical IP address for providing external services. (Note: You need to apply for an IP address only for an external service network.) The default private IP address is 192.168.128.[100 + (subrack number – 1) x 14 + slot number]. Note: A public IP address is not required for the active board of the locally deployed ES; the IP address of the active board of the emergency M2000 system can be used.

Device: N standby boards — maintenance: N; service: 0; VCS heartbeat: 0
You do not need to plan a public IP address for the standby board. When the services on a service board are switched to the standby board, the IP address of the service board is switched to the standby board. The default private IP address is 192.168.128.[100 + subrack number x 2 + slot number].
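The two default private IP formulas above can be sketched as small helpers; the function names are illustrative, and the formulas are copied from the table as-is (note that the service/DB and standby formulas differ).

```python
# Sketch of the default maintenance (private) IP assignment described above.
# The function names are illustrative; the formulas come from the table:
#   service/DB board: 192.168.128.(100 + (subrack - 1) * 14 + slot)
#   standby board:    192.168.128.(100 + subrack * 2 + slot)

def service_board_private_ip(subrack: int, slot: int) -> str:
    """Default private IP of a service/DB board (subrack and slot are 1-based)."""
    return f"192.168.128.{100 + (subrack - 1) * 14 + slot}"

def standby_board_private_ip(subrack: int, slot: int) -> str:
    """Default private IP of a standby board, per the table's formula."""
    return f"192.168.128.{100 + subrack * 2 + slot}"

# Example: the board in subrack 1, slot 5
print(service_board_private_ip(1, 5))  # 192.168.128.105
```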
Introduction to the ATAE Cluster Storage
ATAE cluster storage is classified into the following disk arrays by functionality:
Service disk array (SAS 600 GB):
1. Determine disk array configurations based on sales scenarios. The following describes an example where the total size of the required space is S:
a. If S is smaller than 5,800 GB, one MSS needs to be configured (if the disk array is configured with disk array controllers).
b. If S is larger than or equal to 5,800 GB but smaller than 11,900 GB, one MSS and one ESS need to be configured (if a disk array extended frame is installed).
c. If S is larger than or equal to 11,900 GB but smaller than 18,000 GB, one MSS and two ESSs need to be configured.
2. The MSS connects to the ATAE switching board through an optical fiber and connects to each ESS through a SAS cable.
3. The service disk array stores the data of the OSs, services, and databases on all boards.
4. The RAID 1+0 technology is used to achieve redundancy protection for the service disk array. Each disk array consisting of 24 hard disks forms a RAID group. Each RAID group contains two hot spare disks, and only 11 hard disks provide available space.
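The sizing rules above can be sketched as a small selector; the thresholds (5,800 GB, 11,900 GB, 18,000 GB) come from the slide, and the function name is illustrative.

```python
# Sketch of the service disk array sizing rules above; the thresholds (in GB)
# are taken from the slide, and the function name is illustrative.

def service_array_config(required_gb: float) -> str:
    """Pick the MSS/ESS combination for a required space of `required_gb` GB."""
    if required_gb < 5_800:
        return "1 MSS"            # single subrack with disk array controllers
    elif required_gb < 11_900:
        return "1 MSS + 1 ESS"    # one extended frame added
    elif required_gb < 18_000:
        return "1 MSS + 2 ESS"
    else:
        raise ValueError("exceeds the documented sizing rules")

# Usable data disks per 24-disk RAID 1+0 group: 2 hot spares leave 22 disks,
# and mirroring halves that to 11 disks of available space.
usable_disks = (24 - 2) // 2
print(service_array_config(9_000), usable_disks)  # 1 MSS + 1 ESS 11
```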
Backup disk array (SATA, 2 TB, 3.5-inch):
1. No extended storage subrack (ESS) is used with the backup storage subrack (BSS) when a single ATAE subrack is used.
2. The BSS communicates only with the OSMU by connecting to the switching board through an optical fiber.
3. The backup disk array stores the backup data on each ATAE board, including the OS backup data, dynamic backup data, and static backup data.
4. The RAID 5 + 1 hot spare technology is used to achieve redundancy protection for the backup disk array.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 36
Connecting the Subracks (MPS/EPS) and the
Switches
Connecting the Controller Enclosure and the
Subrack
Connecting the Controller Enclosure and the
Subrack
Connecting the MPS and the EPS
Networking Between the MSS and ESS of
the Service Disk Array
Note:
The cascading structure of the disk array can be either of the following, based on the deployed products and network scale: one MSS + one ESS, or one MSS + two ESSs.
An ESS connects to the MSS or another ESS through a SAS cable.
The MSS connects to the ESS over dual lines for redundancy.
No ESS connects to the BSS in the current version.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 41
ATAE Cluster SAN Boot Scheme
SAN is short for storage area network, a type of high-speed storage network similar to a common local area network (LAN). A SAN directly connects the server and the disk array through dedicated hubs and FC switches. It allows you to establish a data connection between the disk array and each ATAE board using optical fibers and the built-in ATAE fiber switching board.
3. Why is the SAN boot technology introduced into the ATAE cluster?
Higher reliability because OS data is integrated into the disk array: none of the ATAE boards is configured with local disks or mechanical parts, which improves reliability.
Quick fault recovery: if a board becomes faulty, you can replace it without reinstalling the system; you only need to change the mapping between the board and the OS on the disk array. This simplifies user operations.
Centralized management: the boot disk of the server is stored on the storage device for centralized management, which helps fully use the various advanced management functions of the storage device.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 43
Board Hot Swap due to SAN Boot
The following describes the SAN boot technology using an example where the M2000, PRS, and TranSight (400 equivalent NEs) are deployed.
[Board layout (slots 1–14): OSMU, M2000, M2000 Standby, M2000 (TS), M2000DB, PRS, PRS Standby, PRSDB, DB Standby, ES, and two LSW switching boards.]
None of the ATAE boards is configured with local disks. The disk array provides 110 GB of storage space for the OS of each board. If local disks have been configured on ATAE boards such as the emergency system (ES), those disks are used for installing the operating system; that is, SAN boot is not used to boot the operating system.
OS data, service data, and database data are completely stored in the disk array. Board replacement does not require system reinstallation, which simplifies onsite operations and enables quick system recovery with secure and reliable data.
The OSMU is responsible for initialization and maintenance of the disk array; therefore, it cannot use SAN boot. The ES uses local disks instead of SAN boot to handle emergencies that occur when disk arrays are damaged.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 44
Mapping Between the LUNs of the Service Disk Array and the OS
[LUN layout: OS LUNs, application LUNs (APP: /export/home), and database RAW LUNs for the M2K DB and PRS DB are distributed across the S2600 enclosures (S2600-MSS, S2600-ESS-0, S2600-ESS-1, S2600-ESS-2) and protected with RAID 1+0.]
The mapping between the disk array and the OS is implemented through the WWN number of the fiber card.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 46
Mapping Between the LUN of the Backup
Disk Array and the OSMU
[Diagram: the LUNs of the backup disk array map to the OSMU; service boards and DB boards back up data to the OSMU.]
Note:
All LUNs on the backup disk array are allocated to the /export/home directory on the OSMU.
The /export/home directory is used to store all backup data.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 47
Contents
3. ATAE Cluster Scheme
3.1 Constraints of ATAE Cluster Design Specifications
3.2 ATAE Cluster Networking
3.3 ATAE Cluster SAN Boot Scheme
3.4 ATAE Cluster Storage Scheme
3.5 Clusters and Services in the ATAE Cluster
3.6 ATAE Cluster Backup and Restore Scheme
3.7 ATAE Cluster Remote HA Solution
3.8 ATAE Cluster Emergency System
3.9 ATAE Cluster OSMU Maintenance Solution
3.10 ATAE Cluster Antivirus Solution
VCS Cluster System and Principles
The VCS software takes the resource as the minimum management unit and the resource group as a set of resources. In the SLS system, each board maps to a service group, and each group has user-defined resources. These resources depend on each other. Before using the VCS software to manage the resources, you must register them first.
The VCS SLS scheme supports N:1 backup for boards in the network. The N boards in the SLS cluster share a standby board. The cluster consisting of the service boards and the standby board is centrally monitored and managed by the VCS software. If a software or hardware resource fault within the monitoring range occurs on any board in the SLS cluster, the VCS software first tries to restart the resource on that board. If it fails to start the service, it automatically switches all resources of the board, including the applications, to the standby board.
To enable N boards to share a standby board, N service groups are formed, each consisting of one of the N boards and the standby board. These service groups are centrally managed and scheduled by the VCS software.
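The N:1 behavior described above can be sketched as a toy model; the class and method names are illustrative and only mimic the switchover policy (restart locally first, then fail over to the shared standby board).

```python
# Toy model of the N:1 scheme described above: N service boards share one
# standby board. On a fault, VCS first tries to restart the resource on the
# same board; only if that fails are all resources switched to the standby.
# Class and method names are illustrative.

class Sls:
    def __init__(self, boards):
        self.active = {b: b for b in boards}   # service group -> hosting board
        self.standby_free = True

    def handle_fault(self, board: str, local_restart_ok: bool) -> str:
        if local_restart_ok:
            return f"{board}: resources restarted in place"
        if not self.standby_free:
            return f"{board}: standby already occupied, manual action needed"
        self.standby_free = False
        self.active[board] = "standby"
        return f"{board}: service group switched to standby board"

cluster = Sls(["M2000-1", "M2000-2", "PRS-1"])
print(cluster.handle_fault("M2000-1", local_restart_ok=True))
print(cluster.handle_fault("PRS-1", local_restart_ok=False))
```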
Cluster Systems in the ATAE Cluster
The following describes the application clusters where the M2000 and PRS (400 equivalent NEs) are deployed.
[Cluster diagram: M2000 cluster (master, slave, M2K-DB (Sybase), and standby boards with logical IP addresses), PRS cluster (PRS and PRS standby), and DB cluster (DB boards and DB standby).]
The ATAE SLS system includes service clusters, such as the M2000 cluster and the PRS cluster, and the database cluster, which comprises all database boards. Each cluster is configured with a standby board. When the Sybase database is used, the Sybase database board and the M2000 service board are deployed in the same cluster.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 50
OSS Application Cluster
Take the M2000 as an example. The cluster system achieves high availability of
the application system through the standby board.
VCS Resource of the OSS Application Cluster

Cluster resource: Network adapter (NIC)
Involved node: service node and backup node
Purpose: to monitor the running status of the NIC.
Naming rule: srXsY_oss_sg_nic_rs, where X represents the ATAE subrack and Y represents the active service board.

Cluster resource: Logical IP address
Involved node: service node and backup node
Purpose: to monitor the running status of the logical IP address.
Naming rule: srXsY_oss_sg_ip_rs, where X represents the ATAE subrack and Y represents the active service board.

Cluster resource: Disk group resource
Involved node: service node and backup node
Purpose: to monitor the resources in the disk group.
Naming rule: srXsY_oss_sg_dg_rs, where X represents the ATAE subrack and Y represents the active service board.

Cluster resource: Mount point resource
Involved node: service node and backup node
Purpose: to monitor the running status of the /export/home mount point.
Naming rule: srXsY_oss_sg_mount_rs, where X represents the ATAE subrack and Y represents the active service board.

Cluster resource: OSS application resource
Involved node: service node and backup node
Purpose: to monitor the running status of the M2000.
Naming rule: srXsY_oss_sg_ossapp_rs, where X represents the ATAE subrack and Y represents the active service board.

[Resource dependency diagram: APP at the top; MountPoint, Logical IP, DiskGroup, and NIC below.]

Note:
Upper-layer resources depend on lower-layer resources.
Services are started from bottom to top; resources at the same layer are started at the same time.
Services are stopped from top to bottom.
Service groups are named in the format of "active board subrack number_oss_sg", for example, sr2s2_oss_sg.
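The naming rule in the table can be sketched as a tiny formatter; the resource-type keys match the table (nic, ip, dg, mount, ossapp), and the function names are illustrative.

```python
# Sketch of the VCS resource naming rule from the table:
#   srXsY_oss_sg_<type>_rs, where X is the ATAE subrack and Y is the
#   active service board. The function names are illustrative.

def oss_resource_name(subrack: int, board: int, rtype: str) -> str:
    types = {"nic", "ip", "dg", "mount", "ossapp"}   # types listed in the table
    if rtype not in types:
        raise ValueError(f"unknown resource type: {rtype}")
    return f"sr{subrack}s{board}_oss_sg_{rtype}_rs"

def oss_service_group(subrack: int, board: int) -> str:
    # Service groups use the format "active board subrack number_oss_sg",
    # for example sr2s2_oss_sg.
    return f"sr{subrack}s{board}_oss_sg"

print(oss_resource_name(2, 2, "nic"))   # sr2s2_oss_sg_nic_rs
print(oss_service_group(2, 2))          # sr2s2_oss_sg
```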
Sybase VCS Resource of the OSS Cluster
Cluster resource: Network adapter (NIC)
Involved node: DB node and backup node (Sybase backup node)
Purpose: to monitor the running status of the NIC.
Naming rule: srXsY_db_sg_nic_rs, where X represents the ATAE subrack and Y represents the active DB board.
Copyright © 2014 Huawei Technologies Co., Ltd. All rights reserved. Page 53
DB Cluster
(Diagram: DB-1 through DB-4 and DB-Standby connected in a cluster.)
All DB boards form an N:1 cluster through VCS, where N represents the active DB boards and 1 represents the standby DB board.
Each DB board runs an independent instance. The instance name differs from one DB board to another.
All the DB boards and the standby board form a cluster. Each DB board and the standby board form a service group.
VCS Resource of the DB Cluster
(Diagram: resource dependency stack with the Oracle listener on top, then the Oracle resource, then the mount point and logical IP, then DiskGroup and NIC.)
Resource | Involved Node | Purpose | Naming Rule
Network adapter (NIC) | DB node and backup node | To monitor the running status of the NIC. | srXsY_db_sg_nic_rs
Logical IP address | DB node and backup node | To monitor the running status of the logical IP address. | srXsY_db_sg_ip_rs
Oracle resource | DB node and backup node | To monitor Oracle instances. | srXsY_db_sg_oracle_rs
Disk group resource | DB node and backup node | To monitor the resources in the disk group. | srXsY_db_sg_dg_rs
Mount point resource | DB node and backup node | To monitor the running status of the /export/home mount point. | srXsY_db_sg_mount_rs
Oracle listener resource | DB node and backup node | To monitor the running status of the listener. | srXsY_db_sg_listener_rs
In each naming rule, X represents the ATAE subrack and Y represents the active DB board.
Note:
Upper-layer resources depend on lower-layer resources.
Services are started from bottom to top.
Resources at the same layer are started at the same time.
Services are stopped from top to bottom.
Service groups are named in the format of active board subrack number_db_sg, for example, sr2s2_db_sg.
Example of OSS Application Cluster Switchover
(Diagram: APP-1 and APP-2 each run a service group; Service Group-0 fails over from APP-1 to APP-Standby, while Service Group-1 on APP-2 does not fail over.)
Only the services on one service board can be switched to the standby board at a time.
The switchover process stops all resources on the malfunctioning board in sequence first and then starts all resources on the standby board in sequence.
The switchover of the service group on one board will not trigger the switchover of service groups on other boards of the same cluster.
You can view the status on the OSMU device panel after the switchover succeeds.
Resource switchover is automatically triggered by the VCS when resources malfunction. In routine maintenance, you can manually perform the switchover. The processes are the same.
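The ordered stop/start behavior above (stop top to bottom on the faulty board, start bottom to top on the standby board) can be illustrated with a plain-shell simulation. This is illustrative only, not the VCS implementation; resource names follow the dependency stack on this slide, and ordering within a layer is arbitrary:

```shell
# Bottom-to-top dependency order of the OSS service group's resources.
stack="nic dg ip mount ossapp"

# Reverse the list so resources stop top to bottom on the faulty board.
reversed=""
for r in $stack; do reversed="$r $reversed"; done
for r in $reversed; do echo "stop  $r on faulty board"; done

# Start bottom to top on the standby board.
for r in $stack; do echo "start $r on standby board"; done
```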
Example of DB Cluster Switchover
(Diagram: DB-1 and DB-2 each run a service group; Service Group-0 fails over from DB-1 to DB-Standby, while Service Group-1 on DB-2 does not fail over.)
Only the services on one DB board can be switched to the standby DB board at a time.
The switchover process stops all resources on the malfunctioning board in sequence first and then starts all resources on the standby board in sequence.
The switchover of the service group on one board will not trigger the switchover of service groups on other boards of the same cluster.
The switchover of M2000/PRS/Nastar service groups is not triggered when a database switchover occurs, but the applications are stopped. After the switchover succeeds, the applications are started automatically.
You can view the status on the OSMU device panel after the switchover succeeds.
Resource switchover is automatically triggered by the VCS when resources malfunction. In routine maintenance, you can manually perform the switchover. The processes are the same.
Contents
3. ATAE Cluster Scheme
3.1 Constraints of ATAE Cluster Design Specifications
3.2 ATAE Cluster Networking
3.3 ATAE Cluster SAN Boot Scheme
3.4 ATAE Cluster Storage Scheme
3.5 Clusters and Services in the ATAE Cluster
3.6 ATAE Cluster Backup and Restore Scheme
3.7 ATAE Cluster Remote HA Solution
3.8 ATAE Cluster Emergency System
3.9 ATAE Cluster OSMU Maintenance Solution
3.10 ATAE Cluster Antivirus Solution
Data Levels and Specifications for ATAE Backup and Restoration
The ATAE cluster solution provides three backup levels based on the data to be backed up: dynamic data, static data, and OS data.
Dynamic Data
Dynamic data includes the data in the dynamic configuration files and database. Such data is generated during
the product running and is backed up once a day. A maximum of N weeks of data can be saved. This ensures
that the system can be restored based on the data of any day within the N weeks.
Static Data
Static data includes the binary codes of the product and all third-party software. N backups are performed for all
the static data after the initial installation is complete or the product and third-party software are upgraded. This
ensures that the system can be restored based on the static data of the last N backups when the system is
properly started.
OS Data
OS data includes the data of all board OSs. N backups are performed for the data after the initial installation is
complete or the OS is upgraded. This ensures that the system can be restored based on the data of the last N
backups when the OS is properly started.
Data to Be Backed Up and Restored in
ATAE
Dynamic Data
It includes the dynamic service files of OSS systems such as M2000, PRS, Nastar, eSAU, TranSight, and SONMaster, and database data: for example, the configuration files saved in /export/home/omc and the data in each table space of the database.
Static Data
It includes the applications of the OSS system or Oracle. For example, the binary files in
directories like /export/home/oracle, /opt/OMC, and /opt/PRS.
OS Data
It includes the OS data of all boards. That is, all data in the / partition.
Technical Implementation for ATAE Backup
and Restoration
Dynamic Data
The OSS (such as the M2000, PRS, Nastar, eSAU, and eCoordinator) backs up dynamic data on the client.
After the backup is complete, a backup script is executed to automatically upload the backup data to the
backup disk array using the SCP protocol. You do not need to stop the services when backing up the
dynamic data. You can manually restore the dynamic data using the OSMU GUI. The backup data is saved
as a tar.gz file.
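A minimal sketch of that upload step, with illustrative paths and a hypothetical backup-array hostname (the product's actual backup script differs):

```shell
# Demo directory stands in for the real dynamic-data paths, which are
# product-specific.
mkdir -p /tmp/dyn_demo && echo "demo cfg" > /tmp/dyn_demo/omc.cfg

# Package the dynamic data as a tar.gz archive.
tar -czf /tmp/dyn_backup.tar.gz -C /tmp dyn_demo

# Upload to the backup disk array over SCP (hostname and path are
# assumptions for illustration):
# scp /tmp/dyn_backup.tar.gz backup-array:/backup/dynamic/

tar -tzf /tmp/dyn_backup.tar.gz   # verify the archive contents
```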
Static Data
You can back up and restore static data using the OSMU. You do not need to stop the services when
backing up the static data. After the backup is complete, the data is transferred to the backup disk array
using the SCP protocol. The backup data is saved as a tar.gz file.
OS Data
You can back up and restore OS data using the OSMU. You need to use dd+bzip2 to back up the OS data to
the backup disk array. Based on test results, 50 GB partition-based OS data can be compressed to a 700 MB
file using the maximum compression ratio. It takes 30 minutes to back up the OS data and 40 to 50 minutes to
restore the OS data. From ATAE V200R001C01 onwards, you do not need to stop services and restart the OS
during OS backup. However, you must restart the OS during restoration. In versions earlier than ATAE
V200R001C01, you need to restart the OS during OS backup.
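The dd+bzip2 technique above can be sketched as follows, demonstrated on a small image file rather than a live root partition (the device path in the comment is an assumption; a real backup would target the board's OS partition):

```shell
# Stand-in for an OS partition (a real backup would read e.g. /dev/sda1).
dd if=/dev/zero of=/tmp/os_part.img bs=1M count=4 2>/dev/null

# Back up: stream the partition image through bzip2 at maximum compression.
dd if=/tmp/os_part.img bs=1M 2>/dev/null | bzip2 -9 > /tmp/os_part.img.bz2

# Restore: decompress and write the image back.
bzip2 -dc /tmp/os_part.img.bz2 | dd of=/tmp/os_part_restored.img bs=1M 2>/dev/null

# Confirm the restored image matches the original.
cmp -s /tmp/os_part.img /tmp/os_part_restored.img && echo "restore verified"
```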
Typical Scenarios of ATAE Data Backup
Scenario: After the initial installation of the OSS systems like M2000/PRS/Nastar/eSAU is completed.
Backup scheme: Back up the OSS system data in the following sequence:
1. Use the OSS client to manually back up dynamic data and save the data to the backup disk array.
2. Back up the OSS system and DB static data to the backup disk array.
3. Back up the OSS system and DB OS data to the backup disk array.
Scenario: After the OSS system or OSMU server software is upgraded, after a patch is installed for the OSS system or OSMU server software, after the database software is upgraded, or after capacity expansion is carried out for the database.
Backup scheme: Back up the OSS system data in the following sequence:
1. Use the OSS client to manually back up dynamic data and save the data to the backup disk array.
2. Back up the OSS system and DB static data to the backup disk array.
Note: The operating system changes a lot after a cross-R-version upgrade is carried out for the OSS server software. In this case, you are advised to back up the boards' operating system data.
Scenario: After the operating system is upgraded.
Backup scheme: Back up the operating system data to the backup disk array.
Scenario: Routine operations.
Backup scheme: Use the OSS client to periodically back up dynamic data to the backup disk array.
Typical Scenarios of ATAE Data Restoration
Scenario: The OSS systems like M2000, PRS, Nastar, or eSAU are functioning properly but users want to roll back the OSS system to a previous state (for example, last week).
Restoration scheme: Restore only the dynamic data.
Scenario: The OSS system board's operating system runs properly but the database device is damaged or the OSS system configuration file is damaged.
Restoration scheme: Restore OSS system data in the following sequence:
1. Static data
2. Dynamic data
Scenario: Users are not sure which files are lost or damaged.
Restoration scheme: Restore OSS system data in the following sequence:
Remote HA Solutions
In the ATAE cluster online remote HA solution, two
ATAE cluster systems are deployed in different
geographic locations, and data is synchronized
between the two systems in real time through a
dedicated data channel. If one system becomes
faulty, the M2000 services can be switched from the
faulty system to the other system at any time. This
ensures the continuity of the M2000 services. The
ATAE cluster online remote HA solution effectively
prevents losses caused by disastrous events such
as earthquake, fire, and power failure, provides
remote protection for the M2000 devices, and
thereby improves the M2000's capability of resisting
possible security risks. During active/standby
switchover of the two sites, the site that does not
synchronize NE performance data will not
automatically re-collect historical performance data
from NEs even if the services at the standby site
start. For details about how to query the historical
performance data of NEs before the switchover, see
section "Synchronizing NE Measurement Results
Manually" in U2000 Product Documentation.
Solution Description – Basic Principle (Local
Deployment Networking Scheme)
The emergency system and the primary system are
deployed on the same LAN and managed by the
same OSMU (locally deployed). The emergency
system must be properly connected to the primary
system server, managed NEs, NMS, and U2000
client so that it can take over OSS services in a
timely manner if the primary system becomes faulty.
1. If the emergency system is locally deployed,
when services are manually switched from the
primary system to the emergency system, the
same IP address is used.
2. The internal network is used for data
synchronization between the local emergency
system and the primary system.
Solution Description – Basic Principle (Remote
Deployment Networking Scheme)
The emergency system and the primary system are
deployed on different LANs and managed by different
OSMUs (remotely deployed). The emergency system
must be properly connected to the primary system server,
managed NEs, NMS, and U2000 client so that it can take
over OSS services in a timely manner if the primary
system becomes faulty.
1. When the emergency system is remotely deployed
(multiple emergency systems can be deployed in one
cabinet), when services are manually switched from
the primary system to the emergency system, the IP
address changes. One public IP address needs to be
applied for each emergency system board.
2. The public network is used for data synchronization
between the remote emergency system and the
primary system.
3. When NEs in the primary system are taken over by the
emergency system, NEs are distributed among the boards of the emergency system based on the average allocation principle.
ATAE Cluster OSMU Maintenance Networking
OSMU Maintenance Solution
Note:
The OSMU server performs all
control logic of the software. The
OSMU agent is automatically
installed on each board when the
OS is installed on the board, and
functions as the OSMU client.
The OSMU manages and maintains
each board through the OSMU
agent.
Scripts for routine maintenance are stored locally on the OSMU and the ATAE boards.
The OSMU server manages and
maintains each board by invoking
the script stored in the OSMU
server or using the OSMU agent to
invoke the script saved in the board.
After the script finishes, the OSMU agent returns a result to the OSMU server stating whether the task succeeded or failed.
OSMU Active/Standby Board Protection
(Diagram: a subrack with slots 1 to 14 hosting the OSMU, standby OSMU, database, M2000, and emergency-system boards, connected through two LSWs; data is synchronized between the active and standby OSMU boards once per hour.)
A board having two disks of 600 GB each is introduced as the standby OSMU board.
The BSS is shared between the active and standby OSMU boards. The BSS is mounted to the standby OSMU board when the active/standby OSMU switchover is triggered.
Data is synchronized between the active and standby OSMU boards once per hour.
The data to be synchronized covers the entire OSMU software (/opt/osmu), the alarms recorded by the OSMU, and the backup data of the OSMU itself.
A full synchronization is performed the first time. From the second time onwards, the OSMU uses incremental synchronization. The synchronization is based on Linux rsync and SSH encryption.
Switchover between the active and standby OSMU boards is performed manually. Data synchronization is triggered once when users switch over the active and standby OSMU boards on the OSMU GUI.
OSMU Board Antivirus Solution
Thanks
www.huawei.com