
CONFIDENTIAL

Metaswitch Products OpenStack


Deployment Design Guide

VC3-601 - Version 8.5 - Issue 2-1465

January 2023

A Microsoft Company
OpenStack Deployment Design Guide (V8.5) CONFIDENTIAL

Notices
Copyright © 2023 Microsoft. All rights reserved.

This manual is issued on a controlled basis to a specific person on the understanding that no part of
the product code or documentation (including this manual) will be copied or distributed without prior
agreement in writing from Metaswitch Networks and Microsoft.

Metaswitch Networks and Microsoft reserve the right to, without notice, modify or revise all or part of
this document and/or change product features or specifications and shall not be responsible for any
loss, cost, or damage, including consequential damage, caused by reliance on these materials.

Metaswitch and the Metaswitch logo are trademarks of Metaswitch Networks. Other brands and
products referenced herein are the trademarks or registered trademarks of their respective holders.

Contents

1 Introduction.............................................................................................................5
1.1 About this document............................................................................................................. 5
1.2 Relevant product versions.....................................................................................................6
1.3 OpenStack releases.............................................................................................................. 9
1.4 Terminology......................................................................................................................... 13
2 Requirements on OpenStack.............................................................................. 14
2.1 High availability................................................................................................................... 14
2.1.1 Redundant cloud deployments.............................................................................. 16
2.1.2 Resilience to compute node failure....................................................................... 20
2.1.3 Redundant physical networking.............................................................................20
2.1.4 Virtual IP address failover..................................................................................... 21
2.1.5 Anti-affinity and scheduling....................................................................................21
2.1.6 Guest watchdog.....................................................................................................23
3 Planning your OpenStack deployment.............................................................. 24
3.1 Product topologies...............................................................................................................25
3.2 Per-product VM requirements............................................................................................. 32
3.2.1 Hyper-threading..................................................................................................... 33
3.2.2 Using the VM resource specification tables.......................................................... 33
3.2.3 VM resource specification tables...........................................................................35
3.2.4 OpenStack overhead............................................................................................. 57
3.2.5 SAS data storage.................................................................................................. 57
3.2.6 EAS data storage.................................................................................................. 59
3.2.7 Perimeta data storage........................................................................................... 60
3.2.8 Changing VM resources........................................................................................ 61
3.3 Storage options................................................................................................................... 63
3.3.1 OBS....................................................................................................................... 64
3.3.2 SAS........................................................................................................................65
3.3.3 EAS Storage Cluster and Rhino TSN................................................................... 65
3.4 Storage for Metaswitch products on OpenStack.................................................................66
3.4.1 High performance storage..................................................................................... 67
3.5 Network configuration..........................................................................................................67
3.5.1 Network interfaces................................................................................................. 68
3.5.2 Bandwidth.............................................................................................................. 71
3.5.3 IP address assignment / DHCP.............................................................................71
3.5.4 Key-based login..................................................................................................... 72
3.5.5 Security groups......................................................................................................72
3.5.6 Specific requirements for Perimeta virtual interfaces............................................ 73
3.5.7 Specific high-performance requirements for Perimeta, Rhino nodes and the
Secure Distribution Engine........................................................................................ 74

3.5.8 Specific requirements for high capacity TLS/TCP performance on Perimeta........ 79


3.5.9 Specific requirements for IPv6 on Perimeta and Rhino nodes.............................. 80
3.5.10 Support for SR-IOV for Perimeta's high availability interface.............................. 80
4 Capacity planning................................................................................................ 81
4.1 Reference hardware............................................................................................................ 81
4.1.1 OpenStack compute hosts.................................................................................... 81
4.1.2 Volume storage......................................................................................................82
4.2 Benchmarking performance................................................................................................ 82
4.2.1 CPU speed............................................................................................................ 83
4.2.2 RAM....................................................................................................................... 83
4.2.3 Disk space and I/O................................................................................................84
4.2.4 Networking............................................................................................................. 84
5 Worked examples.................................................................................................85
5.1 Large multi-product deployment..........................................................................................85
5.2 Lab deployment................................................................................................................... 92
5.3 Small-scale Perimeta lab deployment.................................................................................96
5.4 Small-scale Perimeta HA SSC + MSC with single cloud active-standby VIP
deployment............................................................................................................................97

1 Introduction

1.1 About this document


The Metaswitch Products OpenStack Deployment Design Guide explains the system engineering
considerations for deploying Metaswitch products as Virtual Network Functions (VNFs) in an
OpenStack deployment.

It explains how to calculate the overall resource needs for a complete deployment, setting out the
resource needs for the virtual machines on which the products will run. It includes information on
reference hardware and the performance and capacity expected from the products on that hardware.

It does not define the set of requirements that Metaswitch products place on the OpenStack cloud
- those are set out in the Virtual Infrastructure Requirements Guide. This Guide assumes you are
familiar with that document and have confirmed your environment meets those requirements.

Between them, this document and the Virtual Infrastructure Requirements Guide give you all the
information you need to define precisely what behavior, capacity and performance you need from your
virtual infrastructure.

The Virtual Infrastructure Requirements Guide documents certain OpenStack features or configuration
that are either required or must not be used, and that information is referenced here.

Detailed instructions for the installation of Metaswitch products in an OpenStack cloud are provided in
the following per-product manuals:

• For Metaswitch Fixed Voice products, including MetaSphere CFS (and, where applicable,
RPAS, OBS, MRS), Metaswitch AGC / MGC, MetaSphere EAS, Advanced Messaging Service
(AMS), MetaView Director, ESA Proxy, and MetaView Server: Metaswitch Products OpenStack
Deployment Procedures Guide
• For MetaView Server (when not deployed alongside Metaswitch Fixed Voice products): MetaView
Server OpenStack Deployment Procedures Guide
• For Service Assurance Server: Service Assurance Server OpenStack Deployment Procedures
Guide
• For MetaSphere N-Series: N-Series on OpenStack Installation and Recovery Guide
• For Perimeta: Perimeta Initial Setup Guide (OpenStack Environments)
• For Secure Distribution Engine (SDE): Secure Distribution Engine Initial Setup Guide
• For DCM: Distributed Capacity Manager Initial Setup Guide
• For Clearwater Core: Clearwater Core Initial Setup Guide
• For BGCF VM: BGCF VM Initial Setup Guide
• For Metaswitch CCF: Metaswitch CCF Initial Setup, Scaling, and Upgrades Guide
• For the MetaView Statistics Engine: MetaView Statistics Engine Deployment and Management
Guide
• For Mobile Voice Mail: MVM Install Guide


• For Rhino VoLTE TAS: VoLTE Solution Deployment Guide


• For MaX UC (including Rhino MAX): MaX UC Installation Guide
• For MetaSphere QCall: QCall Deployment Guide
• For Storage Cluster: Storage Cluster Deployment Guide (OpenStack)
• For Distributed Admission Manager: Distributed Admission Manager (DAM) Initial Setup Guide
• For Deployment Configuration Store: Deployment Configuration Store Deployment and
Management Guide

This document does not provide guidance on building an OpenStack environment. It also does not
provide detailed instructions for installing, orchestrating or commissioning Metaswitch products; this
information is covered in the relevant product documentation referenced above.

It assumes that you are familiar with virtualization concepts in general and OpenStack specifically,
and with the Metaswitch product set. This document is intended for network engineers, with a working
knowledge of OpenStack and its various components. If you do not have this background knowledge,
we recommend that you consult existing OpenStack documentation to learn about OpenStack
concepts and features before reading this document.

This document is only suitable for planning deployments of Metaswitch products on OpenStack with
the KVM hypervisor. For deployments on VMware with the ESXi hypervisor, please see the Metaswitch
Products VMware Deployment Design Guide.

If you are interested in deployments on other virtual infrastructures, such as bare KVM, or VMware
Integrated OpenStack (VIO), please speak to your Metaswitch Support representative for details of
what is or isn't supported and relevant planning guidance.

1.2 Relevant product versions


This document applies to the product versions as set out in the table below.

Attention:

This manual does not apply to the Radisys MRF, a third-party component used in the Metaswitch
VoLTE Solutions. Information on the resource specifications for these VMs can be found in the
documentation for the Metaswitch VoLTE solution in which you are deploying them. For guidance
on the virtual infrastructure requirements for the Radisys MRF, please consult your Support
representative.

Note:

From the V9.5.30 release, the Accession Messaging Service is renamed as the Advanced
Messaging Service (AMS).

From the V2.31 release, the Accession Communicator for Desktop and Mobile clients are renamed
as MaX UC Desktop and MaX UC Mobile, and are collectively known as the MaX UC Clients.


Product | Version
BGCF VM | V11.4.07+
Clearwater Core | V10+
Distributed Admission Manager (DAM) | V2.0+
Deployment Configuration Store (DCS) | V1.0+
Distributed Capacity Manager (DCM) | V3.1+
MetaSphere CFS (including RPAS, OBS, MRS); Metaswitch AGC / MGC; MetaSphere EAS (including EAS pool server system from V9.2.10 and virtual EAS DSS from V9.5.20); MetaView Server; MetaView Director; ESA Proxy; Advanced Messaging Service (from V9.4) | V9.3.20+
MetaSphere N-Series | V3.9+
Metaswitch CCF | V5.0+
Metaswitch Deployment Manager | V1.0+
MetaView Statistics Engine (MVSE) | V3.0+
Mobile Voice Mail (MVM) | V2.15.0+
Perimeta (ISC, SSC, MSC) | V4.2+
QCall | V1.0.0+
Rhino VoLTE TAS | V2.6.0+
Rhino nodes (Mobile Control Point) | Rhino MCP nodes - V1.0+; Rhino TSN and REM nodes - V4.0+
Rhino nodes (MaX UC) | Rhino MAX nodes - V9.6.00+; Rhino MAG nodes - V3.0+
Secure Distribution Engine (SDE) | V1.0+
Service Assurance Server | V9.3.20+
ServiceIQ Management Platform (SIMPL) | V6.3.8+
ServiceIQ Monitoring (SIMon) | V7.0+
Storage Cluster | V1.0.0+

A number of different VMs are built on the Rhino platform and share certain infrastructural properties.
These nodes form the basis of the Rhino VoLTE TAS and the Mobile Control Point (MCP) and
are also used in the MaX UC solution. In this document, the node type is specified only where
requirements differ between types of Rhino node; for requirements common to all VMs built on Rhino,
the umbrella term "Rhino nodes" is used.

The following Rhino node types are covered in this document:

• MMT
• SMO
• MAG
• MAX
• TSN
• MCP
• REM.

This table indicates only the versions of a given product for which the guidance in this document
is valid. It does not provide information about version compatibility between different Metaswitch
products. Please see individual product guidance or speak to your Metaswitch Support representative
for details.

Attention:

Some information in this document refers to specific versions of some products. Where this is the
case, it is clearly indicated in the text. Please check which versions of Metaswitch products you are
using.


1.3 OpenStack releases


Metaswitch virtual products support OpenStack with the KVM hypervisor. This section lists the
OpenStack releases supported by each Metaswitch product.

Support and testing policy

Metaswitch products are currently supported from OpenStack's Newton release onward. The support
matrix for Metaswitch product versions and OpenStack releases defines three levels of support:

• Tested (Te): The specified product version has been thoroughly tested running on the specified
OpenStack release by our product teams and is guaranteed to work and to meet our stated
capacity and performance benchmarks.
• Supported (Su): You may run the specified product version on the specified OpenStack release;
however, this product version/OpenStack release combination has not undergone extensive
testing in our labs. You must therefore test all aspects of your deployment in your own lab before
deploying this product version/OpenStack release combination.
• Not supported (No): We do not support the specified product version/OpenStack release
combination, and make no guarantees whatsoever that the product will work as intended or that
we will be able to assist with any problems you may encounter when running this combination.

We withdraw support for an OpenStack release when it goes out of support with Red Hat (however,
we may declare support for a release before it is adopted by Red Hat). For details of Red
Hat's support schedule, see Red Hat OpenStack Platform Lifecycle. Mappings between named
OpenStack releases and Red Hat OpenStack Platform release numbers, where applicable, are
listed below.

Attention:

As part of our support agreement, we require you to adhere to the following conditions when
deploying Metaswitch products on OpenStack:

• You agree to source your OpenStack infrastructure from a reputable vendor and ensure that
the release you have deployed is still in support by that vendor (vendors typically provide an
extended support period of up to five years for selected OpenStack releases).
• When deploying a product version/OpenStack release combination that is Supported but not
Tested, you agree to perform extensive lab tests on your deployment before making it live.
• Metaswitch will make every effort to help you solve any unexpected problems that may arise
when deploying on an OpenStack release that has not yet been specifically tested with our
products. However, you understand that such fixes and workarounds will take time to develop,
and on rare occasions - for example, if an OpenStack release contains a bug that breaks
compatibility with our products in a fundamental way - it may not be possible to work around the
problem and you may need to consider deploying on a different OpenStack release.

If you intend to deploy existing versions of the Metaswitch products on OpenStack releases that are
not explicitly identified as supported, please discuss this with your Support representative.


Supported releases

Metaswitch products support the following OpenStack releases, as detailed in the table below:

• N: Newton (equivalent to Red Hat OpenStack Platform release 10)


• O: Ocata (Red Hat 11)
• P: Pike (Red Hat 12)
• Q: Queens (Red Hat 13)
• R: Rocky (Red Hat 14)
• S: Stein (Red Hat 15)
• T: Train (Red Hat 16)
• U: Ussuri
• V: Victoria
• W: Wallaby

Table 1: Supported OpenStack releases by Metaswitch product

Product | Version(s) | N O P Q R S T U V W
Perimeta | V4.2-V4.2.20 | Te No No No No No No No No No
Perimeta | V4.2.40-V4.6.20 | Te Te No No No No No No No No
Perimeta | V4.6.40-V4.8.20 | Te Te Te Te No No No No No No
Perimeta | V4.8.25+ | Te Te Te Te Te Su Su Su Su Su
Clearwater Core | V11.1, V11.2, V11.2.01 | Te Te No No No No No No No No
Clearwater Core | V11.3, V11.3.01, V11.4, V11.4.02 | Te Te Te Te Su Su Su Su Su Su
Clearwater Core | V11.5+ | No No Te Te Su Su Te Su Su Su
DCM | V3.1-V3.2 | Te No No No No No No No No No
DCM | V3.3 | Te Te No No No No No No No No
DCM | V3.4 | Te Te Te Te Su Su Su Su Su Su
DCM | V4.0 | Su Su Su Su Te Te Te Su Su Su
Service Assurance Server | V9.3.20 | Te No No No No No No No No No
Service Assurance Server | V9.4-V10 | Te Te No No No No No No No No
Service Assurance Server | V11-V12 | Te Te Te Te No No No No No No
Service Assurance Server | V12.10+ | Te Te Te Te Te Te Te Su Su Su
CFS, AGC, MGC, EAS, MVS, MVD, RPAS, OBS, ESAP | V9.3.20 | Te No No No No No No No No No
CFS, AGC, MGC, EAS, MVS, MVD, RPAS, OBS, ESAP | V9.4-V9.4.30 | Te Te No No No No No No No No
CFS, AGC, MGC, EAS, MVS, MVD, RPAS, OBS, ESAP | V9.5-V9.5.30 | Te Te Te Te No No No No No No
CFS, AGC, MGC, EAS, MVS, MVD, RPAS, OBS, ESAP | V9.5.40-V9.6.10 | Te Te Te Te Te No No No No No
CFS, AGC, MGC, EAS, MVS, MVD, RPAS, OBS, ESAP | V9.6.20+ | Te Te Te Te Te Te Te Su Su Su
Advanced Messaging Service (AMS) | V9.4-V9.5.30 | Te Te Te Te No No No No No No
Advanced Messaging Service (AMS) | V9.5.40-V9.6.10 | Te Te Te Te Te No No No No No
Advanced Messaging Service (AMS) | V9.6.20+ | Te Te Te Te Te Te Te Su Su Su
Metaswitch Deployment Manager | V1.0 | Te Te Te Su Su Su Su Su Su Su
MetaView Statistics Engine | V3.0-V3.2 | Te No No No No No No No No No
Metaswitch CCF | V5.0 | Te Te No No No No No No No No
Metaswitch CCF | V6.0-V8.0 | Te Te Su Su Su Su Su Su Su Su
Metaswitch CCF | V9.0 | No No Su Su Te Su Su Su Su Su
MVM | V2.15.0 | Te Te Su Su Su Su Su Su Su Su
QCall | V1.0+ | Te Te Su Su Su Su Su Su Su Su
ServiceIQ Management Platform (SIMPL) | V6.3.8+ | Te Te Te Te Te Te Te Su Su Su
ServiceIQ Monitoring (SIMon) | V1.0+ | Te Te Te Te Te Su Su Su Su Su
Rhino VoLTE TAS nodes (MMT, SMO, MAG, TSN), standalone TSN and REM nodes | V2.6.0-V4.0 | Te Su Su Su Su Su Su Su Su Su
Rhino MCP node | V1.0 | Su Su Su Su Su Su Su Su Su Su
Rhino nodes (MaX UC) | Rhino MAX nodes - V4.0.0; Rhino MAG nodes - V3.0.0 | Te Su Su Su Su Su Su Su Su Su
Group Application Server | V3.0-V3.1.1 | Te Te Su Su Su Su Su Su Su Su
Secure Distribution Engine (SDE) | V1.0+ | No Te Su Su Su Su Su Su Su Su
Deployment Configuration Store (DCS) | V1.0+ | No Te Su Su Su Su Su Su Su Su
Storage Cluster | V1.0 | Te Te No No No No No No No No
Storage Cluster | V2.0+ | Te Te Te Te Te Te Te Te Te Te
BGCF VM | V11.4.07+ | No No Te Te Te Te Su Su Su Su
Distributed Admission Manager (DAM) | V2.0+ | No No No Te Te Su Su Su Su Su
1.4 Terminology
To avoid confusion, we define the following terminology for use in this document.

• "OpenStack cloud" refers to a single self-contained instance of OpenStack running in a single site.
An OpenStack cloud consisting of multiple availability zones counts as a single OpenStack cloud.
• "OpenStack deployment" refers to your whole deployment, which may comprise multiple
OpenStack clouds, either in separate sites for geographic / site redundancy, or within a single
site to provide resilience against a single cloud failing (see discussion in High availability on page
14).


2 Requirements on OpenStack
The complete set of Metaswitch products' requirements on OpenStack is defined in the Virtual
Infrastructure Requirements Guide. That Guide includes specific OpenStack version support, plus the
following set of optional features and configuration which must be supported:

• RAM and CPU configured not to be contended


• direct access to external networks without intervening NAT
• support for the allowed-address-pairs Neutron API extension
• host aggregates, availability zones or regions in order to achieve anti-affinity for pools.
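The anti-affinity requirement for pools can be illustrated with a short sketch (illustrative only; the host and VM names are hypothetical, and the products enforce this via OpenStack scheduling rather than application code): a proposed placement of pool VMs onto compute hosts satisfies strict anti-affinity when no host carries more than one member of the pool.

```python
from collections import Counter

def check_anti_affinity(placement):
    """Given a mapping of pool-member VM name -> compute host, return
    the sorted list of hosts that carry more than one pool member
    (i.e. that violate a strict anti-affinity policy)."""
    counts = Counter(placement.values())
    return sorted(host for host, n in counts.items() if n > 1)

# A pool of four members over three hosts: host-b carries two members,
# so a single host failure there would take out two instances at once.
plan = {"pool-1": "host-a", "pool-2": "host-b",
        "pool-3": "host-b", "pool-4": "host-c"}
print(check_anti_affinity(plan))   # -> ['host-b']
```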

Note:

If you require optimal Perimeta or Secure Distribution Engine (SDE) data plane performance,
you must provide SR-IOV, PCI passthrough or a fast vSwitch in addition to the environment
requirements specified in the Virtual Infrastructure Requirements Guide.

The Virtual Infrastructure Requirements Guide also references a set of OpenStack features that
should not be used or are restricted to certain products only:

• pause, suspend and resume


• rebuild
• resize
• snapshots
• migrate
• evacuate.

See the Virtual Infrastructure Requirements Guide for full details. This Guide assumes you have read
and are familiar with the Virtual Infrastructure Requirements Guide and have checked your OpenStack
environment meets its requirements.

In addition to the above requirements, when deploying Metaswitch products into OpenStack
environments, there is some specific configuration that must be applied to images when they are
uploaded and VMs when they are created. This just makes use of standard OpenStack features.
None of this information is required for designing your deployment, but is required at deployment
time. The configuration is described fully in the deployment manuals for your chosen product(s), as
described in About this document on page 5.

2.1 High availability


The level of redundancy required for a deployment depends on the level of service continuity that is
required, and the level of support for high availability in the underlying infrastructure. The decisions
you must make are summarized in the following diagram.


Figure 1: Redundancy considerations

• If service preservation is not required in the presence of hardware or software failures or upgrades,
then a non-highly available deployment can be used.
• If the underlying cloud infrastructure is sufficiently reliable for the level of service required, a single
cloud deployment with Metaswitch products spread redundantly across the hosts in that cloud will
be sufficient. For example, if you require a carrier-grade five-nines telecommunications service
and intend to deploy only a single cloud, then that cloud itself is likely to need six nines of
availability, so that when combined with the availability of the other aspects of the offering, the
overall service exceeds five nines.
• Bear in mind that certain limitations in OpenStack (such as an OpenStack upgrade potentially
resulting in an outage) mean you may struggle to host a five-nines service within a single cloud.
The Metaswitch whitepaper "Telco-grade" in the cloud: deploying reliable VoIP in NFV environments
describes the challenges associated with providing highly available cloud environments. One way
to overcome these challenges is to deploy multiple OpenStack clouds for redundancy, spread VM
instances of Metaswitch products across those clouds, and use the products' application-level
redundancy mechanisms to provide high availability even in the event of total failure of an entire
cloud. Redundant cloud deployments on page 16 provides more detail on redundant cloud
deployments.
• In order to implement application-level redundancy mechanisms in either single cloud or redundant
cloud deployments, some Metaswitch products require the ability to perform virtual IP address
failover using gratuitous ARP (or IPv6 equivalent). Virtual IP addresses are additional IP
addresses that are shared between redundant pairs of VM instances. When failover occurs, the
new primary instance takes control of the virtual IP address by broadcasting a gratuitous ARP


for the virtual IP address, causing peer devices to start sending their traffic to the new primary
instance.

If deploying multiple clouds, this requires layer 2 connectivity between the clouds and the ability
for VM instances within the clouds to be placed on a common subnet and share IP addresses.
The network infrastructure and OpenStack must both be configured to allow shared IP addresses
and gratuitous ARPs. Details on how to configure OpenStack to allow shared IP addresses are
provided in Virtual IP address failover on page 21.
• Whether you have a single cloud or redundant cloud deployment, you should consider the
required redundancy within a given cloud in your overall availability calculations. This may include
considerations such as resilience to compute node failure (Resilience to compute node failure on
page 20) and redundant physical networking (Redundant physical networking on page 20).
• If you require geographic redundancy for your deployment (the ability to continue to provide
service in the event of a site failure, for example due to natural disaster or complete power
outage), the GR mechanisms built into Metaswitch products can provide this capability. To
function in an OpenStack environment, the GR mechanisms simply require the provision of cloud
infrastructure in multiple sites. There are various ways of achieving this, including separate clouds
in each site, in addition to potentially having multiple clouds in any given site for cloud-redundancy
purposes.
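The availability arithmetic behind these decisions can be sketched as follows (an illustration only; the figures are examples, not product benchmarks). Components that must all be up multiply their availabilities together, while redundant alternatives multiply their unavailabilities:

```python
def series(*avail):
    """Availability of components that must all be up: multiply."""
    p = 1.0
    for a in avail:
        p *= a
    return p

def parallel(*avail):
    """Availability of independent redundant components where any
    one suffices: one minus the product of the unavailabilities."""
    q = 1.0
    for a in avail:
        q *= (1.0 - a)
    return 1.0 - q

# A single four-nines cloud under a 99.999% application layer falls
# short of five nines end to end ...
single = series(0.9999, 0.99999)
# ... but two such clouds in parallel, carrying redundant VM
# instances, make the infrastructure contribution negligible.
redundant = series(parallel(0.9999, 0.9999), 0.99999)

print(f"{single:.6f}")     # -> 0.999890 (short of five nines)
print(f"{redundant:.6f}")  # -> 0.999990 (five nines achievable)
```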

If you require cloud level redundancy and it is not possible to configure redundant cloud deployments,
or it is not possible to use virtual IP addresses with gratuitous ARP based failover, you must use other
redundancy schemes.

• If you are deploying Perimeta in multiple clouds and shared layer 2 connectivity is not possible, it
may be possible to use a NAT-based high availability deployment model to provide redundancy.
If you are interested in using this deployment model, you must contact your Metaswitch Support
representative.
• For all other products, you must discuss your options for redundancy schemes with your
Metaswitch Support representative.
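To illustrate the gratuitous ARP mechanism referred to above (the products send these announcements themselves; this sketch merely shows what such a frame contains): a gratuitous ARP is an ARP request broadcast on the local segment with the sender and target protocol addresses both set to the virtual IP, which causes peer devices to update their ARP caches with the new primary instance's MAC address.

```python
import struct

def gratuitous_arp(mac, ip):
    """Build a gratuitous ARP request frame: broadcast destination,
    with sender and target protocol address both set to the IP that
    is moving (here, the virtual IP being claimed)."""
    mac_b = bytes.fromhex(mac.replace(":", ""))
    ip_b = bytes(int(octet) for octet in ip.split("."))
    eth = b"\xff" * 6 + mac_b + b"\x08\x06"          # dst, src, EtherType=ARP
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)  # Ethernet/IPv4, opcode 1 (request)
    arp += mac_b + ip_b                              # sender hardware + protocol address
    arp += b"\x00" * 6 + ip_b                        # target MAC unknown, target IP = sender IP
    return eth + arp

frame = gratuitous_arp("fa:16:3e:12:34:56", "10.0.0.100")
# Sender IP and target IP are identical - the defining GARP property.
print(frame[28:32] == frame[38:42])   # -> True
```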

2.1.1 Redundant cloud deployments


This section applies if you have determined above that you will deploy multiple clouds in a given site.

Attention:

A multi-cloud deployment is not a substitute for a proper disaster recovery plan, for example a
deployment that uses geographic redundancy or off-site product backups.

Products with 1+1 redundancy are deployed with the primary instance in one OpenStack cloud,
and the backup instance in another OpenStack cloud. Products with N+K redundancy are deployed
with VM instances spread across the cloud deployments, and with no more than K instances in any
one cloud. For a configuration of two clouds, this will mean an N+N distribution, where either cloud
deployment has sufficient capacity to handle load in the event of failure or maintenance of the other
cloud deployment.
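The N+K placement rule above can be expressed as a short check (illustrative only; not part of any product): a balanced spread of the N+K instances is valid only if no cloud carries more than K of them, so that losing an entire cloud still leaves at least N instances in service.

```python
def per_cloud(n, k, clouds):
    """Balanced spread of the N+K instances of a pooled product over
    a number of clouds. Valid only if no cloud carries more than K
    instances, so that losing a whole cloud still leaves N."""
    total = n + k
    base, extra = divmod(total, clouds)
    counts = [base + 1] * extra + [base] * (clouds - extra)
    if max(counts) > k:
        raise ValueError("no more than K instances may share one cloud")
    return counts

# 3+3 over two clouds gives the N+N distribution described above:
# either cloud alone can carry the full load.
print(per_cloud(3, 3, 2))   # -> [3, 3]
# 4+2 over two clouds would place 3 > K=2 in a cloud, so it needs a
# third cloud: per_cloud(4, 2, 3) -> [2, 2, 2]
```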


The clouds are then connected together and to the wider network. For products with 1+1 HA
redundancy the two VM instances need to be placed on a single layer 2 network / subnet.

Each instance capable of sending SAS logs is configured to log to a SAS instance in its own
cloud. The collection of SAS servers spanning the multi-cloud deployment can be joined together
in a SAS federation. For more information on SAS in multi-cloud deployments, see https://
communities.metaswitch.com/docs/DOC-205433.

The first diagram below shows how the networking might appear in a single cloud, and the second
shows how it might appear in a pair of redundant clouds located in the same data center. However,
these are just suggested implementations.

Figure 2: Example routing scheme for a single cloud


Figure 3: Example routing scheme for cloud redundancy

One way to maintain separation of networks is to use VLAN tags, other overlays, or physical
separation for each network in the cloud pair. Public networks are connected to public VLANs, which
are configured on the switches and gateways with rules to allow access to the external network.
For example, you could allocate two VLANs for Metaswitch equipment and create two provider
networks in each cloud mapping onto these VLANs. This is illustrated in the following logical
network diagram.

Figure 4: Logical separation of public and private VLANs with cloud redundancy

Since they extend across the cloud pair, these networks must be set up as provider networks in each
cloud. There must be L2 connectivity between the clouds, as the VM instances must be on the same
VLAN and IP subnet to share their virtual IP addresses.

Note that products that consist of a mixture of 1+1 VMs and quorate pool VMs, such as the Secure
Distribution Engine, need some care if you deploy across redundant clouds. In particular, an SDE
deployment is likely to need 3 redundant cloud instances where the DCS VMs are distributed across
all three cloud instances and the SDE VMs are distributed over just two of them.


Geo-redundancy with multi-cloud deployments

Several Metaswitch products support geo-redundant deployments, where traffic is split over two or
more sites to provide protection for when one site fails completely.

One way to deploy geo-redundant cloud deployments is to have two clouds deployed at each of
two or more sites to provide protection for each other in the case of a catastrophic site failure. The
Metaswitch products installed in these sites are configured to be geo-redundant in the usual way for
each product.

Note:

This is not the only way to achieve geo-redundancy. As discussed above, if multi-cloud is not
required for cloud-redundancy reasons, geo-redundancy can be achieved within a single cloud
in each site. This section is concerned only with achieving geo-redundancy in a multi-cloud
deployment.

However, if you are concerned about cloud availability, geo-redundancy is not a substitute for
redundancy within each site. The redundancy mechanisms used to cope with a total site failure
may have service impacts that are acceptable only because site failures are extremely rare (caused,
for example, by natural disasters or by multiple coincident system failures within a site). Those
same impacts may be unacceptable for the potentially more frequent individual failures within a
site, so those redundancy mechanisms may not be appropriate for local failures.

This is illustrated in the diagram below. It depicts a network consisting of two physical locations
(London and Cambridge), both containing two OpenStack clouds. The products deployed in each site
are spread over the two OpenStack clouds within that site.

• Products that have 1+1 redundancy are deployed with a given pair having the active instance in
one OpenStack cloud and the backup instance in the other, as shown for CFS (for which two pairs
are shown, each split like this).
• Products with N+K redundancy are deployed across both OpenStack clouds, as shown for RPAS.
Since each OpenStack cloud must be able to handle peak load if the other deployment in that site
fails, this becomes N+N (where N is the number of VMs required to handle peak load in the event
of a total failure of one site).
• DCM must be deployed as 1+1 per site, and thus has one instance in one cloud and one in the
other (not shown in the diagram below).

Together, the products form part of a geo-redundant deployment across both sites. If one of the
London OpenStack clouds fails, its traffic will be handled by the VM instances in the other London
cloud, with minimal service impact (e.g. calls are maintained). If both London OpenStack clouds fail,
or if the London site fails entirely because of a catastrophic event, the Cambridge site will take over as
usual for a geo-redundant deployment (e.g. see the Clustered CFS Deployment Planning Guide for
details).


Figure 5: Example logical diagram for geo-redundancy with cloud redundancy

2.1.2 Resilience to compute node failure


Metaswitch products have built-in resilience to the failure of individual product VM instances, either
through product features or through the manual or automatic/orchestrated restart of instances on
failure (assuming they are booted from redundant shared storage). See Anti-affinity and scheduling on
page 21 for general principles and Product topologies on page 25 for product specifics.

However, you also need to consider how your deployment will handle failure of an individual compute
node without losing or reducing the availability or redundancy of the overall product offering. That is to
say, you need a sensible sparing strategy; in a cloud, your spares can be up and running as extra
compute nodes from the outset.

2.1.3 Redundant physical networking


You need to consider how your infrastructure will handle failure of a physical network interface,
without losing or reducing the availability or redundancy of the overall product offering.


This is largely standard best practice. The one specific requirement is that Metaswitch products
deployed as 1+1 HA pairs must have two alternative end-to-end diverse network paths between
the two VMs in order to avoid split brain. Speak to your Metaswitch Support representative if your
intended OpenStack deployment will not provide this.

2.1.4 Virtual IP address failover


Products that use virtual IP addresses to implement 1+1 redundancy must be able to claim the virtual
IP address from either VM instance (which may be in different clouds if deploying multiple clouds)
using a gratuitous ARP broadcast (or IPv6 equivalent), and this broadcast needs to reach all hosts
and routers on the same subnet. For VM instances using SR-IOV, there is no additional step required
to enable this.

For other instances, you must ensure that the "allowed_address_pairs" feature is enabled in
OpenStack, and that the compute nodes that will host these instances are granted permission to use
the feature. You must configure the Neutron ports for each instance with the virtual IP address of the
instance pair.

You must also ensure that virtual IP addresses are not allocated to instances by OpenStack as fixed
IP addresses. The easiest way to do this is to allocate virtual IP addresses from a range of addresses
that is not part of any allocation pool. If you want to use addresses from within an allocation pool, you
must create a placeholder port to reserve the address.
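The steps above might look like the following with the OpenStack CLI. This is a sketch only: the port names, network and subnet names, and the virtual IP address are hypothetical.

```shell
# Hypothetical names and addresses, for illustration only.
# Permit the shared virtual IP on the Neutron port of each VM in the pair.
openstack port set --allowed-address ip-address=10.0.1.100 port-vm-a
openstack port set --allowed-address ip-address=10.0.1.100 port-vm-b

# If the virtual IP falls inside an allocation pool, reserve it with a
# placeholder port so Neutron cannot hand it out as a fixed IP. The port is
# left administratively down; it exists only to hold the address.
openstack port create --network signaling-net \
    --fixed-ip subnet=signaling-subnet,ip-address=10.0.1.100 \
    --disable vip-placeholder
```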

2.1.5 Anti-affinity and scheduling


Some Metaswitch products have application-level redundancy, achieved by use of a 1+1 hot standby
pair or a pooled architecture (see Product topologies on page 25).

Such products must have their VMs distributed across distinct physical host resources in such a way
that service is not lost or reduced by failure of a single physical host resource. This is called anti-
affinity. An example of physical resources that may need to be distinct is compute node host servers,
but the considerations may be broader than that.

• VM instances for products that run as a 1+1 HA pair of VMs (e.g. CFS, Perimeta in HA mode,
AGC/MGC, MVD, SDE VMs in an SDE deployment) must be configured to use distinct host
resources.
• VM instances for ESA Proxy (which uses two VMs operating as a pool) must be configured to use
distinct host resources.
• VM instances for products that run as a non-quorate pool of VMs (e.g. MRS, MVM's EVN and
SMPP pools, DCM, OBS, RPAS, AMS, Clearwater Core, Metaswitch CCF, and Rhino nodes) must
be configured to use resources spread across at least two distinct physical resources.
• VM instances for products that run as a quorate pool of VMs (e.g. MVM's OAM VM pool, SDE's
DCS, and Storage Cluster) must be configured to use resources spread across at least three
distinct physical resources.

The total number of VMs required is N+K, where N is the minimum number of VMs required to meet
peak load, and K is the maximum number of VMs in the pool on any given physical host. The value of


K usually emerges only as part of the process of distributing VMs across distinct resources, so arriving
at the total number of VMs of such a product can be an iterative process.
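As a sketch with made-up numbers: if N = 6 VMs are needed for peak load and the eventual placement puts at most 2 pool VMs on any one host, then K = 2 and the pool needs 8 VMs in total; you then re-check the placement of those 8 VMs and iterate if K has grown.

```shell
# Made-up sizing figures, for illustration only.
N=6                 # VMs needed to carry peak load
K=2                 # worst case: pool VMs sharing a single host
TOTAL=$(( N + K ))  # losing one host must not reduce capacity below N
echo "deploy $TOTAL VMs in the pool"   # prints "deploy 8 VMs in the pool"
```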

Configuring VM instances to use distinct physical resources, or limiting the number of instances on
a particular host resource, can be implemented using server groups and/or OpenStack availability
zones/host aggregates. See https://blueprints.launchpad.net/nova/+spec/instance-group-api-extension
for further details.

For most deployments, where an appropriate OpenStack release is in use, server groups with
anti-affinity rules are the easiest (and therefore recommended) method of meeting the anti-affinity
requirements of Metaswitch products.

• For highly available VM pairs, server groups with hard anti-affinity rules should be used.
• For VMs that run in N+K pools, server groups with soft anti-affinity rules should be used.

When soft anti-affinity rules are applied to a server group, the Nova scheduler attempts to enforce
the anti-affinity rules but if the requisite host resources are not available will fall back to violating
them rather than failing to instantiate the VMs at all.
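For example, server groups with these policies can be created and applied at boot time with the OpenStack CLI. The group, image, and flavor names below are hypothetical, and the soft-anti-affinity policy is only available in newer OpenStack releases.

```shell
# Hypothetical group, image and flavor names, for illustration only.
# Hard anti-affinity for a 1+1 HA pair: scheduling fails rather than co-locate.
openstack server group create --policy anti-affinity perimeta-ha-group

# Soft anti-affinity for an N+K pool: the scheduler spreads instances where it
# can, but will co-locate rather than fail if host resources run short.
openstack server group create --policy soft-anti-affinity rpas-pool-group

# Place each instance into its group via a scheduler hint at boot time.
openstack server create --image rpas-image --flavor rpas-flavor \
    --hint group=<rpas-pool-group-uuid> rpas-1
```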

Limiting the number of VM instances per host is sufficient to provide continuous service in the event
of a failure of a single host resource. However, you should also consult the discussion on cloud level
redundancy in Redundant cloud deployments on page 16.

From OpenStack Pike onward, the Nova scheduler may also be configured to favor hosts which best
match the requested PCI devices for an instance, including any Virtual Function (VF) devices if using
SR-IOV. If you are using SR-IOV in your deployment but it is not supported by all hosts, you may wish
to use the Nova scheduler to ensure that any VMs that do not use SR-IOV are preferentially assigned
to hosts that do not support it, leaving those hosts that do support SR-IOV for VMs that require it.

Attention:

The Nova scheduler will weigh a number of factors when deciding on the host to which it will
allocate a given VM. These factors include any soft affinity rules you have set, along with CPU,
RAM, storage resources and PCI devices available on a given host. The factors the Nova
scheduler considers are controlled by the scheduler_weight_classes setting in
/etc/nova/nova.conf; this setting defaults to considering all factors.

Each one of these factors can be configured with a multiplier to change the weight they have in the
overall calculation; these multipliers default to 1, so each factor holds equal weight.

If you intend to use soft anti-affinity in this way, you must configure it with enough weight to
override any potentially conflicting factors that could otherwise cause a host to be chosen for a new
VM that is not the optimal choice for the soft anti-affinity rule. (Note, however, that this does not
override host filtering: a high soft affinity weighting will mean, for example, that the scheduler will
prefer soft affinity settings over choosing the host with the most RAM, but it will always choose a
host with sufficient RAM, even if this violates anti-affinity rules.)

To ensure that anti-affinity rules take precedence over other factors, you must set the
soft_anti_affinity_weight_multiplier value in the filter_scheduler section of
/etc/nova/nova.conf (in the DEFAULT section prior to OpenStack Ocata) to a large enough


positive value that soft anti-affinity outweighs the combined weight of all other factors configured in
scheduler_weight_classes. We recommend setting the value to 1000.

From OpenStack Pike onward, to ensure that PCI device rules take precedence over other factors,
you must set the pci_weight_multiplier value in the filter_scheduler section of
/etc/nova/nova.conf to a large enough positive value that PCI device affinity outweighs the
combined weight of all other factors configured in scheduler_weight_classes besides soft
anti-affinity. We recommend setting the value to 500.

See https://docs.openstack.org/draft/config-reference/compute/scheduler.html#weights for more
information on this.
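Putting the two recommendations above together, a minimal nova.conf fragment might look as follows (this is a sketch for OpenStack Ocata or later, where these options live in the filter_scheduler section):

```ini
# /etc/nova/nova.conf on the controller nodes running nova-scheduler.
[filter_scheduler]
# Soft anti-affinity must outweigh all other configured weighers combined.
soft_anti_affinity_weight_multiplier = 1000
# From Pike onward: PCI device affinity outweighs everything except
# soft anti-affinity.
pci_weight_multiplier = 500
```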

2.1.6 Guest watchdog


To detect and recover from VM failures, you must configure the guest watchdog on each VM instance
to reset the instance if it fails. This applies to all products, with the following exceptions.

• DCM does not support guest watchdogs.
• Clearwater Core only supports guest watchdogs in V11.0 and above.
• Secure Distribution Engine (SDE) only supports guest watchdogs in V1.5 and above.

This can be enabled by setting the hw_watchdog_action property on the image used to boot the
instances or on the flavor you are going to use.

If the virtual hardware watchdog is configured, Metaswitch products will use it. If the product software
or guest OS hangs, the virtual hardware watchdog will reset the instance.

If the virtual hardware watchdog is not configured, Metaswitch products will instead configure a
software watchdog in the guest Linux kernel and use that. If the product software hangs, or if the
guest OS hangs in such a way that the kernel software watchdog is still running, the software
watchdog will reboot the instance.
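For example, the property can be set with the OpenStack CLI in either of the following ways. The image and flavor names are hypothetical; the "reset" action matches the reset-on-failure behavior described above.

```shell
# Hypothetical image and flavor names, for illustration only.
# Enable the virtual hardware watchdog via an image property...
openstack image set --property hw_watchdog_action=reset cfs-image

# ...or via the equivalent extra spec on the flavor.
openstack flavor set --property hw:watchdog_action=reset cfs-flavor
```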


3 Planning your OpenStack deployment


This section guides you through planning your virtualized deployment of Metaswitch products on
OpenStack.

How to use this section


This section assumes that you have, or can build, an OpenStack environment meeting the
requirements set out in Requirements on OpenStack on page 14 and in the Metaswitch Products
Virtual Infrastructure Requirements Guide.

It contains the following subsections.

• Product topologies on page 25 sets out the valid topologies for the various products, reflecting
how they deliver scaling and local redundancy. This section also defines the minimal requirements
for a lab deployment.
• Per-product VM requirements on page 32 defines the supported specifications of individual
VMs for the different products, defining their required resources and expected capacity and
performance (though see the caveats below concerning these figures).
• Network configuration on page 67 sets out networking requirements, specifically the virtual
NICs required for the different VM types. It also contains detailed guidance on use of network
acceleration techniques needed for some types of Perimeta deployment.

You can use this information as follows.

• Based on the capacity you need, the services you want to offer, and the number of sites in your
deployment, you must calculate the number of VM instances of each product that will be actively
processing load at any given time (that is, the number of VMs you require to meet your load
requirements only, without factoring in local redundancy requirements). Depending on the product,
this will be the number of standalone instances required, the number of HA pairs required, or the
value of "N" in an N+K pool. Some headline figures for expected capacity of the products are given
in VM resource specification tables on page 35, but you should consult Metaswitch to check
any assumptions you are making, especially regarding profiles of use. This is particularly the case
for EAS. You will also need to refer to this section to work out what capacity you should expect
on your particular hardware, which may differ from the reference hardware used to generate the
capacity numbers in VM resource specification tables on page 35.

Note:

In calculating the capacity required in a given site, you must consider any effects from geo-
redundancy in your overall network. For example, in a 2-site GR clustered CFS network, each site
must be able to service the load for all subscribers.


• The above step gives you the number of VM instances of various VM types required for load
only. You will then need to refer to Product topologies on page 25 to work out the number of
additional instances required to achieve redundancy given each product's topology.
• You must also take into account the requirement for five nines availability; if you will achieve
that using multi-cloud redundancy (as described in High availability on page 14), consider how
your instances are distributed across the multiple clouds. This distribution results in a number of
instances of each product for each cloud in your deployment.

As well as capacity numbers, VM resource specification tables on page 35 specifies resource
needs for computing power (virtual CPU cores) and RAM. You can use that together with the
numbers of VM instances determined above to work out total CPU / RAM / storage demands on
your infrastructure. In the case of a fresh deployment this gives an idea of the number of servers and
other resources to be purchased and deployed. You should also make sure OpenStack will be able
to distribute the VM instances across hosts while also respecting the rules explained in the Virtual
Infrastructure Requirements Guide, in particular concerning anti-affinity, NUMA placement, etc.
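The aggregation step can be as simple as the following sketch. The VM counts and per-VM resource figures here are invented; real values must come from the VM resource specification tables.

```shell
# Invented per-product figures, for illustration only.
cfs_pairs=2; cfs_vms=$(( cfs_pairs * 2 ))     # each 1+1 pair is two VMs
cfs_vcpu=8;  cfs_ram=16                       # per-VM vCPUs and RAM (GiB)
rpas_vms=6;  rpas_vcpu=4; rpas_ram=8          # N+K pool total and per-VM specs

total_vcpu=$(( cfs_vms * cfs_vcpu + rpas_vms * rpas_vcpu ))
total_ram=$((  cfs_vms * cfs_ram  + rpas_vms * rpas_ram ))
echo "total demand: $total_vcpu vCPUs, $total_ram GiB RAM"
```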

3.1 Product topologies


This table defines the redundancy topologies supported by the various products covered by this
document, in production and in laboratory systems. Explanatory notes for some products follow the
table.

Product | Name | Minimal production topology | Minimal lab topology
------- | ---- | --------------------------- | --------------------
CFS | MetaSphere Call Feature Server | 1+1 HA pair of VMs | 1+1
AGC / MGC | Access and / or Media Gateway Controller | 1+1 HA pair of VMs | 1+1
RPAS | Representative Application Server | N+K pool of VMs | 1
OBS | Object Backup Store | N+K pool of VMs | 1
MRS | Media Resource Server | N+K pool of VMs | 1
MVD | MetaView Director | 1+1 HA pair of VMs | 1+1
ESAP | Emergency Standalone Proxy | 1+1 pool of VMs per remote site | 1
EAS | MetaSphere Enhanced Application Server (single instance) | Single VM booted from shared storage for redundancy | 1
EAS | MetaSphere Enhanced Application Server (virtual DSS) | Single active VM with an equivalent standby VM for redundancy | 1+1
EAS | MetaSphere Enhanced Application Server (pool server system) | N+K pool of VMs with a data store and load balancing infrastructure (see notes) | 2
AMS | Advanced Messaging Service (AMS) | N+K pool of VMs | 1
N-Series | MetaSphere N-series | Single VM with shared storage for redundancy, plus optional N+K pool of node servers for (Basic) MoH only (see notes) | 1
Perimeta | Perimeta ISC | 1+1 HA pair of VMs as standard, single VM as option | 1
Perimeta | Perimeta SSC | 1+1 HA pair of VMs as standard, single VM as option | 1
Perimeta | Perimeta MSC | 1+1 HA pair of VMs as standard, single VM as option | 1
Clearwater Core | SPN | N+K pool of VMs | 1
Clearwater Core | DGN | N+K pool of VMs | 1
Clearwater Core | SCN | N+1 pool of VMs, where N+1 is at least 3 | 1
Clearwater Core | OAN | N+K pool of VMs | 1
BGCF VM | BGCF VM | N+K pool of VMs, with a DCS pool | 1, with a DCS pool
Metaswitch CCF | CSN | N+1 pool of VMs, where N+1 is at least 3 | 1
Metaswitch CCF | CPN | N+K pool of VMs | 1
Metaswitch CCF | CFN | N+K pool of VMs | 1
MVS | MetaView Server | Single VM with shared storage for redundancy, or 1+1 pair of VMs | 1
SAS | Service Assurance Server | Single VMs in each cloud of a multi-cloud deployment, each booted from shared storage for local redundancy | 1
DCM | Distributed Capacity Manager | 1+1 pool of VMs per site | 1
MVSE | MetaView Statistics Engine | Single VM | 1
MDM | Metaswitch Deployment Manager | 2+1 quorate pool of VMs | 1
MVM | EVN | N+K pool of VMs | 1
MVM | SMPPp | N+K pool of VMs | 1
MVM | OAM | 3+2 quorate pool of VMs | 1
Rhino nodes | MMT, SMO, MAX, MAG and MCP | N+K pool of VMs, where N+K is at least 3 | 1
Rhino nodes | ShCM, REM | N+K pool of VMs | 1
Rhino nodes | TSN | N+1 pool of VMs, where N+1 is at least 3 | 1
Secure Distribution Engine (SDE) | SDE | 1+1 HA pair, with a DCS pool | 1+1, with a DCS pool
SIMon | ServiceIQ Monitoring | 1 | 1
SIMPL | ServiceIQ Management Platform | 1 | 1
Storage Cluster | Storage Cluster | N+K pool of VMs | 3
QCall | SVS | 1+1 pool | 1
QCall | PCC | 2+1 pool | 2+1 pool
QCall | CRS | 2+1 pool | 2+1 pool
DAM | Distributed Admission Manager | Single VM, with a DCS pool | Single VM, with a DCS pool
DCS | Deployment Configuration Store | 3-VM quorate pool | 3-VM quorate pool

Note the following. Where multi-cloud deployments are discussed, see Redundant cloud deployments
on page 16 for more details.

• CFS, AGC / MGC and MVD must always be deployed as one or more 1+1 HA pairs of VMs, even
in lab environments. They are not supported as a single active VM. In a multi-cloud deployment the
two VM instances are deployed in different clouds.
• RPAS, OBS, MRS, and DCM are all deployed as N+K pools of VMs. In labs they may all be
deployed as just a single instance if desired (although if the lab is to include redundancy testing,
at least a 1+1 pool is required). In production, DCM must be deployed as 1+1 per site. In a multi-
cloud deployment, the two DCM VM instances of each 1+1 pair would be deployed in different
clouds, and OBS, RPAS, and MRS would each be deployed with X instances in each cloud, where
X * (number of clouds - 1) >= N+K, so that the remaining clouds can still provide N+K instances
after the failure of any one cloud (for two clouds, this means N+N pools).
• ESAP is deployed as a 1+1 pool of VMs per remote protected site. In labs it may be deployed as
just a single instance if desired (although if the lab is to include redundancy testing, at least a 1+1
pool is required). In a multi-cloud deployment, one of the 1+1 instances would be deployed in each
cloud.
• EAS can be deployed in the following different ways.

• Single-instance EAS is deployed with only a single active VM instance running at any one
time. Redundancy (which is mandatory for live deployments) is achieved using manual or
automatic/orchestrated failover with shared redundant storage: in the event that the active VM
fails you can use a manual process or automatic orchestration to re-instantiate the VM on a
different host.

Note that this approach places an additional requirement on the overall resources needed
and the layout of VMs across hosts, as follows. For each such product VM, there must be on


another compute node a block of unallocated resources equal to the requirements of the VM,
so that you have room to re-instantiate the VM should the original compute node fail.

There is currently no multi-cloud redundancy solution for single-instance EAS: to deploy EAS in
multiple clouds you must use the EAS pool server system (see below).
• The EAS virtual Dual Server System is deployed as a pair of active/standby VMs, where
the active VM runs the MetaSphere EAS service. If the active VM fails, there will be a short
outage while the service is transferred to the standby VM. The active and standby VMs should
be deployed on different hosts so that loss of the host does not cause the loss of both the active
and standby VMs.
• The EAS pool server system is deployed as an N+K pool of VMs, with a minimal lab topology
of two VMs. In production, the EAS pool server system would be deployed as N+N pools with N
instances in each cloud.

In addition to the requirements to run the EAS VMs, you must also provide the following.

• A data store. You can choose from the following options.

• A filer, supplied either by Metaswitch or a third party (see Storage in Metaswitch Products
Virtual Infrastructure Requirements Guide).
• (From EAS V9.6) Metaswitch's Storage Cluster, a virtual data store. You must use this
option if you are deploying an EAS GR system.
• A load balancing infrastructure.

• If you are using EAS V9.5.40 or earlier, you must deploy a third party load balancer, as
described in https://communities.metaswitch.com/docs/DOC-231512.
• EAS V9.6 introduces the ability to distribute traffic using a third party Web Application
Firewall (WAF) with load balancing functionality or, in smaller deployments or where
the cost of a WAF or load balancer is prohibitive, DNS. For more information, see Load
Balancing in the MetaSphere Enhanced Application Server and CommPortal System
Planning Guide.

Note:

There is no migration path between the single-instance or virtual DSS and the pooled
implementation of virtualized EAS. If you may wish to deploy EAS in a multi-cloud redundant
architecture in the future, it is highly recommended that you deploy the EAS pool server system
from the start, even if you will only deploy it in a single cloud to begin with.

• AMS is deployed in an N+K pool of VMs, either in a single cloud or multi-cloud deployment, with a
minimal lab topology of a single VM.
• There are a number of supported deployment topologies for N-Series, depending on the N-Series
applications being used.

• The standard topology is a single active instance with its redundancy achieved using manual
or automatic/orchestrated failover and shared storage (exactly as for EAS single instance).


Each such instance may provide any supported combinations of N-Series services, and it is
supported to deploy multiple instances with different combinations together.
• For (Basic) Music on Hold only, a scalable topology is supported, consisting of

• a master instance (handling web traffic and mastering the configuration database), running
as a single active instance with its redundancy achieved using manual or automatic/
orchestrated failover and shared storage as above
• additionally, an N+K pool of node instances (handling SIP and RTP, with read-only duplicates
of the master database).
• It is supported to deploy any combination of the single-instance and scalable topologies
together to provide the desired set of N-Series services. See the N-Series System Planning
Guide for more information.
• Perimeta is usually deployed as one or more 1+1 HA pairs of VMs, but non-HA single VM
instances or N+K pools are supported in labs and production. In a multi-cloud deployment the two
instances of an HA pair, or N of each N+K pool, are deployed in different clouds within the same
site.

Attention:

Where Perimeta is deployed as an HA pair, there are restrictions on the relative specs of the
respective hosts.

For Perimeta V3.7.40 and earlier, the hosts containing the paired VMs must have the same clock
speed. Mismatched speeds are acceptable for short periods during maintenance or upgrade, but
while they persist, Perimeta will alarm, and if the backup is running on the slower host then it will
not be providing full HA protection at high load.

For Perimeta V3.8 onwards, the hosts containing the paired VMs may have different clock speeds.
As a safety measure, Perimeta will alarm by default if deployed on machines with different clock
speeds - see Setting the minimum CPU speed for a COTS server or virtual machine in a high
availability system in the Perimeta Operations and Maintenance Guide for how to prevent this
alarm.

• The Clearwater Core (CC) SPNs, DGNs, and optionally OANs are deployed in an N+K pool; the
Clearwater Core SCNs use an N+1 topology.
• The BGCF VM is deployed in an N+K pool of VMs, either in a single cloud or multi-cloud
deployment, with a minimal lab topology of a single VM.
• Metaswitch CCF is based on the Clearwater Core topology design, so the CCF CPNs and
optionally CFNs are deployed in an N+K pool and CSNs use an N+1 topology.
• MVS is normally deployed as a 1+1 HA instance pair, with replication, and automatic failover
between them. In a multi-cloud deployment the active and standby are deployed in different
clouds. It may also be deployed as a single instance using redundant shared storage in a single
cloud if automatic failover is not required. In a lab environment it is permissible to deploy MVS as a
single non-redundant VM.


• SAS instances receiving events from other products are deployed as individual VMs. When
virtualized they are not supported as an application co-located in the MVS VM. In a multi-site
deployment, you will deploy one or more SAS instances per site to achieve the required overall
capacity. The SAS instances within each site operate independently from other sites.

• Redundancy (which is recommended but not required for SAS, depending on your deployment
needs) is achieved at the scope of individual instances using manual or automatic/orchestrated
failover with shared redundant storage: in the event that the active VM fails you can use a
manual process or automatic orchestration to re-instantiate the VM on a different host using the
same storage.
• In a multi-cloud deployment at least one SAS instance receiving events from other products is
deployed in each cloud, with the VM instances of each other product in a given cloud sending
SAS events to a SAS instance in their local cloud. A separate master SAS (per site) is deployed
to allow searching across all SAS instances within a site.

• The Master SAS needs to continue to provide service on failure via a common IP address.
This is particularly important in a multi-cloud deployment designed to cope with cloud failure.
• This is achieved by running the master SAS collocated with a "dummy" MVS: a VM
instance that is instantiated as an MVS, but is providing only Master SAS function and is not
providing MVS function to manage any network elements.

Note that this is the only scenario in which collocation of SAS with MVS is supported on
virtual platforms. SAS instances which receive events from other products are not supported
collocated with MVS, and nor is it supported for a master SAS to be collocated with a "real"
MVS (i.e. one which is actively managing any network elements).
• The "dummy" MVS runs as an active/standby pair of VM instances, with one half in each
cloud, and which therefore provides a single virtual IP address that floats to the currently
active half.
• MDM is only supported as a 2+1 quorate pool of VMs in production topologies or as a single VM in
lab topologies. It is not possible to deploy any other number of MDM VMs.
• The MVM EVN and SMPPp VMs are deployed in an N+K pool, the MVM OAMs are supported only
as a 3+2 quorate pool of VMs in production topologies or as a single VM in lab topologies. It is not
possible to deploy any other number of OAM VMs.
• The Rhino nodes are supported as single VMs, but for high availability it is recommended that the
deployment uses at least a 3-node cluster (i.e. a pool with N+K >= 3).
• Storage Cluster is deployed in an N+K pool of VMs, with a minimal lab topology of at least a 3-
node cluster (i.e. a pool with N+K >= 3).
• The following product components are not deployed as VMs in their own right, but are collocated
with other products in the above VMs.

• Install Server and VPN Server are supported as applications in the MVS VM. (Certificate
Authority Server is not included by default; please contact your Metaswitch support
representative if you need to use it in your deployment.)


• MetaView Web is collocated with MVS or EAS.
• CommPortal is collocated with EAS.
• SIP Provisioning Server is collocated with EAS.

3.2 Per-product VM requirements


This section sets out the following requirements for a single VM instance of a given product.

• Number of virtual CPUs (vCPUs). Hyper-threading on page 33 explains the relationship
between vCPUs and physical CPUs and the role hyper-threading plays in defining this relationship.
• RAM (GiB)
• Boot disk space (GiB)
• Additional volume disk space (GiB)
• Disk performance (I/O operations per second - iops).

Attention:

Throughout this document we quote RAM and disk sizes as GiB and TiB. By this we mean the IEC
binary units, e.g. 1 GiB is 1024×1024×1024 bytes, rather than decimal unit GB (1,000,000,000
bytes).

The discrepancy between binary and decimal units becomes greater at scale: 1TiB measured
in binary units is approximately 10% (or 100GiB) larger than 1TB measured in decimal units.
Many equipment manufacturers list capacities in decimal units; you should factor this into your
calculations when choosing hardware in order to avoid under-provisioning resources for your
deployment.
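To make the binary/decimal discrepancy concrete, the following illustrative Python sketch (not part of any product tooling) converts between the two unit systems:

```python
# Binary (IEC) and decimal (SI) storage units, in bytes.
GIB = 1024 ** 3   # 1 GiB
TIB = 1024 ** 4   # 1 TiB
GB = 10 ** 9      # 1 GB
TB = 10 ** 12     # 1 TB

# A disk advertised as "1 TB" (decimal) expressed in binary units:
advertised_tb_in_gib = TB / GIB        # roughly 931.3 GiB, not 1024 GiB

# How much larger 1 TiB is than 1 TB:
ratio = TIB / TB                       # roughly 1.0995, i.e. ~10% larger
shortfall_gb = (TIB - TB) / GB         # roughly 99.5 GB

print(round(advertised_tb_in_gib, 1), round(ratio, 4), round(shortfall_gb, 1))
```

Running a check like this against vendor-quoted capacities helps avoid under-provisioning when the specification tables below are stated in GiB.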

The boot disk space is the size of the disk that is created from the QCOW2 image. It contains the
guest operating system and Metaswitch software, and is used for storing some configuration and
diagnostics data.

• Perimeta, DCM, Clearwater Core, Metaswitch CCF, MVM, MetaView Statistics Engine, and
Storage Cluster use ephemeral storage for the boot device.
• We recommend that all other products use persistent volume storage for the boot device.

Some products require additional persistent volume storage space for extra data, including more
configuration and diagnostics data, and data stored as part of the service (for example voicemails).
This is listed as additional volume disk space.

See VM resource specification tables on page 35 for more discussion on storage. In particular,
note that the values given in the table below are for the disk space that must be available to the VM,
not the raw disk capacity to provide it, which may be significantly more than this depending on the
underlying storage implementation.

Most Metaswitch products define a small number of different sizes of VMs, drawn from the list below.
One VM size is defined for a lab deployment of each product, and one or more sizes are defined for
production.


• Lab. As the name suggests, this is suitable for lab testing only, and supports only very modest
amounts of load (i.e. no load testing in the lab).
• Low capacity. This is the minimal set of VM requirements for a single VM instance of the product
when in production.
• Medium capacity. An intermediate size for a single VM instance of the product when in
production.
• High capacity. This is the maximal set of VM requirements for a single VM instance of the product
when in production.
• Single spec. This is used by some products that support a single VM size for production systems.

For production systems, the tables set out estimated or indicative headline capacity numbers for the VM specs they define. Note that the precise capacity of a particular Network Element depends on a number of factors, but these figures provide guidelines for a typical deployment, assuming the reference hardware set out in Reference hardware on page 81. Each individual project needs to be dimensioned separately, according to the load on it. Where multiple capacity measures are given, do not assume that they can all be met simultaneously, and do consult the notes after the tables.

For lab-spec systems, the tables do not quote a capacity, and we make no stipulations about the specification of hardware.

Deploying any product in production with vCPU and RAM resources that do not match one of the sizes defined here is not supported. In particular, adding resources beyond the high-capacity VM spec is not expected to increase performance or capacity, and in some cases may actually reduce it, so this is not supported.

3.2.1 Hyper-threading
Hyper-threading is an Intel processor technology that allows more efficient use of processor resources
through greater parallel computation. When it is enabled, each CPU core appears to host operating
systems and hypervisors as two logical processors rather than one (though it does not deliver twice
the processing power - a common rule of thumb is that a hyper-threaded core is equivalent to 1.3
non-hyper-threaded cores of equivalent speed). See https://software.intel.com/en-us/articles/how-to-determine-the-effectiveness-of-hyper-threading-technology-with-an-application for further information on how to determine the effective processing speed of hyper-threaded cores.

When hyper-threading is enabled in a virtualized environment, virtual CPUs are executed on hyper-
threads, meaning that two vCPUs can be simultaneously assigned to a single core. With hyper-
threading disabled, vCPUs are executed one per core, and significantly more hardware resources are
needed.

All Metaswitch products are agnostic as to whether hyper-threading is enabled, but the performance
and capacity estimates in this document assume that it is enabled where available.
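The vCPU-to-core arithmetic described above can be sketched as follows. This is an illustration only; the 1.3x figure is the rule of thumb quoted in the text, not a measured value.

```python
import math

def physical_cores_needed(vcpus: int, hyper_threading: bool) -> int:
    """With hyper-threading, each physical core exposes two logical
    processors, so two vCPUs map onto one core; without it, one vCPU
    maps onto one core."""
    return math.ceil(vcpus / 2) if hyper_threading else vcpus

def effective_core_equivalent(physical_cores: int) -> float:
    """Rule-of-thumb throughput: one hyper-threaded core is roughly
    equivalent to 1.3 non-hyper-threaded cores of the same speed."""
    return physical_cores * 1.3

# A 12-vCPU VM needs 6 physical cores with hyper-threading, 12 without.
print(physical_cores_needed(12, True), physical_cores_needed(12, False))
```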

3.2.2 Using the VM resource specification tables


The tables in VM resource specification tables on page 35 show the resource specifications for one instance of each virtualized Metaswitch product running on OpenStack. This information must be used in conjunction with the product topology and scale requirements to calculate total resource needs, as in the following examples:

• Where the product runs as a 1+1 pair, the pair needs twice the resources listed here.
• Where the product is being deployed as a pool of VM instances, the pool needs the resources
listed here multiplied by the number of instances in the pool (N+K).
• Where the product runs as a single instance, using manual or automatic/orchestrated failover with
shared redundant storage for redundancy, space needs to be included (but not allocated to any
VM) so that you are able to re-instantiate a failed instance on a new host.
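The first two examples above amount to simple multiplication. A minimal sketch, using hypothetical per-VM figures for illustration only (take real values from the VM resource specification tables):

```python
def pool_resources(per_vm: dict, instances: int) -> dict:
    """Total reservation for a pool: per-VM figures multiplied by the
    number of instances (N+K for a pool, 2 for a 1+1 pair)."""
    return {resource: qty * instances for resource, qty in per_vm.items()}

# Hypothetical per-VM spec for illustration; not taken from any one product.
vm = {"vcpus": 4, "ram_gib": 8, "boot_gib": 30, "volume_gib": 170}

# An N+K pool of 5 instances needs 5x the per-VM resources.
print(pool_resources(vm, 5))
```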

Heat templates corresponding to the VM instance definitions specified in the table are available for
some Metaswitch products. Please see individual Metaswitch product documentation for details.

The following notes are also relevant.

• As per the discussion in the Virtual Infrastructure Requirements Guide, this table assumes the
VMs' resources are either dedicated or pooled but uncontended. Where they are pooled and
contended, as permitted (but not recommended), lower capacity should be expected at the points
where contention is hit. We do not attempt to quantify the impact of such contention - it will vary
from product to product, and from resource to resource - but be aware that the impact is likely to
be worse than you would predict based on a linear degradation pattern.
• When calculating the total resources required in your virtual infrastructure, you must reserve
resources as follows:

• You must reserve in full the resources specified in VM resource specification tables on page
35, even if you require less maximum capacity than the product can deliver with the stated
resources. You cannot scale down the resource allocation for anticipated lower capacity.
• You must additionally reserve the resources required for the OpenStack hypervisor and system
tasks to run successfully, without contending with the guests. See OpenStack overhead on
page 57 for details.
• You must ensure that any non-Metaswitch guests running on the same hosts as your
Metaswitch VMs are configured so as not to contend for resources with the Metaswitch VMs.
• Storage numbers quoted state the usable space that must be available to the VM. The actual size
of disks will typically be more than that to allow overheads for shared storage and/or RAID. So for
example, if using RAID 10, then the actual disk size needs to be twice the storage size that must
be offered to the VM.
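The RAID 10 example above can be generalized as a small sketch. The overhead factors here are standard RAID arithmetic, not product figures, and your storage implementation may add further overheads (for example for shared storage).

```python
# Raw disk capacity needed to present a given usable size under common
# RAID layouts. Factors are the standard redundancy multipliers only.
RAID_OVERHEAD = {
    "raid0": 1.0,    # striping only, no redundancy
    "raid1": 2.0,    # full mirror
    "raid10": 2.0,   # striped mirrors
}

def raw_capacity_gib(usable_gib: float, raid_level: str) -> float:
    """Raw disk space required so that `usable_gib` can be offered to the VM."""
    return usable_gib * RAID_OVERHEAD[raid_level]

# 170 GiB usable on RAID 10 needs 340 GiB of raw disk.
print(raw_capacity_gib(170, "raid10"))
```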

Attention:

The disk performance requirements (iops) must be met at all times, including e.g. when a RAID
rebuild is occurring, in order to avoid service impact. You should consider using a maintenance
window for any maintenance operation on your virtualization environment or storage subsystem
unless you are confident it will always satisfy minimum iops requirements during maintenance.

Attention:

The iops figures given here represent the minimum requirements your storage device must meet for the given VM capacity. You should not use this figure to set limits on disk performance in the settings for your VM.

3.2.3 VM resource specification tables


The following tables show the VM resource specifications for one VM instance of each product.

Note:

The "maximum capacity" values given in these tables derive from the performance capabilities and restrictions of the respective product VMs and the reference hardware. In practice, some of these values may be limited by the licensed capacity you have purchased.

Attention:

No guarantee is possible concerning performance or correct functioning of Metaswitch products using different quantities of vCPU or RAM from those in the tables below. Therefore, the products are supported only using VMs of the specifications stated.

Note:

In these tables, the term vCPU refers to virtual CPU as exposed by the hypervisor. How this maps
to physical resource demands depends on whether hyper-threading is in use. If hyper-threading
is in use, a single physical core is exposed as two virtual CPUs (so the number of physical cores
is half the vCPU count indicated). If not, a single core is exposed as one virtual CPU. See Hyper-
threading on page 33 for more detail.

Attention:

These tables do not apply to the Radisys MRF, a third-party component used in the Metaswitch
VoLTE Solutions. Information on the resource specifications for these VMs can be found in the
documentation for the Metaswitch VoLTE solution in which you are deploying it.

CFS / AGC / MGC

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab (CFS non-clustered, AGC, MGC) | 1 | 4 | 30 | 170 | - | - |
| Lab (CFS clustered) | 1 | 8 | 30 | 170 | - | - |
| Low capacity | 4 | 8 | 30 | 170 | 50 | 25k residential equivalent subs; 50k BHCA |
| High capacity | 12 | 48 | 30 | 170 | 100 | 500k residential equivalent subscribers; 1M BHCA; 500k BHCA on geo-redundant AGC/MGC |

Note:

Heavy usage of Line State Monitoring can reduce the BHCA capacity of your CFS.

Note:

One business subscriber is equivalent to 2.5 residential subscribers.

RPAS

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 4 | 30 | 70 | - | - |
| Low capacity | 4 | 4 | 30 | 70 | 75 | 600k BHCA |
| High capacity | 8 | 8 | 30 | 70 | 100 | 3 million BHCA |


OBS

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 8 | 30 | 70 | - | - |
| Low capacity | 8 | 16 | 30 | 70 | 50 | 1+1 pair per site supports up to 500k subs across 2 or 3 sites |
| High capacity | 24 | 48 | 30 | 70 | 100 | 1+1 pair per site supports up to 3 million subs across 2 sites or 4.5 million across 3 sites |

MRS

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 4 | 100 | 0 | - | - |
| Single spec | 8 | 8 | 100 | 0 | 50 | 500 media streams |

MVD

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 8 | 30 | 70 | - | - |
| Low capacity | 4 | 8 | 30 | 70 | 50 | 500k subs |
| High capacity | 12 | 24 | 30 | 70 | 100 | 4.5 million subs |
| Very High capacity | 12 | 48 | 30 | 70 | 100 | 15 million subs |

ESAP

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 4 | 30 | 70 | - | - |
| Low capacity | 4 | 8 | 30 | 70 | 50 | 100k subscribers / 200k BHCA |
| High capacity | 8 | 16 | 30 | 70 | 100 | 500k subscribers / 1M BHCA |

EAS (single instance)

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 4 | 70 | 230 | - | - |
| Low capacity | 4 | 8 | 70 | 230 | 50 | 5k BPU subs, 11k BSU subs |
| High capacity | 12 | 24 | 70 | 930 | 100 | 30k BPU subs, 66k BSU subs |

Note:

EAS capacity is indicated in terms of the following types of subscribers:

- BPU = Business Premium Users or equivalent subscribers including a mix of CommPortal, VM, videomail, MaX UC Client and SIP Provisioning Server, with automatic message aging for voicemail.
- BSU = Business Standard Users or equivalent subscribers including CommPortal, VM and SIP Provisioning Server, with automatic message aging for voicemail.

The indicated performance numbers assume a certain usage profile for these subscribers that may not match your subscribers. You should discuss your expected profiles with Metaswitch.

EAS (virtual DSS)

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------|----------------------|--------------------------|
| Lab | 1 | 6 | 70 | 530 | - | - |
| Production | 12 | 24 | 70 | 530 | 100 | 30k BPU subs, 66k BSU subs |

Note:

EAS capacity is indicated in terms of the following types of subscribers:

- BPU = Business Premium Users or equivalent subscribers including a mix of CommPortal, VM, videomail, MaX UC Client and SIP Provisioning Server, with automatic message aging for voicemail.
- BSU = Business Standard Users or equivalent subscribers including CommPortal, VM and SIP Provisioning Server, with automatic message aging for voicemail.

The indicated performance numbers assume a certain usage profile for these subscribers that may not match your subscribers. You should discuss your expected profiles with Metaswitch.

EAS (pool server system)

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 2 | 4 | 100 | 0 | 50 | - |
| Deployment | 12 | 24 | 100 | 0 | 100 | 13k BPU subs or 27k BSU subs per VM. One VM in the pool provides redundancy, so should be excluded from the calculations. For example, a pool of N+1 VMs can support up to N x 13k BPU subs. Up to 20 VMs are supported in a pool. |

Note:

EAS capacity is indicated in terms of the following types of subscribers:

- BPU = Business Premium Users or equivalent subscribers including a mix of CommPortal, VM, videomail, MaX UC Client and SIP Provisioning Server, with automatic message aging for voicemail.
- BSU = Business Standard Users or equivalent subscribers including CommPortal, VM and SIP Provisioning Server, with automatic message aging for voicemail.

The indicated performance numbers assume a certain usage profile for these subscribers that may not match your subscribers. You should discuss your expected profiles with Metaswitch.
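The pool arithmetic described in the EAS pool server table can be sketched as follows. The 13k BPU per-VM figure and the 20-VM limit are taken from that table; this is an illustration, not product tooling.

```python
def eas_pool_capacity(vms: int, per_vm_bpu: int = 13_000) -> int:
    """One VM in the pool provides redundancy, so a pool of `vms` VMs
    (an N+1 pool) serves (vms - 1) times the per-VM capacity. Pools of
    up to 20 VMs are supported."""
    if not 2 <= vms <= 20:
        raise ValueError("pool must have between 2 and 20 VMs")
    return (vms - 1) * per_vm_bpu

# A 4+1 pool (5 VMs) supports up to 4 x 13k = 52k BPU subscribers.
print(eas_pool_capacity(5))
```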


N-Series

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 4 | 300 | 0 | - | - |
| Low capacity | 4 | 8 | 300 | 0 | 50 | 300 conferencing ports; 13,500 (Basic) MOH subs |
| High capacity | 12 | 24 | 300 | 0 | 100 | 1500 conferencing ports; 66,000 (Basic) MOH subs |

Note:

These resource specifications apply to both single instances and scalable (Basic) Music on Hold
pool instances.

Perimeta ISC

Note:

Performance and capacity metrics for each scale of Perimeta ISC are provided at https://communities.metaswitch.com/docs/DOC-231501.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 2 | 4 | 40 | 0 | - |
| Low capacity | 2 | 6 | 40 | 0 | 50 |
| Medium capacity | 8 | 16 / 32 (see notes below) | 60 | 0 | 100 |
| High capacity | 16 | 16 / 32 (see notes below) | 60 | 0 | 100 |

Note:

The amount of RAM required for a medium or high capacity Perimeta ISC depends on the number
of subscribers using TLS that the Session Controller must support.

- Without TLS, you will need 16 GiB of RAM.

- If support is required for fewer than 32,000 subscribers using TLS, you will need 16 GiB of RAM.

- If support is required for 32,000 subscribers or more using TLS, you will need 32 GiB of RAM.

Note:

See Perimeta data storage on page 60 for more information on recommended and minimum
virtual disk sizes for Perimeta.
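The RAM selection rule in the TLS note above can be expressed as a small sketch (illustrative only; it simply restates the thresholds from the note):

```python
def perimeta_ram_gib(tls_subscribers: int) -> int:
    """RAM for a medium or high capacity Perimeta Session Controller:
    32 GiB once 32,000 or more subscribers use TLS, otherwise 16 GiB
    (including the no-TLS case)."""
    return 32 if tls_subscribers >= 32_000 else 16

# No TLS, or fewer than 32,000 TLS subscribers: 16 GiB; at or above: 32 GiB.
print(perimeta_ram_gib(0), perimeta_ram_gib(31_999), perimeta_ram_gib(32_000))
```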

Perimeta MSC

Note:

Performance and capacity metrics for each scale of Perimeta MSC are provided at https://communities.metaswitch.com/docs/DOC-231501.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 2 | 4 | 30 | 0 | - |
| Low capacity - RTP | 2 | 6 | 30 | 0 | 50 |
| Medium capacity - RTP | 8 | 8 | 30 | 0 | 100 |
| High capacity - RTP | 16 | 8 | 30 | 0 | 100 |
| Low capacity - MSRP | 2 | 6 | 30 | 0 | 50 |
| High capacity - MSRP | 8 | 8 | 30 | 0 | 100 |

Note:

See Perimeta data storage on page 60 for more information on recommended and minimum
virtual disk sizes for Perimeta.

Perimeta SSC

Note:

Performance and capacity metrics for each scale of Perimeta SSC are provided at https://communities.metaswitch.com/docs/DOC-231501.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 2 | 4 | 40 | 0 | - |
| Low capacity | 2 | 6 | 40 | 0 | 50 |
| Medium capacity | 8 | 16 / 32 (see notes below) | 60 | 0 | 100 |
| High capacity | 20 | 16 / 32 (see notes below) | 150 | 0 | 100 |


Note:

The amount of RAM required for a medium or high capacity Perimeta SSC depends on the number
of subscribers using TLS that the Session Controller must support.

- Without TLS, you will need 16 GiB of RAM.

- If support is required for fewer than 32,000 subscribers using TLS, you will need 16 GiB of RAM.

- If support is required for 32,000 subscribers or more using TLS, you will need 32 GiB of RAM.

Note:

See Perimeta data storage on page 60 for more information on recommended and minimum
virtual disk sizes for Perimeta.

AMS

| Spec | vCPUs | RAM (GiB) | Storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|----------------------|--------------------------|
| Lab | 1 | 4 | 100 | 50 | - |
| Low capacity | 4 | 8 | 100 | 50 | 20,000 BPU subscribers per server; up to 40,000 BPU subscribers per AMS cluster. 50,000 Consumer subscribers per server; up to 100,000 Consumer subscribers per AMS cluster. |
| Medium capacity | 8 | 32 | 100 | 100 | 50,000 BPU subscribers per server; up to 100,000 BPU subscribers per AMS cluster. 100,000 Consumer subscribers per server; up to 200,000 Consumer subscribers per AMS cluster. |
| High capacity | 16 | 64 | 100 | 100 | 100,000 BPU subscribers per server; 200,000 BPU subscribers per AMS cluster. Not recommended for consumer-only deployments. |


Note:

AMS capacity is indicated in terms of the following types of subscribers:

- BPU = Business Premium Users with Presence enabled on MaX UC Client. AMS capacity for these subscribers is governed by the limited storage space available for Presence avatars. Capacity will increase if avatars are not used in Presence and subscribers have a small number of contacts.
- Consumer = Consumer users, who do not use Presence.

One VM in the cluster provides redundancy, so should be excluded from the calculations. For example, a cluster of N+1 VMs can support up to N x 100k BPU subs. The maximum supported cluster size is 2+1 VMs.

Clearwater Core SPN

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 1 | 2 | 20 | 0 | - |
| Single spec | 2 | 4 | 20 | 0 | 50 |

Clearwater Core DGN

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 1 | 2 | 20 | 0 | - |
| Single spec | 2 | 4 | 20 | 0 | 50 |


Clearwater Core SCN

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 1 | 4 | 20 | 0 | - |
| Low | 2 | 4 | 20 | 0 | 50 |
| High | 16 | 32 | 80 | 0 | 400 |

Clearwater Core OAN

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 1 | 2 | 20 | 0 | - |
| Single spec | 2 | 4 | 20 | 0 | 50 |

BGCF VM

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Single spec | 4 | 8 | 20 | 0 | 0 | 290k BHCA of VoLTE traffic profile |

Metaswitch CCF CPN

| Spec | vCPUs | RAM (GB) | Storage required (GB) | Storage speed (IOPS) | Maximum capacity / notes |
|------|-------|----------|-----------------------|----------------------|--------------------------|
| Lab | 1 | 2 | 20 | - | - |
| Production | 2 | 4 | 20 | 50 | - |

Metaswitch CCF CSN

| Spec | vCPUs | RAM (GB) | Storage required (GB) | Storage speed (IOPS) | Maximum capacity / notes |
|------|-------|----------|-----------------------|----------------------|--------------------------|
| Lab | 1 | 2 | 20 | - | - |
| Production | 2 | 4 | 100 | 150 | Must use a solid state drive (SSD) volume |
| Expanded (V8.0+) | 2 | 4 | 300 | 200 | Must use a solid state drive (SSD) volume |
| High capacity (V9.0+) | 16 | 32 | 300 | 200 | Must use a solid state drive (SSD) volume |

Metaswitch CCF CFN

| Spec | vCPUs | RAM (GB) | Storage required (GB) | Storage speed (IOPS) | Maximum capacity / notes |
|------|-------|----------|-----------------------|----------------------|--------------------------|
| Lab | 1 | 2 | 20 | - | - |
| Production | 2 | 16 | 20 | - | - |

MVS

Attention:

Running MetaView Web on the MetaView Server reduces the quoted maximum capacity by 50%.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 1 | 8 | 35 | 265 | - | - |
| Low capacity | 4 | 8 | 35 | 265 | 100 | 250k residential subs. Does not support SIP PS managed devices. Does not support the EAS Shadow Configuration Database. |
| Mid capacity | 12 | 24 | 35 | 265 | 150 | 1 million residential subs |
| High capacity | 24 | 48 | 35 | 265 | 300 | 5 million residential subs |

SAS (Standalone, Data and Master)

Where a range of values is given, we recommend using the lowest value as the most economical
option. Higher values are supported to allow you to match your own standard OpenStack flavors. SAS
can be installed on boot devices with more than 60GiB, but partitioning will mean that only 60GiB is
used.

Note:

For SAS versions before V9.2, the VM requires 64GiB RAM. From V9.2 onwards, it requires 24GiB.
For a SAS VM installed before V9.2 and then upgraded to V9.2, procedures are provided to adjust
the VM size down from 64GiB to 24GiB. See the OpenStack Deployment Procedures Guide.

See SAS data storage on page 57 for more information on virtual disk sizes for SAS.


| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Boot IOPS | Additional storage |
|------|-------|-----------|------------------------|-----------|--------------------|
| Lab | 1 | 4 | 60 | - | No additional volume storage needed for lab systems. |
| Low capacity | 4 | 8 | 60 | 50 | See SAS data storage on page 57 |
| High capacity | 8 | V9.2 and later: 24 - 32; before V9.2: 64 | 60 - 160 | 50 | See SAS data storage on page 57. No additional storage is required if deployed as a Master SAS. |

DCM

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab & production | 1 | 2 (1 for versions prior to V4.0) | 10 | 0 | 50 | - |

MVSE

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Small | 2 | 4 | 20 | 0 | 50 | Statistics aggregation in a Clearwater Core deployment with 20 nodes, or Perimeta KPI Dashboard (proof of concept or lab trial; capacity: 3 Perimeta Session Controllers with 150 adjacencies in total) |
| Medium | 8 | 16 | 20 | 0 | 200 | Statistics aggregation in a Clearwater Core deployment with 20 nodes, or Perimeta KPI Dashboard (production deployment; capacity: 20 Perimeta Session Controllers with 1000 adjacencies in total and 250 adjacencies per Session Controller, or 30 Perimeta Session Controllers without per-adjacency statistics collection) |


MDM

Where a range of values is given, we recommend using the lowest value as the most economical option. Higher values are supported to allow you to match your own standard OpenStack flavors.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Medium | 6 | 8 - 12 | 40 - 80 | 0 | 100 | - |

MVM EVN

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 2 | 8 | 100 | 100 | - | - |
| Single spec | 8 | 16 | 100 | 100 | 100 | - |

MVM OAM

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 2 | 4 | 100 | 100 | - | - |
| Single spec | 8 | 16 | 100 | 100 | 100 | - |


MVM SMPPp

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|--------------------------|
| Lab | 2 | 4 | 100 | 100 | - | - |
| Single spec | 8 | 16 | 100 | 100 | 100 | - |

ServiceIQ Monitoring Node (SIMon)

Where a range of values is given, we recommend using the lowest value as the most economical option. Higher values are supported to allow you to match your own standard OpenStack flavors.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Boot storage speed (iops) | Addl. volume storage required (GiB) | Volume storage speed (iops) |
|------|-------|-----------|------------------------|---------------------------|-------------------------------------|-----------------------------|
| Lab | 4 | 8 | 20 | 50 | * | 100 |
| Production | 8+ | 64+ | 20+ | 50+ | * | 300+ |

* ServiceIQ Monitoring supports a range of storage sizes for different solutions. Refer to your Metaswitch solution documentation for details.

ServiceIQ Management Platform (SIMPL)

| Spec | vCPUs | RAM (GB) | Boot device size (GB) | Boot storage speed (iops) | Addl. volume storage required (GB) | Volume storage speed (iops) |
|------|-------|----------|-----------------------|---------------------------|------------------------------------|-----------------------------|
| Production | 2 | 8 | 20 | 50 | Minimum 60* | 50 |

* ServiceIQ Management Platform VM volume storage is 60GB in size by default. As resizing the volume storage is not supported once the SIMPL VM is deployed, please ensure that the required volume storage size is correctly determined before deploying the SIMPL VM. For more details on the required volume storage, see VM specification in the SIMPL VM Deployment Guide and consult your Metaswitch Professional Services Engineer.


QCall Signing and Verification Service (SVS)

Where a range of values is given, we recommend using the lowest value as the most economical option. Higher values are supported to allow you to match your own standard OpenStack flavors.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Boot storage speed (iops) | Addl. volume storage required (GiB) | Volume storage speed (iops) |
|------|-------|-----------|------------------------|---------------------------|-------------------------------------|-----------------------------|
| Lab | 1 | 4 | 20 | 0 | 0 | 0 |
| Production | 2 | 4 | 20 - 40 | 50 | 0 | 0 |

QCall Platform Cluster Configuration (PCC)

Where a range of values is given, we recommend using the lowest value as the most economical option. Higher values are supported to allow you to match your own standard OpenStack flavors.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Boot storage speed (iops) | Addl. volume storage required (GiB) | Volume storage speed (iops) |
|------|-------|-----------|------------------------|---------------------------|-------------------------------------|-----------------------------|
| Lab & Production | 2 | 4 | 20 - 60 | 50 | 4 | 50 |

QCall Certificate Repository Server (CRS)

Where a range of values is given, we recommend using the lowest value as the most economical option. Higher values are supported to allow you to match your own standard OpenStack flavors.

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Boot storage speed (iops) | Addl. volume storage required (GiB) | Volume storage speed (iops) |
|------|-------|-----------|------------------------|---------------------------|-------------------------------------|-----------------------------|
| Lab & Production | 2 | 4 | 20 - 40 | 50 | 4 | 50 |


Rhino nodes - MMT / SMO / MAG / MAX / MCP

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Small | 4 | 18 (all versions from V4.0); 16 (all versions from V2.7.1-7 to V4.0, except V2.8.0-0); 8 (all versions prior to V2.7.1-7)* | 30 | - | 50 |
| Medium | 8 | 18 (all versions from V4.0); 16 (all versions prior to V4.0) | 30 | - | 200 |

*See note below for details.

Rhino nodes - ShCM / REM

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Single spec | 4 | 8 | 30 | - | 50 |


Rhino nodes - TSN

| Spec | vCPUs | RAM (GiB) | Storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|----------------------|
| Medium | 10 (all versions from V4.0); 8 (all versions prior to V4.0) | 16 | 94 (all versions from V4.0); 30 (all versions prior to V4.0) | 1000 |
| Large | 12 (all versions from V4.0); 8 (all versions prior to V4.0) | 24 | 94 (all versions from V4.0); 30 (all versions prior to V4.0) | 1000 |

Secure Distribution Engine (SDE)

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab | 2 | 8 | 40 | Diagnostics volume: 40GiB | No specific requirements. |
| Single spec | 8 | 8 | 40 | Diagnostics volume: 40GiB | 50 |

Deployment Configuration Store (DCS)

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab / Single spec | 1 | 8 | 10 | 10 | 50 |


Storage Cluster

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) (ephemeral) | Configuration volume size (GiB) (persistent) | Diagnostic volume size (GiB) (persistent) | Diagnostic volume IOPS | Addl. volume storage required (GiB) (persistent) | Storage speed (iops) | Maximum capacity / notes |
|------|-------|-----------|------------------------------------|----------------------------------------------|-------------------------------------------|------------------------|--------------------------------------------------|----------------------|--------------------------|
| Lab | 2 | 12 | 20 | 1 | 1 | No requirement | The capacity required will depend on the profile used by your deployment. Contact your account manager for more information. | The exact speed required will depend on the profile used by your deployment. Contact your account manager for more information. | - |
| Production | 12 | 24 | 20 | 5 | 75 | 75 | The capacity required will depend on the profile used by your deployment. Contact your account manager for more information. | The exact speed required will depend on the profile used by your deployment. Contact your account manager for more information. | For more information on the capacity required for the Storage Cluster, see Capacity and scalability in the Storage Cluster Deployment Guide (OpenStack). |


Distribution Admission Manager (DAM)

| Spec | vCPUs | RAM (GiB) | Boot device size (GiB) | Addl. volume storage required (GiB) | Storage speed (iops) |
|------|-------|-----------|------------------------|-------------------------------------|----------------------|
| Lab / Single spec | 4 | 4 | 20 | 0 | 50 |

3.2.4 OpenStack overhead


In addition to the figures given in VM resource specification tables on page 35, you must also
factor into your resource calculations the fact that OpenStack itself uses host resources to run the
hypervisor and other system tasks. OpenStack must have access to dedicated CPU, RAM, and disk
space; otherwise, it will contend with the guests for resources and impact their performance.

You can explicitly reserve CPU cores on the host (for example, by using the isolcpus kernel boot
parameter in conjunction with the vcpu_pin_set nova setting) to ensure that OpenStack does not
make them available to guests and thus retains sufficient operating bandwidth for the hypervisor. You
must reserve a minimum of one physical core per CPU socket for the hypervisor and vSwitch; this
requirement will increase if you are using software-accelerated vSwitches (e.g. Open vSwitch with
DPDK).

You should also reserve sufficient RAM and disk space to prevent resource contention between the
host and guests; see the OpenStack documentation for further details.
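As an illustrative sketch of these reservations (file locations, core numbering and values are assumptions for a hypothetical 2-socket, 40-thread host, and on newer OpenStack releases vcpu_pin_set is superseded by cpu_dedicated_set in the [compute] section):

```ini
# /etc/default/grub (illustrative): keep cores 0 and 1 and their
# hyperthread siblings (20 and 21 on this assumed 40-thread host) out
# of general scheduling, for the hypervisor and vSwitch.
GRUB_CMDLINE_LINUX="... isolcpus=0,1,20,21"

# /etc/nova/nova.conf on each compute host
[DEFAULT]
# Expose only the remaining cores to guests.
vcpu_pin_set = 2-19,22-39
# Reserve RAM (MB) and disk (MB) for the host itself; size these
# values for your own hosts.
reserved_host_memory_mb = 16384
reserved_host_disk_mb = 51200
```

Apply the grub change with your distribution's update-grub mechanism and restart the nova-compute service after editing nova.conf.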

3.2.5 SAS data storage

This section does not apply to Master SAS VMs.

Production SAS systems require a volume for storing the SAS data. This is often substantial and
hence broken out into a separate data store with sizing points as shown in the tables below.

The amount of storage you require will depend on both your SAS version and the period for which you
have chosen to retain data. From V10.2, SAS uses 50% less storage space than earlier versions to
store an equivalent amount of data. Refer to the applicable table for your SAS version.

Attention:

The sizes given in this table are explicitly offered as options at install time (see the Service
Assurance Server OpenStack Deployment Procedures Guide for details). You may specify an
alternative size if your chosen retention period is not given in Table 3: SAS storage requirements
for V10.2 and later on page 58. Consult your support representative for appropriate data storage
values.


Table 2: SAS storage requirements for versions prior to V10.2

Capacity Storage for root Storage for 7 day Storage speed - Storage speed -
(capacity blocks) partition (GiB) retention (GiB) write (IOPS) read (IOPS)

7 60 1,000 100 75

15 60 2,000 175 100

30 60 4,000 300 150

45 60 6,000 425 225

60 60 8,000 550 275

75 60 10,000 675 350

90 60 12,000 800 400

Table 3: SAS storage requirements for V10.2 and later

Capacity     Storage     Storage     Storage     Storage         Storage speed -   Storage speed -
(capacity    for root    for 3 day   for 7 day   for 14 day      write (IOPS)      read (IOPS)
blocks)      partition   retention   retention   retention
             (GiB)       (GiB)       (GiB)       (GiB)

14           60          430         1,000       2,000           100 (150 for      100 (150 for
                                                                 14 day)           14 day)

30           60          860         2,000       4,000           150 (275 for      150 (275 for
                                                                 14 day)           14 day)

45           60          1,290      3,000       6,000           225 (400 for      225 (400 for
                                                                 14 day)           14 day)

60           60          1,720      4,000       Not supported   275               275

75           60          2,150      5,000       Not supported   350               350

90           60          2,580      6,000       Not supported   400               400

105          60          3,000      7,000       Not supported   475               475


The first row in each table (excluding 14 day retention) can be used with the SAS low-capacity spec.
All others require the SAS high-capacity spec.

Note that this is the storage space required to be presented to SAS. The actual disk size may be more
for redundancy, e.g. if using RAID10 it is typically double this (2,000-24,000 GiB).

This additional storage is not required for lab systems. In those systems, the volume of data is small
enough to fit on the boot device shown in VM resource specification tables on page 35.
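For deployment planning or automation, the V10.2+ table can be encoded as a simple lookup. This is an illustrative sketch only: the function and dictionary names are ours, and any capacity or retention combination not listed must be agreed with your support representative.

```python
# Illustrative only: encode Table 3 ("SAS storage requirements for
# V10.2 and later") so a planning script can look up the data volume
# size to request. Values are GiB, taken directly from the table.
SAS_V10_2_STORAGE_GIB = {
    # capacity blocks: {retention days: data volume size in GiB}
    14:  {3: 430,  7: 1000, 14: 2000},
    30:  {3: 860,  7: 2000, 14: 4000},
    45:  {3: 1290, 7: 3000, 14: 6000},
    60:  {3: 1720, 7: 4000},            # 14 day retention not supported
    75:  {3: 2150, 7: 5000},
    90:  {3: 2580, 7: 6000},
    105: {3: 3000, 7: 7000},
}
ROOT_PARTITION_GIB = 60  # the root partition size is the same for all rows

def sas_data_volume_gib(capacity_blocks: int, retention_days: int) -> int:
    """Return the SAS data volume size, or raise if the option is unlisted."""
    try:
        return SAS_V10_2_STORAGE_GIB[capacity_blocks][retention_days]
    except KeyError:
        raise ValueError(
            f"{capacity_blocks} blocks / {retention_days} day retention is "
            "not a listed option; consult your support representative")

print(sas_data_volume_gib(30, 7))  # 2000
```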

3.2.6 EAS data storage

Single-instance EAS

Redundancy for single-instance EAS is achieved by booting from shared storage. Recovery from a
host failure where an EAS instance was lost requires booting a new EAS instance using the shared
storage volume from the lost instance. This means all EAS storage, including the boot device, must be
located on a shared storage volume: local ephemeral storage cannot be used.

It is not possible to resize the volume being used by EAS. However, the EAS data capacity can be
increased by adding additional volumes to the EAS instance.

Attention:

High capacity MetaSphere EAS systems using vSAN that exceed the low capacity EAS usage
rating of 5K BPU, 11K BSU, defined in VM resource specification tables in Metaswitch Products
VMware Deployment Design Guide, require 10Gb network connections for vSAN traffic on the
underlying hardware.

Virtual Dual Server System (DSS)

Virtual Dual Server Systems do not use shared storage. The MetaSphere EAS service runs on the
active VM and redundancy is achieved by deploying an equivalent standby VM to which you can
transfer the service if the active VM fails. Data is replicated in real time between the active and
standby servers.

The active and standby VMs both need their own private disk. As per the standard recommendations
for Metaswitch VMs, using shared SAN storage is recommended, but local storage on the host is also
permissible.

EAS pool server system

The EAS pool requires two types of storage, as follows:

• Each EAS pool server VM needs its own private disk. As per the standard recommendations for
Metaswitch VMs, using shared SAN storage is recommended, but local storage on the host is also
permissible.
• The server pool as a whole requires an additional data store for deployment-level shared config,
subscriber settings, and voicemail storage.


• EAS V9.5.40 and earlier requires a filer, accessed via NFS from within the guest. You can
either use a filer supplied by Metaswitch or provide this filer yourself. If you choose to provide
your own filer, this must have equivalent capabilities to the Metaswitch model. See Third-Party
Filer Requirements at https://communities.metaswitch.com for full details of the requirements it
must fulfill.
• From EAS V9.6, you have the alternative option of deploying Metaswitch's Storage Cluster,
which is a virtual data store. You must use this option if you are deploying an EAS GR system.
For more information, see EAS Storage Cluster and Rhino TSN on page 65.

3.2.7 Perimeta data storage

Disk space

You must decide on the size of the virtual hard disk that you will allocate to your Perimeta VMs. The
recommended sizes for each Session Controller type are given in VM resource specification tables on
page 35. However, you can choose any size above 30GiB.

Our recommended default choice for disk size varies by signaling scale as specified in the following
storage calculation table.

Perimeta option                          Default storage size      Fits 7 days of billing
                                         recommendation (GiB)      records at call rate (cps)

ISC (low capacity) 40 250

ISC (medium capacity) 60 400

ISC (high capacity) 60 400

SSC (low capacity) 40 250

SSC (medium capacity) 60 400

SSC (high capacity) 150 1000

MSC (low, medium, or high capacity) 30 NA (no billing records)

You should use the following information to decide on the size of the virtual hard disk.

• Above the first 10GiB, Perimeta uses 54% of its remaining disk space for storing XML billing files
or caching unsent Diameter accounting messages, depending on whether you are using XML
billing or Rf charging. For more information on these billing methods, see Billing in the Perimeta
Operations and Maintenance Guide.
• 1GiB is enough space for 10 million calls' worth of compressed XML billing files or 1 million calls'
worth of cached Diameter accounting messages.


• If you are using XML billing, you should allow a further 2GiB to store the current XML billing file
before it is closed and compressed.

You must also allow further disk space if you are planning to upload uncompressed XML billing
files and / or transform them to another format using XSLT.

• If you are planning to configure the Session Controller to convert your XML billing files into
another format using XSLT before uploading them, you should allow a further 4GiB. This is the
recommended amount of disk space regardless of whether or not the Session Controller will
decompress the transformed files before uploading them.
• If you are planning to configure the Session Controller to decompress XML billing files before
uploading them (but not to transform them using XSLT), you should allow a further 2GiB.
• Above the first 10GiB, Perimeta reserves 5% of its remaining disk space (or a minimum of 1GiB)
for storage accessible via FTP. This is typically used for the following purposes.

• Storing gathered logs, diagnostics packages, billing files and snapshots.


• Uploading new versions of the Perimeta core software.

If this space fills up, you must clear out files you no longer need before you can carry out tasks
such as gathering diagnostics packages or uploading new software. A diagnostics package will
typically use 300-500MB. A core software package will typically use 700-800MB.
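The allocation rules above can be turned into a rough sizing model. The following is an illustrative approximation of how a given disk size is divided, using the percentages stated above; it is our own sketch, not the product's actual allocation algorithm.

```python
def perimeta_disk_allocation(disk_gib: float) -> dict:
    """Approximate how Perimeta divides a virtual disk, using the
    figures above. Illustrative model only."""
    if disk_gib < 30:
        raise ValueError("Perimeta requires a disk of at least 30 GiB")
    remaining = disk_gib - 10.0           # first 10 GiB: system use
    billing = remaining * 0.54            # XML billing files / Rf cache
    ftp = max(remaining * 0.05, 1.0)      # FTP-accessible storage, min 1 GiB
    return {"system": 10.0, "billing": billing, "ftp": ftp}

# e.g. the 60 GiB medium-capacity ISC default: 27 GiB for billing,
# 2.5 GiB for FTP-accessible storage.
alloc = perimeta_disk_allocation(60)
```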

Note:

The figures here represent the storage space that must be presented to Perimeta. The actual disk
size may be more for redundancy, e.g. if using RAID10 it is typically double.

Disk speed

The host disk carrying Perimeta's virtual disk must support a minimum disk read speed of 100MB/s.
The disk must meet the speed constraint at all times (e.g. even during a RAID rebuild).

3.2.8 Changing VM resources

Note:

This section does not apply to vCloud deployments. In a vCloud deployment, lifecycle operations
are managed by the SIMPL VM, and VM sizes are defined in a Solution Definition File (SDF).
Resizing VMs requires that the SDF is updated and the VM(s) redeployed.

OpenStack allows you to change the resources assigned to VMs after creation. For example,
you can assign more or fewer vCPUs, or more or less RAM or disk space. Where supported by
the Metaswitch product, this can be a way of migrating a product instance between the standard
supported configurations outlined in VM resource specification tables on page 35.

Product support for changing the resources assigned to VMs after creation is as follows.


Clearwater Core, MVM, BGCF VM and Metaswitch CCF

Changes are not supported. MVM, BGCF VM and Metaswitch CCF each have a single specification
for production VM instances. Clearwater Core has low and high capacity specifications for production
SCNs, and a single spec for all other production node types.

You can deploy more or fewer instances to change the capacity of the system.

Perimeta and the Secure Distribution Engine (SDE)

Changes are not currently supported for Perimeta or the Secure Distribution Engine. If you need to
change the size of your Perimeta or Secure Distribution Engine VMs, please contact your Metaswitch
Support representative to discuss this.

DCM

Changes are not supported. There is only one valid spec.

Distributed Admission Manager (DAM)

Changes are not supported. There is only one valid spec for production deployments.

MetaSphere EAS virtual Dual Server Systems (DSS)

Changes are not supported. There is only one valid spec for production deployments. Multiple Dual
Server Systems can be combined in a Service Federation if you need more capacity.

Storage Cluster

Changes are not supported. If you require more disk space for your Storage Cluster, you must scale
it out by deploying additional VMs, as described in Scaling the Storage Cluster in the Storage Cluster
Deployment Guide (OpenStack). You must contact your support representative before attempting to
scale out your Storage Cluster.

Other products

Other products support changing vCPU and RAM resources to move between supported specs.
This includes reducing the RAM size of the SAS VM from 64GiB to 24GiB after upgrade to V9.2
from an earlier version. See Resizing instances of Metaswitch products running on OpenStack in
the Metaswitch Products OpenStack Deployment Procedures Guide for VM resizing instructions on
OpenStack.
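As a sketch of what such a resize involves at the OpenStack level (the flavor and server names below are hypothetical, and you should follow the Procedures Guide rather than this fragment):

```python
def resize_commands(server: str, flavor: str) -> list:
    """Return the OpenStack CLI commands for a resize, as a dry run.
    The supported-flavor whitelist is hypothetical; populate it from
    the specs in "VM resource specification tables"."""
    supported = {"sas-lab", "sas-production-24gib", "sas-production-64gib"}
    if flavor not in supported:
        raise ValueError(f"{flavor} is not a supported spec")
    return [
        f"openstack server resize --flavor {flavor} {server}",
        # On older python-openstackclient releases the second step is
        # "openstack server resize --confirm <server>" instead.
        f"openstack server resize confirm {server}",
    ]

for cmd in resize_commands("sas01", "sas-production-24gib"):
    print(cmd)
```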

Note that specifications other than those listed in VM resource specification tables on page 35
are not supported. Disk size changes are not supported, although SAS and MetaSphere EAS single
instance or pooled server systems support the attachment of additional block devices to increase the
available storage.

When adding disk capacity to SAS and to MetaSphere EAS single instance or pooled server systems,
careful consideration must be given to the iops capability of the underlying disk elements. More
detailed advice is available in the relevant product documentation, but our recommendation is that the
data store you plan to use for these products be extensible.


3.3 Storage options

General considerations
Storage in Metaswitch Products Virtual Infrastructure Requirements Guide describes in general terms
the types of storage offered by virtual infrastructures and how Metaswitch products use them. This
can be summarized as follows:

• All virtual infrastructures provide virtual block devices to VMs which look like local disks in that they
can be mounted, partitioned and formatted.
• Virtual block devices may be ephemeral or persistent.

• Ephemeral means the device is tied to the lifecycle of a specific VM; when that VM fails or is
destroyed the device fails or is destroyed with it, and it cannot subsequently be attached to any
other VM.
• Persistent means that the device has a lifecycle independent of any specific VM; it can be
attached, detached and reattached to VMs in sequence, and it survives the failure of any VM to
which it is attached.
• Virtual block devices may or may not be backed by redundant storage.
• Some virtual infrastructures cleanly distinguish between ephemeral and persistent storage; others
do not.
• Some virtual infrastructures tie the concepts of ephemeral vs. persistent with redundant vs. non-
redundant; others do not.

OpenStack storage model


OpenStack distinguishes between ephemeral and persistent storage. Ephemeral storage is provided
by the Nova service, and persistent storage by the Cinder service. Nova storage is usually (but not
always) served from local disks on the compute hosts. Cinder volume storage is served across the
network and may be provided by a mix of underlying storage options, ranging from software-defined
storage solutions (such as Ceph) making use of local disks in dedicated storage hosts through to a
dedicated device such as a SAN. Ephemeral storage is usually non-redundant and persistent storage
is usually redundant, though this is not always the case.

• Distributed Capacity Manager, Clearwater Core, Metaswitch CCF, MetaView Statistics Engine,
Metaswitch Deployment Manager, MVM, Secure Distribution Engine (SDE) and the Rhino nodes
(with the exception of the Rhino TSN, which is described below) use ephemeral (nova) storage for
their boot device.
• Perimeta can use either ephemeral (nova) or persistent (cinder-volume) storage for its boot device.
• OBS, SAS, Storage Cluster, and Rhino TSN have particular requirements discussed in the
following sections.

• OBS on page 64
• SAS on page 65
• EAS Storage Cluster and Rhino TSN on page 65


• All other products must use persistent (Cinder) storage for both their boot disk and any additional
data volumes they may require. The storage must be redundant, as the products' redundancy and
recovery mechanisms depend on this.

Attention:

Several Metaswitch products use LVM to combine space on multiple attached virtual disks into
a single volume. However, an OpenStack bug affecting LVM means that you should exercise
caution when using this feature. By default, guest disks are visible as block devices in the host
operating system, meaning that the host can see and manipulate LVM config within the guest
disks. If a single host is running multiple Metaswitch VMs it may therefore see multiple volume
groups with the same name corresponding to different guests. After certain events (for example a
host reboot) the host may erroneously interpret this as a misconfiguration and attempt to resolve
it by manipulating LVM metadata, resulting in guest VMs being unable to contact their storage
volumes. This bug applies to all iSCSI-based cinder storage, including disk arrays and SANs.

To prevent the bug from affecting the performance of Metaswitch VMs, do one of the following on
each host that runs the nova-compute or cinder-volume service:

• Disable LVM on the host


• Configure an LVM filter on the host to prevent it from seeing and manipulating LVM metadata
within OpenStack Cinder volumes.
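One way to implement the second option is an LVM global_filter. The device names below are examples only; accept the block devices that hold the host's own volume groups and reject everything else:

```ini
# /etc/lvm/lvm.conf on each host running nova-compute or cinder-volume
devices {
    # Accept only the host's own physical volumes (example devices),
    # then reject all other block devices, so the host never scans or
    # manipulates LVM metadata inside guest disks or Cinder volumes.
    global_filter = [ "a|^/dev/sda2$|", "a|^/dev/md0$|", "r|.*|" ]
}
```

After changing the filter, regenerate the host's initramfs if LVM is activated at boot, so that the filter also applies during early boot.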

Attention:

The Cinder LVM implementation works by creating a logical volume on a single storage host and
exposing it as an iSCSI target. It is therefore a single point of failure and not suitable for providing
persistent storage for Metaswitch products in production environments.

Many products require a persistent Cinder volume in addition to their boot device. The tables in
VM resource specification tables on page 35 list the size required for both boot devices (whether
nova or cinder) and additional volumes.

Note:

The storage size specified is that used by the application. The raw storage required to provide that
may be significantly higher, depending on the level of redundancy used by the virtual infrastructure.

3.3.1 OBS
OBS is responsible for the geo-redundant storage of subscriber data. Internally, it uses the Cassandra
NoSQL DB. Cassandra internally stores data in high-churn journal files, which are small but need
very high I/O rates for low-latency performance, and longer-lasting data tables (SSTables), which
may be large and still need reasonably good I/O rates. Further, Cassandra manages replication at the
application level, spreading the data across multiple nodes. It does not require or expect individual
nodes to have redundant storage, and in fact any redundant storage is wasteful.


• The optimal deployment model for OBS in OpenStack is to use SSD-backed ephemeral nova
storage for both journals and SSTables.
• Next best is to use ephemeral nova storage for journals and put the SSTables on persistent cinder
storage, using the lowest redundancy level possible that is consistent with no two OBS VMs
sharing a single point of failure.

3.3.2 SAS
SAS places high demands on both storage capacity and I/O rates - see SAS data storage on page
57. It is usually regarded as mission critical, meaning that its data must be stored redundantly and
persistently so that no single disk failure loses an entire week's data set. However, it is also supported
on non-redundant and non-persistent storage for scenarios where it is acceptable to lose a week's
data set in the event of a single hardware failure.

• If SAS data is mission critical, it must use redundant persistent Cinder storage.
• If SAS data loss is acceptable, it can use ephemeral nova storage, though it is unusual to find
OpenStack flavors supporting the high disk capacities required.

3.3.3 EAS Storage Cluster and Rhino TSN


From MetaSphere EAS V9.6, pooled server systems on virtual platforms can be deployed alongside a
Storage Cluster instead of a NAS device. A single Storage Cluster can be used to provide a data store
for a single pooled server system, or two separate pooled server systems (each with a local storage
cluster) can be linked together to form a MetaSphere EAS Geographically Redundant (GR) system.

A Rhino TAS Storage Node (TSN) is a VM that runs two Cassandra databases and provides these
databases' services to the other node types in a Rhino VoLTE TAS deployment. Rhino TSNs run in a
cluster with between 3 and 30 nodes per cluster depending on deployment size, with load-balancing
performed automatically.

The Storage Cluster and a Rhino TSN cluster both provide a redundant data store, designed
to be deployable on non-redundant underlying hardware. The underlying storage for these
components must provide performance similar to an SSD, including high rates of IOPS (as
indicated in VM resource specification tables on page 35) and a read latency of less than a
millisecond. The following is our standard recommended specification, although you may choose
to use a different specification as long as it meets the requirements given above.

• Fast underlying block devices. We strongly recommend that you use SSDs. The overall latency
and throughput of IOPS is normally a more important consideration than capacity for Storage
Cluster VMs or Rhino TSNs.
• Non-redundant storage.

• Storage Cluster VMs and Rhino TSNs replicate the data around the local cluster and operate
using a quorum concept. This means that data is written out 3 times.
• Additional storage redundancy is often either wasteful or can in some cases cause degraded
performance (for example, if the storage layer makes the guest VM wait for replication to
complete after each storage request).


• Running Storage Cluster VMs or Rhino TSNs on underlying block devices using distributed
storage technology such as Ceph RBD or VMware vSAN may cause performance issues. To
avoid this, please discuss the storage layer design and configuration with your NFVI provider.
For example, consider configuring the storage layer to only use storage that is local to the host
and / or avoid replicating the data between Storage Cluster VMs or Rhino TSNs.
• 10Gb NICs available to Storage Cluster VMs or Rhino TSNs, both for application layer and
storage layer network access.
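Because the data is written out 3 times, the raw block capacity you must provision across the local cluster is a multiple of the usable capacity. The following trivial sketch of that planning arithmetic is our own helper, not a product tool:

```python
REPLICATION_FACTOR = 3  # Storage Cluster VMs and Rhino TSNs write data 3 times

def raw_storage_gib(usable_gib: float,
                    replication: int = REPLICATION_FACTOR) -> float:
    """Raw (non-redundant) block storage needed across the cluster to
    provide a given usable capacity, given application-level replication."""
    return usable_gib * replication

print(raw_storage_gib(500))  # 1500.0
```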

It is often acceptable for lab systems to use lower specifications. Note that while this will often provide
sufficient performance for lab testing, the impact on performance is generally non-linear, and intensive
operations such as upgrades may take longer than they would on a Storage Cluster or a Rhino TSN
cluster using underlying hardware that meets the recommendations above.

Note:

Internally, Storage Cluster VMs and Rhino TSNs use Ceph functionality. Metaswitch are not
endorsed by or associated with Ceph or the Ceph trademark in any way. However, this information
may be useful when planning the provision of block devices for Storage Cluster and Rhino TSN
deployments.

3.4 Storage for Metaswitch products on OpenStack

Note:

OBS, SAS and Storage Cluster have specific storage requirements that differ from the general
guidelines given in this section. You should use the information in the following sections for these
products.

• SAS on page 65
• OBS on page 64
• EAS Storage Cluster and Rhino TSN on page 65

OpenStack makes three types of storage available to VM instances.

• Ephemeral storage, via Nova, which is also used by default as the boot device for VMs booted
from images. Ephemeral storage devices appear like normal hard disks to the VM instance. Some
Metaswitch products use this for their bootimages, and some of those use it for all data; others
use volume storage for their bootimages, as described below.
• Volume storage, via Cinder. Volume storage devices appear like normal hard disks to the VM
instance, but can be attached or detached through Cinder. Some Metaswitch products use
this for storing data other than bootimages, such as software images, customer config and
diagnostics data. Additionally, some Metaswitch products that do not use ephemeral storage for
their bootimages will use volume storage instead, making it possible to recover the instance in the
event of host failure (assuming that the volume storage is shared and redundant).


• Object storage, via Swift. This provides a key-value store to VM instances. It is designed for
sharing structured data between instances and is accessed via a special network API. MVM
requires access to storage via this API, which may be provided either via Swift or an alternative
solution not native to OpenStack. Other Metaswitch products do not use object storage.

See Per-product VM requirements on page 32 for details of which Metaswitch products boot from
ephemeral storage, and which from volume storage.

Ephemeral storage may be provided locally on the compute host, using files on the compute host's
filesystem, or on shared storage made available over the network. On production systems, it must use
RAID or an equivalent mechanism for redundancy, whether local or shared.

Volume storage must be provided by shared storage using either a distributed storage cluster, e.g.
Ceph, or by dedicated storage array hardware. Shared storage is always expected to use RAID or
an equivalent mechanism for redundancy. We do not require a particular model of redundancy to be
used, only that the storage meets the performance requirements with the chosen model.

Note:

The Cinder LVM implementation works by creating a logical volume locally on a single storage
host (running the cinder-volume service), and then exposing that volume as an iSCSI target. The
storage host for Cinder is thus a single point of failure.

On production systems, you must ensure that the network bandwidth used for shared storage traffic
does not contend with the bandwidth used for service traffic. Additionally, as discussed in Storage
options on page 63, if you are using shared storage, you should ensure that sufficient disk space
and IOPS are reserved for the storage cluster software (e.g. Ceph) to avoid resource contention with
the VM instances.

3.4.1 High performance storage


MetaSphere EAS single instance or pooled server systems, SAS, and Storage Cluster require access
to large amounts of storage with a guaranteed I/O rate available to them.

Typically this will mean that these products require storage provided by dedicated hardware in order
to deliver the necessary I/O performance.

See Volume storage on page 82 for details of the hardware we use for large scale data storage in
Metaswitch's labs.

In order to use a storage device such as an iSCSI storage array in one of your OpenStack clouds, you
must choose a device that is supported by a Cinder driver.

3.5 Network configuration


As described in the Virtual Infrastructure Requirements Guide, VM instances require direct access to
external networks without an intervening NAT for all networks that VM instances will use, other than
the internal high availability network.


Additionally, if you are using multiple clouds for cloud redundancy as described in High availability
on page 14, the internal high availability network must also be a provider network that connects the
OpenStack clouds together but is not connected to the wider network.

3.5.1 Network interfaces


The various products use the following traffic types. The installation instructions for each product
describe how to map these to virtual network interfaces when you create your VM instances.

Product Mgt Sig Media Service LI HA Internet

CFS 1 1 1 1

AGC/ MGC 1 1 1 1

RPAS 1 5-6

OBS 1

MRS 1 1 1

MVD 1 1 1

ESAP 1 1

EAS (single instance) 1 1 1

EAS (virtual Dual Server System) 1 1 1

EAS (pool server system) 1 1 1 1

N-Series 1 1 1

Perimeta (all) 1 1-8 1

AMS 1 1 1

Clearwater Core SPN/DGN/SCN 1 1

BGCF VM 1 1

Clearwater Core OAN 1 1 1

Metaswitch CCF CPN/CSN 1 1


MVM 1 See the note below the table for all other traffic types.

MVS 1

SAS 1

DCM 1

MVSE 1

MDM 1 1

Rhino nodes 1 1-3 1 1

Secure Distribution Engine (SDE) 1 2

QCall PCC/CRS 1 1

QCall SVS 1 1 1

ServiceIQ Monitoring (SIMon) 1

Storage Cluster 1

DAM 1

Note:

MVM requires a Management network interface and, beyond that, supports the assignment of the
majority of its network traffic to virtual network interfaces of your choice.

Note:

In deployments where MDM is not used to provide DNS to other nodes in the solution, it can be
deployed without a signaling interface.

The interfaces shown in the table above are as follows.


• Mgt - Management
• Sig - Signaling
• Media - Media
• Service - Service, used for signaling and/or media by Perimeta, for signaling by the Secure
Distribution Engine, for non-VoIP service traffic from end-user clients by EAS pool server systems,
and for smartphone app access to OAN, AMS and the Sentinel Authentication Gateway on Rhino
MAG nodes.

Note:

Aside from the identical name, the various implementations of the Service interface are
otherwise dissimilar. It is therefore expected that the Service interfaces for EAS, Perimeta, OAN,
AMS, and Rhino MAG nodes will each connect to a different network.

• LI - Lawful Interception
• HA - High Availability, used for communication between the two halves of a 1+1 redundant pair or
for cluster HA for N+K deployments.
• Internet - Used for resources accessed over the public internet.

These interfaces are used in the following ways:

• The products put HA traffic (which is the traffic that flows between VM instances of products
deployed in a 1+1 HA arrangement) on the HA virtual network interface, with the following
exceptions.

• The Secure Distribution Engine puts HA traffic on the service network interface for the internal
network.
• QCall and ServiceIQ Monitoring put HA traffic on the Management interface.
• The products always put Management traffic on the Management virtual network interface.
• The products always put Internet traffic on the Internet virtual network interface.
• The products can either put Signaling, Media and LI traffic on their own virtual network interfaces,
or combine them with each other and / or with the Management traffic on the Management virtual
network interface. Combine them if you want to use the same IP address for more than one of
those traffic types. For some products, you may need to create placeholder network interfaces to
ensure that the expected number of interfaces is presented to the VM. The installation instructions
for the product describe how to do this.

Attention:

For the purpose of designing your network, you need to refer to the detailed product
documentation to determine exactly which traffic each individual product classifies as
Management, LI, and so on.

Note the following additional points.

• Some products require multiple IP addresses per network interface. Refer to the IP Network
Design Guide for details on IP address requirements, and to the OpenStack Deployment
Procedures Guide for details on how to configure OpenStack to allow the instance to use these
additional addresses.
• For Perimeta high performance networking considerations, see Specific high-performance
requirements for Perimeta, Rhino nodes and the Secure Distribution Engine on page 74.

3.5.2 Bandwidth
Bandwidth required between products and between different components is no different in a
virtualized deployment from that in a hardware appliance system. See the relevant products' IP
Network Design Guide manuals for more information.

However, with a virtual deployment it is also necessary to consider bandwidth between compute
nodes and storage. The bandwidth you need in your deployment depends on your pattern of use,
but you can approximate it based on the iops demands of products using shared storage (calculated
by applying the information in VM resource specification tables on page 35 to your VMs). The
conversion factor Metaswitch assumes is that 1 Gigabit/s of bandwidth is required for each 400 iops.
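This conversion factor can be applied directly. The following small helper function is our own sketch for approximating storage network bandwidth from aggregate iops:

```python
def storage_bandwidth_gbps(total_iops: float) -> float:
    """Approximate storage network bandwidth using the stated rule of
    thumb: 1 Gigabit/s of bandwidth for each 400 iops."""
    return total_iops / 400.0

# e.g. a SAS VM demanding 800 write IOPS plus a Storage Cluster VM at
# 75 IOPS on the same storage network
print(storage_bandwidth_gbps(800 + 75))  # 2.1875 (Gbit/s)
```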

Note:

If you are using Perimeta with virtio interfaces, see Networking on page 84 for information about
benchmarking network performance to determine the impact of using virtio interfaces.

Note:

Remember that the filer in an EAS pool server system is accessed by the guest. NFS traffic to the
filer is therefore via the VM networks and not the storage network.

3.5.3 IP address assignment / DHCP


Most Metaswitch products can use DHCP to receive their initial IP address configuration.

• Clearwater Core V11.5 and above, Secure Distribution Engine, BGCF VM, Storage Cluster,
Distributed Admission Manager (DAM) and Distributed Capacity Manager (DCM) receive their
initial IP addresses from OpenStack at deployment.
• For Clearwater Core V11.4 and below, initial IP addresses can also be assigned statically via
virtual console access.

Once their management IP address has been configured, all products can be accessed over a Craft
or SSH interface, which must be used to configure their IP addresses on other interfaces such as
signaling, media and HA. The exceptions to this are:

• Clearwater Core V11.5 and above and Secure Distribution Engine, which receive service IP
addresses at deployment;
• Clearwater Core V11.4 and below, which will use DHCP to configure its signaling IP address if
DHCP has been used to configure the management IP address;
• Distributed Capacity Manager and DAM, which only have a management interface and so need no
further configuration.


Attention:

It is important that the DHCP service provides continuity of IP addresses. That is, once it has
allocated an IP address to a VM, it continues to serve the same IP address to that VM in future,
even across reboots. This is the default behavior of the OpenStack DHCP service.
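
If you prefer to pin addresses explicitly rather than rely on DHCP lease continuity, one hypothetical approach is to pre-create a Neutron port with a fixed IP address and boot the VM on that port. The network, subnet, image, flavor and instance names below are illustrative, not taken from product documentation.

```shell
# Pre-create a port with a fixed IP address on the management network
openstack port create --network mgmt-net \
    --fixed-ip subnet=mgmt-subnet,ip-address=10.0.0.20 cfs-a-mgmt

# Boot the instance attached to that port, so its address never changes
openstack server create --flavor cfs-high --image cfs-image \
    --nic port-id=cfs-a-mgmt cfs-a
```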

Note:

Metaswitch products do not support adjusting the MTU via DHCP.

3.5.4 Key-based login


There are two generally recognized ways for providing SSH access to VMs. These are

• username/password challenge (where the product is pre-configured to know about valid username/
password combinations); and
• key-based login (where a public key for a given user is stored on the product).

All Metaswitch products except MVM support username/password challenge. Additionally, Distributed
Capacity Manager, Perimeta, Secure Distribution Engine (SDE), BGCF VM, Storage Cluster and
Distributed Admission Manager (DAM) support key-based login using keys injected into the VM via
the OpenStack compute metadata service.

For Rhino nodes, key-based login is required for upgrade and patching purposes from a management
host.

For MVM and Storage Cluster:

• The root user cannot log in over SSH.
• The centos user (for MVM) and qs-admin user (for Storage Cluster) can log in over SSH only by
key-based login, not by password.
• You can use Heat templates to add other local users. Local users can log in over SSH only by
key-based login, not by password.

DAM supports username/password challenge only for configuration changes through DCS, and only
in conjunction with a RADIUS server. For SSH access, key-based login is the only option.

For Secure Distribution Engine and BGCF VM, username/password challenge is only supported in
conjunction with a RADIUS server.

For MDM, only key-based login is supported. The same key used for login must be used by the
ServiceIQ Management Platform (SIMPL) for updates and upgrades.

3.5.5 Security groups


Security groups must be configured so that unfiltered connections can be made between all
Metaswitch products in a deployment. We recommend that you put all VM instances in the same
security group, and configure the security group to allow all traffic within that group.


Aside from the guidance below, you will need to consider "firewall" rules for many other traffic
types used by Metaswitch products. See the individual product documentation for the protocols each
product uses.

SCTP

CFS, AGC, MGC, MVD, RPAS, MVS, EAS, and the Rhino SMO node (for both OCSS7 and IP-SM-GW
applications) use SCTP to communicate with one another.

The Sentinel VoLTE application on the Rhino MMT node, the BSF Service on the Rhino MAG
node and the Sh Cache Microservice on the Rhino ShCM node support SCTP as transport for the
Diameter protocol. SCTP may or may not be required for these nodes depending on the Diameter
implementation in your deployment.

By default, OpenStack does not include rules for SCTP traffic, so SCTP traffic will be filtered from
any traffic that goes between multiple security groups or multiple clouds. If your deployment requires
SCTP, you must therefore create a rule that permits SCTP traffic. SCTP is protocol number 132.

IPsec

CFS, AGC, MGC, MVD, RPAS and ESAP can be configured with secure connections to MVS. These
secure connections use IPsec. By default, OpenStack does not include rules for IPsec traffic, so IPsec
traffic will be filtered from any traffic that goes between multiple security groups or multiple clouds.
Therefore you must create a rule that permits IPsec traffic. IPsec uses protocol numbers 50 and 51.

This also applies to Perimeta service interfaces when configured to use IPsec tunnels except when
using SR-IOV (which bypasses security group rules).
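
Assuming a single security group as recommended above, the SCTP and IPsec rules described in this section could be created along the following lines. The group name metaswitch-sg is hypothetical, and default egress rules are typically unrestricted, so only ingress rules are added here; adjust to your own security policy.

```shell
# SCTP is IP protocol 132
openstack security group rule create --protocol 132 \
    --remote-group metaswitch-sg metaswitch-sg

# IPsec uses IP protocols 50 (ESP) and 51 (AH)
openstack security group rule create --protocol 50 \
    --remote-group metaswitch-sg metaswitch-sg
openstack security group rule create --protocol 51 \
    --remote-group metaswitch-sg metaswitch-sg
```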

Redundant cloud deployments

If redundant cloud deployments are in use, the security groups in each cloud deployment must
additionally be configured to allow all traffic, including SCTP and IPsec traffic, to and from the IP
ranges owned by the other cloud deployments.

3.5.6 Specific requirements for Perimeta virtual interfaces


Perimeta can have two, four, six or eight virtual NICs (vNICs) for service traffic.

• For high availability (HA) systems, you can have two, four, six or eight service vNICs.
• For standalone systems, you can have two service vNICs.

Perimeta requires additional NICs for the management and high availability networks. Calculate the
total number of vNICs required as follows.

Table 4: Total number of vNICs required

System type          Total number of vNICs
High availability    Number of service vNICs + 2 (so four, six, eight or ten vNICs)
Standalone           Three (2 service vNICs + 1)
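
The arithmetic in Table 4 can be sketched as a small helper; the function name and the validation of permitted service vNIC counts are illustrative only.

```python
def total_vnics(service_vnics: int, high_availability: bool) -> int:
    """Total Perimeta vNICs: the service vNICs plus management and HA vNICs
    for an HA system, or plus a management vNIC only for standalone."""
    allowed = {2, 4, 6, 8} if high_availability else {2}
    if service_vnics not in allowed:
        raise ValueError(f"unsupported service vNIC count: {service_vnics}")
    return service_vnics + (2 if high_availability else 1)

print(total_vnics(2, high_availability=False))  # 3
print(total_vnics(8, high_availability=True))   # 10
```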

Perimeta supports on-instance link redundancy, by pairing two NICs together in a port group. If you
use SR-IOV for high performance, we recommend that the vNICs (the Virtual Functions, or VFs)
in the port group use different physical NICs (Physical Functions, or PFs). This provides maximum
link redundancy. For more information on SR-IOV, see Specific high-performance requirements for
Perimeta, Rhino nodes and the Secure Distribution Engine on page 74.

Most OpenStack deployments do not currently support VLANs being exposed to guest VMs. As
such, Perimeta disables configuration of VLANs on OpenStack by default. If you have an OpenStack
deployment that does support guest VLANs and you want to make use of this in Perimeta, you can do
so in the following cases.

• Perimeta uses virtio (not SR-IOV or PCI passthrough). In this case, you must orchestrate
the enabling of VLAN configuration, as described in Template format in Perimeta Automated
Orchestration Guide (OpenStack environments).
• Perimeta uses Cisco NICs using the enic driver (for example, VIC 1340) for SR-IOV or PCI
passthrough. In this case, Perimeta supports up to 1024 VLANs per low capacity Session
Controller and 4095 VLANs per high capacity Session Controller.
• Perimeta uses the Intel X710 or XXV710 Network Adapter with the i40evf driver for SR-IOV. In this
case, Perimeta supports up to 8 VLANs for each service vNIC. Note that the Intel XL710 Network
Adapter is not supported for SR-IOV.

For Perimeta Session Controllers that are using SR-IOV with other NIC drivers, a single VLAN is
allowed for each service interface (not configured in the guest). If you need both maximal session
capacity and to connect more than 8 VLANs into a single Perimeta Session Controller, please contact
your Metaswitch support representative to discuss your options.

You may want to have Perimeta connect multiple vNICs to the same neutron network - e.g. if some of
the service traffic (e.g. Rf charging) must be routed onto the management network. Perimeta supports
VMs being created with multiple vNICs connected to the same network. This is enabled by default.

3.5.7 Specific high-performance requirements for Perimeta, Rhino nodes and the Secure
Distribution Engine
As described in the Virtual Infrastructure Requirements Guide, for high-performance operation
Perimeta, the Rhino nodes and the Secure Distribution Engine (SDE) must use a network acceleration
technology. The supported mechanisms are SR-IOV, PCI passthrough and DPDK.

• SR-IOV, Single Root I/O Virtualization, allows a single NIC (termed the Physical Function -
PF) to be shared between a bounded number of VMs, providing a Virtual Function (VF) network
interface to each VM.
• PCI passthrough allows the VM to have full access to and full control of a physical NIC (or other
device) connected to a PCI bus. This is an alternative to SR-IOV.


• DPDK, Intel's Data Plane Development Kit, enables efficient packet processing by processing
vNIC packet queues in a tight poll-mode loop, rather than using a traditional interrupt-handling
approach.

Table 5: Supported network acceleration technologies by product

Perimeta

• SR-IOV (host networking): supported on the following NICs.

  • Intel NICs using the 82599 controller and the IXGBEVF driver (e.g. Intel X520 Server Adapter).
  • Intel X710 or Intel XXV710 Network Adapter with the i40evf driver. Note that the Intel XL710
    Network Adapter is not supported.
  • Cisco NICs using the enic driver (e.g. VIC 1340).

  The Perimeta MSRP MSC does not support SR-IOV.
• PCI passthrough (host networking): supported on Cisco NICs using the enic driver (e.g. VIC
  1340). Intel NICs using the 82599 controller have not been tested by Metaswitch and are therefore
  unsupported. If you want to use Intel NICs, consult your support representative.
• DPDK (mode in the guest): supported (on certain systems) and enabled by default (see DPDK
  mode on Perimeta on page 78).

Rhino nodes

• SR-IOV (host networking): supported on the following NICs.

  • Intel NICs using the 82599 controller and the IXGBEVF driver (e.g. Intel X520 Server Adapter).
  • Intel X710 or Intel XXV710 Network Adapter with the i40evf driver. Note that the Intel XL710
    Network Adapter is not supported.
  • Cisco NICs using the enic driver (e.g. VIC 1340).
• PCI passthrough (host networking): supported on Cisco NICs using the enic driver (e.g. VIC
  1340). Intel NICs using the 82599 controller have not been tested by Metaswitch and are therefore
  unsupported. If you want to use Intel NICs, consult your support representative.
• DPDK (mode in the guest): not applicable.

Secure Distribution Engine

• SR-IOV (host networking): strongly recommended on the service networks. Supported NICs:

  • Intel X520 NICs with the IXGBEVF driver.
  • Intel X710 NICs with the i40evf driver.

  This is the standard deployment model and is used to provide guaranteed performance metrics. If
  you need to use a different NIC in a production environment, you must first discuss your
  requirements with your support representative, including a plan for qualifying your NICs.
• PCI passthrough (host networking): not supported.
• DPDK (mode in the guest): always enabled for the service networks.

It is vital to understand which, if any, of these acceleration technologies will be used before doing
capacity planning, as the choice can make a huge difference to performance.

Note:

Combining SR-IOV and DPDK mode on Perimeta provides the best media throughput. You should
only use PCI passthrough if SR-IOV is not suitable for your deployment.

For Perimeta ISCs or MSCs without SR-IOV or PCI passthrough, the packets-per-second processing
limits of the vSwitch in the host mean that, if multiple media-processing VMs are deployed on the
same host, overall media capacity without media impairment may be restricted to less than the
combined VM capacities. As a rough rule of thumb, a single OpenStack host using Open vSwitch can
only support a few hundred media sessions, however many VMs are assigned to it.


For Rhino nodes without SR-IOV or PCI passthrough, the signaling latency increases significantly,
reducing the number of new sessions per second the products can handle.

Networking options

SR-IOV

SR-IOV provides greatly improved performance at high packet rates because the NIC hardware
provides the VF directly to the VM, without indirection through the vSwitch in the hypervisor.

SR-IOV has some specific hardware and hypervisor requirements.

• The network cards must support it and have appropriate PF drivers in the hypervisor.
• Special handling is needed for the VF driver in the VM. The Perimeta, Secure Distribution
Engine or Rhino VM must contain a VF driver that matches the host PF driver. For Perimeta,
you can determine whether the correct driver is already supported by referring to
https://communities.metaswitch.com/docs/DOC-154762 on the Metaswitch Support Community.
• The host CPU must support I/O MMU virtualization (Intel VT-d). Most recent Intel CPUs have this
support.
• If you are using SR-IOV with Perimeta VM(s) that use Intel X710 Network Adapters, you must set
network_allocate_retries=1 in your nova configuration file (nova.conf).

SR-IOV also has some drawbacks.

• It requires careful management of PF/VF driver compatibility.


• It does not allow VLAN trunking (using multiple VLANs to separate multiple service interfaces on
top of the same virtual network interface), except in the following circumstances.

• Perimeta on hosts with Cisco NICs using the enic driver. On these hosts, Perimeta supports up
to 1024 VLANs per low capacity Session Controller and 4095 VLANs per high capacity Session
Controller.
• Perimeta or Secure Distribution Engine on hosts with NICs using the Intel 710 series controller
and i40evf driver. On these hosts, Perimeta supports up to 8 VLANs for each Virtual Function
(VF) network interface. Secure Distribution Engine supports up to 60 VLANs in total.
• If Perimeta is running on a host using NICs with the Intel 710 series controller and i40evf driver for
SR-IOV, each port group can support a maximum of 6 local IPv6 addresses. The same address on
multiple service interfaces counts multiple times towards this limit.
• It bypasses OpenStack security group rules implemented in the host. For Perimeta and the Secure
Distribution Engine, this is unlikely to be an issue as they are security devices themselves.
• It is incompatible with VM migration and VM snapshots.
• It is incompatible with the use of VXLAN, GRE or other overlay networking.

Finally, there are some current OpenStack limitations related to SR-IOV:

• In OpenStack Pike and earlier, VMs with SR-IOV interfaces need to be created in a specific way
using OpenStack APIs or the command line tools. They cannot be created through the Horizon
GUI. This limitation is not present in OpenStack Queens and later.


• Link redundancy using SR-IOV is not supported as there is no way to get OpenStack to choose
different PFs for two redundant VFs allocated to a guest.

Note:

If you are using SR-IOV with OpenStack Pike or later, you can choose to configure PCI device
weighting for the Nova scheduler to ensure that any VMs that do not use SR-IOV are preferentially
assigned to hosts which do not support it, leaving those hosts that do support SR-IOV for the VMs
that require it. For more information, see Anti-affinity and scheduling on page 21.

The Perimeta MSRP MSC does not support SR-IOV.

PCI passthrough

PCI passthrough provides greatly improved performance at high packet rates because the NIC
hardware provides the hardware PCI function directly to the VM.

PCI passthrough has some specific hardware and hypervisor requirements.

• The host CPU must support I/O MMU virtualization (Intel VT-d).
• The network cards must support it.
• You cannot have multiple NICs of a given type on different networks.

PCI passthrough has the same drawbacks as SR-IOV (see SR-IOV on page 77), except:

• It does not require careful management of PF/VF driver compatibility.


• The NIC cannot be shared between multiple VMs.

Options in the guest

For high performance, Perimeta and the Secure Distribution Engine can process incoming packets
using DPDK mode instead of a lower-performance interrupt handling mode. DPDK mode is always
enabled on the Secure Distribution Engine. Rhino nodes do not support DPDK as a network
acceleration technology.

DPDK mode on Perimeta

DPDK mode allows the Session Controller to dedicate vCPUs to packet processing. Perimeta can use
DPDK mode if all the following hold.

• You are using the medium or high-capacity MSC or ISC VM sizes.


• Perimeta has a poll-mode driver that is compatible with all the service vNICs:

• Any SR-IOV vNIC that uses the 82599 controller and the IXGBEVF driver (e.g. VFs provided
by an Intel X520 Server Adapter in the host). In this case, Perimeta will automatically enable
DPDK mode.
• vNICs using the virtio driver. This is only supported on recommendation from Metaswitch and
only when you are using a suitable accelerated network layer in the host. This function is only
available in an orchestrated environment and needs to be explicitly enabled using Perimeta's
orchestration API.


• (Medium-capacity MSC or ISCs only) VFs provided by an Intel X710 Network Adapter or Intel
XXV710 Network Adapter in the host. In these cases, Perimeta will automatically enable DPDK
mode.

When DPDK mode is enabled, the Perimeta vCPUs and the corresponding hyperthreads or processor
cores that are responsible for running dataplane poll loops will max out at 100% CPU utilization in
normal operation.

DPDK mode can reduce the media transcoding capacity of a Session Controller, as DPDK mode
involves dedicating vCPUs to packet processing. From V4.1.20, you can choose to disable DPDK
mode for Session Controllers with SR-IOV NICs that will be used primarily for media transcoding.
For more information on the media transcoding capacity of Session Controllers with DPDK mode
enabled, see https://communities.metaswitch.com/docs/DOC-275537. For more information on
the media transcoding capacity of Session Controllers with DPDK mode disabled, see
https://communities.metaswitch.com/docs/DOC-275536.

You will usually decide whether or not DPDK mode will be enabled on a Session Controller during
the commissioning process. However, if you subsequently need to change this setting on a Session
Controller with SR-IOV NICs, you can do so by following the steps given in Enabling or disabling
DPDK mode on a virtual machine with suitable SR-IOV interfaces in Perimeta Operations and
Maintenance Guide.

Attention:

Metaswitch strongly recommends that you use DPDK mode on Session Controllers using the
IXGBEVF driver and V4.8 and above. DPDK mode is enabled by default on these Session
Controllers.

3.5.8 Specific requirements for high capacity TLS/TCP performance on Perimeta

You will need to carry out some additional configuration on your host if Perimeta must handle 60,000
or more TCP subscribers and the host network layer that you are using tracks connection state.

The following list describes the specific configuration that you must add if you are using standard
Open vSwitch to support 60,000 or more TCP subscribers. You will need to make similar changes if
you are using another type of vSwitch that does not provide accelerated networking. The exact steps
that you need to take will depend on the type of vSwitch.

Note that there is no need to apply this additional configuration if you are using SR-IOV.

• You must increase the size of the connection tracking tables in the host kernel to allow for more
than 200,000 connections. You can do this using the following commands on the Compute node.

  sysctl -w net.netfilter.nf_conntrack_max=262144
  sysctl -w net.netfilter.nf_conntrack_generic_timeout=120
  sysctl -w net.netfilter.nf_conntrack_tcp_timeout_established=54000
  echo 32768 > /sys/module/nf_conntrack/parameters/hashsize


• You must add the following lines to the /etc/sysctl.conf file.

  # Altered conntrack sizes to support many TCP connections
  net.netfilter.nf_conntrack_max=262144
  net.netfilter.nf_conntrack_generic_timeout=120
  net.netfilter.nf_conntrack_tcp_timeout_established=54000
• You must add the following lines to the /etc/modprobe.d/options.conf file. Note that
you may need to create this file, or use an alternative .conf file in the modprobe directory.

  # Set conntrack hashsize to larger than default value (16384) to support
  # more TCP connections without bad hash table filling
  options nf_conntrack hashsize=32768

3.5.9 Specific requirements for IPv6 on Perimeta and Rhino nodes

If you need to use IPv6 addresses on Perimeta or Rhino nodes, you must note some additional
restrictions.

Perimeta restrictions

• If you need to use IPv6 addresses on the management interface, you cannot use DHCP to assign
them.
• If you are using Open vSwitch, your Linux host's kernel version must be V4.2 or above. A bug in
kernel versions earlier than V4.2 breaks IPv6 fragmentation, which can cause large IPv6 packets
to be dropped. This affects the connections between SSCs and MSCs in a distributed deployment,
and may also affect other interfaces (e.g. SIP).

Rhino node restrictions

Signaling over IPv6 is supported by Rhino VoLTE TAS nodes (MMT, SMO, MAG, ShCM and TSN),
but the cluster management requires an IPv4 interface due to the cluster protocol used. MCP and
MAX nodes do not support IPv6.

3.5.10 Support for SR-IOV for Perimeta's high availability interface


Perimeta supports the use of SR-IOV (as described in Specific high-performance requirements for
Perimeta, Rhino nodes and the Secure Distribution Engine on page 74) for its service interfaces.

Perimeta also supports SR-IOV for its high availability (HA) interface, used for communication
between the two halves of a 1+1 redundant pair or for cluster HA in an N+K deployment.

In order to support SR-IOV for Perimeta's HA interface, your host hardware must be using Intel X710
NICs.


4 Capacity planning
This section is intended to help you estimate how the details of the hardware you use for your hosts
will affect the capacity you can expect from your OpenStack environment.

It sets out the reference hardware corresponding to the capacity figures presented in VM resource
specification tables on page 35 for the different product VM specifications, and then explains how to
estimate the capacity on different hardware. Note that there is differing guidance depending on the
particular product.

In most cases, the capacity figures in VM resource specification tables on page 35 assume a
particular profile of use (for example how many calls and of what duration your residential and
business subscribers make). Your profile of use may differ significantly from the profiles assumed
here: please consult Metaswitch for detailed advice on this.

Any performance or capacity numbers you obtain by following the guidance in this section are
indicative and should be verified in practice.

4.1 Reference hardware


This section outlines the reference hardware used to test the performance of Metaswitch products on
OpenStack and to determine the figures given in VM resource specification tables on page 35.

4.1.1 OpenStack compute hosts


The following reference hardware was used for the hosts on which performance and capacity
measurements were taken for all Metaswitch products other than Perimeta, and for which indicative
numbers are provided in VM resource specification tables on page 35.

• Server model: Dell PowerEdge R630 rack-mounted server


• Processor: dual Intel Xeon E5-2690 v3

• 2.6GHz
• 12 physical cores with hyper-threading (i.e. 24 vCPUs per CPU socket)
• 30MB cache
• 9.6GT/s QPI
• RAM: 12 x 16GiB RDIMM, 2133MT/s, Dual Rank, x4 Data Width
• Network:

• 10-Gb Ethernet NICs for storage network access


• Gb Ethernet NICs for all other traffic
• Open vSwitch 2.3.1 for testing Perimeta performance without SR-IOV

The reference hardware used for the hosts on which performance and capacity measurements were
taken for Perimeta is described in https://communities.metaswitch.com/docs/DOC-231501.


4.1.2 Volume storage


For volume storage, the reference hardware is as follows.

Ceph cluster

For VM instances other than high capacity SAS, we used a Ceph cluster spread across three hosts,
using 18 x 900GB SAS 6Gbps 2.5-in 10k RPM hard drives.

In practice, we expect any volume storage back-end supported by OpenStack to be sufficient.

iSCSI SAN storage array

For high capacity SAS volume storage, we used a 48TB (raw capacity) iSCSI storage array with a
hardware RAID controller, connected via a SAN using 10-Gigabit Ethernet.

4.2 Benchmarking performance


This section explains how to compare the hardware you intend to use with the reference hardware
and estimate the product performance you can expect to get on your hardware. There are sections for
different products and hardware aspects.

Note:

We recommend that you use a benchmarking resource to compare your hardware with the
reference hardware and adjust your performance estimates accordingly. PassMark Software is one
such resource; its benchmarks for CPU performance can be found at http://www.cpubenchmark.net
and those for RAM at http://www.memorybenchmark.net.

As a general rule, it is not permitted to use different quantities of resources from those specified in VM
resource specification tables on page 35, but it is possible to vary the quality of those resources. As
a concrete example, we require that a high-capacity CFS VM has 12 vCPUs, but it is possible to use
CPUs with a core clock speed different from the 2.6GHz in the reference hardware.

There are exceptions to this with disk storage. For example, Perimeta has a recommended amount
of disk space, but you can use different amounts depending on the number of billing logs you want to
retain (as set out in Perimeta data storage on page 60).

As discussed in Power management in Metaswitch Products Virtual Infrastructure Requirements
Guide, if you are attempting to measure performance by running high load through VMs running
on only a small number of cores, your power management policy can significantly impact the
performance provided by your virtualization infrastructure. You may find your measured performance
is lower than expected unless you run separate load on the same virtualization infrastructure to keep
the CPU from entering power-saving mode, or unless you disable power management entirely during
your testing.


4.2.1 CPU speed

Perimeta

For Perimeta, the expected performance on Intel CPUs of the Xeon® Scalable family can be
obtained by scaling the metrics given in https://community.metaswitch.com/support/solutions/
articles/76000006189 linearly with clock speed, up to 3GHz. Beyond 3GHz it is not possible to predict
resulting performance with sufficient precision, so this only applies to scaling down for lower clock
speeds.

For Intel CPUs of other families, benchmarks such as PassMark provide a rough indication for
scaling from the reference environment in https://community.metaswitch.com/support/solutions/
articles/76000006189.

Clearwater Core and Metaswitch CCF

For Clearwater Core and Metaswitch CCF, we expect performance on CPUs of the E5-2600 v3 family
to vary linearly with the clock speed of the CPU cores, up to a cap at 3GHz. Beyond 3GHz it is not
possible to predict resulting performance with sufficient precision.

For Intel CPUs of other families, benchmarks such as PassMark provide a rough indication for scaling
from the reference environment described in Reference hardware on page 81.
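
The linear-with-cap scaling described above might be expressed as follows. The 2.6GHz reference clock comes from the reference hardware; the function is an illustrative sketch, not a Metaswitch sizing tool.

```python
REF_CLOCK_GHZ = 2.6   # reference hardware: Intel Xeon E5-2690 v3
CAP_GHZ = 3.0         # beyond 3GHz, performance cannot be predicted

def scaled_capacity(reference_capacity: float, clock_ghz: float) -> float:
    """Scale a reference capacity figure linearly with core clock speed,
    capping the benefit at 3GHz."""
    return reference_capacity * min(clock_ghz, CAP_GHZ) / REF_CLOCK_GHZ

print(scaled_capacity(1000, 2.6))  # 1000.0 (same as the reference)
print(scaled_capacity(1000, 3.6))  # same result as at 3.0GHz (capped)
```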

Other products

Note:

This section applies to all products except Perimeta, Clearwater Core, and Metaswitch CCF.

For these products you can expect to get roughly the performance set out in the tables in VM resource
specification tables on page 35 provided the per-core performance provided by the CPU you are
using is at least as good as that for the reference hardware. In assessing this, you can use common
benchmarks such as PassMark. However, even if you are using more powerful CPUs, the maximum
expected performance is as set out in those tables.

If the benchmark figure is lower than that for the CPU in the reference hardware, then you can expect
poorer performance, with benchmarks such as PassMark providing a reasonable estimate of the
expected reduction in performance.

4.2.2 RAM
You must ensure that you have provisioned sufficient RAM for your VMs, and provided dedicated
access to it.

However, RAM speed can have a slight effect on performance. You can compare different RAM
speeds using benchmarking resources such as PassMark.


4.2.3 Disk space and I/O


Metaswitch products require that the appropriate amount of disk space is configured for each VM, as
set out in VM resource specification tables on page 35.

For disk I/O rate, it is necessary to ensure that the criteria specified earlier in this document are met
by the storage system in use. If they are not, it is not reasonable to expect the predicted product
performance to be met in all patterns of use. It is hard to predict in detail the impact of an insufficient
I/O rate, but be aware that the impact is likely to be worse than a linear degradation pattern would
suggest. Conversely, using even significantly faster or more capable disk hardware is unlikely to result
in performance greater than predicted.

4.2.4 Networking
When using Perimeta with virtio in network environments different from standard Open vSwitch,
you will need to benchmark Perimeta's performance in your environment to understand its limits.
This applies particularly to concurrent media stream capacity. For more information, speak to your
Metaswitch support representative.


5 Worked examples
This section contains some worked examples showing how to apply the guidance and other
information in this document when designing virtualized deployments of Metaswitch products.

We give a number of different examples, illustrating different aspects of the information through
particular scenarios.

5.1 Large multi-product deployment


This section focuses on the VM resource requirements in a large deployment with multiple products,
the distribution of VMs across compute nodes, and how ultimately to determine the minimum host
hardware requirements for the compute nodes. This puts into practice much of the guidance from
Planning your OpenStack deployment on page 24.

Solution overview
This worked example assumes we are planning a statically sized deployment involving:

• 500k consumer subscribers


• 30k hosted business subscribers
• pre-IMS architecture (so excluding Clearwater Core)
• all SIP for access and interconnect.

The products and architecture that will be involved are:

• Clustered CFS, with two-site GR (including CFS, MRS, RPAS, OBS, MVD, MVS)
• EAS providing MaX UC Client, CommPortal and voicemail for business subscribers
• N-Series providing (Basic) Music on Hold
• Perimeta SBC
• SAS.

This worked example assumes that:

• we are creating a fresh OpenStack deployment hosting just Metaswitch product VMs
• we want cloud-level redundancy to provide continued site availability during maintenance (e.g.
  upgrade) of a cloud, and hence will deploy two OpenStack clouds per site
• we want geographic redundancy.

Product VM instances
We determine the number of VM instances of each product, and their product topologies. For general
guidance, see Product topologies on page 25 and VM resource specification tables on page 35. In
addition your Metaswitch Sales or Support team can also provide guidance on the number of product
instances required.


• CFS. For clustered CFS with two sites and GR, either site must handle the full deployment
capacity if the other site fails. One CFS HA pair, with the high capacity VM spec, supports 500k
residential or 200k business subscribers, or a combination, so we need two high-spec CFS 1+1
HA pairs per site.
• MVD. Clustered CFS always requires one pair per site. The low-spec MVD is too small for 500k
subscribers, so we need one high-spec MVD 1+1 HA pair per site.
• RPAS. Capacity is quoted by BHCA (one RPAS supports 5 million BHCA), and we make some
assumptions to get to that from subscriber counts as follows.

• Residential subscribers make 2 calls in the busy hour.


• Business subscribers make 5 calls in the busy hour.

In our deployment that equates to (500k * 2) + (30k * 5) = 1.15M BHCA, which is well within what one
RPAS can cope with, so for redundancy we just need a 1+1 RPAS pool per site.

• OBS. OBS supports 3 million total cluster subscribers, and a pair can cope with 1.5M instantiations
from a site failure, so we just need a 1+1 OBS pool per site.
• MRS. Capacity is quoted by number of media streams, and we make some assumptions to get to
that from subscriber counts.

• Residential subs have 2% of calls requiring media resources, with 12:1 call concurrency (up to
1/12 of all subscribers in calls simultaneously).
• Business subs have 2% of calls requiring media resources, with 6:1 call concurrency.
• Basic MOH endpoints have 100% of calls requiring media resources, with 40:1 call
concurrency, and two channels for each call.
• Each Premium ACD sub (with monitor/whisper/barge-in) requires 0.75 channels.

In our deployment, assuming all the business subscribers use MOH, but none are premium ACD, that
equates to (500k * 2% / 12) + (30k * 2% / 6) + (30k * 100% / 40 * 2) = 833 + 100 + 1500 = 2433 MRS
sessions. One MRS supports 500 sessions, so we need a 5+5 MRS pool per site.

• DCM. All production deployments require a 1+1 DCM pool per site.
• MVS. Clustered CFS always requires one per site. This can be the mid-sized VM specification (up
to 1M subscribers), so we need one mid-spec active + standby MVS per site.
• EAS. Our residential subscribers do not use EAS services, but business subscribers do, and with
MaX UC Client, CommPortal and Voicemail we treat them as Business Premium Subscribers.
One EAS can support 30k BPU subscribers, so we need one high-spec EAS in the primary site
(manual or automatic/orchestrated restart on failure).
• N-Series. Required for Basic MOH. The low capacity VM spec cannot support 30k MOH
subscribers, but the high capacity VM spec supports up to 44k MOH subscribers. So we need one
high-spec N-Series in the primary site (manual or automatic/orchestrated restart on failure).
• Perimeta. We assume 12:1 concurrency for residential and 6:1 concurrency for business. That
leads to 46,667 sessions. We opt for a distributed SSC/MSC architecture.

• Perimeta SSC can handle up to 1.87M UDP subscribers, so assuming we want GR we need
one 1+1 SSC HA pair per site.


• For Perimeta MSC we'll assume use of SR-IOV and DPDK, giving 28,100 sessions per
instance. Assuming we want GR, with one site and one cloud instance able to handle the
full capacity in the event of site failure, we need 2 active VM instances; and assuming we can
live with standalone instances, we need a 2+2 MSC pool per site.
• SAS. Speak to your Metaswitch Support representative for exact details of SAS capacity planning.
Across all these products we require one high-spec SAS per cloud (manual or automatic/
orchestrated restart on failure) that receives SAS events from other product VM instances, plus
a master SAS per site, and 52 capacity blocks deployment-wide.
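As a purely illustrative cross-check, the traffic arithmetic above can be sketched in a few lines of Python. All of the inputs (subscriber counts, concurrency ratios and per-instance capacities) are the assumptions stated in this worked example, not general product limits.

```python
# Sketch of the capacity arithmetic in this worked example. All figures are
# the assumptions stated in the text above.
import math

RES_SUBS = 500_000   # residential subscribers
BIZ_SUBS = 30_000    # hosted business subscribers

# RPAS: busy-hour call attempts (2 BHCA residential, 5 BHCA business)
bhca = RES_SUBS * 2 + BIZ_SUBS * 5            # 1.15M, within one RPAS (5M BHCA)

# MRS: concurrent media sessions
mrs_sessions = (RES_SUBS * 0.02 / 12          # residential: 2% of calls, 12:1
                + BIZ_SUBS * 0.02 / 6         # business:    2% of calls, 6:1
                + BIZ_SUBS * 1.00 / 40 * 2)   # basic MOH: all calls, 40:1, 2 ch
mrs_per_site = math.ceil(mrs_sessions / 500)  # 500 sessions per MRS -> 5+5 pool

# Perimeta: concurrent SBC sessions and active MSC instances
sbc_sessions = RES_SUBS / 12 + BIZ_SUBS / 6    # ~46,667
msc_active = math.ceil(sbc_sessions / 28_100)  # 28,100 sessions/MSC -> 2 active

print(bhca, round(mrs_sessions), mrs_per_site, round(sbc_sessions), msc_active)
```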

Per-product VM requirements
The resources required for this example deployment are shown in the table below, which lists the
number of VMs of each type required in each cloud in each site (based on the analysis in the previous
section), the size of each VM (see VM resource specification tables on page 35 for one VM instance
of a given product), and a sum-product to get the total resources per site.

For SAS data storage we also refer to SAS data storage on page 57. The requirement calculated
above of 52 capacity blocks is rounded up to the next data point, which needs 14 TiB / 500 write
iops / 250 read iops volume storage. That is the requirement per-cloud (per SAS VM instance), so
it is sufficient in the event that a site fails, and a cloud in the other site fails, and all traffic is coming
through (and hence all SAS data is to be stored in) just one cloud / SAS instance.

As per the guidance in Redundant cloud deployments on page 16 we opt for a redundant multi-cloud
deployment, with two OpenStack clouds in each site to cope with individual cloud failure, and two sites
to cope with geographic site disaster.

The only difference between sites 1 and 2 is that the EAS and N-Series VM instances appear only in
site 1 (they do not have geographic redundancy).

The only difference between the two clouds in site 1 is that only the first contains EAS and N-Series
VM instances (as described in Product topologies on page 25 they do not currently support cloud
redundancy).

Product        VMs in    VMs in    VMs in    VMs in    vCPUs     RAM (GiB)  Boot device  Volume storage  Storage speed
               site 1,   site 1,   site 2,   site 2,   required  required   size (GiB)   required (GiB)  required (iops)
               cloud A   cloud B   cloud C   cloud D   per VM    per VM     per VM       per VM          per VM

CFS            2         2         2         2         12        48         30           170             100
MVD            1         1         1         1         12        24         30           70              100
RPAS           1         1         1         1         8         8          30           70              100
OBS            1         1         1         1         24        48         30           70              100
MRS            5         5         5         5         8         8          100          -               50
DCM            1         1         1         1         1         1          10           -               50
MVS            1         1         1         1         12        24         300          -               100
EAS            1         -         -         -         12        24         70           530             100
N-Series       1         -         -         -         12        24         300          -               100
Perimeta SSC   1         1         1         1         20        16         150          -               100
Perimeta MSC   2         2         2         2         16        8          30           -               100
SAS            1         1         1         1         8         24         60           14,000          950

TOTAL for site 1, cloud A:  18 VMs, 205 vCPUs, 345 GiB RAM, 1600 GiB boot, 15,080 GiB volume, 2350 iops
TOTAL for site 1, cloud B:  16 VMs, 181 vCPUs, 297 GiB RAM, 1230 GiB boot, 14,550 GiB volume, 2150 iops
TOTAL for site 2, cloud C:  16 VMs, 181 vCPUs, 297 GiB RAM, 1230 GiB boot, 14,550 GiB volume, 2150 iops
TOTAL for site 2, cloud D:  16 VMs, 181 vCPUs, 297 GiB RAM, 1230 GiB boot, 14,550 GiB volume, 2150 iops

A dash indicates that the product has no requirement in that column, or is not deployed in that cloud.
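The sum-product totals above can be verified with a short sketch; the per-VM figures and VM counts are transcribed from the table (index 0 is the VM count in site 1 cloud A, index 1 the count in each of the other three clouds).

```python
# Cross-check of the sum-product totals in the per-product VM table.
vms = {
    # product:      (cloud A, other, vCPU, RAM, boot, volume, iops) per VM
    "CFS":          (2, 2, 12, 48,  30, 170, 100),
    "MVD":          (1, 1, 12, 24,  30,  70, 100),
    "RPAS":         (1, 1,  8,  8,  30,  70, 100),
    "OBS":          (1, 1, 24, 48,  30,  70, 100),
    "MRS":          (5, 5,  8,  8, 100,   0,  50),
    "DCM":          (1, 1,  1,  1,  10,   0,  50),
    "MVS":          (1, 1, 12, 24, 300,   0, 100),
    "EAS":          (1, 0, 12, 24,  70, 530, 100),
    "N-Series":     (1, 0, 12, 24, 300,   0, 100),
    "Perimeta SSC": (1, 1, 20, 16, 150,   0, 100),
    "Perimeta MSC": (2, 2, 16,  8,  30,   0, 100),
    "SAS":          (1, 1,  8, 24,  60, 14_000, 950),
}

def cloud_totals(idx):
    """Sum-product of per-VM resources for one cloud (idx 0 or 1)."""
    return [sum(v[idx] * v[col] for v in vms.values()) for col in range(2, 7)]

print(cloud_totals(0))  # site 1 cloud A: vCPU, RAM, boot, volume, iops
print(cloud_totals(1))  # each other cloud
```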


Compute node server hardware selection


If you are building a new OpenStack deployment, the next step is to decide what compute node host
servers will be used, and in particular what CPUs they have. The combination of the spec of those
servers, and the number of them, needs to be sufficient to support all the VM instances determined in
the previous step.

Firstly, assume we select host hardware with comparable benchmark performance to the reference
hardware described in Reference hardware on page 81. If we didn't, we'd have to scale the maximum
supported capacity of product instances accordingly, which could result in needing more or fewer
instances for some products.

The sum-product of vCPUs in the above table gives a very rough indication of the requirements.
In site 1 cloud A we need 205 vCPUs (virtual CPU cores), which with hyper-threading on an Intel
architecture corresponds to roughly 103 pCPUs (physical CPU cores). Assume we select a high-end data-
center server with four 8-core Intel CPUs. With 32 pCPUs per server, we can expect to need in the
region of 4+1 host servers per cloud. However, see the next section for the exact calculation.
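This rough estimate can be sketched as follows; the 2-vCPUs-per-core hyper-threading ratio and the four-socket 8-core server are the assumptions made above.

```python
# Rough host-count estimate for site 1 cloud A, as a sanity check only.
import math

vcpus = 205                       # total vCPUs needed in site 1 cloud A
pcpus = math.ceil(vcpus / 2)      # hyper-threading: 2 vCPUs per physical core
cores_per_server = 4 * 8          # four 8-core Intel CPUs per server
primary = math.ceil(pcpus / cores_per_server)
print(pcpus, primary)             # ~103 pCPUs -> 4 primary servers (+1 spare)
```

The exact packing in the next section confirms that 4 primary hosts suffice once socket and anti-affinity constraints are taken into account.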

Ensuring enough compute node resources


We cannot simply divide the total vCPU requirement calculated above by the capabilities of the
compute node servers. We must follow the guidance in Maximizing compute performance on servers
with multiple CPUs in Metaswitch Products Virtual Infrastructure Requirements Guide with regard to

• avoiding splitting VMs across CPU sockets


• respecting product internal redundancy.

This ultimately results in an assignment of VM instances to host servers. In practice this assignment
is made automatically by the OpenStack scheduler; for planning purposes, however, we must ensure
that a valid assignment is actually possible. Below is just one possibility for site 1 cloud
A, needing 4 primary host servers for compute nodes, although it is easy enough to see we cannot do
better (use fewer host servers) than this.

We must then include an additional host server, to allow manual or automatic/orchestrated restoration
of failed VMs, particularly for those products like SAS, EAS and N-Series that are not deployed as
pools or pairs with their own internal redundancy. So in total we need 4+1=5 host servers for compute
nodes in site 1 cloud A.


Figure 6: One possible distribution of VMs across host servers in site 1, cloud A

We cannot improve on this in the other cloud in site 1, or in site 2.

In terms of memory requirements, the most demanding host (server 2) requires 112GiB RAM for the listed
VM instances. Allowing some additional RAM for the hypervisor, we can conclude that the servers
should have 128GiB RAM.

When we put all this together, we arrive at the total server count for each cloud shown in the following
table.

Hosts                        Site 1 cloud A   Site 1 cloud B   Site 2 cloud C   Site 2 cloud D

Compute nodes                4+1              4+1              4+1              4+1
Controller / network nodes   1+1              1+1              1+1              1+1

TOTAL hosts                  7                7                7                7

Storage
There are a number of options outlined in Storage options on page 63. Here we assume that

• boot device storage is provided by redundant shared storage for those products that support it, and
ephemeral storage for others
• volume storage is provided in a shared SAN for all VMs in each cloud.

To calculate the SAN sizes we do as follows.


• First we take the sum-product of the per-VM storage requirements from the table above e.g. (2
CFS VMs * 170GiB) + (1 RPAS VM * 70GiB) etc., for both boot disk and volume storage. This
gives:

• Site 1 cloud A: 16,680 GiB / 2350 iops


• Site 1 cloud B: 15,780 GiB / 2150 iops
• Site 2 cloud C: 15,780 GiB / 2150 iops
• Site 2 cloud D: 15,780 GiB / 2150 iops.
• The above figures are the storage to present to the VMs, so we increase this assuming RAID 10
redundancy which approximately doubles the storage required. So we have:

• Site 1 cloud A: ~34TiB / 2350 iops


• Site 1 cloud B: ~32TiB / 2150 iops
• Site 2 cloud C: ~32TiB / 2150 iops
• Site 2 cloud D: ~32TiB / 2150 iops.
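The two steps above (sum the usable storage presented to the VMs, then roughly double it for RAID 10 mirroring) can be sketched as follows; the ~34 TiB and ~32 TiB figures quoted round the raw results up to leave some headroom.

```python
# SAN sizing sketch: usable GiB presented to VMs -> raw TiB under RAID 10.
def san_raw_tib(usable_gib, raid_factor=2.0):
    """Raw SAN capacity in TiB, assuming RAID 10 roughly doubles the need."""
    return usable_gib * raid_factor / 1024   # GiB -> TiB

print(f"site 1 cloud A: {san_raw_tib(16_680):.1f} TiB raw")  # rounded up to ~34
print(f"other clouds:   {san_raw_tib(15_780):.1f} TiB raw")  # rounded up to ~32
```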

Conclusion
In this worked example we need the following host hardware:

• Site 1 cloud A:

• 7 host servers each with 128GiB RAM and four 8-core Intel processors
• 34TiB / 2350 iops of SAN disk space
• 10Gb Ethernet switch
• Site 1 cloud B:

• 7 host servers each with 128GiB RAM and four 8-core Intel processors
• 32TiB / 2150 iops of SAN disk space
• 10Gb Ethernet switch
• Site 2 cloud C:

• 7 host servers each with 128GiB RAM and four 8-core Intel processors
• 32TiB / 2150 iops of SAN disk space
• 10Gb Ethernet switch
• Site 2 cloud D:

• 7 host servers each with 128GiB RAM and four 8-core Intel processors
• 32TiB / 2150 iops of SAN disk space
• 10Gb Ethernet switch

In terms of physical realization, this may be achieved with each site containing two racks, where each
rack contains its own OpenStack cloud comprising:

• a network switch for the rack


• one or more controller nodes, running the OpenStack controller software
• a network node
• the compute nodes as identified above


• a highly available storage system.

This might be physically arranged as in the following diagram.

Figure 7: Example physical implementation of cloud redundancy

Other OpenStack considerations


This particular worked example focuses mainly on the VM resource requirements. However, the other
requirements and guidance in this document must also be followed with regard to OpenStack settings,
as discussed in Requirements on OpenStack on page 14.

We must also ensure that we use an OpenStack version that is supported by all applicable products,
as described in OpenStack releases on page 9.

5.2 Lab deployment


This section focuses on the VM resource requirements in a basic lab deployment. It is slightly less
detailed than the previous section, but summarizes the key steps along the way.

Solution overview
This worked example assumes we are planning a lab deployment requiring minimal capacity, only in a
single site with a single cloud, but still with HA available to be tested (although it will not be possible to
test or cope with cloud failure). The products and architecture that will be involved are:

• MVS
• CFS
• MRS
• EAS providing MaX UC Client, CommPortal and voicemail


• MVD (required for MaX UC Client call jump)


• N-Series providing (Basic) Music on Hold
• Perimeta SBC
• SAS.

Product VM instances
For a lab deployment, performance and capacity are not an issue. The main requirement is ensuring
the right set of products, and the right redundancy arrangements.

• CFS. Must always be an HA pair, even in labs, so we need one lab-spec (non-clustered) CFS
1+1 HA pair.
• MVD. Must always be an HA pair, even in labs, so we need one lab-spec MVD 1+1 HA pair per
site.
• RPAS. Not required for non-clustered CFS.
• OBS. Not required for non-clustered CFS.
• MRS. May be deployed as a single non-redundant instance, but we would like to test HA in the lab,
so we need a 1+1 lab-spec MRS pool.
• DCM. May be deployed as a single non-redundant instance, but we would like to test HA in the lab,
so we need a 1+1 DCM pool.
• MVS. We need one lab-spec active + standby MVS.
• EAS. We need one lab-spec EAS (manual or automatic/orchestrated restart on failure).
• N-Series. We need one lab-spec N-Series (manual or automatic/orchestrated restart on failure).
• Perimeta. For a lab we opt for an Integrated Session Controller (ISC). This may be deployed as
a single non-redundant instance, but we would like to test HA in the lab so we need one lab-spec
ISC HA pair.
• SAS. We need one lab-spec SAS per site (manual or automatic/orchestrated restart on failure).
Lab SAS does not require a separate data store, just the VM requirements specified in VM
resource specification tables on page 35.

Per-product VM requirements
The following table summarizes the number of VMs of each type required (based on the analysis in
the previous section), the size of each VM (see VM resource specification tables on page 35), and a
sum-product to get the total resources per site.

Product        VMs       vCPUs     RAM (GiB)  Boot device      Volume storage
               required  required  per VM     size (GiB)       required (GiB)
                         per VM               per VM           per VM

CFS            2         1         4          30               170
MVD            2         1         8          30               70
MRS            2         1         4          100              -
DCM            2         1         1          10               -
MVS            2         1         8          300              -
EAS            1         1         4          70               230
N-Series       1         1         4          300              -
Perimeta ISC   1         2         4          40               -
SAS            1         1         4          60               -

TOTAL for lab  14        15        66         1410             710

Ensuring enough compute node resources


In this case we have relatively low requirements. Here we can make a good illustration of some of
the points in the guidance in Maximizing compute performance on servers with multiple CPUs in
Metaswitch Products Virtual Infrastructure Requirements Guide, especially with regard to respecting
product internal redundancy.

One option might be to go for a physical host server with one 8-core Intel processor. In theory,
with hyper-threading, that gives us 16 vCPUs. From the table above we only need 15 vCPUs in total, so
in theory this could fit. However, we do not choose this option, because it would place both VMs in any
1+1 HA pairs (e.g. CFS, MVD) or pools (e.g. MRS, DCM) on the same physical host, and there would then
be no protection against that physical host failing.

For this reason, we deploy 2 host servers for the VMs in the table above, with any 1+1 HA pairs or
pools split so that one VM runs on each host. Given that we have forced ourselves to have 2 host
servers, we can reduce the spec to just one 6-core Intel processor per server.

Below is one possibility that could then pack the VMs on to this.

We must then include an additional host server, to allow manual or automatic/orchestrated restoration
of failed VMs, particularly for those products like SAS, EAS and N-Series that aren't deployed as
pools or pairs with their own internal redundancy. So in total we need 2+1=3 host servers for compute
nodes.
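The split described above can be sketched as a simple placement check: every 1+1 pair or pool has its two VMs on different hosts, and each host stays within the hyper-threaded vCPU budget. The placement below is one hypothetical packing under the lab VM sizes, not necessarily the one shown in the figure.

```python
# Hypothetical anti-affinity placement check for the lab deployment.
PAIR_VCPUS = {"CFS": 1, "MVD": 1, "MRS": 1, "DCM": 1, "MVS": 1}  # vCPUs per VM

host1 = dict(PAIR_VCPUS)               # one VM of every 1+1 pair on host 1...
host2 = dict(PAIR_VCPUS)               # ...and its mate on host 2
host1.update({"EAS": 1, "Perimeta ISC": 2})   # singletons, placed arbitrarily
host2.update({"N-Series": 1, "SAS": 1})

HOST_VCPUS = 6 * 2                     # one 6-core CPU with hyper-threading
for host in (host1, host2):
    assert sum(host.values()) <= HOST_VCPUS   # both hosts fit comfortably
print(sum(host1.values()), sum(host2.values()))  # vCPUs used of 12 per host
```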

Considering the memory requirements, both servers need 33GiB, so to be safe we opt for a round 48
GiB RAM.


Figure 8: One possible distribution of VMs across host servers

Storage
There are a number of options outlined in Storage options on page 63. Here we assume ephemeral
(locally attached) boot device and volume storage for all VMs. To calculate the disk requirements we
take the sum-product of the per-VM storage requirements in the table above, giving us 1410 GiB for
one server and 710 GiB for the other.

We do not need additional SAS data storage for lab deployments.

The above figures are the storage to present to the VMs. However, as this is only a lab deployment
we choose not to layer any additional redundancy, e.g. RAID. So we just need approximately 1.5TiB
of disk space per host to be safe.

Note:

With this storage configuration it is not possible to test recovery from a failed compute host, as if
the compute host fails the ephemeral storage will also be lost/destroyed. To support recovery from
a failed compute host, all storage must be on persistent (non-ephemeral) volumes. See Storage for
Metaswitch products on OpenStack on page 66 for more information.

Conclusion
In this worked example we need the following host hardware.

• 2+1 host servers for compute nodes, each with:

  • 48GiB RAM and one 6-core Intel processor
  • 1.5TiB of disk space

• 1+1 host servers for controller and network nodes

5.3 Small-scale Perimeta lab deployment


This deployment layout focuses on a lab deployment consisting of an HA Perimeta ISC with SAS.

Solution overview
This solution is designed for small-scale lab deployments where separation of the signaling
and media functions is not required. In some cases it may be appropriate to deploy a standalone
ISC instead of an HA pair, but generally it is recommended that lab deployments use the same HA
topology as production ones. Similarly, it is possible to deploy DCM without a multi-member pool,
but it is more realistic to deploy 2 DCMs.

Product instances
The following product instances are required in this deployment.

• 1+1 HA DCM pool


• 1+1 HA Perimeta ISC pair sized for lab use
• A matching lab SAS.

Distribution of VMs across hosts


There are no particular VM distribution requirements for a lab solution, although if there is a
requirement to maintain service across host maintenance, the VMs forming the ISC HA pair and DCM
pool should be located on different hosts.



Per-product VM requirements

Product        VMs       vCPUs     RAM (GiB)  Boot device      Volume storage
               required  required  per VM     size (GiB)       required (GiB)
                         per VM               per VM           per VM

Perimeta ISC   2         2         4          40               0
DCM            2         1         1          10               0
SAS            1         1         4          60               0

TOTAL for lab  5         7         14         160              0

Storage
There are no particular storage requirements beyond the boot device requirements, which give a total
of 160GiB.

Conclusion
For this deployment, the resources required are as follows.

• 7 vCPUs
• 14 GiB RAM
• 160 GiB of boot disk space.

There are no specific requirements on redundancy as this is a lab deployment. It is highly likely that a
single host would be sufficient for this deployment (and would likely still have capacity for other VMs)
and the storage could reasonably be provided by non-redundant local storage on the compute hosts.

5.4 Small-scale Perimeta HA SSC + MSC with single cloud active-standby VIP
deployment
This deployment layout focuses on a small-scale deployment of Perimeta with SAS in a single
OpenStack cloud instance. Note that service availability in this solution is constrained by the
availability of the cloud instance.

Solution overview
This solution is designed for small-scale production deployments where only a single cloud
instance is available (and the resulting impact on availability is acceptable). It is suitable up to the
rated capacities of the SSC and MSC in their 2-core guise, and can be expanded by adding additional
MSC HA pairs up to the limits of the SSC, possibly also adding additional SAS volume storage
capacity.


Product instances
The following product instances are required in this deployment.

• 1+1 HA DCM pool


• 1+1 HA Perimeta SSC
• 1+1 HA Perimeta MSC
• SAS

Note that there is only a single SAS instance. It is assumed this will be recreated on failure and that
the lack of SAS redundancy against host failure is acceptable.

Distribution of VMs across hosts


In order to provide suitable redundancy, the VM pairs forming the HA MSC, HA SSC and DCM
pool must be spread across different hosts (i.e. the two VMs forming the 1+1 SSC HA pair must
not be on the same host, and similarly for the MSC and DCM).

This can be achieved in numerous ways on OpenStack as described in High availability on page 14.

Compute host requirements


The compute hosts for production Perimeta deployments need to be set up with uncontended CPU,
memory and storage resources as described in the Virtual Infrastructure Requirements Guide to avoid
other VMs interfering with the real-time responsiveness of Perimeta.

Per-product VM requirements

Product        VMs       vCPUs     RAM (GiB)  Boot device      Volume storage
               required  required  per VM     size (GiB)       required (GiB)
                         per VM               per VM           per VM

Perimeta SSC   2         2         4          40               0
Perimeta MSC   2         2         4          30               0
DCM            2         1         1          10               0
SAS            1         4         4          60               1000

TOTAL          7         14        22         220              1000
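As an illustrative cross-check of the totals row, using the per-VM figures from the table above:

```python
# Sum-product check for the single-cloud SSC + MSC deployment table.
rows = {  # product: (VMs, vCPUs, RAM GiB, boot GiB, volume GiB) per VM
    "Perimeta SSC": (2, 2, 4, 40, 0),
    "Perimeta MSC": (2, 2, 4, 30, 0),
    "DCM":          (2, 1, 1, 10, 0),
    "SAS":          (1, 4, 4, 60, 1000),
}
n_vms = sum(r[0] for r in rows.values())
totals = [sum(r[0] * r[col] for r in rows.values()) for col in (1, 2, 3, 4)]
print(n_vms, totals)   # VM count, then [vCPUs, RAM, boot, volume]
```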

Networking requirements
Networking requirements are as follows.

• Perimeta, DCM and SAS all require a management network.


• An HA network is required to connect the SSC and MSC HA pairs.


• Perimeta requires one or more service networks. In typical small scale deployments there is one
service network for access or untrusted SIP and RTP traffic and one service network for core or
trusted traffic. The service interfaces need to use SR-IOV ports to achieve optimal capacity.

This gives the following network topology.

Figure 9: Logical network topology

Note:

Although the HA networks between the SSC pair and MSC pair are shown as separate in the
diagram above for convenience, they can use the same underlying OpenStack network.

Storage
There are the following storage requirements for this deployment.

• 220 GiB of boot device storage


• 1000 GiB of volume storage for SAS. This is recommended in order to have redundancy built in so
that a single failure does not cause loss of SAS data.

Conclusion
For this deployment, the total resources required are as follows.

• 14 vCPUs
• 22 GiB RAM
• 220 GiB of boot disk space
• 1000 GiB of volume storage.

Although these resources could potentially be provided by a single compute host, for a production
deployment at least 2 compute hosts are required to provide any level of redundancy against compute
host failure. In addition, it is recommended that there is a suitable level of redundancy in the volume
storage (e.g. using RAID or a distributed file system with a suitable redundancy factor) so that a single
host or disk failure does not cause loss of all of the data.

Finally, in order to support more than a few hundred media sessions, it is recommended that the
Perimeta SSC and MSC VMs use SR-IOV interfaces to achieve maximal media throughput and DDoS
protection.
