VMware Validated Design 3.0 SDDC Reference Architecture
EN-002234-00
VMware Validated Design Reference Architecture Guide
The VMware Web site also provides the latest product updates.
If you have comments about this documentation, submit your feedback to:
[email protected]
© 2016 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright
and intellectual property laws. This product is covered by one or more patents listed at
http://www.vmware.com/download/patents.html.
VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other
jurisdictions. All other marks and names mentioned herein may be trademarks of their respective
companies.
VMware, Inc.
3401 Hillview Avenue
Palo Alto, CA 94304
www.vmware.com
Contents
3. Detailed Design.............................................................................. 45
3.1 Physical Infrastructure Design ..................................................................................... 45
3.1.1 Physical Design Fundamentals ............................................................................................. 46
3.1.2 Physical Networking Design .................................................................................................. 52
3.1.3 Physical Storage Design ....................................................................................................... 61
3.2 Virtual Infrastructure Design ........................................................................................ 70
3.2.1 Virtual Infrastructure Design Overview .................................................................................. 70
3.2.2 ESXi Design .......................................................................................................................... 73
3.2.3 vCenter Server Design .......................................................................................................... 75
3.2.4 Virtualization Network Design................................................................................................ 89
3.2.5 NSX Design ......................................................................................................................... 104
3.2.6 Shared Storage Design ....................................................................................................... 126
3.3 Cloud Management Platform Design ......................................................................... 145
3.3.1 vRealize Automation Design ............................................................................................... 145
3.3.2 vRealize Orchestrator Design.............................................................................................. 175
3.4 Operations Infrastructure Design ............................................................................... 185
3.4.1 vRealize Log Insight Design ................................................................................................ 186
3.4.2 vRealize Operations Manager Design ................................................................................. 196
3.4.3 vSphere Data Protection Design ......................................................................................... 206
3.4.4 Site Recovery Manager and vSphere Replication Design ................................................... 213
List of Tables
Table 1. Elements and Components of the Cloud Management Platform ........................................... 32
Table 2. Characteristics of the Cloud Management Platform Architecture ........................................... 33
Table 3. Cloud Management Platform Elements .................................................................................. 34
Table 4. vRealize Operations Manager Logical Node Architecture ...................................................... 44
Table 5. Regions ................................................................................................................................... 46
Table 6. Availability Zones and Regions Design Decisions .................................................................. 47
Table 7. Required Number of Racks ..................................................................................................... 48
Table 8. POD and Racks Design Decisions ......................................................................................... 49
Table 9. ESXi Host Design Decisions ................................................................................................... 51
Table 10. Host Memory Design Decision .............................................................................................. 52
Table 11. Jumbo Frames Design Decisions ......................................................................................... 56
Table 12. VLAN Sample IP Ranges ...................................................................................................... 58
Table 13. Physical Network Design Decisions...................................................................................... 60
Table 14. Additional Network Design Decisions ................................................................................... 61
Table 15. Virtual SAN Physical Storage Design Decision .................................................................... 62
Table 16. Virtual SAN Mode Design Decision ...................................................................................... 62
Table 17. Hybrid and All-Flash Virtual SAN Endurance Classes ......................................................... 64
Table 18. SSD Endurance Class Design Decisions ............................................................................. 64
Table 19. SSD Performance Classes ................................................................................................... 65
Table 20. SSD Performance Class Selection ....................................................................................... 65
Table 21. SSD Performance Class Design Decisions .......................................................................... 66
Table 22. Virtual SAN HDD Environmental Characteristics .................................................................. 66
Table 23. HDD Characteristic Selection ............................................................................................... 67
Table 24. HDD Selection Design Decisions.......................................................................................... 67
Table 25. NFS Usage Design Decisions ............................................................................................... 68
Table 26. NFS Hardware Design Decision ........................................................................................... 69
Table 27. Volume Assignment Design Decisions ................................................................................. 69
Table 28. ESXi Boot Disk Design Decision ........................................................................................... 74
Table 29. ESXi User Access Design Decisions .................................................................................... 75
Table 30. Other ESXi Host Design Decisions ....................................................................................... 75
Table 31. vCenter Server Design Decision ........................................................................................... 76
Table 32. vCenter Server Platform Design Decisions .......................................................................... 77
Table 33. Platform Service Controller Design Decisions ...................................................................... 77
Table 34. Methods for Protecting vCenter Server System and the vCenter Server Appliance ............ 78
Table 35. vCenter Server Systems Protection Design Decisions ......................................................... 79
Table 36. Logical Specification for Management vCenter Server Appliance ........................................ 79
Table 37. Logical Specification for Compute vCenter Server Appliance .............................................. 79
Table 77. vSphere Compute Cluster Split Design Decisions .............................................................. 112
Table 78. VTEP Teaming and Failover Configuration Design Decision ............................................. 114
Table 79. Logical Switch Control Plane Mode Decision ..................................................................... 115
Table 80. Transport Zones Design Decisions ..................................................................................... 116
Table 81. Routing Model Design Decision .......................................................................................... 116
Table 82. Transit Network Design Decision ........................................................................................ 118
Table 83. Tenant Firewall Design Decision ........................................................................................ 118
Table 84. Load Balancer Features of NSX Edge Services Gateway ................................................. 119
Table 85. NSX for vSphere Load Balancer Design Decision.............................................................. 120
Table 86. Virtual to Physical Interface Type Design Decision ............................................. 120
Table 87. Inter-Site Connectivity Design Decisions ............................................................................ 121
Table 88. Isolated Management Applications Design Decisions ........................................................ 122
Table 89. Portable Management Applications Design Decision ......................................................... 123
Table 90. Application Virtual Network Configuration .......................................................................... 126
Table 91. Network Shared Storage Supported by ESXi Hosts ........................................................... 127
Table 92. vSphere Features Supported by Storage Type .................................................................. 127
Table 93. Storage Type Design Decisions .......................................................................................... 129
Table 94. VAAI Design Decisions ....................................................................................................... 130
Table 95. Virtual Machine Storage Policy Design Decisions .............................................................. 131
Table 96. Storage I/O Control Design Decisions ................................................................................ 131
Table 97. Resource Management Capabilities Available for Datastores ........................................... 132
Table 98. Network Speed Selection .................................................................................................... 135
Table 99. Network Bandwidth Design Decision .................................................................................. 135
Table 100. Virtual Switch Types ......................................................................................................... 136
Table 101. Virtual Switch Design Decisions ....................................................................................... 136
Table 102. Jumbo Frames Design Decision ....................................................................................... 137
Table 103. VLAN Design Decision ...................................................................................................... 137
Table 104. Virtual SAN Datastore Design Decisions .......................................................................... 138
Table 105. Number of Hosts per Cluster ............................................................................................ 138
Table 106. Cluster Size Design Decisions .......................................................................................... 139
Table 107. Number of Disk Groups per Host ...................................................................................... 139
Table 108. Disk Groups per Host Design Decision ............................................................................ 140
Table 109. Virtual SAN Policy Options ............................................................................................... 140
Table 110. Object Policy Defaults ....................................................................................................... 142
Table 111. Policy Design Decision ..................................................................................................... 142
Table 112. NFS Version Design Decision ........................................................................................... 143
Table 113. NFS Export Sizing ............................................................................................................. 143
Table 114. NFS Export Design Decisions ........................................................................................... 144
Table 115. NFS Datastore Design Decision ....................................................................................... 144
Table 192. Custom Role-Based User Management Design Decision ................................................ 193
Table 193. Custom Certificates Design Decision ............................................................................... 193
Table 194. Direct Log Communication to vRealize Log Insight Design Decisions ............................. 194
Table 195. Time Synchronization Design Decision ............................................................................ 194
Table 196. Syslog Protocol Design Decision ...................................................................................... 195
Table 197. Protocol for Event Forwarding across Regions Design Decision ..................................... 195
Table 198. Analytics Cluster Node Configuration Design Decisions .................................................. 197
Table 199. Size of a Medium vRealize Operations Manager Virtual Appliance ................................. 198
Table 200. Analytics Cluster Node Size Design Decisions ................................................................. 198
Table 201. Size of a Standard Remote Collector Virtual Appliance for vRealize Operations Manager ........ 198
Table 202. Compute Resources of the Remote Collector Nodes Design Decisions .......................... 199
Table 203. Analytics Cluster Node Storage Design Decision ............................................................. 199
Table 204. Remote Collector Node Storage Design Decision ............................................................ 200
Table 205. vRealize Operations Manager Isolated Network Design Decision ................................... 202
Table 206. IP Subnets in the Application Virtual Network of vRealize Operations Manager ............. 202
Table 207. IP Subnets Design Decision ............................................................................................. 202
Table 208. DNS Names for the Application Virtual Networks ............................................................. 202
Table 209. Networking Failover and Load Balancing Design Decisions ............................................ 203
Table 210. Identity Source for vRealize Operations Manager Design Decision ................................. 204
Table 211. Using CA-Signed Certificates Design Decision ................................................................ 204
Table 212. Monitoring vRealize Operations Manager Design Decisions ........................................... 205
Table 213. Management Packs for vRealize Operations Manager Design Decisions ....................... 205
Table 214. vSphere Data Protection Design Decision ........................................................................ 206
Table 215. Options for Backup Storage Location ............................................................................... 207
Table 216. VMware Backup Store Target Design Decisions .............................................................. 207
Table 217. vSphere Data Protection Performance ............................................................................. 207
Table 218. VMware vSphere Data Protection Sizing Guide ............................................................... 208
Table 219. VMware Backup Store Size Design Decisions ................................................................. 208
Table 220. Virtual Machine Transport Mode Design Decisions .......................................................... 209
Table 221. Backup Schedule Design Decisions ................................................................................. 210
Table 222. Retention Policies Design Decision .................................................................................. 210
Table 223. Component Backup Jobs Design Decision ....................................................................... 210
Table 224. VM Backup Jobs in Region A ........................................................................................... 211
Table 225. VM Backup Jobs in Region B* .......................................................................................... 212
Table 226. Design Decisions for Site Recovery Manager and vSphere Replication Deployment ..... 215
Table 227. vSphere Replication Design Decisions ............................................................................. 217
Table 228. Recovery Plan Test Network Design Decision ................................................................. 220
List of Figures
Figure 1. Overview of SDDC Architecture ............................................................................................ 13
Figure 2. Pods in the SDDC .................................................................................................................. 15
Figure 3. Leaf-and-Spine Physical Network Design ............................................................................. 16
Figure 4. High-level Physical and Logical Representation of a Leaf Node ........................................... 17
Figure 5. Pod Network Design .............................................................................................................. 19
Figure 6. Oversubscription in the Leaf Layer ........................................................................................ 20
Figure 7. Compensation for a Link Failure ............................................................................................ 21
Figure 8. Quality of Service (Differentiated Services) Trust Point ........................................................ 22
Figure 9. Availability Zones and Regions .............................................................................................. 23
Figure 10. Virtual Infrastructure Layer in the SDDC ............................................................................. 24
Figure 11. SDDC Logical Design .......................................................................................................... 25
Figure 12. NSX for vSphere Architecture .............................................................................................. 27
Figure 13. NSX for vSphere Universal Distributed Logical Router ....................................................... 30
Figure 14. Cloud Management Platform Conceptual Architecture ....................................................... 32
Figure 15. vRealize Automation Logical Architecture for Region A ...................................................... 36
Figure 16. vRealize Automation Logical Architecture for Region B ...................................................... 37
Figure 17. Dual-Region Data Protection Architecture ........................................................................... 38
Figure 18. Disaster Recovery Architecture ........................................................................................... 39
Figure 19. Cluster Architecture of vRealize Log Insight ........................................................................ 41
Figure 20. vRealize Operations Manager Architecture ......................................................................... 42
Figure 21. Physical Infrastructure Design ............................................................................................. 45
Figure 22. Physical Layer within the SDDC .......................................................................................... 46
Figure 23. SDDC Pod Architecture ....................................................................................................... 47
Figure 24. Leaf-and-Spine Architecture ................................................................................................ 52
Figure 25. Example of a Small-Scale Leaf-and-Spine Architecture ..................................................... 54
Figure 26. Leaf-and-Spine and Network Virtualization ......................................................................... 55
Figure 27. Leaf Switch to Server Connection within Compute Racks .................................................. 56
Figure 28. Leaf Switch to Server Connection within Management/Shared Compute and Edge Rack . 57
Figure 29. Sample VLANs and Subnets within a Pod .......................................................................... 58
Figure 30. Oversubscription in the Leaf Switches................................................................................. 59
Figure 31. Virtual Infrastructure Layer Business Continuity in the SDDC ............................................ 70
Figure 32. SDDC Logical Design .......................................................................................................... 71
Figure 33. vSphere Data Protection Logical Design ............................................................................. 72
Figure 34. Disaster Recovery Logical Design ....................................................................................... 73
Figure 35. vCenter Server and Platform Services Controller Deployment Model ................................ 78
Figure 36. vSphere Logical Cluster Layout ........................................................................................... 81
Figure 37. Network Switch Design for Management Hosts .................................................................. 92
Figure 38. Network Switch Design for shared Edge and Compute Hosts ............................................ 96
Figure 39. Network Switch Design for Compute Hosts ......................................................................... 98
Figure 40. Architecture of NSX for vSphere........................................................................................ 105
Figure 41. Conceptual Tenant Overview ............................................................................................ 111
Figure 42. Cluster Design for NSX for vSphere .................................................................................. 113
Figure 43. Logical Switch Control Plane in Hybrid Mode .................................................................... 115
Figure 44. Virtual Application Network Components and Design ....................................................... 124
Figure 45. Detailed Example for vRealize Automation Networking .................................................... 125
Figure 46. Logical Storage Design ...................................................................................................... 128
Figure 47. Conceptual Virtual SAN Design ......................................................................................... 134
Figure 48. Virtual SAN Conceptual Network Diagram ........................................................................ 135
Figure 49. NFS Storage Exports ......................................................................................................... 144
Figure 50. Cloud Management Platform Design ................................................................................. 145
Figure 51. vRealize Automation Design Overview for Region A ........................................................ 147
Figure 52. vRealize Automation Design Overview for Region B ........................................................ 148
Figure 53. Rainpole Cloud Automation Tenant Design for Two Regions ........................................... 159
Figure 54. vRealize Automation Logical Design ................................................................................. 166
Figure 55. vRealize Automation Integration with vSphere Endpoint .................................................. 169
Figure 56. Template Synchronization ................................................................................................. 174
Figure 57. VMware Identity Manager proxies authentication between Active Directory and vRealize Automation .......................................................... 174
Figure 58. Operations Infrastructure Conceptual Design ................................................................... 186
Figure 59. Logical Design of vRealize Log Insight .............................................................................. 186
Figure 60. Networking Design for the vRealize Log Insight Deployment ........................................... 189
Figure 61. Logical Design of vRealize Operations Manager Multi-Region Deployment ..................... 196
Figure 62. Networking Design of the vRealize Operations Manager Deployment ............................. 201
Figure 63. vSphere Data Protection Logical Design ........................................................................... 206
Figure 64. Disaster Recovery Logical Design ..................................................................................... 214
Figure 65. Logical Network Design for Cross-Region Deployment with Management Application Network Containers ............................................................. 216
Note The VMware Validated Design Reference Architecture Guide is validated with specific product
versions. See the VMware Validated Design Release Notes for more information about supported
product versions.
The VMware Validated Design Reference Architecture Guide is intended for cloud architects,
infrastructure administrators, and cloud administrators who are familiar with VMware software and
want to use it to quickly deploy and manage an SDDC that meets the requirements for capacity,
scalability, backup and restore, and extensibility for disaster recovery support.
2 Architecture Overview
The VMware Validated™ Design for Software-Defined Data Center outcome requires a system that
enables an IT organization to automate the provisioning of common, repeatable requests and to
respond to business needs with more agility and predictability. Traditionally, this has been referred to
as IaaS, or Infrastructure as a Service; however, the software-defined data center (SDDC) extends
the typical IaaS solution to include a broader and more complete IT solution.
The VMware Validated Design architecture is based on a number of layers and modules, which
allows interchangeable components to be part of the end solution or outcome, such as the SDDC. If a
particular component design does not fit the business or technical requirements for whatever reason,
it can be swapped out for another similar component. The VMware Validated Designs are one way of
putting an architecture together. They are rigorously tested to ensure stability, scalability, and
compatibility. Ultimately, however, the system is designed in such a way as to ensure the desired IT
outcome is achieved.
Physical Layer
The lowest layer of the solution is the Physical Layer, sometimes referred to as the 'core', which
consists of three main components: Compute, Network, and Storage. Inside the compute component
sit the x86-based servers that run the management, edge, and tenant compute workloads. Some
guidance is provided on the capabilities required to run this architecture; however, no
recommendation on the type or brand of hardware is given. All components must be listed on the
VMware Hardware Compatibility Guide.
Virtual Infrastructure Layer
Sitting on the Physical Layer's infrastructure is the Virtual Infrastructure Layer. Within the Virtual
Infrastructure Layer, access to the underlying physical infrastructure is controlled and allocated to the
management and tenant workloads. The Virtual Infrastructure Layer consists primarily of the physical
hosts' hypervisors and the control of these hypervisors. The management workloads consist of
elements in the virtual management layer itself, along with elements in the Cloud Management Layer,
Service Management, Business Continuity and Security areas.
Cloud Management Layer
The Cloud Management Layer is the "top" layer of the stack and is where the service consumption
occurs. Typically, through a UI or API, this layer calls for resources and then orchestrates the actions
of the lower layers to achieve the request. While the SDDC can stand on its own without any other
ancillary services, for a complete SDDC experience other supporting components are needed. The
Service Management, Business Continuity and Security areas complete the architecture by providing
this support.
Service Management
When building any type of IT infrastructure, portfolio and operations management play key roles in
continued day-to-day service delivery. The Service Management area of this architecture mainly
focuses on operations management, in particular monitoring, alerting, and log management. Portfolio
Management is not a focus of this SDDC design but may be added in future releases.
Business Continuity
An enterprise-ready system must contain elements that support business continuity in the areas of
backup, restore, and disaster recovery. This area ensures that when data loss occurs, the right
elements are in place to prevent permanent loss to the business. The design provides
comprehensive guidance on how to operate backups and restores, and includes runbooks for failing
components over in the event of a disaster.
Security
All systems need to be inherently secure by design to reduce risk and increase compliance while still
providing a governance structure. The security area outlines what is needed to ensure the entire
SDDC is resilient to both internal and external threats.
2.1.1.1 Pod
A pod is a logical boundary of functionality for the SDDC platform. While each pod usually spans one
rack, it is possible to aggregate multiple pods into a single rack in smaller setups. For both small and
large setups, homogeneity and easy replication are important.
Different pods of the same type can provide different characteristics for varying requirements. For
example, one compute pod could use full hardware redundancy for each component (power supply
through memory chips) for increased availability. At the same time, another compute pod in the same
setup could use low-cost hardware without any hardware redundancy. With these variations, the
architecture can cater to the different workload requirements in the SDDC.
One of the guiding principles for such deployments is that VLANs are not spanned beyond a single
pod by the network virtualization layer. Although this VLAN restriction appears to be a simple
requirement, it has widespread impact on how a physical switching infrastructure can be built and on
how it scales.
One Pod in One Rack. One pod can occupy exactly one rack. This is typically the case for
compute pods.
Multiple Pods in One Rack. Two or more pods can occupy a single rack, for example, one
management pod and one shared edge and compute pod can be deployed to a single rack.
Single Pod Across Multiple Racks. A single pod can stretch across multiple adjacent racks. For
example, a storage pod with filer heads and disk shelves can span more than one rack.
Ports that face the servers inside a rack should have a minimal configuration, shown in the following
high-level physical and logical representation of the leaf node.
Note Each leaf node has identical VLAN configuration with unique /24 subnets assigned to each
VLAN.
A dynamic routing protocol—for example OSPF, ISIS, or iBGP—connects the leaf switches and
spine switches. Each leaf switch in the rack advertises a small set of prefixes, typically one per
VLAN or subnet. In turn, the leaf switch calculates equal-cost paths to the prefixes it received from
other leaf switches.
Using Layer 3 routing has benefits and drawbacks.
The benefit is that you can choose from a wide array of Layer 3 capable switch products for the
physical switching fabric. You can mix switches from different vendors because of the general
interoperability between implementations of OSPF, ISIS, or iBGP. This approach is usually more
cost effective because it uses only basic functionality of the physical switches.
The drawback is that some design restrictions apply because VLANs are restricted to a single rack.
This affects vSphere vMotion, vSphere Fault Tolerance, and storage networks.
The number of links to the spine switches dictates how many paths for traffic from this rack to another
rack are available. Because the number of hops between any two racks is consistent, the architecture
can utilize equal-cost multipathing (ECMP). Assuming traffic sourced by the servers carries a TCP or
UDP header, traffic spray can occur on a per-flow basis.
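The following sketch only illustrates this per-flow behavior; the hash inputs and the modulo-based uplink selection are illustrative assumptions, not the hashing algorithm of any particular switch.

import hashlib

def select_uplink(src_ip, dst_ip, protocol, src_port, dst_port, uplink_count):
    # Hash the five-tuple that identifies a flow; every packet of the same flow
    # maps to the same uplink, while different flows spread across all
    # equal-cost uplinks toward the spine.
    flow = f"{src_ip}:{dst_ip}:{protocol}:{src_port}:{dst_port}".encode()
    return int(hashlib.sha256(flow).hexdigest(), 16) % uplink_count

# Two different flows between the same pair of hosts can take different paths.
print(select_uplink("172.16.11.5", "172.16.21.7", 6, 49152, 443, 4))
print(select_uplink("172.16.11.5", "172.16.21.7", 6, 49153, 443, 4))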
High Bandwidth
In leaf-and-spine topologies, oversubscription typically occurs at the leaf switch.
Oversubscription is equal to the total amount of bandwidth available to all servers connected to a leaf
switch divided by the aggregate amount of uplink bandwidth.
oversubscription = total bandwidth / aggregate uplink bandwidth
For example, 20 servers with one 10 Gigabit Ethernet (10 GbE) port each create up to 200 Gbps of
bandwidth. In an environment with eight 10 GbE uplinks to the spine—a total of 80 Gbps—a 2.5:1
oversubscription results, as shown in the Oversubscription in the Leaf Layer illustration.
2.5 (oversubscription) = 200 (total) / 80 (total uplink)
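A minimal sketch of the same calculation, using the example values above (20 servers with one 10 GbE port each and eight 10 GbE uplinks):

servers = 20
server_port_gbps = 10
uplinks = 8
uplink_gbps = 10

total_server_bandwidth = servers * server_port_gbps    # 200 Gbps
total_uplink_bandwidth = uplinks * uplink_gbps         # 80 Gbps
oversubscription = total_server_bandwidth / total_uplink_bandwidth
print(f"{oversubscription}:1")                         # prints 2.5:1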
You can make more or less bandwidth available to a rack by provisioning more or fewer uplinks. That
means you can change the available bandwidth on a per-rack basis.
Note The number of uplinks from a leaf switch to each spine switch must be the same to avoid
hotspots.
For example, if a leaf switch has two uplinks to spine switch A and only one uplink to spine switches
B, C and D, more traffic is sent to the leaf switch via spine switch A, which might create a hotspot.
Fault Tolerance
The larger the environment, the more switches make up the overall fabric and the greater the
possibility for one component of the data center switching fabric to fail. A resilient fabric can sustain
individual link or box failures without widespread impact.
For example, if one of the spine switches fails, traffic between racks continues to be routed across the
remaining spine switches in a Layer 3 fabric. The routing protocol ensures that only available paths
are chosen. Installing more than two spine switches reduces the impact of a spine switch failure.
Multipathing-capable fabrics handle box or link failures, reducing the need for manual network
maintenance and operations. If a software upgrade of a fabric switch becomes necessary, the
administrator can take the node out of service gracefully by changing routing protocol metrics, which
will quickly drain network traffic from that switch, freeing the switch for maintenance.
Depending on the width of the spine, that is, how many switches are in the aggregation or spine layer,
the additional load that the remaining switches must carry is not as significant as if there were only
two switches in the aggregation layer. For example, in an environment with four spine switches, a
failure of a single spine switch only reduces the available capacity by 25%.
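The effect of spine width on failure impact can be expressed as a simple calculation; the spine counts below are illustrative values, not recommendations.

for spine_count in (2, 4, 8):
    capacity_lost = 100 / spine_count
    # With equal-cost uplinks, one failed spine removes 1/N of inter-rack capacity.
    print(f"{spine_count} spine switches: one failure removes {capacity_lost:.1f}% of capacity")
# 2 spines -> 50.0%, 4 spines -> 25.0%, 8 spines -> 12.5%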
Quality of Service Differentiation
Virtualized environments must carry different types of traffic, including tenant, storage and
management traffic, across the switching infrastructure. Each traffic type has different characteristics
and makes different demands on the physical switching infrastructure.
Management traffic, although typically low in volume, is critical for controlling physical and virtual
network state.
IP storage traffic is typically high in volume and generally stays within a data center.
For virtualized environments, the hypervisor sets the QoS values for the different traffic types. The
physical switching infrastructure has to trust the values set by the hypervisor. No reclassification is
necessary at the server-facing port of a leaf switch. If there is a congestion point in the physical
switching infrastructure, the QoS values determine how the physical network sequences, prioritizes,
or potentially drops traffic.
Two types of QoS configuration are supported in the physical switching infrastructure:
Layer 2 QoS, also called class of service
Layer 3 QoS, also called DSCP marking.
A vSphere Distributed Switch supports both class of service and DSCP marking. Users can mark the
traffic based on the traffic type or packet classification. When the virtual machines are connected to
the VXLAN-based logical switches or networks, the QoS values from the internal packet headers are
copied to the VXLAN-encapsulated header. This enables the external physical network to prioritize
the traffic based on the tags on the external header.
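The sketch below only illustrates the behavior described above, where the QoS value set on the inner packet is copied to the VXLAN outer header so the physical fabric can prioritize traffic without reclassification; the DSCP value and VNI in the example are arbitrary assumptions, not values mandated by this design or taken from NSX code.

from dataclasses import dataclass

@dataclass
class InnerPacket:
    payload: bytes
    dscp: int                 # QoS value set by the vSphere Distributed Switch

@dataclass
class VxlanPacket:
    vni: int                  # VXLAN Network Identifier of the logical switch
    outer_dscp: int           # copied from the inner header
    inner: InnerPacket

def encapsulate(packet: InnerPacket, vni: int) -> VxlanPacket:
    # Copying the inner DSCP marking to the outer header lets the leaf and
    # spine switches trust and act on the hypervisor-assigned value.
    return VxlanPacket(vni=vni, outer_dscp=packet.dscp, inner=packet)

encapsulated = encapsulate(InnerPacket(payload=b"app data", dscp=26), vni=5001)
print(encapsulated.outer_dscp)   # 26, same as the inner packet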
This VMware Validated Design uses two regions, but uses only one availability zone in each region.
The following diagram shows how the design could be expanded to include multiple availability
zones.
Figure 9. Availability Zones and Regions
2.1.3.2 Regions
Multiple regions support placing workloads closer to your customers, for example, by operating one
region on the US east coast and one region on the US west coast, or operating a region in Europe
and another region in the US. Regions are helpful in many ways.
Regions can support disaster recovery solutions: One region can be the primary site and another
region can be the recovery site.
You can use multiple regions to address data privacy laws and restrictions in certain countries by
keeping tenant data within a region in the same country.
The distance between regions can be rather large. This design uses two regions: one region is
assumed to be in San Francisco (SFO), and the other region is assumed to be in Los Angeles (LAX).
VMware NSX 6.2 allows linking multiple vCenter and VMware NSX deployments, and managing them
from a single NSX Manager that is designated as primary. Such a linked environment includes both
an NSX Manager primary instance and one or more secondary instances.
The primary NSX Manager instance is linked to the primary vCenter Server instance and allows
the creation and management of universal logical switches, universal (distributed) logical routers
and universal firewall rules.
Each secondary NSX Manager instance can manage networking services that are local to itself.
Up to seven secondary NSX Manager instances can be associated with the primary NSX
Manager in a linked environment. You can configure network services on all NSX Manager
instances from one central location.
Note A linked environment still requires one vCenter Server instance for each NSX Manager
instance.
To manage all NSX Manager instances from the primary NSX Manager in a Cross-vCenter VMware
NSX deployment, the vCenter Server instances must be connected with Platform Services Controller
nodes in Enhanced Linked Mode.
Figure 12. NSX for vSphere Architecture
Control Plane
VMware NSX supports three different replication modes to provide multiple destination
communication.
Multicast Mode. When multicast replication mode is selected for a logical switch, VMware NSX
relies on the Layer 2 and Layer 3 multicast capability of the physical network to ensure VXLAN
encapsulated multi-destination traffic is sent to all the VXLAN tunnel end points (VTEPs). The
control plane uses multicast IP addresses on the physical network in this mode.
Unicast Mode. In unicast mode, the control plane is managed by the NSX Controller instances
and all replication is done locally on the host. No multicast IP addresses or physical network
configurations are required. This mode is well suited for smaller deployments.
Hybrid Mode. Hybrid mode is an optimized version of unicast mode, where local traffic replication
for the subnet is offloaded to the physical network. This mode requires IGMP snooping on the first
hop switch, and an IGMP querier must be available. Protocol-independent multicast (PIM) is not
required.
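The choice between the three modes can be summarized as a capability check on the physical network. The helper below is an illustrative sketch of that selection logic, not an NSX API or a configuration command.

def choose_replication_mode(l3_multicast_routing: bool,
                            igmp_snooping: bool,
                            igmp_querier: bool) -> str:
    # Multicast mode needs full Layer 2 and Layer 3 multicast support (PIM).
    if l3_multicast_routing and igmp_snooping and igmp_querier:
        return "multicast"
    # Hybrid mode offloads local (subnet) replication to the first-hop switch,
    # so it needs IGMP snooping and an IGMP querier, but no PIM.
    if igmp_snooping and igmp_querier:
        return "hybrid"
    # Unicast mode needs nothing from the physical network; the NSX Controller
    # instances manage the control plane and hosts replicate locally.
    return "unicast"

print(choose_replication_mode(False, True, True))   # hybrid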
NSX Controller
The NSX Controller cluster is the control plane component, and is responsible for managing the
switching and routing modules in the hypervisors. An NSX Controller node performs the following
functions.
Provides the control plane to distribute VXLAN and logical routing information to ESXi hosts.
Clusters nodes for scale-out and high availability.
Slices network information across nodes in a cluster for redundancy purposes.
Eliminates the need for multicast support from the physical network infrastructure.
Provides ARP-suppression of broadcast traffic in VXLAN networks.
NSX for vSphere control plane communication occurs over the management network. Network
information from the ESXi hosts and the distributed logical router control VMs is reported to NSX
Controller instances through the UWA. The NSX Controller command line supports retrieval of
VXLAN and logical routing network state information.
Data Plane
The NSX data plane consists of the NSX vSwitch, which is based on the vSphere Distributed Switch
(VDS) and includes additional components. These components include kernel modules, which run
within the ESXi kernel and provide services such as the distributed logical router (DLR) and
distributed firewall (DFW). The NSX kernel modules also enable Virtual Extensible LAN (VXLAN)
capabilities.
The NSX vSwitch abstracts the physical network and provides access-level switching in the
hypervisor. It is central to network virtualization because it enables logical networks that
are independent of physical constructs such as a VLAN. The NSX vSwitch provides multiple benefits.
Three types of overlay networking capabilities:
o Creation of a flexible logical Layer 2 overlay over existing IP networks on existing
physical infrastructure, without the need to rearchitect the data center networks.
o Support for east/west and north/south communication while maintaining isolation
between tenants.
o Support for application workloads and virtual machines that operate as if they were connected
to a physical Layer 2 network.
Support for VXLAN and centralized network configuration.
A comprehensive toolkit for traffic management, monitoring and troubleshooting within a virtual
network which includes Port Mirroring, NetFlow/IPFIX, configuration backup and restore, network
health check, and Quality of Service (QoS).
In addition to the NSX vSwitch, the data plane also includes gateway devices, which can provide
Layer 2 bridging from the logical networking space (VXLAN) to the physical network (VLAN). The
gateway device is typically an NSX Edge Gateway device. NSX Edge Gateway devices offer Layer 2,
Layer 3, perimeter firewall, load-balancing, Virtual Private Network (VPN), and Dynamic Host
Configuration Protocol (DHCP) services.
Consumption Plane
In the consumption plane, different users of NSX for vSphere can access and manage services in
different ways:
NSX administrators can manage the NSX environment from the vSphere Web Client.
End-users can consume the network virtualization capabilities of NSX for vSphere through the
Cloud Management Platform (CMP) when deploying applications.
Designated Instance
The designated instance is responsible for resolving ARP on a VLAN LIF. There is one designated
instance per VLAN LIF. The selection of an ESXi host as a designated instance is performed
automatically by the NSX Controller cluster and that information is pushed to all other hosts. Any ARP
requests sent by the distributed logical router on the same subnet are handled by the same host. In
case of host failure, the controller selects a new host as the designated instance and makes that
information available to other hosts.
User World Agent
User World Agent (UWA) is a TCP and SSL client that enables communication between the ESXi
hosts and NSX Controller nodes, and the retrieval of information from NSX Manager through
interaction with the message bus agent.
Edge Service Gateways
While the Universal Logical Router provides VM to VM or east-west routing, the NSX Edge Service
Gateway provides north-south connectivity, by peering with upstream Top of Rack switches, thereby
enabling tenants to access public networks.
Logical Firewall
NSX for vSphere Logical Firewall provides security mechanisms for dynamic virtual data centers.
The Distributed Firewall allows you to segment virtual data center entities like virtual machines.
Segmentation can be based on VM names and attributes, user identity, vCenter objects like data
centers, and hosts, or can be based on traditional networking attributes like IP addresses, port
groups, and so on.
The Edge Firewall component helps you meet key perimeter security requirements, such as
building DMZs based on IP/VLAN constructs, tenant-to-tenant isolation in multi-tenant virtual data
centers, Network Address Translation (NAT), partner (extranet) VPNs, and user-based SSL
VPNs.
The Flow Monitoring feature displays network activity between virtual machines at the application
protocol level. You can use this information to audit network traffic, define and refine firewall policies,
and identify threats to your network.
Logical Virtual Private Networks (VPNs)
SSL VPN-Plus allows remote users to access private corporate applications. IPSec VPN offers site-
to-site connectivity between an NSX Edge instance and remote sites. L2 VPN allows you to extend
your datacenter by allowing virtual machines to retain network connectivity across geographical
boundaries.
Logical Load Balancer
The NSX Edge load balancer enables network traffic to follow multiple paths to a specific destination.
It distributes incoming service requests evenly among multiple servers in such a way that the load
distribution is transparent to users. Load balancing thus helps in achieving optimal resource
utilization, maximizing throughput, minimizing response time, and avoiding overload. NSX Edge
provides load balancing up to Layer 7.
Service Composer
Service Composer helps you provision and assign network and security services to applications in a
virtual infrastructure. You map these services to a security group, and the services are applied to the
virtual machines in the security group.
Data Security provides visibility into sensitive data that is stored within your organization's virtualized
and cloud environments. Based on the violations that are reported by the NSX for vSphere Data
Security component, NSX security or enterprise administrators can ensure that sensitive data is
adequately protected and assess compliance with regulations around the world.
NSX for vSphere Extensibility
VMware partners integrate their solutions with the NSX for vSphere platform to enable an integrated
experience across the entire SDDC. Data center operators can provision complex, multi-tier virtual
networks in seconds, independent of the underlying network topology or components.
The Cloud Management Platform consists of the following design elements and components.
Table 1. Elements and Components of the Cloud Management Platform
Tools and supporting infrastructure. Building blocks that provide the foundation of the cloud:
VM templates and blueprints. VM templates are used to author the blueprints that tenants (end
users) use to provision their cloud workloads.
Cloud management portal. A portal that provides self-service capabilities for users to administer,
provision, and manage workloads:
vRealize Automation portal, Admin access. The default root tenant portal URL used to set up and
administer tenants and global configuration options.
vRealize Automation portal, Tenant access. Refers to a subtenant and is accessed using a URL
with an appended tenant identifier.
Note A tenant portal might refer to the default tenant portal in some configurations. In this case, the
URLs match, and the user interface is contextually controlled by the role-based access
control permissions that are assigned to the tenant.
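For example, assuming a hypothetical portal FQDN of vra01svr01.rainpole.local, the default tenant (administrative) portal is reached at https://vra01svr01.rainpole.local/vcac/, and a tenant named rainpole at https://vra01svr01.rainpole.local/vcac/org/rainpole. The host name and tenant name here are placeholders, not values mandated by this design.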
Characteristic: Scalability
Description: Depicts the effect the option has on the ability of the solution to be augmented to achieve
better sustained performance within the infrastructure. Key metrics: Web site latency, network traffic,
and CPU usage on the database and web servers.
2.4.1.1 Architecture
Data protection solutions provide the following functions in the SDDC:
Backup and restore virtual machines.
Store data according to company retention policies.
You can configure vSphere Replication to regularly create and retain snapshots of protected
virtual machines on the recovery region.
Protection groups. A protection group is a collection of virtual machines that Site Recovery
Manager protects together. You configure virtual machines and create protection groups
differently depending on whether you use array-based replication or vSphere Replication. You
cannot create protection groups that combine virtual machines for which you configured array-
based replication with virtual machines for which you configured vSphere Replication.
Recovery plans. A recovery plan specifies how Site Recovery Manager recovers the virtual
machines in the protection groups that it contains. You can include a combination of array-based
replication protection groups and vSphere Replication protection groups in the same recovery
plan.
2.4.3.1 Overview
vRealize Log Insight collects data from ESXi hosts using the syslog protocol. It connects to vCenter
Server to collect events, tasks, and alarms data, and integrates with vRealize Operations Manager to
send notification events and enable launch in context. It also functions as a collection and analysis
point for any system capable of sending syslog data. In addition to syslog data, an ingestion agent can
be installed on Linux or Windows servers to collect logs. This agent approach is especially useful for
custom logs and operating systems that don't natively support the syslog protocol, such as Windows.
vRealize Log Insight clients connect to the integrated load balancer (ILB) VIP address and use the
Web user interface and ingestion (via syslog or the Ingestion API) to send logs to vRealize Log Insight.
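As an illustration of the syslog ingestion path, the short sketch below sends a test event to the ILB VIP from a Linux or Windows host using the Python standard library; the FQDN and port are assumptions and must be replaced with the values used in your deployment.

import logging
import logging.handlers

# Hypothetical ILB VIP FQDN; UDP port 514 is the default syslog port.
syslog_handler = logging.handlers.SysLogHandler(address=("vrli-cluster.rainpole.local", 514))
logger = logging.getLogger("sddc-application")
logger.addHandler(syslog_handler)
logger.setLevel(logging.INFO)
logger.info("Test event forwarded to the vRealize Log Insight integrated load balancer")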
By default, the vRealize Log Insight solution collects data from vCenter Server systems and ESXi
hosts. To forward logs from NSX for vSphere and vRealize Automation, use content packs, which
contain extensions or provide integration with other systems in the SDDC.
2.4.3.6 Archiving
vRealize Log Insight supports data archiving on NFS shared storage that each vRealize Log Insight
node can access.
2.4.3.7 Backup
You back up each vRealize Log Insight cluster locally by using traditional virtual machine backup
solutions, such as vSphere Storage APIs for Data Protection (VADP) compatible backup software
like vSphere Data Protection.
2.4.4.2 Architecture
vRealize Operations Manager contains functional elements that collaborate for data analysis and
storage, and support creating clusters of nodes with different roles.
Figure 20. vRealize Operations Manager Architecture
3 Detailed Design
This Software-Defined Data Center (SDDC) detailed design consists of the following main sections:
Physical Infrastructure Design
Virtual Infrastructure Design
Cloud Management Platform Design
Operations Infrastructure Design
Each section is divided into subsections that include detailed discussion, diagrams, and design
decisions with justifications.
The Physical Infrastructure Design section focuses on the three main pillars of any data center:
compute, storage, and network. In this section you find information about availability zones and
regions. The section also provides details on the rack and pod configuration, and on physical hosts
and the associated storage and network configurations.
In the Virtual Infrastructure Design section, you find details on the core virtualization software
configuration. This section has information on the ESXi hypervisor, vCenter Server, the virtual
network design including VMware NSX, and on software-defined storage with VMware Virtual SAN.
This section also includes details on business continuity (backup and restore) and on disaster
recovery.
The Cloud Management Platform Design section contains information on the consumption and
orchestration layer of the SDDC stack, which uses vRealize Automation and vRealize Orchestrator. IT
organizations can use the fully distributed and scalable architecture to streamline their provisioning
and decommissioning operations.
The Operations Infrastructure Design section explains how to architect, install, and configure vRealize
Operations Manager and vRealize Log Insight. You learn how to ensure that service management
within the SDDC is comprehensive. This section ties directly into the Operational Guidance section.
Note This design leverages a single availability zone for a one region deployment, and a single
availability zone in each region in the case of a two region deployment.
The design uses the following regions. The region identifier uses United Nations Code for Trade and
Transport Locations (UN/LOCODE) along with a numeric instance ID.
Table 5. Regions
SDDC-PHY-001
Design Decision: Per region, a single availability zone that can support all SDDC management
components is deployed.
Design Justification: A single availability zone can support all SDDC management and compute
components for a region. You can later add another availability zone to extend and scale the
management and compute capabilities of the SDDC.
Design Implication: Results in limited redundancy of the overall solution. The single availability zone
can become a single point of failure and prevent high-availability design solutions.
SDDC-PHY-002
Design Decision: Use two regions.
Design Justification: Supports the technical requirement of multi-region failover capability as outlined
in the design objectives.
Design Implication: Having multiple regions will require an increased solution footprint and
associated costs.
Storage pods: 6 racks required; 0 racks minimum (if using Virtual SAN for compute pods). Storage
that is not Virtual SAN storage is hosted on isolated storage pods.
Total: 13 racks required; 1 rack minimum.
SDDC-PHY-003
Design Decision: A single compute pod is bound to a physical rack.
Design Justification: Scaling out of the SDDC infrastructure is simplified through a 1:1 relationship
between a compute pod and the compute resources contained within a physical rack.
Design Implication: Dual power supplies and power feeds are required to ensure availability of
hardware components.
SDDC-PHY-005
Design Decision: Storage pods can occupy one or more racks.
Design Justification: To simplify the scale out of the SDDC infrastructure, the storage pod to rack(s)
relationship has been standardized. It is possible that the storage system arrives from the
manufacturer in a dedicated rack or set of racks, and a storage system of this type is accommodated
for in the design.
Design Implication: The design must include sufficient power and cooling to operate the storage
equipment. This depends on the selected vendor and products.
SDDC-PHY-006
Design Decision: Each rack has two separate power feeds.
Design Justification: Redundant power feeds increase availability by ensuring that failure of a power
feed does not bring down all equipment in a rack. Combined with redundant network connections into
a rack and within a rack, redundant power feeds prevent failure of equipment in an entire rack.
Design Implication: All equipment used must support two separate power feeds. The equipment must
keep running if one power feed fails. If the equipment of an entire rack fails, the cause, such as
flooding or an earthquake, also affects neighboring racks. A second region is needed to mitigate
downtime associated with such an event.
SDDC-PHY-009
Design Decision: Use Virtual SAN Ready Nodes.
Design Justification: Using a Virtual SAN Ready Node ensures seamless compatibility with Virtual
SAN during the deployment.
Design Implication: Might limit hardware choices.
SDDC-PHY-010
Design Decision: All nodes must have uniform configurations across a given cluster.
Design Justification: A balanced cluster delivers more predictable performance even during hardware
failures. In addition, performance impact during resync/rebuild is minimal when the cluster is
balanced.
Design Implication: Vendor sourcing, budgeting and procurement considerations for uniform server
nodes will be applied on a per cluster basis.
Note See the VMware Virtual SAN 6.0 Design and Sizing Guide for more information about disk
groups, including design and sizing guidance. The number of disk groups and disks that an
ESXi host manages determines memory requirements. 32 GB of RAM is required to support
the maximum number of disk groups.
SDDC-PHY-011
Design Decision: Set up each ESXi host in the management and edge pods to have a minimum of 128 GB RAM.
Design Justification: The VMs in the management and edge pods require a total of 375 GB RAM.
Design Implication: None.
Instead of relying on one or two large chassis-based switches at the core, the load is distributed
across all spine switches, making each individual spine insignificant as the environment scales out.
Scalability
Several factors, including the following, affect scale.
Number of racks that are supported in a fabric
Amount of bandwidth between any two racks in a data center
Number of paths a leaf switch can select from when communicating with another rack
The total number of available ports dictates the number of racks supported in a fabric across all spine
switches and the acceptable level of oversubscription.
Different racks might be hosting different types of infrastructure. For example, a rack might host filers
or other storage systems, which might attract or source more traffic than other racks in a data center.
In addition, traffic levels of compute racks (that is, racks that are hosting hypervisors with workloads
or virtual machines) might have different bandwidth requirements than edge racks, which provide
connectivity to the outside world. Link speed as well as the number of links vary to satisfy different
bandwidth demands.
The number of links to the spine switches dictates how many paths are available for traffic from this
rack to another rack. Because the number of hops between any two racks is consistent, equal-cost
multipathing (ECMP) can be used. Assuming traffic sourced by the servers carries a TCP or UDP
header, traffic spray can occur on a per-flow basis.
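To make the per-flow behavior concrete, the following Python sketch hashes the 5-tuple of a flow and maps it to one of the available equal-cost uplinks. The addresses, ports, and path count are illustrative only, and the hash shown is not the algorithm of any particular switch vendor; it simply demonstrates that all packets of one flow follow one path while different flows spread across the fabric.

import hashlib

def pick_ecmp_path(src_ip, dst_ip, proto, src_port, dst_port, path_count):
    """Hash the 5-tuple of a flow and map it to one of the equal-cost paths.

    All packets of the same flow hash to the same path, preserving per-flow
    ordering while different flows spread across the fabric.
    """
    flow = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(flow).digest()
    return int.from_bytes(digest[:4], "big") % path_count

# Example: a leaf switch with 4 uplinks to the spine (hypothetical values).
print(pick_ecmp_path("172.16.11.101", "172.16.31.55", "tcp", 49152, 443, 4))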
Note VXLANs need an MTU value of at least 1600 bytes on the switches and routers that carry the
transport zone traffic.
SDDC-PHY-NET-001
Design Decision: Configure the MTU size to 9000 bytes (Jumbo Frames) on the port groups that support the following traffic types:
NFS
Virtual SAN
vMotion
VXLAN
vSphere Replication
Design Justification: Setting the MTU to 9000 bytes (Jumbo Frames) improves traffic throughput. To support VXLAN, the MTU setting must be increased to a minimum of 1600 bytes; setting these port groups to 9000 bytes as well has no effect on VXLAN but ensures consistency across port groups that are adjusted from the default MTU size.
Design Implication: When adjusting the MTU packet size, the entire network path (VMkernel port, distributed switch, physical switches and routers) must also be configured to support the same MTU packet size.
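In vSphere, the MTU for these traffic types is raised on the distributed switch and its VMkernel adapters in addition to the physical path. The following pyVmomi sketch raises the switch-level MTU to 9000 bytes; the vCenter Server address, credentials, and the switch name vDS-Mgmt are assumptions for illustration, and task monitoring and error handling are omitted.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection details; adjust for your environment.
si = SmartConnect(host="vcenter.rainpole.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Locate the distributed switch by name (vDS-Mgmt is an assumed name).
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)
dvs = next(d for d in view.view if d.name == "vDS-Mgmt")

# Raise the switch MTU to 9000 bytes (Jumbo Frames).
spec = vim.dvs.VmwareDistributedVirtualSwitch.ConfigSpec()
spec.configVersion = dvs.config.configVersion
spec.maxMtu = 9000
task = dvs.ReconfigureDvs_Task(spec)
Disconnect(si)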
Each ESXi host in the management/shared edge and compute rack is connected to the SDDC
network fabric and also to the Wide Area Network (WAN) and to the Internet, as shown in the following
illustration.
Figure 28. Leaf Switch to Server Connection within Management/Shared Compute and Edge
Rack
Note The following IP ranges are meant as samples. Your actual implementation depends on your
environment.
Routing protocols. Base the selection of the external routing protocol on your current
implementation or on available expertise among the IT staff. Take performance requirements into
consideration. Possible options are OSPF, iBGP and IS-IS.
DHCP proxy. The DHCP proxy must point to a DHCP server via its IPv4 address. See the
External Service Dependencies section for details on the DHCP server.
Table 13. Physical Network Design Decisions
SDDC-PHY-NET-004
Design Decision: Each rack uses two ToR switches. These switches provide connectivity across two 10 GbE links to each server.
Design Justification: This design uses two 10 GbE links to provide redundancy and reduce overall design complexity.
Design Implication: Requires two ToR switches per rack, which can increase costs.

SDDC-PHY-NET-005
Design Decision: Use VLANs to segment physical network functions.
Design Justification: Allows for physical network connectivity without requiring a large number of NICs. Segregation is needed for the different network functions that are required in the SDDC. This allows for differentiated services and prioritization of traffic as needed.
Design Implication: Uniform configuration and presentation is required on all the trunks made available to the ESXi hosts.

SDDC-PHY-NET-008
Design Decision: Use an NTP time source for all management nodes of the SDDC infrastructure.
Design Justification: Critical to maintain accurate and synchronized time between management nodes.
Design Implication: None.
The following requirements and dependencies summarize the information in the VMware Virtual SAN
documentation. The design decisions of this VMware Validated Design fulfill these requirements.
The software-defined storage module has the following requirements and options:
Minimum of 3 hosts providing storage resources to the Virtual SAN cluster.
Virtual SAN is configured as hybrid storage or all-flash storage.
o A Virtual SAN hybrid storage configuration requires both magnetic devices and flash caching
devices.
o An All-Flash Virtual SAN configuration requires vSphere 6.0 or later.
Each ESXi host that provides storage resources to the cluster must meet the following
requirements:
o Minimum of one SSD.
o The SSD flash cache tier should be at least 10% of the size of the HDD capacity tier.
o Minimum of two HDDs.
o RAID controller compatible with VMware Virtual SAN.
o 10 Gbps network for Virtual SAN traffic with Multicast enabled.
o vSphere High Availability Isolation Response set to power off virtual machines. With this
setting, no possibility of split brain conditions in case of isolation or network partition exists. In
a split-brain condition, the virtual machine might be powered on by two hosts by mistake. See
design decision SDDC-VI-VC-024 for more details.
Table 15. Virtual SAN Physical Storage Design Decision
SDDC-PHY-STO-001
Design Decision: Use one 200 GB SSD and two traditional 1 TB HDDs to create a single disk group in the management cluster.
Design Justification: Allows enough capacity for the management VMs with a minimum of 10% flash-based caching.
Design Implication: Having only one disk group limits the amount of striping (performance) capability and increases the size of the fault domain.

SDDC-PHY-STO-002
Design Decision: Configure Virtual SAN in hybrid mode.
Design Justification: The VMs in the management cluster, which are hosted within Virtual SAN, do not require the performance or expense of an all-flash Virtual SAN configuration.
Design Implication: Virtual SAN hybrid mode does not provide the potential performance of an all-flash configuration.
Hardware Considerations
You can build your own VMware Virtual SAN cluster or choose from a list of Virtual SAN Ready
Nodes.
Build Your Own. Be sure to use hardware from the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=Virtual SAN) for
the following components:
o Solid state disks (SSDs)
o Magnetic hard drives (HDDs)
o I/O controllers, including Virtual SAN certified driver/firmware combinations
Use VMware Virtual SAN Ready Nodes. A Virtual SAN Ready Node is a validated server
configuration in a tested, certified hardware form factor for Virtual SAN deployment, jointly
recommended by the server OEM and VMware. See the VMware Virtual SAN Compatibility Guide
(https://www.vmware.com/resources/compatibility/pdf/vi_vsan_rn_guide.pdf). The Virtual SAN
Ready Node documentation provides examples of standardized configurations, including the
numbers of VMs supported and estimated number of 4K IOPS delivered.
As per design decision SDDC-PHY-009, the VMware Validated Design uses Virtual SAN Ready
Nodes.
Note All drives listed in the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=Virtual SAN)
meet the Class D requirements.
The reasoning behind using TBW is that VMware now offers the flexibility to use larger capacity drives
with lower DWPD specifications.
If a SSD vendor uses Drive Writes Per Day as a measurement, you can calculate endurance in
Terabytes Written (TBW) as follows:
TBW (over 5 years) = Drive Size x DWPD x 365 x 5
For example, if a vendor specifies DWPD = 10 for a 400 GB capacity SSD, you can compute TBW as
follows:
TBW = 0.4 TB x 10 DWPD x 365 days x 5 years
TBW = 7300 TBW
That means the SSD supports 7,300 TB of writes over 5 years. (Higher TBW figures denote a higher
endurance class.)
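The endurance conversion can be expressed as a small helper. The following Python sketch implements the TBW formula above; the drive sizes and DWPD ratings are example values only.

def tbw(drive_size_tb, dwpd, years=5):
    """Convert a Drive Writes Per Day (DWPD) rating into Terabytes Written (TBW)."""
    return drive_size_tb * dwpd * 365 * years

# 400 GB drive rated at 10 DWPD over a 5-year warranty period.
print(tbw(0.4, 10))   # 7300.0 TBW, which meets the Class D threshold
# The same DWPD rating on an 800 GB drive doubles the endurance figure.
print(tbw(0.8, 10))   # 14600.0 TBW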
For SSDs that are designated for caching and all-flash capacity layers, the following table outlines
which endurance class to use for hybrid and for all-flash VMware Virtual SAN.
Table 17. Hybrid and All-Flash Virtual SAN Endurance Classes
Note This VMware Validated Design does not use All-Flash Virtual SAN.
SDDC-PHY-STO-003
Design Decision: Use Class D (>=7300 TBW) SSDs for the caching tier of the management cluster.
Design Justification: If an SSD designated for the caching tier fails due to wear-out, the entire VMware Virtual SAN disk group becomes unavailable. The result is potential data loss or operational impact.
Design Implication: SSDs with higher endurance may be more expensive than lower endurance classes.
SSD Performance
There is a direct correlation between the SSD performance class and the level of Virtual SAN
performance. The highest-performing hardware results in the best performance of the solution. Cost is
therefore the determining factor. A lower class of hardware that is more cost effective might be
attractive even if the performance or size is not ideal. For optimal performance of Virtual SAN, select
class E SSDs. See the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=Virtual SAN) for detail
on the different classes.
SSD Performance Design Decision Background
Select a high class of SSD for optimal performance of VMware Virtual SAN. Before selecting a drive
size, consider disk groups and sizing as well as expected future growth. VMware defines classes of
performance in the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan) as follows:
Table 19. SSD Performance Classes
Class F: 100,000 or more writes per second
Select an SSD size that is, at a minimum, 10 percent of the anticipated size of the consumed HDD
storage capacity, before failures to tolerate are considered. For example, select an SSD of at least
100 GB for 1 TB of HDD storage consumed in a 2 TB disk group.
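The sizing rule translates directly into a one-line calculation. The following Python sketch applies the 10 percent guideline to an anticipated consumed capacity; the capacity figure is an example value.

def flash_cache_size_gb(consumed_hdd_capacity_gb, ratio=0.10):
    """Size the SSD cache tier at 10% of the anticipated consumed HDD capacity,
    before failures to tolerate are considered."""
    return consumed_hdd_capacity_gb * ratio

# 1 TB of consumed HDD capacity in a 2 TB disk group needs at least a 100 GB SSD.
print(flash_cache_size_gb(1000))   # 100.0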
Caching Algorithm
Both hybrid clusters and all-flash configurations adhere to the recommendation that the flash cache
layer be sized at 10% of consumed capacity. However, there are differences between the two configurations:
Hybrid Virtual SAN. 70% of the available cache is allocated for storing frequently read disk
blocks, minimizing accesses to the slower magnetic disks. 30% of available cache is allocated to
writes.
All-Flash Virtual SAN. All-flash clusters have two types of flash: very fast and durable write
cache, and cost-effective capacity flash. Here cache is 100% allocated for writes, as read
performance from capacity flash is more than sufficient.
Use Class E SSDs for the highest possible level of performance from the VMware Virtual SAN
volume.
Table 20. SSD Performance Class Selection
SDDC-PHY-STO-004
Design Decision: Use Class E SSDs (30,000-100,000 writes per second) for the management cluster.
Design Justification: The storage I/O performance requirements within the management cluster dictate the need for at least Class E SSDs.
Design Implication: Class E SSDs might be more expensive than lower class drives.
Capacity: 7,200 RPM
Performance: 10,000 RPM
Cache-friendly workloads are less sensitive to disk performance characteristics; however, workloads
can change over time. HDDs with 10,000 RPM are the accepted norm when selecting a capacity tier.
For the software-defined storage module, VMware recommends that you use an HDD configuration
that is suited to the characteristics of the environment. If there are no specific requirements, selecting
10,000 RPM drives achieves a balance between cost and availability.
Table 23. HDD Characteristic Selection
SDDC-PHY-STO-005
Design Decision: Use 10,000 RPM HDDs for the management cluster.
Design Justification: 10,000 RPM HDDs achieve a balance between performance and availability for the VMware Virtual SAN configuration. The performance of 10,000 RPM HDDs avoids disk drain issues. In Virtual SAN hybrid mode, Virtual SAN periodically flushes uncommitted writes to the capacity tier.
Design Implication: Slower and potentially cheaper HDDs are not available.
Requirements
Your environment must meet the following requirements to use NFS storage in the VMware
Validated Design.
Storage arrays are connected directly to the leaf switches.
All connections are made using 10 Gb Ethernet.
Jumbo Frames are enabled.
10K SAS (or faster) drives are used in the storage array.
Different disk speeds and disk types can be combined in an array to create different performance and
capacity tiers. The management cluster uses 10K SAS drives in the RAID configuration
recommended by the array vendor to achieve the required capacity and performance.
Table 26. NFS Hardware Design Decision
SDDC-PHY-STO-007
Design Decision: Use 10K SAS drives for management cluster NFS volumes.
Design Justification: 10K SAS drives achieve a balance between performance and capacity. Faster drives can be used if desired. vSphere Data Protection requires high-performance datastores in order to meet backup SLAs. vRealize Automation uses NFS datastores for its content catalog, which requires high-performance datastores. vRealize Log Insight uses NFS datastores for its archive storage which, depending on compliance regulations, can use a large amount of disk space.
Design Implication: 10K SAS drives are generally more expensive than other alternatives.
Volumes
A volume consists of multiple disks in a storage array. RAID is applied at the volume level. The more
disks in a volume, the better the performance and the greater the capacity.
Multiple datastores can be created on a single volume, but for applications that do not have a high I/O
footprint a single volume with multiple datastores is sufficient.
For high I/O applications, such as backup applications, use a dedicated volume to avoid
performance issues.
For other applications, set up Storage I/O Control (SIOC) to impose limits on high I/O applications
so that other applications get the I/O they are requesting.
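As one way to impose such a limit, the following pyVmomi sketch caps the IOPS of a high-I/O virtual machine's first virtual disk. The vCenter Server address, the VM name vdp01, and the 2000 IOPS value are assumptions for illustration; verify the calls against your pyVmomi version before use.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection and object names; adjust for your environment.
si = SmartConnect(host="vcenter.rainpole.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "vdp01")   # assumed backup appliance name

# Cap the first virtual disk of the high-I/O VM at 2000 IOPS (example value).
disk = next(d for d in vm.config.hardware.device if isinstance(d, vim.vm.device.VirtualDisk))
disk.storageIOAllocation = vim.StorageResourceManager.IOAllocationInfo(limit=2000)
change = vim.vm.device.VirtualDeviceSpec(
    operation=vim.vm.device.VirtualDeviceSpec.Operation.edit, device=disk)
task = vm.ReconfigVM_Task(vim.vm.ConfigSpec(deviceChange=[change]))
Disconnect(si)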
Table 27. Volume Assignment Design Decisions
SDDC-PHY-STO-008
Design Decision: Use a dedicated NFS volume to support backup requirements.
Design Justification: The backup and restore process is I/O intensive. Using a dedicated NFS volume ensures that the process does not impact the performance of other management components.
Design Implication: Dedicated volumes add management overhead to storage administrators. Dedicated volumes might use more disks, depending on the array and type of RAID.
Note A region in the VMware Validated Design is equivalent to the site construct in Site Recovery
Manager.
SDDC-VI-ESXi-001
Design Decision: Install and configure all ESXi hosts to boot using local USB or SD devices.
Design Justification: USB or SD cards are an inexpensive and easy-to-configure option for installing ESXi. Using local USB or SD allows allocation of all local HDDs to a VMware Virtual SAN storage system.
Design Implication: When you use USB or SD storage, ESXi logs are not retained. Configure remote syslog (such as vRealize Log Insight) to collect ESXi host logs.
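A remote syslog target can be configured centrally. The following pyVmomi sketch sets the Syslog.global.logHost advanced option on every ESXi host; the vCenter Server address and the vRealize Log Insight FQDN are assumptions for illustration.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection details and log host; adjust for your environment.
si = SmartConnect(host="vcenter.rainpole.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.HostSystem], True)
for host in view.view:
    # Point every ESXi host at the vRealize Log Insight syslog target.
    host.configManager.advancedOption.UpdateOptions(changedValue=[
        vim.option.OptionValue(key="Syslog.global.logHost",
                               value="udp://vrli-cluster-01.rainpole.local:514")])
Disconnect(si)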
SDDC-VI-ESXi-002
Design Decision: Add each host to the child Active Directory domain for the region in which it will reside, for example sfo01.rainpole.local or lax01.rainpole.local.
Design Justification: Using Active Directory membership allows greater flexibility in granting access to ESXi hosts. Ensuring that users log in with a unique user account allows greater visibility for auditing.
Design Implication: Adding hosts to the domain can add some administrative overhead.
SDDC-VI-VC-001
Design Decision: Deploy two vCenter Server systems in the first availability zone of each region: one vCenter Server supporting the SDDC management components, and one vCenter Server supporting the edge components and compute workloads.
Design Justification:
Isolates vCenter Server failures to management or compute workloads.
Isolates vCenter Server operations between management and compute.
Supports a scalable cluster design where the management components may be re-used as additional compute needs to be added to the SDDC.
Simplifies capacity planning for compute workloads by eliminating management workloads from consideration in the Compute vCenter Server.
Improves the ability to upgrade the vSphere environment and related components by providing for explicit separation of maintenance windows: management workloads remain available while workloads in compute are being addressed, and compute workloads remain available while workloads in management are being addressed.
Provides clear separation of roles and responsibilities to ensure that only those administrators with proper authorization can attend to the management workloads.
Facilitates quicker troubleshooting and problem resolution.
Simplifies disaster recovery operations by supporting a clear demarcation between recovery of the management components and compute workloads.
Enables the use of two NSX Managers, one for the management pod and the other for the shared edge and compute pod. Network separation of the pods in the SDDC allows for isolation of potential network issues.
Design Implication: Requires licenses for each vCenter Server instance.
You can install vCenter Server as a Windows-based system or deploy the Linux-based VMware
vCenter Server Appliance. The Linux-based vCenter Server Appliance is preconfigured, enables fast
deployment, and potentially results in reduced Microsoft licensing costs.
Table 32. vCenter Server Platform Design Decisions
SDDC-VI-VC-002
Design Decision: Deploy all vCenter Server instances as Linux-based vCenter Server Appliances.
Design Justification: Allows for rapid deployment, enables scalability, and reduces Microsoft licensing costs.
Design Implication: Operational staff might need Linux experience to troubleshoot the Linux-based appliances.

SDDC-VI-VC-004
Design Decision: Join all Platform Services Controller instances to a single vCenter Single Sign-On domain.
Design Justification: When all Platform Services Controller instances are joined into a single vCenter Single Sign-On domain, they can share authentication and license data across all components and regions.
Design Implication: Only one Single Sign-On domain will exist.
Figure 35. vCenter Server and Platform Services Controller Deployment Model
SDDC-VI-VC-006
Design Decision: Protect all vCenter Server appliances by using vSphere HA.
Design Justification: Supports availability objectives for vCenter Server appliances without required manual intervention during a failure event.
Design Implication: vCenter Server will be unavailable during a vSphere HA failover.
Attribute / Specification
Number of CPUs: 2
Memory: 16 GB

Attribute / Specification
Number of CPUs: 16
Memory: 32 GB
SDDC-VI-VC-009
Design Decision: Set up all vCenter Server instances to use the embedded PostgreSQL databases.
Design Justification: Reduces both overhead and Microsoft or Oracle licensing costs. Avoids problems with upgrades. Support for external databases is deprecated for vCenter Server Appliance in the next release.
Design Implication: The vCenter Server Appliance has limited database management tools for database administrators.
SDDC-VI-VC-011
Design Decision: Set the vSphere HA Host Isolation Response to Power Off.
Design Justification: Virtual SAN requires that the HA Isolation Response be set to Power Off so that VMs can be restarted on available hosts.
Design Implication: VMs are powered off in the case of a false positive, where a host is incorrectly declared isolated.
SDDC-VI-VC-015
Design Decision: Set vSphere HA for the management cluster to reserve 25% of cluster resources for failover. Recalculate the percentage of reserved resources when additional hosts are added to the cluster.
Design Justification: The percentage-based reservation works well in situations where virtual machines have varying and sometimes significant CPU or memory reservations.
Design Implication: If additional hosts are added to the cluster, more resources are being reserved for failover capacity.
Attribute / Specification
Capacity for host failures per cluster: 25% reserved CPU and RAM
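The percentage-based policy can also be applied programmatically. The following pyVmomi sketch reserves 25% of CPU and memory as vSphere HA failover capacity on a cluster; the vCenter Server address and the cluster name SFO01-Mgmt01 are assumptions for illustration.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection and cluster name; adjust for your environment.
si = SmartConnect(host="vcenter.rainpole.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "SFO01-Mgmt01")

# Reserve 25% of CPU and memory as failover capacity for vSphere HA.
policy = vim.cluster.FailoverResourcesAdmissionControlPolicy(
    cpuFailoverResourcesPercent=25, memoryFailoverResourcesPercent=25)
das = vim.cluster.DasConfigInfo(enabled=True, admissionControlEnabled=True,
                                admissionControlPolicy=policy)
task = cluster.ReconfigureComputeResource_Task(
    vim.cluster.ConfigSpecEx(dasConfig=das), modify=True)
Disconnect(si)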
SDDC-VI-VC-016
Design Decision: Create a shared edge and compute cluster for the NSX Controllers and NSX Edge gateway devices. Set anti-affinity rules to keep each Controller on a separate host. A 4-node cluster allows maintenance while ensuring that the 3 Controllers remain on separate hosts.
Design Justification: NSX Manager requires a 1:1 relationship with a vCenter Server system.
Design Implication: Each time you provision a Compute vCenter Server system, a new NSX Manager is required.
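The anti-affinity rule for the Controllers can be created through the vSphere API. The following pyVmomi sketch is a minimal example; the vCenter Server address, cluster name, and Controller VM name prefix are assumptions, and NSX may already create an equivalent rule when it deploys the Controllers.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection, cluster, and controller VM names; adjust as needed.
si = SmartConnect(host="comp01vc01.sfo01.rainpole.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

cl_view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in cl_view.view if c.name == "SFO01-Comp01")

vm_view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)
controllers = [v for v in vm_view.view if v.name.startswith("NSX_Controller_")]

# Keep the three NSX Controller VMs on separate hosts.
rule = vim.cluster.AntiAffinityRuleSpec(name="anti-affinity-nsx-controllers",
                                        enabled=True, mandatory=True, vm=controllers)
spec = vim.cluster.ConfigSpecEx(rulesSpec=[vim.cluster.RuleSpec(operation="add", info=rule)])
task = cluster.ReconfigureComputeResource_Task(spec, modify=True)
Disconnect(si)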
SDDC-VI-VC-017
Design Decision: Set vSphere HA for the shared edge and compute cluster to reserve 25% of cluster resources for failover.
Design Justification: vSphere HA protects the NSX Controller instances and edge services gateway devices in the event of a host failure. vSphere HA powers on virtual machines from the failed hosts on any remaining hosts.
Design Implication: If one of the hosts becomes unavailable, two Controllers run on a single host.
SDDC-VI-VC-018
Design Decision: Create the shared edge and compute cluster with a minimum of 4 hosts.
Design Justification: 3 NSX Controllers are required for sufficient redundancy and majority decisions. One host is available for failover and to allow for scheduled maintenance.
Design Implication: 4 hosts is the smallest starting point for the shared edge and compute cluster for redundancy and performance, which increases cost over a 3-node cluster.
SDDC-VI-VC-019
Design Decision: Set up VLAN-backed port groups for external access and management on the shared edge and compute cluster hosts.
Design Justification: Edge gateways need access to the external network in addition to the management network.
Design Implication: VLAN-backed port groups must be configured with the correct number of ports, or with elastic port allocation.
SDDC-VI-VC-020
Design Decision: Create a resource pool for the required SDDC NSX Controllers and edge appliances with a CPU share level of High, a memory share level of Normal, and a 15 GB memory reservation.
Design Justification: The NSX components control all network traffic in and out of the SDDC as well as update route information for inter-SDDC communication. In a contention situation it is imperative that these virtual machines receive all the resources required.
Design Implication: During contention SDDC NSX components receive more resources than all other workloads; as such, monitoring and capacity management must be a proactive activity.
SDDC-VI-VC-021
Design Decision: Create a resource pool for all user NSX Edge devices with a CPU share value of Normal and a memory share value of Normal.
Design Justification: NSX edges for users, created by vRealize Automation, support functions such as load balancing for user workloads. These edge devices do not support the entire SDDC; as such, they receive fewer resources during contention.
Design Implication: During contention these NSX edges will receive fewer resources than the SDDC edge devices; as such, monitoring and capacity management must be a proactive activity.
SDDC-VI-VC-022
Design Decision: Create a resource pool for all user virtual machines with a CPU share value of Normal and a memory share value of Normal.
Design Justification: In a shared edge and compute cluster the SDDC edge devices must be guaranteed resources above all other workloads so as to not impact network connectivity. Setting the share values to Normal gives the SDDC edges more shares of resources during contention, ensuring network traffic is not impacted.
Design Implication: During contention, user workload virtual machines could be starved for resources and experience poor performance. It is critical that monitoring and capacity management be a proactive activity and that capacity is added, or a dedicated edge cluster is created, before contention occurs.
Attribute / Specification
Minimum number of hosts required to support the shared edge and compute cluster: 4
SDDC-VI-VC-023
Design Decision: The hosts in each compute cluster are contained within a single rack.
Design Justification: The spine-and-leaf architecture dictates that all hosts in a cluster must be connected to the same top-of-rack switches.
Design Implication: Fault domains are limited to each rack.
SDDC-VI-VC-026
Design Decision: Enable DRS on all clusters and set it to automatic, with the default setting (medium).
Design Justification: The default settings provide the best trade-off between load balancing and excessive migration with vMotion events.
Design Implication: In the event of a vCenter outage, mapping from virtual machines to ESXi hosts might be more difficult to determine.
SDDC-VI-VC-027
Design Decision: Enable Enhanced vMotion Compatibility on all clusters. Set EVC mode to the lowest available setting supported for the hosts in the cluster.
Design Justification: Allows cluster upgrades without virtual machine downtime.
Design Implication: You can enable EVC only if clusters contain hosts with CPUs from the same vendor.
High latency on any network can negatively affect performance. Some components are more
sensitive to high latency than others. For example, reducing latency is important on the IP storage
and the vSphere Fault Tolerance logging network because latency on these networks can negatively
affect the performance of multiple virtual machines.
Depending on the application or service, high latency on specific virtual machine networks can also
negatively affect performance. Use information gathered from the current state analysis and from
interviews with key stakeholders and SMEs to determine which workloads and networks are especially
sensitive to high latency.
Note For VLAN and MTU checks, at least two physical NICs for the distributed switch are required.
For a teaming policy check, at least two physical NICs and two hosts are required when
applying the policy.
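The distributed switch health check that performs these VLAN, MTU, and teaming checks can be enabled through the API. The following pyVmomi sketch is a minimal example under those assumptions; the connection details and switch name are hypothetical, and you should verify the health check configuration classes against your pyVmomi version before use.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Hypothetical connection and switch name; adjust for your environment.
si = SmartConnect(host="vcenter.rainpole.local", user="administrator@vsphere.local",
                  pwd="***", sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.DistributedVirtualSwitch], True)
dvs = next(d for d in view.view if d.name == "vDS-Mgmt")

# Enable the VLAN/MTU and teaming health checks, sampling every 5 minutes.
task = dvs.UpdateDVSHealthCheckConfig_Task(healthCheckConfig=[
    vim.dvs.VmwareDistributedVirtualSwitch.VlanMtuHealthCheckConfig(enable=True, interval=5),
    vim.dvs.VmwareDistributedVirtualSwitch.TeamingHealthCheckConfig(enable=True, interval=5)])
Disconnect(si)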
Parameter / Setting
Failback: No
This section expands on the logical network design by providing details on the physical NIC layout
and physical network attributes.
Table 54. Management Virtual Switches by Physical/Virtual NIC
vDS-Mgmt - physical NIC 0 - Uplink
vDS-Mgmt - physical NIC 1 - Uplink

vDS-Mgmt - Management (management traffic) - vDS-Mgmt-Management (Default) - MTU 1500
vDS-Mgmt - Replication (vSphere Replication traffic, vSphere Replication NFC traffic) - vDS-Mgmt-VR - MTU 9000
vDS-Mgmt - VTEP (NSX VTEP) - Auto Generated - MTU 9000
For more information on the physical network design specifications, see the Physical Network Design
section.
Table 57. Virtual Switch for the shared Edge and Compute Cluster
Parameter / Setting
Failback: No
Figure 38. Network Switch Design for shared Edge and Compute Hosts
This section expands on the logical network design by providing details on the physical NIC layout
and physical network attributes.
Table 59. Shared Edge and Compute Cluster Virtual Switches by Physical/Virtual NIC
vDS-Comp01 - physical NIC 0 - Uplink
vDS-Comp01 - physical NIC 1 - Uplink
Table 60. Edge Cluster Virtual Switch Port Groups and VLANs
For more information on the physical network design, see the Physical Network Design section.
Compute Cluster Distributed Switches
A compute cluster vSphere Distributed Switch uses the following configuration settings.
Table 62. Virtual Switches for Compute Cluster Hosts
Parameter / Setting
Failback: No
This section expands on the logical network design by providing details on the physical NIC layout
and physical network attributes.
Table 64. Compute Cluster Virtual Switches by Physical/Virtual NIC
vDS-Comp02 - physical NIC 0 - Uplink
vDS-Comp02 - physical NIC 1 - Uplink
Table 65. Compute Cluster Virtual Switch Port Groups and VLANs
vDS-Comp02 - vDS-Comp02-vMotion - Route based on physical NIC load - Uplinks 0, 1 - VLAN 1622
vDS-Comp02 - vDS-Comp02-NFS - Route based on physical NIC load - Uplinks 0, 1 - VLAN 1625
For more information on the physical network design specifications, see the Physical Network Design
section.
An active-active configuration in which two or more physical NICs in the server are assigned the
active role.
This validated design uses an active-active configuration.
Table 67. NIC Teaming and Policy
SDDC-VI-Net-003
Design Decision: Use the Route based on physical NIC load teaming algorithm for all port groups except the ones that carry VXLAN traffic. VTEP kernel ports and VXLAN traffic will use Route based on SRC-ID.
Design Justification: Reduces complexity of the network design and increases resiliency and performance.
Design Implication: Because NSX does not support Route based on physical NIC load, two different algorithms are necessary.

SDDC-VI-NET-005
Design Decision: Enable Network I/O Control on all distributed switches.
Design Justification: Increases resiliency and performance of the network.
Design Implication: If configured incorrectly, Network I/O Control could impact network performance for critical traffic types.
3.2.4.7 VXLAN
VXLAN provides the capability to create isolated, multi-tenant broadcast domains across data center
fabrics and enables customers to create elastic, logical networks that span physical network
boundaries.
The first step in creating these logical networks is to abstract and pool the networking resources. Just
as vSphere abstracts compute capacity from the server hardware to create virtual pools of resources
that can be consumed as a service, vSphere Distributed Switch and VXLAN abstract the network into
a generalized pool of network capacity and separate the consumption of these services from the
underlying physical infrastructure. A network capacity pool can span physical boundaries, optimizing
compute resource utilization across clusters, pods, and geographically-separated data centers. The
unified pool of network capacity can then be optimally segmented into logical networks that are
directly attached to specific applications.
VXLAN works by creating Layer 2 logical networks that are encapsulated in standard Layer 3 IP
packets. A Segment ID in every frame differentiates the VXLAN logical networks from each other
without any need for VLAN tags. As a result, large numbers of isolated Layer 2 VXLAN networks can
coexist on a common Layer 3 infrastructure.
In the vSphere architecture, the encapsulation is performed between the virtual NIC of the guest VM
and the logical port on the virtual switch, making VXLAN transparent to both the guest virtual
machines and the underlying Layer 3 network. Gateway services between VXLAN and non-VXLAN
hosts (for example, a physical server or the Internet router) are performed by the NSX for vSphere
Edge gateway appliance. The Edge gateway translates VXLAN segment IDs to VLAN IDs, so that
non-VXLAN hosts can communicate with virtual machines on a VXLAN network.
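The encapsulation itself is easy to visualize. The following Python sketch uses scapy to wrap an example guest frame in VXLAN over UDP port 4789; the MAC and IP addresses and the VNI are illustrative only, and the sketch is unrelated to how NSX programs the VTEPs.

from scapy.all import Ether, IP, UDP, VXLAN

# Inner frame: the guest VM's original Layer 2 frame (addresses are examples).
inner = (Ether(src="00:50:56:aa:00:01", dst="00:50:56:aa:00:02") /
         IP(src="10.1.1.10", dst="10.1.1.20"))

# Outer headers added by the source VTEP: the 24-bit VNI identifies the logical
# segment, and the encapsulated frame travels as ordinary UDP/IP on the fabric.
outer = (Ether() /
         IP(src="172.16.31.11", dst="172.16.31.12") /
         UDP(sport=54321, dport=4789) /
         VXLAN(vni=5001) /
         inner)

# The added overhead is why the transport network needs an MTU of at least 1600 bytes.
print(len(bytes(outer)) - len(bytes(inner)))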
The dedicated edge cluster hosts all NSX Edge instances and all Universal Distributed Logical Router
instances that are connected to the Internet or to corporate VLANs, so that the network administrator
can manage the environment in a more secure and centralized way.
Table 70. VXLAN Design Decisions
SDDC-VI-Net-015
Design Decision: Use NSX for vSphere to introduce VXLANs for the use of virtual application networks and tenant networks.
Design Justification: Simplifies the network configuration for each tenant via centralized virtual network management.
Design Implication: Requires NSX for vSphere licenses.

SDDC-VI-Net-016
Design Decision: Use VXLAN along with NSX Edge gateways and the Universal Distributed Logical Router (UDLR) to provide customer/tenant network capabilities.
Design Justification: Creates isolated, multi-tenant broadcast domains across data center fabrics to create elastic, logical networks that span physical network boundaries.
Design Implication: Transport networks and an MTU greater than 1600 bytes must be configured within the reachability radius.

SDDC-VI-Net-017
Design Decision: Use VXLAN along with NSX Edge gateways and the Universal Distributed Logical Router (UDLR) to provide management application network capabilities.
Design Justification: Leverages the benefits of network virtualization in the management pod.
Design Implication: Requires installation and configuration of the NSX for vSphere instance in the management pod.
SDDC-VI-SDN-001
Design Decision: Use two separate NSX instances per region. One instance is tied to the Management vCenter Server, and the other instance is tied to the Compute vCenter Server.
Design Justification: SDN capabilities offered by NSX, such as load balancing and firewalls, are crucial for the compute/edge layer to support the cloud management platform operations, and also for the management applications in the management stack that need these capabilities.
Design Implication: You must install and perform initial configuration of the four NSX instances separately.

SDDC-VI-SDN-002
Design Decision: Pair NSX Manager instances in a primary-secondary relationship across regions for both management and compute workloads.
Design Justification: NSX can extend the logical boundaries of the networking and security services across regions. As a result, workloads can be live-migrated and failed over between regions without reconfiguring the network and security constructs.
Design Implication: You must consider that you can pair up to eight NSX Manager instances.
A client can write (create or modify) an object with an HTTP PUT or POST request that includes a
new or changed XML document for the object.
A client can delete an object with an HTTP DELETE request.
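As a minimal illustration of this consumption model, the following Python sketch reads and deletes NSX objects over the REST API with the requests library. The NSX Manager FQDN, credentials, endpoints, and object ID are assumptions for illustration; consult the NSX for vSphere API guide for the authoritative paths and payloads.

import requests

# Hypothetical NSX Manager address and credentials; the NSX for vSphere REST API
# uses basic authentication and XML payloads.
NSX_MANAGER = "https://mgmt01nsxm01.sfo01.rainpole.local"
AUTH = ("admin", "***")

# Read objects: list the configured transport zones (VDN scopes).
resp = requests.get(f"{NSX_MANAGER}/api/2.0/vdn/scopes", auth=AUTH, verify=False)
print(resp.status_code, resp.text[:200])

# Delete an object: remove a logical switch by its object ID (placeholder ID).
requests.delete(f"{NSX_MANAGER}/api/2.0/vdn/virtualwires/virtualwire-1",
                auth=AUTH, verify=False)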
vSphere Web Client
The NSX Manager component provides a networking and security plug-in in the vSphere Web Client.
This plug-in provides an interface to consuming virtualized networking from the NSX Manager for
users that have sufficient privileges.
Table 72. Consumption Method Design Decisions
SDDC-VI-SDN-003
Design Decision: For the shared edge and compute cluster NSX instance, consumption is accomplished by using the vRealize Automation services, the vSphere Web Client, and the NSX REST API.
Design Justification: vRealize Automation services are used for the customer-facing portal. The vSphere Web Client consumes NSX for vSphere resources through the Network and Security plug-in. The NSX REST API offers the potential of scripting repeating actions and operations.
Design Implication: Customers typically interact only indirectly with NSX from the vRealize Automation portal. Administrators interact with NSX from the vSphere Web Client and API.

SDDC-VI-SDN-004
Design Decision: For the management cluster NSX instance, consumption is only by provider staff via the vSphere Web Client and the API.
Design Justification: Ensures that infrastructure components are not modified by tenants and/or non-provider staff.
Design Implication: Tenants do not have access to the management stack workloads.
NSX Manager
NSX Manager provides the centralized management plane for NSX for vSphere and has a one-to-one
mapping to vCenter Server instances.
NSX Manager performs the following functions.
Provides the single point of configuration and the REST API entry-points for NSX in a vSphere
environment.
Deploys NSX Controller clusters, Edge distributed routers, and Edge service gateways in the form
of OVF appliances, guest introspection services, and so on.
Prepares ESXi hosts for NSX by installing VXLAN, distributed routing and firewall kernel modules,
and the User World Agent (UWA).
Communicates with NSX Controller clusters over REST and with hosts over the RabbitMQ
message bus. This internal message bus is specific to NSX for vSphere and does not require
setup of additional services.
Generates certificates for the NSX Controller instances and ESXi hosts to secure control plane
communications with mutual authentication.
NSX Controller
An NSX Controller performs the following functions.
Provides the control plane to distribute VXLAN and logical routing information to ESXi hosts.
Includes nodes that are clustered for scale-out and high availability.
Slices network information across cluster nodes for redundancy.
Requirement: For the hybrid replication mode, Internet Group Management Protocol (IGMP) snooping must be enabled on the Layer 2 switches to which ESXi hosts that participate in VXLAN are attached. An IGMP querier must be enabled on the connected router or Layer 3 switch.
Comment: IGMP snooping on Layer 2 switches is a requirement of the hybrid replication mode. Hybrid replication mode is the recommended replication mode for broadcast, unknown unicast, and multicast (BUM) traffic when deploying into an environment with large scale-out potential. The traditional requirement for Protocol Independent Multicast (PIM) is removed.

Requirement: Dynamic routing support on the upstream Layer 3 data center switches must be enabled.
Comment: Enable a dynamic routing protocol supported by NSX on the upstream data center switches to establish dynamic routing adjacency with the ESGs.
Note NSX ESG sizing can vary with tenant requirements, so all options are listed.
NSX Manager: 4 vCPU, 16 GB memory, 60 GB disk, quantity 1.
NSX Controller: 4 vCPU, 4 GB memory, 20 GB disk, quantity 3.
NSX ESG (optional component; deployment of the NSX ESG varies per use case):
Compact: 1 vCPU, 512 MB memory, 512 MB disk.
Large: 2 vCPU, 1 GB memory, 512 MB disk.
Quad Large: 4 vCPU, 1 GB memory, 512 MB disk.
X-Large: 6 vCPU, 8 GB memory, 4.5 GB disk (+4 GB with swap).
Note Edge service gateway throughput is influenced by the WAN circuit, so an adaptable
approach, that is, converting as necessary, is recommended.
SDDC-VI-SDN-006
Design Decision: Use large size NSX Edge service gateways.
Design Justification: The large size provides all the performance characteristics needed even in the event of a failure. A larger size would also provide the performance required, but at the expense of extra resources that wouldn't be used.
Design Implication: None.
In this document, tenant refers to a tenant of the cloud management platform within the compute/edge
stack or to a management application within the management stack.
The conceptual design has the following key components.
External Networks. Connectivity to and from external networks is through the perimeter firewall.
The main external network is the Internet.
Perimeter Firewall. The physical firewall exists at the perimeter of the data center. Each tenant
receives either a full instance or partition of an instance to filter external traffic.
Provider Logical Router (PLR). The PLR exists behind the perimeter firewall and handles
north/south traffic that is entering and leaving tenant workloads.
NSX for vSphere Distributed Logical Router (DLR). This logical router is optimized for
forwarding in the virtualized space, that is, between VMs, on VXLAN port groups or VLAN-backed
port groups.
Internal Non-Tenant Network. A single management network, which sits behind the perimeter
firewall but not behind the PLR. Enables customers to manage the tenant environments.
Internal Tenant Networks. Connectivity for the main tenant workload. These networks are
connected to a DLR, which sits behind the PLR. These networks take the form of VXLAN-based
NSX for vSphere logical switches. Tenant virtual machine workloads will be directly attached to
these networks.
SDDC-VI-SDN-007
Design Decision: For the compute stack, do not use a dedicated edge cluster.
Design Justification: Simplifies configuration and minimizes the number of hosts required for initial deployment.
Design Implication: The NSX Controller instances, NSX Edge services gateways, and DLR control VMs of the compute stack are deployed in the shared edge and compute cluster. The shared nature of the cluster will require the cluster to be scaled out as compute workloads are added so as to not impact network performance.

SDDC-VI-SDN-008
Design Decision: For the management stack, do not use a dedicated edge cluster.
Design Justification: The number of supported management applications does not justify the cost of a dedicated edge cluster in the management stack.
Design Implication: The NSX Controller instances, NSX Edge service gateways, and DLR control VMs of the management stack are deployed in the management cluster.
The logical design of NSX considers the vCenter Server clusters and defines the place where each
NSX component runs.
Figure 42. Cluster Design for NSX for vSphere
NSX Edge components that are deployed for north/south traffic are configured in equal-cost multi-
path (ECMP) mode that supports route failover in seconds. NSX Edge components deployed for load
balancing utilize NSX HA. NSX HA provides faster recovery than vSphere HA alone because NSX HA
uses an active/passive pair of NSX Edge devices. By default, the passive Edge device becomes
active within 15 seconds. All NSX Edge devices are also protected by vSphere HA.
Scalability of NSX Components
A one-to-one mapping between NSX Manager instances and vCenter Server instances exists. If the
inventory of either the management stack or the compute stack exceeds the limits supported by a
single vCenter Server, then you can deploy a new vCenter Server instance, and must also deploy a
new NSX Manager instance. You can extend transport zones by adding more compute and edge
clusters until you reach the vCenter Server limits. Consider the limit of 100 DLRs per ESXi host
although the environment usually would exceed other vCenter Server limits before the DLR limit.
vSphere Distributed Switch Uplink Configuration
Each ESXi host utilizes two physical 10 Gb Ethernet adapters, associated with the uplinks on the
vSphere Distributed Switches to which it is connected. Each uplink is connected to a different top-of-
rack switch to mitigate the impact of a single top-of-rack switch failure and to provide two paths in and
out of the SDDC.
SDDC-VI-SDN-010
Design Decision: Set up VXLAN Tunnel Endpoints (VTEPs) to use Route based on SRC-ID for teaming and failover configuration.
Design Justification: Allows for the utilization of the two uplinks of the vDS, resulting in better bandwidth utilization and faster recovery from network path failures.
Design Implication: Link aggregation such as LACP between the top-of-rack (ToR) switches and ESXi hosts must not be configured in order to allow dynamic routing to peer between the ESGs and the upstream switches.
SDDC-VI-SDN-012
Design Decision: For the compute stack, use a single universal transport zone that encompasses all shared edge and compute, and compute clusters from all regions.
Design Justification: A single universal transport zone supports extending networks and security policies across regions. This allows seamless migration of applications across regions, either by cross-vCenter vMotion or by failover recovery with Site Recovery Manager.
Design Implication: You must consider that you can pair up to eight NSX Manager instances. If the solution grows past eight NSX Manager instances, you must deploy a new primary manager and a new transport zone.

SDDC-VI-SDN-013
Design Decision: For the management stack, use a single universal transport zone that encompasses all management clusters.
Design Justification: A single universal transport zone supports extending networks and security policies across regions. This allows seamless migration of the management applications across regions, either by cross-vCenter vMotion or by failover recovery with Site Recovery Manager.
Design Implication: You must consider that you can pair up to eight NSX Manager instances. If the solution grows past eight NSX Manager instances, you must deploy a new primary manager and a new transport zone.
SDDC-VI-SDN-014
Design Decision: Deploy NSX Edge Services Gateways in an ECMP configuration for north/south routing in both management and shared edge and compute clusters.
Design Justification: The NSX ESG is the recommended device for managing north/south traffic. Using ECMP provides multiple paths in and out of the SDDC. This results in faster failover times than deploying Edge service gateways in HA mode.
Design Implication: ECMP requires 2 VLANs for uplinks, which adds an additional VLAN over traditional HA ESG configurations.
SDDC-VI-SDN-015
Design Decision: Deploy a single NSX UDLR for the management cluster to provide east/west routing across all regions.
Design Justification: Using the UDLR reduces the hop count between nodes attached to it to 1. This reduces latency and improves performance.
Design Implication: DLRs are limited to 1,000 logical interfaces. When that limit is reached, a new UDLR must be deployed.

SDDC-VI-SDN-016
Design Decision: Deploy a single NSX UDLR for the shared edge and compute, and compute clusters to provide east/west routing across all regions.
Design Justification: Using the UDLR reduces the hop count between nodes attached to it to 1. This reduces latency and improves performance.
Design Implication: DLRs are limited to 1,000 logical interfaces. When that limit is reached, a new UDLR must be deployed.
SDDC-VI-SDN-017
Design Decision: Deploy all NSX UDLRs without the local egress option enabled.
Design Justification: When local egress is enabled, control of ingress traffic is also necessary (for example using NAT). This becomes hard to manage for little to no benefit.
Design Implication: All north/south traffic is routed through Region A until those routes are no longer available. At that time, all traffic dynamically changes to Region B.

SDDC-VI-SDN-018
Design Decision: Use BGP as the dynamic routing protocol inside the SDDC.
Design Justification: Using BGP as opposed to OSPF eases the implementation of dynamic routing. There is no need to plan and design access to OSPF area 0 inside the SDDC. OSPF area 0 varies based on customer configuration.
Design Implication: BGP requires configuring each ESG and UDLR with the remote router that it exchanges routes with.
SDDC-VI-SDN-019
Design Decision: Configure the BGP Keep Alive Timer to 1 and the Hold Down Timer to 3 between the UDLR and all ESGs that provide north/south routing.
Design Justification: With the Keep Alive and Hold Timers between the UDLR and the ECMP ESGs set low, a failure is detected quicker and the routing table is updated faster.
Design Implication: If an ESXi host becomes resource constrained, the ESG running on that host might no longer be used even though it is still up.

SDDC-VI-SDN-020
Design Decision: Configure the BGP Keep Alive Timer to 4 and the Hold Down Timer to 12 between the ToR switches and all ESGs providing north/south routing.
Design Justification: These timers balance quick failure detection between the ToRs and the ESGs against overburdening the ToRs with keepalive traffic.
Design Implication: By using longer timers to detect when a router is dead, a dead router stays in the routing table longer, and traffic continues to be sent to it.
SDDC-VI-SDN-021
Design Decision: Create a universal virtual switch for use as the transit network between the UDLR and ESGs. The UDLR provides north/south routing in both compute and management stacks.
Design Justification: The universal virtual switch allows the UDLR and all ESGs across regions to exchange routing information.
Design Implication: Only the primary NSX Manager can create and manage universal objects, including this UDLR.

SDDC-VI-SDN-022
Design Decision: Create two VLANs in each region. Use those VLANs to enable ECMP between the north/south ESGs and the ToR switches. Each ToR has an SVI on each VLAN, and each north/south ESG also has an interface on each VLAN.
Design Justification: This enables the ESGs to have multiple equal-cost routes and provides more resiliency and better bandwidth utilization in the network.
Design Implication: Extra VLANs are required.
SDDC-VI-SDN-023
Design Decision: For all ESGs deployed as load balancers, set the default firewall rule to allow all traffic.
Design Justification: Restricting and granting access is handled by the distributed firewall. The default firewall rule does not have to do it.
Design Implication: Explicit rules to allow access to management applications must be defined in the distributed firewall.

SDDC-VI-SDN-024
Design Decision: For all ESGs deployed as ECMP north/south routers, disable the firewall.
Design Justification: Use of ECMP on the ESGs is a requirement. Leaving the firewall enabled, even in allow-all-traffic mode, results in sporadic network connectivity.
Design Implication: Services such as NAT and load balancing cannot be used when the firewall is disabled.
Protocols - Layer 4 engine: TCP. Layer 7 engine: TCP, HTTP, HTTPS (SSL Pass-through), HTTPS (SSL Offload).
Load balancing method - Layer 4 engine: Round Robin, Source IP Hash, Least Connection. Layer 7 engine: Round Robin, Source IP Hash, Least Connection, URI.
Health checks - Layer 4 engine: TCP. Layer 7 engine: TCP, HTTP (GET, OPTION, POST), HTTPS (GET, OPTION, POST).
Monitoring - Layer 4 engine: View VIP (Virtual IP), Pool and Server objects and stats via CLI and API; view global stats for VIP sessions from the vSphere Web Client. Layer 7 engine: View VIP, Pool and Server objects and statistics by using CLI and API; view global statistics about VIP sessions from the vSphere Web Client.
SDDC-VI-SDN-025
Design Decision: Use the NSX load balancer.
Design Justification: The NSX load balancer can support the needs of the management applications. Using another load balancer would increase cost and add another component to be managed as part of the SDDC.
Design Implication: None.

SDDC-VI-SDN-026
Design Decision: Use a single NSX load balancer in HA mode for all management applications.
Design Justification: All management applications that require a load balancer are on a single virtual wire; having a single load balancer keeps the design simple.
Design Implication: One management application owner could make changes to the load balancer that impact another application.
SDDC-VI-SDN-027
Design Decision: Place all virtual machines, both management and tenant, on VXLAN-backed networks unless you must satisfy an explicit requirement to use VLAN-backed port groups for these virtual machines. If VLAN-backed port groups are required, connect physical workloads that need to communicate to virtualized workloads to routed VLAN LIFs on a DLR.
Design Justification: Bridging and routing are not possible on the same logical switch. As a result, it makes sense to attach a VLAN LIF to a distributed router or ESG and route between the physical and virtual machines. Use bridging only where virtual machines need access only to the physical machines on the same Layer 2.
Design Implication: Access to physical workloads is routed via the DLR or ESG.
SDDC-VI-SDN-029
Design Decision: Make sure that the latency in the connection between the regions is below 150 ms.
Design Justification: A latency below 150 ms is required for the following features: Cross-vCenter vMotion, and the NSX design for the SDDC.
Design Implication: None.
SDDC-VI-SDN-030
Design Decision: Place the following management applications on an application virtual network:
vRealize Automation
vRealize Automation Proxy Agents
vRealize Business
vRealize Business collectors
vRealize Orchestrator
vRealize Operations Manager
vRealize Operations Manager remote collectors
vRealize Log Insight
Design Justification: Access to the management applications is only through published access points.
Design Implication: The virtual application network is fronted by an NSX Edge device for load balancing and the distributed firewall to isolate applications from each other and from external users. Direct access to virtual application networks is controlled by distributed firewall rules.
SDDC-VI-SDN-031
Design Decision: Create three application virtual networks. Each region has a dedicated application virtual network for management applications in that region that do not require failover. One application virtual network is reserved for management application failover between regions.
Design Justification: Using only three application virtual networks simplifies the design by sharing Layer 2 networks with applications based on their needs.
Design Implication: A single /24 subnet is used for each application virtual network. IP management becomes critical to ensure no shortage of IP addresses will appear in the future.
Having software-defined networking based on NSX in the management stack makes all NSX features
available to the management applications.
This approach to network virtualization service design improves security and mobility of the
management applications, and reduces the integration effort with existing customer networks.
Note The following IP ranges are samples. Your actual implementation depends on your
environment.
Your decision to implement one technology or another can be based on performance and
functionality, and on considerations like the following:
The organization’s current in-house expertise and installation base
The cost, including both capital and long-term operational expenses
The organization’s current relationship with a storage vendor
VMware Virtual SAN
VMware Virtual SAN is a software-based distributed storage platform that combines the compute and
storage resources of ESXi hosts. It provides a simple storage management experience for the user.
This solution makes software-defined storage a reality for VMware customers. However, you must
carefully consider supported hardware options when sizing and designing a Virtual SAN cluster.
Fibre Channel / Fibre Channel over Ethernet: Yes, VMFS, Yes, Yes, Yes, Yes
Virtual SAN: Yes, Virtual SAN, No, No, Yes, Yes
Move data between storage tiers during the application life cycle as needed.
SDDC-VI-Storage-003
Design Decision: Select an array that supports VAAI over NAS (NFS).
Design Justification: VAAI offloads tasks to the array itself, enabling the ESXi hypervisor to use its resources for application workloads and not become a bottleneck in the storage subsystem. VAAI is required to support the desired number of virtual machine lifecycle operations.
Design Implication: Not all VAAI arrays support VAAI over NFS. A plug-in from the array vendor is required to enable this functionality.
Note VMware Virtual SAN uses storage policies to allow specification of the characteristics of
virtual machines, so you can define the policy on an individual disk level rather than at the
volume level for Virtual SAN.
You can identify the storage subsystem capabilities by using the VMware vSphere API for Storage
Awareness or by using a user-defined storage policy.
VMware vSphere API for Storage Awareness (VASA). With vSphere API for Storage
Awareness, storage vendors can publish the capabilities of their storage to VMware vCenter
Server, which can display these capabilities in its user interface.
User-defined storage policy. Defined by using the VMware Storage Policy SDK or VMware
vSphere PowerCLI (see the Sample Scripts), or from the vSphere Web Client.
You can assign a storage policy to a virtual machine and periodically check for compliance so that the
virtual machine continues to run on storage with the correct performance and availability
characteristics.
You can associate a virtual machine with a virtual machine storage policy when you create, clone, or
migrate that virtual machine. If a virtual machine is associated with a storage policy, the vSphere Web
Client shows the datastores that are compatible with the policy. You can select a datastore or
datastore cluster. If you select a datastore that does not match the virtual machine storage policy, the
vSphere Web Client shows that the virtual machine is using non-compliant storage. See Creating and
Managing vSphere Storage Policies.
Table 95. Virtual Machine Storage Policy Design Decisions
SDDC-VI-Storage-004
Design Decision: Do not use customized virtual machine storage policies.
Design Justification: The default Virtual SAN storage policy is adequate for the management cluster VMs.
Design Implication: If 3rd-party or additional VMs have different storage requirements, additional VM storage policies might be required.
SDDC-VI-Storage-005
Design Decision: Enable Storage I/O Control with the default values on the NFS datastores.
Design Justification: Storage I/O Control ensures that all virtual machines on a datastore receive an equal amount of I/O.
Design Implication: Virtual machines that use more I/O are throttled to allow other virtual machines access to the datastore only when contention occurs on the datastore.

SDDC-VI-Storage-006
Design Decision: In the shared edge and compute cluster, enable Storage I/O Control with default values.
Design Justification: Storage I/O Control ensures that all virtual machines on a datastore receive an equal amount of I/O. For the NSX components in this shared cluster it is critical that they have equal access to the datastore to avoid network bottlenecks.
Design Implication: Virtual machines that use more I/O are throttled to allow other virtual machines access to the datastore only when contention occurs on the datastore.
Space utilization load balancing: You can set a threshold for space use. When space use on a datastore exceeds the threshold, vSphere Storage DRS generates recommendations or performs migrations with vSphere Storage vMotion to balance space use across the datastore cluster.
I/O latency load balancing: You can configure the I/O latency threshold to avoid bottlenecks. When I/O latency on a datastore exceeds the threshold, vSphere Storage DRS generates recommendations or performs vSphere Storage vMotion migrations to help alleviate high I/O load.
Anti-affinity rules: You can configure anti-affinity rules for virtual machine disks to ensure that the virtual disks of a virtual machine are kept on different datastores. By default, all virtual disks for a virtual machine are placed on the same datastore.
You can enable vSphere Storage I/O Control or vSphere Storage DRS for a datastore cluster. You
can enable the two features separately, even though vSphere Storage I/O control is enabled by
default when you enable vSphere Storage DRS.
Note Virtual SAN all-flash configurations are supported only with 10 GbE.
SDDC-SDS-001
Design Decision: Use only 10 GbE for VMware Virtual SAN traffic.
Design Justification: Performance with 10 GbE is optimal. Without it, a significant decrease in array performance results.
Design Implication: The physical network must support 10 Gb networking between every host in the Virtual SAN clusters.
SDDC-SDS-002
Design Decision: Use the existing vSphere Distributed Switch instances in the management and edge clusters.
Design Justification: Provides high availability for Virtual SAN traffic in case of contention by using existing networking components.
Design Implication: All traffic paths are shared over common uplinks.
Jumbo Frames
VMware Virtual SAN supports jumbo frames for Virtual SAN traffic.
A Virtual SAN design should use jumbo frames only if the physical environment is already configured
to support them, they are part of the existing design, or if the underlying configuration does not create
a significant amount of added complexity to the design.
Table 102. Jumbo Frames Design Decision
SDDC-SDS-003
Design Decision: Use jumbo frames.
Design Justification: Jumbo frames are already used to improve performance of vSphere vMotion and NFS storage traffic.
Design Implication: Every device in the network must support jumbo frames.
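One way to confirm that every device in the path supports jumbo frames is to send a non-fragmentable ping with a payload sized for a 9000-byte frame. The following Python sketch wraps the Linux ping syntax; the target address is an example, and on ESXi hosts the equivalent check uses vmkping with similar options.

import subprocess

def jumbo_path_ok(target_ip, payload=8972):
    """Ping with the don't-fragment bit and an 8972-byte payload (9000 minus
    20 bytes IP and 8 bytes ICMP) to confirm the path honors jumbo frames."""
    result = subprocess.run(
        ["ping", "-M", "do", "-s", str(payload), "-c", "3", target_ip],
        capture_output=True, text=True)
    return result.returncode == 0

print(jumbo_path_ok("172.16.11.102"))   # example VMkernel address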
VLANs
VMware recommends isolating VMware Virtual SAN traffic on its own VLAN. When a design uses
multiple Virtual SAN clusters, each cluster should use a dedicated VLAN or segment for its traffic.
This approach prevents interference between clusters and helps with troubleshooting cluster
configuration.
Table 103. VLAN Design Decision
SDDC-SDS-004
Design Decision: Use a dedicated VLAN for Virtual SAN traffic for the management cluster and for the edge cluster.
Design Justification: VLANs ensure traffic isolation.
Design Implication: VLANs span only a single pod. A sufficient number of VLANs are available within each pod and should be used for traffic segregation.
Multicast Requirements
Virtual SAN requires that IP multicast is enabled on the Layer 2 physical network segment that is
used for intra-cluster communication. All VMkernel ports on the Virtual SAN network subscribe to a
multicast group using Internet Group Management Protocol (IGMP).
A default multicast address is assigned to each Virtual SAN cluster at the time of creation. IGMP (v3)
snooping is used to limit Layer 2 multicast traffic to specific port groups. As per the Physical Network
Design, IGMP snooping is configured with an IGMP snooping querier to limit the physical switch ports
that participate in the multicast group to only Virtual SAN VMkernel port uplinks. In some cases, an
IGMP snooping querier can be associated with a specific VLAN. However, vendor implementations
might differ.
Cluster and Disk Group Design
When considering the cluster and disk group design, you have to decide on the Virtual SAN datastore
size, number of hosts per cluster, number of disk groups per host, and the Virtual SAN policy.
VMware Virtual SAN Datastore Size
The size of the Virtual SAN datastore depends on the requirements for the datastore. Consider cost
versus availability to provide the appropriate sizing.
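Usable capacity can be approximated from the raw capacity of the disk groups and the protection overhead of the storage policy. The Python sketch below uses illustrative input values (host count, capacity devices, free-space slack) that are assumptions, not design requirements.

    # Rough Virtual SAN datastore sizing sketch. All input values are
    # illustrative assumptions; substitute the values for your environment.
    def vsan_usable_capacity_gb(hosts, capacity_disks_per_host, disk_size_gb,
                                failures_to_tolerate=1, slack_pct=30):
        """Approximate usable capacity after FTT mirroring and operational slack."""
        raw_gb = hosts * capacity_disks_per_host * disk_size_gb
        # Number Of Failures To Tolerate = 1 mirrors each object (two copies).
        after_protection = raw_gb / (failures_to_tolerate + 1)
        # Keep free space for rebuilds, snapshots, and rebalancing (assumed 30%).
        return after_protection * (1 - slack_pct / 100)

    # Example: 4 hosts, one disk group with 2 x 1 TB capacity devices per host
    print(round(vsan_usable_capacity_gb(hosts=4, capacity_disks_per_host=2,
                                        disk_size_gb=1000)), "GB usable")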
SDDC-SDS-007
Design Decision: Use a single disk group per ESXi host in the management cluster.
Design Justification: A single disk group provides the required performance and usable space for the datastore.
Design Implication: Losing an SSD in a host will take the disk group offline. Using two or more disk groups can increase availability and performance.
Flash read cache reservation (%)
Use case: Performance. Default: 0. Maximum: 100%.
Flash capacity reserved as read cache for the storage object, as a percentage of the logical object size that will be reserved for that object.
Only use this setting for workloads if you must address read performance issues. The downside of this setting is that other objects cannot use a reserved cache.
VMware recommends not using these reservations unless it is absolutely necessary because unreserved flash is shared fairly among all objects.
Object space reservation (%)
Use case: Thick provisioning. Default: 0. Maximum: 100%.
The percentage of the storage object that will be thick provisioned upon VM creation. The remainder of the storage will be thin provisioned.
This setting is useful if a predictable amount of storage will always be filled by an object, cutting back on repeatable disk growth operations for all but new or non-predictable storage use.
By default, policies are configured based on application requirements. However, they are applied
differently depending on the object.
Virtual disk snapshot(s): Uses the virtual disk policy. Same as the virtual disk policy by default; changes are not recommended.
Note If you do not specify a user-configured policy, the default system policy of 1 failure to tolerate
and 1 disk stripe is used for virtual disk(s) and virtual disk snapshot(s). Policy defaults for the
VM namespace and swap are set statically and are not configurable to ensure appropriate
protection for these critical virtual machine components. Policies must be configured based
on the application’s business requirements. Policies give Virtual SAN its power because it can
adjust how a disk performs on the fly based on the policies configured.
SDDC-SDS-008
Design Decision: Use the default VMware Virtual SAN storage policy.
Design Justification: The default Virtual SAN storage policy provides the level of redundancy that is needed within both the management and edge clusters.
Design Implication: Additional policies might be needed if 3rd party VMs are hosted in these clusters because their performance or availability requirements might differ from what the default Virtual SAN policy supports.
SDDC-NFS-001
Design Decision: Use NFS v3 for all NFS hosted datastores.
Design Justification: NFS v4.1 datastores are not supported with Storage I/O Control and with Site Recovery Manager.
Design Implication: NFS v3 does not support Kerberos authentication.
Storage Access
NFS v3 traffic is transmitted in an unencrypted format across the LAN. Therefore, best practice is to
use NFS storage on trusted networks only and to isolate the traffic on dedicated VLANs.
Many NFS arrays have some built-in security, which enables them to control the IP addresses that
can mount NFS exports. Best practice is to use this feature to determine which ESXi hosts can mount
the volumes that are being exported and have read/write access to those volumes. This prevents
unapproved hosts from mounting the NFS datastores.
Exports
All NFS exports are shared directories that sit on top of a storage volume. These exports control the
access between the endpoints (ESXi hosts) and the underlying storage system. Multiple exports can
exist on a single volume, with different access controls on each.
Table 113. NFS Export Sizing
SDDC-NFS-003
Design Decision: Place the vSphere Data Protection export on its own separate volume as per SDDC-PHY-STO-008.
Design Justification: vSphere Data Protection is I/O intensive. vSphere Data Protection or other applications suffer if vSphere Data Protection is placed on a shared volume.
Design Implication: Dedicated exports can add management overhead to storage administrators.
SDDC-NFS-004
Design Decision: For each export, limit access to only the application VMs or hosts requiring the ability to mount the storage.
Design Justification: Limiting access helps ensure the security of the underlying data.
Design Implication: Securing exports individually can introduce operational overhead.
NFS Datastores
Within vSphere environments, ESXi hosts mount NFS exports as a file-share instead of using the
VMFS clustering filesystem. For this design, only secondary storage is being hosted on NFS storage.
Some of the exports are mounted as vSphere datastores, depending on their intended use. For the vRealize Log Insight archive data, the application maps directly to the NFS export and no vSphere datastore is required.
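As an illustration of how an export becomes a datastore, the following pyVmomi sketch mounts an NFS v3 export on a single host; the vCenter address, credentials, NFS server, export path, and datastore name are hypothetical assumptions, and in practice the mount must be repeated on every host in the cluster.

    # Sketch: mount an NFS v3 export as a datastore on one ESXi host (pyVmomi).
    # Connection details, NFS server, export path, and datastore name are
    # hypothetical examples, not values prescribed by this design.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    context = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.local', user='[email protected]',
                      pwd='changeme', sslContext=context)

    datacenter = si.content.rootFolder.childEntity[0]
    host = datacenter.hostFolder.childEntity[0].host[0]   # first host of the first cluster

    spec = vim.host.NasVolume.Specification(
        remoteHost='nfs-array.example.local',   # assumed NFS array address
        remotePath='/vol/vdp_export',           # assumed export path
        localPath='NFS-VDP-01',                 # datastore name shown in vSphere
        accessMode='readWrite',
        type='NFS')                             # NFS v3
    host.configManager.datastoreSystem.CreateNasDatastore(spec)

    Disconnect(si)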
Table 115. NFS Datastore Design Decision
SDDC-NFS-005
Design Decision: Create 2 datastores for use across the following clusters. Management cluster: vSphere Data Protection. Shared Edge and Compute cluster: vRealize Automation Content Library.
Design Justification: The application VMs using these datastores assume that all hosts in the vSphere cluster can access the datastores.
Design Implication: Do not use the NFS datastores as primary VM storage in the management cluster even though that is possible.
Central user-centric and business-aware governance for all physical, virtual, private, and public
cloud services.
Design that meets the customer and business needs and is extensible.
SDDC-CMP-003
Design Decision: Deploy two appliances that replicate data using the embedded PostgreSQL database.
Design Justification: Enable high availability for vRealize Automation.
Design Implication: In this active/passive configuration, manual failover between the two instances is required.
Table 118. vRealize Automation Virtual Appliance Resource Requirements per Virtual Machine
Attribute Specification
Number of vCPUs 4
Memory 18 GB
Note The vRealize Automation IaaS Web server is a separate component from the vRealize
Automation appliance.
Attribute Specification
Number of vCPUs 2
Memory 4 GB
Note The vRealize Automation IaaS Manager Service and DEM server are separate servers, but
are installed on the same virtual machine.
Table 121. vRealize Automation IaaS Model Manager and DEM Orchestrator Server Design
Decision
SDDC-CMP-006
Design Decision: Deploy two virtual machines to run both the Automation IaaS Manager Service and the DEM Orchestrator services in a load-balanced pool.
Design Justification: The Automation IaaS Manager Service and DEM Orchestrator share the same active/passive application model.
Design Implication: More resources are required for these two virtual machines to accommodate the load of the two applications. You can scale up the virtual machines later if additional resources are required.
Table 122. vRealize Automation IaaS Manager Service and DEM Orchestrator Server Resource
Requirements per Virtual Machine
Attribute Specification
Number of vCPUs 2
Memory 4 GB
vRealize Automation functions: Manager Service, DEM Orchestrator
SDDC-CMP-007
Design Decision: Install three DEM Worker instances per DEM host.
Design Justification: Each DEM Worker can process up to 30 concurrent workflows. Beyond this limit, workflows are queued for execution. If the number of concurrent workflows is consistently above 90, you can add additional DEM Workers on the DEM host.
Design Implication: If you add more DEM Workers, you must also provide additional resources to run them.
Table 124. vRealize Automation DEM Worker Resource Requirements per Virtual Machine
Attribute Specification
Number of vCPUs 2
Memory 6 GB
vRealize Automation does not itself virtualize resources, but works with vSphere to provision and
manage the virtual machines. It uses vSphere agents to send commands to and collect data from
vSphere.
Table 125. vRealize Automation IaaS Agent Server Design Decisions
SDDC-CMP-008
Design Decision: Deploy two vRealize Automation vSphere Proxy Agent virtual machines.
Design Justification: Using two virtual machines provides redundancy for vSphere connectivity.
Design Implication: More resources are required because multiple virtual machines are deployed for this function.
SDDC-CMP-009
Design Decision: Abstract the proxy agent virtual machines on a separate virtual network for independent failover of the main vRealize Automation components across sites.
Design Justification: Allows the failover of the vRealize Automation instance from one site to another independently.
Design Implication: Additional application virtual networks and associated edge devices need to be provisioned for those proxy agents.
Table 126. vRealize Automation IaaS Proxy Agent Resource Requirements per Virtual Machine
Attribute Specification
Number of vCPUs 2
Memory 4 GB
Load Balancer
Session persistence of a load balancer allows the same server to serve all requests after a session is
established with that server. The session persistence is enabled on the load balancer to direct
subsequent requests from each unique session to the same vRealize Automation server in the load
balancer pool. The load balancer also handles failover for the vRealize Automation Server (Manager
Service) because only one Manager Service is active at any one time. For the Manager Service itself, session persistence is not enabled because it is not required.
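Conceptually, session persistence is a sticky mapping from a client to a pool member, as in the following illustrative Python sketch. This is not NSX load balancer configuration; the pool member names are hypothetical.

    # Conceptual illustration of load-balancer session persistence: once a
    # client is mapped to a pool member, later requests from the same client
    # go to the same member. Pool member names are hypothetical examples.
    import itertools

    pool = ['vra01svr01a.example.local', 'vra01svr01b.example.local']
    _round_robin = itertools.cycle(pool)
    _persistence_table = {}

    def select_member(client_ip):
        """Return the pool member for a client, honoring session persistence."""
        if client_ip not in _persistence_table:
            _persistence_table[client_ip] = next(_round_robin)  # first request: round robin
        return _persistence_table[client_ip]

    print(select_member('10.0.0.5'))   # first request picks a member
    print(select_member('10.0.0.5'))   # subsequent requests stick to it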
SDDC-CMP-010
Design Decision: Set up a load balancer for all vRealize Automation services that support active/active or active/passive configurations.
Design Justification: Required to enable vRealize Automation to handle a greater load and obtain a higher level of availability than without load balancers.
Design Implication: Additional configuration is required to configure the load balancers.
SDDC-CMP-012
Design Decision: Locate the Microsoft SQL server in the vRealize Automation virtual network or set it up to have global failover available.
Design Justification: For simple failover of the entire vRealize Automation instance from one region to another, the Microsoft SQL server must be running as a VM inside the vRealize Automation application virtual network. If the environment uses a shared SQL server, global failover ensures connectivity from both primary and secondary regions.
Design Implication: Adds additional overhead to managing Microsoft SQL services.
SDDC-CMP-013
Design Decision: Set up Microsoft SQL server with separate OS volumes for SQL Data, Transaction Logs, TempDB, and Backup.
Design Justification: While each organization might have their own best practices in the deployment and configuration of Microsoft SQL server, high level best practices recommend separation of database data files and database transaction logs.
Design Implication: You might need to consult with the Microsoft SQL database administrators of your organization for guidance about production deployment in your environment.
Table 133. vRealize Automation SQL Database Server Resource Requirements per VM
Attribute Specification
Number of vCPUs 8
Memory 16 GB
Number of local drives: 1
Total usable capacity: 80 GB (C:) (OS), 40 GB (D:) (Application), 40 GB (E:) Database Data, 20 GB (F:) Database Log, 20 GB (G:) TempDB, 80 GB (H:) Backup
SDDC-CMP-016
Design Decision: Use unencrypted anonymous SMTP.
Design Justification: Simplifies the design and eases the SMTP configuration.
Design Implication: All notifications will be sent unencrypted with no authentication.
Notifications
System administrators configure default settings for both the outbound and inbound emails servers
used to send system notifications. Systems administrators can create only one of each type of server
that appears as the default for all tenants. If tenant administrators do not override these settings
before enabling notifications, vRealize Automation uses the globally configured email server.
System administrators create a global outbound email server to process outbound email notifications,
and a global inbound email server to process inbound email notifications, such as responses to
approvals.
vRealize Business for Cloud Standard
Table 136. vRealize Business for Cloud Standard Design Decision
SDDC-CMP-019
Design Decision: Use the default vRealize Business reference costing database.
Design Justification: Default reference costing is based on industry information and is periodically updated.
Design Implication: Default reference costing might not accurately represent actual customer costs. The vRealize Business appliance requires Internet access to periodically update the reference database.
SDDC-CMP-020
Design Decision: Deploy vRealize Business as a three-VM architecture with remote data collectors in Region A and Region B.
Design Justification: For best performance, the vRealize Business collectors should be regionally local to the resource which they are configured to collect. Because this design supports disaster recovery, the CMP can reside in Region A or Region B.
Design Implication: In the case where the environment does not implement disaster recovery support, you must deploy an additional appliance, the one for the remote data collector, although the vRealize Business server can handle the load on its own.
Table 137. vRealize Business for Cloud Standard Virtual Appliance Resource Requirements
per Virtual Machine
Attribute Specification
Number of vCPUs 2
Memory 4 GB
SDDC-CMP-023
Design Decision: Configure two business units as business groups (instead of separate tenants).
Design Justification: Allows transparency across the environments and some level of sharing of resources and services such as blueprints.
Design Implication: Some elements such as build profiles are visible to both business groups. The design does not provide full isolation for security or auditing.
SDDC-CMP-024
Design Decision: Create separate fabric groups for each deployment region. Each fabric group represents region-specific data center resources. Each of the business groups has reservations into each of the fabric groups.
Design Justification: Provides future isolation of fabric resources and potential delegation of duty to independent fabric administrators.
Design Implication: Initial deployment will use a single shared fabric that consists of one compute pod.
SDDC-CMP-025
Design Decision: Allow access to the default tenant only by the system administrator and for the purposes of managing tenants and modifying system-wide configurations.
Design Justification: Isolates the default tenant from individual tenant configurations.
Design Implication: Each tenant administrator is responsible for managing their own tenant configuration.
Service Catalog
The service catalog provides a common interface for consumers of IT services to use to request and
manage the services and resources they need.
A tenant administrator or service architect can specify information about the service catalog, such as
the service hours, support team, and change window. While the catalog does not enforce service-level agreements on services, this information about service hours, support team, and change window is available to business users browsing the service catalog.
Table 139. Service Catalog Design Decision
SDDC-CMP-026
Design Decision: Set up the Rainpole service catalog with the following services:
Common. Any blueprints or advanced services that are not tied to a specific data center.
Region A. Service catalog that is dedicated to Region A.
Region B. Service catalog that is dedicated to Region B.
Design Justification: Distinguishes the blueprints and services that will be provisioned in specific regions without over-complicating the naming convention of those catalog items.
Design Implication: None.
Catalog Items
Users can browse the service catalog for catalog items they are entitled to request. For some catalog
items, a request results in the provisioning of an item that the user can manage. For example, the
user can request a virtual machine with Windows 2012 preinstalled, and then manage that virtual
machine after it has been provisioned.
Tenant administrators define new catalog items and publish them to the service catalog. The tenant
administrator can then manage the presentation of catalog items to the consumer and entitle new
items to consumers. To make the catalog item available to users, a tenant administrator must entitle
the item to the users and groups who should have access to it. For example, some catalog items may
be available only to a specific business group, while other catalog items may be shared between
business groups using the same tenant. The administrator determines what catalog items are
available to different users based on their job functions, departments, or location.
Typically, a catalog item is defined in a blueprint, which provides a complete specification of the
resource to be provisioned and the process to initiate when the item is requested. It also defines the
options available to a requester of the item, such as virtual machine specifications or lease duration,
or any additional information that the requester is prompted to provide when submitting the request.
Table 140. Catalog Items – Common Service Catalog Design Decision
Machine Blueprints
A machine blueprint is the complete specification for a virtual, cloud or physical machine. A machine
blueprint determines the machine's attributes, how it is provisioned, and its policy and management
settings. Machine blueprints are published as catalog items in the service catalog.
Machine blueprints can be specific to a business group or shared among groups within a tenant.
Tenant administrators can create shared blueprints that can be entitled to users in any business
group within the tenant. Business group managers can create group blueprints that can only be
entitled to users within a specific business group. A business group manager cannot modify or delete
shared blueprints. Tenant administrators cannot view or modify group blueprints unless they also
have the business group manager role for the appropriate group.
If a tenant administrator sets a shared blueprint's properties so that it can be copied, the business
group manager can also copy the shared blueprint for use as a starting point to create a new group
blueprint.
Table 141. Single Machine Blueprints
Name Description
Windows Server + SQL Server (Production): Base Windows 2012 R2 Server with silent SQL 2012 Server install with custom properties. This is available to the Production business group.
Windows Server + SQL Server (Development): Base Windows 2012 R2 Server with silent SQL 2012 Server install with custom properties. This is available to the Development business group.
Blueprint Definitions
The following sections provide details of each service definition that has been included as part of the
current phase of cloud platform deployment.
Table 142. Base Windows Server Blueprint
Lease and Archival Details: Lease: Production blueprints have no expiration date; Development blueprints have a minimum of 30 days and a maximum of 270 days. Archive: 15 days.
Default: 1 vCPU, 4 GB memory, 60 GB storage.
Maximum: 4 vCPUs, 16 GB memory, 60 GB storage.
Lease and Archival Details: Lease: Production blueprints have no expiration date; Development blueprints have a minimum of 30 days and a maximum of 270 days. Archive: 15 days.
Default: 1 vCPU, 6 GB memory, 20 GB storage.
Maximum: 4 vCPUs, 12 GB memory, 20 GB storage.
Table 146. Base Windows Server with SQL Server Install Requirements and Standards
Lease and Archival Details: Lease: Production blueprints have no expiration date; Development blueprints have a minimum of 30 days and a maximum of 270 days. Archive: 15 days.
Default: 1 vCPU, 8 GB memory, 100 GB storage.
Maximum: 4 vCPUs, 16 GB memory, 400 GB storage.
SDDC-CMP-028
Design Decision: Perform branding with corporate logo and colors on the tenant and default tenant web sites.
Design Justification: Provides a consistent look and feel in accordance with corporate standards.
Design Implication: The logo image must be provided in 800x52 pixel size.
SDDC-CMP-029
Design Decision: Set the product name to Infrastructure Service Portal.
Design Justification: Neutral default. This description can be configured on a per tenant basis.
Design Implication: Users see this name as the portal display name by default.
The following terms apply to vRealize Automation when integrated with vSphere. These terms and
their meaning may vary from the way they are used when referring only to vSphere.
Table 149. Terms and Definitions
Term Definition
The following figure shows the logical design constructs discussed in the previous section as they
would apply to a deployment of vRealize Automation integrated with vSphere in a cross data center
provisioning.
During installation of the vRealize Automation IaaS components, you can configure the proxy agents
and define their associated endpoints. Alternatively, you can configure the proxy agents and define
their associated endpoints separately after the main vRealize Automation installation is complete.
Table 150. Endpoint Design Decisions
SDDC-CMP-030
Design Decision: Create two vSphere endpoints.
Design Justification: One vSphere endpoint is required to connect to each vCenter Server instance in each region. Two endpoints will be needed for two regions.
Design Implication: As additional regions are brought online, additional vSphere endpoints need to be deployed.
SDDC-CMP-032
Design Decision: Create two compute resources.
Design Justification: Each region has one compute cluster; one compute resource is required for each cluster.
Design Implication: As additional compute clusters are created, they need to be added to the existing compute resource in their region or to a new resource, which has to be created.
Note By default, compute resources are provisioned to the root of the compute cluster. If desired,
compute resources can be configured to provision to a specific resource pool. This design
does not use resource pools.
Fabric Groups
A fabric group is a logical container of several compute resources, and can be managed by fabric
administrators.
SDDC-CMP-033
Design Decision: Create a fabric group for each region and include all the compute resources and edge resources in that region.
Design Justification: To enable region-specific provisioning, a fabric group in each region must be created.
Design Implication: As additional clusters are added in a region, they must be added to the fabric group.
Business Groups
A Business group is a collection of machine consumers, often corresponding to a line of business,
department, or other organizational unit. To request machines, a vRealize Automation user must
belong to at least one Business group. Each group has access to a set of local blueprints used to
request machines.
Business groups have the following characteristics:
A group must have at least one business group manager, who maintains blueprints for the group
and approves machine requests.
Groups can contain support users, who can request and manage machines on behalf of other
group members.
A vRealize Automation user can be a member of more than one Business group, and can have
different roles in each group.
Table 153. Business Group Design Decision
SDDC-CMP-034
Design Decision: Create two business groups, one for production users and one for the development users.
Design Justification: Creating two groups, one for each type of user, allows different permissions and access to be applied to each type of user.
Design Implication: Creating more business groups results in more administrative overhead.
Reservations
A reservation is a share of one compute resource's available memory, CPU and storage reserved for
use by a particular fabric group. Each reservation is for one fabric group only but the relationship is
many-to-many. A fabric group might have multiple reservations on one compute resource, or
reservations on multiple compute resources, or both.
Converged Compute/Edge Clusters and Resource Pools
While reservations provide a method to allocate a portion of the cluster memory or storage within vRA, reservations do not control how CPU and memory are allocated during periods of contention on the underlying vSphere compute resources. vSphere resource pools are used to control the allocation of CPU and memory during times of resource contention on the underlying hosts. To fully utilize this mechanism, all VMs must be deployed into one of three resource pools: SDDC-EdgeRP01, User-EdgeRP01, or User-VMRP01. SDDC-EdgeRP01 is dedicated to data center-level NSX Edge components and should not contain any user workloads. User-EdgeRP01 is dedicated to any statically or dynamically deployed NSX components, such as NSX Edges or load balancers, which serve a specific customer workload. User-VMRP01 is dedicated to any statically or dynamically deployed virtual machines, such as Windows, Linux, or database servers, which contain specific customer workloads.
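The resource pools named above must exist at the root of the cluster before reservations reference them. The following pyVmomi sketch creates one such pool; the vCenter address, credentials, cluster lookup, and allocation values are hypothetical assumptions, not settings prescribed by this design.

    # Sketch: create a root-level resource pool on a cluster with pyVmomi.
    # vCenter address, credentials, cluster lookup, and allocation values are
    # hypothetical examples.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    context = ssl._create_unverified_context()
    si = SmartConnect(host='vcenter.example.local', user='[email protected]',
                      pwd='changeme', sslContext=context)

    datacenter = si.content.rootFolder.childEntity[0]
    cluster = datacenter.hostFolder.childEntity[0]   # assumed shared edge and compute cluster

    def allocation():
        alloc = vim.ResourceAllocationInfo()
        alloc.shares = vim.SharesInfo(level='normal', shares=0)  # placeholder shares level
        alloc.reservation = 0
        alloc.limit = -1                                         # unlimited
        alloc.expandableReservation = True
        return alloc

    spec = vim.ResourceConfigSpec(cpuAllocation=allocation(),
                                  memoryAllocation=allocation())
    # Created directly under the cluster's root resource pool (no nesting).
    cluster.resourcePool.CreateResourcePool(name='User-VMRP01', spec=spec)

    Disconnect(si)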
SDDC-CMP-035
Design Decision: Create four reservations: two for production and two for development.
Design Justification: Each resource cluster will have two reservations, one for production and one for development, allowing both production and development workloads to be provisioned.
Design Implication: Because production and development share the same compute resources, the development business group must be limited to a fixed amount of resources.
SDDC-CMP-036
Design Decision: Create two edge reservations, one in each region.
Design Justification: An edge reservation in each region allows NSX to create edge services gateways on demand and place them on the edge cluster.
Design Implication: The workload reservation must define the edge reservation in the network settings.
SDDC-CMP-037
Design Decision: All reservations for production and development workloads are configured to utilize a dedicated vCenter resource pool.
Design Justification: In order to ensure dedicated compute resources for NSX networking components, end-user deployed workloads must be assigned to a dedicated end-user workload vCenter resource pool. Workloads provisioned at the root resource pool level would receive more resources than those in resource pools, which would starve those virtual machines in contention situations.
Design Implication: Cloud admins must ensure all workload reservations are configured with the appropriate resource pool. This may be a single resource pool for both production and development workloads, or two resource pools, one dedicated for the Development Business Group and one dedicated for the Production Business Group.
SDDC-CMP-038
Design Decision: All reservations for dynamically provisioned NSX Edge components (routed gateway) are configured to utilize a dedicated vCenter resource pool.
Design Justification: In order to ensure dedicated compute resources for NSX networking components, end-user deployed NSX Edge components must be assigned to a dedicated end-user network component vCenter resource pool. Workloads provisioned at the root resource pool level would receive more resources than those in resource pools, which would starve those virtual machines in contention situations.
Design Implication: Cloud admins must ensure all workload reservations are configured with the appropriate resource pool.
SDDC-CMP-039
Design Decision: All vCenter resource pools to be used for Edge or Compute workloads must be created at the "root" level. Nesting of resource pools is not recommended.
Design Justification: Nesting of resource pools can create administratively complex resource calculations that may result in unintended under or over allocation of resources during contention situations.
Design Implication: All resource pools must be created at the root resource pool level.
Reservation Policies
You can add each virtual reservation to one reservation policy. The reservation from which a
particular virtual machine is provisioned is determined by vRealize Automation based on the
reservation policy specified in the blueprint (if any), the priorities and current usage of the fabric
group's reservations, and other custom properties.
SDDC-CMP-040
Design Decision: Create four workload reservation policies for production and development blueprints.
Design Justification: Two reservation policies are required in each region, one for production and the other for development.
Design Implication: As more groups are created, reservation policies for those groups must be created.
A storage reservation policy is a set of datastores that can be assigned to a machine blueprint to
restrict disk provisioning to only those datastores. Storage reservation policies are created and
associated with the appropriate datastores and assigned to reservations.
Table 156. Storage Reservation Policy Design Decisions
SDDC-CMP-042
Design Decision: This design does not use storage tiers.
Design Justification: The underlying physical storage design does not use storage tiers.
Design Implication: Both business groups will have access to the same storage.
Template Synchronization
This dual-region design supports provisioning workloads across regions from the same portal using
the same single-machine blueprints. A synchronization mechanism is required to have consistent
templates across regions. There are multiple ways to achieve synchronization, for example, vSphere
Content Library or external services like vCloud Connector or vSphere Replication.
SDDC-CMP-043
Design Decision: This design uses vSphere Content Library to synchronize templates across regions.
Design Justification: The vSphere Content Library is built into the version of vSphere being used and meets all the requirements to synchronize templates.
Design Implication: Storage space must be provisioned in each region.
When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with
preconfigured vCenter customizations.
Figure 56. Template Synchronization
SDDC-CMP-044
Design Decision: Choose Active Directory with Integrated Windows Authentication as the Directory Service connection option.
Design Justification: Rainpole uses a single-forest, multiple-domain Active Directory environment. Integrated Windows Authentication supports establishing trust relationships in a multi-domain or multi-forest Active Directory environment.
Design Implication: Requires that the vRealize Automation appliances are joined to the Active Directory domain.
By default, the vRealize Automation appliance is initially configured with 18 GB of memory, which is
enough to support a small Active Directory environment. An Active Directory environment is
considered small if fewer than 25,000 users in the OU have to be synced. An Active Directory
environment with more than 25,000 users is considered large and needs additional memory and CPU.
See the vRealize Automation sizing guidelines for details.
Table 159. vRealize Automation Appliance Sizing Decision
SDDC-CMP-045
Design Decision: Leave the vRealize Automation appliance default memory allocation (18 GB).
Design Justification: This design's Active Directory environment contains no more than 25,000 users and groups.
Design Implication: Customers should consider expanding the memory allocation for the vRealize Automation appliance based on the size of their actual Active Directory environment.
The connector is a component of the vRealize Automation service and performs the synchronization
of users and groups between Active Directory and the vRealize Automation service. In addition, the
connector is the default identity provider and authenticates users to the service.
Table 160. Connector Configuration Decision
In this VMware Validated Design, vRealize Automation uses the vRealize Orchestrator plug-in to connect to vCenter Server for compute resource allocation.
SDDC-CMP-VRO-01
Design Decision: Deploy all vRealize Orchestrator instances required within the SDDC solution with 2 CPUs, 4 GB memory, and 12 GB of hard disk.
Design Justification: The vRealize Orchestrator appliance requires the appropriate resources to enable connectivity to vRealize Automation via the vRealize Orchestrator plug-in.
Design Implication: Resources should not be reduced as the vRealize Orchestrator appliance requires this for scalability.
SDDC-CMP-VRO-02
Design Decision: Configure all vRealize Orchestrator instances within the SDDC to use vRealize Automation authentication.
Design Justification: LDAP is being deprecated. Supports the existing design setup utilizing Active Directory services.
Design Implication: This design does not support local authentication for vRealize Orchestrator.
LDAP: Port 389 (TCP). Source: vRealize Orchestrator server. Target: LDAP server. Lookup port of your LDAP authentication server.
LDAP using SSL: Port 636 (TCP). Source: vRealize Orchestrator server. Target: LDAP server. Lookup port of your secure LDAP authentication server.
LDAP using Global Catalog: Port 3268 (TCP). Source: vRealize Orchestrator server. Target: Global Catalog server. Port to which Microsoft Global Catalog server queries are directed.
SQL Server: Port 1433 (TCP). Source: vRealize Orchestrator server. Target: Microsoft SQL server. Port used to communicate with the Microsoft SQL Server or SQL Server Express instances that are configured as the vRealize Orchestrator database.
SMTP Server port: Port 25 (TCP). Source: vRealize Orchestrator server. Target: SMTP server. Port used for email notifications.
vCenter Server API port: Port 443 (TCP). Source: vRealize Orchestrator server. Target: VMware vCenter Server. The vCenter Server API communication port used by vRealize Orchestrator to obtain virtual infrastructure and virtual machine information from the orchestrated vCenter Server instances.
VMware ESXi: Port 443 (TCP). Source: vRealize Orchestrator server. Target: ESXi hosts. (Optional) Workflows using the vCenter Guest Operations API need a direct connection between vRealize Orchestrator and the ESXi hosts the VM is running on.
The vRealize Orchestrator appliance, which runs on Linux, comes preconfigured, enabling fast deployment. In contrast to a vRealize Orchestrator installation that uses Microsoft Windows as the operating system, the Linux-based appliance does not incur Microsoft licensing costs.
The vRealize Orchestrator appliance package is distributed with preinstalled software and contains the following software components:
SUSE Linux Enterprise Server 11 SP3 for VMware, 64-bit edition
PostgreSQL
OpenLDAP
vRealize Orchestrator
Table 166. vRealize Orchestrator Platform Design Decision
SDDC-CMP-VRO-08
Design Decision: Install and configure the multi-node plug-in in your multisite implementation to provide disaster recovery capability through vRealize Orchestrator content replication.
Design Justification: vRealize Orchestrator will not support disaster recovery without the implementation of the multi-node plug-in.
Design Implication: Enables disaster recovery and multisite implementations within vRealize Orchestrator.
SDDC-CMP-VRO-10
Design Decision: Deploy all vRealize Orchestrator instances required by the SDDC solution within the same cluster as the vRealize Automation instances (management cluster).
Design Justification: In this design, only the vRealize Automation component consumes vRealize Orchestrator.
Design Implication: None.
The following tables outline characteristics for this vRealize Orchestrator design.
Table 170. Service Monitors Characteristics
Monitor, Interval, Timeout, Retries, Type, Send String, Receive String
Pool Name, Algorithm, Monitors, Members, Port, Monitor Port
SDDC-CMP-VRO-12
Design Decision: For users who require frequent developer or administrative access, install the vRealize Orchestrator Client required within the SDDC solution from the vRealize Orchestrator appliance.
Design Justification: You must first install the vRealize Orchestrator Integration Client from the vRealize Orchestrator virtual appliance. Casual users of vRealize Orchestrator may utilize the Java WebStart client.
Design Implication: Any additional instances of the vRealize Orchestrator Client or Client Integration Plug-in can be installed according to your needs.
SDDC-CMP-VRO-15
Design Decision: Configure the vRealize Orchestrator appliance required within the SDDC solution to use an external MSSQL database.
Design Justification: The SDDC design is already using an external MSSQL database for other components. Database support currently includes MSSQL or Oracle. For all supported versions of databases see the VMware Product Interoperability Matrix.
Design Implication: None.
trigger events in vRealize Orchestrator, and in the plugged-in technology. Plug-ins provide an
inventory of JavaScript objects that you can access on the vRealize Orchestrator Inventory tab. Each
plug-in can provide one or more packages of workflows and actions that you can run on the objects in
the inventory to automate the typical use cases of the integrated product.
vRealize Orchestrator and the vCenter Server Plug-In
You can use the vCenter Server plug-in to manage multiple vCenter Server instances. You can create
workflows that use the vCenter Server plug-in API to automate tasks in your vCenter Server
environment. The vCenter Server plug-in maps the vCenter Server API to the JavaScript that you can
use in workflows. The plug-in also provides actions that perform individual vCenter Server tasks that
you can include in workflows.
The vCenter Server plug-in provides a library of standard workflows that automate vCenter Server
operations. For example, you can run workflows that create, clone, migrate, or delete virtual
machines. Before managing the objects in your VMware vSphere inventory by using vRealize
Orchestrator and to run workflows on the objects, you must configure the vCenter Server plug-in and
define the connection parameters between vRealize Orchestrator and the vCenter Server instances
you want to orchestrate. You can configure the vCenter Server plug-in by using the vRealize
Orchestrator configuration interface or by running the vCenter Server configuration workflows from the
vRealize Orchestrator client. You can configure vRealize Orchestrator to connect to your vCenter
Server instances for running workflows over the objects in your vSphere infrastructure.
To manage the objects in your vSphere inventory using the vSphere Web Client, configure vRealize
Orchestrator to work with the same vCenter Single Sign-On instance to which both vCenter Server
and vSphere Web Client are pointing. Also verify that vRealize Orchestrator is registered as a vCenter
Server extension. You register vRealize Orchestrator as a vCenter Server extension when you specify
a user (user name and password) who has the privileges to manage vCenter Server extensions.
Table 178. vRealize Orchestrator vCenter Server Plug-In Design Decisions
Scale Out
You can cluster vRealize Orchestrator in one of the following configurations:
An active-active cluster with up to five active nodes. VMware recommends a maximum of three active nodes in this configuration.
An active-passive cluster with only one active node and up to seven standby nodes.
In a clustered vRealize Orchestrator environment you cannot change workflows while other vRealize
Orchestrator instances are running. Stop all other vRealize Orchestrator instances before you connect
the vRealize Orchestrator client and change or develop a new workflow.
You can scale out a vRealize Orchestrator environment by having multiple independent vRealize
Orchestrator instances (each with their own database instance). This option allows you to increase
the number of managed inventory objects. You can use the vRealize Orchestrator Multinode plug-in
to replicate the vRealize Orchestrator content, and to start and monitor workflow executions.
Table 65. vRealize Orchestrator Active-Passive Design Decision
Decision ID, Design Decision, Design Justification, Design Implication
vRealize Log Insight collects log events from the following virtual infrastructure and cloud
management components:
Management vCenter Server
o Platform Services Controller
o vCenter Server
Compute vCenter Server
o Platform Services Controller
o vCenter Server
Management, shared edge and compute ESXi hosts
NSX for vSphere for the management and for the shared compute and edge clusters
o NSX Manager
o NSX Controller instances
o NSX Edge instances
vRealize Automation
o vRealize Orchestrator
o vRealize Automation components
vRealize Operations Manager
o Analytics cluster nodes
SDDC-OPS-LOG-001
Design Decision: Deploy vRealize Log Insight in a cluster configuration of 3 nodes with an integrated load balancer: one master and two worker nodes.
Design Justification: Provides high availability. Using the integrated load balancer simplifies the Log Insight deployment and prevents a single point of failure.
Design Implication: You must size each node identically.
3.4.1.4 Sizing
By default, a vRealize Log Insight virtual appliance has 2 vCPUs, 4 GB of virtual memory, and 144 GB of disk space provisioned. vRealize Log Insight uses 100 GB of the disk space to store raw data, index data, and metadata.
Sizing Nodes
To accommodate all of the log data from the products in the SDDC, you must size the Log Insight nodes
properly.
Table 180. Compute Resources for a vRealize Log Insight Medium-Size Node
Attribute Specification
Number of CPUs 8
Memory 16 GB
Sizing Storage
Sizing is based on IT organization requirements, but assuming that you want to retain 7 days of data,
you can use the following calculations.
For 250 syslog sources at a rate of 150 MB of logs ingested per-day per-source over 7 days:
250 sources * 150 MB of log data ≈ 37 GB log data per-day
37 GB * 7 days ≈ 260 GB log data per vRealize Log Insight node
260 GB * 1.7 overhead index ≈ 450 GB
Based on this example, the 270 GB of storage space that is provided when you deploy the medium-size vRealize Log Insight virtual appliance is not sufficient. You must add additional storage of approximately 190 GB per node.
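The same arithmetic can be restated as a short calculation. The figures below simply mirror the example above (250 sources per node, 150 MB per source per day, 7 days of retention, and the 1.7 indexing overhead factor).

    # Worked restatement of the vRealize Log Insight storage sizing example above.
    sources_per_node = 250
    mb_per_source_per_day = 150
    retention_days = 7
    index_overhead = 1.7           # indexing/metadata overhead factor used above

    log_gb_per_day = sources_per_node * mb_per_source_per_day / 1024
    log_gb_retained = log_gb_per_day * retention_days
    storage_gb_needed = log_gb_retained * index_overhead

    print(round(log_gb_per_day), "GB of log data per day")       # ~37 GB
    print(round(log_gb_retained), "GB over 7 days")              # ~260 GB
    print(round(storage_gb_needed), "GB required per node")      # roughly 440-450 GB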
Note vRealize Log Insight supports virtual hard disks of up to 2 TB. If more capacity is needed, add
another virtual hard disk. Do not extend existing retention virtual disks.
Table 181. Compute Resources for the vRealize Log Insight Nodes Design Decision
SDDC-OPS-LOG-002
Design Decision: Deploy vRealize Log Insight nodes of medium size.
Design Justification: Accommodates the number of expected syslog connections.
Design Implication: You must increase the size of the nodes if you configure Log Insight to monitor additional syslog sources.
SDDC-OPS-LOG-003
Design Decision: Add 190 GB of additional storage per node.
Design Justification: Used to ensure 7 days of data retention.
Design Implication: Additional storage space is required.
Figure 60. Networking Design for the vRealize Log Insight Deployment
SDDC-OPS-LOG-004
Design Decision: Deploy vRealize Log Insight on the shared management region VXLAN.
Design Justification: Secures the vRealize Log Insight instances. Provides a consistent deployment model for management applications.
Design Implication: None.
3.4.1.5.2 IP Subnets
You can allocate the following example subnets to the vRealize Log Insight deployment:
Table 183. IP Subnets in the Application Isolated Networks
Region A 192.168.31.0/24
Region B 192.168.32.0/24
SDDC-OPS-LOG-006
Design Decision: Configure forward and reverse DNS records for all vRealize Log Insight nodes and VIPs.
Design Justification: All nodes are accessible by using fully qualified domain names instead of by using IP addresses only.
Design Implication: You must manually provide a DNS record for each node and VIP.
SDDC-OPS-LOG-007
Design Decision: For all applications that fail over between regions (such as vRealize Automation and vRealize Operations Manager), use the FQDN of the vRealize Log Insight Region A VIP when you configure logging.
Design Justification: Supports logging when not all management applications are failed over to Region B, for example, when only one application is moved to Region B.
Design Implication: If vRealize Automation and vRealize Operations Manager are failed over to Region B and the vRealize Log Insight cluster is no longer available in Region A, update the A record on the child DNS server to point to the vRealize Log Insight cluster in Region B.
Hard disk 4 (additional virtual disk): 190 GB. Storage for collected logs. The capacity from this disk is added to /storage/core.
Calculate the storage space that is available for log data using the following equation:
/storage/core = hard disk 2 space + hard disk 4 space - system logs space on hard disk 2
Based on the size of the default and additional virtual disks, the storage core is equal to 440 GB.
/storage/core = 270 GB + 190 GB - 20 GB = 440 GB
Retention = /storage/core - 3% * /storage/core
With /storage/core equal to 440 GB, vRealize Log Insight can use approximately 427 GB for retention:
Retention = 440 GB - 3% * 440 GB ≈ 427 GB
Configure a retention period of 7 days for the medium-size vRealize Log Insight appliance.
Table 188. Retention Period Design Decision
SDDC-OPS-LOG-008
Design Decision: Configure vRealize Log Insight to retain data for 7 days.
Design Justification: Accommodates logs from 750 syslog sources (250 per node) as per the SDDC design.
Design Implication: You must add a VMDK to each appliance.
Archiving
vRealize Log Insight archives log messages as soon as possible. At the same time, they remain
retained on the virtual appliance until the free local space is almost filled. Data exists on both the
vRealize Log Insight appliance and the archive location for most of the retention period. The archiving
period must be longer than the retention period.
The archive location must be on NFS version 3 shared storage. The archive location must be available and must have enough capacity to accommodate the archives.
Apply an archive policy of 90 days for the medium-size vRealize Log Insight appliance. The vRealize
Log Insight appliance will use about 1 TB of shared storage. According to the business compliance
regulations of your organization, these sizes might change.
Table 189. Log Archive Policy Design Decision
SDDC-OPS-LOG-009
Design Decision: Provide 1 TB of NFS version 3 shared storage to each vRealize Log Insight cluster.
Design Justification: Archives logs from 750 syslog sources.
Design Implication: You must provide NFS version 3 shared storage in addition to the data storage for the vRealize Log Insight cluster. You must enforce the archive policy directly on the shared storage.
3.4.1.7 Alerting
vRealize Log Insight supports alerts that trigger notifications about its health. The following types of
alerts exist in vRealize Log Insight:
System Alerts. vRealize Log Insight generates notifications when an important system event
occurs, for example when the disk space is almost exhausted and vRealize Log Insight must start
deleting or archiving old log files.
Content Pack Alerts. Content packs contain default alerts that can be configured to send notifications. These alerts are specific to the content pack and are disabled by default.
User-Defined Alerts. Administrators and users can define their own alerts based on data
ingested by vRealize Log Insight.
vRealize Log Insight handles alerts in two ways:
Send an e-mail over SMTP
Send to vRealize Operations Manager
SDDC-OPS-LOG-012
Design Decision: Use Active Directory for authentication.
Design Justification: Provides fine-grained role and privilege-based access for administrator and operator roles.
Design Implication: You must provide access to the Active Directory from all Log Insight nodes.
Encryption
Replace default self-signed certificates with a CA-signed certificate to provide secure access to the
vRealize Log Insight Web user interface.
Table 193. Custom Certificates Design Decision
SDDC-OPS-LOG-014
Design Decision: Configure syslog sources to send log data directly to vRealize Log Insight.
Design Justification: Simplifies the design implementation for log sources that are syslog capable.
Design Implication: You must configure syslog sources to forward logs to the vRealize Log Insight VIP.
SDDC-OPS-LOG-015
Design Decision: Configure the vRealize Log Insight agent for the vRealize Automation Windows servers and Linux appliances.
Design Justification: Windows does not natively support syslog. vRealize Automation requires the use of the agents to collect all vRealize Automation logs.
Design Implication: You must manually install and configure the agent.
vRealize Log Insight receives log data over the syslog TCP, syslog TLS/SSL, or syslog UDP
protocols. Use the default syslog UDP protocol because security is already designed at the level of
the management network.
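As a quick way to confirm that a source can reach the cluster on the default UDP syslog port, the following Python sketch sends a test event to the Log Insight VIP using only the standard library; the VIP host name is a hypothetical example.

    # Sketch: send a test log event to the vRealize Log Insight VIP over
    # syslog UDP (port 514). The VIP host name is a hypothetical example.
    import logging
    import logging.handlers

    handler = logging.handlers.SysLogHandler(
        address=('vrli-cluster-01.example.local', 514))   # UDP by default
    logger = logging.getLogger('sddc-syslog-test')
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    logger.info('Test event: verifying UDP syslog ingestion to the Log Insight VIP')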
Table 196. Syslog Protocol Design Decision
SDDC-OPS-LOG-017
Design Decision: Communicate with the syslog clients, such as ESXi, vCenter Server, and NSX for vSphere, on the default UDP syslog port.
Design Justification: Using the default syslog port simplifies configuration for all syslog sources.
Design Implication: If the network connection is interrupted, the syslog traffic is lost. UDP syslog traffic is not secure.
SDDC-OPS-LOG-018
Design Decision: Forward log events to the other region by using the Ingestion API.
Design Justification: Using the forwarding protocol supports structured and unstructured data, provides client-side compression, and supports event throttling.
Design Implication: You must configure each region to forward log data to the other.
o vCenter Server
Management, Edge and Compute ESXi hosts
NSX for vSphere for the management and compute clusters
o NSX Manager
o NSX Controller Instances
o NSX Edge instances
vRealize Automation
o vRealize Orchestrator
vRealize Automation Components
vRealize Log Insight
vRealize Operations Manager (Self Health Monitoring)
SDDC-OPS-MON-001
Design Decision: Deploy vRealize Operations Manager as a cluster of 4 nodes: one master, one master replica, and two data nodes.
Design Justification: Enables scale-out and high availability.
Design Implication: Each node must be sized identically.
Attribute Specification
vCPU 8
Memory 32 GB
(*) Metric numbers reflect the total number of metrics that are collected from all adapter instances in
vRealize Operations Manager. To get this number, you can go to the Cluster Management page in
vRealize Operations Manager, and view the adapter instances of each node at the bottom of the
page. You can get the number of metrics collected by each adapter instance. The sum of these
metrics is what is estimated in this sheet.
Note The number shown in the overall metrics on the Cluster Management page reflects the
metrics that are collected from different data sources and the metrics that vRealize
Operations Manager creates.
(**) Note the reduction in maximum metrics to permit some head room.
SDDC-OPS-MON-002
Design Decision: Deploy each node in the analytics cluster as a medium-size appliance.
Design Justification: Provides the scale required to monitor the SDDC.
Design Implication: You must utilize 32 vCPUs and 128 GB of memory in the management cluster.
Attribute Specification
vCPU 2
Attribute Specification
Memory 4 GB
Maximum number of End Point Operations Management Agents per Node 250
* The object limit for the remote collector is based on the VMware vCenter adapter.
Table 202. Compute Resources of the Remote Collector Nodes Design Decisions
SDDC-OPS-MON-003
Design Decision: Deploy two remote collector nodes per region.
Design Justification: Removes from the analytics cluster the load of collecting metrics from applications that do not fail over between regions.
Design Implication: When configuring the monitoring of a solution, you must assign a collector group.
SDDC-OPS-MON-004
Design Decision: Deploy the standard-size remote collector virtual appliances.
Design Justification: Enables metric collection for the expected number of objects in the SDDC.
Design Implication: You must provide 4 vCPUs and 8 GB of memory in the management cluster in each region.
SDDC-OPS-MON-005
Design Decision: Provide a 1 TB VMDK for each analytics node (master, master replica, and two data nodes).
Design Justification: Provides enough storage to meet the SDDC design.
Design Implication: You must add the 1 TB disk manually while the virtual machine for the analytics node is powered off.
For more information about the networking configuration of the application isolated network, see
Software-Defined Networking Design and NSX Design.
Table 205. vRealize Operations Manager Isolated Network Design Decision
3.4.2.7.2 IP Subnets
You can allocate the following example subnets for each cluster in the vRealize Operations Manager
deployment:
Table 206. IP Subnets in the Application Virtual Network of vRealize Operations Manager
Analytics cluster in Region A (also valid for Region B for failover) 192.168.11.0/24
SDDC-OPS-MON-008
Design Decision: Allocate separate subnets for each application isolated network.
Design Justification: Placing the remote collectors on their own subnet enables them to communicate with the analytics cluster and not be a part of the failover group.
Design Implication: None.
SDDC-OPS-MON-009
Design Decision: Place the analytics cluster behind an NSX load balancer.
Design Justification: Enables balanced access of tenants and users to the analytics services with the load being spread evenly across the cluster.
Design Implication: You must manually configure the NSX Edge devices to provide load balancing services.
Encryption
Access to all vRealize Operations Manager Web interfaces requires an SSL connection. By default,
vRealize Operations Manager uses a self-signed certificate. Replace default self-signed certificates
with a CA-signed certificate to provide secure access to the vRealize Operations Manager user
interface.
Table 211. Using CA-Signed Certificates Design Decision
Log Insight log event. The infrastructure on which vRealize Operations Manager is running has
low-level issues. You can also use the log events for root cause analysis.
Custom dashboard. vRealize Operations Manager can show super metrics for data center
monitoring, capacity trends and single pane of glass overview.
Table 212. Monitoring vRealize Operations Manager Design Decisions
SDDC-OPS-MON-017
Design Decision: Solutions that fail over between sites will use the default cluster group as the collector.
Design Justification: Provides monitoring for all components during a failover.
Design Implication: Adds minimal additional load to the analytics cluster.
The backup datastore stores all the data that is required to recover services according to a Recovery Point Objective (RPO). Determine the target location and make sure that it meets performance requirements.
Table 215. Options for Backup Storage Location
Store production and backup data on the same storage platform: You do not have to request a new storage configuration from the storage team, and you can take full advantage of vSphere capabilities. However, you cannot recover your data if the destination datastore or the production storage is unrecoverable.
SDDC-OPS-BKP-002
Design Decision: Allocate a dedicated NFS datastore for the vSphere Data Protection appliance and the backup data in each region according to Physical Storage Design.
Design Justification: vSphere Data Protection emergency restore operations are possible even when the primary VMware Virtual SAN datastore is not available because the vSphere Data Protection storage volume is separate from the primary Virtual SAN datastore. The amount of storage required for backups is greater than the amount of storage available in the Virtual SAN datastore.
Design Implication: You must provide an external NFS storage array.
3.4.3.4 Performance
vSphere Data Protection generates a significant amount of I/O operations, especially when
performing multiple concurrent backups. The storage platform must be able to handle this I/O. If the
storage platform does not meet the performance requirements, it might miss backup windows.
Backup failures and error messages might occur. Run the vSphere Data Protection performance
analysis feature during virtual appliance deployment or after deployment to assess performance.
Table 217. vSphere Data Protection Performance
1 TB 611 Mbps
2 TB 1223 Mbps
2 TB 6 GB
4 TB 8 GB
6 TB 10 GB
8 TB 12 GB
Note The default Virtual SAN storage policy includes Number Of Failures To Tolerate =
1, which means that virtual machine data will be mirrored.
vSphere Data Protection is used to restore virtual machines that failed or need their data reverted to a
previous state.
Warning Do not perform any backup or other administrative activities during the vSphere Data
Protection maintenance window. You can only perform restore operations. By default, the
vSphere Data Protection maintenance window begins at 8 PM local server time and
continues uninterrupted until 8 AM or until the backup jobs are complete. Configure
maintenance windows according to IT organizational policy requirements.
SDDC-OPS-BKP-006
Design Decision: Schedule daily backups.
Design Justification: Allows for the recovery of virtual machine data that is at most a day old.
Design Implication: Data that changed since the last backup, 24 hours ago, is lost.
Decision ID: SDDC-OPS-BKP-009
Design Decision: Use the internal configuration backup features within VMware NSX.
Design Justification: Restoring small configuration files can be a faster and less destructive method to achieve a similar restoration of functionality.
Design Implication: An FTP server is required for the NSX configuration backup.
Region A backup jobs

vCenter Server
  Compute Job: comp01vc01.sfo01.rainpole.local, comp01psc01.sfo01.rainpole.local

NSX for vSphere
  Management Job: mgmt01nsxm01.sfo01.rainpole.local
  Compute Job: comp01nsxm01.sfo01.rainpole.local

vRealize Operations Manager: vrops-mstrn-01.rainpole.local, vrops-repln-02.rainpole.local, vrops-datan-03.rainpole.local, vrops-datan-04.rainpole.local, vrops-rmtcol-01.sfo01.rainpole.local, vrops-rmtcol-02.sfo01.rainpole.local

Region B backup jobs

vCenter Server
  Compute Job: comp01vc51.lax01.rainpole.local, comp01psc51.lax01.rainpole.local

NSX for vSphere
  Management Job: mgmt01nsxm51.lax01.rainpole.local
  Compute Job: comp01nsxm51.lax01.rainpole.local

vRealize Automation: vra01ias51.lax01.rainpole.local, vra01ias52.lax01.rainpole.local, vra01buc51.lax01.rainpole.local

vRealize Log Insight: vrli-mstr-51.lax01.rainpole.local, vrli-wrkr-51.lax01.rainpole.local, vrli-wrkr-52.lax01.rainpole.local

vRealize Operations Manager: vrops-rmtcol-51.lax01.rainpole.local, vrops-rmtcol-52.lax01.rainpole.local
Note A region in the VMware Validated Design is equivalent to the site construct in Site Recovery
Manager.
Each region has a Site Recovery Manager server with an embedded Site Recovery Manager
database.
In each region, Site Recovery Manager is integrated with the Management vCenter Server
instance.
vSphere Replication provides hypervisor-based virtual machine replication between Region A and
Region B.
vSphere Replication replicates data from Region A to Region B by using a dedicated VMkernel
TCP/IP stack.
Users and administrators access management applications from other branch offices and remote
locations over the corporate Local Area Network (LAN), Wide Area Network (WAN), and Virtual
Private Network (VPN).
Figure 64. Disaster Recovery Logical Design
You have the following options for deployment and pairing of vCenter Server and Site Recovery
Manager:
vCenter Server options
o You can use Site Recovery Manager and vSphere Replication with vCenter Server Appliance
or with vCenter Server for Windows.
o You can deploy a vCenter Server Appliance in one region and a vCenter Server for Windows
instance in the other region.
Site Recovery Manager options
o You can use either a physical system or a virtual system.
o You can deploy Site Recovery Manager on a shared system, such as the system of vCenter
Server for Windows, or on a dedicated system.
Table 226. Design Decisions for Site Recovery Manager and vSphere Replication Deployment
Decision ID | Design Decision | Design Justification | Design Implication
segment. The private network can remain unchanged. You only reassign the external load balancer
interface.
On the public network segment, each management application is accessible under one or more
virtual IP (VIP) addresses.
On the isolated application virtual network segment, the virtual machines of each management
application are isolated.
After a failover, the recovered application is available under a different IPv4 address (VIP). The use
of the new IP address requires changes to the DNS records. You can change the DNS records
manually or by using a script in the Site Recovery Manager recovery plan.
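The following sketch shows one way such a DNS update might be scripted from a recovery plan step. It uses the dnspython package, and the zone, record name, new VIP address, and DNS server address are assumptions for this example; a production script would also authenticate the update (for example, with TSIG) according to your DNS policy.

    # Hypothetical sketch: repoint the A record of a recovered application's VIP.
    # Zone, record name, new address, and DNS server address are assumptions.
    import dns.query
    import dns.update

    update = dns.update.Update("rainpole.local")
    # Replace the existing A record with the VIP used in the recovery region.
    update.replace("vra01svr01", 300, "A", "192.168.11.53")

    # Send the dynamic update to the authoritative DNS server (unauthenticated example).
    response = dns.query.tcp(update, "172.16.11.4", timeout=10)
    print(response.rcode())  # NOERROR indicates the update was accepted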
Figure 65. Logical Network Design for Cross-Region Deployment with Management
Application Network Containers
The IPv4 subnets (orange networks) are routed within the vSphere management network of each region. Nodes on these network segments are reachable from within the SDDC. IPv4 subnets, such as the subnet for the vRealize Automation primary components, overlap across regions. Make sure that only the active IPv4 subnet is propagated in the region and beyond. The public-facing Ext-Mgmt network of both regions (grey networks) is reachable by SDDC users and provides connection to external resources, such as Active Directory or DNS. See Virtualization Network Design.
Load balancing functionality is provided by NSX Edge devices, each fronting a network that contains
the protected components of all management applications. In each region, you use the same
configuration for the management applications and their Site Recovery Manager shadow. Active
Directory and DNS services must be running in both the protected and recovery regions.
Decision ID: SDDC-OPS-DR-005
Design Decision: Set up a dedicated vSphere Replication distributed port group.
Design Justification: Ensures that vSphere Replication traffic does not impact other vSphere management traffic.
Design Implication: You must allocate a dedicated VLAN for vSphere Replication.
Site Recovery Manager adds the placeholder virtual machines as recovery region objects to the Management vCenter Server.
Scripts or commands must be available in the path on the virtual machine according to the following
guidelines:
Use full paths to all executables. For example, use c:\windows\system32\cmd.exe instead of cmd.exe.
Call only .exe or .com files from the scripts. Command-line scripts can call only executables.
The scripts that are run after powering on a virtual machine are executed under the Local Security
Authority of the Site Recovery Manager server. Store post-power-on scripts on the Site Recovery
Manager virtual machine. Do not store such scripts on a remote network share.
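As an illustration of these guidelines, the hypothetical post-power-on script below would be stored locally on the Site Recovery Manager virtual machine and invoked through the full path to the script interpreter executable; the file paths and the cache-flush step are assumptions for this example.

    # Hypothetical post-power-on script stored on the Site Recovery Manager VM,
    # for example c:\scripts\post_poweron.py, and invoked as:
    #   c:\python\python.exe c:\scripts\post_poweron.py
    # Each call goes to an executable through its full path, per the guidelines above.
    import subprocess

    # Flush the local DNS resolver cache so that the recovered application's
    # updated VIP records resolve immediately after failover.
    subprocess.run([r"c:\windows\system32\ipconfig.exe", "/flushdns"], check=True)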
Decision ID | Design Decision | Design Justification | Design Implication