
VMware Validated Design™

Reference Architecture Guide

VMware Validated Design for Software-Defined Data Center 3.0

This document supports the version of each product listed and supports all subsequent versions until
the document is replaced by a new edition. To check for more recent editions of this document, see
http://www.vmware.com/support/pubs.

EN-002234-00

The VMware Web site also provides the latest product updates.
If you have comments about this documentation, submit your feedback to:
[email protected]

© 2016 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright
and intellectual property laws. This product is covered by one or more patents listed at
http://www.vmware.com/download/patents.html.
VMware is a registered trademark or trademark of VMware, Inc. in the United States and/or other
jurisdictions. All other marks and names mentioned herein may be trademarks of their respective
companies.

VMware, Inc.
3401 Hillview Avenue
Palo Alto, CA 94304
www.vmware.com


Contents

1. Purpose and Intended Audience .................................................... 12


2. Architecture Overview .................................................................... 13
2.1 Physical Infrastructure Architecture ............................................................................. 14
2.1.1 Pod Architecture .................................................................................................................... 14
2.1.2 Physical Network Architecture ............................................................................................... 16
2.1.3 Availability Zones and Regions ............................................................................................. 22
2.2 Virtual Infrastructure Architecture ................................................................................ 24
2.2.1 Infrastructure Architecture ..................................................................................................... 24
2.2.2 Virtual Infrastructure Overview .............................................................................................. 24
2.2.3 Network Virtualization Architecture ........................................................................................ 26
2.3 Cloud Management Architecture ................................................................................. 31
2.3.1 Cloud Management Platform Architecture ............................................................................. 31
2.3.2 Logical Architecture of the Cloud Management Platform ...................................................... 33
2.4 Operations Architecture Overview ............................................................................... 37
2.4.1 Backup Architecture .............................................................................................................. 37
2.4.2 Disaster Recovery Architecture ............................................................................................. 38
2.4.3 Logging Architecture ............................................................................................................. 39
2.4.4 Operations Management Architecture ................................................................................... 42

3. Detailed Design.............................................................................. 45
3.1 Physical Infrastructure Design ..................................................................................... 45
3.1.1 Physical Design Fundamentals ............................................................................................. 46
3.1.2 Physical Networking Design .................................................................................................. 52
3.1.3 Physical Storage Design ....................................................................................................... 61
3.2 Virtual Infrastructure Design ........................................................................................ 70
3.2.1 Virtual Infrastructure Design Overview .................................................................................. 70
3.2.2 ESXi Design .......................................................................................................................... 73
3.2.3 vCenter Server Design .......................................................................................................... 75
3.2.4 Virtualization Network Design................................................................................................ 89
3.2.5 NSX Design ......................................................................................................................... 104
3.2.6 Shared Storage Design ....................................................................................................... 126
3.3 Cloud Management Platform Design ......................................................................... 145
3.3.1 vRealize Automation Design ............................................................................................... 145
3.3.2 vRealize Orchestrator Design.............................................................................................. 175
3.4 Operations Infrastructure Design ............................................................................... 185
3.4.1 vRealize Log Insight Design ................................................................................................ 186
3.4.2 vRealize Operations Manager Design ................................................................................. 196
3.4.3 vSphere Data Protection Design ......................................................................................... 206
3.4.4 Site Recovery Manager and vSphere Replication Design ................................................... 213


List of Tables
Table 1. Elements and Components of the Cloud Management Platform ........................................... 32
Table 2. Characteristics of the Cloud Management Platform Architecture ........................................... 33
Table 3. Cloud Management Platform Elements .................................................................................. 34
Table 4. vRealize Operations Manager Logical Node Architecture ...................................................... 44
Table 5. Regions ................................................................................................................................... 46
Table 6. Availability Zones and Regions Design Decisions .................................................................. 47
Table 7. Required Number of Racks ..................................................................................................... 48
Table 8. POD and Racks Design Decisions ......................................................................................... 49
Table 9. ESXi Host Design Decisions ................................................................................................... 51
Table 10. Host Memory Design Decision .............................................................................................. 52
Table 11. Jumbo Frames Design Decisions ......................................................................................... 56
Table 12. VLAN Sample IP Ranges ...................................................................................................... 58
Table 13. Physical Network Design Decisions...................................................................................... 60
Table 14. Additional Network Design Decisions ................................................................................... 61
Table 15. Virtual SAN Physical Storage Design Decision .................................................................... 62
Table 16. Virtual SAN Mode Design Decision ...................................................................................... 62
Table 17. Hybrid and All-Flash Virtual SAN Endurance Classes ......................................................... 64
Table 18. SSD Endurance Class Design Decisions ............................................................................. 64
Table 19. SSD Performance Classes ................................................................................................... 65
Table 20. SSD Performance Class Selection ....................................................................................... 65
Table 21. SSD Performance Class Design Decisions .......................................................................... 66
Table 22. Virtual SAN HDD Environmental Characteristics .................................................................. 66
Table 23. HDD Characteristic Selection ............................................................................................... 67
Table 24. HDD Selection Design Decisions.......................................................................................... 67
Table 25. NFS Usage Design Decisions ............................................................................................... 68
Table 26. NFS Hardware Design Decision ........................................................................................... 69
Table 27. Volume Assignment Design Decisions ................................................................................. 69
Table 28. ESXi Boot Disk Design Decision ........................................................................................... 74
Table 29. ESXi User Access Design Decisions .................................................................................... 75
Table 30. Other ESXi Host Design Decisions ....................................................................................... 75
Table 31. vCenter Server Design Decision ........................................................................................... 76
Table 32. vCenter Server Platform Design Decisions .......................................................................... 77
Table 33. Platform Service Controller Design Decisions ...................................................................... 77
Table 34. Methods for Protecting vCenter Server System and the vCenter Server Appliance ............ 78
Table 35. vCenter Server Systems Protection Design Decisions ......................................................... 79
Table 36. Logical Specification for Management vCenter Server Appliance ........................................ 79
Table 37. Logical Specification for Compute vCenter Server Appliance .............................................. 79


Table 38. vCenter Appliance Sizing Design Decisions ......................................................................... 80


Table 39. vCenter Database Design Decisions .................................................................................... 81
Table 40. vSphere HA Design Decisions .............................................................................................. 82
Table 41. vSphere Cluster Workload Design Decisions ....................................................................... 83
Table 42. Management Cluster Design Decisions ................................................................................ 83
Table 43. Management Cluster Attributes ............................................................................................ 84
Table 44. Edge Cluster Design Decisions ............................................................................................ 85
Table 45. Shared Edge and Compute Cluster Attributes ...................................................................... 86
Table 46. Compute Cluster Design Decisions ...................................................................................... 87
Table 47. Monitor Virtual Machines Design Decisions ......................................................................... 87
Table 48. vSphere Distributed Resource Scheduling Design Decisions .............................................. 88
Table 49. VMware Enhanced vMotion Compatibility Design Decisions ............................................... 88
Table 50. vCenter Server TLS Certificate Design Decisions ................................................................ 88
Table 51. Virtual Switch Design Decisions ........................................................................................... 91
Table 52. Virtual Switch for the Management Cluster .......................................................................... 91
Table 53. vDS-Mgmt Port Group Configuration Settings ...................................................................... 92
Table 54. Management Virtual Switches by Physical/Virtual NIC......................................................... 92
Table 55. Management Virtual Switch Port Groups and VLANs .......................................................... 93
Table 56. Management VMkernel Adapter ........................................................................................... 93
Table 57. Virtual Switch for the shared Edge and Compute Cluster .................................................... 95
Table 58. vDS-Comp01 Port Group Configuration Settings ................................................................. 95
Table 59. Shared Edge and Compute Cluster Virtual Switches by Physical/Virtual NIC ..................... 96
Table 60. Edge Cluster Virtual Switch Port Groups and VLANs........................................................... 96
Table 61. Shared Edge and Compute Cluster VMkernel Adapter ........................................................ 97
Table 62. Virtual Switches for Compute Cluster Hosts ......................................................................... 97
Table 63. vDS-Comp02 Port Group Configuration Settings ................................................................. 98
Table 64. Compute Cluster Virtual Switches by Physical/Virtual NIC .................................................. 98
Table 65. Compute Cluster Virtual Switch Port Groups and VLANs .................................................... 99
Table 66. Compute Cluster VMkernel Adapter ..................................................................................... 99
Table 67. NIC Teaming and Policy ..................................................................................................... 100
Table 68. NIC Teaming Design Decision ............................................................................................ 100
Table 69. Network I/O Control Design Decision ................................................................................. 101
Table 70. VXLAN Design Decisions ................................................................................................... 103
Table 71. NSX for vSphere Design Decision ...................................................................................... 104
Table 72. Consumption Method Design Decisions ............................................................................. 106
Table 73. NSX Controller Design Decision ......................................................................................... 107
Table 74. NSX for vSphere Physical Network Requirements ............................................................. 108
Table 75. Resource Specification of NSX Components ..................................................................... 109
Table 76. NSX Edge Service Gateway Sizing Design Decision ......................................................... 110


Table 77. vSphere Compute Cluster Split Design Decisions .............................................................. 112
Table 78. VTEP Teaming and Failover Configuration Design Decision ............................................. 114
Table 79. Logical Switch Control Plane Mode Decision ..................................................................... 115
Table 80. Transport Zones Design Decisions ..................................................................................... 116
Table 81. Routing Model Design Decision .......................................................................................... 116
Table 82. Transit Network Design Decision ........................................................................................ 118
Table 83. Tenant Firewall Design Decision ........................................................................................ 118
Table 84. Load Balancer Features of NSX Edge Services Gateway ................................................. 119
Table 85. NSX for vSphere Load Balancer Design Decision.............................................................. 120
Table 86.Virtual to Physical Interface Type Design Decision ............................................................. 120
Table 87. Inter-Site Connectivity Design Decisions ............................................................................ 121
Table 88. Isolated Management Applications Design Decisions ........................................................ 122
Table 89. Portable Management Applications Design Decision ......................................................... 123
Table 90. Application Virtual Network Configuration .......................................................................... 126
Table 91. Network Shared Storage Supported by ESXi Hosts ........................................................... 127
Table 92. vSphere Features Supported by Storage Type .................................................................. 127
Table 93. Storage Type Design Decisions .......................................................................................... 129
Table 94. VAAI Design Decisions ....................................................................................................... 130
Table 95. Virtual Machine Storage Policy Design Decisions .............................................................. 131
Table 96. Storage I/O Control Design Decisions ................................................................................ 131
Table 97. Resource Management Capabilities Available for Datastores ........................................... 132
Table 98. Network Speed Selection .................................................................................................... 135
Table 99. Network Bandwidth Design Decision .................................................................................. 135
Table 100. Virtual Switch Types ......................................................................................................... 136
Table 101. Virtual Switch Design Decisions ....................................................................................... 136
Table 102. Jumbo Frames Design Decision ....................................................................................... 137
Table 103. VLAN Design Decision ...................................................................................................... 137
Table 104. Virtual SAN Datastore Design Decisions .......................................................................... 138
Table 105. Number of Hosts per Cluster ............................................................................................ 138
Table 106. Cluster Size Design Decisions .......................................................................................... 139
Table 107. Number of Disk Groups per Host ...................................................................................... 139
Table 108. Disk Groups per Host Design Decision ............................................................................ 140
Table 109. Virtual SAN Policy Options ............................................................................................... 140
Table 110. Object Policy Defaults ....................................................................................................... 142
Table 111. Policy Design Decision ..................................................................................................... 142
Table 112. NFS Version Design Decision ........................................................................................... 143
Table 113. NFS Export Sizing ............................................................................................................. 143
Table 114. NFS Export Design Decisions ........................................................................................... 144
Table 115. NFS Datastore Design Decision ....................................................................................... 144


Table 116. vRealize Automation Region Design Decision ................................................................. 146


Table 117. vRealize Automation Virtual Appliance Design Decisions ................................................ 148
Table 118. vRealize Automation Virtual Appliance Resource Requirements per Virtual Machine..... 149
Table 119. vRealize Automation IaaS Web Server Design Decision ................................................. 149
Table 120. vRealize Automation IaaS Web Server Resource Requirements .................................... 149
Table 121. vRealize Automation IaaS Model Manager and DEM Orchestrator Server Design Decision ........ 150
Table 122. vRealize Automation IaaS Model Manager and DEM Orchestrator Server Resource Requirements per Virtual Machine ........ 150
Table 123. vRealize Automation IaaS DEM Worker Design Decision ................................................ 151
Table 124. vRealize Automation DEM Worker Resource Requirements per Virtual Machine ........... 151
Table 125. vRealize Automation IaaS Agent Server Design Decisions ............................................. 152
Table 126. vRealize Automation IaaS Proxy Agent Resource Requirements per Virtual Machines .. 152
Table 127. Load Balancer Design Decisions ...................................................................................... 152
Table 128. Load Balancer Application Profile Characteristics ............................................................ 153
Table 129. Load Balancer Service Monitoring Characteristics ........................................................... 153
Table 130. Load Balancer Pool Characteristics .................................................................................. 153
Table 131. Virtual Server Characteristics ........................................................................................... 154
Table 132. vRealize Automation SQL Database Design Decisions ................................................... 155
Table 133. vRealize Automation SQL Database Server Resource Requirements per VM ................ 155
Table 134. vRealize Automation PostgreSQL Database Design Decision ......................................... 156
Table 135. Email Design Decision ...................................................................................................... 156
Table 136. vRealize Business for Cloud Standard Design Decision .................................................. 157
Table 137. vRealize Business for Cloud Standard Virtual Appliance Resource Requirements per Virtual Machine ........ 158
Table 138. Tenant Design Decisions .................................................................................................. 160
Table 139. Service Catalog Design Decision ...................................................................................... 160
Table 140. Catalog Items – Common Service Catalog Design Decision ........................................... 161
Table 141. Single Machine Blueprints ................................................................................................ 161
Table 142. Base Windows Server Blueprint ....................................................................................... 162
Table 143. Base Windows Blueprint Sizing ........................................................................................ 162
Table 144. Base Linux Server Requirements and Standards ............................................................. 163
Table 145. Base Linux Blueprint Sizing .............................................................................................. 163
Table 146. Base Windows Server with SQL Server Install Requirements and Standards ................. 164
Table 147. Base Windows with SQL Server Blueprint Sizing ............................................................. 164
Table 148. Tenant Branding Decisions ............................................................................................... 164
Table 149. Terms and Definitions ....................................................................................................... 167
Table 150. Endpoint Design Decisions ............................................................................................... 170
Table 151. Compute Resource Design Decision ................................................................................ 170
Table 152. Fabric Group Design Decisions ........................................................................................ 170


Table 153. Business Group Design Decision ..................................................................................... 171


Table 154. Reservation Design Decisions .......................................................................................... 172
Table 155. Reservation Policy Design Decisions ............................................................................... 173
Table 156. Storage Reservation Policy Design Decisions .................................................................. 173
Table 157. Template Synchronization Design Decision ..................................................................... 173
Table 158. Active Directory Authentication Decision .......................................................................... 174
Table 159. vRealize Automation Appliance Sizing Decision .............................................................. 175
Table 160. Connector Configuration Decision .................................................................................... 175
Table 161. vRealize Orchestrator Hardware Design Decision ........................................................... 176
Table 162. vRealize Orchestrator Directory Service Design Decision ............................................... 176
Table 163. vRealize Orchestrator Default Configuration Ports ........................................................... 177
Table 164. vRealize Orchestrator Default External Communication Ports ......................................... 177
Table 165. vRealize Orchestrator Deployment Decision .................................................................... 179
Table 166. vRealize Orchestrator Platform Design Decision .............................................................. 179
Table 167. vRealize Orchestrator Topology Design Decisions .......................................................... 179
Table 168. vRealize Orchestrator Server Mode Design Decision ...................................................... 180
Table 169. vRealize Orchestrator SDDC Cluster Design Decision .................................................... 181
Table 170. Service Monitors Characteristics ...................................................................................... 181
Table 171. Pool Characteristics .......................................................................................................... 181
Table 172. Virtual Server Characteristics ........................................................................................... 181
Table 173. vRealize Orchestrator Appliance Network Settings and Naming Conventions Design Decision ........ 181
Table 174. vRealize Orchestrator Client Design Decision .................................................................. 182
Table 175. vRealize Orchestrator External Database Design Decision ............................................. 182
Table 176. vRealize Orchestrator SSL Design Decision .................................................................... 183
Table 177. vRealize Orchestrator Database Design Decision ............................................................ 183
Table 178. vRealize Orchestrator vCenter Server Plug-In Design Decisions .................................... 184
Table 179. Cluster Node Configuration Design Decision ................................................................... 187
Table 180. Compute Resources for a vRealize Log Insight Medium-Size Node................................ 188
Table 181. Compute Resources for the vRealize Log Insight Nodes Design Decision ...................... 188
Table 182. vRealize Log Insight Isolated Network Design Decisions ................................................. 189
Table 183. IP Subnets in the Application Isolated Networks .............................................................. 189
Table 184. IP Subnets Design Decision ............................................................................................. 190
Table 185. DNS Names of the vRealize Log Insight Nodes ............................................................... 190
Table 186. DNS Names Design Decision ........................................................................................... 190
Table 187. Virtual Disk Configuration in the vRealize Log Insight Virtual Appliance .......................... 191
Table 188. Retention Period Design Decision .................................................................................... 191
Table 189. Log Archive Policy Design Decision ................................................................................. 192
Table 190. SMTP Alert Notification Design Decision .......................................................................... 192
Table 191. Forwarding Alerts to vRealize Operations Manager Design Decision .............................. 193


Table 192. Custom Role-Based User Management Design Decision ................................................ 193
Table 193. Custom Certificates Design Decision ............................................................................... 193
Table 194. Direct Log Communication to vRealize Log Insight Design Decisions ............................. 194
Table 195. Time Synchronization Design Decision ............................................................................ 194
Table 196. Syslog Protocol Design Decision ...................................................................................... 195
Table 197. Protocol for Event Forwarding across Regions Design Decision ..................................... 195
Table 198. Analytics Cluster Node Configuration Design Decisions .................................................. 197
Table 199. Size of a Medium vRealize Operations Manager Virtual Appliance ................................. 198
Table 200. Analytics Cluster Node Size Design Decisions ................................................................. 198
Table 201. Size of a Standard Remote Collector Virtual Appliance for vRealize Operations Manager ........ 198
Table 202. Compute Resources of the Remote Collector Nodes Design Decisions .......................... 199
Table 203. Analytics Cluster Node Storage Design Decision ............................................................. 199
Table 204. Remote Collector Node Storage Design Decision ............................................................ 200
Table 205. vRealize Operations Manager Isolated Network Design Decision ................................... 202
Table 206. IP Subnets in the Application Virtual Network of vRealize Operations Manager ............. 202
Table 207. IP Subnets Design Decision ............................................................................................. 202
Table 208. DNS Names for the Application Virtual Networks ............................................................. 202
Table 209. Networking Failover and Load Balancing Design Decisions ............................................ 203
Table 210. Identity Source for vRealize Operations Manager Design Decision ................................. 204
Table 211. Using CA-Signed Certificates Design Decision ................................................................ 204
Table 212. Monitoring vRealize Operations Manager Design Decisions ........................................... 205
Table 213. Management Packs for vRealize Operations Manager Design Decisions ....................... 205
Table 214. vSphere Data Protection Design Decision ........................................................................ 206
Table 215. Options for Backup Storage Location ............................................................................... 207
Table 216. VMware Backup Store Target Design Decisions .............................................................. 207
Table 217. vSphere Data Protection Performance ............................................................................. 207
Table 218. VMware vSphere Data Protection Sizing Guide ............................................................... 208
Table 219. VMware Backup Store Size Design Decisions ................................................................. 208
Table 220. Virtual Machine Transport Mode Design Decisions .......................................................... 209
Table 221. Backup Schedule Design Decisions ................................................................................. 210
Table 222. Retention Policies Design Decision .................................................................................. 210
Table 223. Component Backup Jobs Design Decision ....................................................................... 210
Table 224. VM Backup Jobs in Region A ........................................................................................... 211
Table 225. VM Backup Jobs in Region B* .......................................................................................... 212
Table 226. Design Decisions for Site Recovery Manager and vSphere Replication Deployment ..... 215
Table 227. vSphere Replication Design Decisions ............................................................................. 217
Table 228. Recovery Plan Test Network Design Decision ................................................................. 220


List of Figures
Figure 1. Overview of SDDC Architecture ............................................................................................ 13
Figure 2. Pods in the SDDC .................................................................................................................. 15
Figure 3. Leaf-and-Spine Physical Network Design ............................................................................. 16
Figure 4. High-level Physical and Logical Representation of a Leaf Node ........................................... 17
Figure 5. Pod Network Design .............................................................................................................. 19
Figure 6. Oversubscription in the Leaf Layer ........................................................................................ 20
Figure 7. Compensation for a Link Failure ............................................................................................ 21
Figure 8. Quality of Service (Differentiated Services) Trust Point ........................................................ 22
Figure 9. Availability Zones and Regions .............................................................................................. 23
Figure 10. Virtual Infrastructure Layer in the SDDC ............................................................................. 24
Figure 11. SDDC Logical Design .......................................................................................................... 25
Figure 12. NSX for vSphere Architecture .............................................................................................. 27
Figure 13. NSX for vSphere Universal Distributed Logical Router ....................................................... 30
Figure 14. Cloud Management Platform Conceptual Architecture ....................................................... 32
Figure 15. vRealize Automation Logical Architecture for Region A ...................................................... 36
Figure 16. vRealize Automation Logical Architecture for Region B ...................................................... 37
Figure 17. Dual-Region Data Protection Architecture ........................................................................... 38
Figure 18. Disaster Recovery Architecture ........................................................................................... 39
Figure 19. Cluster Architecture of vRealize Log Insight ........................................................................ 41
Figure 20. vRealize Operations Manager Architecture ......................................................................... 42
Figure 21. Physical Infrastructure Design ............................................................................................. 45
Figure 22. Physical Layer within the SDDC .......................................................................................... 46
Figure 23. SDDC Pod Architecture ....................................................................................................... 47
Figure 24. Leaf-and-Spine Architecture ................................................................................................ 52
Figure 25. Example of a Small-Scale Leaf-and-Spine Architecture ..................................................... 54
Figure 26. Leaf-and-Spine and Network Virtualization ......................................................................... 55
Figure 27. Leaf Switch to Server Connection within Compute Racks .................................................. 56
Figure 28. Leaf Switch to Server Connection within Management/Shared Compute and Edge Rack . 57
Figure 29. Sample VLANs and Subnets within a Pod .......................................................................... 58
Figure 30. Oversubscription in the Leaf Switches................................................................................. 59
Figure 31. Virtual Infrastructure Layer Business Continuity in the SDDC ............................................ 70
Figure 32. SDDC Logical Design .......................................................................................................... 71
Figure 33. vSphere Data Protection Logical Design ............................................................................. 72
Figure 34. Disaster Recovery Logical Design ....................................................................................... 73
Figure 35. vCenter Server and Platform Services Controller Deployment Model ................................ 78
Figure 36. vSphere Logical Cluster Layout ........................................................................................... 81
Figure 37. Network Switch Design for Management Hosts .................................................................. 92


Figure 38. Network Switch Design for shared Edge and Compute Hosts ............................................ 96
Figure 39. Network Switch Design for Compute Hosts ......................................................................... 98
Figure 40. Architecture of NSX for vSphere........................................................................................ 105
Figure 41. Conceptual Tenant Overview ............................................................................................ 111
Figure 42. Cluster Design for NSX for vSphere .................................................................................. 113
Figure 43. Logical Switch Control Plane in Hybrid Mode .................................................................... 115
Figure 44. Virtual Application Network Components and Design ....................................................... 124
Figure 45. Detailed Example for vRealize Automation Networking .................................................... 125
Figure 46. Logical Storage Design ...................................................................................................... 128
Figure 47. Conceptual Virtual SAN Design ......................................................................................... 134
Figure 48. Virtual SAN Conceptual Network Diagram ........................................................................ 135
Figure 49. NFS Storage Exports ......................................................................................................... 144
Figure 50. Cloud Management Platform Design ................................................................................. 145
Figure 51. vRealize Automation Design Overview for Region A ........................................................ 147
Figure 52. vRealize Automation Design Overview for Region B ........................................................ 148
Figure 53. Rainpole Cloud Automation Tenant Design for Two Regions ........................................... 159
Figure 54. vRealize Automation Logical Design ................................................................................. 166
Figure 55. vRealize Automation Integration with vSphere Endpoint .................................................. 169
Figure 56. Template Synchronization ................................................................................................. 174
Figure 57. VMware Identity Manager proxies authentication between Active Directory and vRealize Automation ........ 174
Figure 58. Operations Infrastructure Conceptual Design ................................................................... 186
Figure 59. Logical Design of vRealize Log Insight .............................................................................. 186
Figure 60. Networking Design for the vRealize Log Insight Deployment ........................................... 189
Figure 61. Logical Design of vRealize Operations Manager Multi-Region Deployment ..................... 196
Figure 62. Networking Design of the vRealize Operations Manager Deployment ............................. 201
Figure 63. vSphere Data Protection Logical Design ........................................................................... 206
Figure 64. Disaster Recovery Logical Design ..................................................................................... 214
Figure 65. Logical Network Design for Cross-Region Deployment with Management Application Network Containers ........ 216


1 Purpose and Intended Audience


The VMware Validated Design Reference Architecture Guide contains a validated model of the
Software-Defined Data Center (SDDC) and provides a detailed design of each management component
of the SDDC stack.
The Architecture Overview discusses the building blocks and the main principles of each SDDC
management layer. The Detailed Design describes the available design options according to the design
objective, and provides a set of design decisions that justify the path selected for building each SDDC
component.

Note The VMware Validated Design Reference Architecture Guide is validated with specific product
versions. See the VMware Validated Design Release Notes for more information about supported
product versions.

The VMware Validated Design Reference Architecture Guide is intended for cloud architects,
infrastructure administrators, and cloud administrators who are familiar with VMware software and
want to use it to quickly deploy and manage an SDDC that meets the requirements for capacity,
scalability, backup and restore, and extensibility for disaster recovery support.


2 Architecture Overview
The VMware Validated Design™ for Software-Defined Data Center requires a system that enables an
IT organization to automate the provisioning of common, repeatable requests and to respond to
business needs with greater agility and predictability. Traditionally this capability has been referred to
as Infrastructure as a Service (IaaS); however, the software-defined data center (SDDC) extends the
typical IaaS solution into a broader and more complete IT solution.
The VMware Validated Design architecture is based on a number of layers and modules, which allows
interchangeable components to be part of the end solution or outcome, such as the SDDC. If a
particular component design does not fit the business or technical requirements for whatever reason,
it can be swapped out for another, similar component. The VMware Validated Designs are one way of
assembling an architecture. They are rigorously tested to ensure stability, scalability, and compatibility.
Ultimately, the system is designed to ensure that the desired IT outcome is achieved.

Figure 1. Overview of SDDC Architecture

Physical Layer
The lowest layer of the solution is the Physical Layer, sometimes referred to as the core, which
consists of three main components: compute, network, and storage. The compute component contains
the x86-based servers that run the management, edge, and tenant compute workloads. The design
offers some guidance about the capabilities required to run this architecture, but gives no
recommendations about the type or brand of hardware. All components must be listed in the
VMware Hardware Compatibility Guide.
Virtual Infrastructure Layer
The Virtual Infrastructure Layer sits on top of the Physical Layer. Within the Virtual Infrastructure
Layer, access to the underlying physical infrastructure is controlled and allocated to the management
and tenant workloads. The Virtual Infrastructure Layer consists primarily of the hypervisors on the
physical hosts and the control of these hypervisors. The management workloads consist of elements in
the virtual management layer itself, along with elements in the Cloud Management Layer and in the
Service Management, Business Continuity, and Security areas.
Cloud Management Layer
The Cloud Management Layer is the "top" layer of the stack, where service consumption occurs.
Typically through a UI or API, this layer requests resources and then orchestrates the actions of the
lower layers to fulfill those requests. While the SDDC can stand on its own without any other
ancillary services, other supporting components are needed for a complete SDDC experience. The
Service Management, Business Continuity, and Security areas complete the architecture by providing
this support.
Service Management
When building any type of IT infrastructure, portfolio and operations management play key roles in
continued day-to-day service delivery. The Service Management area of this architecture focuses
mainly on operations management, in particular monitoring, alerting, and log management. Portfolio
management is not a focus of this SDDC design but may be added in future releases.
Business Continuity
To be enterprise ready, a system must contain elements that support business continuity in the areas
of backup, restore, and disaster recovery. This area ensures that when data loss occurs, the right
elements are in place to prevent permanent loss to the business. The design provides comprehensive
guidance on backup and restore operations, as well as runbooks for failing components over in the
event of a disaster.
Security
All systems must be secure by design to reduce risk and increase compliance while still providing a
governance structure. The Security area outlines what is needed to ensure that the entire SDDC is
resilient to both internal and external threats.

2.1 Physical Infrastructure Architecture


2.1.1 Pod Architecture
The VMware Validated Design for SDDC uses a small set of common building blocks called pods.
Pods can include different combinations of servers, storage equipment, and network equipment, and
can be set up with varying levels of hardware redundancy and varying quality of components. Pods
are connected to a network core that distributes data between them. The pod is not defined by any
hard physical properties, as it is a standard unit of connected elements within the SDDC network
fabric.

2.1.1.1 Pod
A pod is a logical boundary of functionality for the SDDC platform. While each pod usually spans one
rack, it is possible to aggregate multiple pods into a single rack in smaller setups. For both small and
large setups, homogeneity and easy replication are important.
Different pods of the same type can provide different characteristics for varying requirements. For
example, one compute pod could use full hardware redundancy for each component (power supply
through memory chips) for increased availability. At the same time, another compute pod in the same
setup could use low-cost hardware without any hardware redundancy. With these variations, the
architecture can cater to the different workload requirements in the SDDC.
One of the guiding principles for such deployments is that VLANs are not spanned beyond a single
pod by the network virtualization layer. Although this VLAN restriction appears to be a simple
requirement, it has widespread impact on how a physical switching infrastructure can be built and on
how it scales.

2.1.1.2 Pod Types


The SDDC differentiates between the following types of pods:
 Management pod
 Shared Edge and Compute pod
 Compute pod
 Storage pod


Figure 2. Pods in the SDDC

2.1.1.3 Management Pod


The management pod runs the virtual machines that manage the SDDC. These virtual machines host
vCenter Server, NSX Manager, NSX Controller, vRealize Operations Manager, vRealize Log
Insight, vRealize Automation, and other shared management components. Different types of
management pods can support different SLAs. Because the management pod hosts critical
infrastructure, you should consider implementing a basic level of hardware redundancy for this pod.
Management pod components must not have tenant-specific addressing.

2.1.1.4 Shared Edge and Compute Pod


The shared edge and compute pod runs the required NSX services to enable north-south routing
between the SDDC and the external network, and east-west routing inside the SDDC. This shared
pod also hosts the SDDC tenant virtual machines (sometimes referred to as workloads or payloads).
As the SDDC grows, additional compute-only pods can be added to support a mix of different types of
workloads for different types of Service Level Agreements (SLAs).

2.1.1.5 Compute Pod


Compute pods host the SDDC tenant virtual machines (sometimes referred to as workloads or
payloads). An SDDC can mix different types of compute pods and provide separate compute pools for
different types of SLAs.

2.1.1.6 Storage Pod


Storage pods provide network-accessible storage using NFS or iSCSI. Different types of storage pods
can provide different levels of SLA, ranging from JBODs (just a bunch of disks) with IDE drives and
minimal to no redundancy, to fully redundant enterprise-class storage arrays. For bandwidth-intensive
IP-based storage, the bandwidth of these pods can scale dynamically.
This design does not consider Fibre Channel or Fibre Channel over Ethernet (FCoE) based storage
technology. Instead, this design focuses on technologies that can use the central Layer 3 network
fabric for primary connectivity.

2.1.1.7 Pod to Rack Mapping


Pods are not mapped one-to-one to 19" data center racks. While a pod is an atomic unit of a
repeatable building block, a rack is merely a unit of size. Because pods can have different sizes, how
pods are mapped to 19" data center racks depends on the use case.


 One Pod in One Rack. One pod can occupy exactly one rack. This is typically the case for
compute pods.
 Multiple Pods in One Rack. Two or more pods can occupy a single rack, for example, one
management pod and one shared edge and compute pod can be deployed to a single rack.
 Single Pod Across Multiple Racks. A single pod can stretch across multiple adjacent racks. For
example, a storage pod with filer heads and disk shelves can span more than one rack.

2.1.2 Physical Network Architecture


The physical network architecture is tightly coupled with the pod-and-core architecture, and uses a
Layer 3 leaf-and-spine network instead of the more traditional 3-tier data center design.

2.1.2.1 Leaf-and-Spine Network Architecture


The design uses leaf switches and spine switches.
 A leaf switch is typically located inside a rack and provides network access to the servers inside
that rack; it is also referred to as a Top of Rack (ToR) switch.
 A spine switch is in the spine layer and provides connectivity between racks. Links between spine
switches are typically not required. If a link failure occurs between a spine switch and a leaf
switch, the routing protocol ensures that no traffic for the affected rack is sent to the spine switch
that has lost connectivity to that rack.

Figure 3. Leaf-and-Spine Physical Network Design

Ports that face the servers inside a rack should have a minimal configuration, as shown in the following
high-level physical and logical representation of a leaf node.

Note Each leaf node has identical VLAN configuration with unique /24 subnets assigned to each
VLAN.


Figure 4. High-level Physical and Logical Representation of a Leaf Node

2.1.2.2 Network Transport


You can implement the physical layer switch fabric for an SDDC by offering Layer 2 transport services
or Layer 3 transport services to all components. For a scalable and vendor-neutral data center
network, use Layer 3 transport.

2.1.2.3 Benefits and Drawbacks for Layer 2 Transport


In a design that uses Layer 2 transport, leaf switches and spine switches form a switched fabric,
effectively acting like one large switch. Using modern data center switching fabric products such as
Cisco FabricPath, you can build highly scalable Layer 2 multipath networks without the Spanning Tree
Protocol (STP). Such networks are particularly suitable for large virtualization deployments, private
clouds, and high-performance computing (HPC) environments.
Using a Layer 2 transport has benefits and drawbacks.
 The benefit of this approach is more design freedom. You can span VLANs, which is useful for
vSphere vMotion or vSphere Fault Tolerance (FT).
 The drawback is that the size of such a deployment is limited because the fabric elements have to
share a limited number of VLANs. In addition, you have to rely on a specialized data center
switching fabric product from a single vendor because these products are not designed for
interoperability between vendors.

2.1.2.4 Benefits and Drawbacks for Layer 3 Transport


A design with Layer 3 transport requires these considerations.
 Layer 2 connectivity is limited to within the data center rack, up to the leaf switch.
 The leaf switch terminates each VLAN and provides default gateway functionality, that is, it has a
switch virtual interface (SVI) for each VLAN.
 Uplinks from the leaf switch to the spine layer are routed point-to-point links. VLAN trunking on
the uplinks is not allowed.


 A dynamic routing protocol—for example OSPF, IS-IS, or iBGP—connects the leaf switches and
spine switches. Each leaf switch in the rack advertises a small set of prefixes, typically one per
VLAN or subnet. In turn, the leaf switch calculates equal-cost paths to the prefixes it received from
other leaf switches.
Using Layer 3 routing has benefits and drawbacks.
 The benefit is that you can choose from a wide array of Layer 3 capable switch products for the
physical switching fabric. You can mix switches from different vendors because of the general
interoperability between implementations of OSPF, IS-IS, or iBGP. This approach is usually more
cost effective because it uses only basic functionality of the physical switches.
 The drawback is that the design imposes some restrictions because VLANs are limited to a single rack.
This affects vSphere vMotion, vSphere Fault Tolerance, and storage networks.

2.1.2.5 Infrastructure Network Architecture


One of the key goals of network virtualization is to provide a virtual-to-physical network abstraction.
For this, the physical fabric must provide a robust IP transport with the following characteristics.
 Simplicity
 Scalability
 High bandwidth
 Fault-tolerant transport
 Support for different levels of quality of service (QoS)
Simplicity
Configuration of the switches inside a data center must be simple. General or global configuration
such as AAA, SNMP, syslog, NTP, and others should be replicated line by line, independent of the
position of the switches. A central management capability to configure all switches at once is an
alternative.
Scalability
Scalability factors include but are not limited to the following.
 Number of racks supported in a fabric.
 Amount of bandwidth between any two racks in a data center.
 Number of paths that a leaf switch can select from when communicating with another rack.
The total number of ports available across all spine switches and the oversubscription that is
acceptable determine the number of racks supported in a fabric. Different racks might host different
types of infrastructure, which results in different bandwidth requirements.
 Racks with storage systems might attract or source more traffic than other racks.
 Compute racks, such as racks hosting hypervisors with workloads or virtual machines, might have
different bandwidth requirements than shared edge and compute racks, which provide
connectivity to the outside world.
Link speed and the number of links vary to satisfy different bandwidth demands. You can vary them
for each rack without sacrificing other aspects of the leaf-and-spine architecture.


Figure 5. Pod Network Design

The number of links to the spine switches dictates how many paths are available for traffic from this
rack to another rack. Because the number of hops between any two racks is consistent, the architecture
can utilize equal-cost multipathing (ECMP). Assuming traffic sourced by the servers carries a TCP or
UDP header, traffic spray can occur on a per-flow basis.
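The per-flow behavior can be illustrated with a short sketch. The following Python example is a simplified model rather than any vendor's actual hash implementation: it hashes the 5-tuple of a flow and uses the result to select one of the equal-cost uplinks, so all packets of a flow follow the same path while different flows spread across the available links. The addresses and port numbers are arbitrary example values.

import hashlib

def pick_uplink(src_ip, dst_ip, protocol, src_port, dst_port, uplink_count):
    """Simplified model of per-flow ECMP: hash the 5-tuple, select one uplink.

    Real leaf switches use hardware-specific hash functions; this sketch only
    demonstrates that every packet of a given flow maps to the same uplink.
    """
    flow = f"{src_ip}-{dst_ip}-{protocol}-{src_port}-{dst_port}".encode()
    digest = hashlib.sha256(flow).digest()
    return int.from_bytes(digest[:4], "big") % uplink_count

# Two flows between the same pair of hosts may take different uplinks,
# but each individual flow always hashes to the same uplink.
print(pick_uplink("172.16.11.10", "172.16.21.10", "tcp", 49152, 443, 4))
print(pick_uplink("172.16.11.10", "172.16.21.10", "tcp", 49153, 443, 4))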
High Bandwidth
In leaf-and-spine topologies, oversubscription typically occurs at the leaf switch.
Oversubscription is equal to the total amount of bandwidth available to all servers connected to a leaf
switch divided by the aggregate amount of uplink bandwidth.
oversubscription = total bandwidth / aggregate uplink bandwidth
For example, 20 servers with one 10 Gigabit Ethernet (10 GbE) port each create up to 200 Gbps of
bandwidth. In an environment with eight 10 GbE uplinks to the spine—a total of 80 Gbps—a 2.5:1
oversubscription results, as shown in the Oversubscription in the Leaf Layer illustration.
2.5 (oversubscription) = 200 (total) / 80 (total uplink)
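The same calculation can be expressed as a small helper function. This Python sketch simply restates the formula above; the server count, port speed, and uplink values are the example numbers from this section.

def oversubscription_ratio(server_count, server_port_gbps, uplink_count, uplink_gbps):
    """Return the leaf-layer oversubscription ratio (server bandwidth / uplink bandwidth)."""
    total_server_bandwidth = server_count * server_port_gbps
    total_uplink_bandwidth = uplink_count * uplink_gbps
    return total_server_bandwidth / total_uplink_bandwidth

# 20 servers with one 10 GbE port each, eight 10 GbE uplinks to the spine.
ratio = oversubscription_ratio(server_count=20, server_port_gbps=10,
                               uplink_count=8, uplink_gbps=10)
print(f"{ratio}:1 oversubscription")  # prints "2.5:1 oversubscription"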


Figure 6. Oversubscription in the Leaf Layer

You can make more or less bandwidth available to a rack by provisioning more or fewer uplinks. That
means you can change the available bandwidth on a per-rack basis.

Note The number of uplinks from a leaf switch to each spine switch must be the same to avoid
hotspots.

For example, if a leaf switch has two uplinks to spine switch A and only one uplink to spine switches
B, C and D, more traffic is sent to the leaf switch via spine switch A, which might create a hotspot.
Fault Tolerance
The larger the environment, the more switches make up the overall fabric and the greater the
possibility for one component of the data center switching fabric to fail. A resilient fabric can sustain
individual link or box failures without widespread impact.


Figure 7. Compensation for a Link Failure

For example, if one of the spine switches fails, traffic between racks continues to be routed across the
remaining spine switches in a Layer 3 fabric. The routing protocol ensures that only available paths
are chosen. Installing more than two spine switches reduces the impact of a spine switch failure.
Multipathing-capable fabrics handle box or link failures, reducing the need for manual network
maintenance and operations. If a software upgrade of a fabric switch becomes necessary, the
administrator can take the node out of service gracefully by changing routing protocol metrics, which
will quickly drain network traffic from that switch, freeing the switch for maintenance.
Depending on the width of the spine, that is, how many switches are in the aggregation or spine layer,
the additional load that the remaining switches must carry is not as significant as if there were only
two switches in the aggregation layer. For example, in an environment with four spine switches, a
failure of a single spine switch only reduces the available capacity by 25%.
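The effect of a spine switch failure on the capacity available to a leaf switch can be checked with similar arithmetic. The following Python sketch assumes an equal number of uplinks to every spine switch, as required by the earlier note on hotspots; the specific counts are example values only.

def remaining_capacity_gbps(spine_count, uplinks_per_spine, uplink_gbps, failed_spines=0):
    """Inter-rack bandwidth left for a leaf switch after zero or more spine failures."""
    healthy_spines = spine_count - failed_spines
    return healthy_spines * uplinks_per_spine * uplink_gbps

# Four spines with two 10 GbE uplinks to each: 80 Gbps total, 60 Gbps (75%) after one spine fails.
total = remaining_capacity_gbps(4, 2, 10)
after_failure = remaining_capacity_gbps(4, 2, 10, failed_spines=1)
print(total, after_failure, f"{after_failure / total:.0%}")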
Quality of Service Differentiation
Virtualized environments must carry different types of traffic, including tenant, storage and
management traffic, across the switching infrastructure. Each traffic type has different characteristics
and makes different demands on the physical switching infrastructure.
 Management traffic, although typically low in volume, is critical for controlling physical and virtual
network state.
 IP storage traffic is typically high in volume and generally stays within a data center.
For virtualized environments, the hypervisor sets the QoS values for the different traffic types. The
physical switching infrastructure has to trust the values set by the hypervisor. No reclassification is
necessary at the server-facing port of a leaf switch. If there is a congestion point in the physical
switching infrastructure, the QoS values determine how the physical network sequences, prioritizes,
or potentially drops traffic.


Figure 8. Quality of Service (Differentiated Services) Trust Point

Two types of QoS configuration are supported in the physical switching infrastructure:
 Layer 2 QoS, also called class of service.
 Layer 3 QoS, also called DSCP marking.
A vSphere Distributed Switch supports both class of service and DSCP marking. Users can mark the
traffic based on the traffic type or packet classification. When the virtual machines are connected to
the VXLAN-based logical switches or networks, the QoS values from the internal packet headers are
copied to the VXLAN-encapsulated header. This enables the external physical network to prioritize
the traffic based on the tags on the external header.
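The copying of QoS values into the outer VXLAN header can be illustrated with a packet-construction sketch. The example below uses the scapy package and hypothetical virtual machine and VTEP addresses; it is not how the hypervisor performs encapsulation, it only shows the inner DSCP value being carried in the outer IP header, which is the only header the physical switches inspect.

from scapy.all import Ether, IP, UDP, VXLAN  # assumes scapy 2.4 or later is installed

# Inner frame as sent by a virtual machine; DSCP is the upper six bits of the IP ToS byte.
inner = Ether() / IP(src="10.1.1.10", dst="10.1.2.10", tos=0x68) / UDP(sport=2049, dport=2049)

# VXLAN encapsulation between two hypothetical VTEPs. The inner ToS/DSCP value is copied
# to the outer IP header so that the physical fabric can prioritize the encapsulated traffic.
outer = (Ether()
         / IP(src="192.168.130.11", dst="192.168.130.12", tos=inner[IP].tos)
         / UDP(sport=49321, dport=4789)   # 4789 is the IANA-assigned VXLAN port
         / VXLAN(vni=5001)
         / inner)

print("inner DSCP:", inner[IP].tos >> 2, "outer DSCP:", outer[IP].tos >> 2)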

2.1.2.6 Server Interfaces (NICs)


If the server has more than one server interface (NIC) of the same speed, use two as uplinks with
VLANs trunked to the interfaces.
The vSphere Distributed Switch supports many different NIC Teaming options. Load-based NIC
teaming supports optimal use of available bandwidth and supports redundancy in case of a link
failure. Use two 10 GbE connections per server along with a pair of leaf switches. 802.1Q trunks are
used for carrying a small number of VLANs; for example, management, storage, VXLAN, vSphere
Replication, and VMware vSphere vMotion traffic.

2.1.3 Availability Zones and Regions


In an SDDC, availability zones are collections of infrastructure components. Regions support disaster
recovery solutions and allow you to place workloads closer to your customers. Typically, multiple
availability zones form a single region.


This VMware Validated Design uses two regions, but uses only one availability zone in each region.
The following diagram shows how the design could be expanded to include multiple availability
zones.
Figure 9. Availability Zones and Regions

2.1.3.1 Availability Zones


Each availability zone is isolated from other availability zones to stop the propagation of failure or
outage across zone boundaries. Together, multiple availability zones provide continuous availability
through redundancy, helping to avoid outages and improve SLAs. An outage that is caused by
external factors (power, cooling, physical integrity) affects only one zone; those factors most likely
do not lead to an outage in other zones, except in the case of major disasters.
Each availability zone runs on its own physically distinct, independent infrastructure, and is
engineered to be highly reliable. Each zone should have independent power, cooling, network, and
security. Common points of failure within a physical data center, like generators and cooling
equipment, should not be shared across availability zones. Additionally, these zones should be
physically separate so that even uncommon disasters affect only a single availability zone.
Availability zones are usually either two distinct data centers within metro distance (latency in the
single-digit millisecond range) or two safety/fire sectors (also known as data halls) within the same
large-scale data center. Multiple availability zones (usually two) belong to a single region. The physical
distance between availability zones can be up to approximately 50 kilometers (30 miles), which offers
low single-digit millisecond latency and large bandwidth, via dark fiber, between the zones. This
architecture allows the SDDC equipment across the availability zones to operate in an active/active
manner as a single virtual data center or region.
You can operate workloads across multiple availability zones within the same region as if they were
part of a single virtual data center. This supports an architecture with very high availability that is
suitable for mission critical applications. When the distance between two locations of equipment
becomes too large, these locations can no longer function as two availability zones within the same
region and need to be treated as separate regions.

2.1.3.2 Regions
Multiple regions support placing workloads closer to your customers, for example, by operating one
region on the US east coast and one region on the US west coast, or operating a region in Europe
and another region in the US. Regions are helpful in many ways.
 Regions can support disaster recovery solutions: One region can be the primary site and another
region can be the recovery site.
 You can use multiple regions to address data privacy laws and restrictions in certain countries by
keeping tenant data within a region in the same country.
The distance between regions can be rather large. This design uses two regions: one region is
assumed to be in San Francisco (SFO), and the other in Los Angeles (LAX).


2.2 Virtual Infrastructure Architecture


2.2.1 Infrastructure Architecture
The virtual infrastructure is the foundation of an operational SDDC. Within the virtual infrastructure
layer, access to the physical underlying infrastructure is controlled and allocated to the management
and tenant workloads. The virtual infrastructure layer consists primarily of the physical hosts'
hypervisors and the control of these hypervisors. The management workloads consist of elements in
the virtual management layer itself, along with elements in the cloud management layer and in the
service management, business continuity, and security areas.
Figure 10. Virtual Infrastructure Layer in the SDDC

2.2.2 Virtual Infrastructure Overview


The SDDC virtual infrastructure consists of two regions. Each region includes a management pod and
a shared edge and compute pod.


Figure 11. SDDC Logical Design

2.2.2.1 Management Pod


Management pods run the virtual machines that manage the SDDC. These virtual machines host
vCenter Server, NSX Manager, NSX Controller, vRealize Operations, vRealize Log Insight, vRealize
Automation, Site Recovery Manager and other shared management components. All management,
monitoring, and infrastructure services are provisioned to a vCenter Server High Availability cluster
which provides high availability for these critical services. Permissions on the management cluster
limit access to only administrators. This limitation protects the virtual machines that are running the
management, monitoring, and infrastructure services.

2.2.2.2 Shared Edge and Compute Pod


The shared edge and compute pod runs the required NSX services to enable north-south routing
between the SDDC and the external network and east-west routing inside the SDDC. This pod also
hosts the SDDC tenant virtual machines (sometimes referred to as workloads or payloads). As the
SDDC grows, additional compute-only pods can be added to support a mix of different types of
workloads for different types of SLAs.


2.2.3 Network Virtualization Architecture


2.2.3.1 NSX for vSphere Components
VMware NSX for vSphere, the network virtualization platform, is a key solution in the SDDC
architecture. The NSX for vSphere platform consists of several components that are relevant to the
network virtualization design.
NSX for vSphere Platform
NSX for vSphere creates a network virtualization layer. All virtual networks are created on top of this
layer, which is an abstraction between the physical and virtual networks. Several components are
required to create this network virtualization layer.
 vCenter Server
 NSX Manager
 NSX Controller
 NSX Virtual Switch
 NSX for vSphere API
These components are separated into different planes to create communications boundaries and
provide isolation of workload data from system control messages.
 Data plane. Workload data is contained wholly within the data plane. NSX logical switches
segregate unrelated workload data. The data is carried over designated transport networks in the
physical network. The NSX Virtual Switch, distributed routing, and the distributed firewall are also
implemented in the data plane.
 Control plane. Network virtualization control messages are located in the control plane. Control
plane communication should be carried on secure physical networks (VLANs) that are isolated
from the transport networks that are used for the data plane. Control messages are used to set up
networking attributes on NSX Virtual Switch instances, as well as to configure and manage
distributed routing and distributed firewall components on each ESXi host.
 Management plane. The network virtualization orchestration happens in the management plane.
In this layer, cloud management platforms such as VMware vRealize® Automation™ can request,
consume, and destroy networking resources for virtual workloads. Communication is directed
from the cloud management platform to vCenter Server to create and manage virtual machines,
and to NSX Manager to consume networking resources.
The different planes are connected through the APIs, which include REST, VMware vSphere, and
VMware VIX APIs. The API used depends on the component being controlled.
NSX Manager
NSX Manager provides the centralized management plane for the NSX for vSphere architecture and
has a one-to-one mapping with vCenter Server for workloads. NSX Manager performs the following
functions.
 Provides a single point of configuration and the REST API entry points in a vSphere environment
configured for NSX for vSphere (see the example request after this list).
 Responsible for deploying NSX Controller clusters, NSX Edge distributed routers, and NSX Edge
services gateways (as appliances in OVF format), guest introspection services, and so on.
 Responsible for preparing ESXi hosts for NSX for vSphere by installing VXLAN, distributed
routing, and firewall kernel modules, as well as the User World Agent (UWA).
 Communicates with NSX Controller clusters through REST and with the ESXi hosts through the
VMware vFabric® RabbitMQ message bus. This internal message bus is specific to NSX for
vSphere and does not require setup of additional services.
 Generates certificates for the NSX Controller nodes and ESXi hosts to secure control plane
communications with mutual authentication.
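As an illustration of the REST API entry point mentioned in the list above, the following Python sketch queries an NSX Manager instance for its NSX Controller inventory. The hostname and credentials are placeholders, the endpoint is taken from the NSX for vSphere 6.2 API as an example, and certificate verification is disabled only to keep the sketch short; treat the snippet as illustrative rather than as recommended practice.

import requests

NSX_MANAGER = "https://nsxmgr-01.sfo01.rainpole.local"  # placeholder hostname
AUTH = ("admin", "nsx_manager_password")                # placeholder credentials

# Retrieve the list of NSX Controller nodes known to this NSX Manager (XML response).
response = requests.get(f"{NSX_MANAGER}/api/2.0/vdn/controller",
                        auth=AUTH, verify=False, timeout=30)
response.raise_for_status()
print(response.text)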


VMware NSX 6.2 allows you to link multiple vCenter Server and VMware NSX deployments, and to manage
them from a single NSX Manager instance that is designated as primary. Such a linked environment includes
an NSX Manager primary instance and one or more secondary instances.
 The primary NSX Manager instance is linked to the primary vCenter Server instance and allows
the creation and management of universal logical switches, universal (distributed) logical routers
and universal firewall rules.
 Each secondary NSX Manager instance can manage networking services that are local to itself.
Up to seven secondary NSX Manager instances can be associated with the primary NSX
Manager in a linked environment. You can configure network services on all NSX Manager
instances from one central location.

Note A linked environment still requires one vCenter Server instance for each NSX Manager
instance.

To manage all NSX Manager instances from the primary NSX Manager in a Cross-vCenter VMware
NSX deployment, the vCenter Server instances must be connected with Platform Services Controller
nodes in Enhanced Linked Mode.
Figure 12. NSX for vSphere Architecture


Control Plane
VMware NSX supports three different replication modes to provide multi-destination communication.
 Multicast Mode. When multicast replication mode is selected for a logical switch, VMware NSX
relies on the Layer 2 and Layer 3 multicast capability of the physical network to ensure VXLAN
encapsulated multi-destination traffic is sent to all the VXLAN tunnel end points (VTEPs). The
control plane uses multicast IP addresses on the physical network in this mode.
 Unicast Mode. In unicast mode, the control plane is managed by the NSX Controller instances
and all replication is done locally on the host. No multicast IP addresses or physical network
configurations are required. This mode is well suited for smaller deployments.
 Hybrid Mode. Hybrid mode is an optimized version of unicast mode, where local traffic replication
for the subnet is offloaded to the physical network. This mode requires IGMP snooping on the first
hop switch, and an IGMP querier must be available. Protocol-independent multicast (PIM) is not
required.
NSX Controller
The NSX Controller cluster is the control plane component, and is responsible for managing the
switching and routing modules in the hypervisors. An NSX Controller node performs the following
functions.
 Provides the control plane to distribute VXLAN and logical routing information to ESXi hosts.
 Clusters nodes for scale-out and high availability.
 Slices network information across nodes in a cluster for redundancy purposes.
 Eliminates the need for multicast support from the physical network infrastructure.
 Provides ARP-suppression of broadcast traffic in VXLAN networks.
NSX for vSphere control plane communication occurs over the management network. Network
information from the ESXi hosts and the distributed logical router control VMs is reported to NSX
Controller instances through the UWA. The NSX Controller command line supports retrieval of
VXLAN and logical routing network state information.
Data Plane
The NSX data plane consists of the NSX vSwitch, which is based on the vSphere Distributed Switch
(VDS) and includes additional components. These components include kernel modules, which run
within the ESXi kernel and provide services such as the distributed logical router (DLR) and
distributed firewall (DFW). The NSX kernel modules also enable Virtual Extensible LAN (VXLAN)
capabilities.
The NSX vSwitch abstracts the physical network and provides access-level switching in the
hypervisor. It is central to network virtualization because it enables logical networks that
are independent of physical constructs such as a VLAN. The NSX vSwitch provides multiple benefits.
 Three types of overlay networking capabilities:
o Creation of a flexible logical Layer 2 overlay over existing IP networks on existing
physical infrastructure, without the need to rearchitect the data center networks.
o Support for east/west and north/south communication while maintaining isolation
between tenants.
o Support for application workloads and virtual machines that operate as if they were connected
to a physical Layer 2 network.
 Support for VXLAN and centralized network configuration.
 A comprehensive toolkit for traffic management, monitoring and troubleshooting within a virtual
network which includes Port Mirroring, NetFlow/IPFIX, configuration backup and restore, network
health check, and Quality of Service (QoS).


In addition to the NSX vSwitch, the data plane also includes gateway devices, which can provide
Layer 2 bridging from the logical networking space (VXLAN) to the physical network (VLAN). The
gateway device is typically an NSX Edge Gateway device. NSX Edge Gateway devices offer Layer 2,
Layer 3, perimeter firewall, load-balancing, Virtual Private Network (VPN), and Dynamic Host
Configuration Protocol (DHCP) services.
Consumption Plane
In the consumption plane, different users of NSX for vSphere can access and manage services in
different ways:
 NSX administrators can manage the NSX environment from the vSphere Web Client.
 End-users can consume the network virtualization capabilities of NSX for vSphere through the
Cloud Management Platform (CMP) when deploying applications.

2.2.3.2 Network Virtualization Services


Network virtualization services include logical switches, logical routers, logical firewalls, and other
components of NSX for vSphere.
Logical Switches
NSX for vSphere logical switches create logically abstracted segments to which tenant virtual
machines can connect. A single logical switch is mapped to a unique VXLAN segment ID and is
distributed across the ESXi hypervisors within a transport zone. This allows line-rate switching in the
hypervisor without creating constraints of VLAN sprawl or spanning tree issues.
Universal Distributed Logical Router
The NSX for vSphere Universal Distributed Logical Router is optimized for forwarding in the
virtualized space (between VMs, on VXLAN- or VLAN-backed port groups). Features include:
 High performance, low overhead first hop routing.
 Scaling the number of hosts.
 Support for up to 1,000 logical interfaces (LIFs) on each distributed logical router.
The Universal Distributed Logical Router is installed in the kernel of every ESXi host; as such, it
requires a VM to provide the control plane. The universal distributed logical router Control VM is the
control plane component of the routing process, providing communication between NSX Manager and
NSX Controller cluster through the User World Agent. NSX Manager sends logical interface
information to the Control VM and NSX Controller cluster, and the Control VM sends routing updates
to the NSX Controller cluster.


Figure 13. NSX for vSphere Universal Distributed Logical Router

Designated Instance
The designated instance is responsible for resolving ARP on a VLAN LIF. There is one designated
instance per VLAN LIF. The selection of an ESXi host as a designated instance is performed
automatically by the NSX Controller cluster and that information is pushed to all other hosts. Any ARP
requests sent by the distributed logical router on the same subnet are handled by the same host. In
case of host failure, the controller selects a new host as the designated instance and makes that
information available to other hosts.
User World Agent
User World Agent (UWA) is a TCP and SSL client that enables communication between the ESXi
hosts and NSX Controller nodes, and the retrieval of information from NSX Manager through
interaction with the message bus agent.
Edge Service Gateways
While the Universal Distributed Logical Router provides VM-to-VM (east-west) routing, the NSX Edge
services gateway provides north-south connectivity by peering with upstream Top of Rack switches,
thereby enabling tenants to access public networks.
Logical Firewall
NSX for vSphere Logical Firewall provides security mechanisms for dynamic virtual data centers.
 The Distributed Firewall allows you to segment virtual data center entities like virtual machines.
Segmentation can be based on VM names and attributes, user identity, vCenter objects like data
centers, and hosts, or can be based on traditional networking attributes like IP addresses, port
groups, and so on.
 The Edge Firewall component helps you meet key perimeter security requirements, such as
building DMZs based on IP/VLAN constructs, tenant-to-tenant isolation in multi-tenant virtual data
centers, Network Address Translation (NAT), partner (extranet) VPNs, and user-based SSL
VPNs.


The Flow Monitoring feature displays network activity between virtual machines at the application
protocol level. You can use this information to audit network traffic, define and refine firewall policies,
and identify threats to your network.
Logical Virtual Private Networks (VPNs)
SSL VPN-Plus allows remote users to access private corporate applications. IPSec VPN offers site-
to-site connectivity between an NSX Edge instance and remote sites. L2 VPN allows you to extend
your datacenter by allowing virtual machines to retain network connectivity across geographical
boundaries.
Logical Load Balancer
The NSX Edge load balancer enables network traffic to follow multiple paths to a specific destination.
It distributes incoming service requests evenly among multiple servers in such a way that the load
distribution is transparent to users. Load balancing thus helps in achieving optimal resource
utilization, maximizing throughput, minimizing response time, and avoiding overload. NSX Edge
provides load balancing up to Layer 7.
Service Composer
Service Composer helps you provision and assign network and security services to applications in a
virtual infrastructure. You map these services to a security group, and the services are applied to the
virtual machines in the security group.
Data Security provides visibility into sensitive data that are stored within your organization's virtualized
and cloud environments. Based on the violations that are reported by the NSX for vSphere Data
Security component, NSX security or enterprise administrators can ensure that sensitive data is
adequately protected and assess compliance with regulations around the world.
NSX for vSphere Extensibility
VMware partners integrate their solutions with the NSX for vSphere platform to enable an integrated
experience across the entire SDDC. Data center operators can provision complex, multi-tier virtual
networks in seconds, independent of the underlying network topology or components.

2.3 Cloud Management Architecture


2.3.1 Cloud Management Platform Architecture
The Cloud Management Platform (CMP) is the primary consumption portal for the entire Software-
Defined Data Center (SDDC). Within the SDDC, users use vRealize Automation to author, administer,
and consume VM templates and blueprints.


Figure 14. Cloud Management Platform Conceptual Architecture

The Cloud Management Platform consists of the following design element and components.
Table 1. Elements and Components of the Cloud Management Platform

Design Element: Users
Design Components:
 Cloud administrators. Tenant, group, fabric, infrastructure, service, and other administrators as defined by business policies and organizational structure.
 Cloud (or tenant) users. Users within an organization that can provision virtual machines and directly perform operations on them at the level of the operating system.

Design Element: Tools and supporting infrastructure
Design Components: Building blocks that provide the foundation of the cloud.
 VM templates and blueprints. VM templates are used to author the blueprints that tenants (end users) use to provision their cloud workloads.

Design Element: Provisioning infrastructure
Design Components: On-premises and off-premises resources which together form a hybrid cloud.
 Internal Virtual Resources. Supported hypervisors and associated management tools.
 External Cloud Resources. Supported cloud providers and associated APIs.

Design Element: Cloud management portal
Design Components: A portal that provides self-service capabilities for users to administer, provision, and manage workloads.
 vRealize Automation portal, Admin access. The default root tenant portal URL used to set up and administer tenants and global configuration options.
 vRealize Automation portal, Tenant access. Refers to a subtenant and is accessed using a URL with an appended tenant identifier.

Note A tenant portal might refer to the default tenant portal in some configurations. In this case, the URLs match, and the user interface is contextually controlled by the role-based access control permissions that are assigned to the tenant.

2.3.2 Logical Architecture of the Cloud Management Platform


The Cloud Management Platform layer delivers the following multi-platform and multi-vendor cloud
services.
 Comprehensive and purpose-built capabilities to provide standardized resources to global
customers in a short time span.
 Multi-platform and multi-vendor delivery methods that integrate with existing enterprise
management systems.
 Central user-centric and business-aware governance for all physical, virtual, private, and public
cloud services.
 Design that meets the customer and business needs and is extensible.
This design considers the following characteristics.
Table 2. Characteristics of the Cloud Management Platform Architecture

Characteristic: Availability
Description: Indicates the effect a choice has on technology and related infrastructure to provide highly available operations and sustain operations during system failures. VMware vSphere High Availability will provide the required host redundancy and tolerance of hardware failures where appropriate.

Characteristic: Manageability
Description: Relates to the effect a choice has on overall infrastructure manageability. Key metrics: Accessibility and the lifecycle of the infrastructure being managed.

Characteristic: Performance
Description: Reflects whether the option has a positive or negative impact on overall infrastructure performance. This architecture follows the VMware reference architecture sizing guidelines to provide certain performance characteristics. Key metrics: Performance analysis and tuning of the database, Manager service, Model Manager, portal Web site, and data collection.

Characteristic: Scalability
Description: Depicts the effect the option has on the ability of the solution to be augmented to achieve better sustained performance within the infrastructure. Key metrics: Web site latency, network traffic, and CPU usage on the database and web servers.

Characteristic: Security
Description: Reflects whether the option has a positive or negative impact on overall infrastructure security. Key metrics: Data confidentiality, integrity, authenticity, and non-repudiation of cloud automation components and the option's integration with supporting and provisioning infrastructures.

2.3.2.1 Cloud Management Layer Elements


The Cloud Management Platform elements include software and physical components that provide
portal-based functionality and a service catalog, Infrastructure as a Service (IaaS) components to model
and provision virtualized workloads, and an orchestration engine.
Table 3. Cloud Management Platform Elements

Design Element: vRealize Automation virtual appliance
Design Component:
 vRealize Automation Portal Web/Application Server
 vRealize Automation PostgreSQL Database
 vRealize Automation service catalog
 VMware Identity Manager

Design Element: vRealize Automation IaaS components
Design Component:
 vRealize Automation IaaS Web Server
 vRealize Automation IaaS Manager Services

Design Element: Distributed execution components
Design Component:
 vRealize Automation Distributed Execution Managers:
o Orchestrator
o Workers

Design Element: Integration components
Design Component:
 vRealize Automation Agent machines

Design Element: vRealize Orchestrator components
Design Component:
 vRealize Orchestrator virtual appliances

Design Element: Provisioning infrastructure
Design Component:
 vSphere environment
 Other supported physical, virtual, or cloud environments

Design Element: Costing components
Design Component:
 vRealize Business for Cloud Standard server
 vRealize Business for Cloud Standard data collector

Design Element: Supporting infrastructure
Design Component:
 Microsoft SQL database environment
 Active Directory environment
 SMTP
 NTP

2.3.2.2 Cloud Management Platform Logical Architecture


In this architecture, vRealize Automation and vRealize Orchestrator run on a VXLAN-backed network
that is fronted by the NSX Distributed Logical Router. An NSX Edge services gateway, acting as a
load balancer, is deployed to provide load balancing services for the CMP components.


Figure 15. vRealize Automation Logical Architecture for Region A


Figure 16. vRealize Automation Logical Architecture for Region B

2.4 Operations Architecture Overview


2.4.1 Backup Architecture
You can use a backup solution, such as vSphere Data Protection, to protect the data of your SDDC
management components on the management and edge clusters, and of the tenant workloads that
run on the compute clusters.

2.4.1.1 Architecture
Data protection solutions provide the following functions in the SDDC:
 Backup and restore virtual machines.
 Store data according to company retention policies.


 Inform administrators about backup and restore activities through reports.


vSphere Data Protection instances in the two regions provide data protection for the products that
implement the management capabilities of the SDDC. vSphere Data Protection stores backups of the
management product virtual appliances on a shared storage allocation according to a defined
schedule.
Figure 17. Dual-Region Data Protection Architecture

2.4.2 Disaster Recovery Architecture


You use VMware Site Recovery Manager to implement disaster recovery for the workloads of the
management products in the SDDC.

2.4.2.1 Elements of Disaster Recovery


Disaster recovery that is based on VMware Site Recovery Manager has the following main elements:
 Dual-region configuration. All protected virtual machines are located in Region A, which is
considered the protected region, and are recovered in Region B, which is considered the
recovery region.
In a typical Site Recovery Manager installation, the protected region provides business-critical
data center services. The recovery region is an alternative infrastructure to which Site Recovery
Manager can migrate these services.


 Replication of virtual machine data.


o Array-based replication. When you use array-based replication, one or more storage arrays
at the protected region replicate data to peer arrays at the recovery region. To use array-
based replication with Site Recovery Manager, you must configure replication first before you
can configure Site Recovery Manager to use it.
o Replication by using vSphere Replication. You deploy the vSphere Replication appliance
and configure vSphere Replication on virtual machines independently of Site Recovery
Manager. vSphere Replication does not require storage arrays. The replication source and
target storage can be any storage device, including, but not limited to, storage arrays.

You can configure vSphere Replication to regularly create and retain snapshots of protected
virtual machines on the recovery region.
 Protection groups. A protection group is a collection of virtual machines that Site Recovery
Manager protects together. You configure virtual machines and create protection groups
differently depending on whether you use array-based replication or vSphere Replication. You
cannot create protection groups that combine virtual machines for which you configured array-
based replication with virtual machines for which you configured vSphere Replication.
 Recovery plans. A recovery plan specifies how Site Recovery Manager recovers the virtual
machines in the protection groups that it contains. You can include a combination of array-based
replication protection groups and vSphere Replication protection groups in the same recovery
plan.

2.4.2.2 Disaster Recovery Configuration


The VMware Validated Design implements the following disaster recovery configuration:
 The following management applications are a subject of disaster recovery protection:
o vRealize Automation together with vRealize Orchestrator and vRealize Business
o Analytics cluster of vRealize Operations Manager
 The virtual infrastructure components that are not in the scope of the disaster recovery protection,
such as vRealize Log Insight, are available as separate instances in each region.
Figure 18. Disaster Recovery Architecture

2.4.3 Logging Architecture


vRealize Log Insight provides real-time log management and log analysis with machine learning-
based intelligent grouping, high-performance searching, and troubleshooting across physical, virtual,
and cloud environments.


2.4.3.1 Overview
vRealize Log Insight collects data from ESXi hosts using the syslog protocol. It connects to vCenter
Server to collect events, tasks, and alarms data, and integrates with vRealize Operations Manager to
send notification events and enable launch in context. It also functions as a collection and analysis
point for any system capable of sending syslog data. In addition to syslog data, an ingestion agent can
be installed on Linux or Windows servers to collect logs. This agent approach is especially useful for
custom logs and for operating systems that do not natively support the syslog protocol, such as Windows.
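Any component that can emit syslog can therefore be pointed at vRealize Log Insight. The following Python sketch uses the standard library syslog handler to send an application log message; the cluster FQDN is a placeholder and UDP port 514 is assumed as the syslog port.

import logging
import logging.handlers

# Send application logs to the vRealize Log Insight integrated load balancer VIP.
# The hostname is a placeholder; 514/UDP is the conventional syslog port.
handler = logging.handlers.SysLogHandler(address=("vrli-cluster-01.sfo01.rainpole.local", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))

logger = logging.getLogger("sddc.sample-app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Test event forwarded to vRealize Log Insight over syslog")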

2.4.3.2 Installation Models


You can deploy vRealize Log Insight as a virtual appliance in one of the following configurations:
 Standalone node
 Highly available cluster of one master and at least two worker nodes using an internal load
balancer (ILB)
You can size the compute and storage resources of the vRealize Log Insight instances for scale-up.

2.4.3.3 Cluster Nodes


For high availability and scalability, you can deploy several vRealize Log Insight instances in a
cluster where they can have either of the following roles:
 Master Node. Required initial node in the cluster. The master node is responsible for queries and
log ingestion. The Web user interface of the master node serves as the single pane of glass for
the cluster. All queries against data are directed to the master, which in turn queries the workers
as appropriate.
 Worker Node. Enables scale-out in larger environments. A worker node is responsible for
ingestion of logs. A worker node stores logs locally. If a worker node is down, the logs on that
worker become unavailable.
You need at least two worker nodes to form a cluster with the master node.
 Integrated Load Balancer (ILB). Provides high availability (HA). The ILB runs on one of the
cluster nodes. If the node that hosts the ILB Virtual IP (VIP) address stops responding, the VIP
address is failed over to another node in the cluster.

2.4.3.4 Architecture of a Cluster


The architecture of vRealize Log Insight enables several channels for HA collection of log messages.


Figure 19. Cluster Architecture of vRealize Log Insight

vRealize Log Insight clients connect to the ILB VIP address and use the Web user interface and
ingestion (via syslog or the Ingestion API) to send logs to vRealize Log Insight.
By default, the vRealize Log Insight solution collects data from vCenter Server systems and ESXi
hosts. For forwarding logs from NSX for vSphere and vRealize Automation, use content packs, which
contain extensions or provide integration with other systems in the SDDC.

2.4.3.5 Integration with vRealize Operations Manager


The integration with vRealize Operations Manager provides a single pane of glass for monitoring the
SDDC. vRealize Log Insight sends notification events to vRealize Operations Manager. You can also
launch vRealize Log Insight from the vRealize Operations Manager Web user interface.

2.4.3.6 Archiving
vRealize Log Insight supports data archiving on NFS shared storage that each vRealize Log Insight
node can access.

2.4.3.7 Backup
You back up each vRealize Log Insight cluster locally by using traditional virtual machine backup
solutions, such as a vSphere Storage APIs for Data Protection (VADP) compatible backup software
like vSphere Data Protection.

2.4.3.8 Multi-Region vRealize Log Insight Deployment


The scope of the SDDC design covers multiple regions. Using vRealize Log Insight in a multi-region
design can provide a syslog infrastructure in all regions of the SDDC. Using vRealize Log
Insight across multiple regions requires deploying a cluster in each region. vRealize Log Insight
supports event forwarding to other vRealize Log Insight deployments across regions in the SDDC.
Implementing failover by using vSphere Replication or disaster recovery by using Site Recovery
Manager is not necessary. The event forwarding feature adds tags to log messages that identify the
source region, and event filtering prevents looping messages between the regions.


2.4.4 Operations Management Architecture


vRealize Operations Manager tracks and analyzes the operation of multiple data sources within the
Software-Defined Data Center (SDDC) by using specialized analytics algorithms. These algorithms
help vRealize Operations Manager learn and predict the behavior of every object it monitors.
Users access this information by using views, reports, and dashboards.

2.4.4.1 Installation Models


vRealize Operations Manager is available in two different deployment models: a preconfigured virtual
appliance, or a Windows or Linux installable package. Select the installation method according to the
following considerations:
 When you use the vRealize Operations Manager virtual appliance, you deploy the OVF file of the
virtual appliance once for each cluster node. You access the product to set up cluster nodes
according to their role, and log in to configure the installation.
Use virtual appliance deployment to easily create vRealize Operations Manager nodes with pre-
defined identical size.
 When you use the Windows or Linux installable package, you run the vRealize Operations
Manager installation on each cluster node. You access the product to set up cluster nodes
according to their role, and log in to configure the installation.
Use installable package deployment to create vRealize Operations Manager nodes with a custom
identical size.

2.4.4.2 Architecture
vRealize Operations Manager contains functional elements that collaborate for data analysis and
storage, and support creating clusters of nodes with different roles.
Figure 20. vRealize Operations Manager Architecture


2.4.4.3 Types of Nodes and Clusters


For high availability and scalability, you can deploy several vRealize Operations Manager instances in
a cluster where they can have either of the following roles:
 Master Node. Required initial node in the cluster. In large-scale environments the master node
manages all other nodes. In small-scale environments, the master node is the single standalone
vRealize Operations Manager node.
 Master Replica Node. (Optional) Enables high availability of the master node.
 Data Node. Enables scale-out of vRealize Operations Manager in larger environments. Data
nodes have adapters installed to perform collection and analysis. Data nodes also host vRealize
Operations Manager management packs.
Larger deployments usually include adapters only on data nodes, not on the master node or
replica node.
 Remote Collector Node. In distributed deployments, enables navigation through firewalls,
interfaces with a remote data source, reduces bandwidth across regions, or reduces the load on
the vRealize Operations Manager analytics cluster. Remote collector nodes only gather objects
for the inventory and forward collected data to the data nodes. Remote collector nodes do not
store data or perform analysis. In addition, you can install them on a different operating system
than the rest of the cluster nodes.
The master and master replica nodes are data nodes with extended capabilities.
vRealize Operations Manager can form two types of clusters according to the nodes that participate in
a cluster:
 Analytics clusters. Tracks, analyzes, and predicts the operation of monitored systems. Consists
of a master node, data nodes, and optionally of a master replica node.
 Remote collectors cluster. Only collects diagnostics data without storage or analysis. Consists
only of remote collector nodes.

2.4.4.4 Application Functional Components


The functional components of a vRealize Operations Manager instance interact to provide analysis of
diagnostics data from the data center and visualize the result in the Web user interface.


Table 4. vRealize Operations Manager Logical Node Architecture

Architecture Component Diagram Description

 Admin / Product UI server. The UI server is a


Web application that serves as both user and
administration interface.
 REST API / Collector. The Collector collects
data from all components in the data center.
 Controller. The Controller handles the data flow
between the UI server, Collector, and the
analytics engine.
 Analytics. The Analytics engine creates all
associations and correlations between various
data sets, handles all super metric calculations,
performs all capacity planning functions, and is
responsible for triggering alerts.
 Persistence. The persistence layer handles the
read and write operations on the underlying
databases across all nodes.
 FSDB. The File System Database (FSDB)
stores collected metrics in raw format. FSDB is
available in all the nodes.
 xDB (HIS). The xDB stores data from the
Historical Inventory Service (HIS). This
component is available only on the master and
master replica nodes.
 Global xDB. The Global xDB stores user
preferences, alerts, and alarms, and
customization that is related to the vRealize
Operations Manager. This component is
available only on the master and master replica
nodes.

2.4.4.5 Management Packs


Management packs contain extensions and third-party integration software. They add dashboards,
alerts definitions, policies, reports, and other content to the inventory of vRealize Operations
Manager. You can learn more details about and download management packs from the VMware
Solutions Exchange.

2.4.4.6 Multi-Region vRealize Operations Manager Deployment


The scope of the SDDC design covers multiple regions. Using vRealize Operations Manager across
multiple regions requires deploying an analytics cluster that is protected by Site Recovery Manager,
and deploying remote collectors in each region.


3 Detailed Design
This Software-Defined Data Center (SDDC) detailed design consists of the following main sections:
 Physical Infrastructure Design
 Virtual Infrastructure Design
 Cloud Management Platform Design
 Operations Infrastructure Design

Each section is divided into subsections that include detailed discussion, diagrams, and design
decisions with justifications.
The Physical Infrastructure Design section focuses on the three main pillars of any data center:
compute, storage, and network. In this section you find information about availability zones and
regions. The section also provides details on the rack and pod configuration, and on physical hosts
and the associated storage and network configurations.
In the Virtual Infrastructure Design section, you find details on the core virtualization software
configuration. This section has information on the ESXi hypervisor, vCenter Server, the virtual
network design including VMware NSX, and on software-defined storage for VMware Virtual SAN.
This section also includes details on business continuity (backup and restore) and on disaster
recovery.
The Cloud Management Platform Design section contains information on the consumption and
orchestration layer of the SDDC stack, which uses vRealize Automation and vRealize Orchestrator. IT
organizations can use the fully distributed and scalable architecture to streamline their provisioning
and decommissioning operations.
The Operations Infrastructure Design section explains how to architect, install, and configure vRealize
Operations Manager and vRealize Log Insight. You learn how to ensure that service management
within the SDDC is comprehensive. This section ties directly into the Operational Guidance section.

3.1 Physical Infrastructure Design


The physical infrastructure design includes details on decisions covering availability zones and
regions and pod layout within data center racks. This section then goes on to detail decisions related
to server, networking, and storage hardware.
Figure 21. Physical Infrastructure Design


3.1.1 Physical Design Fundamentals


The SDDC physical layer design is based on a pod architecture. The physical data center elements
include racks, servers, network elements, and storage arrays.
Figure 22. Physical Layer within the SDDC

3.1.1.1 Availability Zones and Regions


Availability zones and regions are used for different purposes.
 Availability zones. An availability zone is the fault domain of the SDDC. Multiple availability
zones can provide continuous availability of an SDDC, minimize unavailability of services and
improve SLAs.
 Regions. Regions provide disaster recovery across different SDDC instances. This design uses
two regions. Each region is a separate SDDC instance. The regions have a similar physical layer
design and virtual infrastructure design but different naming. For information on exceptions to this
design, see the Business Continuity / Disaster Recovery Design chapter.

Note This design leverages a single availability zone for a one region deployment, and a single
availability zone in each region in the case of a two region deployment.

The design uses the following regions. The region identifier uses United Nations Code for Trade and
Transport Locations (UN/LOCODE) along with a numeric instance ID.
Table 5. Regions

Region: A
Region Identifier: SFO01
Region-specific Domain Name: sfo01.rainpole.local
Region Description: San Francisco, CA, USA based data center

Region: B
Region Identifier: LAX01
Region-specific Domain Name: lax01.rainpole.local
Region Description: Los Angeles, CA, USA based data center


Table 6. Availability Zones and Regions Design Decisions

Decision ID: SDDC-PHY-001
Design Decision: Per region, a single availability zone that can support all SDDC management components is deployed.
Design Justification: A single availability zone can support all SDDC management and compute components for a region. You can later add another availability zone to extend and scale the management and compute capabilities of the SDDC.
Design Implication: Results in limited redundancy of the overall solution. The single availability zone can become a single point of failure and prevent high-availability design solutions.

Decision ID: SDDC-PHY-002
Design Decision: Use two regions.
Design Justification: Supports the technical requirement of multi-region failover capability as outlined in the design objectives.
Design Implication: Having multiple regions will require an increased solution footprint and associated costs.

3.1.1.2 Pods and Racks


The SDDC functionality is split across multiple pods. Each pod can occupy one rack or multiple racks.
The total number of pods for each pod type depends on scalability needs.
Figure 23. SDDC Pod Architecture


Table 7. Required Number of Racks

Pod (Function): Management pod and shared edge and compute pod
Required Number of Racks (for full scale deployment): 1
Minimum Number of Racks: 1
Comment: Two half-racks are sufficient for the management pod and the shared edge and compute pod. As the number and resource usage of compute VMs increase, additional hosts must be added to the cluster, so reserve extra space in the rack for growth.

Pod (Function): Compute pods
Required Number of Racks (for full scale deployment): 6
Minimum Number of Racks: 0
Comment: With 6 compute racks, 6 compute pods with 19 ESXi hosts each can achieve the target size of 6000 average-sized VMs. If an average-sized VM has two vCPUs with 4 GB of RAM, 6000 VMs with 20% overhead for bursting workloads require 114 hosts. The quantity and performance varies based on the workloads running within the compute pods.

Pod (Function): Storage pods
Required Number of Racks (for full scale deployment): 6
Minimum Number of Racks: 0 (if using Virtual SAN for Compute Pods)
Comment: Storage that is not Virtual SAN storage is hosted on isolated storage pods.

Pod (Function): Total
Required Number of Racks (for full scale deployment): 13
Minimum Number of Racks: 1


Table 8. POD and Racks Design Decisions

Decision ID: SDDC-PHY-003
Design Decision: A single compute pod is bound to a physical rack.
Design Justification: Scaling out of the SDDC infrastructure is simplified through a 1:1 relationship between a compute pod and the compute resources contained within a physical rack.
Design Implication: Dual power supplies and power feeds are required to ensure availability of hardware components.

Decision ID: SDDC-PHY-004
Design Decision: The management pod and the shared edge and compute pod occupy the same rack.
Design Justification: The number of required compute resources for the management pod (4 ESXi servers) and the shared edge and compute pod (4 ESXi servers) is low and does not justify a dedicated rack for each pod. On-ramp and off-ramp connectivity to physical networks (that is, north-south L3 routing on NSX Edge virtual appliances) can be supplied to both the management and compute pods via this management/edge rack. Edge resources require external connectivity to physical network devices. Placing edge resources for management and compute in the same rack minimizes VLAN spread.
Design Implication: The design must include sufficient power and cooling to operate the server equipment. This depends on the selected vendor and products. If the equipment in this entire rack fails, a second region is needed to mitigate downtime associated with such an event.

Decision ID: SDDC-PHY-005
Design Decision: Storage pods can occupy one or more racks.
Design Justification: To simplify the scale out of the SDDC infrastructure, the storage pod to rack(s) relationship has been standardized. It is possible that the storage system arrives from the manufacturer in a dedicated rack or set of racks; a storage system of this type is accommodated for in the design.
Design Implication: The design must include sufficient power and cooling to operate the storage equipment. This depends on the selected vendor and products.

Decision ID: SDDC-PHY-006
Design Decision: Each rack has two separate power feeds.
Design Justification: Redundant power feeds increase availability by ensuring that failure of a power feed does not bring down all equipment in a rack. Combined with redundant network connections into a rack and within a rack, redundant power feeds prevent failure of equipment in an entire rack.
Design Implication: All equipment used must support two separate power feeds. The equipment must keep running if one power feed fails. If the equipment of an entire rack fails, the cause, such as flooding or an earthquake, also affects neighboring racks. A second region is needed to mitigate downtime associated with such an event.

Decision ID: SDDC-PHY-007
Design Decision: Mount the compute resources (minimum of 4 ESXi servers) for the management pod together in a rack.
Design Justification: Mounting the compute resources for the management pod together can ease physical data center design, deployment and troubleshooting. Using a VM to host ratio of more than 100:1 can lead to availability issues; host numbers within this pod should be scaled accordingly.
Design Implication: None.

Decision ID: SDDC-PHY-008
Design Decision: Mount the compute resources for the shared edge and compute pod (minimum of 4 ESXi servers) together in a rack.
Design Justification: Mounting the compute resources for the shared edge and compute pod together can ease physical data center design, deployment and troubleshooting. Using a VM to host ratio of more than 100:1 can lead to availability issues; host numbers within this pod should be scaled accordingly.
Design Implication: None.

3.1.1.3 ESXi Host Physical Design Specifications


The physical design specifications of the ESXi host list the characteristics that were used during
deployment and testing of the VMware Validated Design.
The configuration and assembly process for each system is standardized, with all components
installed in the same manner on each host. Standardizing the entire physical configuration of the ESXi
hosts is critical to providing an easily manageable and supportable infrastructure because
standardization eliminates variability. Consistent PCI card slot location, especially for network
controllers, is essential for accurate alignment of physical to virtual I/O resources. Deploy ESXi hosts
with identical configuration, including identical storage, and networking configurations, across all
cluster members. Identical configurations ensure an even balance of virtual machine storage
components across storage and compute resources.
Select all ESXi host hardware, including CPUs following the VMware Compatibility Guide.
The sizing of the physical servers for the ESXi hosts for the management and edge pods requires special
consideration because it is based on the VMware document "VMware Virtual SAN Ready Nodes", as
these pod types use VMware Virtual SAN.
 An average-sized VM has two vCPUs and 4 GB of RAM; the design targets 6000 such VMs with 20%
headroom for bursting workloads (a sizing sketch follows this list).
 A standard 2U server can host 60 average-sized VMs on a single ESXi host.
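
The following minimal Python sketch (illustrative only; the VM profile and VMs-per-host figures are the
assumptions listed above) shows one way to turn these assumptions into a rough host count. The
validated design arrives at 114 hosts from the rack layout (6 compute racks of 19 ESXi hosts each)
rather than from this arithmetic alone.

import math

# Rough compute sizing sketch based on the assumptions listed above.
# Adjust the inputs to match your own workload profile; this is not a
# substitute for proper capacity planning.
def required_hosts(vm_count, vms_per_host, burst_overhead):
    """Return an approximate number of ESXi hosts for vm_count average-sized VMs."""
    effective_vms = vm_count * (1 + burst_overhead)  # add headroom for bursting workloads
    return math.ceil(effective_vms / vms_per_host)

print(required_hosts(vm_count=6000, vms_per_host=60, burst_overhead=0.20))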
Table 9. ESXi Host Design Decisions

Decision ID: SDDC-PHY-009
Design Decision: Use Virtual SAN Ready Nodes.
Design Justification: Using a Virtual SAN Ready Node ensures seamless compatibility with Virtual SAN during the deployment.
Design Implication: Might limit hardware choices.

Decision ID: SDDC-PHY-010
Design Decision: All nodes must have uniform configurations across a given cluster.
Design Justification: A balanced cluster delivers more predictable performance even during hardware failures. In addition, performance impact during resync/rebuild is minimal when the cluster is balanced.
Design Implication: Vendor sourcing, budgeting and procurement considerations for uniform server nodes will be applied on a per cluster basis.

3.1.1.4 ESXi Host Memory

Note See the VMware Virtual SAN 6.0 Design and Sizing Guide for more information about disk
groups, including design and sizing guidance. The number of disk groups and disks that an
ESXi host manages determines memory requirements. 32 GB of RAM is required to support
the maximum number of disk groups.

Table 10. Host Memory Design Decision

Decision ID: SDDC-PHY-011
Design Decision: Set up each ESXi host in the management and edge pods to have a minimum of 128 GB RAM.
Design Justification: The VMs in the management and edge pods require a total of 375 GB RAM.
Design Implication: None.

3.1.1.5 Host Boot Device Background Considerations


The minimum boot disk size for ESXi on SCSI-based devices (SAS/SATA/SAN) is greater than 5 GB.
ESXi can be deployed using stateful local disks, SAN SCSI boot devices, or vSphere Auto Deploy.
What is supported depends on the version of Virtual SAN that you are using:
 Virtual SAN does not support stateless vSphere Auto Deploy
 Virtual SAN 5.5 supports USB/SD embedded devices for ESXi boot device (4 GB or greater).
 Since Virtual SAN 6.0, there is an option to use SATADOM as a supported boot device.
Refer to the VMware Virtual SAN 6.0 Design and Sizing Guide to choose the option that best fits your
hardware.

3.1.2 Physical Networking Design


The physical network uses a leaf-and-spine design, shown in the following illustration. For additional
information, see Physical Network Architecture.
Figure 24. Leaf-and-Spine Architecture


3.1.2.1 Leaf-and-Spine and Network Virtualization Architecture


As virtualization, cloud computing, and distributed cloud become more pervasive in the data center, a
shift in the traditional three-tier networking model is taking place. This shift addresses simplicity and
scalability.
Simplicity
The traditional core-aggregate-access model is efficient for north/south traffic that travels in and out of
the data center. This model is usually built for redundancy and resiliency against failure. However, the
Spanning Tree Protocol (STP) typically blocks 50 percent of the critical network links to prevent
network loops, which means 50 percent of the maximum bandwidth is wasted until something fails.
A core-aggregate-access architecture is still widely used for service-oriented traffic that travels
north/south. However, the trends in traffic patterns are changing with the types of workloads. In
today’s data centers east/west or server-to-server traffic is common. If the servers in a cluster are
performing a resource-intensive calculation in parallel, unpredictable latency or lack of bandwidth are
undesirable. Powerful servers that perform these calculations can attempt to communicate with each
other, but if they cannot communicate efficiently because of a bottleneck in the network architecture,
wasted capital expenditure results.
One way to solve the problem is to create a leaf-and-spine architecture, also known as a distributed
core.
A leaf-and-spine architecture has two main components: spine switches and leaf switches.
 Spine switches can be thought of as the core, but instead of being a large, chassis-based
switching platform, the spine consists of many high-throughput Layer 3 switches with high port
density.
 Leaf switches can be treated as the access layer. Leaf switches provide network connection
points for servers and uplink to the spine switches.
Every leaf switch connects to every spine switch in the fabric. No matter which leaf switch a server is
connected to, it always has to cross the same number of devices to get to another server (unless the
other server is located on the same leaf). This design keeps the latency down to a predictable level
because a payload has to hop only to a spine switch and another leaf switch to get to its destination.


Figure 25. Example of a Small-Scale Leaf-and-Spine Architecture

Instead of relying on one or two large chassis-based switches at the core, the load is distributed
across all spine switches, making each individual spine insignificant as the environment scales out.
Scalability
Several factors, including the following, affect scale.
 Number of racks that are supported in a fabric
 Amount of bandwidth between any two racks in a data center
 Number of paths a leaf switch can select from when communicating with another rack
The total number of available ports dictates the number of racks supported in a fabric across all spine
switches and the acceptable level of oversubscription.
Different racks might be hosting different types of infrastructure. For example, a rack might host filers
or other storage systems, which might attract or source more traffic than other racks in a data center.
In addition, traffic levels of compute racks (that is, racks that are hosting hypervisors with workloads
or virtual machines) might have different bandwidth requirements than edge racks, which provide
connectivity to the outside world. Link speed as well as the number of links vary to satisfy different
bandwidth demands.
The number of links to the spine switches dictates how many paths are available for traffic from this
rack to another rack. Because the number of hops between any two racks is consistent, equal-cost
multipathing (ECMP) can be used. Assuming traffic sourced by the servers carries a TCP or UDP
header, traffic spray can occur on a per-flow basis.


Figure 26. Leaf-and-Spine and Network Virtualization

3.1.2.2 Top of Rack Physical Switches


When configuring Top of Rack (ToR) switches, consider the following best practices:
 Configure redundant physical switches to enhance availability.
 Configure switch ports that connect to ESXi hosts manually as trunk ports. Virtual switches are
passive devices and do not send or receive trunking protocols, such as Dynamic Trunking
Protocol (DTP).
 Modify the Spanning Tree Protocol (STP) on any port that is connected to an ESXi NIC to reduce
the time it takes to transition ports over to the forwarding state, for example using the Trunk
PortFast feature found in a Cisco physical switch.
 Provide DHCP or DHCP Helper capabilities on all VLANs that are used by VMkernel ports. This
setup simplifies the configuration by using DHCP to assign IP addresses based on the IP subnet in
use.

3.1.2.3 Jumbo Frames


IP storage throughput can benefit from the configuration of jumbo frames. Increasing the per-frame
payload from 1500 bytes to the jumbo frame setting increases the efficiency of data transfer. Jumbo
frames must be configured end-to-end, which is easily accomplished in a LAN. When you enable
jumbo frames on an ESXi host, you have to select an MTU that matches the MTU of the physical
switch ports.
The workload determines whether it makes sense to configure jumbo frames on a virtual machine. If
the workload consistently transfers large amounts of network data, configure jumbo frames if possible.
In that case, the virtual machine operating systems and the virtual machine NICs must also support
jumbo frames.
Using jumbo frames also improves performance of vSphere vMotion.


Note VXLANs need an MTU value of at least 1600 bytes on the switches and routers that carry the
transport zone traffic.

Table 11. Jumbo Frames Design Decisions

Decision ID: SDDC-PHY-NET-001
Design Decision: Configure the MTU size to 9000 bytes (Jumbo Frames) on the portgroups that support the following traffic types: NFS, Virtual SAN, vMotion, VXLAN, and vSphere Replication.
Design Justification: Setting the MTU to 9000 bytes (Jumbo Frames) improves traffic throughput. In order to support VXLAN, the MTU setting must be increased to a minimum of 1600 bytes; setting these portgroups to 9000 bytes has no additional effect on VXLAN but ensures consistency across portgroups that are adjusted from the default MTU size.
Design Implication: When adjusting the MTU packet size, the entire network path (VMkernel port, distributed switch, physical switches and routers) must also be configured to support the same MTU packet size.

3.1.2.4 Leaf Switch Connectivity and Network Settings


Each ESXi host in the compute rack is connected redundantly to the SDDC network fabric ToR
switches via two 10 GbE ports, as shown in the following illustration. Configure the ToR switches to
provide all necessary VLANs via an 802.1Q trunk.
Figure 27. Leaf Switch to Server Connection within Compute Racks


Each ESXi host in the management/shared edge and compute rack is connected to the SDDC
network fabric and also to the Wide Area Network (WAN) and to the Internet, as shown in the following
illustration.
Figure 28. Leaf Switch to Server Connection within Management/Shared Compute and Edge
Rack

3.1.2.5 VLANs and Subnets


Each ESXi host in the compute rack and the management/edge rack uses VLANs and corresponding
subnets for internal-only traffic, as shown in the illustration below.
The leaf switches of each rack act as the Layer 3 interface for the corresponding subnet.
The management/edge rack provides externally accessible VLANs for access to the Internet and/or
MPLS-based corporate networks.


Figure 29. Sample VLANs and Subnets within a Pod

Follow these guidelines:


 Use only /24 subnets to reduce confusion and mistakes when dealing with IPv4 subnetting.
 Use the IP address .1 as the (floating) interface with .2 and .3 for Virtual Router Redundancy
Protocol (VRRP) or Hot Standby Router Protocol (HSRP).
 Use the RFC1918 IPv4 address space for these subnets and allocate one octet by region and
another octet by function. For example, the mapping 172.regionid.function.0/24 results in the
following sample subnets (a short sketch after the sample table illustrates this mapping).

Note The following IP ranges are meant as samples. Your actual implementation depends on your
environment.

Table 12. VLAN Sample IP Ranges

Pod                     | Function   | Sample VLAN   | Sample IP range
Management              | Management | 1611 (Native) | 172.16.11.0/24
Management              | vMotion    | 1612          | 172.16.12.0/24
Management              | VXLAN      | 1614          | 172.16.14.0/24
Management              | VSAN       | 1613          | 172.16.13.0/24
Shared Edge and Compute | Management | 1631 (Native) | 172.16.31.0/24
Shared Edge and Compute | vMotion    | 1632          | 172.16.32.0/24
Shared Edge and Compute | VXLAN      | 1634          | 172.16.34.0/24
Shared Edge and Compute | VSAN       | 1633          | 172.16.33.0/24
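
The following minimal Python sketch (sample values only, mirroring the table above) illustrates the
172.regionid.function.0/24 convention using the standard ipaddress module; substitute the VLAN IDs
and octets used in your own environment.

import ipaddress

# Sample mapping of pod function to (VLAN ID, function octet), mirroring the table above.
functions = {
    "Management / Management": (1611, 11),
    "Management / vMotion": (1612, 12),
    "Management / VSAN": (1613, 13),
    "Management / VXLAN": (1614, 14),
    "Shared Edge and Compute / Management": (1631, 31),
    "Shared Edge and Compute / vMotion": (1632, 32),
    "Shared Edge and Compute / VSAN": (1633, 33),
    "Shared Edge and Compute / VXLAN": (1634, 34),
}

region_octet = 16  # Region A in this sample; allocate a different octet per region

for name, (vlan, function_octet) in functions.items():
    subnet = ipaddress.ip_network(f"172.{region_octet}.{function_octet}.0/24")
    gateway = subnet.network_address + 1  # .1 is the floating VRRP/HSRP interface
    print(f"VLAN {vlan}  {name:<40} {subnet}  gateway {gateway}")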


3.1.2.6 Access Port Network Settings


Configure additional network settings on the access ports that connect the leaf switch to the
corresponding servers.
 Spanning-Tree Protocol (STP). Although this design does not use the spanning tree protocol,
switches usually come with STP configured by default. Designate the access ports as trunk
PortFast.
 Trunking. Configure the VLANs as members of a 802.1Q trunk with the management VLAN
acting as the native VLAN.
 MTU. Set MTU for all VLANS (Management, vMotion, VXLAN and Storage) to jumbo frames for
consistency purposes.
 DHCP helper. Configure the VIF of the Management, vMotion and VXLAN subnet as a DHCP
proxy.
 Multicast. Configure IGMP snooping on the ToR switches and include an IGMP querier on each
VLAN.

3.1.2.7 Region Interconnectivity


The SDDC management networks, VXLAN kernel ports and the edge and compute VXLAN kernel
ports of the two regions must be connected. These connections can be over a VPN tunnel, Point to
Point circuits, MPLS, etc. End users must be able to reach the public-facing network segments (public
management and tenant networks) of both regions.
The design of the connection solution is out of scope for this VMware Validated Design.

3.1.2.8 Physical Network Characteristics


 Requirements. The design uses 4 spine switches with 40 GbE ports. As a result, each leaf
switch must have 4 uplink ports capable of 40 GbE.
 Fault Tolerance. In case of a switch failure or scheduled maintenance, switch fabric capacity
reduction is 25% with four spine switches.
 Oversubscription. Oversubscription can occur within a leaf switch. To compute the
oversubscription for a leaf switch, use this formula.
Total bandwidth available to all connected servers / aggregate amount of
uplink bandwidth
The compute rack and the management/edge rack have 19 ESXi hosts. Each ESXi host has one 10
GbE port connected to each ToR switch, creating up to 190 Gbps of bandwidth. With four 40 GbE
uplinks to the spine, you can compute oversubscription as follows (see the Oversubscription in the
Leaf Switches illustration).
190 Gbps (total bandwidth) / 160 Gbps (uplink bandwidth) = 1.1875:1
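
Expressed as a short Python sketch (using the figures from this design; adjust the port counts and
speeds to your own hardware):

# Leaf switch oversubscription, following the formula above.
hosts_per_rack = 19
host_port_gbps = 10   # one 10 GbE port per host, per ToR switch
spine_uplinks = 4
uplink_gbps = 40

server_bandwidth = hosts_per_rack * host_port_gbps  # 190 Gbps toward the servers
uplink_bandwidth = spine_uplinks * uplink_gbps      # 160 Gbps toward the spine

print(f"Oversubscription: {server_bandwidth / uplink_bandwidth:.4f}:1")  # 1.1875:1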
Figure 30. Oversubscription in the Leaf Switches


 Routing protocols. Base the selection of the external routing protocol on your current
implementation or on available expertise among the IT staff. Take performance requirements into
consideration. Possible options are OSPF, iBGP and IS-IS.
 DHCP proxy. The DHCP proxy must point to a DHCP server via its IPv4 address. See the
External Service Dependencies section for details on the DHCP server.
Table 13. Physical Network Design Decisions

Decision ID: SDDC-PHY-NET-002
Design Decision: Racks are connected using a leaf-and-spine topology and Layer 3 connectivity.
Design Justification: A Layer 3 leaf-and-spine architecture supports scale out while maintaining failure isolation.
Design Implication: Layer 2 traffic is reduced to within the pod.

Decision ID: SDDC-PHY-NET-003
Design Decision: Only the management and shared edge and compute rack has physical access to the external network via VLANs.
Design Justification: Aggregating physical cabling and network services to the management and shared edge and compute rack reduces costs.
Design Implication: Workloads in compute pods located in compute racks have to use network virtualization (NSX for vSphere) for external network connectivity.

Decision ID: SDDC-PHY-NET-004
Design Decision: Each rack uses two ToR switches. These switches provide connectivity across two 10 GbE links to each server.
Design Justification: This design uses two 10 GbE links to provide redundancy and reduce overall design complexity.
Design Implication: Requires two ToR switches per rack, which can increase costs.

Decision ID: SDDC-PHY-NET-005
Design Decision: Use VLANs to segment physical network functions.
Design Justification: Allows for physical network connectivity without requiring a large number of NICs. Segregation is needed for the different network functions that are required in the SDDC. This allows for differentiated services and prioritization of traffic as needed.
Design Implication: Uniform configuration and presentation is required on all the trunks made available to the ESXi hosts.

Table 14. Additional Network Design Decisions

Decision ID: SDDC-PHY-NET-006
Design Decision: Static IP addresses will be assigned to all management nodes of the SDDC infrastructure.
Design Justification: Configuration of static IP addresses avoids connection outages due to DHCP availability or misconfiguration.
Design Implication: Accurate IP address management must be in place.

Decision ID: SDDC-PHY-NET-007
Design Decision: DNS records to enable forward, reverse, short and FQDN resolution will be created for all management nodes of the SDDC infrastructure.
Design Justification: Ensures consistent resolution of management nodes using both IP address (reverse lookup) and name resolution.
Design Implication: None.

Decision ID: SDDC-PHY-NET-008
Design Decision: An NTP time source will be used for all management nodes of the SDDC infrastructure.
Design Justification: Critical to maintain accurate and synchronized time between management nodes.
Design Implication: None.
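
As an illustration of design decision SDDC-PHY-NET-007, the following minimal Python sketch uses the
standard socket module to check that forward and reverse DNS resolution agree for a list of management
nodes; the host names below are placeholders for the FQDNs used in your environment.

import socket

# Placeholder management node FQDNs; replace with the nodes in your environment.
management_nodes = [
    "mgmt01vc01.sfo01.rainpole.local",
    "mgmt01psc01.sfo01.rainpole.local",
]

for fqdn in management_nodes:
    try:
        ip = socket.gethostbyname(fqdn)                # forward lookup
        reverse_name, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
        status = "OK" if reverse_name.lower() == fqdn.lower() else "MISMATCH"
        print(f"{fqdn} -> {ip} -> {reverse_name} [{status}]")
    except OSError as err:
        print(f"{fqdn}: resolution failed ({err})")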

3.1.3 Physical Storage Design


This VMware Validated Design relies on both VMware Virtual SAN storage and NFS storage. The
Shared Storage Design section explains where the SDDC uses which type of storage and gives
background information. The focus of this section is the physical storage design.

3.1.3.1 Virtual SAN Physical Design


Software-defined storage is a key technology in the SDDC. This design uses VMware Virtual SAN to
implement software-defined storage for the management clusters.
VMware Virtual SAN is fully integrated, hypervisor-converged storage software. Virtual SAN creates
a cluster of server hard disk drives and solid state drives and presents a flash-optimized, highly
resilient, shared storage datastore to hosts and virtual machines. Virtual SAN allows you to control
capacity, performance, and availability on a per virtual machine basis through the use of storage
policies.
Requirements and Dependencies


The following requirements and dependencies summarize the information in the VMware Virtual SAN
documentation. The design decisions of this VMware Validated Design fulfill these requirements.
The software-defined storage module has the following requirements and options:
 Minimum of 3 hosts providing storage resources to the Virtual SAN cluster.
 Virtual SAN is configured as hybrid storage or all-flash storage.
o A Virtual SAN hybrid storage configuration requires both magnetic devices and flash caching
devices.
o An All-Flash Virtual SAN configuration requires vSphere 6.0 or later.
 Each ESXi host that provides storage resources to the cluster must meet the following
requirements:
o Minimum of one SSD.
o The SSD flash cache tier should be at least 10% of the size of the HDD capacity tier.
o Minimum of two HDDs.
o RAID controller compatible with VMware Virtual SAN.
o 10 Gbps network for Virtual SAN traffic with Multicast enabled.
o vSphere High Availability Isolation Response set to power off virtual machines. With this
setting, no possibility of split brain conditions in case of isolation or network partition exists. In
a split-brain condition, the virtual machine might be powered on by two hosts by mistake. See
design decision SDDC-VI-VC-024 for more details.
Table 15. Virtual SAN Physical Storage Design Decision

Decision ID: SDDC-PHY-STO-001
Design Decision: Use one 200 GB SSD and two traditional 1 TB HDDs to create a single disk group in the management cluster.
Design Justification: Allows enough capacity for the management VMs with a minimum of 10% flash-based caching.
Design Implication: Having only one disk group limits the amount of striping (performance) capability and increases the size of the fault domain.

Virtual SAN Background Information


vSphere offers two different Virtual SAN modes of operation, all-flash or hybrid.
Hybrid Mode
In a hybrid storage architecture, Virtual SAN pools server-attached capacity devices (in this case
magnetic devices) and caching devices (typically SSDs or PCI-e devices) to create a distributed
shared datastore.
All-Flash Mode
VMware Virtual SAN can be deployed as all-flash storage. All-flash storage uses flash-based devices
(SSD or PCI-e) only as a write cache while other flash-based devices provide high endurance for
capacity and data persistence.
Table 16. Virtual SAN Mode Design Decision

Decision ID: SDDC-PHY-STO-002
Design Decision: Configure Virtual SAN in hybrid mode.
Design Justification: The VMs in the management cluster, which are hosted within Virtual SAN, do not require the performance or expense of an all-flash Virtual SAN configuration.
Design Implication: Virtual SAN hybrid mode does not provide the potential performance of an all-flash configuration.

Hardware Considerations
You can build your own VMware Virtual SAN cluster or choose from a list of Virtual SAN Ready
Nodes.
 Build Your Own. Be sure to use hardware from the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=Virtual SAN) for
the following components:
o Solid state disks (SSDs)
o Magnetic hard drives (HDDs)
o I/O controllers, including Virtual SAN certified driver/firmware combinations
 Use VMware Virtual SAN Ready Nodes. A Virtual SAN Ready Node is a validated server
configuration in a tested, certified hardware form factor for Virtual SAN deployment, jointly
recommended by the server OEM and VMware. See the VMware Virtual SAN Compatibility Guide
(https://www.vmware.com/resources/compatibility/pdf/vi_vsan_rn_guide.pdf). The Virtual SAN
Ready Node documentation provides examples of standardized configurations, including the
numbers of VMs supported and estimated number of 4K IOPS delivered.
As per design decision SDDC-PHY-009, the VMware Validated Design uses Virtual SAN Ready
Nodes.

3.1.3.2 Solid State Disk (SSD) Characteristics


In a VMware Virtual SAN configuration, the SSDs are used for the Virtual SAN caching layer for
hybrid deployments and for the capacity layer for all flash.
 For a hybrid deployment, the use of the SSD is split between a non-volatile write cache
(approximately 30%) and a read buffer (approximately 70%). As a result, the endurance and the
number of I/O operations per second that the SSD can sustain are important performance factors.
 For an all-flash model, endurance and performance have the same criteria. However, many more
write operations are held by the caching tier, thus extending the life of the SSD capacity tier.
SSD Endurance
This VMware Validated Design uses class D endurance class SSDs for the caching tier.

Note All drives listed in the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=Virtual SAN)
meet the Class D requirements.

SDDC Endurance Design Decision Background


For endurance of the SSDs used for Virtual SAN, standard industry write metrics are the primary
measurements used to gauge the reliability of the drive. No standard metric exists across all vendors;
however, Drive Writes per Day (DWPD) and Petabytes Written (PBW) are the measurements normally
used.
For vSphere 5.5, the endurance class was based on Drive Writes Per Day (DWPD). For VMware
Virtual SAN 6.0, the endurance class has been updated to use Terabytes Written (TBW), based on
the vendor’s drive warranty. TBW can be used for VMware Virtual SAN 5.5 and VMware Virtual SAN
6.0 and is reflected in the VMware Compatibility Guide for Virtual SAN.


The reasoning behind using TBW is that VMware now offers the flexibility to use larger capacity drives
with lower DWPD specifications.
If an SSD vendor uses Drive Writes Per Day as a measurement, you can calculate endurance in
Terabytes Written (TBW) as follows:
TBW (over 5 years) = Drive Size x DWPD x 365 x 5
For example, if a vendor specifies DWPD = 10 for a 400 GB (0.4 TB) capacity SSD, you can compute
TBW as follows:
TBW = 0.4 TB x 10 DWPD x 365 days x 5 yrs
TBW = 7300 TBW
That means the SSD supports 7300 TB of writes over 5 years. (Higher TBW figures denote a higher
endurance class.)
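
The TBW formula can be expressed as a small Python helper; the inputs below are the example figures
from this section.

def tbw_over_5_years(drive_size_tb, dwpd):
    """Endurance in Terabytes Written over a 5-year period."""
    return drive_size_tb * dwpd * 365 * 5

# Example from this section: a 400 GB (0.4 TB) SSD rated at 10 DWPD.
print(tbw_over_5_years(drive_size_tb=0.4, dwpd=10))  # 7300.0 TBW, which meets Class D (>=7300)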
For SSDs that are designated for caching and all-flash capacity layers, the following table outlines
which endurance class to use for hybrid and for all-flash VMware Virtual SAN.
Table 17. Hybrid and All-Flash Virtual SAN Endurance Classes

Endurance Class | TBW    | Hybrid Caching Tier | All-Flash Caching Tier | All-Flash Capacity Tier
Class A         | >=365  | No                  | No                     | Yes
Class B         | >=1825 | Yes                 | No                     | Yes
Class C         | >=3650 | Yes                 | Yes                    | Yes
Class D         | >=7300 | Yes                 | Yes                    | Yes

Note This VMware Validated Design does not use All-Flash Virtual SAN.

Table 18. SSD Endurance Class Design Decisions

Decision ID: SDDC-PHY-STO-003
Design Decision: Use Class D (>=7300 TBW) SSDs for the caching tier of the management cluster.
Design Justification: If an SSD designated for the caching tier fails due to wear-out, the entire VMware Virtual SAN disk group becomes unavailable. The result is potential data loss or operational impact.
Design Implication: SSDs with higher endurance may be more expensive than lower endurance classes.

SSD Performance
There is a direct correlation between the SSD performance class and the level of Virtual SAN
performance. The highest-performing hardware results in the best performance of the solution. Cost is
therefore the determining factor. A lower class of hardware that is more cost effective might be
attractive even if the performance or size is not ideal. For optimal performance of Virtual SAN, select
class E SSDs. See the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=Virtual SAN) for detail
on the different classes.
SSD Performance Design Decision Background


Select a high class of SSD for optimal performance of VMware Virtual SAN. Before selecting a drive
size, consider disk groups and sizing as well as expected future growth. VMware defines classes of
performance in the VMware Compatibility Guide for Virtual SAN
(https://www.vmware.com/resources/compatibility/search.php?deviceCategory=vsan) as follows:
Table 19. SSD Performance Classes

Performance Class | Writes Per Second
Class A           | 2,500 – 5,000
Class B           | 5,000 – 10,000
Class C           | 10,000 – 20,000
Class D           | 20,000 – 30,000
Class E           | 30,000 – 100,000
Class F           | 100,000 +

Select an SSD size that is, at a minimum, 10 percent of the anticipated size of the consumed HDD
storage capacity, before failures to tolerate are considered. For example, select an SSD of at least
100 GB for 1 TB of HDD storage consumed in a 2 TB disk group.
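
A minimal Python sketch of this sizing guideline, using the example figures from the paragraph above:

def minimum_cache_size_gb(consumed_hdd_capacity_gb, cache_ratio=0.10):
    """Minimum SSD cache size for the anticipated consumed HDD capacity (before failures to tolerate)."""
    return consumed_hdd_capacity_gb * cache_ratio

# Example from this section: 1 TB (1000 GB) of consumed HDD capacity in a 2 TB disk group
# calls for an SSD of at least 100 GB.
print(minimum_cache_size_gb(consumed_hdd_capacity_gb=1000))  # 100.0 GB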
Caching Algorithm
Both hybrid clusters and all-flash configurations adhere to the recommendation of sizing the flash
cache layer at 10% of consumed capacity. However, there are differences between the two configurations:
 Hybrid Virtual SAN. 70% of the available cache is allocated for storing frequently read disk
blocks, minimizing accesses to the slower magnetic disks. 30% of available cache is allocated to
writes.
 All-Flash Virtual SAN. All-flash clusters have two types of flash: very fast and durable write
cache, and cost-effective capacity flash. Here cache is 100% allocated for writes, as read
performance from capacity flash is more than sufficient.
Use Class E SSDs for the highest possible level of performance from the VMware Virtual SAN
volume.
Table 20. SSD Performance Class Selection

Design         | Option 1 | Option 2 | Comments
Quality        | Class E  | Class C  |
Availability   | o        | o        | Neither design option impacts availability.
Manageability  | o        | o        | Neither design option impacts manageability.
Performance    | ↑        | ↓        | The higher the storage class that is used, the better the performance.
Recoverability | o        | o        | Neither design option impacts recoverability.
Security       | o        | o        | Neither design option impacts security.

Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.


Table 21. SSD Performance Class Design Decisions

Decision ID: SDDC-PHY-STO-004
Design Decision: Use Class E SSDs (30,000-100,000 writes per second) for the management cluster.
Design Justification: The storage I/O performance requirements within the management cluster dictate the need for at least Class E SSDs.
Design Implication: Class E SSDs might be more expensive than lower class drives.

3.1.3.3 Magnetic Hard Disk Drives (HDD) Characteristics


The HDDs in a VMware Virtual SAN environment have two different purposes, capacity and object
stripe width.
 Capacity. Magnetic disks, or HDDs, unlike caching-tier SSDs, make up the capacity of a Virtual
SAN datastore
 Stripe Width. You can define stripe width at the virtual machine policy layer. Virtual SAN might
use additional stripes when making capacity and placement decisions outside a storage policy.
Virtual SAN supports these disk types:
 Serial Attached SCSI (SAS)
 Near Line Serial Attached SCSI (NL-SAS). NL-SAS can be thought of as enterprise SATA drives
but with a SAS interface.
 Serial Advanced Technology Attachment (SATA). Use SATA magnetic disks only in capacity-
centric environments where performance is not prioritized.
This VMware Validated Design uses 10,000 RPM drives to achieve a balance between cost and
availability.
HDD Capacity, Cost, and Availability Background Considerations
You can achieve the best results with SAS and NL-SAS.
As per the VMware Compatibility Guide for Virtual SAN, the maximum size of an SAS drive at the time
of writing is 6 TB. The VMware Virtual SAN design must consider the number of magnetic disks
required for the capacity layer, and how well the capacity layer will perform.
 SATA disks typically provide more capacity per individual drive, and tend to be less expensive
than SAS drives. However, the trade-off is performance, because SATA performance is not as
good as SAS performance due to lower rotational speeds (typically 7,200 RPM).
 Choose SAS magnetic disks instead of SATA magnetic disks in environments where performance
is critical.
Consider that failure of a larger capacity drive has operational impact on the availability and recovery
of more components.
Rotational Speed (RPM) Background Considerations
Higher-RPM HDDs tend to be more reliable, but that comes at a cost. SAS disks are available at
speeds of up to 15,000 RPM.
Table 22. Virtual SAN HDD Environmental Characteristics

Characteristic         | Revolutions per Minute (RPM)
Capacity               | 7,200
Performance            | 10,000
Additional Performance | 15,000

Cache-friendly workloads are less sensitive to disk performance characteristics; however, workloads
can change over time. HDDs with 10,000 RPM are the accepted norm when selecting a capacity tier.
For the software-defined storage module, VMware recommends that you use an HDD configuration
that is suited to the characteristics of the environment. If there are no specific requirements, selecting
10,000 RPM drives achieves a balance between cost and availability.
Table 23. HDD Characteristic Selection

Design         | Option 1 | Option 2  | Option 3  | Comments
Quality        | 7200 RPM | 10000 RPM | 15000 RPM |
Availability   | ↑        | ↓         | ↓         | Less expensive disks make it easier to achieve more failures to tolerate without incurring a large cost. Therefore, slower disks are an appealing option for an environment in which availability is important.
Manageability  | o        | o         | o         | No design option impacts manageability.
Performance    | ↓        | ↑         | ↑↑        | In a VMware Virtual SAN environment, performance is best when using high-RPM HDDs. However, a high-performing SSD impacts performance more than a high-RPM HDD.
Recoverability | o        | o         | o         | No design option impacts recoverability.
Security       | o        | o         | o         | No design option impacts security.

Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.

Table 24. HDD Selection Design Decisions

Decision ID: SDDC-PHY-STO-005
Design Decision: Use 10,000 RPM HDDs for the management cluster.
Design Justification: 10,000 RPM HDDs achieve a balance between performance and availability for the VMware Virtual SAN configuration. The performance of 10,000 RPM HDDs avoids disk drain issues. In Virtual SAN hybrid mode, Virtual SAN periodically flushes uncommitted writes to the capacity tier.
Design Implication: Slower and potentially cheaper HDDs are not available.


3.1.3.4 I/O Controllers


The I/O controllers are as important to a VMware Virtual SAN configuration as the selection of disk
drives. Virtual SAN supports SAS, SATA, and SCSI adapters in either pass-through or RAID 0 mode.
Virtual SAN supports multiple controllers per host.
 Multiple controllers can improve performance and mitigate a controller or SSD failure to a smaller
number of drives or Virtual SAN disk groups.
 With a single controller, all disks are controlled by one device. A controller failure impacts all
storage, including the boot media (if configured).
Controller queue depth is possibly the most important aspect for performance. All I/O controllers in the
VMware Virtual SAN Hardware Compatibility Guide have a minimum queue depth of 256. Consider
normal day-to-day operations and increase of I/O due to Virtual Machine deployment operations or re-
sync I/O activity as a result of automatic or manual fault remediation.
A Note on SAS Expanders
SAS expanders are a storage technology that lets you maximize the storage capability of your SAS
controller card. Like switches in an Ethernet network, SAS expanders enable you to connect a larger
number of devices, that is, more SAS/SATA devices to a single SAS controller. Many SAS controllers
support up to 128 or more hard drives.
VMware has not extensively tested SAS expanders; as a result, performance and operational
predictability are relatively unknown at this point. Avoid configurations with SAS expanders.

3.1.3.5 NFS Physical Storage Design


Network File System (NFS) is a distributed file system protocol that allows a user on a client computer
to access files over a network much like local storage is accessed. In this case the client computer is
an ESXi host, and the storage is provided by a NFS-capable external storage array.
The management cluster uses VMware Virtual SAN for primary storage and NFS for secondary
storage. The compute clusters are not restricted to any particular storage technology. For compute
clusters, the decision on which technology to use is based on the performance, capacity, and
capabilities (replication, deduplication, compression, etc.) required by the workloads that are running
in the clusters.
Table 25. NFS Usage Design Decisions

Decision ID: SDDC-PHY-STO-006
Design Decision: NFS storage is presented to provide: a datastore for backup data, an export for archive data, and a datastore for templates and ISOs.
Design Justification: Separates primary virtual machine storage from backup data in case of primary storage failure. vRealize Log Insight archiving requires an NFS export.
Design Implication: An NFS-capable external array is required.

Requirements
Your environment must meet the following requirements to use NFS storage in the VMware
Validated Design.
 Storage arrays are connected directly to the leaf switches.
 All connections are made using 10 Gb Ethernet.
 Jumbo Frames are enabled.


 10K SAS (or faster) drives are used in the storage array.
Different disk speeds and disk types can be combined in an array to create different performance and
capacity tiers. The management cluster uses 10K SAS drives in the RAID configuration
recommended by the array vendor to achieve the required capacity and performance.
Table 26. NFS Hardware Design Decision

Decision ID: SDDC-PHY-STO-007
Design Decision: Use 10K SAS drives for management cluster NFS volumes.
Design Justification: 10K SAS drives achieve a balance between performance and capacity. Faster drives can be used if desired. vSphere Data Protection requires high-performance datastores in order to meet backup SLAs. vRealize Automation uses NFS datastores for its content catalog, which requires high-performance datastores. vRealize Log Insight uses NFS datastores for its archive storage which, depending on compliance regulations, can use a large amount of disk space.
Design Implication: 10K SAS drives are generally more expensive than other alternatives.

Volumes
A volume consists of multiple disks in a storage array. RAID is applied at the volume level. The more
disks in a volume, the better the performance and the greater the capacity.
Multiple datastores can be created on a single volume, but for applications that do not have a high I/O
footprint a single volume with multiple datastores is sufficient.
 For high I/O applications, such as backup applications, use a dedicated volume to avoid
performance issues.
 For other applications, set up Storage I/O Control (SIOC) to impose limits on high I/O applications
so that other applications get the I/O they are requesting.
Table 27. Volume Assignment Design Decisions

Decision ID: SDDC-PHY-STO-008
Design Decision: Use a dedicated NFS volume to support backup requirements.
Design Justification: The backup and restore process is I/O intensive. Using a dedicated NFS volume ensures that the process does not impact the performance of other management components.
Design Implication: Dedicated volumes add management overhead to storage administrators. Dedicated volumes might use more disks, depending on the array and type of RAID.

Decision ID: SDDC-PHY-STO-009
Design Decision: Use a shared volume for other management component datastores.
Design Justification: Non-backup related management applications can share a common volume due to the lower I/O profile of these applications.
Design Implication: Enough storage space for shared volumes and their associated application data must be available.


3.2 Virtual Infrastructure Design


The virtual infrastructure design includes the software components that make up the virtual
infrastructure layer and that support the business continuity of the SDDC. These components include
the software products that provide the virtualization platform hypervisor, virtualization management,
storage virtualization, network virtualization, backup and disaster recovery. VMware products in this
layer include VMware vSphere, VMware Virtual SAN, VMware NSX, vSphere Data Protection, and
VMware Site Recovery Manager.
Figure 31. Virtual Infrastructure Layer Business Continuity in the SDDC

3.2.1 Virtual Infrastructure Design Overview


The SDDC virtual infrastructure consists of two regions. Each region includes a management pod,
and a shared edge and compute pod.


Figure 32. SDDC Logical Design

3.2.1.1 Management Pod


Management pods run the virtual machines that manage the SDDC. These virtual machines
host vCenter Server, NSX Manager, NSX Controller, vRealize Operations, vRealize Log Insight,
vRealize Automation, Site Recovery Manager and other shared management components. All
management, monitoring, and infrastructure services are provisioned to a vCenter Server High
Availability cluster which provides high availability for these critical services. Permissions on the
management cluster limit access to only administrators. This protects the virtual machines running the
management, monitoring, and infrastructure services.

3.2.1.2 Shared Edge and Compute Pod


The virtual infrastructure design uses a shared edge and compute pod. The shared pod combines the
characteristics of typical edge and compute pods into a single pod. It is possible to separate these in
the future if required.
This pod provides the following main functions:
 Supports on-ramp and off-ramp connectivity to physical networks
 Connects with VLANs in the physical world


 Optionally hosts centralized physical services


 Hosts the SDDC tenant virtual machines
The shared edge and compute pod connects the virtual networks (overlay networks) provided by NSX
for vSphere and the external networks. An SDDC can mix different types of compute-only pods and
provide separate compute pools for different types of SLAs.

3.2.1.3 Business Continuity


You can support business continuity and disaster recovery (BCDR) in the SDDC by protecting
vCenter Server, NSX for vSphere, vRealize Automation, vRealize Operations Manager, and vRealize
Log Insight. Enable backup and failover to a recovery region of these management
applications to continue the delivery of infrastructure management, operations management, and
cloud platform management.

3.2.1.4 Data Protection Design


Data backup protects the data of your organization against data loss, hardware failure, accidental
deletion, or other disaster for each region. For consistent image-level backups, use backup software
that is based on the VMware Virtual Disk Development Kit (VDDK), such as vSphere Data Protection.
Figure 33. vSphere Data Protection Logical Design

3.2.1.5 Disaster Recovery Design


The SDDC disaster recovery design includes two regions:
 Protected Region A in San Francisco. Region A runs the management stack virtual machine
workloads that are being protected and is referred to as the protected region in this document.
 Recovery Region B in Los Angeles. Region B is the disaster recovery region and is referred to
as the recovery region.
Site Recovery Manager can automate the setup and execution of disaster recovery plans between
these two regions.


Note A region in the VMware Validated Design is equivalent to the site construct in Site Recovery
Manager.

Figure 34. Disaster Recovery Logical Design

3.2.2 ESXi Design


3.2.2.1 ESXi Hardware Requirements
You can find the ESXi hardware requirements in Physical Design Fundamentals. The following section
outlines the design of the ESXi configuration.

3.2.2.2 ESXi Manual Install and Boot Options


You can install or boot ESXi 6.0 from the following storage systems:
 SATA disk drives. SATA disk drives connected behind supported SAS controllers or supported
on-board SATA controllers.
 Serial-attached SCSI (SAS) disk drives. Supported for installing ESXi.
 SAN. Dedicated SAN disk on Fibre Channel or iSCSI.
 USB devices. Supported for installing ESXi. 16 GB or more is recommended.
 FCoE (Software Fibre Channel over Ethernet)
ESXi can boot from a disk larger than 2 TB if the system firmware and the firmware on any add-in
card support it. See the vendor documentation.


3.2.2.3 ESXi Boot Disk and Scratch Configuration


For new installations of ESXi, the installer creates a 4 GB VFAT scratch partition. ESXi uses this
scratch partition to store log files persistently. By default, vm-support output, which is used by
VMware to troubleshoot issues on the ESXi host, is also stored on the scratch partition.
An ESXi installation on SD/USB media does not configure a default scratch partition. VMware
recommends that you specify a scratch partition on a VMFS volume or configure remote syslog
logging for the host.
Table 28. ESXi Boot Disk Design Decision

Decision ID: SDDC-VI-ESXi-001
Design Decision: Install and configure all ESXi hosts to boot using local USB or SD devices.
Design Justification: USB or SD cards are an inexpensive and easy to configure option for installing ESXi. Using local USB or SD devices allows allocation of all local HDDs to a VMware Virtual SAN storage system.
Design Implication: When you use USB or SD storage, ESXi logs are not retained. Configure remote syslog (such as vRealize Log Insight) to collect ESXi host logs.

3.2.2.4 ESXi Host Access


After installation, ESXi hosts are added to a VMware vCenter Server system and managed through
that vCenter Server system.
Direct access to the host console is still available and most commonly used for troubleshooting
purposes. You can access ESXi hosts directly using one of these three methods:
 Direct Console User Interface (DCUI). Graphical interface on the console. Allows basic
administrative controls and troubleshooting options.
 ESXi Shell. A Linux-style bash login on the ESXi console itself.
 Secure Shell (SSH) Access. Remote command-line console access.
You can enable or disable each method. By default, the ESXi Shell and SSH are disabled to secure
the ESXi host. The DCUI is disabled only if Lockdown Mode is enabled.

3.2.2.5 ESXi User Access


By default, root is the only user who can log in to an ESXi host directly; however, you can add ESXi
hosts to an Active Directory domain. After the host has been added to an Active Directory domain,
access can be granted through Active Directory groups. Auditing who has logged into the host also
becomes easier.


Table 29. ESXi User Access Design Decisions

Decision ID: SDDC-VI-ESXi-002
Design Decision: Add each host to the child Active Directory domain for the region in which it will reside, for example sfo01.rainpole.local or lax01.rainpole.local.
Design Justification: Using Active Directory membership allows greater flexibility in granting access to ESXi hosts. Ensuring that users log in with a unique user account allows greater visibility for auditing.
Design Implication: Adding hosts to the domain can add some administrative overhead.

Decision ID: SDDC-VI-ESXi-003
Design Decision: Change the default ESX Admins group to the SDDC-Admins Active Directory group. Add ESXi administrators to the SDDC-Admins group following standard access procedures.
Design Justification: Having an SDDC-Admins group is more secure because it removes a known administrative access point. In addition, different groups allow for separation of management tasks.
Design Implication: Additional changes to the host's advanced settings are required.

3.2.2.6 Virtual Machine Swap Configuration


When a virtual machine is powered on, the system creates a VMkernel swap file to serve as a
backing store for the virtual machine's RAM contents. The default swap file is stored in the same
location as the virtual machine's configuration file. This simplifies the configuration; however, it can
cause unnecessary replication traffic.
You can reduce the amount of traffic that is replicated by changing the swap file location to a user-
configured location on the host. However, it can take longer to perform VMware vSphere vMotion®
operations when the swap file has to be recreated.
Table 30. Other ESXi Host Design Decisions

Decision ID: SDDC-VI-ESXi-004
Design Decision: Configure all ESXi hosts to synchronize time with the central NTP servers.
Design Justification: Required because deployment of the vCenter Server Appliance on an ESXi host might fail if the host is not using NTP.
Design Implication: All firewalls located between the ESXi host and the NTP servers have to allow NTP traffic on the required network ports.

3.2.3 vCenter Server Design


The design for vCenter Server includes both the vCenter Server component and the VMware Platform
Services Controller instances.
A VMware Platform Services Controller groups a set of infrastructure services including vCenter
Single Sign-On, License service, Lookup Service, and VMware Certificate Authority. You can deploy
the Platform Services controller and the associated vCenter Server system on the same virtual
machine (embedded Platform Services Controller) or on different virtual machines (external Platform
Services Controller).


Table 31. vCenter Server Design Decision

Decision ID: SDDC-VI-VC-001
Design Decision: Deploy two vCenter Server systems in the first availability zone of each region:
 One vCenter Server supporting the SDDC management components.
 One vCenter Server supporting the edge components and compute workloads.
Design Justification:
 Isolates vCenter Server failures to management or compute workloads.
 Isolates vCenter Server operations between management and compute.
 Supports a scalable cluster design where the management components may be re-used as additional compute needs to be added to the SDDC.
 Simplifies capacity planning for compute workloads by eliminating management workloads from consideration in the Compute vCenter Server.
 Improves the ability to upgrade the vSphere environment and related components by providing for explicit separation of maintenance windows: management workloads remain available while workloads in compute are being addressed, and compute workloads remain available while workloads in management are being addressed.
 Allows clear separation of roles and responsibilities to ensure that only those administrators with proper authorization can attend to the management workloads.
 Facilitates quicker troubleshooting and problem resolution.
 Simplifies Disaster Recovery operations by supporting a clear demarcation between recovery of the management components and compute workloads.
 Enables the use of two NSX Managers, one for the management pod and the other for the shared edge and compute pod. Network separation of the pods in the SDDC allows for isolation of potential network issues.
Design Implication: Requires licenses for each vCenter Server instance.


You can install vCenter Server as a Windows-based system or deploy the Linux-based VMware
vCenter Server Appliance. The Linux-based vCenter Server Appliance is preconfigured, enables fast
deployment, and potentially results in reduced Microsoft licensing costs.
Table 32. vCenter Server Platform Design Decisions

SDDC-VI-VC-002
  Design Decision: Deploy all vCenter Server instances as Linux-based vCenter Server Appliances.
  Design Justification: Allows for rapid deployment, enables scalability, and reduces Microsoft licensing costs.
  Design Implication: Operational staff might need Linux experience to troubleshoot the Linux-based appliances.

3.2.3.1 Platform Services Controller Design Decision Background


vCenter Server supports installation with an embedded Platform Services Controller (embedded
deployment) or with an external Platform Services Controller.
 In an embedded deployment, vCenter Server and the Platform Services Controller run on the
same virtual machine. Embedded deployments are recommended for standalone
environments with only one vCenter Server system.
 Environments with an external Platform Services Controller can have multiple vCenter Server
systems. The vCenter Server systems can use the same Platform Services Controller
services. For example, several vCenter Server systems can use the same instance of vCenter
Single Sign-On for authentication.
 If there is a need to replicate with other Platform Services Controller instances, or if the solution
includes more than one vCenter Single Sign-On instance, you can deploy multiple external
Platform Services Controller instances on separate virtual machines.
Table 33. Platform Services Controller Design Decisions

SDDC-VI-VC-003
  Design Decision: Deploy each vCenter Server with an external Platform Services Controller.
  Design Justification: External Platform Services Controllers are required for replication between Platform Services Controller instances.
  Design Implication: The number of VMs that have to be managed increases.

SDDC-VI-VC-004
  Design Decision: Join all Platform Services Controller instances to a single vCenter Single Sign-On domain.
  Design Justification: When all Platform Services Controller instances are joined into a single vCenter Single Sign-On domain, they can share authentication and license data across all components and regions.
  Design Implication: Only one Single Sign-On domain will exist.

SDDC-VI-VC-005
  Design Decision: Create a ring topology for the Platform Services Controllers.
  Design Justification: By default, Platform Services Controllers only replicate with one other Platform Services Controller, which creates a single point of failure for replication. A ring topology ensures that each Platform Services Controller has two replication partners and eliminates any single point of failure.
  Design Implication: Command-line interface commands must be used to configure the ring replication topology.
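Closing the ring (SDDC-VI-VC-005) is done with the vdcrepadmin utility that ships on the Platform Services Controller appliances. The sketch below wraps the command as documented for vSphere 6.0 in Python and is intended to run on one of the PSC appliances; the host names and password are placeholders, and the exact path and options should be verified against your build.

    import subprocess

    VDCREPADMIN = '/usr/lib/vmware-vmdir/bin/vdcrepadmin'
    SSO_USER = 'Administrator'      # vCenter Single Sign-On administrator
    SSO_PASS = 'changeme'           # placeholder
    SOURCE_PSC = 'mgmt01psc01.sfo01.rainpole.local'   # placeholder, existing node
    TARGET_PSC = 'comp01psc01.lax01.rainpole.local'   # placeholder, node to pair with

    # Create the additional replication agreement that closes the ring.
    subprocess.run([VDCREPADMIN, '-f', 'createagreement', '-2',
                    '-h', SOURCE_PSC, '-H', TARGET_PSC,
                    '-u', SSO_USER, '-w', SSO_PASS], check=True)

    # Confirm that each node now has two replication partners.
    subprocess.run([VDCREPADMIN, '-f', 'showpartners',
                    '-h', SOURCE_PSC, '-u', SSO_USER, '-w', SSO_PASS], check=True)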


Figure 35. vCenter Server and Platform Services Controller Deployment Model

3.2.3.2 vCenter Server Networking


As specified in the physical networking design, static IP addresses and host names must be used for
all vCenter Server systems. The IP addresses must have valid (internal) DNS registration including
reverse name resolution.
The vCenter Server systems must maintain network connections to the following components:
 All VMware vSphere Client and vSphere Web Client user interfaces.
 Systems running vCenter Server add-on modules.
 Each ESXi host.

3.2.3.3 vCenter Server Redundancy


Protecting the vCenter Server system is important because it is the central point of management and
monitoring for the SDDC. How you protect vCenter Server depends on maximum downtime tolerated,
and on whether failover automation is required.
The following table lists methods available for protecting the vCenter Server system and the vCenter
Server Appliance.
Table 34. Methods for Protecting vCenter Server System and the vCenter Server Appliance

Redundancy Method | Protects vCenter Server system (Windows) | Protects Platform Services Controller (Windows) | Protects vCenter Server (Appliance) | Protects Platform Services Controller (Appliance)
Automated protection using vSphere HA | Yes | Yes | Yes | Yes
Manual configuration and manual failover, for example using a cold standby | Yes | Yes | Yes | Yes
HA cluster with external load balancer | Not Available | Yes | Not Available | Yes

Table 35. vCenter Server Systems Protection Design Decisions

SDDC-VI-VC-006
  Design Decision: Protect all vCenter Server appliances by using vSphere HA.
  Design Justification: Supports availability objectives for vCenter Server appliances without requiring manual intervention during a failure event.
  Design Implication: vCenter Server will be unavailable during a vSphere HA failover.

3.2.3.4 vCenter Server Appliance Sizing


The following tables outline minimum hardware requirements for the management vCenter Server
appliance and the compute vCenter Server appliance.
Table 36. Logical Specification for Management vCenter Server Appliance

vCenter Server version: 6.0 (vCenter Server Appliance)
Physical or virtual system: Virtual (appliance)
Appliance Size: Small (up to 100 hosts / 1,000 VMs)
Platform Services Controller: External
Number of CPUs: 2
Memory: 16 GB
Disk Space: 136 GB

Table 37. Logical Specification for Compute vCenter Server Appliance

vCenter Server version: 6.0 (vCenter Server Appliance)
Physical or virtual system: Virtual (appliance)
Appliance Size: Large (up to 1,000 hosts / 10,000 VMs)
Platform Services Controller: External
Number of CPUs: 16
Memory: 32 GB
Disk Space: 295 GB

Table 38. vCenter Appliance Sizing Design Decisions

SDDC-VI-VC-007
  Design Decision: Configure the management vCenter Server Appliances with the small size setting.
  Design Justification: Based on the number of management VMs that are running, a vCenter Server Appliance installed with the small size setting is sufficient.
  Design Implication: If the size of the management environment changes, the vCenter Server Appliance size might need to be increased.

SDDC-VI-VC-008
  Design Decision: Configure the compute and edge vCenter Server Appliances with the large size setting.
  Design Justification: Based on the number of compute and edge VMs that are running, a vCenter Server Appliance installed with the large size setting is needed.
  Design Implication: As the compute environment grows, additional vCenter Server instances might be needed.

3.2.3.5 Database Design


A vCenter Server Appliance can use either a built-in local PostgreSQL database or an external Oracle
database. Both configurations support up to 1,000 hosts or 10,000 virtual machines.
Database Design Decision Background
A vCenter Server Windows installation can use either a supported external database or a local
PostgreSQL database. The local PostgreSQL database is installed with vCenter Server and is limited
to 20 hosts and 200 virtual machines. Supported external databases include Microsoft SQL Server
2008 R2, SQL Server 2012, SQL Server 2014, Oracle Database 11g, and Oracle Database 12c.
External databases require a 64-bit DSN. DSN aliases are not supported.
vCenter Server Appliance can use either a local PostgreSQL database that is built into the appliance,
which is recommended, or an external database. Supported external databases include Oracle
Database 11g and Oracle Database 12c. External database support is being deprecated in this
release; this is the last release that supports the use of an external database with vCenter Server
Appliance.
Unlike a vCenter Windows installation, a vCenter Server Appliance that uses a local PostgreSQL
database supports up to 1,000 hosts or 10,000 virtual machines at full vCenter Server scale.


Table 39. vCenter Database Design Decisions

SDDC-VI-VC-009
  Design Decision: Set up all vCenter Server instances to use the embedded PostgreSQL database.
  Design Justification: Reduces both overhead and Microsoft or Oracle licensing costs, and avoids problems with upgrades. Support for external databases is deprecated for the vCenter Server Appliance in the next release.
  Design Implication: The vCenter Server Appliance has limited database management tools for database administrators.

3.2.3.6 vSphere Cluster Design


The cluster design must take into account the workload that the cluster handles. Different cluster
types in this design have different characteristics.
vSphere Cluster Design Decision Background
The following heuristics help with cluster design decisions.
 Decide to use fewer, larger hosts or more, smaller hosts.
o A scale-up cluster has fewer, larger hosts.
o A scale-out cluster has more, smaller hosts.
o A virtualized server cluster typically has more hosts with fewer virtual machines per host.
 Compare the capital costs of purchasing fewer, larger hosts with the costs of purchasing more,
smaller hosts. Costs vary between vendors and models.
 Evaluate the operational costs of managing a few hosts with the costs of managing more hosts.
 Consider the purpose of the cluster.
 Consider the total number of hosts and cluster limits.
Figure 36. vSphere Logical Cluster Layout


3.2.3.7 vSphere High Availability Design


VMware vSphere High Availability (vSphere HA) protects your virtual machines in case of host failure
by restarting virtual machines on other hosts in the cluster when a host fails.
During configuration of the cluster, the hosts elect a master host. The master host communicates with
the vCenter Server system and monitors the virtual machines and secondary hosts in the cluster.
The master host detects different types of failure:
 Host failure, for example an unexpected power failure
 Host network isolation or connectivity failure
 Loss of storage connectivity
 Problems with virtual machine OS availability
Table 40. vSphere HA Design Decisions

SDDC-VI-VC-010
  Design Decision: Use vSphere HA to protect all clusters against failures.
  Design Justification: vSphere HA supports a robust level of protection for both host and virtual machine availability.
  Design Implication: Sufficient resources on the remaining hosts are required so that virtual machines can be migrated to those hosts in the event of a host outage.

SDDC-VI-VC-011
  Design Decision: Set vSphere HA Host Isolation Response to Power Off.
  Design Justification: Virtual SAN requires that the HA Isolation Response be set to Power Off so that VMs can be restarted on available hosts.
  Design Implication: VMs are powered off in case of a false positive, where a host is declared isolated incorrectly.

3.2.3.8 vSphere HA Admission Control Policy Configuration


The vSphere HA Admission Control Policy allows an administrator to configure how the cluster judges
available resources. In a smaller vSphere HA cluster, a larger proportion of the cluster resources are
reserved to accommodate host failures, based on the selected policy. The following policies are
available:
 Host failures the cluster tolerates. vSphere HA ensures that a specified number of hosts can
fail and sufficient resources remain in the cluster to fail over all the virtual machines from those
hosts.
 Percentage of cluster resources reserved. vSphere HA ensures that a specified percentage of
aggregate CPU and memory resources are reserved for failover.
 Specify Failover Hosts. When a host fails, vSphere HA attempts to restart its virtual machines
on any of the specified failover hosts. If restart is not possible, for example the failover hosts have
insufficient resources or have failed as well, then vSphere HA attempts to restart the virtual
machines on other hosts in the cluster.
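The HA decisions above (enable vSphere HA with a Host Isolation Response of Power Off) and the percentage-based admission control used by the management and shared edge and compute clusters in this design map to a single cluster reconfiguration. The following pyVmomi sketch applies them with a 25% CPU and memory reservation; the vCenter name, credentials, and cluster name are placeholders.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='mgmt01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == 'SFO01-Mgmt01')  # placeholder
        view.DestroyView()

        das = vim.cluster.DasConfigInfo(
            enabled=True,
            hostMonitoring='enabled',
            # Power off VMs on an isolated host so they can restart elsewhere.
            defaultVmSettings=vim.cluster.DasVmSettings(isolationResponse='powerOff'),
            # Reserve 25% of CPU and memory capacity for failover.
            admissionControlEnabled=True,
            admissionControlPolicy=vim.cluster.FailoverResourcesAdmissionControlPolicy(
                cpuFailoverResourcesPercent=25,
                memoryFailoverResourcesPercent=25))

        spec = vim.cluster.ConfigSpecEx(dasConfig=das)
        WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
    finally:
        Disconnect(si)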

3.2.3.9 vSphere Cluster Workload Design


This design defines the following vSphere clusters and the workloads that they handle.


Table 41. vSphere Cluster Workload Design Decisions

SDDC-VI-VC-012
  Design Decision: Create a single management cluster containing all management hosts.
  Design Justification: Simplifies configuration by isolating management workloads from compute workloads. Ensures that compute workloads have no impact on the management stack. You can add ESXi hosts to the cluster as needed.
  Design Implication: Management of multiple clusters and vCenter Server instances increases operational overhead.

SDDC-VI-VC-013
  Design Decision: Create a shared edge and compute cluster that hosts compute workloads, NSX Controllers, and associated NSX Edge gateway devices used for compute workloads.
  Design Justification: Simplifies configuration and minimizes the number of hosts required for initial deployment. Ensures that the management stack has no impact on compute workloads. You can add ESXi hosts to the cluster as needed.
  Design Implication: Management of multiple clusters and vCenter Server instances increases operational overhead. Due to the shared nature of the cluster, when compute workloads are added, the cluster must be scaled out to keep a high level of network performance.

3.2.3.10 Management Cluster Design


The management cluster design determines the management cluster configuration.
Table 42. Management Cluster Design Decisions

SDDC-VI-VC-014
  Design Decision: Create a management cluster with 4 hosts. Cluster redundancy is n+2 protection for vSphere HA, which covers outage redundancy during maintenance tasks.
  Design Justification: Two hosts are generally considered enough to support the management components. One host supports failover in case of a hardware defect. One more host allows failover if a second host is unavailable for scheduled maintenance.
  Design Implication: Calculate reserved amounts when the cluster size increases to prevent overprotection. Additional host resources are required for redundancy.

SDDC-VI-VC-015
  Design Decision: Set vSphere HA for the management cluster to reserve 25% of cluster resources for failover.
  Design Justification: Using the percentage-based reservation works well in situations where virtual machines have varying and sometimes significant CPU or memory reservations.
  Design Implication: If additional hosts are added to the cluster, more resources are reserved for failover capacity. Recalculate the percentage of reserved resources when additional hosts are added to the cluster.

Management Cluster Logical Design Background


The following table summarizes the attributes of the management cluster logical design.
Table 43. Management Cluster Attributes

Number of hosts required to support management hosts with no overcommitment: 2
Number of hosts recommended due to operational constraints (ability to take a host offline without sacrificing High Availability capabilities): 3
Number of hosts recommended due to operational constraints while using Virtual SAN (ability to take a host offline without sacrificing High Availability capabilities): 4
Capacity for host failures per cluster: 25% reserved CPU and RAM
Number of usable hosts per cluster: 3 usable hosts

3.2.3.11 Shared Edge and Compute Cluster Design


Tenant workloads run on the ESXi hosts in the shared edge and compute cluster. Due to the shared
nature of the cluster, NSX Controllers and NSX Edge devices also run in this cluster.


Table 44. Edge Cluster Design Decisions

SDDC-VI-VC-016
  Design Decision: Create a shared edge and compute cluster for the NSX Controllers and NSX Edge gateway devices.
  Design Justification: NSX Manager requires a 1:1 relationship with a vCenter Server system.
  Design Implication: Each time you provision a Compute vCenter Server system, a new NSX Manager is required. Set anti-affinity rules to keep each Controller on a separate host. A 4-node cluster allows maintenance while ensuring that the 3 Controllers remain on separate hosts.

SDDC-VI-VC-017
  Design Decision: Set vSphere HA for the shared edge and compute cluster to reserve 25% of cluster resources for failover.
  Design Justification: vSphere HA protects the NSX Controller instances and edge services gateway devices in the event of a host failure. vSphere HA powers on virtual machines from the failed hosts on any remaining hosts.
  Design Implication: If one of the hosts becomes unavailable, two Controllers run on a single host.

SDDC-VI-VC-018
  Design Decision: Create the shared edge and compute cluster with a minimum of 4 hosts.
  Design Justification: 3 NSX Controllers are required for sufficient redundancy and majority decisions. One host is available for failover and to allow for scheduled maintenance.
  Design Implication: 4 hosts is the smallest starting point for the shared edge and compute cluster for redundancy and performance, thus increasing cost over a 3-node cluster.

SDDC-VI-VC-019
  Design Decision: Set up VLAN-backed port groups for external access and management on the shared edge and compute cluster hosts.
  Design Justification: Edge gateways need access to the external network in addition to the management network.
  Design Implication: VLAN-backed port groups must be configured with the correct number of ports, or with elastic port allocation.

SDDC-VI-VC-020
  Design Decision: Create a resource pool for the required SDDC NSX Controllers and edge appliances with a CPU share level of High, a memory share of Normal, and a 15 GB memory reservation.
  Design Justification: The NSX components control all network traffic in and out of the SDDC as well as update route information for inter-SDDC communication. In a contention situation it is imperative that these virtual machines receive all the resources required.
  Design Implication: During contention SDDC NSX components receive more resources than all other workloads, so monitoring and capacity management must be a proactive activity.

SDDC-VI-VC-021
  Design Decision: Create a resource pool for all user NSX Edge devices with a CPU share value of Normal and a memory share value of Normal.
  Design Justification: NSX edges for users, created by vRealize Automation, support functions such as load balancing for user workloads. These edge devices do not support the entire SDDC, so they receive fewer resources during contention.
  Design Implication: During contention these NSX edges will receive fewer resources than the SDDC edge devices, so monitoring and capacity management must be a proactive activity.

SDDC-VI-VC-022
  Design Decision: Create a resource pool for all user virtual machines with a CPU share value of Normal and a memory share value of Normal.
  Design Justification: In a shared edge and compute cluster, the SDDC edge devices must be guaranteed resources above all other workloads so as not to impact network connectivity. Setting the share values to Normal gives the SDDC edges more shares of resources during contention, ensuring network traffic is not impacted.
  Design Implication: During contention user workload virtual machines could be starved for resources and experience poor performance. It is critical that monitoring and capacity management be a proactive activity and that capacity is added, or a dedicated edge cluster is created, before contention occurs.
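As an illustration of SDDC-VI-VC-020, the resource pool for the SDDC NSX components can be created with pyVmomi as sketched below; the pool name, cluster name, and connection details are placeholders, and the pools for user edges and user VMs (SDDC-VI-VC-021/022) differ only in share levels and reservation.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='comp01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == 'SFO01-Comp01')  # placeholder
        view.DestroyView()

        spec = vim.ResourceConfigSpec(
            cpuAllocation=vim.ResourceAllocationInfo(
                shares=vim.SharesInfo(level='high', shares=0),   # CPU shares: High
                reservation=0, limit=-1, expandableReservation=True),
            memoryAllocation=vim.ResourceAllocationInfo(
                shares=vim.SharesInfo(level='normal', shares=0), # memory shares: Normal
                reservation=15 * 1024,                           # 15 GB, in MB
                limit=-1, expandableReservation=True))

        # Create the pool directly under the cluster's root resource pool.
        cluster.resourcePool.CreateResourcePool(name='SDDC-Edge-RP', spec=spec)
    finally:
        Disconnect(si)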

Shared Edge and Compute Cluster Logical Design Background


The following table summarizes the attributes of the shared edge and compute cluster logical design.
The number of VMs on the shared edge and compute cluster will start low but will grow quickly as
user workloads are created.

Table 45. Shared Edge and Compute Cluster Attributes

Capacity for host failures per cluster: 1
Number of usable hosts per cluster: 3
Minimum number of hosts required to support the shared edge and compute cluster: 4

3.2.3.12 Compute Cluster Design


As the SDDC grows, additional compute-only clusters can be configured. Tenant workloads run on the
ESXi hosts in the compute cluster instances. Multiple compute clusters are managed by the Compute
vCenter Server instance.


Table 46. Compute Cluster Design Decisions

SDDC-VI-VC-023
  Design Decision: The hosts in each compute cluster are contained within a single rack.
  Design Justification: The spine-and-leaf architecture dictates that all hosts in a cluster must be connected to the same top-of-rack switches.
  Design Implication: Fault domains are limited to each rack.

SDDC-VI-VC-024
  Design Decision: Configure vSphere HA to use percentage-based failover capacity to ensure n+1 availability. The exact setting depends on the number of hosts in the compute cluster.
  Design Justification: Using explicit host failover limits the total available resources in a cluster.
  Design Implication: As the number of hosts in the cluster changes, the percentage of failover capacity must be adjusted.

3.2.3.13 vCenter Server Customization


vCenter Server supports a rich set of customization options, including monitoring, virtual machine fault
tolerance, and so on. For each feature, this VMware Validated Design specifies the design decisions.

3.2.3.14 VM and Application Monitoring Service


When VM and Application Monitoring is enabled, the VM and Application Monitoring service, which
uses VMware Tools, evaluates whether each virtual machine in the cluster is running. The service
checks for regular heartbeats and I/O activity from the VMware Tools process running on guests. If
the service receives no heartbeats or I/O activity, it is likely that the guest operating system has failed
or that VMware Tools is not being allocated time for heartbeats or I/O activity. In this case, the service
determines that the virtual machine has failed and reboots the virtual machine.
Enable Virtual Machine Monitoring for automatic restart of a failed virtual machine. The application or
service that is running on the virtual machine must be capable of restarting successfully after a reboot
or the VM restart is not sufficient.
Table 47. Monitor Virtual Machines Design Decisions

SDDC-VI-VC-025
  Design Decision: Enable Virtual Machine Monitoring for each cluster.
  Design Justification: Virtual Machine Monitoring provides adequate in-guest protection for most VM workloads.
  Design Implication: There is no downside to enabling Virtual Machine Monitoring.

3.2.3.15 VMware vSphere Distributed Resource Scheduling (DRS)


vSphere Distributed Resource Scheduling provides load balancing of a cluster by migrating workloads
from heavily loaded hosts to less utilized hosts in the cluster. DRS supports manual and automatic
modes.
 Manual. Recommendations are made but an administrator needs to confirm the changes.
 Automatic. Automatic management can be set to five different levels. At the lowest setting,
workloads are placed automatically at power on and only migrated to fulfill certain criteria, such as
entering maintenance mode. At the highest level, any migration that would provide a slight
improvement in balancing will be executed.


Table 48. vSphere Distributed Resource Scheduling Design Decisions

SDDC-VI-VC-026
  Design Decision: Enable DRS on all clusters and set it to automatic, with the default setting (medium).
  Design Justification: The default settings provide the best trade-off between load balancing and excessive migration with vMotion events.
  Design Implication: In the event of a vCenter outage, mapping from virtual machines to ESXi hosts might be more difficult to determine.
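A minimal pyVmomi sketch of SDDC-VI-VC-026 is shown below; the vCenter name and credentials are placeholders, and vmotionRate=3 corresponds to the default (medium) migration threshold.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='mgmt01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        try:
            for cluster in view.view:
                drs = vim.cluster.DrsConfigInfo(
                    enabled=True,
                    defaultVmBehavior='fullyAutomated',  # automatic placement and migration
                    vmotionRate=3)                       # default (medium) threshold
                spec = vim.cluster.ConfigSpecEx(drsConfig=drs)
                WaitForTask(cluster.ReconfigureComputeResource_Task(spec=spec, modify=True))
        finally:
            view.DestroyView()
    finally:
        Disconnect(si)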

3.2.3.16 Enhanced vMotion Compatibility (EVC)


EVC works by masking certain features of newer CPUs to allow migration between hosts containing
older CPUs. EVC works only with CPUs from the same manufacturer and there are limits to the
version difference gaps between the CPU families.
If you set EVC during cluster creation, you can add hosts with newer CPUs at a later date without
disruption. You can use EVC for a rolling upgrade of all hardware with zero downtime.
Set EVC to the highest level possible with the current CPUs in use.
Table 49. VMware Enhanced vMotion Compatibility Design Decisions

SDDC-VI-VC-027
  Design Decision: Enable Enhanced vMotion Compatibility on all clusters. Set the EVC mode to the lowest available setting supported for the hosts in the cluster.
  Design Justification: Allows cluster upgrades without virtual machine downtime.
  Design Implication: You can enable EVC only if clusters contain hosts with CPUs from the same vendor.
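EVC is enabled per cluster through the cluster's EVC manager. A hedged pyVmomi sketch is shown below; the connection details, cluster name, and EVC mode key are placeholders, and the key must be one of the modes reported as supported for the hosts in the cluster.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='mgmt01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.ClusterComputeResource], True)
        cluster = next(c for c in view.view if c.name == 'SFO01-Mgmt01')  # placeholder
        view.DestroyView()

        evc_mgr = cluster.EvcManager()
        # List the EVC modes the current hosts can support, then apply the chosen one.
        supported = [mode.key for mode in evc_mgr.evcState.supportedEVCMode]
        print('Supported EVC mode keys:', supported)
        WaitForTask(evc_mgr.ConfigureEvcMode_Task(evcModeKey='intel-sandybridge'))  # placeholder key
    finally:
        Disconnect(si)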

3.2.3.17 Use of Transport Layer Security (TLS) Certificates


By default, vSphere 6.0 uses TLS/SSL certificates that are signed by VMCA (VMware Certificate
Authority). By default, these certificates are not trusted by end-user devices or browsers. It is a
security best practice to replace at least user-facing certificates with certificates that are signed by a
third-party or enterprise Certificate Authority (CA). Certificates for machine-to-machine communication
can remain as VMCA-signed certificates.
Table 50. vCenter Server TLS Certificate Design Decisions

SDDC-VI-VC-028
  Design Decision: Replace the vCenter Server machine certificate and Platform Services Controller machine certificate with a certificate signed by a custom Certificate Authority (CA).
  Design Justification: Infrastructure administrators connect to both vCenter Server and the Platform Services Controller via web browser to perform configuration, management and troubleshooting activities. Certificate warnings result with the default certificate.
  Design Implication: Replacing and managing certificates is an operational overhead.


3.2.4 Virtualization Network Design


A well-designed network helps the organization meet its business goals. It prevents unauthorized
access, and provides timely access to business data.
This network virtualization design uses vSphere and VMware NSX for vSphere to implement virtual
networking.

3.2.4.1 Virtual Network Design Background Considerations


This VMware Validated Design follows high-level network design guidelines and networking best
practices.
The high-level design goals apply regardless of your environment.
 Meet diverse needs. The network must meet the diverse needs of many different entities in an
organization. These entities include applications, services, storage, administrators, and users.
 Reduce costs. Reducing costs is one of the simpler goals to achieve in the vSphere
infrastructure. Server consolidation alone reduces network costs by reducing the number of
required network ports and NICs, but a more efficient network design is desirable. For example,
configuring two 10 GbE NICs with VLANs might be more cost effective than configuring a dozen 1
GbE NICs on separate physical networks.
 Boost performance. You can achieve performance improvement and decrease the time that is
required to perform maintenance by providing sufficient bandwidth, which reduces contention and
latency.
 Improve availability. A well-designed network improves availability, typically by providing
network redundancy.
 Support security. A well-designed network supports an acceptable level of security through
controlled access (where required) and isolation (where necessary).
 Enhance infrastructure functionality. You can configure the network to support vSphere
features such as vSphere vMotion, vSphere High Availability, and vSphere Fault Tolerance.
Follow networking best practices throughout your environment.
 Separate network services from one another to achieve greater security and better performance.
 Use Network I/O Control and traffic shaping to guarantee bandwidth to critical virtual machines.
During network contention these critical virtual machines will receive a higher percentage of the
bandwidth.
 Separate network services on a single vSphere Distributed Switch by attaching them to port
groups with different VLAN IDs.
 Keep vSphere vMotion traffic on a separate network. When migration with vMotion occurs, the
contents of the guest operating system’s memory are transmitted over the network. You can put
vSphere vMotion on a separate network by using a dedicated vSphere vMotion VLAN.
 When using passthrough devices with a Linux kernel version 2.6.20 or earlier guest OS, avoid
MSI and MSI-X modes because these modes have significant performance impact.
 For best performance, use VMXNET3 virtual NICs.
 Ensure that physical network adapters that are connected to the same vSphere Standard Switch
or vSphere Distributed Switch are also connected to the same physical network.
 Configure all VMkernel network adapters with the same MTU. When several VMkernel network
adapters are connected to distributed switches, but those network adapters have different MTUs
configured, network connectivity problems might result.

3.2.4.2 Network Segmentation and VLANs


Separating different types of traffic is required to reduce contention and latency. Separate networks
are also required for access security.


High latency on any network can negatively affect performance. Some components are more
sensitive to high latency than others. For example, reducing latency is important on the IP storage
and the vSphere Fault Tolerance logging network because latency on these networks can negatively
affect the performance of multiple virtual machines.
Depending on the application or service, high latency on specific virtual machine networks can also
negatively affect performance. Use information gathered from the current state analysis and from
interviews with key stakeholders and SMEs to determine which workloads and networks are especially
sensitive to high latency.

Note Configuring separate networks requires additional monitoring and administration.

3.2.4.3 Virtual Networks


Determine the number of networks or VLANs that are required depending on the type of traffic.
 vSphere operational traffic.
o Management
o vMotion
o Virtual SAN
o NFS Storage
o vSphere Replication
o VXLAN
 Traffic that supports the organization’s services and applications.

3.2.4.4 Virtual Switches


Virtual switches simplify the configuration process by providing a single pane of glass for
performing virtual network management tasks.
Virtual Switch Design Background
A vSphere Distributed Switch (distributed switch) offers several enhancements over standard virtual
switches.
 Centralized management. Because distributed switches are created and managed centrally on a
vCenter Server system, they make the switch configuration more consistent across ESXi hosts.
Centralized management saves time, reduces mistakes, and lowers operational costs.
 Additional features. Distributed switches offer features that are not available on standard virtual
switches. Some of these features can be useful to the applications and services that are running
in the organization’s infrastructure. For example, NetFlow and port mirroring provide monitoring
and troubleshooting capabilities to the virtual infrastructure.
Consider the following caveats for distributed switches.
 Distributed switches require a VMware vSphere Enterprise Plus Edition license.
 Distributed switches are not manageable when vCenter Server is unavailable. vCenter Server
therefore becomes a tier one application.
Health Check
The health check service helps identify and troubleshoot configuration errors in vSphere distributed
switches.
Health check helps identify the following common configuration errors.
 Mismatched VLAN trunks between an ESXi host and the physical switches it's connected to.
 Mismatched MTU settings between physical network adapters, distributed switches, and physical
switch ports.
 Mismatched virtual switch teaming policies for the physical switch port-channel settings.


Health check monitors VLAN, MTU, and teaming policies.


 VLANs. Checks whether the VLAN settings on the distributed switch match the trunk port
configuration on the connected physical switch ports.
 MTU. For each VLAN, health check determines whether the physical access switch port's MTU
jumbo frame setting matches the distributed switch MTU setting.
 Teaming policies. Health check determines whether the connected access ports of the physical
switch that participate in an EtherChannel are paired with distributed ports whose teaming policy
is IP hash.
Health check is limited to the access switch port to which the ESXi hosts' NICs connect.

Note For VLAN and MTU checks, at least two physical NICs for the distributed switch are required.
For a teaming policy check, at least two physical NICs and two hosts are required when
applying the policy.
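Health check for VLAN/MTU and teaming can be switched on per distributed switch. The sketch below uses pyVmomi; the class and method names follow the vSphere 6.0 API as the author understands them, so verify them against your pyVmomi version. The switch name, interval, and connection details are placeholders.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='mgmt01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.VmwareDistributedVirtualSwitch], True)
        dvs = next(s for s in view.view if s.name == 'vDS-Mgmt')
        view.DestroyView()

        health = [
            vim.dvs.VmwareDistributedVirtualSwitch.VlanMtuHealthCheckConfig(
                enable=True, interval=1),   # VLAN and MTU checks, interval in minutes
            vim.dvs.VmwareDistributedVirtualSwitch.TeamingHealthCheckConfig(
                enable=True, interval=1)]   # teaming policy checks
        WaitForTask(dvs.UpdateDVSHealthCheckConfig_Task(healthCheckConfig=health))
    finally:
        Disconnect(si)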

Number of Virtual Switches


Create fewer virtual switches, preferably just one. For each type of network traffic, configure a single
virtual switch with a port group to simplify configuration and monitoring.
Table 51. Virtual Switch Design Decisions

SDDC-VI-Net-001
  Design Decision: Use vSphere Distributed Switches (VDS).
  Design Justification: vSphere Distributed Switches simplify management.
  Design Implication: Migration from a VSS to a VDS requires a minimum of two physical NICs to maintain redundancy.

SDDC-VI-Net-002
  Design Decision: Use a single VDS per cluster.
  Design Justification: Reduces complexity of the network design.
  Design Implication: None.

Management Cluster Distributed Switches


The management cluster uses a single vSphere Distributed Switch with the following configuration
settings.
Table 52. Virtual Switch for the Management Cluster

vDS-Mgmt
  Functions: ESXi management, network IP storage (NFS), Virtual SAN, vSphere vMotion, VXLAN Tunnel Endpoint (VTEP), vSphere Replication/vSphere Replication NFC, uplinks (2) to enable ECMP, external management connectivity
  Network I/O Control: Enabled
  Number of physical NIC ports: 2
  MTU: 9000


Table 53. vDS-Mgmt Port Group Configuration Settings

Failover detection: Link status only
Notify switches: Enabled
Failback: No
Failover order: Active uplinks: Uplink1, Uplink2

Figure 37. Network Switch Design for Management Hosts

This section expands on the logical network design by providing details on the physical NIC layout
and physical network attributes.
Table 54. Management Virtual Switches by Physical/Virtual NIC

vSphere Distributed Switch | vmnic | Function
vDS-Mgmt | 0 | Uplink
vDS-Mgmt | 1 | Uplink

Table 55. Management Virtual Switch Port Groups and VLANs

vSphere Distributed Switch | Port Group Name | Teaming | Active Uplinks | VLAN ID
vDS-Mgmt | vDS-Mgmt-Management | Route based on physical NIC load | 0, 1 | 1611
vDS-Mgmt | vDS-Mgmt-vMotion | Route based on physical NIC load | 0, 1 | 1612
vDS-Mgmt | vDS-Mgmt-VSAN | Route based on physical NIC load | 0, 1 | 1613
vDS-Mgmt | Auto Generated (NSX VTEP) | Route based on SRC-ID | 0, 1 | 1614
vDS-Mgmt | vDS-Mgmt-Uplink01 | Route based on physical NIC load | 0, 1 | 2711
vDS-Mgmt | vDS-Mgmt-Uplink02 | Route based on physical NIC load | 0, 1 | 2712
vDS-Mgmt | vDS-Mgmt-NFS | Route based on physical NIC load | 0, 1 | 1615
vDS-Mgmt | vDS-Mgmt-VR | Route based on physical NIC load | 0, 1 | 1616
vDS-Mgmt | vDS-Mgmt-Ext-Management | Route based on physical NIC load | 0, 1 | 130

Table 56. Management VMkernel Adapters

vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Mgmt | Management | vDS-Mgmt-Management | Management Traffic | 1500 (Default)
vDS-Mgmt | vMotion | vDS-Mgmt-vMotion | vMotion Traffic | 9000
vDS-Mgmt | VSAN | vDS-Mgmt-VSAN | Virtual SAN | 9000
vDS-Mgmt | NFS | vDS-Mgmt-NFS | - | 9000
vDS-Mgmt | Replication | vDS-Mgmt-VR | vSphere Replication traffic, vSphere Replication NFC traffic | 9000
vDS-Mgmt | VTEP | Auto Generated (NSX VTEP) | - | 9000

For more information on the physical network design specifications, see the Physical Network Design
section.


Shared Edge and Compute Cluster Distributed Switches


The shared edge and compute cluster uses a single vSphere Distributed Switch with the following
configuration settings.

Table 57. Virtual Switch for the Shared Edge and Compute Cluster

vDS-Comp01
  Functions: ESXi management, network IP storage (NFS), vSphere vMotion, VXLAN Tunnel Endpoint (VTEP), uplinks (2) to enable ECMP, Virtual SAN, external customer/tenant connectivity
  Network I/O Control: Enabled
  Number of physical NIC ports: 2
  MTU: 9000

Table 58. vDS-Comp01 Port Group Configuration Settings

Failover detection: Link status only
Notify switches: Enabled
Failback: No
Failover order: Active uplinks: Uplink1, Uplink2


Figure 38. Network Switch Design for shared Edge and Compute Hosts

This section expands on the logical network design by providing details on the physical NIC layout
and physical network attributes.
Table 59. Shared Edge and Compute Cluster Virtual Switches by Physical/Virtual NIC

vSphere Distributed Switch | vmnic | Function
vDS-Comp01 | 0 | Uplink
vDS-Comp01 | 1 | Uplink

Table 60. Edge Cluster Virtual Switch Port Groups and VLANs

vSphere Distributed Switch | Port Group Name | Teaming | Active Uplinks | VLAN ID
vDS-Comp01 | vDS-Comp01-Management | Route based on physical NIC load | 0, 1 | 1631
vDS-Comp01 | vDS-Comp01-vMotion | Route based on physical NIC load | 0, 1 | 1632
vDS-Comp01 | vDS-Comp01-VSAN | Route based on physical NIC load | 0, 1 | 1633
vDS-Comp01 | vDS-Comp01-NFS | Route based on physical NIC load | 0, 1 | 1615
vDS-Comp01 | Auto Generated (NSX VTEP) | Route based on SRC-ID | 0, 1 | 1634
vDS-Comp01 | vDS-Comp01-Uplink01 | Route based on physical NIC load | 0, 1 | 1635
vDS-Comp01 | vDS-Comp01-Uplink02 | Route based on physical NIC load | 0, 1 | 2713

Table 61. Shared Edge and Compute Cluster VMkernel Adapters

vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Comp01 | Management | vDS-Comp01-Management | Management Traffic | 1500 (Default)
vDS-Comp01 | vMotion | vDS-Comp01-vMotion | vMotion Traffic | 9000
vDS-Comp01 | VSAN | vDS-Comp01-VSAN | Virtual SAN | 9000
vDS-Comp01 | NFS | vDS-Comp01-NFS | - | 9000
vDS-Comp01 | VTEP | Auto Generated (NSX VTEP) | - | 9000

For more information on the physical network design, see the Physical Network Design section.
Compute Cluster Distributed Switches
A compute cluster vSphere Distributed Switch uses the following configuration settings.
Table 62. Virtual Switches for Compute Cluster Hosts

vDS-Comp02
  Functions: ESXi management, network IP storage (NFS), vSphere vMotion, VXLAN Tunnel Endpoint (VTEP)
  Network I/O Control: Enabled
  Number of physical NIC ports: 2
  MTU: 9000


Table 63. vDS-Comp02 Port Group Configuration Settings

Failover detection: Link status only
Notify switches: Enabled
Failback: No
Failover order: Active uplinks: Uplink1, Uplink2

Figure 39. Network Switch Design for Compute Hosts

This section expands on the logical network design by providing details on the physical NIC layout
and physical network attributes.
Table 64. Compute Cluster Virtual Switches by Physical/Virtual NIC

vSphere Distributed Switch | vmnic | Function
vDS-Comp02 | 0 | Uplink
vDS-Comp02 | 1 | Uplink


Table 65. Compute Cluster Virtual Switch Port Groups and VLANs

vSphere Distributed Switch | Port Group Name | Teaming | Active Uplinks | VLAN ID
vDS-Comp02 | vDS-Comp02-Management | Route based on physical NIC load | 0, 1 | 1621
vDS-Comp02 | vDS-Comp02-vMotion | Route based on physical NIC load | 0, 1 | 1622
vDS-Comp02 | Auto Generated (NSX VTEP) | Route based on SRC-ID | 0, 1 | 1624
vDS-Comp02 | vDS-Comp02-NFS | Route based on physical NIC load | 0, 1 | 1625

Table 66. Compute Cluster VMkernel Adapters

vSphere Distributed Switch | Network Label | Connected Port Group | Enabled Services | MTU
vDS-Comp02 | Management | vDS-Comp02-Management | Management traffic | 1500 (Default)
vDS-Comp02 | vMotion | vDS-Comp02-vMotion | vMotion traffic | 9000
vDS-Comp02 | NFS | vDS-Comp02-NFS | - | 9000
vDS-Comp02 | VTEP | Auto Generated (NSX VTEP) | - | 9000

For more information on the physical network design specifications, see the Physical Network Design
section.

3.2.4.5 NIC Teaming


You can use NIC teaming to increase the network bandwidth available in a network path, and to
provide the redundancy that supports higher availability.
NIC teaming helps avoid a single point of failure and provides options for load balancing of traffic. To
further reduce the risk of a single point of failure, build NIC teams by using ports from multiple NIC
and motherboard interfaces.
Create a single virtual switch with teamed NICs across separate physical switches.
This VMware Validated Design uses an active-active configuration with the Route based on physical
NIC load teaming algorithm. In this configuration, idle network cards do not wait for a
failure to occur, and they aggregate bandwidth.
NIC Teaming Design Background
For a predictable level of performance, use multiple network adapters in one of the following
configurations.
 An active-passive configuration that uses explicit failover when connected to two separate
switches.


 An active-active configuration in which two or more physical NICs in the server are assigned the
active role.
This validated design uses an active-active configuration.
Table 67. NIC Teaming and Policy

Design Quality | Active-Active | Active-Passive | Comments
Availability | ↑ | ↑ | Using teaming regardless of the option increases the availability of the environment.
Manageability | o | o | Neither design option impacts manageability.
Performance | ↑ | o | An active-active configuration can send traffic across either NIC, thereby increasing the available bandwidth. This configuration provides a benefit if the NICs are being shared among traffic types and Network I/O Control is used.
Recoverability | o | o | Neither design option impacts recoverability.
Security | o | o | Neither design option impacts security.

Note: Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.

Table 68. NIC Teaming Design Decision

SDDC-VI-Net-003
  Design Decision: Use the Route based on physical NIC load teaming algorithm for all port groups except for ones that carry VXLAN traffic. VTEP kernel ports and VXLAN traffic will use Route based on SRC-ID.
  Design Justification: Reduces complexity of the network design and increases resiliency and performance.
  Design Implication: Because NSX does not support Route based on physical NIC load, two different algorithms are necessary.
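In the API, "Route based on physical NIC load" is the uplink teaming policy value loadbalance_loadbased. The following pyVmomi sketch applies it, together with the "notify switches" and "failback no" (rolling order) settings from the port group configuration tables, to one port group's default port configuration; the class names follow the vSphere API as the author understands them, and the names and credentials are placeholders.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVim.task import WaitForTask
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='mgmt01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
        pg = next(p for p in view.view if p.name == 'vDS-Mgmt-Management')
        view.DestroyView()

        teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
            policy=vim.StringPolicy(value='loadbalance_loadbased'),  # physical NIC load
            notifySwitches=vim.BoolPolicy(value=True),
            rollingOrder=vim.BoolPolicy(value=True))                 # failback: No

        spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
            configVersion=pg.config.configVersion,
            defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
                uplinkTeamingPolicy=teaming))
        WaitForTask(pg.ReconfigureDVPortgroup_Task(spec=spec))
    finally:
        Disconnect(si)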

3.2.4.6 Network I/O Control


When Network I/O Control is enabled, the distributed switch allocates bandwidth for the following
system traffic types.
 Fault tolerance traffic
 iSCSI traffic
 vSphere vMotion traffic
 Management traffic
 VMware vSphere Replication traffic
 NFS traffic
 VMware Virtual SAN traffic
 vSphere Data Protection backup traffic


 Virtual machine traffic


How Network I/O Control Works
Network I/O Control enforces the share value specified for the different traffic types only when there is
network contention. When contention occurs Network I/O Control applies the share values set to each
traffic type. As a result, less important traffic, as defined by the share percentage, will be throttled,
allowing more important traffic types to gain access to more network resources.
Network I/O Control also allows the reservation of bandwidth for system traffic based on the capacity
of the physical adapters on a host, and enables fine-grained resource control at the virtual machine
network adapter level. Resource control is similar to the model for vCenter CPU and memory
reservations.
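Enabling Network I/O Control itself is a single API call per distributed switch; a minimal pyVmomi sketch is shown below (the vCenter name and credentials are placeholders). Upgrading the switch to Network I/O Control version 3 and setting the per-traffic-type shares from the decisions that follow can then be done in the vSphere Web Client or through the distributed switch configuration specification.

    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host='mgmt01vc01.sfo01.rainpole.local',
                      user='[email protected]', pwd='changeme', sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.DistributedVirtualSwitch], True)
        try:
            for dvs in view.view:
                # Turn on Network I/O Control for every distributed switch.
                dvs.EnableNetworkResourceManagement(enable=True)
        finally:
            view.DestroyView()
    finally:
        Disconnect(si)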
Network I/O Control Considerations
The following heuristics can help with design decisions.
 Shares vs. Limits. When you use bandwidth allocation, consider using shares instead of limits.
Limits impose hard limits on the amount of bandwidth used by a traffic flow even when network
bandwidth is available.
 Limits on Certain Resource Pools. Consider imposing limits on a given resource pool. For
example, if you put a limit on vSphere vMotion traffic, you can benefit in situations where multiple
vSphere vMotion data transfers, initiated on different hosts at the same time, result in
oversubscription at the physical network level. By limiting the available bandwidth for vSphere
vMotion at the ESXi host level, you can prevent performance degradation for other traffic.
 vSphere FT Traffic. Because vSphere FT traffic is latency sensitive, keep the shares value for
this resource pool set to high. If you are using custom shares, set the shares value to a
reasonably high relative value.
 Teaming Policy. When you use Network I/O Control, use Route based on physical NIC load
teaming as a distributed switch teaming policy to maximize the networking capacity utilization.
With load-based teaming, traffic might move among uplinks, and reordering of packets at the
receiver can result occasionally.
 Traffic Shaping. Use distributed port groups to apply configuration policies to different traffic
types. Traffic shaping can help in situations where multiple vSphere vMotion migrations initiated
on different hosts converge on the same destination host. The actual limit and reservation also
depend on the traffic shaping policy for the distributed port group where the adapter is connected
to.
For example, if a VM network adapter requires a limit of 200 Mbps, and the average bandwidth
that is configured in the traffic shaping policy is 100 Mbps, then the effective limit becomes 100
Mbps.
Table 69. Network I/O Control Design Decisions

SDDC-VI-NET-004
  Design Decision: Use Network I/O Control version 3.
  Design Justification: Network I/O Control version 3 enables per-vNIC reservations as well as creating pools of bandwidth that are guaranteed to workloads in those pools.
  Design Implication: Version 2.0 is the default; administrators must upgrade to version 3.0.

SDDC-VI-NET-005
  Design Decision: Enable Network I/O Control on all distributed switches.
  Design Justification: Increases resiliency and performance of the network.
  Design Implication: If configured incorrectly, Network I/O Control could impact network performance for critical traffic types.

SDDC-VI-NET-006
  Design Decision: Set the share value for vMotion traffic to Low.
  Design Justification: During times of contention vMotion traffic is not as important as virtual machine or storage traffic.
  Design Implication: During times of network contention vMotion operations will take longer than usual to complete.

SDDC-VI-NET-007
  Design Decision: Set the share value for vSphere Replication traffic to Low.
  Design Justification: During times of contention vSphere Replication traffic is not as important as virtual machine or storage traffic.
  Design Implication: During times of network contention vSphere Replication will take longer and could violate the defined SLA.

SDDC-VI-NET-008
  Design Decision: Set the share value for Virtual SAN to High.
  Design Justification: During times of contention Virtual SAN traffic needs guaranteed bandwidth so virtual machine performance does not suffer.
  Design Implication: None.

SDDC-VI-NET-009
  Design Decision: Set the share value for Management to Normal.
  Design Justification: By keeping the default setting of Normal, management traffic is prioritized higher than vMotion and vSphere Replication but lower than Virtual SAN traffic. Management traffic is important as it ensures the hosts can still be managed during times of network contention.
  Design Implication: None.

SDDC-VI-NET-010
  Design Decision: Set the share value for NFS traffic to Normal.
  Design Justification: Because NFS is used for secondary storage, such as VDP backups and vRealize Log Insight archives, it is not as important as Virtual SAN traffic. By prioritizing it lower, Virtual SAN is not impacted.
  Design Implication: During times of contention VDP backups will be slower than usual.

SDDC-VI-NET-011
  Design Decision: Set the share value for vSphere Data Protection backup traffic to Low.
  Design Justification: During times of contention it is more important that primary functions of the SDDC continue to have access to network resources over backup traffic.
  Design Implication: During times of contention VDP backups will be slower than usual.

SDDC-VI-NET-012
  Design Decision: Set the share value for virtual machines to High.
  Design Justification: Virtual machines are the most important asset in the SDDC. Leaving the default setting of High ensures that they will always have access to the network resources they need.
  Design Implication: None.

SDDC-VI-NET-013
  Design Decision: Set the share value for Fault Tolerance to Low.
  Design Justification: Fault Tolerance is not used in this design, therefore it can be set to the lowest priority.
  Design Implication: None.

SDDC-VI-NET-014
  Design Decision: Set the share value for iSCSI traffic to Low.
  Design Justification: iSCSI is not used in this design, therefore it can be set to the lowest priority.
  Design Implication: None.


3.2.4.7 VXLAN
VXLAN provides the capability to create isolated, multi-tenant broadcast domains across data center
fabrics and enables customers to create elastic, logical networks that span physical network
boundaries.
The first step in creating these logical networks is to abstract and pool the networking resources. Just
as vSphere abstracts compute capacity from the server hardware to create virtual pools of resources
that can be consumed as a service, vSphere Distributed Switch and VXLAN abstract the network into
a generalized pool of network capacity and separate the consumption of these services from the
underlying physical infrastructure. A network capacity pool can span physical boundaries, optimizing
compute resource utilization across clusters, pods, and geographically-separated data centers. The
unified pool of network capacity can then be optimally segmented into logical networks that are
directly attached to specific applications.
VXLAN works by creating Layer 2 logical networks that are encapsulated in standard Layer 3 IP
packets. A Segment ID in every frame differentiates the VXLAN logical networks from each other
without any need for VLAN tags. As a result, large numbers of isolated Layer 2 VXLAN networks can
coexist on a common Layer 3 infrastructure.
In the vSphere architecture, the encapsulation is performed between the virtual NIC of the guest VM
and the logical port on the virtual switch, making VXLAN transparent to both the guest virtual
machines and the underlying Layer 3 network. Gateway services between VXLAN and non-VXLAN
hosts (for example, a physical server or the Internet router) are performed by the NSX for vSphere
Edge gateway appliance. The Edge gateway translates VXLAN segment IDs to VLAN IDs, so that
non-VXLAN hosts can communicate with virtual machines on a VXLAN network.
The dedicated edge cluster hosts all NSX Edge instances and all Universal Distributed Logical Router
instances that are connected to the Internet or to corporate VLANs, so that the network administrator
can manage the environment in a more secure and centralized way.
Table 70. VXLAN Design Decisions

SDDC-VI-Net-015
  Design Decision: Use NSX for vSphere to introduce VXLANs for the use of virtual application networks and tenant networks.
  Design Justification: Simplifies the network configuration for each tenant via centralized virtual network management.
  Design Implication: Requires NSX for vSphere licenses.

SDDC-VI-Net-016
  Design Decision: Use VXLAN along with NSX Edge gateways and the Universal Distributed Logical Router (UDLR) to provide customer/tenant network capabilities.
  Design Justification: Creates isolated, multi-tenant broadcast domains across data center fabrics to create elastic, logical networks that span physical network boundaries.
  Design Implication: Transport networks and an MTU greater than 1600 bytes have to be configured within the reachability radius.

SDDC-VI-Net-017
  Design Decision: Use VXLAN along with NSX Edge gateways and the Universal Distributed Logical Router (UDLR) to provide management application network capabilities.
  Design Justification: Leverages the benefits of network virtualization in the management pod.
  Design Implication: Requires installation and configuration of the NSX for vSphere instance in the management pod.


3.2.5 NSX Design


3.2.5.1 Software-Defined Networking Design
This design implements software-defined networking by using VMware NSX™ for vSphere®. With
NSX for vSphere, virtualization delivers for networking what it has already delivered for compute and
storage. In much the same way that server virtualization programmatically creates, snapshots,
deletes, and restores software-based virtual machines (VMs), NSX network virtualization
programmatically creates, snapshots, deletes, and restores software-based virtual networks. The
result is a transformative approach to networking that not only enables data center managers to
achieve orders of magnitude better agility and economics, but also supports a vastly simplified
operational model for the underlying physical network. NSX for vSphere is a nondisruptive solution
because it can be deployed on any IP network, including existing traditional networking models and
next-generation fabric architectures, from any vendor.
When administrators provision workloads, network management is one of the most time-consuming
tasks. Most of the time spent provisioning networks is consumed configuring individual components in
the physical infrastructure and verifying that network changes do not affect other devices that are
using the same networking infrastructure.
The need to pre-provision and configure networks is a major constraint to cloud deployments where
speed, agility, and flexibility are critical requirements. Pre-provisioned physical networks can allow for
the rapid creation of virtual networks and faster deployment times of workloads utilizing the virtual
network. As long as the physical network that you need is already available on the host where the
workload is to be deployed, this works well. However, if the network is not available on a given host,
you must find a host with the available network and spare capacity to run your workload in your
environment.
To get around this bottleneck requires a decoupling of virtual networks from their physical
counterparts. This, in turn, requires that you can programmatically recreate all physical networking
attributes that are required by workloads in the virtualized environment. Because network
virtualization supports the creation of virtual networks without modification of the physical network
infrastructure, it allows more rapid network provisioning.

3.2.5.2 NSX for vSphere Design


Each NSX instance is tied to a vCenter Server instance. The design decision to deploy two vCenter
Server instances per region (SDDC-VI-VC-001) requires deployment of two separate NSX instances
per region.
Table 71. NSX for vSphere Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- Use two separate NSX SDN capabilities offered by NSX, You must install
SDN-001 instances per region. One such as load balancing and firewalls, and perform initial
instance is tied to the are crucial for the compute/edge configuration of the
Management vCenter layer to support the cloud four NSX instances
Server, and the other management platform operations, separately.
instance is tied to the and also for the management
Compute vCenter Server. applications in the management
stack that need these capabilities.

SDDC-VI- Pair NSX Manager NSX can extend the logical You must consider
SDN-002 instances in a primary- boundaries of the networking and that you can pair up
secondary relationship security services across regions. As to eight NSX
across regions for both a result, workloads can be live- Manager instances.
management and compute migrated and failed over between
workloads. regions without reconfiguring the
network and security constructs.


Figure 40. Architecture of NSX for vSphere

3.2.5.3 NSX Components


The following sections describe the components in the solution and how they are relevant to the
network virtualization design.
Consumption Layer
NSX for vSphere can be consumed by the cloud management platform (CMP), represented by vRealize
Automation, by using the NSX REST API and the vSphere Web Client.
Cloud Management Platform (CMP)
NSX for vSphere is consumed by vRealize Automation. NSX offers self-service provisioning of virtual
networks and related features from a service portal. Details of the service requests and their
orchestration are outside the scope of this document and can be referenced in the Cloud
Management Platform Design document.
API
NSX for vSphere offers a powerful management interface through its REST API; a short usage sketch follows the list below.
 A client can read an object by making an HTTP GET request to the object’s resource URL.


 A client can write (create or modify) an object with an HTTP PUT or POST request that includes a
new or changed XML document for the object.
 A client can delete an object with an HTTP DELETE request.
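As an illustration of the consumption model above, the following minimal Python sketch (using the requests library) exercises the read, create, and delete patterns against the NSX Manager REST API. The host name, credentials, payload, and resource paths are placeholders chosen for this example; verify the exact resource URLs and XML schemas against the NSX for vSphere API guide for the version in use.

import requests

# Placeholder values for this sketch; substitute a real host, credentials, and paths.
NSX_MANAGER = "https://nsxmgr01.rainpole.local"
READ_PATH = "/api/2.0/vdn/scopes"                            # example read: list VXLAN transport zones
CREATE_PATH = "/api/2.0/vdn/scopes/vdnscope-1/virtualwires"  # illustrative create path (logical switch)
PAYLOAD = ("<virtualWireCreateSpec><name>example-ls</name>"
           "<tenantId>example</tenantId></virtualWireCreateSpec>")

session = requests.Session()
session.auth = ("admin", "nsx_admin_password")   # placeholder credentials
session.verify = False                           # lab only; use a trusted certificate in production
session.headers["Content-Type"] = "application/xml"

# Read an object with HTTP GET.
response = session.get(NSX_MANAGER + READ_PATH)
print(response.status_code, response.text[:200])

# Create an object with HTTP POST; the NSX Manager returns the identifier of the new object.
created = session.post(NSX_MANAGER + CREATE_PATH, data=PAYLOAD)
object_id = created.text.strip()
print(created.status_code, object_id)

# Delete the object with HTTP DELETE (path shown is illustrative).
deleted = session.delete(NSX_MANAGER + "/api/2.0/vdn/virtualwires/" + object_id)
print(deleted.status_code)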
vSphere Web Client
The NSX Manager component provides a networking and security plug-in in the vSphere Web Client.
This plug-in provides an interface to consuming virtualized networking from the NSX Manager for
users that have sufficient privileges.
Table 72. Consumption Method Design Decisions

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- For the shared edge and vRealize Automation services Customers typically
SDN-003 compute cluster NSX are used for the customer- interact only indirectly
instance, consumption is facing portal. The vSphere Web with NSX from the
accomplished by using Client consumes NSX for vRealize Automation
the vRealize Automation vSphere resources through the portal. Administrators
services, the vSphere Network and Security plug-in. interact with NSX from
Web Client, and the NSX The NSX REST API offers the the vSphere Web Client
REST API. potential of scripting repeating and API.
actions and operations.

SDDC-VI- For the management Ensures that infrastructure Tenants do not have
SDN-004 cluster NSX instance, components are not modified by access to the
consumption is only by tenants and/or non-provider management stack
provider staff via the staff. workloads.
vSphere Web Client and
the API.

NSX Manager
NSX Manager provides the centralized management plane for NSX for vSphere and has a one-to-one mapping to a vCenter Server instance.
NSX Manager performs the following functions.
 Provides the single point of configuration and the REST API entry-points for NSX in a vSphere
environment.
 Deploys NSX Controller clusters, Edge distributed routers, and Edge service gateways in the form
of OVF appliances, guest introspection services, and so on.
 Prepares ESXi hosts for NSX by installing VXLAN, distributed routing and firewall kernel modules,
and the User World Agent (UWA).
 Communicates with NSX Controller clusters over REST and with hosts over the RabbitMQ
message bus. This internal message bus is specific to NSX for vSphere and does not require
setup of additional services.
 Generates certificates for the NSX Controller instances and ESXi hosts to secure control plane
communications with mutual authentication.
NSX Controller
An NSX Controller performs the following functions.
 Provides the control plane to distribute VXLAN and logical routing information to ESXi hosts.
 Includes nodes that are clustered for scale-out and high availability.
 Slices network information across cluster nodes for redundancy.


 Removes requirement of VXLAN Layer 3 multicast in the physical network.


 Provides ARP suppression of broadcast traffic in VXLAN networks.
NSX control plane communication occurs over the management network.
Table 73. NSX Controller Design Decision

Decision ID: SDDC-VI-SDN-005
Design Decision: Deploy NSX Controller instances in Universal Cluster mode with three members to provide high availability and scale. Provision these three nodes through the primary NSX Manager instance.
Design Justification: The high availability of NSX Controller reduces the downtime period in case of failure of one physical host.
Design Implications: The secondary NSX Manager will not have active controllers but will automatically import the configuration of the Universal Controllers that are created in the primary NSX Manager.

NSX Virtual Switch


The NSX data plane consists of the NSX virtual switch. This virtual switch is based on the vSphere
Distributed Switch (VDS) with additional components to enable rich services. The add-on NSX
components include kernel modules (VIBs) which run within the hypervisor kernel and provide
services such as distributed logical router (DLR) and distributed firewall (DFW), and VXLAN
capabilities.
The NSX virtual switch abstracts the physical network and provides access-level switching in the
hypervisor. It is central to network virtualization because it enables logical networks that are
independent of physical constructs such as VLAN. Using an NSX virtual switch includes several
benefits.
 Supports overlay networking and centralized network configuration. Overlay networking enables
the following capabilities.
o Creation of a flexible logical Layer 2 overlay over existing IP networks on existing physical
infrastructure without the need to re-architect the data center networks.
o Provisioning of communication (east/west and north/south) while maintaining isolation
between tenants.
o Application workloads and virtual machines that are agnostic of the overlay network and
operate as if they were connected to a physical Layer 2 network.
 Facilitates massive scale of hypervisors.
 Because the NSX virtual switch is based on VDS, it provides a comprehensive toolkit for traffic
management, monitoring, and troubleshooting within a virtual network through features such as
port mirroring, NetFlow/IPFIX, configuration backup and restore, network health check, QoS, and
more.
Logical Switching
NSX logical switches create logically abstracted segments to which tenant virtual machines can be
connected. A single logical switch is mapped to a unique VXLAN segment and is distributed across
the ESXi hypervisors within a transport zone. The logical switch allows line-rate switching in the
hypervisor without the constraints of VLAN sprawl or spanning tree issues.
Distributed Logical Router
The NSX distributed logical router (DLR) is optimized for forwarding in the virtualized space, that is,
forwarding between VMs on VXLAN- or VLAN-backed port groups. DLR has the following
characteristics.
 High performance, low overhead first hop routing


 Scales with number of hosts


 Up to 1,000 Logical Interfaces (LIFs) on each DLR
Distributed Logical Router Control Virtual Machine
The distributed logical router control virtual machine is the control plane component of the routing
process, providing communication between NSX Manager and the NSX Controller cluster through the
User World Agent (UWA). NSX Manager sends logical interface information to the control virtual
machine and the NSX Controller cluster, and the control virtual machine sends routing updates to the
NSX Controller cluster.
User World Agent
The User World Agent (UWA) is a TCP (SSL) client that facilitates communication between the ESXi
hosts and the NSX Controller instances as well as the retrieval of information from the NSX Manager
via interaction with the message bus agent.
VXLAN Tunnel Endpoint
VXLAN Tunnel Endpoints (VTEPs) are instantiated within the vSphere Distributed Switch to which the
ESXi hosts that are prepared for NSX for vSphere are connected. VTEPs are responsible for
encapsulating VXLAN traffic as frames in UDP packets and for the corresponding decapsulation.
VTEPs take the form of one or more VMkernel ports with IP addresses and are used both to
exchange packets with other VTEPs and to join IP multicast groups via Internet Group Management Protocol (IGMP). If you use multiple VTEPs, then you must select a teaming method.
Edge Services Gateway
The primary function of the NSX Edge services gateway (ESG) is north/south communication, but it also offers support for Layer 2, Layer 3, perimeter firewall, load balancing, and other services such as SSL-VPN and DHCP relay.
Distributed Firewall
NSX includes a distributed kernel-level firewall known as the distributed firewall. Security enforcement
is done at the kernel and VM network adapter level. The security enforcement implementation
enables firewall rule enforcement in a highly scalable manner without creating bottlenecks on physical
appliances. The distributed firewall has minimal CPU overhead and can perform at line rate.
The flow monitoring feature of the distributed firewall displays network activity between virtual
machines at the application protocol level. This information can be used to audit network traffic, define
and refine firewall policies, and identify botnets.
Logical Load Balancer
The NSX logical load balancer provides load balancing services up to Layer 7, allowing distribution of
traffic across multiple servers to achieve optimal resource utilization and availability. The logical load
balancer is a service provided by the NSX Edge service gateway.
Table 74. NSX for vSphere Physical Network Requirements

Requirement: Any network that carries VXLAN traffic must have an MTU size of 1600 or greater.
Comments: VXLAN packets cannot be fragmented. The MTU size must be large enough to support the extra encapsulation overhead. This design uses jumbo frames for VXLAN traffic.

Requirement: For the hybrid replication mode, Internet Group Management Protocol (IGMP) snooping must be enabled on the Layer 2 switches to which ESXi hosts that participate in VXLAN are attached. IGMP querier must be enabled on the connected router or Layer 3 switch.
Comments: IGMP snooping on Layer 2 switches is a requirement of the hybrid replication mode. Hybrid replication mode is the recommended replication mode for broadcast, unknown unicast, and multicast (BUM) traffic when deploying into an environment with large scale-out potential. The traditional requirement for Protocol Independent Multicast (PIM) is removed.

Requirement: Dynamic routing support on the upstream Layer 3 data center switches must be enabled.
Comments: Enable a dynamic routing protocol supported by NSX on the upstream data center switches to establish dynamic routing adjacency with the ESGs.

Requirement: NTP server must be available.
Comments: The NSX Manager requires NTP settings that synchronize it with the rest of the vSphere environment. Drift can cause problems with authentication. The NSX Manager must be in sync with the vCenter Single Sign-On service on the Platform Services Controller.

Requirement: Forward and reverse DNS resolution for all management VMs must be established.
Comments: The NSX Controller nodes do not require DNS entries.
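To make the MTU requirement above concrete, the following short Python sketch adds up the VXLAN encapsulation overhead for a standard 1500-byte guest frame over an IPv4 underlay. The per-header sizes are the commonly cited values for VXLAN and are shown only to illustrate why an MTU of 1600 bytes (or jumbo frames) is specified.

# Approximate VXLAN encapsulation overhead for an IPv4 underlay (illustrative values).
GUEST_FRAME = 1500        # standard guest MTU handed to the logical switch
INNER_ETHERNET = 14       # encapsulated guest Ethernet header
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20

required_mtu = GUEST_FRAME + INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4
print("Underlay MTU needed for a 1500-byte guest frame: %d bytes" % required_mtu)
print("The 1600-byte design requirement leaves %d bytes of headroom" % (1600 - required_mtu))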

3.2.5.4 NSX Specifications


The following table lists the components involved in the NSX for vSphere solution and the
requirements for installing and running them. The compute and storage requirements have been
taken into account when sizing resources to support the NSX for vSphere solution.

Note NSX ESG sizing can vary with tenant requirements, so all options are listed.

Table 75. Resource Specification of NSX Components

NSX Manager: 4 vCPU, 16 GB memory, 60 GB storage; 1 per stack instance.

NSX Controller: 4 vCPU, 4 GB memory, 20 GB storage; 3 per stack instance.

NSX ESG: Compact 1 vCPU, 512 MB memory, 512 MB storage; Large 2 vCPU, 1 GB memory, 512 MB storage; Quad Large 4 vCPU, 1 GB memory, 512 MB storage; X-Large 6 vCPU, 8 GB memory, 4.5 GB storage (+4 GB with swap). Optional component; deployment of the NSX ESG varies per use case.

DLR control VM: 1 vCPU, 512 MB memory, 512 MB storage. Optional component; varies with use case, typically 2 per HA pair.

Guest introspection: 2 vCPU, 1 GB memory, 4 GB storage. Optional component; 1 per ESXi host.

NSX data security: 1 vCPU, 512 MB memory, 6 GB storage. Optional component; 1 per ESXi host.

NSX Edge Service Gateway Sizing


The Quad Large model is suitable for high performance firewall abilities and the X-Large is suitable
for both high performance load balancing and routing.
You can convert between NSX Edge service gateway sizes upon demand using a non-disruptive
upgrade process, so the recommendation is to begin with the Large model and scale up if necessary.
A Large NSX Edge service gateway is suitable for medium firewall performance but as detailed later,
the NSX Edge service gateway does not perform the majority of firewall functions.

Note Edge service gateway throughput is influenced by the WAN circuit, so an adaptable
approach, that is, converting as necessary, is recommended.

Table 76. NSX Edge Service Gateway Sizing Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- Use large size NSX The large size provides all the performance None.
SDN-006 Edge service characteristics needed even in the event of a
gateways. failure.
A larger size would also provide the
performance required but at the expense of
extra resources that wouldn't be used.

3.2.5.5 Network Virtualization Conceptual Design


The following diagram depicts the conceptual tenant architecture components and their relationship.


Figure 41. Conceptual Tenant Overview

In this document, tenant refers to a tenant of the cloud management platform within the compute/edge
stack or to a management application within the management stack.
The conceptual design has the following key components.
 External Networks. Connectivity to and from external networks is through the perimeter firewall.
The main external network is the Internet.
 Perimeter Firewall. The physical firewall exists at the perimeter of the data center. Each tenant
receives either a full instance or partition of an instance to filter external traffic.
 Provider Logical Router (PLR). The PLR exists behind the perimeter firewall and handles
north/south traffic that is entering and leaving tenant workloads.
 NSX for vSphere Distributed Logical Router (DLR). This logical router is optimized for
forwarding in the virtualized space, that is, between VMs, on VXLAN port groups or VLAN-backed
port groups.
 Internal Non-Tenant Network. A single management network, which sits behind the perimeter
firewall but not behind the PLR. Enables customers to manage the tenant environments.
 Internal Tenant Networks. Connectivity for the main tenant workload. These networks are
connected to a DLR, which sits behind the PLR. These networks take the form of VXLAN-based
NSX for vSphere logical switches. Tenant virtual machine workloads will be directly attached to
these networks.

3.2.5.6 Cluster Design for NSX for vSphere


Following the vSphere design, the NSX for vSphere design consists of a management stack and a
compute/edge stack in each region.
Management Stack
In the management stack, the underlying hosts are prepared for NSX for vSphere. The management
stack has these components.
 NSX Manager instances for both stacks (management stack and compute/edge stack),
 NSX Controller cluster for the management stack,
 NSX ESG and DLR control VMs for the management stack.
Compute/Edge Stack
In the compute/edge stack, the underlying hosts are prepared for NSX for vSphere. The
compute/edge stack has these components.
 NSX Controller cluster for the compute stack
 All NSX Edge service gateways and DLR control VMs of the compute stack that are dedicated to
handling the north/south traffic in the data center. A separate edge stack helps prevent VLAN
sprawl because any external VLANs need only be trunked to the hosts in this cluster.
Table 77. vSphere Compute Cluster Split Design Decisions

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- For the compute stack, Simplifies configuration and The NSX Controller instances,
SDN-007 do not use a dedicated minimizes the number of NSX Edge services gateways,
edge cluster. hosts required for initial and DLR control VMs of the
deployment. compute stack are deployed in
the shared edge and compute
cluster.
The shared nature of the
cluster will require the cluster
to be scaled out as compute
workloads are added so as to
not impact network
performance.

SDDC-VI- For the management The number of supported The NSX Controller instances,
SDN-008 stack, do not use a management applications NSX Edge service gateways,
dedicated edge cluster. does not justify the cost of and DLR control VMs of the
a dedicated edge cluster in management stack are
the management stack. deployed in the management
cluster.

SDDC-VI- Apply vSphere Using DRS prevents Additional configuration is


SDN-009 Distributed Resource controllers from running on required to set up anti-affinity
Scheduler (DRS) anti- the same ESXi host and rules.
affinity rules to the NSX thereby risking their high
components in both availability capability.
stacks.


The logical design of NSX considers the vCenter Server clusters and defines where each NSX component runs.
Figure 42. Cluster Design for NSX for vSphere

High Availability of NSX for vSphere Components


The NSX Manager instances of both stacks run on the management cluster. vSphere HA protects the
NSX Manager instances by ensuring that the NSX Manager VM is restarted on a different host in the
event of primary host failure.
The NSX Controller nodes of the management stack run on the management cluster. The NSX for
vSphere Controller nodes of the compute stack run on the edge cluster. In both clusters, vSphere
Distributed Resource Scheduler (DRS) rules ensure that NSX for vSphere Controller nodes do not run
on the same host.
The data plane remains active during outages in the management and control planes although the
provisioning and modification of virtual networks is impaired until those planes become available
again.
The NSX Edge service gateways and DLR control VMs of the compute stack are deployed on the
edge cluster. The NSX Edge service gateways and DLR control VMs of the management stack run on
the management cluster.

NSX Edge components that are deployed for north/south traffic are configured in equal-cost multi-
path (ECMP) mode that supports route failover in seconds. NSX Edge components deployed for load
balancing utilize NSX HA. NSX HA provides faster recovery than vSphere HA alone because NSX HA
uses an active/passive pair of NSX Edge devices. By default, the passive Edge device becomes
active within 15 seconds. All NSX Edge devices are also protected by vSphere HA.
Scalability of NSX Components
A one-to-one mapping between NSX Manager instances and vCenter Server instances exists. If the
inventory of either the management stack or the compute stack exceeds the limits supported by a
single vCenter Server, then you can deploy a new vCenter Server instance, and must also deploy a
new NSX Manager instance. You can extend transport zones by adding more compute and edge
clusters until you reach the vCenter Server limits. Consider the limit of 100 DLRs per ESXi host
although the environment usually would exceed other vCenter Server limits before the DLR limit.
vSphere Distributed Switch Uplink Configuration
Each ESXi host utilizes two physical 10 Gb Ethernet adapters, associated with the uplinks on the
vSphere Distributed Switches to which it is connected. Each uplink is connected to a different top-of-
rack switch to mitigate the impact of a single top-of-rack switch failure and to provide two paths in and
out of the SDDC.

Table 78. VTEP Teaming and Failover Configuration Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- Set up VXLAN Tunnel Allows for the utilization of Link aggregation such as LACP
SDN-010 Endpoints (VTEPs) to the two uplinks of the vDS between the top-of-rack (ToR)
use Route based on resulting in better switches and ESXi host must not
SRC-ID for teaming and bandwidth utilization and be configured in order to allow
failover configuration. faster recovery from dynamic routing to peer between
network path failures. the ESGs and the upstream
switches.

3.2.5.7 Logical Switch Control Plane Mode Design


The control plane decouples NSX for vSphere from the physical network and handles the broadcast,
unknown unicast, and multicast (BUM) traffic within the logical switches. The control plane is on top of
the transport zone and is inherited by all logical switches that are created within it. It is possible to
override aspects of the control plane. The following options are available.
 Multicast Mode. The control plane uses multicast IP addresses on the physical network. Use
multicast mode only when upgrading from existing VXLAN deployments. In this mode, you must
configure PIM/IGMP on the physical network.
 Unicast Mode. The control plane is handled by the NSX Controllers and all replication occurs
locally on the host. This mode does not require multicast IP addresses or physical network
configuration.
 Hybrid Mode. This mode is an optimized version of the unicast mode where local traffic
replication for the subnet is offloaded to the physical network. Hybrid mode requires IGMP
snooping on the first-hop switch and access to an IGMP querier in each VTEP subnet. Hybrid
mode does not require PIM.


Figure 43. Logical Switch Control Plane in Hybrid Mode

This design uses hybrid mode for control plane replication.


Table 79. Logical Switch Control Plane Mode Decision

Decision ID: SDDC-VI-SDN-011
Design Decision: Use hybrid mode for control plane replication.
Design Justification: Offloading multicast processing to the physical network reduces pressure on VTEPs as the environment scales out. For large environments, hybrid mode is preferable to unicast mode. Multicast mode is used only when migrating from existing VXLAN solutions.
Design Implications: IGMP snooping must be enabled on the ToR physical switch and an IGMP querier must be available.

3.2.5.8 Transport Zone Design


A transport zone is used to define the scope of a VXLAN overlay network and can span one or more clusters
within one vCenter Server domain. One or more transport zones can be configured in an NSX for vSphere
solution. A transport zone is not meant to delineate a security boundary.


Table 80. Transport Zones Design Decisions

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- For the compute stack, A single Universal Transport You must consider that
SDN-012 use a single universal zone supports extending you can pair up to eight
transport zone that networks and security policies NSX Manager instances. If
encompasses all shared across regions. This allows the solution grows past
edge and compute, and seamless migration of eight NSX Manager
compute clusters from applications across regions instances, you must deploy
all regions. either by cross-vCenter vMotion a new primary manager
or by failover recovery with Site and new transport zone.
Recovery Manager.

SDDC-VI- For the management A single Universal Transport You must consider that
SDN-013 stack, use a single zone supports extending you can pair up to eight
universal transport zone networks and security policies NSX Manager instances. If
that encompasses all across regions. This allows the solution grows past
management clusters. seamless migration of the eight NSX Manager
management applications instances, you must deploy
across regions either by cross- a new primary manager
vCenter vMotion or by failover and new transport zone.
recovery with Site Recovery
Manager.

3.2.5.9 Routing Design


The routing design has to consider different levels of routing in the environment.
 North/south. The Provider Logical Router (PLR) handles the north/south traffic to and from a
tenant and management applications inside of application virtual networks.
 East/west. Internal east/west routing at the layer beneath the PLR deals with the application
workloads.
This design uses a universal distributed logical router (UDLR), which is a universal object that can cross vCenter Server boundaries. The design decision table uses this abbreviation for clarity when the design decisions are viewed in a different context. The rest of this section uses the term distributed logical router (DLR) to mean the same thing.
Table 81. Routing Model Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC- Deploy NSX Edge The NSX ESG is the ECMP requires 2 VLANS
VI-SDN- Services Gateways in an recommended device for for uplinks which adds an
014 ECMP configuration for managing north/south traffic. additional VLAN over
north/south routing in Using ECMP provides multiple traditional HA ESG
both management and paths in and out of the SDDC. configurations.
shared edge and This results in faster failover
compute clusters. times than deploying Edge
service gateways in HA mode.


SDDC- Deploy a single NSX Using the UDLR reduces the hop DLRs are limited to 1,000
VI-SDN- UDLR for the count between nodes attached to logical interfaces. When
015 management cluster to it to 1. This reduces latency and that limit is reached, a
provide east/west routing improves performance. new UDLR must be
across all regions. deployed.

SDDC- Deploy a single NSX Using the UDLR reduces the hop DLRs are limited to 1,000
VI-SDN- UDLR for the shared count between nodes attached to logical interfaces. When
016 edge and compute, and it to 1. This reduces latency and that limit is reached a new
compute clusters to improves performance. UDLR must be deployed.
provide east/west routing
across all regions.

SDDC- Deploy all NSX UDLRs When local egress is enabled, All north/south traffic is
VI-SDN- without the local egress control of ingress traffic is also routed through Region A
017 option enabled. necessary (for example using until those routes are no
NAT). This becomes hard to longer available. At that
manage for little to no benefit. time, all traffic dynamically
changes to Region B.

SDDC- Use BGP as the Using BGP as opposed to OSPF BGP requires configuring
VI-SDN- dynamic routing protocol eases the implementation of each ESG and UDLR with
018 inside the SDDC. dynamic routing. There is no the remote router that it
need to plan and design access exchanges routes with.
to OSPF area 0 inside the SDDC.
OSPF area 0 varies based on
customer configuration.

SDDC- Configure BGP Keep With Keep Alive and Hold Timers If an ESXi host becomes
VI-SDN- Alive Timer to 1 and between the UDLR and ECMP resource constrained, the
019 Hold Down Timer to 3 ESGs set low, a failure is ESG running on that host
between the UDLR and detected quicker, and the routing might no longer be used
all ESGs that provide table is updated faster. even though it is still up.
north/south routing.

SDDC- Configure BGP Keep This provides a good balance By using longer timers to
VI-SDN- Alive Timer to 4 and between failure detection detect when a router is
020 Hold Down Timer to 12 between the ToRs and the ESGs dead, a dead router stays
between the ToR and overburdening the ToRs with in the routing table longer
switches and all ESGs keep alive traffic. and continues to send
providing north/south traffic to a dead router.
routing.
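The keepalive and hold-down values above follow the common rule that the hold timer is at least three times the keepalive timer. The following Python fragment, with the two peering classes from this design hard-coded as illustrative data, only checks that relationship and prints the approximate worst-case failure-detection time; it is a planning aid and does not interact with any device.

# Keepalive / hold-down timer pairs from the routing design (seconds).
BGP_TIMERS = {
    "UDLR to ECMP ESGs": {"keepalive": 1, "hold_down": 3},
    "ToR to ECMP ESGs": {"keepalive": 4, "hold_down": 12},
}

for peering, timers in BGP_TIMERS.items():
    keepalive, hold_down = timers["keepalive"], timers["hold_down"]
    ratio_ok = hold_down >= 3 * keepalive
    print("%s: keepalive=%ds hold=%ds, worst-case detection ~%ds, ratio_ok=%s"
          % (peering, keepalive, hold_down, hold_down, ratio_ok))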

Transit Network and Dynamic Routing


Dedicated networks are needed to facilitate traffic between the universal dynamic routers and edge
gateways, and to facilitate traffic between edge gateways and the top of rack switches. These
networks are used for exchanging routing tables and for carrying transit traffic.


Table 82. Transit Network Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC- Create a universal virtual switch The universal virtual switch Only the primary NSX
VI-SDN- for use as the transit network allows the UDLR and all Manager can create
021 between the UDLR and ESGs. ESGs across regions to and manage universal
The UDLR provides north/south exchange routing objects including this
routing in both compute and information. UDLR.
management stacks.

SDDC- Create two VLANs in each region. This enables the ESGs to Extra VLANs are
VI-SDN- Use those VLANs to enable have multiple equal-cost required.
022 ECMP between the north/south routes and provides more
ESGs and the ToR switches. resiliency and better
bandwidth utilization in the
Each ToR has an SVI on each
network.
VLAN and each north/south ESG
also has an interface on each
VLAN.

3.2.5.10 Firewall Logical Design


The NSX Distributed Firewall is used to protect all management applications attached to application
virtual networks. To secure the SDDC, only other solutions in the SDDC and approved administration
IPs can directly communicate with individual components. External facing portals are accessible via a
load balancer virtual IP (VIP). This simplifies the design by having a single point of administration for
all firewall rules. The firewall on individual ESGs is set to allow all traffic. The exception is ESGs that
provide ECMP services, which require the firewall to be disabled.
Table 83. Tenant Firewall Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- For all ESGs deployed Restricting and granting access is Explicit rules to allow
SDN-023 as load balancers, set handled by the distributed access to management
the default firewall rule firewall. The default firewall rule applications must be
to allow all traffic. does not have to do it. defined in the distributed
firewall.

SDDC-VI- For all ESGs deployed Use of ECMP on the ESGs is a Services such as NAT and
SDN-024 as ECMP north/south requirement. Leaving the firewall load balancing cannot be
routers, disable the enabled, even in allow all traffic used when the firewall is
firewall. mode, results in sporadic network disabled.
connectivity.

3.2.5.11 Load Balancer Design


The ESG implements load balancing within NSX for vSphere. The ESG has both a Layer 4 and a
Layer 7 engine that offer different features, summarized in the following table.


Table 84. Load Balancer Features of NSX Edge Services Gateway

Protocols. Layer 4 engine: TCP. Layer 7 engine: TCP, HTTP, HTTPS (SSL Pass-through), HTTPS (SSL Offload).

Load balancing method. Layer 4 engine: Round Robin, Source IP Hash, Least Connection. Layer 7 engine: Round Robin, Source IP Hash, Least Connection, URI.

Health checks. Layer 4 engine: TCP. Layer 7 engine: TCP, HTTP (GET, OPTION, POST), HTTPS (GET, OPTION, POST).

Persistence (keeping client connections to the same back-end server). Layer 4 engine: TCP: SourceIP. Layer 7 engine: TCP: SourceIP, MSRDP; HTTP: SourceIP, Cookie; HTTPS: SourceIP, Cookie, ssl_session_id.

Connection throttling. Layer 4 engine: No. Layer 7 engine: Client side: maximum concurrent connections, maximum new connections per second. Server side: maximum concurrent connections.

High availability. Layer 4 engine: Yes. Layer 7 engine: Yes.

Monitoring. Layer 4 engine: View VIP (Virtual IP), Pool, and Server objects and stats via CLI and API; view global stats for VIP sessions from the vSphere Web Client. Layer 7 engine: View VIP, Pool, and Server objects and statistics by using CLI and API; view global statistics about VIP sessions from the vSphere Web Client.

Layer 7 manipulation. Layer 4 engine: No. Layer 7 engine: URL block, URL rewrite, content rewrite.
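To illustrate what the Layer 7 health checks in the preceding table amount to, the sketch below performs a simple HTTP GET probe against a hypothetical pool member, which is conceptually what the Edge load balancer does at its configured interval. The member URL and timeout are made-up values for this example; the real checks are configured on the Edge itself rather than run from a script.

import urllib.request

POOL_MEMBER_URL = "http://192.168.11.53/health"   # hypothetical pool member and health URI
TIMEOUT_SECONDS = 5                               # hypothetical probe timeout

def http_health_check(url, timeout):
    """Return True when the member answers successfully (HTTP 2xx after redirects)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        return False

print("member healthy:", http_health_check(POOL_MEMBER_URL, TIMEOUT_SECONDS))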


Table 85. NSX for vSphere Load Balancer Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC- Use the NSX load The NSX load balancer can support None.
VI-SDN- balancer. the needs of the management
025 applications. Using another load
balancer would increase cost and
add another component to be
managed as part of the SDDC.

SDDC- Use a single NSX All management applications that One management
VI-SDN- load balancer in HA require a load balancer are on a application owner could
026 mode for all single virtual wire, having a single make changes to the load
management load balancer keeps the design balancer that impact
applications. simple. another application.

3.2.5.12 Bridging Physical Workloads


NSX for vSphere offers VXLAN to Layer 2 VLAN bridging capabilities with the data path contained
entirely in the ESXi hypervisor. The bridge runs on the ESXi host where the DLR control VM is
located. Multiple bridges per DLR are supported.
Table 86.Virtual to Physical Interface Type Design Decision

Decision ID Design Decision Design Justification Design Implications

SDDC- Place all virtual machines, both Bridging and routing are not Access to
VI-SDN- management and tenant, on VXLAN- possible on the same logical physical
027 backed networks unless you must switch. As a result, it makes workloads is
satisfy an explicit requirement to use sense to attach a VLAN LIF to a routed via the
VLAN-backed port groups for these distributed router or ESG and DLR or ESG.
virtual machines. If VLAN-backed route between the physical and
port groups are required, connect virtual machines. Use bridging
physical workloads that need to only where virtual machines
communicate to virtualized need access only to the physical
workloads to routed VLAN LIFs on a machines on the same Layer 2.
DLR.

3.2.5.13 Region Connectivity


Regions must be connected to each other. Connection types could be point-to-point links, MPLS,
VPN Tunnels, etc. This connection will vary by customer and is out of scope for this design.


Table 87. Inter-Site Connectivity Design Decisions

Decision ID Design Decision Design Justification Design Implications

SDDC-VI- Provide a connection NSX universal objects require connectivity None.


SDN-028 between regions that is between NSX managers and ESXi host
capable of routing VTEPs.
between each pod.
To support cross-region authentication,
the vCenter Server and Platform Services
Controller design requires a single
vCenter Single Sign-On domain.
Portability of management and compute
workloads requires connectivity between
regions.

SDDC-VI- Make sure that the A latency below 150 ms is required for the None.
SDN-029 latency in the connection following features.
between the regions is
Cross-vCenter vMotion
below 150ms.
The NSX design for the SDDC

3.2.5.14 Application Virtual Network


Management applications, such as VMware vRealize Automation, VMware vRealize Operations
Manager, or VMware vRealize Orchestrator, leverage a traditional 3-tier client/server architecture with
a presentation tier (user interface), functional process logic tier, and data tier. This architecture
requires a load balancer for presenting end-user facing services. Implement each of these
management applications as their own trust zone (or multiple trust zones) and isolate management
applications from each other, but also from the external-to-SDDC security zone.


Table 88. Isolated Management Applications Design Decisions

Decision ID: SDDC-VI-SDN-030
Design Decision: Place the following management applications on an application virtual network: vRealize Automation, vRealize Automation Proxy Agents, vRealize Business, vRealize Business collectors, vRealize Orchestrator, vRealize Operations Manager, vRealize Operations Manager remote collectors, and vRealize Log Insight.
Design Justification: Access to the management applications is only through published access points.
Design Implications: The virtual application network is fronted by an NSX Edge device for load balancing and the distributed firewall to isolate applications from each other and external users. Direct access to virtual application networks is controlled by distributed firewall rules.

Decision ID: SDDC-VI-SDN-031
Design Decision: Create three application virtual networks. Each region has a dedicated application virtual network for management applications in that region that do not require failover. One application virtual network is reserved for management application failover between regions.
Design Justification: Using only three application virtual networks simplifies the design by sharing Layer 2 networks with applications based on their needs.
Design Implications: A single /24 subnet is used for each application virtual network. IP management becomes critical to ensure no shortage of IP addresses will appear in the future.


Table 89. Portable Management Applications Design Decision

Decision ID: SDDC-VI-SDN-032
Design Decision: The following management applications must be easily portable between regions: vRealize Automation, vRealize Orchestrator, vRealize Business, and vRealize Operations Manager.
Design Justification: Management applications must be easily portable between regions without requiring reconfiguration.
Design Implications: Unique addressing is required for all management applications.

Having software-defined networking based on NSX in the management stack makes all NSX features
available to the management applications.
This approach to network virtualization service design improves security and mobility of the
management applications, and reduces the integration effort with existing customer networks.


Figure 44. Virtual Application Network Components and Design

3.2.5.15 Tenant Onboarding Process Automation


Certain configuration choices might later facilitate the tenant onboarding process; a skeleton for automating these steps follows the list below.
 Create the primary NSX ESG to act as the tenant PLR and the logical switch that forms the transit
network for use in connecting to the DLR.
 Connect the primary NSX ESG uplinks to the external networks
 Connect the primary NSX ESG internal interface to the transit network.
 Create the NSX DLR to provide routing capabilities for tenant internal networks and connect the
DLR uplink to the transit network.
 Create any tenant networks that are known up front and connect them to the DLR.
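These steps lend themselves to automation against the NSX REST API. The following Python skeleton only sketches the order of operations using a requests-based session; the endpoint paths, object names, and XML payloads are placeholders and must be completed from the NSX for vSphere API guide for the version in use.

import requests

NSX_MANAGER = "https://nsxmgr01.rainpole.local"   # placeholder NSX Manager
session = requests.Session()
session.auth = ("admin", "nsx_admin_password")     # placeholder credentials
session.verify = False                             # lab only
session.headers["Content-Type"] = "application/xml"

def create_transit_logical_switch(scope_id, name):
    """Create the logical switch that forms the tenant transit network (illustrative path)."""
    payload = ("<virtualWireCreateSpec><name>%s</name>"
               "<tenantId>tenant01</tenantId></virtualWireCreateSpec>" % name)
    return session.post("%s/api/2.0/vdn/scopes/%s/virtualwires" % (NSX_MANAGER, scope_id),
                        data=payload)

def deploy_edge(edge_xml):
    """Deploy an NSX Edge (an ESG acting as the tenant PLR, or a DLR) from a prepared XML
    body that defines its uplink and internal interfaces (illustrative path and payload)."""
    return session.post(NSX_MANAGER + "/api/4.0/edges", data=edge_xml)

# Orchestration order mirroring the onboarding steps above:
# 1. Create the transit logical switch.
# 2. Deploy the tenant PLR (ESG) with uplinks to the external network and an internal
#    interface on the transit network.
# 3. Deploy the tenant DLR with its uplink on the transit network.
# 4. Create any tenant networks known up front and connect them to the DLR.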


3.2.5.16 Virtual Network Design Example


The Detailed Example for vRealize Automation Networking illustration shows an example for
implementing a management application virtual network. The example service is vRealize
Automation, but any other 3-tier application would look similar.
Figure 45. Detailed Example for vRealize Automation Networking

The example is set up as follows.


 You deploy vRealize Automation on the application virtual network that is used to fail over
applications between regions. This network is provided by a VXLAN virtual wire (orange network
in Detailed Example for vRealize Automation Networking).
 The network that is used by vRealize Automation connects to external networks through NSX for
vSphere. NSX ESGs and the UDLR route traffic between the application virtual networks and the
public network.
 Services such as a Web GUI, which must be available to the end users of vRealize Automation,
are accessible via the NSX Edge load balancer.
The following table shows an example of a mapping from application virtual networks to IPv4 subnets.
The actual mapping depends on the customer environment and is based on available IP subnets.

Note The following IP ranges are samples. Your actual implementation depends on your
environment.


Table 90. Application Virtual Network Configuration

Application Virtual Network: Mgmt-xRegion01-VXLAN
Management Applications: vRealize Automation (includes vRealize Orchestrator and vRealize Business), vRealize Operations Manager
Internal IPv4 Subnet: 192.168.11.0/24

Application Virtual Network: Mgmt-RegionA01-VXLAN
Management Applications: vRealize Log Insight, vRealize Operations Manager Remote Collectors, vRealize Automation Proxy Agents
Internal IPv4 Subnet: 192.168.31.0/24

Application Virtual Network: Mgmt-RegionB01-VXLAN
Management Applications: vRealize Log Insight, vRealize Operations Manager Remote Collectors, vRealize Automation Proxy Agents
Internal IPv4 Subnet: 192.168.32.0/24
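Because each application virtual network is a single /24, simple tooling can confirm that a sample plan does not overlap and has room for the listed applications. The following Python sketch uses the standard ipaddress module with the sample subnets above; the planned-address count is an arbitrary illustration.

import ipaddress
from itertools import combinations

# Sample application virtual network subnets from the table above.
AVN_SUBNETS = {
    "Mgmt-xRegion01-VXLAN": "192.168.11.0/24",
    "Mgmt-RegionA01-VXLAN": "192.168.31.0/24",
    "Mgmt-RegionB01-VXLAN": "192.168.32.0/24",
}

networks = {name: ipaddress.ip_network(cidr) for name, cidr in AVN_SUBNETS.items()}

# Each /24 offers 254 usable host addresses; compare against the planned usage.
PLANNED_ADDRESSES = 40      # illustrative number of management VMs, VIPs, and reservations
for name, net in networks.items():
    usable = net.num_addresses - 2
    print("%s: %s (%d usable addresses, %d planned)" % (name, net, usable, PLANNED_ADDRESSES))

# Ensure that no two application virtual networks overlap.
for (name_a, net_a), (name_b, net_b) in combinations(networks.items(), 2):
    if net_a.overlaps(net_b):
        raise ValueError("Subnet overlap between %s and %s" % (name_a, name_b))
print("No overlapping subnets in the sample plan.")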

3.2.6 Shared Storage Design


Well-designed shared storage provides the basis for an SDDC and has the following benefits:
 Prevents unauthorized access to business data.
 Protects data from hardware and software failures.
 Protects data from malicious or accidental corruption.
Follow these guidelines when designing shared storage for your environment.
 Optimize the storage design to meet the diverse needs of applications, services, administrators,
and users.
 Strategically align business applications and the storage infrastructure to reduce costs, boost
performance, improve availability, provide security, and enhance functionality.
 Provide multiple tiers of storage to match application data access to application requirements.
 Design each tier of storage with different performance, capacity, and availability characteristics.
Because not every application requires expensive, high-performance, highly available storage,
designing different storage tiers reduces cost.

3.2.6.1 Shared Storage Platform


You can choose between traditional storage, VMware vSphere Virtual Volumes and VMware Virtual
SAN storage.
 Traditional Storage. Fibre Channel, NFS, and iSCSI are mature and viable options to support
virtual machine needs.
 VMware Virtual SAN Storage. VMware Virtual SAN is a software-based distributed storage
platform that combines the compute and storage resources of VMware ESXi hosts. When you
design and size a Virtual SAN cluster, hardware choices are more limited than for traditional
storage.
 Virtual Volumes. This design does not leverage VMware vSphere Virtual Volumes because
Virtual Volumes does not support Site Recovery Manager.

3.2.6.2 Background on Traditional Storage and VMware Virtual SAN Storage


Traditional Storage
Fibre Channel, NFS, and iSCSI are mature and viable options to support virtual machine needs.

Your decision to implement one technology or another can be based on performance and
functionality, and on considerations like the following:
 The organization’s current in-house expertise and installation base
 The cost, including both capital and long-term operational expenses
 The organization’s current relationship with a storage vendor
VMware Virtual SAN
VMware Virtual SAN is a software-based distributed storage platform that combines the compute and
storage resources of ESXi hosts. It provides a simple storage management experience for the user.
This solution makes software-defined storage a reality for VMware customers. However, you must
carefully consider supported hardware options when sizing and designing a Virtual SAN cluster.

3.2.6.3 Storage Type Comparison


ESXi hosts support a variety of storage types. Each storage type supports different vSphere features.
Table 91. Network Shared Storage Supported by ESXi Hosts

Fibre Channel: FC/SCSI protocol; block access of data/LUN; Fibre Channel HBA interface.

Fibre Channel over Ethernet: FCoE/SCSI protocol; block access of data/LUN; converged network adapter (hardware FCoE) or NIC with FCoE support (software FCoE).

iSCSI: IP/SCSI protocol; block access of data/LUN; iSCSI HBA or iSCSI-enabled NIC (hardware iSCSI), or network adapter (software iSCSI).

NAS: IP/NFS protocol; file access (no direct LUN access); network adapter.

Virtual SAN: IP protocol; block access of data; network adapter.

Table 92. vSphere Features Supported by Storage Type

Local Storage: vSphere vMotion Yes; datastore type VMFS; Raw Device Mapping (RDM) No; application or block-level clustering Yes; HA/DRS No; Storage APIs Data Protection Yes.

Fibre Channel / Fibre Channel over Ethernet: vSphere vMotion Yes; datastore type VMFS; RDM Yes; application or block-level clustering Yes; HA/DRS Yes; Storage APIs Data Protection Yes.

iSCSI: vSphere vMotion Yes; datastore type VMFS; RDM Yes; application or block-level clustering Yes; HA/DRS Yes; Storage APIs Data Protection Yes.

NAS over NFS: vSphere vMotion Yes; datastore type NFS; RDM No; application or block-level clustering No; HA/DRS Yes; Storage APIs Data Protection Yes.

Virtual SAN: vSphere vMotion Yes; datastore type Virtual SAN; RDM No; application or block-level clustering No; HA/DRS Yes; Storage APIs Data Protection Yes.

3.2.6.4 Shared Storage Logical Design


The shared storage design selects the appropriate storage device for each type of cluster:
 Management clusters use Virtual SAN for primary storage and NFS for secondary storage.
 Shared edge and compute clusters can use FC/FCoE, iSCSI, NFS, or Virtual SAN storage. No specific guidance is given, as user workloads and other factors determine the storage type and SLA for those workloads.
Figure 46. Logical Storage Design


Table 93. Storage Type Design Decisions

Decision ID: SDDC-VI-Storage-001
Design Decision: In the management cluster, use VMware Virtual SAN and NFS shared storage. Use Virtual SAN as the primary shared storage platform and NFS as the secondary shared storage platform for the management cluster.
Design Justification: Virtual SAN as the primary shared storage solution can take advantage of more cost-effective local storage. Using two storage technologies provides capabilities, such as deduplication and compression, that are not available in Virtual SAN today. NFS is used primarily for archival and the need to maintain historical data. Leveraging NFS provides large, low-cost volumes that have the flexibility to be expanded on a regular basis depending on capacity needs.
Design Implication: The use of two different storage technologies increases the complexity and operational overhead.

Decision ID: SDDC-VI-Storage-002
Design Decision: In the shared edge and compute cluster, ensure that at least 20% of free space is always available.
Design Justification: If the datastore runs out of free space, services that include the NSX Edge core network services fail. To prevent this, maintain adequate free space.
Design Implication: Monitoring and capacity management are critical and must be performed proactively.

3.2.6.5 Storage Tiering


Today’s enterprise-class storage arrays contain multiple drive types and protection mechanisms. The
storage, server, and application administrators face challenges when selecting the correct storage
configuration for each application being deployed in the environment. Virtualization can make this
problem more challenging by consolidating many different application workloads onto a small number
of large devices. Given this challenge, administrators might use a single storage type for every type of
workload without regard to the needs of the particular workload. However, not all application
workloads have the same requirements, and storage tiering allows for these differences by creating
multiple levels of storage with varying degrees of performance, reliability and cost, depending on the
application workload needs.
The most mission-critical data typically represents the smallest amount of data and offline data
represents the largest amount. Details differ for different organizations.
To determine the storage tier for application data, determine the storage characteristics of the
application or service.
 I/O operations per second (IOPS) requirements
 Megabytes per second (MBps) requirements
 Capacity requirements
 Availability requirements
 Latency requirements
After you determine the information for each application, you can move the application to the storage tier with matching characteristics; a small matching sketch follows the list below.
 Consider any existing service-level agreements (SLAs)


 Move data between storage tiers during the application life cycle as needed.
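The tier-selection process described above can be expressed as a simple matching exercise. The sketch below defines hypothetical tiers and one hypothetical application profile purely as an illustration of the decision flow; the numbers are not sizing guidance.

# Hypothetical storage tiers (values are illustrative, not sizing guidance).
STORAGE_TIERS = [
    {"name": "gold",   "max_latency_ms": 5,  "min_iops": 10000, "availability": 99.99},
    {"name": "silver", "max_latency_ms": 15, "min_iops": 2500,  "availability": 99.9},
    {"name": "bronze", "max_latency_ms": 40, "min_iops": 500,   "availability": 99.0},
]

def select_tier(required_iops, max_latency_ms, required_availability):
    """Return the least expensive tier (last match) that still meets the application needs."""
    candidates = [
        tier for tier in STORAGE_TIERS
        if tier["min_iops"] >= required_iops
        and tier["max_latency_ms"] <= max_latency_ms
        and tier["availability"] >= required_availability
    ]
    return candidates[-1]["name"] if candidates else None

# Hypothetical application profile based on the characteristics listed above.
print(select_tier(required_iops=2000, max_latency_ms=20, required_availability=99.9))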

3.2.6.6 VMware Hardware Acceleration API/CLI for Storage


The VMware Hardware Acceleration API/CLI for storage (previously known as vStorage APIs for
Array Integration or VAAI), supports a set of ESXCLI commands for enabling communication between
ESXi hosts and storage devices. The APIs define a set of storage primitives that enable the ESXi host
to offload certain storage operations to the array. Offloading the operations reduces resource
overhead on the ESXi hosts and can significantly improve performance for storage-intensive
operations such as storage cloning, zeroing, and so on. The goal of hardware acceleration is to help
storage vendors provide hardware assistance to speed up VMware I/O operations that are more
efficiently accomplished in the storage hardware.
Without the use of VAAI, cloning or migration of virtual machines by the VMkernel data mover
involves software data movement. The data mover issues I/O to read and write blocks to and from the
source and destination datastores. With VAAI, the data mover can use the API primitives to offload
operations to the array when possible. For example, when you copy a virtual machine disk file
(VMDK file) from one datastore to another inside the same array, the data mover directs the array to
make the copy completely inside the array. If you invoke a data movement operation and the
corresponding hardware offload operation is enabled, the data mover first attempts to use hardware
offload. If the hardware offload operation fails, the data mover reverts to the traditional software
method of data movement.
In nearly all cases, hardware data movement performs significantly better than software data
movement. It consumes fewer CPU cycles and less bandwidth on the storage fabric. Timing
operations that use the VAAI primitives and use esxtop to track values such as CMDS/s, READS/s,
WRITES/s, MBREAD/s, and MBWRTN/s of storage adapters during the operation show performance
improvements.
Table 94. VAAI Design Decisions

Decision ID Design Decision Design Justification Design Implication

SDDC-VI- Select an VAAI offloads tasks to the array Not all VAAI arrays support
Storage- array that itself, enabling the ESXi hypervisor VAAI over NFS. A plugin
003 supports VAAI to use its resources for application from the array vendor is
over NAS workloads and not become a required to enable this
(NFS). bottleneck in the storage functionality.
subsystem.
VAAI is required to support the
desired number of virtual machine
lifecycle operations.

3.2.6.7 Virtual Machine Storage Policies


You can create a storage policy for a virtual machine to specify which storage capabilities and
characteristics are the best match for this virtual machine.

Note VMware Virtual SAN uses storage policies to allow specification of the characteristics of
virtual machines, so you can define the policy on an individual disk level rather than at the
volume level for Virtual SAN.

You can identify the storage subsystem capabilities by using the VMware vSphere API for Storage
Awareness or by using a user-defined storage policy.
 VMware vSphere API for Storage Awareness (VASA). With vSphere API for Storage
Awareness, storage vendors can publish the capabilities of their storage to VMware vCenter
Server, which can display these capabilities in its user interface.


 User-defined storage policy. Defined by using the VMware Storage Policy SDK or VMware
vSphere PowerCLI (see the Sample Scripts), or from the vSphere Web Client.
You can assign a storage policy to a virtual machine and periodically check for compliance so that the
virtual machine continues to run on storage with the correct performance and availability
characteristics.
You can associate a virtual machine with a virtual machine storage policy when you create, clone, or
migrate that virtual machine. If a virtual machine is associated with a storage policy, the vSphere Web
Client shows the datastores that are compatible with the policy. You can select a datastore or
datastore cluster. If you select a datastore that does not match the virtual machine storage policy, the
vSphere Web Client shows that the virtual machine is using non-compliant storage. See Creating and
Managing vSphere Storage Policies.
Table 95. Virtual Machine Storage Policy Design Decisions

Decision ID Design Decision Design Justification Design Implication

SDDC-VI- Do not use The default Virtual SAN If 3rd party or additional VMs
Storage-004 customized virtual storage policy is adequate have different storage
machine storage for the management requirements, additional VM
policies. cluster VMs. storage policies might be
required.

3.2.6.8 vSphere Storage I/O Control Background Information


VMware vSphere Storage I/O Control allows cluster-wide storage I/O prioritization, which results in
better workload consolidation and helps reduce extra costs associated with overprovisioning.
vSphere Storage I/O Control extends the constructs of shares and limits to storage I/O resources.
You can control the amount of storage I/O that is allocated to virtual machines during periods of I/O
congestion, so that more important virtual machines get preference over less important virtual
machines for I/O resource allocation.
When vSphere Storage I/O Control is enabled on a datastore, the ESXi host monitors the device
latency when communicating with that datastore. When device latency exceeds a threshold, the
datastore is considered to be congested and each virtual machine that accesses that datastore is
allocated I/O resources in proportion to their shares. Shares are set on a per-virtual machine basis
and can be adjusted.
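As a worked illustration of share-based allocation during congestion, the following Python fragment divides a datastore's available throughput among three hypothetical virtual machines in proportion to their shares. The share values mirror the Low/Normal/High defaults, and the throughput figure is arbitrary.

# Hypothetical VMs on one congested datastore with their disk share values.
VM_SHARES = {"vm-critical": 2000, "vm-normal": 1000, "vm-low": 500}   # High / Normal / Low defaults
DATASTORE_THROUGHPUT_IOPS = 7000                                      # arbitrary available IOPS

total_shares = sum(VM_SHARES.values())
for vm, shares in VM_SHARES.items():
    allocation = DATASTORE_THROUGHPUT_IOPS * shares / total_shares
    print("%s: %d shares -> ~%d IOPS during contention" % (vm, shares, allocation))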
vSphere Storage I/O Control has several requirements, limitations, and constraints.
 Datastores that are enabled with vSphere Storage I/O Control must be managed by a single
vCenter Server system.
 Storage I/O Control is supported on Fibre Channel-connected, iSCSI-connected, and NFS-
connected storage. RDM is not supported.
 Storage I/O Control does not support datastores with multiple extents.
 Before using vSphere Storage I/O Control on datastores that are backed by arrays with
automated storage tiering capabilities, check the VMware Compatibility Guide to verify that the
storage array has been certified as compatible with vSphere Storage I/O Control.
Table 96. Storage I/O Control Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-VI-Storage-005 | Enable Storage I/O Control with the default values on the NFS datastores. | Storage I/O Control ensures that all virtual machines on a datastore receive an equal amount of I/O. | Virtual machines that use more I/O are throttled to allow other virtual machines access to the datastore only when contention occurs on the datastore.
SDDC-VI-Storage-006 | In the shared edge and compute cluster, enable Storage I/O Control with default values. | Storage I/O Control ensures that all virtual machines on a datastore receive an equal amount of I/O. For the NSX components in this shared cluster it is critical that they have equal access to the datastore to avoid network bottlenecks. | Virtual machines that use more I/O are throttled to allow other virtual machines access to the datastore only when contention occurs on the datastore.

3.2.6.9 Datastore Cluster Design


A datastore cluster is a collection of datastores with shared resources and a shared management
interface. Datastore clusters are to datastores what clusters are to ESXi hosts. After you create a
datastore cluster, you can use vSphere Storage DRS to manage storage resources.
vSphere datastore clusters group similar datastores into a pool of storage resources. When vSphere
Storage DRS is enabled on a datastore cluster, vSphere automates the process of initial virtual
machine file placement and balances storage resources across the cluster to avoid bottlenecks.
vSphere Storage DRS considers datastore space usage and I/O load when making migration
recommendations.
When you add a datastore to a datastore cluster, the datastore's resources become part of the
datastore cluster's resources. The following resource management capabilities are also available for
each datastore cluster.
Table 97. Resource Management Capabilities Available for Datastores

Capability | Description
Space utilization load balancing | You can set a threshold for space use. When space use on a datastore exceeds the threshold, vSphere Storage DRS generates recommendations or performs migrations with vSphere Storage vMotion to balance space use across the datastore cluster.
I/O latency load balancing | You can configure the I/O latency threshold to avoid bottlenecks. When I/O latency on a datastore exceeds the threshold, vSphere Storage DRS generates recommendations or performs vSphere Storage vMotion migrations to help alleviate high I/O load.
Anti-affinity rules | You can configure anti-affinity rules for virtual machine disks to ensure that the virtual disks of a virtual machine are kept on different datastores. By default, all virtual disks for a virtual machine are placed on the same datastore.

You can enable vSphere Storage I/O Control or vSphere Storage DRS for a datastore cluster. You
can enable the two features separately, even though vSphere Storage I/O control is enabled by
default when you enable vSphere Storage DRS.


3.2.6.10 vSphere Storage DRS Background Information


vSphere Storage DRS supports automating the management of datastores based on latency and
storage utilization. When configuring vSphere Storage DRS, verify that all datastores use the same
version of VMFS and are on the same storage subsystem. Because vSphere Storage vMotion
performs the migration of the virtual machines, confirm that all prerequisites are met.
vSphere Storage DRS provides a way of balancing usage and IOPS among datastores in a storage
cluster:
 Initial placement of virtual machines is based on storage capacity.
 vSphere Storage DRS uses vSphere Storage vMotion to migrate virtual machines based on
storage capacity.
 vSphere Storage DRS uses vSphere Storage vMotion to migrate virtual machines based on I/O
latency.
 You can configure vSphere Storage DRS to run in either manual mode or in fully automated
mode.
vSphere Storage I/O Control and vSphere Storage DRS manage latency differently.
 vSphere Storage I/O Control distributes the resources based on virtual disk share value after a
latency threshold is reached.
 vSphere Storage DRS measures latency over a period of time. If the latency threshold of vSphere
Storage DRS is met in that time frame, vSphere Storage DRS migrates virtual machines to
balance latency across the datastores that are part of the cluster.
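The distinction above can be summarized with a toy calculation: Storage I/O Control reacts to individual latency samples that cross its threshold, while vSphere Storage DRS evaluates latency aggregated over an observation window before recommending migrations. The Python sketch below uses illustrative sample values and thresholds only; it is not product behavior measured from either feature.

LATENCY_SAMPLES_MS = [12, 18, 35, 40, 22, 19, 33, 41]  # made-up device latencies
THRESHOLD_MS = 30

# Storage I/O Control: throttling applies whenever a sample exceeds the threshold.
sioc_events = sum(1 for sample in LATENCY_SAMPLES_MS if sample > THRESHOLD_MS)

# Storage DRS: the average over the window decides whether to recommend migrations.
window_average = sum(LATENCY_SAMPLES_MS) / len(LATENCY_SAMPLES_MS)

print("Storage I/O Control throttling events:", sioc_events)               # 4
print("Storage DRS migration recommended:", window_average > THRESHOLD_MS)  # False (27.5 ms average)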
When making a vSphere Storage design decision, consider these points:
 Use vSphere Storage DRS where possible.
 vSphere Storage DRS provides a way of balancing usage and IOPS among datastores in a
storage cluster:
o Initial placement of virtual machines is based on storage capacity.
o vSphere Storage vMotion is used to migrate virtual machines based on storage capacity.
o vSphere Storage vMotion is used to migrate virtual machines based on I/O latency.
o vSphere Storage DRS can be configured in either manual or fully automated mode.

3.2.6.11 Virtual SAN Storage Design


The VMware Virtual SAN Storage design in this VMware Validated Design includes conceptual
design, logical design, network design, cluster and disk group design, and policy design.
Conceptual Design
This Virtual SAN design is limited to the management cluster only. The design uses the default
Storage Policy to achieve redundancy and performance within the cluster.
While Virtual SAN can be used within the shared edge and compute cluster, this design currently
gives no guidance for the implementation.


Figure 47. Conceptual Virtual SAN Design

Virtual SAN Logical Design


In a cluster that is managed by vCenter Server, you can manage software-defined storage resources
just as you can manage compute resources. Instead of CPU or memory reservations, limits, and
shares, you can define storage policies and assign them to virtual machines. The policies specify the
characteristics of the storage and can be changed as business requirements change.
VMware Virtual SAN Network Design
When performing network configuration, you have to consider the traffic and decide how to isolate
Virtual SAN traffic.
 Consider how much replication and communication traffic is running between hosts. With VMware
Virtual SAN, the amount of traffic depends on the number of VMs that are running in the cluster,
and on how write-intensive the I/O is for the applications running in the VMs.
 Isolate Virtual SAN traffic on its own Layer 2 network segment. You can do this with dedicated
switches or ports, or by using a VLAN.
The Virtual SAN VMkernel port group is created as part of cluster creation. Configure this port group
on all hosts in a cluster, even for hosts that are not contributing storage resources to the cluster.
The conceptual network diagram below (Figure 48) illustrates the logical design of the network.


Figure 48. Virtual SAN Conceptual Network Diagram

Network Bandwidth Requirements


VMware recommends a 10 Gb Ethernet connection for Virtual SAN to ensure the best and most
predictable performance (IOPS) for the environment. Without it, a significant decrease in array
performance results.

Note Virtual SAN all-flash configurations are supported only with 10 GbE.

Table 98. Network Speed Selection

Design Quality | 1 Gb | 10 Gb | Comments
Availability | o | o | Neither design option impacts availability.
Manageability | o | o | Neither design option impacts manageability.
Performance | ↓ | ↑ | Faster network speeds increase Virtual SAN performance (especially in I/O intensive situations).
Recoverability | ↓ | ↑ | Faster network speeds increase the performance of rebuilds and synchronizations in the environment. This ensures that VMs are properly protected from failures.
Security | o | o | Neither design option impacts security.

Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.

Table 99. Network Bandwidth Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-001 | Use only 10 GbE for VMware Virtual SAN traffic. | Performance with 10 GbE is optimal. Without it, a significant decrease in array performance results. | The physical network must support 10 Gb networking between every host in the Virtual SAN clusters.

VMware Virtual SAN Virtual Switch Type


Virtual SAN supports the use of vSphere Standard Switch or vSphere Distributed Switch. The benefit
of using vSphere Distributed Switch is that it supports Network I/O Control which allows for
prioritization of bandwidth in case of contention in an environment.
This design uses a vSphere Distributed Switch for the Virtual SAN port group to ensure that priority
can be assigned using Network I/O Control to separate and guarantee the bandwidth for Virtual SAN
traffic.
Virtual Switch Design Background
Virtual switch type affects performance and security of the environment.
Table 100. Virtual Switch Types

Design Quality | vSphere Standard Switch | vSphere Distributed Switch | Comments
Availability | o | o | Neither design option impacts availability.
Manageability | ↓ | ↑ | The vSphere Distributed Switch is centrally managed across all hosts, unlike the standard switch which is managed on each host individually.
Performance | ↓ | ↑ | The vSphere Distributed Switch has added controls, such as Network I/O Control, which you can use to guarantee performance for Virtual SAN traffic.
Recoverability | ↓ | ↑ | The vSphere Distributed Switch has added controls, such as Network I/O Control, which you can use to guarantee performance for Virtual SAN traffic.
Security | ↓ | ↑ | The vSphere Distributed Switch has added built-in security controls to help protect traffic.

Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.

Table 101. Virtual Switch Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-002 | Use the existing vSphere Distributed Switch instances in the management and edge clusters. | Provide high availability for Virtual SAN traffic in case of contention by using existing networking components. | All traffic paths are shared over common uplinks.


Jumbo Frames
VMware Virtual SAN supports jumbo frames for Virtual SAN traffic.
A Virtual SAN design should use jumbo frames only if the physical environment is already configured
to support them, they are part of the existing design, or if the underlying configuration does not create
a significant amount of added complexity to the design.
Table 102. Jumbo Frames Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-003 | Use jumbo frames. | Jumbo frames are already used to improve performance of vSphere vMotion and NFS storage traffic. | Every device in the network must support jumbo frames.

VLANs
VMware recommends isolating VMware Virtual SAN traffic on its own VLAN. When a design uses
multiple Virtual SAN clusters, each cluster should use a dedicated VLAN or segment for its traffic.
This approach prevents interference between clusters and helps with troubleshooting cluster
configuration.
Table 103. VLAN Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-004 | Use a dedicated VLAN for Virtual SAN traffic for the management cluster and for the edge cluster. | VLANs ensure traffic isolation. | VLANs span only a single pod. A sufficient number of VLANs are available within each pod and should be used for traffic segregation.

Multicast Requirements
Virtual SAN requires that IP multicast is enabled on the Layer 2 physical network segment that is
used for intra-cluster communication. All VMkernel ports on the Virtual SAN network subscribe to a
multicast group using Internet Group Management Protocol (IGMP).
A default multicast address is assigned to each Virtual SAN cluster at the time of creation. IGMP (v3)
snooping is used to limit Layer 2 multicast traffic to specific port groups. As per the Physical Network
Design, IGMP snooping is configured with an IGMP snooping querier to limit the physical switch ports
that participate in the multicast group to only Virtual SAN VMkernel port uplinks. In some cases, an
IGMP snooping querier can be associated with a specific VLAN. However, vendor implementations
might differ.
Cluster and Disk Group Design
When considering the cluster and disk group design, you have to decide on the Virtual SAN datastore
size, number of hosts per cluster, number of disk groups per host, and the Virtual SAN policy.
VMware Virtual SAN Datastore Size
The size of the Virtual SAN datastore depends on the requirements for the datastore. Consider cost
versus availability to provide the appropriate sizing.


Table 104. Virtual SAN Datastore Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-005 | Management cluster: Minimum 8 TB raw. | Management cluster virtual machines that use Virtual SAN require at least 8 TB of raw storage; however, the total storage footprint does not have to be completely Virtual SAN based. NFS is used for additional shared storage of some management components. | None.

Number of Hosts per Cluster


The number of hosts in the cluster depends on these factors:
 Amount of available space on the Virtual SAN datastore
 Number of failures you can tolerate in the cluster
For example, if the Virtual SAN cluster has only 3 ESXi hosts, only a single failure is supported. If a
higher level of availability is required, additional hosts are required.
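The relationship between failures to tolerate (FTT), replica count, and minimum host count can be expressed as a small formula: for n failures tolerated, Virtual SAN creates n+1 replicas and needs at least 2n+1 hosts contributing storage, and raw capacity grows with the replica count. The Python sketch below uses illustrative capacity figures and is a rough sizing aid, not the official Virtual SAN sizing method (it ignores slack space, cache, and on-disk overhead).

def vsan_sizing(failures_to_tolerate, usable_capacity_tb):
    """Return replica count, minimum hosts contributing storage, and a rough raw-capacity figure."""
    replicas = failures_to_tolerate + 1
    min_hosts = 2 * failures_to_tolerate + 1
    raw_capacity_tb = usable_capacity_tb * replicas   # overhead and slack space not modeled
    return {"replicas": replicas, "min_hosts": min_hosts, "raw_tb": raw_capacity_tb}

print(vsan_sizing(failures_to_tolerate=1, usable_capacity_tb=4))
# {'replicas': 2, 'min_hosts': 3, 'raw_tb': 8}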
Cluster Size Design Background
Table 105. Number of Hosts per Cluster

Design Quality | 3 Hosts | 32 Hosts | 64 Hosts | Comments
Availability | ↓ | ↑ | ↑↑ | The more hosts that are available in the cluster, the more failures the cluster can tolerate.
Manageability | ↓ | ↑ | ↑ | The more hosts in the cluster, the more virtual machines can be in the Virtual SAN environment.
Performance | ↑ | ↓ | ↓ | Having a larger cluster can impact performance if there is an imbalance of resources. Consider performance as you make your decision.
Recoverability | o | o | o | Neither design option impacts recoverability.
Security | o | o | o | Neither design option impacts security.

Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.


Table 106. Cluster Size Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-006 | The management cluster includes 4 ESXi hosts for Virtual SAN. | Having 4 hosts addresses the availability and sizing requirements, and allows you to take an ESXi host offline for maintenance or upgrades without impacting the overall Virtual SAN health. | The availability requirements for the management cluster might cause underutilization of the cluster hosts.

Number of Disk Groups per Host


Disk group sizing is an important factor during volume design.
 If more hosts are available in the cluster, more failures are tolerated in the cluster. This capability
adds cost because additional hardware for the disk groups is required.
 More available disk groups can increase the recoverability of Virtual SAN during a failure.
Consider these data points when deciding on the number of disk groups per host:
 Amount of available space on the Virtual SAN datastore
 Number of failures you can tolerate in the cluster
The optimal number of disk groups is a balance between hardware and space requirements for the
Virtual SAN datastore. More disk groups increase space and provide higher availability. However,
adding disk groups can be cost-prohibitive.
Disk Groups Design Background
The number of disk groups can affect availability and performance.
Table 107. Number of Disk Groups per Host

Design Quality | 1 Disk Group | 3 Disk Groups | 5 Disk Groups | Comments
Availability | ↓ | ↑ | ↑↑ | If more hosts are available in the cluster, the cluster tolerates more failures. This capability adds cost because additional hardware for the disk groups is required.
Manageability | o | o | o | If more hosts are in the cluster, more virtual machines can be managed in the Virtual SAN environment.
Performance | o | ↑ | ↑↑ | If the flash percentage ratio to storage capacity is large, Virtual SAN can deliver increased performance and speed.
Recoverability | o | o | o | More available disk groups can increase the recoverability of Virtual SAN during a failure. Rebuilds complete faster because there are more places to place data and to copy data from.
Security | o | o | o | Neither design option impacts security.


Legend: ↑ = positive impact on quality; ↓ = negative impact on quality; o = no impact on quality.

Table 108. Disk Groups per Host Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-007 | Use a single disk group per ESXi host in the management cluster. | A single disk group provides the required performance and usable space for the datastore. | Losing an SSD in a host takes the disk group offline. Using two or more disk groups can increase availability and performance.

Virtual SAN Policy Design


After you enable and configure VMware Virtual SAN, you can create storage policies that define the
virtual machine storage characteristics. Storage characteristics specify different levels of service for
different virtual machines. The default storage policy tolerates a single failure and has a single disk
stripe. Use the default unless your environment requires policies with non-default behavior. If you
configure a custom policy, Virtual SAN will guarantee it; however, if Virtual SAN cannot guarantee a
policy, you cannot provision a virtual machine that uses the policy unless you enable force
provisioning.
Virtual SAN Policy Options
Policy Design Background
Before making design decisions, understand the policies and the objects to which they can be
applied. The policy options are listed in the following table.
Table 109. Virtual SAN Policy Options

Capability | Use Case | Value | Comments
Number of failures to tolerate | Redundancy | Default 1, Max 3 | A standard RAID 1 mirrored configuration that provides redundancy for a virtual machine disk. The higher the value, the more failures can be tolerated. For n failures tolerated, n+1 copies of the disk are created, and 2n+1 hosts contributing storage are required. A higher n value indicates that more replicas of virtual machines are made, which can consume more disk space than expected.
Number of disk stripes per object | Performance | Default 1, Max 12 | A standard RAID 0 stripe configuration used to increase performance for a virtual machine disk. This setting defines the number of HDDs on which each replica of a storage object is striped. If the value is higher than 1, increased performance can result. However, an increase in system resource usage might also result.
Flash read cache reservation (%) | Performance | Default 0, Max 100% | Flash capacity reserved as read cache for the storage object, expressed as a percentage of the logical object size that will be reserved for that object. Only use this setting for workloads if you must address read performance issues. The downside of this setting is that other objects cannot use a reserved cache. VMware recommends not using these reservations unless it is absolutely necessary because unreserved flash is shared fairly among all objects.
Object space reservation (%) | Thick provisioning | Default 0, Max 100% | The percentage of the storage object that will be thick provisioned upon VM creation. The remainder of the storage will be thin provisioned. This setting is useful if a predictable amount of storage will always be filled by an object, cutting back on repeatable disk growth operations for all but new or non-predictable storage use.
Force provisioning | Override policy | Default: No | Force provisioning allows provisioning to occur even if the currently available cluster resources cannot satisfy the current policy. Force provisioning is useful in case of a planned expansion of the Virtual SAN cluster, during which provisioning of VMs must continue. Virtual SAN automatically tries to bring the object into compliance as resources become available.

By default, policies are configured based on application requirements. However, they are applied
differently depending on the object.


Table 110. Object Policy Defaults

Object | Policy | Comments
Virtual machine namespace | Failures-to-Tolerate: 1 | Configurable. Changes are not recommended.
Swap | Failures-to-Tolerate: 1 | Configurable. Changes are not recommended.
Virtual disk(s) | User-Configured Storage Policy | Can be any storage policy configured on the system.
Virtual disk snapshot(s) | Uses virtual disk policy | Same as virtual disk policy by default. Changes are not recommended.

Note If you do not specify a user-configured policy, the default system policy of 1 failure to tolerate
and 1 disk stripe is used for virtual disk(s) and virtual disk snapshot(s). Policy defaults for the
VM namespace and swap are set statically and are not configurable to ensure appropriate
protection for these critical virtual machine components. Policies must be configured based
on the application’s business requirements. Policies give Virtual SAN its power because it can
adjust how a disk performs on the fly based on the policies configured.

Policy Design Recommendations


Policy design starts with assessment of business needs and application requirements. Use cases for
Virtual SAN must be assessed to determine the necessary policies. Start by assessing the following
application requirements:
 I/O performance and profile of your workloads on a per-virtual-disk basis
 Working sets of your workloads
 Hot-add of additional cache (requires repopulation of cache)
 Specific application best practice (such as block size)
After assessment, configure the software-defined storage module policies for availability and
performance in a conservative manner so that space consumed and recoverability properties are
balanced. In many cases the default system policy is adequate and no additional policies are required
unless specific requirements for performance or availability exist.
Table 111. Policy Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-SDS-008 | Use the default VMware Virtual SAN storage policy. | The default Virtual SAN storage policy provides the level of redundancy that is needed within the management cluster. | Additional policies might be needed if 3rd party VMs are hosted in these clusters because their performance or availability requirements might differ from what the default Virtual SAN policy supports.

3.2.6.12 NFS Storage Design


This NFS design does not give specific vendor or array guidance. Consult your storage vendor for the
configuration settings appropriate for your storage array.


NFS Storage Concepts


NFS (Network File System) presents file devices to an ESXi host for mounting over a network. The
NFS server or array makes its local file systems available to ESXi hosts. The ESXi hosts access the
metadata and files on the NFS array or server using an RPC-based protocol. NFS is implemented
over a standard NIC and is accessed through a VMkernel port (vmknic).
NFS Load Balancing
No load balancing is available for NFS/NAS on vSphere because it is based on single session
connections. You can configure aggregate bandwidth by creating multiple paths to the NAS array, and
by accessing some datastores via one path, and other datastores via another path. You can configure
NIC Teaming so that if one interface fails, another can take its place. However these load balancing
techniques work only in case of a network failure and might not be able to handle error conditions on
the NFS array or on the NFS server. The storage vendor is often the source for correct configuration
and configuration maximums.
NFS Versions
vSphere is compatible with both NFS version 3 and version 4.1; however, not all features can be
enabled when connecting to storage arrays that use NFS v4.1.
Table 112. NFS Version Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-NFS-001 | Use NFS v3 for all NFS hosted datastores. | NFS v4.1 datastores are not supported with Storage I/O Control and with Site Recovery Manager. | NFS v3 does not support Kerberos authentication.

Storage Access
NFS v3 traffic is transmitted in an unencrypted format across the LAN. Therefore, best practice is to
use NFS storage on trusted networks only and to isolate the traffic on dedicated VLANs.
Many NFS arrays have some built-in security, which enables them to control the IP addresses that
can mount NFS exports. Best practice is to use this feature to determine which ESXi hosts can mount
the volumes that are being exported and have read/write access to those volumes. This prevents
unapproved hosts from mounting the NFS datastores.
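Conceptually, the array-side restriction behaves like an allow-list lookup per export. The Python sketch below uses placeholder export paths and ESXi host addresses (none are values from this design) to show the intent: a mount attempt from a host that is not explicitly listed is rejected.

# Placeholder export paths and host IPs, for illustration only.
EXPORT_ALLOW_LIST = {
    "/exports/vdp": {"172.16.11.101", "172.16.11.102"},
    "/exports/vra-content-library": {"172.16.31.101", "172.16.31.102"},
}

def may_mount(export, esxi_host_ip):
    """Permit the mount only if the host appears in the export's allow-list."""
    return esxi_host_ip in EXPORT_ALLOW_LIST.get(export, set())

print(may_mount("/exports/vdp", "172.16.11.101"))  # True  -> approved host
print(may_mount("/exports/vdp", "10.0.0.50"))      # False -> unapproved host is blocked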
Exports
All NFS exports are shared directories that sit on top of a storage volume. These exports control the
access between the endpoints (ESXi hosts) and the underlying storage system. Multiple exports can
exist on a single volume, with different access controls on each.
Table 113. NFS Export Sizing

Export Size per Region Size

vSphere Data Protection 4 TB

vRealize Log Insight Archive 1 TB

vRealize Automation Content Library 1 TB


Figure 49. NFS Storage Exports

Table 114. NFS Export Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-NFS-002 | Create 3 exports to support the management components: vSphere Data Protection, vRealize Log Insight Archive, and vRealize Automation Content Library. | The storage requirements of these management components are separate from the primary storage. | Having multiple exports can introduce operational overhead.
SDDC-NFS-003 | Place the vSphere Data Protection export on its own separate volume as per SDDC-PHY-STO-008. | vSphere Data Protection is I/O intensive. vSphere Data Protection or other applications suffer if vSphere Data Protection is placed on a shared volume. | Dedicated exports can add management overhead to storage administrators.
SDDC-NFS-004 | For each export, limit access to only the application VMs or hosts requiring the ability to mount the storage. | Limiting access helps ensure the security of the underlying data. | Securing exports individually can introduce operational overhead.

NFS Datastores
Within vSphere environments, ESXi hosts mount NFS exports as a file-share instead of using the
VMFS clustering filesystem. For this design, only secondary storage is being hosted on NFS storage.
The datastore construct within vSphere mounts some of the exports, depending on their intended use.
For the vRealize Log Insight archive data, the application maps directly to the NFS export and no
vSphere Datastore is required.
Table 115. NFS Datastore Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-NFS-005 | Create 2 datastores for use across the following clusters. Management cluster: vSphere Data Protection. Shared Edge and Compute cluster: vRealize Automation Content Library. | The application VMs using these data assume that all hosts in the vSphere cluster can access the datastores. | Do not use the NFS datastores as primary VM storage in the management cluster even though that is possible.

3.3 Cloud Management Platform Design


The Cloud Management Platform (CMP) layer is the management component of the SDDC. This
layer includes the Service Catalog, which houses the services available for deployment; Orchestration,
which provides the workflows to deploy catalog items; and the Self-Service Portal, which empowers
end users to take full advantage of the Software-Defined Data Center. vRealize Automation
provides the portal and the catalog, and vRealize Orchestrator provides the orchestration.
Figure 50. Cloud Management Platform Design

3.3.1 vRealize Automation Design


VMware vRealize Automation provides a service catalog from which tenants can deploy applications,
and a portal that lets you deliver a personalized, self-service experience to end users.

3.3.1.1 vRealize Automation Logical and Physical Design


The cloud management layer can deliver multi-platform and multi-vendor cloud services. The cloud
management services in the SDDC provide the following advantages.
 Comprehensive and purpose-built capabilities to provide standardized resources to global
customers in a short time span.
 Multi-platform and multi-vendor delivery methods that integrate with existing enterprise
management systems.


 Central user-centric and business-aware governance for all physical, virtual, private, and public
cloud services.
 Design that meets the customer and business needs and is extensible.

3.3.1.1.1 Physical Design


The physical design consists of characteristics and decisions that support the logical design.
Deployment Considerations
This design uses NSX logical switches to abstract the vRealize Automation application and its
supporting services. This abstraction allows the application to be hosted in any given region
regardless of the underlying physical infrastructure such as network subnets, compute hardware, or
storage types. This design places the vRealize Automation application and its supporting services in
Region A. The same instance of the application manages workloads in both Region A and Region B.

Table 116. vRealize Automation Region Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-001 | Set up vRealize Automation to manage both Region A and Region B deployments from a single instance. | vRealize Automation can manage one or more regions. The abstraction of the vRealize Automation application over virtual networking allows it to be independent from any physical site locations or hardware. | You must size vRealize Automation to accommodate multi-region deployments.


Figure 51. vRealize Automation Design Overview for Region A


Figure 52. vRealize Automation Design Overview for Region B

vRealize Automation Appliance


The vRealize Automation virtual appliance includes the cloud management Web portal and database
services. The vRealize Automation portal allows self-service provisioning and management of cloud
services, as well as authoring blueprints, administration, and governance. The vRealize Automation
virtual appliance uses an embedded PostgreSQL database for catalog persistence and database
replication. The database is configured between two vRealize Automation appliances for high
availability.
Table 117. vRealize Automation Virtual Appliance Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-002 | Deploy two instances of the vRealize Automation virtual appliance to achieve redundancy. | Enable an active/active front-end portal for higher availability. | -
SDDC-CMP-003 | Deploy two appliances that replicate data using the embedded PostgreSQL database. | Enable high availability for vRealize Automation. | In this active/passive configuration, manual failover between the two instances is required.
SDDC-CMP-004 | During deployment, configure the vRealize Automation appliances with 18 GB vRAM. | Supports deployment of vRealize Automation in environments with up to 25,000 Active Directory users. | For environments with more than 25,000 Active Directory users of vRealize Automation, vRAM must be increased to 22 GB.

Table 118. vRealize Automation Virtual Appliance Resource Requirements per Virtual Machine

Attribute | Specification
Number of vCPUs | 4
Memory | 18 GB
vRealize Automation functions | Portal Web site, Application, service catalog, and Identity Manager

vRealize Automation IaaS Web Server


vRealize Automation IaaS Web server provides a user interface within the vRealize Automation portal
Web site for the administration and consumption of IaaS components.

Note The vRealize Automation IaaS Web server is a separate component from the vRealize
Automation appliance.

Table 119. vRealize Automation IaaS Web Server Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-005 | Install two vRealize Automation IaaS Web servers. | vRealize Automation can support between 1,000 and 10,000 virtual machines. | Operational overhead increases as more servers are deployed.

Table 120. vRealize Automation IaaS Web Server Resource Requirements

Attribute | Specification
Number of vCPUs | 2
Memory | 4 GB
Number of vNIC ports | 1
Number of local drives | 1
Total usable capacity | 60 GB (C:)
vRealize Automation functions | Model Manager (Web), IaaS Web
Operating system | Microsoft Windows Server 2012 R2, Microsoft IIS components

vRealize Automation IaaS Manager Service and DEM Orchestrator Server


The vRealize Automation IaaS Manager Service and Distributed Execution Management (DEM)
server are at the core of the vRealize Automation IaaS platform. The vRealize Automation IaaS
Manager Service and DEM server support several functions.
 Manages the integration of vRealize Automation IaaS with external systems and databases.
 Provides multi-tenancy.
 Provides business logic to the DEMs.
 Manages business logic and execution policies.
 Maintains all workflows and their supporting constructs.
A Distributed Execution Manager (DEM) runs the business logic of custom models, interacting with
the database and with external databases and systems as required. DEMs also manage cloud and
physical machines.
Each DEM instance acts in either an Orchestrator role or a Worker role. The DEM Orchestrator
monitors the status of the DEM Workers. If a DEM worker stops or loses the connection to the
Manager Service, the DEM Orchestrator puts the workflow back in the queue. It manages the
scheduled workflows by creating new workflow instances at the scheduled time and allows only one
instance of a particular scheduled workflow to run at a given time. It also preprocesses workflows
before execution. Preprocessing includes checking preconditions for workflows and creating the
workflow's execution history.

Note The vRealize Automation IaaS Manager Service and DEM server are separate servers, but
are installed on the same virtual machine.

Table 121. vRealize Automation IaaS Model Manager and DEM Orchestrator Server Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-006 | Deploy two virtual machines to run both the Automation IaaS Manager Service and the DEM Orchestrator services in a load-balanced pool. | The Automation IaaS Manager Service and DEM Orchestrator share the same active/passive application model. | More resources are required for these two virtual machines to accommodate the load of the two applications. You can scale up the virtual machines later if additional resources are required.

Table 122. vRealize Automation IaaS Manager Service and DEM Orchestrator Server Resource Requirements per Virtual Machine

Attribute | Specification
Number of vCPUs | 2
Memory | 4 GB
Number of vNIC ports | 1
Number of local drives | 1
Total usable capacity | 60 GB (C:)
vRealize Automation functions | Manager Service, DEM Orchestrator
Operating system | Microsoft Windows Server 2012 R2, Microsoft IIS components

vRealize Automation IaaS DEM Worker Virtual Machine


vRealize Automation IaaS DEM Workers are responsible for the provisioning and deprovisioning
tasks initiated by the vRealize Automation portal. DEM Workers communicate with vRealize
Automation endpoints. In this design, the endpoint is vCenter Server.
Table 123. vRealize Automation IaaS DEM Worker Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-007 | Install three DEM Worker instances per DEM host. | Each DEM Worker can process up to 30 concurrent workflows. Beyond this limit, workflows are queued for execution. If the number of concurrent workflows is consistently above 90, you can add additional DEM Workers on the DEM host. | If you add more DEM Workers, you must also provide additional resources to run them.
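A quick way to reason about DEM Worker counts is to divide the sustained peak of concurrent workflows by the 30-workflow limit per worker. The helper below is a hypothetical sizing aid based only on the figures in the decision above.

import math

WORKFLOWS_PER_DEM_WORKER = 30  # stated per-worker concurrency limit

def dem_workers_needed(peak_concurrent_workflows):
    """Workflows beyond capacity are queued, so size for the sustained peak."""
    return max(1, math.ceil(peak_concurrent_workflows / WORKFLOWS_PER_DEM_WORKER))

print(dem_workers_needed(90))   # 3 -> matches three DEM Workers per DEM host
print(dem_workers_needed(120))  # 4 -> add a worker once sustained load exceeds 90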

Table 124. vRealize Automation DEM Worker Resource Requirements per Virtual Machine

Attribute Specification

Number of vCPUs 2

Memory 6 GB

Number of vNIC ports 1

Number of local drives 1


Total usable capacity 60 GB (C:)

vRealize Automation functions DEM Worker

Operating system Microsoft Windows Server 2012 R2

vRealize Automation IaaS Proxy Agent


The vRealize Automation IaaS Proxy Agent is a Windows program that proxies information gathering
from vCenter Server back to vRealize Automation. The IaaS Proxy Agent server provides the
following functions.
 vRealize Automation IaaS Proxy Agent can interact with different types of hypervisors and public
cloud services, such as Hyper-V and AWS. For this design, only the vSphere agent is used.

© 2016 VMware, Inc. All rights reserved.


Page 151 of 220
VMware Validated Design Reference Architecture Guide

 vRealize Automation does not itself virtualize resources, but works with vSphere to provision and
manage the virtual machines. It uses vSphere agents to send commands to and collect data from
vSphere.
Table 125. vRealize Automation IaaS Agent Server Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-008 | Deploy two vRealize Automation vSphere Proxy Agent virtual machines. | Using two virtual machines provides redundancy for vSphere connectivity. | More resources are required because multiple virtual machines are deployed for this function.
SDDC-CMP-009 | Abstract the proxy agent virtual machines on a separate virtual network for independent failover of the main vRealize Automation components across sites. | Allows the failover of the vRealize Automation instance from one site to another independently. | Additional application virtual networks and associated edge devices need to be provisioned for those proxy agents.

Table 126. vRealize Automation IaaS Proxy Agent Resource Requirements per Virtual
Machines

Attribute Specification

Number of vCPUs 2

Memory 4 GB

Number of vNIC ports 1

Number of local drives 1


Total usable capacity 60 GB (C:)

vRealize Automation functions Proxy agent

Operating system Microsoft Windows Server 2012 R2

Load Balancer
Session persistence on a load balancer allows the same server to serve all requests after a session is
established with that server. Session persistence is enabled on the load balancer to direct
subsequent requests from each unique session to the same vRealize Automation server in the load
balancer pool. The load balancer also handles failover for the vRealize Automation Server (Manager
Service) because only one Manager Service is active at any one time. Session persistence is not
enabled for the Manager Service because it is not required.
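The Source IP persistence and IP-HASH settings in the tables that follow both rely on hashing the client address so that, while the pool membership is stable, a given client keeps landing on the same node. The Python sketch below illustrates the principle only; it is not the NSX load balancer implementation, and the node names are hypothetical.

import hashlib

def select_pool_member(client_ip, pool_members):
    """Map a client IP to a pool member; the mapping stays stable until membership changes."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return pool_members[int(digest, 16) % len(pool_members)]

vra_appliance_pool = ["vra01svr01a.rainpole.local", "vra01svr01b.rainpole.local"]  # hypothetical node names
print(select_pool_member("192.168.10.25", vra_appliance_pool))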

Table 127. Load Balancer Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-010 | Set up a load balancer for all vRealize Automation services that support active/active or active/passive configurations. | Required to enable vRealize Automation to handle a greater load and obtain a higher level of availability than without load balancers. | Additional configuration is required to configure the load balancers.

Consider the following load balancer characteristics for vRealize Automation.


Table 128. Load Balancer Application Profile Characteristics

Server Role | Type | Enable SSL Pass-through | Persistence | Persistence Expires in (Seconds)
vRealize Automation | HTTPS (443) | Enabled | Source IP | 120

Table 129. Load Balancer Service Monitoring Characteristics

Monitor | Interval | Timeout | Max Retries | Type | Expected | Method | URL | Receive
vRealize Automation Appliance | 3 | 9 | 3 | HTTPS | 204 | GET | /vcac/services/api/health | -
vRealize Automation IaaS Web | 3 | 9 | 3 | HTTPS | - | GET | /wapi/api/status/web | REGISTERED
vRealize Automation IaaS Manager | 3 | 9 | 3 | HTTPS | - | GET | /VMPSProvision | ProvisionService
vRealize Orchestrator | 3 | 9 | 3 | HTTPS | - | GET | /vco/api/healthstatus | RUNNING
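The appliance monitor in the table above can be exercised by hand with a simple HTTPS probe: a healthy vRealize Automation appliance answers GET /vcac/services/api/health with HTTP 204 within the configured timeout. The Python sketch below is such a manual check (the host name is a placeholder, and certificate verification is disabled for lab use only); it is not the NSX monitor itself.

import ssl
import urllib.request

def vra_appliance_healthy(host, timeout=9):
    """Return True when GET /vcac/services/api/health answers with HTTP 204."""
    context = ssl.create_default_context()
    context.check_hostname = False          # lab-only: skip certificate checks
    context.verify_mode = ssl.CERT_NONE
    url = "https://" + host + "/vcac/services/api/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
            return response.status == 204
    except OSError:
        return False

print(vra_appliance_healthy("vra01svr01.rainpole.local"))  # placeholder host name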

Table 130. Load Balancer Pool Characteristics

Server Role | Algorithm | Monitor | Members | Port | Monitor Port
vRealize Automation Appliance | IP-HASH | vRealize Automation Appliance monitor | vRealize Automation Appliance nodes | 443 | 443
vRealize Automation Remote Console Proxy | IP-HASH | vRealize Automation Appliance monitor | vRealize Automation Appliance nodes | 8444 | 443
vRealize Automation IaaS Web | IP-HASH | vRealize Automation IaaS Web monitor | IaaS web nodes | 443 | 443
vRealize Automation IaaS Manager | IP-HASH | vRealize Automation IaaS Manager monitor | IaaS Manager nodes | 443 | 443
vRealize Orchestrator | IP-HASH | vRealize Automation Orchestrator monitor | vRealize Orchestrator nodes | 8281 | 8281

Table 131. Virtual Server Characteristics

Protocol | Port | Default Pool | Application Profile
HTTPS | 443 | vRealize Automation Appliance Pool | vRealize Automation Profile
HTTPS | 443 | vRealize Automation IaaS Web Pool | vRealize Automation Profile
HTTPS | 443 | vRealize Automation IaaS Manager Pool | vRealize Automation Profile
HTTPS | 8281 | vRealize Orchestrator Pool | vRealize Automation Profile
HTTPS | 8444 | vRealize Automation Remote Console Proxy Pool | vRealize Automation Profile

3.3.1.1.3 vRealize Automation Supporting Infrastructure


To satisfy the requirements of this SDDC design, you configure additional components for vRealize
Automation such as database servers for highly available database service, email server for
notification, and vRealize Business for cost management.
Microsoft SQL Server Database
vRealize Automation uses a Microsoft SQL Server database to store the vRealize Automation IaaS
elements and policies. The database also maintains information about the machines that vRealize
Automation manages.

© 2016 VMware, Inc. All rights reserved.


Page 154 of 220
VMware Validated Design Reference Architecture Guide

Table 132. vRealize Automation SQL Database Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-011 | Set up a Microsoft SQL server that supports the availability and I/O needs of vRealize Automation. | A dedicated or shared SQL server can be used so long as it meets the requirements of vRealize Automation. | Requires additional resources and licenses.
SDDC-CMP-012 | Locate the Microsoft SQL server in the vRealize Automation virtual network or set it up to have global failover available. | For simple failover of the entire vRealize Automation instance from one region to another, the Microsoft SQL server must be running as a VM inside the vRealize Automation application virtual network. If the environment uses a shared SQL server, global failover ensures connectivity from both primary and secondary regions. | Adds additional overhead to managing Microsoft SQL services.
SDDC-CMP-013 | Set up Microsoft SQL server with separate OS volumes for SQL Data, Transaction Logs, TempDB, and Backup. | While each organization might have its own best practices in the deployment and configuration of Microsoft SQL server, high-level best practices recommend separation of database data files and database transaction logs. | You might need to consult with the Microsoft SQL database administrators of your organization for guidance about production deployment in your environment.

Table 133. vRealize Automation SQL Database Server Resource Requirements per VM

Attribute | Specification
Number of vCPUs | 8
Memory | 16 GB
Number of vNIC ports | 1
Number of local drives | 1
Total usable capacity | 80 GB (C:) (OS), 40 GB (D:) (Application), 40 GB (E:) (Database Data), 20 GB (F:) (Database Log), 20 GB (G:) (TempDB), 80 GB (H:) (Backup)
vRealize Automation functions | Microsoft SQL Server Database
Microsoft SQL Version | SQL Server 2012
Operating system | Microsoft Windows Server 2012 R2

PostgreSQL Database Server


The vRealize Automation appliance uses a PostgreSQL database server to maintain the vRealize
Automation portal elements and services, and the information about the catalog items that the
appliance manages.
Table 134. vRealize Automation PostgreSQL Database Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-014 | Use the embedded PostgreSQL database within each vRealize Automation appliance. | Simplifies the design and enables replication of the database across the two vRealize Automation appliances. | None.
SDDC-CMP-015 | Configure the embedded PostgreSQL database to use asynchronous replication. | Asynchronous replication offers a good balance between availability and performance. | Asynchronous replication provides a good level of availability in compliance with the design objectives.

Notification Email Server


vRealize Automation notification emails are sent using SMTP. These emails include notification of
machine creation, expiration, and the notification of approvals received by users. vRealize Automation
supports both anonymous connections to the SMTP server and connections using basic
authentication. vRealize Automation also supports communication with or without SSL.
You create a global, inbound email server to handle inbound email notifications, such as approval
responses. Only one, global inbound email server, which appears as the default for all tenants, is
needed. The email server provides accounts that you can customize for each user, providing separate
email accounts, usernames, and passwords. Each tenant can override these settings. If tenant
administrators do not override these settings before enabling notifications, vRealize Automation uses
the globally configured email server. The server supports both the POP and the IMAP protocol, with
or without SSL certificates.
Table 135. Email Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-016 | Use unencrypted anonymous SMTP. | Simplifies the design and eases the SMTP configuration. | All notifications will be sent unencrypted with no authentication.

Notifications
System administrators configure default settings for both the outbound and inbound emails servers
used to send system notifications. Systems administrators can create only one of each type of server


that appears as the default for all tenants. If tenant administrators do not override these settings
before enabling notifications, vRealize Automation uses the globally configured email server.
System administrators create a global outbound email server to process outbound email notifications,
and a global inbound email server to process inbound email notifications, such as responses to
approvals.
vRealize Business for Cloud Standard
Table 136. vRealize Business for Cloud Standard Design Decision

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-017 | Deploy vRealize Business for Cloud Standard as part of the cloud management platform and integrate it with vRealize Automation. | Tenant and workload costing is provided by vRealize Business for Cloud Standard. Additional capabilities provided by vRealize Business for Cloud Advanced are not needed. | Additional appliances need to be deployed to handle vRealize Business for Cloud Standard and remote collectors.
SDDC-CMP-018 | Use the default vRealize Business for Cloud Standard appliance size. | The default vRealize Business for Cloud Standard appliance size supports up to 10,000 VMs. | None.
SDDC-CMP-019 | Use the default vRealize Business reference costing database. | Default reference costing is based on industry information and is periodically updated. | Default reference costing might not accurately represent actual customer costs. The vRealize Business appliance requires Internet access to periodically update the reference database.
SDDC-CMP-020 | Deploy vRealize Business as a three-VM architecture with remote data collectors in Region A and Region B. | For best performance, the vRealize Business collectors should be regionally local to the resources they are configured to collect. Because this design supports disaster recovery, the CMP can reside in Region A or Region B. | If the environment does not implement disaster recovery support, you must still deploy an additional appliance for the remote data collector, although the vRealize Business server can handle the load on its own.
SDDC-CMP-021 | Deploy the vRealize Business server VM in the cross-region logical network. | The vRealize Business deployment depends on vRealize Automation. During a disaster recovery event, vRealize Business migrates with vRealize Automation. | None.
SDDC-CMP-022 | Deploy a vRealize Business remote data collector VM in each region-specific logical network. | The vRealize Business remote data collector is a region-specific installation. During a disaster recovery event, the remote collector does not need to migrate with vRealize Automation. | The communication with vCenter Server involves an additional Layer 3 hop through an NSX Edge device.

Table 137. vRealize Business for Cloud Standard Virtual Appliance Resource Requirements
per Virtual Machine

Attribute Specification

Number of vCPUs 2

Memory 4 GB

vRealize Automation functions Server or remote collector

3.3.1.2 vRealize Automation Cloud Tenant Design


A tenant is an organizational unit within a vRealize Automation deployment, and can represent a
business unit within an enterprise, or a company that subscribes to cloud services from a service
provider. Each tenant has its own dedicated configuration, although some system-level configuration
is shared across tenants.

3.3.1.2.1 Comparison of Single-Tenant and Multi-Tenant Deployments


vRealize Automation supports deployments with a single tenant or multiple tenants. System-wide
configuration is always performed using the default tenant, and can then be applied to one or more
tenants. For example, system-wide configuration might specify defaults for branding and notification
providers.
Infrastructure configuration, including the infrastructure sources that are available for provisioning, can
be configured in any tenant and is shared among all tenants. The infrastructure resources, such as
cloud or virtual compute resources or physical machines, can be divided into fabric groups managed
by fabric administrators. The resources in each fabric group can be allocated to business groups
within each tenant by using reservations.
 Single-Tenant Deployment. In a single-tenant deployment, all configuration occurs in the default
tenant. Tenant administrators can manage users and groups, and configure tenant-specific
branding, notifications, business policies, and catalog offerings. All users log in to the vRealize
Automation console at the same URL, but the features available to them are determined by their
roles.


 Multi-Tenant Deployment. In a multi-tenant deployment, the system administrator creates new


tenants for each organization that uses the same vRealize Automation instance. Tenant users log
in to the vRealize Automation console at a URL specific to their tenant. Tenant-level configuration
is segregated from other tenants and from the default tenant, although users with system-wide
roles can view and manage configuration across multiple tenants.
The IaaS administrator for each tenant creates fabric groups and appoints fabric administrators to
their respective tenants. Although fabric administrators can create reservations for business
groups in any tenant, in this scenario they typically create and manage reservations within their
own tenants. If the same identity store is configured in multiple tenants, the same users can be
designated as IaaS administrators or fabric administrators for each tenant.

3.3.1.2.2 Tenant Design


This design deploys a single tenant containing two business groups.
 The first business group is designated for production workloads provisioning.
 The second business group is designated for development workloads.
Tenant administrators manage users and groups, configure tenant-specific branding, notifications,
business policies, and catalog offerings. All users log in to the vRealize Automation console at the
same URL, but the features available to them are determined by their roles.
The following diagram illustrates the dual-region tenant design.
Figure 53. Rainpole Cloud Automation Tenant Design for Two Regions


Table 138. Tenant Design Decisions

Decision ID | Design Decision | Design Justification | Design Implication
SDDC-CMP-023 | Configure two business units as business groups (instead of separate tenants). | Allows transparency across the environments and some level of sharing of resources and services such as blueprints. | Some elements such as build profiles are visible to both business groups. The design does not provide full isolation for security or auditing.
SDDC-CMP-024 | Create separate fabric groups for each deployment region. Each fabric group represents region-specific data center resources. Each of the business groups has reservations into each of the fabric groups. | Provides future isolation of fabric resources and potential delegation of duty to independent fabric administrators. | Initial deployment will use a single shared fabric that consists of one compute pod.
SDDC-CMP-025 | Allow access to the default tenant only by the system administrator and for the purposes of managing tenants and modifying system-wide configurations. | Isolates the default tenant from individual tenant configurations. | Each tenant administrator is responsible for managing their own tenant configuration.

Service Catalog
The service catalog provides a common interface for consumers of IT services to use to request and
manage the services and resources they need.
A tenant administrator or service architect can specify information about the service catalog, such as the service hours, support team, and change window. While the catalog does not enforce service-level agreements on services, this information (service hours, support team, and change window) is available to business users browsing the service catalog.
Table 139. Service Catalog Design Decision

Decision ID: SDDC-CMP-026
Design Decision: Set up the Rainpole service catalog with the following services:
• Common. Any blueprints or advanced services that are not tied to a specific data center.
• Region A. Service catalog that is dedicated to Region A.
• Region B. Service catalog that is dedicated to Region B.
Design Justification: Distinguishes the blueprints and services that will be provisioned in specific regions without over-complicating the naming convention of those catalog items.
Design Implication: None.


Catalog Items
Users can browse the service catalog for catalog items they are entitled to request. For some catalog
items, a request results in the provisioning of an item that the user can manage. For example, the
user can request a virtual machine with Windows 2012 preinstalled, and then manage that virtual
machine after it has been provisioned.
Tenant administrators define new catalog items and publish them to the service catalog. The tenant
administrator can then manage the presentation of catalog items to the consumer and entitle new
items to consumers. To make the catalog item available to users, a tenant administrator must entitle
the item to the users and groups who should have access to it. For example, some catalog items may
be available only to a specific business group, while other catalog items may be shared between
business groups using the same tenant. The administrator determines what catalog items are
available to different users based on their job functions, departments, or location.
Typically, a catalog item is defined in a blueprint, which provides a complete specification of the
resource to be provisioned and the process to initiate when the item is requested. It also defines the
options available to a requester of the item, such as virtual machine specifications or lease duration,
or any additional information that the requester is prompted to provide when submitting the request.
Table 140. Catalog Items – Common Service Catalog Design Decision

Decision ID: SDDC-CMP-027
Design Decision: Create machine blueprint types for IaaS virtual machine provisioning.
Design Justification: Machine blueprints form the foundational component of the IaaS portal workload deployments.
Design Implication: Provisioning is limited to virtual machines. This design does not configure cloud blueprints or physical provisioning.

Machine Blueprints
A machine blueprint is the complete specification for a virtual, cloud or physical machine. A machine
blueprint determines the machine's attributes, how it is provisioned, and its policy and management
settings. Machine blueprints are published as catalog items in the service catalog.
Machine blueprints can be specific to a business group or shared among groups within a tenant.
Tenant administrators can create shared blueprints that can be entitled to users in any business
group within the tenant. Business group managers can create group blueprints that can only be
entitled to users within a specific business group. A business group manager cannot modify or delete
shared blueprints. Tenant administrators cannot view or modify group blueprints unless they also
have the business group manager role for the appropriate group.
If a tenant administrator sets a shared blueprint's properties so that it can be copied, the business
group manager can also copy the shared blueprint for use as a starting point to create a new group
blueprint.
Table 141. Single Machine Blueprints

Base Windows Server (Development): Standard Rainpole SOE deployment of Windows 2012 R2 available to the Development business group.
Base Windows Server (Production): Standard Rainpole SOE deployment of Windows 2012 R2 available to the Production business group.
Base Linux (Development): Standard Rainpole SOE deployment of Linux available to the Development business group.
Base Linux (Production): Standard Rainpole SOE deployment of Linux available to the Production business group.
Windows Server + SQL Server (Production): Base Windows 2012 R2 Server with a silent SQL Server 2012 install with custom properties. Available to the Production business group.
Windows Server + SQL Server (Development): Base Windows 2012 R2 Server with a silent SQL Server 2012 install with custom properties. Available to the Development business group.

Blueprint Definitions
The following sections provide details of each service definition that has been included as part of the
current phase of cloud platform deployment.
Table 142. Base Windows Server Blueprint

Service Name: Base Windows Server
Provisioning Method: When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement: Both Production and Development business group members.
Approval Process: No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details: Windows Server 2012 R2
Configuration: Disk: single disk drive. Network: standard vSphere networks.
Lease and Archival Details: Lease: Production blueprints – no expiration date; Development blueprints – minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements: Email sent to manager confirming the service request (includes description details).

Table 143. Base Windows Blueprint Sizing

Sizing      vCPU    Memory (GB)    Storage (GB)
Default     1       4              60
Maximum     4       16             60

Table 144. Base Linux Server Requirements and Standards

Service Name: Base Linux Server
Provisioning Method: When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement: Both Production and Development business group members.
Approval Process: No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details: Red Hat Enterprise Server 6
Configuration: Disk: single disk drive. Network: standard vSphere networks.
Lease and Archival Details: Lease: Production blueprints – no expiration date; Development blueprints – minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements: Email sent to manager confirming the service request (includes description details).

Table 145. Base Linux Blueprint Sizing

Sizing      vCPU    Memory (GB)    Storage (GB)
Default     1       6              20
Maximum     4       12             20


Table 146. Base Windows Server with SQL Server Install Requirements and Standards

Service Name: Base Windows Server
Provisioning Method: When users select this blueprint, vRealize Automation clones a vSphere virtual machine template with preconfigured vCenter customizations.
Entitlement: Both Production and Development business group members.
Approval Process: No approval (pre-approval assumed based on approved access to platform).
Operating System and Version Details: Windows Server 2012 R2
Configuration: Disk: single disk drive. Network: standard vSphere networks. Silent install: the blueprint calls a silent script using the vRA Agent to install SQL Server 2012 with custom properties.
Lease and Archival Details: Lease: Production blueprints – no expiration date; Development blueprints – minimum 30 days, maximum 270 days. Archive: 15 days.
Pre- and Post-Deployment Requirements: Email sent to manager confirming the service request (includes description details).

Table 147. Base Windows with SQL Server Blueprint Sizing

Sizing      vCPU    Memory (GB)    Storage (GB)
Default     1       8              100
Maximum     4       16             400
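
The sizing limits in Tables 143, 145, and 147 can also be captured outside the portal for quick validation of requests against the documented maximums. The following Python sketch is illustrative only: the dictionary layout, the shortened blueprint keys, and the helper function are assumptions for this example and are not part of vRealize Automation.

# Illustrative only: blueprint sizing limits transcribed from Tables 143, 145, and 147.
# The structure and names below are assumptions for this sketch, not vRealize Automation APIs.

BLUEPRINT_SIZING = {
    "Base Windows Server":         {"default": (1, 4, 60),  "maximum": (4, 16, 60)},
    "Base Linux Server":           {"default": (1, 6, 20),  "maximum": (4, 12, 20)},
    "Windows Server + SQL Server": {"default": (1, 8, 100), "maximum": (4, 16, 400)},
}  # values are (vCPU, memory GB, storage GB)

def validate_request(blueprint, vcpu, memory_gb, storage_gb):
    """Return True if a requested size stays within the blueprint's documented maximums."""
    max_vcpu, max_mem, max_storage = BLUEPRINT_SIZING[blueprint]["maximum"]
    return vcpu <= max_vcpu and memory_gb <= max_mem and storage_gb <= max_storage

if __name__ == "__main__":
    print(validate_request("Base Windows Server", 2, 8, 60))  # True: within documented limits
    print(validate_request("Base Linux Server", 8, 12, 20))   # False: exceeds the 4 vCPU maximum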

Branding of the vRealize Automation Console


System administrators can change the appearance of the vRealize Automation console to meet site-
specific branding guidelines by changing the logo, the background color, or information in the header
and footer.
System administrators control the default branding for tenants. Tenant administrators can use the
default or reconfigure branding for each tenant.
Table 148. Tenant Branding Decisions

Decision ID: SDDC-CMP-028
Design Decision: Perform branding with corporate logo and colors on the tenant and default tenant web sites.
Design Justification: Provides a consistent look and feel in accordance with corporate standards.
Design Implication: The logo image must be provided in 800x52 pixel size.

Decision ID: SDDC-CMP-029
Design Decision: Set the product name to Infrastructure Service Portal.
Design Justification: Neutral default. This description can be configured on a per-tenant basis.
Design Implication: Users see this name as the portal display name by default.


3.3.1.3 vRealize Automation Infrastructure as a Service Design


The following diagram illustrates the logical design of the vRealize Automation groups and vSphere
Resources:
Figure 54. vRealize Automation Logical Design


The following terms apply to vRealize Automation when integrated with vSphere. These terms and
their meaning may vary from the way they are used when referring only to vSphere.
Table 149. Terms and Definitions

vSphere (vCenter Server) endpoint
Provides information required by vRealize Automation IaaS to access vSphere compute resources. It requires the appropriate permissions for the vSphere proxy agent to manage the vCenter Server instance.

Compute resource
Virtual object within vRealize Automation that represents a vCenter Server cluster or resource pool, and datastores or datastore clusters. vRealize Automation provisions the virtual machines requested by business group members on the compute resource.
Note: Compute resources are CPU, memory, storage, and networks. Datastores and datastore clusters are part of the overall storage resources.

Fabric groups
vRealize Automation IaaS organizes compute resources into fabric groups.

Fabric administrators
Fabric administrators manage compute resources, which are organized into fabric groups.

Compute reservation
A share of compute resources (vSphere cluster, resource pool, datastores, or datastore clusters), such as CPU and memory, reserved for use by a particular business group for provisioning virtual machines.
Note: vRealize Automation uses the term reservation to define resources (be they memory, storage, or networks) in a cluster. This is different from the use of reservation in vCenter Server, where a share is a percentage of total resources, and a reservation is a fixed amount.

Storage reservation
Similar to a compute reservation (see above), but pertaining only to a share of the available storage resources. In this context, you specify a storage reservation in terms of gigabytes from an existing LUN or datastore.

Business groups
A collection of virtual machine consumers, usually corresponding to an organization's business units or departments. Only users in the business group can request virtual machines.

Reservation policy
vRealize Automation IaaS determines the reservation (also called virtual reservation) from which a particular virtual machine is provisioned. The reservation policy is a logical label or a pointer to the original reservation. Each virtual reservation can be added to one reservation policy.

Build profile
A set of user-defined properties you apply to a virtual machine when it is provisioned. For example, the operating system used in a blueprint, or the available networks to use for connectivity at the time of provisioning the virtual machine. Build profile properties determine the specification of the virtual machine, the manner in which it is provisioned, operations to perform after it is provisioned, or management information maintained within vRealize Automation.

Blueprint
The complete specification for a virtual machine, determining the machine attributes, the manner in which it is provisioned, and its policy and management settings. A blueprint allows the users of a business group to create virtual machines on a virtual reservation (compute resource) based on the reservation policy, and using platform and cloning types. It also lets you specify or add machine resources and build profiles.

The following figure shows the logical design constructs discussed in the previous section as they apply to a deployment of vRealize Automation integrated with vSphere in a cross-data center provisioning scenario.


Figure 55. vRealize Automation Integration with vSphere Endpoint

3.3.1.3.1 Infrastructure Source Endpoints


An infrastructure source endpoint is a connection to the infrastructure that provides a set (or multiple
sets) of resources, which can then be made available by IaaS administrators for consumption by end
users. vRealize Automation IaaS regularly collects information about known endpoint resources and
the virtual resources provisioned therein. Endpoint resources are referred to as compute resources
(or as compute pods, the terms are often used interchangeably).
Infrastructure data is collected through proxy agents that manage and communicate with the endpoint resources. This information about the compute resources on each infrastructure endpoint, and the machines provisioned on each compute resource, is collected at regular intervals.

During installation of the vRealize Automation IaaS components, you can configure the proxy agents
and define their associated endpoints. Alternatively, you can configure the proxy agents and define
their associated endpoints separately after the main vRealize Automation installation is complete.
Table 150. Endpoint Design Decisions

Decision ID: SDDC-CMP-030
Design Decision: Create two vSphere endpoints.
Design Justification: One vSphere endpoint is required to connect to the vCenter Server instance in each region. Two endpoints are needed for two regions.
Design Implication: As additional regions are brought online, additional vSphere endpoints need to be deployed.

Decision ID: SDDC-CMP-031
Design Decision: Create one vRealize Orchestrator endpoint.
Design Justification: vRealize Automation extensibility uses vRealize Orchestrator. One vRealize Orchestrator cluster exists, which requires the creation of a single endpoint.
Design Implication: Using external vRealize Orchestrator requires manual configuration of a vRealize Orchestrator endpoint.

3.3.1.3.2 Virtualization Compute Resources


A virtualization compute resource is a vRealize Automation object that represents an ESXi host or a
cluster of ESXi hosts (vSphere cluster). When a group member requests a virtual machine, the virtual
machine is provisioned on these compute resources. vRealize Automation regularly collects
information about known compute resources and the virtual machines provisioned on them through
the proxy agents.

Table 151. Compute Resource Design Decision

Decision ID: SDDC-CMP-032
Design Decision: Create two compute resources.
Design Justification: Each region has one compute cluster; one compute resource is required for each cluster.
Design Implication: As additional compute clusters are created, they need to be added to the existing compute resource in their region or to a new resource, which has to be created.

Note By default, compute resources are provisioned to the root of the compute cluster. If desired,
compute resources can be configured to provision to a specific resource pool. This design
does not use resource pools.

Fabric Groups
A fabric group is a logical container of several compute resources, and can be managed by fabric
administrators.

Table 152. Fabric Group Design Decisions

Decision ID: SDDC-CMP-033
Design Decision: Create a fabric group for each region and include all the compute resources and edge resources in that region.
Design Justification: To enable region-specific provisioning, a fabric group must be created in each region.
Design Implication: As additional clusters are added in a region, they must be added to the fabric group.

Business Groups
A Business group is a collection of machine consumers, often corresponding to a line of business,
department, or other organizational unit. To request machines, a vRealize Automation user must
belong to at least one Business group. Each group has access to a set of local blueprints used to
request machines.
Business groups have the following characteristics:
• A group must have at least one business group manager, who maintains blueprints for the group and approves machine requests.
• Groups can contain support users, who can request and manage machines on behalf of other group members.
• A vRealize Automation user can be a member of more than one business group, and can have different roles in each group.
Table 153. Business Group Design Decision

Decision ID: SDDC-CMP-034
Design Decision: Create two business groups, one for production users and one for development users.
Design Justification: Creating two groups, one for each type of user, allows different permissions and access to be applied to each type of user.
Design Implication: Creating more business groups results in more administrative overhead.

Reservations
A reservation is a share of one compute resource's available memory, CPU and storage reserved for
use by a particular fabric group. Each reservation is for one fabric group only but the relationship is
many-to-many. A fabric group might have multiple reservations on one compute resource, or
reservations on multiple compute resources, or both.
Converged Compute/Edge Clusters and Resource Pools
While reservations provide a method to allocate a portion of the cluster memory or storage within vRealize Automation, reservations do not control how CPU and memory are allocated during periods of contention on the underlying vSphere compute resources. vSphere resource pools are used to control the allocation of CPU and memory during times of resource contention on the underlying hosts. To take full advantage of this, all virtual machines must be deployed into one of three resource pools: SDDC-EdgeRP01, User-EdgeRP01, or User-VMRP01. SDDC-EdgeRP01 is dedicated to data center-level NSX Edge components and should not contain any user workloads. User-EdgeRP01 is dedicated to any statically or dynamically deployed NSX components, such as NSX Edges or load balancers, that serve a specific customer workload. User-VMRP01 is dedicated to any statically or dynamically deployed virtual machines, such as Windows, Linux, or database servers, that contain specific customer workloads.


Table 154. Reservation Design Decisions

Decision ID: SDDC-CMP-035
Design Decision: Create four reservations – two for production and two for development.
Design Justification: Each resource cluster will have two reservations, one for production and one for development, allowing both production and development workloads to be provisioned.
Design Implication: Because production and development share the same compute resources, the development business group must be limited to a fixed amount of resources.

Decision ID: SDDC-CMP-036
Design Decision: Create two edge reservations – one in each region.
Design Justification: An edge reservation in each region allows NSX to create edge services gateways on demand and place them on the edge cluster.
Design Implication: The workload reservation must define the edge reservation in the network settings.

Decision ID: SDDC-CMP-037
Design Decision: All reservations for production and development workloads are configured to use dedicated vCenter resource pools.
Design Justification: To ensure dedicated compute resources for NSX networking components, end-user deployed workloads must be assigned to a dedicated end-user workload resource pool. Workloads provisioned at the root resource pool level receive more resources than those in resource pools, which would starve the virtual machines in the resource pools during contention.
Design Implication: Cloud administrators must ensure all workload reservations are configured with the appropriate resource pool. This may be a single resource pool for both production and development workloads, or two resource pools, one dedicated to the Development business group and one dedicated to the Production business group.

Decision ID: SDDC-CMP-038
Design Decision: All reservations for dynamically provisioned NSX Edge components (routed gateway) are configured to use a dedicated vCenter resource pool.
Design Justification: To ensure dedicated compute resources for NSX networking components, end-user deployed NSX Edge components must be assigned to a dedicated end-user network component resource pool. Workloads provisioned at the root resource pool level receive more resources than those in resource pools, which would starve the virtual machines in the resource pools during contention.
Design Implication: Cloud administrators must ensure all workload reservations are configured with the appropriate resource pool.

Decision ID: SDDC-CMP-039
Design Decision: All vCenter resource pools used for edge or compute workloads must be created at the root level. Nesting of resource pools is not recommended.
Design Justification: Nesting of resource pools can create administratively complex resource calculations that may result in unintended under- or over-allocation of resources during contention situations.
Design Implication: All resource pools must be created at the root resource pool level.


Reservation Policies
You can add each virtual reservation to one reservation policy. The reservation from which a
particular virtual machine is provisioned is determined by vRealize Automation based on the
reservation policy specified in the blueprint (if any), the priorities and current usage of the fabric
group's reservations, and other custom properties.

Table 155. Reservation Policy Design Decisions

Decision ID: SDDC-CMP-040
Design Decision: Create four workload reservation policies for production and development blueprints.
Design Justification: Two reservation policies are required in each region, one for production and the other for development.
Design Implication: As more groups are created, reservation policies for those groups must be created.

Decision ID: SDDC-CMP-041
Design Decision: Create two edge reservations for placement of the edge services gateways into the edge clusters.
Design Justification: Required to place the edge devices into their respective edge clusters.

A storage reservation policy is a set of datastores that can be assigned to a machine blueprint to
restrict disk provisioning to only those datastores. Storage reservation policies are created and
associated with the appropriate datastores and assigned to reservations.
Table 156. Storage Reservation Policy Design Decisions

Decision ID: SDDC-CMP-042
Design Decision: This design does not use storage tiers.
Design Justification: The underlying physical storage design does not use storage tiers.
Design Implication: Both business groups will have access to the same storage.

Template Synchronization
This dual-region design supports provisioning workloads across regions from the same portal using
the same single-machine blueprints. A synchronization mechanism is required to have consistent
templates across regions. There are multiple ways to achieve synchronization, for example, vSphere
Content Library or external services like vCloud Connector or vSphere Replication.

Table 157. Template Synchronization Design Decision

Decision ID: SDDC-CMP-043
Design Decision: This design uses vSphere Content Library to synchronize templates across regions.
Design Justification: The vSphere Content Library is built into the version of vSphere being used and meets all the requirements to synchronize templates.
Design Implication: Storage space must be provisioned in each region.


Figure 56. Template Synchronization

VMware Identity Management


The Identity Manager is integrated directly into the vRealize Automation appliance and provides
tenant identity management. The vIDM service synchronizes directly with the Rainpole Active
Directory domain. Important users and groups are synced with the Identity Manager. Authentication
always takes place against the Active Directory domain, but searches are made against the local
Active Directory mirror on the vRealize Automation appliance.
Figure 57. VMware Identity Manager proxies authentication between Active Directory and
vRealize Automation

Table 158. Active Directory Authentication Decision

Decision ID: SDDC-CMP-044
Design Decision: Choose Active Directory with Integrated Windows Authentication as the Directory Service connection option.
Design Justification: Rainpole uses a single-forest, multiple-domain Active Directory environment. Integrated Windows Authentication supports establishing trust relationships in a multi-domain or multi-forest Active Directory environment.
Design Implication: Requires that the vRealize Automation appliances are joined to the Active Directory domain.

By default, the vRealize Automation appliance is initially configured with 18 GB of memory, which is enough to support a small Active Directory environment. An Active Directory environment is considered small if fewer than 25,000 users in the OU have to be synchronized. An Active Directory environment with more than 25,000 users is considered large and needs additional memory and CPU. See the vRealize Automation sizing guidelines for details.
Table 159. vRealize Automation Appliance Sizing Decision

Decision ID: SDDC-CMP-045
Design Decision: Leave the vRealize Automation appliance default memory allocation (18 GB).
Design Justification: This design's Active Directory environment contains no more than 25,000 users and groups.
Design Implication: Customers should consider expanding the memory allocation for the vRealize Automation appliance based on the size of their actual Active Directory environment.

The connector is a component of the vRealize Automation service and performs the synchronization
of users and groups between Active Directory and the vRealize Automation service. In addition, the
connector is the default identity provider and authenticates users to the service.
Table 160. Connector Configuration Decision

Decision ID: SDDC-CMP-046
Design Decision: To support Directory Service high availability, configure a second connector that corresponds to the second vRealize Automation appliance.
Design Justification: This design supports high availability by installing two vRealize Automation appliances and using load-balanced NSX Edge instances. Adding the second connector to the second vRealize Automation appliance ensures redundancy and improves performance by load balancing authentication requests.
Design Implication: This design simplifies the deployment while leveraging robust built-in HA capabilities. This design uses NSX for vSphere load balancing.

3.3.2 vRealize Orchestrator Design


VMware vRealize Orchestrator is a development and process automation platform that provides a
library of extensible workflows to allow you to create and run automated, configurable processes to
manage the VMware vSphere infrastructure as well as other VMware and third-party technologies.


In this VMware Validated Design, vRealize Automation uses the vRealize Orchestrator plug-in to connect to vCenter Server for compute resource allocation.

3.3.2.1 vRealize Orchestrator Logical Design


This VMware Validated Design includes the following logical design for vRealize Orchestrator.
Table 161. vRealize Orchestrator Hardware Design Decision

Decision ID: SDDC-CMP-VRO-01
Design Decision: Deploy all vRealize Orchestrator instances required within the SDDC solution with 2 CPUs, 4 GB memory, and 12 GB of hard disk.
Design Justification: The vRealize Orchestrator appliance requires the appropriate resources to enable connectivity to vRealize Automation via the vRealize Orchestrator plug-in.
Design Implication: Resources should not be reduced, as the vRealize Orchestrator appliance requires them for scalability.

3.3.2.2 Directory Services


vRealize Orchestrator supports the following directory services:
• Windows Server 2008 Active Directory
• Windows Server 2012 Active Directory
• OpenLDAP (deprecated)
The only configuration supported for multi-domain Active Directory is a domain tree. Forest and external trusts are not supported. Multiple domains that have two-way trust, but are not in the same tree, are not supported and do not work with vRealize Orchestrator.
Table 162. vRealize Orchestrator Directory Service Design Decision

Decision ID: SDDC-CMP-VRO-02
Design Decision: Configure all vRealize Orchestrator instances within the SDDC to use vRealize Automation authentication.
Design Justification: LDAP is being deprecated. This supports the existing design setup, which uses Active Directory services.
Design Implication: This design does not set up local authentication for vRealize Orchestrator.

Decision ID: SDDC-CMP-VRO-03
Design Decision: Configure vRealize Orchestrator to use the vRealize Automation customer tenant (rainpole.local) for authentication.
Design Justification: The vRealize Automation default tenant users are only administrative users. By connecting to the customer tenant, workflows executing on vRealize Orchestrator may execute with end-user granted permissions.
Design Implication: End users who will execute vRealize Orchestrator workflows will be required to have permissions on the vRealize Orchestrator server.

Decision ID: SDDC-CMP-VRO-04
Design Decision: A vRealize Orchestrator installation will be associated with only one customer tenant.
Design Justification: To provide the best security and segregation between potential tenants, a vRealize Orchestrator installation is associated with a single tenant.
Design Implication: If additional vRealize Automation tenants are configured, additional vRealize Orchestrator installations will be needed.


3.3.2.3 Network Ports


vRealize Orchestrator uses specific network ports to communicate with other systems. The ports are
configured with a default value, but you can change the defaults at any time. When you make
changes, verify that all ports are available for use by your host. If necessary, open these ports on any
firewalls through which network traffic for the relevant components flows. Verify that the required
network ports are open before you deploy vRealize Orchestrator.
Default Communication Ports
Set default network ports and configure your firewall to allow incoming TCP connections. Other ports
may be required if you are using custom plug-ins.
Table 163. vRealize Orchestrator Default Configuration Ports

HTTP server port: 8280 (TCP). Source: end-user web browser. Target: vRealize Orchestrator server. Requests sent to the default HTTP web port 8280 are redirected to the default HTTPS web port 8281.
HTTPS server port: 8281 (TCP). Source: end-user web browser. Target: vRealize Orchestrator server. The SSL-secured HTTP protocol used to connect to the vRealize Orchestrator REST API.
Web configuration HTTPS access port: 8283 (TCP). Source: end-user web browser. Target: vRealize Orchestrator configuration. The SSL access port for the web UI for vRealize Orchestrator configuration.
Messaging port: 8286 (TCP). Source: end-user vRealize Orchestrator Client. Target: vRealize Orchestrator server. A Java messaging port used for dispatching events.
Messaging port: 8287 (TCP). Source: end-user vRealize Orchestrator Client. Target: vRealize Orchestrator server. An SSL-secured Java messaging port used for dispatching events.

External Communication Ports


Configure your firewall to allow outgoing connections using the external network ports so vRealize
Orchestrator can communicate with external services.
Table 164. vRealize Orchestrator Default External Communication Ports

LDAP: 389 (TCP). Source: vRealize Orchestrator server. Target: LDAP server. Lookup port of your LDAP authentication server.
LDAP using SSL: 636 (TCP). Source: vRealize Orchestrator server. Target: LDAP server. Lookup port of your secure LDAP authentication server.
LDAP using Global Catalog: 3268 (TCP). Source: vRealize Orchestrator server. Target: Global Catalog server. Port to which Microsoft Global Catalog server queries are directed.
DNS: 53 (TCP). Source: vRealize Orchestrator server. Target: DNS server. Name resolution.
VMware vCenter Single Sign-On server: 7444 (TCP). Source: vRealize Orchestrator server. Target: vCenter Single Sign-On server. Port used to communicate with the vCenter Single Sign-On server.
SQL Server: 1433 (TCP). Source: vRealize Orchestrator server. Target: Microsoft SQL Server. Port used to communicate with the Microsoft SQL Server or SQL Server Express instances that are configured as the vRealize Orchestrator database.
PostgreSQL: 5432 (TCP). Source: vRealize Orchestrator server. Target: PostgreSQL server. Port used to communicate with the PostgreSQL server that is configured as the vRealize Orchestrator database.
Oracle: 1521 (TCP). Source: vRealize Orchestrator server. Target: Oracle DB server. Port used to communicate with the Oracle Database server that is configured as the vRealize Orchestrator database.
SMTP server port: 25 (TCP). Source: vRealize Orchestrator server. Target: SMTP server. Port used for email notifications.
vCenter Server API port: 443 (TCP). Source: vRealize Orchestrator server. Target: VMware vCenter Server. The vCenter Server API communication port used by vRealize Orchestrator to obtain virtual infrastructure and virtual machine information from the orchestrated vCenter Server instances.
vCenter Server: 80 (TCP). Source: vRealize Orchestrator server. Target: vCenter Server. Port used to tunnel HTTPS communication.
VMware ESXi: 443 (TCP). Source: vRealize Orchestrator server. Target: ESXi hosts. (Optional) Workflows using the vCenter Guest Operations API need a direct connection between vRealize Orchestrator and the ESXi hosts the VM is running on.
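
Before deploying vRealize Orchestrator, the reachability of the ports listed in Tables 163 and 164 can be spot-checked from a management host. The following Python sketch is a minimal example under stated assumptions: the host names are placeholders for your environment, and a closed result may also indicate an intermediate firewall rather than a misconfigured appliance.

# Illustrative only: a quick TCP reachability check for ports listed in Tables 163 and 164.
# The host names below are placeholders, not values defined by this design.
import socket

PORTS_TO_CHECK = {
    "vro.rainpole.local": [8280, 8281, 8283, 8286, 8287],   # vRealize Orchestrator default ports
    "dc01.rainpole.local": [389, 636, 3268, 53],             # LDAP / Global Catalog / DNS (placeholder host)
}

def is_open(host, port, timeout=3):
    """Attempt a TCP connection and report whether the port accepted it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, ports in PORTS_TO_CHECK.items():
    for port in ports:
        state = "open" if is_open(host, port) else "closed or filtered"
        print(f"{host}:{port} -> {state}")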


3.3.2.4 vRealize Orchestrator Deployment


Table 165. vRealize Orchestrator Deployment Decision

Decision ID: SDDC-CMP-VRO-05
Design Decision: Install two vRealize Orchestrator servers behind a load balancer.
Design Justification: Supports a highly available vRealize Orchestrator environment.
Design Implication: None.

The Linux-based vRealize Orchestrator appliance comes preconfigured, enabling fast deployment. In contrast to a vRealize Orchestrator installation that uses Microsoft Windows as the operating system, the Linux-based appliance does not incur Microsoft licensing costs.
The vRealize Orchestrator appliance package is distributed with the following preinstalled software components:
• SUSE Linux Enterprise Server 11 SP3 for VMware, 64-bit edition
• PostgreSQL
• OpenLDAP
• vRealize Orchestrator
Table 166. vRealize Orchestrator Platform Design Decision

Decision ID: SDDC-CMP-VRO-06
Design Decision: Deploy a vRealize Orchestrator appliance for all vRealize Orchestrator instances required within the SDDC.
Design Justification: Allows for rapid deployment with faster scalability and reduced license costs.
Design Implication: None.

3.3.2.5 vRealize Orchestrator Topology


vRealize Orchestrator comes as a single-site topology product. The multi-node plug-in creates a
primary-secondary relation between vRealize Orchestrator servers that extends the package
management and workflow execution features. The plug-in contains a set of standard workflows for
hierarchical orchestration, management of vRealize Orchestrator instances, and the scale-out of
vRealize Orchestrator activities.
Table 167. vRealize Orchestrator Topology Design Decisions

Decision ID: SDDC-CMP-VRO-07
Design Decision: Deploy two vRealize Orchestrator appliances to provide the SDDC Foundation orchestration engine.
Design Justification: Use a clustered implementation of vRealize Orchestrator to develop workflows. Without a highly available clustered vRealize Orchestrator implementation, the design has a single point of failure.
Design Implication: Cluster setup is required.


Decision ID: SDDC-CMP-VRO-08
Design Decision: Install and configure the Multi-Node plug-in in your multisite implementation to provide disaster recovery capability through vRealize Orchestrator content replication.
Design Justification: vRealize Orchestrator will not support disaster recovery without the implementation of the Multi-Node plug-in.
Design Implication: Enables disaster recovery and multisite implementations within vRealize Orchestrator.

3.3.2.6 vRealize Orchestrator Server Mode


vRealize Orchestrator supports the following server modes:
• Standalone mode. The vRealize Orchestrator server runs as a standalone instance. This is the default mode of operation.
• Cluster mode. To increase availability of the vRealize Orchestrator services, and to create a more highly available SDDC, you can configure vRealize Orchestrator to work in cluster mode and start multiple vRealize Orchestrator instances in a cluster with a shared database. In cluster mode, multiple vRealize Orchestrator instances with identical server and plug-in configurations work together as a cluster and share a single database.
All vRealize Orchestrator server instances communicate with each other by exchanging heartbeats at
a certain time interval. Only active vRealize Orchestrator server instances respond to client requests
and run workflows. If an active vRealize Orchestrator server instance fails to send heartbeats, it is
considered to be non-responsive, and one of the inactive instances takes over to resume all
workflows from the point at which they were interrupted. The heartbeat is implemented through the
shared database, so there are no implications in the network design for a vRealize Orchestrator
cluster. If you have more than one active vRealize Orchestrator node in a cluster, concurrency
problems can occur if different users use the different vRealize Orchestrator nodes to modify the
same resource.
Table 168. vRealize Orchestrator Server Mode Design Decision

Decision ID: SDDC-CMP-VRO-09
Design Decision: Deploy a minimum of one two-node vRealize Orchestrator cluster to provide a production-class orchestration engine.
Design Justification: Use a clustered implementation of vRealize Orchestrator to develop workflows.
Design Implication: Requires an external database.

3.3.2.7 vRealize Orchestrator SDDC Cluster


The Virtualization Design document specifies the following clusters within the SDDC solution:
• Management cluster
• Edge cluster
• Compute payload cluster
The vRealize Orchestrator instance is logically a part of the management cluster.


Table 169. vRealize Orchestrator SDDC Cluster Design Decision

Decision ID: SDDC-CMP-VRO-10
Design Decision: Deploy all vRealize Orchestrator instances required by the SDDC solution within the same cluster as the vRealize Automation instances (management cluster).
Design Justification: In this design, only the vRealize Automation component consumes vRealize Orchestrator.
Design Implication: None.

The following tables outline characteristics for this vRealize Orchestrator design.

Table 170. Service Monitors Characteristics

Monitor: vRealize Orchestrator Monitor
Interval: 3
Timeout: 9
Retries: 3
Type: HTTPS (443)
Send String: GET /vco/api/healthstatus
Receive String: RUNNING

Table 171. Pool Characteristics

Pool Name: vRealize Orchestrator
Algorithm: IP-HASH
Monitors: vRealize Orchestrator Monitor
Members: vRealize Orchestrator nodes
Port: 8281
Monitor Port: 8281

Table 172. Virtual Server Characteristics

Name: vRealize Orchestrator
Protocol: HTTPS
Service Port: 8281
Default Pool Name: vRealize Orchestrator
3.3.2.8 vRealize Orchestrator Configuration


vRealize Orchestrator configuration includes appliance and client configuration.

3.3.2.9 vRealize Orchestrator Appliance Network Settings and Naming Conventions


Use consistent naming conventions and policies when labeling virtual machines so that their use is
clear to any IT staff working with them. It is a best practice to configure common ports, network
settings, and appliance names across all virtual appliances to simplify management and maintain
consistency. Keep in mind that future extensibility options might affect naming conventions.
Table 173. vRealize Orchestrator Appliance Network Settings and Naming Conventions Design Decision

Decision ID: SDDC-CMP-VRO-11
Design Decision: Deploy vRealize Orchestrator instances following the network standards and naming conventions of the design.
Design Justification: Supports general requirements of the design.
Design Implication: None.

3.3.2.10 vRealize Orchestrator Client


The vRealize Orchestrator client is a desktop application that lets you import packages; create, run, and schedule workflows; and manage user permissions.
You can install the vRealize Orchestrator client standalone on a desktop system. Download the vRealize Orchestrator client installation files from the vRealize Orchestrator appliance page at https://vRO_hostname:8281. Alternatively, you can run the vRealize Orchestrator client using Java WebStart directly from the homepage of the vRealize Orchestrator appliance console.
Table 174. vRealize Orchestrator Client Design Decision

Decision ID: SDDC-CMP-VRO-12
Design Decision: For users who require frequent developer or administrative access, install the vRealize Orchestrator client required within the SDDC solution from the vRealize Orchestrator appliance.
Design Justification: You must first install the vRealize Orchestrator Integration Client from the vRealize Orchestrator virtual appliance. Casual users of vRealize Orchestrator may use the Java WebStart client.
Design Implication: Any additional instances of the vRealize Orchestrator Client or Client Integration Plug-in can be installed according to your needs.

3.3.2.11 vRealize Orchestrator External Database Configuration


Although the vRealize Orchestrator appliance is a preconfigured Linux-based virtual machine, you
must configure the default vCenter Server plug-in as well as the other default vRealize Orchestrator
plug-ins. You may also want to change the vRealize Orchestrator settings.
Table 175. vRealize Orchestrator External Database Design Decision

Decision ID: SDDC-CMP-VRO-13
Design Decision: Set up vRealize Orchestrator with an external database.
Design Justification: Clustered vRealize Orchestrator deployments require an external database.
Design Implication: Backups of the external database will need to take place. MSSQL is utilized.

3.3.2.12 SSL Certificates


The vRealize Orchestrator configuration interface uses a secure connection to communicate with
vCenter Server, relational database management systems (RDBMS), LDAP, vCenter Single Sign-On,
and other servers. You can import the required SSL certificate from a URL or file. You can import the
vCenter Server SSL certificate from the SSL Trust Manager tab in the vRealize Orchestrator
configuration interface.


Table 176. vRealize Orchestrator SSL Design Decision

Decision ID: SDDC-CMP-VRO-14
Design Decision: Use a CA-signed certificate for communication between vCenter Server and vRealize Orchestrator.
Design Justification: Supports requirements for using CA-signed certificates.
Design Implication: None.

3.3.2.13 vRealize Orchestrator Database


vRealize Orchestrator requires a database. For small-scale deployments, you can use the SQL Server Express database that is bundled with vCenter Server, or the preconfigured vRealize Orchestrator database. vRealize Orchestrator supports Oracle, Microsoft SQL Server, Microsoft SQL Server Express, and PostgreSQL. For a complete list of supported databases, see the VMware Product Interoperability Matrixes at http://www.vmware.com/resources/compatibility/sim/interop_matrix.php.
This design uses an external Microsoft SQL (MSSQL) database.
Table 177. vRealize Orchestrator Database Design Decision

Decision ID: SDDC-CMP-VRO-15
Design Decision: Configure the vRealize Orchestrator appliance required within the SDDC solution to use an external MSSQL database.
Design Justification: The SDDC design is already using an external MSSQL database for other components. Database support currently includes MSSQL or Oracle. For all supported versions of databases, see the VMware Product Interoperability Matrix.
Design Implication: None.

3.3.2.14 vRealize Orchestrator Plug-Ins


Plug-ins allow you to use vRealize Orchestrator to access and control external technologies and
applications. Exposing an external technology in a vRealize Orchestrator plug-in allows you to
incorporate objects and functions in workflows that access the objects and functions of the external
technology. The external technologies that you can access using plug-ins can include virtualization
management tools, email systems, databases, directory services, and remote control interfaces.
vRealize Orchestrator provides a set of standard plug-ins that allow you to incorporate such
technologies as the vCenter Server API and email capabilities into workflows.
In addition, the vRealize Orchestrator open plug-in architecture allows you to develop plug-ins to
access other applications. vRealize Orchestrator implements open standards to simplify integration
with external systems. For information about developing custom content, see Developing with
VMware vRealize Orchestrator.
Default Plug-Ins
vRealize Orchestrator includes a collection of default plug-ins. Each plug-in exposes an external
product API to the vRealize Orchestrator platform. Plug-ins provide inventory classes, extend the
scripting engine with new object types, and publish notification events from the external system. Each
plug-in also provides a library of workflows for automating typical use cases of integrated products.
You can see the list of available plug-ins on the Plug-ins tab in the vRealize Orchestrator
configuration interface. There are separate tabs in the interface for the plug-ins that require
configuration.
All default plug-ins are installed together with the vRealize Orchestrator server. You must configure the plug-ins before using them. Plug-ins extend the vRealize Orchestrator scripting engine with new object types and methods, and publish notification events from the external system that trigger events in vRealize Orchestrator and in the plugged-in technology. Plug-ins provide an inventory of JavaScript objects that you can access on the vRealize Orchestrator Inventory tab. Each plug-in can provide one or more packages of workflows and actions that you can run on the objects in the inventory to automate the typical use cases of the integrated product.
vRealize Orchestrator and the vCenter Server Plug-In
You can use the vCenter Server plug-in to manage multiple vCenter Server instances. You can create
workflows that use the vCenter Server plug-in API to automate tasks in your vCenter Server
environment. The vCenter Server plug-in maps the vCenter Server API to the JavaScript that you can
use in workflows. The plug-in also provides actions that perform individual vCenter Server tasks that
you can include in workflows.
The vCenter Server plug-in provides a library of standard workflows that automate vCenter Server
operations. For example, you can run workflows that create, clone, migrate, or delete virtual
machines. Before managing the objects in your VMware vSphere inventory by using vRealize
Orchestrator and to run workflows on the objects, you must configure the vCenter Server plug-in and
define the connection parameters between vRealize Orchestrator and the vCenter Server instances
you want to orchestrate. You can configure the vCenter Server plug-in by using the vRealize
Orchestrator configuration interface or by running the vCenter Server configuration workflows from the
vRealize Orchestrator client. You can configure vRealize Orchestrator to connect to your vCenter
Server instances for running workflows over the objects in your vSphere infrastructure.
To manage the objects in your vSphere inventory using the vSphere Web Client, configure vRealize
Orchestrator to work with the same vCenter Single Sign-On instance to which both vCenter Server
and vSphere Web Client are pointing. Also verify that vRealize Orchestrator is registered as a vCenter
Server extension. You register vRealize Orchestrator as a vCenter Server extension when you specify
a user (user name and password) who has the privileges to manage vCenter Server extensions.
Table 178. vRealize Orchestrator vCenter Server Plug-In Design Decisions

Decision ID: SDDC-CMP-VRO-16
Design Decision: Configure the vCenter Server plug-in to control communication with the vCenter Servers.
Design Justification: Required for communication to vCenter Server instances, and therefore required for workflows.
Design Implication: None.

3.3.2.15 vRealize Orchestrator Scalability


vRealize Orchestrator supports both scale-up and scale-out scalability.
Scale Up
A clustered vRealize Orchestrator deployment can be configured in one of the following ways:
• An active-active cluster with up to five active nodes. VMware recommends a maximum of three active nodes in this configuration.
• An active-passive cluster with only one active node, and up to seven standby nodes.
In a clustered vRealize Orchestrator environment, you cannot change workflows while other vRealize Orchestrator instances are running. Stop all other vRealize Orchestrator instances before you connect the vRealize Orchestrator client and change or develop a new workflow.
Scale Out
You can scale out a vRealize Orchestrator environment by having multiple independent vRealize Orchestrator instances, each with its own database instance. This option allows you to increase the number of managed inventory objects. You can use the vRealize Orchestrator Multinode plug-in to replicate vRealize Orchestrator content, and to start and monitor workflow executions.
Table 65. vRealize Orchestrator Active-Passive Design Decision

Decision ID: SDDC-CMP-VRO-17
Design Decision: Configure vRealize Orchestrator in an active-active cluster configuration.
Design Justification: An active-passive cluster is not currently implemented because a highly available environment is required.
Design Implication: None.

3.4 Operations Infrastructure Design


Operations management is a required element of a software-defined data center. vRealize Operations Manager and vRealize Log Insight provide performance and capacity management capabilities for the related infrastructure and cloud management components.


Figure 58. Operations Infrastructure Conceptual Design

3.4.1 vRealize Log Insight Design


vRealize Log Insight design enables real-time logging for all components that build up the
management capabilities of the SDDC in a dual-region setup.

3.4.1.1 Logical Design


In a multi-region Software-Defined Data Center (SDDC), deploy a three-node vRealize Log Insight cluster in each region. This configuration provides continued availability and increased log ingestion rates.
Figure 59. Logical Design of vRealize Log Insight

3.4.1.2 Sources of Log Data


vRealize Log Insight collects logs to provide monitoring information about the SDDC from a central location.


vRealize Log Insight collects log events from the following virtual infrastructure and cloud
management components:
 Management vCenter Server
o Platform Services Controller
o vCenter Server
 Compute vCenter Server
o Platform Services Controller
o vCenter Server
 Management, shared edge and compute ESXi hosts
 NSX for vSphere for the management and for the shared compute and edge clusters
o NSX Manager
o NSX Controller instances
o NSX Edge instances
 vRealize Automation
o vRealize Orchestrator
o vRealize Automation components
 vRealize Operations Manager
o Analytics cluster nodes

3.4.1.3 Cluster Nodes


The vRealize Log Insight cluster consists of one master node and two worker nodes. You enable the Integrated Load Balancer (ILB) on the cluster so that vRealize Log Insight balances incoming traffic fairly among the available nodes. vRealize Log Insight clients, whether they use the Web user interface or ingest data through syslog or the Ingestion API, connect to vRealize Log Insight at the ILB address.
The vRealize Log Insight cluster can scale out to six nodes, that is, one master and five worker nodes.
Table 179. Cluster Node Configuration Design Decision

Decision ID: SDDC-OPS-LOG-001
Design Decision: Deploy vRealize Log Insight in a cluster configuration of 3 nodes with an integrated load balancer: one master and two worker nodes.
Design Justification: Provides high availability. Using the integrated load balancer simplifies the Log Insight deployment and prevents a single point of failure.
Design Implication: You must size each node identically.

3.4.1.4 Sizing
By default, a vRealize Log Insight virtual appliance has 2 vCPUs, 4 GB of virtual memory, and 144 GB of disk space provisioned. vRealize Log Insight uses 100 GB of the disk space to store raw data, indexes, and metadata.
Sizing Nodes
To accommodate all of log data from the products in the SDDC, you must size the Log Insight nodes
properly.


Table 180. Compute Resources for a vRealize Log Insight Medium-Size Node

Appliance size: Medium
Number of CPUs: 8
Memory: 16 GB
IOPS: 1,000 IOPS
Amount of processed log data: 38 GB/day
Number of processed log messages: 7,500
Environment: Up to 250 syslog connections per node

Sizing Storage
Sizing is based on IT organization requirements, but assuming that you want to retain 7 days of data, you can use the following calculations.
For 250 syslog sources at a rate of 150 MB of logs ingested per day per source over 7 days:
250 sources * 150 MB of log data ≈ 37 GB of log data per day
37 GB * 7 days ≈ 260 GB of log data per vRealize Log Insight node
260 GB * 1.7 indexing overhead ≈ 450 GB
Based on this example, the medium-size vRealize Log Insight virtual appliance provides 270 GB of storage space per node, so you must add approximately 190 GB of additional storage per node.
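As a convenience, the following minimal Python sketch reproduces the arithmetic above. The constants come from this example only, and the exact results differ slightly from the rounded figures in the text.

    # Rough storage sizing for one vRealize Log Insight node (values from the example above).
    SOURCES_PER_NODE = 250           # syslog sources handled by one node
    MB_PER_SOURCE_PER_DAY = 150      # average log volume per source
    RETENTION_DAYS = 7
    INDEX_OVERHEAD = 1.7             # indexing and metadata overhead factor
    DEFAULT_NODE_STORAGE_GB = 270    # data disk provided by the medium-size appliance

    daily_gb = SOURCES_PER_NODE * MB_PER_SOURCE_PER_DAY / 1024   # log data ingested per day (1 GB = 1024 MB)
    required_gb = daily_gb * RETENTION_DAYS * INDEX_OVERHEAD     # storage needed for 7 days of retention
    additional_gb = required_gb - DEFAULT_NODE_STORAGE_GB        # extra disk to add per node

    print(f"Required: {required_gb:.0f} GB per node, additional disk: {additional_gb:.0f} GB")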

Note vRealize Log Insight supports virtual hard disks of up to 2 TB. If more capacity is needed, add
another virtual hard disk. Do not extend existing retention virtual disks.

Table 181. Compute Resources for the vRealize Log Insight Nodes Design Decision

Decision ID: SDDC-OPS-LOG-002
Design Decision: Deploy vRealize Log Insight nodes of medium size.
Design Justification: Accommodates the number of expected syslog connections.
Design Implication: You must increase the size of the nodes if you configure Log Insight to monitor additional syslog sources.

Decision ID: SDDC-OPS-LOG-003
Design Decision: Add 190 GB of additional storage per node.
Design Justification: Ensures 7 days of data retention.
Design Implication: Additional storage space is required.

3.4.1.5 Networking Design


In both regions, the vRealize Log Insight instances are connected to the region-specific management
VXLANs Mgmt-RegionA01-VXLAN and Mgmt-RegionB01-VXLAN. Each vRealize Log Insight
instance is deployed within the shared management application isolated network.


Figure 60. Networking Design for the vRealize Log Insight Deployment

3.4.1.5.1 Application Isolated Network Design


Each of the two instances of the vRealize Log Insight deployment is installed in the shared non-
failover application network for the region. An NSX universal distributed logical router (UDLR) is
configured at the front of each shared application network to provide network isolation. This
networking design has the following features:
 All nodes have routed access to the vSphere management network through the Management
NSX UDLR for the home region.
 Routing to the vSphere management network and the external network is dynamic, and is based
on the Border Gateway Protocol (BGP).
For more information about the networking configuration of the application isolated networks for
vRealize Log Insight, see NSX Design.
Table 182. vRealize Log Insight Isolated Network Design Decisions

Decision ID: SDDC-OPS-LOG-004
Design Decision: Deploy vRealize Log Insight on the shared management region VXLAN.
Design Justification: Secures the vRealize Log Insight instances and provides a consistent deployment model for management applications.
Design Implication: None.

3.4.1.5.2 IP Subnets
You can allocate the following example subnets to the vRealize Log Insight deployment:
Table 183. IP Subnets in the Application Isolated Networks

vRealize Log Insight cluster in Region A: 192.168.31.0/24
vRealize Log Insight cluster in Region B: 192.168.32.0/24

Table 184. IP Subnets Design Decision

Decision ID: SDDC-OPS-LOG-005
Design Decision: Allocate separate subnets for each application isolated network.
Design Justification: Satisfies the requirement for log forwarding between the two vRealize Log Insight instances by placing each instance on its own unique subnet.
Design Implication: None.

3.4.1.5.3 DNS Names


vRealize Log Insight node name resolution uses a region-specific suffix, such as
sfo01.rainpole.local or lax01.rainpole.local, including the load balancer virtual IP
addresses (VIPs). The Log Insight components in both regions have the following node names:
Table 185. DNS Names of the vRealize Log Insight Nodes

DNS Name Role Region

vrli-cluster-01.sfo01.rainpole.local Log Insight ILB VIP A

vrli-mstr01.sfo01.rainpole.local Master node A

vrli-wrkr01.sfo01.rainpole.local Worker node A

vrli-wrkr02.sfo01.rainpole.local Worker node A

vrli-cluster-51.lax01.rainpole.local Log Insight ILB VIP B

vrli-mstr51.lax01.rainpole.local Master node B

vrli-wrkr51.lax01.rainpole.local Worker node B

vrli-wrkr52.lax01.rainpole.local Worker node B

Table 186. DNS Names Design Decision

Decision ID: SDDC-OPS-LOG-006
Design Decision: Configure forward and reverse DNS records for all vRealize Log Insight nodes and VIPs.
Design Justification: All nodes are accessible by using fully qualified domain names instead of by using IP addresses only.
Design Implication: You must manually provide a DNS record for each node and VIP.

Decision ID: SDDC-OPS-LOG-007
Design Decision: For all applications that fail over between regions (such as vRealize Automation and vRealize Operations Manager), use the FQDN of the vRealize Log Insight Region A VIP when you configure logging.
Design Justification: Supports logging when not all management applications are failed over to Region B, for example, when only one application is moved to Region B.
Design Implication: If vRealize Automation and vRealize Operations Manager are failed over to Region B and the vRealize Log Insight cluster is no longer available in Region A, update the A record on the child DNS server to point to the vRealize Log Insight cluster in Region B.
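As a quick way to verify the records that SDDC-OPS-LOG-006 requires, the following minimal sketch resolves each Log Insight name forward and then backward. The list reuses the Region A names from Table 185 and is purely illustrative.

    import socket

    # Node and VIP names for Region A, reused from Table 185; extend the list as needed.
    VRLI_DNS_NAMES = [
        "vrli-cluster-01.sfo01.rainpole.local",
        "vrli-mstr01.sfo01.rainpole.local",
        "vrli-wrkr01.sfo01.rainpole.local",
        "vrli-wrkr02.sfo01.rainpole.local",
    ]

    for name in VRLI_DNS_NAMES:
        try:
            ip = socket.gethostbyname(name)             # forward (A) record
            reverse_name = socket.gethostbyaddr(ip)[0]  # reverse (PTR) record
            match = "OK" if reverse_name.lower() == name.lower() else "MISMATCH"
            print(f"{name} -> {ip} -> {reverse_name} [{match}]")
        except (socket.gaierror, socket.herror) as err:
            print(f"{name}: lookup failed ({err})")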

3.4.1.6 Retention and Archiving


Configure archive and retention parameters of vRealize Log Insight according to the company policy
for compliance and governance.
Retention
vRealize Log Insight virtual appliances contain three default virtual disks and can use additional virtual disks for storage, for example, hard disk 4.
Table 187. Virtual Disk Configuration in the vRealize Log Insight Virtual Appliance

Hard disk 1: 12.125 GB. Root file system.
Hard disk 2: 270 GB for a medium-size deployment. Contains two partitions: /storage/var for system logs and /storage/core for collected logs.
Hard disk 3: 256 MB. First boot only.
Hard disk 4 (additional virtual disk): 190 GB. Storage for collected logs; the capacity from this disk is added to /storage/core.

Calculate the storage space that is available for log data by using the following equation:
/storage/core = hard disk 2 space + hard disk 4 space - system logs space on hard disk 2
Based on the size of the default and additional virtual disks, /storage/core is equal to 440 GB:
/storage/core = 270 GB + 190 GB - 20 GB = 440 GB
The space available for retention is /storage/core minus approximately 3 percent:
Retention = /storage/core - 3% * /storage/core
Retention = 440 GB - 3% * 440 GB ≈ 427 GB
Configure a retention period of 7 days for the medium-size vRealize Log Insight appliance.
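A minimal sketch of the same calculation, using the disk sizes from Table 187 as the only inputs:

    # Retention space available on a medium-size vRealize Log Insight node (sizes in GB).
    HARD_DISK_2_GB = 270      # data disk of the medium-size appliance
    HARD_DISK_4_GB = 190      # additional virtual disk per design decision SDDC-OPS-LOG-003
    SYSTEM_LOGS_GB = 20       # /storage/var share of hard disk 2
    OVERHEAD = 0.03           # approximately 3% that is not usable for retention

    storage_core_gb = HARD_DISK_2_GB + HARD_DISK_4_GB - SYSTEM_LOGS_GB   # 440 GB
    retention_gb = storage_core_gb * (1 - OVERHEAD)                      # about 427 GB

    print(f"/storage/core = {storage_core_gb} GB, usable for retention ≈ {retention_gb:.0f} GB")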
Table 188. Retention Period Design Decision

Decision ID: SDDC-OPS-LOG-008
Design Decision: Configure vRealize Log Insight to retain data for 7 days.
Design Justification: Accommodates logs from 750 syslog sources (250 per node) as per the SDDC design.
Design Implication: You must add a VMDK to each appliance.

Archiving
vRealize Log Insight archives log messages as soon as possible. At the same time, they remain
retained on the virtual appliance until the free local space is almost filled. Data exists on both the
vRealize Log Insight appliance and the archive location for most of the retention period. The archiving
period must be longer than the retention period.
The archive location must be on an NFS version 3 shared storage. The archive location must be
available and must have enough capacity to accommodate the archives.
Apply an archive policy of 90 days for the medium-size vRealize Log Insight appliance. The vRealize
Log Insight appliance will use about 1 TB of shared storage. According to the business compliance
regulations of your organization, these sizes might change.
Table 189. Log Archive Policy Design Decision

Decision ID: SDDC-OPS-LOG-009
Design Decision: Provide 1 TB of NFS version 3 shared storage to each vRealize Log Insight cluster.
Design Justification: Archives logs from 750 syslog sources.
Design Implication: You must provide NFS version 3 shared storage in addition to the data storage for the vRealize Log Insight cluster. You must enforce the archive policy directly on the shared storage.

3.4.1.7 Alerting
vRealize Log Insight supports alerts that trigger notifications about its health. The following types of
alerts exist in vRealize Log Insight:
• System Alerts. vRealize Log Insight generates notifications when an important system event occurs, for example, when the disk space is almost exhausted and vRealize Log Insight must start deleting or archiving old log files.
• Content Pack Alerts. Content packs contain default alerts that can be configured to send notifications. These alerts are specific to the content pack and are disabled by default.
• User-Defined Alerts. Administrators and users can define their own alerts based on data ingested by vRealize Log Insight.
vRealize Log Insight handles alerts in two ways:
• Send an e-mail over SMTP
• Send to vRealize Operations Manager

3.4.1.7.1 SMTP Notification


Enable e-mail notification for alerts in vRealize Log Insight.
Table 190. SMTP Alert Notification Design Decision

Decision ID: SDDC-OPS-LOG-010
Design Decision: Enable alerting over SMTP.
Design Justification: Enables administrators and operators to receive alerts via email from vRealize Log Insight.
Design Implication: Requires access to an external SMTP server.

3.4.1.7.2 Integration with vRealize Operations Manager


vRealize Log Insight integrates with vRealize Operations Manager to provide a central location for
monitoring and diagnostics.
vRealize Log Insight integrates with vRealize Operations Manager in the following ways:
• Notification Events. Forward notification events from vRealize Log Insight to vRealize Operations Manager.
• Launch in Context. Launch vRealize Log Insight from the vRealize Operations Manager user interface. You must install the vRealize Log Insight management pack in vRealize Operations Manager.
Table 191. Forwarding Alerts to vRealize Operations Manager Design Decision

Decision ID: SDDC-OPS-LOG-011
Design Decision: Forward alerts to vRealize Operations Manager.
Design Justification: Provides centralized monitoring and alerting, and the use of launch in context.
Design Implication: You must install the vRealize Log Insight management pack.

3.4.1.8 Security and Authentication


Protect the vRealize Log Insight deployment by providing centralized role-based authentication and
secure communication with the other components in the Software-Defined Data Center (SDDC).
Authentication
Enable role-based access control in vRealize Log Insight by using the existing rainpole.local Active
Directory domain.
Table 192. Custom Role-Based User Management Design Decision

Decision ID: SDDC-OPS-LOG-012
Design Decision: Use Active Directory for authentication.
Design Justification: Provides fine-grained role and privilege-based access for administrator and operator roles.
Design Implication: You must provide access to the Active Directory from all Log Insight nodes.

Encryption
Replace default self-signed certificates with a CA-signed certificate to provide secure access to the
vRealize Log Insight Web user interface.
Table 193. Custom Certificates Design Decision

Decision ID: SDDC-OPS-LOG-013
Design Decision: Replace the default self-signed certificates with a CA-signed certificate.
Design Justification: Configuring a CA-signed certificate ensures that all communication to the externally facing Web UI is encrypted.
Design Implication: Access to a Certificate Authority is required.

3.4.1.9 Configuration for Collecting Logs


Client applications can send logs to vRealize Log Insight in one of the following ways:
 Directly to vRealize Log Insight over the syslog protocol.
 By using vRealize Log Insight agents.
Table 194. Direct Log Communication to vRealize Log Insight Design Decisions

Decision ID: SDDC-OPS-LOG-014
Design Decision: Configure syslog sources to send log data directly to vRealize Log Insight.
Design Justification: Simplifies the design implementation for log sources that are syslog capable.
Design Implication: You must configure syslog sources to forward logs to the vRealize Log Insight VIP.

Decision ID: SDDC-OPS-LOG-015
Design Decision: Configure the vRealize Log Insight agent for the vRealize Automation Windows servers and Linux appliances.
Design Justification: Windows does not natively support syslog. vRealize Automation requires the use of the agents to collect all vRealize Automation logs.
Design Implication: You must manually install and configure the agent.

3.4.1.10 Time Synchronization


Time synchronization is critical for the core functionality of vRealize Log Insight. By default, vRealize
Log Insight synchronizes time with a pre-defined list of public NTP servers.
Configure consistent NTP sources on all systems that send log data (vCenter Server, ESXi, vRealize Operations Manager). See Time Synchronization in the Planning and Preparation implementation guide.
Table 195. Time Synchronization Design Decision

Decision ID: SDDC-OPS-LOG-016
Design Decision: Configure consistent NTP sources on all virtual infrastructure and cloud management applications for correct log analysis in vRealize Log Insight.
Design Justification: Guarantees accurate log timestamps.
Design Implication: Requires that all applications synchronize time to the same NTP time source.

3.4.1.11 Connectivity in the Cluster


All vRealize Log Insight cluster nodes must be in the same LAN with no firewall or NAT between the
nodes.
External Communication


vRealize Log Insight receives log data over the syslog TCP, syslog TLS/SSL, or syslog UDP
protocols. Use the default syslog UDP protocol because security is already designed at the level of
the management network.
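The following minimal sketch sends a single test message over UDP port 514 to the cluster VIP so that you can confirm basic reachability from a log source. The host name is the Region A ILB from Table 185, and the message format is only illustrative.

    import socket
    from datetime import datetime, timezone

    VRLI_VIP = "vrli-cluster-01.sfo01.rainpole.local"   # Region A ILB VIP; substitute your own
    SYSLOG_UDP_PORT = 514

    # Priority <14> = facility user (1) * 8 + severity informational (6)
    timestamp = datetime.now(timezone.utc).strftime("%b %d %H:%M:%S")
    message = f"<14>{timestamp} testhost sddc-test: connectivity check to vRealize Log Insight ILB"

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(message.encode("utf-8"), (VRLI_VIP, SYSLOG_UDP_PORT))
    sock.close()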
Table 196. Syslog Protocol Design Decision

Decision ID: SDDC-OPS-LOG-017
Design Decision: Communicate with the syslog clients, such as ESXi, vCenter Server, and NSX for vSphere, on the default UDP syslog port.
Design Justification: Using the default syslog port simplifies configuration for all syslog sources.
Design Implication: If the network connection is interrupted, the syslog traffic is lost. UDP syslog traffic is not secure.

3.4.1.12 Event Forwarding between Regions


vRealize Log Insight supports event forwarding to other clusters and standalone instances. While
forwarding events, the vRealize Log Insight instance still ingests, stores and archives events locally.
Event Forwarding Protocol
You forward syslog data in vRealize Log Insight by using the Ingestion API or a native syslog
implementation.
The vRealize Log Insight Ingestion API uses TCP communication. In contrast to syslog, the forwarding module supports the following features for the Ingestion API; a minimal example of sending an event over this API follows the list:
• Forwarding to other vRealize Log Insight instances
• Both structured and unstructured data, that is, multi-line messages
• Metadata in the form of tags
• Client-side compression
• Configurable disk-backed queue to save events until the server acknowledges the ingestion
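The following is a minimal sketch of sending one event to a vRealize Log Insight cluster over the Ingestion API. The endpoint path /api/v1/events/ingest/<agent-id> and the SSL port 9543 are assumptions in this sketch; verify them, and the payload format, against the vRealize Log Insight documentation for your version before relying on it.

    import json
    import time
    import uuid

    import requests

    VRLI_TARGET = "vrli-cluster-51.lax01.rainpole.local"   # cluster VIP in the other region
    INGESTION_PORT = 9543                                  # assumed SSL ingestion API port
    AGENT_ID = str(uuid.uuid4())                           # arbitrary identifier for this sender

    payload = {
        "events": [
            {
                "text": "forwarding connectivity test from Region A",
                "timestamp": int(time.time() * 1000),      # milliseconds since the epoch
                "fields": [{"name": "source_region", "content": "sfo01"}],
            }
        ]
    }

    url = f"https://{VRLI_TARGET}:{INGESTION_PORT}/api/v1/events/ingest/{AGENT_ID}"
    response = requests.post(
        url,
        data=json.dumps(payload),
        headers={"Content-Type": "application/json"},
        verify=False,     # acceptable only for lab testing; use CA-signed certificates in production
        timeout=10,
    )
    print(response.status_code, response.text)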
Table 197. Protocol for Event Forwarding across Regions Design Decision

Decision ID: SDDC-OPS-LOG-018
Design Decision: Forward log events to the other region by using the Ingestion API.
Design Justification: The forwarding protocol supports structured and unstructured data, provides client-side compression, and supports event throttling.
Design Implication: You must configure each region to forward log data to the other.

3.4.1.13 Disaster Recovery


Each region is configured to forward log information to the vRealize Log Insight instance in the other
region. As a result, you do not have to configure failover.


3.4.2 vRealize Operations Manager Design


3.4.2.1 Logical Design
In a multi-region Software Defined Data Center (SDDC), you deploy a vRealize Operations Manager
configuration that consists of the following entities:
 4-node (medium-size) vRealize Operations Manager analytics cluster that is highly available (HA).
This topology provides high availability, scale-out capacity up to eight nodes, and failover.
 2-node remote collector cluster in each region. The remote collectors communicate directly with
the data nodes in the vRealize Operations Manager analytics cluster. For load balancing and fault
tolerance, deploy two remote collectors in each region.
Each region contains its own remote collectors, whose role is to ease scalability by collecting data from the applications that are not subject to failover and periodically sending the collected data to the analytics cluster. You fail over only the analytics cluster because the analytics cluster is the construct that analyzes and stores monitoring data. This configuration supports failover of the analytics cluster by using Site Recovery Manager. In the event of a disaster, Site Recovery Manager migrates the analytics cluster nodes to the failover region.
Figure 61. Logical Design of vRealize Operations Manager Multi-Region Deployment

3.4.2.2 Physical Design


The vRealize Operations Manager nodes run on the management pod in each region of SDDC. For
information about the types of pods, see Pod Architecture.

3.4.2.3 Data Sources


vRealize Operations Manager collects data from the following virtual infrastructure and cloud
management components:
 Management vCenter Server
o Platform Services Controller
o vCenter Server
 Compute vCenter Server
o Platform Services Controller
o vCenter Server
 Management, Edge and Compute ESXi hosts
 NSX for vSphere for the management and compute clusters
o NSX Manager
o NSX Controller Instances
o NSX Edge instances
 vRealize Automation
o vRealize Orchestrator
 vRealize Automation Components
 vRealize Log Insight
 vRealize Operations Manager (Self Health Monitoring)

3.4.2.4 vRealize Operations Manager Nodes


The analytics cluster of the vRealize Operations Manager deployment contains the nodes that
analyze and store data from the monitored components.
Deploy a four-node vRealize Operations Manager analytics cluster in the x-Region application virtual
network. The analytics cluster consists of one master node, one master replica node, and two data
nodes to enable scale out and high availability.
Table 198. Analytics Cluster Node Configuration Design Decisions

Decision ID: SDDC-OPS-MON-001
Design Decision: Deploy vRealize Operations Manager as a cluster of 4 nodes: one master, one master replica, and two data nodes.
Design Justification: Enables scale-out and high availability.
Design Implication: Each node must be sized identically.

3.4.2.5 Sizing Compute Resources


You size compute resources for vRealize Operations Manager to provide enough resources for
accommodating the analytics operations for monitoring the SDDC.
Size the vRealize Operations Manager analytics cluster according to VMware KB article 2130551
"vRealize Operations Manager 6.1, 6.2, and 6.2.1 Sizing Guidelines". vRealize Operations Manager is
sized so as to accommodate the SDDC design by deploying the following management packs:
 Management Pack for VMware vCenter Server (installed by default)
 Management Pack for NSX for vSphere
 Management Pack for Storage Devices
 Management Pack for vRealize Log Insight
 Management Pack for vRealize Automation

3.4.2.5.1 Sizing Compute Resources for the Analytics Cluster Nodes


Deploying four medium-size virtual appliances satisfies the requirement specification for retention
along with the number of expected objects and metrics.


Table 199. Size of a Medium vRealize Operations Manager Virtual Appliance

Appliance size: Medium
vCPU: 8
Memory: 32 GB
Single-Node Maximum Objects: 7,000
Single-Node Maximum Collected Metrics (*): 2,000,000
Multi-Node Maximum Objects Per Node (**): 5,000
Multi-Node Maximum Collected Metrics Per Node (**): 1,500,000
Maximum number of End Point Operations Management agents per node: 1,200

(*) Metric numbers reflect the total number of metrics that are collected from all adapter instances in vRealize Operations Manager. To get this number, go to the Cluster Management page in vRealize Operations Manager and view the adapter instances of each node at the bottom of the page. You can get the number of metrics collected by each adapter instance; the sum of these metrics is the number estimated here.

Note The number shown in the overall metrics on the Cluster Management page reflects the
metrics that are collected from different data sources and the metrics that vRealize
Operations Manager creates.

(**) Note the reduction in maximum metrics to permit some head room.

Table 200. Analytics Cluster Node Size Design Decisions

Decision ID: SDDC-OPS-MON-002
Design Decision: Deploy each node in the analytics cluster as a medium-size appliance.
Design Justification: Provides the scale required to monitor the SDDC.
Design Implication: You must utilize 32 vCPUs and 128 GB of memory in the management cluster.

3.4.2.5.2 Sizing Compute Resources for the Remote Collector Nodes


Unlike the analytics cluster nodes, remote collector nodes have only the collector role. Deploying two
remote collector nodes in each region does not increase the number of monitored objects.
Table 201. Size of a Standard Remote Collector Virtual Appliance for vRealize Operations Manager

Appliance size: Remote Collector - Standard
vCPU: 2
Memory: 4 GB
Single-Node Maximum Objects (*): 1,500
Single-Node Maximum Collected Metrics: 600,000
Multi-Node Maximum Objects Per Node: N/A
Multi-Node Maximum Collected Metrics Per Node: N/A
Maximum number of End Point Operations Management Agents per Node: 250
Maximum Objects for 16-Node Configuration: N/A
Maximum Metrics for 16-Node Configuration: N/A

(*) The object limit for the remote collector is based on the VMware vCenter adapter.

Table 202. Compute Resources of the Remote Collector Nodes Design Decisions

Decision ID: SDDC-OPS-MON-003
Design Decision: Deploy two remote collector nodes per region.
Design Justification: Removes from the analytics cluster the load of collecting metrics from applications that do not fail over between regions.
Design Implication: When configuring the monitoring of a solution, you must assign a collector group.

Decision ID: SDDC-OPS-MON-004
Design Decision: Deploy the standard-size remote collector virtual appliances.
Design Justification: Enables metric collection for the expected number of objects in the SDDC.
Design Implication: You must provide 4 vCPUs and 8 GB of memory in the management cluster in each region.

3.4.2.6 Sizing Storage


A vRealize Operations Manager node of a medium size requires 266 GB of free space for data. To
collect the required number of metrics, you must add a 1 TB VMDK to each analytics cluster node.
Sizing Storage for the Analytics Cluster Nodes
The analytics cluster processes a large amount of objects and metrics. As the environment grows, a
need to add additional data nodes to the analytics cluster might emerge. Refer to the vRealize
Operations Manager sizing guidelines in the KB article to plan the sizing requirements of your
environment.
Table 203. Analytics Cluster Node Storage Design Decision

Decision ID: SDDC-OPS-MON-005
Design Decision: Provide a 1 TB VMDK for each analytics node (master, master replica, and two data nodes).
Design Justification: Provides enough storage to meet the SDDC design.
Design Implication: You must add the 1 TB disk manually while the virtual machine for the analytics node is powered off.


Sizing Storage for the Remote Collector Nodes


Deploy the remote collector nodes with thin-provisioned disks. Because remote collectors do not
perform analytics operations or store data, the default VMDK size is sufficient.
Table 204. Remote Collector Node Storage Design Decision

Decision ID: SDDC-OPS-MON-006
Design Decision: Do not provide additional storage for remote collectors.
Design Justification: Remote collectors do not perform analytics operations or store data on disk.
Design Implication: None.

3.4.2.7 Networking Design


You place the vRealize Operations Manager nodes in several network units for isolation and failover.
The networking design also supports public access to the analytics cluster nodes.
For secure access, load balancing and portability, the vRealize Operations Manager analytics cluster
is deployed in the shared cross-region application isolated network xRegion01-VXLAN, and the
remote collector clusters in the shared local application isolated networks RegionA01-VXLAN and
RegionB01-VXLAN.


Figure 62. Networking Design of the vRealize Operations Manager Deployment

3.4.2.7.1 Application Isolated Network Design


The vRealize Operations Manager analytics cluster is installed into the cross-region shared
application isolated network and the remote collector nodes are installed in their shared region
specific application isolated networks. This networking design has the following features:
 Each application component that needs to fail over between regions, such as the analytics cluster
for vRealize Operations Manager, is on the same network. vRealize Automation and vRealize Log
Insight also share components on this network.
 All nodes have routed access to the vSphere management network through the NSX Universal
Logical Router.
 Routing to the vSphere management network and other external networks is dynamic, and is
based on the Border Gateway Protocol (BGP).


For more information about the networking configuration of the application isolated network, see
Software-Defined Networking Design and NSX Design.
Table 205. vRealize Operations Manager Isolated Network Design Decision

Decision ID: SDDC-OPS-MON-007
Design Decision: Use the existing application virtual networks for the vRealize Operations Manager analytics and remote collector clusters.
Design Justification: Supports disaster recovery by isolating the vRealize Operations Manager clusters on the application virtual networks (xRegion01-VXLAN, RegionA01-VXLAN, and RegionB01-VXLAN).
Design Implication: You must implement NSX to support this network configuration.

3.4.2.7.2 IP Subnets
You can allocate the following example subnets for each cluster in the vRealize Operations Manager
deployment:
Table 206. IP Subnets in the Application Virtual Network of vRealize Operations Manager

vRealize Operations Manager Cluster Type IP Subnet

Analytics cluster in Region A (also valid for Region B for failover) 192.168.11.0/24

Remote collectors in Region A 192.168.31.0/24

Remote collectors in Region B 192.168.32.0/24

Table 207. IP Subnets Design Decision

Decision ID: SDDC-OPS-MON-008
Design Decision: Allocate separate subnets for each application isolated network.
Design Justification: Placing the remote collectors on their own subnet enables them to communicate with the analytics cluster and not be a part of the failover group.
Design Implication: None.

3.4.2.7.3 DNS Names


vRealize Operations Manager node name resolution uses a region-specific suffix, such as
sfo01.rainpole.local or lax01.rainpole.local. The analytics nodes IP addresses and the
load balancer virtual IP address (VIP) are mapped to the root domain suffix rainpole.local.
Access from the public network is provided through a VIP, the traffic to which is handled by the NSX
Edge service gateway.
Table 208. DNS Names for the Application Virtual Networks

vrops-cluster-01.rainpole.local: Virtual IP of the analytics cluster
vrops-mstrn-01.rainpole.local: Master node in the analytics cluster
vrops-repln-02.rainpole.local: Master replica node in the analytics cluster
vrops-datan-03.rainpole.local: First data node in the analytics cluster
vrops-datan-04.rainpole.local: Second data node in the analytics cluster
vrops-rmtcol-01.sfo01.rainpole.local: First remote collector node in Region A
vrops-rmtcol-02.sfo01.rainpole.local: Second remote collector node in Region A
vrops-rmtcol-51.lax01.rainpole.local: First remote collector node in Region B
vrops-rmtcol-52.lax01.rainpole.local: Second remote collector node in Region B

3.4.2.8 Networking for Failover and Load Balancing


By default, vRealize Operations Manager does not provide a solution for load-balanced UI user
sessions across nodes in the cluster. You associate vRealize Operations Manager with the shared
load balancer in the region.
The lack of load balancing for user sessions results in the following limitations:
 Users must know the URL of each node to access the UI. As a result, a single node might be
overloaded if all users access it at the same time.
 Each node supports up to four simultaneous user sessions.
 Taking a node offline for maintenance might cause an outage. Users cannot access the UI of the
node when the node is offline.
To avoid such problems, place the analytics cluster behind an NSX load balancer that is configured to
allow up to four connections per node. The load balancer must distribute the load evenly to all cluster
nodes. In addition, configure the load balancer to redirect service requests from the UI on port 80 to
port 443.
Load balancing for the remote collector nodes is not required.
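To illustrate the kind of per-node check that the NSX load balancer pool performs against the analytics cluster, the following sketch polls a status URL on every node. The path /suite-api/api/deployment/node/status is an assumption in this sketch; confirm the recommended monitor URL in the vRealize Operations Manager and NSX documentation before configuring the load balancer.

    import requests

    # Analytics cluster nodes behind the NSX load balancer (names from Table 208).
    ANALYTICS_NODES = [
        "vrops-mstrn-01.rainpole.local",
        "vrops-repln-02.rainpole.local",
        "vrops-datan-03.rainpole.local",
        "vrops-datan-04.rainpole.local",
    ]

    STATUS_PATH = "/suite-api/api/deployment/node/status"   # assumed endpoint; verify for your version

    for node in ANALYTICS_NODES:
        try:
            resp = requests.get(f"https://{node}{STATUS_PATH}", verify=False, timeout=5)
            state = "UP" if resp.status_code == 200 else f"HTTP {resp.status_code}"
        except requests.RequestException as err:
            state = f"DOWN ({err.__class__.__name__})"
        print(f"{node}: {state}")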
Table 209. Networking Failover and Load Balancing Design Decisions

Decision ID: SDDC-OPS-MON-009
Design Decision: Place the analytics cluster behind an NSX load balancer.
Design Justification: Enables balanced access of tenants and users to the analytics services, with the load being spread evenly across the cluster.
Design Implication: You must manually configure the NSX Edge devices to provide load balancing services.

Decision ID: SDDC-OPS-MON-010
Design Decision: Do not use a load balancer for the remote collector nodes.
Design Justification: Remote collectors must directly access the systems that they are monitoring. Remote collectors do not require access to and from the public network.
Design Implication: None.


3.4.2.9 Security and Authentication


You use several sources for authentication in vRealize Operations Manager such as an Active
Directory service, vCenter Server, and local user inventory.
Identity Sources
You can allow users to authenticate in vRealize Operations Manager in the following ways:
 Import users or user groups from an LDAP database. Users can use their LDAP credentials to log
in to vRealize Operations Manager.
 Use vCenter Server user accounts. After a vCenter Server instance is registered with vRealize
Operations Manager, the following vCenter Server users can log in to vRealize Operations
Manager:
o Users that have administration access in vCenter Server.
o Users that have one of the vRealize Operations Manager privileges, such as PowerUser,
assigned to the account which appears at the root level in vCenter Server.
 Create local user accounts in vRealize Operations Manager.
Table 210. Identity Source for vRealize Operations Manager Design Decision

Decision ID: SDDC-OPS-MON-011
Design Decision: Use Active Directory authentication.
Design Justification: Provides access to vRealize Operations Manager by using standard Active Directory accounts. Ensures that authentication is available even if vCenter Server becomes unavailable.
Design Implication: You must manually configure the Active Directory authentication.

Encryption
Access to all vRealize Operations Manager Web interfaces requires an SSL connection. By default,
vRealize Operations Manager uses a self-signed certificate. Replace default self-signed certificates
with a CA-signed certificate to provide secure access to the vRealize Operations Manager user
interface.
Table 211. Using CA-Signed Certificates Design Decision

Decision ID: SDDC-OPS-MON-012
Design Decision: Replace the default self-signed certificates with a CA-signed certificate.
Design Justification: Configuring a CA-signed certificate ensures that all communication to the externally facing Web UI is encrypted.
Design Implication: Access to a Certificate Authority is required.

3.4.2.10 Monitoring and Alerting


vRealize Operations Manager can monitor itself and display alerts about issues with its operational state.
vRealize Operations Manager displays the following administrative alerts:
 System alert. A component of the vRealize Operations Manager application has failed.
 Environment alert. vRealize Operations Manager has stopped receiving data from one or more
resources. Such an alert might indicate a problem with system resources or network
infrastructure.
 Log Insight log event. The infrastructure on which vRealize Operations Manager is running has
low-level issues. You can also use the log events for root cause analysis.
 Custom dashboard. vRealize Operations Manager can show super metrics for data center
monitoring, capacity trends and single pane of glass overview.
Table 212. Monitoring vRealize Operations Manager Design Decisions

Decision ID: SDDC-OPS-MON-013
Design Decision: Configure vRealize Operations Manager for SMTP outbound alerts.
Design Justification: Enables administrators and operators to receive alerts via e-mail from vRealize Operations Manager.
Design Implication: Requires access to an external SMTP server.

Decision ID: SDDC-OPS-MON-014
Design Decision: Integrate vRealize Operations Manager with vRealize Log Insight.
Design Justification: Enables deeper root cause analysis and infrastructure alerting.
Design Implication: Requires installation of the Management Pack for vRealize Log Insight.

Decision ID: SDDC-OPS-MON-015
Design Decision: Configure vRealize Operations Manager custom dashboards.
Design Justification: Provides extended SDDC monitoring, capacity trends, and a single pane of glass overview.
Design Implication: Requires manually configuring the dashboards.

3.4.2.11 Management Packs


The SDDC contains several VMware products for network, storage, and cloud management. You can
monitor and perform diagnostics on all of them by using management packs.
Table 213. Management Packs for vRealize Operations Manager Design Decisions

Decision ID: SDDC-OPS-MON-016
Design Decision: Install the following management packs: Management Pack for VMware vCenter Server (installed by default), Management Pack for NSX for vSphere, Management Pack for Storage Devices, Management Pack for vRealize Log Insight, and Management Pack for vRealize Automation.
Design Justification: Provides monitoring for all virtual infrastructure and cloud management components.
Design Implication: Requires manually installing and configuring each non-default management pack.

Decision ID: SDDC-OPS-MON-017
Design Decision: Solutions that fail over between sites use the default cluster group as the collector.
Design Justification: Provides monitoring for all components during a failover.
Design Implication: Adds minimal additional load to the analytics cluster.


3.4.3 vSphere Data Protection Design


Design data protection of the management components in your environment to ensure continuous
operation of the SDDC if the data of a management appliance is damaged.

3.4.3.1 Data Protection Design


Data backup protects the data of your organization against data loss, hardware failure, accidental
deletion, or other disaster for each region. For consistent image-level backups, use backup software
that is based on the VMware Virtual Disk Development Kit (VDDK), such as vSphere Data Protection.
Table 214. vSphere Data Protection Design Decision

Decision ID: SDDC-OPS-BKP-001
Design Decision: Use vSphere Data Protection to back up all management components.
Design Justification: vSphere Data Protection provides the functionality that is required to back up full image VMs and applications in those VMs, for example, Microsoft SQL Server.
Design Implication: vSphere Data Protection lacks some features that are available in other backup solutions.

3.4.3.2 Logical Design


vSphere Data Protection protects the virtual infrastructure at the VMware vCenter Server layer.
Because vSphere Data Protection is connected to the Management vCenter Server, it can access all
management ESXi hosts, and can detect the virtual machines that require backups.
Figure 63. vSphere Data Protection Logical Design

3.4.3.3 Backup Datastore


vSphere Data Protection uses deduplication technology to back up virtual environments at data block
level, which enables efficient disk utilization. To optimize backups and leverage the VMware vSphere
Storage APIs, all ESXi hosts must have access to the production storage. The backup datastore
stores all the data that is required to recover services according to a Recovery Point Objective
(RPO). Determine the target location and make sure that it meets performance requirements.
Table 215. Options for Backup Storage Location

Option: Store production and backup data on the same storage platform.
Benefits: You do not have to request a new storage configuration from the storage team. You can take full advantage of vSphere capabilities.
Drawbacks: You cannot recover your data if the destination datastore or the production storage is unrecoverable.

Option: Store backup data on dedicated storage.
Benefits: If production storage becomes unavailable, you can recover your data because your backup data is not located on the same shared storage. You separate production and backup virtual machines. The backup schedule does not impact production storage performance because the backup storage is completely separate.
Drawbacks: You might be required to install and configure a dedicated storage volume for backups.

Table 216. VMware Backup Store Target Design Decisions

Decision ID: SDDC-OPS-BKP-002
Design Decision: Allocate a dedicated NFS datastore for the vSphere Data Protection appliance and the backup data in each region according to Physical Storage Design.
Design Justification: vSphere Data Protection emergency restore operations are possible even when the primary VMware Virtual SAN datastore is not available, because the vSphere Data Protection storage volume is separate from the primary Virtual SAN datastore. In addition, the amount of storage required for backups is greater than the amount of storage available in the Virtual SAN datastore.
Design Implication: You must provide an external NFS storage array.

3.4.3.4 Performance
vSphere Data Protection generates a significant amount of I/O operations, especially when
performing multiple concurrent backups. The storage platform must be able to handle this I/O. If the
storage platform does not meet the performance requirements, it might miss backup windows.
Backup failures and error messages might occur. Run the vSphere Data Protection performance
analysis feature during virtual appliance deployment or after deployment to assess performance.
Table 217. vSphere Data Protection Performance

Total Backup Size: Avg Mbps in 4 Hours
0.5 TB: 306 Mbps
1 TB: 611 Mbps
2 TB: 1223 Mbps

3.4.3.5 Volume Sizing


vSphere Data Protection can dynamically expand the destination backup store from 2 TB to 8 TB.
Using an extended backup storage requires additional memory on the vSphere Data Protection
appliance.
Table 218. VMware vSphere Data Protection Sizing Guide

Backup Store Size: Appliance Memory (Minimum)
2 TB: 6 GB
4 TB: 8 GB
6 TB: 10 GB
8 TB: 12 GB
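A minimal sketch of how the sizing guide above can be applied when planning appliance resources; the table values are the only inputs, and the helper function itself is illustrative.

    # Minimum appliance memory (GB) per backup store size (TB), from Table 218.
    VDP_MEMORY_GB_BY_STORE_TB = {2: 6, 4: 8, 6: 10, 8: 12}

    def minimum_memory_gb(required_store_tb: float) -> int:
        """Return the minimum memory for the smallest supported store that fits the requirement."""
        for store_tb in sorted(VDP_MEMORY_GB_BY_STORE_TB):
            if required_store_tb <= store_tb:
                return VDP_MEMORY_GB_BY_STORE_TB[store_tb]
        raise ValueError("The vSphere Data Protection backup store scales up to 8 TB only")

    # Design decision SDDC-OPS-BKP-003 sets the initial backup target to 4 TB.
    print(minimum_memory_gb(4))   # prints 8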

Table 219. VMware Backup Store Size Design Decisions

Decision ID: SDDC-OPS-BKP-003
Design Decision: Set the backup targets to 4 TB initially.
Design Justification: vSphere Data Protection is used for the management stack of a single region. The management stack currently consumes approximately 2 TB of disk space, uncompressed and without deduplication.
Design Implication: More NFS storage will be required to accommodate the increased disk requirements.

3.4.3.6 Other Considerations


vSphere Data Protection can protect virtual machines that reside on VMware Virtual SAN datastores
from host failures. The virtual machine storage policy is not backed up with the virtual machine, but
you can restore the storage policy after restoring the virtual machine.

Note The default Virtual SAN storage policy includes Number Of Failures To Tolerate =
1, which means that virtual machine data will be mirrored.

vSphere Data Protection is used to restore virtual machines that failed or need their data reverted to a
previous state.

3.4.3.7 Backup Policies


Use vSphere Data Protection backup policies to specify virtual machine backup options, the schedule
window, and retention policies.


Virtual Machine Backup Options


vSphere Data Protection provides the following options for a virtual machine backup:
 HotAdd. Provides full image backups of virtual machines, regardless of the guest operating
system.
o The virtual machine base disk is attached directly to vSphere Data Protection to back up data.
vSphere Data Protection uses Changed Block Tracking to detect and back up blocks that are
altered.
o The backup and restore performance is faster because the data flow is through the VMkernel
layer instead of over a network connection.
o A quiesced snapshot can be used to redirect the I/O of a virtual machine disk .vmdk file.
o HotAdd does not work in multi-writer disk mode.
 Network Block Device (NBD). Transfers virtual machine data across the network to allow
vSphere Data Protection to back up the data.
o The performance of the virtual machine network traffic might be lower.
o NBD takes a quiesced snapshot. As a result, it might interrupt the I/O operations of the virtual
machine to swap the .vmdk file or consolidate the data after the backup is complete.
o The time to complete the virtual machine backup might be longer than the backup window.
o NBD does not work in multi-writer disk mode.
 vSphere Data Protection Agent Inside Guest OS. Provides backup of certain applications that
are running in the guest operating system through an installed backup agent.
o Enables application-consistent backup and recovery with Microsoft SQL Server, Microsoft
SharePoint, and Microsoft Exchange support.
o Provides more granularity and flexibility to restore on the file level.
Table 220. Virtual Machine Transport Mode Design Decisions

Decision ID: SDDC-OPS-BKP-004
Design Decision: Use HotAdd to back up virtual machines.
Design Justification: HotAdd optimizes and speeds up virtual machine backups, and does not impact the vSphere management network.
Design Implication: All ESXi hosts need to have the same visibility of the virtual machine datastores.

Decision ID: SDDC-OPS-BKP-005
Design Decision: Use the vSphere Data Protection agent for backups of SQL databases on Microsoft SQL Server virtual machines.
Design Justification: You can restore application data instead of entire virtual machines.
Design Implication: You must install the vSphere Data Protection agent and maintain it.

3.4.3.8 Schedule Window


Even though vSphere Data Protection uses the Changed Block Tracking technology to optimize the
backup data, to avoid any business impact, do not use a backup window when the production storage
is in high demand.

Warning Do not perform any backup or other administrative activities during the vSphere Data
Protection maintenance window. You can only perform restore operations. By default, the
vSphere Data Protection maintenance window begins at 8 PM local server time and
continues uninterrupted until 8 AM or until the backup jobs are complete. Configure
maintenance windows according to IT organizational policy requirements.

Table 221. Backup Schedule Design Decisions

Decision ID: SDDC-OPS-BKP-006
Design Decision: Schedule daily backups.
Design Justification: Allows for the recovery of virtual machine data that is at most a day old.
Design Implication: Data that changed since the last backup, 24 hours ago, is lost.

Decision ID: SDDC-OPS-BKP-007
Design Decision: Schedule backups outside the production peak times.
Design Justification: Ensures that backups occur when the system is under the least amount of load. You should verify that backups are completed in the shortest time possible with the smallest risk of errors.
Design Implication: Backups need to be scheduled to start between 8:00 PM and 8:00 AM or until the backup jobs are complete, whichever comes first.

3.4.3.9 Retention Policies


Retention policies are properties of a backup job. If you group virtual machines by business priority,
you can set the retention requirements according to the business priority.
Table 222. Retention Policies Design Decision

Decision ID: SDDC-OPS-BKP-008
Design Decision: Retain backups for at least 3 days.
Design Justification: Keeping 3 days of backups enables administrators to restore the management applications to a state within the last 72 hours.
Design Implication: Depending on the rate of change in virtual machines, the backup retention policy can increase the storage target size.

3.4.3.10 Component Backup Jobs


You can configure backup for each SDDC management component separately. For this scenario, no
requirement to back up the entire SDDC exists, and this design does not imply such an operation.
Some products can perform internal configuration backups. Use those products in addition to the
whole VM component backups as appropriate.
Table 223. Component Backup Jobs Design Decision

Decision ID: SDDC-OPS-BKP-009
Design Decision: Use the internal configuration backup features within VMware NSX.
Design Justification: Restoring small configuration files can be a faster and less destructive method to achieve a similar restoration of functionality.
Design Implication: An FTP server is required for the NSX configuration backup.

Backup Jobs in Region A


Create a single backup job for the components of a management application according to the
node configuration of the application in Region A.


Table 224. VM Backup Jobs in Region A

ESXi: Backup is not applicable.

Platform Services Controller: Part of the vCenter Server backup job.

vCenter Server (image VM backup jobs):
  Management Job: mgmt01vc01.sfo01.rainpole.local, mgmt01psc01.sfo01.rainpole.local
  Compute Job: comp01vc01.sfo01.rainpole.local, comp01psc01.sfo01.rainpole.local

NSX for vSphere (image VM backup jobs):
  Management Job: mgmt01nsxm01.sfo01.rainpole.local
  Compute Job: comp01nsxm01.sfo01.rainpole.local

vRealize Automation:
  Image VM backup job: vra01mssql01.rainpole.local, vra01bus01.rainpole.local, vra01buc01.sfo01.rainpole.local, vra01svr01a.rainpole.local, vra01svr01b.rainpole.local, vra01iws01a.rainpole.local, vra01iws01b.rainpole.local, vra01ims01a.rainpole.local, vra01ims01b.rainpole.local, vra01dem01.rainpole.local, vra01dem02.rainpole.local, vra01vro01a.rainpole.local, vra01vro01b.rainpole.local, vra01ias01.sfo01.rainpole.local, vra01ias02.sfo01.rainpole.local
  Application VM backup job: vra01mssql01.rainpole.local

vRealize Log Insight (image VM backup job): vrli-mstr-01.sfo01.rainpole.local, vrli-wrkr-01.sfo01.rainpole.local, vrli-wrkr-02.sfo01.rainpole.local

vRealize Operations Manager (image VM backup job): vrops-mstrn-01.rainpole.local, vrops-repln-02.rainpole.local, vrops-datan-03.rainpole.local, vrops-datan-04.rainpole.local, vrops-rmtcol-01.sfo01.rainpole.local, vrops-rmtcol-02.sfo01.rainpole.local

vRealize Orchestrator: Part of the vRealize Automation backup job.

vRealize Business Server and vRealize Business Data Collector: Part of the vRealize Automation backup job.

Backup Jobs in Region B


Create a single backup job for the components of a management application according to the
node configuration of the application in Region B.
Table 225. VM Backup Jobs in Region B*

ESXi: Backup is not applicable.

Platform Services Controller: Part of the vCenter Server backup job.

vCenter Server (image VM backup jobs):
  Management Job: mgmt01vc51.lax01.rainpole.local, mgmt01psc51.lax01.rainpole.local
  Compute Job: comp01vc51.lax01.rainpole.local, comp01psc51.lax01.rainpole.local

NSX for vSphere (image VM backup jobs):
  Management Job: mgmt01nsxm51.lax01.rainpole.local
  Compute Job: comp01nsxm51.lax01.rainpole.local

vRealize Automation (image VM backup job): vra01ias51.lax01.rainpole.local, vra01ias52.lax01.rainpole.local, vra01buc51.lax01.rainpole.local

vRealize Log Insight (image VM backup job): vrli-mstr-51.lax01.rainpole.local, vrli-wrkr-51.lax01.rainpole.local, vrli-wrkr-52.lax01.rainpole.local

vRealize Operations Manager (image VM backup job): vrops-rmtcol-51.lax01.rainpole.local, vrops-rmtcol-52.lax01.rainpole.local

vRealize Business Data Collector: Part of the vRealize Automation backup job.

*Not applicable to a single-region SDDC implementation.

3.4.4 Site Recovery Manager and vSphere Replication Design


To support disaster recovery (DR) in the SDDC, you protect vRealize Operations Manager and
vRealize Automation by using VMware Site Recovery Manager and VMware vSphere Replication.
When failing over to a recovery region, these management applications continue the delivery of
operations management, and cloud platform management functionality.

3.4.4.1 Disaster Recovery Design


The SDDC disaster recovery design includes two locations:
 Protected Region A in San Francisco. Region A contains the management stack virtual machine
workloads that are being protected and is referred to as the protected region in this document.
 Recovery Region B in Los Angeles. Region B provides an environment to host virtual machines from the
protected region in the case of a disaster and is referred to as the recovery region.
Site Recovery Manager can automate the setup and execution of disaster recovery plans between
these two regions.

Note A region in the VMware Validated Design is equivalent to the site construct in Site Recovery
Manager.

3.4.4.2 Disaster Recovery Logical Design


Certain SDDC management applications and services must be available in the event of a disaster.
These management applications are running on vSphere virtual machines, and can have
dependencies on applications and services that run in both regions.
This validated design for disaster recovery defines the following logical configuration of the SDDC management applications:
 Region A has a management cluster of ESXi hosts with management application virtual machines
that must be protected.
 Region B has a management cluster of ESXi hosts with sufficient free capacity to host the
protected management applications from Region A.
 Each region has a vCenter Server instance for the management ESXi hosts within the region.


 Each region has a Site Recovery Manager server with an embedded Site Recovery Manager
database.
 In each region, Site Recovery Manager is integrated with the Management vCenter Server
instance.
 vSphere Replication provides hypervisor-based virtual machine replication between Region A and
Region B.
 vSphere Replication replicates data from Region A to Region B by using a dedicated VMkernel
TCP/IP stack.
 Users and administrators access management applications from other branch offices and remote
locations over the corporate Local Area Network (LAN), Wide Area Network (WAN), and Virtual
Private Network (VPN).
Figure 64. Disaster Recovery Logical Design

3.4.4.3 Deployment Design for Site Recovery Manager


A separate Site Recovery Manager instance is required for both the protected region and the recovery
region. Install and configure Site Recovery Manager after you install and configure vCenter Server
and the Platform Services Controller in the region. Site Recovery Manager takes advantage of
vCenter Server and Platform Services Controller services such as storage management,
authentication, authorization, and guest customization. Site Recovery Manager uses the standard set
of vSphere administrator tools to manage these services.


You have the following options for deployment and pairing of vCenter Server and Site Recovery
Manager:
 vCenter Server options
o You can use Site Recovery Manager and vSphere Replication with vCenter Server Appliance
or with vCenter Server for Windows.
o You can deploy a vCenter Server Appliance in one region and a vCenter Server for Windows
instance in the other region.
 Site Recovery Manager options
o You can use either a physical system or a virtual system.
o You can deploy Site Recovery Manager on a shared system, such as the system of vCenter
Server for Windows, or on a dedicated system.

Table 226. Design Decisions for Site Recovery Manager and vSphere Replication Deployment

SDDC-OPS-DR-001
  Design Decision: Deploy Site Recovery Manager in a virtual machine.
  Design Justification: All components of the SDDC solution must support the highest levels of availability. When Site Recovery Manager runs as a virtual machine, you can enable the availability capabilities of vCenter Server clusters.
  Design Implication: None.

SDDC-OPS-DR-002
  Design Decision: Deploy each Site Recovery Manager instance in the management cluster.
  Design Justification: All management components must be in the same cluster.
  Design Implication: None.

SDDC-OPS-DR-003
  Design Decision: Deploy each Site Recovery Manager instance with an embedded PostgreSQL database.
  Design Justification: Reduce the dependence on external components. Reduce potential database licensing costs.
  Design Implication: Requires assigning database administrators who have the skills and tools to administer PostgreSQL databases.

SDDC-OPS-DR-004
  Design Decision: Deploy each Site Recovery Manager instance with trusted certificates.
  Design Justification: Similarly to vCenter Server, Site Recovery Manager must use trusted CA-signed certificates.
  Design Implication: Replacing the default certificates with trusted CA-signed certificates complicates installation and configuration.

3.4.4.4 Networking Design for Disaster Recovery


Moving a service physically from one region to another represents a networking challenge, especially
if applications have hard-coded IP addresses. Network address space and IP address assignment
considerations require that you either use the same IP address or a different IP address at the
recovery region. In many situations, you assign new IP addresses because VLANs do not typically
stretch between regions.
While protecting the management applications, you can simplify the IP address assignment. This design leverages a load balancer to separate a public network segment and a private network segment. The private network can remain unchanged. You only reassign the external load balancer interface.
 On the public network segment, each management application is accessible under one or more
virtual IP (VIP) addresses.
 On the isolated application virtual network segment, the virtual machines of each management
application are isolated.
After a failover, the recovered application is available under a different IPv4 address (VIP). The use
of the new IP address requires changes to the DNS records. You can change the DNS records
manually or by using a script in the Site Recovery Manager recovery plan.
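The following sketch illustrates the kind of dynamic DNS update that such a recovery plan script might perform. It is a minimal example, not a VMware-provided script: it assumes the dnspython library, a DNS server that accepts TSIG-authenticated dynamic updates, and placeholder zone, record, VIP, and key values.

    import dns.query
    import dns.tsigkeyring
    import dns.update

    # Hypothetical TSIG key; the DNS server must be configured to accept it.
    KEYRING = dns.tsigkeyring.from_text({"srm-failover-key.": "bXlzZWNyZXRrZXk="})

    def repoint_vip(zone, record, new_vip, dns_server):
        """Replace the A record of a management application VIP after failover."""
        update = dns.update.Update(zone, keyring=KEYRING)
        update.replace(record, 300, "A", new_vip)  # 300-second TTL
        return dns.query.tcp(update, dns_server, timeout=10)

    # Example call with placeholder values for a vRealize Automation VIP:
    # repoint_vip("rainpole.local", "vra01svr01", "192.168.11.53", "172.16.11.4")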
Figure 65. Logical Network Design for Cross-Region Deployment with Management Application Network Containers

The IPv4 subnets (orange networks) are routed within the vSphere management network of each region. Nodes on these network segments are reachable from within the SDDC. IPv4 subnets, such as the subnet for the vRealize Automation primary components, overlap across regions. Make sure that only the active IPv4 subnet is propagated in the region and beyond. The public-facing Ext-Mgmt network of both regions (grey networks) is reachable by SDDC users and provides connection to external resources, such as Active Directory or DNS. See Virtualization Network Design.
Load balancing functionality is provided by NSX Edge devices, each fronting a network that contains
the protected components of all management applications. In each region, you use the same
configuration for the management applications and their Site Recovery Manager shadow. Active
Directory and DNS services must be running in both the protected and recovery regions.

3.4.4.5 vSphere Replication


In a VMware Virtual SAN environment, you cannot configure array-based replication. You use
vSphere Replication instead to transfer VMs between regions.
vSphere Replication uses a VMkernel management interface on the ESXi host to send replication
traffic to the vSphere Replication appliance in the recovery region. To isolate vSphere Replication
traffic so that it does not impact other vSphere management traffic, configure the vSphere Replication
network in the following way.
 Place vSphere Replication traffic on a dedicated VMkernel adapter.
 Ensure that the vSphere Replication VMkernel adapter uses a dedicated replication VLAN in the
region.
 Attach the vSphere Replication server network adapter to the dedicated vSphere Replication
VLAN in the region.
 Enable the service for vSphere Replication and vSphere Replication NFC traffic on the dedicated
vSphere Replication VMkernel adapter.
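A minimal pyVmomi sketch of the last two steps follows. It assumes an existing connection to the Management vCenter Server (the si service instance) and a hypothetical host name and VMkernel device; the vSphereReplication and vSphereReplicationNFC strings are the vSphere API identifiers for the two traffic services.

    from pyVmomi import vim

    def enable_replication_traffic(si, host_name, vmk_device):
        """Tag a VMkernel adapter for vSphere Replication and vSphere Replication NFC traffic."""
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.HostSystem], True)
        host = next(h for h in view.view if h.name == host_name)
        view.DestroyView()
        nic_mgr = host.configManager.virtualNicManager
        nic_mgr.SelectVnicForNicType("vSphereReplication", vmk_device)
        nic_mgr.SelectVnicForNicType("vSphereReplicationNFC", vmk_device)

    # Example call with placeholder values:
    # enable_replication_traffic(si, "mgmt01esx01.sfo01.rainpole.local", "vmk3")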
For more information about the vSphere Replication traffic on the management ESXi hosts,
see Virtualization Network Design.
vSphere Replication appliances and vSphere Replication servers are the target for the replication
traffic that originates from the vSphere Replication VMkernel ports.
Table 227. vSphere Replication Design Decisions

SDDC-OPS-DR-005
  Design Decision: Set up a dedicated vSphere Replication distributed port group.
  Design Justification: Ensures that vSphere Replication traffic does not impact other vSphere management traffic.
  Design Implication: You must allocate a dedicated VLAN for vSphere Replication.

SDDC-OPS-DR-006
  Design Decision: Set up a dedicated VMkernel adapter on the management ESXi hosts.
  Design Justification: The vSphere Replication servers potentially receive large amounts of data from the VMkernel adapters on the ESXi hosts. Ensure that the ESXi host replication traffic is redirected to the dedicated vSphere Replication VLAN.
  Design Implication: None.

SDDC-OPS-DR-007
  Design Decision: Attach a virtual network adapter for the vSphere Replication VMs to the vSphere Replication port group.
  Design Justification: Ensures that the vSphere Replication VMs can communicate on the correct replication VLAN.
  Design Implication: vSphere Replication VMs might require additional network adapters for communication on the management and replication VLANs.

3.4.4.6 Placeholder Virtual Machines


Site Recovery Manager creates a placeholder virtual machine in the recovery region for every machine from the Site Recovery Manager protection group. Placeholder virtual machine files are small because they contain virtual machine configuration metadata but no virtual machine disks. Site Recovery Manager adds the placeholder virtual machines as recovery region objects to the Management vCenter Server.

3.4.4.7 Snapshot Space


To perform failover tests, you must provide additional storage for the snapshots of the replicated VMs.
This storage is minimal in the beginning, but grows as test VMs write to their disks. Replication from
the protected region to the recovery region continues during this time. The snapshots created during
testing are deleted after the failover test is complete.

3.4.4.8 Messages and Commands for Site Recovery Manager


You can configure Site Recovery Manager to present notification messages that users must acknowledge. Site Recovery Manager also provides a mechanism to run commands and scripts as necessary when executing a recovery plan. You can insert pre-power-on or post-power-on messages and commands into the recovery plans. These messages and commands are not specific to Site Recovery Manager, but support pausing the execution of the recovery plan to complete other procedures, or running customer-specific commands or scripts to automate recovery tasks.

3.4.4.9 Site Recovery Manager Messages


Some additional steps might be required before, during, and after recovery plan execution. For example, you might set up the environment so that a message appears when a recovery plan is initiated, and the administrator must acknowledge the message before the recovery plan continues. Messages are specific to each IT organization.
Consider the following example messages and confirmation steps:
 Verify that IP address changes are made on the DNS server and that the changes are
propagated.
 Verify that the Active Directory services are available.
 After the management applications are recovered, perform application tests to verify that the
applications are recovered correctly.
Additionally, confirmation steps can be inserted after every group of services that have a dependency on other services. These confirmations can be used to pause the recovery plan so that appropriate verification and testing can be performed before subsequent steps are taken. These services are defined as follows:
 Infrastructure services
 Core services
 Database services
 Middleware services
 Application services
 Web services
Details on each message are specified in the workflow definition of the individual recovery plan.
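For confirmation steps such as the DNS verification above, a small script can take the place of a manual check. The following sketch is a hypothetical helper, not part of Site Recovery Manager: it polls DNS until a management application FQDN resolves to the expected recovery-region VIP, using only the Python standard library; the FQDN and address in the example call are placeholders.

    import socket
    import time

    def wait_for_dns(fqdn, expected_vip, timeout=600, interval=30):
        """Poll DNS until fqdn resolves to the expected recovery-region VIP address."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                if socket.gethostbyname(fqdn) == expected_vip:
                    return True
            except socket.gaierror:
                pass  # Name not resolvable yet; keep polling.
            time.sleep(interval)
        return False

    # Example call with placeholder values:
    # wait_for_dns("vra01svr01.rainpole.local", "192.168.11.53")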

3.4.4.10 Site Recovery Manager Commands


You can run custom scripts to perform infrastructure configuration updates or configuration changes
on the virtual machine environment. The scripts that a recovery plan executes are located on the Site
Recovery Manager server. The scripts can run against the Site Recovery Manager server or can
impact a virtual machine.
If a script must run in the virtual machine, Site Recovery Manager does not run it directly, but instructs the virtual machine to run it. The audit trail that Site Recovery Manager provides does not record the execution of the script because the operation runs on the target virtual machine.


Scripts or commands must be available in the path on the virtual machine according to the following guidelines:
 Use full paths to all executables. For example, use c:\windows\system32\cmd.exe instead of cmd.exe.
 Call only .exe or .com files from the scripts. Command-line scripts can call only executables.
 To run a batch file, start the shell command with c:\windows\system32\cmd.exe.

The scripts that are run after powering on a virtual machine are executed under the Local Security
Authority of the Site Recovery Manager server. Store post-power-on scripts on the Site Recovery
Manager virtual machine. Do not store such scripts on a remote network share.
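As an illustration of these guidelines, the sketch below is a hypothetical post-power-on verification command, not a VMware-provided script. It checks that a recovered management application answers on its load balancer VIP URL and returns an exit code that the recovery plan surfaces. The script path, interpreter path, and URL are assumptions; following the guidelines above, it would be registered with full paths, for example c:\python\python.exe c:\scripts\verify_app.py followed by the URL.

    # Hypothetical post-power-on check, stored on the Site Recovery Manager server,
    # for example as c:\scripts\verify_app.py (the path is an assumption).
    import ssl
    import sys
    import urllib.request

    def check_endpoint(url, timeout=30):
        """Return True if the recovered application answers on its VIP URL."""
        context = ssl._create_unverified_context()  # management apps may use self-signed certificates
        try:
            with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
                return resp.getcode() < 500
        except OSError:
            return False

    if __name__ == "__main__":
        # The default URL is a placeholder; pass the real VIP URL as an argument.
        target = sys.argv[1] if len(sys.argv) > 1 else "https://vra01svr01.rainpole.local/vcac"
        # A non-zero exit code marks the command as failed in the recovery plan history.
        sys.exit(0 if check_endpoint(target) else 1)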

3.4.4.11 Recovery Plans for Site Recovery Manager


A recovery plan is the automated plan (runbook) for full or partial failover from Region A to Region B.

3.4.4.12 Startup Order and Response Time


Virtual machine priority determines virtual machine startup order.
 All priority 1 virtual machines are started before priority 2 virtual machines.
 All priority 2 virtual machines are started before priority 3 virtual machines.
 All priority 3 virtual machines are started before priority 4 virtual machines.
 All priority 4 virtual machines are started before priority 5 virtual machines.
 You can additionally set startup order of virtual machines within each priority group.
You can configure the following timeout parameters:
 Response time, which defines the time to wait after the first virtual machine powers on before
proceeding to the next virtual machine in the plan.
 Maximum time to wait if the virtual machine fails to power on before proceeding to the next virtual
machine.
You can adjust response time values as necessary during execution of the recovery plan test to
determine the appropriate response time values.
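The following toy sketch illustrates how the priority groups and the two timeout parameters interact. It is an illustration of the semantics only, not a Site Recovery Manager API call; the priority map, VM names, and the power_on callable are hypothetical.

    import time

    # Hypothetical priority groups (1 is the highest priority).
    PLAN = {
        1: ["mgmt-db-01"],
        2: ["mgmt-app-01", "mgmt-app-02"],
        3: ["mgmt-web-01"],
    }

    RESPONSE_TIME = 120       # seconds to wait after a VM powers on before the next VM
    MAX_POWER_ON_WAIT = 300   # seconds to wait for a VM that fails to power on

    def run_plan(power_on):
        """Start VMs priority group by priority group, honoring both timeout parameters."""
        for priority in sorted(PLAN):
            for vm in PLAN[priority]:
                # power_on is a hypothetical callable that returns True when the VM is up.
                if power_on(vm, timeout=MAX_POWER_ON_WAIT):
                    time.sleep(RESPONSE_TIME)
                # On a power-on failure, the plan proceeds after MAX_POWER_ON_WAIT.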

3.4.4.13 Recovery Plan Test Network


When you create a recovery plan, you must configure test network options. The following options are
available.
 Isolated Network (Automatically Created). An isolated private network is created automatically
on each ESXi host in the cluster for a virtual machine that is being recovered. Site Recovery
Manager creates a standard switch and a port group on it.
A limitation of this automatic configuration is that a virtual machine connected to the isolated port
group on one ESXi host cannot communicate with a virtual machine on another ESXi host. This
option limits testing scenarios and provides an isolated test network only for basic virtual machine
testing.
 Port Group. Selecting an existing port group provides a more granular configuration to meet your
testing requirements. If you want virtual machines across ESXi hosts to communicate, use either
a standard or distributed switch with uplinks to the production network, and create a port group on
the switch that is tagged with a non-routable VLAN. In this way, the network is isolated and
cannot communicate with other production networks.
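For deployments that choose the Port Group option instead of the production network used by this design, a minimal pyVmomi sketch of creating such an isolated, VLAN-tagged distributed port group might look like the following; the switch object, port group name, and VLAN ID in the example call are assumptions.

    from pyVmomi import vim

    def create_test_portgroup(dvs, name, vlan_id, num_ports=16):
        """Create a distributed port group tagged with a non-routable VLAN for recovery plan tests."""
        vlan = vim.dvs.VmwareDistributedVirtualSwitch.VlanIdSpec()
        vlan.vlanId = vlan_id        # non-routable VLAN reserved for testing
        vlan.inherited = False

        port_config = vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy()
        port_config.vlan = vlan

        spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec()
        spec.name = name
        spec.type = "earlyBinding"
        spec.numPorts = num_ports
        spec.defaultPortConfig = port_config
        return dvs.AddDVPortgroup_Task([spec])

    # Example call with placeholder values:
    # create_test_portgroup(dvs, "SRM-Test-PortGroup", 2500)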
Because the isolated application networks are fronted by a load balancer, the recovery plan test
network is equal to the recovery plan production network and provides realistic verification of a
recovered management application.


Table 228. Recovery Plan Test Network Design Decision

SDDC-OPS-DR-008
  Design Decision: Use the target recovery production network for testing.
  Design Justification: The design of the application virtual networks supports their use as recovery plan test networks.
  Design Implication: During recovery testing, a management application will not be reachable using its production FQDN. Either access the application using its VIP address or assign a temporary FQDN for testing. Note that this approach will result in certificate warnings due to name and certificate mismatches.
