
VMware Tanzu for Kubernetes Operations Reference Architecture 2.3

You can find the most up-to-date technical documentation on the VMware website at:
https://docs.vmware.com/

VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com

Copyright © 2023 VMware, Inc. All rights reserved. Copyright and trademark information.


Contents

VMware Tanzu for Kubernetes Operations Reference Architecture 2.3 17


Components 18

VMware Tanzu for Kubernetes Operations on Public Cloud Reference Designs and Deployment 20

VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS Reference Design 20
Supported Component Matrix 21
Benefits of Running VMware Tanzu on VMware Cloud on AWS 21
Tanzu Kubernetes Grid Components 22
Tanzu Kubernetes Grid Storage 23
Tanzu Kubernetes Clusters Networking 24
Tanzu Kubernetes Grid Infrastructure Networking 25
Tanzu Kubernetes Grid on NSX-T Networking with NSX Advanced Load Balancer 25
NSX Advanced Load Balancer Components 26
Tanzu Kubernetes Grid Clusters Recommendations 27
Network Architecture 28
Network Requirements 30
Network Recommendations 30
Subnet and CIDR Examples 31
Firewall Requirements 31
Optional Firewall Rules 33
NSX Advanced Load Balancer Recommendations 34
NSX Advanced Load Balancer Service Engine Recommendations 35
Kubernetes Ingress Routing 36
NSX Advanced Load Balancer as an L4+L7 Ingress Service Provider 37
L7 Ingress in ClusterIP Mode 38
L7 Ingress in NodePort Mode 38
L7 Ingress in NodePortLocal Mode 38
NSX Advanced Load Balancer L4 Ingress with Contour L7 Ingress 39
NSX Advanced Load Balancer L7 Ingress Recommendations 39
Container Registry 39
Tanzu Kubernetes Grid Monitoring 40
Tanzu Kubernetes Grid Logging 41


Bring Your Own Images for Tanzu Kubernetes Grid Deployment 41


Compliance and Security 42
Installation Experience 42
Deployment Instructions 43
Summary 43
Supplemental Information 44
Automating Deployment of Service Engines 44
Appendix A - Configure Node Sizes 46
Use Predefined Node Configurations 46
Define Custom Node Configurations 47
Appendix B - NSX Advanced Load Balancer Sizing Guidelines 47
NSX Advanced Load Balancer Controller Sizing Guidelines 47
Service Engine Sizing Guidelines 48

Deploy VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS 48
Deploying with VMware Service Installer for Tanzu 48
Prerequisites 48
Supported Component Matrix 49
Prepare the Environment for Deploying Tanzu for Kubernetes Operations 49
General Requirements 49
Resource Pools and VM Folders 50
Network Requirements 50
Firewall Requirements 50
Subnet and CIDR Examples 50
Tanzu for Kubernetes Operations Deployment Overview 51
Deploy and Configure NSX Advanced Load Balancer 51
Configure Licensing 54
NSX Advanced Load Balancer: NTP Configuration 55
NSX Advanced Load Balancer: Controller High Availability 56
Change NSX Advanced Load Balancer Portal Certificate 58
Export NSX Advanced Load Balancer Certificate 61
NSX Advanced Load Balancer: Create No Orchestrator Cloud and SE Groups 62
Configure Service Engine Groups 63
Configure VIP Networks 64
Configure Routing 66
Configuring IPAM and DNS Profiles 66
Deploy and Configure Service Engine 68
Import the Service Engine Image File into the Content Library 69
Generate the Cluster UUID and Authentication Token 69


Deploy Service Engine VMs for Tanzu Kubernetes Grid Management Cluster 69
Deploy Service Engines for Tanzu Kubernetes Grid Workload cluster 71
Deploy and Configure Tanzu Kubernetes Grid 73
Deploy and Configure Bootstrap Machine 73
Import Base Image template for Tanzu Kubernetes Grid Cluster Deployment 76
Deploy Tanzu Kubernetes Grid Management Cluster 77
Register Management Cluster with Tanzu Mission Control 86
Create AKO Deployment Config for Tanzu Kubernetes Grid Workload Cluster 86
Configure AKO Deployment Config (ADC) for Shared Services Cluster 87
Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX Advanced Load Balancer L7 Ingress with NodePortLocal Mode 89
Deploy Tanzu Kubernetes Grid Shared Services Cluster 91
Deploy Tanzu Kubernetes Clusters (Workload Clusters) 97
Integrate Tanzu Kubernetes Clusters with Tanzu Observability 98
Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh 98
Deploy User-Managed Packages on Tanzu Kubernetes clusters 99

VMware Tanzu for Kubernetes Operations on AWS Reference Design 99


Supported Kubernetes Versions in Tanzu Kubernetes Grid v1.6 100
Tanzu Kubernetes Grid v1.6 Components 100
Network Overview 101
Network Recommendations 101
Storage 102
VPC Architectures 102
Single VPC with Multiple Availability Zones 103
Multiple VPC with Multiple Availability Zones 103
Availability 105
Quotas 105
Cluster Creation and Management 105
Global Cluster Lifecycle Management 107
Tanzu Kubernetes Clusters Networking 109
Ingress and Load Balancing 110
Authentication with Pinniped 111
Observability 112
Metrics Monitoring with Tanzu Observability by Wavefront (Recommended Solution) 112
Custom Tanzu Observability Dashboards 113
Metrics Monitoring with Prometheus and Grafana (Alternative Solution) 113
Log Forwarding 114
Summary 115


Deployment Instructions 115

Deploy VMware Tanzu for Kubernetes Operations on AWS 115


Deploying with VMware Service Installer for Tanzu 115
Prerequisites 115
Overview of the Deployment Steps 116
Set up AWS Infrastructure 116
Create and Set Up a Jumpbox 119
Prepare an External Identity Management 121
Deploy a Tanzu Kubernetes Grid Management Cluster 121
Deploy a Management Cluster from the Tanzu Kubernetes Grid Installer 121
Deploy Management Clusters from a Configuration File 126
Examine the Management Cluster Deployment 128
Deploy Workload Clusters 128
Troubleshooting Tips for Tanzu Kubernetes Grid 128
Install and Configure Packages into Workload Clusters 129
Auto-Managed Packages 129
CLI-Managed Packages 129
Configure SaaS Services 129
Delete Clusters 130
Delete a Workload Cluster 130
Delete a Management Cluster 130
Logs and Troubleshooting 130

VMware Tanzu for Kubernetes Operations on Azure Reference Design 130


Cluster Creation and Management 131
Tanzu Clusters 133
Network Design 133
Same Virtual Network 133
Separate Virtual Networks 134
Considerations 135
Required Microsoft Azure Components 136
Quotas 136
Application Registration or Service Principal 136
Virtual Network 136
Load Balancer 136
Network Security Group (NSG) 137
Virtual Machines 137
Azure Backup 137
Azure Monitor 138


Optional Azure Components 138


Bastion Host 138
Public IP 138
Container Registries 138
Azure Container Registry 139
Private Container Registry Options 139
Public Container Registry Options 139
Global Cluster Lifecycle Management 139
Ingress and Load Balancing 141
Authentication with Pinniped 141
Observability 142
Metrics Monitoring with Tanzu Observability by Wavefront (Recommended Solution) 143
Custom Tanzu Observability Dashboards 144
Metrics Monitoring with Prometheus and Grafana (Alternative Solution) 144
Log Forwarding 145
Summary 145
Deployment Instructions 146

Deploy VMware Tanzu for Kubernetes Operations on Microsoft Azure 146


Prerequisites 147
Overview of the Deployment steps 147
Set up your Microsoft Azure Environment 147
Azure ARM Template 148
Quotas 148
ARM Template Deployment 149
Deploy ARM Template 149
Azure CLI 149
Azure PowerShell 149
Azure Portal 149
Azure Service Principal/Application Registration Creation 151
Set Up Bootstrap VM 156
Deploy Tanzu Kubernetes Grid 158
Configure SaaS Services 159
(Optional) Deploy Tanzu Kubernetes Grid Packages 159
Auto-Managed Packages 159
CLI-Managed Packages 159

VMware Tanzu for Kubernetes Operations on vSphere Reference Designs and Deployment 161

VMware Tanzu for Kubernetes Operations on vSphere Reference Design 161
Supported Component Matrix 162
Tanzu Kubernetes Grid Components 162
Tanzu Kubernetes Grid Storage 164
Tanzu Kubernetes Clusters Networking 164
Tanzu Kubernetes Grid Infrastructure Networking 165
Tanzu Kubernetes Grid on vSphere Networking with NSX Advanced Load Balancer 165
NSX Advanced Load Balancer Licensing 166
VMware NSX ALB Enterprise Edition 166
VMware NSX Advanced Load Balancer essentials for Tanzu 166
NSX Advanced Load Balancer Components 166
Tanzu Kubernetes Grid Clusters Recommendations 168
Generic Network Architecture 168
Network Requirements 170
Network Recommendations 170
Subnet and CIDR Examples 171
3-Network Architecture 171
Network Requirements 172
Firewall Requirements 173
NSX Advanced Load Balancer Recommendations 174
NSX Advanced Load Balancer Service Engine Recommendations 175
Kubernetes Ingress Routing 177
NSX ALB as an L4+L7 Ingress Service Provider 178
L7 Ingress in ClusterIP Mode 179
L7 Ingress in NodePort Mode 179
L7 Ingress in NodePortLocal Mode 179
NSX ALB L4 Ingress with Contour L7 Ingress 180
NSX Advanced Load Balancer L7 Ingress Recommendations 180
Container Registry 180
Tanzu Kubernetes Grid Monitoring 181
Tanzu Kubernetes Grid Logging 182
Bring Your Own Images for Tanzu Kubernetes Grid Deployment 183
Compliance and Security 183
Installation Experience 184
Deployment Instructions 185
Summary 185
Appendix A - Configure Node Sizes 185
Use Predefined Node Configurations 186


Define Custom Node Configurations 186


Appendix B - NSX Advanced Load Balancer Sizing Guidelines 187
NSX Advanced Load Balancer Controller Sizing Guidelines 187
Service Engine Sizing Guidelines 187

Deploy VMware Tanzu for Kubernetes Operations on vSphere 187


Deploying with VMware Service Installer for Tanzu 187
Supported Component Matrix 188
Prepare your Environment for Deploying Tanzu for Kubernetes Operations 188
General Requirements 188
Resource Pools and VM Folders 189
Network Requirements 189
Firewall Requirements 190
Subnet and CIDR Examples 190
Tanzu for Kubernetes Operations Deployment Overview 190
Deploy and Configure NSX Advanced Load Balancer 190
Deploy NSX Advanced Load Balancer 191
NSX Advanced Load Balancer: Initial Setup 192
NSX Advanced Load Balancer: NTP Configuration 194
NSX Advanced Load Balancer: Licensing 195
NSX Advanced Load Balancer: Controller High Availability 196
NSX Advanced Load Balancer: Certificate Management 200
NSX Advanced Load Balancer: Create vCenter Cloud and SE Groups 203
NSX Advanced Load Balancer: Configure Network and IPAM Profile 208
Configure Tanzu Kubernetes Grid Networks in NSX Advanced Load Balancer 208
Create IPAM and DNS Profile in NSX Advanced Load Balancer and Attach it to Cloud 210
Deploy and Configure Bootstrap Machine 213
Import Base Image template for Tanzu Kubernetes Grid Cluster Deployment 216
Deploy Tanzu Kubernetes Grid (TKG) Management Cluster 217
Register Management Cluster with Tanzu Mission Control 226
Configure AKO Deployment Config (ADC) for Workload Clusters 227
Configure AKODeploymentConfig (ADC) for Shared Services Cluster 227
Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX ALB L7 Ingress with NodePortLocal Mode 229
Deploy Tanzu Kubernetes Grid Shared Services Cluster 231
Deploy Tanzu Kubernetes Grid Workload Clusters 237
Configure Tanzu SaaS Components and Deploy User-Managed Packages 238


VMware Tanzu for Kubernetes Operations on vSphere with NSX Networking Reference Design 238
Supported Component Matrix 239
Tanzu Kubernetes Grid Components 239
Tanzu Kubernetes Grid Storage 241
Tanzu Kubernetes Clusters Networking 241
Deploy Pods with Routable and No-NAT IP Addresses (NSX) 242
Tanzu Kubernetes Grid Infrastructure Networking 242
Tanzu Kubernetes Grid on VMware NSX Data Center Networking with NSX Advanced Load Balancer 243
NSX Advanced Load Balancer Components 243
Network Architecture 245
Network Requirements 246
Subnet and CIDR Examples 247
Firewall Requirements 248
Installation Experience 250
Kubernetes Ingress Routing 251
NSX Advanced Load Balancer (ALB) as an L4+L7 Ingress Service Provider 252
L7 Ingress in ClusterIP Mode 253
L7 Ingress in NodePort Mode 253
L7 Ingress in NodePortLocal Mode 253
NSX Advanced Load Balancer L4 Ingress with Contour L7 Ingress 254
Design Recommendations 254
NSX Advanced Load Balancer Recommendations 254
NSX Advanced Load Balancer Service Engine Recommendations 256
NSX Advanced Load Balancer L7 Ingress Recommendations 256
Network Recommendations 257
Tanzu Kubernetes Grid Clusters Recommendations 258
Container Registry 259
Tanzu Kubernetes Grid Monitoring 259
Tanzu Kubernetes Grid Logging 260
Bring Your Own Images for Tanzu Kubernetes Grid Deployment 261
Compliance and Security 261
Tanzu Kubernetes Grid and Tanzu SaaS Integration 262
Appendix A - Configure Node Sizes 262
Use Predefined Node Configuration 262
Define Custom Node Configurations 263
Appendix B - NSX Advanced Load Balancer Sizing Guidelines 263
NSX Advanced Load Balancer Controller Sizing Guidelines 263
Service Engine Sizing Guidelines 263


Summary 264
Deployment Instructions 264

Deploy VMware Tanzu for Kubernetes Operations on VMware vSphere with VMware NSX 264
Deploying with VMware Service Installer for Tanzu 264
Supported Component Matrix 265
Prepare Environment for Deploying Tanzu for Kubernetes Operations 265
General Requirements 265
Network Requirements 267
Firewall Requirements 267
Subnet and CIDR Example 267
Deployment Overview 267
Configure T1 Gateway and Logical Segments in NSX-T Data Center 268
Add a Tier-1 Gateway 268
DHCP configuration on Tier-1 Gateway 269
Create Overlay-Backed Segments 270
Deploy and Configure NSX Advanced Load Balancer 273
Deploy NSX Advanced Load Balancer 273
NSX Advanced Load Balancer: Initial Setup 274
NSX Advanced Load Balancer: NTP Configuration 276
NSX Advanced Load Balancer: Licensing 277
NSX Advanced Load Balancer: Controller High Availability 279
NSX Advanced Load Balancer: Certificate Management 281
Create Credentials 283
Create NSX Cloud and Service Engine Groups 284
Configure Network and IPAM Profile 291
Create IPAM Profile in NSX Advanced Load Balancer and Attach to Cloud 294
Deploy and Configure Bootstrap Machine 297
Import Base Image Template for Tanzu Kubernetes Grid Cluster Deployment 301
Deploy Tanzu Kubernetes Grid Management Cluster 302
What to Do Next 312
Register Management Cluster with Tanzu Mission Control 312
Configure AKO Deployment Config (ADC) for Workload Clusters 312
Configure AKODeploymentConfig (ADC) for Shared Services Cluster 313
Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX ALB L7 Ingress with NodePortLocal Mode 315
Deploy Tanzu Kubernetes Grid Shared Services Cluster 317
Deploy Tanzu Kubernetes Grid Workload Cluster 323
Integrate Tanzu Kubernetes clusters with Tanzu Observability 324


Integrate Tanzu Kubernetes clusters with Tanzu Service Mesh 324


Deploy User-Managed Packages on Tanzu Kubernetes clusters 325
Single VIP Network Architecture 325
Appendix A - Management Cluster yaml file 325
Appendix B - Shared Service Cluster ADC file 326
Appendix C - Workload Cluster ADC file 327

VMware Tanzu for Kubernetes Operations on vSphere with Tanzu Reference Designs and Deployment 329

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design 329
vSphere with Tanzu Components 330
vSphere with Tanzu Architecture 331
Supported Component Matrix 333
vSphere with Tanzu Storage 333
Tanzu Kubernetes Clusters Networking 334
Networking for vSphere with Tanzu 335
vSphere with Tanzu on vSphere Networking with NSX Advanced Load Balancer 335
NSX Advanced Load Balancer Components 336
Network Architecture 337
Subnet and CIDR Examples 339
Firewall Requirements 340
Deployment options 342
Single-Zone Deployment of Supervisor 342
Three-Zone Deployment of Supervisor 342
Installation Experience 342
Design Recommendations 343
NSX Advanced Load Balancer Recommendations 343
Network Recommendations 345
Recommendations for Supervisor Clusters 345
Recommendations for Tanzu Kubernetes Clusters 346
Kubernetes Ingress Routing 346
NSX Advanced Load Balancer Sizing Guidelines 347
NSX Advanced Load Balancer Controller Configuration 347
Service Engine Sizing Guidelines 347
Container Registry 348
vSphere with Tanzu SaaS Integration 349
Custom Tanzu Observability Dashboards 349
Summary 349


Deployment Instructions 349

Deploy VMware Tanzu for Kubernetes Operations using vSphere with Tanzu 350
Deploying with VMware Service Installer for Tanzu 350
Prerequisites 350
General Requirements 350
Network Requirements 351
Firewall Requirements 352
Resource Pools 352
Deployment Overview 352
Deploy and Configure NSX Advanced Load Balancer 353
Deploy NSX Advanced Load Balancer Controller Node 353
Configure the Controller Node for your vSphere with Tanzu Environment 353
Configure Default-Cloud 355
Configure Licensing 358
Configure NTP Settings 359
Deploy NSX Advanced Load Balancer Controller Cluster 360
Change NSX Advanced Load Balancer Portal Default Certificate 362
Export NSX Advanced Load Balancer Certificate 364
Configure a Service Engine Group 364
Configure a Virtual IP Subnet for the Data Network 364
Configure Default Gateway 365
Configure IPAM and DNS Profile 366
Deploy Tanzu Kubernetes Grid Supervisor Cluster 369
Download and Install the Kubernetes CLI Tools for vSphere 374
Connect to the Supervisor Cluster 374
Create and Configure vSphere Namespaces 375
Configure Permissions for the Namespace 376
Set Persistent Storage to the Namespace 377
Specify Namespace Capacity Limits 378
Associate VM Class with Namespace 379
Register Supervisor Cluster with Tanzu Mission Control 381
Deploy Tanzu Kubernetes Clusters (Workload Cluster) 384
Integrate Tanzu Kubernetes clusters with Tanzu Observability 390
Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh 390
Deploy User-Managed Packages on Tanzu Kubernetes clusters 390
Self-Service Namespace in vSphere with Tanzu 390


VMware Tanzu for Kubernetes Operations using vSphere with Tanzu on NSX Reference Design 393
Supported Component Matrix 394
vSphere with Tanzu Components 394
Identity and Access Management 396
Roles and Permissions 397
vSphere with Tanzu Architecture 397
vSphere with Tanzu Storage 398
Networking for vSphere with Tanzu 399
Network Requirements 401
Firewall Recommendations 402
Network Segmentation 404
Deployment options 404
Single-Zone Deployment of Supervisor 405
Three-Zone Deployment of Supervisor 405
Installation Experience 405
vSphere Namespaces 407
Tanzu Kubernetes Grid Cluster API’s 408
Tanzu Kubernetes Clusters Networking 408
Kubernetes Ingress Routing 409
Container Registry 409
Scale a Tanzu Kubernetes Grid Cluster 410
Backup And Restore 410
vSphere with Tanzu SaaS Integration 411
Custom Tanzu Observability Dashboards 411
Summary 411

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Multi-AZ Reference Architecture on VDS Networking 411
Supported Component Matrix 412
vSphere with Tanzu Components 412
Identity and Access Management 415
vSphere with Tanzu Architecture for a Multi-Zone Deployment 415
Recommendations for using namespace in vSphere with Tanzu 416
vSphere with Tanzu Storage 416
Networking for vSphere with Tanzu 417
vSphere with Tanzu on VDS Networking with NSX Advanced Load Balancer 417
NSX Advanced Load Balancer Components 418
Network Architecture 421
Networking Prerequisites 422


Networking Requirements 422


Subnet and CIDR Examples 423
Firewall Requirements 423
Installation Experience 425
Tanzu Kubernetes Grid Cluster APIs 427
Tanzu Kubernetes Clusters Networking 428
Recommendations for Tanzu Kubernetes Clusters 428
Kubernetes Ingress Routing 429
Container Registry 429
Scale a Tanzu Kubernetes Grid Cluster 430
Backup And Restore 430
Appendix A - Deploy TKG Cluster 431
Appendix B - Deploy StatefulSet Application to vSphere Zones 432
Appendix C - NSX Advanced Load Balancer Sizing Guidelines 434
NSX Advanced Load Balancer Sizing Guidelines 434
Service Engine Sizing Guidelines 434

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Multi-AZ Reference Architecture on NSX Networking 435
Supported Component Matrix 435
vSphere With Tanzu Components 436
Identity and Access Management 438
Roles and Permissions 439
vSphere with Tanzu Architecture 439
vSphere with Tanzu Storage 440
Networking for vSphere with Tanzu 440
Networking Prerequisites 442
Network Requirements 442
Firewall Recommendations 443
Installation Experience 445
vSphere Namespaces 447
Tanzu Kubernetes Grid Workload Clusters 447
Tanzu Kubernetes Grid Cluster APIs 448
Tanzu Kubernetes Clusters Networking 449
Kubernetes Ingress Routing 449
Container Registry 450
Scale a Tanzu Kubernetes Grid Cluster 451
Backup And Restore 451
Appendix A - Deploy TKG Cluster 451
Appendix B - Deploy StatefulSet Application to vSphere Zones 453


Authentication with Pinniped 455


Authentication Flow 455

Prepare External Identity Management 456


Enable and Configure Identity Management During Management Cluster Deployment 457
Obtain Your Identity Provider Details 457
Configure LDAPS or OIDC Settings in Tanzu Kubernetes Grid 457
Enable and Configure Identity Management in an Existing Deployment 459
Obtain your identity provider details 459
Generate the Pinniped Add-on Secret for the Management Cluster and Deploy the Pinniped Package 460
Complete the Identity Management Configuration on Management Cluster 463
Connect kubectl to the Management Cluster 463
Check the Status of an LDAP Identity Management Service 464
Check the Status of an OIDC Identity Management Service 464
Provide the Callback URI to the OIDC Provider (OIDC Only) 465
Enable Identity Management on Workload Clusters 466
Configure RBAC 468
Generate and Test a Non-Administrator kubeconfig File for the Tanzu Clusters 469
Create a Role Binding on the Management Cluster 470
Authenticate Users on a Machine Without a Browser 472

Enable Data Protection on a Workload Cluster and Configure Backup 475


Prerequisites 475
Enable Data Protection on Workload Cluster 475
Configure Backup 477
Restore Backup 481

VMware Tanzu for Kubernetes Operations Network Port Diagram - Reference Sheet 483
Reference for Port Diagram 483

ClusterClass Overview 487


Benefits of using ClusterClass: 488
Cluster 488
Configuration of the Cluster topology 488
ClusterClass and Cluster CRD use cases in TKGm: 488


VMware Tanzu for Kubernetes Operations Reference Architecture 2.3

Kubernetes is a great platform that provides development teams with a single API to deploy,
manage, and run applications. However, running, maintaining, and securing Kubernetes is a
complex task. VMware Tanzu for Kubernetes Operations (informally known as TKO) simplifies
Kubernetes operations. It determines what base OS instances to use, which Kubernetes Container
Network Interface (CNI) and Container Storage Interface (CSI) to use, how to secure the Kubernetes
API, and much more. It monitors, upgrades, and backs up clusters and helps teams provision,
manage, secure, and maintain Kubernetes clusters on a day-to-day basis.

Note

This reference architecture is tested to work with Tanzu Kubernetes Grid 2.3. It will be refreshed as new features and capabilities are introduced in subsequent Tanzu Kubernetes Grid releases.

The following diagram provides a high-level reference architecture for deploying the components
available with Tanzu for Kubernetes Operations as a solution.

The reference architecture documentation provides several reference designs and the instructions
for deploying the reference designs. The reference designs are based on the high-level reference
architecture and they are tailored for deploying Tanzu for Kubernetes Operations on your IaaS or

infrastructure of choice.

The reference architecture and the reference designs are tested and supported by VMware.

Components
The following components are used in the reference architecture:

VMware Tanzu Kubernetes Grid - Enables creation and lifecycle management operations of
Kubernetes clusters.

vSphere with Tanzu - Transforms vSphere into a platform for running Kubernetes workloads
natively on the hypervisor layer. When enabled on a vSphere cluster, vSphere with Tanzu provides
the capability to run Kubernetes workloads directly on ESXi hosts and to create upstream
Kubernetes clusters within dedicated resource pools.

VMware Tanzu Mission Control - Provides a global view of Kubernetes clusters and allows for
centralized policy management across all deployed and attached clusters.

VMware Tanzu Observability by Wavefront - Provides a centralized management platform for
consistently operating and securing your Kubernetes infrastructure and modern applications across
multiple teams and clouds.

VMware Tanzu Service Mesh - Provides consistent control and security for microservices, end
users, and data, across all your clusters and clouds.

VMware NSX Advanced Load Balancer Enterprise Edition - Provides layer 4 service type load
balancer support. NSX Advanced Load Balancer is recommended for vSphere deployments without
NSX-T, or which have unique scale requirements.

Pinniped - Provides identity services to Kubernetes. It is an authentication service for Kubernetes to
set up integration with identity providers such as OKTA, Dex, and LDAP.

User-managed packages - Provides in-cluster and shared services to the Kubernetes clusters that
are running in your Tanzu Kubernetes Grid environment.

Cert Manager - Provides automated certificate management. It runs by default in
management clusters.

Contour - Provides layer 7 ingress control to deployed HTTP(S) applications. Tanzu
Kubernetes Grid includes signed binaries for Contour. Deploying Contour is a prerequisite for
deploying the Prometheus, Grafana, and Harbor extensions.

ExternalDNS - Publishes DNS records for applications to DNS servers. It uses a declarative
Kubernetes-native interface.

Fluent Bit - Collects data and logs from different sources, unifies them, and sends them to
multiple destinations. Tanzu Kubernetes Grid includes signed binaries for Fluent Bit.

Prometheus - Provides out-of-the-box health monitoring of Kubernetes clusters. The Tanzu
Kubernetes Grid implementation of Prometheus includes Alert Manager. You can configure
Alert Manager to notify you when certain events occur.

Grafana - Provides monitoring dashboards for displaying key health metrics of Kubernetes
clusters. Tanzu Kubernetes Grid includes an implementation of Grafana.

Harbor Image Registry - Provides a centralized location to push, pull, store, and scan
container images used in Kubernetes workloads. It supports storing artifacts such as Helm
Charts and includes enterprise-grade features such as RBAC, retention policies, automated
garbage cleanup, and docker hub proxying.

Multus CNI - Enables attaching multiple network interfaces to pods. Multus CNI is a container
network interface (CNI) plugin for Kubernetes that lets you attach multiple network interfaces
to a single pod and associate each with a different address range.


VMware Tanzu for Kubernetes Operations on Public Cloud Reference Designs and Deployment

The following documentation lays out the reference designs for deploying Tanzu for Kubernetes
Operations (informally known as TKO) in the public cloud. You can deploy Tanzu for Kubernetes Operations
on VMware Cloud on AWS, directly on AWS, or on Microsoft Azure.

VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS Reference Design
Deploy Tanzu for Kubernetes Operations on VMware Cloud on AWS

VMware Tanzu for Kubernetes Operations on AWS Reference Design
Deploy Tanzu for Kubernetes Operations on AWS

VMware Tanzu for Kubernetes Operations on Microsoft Azure Reference Design
Deploy Tanzu for Kubernetes Operations on Microsoft Azure

VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS Reference Design
Tanzu for Kubernetes Operations simplifies operating Kubernetes for multi-cloud deployment by
centralizing management and governance for clusters and teams across on-premises, public clouds,
and edge. Tanzu for Kubernetes Operations delivers an open source aligned Kubernetes distribution
with consistent operations and management to support infrastructure and application modernization.

This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations
on VMware Cloud on AWS.

Note

The scope of this document is limited to Tanzu Kubernetes Grid (multi-cloud), which
is a customer-managed solution.

The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.


Supported Component Matrix


The following table provides the component versions and interoperability matrix supported with the
reference design:

Software Components Version

Tanzu Kubernetes Grid 2.3.0

VMC on AWS SDDC Version 1.18 and later

NSX Advanced Load Balancer 22.1.2

For the latest information about which software versions can be used together, check the
Interoperability Matrix here.

Benefits of Running VMware Tanzu on VMware Cloud on AWS
VMware Cloud on AWS enables your IT and operations teams to add value to your investments in
AWS by extending your on-premises VMware vSphere environments to the AWS cloud. VMware
Cloud on AWS is an integrated cloud offering jointly developed by Amazon Web Services (AWS)
and VMware. It is optimized to run on dedicated, elastic, bare-metal Amazon Elastic Compute Cloud
(Amazon EC2) infrastructure and supported by VMware and its partners. For more information about
VMware Cloud on AWS, see VMware Cloud on AWS Documentation.

VMware Cloud on AWS enables the following:

1. Cloud Migrations

2. Data Center Extension

3. Disaster Recovery


4. Next Generation Applications

By running VMware Tanzu within the same infrastructure as the general VM workloads enabled by
the first three use cases, organizations can start their next generation application modernization
strategy immediately without incurring additional cost. For example, SDDC spare capacity can be
used to run Tanzu Kubernetes Grid to enable next generation application modernization, or compute
capacity not used by disaster recovery can be used for Tanzu Kubernetes Grid clusters.

The following additional benefits are enabled by the Elastic Network Interface that connects the
VMware Cloud on AWS SDDC to the AWS services within the Amazon VPC:

Enable developers to modernize existing enterprise apps with AWS cloud capabilities and
services.

Integrate modern application tools and frameworks to develop next generation applications.

Remove egress charges because all traffic remains internal to the Amazon availability zone.

Tanzu Kubernetes Grid Components


VMware Tanzu Kubernetes Grid (TKG) provides organizations with a consistent, upstream-
compatible, regional Kubernetes substrate that is ready for end-user workloads and ecosystem
integrations. You can deploy Tanzu Kubernetes Grid across software-defined datacenters (SDDC)
and public cloud environments, including vSphere, Microsoft Azure, and Amazon EC2.

Tanzu Kubernetes Grid comprises the following components:

Management Cluster - A management cluster is the first element that you deploy when you
create a Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster
that performs the role of the primary management and operational center for the Tanzu
Kubernetes Grid instance. The management cluster is purpose-built for operating the
platform and managing the lifecycle of Tanzu Kubernetes clusters.

Tanzu Kubernetes Cluster - Tanzu Kubernetes clusters are the Kubernetes clusters in which
your application workloads run. These clusters are also referred to as workload clusters.
Tanzu Kubernetes clusters can run different versions of Kubernetes, depending on the
needs of the applications they run.

Shared Services Cluster - Each Tanzu Kubernetes Grid instance can have only one shared
services cluster. You deploy this cluster only if you intend to deploy shared services such as
Contour and Harbor.

ClusterClass API - Tanzu Kubernetes Grid 2 functions through the creation of a management
Kubernetes cluster that hosts the ClusterClass API. The ClusterClass API then interacts with the
infrastructure provider to service workload Kubernetes cluster lifecycle requests. The earlier
primitives of Tanzu Kubernetes clusters still exist for Tanzu Kubernetes Grid 1.x. ClusterClass, a
feature introduced as a part of Cluster API, reduces the need for redundant templating and
enables powerful customization of clusters. The overall process for creating a cluster using
ClusterClass is the same as before, but with slightly different parameters.

Tanzu Kubernetes Cluster Plans - A cluster plan is a blueprint that describes the
configuration with which to deploy a Tanzu Kubernetes cluster. It provides a set of
configurable values that describe settings like the number of control plane machines, worker
machines, VM types, and so on.

The current release of Tanzu Kubernetes Grid provides two default templates, dev and prod.

Tanzu Kubernetes Grid Instance - A Tanzu Kubernetes Grid instance is the full deployment
of Tanzu Kubernetes Grid, including the management cluster, the workload clusters, and the
shared services cluster that you configure.

Tanzu CLI - A command-line utility that provides the necessary commands to build and
operate Tanzu management and Tanzu Kubernetes clusters.

Carvel Tools - Carvel is an open-source suite of reliable, single-purpose, composable tools
that aid in building, configuring, and deploying applications to Kubernetes. Tanzu Kubernetes
Grid uses the following Carvel tools:

ytt - A command-line tool for templating and patching YAML files. You can also use
ytt to collect fragments and piles of YAML into modular chunks for reuse.

kapp - The application deployment CLI for Kubernetes. It allows you to install,
upgrade, and delete multiple Kubernetes resources as one application.

kbld - An image-building and resolution tool.

imgpkg - A tool that enables Kubernetes to store configurations and the associated
container images as OCI images, and to transfer these images.

yq - A lightweight and portable command-line YAML, JSON, and XML processor. yq
uses jq-like syntax but works with YAML files as well as with JSON and XML.

Bootstrap Machine - The bootstrap machine is the laptop, host, or server on which you
download and run the Tanzu CLI. This is where the initial bootstrapping of a management
cluster occurs before it is pushed to the platform where it will run.

Tanzu Kubernetes Grid Installer - The Tanzu Kubernetes Grid installer is a graphical wizard
that you launch by running the tanzu management-cluster create --ui command. The
installer wizard runs locally on the bootstrap machine and provides a user interface to guide
you through the process of deploying a management cluster.
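
For reference, the following sketch shows both ways to start a management cluster deployment from the bootstrap machine; the configuration file path shown is only an example.

    # Launch the installer wizard (opens a browser-based UI on the bootstrap machine):
    tanzu management-cluster create --ui

    # Alternatively, deploy non-interactively from a previously prepared cluster
    # configuration file (example path):
    tanzu management-cluster create --file ~/.config/tanzu/tkg/clusterconfigs/mgmt-cluster.yaml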

Tanzu Kubernetes Grid Storage


Many storage options are available and Kubernetes is agnostic about which option you choose.

For Kubernetes stateful workloads, Tanzu Kubernetes Grid installs the vSphere Container Storage
Interface (vSphere CSI) to provision Kubernetes persistent volumes for pods automatically. While the
default vSAN storage policy can be used, site reliability engineers (SREs) and administrators should
evaluate the needs of their applications and craft a specific vSphere Storage Policy. vSAN storage
policies describe classes of storage such as SSD and NVMe, as well as cluster quotas.

In vSphere 7u1+ environments with vSAN, the vSphere CSI driver for Kubernetes also supports
creating NFS File Volumes, which support ReadWriteMany access modes. This allows for
provisioning volumes which can be read and written from multiple pods simultaneously. To support
this, the vSAN File Service must be enabled.

You can also use other types of vSphere datastores. There are Tanzu Kubernetes Grid Cluster Plans
that operators can define to use a certain vSphere datastore when creating new workload clusters.
All developers would then have the ability to provision container-backed persistent volumes from
that underlying datastore.

Decision ID: TKO-STG-001
Design Decision: Use vSAN storage for TKO.
Design Justification: VMC on AWS comes with default vSAN storage.
Design Implications: NA

While the default vSAN storage policy can be used, administrators should evaluate the needs of their
applications and craft a specific vSphere Storage Policy.

Starting with vSphere 7.0 environments with vSAN, the vSphere CSI driver for Kubernetes also
supports the creation of NFS File Volumes, which support ReadWriteMany access modes. This allows
for provisioning volumes, which can be read and written from multiple pods simultaneously. To
support this, you must enable vSAN File Service.
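
As an illustration of how a custom vSphere Storage Policy surfaces in Kubernetes, the following is a minimal sketch of a StorageClass backed by the vSphere CSI driver and a ReadWriteMany claim that relies on vSAN File Service; the policy name and object names are assumptions for this example.

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: tkg-app-storage                  # example StorageClass name
    provisioner: csi.vsphere.vmware.com      # vSphere CSI driver
    parameters:
      storagePolicyName: "tkg-vsan-policy"   # assumed vSphere Storage Policy name
    ---
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: shared-data                      # example claim name
    spec:
      accessModes:
        - ReadWriteMany                      # requires vSAN File Service to be enabled
      storageClassName: tkg-app-storage
      resources:
        requests:
          storage: 5Gi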

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by Tanzu Kubernetes Grid supports two Container
Network Interface (CNI) options:

Antrea

Calico

Both are open-source software that provide networking for cluster pods, services, and ingress.

When you deploy a Tanzu Kubernetes cluster using Tanzu Mission Control or Tanzu CLI, Antrea CNI
is automatically enabled in the cluster. To provision a Tanzu Kubernetes cluster using a non-default
CNI, see the following instructions:

Deploy Tanzu Kubernetes clusters with calico

Implement Multiple Pod Network Interfaces with Multus
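
As a minimal sketch, the CNI for a workload cluster can be selected in the cluster configuration file before deployment; the cluster name and CIDR values below are examples only.

    # Excerpt from a workload cluster configuration file
    CLUSTER_NAME: sfo01w01workload01   # example cluster name
    CNI: calico                        # antrea is used when this value is omitted
    CLUSTER_CIDR: 100.96.0.0/11        # pod CIDR (example value)
    SERVICE_CIDR: 100.64.0.0/13        # service CIDR (example value)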

Each CNI is suitable for a different use case. The following table lists some common use cases for the
three CNIs that Tanzu Kubernetes Grid supports. This table helps you select the most appropriate
CNI for your Tanzu Kubernetes Grid implementation.

CNI: Antrea
Use Case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Antrea leverages Open vSwitch as the networking data plane; Open vSwitch supports both Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea.

CNI: Calico
Use Case: Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads.
Pros: Support for network policies, high network performance, SCTP support.
Cons: No multicast support.

CNI: Multus
Use Case: Multus CNI can provide multiple interfaces for each Kubernetes pod. Using Multus CRDs, you can specify which pods get which interfaces and allow different interfaces depending on the use case.
Pros: Separation of data and control planes. Separate security policies can be used for separate interfaces. Supports SR-IOV, DPDK, OVS-DPDK, and VPP workloads in Kubernetes with both cloud-native and NFV-based applications.

Tanzu Kubernetes Grid Infrastructure Networking


You can deploy Tanzu Kubernetes Grid on various networking stacks, including:

VMware NSX-T Data Center Networking.

vSphere Networking (VDS) with NSX Advanced Load Balancer.

Note

The scope of this document is limited to VMware NSX-T Data Center Networking
with NSX Advanced Load Balancer.

Tanzu Kubernetes Grid on NSX-T Networking with NSX Advanced Load Balancer
When deployed on VMware NSX-T Networking, Tanzu Kubernetes Grid uses the NSX-T logical
segments and gateways to provide connectivity to Kubernetes control plane VMs, worker nodes,
services, and applications. All hosts from the cluster where Tanzu Kubernetes clusters are deployed
are configured as NSX-T Transport nodes, which provide network connectivity to the Kubernetes
environment.

You can configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid as:

A load balancer for workloads in the clusters that are deployed on vSphere.

The L7 ingress service provider for the workloads in the clusters that are deployed on
vSphere.


The VIP endpoint provider for the control plane API server.

Each workload cluster integrates with NSX Advanced Load Balancer by running an Avi Kubernetes
Operator (AKO) on one of its nodes. The cluster’s AKO calls the Kubernetes API to manage the
lifecycle of load balancing and ingress resources for its workloads.
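
For example, a standard Kubernetes Service of type LoadBalancer is enough for AKO to request a VIP from NSX Advanced Load Balancer; the service name, selector, and ports below are placeholders.

    apiVersion: v1
    kind: Service
    metadata:
      name: web-frontend            # example application service
    spec:
      type: LoadBalancer            # AKO programs a virtual service on NSX ALB for this
      selector:
        app: web-frontend
      ports:
        - port: 80                  # VIP port exposed by the service engines
          targetPort: 8080          # container port (example)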

NSX Advanced Load Balancer Components


NSX Advanced Load Balancer is deployed in No-Orchestrator mode in the VMC on AWS environment
because the cloudadmin user does not have all of the permissions required to perform write
operations to the vCenter API. As a result, the NSX Advanced Load Balancer controller cannot
orchestrate the deployment of service engines.

NSX Advanced Load Balancer service engines must be deployed before load balancing services can
be requested by Kubernetes.

The following are the core components of NSX Advanced Load Balancer:

NSX Advanced Load Balancer Controller - NSX Advanced Load Balancer Controller
manages Virtual Service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the service engines (SEs). It is the central repository for the
configurations and policies related to services and management and provides the portal for
viewing the health of virtual services and SEs and the associated analytics provided by NSX
Advanced Load Balancer.

NSX Advanced Load Balancer Service Engine - The service engines (SEs) are lightweight
VMs that handle all data plane operations by receiving and executing instructions from the
controller. The SEs perform load balancing and all client- and server-facing network
interactions.

Avi Kubernetes Operator (AKO) - An Avi Kubernetes operator runs as a pod in the
management cluster and Tanzu Kubernetes clusters and provides ingress and load balancing
functionality. AKO translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses, routes, and services on the
service engines (SE) through the NSX Advanced Load Balancer controller.

AKO Operator (AKOO) - The AKO operator takes care of deploying, managing, and
removing AKO from Kubernetes clusters. When deployed, this operator creates an instance
of the AKO controller and installs all the relevant objects, including:

AKO StatefulSet

ClusterRole and ClusterRoleBinding

ConfigMap (required for the AKO controller and other artifacts)

Tanzu Kubernetes Grid management clusters have the AKO operator installed out of the box during
cluster deployment. By default, a Tanzu Kubernetes Grid management cluster has two
AkoDeploymentConfig objects that dictate when and how AKO pods are created in the workload
clusters. For more information, see the AKO Operator documentation.

Optionally, you can enter one or more cluster labels to identify clusters on which to selectively
enable NSX ALB or to customize NSX ALB settings for different groups of clusters. This is useful in
the following scenarios:

- You want to configure different sets of workload clusters to different Service Engine Groups to implement isolation or to support more Service type Load Balancers than one Service Engine Group's capacity.

- You want to configure different sets of workload clusters to different Clouds because they are deployed in different sites.

To enable NSX ALB selectively rather than globally, add labels in key: value format in the
management cluster config file. This creates a default AKO Deployment Config (ADC) on the
management cluster with the NSX ALB settings provided. Labels that you define here are used to
create a label selector. Only workload cluster objects that have the matching labels will have the load
balancer enabled.

To customize the NSX ALB settings for different groups of clusters, create an AKO Deployment
Config (ADC) on the management cluster by customizing the NSX ALB settings and providing a unique
label selector for the ADC. Only the workload cluster objects that have the matching labels will have
these custom settings applied.

You can label the cluster during the workload cluster deployment or label it manually after cluster
creation. If you define multiple key-value pairs, you need to apply all of them.

- Provide AVI_LABELS in the following format in the workload cluster deployment config file, and it will automatically label the cluster and select the matching ADC based on the label selector during the cluster deployment.

    AVI_LABELS: |
      'type': 'tkg-workloadset01'

- Optionally, you can manually label the cluster object of the corresponding workload cluster with the labels defined in the ADC.

    kubectl label cluster <cluster-name> type=tkg-workloadset01
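
For illustration, a custom AKODeploymentConfig might look like the following sketch; the controller address, cloud name, Service Engine Group, and network values are examples drawn from the sample subnets in this design, and the secret names assume the defaults created in the tkg-system-networking namespace.

    apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
    kind: AKODeploymentConfig
    metadata:
      name: tanzu-ako-workloadset01          # example ADC name
    spec:
      adminCredentialRef:
        name: avi-controller-credentials     # assumed default secret name
        namespace: tkg-system-networking
      certificateAuthorityRef:
        name: avi-controller-ca              # assumed default secret name
        namespace: tkg-system-networking
      controller: 192.168.11.10              # example NSX ALB controller cluster IP
      cloudName: sfo01w01vc01                # example cloud name
      serviceEngineGroup: sfo01w01segroup01  # example Service Engine Group
      clusterSelector:
        matchLabels:
          type: tkg-workloadset01            # matches the label applied to workload clusters
      dataNetwork:
        name: sfo01-w01-vds01-tkgworkloadvip # example VIP network from this design
        cidr: 192.168.16.0/26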

Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer service
engine settings. The cloud is configured with one or more VIP networks to provide IP addresses to
load balancing (L4/L7) virtual services created under that cloud.

The virtual services can span multiple service engines if the associated service engine
group is configured in Active/Active HA mode. A service engine can belong to only one service
engine group at a time.

IP address allocation for virtual services can be over DHCP or through the in-built IPAM functionality
of NSX Advanced Load Balancer. The VIP networks created or configured in NSX Advanced Load
Balancer are associated with the IPAM profile.

Tanzu Kubernetes Grid Clusters Recommendations


Decision ID: TKO-TKG-001
Design Decision: Register the management cluster with Tanzu Mission Control.
Design Justification: Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally.
Design Implications: Only Antrea CNI is supported on workload clusters created from the TMC portal.

Decision ID: TKO-TKG-002
Design Decision: Use NSX Advanced Load Balancer as your control plane endpoint provider and for application load balancing.
Design Justification: AVI is tightly coupled with TKG and vSphere. Since AVI is a VMware product, customers have a single point of contact for support.
Design Implications: Adds NSX Advanced Load Balancer license cost to the solution.

Decision ID: TKO-TKG-003
Design Decision: Deploy Tanzu Kubernetes Grid management clusters in the large form factor.
Design Justification: The large form factor should suffice to integrate the TKG management cluster with TMC, Pinniped, and Velero deployments, and must be capable of accommodating 100+ Tanzu workload clusters.
Design Implications: Consumes more resources from the infrastructure.

Decision ID: TKO-TKG-004
Design Decision: Deploy Tanzu Kubernetes clusters with the prod plan.
Design Justification: This deploys multiple control plane nodes and provides high availability for the control plane.
Design Implications: Consumes more resources from the infrastructure.

Decision ID: TKO-TKG-005
Design Decision: Enable identity management for TKG clusters.
Design Justification: Provides role-based access control to Tanzu Kubernetes Grid clusters.
Design Implications: Requires external identity management.

Decision ID: TKO-TKG-006
Design Decision: Enable Machine Health Checks for TKG clusters.
Design Justification: The MachineHealthCheck controller helps provide health monitoring and auto-repair for management and workload cluster machines.
Design Implications: NA
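
A management cluster configuration excerpt reflecting several of these recommendations is sketched below; the variable values are examples, and OIDC is shown only as one possible identity provider type.

    # Excerpt from a management cluster configuration file
    CLUSTER_PLAN: prod                       # multiple control plane nodes for HA (TKO-TKG-004)
    CONTROLPLANE_SIZE: large                 # large form factor (TKO-TKG-003)
    WORKER_SIZE: large
    ENABLE_MHC: "true"                       # Machine Health Checks (TKO-TKG-006)
    IDENTITY_MANAGEMENT_TYPE: oidc           # identity management; ldap is also supported (TKO-TKG-005)
    AVI_CONTROL_PLANE_HA_PROVIDER: "true"    # NSX ALB as the control plane endpoint provider (TKO-TKG-002)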

Network Architecture
For deployment of Tanzu Kubernetes Grid in VMware Cloud on AWS SDDCs, separate segments are
built for the Tanzu Kubernetes Grid management cluster, Tanzu Kubernetes Grid shared services
cluster, Tanzu Kubernetes Grid workload clusters, NSX Advanced Load Balancer management,
Cluster-VIP segment for control plane HA, Tanzu Kubernetes Grid Management VIP/Data segment,
and Tanzu Kubernetes Grid workload Data/VIP segment.

The network reference design can be mapped into this general framework.


This topology provides the following benefits:

Isolates and separates SDDC management components (vCenter, ESX) from the Tanzu
Kubernetes Grid components. This reference design allows only minimum connectivity
between the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer and the
vCenter Server.

Isolates and separates the NSX Advanced Load Balancer management network segment
from the Tanzu Kubernetes Grid management segment and the Tanzu Kubernetes Grid
workload segments.

Depending on the workload cluster type and use case, multiple workload clusters can
leverage the same logical segments or new segments can be used for each workload cluster.
To isolate and separate Tanzu Kubernetes Grid workload cluster networking from each other,
VMware recommends that you use separate logical segments for each workload cluster and
configure the required firewall between these networks. See Firewall Recommendations for
more details.

Separates provider and tenant access to the Tanzu Kubernetes Grid environment.
Only provider administrators need access to the Tanzu Kubernetes Grid management
cluster. Allowing only administrators to access the Tanzu Kubernetes Grid
management cluster prevents tenants from attempting to connect to the Tanzu
Kubernetes Grid management cluster.

Network Requirements
As per the defined architecture, the list of required networks includes:

Network Type: NSX ALB management network
DHCP Service: Optional
Description & Recommendations: NSX ALB controllers and SEs are attached to this network. DHCP is not a mandatory requirement on this network because IP addresses for the NSX ALB controllers can be static. IP addresses can be assigned statically to SE interfaces or by making use of NSX ALB IPAM profiles.

Network Type: TKG management network
DHCP Service: Yes
Description & Recommendations: Control plane and worker nodes of the TKG management cluster are attached to this network.

Network Type: TKG shared services network
DHCP Service: Yes
Description & Recommendations: Control plane and worker nodes of the TKG shared services cluster are attached to this network.

Network Type: TKG workload network
DHCP Service: Yes
Description & Recommendations: Control plane and worker nodes of TKG workload clusters are attached to this network.

Network Type: TKG cluster VIP/data network
DHCP Service: Optional
Description & Recommendations: Virtual services for control plane HA of all TKG clusters (management, shared services, and workload).

Network Type: TKG management VIP/data network
DHCP Service: Optional
Description & Recommendations: Virtual services for all user-managed packages (such as Contour and Harbor) hosted on the shared services cluster.

Network Type: TKG workload VIP/data network
DHCP Service: Optional
Description & Recommendations: Virtual services for all applications hosted on the workload clusters.


Network Recommendations
The key network recommendations for a production-grade Tanzu Kubernetes Grid deployment with
NSX-T Data Center Networking are as follows:

Decision ID: TKO-NET-001
Design Decision: Use separate networks for TKG management and workload clusters.
Design Justification: To have flexible firewall and security policies.
Design Implications: Sharing the same network for multiple clusters can complicate the creation of firewall rules.

Decision ID: TKO-NET-002
Design Decision: Use separate networks for workload clusters based on their usage.
Design Justification: Isolate production Kubernetes clusters from dev/test clusters.
Design Implications: A separate set of service engines can be used to separate dev/test workload clusters from prod clusters.

Decision ID: TKO-NET-003
Design Decision: Configure DHCP for TKG clusters.
Design Justification: Tanzu Kubernetes Grid does not support static IP assignments for Kubernetes VM components.
Design Implications: Enable DHCP on the logical segments that are used to host TKG clusters.

Subnet and CIDR Examples


The deployment described in this document makes use of the following CIDRs.

Network Type: NSX ALB management network
Port Group Name: sfo01-w01-vds01-albmanagement
Gateway CIDR: 192.168.11.1/27
DHCP Pool: 192.168.11.15 - 192.168.11.30
NSX ALB IP Pool: NA

Network Type: TKG management network
Port Group Name: sfo01-w01-vds01-tkgmanagement
Gateway CIDR: 192.168.12.1/24
DHCP Pool: 192.168.12.2 - 192.168.12.251
NSX ALB IP Pool: NA

Network Type: TKG workload network
Port Group Name: sfo01-w01-vds01-tkgworkload
Gateway CIDR: 192.168.13.1/24
DHCP Pool: 192.168.13.2 - 192.168.13.251
NSX ALB IP Pool: NA

Network Type: TKG cluster VIP network
Port Group Name: sfo01-w01-vds01-tkgclustervip
Gateway CIDR: 192.168.14.1/26
DHCP Pool: 192.168.14.2 - 192.168.14.30
NSX ALB IP Pool: 192.168.14.31 - 192.168.14.60

Network Type: TKG management VIP network
Port Group Name: sfo01-w01-vds01-tkgmanagementvip
Gateway CIDR: 192.168.15.1/26
DHCP Pool: 192.168.15.2 - 192.168.15.30
NSX ALB IP Pool: 192.168.15.31 - 192.168.15.60

Network Type: TKG workload VIP network
Port Group Name: sfo01-w01-vds01-tkgworkloadvip
Gateway CIDR: 192.168.16.1/26
DHCP Pool: 192.168.16.2 - 192.168.16.30
NSX ALB IP Pool: 192.168.16.31 - 192.168.16.60

Network Type: TKG shared services network
Port Group Name: sfo01-w01-vds01-tkgshared
Gateway CIDR: 192.168.17.1/24
DHCP Pool: 192.168.17.2 - 192.168.17.251
NSX ALB IP Pool: NA

Firewall Requirements
To prepare the firewall, you must collect the following information:

1. NSX Advanced Load Balancer controller nodes and Cluster IP address

2. NSX Advanced Load Balancer management network CIDR

3. Tanzu Kubernetes Grid management network CIDR

4. Tanzu Kubernetes Grid shared services network CIDR


5. Tanzu Kubernetes Grid workload network CIDR

6. Tanzu Kubernetes Grid cluster VIP address range

7. Tanzu Kubernetes Grid Management VIP address range

8. Tanzu Kubernetes Grid Workload VIP address range

9. Client machine IP address

10. Bootstrap machine IP address

11. Harbor registry IP address

12. vCenter Server IP address

13. DNS server IP address(es)

14. NTP server(s)

VMware Cloud on AWS uses a management gateway and compute gateway. These gateways need
firewall rules to allow traffic for Tanzu Kubernetes Grid deployments.

The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN.

Source: Client machine
Destination: NSX ALB controller nodes and cluster IP address
Protocol/Port: TCP:443
Description: To access the NSX ALB portal for configuration.

Source: Client machine
Destination: vCenter Server
Protocol/Port: TCP:443
Description: To create resource pools, VM folders, etc., in vCenter.

Source: Bootstrap machine
Destination: projects.registry.vmware.com
Protocol/Port: TCP:443
Description: To pull binaries from the VMware public repo for TKG installation.

Source: Bootstrap machine
Destination: TKG VIP network (cluster endpoint)
Protocol/Port: TCP:6443
Description: Allows the bootstrap machine to communicate with the cluster.

Source: TKG management network CIDR, TKG shared services network CIDR, TKG workload network CIDR
Destination: DNS server
Protocol/Port: UDP:53
Description: DNS service.

Source: TKG management network CIDR, TKG shared services network CIDR, TKG workload network CIDR
Destination: NTP server
Protocol/Port: UDP:123
Description: Time synchronization.

Source: TKG management network CIDR, TKG shared services network CIDR, TKG workload network CIDR
Destination: vCenter IP
Protocol/Port: TCP:443
Description: Allows components to access vCenter to create VMs and storage volumes.

Source: TKG management network CIDR, TKG shared services network CIDR, TKG workload network CIDR
Destination: TKG cluster VIP range
Protocol/Port: TCP:6443
Description: For the management cluster to configure shared services and workload clusters.

Source: TKG management network, TKG shared services network, TKG workload networks
Destination: Internet
Protocol/Port: TCP:443
Description: For interaction with Tanzu Mission Control, Tanzu Observability, and Tanzu Service Mesh.

Source: TKG management network, TKG shared services network, TKG workload networks
Destination: NSX ALB controllers and cluster IP address
Protocol/Port: TCP:443
Description: Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller.

Source: NSX ALB controllers
Destination: vCenter and ESXi hosts
Protocol/Port: TCP:443
Description: Allow NSX ALB to discover vCenter objects and deploy SEs as required.

Source: NSX ALB management network CIDR
Destination: DNS server
Protocol/Port: UDP:53
Description: DNS service.

Source: NSX ALB management network CIDR
Destination: NTP server
Protocol/Port: UDP:123
Description: Time synchronization.

Optional Firewall Rules

Source | Destination | Protocol:Port | Description

TKG management network CIDR, TKG shared services network CIDR, TKG workload network CIDR | Harbor registry (optional) | TCP:443 | Allows components to retrieve container images from a local image registry.

Client machine | console.cloud.vmware.com | TCP:443 | To access the Cloud Services portal to configure networks in the VMC SDDC.

Client machine | *.tmc.cloud.vmware.com | TCP:443 | To access the TMC portal for TKG cluster registration and other SaaS integration.

Client machine | projects.registry.vmware.com | TCP:443 | To pull binaries from the VMware public repo for TKG installation.

Client machine | TKG management VIP range, TKG workload VIP range | TCP:80, TCP:443 | To access HTTP/HTTPS workloads in the shared services and workload clusters.

NSX Advanced Load Balancer Recommendations


The following recommendations apply to configuring NSX Advanced Load Balancer in a Tanzu Kubernetes Grid environment on VMware Cloud on AWS.

Decision ID: TKO-ALB-001
Design decision: Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB.
Design justification: Isolates NSX ALB traffic from infrastructure management traffic and Kubernetes workloads.
Design implications: An additional network (VLAN) is required.

Decision ID: TKO-ALB-002
Design decision: Deploy 3 NSX ALB controller nodes.
Design justification: Achieves high availability for the NSX ALB platform. In clustered mode, NSX ALB availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible. Provides the highest level of uptime for a site.
Design implications: Additional resource requirements.

Decision ID: TKO-ALB-003
Design decision: Under Compute Policies, create a 'VM-VM anti-affinity' rule that prevents collocation of the NSX ALB controller VMs on the same host.
Design justification: vSphere places the NSX Advanced Load Balancer controller VMs in a way that always ensures maximum HA.
Design implications: Affinity rules need to be configured manually.

Decision ID: TKO-ALB-004
Design decision: Use 'No Orchestrator' as the cloud/orchestrator type (do not add vCenter as an orchestrator).
Design justification: VMware Cloud on AWS provides only limited access to NSX-T, and the CloudAdmin user has limited privileges, so the controller cannot orchestrate vCenter directly.
Design implications: Service engines need to be deployed manually. VIP networks and server networks need to be assigned manually. The service engine group needs to be assigned manually.

Decision ID: TKO-ALB-005
Design decision: Use static IP addresses for the NSX ALB controllers.
Design justification: The NSX ALB controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes to management IP addresses are disruptive.
Design implications: None.

Decision ID: TKO-ALB-006
Design decision: Reserve an IP address in the NSX ALB management subnet to be used as the cluster IP address for the controller cluster.
Design justification: The NSX ALB portal remains accessible over the cluster IP address regardless of an individual controller node failure.
Design implications: An additional IP address is required.

Decision ID: TKO-ALB-007
Design decision: Create a dedicated resource pool with appropriate reservations for NSX ALB controllers.
Design justification: Guarantees the CPU and memory allocation for the NSX ALB controllers and avoids performance degradation in case of resource contention.
Design implications: None.

Decision ID: TKO-ALB-008
Design decision: Replace the default NSX ALB certificates with custom CA or public CA-signed certificates that contain SAN entries for all controller nodes.
Design justification: Establishes a trusted connection with other infrastructure components. The default certificate does not include SAN entries, which is not acceptable to Tanzu.
Design implications: None. SAN entries are not applicable if a wildcard certificate is used.

Decision ID: TKO-ALB-009
Design decision: Configure NSX ALB backup with a remote server as the backup location.
Design justification: Periodic backup of the NSX ALB configuration database is recommended. The database defines all clouds, virtual services, users, and other objects. As a best practice, store backups in an external location to provide recovery capability in case of an entire cluster failure.
Design implications: Additional operational overhead. Additional infrastructure resources.

Decision ID: TKO-ALB-010
Design decision: Configure remote logging for the NSX ALB controller to send events to a syslog server.
Design justification: For operations teams to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB controller.
Design implications: Additional operational overhead. Additional infrastructure resources.

Decision ID: TKO-ALB-011
Design decision: Use LDAP/SAML-based authentication for NSX ALB.
Design justification: Helps to maintain role-based access control.
Design implications: Additional configuration is required.

NSX Advanced Load Balancer Service Engine Recommendations


Decision ID: TKO-ALB-SE-001
Design decision: Set NSX ALB service engine high availability to Active/Active.
Design justification: Provides higher resiliency, optimum performance, and better utilization compared to N+M and/or Active/Standby.
Design implications: Requires Enterprise licensing. Certain applications might not work in Active/Active mode. For instance, applications that require preserving the client IP must use the legacy Active/Standby HA mode.

Decision ID: TKO-ALB-SE-002
Design decision: Use a dedicated service engine group for the TKG management cluster.
Design justification: SE resources are guaranteed for the TKG management stack, and data path segregation is provided between management and tenant applications.
Design implications: Dedicated service engine groups increase licensing cost.

Decision ID: TKO-ALB-SE-003
Design decision: Use dedicated service engine groups for the TKG workload clusters, depending on the nature and type of workloads (dev/prod/test).
Design justification: SE resources are guaranteed for a single workload cluster or a set of workload clusters, and data path segregation is provided for tenant applications hosted on workload clusters.
Design implications: Dedicated service engine groups increase licensing cost.

Decision ID: TKO-ALB-SE-004
Design decision: Enable ALB service engine self-election.
Design justification: Enables SEs to elect a primary amongst themselves in the absence of connectivity to the NSX ALB controller.
Design implications: None.

Decision ID: TKO-ALB-SE-005
Design decision: Set the 'Placement across the Service Engines' setting to 'Compact'.
Design justification: Allows maximum utilization of capacity.
Design implications: None.

Decision ID: TKO-ALB-SE-006
Design decision: Under Compute Policies, create a 'VM-VM anti-affinity' rule for SEs that are part of the same SE group to prevent collocation of the service engine VMs on the same host.
Design justification: vSphere places the service engine VMs in a way that always ensures maximum HA for the service engines that are part of a service engine group.
Design implications: Affinity rules need to be configured manually.

Decision ID: TKO-ALB-SE-007
Design decision: Reserve memory and CPU for service engines.
Design justification: The service engines are a critical infrastructure component providing load-balancing services to mission-critical applications. Reservations guarantee the CPU and memory allocation for SE VMs and avoid performance degradation in case of resource contention.
Design implications: You must perform additional configuration to set up the reservations.

Kubernetes Ingress Routing


The default installation of Tanzu Kubernetes Grid does not install an ingress controller. Users can
install Contour (available for installation through Tanzu Packages) or any third-party ingress controller
of their choice.

Contour is an open-source controller for Kubernetes ingress routing. Contour can be installed on the shared services cluster or on any Tanzu Kubernetes cluster. Deploying Contour is a prerequisite if you want to deploy the Prometheus, Grafana, and Harbor packages on a workload cluster.

For more information about Contour, see Contour site and Implementing Ingress Control with
Contour.

Another option for ingress control is to use the NSX Advanced Load Balancer Kubernetes ingress
controller which offers an advanced L7 ingress for containerized applications that are deployed in the
Tanzu Kubernetes workload cluster.


For more information about the NSX Advanced Load Balancer ingress controller, see Configuring L7
Ingress with NSX Advanced Load Balancer.

Tanzu Service Mesh, a SaaS offering for modern applications running across multi-cluster, multi-
clouds, also offers an ingress controller based on Istio.

Each ingress controller has its own pros and cons. The following table provides general
recommendations for choosing an ingress controller for your Kubernetes environment.

Contour: Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for north-south traffic by defining the policies in the application's manifest file. It is a reliable solution for simple Kubernetes workloads.

Istio: Use the Istio ingress controller when you intend to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic).

NSX ALB ingress controller: Use the NSX ALB ingress controller when a containerized application requires features like local and global server load balancing (GSLB), web application firewall (WAF), performance monitoring, and so on.

NSX Advanced Load Balancer as an L4+L7 Ingress Service Provider


As a load balancer, NSX Advanced Load Balancer provides an L4+L7 load balancing solution for
vSphere. It includes a Kubernetes operator that integrates with the Kubernetes API to manage the
lifecycle of load balancing and ingress resources for workloads.

Legacy ingress services for Kubernetes include multiple disparate solutions. The services and products contain independent components that are difficult to manage and troubleshoot. The ingress services have reduced observability capabilities with little analytics, and they lack comprehensive visibility into the applications that run on the system. Cloud-native automation is difficult in the legacy ingress services.

In comparison to the legacy Kubernetes ingress services, NSX Advanced Load Balancer has
comprehensive load balancing and ingress services features. As a single solution with a central
control, NSX Advanced Load Balancer is easy to manage and troubleshoot. NSX Advanced Load
Balancer supports real-time telemetry with an insight into the applications that run on the system.
The elastic auto-scaling and the decision automation features highlight the cloud-native automation
capabilities of NSX Advanced Load Balancer.

NSX Advanced Load Balancer also lets you configure L7 ingress for your workload clusters by using
one of the following options:

L7 ingress in ClusterIP mode

L7 ingress in NodePortLocal mode

L7 ingress in NodePort mode

NSX Advanced Load Balancer L4 ingress with Contour L7 ingress

L7 Ingress in ClusterIP Mode

This option enables NSX Advanced Load Balancer L7 ingress capabilities, including sending traffic
directly from the service engines (SEs) to the pods, preventing multiple hops that other ingress
solutions need when sending packets from the load balancer to the right node where the pod runs.
The NSX Advanced Load Balancer controller creates a virtual service with a backend pool with the
pod IP addresses which helps to send the traffic directly to the pods.

However, each workload cluster needs a dedicated SE group for Avi Kubernetes Operator (AKO) to
work, which could increase the number of SEs you need for your environment. This mode is used
when you have a small number of workload clusters.

L7 Ingress in NodePort Mode

The NodePort mode is the default mode when AKO is installed on Tanzu Kubernetes Grid. This
option allows your workload clusters to share SE groups and it is fully supported by VMware. With
this option, the services of your workloads must be set to NodePort instead of ClusterIP even when
accompanied by an ingress object. This ensures that NodePorts are created on the worker nodes
and traffic can flow through the SEs to the pods via the NodePorts. Kube-Proxy, which runs on each
node as DaemonSet, creates network rules to expose the application endpoints to each of the nodes
in the format “NodeIP:NodePort”. The NodePort value is the same for a service on all the nodes. It
exposes the port on all the nodes of the Kubernetes Cluster, even if the pods are not running on it.

L7 Ingress in NodePortLocal Mode

This feature is supported only with the Antrea CNI. The primary difference between this mode and the NodePort mode is that the traffic is sent directly to the pods in your workload cluster through node ports without going through kube-proxy. With this option, the workload clusters can share SE groups. Similar to the ClusterIP mode, this option avoids the potential extra hop when sending traffic from the NSX Advanced Load Balancer SEs to the pod by targeting the right nodes where the pods run.

The Antrea agent configures NodePortLocal port mapping rules on the node in the format "NodeIP:Unique Port" to expose each pod on the node on which the pod of the service is running. The default port range is 61000-62000. Even if the pods of the service are running on the same Kubernetes node, the Antrea agent publishes unique ports to expose the pods at the node level to integrate with the load balancer.

NSX Advanced Load Balancer L4 Ingress with Contour L7 Ingress

This option does not provide the NSX Advanced Load Balancer L7 ingress capabilities; it uses NSX Advanced Load Balancer for L4 load balancing only and leverages Contour for L7 ingress. This option also allows sharing SE groups across workload clusters. It is supported by VMware and requires minimal setup.

NSX Advanced Load Balancer L7 Ingress Recommendations


Decision ID: TKO-ALB-L7-001
Design decision: Deploy NSX ALB L7 ingress in NodePortLocal mode.
Design justification: Provides good network hop efficiency, helps to reduce east-west traffic and encapsulation overhead, and allows service engine groups to be shared across clusters while load-balancing persistence is also supported.
Design implications: Supported only with Antrea CNI with IPv4 addressing.
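The NodePortLocal recommendation above is typically expressed through an AKODeploymentConfig object, which the AKO Operator in the management cluster uses to configure AKO on matching workload clusters. The following fragment is a minimal sketch only: the object name, cluster label, cloud name, and network values are placeholders based on the examples in this design, and the exact schema depends on your Tanzu Kubernetes Grid and AKO versions.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: install-ako-l7-nodeportlocal      # placeholder name
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  controller: sfo01albctlr01.sfo01.rainpole.local
  cloudName: <no-orchestrator-cloud-name>
  serviceEngineGroup: sfo01w01segroup01
  clusterSelector:
    matchLabels:
      nsx-alb-l7-ingress: "true"          # label the workload clusters that should use AKO L7 ingress
  dataNetwork:
    name: sfo01-w01-vds01-tkgworkloadvip
    cidr: 192.168.16.0/26
  extraConfigs:
    cniPlugin: antrea                     # NodePortLocal requires the Antrea CNI
    disableStaticRouteSync: true
    ingress:
      disableIngressClass: false
      serviceType: NodePortLocal          # L7 ingress mode recommended in TKO-ALB-L7-001
      shardVSSize: MEDIUM

Workload clusters that carry the matching label receive AKO with NodePortLocal ingress; clusters without the label keep the default AKO configuration.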

Container Registry
VMware Tanzu for Kubernetes Operations using Tanzu Kubernetes Grid includes Harbor as a
container registry. Harbor provides a location for pushing, pulling, storing, and scanning container
images used in your Kubernetes clusters.

Harbor registry is used for day-2 operations of the Tanzu Kubernetes workload clusters. Typical day-
2 operations include tasks such as pulling images from Harbor for application deployment, pushing
custom images to Harbor, etc.
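For example, a typical day-2 push of a custom application image to Harbor looks like the following sketch; the registry FQDN and project name are placeholders for your environment.

# Log in to the Harbor registry (placeholder FQDN).
docker login harbor.example.com

# Tag a locally built image with the Harbor project path and push it.
docker tag myapp:1.0 harbor.example.com/demo-project/myapp:1.0
docker push harbor.example.com/demo-project/myapp:1.0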

You may use one of the following methods to install Harbor:

Tanzu Kubernetes Grid package deployment - VMware recommends this installation method for general use cases. The Tanzu packages, including Harbor, must either be pulled directly from VMware or be hosted in an internal registry.

VM-based deployment using OVA - VMware recommends this installation method in cases where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid system images. VM-based deployments are only supported by VMware Global Support Services to host the system images for air-gapped or Internet-restricted deployments. Do not use this method for hosting application images.

If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To do so, follow the procedure in Trust Custom CA
Certificates on Cluster Nodes.


Tanzu Kubernetes Grid Monitoring


Monitoring for the Tanzu Kubernetes clusters is provided through Prometheus and Grafana. Both
Prometheus and Grafana can be installed on Tanzu Kubernetes Grid clusters using Tanzu Packages.

Prometheus is an open-source system monitoring and alerting toolkit. It can collect metrics from
target clusters at specified intervals, evaluate rule expressions, display the results, and trigger alerts if
certain conditions arise. The Tanzu Kubernetes Grid implementation of Prometheus includes Alert
Manager, which you can configure to notify you when certain events occur.

Grafana is an open-source visualization and analytics software. It allows you to query, visualize, alert
on, and explore your metrics no matter where they are stored. Both Prometheus and Grafana are
installed through user-managed Tanzu packages by creating the deployment manifests and invoking
the tanzu package install command to deploy the packages in the Tanzu Kubernetes clusters.
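As an illustration, the following sequence sketches a user-managed package installation of Prometheus and Grafana. The versions, values files, and namespace are placeholders, and the flag names can differ between Tanzu CLI releases (older package plugins use --package-name, newer kctrl-based plugins use --package); check tanzu package install --help for your CLI version.

# List the package versions available in the target cluster.
tanzu package available list prometheus.tanzu.vmware.com
tanzu package available list grafana.tanzu.vmware.com

# Install Prometheus and then Grafana with customized data values files.
tanzu package install prometheus \
  --package-name prometheus.tanzu.vmware.com \
  --version <available-prometheus-version> \
  --values-file prometheus-data-values.yaml \
  --namespace tkg-packages

tanzu package install grafana \
  --package-name grafana.tanzu.vmware.com \
  --version <available-grafana-version> \
  --values-file grafana-data-values.yaml \
  --namespace tkg-packages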

The following diagram shows how the monitoring components on a cluster interact.

You can use out-of-the-box Kubernetes dashboards or you can create new dashboards to monitor compute, network, and storage utilization of Kubernetes objects such as clusters, namespaces, pods, and so on.

You can also monitor your Tanzu Kubernetes Grid clusters using Tanzu Observability which is a SaaS
offering by VMware. Tanzu Observability provides various out-of-the-box dashboards. You can
customize the dashboards for your particular deployment. For information on how to customize
Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu
Observability Dashboard for Tanzu for Kubernetes Operations.

Tanzu Kubernetes Grid Logging


Metrics and logs are critical for any system or application as they provide insights into the activities of
the system or the application. It is important to have a central place to observe a multitude of metrics
and log sources from multiple endpoints.

Log processing and forwarding in Tanzu Kubernetes Grid is provided through Fluent Bit. Fluent Bit binaries are available as part of the Tanzu packages and can be installed on the management cluster or on workload clusters. Fluent Bit is a lightweight log processor and forwarder that allows you to collect data and logs from different sources, unify them, and send them to multiple destinations. VMware Tanzu Kubernetes Grid includes signed binaries for Fluent Bit that you can deploy on management clusters and on Tanzu Kubernetes clusters to provide a log-forwarding service.

Fluent Bit uses input plug-ins, filters, and output plug-ins. The input plug-ins define the sources from which it collects data, and the output plug-ins define the destinations to which it sends the information. The Kubernetes filter enriches the logs with Kubernetes metadata, specifically labels and annotations. Fluent Bit is installed as a user-managed package, and you configure the input and output plug-ins on the Tanzu Kubernetes Grid cluster.

Fluent Bit integrates with logging platforms such as VMware Aria Operations for Logs, Elasticsearch,
Kafka, Splunk, or an HTTP endpoint. For more details about configuring Fluent Bit to your logging
provider, see Implement Log Forwarding with Fluent Bit.
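As an illustration, the following classic Fluent Bit output stanza forwards logs to a syslog collector such as VMware Aria Operations for Logs. The host, port, and transport mode are placeholders; in a Tanzu package installation, a stanza like this is supplied through the outputs section of the Fluent Bit data values file, and the record key that carries the log line depends on your input and filter configuration.

[OUTPUT]
    Name                syslog
    Match               *
    Host                <syslog-collector-fqdn-or-ip>
    Port                514
    Mode                udp
    Syslog_Format       rfc5424
    Syslog_Message_Key  log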

Bring Your Own Images for Tanzu Kubernetes Grid Deployment
You can build custom machine images for Tanzu Kubernetes Grid to use as a VM template for the
management and Tanzu Kubernetes (workload) cluster nodes that it creates. Each custom machine
image packages a base operating system (OS) version and a Kubernetes version, along with any
additional customizations, into an image that runs on vSphere, Microsoft Azure infrastructure, and
AWS (EC2) environments.

A custom image must be based on the operating system (OS) versions that are supported by Tanzu
Kubernetes Grid. The table below provides a list of the operating systems that are supported for
building custom images for Tanzu Kubernetes Grid.

vSphere: Ubuntu 20.04, Ubuntu 18.04, RHEL 7, Photon OS 3, Windows 2019

AWS: Ubuntu 20.04, Ubuntu 18.04, Amazon Linux 2

Azure: Ubuntu 20.04, Ubuntu 18.04

For additional information on building custom images for Tanzu Kubernetes Grid, see Build Machine
Images.

Linux Custom Machine Images

Windows Custom Machine Images

Compliance and Security


Tanzu Kubernetes releases (TKrs) published by VMware, along with compatible versions of Kubernetes and supporting components, use the latest stable and generally-available update of the OS version that they package, containing all current CVE and USN fixes as of the day that the image is built. The image files are signed by VMware and have file names that contain a unique hash identifier.

VMware provides FIPS-capable Kubernetes OVA that can be used to deploy FIPS compliant Tanzu
Kubernetes Grid management and workload clusters. Tanzu Kubernetes Grid core components,
such as Kubelet, Kube-apiserver, Kube-controller manager, Kube-proxy, Kube-scheduler, Kubectl,
Etcd, Coredns, Containerd, and Cri-tool are made FIPS compliant by compiling them with the
BoringCrypto FIPS modules, an open-source cryptographic library that provides FIPS 140-2
approved algorithms.

Installation Experience
Tanzu Kubernetes Grid management cluster is the first component that you deploy to get started
with Tanzu Kubernetes Grid.

You can deploy the management cluster in one of the following ways:

1. Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. VMware recommends this method if you are
installing a Tanzu Kubernetes Grid Management cluster for the first time.

2. Create and edit YAML configuration files to use with CLI commands to deploy the
management cluster.

By using the current version of the Tanzu Kubernetes Grid installation user interface, you can install Tanzu Kubernetes Grid on VMware vSphere, AWS, and Microsoft Azure. The UI provides a guided experience tailored to the IaaS, in this case VMware vSphere backed by NSX-T Data Center networking.


The installation of Tanzu Kubernetes Grid on VMware Cloud on AWS is done through the same UI as mentioned above, tailored to a vSphere environment.

This installation process takes you through setting up a TKG management cluster on your vSphere environment. Once the management cluster is deployed, you can register it with Tanzu Mission Control and deploy Tanzu Kubernetes shared services and workload clusters directly from the Tanzu Mission Control UI or with the Tanzu CLI.
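For reference, both deployment paths are driven from the bootstrap machine with the Tanzu CLI; the configuration file path below is a placeholder, and the available options depend on your Tanzu CLI version.

# Option 1: launch the installer UI from the bootstrap machine.
tanzu management-cluster create --ui

# Option 2: deploy from a previously prepared cluster configuration file.
tanzu management-cluster create --file /path/to/mgmt-cluster-config.yaml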

Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations in VMware Cloud on AWS.

Summary
Tanzu on VMware Cloud on AWS offers high-performance potential, convenience, and addresses the challenges of creating, testing, and updating Kubernetes platforms in a consolidated production environment. This validated approach results in a production quality installation with all the application services needed to serve combined or uniquely separated workload types through a combined infrastructure solution.

This plan meets many day-0 needs for aligning product capabilities, such as configuring firewall
rules, networking, load balancing, and workload compute, to the full stack infrastructure.

Supplemental Information
Automating Deployment of Service Engines
As discussed, Avi Vantage is installed in No Orchestrator mode on VMware Cloud on AWS.
Therefore, the deployment of service engines (SE) on VMware Cloud on AWS is not orchestrated by
the Avi Controller. Once SE is integrated with the Avi Controller, virtual service placement and
scaling can be handled centrally from the Avi Controller. A pair of service engines provide HA for
load balancing.

It is troublesome to manually deploy a pair of service engines for each tenant using the Import OVA
workflow in VMware Cloud on AWS. Therefore, we recommend using GOVC in conjunction with
Python to obtain the OVF properties as a JSON file and then customizing the JSON file for each
service engine.

The following example JSON file can be used to automate the provisioning of service engines ready
for use with Tanzu Kubernetes Grid.

{
  "DiskProvisioning": "flat",
  "IPAllocationPolicy": "fixedPolicy",
  "IPProtocol": "IPv4",
  "PropertyMapping": [
    {
      "Key": "AVICNTRL",
      "Value": "<ip-address-of-avi-controller>"
    },
    {
      "Key": "AVISETYPE",
      "Value": "NETWORK_ADMIN"
    },
    {
      "Key": "AVICNTRL_AUTHTOKEN",
      "Value": "<avi-controller-auth-token>"
    },
    {
      "Key": "AVICNTRL_CLUSTERUUID",
      "Value": "<avi-controller-cluster-id>"
    },
    {
      "Key": "avi.mgmt-ip.SE",
      "Value": "<management-ip-address-of-service-engine>"
    },
    {
      "Key": "avi.mgmt-mask.SE",
      "Value": "255.255.255.0"
    },
    {
      "Key": "avi.default-gw.SE",
      "Value": "<avi-management-network-gateway>"
    },
    {
      "Key": "avi.DNS.SE",
      "Value": "<dns-server>"
    },
    {
      "Key": "avi.sysadmin-public-key.SE",
      "Value": ""
    }
  ],
  "NetworkMapping": [
    {
      "Name": "Management",
      "Network": "avi-management"
    },
    {
      "Name": "Data Network 1",
      "Network": "<tkg-workload-1-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 2",
      "Network": "<tkg-workload-2-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 3",
      "Network": "<tkg-workload-3-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 4",
      "Network": "<tkg-workload-4-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 5",
      "Network": "<tkg-workload-5-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 6",
      "Network": "<tkg-workload-6-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 7",
      "Network": "<tkg-workload-7-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 8",
      "Network": "<tkg-workload-8-cluster-network-segment-name>"
    },
    {
      "Name": "Data Network 9",
      "Network": "<tkg-workload-9-cluster-network-segment-name>"
    }
  ],
  "MarkAsTemplate": false,
  "PowerOn": true,
  "InjectOvfEnv": false,
  "WaitForIP": false,
  "Name": "se-1"
}


Provision each service engine using the following code.

export GOVC_URL=<fqdn-of-vcenter-in-vmware-cloud-on-aws>
export GOVC_USERNAME=<vcenter-username>
export GOVC_PASSWORD=<vcenter-password>
export GOVC_INSECURE=false
govc import.spec /home/admin/se.ova | python -m json.tool > se-1.json
govc import.ova -pool=*/Resources/Compute-ResourcePool/TKG/SEs -ds=WorkloadDatastore --options=/home/admin/se-1.json /home/admin/se.ova

This deploys a new service engine with a VM name of se-1 into the resource pool Compute-ResourcePool/TKG/SEs. Since the PowerOn parameter is set to true, the service engine boots up automatically, and because the key-value pairs for the following properties are set, the service engine is automatically registered with the Avi Controller and is ready for further configuration in Avi Vantage:

"Key": "AVICNTRL",
"Value": "<ip-address-of-avi-controller>"
"Key": "AVICNTRL_CLUSTERUUID",
"Value": "<avi-controller-cluster-id>"
"Key": "avi.mgmt-ip.SE",
"Value": "<management-ip-address-of-service-engine>"

Appendix A - Configure Node Sizes


The Tanzu CLI creates the individual nodes of management clusters and Tanzu Kubernetes clusters
according to the settings that you provide in the configuration file.

On vSphere, you can configure all node VMs to have the same predefined configurations, set
different predefined configurations for control plane and worker nodes, or customize the
configurations of the nodes. By using these settings, you can create clusters that have nodes with
different configurations to the management cluster nodes. You can also create clusters in which the
control plane nodes and worker nodes have different configurations.

Use Predefined Node Configurations


The Tanzu CLI provides the following predefined configurations for cluster nodes:

Size CPU Memory (in GB) Disk (in GB)

Small 2 4 20

Medium 2 8 40

Large 4 16 40

Extra-large 8 32 80

To create a cluster in which all of the control plane and worker node VMs are the same size, specify
the SIZE variable. If you set the SIZE variable, all nodes will be created with the configuration that
you set.

SIZE: "large"


To create a cluster in which the control plane and worker node VMs are different sizes, specify the
CONTROLPLANE_SIZE and WORKER_SIZE options.

CONTROLPLANE_SIZE: "medium"

WORKER_SIZE: "large"

You can combine the CONTROLPLANE_SIZE and WORKER_SIZE options with the SIZE option. For
example, if you specify SIZE: "large" with WORKER_SIZE: "extra-large", the control plane nodes
will be set to large and worker nodes will be set to extra-large.

SIZE: "large"

WORKER_SIZE: "extra-large"

Define Custom Node Configurations


You can customize the configuration of the nodes rather than using the predefined configurations.

To use the same custom configuration for all nodes, specify the VSPHERE_NUM_CPUS,
VSPHERE_DISK_GIB, and VSPHERE_MEM_MIB options.

VSPHERE_NUM_CPUS: 2

VSPHERE_DISK_GIB: 40

VSPHERE_MEM_MIB: 4096

To define different custom configurations for control plane nodes and worker nodes, specify the VSPHERE_CONTROL_PLANE_* and VSPHERE_WORKER_* options.

VSPHERE_CONTROL_PLANE_NUM_CPUS: 2

VSPHERE_CONTROL_PLANE_DISK_GIB: 20

VSPHERE_CONTROL_PLANE_MEM_MIB: 8192

VSPHERE_WORKER_NUM_CPUS: 4

VSPHERE_WORKER_DISK_GIB: 40

VSPHERE_WORKER_MEM_MIB: 4096

Appendix B - NSX Advanced Load Balancer Sizing Guidelines


NSX Advanced Load Balancer Controller Sizing Guidelines
Controllers are classified into the following categories:

Classification vCPUs Memory (GB) Virtual Services Avi SE Scale

Essentials 4 12 0-50 0-10

Small 8 24 0-200 0-100

Medium 16 32 200-1000 100-200

Large 24 48 1000-5000 200-400

The number of virtual services that can be deployed per controller cluster is directly proportional to the controller cluster size. For more information, see the NSX Advanced Load Balancer Configuration Maximums Guide.

Service Engine Sizing Guidelines


The service engines can be configured with a minimum of 1 vCPU core and 2 GB RAM up to a maximum of 64 vCPU cores and 256 GB RAM. The following table provides guidance for sizing a service engine VM with regard to performance:

Performance metric Per core performance Maximum performance on a single Service Engine VM

HTTP Throughput 5 Gbps 7 Gbps

HTTP requests per second 50k 175k

SSL Throughput 1 Gbps 7 Gbps

SSL TPS (RSA2K) 750 40K

SSL TPS (ECC) 2000 40K

Multiple performance vectors or features may have an impact on performance. For instance, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load
Balancer recommends two cores.

Deploy VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS
This document provides step-by-step instructions for deploying VMware Tanzu for Kubernetes Operations (informally known as TKO) on VMware Cloud on AWS.

The scope of the document is limited to providing the deployment steps based on the reference
design in VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS Reference Design.

Deploying with VMware Service Installer for Tanzu


You can use VMware Service Installer for VMware Tanzu to automate this deployment.

VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.

To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on VMware Cloud on AWS Using Service Installer for VMware Tanzu.

Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.

Prerequisites
These instructions assume that you have the following set up:

VMware Cloud subscription

SDDC deployment


Access to VMware vCenter Server over HTTPs

NTP configured on all VMware ESXi hosts and vCenter Server

Supported Component Matrix


Software Components Version

Tanzu Kubernetes Grid 2.1.x

VMware Cloud on AWS SDDC Version 1.18 and later

NSX Advanced Load Balancer 22.1.2

To verify the interoperability of other versions and products, see VMware Interoperability Matrix.

Prepare the Environment for Deploying Tanzu for Kubernetes Operations
Before deploying Tanzu for Kubernetes Operations on VMware Cloud on AWS, ensure that your environment is set up as described in the following sections:

General Requirements

Network Requirements

Firewall Requirements

Resource Pools and VM Folders

Subnet and CIDR Examples

General Requirements
Your environment should meet the following general requirements:

SDDC v1.18 or later deployed in VMC on AWS.

Your SDDC has the following objects in place:

Dedicated resource pools and VM folders for collecting Tanzu Kubernetes Grid and
NSX Advanced Load Balancer VMs. Refer to the Resource Pools and VM Folders
section for more information.

NSX Advanced Load Balancer 22.1.2 OVA downloaded from the customer connect
portal and readily available for deployment.

A content library to store NSX Advanced Load Balancer Controller and service
engine OVA templates.

Depending on the OS flavor of the bootstrap VM, download and configure the following packages from VMware Customer Connect. As part of this documentation, refer to the Deploy and Configure Bootstrap Machine section to configure the required packages on the Photon OS machine.

Tanzu CLI 2.1.x

kubectl cluster CLI 1.24.9


A vSphere account with the permissions described in Required Permissions for the vSphere
Account.

Download and import NSX Advanced Load Balancer 22.1.2 OVA to Content Library.

Download the following OVA from VMware Customer Connect and import to vCenter.
Convert the imported VMs to templates.

Photon v3 Kubernetes v1.24.9 OVA

Ubuntu 2004 Kubernetes v1.24.9 OVA

Note

You can also download supported older versions of Kubernetes from VMware
Customer Connect and import them to deploy workload clusters on the intended
Kubernetes versions.

It is recommended not to use hostnames with the ".local" domain suffix for Tanzu Kubernetes Grid nodes. For more information, see the KB article.

The VMC vCenter Server FQDN should resolve to the vCenter Server's local IP address.

Resource Pools and VM Folders

The sample entries of the resource pools and folders that need to be created are as follows.

Resource Type Sample Resource Pool Name Sample Folder Name

NSX ALB Components tkg-alb-components tkg-alb-components

TKG Management components tkg-management-components tkg-management-components

TKG Shared Service Components tkg-sharedsvc-components tkg-sharedsvc-components

TKG Workload components tkg-workload01-components tkg-workload01-components

Network Requirements
Create NSX-T logical segments for deploying Tanzu for Kubernetes Operations components as per
Network Recommendations defined in the reference architecture.

Firewall Requirements
Ensure that the firewall is set up as described in Firewall Recommendations.

Subnet and CIDR Examples


For the purpose of demonstration, this document uses the following subnet CIDRs for Tanzu for
Kubernetes Operations deployment.

Network Type | Segment Name | Gateway CIDR | DHCP Pool | NSX Advanced Load Balancer IP Pool

NSX ALB Mgmt Network | sfo01-w01-vds01-albmanagement | 192.168.11.1/27 | 192.168.11.15 - 192.168.11.30 | NA

TKG Management Network | sfo01-w01-vds01-tkgmanagement | 192.168.12.1/24 | 192.168.12.2 - 192.168.12.251 | NA

TKG Workload Network | sfo01-w01-vds01-tkgworkload01 | 192.168.13.1/24 | 192.168.13.2 - 192.168.13.251 | NA

TKG Cluster VIP Network | sfo01-w01-vds01-tkgclustervip | 192.168.14.1/26 | 192.168.14.2 - 192.168.14.30 | 192.168.14.31 - 192.168.14.60

TKG Mgmt VIP Network | sfo01-w01-vds01-tkgmanagementvip | 192.168.15.1/26 | 192.168.15.2 - 192.168.15.30 | 192.168.15.31 - 192.168.15.60

TKG Workload VIP Network | sfo01-w01-vds01-tkgworkloadvip | 192.168.16.1/26 | 192.168.16.2 - 192.168.16.30 | 192.168.16.31 - 192.168.16.60

TKG Shared Services Network | sfo01-w01-vds01-tkgshared | 192.168.17.1/24 | 192.168.17.2 - 192.168.17.251 | NA

Tanzu for Kubernetes Operations Deployment Overview


The high-level steps for deploying Tanzu for Kubernetes Operation on VMware Cloud on AWS are
as follows:

1. Deploy and Configure NSX Advanced Load Balancer

2. Deploy and Configure Tanzu Kubernetes Grid

3. Deploy Tanzu Kubernetes Grid Management Cluster

4. Register Management Cluster with Tanzu Mission Control

5. Deploy and Configure Shared Services Workload Cluster

6. Deploy Tanzu Kubernetes Clusters (Workload Cluster)

7. Integrate Tanzu Kubernetes Clusters with Tanzu Observability

8. Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh

9. Deploy User-Managed Packages on Tanzu Kubernetes Grid Clusters

Deploy and Configure NSX Advanced Load Balancer


NSX Advanced Load Balancer (ALB) is an enterprise-grade integrated load balancer that provides L4-L7 load balancing support. NSX Advanced Load Balancer is deployed in No Orchestrator Access mode in the VMC environment. In this mode, adding, removing, or modifying properties of a service engine requires an administrator to manually perform the changes. For instance, an administrator would need to install a new SE through the orchestrator, such as vCenter, by uploading the OVA and setting the resource and networking properties.

For a production-grade deployment, it is recommended to deploy three instances of the NSX Advanced Load Balancer controller for high availability and resiliency.

The following table provides a sample IP address and FQDN set for the NSX Advanced Load Balancer controllers:


Controller Node IP Address FQDN

Node01 (Primary) 192.168.11.8 sfo01albctlr01a.sfo01.rainpole.local

Node02 (Secondary) 192.168.11.9 sfo01albctlr01b.sfo01.rainpole.local

Node03 (Secondary) 192.168.11.10 sfo01albctlr01c.sfo01.rainpole.local

Controller Cluster IP 192.168.11.11 sfo01albctlr01.sfo01.rainpole.local

Follow these steps to deploy and configure NSX Advanced Load Balancer:

1. Log in to the vCenter server from the vSphere client.

2. Select the cluster where you want to deploy the NSX Advanced Load Balancer controller
node.

3. Right-click the cluster and invoke the Deploy OVF Template wizard.

4. Follow the wizard to configure the following:

Set the VM Name and Folder Location.

Select the nsx-alb-components resource pool as a compute resource.

Select the datastore for the controller node deployment.

Select the sfo01-w01-vds01-albmanagement port group for the Management Network.

Customize the configuration by providing the Management Interface IP Address, Subnet Mask, and Default Gateway. The remaining fields are optional and can be left blank.

After the controller VM is deployed and powered on, connect to the URL for the node and configure
the node for your Tanzu Kubernetes Grid environment as follows:

1. Create the administrator account by setting the password and optional email address.


2. Configure System Settings by specifying the backup passphrase and DNS information.

3. (Optional) Configure Email/SMTP


4. Configure Multi-Tenant settings as follows:

IP Route Domain: Per tenant IP route domain.

Service Engine Context: Tenant context (not shared across tenants).

5. Click Save to complete the post-deployment configuration wizard.

If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch and you are directed to a Dashboard
view on the controller.

Configure Licensing
Tanzu for Kubernetes Operations is bundled with a license for NSX Advanced Load Balancer
Enterprise. To configure licensing, complete the following steps.

1. Navigate to the Administration > Settings > Licensing and click on the gear icon to change
the license type to Enterprise.

2. Select Enterprise as license type and click Save.


3. Once the license tier is changed, apply the NSX Advanced Load Balancer Enterprise license
key. If you have a license file instead of a license key, apply the license by selecting the
Upload a License File option.

NSX Advanced Load Balancer: NTP Configuration


To configure NTP, go to Administration > Settings > DNS/NTP > Edit and add your NTP server
details and click Save.

Note

You may also delete the default NTP servers.


NSX Advanced Load Balancer: Controller High Availability


In a production environment, VMware recommends that you deploy additional controller nodes and
configure the controller cluster for high availability and disaster recovery. Adding two additional
nodes to create a 3-node cluster provides node-level redundancy for the controller and also
maximizes performance for CPU-intensive analytics functions.

To run a 3-node controller cluster, you deploy the first node, perform the initial configuration, and
set the cluster IP address. After that, you deploy and power on two more Controller VMs, but you
must not run the initial configuration wizard or change the admin password for these controller VMs.
The configuration of the first controller VM is assigned to the two new controller VMs.

Repeat the steps provided in the Deploy NSX Advanced Load Balancer Controller section to deploy
additional controllers.

1. To configure the controller cluster, navigate to the Administration > Controller > Nodes
page and click Edit.


2. Specify the name for the controller cluster and set the Cluster IP. This IP address should be
from the NSX Advanced Load Balancer management network.

3. Under Cluster Nodes, specify the IP addresses of the two additional controllers that you have
deployed. Optionally, you can configure the name for the controllers.

4. Click Save to complete the cluster configuration.

After you click Save, the controller cluster setup starts, and the controller nodes are rebooted in the
process. It takes approximately 10-15 minutes for cluster formation to complete.

You are automatically logged out of the controller node where you are currently logged in. On
entering the cluster IP address in the browser, you can see details about the cluster formation task.

Note

Once the controller cluster is deployed, you must use the IP address of the controller cluster, not the IP address of the individual controller node, for any further configuration.

Connect to the NSX Advanced Load Balancer controller cluster IP/FQDN and ensure that all
controller nodes are in a healthy state.

The first controller of the cluster receives the “Leader” role. The second and third controllers work
as “Followers”.

Change NSX Advanced Load Balancer Portal Certificate


The controller must send a certificate to clients to establish secure communication. This certificate
must have a Subject Alternative Name (SAN) that matches the NSX Advanced Load Balancer
controller cluster hostname or IP address.

The controller has a default self-signed certificate, but this certificate does not have the correct SAN.
You must replace it with a valid or self-signed certificate that has the correct SAN. You can create a
self-signed certificate or upload a CA-signed certificate.

For the purpose of the demonstration, this document uses a self-signed certificate.

1. To replace the default certificate, navigate to the Templates > Security > SSL/TLS
Certificate > Create and select Controller Certificate.

2. In the New Certificate (SSL/TLS) window, enter a name for the certificate and set the type
to Self Signed.


3. Enter the following details:

Common Name - Specify the fully-qualified site name. For the site to be considered
trusted, this entry must match the hostname that the client entered in the browser.

Subject Alternate Name (SAN) - Enter the cluster IP address or FQDN of the
controller cluster nodes.

Algorithm - Select either EC or RSA.

Key Size

4. Click Save to save the certificate.


5. To change the NSX Advanced Load Balancer portal certificate, navigate to the
Administration > Settings >Access Settings page and click the pencil icon to edit the
settings.


6. Under SSL/TLS Certificate, remove the existing default certificates. From the drop-down
menu, select the newly created certificate and click Save.

7. Refresh the controller portal from the browser and accept the newly created self-signed
certificate. Ensure that the certificate reflects the updated information in the browser.

Export NSX Advanced Load Balancer Certificate


After the certificate is created, export the certificate thumbprint. The thumbprint is required later
when you configure the Tanzu Kubernetes Grid management cluster. To export the certificate,
complete the following steps.

1. Navigate to the Templates > Security > SSL/TLS Certificate page and export the certificate
by clicking Export.

2. In the Export Certificate page, click Copy to clipboard against the certificate. Do not copy
the key. Save the copied certificate to use later when you enable workload management.
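For reference, the exported certificate is later supplied, base64-encoded, in the management cluster configuration file. The fragment below is a sketch only; the variable names are the ones used by the Tanzu Kubernetes Grid integration with NSX Advanced Load Balancer, and the values are placeholders from this design.

# Base64-encode the exported certificate as a single line.
base64 -w 0 avi-controller.crt

# Management cluster configuration fragment.
AVI_ENABLE: "true"
AVI_CONTROLLER: sfo01albctlr01.sfo01.rainpole.local
AVI_CA_DATA_B64: <output-of-the-base64-command-above>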


NSX Advanced Load Balancer: Create No Orchestrator Cloud and SE Groups
1. To configure the No Orchestrator Cloud, navigate to the Infrastructure > Clouds tab.

2. Click Create and select No Orchestrator from the dropdown list.

3. Provide a name for the cloud, enable IPv4 DHCP under DHCP settings, and click Save.


4. After the cloud is created, ensure that the health status of the cloud is green.

Configure Service Engine Groups


Tanzu for Kubernetes Operations deployment is based on the use of distinct service engine (SE)
groups for the Tanzu Kubernetes Grid management and workload clusters. The service engines for
the management cluster are deployed in the Tanzu Kubernetes Grid management SE group, and
the service engines for Tanzu Kubernetes Grid workload clusters are deployed in the Tanzu
Kubernetes Grid workload SE group.

TKG-Mgmt-SEG: The service engines part of this SE group hosts:

Virtual services for all load balancer functionalities requested by the Tanzu Kubernetes Grid
management and shared services clusters.

Virtual services that load balance control plane nodes of management cluster and shared
services cluster.

TKG-WLD01-SEG: Service engines part of this SE group host virtual services that load balance
control plane nodes and virtual services for all load balancer functionalities requested by the
workload clusters mapped to this SE group.

Note

- Based on your requirements, you can create additional SE groups for the workload clusters.
- Multiple workload clusters can be mapped to a single SE group.
- A Tanzu Kubernetes Grid cluster can be mapped to only one SE group for application load balancer services.

To create and configure a new SE group, complete the following steps. The following components
are created in NSX Advanced Load Balancer.

Object Sample Name

vCenter Cloud sfo01w01vc01

Service Engine Group 1 sfo01m01segroup01

Service Engine Group 2 sfo01w01segroup01

1. Go to Infrastructure > Service Engine Group under Cloud Resources and click Create.

2. Provide a name for the SE group and configure the following settings:

High Availability Mode: Elastic HA Active/Active

VS Placement across SEs: Compact

Virtual Service per Service Engine: 10


SE Self-Election: Selected

3. Repeat the steps to create an SE group for the Tanzu Kubernetes Grid workload cluster. You
should have created two service engine groups.

Configure VIP Networks


As per the reference architecture, Tanzu for Kubernetes Operations deployment makes use of three
VIP networks:

TKG-Cluster-VIP: This network provides high availability for the control plane nodes of the
Tanzu Kubernetes Grid management cluster, shared services cluster, and the workload
clusters.

TKG-Management-VIP: This network provides VIP for the extensions (Envoy, Contour, etc.)
deployed in the shared services cluster.

TKG-Workload-VIP: This network provides VIP for the applications (of type load balancer)
deployed in the workload clusters.

Note

You can provision additional VIP networks for the network traffic separation for the
applications deployed in various workload clusters. This is a day-2 operation.

To create and configure the VIP networks, complete the following steps.

1. Go to the Infrastructure > Networks tab under Cloud Resources and click Create. Check that
the VIP networks are being created under the correct cloud.


2. Provide a name for the VIP network and uncheck the DHCP Enabled and IPv6 Auto-
Configuration options.

3. Click Add Subnet and configure the following:

IP Subnet: Subnet CIDR of the VIP network.

Static IP Address Pool: Range of IP addresses that is assigned to service engines and the virtual services that are deployed.

4. Click Save to continue.

5. Click Save again to finish the network configuration.

Repeat the steps to create additional VIP networks.


Configure Routing
After configuring the VIP networks, set the default routes for all VIP/data networks. The following
table lists the default routes used in the current environment.

Network Name | Gateway Subnet Mask | Next Hop
sfo01-w01-vds01-tkgclustervip | 0.0.0.0/0 | 192.168.14.1
sfo01-w01-vds01-tkgmanagementvip | 0.0.0.0/0 | 192.168.15.1
sfo01-w01-vds01-tkgworkloadvip | 0.0.0.0/0 | 192.168.16.1

Note

Change the gateway subnet addresses to match your network configuration.

1. Go to the Infrastructure > VRF Context > Edit global and add Static Route.

Configuring IPAM and DNS Profiles


IPAM is required to allocate virtual IP addresses when virtual services are created. NSX Advanced Load Balancer provides IPAM service for the Tanzu Kubernetes Grid cluster VIP network, the Tanzu Kubernetes Grid management VIP network, and the Tanzu Kubernetes Grid workload VIP network.

To create an IPAM profile, complete the following steps.

1. Navigate to the Templates > Profiles > IPAM/DNS Profiles page, click Create, and select
IPAM Profile.

2. Create the profile using the values shown in the following table.

Parameter | Value

Name | sfo01w01ipam01

Type | AVI Vantage IPAM

Cloud for Usable Networks | tkg-vmc

Usable Networks | sfo01-w01-vds01-tkgclustervip, sfo01-w01-vds01-tkgmanagementvip, sfo01-w01-vds01-tkgworkloadvip

3. Click Save to finish the IPAM creation wizard.

4. To create a DNS profile, click Create again and select DNS Profile.

Provide a name for the DNS Profile and select AVI Vantage DNS as the profile type.

Under Domain Name, specify the domain that you want to use with NSX Advanced
Load Balancer.

Optionally, set a new value in Override Record TTL for this domain. The default
value for all domains is 30 seconds.


The newly created IPAM and DNS profiles must be associated with the cloud so they can be
leveraged by the NSX Advanced Load Balancer objects created under that cloud.

To assign the IPAM and DNS profile to the cloud, go to the Infrastructure > Cloud page and edit the
cloud configuration.

1. Under IPAM Profile, select the IPAM profile.

2. Under DNS Profile, select the DNS profile and save the settings.

After configuring the IPAM and DNS profiles, verify that the status of the cloud is green.

Deploy and Configure Service Engine


Deploying a service engine is a manual process in VMC on AWS environment because NSX
Advanced Load Balancer is deployed in the no-orchestrator mode. In this mode, NSX Advanced
Load Balancer does not have access to the ESX management plane. Access to the ESX
management plane is required for automated service engine deployment.

To download the service engine image for deployment, navigate to the Infrastructure > Clouds tab,
select your cloud, click the download icon, and select type as OVA.

Wait a few minutes for the image generating task to finish. When the task is finished, the resulting
image file is immediately downloaded.


Import the Service Engine Image File into the Content Library

You can use the downloaded OVA file directly to create a service engine VM, but bear in mind that
this approach requires you to upload the image to vCenter every time you need to create a new
service engine VM.

For faster deployment, import the service engine OVA image into the content library and use the
“deploy from template” wizard to create new service engine VMs.
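If you prefer the command line, the same govc utility shown in the reference design's supplemental section can import the image; the content library name and file path below are placeholders.

export GOVC_URL=<fqdn-of-vcenter-in-vmware-cloud-on-aws>
export GOVC_USERNAME=<vcenter-username>
export GOVC_PASSWORD=<vcenter-password>
# Import the downloaded service engine OVA into an existing content library.
govc library.import <content-library-name> /home/admin/se.ova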

Generate the Cluster UUID and Authentication Token

Before deploying a service engine VM, you must obtain a cluster UUID and generate an
authentication token. A cluster UUID facilitates integrating the service engine with NSX Advanced
Load Balancer Controller. Authentication between the two is performed via an authentication token.

To generate a cluster UUID and auth token, navigate to Infrastructure > Clouds and click the key
icon in front of the cloud that you have created. This opens a new popup window containing both
the cluster UUID and the auth token.

Note

You need a new auth token every time a new Service Engine instance is deployed.

Deploy Service Engine VMs for Tanzu Kubernetes Grid Management Cluster

1. To deploy a service engine VM, log in to the vSphere client and navigate to Menu > Content
Library > Your Content Library. Navigate to the Templates tab and select the service engine
template, right-click it, and choose New VM from this template.

2. Follow the VM creation wizard. On the networks page, select the management and data
networks for the SE VM.

The Management network label is mapped to the NSX Advanced Load Balancer management logical segment. The remaining network labels (Data Network 1 - 9) are connected to any of the front-end virtual service's networks or back-end server's logical networks as required. They are left disconnected if not required.

The service engine for the Tanzu Kubernetes Grid management cluster is connected to the
following networks:

Management: sfo01-w01-vds01-albmanagement

Data Network 1: sfo01-w01-vds01-tkgclustervip

Data Network 2: sfo01-w01-vds01-tkgmanagementvip

VMware, Inc 69
VMware Tanzu for Kubernetes Operations Reference Architecture 2.3

Data Network 3: sfo01-w01-vds01-tkgmanagement

Data Network 4: sfo01-w01-vds01-tkgshared

3. On the Customize template page, provide the cluster UUID and authentication token that you generated earlier at the time of SE deployment. Configure the service engine VM management network settings as well.


4. Repeat the steps to deploy an additional service engine VM for the Tanzu Kubernetes Grid
management cluster.

By default, service engine VMs are created in the default Service Engine Group.

To map the service engine VMs to the correct Service Engine Group:

1. Go to the Infrastructure > Service Engine tab, select your cloud, and click the pencil icon to
update the settings and link the service engine to the correct SEG.

2. Repeat the step for all service engine VMs.

On the Service Engine Group page, you can confirm the association of service engines with Service
Engine Groups.

Deploy Service Engines for Tanzu Kubernetes Grid Workload Cluster

Service engine VMs deployed for the Tanzu Kubernetes Grid workload cluster are connected to the
following networks:

Management: sfo01-w01-vds01-albmanagement

Data Network 1: sfo01-w01-vds01-tkgworkloadvip

Data Network 2: sfo01-w01-vds01-tkgworkload

Data Network 3: sfo01-w01-vds01-tkgclustervip

You need to deploy service engine VMs with the above settings.

After deploying the service engines, edit the service engine VMs and associate them with the
sfo01w01segroup01 Service Engine Group.


The NSX Advanced Load Balancer configuration is complete.

Deploy and Configure Tanzu Kubernetes Grid


The deployment of the Tanzu Kubernetes Grid management and workload cluster is facilitated by
setting up a bootstrap machine where you install the Tanzu CLI and Kubectl utilities which are used
to create and manage the Tanzu Kubernetes Grid instance. This machine also keeps the Tanzu
Kubernetes Grid and Kubernetes configuration files for your deployments.

The bootstrap machine runs a local kind cluster when Tanzu Kubernetes Grid management cluster
deployment is started. Once the kind cluster is fully initialized, the configuration is used to deploy the
actual management cluster on the backend infrastructure. After the management cluster is fully
configured, the local kind cluster is deleted and future configurations are performed via the Tanzu
CLI.

Deploy and Configure Bootstrap Machine


The bootstrap machine can be a laptop, host, or server running Linux, macOS, or Windows from
which you deploy the management and workload clusters. It hosts the Tanzu CLI, kubectl, and the
Tanzu Kubernetes Grid and Kubernetes configuration files for your deployments, and it runs the
temporary kind cluster that is used to bootstrap the management cluster, as described above.

For this deployment, a Photon-based virtual machine is used as the bootstrap machine. For
information on how to configure a macOS or Windows machine, see Install the Tanzu CLI and
Other Tools.

The bootstrap machine must meet the following prerequisites:

A minimum of 6 GB of RAM and a 2-core CPU.

System time is synchronized with a Network Time Protocol (NTP) server.

Docker and containerd binaries are installed. For instructions on how to install Docker, see
Docker documentation.

Ensure that the bootstrap VM is connected to Tanzu Kubernetes Grid management network.

To install Tanzu CLI, Tanzu Plugins, and Kubectl utility on the bootstrap machine, follow the
instructions below:

1. Download and unpack the following Linux CLI packages from VMware Tanzu Kubernetes
Grid Download Product page.

VMware Tanzu CLI 2.1.0 for Linux

kubectl cluster cli v1.24.9 for Linux

2. Execute the following commands to install Tanzu Kubernetes Grid CLI, kubectl CLIs, and
Carvel tools.

## Install required packages


tdnf install tar zip unzip wget -y

## Install Tanzu Kubernetes Grid CLI


tar -xvf tanzu-cli-bundle-linux-amd64.tar.gz
cd ./cli/
sudo install core/v0.28.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu
chmod +x /usr/local/bin/tanzu

## Verify Tanzu CLI version

[root@tkg160-bootstrap ~] # tanzu version

version: v0.28.0
buildDate: 2023-01-20
sha: 3c34115bc-dirty

## Install Tanzu Kubernetes Grid CLI Plugins

[root@tkg160-bootstrap ~] # tanzu plugin sync

Checking for required plugins...


Installing plugin 'login:v0.28.0'
Installing plugin 'management-cluster:v0.28.0'
Installing plugin 'package:v0.28.0'
Installing plugin 'pinniped-auth:v0.28.0'
Installing plugin 'secret:v0.28.0'
Installing plugin 'telemetry:v0.28.0'
Successfully installed all required plugins
✔ Done

## Verify the plugins are installed

[root@tkg160-bootstrap ~]# tanzu plugin list


NAME                DESCRIPTION                                                         SCOPE       DISCOVERY  VERSION  STATUS
login               Login to the platform                                               Standalone  default    v0.28.0  installed
management-cluster  Kubernetes management-cluster operations                            Standalone  default    v0.28.0  installed
package             Tanzu package management                                            Standalone  default    v0.28.0  installed
pinniped-auth       Pinniped authentication operations (usually not directly invoked)   Standalone  default    v0.25.0  installed
secret              Tanzu secret management                                             Standalone  default    v0.28.0  installed
telemetry           Configure cluster-wide telemetry settings                           Standalone  default    v0.28.0  installed

## Install Kubectl CLI

gunzip kubectl-linux-v1.24.9+vmware.1.gz
mv kubectl-linux-v1.24.9+vmware.1 /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl

# Install Carvel tools

##Install ytt
cd ./cli
gunzip ytt-linux-amd64-v0.43.1+vmware.1.gz
chmod ugo+x ytt-linux-amd64-v0.43.1+vmware.1 && mv ./ytt-linux-amd64-v0.43.1+vmware.1 /usr/local/bin/ytt

##Install kapp

cd ./cli
gunzip kapp-linux-amd64-v0.53.2+vmware.1.gz
chmod ugo+x kapp-linux-amd64-v0.53.2+vmware.1 && mv ./kapp-linux-amd64-v0.53.2+vmware.1 /usr/local/bin/kapp

##Install kbld

cd ./cli
gunzip kbld-linux-amd64-v0.35.1+vmware.1.gz
chmod ugo+x kbld-linux-amd64-v0.35.1+vmware.1 && mv ./kbld-linux-amd64-v0.35.1+vmware.1 /usr/local/bin/kbld

##Install imgpkg

cd ./cli
gunzip imgpkg-linux-amd64-v0.31.1+vmware.1.gz
chmod ugo+x imgpkg-linux-amd64-v0.31.1+vmware.1 && mv ./imgpkg-linux-amd64-v0.31.1+vmware.1 /usr/local/bin/imgpkg

3. Validate Carvel tools installation using the following commands.

ytt version
kapp version
kbld version
imgpkg version

4. Install yq. yq is a lightweight and portable command-line YAML processor. yq uses jq-like
syntax but works with YAML and JSON files.

wget https://github.com/mikefarah/yq/releases/download/v4.24.5/yq_linux_amd64.tar.gz

tar -xvf yq_linux_amd64.tar.gz && mv yq_linux_amd64 /usr/local/bin/yq

5. Install kind.


curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-linux-amd64


chmod +x ./kind
mv ./kind /usr/local/bin/kind

6. Execute the following commands to start the Docker service and enable it to start at boot.
Photon OS has Docker installed by default.

## Check Docker service status


systemctl status docker

## Start Docker Service


systemctl start docker

## To start Docker Service at boot


systemctl enable docker

7. Execute the following commands to ensure that the bootstrap machine uses cgroup v1.

docker info | grep -i cgroup

## You should see the following


Cgroup Driver: cgroupfs

8. Create an SSH key pair.

An SSH key pair is required for Tanzu CLI to connect to vSphere from the bootstrap
machine.

The public key part of the generated key is passed during the Tanzu Kubernetes Grid
management cluster deployment.

## Generate SSH key pair


## When prompted enter file in which to save the key (/root/.ssh/id_rsa): press
Enter to accept the default and provide password
ssh-keygen -t rsa -b 4096 -C "email@example.com"

## Add the private key to the SSH agent running on your machine and enter the p
assword you created in the previous step
ssh-add ~/.ssh/id_rsa
## If the above command fails, execute "eval $(ssh-agent)" and then rerun the c
ommand

9. If your bootstrap machine runs Linux or Windows Subsystem for Linux, and it has a Linux
kernel built after the May 2021 Linux security patch, for example Linux 5.11 and 5.12 with
Fedora, run the following command.

sudo sysctl net/netfilter/nf_conntrack_max=131072

All required packages are now installed and the required configurations are in place in the bootstrap
virtual machine. The next step is to deploy the Tanzu Kubernetes Grid management cluster.

Import Base Image template for Tanzu Kubernetes Grid Cluster Deployment

Before you proceed with the management cluster creation, ensure that the base image template is
imported into vSphere and is available as a template. To import a base image template into vSphere:

1. Go to the Tanzu Kubernetes Grid downloads page and download a Tanzu Kubernetes Grid
OVA for the cluster nodes.

2. For the management cluster, this must be either a Photon or Ubuntu based Kubernetes
v1.24.9 OVA.

Note

Custom OVA with a custom Tanzu Kubernetes release (TKr) is also supported, as described
in Build Machine Images.

3. For workload clusters, OVA can have any supported combination of OS and Kubernetes
version, as packaged in a Tanzu Kubernetes release.

Note

Make sure you download the most recent OVA base image templates in the
event of security patch releases. You can find updated base image templates
that include security patches on the Tanzu Kubernetes Grid product
download page.

4. In the vSphere client, right-click an object in the vCenter Server inventory and select Deploy
OVF template.

5. Select Local file, click the button to upload files, and go to the downloaded OVA file on your
local machine.

6. Follow the installer prompts to deploy a VM from the OVA.

7. Click Finish to deploy the VM. When the OVA deployment finishes, right-click the VM and
select Template > Convert to Template.

Note

Do not power on the VM before you convert it to a template.

8. If you are using a non-administrator SSO account: In the VMs and Templates view, right-click the
new template, select Add Permission, and assign the tkg-user to the template with the TKG role.

For information about how to create the user and role for Tanzu Kubernetes Grid, see Required
Permissions for the vSphere Account.
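
If you prefer to automate the base image import instead of using the Deploy OVF template wizard, a
minimal govc sketch is shown below; the datastore, folder, template name, and OVA file name are
placeholders for your environment.

## Import the downloaded base image OVA (names and paths are placeholders)
govc import.ova -ds=vsanDatastore -folder=tkg-templates -name=photon-3-kube-v1.24.9 ./photon-3-kube-v1.24.9-tkg.ova

## Convert the resulting VM to a template without powering it on
govc vm.markastemplate photon-3-kube-v1.24.9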

Deploy Tanzu Kubernetes Grid Management Cluster


The management cluster is a Kubernetes cluster that runs cluster API operations on a specific cloud
provider to create and manage workload clusters on that provider. The management cluster is also
where you configure the shared and in-cluster services that the workload clusters use.


You can deploy management clusters in two ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster.

Create a deployment YAML configuration file and use it to deploy the management cluster
with the Tanzu CLI commands.

The Tanzu Kubernetes Grid installer wizard is an easy way to deploy the cluster. The following steps
describe the process.

1. To launch the Tanzu Kubernetes Grid installer wizard, run the following command on the
bootstrapper machine:

tanzu management-cluster create --ui --bind <bootstrapper-ip>:<port> --browser none

2. Access the Tanzu Kubernetes Grid installer wizard by opening a browser and entering
http://<bootstrapper-ip>:<port>/

Note

Ensure that the port number that you enter in this command is allowed by
the bootstrap machine firewall.

3. From the Tanzu Kubernetes Grid installation user interface, click Deploy for VMware
vSphere.

4. On the IaaS Provider page, enter the IP/FQDN and credentials of the vCenter server where
the Tanzu Kubernetes Grid management cluster is to be deployed and click Connect.


If you are running a vSphere 7.x environment, the Tanzu Kubernetes Grid installer detects it
and provides a choice between deploying vSphere with Tanzu (TKGS) or the Tanzu
Kubernetes Grid management cluster.

5. Select the Deploy Tanzu Kubernetes Grid Management Cluster option.

6. Select the Virtual Datacenter and enter the SSH public key that you generated earlier.

7. On the Management Cluster Settings page, select the instance type for the control plane
node and worker node and provide the following information:

Management Cluster Name: Name for your Tanzu Kubernetes Grid management cluster.

Control Plane Endpoint Provider: Select NSX Advanced Load Balancer for the
Control Plane HA.

Control Plane Endpoint: This is an optional field. If left blank, NSX Advanced Load
Balancer assigns an IP address from the pool sfo01-w01-vds01-tkgclustervip which
is configured in NSX Advanced Load Balancer. If you need to provide an IP address,
pick an unused IP address from the sfo01-w01-vds01-tkgclustervip static IP pool.

Deployment Type: Development (recommended for Dev or POC environments)/Production
(recommended for Production environments).

Machine Health Checks: Enable

Enable Audit Logging: Enables audit logging for the Kubernetes API server and node
VMs. Choose based on your environment's needs. For more information, see Audit Logging.

8. On the NSX Advanced Load Balancer page, provide the following information:

NSX Advanced Load Balancer controller cluster IP address.

Controller credentials.

Controller certificate.

9. Click Verify Credentials, and then select or configure the following:

Note


In Tanzu Kubernetes Grid v2.1.x, you can configure the network to separate
the endpoint VIP network of the cluster from the external IP network of the
load balancer service and the ingress service in the cluster. This feature lets
you ensure the security of the clusters by providing you an option to expose
the endpoint of your management or the workload cluster and the load
balancer service and ingress service in the cluster, in different networks.

As per this reference architecture, all the control plane endpoints are connected to the Tanzu
Kubernetes Grid cluster VIP network, and the data plane services are connected to the respective
management data VIP network or workload data VIP network.

Cloud Name: Name of the cloud created while configuring NSX Advanced Load
Balancer sfo01w01vc01.

Workload Cluster Service Engine Group Name: Name of the service engine group
created for Tanzu Kubernetes Grid workload cluster created when configuring NSX
Advanced Load Balancer sfo01w01segroup01.

Workload Cluster Data Plane VIP Network Name & CIDR: Select sfo01-w01-vds01-
tkgworkloadvip and subnet 192.168.16.0/26.

Workload Cluster Control Plane VIP Network Name & CIDR: Select sfo01-w01-
vds01-tkgclustervip and subnet 192.168.14.0/26.

Management Cluster Service Engine Group Name: Name of the service engine
group created for Tanzu Kubernetes Grid Management Cluster created when
configuring NSX Advanced Load Balancer sfo01m01segroup01.

Management Cluster Data Plane VIP Network Name & CIDR : Select sfo01-w01-
vds01-tkgmanagementvip and subnet 192.168.15.0/26.

Management Cluster Control Plane VIP Network Name & CIDR: Select sfo01-w01-
vds01-tkgclustervip and subnet 192.168.14.0/26.

Cluster Labels: Optional. Leave the cluster labels section empty to apply the above
workload cluster network settings by default. If you specify any label here, you must
specify the same values in the configuration YAML file of the workload cluster. Else,
the system places the endpoint VIP of your workload cluster in Management Cluster
Data Plane VIP Network by default.


Note

With the above configuration, all the Tanzu workload clusters use
sfo01-w01-vds01-tkgclustervip for the control plane VIP network and
sfo01-w01-vds01-tkgworkloadvip for the data plane network by default. If
you would like to configure separate VIP networks for the workload
control plane or data networks, create a custom AKO Deployment
Config (ADC) and provide the respective NSXALB_LABELS in the
workload cluster configuration file. For more information on network
separation and custom ADC creation, see Configure Separate VIP
Networks and Service Engine Groups in Different Workload Clusters.

10. On the Metadata page, you can specify location and labels.

11. On the Resources page, specify the compute containers for the Tanzu Kubernetes Grid
management cluster deployment.

12. On the Kubernetes Network page, select the network where the control plane and worker
nodes are placed during management cluster deployment. Ensure that the network has
DHCP service enabled.

If the Tanzu environment is placed behind a proxy, enable the proxy and provide the proxy
details.

Note

The procedure shown in this document does not use a proxy to connect to
the Internet.

13. If LDAP is configured in your environment, see Configure Identity Management for
instructions on how to integrate an identity management system with Tanzu Kubernetes Grid.

In this example, identity management integration is deactivated.


14. Select the OS image to use for the management cluster deployment.

Note

This list appears empty if no compatible template is present in your environment.

After you import the correct template and click Refresh, the installer detects the image
automatically.

15. Optional: Select Participate in the Customer Experience Improvement Program.

16. Click Review Configuration to verify your configuration settings.


When you click Review Configuration, the installer populates the cluster configuration file,
which is located in the ~/.config/tanzu/tkg/clusterconfigs subdirectory, with the settings
that you specified in the interface. You can optionally export a copy of this configuration file
by clicking Export Configuration.

17. Deploy the management cluster from this configuration file by running the command:

tanzu management-cluster create -f t4uv9zk25b.yaml -v 6

When the deployment is started from the UI, the installer wizard displays the deployment
logs on the screen.

Deploying the management cluster takes approximately 20-30 minutes to complete. While
the management cluster is being deployed, a virtual service is created in NSX Advanced
Load Balancer and placed on one of the service engines created in the
“sfo01m01segroup01” SE Group.

The installer automatically sets the context to the management cluster so that you can log in
to it and perform additional tasks such as verifying health of the management cluster and
deploying the workload clusters.

18. After the Tanzu Kubernetes Grid management cluster deployment, run the following
command to verify the health status of the cluster:

tanzu management-cluster get

Ensure that the cluster status reports as running and the values in the Ready column for
nodes, etc., are True.


See Examine the Management Cluster Deployment to perform additional health checks.

19. When deployment is completed successfully, run the following command to install the
additional Tanzu plugins:

[root@tkg-bootstrapper ~]# tanzu plugin sync


Checking for required plugins...
Installing plugin 'cluster:v0.28.0'
Installing plugin 'kubernetes-release:v0.28.0'
Successfully installed all required plugins
✔ Done

Register Management Cluster with Tanzu Mission Control


After the management cluster is deployed, you must register the management cluster with Tanzu
Mission Control and other SaaS products. You can deploy the Tanzu Kubernetes clusters and Tanzu
packages directly from the Tanzu Mission Control portal. Refer to the Integrate Tanzu Kubernetes
Clusters with SaaS Endpoints page for instructions.
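
As a rough sketch of the registration flow (verify the exact steps in the Tanzu Mission Control
documentation): registering the management cluster in the Tanzu Mission Control console produces
a registration link, and you complete the registration by applying that link on the management
cluster to install the cluster agent. The context name and URL below are placeholders.

## Set the kubectl context to the management cluster
kubectl config use-context sfo01w01vc01-admin@sfo01w01vc01

## Apply the registration link provided by the Tanzu Mission Control console (placeholder URL)
kubectl apply -f "https://<tmc-org>.tmc.cloud.vmware.com/installer?id=<registration-id>&source=registration"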

Create AKO Deployment Config for Tanzu Kubernetes Grid Workload Cluster

Tanzu Kubernetes Grid v2.1.x management clusters with NSX Advanced Load Balancer are deployed
with 2 AKODeploymentConfigs.

install-ako-for-management-cluster: default config for management cluster

install-ako-for-all: default config for all workload clusters. By default, all the workload
clusters reference this file for their virtual IP networks and service engine (SE) groups. This
ADC configuration does not enable NSX Advanced Load Balancer L7 ingress by default.

As per this Tanzu deployment, create 2 more ADCs:

tanzu-ako-for-shared: Used by the shared services cluster to deploy virtual services in the TKG
Mgmt SE Group and the load balancer applications in the TKG Management VIP Network.

tanzu-ako-for-workload-L7-ingress: Use this ADC only if you would like to enable NSX
Advanced Load Balancer L7 Ingress on workload cluster, otherwise leave the cluster labels
empty to apply the network configuration from default ADC install-ako-for-all.


Configure AKO Deployment Config (ADC) for Shared Services Cluster

As per the defined architecture, the shared services cluster uses the same control plane and data
plane networks as the management cluster. The shared services cluster control plane endpoint uses
the TKG Cluster VIP Network, application load balancing uses the TKG Management Data VIP
network, and the virtual services are deployed in the TKG-Mgmt-SEG SE group. This configuration is
enforced by creating a custom AKO Deployment Config (ADC) and applying the respective
NSXALB_LABELS while deploying the shared services cluster.

The format of the AKODeploymentConfig YAML file is as follows.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  finalizers:
    - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  name: <Unique name of AKODeploymentConfig>
spec:
  adminCredentialRef:
    name: nsx-alb-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: nsx-alb-controller-ca
    namespace: tkg-system-networking
  cloudName: <NAME OF THE CLOUD in ALB>
  clusterSelector:
    matchLabels:
      <KEY>: <VALUE>
  controlPlaneNetwork:
    cidr: <TKG-Cluster-VIP-CIDR>
    name: <TKG-Cluster-VIP-Network>
  controller: <NSX ALB CONTROLLER IP/FQDN>
  dataNetwork:
    cidr: <TKG-Mgmt-Data-VIP-CIDR>
    name: <TKG-Mgmt-Data-VIP-Name>
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
        - networkName: <TKG-Mgmt-Network>
  serviceEngineGroup: <Mgmt-Cluster-SEG>

The sample AKODeploymentConfig with sample values in place is as follows. Add the NSX ALB
label type: shared-services while deploying the shared services cluster to enforce this network
configuration.

cloud: sfo01w01vc01

service engine group: sfo01m01segroup01

Control Plane network: sfo01-w01-vds01-tkgclustervip


VIP/data network: sfo01-w01-vds01-tkgmanagementvip

Node Network: sfo01-w01-vds01-tkgmanagement

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  generation: 2
  name: tanzu-ako-for-shared
spec:
  adminCredentialRef:
    name: nsx_alb-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: nsx_alb-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      type: shared-services
  controlPlaneNetwork:
    cidr: 192.168.14.0/26
    name: sfo01-w01-vds01-tkgclustervip
  controller: 192.168.11.8
  dataNetwork:
    cidr: 192.168.15.0/26
    name: sfo01-w01-vds01-tkgmanagementvip
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgmanagement
  serviceEngineGroup: sfo01m01segroup01

After you have the AKO configuration file ready, use the kubectl command to set the context to
Tanzu Kubernetes Grid management cluster and create the ADC:

# kubectl config use-context sfo01w01vc01-admin@sfo01w01vc01


Switched to context "sfo01w01vc01-admin@sfo01w01vc01".

# kubectl apply -f ako-shared-services.yaml


akodeploymentconfig.networking.tkg.tanzu.vmware.com/tanzu-ako-for-shared created

Use the following command to list all AKODeploymentConfig created under the management
cluster:

# kubectl get adc


NAME AGE
install-ako-for-all 21h
install-ako-for-management-cluster 21h
tanzu-ako-for-shared 113s
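
Optionally, inspect the new ADC to confirm that the intended cloud, networks, and service engine
group were applied:

## Review the rendered ADC spec
kubectl get adc tanzu-ako-for-shared -o yaml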


Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX Advanced Load
Balancer L7 Ingress with NodePortLocal Mode
VMware recommends using NSX Advanced Load Balancer L7 ingress with NodePortLocal mode for
the L7 application load balancing. This is enabled by creating a custom ADC with ingress settings
enabled, and then applying the NSXALB_LABEL while deploying the workload cluster.

As per the defined architecture, workload cluster control plane endpoint uses TKG Cluster VIP
Network, application load balancing uses TKG Workload Data VIP network and the virtual services are
deployed in sfo01w01segroup01 SE group.

Below are the changes in the ADC ingress section when compared to the default ADC.

disableIngressClass: set to false to enable NSX Advanced Load Balancer L7 Ingress.

nodeNetworkList: Provide the values for Tanzu Kubernetes Grid workload network name
and CIDR.

serviceType: L7 Ingress type, recommended to use NodePortLocal

shardVSSize: Virtual service size

The format of the AKODeploymentConfig YAML file for enabling NSX Advanced Load Balancer L7
Ingress is as follows.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: <unique-name-for-adc>
spec:
  adminCredentialRef:
    name: nsx_alb-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: nsx_alb-controller-ca
    namespace: tkg-system-networking
  cloudName: <cloud name configured in nsx alb>
  clusterSelector:
    matchLabels:
      <KEY>: <value>
  controller: <ALB-Controller-IP/FQDN>
  controlPlaneNetwork:
    cidr: <TKG-Cluster-VIP-Network-CIDR>
    name: <TKG-Cluster-VIP-Network-Name>
  dataNetwork:
    cidr: <TKG-Workload-VIP-network-CIDR>
    name: <TKG-Workload-VIP-network-Name>
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: false     # required
    ingress:
      disableIngressClass: false      # required
      nodeNetworkList:                # required
        - networkName: <TKG-Workload-Network>
          cidrs:
            - <TKG-Workload-Network-CIDR>
      serviceType: NodePortLocal      # required
      shardVSSize: MEDIUM             # required
  serviceEngineGroup: <Workload-Cluster-SEG>

The AKODeploymentConfig with sample values in place is as follows. Add the NSX ALB label
workload-l7-enabled: "true" while deploying the workload cluster to enforce this network
configuration.

cloud: sfo01w01vc01

service engine group: sfo01w01segroup01

Control Plane network: sfo01-w01-vds01-tkgclustervip

VIP/data network: sfo01-w01-vds01-tkgworkloadvip

Node Network: sfo01-w01-vds01-tkgworkload

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: tanzu-ako-for-workload-l7-ingress
spec:
  adminCredentialRef:
    name: nsx_alb-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: nsx_alb-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      workload-l7-enabled: "true"
  controlPlaneNetwork:
    cidr: 192.168.14.0/26
    name: sfo01-w01-vds01-tkgclustervip
  controller: 192.168.11.8
  dataNetwork:
    cidr: 192.168.16.0/26
    name: sfo01-w01-vds01-tkgworkloadvip
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: false
    ingress:
      disableIngressClass: false
      nodeNetworkList:
        - cidrs:
            - 192.168.13.0/24
          networkName: sfo01-w01-vds01-tkgworkload
      serviceType: NodePortLocal
      shardVSSize: MEDIUM
  serviceEngineGroup: sfo01w01segroup01

Use the kubectl command to set the context to Tanzu Kubernetes Grid management cluster and
create the ADC:


# kubectl config use-context tkg149-mgmt-vmc-admin@tkg149-mgmt-vmc


Switched to context "tkg149-mgmt-vmc-admin@tkg149-mgmt-vmc".

# kubectl apply -f workload-adc-l7.yaml


akodeploymentconfig.networking.tkg.tanzu.vmware.com/tanzu-ako-for-workload-l7-ingress
created

Use the following command to list all AKODeploymentConfig created under the management
cluster:

# kubectl get adc


NAME AGE
install-ako-for-all 22h
install-ako-for-management-cluster 22h
tanzu-ako-for-shared 82m
tanzu-ako-for-workload-l7-ingress 25s

Now that you have successfully created the AKO deployment config, you need to apply the cluster
labels while deploying the workload clusters to enable NSX Advanced Load Balancer L7 Ingress with
NodePortLocal mode.
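
If you deploy clusters with the Tanzu CLI instead of Tanzu Mission Control, the labels are supplied
through the cluster configuration file. The snippet below is a minimal sketch of the relevant variables
in the flat (legacy) configuration format; the variable names and label syntax are assumptions to
verify against your Tanzu Kubernetes Grid version, and the same approach applies to the shared
services cluster with the type: 'shared-services' label.

## Workload cluster configuration file excerpt (sketch; verify variable names for your TKG version)
CLUSTER_NAME: sfo01w01workload01
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
AVI_LABELS: |
    'workload-l7-enabled': 'true'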

Deploy Tanzu Kubernetes Grid Shared Services Cluster


A shared services cluster is just a Tanzu Kubernetes Grid workload cluster used for shared services.
It can be provisioned using the standard CLI command tanzu cluster create, or through Tanzu
Mission Control. Each Tanzu Kubernetes Grid instance can have only one shared services cluster.

The procedure for deploying a shared service cluster is essentially the same as the procedure for
deploying a workload cluster. The only difference is that you add a tanzu-services label to the
shared services cluster to indicate its cluster role. This label identifies the shared services cluster to
the management cluster and workload clusters.

The shared services cluster uses the custom ADC tanzu-ako-for-shared created earlier to apply
network settings similar to the management cluster. This is enforced by applying the NSXALB_LABEL
type: shared-services while deploying the shared services cluster.

Note

The scope of this document doesn’t cover the use of a proxy for Tanzu Kubernetes
Grid deployment. If your environment uses a proxy server to connect to the internet,
ensure that the proxy configuration object includes the CIDRs for the pod, ingress,
and egress from the workload network of the Management Cluster in the No proxy
list, as described in Create a Proxy Configuration Object for a Tanzu Kubernetes Grid
Service Cluster.

1. To deploy a shared services cluster, navigate to the Clusters tab and click Create Cluster.


2. On the Create cluster page, select the Tanzu Kubernetes Grid management cluster that you
registered in the previous step and click Continue to create cluster.

3. Select the provisioner for creating the shared services cluster.


4. On the Cluster Details page, provide the following details:

5. Enter a name for the cluster. Cluster names must be unique within an organization.

6. Select the cluster group to which you want to attach your cluster.

7. Select the cluster class by clicking the down arrow button.

8. Apply the NSX ALB label created for the shared services cluster during ADC creation
(type: shared-services).

9. On the Configure page, specify the following:

In the vCenter and tlsThumbprint fields, enter the details for authentication.

From the datacenter, resourcePool, folder, network, and datastore drop down,
select the required information.

From the template drop down, select the Kubernetes version. The latest supported
version is preselected for you.

In the sshAuthorizedKeys field, enter the SSH key that was created earlier.

Enable aviAPIServerHAProvider.

10. Update POD CIDR and Service CIDR if necessary.

11. Select the high availability mode for the control plane nodes of the shared services cluster.
For a production deployment, it is recommended to deploy a highly available shared services
cluster.


12. You can optionally define the default node pool for your workload cluster.

Specify the number of worker nodes to provision.

Select OS Version.

13. Click Create Cluster to start provisioning your shared services cluster.

Cluster creation takes roughly 15-20 minutes to complete. After the cluster deployment
completes, ensure that the agent and extensions health shows green.


14. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
workload cluster.

## Verify the shared services cluster creation

tanzu cluster list

NAME                 NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  TKR
sfo01w0tkgshared01   default    running  3/3           3/3      v1.24.9+vmware.1  <none>  prod  v1.24.9---vmware.1-tkg.1

## Connect to tkg management cluster

kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

## Add the tanzu-services label to the shared services cluster as its cluster role. In the following command "sfo01w0tkgshared01" is the name of the shared services cluster

kubectl label cluster.cluster.x-k8s.io/sfo01w0tkgshared01 cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true
cluster.cluster.x-k8s.io/sfo01w0tkgshared01 labeled

## Validate that TMC has applied the AVI_LABEL while deploying the cluster

kubectl get cluster sfo01w0tkgshared01 --show-labels

NAME                 PHASE         AGE    VERSION   LABELS
sfo01w0tkgshared01   Provisioned   105m             cluster-role.tkg.tanzu.vmware.com/tanzu-services=,networking.tkg.tanzu.vmware.com/avi=tanzu-ako-for-shared,tanzuKubernetesRelease=v1.24.9---vmware.1-tkg.1,tkg.tanzu.vmware.com/cluster-name=sfo01w0tkgshared01,type=shared-services

15. Connect to the admin context of the workload cluster using the following commands and validate
the ako pod status.

## Use the following command to get the admin context of workload Cluster.

tanzu cluster kubeconfig get sfo01w0tkgshared01 --admin

Credentials of cluster 'sfo01w0tkgshared01' have been saved


You can now access the cluster by running 'kubectl config use-context sfo01w0tkgshared01-admin@sfo01w0tkgshared01'

## Use the following command to use the context of workload Cluster

kubectl config use-context sfo01w0tkgshared01-admin@sfo01w0tkgshared01

Switched to context "sfo01w0tkgshared01-admin@sfo01w0tkgshared01".

# Verify that ako pod gets deployed in avi-system namespace

kubectl get pods -n avi-system


NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 73m

# verify the nodes and pods status by running the command:


kubectl get nodes -o wide

kubectl get pods -A

Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor in Deploy User-Managed Packages in
Workload Clusters.

Deploy Tanzu Kubernetes Clusters (Workload Clusters)


As per the architecture, workload clusters make use of a custom ADC to enable NSX Advanced
Load Balancer L7 ingress with NodePortLocal mode. This is enforced by providing the
NSXALB_LABEL while deploying the workload cluster.

The steps for deploying a workload cluster are the same as for a shared services cluster. However,
on the Cluster Details page, apply the NSX ALB label created for the workload cluster during ADC
creation (workload-l7-enabled: "true").

After the workload cluster is created, verify the cluster labels and the ako pod status.

1. Connect to the Tanzu Kubernetes Grid management cluster context and verify the cluster labels
for the workload cluster.

## Verify the workload cluster creation

tanzu cluster list

NAME                 NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  TKR
sfo01w01shared01     default    running  3/3           3/3      v1.24.9+vmware.1  <none>  prod  v1.24.9---vmware.1-tkg.1
sfo01w01workload01   default    running  3/3           3/3      v1.24.9+vmware.1  <none>  prod  v1.24.9---vmware.1-tkg.1

## Connect to tkg management cluster

kubectl config use-context sfo01w01vc01-admin@sfo01w01vc01

## Validate that TMC has applied the AVI_LABEL while deploying the cluster

kubectl get cluster sfo01w01workload01 --show-labels

NAME                 PHASE         AGE    VERSION   LABELS
sfo01w01workload01   Provisioned   105m             networking.tkg.tanzu.vmware.com/avi=tanzu-ako-for-workload-l7-ingress,tanzuKubernetesRelease=v1.24.9---vmware.1-tkg.1,tkg.tanzu.vmware.com/cluster-name=sfo01w01workload01,workload-l7-enabled=true

2. Connect to the admin context of the workload cluster using the following commands and validate
the ako pod status.

## Use the following command to get the admin context of workload Cluster.

tanzu cluster kubeconfig get sfo01w01workload01 --admin

Credentials of cluster 'sfo01w01workload01' have been saved


You can now access the cluster by running 'kubectl config use-context sfo01w01workload01-admin@sfo01w01workload01'

## Use the following command to use the context of workload Cluster

kubectl config use-context sfo01w01workload01-admin@sfo01w01workload01

Switched to context "sfo01w01workload01-admin@sfo01w01workload01".

# Verify that ako pod gets deployed in avi-system namespace

kubectl get pods -n avi-system


NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 73m

# verify the nodes and pods status by running the command:


kubectl get nodes -o wide

kubectl get pods -A

You can now configure SaaS components and deploy user-managed packages on the cluster.

Integrate Tanzu Kubernetes Clusters with Tanzu Observability
For instructions on enabling Tanzu Observability on your workload cluster, see Set up Tanzu
Observability to Monitor a Tanzu Kubernetes Clusters.

Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh

For instructions on installing Tanzu Service Mesh on your workload cluster, see Onboard a Tanzu
Kubernetes Cluster to Tanzu Service Mesh.

Deploy User-Managed Packages on Tanzu Kubernetes Clusters
For instructions on installing user-managed packages on the Tanzu Kubernetes clusters, see Deploy
User-Managed Packages in Workload Clusters.

VMware Tanzu for Kubernetes Operations on AWS Reference Design
VMware Tanzu for Kubernetes Operations (informally known as TKO) simplifies operation of
Kubernetes for multi-cloud deployment by centralizing management and governance for clusters
and teams across on-premises, public clouds, and edge. Tanzu for Kubernetes Operations delivers
an open source aligned Kubernetes distribution with consistent operations and management to
support infrastructure and application modernization.

This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations
with Tanzu components on AWS.

The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.

Note: This reference design is supported and validated for customers deploying Tanzu Kubernetes
Grid 1.6.x on AWS.


Supported Kubernetes Versions in Tanzu Kubernetes Grid v1.6


Tanzu Kubernetes Grid v1.6.0 supports the Kubernetes versions v1.23.x, v1.22.x, and v1.21.x.

Tanzu Kubernetes Grid v1.6 Components


Tanzu Kubernetes Grid v1.6 supports the following operating systems and components. The
component versions listed in parentheses are included in Tanzu Kubernetes Grid v1.6.

Function: Component and Version

Infrastructure platform: Native AWS

CLI, API, and package infrastructure: Tanzu Framework v0.25.0

Cluster creation and management: Core Cluster API (v1.1.5), Cluster API Provider AWS (v1.2.0)

Kubernetes node OS distributed with TKG: Amazon Linux 2, Ubuntu 20.04

Build your own image: Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04

Container runtime: Containerd (v1.6.6)

Container networking: Antrea (v1.5.3), Calico (v3.22.1)

Container registry: Harbor (v2.5.3)

Ingress: Contour (v1.20.2)

Storage: Amazon EBS CSI driver (v1.8.0) and in-tree cloud providers

Authentication: OIDC through Pinniped (v0.12.1), LDAP through Pinniped (v0.12.1) and Dex

Observability: Fluent Bit (v1.8.15), Prometheus (v2.36.2), Grafana (v7.5.16), Tanzu Observability

Backup and migration: Velero (v1.8.1)

Network Overview
The following network diagram shows the network layout used with this reference design. It shows
the layout for a single virtual private cloud (VPC). The network layout uses the following types of
subnets:

1. One private subnet for each AWS availability zone (AZ). These subnets are not automatically
allocated a public IP address. The default gateway is a NAT gateway.

2. One public subnet for each AWS availability zone (AZ). These subnets are automatically
allocated a public IP address. The default gateway is an Internet gateway if the subnet is
connected to the Internet. A public subnet is optional if you do not need Internet ingress or
egress.

Network Recommendations


This reference design uses Tanzu Kubernetes Grid to manage the lifecycle of multiple Kubernetes
workload clusters by bootstrapping a Kubernetes management cluster with the Tanzu command line
tool. Consider the following when configuring the network for Tanzu Kubernetes Grid:

Use an internal load balancer scheme. A best practice is to create an internal load balancer
to avoid exposing the Kubernetes API to the public Internet. To avoid creating a public-
facing load balancer, set AWS_LOAD_BALANCER_SCHEME_INTERNAL: true in the cluster
configuration file. This setting customizes the management cluster's load balancer to use an
internal scheme, which means that its Kubernetes API server is not accessible or routed over
the Internet. If you use an internal load balancer, run Tanzu Kubernetes Grid from a machine
with access to the target VPC private IP space.

If you don’t want an outbound Internet or inbound connection from AWS, you can eliminate
the public subnet.

Beware that 172.17.0.0/16 is the default Docker subnet. If you are going to use that range for a
VPC deployment, you must change your Docker container subnet.

Storage
Tanzu Kubernetes Grid ships with the AWS cloud storage driver, which allows you to provision
stateful storage volumes in your Tanzu Kubernetes Grid cluster. The following storage classes are
available:

gp2 - General Purpose SSD (default storage class)

io1 - IOPS provisioned SSD

st1 - Throughput Optimized HDD

sc1 - Cold HDD

For more information on the available storage options see Amazon EBS volume types.
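
For example, a PersistentVolumeClaim that uses the default gp2 storage class might look like the
following minimal sketch; the claim name and requested size are placeholders.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data             # placeholder name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gp2      # default storage class for TKG on AWS
  resources:
    requests:
      storage: 20Gi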

VPC Architectures
In a production deployment, Tanzu Kubernetes Grid creates a multi-AZ deployment.

We recommend that you create the VPCs before you deploy Tanzu Kubernetes Grid. Also, make
sure that you tag a public and private subnet in each AZ, including the control plane cluster, with a
key of kubernetes.io/cluster/<cluster_name>. As a best practice, ensure that the value you use for
the public and private subnets for an AZ can easily identify the subnets as belonging to the same AZ.
For example,

aws ec2 create-subnet --vpc-id $vpcId --cidr-block <ip_address> --availability-zone ${AWS_REGION}b --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=priv-b}]' --output json > $WORKING_DIR/subnet-priv-b

aws ec2 create-subnet --vpc-id $vpcId --cidr-block <ip_address> --availability-zone ${AWS_REGION}b --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=pub-b}]' --output json > $WORKING_DIR/subnet-pub-b
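
To add the kubernetes.io/cluster tag described above to existing subnets, a sketch using the AWS
CLI is shown below; the subnet IDs and cluster name are placeholders, and the tag value shown is a
common convention.

## Tag a public and a private subnet for a cluster named tkg-mgmt (placeholder values)
aws ec2 create-tags --resources subnet-aaaa1111 subnet-bbbb2222 --tags Key=kubernetes.io/cluster/tkg-mgmt,Value=shared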

Based on your application needs and desired outcomes, you can organize your workloads using one
of the following VPC architectures.


Single VPC with Multiple Availability Zones


Most use cases require only a single VPC spread across multiple AZs as shown in the reference
diagram. If more separation is needed within one VPC, more subnets can be used to provide better
IP based visibility to corporate firewalls.

Multiple VPC with Multiple Availability Zones


For more separation of application workloads on AWS, you can deploy separate Kubernetes clusters
to independent VPCs. This separation might be desirable for workloads with different compliance
requirements, across different business units, or with different levels of Internet ingress and egress.
By default, Tanzu Kubernetes Grid creates a VPC per cluster.

The following diagram shows an example architecture with multiple VPCs. The control plane load
balancers in the example architecture are configured as internal load balancers.


Another variant of multiple VPC and multiple AZ design is to have one VPC for the control plane and
another for just workload clusters. The following diagram shows such a design.

Consider the following design implications when designing your network architecture.

Decision ID: TKO-AWS-001
Design Decision: Use separate networks/VPCs for the management cluster and workload clusters.
Design Justification: Better isolation and security policies between environments isolate production Kubernetes clusters from dev/test clusters.
Design Implications: Sharing the same network for multiple clusters can cause a shortage of IP addresses.

Decision ID: TKO-AWS-002
Design Decision: Use separate networks for workload clusters based on their usage.
Design Justification: Isolate production Kubernetes clusters from dev/test clusters.
Design Implications: A separate set of Service Engines can be used for separating dev/test workload clusters from production clusters.

Availability
We recommend deploying your Tanzu Kubernetes Grid cluster in an odd number of AZs to ensure
high availability of components that require consensus to operate in failure modes.

The Tanzu Kubernetes Grid management cluster performs Machine Health Checks on all Kubernetes
worker VMs. This ensures that workloads remain in a functional state, and it can remediate issues such as:

Worker VM is accidentally deleted or corrupted.

Kubelet process on worker VM is accidentally stopped or corrupted.

This health check ensures that your worker capacity remains stable and can be scheduled for
workloads. This health check, however, does not apply to the control plane or the load balancer
VMs. The health check does not recreate VMs due to physical host failure.

Quotas
Provide sufficient quotas to support both the management cluster and the workload clusters in your
deployment. Otherwise, the cluster deployments will fail. Depending on the number of workload
clusters you will deploy, you may need to increase the AWS service quotas from their default
values. You need to increase the quota in every region in which you plan to deploy Tanzu
Kubernetes Grid.

See Tanzu Kubernetes Grid resources in AWS account for more details.

The number of VPCs depends on the VPC architecture you select. The following table indicates the
number of VPCs for the network architectures in the network diagrams shown above.

VPC Architecture: Number of VPCs

Single VPC: 1

Multiple VPCs - one for each Kubernetes cluster: 3

Multiple VPCs - one for the management cluster and one for workload cluster: 2

See AWS service quotas for more information on AWS services default quotas.
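
As an illustration, a quota increase can be requested with the AWS CLI; the quota code and value
below are examples only and should be confirmed with list-service-quotas before use.

## Look up current EC2 quotas and their quota codes
aws service-quotas list-service-quotas --service-code ec2

## Request an increase (quota code and desired value are examples)
aws service-quotas request-service-quota-increase --service-code ec2 --quota-code L-1216C47A --desired-value 100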

Cluster Creation and Management


This reference design uses Tanzu Kubernetes Grid to create and manage ubiquitous Kubernetes
clusters on AWS using Kubernetes Cluster API. Tanzu Kubernetes Grid functions through the
creation of a management cluster which houses the Cluster API. The Cluster API then interacts with
the infrastructure provider to service workload Kubernetes cluster lifecycle requests.


When making design decisions for your Tanzu Kubernetes Grid clusters, consider the design
implications listed in the following table.

Decision ID: TKO-CLS-001
Design Decision: Deploy the TKG management cluster from the CLI.
Design Justification: The UI doesn't provide an option to specify an internal registry to use for TKG installation.
Design Implications: Additional parameters are required to be passed in the cluster deployment file. Using the UI, you can't pass these additional parameters.

Decision ID: TKO-CLS-002
Design Decision: Use an AWS internal load balancer scheme for your control plane endpoints.
Design Justification: Don't expose Kubernetes API endpoints to the Internet in Tanzu Kubernetes Grid clusters.
Design Implications: Creates additional AWS load balancers in your AWS account, which may increase AWS infrastructure cost.

Decision ID: TKO-CLS-003
Design Decision: Deploy Tanzu Kubernetes clusters on large and above EC2 instance sizes (for example, t2.large or greater).
Design Justification: Allow TKG clusters to have enough resources for all Tanzu packages.
Design Implications: Creates larger AWS EC2 instances in your AWS account, which may increase AWS infrastructure cost.

Decision ID: TKO-CLS-004
Design Decision: Deploy Tanzu Kubernetes clusters with the Prod plan.
Design Justification: This deploys multiple control plane nodes and provides high availability for the control plane.
Design Implications: TKG infrastructure is not impacted by single node failure.

Decision ID: TKO-CLS-005
Design Decision: Deploy Tanzu Kubernetes clusters with an odd number of AWS AZs for HA.
Design Justification: This deploys multiple control plane nodes and provides high availability for the control plane.
Design Implications: TKG infrastructure is not impacted by single zone failure.

Decision ID: TKO-CLS-006
Design Decision: Enable identity management for Tanzu Kubernetes Grid clusters.
Design Justification: To avoid usage of administrator credentials and ensure that required users with the right roles have access to Tanzu Kubernetes Grid clusters.
Design Implications: The Pinniped package helps with integrating the TKG management cluster with LDAPS authentication, and workload clusters inherit the authentication configuration from the management cluster.

Decision ID: TKO-CLS-007
Design Decision: Enable Machine Health Checks for TKG clusters.
Design Justification: The Tanzu Kubernetes Grid management cluster performs Machine Health Checks on all Kubernetes worker VMs, and HA and Machine Health Checks interoperably work together to enhance workload resiliency.
Design Implications: A MachineHealthCheck is a resource within the Cluster API that allows users to define conditions under which Machines within a Cluster should be considered unhealthy. Remediation actions can be taken when MachineHealthCheck has identified a node as unhealthy.

Tanzu Editions include components for observability, as well as container registry. We recommend
installing the necessary components into a centralized shared services cluster.

Global Cluster Lifecycle Management


Registering Management cluster and attaching workload clusters to Tanzu Mission Control allows you
to manage your global portfolio of Kubernetes clusters. You can do the following with Tanzu Mission
Control:

Centralized lifecycle management: managing the creation and deletion of workload clusters
using registered management or supervisor clusters

Centralized management: viewing the inventory of clusters and the health of clusters and
their components

Authorization: centralized authentication and authorization with federated identity from
multiple sources (e.g., AD, LDAP, and SAML), plus an easy-to-use policy engine for granting
the right access to the right users across teams

Compliance: enforcing all clusters to apply the same set of policies

Data protection: managing Velero deployment, configuration, and schedule to ensure that
cluster manifests and persistent volumes are backed up & restorable

Inspection: running a Sonobuoy conformance check suite to ensure Kubernetes cluster
functionality


For a complete list of Tanzu Mission Control features, see VMware Tanzu Mission Control Feature
Comparison.

To register your management or supervisor cluster for management through Tanzu Mission Control,
navigate to Administration > Management Cluster on the Tanzu Mission Control console and follow
the prompts.

To attach your cluster for management through Tanzu Mission Control, navigate to Clusters > Attach
Cluster on the Tanzu Mission Control console and follow the prompts.

Note

If a workload cluster under management requires a proxy to access the Internet, you
can use the Tanzu Mission Control CLI to generate the YAML necessary to install
Tanzu Mission Control components on it.


Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by the Tanzu Kubernetes Grid supports two Container
Network Interface (CNI) options:

Antrea

Calico

Both are open-source software that provide networking for cluster pods, services, and ingress.

When you deploy a Tanzu Kubernetes cluster using the Tanzu CLI with the default configuration,
the Antrea CNI is automatically enabled in the cluster. While Kubernetes does have built-in network
policies, Antrea builds on those native network policies to provide more fine-grained network
policies of its own.

Antrea has a ClusterNetworkPolicy which operates at the Kubernetes cluster level. It also has a
NetworkPolicy which limits the scope of a policy to a Kubernetes namespace. The
ClusterNetworkPolicy can be used by a Kubernetes Cluster Admin to create a security policy for the
cluster as a whole. The NetworkPolicy can be used by a developer to secure applications in a
particular namespace. See Tanzu Kubernetes Grid Security and Compliance for more details.
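
For illustration, a minimal Antrea ClusterNetworkPolicy sketch is shown below; the CRD version,
tier, and selectors are assumptions that should be checked against the Antrea release shipped with
your cluster.

apiVersion: crd.antrea.io/v1alpha1   # verify the CRD version for your Antrea release
kind: ClusterNetworkPolicy
metadata:
  name: allow-frontend-to-web        # placeholder policy name
spec:
  priority: 5
  tier: securityops
  appliedTo:
    - podSelector:
        matchLabels:
          app: web
  ingress:
    - action: Allow
      from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 80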

To provision a Tanzu Kubernetes cluster using a non-default CNI, see Deploy Tanzu Kubernetes
clusters with calico.

Each CNI is suitable for a different use case. The following table lists some common use cases for the
two CNIs that Tanzu Kubernetes Grid supports. The information in this table helps you select the
right CNI in your Tanzu Kubernetes Grid implementation.

CNI Use Case Pros and Cons


Antrea
Use Case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Provides an option to configure an egress IP pool or static egress IP for the Kubernetes workloads.
Cons: More complicated for network troubleshooting because of the additional overlay network.

Calico
Use Case: Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads.
Pros: Support for network policies; high network performance; SCTP support.
Cons: No multicast support.

Ingress and Load Balancing


Tanzu Kubernetes Grid requires load balancing for both the control plane and the workload clusters.
Tanzu Kubernetes Grid for AWS uses elastic load balancers for the control plane and workload
clusters.

For workload clusters, the Tanzu Kubernetes Grid Contour ingress controller package can be used
for layer 7 load balancing.

If you have deployed with both public and private subnets, by default you will get an Internet-facing
load balancer. If you want a private load balancer, you can specifically request one by setting
service.beta.kubernetes.io/aws-load-balancer-internal: "true" in the annotations of the
service. This setting also applies to the Contour ingress and controls whether Contour is internal-
facing or external-facing.
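
As an illustration, the following is a minimal sketch of a Service manifest that requests an internal (private) load balancer; the service name, selector, and ports are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: internal-app          # hypothetical service name
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: internal-app
  ports:
  - port: 80
    targetPort: 8080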


In Tanzu Kubernetes Grid, you can optionally deploy the external-dns package, which automates the
updates to DNS records in AWS (Route53) associated with ingress resources or LoadBalancer
services. This can also automate DNS record management for externally exposed services.
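
For example, the following sketch shows how external-dns commonly discovers the desired record for a LoadBalancer Service, assuming the external-dns package is configured against your Route 53 hosted zone; the hostname and service name are placeholders.

apiVersion: v1
kind: Service
metadata:
  name: web                                          # hypothetical service name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: web.example.com
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80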

Authentication with Pinniped


The Pinniped authentication and authorization service components are deployed into the
management cluster. Pinniped uses the OIDC or LDAP identity provider (IDP) configurations
specified during the management cluster deployment. The workload cluster inherits its
authentication configurations from its management cluster. With authentication in place, a
Kubernetes administrator can enforce role-based access control (RBAC) with Kubernetes
RoleBinding resources. These resources associate an identity provider user with a given Kubernetes
role on the workload cluster.
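
For example, a minimal sketch of a ClusterRoleBinding that grants an identity provider user administrative access on a workload cluster; the binding name and user are placeholders, and the user name must match the identity issued by your IDP.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: idp-user-cluster-admin      # hypothetical binding name
subjects:
- kind: User
  name: user@example.com            # user identity as issued by your IDP
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io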

Pinniped consists of following components:

The Pinniped Supervisor is an OIDC server that authenticates users through an external
identity provider (IDP)/LDAP, and then issues its own federation ID tokens to be passed on
to clusters based on the user information from the IDP.

The Pinniped Concierge is a credential exchange API which takes as input a credential from
an identity source (e.g., Pinniped Supervisor, proprietary IDP), authenticates the user via that
credential, and returns another credential which is understood by the host Kubernetes
cluster or by an impersonation proxy which acts on behalf of the user.

Dex: Pinniped uses Dex as a broker for your upstream LDAP identity provider. Dex is only
deployed when LDAP is selected as the OIDC backend during Tanzu Kubernetes Grid
management cluster creation.

The following diagram shows the Pinniped authentication flow with an external IDP. In the diagram, the blue arrows represent the authentication flow between the workload cluster, the management cluster, and the external IDP. The green arrows represent Tanzu CLI and kubectl traffic between the workload cluster, the management cluster, and the external IDP.

See the Pinniped docs for more information on how to integrate Pinniped into Tanzu Kubernetes
Grid with OIDC providers and LDAP.

We recommend the following best practices for managing identities in Tanzu Kubernetes Grid
provisioned clusters:

Configure Pinniped services during management cluster creation.

Limit access to cluster resources following the least privilege principle.

Limit access to management clusters to the appropriate set of users. For example, provide
access only to users who are responsible for managing infrastructure and cloud resources
but not to application developers. This is especially important because access to the
management cluster inherently provides access to all workload clusters.

Limit cluster administrator access for workload clusters to the appropriate set of users. For
example, provide access to users who are responsible for managing infrastructure and
platform resources in your organization, but not to application developers.

Connect to an identity provider to manage the user identities allowed to access cluster
resources instead of relying on administrator-generated kubeconfig files.

Observability
Metrics Monitoring with Tanzu Observability by Wavefront (Recommended Solution)
Using VMware Tanzu Observability by Wavefront significantly enhances observability. Tanzu
Observability is a VMware SaaS application that collects and displays metrics and trace data from the
full stack platform, as well as from applications. The service provides the ability to create alerts tuned
with advanced analytics, assist in the troubleshooting of systems, and to understand the impact of running production code.

Tanzu Observability collects data from Kubernetes and from applications running within Kubernetes.

You can configure Tanzu Observability with an array of capabilities. There are over 200 integrations
with prebuilt dashboards available in Wavefront.

The following table describes the plugins we recommend for this design:

Plugin: Wavefront Kubernetes Integration
Purpose: Collect metrics from Kubernetes clusters and pods
Key Metrics: Kubernetes container and pod statistics
Example Metrics: Pod CPU usage rate

Plugin: Wavefront by VMware for Istio
Purpose: Adapts Istio collected metrics and forwards to Wavefront
Key Metrics: Istio metrics including request rates, trace rates, throughput, etc.
Example Metrics: Request rate (Transactions per Second)

Custom Tanzu Observability Dashboards

Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards
for your particular deployment. For information on how to customize Tanzu Observability dashboards
for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for
Kubernetes Operations.

Metrics Monitoring with Prometheus and Grafana (Alternative Solution)
Tanzu Kubernetes Grid also supports Prometheus and Grafana as an alternative on-premises solution
for monitoring Kubernetes clusters.

Prometheus exposes scrapable metrics endpoints for various monitoring targets throughout your
cluster. Metrics are ingested by polling the endpoints at a set interval. The metrics are then stored in
a time-series database. You use the Prometheus Query Language interface to explore the metrics.

Grafana is responsible for visualizing Prometheus metrics without the need to manually write the
PromQL queries. You can create custom charts and graphs in addition to the pre-packaged options.

Prometheus and Grafana are user-managed packages available with Tanzu Kubernetes Grid. For
more information about packages bundled with Tanzu Kubernetes Grid, see Install and Configure
Packages. For more information about user-managed packages, see User-Managed Packages.
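
As a rough sketch of the installation flow, the package names below follow the tanzu-standard repository convention; the namespace, version, and values file are placeholders, and flag names such as --package versus --package-name vary between Tanzu CLI versions, so check tanzu package install --help for your version.

# List the available Prometheus package versions
tanzu package available list prometheus.tanzu.vmware.com -A

# Install Prometheus with a customized data values file
tanzu package install prometheus \
  --package prometheus.tanzu.vmware.com \
  --version <available-version> \
  --values-file prometheus-data-values.yaml \
  --namespace tkg-packages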

Log Forwarding


Tanzu also includes Fluent Bit for integration with logging platforms such as vRealize Log Insight Cloud and Elasticsearch. See the Fluent Bit documentation for the various logging providers.
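
For illustration, the following is a sketch of a Fluent Bit output stanza that forwards all logs to an Elasticsearch endpoint. In Tanzu Kubernetes Grid this configuration is normally supplied through the Fluent Bit package's data values file, and the host, port, and index shown here are placeholders.

[OUTPUT]
  Name   es
  Match  *
  Host   elasticsearch.example.internal
  Port   9200
  Index  kube-logs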

Summary
Tanzu Kubernetes Grid on AWS offers high-performance potential, convenience, and addresses the challenges of creating, testing, and updating cloud-based Kubernetes platforms in a consolidated production environment. This validated approach results in a production-quality installation with all the application services needed to serve combined or uniquely separated workload types via a combined infrastructure solution.

This plan meets many Day 0 needs for aligning product capabilities, such as configuring firewall
rules, networking, load balancing, and workload compute, to the full stack infrastructure.

Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations on AWS.

Deploy VMware Tanzu for Kubernetes Operations on AWS


This document outlines the steps for deploying VMware Tanzu for Kubernetes Operations (informally
known as TKO) on AWS. The deployment is based on the reference design provided in VMware
Tanzu for Kubernetes Operations on AWS Reference Design.

Deploying with VMware Service Installer for Tanzu


You can use VMware Service Installer for VMware Tanzu to automate this deployment.

VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.

To use Service Installer to automate this deployment, see Deploying Tanzu for Kubernetes
Operations on Non Air-gapped AWS VPC Using Service Installer for VMware Tanzu.

Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.

Prerequisites
Before deploying VMware Tanzu for Kubernetes Operations on AWS, ensure that the following are
set up.

AWS Account: An IAM user account with administrative privileges. Choose an AWS region
where the Tanzu Kubernetes Grid (TKG) AMIs exist.

AWS Resource Quotas: Sufficient quotas to support both the management cluster and the
workload clusters in your deployment. Otherwise, the cluster deployments will fail.
Depending on the number of workload clusters you plan to deploy, you may need to
increase the AWS service quotas from their default values. You will need to increase the quota in every region in which you deploy Tanzu Kubernetes Grid. For more information on
AWS default service quotas, see AWS service quotas in the AWS documentation.

See Tanzu Kubernetes Grid resources in AWS account for more details.

Note

The number of VPCs will depend on the VPC architecture you have selected.

Bootstrap Machine with AWS CLI Installed: The bootstrap machine can be a local device
such as a laptop, or a virtual machine running in, for example, VMware Workstation or
Fusion. Install the AWS CLI on the bootstrap machine. You can get the AWS CLI through a
package manager such as Homebrew, apt-get, or by downloading the CLI from AWS CLI.
You will use the bootstrap machine to create the AWS VPC and jumpbox.

VMware Cloud: Access to VMware Cloud to download Tanzu CLI.

For additional information about preparing to deploy Tanzu Kubernetes Grid on AWS, see Prepare
to Deploy Management Clusters to Amazon EC2.
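
Related to the AWS Resource Quotas prerequisite above, one way to check a commonly relevant limit is shown in the following sketch. The quota code used here is generally the one for Running On-Demand Standard instances, but verify the code in the Service Quotas console for your account; the region is a placeholder.

# Check the current limit for running on-demand standard instances in the target region
aws service-quotas get-service-quota \
  --service-code ec2 \
  --quota-code L-1216C47A \
  --region us-east-1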

Overview of the Deployment Steps


The following provides an overview of the major steps necessary to deploy Tanzu for Kubernetes
Operations on AWS EC2. Each step links to a section with detailed information.

1. Set up AWS Infrastructure.

2. Create and Set Up a Jumpbox.

3. Prepare an External Identity Management.

4. Install Tanzu Kubernetes Grid Management Cluster.

5. Examine the Management Cluster Deployment.

6. Deploy Workload Clusters.

7. Install and Configure Packages into Workload Clusters.

8. Configure SaaS Services.

Set up AWS Infrastructure


The following describes the steps to create your AWS environment and configure your network. The
instructions use AWS CLI. Follow the steps in the order provided.

1. Create the AWS environment.

Be sure to select a region that has at least three availability zones.

export AWS_ACCESS_KEY_ID=xx
export AWS_SECRET_ACCESS_KEY=xx
# Should be a region with at least 3 available AZs
export AWS_REGION=us-east-1
export AWS_PAGER=""


#Set up AWS profile

aws ec2 describe-instances --profile <profile name>


export AWS_PROFILE=<profile name>

2. Define and create a working directory.

WORKING_DIR="$(pwd)"/tkg-vpc
mkdir -p $WORKING_DIR

3. Create the VPC. This deployment uses a single VPC for all clusters.

Note

Beware that 172.17.0.0/16 is the default docker0 subnet.

aws ec2 create-vpc --cidr-block 172.16.0.0/16 --tag-specifications 'ResourceType=vpc, Tags=[{Key=Name,Value=TKGVPC}]' --output json > $WORKING_DIR/vpc

# Create a second VPC like: aws ec2 create-vpc --cidr-block 172.18.0.0/16 --tag-specifications 'ResourceType=vpc, Tags=[{Key=Name,Value=TKGVPC-2}]' --output json > $WORKING_DIR/vpc2

export vpcId="$(jq -r .Vpc.VpcId $WORKING_DIR/vpc)"


# Verify you have a valid VPC ID
echo $vpcId

4. For each VPC, create a public and private subnet in each AZ.

aws ec2 create-subnet --vpc-id $vpcId --cidr-block 172.16.0.0/24 --availability-zone ${AWS_REGION}a --tag-specifications 'ResourceType=subnet, Tags=[{Key=Name,Value=priv-a}]' --output json > $WORKING_DIR/subnet-priv-a

aws ec2 create-subnet --vpc-id $vpcId --cidr-block 172.16.1.0/24 --availability-zone ${AWS_REGION}b --tag-specifications 'ResourceType=subnet, Tags=[{Key=Name,Value=priv-b}]' --output json > $WORKING_DIR/subnet-priv-b

aws ec2 create-subnet --vpc-id $vpcId --cidr-block 172.16.2.0/24 --availability-zone ${AWS_REGION}c --tag-specifications 'ResourceType=subnet, Tags=[{Key=Name,Value=priv-c}]' --output json > $WORKING_DIR/subnet-priv-c

aws ec2 create-subnet --vpc-id $vpcId --cidr-block 172.16.3.0/24 --availability-zone ${AWS_REGION}a --tag-specifications 'ResourceType=subnet, Tags=[{Key=Name,Value=pub-a}]' --output json > $WORKING_DIR/subnet-pub-a

aws ec2 create-subnet --vpc-id $vpcId --cidr-block 172.16.4.0/24 --availability-zone ${AWS_REGION}b --tag-specifications 'ResourceType=subnet, Tags=[{Key=Name,Value=pub-b}]' --output json > $WORKING_DIR/subnet-pub-b

aws ec2 create-subnet --vpc-id $vpcId --cidr-block 172.16.5.0/24 --availability-zone ${AWS_REGION}c --tag-specifications 'ResourceType=subnet, Tags=[{Key=Name,Value=pub-c}]' --output json > $WORKING_DIR/subnet-pub-c

5. For each public subnet, set map-public-ip-on-launch.


# Set the public subnets to give them public IPs.
for i in $WORKING_DIR/subnet-pub-*; do
subnetId="$(jq -r .Subnet.SubnetId $i)"
aws ec2 modify-subnet-attribute --subnet-id "$subnetId" --map-public-ip-on-launch
done

6. Create the Internet and NAT gateways and attach them to the relevant subnets.

aws ec2 create-internet-gateway --output json > $WORKING_DIR/inet-gw

aws ec2 create-tags --resources "$(jq -r .InternetGateway.InternetGatewayId $WORKING_DIR/inet-gw)" --tags Key=Name,Value="tkg-inet-gw"

aws ec2 attach-internet-gateway \
--internet-gateway-id "$(jq -r .InternetGateway.InternetGatewayId $WORKING_DIR/inet-gw)" \
--vpc-id "$vpcId"

aws ec2 allocate-address > $WORKING_DIR/nat-eip

aws ec2 create-nat-gateway --subnet-id $(jq -r .Subnet.SubnetId $WORKING_DIR/subnet-pub-a) --allocation-id $(jq -r .AllocationId $WORKING_DIR/nat-eip) --output json > $WORKING_DIR/nat-gw

7. If you have an existing transit gateway, you can skip the create-transit-gateway command
and just feed the transit gateway ID into the vpc-attachment command. Otherwise, execute
the following commands to create a new transit gateway.

aws ec2 create-transit-gateway --description "For TKG Transit" > $WORKING_DIR/transit-gw

Please wait until the transit gateway resource creation is complete.

aws ec2 create-transit-gateway-vpc-attachment --transit-gateway-id $(jq -r .TransitGateway.TransitGatewayId $WORKING_DIR/transit-gw) --vpc-id $vpcId --subnet-ids $(jq -r .Subnet.SubnetId $WORKING_DIR/subnet-priv-a) --subnet-ids $(jq -r .Subnet.SubnetId $WORKING_DIR/subnet-priv-b) --subnet-ids $(jq -r .Subnet.SubnetId $WORKING_DIR/subnet-priv-c) --output json > $WORKING_DIR/attachment_transit_gw
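
To check when the transit gateway is ready before creating the VPC attachment, a query such as the following can be used. This is a sketch that reuses the working directory and jq conventions from the earlier steps; the gateway is ready when the state reports available.

# Poll the transit gateway state; proceed when it reports "available"
aws ec2 describe-transit-gateways \
  --transit-gateway-ids $(jq -r .TransitGateway.TransitGatewayId $WORKING_DIR/transit-gw) \
  --query 'TransitGateways[0].State' --output text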

8. Create the routing tables.

aws ec2 create-route-table --vpc-id $vpcId --output json > $WORKING_DIR/priv-rt
PRIV_RT_TABLE_ID="$(jq -r .RouteTable.RouteTableId $WORKING_DIR/priv-rt)"
aws ec2 create-tags --resources $PRIV_RT_TABLE_ID --tags 'Key=Name,Value=tkgvpc-priv-rt'

aws ec2 create-route \
--route-table-id "$PRIV_RT_TABLE_ID" \
--destination-cidr-block "0.0.0.0/0" \
--nat-gateway-id $(jq -r .NatGateway.NatGatewayId $WORKING_DIR/nat-gw)

# Route any corporate IPs through your transit gw
aws ec2 create-route \
--route-table-id "$PRIV_RT_TABLE_ID" \
--destination-cidr-block "172.16.0.0/12" \
--transit-gateway-id $(jq -r .TransitGateway.TransitGatewayId $WORKING_DIR/transit-gw)

for i in $WORKING_DIR/subnet-priv-*; do
subnetId="$(jq -r .Subnet.SubnetId $i)"
aws ec2 associate-route-table --subnet-id "$subnetId" --route-table-id $PRIV_RT_TABLE_ID --output json
done

aws ec2 create-route-table --vpc-id $vpcId --output json > $WORKING_DIR/pub-rt
PUB_RT_TABLE_ID="$(jq -r .RouteTable.RouteTableId $WORKING_DIR/pub-rt)"
aws ec2 create-tags --resources $PUB_RT_TABLE_ID --tags 'Key=Name,Value=tkgvpc-pub-rt'

aws ec2 create-route \
--route-table-id "$PUB_RT_TABLE_ID" \
--destination-cidr-block "0.0.0.0/0" \
--gateway-id $(jq -r .InternetGateway.InternetGatewayId $WORKING_DIR/inet-gw)

# Route any corporate IPs through your transit gw
aws ec2 create-route \
--route-table-id "$PUB_RT_TABLE_ID" \
--destination-cidr-block "172.16.0.0/12" \
--transit-gateway-id $(jq -r .TransitGateway.TransitGatewayId $WORKING_DIR/transit-gw)

for i in $WORKING_DIR/subnet-pub-*; do
subnetId="$(jq -r .Subnet.SubnetId $i)"
aws ec2 associate-route-table --subnet-id "$subnetId" --route-table-id $PUB_RT_TABLE_ID --output json
done

Create and Set Up a Jumpbox


After doing the network configuration, complete the steps described in this section to set up your
jumpbox. You will download the Tanzu CLI to the jumpbox, which you will use to deploy the
management cluster and workload clusters from the jumpbox. You also keep the Tanzu and
Kubernetes configuration files for your deployments on your jumpbox.

1. Create a jumpbox.

aws ec2 create-security-group --group-name "jumpbox-ssh" --description "To Jumpbox" --vpc-id "$vpcId" --output json > $WORKING_DIR/sg_jumpbox_ssh
aws ec2 create-tags --resources $(jq -r .GroupId $WORKING_DIR/sg_jumpbox_ssh) --tags Key=Name,Value="jumpbox-ssh"
# Allow ssh to jumpbox
aws ec2 authorize-security-group-ingress --group-id $(jq -r .GroupId $WORKING_DIR/sg_jumpbox_ssh) --protocol tcp --port 22 --cidr "0.0.0.0/0"

# Save this file or use some team keypair already created
aws ec2 create-key-pair --key-name tkg-kp --query 'KeyMaterial' --output text > tkgkp.pem
chmod 400 tkgkp.pem

# Find an AMI for your region https://cloud-images.ubuntu.com/locator/ec2/ (20.04)
aws ec2 run-instances --image-id ami-036d46416a34a611c --count 1 --instance-type t2.medium --key-name tkg-kp --security-group-ids $(jq -r .GroupId $WORKING_DIR/sg_jumpbox_ssh) --subnet-id $(jq -r .Subnet.SubnetId $WORKING_DIR/subnet-pub-a) --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=tkg-jumpbox}]' --block-device-mappings 'DeviceName=/dev/sda1,Ebs={VolumeSize=64}' > $WORKING_DIR/instance_jb_starting

2. Wait a few minutes for the instance to start. After it restarts, SSH to the jumpbox.

aws ec2 describe-instances --instance-id $(jq -r '.Instances[0].InstanceId' $WORKING_DIR/instance_jb_starting) > $WORKING_DIR/instance_jb_started

echo Public IP: $(jq -r '.Reservations[0].Instances[0].PublicIpAddress' $WORKING_DIR/instance_jb_started)

ssh ubuntu@$(jq -r '.Reservations[0].Instances[0].PublicIpAddress' $WORKING_DIR/instance_jb_started) -i tkgkp.pem

3. Log in to the jumpbox to install the necessary packages and configurations. Then reboot.

sudo apt update


sudo apt install docker.io
sudo apt install screen
sudo adduser ubuntu docker
sudo reboot

4. Download the Tanzu CLI and other utilities for Linux from the Tanzu Kubernetes Grid
Download Product site.

5. Copy the files and binaries to the jumpbox.

scp -i tkgkp.pem tanzu-cli-bundle-linux-amd64.tar kubectl-linux-v1.23.8+vmware.gz ubuntu@$(jq -r '.Reservations[0].Instances[0].PublicIpAddress' $WORKING_DIR/instance_jb_started):/home/ubuntu

6. Connect to the jumpbox and start port forwarding.

Note that the following command assumes that no process is currently listening on local port 8080. If that port is in use, choose a different port and adjust the SSH command line accordingly.

ssh -L 8080:localhost:8080 ubuntu@$(jq -r '.Reservations[0].Instances[0].PublicIpAddress' $WORKING_DIR/instance_jb_started) -i tkgkp.pem

7. Install the Tanzu CLI.

Run the session in screen in case your SSH connection is terminated. If your connection is
terminated, you can reattach to the screen session with screen -r once you have
reconnected.

screen
tar -xzvf tanzu-cli-bundle-linux-amd64.tar.gz
gunzip kubectl-*.gz
sudo install kubectl-linux-* /usr/local/bin/kubectl
cd cli/
sudo install core/*/tanzu-core-linux_amd64 /usr/local/bin/tanzu
gunzip *.gz
sudo install imgpkg-linux-amd64-* /usr/local/bin/imgpkg
sudo install kapp-linux-amd64-* /usr/local/bin/kapp
sudo install kbld-linux-amd64-* /usr/local/bin/kbld
sudo install vendir-linux-amd64-* /usr/local/bin/vendir
sudo install ytt-linux-amd64-* /usr/local/bin/ytt
cd ..
tanzu plugin sync
tanzu config init

Running the tanzu config init command for the first time creates the ~/.config/tanzu/tkg
subdirectory, which contains the Tanzu Kubernetes Grid configuration files.

For more information about ytt cluster overlays, see ytt Overlays.
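
Before proceeding, it can help to confirm that the CLI tooling is in place. A quick check such as the following sketch works; the output varies by version.

tanzu version
tanzu plugin list
kubectl version --client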

Prepare an External Identity Management


Tanzu Kubernetes Grid implements user authentication with Pinniped. Pinniped allows you to plug
external OpenID Connect (OIDC) or LDAP identity providers (IDP) into Tanzu Kubernetes clusters,
so that you can control user access to those clusters.

Pinniped is an open-source authentication service for Kubernetes clusters. If you use LDAP
authentication, Pinniped uses Dex as the endpoint to connect to your upstream LDAP identity
provider. If you use OIDC, Pinniped provides its own endpoint, so Dex is not required. Pinniped and
Dex run automatically as in-cluster services in your management cluster.

You enable identity management during management cluster deployment. Therefore, ensure that
you have an IDP/LDAP server setup before you do the Tanzu Kubernetes Grid management cluster
installation.

If you don’t have identity management configured, see Configure Identity Management for a sample
IDP setup. Also see Pinniped Docs for information on Pinniped integration into Tanzu Kubernetes
Grid with various OIDC providers and LDAPs.

Deploy a Tanzu Kubernetes Grid Management Cluster


You can deploy a Tanzu Kubernetes Grid management cluster using one of the following methods:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. See Deploy Management Cluster from the
Tanzu Kubernetes Grid Installer.

OR

Create and edit YAML configuration files, and use the configuration files to deploy a
management cluster with the CLI commands. See Deploy Management Clusters from a
Configuration File.

Deploy a Management Cluster from the Tanzu Kubernetes Grid Installer
To deploy a management cluster from the Tanzu Kubernetes Grid installer interface:


1. From the jumpbox, execute the following command to launch the installer interface.

tanzu management-cluster create --ui

2. Open a web browser and launch localhost:8080 on the machine running the SSH session.

The Tanzu Kubernetes Grid installer interface displays. Note that if you chose a different
listening port when connecting to the jumpbox then the interface will be available on that
port instead of port 8080.

Note

The screens are provided to help you navigate the installer interface. Enter
the values that are specific to your AWS setup. The screens shown were
taken from the current version at the time of writing and may differ slightly
from other versions.

3. Click Deploy on the Amazon EC2 tile to start the management cluster setup on Amazon
EC2.

4. For IaaS Provider settings, enter your AWS Access Key ID, Secret Access Key, Session
Token, and Region, then click Connect followed by Next. Select the region you selected in
Set up AWS infrastructure.


5. For VPC for AWS settings, select the VPC ID you created in Set up AWS infrastructure,
select the check box next to This is not internet facing vpc and click Next.

6. For Management Cluster Settings, select Production and the instance type for the control
plane nodes.

7. Enter the following specifications for the management cluster and click Next.

EC2 Key Pair: The name of an existing key pair, which you may have created in
Create and Set Up a Jumpbox.

Bastion Host: Select Enable.

Machine Health Checks: Select Enable.

AWS CloudFormation Stack: Select this if this is the first time that you are deploying
a management cluster to this AWS account, see Permissions Set by Tanzu
Kubernetes Grid for more details.

Availability Zone: Select the three availability zones for your region.


VPC Public and Private Subnets: Select the existing subnets on the VPC for each
AZ.

Worker Node Instance Type: Select the configuration for the worker node VMs.

8. For Kubernetes Network, enter the Network CNI settings and click Next.

Optionally, if you already have a proxy server set up and want to send outgoing HTTP(S)
traffic from the management cluster to a proxy, toggle Enable Proxy Settings. For more
information on how to configure proxy settings, see Configure the Kubernetes Network and
Proxies.


9. For Identity Management, toggle Enable Identity Management Settings to configure your
IDP and click Next.

For more information about configuring the identity management settings, see Configure
Identity Management.

10. For OS Image, use the drop-down menu to select the OS and Kubernetes version image
template to use for deploying Tanzu Kubernetes Grid VM. Select Ubuntu OS image (amd64)
and click Next.

11. For Register with Tanzu Mission Control, you can register your Tanzu Kubernetes Grid management cluster with Tanzu Mission Control and generate the Tanzu Mission Control registration URL to enter in the URL field.


12. Register a Management Cluster with Tanzu Mission Control

13. For CEIP Agreement, select the check box to opt in to the VMware Customer Experience
Improvement Program (CEIP), and click Next.

A summary of the configuration displays.

14. Review the summary of the configuration.

15. Click Deploy Management Cluster to complete the installation.

Deploy Management Clusters from a Configuration File


This section describes how to deploy a Tanzu Kubernetes Grid management cluster from a
configuration file using the Tanzu CLI. Skip this section if you have already deployed a management
cluster from the Tanzu Kubernetes Grid Installer UI.

Before creating a management cluster using the Tanzu CLI, define the base configuration for the
cluster in a YAML file. You specify this file by using the --file option of the tanzu management-
cluster create command.

Note

To avoid creating a public-facing load balancer, set AWS_LOAD_BALANCER_SCHEME_INTERNAL: "true" in the cluster configuration file. This setting customizes the management cluster's load balancer to use an internal scheme, which means that its Kubernetes API server is not accessible or routed over the Internet.

To register the cluster with Tanzu Mission Control, you can follow Register a Management Cluster with Tanzu Mission Control to generate the Tanzu Mission Control registration URL and set it in the configuration file as TMC_REGISTRATION_URL: <Tanzu Mission Control url>.

To create a new Tanzu Kubernetes Grid management cluster, run the following command:

tanzu management-cluster create --file path/to/cluster-config-file.yaml

Sample management cluster configuration file:

AWS_AMI_ID:
AWS_NODE_AZ: us-west-2a
AWS_NODE_AZ_1: ""
AWS_NODE_AZ_2: ""
AWS_PRIVATE_NODE_CIDR: 172.16.0.0/24
AWS_PRIVATE_NODE_CIDR_1: ""
AWS_PRIVATE_NODE_CIDR_2: ""
AWS_PRIVATE_SUBNET_ID: ""
AWS_PRIVATE_SUBNET_ID_1: ""
AWS_PRIVATE_SUBNET_ID_2: ""
AWS_PUBLIC_NODE_CIDR: 172.16.3.0/24
AWS_PUBLIC_NODE_CIDR_1: ""
AWS_PUBLIC_NODE_CIDR_2: ""
AWS_PUBLIC_SUBNET_ID: ""
AWS_PUBLIC_SUBNET_ID_1: ""
AWS_PUBLIC_SUBNET_ID_2: ""
AWS_REGION: us-west-2
AWS_SSH_KEY_NAME: tkg-kp
AWS_VPC_CIDR: 172.16.0.0/16
AWS_VPC_ID: ""
BASTION_HOST_ENABLED: "false"
CLUSTER_CIDR: 172.96.0.0/11
CLUSTER_NAME: tkg-validaton-mc
CLUSTER_PLAN: dev
CONTROL_PLANE_MACHINE_TYPE: t3.large
ENABLE_AUDIT_LOGGING: ""
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: aws
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
NODE_MACHINE_TYPE: m5.large
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
SERVICE_CIDR: 172.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"

If you had previously deployed a management cluster, the ~/.config/tanzu/tkg/clusterconfigs directory contains the management cluster configuration file.

To use the configuration file from a previous deployment, make a copy of the configuration file with
a new name, open it in a text editor, and update the configuration. VMware recommends using a
dedicated configuration file for each management cluster, with the configuration settings specific to a
single infrastructure.

For more information about deploying a management cluster from a configuration file, see Deploy
Management Clusters from a Configuration File.

Examine the Management Cluster Deployment


During the deployment of the management cluster, either from the installer interface or from a configuration file using the Tanzu CLI, Tanzu Kubernetes Grid creates a temporary management cluster using a Kubernetes in Docker (kind) cluster on the jumpbox.

Tanzu Kubernetes Grid uses the temporary management cluster to provision the final management
cluster on AWS. For information about how to examine and verify your Tanzu Kubernetes Grid
management cluster deployment, see Examine the Management Cluster Deployment.

Deploy Workload Clusters


After deploying the management cluster, you can create the workload clusters. The management
cluster’s context is updated automatically, so you can begin interacting with the management cluster.

Run the following command to create a basic workload cluster:

tanzu cluster create <cluster_name> --plan=prod

Workload clusters can be highly customized through YAML manifests and applied to the management cluster for deployment and lifecycle management. To generate a YAML template that you can update and modify to your own needs, use the --dry-run switch. Edit the manifest to meet your requirements and apply it to the management cluster.

Example:

tanzu cluster create <workload_cluster> --plan=prod --worker-machine-count 3 --dry-run
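
A sketch of that flow, with the cluster and file names as placeholders; the generated manifest is applied while the management cluster context is active.

tanzu cluster create my-workload --plan=prod --worker-machine-count 3 --dry-run > my-workload.yaml
# Review and edit my-workload.yaml as needed, then apply it to the management cluster
kubectl apply -f my-workload.yaml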

After the workload cluster is created, the current context changes to the new workload cluster.

For more information on cluster lifecycle and management, see Manage Clusters.

Troubleshooting Tips for Tanzu Kubernetes Grid


For tips to help you troubleshoot common problems that you might encounter when installing Tanzu Kubernetes Grid and deploying Tanzu Kubernetes clusters, see Troubleshooting Tips for Tanzu Kubernetes Grid.

Install and Configure Packages into Workload Clusters


A package in Tanzu Kubernetes Grid is a collection of related software that supports or extends the
core functionality of the Kubernetes cluster in which the package is installed. Tanzu Kubernetes Grid
includes two types of packages, auto-managed packages and CLI-managed packages. For more
information about packages in Tanzu Kubernetes Grid, see Install and Configure Packages.

Auto-Managed Packages
Tanzu Kubernetes Grid automatically installs the auto-managed packages during cluster creation. For
more information about auto-managed packages, see Auto-Managed Packages.

CLI-Managed Packages
A CLI-managed package is an optional component of a Kubernetes cluster that you can install and
manage with the Tanzu CLI. These packages are installed after cluster creation. CLI-managed
packages are grouped into package repositories in the Tanzu CLI. If a package repository that
contains CLI-managed packages is available in the target cluster, you can use the Tanzu CLI to install
and manage any of the packages from that repository.

Using the Tanzu CLI, you can install CLI-managed packages from the built-in tanzu-standard
package repository or from package repositories that you add to your target cluster. From the tanzu-
standard package repository, you can install the Cert Manager, Contour, External DNS, Fluent Bit,
Grafana, Harbor, Multus CNI, and Prometheus packages. For more information about CLI-managed
packages, see CLI-Managed Packages.

The following topics provide more information on installing the VMware-recommended CLI-managed packages:

Install Cert Manager

Implement Ingress Control with Contour

Implement Log Forwarding with Fluent Bit

Implement Monitoring with Prometheus and Grafana

Implement Multiple Pod Network Interfaces with Multus

Implement Service Discovery with ExternalDNS

Deploy Harbor Registry as a Shared Service

If you want to deploy Harbor into a shared services cluster, create a shared services cluster if one does not already exist. For instructions, see Create a Shared Services Cluster. Also, make sure that you add INFRASTRUCTURE_PROVIDER: aws to the shared services workload cluster configuration file.

Configure SaaS Services


The following VMware SaaS services provide additional Kubernetes lifecycle management,
observability, and service mesh features.

Tanzu Mission Control (TMC)


Tanzu Observability (TO)

Tanzu Service Mesh (TSM)

For configuration information, see Configure SaaS Services.

Delete Clusters
The procedures in this section are optional. They are provided in case you want to clean up your
production or lab environment.

Delete a Workload Cluster


To delete a provisioned workload cluster, first set your context back to the management cluster.

kubectl config use-context [mgmt_cluster_name]-admin@[mgmt_cluster_name]

From the management cluster context run:

tanzu cluster delete <cluster_name>

Delete a Management Cluster


Use this procedure to delete the management cluster as well as all of the AWS objects that Tanzu Kubernetes Grid created, such as the VPC, subnets, and NAT gateways.

Note

Be sure to wait until all the workload clusters have been reconciled before deleting the management cluster; otherwise, the infrastructure must be cleaned up manually.

Running the following command deletes the management cluster and the associated objects.

tanzu management-cluster delete <management-cluster-name>

Logs and Troubleshooting


For information about how to find the Tanzu Kubernetes Grid logs, how to troubleshoot frequently
encountered Tanzu Kubernetes Grid issues, and how to use the Crash Recovery and Diagnostics
tool, see Logs and Troubleshooting.

VMware Tanzu for Kubernetes Operations on Azure


Reference Design
VMware Tanzu simplifies the operation of Kubernetes in multi-cloud environment by centralizing
management and governance for clusters and teams across on-premises, public clouds, and the
edge. This application delivers an open-source-aligned Kubernetes distribution with consistent
operations and management to support infrastructure and app modernization.


This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations
with Tanzu components on Microsoft Azure. This reference design is based on the architecture and
components described in VMware Tanzu for Kubernetes Operations Reference Architecture.

Note

This reference design is supported and validated for customers deploying Tanzu
Kubernetes Grid 1.6 on Microsoft Azure.

Cluster Creation and Management


This reference design uses Tanzu Kubernetes Grid to create and manage ubiquitous Kubernetes
clusters on Microsoft Azure using Kubernetes Cluster API. Tanzu Kubernetes Grid functions through
the creation of a management cluster which houses the Cluster API. The Cluster API then interacts
with the infrastructure provider to service workload Kubernetes cluster lifecycle requests.

The Tanzu Kubernetes Grid user interface (UI) provides a guided deployment experience that is
tailored for Microsoft Azure. The Tanzu Kubernetes Grid installer runs either on an operator’s own
machine (it uses Docker) or through a bootstrap machine or a jump box.


Note

When using a bootstrap machine or a jump box, you may not be able to use the
Tanzu Kubernetes Grid UI to build your configuration of the management and
workload clusters. In such cases, use the following sample YAML file to help kickstart
the installation process.

AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_CLIENT_ID: <AZURE_CLIENT_ID>
AZURE_CLIENT_SECRET: <AZURE_CLIENT_SECRET>
AZURE_CONTROL_PLANE_MACHINE_TYPE: Standard_D2s_v3
AZURE_CONTROL_PLANE_SUBNET_CIDR: 10.0.1.0/26
AZURE_CONTROL_PLANE_SUBNET_NAME: mgmt-control-subnet
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 10.0.1.4
AZURE_LOCATION: eastus2
AZURE_NODE_MACHINE_TYPE: Standard_D2s_v3
AZURE_NODE_SUBNET_CIDR: 10.0.1.64/26
AZURE_NODE_SUBNET_NAME: mgmt-worker-subnet
AZURE_RESOURCE_GROUP: bch-tkg-east
AZURE_SSH_PUBLIC_KEY_B64: <BASE64-SSH-PUBLIC>
AZURE_SUBSCRIPTION_ID: <AZURE_SUBSCRIPTION_ID>
AZURE_TENANT_ID: <AZURE_TENANT_ID>
AZURE_VNET_CIDR: 10.0.0.0/16
AZURE_VNET_NAME: bch-vnet-tkg
AZURE_VNET_RESOURCE_GROUP: bch-tkg-east
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: bchcluster-mgmt-east
CLUSTER_PLAN: prod
ENABLE_AUDIT_LOGGING: "true"
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
INFRASTRUCTURE_PROVIDER: azure
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
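
A sketch of how such a file is typically consumed once the values are filled in; the file name is a placeholder.

tanzu management-cluster create --file azure-mgmt-config.yaml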

Tanzu Kubernetes Grid is deployed as an Infrastructure as a Service (IaaS) solution on Microsoft Azure. You can take advantage of the Azure platform services based on your own specific requirements. However, not all Azure platform services can be tightly integrated into the Tanzu Kubernetes Grid installation.

Tanzu Clusters
A Kubernetes cluster is made up of several components that act as a control plane of the cluster and
a set of supporting components and worker nodes that actually help run the deployed workloads.
There are two types of clusters in the Tanzu Kubernetes Grid setup: management cluster and
workload cluster. The Tanzu Kubernetes Grid management cluster hosts all the Tanzu Kubernetes
Grid components used to manage workload clusters. Workload clusters, which are spun up by Tanzu
Kubernetes Grid administrators, run the containerized applications. Cluster security is a shared
responsibility between Tanzu Kubernetes Grid cluster administrators, developers, and operators who
run applications on Tanzu Kubernetes Grid clusters.

Network Design
VMware recommends using one of the following production-level network designs for deploying
Tanzu Kubernetes Operations on Azure:

Clusters in the same virtual network (VNet)

Clusters in separate virtual networks (VNet)

Same Virtual Network


You can set up your networking such that the Tanzu Kubernetes Grid management cluster and
workload clusters are in the same VNet as the bootstrap machine. Each cluster is in a separate
subnet. The control plane and worker nodes are also placed in separate subnets.


Separate Virtual Networks


The following design uses a hub-and-spoke model. The Tanzu Kubernetes clusters are separated
into different VNets. This network design requires that the corresponding VNets are peered with one
another so that the management cluster can correctly communicate with the workload clusters. This
approach is recommended by Microsoft.


Considerations
The network designs are based on a default Tanzu CLI deployment for a production-level installation.
The designs use the default configuration values when running the Tanzu CLI. However, you have
complete control over how many nodes are deployed within the workload clusters for both the
control plane and worker nodes. You also determine the Azure components with which the clusters
will integrate.

Consider the following about the network designs:

1. Use CIDR range /28. Due to the way that Azure implements its IP addressing scheme within
subnets, VMware recommends that the minimum CIDR range for a Tanzu deployment is /28
to allow for scalability of each cluster.

2. Use only the required Microsoft Azure components that are necessary for deploying Tanzu
Kubernetes Grid on Microsoft Azure.

3. Fit into any production-level network design that you may have in place.

4. Use the default security and DevOps tooling available with an Azure subscription. The
security and DevOps tools are shown in the column to the right of the network designs.

5. Do not make assumptions or provide designs for the outer perimeter of your network design.
You may use Azure or third-party services. The outer perimeter network design should not
affect the network designs for Tanzu Kubernetes Operations on Microsoft Azure.

6. Integrating with SaaS services, such as Tanzu Mission Control and Tanzu Observability, requires that the Tanzu Kubernetes clusters have outbound SSL-based connectivity to the Internet. Add a rule to allow port 443 to the Network Security Groups (NSGs) that are applied to the subnet where the control plane VMs are deployed. Allow port 443 to all targets until VMware can provide a more detailed list of targeted CNAMEs or IP ranges. A sketch of such a rule is shown after this list.
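
The following is a sketch of one way to add such a rule with the Azure CLI; the resource group, NSG name, rule name, and priority are placeholders and should follow your own naming and numbering scheme.

az network nsg rule create \
  --resource-group my-tkg-rg \
  --nsg-name control-plane-nsg \
  --name allow-https-outbound \
  --priority 200 \
  --direction Outbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 443 \
  --destination-address-prefixes Internet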

Required Microsoft Azure Components


The following Microsoft Azure components are required for deploying the reference architecture.

Quotas
Provide sufficient quotas to support both the management cluster and the workload clusters in your
deployment. Otherwise, the cluster deployments will fail. Depending on the number of workload
clusters you will deploy, you may need to increase the following quotas from their default values.
You will need to increase these quotas in every region in which you plan to deploy Tanzu
Kubernetes Grid.

Total Regional vCPUs

Family vCPUs based on your chosen family of VM (D, E, F, etc.)

Public IP Addresses - Basic

Static Public IP Addresses

Public IP Addresses - Standard

Application Registration or Service Principal


Create an Azure Application Registration or Service Principal (SP) for the Tanzu CLI. The Tanzu CLI
creates the necessary VMs and networking components in which the Kubernetes engine runs. The
Tanzu CLI uses the Application Registration to perform all the necessary Azure tasks to create the
VMs and networking components.

The Tanzu Kubernetes Grid documentation suggests that you assign the Contributor role to the
Service Principal. However, because the Tanzu CLI creates the VMs and networking components,
for security reasons VMware recommends assigning only the VM and Network Contributor roles to
the SP.

Virtual Network
Because Tanzu for Kubernetes operations is deployed as an IaaS solution on Azure, the Kubernetes
clusters must exist within the boundary of an Azure Virtual Network (VNet). Therefore, place the
bootstrap machine, which is used to run the Tanzu CLI, in the same VNet as the Tanzu management
cluster. Place the management cluster in its own subnet.

The workload clusters can exist within the same VNet, but in different subnets, or in a completely
separate VNet. However, ensure that the Workload VNet is peered with the VNet where the
management cluster is deployed.

Load Balancer
When you deploy a management or workload cluster using Tanzu CLI, a load balancer is created
and attached to both the control plane and the worker node clusters. The load balancers are used
only for running Kubernetes traffic to the underlying nodes. The Kubernetes engine does not use the load balancers for traffic to service pods within the cluster.

The option to make the clusters private or public is controlled by the AZURE_ENABLE_PRIVATE_CLUSTER
configuration option within the config.yaml file that you use to deploy the cluster. Setting this option
to false tells the deployment process to create a public IP address and attach it to each load
balancer. Setting it to true requires you to specify a value in the AZURE_FRONTEND_PRIVATE_IP
configuration option, and attaches an IP address from your specified subnet to the load balancer.
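
For reference, these are the two settings in the cluster configuration file that control this behavior; the IP address shown is a placeholder from the relevant subnet.

AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 10.0.2.4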

Network Security Group (NSG)


Before you create clusters, create a Network Security Group and apply it to the subnets that you will
use in your Tanzu for Kubernetes Operations deployment. This ensures that the production
environment works properly after the deployment.

Before you begin your deployment, it is important to ensure that the necessary pathways are open
to all pieces of the clusters and that they are able to talk to one another. The following are the
primary requirements:

Bootstrap Machine/Subnet – SSH and HTTPS Inbound/Outbound Internet, Secure Kubectl within VNet (6443)

Control Plane VMs/Subnet – HTTPS Inbound/Outbound to Internet and SSH and Secure Kubectl (22, 443, and 6443) Inbound/Outbound within the VNet

Worker Node VMs/Subnet – Secure Kubectl (6443) Inbound/Outbound within the VNet

Note

HTTPS traffic to the bootstrap machine and the control plane nodes is required so
that they can download the necessary container images for the clusters to function
properly.

Virtual Machines
The primary component of the Tanzu Kubernetes Grid installation is the VMs that are created to work
either as the control plane or as worker nodes within the cluster. You can leverage many different
VM sizes, including GPU-based VMs, when you deploy your clusters. The default VM size is the
standard D2s_V3 and the minimum requirement for Azure instance types is 2 CPUs and 8 GB
memory.
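
For example, the instance sizes are set in the cluster configuration file. The following sketch shows the common default control plane size and a hypothetical larger worker size; any supported Azure VM size can be used.

AZURE_CONTROL_PLANE_MACHINE_TYPE: Standard_D2s_v3
AZURE_NODE_MACHINE_TYPE: Standard_D4s_v3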

VMware recommends that Resource Groups, VNets, subnets, and Network Security Groups are
created before you start a deployment.

Important

All clusters are deployed in a highly available state across Availability Zones within a
given Azure region. However, this does mean that regions that do not have
Availability Zones will not support Tanzu Kubernetes Grid deployments.

Azure Backup


As with any IaaS-based solution within Azure, VMware recommends that an Azure Backup Recovery
Vault is deployed and made available to all VMs. The availability of Azure Backup is important for the
control plane clusters and the bootstrap machine because that is where the Kubernetes and Tanzu
configurations are stored and managed.

Azure Monitor
The Azure Monitor set of services are automatically turned on for all customers within their given
subscription. Although Tanzu for Kubernetes Operations provides monitoring and logging, it does
not capture information on many of the Azure components mentioned in this reference design.
Therefore, it is important to use the available Azure Monitor features that Microsoft provides, such as:

Activity Log

Network Watcher

Azure Log Analytics

Diagnostics/Metrics/Alerts

Optional Azure Components


The following Microsoft Azure components are optional for deploying the reference architecture.

Bastion Host
Microsoft Azure creates an Azure Bastion service by default. You can use the service as a jump box
to the bootstrap machine.

This reference design uses a bootstrap machine that does cluster deployments using the Tanzu CLI.
However, your security requirements may not allow access from your cluster to a bootstrap machine
inside your firewall. In such cases, after the initial cluster creation, you can connect your clusters to
Tanzu Mission Control for lifecycle management.

Public IP
Use of a public IP address for the Kubernetes API server is optional. You can host your Kubernetes
API server on a private IP address. In fact, this reference design uses a private IP address. Access is
provided through a public endpoint in a DMZ with a Web Application Firewall (WAF) or through
some kind of VPN level connectivity, such as Express Route or Site-to-Site VPN with connectivity
back to your on-premises network.

Note

Keep in mind that the default deployment of Tanzu Kubernetes Grid creates public
facing clusters. Make sure you set AZURE_ENABLE_PRIVATE_CLUSTER to true if you want
to deploy your Kubernetes clusters on a private IP address.

Container Registries
Numerous container registry options are available, and you may already have one in place. Tanzu comes pre-packaged with its own registry, called Harbor, which can be made available directly within
a Tanzu Kubernetes Grid workload cluster. If you are hosting your Kubernetes clusters on a private
IP address as described in this reference design, the Harbor registry sits in a workload cluster in the
same network architecture as all other clusters. This design allows only private traffic access to the
container images.

For more information, see Deploy Harbor Registry as a Shared Service.

Azure Container Registry


Microsoft Azure also provides a container registry as a service, called Azure Container Registry
(ACR). This service can be deployed within your Azure subscription. It is deployed by default as a
public service with a public endpoint. However, using Software Defined Networking (SDN)
configurations, the Azure Container Registry can be linked directly into the Virtual Network where
your Tanzu Kubernetes Grid clusters reside through a service called Private Endpoint. This limits
your ACR so that is available only to traffic originating from your Virtual Network, thereby creating a
completely private deployment.

For more information, see Azure Container Registry.

Private Container Registry Options


In addition to Harbor, there are other registry options that can be made available within the same
network. These options can be deployed as a VM within the same network as the Tanzu Kubernetes
Grid clusters. A few of the options are:

Docker Hub Registry

JFrog Container Registry

Red Hat Quay

Public Container Registry Options


There are many other options for publicly available container registries that are similar to Azure
Container Registry. These can also be connected to your Tanzu for Kubernetes Operations
deployments. However, you will need to open your networking traffic to the source of your registry.
Some of the available options are:

Elastic Container Registry (AWS)

Google Container Registry

DockerHub

Oracle Cloud Infrastructure Registry

Global Cluster Lifecycle Management


Attaching clusters to Tanzu Mission Control allows you to manage your global portfolio of Kubernetes
clusters.

Note

Ensure that Tanzu Kubernetes clusters have outbound SSL-based connectivity to the Internet.

Tanzu Mission Control provides the following capabilities:

Centralized lifecycle management: managing the creation and deletion of workload clusters
using registered management or supervisor clusters

Centralized management: viewing the inventory of clusters and the health of clusters and
their components

Authorization: centralized authentication and authorization with federated identity from multiple sources (e.g., AD, LDAP, and SAML), plus an easy-to-use policy engine for granting the right access to the right users across teams

Compliance: enforcing all clusters to apply the same set of policies

Data protection: managing Velero deployment, configuration, and schedule to ensure that
cluster manifests and persistent volumes are backed up and restorable

Inspection: running a Sonobuoy conformance check suite to ensure Kubernetes cluster functionality

For a complete list of features that Tanzu Mission Control includes with Tanzu, see this chart.

To attach your cluster for management through Tanzu Mission Control, navigate to Clusters > Attach
Cluster on the Tanzu Mission Control console and follow the prompts.
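
When you attach a cluster, the Tanzu Mission Control console generates a registration manifest (or an equivalent kubectl command) to run against the cluster. The following is a sketch of the final step, assuming the generated manifest was saved locally under a placeholder name.

# Apply the registration manifest generated by the Tanzu Mission Control console
kubectl apply -f k8s-attach-manifest.yaml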

Note


If a workload cluster under management requires a proxy to access the Internet, you
can use the Tanzu Mission Control CLI to generate the YAML necessary to install
Tanzu Mission Control components on it.

Ingress and Load Balancing


Tanzu Kubernetes Grid requires load balancing for both the control plane and the workloads. Tanzu
Kubernetes Grid on Azure uses Azure Load Balancer for control plane and workload clusters.

For workloads, Tanzu Kubernetes Grid Contour ingress controller package can be used for layer 7
load balancing.

In Tanzu Kubernetes Grid, you can optionally deploy the external-dns package, which automates
updating DNS records in Azure DNS associated with ingress resources or load balancing services.
This can also automate away toil associated with DNS record management for externally exposed
services.

Authentication with Pinniped


The Pinniped authentication and authorization service components are deployed into the
management cluster. Pinniped uses the OIDC or LDAP identity provider (IDP) configurations
specified during the management cluster deployment. The workload cluster inherits its
authentication configurations from its management cluster. With authentication in place, a
Kubernetes administrator can enforce role-based access control (RBAC) with Kubernetes
RoleBinding resources. These resources associate an identity provider user with a given Kubernetes
role on the workload cluster.

Pinniped consists of following components:

The Pinniped Supervisor is an OIDC server which authenticates users through an external
identity provider (IDP)/LDAP, and then issues its own federation ID tokens to be passed on
to clusters based on the user information from the IDP.


The Pinniped Concierge is a credential exchange API which takes as input a credential from
an identity source (e.g., Pinniped Supervisor, proprietary IDP), authenticates the user via that
credential, and returns another credential which is understood by the host Kubernetes
cluster or by an impersonation proxy which acts on behalf of the user.

Dex: Pinniped uses Dex as a broker for your upstream LDAP identity provider. Dex is
deployed only when LDAP is selected as the OIDC backend during Tanzu Kubernetes Grid
management cluster creation.

The following diagram shows the Pinniped authentication flow with an external IDP. In the diagram,
the blue arrows represent the authentication flow between the workload cluster, the management
cluster, and the external IDP. The green arrows represent Tanzu CLI and kubectl traffic between the
workload cluster, the management cluster, and the external IDP.

See the Pinniped Docs for more information on how to integrate Pinniped into Tanzu Kubernetes
Grid with OIDC providers and LDAP.

VMware recommends the following best practices for managing identities in clusters provisioned with
Tanzu Kubernetes Grid:

Limit access to cluster resources following least privilege principle.

Limit access to management clusters to the appropriate set of users. For example, provide
access only to users who are responsible for managing infrastructure and cloud resources
but not to application developers. This is especially important because access to the
management cluster inherently provides access to all workload clusters.

Limit cluster administrator access for workload clusters to the appropriate set of users. For
example, provide access to users who are responsible for managing infrastructure and
platform resources in your organization, but not to application developers.

Connect to an identity provider to manage the user identities allowed to access cluster
resources instead of relying on administrator-generated kubeconfig files.

Observability

Metrics Monitoring with Tanzu Observability by Wavefront (Recommended Solution)

Using VMware Tanzu Observability by Wavefront significantly enhances observability. Tanzu
Observability is a VMware SaaS application that collects and displays metrics and trace data from the
full-stack platform as well as from applications. The service provides the ability to create alerts tuned
by advanced analytics, assists in troubleshooting systems, and helps you understand the impact of
running production code.

Note

Ensure that Tanzu Kubernetes clusters have outbound SSL-based connectivity to the
Internet.

Tanzu Observability collects data from components in Azure, Kubernetes, and applications running
within Kubernetes.

You can configure Tanzu Observability with an array of capabilities. The following table describes the
plugins that VMware recommends for this design:

| Plugin | Purpose | Key Metrics | Example Metrics |
| --- | --- | --- | --- |
| Wavefront Kubernetes Integration | Collects metrics from Kubernetes clusters and pods | Kubernetes container and pod statistics | Pod CPU usage rate |
| Wavefront by VMware for Istio | Adapts Istio collected metrics and forwards them to Wavefront | Istio metrics including request rates, trace rates, throughput, etc. | Request rate (Transactions per Second) |

Custom Tanzu Observability Dashboards

Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards
for your particular deployment. For information on how to customize Tanzu Observability dashboards
for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for
Kubernetes Operations.

Metrics Monitoring with Prometheus and Grafana (Alternative Solution)

Tanzu Kubernetes Grid also supports Prometheus and Grafana as alternative on-premises solutions
for monitoring Kubernetes clusters.

Prometheus operates by exposing scrapable metrics endpoints for the various monitoring targets
throughout your cluster. Metrics are ingested by polling these endpoints at a set interval and are
then stored in a time-series database. You can explore metrics data through the Prometheus Query
Language (PromQL) interface.
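
As an illustration of this model, once Prometheus is reachable, a PromQL expression can be submitted through the standard Prometheus HTTP API; the service address below is a placeholder, and the metric name assumes the usual cAdvisor/kubelet scrape targets.

# Average per-pod CPU usage over the last 5 minutes.
curl -sG 'http://prometheus.example.internal/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)'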

Grafana is responsible for visualizing Prometheus metrics without the need to manually write PromQL
queries. You can create custom charts and graphs in addition to the pre-packaged options.

The Tanzu Kubernetes Grid package bundles contain instructions and manifests for deploying
these tools.

Prometheus and Grafana are user-managed packages available with Tanzu Kubernetes Grid. For
more information about packages bundled with Tanzu Kubernetes Grid, see Install and Configure
Packages. For more information about user-managed packages, see User-Managed Packages.
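
As a sketch of the workflow (package versions and some flag names vary across Tanzu CLI and TKG releases), the two packages can be discovered and installed from the attached package repository:

tanzu package available list prometheus.tanzu.vmware.com -A
tanzu package available list grafana.tanzu.vmware.com -A

tanzu package install prometheus \
  --package prometheus.tanzu.vmware.com \
  --version <available-version> \
  --namespace tkg-system

tanzu package install grafana \
  --package grafana.tanzu.vmware.com \
  --version <available-version> \
  --namespace tkg-system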

Log Forwarding
Tanzu also includes Fluent Bit for integration with logging platforms such as vRealize Log Insight,
Elasticsearch, and other logging aggregators. For information about configuring Fluent Bit for your
logging provider, see Implement Log Forwarding with Fluent Bit.
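
As an example of the kind of configuration involved, a Fluent Bit output section can be supplied through the package data-values file. The snippet below is a sketch that assumes the fluent_bit.config.outputs key used by recent TKG releases and a placeholder syslog endpoint; confirm the exact schema and output options for your logging provider.

fluent_bit:
  config:
    outputs: |
      [OUTPUT]
        Name            syslog
        Match           *
        Host            logs.example.internal   # placeholder log aggregator endpoint
        Port            514
        Mode            tcp
        Syslog_Format   rfc5424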

Summary
Tanzu Kubernetes Grid on Azure offers high performance, convenience, and a way to address the
challenges of creating, testing, and updating cloud-based Kubernetes platforms in a consolidated
production environment. This validated approach results in a production-quality installation with all
the application services needed to serve combined or uniquely separated workload types via a
combined infrastructure solution.

This plan meets many Day 0 needs for aligning product capabilities, such as configuring firewall
rules, networking, load balancing, and workload computing, to the full stack infrastructure.

Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations on Microsoft Azure.

Deploy VMware Tanzu for Kubernetes Operations on Microsoft Azure

VMware Tanzu simplifies the operations of Kubernetes for multi-cloud deployments by centralizing
management and governance for clusters and teams across on-premises, public clouds, and the
edge. It delivers an open source aligned Kubernetes distribution with consistent operations and
management to support infrastructure and app modernization. This document provides a step-by-
step guide for installing and deploying VMware Tanzu for Kubernetes Operations (informally known
as TKO) on Microsoft Azure.

The scope of this document is limited to providing the deployment steps based on the following
reference design. The reference design represents one of the two production-level reference
designs described in VMware Tanzu for Kubernetes Operations on Azure Reference Design.

This design shows both the Tanzu Kubernetes Grid management cluster and workload clusters in the
same virtual network along with the bootstrap machine. However, each cluster is placed in its own
subnet. In addition, the control plane and worker nodes of each cluster are also separated by a
subnet.

1. The reference design shows the deployment of only the base components within Tanzu
Kubernetes Grid.

2. The reference design fits in with any production-level design that a customer may have in
place, such as a Hub and Spoke, Global WAN Peering, or just a simple DMZ based
implementation.

3. This guide does not make any assumptions about your chosen tooling for security or
DevOps, other than what is available with a default Azure subscription.

Note

You can use this guide to deploy additional workload clusters and workload clusters
of a different size. However, you’ll need to make additional configuration changes.
You can make these configuration changes after you have gone through the
deployment steps provided in this document.

Prerequisites
Ensure that you have:

Read VMware Tanzu for Kubernetes Operations on Azure Reference Design.

A Microsoft Azure subscription.

Owner-level access to the subscription.

To deploy the ARM template in Microsoft Azure, you’ll need:


Contributor role in Microsoft Azure.

Resource group in Microsoft Azure.

An SSH key and the Base64-encoded value of the public key. You will configure the Base64-
encoded value for the AZURE_SSH_PUBLIC_KEY_B64 parameter of the configuration
file for deploying Tanzu Kubernetes Grid. How you generate the SSH key and how you
encode the public key is up to you; a sample command sequence is shown after this list.
You need to encode the public key before storing it in the Tanzu Kubernetes Grid
deployment configuration file.

Access to Customer Connect and the available downloads for Tanzu Kubernetes Grid. To
verify that you have access, go to VMware Tanzu Kubernetes Grid Download Product.
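
For example, on a Linux workstation you might generate and encode the key as follows (the key path and comment are arbitrary placeholders; on macOS, base64 does not take the -w flag):

ssh-keygen -t rsa -b 4096 -C "tkg-azure" -f ~/.ssh/tkg-azure -N ""

# Base64-encode the public key for the AZURE_SSH_PUBLIC_KEY_B64 parameter.
base64 -w 0 ~/.ssh/tkg-azure.pub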

Overview of the Deployment steps


1. Set up your Microsoft Azure environment

2. Set up Bootstrap VM

3. Deploy Tanzu Kubernetes Grid

4. Configure SaaS Services

5. (Optional) Deploy Tanzu Kubernetes Grid Packages

Set up your Microsoft Azure Environment


Before deploying Tanzu for Kubernetes Operations and the actual Kubernetes clusters, ensure that
your Azure environment is set up as described in this section.

Azure ARM Template


The deployment detailed in this document uses the following resources:

ARM Template

Parameters

The ARM template contains parameters that you can populate or customize so that your Azure
environment uses your naming standards and networking requirements.

The ARM template deploys the following items:

Virtual Network

5 Subnets
Bootstrap

Management Cluster Control Plane (API Servers)

Management Cluster Worker Nodes

Workload Cluster Control Plane (API Servers)

Workload Cluster Worker Nodes

Network Security Group for Bootstrap Machine NIC

Network Security Groups for each of the Cluster Subnets

Public IP Address attached to Bootstrap Machine

Virtual Machine for Bootstrap (Ubuntu 20.04)

In addition, the ARM template:

Uses the Region where the Resource Group is located to specify where the resources should
be deployed.

Contains security rules for each of the Network Security Groups attached to the Control
Plane clusters. These rules allow for SSH and secure kubectl access from the public Internet.
Access from the public Internet makes troubleshooting easier while you deploy your
management and workload clusters. You can remove these rules after you complete your
deployment.

Quotas
To successfully deploy Tanzu Kubernetes Grid to Azure, ensure that the quotas are sufficient to
support both the management cluster and workload cluster deployments. Otherwise, the
deployments will fail.

Review the quotas for the following resources, which are included in the ARM template, and
increase their values as needed. Increase the quotas for every region to which you plan to deploy
Tanzu Kubernetes Grid.

Total Regional vCPUs

Family vCPUs based on your chosen virtual machine family (D, E, F, etc.)

Static Public IP Addresses

Public IP Addresses - Standard

Based on the recommended minimum virtual machine size of D2s_v3, the following minimum
quotas are required per cluster:

Total Regional vCPUs = 24

Family vCPUs for D Family = 24

Static Public IP Addresses = 6

Public IP Addresses - Standard = 6

Ensure that you increase the quotas if you make changes to the basic configuration of the clusters.
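
You can review the current usage and limits for a region with the Azure CLI before deploying; the region name below is a placeholder.

# Regional and per-family vCPU usage and limits.
az vm list-usage --location westus2 --output table

# Networking quotas, including public IP addresses.
az network list-usages --location westus2 --output table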

ARM Template Deployment


Deploy ARM Template

There are multiple methods to deploy an ARM template on Azure. If you are experienced with
Azure, you can deploy the ARM template in a method that is comfortable to you.

Otherwise, you can use the example Azure CLI commands locally or in Azure Cloud Shell. If you
prefer to use Azure PowerShell, use the example command for Azure PowerShell.

Ensure that you have the following to deploy the ARM template in Microsoft Azure:

Contributor role in Microsoft Azure.

Resource group in Microsoft Azure.

Azure CLI

Run the following example Azure CLI command locally or in Azure Cloud Shell to deploy the ARM
template.

az deployment group create --resource-group <Resource Group Name> --template-file azure-deploy.json --parameters azure-deploy-parameters.json

Azure PowerShell

Alternatively, run the following example command in Azure PowerShell to deploy the ARM template.

New-AzResourceGroupDeployment -ResourceGroupName <Resource Group Name> -TemplateFile azure-deploy.json -TemplateParameterFile azure-deploy-parameters.json

Azure Portal

If you prefer to use the Azure Portal, do the following to process an ARM template directly on the
Azure Portal.

1. Search and click Deploy a Custom Template > Build your own template in the editor.

2. Click Load file to upload the ARM template, azuredeploy.json.

3. Fill in the parameter values so that the values are specific to your deployment.

Azure Service Principal/Application Registration Creation


The Tanzu CLI requires access to an Azure Service Principal (SP) or Application Registration to
programmatically configure the Tanzu cluster’s infrastructure during deployment and during auto-
scale events.

VMware recommends that you create the SP on the Azure Portal. However, if you prefer to use
either the Azure CLI or Azure PowerShell, see the following Microsoft product documentation:

Azure CLI

Azure PowerShell
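
For reference, a single Azure CLI command can create the Application Registration/Service Principal and assign it a role in one step. This sketch uses the broad Contributor role scoped to the subscription; adjust the role and scope to match your security boundaries (the portal procedure below assigns the narrower Virtual Machine Contributor and Network Contributor roles). The output includes the appId (client ID), password (client secret), and tenant values that you record later.

az ad sp create-for-rbac \
  --name tanzucli \
  --role Contributor \
  --scopes /subscriptions/<subscription-id>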

Important

To create an Azure Service Principal or Application Registration, you must be an Administrator in
your Azure Active Directory tenant. Alternatively, all Users must have App Registrations set to Yes,
which allows all Users to create an Azure Service Principal.

Do the following on the Azure portal to create an Azure Service Principal:

1. Go to Azure Active Directory > Application Registrations.

2. Click New App Registration.

3. Enter the following:

Name: Enter a name that reflects what the App Registration is being used for. Example: tanzucli

URL: Enter any URL. In this deployment, the App Registration is used for programmatic purposes
only. In such cases, the URL can be anything. However, the field is required.

Supported Account Type: (Optional) By default, Accounts in this organizational directory only
(Default Directory only - Single tenant) is selected. In the default case, the new App Registration is
used for a single Azure Active Directory tenant and for development clusters. Depending on the size
of the organization that Tanzu is deployed in, the App Registration may need to be available across
one-to-many Azure Active Directory tenants. For such cases, select the appropriate multi-tenant
option.

After the Application Registration is created, an Overview page appears.

4. Copy the values for the Application client ID and Directory (tenant) ID from the Overview
page. You will need the IDs for running the Tanzu CLI.

5. Add a Key to the Application Registration.

You will use the key for programmatic authentication and execution.

1. Click Certificates & secrets > Client secrets > New client secret.

2. Choose the expiration date.

3. Store the randomly generated key so that it can be used later.

6. Assign VM Contributor and Network Contributor roles to the Azure SP.

The roles provide the minimum level of permissions required for the Tanzu CLI to function
properly within Azure.

Assign the roles through the Subscription scope. Depending on your security boundaries,
you can also assign it at the Resource Group scope.

Important

To assign a role to the SP, you must have either the Owner role or the User
Access Administrator role within the scope of the Azure subscription.

1. Find your specific Subscription on the Azure Portal and go to Access Control (IAM) >
Roles.

2. Click Add role assignment.

3. In the Add role assignment page, select User, group, or service principal.

4. For Select Members, search for the new SP name you created.

7. Make a note of the following information. You will need the information to create the
configuration files to set up the Bootstrap machine and Tanzu CLI.

Azure Subscription ID

Azure Active Directory Tenant ID

Azure Application ID (ServicePrincipal)

Azure Application Key

Set Up Bootstrap VM
You will use the bootstrap VM to deploy the Tanzu Kubernetes Grid management and workload
clusters. Create the bootstrap VM after you have set up your Microsoft Azure environment.

You will set up the bootstrap VM with the following:

Authentication and access to VMware Customer Connect


You will download the required Tanzu components from VMware Customer Connect.

Azure Tenant, subscription, and client IDs


The IDs are for the Azure subscription on which you created resources using the ARM
template.

Docker

Azure CLI

Tanzu CLI

Tanzu Kubectl

To set up the bootstrap VM:

1. Verify that the VM is up and running.

2. Connect to the VM through a standard SSH connection.

3. Run the following shell commands to set up the bootstrap VM. Replace the variable values with
the VMware account information needed to access VMware Customer Connect, and with the Azure
IDs for the subscription on which you created resources by using the ARM template and the
Application Registration/Service Principal.

# Variables (vmw-cli reads VMWUSER/VMWPASS from the environment)
export VMWUSER="<CUSTOMER_CONNECT_USERID>"
export VMWPASS="<CUSTOMER_CONNECT_PWD>"
export AZURETENANTID="<AAD Tenant ID>"
export AZURESUBSCRIPTION="<Subscription GUID>"
export AZURECLIENTID="<Service Principal ID>"
export AZURECLIENTSECRET="<Service Principal Secret>"

sudo apt-get update
sudo apt-get upgrade

# Docker install and verification
sudo apt-get install \
  apt-transport-https \
  ca-certificates \
  curl \
  gnupg \
  lsb-release

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

sudo groupadd docker
sudo usermod -aG docker $USER

# Optional verification
# docker run hello-world

# Download and install the Tanzu CLI and related tools
git clone https://github.com/z4ce/vmw-cli
cd vmw-cli
curl -o tmc 'https://tmc-cli.s3-us-west-2.amazonaws.com/tmc/0.4.3-fcb03104/linux/x64/tmc'
./vmw-cli ls
./vmw-cli ls vmware_tanzu_kubernetes_grid
./vmw-cli cp tanzu-cli-bundle-linux-amd64.tar.gz
./vmw-cli cp "$(./vmw-cli ls vmware_tanzu_kubernetes_grid | grep kubectl-linux | cut -d ' ' -f1)"
curl -o yq -L https://github.com/mikefarah/yq/releases/latest/download/yq_linux_amd64

sudo install yq /usr/local/bin
sudo install tmc /usr/local/bin/tmc
yes | tar --overwrite -xzvf tanzu-cli-bundle-linux-amd64.tar.gz
yes | gunzip kubectl-*.gz
sudo install kubectl-linux-* /usr/local/bin/kubectl
cd cli/
sudo install core/*/tanzu-core-linux_amd64 /usr/local/bin/tanzu
yes | gunzip *.gz
sudo install imgpkg-linux-amd64-* /usr/local/bin/imgpkg
sudo install kapp-linux-amd64-* /usr/local/bin/kapp
sudo install kbld-linux-amd64-* /usr/local/bin/kbld
sudo install vendir-linux-amd64-* /usr/local/bin/vendir
sudo install ytt-linux-amd64-* /usr/local/bin/ytt
cd ..

tanzu plugin sync
tanzu config init

# Azure CLI install and VM image terms acceptance
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

az login --service-principal --username $AZURECLIENTID --password $AZURECLIENTSECRET --tenant $AZURETENANTID
az vm image terms accept --publisher vmware-inc --offer tkg-capi --plan k8s-1dot21dot2-ubuntu-2004 --subscription $AZURESUBSCRIPTION

Note

Because of permission issues, you will have to log out and log in to the bootstrap VM
after installing Docker and before you download and install the Tanzu components.

If you prefer not to copy paste code, you can use the following sample script files:

bootstrapsetup.sh

bootstraptanzu.sh

Deploy Tanzu Kubernetes Grid


Deploy Tanzu Kubernetes Grid after you set up your Azure environment and bootstrap VM. You will
use Tanzu CLI to deploy a management cluster and workload cluster.

1. Create a YAML file that contains the required configuration details.

The ex-config.yaml sample YAML file contains the minimum configuration needed to deploy
a management cluster and workload clusters. The configuration contains the default values
used in the ARM template. Change the values in the YAML as needed for your deployment.
For example, replace the values for the Azure IDs, Application Registration/Service Principal,
cluster name, and the Base 64 encoded value of the public key.

2. Run the following commands from your bootstrap VM to create the management and
workload clusters.

tanzu management-cluster create --file config.yaml -v 6

tanzu cluster create --file config.yaml -v 6

The -v flag accepts a log verbosity level from 0 to 9.

For additional product documentation on how to create the YAML configuration file and what each
value corresponds to in Azure, see Management Cluster Configuration for Microsoft Azure.
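
For orientation, the configuration file typically carries values such as the following; the names, region, and VM sizes shown here are illustrative placeholders, and the full variable reference is in the linked product documentation.

INFRASTRUCTURE_PROVIDER: azure
CLUSTER_NAME: tkg-mgmt-azure                 # placeholder cluster name
CLUSTER_PLAN: prod
AZURE_LOCATION: westus2                      # placeholder region
AZURE_SUBSCRIPTION_ID: <subscription-guid>
AZURE_TENANT_ID: <aad-tenant-id>
AZURE_CLIENT_ID: <service-principal-id>
AZURE_CLIENT_SECRET: <service-principal-secret>
AZURE_SSH_PUBLIC_KEY_B64: <base64-encoded-public-key>
AZURE_RESOURCE_GROUP: <resource-group-name>
AZURE_VNET_NAME: <vnet-name>
AZURE_VNET_RESOURCE_GROUP: <resource-group-name>
AZURE_CONTROL_PLANE_SUBNET_NAME: <control-plane-subnet-name>
AZURE_NODE_SUBNET_NAME: <worker-node-subnet-name>
AZURE_CONTROL_PLANE_MACHINE_TYPE: Standard_D2s_v3
AZURE_NODE_MACHINE_TYPE: Standard_D2s_v3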

Configure SaaS Services


The following VMware SaaS services provide additional Kubernetes lifecycle management,
observability, and service mesh features.

Tanzu Mission Control (TMC)

Tanzu Observability (TO)

Tanzu Service Mesh (TSM)

For configuration information, see Configure SaaS Services.

(Optional) Deploy Tanzu Kubernetes Grid Packages


A package in Tanzu Kubernetes Grid is a collection of related software that supports or extends the
core functionality of the Kubernetes cluster in which the package is installed.

These packages are available for deployment in each workload cluster that you deploy, but they are
not automatically installed and working as pods.

Tanzu Kubernetes Grid includes two types of packages, auto-managed packages and CLI-managed
packages.

Auto-Managed Packages
Tanzu Kubernetes Grid automatically installs the auto-managed packages during cluster creation. For
more information about auto-managed packages, see Auto-Managed Packages.

CLI-Managed Packages
A CLI-managed package is an optional component of a Kubernetes cluster that you can install and
manage with the Tanzu CLI. These packages are installed after cluster creation. CLI-managed
packages are grouped into package repositories in the Tanzu CLI. If a package repository that
contains CLI-managed packages is available in the target cluster, you can use the Tanzu CLI to install
and manage any of the packages from that repository.

Using the Tanzu CLI, you can install CLI-managed packages from the built-in tanzu-standard
package repository or from package repositories that you add to your target cluster. From the tanzu-
standard package repository, you can install the Cert Manager, Contour, External DNS, Fluent Bit,
Grafana, Harbor, Multus CNI, and Prometheus packages. For more information about CLI-managed
packages, see CLI-Managed Packages.
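
As a sketch of the general workflow (flag names vary slightly across Tanzu CLI versions, and the version string must come from your repository), a CLI-managed package such as Cert Manager is installed as follows:

# Confirm that the tanzu-standard repository is available in the target cluster.
tanzu package repository list -A

# Discover available versions, then install.
tanzu package available list cert-manager.tanzu.vmware.com -A

tanzu package install cert-manager \
  --package cert-manager.tanzu.vmware.com \
  --version <available-version> \
  --namespace tkg-system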

The following provide more information on installing VMware recommended CLI-managed packages:

Install Cert Manager

Implement Ingress Control with Contour

Implement Log Forwarding with Fluent Bit

Implement Monitoring with Prometheus and Grafana

Implement Multiple Pod Network Interfaces with Multus

Implement Service Discovery with ExternalDNS

Deploy Harbor Registry as a Shared Service

If your deployment requires Harbor to take on a heavy load and store large images in the registry,
you can install Harbor into a separate workload cluster.

VMware Tanzu for Kubernetes Operations on vSphere Reference Designs and Deployment

The following documentation lays out the reference designs for deploying Tanzu for Kubernetes
Operations (informally known as TKO) on vSphere. Separate reference designs are provided for
environments that use vSphere networking (VDS) and environments that use NSX-T.

VMware Tanzu for Kubernetes Operations on vSphere Reference Design


Deploy Tanzu for Kubernetes Operations on vSphere with VMware VDS

VMware Tanzu for Kubernetes Operations on vSphere with NSX-T Reference Design
Deploy VMware Tanzu for Kubernetes Operations on VMware vSphere with VMware
NSX-T

VMware Tanzu for Kubernetes Operations on vSphere Reference Design

Tanzu for Kubernetes Operations (informally known as TKO) simplifies operating Kubernetes for
multi-cloud deployment by centralizing management and governance for clusters and teams across
on-premises, public clouds, and edge. It delivers an open source aligned Kubernetes distribution
with consistent operations and management to support infrastructure and app modernization.

This document describes a reference design for deploying VMware Tanzu for Kubernetes
Operations on vSphere backed by vSphere Networking (VDS).

The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.

Supported Component Matrix


The following table provides the component versions and interoperability matrix supported with the
reference design:

| Software Components | Version |
| --- | --- |
| Tanzu Kubernetes Grid | 2.3.0 |
| VMware vSphere ESXi | 8.0 U1 and later |
| VMware vCenter (VCSA) | 8.0 U1 and later |
| VMware vSAN | 8.0 U1 and later |
| NSX Advanced LB | 22.1.2 |

For the latest information, see VMware Product Interoperability Matrix.

Tanzu Kubernetes Grid Components


VMware Tanzu Kubernetes Grid (TKG) provides organizations with a consistent, upstream-
compatible, regional Kubernetes substrate that is ready for end-user workloads and ecosystem
integrations. You can deploy Tanzu Kubernetes Grid across software-defined datacenters (SDDC)
and public cloud environments, including vSphere, Microsoft Azure, and Amazon EC2.

Tanzu Kubernetes Grid comprises the following components:

Management Cluster - A management cluster is the first element that you deploy when you create a
Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster that performs the
role of the primary management and operational center for the Tanzu Kubernetes Grid instance. The
management cluster is purpose-built for operating the platform and managing the lifecycle of Tanzu
Kubernetes clusters.

ClusterClass API - Tanzu Kubernetes Grid 2 functions through the creation of a management
Kubernetes cluster which holds ClusterClass API. The ClusterClass API then interacts with the
infrastructure provider to service workload Kubernetes cluster lifecycle requests. The earlier
primitives of Tanzu Kubernetes Clusters will still exist for Tanzu Kubernetes Grid 1.X. A new feature
has been introduced as a part of Cluster API called ClusterClass which reduces the need for
redundant templating and enables powerful customization of clusters. The whole process for creating
a cluster using ClusterClass is the same as before but with slightly different parameters.

Workload Clusters - Workload Clusters are the Kubernetes clusters in which your application
workloads run. These clusters are also referred to as Tanzu Kubernetes clusters. Workload Clusters
can run different versions of Kubernetes, depending on the needs of the applications they run.

Shared Service Cluster - Each Tanzu Kubernetes Grid instance can only have one shared services
cluster. You will deploy this cluster only if you intend to deploy shared services such as Contour and
Harbor.

Tanzu Kubernetes Cluster Plans - A cluster plan is a blueprint that describes the configuration with
which to deploy a Tanzu Kubernetes cluster. It provides a set of configurable values that describe
settings like the number of control plane machines, worker machines, VM types, and so on. This
release of Tanzu Kubernetes Grid provides two default templates, dev and prod.

Tanzu Kubernetes Grid Instance - A Tanzu Kubernetes Grid instance is the full deployment of Tanzu
Kubernetes Grid, including the management cluster, the workload clusters, and the shared services
cluster that you configure.

Tanzu CLI - A command-line utility that provides the necessary commands to build and operate
Tanzu management and Tanzu Kubernetes clusters. Starting with TKG 2.3.0, the Tanzu Core CLI is
distributed separately from Tanzu Kubernetes Grid. For more information about installing the Tanzu
CLI for use with Tanzu Kubernetes Grid, see Install the Tanzu CLI.

Carvel Tools - Carvel is an open-source suite of reliable, single-purpose, composable tools that aid
in building, configuring, and deploying applications to Kubernetes. Tanzu Kubernetes Grid uses the
following Carvel tools:

ytt - A command-line tool for templating and patching YAML files. You can also use ytt to
collect fragments and piles of YAML into modular chunks for reuse.

kapp - The application deployment CLI for Kubernetes. It allows you to install, upgrade, and
delete multiple Kubernetes resources as one application.

kbld - An image-building and resolution tool.

imgpkg - A tool that enables Kubernetes to store configurations and the associated
container images as OCI images, and to transfer these images.

yq - a lightweight and portable command-line YAML, JSON, and XML processor. yq uses jq-
like syntax but works with YAML files as well as JSON and XML.

Bootstrap Machine - The bootstrap machine is the laptop, host, or server on which you download
and run the Tanzu CLI. This is where the initial bootstrapping of a management cluster occurs before
it is pushed to the platform where it will run.

Tanzu Kubernetes Grid Installer - The Tanzu Kubernetes Grid installer is a graphical wizard that you
launch by running the tanzu management-cluster create --ui command. The installer wizard runs
locally on the bootstrap machine and provides a user interface to guide you through the process of
deploying a management cluster.

Tanzu Kubernetes Grid Storage


Many storage options are available and Kubernetes is agnostic about which option you choose.

For Kubernetes stateful workloads, Tanzu Kubernetes Grid installs the vSphere Container Storage
interface (vSphere CSI) to provision Kubernetes persistent volumes for pods automatically. While the
default vSAN storage policy can be used, site reliability engineers (SREs) and administrators should
evaluate the needs of their applications and craft a specific vSphere Storage Policy. vSAN storage
policies describe classes of storage such as SSD and NVME, as well as cluster quotas.

In vSphere 7u1+ environments with vSAN, the vSphere CSI driver for Kubernetes also supports
creating NFS File Volumes, which support ReadWriteMany access modes. This allows for
provisioning volumes which can be read and written from multiple pods simultaneously. To support
this, the vSAN File Service must be enabled.

You can also use other types of vSphere datastores. There are Tanzu Kubernetes Grid Cluster Plans
that operators can define to use a certain vSphere datastore when creating new workload clusters.
All developers would then have the ability to provision container-backed persistent volumes from
that underlying datastore.
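
For example, a StorageClass that maps to a specific vSphere storage policy can be defined as follows; the class and policy names are placeholders, while the provisioner and parameter names are those used by the vSphere CSI driver.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-gold
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "Gold vSAN Policy"   # assumed vSphere storage policy name
allowVolumeExpansion: true
reclaimPolicy: Delete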

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by Tanzu Kubernetes Grid supports two Container Network
Interface (CNI) options:

Antrea

Calico

Both are open-source software that provide networking for cluster pods, services, and ingress.

When you deploy a Tanzu Kubernetes cluster using Tanzu Mission Control or Tanzu CLI, Antrea CNI
is automatically enabled in the cluster.

Tanzu Kubernetes Grid also supports Multus CNI which can be installed through Tanzu user-
managed packages. Multus CNI lets you attach multiple network interfaces to a single pod and
associate each with a different address range.

To provision a Tanzu Kubernetes cluster using a non-default CNI, see the following instructions:

Deploy Tanzu Kubernetes clusters with Calico

Implement Multiple Pod Network Interfaces with Multus
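
For example, selecting Calico instead of the default Antrea is done by setting the CNI variable in the cluster configuration file before creating the cluster; the cluster name below is a placeholder, and the linked instructions remain the authoritative procedure.

CLUSTER_NAME: tkg-workload-calico   # placeholder
CNI: calico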

Each CNI is suitable for a different use case. The following summarizes common use cases for the
three CNIs that Tanzu Kubernetes Grid supports and helps you select the right CNI for your Tanzu
Kubernetes Grid implementation.

Antrea

Use case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for
encapsulation. Optionally encrypt node-to-node communication using IPsec packet encryption.
Antrea supports advanced network use cases like kernel bypass and network service mesh.

Pros: Provides an option to configure an egress IP pool or static egress IP for Kubernetes workloads.

Calico

Use case: Calico is used in environments where factors like network performance, flexibility, and
power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol
instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer,
resulting in increased network performance for Kubernetes workloads.

Pros: Support for network policies; high network performance; SCTP support.

Cons: No multicast support.

Multus

Use case: Multus CNI provides multiple interfaces per Kubernetes pod. Using Multus CRDs, you can
specify which pods get which interfaces and allow different interfaces depending on the use case.

Pros: Separation of data and control planes; separate security policies can be used for separate
interfaces; supports SR-IOV, DPDK, OVS-DPDK, and VPP workloads in Kubernetes with both cloud
native and NFV based applications in Kubernetes.

Tanzu Kubernetes Grid Infrastructure Networking


Tanzu Kubernetes Grid on vSphere can be deployed on various networking stacks including:

VMware NSX-T Data Center Networking

vSphere Networking (VDS)

Note

The scope of this document is limited to vSphere Networking.

Tanzu Kubernetes Grid on vSphere Networking with NSX


Advanced Load Balancer
Tanzu Kubernetes Grid when deployed on the vSphere networking uses the distributed port groups
to provide connectivity to Kubernetes control plane VMs, worker nodes, services, and applications.
All hosts from the cluster where Tanzu Kubernetes clusters are deployed are connected to the
distributed switch that provides connectivity to the Kubernetes environment.

You can configure NSX ALB in Tanzu Kubernetes Grid as:

L4 load balancer for an application hosted on the TKG cluster.

The L7 ingress service provider for the application hosted on the TKG cluster.

L4 load balancer for the Kubernetes cluster control plane API server.

Each workload cluster integrates with NSX ALB by running an NSX ALB Kubernetes Operator (AKO)
on one of its nodes. The cluster’s AKO calls the Kubernetes API to manage the lifecycle of load
balancing and ingress resources for its workloads.

NSX Advanced Load Balancer Licensing


NSX ALB requires a license to enable and utilize the available load balancing features. The following
license editions are supported for VMware Tanzu for Kubernetes Operations:

VMware NSX Advanced Load Balancer Enterprise Edition

VMware NSX Advanced Load Balancer essentials for Tanzu

The Enterprise Edition is the default licensing tier for an Avi Controller. A new Avi Controller is set up
in the Enterprise Edition licensing tier, and the Controller can be switched from one edition to
another. For more information about NSX ALB Feature comparison, see NSX Advanced Load
Balancer Editions.

VMware NSX ALB Enterprise Edition


The VMware NSX ALB Enterprise Edition is a full-featured Avi Vantage license that includes load
balancing, GSLB, WAF, and so on.

For more information about VMware NSX ALB Enterprise edition, see VMware NSX ALB Enterprise
Edition.

VMware NSX Advanced Load Balancer essentials for Tanzu


VMware NSX ALB essentials for Tanzu edition is supported on Avi Vantage 20.1.2 and later. NSX
ALB essentials for Tanzu has been introduced to provide basic Layer 4 load balancing services for
VMware Tanzu Basic and Standard editions.

For more information about VMware NSX ALB essentials for Tanzu edition, see VMware NSX ALB
essentials for Tanzu.

NSX Advanced Load Balancer Components


NSX ALB is deployed in Write Access Mode in the vSphere environment. This mode grants NSX
ALB Controller full write access to the vCenter which helps in automatically creating, modifying, and
removing service engines (SEs) and other resources as needed to adapt to changing traffic needs.
The core components of NSX ALB are as follows:

NSX Advanced Load Balancer Controller - NSX ALB Controller manages Virtual Service
objects and interacts with the vCenter Server infrastructure to manage the lifecycle of the
service engines (SEs). It is the central repository for the configurations and policies related to
services and management, and it provides the portal for viewing the health of VirtualServices
and SEs and the associated analytics that NSX ALB provides.

NSX Advanced Load Balancer Service Engine - The service engines (SEs) are lightweight
VMs that handle all data plane operations by receiving and executing instructions from the
controller. The SEs perform load balancing and all client- and server-facing network
interactions.

Cloud - Clouds are containers for the environment that NSX ALB is installed or operating
within. During the initial setup of NSX Advanced Load Balancer, a default cloud named Default-
Cloud is created, and the first controller deployment is associated with Default-Cloud. Additional
clouds may be added, containing SEs and virtual services.

NSX ALB Kubernetes Operator (AKO) - It is a Kubernetes operator that runs as a pod in the
Supervisor Cluster and Tanzu Kubernetes clusters, and it provides ingress and load balancing
functionality. AKO translates the required Kubernetes objects to NSX ALB objects and
automates the implementation of ingresses, routes, and services on the service engines (SE)
through the NSX ALB Controller.

AKO Operator (AKOO) - This is an operator which is used to deploy, manage, and remove
the AKO pod in Kubernetes clusters. This operator when deployed creates an instance of the
AKO controller and installs all the relevant objects like:
AKO StatefulSet

ClusterRole and ClusterRoleBinding

ConfigMap required for the AKO controller and other artifacts.

Tanzu Kubernetes Grid management clusters have an AKO operator installed out-of-the-box during
cluster deployment. By default, a Tanzu Kubernetes Grid management cluster has a couple of
AkoDeploymentConfig created which dictates when and how AKO pods are created in the workload
clusters. For more information, see AKO Operator documentation.

Optionally, you can enter one or more cluster labels to identify clusters on which to selectively
enable NSX ALB or to customize NSX ALB settings for different groups of clusters. This is useful in
the following scenarios:

- You want to configure different sets of workload clusters to different Service Engine Groups to
implement isolation or to support more Service type Load Balancers than one Service Engine
Group's capacity.
- You want to configure different sets of workload clusters to different Clouds because they are
deployed in different sites.

To enable NSX ALB selectively rather than globally, add labels in the format of key: value pairs in the
management cluster configuration file. This creates a default AKO Deployment Config (ADC) on the
management cluster with the NSX ALB settings provided. The labels that you define here are used to
create a label selector, and only workload cluster objects that have the matching labels have the load
balancer enabled.

To customize the NSX ALB settings for different groups of clusters, create an AKO Deployment
Config (ADC) on the management cluster by customizing the NSX ALB settings and providing a
unique label selector for the ADC. Only the workload cluster objects that have the matching labels
have these custom settings applied.

You can label the cluster during workload cluster deployment or label it manually after cluster
creation. If you define multiple key-value pairs, you need to apply all of them.

- Provide AVI_LABELS in the following format in the workload cluster deployment configuration file;
the cluster is labeled automatically and the matching ADC is selected based on the label selector
during cluster deployment:

  AVI_LABELS: |
    'type': 'tkg-workloadset01'

- Optionally, you can manually label the cluster object of the corresponding workload cluster with
the labels defined in the ADC:

  kubectl label cluster <cluster-name> type=tkg-workloadset01

Each environment configured in NSX ALB is referred to as a cloud. Each cloud in NSX ALB
maintains networking and NSX ALB Service Engine settings. The cloud is configured with one or
more VIP networks to provide IP addresses to load balancing (L4 or L7) virtual services created
under that cloud.

The virtual services can span across multiple service engines if the associated Service Engine Group
is configured in the Active/Active HA mode. A service engine can belong to only one Service
Engine group at a time.

IP address allocation for virtual services can be over DHCP or through the in-built IPAM functionality
of NSX Advanced Load Balancer. The VIP networks created or configured in NSX Advanced Load
Balancer are associated with the IPAM profile.

Tanzu Kubernetes Grid Clusters Recommendations


| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKG-001 | Register the management cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally. | Only Antrea CNI is supported on workload clusters created from the TMC portal. |
| TKO-TKG-002 | Use NSX ALB as your control plane endpoint provider and for application load balancing. | NSX ALB is tightly coupled with TKG and vSphere. Since NSX ALB is a VMware product, customers have a single point of contact for support. | Adds NSX ALB license cost to the solution. |
| TKO-TKG-003 | Deploy Tanzu Kubernetes management clusters in the large form factor. | The large form factor should suffice to integrate the TKG management cluster with TMC, Pinniped, and Velero, and must be capable of accommodating 100+ Tanzu workload clusters. | Consumes more resources from the infrastructure. |
| TKO-TKG-004 | Deploy Tanzu Kubernetes clusters with the prod plan (management cluster and workload clusters). | This deploys multiple control plane nodes and provides high availability for the control plane. | Consumes more resources from the infrastructure. |
| TKO-TKG-005 | Enable identity management for TKG clusters. | Provides role-based access control to Tanzu Kubernetes Grid clusters. | Requires external identity management. |
| TKO-TKG-006 | Enable Machine Health Checks for TKG clusters. | The MachineHealthCheck controller helps to provide health monitoring and auto-repair for management and workload cluster machines. | NA |

Generic Network Architecture


For the deployment of Tanzu Kubernetes Grid in the vSphere environment, it is required to build
separate networks for the Tanzu Kubernetes Grid management cluster and workload clusters, NSX
ALB management, cluster-VIP network for control plane HA, Tanzu Kubernetes Grid management
VIP or data network, and Tanzu Kubernetes Grid workload data or VIP network.

The network reference design can be mapped into this general framework:

This topology enables the following benefits:

- Isolate and separate SDDC management components (vCenter, ESXi) from the Tanzu Kubernetes
Grid components. This reference design allows only the minimum connectivity between the Tanzu
Kubernetes Grid clusters and NSX ALB to the vCenter Server.

- Isolate and separate the NSX ALB management network from the Tanzu Kubernetes Grid
management segment and the Tanzu Kubernetes Grid workload segments.

- Depending on the workload cluster type and use case, multiple workload clusters may leverage the
same workload network, or new networks can be used for each workload cluster. To isolate and
separate Tanzu Kubernetes Grid workload cluster networking from each other, it is recommended to
use separate networks for each workload cluster and configure the required firewall between these
networks. For more information, see Firewall Requirements.

- Separate provider and tenant access to the Tanzu Kubernetes Grid environment:
  - Only provider administrators need access to the Tanzu Kubernetes Grid management cluster. This
prevents tenants from attempting to connect to the Tanzu Kubernetes Grid management cluster.
  - Only allow tenants to access their Tanzu Kubernetes Grid workload clusters and restrict access to
these clusters from other tenants.

Network Requirements
As per the defined architecture, the list of required networks follows:

| Network Type | DHCP Service | Description & Recommendations |
| --- | --- | --- |
| NSX ALB Management Network | Optional | NSX ALB controllers and SEs are attached to this network. DHCP is not a mandatory requirement on this network as NSX ALB can take care of IPAM. |
| TKG Management Network | Yes | Control plane and worker nodes of the TKG management cluster and shared services clusters are attached to this network. Creating the shared services cluster on a separate network is also supported. |
| TKG Workload Network | Yes | Control plane and worker nodes of TKG workload clusters are attached to this network. |
| TKG Cluster VIP/Data Network | No | Virtual services for control plane HA of all TKG clusters (management, shared services, and workload). Reserve sufficient IP addresses depending on the number of TKG clusters planned to be deployed in the environment. NSX ALB takes care of IPAM on this network. |
| TKG Management VIP/Data Network | No | Virtual services for all user-managed packages (such as Contour, Harbor, Prometheus, and Grafana) hosted on the shared services cluster. For more information, see User-Managed Packages. |
| TKG Workload VIP/Data Network | No | Virtual services for all applications hosted on the workload clusters. Reserve sufficient IP addresses depending on the number of applications planned to be hosted on the workload clusters and scalability considerations. |

Network Recommendations
The key network recommendations for a production-grade Tanzu Kubernetes Grid deployment with
vSphere Networking (VDS) are as follows:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-NET-001 | Use dedicated networks for the management cluster nodes and workload cluster nodes. | To have flexible firewall and security policies. | Additional VLAN required (OPEX overhead). |
| TKO-NET-002 | Use a dedicated VIP network for the applications hosted in the management and workload clusters. | To have flexible firewall and security policies. | Additional VLAN required (OPEX overhead). |
| TKO-NET-003 | The shared services cluster uses the management network and the application VIP network of the management cluster. | Hosts shared services like Harbor. | VLAN-based firewall policies are not possible. |

Node IPAM can be configured for standalone management clusters on vSphere, and for the
associated class-based workload clusters that they manage. In the Tanzu Kubernetes Grid
management cluster configuration file, a dedicated Node IPAM pool is defined for the management
cluster only. The following types of Node IPAM pools are available for workload clusters:

- InClusterIPPool - Configures IP pools that are only available to workload clusters in the same
management cluster namespace, for example, default.
- GlobalInClusterIPPool - Configures IP pools with addresses that can be allocated to workload
clusters across multiple namespaces.

Node IPAM in TKG provides flexibility in managing IP addresses for both management and workload
clusters, allowing efficient IP allocation and management within the cluster environment.
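
The following is a sketch of what an InClusterIPPool definition and its reference from a workload cluster configuration might look like, reusing the TKG workload segment addressing from the Subnet and CIDR Examples that follow. The API group/version and the NODE_IPAM_IP_POOL_NAME variable name are assumptions and should be verified against the TKG 2.3 documentation for your release.

apiVersion: ipam.cluster.x-k8s.io/v1alpha2   # assumed API version; confirm for your release
kind: InClusterIPPool
metadata:
  name: workload-node-pool
  namespace: default                          # pool is visible to clusters in this namespace
spec:
  gateway: 172.16.60.1
  addresses:
  - 172.16.60.100-172.16.60.180               # example node address range
  prefix: 24

In the workload cluster configuration file, the pool would then be referenced with NODE_IPAM_IP_POOL_NAME: workload-node-pool (assumed variable name).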

Subnet and CIDR Examples


This document uses the following CIDRs for Tanzu Kubernetes Grid deployment:

| Network Type | Port Group Name | Gateway CIDR | DHCP Pool | NSX ALB IP Pool |
| --- | --- | --- | --- | --- |
| NSX ALB Mgmt Network | sfo01-w01-vds01-albmanagement | 172.16.10.1/24 | N/A | 172.16.10.100 - 172.16.10.200 |
| TKG Management Network | sfo01-w01-vds01-tkgmanagement | 172.16.40.1/24 | 172.16.40.100 - 172.16.40.200 | N/A |
| TKG Mgmt VIP Network | sfo01-w01-vds01-tkgmanagementvip | 172.16.50.1/24 | N/A | 172.16.50.100 - 172.16.50.200 |
| TKG Cluster VIP Network | sfo01-w01-vds01-tkgclustervip | 172.16.80.1/24 | N/A | 172.16.80.100 - 172.16.80.200 |
| TKG Workload VIP Network | sfo01-w01-vds01-tkgworkloadvip | 172.16.70.1/24 | N/A | 172.16.70.100 - 172.16.70.200 |
| TKG Workload Segment | sfo01-w01-vds01-tkgworkload | 172.16.60.1/24 | 172.16.60.100 - 172.16.60.200 | N/A |

3-Network Architecture
For proof-of-concept environments and minimal network requirements, you can proceed with a
3-network architecture. In this design, Tanzu Kubernetes Grid is deployed into three networks: an
Infrastructure Management Network, a TKG Management Network, and a TKG Workload Network.
This design uses only three networks and ensures isolation between the infrastructure VMs, the TKG
management components, and the TKG workload components.

This network reference design can be mapped into this general framework:

This topology enables the following benefits:

- Deploy the NSX ALB components on the existing infrastructure management network, which
reduces additional network usage.
- Isolate and separate the NSX ALB and SDDC management components (vCenter and ESXi) from
the VMware Tanzu Kubernetes Grid components.
- Club the TKG Mgmt Cluster VIP, TKG Mgmt Data VIP, and TKG Mgmt into a single network,
TKG-Mgmt-Network, which ensures that the TKG management components are deployed in a
common network and removes additional network overhead and firewall rules.
- Club the TKG Workload Cluster VIP, TKG Workload Data VIP, and TKG Workload into a single
network, TKG-Workload-Network, which ensures that the TKG workload components are deployed
in a common network.
- Separate the management control plane/data VIP and the workload control plane/data VIP into
different networks to enhance isolation and security.

Network Requirements

| Network Type | DHCP Service | Description |
| --- | --- | --- |
| Infrastructure Management Network | Optional | NSX ALB controllers and Service Engines (SEs) are attached to this network. DHCP is not a mandatory requirement on this network as NSX ALB manages the SE networking with IPAM. This network also hosts core infrastructure components such as vCenter, ESXi hosts, DNS, NTP, and so on. |
| TKG Management Network | Yes | Control plane and worker nodes of the TKG management cluster and the shared services clusters are attached to this network. IP assignment is managed through DHCP. TKG management cluster VIP and TKG management data VIP assignment is also managed from the same network using the NSX ALB static IP pool. Ensure that the DHCP range does not interfere with the NSX ALB IP block reservation. |
| TKG Workload Network | Yes | Control plane and worker nodes of the TKG workload cluster and the shared services clusters are attached to this network. IP assignment is managed through DHCP. TKG workload cluster VIP and TKG workload data VIP assignment is also managed from the same network using the NSX ALB static IP pool. Ensure that the DHCP range does not interfere with the NSX ALB IP block reservation. |

Firewall Requirements
To prepare the firewall, you need to gather the following information:

NSX ALB management network CIDR

TKG Management cluster network CIDR

TKG Cluster VIP network CIDR

TKG Management VIP network CIDR

TKG Workload cluster CIDR

VMware Harbor registry IP address

vCenter server IP address

DNS server IP addresses

NTP servers

The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN:

| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| TKG management and workload networks | DNS Server | UDP:53 | DNS service |
| TKG management and workload networks | NTP Server | UDP:123 | Time synchronization |
| TKG management and workload networks | DHCP Server | UDP:67, 68 | Allows hosts to get DHCP addresses. |
| TKG management and workload networks | vCenter IP | TCP:443 | Allows components to access vCenter to create VMs and storage volumes. |
| TKG management, shared services, and workload cluster CIDR | Harbor Registry | TCP:443 | Allows components to retrieve container images. This registry can be a local or a public image registry (projects.registry.vmware.com). |
| TKG management cluster network | TKG cluster VIP network | TCP:6443 | For the management cluster to configure shared services and workload clusters. |
| TKG shared services cluster network | TKG cluster VIP network | TCP:6443 | Allows the shared services cluster to register with the management cluster. (Required only if using a separate network for the shared services cluster.) |
| TKG workload cluster network | TKG cluster VIP network | TCP:6443 | Allows workload clusters to register with the management cluster. Note: In a 3-network design, the destination network is the TKG Management Network. |
| TKG management, shared services, and workload networks | NSX ALB controllers (NSX ALB management network) | TCP:443 | Allows NSX ALB Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller. |
| NSX ALB management network | vCenter and ESXi hosts | TCP:443 | Allows NSX ALB to discover vCenter objects and deploy SEs as required. |
| NSX ALB controller nodes | DNS Server, NTP Server | TCP/UDP:53, UDP:123 | DNS service and time synchronization |
| Admin network | Bootstrap VM | SSH:22 | To deploy, manage, and configure TKG clusters. |
| deny-all | any | any | deny |

NSX Advanced Load Balancer Recommendations


The following table provides the recommendations for configuring NSX Advanced Load Balancer in
a Tanzu Kubernetes Grid environment:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-ALB-001 | Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB. | Isolates NSX ALB traffic from infrastructure management traffic and Kubernetes workloads. | Additional network (VLAN) is required. |
| TKO-ALB-002 | Deploy 3 NSX ALB controller nodes. | To achieve high availability for the NSX ALB platform. In clustered mode, NSX ALB availability is not impacted by an individual controller node failure. | None |
| TKO-ALB-003 | Under compute policies, create a 'VM-VM anti-affinity' rule that prevents collocation of the NSX ALB controller VMs on the same host. | vSphere places NSX ALB controller VMs in a way that always ensures maximum HA. | Affinity rules need to be configured manually. |
| TKO-ALB-004 | Use static IP addresses for the NSX ALB controllers. | The NSX ALB controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes to management IP addresses are disruptive. | None |
| TKO-ALB-005 | Use NSX ALB IPAM for the service engine data network and virtual services. | Simplifies the IP address management for virtual services and service engines from NSX ALB. | None |
| TKO-ALB-006 | Reserve an IP address in the NSX ALB management subnet to be used as the cluster IP address for the controller cluster. | The NSX ALB portal is always accessible over the cluster IP address regardless of a specific individual controller node failure. | Additional IP is required. |
| TKO-ALB-007 | Create a dedicated resource pool with appropriate reservations for NSX ALB controllers. | Guarantees the CPU and memory allocation for NSX ALB controllers and avoids performance degradation in case of resource contention. | None |
| TKO-ALB-008 | Replace the default NSX ALB certificates with custom CA or public CA-signed certificates that contain SAN entries of all controller nodes. | To establish a trusted connection with other infra components; the default certificate does not include SAN entries, which is not acceptable by Tanzu. | None; SAN entries are not applicable if a wildcard certificate is used. |
| TKO-ALB-009 | Configure NSX ALB backup with a remote server as the backup location. | Periodic backup of the NSX ALB configuration database is recommended. The database defines all clouds, all virtual services, all users, and others. As a best practice, store backups in an external location to provide backup capabilities in case of an entire cluster failure. | Additional operational overhead. Additional infrastructure resource. |
| TKO-ALB-010 | Configure remote logging for the NSX ALB controller to send events to syslog. | For operations teams to be able to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB controller. | Additional operational overhead. Additional infrastructure resource. |
| TKO-ALB-011 | Use LDAP/SAML-based authentication for NSX ALB. | Helps to maintain role-based access control. | Additional configuration is required. |

NSX Advanced Load Balancer Service Engine Recommendations

Decision ID | Design Decision | Design Justification | Design Implications

TKO-ALB-SE-001 | Set NSX ALB Service Engine high availability mode to Active/Active. | Provides higher resiliency, optimum performance, and utilization compared to N+M and/or Active/Standby. | Requires NSX ALB Enterprise licensing. Only the Active/Standby mode is supported with the NSX ALB Essentials for Tanzu license. Certain applications might not work in Active/Active mode; for example, applications that preserve the client IP use the legacy Active/Standby HA mode.

TKO-ALB-SE-002 | Use a dedicated Service Engine Group for TKG management. | SE resources are guaranteed for the TKG management stack, and it provides data path segregation for management and tenant applications. | Dedicated Service Engine Groups increase licensing cost.

TKO-ALB-SE-003 | Use dedicated Service Engine Groups for the TKG workload clusters, depending on the nature and type of workloads (dev/prod/test). | SE resources are guaranteed for a single workload cluster or a set of workload clusters, and it provides data path segregation for tenant applications hosted on workload clusters. | Dedicated Service Engine Groups increase licensing cost.

TKO-ALB-SE-004 | Enable ALB Service Engine self elections. | Enables SEs to elect a primary amongst themselves in the absence of connectivity to the NSX ALB controller. | Requires NSX ALB Enterprise licensing. This feature is not supported with the NSX ALB Essentials for Tanzu license.

TKO-ALB-SE-005 | Enable 'Dedicated dispatcher CPU' on Service Engine Groups that contain Service Engine VMs of 4 or more vCPUs. Note: This setting should be enabled on SE Groups that service applications with high network requirements. | Enables a dedicated core for packet processing, providing a high packet pipeline on the Service Engine VMs. Note: By default, the packet processing core also processes load-balancing flows. | Consumes more resources from infrastructure.

TKO-ALB-SE-006 | Set the 'Placement across the Service Engines' setting to 'Compact'. | Allows maximum utilization of Service Engine capacity. | None

TKO-ALB-SE-007 | Set the SE size to a minimum of 2 vCPUs and 4 GB of memory. | This configuration should meet the most generic use cases. | For services that require higher throughput, this configuration needs to be investigated and modified accordingly.

TKO-ALB-SE-008 | Under Compute policies, create a 'VM-VM anti-affinity' rule for SEs that are part of the same SE group, preventing collocation of the Service Engine VMs on the same host. | vSphere places the Service Engine VMs in a way that always ensures maximum HA for the Service Engines that are part of a Service Engine Group. | Affinity rules need to be configured manually.

TKO-ALB-SE-009 | Reserve memory and CPU for Service Engines. | The Service Engines are a critical infrastructure component providing load-balancing services to mission-critical applications. Reservations guarantee the CPU and memory allocation for SE VMs and avoid performance degradation in case of resource contention. | You must perform additional configuration to set up the reservations.

Kubernetes Ingress Routing


The default installation of Tanzu Kubernetes Grid does not have any ingress controller installed.
Users can use Contour (available for installation through Tanzu Packages) or a third-party ingress
controller of their choice.

Contour is an open-source controller for Kubernetes ingress routing. Contour can be installed on the shared services cluster or on any Tanzu Kubernetes cluster. Deploying Contour is a prerequisite if you
want to deploy the Prometheus, Grafana, and Harbor packages on a workload cluster.

For more information about Contour, see the Contour website and Implementing Ingress Control
with Contour.
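If you choose Contour, it is installed as a Tanzu package with the Tanzu CLI. The following is a minimal sketch only; the package version is a placeholder, and flag names can differ slightly between Tanzu CLI releases (older releases use --package-name instead of --package), so verify the syntax against the Tanzu Packages documentation for your release:

## List the Contour versions available in the package repository
tanzu package available list contour.tanzu.vmware.com -A

## Install Contour with default data values (replace <version> with a version from the list above)
tanzu package install contour \
  --package contour.tanzu.vmware.com \
  --version <version> \
  --values-file contour-data-values.yaml \
  --namespace tkg-system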

Another option is to use the NSX ALB Kubernetes ingress controller that offers an advanced L7
ingress for containerized applications that are deployed in the Tanzu Kubernetes workload cluster.

VMware, Inc 177


VMware Tanzu for Kubernetes Operations Reference Architecture 2.3

For more information about the NSX ALB ingress controller, see Configuring L7 Ingress with NSX
Advanced Load Balancer.

Tanzu Service Mesh, which is a SaaS offering for modern applications running across multi-cluster,
multi-clouds, also offers an ingress controller based on Istio.

The following table provides general recommendations on when you should use a specific ingress
controller for your Kubernetes environment.

Ingress Controller | Use Cases

Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the application's manifest file. It is a reliable solution for simple Kubernetes workloads.

Istio | Use the Istio ingress controller when you intend to provide security, traffic direction, and insights within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic).

NSX ALB ingress controller | Use the NSX ALB ingress controller when a containerized application requires features like local and global server load balancing (GSLB), web application firewall (WAF), performance monitoring, direct routing from the load balancer to the pod, and so on.

NSX ALB as an L4+L7 Ingress Service Provider


As a load balancer, NSX ALB provides an L4+L7 load balancing solution for vSphere. It includes a Kubernetes operator that integrates with the Kubernetes API to manage the lifecycle of load balancing and ingress resources for workloads.

Legacy ingress services for Kubernetes include multiple disparate solutions. The services and
products contain independent components that are difficult to manage and troubleshoot. The ingress
services have reduced observability capabilities with little analytics, and they lack comprehensive
visibility into the applications that run on the system. Cloud-native automation is difficult in the legacy
ingress services.

In comparison to the legacy Kubernetes ingress services, NSX ALB has comprehensive load
balancing and ingress services features. As a single solution with a central control, NSX ALB is easy
to manage and troubleshoot. NSX ALB supports real-time telemetry with an insight into the
applications that run on the system. The elastic auto-scaling and the decision automation features
highlight the cloud-native automation capabilities of NSX Advanced Load Balancer.

NSX ALB with Enterprise Licensing also lets you configure L7 ingress for your workload clusters by
using one of the following options:

L7 ingress in ClusterIP mode

L7 ingress in NodePortLocal mode

L7 ingress in NodePort mode

NSX ALB L4 ingress with Contour L7 ingress

L7 Ingress in ClusterIP Mode

This option enables NSX ALB L7 ingress capabilities, including sending traffic directly from the
service engines (SEs) to the pods, preventing multiple hops that other ingress solutions need when
sending packets from the load balancer to the right node where the pod runs. The ALB controller
creates a virtual service with a backend pool with the pod IP addresses which helps to send the
traffic directly to the pods.

However, each workload cluster needs a dedicated SE group for NSX ALB Kubernetes Operator
(AKO) to work, which could increase the number of SEs you need for your environment. This mode
is used when you have a small number of workload clusters.

L7 Ingress in NodePort Mode

The NodePort mode is the default mode when AKO is installed on Tanzu Kubernetes Grid. This
option allows your workload clusters to share SE groups and is fully supported by VMware. With this
option, the services of your workloads must be set to NodePort instead of ClusterIP even when
accompanied by an ingress object. This ensures that NodePorts are created on the worker nodes
and traffic can flow through the SEs to the pods via the NodePorts. Kube-proxy, which runs on each node as a DaemonSet, creates network rules to expose the application endpoints on each of the nodes in the format "NodeIP:NodePort". The NodePort value is the same for a service on all the nodes. It exposes the port on all the nodes of the Kubernetes cluster, even if the pods are not running on them.
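For illustration, a minimal Service manifest of type NodePort is shown below; the name, ports, and selector are placeholders. With AKO in NodePort mode, the NSX ALB SEs send traffic to <NodeIP>:<nodePort> on the worker nodes:

apiVersion: v1
kind: Service
metadata:
  name: web-app          # placeholder name
spec:
  type: NodePort
  selector:
    app: web-app         # must match the pod labels of your workload
  ports:
    - port: 80           # port exposed by the Service inside the cluster
      targetPort: 8080   # container port
      nodePort: 30080    # optional; Kubernetes picks one from 30000-32767 if omitted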

L7 Ingress in NodePortLocal Mode

This feature is supported only with the Antrea CNI. You must enable this feature on a workload cluster before its creation. The primary difference between this mode and the NodePort mode is that the traffic is sent directly to the pods in your workload cluster through node ports without involving kube-proxy. With this option, the workload clusters can share SE groups. Similar to the ClusterIP mode, this option avoids the potential extra hop when sending traffic from the NSX ALB SEs to the pod by targeting the right nodes where the pods run.

The Antrea agent configures NodePortLocal port mapping rules on the node in the format "NodeIP:Unique Port" to expose each pod on the node on which the pod of the service is running. The default port range is 61000-62000. Even if the pods of the service are running on the same Kubernetes node, the Antrea agent publishes unique ports to expose the pods at the node level to integrate with the load balancer.
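As an illustration only, enabling this mode typically involves two pieces of configuration: setting the Antrea NodePortLocal feature in the workload cluster configuration file, and setting the ingress service type in the AKODeploymentConfig (ADC) that the cluster uses. The variable and field names below are assumptions based on the TKG 2.3 documentation; verify them against Configuring L7 Ingress with NSX Advanced Load Balancer before use:

## Workload cluster configuration file (excerpt)
ANTREA_NODEPORTLOCAL: "true"

## AKODeploymentConfig excerpt, applied on the management cluster
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: install-ako-for-l7   # placeholder name
spec:
  extraConfigs:
    cniPlugin: antrea
    ingress:
      disableIngressClass: false
      serviceType: NodePortLocal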

NSX ALB L4 Ingress with Contour L7 Ingress

This option does not have all the NSX ALB L7 ingress capabilities but uses it for L4 load balancing
only and leverages Contour for L7 Ingress. This also allows sharing SE groups across workload
clusters. This option is supported by VMware and it requires minimal setup.

NSX Advanced Load Balancer L7 Ingress Recommendations


Decision ID | Design Decision | Design Justification | Design Implications

TKO-ALB-L7-001 | Deploy NSX ALB L7 ingress in NodePortLocal mode. | Provides better network hop efficiency. Helps to reduce east-west traffic and encapsulation overhead. Service Engine Groups are shared across clusters, and load-balancing persistence is supported. | Supported only with the Antrea CNI and IPv4 addressing. To configure L7 ingress, you need NSX ALB Enterprise licensing.

Container Registry
VMware Tanzu for Kubernetes Operations using Tanzu Kubernetes Grid includes Harbor as a
container registry. Harbor provides a location for pushing, pulling, storing, and scanning container
images used in your Kubernetes clusters.

Harbor registry is used for day-2 operations of the Tanzu Kubernetes workload clusters. Typical day-
2 operations include tasks such as pulling images from Harbor for application deployment, pushing
custom images to Harbor, and so on.

You may use one of the following methods to install Harbor:

Tanzu Kubernetes Grid package deployment - VMware recommends this installation method for general use cases. The Tanzu packages, including Harbor, must either be pulled directly from VMware or be hosted in an internal registry.

VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.


If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To do so, follow the procedure in Trust Custom CA
Certificates on Cluster Nodes.
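For the package-based installation, Harbor is deployed with the Tanzu CLI after its prerequisites (cert-manager and Contour) are in place. The commands below are a sketch only, assuming the default tkg-system namespace, with the version as a placeholder; check the available versions and the values schema for your release before running them:

## Discover available Harbor package versions
tanzu package available list harbor.tanzu.vmware.com -A

## Install Harbor (replace <version>; harbor-data-values.yaml holds passwords, hostname, and certificate settings)
tanzu package install harbor \
  --package harbor.tanzu.vmware.com \
  --version <version> \
  --values-file harbor-data-values.yaml \
  --namespace tkg-system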

Tanzu Kubernetes Grid Monitoring


Tanzu Kubernetes Grid provides cluster monitoring services by implementing the open source
Prometheus and Grafana projects.

Tanzu Kubernetes Grid includes signed binaries for Prometheus and Grafana that you can deploy on
Tanzu Kubernetes clusters to monitor cluster health and services.

Prometheus is an open source systems monitoring and alerting toolkit. It can collect metrics
from target clusters at specified intervals, evaluate rule expressions, display the results, and
trigger alerts if certain conditions arise. The Tanzu Kubernetes Grid implementation of
Prometheus includes Alert Manager, which you can configure to notify you when certain
events occur.

Grafana is an open source visualization and analytics software. It allows you to query,
visualize, alert on, and explore your metrics no matter where they are stored. Grafana is used
for visualizing Prometheus metrics without the need to manually write the PromQL queries.
You can create custom charts and graphs in addition to the pre-packaged options.

You deploy Prometheus and Grafana on Tanzu Kubernetes clusters. The following diagram shows
how the monitoring components on a cluster interact.


You can use out-of-the-box Kubernetes dashboards or you can create new dashboards to monitor
compute, network, and storage utilization of Kubernetes objects such as Clusters, Namespaces,
Pods, and so on.

You can also monitor your Tanzu Kubernetes Grid clusters with Tanzu Observability which is a SaaS
offering by VMware. Tanzu Observability provides various out-of-the-box dashboards. You can
customize the dashboards for your particular deployment. For information on how to customize
Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu
Observability Dashboard for Tanzu for Kubernetes Operations.

Tanzu Kubernetes Grid Logging


Metrics and logs are critical for any system or application as they provide insights into the activities of
the system or the application. It is important to have a central place to observe a multitude of metrics
and log sources from multiple endpoints.

Log processing and forwarding in Tanzu Kubernetes Grid is provided via Fluent Bit. Fluent Bit binaries are available as part of extensions and can be installed on the management cluster or on workload clusters. Fluent Bit is a light-weight log processor and forwarder that allows you to collect data and logs from different sources, unify them, and send them to multiple destinations. VMware Tanzu Kubernetes Grid includes signed binaries for Fluent Bit that you can deploy on management clusters and on Tanzu Kubernetes clusters to provide a log-forwarding service.

Fluent Bit makes use of input plug-ins, filters, and output plug-ins. The input plug-ins define the sources from which it collects data, and the output plug-ins define the destinations to which it sends the information. The Kubernetes filter enriches the logs with Kubernetes metadata, specifically labels and annotations. You configure the input and output plug-ins through the package data values when you install Fluent Bit, which is deployed on the Tanzu Kubernetes Grid cluster as a user-managed package.

Fluent Bit integrates with logging platforms such as VMware Aria Operations for Logs, Elasticsearch,
Kafka, Splunk, or an HTTP endpoint. For more details about configuring Fluent Bit to your logging
provider, see Implement Log Forwarding with Fluent Bit.
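As an illustration, a fragment of a Fluent Bit data-values file that forwards logs to a syslog collector might look like the following. The field layout is an assumption based on the package's values schema; confirm it with "tanzu package available get fluent-bit.tanzu.vmware.com/<version> --values-schema" before use:

fluent_bit:
  config:
    outputs: |
      [OUTPUT]
        Name                 syslog
        Match                *
        Host                 10.10.10.100        # placeholder syslog / log collector address
        Port                 514
        Mode                 udp
        Syslog_Format        rfc5424
        Syslog_Message_key   message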

Bring Your Own Images for Tanzu Kubernetes Grid Deployment
You can build custom machine images for Tanzu Kubernetes Grid to use as a VM template for the
management and Tanzu Kubernetes (workload) cluster nodes that it creates. Each custom machine
image packages a base operating system (OS) version and a Kubernetes version, along with any
additional customizations, into an image that runs on vSphere, Microsoft Azure infrastructure, and
AWS (EC2) environments.

A custom image must be based on the operating system (OS) versions that are supported by Tanzu
Kubernetes Grid. The table below provides a list of the operating systems that are supported for
building custom images for Tanzu Kubernetes Grid.

vSphere: Ubuntu 20.04, Ubuntu 18.04, RHEL 8, Photon OS 3, Windows 2019

AWS: Ubuntu 20.04, Ubuntu 18.04, Amazon Linux 2

Azure: Ubuntu 20.04, Ubuntu 18.04

For additional information on building custom images for Tanzu Kubernetes Grid, see the Build
Machine Images.

Linux Custom Machine Images

Windows Custom Machine Images

Compliance and Security


VMware-published Tanzu Kubernetes releases (TKrs), along with compatible versions of Kubernetes and supporting components, use the latest stable and generally available update of the OS version that they package, containing all current CVE and USN fixes as of the day that the image is built. The image files are signed by VMware and have file names that contain a unique hash identifier.

VMware provides FIPS-capable Kubernetes OVAs that can be used to deploy FIPS-compliant Tanzu Kubernetes Grid management and workload clusters. Tanzu Kubernetes Grid core components, such as kubelet, kube-apiserver, kube-controller-manager, kube-proxy, kube-scheduler, kubectl, etcd, CoreDNS, containerd, and cri-tools, are made FIPS compliant by compiling them with the BoringCrypto FIPS modules, an open-source cryptographic library that provides FIPS 140-2 approved algorithms.

Installation Experience
Tanzu Kubernetes Grid management cluster is the first component that you deploy to get started
with Tanzu Kubernetes Grid.

You can deploy the management cluster in two ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method if you are
installing a Tanzu Kubernetes Grid management cluster for the first time.

Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.

The Tanzu Kubernetes Grid Installation user interface shows that, in the current version, it is possible
to install Tanzu Kubernetes Grid on vSphere (including VMware Cloud on AWS), AWS EC2, and
Microsoft Azure. The UI provides a guided experience tailored to the IaaS, in this case, VMware
vSphere.

The installation of Tanzu Kubernetes Grid on vSphere is done through the same installer UI but
tailored to a vSphere environment.


This installation process will take you through the setup of a management cluster on your vSphere
environment. Once the management cluster is deployed, you can make use of Tanzu Mission
Control or Tanzu CLI to deploy Tanzu Kubernetes shared service and workload clusters.

Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations on vSphere with VMware VDS.

Summary
Tanzu Kubernetes Grid on vSphere on hyper-converged hardware offers high-performance
potential, convenience, and addresses the challenges of creating, testing, and updating on-premises
Kubernetes platforms in a consolidated production environment. This validated approach will result in
a near-production quality installation with all the application services needed to serve combined or
uniquely separated workload types through a combined infrastructure solution.

This plan meets many Day 0 needs for quickly aligning product capabilities to full stack infrastructure,
including networking, firewalling, load balancing, workload compute alignment, and other
capabilities.

Appendix A - Configure Node Sizes


The Tanzu CLI creates the individual nodes of management clusters and Tanzu Kubernetes clusters
according to the settings that you provide in the configuration file.

On vSphere, you can configure all node VMs to have the same predefined configurations, set
different predefined configurations for control plane and worker nodes, or customize the
configurations of the nodes. By using these settings, you can create clusters that have nodes with
different configurations to the management cluster nodes. You can also create clusters in which the
control plane nodes and worker nodes have different configurations.


Use Predefined Node Configurations


The Tanzu CLI provides the following predefined configurations for cluster nodes:

Size CPU Memory (in GB) Disk (in GB)

Small 2 4 20

Medium 2 8 40

Large 4 16 40

Extra-large 8 32 80

To create a cluster in which all of the control plane and worker node VMs are the same size, specify
the SIZE variable. If you set the SIZE variable, all nodes will be created with the configuration that
you set.

SIZE: "large"

To create a cluster in which the control plane and worker node VMs are different sizes, specify the
CONTROLPLANE_SIZE and WORKER_SIZE options.

CONTROLPLANE_SIZE: "medium"

WORKER_SIZE: "large"

You can combine the CONTROLPLANE_SIZE and WORKER_SIZE options with the SIZE option. For
example, if you specify SIZE: "large" with WORKER_SIZE: "extra-large", the control plane nodes
will be set to large and worker nodes will be set to extra-large.

SIZE: "large"

WORKER_SIZE: "extra-large"

Define Custom Node Configurations


You can customize the configuration of the nodes rather than using the predefined configurations.

To use the same custom configuration for all nodes, specify the VSPHERE_NUM_CPUS,
VSPHERE_DISK_GIB, and VSPHERE_MEM_MIB options.

VSPHERE_NUM_CPUS: 2

VSPHERE_DISK_GIB: 40

VSPHERE_MEM_MIB: 4096

To define different custom configurations for control plane nodes and worker nodes, specify the VSPHERE_CONTROL_PLANE_* and VSPHERE_WORKER_* options.

VSPHERE_CONTROL_PLANE_NUM_CPUS: 2

VSPHERE_CONTROL_PLANE_DISK_GIB: 20

VSPHERE_CONTROL_PLANE_MEM_MIB: 8192

VSPHERE_WORKER_NUM_CPUS: 4

VSPHERE_WORKER_DISK_GIB: 40


VSPHERE_WORKER_MEM_MIB: 4096
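These sizing variables are set in the cluster configuration file that you pass to the Tanzu CLI when creating a cluster. A minimal usage sketch follows; the cluster name and file name are placeholders:

## Create a workload cluster using a configuration file that contains the sizing variables above
tanzu cluster create sfo01w01workload01 --file workload-cluster-config.yaml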

Appendix B - NSX Advanced Load Balancer Sizing Guidelines


NSX Advanced Load Balancer Controller Sizing Guidelines
Controllers are classified into the following categories:

Classification vCPUs Memory (GB) Virtual Services NSX ALB SE Scale

Essentials 4 12 0-50 0-10

Small 8 24 0-200 0-100

Medium 16 32 200-1000 100-200

Large 24 48 1000-5000 200-400

The number of virtual services that can be deployed per controller cluster is directly proportional to
the controller cluster size. See the NSX ALB Configuration Maximums Guide for more information.

Service Engine Sizing Guidelines


The service engines can be configured with a minimum of 1 vCPU core and 2 GB RAM, up to a maximum of 64 vCPU cores and 256 GB RAM. The following table provides guidance for sizing a service engine VM with regard to performance:

Performance metric Per core performance Maximum performance on a single Service Engine VM

HTTP Throughput 5 Gbps 7 Gbps

HTTP requests per second 50k 175k

SSL Throughput 1 Gbps 7 Gbps

SSL TPS (RSA2K) 750 40K

SSL TPS (ECC) 2000 40K

Multiple performance vectors or features may have an impact on performance. For instance, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX ALB recommends
two cores.

Deploy VMware Tanzu for Kubernetes Operations on vSphere

This document provides step-by-step instructions for deploying and configuring VMware Tanzu for Kubernetes Operations (informally known as TKO) on a vSphere environment backed by a vSphere Distributed Switch (VDS).

The scope of the document is limited to providing the deployment steps based on the reference
design in VMware Tanzu for Kubernetes Operations on vSphere Reference Design. This document
does not cover any deployment procedures for the underlying SDDC components.


Deploying with VMware Service Installer for Tanzu


You can use VMware Service Installer for VMware Tanzu to automate this deployment.

VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.

To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on vSphere with vSphere Distributed Switch Using Service Installer for VMware Tanzu.

Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.

Supported Component Matrix


The following table provides the validated component versions for this deployment.

Software Components Version

Tanzu Kubernetes Grid 2.3.0

VMware vSphere ESXi 8.0U1 or later

VMware vCenter (VCSA) 8.0U1 or later

VMware vSAN 8.0U1 or later

NSX Advanced LB 22.1.3

For the latest information, see VMware Product Interoperability Matrix.

Prepare your Environment for Deploying Tanzu for Kubernetes Operations
Before deploying Tanzu for Kubernetes Operations on vSphere, ensure that your environment is set
up as described in the following requirements:

General Requirements

Network Requirements

Firewall Requirements

General Requirements
The general requirements for deploying Tanzu for Kubernetes Operations on vSphere in your
environment are as follows:

vSphere 8.0 U1 or later with an Enterprise Plus license.

Your SDDC environment has the following objects:

A vSphere cluster with at least 3 hosts, on which vSphere DRS is enabled.

A dedicated resource pool to deploy the Tanzu Kubernetes Grid management cluster, shared services cluster, and workload clusters. The number of resource pools depends on the number of workload clusters to be deployed.


VM folders to collect the Tanzu Kubernetes Grid VMs.

A datastore with sufficient capacity for the control plane and worker node VM files.

Network Time Protocol (NTP) service running on all hosts and vCenter.

A host, server, or VM based on Linux, MacOS, or Windows that acts as your bootstrap
machine, and that has docker installed. For this deployment, a virtual machine based on
Photon OS is used.

Depending on the OS flavor of the bootstrap VM, download and configure the kubectl
cluster CLI 1.26.5 from VMware Customer Connect. As part of this documentation, refer to
the section to configure required packages on the Photon OS machine.

Download Tanzu CLI v0.90.1 from VMware Customer Connect. Starting with TKG 2.3.0, the Tanzu Core CLI is distributed separately from Tanzu Kubernetes Grid. For instructions on how to install the Tanzu CLI for use with Tanzu Kubernetes Grid, see Install the Tanzu CLI.

A vSphere account with the permissions described in Required Permissions for the vSphere
Account.

Download and import NSX Advanced Load Balancer 22.1.3 OVA to Content Library.

Download the following OVA from VMware Customer Connect and import to vCenter.
Convert the imported VMs to templates.

Photon v3 Kubernetes v1.26.5 OVA

Ubuntu 2004 Kubernetes v1.26.5 OVA

Note

You can also download supported older versions of Kubernetes from VMware Customer Connect, and import them to deploy workload clusters on the intended Kubernetes versions.

In Tanzu Kubernetes Grid nodes, it is recommended to not use hostnames with the ".local" domain suffix. For more information, see the KB article.

Resource Pools and VM Folders

The sample entries of the resource pools and folders that need to be created are as follows.

Resource Type Sample Resource Pool Name Sample Folder Name

NSX ALB Components tkg-alb-components tkg-alb-components

TKG Management components tkg-management-components tkg-management-components

TKG Shared Service Components tkg-sharedsvc-components tkg-sharedsvc-components

TKG Workload components tkg-workload01-components tkg-workload01-components

Network Requirements


Create port groups on vSphere DVSwitch for deploying Tanzu for Kubernetes Operations
components as per Network Requirements defined in the reference architecture.

Firewall Requirements
Ensure that the firewall is set up as described in Firewall Requirements.

Subnet and CIDR Examples


For this demonstration, this document uses the following CIDR for Tanzu for Kubernetes Operations
deployment.

Network Type | Port Group Name | Gateway CIDR | DHCP Pool | NSX ALB IP Pool

NSX ALB Management Network | sfo01-w01-vds01-albmanagement | 172.16.10.1/24 | N/A | 172.16.10.100 - 172.16.10.200

TKG Management Network | sfo01-w01-vds01-tkgmanagement | 172.16.40.1/24 | 172.16.40.100 - 172.16.40.200 | N/A

TKG Management VIP Network | sfo01-w01-vds01-tkgmanagementvip | 172.16.50.1/24 | N/A | 172.16.50.100 - 172.16.50.200

TKG Cluster VIP Network | sfo01-w01-vds01-tkgclustervip | 172.16.80.1/24 | N/A | 172.16.80.100 - 172.16.80.200

TKG Workload VIP Network | sfo01-w01-vds01-tkgworkloadvip | 172.16.70.1/24 | N/A | 172.16.70.100 - 172.16.70.200

TKG Workload Segment | sfo01-w01-vds01-tkgworkload | 172.16.60.1/24 | 172.16.60.100 - 172.16.60.200 | N/A

Tanzu for Kubernetes Operations Deployment Overview


The high-level steps for deploying Tanzu for Kubernetes Operations on vSphere backed by VDS are
as follows:

1. Deploy and Configure NSX Advanced Load Balancer

2. Deploy and Configure Bootstrap Machine

3. Deploy Tanzu Kubernetes Grid Management Cluster

4. Register Management Cluster with Tanzu Mission Control

5. Deploy Tanzu Kubernetes Grid Shared Services Cluster

6. Deploy Tanzu Kubernetes Grid Workload Clusters

7. Configure Tanzu SaaS Components and Deploy User-Managed Packages

Deploy and Configure NSX Advanced Load Balancer


NSX Advanced Load Balancer (ALB) is an enterprise-grade integrated load balancer that provides L4
- L7 load balancer support. It is recommended for vSphere deployments without NSX, or when there
are unique scaling requirements.


NSX Advanced Load Balancer is deployed in Write Access Mode in the vSphere Environment. This
mode grants NSX Advanced Load Balancer controllers full write access to vCenter that helps in
automatically creating, modifying, and removing service engines (SEs) and other resources as
needed to adapt to changing traffic needs.

For a production-grade deployment, it is recommended to deploy three instances of the NSX Advanced Load Balancer controller for high availability and resiliency.

The following table provides a sample IP address and FQDN set for the NSX Advanced Load
Balancer controllers:

Controller Node IP Address FQDN

Node 1 Primary 172.16.10.11 sfo01albctlr01a.sfo01.rainpole.vmw

Node 2 Secondary 172.16.10.12 sfo01albctlr01b.sfo01.rainpole.vmw

Node 3 Secondary 172.16.10.13 sfo01albctlr01c.sfo01.rainpole.vmw

HA Address 172.16.10.10 sfo01albctlr01.sfo01.rainpole.vmw

Perform the following steps to deploy and configure NSX Advanced Load Balancer:

1. Deploy NSX Advanced Load Balancer

2. NSX Advanced Load Balancer: Initial setup

3. NSX Advanced Load Balancer: Licensing

4. NSX Advanced Load Balancer: Controller High Availability

5. NSX Advanced Load Balancer: Certificate Management

6. NSX Advanced Load Balancer: Create vCenter Cloud and SE Groups

7. NSX Advanced Load Balancer: Configure Network and IPAM & DNS Profiles

Deploy NSX Advanced Load Balancer


As a prerequisite, you must have the NSX Advanced Load Balancer 22.1.3 OVA downloaded and
imported to the content library. Deploy the NSX Advanced Load Balancer under the resource pool
“tkg-alb-components” and place it under the folder “tkg-alb-components”.

To deploy NSX Advanced Load Balancer, complete the following steps.

1. Log in to vCenter and go to Home > Content Libraries.

2. Select the content library under which the NSX Advanced Load Balancer OVA is placed.

3. Click on OVA & OVF Templates.

4. Right-click the NSX Advanced Load Balancer image and select New VM from this
Template.

5. On the Select name and folder page, enter a name and select a folder for the NSX
Advanced Load Balancer VM as tkg-alb-components.

6. On the Select a compute resource page, select the resource pool tkg-alb-components.

7. On the Review details page, verify the template details and click Next.

8. On the Select storage page, select a storage policy from the VM Storage Policy drop-down menu and choose the datastore location where you want to store the virtual machine files.

9. On the Select networks page, select the network sfo01-w01-vds01-albmanagement and click
Next.

10. On the Customize template page, provide the NSX Advanced Load Balancer management
network details such as IP address, subnet mask, and gateway, and click Next.

11. On the Ready to complete page, review the page and click Finish.

A new task for creating the virtual machine appears in the Recent Tasks pane. After the task is
complete, the NSX Advanced Load Balancer virtual machine is created on the selected resource.
Power on the virtual machine and give it a few minutes for the system to boot. Upon successful boot
up, go to NSX Advanced Load Balancer on your browser.

Note

While the system is booting up, a blank web page or a 503 status code might
appear.
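If you prefer to automate the controller deployment instead of using the vSphere Client, a tool such as govc can deploy the OVA directly from the content library. The following is a sketch only: the datacenter, cluster, and library names are placeholders, GOVC_URL and credentials are assumed to be exported in the environment, and the management IP, subnet mask, and gateway OVF properties still need to be supplied through an options JSON file.

## Deploy the NSX ALB controller item from the content library (placeholder inventory paths)
govc library.deploy \
  -pool=/<datacenter>/host/<cluster>/Resources/tkg-alb-components \
  -folder=/<datacenter>/vm/tkg-alb-components \
  -options=alb-controller-options.json \
  '/<content-library>/<nsx-alb-controller-item>' sfo01albctlr01a

## Power on the controller VM
govc vm.power -on sfo01albctlr01a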

NSX Advanced Load Balancer: Initial Setup


After NSX Advanced Load Balancer is successfully deployed and running, go to NSX Advanced
Load Balancer on your browser using the URL https://<IP/FQDN> and configure the basic system
settings:

1. Set admin password and click on Create Account.


2. On the Welcome page, under System Settings, set backup passphrase and provide DNS
information, and click Next.


3. Under Email/SMTP, provide email and SMTP information, and click Next.

4. Under Multi-Tenant, configure settings as follows and click Save.

IP Route Domain: Share IP route domain across tenants

Service Engines are managed within the: Provider (Shared across tenants)

Tenant Access to Service Engine: Read Access

If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch and you are directed to a dashboard
view on the controller.


NSX Advanced Load Balancer: NTP Configuration


To configure NTP, go to Administration > System Settings > Edit, and under DNS/NTP tab, add
your NTP server details and click Save.

Note

You may also delete the default NTP servers.

NSX Advanced Load Balancer: Licensing


You can configure the license tier as NSX ALB Enterprise or NSX ALB Essentials for VMware Tanzu
as per the feature requirement. This section focuses on enabling NSX Advanced Load Balancer
using Enterprise Tier (VMware NSX ALB Enterprise) license model.

1. To configure licensing, go to Administration > System Settings > Licensing and click on the
gear icon to change the license type to Enterprise.


2. Select Enterprise Tier as the license type and click Save.

3. Once the license tier is changed, apply the NSX Advanced Load Balancer Enterprise license
key. If you have a license file instead of a license key, apply the license by clicking on the
Upload a License File(.lic) option.


NSX Advanced Load Balancer: Controller High Availability


In a production environment, we recommend deploying additional controller nodes and configuring the controller cluster for high availability and disaster recovery. Adding 2 additional nodes to create a 3-node cluster provides node-level redundancy for the controller and also maximizes performance for CPU-intensive analytics functions.

To run a 3-node controller cluster, you deploy the first node, perform the initial configuration, and set the cluster IP address. After that, you deploy and power on two more controller VMs, but you must not run the initial configuration wizard or change the admin password for these controller VMs. The configuration of the first controller VM is assigned to the two new controller VMs.

The first controller of the cluster receives the Leader role. The second and third controllers work as Followers.

Perform the following steps to configure NSX Advanced Load Balancer cluster.

1. Log in to the primary NSX Advanced Load Balancer controller, go to Administration >
Controller > Nodes, and click Edit.

2. Specify Name and Controller Cluster IP, and click Save. This IP address must be from the
NSX Advanced Load Balancer management network.


3. Deploy the 2nd and 3rd NSX Advanced Load Balancer controller nodes by using steps in
Deploy NSX Advanced Load Balancer.

4. Log in to the primary NSX Advanced Load Balancer controller using the Controller Cluster IP/FQDN, go to Administration > Controller > Nodes, and click Edit. The Edit Controller Configuration popup appears.

5. In the Cluster Nodes field, enter the IP address for the 2nd and 3rd controller, and click
Save.


After you complete these steps, the primary NSX Advanced Load Balancer controller
becomes the leader for the cluster and invites the other controllers to the cluster as
members.

NSX Advanced Load Balancer then performs a warm reboot of the cluster. This process can
take approximately 10-15 minutes. You are automatically logged out of the controller node
where you are currently logged in. To see details about the cluster formation task, enter the
cluster IP address in the browser.

The configuration of the primary (leader) controller is synchronized to the new member nodes when
the cluster comes online following the reboot. After the cluster is successfully formed, you can see
the following status:


Note

In the following tasks, all NSX Advanced Load Balancer configurations are done by
connecting to the NSX ALB Controller Cluster IP/FQDN.

NSX Advanced Load Balancer: Certificate Management


The default system-generated controller certificate generated for SSL/TLS connections will not have the required subject alternate name (SAN) entries. Perform the following steps to create a controller certificate:

1. Log in to the NSX Advanced Load Balancer controller and go to Templates > Security >
SSL/TLS Certificates.

2. Click Create and select Controller Certificate. You can either generate a self-signed
certificate, generate CSR, or import a certificate. For the purpose of this document, a self-
signed certificate is generated.

3. Provide all required details as per your infrastructure requirements and in the Subject
Alternate Name (SAN) field, provide IP address and FQDN of all NSX Advanced Load
Balancer controllers including NSX Advanced Load Balancer cluster IP and FQDN, and click
Save.


4. After the certificate is created, capture the certificate contents as this is required while
deploying the Tanzu Kubernetes Grid management cluster. To capture the certificate
content, click the Download icon next to the certificate, and click Copy to clipboard under
Certificate.

5. To replace the certificate, go to Administration > System Settings, and edit it under Access. Replace the SSL/TLS certificate with the previously created certificate and click Save.


6. Log out and log in to NSX Advanced Load Balancer.
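Optionally, you can verify from the bootstrap machine that the controller now presents the new certificate and that the SAN entries are in place. A quick check using standard OpenSSL tooling (the FQDN below is the sample cluster FQDN used in this guide):

## Show the SAN entries of the certificate presented by the controller cluster VIP
echo | openssl s_client -connect sfo01albctlr01.sfo01.rainpole.vmw:443 2>/dev/null \
  | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"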

NSX Advanced Load Balancer: Create vCenter Cloud and SE Groups
NSX Advanced Load Balancer can be deployed in multiple environments for the same system. Each
environment is called a cloud. The following procedure provides steps on how to create a VMware
vCenter cloud, and as shown in the architecture two service engine (SE) groups are created.

Service Engine Group 1: Service engines that are part of this group host:

Virtual services that load balance the control plane nodes of the management cluster and the shared services cluster.

Virtual services for all load balancer functionalities requested by the Tanzu Kubernetes Grid management cluster and the shared services cluster.

Service Engine Group 2: Service engines that are part of this group host virtual services that load balance the control plane nodes, and virtual services for all load balancer functionalities requested by the workload clusters mapped to this SE group.

Note

- Based on your requirements, you can create additional SE groups for the workload clusters.
- Multiple workload clusters can be mapped to a single SE group.
- A Tanzu Kubernetes Grid cluster can be mapped to only one SE group for application load balancer services.
- Control plane VIPs for the workload clusters are placed on the respective Service Engine group assigned through the AKO Deployment Config (ADC) during cluster creation.

For information about mapping a specific service engine group to Tanzu Kubernetes Grid workload
cluster, see Configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid Workload Cluster.

1. Log in to NSX Advanced Load Balancer and go to Infrastructure > Clouds > Create >
VMware vCenter/vSphere ESX.

2. Under General pane, in the Name field, enter a Cloud name.

3. Under the vCenter/vSphere pane, specify the vCenter address, username, and password, and click CONNECT.

4. Under the Data Center pane, choose the data center from the Data Center drop-down
menu. Select Content Library for SE template and click SAVE & LAUNCH.


5. To choose the NSX Advanced Load Balancer management network for service engines,
select the Management Network from the Management Network drop-down menu. Enter a
static IP address pool for SEs and VIP, and click Complete.

6. Wait for the cloud to get configured and the status to turn green.


7. To create a service engine group for Tanzu Kubernetes Grid management clusters, under
the Infrastructure tab, go to Cloud Resources > Service Engine Group. From the Select
Cloud drop-down menu, select the cloud created in the previous step and click Create.

The following components are created in NSX Advanced Load Balancer.

Object Sample Name

vCenter Cloud sfo01w01vc01

Service Engine Group 1 sfo01m01segroup01

Service Engine Group 2 sfo01w01segroup01

8. Enter a name for the Tanzu Kubernetes Grid management service engine group and set the
following parameters:

Parameter | Value

High availability mode | Active/Active - NSX ALB Enterprise edition; Active/Standby - NSX ALB Essentials for Tanzu edition

Enable Service Engine Self Election | Supported with NSX ALB Enterprise edition

Memory for caching | Supported with NSX ALB Enterprise edition. You must set the value to 0 for Essentials.

Memory per Service Engine | 4

vCPU per Service Engine | 2

Use the default values for the rest of the parameters.


For advanced configuration, under the Scope tab, click Add vCenter, select the configured vCenter cloud, select the cluster and datastore for service engine placement, and click Save.


9. Repeat steps 7 and 8 to create another service engine group for Tanzu Kubernetes Grid
workload clusters. After completing this step, you will have two service engine groups
created.

NSX Advanced Load Balancer: Configure Network and IPAM & DNS Profiles


Configure Tanzu Kubernetes Grid Networks in NSX Advanced Load Balancer

As part of the cloud creation in NSX Advanced Load Balancer, only the management network has been configured in NSX Advanced Load Balancer. Perform the following steps to configure the remaining networks:

TKG Management Network

TKG Workload Network


TKG Cluster VIP/Data Network

TKG Management VIP/Data Network

TKG Workload VIP/Data Network

Log in to NSX Advanced Load Balancer and go to Infrastructure > Cloud Resources >
Networks.

Select the desired cloud. All the networks available in vCenter are listed.

Click the edit icon next to the network and configure it as follows. Change the provided details as per your SDDC configuration.

Note

Not all networks are auto-discovered. For those networks, manually add the
subnet.

Network Name DHCP Subnet Static IP Pool

sfo01-w01-vds01-tkgmanagement Yes 172.16.40.0/24 NA

sfo01-w01-vds01-tkgworkload Yes 172.16.60.0/24 NA

sfo01-w01-vds01-tkgclustervip No 172.16.80.0/24 172.16.80.100 - 172.16.80.200

sfo01-w01-vds01-tkgmanagementvip No 172.16.50.0/24 172.16.50.100 - 172.16.50.200

sfo01-w01-vds01-tkgworkloadvip No 172.16.70.0/24 172.16.70.100 - 172.16.70.200

The following image shows a sample network configuration for network sfo01-w01-vds01-
tkgclustervip. You should apply the same configuration in sfo01-w01-vds01-
tkgmanagementvip and sfo01-w01-vds01-tkgworkloadvip.


The sfo01-w01-vds01-tkgmanagement and sfo01-w01-vds01-tkgworkload networks should be enabled with DHCP.

After the networks are configured, the configuration must look like the following image.

Create IPAM and DNS Profile in NSX Advanced Load Balancer and Attach it to Cloud

At this point, all the required networks related to Tanzu functionality are configured in NSX
Advanced Load Balancer, except for Tanzu Kubernetes Grid management and workload network
which uses DHCP. NSX Advanced Load Balancer provides IPAM service for Tanzu Kubernetes Grid
cluster VIP network, management VIP network, and workload VIP network.

Perform the following steps to create an IPAM profile and attach it to the vCenter cloud created
earlier.


1. Log in to NSX Advanced Load Balancer and go to Templates > Profiles > IPAM/DNS
Profiles > Create > IPAM Profile, provide the following details, and click Save.

Parameter | Value

Name | sfo01-w01-vcenter-ipam-01

Type | Avi Vantage IPAM

Cloud for Usable Networks | sfo01w01vc01 (the vCenter cloud created earlier in this deployment)

Usable Networks | sfo01-w01-vds01-tkgclustervip, sfo01-w01-vds01-tkgmanagementvip, sfo01-w01-vds01-tkgworkloadvip

2. Click Create > DNS Profile and provide the domain name.


3. Attach the IPAM and DNS profiles to the sfo01w01vc01 cloud.

1. Navigate to Infrastructure > Clouds.

2. Edit the sfo01w01vc01 cloud.

3. Under IPAM/DNS section, choose the IPAM and DNS profiles created earlier and
save the updated configuration.


The above steps complete the NSX Advanced Load Balancer configuration. The next step is to
deploy and configure a bootstrap machine. The bootstrap machine is used to deploy and manage
Tanzu Kubernetes clusters.

Deploy and Configure Bootstrap Machine


The deployment of the Tanzu Kubernetes Grid management and workload clusters is facilitated by
setting up a bootstrap machine where you install the Tanzu CLI and Kubectl utilities which are used
to create and manage the Tanzu Kubernetes Grid instance. This machine also keeps the Tanzu
Kubernetes Grid and Kubernetes configuration files for your deployments. The bootstrap machine
can be a laptop, host, or server running on Linux, macOS, or Windows that you deploy management
and workload clusters from.

The bootstrap machine runs a local kind cluster when Tanzu Kubernetes Grid management cluster
deployment is started. Once the kind cluster is fully initialized, the configuration is used to deploy the
actual management cluster on the backend infrastructure. After the management cluster is fully
configured, the local kind cluster is deleted and future configurations are performed with the Tanzu
CLI.

For this deployment, a Photon-based virtual machine is used as the bootstrap machine. For
information on how to configure for a macOS or Windows machine, see Install the Tanzu CLI and
Other Tools.

The bootstrap machine must meet the following prerequisites:

A minimum of 6 GB of RAM and a 2-core CPU.

System time is synchronized with a Network Time Protocol (NTP) server.


Docker and containerd binaries are installed. For instructions on how to install Docker, see
Docker documentation.

Ensure that the bootstrap VM is connected to Tanzu Kubernetes Grid management network,
sfo01-w01-vds01-tkgmanagement.

To install Tanzu CLI, Tanzu Plugins, and Kubectl utility on the bootstrap machine, follow the
instructions below:

1. Download and unpack the following Linux CLI packages from VMware Tanzu Kubernetes
Grid Download Product page.

VMware Tanzu CLI v0.90.1 for Linux

kubectl cluster cli v1.26.5 for Linux

2. Execute the following commands to install Tanzu Kubernetes Grid CLI, kubectl CLIs, and
Carvel tools.

## Install required packages

tdnf install tar zip unzip wget -y

## Install Tanzu Kubernetes Grid CLI

tar -xvf tanzu-cli-linux-amd64.tar
cd ./v0.90.1/
install tanzu-cli-linux_amd64 /usr/local/bin/tanzu
chmod +x /usr/local/bin/tanzu

## Verify Tanzu CLI version

[root@tkg160-bootstrap ~] # tanzu version

version: v0.90.1
buildDate: 2023-06-29
sha: 8945351c

## Install Tanzu Kubernetes Grid CLI Plugins

[root@tkg160-bootstrap ~] # tanzu plugin group search

GROUP               DESCRIPTION          LATEST
vmware-tkg/default  Plugins for TKG      v2.3.0

[root@tkg160-bootstrap ~] # tanzu plugin install --group vmware-tkg/default

[i] Installing plugin 'isolated-cluster:v0.30.1' with target 'global'
[i] Installing plugin 'management-cluster:v0.30.1' with target 'kubernetes'
[i] Installing plugin 'package:v0.30.1' with target 'kubernetes'
[i] Installing plugin 'pinniped-auth:v0.30.1' with target 'global'
[i] Installing plugin 'secret:v0.30.1' with target 'kubernetes'
[i] Installing plugin 'telemetry:v0.30.1' with target 'kubernetes'
[ok] successfully installed all plugins from group 'vmware-tkg/default:v2.3.0'

## List installed plugins

[root@tkg160-bootstrap ~] # tanzu plugin list
Standalone Plugins
NAME                DESCRIPTION                                                         TARGET      VERSION  STATUS
isolated-cluster    Prepopulating images/bundle for internet-restricted environments    global      v0.30.1  installed
pinniped-auth       Pinniped authentication operations (usually not directly invoked)   global      v0.30.1  installed
management-cluster  Kubernetes management cluster operations                            kubernetes  v0.30.1  installed
package             Tanzu package management                                            kubernetes  v0.30.1  installed
secret              Tanzu secret management                                             kubernetes  v0.30.1  installed
telemetry           configure cluster-wide settings for vmware tanzu telemetry          kubernetes  v0.30.1  installed

## Install Kubectl CLI

gunzip kubectl-linux-v1.26.5+vmware.2.gz
mv kubectl-linux-v1.26.5+vmware.2 /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl

# Install Carvel tools

## Install ytt
gunzip ytt-linux-amd64-v0.45.0+vmware.2.gz
chmod ugo+x ytt-linux-amd64-v0.45.0+vmware.2 && mv ./ytt-linux-amd64-v0.45.0+vmware.2 /usr/local/bin/ytt

## Install kapp
gunzip kapp-linux-amd64-v0.55.0+vmware.2.gz
chmod ugo+x kapp-linux-amd64-v0.55.0+vmware.2 && mv ./kapp-linux-amd64-v0.55.0+vmware.2 /usr/local/bin/kapp

## Install kbld
gunzip kbld-linux-amd64-v0.37.0+vmware.2.gz
chmod ugo+x kbld-linux-amd64-v0.37.0+vmware.2 && mv ./kbld-linux-amd64-v0.37.0+vmware.2 /usr/local/bin/kbld

## Install imgpkg
gunzip imgpkg-linux-amd64-v0.36.0+vmware.2.gz
chmod ugo+x imgpkg-linux-amd64-v0.36.0+vmware.2 && mv ./imgpkg-linux-amd64-v0.36.0+vmware.2 /usr/local/bin/imgpkg

3. Validate Carvel tools installation using the following commands.

ytt version
kapp version
kbld version
imgpkg version

4. Install yq. yq is a lightweight and portable command-line YAML processor. yq uses jq-like
syntax but works with YAML and JSON files.

wget https://github.com/mikefarah/yq/releases/download/v4.24.5/yq_linux_amd64.tar.gz

tar -xvf yq_linux_amd64.tar.gz && mv yq_linux_amd64 /usr/local/bin/yq

5. Install kind.


curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-linux-amd64


chmod +x ./kind
mv ./kind /usr/local/bin/kind

6. Execute the following commands to start the Docker service and enable it to start at boot.
Photon OS has Docker installed by default.

## Check Docker service status


systemctl status docker

## Start Docker Service


systemctl start docker

## To start Docker Service at boot


systemctl enable docker

7. Execute the following commands to ensure that the bootstrap machine uses cgroup v1.

docker info | grep -i cgroup

## You should see the following


Cgroup Driver: cgroupfs

8. Create an SSH key pair.

An SSH key pair is required for Tanzu CLI to connect to vSphere from the bootstrap
machine.

The public key part of the generated key is passed during the Tanzu Kubernetes Grid
management cluster deployment.

## Generate SSH key pair

## When prompted to enter the file in which to save the key (/root/.ssh/id_rsa): press Enter to accept the default and provide a password
ssh-keygen -t rsa -b 4096 -C "[email protected]"

## Add the private key to the SSH agent running on your machine and enter the password you created in the previous step
ssh-add ~/.ssh/id_rsa

## If the above command fails, execute "eval $(ssh-agent)" and then rerun the command

9. If your bootstrap machine runs Linux or Windows Subsystem for Linux, and it has a Linux
kernel built after the May 2021 Linux security patch, for example Linux 5.11 and 5.12 with
Fedora, run the following command.

sudo sysctl net/netfilter/nf_conntrack_max=131072

All required packages are now installed and the required configurations are in place in the bootstrap
virtual machine. The next step is to deploy the Tanzu Kubernetes Grid management cluster.

Import Base Image Template for Tanzu Kubernetes Grid Cluster Deployment

Before you proceed with the management cluster creation, ensure that the base image template is imported into vSphere and is available as a template. To import a base image template into vSphere:

1. Go to the Tanzu Kubernetes Grid downloads page and download a Tanzu Kubernetes Grid
OVA for the cluster nodes.

2. For the management cluster, this must be either Photon or Ubuntu based Kubernetes
v1.26.5 OVA.

Note

Custom OVA with a custom Tanzu Kubernetes release (TKr) is also supported, as described in Build Machine Images.

3. For workload clusters, OVA can have any supported combination of OS and Kubernetes
version, as packaged in a Tanzu Kubernetes release.

Note

Make sure you download the most recent OVA base image templates in the
event of security patch releases. You can find updated base image templates
that include security patches on the Tanzu Kubernetes Grid product
download page.

4. In the vSphere client, right-click an object in the vCenter Server inventory and select Deploy
OVF template.

5. Select Local file, click the button to upload files, and go to the downloaded OVA file on your
local machine.

6. Follow the installer prompts to deploy a VM from the OVA.

7. Click Finish to deploy the VM. When the OVA deployment finishes, right-click the VM and
select Template > Convert to Template.

Note

Do not power on the VM before you convert it to a template.

8. If using a non-administrator SSO account: In the VMs and Templates view, right-click the new template, select Add Permission, and assign the tkg-user to the template with the TKG role.

For information about how to create the user and role for Tanzu Kubernetes Grid, see Required
Permissions for the vSphere Account.

Deploy Tanzu Kubernetes Grid (TKG) Management Cluster


The management cluster is a Kubernetes cluster that runs Cluster API operations on a specific cloud
provider to create and manage workload clusters on that provider.

The management cluster is also where you configure the shared and in-cluster services that the


workload clusters use.

You can deploy management clusters in two ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method.

Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.

The following procedure provides the required steps to deploy Tanzu Kubernetes Grid management
cluster using the installer interface.

1. To launch the UI installer wizard, run the following command on the bootstrap machine:

tanzu management-cluster create --ui --bind <bootstrapper-ip>:<port> --browser none

For example:

tanzu management-cluster create --ui --bind 172.16.40.6:8000 --browser none

2. Access the Tanzu UI wizard by opening a browser and entering http://<bootstrapper-ip>:<port>/

3. Click Deploy on the VMware vSphere tile.

4. In the IaaS Provider section, enter the IP/FQDN and credentials of the vCenter server where
the Tanzu Kubernetes Grid management cluster is deployed.

Note

Do not provide a vSphere administrator account to Tanzu Kubernetes Grid.


Instead, create a custom role and user account with the required permissions
specified in Required Permissions for the vSphere Account.


5. Click Connect and accept the vCenter Server SSL thumbprint.

6. Select DEPLOY TKG MANAGEMENT CLUSTER.

7. Select the data center and provide the SSH public key generated while configuring the
bootstrap VM.
If you have saved the SSH key in the default location, run the following command on your
bootstrap machine to get the SSH public key.

cat /root/.ssh/id_rsa.pub

8. Click Next.

9. On the Management Cluster Settings section, provide the following details and click Next.

Based on the environment requirements, select appropriate deployment type for the
Tanzu Kubernetes Grid Management cluster:

Development: Recommended for Dev or POC environments

Production: Recommended for Production environments

It is recommended to set the instance type to Large or above. For the purpose of this
document, we will proceed with deployment type Production and instance type
Medium.

Management Cluster Name: Name for your management cluster.


Control Plane Endpoint Provider: Select NSX Advanced Load Balancer for Control
Plane HA.

Control Plane Endpoint: This is an optional field. If left blank, NSX Advanced Load
Balancer will assign an IP address from the pool “sfo01-w01-vds01-tkgclustervip”
created earlier.
If you need to provide an IP address, pick an IP address from “sfo01-w01-vds01-
tkgclustervip” static IP pools configured in NSX_ALB and ensure that the IP address
is unused.

Machine Health Checks: Enable. You can activate or deactivate MachineHealthCheck


on clusters after deployment by using the CLI. For instructions, see Configure
Machine Health Checks for Workload Clusters.

Enable Audit Logging: Enable for audit logging for Kubernetes API server and node
VMs. Choose as per your environment needs. For more information, see Audit
Logging.

10. On the NSX Advanced Load Balancer section, provide the following information and click
Next.

Controller Host: NSX Advanced Load Balancer Controller IP/FQDN (IP/FQDN of the
Advanced Load Balancer controller cluster configured)

Controller credentials: Username and Password of NSX Advanced Load Balancer

Controller certificate: Paste the contents of the Certificate Authority that is used to
generate your controller certificate into the Controller Certificate Authority text
box.

11. After these details are provided, click Verify Credentials and choose the following
parameters.


Note

Since Tanzu Kubernetes Grid v2.1.0, you can configure the network to
separate the endpoint VIP network of the cluster from the external IP
network of the load balancer service and the ingress service in the cluster.
This feature lets you ensure the security of the clusters by providing you an
option to expose the endpoint of your management or the workload cluster
and the load balancer service and ingress service in the cluster, in different
networks.

As per this reference architecture, all control plane endpoints are connected to the Tanzu
Kubernetes Grid cluster VIP network, and the data plane networks are connected to the respective
management data VIP network or workload data VIP network.

Cloud Name: Name of the cloud created while configuring NSX Advanced Load
Balancer sfo01w01vc01.

Workload Cluster service Engine Group Name: Name of the service engine group
created for Tanzu Kubernetes Grid workload clusters created while configuring NSX
Advanced Load Balancer sfo01w01segroup01.

Workload Cluster Data Plane VIP Network Name & CIDR: Select Tanzu Kubernetes
Grid workload data network sfo01-w01-vds01-tkgworkloadvip and the subnet
172.16.70.0/24 associated with it.

Workload Cluster Control Plane VIP Network Name & CIDR: Select Tanzu
Kubernetes Grid cluster VIP network sfo01-w01-vds01-tkgclustervip and the
subnet 172.16.80.0/24 associated with it.

Management Cluster service Engine Group Name: Name of the service engine
group created for Tanzu Kubernetes Grid management cluster created while
configuring NSX Advanced Load Balancer sfo01m01segroup01.

Management Cluster Data Plane VIP Network Name & CIDR: Select Tanzu
Kubernetes Grid management data network sfo01-w01-vds01-tkgmanagementvip and
the subnet 172.16.50.0/24 associated with it.

Management Cluster Control Plane VIP Network Name & CIDR: Select Tanzu
Kubernetes Grid cluster VIP network sfo01-w01-vds01-tkgclustervip and the
subnet 172.16.80.0/24 associated with it.

Cluster Labels: Optional. Leave the cluster labels section empty to apply the above
workload cluster network settings by default. If you specify any label here, you must
specify the same values in the configuration YAML file of the workload cluster. Else,
the system places the endpoint VIP of your workload cluster in Management Cluster
Data Plane VIP Network by default.


Note

With the above configuration, all the Tanzu workload clusters use sfo01-w01-
vds01-tkgclustervip for control plane VIP network and sfo01-w01-vds01-
tkgworkloadvip for data plane network by default. If you would like to
configure separate VIP networks for workload control plane/data networks,
create a custom AKO Deployment Config (ADC) and provide the respective
AVI_LABELS in the workload cluster config file. For more information on
network separation and custom ADC creation, see Configure Separate VIP
Networks and Service Engine Groups in Different Workload Clusters.

12. (Optional) On the Metadata page, you can specify location and labels and click Next.

13. On the Resources section, specify the resources to be consumed by Tanzu Kubernetes Grid
management cluster and click Next.


14. On the Kubernetes Network section, select the Tanzu Kubernetes Grid management
network (sfo01-w01-vds01-tkgmanagement) where the control plane and worker nodes are
placed during management cluster deployment. Ensure that the network has DHCP service
enabled. Optionally, change the pod and service CIDR.

If the Tanzu environment is placed behind a proxy, enable proxy and provide proxy details:

If you set http-proxy, you must also set https-proxy and vice-versa. You can
choose to use one proxy for HTTP traffic and another proxy for HTTPS traffic or to
use the same proxy for both HTTP and HTTPS traffic.

Under the no-proxy section, enter a comma-separated list of network CIDRs or host
names that must bypass the HTTP(S) proxy.

Your No Proxy list must include the following:

The IP address or hostname for vCenter. Traffic to vCenter cannot be proxied.

The CIDR of the vSphere network that you selected under Network Name.
The vSphere network CIDR includes the IP address of your control plane
endpoint. If you entered an FQDN under control plane endpoint, add both
the FQDN and the vSphere network CIDR to the no-proxy section.

Internally, Tanzu Kubernetes Grid appends localhost, 127.0.0.1, the values
of Cluster Pod CIDR and Cluster Service CIDR, .svc, and .svc.cluster.local
to the list that you enter in this field.

Note

If the Kubernetes cluster needs to communicate with external


services and infrastructure endpoints in your Tanzu Kubernetes Grid
environment, ensure that those endpoints are reachable by your
proxies or add them to the no-proxy section. Depending on your
environment configuration, this may include, but is not limited to,
your OIDC or LDAP server, Harbor, NSX, NSX Advanced Load
Balancer.
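For reference, a minimal sketch of the equivalent proxy settings as they appear in a management cluster configuration file is shown below. The proxy endpoint, vCenter FQDN, and CIDRs are placeholders for your environment.

TKG_HTTP_PROXY_ENABLED: "true"
TKG_HTTP_PROXY: "http://proxy.example.com:3128"
TKG_HTTPS_PROXY: "http://proxy.example.com:3128"
TKG_NO_PROXY: "vcenter.example.com,172.16.40.0/24,172.16.80.0/24"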


15. (Optional) Specify identity management with OIDC or LDAP. For this deployment, identity
management is not enabled.

If you would like to enable identity management, see Enable and Configure Identity
Management During Management Cluster Deployment.

16. Select the OS image to use for deploying the management cluster

Note

This list appears empty if you don’t have a compatible template present in
your environment. See the steps provided in Import Base Image Template
into vSphere.

17. Optionally, select “Participate in the Customer Experience Improvement Program”,
and click Review Configuration.

Note

Tanzu Kubernetes Grid v2.1.0 has a known issue where the installer UI populates an
empty NSXALB_LABEL in the cluster configuration, which leads to management
cluster creation failure. It is recommended to export the cluster configuration
to a file, delete the empty label, and run the cluster creation command from the
CLI instead of deploying the cluster from the UI.


18. When you click on Review Configuration, the installer populates the cluster configuration
file, which is located in the ~/.config/tanzu/tkg/clusterconfigs subdirectory, with the
settings that you specified in the interface. You can optionally export a copy of this
configuration file by clicking Export Configuration.
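If you follow the CLI approach mentioned in the preceding note, a sketch of the command is shown below; the configuration file name is a placeholder for the file exported by the installer, edited to remove the empty label.

## Deploy the management cluster from the exported configuration file
tanzu management-cluster create --file ~/.config/tanzu/tkg/clusterconfigs/<exported-config>.yaml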

19. Click Deploy Management Cluster.

While the cluster is being deployed, a virtual service is created in NSX
Advanced Load Balancer, new service engines are deployed in vCenter by NSX
Advanced Load Balancer, and the service engines are mapped to the SE group
sfo01m01segroup01.

When Tanzu Kubernetes Grid management cluster is being deployed, behind the scenes:

NSX Advanced Load Balancer service engines get deployed in vCenter and this task is
orchestrated by the NSX Advanced Load Balancer controller.

Service engine status in NSX Advanced Load Balancer: The service engines are in the
Initializing state for some time, and then the status changes to Up.

Service engine group status in NSX Advanced Load Balancer: As per the configuration, the
virtual services required for Tanzu Kubernetes Grid cluster control plane HA are hosted on the
service engine group sfo01m01segroup01.

Virtual service status in NSX Advanced Load Balancer: The cluster is configured with the
Production plan, which deploys three control plane nodes placed behind the cluster
VIP.


The installer automatically sets the context to the Tanzu Kubernetes Grid management
cluster on the bootstrap machine. Now you can access the Tanzu Kubernetes Grid
management cluster from the bootstrap machine and perform additional tasks such as
verifying the management cluster health and deploying the workload clusters, etc.

To get the status of Tanzu Kubernetes Grid management cluster, run the following
command:

tanzu management-cluster get

Use kubectl to get the status of the Tanzu Kubernetes Grid management cluster nodes.
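For example, the following commands, run from the bootstrap machine, show the management cluster node and pod status; the exact output depends on your environment.

## Check the management cluster nodes and core pods
kubectl get nodes -o wide
kubectl get pods -A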

Register Management Cluster with Tanzu Mission Control


If you want to register your management cluster with Tanzu Mission Control, see Register Your
Management Cluster with Tanzu Mission Control.


Configure AKO Deployment Config (ADC) for Workload Clusters
Tanzu Kubernetes Grid v2.1.0 management clusters with NSX Advanced Load Balancer are
deployed with 2 AKODeploymentConfigs.

install-ako-for-management-cluster: default configuration for management cluster

install-ako-for-all: default configuration for all workload clusters. By default, all the
workload clusters reference this file for their virtual IP networks and service engine (SE)
groups. This ADC configuration does not enable NSX ALB L7 Ingress by default.

As per this Tanzu deployment, create 2 more ADCs:

tanzu-ako-for-shared: Used by shared services cluster to deploy the virtual services in TKG
Mgmt SE Group and the loadbalancer applications in TKG Management VIP Network.

tanzu-ako-for-workload-L7-ingress: Use this ADC only if you would like to enable NSX
Advanced Load Balancer L7 ingress on workload cluster. Otherwise, leave the cluster labels
empty to apply the network configuration from default ADC install-ako-for-all.

Configure AKODeploymentConfig (ADC) for Shared Services Cluster


As per the defined architecture, shared services cluster uses the same control plane and data plane
network as the management cluster. Shared services cluster control plane endpoint uses TKG
Cluster VIP Network, application loadbalancing uses TKG Management Data VIP network and the
virtual services are deployed in sfo01m01segroup01 SE group. This configuration is enforced by
creating a custom AKO Deployment Config (ADC) and applying the respective NSXALB_LABELS while
deploying the shared services cluster.

The format of the AKODeploymentConfig YAML file is as follows.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
finalizers:
- ako-operator.networking.tkg.tanzu.vmware.com
generation: 2
name: <Unique name of AKODeploymentConfig>
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca
namespace: tkg-system-networking
cloudName: <NAME OF THE CLOUD in ALB>
clusterSelector:
matchLabels:
<KEY>: <VALUE>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-CIDR>
name: <TKG-Cluster-VIP-Network>
controller: <NSX ALB CONTROLLER IP/FQDN>


dataNetwork:
cidr: <TKG-Mgmt-Data-VIP-CIDR>
name: <TKG-Mgmt-Data-VIP-Name>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: <TKG-Mgmt-Network>
serviceEngineGroup: <Mgmt-Cluster-SEG>

The sample AKODeploymentConfig with sample values in place is as follows. You should add the
respective NSX_ALB label type=shared-services while deploying shared services cluster to enforce
this network configuration.

cloud: sfo01w01vc01

service engine group: sfo01m01segroup01

Control Plane network: sfo01-w01-vds01-tkgclustervip

VIP/data network: sfo01-w01-vds01-tkgmanagementvip

Node Network: sfo01-w01-vds01-tkgmanagement

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
finalizers:
- ako-operator.networking.tkg.tanzu.vmware.com
generation: 2
labels:
name: tanzu-ako-for-shared
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca
namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
type: shared-services
controlPlaneNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
controller: 172.16.10.10
dataNetwork:
cidr: 172.16.50.0/24
name: sfo01-w01-vds01-tkgmanagementvip
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false


disableIngressClass: true
nodeNetworkList:
- networkName: sfo01-w01-vds01-tkgmanagement
serviceEngineGroup: sfo01m01segroup01

After you have the AKO configuration file ready, use the kubectl command to set the context to
Tanzu Kubernetes Grid management cluster and create the ADC:

# kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01


Switched to context "sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01".

# kubectl apply -f ako-shared-services.yaml


akodeploymentconfig.networking.tkg.tanzu.vmware.com/tanzu-ako-for-shared created

Use the following command to list all AKODeploymentConfig created under the management
cluster:

# kubectl get adc


NAME AGE
install-ako-for-all 21h
install-ako-for-management-cluster 21h
tanzu-ako-for-shared 113s

Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX ALB L7 Ingress with NodePortLocal Mode
VMware recommends using NSX Advanced Load Balancer L7 ingress with NodePortLocal mode for
the L7 application load balancing. This is enabled by creating a custom ADC with ingress settings
enabled, and then applying the NSX_ALB LABEL while deploying the workload cluster.

As per the defined architecture, the workload cluster control plane endpoint uses the TKG Cluster
VIP Network, application load balancing uses the TKG Workload Data VIP network, and the virtual
services are deployed in the sfo01w01segroup01 SE group.

The following are the changes in the ADC ingress section when compared to the default ADC.

disableIngressClass: set to false to enable NSX ALB L7 Ingress.

nodeNetworkList: Provide the values for TKG workload network name and CIDR.

serviceType: L7 Ingress type, recommended to use NodePortLocal

shardVSSize: Virtual service size

Note

The NSX ALB L7 Ingress feature requires an Enterprise edition license. If you do not
wish to enable the L7 feature, or you are using the NSX ALB Essentials for Tanzu license,
disable the L7 feature by setting the value of disableIngressClass to true.

The format of the AKODeploymentConfig YAML file for enabling NSX ALB L7 Ingress is as follows.

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1


kind: AKODeploymentConfig
metadata:
name: <unique-name-for-adc>
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca
namespace: tkg-system-networking
cloudName: <cloud name configured in nsx alb>
clusterSelector:
matchLabels:
<KEY>: <value>
controller: <ALB-Controller-IP/FQDN>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-Network-CIDR>
name: <TKG-Cluster-VIP-Network-Name>
dataNetwork:
cidr: <TKG-Workload-VIP-network-CIDR>
name: <TKG-Workload-VIP-network-Name>
serviceEngineGroup: <Workload-Cluster-SEG>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false # required
ingress:
disableIngressClass: false # required
nodeNetworkList: # required
- networkName: <TKG-Workload-Network>
cidrs:
- <TKG-Workload-Network-CIDR>
serviceType: NodePortLocal # required
shardVSSize: MEDIUM # required

The AKODeploymentConfig with sample values in place is as follows. You should add the respective
NSX ALB label workload-l7-enabled=true while deploying workload cluster to enforce this network
configuration.

cloud: sfo01w01vc01

service engine group: sfo01w01segroup01

Control Plane network: sfo01-w01-vds01-tkgclustervip

VIP/data network: sfo01-w01-vds01-tkgworkloadvip

Node Network: sfo01-w01-vds01-tkgworkload

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
name: tanzu-ako-for-workload-l7-ingress
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca


namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
workload-l7-enabled: "true"
controller: 172.16.10.10
controlPlaneNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
dataNetwork:
cidr: 172.16.70.0/24
name: sfo01-w01-vds01-tkgworkloadvip
serviceEngineGroup: sfo01w01segroup01
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false # required
ingress:
disableIngressClass: false # required
nodeNetworkList: # required
- networkName: sfo01-w01-vds01-tkgworkload
cidrs:
- 172.16.60.0/24
serviceType: NodePortLocal # required
shardVSSize: MEDIUM # required

Use the kubectl command to set the context to Tanzu Kubernetes Grid management cluster and
create the ADC:

# kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01


Switched to context "sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01".

# kubectl apply -f workload-adc-l7.yaml


akodeploymentconfig.networking.tkg.tanzu.vmware.com/tanzu-ako-for-workload-l7-ingress
created

Use the following command to list all AKODeploymentConfig created under the management cluster:

# kubectl get adc


NAME AGE
install-ako-for-all 22h
install-ako-for-management-cluster 22h
tanzu-ako-for-shared 82m
tanzu-ako-for-workload-l7-ingress 25s

Now that you have successfully created the AKO deployment config, you need to apply the cluster
labels while deploying the workload clusters to enable NSX Advanced Load Balancer L7 Ingress with
NodePortLocal mode.
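As an illustration, either of the following approaches applies the matching label; the cluster name is a placeholder, and the key and value must match the clusterSelector defined in the ADC above.

## Option 1: set the label in the workload cluster deployment config file
AVI_LABELS: |
  'workload-l7-enabled': 'true'

## Option 2: label an existing workload cluster object from the management cluster context
kubectl label cluster <workload-cluster-name> workload-l7-enabled=true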

Deploy Tanzu Kubernetes Grid Shared Services Cluster


Each Tanzu Kubernetes Grid instance can have only one shared services cluster. Create a shared
services cluster if you intend to deploy Harbor.

The procedures for deploying a shared services cluster and workload cluster are almost the same. A
key difference is that for the shared service cluster you add the tanzu-services label to the shared


services cluster, as its cluster role. This label identifies the shared services cluster to the
management cluster and workload clusters.

Shared services cluster uses the custom ADC tanzu-ako-for-shared created earlier to apply the
network settings similar to the management cluster. This is enforced by applying the
NSXALB_LABEL type:shared-services while deploying the shared services cluster.

After the management cluster is registered with Tanzu Mission Control, the deployment of the Tanzu
Kubernetes clusters can be done in just a few clicks. The procedure for creating Tanzu Kubernetes
clusters is as follows.

Note

The scope of this document doesn’t cover the use of a proxy for Tanzu Kubernetes
Grid deployment. If your environment uses a proxy server to connect to the internet,
ensure that the proxy configuration object includes the CIDRs for the pod, ingress,
and egress from the workload network of the Management Cluster in the No proxy
list, as described in Create a Proxy Configuration Object for a Tanzu Kubernetes Grid
Service Cluster Running in vSphere with Tanzu.

1. Navigate to the Clusters tab and click Create Cluster and select Create Tanzu Kubernetes
Grid cluster.

2. Under the Create cluster page, select the management cluster which you registered in the
previous step and click Continue to create cluster.

3. Select the provisioner for creating the shared services cluster. Provisioner reflects the
vSphere namespaces that you have created and associated with the management cluster.


4. On the Cluster Details page, do the following:

5. Enter a name for the cluster (Cluster names must be unique within an organization).

6. Select the cluster group to which you want to attach your cluster.

7. Select Cluster Class from the drop down.

8. Use the NSXALB_Labels created for shared cluster on AKO Deployment.

9. On the Configure page, specify the following:

In the vCenter and tlsThumbprint fields, enter the details for authentication.

From the datacenter, resourcePool, folder, network, and datastore drop-down


menu, select the required information.

From the template drop down, select the Kubernetes version. The latest supported
version is preselected for you.


In the sshAuthorizedKeys field, enter the SSH key that was created earlier.

Enable aviAPIServerHAProvider.

10. Update POD CIDR and Service CIDR if necessary.

11. Select the high availability mode for the control plane nodes of the shared services cluster.
For a production deployment, it is recommended to deploy a highly available shared services
cluster.


12. Customize the default node pool for your workload cluster.

Specify the number of worker nodes to provision.

Select OS Version.

13. Click Create Cluster to start provisioning your workload cluster.

14. Cluster creation takes approximately 15-20 minutes to complete. After the cluster
deployment completes, ensure that agent and extensions health shows green.


1. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
shared services cluster.


## Verify the shared services cluster creation

tanzu cluster list

NAME                 NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  TKR
sfo01w01tkgshared01  default    running  3/3           3/3      v1.26.5+vmware.2  <none>  prod  v1.26.5---vmware.2-tkg.1

## Connect to tkg management cluster

kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

## Add the tanzu-services label to the shared services cluster as its cluster role.
## In the following command, "sfo01w01tkgshared01" is the name of the shared services cluster

kubectl label cluster.cluster.x-k8s.io/sfo01w01tkgshared01 cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true
cluster.cluster.x-k8s.io/sfo01w01tkgshared01 labeled

## Validate that TMC has applied the AVI_LABEL while deploying the cluster

kubectl get cluster sfo01w01tkgshared01 --show-labels

NAME                 PHASE        AGE   VERSION           LABELS
sfo01w01tkgshared01  Provisioned  105m  v1.26.5+vmware.2  cluster-role.tkg.tanzu.vmware.com/tanzu-services=,networking.tkg.tanzu.vmware.com/avi=tanzu-ako-for-shared,tanzuKubernetesRelease=v1.26.5---vmware.2-tkg.1,tkg.tanzu.vmware.com/cluster-name=sfo01w01tkgshared01,type=shared-services

2. Connect to the admin context of the shared services cluster using the following commands and
validate the AKO pod status.

## Use the following command to get the admin context of the shared services cluster.

tanzu cluster kubeconfig get sfo01w01tkgshared01 --admin

Credentials of cluster 'sfo01w01tkgshared01' have been saved
You can now access the cluster by running 'kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01'

## Use the following command to use the context of the shared services cluster

kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01

Switched to context "sfo01w01tkgshared01-admin@sfo01w01tkgshared01".

# Verify that ako pod gets deployed in avi-system namespace

kubectl get pods -n avi-system


NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 73m

# verify the nodes and pods status by running the command:


kubectl get nodes -o wide


kubectl get pods -A

Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor for Service Registry.

Deploy Tanzu Kubernetes Grid Workload Clusters


As per the architecture, workload clusters make use of a custom ADC to enable NSX Advanced
Load Balancer L7 ingress with NodePortLocal mode. This is enforced by providing the
NSXALB_LABEL while deploying the workload cluster.

The steps for deploying a workload cluster are the same as for a shared services cluster, except
that in step 4 (Cluster Details) you use the NSX ALB labels created for the workload cluster on
the AKO Deployment Config.

After the workload cluster is created, verify the cluster labels and AKO pod status.

1. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
workload cluster.

## Verify the workload cluster creation

tanzu cluster list

NAME                NAMESPACE  STATUS   CONTROLPLANE  WORKERS  KUBERNETES        ROLES   PLAN  TKR
sfo01w01shared01    default    running  3/3           3/3      v1.26.5+vmware.2  <none>  prod  v1.26.5---vmware.2-tkg.1
sfo01w01workload01  default    running  3/3           3/3      v1.26.5+vmware.2  <none>  prod  v1.26.5---vmware.2-tkg.1

## Connect to tkg management cluster

kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

## Validate that TMC has applied the AVI_LABEL while deploying the cluster

kubectl get cluster sfo01w01workload01 --show-labels

NAME                PHASE        AGE   VERSION  LABELS
sfo01w01workload01  Provisioned  105m           networking.tkg.tanzu.vmware.com/avi=tanzu-ako-for-workload-l7-ingress,tanzuKubernetesRelease=v1.26.5---vmware.2-tkg.1,tkg.tanzu.vmware.com/cluster-name=sfo01w01workload01,workload-l7-enabled=true

2. Connect to the admin context of the workload cluster using the following commands and
validate the AKO pod status.

## Use the following command to get the admin context of workload Cluster.

tanzu cluster kubeconfig get sfo01w01workload01 --admin

Credentials of cluster 'sfo01w01workload01' have been saved


You can now access the cluster by running 'kubectl config use-context sfo01w01workload01-admin@sfo01w01workload01'

## Use the following command to use the context of workload Cluster

kubectl config use-context sfo01w01workload01-admin@sfo01w01workload01

Switched to context "sfo01w01workload01-admin@sfo01w01workload01".

# Verify that ako pod gets deployed in avi-system namespace

kubectl get pods -n avi-system


NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 73m

# verify the nodes and pods status by running the command:


kubectl get nodes -o wide

kubectl get pods -A

You can now configure SaaS components and deploy user-managed packages on the cluster.

Configure Tanzu SaaS Components and Deploy User-Managed Packages
For information on how to configure the SaaS components, see Configure Tanzu SaaS Components
for Tanzu for Kubernetes Operations.

For more information about deploying user-managed packages, see Deploy User-Managed
Packages in Workload Clusters.

VMware Tanzu for Kubernetes Operations on vSphere with NSX Networking Reference Design
VMware Tanzu simplifies operation of Kubernetes for multi-cloud deployment by centralizing
management and governance for clusters and teams across on-premises, public clouds, and edge. It
delivers an open source aligned Kubernetes distribution with consistent operations and management
to support infrastructure and application modernization.

This document lays out a reference architecture for VMware Tanzu for Kubernetes
Operations when deployed on a vSphere environment backed by VMware NSX, and offers a high-
level overview of the different components.

This reference design is based on the architecture and components described in VMware Tanzu for
Kubernetes Operations Reference Architecture.


Supported Component Matrix


The validated Bill of Materials that can be used to install Tanzu Kubernetes Grid on your vSphere with
NSX environment is as follows:

Software Components Version

Tanzu Kubernetes Grid 2.3.0

VMware vSphere ESXi 8.0 U1 and later

VMware vCenter (VCSA) 8.0 U1 and later

NSX Advanced Load Balancer 22.1.2

VMware NSX 4.1.0.2

For more information about which software versions can be used together, see Interoperability Matrix.

Tanzu Kubernetes Grid Components


VMware Tanzu Kubernetes Grid (TKG) provides organizations with a consistent, upstream-
compatible, regional Kubernetes substrate that is ready for end-user workloads and ecosystem
integrations. You can deploy Tanzu Kubernetes Grid across software-defined datacenters (SDDC)
and public cloud environments, including vSphere, Microsoft Azure, and Amazon EC2.

Tanzu Kubernetes Grid comprises the following components:

Management Cluster - A management cluster is the first element that you deploy when you create a
Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster that performs the
role of the primary management and operational center for the Tanzu Kubernetes Grid instance. The
management cluster is purpose-built for operating the platform and managing the lifecycle of Tanzu
Kubernetes clusters.


ClusterClass API - Tanzu Kubernetes Grid 2 functions through the creation of a management
Kubernetes cluster that holds the ClusterClass API. The ClusterClass API then interacts with the
infrastructure provider to service workload Kubernetes cluster lifecycle requests. The earlier
primitives of Tanzu Kubernetes Clusters still exist for Tanzu Kubernetes Grid 1.x. ClusterClass is a
new feature introduced as part of Cluster API that reduces the need for redundant templating and
enables powerful customization of clusters. The whole process for creating a cluster using
ClusterClass is the same as before but with slightly different parameters.

Tanzu Kubernetes Cluster - Tanzu Kubernetes clusters are the Kubernetes clusters in which your
application workloads run. These clusters are also referred to as workload clusters. Tanzu
Kubernetes clusters can run different versions of Kubernetes, depending on the needs of the
applications they run.

Shared Service Cluster - Each Tanzu Kubernetes Grid instance can only have one shared services
cluster. You deploy this cluster only if you intend to deploy shared services such as Contour and
Harbor.

Tanzu Kubernetes Cluster Plans - A cluster plan is a blueprint that describes the configuration with
which to deploy a Tanzu Kubernetes cluster. It provides a set of configurable values that describe
settings like the number of control plane machines, worker machines, VM types, and so on. This
release of Tanzu Kubernetes Grid provides two default templates, dev and prod.

Tanzu Kubernetes Grid Instance - A Tanzu Kubernetes Grid instance is the full deployment of Tanzu
Kubernetes Grid, including the management cluster, the workload clusters, and the shared services
cluster that you configure.

Tanzu CLI - A command-line utility that provides the necessary commands to build and operate
Tanzu management and Tanzu Kubernetes clusters. Starting with TKG 2.3.0, Tanzu Core CLI is now
distributed separately from Tanzu Kubernetes Grid. For more information about installing the Tanzu
CLI for use with Tanzu Kubernetes Grid, see Install the Tanzu CLI.

Carvel Tools - Carvel is an open-source suite of reliable, single-purpose, composable tools that aid
in building, configuring, and deploying applications to Kubernetes. Tanzu Kubernetes Grid uses the
following Carvel tools:

ytt - A command-line tool for templating and patching YAML files. You can also use ytt to
collect fragments and piles of YAML into modular chunks for reuse.

kapp - The application deployment CLI for Kubernetes. It allows you to install, upgrade, and
delete multiple Kubernetes resources as one application.

kbld - An image-building and resolution tool.

imgpkg - A tool that enables Kubernetes to store configurations and the associated
container images as OCI images, and to transfer these images.

yq - a lightweight and portable command-line YAML, JSON, and XML processor. yq uses jq-
like syntax but works with YAML files as well as JSON and XML.
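As a small illustration of how these tools are typically combined, the following commands render a ytt template with an overlay and deploy the result as a kapp application; the file names and application name are placeholders.

## Render YAML templates with an overlay, then deploy the output as one application
ytt -f base.yaml -f overlay.yaml > rendered.yaml
kapp deploy -a demo-app -f rendered.yaml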

Bootstrap Machine - The bootstrap machine is the laptop, host, or server on which you download
and run the Tanzu CLI. This is where the initial bootstrapping of a management cluster occurs before
it is pushed to the platform on which it runs.

Tanzu Kubernetes Grid Installer - The Tanzu Kubernetes Grid installer is a CLI or a graphical wizard
that provides an option to deploy a management cluster. You launch this installer locally on the


bootstrap machine by running the tanzu management-cluster create command.

Tanzu Kubernetes Grid Storage


Tanzu Kubernetes Grid integrates with shared datastores available in the vSphere infrastructure. The
following types of shared datastores are supported:

vSAN

VMFS

NFS

vVols

Tanzu Kubernetes Grid is agnostic to which option you choose. For Kubernetes stateful workloads,
Tanzu Kubernetes Grid installs the vSphere Container Storage interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.

Tanzu Kubernetes Grid Cluster Plans can be defined by operators to use a certain vSphere datastore
when creating new workload clusters. All developers then have the ability to provision container-
backed persistent volumes from that underlying datastore.
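For illustration, a minimal sketch of a StorageClass backed by the vSphere CSI driver, and a persistent volume claim that consumes it, is shown below; the storage policy name is an assumption and must match a policy defined in your vSphere environment.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: vsphere-gold
provisioner: csi.vsphere.vmware.com
parameters:
  storagepolicyname: "vSAN Default Storage Policy"   # assumed vSphere storage policy name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsphere-gold
  resources:
    requests:
      storage: 5Gi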

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by Tanzu Kubernetes Grid supports two Container Network
Interface (CNI) options:

Antrea

Calico

Both are open-source software that provide networking for cluster pods, services, and ingress.

When you deploy a Tanzu Kubernetes cluster using Tanzu Mission Control or Tanzu CLI, Antrea CNI
is automatically enabled in the cluster.

Tanzu Kubernetes Grid also supports Multus CNI which can be installed through Tanzu user-
managed packages. Multus CNI lets you attach multiple network interfaces to a single pod and
associate each with a different address range.

To provision a Tanzu Kubernetes cluster using a non-default CNI, see the following instructions:

Deploy Tanzu Kubernetes clusters with Calico

Implement Multiple Pod Network Interfaces with Multus
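A minimal sketch of selecting a non-default CNI in a workload cluster configuration file is shown below; the configuration file path is a placeholder.

## In the workload cluster configuration file, set the CNI before creating the cluster
CNI: calico

## Create the workload cluster from that configuration file
tanzu cluster create --file ~/.config/tanzu/tkg/clusterconfigs/<workload-cluster-config>.yaml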

Each CNI is suitable for a different use case. The following table lists some common use cases for the
three CNIs that Tanzu Kubernetes Grid supports. This table helps you with information on selecting
the right CNI in your Tanzu Kubernetes Grid implementation.

CNI: Antrea
Use Case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for
encapsulation. Optionally, you can encrypt node-to-node communication using IPSec packet
encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Provides an option to configure an egress IP pool or static egress IP for Kubernetes workloads.

CNI: Calico
Use Case: Calico is used in environments where factors like network performance, flexibility, and
power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol
instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer,
resulting in increased network performance for Kubernetes workloads.
Pros: Support for network policies, high network performance, and SCTP support.
Cons: No multicast support.

CNI: Multus
Use Case: Multus CNI provides multiple interfaces per Kubernetes pod. Using Multus CRDs, you can
specify which pods get which interfaces and allow different interfaces depending on the use case.
Pros: Separation of data/control planes; separate security policies can be used for separate
interfaces; supports SR-IOV, DPDK, OVS-DPDK, and VPP workloads in Kubernetes with both cloud native
and NFV based applications in Kubernetes.

Deploy Pods with Routable and No-NAT IP Addresses (NSX)


On vSphere with NSX networking and the Antrea container network interface (CNI), you can
configure a workload cluster with routable IP addresses for its worker pods, bypassing network
address translation (NAT) for external requests from and to the pods.

You can perform the following tasks by using the routable IP addresses on pods:

Trace outgoing requests to common shared services, because their source IP address is the
routable pod IP address and not a NAT address.

Support authenticated incoming requests from the external internet directly to pods by
bypassing NAT.

Tanzu Kubernetes Grid Infrastructure Networking


Tanzu Kubernetes Grid on vSphere can be deployed on various networking stacks including:

VMware NSX Data Center Networking

vSphere Networking (VDS)

Note

The scope of this document is limited to VMware NSX Data Center Networking with
NSX Advanced Load Balancer Enterprise Edition.


Tanzu Kubernetes Grid on VMware NSX Data Center Networking with NSX Advanced Load Balancer
When deployed on VMware NSX Networking, Tanzu Kubernetes Grid uses the NSX logical
segments and gateways to provide connectivity to Kubernetes control plane VMs, worker nodes,
services, and applications. All hosts from the cluster where Tanzu Kubernetes clusters are deployed,
are configured as NSX transport nodes, which provide network connectivity to the Kubernetes
environment.

You can configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid as:

L4 load balancer for applications hosted on the TKG cluster.

The L7 ingress service provider for the applications in the clusters that are deployed on
vSphere.

L4 load balancer for the control plane API server.

Each workload cluster integrates with NSX Advanced Load Balancer by running an Avi Kubernetes
Operator (AKO) on one of its nodes. The cluster’s AKO calls the Kubernetes API to manage the
lifecycle of load balancing and ingress resources for its workloads.
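For instance, application teams only need to define a standard Kubernetes Service of type LoadBalancer, such as the minimal sketch below; AKO then programs a corresponding NSX ALB virtual service with a VIP from the configured VIP network. The name, selector, and ports are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: demo-web            # illustrative name
spec:
  type: LoadBalancer        # realized by AKO as an NSX ALB virtual service
  selector:
    app: demo-web
  ports:
    - port: 80
      targetPort: 8080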

NSX Advanced Load Balancer Components


NSX Advanced Load Balancer is deployed in Write Access Mode in VMware NSX Environment. This
mode grants NSX Advanced Load Balancer controllers full write access to vCenter which helps in
automatically creating, modifying, and removing service engines (SEs), and other resources as
needed to adapt to changing traffic needs. The core components of NSX Advanced Load Balancer
are as follows:

NSX Advanced Load Balancer Controller - NSX Advanced Load Balancer controller
manages virtual service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the service engines (SEs). It is the central repository for the
configurations and policies related to services and management, and it provides the portal for
viewing the health of VirtualServices and SEs and the associated analytics that NSX
Advanced Load Balancer provides.

NSX Advanced Load Balancer Service Engine - The service engines (SEs) are lightweight
VMs that handle all data plane operations by receiving and executing instructions from the
controller. The SEs perform load balancing and all client- and server-facing network
interactions.

Service Engine Group - Service engines are created within a group, which contains the
definition of how the SEs should be sized, placed, and made highly available. Each cloud has
at least one SE group.

Cloud - Clouds are containers for the environment that NSX Advanced Load Balancer is
installed or operating within. During the initial setup of NSX Advanced Load Balancer, a
default cloud, named Default-Cloud, is created. This is where the first controller is deployed.
Additional clouds may be added containing SEs and virtual services.


Avi Kubernetes Operator (AKO) - It is a Kubernetes operator that runs as a pod in the
Supervisor Cluster and Tanzu Kubernetes clusters, and it provides ingress and load balancing
functionality. AKO translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses, routes, and services on the
service engines (SE) through the NSX Advanced Load Balancer Controller.

AKO Operator (AKOO) - This is an operator which is used to deploy, manage, and remove
the AKO pod in Kubernetes clusters. This operator when deployed creates an instance of the
AKO controller and installs all the relevant objects like:
AKO Statefulset

Clusterrole and Clusterrolebinding

Configmap (required for the AKO controller and other artifacts).

Tanzu Kubernetes Grid management clusters have an AKO operator installed out-of-the-box during
cluster deployment. By default, a Tanzu Kubernetes Grid management cluster has a couple of
AkoDeploymentConfig created which dictates when and how AKO pods are created in the workload
clusters. For more information, see AKO Operator documentation.

Optionally, you can enter one or more cluster labels to identify clusters on which to selectively
enable NSX ALB or to customize NSX ALB settings for different groups of clusters. This is useful in
the following scenarios:

You want to configure different sets of workload clusters to different Service Engine Groups to
implement isolation or to support more Service type Load Balancers than one Service Engine
Group's capacity.

You want to configure different sets of workload clusters to different Clouds because they are
deployed in different sites.

To enable NSX ALB selectively rather than globally, add labels in the format key: value pair in the
management cluster config file. This will create a default AKO Deployment Config (ADC) on the
management cluster with the NSX ALB settings provided. Labels that you define here will be used to
create a label selector. Only workload cluster objects that have the matching labels will have the
load balancer enabled.

To customize the NSX ALB settings for different groups of clusters, create an AKO Deployment
Config (ADC) on the management cluster by customizing the NSX ALB settings, and providing a unique
label selector for the ADC. Only the workload cluster objects that have the matching labels will have
these custom settings applied.

You can label the cluster during the workload cluster deployment or label it manually post cluster
creation. If you define multiple key-values, you need to apply all of them.

Provide an AVI_LABEL in the below format in the workload cluster deployment config file, and it
will automatically label the cluster and select the matching ADC based on the label selector
during the cluster deployment.

AVI_LABELS: |
  'type': 'tkg-workloadset01'

Optionally, you can manually label the cluster object of the corresponding workload cluster with
the labels defined in ADC.

kubectl label cluster <cluster-name> type=tkg-workloadset01

Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and service engine settings. The cloud is
configured with one or more VIP networks to provide IP addresses for load balancing (L4/L7) virtual
services created under that cloud.

The virtual services can span multiple service engines if the associated service engine group is
configured in Active/Active HA mode. A service engine can belong to only one SE group at a time.

IP address allocation for virtual services can be over DHCP or through the in-built IPAM functionality
of NSX Advanced Load Balancer. The VIP networks created or configured in NSX Advanced Load
Balancer are associated with the IPAM profile.

Network Architecture
For the deployment of Tanzu Kubernetes Grid in the VMware NSX environment, it is required to
build separate networks for the Tanzu Kubernetes Grid management clusters, workload clusters,
NSX Advanced Load Balancer management, cluster-VIP, and workload VIP network for control
plane HA and application load balancing/ingress.

The network reference design can be mapped into this general framework. This design uses a single
VIP network for control plane L4 load balancing and application L4/L7 load balancing, and is mostly
suited for a dev/test environment.

Another reference design that can be implemented in production environment is shown below, and
it uses separate VIP network for the applications deployed in management/shared services and the
workload cluster.


This topology enables the following benefits:

Isolate and separate SDDC management components (vCenter, ESX) from the Tanzu
Kubernetes Grid components. This reference design allows only the minimum connectivity
between the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the
vCenter Server.

Isolate and separate NSX Advanced Load Balancer management network from the Tanzu
Kubernetes Grid management segment and the Tanzu Kubernetes Grid workload segments.

Depending on the workload cluster type and use case, multiple workload clusters may
leverage the same workload network or new networks can be used for each workload
cluster. To isolate and separate Tanzu Kubernetes Grid workload cluster networking from
each other, it is recommended to use separate networks for each workload cluster and
configure the required firewall between these networks. For more information, see Firewall
Requirements.

Separate provider and tenant access to the Tanzu Kubernetes Grid environment.
Only provider administrators need access to the Tanzu Kubernetes Grid management
cluster. This prevents tenants from attempting to connect to the Tanzu Kubernetes
Grid management cluster.

Network Requirements
As per the production architecture, the following networks are required:


Network Type: NSX ALB Management Logical Segment
DHCP Service: Optional
Description & Recommendations: NSX ALB controllers and SEs are attached to this network. DHCP is
not a mandatory requirement on this network as NSX ALB can handle IPAM services for the management
network.

Network Type: TKG Management Logical Segment
DHCP Service: Yes
Description & Recommendations: Control plane and worker nodes of the TKG management cluster are
attached to this network.

Network Type: TKG Shared Service Logical Segment
DHCP Service: Yes
Description & Recommendations: Control plane and worker nodes of the TKG shared services cluster
are attached to this network.

Network Type: TKG Workload Logical Segment
DHCP Service: Yes
Description & Recommendations: Control plane and worker nodes of TKG workload clusters are attached
to this network.

Network Type: TKG Management VIP Logical Segment
DHCP Service: No
Description & Recommendations: Virtual services for control plane HA of all TKG clusters
(management, shared services, and workload). Reserve sufficient IP addresses depending on the number
of TKG clusters planned to be deployed in the environment. NSX Advanced Load Balancer takes care of
IPAM on this network.

Network Type: TKG Workload VIP Logical Segment
DHCP Service: No
Description & Recommendations: Virtual services for applications deployed in the workload cluster.
The applications can be of type Load Balancer or Ingress. Reserve sufficient IP addresses depending
on the number of applications planned to be deployed in the environment. NSX Advanced Load Balancer
takes care of IPAM on this network.

Note

You can also select the TKG Workload VIP network for control plane HA of the workload
cluster if you wish.

Subnet and CIDR Examples

For this demonstration, this document uses the following subnet CIDR for Tanzu for Kubernetes
Operations deployment.

Network Type                | Segment Name                  | Gateway CIDR   | DHCP Pool                     | NSX ALB IP Pool
NSX ALB Management Network  | sfo01-w01-vds01-albmanagement | 172.16.10.1/24 | N/A                           | 172.16.10.100 - 172.16.10.200
TKG Management VIP Network  | sfo01-w01-vds01-tkgclustervip | 172.16.80.1/24 | N/A                           | 172.16.80.100 - 172.16.80.200
TKG Management Network      | sfo01-w01-vds01-tkgmanagement | 172.16.40.1/24 | 172.16.40.100 - 172.16.40.200 | N/A
TKG Shared Service Network  | sfo01-w01-vds01-tkgshared     | 172.16.50.1/24 | 172.16.50.100 - 172.16.50.200 | N/A
TKG Workload Network        | sfo01-w01-vds01-tkgworkload   | 172.16.60.1/24 | 172.16.60.100 - 172.16.60.200 | N/A
TKG Workload VIP Network    | sfo01-w01-vds01-workloadvip   | 172.16.70.1/24 | 172.16.70.100 - 172.16.70.200 | N/A

These networks are spread across the tier-1 gateways shown in the reference architecture diagram.
You must configure the appropriate firewall rules on the tier-1 gateways for a successful deployment.

Firewall Requirements
To prepare the firewall, you must collect the following information:

1. NSX ALB Controller nodes and Cluster IP address

2. NSX ALB Management Network CIDR

3. TKG Management Network CIDR

4. TKG Shared Services Network CIDR

5. TKG Workload Network CIDR

6. TKG Management VIP Address Range

7. Client Machine IP Address

8. Bootstrap machine IP Address

9. Harbor registry IP address

10. vCenter Server IP

11. DNS server IP(s)

12. NTP Server(s)

13. NSX nodes and VIP address.

Source | Destination | Protocol:Port | Description | Configured On
NSX Advanced Load Balancer controllers and Cluster IP address | vCenter and ESXi hosts | TCP:443 | Allows NSX ALB to discover vCenter objects and deploy SEs as required. | NSX ALB Tier-1 Gateway
NSX Advanced Load Balancer controllers and Cluster IP address | NSX nodes and VIP address | TCP:443 | Allows NSX ALB to discover NSX objects (logical routers, logical segments, and so on). | NSX ALB Tier-1 Gateway
NSX Advanced Load Balancer management network CIDR | DNS Server | UDP:53 | DNS service | NSX ALB Tier-1 Gateway
NSX Advanced Load Balancer management network CIDR | NTP Server | UDP:123 | Time synchronization | NSX ALB Tier-1 Gateway
Client Machine | NSX Advanced Load Balancer controllers and Cluster IP address | TCP:443 | To access the NSX Advanced Load Balancer portal. | NSX ALB Tier-1 Gateway
Client Machine | Bootstrap VM IP address | SSH:22 | To deploy, configure, and manage TKG clusters. | TKG Mgmt Tier-1 Gateway
TKG management network CIDR, TKG shared services network CIDR | DNS Server | UDP:53 | DNS service | TKG Mgmt Tier-1 Gateway
TKG management network CIDR, TKG shared services network CIDR | NTP Server | UDP:123 | Time synchronization | TKG Mgmt Tier-1 Gateway
TKG management network CIDR, TKG shared services network CIDR | vCenter Server | TCP:443 | Allows components to access vCenter to create VMs and storage volumes. | TKG Mgmt Tier-1 Gateway
TKG management network CIDR, TKG shared services network CIDR | Harbor Registry | TCP:443 | Allows components to retrieve container images. This registry can be a local or a public image registry. | TKG Mgmt Tier-1 Gateway
TKG management network CIDR, TKG shared services network CIDR | TKG Management VIP Network | TCP:6443 | For the management cluster to configure the workload cluster. Allows the shared services cluster to register with the management cluster. | TKG Mgmt Tier-1 Gateway
TKG management network CIDR, TKG shared services network CIDR | NSX Advanced Load Balancer management network CIDR | TCP:443 | Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller. | TKG Mgmt Tier-1 Gateway
TKG workload network CIDR | DNS Server | UDP:53 | DNS service | TKG Workload Tier-1 Gateway
TKG workload network CIDR | NTP Server | UDP:123 | Time synchronization | TKG Workload Tier-1 Gateway
TKG workload network CIDR | vCenter Server | TCP:443 | Allows components to access vCenter to create VMs and storage volumes. | TKG Workload Tier-1 Gateway
TKG workload network CIDR | Harbor Registry | TCP:443 | Allows components to retrieve container images. This registry can be a local or a public image registry. | TKG Workload Tier-1 Gateway
TKG workload network CIDR | TKG Management VIP Network | TCP:6443 | Allow TKG workload clusters to register with the TKG management cluster. | TKG Workload Tier-1 Gateway
TKG workload network CIDR | NSX Advanced Load Balancer management network CIDR | TCP:443 | Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller. | TKG Workload Tier-1 Gateway
deny-all | any | any | Deny all other traffic. | All Tier-1 gateways

Installation Experience
The Tanzu Kubernetes Grid management cluster is the first component that you deploy to get started
with Tanzu Kubernetes Grid.

You can deploy the management cluster in one of the following ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method if you are
installing a Tanzu Kubernetes Grid management cluster for the first time.

Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.
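For example, the two methods map to the following Tanzu CLI commands; the configuration file path is a placeholder:

tanzu management-cluster create --ui

tanzu management-cluster create --file /path/to/mgmt-cluster-config.yaml -v 6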

By using the current version of the Tanzu Kubernetes Grid installer user interface, you can install
Tanzu Kubernetes Grid on VMware vSphere, AWS, and Microsoft Azure. The UI provides a guided
experience tailored to the IaaS, in this case VMware vSphere backed by NSX-T Data Center
networking.

The installation of Tanzu Kubernetes Grid on vSphere uses the same UI, but the workflow is tailored
to the vSphere environment.


This installation process takes you through the setup of a management cluster on your vSphere
environment. Once the management cluster is deployed, you can make use of Tanzu Mission
Control or Tanzu CLI to deploy Tanzu Kubernetes shared service and workload clusters.

Kubernetes Ingress Routing


The default installation of Tanzu Kubernetes Grid does not have any default ingress controller
deployed. Users can use Contour (available for installation through Tanzu Packages), or any third-
party ingress controller of their choice.

Contour is an open-source controller for Kubernetes ingress routing. Contour can be installed in the
shared services cluster or on any Tanzu Kubernetes cluster. Deploying Contour is a prerequisite if you
want to deploy the Prometheus, Grafana, and Harbor packages on a workload cluster.

For more information about Contour, see the Contour website and Implementing Ingress Control
with Contour.
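As an illustration, Contour is installed as a user-managed Tanzu package with commands similar to the following. The package version shown is a placeholder, and the exact flag names (for example, --package versus --package-name) depend on the Tanzu CLI version in use:

tanzu package available list contour.tanzu.vmware.com -A
tanzu package install contour \
  --package contour.tanzu.vmware.com \
  --version 1.24.4+vmware.1-tkg.1 \
  --values-file contour-data-values.yaml \
  --namespace tkg-system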

Another option is to use the NSX Advanced Load Balancer Kubernetes ingress controller that offers
an advanced L4-L7 load balancing/ingress for containerized applications that are deployed in the
Tanzu Kubernetes workload cluster.


For more information about the NSX ALB ingress controller, see Configuring L7 Ingress with NSX
Advanced Load Balancer.

Tanzu Service Mesh, which is a SaaS offering for modern applications running across multi-cluster,
multi-clouds, also offers an ingress controller based on Istio.

The following table provides general recommendations about using a specific ingress controller for
your Kubernetes environment.

Ingress Controller | Use Cases
Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the application's manifest file. It is a reliable solution for simple Kubernetes workloads.
Istio | Use the Istio ingress controller when you intend to provide security, traffic direction, and insights within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic).
NSX ALB ingress controller | Use the NSX ALB ingress controller when a containerized application requires features like local and global server load balancing (GSLB), web application firewall (WAF), performance monitoring, and so on.

NSX Advanced Load Balancer (ALB) as an L4+L7 Ingress Service Provider

NSX Advanced Load Balancer provides an L4+L7 load balancing solution for vSphere. It includes a
Kubernetes operator that integrates with the Kubernetes API to manage the lifecycle of load
balancing and ingress resources for workloads.

Legacy ingress services for Kubernetes include multiple disparate solutions. The services and
products contain independent components that are difficult to manage and troubleshoot. The ingress
services have reduced observability capabilities with little analytics, and they lack comprehensive
visibility into the applications that run on the system. Cloud-native automation is difficult in the legacy
ingress services.

In comparison to the legacy Kubernetes ingress services, NSX Advanced Load Balancer has
comprehensive load balancing and ingress services features. As a single solution with a central
control, NSX Advanced Load Balancer is easy to manage and troubleshoot. NSX Advanced Load
Balancer supports real-time telemetry with an insight into the applications that run on the system.
The elastic auto-scaling and the decision automation features highlight the cloud-native automation
capabilities of NSX Advanced Load Balancer.

NSX Advanced Load Balancer also lets you configure L7 ingress for your workload clusters by using
one of the following options:

L7 ingress in ClusterIP mode

L7 ingress in NodePortLocal mode

L7 ingress in NodePort mode

NSX Advanced Load Balancer L4 ingress with Contour L7 ingress

L7 Ingress in ClusterIP Mode

This option enables NSX Advanced Load Balancer L7 ingress capabilities, including sending traffic
directly from the service engines (SEs) to the pods, preventing multiple hops that other ingress
solutions need when sending packets from the load balancer to the right node where the pod runs.
The NSX Advanced Load Balancer controller creates a virtual service with a backend pool with the
pod IPs which helps to send the traffic directly to the pods.

However, each workload cluster needs a dedicated SE group for Avi Kubernetes Operator (AKO) to
work, which could increase the number of SEs you need for your environment. This mode is used
when you have a small number of workload clusters.

L7 Ingress in NodePort Mode

The NodePort mode is the default mode when AKO is installed on Tanzu Kubernetes Grid. This
option allows your workload clusters to share SE groups and is fully supported by VMware. With this
option, the services of your workloads must be set to NodePort instead of ClusterIP even when
accompanied by an ingress object. This ensures that NodePorts are created on the worker nodes
and traffic can flow through the SEs to the pods via the NodePorts. Kube-proxy, which runs on each
node as a DaemonSet, creates network rules to expose the application endpoints to each of the nodes
in the format “NodeIP:NodePort”. The NodePort value is the same for a service on all the nodes. It
exposes the port on all the nodes of the Kubernetes cluster, even if the pods are not running on them.

L7 Ingress in NodePortLocal Mode


This feature is supported only with Antrea CNI. You must enable this feature on a workload cluster
before its creation. The primary difference between this mode and the NodePort mode is that the
traffic is sent directly to the pods in your workload cluster through node ports without passing
through kube-proxy. With this option, the workload clusters can share SE groups. Similar to the
ClusterIP mode, this option avoids the potential extra hop when sending traffic from the NSX
Advanced Load Balancer SEs to the pod by targeting the right nodes where the pods run.

The Antrea agent configures NodePortLocal port mapping rules on the node in the format
“NodeIP:Unique Port” to expose each pod on the node on which the pod of the service is running.
The default port range is 61000-62000. Even if the pods of the service are running on the same
Kubernetes node, the Antrea agent publishes unique ports to expose the pods at the node level to
integrate with the load balancer.
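In Tanzu Kubernetes Grid, NodePortLocal is typically enabled through Antrea-related variables in the workload cluster configuration file before cluster creation. The following is a hedged sketch; the variable names reflect the documented Antrea settings and should be verified against your release:

ANTREA_NODEPORTLOCAL: "true"
ANTREA_NODEPORTLOCAL_PORTRANGE: "61000-62000"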

NSX Advanced Load Balancer L4 Ingress with Contour L7 Ingress

This option does not have all the NSX Advanced Load Balancer L7 ingress capabilities but uses it for
L4 load balancing only and leverages Contour for L7 Ingress. This also allows sharing SE groups
across workload clusters. This option is supported by VMware and it requires minimal setup.

Design Recommendations
NSX Advanced Load Balancer Recommendations
The following table provides the recommendations for configuring NSX Advanced Load Balancer in
a vSphere environment backed by NSX networking.

Decision ID | Design Decision | Design Justification | Design Implications
TKO-ALB-001 | Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB. | Isolates NSX ALB traffic from infrastructure management traffic and Kubernetes workloads. | An additional network (VLAN) is required.
TKO-ALB-002 | Deploy 3 NSX ALB controller nodes. | To achieve high availability for the NSX ALB platform. In clustered mode, NSX ALB availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible. | Clustered mode requires more compute and storage resources.
TKO-ALB-003 | Perform the initial setup on only one of the three deployed NSX Advanced Load Balancer controller VMs to create the NSX Advanced Load Balancer controller cluster. | The NSX Advanced Load Balancer controller cluster is created from an initialized NSX Advanced Load Balancer controller, which becomes the cluster leader. Follower NSX Advanced Load Balancer controller nodes need to be uninitialized to join the cluster. | NSX Advanced Load Balancer controller cluster creation fails if more than one NSX Advanced Load Balancer controller is initialized.
TKO-ALB-004 | Use static IP addresses for the NSX ALB controllers. | The NSX ALB controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes to management IP addresses are disruptive. | The NSX ALB controller control plane might go down if the management IP addresses of the controller nodes change.
TKO-ALB-005 | Use NSX ALB IPAM for the service engine data network and virtual services. | Guarantees IP address assignment for service engine data NICs and virtual services. | None
TKO-ALB-006 | Reserve an IP address in the NSX ALB management subnet to be used as the cluster IP address for the controller cluster. | The NSX ALB portal is always accessible over the cluster IP address regardless of an individual controller node failure. | None
TKO-ALB-007 | Share service engines for the same type of workload (dev/test/prod) clusters. | Minimizes the licensing cost. | Each service engine contributes to the CPU core capacity associated with a license. Sharing service engines can help reduce the licensing cost.
TKO-ALB-008 | Configure anti-affinity rules for the NSX ALB controller cluster. | This is to ensure that no two controllers end up on the same ESXi host and thus avoid a single point of failure. | Anti-affinity rules need to be created manually.
TKO-ALB-009 | Configure backup for the NSX ALB controller cluster. | Backups are required if the NSX ALB controller becomes inoperable or if the environment needs to be restored from a previous state. | To store backups, an SCP-capable backup location is needed. SCP is the only supported protocol currently.
TKO-ALB-010 | Create an NSX-T Cloud connector on the NSX Advanced Load Balancer controller for each NSX transport zone requiring load balancing. | An NSX-T Cloud connector configured on the NSX Advanced Load Balancer controller provides load balancing for workloads belonging to a transport zone on NSX. | None
TKO-ALB-011 | Replace the default NSX ALB certificates with custom CA or public CA-signed certificates that contain SAN entries of all controller nodes. | To establish a trusted connection with other infra components; the default certificate doesn't include SAN entries, which is not acceptable by Tanzu. | None. SAN entries are not applicable if using a wildcard certificate.
TKO-ALB-012 | Create a dedicated resource pool with appropriate reservations for NSX ALB controllers. | Guarantees the CPU and memory allocation for NSX ALB controllers and avoids performance degradation in case of resource contention. | None
TKO-ALB-013 | Configure remote logging for the NSX ALB controller to send events to syslog. | For operations teams to centrally monitor NSX ALB and escalate alerts on events sent from the NSX ALB controller. | Additional operational overhead. Additional infrastructure resources.
TKO-ALB-014 | Use LDAP/SAML based authentication for NSX ALB. | Helps to maintain role-based access control. | Additional configuration is required.

NSX Advanced Load Balancer Service Engine Recommendations


Decision ID | Design Decision | Design Justification | Design Implications
TKO-ALB-SE-001 | Configure the SE group for Active/Active HA mode. | Provides optimum resiliency, performance, and utilization. | Certain applications might not work in Active/Active mode. For example, applications that require preserving the client IP address. In such cases, use the legacy Active/Standby HA mode.
TKO-ALB-SE-002 | Configure an anti-affinity rule for the SE VMs. | This is to ensure that no two SEs in the same SE group end up on the same ESXi host and thus avoid a single point of failure. | DRS must be enabled on the vSphere cluster where SE VMs are deployed.
TKO-ALB-SE-003 | Configure CPU and memory reservation for the SE VMs. | This is to ensure that service engines don't compete with other VMs during resource contention. | CPU and memory reservation is configured at the SE group level.
TKO-ALB-SE-004 | Enable 'Dedicated dispatcher CPU' on SE groups that contain SE VMs of 4 or more vCPUs. Note: This setting must be enabled on SE groups that are servicing applications that have high network requirements. | This enables a dedicated core for packet processing, enabling a high packet pipeline on the SE VMs. | None.
TKO-ALB-SE-005 | Create multiple SE groups as desired to isolate applications. | Allows efficient isolation of applications for better capacity planning. Allows flexibility of lifecycle management. | Multiple SE groups will increase the licensing cost.
TKO-ALB-SE-006 | Create separate service engine groups for TKG management and workload clusters. | Allows isolating the load balancing traffic of the management cluster from the shared services cluster and workload clusters. | Dedicated service engine groups increase the licensing cost.
TKO-ALB-SE-007 | Set the 'Placement across the Service Engines' setting to 'Distributed'. | This allows maximum fault tolerance and even utilization of capacity. | None
TKO-ALB-SE-008 | Set the SE size to a minimum of 2 vCPUs and 4 GB of memory. | This configuration should meet the most generic use case. | For services that require higher throughput, these configurations need to be investigated and modified accordingly.


NSX Advanced Load Balancer L7 Ingress Recommendations


Decision ID | Design Decision | Design Justification | Design Implications
TKO-ALB-L7-001 | Deploy NSX ALB L7 ingress in NodePortLocal mode. | 1. Network hop efficiency is gained by bypassing the kube-proxy to receive external traffic to applications. 2. TKG clusters can share SE groups, optimizing or maximizing capacity and license consumption. 3. A pod's node port only exists on nodes where the pod is running, which helps to reduce east-west traffic and encapsulation overhead. 4. Better session persistence. | 1. This is supported only with Antrea CNI. 2. NodePortLocal mode is currently only supported for nodes running Linux or Windows with IPv4 addresses. Only TCP and UDP service ports are supported (not SCTP). For more information, see the Antrea NodePortLocal documentation.

VMware recommends using NSX Advanced Load Balancer L7 ingress with the NodePortLocal mode
as it gives you a distinct advantage over other modes as mentioned below:

Although ClusterIP mode has the constraint of one SE group per Tanzu Kubernetes Grid cluster,
which increases license consumption, it provides direct communication to the Kubernetes pods,
enabling persistence and direct monitoring of individual pods.

NodePort removes the need for a dedicated SE group per workload cluster, but the service's node
port is exposed on every worker node even if the pod is not running on it, there is no direct
connectivity to the pods, and session persistence is broken.

NodePortLocal is the best of both use cases. Traffic is sent directly to the pods in your
workload cluster through node ports without interfering with kube-proxy. SE groups can be
shared and load balancing persistence is supported.
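The ingress mode itself is selected in the AKODeploymentConfig (ADC) applied to the workload clusters. The following is a minimal sketch only, assuming the standard TKG ADC structure; the cloud, controller, service engine group, and network values reuse the sample names from this design, and the credential references assume the default secrets created by Tanzu Kubernetes Grid. Verify the field names against your release before use:

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: tanzu-ako-for-workload-l7-ingress
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      workload-l7-enabled: "true"
  controller: sfo01albctlr01.sfo01.rainpole.local
  serviceEngineGroup: sfo01w01segroup01
  dataNetwork:
    cidr: 172.16.70.0/24
    name: sfo01-w01-vds01-tkgworkloadvip
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: false
    ingress:
      disableIngressClass: false
      nodeNetworkList:
      - networkName: sfo01-w01-vds01-tkgworkload
        cidrs:
        - 172.16.60.0/24
      serviceType: NodePortLocal
      shardVSSize: MEDIUM

Workload clusters that carry the matching label in clusterSelector then receive AKO configured for NodePortLocal L7 ingress through the selected SE group.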

Network Recommendations
The key network recommendations for a production-grade Tanzu Kubernetes Grid deployment with
NSX Data Center Networking are as follows:

Decision ID | Design Decision | Design Justification | Design Implications
TKO-NET-001 | Use separate logical segments for the management cluster, shared services cluster, workload clusters, and VIP network. | To have flexible firewall and security policies. | Sharing the same network for multiple clusters can complicate firewall rules creation.
TKO-NET-002 | Configure DHCP for each TKG cluster network. | Tanzu Kubernetes Grid does not support static IP address assignments for Kubernetes VM components. | An IP address pool can be used for the TKG clusters in the absence of DHCP.
TKO-NET-003 | Use NSX for configuring DHCP. | This avoids setting up a dedicated DHCP server for TKG. | For a simpler configuration, make use of the DHCP local server to provide DHCP services for the required segments.
TKO-NET-004 | Create an overlay-backed NSX segment connected to a Tier-1 gateway for SE management for the NSX Cloud of overlay type. | This network is used for controller to SE connectivity. | None
TKO-NET-005 | Create an overlay-backed NSX segment as the data network for the NSX Cloud of overlay type. | The SEs are placed on overlay segments created on the Tier-1 gateway. | None

With Tanzu Kubernetes Grid 2.3 and above, you can use Node IPAM, which simplifies the allocation
and management of IP addresses for cluster nodes within the cluster. This eliminates the need for
external DHCP configuration. Node IPAM can be configured for standalone management clusters on
vSphere, and for the associated class-based workload clusters that they manage. In the Tanzu
Kubernetes Grid management cluster configuration file, a dedicated Node IPAM pool is defined for
the management cluster only. The following types of Node IPAM pools are available for workload
clusters:

InClusterIPPool - Configures IP pools that are only available to workload clusters in the same
management cluster namespace, for example, default.

GlobalInClusterIPPool - Configures IP pools with addresses that can be allocated to workload
clusters across multiple namespaces.

Node IPAM in TKG provides flexibility in managing IP addresses for both management and workload
clusters, which allows efficient IP allocation and management within the cluster environment.
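For illustration only, the management cluster pool is defined with configuration file variables similar to the following; the variable names follow the TKG 2.3 Node IPAM documentation and should be verified against your release, and all addresses are placeholders:

MANAGEMENT_NODE_IPAM_IP_POOL_GATEWAY: "172.16.40.1"
MANAGEMENT_NODE_IPAM_IP_POOL_ADDRESSES: "172.16.40.100-172.16.40.150"
MANAGEMENT_NODE_IPAM_IP_POOL_SUBNET_PREFIX: "24"

A namespaced pool for workload clusters is a Kubernetes object in the management cluster; a hedged sketch using the cluster-api in-cluster IPAM types is shown below:

apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: InClusterIPPool
metadata:
  name: workload-node-pool
  namespace: default
spec:
  gateway: 172.16.60.1
  addresses:
  - 172.16.60.100-172.16.60.200
  prefix: 24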

Tanzu Kubernetes Grid Clusters Recommendations


Decision ID | Design Decision | Design Justification | Design Implications
TKO-TKG-001 | Register the management cluster with Tanzu Mission Control (TMC). | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the lifecycle of all clusters centrally. | Only Antrea CNI is supported on workload clusters created via the TMC portal.
TKO-TKG-002 | Use NSX Advanced Load Balancer as your control plane endpoint provider and for application load balancing. | Eliminates the requirement for an external load balancer and additional configuration changes on your Tanzu Kubernetes Grid clusters. | Adds NSX Advanced Load Balancer license cost to the solution.
TKO-TKG-003 | Deploy the Tanzu Kubernetes Grid management cluster in the large form factor. | The large form factor should suffice to integrate the TKG management cluster with TMC, Pinniped, and Velero. It must be capable of accommodating 100+ Tanzu workload clusters. | Consumes more resources from the infrastructure.
TKO-TKG-004 | Deploy the Tanzu Kubernetes clusters with the prod plan (management and workload clusters). | Deploying three control plane nodes ensures the state of your Tanzu Kubernetes cluster control plane stays healthy in the event of a node failure. | Consumes more resources from the infrastructure.
TKO-TKG-005 | Enable identity management for Tanzu Kubernetes Grid clusters. | To avoid usage of administrator credentials and ensure that required users with the right roles have access to Tanzu Kubernetes Grid clusters. | Requires external identity management.
TKO-TKG-006 | Enable MachineHealthCheck for TKG clusters. | vSphere HA and MachineHealthCheck interoperability work together to enhance workload resiliency. | NA
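Several of these recommendations correspond to cluster configuration file variables. The following is a hedged sketch with illustrative values; variable availability depends on the Tanzu Kubernetes Grid release and the identity provider in use:

CLUSTER_PLAN: prod
SIZE: large
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: ldap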

Container Registry
VMware Tanzu for Kubernetes Operations using Tanzu Kubernetes Grid includes Harbor as a
container registry. Harbor provides a location for pushing, pulling, storing, and scanning container
images used in your Kubernetes clusters.

Harbor registry is used for day-2 operations of the Tanzu Kubernetes workload clusters. Typical day-
2 operations include tasks such as pulling images from Harbor for application deployment, pushing
custom images to Harbor, and so on.

You may use one of the following methods to install Harbor:

Tanzu Kubernetes Grid Package deployment - VMware recommends this installation


method for general use cases. The Tanzu packages, including Harbor, must either be pulled
directly from VMware or be hosted in an internal registry.

VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.

If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To do so, follow the procedure in Trust Custom CA
Certificates on Cluster Nodes.
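If you use the Tanzu package method, the Harbor installation follows the same user-managed package flow as the other packages, and Contour (with its prerequisites) must already be installed as noted earlier. The version string below is a placeholder, and the flag names depend on your Tanzu CLI version:

tanzu package available list harbor.tanzu.vmware.com -A
tanzu package install harbor \
  --package harbor.tanzu.vmware.com \
  --version 2.8.2+vmware.2-tkg.1 \
  --values-file harbor-data-values.yaml \
  --namespace tkg-system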


Tanzu Kubernetes Grid Monitoring


Monitoring for the Tanzu Kubernetes clusters is provided through Prometheus and Grafana. Both
Prometheus and Grafana can be installed on Tanzu Kubernetes Grid clusters through Tanzu
Packages.

Prometheus is an open-source system monitoring and alerting toolkit. It can collect metrics from
target clusters at specified intervals, evaluate rule expressions, display the results, and trigger alerts if
certain conditions arise. The Tanzu Kubernetes Grid implementation of Prometheus includes Alert
Manager, which you can configure to notify you when certain events occur.

Grafana is open-source visualization and analytics software. It allows you to query, visualize, alert on,
and explore your metrics no matter where they are stored. Both Prometheus and Grafana are
installed through user-managed Tanzu packages by creating the deployment manifests and invoking
the tanzu package install command to deploy the packages in the Tanzu Kubernetes clusters.
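For example, the two packages are installed with commands similar to the following; the version strings are placeholders and the flag names depend on the Tanzu CLI version in use:

tanzu package install prometheus \
  --package prometheus.tanzu.vmware.com \
  --version 2.43.0+vmware.2-tkg.1 \
  --values-file prometheus-data-values.yaml \
  --namespace tkg-system

tanzu package install grafana \
  --package grafana.tanzu.vmware.com \
  --version 9.5.1+vmware.2-tkg.1 \
  --values-file grafana-data-values.yaml \
  --namespace tkg-system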

The following diagram shows how the monitoring components on a cluster interact.

You can use out-of-the-box Kubernetes dashboards or you can create new dashboards to monitor
compute, network, and storage utilization of Kubernetes objects such as Clusters, Namespaces,
Pods, and so on.

You can also monitor your Tanzu Kubernetes Grid clusters with Tanzu Observability which is a SaaS
offering by VMware. Tanzu Observability provides various out-of-the-box dashboards. You can
customize the dashboards for your particular deployment. For information on how to customize
Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu
Observability Dashboard for Tanzu for Kubernetes Operations.

Tanzu Kubernetes Grid Logging


Metrics and logs are critical for any system or application as they provide insights into the activities of
the system or the application. It is important to have a central place to observe a multitude of metrics
and log sources from multiple endpoints.

Log processing and forwarding in Tanzu Kubernetes Grid is provided via Fluent Bit. Fluent Bit
binaries are available as part of extensions and can be installed on the management cluster or on
workload clusters. Fluent Bit is a lightweight log processor and forwarder that allows you to collect
data and logs from different sources, unify them, and send them to multiple destinations. VMware
Tanzu Kubernetes Grid includes signed binaries for Fluent Bit that you can deploy on management
clusters and on Tanzu Kubernetes clusters to provide a log-forwarding service.

Fluent Bit uses input plug-ins, filters, and output plug-ins. The input plug-ins define the sources from
which it collects data, and the output plug-ins define the destinations to which it sends the
information. The Kubernetes filter enriches the logs with Kubernetes metadata, specifically labels and
annotations. You configure the input and output plug-ins on the Tanzu Kubernetes Grid cluster when
you install Fluent Bit as a user-managed package.

Fluent Bit integrates with logging platforms such as VMware Aria Operations for Logs, Elasticsearch,
Kafka, Splunk, or an HTTP endpoint. For more details about configuring Fluent Bit to your logging
provider, see Implement Log Forwarding with Fluent Bit.
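As a minimal sketch, assuming the fluent-bit package data values expose the standard Fluent Bit configuration sections, a syslog output to a log platform could look like the following; the host name is a placeholder and the data values structure should be checked against your package version:

fluent_bit:
  config:
    outputs: |
      [OUTPUT]
        Name                 syslog
        Match                *
        Host                 vrli.example.local
        Port                 514
        Mode                 tcp
        Syslog_Format        rfc5424
        Syslog_Hostname_key  tkg_cluster
        Syslog_Message_key   message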

Bring Your Own Images for Tanzu Kubernetes Grid Deployment
You can build custom machine images for Tanzu Kubernetes Grid to use as a VM template for the
management and Tanzu Kubernetes (workload) cluster nodes that it creates. Each custom machine
image packages a base operating system (OS) version and a Kubernetes version, along with any
additional customizations, into an image that runs on vSphere, Microsoft Azure infrastructure, and
AWS (EC2) environments.

A custom image must be based on the operating system (OS) versions that are supported by Tanzu
Kubernetes Grid. The table below provides a list of the operating systems that are supported for
building custom images for Tanzu Kubernetes Grid.

vSphere AWS Azure

- Ubuntu 20.04 - Ubuntu 20.04 - Ubuntu 20.04

- Ubuntu 18.04 - Ubuntu 18.04 - Ubuntu 18.04

- RHEL 8 - Amazon Linux 2

- Photon OS 3

- Windows 2019

For more information about building custom images for Tanzu Kubernetes Grid, see Build Machine
Images.

Linux Custom Machine Images

Windows Custom Machine Images

Compliance and Security


VMware-published Tanzu Kubernetes releases (TKrs), along with compatible versions of Kubernetes
and supporting components, use the latest stable and generally available update of the OS version
that they package. They contain all current CVE and USN fixes as of the day that the image is built.
The image files are signed by VMware and have file names that contain a unique hash identifier.

VMware provides a FIPS-capable Kubernetes OVA, which can be used to deploy FIPS-compliant
Tanzu Kubernetes Grid management and workload clusters. Tanzu Kubernetes Grid core
components, such as Kubelet, Kube-apiserver, Kube-controller manager, Kube-proxy, Kube-
scheduler, Kubectl, Etcd, Coredns, Containerd, and Cri-tool are made FIPS compliant by compiling
them with the BoringCrypto FIPS modules, an open-source cryptographic library that provides FIPS
140-2 approved algorithms.

Tanzu Kubernetes Grid and Tanzu SaaS Integration


The SaaS products in the VMware Tanzu portfolio are in the critical path for securing systems at the
heart of your IT infrastructure. VMware Tanzu Mission Control provides a centralized control plane for
Kubernetes, and Tanzu Service Mesh provides a global control plane for service mesh networks.
Tanzu Observability provides Kubernetes monitoring, application observability, and service insights.

To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.

Appendix A - Configure Node Sizes


The Tanzu CLI creates the individual nodes of management clusters and Tanzu Kubernetes clusters
according to the settings that you provide in the configuration file.

On vSphere, you can configure all node VMs to have the same predefined configurations, set
different predefined configurations for control plane and worker nodes, or customize the
configurations of the nodes. By using these settings, you can create clusters that have nodes with
different configuration compared to the configuration of management cluster nodes. You can also
create clusters in which the control plane nodes and worker nodes have different configuration.

Use Predefined Node Configuration


The Tanzu CLI provides the following predefined configuration for cluster nodes:

Size CPU Memory (in GB) Disk (in GB)

Small 2 4 20

Medium 2 8 40

Large 4 16 40

Extra-large 8 32 80

To create a cluster in which all of the control plane and worker node VMs are the same size, specify
the SIZE variable. If you set the SIZE variable, all nodes are created with the configuration that you
set.

SIZE: "large"

To create a cluster in which the control plane and worker node VMs are different sizes, specify the
CONTROLPLANE_SIZE and WORKER_SIZE options.

CONTROLPLANE_SIZE: "medium"

WORKER_SIZE: "large"

You can combine the CONTROLPLANE_SIZE and WORKER_SIZE options with the SIZE option. For
example, if you specify SIZE: "large" with WORKER_SIZE: "extra-large", the control plane nodes
are set to large and worker nodes are set to extra-large.


SIZE: "large"

WORKER_SIZE: "extra-large"

Define Custom Node Configurations


You can customize the configuration of the nodes rather than using the predefined configurations.

To use the same custom configuration for all nodes, specify the VSPHERE_NUM_CPUS,
VSPHERE_DISK_GIB, and VSPHERE_MEM_MIB options.

VSPHERE_NUM_CPUS: 2

VSPHERE_DISK_GIB: 40

VSPHERE_MEM_MIB: 4096

To define different custom configurations for control plane nodes and worker nodes, specify the
VSPHERE_CONTROL_PLANE_* and VSPHERE_WORKER_* options.

VSPHERE_CONTROL_PLANE_NUM_CPUS: 2

VSPHERE_CONTROL_PLANE_DISK_GIB: 20

VSPHERE_CONTROL_PLANE_MEM_MIB: 8192

VSPHERE_WORKER_NUM_CPUS: 4

VSPHERE_WORKER_DISK_GIB: 40

VSPHERE_WORKER_MEM_MIB: 4096

Appendix B - NSX Advanced Load Balancer Sizing Guidelines


NSX Advanced Load Balancer Controller Sizing Guidelines
Regardless of NSX Advanced Load Balancer Controller configuration, each controller cluster can
achieve up to 5000 virtual services, which is a hard limit. For further details, see Sizing Compute and
Storage Resources for NSX Advanced Load Balancer Controller(s).

Controller Size VM Configuration Virtual Services Avi SE Scale

Essential 4 vCPUS, 24 GB RAM 0-50 0-10

Small 6 vCPUS, 24 GB RAM 0-200 0-100

Medium 10 vCPUS, 32 GB RAM 200-1000 100-200

Large 16 vCPUS, 48 GB RAM 1000-5000 200-400

Service Engine Sizing Guidelines


For guidance on sizing your service engines (SEs), see Sizing Compute and Storage Resources for
NSX Advanced Load Balancer Service Engine(s).

Performance Metric | Per 1 vCPU Core
Throughput | 4 Gb/s
Connections/s | 40k
SSL Throughput | 1 Gb/s
SSL TPS (RSA2K) | ~600
SSL TPS (ECC) | 2500

Multiple performance vectors or features may have an impact on performance. For instance, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load
Balancer recommends two cores.

NSX Advanced Load Balancer SEs may be configured with as little as 1 vCPU core and 1 GB RAM, or
up to 36 vCPU cores and 128 GB RAM. SEs can be deployed in Active/Active or Active/Standby
mode depending on the license tier used. NSX Advanced Load Balancer Essentials license doesn’t
support Active/Active HA mode for SE.

Summary
Tanzu Kubernetes Grid on vSphere offers high-performance potential and convenience, and
addresses the challenges of creating, testing, and updating on-premises Kubernetes platforms in a
consolidated production environment. This validated approach results in a near-production-quality
installation with all the application services needed to serve combined or uniquely separated
workload types through a combined infrastructure solution.

This plan meets many day-0 needs for quickly aligning product capabilities to full stack infrastructure,
including networking, firewalling, load balancing, workload compute alignment, and other
capabilities. Observability is quickly established and easily consumed with Tanzu Observability.

Deployment Instructions
For instructions to deploy this reference design, see Deploy VMware Tanzu for Kubernetes
Operations on VMware vSphere with VMware NSX.

Deploy VMware Tanzu for Kubernetes Operations on VMware vSphere with VMware NSX

This document provides step-by-step instructions for deploying VMware Tanzu for Kubernetes
Operations (informally known as TKO) in an Internet-available vSphere environment backed by NSX
Data Center networking.

The scope of the document is limited to providing deployment steps based on the reference design
in VMware Tanzu for Kubernetes Operations on vSphere with NSX-T. It does not cover deployment
procedures for the underlying SDDC components.

Deploying with VMware Service Installer for Tanzu


You can use VMware Service Installer for VMware Tanzu to automate this deployment.

VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.

To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on vSphere with NSX-T Using Service Installer for VMware Tanzu.

Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.

Supported Component Matrix


The following table lists the validated software components that can be used to install Tanzu
Kubernetes Grid on your vSphere with NSX environment:

Software Components Version

Tanzu Kubernetes Grid 2.3.0

VMware vSphere ESXi 8.0 U1 or later

VMware vCenter (VCSA) 8.0 U1 or later

NSX Advanced Load Balancer 22.1.3

VMware NSX-T 4.1.0.2

For the latest information about which software versions can be used together, see the
Interoperability Matrix.

Prepare Environment for Deploying Tanzu for Kubernetes Operations

Before deploying Tanzu for Kubernetes Operations on vSphere, ensure that your environment is set
up as described in the following requirements:

General Requirements

Network Requirements

Firewall Requirements

General Requirements
A vCenter Server with an NSX-backed environment.

Ensure that the following NSX configurations are completed:

Note

The following provides only a high-level overview of the required NSX configuration. For more
information, see NSX Data Center Installation Guide and NSX Data Center Product Documentation.

NSX manager instance is deployed and configured with Advanced or higher license.


vCenter Server that is associated with the NSX Data Center is configured as Compute
Manager.

Required overlay and vLAN Transport Zones are created.

IP pools for host and edge tunnel endpoints (TEP) are created.

Host and edge uplink profiles are in place.

Transport node profiles are created. This is not required if configuring NSX data
center on each host instead of the cluster.

NSX data center configured on all hosts part of the vSphere cluster or clusters.

Edge transport nodes and at least one edge cluster is created.

Tier-0 uplink segments and tier-0 gateway is created.

Tier-0 router is peered with uplink L3 switch.

DHCP profile is created in NSX.

SDDC environment has the following objects in place:


A vSphere cluster with at least three hosts on which vSphere DRS is enabled and
NSX is successfully configured.

A dedicated resource pool to deploy the Tanzu Kubernetes Grid management cluster,
shared services cluster, and workload clusters. The number of required resource pools
depends on the number of workload clusters to be deployed.

VM folders to collect the Tanzu Kubernetes Grid VMs.

A datastore with sufficient capacity for the control plane and worker node VM files.

Network time protocol (NTP) service is running on all hosts and vCenter.

A host, server, or VM based on Linux, macOS, or Windows that acts as your bootstrap
machine and has Docker installed. For this deployment, a virtual machine based on
Photon OS is used.

Depending on the OS flavor of the bootstrap VM, download and configure the
following packages from VMware Customer Connect. To configure the required
packages on a CentOS machine, see Deploy and Configure Bootstrap Machine. An
example installation of these CLIs is shown after this requirements list.

Tanzu CLI 2.3.0

Kubectl cluster CLI 1.26.5

A vSphere account with permissions as described in Required Permissions for the


vSphere Account.

Download and import NSX Advanced Load Balancer 22.1.3 OVA to Content Library.

Download the following OVA files from VMware Customer Connect and import them to
vCenter. Convert the imported VMs to templates.

Photon v3 Kubernetes v1.26.5 OVA and/or

Ubuntu 2004 Kubernetes v1.26.5 OVA

Note


You can also download supported older versions of Kubernetes from VMware
Customer Connect and import them to deploy workload clusters on the intended
Kubernetes versions.

In Tanzu Kubernetes Grid nodes, it is recommended to not use hostnames with


“.local” domain suffix. For more information, see KB article.
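A hedged example of configuring the downloaded CLIs on a Linux bootstrap machine follows; the archive and binary file names vary by Tanzu CLI version and download, so adjust them to match the bundles you obtained from VMware Customer Connect:

tar -xvf tanzu-cli-linux-amd64.tar.gz
sudo install tanzu-cli-linux_amd64 /usr/local/bin/tanzu
tanzu version
tanzu plugin sync

gunzip kubectl-linux-v1.26.5+vmware.2.gz
chmod +x kubectl-linux-v1.26.5+vmware.2
sudo install kubectl-linux-v1.26.5+vmware.2 /usr/local/bin/kubectl
kubectl version --client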

Resource Pools and VM Folders:

The sample entries of the resource pools and folders that need to be created are as follows:

Resource Type Sample Resource Pool Name Sample Folder Name

NSX ALB Components tkg-vsphere-alb-components tkg-vsphere-alb-components

TKG Management components tkg-management-components tkg-management-components

TKG Shared Service Components tkg-vsphere-shared-services tkg-vsphere-shared-services

TKG Workload components tkg-vsphere-workload tkg-vsphere-workload

Network Requirements
Create separate logical segments in NSX for deploying TKO components as per Network
Requirements defined in the reference architecture.

Firewall Requirements
Ensure that the firewall is set up as described in Firewall Requirements.

Subnet and CIDR Example


For this demonstration, this document uses the following subnet CIDR for Tanzu for Kubernetes
Operations deployment:

Network Type | Segment Name | Gateway CIDR | DHCP Pool in NSX | NSX ALB IP Pool
NSX ALB Management Network | sfo01-w01-vds01-albmanagement | 172.16.10.1/24 | N/A | 172.16.10.100 - 172.16.10.200
TKG Cluster VIP Network | sfo01-w01-vds01-tkgclustervip | 172.16.80.1/24 | N/A | 172.16.80.100 - 172.16.80.200
TKG Management Network | sfo01-w01-vds01-tkgmanagement | 172.16.40.1/24 | 172.16.40.100 - 172.16.40.200 | N/A
TKG Shared Service Network | sfo01-w01-vds01-tkgshared | 172.16.50.1/27 | 172.16.50.100 - 172.16.50.200 | N/A
TKG Workload Network | sfo01-w01-vds01-tkgworkload | 172.16.60.1/24 | 172.16.60.100 - 172.16.60.200 | N/A
TKG Workload VIP Network | sfo01-w01-vds01-tkgworkloadvip | 172.16.70.1/24 | N/A | 172.16.70.100 - 172.16.70.200


Deployment Overview
The steps for deploying Tanzu for Kubernetes Operations on vSphere backed by NSX-T are as
follows:

1. Configure T1 Gateway and Logical Segments in NSX Data Center

2. Deploy and Configure NSX Advanced Load Balancer

3. Deploy and Configure Bootstrap Machine

4. Deploy Tanzu Kubernetes Grid Management Cluster

5. Register Tanzu Kubernetes Grid Management Cluster with Tanzu Mission Control

6. Deploy Tanzu Kubernetes Grid Shared Service Cluster

7. Deploy Tanzu Kubernetes Grid Workload Cluster

8. Integrate Tanzu Kubernetes Clusters with Tanzu Observability

9. Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh

10. Deploy User-Managed Packages on Tanzu Kubernetes Grid Clusters

Configure T1 Gateway and Logical Segments in NSX-T Data Center
As a prerequisite, an NSX backed vSphere environment must be configured with at least one tier-0
gateway. A tier-0 gateway performs the functions of a tier-0 logical router. It processes traffic
between the logical and physical networks. For more information about creating and configuring a
tier-0 gateway, see NSX documentation.

This procedure comprises the following tasks:

1. Add two Tier-1 Gateways

2. Create Overlay-Backed Segments

Add a Tier-1 Gateway


The tier-1 logical router must be connected to the tier-0 logical router to get the northbound
physical router access. The following procedure provides the minimum required configuration to
create a tier-1 gateway, which is adequate to successfully deploy the Tanzu for Kubernetes
Operations stack. For a more advanced configuration, see NSX documentation.

1. With admin privileges, log in to NSX Manager.

2. Select Networking > Tier-1 Gateways.

3. Click Add Tier-1 Gateway.

4. Enter a name for the gateway.

5. Select a tier-0 gateway to connect to this tier-1 gateway to create a multi-tier topology.

6. Select an NSX Edge cluster. This is required for this tier-1 gateway to host stateful services
such as NAT, load balancer, or firewall.

7. (Optional) In the Edges field, select Auto Allocated or manually set the edge nodes.


8. Select a failover mode or accept the default. The default option is Non-preemptive.

9. Select Enable Standby Relocation.

10. Click Route Advertisement and ensure that following routes are selected:

All DNS Forwarder Routes

All Connected Segments and Service Ports

All IPSec Local Endpoints

All LB VIP Routes

All LB SNAT IP Routes

11. Click Save.

12. Repeat steps from 1-11 and create another Tier-1 gateway.

DHCP configuration on Tier-1 Gateway

Complete the following steps to set the DHCP configuration in both the tier-1 gateways:


1. With admin privileges, log in to NSX Manager.

2. Select Networking > Tier-1 Gateways.

3. On the tier-1 gateway that you created earlier, click the three dots menu and select Edit.

4. Next to DHCP Config, click Set.

5. In the Set DHCP Configuration dialog box, set Type to DHCP Server and select the DHCP
profile that you created as part of the prerequisites.

6. Click Save.

Create Overlay-Backed Segments


VMware NSX provides the option to add two kinds of segments: overlay-backed segments and
VLAN-backed segments. Segments are created as part of a transport zone. There are two types of
transport zones: VLAN transport zones and overlay transport zones. A segment created in a VLAN
transport zone is a VLAN-backed segment and a segment created in an overlay transport zone is an
overlay-backed segment.

Create the overlay-backed logical segments as shown in the Overlay-backed segments CIDR
example. All these segments will be a part of the same overlay transport zone and they must be
connected to a tier-1 gateway.

Note: NSX ALB Management Network, TKG Cluster VIP Network, TKG Management Network, and
TKG Shared Service Network must be connected to sfo01w01tier1 while TKG Workload Network and
TKG Workload VIP Network should be connected to sfo01w01tier2.

Note: If you want the TKG Cluster VIP Network to be used for applications deployed in workload
clusters, connect all network segments to the sfo01w01tier1 tier-1 gateway.

The following procedure provides required details to create one such network which is required for
the Tanzu for Kubernetes Operations deployment:

1. With admin privileges, log in to NSX Manager

2. Select Networking > Segments.

3. Click ADD SEGMENT and enter a name for the segment. For example, sfo01-w01-vds01-
tkgmanagement

4. Under Connected Gateway, select the tier-1 gateway that you created earlier.

5. Under Transport Zone, select a transport zone that will be an overlay transport zone.

6. Under Subnets, enter the gateway IP address of the subnet in the CIDR format. For
example, 172.16.40.1/24

Note

The following step is required only for Tanzu Kubernetes Grid management
network, shared services network, and workload network.

7. Click SET DHCP CONFIG.

DHCP Type field is set to Gateway DHCP Server and DHCP Profile is set to the profile
created while creating the tier-1 gateway.

1. Click Settings, select Enable DHCP Config, and enter the DHCP range and DNS
server information.


2. Click Options and under Select DHCP Options, select GENERIC OPTIONS.

3. Click ADD GENERIC OPTION, Add NTP servers (42) and Domain Search (119).

4. Click Save to create the logical segment.

Repeat steps 1-7 to create all other required overlay-backed segments. Once completed, you should
see an output similar to the following screenshot:


Additionally, you can create the required inventory groups and firewall rules. For more information,
see NSX Data Center Product Documentation.

Deploy and Configure NSX Advanced Load Balancer


NSX Advanced Load Balancer (ALB) is an enterprise-grade integrated load balancer that provides
L4-L7 load balancing support.

NSX Advanced Load Balancer is deployed in Write Access Mode in the vSphere Environment
backed by NSX. This mode grants NSX Advanced Load Balancer controllers full write access to the
vCenter or NSX which helps in automatically creating, modifying, and removing service engines
(SEs) and other resources as needed to adapt to changing traffic needs.

The sample IP address and FQDN set for the NSX Advanced Load Balancer controllers is as follows:

Controller Node IP Address FQDN

Node 1 Primary 172.16.10.11 sfo01albctlr01a.sfo01.rainpole.local

Node 2 Secondary 172.16.10.12 sfo01albctlr01b.sfo01.rainpole.local

Node 3 Secondary 172.16.10.13 sfo01albctlr01c.sfo01.rainpole.local

HA Address 172.16.10.10 sfo01albctlr01.sfo01.rainpole.local

Deploy NSX Advanced Load Balancer


As part of the prerequisites, you must have the NSX Advanced Load Balancer 22.1.3 OVA
downloaded and imported to the content library. Deploy the NSX Advanced Load Balancer under
the resource pool tkg-vsphere-alb-components and place it under the folder tkg-vsphere-alb-
components.

To deploy NSX Advanced Load Balancer, complete the following steps.

1. Log in to vCenter and navigate to Home > Content Libraries.

2. Select the content library under which the NSX-ALB OVA is placed.

3. Click on OVA & OVF Templates.

4. Right-click the NSX Advanced Load Balancer image and select New VM from this Template.

5. On the Select name and folder page, enter a name and select a folder for the NSX
Advanced Load Balancer VM as tkg-vsphere-alb-components.

6. On the Select a compute resource page, select the resource pool tkg-vsphere-alb-
components.

7. On the Review details page, verify the template details and click Next.

8. On the Select storage page, select a storage policy from the VM Storage Policy drop-down
menu and choose the datastore location where you want to store the virtual machine files.

9. On the Select networks page, select the network sfo01-w01-vds01-albmanagement and click
Next.

10. On the Customize template page, provide the NSX Advanced Load Balancer management
network details such as IP address, subnet mask, and gateway, and then click Next.

11. On the Ready to complete page, review the provided information and click Finish.

A new task for creating the virtual machine appears in the Recent Tasks pane. After the task is
complete, the NSX Advanced Load Balancer virtual machine is created on the selected resource.
Power on the virtual machine and give it a few minutes for the system to boot. Upon successful boot
up, navigate to NSX Advanced Load Balancer on your browser.

Note

While the system is booting up, a blank web page or a 503 status code might
appear.

NSX Advanced Load Balancer: Initial Setup


Once NSX Advanced Load Balancer is successfully deployed and running, navigate to NSX
Advanced Load Balancer on your browser using the URL https://<IP/FQDN> and configure the
basic system settings:

1. Set admin password and click on Create Account.

2. On the Welcome page, under System Settings, set backup passphrase and provide DNS
information, and then click Next.

3. Under Email/SMTP, provide email and SMTP information, and then click Next.


4. Under Multi-Tenant, configure settings as follows and click Save.

IP Route Domain: Share IP route domain across tenants

Service Engines are managed within the: Provider (Shared across tenants)

Tenant Access to Service Engine: Read

If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch, and you are directed to a dashboard
view on the controller.

NSX Advanced Load Balancer: NTP Configuration


To configure NTP, navigate to Administration > System Settings > Edit the System Settings and
Select DNS/NTP. Add your NTP server details, and then click Save.

Note


You may also delete the default NTP servers.

NSX Advanced Load Balancer: Licensing


This document focuses on enabling NSX Advanced Load Balancer using the license model:
Enterprise License (VMware NSX ALB Enterprise).

1. To configure licensing, navigate to Administration > Licensing, and click on the gear icon to
change the license type to Enterprise.


2. Select Enterprise Tier as the license type and click Save.

3. Once the license tier is changed, apply the NSX Advanced Load Balancer Enterprise license
key. If you have a license file instead of a license key, apply the license by selecting the
Upload a License File option.


NSX Advanced Load Balancer: Controller High Availability


In a production environment, it is recommended to deploy additional controller nodes and configure
the controller cluster for high availability and disaster recovery. Adding 2 additional nodes to create a
3-node cluster provides node-level redundancy for the controller and also maximizes performance
for CPU-intensive analytics functions.

To run a 3-node controller cluster, you deploy the first node and perform the initial configuration,
and set the cluster IP address. After that, you deploy and power on two more controller VMs, but
you must not run the initial configuration wizard or change the admin password for these controllers
VMs. The configuration of the first controller VM is assigned to the two new controller VMs.

The first controller of the cluster receives the Leader role. The second and third controllers work as
Follower.

Complete the following steps to configure NSX Advanced Load Balancer cluster:

1. Log in to the primary NSX Advanced Load Balancer controller and navigate to Administration
> Controller > Nodes, and then click Edit.

2. Specify Name and Controller Cluster IP, and then click Save. This IP address must be from
the NSX ALB management network.

3. Deploy the 2nd and 3rd NSX Advanced Load Balancer controller nodes by using steps in
Deploy NSX Advanced Load Balancer.

4. Log in to the primary NSX Advanced Load Balancer controller using the Controller Cluster
IP/FQDN and navigate to Administration > Controller > Nodes, and then click Edit. The Edit
Controller Configuration popup appears.


5. In the Cluster Nodes field, enter the IP address for the 2nd and 3rd controller, and then click
Save.

After you complete these steps, the primary NSX Advanced Load Balancer controller
becomes the leader for the cluster and invites the other controllers to the cluster as
members.

NSX Advanced Load Balancer then performs a warm reboot of the cluster. This process can
take approximately 10-15 minutes. You will be automatically logged out of the controller node
where you are currently logged in. On entering the cluster IP address in the browser, you
can see details about the cluster formation task.

The configuration of the primary (leader) controller is synchronized to the new member nodes when
the cluster comes online following the reboot. Once the cluster is successfully formed, you can see
the following status:


Note

In the following tasks, all NSX Advanced Load Balancer configurations are done by
connecting to the NSX Advanced Load Balancer Controller Cluster IP/FQDN.

NSX Advanced Load Balancer: Certificate Management


The default system-generated controller certificate for SSL/TLS connections does not have the
required subject alternate name (SAN) entries. Complete the following steps to create a controller
certificate:

1. Log in to the NSX Advanced Load Balancer controller and navigate to Templates > Security
> SSL/TLS Certificates.

2. Click Create and select Controller Certificate. You can either generate a self-signed
certificate, generate CSR, or import a certificate. For the purpose of this document, a self-
signed certificate will be generated.

3. Provide all required details as per your infrastructure requirements and in the Subject
Alternate Name (SAN) field, provide IP address and FQDN of all NSX Advanced Load
Balancer controllers including NSX Advanced Load Balancer cluster IP and FQDN, and then
click Save.

4. Once the certificate is created, capture the certificate contents as this is required while
deploying the Tanzu Kubernetes Grid management cluster. To capture the certificate
content, click on the Download icon next to the certificate, and then click Copy to clipboard
under Certificate.

5. To replace the certificate, navigate to Administration > Settings > Access Settings, and click
the pencil icon at the top right to edit the system access settings, and then replace the
SSL/TLS certificate and click Save.

6. Log out and log in to NSX Advanced Load Balancer.

Create Credentials
NSX Advanced Load Balancer requires credentials of VMware NSX and vCenter Server to
authenticate with these endpoints. These credentials need to be created before configuring NSX
Cloud.

To create a new credential, navigate to Administration > User Credentials and click Create.

1. Create NSX Credential: Select the credential type as NSX-T and provide a name for the
credential. Under the section NSX-T Credentials, specify the username and password that
NSX Advanced Load Balancer will use to authenticate with VMware NSX.


2. Create vCenter Credential: Select the credential type as vCenter and provide a name for the
credential. Under the section vCenter Credentials, specify the username and password that
NSX Advanced Load Balancer will use to authenticate with vCenter server.

Create NSX Cloud and Service Engine Groups


NSX Advanced Load Balancer can be deployed in multiple environments for the same system. Each
environment is called a cloud. The following procedure provides steps to create a VMware NSX
cloud. As per the architecture, two service engine (SE) groups will be created.

Service Engine Group 1: Service engines associated with this service engine group host:

Virtual services that load balance the control plane nodes of the management cluster and shared services cluster.

Virtual services for all load balancer functionalities requested by the Tanzu Kubernetes Grid management cluster and shared services cluster.

Service Engine Group 2: Service engines that are part of this service engine group host the virtual services that load balance the control plane nodes, and the virtual services for all load balancer functionalities requested by the workload clusters mapped to this SE group.

Note

- Based on your requirements, you can create additional SE groups for the workload clusters.
- Multiple workload clusters can be mapped to a single SE group.
- A Tanzu Kubernetes Grid cluster can be mapped to only one SE group for application load balancer services.
- The control plane VIP for the workload clusters will be placed on the respective Service Engine group assigned through AKO Deployment Config (ADC) during cluster creation.

For more information about mapping a specific service engine group to Tanzu Kubernetes Grid
workload cluster, see Configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid Workload
Cluster.

The following components are created in NSX Advanced Load Balancer.

| Object | Sample Name |
| --- | --- |
| NSX Cloud | sfo01w01vc01 |
| Service Engine Group 1 | sfo01m01segroup01 |
| Service Engine Group 2 | sfo01w01segroup01 |

1. Log in to NSX Advanced Load Balancer and navigate to Infrastructure > Clouds > Create >
NSX-T Cloud.

2. Enter a cloud name and provide an object name prefix. Click CHANGE CREDENTIALS to
connect NSX Advanced Load Balancer with VMware NSX.


3. Specify NSX-T Manager Address and select the NSX-T credential that you created earlier.

4. Under the Management Network pane, select the following:

Transport Zone: Overlay transport zone where you connect the NSX Advanced Load
Balancer management network.

Tier-1 Router: Tier-1 gateway where the Advanced Load Balancer management
network is connected.

Overlay Segment: Logical segment that you have created for the Advanced Load
Balancer management.

5. Under the Data Networks pane, select the following:

Transport Zone: Overlay transport zone where you connected the Tanzu Kubernetes
Grid VIP networks.

Tier-1 Router: Tier-1 gateway sfo01w01tier1 where the TKG Cluster VIP Network
network is connected.

Overlay Segment: Logical segment that you have created for TKG Cluster VIP
Network.

Tier-1 Router: Tier-1 gateway sfo01w01tier2 where TKG Workload VIP Network is
connected.


Overlay Segment: Logical segment that you created for the TKG Workload VIP
Network.

Note: For single VIP network architecture, do not add the sfo01w01tier2 tier-1 gateway and its
associated overlay segment under Data Network Segments.

6. Under vCenter Servers pane, click ADD.

7. Specify a name for the vCenter server and click CHANGE CREDENTIALS to connect NSX
Advanced Load Balancer with the vCenter server.


8. Select the vCenter server from the drop-down menu and select the vCenter credential that
you created earlier.

9. Select the Content Library where Service Engine templates will be stored by NSX Advanced
Load Balancer.


10. Leave the IPAM/DNS profile section empty as this will be populated later, once you have
created the profiles. Click SAVE to finish the NSX-T cloud configuration.

11. Ensure that the status of the NSX-T cloud is Green after creation.

12. Create a service engine group for Tanzu Kubernetes Grid management clusters:

1. Click on the Service Engine Group tab.

2. Under Select Cloud, choose the cloud created in the previous step, and click Create.

13. Enter a name for the Tanzu Kubernetes Grid management service engine group, and set the
following parameters:

| Parameter | Value |
| --- | --- |
| High availability mode | Active/Active |
| VS Placement | Compact |
| Memory per Service Engine | 4 |
| vCPU per Service Engine | 2 |

Use the default values for the rest of the parameters.

Under the Scope tab, specify the vCenter server endpoint by clicking the Add option.


Select the vCenter server from the drop-down menu, select the service engine folder, vSphere
cluster, and datastore for service engine placement, and then click Save.

14. Repeat steps 12 and 13 to create another service engine group for Tanzu Kubernetes Grid
workload clusters. Once complete, two service engine groups are created.

Configure Network and IPAM Profile


As part of the cloud creation, the NSX Advanced Load Balancer management and Tanzu Kubernetes
Grid VIP networks have been configured in NSX Advanced Load Balancer. Since DHCP was not
selected as the IP address management method in the cloud configuration, you must specify a pool
of IP addresses that can be assigned to the service engine NICs and to the virtual services that will be
created in the future.

To configure IP address pools for the networks, follow this procedure:

1. Navigate to Infrastructure > Cloud Resources > Networks and select the cloud that you
created earlier. Click the edit icon next to the network and configure it as follows. Change
the provided details as per your SDDC configuration.

| Network Name | DHCP | Subnet | Static IP Pool |
| --- | --- | --- | --- |
| sfo01-w01-vds01-albmanagement | No | 172.16.10.0/24 | 172.16.10.100 - 172.16.10.200 |
| sfo01-w01-vds01-tkgclustervip | No | 172.16.80.0/24 | 172.16.80.100 - 172.16.80.200 |
| sfo01-w01-vds01-tkgworkloadvip | No | 172.16.70.0/24 | 172.16.70.100 - 172.16.70.200 |

Once the networks are configured, the configuration must look like the following image.

Note

For single VIP network architecture, do not configure the sfo01-w01-vds01-tkgworkloadvip
network. The sfo01-w01-vds01-tkgclustervip segment is used for the control plane and data
network of the TKG workload cluster.

2. Once the networks are configured, set the default routes for the networks by navigating to
Infrastructure > Cloud Resources > Routing.

Note

Ensure that VRF Context for sfo01-w01-vds01-albmanagement network is set to Global.

Ensure that VRF Context for sfo01-w01-vds01-tkgclustervip network is set to NSX tier-1 gateway sfo01w01tier1.

Ensure that VRF Context for sfo01-w01-vds01-tkgworkloadvip network is set to NSX tier-1 gateway sfo01w01tier2.

To set the default gateway for the sfo01-w01-vds01-albmanagement network, click CREATE
under the global VRF context and set the default gateway to the gateway of the NSX Advanced
Load Balancer management subnet.


To set the default gateway for the sfo01-w01-vds01-tkgclustervip network, click CREATE
under the tier-1 gateway sfo01w01tier1 VRF context and set the default gateway to the gateway
of the VIP network subnet.

To set the default gateway for the sfo01-w01-vds01-tkgworkloadvip network, click CREATE
under the tier-1 gateway sfo01w01tier2 VRF context and set the default gateway to the gateway
of the VIP network subnet.

The final configuration is shown below:

Create IPAM Profile in NSX Advanced Load Balancer and Attach to Cloud

At this point, all the required networks related to Tanzu functionality are configured in NSX
Advanced Load Balancer. NSX Advanced Load Balancer provides IPAM service for Tanzu
Kubernetes Grid cluster VIP network and NSX ALB management network.

Complete the following steps to create an IPAM profile and once created, attach it to the NSX-T
cloud created earlier.

1. Log in to NSX Advanced Load Balancer and navigate to Templates > IPAM/DNS Profiles >
Create > IPAM Profile.

Provide the following details, and click Save.

| Parameter | Value |
| --- | --- |
| Name | sfo01-w01-vcenter-ipam01 |
| Type | Avi Vantage IPAM |
| Cloud for Usable Networks | sfo01w01vc01 |
| Usable Networks | sfo01-w01-vds01-albmanagement, sfo01-w01-vds01-tkgclustervip, sfo01-w01-vds01-tkgworkloadvip |


Note

For single VIP network architecture, do not add the sfo01-w01-vds01-tkgworkloadvip network
segment to the IPAM profile.

2. Click Create > DNS Profile and provide the domain name.


3. Attach the IPAM and DNS profiles to the NSX-T cloud.

1. Navigate to Infrastructure > Clouds.

2. Edit the sfo01w01vc01 cloud.

3. Under IPAM/DNS section, choose the IPAM and DNS profiles created earlier and
save the updated configuration.


This completes the NSX Advanced Load Balancer configuration. The next step is to deploy and
configure a bootstrap machine which will be used to deploy and manage Tanzu Kubernetes clusters.

Deploy and Configure Bootstrap Machine


The deployment of the Tanzu Kubernetes Grid management and workload clusters is facilitated by
setting up a bootstrap machine where you install the Tanzu CLI and Kubectl utilities which are used
to create and manage the Tanzu Kubernetes Grid instance. This machine also keeps the Tanzu
Kubernetes Grid and Kubernetes configuration files for your deployments. The bootstrap machine
can be a laptop, host, or server running on Linux, macOS, or Windows that you deploy management
and workload clusters from.

The bootstrap machine runs a local kind cluster when Tanzu Kubernetes Grid management cluster
deployment is started. Once the kind cluster is fully initialized, the configuration is used to deploy the
actual management cluster on the backend infrastructure. After the management cluster is fully
configured, the local kind cluster is deleted and future configurations are performed with the Tanzu
CLI.

For this deployment, a Photon-based virtual machine is used as the bootstrap machine. For more
information about configuring for a macOS or Windows machine, see Install the Tanzu CLI and Other
Tools.

The bootstrap machine must meet the following prerequisites:

A minimum of 6 GB of RAM and a 2-core CPU.

System time is synchronized with a Network Time Protocol (NTP) server.

Docker and containerd binaries are installed. For instructions on how to install Docker, see
Docker documentation.


Ensure that the bootstrap VM is connected to Tanzu Kubernetes Grid management network,
sfo01-w01-vds01-tkgmanagement.
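A quick, optional way to sanity-check these prerequisites on the bootstrap VM is sketched below; it assumes a systemd-based Linux distribution such as Photon OS and is illustrative only.

## Verify RAM (at least 6 GB) and CPU (at least 2 cores)
free -g
nproc

## Verify time synchronization
timedatectl status | grep -i synchronized

## Verify that Docker and containerd are installed
docker version
containerd --version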

To install Tanzu CLI, Tanzu Plugins, and Kubectl utility on the bootstrap machine, follow the
instructions below:

1. Download and unpack the following Linux CLI packages from VMware Tanzu Kubernetes
Grid Download Product page.

VMware Tanzu CLI v0.90.1 for Linux

kubectl cluster CLI v1.26.5 for Linux

2. Execute the following commands to install Tanzu Kubernetes Grid CLI, kubectl CLIs, and
Carvel tools.

## Install required packages


tdnf install tar zip unzip wget -y

## Install Tanzu Kubernetes Grid CLI


tar -xvf tanzu-cli-linux-amd64.tar
cd ./v0.90.1/
install tanzu-cli-linux_amd64 /usr/local/bin/tanzu
chmod +x /usr/local/bin/tanzu

## Verify Tanzu CLI version

root@photon-829669d9bf1f [ ~ ]# tanzu version

version: v0.90.1
buildDate: 2023-06-29
sha: 8945351c

## Install the Tanzu CLI plugins.

root@photon-829669d9bf1f [ ~ ]# tanzu plugin group search

[i] Reading plugin inventory for "projects.registry.vmware.com/tanzu_cli/plugin


s/plugin-inventory:latest", this will take a few seconds.
GROUP DESCRIPTION LATEST
vmware-tkg/default Plugins for TKG v2.3.0

root@photon-829669d9bf1f [ ~ ]# tanzu plugin install --group vmware-tkg/default


[i] Installing plugin 'isolated-cluster:v0.30.1' with target 'global'
[i] Installing plugin 'management-cluster:v0.30.1' with target 'kubernetes'
[i] Installing plugin 'package:v0.30.1' with target 'kubernetes'
[i] Installing plugin 'pinniped-auth:v0.30.1' with target 'global'
[i] Installing plugin 'secret:v0.30.1' with target 'kubernetes'
[i] Installing plugin 'telemetry:v0.30.1' with target 'kubernetes'
[ok] successfully installed all plugins from group 'vmware-tkg/default:v2.3.0'

#Accept EULA
root@photon-829669d9bf1f [ ~ ]# tanzu config eula accept
[ok] Marking agreement as accepted.

## Verify the plugins are installed

root@photon-829669d9bf1f [ ~ ]# tanzu plugin list


Standalone Plugins
NAME DESCRIPTION
TARGET VERSION STATUS
isolated-cluster Prepopulating images/bundle for internet-restricted envir
onments global v0.30.1 installed
pinniped-auth Pinniped authentication operations (usually not directly
invoked) global v0.30.1 installed
management-cluster Kubernetes management cluster operations
kubernetes v0.30.1 installed
package Tanzu package management
kubernetes v0.30.1 installed
secret Tanzu secret management
kubernetes v0.30.1 installed
telemetry configure cluster-wide settings for vmware tanzu telemetr
y kubernetes v0.30.1 installed

## Install Kubectl CLI

gunzip kubectl-linux-v1.26.5+vmware.2.gz
mv kubectl-linux-v1.26.5+vmware.2 /usr/local/bin/kubectl && chmod +x /usr/local/bin/kubectl

# Install Carvel tools

## Install ytt
gunzip ytt-linux-amd64-v0.45.0+vmware.2.gz
chmod ugo+x ytt-linux-amd64-v0.45.0+vmware.2 && mv ./ytt-linux-amd64-v0.45.0+vmware.2 /usr/local/bin/ytt

## Install kapp
gunzip kapp-linux-amd64-v0.55.0+vmware.2.gz
chmod ugo+x kapp-linux-amd64-v0.55.0+vmware.2 && mv ./kapp-linux-amd64-v0.55.0+vmware.2 /usr/local/bin/kapp

## Install kbld
gunzip kbld-linux-amd64-v0.37.0+vmware.2.gz
chmod ugo+x kbld-linux-amd64-v0.37.0+vmware.2 && mv ./kbld-linux-amd64-v0.37.0+vmware.2 /usr/local/bin/kbld

## Install imgpkg
gunzip imgpkg-linux-amd64-v0.36.0+vmware.2.gz
chmod ugo+x imgpkg-linux-amd64-v0.36.0+vmware.2 && mv ./imgpkg-linux-amd64-v0.36.0+vmware.2 /usr/local/bin/imgpkg

3. Validate Carvel tools installation using the following commands.

ytt version
kapp version
kbld version
imgpkg version

4. Install yq. yq is a lightweight and portable command-line YAML processor. yq uses jq-like
syntax but works with YAML and JSON files.

wget https://github.com/mikefarah/yq/releases/download/v4.24.5/yq_linux_amd64.tar.gz

tar -xvf yq_linux_amd64.tar.gz && mv yq_linux_amd64 /usr/local/bin/yq

5. Install kind.

curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-linux-amd64


chmod +x ./kind
mv ./kind /usr/local/bin/kind

6. Execute the following commands to start the Docker service and enable it to start at boot.
Photon OS has Docker installed by default.

## Check Docker service status


systemctl status docker

## Start Docker Service


systemctl start docker

## To start Docker Service at boot


systemctl enable docker

7. Execute the following commands to ensure that the bootstrap machine uses cgroup v1.

docker info | grep -i cgroup

## You should see the following


Cgroup Driver: cgroupfs
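If the command reports Cgroup Driver: systemd instead, one way to switch Docker to the cgroupfs driver is sketched below. This assumes you can overwrite /etc/docker/daemon.json on the bootstrap VM; if the file already contains other settings, merge the exec-opts entry instead of replacing the file.

## Switch the Docker cgroup driver to cgroupfs (illustrative)
cat <<'EOF' > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
systemctl restart docker
docker info | grep -i cgroup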

8. Create an SSH key pair.

An SSH key pair is required for Tanzu CLI to connect to vSphere from the bootstrap
machine.

The public key part of the generated key is passed during the Tanzu Kubernetes Grid
management cluster deployment.

## Generate SSH key pair


## When prompted enter file in which to save the key (/root/.ssh/id_rsa): press
Enter to accept the default and provide password
ssh-keygen -t rsa -b 4096 -C "[email protected]"

## Add the private key to the SSH agent running on your machine and enter the p
assword you created in the previous step
ssh-add ~/.ssh/id_rsa
## If the above command fails, execute "eval $(ssh-agent)" and then rerun the c
ommand

9. If your bootstrap machine runs Linux or Windows Subsystem for Linux, and it has a Linux
kernel built after the May 2021 Linux security patch, for example Linux 5.11 and 5.12 with
Fedora, run the following command.

sudo sysctl net/netfilter/nf_conntrack_max=131072
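To make this setting persist across reboots, you can optionally add it to a sysctl configuration file; the file name below is only an example.

echo "net.netfilter.nf_conntrack_max=131072" | sudo tee /etc/sysctl.d/99-tkg-conntrack.conf
sudo sysctl --system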


All required packages are now installed and the required configurations are in place in the bootstrap
virtual machine. The next step is to deploy the Tanzu Kubernetes Grid management cluster.

Import Base Image Template for Tanzu Kubernetes Grid Cluster Deployment
Before you proceed with the management cluster creation, ensure that the base image template is
imported into vSphere and is available as a template. To import a base image template into vSphere:

1. Go to the Tanzu Kubernetes Grid downloads page and download a Tanzu Kubernetes Grid
OVA for the cluster nodes.

2. For the management cluster, this must be either a Photon-based or Ubuntu-based Kubernetes
v1.26.5 OVA.

Note

Custom OVA with a custom Tanzu Kubernetes release (TKr) is also supported, as described in
Build Machine Images.

3. For workload clusters, OVA can have any supported combination of OS and Kubernetes
version, as packaged in a Tanzu Kubernetes release.

Note

Make sure you download the most recent OVA base image templates in the
event of security patch releases. You can find updated base image templates
that include security patches on the Tanzu Kubernetes Grid product
download page.

4. In the vSphere client, right-click an object in the vCenter Server inventory and select Deploy
OVF template.

5. Select Local file, click the button to upload files, and go to the downloaded OVA file on your
local machine.

6. Follow the installer prompts to deploy a VM from the OVA.

7. Click Finish to deploy the VM. When the OVA deployment finishes, right-click the VM and
select Template > Convert to Template.

Note

Do not power on the VM before you convert it to a template.

8. If using non administrator SSO account: In the VMs and Templates view, right-click the new
template, select Add Permission, and assign the tkg-user to the template with the TKG role.
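If you prefer a CLI-driven workflow, the OVA can also be imported and converted to a template with the govc utility. The following is a sketch only; it assumes govc is installed and the GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD environment variables are set, and it uses an example OVA file name together with the datastore and folder names from this deployment.

## Import the downloaded OVA and convert the resulting VM to a template (example names)
govc import.ova -ds=vsanDatastore -folder=/sfo01w01dc01/vm/tkg-management-components ./photon-3-kube-v1.26.5+vmware.2.ova
govc vm.markastemplate photon-3-kube-v1.26.5+vmware.2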

For more information about creating the user and role for Tanzu Kubernetes Grid, see Required
Permissions for the vSphere Account.

Deploy Tanzu Kubernetes Grid Management Cluster


The management cluster is a Kubernetes cluster that runs Cluster API operations on a specific cloud
provider to create and manage workload clusters on that provider.

The management cluster is also where you configure the shared and in-cluster services that the
workload clusters use.

You can deploy management clusters in two ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method.

Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.

The following procedure provides the required steps to deploy Tanzu Kubernetes Grid management
cluster using the installer interface.

1. To launch the UI installer wizard, run the following command on the bootstrap machine:

tanzu management-cluster create --ui --bind <bootstrapper-ip>:<port> --browser none

For example:

tanzu management-cluster create --ui --bind 172.16.40.10:8000 --browser none

2. Access the Tanzu UI wizard by opening a browser and entering: http://<bootstrapper-ip>:<port>/

3. On the VMware vSphere tile, click DEPLOY.

4. In the IaaS Provider section, enter the IP address/FQDN and credentials of the vCenter
server where the Tanzu Kubernetes Grid management cluster will be deployed. (Optional)
you can skip the vCenter SSL thumbprint verification.


5. Click CONNECT and select “DEPLOY TKG MANAGEMENT CLUSTER”.

6. Select the data center and provide the SSH public Key generated while configuring the
bootstrap VM.
If you have saved the SSH key in the default location, run the following command in your
bootstrap machine to get the SSH public key.

cat /root/.ssh/id_rsa.pub

7. Click NEXT.

8. On the Management Cluster Settings section, provide the following details and click Next.

Based on the environment requirements, select appropriate deployment type for the
Tanzu Kubernetes Grid management cluster:

Development: Recommended for Dev or POC environments

Production: Recommended for Production environments

It is recommended to set the instance type to Large or above. For the purpose of this
document, we will proceed with deployment type Production and instance type
Medium.

Management Cluster Name: Name for your management cluster.

Control Plane Endpoint Provider: Select NSX Advanced Load Balancer for Control
Plane HA.

Control Plane Endpoint: This is an optional field. If left blank, NSX Advanced Load
Balancer will assign an IP address from the pool defined for the network “sfo01-w01-vds01-tkgclustervip”.
If you need to provide an IP address, pick an IP address from the “sfo01-w01-vds01-tkgclustervip”
static IP pools configured in AVI and ensure that the IP address is unused.

Machine Health Checks: Enable

Enable Audit Logging: Enable for audit logging for Kubernetes API server and node
VMs. Choose as per your environment needs. For more information, see Audit
Logging.

9. On the NSX Advanced Load Balancer section, provide the following information and click
Next.

Controller Host: NSX Advanced Load Balancer controller IP/FQDN (use the ALB controller
cluster IP/FQDN if a controller cluster is configured)

Controller credentials: Username and Password of NSX Advanced Load Balancer

Controller certificate: Paste the contents of the Certificate Authority that is used to
generate your controller certificate into the Controller Certificate Authority text box.

10. Once these details are provided, click VERIFY CREDENTIALS and choose the following
parameters.

Cloud Name: Name of the cloud created while configuring NSX Advanced Load
Balancer sfo01w01vc01.


Workload Cluster Service Engine Group Name: Name of the service engine group
created for Tanzu Kubernetes Grid workload clusters created while configuring NSX
Advanced Load Balancer sfo01w01segroup01.

Workload Cluster Data Plane VIP Network Name: Select the sfo01-w01-vds01-tkgworkloadvip
network and the subnet associated with it.

Workload Cluster Control Plane VIP Network Name: Select the sfo01-w01-vds01-tkgclustervip
network and the subnet associated with it.

Management Cluster Service Engine Group Name: Name of the service engine
group created for Tanzu Kubernetes Grid management cluster created while
configuring NSX Advanced Load Balancer sfo01m01segroup01.

Management Cluster Data Plane VIP Network Name: Select the sfo01-w01-vds01-tkgclustervip
network and the subnet associated with it.

Management Cluster Control Plane VIP Network Name: Select the sfo01-w01-vds01-tkgclustervip
network and the subnet associated with it.

Cluster Labels: Optional. Leave the cluster labels section empty to apply the above
workload cluster network settings by default. If you specify any label here, you must
specify the same values in the configuration YAML file of the workload cluster. Else,
the system places the endpoint VIP of your workload cluster in TKG Cluster VIP
Network by default.

Note

With the above configuration, all the Tanzu workload clusters use sfo01-w01-vds01-tkgclustervip
for the control plane VIP network and sfo01-w01-vds01-tkgworkloadvip for the data plane network
by default. If you would like to configure separate VIP networks for workload control plane/data
networks, create a custom AKO Deployment Config (ADC) and provide the respective
NSX ALB_LABELS in the workload cluster config file. For more information on
network separation and custom ADC creation, see Configure Separate VIP
Networks and Service Engine Groups in Different Workload Clusters.

11. (Optional) On the Metadata page, you can specify location and labels and click Next.

12. On the Resources section, specify the resources to be consumed by the Tanzu Kubernetes
Grid management cluster and click NEXT.


13. On the Kubernetes Network section, select the Tanzu Kubernetes Grid management
network (sfo01-w01-vds01-tkgmanagement) where the control plane and worker nodes will be
placed during management cluster deployment. Ensure that the network has DHCP service
enabled. Optionally, change the pod and service CIDR.

If the Tanzu environment is placed behind a proxy, enable proxy and provide proxy details:

If you set http-proxy, you must also set https-proxy and vice-versa.

For the no-proxy section:

For Tanzu Kubernetes Grid management and workload clusters, localhost, 127.0.0.1,
the values of CLUSTER_CIDR and SERVICE_CIDR, .svc, and .svc.cluster.local are
appended along with the user-specified values.

Note

If the Kubernetes cluster needs to communicate with external services and infrastructure
endpoints in your Tanzu Kubernetes Grid environment, ensure that those endpoints are
reachable by your proxies or add them to TKG_NO_PROXY. Depending on your
environment configuration, this may include, but is not limited to, your OIDC or LDAP
server, Harbor, NSX, NSX Advanced Load Balancer, and vCenter.

For vSphere, you must manually add the CIDR of Tanzu Kubernetes Grid
management network and Cluster VIP networks that includes the IP address of your
control plane endpoints, to TKG_NO_PROXY.
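For reference, a hypothetical sketch of how these proxy settings appear in the cluster configuration file that you export later in this procedure (the proxy URL and the trailing network CIDRs are placeholders for your environment):

cat >> example.yaml <<'EOF'
TKG_HTTP_PROXY_ENABLED: "true"
TKG_HTTP_PROXY: "http://proxy.example.com:3128"
TKG_HTTPS_PROXY: "http://proxy.example.com:3128"
TKG_NO_PROXY: "localhost,127.0.0.1,100.96.0.0/11,100.64.0.0/13,.svc,.svc.cluster.local,<tkg-mgmt-network-cidr>,<cluster-vip-network-cidr>"
EOF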


14. (Optional) Specify identity management with OIDC or LDAP. For the purpose of this
document, identity management integration is deactivated.

If you would like to enable identity management, see Enable and Configure Identity
Management During Management Cluster Deployment section in the Tanzu Kubernetes
Grid Integration with Pinniped Deployment Guide.

15. Select the OS image that will be used for the management cluster deployment.

Note

This list appears empty if you don't have a compatible template present in
your environment. Refer to the steps provided in Import Base Image Template
for Tanzu Kubernetes Grid Cluster Deployment.

16. Select “Participate in the Customer Experience Improvement Program”, if you so desire.


17. Click REVIEW CONFIGURATION.

Currently, it is not possible to deploy a management cluster for the NSX cloud from the Tanzu
Kubernetes Grid installer UI because one of the required fields for the NSX cloud is not exposed
in the UI and needs to be manually inserted in the cluster deployment YAML.

18. Click on EXPORT CONFIGURATION to download the deployment yaml file.

19. Edit the file and insert the key AVI_NSXT_T1LR. The value of this key is the tier-1 gateway
where you have connected the sfo01-w01-vds01-tkgmanagement network. In this example,
the value is set to /infra/tier-1s/sfo01w01tier1.
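As an alternative to editing the file by hand, you can insert the key with the yq utility installed earlier on the bootstrap machine; this assumes the exported file is named example.yaml, as in the next step.

yq -i '.AVI_NSXT_T1LR = "/infra/tier-1s/sfo01w01tier1"' example.yaml

## Confirm that the key was added
grep AVI_NSXT_T1LR example.yaml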

20. Deploy the Management cluster from this config file by running the command:

tanzu management-cluster create -f example.yaml -v 6

A sample file used for the management cluster deployment is shown below:

AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM3ekNDQWRlZ0F3SUJBZ0lVRis5S
3BUSmdydmdFS1paRklabTh1WEFiRVN3d0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZURVRNQkVHQTFVRUF3d0tZV3h
pTFdObGNuUXdNVEFlRncweU16QTRNamt3T1RJeE16UmFGdzB5TkRBNApNamd3T1RJeE16UmFNQlV4RXpBUkJnT
lZCQU1NQ21Gc1lpMWpaWEowTURFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCCkFRVUFBNElCRHdBd2dnRUtBb0lCQVF
DemU5eGxydzhjQlplTDE0TEc3L2RMMkg3WnJaVU5qM09zQXJxU3JxVmIKWEh4VGUrdTYvbjA1b240RGhUdDBEZ
ys0cDErZEZYMUc2N0kxTldJZlEzZGFRRnhyenBJSWdKTHUxYUF6R2hDRgpCR0dOTkxqbEtDMDVBMnZMaE1TeG5
ZR1orbDhWR2VKWDJ4dzY5N1M4L3duUUtVRGdBUUVwcHpZT0tXQnJLY3RXCktTYm1vNlR3d1UvNWFTS0tvS3h5U
DJJYXYrb1plOVNrNG05ejArbkNDWjVieDF1SzlOelkzZFBUdUUwQ3crMTgKUkpzN3Z4MzIxL3ZTSnM3TUpMa05
Ud0lEUlNLVkViWkR4b3VMWXVMOFRHZjdMLys2Sm1UdGc3Y3VsRmVhTlRKVgowTkJwb201ODc2UmMwZjdnODE3a
EFYcllhKzdJK0hxdnBSdlMrdFJkdjhDM0FnTUJBQUdqTnpBMU1ETUdBMVVkCkVRUXNNQ3FDSW5ObWJ6QXhZV3h
pWTNSc2NqQXhZUzV6Wm04d01TNXlZV2x1Y0c5c1pTNTJiWGVIQkt3UUNnc3cKRFFZSktvWklodmNOQVFFTEJRQ
URnZ0VCQUJIK20xUFUxcm1kNGRJenNTNDBJcWV3bUpHbUVBN3ByMkI2c0VIWAo0VzZWakFZTDNsTE4ySHN4VUN
Sa2NGbEVsOUFGUEpkNFZNdldtQkxabTB4SndHVXdXQitOb2NXc0puVjBjYWpVCktqWUxBWWExWm1hS2g3eGVYK
3VRVEVKdGFKNFJxeG9WYXoxdVNjamhqUEhteFkyZDNBM3RENDFrTCs3ZUUybFkKQmV2dnI1QmhMbjhwZVRyUlN
xb2h0bjhWYlZHbng5cVIvU0d4OWpOVC8vT2hBZVZmTngxY1NJZVNlR1dGRHRYQwpXa0ZnQ0NucWYyQWpoNkhVT
TIrQStjNFlsdW13QlV6TUorQU05SVhRYUUyaUlpN0VRUC9ZYW8xME5UeU1SMnJDCkh4TUkvUXdWck9NTThyK1p

VYm10QldIY1JWZS9qMVlVaXFTQjBJbmlraDFmeDZ3PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
AVI_CLOUD_NAME: sfo01w01vc01
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
AVI_CONTROL_PLANE_NETWORK: sfo01-w01-vds01-tkgclustervip
AVI_CONTROL_PLANE_NETWORK_CIDR: 172.16.80.0/24
AVI_CONTROLLER: 172.16.10.11
AVI_DATA_NETWORK: sfo01-w01-vds01-tkgworkloadvip
AVI_DATA_NETWORK_CIDR: 172.16.70.0/24
AVI_ENABLE: "true"
AVI_NSXT_T1LR: /infra/tier-1s/sfo01w01tier1
AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_CIDR: 172.16.80.0/24
AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_NAME: sfo01-w01-vds01-tkgclustervip
AVI_MANAGEMENT_CLUSTER_SERVICE_ENGINE_GROUP: sfo01m01segroup01
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.80.0/24
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: sfo01-w01-vds01-tkgclustervip
AVI_PASSWORD: <encoded:Vk13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: sfo01w01segroup01
AVI_USERNAME: admin
CLUSTER_ANNOTATIONS: 'description:,location:'
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: sfo01w01tkgmgmt01
CLUSTER_PLAN: prod
ENABLE_AUDIT_LOGGING: "true"
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: oidc
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: "3"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: ""
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /sfo01w01dc01
VSPHERE_DATASTORE: /sfo01w01dc01/datastore/vsanDatastore
VSPHERE_FOLDER: /sfo01w01dc01/vm/tkg-management-components

VSPHERE_INSECURE: "false"
VSPHERE_NETWORK: /sfo01w01dc01/network/sfo01-w01-vds01-tkgmanagement
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /sfo01w01dc01/host/sfo01w01cluster01/Resources/tkg-management-c
omponents
VSPHERE_SERVER: 192.168.200.100
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDrPqkVaPpNxHcKxukYro
V6LcCTuRK9NDyygbsAr/P73jEeWIcC+SU4tRpOZks2+BoduUDzdrsfm/Uq/0uj9LuzqIZKAzA1iQ5DtipVzROq
eTuAXJVCMZc6RPgQSZofLBo1Is85M/IrBS20OMALwjukMdwotKKFwL758l51FVsKOT+MUSW/wJLKTv3l0KPObg
SRTMUQdQpoG7ONcMNG2VkBMfgaK44cL7vT0/0Mv/Fmf3Zd59ZaWvX28ZmGEjRx8kOm1j/os61Y+kOvl1MTv8wc
85rYusRuP2Uo5UM4kUTdhSTFasw6TLhbSWicKORPi3FYklvS70jkQFse2WsvmtFG5xyxE/rzDGHloud9g2bQ7T
x0rtWWoRCCC8Sl/vzCjgZfDQXwKXoMP0KbcYHZxSA3zY2lXBlhNtZtyKlynnhr97EaWsm3b9fvhJMmKW5ylkmk
7+4Bql7frJ4bOOR4+hHv57Q8XFOYdLGQPGv03RUFQwFE6a0a6qWAvmVmoh8+BmlGOfx7WYpp8hkyGOdtQz8ZJe
SOyMT6ztLHbY/WqDwEvKpf1dJy93w8fDmz3qXHpkpdnA0t4TiCfizlBk15ZI03TLi4ELoFvso9We13dGClHDDy
v0Dm87uaACC+fyAT5JPbZpAcCw8rm/yTuZ8awtR0LEzJUqNJjX/5OX7Bf45h9w== [email protected]
VSPHERE_TLS_THUMBPRINT: 7C:31:67:1A:F3:26:FA:CE:0E:33:2E:D2:7C:FC:86:EC:1C:51:67:E3
VSPHERE_USERNAME: [email protected]
VSPHERE_WORKER_DISK_GIB: "40"
VSPHERE_WORKER_MEM_MIB: "8192"
VSPHERE_WORKER_NUM_CPUS: "2"
WORKER_ROLLOUT_STRATEGY: ""

Note

For single VIP network architecture, refer to the Management Cluster YAML file in Appendix A.

While the cluster is being deployed, you will find that a virtual service is created in NSX Advanced
Load Balancer, new service engines are deployed in vCenter by NSX Advanced Load Balancer,
and the service engines are mapped to the SE group sfo01m01segroup01.

The installer automatically sets the context to the Tanzu Kubernetes Grid management cluster on the
bootstrap machine. Now, you can access the Tanzu Kubernetes Grid management cluster from the
bootstrap machine and perform additional tasks such as verifying the management cluster health,
deploying the workload clusters, and so on.

To get the status of Tanzu Kubernetes Grid management cluster, run the following command:

tanzu management-cluster get


Use kubectl get nodes command to get the status of the Tanzu Kubernetes Grid management
cluster nodes.
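A short sketch of these checks, assuming the management cluster name sfo01w01tkgmgmt01 used in the sample configuration file:

tanzu management-cluster get

## Retrieve the admin kubeconfig and check the node status
tanzu management-cluster kubeconfig get --admin
kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01
kubectl get nodes -o wide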

The Tanzu Kubernetes Grid management cluster is successfully deployed and now you can proceed
with registering it with Tanzu Mission Control and creating shared services and workload clusters.

What to Do Next
Register Management Cluster with Tanzu Mission Control
If you want to register your management cluster with Tanzu Mission Control, see Register Your
Management Cluster with Tanzu Mission Control.

Configure AKO Deployment Config (ADC) for Workload Clusters
Tanzu Kubernetes Grid management clusters with NSX Advanced Load Balancer are deployed with
2 AKODeploymentConfigs.

install-ako-for-management-cluster: default configuration for management cluster

install-ako-for-all: default configuration for all workload clusters. By default, all the
workload clusters reference this file for their virtual IP networks and service engine (SE)
groups. This ADC configuration does not enable NSX L7 Ingress by default.

As per this Tanzu deployment, create two more ADCs:

tanzu-ako-for-shared: Used by shared services cluster to deploy the virtual services in TKG
Mgmt SE Group and the loadbalancer applications in TKG Cluster VIP Network.


tanzu-ako-for-workload-L7-ingress: Use this ADC only if you would like to enable NSX
Advanced Load Balancer L7 ingress on workload cluster. Otherwise, leave the cluster labels
empty to apply the network configuration from default ADC install-ako-for-all.

Configure AKODeploymentConfig (ADC) for Shared Services Cluster


As per the defined architecture, the shared services cluster uses the same control plane and data plane
network as the management cluster. The shared services cluster control plane endpoint uses the TKG
Cluster VIP Network, application load balancing uses the TKG Cluster VIP Network, and the virtual
services are deployed in the sfo01m01segroup01 SE group. This configuration is enforced by creating a
custom AKO Deployment Config (ADC) and applying the respective AVI_LABELS while deploying the
shared services cluster.

The format of the AKODeploymentConfig YAML file is as follows:

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
finalizers:
- ako-operator.networking.tkg.tanzu.vmware.com
generation: 2
name: <Unique name of AKODeploymentConfig>
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca
namespace: tkg-system-networking
cloudName: <NAME OF THE CLOUD in ALB>
clusterSelector:
matchLabels:
<KEY>: <VALUE>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-CIDR>
    name: <TKG-Cluster-VIP-Network>
controller: <NSX ALB CONTROLLER IP/FQDN>
dataNetwork:
cidr: <TKG-Mgmt-Data-VIP-CIDR>
name: <TKG-Mgmt-Data-VIP-Name>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: <TKG-Mgmt-Network>
serviceEngineGroup: <Mgmt-Cluster-SEG>

The sample AKODeploymentConfig with sample values in place is as follows. You should add the
respective NSX ALB label type=shared-services while deploying shared services cluster to enforce
this network configuration.

cloud: sfo01w01vc01


service engine group: sfo01m01segroup01

Control Plane network: sfo01-w01-vds01-tkgclustervip

VIP/data network: sfo01-w01-vds01-tkgclustervip

Node Network: sfo01-w01-vds01-tkgmanagement

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
generation: 3
name: tanzu-ako-for-shared
spec:
adminCredentialRef:
name: avi-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: avi-controller-ca
namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
type: shared-services
controlPlaneNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
controller: 172.16.10.10
controllerVersion: 22.1.3
dataNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
extraConfigs:
disableStaticRouteSync: false
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: sfo01-w01-vds01-tkgmanagement
networksConfig:
nsxtT1LR: /infra/tier-1s/sfo01w01tier1
serviceEngineGroup: sfo01m01segroup01

Note

For single VIP network architecture, see the Shared Service Cluster ADC file in Appendix B.

After you have the AKO configuration file ready, use the kubectl command to set the context to
Tanzu Kubernetes Grid management cluster and create the ADC:

# kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01


Switched to context "sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01".

# kubectl apply -f ako-shared-services.yaml


akodeploymentconfig.networking.tkg.tanzu.vmware.com/tanzu-ako-for-shared created


Use the following command to list all AKODeploymentConfig created under the management
cluster:

# kubectl get adc


NAME AGE
install-ako-for-all 21h
install-ako-for-management-cluster 21h
tanzu-ako-for-shared 113s

Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX ALB L7 Ingress with NodePortLocal Mode
VMware recommends using NSX Advanced Load Balancer L7 ingress with NodePortLocal mode for
the L7 application load balancing. This is enabled by creating a custom ADC with ingress settings
enabled, and then applying the NSXALB_LABEL while deploying the workload cluster.

As per the defined architecture, the workload cluster control plane endpoint uses the TKG Cluster
VIP Network, application load balancing uses the TKG Workload VIP Network, and the virtual
services are deployed in the sfo01w01segroup01 SE group.

Below are the changes in the ADC ingress section when compared to the default ADC.

disableIngressClass: set to false to enable NSX ALB L7 Ingress.

nodeNetworkList: Provide the values for TKG workload network name and CIDR.

serviceType: L7 Ingress type, recommended to use NodePortLocal

shardVSSize: Virtual service size

The format of the AKODeploymentConfig YAML file for enabling NSX ALB L7 Ingress is as follows:

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
name: <unique-name-for-adc>
spec:
adminCredentialRef:
name: NSXALB-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: NSXALB-controller-ca
namespace: tkg-system-networking
cloudName: <cloud name configured in nsx alb>
clusterSelector:
matchLabels:
<KEY>: <value>
controller: <ALB-Controller-IP/FQDN>
controlPlaneNetwork:
    cidr: <TKG-Cluster-VIP-Network-CIDR>
    name: <TKG-Cluster-VIP-Network-Name>
  dataNetwork:
    cidr: <TKG-Workload-VIP-network-CIDR>
    name: <TKG-Workload-VIP-network-Name>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false # required

ingress:
disableIngressClass: false # required
nodeNetworkList: # required
- networkName: <TKG-Workload-Network>
cidrs:
- <TKG-Workload-Network-CIDR>
serviceType: NodePortLocal # required
shardVSSize: MEDIUM # required
serviceEngineGroup: <Workload-Cluster-SEG>

The AKODeploymentConfig with sample values in place is as follows. You should add the respective
NSX ALB label workload-l7-enabled="true" while deploying the workload cluster to enforce this
network configuration.

cloud: sfo01w01vc01

service engine group: sfo01w01segroup01

Control Plane network: sfo01-w01-vds01-tkgclustervip

VIP/data network: sfo01-w01-vds01-tkgworkloadvip

Node Network: sfo01-w01-vds01-tkgworkload

apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
generation: 3
name: tanzu-ako-for-workload-L7-ingress
spec:
adminCredentialRef:
name: avi-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: avi-controller-ca
namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
workload-l7-enabled: "true"
controlPlaneNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
controller: 172.16.10.11
controllerVersion: 22.1.3
dataNetwork:
cidr: 172.16.70.0/24
name: sfo01-w01-vds01-tkgworkloadvip
extraConfigs:
disableStaticRouteSync: true
ingress:
defaultIngressController: true
disableIngressClass: false
serviceType: NodePortLocal
shardVSSize: MEDIUM
nodeNetworkList:
- networkName: sfo01-w01-vds01-tkgworkload


cidrs:
- 172.16.60.0/24
networksConfig:
nsxtT1LR: /infra/tier-1s/sfo01w01tier2
serviceEngineGroup: sfo01w01segroup01

Note

For single VIP network architecture, see the Workload Cluster ADC file in Appendix C.

Use the kubectl command to set the context to Tanzu Kubernetes Grid management cluster and
create the ADC:

# kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01


Switched to context "sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01".

# kubectl apply -f workload-adc-l7.yaml


akodeploymentconfig.networking.tkg.tanzu.vmware.com/tanzu-ako-for-workload-l7-ingress
created

Use the following command to list all AKODeploymentConfig created under the management
cluster:

# kubectl get adc


NAME AGE
install-ako-for-all 22h
install-ako-for-management-cluster 22h
tanzu-ako-for-shared 82m
tanzu-ako-for-workload-l7-ingress 25s

Now that you have successfully created the AKO deployment config, you need to apply the cluster
labels while deploying the workload clusters to enable NSX Advanced Load Balancer L7 Ingress with
NodePortLocal mode.
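If a workload cluster was provisioned without the label, one way to attach it afterwards from the management cluster context is sketched below; the cluster name is an example, and the label key must match the clusterSelector defined in the ADC.

kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01
kubectl label cluster.cluster.x-k8s.io/sfo01w01tkgworkload01 workload-l7-enabled=true --overwrite=true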

Deploy Tanzu Kubernetes Grid Shared Services Cluster


Each Tanzu Kubernetes Grid instance can have only one shared services cluster. Create a shared
services cluster if you intend to deploy Harbor.

The procedures for deploying a shared services cluster and workload cluster are almost the same. A
key difference is that you add the tanzu-services label to the shared services cluster as its cluster
role. This label identifies the shared services cluster to the management cluster and workload
clusters.

The shared services cluster uses the custom ADC tanzu-ako-for-shared created earlier to apply
network settings similar to the management cluster. This is enforced by applying the NSX ALB
label type: shared-services while deploying the shared services cluster.

After the management cluster is registered with Tanzu Mission Control, the deployment of the Tanzu
Kubernetes clusters can be done in just a few clicks. The procedure for creating Tanzu Kubernetes
clusters is as follows.


Note

The scope of this document doesn’t cover the use of a proxy for Tanzu Kubernetes
Grid deployment. If your environment uses a proxy server to connect to the internet,
ensure that the proxy configuration object includes the CIDRs for the pod, ingress,
and egress from the workload network of the Management Cluster in the No proxy
list, as described in Create a Proxy Configuration Object for a Tanzu Kubernetes Grid
Service Cluster.

1. Navigate to the Clusters tab and click Create Cluster.

2. Under the Create cluster page, select the management cluster which you registered in the
previous step and click Continue to create cluster.

3. Select the provisioner for creating the workload cluster (shared services cluster). Provisioner
reflects the vSphere namespaces that you have created and associated with the
management cluster.


4. On the Cluster Details page, perform the following actions:

Enter a name for the cluster (Cluster names must be unique within an organization).

Select the cluster group to which you want to attach your cluster.

Select Cluster Class from the drop down.

Use the NSX ALB labels created for the shared services cluster in the AKO Deployment Config (type: shared-services).

5. On the Configure page, specify the following items:

In the vCenter and tlsThumbprint fields, enter the details for authentication.

From the datacenter, resourcePool, folder, network, and datastore drop down,
select the required information.

From the template drop down, select the Kubernetes version. The latest supported
version is preselected for you.

In the sshAuthorizedKeys field, enter the SSH key that was created earlier.

Enable aviAPIServerHAProvider.

6. Update POD CIDR and Service CIDR if necessary.

7. Select the high availability mode for the control plane nodes of the workload cluster. For a
production deployment, it is recommended to deploy a highly available workload cluster.


8. Customize the default node pool for your workload cluster.

Specify the number of worker nodes to provision.

Select OS Version.

9. Click Create Cluster to start provisioning your workload cluster. Once the cluster is created,
you can check the status from Tanzu Mission Control.

Cluster creation takes approximately 15-20 minutes to complete. After the cluster
deployment completes, ensure that agent and extensions health shows green.


10. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
Shared Service cluster.

## verify the shared service cluster creation

tanzu cluster list


NAME NAMESPACE STATUS CONTROLPLANE WORKERS KUBERNETES
ROLES PLAN TKR
sfo01w01tkgshared01 default running 3/3 3/3 v1.26.5+vmware.
2 <none> prod v1.26.5---vmware.2-tkg.1

## Connect to tkg management cluster

kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

## Add the tanzu-services label to the shared services cluster as its cluster role.
## In the following command "sfo01w01tkgshared01" is the name of the shared services cluster.

kubectl label cluster.cluster.x-k8s.io/sfo01w01tkgshared01 cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true

cluster.cluster.x-k8s.io/sfo01w01tkgshared01 labeled

## Validate that TMC has applied the correct AVI_LABELS to the shared services cluster

kubectl get cluster sfo01w01tkgshared01 --show-labels

NAME                  PHASE         AGE    VERSION   LABELS

sfo01w01tkgshared01   Provisioned   105m             cluster-role.tkg.tanzu.vmware.com/tanzu-services=,networking.tkg.tanzu.vmware.com/avi=tanzu-ako-for-shared,tanzuKubernetesRelease=v1.26.5---vmware.2-tkg.1,tkg.tanzu.vmware.com/cluster-name=sfo01w01tkgshared01,type=shared-services

11. Connect to the admin context of the shared services cluster using the following commands and
validate the AKO pod status.

## Use the following command to get the admin context of the Shared Services Cluster.

tanzu cluster kubeconfig get sfo01w01tkgshared01 --admin

Credentials of cluster 'sfo01w01tkgshared01' have been saved

You can now access the cluster by running 'kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01'

## Use the following command to use the context of the Shared Services Cluster

kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01

Switched to context "sfo01w01tkgshared01-admin@sfo01w01tkgshared01".

# Verify that ako pod gets deployed in avi-system namespace

kubectl get pods -n avi-system


NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 73m

# verify the nodes and pods status by running the command:


kubectl get nodes -o wide

kubectl get pods -A

Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor in Deploy User-Managed Packages in
Workload Clusters.

Deploy Tanzu Kubernetes Grid Workload Cluster


As per the architecture, workload clusters use a custom ADC to enable NSX Advanced Load
Balancer L7 ingress with NodePortLocal mode. This is enforced by providing the AVI_LABELS while
deploying the workload cluster.

The steps for deploying a workload cluster are the same as for a shared services cluster, except that
in step 4 you use the NSX ALB labels created for the workload cluster in the AKO Deployment Config.

After the workload cluster is created, verify the cluster labels and AKO pod status.

1. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
workload cluster.

## verify the workload cluster creation

tanzu cluster list

NAME                    NAMESPACE   STATUS    CONTROLPLANE   WORKERS   KUBERNETES         ROLES    PLAN   TKR

sfo01w01tkgshared01     default     running   3/3            3/3       v1.26.5+vmware.2   <none>   prod   v1.26.5---vmware.2-tkg.1

sfo01w01tkgworkload01   default     running   3/3            3/3       v1.26.5+vmware.2   <none>   prod   v1.26.5---vmware.2-tkg.1

## Connect to tkg management cluster

kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

## Validate that TMC has applied the AVI_LABEL while deploying the cluster

kubectl get cluster sfo01w01tkgworkload01 --show-labels

NAME                    PHASE         AGE    VERSION   LABELS

sfo01w01tkgworkload01   Provisioned   105m             networking.tkg.tanzu.vmware.com/avi=tanzu-ako-for-workload-l7-ingress,tanzuKubernetesRelease=v1.26.5---vmware.2-tkg.1,tkg.tanzu.vmware.com/cluster-name=sfo01w01tkgworkload01,workload-l7-enabled=true

2. Connect to the admin context of the workload cluster using the following commands and validate
the AKO pod status.

## Use the following command to get the admin context of workload Cluster.

tanzu cluster kubeconfig get sfo01w01tkgworkload01 --admin

Credentials of cluster 'sfo01w01tkgworkload01' have been saved


You can now access the cluster by running 'kubectl config use-context sfo01w01tkgworkload01-admin@sfo01w01tkgworkload01'

## Use the following command to use the context of the workload Cluster

kubectl config use-context sfo01w01tkgworkload01-admin@sfo01w01tkgworkload01

Switched to context "sfo01w01tkgworkload01-admin@sfo01w01tkgworkload01".

# Verify that ako pod gets deployed in avi-system namespace

kubectl get pods -n avi-system


NAME READY STATUS RESTARTS AGE
ako-0 1/1 Running 0 73m

# verify the nodes and pods status by running the command:


kubectl get nodes -o wide

kubectl get pods -A

You can now configure SaaS components and deploy user-managed packages on the cluster.

Integrate Tanzu Kubernetes clusters with Tanzu Observability


For more information about enabling Tanzu Observability on your workload cluster, see Set up Tanzu
Observability to Monitor a Tanzu Kubernetes Clusters.

Integrate Tanzu Kubernetes clusters with Tanzu Service Mesh


For more information about installing Tanzu Service Mesh on your workload cluster, see Onboard a
Tanzu Kubernetes Cluster to Tanzu Service Mesh.


Deploy User-Managed Packages on Tanzu Kubernetes Clusters
For more information about installing user-managed packages on the Tanzu Kubernetes clusters,
see Deploy User-Managed Packages in Workload Clusters.

Single VIP Network Architecture


Appendix A - Management Cluster yaml file
AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM3ekNDQWRlZ0F3SUJBZ0lVRis5S
3BUSmdydmdFS1paRklabTh1WEFiRVN3d0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZURVRNQkVHQTFVRUF3d0tZV3h
pTFdObGNuUXdNVEFlRncweU16QTRNamt3T1RJeE16UmFGdzB5TkRBNApNamd3T1RJeE16UmFNQlV4RXpBUkJnT
lZCQU1NQ21Gc1lpMWpaWEowTURFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCCkFRVUFBNElCRHdBd2dnRUtBb0lCQVF
DemU5eGxydzhjQlplTDE0TEc3L2RMMkg3WnJaVU5qM09zQXJxU3JxVmIKWEh4VGUrdTYvbjA1b240RGhUdDBEZ
ys0cDErZEZYMUc2N0kxTldJZlEzZGFRRnhyenBJSWdKTHUxYUF6R2hDRgpCR0dOTkxqbEtDMDVBMnZMaE1TeG5
ZR1orbDhWR2VKWDJ4dzY5N1M4L3duUUtVRGdBUUVwcHpZT0tXQnJLY3RXCktTYm1vNlR3d1UvNWFTS0tvS3h5U
DJJYXYrb1plOVNrNG05ejArbkNDWjVieDF1SzlOelkzZFBUdUUwQ3crMTgKUkpzN3Z4MzIxL3ZTSnM3TUpMa05
Ud0lEUlNLVkViWkR4b3VMWXVMOFRHZjdMLys2Sm1UdGc3Y3VsRmVhTlRKVgowTkJwb201ODc2UmMwZjdnODE3a
EFYcllhKzdJK0hxdnBSdlMrdFJkdjhDM0FnTUJBQUdqTnpBMU1ETUdBMVVkCkVRUXNNQ3FDSW5ObWJ6QXhZV3h
pWTNSc2NqQXhZUzV6Wm04d01TNXlZV2x1Y0c5c1pTNTJiWGVIQkt3UUNnc3cKRFFZSktvWklodmNOQVFFTEJRQ
URnZ0VCQUJIK20xUFUxcm1kNGRJenNTNDBJcWV3bUpHbUVBN3ByMkI2c0VIWAo0VzZWakFZTDNsTE4ySHN4VUN
Sa2NGbEVsOUFGUEpkNFZNdldtQkxabTB4SndHVXdXQitOb2NXc0puVjBjYWpVCktqWUxBWWExWm1hS2g3eGVYK
3VRVEVKdGFKNFJxeG9WYXoxdVNjamhqUEhteFkyZDNBM3RENDFrTCs3ZUUybFkKQmV2dnI1QmhMbjhwZVRyUlN
xb2h0bjhWYlZHbng5cVIvU0d4OWpOVC8vT2hBZVZmTngxY1NJZVNlR1dGRHRYQwpXa0ZnQ0NucWYyQWpoNkhVT
TIrQStjNFlsdW13QlV6TUorQU05SVhRYUUyaUlpN0VRUC9ZYW8xME5UeU1SMnJDCkh4TUkvUXdWck9NTThyK1p
VYm10QldIY1JWZS9qMVlVaXFTQjBJbmlraDFmeDZ3PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0t
AVI_CLOUD_NAME: sfo01w01vc01
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
AVI_CONTROL_PLANE_NETWORK: sfo01-w01-vds01-tkgclustervip
AVI_CONTROL_PLANE_NETWORK_CIDR: 172.16.80.0/24
AVI_CONTROLLER: 172.16.10.11
AVI_DATA_NETWORK: sfo01-w01-vds01-tkgclustervip
AVI_DATA_NETWORK_CIDR: 172.16.80.0/24
AVI_ENABLE: "true"
AVI_NSXT_T1LR: /infra/tier-1s/sfo01w01tier1
AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_CIDR: 172.16.80.0/24
AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_NAME: sfo01-w01-vds01-tkgclustervip
AVI_MANAGEMENT_CLUSTER_SERVICE_ENGINE_GROUP: sfo01m01segroup01
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.80.0/24
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: sfo01-w01-vds01-tkgclustervip
AVI_PASSWORD: <encoded:Vk13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: sfo01w01segroup01
AVI_USERNAME: admin
CLUSTER_ANNOTATIONS: 'description:,location:'
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: sfo01w01tkgmgmt01
CLUSTER_PLAN: prod
ENABLE_AUDIT_LOGGING: "false"
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "false"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""


LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: "3"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: ""
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /sfo01w01dc01
VSPHERE_DATASTORE: /sfo01w01dc01/datastore/vsanDatastore
VSPHERE_FOLDER: /sfo01w01dc01/vm/tkg-management-components
VSPHERE_INSECURE: "true"
VSPHERE_NETWORK: /sfo01w01dc01/network/sfo01-w01-vds01-tkgmanagement
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /sfo01w01dc01/host/sfo01w01cluster01/Resources/tkg-management-components
VSPHERE_SERVER: 192.168.200.100
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDrPqkVaPpNxHcKxukYro
V6LcCTuRK9NDyygbsAr/P73jEeWIcC+SU4tRpOZks2+BoduUDzdrsfm/Uq/0uj9LuzqIZKAzA1iQ5DtipVzROq
eTuAXJVCMZc6RPgQSZofLBo1Is85M/IrBS20OMALwjukMdwotKKFwL758l51FVsKOT+MUSW/wJLKTv3l0KPObg
SRTMUQdQpoG7ONcMNG2VkBMfgaK44cL7vT0/0Mv/Fmf3Zd59ZaWvX28ZmGEjRx8kOm1j/os61Y+kOvl1MTv8wc
85rYusRuP2Uo5UM4kUTdhSTFasw6TLhbSWicKORPi3FYklvS70jkQFse2WsvmtFG5xyxE/rzDGHloud9g2bQ7T
x0rtWWoRCCC8Sl/vzCjgZfDQXwKXoMP0KbcYHZxSA3zY2lXBlhNtZtyKlynnhr97EaWsm3b9fvhJMmKW5ylkmk
7+4Bql7frJ4bOOR4+hHv57Q8XFOYdLGQPGv03RUFQwFE6a0a6qWAvmVmoh8+BmlGOfx7WYpp8hkyGOdtQz8ZJe
SOyMT6ztLHbY/WqDwEvKpf1dJy93w8fDmz3qXHpkpdnA0t4TiCfizlBk15ZI03TLi4ELoFvso9We13dGClHDDy
v0Dm87uaACC+fyAT5JPbZpAcCw8rm/yTuZ8awtR0LEzJUqNJjX/5OX7Bf45h9w== [email protected]
VSPHERE_TLS_THUMBPRINT: ""
VSPHERE_USERNAME: [email protected]
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"
WORKER_ROLLOUT_STRATEGY: ""

Appendix B - Shared Service Cluster ADC file


apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  generation: 3
  name: tanzu-ako-for-shared
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      type: shared-services
  controlPlaneNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  controller: 172.16.10.10
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  extraConfigs:
    disableStaticRouteSync: false
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgmanagement
  networksConfig:
    nsxtT1LR: /infra/tier-1s/sfo01w01tier1
  serviceEngineGroup: sfo01m01segroup01

Appendix C - Workload Cluster ADC file


apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  generation: 3
  name: install-ako-for-workload-02
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      workload-l7-enabled: "true"
  controlPlaneNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  controller: 172.16.10.10
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  extraConfigs:
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: true
      disableIngressClass: false
      serviceType: NodePortLocal
      shardVSSize: MEDIUM
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgworkload
          cidrs:
            - 172.16.60.0/24
  networksConfig:
    nsxtT1LR: /infra/tier-1s/sfo01w01tier1
  serviceEngineGroup: sfo01w01segroup01
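
The AKODeploymentConfig above applies only to workload clusters whose labels match its clusterSelector. As a minimal sketch (the workload cluster name sfo01w01workload01 is a placeholder), you can label an existing workload cluster from the management cluster context so that AKO is deployed into it with this configuration:

# Run in the management cluster context; the cluster name is an example.
kubectl label cluster sfo01w01workload01 workload-l7-enabled=true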


VMware Tanzu for Kubernetes Operation on vSphere with Tanzu Reference Designs and Deployment

The following documentation lays out the reference designs for deploying Tanzu for Kubernetes
Operations (informally known as TKO) on vSphere with Tanzu. A separate reference design is
provided for environments that use NSX-T.

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design
Deploy Tanzu for Kubernetes Operations using vSphere with Tanzu

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu on NSX-T Reference
Design

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design
vSphere with Tanzu transforms the vSphere cluster into a platform for running Kubernetes workloads
in dedicated resource pools. When vSphere with Tanzu is enabled on a vSphere cluster, vSphere
with Tanzu creates a Kubernetes control plane directly in the hypervisor layer. You can then run
Kubernetes containers by creating upstream Kubernetes clusters through the VMware Tanzu
Kubernetes Grid Service, and run your applications inside these clusters.

This document provides a reference design for deploying VMware Tanzu for Kubernetes Operations
(informally known as TKO) on vSphere with Tanzu.

The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.


vSphere with Tanzu Components


Supervisor Cluster: When Workload Management is enabled on a vSphere cluster, it
creates a Kubernetes layer within the ESXi hosts that are part of the cluster. A cluster that is
enabled for Workload Management is called a Supervisor Cluster. You run containerized
workloads by creating upstream Kubernetes clusters on the Supervisor Cluster through the
Tanzu Kubernetes Grid Service.

The Supervisor Cluster runs on top of an SDDC layer that consists of ESXi for compute,
vSphere Distributed Switch for networking, and vSAN or another shared storage solution.

vSphere Namespaces: A vSphere Namespace is a tenancy boundary within vSphere with Tanzu. A
vSphere Namespace allows for sharing vSphere resources (compute, networking, storage) and
enforcing resource limits on underlying objects such as Tanzu Kubernetes clusters. For each
namespace, you configure role-based access control (policies and permissions), an image library,
and virtual machine classes.

Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create
and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs for the
creation, configuration, and management of the Tanzu Kubernetes Cluster.

Tanzu Kubernetes Grid Service also provides self-service lifecycle management of Tanzu
Kubernetes clusters.

Tanzu Kubernetes Cluster (Workload Cluster): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control, Tanzu Observability, and Tanzu Service Mesh,
which are part of Tanzu for Kubernetes Operations.

VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace and by VMs
hosting a Tanzu Kubernetes cluster.


VM classes in vSphere with Tanzu are broadly categorized into the following groups:

guaranteed: The guaranteed class fully reserves its configured resources.

best-effort: The best-effort class allows resources to be overcommitted.

vSphere with Tanzu offers several default VM classes. You can use them as is or you can
create new VM classes. The following screenshot shows the default VM classes that are
available in vSphere with Tanzu.

Storage Classes in vSphere with Tanzu: A StorageClass provides a way for administrators to
describe the classes of storage they offer. Different classes can map to quality-of-service
levels, to backup policies, or to arbitrary policies determined by the cluster administrators.

You can deploy vSphere with Tanzu with an existing default StorageClass or the vSphere
Administrator can define StorageClass objects (Storage policy) that let cluster users
dynamically create PVC and PV objects with different storage types and rules.

The following table provides recommendations for configuring VM Classes/Storage Classes in a
vSphere with Tanzu environment.

Decision ID: TKO-TKGS-001
Design Decision: Create custom Storage Classes/Profiles/Policies.
Design Justification: To provide different levels of QoS and SLA for prod and dev/test K8s workloads. To isolate Supervisor clusters from workload clusters.
Design Implications: The default storage policy might not be adequate if deployed applications have different performance and availability requirements.

Decision ID: TKO-TKGS-002
Design Decision: Create custom VM Classes.
Design Justification: To facilitate deployment of K8s workloads with specific compute/storage requirements.
Design Implications: The default VM classes in vSphere with Tanzu are not adequate to run a wide variety of K8s workloads.
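
After custom VM classes and storage policies are created and assigned to a vSphere Namespace, you can verify what is available from the kubectl context of that namespace. The following is a quick check only; output depends on your environment:

# List VM classes available on the Supervisor and storage classes assigned to the current namespace.
kubectl get virtualmachineclass
kubectl get storageclass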

vSphere with Tanzu Architecture


The following diagram shows a high-level architecture of vSphere with Tanzu.


The Supervisor Cluster consists of the following components:

Kubernetes control plane VM: Three Kubernetes control plane VMs in total are created on
the hosts that are part of the Supervisor cluster. The three control plane VMs are load-
balanced as each one of them has its own IP address.

Cluster API and Tanzu Kubernetes Grid Service: These modules run on the Supervisor
cluster and enable the provisioning and management of Tanzu Kubernetes clusters.

The following diagram shows the general architecture of the Supervisor cluster.

After a Supervisor cluster is created, the vSphere administrator creates vSphere namespaces. When
initially created, vSphere namespaces have unlimited resources within the Supervisor cluster. The
vSphere administrator defines the limits for CPU, memory, and storage, as well as the number of
Kubernetes objects (such as deployments, replica sets, and persistent volumes) that can run within
the namespace. These limits are configured for each vSphere namespace.

For more information about the maximum supported number, see the vSphere with Tanzu
Configuration Maximums guide.


To provide tenants access to namespaces, the vSphere administrator assigns permission to users or
groups available within an identity source that is associated with vCenter Single Sign-On.

Once the permissions are assigned, tenants can access the namespace to create Tanzu Kubernetes
Clusters using YAML files and the Cluster API.
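
For illustration, the following is a minimal Tanzu Kubernetes cluster specification of the kind a tenant might apply in a namespace. The cluster name, vSphere Namespace, Tanzu Kubernetes release, and VM/storage class names are placeholders and must match what is assigned to your namespace:

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: sfo01w01tkc01              # example cluster name
  namespace: sfo01w01namespace01   # example vSphere Namespace
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: guaranteed-medium
      storageClass: vsan-default-storage-policy   # example storage class
      tkr:
        reference:
          name: v1.24.9---vmware.1-tkg.4          # example Tanzu Kubernetes release
    nodePools:
      - name: worker-pool01
        replicas: 3
        vmClass: best-effort-medium
        storageClass: vsan-default-storage-policy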

Here are some recommendations for using namespaces in a vSphere with Tanzu environment.

Decision ID: TKO-TKGS-003
Design Decision: Create namespaces to logically separate K8s workloads.
Design Justification: Create dedicated namespaces for the type of workloads (prod/dev/test) that you intend to run.
Design Implications: All Kubernetes clusters created under a namespace share the same access policy/quotas/network resources.

Decision ID: TKO-TKGS-004
Design Decision: Enable self-service namespaces.
Design Justification: Enable DevOps/cluster admin users to provision namespaces in a self-service manner.
Design Implications: The vSphere administrator must publish a namespace template to the LDAP users/groups to enable them to create namespaces.

Decision ID: TKO-TKGS-005
Design Decision: Register an external identity source (AD/LDAP) with vCenter.
Design Justification: Limit access to a namespace to authorized users/groups.
Design Implications: A prod namespace can be accessed by a handful of users, whereas a dev/test namespace can be exposed to a wider audience.

Supported Component Matrix


Software Components Version

Tanzu Kubernetes Release 1.24.9

VMware vSphere ESXi 8.0 U1 or later

VMware vCenter (VCSA) 8.0 U1 or later

NSX Advanced Load Balancer 22.1.3

vSphere with Tanzu Storage


vSphere with Tanzu integrates with shared datastores available in the vSphere infrastructure. The
following types of shared datastores are supported:


vSAN

VMFS

NFS

vVols

vSphere with Tanzu uses storage policies to integrate with shared datastores. The policies represent
datastores and manage the storage placement of objects such as control plane VMs, container
images, and persistent storage volumes.

Before you enable vSphere with Tanzu, create storage policies to be used by the Supervisor Cluster
and namespaces. Depending on your vSphere storage environment, you can create several storage
policies to represent different classes of storage.

vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
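
For example, a persistent volume claim created in a Tanzu Kubernetes cluster simply references a storage class that maps to a vSphere storage policy; the storage class name below is a placeholder for one assigned to your namespace:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsan-default-storage-policy   # example storage class backed by a vSphere storage policy
  resources:
    requests:
      storage: 5Gi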

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by the Tanzu Kubernetes Grid supports the following
Container Network Interface (CNI) options:

Antrea

Calico

The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.

When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea
CNI is automatically enabled in the cluster.

To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes clusters
with Calico.
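
As an illustration only, the CNI choice can be declared in the cluster specification when provisioning through the Tanzu Kubernetes Grid Service. The fragment below assumes the v1alpha3 TanzuKubernetesCluster API and shows only the relevant settings block:

spec:
  settings:
    network:
      cni:
        name: calico   # Antrea is used when this setting is omitted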

Each CNI is suitable for a different use case. The following table lists some common use cases for the
CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most appropriate
CNI for your Tanzu Kubernetes Grid implementation.

CNI: Antrea
Use Case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea.

CNI: Calico
Use Case: Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads.
Pros: Support for network policies. High network performance. SCTP support.
Cons: No multicast support.

Networking for vSphere with Tanzu


You can deploy vSphere with Tanzu on various networking stacks, including:

VMware NSX-T Data Center Networking.

vSphere Virtual Distributed Switch (VDS) Networking with NSX Advanced Load Balancer.

Note

The scope of this discussion is limited to vSphere Networking (VDS) with NSX
Advanced Load Balancer.

vSphere with Tanzu on vSphere Networking with NSX Advanced Load Balancer
In a vSphere with Tanzu environment, a Supervisor Cluster configured with vSphere networking uses
distributed port groups to provide connectivity to Kubernetes control plane VMs, services, and
workloads. All hosts from the cluster, which is enabled for vSphere with Tanzu, are connected to the
distributed switch that provides connectivity to Kubernetes workloads and control plane VMs.

You can use one or more distributed port groups as Workload Networks. The network that provides
connectivity to the Kubernetes Control Plane VMs is called Primary Workload Network. You can
assign this network to all the namespaces on the Supervisor Cluster, or you can use different
networks for each namespace. The Tanzu Kubernetes clusters connect to the Workload Network
that is assigned to the namespace.

The Supervisor Cluster leverages NSX Advanced Load Balancer (NSX ALB) to provide L4 load
balancing for the Tanzu Kubernetes clusters control-plane HA. Users access the applications by
connecting to the Virtual IP address (VIP) of the applications provisioned by NSX Advanced Load
Balancer.
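
For example, when a standard Kubernetes Service of type LoadBalancer is created in a Tanzu Kubernetes cluster, AKO requests a VIP for it from NSX Advanced Load Balancer. The application name and ports below are placeholders:

apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  type: LoadBalancer        # VIP is allocated from the TKG Cluster VIP/Data network
  selector:
    app: web-frontend
  ports:
    - port: 80
      targetPort: 8080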

The following diagram shows a general overview for vSphere with Tanzu on vSphere Networking.


NSX Advanced Load Balancer Components


NSX Advanced Load Balancer is deployed in write access mode in a vSphere environment. This
mode grants NSX Advanced Load Balancer Controllers full write access to the vCenter, which helps
in automatically creating, modifying, and removing SEs and other resources as needed to adapt to
changing traffic needs. The following are the core components of NSX Advanced Load Balancer:

NSX Advanced Load Balancer Controller: NSX Advanced Load Balancer Controller
manages Virtual Service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the service engines (SEs). It is the central repository for the
configurations and policies related to services and management and provides the portal for

VMware, Inc 336


VMware Tanzu for Kubernetes Operations Reference Architecture 2.3

viewing the health of VirtualServices and SEs and the associated analytics that NSX
Advanced Load Balancer provides.

NSX Advanced Load Balancer Service Engine: NSX Advanced Load Balancer Service
Engines (SEs) are lightweight VMs that handle all data plane operations by receiving and
executing instructions from the controller. The SEs perform load balancing and all client and
server-facing network interactions.

Avi Kubernetes Operator (AKO): Avi Kubernetes Operator is a Kubernetes operator that
runs as a pod in the Supervisor Cluster. It provides ingress and load balancing functionality.
Avi Kubernetes Operator translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses/routes/services on the
Service Engines (SE) via the NSX Advanced Load Balancer Controller.

Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer Service
Engine settings. Each cloud is configured with one or more VIP networks to provide IP addresses to
L4 load balancing virtual services created under that cloud.

The virtual services can be spanned across multiple Service Engines if the associated Service Engine
Group is configured in Active/Active HA mode. A Service Engine can belong to only one Service
Engine group at a time.

IP address allocation for virtual services can be over DHCP or via NSX Advanced Load Balancer in-
built IPAM functionality. The VIP networks created/configured in NSX Advanced Load Balancer are
associated with the IPAM profile.

Network Architecture
To deploy vSphere with Tanzu, build separate networks for the Tanzu Kubernetes Grid management
(Supervisor) cluster, Tanzu Kubernetes Grid workload clusters, NSX Advanced Load Balancer
components, and the Tanzu Kubernetes Grid control plane HA.

The network reference design can be mapped into this general framework.


Note

The network/port group designated for the workload cluster carries both data and
control traffic. Firewalls cannot be utilized to segregate traffic between workload
clusters; instead, the underlying CNI must be employed as the main filtering system.
Antrea CNI provides Custom Resource Definitions (CRDs) for firewall rules that are
enforced before Kubernetes network policies are evaluated (see the example that
follows this note).

Based on your requirements, you can create additional networks for your workload
clusters. These networks are also referred to as vSphere with Tanzu workload
secondary networks.
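
The following is a minimal sketch of an Antrea ClusterNetworkPolicy of the kind referenced in the note above. The API version, tier, and label selectors are examples and depend on the Antrea version shipped with your Tanzu Kubernetes release:

apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: drop-dev-to-prod
spec:
  priority: 5
  tier: securityops              # Antrea tiers are evaluated before Kubernetes network policies
  appliedTo:
    - namespaceSelector:
        matchLabels:
          env: prod
  ingress:
    - action: Drop
      from:
        - namespaceSelector:
            matchLabels:
              env: dev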

This topology enables the following benefits:

Isolate and separate SDDC management components (vCenter, ESX) from the vSphere with
Tanzu components. This reference design allows only the minimum connectivity between
the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the vCenter
Server.

Isolate and separate the NSX Advanced Load Balancer management network from the
supervisor cluster network and the Tanzu Kubernetes Grid workload networks.

Separate vSphere Admin and Tenant access to the supervisor cluster. This prevents tenants
from attempting to connect to the supervisor cluster.


Allow tenants to access only their own workload cluster(s) and restrict access to this cluster
from other tenants. This separation can be achieved by assigning permissions to the
supervisor namespaces.

Depending on the workload cluster type and use case, multiple workload clusters may
leverage the same workload network or new networks can be used for each workload
cluster.

Network Requirements

As per the reference architecture, the list of required networks is as follows:

Network Type: NSX Advanced Load Balancer Management Network
DHCP Service: Optional
Description: NSX Advanced Load Balancer controllers and SEs will be attached to this network.

Network Type: TKG Management Network
DHCP Service: Optional
Description: Supervisor Cluster nodes will be attached to this network.

Network Type: TKG Workload Network (Primary)
DHCP Service: Optional
Description: Control plane and worker nodes of TKG workload clusters will be attached to this network. The second interface of the Supervisor nodes is also attached to this network.

Network Type: TKG Cluster VIP/Data Network
DHCP Service: No
Description: Virtual services (L4) for control plane HA of all TKG clusters (Supervisor and Workload). Reserve sufficient IPs depending on the number of TKG clusters planned to be deployed in the environment.

Subnet and CIDR Examples


For the purpose of demonstration, this document uses the following subnet CIDRs for the TKO
deployment.

Network Type: NSX Advanced Load Balancer Mgmt Network
Segment Name: NSX-Advanced Load Balancer-Mgmt
Gateway CIDR: 192.168.10.1/27
DHCP Pool: NA
NSX Advanced Load Balancer IP Pool: 192.168.10.14 - 192.168.10.30

Network Type: Supervisor Cluster Network
Segment Name: TKG-Management
Gateway CIDR: 192.168.40.1/28
DHCP Pool: 192.168.40.2 - 192.168.40.14
NSX Advanced Load Balancer IP Pool: NA

Network Type: TKG Workload Primary Network
Segment Name: TKG-Workload-PG01
Gateway CIDR: 192.168.60.1/24
DHCP Pool: 192.168.60.2 - 192.168.60.251
NSX Advanced Load Balancer IP Pool: NA

Network Type: TKG Cluster VIP/Data Network
Segment Name: TKG-Cluster-VIP
Gateway CIDR: 192.168.80.1/26
DHCP Pool: NA
NSX Advanced Load Balancer IP Pool: SE Pool: 192.168.80.2 - 192.168.80.20; TKG Cluster VIP Range: 192.168.80.21 - 192.168.80.60

Firewall Requirements
To prepare the firewall, you need the following information:

1. NSX Advanced Load Balancer Controller node and VIP addresses

2. NSX Advanced Load Balancer Service Engine management IP address

3. Supervisor Cluster network (Tanzu Kubernetes Grid Management) CIDR

4. Tanzu Kubernetes Grid workload cluster CIDR

5. Tanzu Kubernetes Grid cluster VIP address range

6. Client machine IP address

7. vCenter server IP address

8. VMware Harbor registry IP address

9. DNS server IP address(es)

10. NTP server IP address(es)

The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN.

Source: Client Machine
Destination: NSX Advanced Load Balancer Controller Nodes and VIP
Protocol and Port: TCP:443
Description: Access NSX Advanced Load Balancer portal for configuration.

Source: Client Machine
Destination: vCenter Server
Protocol and Port: TCP:443
Description: Access and configure WCP in vCenter.

Source: Client Machine
Destination: TKG Cluster VIP Range
Protocol and Port: TCP:6443, TCP:443, TCP:80
Description: TKG cluster access (6443), access HTTPS workloads (443), and access HTTP workloads (80).

Source: Client Machine
Destination: *.tmc.cloud.vmware.com, console.cloud.vmware.com (optional)
Protocol and Port: TCP:443
Description: Access TMC portal, and so on.

Source: TKG Management and Workload Cluster CIDR
Destination: DNS Server, NTP Server
Protocol and Port: TCP/UDP:53, UDP:123
Description: DNS service and time synchronization.

Source: TKG Management Cluster CIDR
Destination: vCenter IP
Protocol and Port: TCP:443
Description: Allow components to access vCenter to create VMs and storage volumes.

Source: TKG Management and Workload Cluster CIDR
Destination: NSX Advanced Load Balancer Controller nodes
Protocol and Port: TCP:443
Description: Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX Advanced Load Balancer Controller.

Source: TKG Management and Workload Cluster CIDR
Destination: TKG Cluster VIP Range
Protocol and Port: TCP:6443
Description: Allow the Supervisor cluster to configure workload clusters.

Source: TKG Management and Workload Cluster CIDR
Destination: Image Registry (Harbor) (if private)
Protocol and Port: TCP:443
Description: Allow components to retrieve container images.

Source: TKG Management and Workload Cluster CIDR
Destination: wp-content.vmware.com, *.tmc.cloud.vmware.com, projects.registry.vmware.com
Protocol and Port: TCP:443
Description: Sync content library, pull TKG binaries, and interact with TMC.

Source: TKG Management Cluster CIDR
Destination: TKG Workload Cluster CIDR
Protocol and Port: TCP:6443
Description: VM Operator and TKC VM communication.

Source: TKG Workload Cluster CIDR
Destination: TKG Management Cluster CIDR
Protocol and Port: TCP:6443
Description: Allow the TKG workload cluster to register with the Supervisor cluster.

Source: NSX Advanced Load Balancer Management Network
Destination: vCenter and ESXi Hosts
Protocol and Port: TCP:443
Description: Allow NSX Advanced Load Balancer to discover vCenter objects and deploy SEs as required.

Source: NSX Advanced Load Balancer Controller Nodes
Destination: DNS Server, NTP Server
Protocol and Port: TCP/UDP:53, UDP:123
Description: DNS service and time synchronization.

Source: TKG Cluster VIP Range
Destination: TKG Management Cluster CIDR
Protocol and Port: TCP:6443
Description: To interact with the Supervisor cluster.

Source: TKG Cluster VIP Range
Destination: TKG Workload Cluster CIDR
Protocol and Port: TCP:6443, TCP:443, TCP:80
Description: To interact with workload clusters and Kubernetes applications.

Source: vCenter Server
Destination: TKG Management Cluster CIDR
Protocol and Port: TCP:443, TCP:6443, TCP:22 (optional)

Note

For TMC, if the firewall does not allow wildcards, all IP addresses of
[account].tmc.cloud.vmware.com and extensions.aws-usw2.tmc.cloud.vmware.com
need to be whitelisted.

Deployment options
Starting with vSphere 8, when you enable vSphere with Tanzu, you can configure either one-zone
Supervisor mapped to one vSphere cluster or three-zone Supervisor mapped to three vSphere
clusters.

Single-Zone Deployment of Supervisor


A Supervisor deployed on a single vSphere cluster has three control plane VMs, which reside on the
ESXi hosts that are part of the cluster. A single zone is created for the Supervisor automatically, or
you can use a zone that is created in advance. In a Single-Zone deployment, cluster-level high
availability is maintained through vSphere HA, and you can scale the vSphere with Tanzu setup by
adding physical hosts to the vSphere cluster that maps to the Supervisor. You can run workloads
through vSphere Pods, Tanzu Kubernetes Grid clusters, and VMs when the Supervisor is enabled
with the NSX networking stack.

Three-Zone Deployment of Supervisor


Configure each vSphere cluster as an independent failure domain and map it to a vSphere zone.
In a Three-Zone deployment, all three vSphere clusters become one Supervisor and provide:

Cluster-level high availability to the Supervisor, as each vSphere cluster is an independent failure
domain.

Distribution of Tanzu Kubernetes Grid cluster nodes across all three vSphere zones, with
availability provided through vSphere HA at the cluster level.

The ability to scale the Supervisor by adding hosts to each of the three vSphere clusters.

For more information, see Supervisor Architecture and Components.

Installation Experience
vSphere with Tanzu deployment starts with deploying the Supervisor cluster (Enabling Workload
Management). The deployment is directly done from the vCenter user interface (UI). The Get Started
page lists the pre-requisites for the deployment.


The vCenter UI shows that, in the current version, it is possible to install vSphere with Tanzu on the
VDS networking stack as well as NSX-T Data Center as the networking solution.

This installation process takes you through the steps of deploying the Supervisor Cluster in your
vSphere environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission
Control or the kubectl utility to deploy the Tanzu Kubernetes shared services and workload clusters.

Design Recommendations
NSX Advanced Load Balancer Recommendations
The following table provides recommendations for configuring NSX Advanced Load Balancer in a
vSphere with Tanzu environment.

Decision ID: TKO-Advanced Load Balancer-001
Design Decision: Deploy NSX Advanced Load Balancer controller cluster nodes on a network dedicated to NSX Advanced Load Balancer.
Design Justification: To isolate NSX Advanced Load Balancer traffic from infrastructure management traffic and Kubernetes workloads.
Design Implications: Allows for ease of management of controllers. An additional network (VLAN) is required.

Decision ID: TKO-Advanced Load Balancer-002
Design Decision: Deploy 3 NSX Advanced Load Balancer controller nodes.
Design Justification: To achieve high availability for the NSX Advanced Load Balancer platform. In clustered mode, NSX Advanced Load Balancer availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible.
Design Implications: Clustered mode requires more compute and storage resources.

Decision ID: TKO-Advanced Load Balancer-003
Design Decision: Configure vCenter settings in Default-Cloud.
Design Justification: Using a non-default vCenter cloud is not supported with vSphere with Tanzu.
Design Implications: Using a non-default cloud can lead to deployment failures.

Decision ID: TKO-Advanced Load Balancer-004
Design Decision: Use static IPs for the NSX Advanced Load Balancer controllers if DHCP cannot guarantee a permanent lease.
Design Justification: The NSX Advanced Load Balancer Controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes would be disruptive.
Design Implications: The NSX Advanced Load Balancer Controller control plane might go down if the management IPs of the controller nodes change.

Decision ID: TKO-Advanced Load Balancer-005
Design Decision: Use NSX Advanced Load Balancer IPAM for Service Engine data network and virtual services IP assignment.
Design Justification: Guarantees IP address assignment for Service Engine data NICs and virtual services.
Design Implications: Removes the corner case scenario when the DHCP server runs out of the lease or is down.

Decision ID: TKO-Advanced Load Balancer-006
Design Decision: Reserve an IP in the NSX Advanced Load Balancer management subnet to be used as the Cluster IP for the Controller Cluster.
Design Justification: The NSX Advanced Load Balancer portal is always accessible over the Cluster IP regardless of a specific individual controller node failure.
Design Implications: NSX Advanced Load Balancer administration is not affected by an individual controller node failure.

Decision ID: TKO-Advanced Load Balancer-007
Design Decision: Use the default Service Engine Group for load balancing of TKG clusters' control plane.
Design Justification: Using a non-default Service Engine Group for hosting the L4 virtual service created for TKG control plane HA is not supported.
Design Implications: Using a non-default Service Engine Group can lead to Service Engine VM deployment failure.

Decision ID: TKO-Advanced Load Balancer-008
Design Decision: Share Service Engines for the same type of workload (dev/test/prod) clusters.
Design Justification: Minimize the licensing cost.
Design Implications: Each Service Engine contributes to the CPU core capacity associated with a license. Sharing Service Engines can help reduce the licensing cost.

Decision ID: TKO-Advanced Load Balancer-009
Design Decision: Configure anti-affinity rules for the NSX ALB controller cluster.
Design Justification: This is to ensure that no two controllers end up on the same ESXi host and thus avoid a single point of failure.
Design Implications: Anti-affinity rules need to be created manually.

Decision ID: TKO-Advanced Load Balancer-0010
Design Decision: Configure backup for the NSX ALB Controller cluster.
Design Justification: Backups are required if the NSX ALB Controller becomes inoperable or if the environment needs to be restored from a previous state.
Design Implications: To store backups, an SCP-capable backup location is needed. SCP is the only supported protocol currently.

Decision ID: TKO-Advanced Load Balancer-0011
Design Decision: Initial setup should be done only on one NSX ALB controller VM out of the three deployed to create an NSX ALB controller cluster.
Design Justification: The NSX ALB controller cluster is created from an initialized NSX ALB controller, which becomes the cluster leader. Follower NSX ALB controller nodes need to be uninitialized to join the cluster.
Design Implications: NSX ALB controller cluster creation fails if more than one NSX ALB controller is initialized.

Decision ID: TKO-Advanced Load Balancer-0012
Design Decision: Configure remote logging for the NSX ALB Controller to send events to syslog.
Design Justification: For operations teams to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB Controller.
Design Implications: Additional operational overhead. Additional infrastructure resources.

Decision ID: TKO-Advanced Load Balancer-0013
Design Decision: Use LDAP/SAML based authentication for NSX ALB.
Design Justification: Helps to maintain role-based access control.
Design Implications: Additional configuration is required.

Network Recommendations
The following are the key network recommendations for a production-grade vSphere with Tanzu
deployment:

Decision ID: TKO-NET-001
Design Decision: Use separate networks for the Supervisor cluster and workload clusters.
Design Justification: To have flexible firewall and security policies.
Design Implications: Sharing the same network for multiple clusters can complicate creation of firewall rules.

Decision ID: TKO-NET-002
Design Decision: Use distinct port groups for network separation of K8s workloads.
Design Justification: Isolate production Kubernetes clusters from dev/test clusters by placing them on distinct port groups.
Design Implications: Network mapping is done at the namespace level. All Kubernetes clusters created in a namespace connect to the same port group.

Decision ID: TKO-NET-003
Design Decision: Use routable networks for Tanzu Kubernetes clusters.
Design Justification: Allow connectivity between the TKG clusters and infrastructure components.
Design Implications: Networks that are used for Tanzu Kubernetes cluster traffic must be routable between each other and the Supervisor Cluster Management Network.


Recommendations for Supervisor Clusters


Decision ID: TKO-TKGS-001
Design Decision: Create a Subscribed Content Library.
Design Justification: A Subscribed Content Library can automatically pull the latest OVAs used by the Tanzu Kubernetes Grid Service to build cluster nodes. Using a subscribed content library facilitates template management, as new versions can be pulled by initiating the library sync.
Design Implications: A Local Content Library would require manual upload of images, which is suitable for air-gapped or Internet-restricted environments.

Decision ID: TKO-TKGS-002
Design Decision: Deploy Supervisor cluster control plane nodes in large form factor.
Design Justification: The large form factor should suffice to integrate the Supervisor Cluster with TMC and Velero deployment.
Design Implications: Consumes more resources from infrastructure.

Decision ID: TKO-TKGS-003
Design Decision: Register the Supervisor cluster with Tanzu Mission Control.
Design Justification: Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally.
Design Implications: Outbound connectivity to the internet is needed for TMC registration.

Note

SaaS endpoints here refer to Tanzu Mission Control, Tanzu Service Mesh, and Tanzu
Observability.

Recommendations for Tanzu Kubernetes Clusters


Decision ID: TKO-TKC-001
Design Decision: Deploy Tanzu Kubernetes clusters with the prod plan and multiple worker nodes.
Design Justification: The prod plan provides high availability for the control plane.
Design Implications: Consumes more resources from infrastructure.

Decision ID: TKO-TKC-002
Design Decision: Use the guaranteed VM class for Tanzu Kubernetes clusters.
Design Justification: Guarantees compute resources are always available for containerized workloads.
Design Implications: Could prevent automatic migration of nodes by DRS.

Decision ID: TKO-TKC-003
Design Decision: Implement RBAC for Tanzu Kubernetes clusters.
Design Justification: To avoid the usage of administrator credentials for managing the clusters.
Design Implications: External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually.

Decision ID: TKO-TKC-04
Design Decision: Deploy Tanzu Kubernetes clusters from Tanzu Mission Control.
Design Justification: Tanzu Mission Control provides life-cycle management for the Tanzu Kubernetes clusters and automatic integration with Tanzu Service Mesh and Tanzu Observability.
Design Implications: Only Antrea CNI is supported on workload clusters created from the TMC portal.
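
In support of the RBAC recommendation (TKO-TKC-003) above, access inside a Tanzu Kubernetes cluster is typically granted by binding built-in Kubernetes roles to vSphere SSO users or groups rather than sharing administrator credentials. The following is a minimal sketch; the group name is a placeholder, and the sso: prefix assumes vCenter Single Sign-On as the identity source:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dev-team-edit
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit                        # built-in role; use view or admin as appropriate
subjects:
  - kind: Group
    apiGroup: rbac.authorization.k8s.io
    name: sso:devteam@vsphere.local # example vSphere SSO group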


Kubernetes Ingress Routing


vSphere with Tanzu does not ship with a default ingress controller. Any Tanzu-supported ingress
controller can be used.

One example of an ingress controller is Contour, an open-source controller for Kubernetes ingress
routing. Contour is part of a Tanzu package and can be installed on any Tanzu Kubernetes cluster.
Deploying Contour is a prerequisite for deploying Prometheus, Grafana, and Harbor on a workload
cluster.

For more information about Contour, see the Contour site and Implementing Ingress Control with
Contour.

Tanzu Service Mesh also offers an Ingress controller based on Istio.

Each ingress controller has pros and cons of its own. The below table provides general
recommendations on when you should use a specific ingress controller for your Kubernetes
environment.

Ingress Controller: Contour
Use Cases: Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. Contour is a reliable solution for simple Kubernetes workloads.

Ingress Controller: Istio
Use Cases: Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic).
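
As an illustration, once Contour is installed from the Tanzu packages, an HTTP route can be published either through a standard Kubernetes Ingress object or through Contour's HTTPProxy resource. The host name and backend service below are placeholders:

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: web-frontend
spec:
  virtualhost:
    fqdn: web.sfo01.rainpole.vmw   # example FQDN resolving to the Envoy service address
  routes:
    - conditions:
        - prefix: /
      services:
        - name: web-frontend
          port: 80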

NSX Advanced Load Balancer Sizing Guidelines


NSX Advanced Load Balancer Controller Configuration
Regardless of NSX Advanced Load Balancer Controller configuration, each controller cluster can
achieve up to 5,000 virtual services; 5,000 is a hard limit. For more information, see Avi Controller
Sizing.

Controller Size VM Configuration Virtual Services Avi SE Scale

Essentials 4 vCPUS, 24 GB RAM 0-50 0-10

Small 6 vCPUS, 24 GB RAM 0-200 0-100

Medium 10 vCPUS, 32 GB RAM 200-1000 100-200

Large 16 vCPUS, 48 GB RAM 1000-5000 200-400

Service Engine Sizing Guidelines


See Sizing Service Engines for guidance on sizing your SEs.

Performance metric: 1 vCPU core
Throughput: 4 Gb/s
Connections/s: 40k
SSL Throughput: 1 Gb/s
SSL TPS (RSA2K): ~600
SSL TPS (ECC): 2500

Multiple performance vectors or features may have an impact on performance. For example, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load
Balancer recommends two cores.

NSX Advanced Load Balancer Service Engines may be configured with as little as 1 vCPU core and 2
GB RAM, or up to 64 vCPU cores and 256 GB RAM. It is recommended for a Service Engine to have
at least 4 GB of memory when GeoDB is in use.

Container Registry
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu includes Harbor as a container
registry. Harbor is an open-source, trusted, cloud-native container registry that stores, signs, and
scans content.

The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. Customers can choose any existing repository, or
deploy a Harbor registry for storing images if required.

When vSphere with Tanzu is deployed on VDS networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.

You may use one of the following methods to install Harbor:

Tanzu Kubernetes Grid Package deployment - VMware recommends this installation


method for general use cases. The Tanzu packages, including Harbor, must either be pulled
directly from VMware or be hosted in an internal registry.

VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.

When deploying Harbor with self-signed certificates or certificates signed by internal CAs, it is
necessary for the Tanzu Kubernetes cluster to establish trust with the registry’s certificate. To do so,
follow the procedure in Trust Custom CA Certificates on Cluster Nodes.


vSphere with Tanzu SaaS Integration


The SaaS products in the VMware Tanzu portfolio are on the critical path for securing systems at the
heart of your IT infrastructure. VMware Tanzu Mission Control provides a centralized control plane for
Kubernetes, and Tanzu Service Mesh provides a global control plane for service mesh networks.
Tanzu Observability features include Kubernetes monitoring, application observability, and service
insights.

To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.

Custom Tanzu Observability Dashboards


Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards
for your particular deployment. For information about customizing Tanzu Observability dashboards
for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for
Kubernetes Operations.

Summary
vSphere with Tanzu on hyper-converged hardware offers high-performance potential and
convenience and addresses the challenges of creating, testing, and updating on-premises
Kubernetes platforms in a consolidated production environment. This validated approach results in a
production installation with all the application services needed to serve combined or uniquely
separated workload types via a combined infrastructure solution.

This plan meets many Day-0 needs for quickly aligning product capabilities to full-stack
infrastructure, including networking, configuring firewall rules, load balancing, workload compute
alignment, and other capabilities.

Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations using vSphere with Tanzu.

Deploy VMware Tanzu for Kubernetes Operations using vSphere with Tanzu
This document outlines the steps for deploying Tanzu for Kubernetes Operations (informally known
as TKO) using vSphere with Tanzu in a vSphere environment backed by a Virtual Distributed Switch
(VDS) and leveraging NSX Advanced Load Balancer (ALB) Enterprise Edition for L4/L7 load
balancing and ingress.

The scope of the document is limited to providing deployment steps based on the reference design
in VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design. This
document does not cover any deployment procedures for the underlying SDDC components.

Deploying with VMware Service Installer for Tanzu


You can use VMware Service Installer for VMware Tanzu to automate this deployment.

VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.

To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on vSphere with Tanzu and vSphere Distributed Switch Using Service Installer for
VMware Tanzu.

Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.

Prerequisites
Before deploying Tanzu Kubernetes Operations using vSphere with Tanzu on vSphere networking,
ensure that your environment is set up as described in the following:

General Requirements

Network Requirements

Firewall Requirements

Resource Pools

General Requirements
Ensure that your environment meets the following general requirements:

vSphere 8.0 instance with an Enterprise Plus license.

Your vSphere environment has the following objects in place:


A vSphere cluster with at least 3 hosts on which vSphere HA & DRS is enabled. If you
are using vSAN for shared storage, it is recommended that you use 4 ESXi hosts.

A distributed switch with port groups for TKO components. See Network
Requirements for the required port groups.


All ESXi hosts of the cluster on which vSphere with Tanzu will be enabled should be
part of the distributed switch.

Dedicated resource pools and VM folder for collecting NSX Advanced Load Balancer
VMs.

A shared datastore with sufficient capacity for the control plane and worker node VM
files.

Network Time Protocol (NTP) service running on all hosts and vCenter.

A user account with Modify cluster-wide configuration permissions.

NSX Advanced Load Balancer 22.1.2 OVA downloaded from customer connect portal and
readily available for deployment.

Note

Tanzu Kubernetes Grid nodes are unable to resolve hostnames with the “.local”
domain suffix. For more information, see the KB article.

For additional information on general prerequisites, see vSphere with Tanzu product documentation.

Network Requirements
The following table provides example entries for the required port groups. Create network entries
with the port group name, VLAN ID, and CIDRs that are specific to your environment.

Network Type: NSX ALB Management Network
DHCP Service: Optional
Description & Recommendations: NSX ALB controllers and SEs will be attached to this network. Use static IPs for the NSX ALB controllers.

Network Type: TKG Management Network
DHCP Service: IP Pool
Description & Recommendations: Supervisor Cluster nodes will be attached to this network. When an IP Pool is used, ensure that the block has 5 consecutive free IPs.

Network Type: TKG Workload Network
DHCP Service: IP Pool
Description & Recommendations: Control plane and worker nodes of TKG workload clusters will be attached to this network.

Network Type: TKG Cluster VIP/Data Network
DHCP Service: No
Description & Recommendations: Virtual services for control plane HA of all TKG clusters (Supervisor and Workload). Reserve sufficient IPs depending on the number of TKG clusters planned to be deployed in the environment; NSX ALB handles IP address management on this network via IPAM.

This document uses the following port groups, subnet CIDRs, and VLANs. Replace these with values
that are specific to your environment. Plan network subnet sizes according to application needs and
future requirements.

Network Type: NSX ALB Management Network
Port Group Name: sfo01-w01-vds01-albmanagement
VLAN: 1680
Gateway CIDR: 192.168.10.1/27
DHCP Enabled: No
IP Pool for SE/VIP in NSX ALB: 192.168.10.14 - 192.168.10.30

Network Type: TKG Management Network
Port Group Name: sfo01-w01-vds01-tkgmanagement
VLAN: 1681
Gateway CIDR: 192.168.40.1/28
DHCP Enabled: Yes
IP Pool for SE/VIP in NSX ALB: No

Network Type: TKG Workload Network01
Port Group Name: sfo01-w01-vds01-tkgworkload
VLAN: 1682
Gateway CIDR: 192.168.60.1/24
DHCP Enabled: Yes
IP Pool for SE/VIP in NSX ALB: No

Network Type: TKG VIP Network
Port Group Name: sfo01-w01-vds01-tkgclustervip
VLAN: 1683
Gateway CIDR: 192.168.80.1/26
DHCP Enabled: No
IP Pool for SE/VIP in NSX ALB: SE Pool: 192.168.80.2 - 192.168.80.20; TKG Cluster VIP Range: 192.168.80.21 - 192.168.80.60

Firewall Requirements
Ensure that the firewall is set up as described in Firewall Requirements.

Resource Pools
Ensure that resource pools and folders are created in vCenter. The following table shows a sample
entry for the resource pool and folder. Customize the resource pool and folder name for your
environment.

Resource Type Resource Pool name Sample Folder name

NSX ALB Components tkg-vsphere-alb-components tkg-vsphere-alb-components

Deployment Overview
Here are the high-level steps for deploying Tanzu Kubernetes operations on vSphere networking
backed by VDS:

1. Deploy and Configure NSX Advanced Load Balancer

2. Deploy Tanzu Kubernetes Grid Supervisor Cluster

3. Create and Configure vSphere Namespaces

4. Register Supervisor Cluster with Tanzu Mission Control

5. Deploy Tanzu Kubernetes Clusters (Workload Clusters)

6. Integrate Tanzu Kubernetes Clusters with Tanzu Observability

7. Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh

8. Deploy User-Managed Packages on Tanzu Kubernetes Grid Clusters

Note

Starting with vSphere 8, when you enable vSphere with Tanzu, you can configure
either a one-zone Supervisor mapped to one vSphere cluster or a three-zone
Supervisor mapped to three vSphere clusters. This document covers a One-Zone
Supervisor deployment with VDS networking and NSX Advanced Load Balancer. See
Requirements for Cluster Supervisor Deployment with NSX Advanced Load Balancer
and VDS Networking.

Deploy and Configure NSX Advanced Load Balancer


NSX Advanced Load Balancer is an enterprise-grade integrated load balancer that provides L4-L7
load balancer support. VMware recommends deploying NSX Advanced Load Balancer for vSphere
deployments without NSX-T, or when there are unique scaling requirements.

NSX Advanced Load Balancer is deployed in write access mode in the vSphere environment. This
mode grants NSX Advanced Load Balancer controllers full write access to the vCenter. Full write
access allows automatically creating, modifying, and removing Service Engines and other resources
as needed to adapt to changing traffic needs.

For a production-grade deployment, VMware recommends deploying three instances of the NSX
Advanced Load Balancer controller for high availability and resiliency.

The following table provides a sample IP address and FQDN set for the NSX Advanced Load
Balancer controllers:

Controller Node IP Address FQDN

Node01 (Primary) 192.168.10.3 sfo01albctlr01a.sfo01.rainpole.vmw

Node02 (Secondary) 192.168.10.4 sfo01albctlr01b.sfo01.rainpole.vmw

Node03 (Secondary) 192.168.10.5 sfo01albctlr01c.sfo01.rainpole.vmw

Controller Cluster 192.168.10.2 sfo01albctlr01.sfo01.rainpole.vmw

Deploy NSX Advance Load Balancer Controller Node


Do the following to deploy NSX Advanced Load Balancer controller node:

1. Log in to the vCenter Server by using the vSphere Client.

2. Select the cluster where you want to deploy the NSX Advanced Load Balancer controller
node.

3. Right-click on the cluster and invoke the Deploy OVF Template wizard.

4. Follow the wizard to configure the following:

VM Name and Folder Location.

Select the tkg-vsphere-alb-components resource pool as a compute resource.

Select the datastore for the controller node deployment.

Select the sfo01-w01-vds01-albmanagement port group for the Management


Network.

Customize the configuration by providing Management Interface IP Address, Subnet


Mask, and Default Gateway. The remaining fields are optional and can be left blank.

Complete the configuration and deploy NSX Advanced Load Balancer controller node.

For more information, see the product documentation Deploy the Controller.


Configure the Controller Node for your vSphere with Tanzu Environment
After the controller VM is deployed and powered-on, configure the controller VM for your vSphere
with Tanzu environment. The controller requires several post-deployment configuration parameters.

On a browser, go to https://sfo01albctlr01a.sfo01.rainpole.vmw/.

1. Configure an Administrator Account by setting up a password and optionally, an email


address.

2. Configure System Settings by specifying the backup passphrase and DNS information.

3. (Optional) Configure Email/SMTP.


4. Configure Multi-Tenant settings as follows:

IP Route Domain: Share IP route domain across tenants.

Service Engine Context: Service Engines are managed within the tenant context, not
shared across tenants.

5. Click Save to finish the post-deployment configuration wizard.

If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch and you are directed to a Dashboard
view on the controller.

Configure Default-Cloud
1. Navigate to Infrastructure > Clouds and edit Default-Cloud.

2. Select VMware vCenter/vSphere ESX as the infrastructure type and click Next.


3. On the vCenter/vSphere tab, click SET CREDENTIALS and configure the following:

vCenter Address: vCenter IP address or FQDN.

vCenter Credentials: Username/password of the vCenter account to use for NSX ALB
integration.

Access Permission: Write

4. Select the Data Center and configure the following:

Select the vSphere Data Center where you want to enable Workload Management.

Select the content library that holds the Tanzu Kubernetes release OVA templates.


Click SAVE & RELAUNCH

5. Configure the Network settings as follows:

Select the sfo01-w01-vds01-albmanagement as Management Network. This network


interface is used by the Service Engines to connect with the controller.

IP Address Management for Management Network: Select DHCP Enabled if DHCP


is available on the vSphere port groups.

If DHCP is not available, enter the IP Subnet, IP address range (Add Static IP
Address Pool), Default Gateway for the Management Network, then click Save.


6. Ensure that the health of the Default-Cloud is green after configuration.

Configure Licensing
Tanzu for Kubernetes Operations requires an NSX Advanced Load Balancer Enterprise license.

1. Go to Administration > Licensing.

2. Click the gear icon.

3. Select Enterprise Tier.

4. Provide the license key and click Apply Key.

Configure NTP Settings


Configure NTP settings if you want to use an internal NTP server.

1. Go to the Administration > Settings > DNS/NTP page.

2. Click the pencil icon on the upper right corner to enter edit mode.

3. On the Update System Settings dialog, edit the settings for the NTP server that you want to use.

4. Click Save to save the settings.

Deploy NSX Advanced Load Balancer Controller Cluster


In a production environment, VMware recommends that you deploy additional controller nodes and
configure the controller cluster for high availability and disaster recovery.

To run a three-node controller cluster, you deploy the first node, perform the initial configuration,
and set the cluster IP. After that, you deploy and power on two more controller VMs. However, do
not run the initial configuration wizard or change the administrator password for the two additional
controller VMs. The configuration of the first controller VM is assigned to the two new controller
VMs.

To configure the Controller cluster:

1. Navigate to Administration > Controller


2. Select Nodes and click Edit.

3. Specify a name for the controller cluster and set the cluster IP address. This IP address
should be from the NSX Advanced Load Balancer management network.

4. In Cluster Nodes, specify the IP addresses of the two additional controllers that you have
deployed.

Leave the name and password fields empty.

5. Click Save.

The controller cluster setup starts. The controller nodes are rebooted in the process. It takes
approximately 10-15 minutes for cluster formation to complete.

You are automatically logged out of the controller node you are currently logged in. Enter the
cluster IP address in a browser to see the cluster formation task details.

The first controller of the cluster receives the “Leader” role. The second and third controllers
work as “Followers”.

After the controller cluster is deployed, use the controller cluster IP address for any additional
configuration. Do not use the individual controller node IP addresses.

For additional product documentation, see Deploy a Controller Cluster.

Change NSX Advanced Load Balancer Portal Default Certificate


The controller must send a certificate to clients to establish secure communication. This certificate
must have a Subject Alternative Name (SAN) that matches the NSX Advanced Load Balancer
controller cluster hostname or IP address.

The controller has a default self-signed certificate. But this certificate does not have the correct SAN.
You must replace it with a valid or self-signed certificate that has the correct SAN. You can create a
self-signed certificate or upload a CA-signed certificate.

Note

This document makes use of a self-signed certificate.

To replace the default certificate:

1. Navigate to Templates > Security > SSL/TLS Certificates, click Create, and select
Controller Certificate.

2. The New Certificate (SSL/TLS) window appears. Enter a name for the certificate.

To add a self-signed certificate:

1. For Type select Self Signed and enter the following details:

Common Name: Specify the fully qualified name of the site, for example
sfo01albctlr01.sfo01.rainpole.vmw. For the site to be considered trusted, this entry
must match the hostname that the client entered in the browser.

Subject Alternate Name (SAN): Enter the cluster IP address or FQDN of the
controller cluster and all controller nodes.

Algorithm: Select either EC or RSA.

Key Size

2. Click Save.


3. Change the NSX Advanced Load Balancer portal certificate.

Navigate to the Administration > Settings > Access Settings.

Click the pencil icon to edit the access settings.

Verify that Allow Basic Authentication is enabled.

From SSL/TLS Certificate, remove the existing default portal certificates

From the drop-down list, select the newly created certificate

Click Save.


For additional product documentation, see Assign a Certificate to the Controller.

Export NSX Advanced Load Balancer Certificate


You need the newly created certificate when you configure the Supervisor cluster to enable
Workload Management.

To export the certificate, navigate to the Templates > Security > SSL/TLS Certificate page and
export the certificate by clicking Export.

On the Export Certificate page, click Copy to clipboard next to the certificate. Do not copy the key.
Save the copied certificate to use when you enable workload management.

Configure a Service Engine Group


vSphere with Tanzu uses the Default Service Engine Group. Ensure that the HA mode for the
Default-Group is set to N + M (buffer).

Optionally, you can reconfigure the Default-Group to define the placement and number of Service
Engine VMs settings.

This document uses the Default Service Engine Group without modification.

For more information, see the product documentation Configure a Service Engine Group.

Configure a Virtual IP Subnet for the Data Network


You can configure the virtual IP (VIP) range to use when a virtual service is placed on the specific
VIP network. You can configure DHCP for the Service Engines.

Optionally, if DHCP is unavailable, you can configure a pool of IP addresses to assign to the Service
Engine interface on that network.

This document uses an IP pool for the VIP network.

To configure the VIP network:


1. Navigate to Infrastructure > Cloud Resources > Networks and locate the network that
provides the virtual IP addresses.

2. Click the edit icon to edit the network settings.

3. Click Add Subnet.

4. In IP Subnet, specify the VIP network subnet CIDR.

5. Click Add Static IP Address Pool to specify the IP address pool for the VIPs and Service
Engine. The range must be a subset of the network CIDR configured in IP Subnet.

6. Click Save to close the VIP network configuration wizard.

For more information, see the product documentation Configure a Virtual IP Network.

Configure Default Gateway


A default gateway enables the service engine to route traffic to the pool servers on the Workload
Network. You must configure the VIP Network gateway IP address as the default gateway.

To configure the default gateway:

1. Navigate to Infrastructure > Cloud Resources > VRF Context

2. Click Create.

3. Under Static Route, click ADD

4. In Gateway Subnet, enter 0.0.0.0/0

5. In Next Hop, enter the gateway IP address of the VIP network.

6. Click Save.


For additional product documentation, see Configure Default Gateway

Configure IPAM and DNS Profile


IPAM is required to allocate virtual IP addresses when virtual services get created. Configure IPAM
for the NSX Advanced Load Balancer controller and assign it to the Default-Cloud.

1. Navigate to the Templates > Profiles > IPAM/DNS Profiles.

2. Click Create and select IPAM Profile from the dropdown menu.

3. Enter the following to configure the IPAM profile:

A name for the IPAM Profile.

Select type as AVI Vantage IPAM.

Deselect the Allocate IP in VRF option.

4. Click Add Usable Network.

Select Default-Cloud.

Choose the VIP network that you have created in Configure a Virtual IP Subnet for
the Data Network.


5. Click Save.

6. Click Create again and select DNS Profile.

Provide a name for the profile.

Add your domain name under Domain Name.

Optionally, set the TTL for the domain.


7. Assign the IPAM and DNS profile to the Default-Cloud configuration.

Navigate to the Infrastructure > Cloud

Edit the Default-Cloud configuration as follows:


IPAM Profile: Select the newly created profile.

DNS Profile: Select the newly created profile.

Click Save


8. Verify that the status of the Default-Cloud configuration is green.

For additional product documentation, see Configure IPAM.

Deploy Tanzu Kubernetes Grid Supervisor Cluster


As a vSphere administrator, you enable a vSphere cluster for Workload Management by creating a
Supervisor Cluster. After you deploy the Supervisor Cluster, you can use the vSphere Client to
manage and monitor the cluster.

Before deploying the Supervisor Cluster, ensure the following:

You have created a vSphere cluster with at least three ESXi hosts. If you are using vSAN you
need a minimum of four ESXi hosts.

The vSphere cluster is configured with shared storage such as vSAN.

The vSphere cluster has HA and DRS enabled, and DRS is configured in fully automated
mode.

The required port groups have been created on the distributed switch to provide networking
to the Supervisor and workload clusters.

Your vSphere cluster is licensed for Supervisor Cluster deployment.

You have created a Subscribed Content Library to automatically pull the latest Tanzu
Kubernetes releases from the VMware repository.


You have created a storage policy that will determine the datastore placement of the
Kubernetes control plane VMs, containers, and images.

A user account with Modify cluster-wide configuration permissions is available.

NSX Advanced Load Balancer is deployed and configured as per instructions provided
earlier.

To deploy the Supervisor Cluster:

1. Log in to the vSphere client and navigate to Menu > Workload Management and click Get
Started.

2. Select the vCenter Server and Networking stack.

Select a vCenter server system.

Select vSphere Distributed Switch (VDS) for the networking stack.

3. Select CLUSTER DEPLOYMENT and a cluster from the list of compatible clusters and click
Next.

4. Select the Control Plane Storage Policy for the nodes from the drop-down menu and click
Next.

5. On the Load Balancer screen, select Load Balancer Type as NSX Advanced Load Balancer
and provide the following details:

Name: Friendly name for the load balancer. Only lowercase letters are supported in the
name field.

NSX Advanced Load Balancer Controller IP: If the NSX Advanced Load Balancer
self-signed certificate is configured with the hostname in the SAN field, use the same
hostname here. If the SAN is configured with an IP address, provide the controller
cluster IP address. The default port of NSX Advanced Load Balancer is 443.

NSX Advanced Load Balancer Credentials: Provide the NSX Advanced Load
Balancer administrator credentials.

Server Certificate: Use the content of the controller certificate that you exported
earlier while configuring certificates for the controller.


6. Click Next.

7. On the Management Network screen, select the port group that you created on the
distributed switch and provide the required networking details.

If DHCP is enabled for the port group, set the Network Mode to DHCP.

Ensure that the DHCP server is configured to hand over the DNS server address, DNS
search domain, and NTP server address via DHCP.

8. Click Next.

9. On the Workload Network screen, select the network that will handle the networking traffic
for Kubernetes workloads running on the Supervisor cluster.

Set the Network Mode to DHCP if the port group is configured for DHCP.


10. Click Next.

11. On the Review and Confirm screen, select the size for the Kubernetes control plane VMs
that are created on each host from the cluster. For production deployments, VMware
recommends a large form factor.

12. Click Finish. This triggers the Supervisor Cluster deployment.


The Workload Management task takes approximately 30 minutes to complete. After the task
completes, three Kubernetes control plane VMs are created on the hosts that are part of the
vSphere cluster.

The Supervisor Cluster gets an IP address from the VIP network that you configured in the NSX
Advanced Load Balancer. This IP address is also called the Control Plane HA IP address.

In the backend, three supervisor Control Plane VMs are deployed in the vSphere namespace. A
Virtual Service is created in the NSX Advanced Load Balancer with three Supervisor Control Plane
nodes that are deployed in the process.

For additional product documentation, see Enable Workload Management with vSphere Networking.

Download and Install the Kubernetes CLI Tools for vSphere


You can use Kubernetes CLI Tools for vSphere to view and control vSphere with Tanzu namespaces
and clusters.

The Kubernetes CLI Tools download package includes two executables: the standard open-source
kubectl and the vSphere Plugin for kubectl. The vSphere Plugin for kubectl extends the commands
available to kubectl so that you can connect to the Supervisor Cluster and to Tanzu Kubernetes clusters
using vCenter Single Sign-On credentials.

To download the Kubernetes CLI tool, connect to the URL https://<control-plane-vip>/

For additional product documentation, see Download and Install the Kubernetes CLI Tools for
vSphere.
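
The following is a minimal sketch of downloading and installing the CLI tools package on a Linux
jumpbox. The download path under the control plane VIP and the install locations are assumptions;
adjust them for your environment and operating system.

export SC_IP=<control-plane-vip>
curl -k https://${SC_IP}/wcp/plugin/linux-amd64/vsphere-plugin.zip -o vsphere-plugin.zip
unzip vsphere-plugin.zip -d vsphere-plugin
# Place both executables on the PATH.
sudo install vsphere-plugin/bin/kubectl /usr/local/bin/kubectl
sudo install vsphere-plugin/bin/kubectl-vsphere /usr/local/bin/kubectl-vsphere
kubectl vsphere --help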

Connect to the Supervisor Cluster


After installing the CLI tool of your choice, connect to the Supervisor Cluster by running the
following command:

kubectl vsphere login --vsphere-username administrator@vsphere.local --server=<control-plane-vip>


The command prompts for the vSphere administrator password.

After your connection to the Supervisor Cluster is established you can switch to the Supervisor
context by running the command:

kubectl config use-context <supervisor-context-name>

where the <supervisor-context-name> is the IP address of the control plane VIP.
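
The following example session is a sketch; the VIP address, username, and context name are
placeholders, and the --insecure-skip-tls-verify flag is only appropriate when the controller
certificate is not yet trusted by the client machine.

kubectl vsphere login --server=192.168.100.2 --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify
kubectl config get-contexts
kubectl config use-context 192.168.100.2
kubectl get namespaces        # lists the vSphere namespaces visible to your user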

Create and Configure vSphere Namespaces


A vSphere Namespace is a tenancy boundary within vSphere with Tanzu and allows for sharing
vSphere resources (compute, networking, storage) and enforcing resource limits with the
underlying objects such as Tanzu Kubernetes clusters. It also allows you to attach policies and
permissions.

Every workload cluster that you deploy runs in a Supervisor namespace. To learn more about
namespaces, see the vSphere with Tanzu documentation

To create a new Supervisor namespace:

1. Log in to the vSphere Client.

2. Navigate to Home > Workload Management > Namespaces.

3. Click Create Namespace.

4. Select the Cluster that is enabled for Workload Management.

5. Enter a name for the namespace and select the workload network for the namespace.

Note

The Name field accepts only lower case letters and hyphens.

6. Click Create. The namespace is created on the Supervisor Cluster.


For additional product documentation, see Create and Configure a vSphere Namespace.

Configure Permissions for the Namespace


To access a namespace, you have to add permissions to the namespace. To configure permissions,
click on the newly created namespace, navigate to the Summary tab, and click Add Permissions.

Choose the Identity source, search for the User/Group that will have access to the namespace, and
define the Role for the selected User/Group.


Set Persistent Storage to the Namespace


Certain Kubernetes workloads require persistent storage to store data permanently. Storage policies
that you assign to the namespace control how persistent volumes and Tanzu Kubernetes cluster
nodes are placed within datastores in the vSphere storage environment.

To assign a storage policy to the namespace, on the Summary tab, click Add Storage.

From the list of storage policies, select the appropriate storage policy and click OK.


After the storage policy is assigned to a namespace, vSphere with Tanzu creates a matching
Kubernetes storage class in the vSphere Namespace.
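
As a quick check, you can log in to the Supervisor, switch to the namespace context, and confirm that
the assigned storage policy surfaces as a Kubernetes storage class. This is a sketch; the namespace
name is a placeholder.

kubectl config use-context <namespace-name>
kubectl get storageclass        # the assigned storage policy appears here as a storage class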

Specify Namespace Capacity Limits


When initially created, the namespace has unlimited resources within the Supervisor Cluster. The
vSphere administrator defines the limits for CPU, memory, storage, as well as the number of
Kubernetes objects that can run within the namespace. These limits are configured for each vSphere
Namespace.

To configure resource limitations for the namespace, on the Summary tab, click Edit Limits for
Capacity and Usage.


The storage limit determines the overall amount of storage that is available to the namespace.

Associate VM Class with Namespace


The VM class is a VM specification that can be used to request a set of resources for a VM. The VM
class defines parameters such as the number of virtual CPUs, memory capacity, and reservation
settings.

vSphere with Tanzu includes several default VM classes and each class has two editions: guaranteed
and best effort. A guaranteed edition fully reserves the resources that a VM specification requests. A
best-effort class edition does not and allows resources to be overcommitted.

More than one VM Class can be associated with a namespace. To learn more about VM classes, see
the vSphere with Tanzu documentation.

To add a VM class to a namespace,

1. Click Add VM Class for VM Service.


2. From the list of the VM Classes, select the classes that you want to include in your
namespace.


3. Click Ok.

The namespace is fully configured now. You are ready to register your supervisor cluster with Tanzu
Mission Control and deploy your first Tanzu Kubernetes Cluster.
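
Optionally, you can verify the namespace configuration from kubectl before moving on. The commands
below are a sketch; the namespace name is a placeholder, and the listings assume the VM Service and
Tanzu Kubernetes release CRDs present in recent vSphere with Tanzu releases.

kubectl config use-context <namespace-name>
kubectl get virtualmachineclassbindings      # VM classes associated with this namespace
kubectl get tanzukubernetesreleases          # Kubernetes versions available from the content library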

Register Supervisor Cluster with Tanzu Mission Control


Tanzu Mission Control is a centralized management platform for consistently operating and securing
your Kubernetes infrastructure and modern applications across multiple teams and clouds.

By integrating Supervisor Cluster with Tanzu Mission Control (TMC) you are provided a centralized
administrative interface that enables you to manage your global portfolio of Kubernetes clusters. It
also allows you to deploy Tanzu Kubernetes clusters directly from Tanzu Mission Control portal and
install user-managed packages leveraging the TMC Catalog feature.

Note

This section uses the terms Supervisor Cluster and management cluster
interchangeably.

Ensure that the following are configured before you integrate Tanzu Kubernetes Grid with Tanzu
Mission Control:

A cluster group in Tanzu Mission Control.

A workspace in the Tanzu Mission Control portal.

Policies that are appropriate for your Tanzu Kubernetes Grid deployment.

A provisioner. A provisioner helps you deploy Tanzu Kubernetes Grid clusters across
multiple/different platforms, such as AWS and VMware vSphere.

Do the following to register the Supervisor cluster with Tanzu Mission Control:

1. Log in to Tanzu Mission Control console.

2. Go to Administration > Management clusters > Register Management Cluster tab and
select vSphere with Tanzu.

3. On the Register management cluster page, provide a name for the management cluster, and
choose a cluster group.

You can optionally provide a description and labels for the management cluster.

4. If you are using a proxy to connect to the Internet, you can configure the proxy settings by
toggling the Set proxy option to yes.


5. On the Register page, Tanzu Mission Control generates a YAML file that defines how the
management cluster connects to Tanzu Mission Control for registration. The credential
provided in the YAML expires after 48 hours.

Copy the URL provided on the Register page. This URL is needed to install the TMC agent
on your management cluster and complete the registration process.

6. Log in to the vSphere Client, select the cluster that is enabled for Workload Management,
navigate to Workload Management > Supervisors > sfo01w01supervisor01 >
Configure > Tanzu Mission Control, enter the registration URL in the box provided, and
click Register.

When the Supervisor Cluster is registered with Tanzu Mission Control, the TMC agent is
installed in the svc-tmc-cXX namespace, which is included with the Supervisor Cluster by
default.

Once the TMC agent is installed on the Supervisor cluster and all pods are running in the
svc-tmc-cXX namespace, the registration status shows “Installation successful”.


7. Return to the Tanzu Mission Control console and click Verify Connection.

8. Clicking View Management Cluster takes you to the overview page which displays the health
of the cluster and its components.

After installing the agent, you can use the Tanzu Mission Control web interface to provision
and manage Tanzu Kubernetes clusters.

For additional product documentation, see Integrate the Tanzu Kubernetes Grid Service on the
Supervisor Cluster with Tanzu Mission Control.


Deploy Tanzu Kubernetes Clusters (Workload Cluster)


After Supervisor Cluster is registered with Tanzu Mission Control, deployment of the Tanzu
Kubernetes clusters can be done in just a few clicks. The procedure for creating Tanzu Kubernetes
clusters is shown below.

1. Go to the Clusters tab and click Create Cluster.

2. On the Create cluster page, select the Supervisor cluster that you registered in the previous
step and click on Continue to create cluster.

3. Select the provisioner for creating the workload cluster. Provisioner reflects the vSphere
namespaces that you have created and associated with the Supervisor cluster.


4. Enter a name for the cluster. Cluster names must be unique within an organization.

5. Select the cluster group to which you want to attach your cluster, select the cluster class, and
click Next. You can optionally enter a description and apply labels.

Note

You cannot change the cluster class after the workload cluster is created.

6. On the next page, you can optionally specify a proxy configuration to use for this cluster.


Note

This document doesn’t cover the use of a proxy for vSphere with Tanzu. If
your environment uses a proxy server to connect to the Internet, ensure the
proxy configuration object includes the CIDRs for the pod, ingress, and
egress from the workload network of the Supervisor Cluster in the No proxy
list, as explained in Create a Proxy Configuration Object for a Tanzu
Kubernetes Grid Service Cluster Running in vSphere with Tanzu.

7. Configure the cluster network settings.

On the configure page, leave the default service domain.

You can optionally define alternative CIDRs for pods and services. The Pod CIDR and
Service CIDR cannot be changed after the cluster is created.

8. Select the control plane.

Select the Kubernetes version to use for the cluster. The latest supported version is
preselected for you. You can choose the appropriate Kubernetes version by clicking
on the down arrow button.

Select OS version.

Select the High Availability mode for the control plane nodes of the workload cluster.
For a production deployment, it is recommended to deploy a highly available
workload cluster.

Select the default storage class for the cluster. The list of storage classes that you can
choose from is taken from your vSphere namespace.

You can optionally select a different instance type for the cluster’s control plane
node. Control plane endpoint and API server port options are not customizable here
as they will be retrieved from the management cluster.


9. Click Next.

10. Configure default volumes

Optionally add storage volumes for the control plane.

You can optionally add storage volumes for node pools.

11. Configure the node pool.

Specify the number of worker nodes to provision.

Select the instance type for worker nodes.

Optionally select the storage class.

Select the OS version for worker nodes.

Click Next.

12. Review any additional cluster configuration and click Create Cluster to start provisioning your
workload cluster.

Cluster creation takes approximately 15-20 minutes to complete. After the cluster deployment
completes, ensure that the Agent and extensions health shows green.


Integrate Tanzu Kubernetes clusters with Tanzu Observability


For instructions on enabling Tanzu Observability on your workload cluster, see Set up Tanzu
Observability to Monitor a Tanzu Kubernetes Cluster

Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh


For instructions on installing Tanzu Service Mesh on your workload cluster, see Onboard a Tanzu
Kubernetes Cluster to Tanzu Service Mesh.


Deploy User-Managed Packages on Tanzu Kubernetes Clusters


For instructions on how to install user-managed packages on the Tanzu Kubernetes clusters, see
Deploy User-Managed Packages in Workload Clusters.

Self-Service Namespace in vSphere with Tanzu


Typically, creating and configuring vSphere namespaces (permissions, limits, and so on) is a task performed
by a vSphere Administrator. However, this model doesn't allow flexibility in a DevOps model. Every time a
developer or cluster administrator needs a new namespace for deploying Kubernetes clusters, the task of
creating the namespace has to be completed by the vSphere Administrator. Only after permissions,
authentication, and so on are configured can the namespace be consumed.

A self-service namespace is a new feature that is available with vSphere 7.0 U2 and later versions
and allows users with DevOps persona to create and consume vSphere namespaces in a self-service
fashion.

Before a DevOps user can start creating a namespace, the vSphere Administrator must enable
Namespace Service on the supervisor cluster; this will build a template that will be used over and
over again whenever a developer requests a new Namespace.

The workflow for enabling Namespace service on the supervisor cluster is as follows:

1. Log in to the vSphere client and select the cluster configured for workload management.

2. Go to the Workload Management > Supervisors > sfo01w01supervisor01 > General >
Namespace Service page and enable the Namespace Service using the toggle button and
setting the status to Active.

3. Configure the quota for the CPU/Memory/Storage, storage policy for the namespace,
Network, VM Classes and Content Library.


4. On the permissions page, select the identity source (AD, LDAP, etc) where you have created
the users and groups for the Developer/Cluster Administrator. On selecting the identity
source, you can search for the user/groups in that identity source.

5. Review the settings and click Finish to complete the namespace service enable wizard.


The Namespace Self-Service is now activated and ready to be consumed.
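
Once the Namespace Service is active, a user or group granted permissions in the published template can
create a vSphere namespace directly with kubectl. The following is a sketch; the server address,
username, and namespace name are placeholders.

kubectl vsphere login --server=<control-plane-vip> --vsphere-username devuser@example.com
kubectl create namespace dev-team-01      # creates a vSphere namespace from the published template
kubectl get namespace dev-team-01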

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu on NSX Reference Design


vSphere with Tanzu transforms vSphere into a platform for running Kubernetes workloads natively
on the hypervisor layer. When vSphere with Tanzu is enabled on a vSphere cluster, you can run
Kubernetes workloads directly on ESXi hosts and create upstream Kubernetes clusters within
dedicated resource pools.

This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations
on vSphere with Tanzu enabled. This document does not cover any recommendations or
deployment steps for underlying software-defined data center (SDDC) environments.

The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.

Supported Component Matrix


The following table provides the component versions and interoperability matrix supported with this
reference design. For more information, see VMware Product Interoperability Matrix.

Software Components Version

Tanzu Kubernetes Release 1.23.8

VMware vSphere ESXi 8.0 Update 1a 21813344

VMware vCenter 8.0 Update 1a 21815093

VMware NSX 4.1.0.2

vSphere with Tanzu Components


Supervisor Cluster: When Workload Management is enabled on a vSphere cluster, it
creates a Kubernetes layer within the ESXi hosts that are part of the cluster. A cluster that is
enabled for Workload Management is called a Supervisor Cluster. Workloads are either run
as native pods or as pods on upstream Kubernetes clusters created through the Tanzu
Kubernetes Grid Service.

The Supervisor Cluster runs on top of an SDDC layer that consists of ESXi for compute, NSX
Data Center or vSphere networking, and vSAN or another shared storage solution.

vSphere Namespaces: A vSphere Namespace is a tenancy boundary within vSphere with
Tanzu. A vSphere Namespace allows sharing vSphere resources (compute, networking,
storage) and enforcing resource limits with the underlying objects such as Tanzu Kubernetes
clusters. For each namespace, you configure role-based access control (policies and
permissions), an image library, and virtual machine classes.

Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service allows you to create and
manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs to
enable the creation, configuration, and management of the Tanzu Kubernetes cluster.
vSphere 8.0 and above supports the ClusterClass API. The ClusterClass API is a collection
of templates that define a cluster topology and configuration.

Tanzu Kubernetes Cluster (Workload Cluster): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control, Tanzu Observability, and Tanzu Service Mesh,
which are part of Tanzu for Kubernetes Operations.

VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace, and by VMs
hosting a Tanzu Kubernetes cluster.

VM Classes in vSphere with Tanzu are categorized into the following two groups:

Guaranteed: This class fully reserves its configured resources.

Best-effort: This class allows resources to be overcommitted.

vSphere with Tanzu offers several default VM classes. You can either use the default VM
classes, or create customized VM classes based on the requirements of the application. The
following table explains the default VM classes that are available in vSphere with Tanzu:

Class CPU Memory(GB) Reserved CPU and Memory

best-effort-xsmall 2 2 No

best-effort-small 2 4 No

best-effort-medium 2 8 No

best-effort-large 4 16 No

best-effort-xlarge 4 32 No

best-effort-2xlarge 8 64 No

best-effort-4xlarge 16 128 No

best-effort-8xlarge 32 128 No

guaranteed-xsmall 2 2 Yes

guaranteed-small 2 4 Yes

guaranteed-medium 2 8 Yes

guaranteed-large 4 16 Yes

guaranteed-xlarge 4 32 Yes

guaranteed-2xlarge 8 64 Yes


guaranteed-4xlarge 16 128 Yes

guaranteed-8xlarge 32 128 Yes

Storage Classes in vSphere with Tanzu: A StorageClass allows the administrators to describe
the classes of storage that they offer. Different storage classes can map to quality-of-
service levels, backup policies, or arbitrary policies determined by the cluster
administrators. The policies representing datastores can manage storage placement of such
components and objects as control plane VMs, vSphere Pod ephemeral disks, and container
images. You might need policies for storage placement of persistent volumes and VM
content libraries.

You can deploy vSphere with Tanzu with an existing default storage class or the vSphere
administrator can define storage class objects (Storage policy) that let cluster users
dynamically create PVC and PV objects with different storage types and rules.

The following table provides recommendations for configuring VM Classes/Storage Classes in
a vSphere with Tanzu environment.

Decision ID: TKO-TKGS-001
Design Decision: Create custom Storage Classes/Profiles/Policies.
Design Justification: To provide different levels of QoS and SLA for prod and dev/test K8s workloads.
Design Implications: Default Storage Policy might not be adequate if deployed applications have different performance and availability requirements.

Decision ID: TKO-TKGS-002
Design Decision: Create custom VM Classes.
Design Justification: To facilitate deployment of K8s workloads with specific compute/storage requirements.
Design Implications: Default VM Classes in vSphere with Tanzu are not adequate to run a wide variety of K8s workloads.

vSphere Pods: vSphere with Tanzu introduces a new construct that is called vSphere Pod,
which is the equivalent of a Kubernetes pod. A vSphere Pod is a Kubernetes Pod that runs
directly on an ESXi host without requiring a Kubernetes cluster to be deployed. vSphere
Pods are designed to be used for common services that are shared between workload
clusters, such as a container registry.

A vSphere Pod is a VM with a small footprint that runs one or more Linux containers. Each
vSphere Pod is sized precisely for the workload that it accommodates and has explicit
resource reservations for that workload. It allocates the exact amount of storage, memory,
and CPU resources required for the workload to run. vSphere Pods are only supported with
Supervisor Clusters that are configured with NSX Data Center as the networking stack.

Identity and Access Management


vSphere with Tanzu supports the following two identity providers:

vCenter Single Sign-On: This is the default identity provider that is used to authenticate with
vSphere with Tanzu environment, including the Supervisors and Tanzu Kubernetes Grid
Clusters. vCenter Single Sign-On (SSO) provides authentication for vSphere infrastructure
and can integrate with AD/LDAP systems.


To authenticate using vCenter SSO, use the vSphere plug-in for kubectl. Once authenticated,
use kubectl to declaratively provision and manage the lifecycle of TKG clusters, and deploy TKG
cluster workloads.

External Identity Provider: You can configure a Supervisor with an external identity provider
and support the OpenID Connect protocol. Once connected, the Supervisor functions as an
OAuth 2.0 client, and uses the Pinniped authentication service to connect to Tanzu
Kubernetes Grid clusters by using the Tanzu CLI. Each Supervisor instance can support one
external identity provider. For more information about the list of supported OIDC providers,
see Configure an External IDP.

The Tanzu Kubernetes Grid (informally known as TKG) cluster permissions are set and scoped at the
vSphere Namespace level. When permissions are set for Namespace, including identity source,
users & groups, and roles, all these permissions apply to any TKG cluster deployed within that
vSphere Namespace.

Roles and Permissions


TKG clusters support the following three roles: Viewer, Editor, and Owner.

These permissions are assigned and scoped at vSphere Namespace.

Permission Description

Can view Read-only access to TKG clusters provisioned in that vSphere Namespace.

Can edit Create, read, update, and delete TKG clusters in that vSphere Namespace.

Owner Can administer TKG clusters in a vSphere Namespace, and can create and delete additional vSphere
Namespaces using kubectl.

vSphere with Tanzu Architecture


The Supervisor Cluster consists of the following components:

Supervisor control plane VM: Three Supervisor control plane VMs in total are created on
the hosts that are part of the Supervisor Cluster. The three control plane VMs are load
balanced as each one of them has its own IP address. Additionally, a floating IP address is
assigned to one of the VMs and a fifth IP address is reserved for patching purposes. vSphere
DRS determines the exact placement of the control plane VMs on the ESXi hosts that are part of the
cluster and migrates them when needed.

Tanzu Kubernetes Grid and Cluster API: Modules running on the Supervisor that enable the
provisioning and management of Tanzu Kubernetes Grid clusters.

Virtual Machine Service: A module that is responsible for deploying and running stand-
alone VMs, and VMs that make up the Tanzu Kubernetes Grid clusters.

The following diagram shows the general architecture of the Supervisor Cluster.


After a Supervisor Cluster is created, the vSphere administrator creates vSphere namespaces. When
initially created, vSphere namespaces have unlimited resources within the Supervisor Cluster. The
vSphere administrator defines the limits for CPU, memory, and storage, as well as the number of
Kubernetes objects such as deployments, replica sets, persistent volumes that can run within the
namespace. These limits are configured for each vSphere namespace.

For more information about the maximum supported number, see the vSphere with Tanzu
Configuration Maximums guide.

To provide tenants access to namespaces, the vSphere administrator assigns permission to users or
groups available within an identity source that is associated with vCenter SSO.

Once the permissions are assigned, the tenants can access the namespace to create Tanzu
Kubernetes clusters using the YAML file and Cluster API.

vSphere with Tanzu Storage


vSphere with Tanzu integrates with shared datastores available in the vSphere infrastructure. The
following types of shared datastores are supported:


vSAN

VMFS

NFS

vVols

vSphere with Tanzu uses storage policies to integrate with shared datastores. The policies represent
datastores and manage the storage placement of objects such as control plane VMs, container
images, and persistent storage volumes.

Before you enable vSphere with Tanzu, create storage policies to be used by the Supervisor Cluster
and namespaces. Depending on your vSphere storage environment, you can create several storage
policies to represent different classes of storage.

vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
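
For example, a stateful workload can request storage by referencing a storage class that maps to one
of these policies. The manifest below is a minimal sketch; the storage class name is an example and
must match a policy assigned to your vSphere namespace.

kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: vsan-default-storage-policy
EOF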

Networking for vSphere with Tanzu


You can enable vSphere with Tanzu in the following environments:

vSphere backed with NSX Data Center Networking.

vSphere backed with vSphere Distributed Switch (VDS) Networking and HAProxy to provide
load balancing capabilities.

vSphere backed with vSphere Distributed Switch (VDS) Networking and NSX Advanced Load
Balancer to provide load balancing capabilities.

Note

The scope of this document is limited to VMware NSX Data Center Networking.

NSX provides network connectivity to the objects inside the Supervisor and external networks.
Connectivity to the ESXi hosts within the cluster is backed by VLAN backed port groups.

The following diagram shows a general overview of vSphere with Tanzu on NSX Networking.


The Supervisor cluster configured with NSX networking uses either a distributed port group
(routable to required infrastructure components such as vCenter, NSX Manager, DNS, NTP, and so
on; for more information, see Firewall Recommendations) or an NSX segment to provide connectivity
to the Kubernetes control plane VMs. Tanzu Kubernetes clusters and vSphere Pods have their
networking provided by NSX segments. All hosts from the cluster, which are enabled for vSphere
with Tanzu, are connected to the distributed switch that provides connectivity to Kubernetes
workloads and control plane VMs.

The following section explains the networking components and services included in the Supervisor
cluster:

NSX Container Plugin (NCP) provides integration between NSX and Kubernetes. The main
component of NCP runs in a container and communicates with the NSX Manager and with
the Kubernetes control plane. NCP monitors changes to containers and other resources and
manages resources such as logical ports, segments, routers, and security groups for the
containers by calling the NSX API.

By default, NCP creates one shared tier-1 gateway for system namespaces, and a tier-1
gateway and load balancer for each namespace. The tier-1 gateway for namespace is
connected to the tier-0 gateway and a default segment.

System namespaces are the namespaces that are used by the core components that are
integral to functioning of the Supervisor and Tanzu Kubernetes Grid clusters. The shared
network resources that include the tier-1 gateway, load balancer, and SNAT IP are grouped
in a system namespace.

NSX Edge provides connectivity from external networks to the Supervisor resources. An
NSX edge cluster normally includes at least two Edge nodes and has a load balancer that
provides a redundancy to the Kube-API servers residing on control plane VMs and any
application that must be published and be accessible from outside the Supervisor cluster. For
more information, see Install and Configure NSX for vSphere with Tanzu.

A tier-0 gateway is associated with the NSX Edge cluster to provide routing to the external
network. The uplink interfaces use either the dynamic routing, BGP, or static routing.

Workloads running in vSphere Pods, regular VMs, or Tanzu Kubernetes clusters, that are in
the same namespace, share a same SNAT IP for North-South connectivity.

Workloads running in vSphere Pods or Tanzu Kubernetes clusters will have the same
isolation rule that is implemented by the default firewall.

A separate SNAT IP is not required for each Kubernetes namespace. East west connectivity
between namespaces does not require SNAT.

The segments for each namespace reside on the vSphere Distributed Switch (VDS)
functioning in Standard mode that is associated with the NSX Edge cluster. The segment
provides an overlay network to the Supervisor Cluster.

Each vSphere namespace has a separate network and set of networking resources shared by
applications inside the namespace, such as tier-1 gateway, load balancer service, and SNAT
IP address.

Workloads running in Tanzu Kubernetes Grid clusters have the same isolation rule that is
implemented by the default firewall.

NSX LB provides:

L4 load balancer service for Kube-API to the Supervisor cluster and workload
clusters.

L4 load balancer service for all services of type LoadBalancer deployed in workload
clusters.

Network Requirements
The following table lists the required networks for the reference design:

Note


Based on your business requirements, modify the subnet ranges to fit the projected
growth.

Network Type: Supervisor Management Network
Sample Recommendation: /28 to allow for 5 IPs and future expansion.
Description: Network to host the supervisor VMs. It can be a VLAN backed VDS port group or a pre-created NSX segment.

Network Type: Ingress IP range
Sample Recommendation: /24, 254 addresses
Description: Each service of type LoadBalancer deployed will consume 1 IP address.

Network Type: Egress IP range
Sample Recommendation: /27
Description: Each vSphere namespace consumes 1 IP address for the SNAT egress.

Network Type: Namespace/Pod network CIDR
Sample Recommendation: /20
Description: Allocates IP addresses to workloads attached to supervisor namespace segments. By default, it is used in /28 blocks by workload.

Network Type: Supervisor Service CIDR
Sample Recommendation: /24
Description: Network from which IPs for Kubernetes ClusterIP Services will be allocated.

Firewall Recommendations
To prepare the firewall, you need the following information:

1. Supervisor network (Tanzu Kubernetes Grid Management) CIDR

2. Tanzu Kubernetes Grid workload cluster CIDR

3. Ingress and Egress range

4. Client machine IP address

5. vCenter server IP address

6. NSX Manager IP address

7. VMware Harbor registry IP address

8. DNS server IP address(es)

9. NTP server IP address(es)

The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet or VLAN.

Source                Destination                  Protocol:Port        Description

vCenter               Supervisor Network           TCP:6443             Allows vCenter to manage the supervisor VMs.

vCenter               Supervisor Network           TCP:22               Allows platform administrators to connect to VMs through vCenter.

Supervisor Network    NSX Manager                  TCP:443              Allows supervisor to access NSX Manager to orchestrate networking.

Supervisor Network    vCenter                      TCP:6443, TCP:443    Allows supervisor to access vCenter to create VMs and storage volumes.

Supervisor Network    ESXi Hosts                   TCP:10250            Supervisor Cluster to Spherelet ESXi hosts.

ESXi Hosts            Supervisor IP Addresses      TCP:6443             Spherelet ESXi hosts to Supervisor Cluster.

Supervisor Network    DNS Servers                  TCP:53, UDP:53       DNS

Supervisor Network    NTP Servers                  UDP:123              NTP

Supervisor Network    Workload Network             TCP:6443             GCM and VMOperator need to communicate with the TKC apiserver.

Supervisor Network    *.tmc.cloud.vmware.com       TCP:443              TMC connectivity

Egress IP Range       DNS Servers                  TCP:53, UDP:53       DNS

Egress IP Range       NTP Servers                  UDP:123              NTP

Jumpbox               vCenter                      TCP:22, TCP:443      Management

Jumpbox               NSX                          TCP:443              Management

Jumpbox               Supervisor Network           TCP:22, TCP:6443     Management

Jumpbox               Workload Network

Jumpbox               Ingress IP pool              TCP:443, TCP:6443    Management

Jumpbox               Web proxy                    TCP:TBC              Settings depend on proxy

Jumpbox               Git Server                   TCP:443, TCP:22      Version control

Jumpbox               Ingress IP Range             TCP:443, TCP:6443    Management

Platform Admins       Jumpbox                      TCP:22               Management

Kubernetes users      Ingress IP Range             TCP:443, TCP:6443    Management. You can further restrict to individual IPs for cluster access.

Note

For Tanzu Mission Control (TMC), if the firewall does not allow wildcards, you must
whitelist all IP addresses of [account].tmc.cloud.vmware.com and extensions.aws-
usw2.tmc.cloud.vmware.com.

Network Segmentation
By default, when vSphere namespaces are created, distributed firewall rules are added to block all
access to VMs from sources outside the namespace, other than the Supervisor cluster. This ensures
that the VMs and the vSphere Pods by default are not able to communicate directly with the VMs or
the pods in another namespace.

The NSX distributed firewall applies only to ports on switches known to the ESXi host and does not
apply to router ports. This distinction is important because NSX load balancer virtual server interfaces
are considered router ports, as they only exist as a service within a Tier-1 Gateway, which means that
the ports are not known to the ESXi host. The router ports do not include any metadata or tags, which
means that the distributed firewall has no way to learn which namespace owns the virtual server.

To isolate traffic between separate namespaces, use one of the following options:

1. When creating the namespace, override the network settings to define dedicated IP blocks.
This enables distributed firewall rules to be added to drop/deny traffic from VMs in other
namespaces towards the ingress IP pool. This pattern requires that separate IP ranges are used
for ingress, egress, and namespace networks. The networks are expandable at any time, but
monitoring and managing IP capacity has an additional overhead.

2. Use gateway firewalls to restrict traffic coming in to each load balancer. The benefit is that no
additional IP management is required. The downside is that each namespace has its own
gateway with its own firewall rule table, which means that automation is significantly more
challenging to implement and manual management is very difficult. Also, the gateway
firewall has not had performance testing conducted against groups with dynamic
membership. This is an issue at scale, as a workload cluster can only be identified by the tags
applied to the segment it is attached to. This means in large environments there is potential
for a lot of firewall rebuilds during activities such as upgrades, which could lead to
performance issues.

Deployment options
With vSphere 8 and above, when you enable vSphere with Tanzu, you can configure either a one-
zone Supervisor mapped to one vSphere cluster or a three-zone Supervisor mapped to three vSphere
clusters. This reference architecture is based on a single-zone deployment of a Supervisor Cluster.

Single-Zone Deployment of Supervisor


A Supervisor deployed on a single vSphere cluster has three control plane VMs, which reside on the
ESXi hosts that are part of the cluster. A single zone is created for the Supervisor automatically, or you
can use a zone that is created in advance. In a single-zone deployment, cluster-level high availability is
maintained through vSphere HA, and you can scale the vSphere with Tanzu setup by adding physical
hosts to the vSphere cluster that maps to the Supervisor. You can run workloads through vSphere Pods,
Tanzu Kubernetes Grid clusters, and VMs when the Supervisor is enabled with the NSX networking
stack.

Three-Zone Deployment of Supervisor


Configure each vSphere cluster as an independent failure domain and map it to a vSphere zone.
In a three-zone deployment, all three vSphere clusters become one Supervisor and can provide:

Cluster-level high availability to the Supervisor, as each vSphere cluster is an independent failure domain.

Distribution of the nodes of Tanzu Kubernetes Grid clusters across all three vSphere zones, with
availability provided via vSphere HA at the cluster level.

Scaling of the Supervisor by adding hosts to each of the three vSphere clusters.

For more information, see VMware Tanzu for Kubernetes Operations using vSphere with Tanzu
Multi-AZ Reference Architecture on NSX Networking.

Installation Experience
vSphere with Tanzu deployment starts with deploying the Supervisor cluster (Enabling Workload
Management). The deployment is directly executed from the vCenter user interface (UI). The Get
Started page lists the pre-requisites for the deployment.


In the vCenter UI, select NSX as the networking stack.

This installation process takes you through the steps of deploying Supervisor cluster in your vSphere
environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission Control or
Kubectl utility to deploy the Tanzu Kubernetes Grid Clusters.

The following tables list recommendations for deploying the Supervisor Cluster:

Decision ID: TKO-TKGS-003
Design Decision: Create a Subscribed Content Library.
Design Justification: A Subscribed Content Library can automatically pull the latest OVAs used by the Tanzu Kubernetes Grid Service to build cluster nodes. Using a subscribed content library facilitates template management, as new versions can be pulled by initiating the library sync.
Design Implications: A Local Content Library would require manual upload of images, and is suitable for air-gapped or Internet-restricted environments.

Decision ID: TKO-TKGS-004
Design Decision: Deploy Supervisor cluster control plane nodes in large form factor.
Design Justification: The large form factor should suffice to integrate the Supervisor cluster with TMC.
Design Implications: Consumes more resources from infrastructure.

Decision ID: TKO-TKGS-005
Design Decision: Register the Supervisor cluster with Tanzu Mission Control.
Design Justification: Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters, and manages the life cycle of all Tanzu Kubernetes clusters centrally.
Design Implications: Needs outbound connectivity to the internet for TMC registration.

Note

In this scenario, the SaaS endpoints refer to Tanzu Mission Control, Tanzu Service
Mesh, and Tanzu Observability.

The following tables list recommendations for deploying Tanzu Kubernetes Clusters on the
Supervisor Cluster:

Decision ID: TKO-TKC-001
Design Decision: Deploy Tanzu Kubernetes clusters with prod plan and multiple worker nodes.
Design Justification: The prod plan provides high availability for the control plane.
Design Implications: Requires additional compute resources.

Decision ID: TKO-TKC-002
Design Decision: Use guaranteed VM class for Tanzu Kubernetes clusters.
Design Justification: Guarantees compute resources are always available for containerized workloads.
Design Implications: Could prevent automatic migration of nodes by DRS.

Decision ID: TKO-TKC-003
Design Decision: Implement RBAC for Tanzu Kubernetes clusters.
Design Justification: To avoid the usage of administrator credentials for managing the clusters.
Design Implications: External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually.

Decision ID: TKO-TKC-004
Design Decision: Deploy Tanzu Kubernetes clusters from Tanzu Mission Control.
Design Justification: Tanzu Mission Control provides life-cycle management for the Tanzu Kubernetes clusters and automatic integration with Tanzu Service Mesh and Tanzu Observability.
Design Implications: Only Antrea CNI is supported on workload clusters created from the TMC portal.

vSphere Namespaces
A vSphere Namespace provides the runtime environment for TKG clusters on Supervisor. To
provision a TKG cluster, you first configure a vSphere namespace with users, roles, permissions,
compute, storage, content library, and assign virtual machine classes. All these configurations are
inherited by TKG clusters deployed in that namespace.

When you create a vSphere namespace, a network segment is created which is derived from the
Namespace Network configured in Supervisor. While creating vSphere namespace, you have the
option to override cluster network settings. Choosing this option lets you customize the vSphere
namespace network by adding Ingress, Egress, and Namespace network CIDR (unique from the
Supervisor and from any other vSphere namespace).

The typical use case for overriding Supervisor network settings is to provision a TKG cluster with
routable pod networking.

Note

The Override supervisor network setting is only available if the Supervisor is
configured with NSX networking.

Recommendations for Using Namespace with Tanzu

Decision ID: TKO-TKGS-005
Design Decision: Create dedicated environment-specific namespaces.
Design Justification: Segregate prod/dev/test clusters by assigning them to dedicated namespaces.
Design Implications: Clusters created within a namespace share the same access policies, quotas, and network and storage resources.

Decision ID: TKO-TKGS-006
Design Decision: Register an external IDP with the Supervisor, or AD/LDAP with vCenter SSO.
Design Justification: Limits access to a namespace based on the role of users or groups.
Design Implications: External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually.

Decision ID: TKO-TKGS-007
Design Decision: Enable namespace self-service.
Design Justification: Enables DevOps users to create namespaces in a self-service manner.
Design Implications: The vSphere administrator must publish a namespace template to LDAP users or groups to enable them to create a namespace.

Decision ID: TKO-TKGS-008
Design Decision: Use guaranteed VM Class for production clusters.
Design Justification: CPU and memory limits configured on a vSphere Namespace have an impact on the TKG cluster if deployed using the guaranteed VM Class type.
Design Implications: Consumes more infrastructure resources and contention might occur.

Tanzu Kubernetes Grid Cluster APIs


Tanzu Kubernetes Grid provides the following two APIs for provisioning and managing the life cycle
of TKG 2 clusters:

API Version v1alpha3 for Tanzu Kubernetes clusters

API version v1beta1 for Clusters based on a ClusterClass

The v1alpha3 API lets you create conformant Kubernetes clusters of type TanzuKubernetesCluster.
This type of cluster is pre-configured with common defaults for quick provisioning, and can be
customized. The v1beta1 API lets you create conformant Kubernetes clusters based on the default
ClusterClass named tanzukubernetescluster and cluster type of Cluster.
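
The following is a minimal sketch of a v1beta1 Cluster based on the default ClusterClass. All names,
CIDRs, the Tanzu Kubernetes release string, and the VM and storage class values are examples; align
them with the releases and classes available in your vSphere namespace.

kubectl apply -f - <<EOF
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkc-01
  namespace: tkg-ns-01
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.96.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: cluster.local
  topology:
    class: tanzukubernetescluster
    version: v1.23.8---vmware.2-tkg.2-zshippable
    controlPlane:
      replicas: 3
    workers:
      machineDeployments:
        - class: node-pool
          name: node-pool-01
          replicas: 3
    variables:
      - name: vmClass
        value: guaranteed-small
      - name: storageClass
        value: <storage-class-name>
EOF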

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by the Tanzu Kubernetes Grid Service supports the following
Container Network Interface (CNI) options:

Antrea

Calico

The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.

When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea
CNI is automatically enabled in the cluster.

To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes clusters
with Calico.
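
As a sketch, the v1alpha3 API exposes the CNI choice under spec.settings.network; the example below
assumes placeholder names, VM and storage classes, and Tanzu Kubernetes release.

kubectl apply -f - <<EOF
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-calico
  namespace: tkg-ns-01
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: guaranteed-small
      storageClass: <storage-class-name>
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
      - name: workers
        replicas: 3
        vmClass: guaranteed-small
        storageClass: <storage-class-name>
  settings:
    network:
      cni:
        name: calico
EOF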

Each CNI is suitable for a different use case. The following table lists some common use cases for the
CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most appropriate
CNI for your Tanzu Kubernetes Grid implementation.

CNI: Antrea
Use Case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for
encapsulation. Optionally, encrypt node-to-node communication using IPSec packet encryption.
Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both
Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea.

CNI: Calico
Use Case: Calico is used in environments where factors like network performance, flexibility, and
power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol
instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer,
resulting in increased network performance for Kubernetes workloads.
Pros: Support for network policies, high network performance, SCTP support.
Cons: No multicast support.

Kubernetes Ingress Routing


vSphere with Tanzu does not provide a default ingress controller. You can use any Tanzu-supported
ingress controller. For example, Contour is an open-source controller for Kubernetes ingress routing.
Contour is part of a Tanzu package and can be installed on any Tanzu Kubernetes cluster. Deploying
Contour is a prerequisite for deploying Prometheus, Grafana, and Harbor on a workload cluster.

For more information about Contour, see Contour and Ingress Using Contour.

Tanzu Service Mesh also offers an Ingress controller based on Istio.

Each ingress controller has advantages and disadvantages of its own. The following table provides
general recommendations on when you should use a specific ingress controller for your Kubernetes
environment:

| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. Contour is a reliable solution for simple Kubernetes workloads. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic), and between the cluster and the outside world (north-south traffic). |
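
For illustration, a minimal Kubernetes Ingress object served by Contour might look like the following. The host name, Service name, and port are hypothetical, and the manifest assumes the Contour package is already installed on the workload cluster:

# Hypothetical Ingress routed through the Contour ingress controller
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
spec:
  ingressClassName: contour          # assumes Contour is the installed ingress class
  rules:
    - host: web.example.com          # placeholder FQDN
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web            # placeholder Service name
                port:
                  number: 80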

Container Registry
vSphere with Tanzu includes Harbor as a container registry. Harbor provides a location for pushing,
pulling, storing, and scanning container images used in your Kubernetes clusters.

The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. The Harbor registry is used for day-2 operations of the
Tanzu Kubernetes workload clusters. Typical day-2 operations include tasks such as pulling images
from Harbor for application deployment, and pushing custom images to Harbor.

When vSphere with Tanzu is deployed on NSX networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.

You can use one of the following methods to install Harbor:

Tanzu Kubernetes Grid package deployment: VMware recommends this installation
method for general use cases. The Tanzu packages, including Harbor, must either be pulled
directly from VMware or be hosted in an internal registry.

VM-based deployment using OVA: VMware recommends using this installation method in
cases where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-less deployments. Do not use
this method for hosting application images.

When deploying Harbor with self-signed certificates or certificates signed by internal CAs, the Tanzu
Kubernetes cluster must be configured to trust the registry's certificate. To do so, follow the procedure
in Integrate TKG 2 cluster with container registry.

Scale a Tanzu Kubernetes Grid Cluster


You can scale a Tanzu Kubernetes Grid cluster on Supervisor horizontally by changing the number
of nodes, or vertically by changing the virtual machine class hosting the nodes.

The following table lists the supported scaling operations for TKG cluster:

Node Horizontal Scale Out Horizontal Scale In Vertical Scale Volume Scale

Control Plane Yes Yes Yes No

Worker Yes Yes Yes Yes

Note

- The number of control plane nodes must be odd, either 3 or 5.
- You can change the worker node volumes after provisioning. However, you cannot change the
control plane node volumes.
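
As a rough sketch, horizontal scaling is typically done by editing the replica counts in the saved cluster manifest and re-applying it. The cluster name, namespace, and file name below are placeholders, and the verification commands assume the Cluster API resources are visible in the vSphere Namespace:

# Horizontal scale: raise spec.topology.workers.machineDeployments[].replicas in the
# v1beta1 Cluster manifest (for example from 1 to 3), then re-apply it
kubectl apply -f tkc-prod-cluster-1.yaml -n prod

# Watch the additional worker machines being created in the vSphere Namespace
kubectl get machinedeployments -n prod
kubectl get machines -n prod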

Backup And Restore


There are two options for backing up and restoring stateless and stateful applications
running on TKG Clusters on Supervisor:

| Tool | Comments |
| --- | --- |
| Velero plug-in for vSphere | Both Kubernetes metadata and persistent volumes can be backed up and restored. Velero snapshotting is used for persistent volumes with stateful applications. Requires the Velero plug-in for vSphere to be installed and configured on the Supervisor. |
| Standalone Velero and Restic | Both Kubernetes metadata and persistent volumes can be backed up and restored. Restic is used for persistent volumes with stateful applications. Use this approach if you require portability. |

To back up and restore workloads running on a TKG cluster, create a datastore and install Velero with
Restic on the Kubernetes cluster. For more information, see Install and Configure Standalone Velero and
Restic.
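
As a rough sketch, a standalone Velero installation with Restic against an S3-compatible object store looks like the following. The bucket name, region, S3 URL, credentials file, and plug-in version are placeholders, and the exact flags vary with the Velero version (newer releases replace --use-restic with --use-node-agent):

# Standalone Velero with Restic against an S3-compatible datastore (placeholders throughout)
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket tkg-backups \
  --secret-file ./credentials-velero \
  --use-restic \
  --backup-location-config region=us-west-2,s3ForcePathStyle="true",s3Url=http://minio.example.com:9000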

vSphere with Tanzu SaaS Integration


The SaaS products in the VMware Tanzu portfolio are on the critical path for securing systems at the
heart of your IT infrastructure. VMware Tanzu Mission Control provides a centralized control plane for
Kubernetes, and Tanzu Service Mesh provides a global control plane for service mesh networks.
Tanzu Observability features include Kubernetes monitoring, application observability, and service
insights.

To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.

Custom Tanzu Observability Dashboards


Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards
for your particular deployment. For information on how to customize Tanzu Observability dashboards
for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for
Kubernetes Operations.

Summary
vSphere with Tanzu on hyper-converged hardware offers high-performance potential, convenience,
and addresses the challenges of creating, testing, and updating on-premises Kubernetes platforms in
a consolidated production environment. This validated approach will result in a production installation
with all the application services needed to serve combined or uniquely separated workload types
through a combined infrastructure solution.

This plan meets many Day 0 needs for quickly aligning product capabilities to full-stack
infrastructure, including networking, configuring firewall rules, load balancing, workload compute
alignment, and other capabilities.

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Multi-AZ Reference Architecture on VDS Networking

vSphere with Tanzu transforms the vSphere cluster into a platform for running Kubernetes workloads
in dedicated resource pools. When vSphere with Tanzu is enabled on three vSphere zones, it
creates a Kubernetes control plane directly in the hypervisor layer. You can then run the Kubernetes
containers by creating an upstream highly-available Kubernetes cluster through VMware Tanzu
Kubernetes Grid Service (informally known as TKGS), and run your applications inside these clusters.

This document provides a reference design for deploying a zonal supervisor with NSX Advanced
Load Balancer on VDS Networking.

For more information about Non-Zonal deployment, see VMware Tanzu for Kubernetes Operations
using vSphere with Tanzu Reference Design.

Supported Component Matrix


The following table provides the component versions and interoperability matrix supported with
reference design:

Software Components Version

Tanzu Kubernetes Release 1.23.8

VMware vSphere ESXi 8.0 Update 1a 21813344

VMware vCenter ( VCSA ) 8.0 Update 1a 21815093

NSX Advanced Load Balancer 22.1.2

For more information about the latest versions, see VMware Product Interoperability Matrix.

vSphere with Tanzu Components


Supervisor cluster: When Workload Management is enabled on a vSphere cluster, it creates
a Kubernetes layer within the ESXi hosts that are part of the cluster. A cluster that is enabled
for Workload Management is called a Supervisor cluster. You can containerize workloads by
creating upstream Kubernetes clusters on the top of Supervisor cluster through the Tanzu
Kubernetes Grid Service.

The Supervisor cluster runs on top of a software-defined data center (SDDC) layer that
consists of three vSphere clusters for compute, NSX for networking, and shared storage such
as vSAN.

You can deploy a Supervisor on three vSphere Zones to provide cluster-level high-
availability that protects your Kubernetes workloads against cluster-level failure. A vSphere
Zone maps to one vSphere cluster that you can set up as an independent cluster failure
domain. In a three-zone deployment, all three vSphere clusters become one Supervisor.

vSphere Namespaces: A vSphere Namespace is a tenancy boundary within vSphere with


Tanzu. A vSphere Namespace allows sharing vSphere resources (compute, networking, and
storage), and enforcing resource limits within the underlying objects such as Tanzu
Kubernetes clusters. For each namespace, you configure role-based-access-control (policies
and permissions), image library, and virtual machine classes.

In a Supervisor activated on three vSphere Zones, a namespace resource pool is created on


each vSphere cluster that is mapped to a zone. The vSphere Namespace spreads across all
three vSphere clusters part of the vSphere Zones. The resources utilized on such
namespace are taken from all three underlying vSphere clusters in equal parts.


Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create
and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs for
creating, configuring, and managing the Tanzu Kubernetes Cluster. vSphere 8.0 and above
supports the ClusterClass API. ClusterClass is a collection of templates that define a cluster
topology and configuration.

Tanzu Kubernetes Cluster ( Workload Cluster ): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control (TMC), Tanzu Observability, and Tanzu Service
Mesh, which are part of Tanzu for Kubernetes Operations.

VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace, and by VMs
hosting a Tanzu Kubernetes cluster. VM Classes in a vSphere with Tanzu are broadly
categorized into the following two groups:

Guaranteed: This class fully reserves its configured resources.

Best-effort: This class allows its configured resources to be overcommitted.

vSphere with Tanzu offers several default VM classes. You can either use the default VM
classes, or create custom VM classes based on your application's requirements. The
following table lists the default VM classes that are available in vSphere with Tanzu.

Default VM Classes:

Class CPU Memory(GB) Reserved CPU and Memory

best-effort-xsmall 2 2 No

best-effort-small 2 4 No

best-effort-medium 2 8 No

best-effort-large 4 16 No

best-effort-xlarge 4 32 No

best-effort-2xlarge 8 64 No

best-effort-4xlarge 16 128 No

best-effort-8xlarge 32 128 No

guaranteed-xsmall 2 2 Yes

guaranteed-small 2 4 Yes

guaranteed-medium 2 8 Yes

guaranteed-large 4 16 Yes

guaranteed-xlarge 4 32 Yes

guaranteed-2xlarge 8 64 Yes

guaranteed-4xlarge 16 128 Yes

guaranteed-8xlarge 32 128 Yes

Note

If the default VM Classes are not meeting application compute and storage
requirements, you can create custom VM Classes.

Storage Classes in vSphere with Tanzu: A StorageClass allows administrators to describe
the classes of storage that they offer. Different classes can map to quality-of-service
levels, to backup policies, or to arbitrary policies determined by the cluster administrators.
The policies represent datastores and manage storage placement of components and
objects such as control plane VMs, vSphere Pod ephemeral disks, and container images. You
might need policies for storage placement of persistent volumes and VM content libraries.

A three-zone Supervisor supports zonal storage, where a datastore is shared across all hosts in a
single zone. Storage policies that you create for a Supervisor or for a namespace in a three-zone
Supervisor must be topology aware and have the consumption domain enabled. For more
information, see Create Storage Policy for a Three-Zone Supervisor.
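
As an illustration, once a topology-aware storage policy is assigned to a vSphere Namespace it surfaces as a storage class, and workloads consume it through an ordinary PersistentVolumeClaim. The claim name and storage class name below are placeholders:

# PersistentVolumeClaim against a zonal, topology-aware storage class (placeholder names)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: zonal-storage-policy   # placeholder storage class name
  resources:
    requests:
      storage: 5Gi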

When you prepare storage resources for three-zone Supervisor, consider the following parameters:

Storage in all three vSphere zones does not need to be of the same type. However, having
uniform storage in all three clusters provides a consistent performance.

Create a storage policy that is compliant with shared storage in each of the clusters. The
storage policy must be topology aware.

A three-zone Supervisor does not support the following options:

Cross-zonal volumes

vSAN File volumes (ReadWriteMany Volumes)

Static volumes provisioning using Register Volume API

Workloads that use VSAN Data Persistence platform

vSphere POD

VSAN Stretched clusters

VMs with vGPU and instance storage

The following table provides the recommendations for configuring Storage Classes in a
vSphere with Tanzu environment:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Create custom Storage Classes/Profiles/Policies. | To provide different levels of QoS and SLA for prod and dev/test Kubernetes workloads. | Storage Policies must include one datastore from each vSphere Zone. |


Identity and Access Management


vSphere with Tanzu provides identity and access management to the Supervisor and vSphere
Namespaces using the following methods:

vCenter Single Sign-On (SSO): This is the default identity provider that is used to
authenticate in the vSphere with Tanzu environment, including Supervisors and Tanzu
Kubernetes Grid Clusters. The vCenter SSO provides authentication for vSphere
infrastructure, and can integrate with AD/LDAP systems.

External Identity Provider: You can configure a Supervisor with an external identity provider
and support the OpenID Connect protocol. Once connected, Supervisor functions as an
OAuth 2.0 client and uses the Pinniped authentication service to connect to Tanzu
Kubernetes Grid clusters by using the Tanzu CLI. Each Supervisor instance can support one
external identity provider. For more information about the list of supported OIDC providers,
see Configure an External IDP.

vSphere with Tanzu Architecture for a Multi-Zone Deployment
The following diagram shows the high-level architecture of vSphere with Tanzu on three vSphere
Clusters.

The Supervisor cluster consists of the following components:

Supervisor Control Plane VM: A total of three Supervisor control plane VMs are created and
spread evenly across the three vSphere zones. The three Supervisor control plane VMs are
load balanced, as each one of them has its own IP address. Additionally, a floating IP address
is assigned to one of the VMs, and a fifth IP address is reserved for patching purposes.
vSphere DRS determines the exact placement of the control plane VMs on the ESXi hosts
that are part of the zones, and migrates them when needed.

Tanzu Kubernetes Grid Service and ClusterClass modules: These modules run on the Supervisor
and enable the provisioning and management of Tanzu Kubernetes Grid clusters. ClusterClass is
a new feature introduced as part of the Cluster API that reduces the need for redundant
templating and enables powerful customization of clusters.

Virtual Machine Service: A module that is responsible for deploying and running stand-
alone VMs, and the VMs that make up the Tanzu Kubernetes Grid clusters.

After a Supervisor cluster is created, the vSphere administrator creates vSphere namespaces. When
initially created, a namespace has unlimited resources within the Supervisor cluster. The vSphere
administrator defines the limits for CPU, memory, and storage. The administrator also limits the
number of Kubernetes objects, such as deployments, replica sets, persistent volumes, and so on, that
can run within the boundary of the namespace. These limits are configured for each vSphere
namespace.

For more information about the maximum supported numbers, see Configuration Maximums.

Recommendations for Using Namespaces in vSphere with Tanzu

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-003 | Create dedicated namespaces per environment. | Segregate prod/dev/test clusters by assigning them to dedicated namespaces. | Clusters created within a namespace share the same access policies, quotas, and network and storage resources. |
| TKO-TKGS-004 | Register an external IDP with the Supervisor, or AD/LDAP with vCenter SSO. | Limit access to a namespace based on the role of users/groups. | NA |
| TKO-TKGS-005 | Enable namespace self-service. | Enables DevOps users to create namespaces in a self-service manner. | The vSphere administrator must publish a namespace template to LDAP users/groups, enabling them to create a namespace. |

vSphere with Tanzu Storage


vSphere with Tanzu integrates with shared datastores available with vSphere infrastructure. The
following shared datastores are supported:

vSAN (including vSAN ESA)

VMFS

NFS

vVols

vSphere with Tanzu uses storage policies to integrate with backend shared datastore. The policies
represent datastore and manage storage placement of the control VMs.

vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.

Note

Dynamic file persistent volumes (ReadWriteMany access mode), encryption, and
WaitForFirstConsumer mode are not supported.

Networking for vSphere with Tanzu


vSphere with Tanzu can be enabled in the following environments:

vSphere backed with NSX networking.

vSphere backed with virtual Distributed Switch (VDS) Networking and HA proxy to provide
Load Balancing capabilities.

vSphere backed with virtual Distributed Switch (VDS) Networking and NSX Advanced Load
Balancer to provide Load Balancing capabilities.

Note

The scope of this document is limited to vSphere backed with VDS networking with
NSX Advanced Load Balancer.

vSphere with Tanzu on VDS Networking with NSX Advanced Load Balancer
In a vSphere with Tanzu environment, a Zonal supervisor cluster configured with vSphere
networking uses distributed port groups to provide connectivity to Kubernetes control plane VMs,
services, and workloads. All 3 vSphere clusters, which are enabled for vSphere with Tanzu, are
connected to the same distributed switch that provides connectivity to Kubernetes workloads and
control plane VMs.

The following diagram shows a general overview of vSphere with Tanzu on VDS networking.


NSX Advanced Load Balancer Components


NSX Advanced Load Balancer is deployed in write access mode in a vSphere environment. This
mode grants NSX Advanced Load Balancer Controllers full write access to the vCenter, which helps
in automatically creating, modifying, and removing service engines (SEs) and other resources as
needed to adapt to changing traffic needs. The following are the core components of NSX Advanced
Load Balancer:

NSX Advanced Load Balancer Controller: NSX Advanced Load Balancer Controller
manages Virtual Service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the SEs. It is the central repository for the configurations and policies
related to services and management, and provides the portal for viewing the health of
VirtualServices and SEs, and the associated analytics that NSX Advanced Load Balancer
provides. The following table provides recommendations for configuring NSX Advanced Load
Balancer in a vSphere with Tanzu environment:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-NSXALB-001 | Deploy three NSX ALB controller nodes, one in each vSphere Zone, and form a cluster. | To achieve high availability of the NSX ALB control plane when one of the vSphere clusters goes down. | The NSX ALB network should be L2 stretched across the three vSphere Zones. |
| TKO-NSXALB-002 | Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB. | To isolate NSX ALB traffic from infrastructure management traffic and Kubernetes workloads. | An additional VLAN is required. |
| TKO-NSXALB-003 | Use static IP addresses for the NSX ALB controllers. | The NSX Advanced Load Balancer Controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes would be disruptive. | NA |
| TKO-NSXALB-004 | Use NSX ALB IPAM for the SE data network and virtual services. | Guarantees IP address assignment for Service Engines and Virtual Services. | NA |
| TKO-NSXALB-005 | Reserve an IP address in the NSX Advanced Load Balancer management subnet to be used as the Cluster IP for the Controller cluster. | The NSX Advanced Load Balancer portal is always accessible over the Cluster IP regardless of a specific individual controller node failure. | NA |
| TKO-NSXALB-006 | Configure backups for the NSX ALB Controller cluster. | Backups are required if the NSX ALB Controller becomes inoperable or if the environment needs to be restored from a previous state. | To store backups, an SCP-capable backup location is needed. SCP is the only supported protocol currently. |
| TKO-NSXALB-007 | Initial setup should be done only on one NSX ALB controller VM out of the three deployed, to create the NSX ALB controller cluster. | The NSX ALB controller cluster is created from an initialized NSX ALB controller, which becomes the cluster leader. Follower NSX ALB controller nodes need to be uninitialized to join the cluster. | NSX ALB controller cluster creation fails if more than one NSX ALB controller is initialized. |
| TKO-NSXALB-008 | Configure remote logging for the NSX ALB Controller to send events to syslog. | For operations teams to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB Controller. | Additional operational overhead. Additional infrastructure resources. |
| TKO-NSXALB-009 | Use LDAP/SAML based authentication for NSX ALB. | Helps to maintain role-based access control. | Additional configuration is required. SAML-based authentication requires an NSX ALB Enterprise license. |

NSX Advanced Load Balancer Service Engine: NSX Advanced Load Balancer Service
Engines (SEs) are lightweight VMs that handle all data plane operations by receiving and
executing instructions from the controller. The SEs perform load balancing and all client and
server-facing network interactions. The following table provides recommendations for NSX
Advanced Load Balancer Service Engines deployment:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-ALB-SE-001 | Set NSX ALB Service Engine High Availability to Active/Active. | Provides higher resiliency, optimum performance, and utilization compared to N+M and/or Active/Standby. | Requires NSX ALB Enterprise licensing. Only the Active/Standby mode is supported with the NSX ALB Essentials for Tanzu license. Certain applications might not work in the Active/Active mode. For example, applications that preserve the client IP use the Legacy Active/Standby HA mode. |
| TKO-ALB-SE-002 | Enable ALB Service Engine self-election. | Enables SEs to elect a primary amongst themselves in the absence of connectivity to the NSX ALB controller. | Requires NSX ALB Enterprise licensing. This feature is not supported with the NSX ALB Essentials for Tanzu license. |
| TKO-ALB-SE-003 | Enable 'Dedicated dispatcher CPU' on Service Engine Groups that contain Service Engine VMs of 4 or more vCPUs. Note: This setting should be enabled on SE Groups that are servicing applications and have high network requirements. | This enables a dedicated core for packet processing, enabling a high packet pipeline on the Service Engine VMs. Note: By default, the packet processing core also processes load-balancing flows. | Consumes more resources from the infrastructure. |
| TKO-ALB-SE-004 | Set 'Placement across the Service Engines' to Compact. | This allows maximum utilization of capacity (Service Engines). | NA |
| TKO-ALB-SE-005 | Set the SE size to a minimum of 2 vCPUs and 4 GB of memory. | This configuration should meet the most generic use cases. | For services that require higher throughput, these configurations need to be investigated and modified accordingly. |
| TKO-ALB-SE-006 | Reserve memory and CPU for Service Engines. | The Service Engines are a critical infrastructure component providing load-balancing services to mission-critical applications. This guarantees the CPU and memory allocation for the SE VMs and avoids performance degradation in case of resource contention. | You must perform additional configuration to set up the reservations. |

Avi Kubernetes Operator (AKO): Avi Kubernetes Operator is a Kubernetes operator that
runs as a pod in the Supervisor cluster. It provides ingress and load balancing functionality.
Avi Kubernetes Operator translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses/routes/services on the
Service Engines (SE) via the NSX Advanced Load Balancer Controller.
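
For example, when a developer creates an ordinary Service of type LoadBalancer such as the following sketch (the names and ports are hypothetical), AKO requests a virtual service from the NSX Advanced Load Balancer Controller and the external IP address is allocated from the VIP network:

# A Service of type LoadBalancer that AKO realizes as an NSX ALB virtual service
apiVersion: v1
kind: Service
metadata:
  name: web-lb              # placeholder Service name
spec:
  type: LoadBalancer
  selector:
    app: web                # placeholder pod label
  ports:
    - port: 80
      targetPort: 8080
      protocol: TCP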

Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer Service
Engine settings. Each cloud is configured with one or more VIP networks to provide IP addresses to
L4 load balancing virtual services created under that cloud.

The virtual services can be spanned across multiple Service Engines if the associated Service Engine
Group is configured in Active/Active HA mode. A Service Engine can belong to only one Service
Engine group at a time.

IP address allocation for virtual services can be over DHCP or via NSX Advanced Load Balancer in-
built IPAM functionality. The VIP networks created/configured in NSX Advanced Load Balancer are
associated with the IPAM profile.

Network Architecture
To deploy vSphere with Tanzu, build separate networks for Supervisor clusters, Tanzu Kubernetes
Grid Workload clusters, NSX Advanced Load Balancer, and the Tanzu Kubernetes Grid control plane
HA.

Note

The network/port group designated for the workload cluster carries both data and
control traffic. Firewalls cannot be used to segregate traffic between workload
clusters; instead, the underlying CNI must be employed as the main filtering system.
Antrea CNI provides Custom Resource Definitions (CRDs) for firewall rules that can be
enforced before Kubernetes network policies are added (see the sketch after this note).

Based on your requirements, you can create additional networks for your workload
clusters. These networks are also referred to as vSphere with Tanzu workload
secondary networks.
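
The following is a minimal sketch of such an Antrea-native policy. The API version depends on the Antrea release bundled with your Tanzu Kubernetes release, and the namespace labels are hypothetical:

# Antrea ClusterNetworkPolicy dropping traffic from dev namespaces into prod namespaces (sketch)
apiVersion: crd.antrea.io/v1alpha1      # v1beta1 in newer Antrea releases
kind: ClusterNetworkPolicy
metadata:
  name: drop-dev-to-prod
spec:
  priority: 5
  tier: securityops
  appliedTo:
    - namespaceSelector:
        matchLabels:
          env: prod                     # hypothetical namespace label
  ingress:
    - action: Drop
      from:
        - namespaceSelector:
            matchLabels:
              env: dev                  # hypothetical namespace label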

This topology enables the following benefits:

Isolate and separate SDDC management components (vCenter, ESX) from the vSphere with
Tanzu components. This reference design allows only the minimum connectivity between
the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the vCenter
Server.

Isolate and separate the NSX Advanced Load Balancer management network from the
supervisor cluster network and the Tanzu Kubernetes Grid workload networks.

Separate vSphere Admin and Tenant access to the supervisor cluster. This prevents tenants
from attempting to connect to the supervisor cluster.

Allow tenants to access only their own workload cluster(s) and restrict access to this cluster
from other tenants. This separation can be achieved by assigning permissions to the
supervisor namespaces.

Depending on the workload cluster type and use case, multiple workload clusters may
leverage the same workload network or new networks can be used for each workload
cluster.

The following table provides recommendations for networks:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-NET-001 | Use separate networks for the Supervisor cluster and workload clusters. | To have flexible firewall and security policies. | Sharing the same network for multiple clusters can complicate the creation of firewall rules. |
| TKO-NET-002 | Use distinct port groups for network separation of Kubernetes workloads. | Isolate production Kubernetes clusters from dev/test clusters by placing them on distinct port groups. | Network mapping is done at the namespace level. All Kubernetes clusters created in a namespace connect to the same port group. |
| TKO-NET-003 | Use routable networks for Tanzu Kubernetes clusters. | Allow connectivity between the TKG clusters and infrastructure components. | Networks that are used for Tanzu Kubernetes cluster traffic must be routable between each other and the Supervisor Cluster Management Network. |

Networking Prerequisites
All ESXi hosts that are part of the three vSphere clusters share a common VDS with at least one
uplink. We recommend having two uplinks. The VDS must be version 8 or above.

Networking Requirements


As per the reference architecture, the following table lists the network requirements:

| Network Type | DHCP Service | Description |
| --- | --- | --- |
| NSX ALB Management Network | Optional | NSX ALB Controllers and SEs are attached to this network. |
| TKG Supervisor Network | Optional | Supervisor cluster nodes are attached to this network. |
| TKG Workload Network | Optional | Control plane and worker nodes of the TKG workload clusters are attached to this network. The second interface of the Supervisor nodes is also attached to this network. |
| TKG Cluster VIP | No | Virtual services (L4) for control plane HA of the Supervisor and workload clusters. Reserve sufficient IP addresses depending on the number of TKG clusters planned to be deployed in the environment. |

Note

All the above networks should be L2 stretched across three vSphere clusters.

Subnet and CIDR Examples


The following table provides sample entries for the required port groups. Create network entries with
the port group name, VLAN ID, and CIDRs that are specific to your environment:

| Network Type | Port Group Name | VLAN | Gateway CIDR | DHCP Enabled | IP Pool for SE/VIP in NSX ALB |
| --- | --- | --- | --- | --- | --- |
| NSX ALB Management Network | sfo01-w01-albmanagement | 1610 | 172.16.10.1/27 | Optional | 172.16.10.6 - 172.16.10.30 |
| TKG Supervisor Network | sfo01-w01-tkgmanagement | 1640 | 172.16.40.1/27 | Optional | No |
| TKG Workload Network | sfo01-w01-tkgworkload | 1660 | 172.16.60.1/24 | Optional | No |
| TKG Cluster VIP | sfo01-w01-tkgclustervip | 1680 | 172.16.80.1/26 | No | 172.16.80.2 - 172.16.80.62 |

Note

NSX ALB components can be deployed to the Supervisor Network. However, we
recommend that you use separate networks for both.

Firewall Requirements
To prepare the firewall, gather the following information:


1. NSX Advanced Load Balancer Controller nodes and VIP addresses

2. NSX Advanced Load Balancer Service Engine management IP address

3. Supervisor Cluster network (Tanzu Kubernetes Grid Management) CIDR

4. Tanzu Kubernetes Grid workload cluster CIDR

5. Tanzu Kubernetes Grid cluster VIP address range

6. Client machine IP address

7. vCenter server IP address

8. VMware Harbor registry IP address

9. DNS server IP address(es)

10. NTP server IP address(es)

The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN.

| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| Client Machine | NSX Advanced Load Balancer Controller nodes and VIP | TCP:443 | Access the NSX Advanced Load Balancer portal for configuration. |
| Client Machine | vCenter Server | TCP:443 | Access and configure WCP in vCenter. |
| Client Machine | TKG Cluster VIP Range | TCP:6443, TCP:443, TCP:80 | TKG cluster access, access the HTTPS workload, and access the HTTP workload. |
| Client Machine (optional) | *.tmc.cloud.vmware.com, console.cloud.vmware.com | TCP:443 | Access the Tanzu Mission Control portal, and so on. |
| TKG Management and Workload Cluster CIDR | DNS Server, NTP Server | TCP/UDP:53, UDP:123 | DNS service and time synchronization. |
| TKG Management Cluster CIDR | vCenter IP | TCP:443 | Allows components to access vCenter to create VMs and storage volumes. |
| TKG Management and Workload Cluster CIDR | NSX Advanced Load Balancer controller nodes | TCP:443 | Allows Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX Advanced Load Balancer Controller. |
| TKG Management and Workload Cluster CIDR | TKG Cluster VIP Range | TCP:6443 | Allows the Supervisor cluster to configure workload clusters. |
| TKG Management and Workload Cluster CIDR | Image Registry (Harbor), if private | TCP:443 | Allows components to retrieve container images. |
| TKG Management and Workload Cluster CIDR | wp-content.vmware.com, *.tmc.cloud.vmware.com, projects.registry.vmware.com | TCP:443 | Syncs content library, pulls TKG binaries, and interacts with TMC. |
| TKG Management Cluster CIDR | TKG Workload Cluster CIDR | TCP:6443 | VM Operator and TKC VM communication. |
| TKG Workload Cluster CIDR | TKG Management Cluster CIDR | TCP:6443 | Allows the TKG workload cluster to register with the Supervisor cluster. |
| NSX Advanced Load Balancer Management Network | vCenter and ESXi Hosts | TCP:443 | Allows NSX Advanced Load Balancer to discover vCenter objects and deploy SEs as required. |
| NSX Advanced Load Balancer Controller Nodes | DNS Server, NTP Server | TCP/UDP:53, UDP:123 | DNS service and time synchronization. |
| TKG Cluster VIP Range | TKG Management Cluster CIDR | TCP:6443 | To interact with the Supervisor cluster. |
| TKG Cluster VIP Range | TKG Workload Cluster CIDR | TCP:6443, TCP:443, TCP:80 | To interact with workload clusters and Kubernetes applications. |
| vCenter Server | TKG Management Cluster CIDR | TCP:443, TCP:6443, TCP:22 (optional) | |

Note

For TMC, if the firewall does not allow wildcards, you must whitelist all IP addresses of
[account].tmc.cloud.vmware.com and extensions.aws-usw2.tmc.cloud.vmware.com.

Installation Experience
You can configure each vSphere cluster as an independent failure domain and map it to the vSphere
zone. In a Three-Zone deployment, all three vSphere clusters become one Supervisor and can
provide the following options:

Cluster-level high availability to the Supervisor, as each vSphere cluster is an independent failure
domain.

Distribute the nodes of Tanzu Kubernetes Grid clusters across all three vSphere zones and
provide availability via vSphere HA at the cluster level.

Scale the Supervisor by adding hosts to each of the three vSphere clusters.


vSphere with Tanzu deployment starts with deploying the Supervisor cluster on three vSphere
Zones. The deployment is directly done from the vCenter UI.

1. The Get Started page lists the prerequisites for the deployment.

2. In the vCenter UI, select VDS as the networking stack.

3. On the next page, provide a name for the Supervisor cluster, and select the previously created
three vSphere Zones.


This installation process takes you through the steps of deploying Supervisor cluster in your vSphere
environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission Control or
Kubectl utility to deploy the Tanzu Kubernetes Grid Clusters.
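
For example, when using kubectl you first authenticate to the Supervisor with the vSphere plug-in for kubectl, switch to the vSphere Namespace context, and apply a cluster manifest such as the one shown in Appendix A. The Supervisor address, user name, and namespace below are placeholders:

# Log in to the Supervisor and deploy a workload cluster from a manifest (placeholders)
kubectl vsphere login --server=<supervisor-vip> --vsphere-username [email protected] --insecure-skip-tls-verify
kubectl config use-context prod            # context named after the vSphere Namespace
kubectl apply -f tkc-prod-cluster-1.yaml
kubectl get clusters -n prod               # verify that the cluster object is created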

The following table lists recommendations for deploying the Supervisor cluster.

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Deploy Supervisor cluster control plane nodes in large form factor. | The large form factor should suffice to integrate the Supervisor cluster with TMC. | Consumes more resources from the infrastructure. |
| TKO-TKGS-002 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of Tanzu Kubernetes clusters and manages the life cycle of all Tanzu Kubernetes clusters centrally. | Requires outbound connectivity to the internet for TMC registration. |

Note

The SaaS endpoints here refer to Tanzu Mission Control, Tanzu Service Mesh, and
Tanzu Observability.

Tanzu Kubernetes Grid Cluster APIs


To provision and manage the life cycle of TKG2 clusters, Tanzu Kubernetes Grid provides the
following two APIs:

API Version v1alpha3 for Tanzu Kubernetes clusters.

API version v1beta1 for Clusters based on a ClusterClass.

The v1alpha3 API lets you create conformant Kubernetes clusters of type TanzuKubernetesCluster.
This type of cluster is pre-configured with common defaults for quick provisioning, and can be
customized. The v1beta1 API lets you create conformant Kubernetes clusters based on the default
ClusterClass named tanzukubernetescluster and cluster type of Cluster.

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by the Tanzu Kubernetes Grid supports the following
Container Network Interface (CNI) options:

Antrea

Calico

The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.

When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea
CNI is automatically enabled in the cluster.

To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes Clusters
with Calico.

Each CNI is suitable for a different use case. The following table lists some common use cases for
the CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most
appropriate CNI for your Tanzu Kubernetes Grid implementation.

| CNI | Use Case | Pros and Cons |
| --- | --- | --- |
| Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally, encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases such as kernel bypass and network service mesh. | Pros: Antrea leverages Open vSwitch as the networking data plane, and Open vSwitch supports both Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea. |
| Calico | Calico is used in environments where factors such as network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for network policies, high network performance, and SCTP support. Cons: No multicast support. |

Recommendations for Tanzu Kubernetes Clusters


| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKC-001 | Deploy Tanzu Kubernetes clusters with the prod plan and multiple worker nodes. | The prod plan provides high availability for the control plane. | Consumes resources from the infrastructure. |
| TKO-TKC-002 | Assign a storage policy as the default policy during cluster deployment. | Package installation and application deployment require a storage policy defined as default. | Package and application installation might fail. |
| TKO-TKC-003 | Use the guaranteed VM class for Tanzu Kubernetes clusters. | Guaranteed compute resources are always available for containerized workloads. | Could prevent automatic migration of nodes by DRS. |
| TKO-TKC-004 | Implement RBAC for Tanzu Kubernetes clusters. | To avoid the usage of administrator credentials for managing the clusters. | External AD/LDAP needs to be integrated with vCenter SSO, or external IDP integration with the Supervisor is required. |
| TKO-TKC-005 | Deploy Tanzu Kubernetes clusters from Tanzu Mission Control. | Tanzu Mission Control provides life-cycle management for the Tanzu Kubernetes clusters and can be integrated with Tanzu Service Mesh and Tanzu Observability. | Only Antrea CNI is supported on workload clusters created from the TMC portal. |

Kubernetes Ingress Routing


vSphere with Tanzu does not ship with a default ingress controller. Any Tanzu-supported ingress
controller can be used. For example, Contour is an open-source controller for Kubernetes ingress
routing. Contour is part of a Tanzu package and can be installed on any Tanzu Kubernetes cluster.
Deploying Contour is a prerequisite for deploying Prometheus, Grafana, and Harbor on a workload
cluster. You can also manually deploy AKO on the Tanzu Kubernetes cluster and use NSX
Advanced Load Balancer as an L7 ingress, but this requires an Enterprise license of NSX Advanced
Load Balancer.

For more information about Contour, see the Contour site and Ingress Using Contour.

Tanzu Service Mesh also offers an Ingress controller based on Istio.

Each ingress controller has its own advantages and disadvantages. The following table provides
general recommendations on when you should use a specific ingress controller for your Kubernetes
environment:

| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. Contour is a reliable solution for simple Kubernetes workloads. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic). |
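
As an illustration of Contour's own API, a simple HTTPProxy resource might look like the following. The FQDN and the backing Service are hypothetical:

# Contour HTTPProxy routing a hypothetical host to a backend Service
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: web-proxy
spec:
  virtualhost:
    fqdn: web.example.com        # placeholder FQDN
  routes:
    - conditions:
        - prefix: /
      services:
        - name: web              # placeholder Service name
          port: 80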

Container Registry


Harbor provides a location for pushing, pulling, storing, and scanning container images used in your
Kubernetes clusters.

The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. Harbor registry is used for day 2 operations of the
Tanzu Kubernetes workload clusters. Typical day-2 operations include tasks such as pulling images
from Harbor for application deployment and pushing custom images to Harbor.

When vSphere with Tanzu is deployed on VDS networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.

You may use one of the following methods to install Harbor:

Tanzu Kubernetes Grid Package deployment : VMware recommends this installation method
for general use cases. The Tanzu packages, including Harbor, must either be pulled directly
from VMware or be hosted in an internal registry.

VM-based deployment using OVA: VMware recommends using this installation method
when Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-less deployments. Do not use
this method for hosting application images. Harbor registry is being shipped along with TKG
binaries and can be downloaded from here.

If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To configure the TKG cluster with private container
registry, see Integrate TKG 2 cluster with container registry.

Scale a Tanzu Kubernetes Grid Cluster


You can scale a TKG cluster on supervisor horizontally by changing the number of nodes, or
vertically by changing the virtual machine class hosting the nodes.

The following table lists the supported scaling operations for the TKG cluster:

Node Horizontal Scale Out Horizontal Scale In Vertical Scale Volume Scale

Control Plane Yes Yes Yes No

Worker Yes Yes Yes Yes

Note

- The number of control plane nodes must be odd, either 3 or 5.
- The worker node volumes can be changed after provisioning. However, the control
plane node volumes cannot be changed.

Backup And Restore


There are two options for backing up and restoring stateless and stateful applications
running on TKG Clusters on Supervisor:


| Tool | Comments |
| --- | --- |
| Velero plug-in for vSphere | Both Kubernetes metadata and persistent volumes can be backed up and restored. Velero snapshotting is used for persistent volumes with stateful applications. Requires that the Velero plug-in for vSphere is also installed and configured on the Supervisor. |
| Standalone Velero and Restic | Both Kubernetes metadata and persistent volumes can be backed up and restored. Restic is used for persistent volumes with stateful applications. Use this approach if you require portability. |

To back up and restore workloads running on a TKG cluster on a Zonal Supervisor, create a datastore
and install Velero with Restic on the Kubernetes cluster. For more information, see Install and Configure
Standalone Velero and Restic.

Note

Velero plug-in for vSphere runs as a pod which is not supported with Zonal
Supervisor and it requires NSX-T networking. For more information, see the
prerequisites section of Install and Configure the Velero Plugin for vSphere on
Supervisor.
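
Once Velero and Restic are installed, a namespace-scoped backup and restore looks roughly like the following. The namespace and backup names are placeholders, and the volume-related flag differs between Velero versions (newer releases use --default-volumes-to-fs-backup):

# Back up a namespace including its Restic-backed volumes, then restore it (placeholders)
velero backup create web-app-backup --include-namespaces web-app --default-volumes-to-restic
velero backup get
velero restore create --from-backup web-app-backup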

Appendix A - Deploy TKG Cluster


The following is a sample YAML file for deploying a TKG 2 workload cluster on Supervisor:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkc-prod-cluster-1
  namespace: prod
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.96.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8+vmware.2-tkg.2-zshippable
    # describe the cluster control plane
    controlPlane:
      # number of control plane nodes; integer 1 or 3
      replicas: 3
    # describe the cluster worker nodes
    workers:
      # specifies parameters for a set of worker nodes in the topology
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 1
          failureDomain: zone-a
        - class: node-pool
          name: node-pool-2
          replicas: 1
          failureDomain: zone-b
        - class: node-pool
          name: node-pool-3
          replicas: 1
          failureDomain: zone-c
    variables:
      # virtual machine class type and size for cluster nodes
      - name: vmClass
        value: guaranteed-small
      # persistent storage class for cluster nodes
      - name: storageClass
        value: gold-sp

After applying the manifest, you can verify that the TKG cluster nodes are spread across the vSphere
Zones by running the following command:

kubectl get nodes -L topology.kubernetes.io/zone

NAME                                                    STATUS   ROLES                  AGE   VERSION            ZONE
tkc-prod-cluster-2-node-pool-1-986dx-5f66d4c87c-5ss6f   Ready    <none>                 46h   v1.23.8+vmware.2   zone-a
tkc-prod-cluster-2-node-pool-2-rbmh4-847684c759-szld4   Ready    <none>                 46h   v1.23.8+vmware.2   zone-b
tkc-prod-cluster-2-node-pool-3-2gwqq-9d5576964-sbsx2    Ready    <none>                 46h   v1.23.8+vmware.2   zone-c
tkc-prod-cluster-2-wtjxq-h55lj                          Ready    control-plane,master   46h   v1.23.8+vmware.2   zone-a
tkc-prod-cluster-2-wtjxq-wcb8b                          Ready    control-plane,master   46h   v1.23.8+vmware.2   zone-b
tkc-prod-cluster-2-wtjxq-wzxgp                          Ready    control-plane,master   46h   v1.23.8+vmware.2   zone-c

Appendix B - Deploy StatefulSet Application to vSphere Zones

The following sample StatefulSet application deploys a pod to each vSphere zone; each pod has a
persistent identifier that it maintains across any rescheduling:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - zone-1
                      - zone-2
                      - zone-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: nginx
          image: gcr.io/google_containers/nginx-slim:0.8
          ports:
            - containerPort: 80
              name: web
          volumeMounts:
            - name: www
              mountPath: /usr/share/nginx/html
            - name: logs
              mountPath: /logs
  volumeClaimTemplates:
    - metadata:
        name: www
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: zonal-ds-policy-105-latebinding
        resources:
          requests:
            storage: 2Gi
    - metadata:
        name: logs
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: zonal-ds-policy-105-latebinding
        resources:
          requests:
            storage: 1Gi


You can verify the pod scheduling across zones:

kubectl get pods -o wide

NAME    READY   STATUS    RESTARTS   AGE   IP            NODE                                                    NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          34m   192.168.3.5   tkc-prod-cluster-3-node-pool-2-w77p4-578fcc7d54-g99ts   <none>           <none>
web-1   1/1     Running   0          34m   192.168.1.6   tkc-prod-cluster-3-node-pool-1-9d5wd-65b6874698-f59hg   <none>           <none>
web-2   1/1     Running   0          33m   192.168.2.6   tkc-prod-cluster-3-node-pool-3-5vv5h-77b4cc5598-np4wx   <none>           <none>

Appendix C - NSX Advanced Load Balancer Sizing Guidelines


NSX Advanced Load Balancer Controller Sizing Guidelines
Regardless of the NSX Advanced Load Balancer Controller configuration, each controller cluster can
achieve up to 5000 virtual services, which is a hard limit. For more information, see Sizing Compute
and Storage Resources for NSX Advanced Load Balancer Controller(s).

Controller Size VM Configuration Virtual Services Avi SE Scale

Small 4 vCPUS, 12 GB RAM 0-50 0-10

Medium 8 vCPUS, 24 GB RAM 0-200 0-100

Large 16 vCPUS, 32 GB RAM 200-1000 100-200

Extra Large 24 vCPUS, 48 GB RAM 1000-5000 200-400

Service Engine Sizing Guidelines


For guidance on sizing your service engines (SEs), see Sizing Compute and Storage Resources for
NSX Advanced Load Balancer Service Engine(s).

Performance metric 1 vCPU core

Throughput 4 Gb/s

Connections/s 40k

SSL Throughput 1 Gb/s

SSL TPS (RSA2K) ~600

SSL TPS (ECC) 2500

Multiple performance vectors or features might impact the performance of your applications. For
instance, to achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX
Advanced Load Balancer recommends two cores.

NSX Advanced Load Balancer SEs may be configured with as little as 1 vCPU core and 1 GB RAM, or
up to 36 vCPU cores and 128 GB RAM. SEs can be deployed in Active/Active or Active/Standby
mode depending on the license tier used. The NSX Advanced Load Balancer Essentials license does
not support the Active/Active HA mode for SEs.

VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Multi-AZ Reference Architecture on NSX Networking
vSphere with Tanzu transforms vSphere into a platform for running Kubernetes workloads natively
on the hypervisor layer. When vSphere with Tanzu is enabled on a vSphere cluster, you can run
Kubernetes workloads directly on ESXi hosts and create upstream Kubernetes clusters within
dedicated resource pools.

This document lays out a reference design for deploying Supervisor on three vSphere Zones to
provide cluster-level high-availability. Each vSphere Zone maps to one vSphere cluster. A three-
zone Supervisor supports only Tanzu Kubernetes clusters and VMs; it does not support vSphere
Pods. This document does not cover any recommendations or deployment steps for underlying
software-defined data center (SDDC) environments.

For more information about the Non-Zonal Supervisor deployment, see VMware Tanzu for
Kubernetes Operations using vSphere with Tanzu on NSX-T Reference Design.

The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.

Supported Component Matrix


For more information, see VMware Product Interoperability Matrix.

The following table provides the component versions and interoperability matrix supported with
reference design:


Software Components Version

Tanzu Kubernetes Release 1.23.8

VMware vSphere ESXi 8.0 Update 1a 21813344

VMware vCenter 8.0 Update 1a 21815093

VMware NSX 4.1.0.2

vSphere With Tanzu Components


Supervisor cluster: When Workload Management is enabled on a vSphere cluster, it creates
a Kubernetes layer within the ESXi hosts that are part of the cluster. A cluster that is enabled
for Workload Management is called a Supervisor cluster. You can containerize workloads by
creating upstream Kubernetes clusters on the top of Supervisor cluster through the Tanzu
Kubernetes Grid Service (informally known as TKGS).

The Supervisor cluster runs on top of an SDDC layer that consists of 3 vSphere clusters for
computing, NSX for networking, and a shared storage such as VSAN.

You can deploy a Supervisor on three vSphere Zones to provide cluster-level high-
availability that protects your Kubernetes workloads against cluster-level failure. A vSphere
Zone maps to one vSphere cluster that you can set up as an independent cluster failure
domain. In a three-zone deployment, all three vSphere clusters become one Supervisor
cluster.

vSphere Namespaces: A vSphere Namespace is a tenancy boundary within vSphere with


Tanzu. A vSphere Namespace allows sharing vSphere resources (computing, networking, and
storage), and enforcing resource limits within the underlying objects such as Tanzu
Kubernetes clusters. For each namespace, you configure role-based access control (policies
and permissions), image library, and virtual machine classes.

In a Supervisor activated on three vSphere Zones, a namespace resource pool is created on


each vSphere cluster that is mapped to a zone. The vSphere Namespace spreads across all
three vSphere clusters that are part of the vSphere Zones. The resources utilized on such
namespace are taken from all three underlying vSphere clusters in equal parts.

Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create
and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative Kubernetes-style APIs for
creating, configuring, and managing the Tanzu Kubernetes Cluster. vSphere 8.0 and above
supports the ClusterClass API. The ClusterClass API is a collection of templates that define a
cluster topology and configuration.

Tanzu Kubernetes Cluster ( Workload Cluster ): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control (TMC), Tanzu Observability, and Tanzu Service
Mesh, which are part of Tanzu for Kubernetes Operations.

VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace, and by VMs

hosting a Tanzu Kubernetes cluster.

VM Classes in a vSphere with Tanzu are categorized into the following two groups:

Guaranteed: This class fully reserves its configured resources.

Best-effort: This class allows its configured resources to be overcommitted.

vSphere with Tanzu offers several default VM classes. You can either use the default VM
classes, or create customized VM classes based on the requirements of the application. The
following table explains the default VM classes that are available in vSphere with Tanzu:

| Class | CPU | Memory (GB) | Reserved CPU and Memory |
| --- | --- | --- | --- |
| best-effort-xsmall | 2 | 2 | No |
| best-effort-small | 2 | 4 | No |
| best-effort-medium | 2 | 8 | No |
| best-effort-large | 4 | 16 | No |
| best-effort-xlarge | 4 | 32 | No |
| best-effort-2xlarge | 8 | 64 | No |
| best-effort-4xlarge | 16 | 128 | No |
| best-effort-8xlarge | 32 | 128 | No |
| guaranteed-xsmall | 2 | 2 | Yes |
| guaranteed-small | 2 | 4 | Yes |
| guaranteed-medium | 2 | 8 | Yes |
| guaranteed-large | 4 | 16 | Yes |
| guaranteed-xlarge | 4 | 32 | Yes |
| guaranteed-2xlarge | 8 | 64 | Yes |
| guaranteed-4xlarge | 16 | 128 | Yes |
| guaranteed-8xlarge | 32 | 128 | Yes |
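
For reference, a workload cluster consumes a VM class through the vmClass variable in its specification, following the same pattern as the full example in Appendix A. The following fragment is a minimal sketch; guaranteed-medium is one of the default classes above, and the storage class name is an assumption to be replaced with a class assigned to your vSphere Namespace:

# Fragment of a v1beta1 Cluster spec: node sizing comes from the referenced VM class,
# not from explicit CPU or memory fields on the cluster object.
spec:
  topology:
    class: tanzukubernetescluster
    version: v1.23.8+vmware.2-tkg.2-zshippable
    variables:
      - name: vmClass
        value: guaranteed-medium              # default VM class from the table above
      - name: storageClass
        value: vsan-default-storage-policy    # assumed name; use a storage class published to your namespace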

Storage Classes in vSphere with Tanzu: A StorageClass allows administrators to describe the
classes of storage that they offer. Different storage classes can map to quality-of-service
levels, backup policies, or arbitrary policies determined by the cluster administrators. The
policies representing datastores can manage storage placement of components and objects
such as control plane VMs, vSphere Pod ephemeral disks, and container images. You might
also need policies for storage placement of persistent volumes and VM content libraries.

A three-zone Supervisor supports zonal storage, where a datastore is shared across all hosts in a
single zone. Storage policies that you create for a Supervisor or for a namespace in a three-zone
Supervisor must be topology aware and have the consumption domain enabled. For more
information, see Create Storage Policy for a Three-Zone Supervisor.

When you prepare storage resources for three-zone Supervisor, consider the following parameters:


Storage in all three vSphere Zones does not need to be of the same type. However, having
uniform storage in all three clusters provides consistent performance.

Create a storage policy that is compliant with shared storage in each of the clusters. The
storage policy must be topology aware.

A three-zone Supervisor does not support the following items:

Cross-zonal volumes

vSAN File volumes (ReadWriteMany volumes)

Static volume provisioning by using the Register Volume API

Workloads that use the vSAN Data Persistence platform

vSphere Pods

vSAN stretched clusters

VMs with vGPU and instance storage

The Ephemeral Disk Storage Policy and the Image Cache Storage Policy options are disabled
because vSphere Pods are not supported with a Zonal Supervisor deployment.

You cannot create a storage class manually by using kubectl and YAML. Instead, you create a
storage policy by using the vSphere storage policy framework and assign it to the vSphere
Namespace, which exposes it as a storage class. However, an existing storage class can be
modified by using kubectl.

The following table provides recommendations for configuring Storage Classes in a vSphere with
Tanzu environment:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Create custom Storage Classes/Profiles/Policies. | To provide different levels of QoS and SLA for production, development, and test Kubernetes workloads. | Storage Policies must include one datastore from each vSphere Zone. |
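
After a topology-aware storage policy is assigned to a vSphere Namespace, it surfaces inside the TKG cluster as a Kubernetes StorageClass that persistent volume claims can reference. The following is a minimal sketch, assuming a zonal storage class named zonal-gold-sp; substitute the class name published in your namespace:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce                 # cross-zonal and ReadWriteMany volumes are not supported on a three-zone Supervisor
  storageClassName: zonal-gold-sp   # assumed name of a topology-aware storage class
  resources:
    requests:
      storage: 5Gi

With a late-binding storage policy, such as the zonal-ds-policy-105-latebinding class used in Appendix B, the volume is provisioned in the zone where the consuming pod is scheduled.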

Identity and Access Management


vSphere with Tanzu supports the following two identity providers:

vCenter Single Sign-On: This is the default identity provider that is used to authenticate with
vSphere with Tanzu environment, including the Supervisors and Tanzu Kubernetes Grid
Clusters. vCenter SSO provides authentication for vSphere infrastructure and can integrate
with AD/LDAP systems.

To authenticate using vCenter Single Sign-On, use the vSphere Plugin for kubectl. Once
authenticated, use kubectl to declaratively provision and manage the lifecycle of TKG
clusters, and to deploy TKG cluster workloads.

External Identity Provider: You can configure a Supervisor with an external identity provider
that supports the OpenID Connect protocol. Once connected, the Supervisor functions as an
OAuth 2.0 client, and uses the Pinniped authentication service to connect to Tanzu
Kubernetes Grid clusters by using the Tanzu CLI. Each Supervisor instance can support one
external identity provider. For more information about the list of supported OIDC providers,
see Configure an External IDP.

The Tanzu Kubernetes Grid (TKG) cluster permissions are set and scoped at the vSphere
Namespace level. When permissions are set for a Namespace, including identity source, users and
groups, and roles, all these permissions apply to any TKG cluster deployed within that vSphere
Namespace.

Roles and Permissions


TKG clusters support the following three roles: Viewer, Editor, and Owner.

These permissions are assigned and scoped at the vSphere Namespace level.

| Permission | Description |
| --- | --- |
| Can view | Read-only access to TKG clusters provisioned in that vSphere Namespace. |
| Can edit | Create, read, update, and delete TKG clusters in that vSphere Namespace. |
| Owner | Can administer TKG clusters in that vSphere Namespace, and can create and delete additional vSphere Namespaces using kubectl. |

vSphere with Tanzu Architecture


On a three-zone Supervisor, you can run Kubernetes workloads on Tanzu Kubernetes Grid clusters
and VMs created by using the VM service. A three-zone Supervisor has the following components:

Supervisor Control Plane VM: In this environment, three Supervisor control plane VMs are
created and spread evenly across the three vSphere Zones. The three Supervisor control
plane VMs are load balanced as each one of them has its own IP address. Additionally, a
floating IP address is assigned to one of the VMs, and a fifth IP address is reserved for
patching purposes. vSphere DRS determines the exact placement of the control plane VMs
on the ESXi hosts that are part of the zones and migrates them when needed.

Tanzu Kubernetes Grid and Cluster API: Modules that run on the Supervisor and enable the
provisioning and management of Tanzu Kubernetes Grid clusters.

Virtual Machine Service: A module that is responsible for deploying and running stand-alone
VMs, and VMs that make up the Tanzu Kubernetes Grid clusters.


vSphere with Tanzu Storage


vSphere with Tanzu integrates with the shared datastores available in the vSphere infrastructure. The
following types of shared datastores are supported:

VMFS

NFS

vSAN

vVols

vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.

Depending on your vSphere storage environment and needs of DevOps, you can create several
storage policies for different classes of storage. When you enable a Supervisor and set up
namespaces, you can assign different storage policies to be used by various objects, components,
and workloads.

For a three-zone Supervisor, complete the following prerequisites:

Create three vSphere clusters with at least 3 hosts. For using vSAN, the cluster must have 3
or 4 hosts.

Configure storage with vSAN or other shared storage for each cluster.

Enable vSphere HA and vSphere DRS in Fully Automated or Partially Automated mode.

Networking for vSphere with Tanzu


You can enable vSphere with Tanzu in the following environments:

vSphere backed with NSX networking.

vSphere backed with vSphere Distributed Switch (VDS) networking and HAProxy to provide
load balancing capabilities.

vSphere backed with vSphere Distributed Switch (VDS) networking and NSX Advanced Load
Balancer to provide load balancing capabilities.

Note

The scope of this document is limited to VMware NSX Data Center Networking.

NSX provides network connectivity to the objects inside the Supervisor and to external networks.
Connectivity to the ESXi hosts that comprise the three vSphere clusters is provided by VLAN-backed
port groups.

The following diagram shows a general overview of a three-zone deployment of vSphere with Tanzu
on NSX networking.

The Supervisor cluster configured with NSX networking uses either a distributed port group (routable
to required infrastructure components such as vCenter, NSX Manager, DNS, NTP, and so on; for
more information, see Firewall Recommendations) or an NSX segment to provide connectivity to the
Kubernetes control plane VMs. Tanzu Kubernetes clusters have their own networking provided by
NSX segments. All hosts from the clusters that are enabled for vSphere with Tanzu are connected to
the distributed switch that provides connectivity to the Kubernetes workloads and control plane VMs.

The following section explains the networking components and services included in the Supervisor
cluster:

NSX Container Plugin (NCP) provides integration between NSX and Kubernetes. The main
component of NCP runs in a container and communicates with the NSX Manager and with
the Kubernetes control plane. NCP monitors changes to containers and other resources, and
manages resources such as logical ports, segments, routers, and security groups for the
containers by calling the NSX API.

By default, NCP creates one shared tier-1 gateway for system namespaces, and a tier-1
gateway and load balancer for each namespace. The tier-1 gateway for a namespace is
connected to the tier-0 gateway and a default segment.

System namespaces are namespaces that are used by the core components that are integral
to the functioning of the Supervisor and Tanzu Kubernetes Grid clusters. The shared network
resources that include the tier-1 gateway, load balancer, and SNAT IP are grouped in a
system namespace.

NSX Edge provides connectivity from external networks to the Supervisor resources. An
NSX Edge cluster normally includes at least two Edge nodes and has a load balancer that
provides redundancy to the Kube-API servers residing on the control plane VMs and to any
application that must be published and be accessible from outside the Supervisor cluster. For
more information, see Install and Configure NSX for vSphere with Tanzu.

A tier-0 gateway is associated with the NSX Edge cluster to provide routing to the external
network. The uplink interfaces use either dynamic routing (BGP) or static routing.

Each vSphere namespace has a separate network and set of networking resources shared by
applications inside the namespace, such as tier-1 gateway, load balancer service, and SNAT
IP address.

Workloads running in Tanzu Kubernetes Grid clusters have the same isolation rules that are
implemented by the default firewall.

NSX load balancer provides:

L4 load balancer service for the Kube-API of the Supervisor cluster and workload
clusters.

L4 load balancer service for all services of type LoadBalancer deployed in workload
clusters.
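
To illustrate how the load balancer is consumed, the following is a minimal sketch of a Service of type LoadBalancer; when applied to a workload cluster, it is served by the NSX load balancer and consumes one address from the ingress IP range. The selector and ports are assumptions for illustration:

apiVersion: v1
kind: Service
metadata:
  name: web-lb
spec:
  type: LoadBalancer      # served by the NSX load balancer; consumes one ingress IP address
  selector:
    app: web              # assumed label on the backing pods
  ports:
    - port: 80            # port exposed on the allocated ingress IP
      targetPort: 8080    # assumed container port
      protocol: TCP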

Networking Prerequisites
All ESXi hosts that are part of the three vSphere clusters share a common VDS with at least one
uplink. We recommend that you configure two uplinks. You must use VDS version 8 or later.

Three vSphere clusters are mapped to the same overlay transport zone.

The Supervisor Management network is used to instantiate the zonal Supervisor. This can be
either an L2 stretched network or an NSX segment.

Network Requirements
The following table lists the required networks for the reference design:

Note

Based on your business requirements, modify subnet range to fit the projected
growth.


| Network Type | Sample Recommendation | Description |
| --- | --- | --- |
| Supervisor Management Network | /28 to allow for 5 IPs and future expansion | Network to host the Supervisor VMs. It can be a VLAN-backed VDS port group or a pre-created NSX segment. |
| Ingress IP range | /24, 254 addresses | Each service of type LoadBalancer deployed consumes 1 IP address. |
| Egress IP range | /27 | Each vSphere Namespace consumes 1 IP address for the SNAT egress. |
| Namespace/Pod network CIDR blocks | /20 | Allocates IP addresses to workloads attached to Supervisor namespace segments. By default, it is used in /28 blocks per workload. |
| Supervisor Service CIDR | /24 | Network from which IPs for Kubernetes ClusterIP Services are allocated. |

Firewall Recommendations
To prepare the firewall, you need the following information:

1. Supervisor network (Tanzu Kubernetes Grid Management) CIDR

2. Tanzu Kubernetes Grid workload cluster CIDR

3. Ingress and Egress range

4. Client machine IP address

5. vCenter server IP address

6. NSX Manager IP address

7. VMware Harbor registry IP address

8. DNS server IP address(es)

9. NTP server IP address(es)

The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet or VLAN.

| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| vCenter | Supervisor Network | TCP:6443 | Allows vCenter to manage the Supervisor VMs. |
| vCenter | Supervisor Network | TCP:22 | Allows platform administrators to connect to VMs through vCenter. |
| Supervisor Network | NSX Manager | TCP:443 | Allows the Supervisor to access NSX-T Manager to orchestrate networking. |
| Supervisor Network | vCenter | TCP:6443, TCP:443 | Allows the Supervisor to access vCenter to create VMs and storage volumes. |
| Supervisor Network | ESXi Hosts | TCP:10250 | Supervisor cluster to Spherelet ESXi hosts. |
| ESXi Hosts | Supervisor IP Addresses | TCP:6443 | Spherelet ESXi hosts to Supervisor cluster. |
| Supervisor Network | DNS Servers | TCP:53, UDP:53 | DNS |
| Supervisor Network | NTP Servers | UDP:123 | NTP |
| Supervisor Network | Workload Network | TCP:6443 | GCM, VMOperator needs to communicate with the TKC API server. |
| Supervisor Network | *.tmc.cloud.vmware.com | TCP:443 | TMC connectivity |
| Egress IP Range | DNS Servers | TCP:53, UDP:53 | DNS |
| Egress IP Range | NTP Servers | UDP:123 | NTP |
| Jumpbox | vCenter | TCP:22, TCP:443 | Management |
| Jumpbox | NSX-T | TCP:443 | Management |
| Jumpbox | Supervisor Network | TCP:22, TCP:6443 | Management |
| Jumpbox | Workload Network | | |
| Jumpbox | Ingress IP pool | TCP:443, TCP:6443 | Management |
| Jumpbox | Web proxy | TCP:TBC | Settings depend on the proxy |
| Jumpbox | Git Server | TCP:443, TCP:22 | Version control |
| Jumpbox | Ingress IP Range | TCP:443, TCP:6443 | Management |
| Platform Admins | Jumpbox | TCP:22 | Management |
| Kubernetes users | Ingress IP Range | TCP:443, TCP:6443 | Management. You can further restrict access to individual IPs for cluster access. |


Note

For Tanzu Mission Control (TMC), if the firewall does not allow wildcards, you must
whitelist all IP addresses of [account].tmc.cloud.vmware.com and extensions.aws-
usw2.tmc.cloud.vmware.com.

Installation Experience
While deploying the Supervisor by using vSphere 8 and later, you can select vSphere Zone
deployment and assign vSphere Zones to provide high availability and fault tolerance. In this
scenario, all three vSphere clusters become one Supervisor. In a three-zone deployment, you can
perform the following operations:

Provide cluster-level high availability to the Supervisor, as each vSphere cluster is an
independent failure domain.

Distribute the nodes of your Tanzu Kubernetes Grid clusters across all three vSphere Zones,
thus providing HA for your Kubernetes workloads at a vSphere cluster level.

vSphere with Tanzu deployment starts with deploying the Supervisor cluster on three vSphere
Zones. The deployment is done directly from the vCenter UI. The Get Started page lists the
prerequisites for the deployment:

1. In the vCenter UI, select NSX as the networking stack.

2. On the next page, provide a name for the Supervisor cluster, and select the previously
created three vSphere Zones.

This installation process takes you through the steps of deploying the Supervisor cluster in your
vSphere environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission
Control or the kubectl utility to deploy the Tanzu Kubernetes Grid clusters.

The following table lists recommendations for deploying the Supervisor cluster:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Deploy Supervisor cluster control plane nodes in large form factor. | Large form factor should suffice to integrate the Supervisor cluster with TMC. | Consumes more resources from the infrastructure. |
| TKO-TKGS-002 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters, and manages the life cycle of all Tanzu Kubernetes clusters centrally. | Need outbound connectivity to the internet for TMC registration. |

Note

In this scenario, the SaaS endpoints refer to Tanzu Mission Control, Tanzu Service
Mesh, and Tanzu Observability.

vSphere Namespaces
A vSphere Namespace provides the runtime environment for TKG Clusters on Zonal Supervisor. To
provision a TKG cluster, you first configure a vSphere namespace with users, roles, permissions,
compute, storage, content library, and assign virtual machine classes. All these configurations are
inherited by TKG clusters deployed in that namespace.

When you create a vSphere Namespace, a network segment is created, which is derived from the
Namespace Network configured in the Supervisor. While creating a vSphere Namespace, you have
the option to override cluster network settings. Choosing this option lets you customize the vSphere
Namespace network by adding Ingress, Egress, and Namespace network CIDRs (unique from the
Supervisor and from any other vSphere Namespace).

The typical use case for overriding Supervisor network settings is to provision a TKG cluster with
routable pod networking.

Note

The Override Supervisor network setting is only available if the Supervisor is
configured with NSX networking.

Recommendations for Using Namespace with Tanzu

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-003 | Create a dedicated namespace for each environment. | Segregate prod/dev/test clusters by assigning them to dedicated namespaces. | Clusters created within a namespace share the same access policies/quotas/network and storage resources. |
| TKO-TKGS-004 | Register an external IDP with the Supervisor, or AD/LDAP with vCenter SSO. | Limit access to a namespace based on the role of users or groups. | External AD/LDAP needs to be integrated with vCenter or SSO. Groups need to be created manually. |
| TKO-TKGS-005 | Enable namespace self-service. | Enables DevOps users to create namespaces in a self-service manner. | The vSphere administrator must publish a namespace template to LDAP users or groups to enable them to create a namespace. |
| TKO-TKGS-006 | Use the guaranteed VM class for production clusters. | CPU and memory limits configured on a vSphere Namespace have an impact on the TKG cluster if deployed using the guaranteed VM class type. | Consumes more infrastructure resources and contention might occur. |

Tanzu Kubernetes Grid Workload Clusters


vSphere Zones provide a way to create highly available Tanzu Kubernetes Grid (TKG) clusters on
Supervisor. If you are provisioning a TKG cluster across vSphere Zones, you must provide the failure
domain for each node pool. Each failure domain maps to a vSphere Zone, which is associated with
one vSphere cluster. Failure domains, also known as vSphere Fault Domains, are defined and
managed by the vSphere administrator when creating vSphere Zones.

The control plane nodes of Tanzu Kubernetes Grid clusters are automatically placed across the
vSphere Zones. However, you can control how the worker nodes are spread across zones. You can
define a NodePool object for the worker nodes of Tanzu Kubernetes Grid clusters and map each
vSphere Zone to a failure domain with each NodePool. Cluster API spreads the node pools across
zones automatically.

In a zone topology, when you provision a TKG cluster on Supervisor, the cluster is aware of the
vSphere Zones. The zone topology supports failure domains for highly available workloads. If
needed, you can run a workload in a specific zone by using annotations.

Tanzu Kubernetes Grid Cluster APIs


Tanzu Kubernetes Grid provides the following two APIs for provisioning and managing the life cycle
of TKG 2 clusters:

API version v1alpha3 for Tanzu Kubernetes clusters

API version v1beta1 for Clusters based on a ClusterClass

The v1alpha3 API lets you create conformant Kubernetes clusters of type TanzuKubernetesCluster.
This type of cluster is pre-configured with common defaults for quick provisioning, and can be
customized. The v1beta1 API lets you create conformant Kubernetes clusters based on the default
ClusterClass named tanzukubernetescluster and a cluster type of Cluster.
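
Appendix A shows a v1beta1 Cluster based on the default ClusterClass. For comparison, the following is a minimal sketch of the v1alpha3 TanzuKubernetesCluster form; the VM class, storage class, and Tanzu Kubernetes release (TKR) names are assumptions to be replaced with values available in your vSphere Namespace:

apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-dev-cluster-1
  namespace: dev
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: guaranteed-small                       # assumed VM class
      storageClass: gold-sp                           # assumed storage class
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable   # assumed TKR name; list available releases with kubectl get tkr
    nodePools:
      - name: node-pool-1
        replicas: 3
        vmClass: guaranteed-small
        storageClass: gold-sp
        # failureDomain: zone-a                       # optionally pin a node pool to a vSphere Zone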

Tanzu Kubernetes Clusters Networking


A Tanzu Kubernetes cluster provisioned by the Tanzu Kubernetes Grid Service supports the following
Container Network Interface (CNI) options:

Antrea

Calico

The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.

When you deploy a Tanzu Kubernetes cluster using the default configuration, Antrea CNI is
automatically enabled in the cluster.

To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes Clusters
with Calico.

| CNI | Use Case | Pros and Cons |
| --- | --- | --- |
| Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally, encrypt node-to-node communication using IPsec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea. |
| Calico | Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for network policies; high network performance; SCTP support. Cons: No multicast support. |

Kubernetes Ingress Routing


vSphere with Tanzu does not ship with a default ingress controller. You can use any Tanzu-supported
ingress controller. For example, Contour, an open-source controller for Kubernetes ingress routing, is
part of the Tanzu packages and can be installed on any Tanzu Kubernetes cluster. Deploying
Contour is a prerequisite for deploying Prometheus, Grafana, and Harbor on a workload cluster. A
minimal example Ingress manifest for Contour is shown after the comparison table below. You can
also manually deploy AKO on the Tanzu Kubernetes cluster and use NSX Advanced Load Balancer
as an L7 ingress. However, this requires an enterprise license of NSX Advanced Load Balancer.

For more information about Contour, see Contour and Ingress Using Contour.

Tanzu Service Mesh also offers an Ingress controller based on Istio.

Each ingress controller has advantages and disadvantages of its own. The following table provides
general recommendations on when you should use a specific ingress controller for your Kubernetes
environment:

| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. Contour is a reliable solution for simple Kubernetes workloads. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic), and between the cluster and the outside world (north-south traffic). |
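
The following is a minimal sketch of the Ingress object referenced above, which Contour can serve once the package is installed on the workload cluster; the ingress class name, host name, and backend service are assumptions for illustration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  namespace: default
spec:
  ingressClassName: contour        # assumes the Contour package created an IngressClass named contour
  rules:
    - host: web.example.com        # assumed host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web          # assumed ClusterIP Service fronting the application
                port:
                  number: 80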

Container Registry
vSphere with Tanzu includes Harbor as a container registry. Harbor provides a location for pushing,
pulling, storing, and scanning container images used in your Kubernetes clusters.

The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. Harbor registry is used for day-2 operations of the
Tanzu Kubernetes workload clusters. Typical day-2 operations include tasks, such as pulling images
from Harbor for application deployment, and pushing custom images to Harbor.

When vSphere with Tanzu is deployed on NSX networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.

You can use one of the following methods to install Harbor:

Tanzu Kubernetes Grid package deployment: VMware recommends this installation
method for general use cases. The Tanzu packages, including Harbor, must either be pulled
directly from VMware or be hosted in an internal registry.

VM-based deployment using OVA: VMware recommends using this installation method in
cases where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do not
use this method for hosting application images. The Harbor registry is shipped with the TKG
binaries and can be downloaded from here.

If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. For more information, see Trust Custom CA Certificates
on Cluster Nodes.

To configure a TKG cluster with a private container registry, see Integrate TKG 2 cluster with
container registry.

Scale a Tanzu Kubernetes Grid Cluster


You can scale a Tanzu Kubernetes Grid cluster on Supervisor horizontally by changing the number
of nodes, or vertically by changing the virtual machine class hosting the nodes.

The following table lists the supported scaling operations for TKG cluster:

| Node | Horizontal Scale Out | Horizontal Scale In | Vertical Scale | Volume Scale |
| --- | --- | --- | --- | --- |
| Control Plane | Yes | Yes | Yes | No |
| Worker | Yes | Yes | Yes | Yes |

Note

- The number of control plane nodes must be odd, either 3 or 5.
- You can change the worker node volumes after provisioning. However, you cannot
change the control plane node volumes.
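
For a v1beta1 cluster such as the one in Appendix A, scaling is declarative: you change the replica counts (or the vmClass variable for vertical scaling) in the Cluster object and reapply it, for example with kubectl edit or kubectl apply. The following fragment is a minimal sketch showing only the fields involved, assuming the Appendix A cluster:

# Edit these fields in the full Cluster manifest
# (for example: kubectl edit cluster tkc-prod-cluster-1 -n prod).
spec:
  topology:
    controlPlane:
      replicas: 3                  # control plane count must remain odd (3 or 5)
    workers:
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 3              # horizontal scale out of the zone-a node pool from 1 to 3 workers
          failureDomain: zone-a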

Backup And Restore


The following two options are available for backing up and restoring stateless and stateful
applications running on TKG clusters on Supervisor:

| Tool | Comments |
| --- | --- |
| Velero plug-in for vSphere | Both Kubernetes metadata and persistent volumes can be backed up and restored. Velero snapshotting is used for persistent volumes with stateful applications. Requires the Velero plug-in for vSphere installed and configured on the Supervisor. |
| Standalone Velero and Restic | Both Kubernetes metadata and persistent volumes can be backed up and restored. Restic is used for persistent volumes with stateful applications. Use this approach if you require portability. |

To back up and restore workloads running on a TKG cluster on a Zonal Supervisor, create a
datastore and install Velero with Restic on the Kubernetes cluster. For more information, see Install
and Configure Standalone Velero and Restic.
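
After standalone Velero and Restic are installed, backups can be requested with the velero CLI or declaratively through Velero resources. The following is a minimal sketch of a Velero Backup object that includes persistent volumes through Restic; the Velero namespace, application namespace, and retention period are assumptions:

apiVersion: velero.io/v1
kind: Backup
metadata:
  name: web-ns-backup
  namespace: velero               # assumes Velero is installed in the velero namespace
spec:
  includedNamespaces:
    - web                         # assumed application namespace to protect
  defaultVolumesToRestic: true    # back up pod volumes with Restic
  ttl: 720h0m0s                   # retain the backup for 30 days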

Note

Velero plug-in for vSphere runs as a pod which is not supported with Zonal
Supervisor, and it requires NSX-T networking. For more information, see the
prerequisites section of Install and Configure the Velero Plugin for vSphere on
Supervisor.


Appendix A - Deploy TKG Cluster


The following is a sample YAML file for deploying a TKG 2 workload cluster on Supervisor:

apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkc-prod-cluster-1
  namespace: prod
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.96.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8+vmware.2-tkg.2-zshippable
    # describe the cluster control plane
    controlPlane:
      # number of control plane nodes; integer 1 or 3
      replicas: 3
    # describe the cluster worker nodes
    workers:
      # specifies parameters for a set of worker nodes in the topology
      machineDeployments:
        - class: node-pool
          name: node-pool-1
          replicas: 1
          failureDomain: zone-a
        - class: node-pool
          name: node-pool-2
          replicas: 1
          failureDomain: zone-b
        - class: node-pool
          name: node-pool-3
          replicas: 1
          failureDomain: zone-c
    variables:
      # virtual machine class type and size for cluster nodes
      - name: vmClass
        value: guaranteed-small
      # persistent storage class for cluster nodes
      - name: storageClass
        value: gold-sp

The TKG cluster is provisioned across vSphere Zones, and can be verified by running the following
command:

kubectl get nodes -L topology.kubernetes.io/zone

NAME                                                    STATUS   ROLES                  AGE   VERSION            ZONE
tkc-prod-cluster-2-node-pool-1-986dx-5f66d4c87c-5ss6f   Ready    <none>                 46h   v1.23.8+vmware.2   zone-a
tkc-prod-cluster-2-node-pool-2-rbmh4-847684c759-szld4   Ready    <none>                 46h   v1.23.8+vmware.2   zone-b
tkc-prod-cluster-2-node-pool-3-2gwqq-9d5576964-sbsx2    Ready    <none>                 46h   v1.23.8+vmware.2   zone-c
tkc-prod-cluster-2-wtjxq-h55lj                          Ready    control-plane,master   46h   v1.23.8+vmware.2   zone-a
tkc-prod-cluster-2-wtjxq-wcb8b                          Ready    control-plane,master   46h   v1.23.8+vmware.2   zone-b
tkc-prod-cluster-2-wtjxq-wzxgp                          Ready    control-plane,master   46h   v1.23.8+vmware.2   zone-c

Appendix B - Deploy StatefulSet Application to vSphere Zones

The following sample StatefulSet application deploys a pod to each vSphere Zone, each having a
persistent identifier that it maintains across any rescheduling:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - zone-1
                - zone-2
                - zone-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
        - name: logs
          mountPath: /logs
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: zonal-ds-policy-105-latebinding
      resources:
        requests:
          storage: 2Gi
  - metadata:
      name: logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: zonal-ds-policy-105-latebinding
      resources:
        requests:
          storage: 1Gi

To verify the pod scheduling across zones, run the following command:

kubectl get pods -o wide

NAME    READY   STATUS    RESTARTS   AGE   IP            NODE                                                    NOMINATED NODE   READINESS GATES
web-0   1/1     Running   0          34m   192.168.3.5   tkc-prod-cluster-3-node-pool-2-w77p4-578fcc7d54-g99ts   <none>           <none>
web-1   1/1     Running   0          34m   192.168.1.6   tkc-prod-cluster-3-node-pool-1-9d5wd-65b6874698-f59hg   <none>           <none>
web-2   1/1     Running   0          33m   192.168.2.6   tkc-prod-cluster-3-node-pool-3-5vv5h-77b4cc5598-np4wx   <none>           <none>


Authentication with Pinniped

VMware Tanzu Kubernetes Grid (informally known as TKG) implements user authentication with
Pinniped, an open-source authentication service for Kubernetes clusters. Pinniped allows you to
plug external OpenID Connect (OIDC) or LDAP identity providers (IdP) into Tanzu Kubernetes
(workload) clusters so that you can control user access to those clusters.

For LDAP authentication, Pinniped uses Dex as the endpoint to connect to your upstream
LDAP IdP.

If you use OIDC, Pinniped provides its own endpoint, so Dex is not required.

Pinniped and Dex run automatically as in-cluster services in your management clusters if you enable
identity management. For instructions on how to enable identity management in Tanzu Kubernetes
Grid, see Configure Identity Management.

Authentication Flow
The authentication flow between the management and workload clusters includes the following:

The Tanzu Kubernetes Grid administrator enables and configures identity management on
the management cluster, specifying an external LDAP or OIDC IdP.

Authentication service components are deployed into the management cluster, using the
LDAP or OIDC IdP details specified by the administrator.

The administrator creates a Tanzu Kubernetes (workload) cluster. The workload cluster
inherits the authentication configuration from the management cluster.

The administrator creates a role binding to associate a given user with a given role on the
workload cluster.

The administrator provides the kubeconfig for the workload cluster to the user.

A user uses the kubeconfig to connect to the workload cluster, for example, by running
kubectl get pods --kubeconfig.

The management cluster authenticates the user with the IdP.

The workload cluster either allows or denies the kubectl get pods request, depending on the
permissions of the user’s role.

In the following image, the blue arrows represent the authentication flow between the workload
cluster, the management cluster, and the external IdP. The green arrows represent Tanzu CLI and
kubectl traffic between the workload cluster, the management cluster, and the external IdP.


We recommend the following best practices for managing identities in Tanzu Kubernetes Grid
provisioned clusters:

Configure Pinniped services during management cluster creation.

Limit access to management clusters to the appropriate set of users. For example, provide
access only to users who are responsible for managing infrastructure and cloud resources
but not to application developers. This is especially important because access to the
management cluster inherently provides access to all workload clusters.

Limit cluster administrator access for workload clusters to the appropriate set of users. For
example, provide access to users who are responsible for managing infrastructure and
platform resources in your organization, but not to application developers.

Connect to an identity provider to manage the user identities allowed to access cluster
resources instead of relying on administrator-generated kubeconfig files.

Prepare External Identity Management


VMware Tanzu Kubernetes Grid (informally known as TKG) implements user authentication with
Pinniped. Pinniped allows you to plug external OpenID Connect (OIDC) or LDAP identity providers
(IDP) into Tanzu Kubernetes clusters, so that you can control user access to those clusters.

You can enable identity management during or after management cluster deployment. Any
workload clusters that you create after enabling identity management are automatically configured to
use the same identity provider as the management cluster.

Enabling and configuring identity management during management cluster deployment
(Recommended):

Obtain your identity provider details.

Use the obtained details to configure LDAPS or OIDC in Tanzu Kubernetes Grid.

After the management cluster has been created, confirm that the authentication
service is running correctly and complete its configuration.

Enabling and configuring identity management after management cluster deployment:

Obtain your identity provider details.

Generate the Pinniped add-on secret for the management cluster.


Confirm that the authentication service is running correctly and complete its
configuration.

If the management cluster manages any workload clusters, generate the Pinniped
add-on secret for each workload cluster that was created before you enabled identity
management.

Enable and Configure Identity Management During Management Cluster Deployment
This section explains how to enable and configure identity management during management cluster
deployment.

Obtain Your Identity Provider Details


Before you can enable identity management, you must have an identity provider. Tanzu Kubernetes
Grid supports LDAPS and OIDC identity providers.

To use your company’s internal LDAPS server as the identity provider, obtain LDAPS information
from your LDAP administrator.

To use OIDC as the identity provider, you must have an account with an identity provider that
supports the OpenID Connect standard, for example, Okta.

For more information on using Okta as your OIDC provider, see Register a Tanzu Kubernetes Grid
Application in Okta.

Configure LDAPS or OIDC Settings in Tanzu Kubernetes Grid


When you are deploying your management cluster using the installer interface, configure LDAPS or
OIDC in the Identity Management section. For instructions, see Configure Identity Management in
Deploy Management Clusters with the Installer Interface.

1. In the Identity Management section of the management cluster deployment UI:

1. Select Enable Identity Management Settings.

2. Select the provider as OIDC or LDAPS.

2. If you choose to use OIDC, provide details of your OIDC provider account, for example,
Okta.

Issuer URL: The IP or DNS address of your OIDC server.

Client ID: The client_id value that you obtain from your OIDC provider. For example,
if your provider is Okta, log in to Okta, create a Web application, and select the
Client Credentials option in order to get a client_id and secret.

Client Secret: The secret value that you obtain from your OIDC provider.

Scopes: A comma-separated list of additional scopes to request in the token
response. For example, openid,groups,email.

Username Claim: The name of your username claim. This is used to set a user’s
username in the JSON Web Token (JWT) claim. Depending on your provider, enter
claims such as user_name, email, or code.

Groups Claim: The name of your groups claim. This is used to set a user’s group in
the JWT claim. For example, groups.

3. If you choose to use LDAPS, provide details of your company’s LDAPS server. All settings
except for LDAPS Endpoint are optional.

LDAPS Endpoint: The IP or DNS address of your LDAPS server. Provide the
address and port of the LDAP server, in the form host:port.

Bind DN: The DN for an application service account. The connector uses these
credentials to search for users and groups. Not required if the LDAP server provides
access for anonymous authentication.

Bind Password: The password for an application service account, if Bind DN is set.

Base DN (user search): The point from which to start the LDAP user search. For example,
OU=Users,OU=domain,DC=io.

Filter (user search): An optional filter to be used by the LDAP user search.

Username: The LDAP attribute that contains the user ID. For example, uid,
sAMAccountName.

Base DN (group search): The point from which to start the LDAP group search. For example,
OU=Groups,OU=domain,DC=io.

Filter (group search): An optional filter to be used by the LDAP group search.

Name Attribute: The LDAP attribute that holds the name of the group. For example, cn.

User Attribute: The attribute of the user record that is used as the value of the
membership attribute of the group record. For example, distinguishedName, dn.

Group Attribute: The attribute of the group record that holds the user/member
information. For example, member.

Paste the contents of the LDAPS server CA certificate into the Root CA text box.

(Optional) Verify the LDAP settings.

Click Verify LDAP Configuration.

Enter a user name and group name.

Click Start. After verification completes, if you see any failures, examine them
closely.

4. Click Next and proceed with management cluster creation.

5. After you deploy the management cluster, proceed with Complete the configuration of
Identity Management in the Management Cluster

Enable and Configure Identity Management in an Existing Deployment
If you did not configure identity management when you deployed your management cluster, you
can enable it as a post-deployment step by doing the following:

1. Obtain your identity provider details.

2. Generate a Kubernetes secret for the Pinniped add-on and deploy Pinniped package.

3. Complete the identity management configuration on the management cluster.

4. Create role bindings for your management cluster users.

5. Enable identity management in workload clusters.

Obtain your identity provider details

Before you can enable identity management, you must have an identity provider. Tanzu Kubernetes
Grid supports LDAPS and OIDC identity providers.

To use your company’s internal LDAPS server as the identity provider, obtain LDAPS information
from your LDAP administrator.

To use OIDC as the identity provider, you must have an account with an identity provider that
supports the OpenID Connect standard, for example, Okta.

For more information on using Okta as your OIDC provider, see Register a Tanzu Kubernetes Grid
Application in Okta.

For more information on obtaining your identity provider details, see Obtain Your Identity Provider
Details.

Generate the Pinniped Add-on Secret for the Management Cluster and Deploy the Pinniped Package
This procedure configures the Pinniped add-on and deploys the authentication components in your
management cluster. To generate a Kubernetes secret for the Pinniped add-on:

1. Set the context of kubectl to your management cluster. For example, with a management
cluster named id-mgmt-test:

# kubectl config use-context tkg-mgmt-01-admin@tkg-mgmt-01


Switched to context "tkg-mgmt-01-admin@tkg-mgmt-01"

2. Create a cluster configuration file by copying the configuration settings that you defined
when you deployed your management cluster into a new file. In addition to the variables
from the original management cluster configuration, include the following OIDC or LDAP
identity provider details in the file:

Note

With the exception of IDENTITY_MANAGEMENT_TYPE, you need to set these variables
only for management clusters. For workload clusters, set IDENTITY_MANAGEMENT_TYPE
to the same value as in the management cluster.

# Identity management type. This must be "oidc" or "ldap".

IDENTITY_MANAGEMENT_TYPE:

# Set these variables if you want to configure OIDC.

CERT_DURATION: 2160h
CERT_RENEW_BEFORE: 360h
OIDC_IDENTITY_PROVIDER_CLIENT_ID:
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET:
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM:
OIDC_IDENTITY_PROVIDER_ISSUER_URL:
OIDC_IDENTITY_PROVIDER_SCOPES: "email,profile,groups"
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM:


# Set these variables if you want to configure LDAP.

LDAP_BIND_DN:
LDAP_BIND_PASSWORD:
LDAP_GROUP_SEARCH_BASE_DN:
LDAP_GROUP_SEARCH_FILTER:
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE:
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST:
LDAP_ROOT_CA_DATA_B64:
LDAP_USER_SEARCH_BASE_DN:
LDAP_USER_SEARCH_EMAIL_ATTRIBUTE: DN
LDAP_USER_SEARCH_FILTER:
LDAP_USER_SEARCH_ID_ATTRIBUTE: DN
LDAP_USER_SEARCH_NAME_ATTRIBUTE:
LDAP_USER_SEARCH_USERNAME: userPrincipalName

3. The following is a sample management cluster configuration file after updating the LDAP
configuration.

#! ---------------------------------------------------------------------
#! vSphere non proxy env configs
#! ---------------------------------------------------------------------
AVI_CA_DATA_B64: <base64-encoded-cert>
AVI_CLOUD_NAME: tkgvsphere-cloud01
AVI_CONTROLLER: alb-ctlr01.lab.vmw
AVI_DATA_NETWORK: tkg-mgmt-vip-segment
AVI_DATA_NETWORK_CIDR: 172.16.50.0/24
AVI_ENABLE: 'true'
AVI_LABELS: |
  'type': 'management'
AVI_PASSWORD: <encoded:Vk13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: tkgvsphere-tkgmgmt-group01
AVI_USERNAME: admin
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg-mgmt-01
CLUSTER_PLAN: prod
ENABLE_CEIP_PARTICIPATION: 'true'
ENABLE_MHC: 'true'
#----------Providing the ldap config here---------------------
IDENTITY_MANAGEMENT_TYPE: ldap
LDAP_BIND_DN: cn=administrator,cn=Users,dc=lab,dc=vmw
LDAP_BIND_PASSWORD: VMware1!
LDAP_GROUP_SEARCH_BASE_DN: cn=Users,dc=lab,dc=vmw
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: member
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: dns.lab.vmw
LDAP_ROOT_CA_DATA_B64: <base64-encoded-cert>
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=lab,dc=vmw
LDAP_USER_SEARCH_EMAIL_ATTRIBUTE: DN
LDAP_USER_SEARCH_FILTER: (objectClass=person)
LDAP_USER_SEARCH_ID_ATTRIBUTE: DN
LDAP_USER_SEARCH_NAME_ATTRIBUTE: userPrincipalName
LDAP_USER_SEARCH_USERNAME: userPrincipalName
#--------------------------


INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /tkgm-internet-dc1
VSPHERE_DATASTORE: /tkgm-internet-dc1/datastore/vsanDatastore
VSPHERE_FOLDER: /tkgm-internet-dc1/vm/tkg-vsphere-tkg-mgmt
VSPHERE_NETWORK: pg-tkg_mgmt
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /tkgm-internet-dc1/host/tkgm-internet-c1/Resources/tkg-vsphere-tkg-Mgmt
VSPHERE_SERVER: vcenter.lab.vmw
VSPHERE_SSH_AUTHORIZED_KEY: <vcenter-ssh-key>
VSPHERE_USERNAME: [email protected]
VSPHERE_INSECURE: 'true'
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
ENABLE_AUDIT_LOGGING: 'true'
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: 3
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: pg-tkg-cluster-vip
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.80.0/24
WORKER_SIZE: medium
CONTROLPLANE_SIZE: medium

4. Make sure your local environment has IDENTITY_MANAGEMENT_TYPE set to either oidc
or ldap, and not none:

# export IDENTITY_MANAGEMENT_TYPE=ldap

# echo $IDENTITY_MANAGEMENT_TYPE
ldap

5. Set the _TKG_CLUSTER_FORCE_ROLE environment variable to management:

export _TKG_CLUSTER_FORCE_ROLE="management"

6. Set the FILTER_BY_ADDON_TYPE environment variable to authentication/pinniped:

export FILTER_BY_ADDON_TYPE="authentication/pinniped"

7. Generate a secret for the Pinniped add-on:

tanzu cluster create CLUSTER-NAME --dry-run -f CLUSTER-CONFIG-FILE > CLUSTER-NAME-example-secret.yaml

Example:
# tanzu cluster create tkg-mgmt-01 --dry-run -f tkg-mgmt-01.yaml > tkg-mgmt-01-example-secret.yaml

# ls
tkg-mgmt-01.yaml tkg-mgmt-01-example-secret.yaml

The environment variable settings cause tanzu cluster create --dry-run to generate a
Kubernetes secret, not a full cluster manifest.


Note

This command generates the manifest with the default namespace. However, you
need to create the secret in the tkg-system namespace for the kapp-controller to
reconcile the core add-on. Manually edit the file and change the namespace to
tkg-system.

8. Review the secret and then apply it to the management cluster. For example:

# kubectl apply -f tkg-mgmt-01-example-secret.yaml


secret/tkg-mgmt-01-pinniped-addon created

9. After applying the secret, check the status of the Pinniped add-on by running the kubectl get
app command:

# kubectl get app pinniped -n tkg-system


NAME DESCRIPTION SINCE-DEPLOY AGE
pinniped Reconcile succeeded 92s 4m3s

Note

If the Pinniped app reconcile fails, see Troubleshooting Core Add-on Configuration.

Complete the Identity Management Configuration on the Management Cluster
After deploying the management cluster, do the following to complete the identity management
configuration:

1. Connect kubectl to the management cluster.

2. Confirm that the authentication service is running correctly by checking its status:
OIDC: Check the Status of an OIDC Identity Management Service.

LDAP: Check the Status of an LDAP Identity Management Service.

OIDC: Provide the Callback URI to the OIDC Provider.

3. If you want to use regular, non-administrator kubeconfig files for access to the management
cluster, after completing the configuration of identity management, configure RBAC by
following the instructions in Configure RBAC for a Management Cluster.

Connect kubectl to the Management Cluster


To configure identity management, you must obtain and use the admin context of the management
cluster:

1. Get the admin context of the management cluster. The procedures in this topic use a
management cluster named tkg-mgmt.

$ tanzu management-cluster kubeconfig get tkg-mgmt --admin

Credentials of cluster 'tkg-mgmt' have been saved


You can now access the cluster by running 'kubectl config use-context tkg-mgmt-
admin@tkg-mgmt'

The admin context of a cluster gives you full access to the cluster without requiring
authentication with your IdP.

2. Set kubectl to the admin context of the management cluster:

$ kubectl config use-context tkg-mgmt-admin@tkg-mgmt

Switched to context "tkg-mgmt-admin@tkg-mgmt".

Check the Status of an LDAP Identity Management Service


Tanzu Kubernetes Grid uses Pinniped to integrate clusters with an LDAP identity service, along with
Dex to expose the service endpoint. When you enable LDAP, Tanzu Kubernetes Grid creates the
pinniped-supervisor service in the pinniped-supervisor namespace, pinniped-concierge in the
pinniped-concierge namespace, and dexsvc in the tanzu-system-auth namespace.

Follow the steps below to check the status of an LDAP service and note the EXTERNAL-IP address
at which the service is exposed.

1. Verify that the Pinniped package was installed and reconciled successfully.

# kubectl get app pinniped -n tkg-system

NAMESPACE    NAME       DESCRIPTION           SINCE-DEPLOY   AGE
tkg-system   pinniped   Reconcile succeeded   3m44s          49m

2. Get information about the pinniped supervisor and dexsvc services that are running in the
management cluster.

# kubectl get svc -n pinniped-supervisor

NAMESPACE             NAME                  TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)         AGE
pinniped-supervisor   pinniped-supervisor   LoadBalancer   100.67.31.28   192.168.51.4   443:31088/TCP   17d

# kubectl get svc -n tanzu-system-auth

NAME     TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)         AGE
dexsvc   LoadBalancer   100.65.110.223   192.168.51.5   443:31769/TCP   8h

3. Proceed with generating the kubeconfig and creating the role based access control. For
more information, see Configure RBAC.

Check the Status of an OIDC Identity Management Service


Tanzu Kubernetes Grid uses Pinniped to integrate clusters with an OIDC identity service. When you
enable OIDC, Tanzu Kubernetes Grid creates the pinniped-supervisor service in the pinniped-
supervisor namespace and pinniped-concierge in the pinniped-concierge namespace.

Follow the steps below to check the status of the Pinniped service and note the EXTERNAL-IP
address at which the service is exposed.

1. Verify that the Pinniped package was installed and reconciled successfully.

# kubectl get apps pinniped -n tkg-system

NAMESPACE    NAME       DESCRIPTION           SINCE-DEPLOY   AGE
tkg-system   pinniped   Reconcile succeeded   3m44s          49m

2. Get information about the services that are running in the management cluster. The identity
management service runs in the pinniped-supervisor namespace.

# kubectl get svc -n pinniped-supervisor

NAME                  TYPE           CLUSTER-IP       EXTERNAL-IP    PORT(S)         AGE
pinniped-supervisor   LoadBalancer   100.65.110.223   192.168.51.4   443:31769/TCP   8h

3. Note the external address of the pinniped-supervisor service, as listed under EXTERNAL-IP.

4. Update the External IP in the login redirect URI in the OIDC identity provider. For more
information, see Provide the Callback URI to the OIDC Provider.

5. Once you update the redirect URI, proceed with generating the kubeconfig and creating the
role based access control. For more information, see Configure RBAC.

Provide the Callback URI to the OIDC Provider (OIDC Only)


If you configured the management cluster to use OIDC authentication, you must provide the callback
URI for that management cluster to your OIDC identity provider.

1. Log in to your Okta account.

2. In the main menu, go to Applications.

3. Select the application that you created for Tanzu Kubernetes Grid.

4. In the General Settings panel, click Edit.

5. Under Login, update Login redirect URIs to include the address of the node in which the
pinniped-supervisor is running.

6. Add the external IP address of the node at which the pinniped-supervisor service is running,
that you noted in the previous procedure.

Note

You must specify https instead of http.


Enable Identity Management on Workload Clusters


Any workload clusters that you create after you enable identity management in the management
cluster are automatically configured to use the same identity management service. You can proceed
with generating a kubeconfig file and share it with users for connecting to the cluster. For more
information on generating kubeconfig and creating the role based access, see Configure RBAC.

If a workload cluster was created before you enabled identity management for your management
cluster, you must enable it manually. To enable identity management for a workload cluster:

1. Generate the Pinniped add-on secret:

Create a cluster configuration file by copying the configuration settings that you
defined when you deployed your workload cluster into a new file. In addition to the
variables from the original cluster configuration, include the following:

# Identity management type used by the management cluster. This must be "oidc" or "ldap".
IDENTITY_MANAGEMENT_TYPE:

# This is the Pinniped supervisor service endpoint in the management cluster.
SUPERVISOR_ISSUER_URL:

# Pinniped uses this b64-encoded CA bundle data for communication between the management cluster and the workload cluster.
SUPERVISOR_ISSUER_CA_BUNDLE_DATA_B64:

2. You can retrieve SUPERVISOR_ISSUER_URL and SUPERVISOR_ISSUER_CA_BUNDLE_DATA_B64 by
running kubectl get configmap pinniped-info -n kube-public -o yaml against the management
cluster.

# kubectl get configmap pinniped-info -n kube-public -o yaml


apiVersion: v1
data:
  cluster_name: tkg-mgmt-01
  concierge_is_cluster_scoped: "true"
  issuer: https://172.16.80.104
  issuer_ca_bundle_data: <base64-issuer-ca>
kind: ConfigMap
metadata:
  creationTimestamp: "2022-03-24T12:03:46Z"
  name: pinniped-info
  namespace: kube-public
  resourceVersion: "62756"
  uid: 7f399d41-ab1b-41f2-9cd1-1d5fc4ddf9e1

3. Create the cluster configuration file by providing the above details. The following is a sample
workload cluster configuration file with the LDAP configuration.

CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: workload-2
CLUSTER_PLAN: dev
ENABLE_CEIP_PARTICIPATION: 'true'
ENABLE_MHC: 'true'
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /tkgm-internet-dc1
VSPHERE_DATASTORE: vsanDatastore
VSPHERE_FOLDER: /tkgm-internet-dc1/vm/tkg-vsphere-workload
VSPHERE_NETWORK: /tkgm-internet-dc1/network/pg-tkg_workload
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /tkgm-internet-dc1/host/tkgm-internet-c1/Resources/tkg-vsphere-workload
VSPHERE_SERVER: vcenter.lab.vmw
VSPHERE_SSH_AUTHORIZED_KEY: <vsphere-ssh-key>

VSPHERE_USERNAME: [email protected]
WORKER_MACHINE_COUNT: 1
VSPHERE_INSECURE: 'true'
ENABLE_AUDIT_LOGGING: 'true'
ENABLE_DEFAULT_STORAGE_CLASS: 'true'
ENABLE_AUTOSCALER: false
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: 3
WORKER_SIZE: medium
CONTROLPLANE_SIZE: medium

IDENTITY_MANAGEMENT_TYPE: ldap

SUPERVISOR_ISSUER_URL: https://172.16.80.104

SUPERVISOR_ISSUER_CA_BUNDLE_DATA_B64: <Supervisor issuer CA from previous step>

4. Ensure that the local environment variable IDENTITY_MANAGEMENT_TYPE is set to either
oidc or ldap:

# export IDENTITY_MANAGEMENT_TYPE=ldap

# echo $IDENTITY_MANAGEMENT_TYPE
ldap

5. Set the _TKG_CLUSTER_FORCE_ROLE environment variable to workload:


export _TKG_CLUSTER_FORCE_ROLE="workload"

6. Set the FILTER_BY_ADDON_TYPE environment variable to authentication/pinniped:

export FILTER_BY_ADDON_TYPE="authentication/pinniped"

7. Generate a secret for the Pinniped add-on:

       tanzu cluster create workload-2 --dry-run -f workload-2.yaml > workload-2-example-secret.yaml

       # ls
       workload-2.yaml workload-2-example-secret.yaml

   The environment variable settings cause tanzu cluster create --dry-run to generate a
   Kubernetes secret, not a full cluster manifest.
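
   The generated file therefore contains a single Secret rather than the usual set of cluster
   objects. Its exact layout varies by Tanzu Kubernetes Grid version; the following is only a rough
   sketch of its shape, with all names, labels, and field values shown for illustration:

       apiVersion: v1
       kind: Secret
       metadata:
         # Name and namespace are derived from the workload cluster; illustrative only.
         name: workload-2-pinniped-addon
         namespace: default
         labels:
           tkg.tanzu.vmware.com/addon-name: pinniped
           tkg.tanzu.vmware.com/cluster-name: workload-2
       type: tkg.tanzu.vmware.com/addon
       stringData:
         # values.yaml carries the identity management settings (type, supervisor endpoint,
         # CA bundle) that the Pinniped add-on on the workload cluster consumes.
         values.yaml: |
           ...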

8. Review the secret and apply it to the management cluster. The Pinniped add-on secret is
always created or applied to the management cluster, even if you are configuring a workload
cluster.

   Set the context of kubectl to the management cluster. For example, with a
   management cluster named tkg-mgmt-01:

       # kubectl config use-context tkg-mgmt-01-admin@tkg-mgmt-01
       Switched to context "tkg-mgmt-01-admin@tkg-mgmt-01"

Apply the secret.

# kubectl apply -f workload-2-example-secret.yaml


secret/workload-2-pinniped-addon created

9. After applying the secret, check the status of the Pinniped add-on:

Set the context of kubectl to your workload cluster:

# kubectl config use-context workload-2-admin@workload-2


Switched to context "workload-2-admin@workload-2".

Run the kubectl get app command against the workload cluster:

       # kubectl get app pinniped -n tkg-system
       NAME       DESCRIPTION           SINCE-DEPLOY   AGE
       pinniped   Reconcile succeeded   53s            2m43s

10. If you plan to use regular, non-administrator kubeconfig files for cluster access, proceed with
    generating the kubeconfig and creating role-based access control. For more information,
    see Configure RBAC.

Configure RBAC
To give users access to a management or a workload cluster, you generate a kubeconfig file and
then share the file with those users. If you provide them with the administrator kubeconfig for the
cluster, they have full access to the cluster and do not need to be authenticated. However, if you
provide users with the regular kubeconfig, they must have a user account in your OIDC or LDAP
identity provider, and you must configure RBAC on the cluster to grant access permissions to the
designated user.

For more information on how to configure role-based access control (RBAC) in Tanzu Kubernetes
Grid, see Configure RBAC.

Generate and Test a Non-Administrator kubeconfig File for the Tanzu Clusters
This procedure allows you to test the login step of the authentication process if a browser is present
on the machine on which you are running tanzu and kubectl commands. If the machine does not
have a browser, see Authenticate Users on a Machine Without a Browser.

1. Export the regular kubeconfig for the management cluster to a local file, for example,
   /tmp/id_mgmt_test_kubeconfig. Note that the command does not include the --admin
   option, so the kubeconfig that is exported is the regular kubeconfig, not the admin version.

       # tanzu management-cluster kubeconfig get tkg-mgmt --export-file /tmp/id_mgmt_test_kubeconfig
       You can now access the cluster by specifying '--kubeconfig /tmp/id_mgmt_test_kubeconfig' flag when using `kubectl` command

2. Connect to the management cluster by using the newly created kubeconfig file:

# kubectl get pods -A --kubeconfig /tmp/id_mgmt_test_kubeconfig

The authentication process requires a browser to be present on the machine from which
users connect to clusters because running kubectl commands automatically opens the IdP
login page so that users can log in to the cluster. Your browser should open and display the
login page for your OIDC provider or an LDAPS login page.

LDAPS:

OIDC:


Enter the credentials of a user account that exists in your OIDC or LDAP server. After a
successful login, the browser displays a confirmation that you are authenticated and can
return to the CLI.

3. Go back to the terminal in which you run tanzu and kubectl commands:

If you already configured a role binding on the cluster for the authenticated user, the
output of kubectl get pods -A appears, displaying the pod information.

If you have not configured a role binding on the cluster, you see a message denying
the user account access to the pods:

       Error from server (Forbidden): pods is forbidden: User "[email protected]" cannot list resource "pods" in API group "" at the cluster scope.

This happens because the user has been successfully authenticated, but they are not yet
authorized to access any resources on the cluster. To authorize the user to access the cluster
resources, you must create a role binding on the management cluster as described in
Create a Role Binding on the Management Cluster.

Create a Role Binding on the Management Cluster


To give non-admin users access to a management cluster, you generate and distribute a kubeconfig
file as described in Generate and Test a Non-Administrator kubeconfig File for the Tanzu Clusters.


To make the kubeconfig work, you must first set up RBAC by creating a role binding on the
management cluster. This role binding assigns role-based permissions to individual authenticated
users or user groups. There are many roles with which you can associate users, but the most useful
roles are the following:

cluster-admin: Can perform any operation on the cluster.

admin: Read/write access to most resources in a namespace, including the ability to create
roles and role bindings within the namespace.

edit: Read/write access to most resources in a namespace, such as deployments, services,
and pods. Cannot view or modify roles or role bindings.

view: Read-only access to most resources in a namespace.

You can assign any of the roles to users. For more information about custom roles and role bindings,
see Using RBAC Authorization in the Kubernetes documentation.
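
If you prefer to manage access declaratively instead of using the imperative kubectl create
clusterrolebinding command shown in the steps below, a manifest along the following lines produces
an equivalent binding. The binding name and group name are illustrative; the group must match a
group as presented by your OIDC or LDAP identity provider:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: tkg-platform-viewers            # illustrative binding name
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: view                            # built-in read-only ClusterRole
    subjects:
    - apiGroup: rbac.authorization.k8s.io
      kind: Group
      name: platform-viewers                # illustrative group from your identity provider

Apply the manifest with kubectl apply -f against the cluster on which you want to grant access.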

1. Make sure that you are using the admin context of the management cluster:

kubectl config current-context

2. If the context is not the management cluster admin context, set kubectl to use that context.
For example:

kubectl config use-context tkg-mgmt-admin@tkg-mgmt

3. List your existing roles:

To see the full list of namespace-scoped roles, run:

kubectl get roles --all-namespaces

To see the full list of cluster-scoped roles, run:

kubectl get clusterroles

4. To associate a given user with a role, create a role binding.

A RoleBinding can reference a Role or ClusterRole.

A ClusterRoleBinding can reference only a ClusterRole.

5. The following example creates a cluster role binding named tkg-mgmt-rb that binds the
   cluster role cluster-admin to the user [email protected]:

       kubectl create clusterrolebinding tkg-mgmt-rb --clusterrole cluster-admin --user [email protected]

   For --user, specify the OIDC or LDAP username of the user. You configured the username
   attribute and other identity provider details in the Identity Management section of the Tanzu
   Kubernetes Grid installer interface or by setting the LDAP_* or OIDC_* variables:

       OIDC: The username attribute is set in the Username Claim field under OIDC
       Identity Management Source in the installer interface or the
       OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM configuration variable.

       LDAPS: The username attribute is set in the Username field under LDAPS Identity
       Management Source -> User Search Attributes in the installer interface or the
       LDAP_USER_SEARCH_USERNAME configuration variable.

   For example, for OIDC, the username is often the email address of the user. For
   LDAPS, it is the LDAP username, not the email address.

6. Attempt to connect to the management cluster again by using the kubeconfig file
   that you created in the previous procedure:

       kubectl get pods -A --kubeconfig /tmp/id_mgmt_test_kubeconfig

This time, because the user is bound to the cluster-admin role on this management cluster, the list of
pods should be displayed. You can share the generated kubeconfig file with any users for whom you
configure role bindings on the management cluster.
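
Before sharing a kubeconfig, you can also verify what the authenticated user is allowed to do by
running kubectl auth can-i with the non-admin kubeconfig; for example:

    # kubectl auth can-i list pods --all-namespaces --kubeconfig /tmp/id_mgmt_test_kubeconfig
    yes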

Authenticate Users on a Machine Without a Browser


If your bootstrap machine is a jumpbox or other machine with no display, you can authenticate to a
cluster from a browser running on your local machine. How you do this depends on the cluster’s
Pinniped version, which comes from the Tanzu Kubernetes release that the cluster is based on.

For clusters based on older TKrs or created by older versions of Tanzu Kubernetes Grid: Follow the
Authenticate Users on a Machine Without a Browser procedure in the Tanzu Kubernetes Grid v1.4
documentation.

For clusters based on TKr v1.22.5 (default for Tanzu Kubernetes Grid v1.5) or later, do the following:

1. From a terminal window on your local machine, run ssh to remotely log in to your bootstrap
machine.

2. Set the TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER=true environment variable. This adds
   the --skip-browser option to the kubeconfig for the cluster.

       export TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER=true

3. Export the regular kubeconfig for the cluster to a local file. Note that the command does not
   include the --admin option, so the kubeconfig that is exported is the regular kubeconfig, not
   the admin version.

       ## For a management cluster, run:
       # tanzu management-cluster kubeconfig get tkg-mgmt --export-file /tmp/mgmt-kubeconfig
       You can now access the cluster by specifying '--kubeconfig /tmp/mgmt-kubeconfig' flag when using `kubectl` command

       ## For a workload cluster, run:
       # tanzu cluster kubeconfig get workload-2 --export-file /tmp/workload-kubeconfig
       You can now access the cluster by specifying '--kubeconfig /tmp/workload-kubeconfig' flag when using `kubectl` command


4. Connect to the cluster by using the newly created kubeconfig file. The CLI outputs a login
   link for your identity provider.

       # kubectl get pods -A --kubeconfig /tmp/mgmt-kubeconfig
       Log in by visiting this link:
       https://172.16.80.104/oauth2/authorize?access_type=offline&client_id=pinniped-cli&code_challenge=Was7DEO7GQL4tSOrQ9qH4D31stBngVkLgVcwS6SYFQg&code_challenge_method=S256&nonce=b4a1d3cc079c803387f4ad87b91e551d&redirect_uri=http%3A%2F%2F127.0.0.1%3A41723%2Fcallback&response_mode=form_post&response_type=code&scope=offline_access+openid+pinniped%3Arequest-audience&state=266ffa2a3852bf523609d625f98b0588

       Optionally, paste your authorization code:

5. Copy the link and paste it into a browser on your local machine and log in to your identity
provider.

6. A page appears prompting you to paste an authorization code into the CLI to finish your login.

7. Copy the authorization code and paste it into the CLI, after the Optionally, paste your
   authorization code: prompt.

8. Connect to the cluster again by using the same kubeconfig file as you used previously:

       # kubectl get pods -A --kubeconfig /tmp/mgmt-kubeconfig
       Error from server (Forbidden): pods is forbidden: User "[email protected]" cannot list resource "pods" in API group "" at the cluster scope

9. If you already configured a role binding on the cluster for the authenticated user, the output
shows the pod information.

10. If you have not configured a role binding on the cluster, you see a message denying the
    user account access to the pods: Error from server (Forbidden): pods is forbidden: User
    "[email protected]" cannot list resource "pods" in API group "" at the cluster scope. This
    happens because the user has been successfully authenticated, but they are not yet
    authorized to access any resources on the cluster.

11. To authorize the user to access the cluster resources, you must configure RBAC on the
cluster by creating a cluster role binding. For more information, see Configure RBAC.

VMware, Inc 474


VMware Tanzu for Kubernetes Operations Reference Architecture 2.3

Enable Data Protection on a Workload Cluster and Configure Backup

You can run backup and restore operations in Tanzu Mission Control to protect your Kubernetes
data.

Prerequisites
Before you enable Data Protection on a workload cluster, ensure that the following prerequisites are met:

You have an active Tanzu Mission Control subscription.

The workload cluster that you want to protect is registered or attached with Tanzu Mission
Control.

You have created a credential for Data Protection as per instructions provided in the Tanzu
Mission Control documentation.

You have created a Target Location for Data Protection as per instructions provided in the
Tanzu Mission Control documentation.

For more information about protecting the data resources in your Kubernetes clusters, see Data
Protection in VMware Tanzu Mission Control Concepts.

Enable Data Protection on Workload Cluster


To enable data protection on a workload cluster,

1. Locate the cluster in the Tanzu Mission Control portal and click on the Overview tab.

2. In the Data protection section, click Enable Data Protection.


3. Click Enable on the confirmation dialog.

It takes approximately 5-10 minutes to enable data protection on a Kubernetes cluster. Tanzu Mission
Control creates a namespace named velero and installs Velero-related Kubernetes objects in the
workload cluster.

    root@arcas [ ~ ]# kubectl get all -n velero
    NAME                          READY   STATUS    RESTARTS   AGE
    pod/restic-nfbpl              1/1     Running   0          44s
    pod/restic-q57nk              1/1     Running   0          44s
    pod/restic-sj954              1/1     Running   0          44s
    pod/velero-57cdf5f99f-7fn4b   1/1     Running   0          71s

    NAME                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    daemonset.apps/restic   3         3         3       3            3           <none>          45s

    NAME                     READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/velero   1/1     1            1           71s

    NAME                                DESIRED   CURRENT   READY   AGE
    replicaset.apps/velero-57cdf5f99f   1         1         1       72s

Configure Backup
After enabling data protection,

1. In the Data protection section, click Create Backup to configure backup for the workload
cluster.

Tanzu Mission Control Data Protection allows you to create backups of the following types:

All resources in a cluster.

Selected namespaces in a cluster.

Specific resources in a cluster identified by a given label.


2. Select the target location where the backup will be stored.

3. Configure the backup schedule and click Next.


4. Specify the backup retention period and click Next.


5. Specify a name for the backup schedule and click Create.

Backup configuration may take some time depending on the Kubernetes objects that you have
provisioned in the workload cluster. When backup is configured for the first time, Tanzu Mission
Control takes a backup immediately. After that, backups are taken as per the configured backup
schedule.
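
Because Tanzu Mission Control Data Protection uses Velero under the covers, you can optionally
confirm from the workload cluster that the schedule and the first backup were created. The exact
object names vary per environment; a sketch of the check:

    # List the backup schedule objects created by Tanzu Mission Control.
    kubectl get schedules.velero.io -n velero

    # List the individual backups taken so far.
    kubectl get backups.velero.io -n velero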

Restore Backup
To restore the Kubernetes data from the backup,

1. Go to Data Protection.

2. Select the backup image and click Restore.

3. Select the resources that you want to restore.


4. Specify a name for the restore task and click Restore.

If you have backed up persistent volumes, the restore process may take some time. The backup is
restored to the same cluster from which it was taken.
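
As with backups, the restore operation is carried out by Velero, so you can optionally follow its
progress from the workload cluster:

    # Watch the Velero restore objects until the relevant one reports a Completed phase.
    kubectl get restores.velero.io -n velero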


VMware Tanzu for Kubernetes Operations Network Port Diagram - Reference Sheet

Kubernetes is a platform that provides development teams with a single API to deploy, manage, and
run applications. However, running, maintaining, and securing Kubernetes is a complex task.
VMware Tanzu for Kubernetes Operations (informally known as TKO) simplifies Kubernetes
operations. It determines what base OS instances to use, which Kubernetes Container Network
Interface (CNI) and Container Storage Interfaces (CSI) to use, how to secure the Kubernetes API,
and so on. It monitors, upgrades, and backs up clusters and helps teams provision, manage, secure,
and maintain Kubernetes clusters on a day-to-day basis.

The following reference table summarizes the high-level network port requirements for deploying the
components available with Tanzu for Kubernetes Operations as a solution.

Reference for Port Diagram


Product | Source | Destination | Ports | Protocols | Purpose
NSX Advanced Load Balancer | Avi Controller | Syslog | 514 | TCP | Log export
NSX Advanced Load Balancer | Avi Controller | Domain Name Server | 53 | UDP | DNS requests
NSX Advanced Load Balancer | Avi Controller | NTP Server | 123 | UDP | Time synchronization
NSX Advanced Load Balancer | Avi Controller | SecureLDAP Server | 636 | TCP | Authentication
NSX Advanced Load Balancer | Avi Controller | LDAP Server | 389 | UDP | Authentication
NSX Advanced Load Balancer | Management Client | Avi Controller | 22 | TCP | Secure shell login
NSX Advanced Load Balancer | Management Client | Avi Controller | 443 | TCP | NSX ALB UI/REST API
NSX Advanced Load Balancer | Management Client | Avi Controller | 80 | TCP | NSX ALB UI
NSX Advanced Load Balancer | Avi Controller | ESXi Host | 443 | TCP | Management access for Service Engine creation
NSX Advanced Load Balancer | Avi Controller | vCenter Server | 443 | TCP | APIs for vCenter integration
NSX Advanced Load Balancer | Avi Controller | NSX Manager | 443 | TCP | For NSX Cloud creation
NSX Advanced Load Balancer | Avi Service Engine | Avi Controller | 123 | UDP | Time sync
NSX Advanced Load Balancer | Avi Service Engine | Avi Controller | 8443 | TCP | Secure channel key exchange
NSX Advanced Load Balancer | Avi Service Engine | Avi Controller | 22 | TCP | Secure channel SSH
NSX Advanced Load Balancer | Avi Service Engine | Avi Service Engine | 9001 | TCP | Inter-SE distributed object store for vCenter/NSX-T/No Orchestrator/Linux server clouds
NSX Advanced Load Balancer | Avi Service Engine | Avi Service Engine | 4001 | TCP | Inter-SE distributed object store for AWS/Azure/GCP/OpenStack clouds
Tanzu Kubernetes Grid | Bootstrap Machine | TKG Cluster Kubernetes API Server | 6443 | TCP | Kubernetes cluster API access
Tanzu Kubernetes Grid | Bootstrap Machine | NodePort Services | 30000-32767 | TCP | External access to hosted services via L7 ingress in NodePort mode
Tanzu Kubernetes Grid | Bootstrap Machine | NodePortLocal Services | 61000-62000 (default) | TCP | External access to hosted services via L7 ingress in NodePortLocal mode
Tanzu Kubernetes Grid | Bootstrap Machine | vCenter Server | 443 | TCP | vCenter Server UI access
Tanzu Kubernetes Grid | TKG Workload Cluster CIDR | TKG Management Cluster CIDR | 31234 | TCP | Allows Pinniped concierge on the workload cluster to access the Pinniped supervisor on the management cluster
Tanzu Kubernetes Grid | TKG Workload Cluster CIDR | TKG Management Cluster CIDR | 31234 | TCP | To register the workload cluster with the management cluster
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Avi Controller | 443 | TCP | Allows Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the Avi Controller
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | vCenter Server | 443 | TCP | Allows components to access vCenter to create VMs and storage volumes
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | DNS Server | 53 | UDP | Allows components to look up machine addresses
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | NTP Server | 123 | UDP | Allows components to sync current time
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Harbor | 443 | TCP | Allows components to retrieve container images
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | DHCP Server | 67, 68 | TCP | Allows nodes to get DHCP addresses
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Tanzu Mission Control | 443 | TCP | To manage Tanzu Kubernetes clusters with Tanzu Mission Control (TMC)
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Tanzu Service Mesh | 443 | TCP | To provide service mesh services to Tanzu Kubernetes clusters with Tanzu Service Mesh (TSM)
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Tanzu Observability | 443 | TCP | To monitor Tanzu Kubernetes clusters with Tanzu Observability (TO)
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | vRealize Log Insight | 514 | UDP | To configure remote logging with Fluent Bit
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | vRealize Log Insight Cloud | 443 | TCP | To configure remote logging with Fluent Bit

ClusterClass Overview

ClusterClass in the Kubernetes Cluster API project allows you to define the shape of your clusters
once and reuse that definition multiple times. A ClusterClass consists of a collection of templates that
define the topology and configuration of a Kubernetes cluster. The templates can be used to create
new clusters or to update existing clusters.

ClusterClass simplifies the process of creating and managing multiple Kubernetes clusters and makes
your clusters more consistent and reliable.

The ClusterClass CRD contains the following components:

    ControlPlane: This includes the reference to the VSphereMachineTemplate used when
    creating the machines for the cluster's control plane, and the KubeadmControlPlaneTemplate
    containing the KubeadmConfigSpec for initializing the control plane machines.

    Workers: This includes the reference to the VSphereMachineTemplate used when creating
    the machines for the cluster's worker machines and the KubeadmConfigTemplate containing
    the KubeadmConfigSpec for initializing and joining the worker machines to the control plane.

    Infrastructure: This includes the reference to the VSphereClusterTemplate that contains the
    vCenter details (vCenter Server endpoint, SSL thumbprint, and so on) used when creating the
    cluster.

    Variables: A list of variable definitions, where each variable is defined using the OpenAPI
    Schema definition.

    Patches: A list of patches used to change the above-mentioned templates for each specific
    cluster. Variable definitions defined in the Variables section can also be used in the patches
    section.
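
To make these pieces concrete, the following is a trimmed-down sketch of what a vSphere
ClusterClass can look like. All object names and the patch path are illustrative; the classes shipped
with Tanzu Kubernetes Grid contain many more variables and patches.

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: ClusterClass
    metadata:
      name: tkg-vsphere-example            # illustrative name
    spec:
      controlPlane:
        ref:
          apiVersion: controlplane.cluster.x-k8s.io/v1beta1
          kind: KubeadmControlPlaneTemplate
          name: tkg-vsphere-example-control-plane
        machineInfrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: VSphereMachineTemplate
            name: tkg-vsphere-example-control-plane-machine
      infrastructure:
        ref:
          apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
          kind: VSphereClusterTemplate
          name: tkg-vsphere-example-cluster
      workers:
        machineDeployments:
        - class: tkg-worker                # referenced by Cluster.spec.topology.workers
          template:
            bootstrap:
              ref:
                apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
                kind: KubeadmConfigTemplate
                name: tkg-vsphere-example-worker-bootstrap
            infrastructure:
              ref:
                apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
                kind: VSphereMachineTemplate
                name: tkg-vsphere-example-worker-machine
      variables:
      - name: vcenterServer                # consumed by the patch below
        required: true
        schema:
          openAPIV3Schema:
            type: string
      patches:
      - name: vcenter-server
        definitions:
        - selector:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: VSphereClusterTemplate
            matchResources:
              infrastructureCluster: true
          jsonPatches:
          - op: replace
            path: /spec/template/spec/server   # illustrative patch target
            valueFrom:
              variable: vcenterServer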

Benefits of using ClusterClass:

    Simplified cluster creation: ClusterClass templates can be used to create new clusters with a
    single command, saving time and effort.

    Consistent clusters: All clusters that are created from the same ClusterClass have the same
    topology and configuration. This helps ensure that your clusters are reliable and predictable.

    Extensible cluster templates: ClusterClass templates can be customized to create clusters
    that meet the specific needs of your applications.

    Managed clusters: ClusterClass can be used to manage the lifecycle of your clusters. This
    helps you automate the process of creating, updating, and deleting clusters.

Cluster
The Cluster CRD is used to create and delete Kubernetes clusters and to manage a cluster's
configuration and state. For example, you can use the Cluster object to update the Kubernetes
version, the network configuration, or the number of nodes in the cluster.

Configuration of the Cluster topology

The configuration of the cluster topology contains the following options:

    A reference to the ClusterClass CRD.

    The attributes that govern the cluster's control plane. These attributes include parameters
    such as the replica count, along with provisions for overriding or appending values to the
    control plane metadata, nodeDrainTimeout, and the control plane's MachineHealthCheck.

    A list of machine deployments slated for creation, with each deployment uniquely
    characterized by:
        The reference to the MachineDeployment class, which defines the templates to be
        used for this specific MachineDeployment.

        The number of replicas designated for this MachineDeployment, along with other
        parameters such as the node deployment strategy, machineHealthCheck, and
        nodeDrainTimeout values.

    Specification of the intended Kubernetes version for the cluster, encompassing both the
    control plane and worker nodes.

    The cluster topology and MachineDeployments can also be customized using a set of
    variables through patches, as defined in the ClusterClass CRD. A class-based Cluster
    manifest that exercises these options is sketched after this list.
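
The following sketch shows a class-based Cluster manifest that uses these options. The class name,
Kubernetes version, and variable are illustrative and must match what the referenced ClusterClass
defines:

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      name: workload-3                     # illustrative cluster name
      namespace: default
    spec:
      clusterNetwork:
        pods:
          cidrBlocks: ["100.96.0.0/11"]
        services:
          cidrBlocks: ["100.64.0.0/13"]
      topology:
        class: tkg-vsphere-example         # must match an existing ClusterClass
        version: v1.26.5+vmware.2          # illustrative Kubernetes/TKr version
        controlPlane:
          replicas: 1
        workers:
          machineDeployments:
          - class: tkg-worker              # MachineDeployment class from the ClusterClass
            name: md-0
            replicas: 2
        variables:
        - name: vcenterServer              # variable defined in the ClusterClass
          value: vcenter.lab.vmw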

ClusterClass and Cluster CRD use cases in TKGm:

    Private Image Repo Configuration
        For New Clusters
        For Existing Clusters

    Node Resizing/Vertical Scaling (CPU, Memory)

    Creating Clusters using Custom ClusterClass
