VMware Tanzu for Kubernetes Operations Reference Architecture 2.3
You can find the most up-to-date technical documentation on the VMware website at:
https://docs.vmware.com/
VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304
www.vmware.com
Copyright © 2023 VMware, Inc. All rights reserved. Copyright and trademark information.
Contents
Deploy Service Engine VMs for Tanzu Kubernetes Grid Management Cluster
Deploy Service Engines for Tanzu Kubernetes Grid Workload Cluster
Deploy and Configure Tanzu Kubernetes Grid
Deploy and Configure Bootstrap Machine
Import Base Image Template for Tanzu Kubernetes Grid Cluster Deployment
Deploy Tanzu Kubernetes Grid Management Cluster
Register Management Cluster with Tanzu Mission Control
Create AKO Deployment Config for Tanzu Kubernetes Grid Workload Cluster
Configure AKO Deployment Config (ADC) for Shared Services Cluster
Configure AKO Deployment Config (ADC) for Workload Cluster to Enable NSX Advanced Load Balancer L7 Ingress with NodePortLocal Mode
Deploy Tanzu Kubernetes Grid Shared Services Cluster
Deploy Tanzu Kubernetes Clusters (Workload Clusters)
Integrate Tanzu Kubernetes Clusters with Tanzu Observability
Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh
Deploy User-Managed Packages on Tanzu Kubernetes Clusters
Summary
Deployment Instructions
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design
vSphere with Tanzu Components
vSphere with Tanzu Architecture
Supported Component Matrix
vSphere with Tanzu Storage
Tanzu Kubernetes Clusters Networking
Networking for vSphere with Tanzu
vSphere with Tanzu on vSphere Networking with NSX Advanced Load Balancer
NSX Advanced Load Balancer Components
Network Architecture
Subnet and CIDR Examples
Firewall Requirements
Deployment Options
Single-Zone Deployment of Supervisor
Three-Zone Deployment of Supervisor
Installation Experience
Design Recommendations
NSX Advanced Load Balancer Recommendations
Network Recommendations
Recommendations for Supervisor Clusters
Recommendations for Tanzu Kubernetes Clusters
Kubernetes Ingress Routing
NSX Advanced Load Balancer Sizing Guidelines
NSX Advanced Load Balancer Controller Configuration
Service Engine Sizing Guidelines
Container Registry
vSphere with Tanzu SaaS Integration
Custom Tanzu Observability Dashboards
Summary
Deploy VMware Tanzu for Kubernetes Operations using vSphere with Tanzu
Deploying with VMware Service Installer for Tanzu
Prerequisites
General Requirements
Network Requirements
Firewall Requirements
Resource Pools
Deployment Overview
Deploy and Configure NSX Advanced Load Balancer
Deploy NSX Advanced Load Balancer Controller Node
Configure the Controller Node for your vSphere with Tanzu Environment
Configure Default-Cloud
Configure Licensing
Configure NTP Settings
Deploy NSX Advanced Load Balancer Controller Cluster
Change NSX Advanced Load Balancer Portal Default Certificate
Export NSX Advanced Load Balancer Certificate
Configure a Service Engine Group
Configure a Virtual IP Subnet for the Data Network
Configure Default Gateway
Configure IPAM and DNS Profile
Deploy Tanzu Kubernetes Grid Supervisor Cluster
Download and Install the Kubernetes CLI Tools for vSphere
Connect to the Supervisor Cluster
Create and Configure vSphere Namespaces
Configure Permissions for the Namespace
Set Persistent Storage to the Namespace
Specify Namespace Capacity Limits
Associate VM Class with Namespace
Register Supervisor Cluster with Tanzu Mission Control
Deploy Tanzu Kubernetes Clusters (Workload Cluster)
Integrate Tanzu Kubernetes Clusters with Tanzu Observability
Integrate Tanzu Kubernetes Clusters with Tanzu Service Mesh
Deploy User-Managed Packages on Tanzu Kubernetes Clusters
Self-Service Namespace in vSphere with Tanzu
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu on NSX Reference Design
Supported Component Matrix
vSphere with Tanzu Components
Identity and Access Management
Roles and Permissions
vSphere with Tanzu Architecture
vSphere with Tanzu Storage
Networking for vSphere with Tanzu
Network Requirements
Firewall Recommendations
Network Segmentation
Deployment Options
Single-Zone Deployment of Supervisor
Three-Zone Deployment of Supervisor
Installation Experience
vSphere Namespaces
Tanzu Kubernetes Grid Cluster APIs
Tanzu Kubernetes Clusters Networking
Kubernetes Ingress Routing
Container Registry
Scale a Tanzu Kubernetes Grid Cluster
Backup and Restore
vSphere with Tanzu SaaS Integration
Custom Tanzu Observability Dashboards
Summary
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Multi-AZ Reference Architecture on VDS Networking
Supported Component Matrix
vSphere with Tanzu Components
Identity and Access Management
vSphere with Tanzu Architecture for a Multi-Zone Deployment
Recommendations for Using Namespaces in vSphere with Tanzu
vSphere with Tanzu Storage
Networking for vSphere with Tanzu
vSphere with Tanzu on VDS Networking with NSX Advanced Load Balancer
NSX Advanced Load Balancer Components
Network Architecture
Networking Prerequisites
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Multi-AZ Reference Architecture on NSX Networking
Supported Component Matrix
vSphere with Tanzu Components
Identity and Access Management
Roles and Permissions
vSphere with Tanzu Architecture
vSphere with Tanzu Storage
Networking for vSphere with Tanzu
Networking Prerequisites
Network Requirements
Firewall Recommendations
Installation Experience
vSphere Namespaces
Tanzu Kubernetes Grid Workload Clusters
Tanzu Kubernetes Grid Cluster APIs
Tanzu Kubernetes Clusters Networking
Kubernetes Ingress Routing
Container Registry
Scale a Tanzu Kubernetes Grid Cluster
Backup and Restore
Appendix A - Deploy TKG Cluster
Appendix B - Deploy StatefulSet Application to vSphere Zones
Kubernetes is a great platform that provides development teams with a single API to deploy,
manage, and run applications. However, running, maintaining, and securing Kubernetes is a
complex task. VMware Tanzu for Kubernetes Operations (informally known as TKO) simplifies
Kubernetes operations. It determines what base OS instances to use, which Kubernetes Container
Network Interface (CNI) and Container Storage Interface (CSI) to use, how to secure the Kubernetes
API, and much more. It monitors, upgrades, and backs up clusters and helps teams provision,
manage, secure, and maintain Kubernetes clusters on a day-to-day basis.
Note
This reference architecture is tested to work with Tanzu Kubernetes Grid 2.3. It will be refreshed as new features and capabilities are introduced in subsequent Tanzu Kubernetes Grid releases.
The following diagram provides a high-level reference architecture for deploying the components
available with Tanzu for Kubernetes Operations as a solution.
The reference architecture documentation provides several reference designs and the instructions
for deploying the reference designs. The reference designs are based on the high-level reference
architecture and they are tailored for deploying Tanzu for Kubernetes Operations on your IaaS or
infrastructure of choice.
The reference architecture and the reference designs are tested and supported by VMware.
Components
The following components are used in the reference architecture:
VMware Tanzu Kubernetes Grid - Enables creation and lifecycle management operations of
Kubernetes clusters.
vSphere with Tanzu - Transforms vSphere into a platform for running Kubernetes workloads
natively on the hypervisor layer. When enabled on a vSphere cluster, vSphere with Tanzu provides
the capability to run Kubernetes workloads directly on ESXi hosts and to create upstream
Kubernetes clusters within dedicated resource pools.
VMware Tanzu Mission Control - Provides a global view of Kubernetes clusters and allows for
centralized policy management across all deployed and attached clusters.
VMware Tanzu Service Mesh - Provides consistent control and security for microservices, end
users, and data, across all your clusters and clouds.
VMware NSX Advanced Load Balancer Enterprise Edition - Provides layer 4 service type load
balancer support. NSX Advanced Load Balancer is recommended for vSphere deployments without
NSX-T, or which have unique scale requirements.
User-managed packages - Provides in-cluster and shared services to the Kubernetes clusters that
are running in your Tanzu Kubernetes Grid environment.
ExternalDNS - Publishes DNS records for applications to DNS servers. It uses a declarative
Kubernetes-native interface.
Fluent Bit - Collects data and logs from different sources, unifies them, and sends them to
multiple destinations. Tanzu Kubernetes Grid includes signed binaries for Fluent Bit.
Grafana - Provides monitoring dashboards for displaying key health metrics of Kubernetes
clusters. Tanzu Kubernetes Grid includes an implementation of Grafana.
Harbor Image Registry - Provides a centralized location to push, pull, store, and scan
container images used in Kubernetes workloads. It supports storing artifacts such as Helm
charts and includes enterprise-grade features such as RBAC, retention policies, automated
garbage cleanup, and Docker Hub proxying.
Multus CNI - Enables attaching multiple network interfaces to pods. Multus CNI is a container
network interface (CNI) plugin for Kubernetes that lets you attach multiple network interfaces
to a single pod and associate each with a different address range.
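As an illustration of how workloads consume Multus, an additional interface is typically described by a NetworkAttachmentDefinition and referenced from the pod. The sketch below is hypothetical; the macvlan delegate, master interface, and subnet are placeholders and are not part of this reference architecture.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: secondary-net          # hypothetical network name
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth1",
      "ipam": { "type": "host-local", "subnet": "10.10.20.0/24" }
    }
A pod then requests the extra interface with the annotation k8s.v1.cni.cncf.io/networks: secondary-net.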
The following documentation lays out the reference designs for deploying Tanzu for Kubernetes
Operations (informally known as TKO) on a cloud. You can deploy Tanzu for Kubernetes Operations
on VMware Cloud on AWS, directly on AWS, or on Microsoft Azure.
VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS Reference Design
Deploy Tanzu for Kubernetes Operations on VMware Cloud on AWS
This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations
on VMware Cloud on AWS.
Note
The scope of this document is limited to Tanzu Kubernetes Grid (multi-cloud), which
is a customer-managed solution.
The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.
For the latest information about which software versions can be used together, see the VMware
Interoperability Matrix.
1. Cloud Migrations
3. Disaster Recovery
By running VMware Tanzu within the same infrastructure as the general VM workloads enabled by
the first three use cases, organizations can start their next generation application modernization
strategy immediately without incurring additional cost. For example, SDDC spare capacity can be
used to run Tanzu Kubernetes Grid to enable next generation application modernization, or compute
capacity not used by disaster recovery can be used for Tanzu Kubernetes Grid clusters.
The following additional benefits are enabled by the Elastic Network Interface that connects the
VMware Cloud on AWS SDDC to the AWS services within the Amazon VPC:
Enable developers to modernize existing enterprise apps with AWS cloud capabilities and
services.
Integrate modern application tools and frameworks to develop next generation applications.
Remove egress charges as all traffic is internal to the Amazon availability zone.
Management Cluster - A management cluster is the first element that you deploy when you
create a Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster
that performs the role of the primary management and operational center for the Tanzu
Kubernetes Grid instance. The management cluster is purpose-built for operating the
platform and managing the lifecycle of Tanzu Kubernetes clusters.
Tanzu Kubernetes Cluster - Tanzu Kubernetes clusters are the Kubernetes clusters in which
your application workloads run. These clusters are also referred to as workload clusters.
Tanzu Kubernetes clusters can run different versions of Kubernetes, depending on the
needs of the applications they run.
Shared Services Cluster - Each Tanzu Kubernetes Grid instance can have only one shared
services cluster. You deploy this cluster only if you intend to deploy shared services such as
Contour and Harbor.
ClusterClass API - Tanzu Kubernetes Grid 2 functions through the creation of a management
Kubernetes cluster which holds ClusterClass API. The ClusterClass API then interacts with the
infrastructure provider to service workload Kubernetes cluster lifecycle requests. The earlier
primitives of Tanzu Kubernetes Clusters will still exist for Tanzu Kubernetes Grid 1.X. A new
feature has been introduced as a part of Cluster API called ClusterClass which reduces the
need for redundant templating and enables powerful customization of clusters. The whole
process for creating a cluster using ClusterClass is the same as before but with slightly
different parameters.
Tanzu Kubernetes Cluster Plans - A cluster plan is a blueprint that describes the
configuration with which to deploy a Tanzu Kubernetes cluster. It provides a set of
configurable values that describe settings like the number of control plane machines, worker
machines, and so on.
The current release of Tanzu Kubernetes Grid provides two default templates, dev and prod.
Tanzu Kubernetes Grid Instance - A Tanzu Kubernetes Grid instance is the full deployment
of Tanzu Kubernetes Grid, including the management cluster, the workload clusters, and the
shared services cluster that you configure.
Tanzu CLI - A command-line utility that provides the necessary commands to build and
operate Tanzu management and Tanzu Kubernetes clusters.
ytt - A command-line tool for templating and patching YAML files. You can also use
ytt to collect fragments and piles of YAML into modular chunks for reuse.
kapp - The application deployment CLI for Kubernetes. It allows you to install,
upgrade, and delete multiple Kubernetes resources as one application.
imgpkg - A tool that enables Kubernetes to store configurations and the associated
container images as OCI images, and to transfer these images.
Bootstrap Machine - The bootstrap machine is the laptop, host, or server on which you
download and run the Tanzu CLI. This is where the initial bootstrapping of a management
cluster occurs before it is pushed to the platform where it will run.
Tanzu Kubernetes Grid Installer - The Tanzu Kubernetes Grid installer is a graphical wizard
that you launch by running the tanzu management-cluster create --ui command. The
installer wizard runs locally on the bootstrap machine and provides a user interface to guide
you through the process of deploying a management cluster.
For Kubernetes stateful workloads, Tanzu Kubernetes Grid installs the vSphere Container Storage
Interface (vSphere CSI) to provision Kubernetes persistent volumes for pods automatically. While the
default vSAN storage policy can be used, site reliability engineers (SREs) and administrators should
evaluate the needs of their applications and craft a specific vSphere Storage Policy. vSAN storage
policies describe classes of storage such as SSD and NVMe, as well as cluster quotas.
In vSphere 7u1+ environments with vSAN, the vSphere CSI driver for Kubernetes also supports
creating NFS File Volumes, which support ReadWriteMany access modes. This allows for
provisioning volumes which can be read and written from multiple pods simultaneously. To support
this, the vSAN File Service must be enabled.
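For example, a workload could request such a file volume with a ReadWriteMany persistent volume claim similar to the following sketch; the storage class name vsan-file-sc is an assumption and must match a storage class in your environment that is backed by vSAN File Service.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany              # provisioned as an NFS file volume by the vSphere CSI driver
  storageClassName: vsan-file-sc # assumed storage class backed by vSAN File Service
  resources:
    requests:
      storage: 10Gi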
You can also use other types of vSphere datastores. There are Tanzu Kubernetes Grid Cluster Plans
that operators can define to use a certain vSphere datastore when creating new workload clusters.
All developers would then have the ability to provision container-backed persistent volumes from
these datastores.
Decision ID: TKO-STG-001
Design Decision: Use vSAN storage for TKO.
Design Justification: VMC on AWS comes with default vSAN storage.
Design Implications: NA
While the default vSAN storage policy can be used, administrators should evaluate the needs of their
applications and craft a specific vSphere Storage Policy.
Starting with vSphere 7.0 environments with vSAN, the vSphere CSI driver for Kubernetes also
supports the creation of NFS File Volumes, which support ReadWriteMany access modes. This allows
for provisioning volumes, which can be read and written from multiple pods simultaneously. To
support this, you must enable vSAN File Service.
Tanzu Kubernetes Grid supports two CNI options for workload clusters:
Antrea
Calico
Both are open-source software that provide networking for cluster pods, services, and ingress.
When you deploy a Tanzu Kubernetes cluster using Tanzu Mission Control or the Tanzu CLI, Antrea CNI
is automatically enabled in the cluster. To provision a Tanzu Kubernetes cluster using a non-default
CNI, refer to the Tanzu Kubernetes Grid documentation.
Each CNI is suitable for a different use case. The following table lists some common use cases for the
three CNIs that Tanzu Kubernetes Grid supports. This table helps you select the most appropriate
CNI for your Tanzu Kubernetes Grid implementation.
Antrea
Use Case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPsec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Antrea leverages Open vSwitch as the networking data plane; Open vSwitch supports both Linux and Windows. VMware supports the latest conformant Kubernetes and stable releases of Antrea.

Multus
Use Case: Multus CNI can provide multiple interfaces per Kubernetes pod. Using Multus CRDs, you can specify which pods get which interfaces and allow different interfaces depending on the use case.
Pros: Separation of data/control planes. Separate security policies can be used for separate interfaces. Supports SR-IOV, DPDK, OVS-DPDK, and VPP workloads in Kubernetes with both cloud-native and NFV-based applications.
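If you choose a non-default CNI, the selection happens at cluster creation time through the cluster configuration file. The two-line sketch below is illustrative; the cluster name is a placeholder, and CNI accepts antrea (the default), calico, or none.
CLUSTER_NAME: tkg-workload-01   # illustrative cluster name
CNI: calico                     # antrea (default), calico, or none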
Note
The scope of this document is limited to VMware NSX-T Data Center Networking
with NSX Advanced Load Balancer.
You can configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid as:
A load balancer for workloads in the clusters that are deployed on vSphere.
The L7 ingress service provider for the workloads in the clusters that are deployed on
vSphere.
The VIP endpoint provider for the control plane API server.
Each workload cluster integrates with NSX Advanced Load Balancer by running an Avi Kubernetes
Operator (AKO) on one of its nodes. The cluster’s AKO calls the Kubernetes API to manage the
lifecycle of load balancing and ingress resources for its workloads.
NSX Advanced Load Balancer service engines must be deployed before load balancing services can
be requested by Kubernetes.
The following are the core components of NSX Advanced Load Balancer:
NSX Advanced Load Balancer Controller - NSX Advanced Load Balancer Controller
manages Virtual Service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the service engines (SEs). It is the central repository for the
configurations and policies related to services and management and provides the portal for
viewing the health of virtual services and SEs and the associated analytics provided by NSX
Advanced Load Balancer.
NSX Advanced Load Balancer Service Engine - The service engines (SEs) are lightweight
VMs that handle all data plane operations by receiving and executing instructions from the
controller. The SEs perform load balancing and all client- and server-facing network
interactions.
Avi Kubernetes Operator (AKO) - An Avi Kubernetes operator runs as a pod in the
management cluster and Tanzu Kubernetes clusters and provides ingress and load balancing
functionality. AKO translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses, routes, and services on the
service engines (SE) through the NSX Advanced Load Balancer controller.
AKO Operator (AKOO) - The AKO operator takes care of deploying, managing, and
removing AKO from Kubernetes clusters. When deployed, this operator creates an instance
of the AKO controller and installs all the relevant objects, including:
AKO StatefulSet
Tanzu Kubernetes Grid management clusters have the AKO operator installed out-of-the-box during
cluster deployment. By default, a Tanzu Kubernetes Grid management cluster has two
AkoDeploymentConfig objects created, which dictate when and how AKO pods are created in the workload
clusters. For more information, see the AKO Operator documentation.
Optionally, you can enter one or more cluster labels to identify clusters on which to selectively
enable NSX ALB or to customize NSX ALB settings for different groups of clusters. This is useful in
the following scenarios:
- You want to configure different sets of workload clusters to different Service Engine Groups to implement isolation or to support more Service type Load Balancers than one Service Engine Group's capacity.
- You want to configure different sets of workload clusters to different Clouds because they are deployed in different sites.
To enable NSX ALB selectively rather than globally, add labels in key: value format in the
management cluster configuration file. This creates a default AKO Deployment Config (ADC) on the
management cluster with the NSX ALB settings provided. Labels that you define here are used to
create a label selector. Only workload cluster objects that have the matching labels will have the load
balancer enabled.
To customize the NSX ALB settings for different groups of clusters, create an AKO Deployment
Config (ADC) on the management cluster by customizing the NSX ALB settings and providing a unique
label selector for the ADC. Only the workload cluster objects that have the matching labels will have
these custom settings applied.
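The following is a trimmed AkoDeploymentConfig sketch for such a cluster group; the controller address, cloud name, Service Engine Group, VIP network, and label value are placeholders, and the field set shown here is abbreviated rather than a complete specification.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: adc-workloadset01                  # hypothetical ADC name
spec:
  controller: alb-controller.example.local # placeholder NSX ALB controller FQDN or IP
  cloudName: vmc-cloud                     # placeholder cloud configured in NSX ALB
  serviceEngineGroup: tkg-workload-seg     # placeholder Service Engine Group
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  clusterSelector:
    matchLabels:
      type: tkg-workloadset01              # workload clusters carrying this label get these settings
  dataNetwork:
    name: tkg-workload-vip-segment         # placeholder VIP/data network
    cidr: 192.168.60.0/24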
You can label the cluster during workload cluster deployment or label it manually after cluster
creation. If you define multiple key-value pairs, you need to apply all of them.
- Provide AVI_LABELS in the following format in the workload cluster deployment config file, and it will automatically label the cluster and select the matching ADC based on the label selector during cluster deployment:
  AVI_LABELS: |
    'type': 'tkg-workloadset01'
- Optionally, you can manually label the cluster object of the corresponding workload cluster with the labels defined in the ADC:
  kubectl label cluster <cluster-name> type=tkg-workloadset01
Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer service
engine settings. The cloud is configured with one or more VIP networks to provide IP addresses to
load balancing (L4/L7) virtual services created under that cloud.
Virtual services can span multiple service engines if the associated service engine
group is configured in Active/Active HA mode. A service engine can belong to only one service
engine group at a time.
IP address allocation for virtual services can be over DHCP or through the in-built IPAM functionality
of NSX Advanced Load Balancer. The VIP networks created or configured in NSX Advanced Load
Balancer are associated with the IPAM profile.
Decision ID: TKO-TKG-001
Design Decision: Register the management cluster with Tanzu Mission Control.
Design Justification: Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally.
Design Implications: Only Antrea CNI is supported on workload clusters created from the TMC portal.

Decision ID: TKO-TKG-002
Design Decision: Use NSX Advanced Load Balancer as your control plane endpoint provider and for application load balancing.
Design Justification: AVI is tightly coupled with TKG and vSphere. Since AVI is a VMware product, customers have a single point of contact for support.
Design Implications: Adds NSX Advanced Load Balancer license cost to the solution.

Decision ID: TKO-TKG-003
Design Decision: Deploy Tanzu Kubernetes Grid management clusters in large form factor.
Design Justification: The large form factor should suffice to integrate the TKG management cluster with TMC, Pinniped, and Velero deployments. This must be capable of accommodating 100+ Tanzu workload clusters.
Design Implications: Consumes more resources from infrastructure.

Decision ID: TKO-TKG-004
Design Decision: Deploy Tanzu Kubernetes clusters with the prod plan.
Design Justification: This deploys multiple control plane nodes and provides high availability for the control plane.
Design Implications: Consumes more resources from infrastructure.

Decision ID: TKO-TKG-005
Design Decision: Enable identity management for TKG clusters.
Design Justification: Role-based access control to Tanzu Kubernetes Grid clusters.
Design Implications: Requires external identity management.
Network Architecture
For deployment of Tanzu Kubernetes Grid in VMware Cloud on AWS SDDCs, separate segments are
built for the Tanzu Kubernetes Grid management cluster, Tanzu Kubernetes Grid shared services
cluster, Tanzu Kubernetes Grid workload clusters, NSX Advanced Load Balancer management,
Cluster-VIP segment for control plane HA, Tanzu Kubernetes Grid Management VIP/Data segment,
and Tanzu Kubernetes Grid workload Data/VIP segment.
The network reference design can be mapped into this general framework.
This network design has the following benefits:
Isolates and separates SDDC management components (vCenter, ESX) from the Tanzu
Kubernetes Grid components. This reference design allows only minimum connectivity
between the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer and the
vCenter Server.
Isolates and separates the NSX Advanced Load Balancer management network segment
from the Tanzu Kubernetes Grid management segment and the Tanzu Kubernetes Grid
workload segments.
Depending on the workload cluster type and use case, multiple workload clusters can
leverage the same logical segments or new segments can be used for each workload cluster.
To isolate and separate Tanzu Kubernetes Grid workload cluster networking from each other,
VMware recommends that you use separate logical segments for each workload cluster and
configure the required firewall between these networks. See Firewall Recommendations for
more details.
Separates provider and tenant access to the Tanzu Kubernetes Grid environment.
Only provider administrators need access to the Tanzu Kubernetes Grid management
cluster. Allowing only administrators to access the Tanzu Kubernetes Grid
management cluster prevents tenants from attempting to connect to the Tanzu
Kubernetes Grid management cluster.
Network Requirements
As per the defined architecture, the list of required networks includes:
TKG management network (DHCP service: Yes) - Control plane and worker nodes of the TKG management cluster are attached to this network.
TKG shared services network (DHCP service: Yes) - Control plane and worker nodes of the TKG shared services cluster are attached to this network.
TKG workload network (DHCP service: Yes) - Control plane and worker nodes of TKG workload clusters are attached to this network.
TKG cluster VIP/data network (DHCP service: Optional) - Virtual services for control plane HA of all TKG clusters (management, shared services, and workload).
TKG management VIP/data network (DHCP service: Optional) - Virtual services for all user-managed packages (such as Contour and Harbor) hosted on the shared services cluster.
TKG workload VIP/data network (DHCP service: Optional) - Virtual services for all applications hosted on the workload clusters.
Network Recommendations
The key network recommendations for a production-grade Tanzu Kubernetes Grid deployment with
NSX-T Data Center Networking are as follows:
Decision ID: TKO-NET-001
Design Decision: Use separate networks for TKG management and workload clusters.
Design Justification: To have flexible firewall and security policies.
Design Implications: Sharing the same network for multiple clusters can complicate the creation of firewall rules.

Decision ID: TKO-NET-003
Design Decision: Configure DHCP for TKG clusters.
Design Justification: Tanzu Kubernetes Grid does not support static IP assignments for Kubernetes VM components.
Design Implications: Enable DHCP on the logical segments that are used to host TKG clusters.
For each of these networks, plan a port group name, gateway CIDR, DHCP pool, and NSX ALB IP pool appropriate to your environment.
Firewall Requirements
To prepare the firewall, you must collect the following information:
VMware Cloud on AWS uses a management gateway and compute gateway. These gateways need
firewall rules to allow traffic for Tanzu Kubernetes Grid deployments.
The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN.
Source: Client machine. Destination: NSX ALB controller nodes and cluster IP address. Port: TCP:443. Purpose: To access the NSX ALB portal for configuration.
Source: Client machine. Destination: vCenter Server. Port: TCP:443. Purpose: To create resource pools, VM folders, and so on, in vCenter.
Source: TKG management network CIDR, TKG workload network CIDR. Destination: TKG cluster VIP range. Port: TCP:6443. Purpose: For the management cluster to configure shared services and workload clusters.
Source: TKG management network, TKG shared services network, TKG workload networks. Destination: NSX ALB controllers and cluster IP address. Port: TCP:443. Purpose: Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller.
Source: NSX ALB controllers. Destination: vCenter and ESXi hosts. Port: TCP:443. Purpose: Allow NSX ALB to discover vCenter objects and deploy SEs as required.
Source: NSX ALB management network CIDR. Destination: DNS server. Port: UDP:53. Purpose: DNS service.
Source: NSX ALB management network CIDR. Destination: NTP server. Port: UDP:123. Purpose: Time synchronization.
Source: Client machine. Destination: console.cloud.vmware.com. Port: TCP:443. Purpose: To access the Cloud Services portal to configure networks in the VMC SDDC.
Decision ID: TKO-ALB-001
Design Decision: Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB.
Design Justification: Isolates NSX ALB traffic from infrastructure management traffic and Kubernetes workloads.
Design Implications: An additional network (VLAN) is required.

Decision ID: TKO-ALB-002
Design Decision: Deploy 3 NSX ALB controller nodes.
Design Justification: To achieve high availability for the NSX ALB platform. In clustered mode, NSX ALB availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible. Provides the highest level of uptime for a site.
Design Implications: Additional resource requirements.

Decision ID: TKO-ALB-003
Design Decision: Under Compute Policies, create a 'VM-VM anti-affinity' rule that prevents collocation of the NSX ALB controller VMs on the same host.
Design Justification: vSphere places the NSX Advanced Load Balancer Controller VMs in a way that always ensures maximum HA.
Design Implications: Affinity rules need to be configured manually.

Decision ID: TKO-ALB-004
Design Decision: Add vCenter as the No Orchestrator cloud type.
Design Justification: On VMware Cloud on AWS, NSX-T and the CloudAdmin user have only limited access.
Design Implications: Service Engines need to be deployed manually. VIP networks and server networks need to be assigned manually. The Service Engine group needs to be assigned manually.

Decision ID: TKO-ALB-005
Design Decision: Use static IP addresses for the NSX ALB controllers.
Design Justification: The NSX ALB controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes to management IP addresses will be disruptive.
Design Implications: None
Decision ID: TKO-ALB-006
Design Decision: Reserve an IP address in the NSX ALB management subnet to be used as the cluster IP address for the controller cluster.
Design Justification: The NSX ALB portal is always accessible over the cluster IP address regardless of a specific individual controller node failure.
Design Implications: An additional IP address is required.

Decision ID: TKO-ALB-007
Design Decision: Create a dedicated resource pool with appropriate reservations for NSX ALB controllers.
Design Justification: Guarantees the CPU and memory allocation for NSX ALB controllers and avoids performance degradation in case of resource contention.
Design Implications: None

Decision ID: TKO-ALB-008
Design Decision: Replace default NSX ALB certificates with custom CA or public CA-signed certificates that contain SAN entries of all controller nodes.
Design Justification: To establish a trusted connection with other infrastructure components; the default certificate does not include SAN entries, which is not acceptable by Tanzu.
Design Implications: None. SAN entries are not applicable if a wildcard certificate is used.

Decision ID: TKO-ALB-009
Design Decision: Configure NSX ALB backup with a remote server as the backup location.
Design Justification: Periodic backup of the NSX ALB configuration database is recommended. The database defines all clouds, all virtual services, all users, and others. As a best practice, store backups in an external location to provide backup capabilities in case of entire cluster failure.
Design Implications: Additional operational overhead. Additional infrastructure resources.

Decision ID: TKO-ALB-010
Design Decision: Configure remote logging for the NSX ALB Controller to send events to syslog.
Design Justification: For operations teams to be able to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB Controller.
Design Implications: Additional operational overhead. Additional infrastructure resources.

Decision ID: TKO-ALB-011
Design Decision: Use LDAP/SAML-based authentication for NSX ALB.
Design Justification: Helps to maintain role-based access control.
Design Implications: Additional configuration is required.

Decision ID: TKO-ALB-SE-001
Design Decision: Set NSX ALB Service Engine high availability to Active/Active.
Design Justification: Provides higher resiliency, optimum performance, and utilization compared to N+M and/or Active/Standby.
Design Implications: Requires Enterprise licensing. Certain applications might not work in Active/Active mode. For instance, applications that require preserving client IP use the Legacy Active/Standby HA mode.
Decision ID: TKO-ALB-SE-002
Design Decision: Dedicated Service Engine Group for TKG Management.
Design Justification: SE resources are guaranteed for the TKG management stack and provide data path segregation for management and tenant applications.
Design Implications: Dedicated Service Engine Groups increase licensing cost.

Decision ID: TKO-ALB-SE-003
Design Decision: Dedicated Service Engine Group for the TKG workload clusters, depending on the nature and type of workloads (dev/prod/test).
Design Justification: SE resources are guaranteed for a single or a set of workload clusters and provide data path segregation for tenant applications hosted on workload clusters.
Design Implications: Dedicated Service Engine Groups increase licensing cost.

Decision ID: TKO-ALB-SE-004
Design Decision: Enable ALB Service Engine self elections.
Design Justification: Enables SEs to elect a primary amongst themselves in the absence of connectivity to the NSX ALB controller.
Design Implications: None

Decision ID: TKO-ALB-SE-005
Design Decision: Set the 'Placement across the Service Engines' setting to 'Compact'.
Design Justification: This allows maximum utilization of capacity.
Design Implications: None

Decision ID: TKO-ALB-SE-006
Design Decision: Under Compute Policies, create a 'VM-VM anti-affinity' rule for SEs that are part of the same SE group, which prevents collocation of the Service Engine VMs on the same host.
Design Justification: vSphere will take care of placing the Service Engine VMs in a way that always ensures maximum HA for the Service Engines that are part of a Service Engine group.
Design Implications: Affinity rules need to be configured manually.

Decision ID: TKO-ALB-SE-007
Design Decision: Reserve memory and CPU for Service Engines.
Design Justification: The Service Engines are a critical infrastructure component providing load-balancing services to mission-critical applications. Guarantees the CPU and memory allocation for SE VMs and avoids performance degradation in case of resource contention.
Design Implications: You must perform additional configuration to set up the reservations.
Contour is an open-source controller for Kubernetes Ingress routing. Contour can be installed in the
shared services cluster on any Tanzu Kubernetes Cluster. Deploying Contour is a prerequisite if you
want to deploy the Prometheus, Grafana, and Harbor Packages on a workload cluster.
For more information about Contour, see Contour site and Implementing Ingress Control with
Contour.
Another option for ingress control is to use the NSX Advanced Load Balancer Kubernetes ingress
controller which offers an advanced L7 ingress for containerized applications that are deployed in the
Tanzu Kubernetes workload cluster.
For more information about the NSX Advanced Load Balancer ingress controller, see Configuring L7
Ingress with NSX Advanced Load Balancer.
Tanzu Service Mesh, a SaaS offering for modern applications running across multi-cluster, multi-
clouds, also offers an ingress controller based on Istio.
Each ingress controller has its own pros and cons. The following table provides general
recommendations for choosing an ingress controller for your Kubernetes environment.
Contour: Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for north-south traffic by defining the policies in the application's manifest file.
Istio: Use the Istio ingress controller when you intend to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic).
NSX ALB ingress controller: Use the NSX ALB ingress controller when a containerized application requires features like local and global server load balancing (GSLB), web application firewall (WAF), performance monitoring, and so on.
Legacy ingress services for Kubernetes include multiple disparate solutions. The services and
products contain independent components that are difficult to manage and troubleshoot. The ingress
services have reduced observability capabilities with little analytics, and they lack comprehensive
visibility into the applications that run on the system. Cloud-native automation is difficult in the legacy
ingress services.
In comparison to the legacy Kubernetes ingress services, NSX Advanced Load Balancer has
comprehensive load balancing and ingress services features. As a single solution with a central
control, NSX Advanced Load Balancer is easy to manage and troubleshoot. NSX Advanced Load
Balancer supports real-time telemetry with an insight into the applications that run on the system.
The elastic auto-scaling and the decision automation features highlight the cloud-native automation
capabilities of NSX Advanced Load Balancer.
NSX Advanced Load Balancer also lets you configure L7 ingress for your workload clusters by using
one of the following options:
L7 Ingress in ClusterIP Mode
This option enables NSX Advanced Load Balancer L7 ingress capabilities, including sending traffic
directly from the service engines (SEs) to the pods, preventing multiple hops that other ingress
solutions need when sending packets from the load balancer to the right node where the pod runs.
The NSX Advanced Load Balancer controller creates a virtual service with a backend pool with the
pod IP addresses which helps to send the traffic directly to the pods.
However, each workload cluster needs a dedicated SE group for Avi Kubernetes Operator (AKO) to
work, which could increase the number of SEs you need for your environment. This mode is used
when you have a small number of workload clusters.
L7 Ingress in NodePort Mode
The NodePort mode is the default mode when AKO is installed on Tanzu Kubernetes Grid. This
option allows your workload clusters to share SE groups and it is fully supported by VMware. With
this option, the services of your workloads must be set to NodePort instead of ClusterIP even when
accompanied by an ingress object. This ensures that NodePorts are created on the worker nodes
and traffic can flow through the SEs to the pods via the NodePorts. Kube-Proxy, which runs on each
node as DaemonSet, creates network rules to expose the application endpoints to each of the nodes
in the format “NodeIP:NodePort”. The NodePort value is the same for a service on all the nodes. It
exposes the port on all the nodes of the Kubernetes Cluster, even if the pods are not running on it.
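A minimal sketch of a workload Service exposed in this mode is shown below; the application name and ports are illustrative. The point is that spec.type is NodePort rather than ClusterIP.
apiVersion: v1
kind: Service
metadata:
  name: web-app                # illustrative service name
spec:
  type: NodePort               # required for NSX ALB L7 ingress in NodePort mode
  selector:
    app: web-app
  ports:
    - port: 80
      targetPort: 8080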
L7 Ingress in NodePortLocal Mode
This feature is supported only with Antrea CNI. The primary difference between this mode and the
NodePort mode is that the traffic is sent directly to the pods in your workload cluster through node
ports without interfering Kube-proxy. With this option, the workload clusters can share SE groups.
Similar to the ClusterIP Mode, this option avoids the potential extra hop when sending traffic from the
NSX Advanced Load Balancer SEs to the pod by targeting the right nodes where the pods run.
Antrea agent configures NodePortLocal port mapping rules at the node in the format
“NodeIP:Unique Port” to expose each pod on the node on which the pod of the service is running.
The default range of the port number is 61000-62000. Even if the pods of the service are running
on the same Kubernetes node, Antrea agent publishes unique ports to expose the pods at the node
level to integrate with the load balancer.
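As a hedged sketch, NodePortLocal is typically switched on through the Antrea settings in the cluster configuration file; the variable names and the port range shown below are assumptions based on common Tanzu Kubernetes Grid Antrea configuration options and should be verified against your release.
ANTREA_NODEPORTLOCAL: "true"                  # assumed variable enabling the NodePortLocal feature gate
ANTREA_NODEPORTLOCAL_PORTRANGE: "61000-62000" # assumed variable; matches the default range noted above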
NSX ALB L4 Ingress with Contour L7 Ingress
This option does not have all the NSX Advanced Load Balancer L7 ingress capabilities but uses it for
L4 load balancing only and leverages Contour for L7 Ingress. This also allows sharing SE groups
across workload clusters. This option is supported by VMware and it requires minimal setup.
Decision ID: TKO-ALB-L7-001
Design Decision: Deploy NSX ALB L7 ingress in NodePortLocal mode.
Design Justification: Gives good network hop efficiency. Helps to reduce the east-west traffic and encapsulation overhead. Service Engine groups are shared across clusters, and load-balancing persistence is also supported.
Design Implications: Supported only with Antrea CNI with IPv4 addressing.
Container Registry
VMware Tanzu for Kubernetes Operations using Tanzu Kubernetes Grid includes Harbor as a
container registry. Harbor provides a location for pushing, pulling, storing, and scanning container
images used in your Kubernetes clusters.
Harbor registry is used for day-2 operations of the Tanzu Kubernetes workload clusters. Typical day-
2 operations include tasks such as pulling images from Harbor for application deployment, pushing
custom images to Harbor, etc.
VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.
If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To do so, follow the procedure in Trust Custom CA
Certificates on Cluster Nodes.
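As one hedged illustration of that procedure, newer Tanzu Kubernetes Grid releases let you pass the registry CA through cluster configuration variables; the variable names below are assumptions and the values are placeholders, so confirm the exact mechanism in the linked procedure for your release.
ADDITIONAL_IMAGE_REGISTRY_1: harbor.example.local             # placeholder Harbor FQDN (assumed variable)
ADDITIONAL_IMAGE_REGISTRY_1_SKIP_TLS_VERIFY: "false"          # assumed variable
ADDITIONAL_IMAGE_REGISTRY_1_CA_CERTIFICATE: <base64-encoded-harbor-root-ca>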
Prometheus is an open-source system monitoring and alerting toolkit. It can collect metrics from
target clusters at specified intervals, evaluate rule expressions, display the results, and trigger alerts if
certain conditions arise. The Tanzu Kubernetes Grid implementation of Prometheus includes Alert
Manager, which you can configure to notify you when certain events occur.
Grafana is an open-source visualization and analytics software. It allows you to query, visualize, alert
on, and explore your metrics no matter where they are stored. Both Prometheus and Grafana are
installed through user-managed Tanzu packages by creating the deployment manifests and invoking
the tanzu package install command to deploy the packages in the Tanzu Kubernetes clusters.
The following diagram shows how the monitoring components on a cluster interact.
You can use out-of-the-box Kubernetes dashboards or you can create new dashboards to monitor
compute, network, and storage utilization of Kubernetes objects such as Clusters, Namespaces,
Pods, etc.
You can also monitor your Tanzu Kubernetes Grid clusters using Tanzu Observability which is a SaaS
offering by VMware. Tanzu Observability provides various out-of-the-box dashboards. You can
customize the dashboards for your particular deployment. For information on how to customize
Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu
Observability Dashboard for Tanzu for Kubernetes Operations.
Log processing and forwarding in Tanzu Kubernetes Grid is provided via Fluent Bit. Fluent Bit
binaries are available as a user-managed package and can be installed on the management cluster or on
workload clusters. Fluent Bit is a lightweight log processor and forwarder that allows you to collect
data and logs from different sources, unify them, and send them to multiple destinations. VMware
Tanzu Kubernetes Grid includes signed binaries for Fluent Bit that you can deploy on management
clusters and on Tanzu Kubernetes clusters to provide a log-forwarding service.
Fluent Bit makes use of input plug-ins, filters, and output plug-ins. The input plug-ins
define the sources from which data is collected, and the output plug-ins define the destinations
to which the information is sent. The Kubernetes filter enriches the logs with Kubernetes
metadata, specifically labels and annotations. You configure the input and output plug-ins when you
deploy Fluent Bit on the Tanzu Kubernetes Grid cluster; Fluent Bit is installed as a user-managed package.
Fluent Bit integrates with logging platforms such as VMware Aria Operations for Logs, Elasticsearch,
Kafka, Splunk, or an HTTP endpoint. For more details about configuring Fluent Bit to your logging
provider, see Implement Log Forwarding with Fluent Bit.
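A hedged sketch of what the output configuration might look like in the Fluent Bit package data values is shown below; the fluent_bit.config.outputs layout and the HTTP endpoint are assumptions, so check the data-values schema shipped with your package version.
fluent_bit:
  config:
    outputs: |
      [OUTPUT]
        Name    http
        Match   *
        # placeholder log collector endpoint and port
        Host    logs.example.internal
        Port    9000
        URI     /ingest
        Format  json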
A custom image must be based on the operating system (OS) versions that are supported by Tanzu
Kubernetes Grid. The table below provides a list of the operating systems that are supported for
building custom images for Tanzu Kubernetes Grid.
- Photon OS 3
- Windows 2019
For additional information on building custom images for Tanzu Kubernetes Grid, see Build Machine
Images.
VMware provides FIPS-capable Kubernetes OVA that can be used to deploy FIPS compliant Tanzu
Kubernetes Grid management and workload clusters. Tanzu Kubernetes Grid core components,
such as kubelet, kube-apiserver, kube-controller-manager, kube-proxy, kube-scheduler, kubectl,
etcd, CoreDNS, containerd, and cri-tools, are made FIPS compliant by compiling them with the
BoringCrypto FIPS modules, an open-source cryptographic library that provides FIPS 140-2
approved algorithms.
Installation Experience
Tanzu Kubernetes Grid management cluster is the first component that you deploy to get started
with Tanzu Kubernetes Grid.
You can deploy the management cluster in one of the following ways:
1. Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. VMware recommends this method if you are
installing a Tanzu Kubernetes Grid Management cluster for the first time.
2. Create and edit YAML configuration files to use with CLI commands to deploy the
management cluster.
By using the current version of the Tanzu Kubernetes Grid installation user interface, you can
install Tanzu Kubernetes Grid on VMware vSphere, AWS, and Microsoft Azure. The UI provides a
guided experience tailored to the IaaS, in this case VMware vSphere backed by NSX-T Data
Center networking.
The installation of Tanzu Kubernetes Grid on VMware Cloud on AWS is done through the same UI as
mentioned above but tailored to a vSphere environment.
This installation process takes you through setting up a TKG management cluster in your vSphere
environment. Once the management cluster is deployed, you can register it with Tanzu Mission
Control and then deploy Tanzu Kubernetes shared services and workload clusters directly from the
Tanzu Mission Control UI or by using the Tanzu CLI.
Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations in VMware Cloud on AWS.
Summary
Tanzu on VMware Cloud on AWS offers high-performance potential and convenience, and addresses
the challenges of creating, testing, and updating Kubernetes platforms in a consolidated production
environment. This validated approach results in a production quality installation with all the
application services needed to serve combined or uniquely separated workload types through a
combined infrastructure solution.
This plan meets many day-0 needs for aligning product capabilities, such as configuring firewall
rules, networking, load balancing, and workload compute, to the full stack infrastructure.
Supplemental Information
Automating Deployment of Service Engines
As discussed, Avi Vantage is installed in No Orchestrator mode on VMware Cloud on AWS.
Therefore, the deployment of service engines (SE) on VMware Cloud on AWS is not orchestrated by
the Avi Controller. Once SE is integrated with the Avi Controller, virtual service placement and
scaling can be handled centrally from the Avi Controller. A pair of service engines provide HA for
load balancing.
It is troublesome to manually deploy a pair of service engines for each tenant using the Import OVA
workflow in VMware Cloud on AWS. Therefore, we recommend using GOVC in conjunction with
Python to obtain the OVF properties as a JSON file and then customizing the JSON file for each
service engine.
The following example JSON file can be used to automate the provisioning of service engines ready
for use with Tanzu Kubernetes Grid.
{
"DiskProvisioning": "flat",
"IPAllocationPolicy": "fixedPolicy",
"IPProtocol": "IPv4",
"PropertyMapping": [
{
"Key": "AVICNTRL",
"Value": "<ip-address-of-avi-controller>"
},
{
"Key": "AVISETYPE",
"Value": "NETWORK_ADMIN"
},
{
"Key": "AVICNTRL_AUTHTOKEN",
"Value": "<avi-controller-auth-token>"
},
{
"Key": "AVICNTRL_CLUSTERUUID",
"Value": "<avi-controller-cluster-id>"
},
{
"Key": "avi.mgmt-ip.SE",
"Value": "<management-ip-address-of-service-engine>"
},
{
"Key": "avi.mgmt-mask.SE",
"Value": "255.255.255.0"
},
{
"Key": "avi.default-gw.SE",
"Value": "<avi-management-network-gateway>"
},
{
"Key": "avi.DNS.SE",
"Value": "<dns-server>"
},
{
"Key": "avi.sysadmin-public-key.SE",
"Value": ""
}
],
"NetworkMapping": [
{
"Name": "Management",
"Network": "avi-management"
},
{
"Name": "Data Network 1",
"Network": "<tkg-workload-1-cluster-network-segment-name>"
},
{
"Name": "Data Network 2",
"Network": "<tkg-workload-2-cluster-network-segment-name>"
},
{
"Name": "Data Network 3",
"Network": "<tkg-workload-3-cluster-network-segment-name>"
},
{
"Name": "Data Network 4",
"Network": "<tkg-workload-4-cluster-network-segment-name>"
},
{
"Name": "Data Network 5",
"Network": "<tkg-workload-5-cluster-network-segment-name>"
},
{
"Name": "Data Network 6",
"Network": "<tkg-workload-6-cluster-network-segment-name>"
},
{
"Name": "Data Network 7",
"Network": "<tkg-workload-7-cluster-network-segment-name>"
},
{
"Name": "Data Network 8",
"Network": "<tkg-workload-8-cluster-network-segment-name>"
},
{
"Name": "Data Network 9",
"Network": "<tkg-workload-9-cluster-network-segment-name>"
}
],
"MarkAsTemplate": false,
"PowerOn": true,
"InjectOvfEnv": false,
"WaitForIP": false,
"Name": "se-1"
VMware, Inc 45
VMware Tanzu for Kubernetes Operations Reference Architecture 2.3
export GOVC_URL=<fqdn-of-vcenter-in-vmware-cloud-on-aws>
export GOVC_USERNAME=<cloudadmin-username>
export GOVC_PASSWORD=<cloudadmin-password>
export GOVC_INSECURE=false
govc import.spec /home/admin/se.ova | python -m json.tool > se-1.json
govc import.ova -pool=*/Resources/Compute-ResourcePool/TKG/SEs -ds=WorkloadDatastore -options=/home/admin/se-1.json /home/admin/se.ova
This deploys a new service engine with a VM name of se-1 into the resource pool Compute-ResourcePool/TKG/SEs. Since the PowerOn parameter is set to true, the service engine boots
up automatically, and since we have set the key-value pairs for the following, the service engine is
automatically registered with the Avi Controller and is ready for further configuration in Avi Vantage:
"Key": "AVICNTRL",
"Value": "<ip-address-of-avi-controller>"
"Key": "AVICNTRL_CLUSTERUUID",
"Value": "<avi-controller-cluster-id>"
"Key": "avi.mgmt-ip.SE",
"Value": "<management-ip-address-of-service-engine>"
On vSphere, you can configure all node VMs to have the same predefined configurations, set
different predefined configurations for control plane and worker nodes, or customize the
configurations of the nodes. By using these settings, you can create clusters that have nodes with
different configurations from those of the management cluster nodes. You can also create clusters in which the
control plane nodes and worker nodes have different configurations.
Size CPU Memory (GB) Disk (GB)
Small 2 4 20
Medium 2 8 40
Large 4 16 40
Extra-large 8 32 80
To create a cluster in which all of the control plane and worker node VMs are the same size, specify
the SIZE variable. If you set the SIZE variable, all nodes will be created with the configuration that
you set.
SIZE: "large"
To create a cluster in which the control plane and worker node VMs are different sizes, specify the
CONTROLPLANE_SIZE and WORKER_SIZE options.
CONTROLPLANE_SIZE: "medium"
WORKER_SIZE: "large"
You can combine the CONTROLPLANE_SIZE and WORKER_SIZE options with the SIZE option. For
example, if you specify SIZE: "large" with WORKER_SIZE: "extra-large", the control plane nodes
will be set to large and worker nodes will be set to extra-large.
SIZE: "large"
WORKER_SIZE: "extra-large"
To use the same custom configuration for all nodes, specify the VSPHERE_NUM_CPUS,
VSPHERE_DISK_GIB, and VSPHERE_MEM_MIB options.
VSPHERE_NUM_CPUS: 2
VSPHERE_DISK_GIB: 40
VSPHERE_MEM_MIB: 4096
To define different custom configurations for control plane nodes and worker nodes, specify the
VSPHERE_CONTROL_PLANE_* and VSPHERE_WORKER_* options.
VSPHERE_CONTROL_PLANE_NUM_CPUS: 2
VSPHERE_CONTROL_PLANE_DISK_GIB: 20
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
VSPHERE_WORKER_NUM_CPUS: 4
VSPHERE_WORKER_DISK_GIB: 40
VSPHERE_WORKER_MEM_MIB: 4096
The number of virtual services that can be deployed per controller cluster is directly proportional to
the controller cluster size. For more information, see the NSX Advanced Load Balancer
Configuration Maximums Guide.
(Service engine sizing table; columns: Performance metric, Per-core performance, Maximum performance on a single Service Engine VM.)
Multiple performance vectors or features may have an impact on performance. For instance, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load
Balancer recommends two cores.
The scope of the document is limited to providing the deployment steps based on the reference
design in VMware Tanzu for Kubernetes Operations on VMware Cloud on AWS Reference Design.
VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.
To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on VMware Cloud on AWS Using Service Installer for VMware Tanzu.
Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.
Prerequisites
These instructions assume that you have the following set up:
SDDC deployment
To verify the interoperability of other versions and products, see VMware Interoperability Matrix.
General Requirements
Network Requirements
Firewall Requirements
General Requirements
Your environment should meet the following general requirements:
Dedicated resource pools and VM folders for collecting Tanzu Kubernetes Grid and
NSX Advanced Load Balancer VMs. Refer to the Resource Pools and VM Folders
section for more information.
NSX Advanced Load Balancer 22.1.2 OVA downloaded from the VMware Customer Connect
portal and readily available for deployment.
A content library to store NSX Advanced Load Balancer Controller and service
engine OVA templates.
Depending on the OS flavor of the bootstrap VM, download and configure the following
packages from VMware Customer Connect. Refer to the bootstrap machine configuration
section later in this document for the steps to configure the required packages on a Photon OS machine.
A vSphere account with the permissions described in Required Permissions for the vSphere
Account.
Download and import NSX Advanced Load Balancer 22.1.2 OVA to Content Library.
Download the following OVA from VMware Customer Connect and import to vCenter.
Convert the imported VMs to templates.
Note
You can also download supported older versions of Kubernetes from VMware
Customer Connect and import them to deploy workload clusters on the intended
Kubernetes versions.
The sample entries of the resource pools and folders that need to be created are as follows.
Network Requirements
Create NSX-T logical segments for deploying Tanzu for Kubernetes Operations components as per
Network Recommendations defined in the reference architecture.
Firewall Requirements
Ensure that the firewall is set up as described in Firewall Recommendations.
The following table provides a sample IP address and FQDN set for the NSX Advanced Load
Balancer controllers:
The following IP addresses are reserved for NSX Advanced Load Balancer:
Follow these steps to deploy and configure NSX Advanced Load Balancer:
2. Select the cluster where you want to deploy the NSX Advanced Load Balancer controller
node.
3. Right-click the cluster and invoke the Deploy OVF Template wizard.
After the controller VM is deployed and powered on, connect to the URL for the node and configure
the node for your Tanzu Kubernetes Grid environment as follows:
1. Create the administrator account by setting the password and optional email address.
2. Configure System Settings by specifying the backup passphrase and DNS information.
If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch and you are directed to a Dashboard
view on the controller.
Configure Licensing
Tanzu for Kubernetes Operations is bundled with a license for NSX Advanced Load Balancer
Enterprise. To configure licensing, complete the following steps.
1. Navigate to the Administration > Settings > Licensing and click on the gear icon to change
the license type to Enterprise.
3. Once the license tier is changed, apply the NSX Advanced Load Balancer Enterprise license
key. If you have a license file instead of a license key, apply the license by selecting the
Upload a License File option.
Note
To run a 3-node controller cluster, you deploy the first node, perform the initial configuration, and
set the cluster IP address. After that, you deploy and power on two more Controller VMs, but you
must not run the initial configuration wizard or change the admin password for these controller VMs.
The configuration of the first controller VM is assigned to the two new controller VMs.
Repeat the steps provided in the Deploy NSX Advanced Load Balancer Controller section to deploy
additional controllers.
1. To configure the controller cluster, navigate to the Administration > Controller > Nodes
page and click Edit.
2. Specify the name for the controller cluster and set the Cluster IP. This IP address should be
from the NSX Advanced Load Balancer management network.
3. Under Cluster Nodes, specify the IP addresses of the two additional controllers that you have
deployed. Optionally, you can configure the name for the controllers.
After you click Save, the controller cluster setup starts, and the controller nodes are rebooted in the
process. It takes approximately 10-15 minutes for cluster formation to complete.
You are automatically logged out of the controller node where you are currently logged in. On
entering the cluster IP address in the browser, you can see details about the cluster formation task.
Note
Once the controller cluster is deployed, you must use the IP address of the controller
cluster, not the IP address of the individual controller node, for any further
configuration.
Connect to the NSX Advanced Load Balancer controller cluster IP/FQDN and ensure that all
controller nodes are in a healthy state.
The first controller of the cluster receives the “Leader” role. The second and third controllers work
as “Followers”.
The controller has a default self-signed certificate, but this certificate does not have the correct SAN.
You must replace it with a valid or self-signed certificate that has the correct SAN. You can create a
self-signed certificate or upload a CA-signed certificate.
For the purpose of the demonstration, this document uses a self-signed certificate.
1. To replace the default certificate, navigate to the Templates > Security > SSL/TLS
Certificate > Create and select Controller Certificate.
2. In the New Certificate (SSL/TLS) window, enter a name for the certificate and set the type
to Self Signed.
Common Name - Specify the fully-qualified site name. For the site to be considered
trusted, this entry must match the hostname that the client entered in the browser.
Subject Alternate Name (SAN) - Enter the cluster IP address or FQDN of the
controller cluster nodes.
Key Size
5. To change the NSX Advanced Load Balancer portal certificate, navigate to the
Administration > Settings > Access Settings page and click the pencil icon to edit the
settings.
6. Under SSL/TLS Certificate, remove the existing default certificates. From the drop-down
menu, select the newly created certificate and click Save.
7. Refresh the controller portal from the browser and accept the newly created self-signed
certificate. Ensure that the certificate reflects the updated information in the browser.
1. Navigate to the Templates > Security > SSL/TLS Certificate page and export the certificate
by clicking Export.
2. In the Export Certificate page, click Copy to clipboard against the certificate. Do not copy
the key. Save the copied certificate to use later when you enable workload management.
3. Provide a name for the cloud, enable IPv4 DHCP under DHCP settings, and click Save.
4. After the cloud is created, ensure that the health status of the cloud is green.
Virtual services for all load balancer functionalities requested by the Tanzu Kubernetes Grid
management and shared services clusters.
Virtual services that load balance control plane nodes of management cluster and shared
services cluster.
TKG-WLD01-SEG: Service engines part of this SE group host virtual services that load balance
control plane nodes and virtual services for all load balancer functionalities requested by the
workload clusters mapped to this SE group.
Note
Based on your requirements, you can create additional SE groups for the workload clusters.
Multiple workload clusters can be mapped to a single SE group.
A Tanzu Kubernetes Grid cluster can be mapped to only one SE group for application load balancer services.
To create and configure a new SE group, complete the following steps. The following components
are created in NSX Advanced Load Balancer.
1. Go to Infrastructure > Service Engine Group under Cloud Resources and click Create.
2. Provide a name for the SE group and configure the following settings:
SE Self-Election: Selected
3. Repeat the steps to create an SE group for the Tanzu Kubernetes Grid workload cluster. You
should have created two service engine groups.
TKG-Cluster-VIP: This network provides high availability for the control plane nodes of the
Tanzu Kubernetes Grid management cluster, shared services cluster, and the workload
clusters.
TKG-Management-VIP: This network provides VIP for the extensions (Envoy, Contour, etc.)
deployed in the shared services cluster.
TKG-Workload-VIP: This network provides VIP for the applications (of type load balancer)
deployed in the workload clusters.
Note
You can provision additional VIP networks for the network traffic separation for the
applications deployed in various workload clusters. This is a day-2 operation.
To create and configure the VIP networks, complete the following steps.
1. Go to the Infrastructure > Networks tab under Cloud Resources and click Create. Check that
the VIP networks are being created under the correct cloud.
2. Provide a name for the VIP network and uncheck the DHCP Enabled and IPv6 Auto-
Configuration options.
Configure Routing
After configuring the VIP networks, set the default routes for all VIP/data networks. The following
table lists the default routes used in the current environment.
Note
1. Go to the Infrastructure > VRF Context > Edit global and add Static Route.
Create an IPAM profile so that NSX Advanced Load Balancer can assign virtual IPs from the Tanzu Kubernetes Grid cluster VIP network, Tanzu Kubernetes Grid management VIP network, and Tanzu Kubernetes Grid workload VIP network.
1. Navigate to the Templates > Profiles > IPAM/DNS Profiles page, click Create, and select
IPAM Profile.
2. Create the profile using the values shown in the following table.
Parameter        Value
Name             sfo01w01ipam01
Usable Networks  sfo01-w01-vds01-tkgclustervip, sfo01-w01-vds01-tkgmanagementvip, sfo01-w01-vds01-tkgworkloadvip
4. To create a DNS profile, click Create again and select DNS Profile.
Provide a name for the DNS Profile and select AVI Vantage DNS as the profile type.
Under Domain Name, specify the domain that you want to use with NSX Advanced
Load Balancer.
Optionally, set a new value in Override Record TTL for this domain. The default
value for all domains is 30 seconds.
The newly created IPAM and DNS profiles must be associated with the cloud so they can be
leveraged by the NSX Advanced Load Balancer objects created under that cloud.
To assign the IPAM and DNS profile to the cloud, go to the Infrastructure > Cloud page and edit the
cloud configuration.
2. Under DNS Profile, select the DNS profile and save the settings.
After configuring the IPAM and DNS profiles, verify that the status of the cloud is green.
To download the service engine image for deployment, navigate to the Infrastructure > Clouds tab,
select your cloud, click the download icon, and select type as OVA.
Wait a few minutes for the image generating task to finish. When the task is finished, the resulting
image file is immediately downloaded.
Import the Service Engine Image File into the Content Library
You can use the downloaded OVA file directly to create a service engine VM, but bear in mind that
this approach requires you to upload the image to vCenter every time you need to create a new
service engine VM.
For faster deployment, import the service engine OVA image into the content library and use the
“deploy from template” wizard to create new service engine VMs.
Before deploying a service engine VM, you must obtain a cluster UUID and generate an
authentication token. A cluster UUID facilitates integrating the service engine with NSX Advanced
Load Balancer Controller. Authentication between the two is performed via an authentication token.
To generate a cluster UUID and auth token, navigate to Infrastructure > Clouds and click the key
icon in front of the cloud that you have created. This opens a new popup window containing both
the cluster UUID and the auth token.
Note
You need a new auth token every time a new Service Engine instance is deployed.
Deploy Service Engine VMs for Tanzu Kubernetes Grid Management Cluster
1. To deploy a service engine VM, log in to the vSphere client and navigate to Menu > Content
Library > Your Content Library. Navigate to the Templates tab and select the service engine
template, right-click it, and choose New VM from this template.
2. Follow the VM creation wizard. On the networks page, select the management and data
networks for the SE VM.
The Management network label is mapped to the NSX Advanced Load Balancer
Management logical segment. The remaining network labels (Data Network 1 – 9) are
connected to any of the front-end virtual service's network or back-end server's logical
network as required. They are left disconnected if not required.
The service engine for the Tanzu Kubernetes Grid management cluster is connected to the
following networks:
Management: sfo01-w01-vds01-albmanagement
3. On the Customize template page, provide the cluster UUID and authentication token that
you generated earlier. Configure the service engine VM management network settings as well.
4. Repeat the steps to deploy an additional service engine VM for the Tanzu Kubernetes Grid
management cluster.
By default, service engine VMs are created in the default Service Engine Group.
To map the service engine VMs to the correct Service Engine Group,
1. Go to the Infrastructure > Service Engine tab, select your cloud, and click the pencil icon to
update the settings and link the service engine to the correct SEG.
On the Service Engine Group page, you can confirm the association of service engines with Service
Engine Groups.
Service engine VMs deployed for Tanzu Kubernetes Grid workload cluster are connected to the
following networks:
Management: sfo01-w01-vds01-albmanagement
You need to deploy service engine VMs with the above settings.
After deploying the service engines, edit the service engine VMs and associate them with the
sfo01w01segroup01 Service Engine Group.
The bootstrap machine runs a local kind cluster when Tanzu Kubernetes Grid management cluster
deployment is started. Once the kind cluster is fully initialized, the configuration is used to deploy the
actual management cluster on the backend infrastructure. After the management cluster is fully
configured, the local kind cluster is deleted and future configurations are performed via the Tanzu
CLI.
For this deployment, a Photon-based virtual machine is used as the bootstrap machine. For
information on how to configure for a macOS or Windows machine, see Install the Tanzu CLI and
Other Tools.
Docker and containerd binaries are installed. For instructions on how to install Docker, see
Docker documentation.
Ensure that the bootstrap VM is connected to Tanzu Kubernetes Grid management network.
To install Tanzu CLI, Tanzu Plugins, and Kubectl utility on the bootstrap machine, follow the
instructions below:
1. Download and unpack the following Linux CLI packages from VMware Tanzu Kubernetes
Grid Download Product page.
2. Execute the following commands to install Tanzu Kubernetes Grid CLI, kubectl CLIs, and
Carvel tools.
version: v0.28.0
buildDate: 2023-01-20
sha: 3c34115bc-dirty
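The installation commands referenced in step 2 are not reproduced in this extract. The following is a minimal sketch, assuming the Tanzu CLI bundle and kubectl gzip file from the Tanzu Kubernetes Grid download page are present in the current directory; the exact file and version names are assumptions and depend on the release you downloaded. Running tanzu version at the end should return output similar to that shown above.
## File and version names are assumptions; adjust to the bundle you downloaded
tar -xvf tanzu-cli-bundle-linux-amd64.tar.gz
## Install the Tanzu CLI core binary
install cli/core/v0.28.0/tanzu-core-linux_amd64 /usr/local/bin/tanzu
## Install kubectl
gunzip kubectl-linux-v1.24.9+vmware.1.gz
chmod ugo+x kubectl-linux-v1.24.9+vmware.1 && mv kubectl-linux-v1.24.9+vmware.1 /usr/local/bin/kubectl
## Verify the installation
tanzu version
kubectl version --client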
##Install ytt
cd ./cli
gunzip ytt-linux-amd64-v0.43.1+vmware.1.gz
chmod ugo+x ytt-linux-amd64-v0.43.1+vmware.1 && mv ./ytt-linux-amd64-v0.43.1+v
mware.1 /usr/local/bin/ytt
##Install kapp
cd ./cli
gunzip kapp-linux-amd64-v0.53.2+vmware.1.gz
chmod ugo+x kapp-linux-amd64-v0.53.2+vmware.1 && mv ./kapp-linux-amd64-v0.53.2+
vmware.1 /usr/local/bin/kapp
##Install kbld
cd ./cli
gunzip kbld-linux-amd64-v0.35.1+vmware.1.gz
chmod ugo+x kbld-linux-amd64-v0.35.1+vmware.1 && mv ./kbld-linux-amd64-v0.35.1+
vmware.1 /usr/local/bin/kbld
##Install imgpkg
cd ./cli
gunzip imgpkg-linux-amd64-v0.31.1+vmware.1.gz
chmod ugo+x imgpkg-linux-amd64-v0.31.1+vmware.1 && mv ./imgpkg-linux-amd64-v0.3
1.1+vmware.1 /usr/local/bin/imgpkg
ytt version
kapp version
kbld version
imgpkg version
4. Install yq. yq is a lightweight and portable command-line YAML processor. yq uses jq-like
syntax but works with YAML and JSON files.
wget https://github.com/mikefarah/yq/releases/download/v4.24.5/yq_linux_amd64.tar.gz
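The extraction and installation commands are not shown in this extract. A minimal sketch, assuming the archive contains the yq_linux_amd64 binary (the layout of the upstream release archive):
## Extract and install the yq binary
tar -xzf yq_linux_amd64.tar.gz
mv yq_linux_amd64 /usr/local/bin/yq
## Verify
yq --version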
5. Install kind.
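The installation commands are not included in this extract. A minimal sketch that downloads a kind release binary; the version shown is an assumption, so use the version compatible with your Tanzu Kubernetes Grid release.
## Download and install kind (version is an assumption)
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.17.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
## Verify
kind version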
6. Execute the following commands to start the Docker service and enable it to start at boot.
Photon OS has Docker installed by default.
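The commands themselves are not shown in this extract; a typical systemd-based sequence is:
## Start Docker and enable it to start at boot
systemctl start docker
systemctl enable docker
## Verify that Docker is up and running
docker ps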
7. Execute the following commands to ensure that the bootstrap machine uses cgroup v1.
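The commands are not shown in this extract. A minimal check, assuming the Docker cgroup information reflects the cgroup version in use on the host:
## Check the cgroup configuration reported by Docker
docker info | grep -i cgroup
## Expected output includes lines similar to:
##   Cgroup Driver: cgroupfs
##   Cgroup Version: 1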
An SSH key pair is required for Tanzu CLI to connect to vSphere from the bootstrap
machine.
The public key part of the generated key is passed during the Tanzu Kubernetes Grid
management cluster deployment.
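The key-generation command itself is not included in this extract. A minimal sketch that creates an RSA key pair at the default path ~/.ssh/id_rsa; the email comment is a placeholder.
## Create an SSH key pair; accept the default path and set a password when prompted
ssh-keygen -t rsa -b 4096 -C "admin@example.com"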
## Add the private key to the SSH agent running on your machine and enter the password you created in the previous step
ssh-add ~/.ssh/id_rsa
## If the above command fails, execute "eval $(ssh-agent)" and then rerun the command
9. If your bootstrap machine runs Linux or Windows Subsystem for Linux, and it has a Linux
kernel built after the May 2021 Linux security patch, for example Linux 5.11 and 5.12 with
Fedora, run the following command.
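The command is not shown in this extract. The kernel setting typically adjusted in this situation is the netfilter connection-tracking limit, for example:
## Raise the connection-tracking limit required by kind on newer kernels
sysctl net/netfilter/nf_conntrack_max=131072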
All required packages are now installed and the required configurations are in place in the bootstrap
virtual machine. The next step is to deploy the Tanzu Kubernetes Grid management cluster.
Before you proceed with the management cluster creation, ensure that the base image template is
imported into vSphere and is available as a template. To import a base image template into vSphere:
1. Go to the Tanzu Kubernetes Grid downloads page and download a Tanzu Kubernetes Grid
OVA for the cluster nodes.
2. For the management cluster, this must be either Photon or Ubuntu based Kubernetes
v1.24.9 OVA.
Note
3. For workload clusters, OVA can have any supported combination of OS and Kubernetes
version, as packaged in a Tanzu Kubernetes release.
Note
Make sure you download the most recent OVA base image templates in the
event of security patch releases. You can find updated base image templates
that include security patches on the Tanzu Kubernetes Grid product
download page.
4. In the vSphere client, right-click an object in the vCenter Server inventory and select Deploy
OVF template.
5. Select Local file, click the button to upload files, and go to the downloaded OVA file on your
local machine.
7. Click Finish to deploy the VM. When the OVA deployment finishes, right-click the VM and
select Template > Convert to Template.
Note
8. If you are using a non-administrator SSO account: In the VMs and Templates view, right-click the new
template, select Add Permission, and assign the tkg-user to the template with the TKG role.
For information about how to create the user and role for Tanzu Kubernetes Grid, see Required
Permissions for the vSphere Account.
Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster.
Create a deployment YAML configuration file and use it to deploy the management cluster
with the Tanzu CLI commands.
The Tanzu Kubernetes Grid installer wizard is an easy way to deploy the cluster. The following steps
describe the process.
1. To launch the Tanzu Kubernetes Grid installer wizard, run the following command on the
bootstrapper machine:
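The command is not reproduced in this extract. The Tanzu CLI provides an installer UI that can be bound to an address reachable from your workstation; the IP address and port below are placeholders.
## Launch the installer UI on the bootstrap machine
tanzu management-cluster create --ui --bind <bootstrapper-ip>:8080 --browser none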
2. Access the Tanzu Kubernetes Grid installer wizard by opening a browser and entering
http://<bootstrapper-ip>:port/
Note
Ensure that the port number that you enter in this command is allowed by
the bootstrap machine firewall.
3. From the Tanzu Kubernetes Grid installation user interface, click Deploy for VMware
vSphere.
4. On the IaaS Provider page, enter the IP/FQDN and credentials of the vCenter server where
the Tanzu Kubernetes Grid management cluster is to be deployed and click Connect.
If you are running a vSphere 7.x environment, the Tanzu Kubernetes Grid installer detects it
and provides a choice between deploying vSphere with Tanzu (TKGS) or the Tanzu
Kubernetes Grid management cluster.
6. Select the Virtual Datacenter and enter the SSH public key that you generated earlier.
7. On the Management Cluster Settings page, select the instance type for the control plane
node and worker node and provide the following information:
Management Cluster Name: Name for your Tanzu Kubernetes Grid management
cluster.
Control Plane Endpoint Provider: Select NSX Advanced Load Balancer for the
Control Plane HA.
Control Plane Endpoint: This is an optional field. If left blank, NSX Advanced Load
Balancer assigns an IP address from the pool sfo01-w01-vds01-tkgclustervip which
is configured in NSX Advanced Load Balancer. If you need to provide an IP address,
pick an unused IP address from the sfo01-w01-vds01-tkgclustervip static IP pool.
Enable Audit Logging: Enables audit logging for Kubernetes API server and node
VMs, choose as per environmental needs. For more information, see Audit Logging.
8. On the NSX Advanced Load Balancer page, provide the following information:
Controller credentials.
Controller certificate.
Note
In Tanzu Kubernetes Grid v2.1.x, you can configure the network to separate
the endpoint VIP network of the cluster from the external IP network of the
load balancer service and the ingress service in the cluster. This feature lets
you ensure the security of the clusters by providing you an option to expose
the endpoint of your management or the workload cluster and the load
balancer service and ingress service in the cluster, in different networks.
As per the Tanzu for Kubernetes Operations 2.1.x Reference Architecture, all the control
plane endpoints connected to Tanzu Kubernetes Grid cluster VIP network and data plane
networks are connected to the respective management data VIP network or workload data
VIP network.
Cloud Name: Name of the cloud created while configuring NSX Advanced Load
Balancer sfo01w01vc01.
Workload Cluster Service Engine Group Name: Name of the service engine group
created for Tanzu Kubernetes Grid workload cluster created when configuring NSX
Advanced Load Balancer sfo01w01segroup01.
Workload Cluster Data Plane VIP Network Name & CIDR: Select sfo01-w01-vds01-
tkgworkloadvip and subnet 192.168.16.0/26.
Workload Cluster Control Plane VIP Network Name & CIDR: Select sfo01-w01-
vds01-tkgclustervip and subnet 192.168.14.0/26.
Management Cluster Service Engine Group Name: Name of the service engine
group created for Tanzu Kubernetes Grid Management Cluster created when
configuring NSX Advanced Load Balancer sfo01m01segroup01.
Management Cluster Data Plane VIP Network Name & CIDR : Select sfo01-w01-
vds01-tkgmanagementvip and subnet 192.168.15.0/26.
Management Cluster Control Plane VIP Network Name & CIDR: Select sfo01-w01-
vds01-tkgclustervip and subnet 192.168.14.0/26.
Cluster Labels: Optional. Leave the cluster labels section empty to apply the above
workload cluster network settings by default. If you specify any label here, you must
specify the same values in the configuration YAML file of the workload cluster. Else,
the system places the endpoint VIP of your workload cluster in Management Cluster
Data Plane VIP Network by default.
Note
With the above configuration, all the Tanzu workload clusters use
sfo01-w01-vds01-tkgclustervip for the control plane VIP network and
sfo01-w01-vds01-tkgworkloadvip for the data plane network by default. If
you would like to configure separate VIP networks for workload
control plane or data networks, create a custom AKO Deployment
Config (ADC) and provide the respective NSXALB_LABELS in the
workload cluster configuration file. For more information on network
separation and custom ADC creation, see Configure Separate VIP
Networks and Service Engine Groups in Different Workload Clusters.
10. On the Metadata page, you can specify location and labels.
11. On the Resources page, specify the compute resources for the Tanzu Kubernetes Grid
management cluster deployment.
12. On the Kubernetes Network page, select the network where the control plane and worker
nodes are placed during management cluster deployment. Ensure that the network has
DHCP service enabled.
If the Tanzu environment is placed behind a proxy, enable the proxy and provide the proxy
details.
Note
The procedure shown in this document does not use a proxy to connect to
the Internet.
13. If LDAP is configured in your environment, see Configure Identity Management for
instructions on how to integrate an identity management system with Tanzu Kubernetes Grid.
14. Select the OS image to use for the management cluster deployment.
Note
After you import the correct template and click Refresh, the installer detects the image
automatically.
When you click Review Configuration, the installer populates the cluster configuration file,
which is located in the ~/.config/tanzu/tkg/clusterconfigs subdirectory, with the settings
that you specified in the interface. You can optionally export a copy of this configuration file
by clicking Export Configuration.
17. Deploy the management cluster from this configuration file by running the command:
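The command is not reproduced in this extract. A deployment from the exported configuration file generally takes the following form; the file name is a placeholder for the file generated or exported in the previous step, and the optional -v 6 flag increases log verbosity.
tanzu management-cluster create --file ~/.config/tanzu/tkg/clusterconfigs/<cluster-config-file>.yaml -v 6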
When the deployment is started from the UI, the installer wizard displays the deployment
logs on the screen.
Deploying the management cluster takes approximately 20-30 minutes to complete. While
the management cluster is being deployed, a virtual service is created in NSX Advanced
Load Balancer and placed on one of the service engines created in the
“sfo01m01segroup01” SE Group.
The installer automatically sets the context to the management cluster so that you can log in
to it and perform additional tasks such as verifying health of the management cluster and
deploying the workload clusters.
18. After the Tanzu Kubernetes Grid management cluster deployment, run the following
command to verify the health status of the cluster:
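The verification command is not shown in this extract; a minimal check is:
## Display the management cluster status, nodes, and providers
tanzu management-cluster get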
Ensure that the cluster status reports as running and the values in the Ready column for
nodes, etc., are True.
See Examine the Management Cluster Deployment to perform additional health checks.
19. When deployment is completed successfully, run the following command to install the
additional Tanzu plugins:
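The command is not shown in this extract. With recent Tanzu CLI versions, the plugins that match the management cluster are installed with:
## Install or update the plugins recommended by the management cluster
tanzu plugin sync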
install-ako-for-all: The default configuration for all workload clusters. By default, all workload
clusters reference this file for their virtual IP networks and service engine (SE) groups. This ADC
configuration does not enable NSX Advanced Load Balancer L7 ingress by default.
tanzu-ako-for-shared: Used by the shared services cluster to deploy the virtual services in the TKG
management SE group and the load balancer applications in the TKG management VIP network.
tanzu-ako-for-workload-L7-ingress: Use this ADC only if you want to enable NSX
Advanced Load Balancer L7 ingress on the workload cluster; otherwise, leave the cluster labels
empty to apply the network configuration from the default ADC install-ako-for-all.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
finalizers:
- ako-operator.networking.tkg.tanzu.vmware.com
generation: 2
name: <Unique name of AKODeploymentConfig>
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca
namespace: tkg-system-networking
cloudName: <NAME OF THE CLOUD in ALB>
clusterSelector:
matchLabels:
<KEY>: <VALUE>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-CIDR>
name: <TKG-Cluster-VIP-Network>
controller: <NSX ALB CONTROLLER IP/FQDN>
dataNetwork:
cidr: <TKG-Mgmt-Data-VIP-CIDR>
name: <TKG-Mgmt-Data-VIP-Name>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: <TKG-Mgmt-Network>
serviceEngineGroup: <Mgmt-Cluster-SEG>
The sample AKODeploymentConfig with sample values in place is as follows. You should add the
respective NSX ALB label type=shared-services while deploying the shared services cluster to enforce
this network configuration.
cloud: sfo01w01vmcvc01
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
generation: 2
name: tanzu-ako-for-shared
spec:
adminCredentialRef:
name: nsx_alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx_alb-controller-ca
namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
type: shared-services
controlPlaneNetwork:
cidr: 192.168.14.0/26
name: sfo01-w01-vds01-tkgclustervip
controller: 192.168.11.8
dataNetwork:
cidr: 192.168.16.0/26
name: sfo01-w01-vds01-tkgmanagementvip
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: sfo01-w01-vds01-tkgmanagement
serviceEngineGroup: sfo01m01segroup01
After you have the AKO configuration file ready, use the kubectl command to set the context to
Tanzu Kubernetes Grid management cluster and create the ADC:
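The commands are not reproduced in this extract. A minimal sketch, assuming the shared services ADC shown above is saved as tanzu-ako-for-shared.yaml and that the management cluster context name follows the <cluster-name>-admin@<cluster-name> convention:
## Switch to the management cluster context (replace with your context name)
kubectl config use-context <management-cluster-name>-admin@<management-cluster-name>
## Create the AKODeploymentConfig object
kubectl apply -f tanzu-ako-for-shared.yaml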
Use the following command to list all AKODeploymentConfig created under the management
cluster:
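A minimal sketch of the listing command; the default ADC install-ako-for-all should appear alongside any custom ADCs you created.
## List the AKODeploymentConfig objects in the management cluster
kubectl get adc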
As per the defined architecture, workload cluster control plane endpoint uses TKG Cluster VIP
Network, application load balancing uses TKG Workload Data VIP network and the virtual services are
deployed in sfo01w01segroup01 SE group.
The following are the changes in the ADC ingress section when compared to the default ADC.
nodeNetworkList: Provide the values for Tanzu Kubernetes Grid workload network name
and CIDR.
The format of the AKODeploymentConfig YAML file for enabling NSX Advanced Load Balancer L7
Ingress is as follows.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
name: <unique-name-for-adc>
spec:
adminCredentialRef:
name: nsx_alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx_alb-controller-ca
namespace: tkg-system-networking
cloudName: <cloud name configured in nsx alb>
clusterSelector:
matchLabels:
<KEY>: <value>
controller: <ALB-Controller-IP/FQDN>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-Network-CIDR>
name: <TKG-Cluster-VIP-Network-Name>
dataNetwork:
cidr: <TKG-Workload-VIP-network-CIDR>
name: <TKG-Workload-VIP-network-Name>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false # required
ingress:
disableIngressClass: false # required
nodeNetworkList: # required
- networkName: <TKG-Workload-Network>
cidrs:
- <TKG-Workload-Network-CIDR>
serviceType: NodePortLocal # required
The AKODeploymentConfig with sample values in place is as follows. You should add the respective
NSX ALB label workload-l7-enabled=true while deploying the workload cluster to enforce this
network configuration.
cloud: sfo01w01vc01
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
name: tanzu-ako-for-workload-l7-ingress
spec:
adminCredentialRef:
name: nsx_alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx_alb-controller-ca
namespace: tkg-system-networking
cloudName: tkg-vmc
clusterSelector:
matchLabels:
workload-l7-enabled: "true"
controlPlaneNetwork:
cidr: 192.168.14.0/26
name: sfo01-w01-vds01-tkgclustervip
controller: 192.168.11.8
dataNetwork:
cidr: 192.168.15.0/26
name: sfo01-w01-vds01-tkgworkloadvip
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false
ingress:
disableIngressClass: false
nodeNetworkList:
- cidrs:
- 192.168.13.0/24
networkName: sfo01-w01-vds01-tkgworkload
serviceType: NodePortLocal
shardVSSize: MEDIUM
serviceEngineGroup: sfo01w01segroup01
Use the kubectl command to set the context to Tanzu Kubernetes Grid management cluster and
create the ADC:
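As with the shared services ADC, a minimal sketch, assuming the file above is saved as tanzu-ako-for-workload-l7-ingress.yaml:
## Switch to the management cluster context (replace with your context name)
kubectl config use-context <management-cluster-name>-admin@<management-cluster-name>
## Create the L7 ingress AKODeploymentConfig object
kubectl apply -f tanzu-ako-for-workload-l7-ingress.yaml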
Use the following command to list all AKODeploymentConfig created under the management
cluster:
Now that you have successfully created the AKO deployment config, you need to apply the cluster
labels while deploying the workload clusters to enable NSX Advanced Load Balancer L7 Ingress with
NodePortLocal mode.
The procedure for deploying a shared service cluster is essentially the same as the procedure for
deploying a workload cluster. The only difference is that you add a tanzu-services label to the
shared services cluster to indicate its cluster role. This label identifies the shared services cluster to
the management cluster and workload clusters.
The shared services cluster uses the custom ADC tanzu-ako-for-shared created earlier to apply
network settings similar to those of the management cluster. This is enforced by applying the NSX ALB
label type: shared-services while deploying the shared services cluster.
Note
The scope of this document doesn’t cover the use of a proxy for Tanzu Kubernetes
Grid deployment. If your environment uses a proxy server to connect to the internet,
ensure that the proxy configuration object includes the CIDRs for the pod, ingress,
and egress from the workload network of the Management Cluster in the No proxy
list, as described in Create a Proxy Configuration Object for a Tanzu Kubernetes Grid
Service Cluster.
1. To deploy a shared services cluster, navigate to the Clusters tab and click Create Cluster.
2. On the Create cluster page, select the Tanzu Kubernetes Grid management cluster that you
registered in the previous step and click Continue to create cluster.
5. Enter a name for the cluster (Cluster names must be unique within an organization).
6. Select the cluster group to which you want to attach your cluster.
In the vCenter and tlsThumbprint fields, enter the details for authentication.
From the datacenter, resourcePool, folder, network, and datastore drop down,
select the required information.
From the template drop-down, select the Kubernetes version. The latest supported version is preselected.
In the sshAuthorizedKeys field, enter the SSH key that was created earlier.
Enable aviAPIServerHAProvider.
11. Select the high availability mode for the control plane nodes of the shared services cluster.
For a production deployment, it is recommended to deploy a highly available shared services
cluster.
12. You can optionally define the default node pool for your workload cluster.
Select OS Version.
Cluster creation roughly takes 15-20 minutes to complete. After the cluster deployment
completes, ensure that Agent and extensions health shows green.
14. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
workload cluster.
## Add the tanzu-services label to the shared services cluster as its cluster role. In the following command, "sfo01w01tkgshared01" is the name of the shared services cluster
## Validate that TMC has applied the AVI_LABEL while deploying the cluster
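The commands corresponding to the comments above are not included in this extract. A minimal sketch, using the cluster name sfo01w01tkgshared01 from the example above and the cluster-role label key used by Tanzu Kubernetes Grid for the shared services role:
## Label the shared services cluster with the tanzu-services cluster role
kubectl label cluster.cluster.x-k8s.io/sfo01w01tkgshared01 cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true
## Validate the labels applied to the cluster, including the NSX ALB label applied at deployment time
kubectl get clusters sfo01w01tkgshared01 --show-labels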
15. Connect to the admin context of the workload cluster using the following commands and validate
the ako pod status.
## Use the following command to get the admin context of workload Cluster.
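The commands are not included in this extract. A minimal sketch, assuming the AKO pod runs in the avi-system namespace as in a default AKO deployment:
## Retrieve and switch to the admin context of the shared services cluster
tanzu cluster kubeconfig get sfo01w01tkgshared01 --admin
kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01
## Validate the AKO pod status; the ako-0 pod should be Running
kubectl get pods -n avi-system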
Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor in Deploy User-Managed Packages in
Workload Clusters.
The steps for deploying a workload cluster are the same as for a shared services cluster. However, in
step number 4, use the NSXALB labels created for the workload cluster in the AKO Deployment Config.
After the workload cluster is created, verify the cluster labels and the AKO pod status.
1. Connect to the Tanzu Kubernetes Grid management cluster context and verify the cluster labels for the workload cluster.
## Verify the workload cluster creation
## Validate that TMC has applied the AVI_LABEL while deploying the cluster
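A minimal verification sketch for the comments above; the workload cluster name is a placeholder, and the workload-l7-enabled label is expected only if you used the L7 ingress ADC.
## Verify the cluster and its labels from the management cluster context
kubectl get clusters <workload-cluster-name> --show-labels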
1. Connect to admin context of the workload cluster using the following commands and validate
the ako pod status.
## Use the following command to get the admin context of workload Cluster.
You can now configure SaaS components and deploy user-managed packages on the cluster.
For instructions on installing Tanzu Service Mesh on your workload cluster, see Onboard a Tanzu
Kubernetes Cluster to Tanzu Service Mesh.
This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations
with Tanzu components on AWS.
The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.
Note: This reference design is supported and validated for customers deploying Tanzu Kubernetes
Grid 1.6.x on AWS.
Function                         Component and Version
Cluster creation and management  Core Cluster API (v1.1.5), Cluster API Provider AWS (v1.2.0)
Build your own image             Amazon Linux 2, Ubuntu 18.04, Ubuntu 20.04
Storage                          Amazon EBS CSI driver (v1.8.0) and in-tree cloud providers
Authentication                   OIDC through Pinniped (v0.12.1), LDAP through Pinniped (v0.12.1) and Dex
Network Overview
The following network diagram shows the network layout used with this reference design. It shows
the layout for a single virtual private cloud (VPC). The network layout uses the following types of
subnets:
1. One private subnet for each AWS availability zone (AZ). These subnets are not automatically
allocated a public IP address. The default gateway is a NAT gateway.
2. One public subnet for each AWS availability zone (AZ). These subnets are automatically
allocated a public IP address. The default gateway is an Internet gateway if the subnet is
connected to the Internet. A public subnet is optional if you do not need Internet ingress or
egress.
Network Recommendations
This reference design uses Tanzu Kubernetes Grid to manage the lifecycle of multiple Kubernetes
workload clusters by bootstrapping a Kubernetes management cluster with the Tanzu command line
tool. Consider the following when configuring the network for Tanzu Kubernetes Grid:
Use an internal load balancer scheme. A best practice is to create an internal load balancer
to avoid exposing the Kubernetes API to the public Internet. To avoid creating a public-facing
load balancer, set AWS_LOAD_BALANCER_SCHEME_INTERNAL to true in the cluster configuration
file (AWS_LOAD_BALANCER_SCHEME_INTERNAL: true). This setting customizes the management
cluster's load balancer to use an internal scheme, which means that its Kubernetes API server
is not accessible or routed over the Internet. If you use an internal load balancer, run Tanzu
Kubernetes Grid from a machine with access to the target VPC private IP space.
If you don’t want an outbound Internet or inbound connection from AWS, you can eliminate
the public subnet.
Beware that 172.17.0.0/16 is the default docker subnet. If you are going to use that for a VPC
deployment, you must change your docker container subnet.
Storage
Tanzu Kubernetes Grid ships with the AWS cloud storage driver, which allows you to provision
stateful storage volumes in your Tanzu Kubernetes Grid cluster. The following storage classes are
available:
For more information on the available storage options see Amazon EBS volume types.
VPC Architectures
In a production deployment, Tanzu Kubernetes Grid creates a multi-AZ deployment.
We recommend that you create the VPCs before you deploy Tanzu Kubernetes Grid. Also, make
sure that you tag a public and private subnet in each AZ, including the control plane cluster, with a
key of kubernetes.io/cluster/<cluster_name>. As a best practice, ensure that the value you use for
the public and private subnets for an AZ can easily identify the subnets as belonging to the same AZ.
For example,
Based on your application needs and desired outcomes, you can organize your workloads using one
of the following VPC architectures.
The following diagram shows an example architecture with multiple VPCs. The control plane load
balancers in the example architecture are configured as internal load balancers.
Another variant of multiple VPC and multiple AZ design is to have one VPC for the control plane and
another for just workload clusters. The following diagram shows such a design.
Consider the following design implications when designing your network architecture.
Decision ID: TKO-AWS-001
Design Decision: Use separate networks/VPCs for the management cluster and workload clusters.
Design Justification: Better isolation and security policies between environments isolate production Kubernetes clusters from dev/test clusters.
Design Implications: Sharing the same network for multiple clusters can cause a shortage of IP addresses.

Decision ID: TKO-AWS-002
Design Decision: Use separate networks for workload clusters based on their usage.
Design Justification: Isolate production Kubernetes clusters from dev/test clusters.
Design Implications: A separate set of Service Engines can be used for separating dev/test workload clusters from production clusters.
Availability
We recommend deploying your Tanzu Kubernetes Grid cluster in an odd number of AZs to ensure
high availability of components that require consensus to operate in failure modes.
The Tanzu Kubernetes Grid management cluster performs Machine Health Checks on all Kubernetes
worker VMs. This ensures that workloads remain in a functional state and can remediate issues such as:
This health check ensures that your worker capacity remains stable and can be scheduled for
workloads. This health check, however, does not apply to the control plane or the load balancer
VMs. The health check does not recreate VMs due to physical host failure.
Quotas
Provide sufficient quotas to support both the management cluster and the workload clusters in your
deployment. Otherwise, the cluster deployments will fail. Depending on the number of workload
clusters you will deploy, you may need to increase the AWS services quotas from their default
values. You will need to increase the quota in every region in which you plan to deploy Tanzu
Kubernetes Grid.
See Tanzu Kubernetes Grid resources in AWS account for more details.
The number of VPCs depends on the VPC architecture you select. The following table indicates the
number of VPCs for the network architectures in the network diagrams shown above.
VPC Architecture                                                            Number of VPCs
Single VPC                                                                  1
Multiple VPCs - one for the management cluster and one for workload cluster 2
See AWS service quotas for more information on AWS services default quotas.
When making design decisions for your Tanzu Kubernetes Grid clusters, consider the design
implications listed in the following table.
Decision ID: TKO-CLS-001
Design Decision: Deploy TKG management cluster from CLI.
Design Justification: The UI doesn't provide an option to specify an internal registry to use for TKG installation.
Design Implications: Additional parameters are required to be passed in the cluster deployment file. Using the UI, you can't pass these additional parameters.

Decision ID: TKO-CLS-002
Design Decision: Use AWS internal load balancer scheme for your control plane endpoints.
Design Justification: Don't expose Kubernetes API endpoints to the Internet in Tanzu Kubernetes Grid clusters.
Design Implications: Creates additional AWS load balancers in your AWS account which may increase AWS infrastructure cost.

Decision ID: TKO-CLS-003
Design Decision: Deploy Tanzu Kubernetes clusters in large and above sizes of EC2 instances (for example, t2.large or greater).
Design Justification: Allow TKG clusters to have enough resources for all Tanzu packages.
Design Implications: Creates larger AWS EC2 instances in your AWS account which may increase AWS infrastructure cost.

Decision ID: TKO-CLS-004
Design Decision: Deploy Tanzu Kubernetes clusters with Prod plan.
Design Justification: This deploys multiple control plane nodes and provides high availability for the control plane.
Design Implications: TKG infrastructure is not impacted by single node failure.

Decision ID: TKO-CLS-005
Design Decision: Deploy Tanzu Kubernetes clusters with an odd number of AWS AZs for HA.
Design Justification: This deploys multiple control plane nodes and provides high availability for the control plane.
Design Implications: TKG infrastructure is not impacted by single zone failure.

Decision ID: TKO-CLS-006
Design Decision: Enable identity management for Tanzu Kubernetes Grid clusters.
Design Justification: To avoid usage of administrator credentials and ensure that required users with the right roles have access to Tanzu Kubernetes Grid clusters.
Design Implications: The Pinniped package helps with integrating the TKG management cluster with LDAPS authentication, and workload clusters inherit the authentication configuration from the management cluster.

Decision ID: TKO-CLS-007
Design Decision: Enable Machine Health Checks for TKG clusters.
Design Justification: The Tanzu Kubernetes Grid management cluster performs Machine Health Checks on all Kubernetes worker VMs, and HA and Machine Health Checks work together to enhance workload resiliency.
Design Implications: A MachineHealthCheck is a resource within the Cluster API that allows users to define conditions under which Machines within a Cluster should be considered unhealthy. Remediation actions can be taken when MachineHealthCheck has identified a node as unhealthy.
Centralized lifecycle management: managing the creation and deletion of workload clusters
using registered management or supervisor clusters
Centralized management: viewing the inventory of clusters and the health of clusters and
their components
Data protection: managing Velero deployment, configuration, and schedule to ensure that
cluster manifests and persistent volumes are backed up & restorable
For a complete list of Tanzu Mission Control features, see VMware Tanzu Mission Control Feature
Comparison.
To register your management or supervisor cluster for management through Tanzu Mission Control,
navigate to Administration > Management Cluster on the Tanzu Mission Control console and follow
the prompts.
To attach your cluster for management through Tanzu Mission Control, navigate to Clusters > Attach
Cluster on the Tanzu Mission Control console and follow the prompts.
Note
If a workload cluster under management requires a proxy to access the Internet, you
can use the Tanzu Mission Control CLI to generate the YAML necessary to install
Tanzu Mission Control components on it.
Antrea
Calico
Both are open-source software that provide networking for cluster pods, services, and ingress.
When you deploy a Tanzu Kubernetes cluster using the Tanzu CLI with the default configuration,
Antrea CNI is automatically enabled in the cluster. While Kubernetes does have in-built network
policies, Antrea builds on those native network policies to provide more fine-grained network
policies of its own.
Antrea has a ClusterNetworkPolicy which operates at the Kubernetes cluster level. It also has a
NetworkPolicy which limits the scope of a policy to a Kubernetes namespace. The
ClusterNetworkPolicy can be used by a Kubernetes Cluster Admin to create a security policy for the
cluster as a whole. The NetworkPolicy can be used by a developer to secure applications in a
particular namespace. See Tanzu Kubernetes Grid Security and Compliance for more details.
To provision a Tanzu Kubernetes cluster using a non-default CNI, see Deploy Tanzu Kubernetes
clusters with calico.
Each CNI is suitable for a different use case. The following table lists some common use cases for the
two CNIs that Tanzu Kubernetes Grid supports. The information in this table helps you select the
right CNI in your Tanzu Kubernetes Grid implementation.
Antrea
Use case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros: Provides an option to configure an egress IP pool or static egress IP for the Kubernetes workloads.
Cons: More complicated for network troubleshooting because of the additional overlay network.

Calico
Use case: Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads.
Pros: Support for network policies; high network performance; SCTP support.
Cons: No multicast support.
For workload clusters, the Tanzu Kubernetes Grid Contour ingress controller package can be used
for layer 7 load balancing.
If you have deployed with both public and private subnets, by default you will get an Internet-facing
load balancer. If you want a private load balancer, you can specifically request one by setting
service.beta.kubernetes.io/aws-load-balancer-internal: "true" in the annotations of the
service. This setting also applies to the Contour ingress and controls whether Contour is internal-
facing or external-facing.
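As an illustration of the annotation described above, the following is a minimal sketch of a LoadBalancer service applied with kubectl; the service name, selector, and ports are hypothetical placeholders.
## Create an internal-facing LoadBalancer service (placeholder name, selector, and ports)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: my-app
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
EOF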
In Tanzu Kubernetes Grid, you can optionally deploy the external-dns package, which automates the
updates to DNS records in AWS (Route53) associated with ingress resources or LoadBalancer
services. This can also automate DNS record management for externally exposed services.
The Pinniped Supervisor is an OIDC server that authenticates users through an external
identity provider (IDP)/LDAP, and then issues its own federation ID tokens to be passed on
to clusters based on the user information from the IDP.
The Pinniped Concierge is a credential exchange API which takes as input a credential from
an identity source (e.g., Pinniped Supervisor, proprietary IDP), authenticates the user via that
credential, and returns another credential which is understood by the host Kubernetes
cluster or by an impersonation proxy which acts on behalf of the user.
Dex: Pinniped uses Dex as a broker for your upstream LDAP identity provider. Dex is only
deployed when LDAP is selected as the OIDC backend during Tanzu Kubernetes Grid
management cluster creation.
The following diagram shows the Pinniped authentication flow with an external IDP. In the diagram,
the blue arrows represent the authentication flow between the workload cluster, the management
cluster and the external IDP. The green arrows represent Tanzu CLI and kubectl traffic between the
workload cluster, the management cluster and the external IDP.
See the Pinniped docs for more information on how to integrate Pinniped into Tanzu Kubernetes
Grid with OIDC providers and LDAP.
We recommend the following best practices for managing identities in Tanzu Kubernetes Grid
provisioned clusters:
Limit access to management clusters to the appropriate set of users. For example, provide
access only to users who are responsible for managing infrastructure and cloud resources
but not to application developers. This is especially important because access to the
management cluster inherently provides access to all workload clusters.
Limit cluster administrator access for workload clusters to the appropriate set of users. For
example, provide access to users who are responsible for managing infrastructure and
platform resources in your organization, but not to application developers.
Connect to an identity provider to manage the user identities allowed to access cluster
resources instead of relying on administrator-generated kubeconfig files.
Observability
Metrics Monitoring with Tanzu Observability by Wavefront
(Recommended Solution)
Using VMware Tanzu Observability by Wavefront significantly enhances observability. Tanzu
Observability is a VMware SaaS application that collects and displays metrics and trace data from the
full stack platform, as well as from applications. The service provides the ability to create alerts tuned
with advanced analytics, assists in the troubleshooting of systems, and helps you understand the impact of running production code.
Tanzu Observability collects data from Kubernetes and from applications running within Kubernetes.
You can configure Tanzu Observability with an array of capabilities. There are over 200 integrations
with prebuilt dashboards available in Wavefront.
The following table describes the plugins we recommend for this design:
Plugin | Purpose | Key Metrics | Example Metrics
Wavefront Kubernetes Integration | Collect metrics from Kubernetes clusters and pods | Kubernetes container and pod statistics | Pod CPU usage rate
Wavefront by VMware for Istio | Adapts Istio-collected metrics and forwards them to Wavefront | Istio metrics including request rates, trace rates, and throughput | Request rate (transactions per second)
Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards
for your particular deployment. For information on how to customize Tanzu Observability dashboards
for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for
Kubernetes Operations.
Metrics Monitoring with Prometheus and Grafana (Alternative Solution)
Tanzu Kubernetes Grid also supports Prometheus and Grafana as an alternative on-premises solution
for monitoring Kubernetes clusters.
Prometheus exposes scrapable metrics endpoints for various monitoring targets throughout your
cluster. Metrics are ingested by polling the endpoints at a set interval. The metrics are then stored in
a time-series database. You use the Prometheus Query Language interface to explore the metrics.
Grafana is responsible for visualizing Prometheus metrics without the need to manually write the
PromQL queries. You can create custom charts and graphs in addition to the pre-packaged options.
Prometheus and Grafana are user-managed packages available with Tanzu Kubernetes Grid. For
more information about packages bundled with Tanzu Kubernetes Grid, see Install and Configure
Packages. For more information about user-managed packages, see User-Managed Packages.
Log Forwarding
Tanzu also includes Fluent Bit for integration with logging platforms such as vRealize Log Insight Cloud and Elasticsearch. See the Fluent Bit documentation for the supported logging providers.
Summary
Tanzu Kubernetes Grid on AWS offers high-performance potential and convenience, and addresses the challenges of creating, testing, and updating cloud-based Kubernetes platforms in a consolidated production environment. This validated approach results in a production-quality installation with all the application services needed to serve combined or uniquely separated workload types via a combined infrastructure solution.
This plan meets many Day 0 needs for aligning product capabilities, such as configuring firewall
rules, networking, load balancing, and workload compute, to the full stack infrastructure.
Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations on AWS.
VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.
To use Service Installer to automate this deployment, see Deploying Tanzu for Kubernetes
Operations on Non Air-gapped AWS VPC Using Service Installer for VMware Tanzu.
Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.
Prerequisites
Before deploying VMware Tanzu for Kubernetes Operations on AWS, ensure that the following are
set up.
AWS Account: An IAM user account with administrative privileges. Choose an AWS region
where the Tanzu Kubernetes Grid (TKG) AMIs exist.
AWS Resource Quotas: Sufficient quotas to support both the management cluster and the
workload clusters in your deployment. Otherwise, the cluster deployments will fail.
Depending on the number of workload clusters you plan to deploy, you may need to
increase the AWS services quotas from their default values. You will need to increase the
quota in every region in which you deploy Tanzu Kubernetes Grid. For more information on
AWS default service quotas, see AWS service quotas in the AWS documentation.
See Tanzu Kubernetes Grid resources in AWS account for more details.
Note
The number of VPCs will depend on the VPC architecture you have selected.
Bootstrap Machine with AWS CLI Installed: The bootstrap machine can be a local device
such as a laptop, or a virtual machine running in, for example, VMware Workstation or
Fusion. Install the AWS CLI on the bootstrap machine. You can get the AWS CLI through a
package manager such as Homebrew, apt-get, or by downloading the CLI from AWS CLI.
You will use the bootstrap machine to create the AWS VPC and jumpbox.
For additional information about preparing to deploy Tanzu Kubernetes Grid on AWS, see Prepare
to Deploy Management Clusters to Amazon EC2.
export AWS_ACCESS_KEY_ID=xx
export AWS_SECRET_ACCESS_KEY=xx
# Should be a region with at least 3 available AZs
export AWS_REGION=us-east-1
export AWS_PAGER=""
WORKING_DIR="$(pwd)"/tkg-vpc
mkdir -p $WORKING_DIR
3. Create the VPC. This deployment uses a single VPC for all clusters.
Note
# Create a second VPC like:
aws ec2 create-vpc --cidr-block 172.18.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=TKGVPC-2}]' \
  --output json > $WORKING_DIR/vpc2
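For reference, the first VPC implied by this step can be created in the same way; the following sketch assumes the 172.16.0.0/16 CIDR and TKGVPC name tag used elsewhere in this document.
# Create the first (primary) VPC for the TKG clusters
aws ec2 create-vpc --cidr-block 172.16.0.0/16 \
  --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=TKGVPC}]' \
  --output json > $WORKING_DIR/vpc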
4. For each VPC, create a public and private subnet in each AZ.
6. Create the Internet and NAT gateways and attach them to the relevant subnets.
7. If you have an existing transit gateway, you can skip the create-transit-gateway command
and just feed the transit gateway ID into the vpc-attachment command. Otherwise, execute
the following commands to create a new transit gateway.
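A hedged sketch of those commands follows; the VPC and subnet IDs are placeholders.
# Create a new transit gateway (skip if you already have one)
aws ec2 create-transit-gateway --description "TKG transit gateway" \
  --output json > $WORKING_DIR/transit-gw
TGW_ID="$(jq -r .TransitGateway.TransitGatewayId $WORKING_DIR/transit-gw)"
# Attach the VPC to the transit gateway using one private subnet per AZ
aws ec2 create-transit-gateway-vpc-attachment --transit-gateway-id "$TGW_ID" \
  --vpc-id <vpc-id> \
  --subnet-ids <private-subnet-id-1> <private-subnet-id-2> <private-subnet-id-3> \
  --output json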
for i in $WORKING_DIR/subnet-priv-*; do
  subnetId="$(jq -r .Subnet.SubnetId $i)"
  aws ec2 associate-route-table --subnet-id "$subnetId" --route-table-id $PRIV_RT_TABLE_ID --output json
done
for i in $WORKING_DIR/subnet-pub-*; do
  subnetId="$(jq -r .Subnet.SubnetId $i)"
  aws ec2 associate-route-table --subnet-id "$subnetId" --route-table-id $PUB_RT_TABLE_ID --output json
done
1. Create a jumpbox.
2. Wait a few minutes for the instance to start. After it restarts, SSH to the jumpbox.
3. Log in to the jumpbox to install the necessary packages and configurations. Then reboot.
4. Download the Tanzu CLI and other utilities for Linux from the Tanzu Kubernetes Grid
Download Product site.
Note that the command shown below assumes that no process is currently listening on local
port 8080. If it is in use then choose a different port and adjust the SSH command line
accordingly.
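For example, an SSH session that forwards local port 8080 to the jumpbox (the key file, user, and address are placeholders) might look like:
ssh -i tkg-kp.pem -L 8080:127.0.0.1:8080 ubuntu@<jumpbox-public-ip>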
Run the session in screen in case your SSH connection is terminated. If your connection is
terminated, you can reattach to the screen session with screen -r once you have
reconnected.
screen
tar -xzvf tanzu-cli-bundle-linux-amd64.tar.gz
gunzip kubectl-*.gz
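A sketch of the remaining installation steps, assuming the bundle layout of recent Tanzu CLI releases (exact file and version names vary by release):
# Install kubectl (adjust the file name to the version you downloaded)
sudo install kubectl-linux-amd64-v* /usr/local/bin/kubectl
# Install the Tanzu CLI from the extracted bundle
cd cli/
sudo install core/v*/tanzu-core-linux_amd64 /usr/local/bin/tanzu
cd ..
# Verify the binaries; plugin installation steps vary by CLI version, see the product documentation
tanzu version
kubectl version --client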
Running the tanzu config init command for the first time creates the ~/.config/tanzu/tkg
subdirectory, which contains the Tanzu Kubernetes Grid configuration files.
For more information about ytt cluster overlays, see ytt Overlays.
Pinniped is an open-source authentication service for Kubernetes clusters. If you use LDAP
authentication, Pinniped uses Dex as the endpoint to connect to your upstream LDAP identity
provider. If you use OIDC, Pinniped provides its own endpoint, so Dex is not required. Pinniped and
Dex run automatically as in-cluster services in your management cluster.
You enable identity management during management cluster deployment. Therefore, ensure that
you have an IDP/LDAP server setup before you do the Tanzu Kubernetes Grid management cluster
installation.
If you don’t have identity management configured, see Configure Identity Management for a sample
IDP setup. Also see Pinniped Docs for information on Pinniped integration into Tanzu Kubernetes
Grid with various OIDC providers and LDAPs.
Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. See Deploy Management Cluster from the
Tanzu Kubernetes Grid Installer.
OR
Create and edit YAML configuration files, and use the configuration files to deploy a
management cluster with the CLI commands. See Deploy Management Clusters from a
Configuration File.
1. From the jumpbox, execute the following command to launch the installer interface.
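A sketch of the launch command, assuming the default listening port of 8080:
tanzu management-cluster create --ui --bind 127.0.0.1:8080 --browser none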
2. Open a web browser and launch localhost:8080 on the machine running the SSH session.
The Tanzu Kubernetes Grid installer interface displays. Note that if you chose a different
listening port when connecting to the jumpbox then the interface will be available on that
port instead of port 8080.
Note
The screens are provided to help you navigate the installer interface. Enter
the values that are specific to your AWS setup. The screens shown were
taken from the current version at the time of writing and may differ slightly
from other versions.
3. Click Deploy on the Amazon EC2 tile to start the management cluster setup on Amazon
EC2.
4. For IaaS Provider settings, enter your AWS Access Key ID, Secret Access Key, Session
Token, and Region, then click Connect followed by Next. Select the region you selected in
Set up AWS infrastructure.
5. For VPC for AWS settings, select the VPC ID you created in Set up AWS infrastructure,
select the check box next to This is not internet facing vpc and click Next.
6. For Management Cluster Settings, select Production and the instance type for the control
plane nodes.
7. Enter the following specifications for the management cluster and click Next.
EC2 Key Pair: The name of an existing key pair, which you may have created in
Create and Set Up a Jumpbox.
AWS CloudFormation Stack: Select this if this is the first time that you are deploying
a management cluster to this AWS account, see Permissions Set by Tanzu
Kubernetes Grid for more details.
Availability Zone: Select the three availability zones for your region.
VPC Public and Private Subnets: Select the existing subnets on the VPC for each
AZ.
Worker Node Instance Type: Select the configuration for the worker node VMs.
8. For Kubernetes Network, enter the Network CNI settings and click Next.
Optionally, if you already have a proxy server set up and want to send outgoing HTTP(S)
traffic from the management cluster to a proxy, toggle Enable Proxy Settings. For more
information on how to configure proxy settings, see Configure the Kubernetes Network and
Proxies.
9. For Identity Management, toggle Enable Identity Management Settings to configure your
IDP and click Next.
For more information about configuring the identity management settings, see Configure
Identity Management.
10. For OS Image, use the drop-down menu to select the OS and Kubernetes version image
template to use for deploying Tanzu Kubernetes Grid VM. Select Ubuntu OS image (amd64)
and click Next.
11. For Register with Tanzu Mission Control, you can follow these steps to register your Tanzu Kubernetes Grid management cluster with Tanzu Mission Control and generate a Tanzu Mission Control registration URL to enter in the URL field.
13. For CEIP Agreement, select the check box to opt in to the VMware Customer Experience
Improvement Program (CEIP), and click Next.
Before creating a management cluster using the Tanzu CLI, define the base configuration for the
cluster in a YAML file. You specify this file by using the --file option of the tanzu management-
cluster create command.
Note
For Register with Tanzu Mission Control, you can register a management cluster with Tanzu Mission Control to generate a Tanzu Mission Control registration URL and set it in TMC_REGISTRATION_URL: <Tanzu Mission Control registration URL>.
To create a new Tanzu Kubernetes Grid management cluster, run the following command:
AWS_AMI_ID:
AWS_NODE_AZ: us-west-2a
AWS_NODE_AZ_1: ""
AWS_NODE_AZ_2: ""
AWS_PRIVATE_NODE_CIDR: 172.16.0.0/24
AWS_PRIVATE_NODE_CIDR_1: ""
AWS_PRIVATE_NODE_CIDR_2: ""
AWS_PRIVATE_SUBNET_ID: ""
AWS_PRIVATE_SUBNET_ID_1: ""
AWS_PRIVATE_SUBNET_ID_2: ""
AWS_PUBLIC_NODE_CIDR: 172.16.3.0/24
AWS_PUBLIC_NODE_CIDR_1: ""
AWS_PUBLIC_NODE_CIDR_2: ""
AWS_PUBLIC_SUBNET_ID: ""
AWS_PUBLIC_SUBNET_ID_1: ""
AWS_PUBLIC_SUBNET_ID_2: ""
AWS_REGION: us-west-2
AWS_SSH_KEY_NAME: tkg-kp
AWS_VPC_CIDR: 172.16.0.0/16
AWS_VPC_ID: ""
BASTION_HOST_ENABLED: "false"
CLUSTER_CIDR: 172.96.0.0/11
CLUSTER_NAME: tkg-validaton-mc
CLUSTER_PLAN: dev
CONTROL_PLANE_MACHINE_TYPE: t3.large
ENABLE_AUDIT_LOGGING: ""
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: aws
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
NODE_MACHINE_TYPE: m5.large
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
SERVICE_CIDR: 172.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
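Assuming the configuration above is saved to a file such as mgmt-cluster-config.yaml (the file name is a placeholder), the creation command looks like:
tanzu management-cluster create --file mgmt-cluster-config.yaml -v 6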
To use the configuration file from a previous deployment, make a copy of the configuration file with
a new name, open it in a text editor, and update the configuration. VMware recommends using a
dedicated configuration file for each management cluster, with the configuration settings specific to a
single infrastructure.
For more information about deploying a management cluster from a configuration file, see Deploy
Management Clusters from a Configuration File.
Tanzu Kubernetes Grid uses the temporary management cluster to provision the final management
cluster on AWS. For information about how to examine and verify your Tanzu Kubernetes Grid
management cluster deployment, see Examine the Management Cluster Deployment.
Workload clusters can be highly customized through YAML manifests and applied to the management cluster for deployment and lifecycle management. To generate a YAML template that you can update and modify to meet your needs, use the --dry-run switch. Edit the manifests to meet your requirements and apply them to the cluster.
Example:
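A hedged sketch, with placeholder cluster and file names:
# Generate the workload cluster manifest without creating the cluster
tanzu cluster create tkg-workload-01 --file workload-config.yaml --dry-run > tkg-workload-01-manifest.yaml
# After editing the manifest, apply it against the management cluster context
kubectl apply -f tkg-workload-01-manifest.yaml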
After the workload cluster is created, the current context changes to the new workload cluster.
For more information on cluster lifecycle and management, see Manage Clusters.
Auto-Managed Packages
Tanzu Kubernetes Grid automatically installs the auto-managed packages during cluster creation. For
more information about auto-managed packages, see Auto-Managed Packages.
CLI-Managed Packages
A CLI-managed package is an optional component of a Kubernetes cluster that you can install and
manage with the Tanzu CLI. These packages are installed after cluster creation. CLI-managed
packages are grouped into package repositories in the Tanzu CLI. If a package repository that
contains CLI-managed packages is available in the target cluster, you can use the Tanzu CLI to install
and manage any of the packages from that repository.
Using the Tanzu CLI, you can install CLI-managed packages from the built-in tanzu-standard
package repository or from package repositories that you add to your target cluster. From the tanzu-
standard package repository, you can install the Cert Manager, Contour, External DNS, Fluent Bit,
Grafana, Harbor, Multus CNI, and Prometheus packages. For more information about CLI-managed
packages, see CLI-Managed Packages.
If you want to deploy Harbor to a shared services cluster, create the shared services cluster if it does not already exist. For instructions, see Create a Shared Services Cluster. Also, make sure that you add INFRASTRUCTURE_PROVIDER: aws to the shared services cluster configuration file.
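A hedged sketch of installing Harbor after the shared services cluster exists; the namespace, values file, and version are placeholders, and older Tanzu CLI versions use --package-name instead of --package.
# List the Harbor versions available in the tanzu-standard repository
tanzu package available list harbor.tanzu.vmware.com -A
# Install Harbor with a customized values file
tanzu package install harbor \
  --package harbor.tanzu.vmware.com \
  --version <available-version> \
  --values-file harbor-data-values.yaml \
  --namespace tkg-system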
Delete Clusters
The procedures in this section are optional. They are provided in case you want to clean up your
production or lab environment.
Note
Be sure to wait until all the workload clusters have been reconciled before deleting the management cluster; otherwise, the infrastructure will need to be cleaned up manually.
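A hedged sketch of the deletion sequence, with placeholder cluster names:
# Delete each workload cluster and wait for the deletions to reconcile
tanzu cluster delete tkg-workload-01
tanzu cluster list                    # repeat until no workload clusters remain
# Then delete the management cluster
tanzu management-cluster delete tkg-mgmt-aws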
VMware Tanzu for Kubernetes Operations on Azure Reference Design
This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations with Tanzu components on Microsoft Azure. This reference design is based on the architecture and components described in VMware Tanzu for Kubernetes Operations Reference Architecture.
Note
This reference design is supported and validated for customers deploying Tanzu
Kubernetes Grid 1.6 on Microsoft Azure.
The Tanzu Kubernetes Grid user interface (UI) provides a guided deployment experience that is
tailored for Microsoft Azure. The Tanzu Kubernetes Grid installer runs either on an operator’s own
machine (it uses Docker) or through a bootstrap machine or a jump box.
Note
When using a bootstrap machine or a jump box, you may not be able to use the
Tanzu Kubernetes Grid UI to build your configuration of the management and
workload clusters. In such cases, use the following sample YAML file to help kickstart
the installation process.
AZURE_ENVIRONMENT: "AzurePublicCloud"
AZURE_CLIENT_ID: <AZURE_CLIENT_ID>
AZURE_CLIENT_SECRET: <AZURE_CLIENT_SECRET>
AZURE_CONTROL_PLANE_MACHINE_TYPE: Standard_D2s_v3
AZURE_CONTROL_PLANE_SUBNET_CIDR: 10.0.1.0/26
AZURE_CONTROL_PLANE_SUBNET_NAME: mgmt-control-subnet
AZURE_ENABLE_PRIVATE_CLUSTER: "true"
AZURE_FRONTEND_PRIVATE_IP: 10.0.1.4
AZURE_LOCATION: eastus2
AZURE_NODE_MACHINE_TYPE: Standard_D2s_v3
AZURE_NODE_SUBNET_CIDR: 10.0.1.64/26
AZURE_NODE_SUBNET_NAME: mgmt-worker-subnet
AZURE_RESOURCE_GROUP: bch-tkg-east
AZURE_SSH_PUBLIC_KEY_B64: <BASE64-SSH-PUBLIC>
AZURE_SUBSCRIPTION_ID: <AZURE_SUBSCRIPTION_ID>
AZURE_TENANT_ID: <AZURE_TENANT_ID>
AZURE_VNET_CIDR: 10.0.0.0/16
AZURE_VNET_NAME: bch-vnet-tkg
AZURE_VNET_RESOURCE_GROUP: bch-tkg-east
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: bchcluster-mgmt-east
CLUSTER_PLAN: prod
ENABLE_AUDIT_LOGGING: "true"
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
INFRASTRUCTURE_PROVIDER: azure
OS_ARCH: amd64
OS_NAME: ubuntu
OS_VERSION: "20.04"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
requirements. However, not all Azure platform services can be tightly integrated into the Tanzu
Kubernetes Grid installation.
Tanzu Clusters
A Kubernetes cluster is made up of several components that act as a control plane of the cluster and
a set of supporting components and worker nodes that actually help run the deployed workloads.
There are two types of clusters in the Tanzu Kubernetes Grid setup: management cluster and
workload cluster. The Tanzu Kubernetes Grid management cluster hosts all the Tanzu Kubernetes
Grid components used to manage workload clusters. Workload clusters, which are spun up by Tanzu
Kubernetes Grid administrators, run the containerized applications. Cluster security is a shared
responsibility between Tanzu Kubernetes Grid cluster administrators, developers, and operators who
run applications on Tanzu Kubernetes Grid clusters.
Network Design
VMware recommends using one of the following production-level network designs for deploying
Tanzu Kubernetes Operations on Azure:
Considerations
The network designs are based on a default Tanzu CLI deployment for a production-level installation.
The designs use the default configuration values when running the Tanzu CLI. However, you have
complete control over how many nodes are deployed within the workload clusters for both the
control plane and worker nodes. You also determine the Azure components with which the clusters
will integrate.
1. Use a minimum CIDR range of /28. Due to the way that Azure implements its IP addressing scheme within subnets, VMware recommends a minimum CIDR range of /28 for a Tanzu deployment to allow for scalability of each cluster.
2. Use only the required Microsoft Azure components that are necessary for deploying Tanzu
Kubernetes Grid on Microsoft Azure.
3. Fit into any production-level network design that you may have in place.
4. Use the default security and DevOps tooling available with an Azure subscription. The
security and DevOps tools are shown in the column to the right of the network designs.
5. Do not make assumptions or provide designs for the outer perimeter of your network design.
You may use Azure or third-party services. The outer perimeter network design should not
affect the network designs for Tanzu Kubernetes Operations on Microsoft Azure.
6. Integrating with SaaS services, such as Tanzu Mission Control and Tanzu Observability,
requires that the Tanzu Kubernetes clusters have outbound SSL-based connectivity to the
Internet. Add a rule to allow port 443. Add the rule to the Network Security Groups (NSGs)
that are applied to the subnet where the control plane VMs are deployed. Allow port 443 to
all targets until VMware can provide a more detailed list of targeted CNAMES or IP ranges.
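A hedged sketch of such a rule using the Azure CLI; the resource group, NSG name, and priority are placeholders.
az network nsg rule create \
  --resource-group <resource-group> \
  --nsg-name <control-plane-subnet-nsg> \
  --name Allow-Outbound-443 \
  --priority 300 \
  --direction Outbound \
  --access Allow \
  --protocol Tcp \
  --destination-port-ranges 443 \
  --destination-address-prefixes Internet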
Quotas
Provide sufficient quotas to support both the management cluster and the workload clusters in your
deployment. Otherwise, the cluster deployments will fail. Depending on the number of workload
clusters you will deploy, you may need to increase the following quotas from their default values.
You will need to increase these quotas in every region in which you plan to deploy Tanzu
Kubernetes Grid.
The Tanzu Kubernetes Grid documentation suggests that you assign the Contributor role to the Service Principal (SP). However, because the Tanzu CLI creates the VMs and networking components, for security reasons VMware recommends assigning only the Virtual Machine Contributor and Network Contributor roles to the SP.
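A hedged sketch of the role assignments using the Azure CLI; the application ID and subscription ID are placeholders.
az role assignment create --assignee <service-principal-app-id> \
  --role "Virtual Machine Contributor" --scope "/subscriptions/<subscription-id>"
az role assignment create --assignee <service-principal-app-id> \
  --role "Network Contributor" --scope "/subscriptions/<subscription-id>"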
Virtual Network
Because Tanzu for Kubernetes operations is deployed as an IaaS solution on Azure, the Kubernetes
clusters must exist within the boundary of an Azure Virtual Network (VNet). Therefore, place the
bootstrap machine, which is used to run the Tanzu CLI, in the same VNet as the Tanzu management
cluster. Place the management cluster in its own subnet.
The workload clusters can exist within the same VNet, but in different subnets, or in a completely
separate VNet. However, ensure that the Workload VNet is peered with the VNet where the
management cluster is deployed.
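If the workload clusters use a separate VNet, peering in both directions might look like the following sketch; the names, resource groups, and VNet resource IDs are placeholders.
az network vnet peering create --name workload-to-mgmt \
  --resource-group <workload-rg> --vnet-name <workload-vnet> \
  --remote-vnet <mgmt-vnet-resource-id> --allow-vnet-access
az network vnet peering create --name mgmt-to-workload \
  --resource-group <mgmt-rg> --vnet-name <mgmt-vnet> \
  --remote-vnet <workload-vnet-resource-id> --allow-vnet-access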
Load Balancer
When you deploy a management or workload cluster using Tanzu CLI, a load balancer is created
and attached to both the control plane and the worker node clusters. The load balancers are used
only for running Kubernetes traffic to the underlying nodes. The Kubernetes engine does not use
the load balancers for traffic to service pods within the cluster.
The option to make the clusters private or public is controlled by the AZURE_ENABLE_PRIVATE_CLUSTER
configuration option within the config.yaml file that you use to deploy the cluster. Setting this option
to false tells the deployment process to create a public IP address and attach it to each load
balancer. Setting it to true requires you to specify a value in the AZURE_FRONTEND_PRIVATE_IP
configuration option, and attaches an IP address from your specified subnet to the load balancer.
Before you begin your deployment, it is important to ensure that the necessary pathways are open
to all pieces of the clusters and that they are able to talk to one another. The following are the
primary requirements:
Control Plane VMs/Subnet – HTTPS Inbound/Outbound to Internet and SSH and Secure
Kubectl (22, 443, and 6443) Inbound/Outbound within the VNet
Worker Node VMs/Subnet – Secure Kubectl (6443) Inbound/Outbound within the VNet
Note
HTTPS traffic to the bootstrap machine and the control plane nodes is required so
that they can download the necessary container images for the clusters to function
properly.
Virtual Machines
The primary component of the Tanzu Kubernetes Grid installation is the VMs that are created to work
either as the control plane or as worker nodes within the cluster. You can leverage many different
VM sizes, including GPU-based VMs, when you deploy your clusters. The default VM size is the
standard D2s_V3 and the minimum requirement for Azure instance types is 2 CPUs and 8 GB
memory.
VMware recommends that Resource Groups, VNets, subnets, and Network Security Groups are
created before you start a deployment.
Important
All clusters are deployed in a highly available state across Availability Zones within a
given Azure region. However, this does mean that regions that do not have
Availability Zones will not support Tanzu Kubernetes Grid deployments.
Azure Backup
As with any IaaS-based solution within Azure, VMware recommends that an Azure Backup Recovery
Vault is deployed and made available to all VMs. The availability of Azure Backup is important for the
control plane clusters and the bootstrap machine because that is where the Kubernetes and Tanzu
configurations are stored and managed.
Azure Monitor
The Azure Monitor set of services is automatically turned on for all customers within their given
subscription. Although Tanzu for Kubernetes Operations provides monitoring and logging, it does
not capture information on many of the Azure components mentioned in this reference design.
Therefore, it is important to use the available Azure Monitor features that Microsoft provides, such as:
Activity Log
Network Watcher
Diagnostics/Metrics/Alerts
Bastion Host
Microsoft Azure creates an Azure Bastion service by default. You can use the service as a jump box
to the bootstrap machine.
This reference design uses a bootstrap machine that does cluster deployments using the Tanzu CLI.
However, your security requirements may not allow access from your cluster to a bootstrap machine
inside your firewall. In such cases, after the initial cluster creation, you can connect your clusters to
Tanzu Mission Control for lifecycle management.
Public IP
Use of a public IP address for the Kubernetes API server is optional. You can host your Kubernetes
API server on a private IP address. In fact, this reference design uses a private IP address. Access is
provided through a public endpoint in a DMZ with a Web Application Firewall (WAF) or through
some kind of VPN level connectivity, such as Express Route or Site-to-Site VPN with connectivity
back to your on-premises network.
Note
Keep in mind that the default deployment of Tanzu Kubernetes Grid creates public
facing clusters. Make sure you set AZURE_ENABLE_PRIVATE_CLUSTER to true if you want
to deploy your Kubernetes clusters on a private IP address.
Container Registries
Numerous container registry options are available, and you may already have one in place. Tanzu
comes pre-packaged with its own registry, called Harbor, which can be made available directly within
a Tanzu Kubernetes Grid workload cluster. If you are hosting your Kubernetes clusters on a private
IP address as described in this reference design, the Harbor registry sits in a workload cluster in the
same network architecture as all other clusters. This design allows only private traffic access to the
container images.
DockerHub is another registry option; pulling images from it requires outbound Internet connectivity from the clusters.
Note
Ensure that Tanzu Kubernetes clusters have outbound SSL-based connectivity to the
Internet.
Centralized lifecycle management: managing the creation and deletion of workload clusters
using registered management or supervisor clusters
Centralized management: viewing the inventory of clusters and the health of clusters and
their components
Data protection: managing Velero deployment, configuration, and schedule to ensure that
cluster manifests and persistent volumes are backed up and restorable
For a complete list of features that Tanzu Mission Control includes with Tanzu, see this chart.
To attach your cluster for management through Tanzu Mission Control, navigate to Clusters > Attach
Cluster on the Tanzu Mission Control console and follow the prompts.
Note
If a workload cluster under management requires a proxy to access the Internet, you
can use the Tanzu Mission Control CLI to generate the YAML necessary to install
Tanzu Mission Control components on it.
For workload clusters, the Tanzu Kubernetes Grid Contour ingress controller package can be used for layer 7 load balancing.
In Tanzu Kubernetes Grid, you can optionally deploy the external-dns package, which automates updating the DNS records in Azure DNS that are associated with ingress resources or load balancer services. This can also automate DNS record management for externally exposed services.
The Pinniped Supervisor is an OIDC server which authenticates users through an external
identity provider (IDP)/LDAP, and then issues its own federation ID tokens to be passed on
to clusters based on the user information from the IDP.
The Pinniped Concierge is a credential exchange API which takes as input a credential from
an identity source (e.g., Pinniped Supervisor, proprietary IDP), authenticates the user via that
credential, and returns another credential which is understood by the host Kubernetes
cluster or by an impersonation proxy which acts on behalf of the user.
Dex - Pinniped uses Dex as a broker for your upstream LDAP identity provider. Dex is
deployed only when LDAP is selected as the OIDC backend during Tanzu Kubernetes Grid
management cluster creation.
The following diagram shows the Pinniped authentication flow with an external IDP. In the diagram,
the blue arrows represent the authentication flow between the workload cluster, the management
cluster, and the external IDP. The green arrows represent Tanzu CLI and kubectl traffic between the
workload cluster, the management cluster, and the external IDP.
See the Pinniped Docs for more information on how to integrate Pinniped into Tanzu Kubernetes
Grid with OIDC providers and LDAP.
VMware recommends the following best practices for managing identities in clusters provisioned with
Tanzu Kubernetes Grid:
Limit access to management clusters to the appropriate set of users. For example, provide
access only to users who are responsible for managing infrastructure and cloud resources
but not to application developers. This is especially important because access to the
management cluster inherently provides access to all workload clusters.
Limit cluster administrator access for workload clusters to the appropriate set of users. For
example, provide access to users who are responsible for managing infrastructure and
platform resources in your organization, but not to application developers.
Connect to an identity provider to manage the user identities allowed to access cluster
resources instead of relying on administrator-generated kubeconfig files.
Observability
Note
Ensure that Tanzu Kubernetes clusters have outbound SSL-based connectivity to the
Internet.
Tanzu Observability collects data from components in Azure, Kubernetes, and applications running
within Kubernetes.
You can configure Tanzu Observability with an array of capabilities. The following table describes the
plugins that VMware recommends for this design:
Plugin | Purpose | Key Metrics | Example Metrics
Wavefront Kubernetes Integration | Collects metrics from Kubernetes clusters and pods | Kubernetes container and pod statistics | Pod CPU usage rate
Wavefront by VMware for Istio | Adapts Istio-collected metrics and forwards them to Wavefront | Istio metrics including request rates, trace rates, and throughput | Request rate (transactions per second)
Tanzu Observability provides various out-of-the-box dashboards. You can customize the dashboards
for your particular deployment. For information on how to customize Tanzu Observability dashboards
for Tanzu for Kubernetes Operations, see Customize Tanzu Observability Dashboard for Tanzu for
Kubernetes Operations.
Prometheus operates by exposing scrapable metrics endpoints for various monitoring targets
throughout your cluster. Metrics are ingested by polling the endpoints on a set interval which are
then stored in a time-series database. Metrics data can be explored via the Prometheus Query
Language interface.
Grafana is responsible for visualizing Prometheus metrics without the need to manually write PromQL
queries. Custom charts and graphs can be created in addition to the pre-packaged options.
The Tanzu Kubernetes Grid extension bundles contain instructions and manifests for deploying these tools.
Prometheus and Grafana are user-managed packages available with Tanzu Kubernetes Grid. For
more information about packages bundled with Tanzu Kubernetes Grid, see Install and Configure
Packages. For more information about user-managed packages, see User-Managed Packages.
Log Forwarding
Tanzu also includes Fluent Bit for integration with logging platforms such as vRealize Log Insight, Elasticsearch, and other logging aggregators. For information on configuring Fluent Bit for your logging provider, see Implement Log Forwarding with Fluent Bit.
Summary
Tanzu Kubernetes Grid on Azure offers high-performance potential and convenience, and addresses the challenges of creating, testing, and updating cloud-based Kubernetes platforms in a consolidated production environment. This validated approach results in a production-quality installation with all the application services needed to serve combined or uniquely separated workload types via a combined infrastructure solution.
This plan meets many Day 0 needs for aligning product capabilities, such as configuring firewall rules, networking, load balancing, and workload compute, to the full stack infrastructure.
Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations on Microsoft Azure.
The scope of this document is limited to providing the deployment steps based on the following
reference design. The reference design represents one of the two production-level reference
designs described in VMware Tanzu for Kubernetes Operations on Azure Reference Design.
This design shows both the Tanzu Kubernetes Grid management cluster and workload clusters in the
same virtual network along with the bootstrap machine. However, each cluster is placed in its own
subnet. In addition, the control plane and worker nodes of each cluster are also separated by a
subnet.
1. The reference design shows the deployment of only the base components within Tanzu
Kubernetes Grid.
2. The reference design fits in with any production-level design that a customer may have in
place, such as a Hub and Spoke, Global WAN Peering, or just a simple DMZ based
implementation.
3. This guide does not make any assumptions about your chosen tooling for security or
DevOps, other than what is available with a default Azure subscription.
Note
You can use this guide to deploy additional workload clusters and workload clusters
of a different size. However, you’ll need to make additional configuration changes.
You can make these configuration changes after you have gone through the
deployment steps provided in this document.
Prerequisites
Ensure that you have:
An SSH key and the Base 64 encoded value of the public key. You will configure the Base
64 encoded value for the AZURE_SSH_PUBLIC_KEY_B64 parameter of the configuration
file for deploying Tanzu Kubernetes Grid. How you generate the SSH key and how you
encode the entire public key is up to you. However, you will need to encode the public key
before storing it in the Tanzu Kubernetes Grid deployment configuration file.
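For example, one way to generate the key pair and the Base64-encoded public key (the file name and comment are placeholders):
ssh-keygen -t rsa -b 4096 -f ./tkg-azure -C "tkg-admin"
base64 -w0 ./tkg-azure.pub      # on macOS, use: base64 -i ./tkg-azure.pub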
Access to Customer Connect and the available downloads for Tanzu Kubernetes Grid. To
verify that you have access, go to VMware Tanzu Kubernetes Grid Download Product.
2. Set up Bootstrap VM
ARM Template
Parameters
The ARM template contains parameters that you can populate or customize so that your Azure
environment uses your naming standards and networking requirements.
Virtual Network
5 Subnets
Bootstrap
Uses the Region where the Resource Group is located to specify where the resources should
be deployed.
Contains security rules for each of the Network Security Groups attached to the Control
Plane clusters. These rules allow for SSH and secure kubectl access from the public Internet.
Access from the public Internet makes troubleshooting easier while you deploy your
management and workload clusters. You can remove these rules after you complete your
deployment.
Quotas
To successfully deploy Tanzu Kubernetes Grid to Azure, ensure that the quotas are sufficient to
support both the management cluster and workload cluster deployments. Otherwise, the
deployments will fail.
Review the quotas for the following resources, which are included in the ARM template, and
increase their values as needed. Increase the quotas for every region to which you plan to deploy
Tanzu Kubernetes Grid.
Family vCPUs based on your chosen virtual machine family (D, E, F, etc.)
Based on the recommended minimum virtual machine size of D2s_v3, the following minimum
quotas are required per cluster:
Ensure that you increase the quotas if you make changes to the basic configuration of the clusters.
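One way to check current usage against the quotas is with the Azure CLI; a sketch, with the region as a placeholder:
az vm list-usage --location <region> --output table | grep -iE "Total Regional|DSv3"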
There are multiple methods to deploy an ARM template on Azure. If you are experienced with
Azure, you can deploy the ARM template in a method that is comfortable to you.
Otherwise, you can use the example Azure CLI commands locally or in Azure Cloud Shell. If you
prefer to use Azure PowerShell, use the example command for Azure PowerShell.
Ensure that you have the following to deploy the ARM template in Microsoft Azure:
Azure CLI
Run the following example Azure CLI command locally or in Azure Cloud Shell to deploy the ARM
template.
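A hedged sketch, assuming the template and parameter files are saved locally as azuredeploy.json and azuredeploy.parameters.json:
az group create --name <resource-group> --location <region>
az deployment group create \
  --resource-group <resource-group> \
  --template-file azuredeploy.json \
  --parameters @azuredeploy.parameters.json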
Azure PowerShell
Alternatively, run the following example command in Azure PowerShell to deploy the ARM template.
Azure Portal
If you prefer to use the Azure Portal, do the following to process an ARM template directly on the
Azure Portal.
1. Search and click Deploy a Custom Template > Build your own template in the editor.
3. Fill in the parameter values so that the values are specific to your deployment.
VMware recommends that you create the SP on the Azure Portal. However, if you prefer to use
either the Azure CLI or Azure PowerShell, see the following Microsoft product documentation:
Azure CLI
Azure PowerShell
Important
Name: Enter a name that reflects what the App Registration is being used for.
Example: tanzucli
4. Copy the values for the Application client ID and Directory (tenant) ID from the Overview
page. You will need the IDs for running the Tanzu CLI.
You will use the key for programmatic authentication and execution.
1. Click Certificates & secrets > Client secrets > New client secret.
The roles provide the minimum level of permissions required for the Tanzu CLI to function
properly within Azure.
Assign the roles through the Subscription scope. Depending on your security boundaries,
you can also assign it at the Resource Group scope.
Important
To assign a role to the SP, you must have either the Owner role or User
Access Administration role within the scope of the Azure subscription.
1. Find your specific Subscription on the Azure Portal and go to Access Control (IAM) >
Roles.
3. In the Add role assignment page, select User, group, or service principal.
4. For Select Members, search for the new SP name you created.
7. Make a note of the following information. You will need the information to create the
configuration files to set up the Bootstrap machine and Tanzu CLI.
Azure Subscription ID
Set Up Bootstrap VM
You will use the bootstrap VM to deploy the Tanzu Kubernetes Grid management and workload
clusters. Create the bootstrap VM after you have set up your Microsoft Azure environment.
Docker
Azure CLI
Tanzu CLI
Tanzu Kubectl
3. Run the following Shell commands to set up the bootstrap VM. Replace the variables with the
VMware account information needed to access VMware Customer Connect and Azure IDs
for the Azure subscription on which you created resources using the ARM template and
Application Registration/Service Principal.
# Variables
export VMWUSER="<CUSTOMER_CONNECT_USERID>"
export VMWPASS="<CUSTOMER_CONNECT_PWD>"
export AZURETENANTID="<AAD Tenant ID>"
export AZURESUBSCRIPTION="<Subscription GUID>"
export AZURECLIENTID="<Service Principal ID>"
export AZURECLIENTSECRET="<Service Principal Secret>"
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# Optional Verification
# docker run hello-world
Note
Because of permission issues, you will have to log out and log in to the bootstrap VM
after installing Docker and before you download and install the Tanzu components.
If you prefer not to copy and paste code, you can use the following sample script files:
bootstrapsetup.sh
bootstraptanzu.sh
The ex-config.yaml sample YAML file contains the minimum configuration needed to deploy
a management cluster and workload clusters. The configuration contains the default values
used in the ARM template. Change the values in the YAML as needed for your deployment.
For example, replace the values for the Azure IDs, Application Registration/Service Principal,
cluster name, and the Base 64 encoded value of the public key.
2. Run the following commands from your bootstrap VM to create the management and
workload clusters.
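A hedged sketch of those commands, assuming the sample configuration is saved as ex-config.yaml and a similar file exists for the workload cluster:
# Create the management cluster from the configuration file
tanzu management-cluster create --file ./ex-config.yaml -v 6
# Create a workload cluster (name and file are placeholders)
tanzu cluster create <workload-cluster-name> --file ./workload-config.yaml -v 6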
For additional product documentation on how to create the YAML configuration file and what each
value corresponds to in Azure, see Management Cluster Configuration for Microsoft Azure.
These packages are available for deployment in each workload cluster that you deploy, but they are
not automatically installed and working as pods.
Tanzu Kubernetes Grid includes two types of packages, auto-managed packages and CLI-managed
packages.
Auto-Managed Packages
Tanzu Kubernetes Grid automatically installs the auto-managed packages during cluster creation. For
more information about auto-managed packages, see Auto-Managed Packages.
CLI-Managed Packages
A CLI-managed package is an optional component of a Kubernetes cluster that you can install and
manage with the Tanzu CLI. These packages are installed after cluster creation. CLI-managed
packages are grouped into package repositories in the Tanzu CLI. If a package repository that
contains CLI-managed packages is available in the target cluster, you can use the Tanzu CLI to install
and manage any of the packages from that repository.
Using the Tanzu CLI, you can install CLI-managed packages from the built-in tanzu-standard
package repository or from package repositories that you add to your target cluster. From the tanzu-
standard package repository, you can install the Cert Manager, Contour, External DNS, Fluent Bit,
Grafana, Harbor, Multus CNI, and Prometheus packages. For more information about CLI-managed
packages, see CLI-Managed Packages.
If your deployment requires Harbor to take on a heavy load and store large images in the registry,
you can install Harbor into a separate workload cluster.
The following documentation lays out the reference designs for deploying Tanzu for Kubernetes Operations (informally known as TKO) on vSphere. Separate reference designs are provided for environments that use vSphere networking (VDS) and for environments that use NSX-T.
VMware Tanzu for Kubernetes Operations on vSphere with NSX-T Reference Design
Deploy VMware Tanzu for Kubernetes Operations on VMware vSphere with VMware
NSX-T
This document describes a reference design for deploying VMware Tanzu for Kubernetes
Operations on vSphere backed by vSphere Networking (VDS).
The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.
Management Cluster - A management cluster is the first element that you deploy when you create a
Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster that performs the
role of the primary management and operational center for the Tanzu Kubernetes Grid instance. The
management cluster is purpose-built for operating the platform and managing the lifecycle of Tanzu
Kubernetes clusters.
ClusterClass API - Tanzu Kubernetes Grid 2 functions through the creation of a management
Kubernetes cluster which holds ClusterClass API. The ClusterClass API then interacts with the
infrastructure provider to service workload Kubernetes cluster lifecycle requests. The earlier
primitives of Tanzu Kubernetes Clusters will still exist for Tanzu Kubernetes Grid 1.X. A new feature
has been introduced as a part of Cluster API called ClusterClass which reduces the need for
redundant templating and enables powerful customization of clusters. The whole process for creating
a cluster using ClusterClass is the same as before but with slightly different parameters.
Workload Clusters - Workload Clusters are the Kubernetes clusters in which your application
workloads run. These clusters are also referred to as Tanzu Kubernetes clusters. Workload Clusters
can run different versions of Kubernetes, depending on the needs of the applications they run.
Shared Service Cluster - Each Tanzu Kubernetes Grid instance can only have one shared services
cluster. You will deploy this cluster only if you intend to deploy shared services such as Contour and
Harbor.
Tanzu Kubernetes Cluster Plans - A cluster plan is a blueprint that describes the configuration with
which to deploy a Tanzu Kubernetes cluster. It provides a set of configurable values that describe
settings like the number of control plane machines, worker machines, VM types, and so on. This
release of Tanzu Kubernetes Grid provides two default templates, dev and prod.
Tanzu Kubernetes Grid Instance - A Tanzu Kubernetes Grid instance is the full deployment of Tanzu
Kubernetes Grid, including the management cluster, the workload clusters, and the shared services
cluster that you configure.
Tanzu CLI - A command-line utility that provides the necessary commands to build and operate Tanzu management and Tanzu Kubernetes clusters. Starting with TKG 2.3.0, the Tanzu Core CLI is distributed separately from Tanzu Kubernetes Grid. For more information about installing the Tanzu CLI for use with Tanzu Kubernetes Grid, see Install the Tanzu CLI.
Carvel Tools - Carvel is an open-source suite of reliable, single-purpose, composable tools that aid
in building, configuring, and deploying applications to Kubernetes. Tanzu Kubernetes Grid uses the
following Carvel tools:
ytt - A command-line tool for templating and patching YAML files. You can also use ytt to
collect fragments and piles of YAML into modular chunks for reuse.
kapp - The application deployment CLI for Kubernetes. It allows you to install, upgrade, and
delete multiple Kubernetes resources as one application.
imgpkg - A tool that enables Kubernetes to store configurations and the associated
container images as OCI images, and to transfer these images.
yq - A lightweight and portable command-line YAML, JSON, and XML processor. yq uses jq-
like syntax but works with YAML files as well as JSON and XML.
Bootstrap Machine - The bootstrap machine is the laptop, host, or server on which you download
and run the Tanzu CLI. This is where the initial bootstrapping of a management cluster occurs before
it is pushed to the platform where it will run.
Tanzu Kubernetes Grid Installer - The Tanzu Kubernetes Grid installer is a graphical wizard that you launch by running the tanzu management-cluster create --ui command. The installer wizard runs locally on the bootstrap machine and provides a user interface to guide you through the process of deploying a management cluster.
For Kubernetes stateful workloads, Tanzu Kubernetes Grid installs the vSphere Container Storage Interface (vSphere CSI) driver to provision Kubernetes persistent volumes for pods automatically. While the
default vSAN storage policy can be used, site reliability engineers (SREs) and administrators should
evaluate the needs of their applications and craft a specific vSphere Storage Policy. vSAN storage
policies describe classes of storage such as SSD and NVME, as well as cluster quotas.
In vSphere 7u1+ environments with vSAN, the vSphere CSI driver for Kubernetes also supports
creating NFS File Volumes, which support ReadWriteMany access modes. This allows for
provisioning volumes which can be read and written from multiple pods simultaneously. To support
this, the vSAN File Service must be enabled.
You can also use other types of vSphere datastores. There are Tanzu Kubernetes Grid Cluster Plans
that operators can define to use a certain vSphere datastore when creating new workload clusters.
All developers would then have the ability to provision container-backed persistent volumes from
that underlying datastore.
Tanzu Kubernetes Grid supports the following Container Network Interface (CNI) options for cluster networking:
Antrea
Calico
Both are open-source software that provide networking for cluster pods, services, and ingress.
When you deploy a Tanzu Kubernetes cluster using Tanzu Mission Control or Tanzu CLI, Antrea CNI
is automatically enabled in the cluster.
Tanzu Kubernetes Grid also supports Multus CNI which can be installed through Tanzu user-
managed packages. Multus CNI lets you attach multiple network interfaces to a single pod and
associate each with a different address range.
To provision a Tanzu Kubernetes cluster using a non-default CNI, see the following instructions:
Each CNI is suitable for a different use case. The following table lists some common use cases for the three CNIs that Tanzu Kubernetes Grid supports, to help you select the right CNI for your Tanzu Kubernetes Grid implementation.
CNI | Use Case | Pros and Cons
Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Provides an option to configure an egress IP pool or static egress IP for Kubernetes workloads. Cons: No multicast support.
Multus | Multus CNI provides multiple interfaces per Kubernetes pod. Using Multus CRDs, you can specify which pods get which interfaces and allow different interfaces depending on the use case. | Pros: Separation of data/control planes.
In this architecture, NSX Advanced Load Balancer provides:
The L7 ingress service provider for the applications hosted on the TKG clusters.
The L4 load balancer for the Kubernetes cluster control plane API server.
Each workload cluster integrates with NSX ALB by running an NSX ALB Kubernetes Operator (AKO)
on one of its nodes. The cluster’s AKO calls the Kubernetes API to manage the lifecycle of load
balancing and ingress resources for its workloads.
The Enterprise Edition is the default licensing tier for an Avi Controller. A new Avi Controller is set up
in the Enterprise Edition licensing tier, and the Controller can be switched from one edition to
another. For more information about NSX ALB Feature comparison, see NSX Advanced Load
Balancer Editions.
For more information about VMware NSX ALB Enterprise edition, see VMware NSX ALB Enterprise
Edition.
For more information about VMware NSX ALB essentials for Tanzu edition, see VMware NSX ALB
essentials for Tanzu.
NSX Advanced Load Balancer Controller - NSX ALB Controller manages Virtual Service
objects and interacts with the vCenter Server infrastructure to manage the lifecycle of the
service engines (SEs). It is the central repository for the configurations and policies related to
services and management, and it provides the portal for viewing the health of VirtualServices
and SEs and the associated analytics that NSX ALB provides.
NSX Advanced Load Balancer Service Engine - The service engines (SEs) are lightweight
VMs that handle all data plane operations by receiving and executing instructions from the
controller. The SEs perform load balancing and all client- and server-facing network
interactions.
Cloud - Clouds are containers for the environment that NSX ALB is installed or operating within. During the initial setup of NSX Advanced Load Balancer, a default cloud, named Default-Cloud, is created, and the first controller is deployed into it. Additional clouds may be added, containing SEs and virtual services.
NSX ALB Kubernetes Operator (AKO) - It is a Kubernetes operator that runs as a pod in the
Supervisor Cluster and Tanzu Kubernetes clusters, and it provides ingress and load balancing
functionality. AKO translates the required Kubernetes objects to NSX ALB objects and
automates the implementation of ingresses, routes, and services on the service engines (SE)
through the NSX ALB Controller.
AKO Operator (AKOO) - This is an operator used to deploy, manage, and remove the AKO pod in Kubernetes clusters. When deployed, this operator creates an instance of the AKO controller and installs all the relevant objects, such as:
AKO StatefulSet
Tanzu Kubernetes Grid management clusters have an AKO operator installed out-of-the-box during cluster deployment. By default, a Tanzu Kubernetes Grid management cluster has two AkoDeploymentConfig objects created, which dictate when and how AKO pods are created in the workload clusters. For more information, see the AKO Operator documentation.
Optionally, you can enter one or more cluster labels to identify clusters on which to selectively enable NSX ALB or to customize NSX ALB settings for different groups of clusters. This is useful in the following scenarios:
You want to configure different sets of workload clusters to different Service Engine Groups to implement isolation or to support more Service type Load Balancers than one Service Engine Group's capacity.
You want to configure different sets of workload clusters to different Clouds because they are deployed in different sites.
To enable NSX ALB selectively rather than globally, add labels in the format key: value pair in the
management cluster config file. This will create a default AKO Deployment Config (ADC) on
management cluster with the NSX ALB settings provided. Labels that you define here will be used to
create a label selector. Only workload cluster objects that have the matching labels will have the load
balancer enabled.
To customize the NSX ALB settings for different groups of clusters, create an AKO Deployment
Config (ADC) on management cluster by customizing the NSX ALB settings, and providing a unique
label selector for the ADC. Only the workload cluster objects that have the matching labels will have
these custom settings applied.
You can label the cluster during the workload cluster deployment or label it manually after cluster creation. If you define multiple key-value pairs, you need to apply all of them.
Provide AVI_LABELS in the following format in the workload cluster deployment config file, and it will automatically label the cluster and select the matching ADC based on the label selector during the cluster deployment:
AVI_LABELS: |
  'type': 'tkg-workloadset01'
Optionally, you can manually label the cluster object of the corresponding workload cluster with the labels defined in the ADC:
kubectl label cluster <cluster-name> type=tkg-workloadset01
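For example, you can verify the ADCs on the management cluster and the labels on a workload cluster as follows; the context and cluster names are placeholders.
kubectl config use-context <management-cluster-context>
kubectl get akodeploymentconfig                    # lists the default and custom ADCs
kubectl get cluster <cluster-name> --show-labels   # confirm the label matched by the ADC selector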
Each environment configured in NSX ALB is referred to as a cloud. Each cloud in NSX ALB
maintains networking and NSX ALB Service Engine settings. The cloud is configured with one or
more VIP networks to provide IP addresses to load balancing (L4 or L7) virtual services created
under that cloud.
The virtual services can span across multiple service engines if the associated Service Engine Group
is configured in the Active/Active HA mode. A service engine can belong to only one Service
Engine group at a time.
IP address allocation for virtual services can be over DHCP or through the in-built IPAM functionality
of NSX Advanced Load Balancer. The VIP networks created or configured in NSX Advanced Load
Balancer are associated with the IPAM profile.
Decision ID | Design Decision | Design Justification | Design Implications
TKO-TKG-001 | Register the management cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally. | Only Antrea CNI is supported on workload clusters created from the TMC portal.
TKO-TKG-002 | Use NSX ALB as your control plane endpoint provider and for application load balancing. | NSX ALB is tightly coupled with TKG and vSphere. Since NSX ALB is a VMware product, customers will have a single point of contact for support. | Adds NSX ALB license cost to the solution.
TKO-TKG-003 | Deploy Tanzu Kubernetes management clusters in large form factor. | The large form factor should suffice to integrate the TKG management cluster with TMC, Pinniped, and Velero, and must be capable of accommodating 100+ Tanzu workload clusters. | Consumes more resources from infrastructure.
TKO-TKG-004 | Deploy Tanzu Kubernetes clusters with the prod plan (management cluster and workload clusters). | This deploys multiple control plane nodes and provides high availability for the control plane. | Consumes more resources from infrastructure.
TKO-TKG-005 | Enable identity management for TKG clusters. | Role-based access control to Tanzu Kubernetes Grid clusters. | Requires an external identity management provider.
The network reference design can be mapped into this general framework:
Isolate and separate SDDC management components (vCenter, ESX) from the Tanzu
Kubernetes Grid components. This reference design allows only the minimum connectivity
between the Tanzu Kubernetes Grid clusters and NSX ALB to the vCenter Server.
Isolate and separate NSX ALB management network from the Tanzu Kubernetes Grid
management segment and the Tanzu Kubernetes Grid workload segments.
Depending on the workload cluster type and use case, multiple workload clusters may
leverage the same workload network or new networks can be used for each workload
cluster. To isolate and separate Tanzu Kubernetes Grid workload cluster networking from
each other, it is recommended to use separate networks for each workload cluster and
configure the required firewall between these networks. For more information, see Firewall
Requirements.
Separate provider and tenant access to the Tanzu Kubernetes Grid environment:
Only provider administrators need access to the Tanzu Kubernetes Grid management
cluster. This prevents tenants from attempting to connect to the Tanzu Kubernetes
Grid management cluster.
Only allow tenants to access their Tanzu Kubernetes Grid workload clusters and restrict
access to this cluster from other tenants.
Network Requirements
As per the defined architecture, the list of required networks follows:
TKG Management Network (DHCP Service: Yes): Control plane and worker nodes of the TKG management cluster and shared services cluster are attached to this network. Creating the shared services cluster on a separate network is also supported.

TKG Workload Network (DHCP Service: Yes): Control plane and worker nodes of TKG workload clusters are attached to this network.

TKG Cluster VIP/Data Network (DHCP Service: No): Virtual services for control plane HA of all TKG clusters (management, shared services, and workload). Reserve sufficient IP addresses depending on the number of TKG clusters planned to be deployed in the environment. NSX ALB handles IPAM on this network.

TKG Management VIP/Data Network (DHCP Service: No): Virtual services for all user-managed packages (such as Contour, Harbor, Prometheus, and Grafana) hosted on the shared services cluster. For more information, see User-Managed Packages.

TKG Workload VIP/Data Network (DHCP Service: No): Virtual services for all applications hosted on the workload clusters. Reserve sufficient IP addresses depending on the number of applications planned to be hosted on the workload clusters and scalability considerations.
Network Recommendations
The key network recommendations for a production-grade Tanzu Kubernetes Grid deployment with
NSX-T Data Center Networking are as follows:
TKO-NET-001: Use dedicated networks for the management cluster nodes and workload cluster nodes.
Justification: To have flexible firewall and security policies.
Implications: Additional VLAN required (OPEX overhead).

TKO-NET-002: Use a dedicated VIP network for the applications hosted in the management and workload clusters.
Justification: To have flexible firewall and security policies.
Implications: Additional VLAN required (OPEX overhead).

TKO-NET-003: The shared services cluster uses the management network and application VIP network of the management cluster.
Justification: Hosts shared services like Harbor.
Implications: VLAN-based firewall policies are not possible.
Node IPAM can be configured for standalone management clusters on vSphere, and for the associated
class-based workload clusters that they manage. In the Tanzu Kubernetes Grid management cluster
configuration file, a dedicated Node IPAM pool is defined for the management cluster only. The
following types of Node IPAM pools are available for workload clusters:

- InClusterIPPool: Configures IP pools that are only available to workload clusters in the same management cluster namespace, for example, default.
- GlobalInClusterIPPool: Configures IP pools with addresses that can be allocated to workload clusters across multiple namespaces.

Node IPAM in TKG provides flexibility in managing IP addresses for both management and workload clusters, allowing efficient IP allocation and management within the cluster environment.
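As a minimal illustration (the pool name, address range, gateway, and configuration variable shown here are assumptions for this sketch and should be validated against your TKG release), a GlobalInClusterIPPool and the workload cluster configuration entry that references it might look as follows:

# Hypothetical GlobalInClusterIPPool definition (values are examples only)
apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: GlobalInClusterIPPool
metadata:
  name: workload-node-ip-pool-01
spec:
  addresses:
    - 172.16.60.100-172.16.60.200
  gateway: 172.16.60.1
  prefix: 24

# Workload cluster configuration file entry that consumes the pool (variable name assumed)
NODE_IPAM_IP_POOL_NAME: workload-node-ip-pool-01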
The subnet plan records, for each network type, the port group name, gateway CIDR, DHCP pool, and NSX ALB IP pool.
3-Network Architecture
For POC environments with minimal network requirements, you can proceed with the 3-network
architecture. In this design, Tanzu Kubernetes Grid is deployed into three networks: the Infrastructure
Management Network, the TKG Management Network, and the TKG Workload Network. This design uses
only three networks while ensuring isolation between the infrastructure VMs, the TKG management
components, and the TKG workload components.
This network reference design can be mapped into this general framework:
This topology enables the following benefits:

- Deploy the NSX ALB components on the existing infrastructure management network, which avoids the need for an additional network.
- Isolate and separate the NSX ALB and SDDC management components (vCenter and ESXi) from the VMware Tanzu Kubernetes Grid components.
- Combine the TKG management cluster VIP, TKG management data VIP, and TKG management node network into a single network, TKG-Mgmt-Network, which ensures that the TKG management components are deployed in a common network and removes additional network overhead and firewall rules.
- Combine the TKG workload cluster VIP, TKG workload data VIP, and TKG workload node network into a single network, TKG-Workload-Network, which ensures that the TKG workload components are deployed in a common network.
- Separate the management control plane/data VIPs and the workload control plane/data VIPs into different networks to enhance isolation and security.
Network Requirements
Infrastructure Management Network (DHCP Service: Optional): NSX ALB controllers and Service Engines (SEs) are attached to this network. DHCP is not a mandatory requirement on this network because NSX ALB manages SE networking with IPAM. This network also hosts core infrastructure components such as vCenter, ESXi hosts, DNS, and NTP.

TKG Management Network (DHCP Service: Yes): Control plane and worker nodes of the TKG management cluster and the shared services cluster are attached to this network. IP assignment is managed through DHCP. TKG management cluster VIP and TKG management data VIP assignment is also managed from the same network using an NSX ALB static IP pool. Ensure that the DHCP range does not overlap with the NSX ALB IP block reservation.

TKG Workload Network (DHCP Service: Yes): Control plane and worker nodes of the TKG workload clusters are attached to this network. IP assignment is managed through DHCP. TKG workload cluster VIP and TKG workload data VIP assignment is also managed from the same network using an NSX ALB static IP pool. Ensure that the DHCP range does not overlap with the NSX ALB IP block reservation.
Firewall Requirements
To prepare the firewall, you need to gather the following information:
NTP servers
The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN:
Source: TKG management and workload networks
Destination: DHCP server
Protocol/Port: UDP 67, 68
Purpose: Allows hosts to get DHCP addresses.

Source: TKG management cluster network
Destination: TKG cluster VIP network
Protocol/Port: TCP 6443
Purpose: For the management cluster to configure shared services and workload clusters.

Source: TKG shared services cluster network (required only if using a separate network for the shared services cluster)
Destination: TKG cluster VIP network
Protocol/Port: TCP 6443
Purpose: Allows the shared services cluster to register with the management cluster.

Source: TKG workload cluster network
Destination: TKG cluster VIP network
Protocol/Port: TCP 6443
Purpose: Allows workload clusters to register with the management cluster. Note: In a 3-network design, the destination network is the TKG Management Network.

Source: TKG management, shared services, and workload networks
Destination: NSX ALB controllers (NSX ALB management network)
Protocol/Port: TCP 443
Purpose: Allows NSX ALB Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller.

Source: NSX ALB management network
Destination: vCenter and ESXi hosts
Protocol/Port: TCP 443
Purpose: Allows NSX ALB to discover vCenter objects and deploy SEs as required.
TKO-ALB-001: Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB.
Justification: Isolates NSX ALB traffic from infrastructure management traffic and Kubernetes workloads.
Implications: An additional network (VLAN) is required.

TKO-ALB-002: Deploy 3 NSX ALB controller nodes.
Justification: To achieve high availability for the NSX ALB platform. In clustered mode, NSX ALB availability is not impacted by an individual controller node failure.
Implications: None.

TKO-ALB-003: Under Compute Policies, create a 'VM-VM anti-affinity' rule that prevents collocation of the NSX ALB controller VMs on the same host.
Justification: vSphere places the NSX ALB controller VMs in a way that always ensures maximum HA.
Implications: Affinity rules need to be configured manually.

TKO-ALB-004: Use static IP addresses for the NSX ALB controllers.
Justification: The NSX ALB controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes to management IP addresses are disruptive.
Implications: None.

TKO-ALB-005: Use NSX ALB IPAM for the service engine data network and virtual services.
Justification: Simplifies IP address management for virtual services and service engines.
Implications: None.

TKO-ALB-006: Reserve an IP address in the NSX ALB management subnet to be used as the cluster IP address for the controller cluster.
Justification: The NSX ALB portal is always accessible over the cluster IP address regardless of an individual controller node failure.
Implications: An additional IP address is required.

TKO-ALB-007: Create a dedicated resource pool with appropriate reservations for NSX ALB controllers.
Justification: Guarantees the CPU and memory allocation for the NSX ALB controllers and avoids performance degradation in case of resource contention.
Implications: None.

TKO-ALB-008: Replace the default NSX ALB certificates with custom CA or public CA-signed certificates that contain SAN entries of all controller nodes.
Justification: Establishes a trusted connection with other infrastructure components; the default certificate does not include SAN entries, which is not acceptable by Tanzu.
Implications: None. SAN entries are not applicable if a wildcard certificate is used.

TKO-ALB-009: Configure NSX ALB backup with a remote server as the backup location.
Justification: Periodic backup of the NSX ALB configuration database is recommended. The database defines all clouds, virtual services, users, and so on. As a best practice, store backups in an external location to provide backup capabilities in case of an entire cluster failure.
Implications: Additional operational overhead and additional infrastructure resources.

TKO-ALB-010: Configure remote logging for the NSX ALB controller to send events to syslog.
Justification: For operations teams to be able to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB controller.
Implications: Additional operational overhead and additional infrastructure resources.

TKO-ALB-011: Use LDAP/SAML-based authentication for NSX ALB.
Justification: Helps to maintain role-based access control.
Implications: Additional configuration is required.
TKO-ALB-SE-001: Set NSX ALB Service Engine high availability to Active/Active.
Justification: Provides higher resiliency, optimum performance, and utilization compared to N+M and/or Active/Standby.
Implications: Requires NSX ALB Enterprise licensing. Only the Active/Standby mode is supported with the NSX ALB Essentials for Tanzu license.

TKO-ALB-SE-002: Use a dedicated Service Engine group for TKG management.
Justification: SE resources are guaranteed for the TKG management stack, and this provides data path segregation for management and tenant applications.
Implications: Dedicated Service Engine groups increase licensing cost.

TKO-ALB-SE-003: Use a dedicated Service Engine group for the TKG workload clusters, depending on the nature and type of workloads (dev/prod/test).
Justification: SE resources are guaranteed for a single workload cluster or a set of workload clusters, and this provides data path segregation for tenant applications hosted on workload clusters.
Implications: Dedicated Service Engine groups increase licensing cost.

TKO-ALB-SE-004: Enable ALB Service Engine self-election.
Justification: Enables SEs to elect a primary amongst themselves in the absence of connectivity to the NSX ALB controller.
Implications: Requires NSX ALB Enterprise licensing. This feature is not supported with the NSX ALB Essentials for Tanzu license.

TKO-ALB-SE-005: Enable 'Dedicated dispatcher CPU' on Service Engine groups that contain Service Engine VMs of 4 or more vCPUs.
Justification: Enables a dedicated core for packet processing, providing a high packet pipeline on the Service Engine VMs. Note: By default, the packet processing core also processes load-balancing flows.
Implications: Consumes more resources from the infrastructure.

TKO-ALB-SE-006: Set the 'Placement across the Service Engines' setting to 'Compact'.
Justification: Allows maximum utilization of Service Engine capacity.
Implications: None.

TKO-ALB-SE-007: Set the SE size to a minimum of 2 vCPUs and 4 GB of memory.
Justification: This configuration should meet the most generic use cases.
Implications: For services that require higher throughput, this configuration needs to be reviewed and modified accordingly.

TKO-ALB-SE-008: Under Compute Policies, create a 'VM-VM anti-affinity' rule for SEs that are part of the same SE group, preventing collocation of the Service Engine VMs on the same host.
Justification: vSphere places the Service Engine VMs in a way that always ensures maximum HA for the Service Engines that are part of a Service Engine group.
Implications: Affinity rules need to be configured manually.

TKO-ALB-SE-009: Reserve memory and CPU for Service Engines.
Justification: The Service Engines are a critical infrastructure component providing load-balancing services to mission-critical applications. Reservations guarantee the CPU and memory allocation for the SE VMs and avoid performance degradation in case of resource contention.
Implications: You must perform additional configuration to set up the reservations.
Contour is an open-source controller for Kubernetes ingress routing. Contour can be installed in the
shared services cluster or in any Tanzu Kubernetes cluster. Deploying Contour is a prerequisite if you
want to deploy the Prometheus, Grafana, and Harbor packages on a workload cluster.
For more information about Contour, see the Contour website and Implementing Ingress Control
with Contour.
Another option is to use the NSX ALB Kubernetes ingress controller that offers an advanced L7
ingress for containerized applications that are deployed in the Tanzu Kubernetes workload cluster.
For more information about the NSX ALB ingress controller, see Configuring L7 Ingress with NSX
Advanced Load Balancer.
Tanzu Service Mesh, which is a SaaS offering for modern applications running across multi-cluster,
multi-clouds, also offers an ingress controller based on Istio.
The following table provides general recommendations on when you should use a specific ingress
controller for your Kubernetes environment.
Contour: Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the application's manifest file.

Istio: Use the Istio ingress controller when you intend to provide security, traffic direction, and insights within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic).

NSX ALB ingress controller: Use the NSX ALB ingress controller when a containerized application requires features like local and global server load balancing (GSLB), web application firewall (WAF), performance monitoring, direct routing from the load balancer to the pod, and so on.
The NSX ALB ingress controller is a Kubernetes operator that integrates with the Kubernetes API to manage the
lifecycle of load-balancing and ingress resources for workloads.
Legacy ingress services for Kubernetes include multiple disparate solutions. The services and
products contain independent components that are difficult to manage and troubleshoot. The ingress
services have reduced observability capabilities with little analytics, and they lack comprehensive
visibility into the applications that run on the system. Cloud-native automation is difficult in the legacy
ingress services.
In comparison to the legacy Kubernetes ingress services, NSX ALB has comprehensive load
balancing and ingress services features. As a single solution with a central control, NSX ALB is easy
to manage and troubleshoot. NSX ALB supports real-time telemetry with an insight into the
applications that run on the system. The elastic auto-scaling and the decision automation features
highlight the cloud-native automation capabilities of NSX Advanced Load Balancer.
NSX ALB with Enterprise Licensing also lets you configure L7 ingress for your workload clusters by
using one of the following options:
L7 ingress in ClusterIP mode: This option provides the full set of NSX ALB L7 ingress capabilities, including sending traffic directly from the
service engines (SEs) to the pods, preventing the multiple hops that other ingress solutions require when
sending packets from the load balancer to the right node where the pod runs. The NSX ALB controller
creates a virtual service with a backend pool containing the pod IP addresses, which sends the
traffic directly to the pods.

However, each workload cluster needs a dedicated SE group for the NSX ALB Kubernetes Operator
(AKO) to work, which could increase the number of SEs you need for your environment. This mode
is typically used when you have a small number of workload clusters.
L7 ingress in NodePort mode: NodePort is the default mode when AKO is installed on Tanzu Kubernetes Grid. This
option allows your workload clusters to share SE groups and is fully supported by VMware. With this
option, the services of your workloads must be set to NodePort instead of ClusterIP, even when
accompanied by an ingress object. This ensures that NodePorts are created on the worker nodes
and traffic can flow through the SEs to the pods via the NodePorts. Kube-proxy, which runs on each
node as a DaemonSet, creates network rules to expose the application endpoints on each of the nodes
in the format NodeIP:NodePort. The NodePort value is the same for a service on all the nodes. It
exposes the port on all the nodes of the Kubernetes cluster, even on nodes where the pods of the service are not running.
L7 ingress in NodePortLocal mode: This feature is supported only with the Antrea CNI. You must enable this feature on a workload cluster
before its creation. The primary difference between this mode and the NodePort mode is that the
traffic is sent directly to the pods in your workload cluster through node ports without interference
from kube-proxy. With this option, the workload clusters can share SE groups. Similar to the ClusterIP
mode, this option avoids the potential extra hop when sending traffic from the NSX ALB SEs to the
pod by targeting the right nodes where the pods run.
Antrea agent configures NodePortLocal port mapping rules at the node in the format
“NodeIP:Unique Port” to expose each pod on the node on which the pod of the service is running.
The default range of the port number is 61000-62000. Even if the pods of the service are running
on the same Kubernetes node, Antrea agent publishes unique ports to expose the pods at the node
level to integrate with the load balancer.
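As a hedged sketch of how this mode is typically enabled (the field and variable names below reflect common AKO and Antrea usage and should be verified against your release), NodePortLocal is turned on in the Antrea CNI through the workload cluster configuration, and the custom ADC then selects NodePortLocal as the ingress service type:

# Workload cluster configuration file (sketch): enable the Antrea NodePortLocal feature
ANTREA_NODEPORTLOCAL: true

# AKODeploymentConfig excerpt (sketch): switch AKO L7 ingress to NodePortLocal
extraConfigs:
  ingress:
    disableIngressClass: false
    serviceType: NodePortLocal
    shardVSSize: MEDIUM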
NSX ALB L4 ingress with Contour L7 ingress: This option does not use the NSX ALB L7 ingress capabilities; it uses NSX ALB for L4 load balancing
only and leverages Contour for L7 ingress. It also allows sharing SE groups across workload
clusters. This option is supported by VMware and requires minimal setup.
TKO-ALB-L7-001: Deploy NSX ALB L7 ingress in NodePortLocal mode.
Justification: Provides better network hop efficiency and helps to reduce east-west traffic and encapsulation overhead. Service Engine groups are shared across clusters, and load-balancing persistence is supported.
Implications: Supported with Antrea CNI and IPv4 addressing only. To configure L7 ingress, you need NSX ALB Enterprise licensing.
Container Registry
VMware Tanzu for Kubernetes Operations using Tanzu Kubernetes Grid includes Harbor as a
container registry. Harbor provides a location for pushing, pulling, storing, and scanning container
images used in your Kubernetes clusters.
Harbor registry is used for day-2 operations of the Tanzu Kubernetes workload clusters. Typical day-
2 operations include tasks such as pulling images from Harbor for application deployment, pushing
custom images to Harbor, and so on.
VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.
If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To do so, follow the procedure in Trust Custom CA
Certificates on Cluster Nodes.
Tanzu Kubernetes Grid includes signed binaries for Prometheus and Grafana that you can deploy on
Tanzu Kubernetes clusters to monitor cluster health and services.
Prometheus is an open source systems monitoring and alerting toolkit. It can collect metrics
from target clusters at specified intervals, evaluate rule expressions, display the results, and
trigger alerts if certain conditions arise. The Tanzu Kubernetes Grid implementation of
Prometheus includes Alert Manager, which you can configure to notify you when certain
events occur.
Grafana is an open source visualization and analytics software. It allows you to query,
visualize, alert on, and explore your metrics no matter where they are stored. Grafana is used
for visualizing Prometheus metrics without the need to manually write the PromQL queries.
You can create custom charts and graphs in addition to the pre-packaged options.
You deploy Prometheus and Grafana on Tanzu Kubernetes clusters. The following diagram shows
how the monitoring components on a cluster interact.
You can use out-of-the-box Kubernetes dashboards or you can create new dashboards to monitor
compute, network, and storage utilization of Kubernetes objects such as Clusters, Namespaces,
Pods, and so on.
You can also monitor your Tanzu Kubernetes Grid clusters with Tanzu Observability which is a SaaS
offering by VMware. Tanzu Observability provides various out-of-the-box dashboards. You can
customize the dashboards for your particular deployment. For information on how to customize
Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu
Observability Dashboard for Tanzu for Kubernetes Operations.
Log processing and forwarding in Tanzu Kubernetes Grid is provided via Fluent Bit. Fluent Bit
binaries are available as part of extensions and can be installed on the management cluster or on
workload clusters. Fluent Bit is a lightweight log processor and forwarder that allows you to collect
data and logs from different sources, unify them, and send them to multiple destinations. VMware
Tanzu Kubernetes Grid includes signed binaries for Fluent Bit that you can deploy on management
clusters and on Tanzu Kubernetes clusters to provide a log-forwarding service.

Fluent Bit makes use of input plug-ins, filters, and output plug-ins. The input plug-ins define the
sources from which Fluent Bit collects data, and the output plug-ins define the destinations to which it
sends the information. The Kubernetes filter enriches the logs with Kubernetes metadata, specifically
labels and annotations. You configure the input and output plug-ins on the Tanzu Kubernetes Grid
cluster when you install Fluent Bit as a user-managed package.
Fluent Bit integrates with logging platforms such as VMware Aria Operations for Logs, Elasticsearch,
Kafka, Splunk, or an HTTP endpoint. For more details about configuring Fluent Bit to your logging
provider, see Implement Log Forwarding with Fluent Bit.
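As a minimal sketch only (the output plug-in, host, and port are illustrative assumptions, and the data values schema should be verified against the Fluent Bit package version shipped with your TKG release), the forwarding destination is declared in the package data values before installing the user-managed package:

# fluent-bit-data-values.yaml (sketch): forward all records to an assumed syslog endpoint
fluent_bit:
  config:
    outputs: |
      [OUTPUT]
        Name           syslog
        Match          *
        Host           syslog.lab.example.com
        Port           514
        Mode           tcp
        Syslog_Format  rfc5424

The package would then typically be installed with the tanzu package install command, passing this file with the --values-file option.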
A custom image must be based on an operating system (OS) version that is supported by Tanzu
Kubernetes Grid. The following operating systems, among others, are supported for building custom
images for Tanzu Kubernetes Grid:

- Photon OS 3
- Windows 2019

For additional information about building custom images for Tanzu Kubernetes Grid, see Build
Machine Images.
VMware provides FIPS-capable Kubernetes OVA that can be used to deploy FIPS compliant Tanzu
Kubernetes Grid management and workload clusters. Tanzu Kubernetes Grid core components,
such as Kubelet, Kube-apiserver, Kube-controller manager, Kube-proxy, Kube-scheduler, Kubectl,
Etcd, Coredns, Containerd, and Cri-tool are made FIPS compliant by compiling them with the
BoringCrypto FIPS modules, an open-source cryptographic library that provides FIPS 140-2
approved algorithms.
Installation Experience
Tanzu Kubernetes Grid management cluster is the first component that you deploy to get started
with Tanzu Kubernetes Grid.
You can deploy the management cluster in one of the following ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method if you are
installing a Tanzu Kubernetes Grid management cluster for the first time.

Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.
The Tanzu Kubernetes Grid installer UI supports installing Tanzu Kubernetes Grid on vSphere
(including VMware Cloud on AWS), AWS EC2, and Microsoft Azure. The UI provides a guided
experience tailored to the IaaS, in this case, VMware vSphere.
The installation of Tanzu Kubernetes Grid on vSphere is done through the same installer UI but
tailored to a vSphere environment.
This installation process will take you through the setup of a management cluster on your vSphere
environment. Once the management cluster is deployed, you can make use of Tanzu Mission
Control or Tanzu CLI to deploy Tanzu Kubernetes shared service and workload clusters.
Deployment Instructions
For instructions on how to deploy this reference design, see Deploy Tanzu for Kubernetes
Operations on vSphere with VMware VDS.
Summary
Tanzu Kubernetes Grid on vSphere on hyper-converged hardware offers high-performance
potential, convenience, and addresses the challenges of creating, testing, and updating on-premises
Kubernetes platforms in a consolidated production environment. This validated approach will result in
a near-production quality installation with all the application services needed to serve combined or
uniquely separated workload types through a combined infrastructure solution.
This plan meets many Day 0 needs for quickly aligning product capabilities to full stack infrastructure,
including networking, firewalling, load balancing, workload compute alignment, and other
capabilities.
On vSphere, you can configure all node VMs to have the same predefined configurations, set
different predefined configurations for control plane and worker nodes, or customize the
configurations of the nodes. By using these settings, you can create clusters that have nodes with
different configurations to the management cluster nodes. You can also create clusters in which the
control plane nodes and worker nodes have different configurations.
Small: 2 CPUs, 4 GB memory, 20 GB disk
Medium: 2 CPUs, 8 GB memory, 40 GB disk
Large: 4 CPUs, 16 GB memory, 40 GB disk
Extra-large: 8 CPUs, 32 GB memory, 80 GB disk
To create a cluster in which all of the control plane and worker node VMs are the same size, specify
the SIZE variable. If you set the SIZE variable, all nodes will be created with the configuration that
you set.
SIZE: "large"
To create a cluster in which the control plane and worker node VMs are different sizes, specify the
CONTROLPLANE_SIZE and WORKER_SIZE options.
CONTROLPLANE_SIZE: "medium"
WORKER_SIZE: "large"
You can combine the CONTROLPLANE_SIZE and WORKER_SIZE options with the SIZE option. For
example, if you specify SIZE: "large" with WORKER_SIZE: "extra-large", the control plane nodes
will be set to large and worker nodes will be set to extra-large.
SIZE: "large"
WORKER_SIZE: "extra-large"
To use the same custom configuration for all nodes, specify the VSPHERE_NUM_CPUS,
VSPHERE_DISK_GIB, and VSPHERE_MEM_MIB options.
VSPHERE_NUM_CPUS: 2
VSPHERE_DISK_GIB: 40
VSPHERE_MEM_MIB: 4096
To define different custom configurations for control plane nodes and worker nodes, specify the
VSPHERE_CONTROL_PLANE_* and VSPHERE_WORKER_* options:
VSPHERE_CONTROL_PLANE_NUM_CPUS: 2
VSPHERE_CONTROL_PLANE_DISK_GIB: 20
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
VSPHERE_WORKER_NUM_CPUS: 4
VSPHERE_WORKER_DISK_GIB: 40
VSPHERE_WORKER_MEM_MIB: 4096
The number of virtual services that can be deployed per controller cluster is directly proportional to
the controller cluster size. See the NSX ALB Configuration Maximums Guide for more information.
For each performance metric, NSX ALB documents the per-core performance and the maximum performance on a single Service Engine VM.
Multiple performance vectors or features may have an impact on performance. For instance, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX ALB recommends
two cores.
The scope of the document is limited to providing the deployment steps based on the reference
design in VMware Tanzu for Kubernetes Operations on vSphere Reference Design. This document
does not cover any deployment procedures for the underlying SDDC components.
VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.
To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on vSphere with vSphere Distributed Switch Using Service Installer for VMware Tanzu.
Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.
General Requirements
Network Requirements
Firewall Requirements
General Requirements
The general requirements for deploying Tanzu for Kubernetes Operations on vSphere in your
environment are as follows:
A datastore with sufficient capacity for the control plane and worker node VM files.
Network Time Protocol (NTP) service running on all hosts and vCenter.
A host, server, or VM based on Linux, macOS, or Windows that acts as your bootstrap
machine and has Docker installed. For this deployment, a virtual machine based on
Photon OS is used.

Depending on the OS flavor of the bootstrap VM, download and configure kubectl 1.26.5
from VMware Customer Connect. Refer to the Deploy and Configure Bootstrap Machine
section of this document for configuring the required packages on the Photon OS machine.
Download Tanzu CLI v0.90.1 from VMware Customer Connect. Starting with TKG 2.3.0,
the Tanzu core CLI is distributed separately from Tanzu Kubernetes Grid. For instructions
on how to install the Tanzu CLI for use with Tanzu Kubernetes Grid, see Install the Tanzu
CLI.
A vSphere account with the permissions described in Required Permissions for the vSphere
Account.
Download and import NSX Advanced Load Balancer 22.1.3 OVA to Content Library.
Download the following OVA from VMware Customer Connect and import to vCenter.
Convert the imported VMs to templates.
Note
You can also download supported older versions of Kubernetes from VMware
Customer Connect, and import them to deploy workload clusters on the intended
Kubernetes versions.
The sample entries of the resource pools and folders that need to be created are as follows.
Network Requirements
Create port groups on vSphere DVSwitch for deploying Tanzu for Kubernetes Operations
components as per Network Requirements defined in the reference architecture.
Firewall Requirements
Ensure that the firewall is set up as described in Firewall Requirements.
The deployment subnet plan records, for each network type, the port group name, gateway CIDR, DHCP pool, and NSX ALB IP pool.
NSX Advanced Load Balancer is deployed in Write Access Mode in the vSphere Environment. This
mode grants NSX Advanced Load Balancer controllers full write access to vCenter that helps in
automatically creating, modifying, and removing service engines (SEs) and other resources as
needed to adapt to changing traffic needs.
The following table provides a sample IP address and FQDN set for the NSX Advanced Load
Balancer controllers:
Perform the following steps to deploy and configure NSX Advanced Load Balancer:
2. Select the content library under which the NSX Advanced Load Balancer OVA is placed.
4. Right-click the NSX Advanced Load Balancer image and select New VM from this
Template.
5. On the Select name and folder page, enter a name and select a folder for the NSX
Advanced Load Balancer VM as tkg-alb-components.
6. On the Select a compute resource page, select the resource pool tkg-alb-components.
7. On the Review details page, verify the template details and click Next.
8. On the Select storage page, select a storage policy from the VM Storage Policy drop-down
menu and choose the datastore location where you want to store the virtual machine files.
9. On the Select networks page, select the network sfo01-w01-vds01-albmanagement and click
Next.
10. On the Customize template page, provide the NSX Advanced Load Balancer management
network details such as IP address, subnet mask, and gateway, and click Next.
11. On the Ready to complete page, review the page and click Finish.
A new task for creating the virtual machine appears in the Recent Tasks pane. After the task is
complete, the NSX Advanced Load Balancer virtual machine is created on the selected resource.
Power on the virtual machine and give it a few minutes for the system to boot. Upon successful boot
up, go to NSX Advanced Load Balancer on your browser.
Note
While the system is booting up, a blank web page or a 503 status code might
appear.
2. On the Welcome page, under System Settings, set backup passphrase and provide DNS
information, and click Next.
3. Under Email/SMTP, provide email and SMTP information, and click Next.
For the Service Engines are managed within the setting, select Provider (Shared across tenants).
If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch and you are directed to a dashboard
view on the controller.
Note
1. To configure licensing, go to Administration > System Settings > Licensing and click on the
gear icon to change the license type to Enterprise.
3. Once the license tier is changed, apply the NSX Advanced Load Balancer Enterprise license
key. If you have a license file instead of a license key, apply the license by clicking on the
Upload a License File(.lic) option.
To run a 3-node controller cluster, you deploy the first node and perform the initial configuration,
and set the cluster IP address. After that, you deploy and power on two more controller VMs, but
you must not run the initial configuration wizard or change the admin password for these controllers
VMs. The configuration of the first controller VM is assigned to the two new controller VMs.
The first controller of the cluster receives the Leader role. The second and third controllers work as
the Follower.
Perform the following steps to configure NSX Advanced Load Balancer cluster.
1. Log in to the primary NSX Advanced Load Balancer controller and go to Administration >
Controller > Nodes, and click Edit.
2. Specify Name and Controller Cluster IP, and click Save. This IP address must be from the
NSX Advanced Load Balancer management network.
3. Deploy the 2nd and 3rd NSX Advanced Load Balancer controller nodes by using steps in
Deploy NSX Advanced Load Balancer.
4. Log in to the primary NSX Advanced Load Balancer controller using the Controller Cluster
IP/FQDN and go to Administration > Controller > Nodes, and click Edit. The Edit Controller
Configuration popup appears.
5. In the Cluster Nodes field, enter the IP address for the 2nd and 3rd controller, and click
Save.
After you complete these steps, the primary NSX Advanced Load Balancer controller
becomes the leader for the cluster and invites the other controllers to the cluster as
members.
NSX Advanced Load Balancer then performs a warm reboot of the cluster. This process can
take approximately 10-15 minutes. You are automatically logged out of the controller node
where you are currently logged in. To see details about the cluster formation task, enter the
cluster IP address in the browser.
The configuration of the primary (leader) controller is synchronized to the new member nodes when
the cluster comes online following the reboot. After the cluster is successfully formed, you can see
the following status:
Note
In the following tasks, all NSX Advanced Load Balancer configurations are done by
connecting to the NSX ALB Controller Cluster IP/FQDN.
1. Log in to the NSX Advanced Load Balancer controller and go to Templates > Security >
SSL/TLS Certificates.
2. Click Create and select Controller Certificate. You can either generate a self-signed
certificate, generate CSR, or import a certificate. For the purpose of this document, a self-
signed certificate is generated.
3. Provide all required details as per your infrastructure requirements and in the Subject
Alternate Name (SAN) field, provide IP address and FQDN of all NSX Advanced Load
Balancer controllers including NSX Advanced Load Balancer cluster IP and FQDN, and click
Save.
4. After the certificate is created, capture the certificate contents as this is required while
deploying the Tanzu Kubernetes Grid management cluster. To capture the certificate
content, click the Download icon next to the certificate, and click Copy to clipboard under
Certificate.
5. To replace the certificate, go to Administration > System Settings, and edit it under Access.
Replace the SSL/TLS certificate with the previously created certificate and click Save.
Service Engine Group 1: Service engines that are part of this service engine group host:

Virtual services that load balance the control plane nodes of the management cluster and the
shared services cluster.

Virtual services for all load balancer functionalities requested by the Tanzu Kubernetes Grid
management cluster and the shared services cluster.

Service Engine Group 2: Service engines that are part of this service engine group host the virtual
services that load balance the control plane nodes, and the virtual services for all load balancer
functionalities requested by the workload clusters mapped to this SE group.
Note
- Based on your requirements, you can create additional SE groups for the workload clusters.
- Multiple workload clusters can be mapped to a single SE group.
- A Tanzu Kubernetes Grid cluster can be mapped to only one SE group for application load balancer services.
- The control plane VIP for the workload clusters is placed on the respective Service Engine group assigned through the AKO Deployment Config (ADC) during cluster creation.
For information about mapping a specific service engine group to Tanzu Kubernetes Grid workload
cluster, see Configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid Workload Cluster.
1. Log in to NSX Advanced Load Balancer and go to Infrastructure > Clouds > Create >
VMware vCenter/vSphere ESX.
3. Under the vCenter/vSphere pane, specify the vCenter address, username, and password,
and click CONNECT.
4. Under the Data Center pane, choose the data center from the Data Center drop-down
menu. Select Content Library for SE template and click SAVE & LAUNCH.
5. To choose the NSX Advanced Load Balancer management network for service engines,
select the Management Network from the Management Network drop-down menu. Enter a
static IP address pool for SEs and VIP, and click Complete.
6. Wait for the cloud to get configured and the status to turn green.
7. To create a service engine group for Tanzu Kubernetes Grid management clusters, under
the Infrastructure tab, go to Cloud Resources > Service Engine Group. From the Select
Cloud drop-down menu, select the cloud created in the previous step and click Create.
8. Enter a name for the Tanzu Kubernetes Grid management service engine group and set the
following parameters:
Enable Service Engine Self Election: Supported with the NSX ALB Enterprise edition.
Memory for caching: Supported with the NSX ALB Enterprise edition. You must set the value to 0 for the Essentials license.
For advanced configuration, under Scope tab click on the Add vCenter and select
configured vCenter cloud, select cluster and datastore for service engine placement, and
click Save.
9. Repeat steps 7 and 8 to create another service engine group for Tanzu Kubernetes Grid
workload clusters. After completing this step, you will have two service engine groups
created.
As part of the cloud creation in NSX Advanced Load Balancer, only management network has been
configured in NSX Advanced Load Balancer. Perform the following steps to configure these
networks:
Log in to NSX Advanced Load Balancer and go to Infrastructure > Cloud Resources >
Networks.
Select the desired cloud. All the networks available in vCenter are listed.
Click the edit icon next to the network and configure it as follows. Change the provided
details as per your SDDC configuration.
Note
Not all networks are auto-discovered. For those networks, manually add the
subnet.
The following image shows a sample network configuration for network sfo01-w01-vds01-
tkgclustervip. You should apply the same configuration in sfo01-w01-vds01-
tkgmanagementvip and sfo01-w01-vds01-tkgworkloadvip.
After the networks are configured, the configuration must look like the following image.
Create IPAM and DNS Profile in NSX Advanced Load Balancer and Attach it to Cloud
At this point, all the required networks related to Tanzu functionality are configured in NSX
Advanced Load Balancer, except for Tanzu Kubernetes Grid management and workload network
which uses DHCP. NSX Advanced Load Balancer provides IPAM service for Tanzu Kubernetes Grid
cluster VIP network, management VIP network, and workload VIP network.
Perform the following steps to create an IPAM profile and attach it to the vCenter cloud created
earlier.
1. Log in to NSX Advanced Load Balancer and go to Templates > Profiles > IPAM/DNS
Profiles > Create > IPAM Profile, provide the following details, and click Save.
Name: sfo01-w01-vcenter-ipam-01
2. Click Create > DNS Profile and provide the domain name.
3. Under IPAM/DNS section, choose the IPAM and DNS profiles created earlier and
save the updated configuration.
The above steps complete the NSX Advanced Load Balancer configuration. The next step is to
deploy and configure a bootstrap machine. The bootstrap machine is used to deploy and manage
Tanzu Kubernetes clusters.
The bootstrap machine runs a local kind cluster when Tanzu Kubernetes Grid management cluster
deployment is started. Once the kind cluster is fully initialized, the configuration is used to deploy the
actual management cluster on the backend infrastructure. After the management cluster is fully
configured, the local kind cluster is deleted and future configurations are performed with the Tanzu
CLI.
For this deployment, a Photon OS-based virtual machine is used as the bootstrap machine. For
information on how to configure a macOS or Windows machine, see Install the Tanzu CLI and
Other Tools.
Docker and containerd binaries are installed. For instructions on how to install Docker, see
Docker documentation.
Ensure that the bootstrap VM is connected to Tanzu Kubernetes Grid management network,
sfo01-w01-vds01-tkgmanagement.
To install Tanzu CLI, Tanzu Plugins, and Kubectl utility on the bootstrap machine, follow the
instructions below:
1. Download and unpack the following Linux CLI packages from VMware Tanzu Kubernetes
Grid Download Product page.
2. Execute the following commands to install Tanzu Kubernetes Grid CLI, kubectl CLIs, and
Carvel tools.
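A representative sequence is shown below; the file names and plugin group version are illustrative and should match the bundles you downloaded.

##Install the Tanzu CLI (file names are examples)
tar -xvf tanzu-cli-linux-amd64.tar.gz
install v0.90.1/tanzu-cli-linux_amd64 /usr/local/bin/tanzu

##Install the Tanzu CLI plugins required for Tanzu Kubernetes Grid
tanzu plugin group search
tanzu plugin install --group vmware-tkg/default:v2.3.0

##Install kubectl (file name is an example)
gunzip kubectl-linux-v1.26.5+vmware.2.gz
chmod ugo+x kubectl-linux-v1.26.5+vmware.2 && mv kubectl-linux-v1.26.5+vmware.2 /usr/local/bin/kubectl

##Verify the Tanzu CLI installation; the expected output is similar to the following
tanzu version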
version: v0.90.1
buildDate: 2023-06-29
sha: 8945351c
##Install ytt
gunzip ytt-linux-amd64-v0.45.0+vmware.2.gz
chmod ugo+x ytt-linux-amd64-v0.45.0+vmware.2 && mv ./ytt-linux-amd64-v0.45.0+vmware.2 /usr/local/bin/ytt

##Install kapp
gunzip kapp-linux-amd64-v0.55.0+vmware.2.gz
chmod ugo+x kapp-linux-amd64-v0.55.0+vmware.2 && mv ./kapp-linux-amd64-v0.55.0+vmware.2 /usr/local/bin/kapp

##Install kbld
gunzip kbld-linux-amd64-v0.37.0+vmware.2.gz
chmod ugo+x kbld-linux-amd64-v0.37.0+vmware.2 && mv ./kbld-linux-amd64-v0.37.0+vmware.2 /usr/local/bin/kbld

##Install imgpkg
gunzip imgpkg-linux-amd64-v0.36.0+vmware.2.gz
chmod ugo+x imgpkg-linux-amd64-v0.36.0+vmware.2 && mv ./imgpkg-linux-amd64-v0.36.0+vmware.2 /usr/local/bin/imgpkg
3. Verify the installed versions of the Carvel tools:

ytt version
kapp version
kbld version
imgpkg version
4. Install yq. yq is a lightweight and portable command-line YAML processor. yq uses jq-like
syntax but works with YAML and JSON files.

wget https://github.com/mikefarah/yq/releases/download/v4.24.5/yq_linux_amd64.tar.gz
tar -xzvf yq_linux_amd64.tar.gz
mv yq_linux_amd64 /usr/local/bin/yq
5. Install kind.
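A minimal sketch for installing kind follows; the version shown is an assumption and should match the version required by your Tanzu Kubernetes Grid release.

##Install kind (version shown is illustrative)
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.20.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
kind version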
6. Execute the following commands to start the Docker service and enable it to start at boot.
Photon OS has Docker installed by default.
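For example, with systemd on Photon OS:

##Start Docker and enable it to start at boot
systemctl start docker
systemctl enable docker

##Verify that the Docker service is running
systemctl status docker --no-pager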
7. Execute the following commands to ensure that the bootstrap machine uses cgroup v1.
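A quick check is shown below, assuming Docker is already running; if the reported cgroup version is 2, switch the machine to cgroup v1 (for example, by adding the systemd.unified_cgroup_hierarchy=0 kernel boot parameter and rebooting) before starting the deployment.

##Check which cgroup version Docker is using
docker info | grep -i cgroup

##Expected output similar to:
##  Cgroup Driver: cgroupfs
##  Cgroup Version: 1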
An SSH key pair is required for Tanzu CLI to connect to vSphere from the bootstrap
machine.
The public key part of the generated key is passed during the Tanzu Kubernetes Grid
management cluster deployment.
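For example, an RSA key pair can be generated as follows; the comment string is illustrative.

##Generate an SSH key pair and provide a passphrase when prompted
ssh-keygen -t rsa -b 4096 -C "administrator@lab.example.com"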
## Add the private key to the SSH agent running on your machine and enter the password you created in the previous step
ssh-add ~/.ssh/id_rsa
## If the above command fails, execute "eval $(ssh-agent)" and then rerun the command
9. If your bootstrap machine runs Linux or Windows Subsystem for Linux, and it has a Linux
kernel built after the May 2021 Linux security patch, for example Linux 5.11 and 5.12 with
Fedora, run the following command.
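Based on the kind-related workaround that the Tanzu documentation describes for these kernels, the expected command raises the nf_conntrack limit, for example:

sudo sysctl net/netfilter/nf_conntrack_max=131072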
All required packages are now installed and the required configurations are in place in the bootstrap
virtual machine. The next step is to deploy the Tanzu Kubernetes Grid management cluster.
Before deploying the Tanzu Kubernetes Grid management cluster, ensure that the base image template is imported into vSphere and is available as a template. To import a base image template into vSphere:
1. Go to the Tanzu Kubernetes Grid downloads page and download a Tanzu Kubernetes Grid
OVA for the cluster nodes.
2. For the management cluster, this must be either Photon or Ubuntu based Kubernetes
v1.26.5 OVA.
Note
3. For workload clusters, OVA can have any supported combination of OS and Kubernetes
version, as packaged in a Tanzu Kubernetes release.
Note
Make sure you download the most recent OVA base image templates in the
event of security patch releases. You can find updated base image templates
that include security patches on the Tanzu Kubernetes Grid product
download page.
4. In the vSphere client, right-click an object in the vCenter Server inventory and select Deploy
OVF template.
5. Select Local file, click the button to upload files, and go to the downloaded OVA file on your
local machine.
7. Click Finish to deploy the VM. When the OVA deployment finishes, right-click the VM and
select Template > Convert to Template.
Note
8. If you are using a non-administrator SSO account: in the VMs and Templates view, right-click the new
template, select Add Permission, and assign the tkg-user to the template with the TKG role.
For information about how to create the user and role for Tanzu Kubernetes Grid, see Required
Permissions for the vSphere Account.
The management cluster is also where you configure the shared and in-cluster services that the
workload clusters use. You can deploy the management cluster in one of the following ways:

Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method.

Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.
The following procedure provides the required steps to deploy Tanzu Kubernetes Grid management
cluster using the installer interface.
1. To launch the UI installer wizard, run the following command on the bootstrap machine:
For example:
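A typical form of the command, with the bind address given as an illustrative value, is:

tanzu management-cluster create --ui --bind <bootstrap-vm-ip>:8080 --browser none

##For example:
tanzu management-cluster create --ui --bind 172.16.40.6:8080 --browser none

You can then open the installer wizard in a browser at http://<bootstrap-vm-ip>:8080.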
4. In the IaaS Provider section, enter the IP/FQDN and credentials of the vCenter server where
the Tanzu Kubernetes Grid management cluster is deployed.
Note
7. Select the data center and provide the SSH public Key generated while configuring the
bootstrap VM.
If you have saved the SSH key in the default location, run the following command on the
bootstrap machine to get the SSH public key.
cat /root/.ssh/id_rsa.pub
8. Click Next.
9. On the Management Cluster Settings section, provide the following details and click Next.
Based on the environment requirements, select appropriate deployment type for the
Tanzu Kubernetes Grid Management cluster:
It is recommended to set the instance type to Large or above. For the purpose of this
document, we will proceed with deployment type Production and instance type
Medium.
Control Plane Endpoint Provider: Select NSX Advanced Load Balancer for Control
Plane HA.
Control Plane Endpoint: This is an optional field. If left blank, NSX Advanced Load
Balancer will assign an IP address from the pool “sfo01-w01-vds01-tkgclustervip”
created earlier.
If you need to provide an IP address, pick an IP address from “sfo01-w01-vds01-
tkgclustervip” static IP pools configured in NSX_ALB and ensure that the IP address
is unused.
Enable Audit Logging: Enable for audit logging for Kubernetes API server and node
VMs. Choose as per your environment needs. For more information, see Audit
Logging.
10. On the NSX Advanced Load Balancer section, provide the following information and click
Next.
Controller Host: NSX Advanced Load Balancer Controller IP/FQDN (IP/FQDN of the
Advanced Load Balancer controller cluster configured)
Controller certificate: Paste the contents of the Certificate Authority that is used to
generate your controller certificate into the Controller Certificate Authority text
box.
11. After these details are provided, click Verify Credentials and choose the following
parameters.
Note
Since Tanzu Kubernetes Grid v2.1.0, you can configure the network to
separate the endpoint VIP network of the cluster from the external IP
network of the load balancer service and the ingress service in the cluster.
This feature lets you ensure the security of the clusters by providing you an
option to expose the endpoint of your management or the workload cluster
and the load balancer service and ingress service in the cluster, in different
networks.
As per this reference architecture, all the control plane endpoints are connected to the Tanzu
Kubernetes Grid cluster VIP network, and the data plane networks are connected to the respective
management data VIP network or workload data VIP network.
Cloud Name: Name of the cloud created while configuring NSX Advanced Load
Balancer sfo01w01vc01.
Workload Cluster service Engine Group Name: Name of the service engine group
created for Tanzu Kubernetes Grid workload clusters created while configuring NSX
Advanced Load Balancer sfo01w01segroup01.
Workload Cluster Data Plane VIP Network Name & CIDR: Select Tanzu Kubernetes
Grid workload data network sfo01-w01-vds01-tkgworkloadvip and the subnet
172.16.70.0/24 associated with it.
Workload Cluster Control Plane VIP Network Name & CIDR: Select Tanzu
Kubernetes Grid cluster VIP network sfo01-w01-vds01-tkgclustervip and the
subnet 172.16.80.0/24 associated with it.
Management Cluster service Engine Group Name: Name of the service engine
group created for Tanzu Kubernetes Grid management cluster created while
configuring NSX Advanced Load Balancer sfo01m01segroup01.
Management Cluster Data Plane VIP Network Name & CIDR: Select Tanzu
Kubernetes Grid management data network sfo01-w01-vds01-tkgmanagementvip and
the subnet 172.16.50.0/24 associated with it.
Management Cluster Control Plane VIP Network Name & CIDR: Select Tanzu
Kubernetes Grid cluster VIP network sfo01-w01-vds01-tkgclustervip and the
subnet 172.16.80.0/24 associated with it.
Cluster Labels: Optional. Leave the cluster labels section empty to apply the above
workload cluster network settings by default. If you specify any label here, you must
specify the same values in the configuration YAML file of the workload cluster. Otherwise,
the system places the endpoint VIP of your workload cluster in the Management Cluster
Data Plane VIP Network by default.
Note
With the above configuration, all the Tanzu workload clusters use sfo01-w01-
vds01-tkgclustervip for control plane VIP network and sfo01-w01-vds01-
tkgworkloadvip for data plane network by default. If you would like to
configure separate VIP networks for workload control plane/data networks,
create a custom AKO Deployment Config (ADC) and provide the respective
AVI_LABELS in the workload cluster config file. For more information on
network separation and custom ADC creation, see Configure Separate VIP
Networks and Service Engine Groups in Different Workload Clusters.
12. (Optional) On the Metadata page, you can specify location and labels and click Next.
13. On the Resources section, specify the resources to be consumed by Tanzu Kubernetes Grid
management cluster and click Next.
14. On the Kubernetes Network section, select the Tanzu Kubernetes Grid management
network (sfo01-w01-vds01-tkgmanagement) where the control plane and worker nodes are
placed during management cluster deployment. Ensure that the network has DHCP service
enabled. Optionally, change the pod and service CIDR.
If the Tanzu environment is placed behind a proxy, enable proxy and provide proxy details:
If you set http-proxy, you must also set https-proxy and vice-versa. You can
choose to use one proxy for HTTP traffic and another proxy for HTTPS traffic or to
use the same proxy for both HTTP and HTTPS traffic.
Under the no-proxy section, enter a comma-separated list of network CIDRs or host
names that must bypass the HTTP(S) proxy.
Your no-proxy list must include the following:

The IP address or hostname of vCenter. Traffic to vCenter cannot be proxied.

The CIDR of the vSphere network that you selected under Network Name. The vSphere
network CIDR includes the IP address of your control plane endpoint. If you entered an
FQDN under control plane endpoint, add both the FQDN and the vSphere network CIDR
to the no-proxy section.
Note
15. (Optional) Specify identity management with OIDC or LDAP. For this deployment, identity
management is not enabled.
If you would like to enable identity management, see Enable and Configure Identity
Management During Management Cluster Deployment.
16. Select the OS image to use for deploying the management cluster
Note
This list appears empty if you don’t have a compatible template present in
your environment. See the steps provided in Import Base Image Template
into vSphere.
17. Check the “Participate in the Customer Experience Improvement Program”, if you so desire,
and click Review Configuration.
Note
Tanzu Kubernetes Grid v2.1.0 has a known issue that installer UI populates an
empty NSXALB_LABEL in the cluster configuration and leads to management
cluster creation failure. It is recommended to export the cluster configuration
to a file, delete the empty label, and run the cluster creation command from
CLI instead of deploying the cluster from UI.
18. When you click on Review Configuration, the installer populates the cluster configuration
file, which is located in the ~/.config/tanzu/tkg/clusterconfigs subdirectory, with the
settings that you specified in the interface. You can optionally export a copy of this
configuration file by clicking Export Configuration.
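If you deploy from the CLI as suggested in the note above, a typical invocation looks like the following; the configuration file name is illustrative.

##Create the management cluster from the exported configuration file
tanzu management-cluster create --file ~/.config/tanzu/tkg/clusterconfigs/mgmt-cluster-config.yaml -v 6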
While the cluster is being deployed, you will find that a virtual service is created in NSX
Advanced Load Balancer and new service engines are deployed in vCenter by NSX
Advanced Load Balancer and the service engines are mapped to the SE Group
sfo01m01segroup01.
When Tanzu Kubernetes Grid management cluster is being deployed, behind the scenes:
NSX Advanced Load Balancer service engines get deployed in vCenter and this task is
orchestrated by the NSX Advanced Load Balancer controller.
Service engine status in NSX Advanced Load Balancer: The following snippet shows the
service engines status. They are in the initializing state for sometime and then the status
changes to Up.
Service engine group status in NSX Advanced Load Balancer: As per the configuration, the
virtual service required for Tanzu Kubernetes Grid clusters control plane HA are hosted on
service engine group sfo01m01segroup01.
Virtual service status in NSX Advanced Load Balancer: The cluster is configured with
Production type that deployed 3 control plane nodes, which are placed behind the cluster
VIP.
The installer automatically sets the context to the Tanzu Kubernetes Grid management
cluster on the bootstrap machine. Now you can access the Tanzu Kubernetes Grid
management cluster from the bootstrap machine and perform additional tasks such as
verifying the management cluster health and deploying the workload clusters, etc.
To get the status of Tanzu Kubernetes Grid management cluster, run the following
command:
Use kubectl to get the status of the Tanzu Kubernetes Grid management cluster nodes.
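Typical commands for these checks are:

##Check the management cluster and its add-on components
tanzu management-cluster get

##Retrieve the admin kubeconfig and inspect the nodes
tanzu management-cluster kubeconfig get --admin
kubectl get nodes -o wide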
install-ako-for-all: The default configuration for all workload clusters. By default, all
workload clusters reference this file for their virtual IP networks and service engine (SE)
groups. This ADC configuration does not enable NSX ALB L7 ingress by default.

tanzu-ako-for-shared: Used by the shared services cluster to deploy the virtual services in the TKG
management SE group and the load balancer applications in the TKG Management VIP Network.

tanzu-ako-for-workload-L7-ingress: Use this ADC only if you would like to enable NSX
Advanced Load Balancer L7 ingress on the workload cluster. Otherwise, leave the cluster labels
empty to apply the network configuration from the default ADC install-ako-for-all.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
finalizers:
- ako-operator.networking.tkg.tanzu.vmware.com
generation: 2
name: <Unique name of AKODeploymentConfig>
spec:
adminCredentialRef:
name: nsx-alb-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: nsx-alb-controller-ca
namespace: tkg-system-networking
cloudName: <NAME OF THE CLOUD in ALB>
clusterSelector:
matchLabels:
<KEY>: <VALUE>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-CIDR>
    name: <TKG-Cluster-VIP-Network>
controller: <NSX ALB CONTROLLER IP/FQDN>
dataNetwork:
cidr: <TKG-Mgmt-Data-VIP-CIDR>
name: <TKG-Mgmt-Data-VIP-Name>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: <TKG-Mgmt-Network>
serviceEngineGroup: <Mgmt-Cluster-SEG>
The AKODeploymentConfig with sample values in place is as follows. You should add the
corresponding NSX ALB label type=shared-services while deploying the shared services cluster to enforce
this network configuration.
cloud: sfo01w01vc01
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
finalizers:
- ako-operator.networking.tkg.tanzu.vmware.com
generation: 2
labels:
name: tanzu-ako-for-shared
spec:
adminCredentialRef:
name: NSX_ALB-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: NSX_ALB-controller-ca
namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
type: shared-services
controlPlaneNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
controller: 172.16.10.10
dataNetwork:
cidr: 172.16.50.0/24
name: sfo01-w01-vds01-tkgmanagementvip
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: true
ingress:
defaultIngressController: false
disableIngressClass: true
nodeNetworkList:
- networkName: sfo01-w01-vds01-tkgmanagement
serviceEngineGroup: sfo01m01segroup01
After you have the AKO configuration file ready, use kubectl to set the context to the
Tanzu Kubernetes Grid management cluster and create the ADC. Then list all
AKODeploymentConfig objects created under the management cluster to confirm that the
new ADC is present. A sketch of these commands follows.
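The following is a minimal sketch; the context name and file name are examples and depend on your management cluster name and where you saved the ADC manifest.

```bash
# Switch to the management cluster admin context (context name is an example)
kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

# Create the ADC from the prepared manifest (file name is an example)
kubectl apply -f tanzu-ako-for-shared.yaml

# List all AKODeploymentConfig objects in the management cluster
kubectl get akodeploymentconfig
```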
As per the defined architecture, the workload cluster control plane endpoint uses the TKG Cluster
VIP Network, application load balancing uses the TKG Workload Data VIP Network, and the virtual services
are deployed in the sfo01w01segroup01 SE group.
The following are the changes in the ADC ingress section when compared to the default ADC:
- nodeNetworkList: Provide the values for the TKG workload network name and CIDR.
Note
The NSX ALB L7 Ingress feature requires an Enterprise edition license. If you do not
want to enable the L7 feature, or you are using the NSX ALB Essentials for Tanzu license,
disable the L7 feature by setting disableIngressClass to true.
The format of the AKODeploymentConfig YAML file for enabling NSX ALB L7 Ingress is as follows.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
name: <unique-name-for-adc>
spec:
adminCredentialRef:
name: NSX_ALB-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: NSX_ALB-controller-ca
namespace: tkg-system-networking
cloudName: <cloud name configured in nsx alb>
clusterSelector:
matchLabels:
<KEY>: <value>
controller: <ALB-Controller-IP/FQDN>
controlPlaneNetwork:
cidr: <TKG-Cluster-VIP-Network-CIDR>
    name: <TKG-Cluster-VIP-Network-Name>
dataNetwork:
cidr: <TKG-Workload-VIP-network-CIDR>
    name: <TKG-Workload-VIP-network-Name>
serviceEngineGroup: <Workload-Cluster-SEG>
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false # required
ingress:
disableIngressClass: false # required
nodeNetworkList: # required
- networkName: <TKG-Workload-Network>
cidrs:
- <TKG-Workload-Network-CIDR>
serviceType: NodePortLocal # required
shardVSSize: MEDIUM # required
The AKODeploymentConfig with sample values in place is as follows. You should add the
corresponding NSX ALB label workload-l7-enabled=true while deploying the workload cluster to enforce this
network configuration.
cloud: sfo01w01vc01
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
name: tanzu-ako-for-workload-l7-ingress
spec:
adminCredentialRef:
name: NSX_ALB-controller-credentials
namespace: tkg-system-networking
certificateAuthorityRef:
name: NSX_ALB-controller-ca
namespace: tkg-system-networking
cloudName: sfo01w01vc01
clusterSelector:
matchLabels:
workload-l7-enabled: "true"
controller: 172.16.10.10
controlPlaneNetwork:
cidr: 172.16.80.0/24
name: sfo01-w01-vds01-tkgclustervip
dataNetwork:
cidr: 172.16.70.0/24
name: sfo01-w01-vds01-tkgworkloadvip
serviceEngineGroup: sfo01w01segroup01
extraConfigs:
cniPlugin: antrea
disableStaticRouteSync: false # required
ingress:
disableIngressClass: false # required
nodeNetworkList: # required
- networkName: sfo01-w01-vds01-tkgworkload
cidrs:
- 172.16.60.0/24
serviceType: NodePortLocal # required
shardVSSize: MEDIUM # required
Use kubectl to set the context to the Tanzu Kubernetes Grid management cluster and
create the ADC. Then list all AKODeploymentConfig objects created under the management cluster;
the new ADC tanzu-ako-for-workload-l7-ingress should appear in the list. A sketch of these
commands follows.
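The following sketch assumes the management cluster context is already set, as in the previous section; the file name is an example.

```bash
# Create the workload L7 ingress ADC from the prepared manifest (file name is an example)
kubectl apply -f tanzu-ako-for-workload-l7-ingress.yaml

# Confirm that the new ADC is listed
kubectl get akodeploymentconfig
```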
Now that you have successfully created the AKO deployment config, you need to apply the cluster
labels while deploying the workload clusters to enable NSX Advanced Load Balancer L7 Ingress with
NodePortLocal mode.
The procedures for deploying a shared services cluster and a workload cluster are almost the same. A
key difference is that you add the tanzu-services label to the shared services cluster as its cluster
role. This label identifies the shared services cluster to the management cluster and the workload
clusters.
The shared services cluster uses the custom ADC tanzu-ako-for-shared created earlier to apply
network settings similar to those of the management cluster. This is enforced by applying the
NSXALB_LABEL type:shared-services while deploying the shared services cluster.
After the management cluster is registered with Tanzu Mission Control, the deployment of the Tanzu
Kubernetes clusters can be done in just a few clicks. The procedure for creating Tanzu Kubernetes
clusters is as follows.
Note
The scope of this document doesn’t cover the use of a proxy for Tanzu Kubernetes
Grid deployment. If your environment uses a proxy server to connect to the internet,
ensure that the proxy configuration object includes the CIDRs for the pod, ingress,
and egress from the workload network of the Management Cluster in the No proxy
list, as described in Create a Proxy Configuration Object for a Tanzu Kubernetes Grid
Service Cluster Running in vSphere with Tanzu.
1. Navigate to the Clusters tab and click Create Cluster and select Create Tanzu Kubernetes
Grid cluster.
2. Under the Create cluster page, select the management cluster which you registered in the
previous step and click Continue to create cluster.
3. Select the provisioner for creating the shared services cluster. Provisioner reflects the
vSphere namespaces that you have created and associated with the management cluster.
5. Enter a name for the cluster (Cluster names must be unique within an organization).
6. Select the cluster group to which you want to attach your cluster.
In the vCenter and tlsThumbprint fields, enter the details for authentication.
From the template drop-down menu, select the Kubernetes version. The latest supported
version is preselected for you.
In the sshAuthorizedKeys field, enter the SSH key that was created earlier.
Enable aviAPIServerHAProvider.
11. Select the high availability mode for the control plane nodes of the shared services cluster.
For a production deployment, it is recommended to deploy a highly available shared services
cluster.
12. Customize the default node pool for your workload cluster.
Select OS Version.
14. Cluster creation takes approximately 15-20 minutes to complete. After the cluster
deployment completes, ensure that agent and extensions health shows green.
Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor in Deploy User-Managed Packages in
Workload Clusters.
1. Connect to the Tanzu Kubernetes Grid management cluster context and verify the cluster labels for the
shared services cluster.
## Add the tanzu-services label to the shared services cluster as its cluster role. In the following command, "sfo01w01tkgshared01" is the name of the shared services cluster.
## Validate that TMC has applied the AVI_LABEL while deploying the cluster.
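A minimal sketch of these commands is shown below. The context and cluster names are examples from this deployment, and the cluster-role label key shown is the one documented for Tanzu Kubernetes Grid; verify it against your TKG version.

```bash
# Switch to the management cluster admin context (context name is an example)
kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01

# Add the tanzu-services cluster-role label to the shared services cluster
kubectl label cluster.cluster.x-k8s.io/sfo01w01tkgshared01 \
  cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true

# Validate the labels applied to the cluster, including the NSX ALB label set by TMC
kubectl get cluster sfo01w01tkgshared01 --show-labels
```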
2. Connect to the admin context of the workload cluster using the following commands and validate
the AKO pod status.
## Use the following command to get the admin context of the workload cluster.
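A sketch of these commands follows; the cluster name is the shared services cluster created above, and the AKO pod runs in the avi-system namespace.

```bash
# Get the admin kubeconfig of the cluster (cluster name is an example)
tanzu cluster kubeconfig get sfo01w01tkgshared01 --admin

# Switch to the cluster's admin context
kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01

# Validate that the AKO pod is running
kubectl get pods -n avi-system
```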
Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor for Service Registry.
The steps for deploying a workload cluster are the same as for a shared services cluster, except that in
step 4 you use the NSX ALB labels created for the workload cluster in the AKO Deployment Config.
After the workload cluster is created, verify the cluster labels and the AKO pod status.
1. Connect to the Tanzu Kubernetes Grid management cluster context and verify the cluster labels for the workload cluster.
```bash
## Verify the workload cluster creation
## Validate that TMC has applied the AVI_LABEL while deploying the cluster
```
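A minimal sketch of this verification is shown below, assuming a workload cluster named sfo01w01tkgworkload01 (an example name) deployed with the workload-l7-enabled=true label.

```bash
# Verify the workload cluster creation from the management cluster context
kubectl get clusters

# Validate the labels that TMC applied while deploying the cluster
kubectl get cluster sfo01w01tkgworkload01 --show-labels
```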
1. Connect to the admin context of the workload cluster using the following commands and validate
the AKO pod status. The commands are the same as shown earlier for the shared services cluster,
substituting the workload cluster name.
## Use the following command to get the admin context of the workload cluster.
You can now configure SaaS components and deploy user-managed packages on the cluster.
For more information about deploying user-managed packages, see Deploy User-Managed
Packages in Workload Clusters.
This document lays out a reference architecture for VMware Tanzu for Kubernetes
Operations when it is deployed on a vSphere environment backed by VMware NSX, and offers a
high-level overview of the different components.
This reference design is based on the architecture and components described in VMware Tanzu for
Kubernetes Operations Reference Architecture.
For more information about which software versions can be used together, see the Interoperability Matrix.
Management Cluster - A management cluster is the first element that you deploy when you create a
Tanzu Kubernetes Grid instance. The management cluster is a Kubernetes cluster that performs the
role of the primary management and operational center for the Tanzu Kubernetes Grid instance. The
management cluster is purpose-built for operating the platform and managing the lifecycle of Tanzu
Kubernetes clusters.
ClusterClass API - Tanzu Kubernetes Grid 2 functions through the creation of a management
Kubernetes cluster that holds the ClusterClass API. The ClusterClass API then interacts with the
infrastructure provider to service workload Kubernetes cluster lifecycle requests. The earlier
primitives of Tanzu Kubernetes Clusters still exist for Tanzu Kubernetes Grid 1.x. ClusterClass, a new
feature introduced as part of Cluster API, reduces the need for redundant templating and enables
powerful customization of clusters. The overall process for creating a cluster using ClusterClass is the
same as before but with slightly different parameters.
Tanzu Kubernetes Cluster - Tanzu Kubernetes clusters are the Kubernetes clusters in which your
application workloads run. These clusters are also referred to as workload clusters. Tanzu
Kubernetes clusters can run different versions of Kubernetes, depending on the needs of the
applications they run.
Shared Service Cluster - Each Tanzu Kubernetes Grid instance can only have one shared services
cluster. You deploy this cluster only if you intend to deploy shared services such as Contour and
Harbor.
Tanzu Kubernetes Cluster Plans - A cluster plan is a blueprint that describes the configuration with
which to deploy a Tanzu Kubernetes cluster. It provides a set of configurable values that describe
settings like the number of control plane machines, worker machines, VM types, and so on. This
release of Tanzu Kubernetes Grid provides two default templates, dev and prod.
Tanzu Kubernetes Grid Instance - A Tanzu Kubernetes Grid instance is the full deployment of Tanzu
Kubernetes Grid, including the management cluster, the workload clusters, and the shared services
cluster that you configure.
Tanzu CLI - A command-line utility that provides the necessary commands to build and operate
Tanzu management and Tanzu Kubernetes clusters. Starting with TKG 2.3.0, Tanzu Core CLI is now
distributed separately from Tanzu Kubernetes Grid. For more information about installing the Tanzu
CLI for use with Tanzu Kubernetes Grid, see Install the Tanzu CLI.
Carvel Tools - Carvel is an open-source suite of reliable, single-purpose, composable tools that aid
in building, configuring, and deploying applications to Kubernetes. Tanzu Kubernetes Grid uses the
following Carvel tools:
ytt - A command-line tool for templating and patching YAML files. You can also use ytt to
collect fragments and piles of YAML into modular chunks for reuse.
kapp - The application deployment CLI for Kubernetes. It allows you to install, upgrade, and
delete multiple Kubernetes resources as one application.
imgpkg - A tool that enables Kubernetes to store configurations and the associated
container images as OCI images, and to transfer these images.
yq - A lightweight and portable command-line YAML, JSON, and XML processor. yq uses jq-like
syntax but works with YAML files as well as JSON and XML.
Bootstrap Machine - The bootstrap machine is the laptop, host, or server on which you download
and run the Tanzu CLI. This is where the initial bootstrapping of a management cluster occurs before
it is pushed to the platform on which it runs.
Tanzu Kubernetes Grid Installer - The Tanzu Kubernetes Grid installer is a CLI or a graphical wizard
that provides an option to deploy a management cluster. You launch this installer locally on the
bootstrap machine.
Tanzu Kubernetes Grid integrates with shared datastores available in the vSphere environment. The
following types of shared datastores are supported:
vSAN
VMFS
NFS
vVols
Tanzu Kubernetes Grid is agnostic to which option you choose. For Kubernetes stateful workloads,
Tanzu Kubernetes Grid installs the vSphere Container Storage interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
Tanzu Kubernetes Grid Cluster Plans can be defined by operators to use a certain vSphere datastore
when creating new workload clusters. All developers then have the ability to provision container-
backed persistent volumes from that underlying datastore.
Tanzu Kubernetes Grid supports the following Container Network Interface (CNI) options:
Antrea
Calico
Both are open-source software that provide networking for cluster pods, services, and ingress.
When you deploy a Tanzu Kubernetes cluster using Tanzu Mission Control or Tanzu CLI, Antrea CNI
is automatically enabled in the cluster.
Tanzu Kubernetes Grid also supports Multus CNI which can be installed through Tanzu user-
managed packages. Multus CNI lets you attach multiple network interfaces to a single pod and
associate each with a different address range.
To provision a Tanzu Kubernetes cluster using a non-default CNI, see the following instructions:
Each CNI is suitable for a different use case. The following table lists some common use cases for the
three CNIs that Tanzu Kubernetes Grid supports and helps you select the right CNI for your Tanzu
Kubernetes Grid implementation.
| CNI | Use Case | Pros and Cons |
|---|---|---|
| Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally, you can encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Provides an option to configure egress IP pool or static egress IP for Kubernetes workloads. |
| Calico | Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for network policies. High network performance. SCTP support. Cons: No multicast support. |
| Multus | Multus CNI provides multiple interfaces per each Kubernetes pod. Using Multus CRDs, you can specify which pods get which interfaces and allow different interfaces depending on the use case. | Pros: Separation of data/control planes. |
You can perform the following tasks by using the routable IP addresses on pods:
- Trace outgoing requests to common shared services, because their source IP address is the routable pod IP address and not a NAT address.
- Support authenticated incoming requests from the external internet directly to pods by bypassing NAT.
Note
The scope of this document is limited to VMware NSX Data Center Networking with
NSX Advanced Load Balancer Enterprise Edition.
You can configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid as:
The L7 ingress service provider for the applications in the clusters that are deployed on
vSphere.
Each workload cluster integrates with NSX Advanced Load Balancer by running an Avi Kubernetes
Operator (AKO) on one of its nodes. The cluster’s AKO calls the Kubernetes API to manage the
lifecycle of load balancing and ingress resources for its workloads.
NSX Advanced Load Balancer Controller - NSX Advanced Load Balancer controller
manages virtual service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the service engines (SEs). It is the central repository for the
configurations and policies related to services and management, and it provides the portal for
viewing the health of VirtualServices and SEs and the associated analytics that NSX
Advanced Load Balancer provides.
NSX Advanced Load Balancer Service Engine - The service engines (SEs) are lightweight
VMs that handle all data plane operations by receiving and executing instructions from the
controller. The SEs perform load balancing and all client- and server-facing network
interactions.
Service Engine Group - Service engines are created within a group, which contains the
definition of how the SEs should be sized, placed, and made highly available. Each cloud has
at least one SE group.
Cloud - Clouds are containers for the environment that NSX Advanced Load Balancer is
installed or operating within. During the initial setup of NSX Advanced Load Balancer, a
default cloud named Default-Cloud is created, and this is where the first controller is deployed.
Additional clouds may be added containing SEs and virtual services.
Avi Kubernetes Operator (AKO) - It is a Kubernetes operator that runs as a pod in the
Supervisor Cluster and Tanzu Kubernetes clusters, and it provides ingress and load balancing
functionality. AKO translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses, routes, and services on the
service engines (SE) through the NSX Advanced Load Balancer Controller.
AKO Operator (AKOO) - This is an operator that is used to deploy, manage, and remove
the AKO pod in Kubernetes clusters. When deployed, this operator creates an instance of the
AKO controller and installs all the relevant objects, such as the AKO StatefulSet.
Tanzu Kubernetes Grid management clusters have an AKO operator installed out-of-the-box during
cluster deployment. By default, a Tanzu Kubernetes Grid management cluster has a couple of
AkoDeploymentConfig objects created, which dictate when and how AKO pods are created in the workload
clusters. For more information, see the AKO Operator documentation.
Optionally, you can enter one or more cluster labels to identify clusters on which to selectively
enable NSX ALB or to customize NSX ALB settings for different groups of clusters. This is useful in
the following scenarios:
- You want to configure different sets of workload clusters to different Service Engine Groups to implement isolation or to support more Service type Load Balancers than one Service Engine Group's capacity.
- You want to configure different sets of workload clusters to different Clouds because they are deployed in different sites.
To enable NSX ALB selectively rather than globally, add labels in the format key: value pair in the
management cluster config file. This will create a default AKO Deployment Config (ADC) on
management cluster with the NSX ALB settings provided. Labels that you define here will be used to
create a label selector. Only workload cluster objects that have the matching labels will have the load
balancer enabled.
To customize the NSX ALB settings for different groups of clusters, create an AKO Deployment
Config (ADC) on management cluster by customizing the NSX ALB settings, and providing a unique
label selector for the ADC. Only the workload cluster objects that have the matching labels will have
these custom settings applied.
You can label the cluster during the workload cluster deployment or label it manually after cluster
creation. If you define multiple key-value pairs, you need to apply all of them.
- Provide AVI_LABELS in the workload cluster deployment config file; the cluster is then labeled automatically and the matching ADC is selected based on the label selector during cluster deployment, for example AVI_LABELS: | 'type': 'tkg-workloadset01'.
- Optionally, you can manually label the cluster object of the corresponding workload cluster with the labels defined in the ADC, for example kubectl label cluster <cluster-name> type=tkg-workloadset01.
A sketch of both approaches follows.
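The following is a minimal sketch of both approaches; the label key and value (type: tkg-workloadset01) and the cluster name are examples and must match the label selector defined in your ADC.

```bash
# Option 1: set the label in the workload cluster deployment configuration file
# (YAML fragment added to the cluster config file; key and value are examples)
# AVI_LABELS: |
#   'type': 'tkg-workloadset01'

# Option 2: label an existing workload cluster object from the management cluster context
kubectl label cluster <cluster-name> type=tkg-workloadset01
```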
Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and service engine settings. The cloud is
configured with one or more VIP networks to provide IP addresses for load balancing (L4/L7) virtual
services created under that cloud.
The virtual services can be spanned across multiple service Engines if the associated Service Engine
Group is configured in Active/Active HA mode. A service engine can belong to only one SE group
at a time.
IP address allocation for virtual services can be over DHCP or through the in-built IPAM functionality
of NSX Advanced Load Balancer. The VIP networks created or configured in NSX Advanced Load
Balancer are associated with the IPAM profile.
Network Architecture
For the deployment of Tanzu Kubernetes Grid in the VMware NSX environment, it is required to
build separate networks for the Tanzu Kubernetes Grid management clusters, workload clusters,
NSX Advanced Load Balancer management, cluster-VIP, and workload VIP network for control
plane HA and application load balancing/ingress.
The network reference design can be mapped into this general framework. This design uses a single
VIP network for control plane L4 load balancing and application L4/L7 load balancing. This design is
best suited for dev/test environments.
Another reference design that can be implemented in production environment is shown below, and
it uses separate VIP network for the applications deployed in management/shared services and the
workload cluster.
- Isolate and separate SDDC management components (vCenter, ESXi) from the Tanzu Kubernetes Grid components. This reference design allows only the minimum connectivity between the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the vCenter Server.
- Isolate and separate the NSX Advanced Load Balancer management network from the Tanzu Kubernetes Grid management segment and the Tanzu Kubernetes Grid workload segments.
- Depending on the workload cluster type and use case, multiple workload clusters may leverage the same workload network, or new networks can be used for each workload cluster. To isolate and separate Tanzu Kubernetes Grid workload cluster networking from each other, it is recommended to use separate networks for each workload cluster and configure the required firewall between these networks. For more information, see Firewall Requirements.
- Separate provider and tenant access to the Tanzu Kubernetes Grid environment. Only provider administrators need access to the Tanzu Kubernetes Grid management cluster. This prevents tenants from attempting to connect to the Tanzu Kubernetes Grid management cluster.
Network Requirements
As per the production architecture, the following networks are required:

| Network Type | DHCP Service | Description & Recommendations |
|---|---|---|
| TKG Management Logical Segment | Yes | Control plane and worker nodes of the TKG management cluster are attached to this network. |
| TKG Shared Service Logical Segment | Yes | Control plane and worker nodes of the TKG shared services cluster are attached to this network. |
| TKG Workload Logical Segment | Yes | Control plane and worker nodes of TKG workload clusters are attached to this network. |
| TKG Management VIP Logical Segment | No | Virtual services for control plane HA of all TKG clusters (management, shared services, and workload). Reserve sufficient IP addresses depending on the number of TKG clusters planned to be deployed in the environment. NSX Advanced Load Balancer takes care of IPAM on this network. |
| TKG Workload VIP Logical Segment | No | Virtual services for applications deployed in the workload cluster. The applications can be of type LoadBalancer or Ingress. Reserve sufficient IP addresses depending on the number of applications planned to be deployed in the environment. NSX Advanced Load Balancer takes care of IPAM on this network. |
Note
You can also select TKG Workload VIP network for control plane HA of the workload
cluster if you wish so.
For this demonstration, this document uses a subnet plan for the Tanzu for Kubernetes Operations
deployment that defines, for each network type, the segment name, gateway CIDR, DHCP pool, and
NSX ALB IP pool.
These networks are spread across the tier-1 gateways shown in the reference architecture diagram.
You must configure the appropriate firewall rules on the tier-1 gateways for a successful deployment.
Firewall Requirements
To prepare the firewall, you must collect the following information:
| Source | Destination | Protocol:Port | Description | Configured On |
|---|---|---|---|---|
| NSX Advanced Load Balancer controllers and cluster IP address | vCenter and ESXi hosts | TCP:443 | Allows NSX ALB to discover vCenter objects and deploy SEs as required. | NSX ALB Tier-1 Gateway |
| NSX Advanced Load Balancer controllers and cluster IP address | NSX nodes and VIP address | TCP:443 | Allows NSX ALB to discover NSX objects (logical routers and logical segments, and so on). | NSX ALB Tier-1 Gateway |
| Client machine | NSX Advanced Load Balancer controllers and cluster IP address | TCP:443 | To access the NSX Advanced Load Balancer portal. | NSX ALB Tier-1 Gateway |
| TKG management network CIDR, TKG shared services network CIDR | NSX Advanced Load Balancer management network CIDR | TCP:443 | Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller. | TKG Mgmt Tier-1 Gateway |
| TKG workload network CIDR | vCenter Server | TCP:443 | Allows components to access vCenter to create VMs and storage volumes. | TKG Workload Tier-1 Gateway |
| TKG workload network CIDR | TKG Management VIP Network | TCP:6443 | Allow TKG workload clusters to register with the TKG management cluster. | TKG Workload Tier-1 Gateway |
| TKG workload network CIDR | NSX Advanced Load Balancer management network CIDR | TCP:443 | Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX ALB controller. | TKG Workload Tier-1 Gateway |
Installation Experience
Tanzu Kubernetes Grid management cluster is the first component that you deploy to get started
with Tanzu Kubernetes Grid.
You can deploy the management cluster in one of the following ways:
Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method if you are
installing a Tanzu Kubernetes Grid management cluster for the first time.
Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.
By using the current version of the Tanzu Kubernetes Grid installation user interface, you can
install Tanzu Kubernetes Grid on VMware vSphere, AWS, and Microsoft Azure. The UI provides a
guided experience tailored to the IaaS, in this case VMware vSphere backed by NSX-T Data
Center networking.
The installation of Tanzu Kubernetes Grid on vSphere is done through the same UI as mentioned
above but tailored to a vSphere environment.
This installation process takes you through the setup of a management cluster on your vSphere
environment. Once the management cluster is deployed, you can make use of Tanzu Mission
Control or Tanzu CLI to deploy Tanzu Kubernetes shared service and workload clusters.
Contour is an open-source controller for Kubernetes ingress routing. Contour can be installed in the
shared services cluster or on any Tanzu Kubernetes Cluster. Deploying Contour is a prerequisite if you
want to deploy the Prometheus, Grafana, and Harbor packages on a workload cluster.
For more information about Contour, see the Contour website and Implementing Ingress Control
with Contour.
Another option is to use the NSX Advanced Load Balancer Kubernetes ingress controller that offers
an advanced L4-L7 load balancing/ingress for containerized applications that are deployed in the
Tanzu Kubernetes workload cluster.
For more information about the NSX ALB ingress controller, see Configuring L7 Ingress with NSX
Advanced Load Balancer.
Tanzu Service Mesh, which is a SaaS offering for modern applications running across multi-cluster,
multi-clouds, also offers an ingress controller based on Istio.
The following table provides general recommendations about using a specific ingress controller for
your Kubernetes environment.
| Ingress Controller | Use Cases |
|---|---|
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the application's manifest file. |
| Istio | Use the Istio ingress controller when you intend to provide security, traffic direction, and insights within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic). |
| NSX ALB ingress controller | Use the NSX ALB ingress controller when a containerized application requires features like local and global server load balancing (GSLB), web application firewall (WAF), performance monitoring, and so on. |
NSX Advanced Load Balancer provides an L4+L7 load balancing solution for vSphere. It includes a
Kubernetes operator that integrates with the Kubernetes API to manage the lifecycle of load
balancing and ingress resources for workloads.
Legacy ingress services for Kubernetes include multiple disparate solutions. The services and
products contain independent components that are difficult to manage and troubleshoot. The ingress
services have reduced observability capabilities with little analytics, and they lack comprehensive
visibility into the applications that run on the system. Cloud-native automation is difficult in the legacy
ingress services.
In comparison to the legacy Kubernetes ingress services, NSX Advanced Load Balancer has
comprehensive load balancing and ingress services features. As a single solution with a central
control, NSX Advanced Load Balancer is easy to manage and troubleshoot. NSX Advanced Load
Balancer supports real-time telemetry with an insight into the applications that run on the system.
The elastic auto-scaling and the decision automation features highlight the cloud-native automation
capabilities of NSX Advanced Load Balancer.
NSX Advanced Load Balancer also lets you configure L7 ingress for your workload clusters by using
one of the following options: L7 ingress in ClusterIP mode, L7 ingress in NodePort mode, L7 ingress in
NodePortLocal mode, or NSX ALB L4 ingress with Contour L7 ingress.
L7 Ingress in ClusterIP Mode
This option enables NSX Advanced Load Balancer L7 ingress capabilities, including sending traffic
directly from the service engines (SEs) to the pods, preventing multiple hops that other ingress
solutions need when sending packets from the load balancer to the right node where the pod runs.
The NSX Advanced Load Balancer controller creates a virtual service with a backend pool with the
pod IPs which helps to send the traffic directly to the pods.
However, each workload cluster needs a dedicated SE group for Avi Kubernetes Operator (AKO) to
work, which could increase the number of SEs you need for your environment. This mode is used
when you have a small number of workload clusters.
L7 Ingress in NodePort Mode
The NodePort mode is the default mode when AKO is installed on Tanzu Kubernetes Grid. This
option allows your workload clusters to share SE groups and is fully supported by VMware. With this
option, the services of your workloads must be set to NodePort instead of ClusterIP even when
accompanied by an ingress object. This ensures that NodePorts are created on the worker nodes
and traffic can flow through the SEs to the pods via the NodePorts. Kube-Proxy, which runs on each
node as DaemonSet, creates network rules to expose the application endpoints to each of the nodes
in the format “NodeIP:NodePort”. The NodePort value is the same for a service on all the nodes. It
exposes the port on all the nodes of the Kubernetes Cluster, even if the pods are not running on it.
L7 Ingress in NodePortLocal Mode
This feature is supported only with Antrea CNI. You must enable this feature on a workload cluster
before its creation. The primary difference between this mode and the NodePort mode is that the
traffic is sent directly to the pods in your workload cluster through node ports without interfering
Kube-proxy. With this option, the workload clusters can share SE groups. Similar to the ClusterIP
Mode, this option avoids the potential extra hop when sending traffic from the NSX Advanced Load
Balancer SEs to the pod by targeting the right nodes where the pods run.
Antrea agent configures NodePortLocal port mapping rules at the node in the format
“NodeIP:Unique Port” to expose each pod on the node on which the pod of the service is running.
The default range of the port number is 61000-62000. Even if the pods of the service are running
on the same Kubernetes node, Antrea agent publishes unique ports to expose the pods at the node
level to integrate with the load balancer.
NSX ALB L4 Ingress with Contour L7 Ingress
This option does not have all the NSX Advanced Load Balancer L7 ingress capabilities but uses it for
L4 load balancing only and leverages Contour for L7 Ingress. This also allows sharing SE groups
across workload clusters. This option is supported by VMware and it requires minimal setup.
Design Recommendations
NSX Advanced Load Balancer Recommendations
The following table provides the recommendations for configuring NSX Advanced Load Balancer in
a vSphere environment backed by NSX networking.
| Decision ID | Design Decision | Design Justification | Design Implications |
|---|---|---|---|
| TKO-ALB-001 | Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB. | Isolate NSX ALB traffic from infrastructure management traffic and Kubernetes workloads. | An additional network (VLAN) is required. |
| TKO-ALB-002 | Deploy 3 NSX ALB controller nodes. | To achieve high availability for the NSX ALB platform. In clustered mode, NSX ALB availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible. | Clustered mode requires more compute and storage resources. |
| TKO-ALB-003 | Initial setup should be done only on one NSX Advanced Load Balancer controller VM out of the three deployed, to create an NSX Advanced Load Balancer controller cluster. | The NSX Advanced Load Balancer controller cluster is created from an initialized NSX Advanced Load Balancer controller, which becomes the cluster leader. Follower NSX Advanced Load Balancer controller nodes need to be uninitialized to join the cluster. | NSX Advanced Load Balancer controller cluster creation fails if more than one NSX Advanced Load Balancer controller is initialized. |
| TKO-ALB-004 | Use static IP addresses for the NSX ALB controllers. | The NSX ALB controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes to management IP addresses are disruptive. | The NSX ALB controller control plane might go down if the management IP addresses of the controller nodes change. |
| TKO-ALB-005 | Use NSX ALB IPAM for the service engine data network and virtual services. | Guarantees IP address assignment for service engine data NICs and virtual services. | None |
| TKO-ALB-006 | Reserve an IP address in the NSX ALB management subnet to be used as the cluster IP address for the controller cluster. | The NSX ALB portal is always accessible over the cluster IP address regardless of a specific individual controller node failure. | None |
| TKO-ALB-007 | Share service engines for the same type of workload (dev/test/prod) clusters. | Minimize the licensing cost. | Each service engine contributes to the CPU core capacity associated with a license. |
| TKO-ALB-008 | Configure anti-affinity rules for the NSX ALB controller cluster. | This is to ensure that no two controllers end up on the same ESXi host and thus avoid a single point of failure. | Anti-affinity rules need to be created manually. |
| TKO-ALB-009 | Configure backup for the NSX ALB controller cluster. | Backups are required if the NSX ALB controller becomes inoperable or if the environment needs to be restored from a previous state. | To store backups, an SCP-capable backup location is needed. SCP is the only supported protocol currently. |
| TKO-ALB-011 | Replace default NSX ALB certificates with custom CA or public CA-signed certificates that contain SAN entries of all controller nodes. | To establish a trusted connection with other infra components; the default certificate doesn't include SAN entries, which is not acceptable by Tanzu. | None. SAN entries are not applicable if using a wildcard certificate. |
| TKO-ALB-012 | Create a dedicated resource pool with appropriate reservations for NSX ALB controllers. | Guarantees the CPU and memory allocation for NSX ALB controllers and avoids performance degradation in case of resource contention. | None |
| TKO-ALB-013 | Configure remote logging for the NSX ALB controller to send events to syslog. | For operations teams to centrally monitor NSX ALB and escalate alerts on events sent from the NSX ALB controller. | Additional operational overhead. Additional infrastructure resources. |
| TKO-ALB-014 | Use LDAP/SAML based authentication for NSX ALB. | Helps to maintain role-based access control. | Additional configuration is required. |
| TKO-ALB-SE-001 | Configure the SE group for Active/Active HA mode. | Provides optimum resiliency, performance, and utilization. | Certain applications might not work in Active/Active mode. For example, applications that require preserving the client IP address. In such cases, use the legacy Active/Standby HA mode. |
| TKO-ALB-SE-002 | Configure an anti-affinity rule for the SE VMs. | This is to ensure that no two SEs in the same SE group end up on the same ESXi host and thus avoid a single point of failure. | DRS must be enabled on the vSphere cluster where SE VMs are deployed. |
| TKO-ALB-SE-003 | Configure CPU and memory reservation for the SE VMs. | This is to ensure that service engines don't compete with other VMs during resource contention. | CPU and memory reservation is configured at the SE group level. |
| TKO-ALB-SE-005 | Create multiple SE groups as desired to isolate applications. | Allows efficient isolation of applications for better capacity planning. Allows flexibility of life cycle management. | Multiple SE groups will increase the licensing cost. |
| TKO-ALB-SE-006 | Create separate service engine groups for TKG management and workload clusters. | Allows isolating the load balancing traffic of the management cluster from the shared services cluster and workload clusters. | Dedicated service engine groups increase licensing cost. |
| TKO-ALB-SE-007 | Set the 'Placement across the Service Engines' setting to 'Distributed'. | This allows maximum fault tolerance and even utilization of capacity. | None |
| TKO-ALB-SE-008 | Set the SE size to a minimum of 2 vCPU and 4 GB of memory. | This configuration should meet the most generic use case. | For services that require higher throughput, these configurations need to be investigated and modified accordingly. |
| TKO-ALB-L7-001 | Deploy NSX ALB L7 ingress in NodePortLocal mode. | 1. Network hop efficiency is gained by bypassing the kube-proxy to receive external traffic to applications. 2. TKG clusters can share SE groups, optimizing or maximizing capacity and license consumption. 3. A pod's node port only exists on nodes where the pod is running, which helps to reduce east-west traffic and encapsulation overhead. 4. Better session persistence. | 1. This is supported only with Antrea CNI. 2. NodePortLocal mode is currently only supported for nodes running Linux or Windows with IPv4 addresses. Only TCP and UDP service ports are supported (not SCTP). For more information, see the Antrea NodePortLocal documentation. |
VMware recommends using NSX Advanced Load Balancer L7 ingress with the NodePortLocal mode,
as it gives you a distinct advantage over the other modes:
- Although ClusterIP mode provides direct communication to the Kubernetes pods, enabling persistence and direct monitoring of individual pods, it has the constraint of one SE group per Tanzu Kubernetes Grid cluster, which results in increased license capacity.
- NodePort mode resolves the need for an SE group per workload cluster, but a kube-proxy rule is created on each and every workload node even if the pod doesn't exist on it, and there's no direct connectivity to the pods. Persistence is then broken.
- NodePortLocal mode is the best of both use cases. Traffic is sent directly to the pods in your workload cluster through node ports without interfering with kube-proxy. SE groups can be shared, and load balancing persistence is supported.
Network Recommendations
The key network recommendations for a production-grade Tanzu Kubernetes Grid deployment with
NSX Data Center Networking are as follows:
| Decision ID | Design Decision | Design Justification | Design Implications |
|---|---|---|---|
| TKO-NET-001 | Use separate logical segments for the management cluster, shared services cluster, workload clusters, and VIP network. | To have flexible firewall and security policies. | Sharing the same network for multiple clusters can complicate firewall rules creation. |
| TKO-NET-002 | Configure DHCP for each TKG cluster network. | Tanzu Kubernetes Grid does not support static IP address assignments for Kubernetes VM components. | An IP address pool can be used for the TKG clusters in the absence of DHCP. |
| TKO-NET-003 | Use NSX for configuring DHCP. | This avoids setting up a dedicated DHCP server for TKG. | For a simpler configuration, make use of the DHCP local server to provide DHCP services for the required segments. |
| TKO-NET-004 | Create an overlay-backed NSX segment connected to a Tier-1 gateway for the SE management of the NSX Cloud of overlay type. | This network is used for the controller to SE connectivity. | None |
| TKO-NET-005 | Create an overlay-backed NSX segment as the data network for the NSX Cloud of overlay type. | The SEs are placed on overlay segments created on the Tier-1 gateway. | None |
With Tanzu Kubernetes Grid 2.3 and above, you can use Node IPAM, which simplifies the allocation
and management of IP addresses for cluster nodes within the cluster. This eliminates the need for
external DHCP configuration. Node IPAM can be configured for standalone management
clusters on vSphere, and for the associated class-based workload clusters that they manage. In the
Tanzu Kubernetes Grid management cluster configuration file, a dedicated Node IPAM pool is defined for
the management cluster only. The following types of Node IPAM pools are available for workload
clusters:
- InClusterIPPool - Configures IP pools that are only available to workload clusters in the same management cluster namespace, for example, default.
- GlobalInClusterIPPool - Configures IP pools with addresses that can be allocated to workload clusters across multiple namespaces.
Node IPAM in TKG provides flexibility in managing IP addresses for both management and workload
clusters, allowing efficient IP allocation and management within the cluster environment. A sketch of
a workload cluster node IPAM pool follows.
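The following is a minimal sketch of an InClusterIPPool definition for workload cluster nodes. The API version, pool name, and address values are illustrative assumptions, so verify the exact schema and the cluster configuration variable that references the pool against the Tanzu Kubernetes Grid 2.3 documentation for your release.

```yaml
# Illustrative InClusterIPPool for workload cluster nodes (all values are examples)
apiVersion: ipam.cluster.x-k8s.io/v1alpha2
kind: InClusterIPPool
metadata:
  name: workload-node-pool
  namespace: default
spec:
  gateway: 172.16.60.1
  addresses:
    - 172.16.60.100-172.16.60.200
  prefix: 24
```

In the workload cluster configuration file, the pool is then referenced by name (for example, through the NODE_IPAM_IP_POOL_NAME variable).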
| Decision ID | Design Decision | Design Justification | Design Implications |
|---|---|---|---|
| TKO-TKG-001 | Register the management cluster with Tanzu Mission Control (TMC). | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the lifecycle of all clusters centrally. | Only Antrea CNI is supported on workload clusters created via the TMC portal. |
| TKO-TKG-002 | Use NSX Advanced Load Balancer as your control plane endpoint provider and for application load balancing. | Eliminates the requirement for an external load balancer and additional configuration changes on your Tanzu Kubernetes Grid clusters. | Adds NSX Advanced Load Balancer license cost to the solution. |
| TKO-TKG-003 | Deploy the Tanzu Kubernetes Grid management cluster in the large form factor. | The large form factor should suffice to integrate the TKG management cluster with TMC, Pinniped, and Velero, and is capable of accommodating 100+ Tanzu workload clusters. | Consumes more resources from the infrastructure. |
| TKO-TKG-004 | Deploy the Tanzu Kubernetes clusters with the prod plan (management and workload clusters). | Deploying three control plane nodes ensures the state of your Tanzu Kubernetes cluster control plane stays healthy in the event of a node failure. | Consumes more resources from the infrastructure. |
Container Registry
VMware Tanzu for Kubernetes Operations using Tanzu Kubernetes Grid includes Harbor as a
container registry. Harbor provides a location for pushing, pulling, storing, and scanning container
images used in your Kubernetes clusters.
Harbor registry is used for day-2 operations of the Tanzu Kubernetes workload clusters. Typical day-
2 operations include tasks such as pulling images from Harbor for application deployment, pushing
custom images to Harbor, and so on.
VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.
If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To do so, follow the procedure in Trust Custom CA
Certificates on Cluster Nodes.
Prometheus is an open-source system monitoring and alerting toolkit. It can collect metrics from
target clusters at specified intervals, evaluate rule expressions, display the results, and trigger alerts if
certain conditions arise. The Tanzu Kubernetes Grid implementation of Prometheus includes Alert
Manager, which you can configure to notify you when certain events occur.
Grafana is open-source visualization and analytics software. It allows you to query, visualize, alert on,
and explore your metrics no matter where they are stored. Both Prometheus and Grafana are
installed through user-managed Tanzu packages by creating the deployment manifests and invoking
the tanzu package install command to deploy the packages in the Tanzu Kubernetes clusters.
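A minimal sketch of installing these packages with the Tanzu CLI is shown below. The package version is a placeholder, and the exact flag names can vary slightly between Tanzu CLI releases, so list the available versions first.

```bash
# List available Prometheus package versions in the cluster
tanzu package available list prometheus.tanzu.vmware.com -A

# Install Prometheus with a customized data values file
# (version and values file are placeholders; Grafana follows the same
# pattern with the grafana.tanzu.vmware.com package)
tanzu package install prometheus \
  --package prometheus.tanzu.vmware.com \
  --version <AVAILABLE-PACKAGE-VERSION> \
  --values-file prometheus-data-values.yaml \
  --namespace tkg-system
```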
The following diagram shows how the monitoring components on a cluster interact.
You can use out-of-the-box Kubernetes dashboards or you can create new dashboards to monitor
compute, network, and storage utilization of Kubernetes objects such as Clusters, Namespaces,
Pods, and so on.
You can also monitor your Tanzu Kubernetes Grid clusters with Tanzu Observability which is a SaaS
offering by VMware. Tanzu Observability provides various out-of-the-box dashboards. You can
customize the dashboards for your particular deployment. For information on how to customize
Tanzu Observability dashboards for Tanzu for Kubernetes Operations, see Customize Tanzu
Observability Dashboard for Tanzu for Kubernetes Operations.
Log processing and forwarding in Tanzu Kubernetes Grid is provided via Fluent Bit. Fluent Bit
binaries are available as part of extensions and can be installed on the management cluster or in
workload clusters. Fluent Bit is a light-weight log processor and forwarder that allows you to collect
data and logs from different sources, unify them, and send them to multiple destinations. VMware
Tanzu Kubernetes Grid includes signed binaries for Fluent Bit that you can deploy on management
clusters and on Tanzu Kubernetes clusters to provide a log-forwarding service.
Fluent Bit uses input plug-ins, filters, and output plug-ins. The input plug-ins define the
source from which it collects data, and the output plug-ins define the destination to which it
sends the information. The Kubernetes filter enriches the logs with Kubernetes metadata,
specifically labels and annotations. You configure the input and output plug-ins on the Tanzu
Kubernetes Grid cluster when you install Fluent Bit as a user-managed package.
Fluent Bit integrates with logging platforms such as VMware Aria Operations for Logs, Elasticsearch,
Kafka, Splunk, or an HTTP endpoint. For more details about configuring Fluent Bit to your logging
provider, see Implement Log Forwarding with Fluent Bit.
A custom image must be based on the operating system (OS) versions that are supported by Tanzu
Kubernetes Grid. The operating systems supported for building custom images for Tanzu Kubernetes
Grid include the following:
- Photon OS 3
- Windows 2019
For more information about building custom images for Tanzu Kubernetes Grid, see Build Machine
Images.
VMware provides FIPS-capable Kubernetes OVAs, which can be used to deploy FIPS-compliant
Tanzu Kubernetes Grid management and workload clusters. The Tanzu Kubernetes Grid core
components in these builds are also FIPS-capable.
To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.
On vSphere, you can configure all node VMs to have the same predefined configurations, set
different predefined configurations for control plane and worker nodes, or customize the
configurations of the nodes. By using these settings, you can create clusters that have nodes with
different configuration compared to the configuration of management cluster nodes. You can also
create clusters in which the control plane nodes and worker nodes have different configuration.
| Size | CPU | Memory (GB) | Disk (GB) |
|---|---|---|---|
| Small | 2 | 4 | 20 |
| Medium | 2 | 8 | 40 |
| Large | 4 | 16 | 40 |
| Extra-large | 8 | 32 | 80 |
To create a cluster in which all of the control plane and worker node VMs are the same size, specify
the SIZE variable. If you set the SIZE variable, all nodes are created with the configuration that you
set.
SIZE: "large"
To create a cluster in which the control plane and worker node VMs are different sizes, specify the
CONTROLPLANE_SIZE and WORKER_SIZE options.
CONTROLPLANE_SIZE: "medium"
WORKER_SIZE: "large"
You can combine the CONTROLPLANE_SIZE and WORKER_SIZE options with the SIZE option. For
example, if you specify SIZE: "large" with WORKER_SIZE: "extra-large", the control plane nodes
are set to large and worker nodes are set to extra-large.
SIZE: "large"
WORKER_SIZE: "extra-large"
To use the same custom configuration for all nodes, specify the VSPHERE_NUM_CPUS,
VSPHERE_DISK_GIB, and VSPHERE_MEM_MIB options.
VSPHERE_NUM_CPUS: 2
VSPHERE_DISK_GIB: 40
VSPHERE_MEM_MIB: 4096
To define different custom configurations for control plane nodes and worker nodes, specify the
VSPHERE_CONTROL_PLANE_* and VSPHERE_WORKER_* options.
VSPHERE_CONTROL_PLANE_NUM_CPUS: 2
VSPHERE_CONTROL_PLANE_DISK_GIB: 20
VSPHERE_CONTROL_PLANE_MEM_MIB: 8192
VSPHERE_WORKER_NUM_CPUS: 4
VSPHERE_WORKER_DISK_GIB: 40
VSPHERE_WORKER_MEM_MIB: 4096
| Performance Metric | Value |
|---|---|
| Throughput | 4 Gb/s |
| Connections/s | 40k |
Multiple performance vectors or features may have an impact on performance. For instance, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load
Balancer recommends two cores.
NSX Advanced Load Balancer SEs may be configured with as little as 1 vCPU core and 1 GB RAM, or
up to 36 vCPU cores and 128 GB RAM. SEs can be deployed in Active/Active or Active/Standby
mode depending on the license tier used. NSX Advanced Load Balancer Essentials license doesn’t
support Active/Active HA mode for SE.
Summary
Tanzu Kubernetes Grid on vSphere offers high-performance potential, convenience, and addresses
the challenges of creating, testing, and updating on-premises Kubernetes platforms in a consolidated
production environment. This validated approach results in a near-production quality installation with
all the application services needed to serve combined or uniquely separated workload types through
a combined infrastructure solution.
This plan meets many day-0 needs for quickly aligning product capabilities to full stack infrastructure,
including networking, firewalling, load balancing, workload compute alignment, and other
capabilities. Observability is quickly established and easily consumed with Tanzu Observability.
Deployment Instructions
For instructions to deploy this reference design, see Deploy VMware Tanzu for Kubernetes
Operations on VMware vSphere with VMware NSX.
The scope of the document is limited to providing deployment steps based on the reference design
in VMware Tanzu for Kubernetes Operations on vSphere with NSX-T. It does not cover deployment
procedures for the underlying SDDC components.
VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.
To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on vSphere with NSX-T Using Service Installer for VMware Tanzu.
Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.
For the latest information about which software versions can be used together, see the Interoperability
Matrix.
General Requirements
Network Requirements
Firewall Requirements
General Requirements
A vCenter with NSX backed environment.
Note
NSX manager instance is deployed and configured with Advanced or higher license.
vCenter Server that is associated with the NSX Data Center is configured as Compute
Manager.
IP pools for host and edge tunnel endpoints (TEP) are created.
Transport node profiles are created. This is not required if configuring NSX data
center on each host instead of the cluster.
NSX data center configured on all hosts part of the vSphere cluster or clusters.
A datastore with sufficient capacity for the control plane and worker node VM files.
Network time protocol (NTP) service is running on all hosts and vCenter.
Depending on the OS flavor of the bootstrap VM, download and configure the
following packages from VMware Customer Connect. To configure the required
packages on the CentOS machine, see Deploy and Configure Bootstrap Machine.
Download and import the NSX Advanced Load Balancer 22.1.3 OVA to the Content Library.
Download the following OVA files from VMware Customer Connect and import them to
vCenter. Convert the imported VMs to templates.
Note
You can also download supported older versions of Kubernetes from VMware
Customer Connect and import them to deploy workload clusters on the intended
Kubernetes versions.
The sample entries of the resource pools and folders that need to be created are as follows:
Network Requirements
Create separate logical segments in NSX for deploying TKO components as per Network
Requirements defined in the reference architecture.
Firewall Requirements
Ensure that the firewall is set up as described in Firewall Requirements.
The network plan used in this deployment defines, for each network type, the segment name, gateway CIDR, DHCP pool in NSX, and NSX ALB IP pool.
Deployment Overview
The steps for deploying Tanzu for Kubernetes Operations on vSphere backed by NSX-T are described
in the sections that follow, from configuring NSX networking and NSX Advanced Load Balancer through
deploying the Tanzu Kubernetes Grid management cluster, registering it with Tanzu Mission Control,
and deploying the shared services and workload clusters.
5. Select a tier-0 gateway to connect to this tier-1 gateway to create a multi-tier topology.
6. Select an NSX Edge cluster. This is required for this tier-1 gateway to host stateful services
such as NAT, load balancer, or firewall.
7. (Optional) In the Edges field, select Auto Allocated or manually set the edge nodes.
8. Select a failover mode or accept the default. The default option is Non-preemptive.
10. Click Route Advertisement and ensure that the following routes are selected:
12. Repeat steps 1-11 to create another tier-1 gateway.
Complete the following steps to set the DHCP configuration in both the tier-1 gateways:
3. On the tier-1 gateway that you created earlier, click the three dots menu and select Edit.
5. In the Set DHCP Configuration dialog box, set Type to DHCP Server and select the DHCP
profile that you created as part of the prerequisites.
6. Click Save.
Create the overlay-backed logical segments as shown in the overlay-backed segments CIDR
example. All these segments are part of the same overlay transport zone and they must be
connected to a tier-1 gateway.
Note: NSX ALB Management Network, TKG Cluster VIP Network, TKG Management Network, and
TKG Shared Service Network must be connected to sfo01w01tier1, while TKG Workload Network and
TKG Workload VIP Network must be connected to sfo01w01tier2.
Note: If you want the TKG Cluster VIP Network to be used for applications deployed in workload
clusters, connect all network segments to the sfo01w01tier1 tier-1 gateway.
The following procedure provides required details to create one such network which is required for
the Tanzu for Kubernetes Operations deployment:
3. Click ADD SEGMENT and enter a name for the segment. For example, sfo01-w01-vds01-
tkgmanagement
4. Under Connected Gateway, select the tier-1 gateway that you created earlier.
5. Under Transport Zone, select a transport zone that will be an overlay transport zone.
6. Under Subnets, enter the gateway IP address of the subnet in the CIDR format. For
example, 172.16.40.1/24
Note
The following step is required only for Tanzu Kubernetes Grid management
network, shared services network, and workload network.
Ensure that the DHCP Type field is set to Gateway DHCP Server and the DHCP Profile is set
to the profile created while creating the tier-1 gateway.
1. Click Settings, select Enable DHCP Config, and enter the DHCP range and DNS
server information.
2. Click Options and under Select DHCP Options, select GENERIC OPTIONS.
3. Click ADD GENERIC OPTION, and add NTP servers (42) and Domain Search (119).
Repeat steps 1-7 to create all other required overlay-backed segments. Once completed, you should
see an output similar to the following screenshot:
Additionally, you can create the required inventory groups and firewall rules. For more information,
see NSX Data Center Product Documentation.
NSX Advanced Load Balancer is deployed in Write Access Mode in the vSphere Environment
backed by NSX. This mode grants NSX Advanced Load Balancer controllers full write access to the
vCenter or NSX which helps in automatically creating, modifying, and removing service engines
(SEs) and other resources as needed to adapt to changing traffic needs.
The sample IP address and FQDN set for the NSX Advanced Load Balancer controllers is as follows:
2. Select the content library under which the NSX-ALB OVA is placed.
4. Right-click the NSX Advanced Load Balancer image and select New VM from this
Template.
5. On the Select name and folder page, enter a name and select a folder for the NSX
Advanced Load Balancer VM as tkg-vsphere-alb-components.
6. On the Select a compute resource page, select the resource pool tkg-vsphere-alb-
components.
7. On the Review details page, verify the template details and click Next.
8. On the Select storage page, select a storage policy from the VM Storage Policy drop-down
menu and choose the datastore location where you want to store the virtual machine files.
9. On the Select networks page, select the network sfo01-w01-vds01-albmanagement and click
Next.
10. On the Customize template page, provide the NSX Advanced Load Balancer management
network details such as IP address, subnet mask, and gateway, and then click Next.
11. On the Ready to complete page, review the provided information and click Finish.
A new task for creating the virtual machine appears in the Recent Tasks pane. After the task is
complete, the NSX Advanced Load Balancer virtual machine is created on the selected resource.
Power on the virtual machine and give it a few minutes for the system to boot. Upon successful boot
up, navigate to NSX Advanced Load Balancer on your browser.
Note
While the system is booting up, a blank web page or a 503 status code might
appear.
Once NSX Advanced Load Balancer is successfully deployed and running, navigate to NSX
Advanced Load Balancer on your browser using the URL https://<IP/FQDN> and configure the
basic system settings:
2. On the Welcome page, under System Settings, set backup passphrase and provide DNS
information, and then click Next.
3. Under Email/SMTP, provide email and SMTP information, and then click Next.
Service Engines are managed within the: Provider (Shared across tenants)
If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch, and you are directed to a dashboard
view on the controller.
Note
1. To configure licensing, navigate to Administration > Licensing, and click on the gear icon to
change the license type to Enterprise.
3. Once the license tier is changed, apply the NSX Advanced Load Balancer Enterprise license
key. If you have a license file instead of a license key, apply the license by selecting the
Upload a License File option.
To run a three-node controller cluster, you deploy the first node, perform the initial configuration,
and set the cluster IP address. After that, you deploy and power on two more controller VMs, but
you must not run the initial configuration wizard or change the admin password for these controller
VMs. The configuration of the first controller VM is assigned to the two new controller VMs.
The first controller of the cluster receives the Leader role. The second and third controllers work as
Followers.
Complete the following steps to configure NSX Advanced Load Balancer cluster:
1. Log in to the primary NSX Advanced Load Balancer controller and navigate to Administration
> Controller > Nodes, and then click Edit.
2. Specify Name and Controller Cluster IP, and then click Save. This IP address must be from
the NSX ALB management network.
3. Deploy the 2nd and 3rd NSX Advanced Load Balancer controller nodes by using steps in
Deploy NSX Advanced Load Balancer.
4. Log in to the primary NSX Advanced Load Balancer controller using the controller cluster
IP/FQDN and navigate to Administration > Controller > Nodes, and then click Edit. The Edit
Controller Configuration popup appears.
5. In the Cluster Nodes field, enter the IP address for the 2nd and 3rd controller, and then click
Save.
After you complete these steps, the primary NSX Advanced Load Balancer controller
becomes the leader for the cluster and invites the other controllers to the cluster as
members.
NSX Advanced Load Balancer then performs a warm reboot of the cluster. This process can
take approximately 10-15 minutes. You will be automatically logged out of the controller node
where you are currently logged in. On entering the cluster IP address in the browser, you
can see details about the cluster formation task.
The configuration of the primary (leader) controller is synchronized to the new member nodes when
the cluster comes online following the reboot. Once the cluster is successfully formed, you can see
the following status:
Note
In the following tasks, all NSX Advanced Load Balancer configurations are done by
connecting to the NSX Advanced Load Balancer Controller Cluster IP/FQDN.
1. Log in to the NSX Advanced Load Balancer controller and navigate to Templates > Security
> SSL/TLS Certificates.
2. Click Create and select Controller Certificate. You can either generate a self-signed
certificate, generate CSR, or import a certificate. For the purpose of this document, a self-
signed certificate will be generated.
3. Provide all required details as per your infrastructure requirements and in the Subject
Alternate Name (SAN) field, provide IP address and FQDN of all NSX Advanced Load
Balancer controllers including NSX Advanced Load Balancer cluster IP and FQDN, and then
click Save.
4. Once the certificate is created, capture the certificate contents as this is required while
deploying the Tanzu Kubernetes Grid management cluster. To capture the certificate
content, click on the Download icon next to the certificate, and then click Copy to clipboard
under Certificate.
5. To replace the certificate, navigate to Administration > Settings > Access Settings, click
the pencil icon at the top right to edit the system access settings, replace the SSL/TLS
certificate, and then click Save.
Create Credentials
NSX Advanced Load Balancer requires credentials of VMware NSX and vCenter Server to
authenticate with these endpoints. These credentials need to be created before configuring NSX
Cloud.
To create a new credential, navigate to Administration > User Credentials and click Create.
1. Create NSX Credential: Select the credential type as NSX-T and provide a name for the
credential. Under the NSX-T Credentials section, specify the username and password that
NSX Advanced Load Balancer will use to authenticate with VMware NSX.
2. Create vCenter Credential: Select the credential type as vCenter and provide a name for the
credential. Under the vCenter Credentials section, specify the username and password that
NSX Advanced Load Balancer will use to authenticate with the vCenter server.
Service Engine Group 1: Service engines associated with this service engine group host:
Virtual services that load balance the control plane nodes of the management cluster and shared
services cluster.
Virtual services for all load balancer functionalities requested by the Tanzu Kubernetes Grid
management cluster and shared services cluster.
Service Engine Group 2: Service engines that are part of this service engine group host the virtual
services that load balance the control plane nodes, and the virtual services for all load balancer
functionalities requested by the workload clusters mapped to this SE group.
Note
- Based on your requirements, you can create additional SE groups for the workload clusters.
- Multiple workload clusters can be mapped to a single SE group.
- A Tanzu Kubernetes Grid cluster can be mapped to only one SE group for application load
balancer services.
- Control plane VIPs for the workload clusters are placed on the respective service engine
group assigned through the AKO Deployment Config (ADC) during cluster creation.
For more information about mapping a specific service engine group to Tanzu Kubernetes Grid
workload cluster, see Configure NSX Advanced Load Balancer in Tanzu Kubernetes Grid Workload
Cluster.
1. Log in to NSX Advanced Load Balancer and navigate to Infrastructure > Clouds > Create >
NSX-T Cloud.
2. Enter a cloud name and provide an object name prefix. Click CHANGE CREDENTIALS to
connect NSX Advanced Load Balancer with VMware NSX.
3. Specify NSX-T Manager Address and select the NSX-T credential that you created earlier.
Transport Zone: Overlay transport zone where you connect the NSX Advanced Load
Balancer management network.
Tier-1 Router: Tier-1 gateway where the Advanced Load Balancer management
network is connected.
Overlay Segment: Logical segment that you have created for the Advanced Load
Balancer management.
Transport Zone: Overlay transport zone where you connected the Tanzu Kubernetes
Grid VIP networks.
Tier-1 Router: Tier-1 gateway sfo01w01tier1 where the TKG Cluster VIP Network
network is connected.
Overlay Segment: Logical segment that you have created for TKG Cluster VIP
Network.
Tier-1 Router: Tier-1 gateway sfo01w01tier2 where TKG Workload VIP Network is
connected.
Overlay Segment: Logical segment that you created for the TKG Workload VIP
Network.
Note: For the single VIP network architecture, do not add the sfo01w01tier2 tier-1 gateway and its
associated overlay segment under Data Network Segments.
2. Specify a name for the vCenter server and click CHANGE CREDENTIALS to connect NSX
Advanced Load Balancer with the vCenter server.
3. Select the vCenter server from the drop down and select the vCenter credential which you
have created earlier.
4. Select the Content Library where Service Engine templates will be stored by NSX Advanced
Load Balancer.
5. Leave the IPAM/DNS profile section empty as this will be populated later, once you have
created the profiles. Click SAVE to finish the NSX-T cloud configuration.
7. Create a service engine group for Tanzu Kubernetes Grid management clusters:
2. Under Select Cloud, choose the cloud created in the previous step, and click Create.
8. Enter a name for the Tanzu Kubernetes Grid management service engine group, and set the
following parameters:
VS Placement: Compact
Under the Scope tab, specify the vCenter server endpoint by clicking the Add option.
Select the vCenter server from the drop-down menu, and select the service engine folder,
vSphere cluster, and datastore for service engine placement, and then click Save.
9. Repeat the previous two steps to create another service engine group for Tanzu Kubernetes Grid
workload clusters. Once complete, there must be two service engine groups created.
1. Navigate to Infrastructure > Cloud Resources > Networks and select the cloud that you
created earlier. Click the edit icon next to the network and configure it as follows.
Once the networks are configured, the configuration must look like the following image.
Note
1. Once the networks are configured, set the default routes for the networks by navigating to
Infrastructure > Cloud Resources > Routing.
Note
To set the default gateway for the sfo01-w01-vds01-albmanagement network, click CREATE
under the global VRF context and set the default gateway to the gateway of the NSX Advanced
Load Balancer management subnet.
To set the default gateway for the sfo01-w01-vds01-tkgclustervip network, click CREATE
under the tier-1 gateway sfo01w01tier1 VRF context and set the default gateway to the gateway
of the VIP network subnet.
To set the default gateway for the sfo01-w01-vds01-tkgworkloadvip network, click CREATE
under the tier-1 gateway sfo01w01tier2 VRF context and set the default gateway to the gateway
of the workload VIP network subnet.
Create IPAM Profile in NSX Advanced Load Balancer and Attach to Cloud
At this point, all the required networks related to Tanzu functionality are configured in NSX
Advanced Load Balancer. NSX Advanced Load Balancer provides IPAM service for Tanzu
Kubernetes Grid cluster VIP network and NSX ALB management network.
Complete the following steps to create an IPAM profile and once created, attach it to the NSX-T
cloud created earlier.
1. Log in to NSX Advanced Load Balancer and navigate to Templates > IPAM/DNS Profiles >
Create > IPAM Profile.
Name: sfo01-w01-vcenter-ipam01
Note
1. Click Create > DNS Profile and provide the domain name.
3. Under IPAM/DNS section, choose the IPAM and DNS profiles created earlier and
save the updated configuration.
This completes the NSX Advanced Load Balancer configuration. The next step is to deploy and
configure a bootstrap machine which will be used to deploy and manage Tanzu Kubernetes clusters.
The bootstrap machine runs a local kind cluster when Tanzu Kubernetes Grid management cluster
deployment is started. Once the kind cluster is fully initialized, the configuration is used to deploy the
actual management cluster on the backend infrastructure. After the management cluster is fully
configured, the local kind cluster is deleted and future configurations are performed with the Tanzu
CLI.
For this deployment, a Photon-based virtual machine is used as the bootstrap machine. For more
information about configuring for a macOS or Windows machine, see Install the Tanzu CLI and Other
Tools.
Docker and containerd binaries are installed. For instructions on how to install Docker, see
Docker documentation.
Ensure that the bootstrap VM is connected to Tanzu Kubernetes Grid management network,
sfo01-w01-vds01-tkgmanagement.
To install Tanzu CLI, Tanzu Plugins, and Kubectl utility on the bootstrap machine, follow the
instructions below:
1. Download and unpack the following Linux CLI packages from VMware Tanzu Kubernetes
Grid Download Product page.
2. Execute the following commands to install Tanzu Kubernetes Grid CLI, kubectl CLIs, and
Carvel tools.
version: v0.90.1
buildDate: 2023-06-29
sha: 8945351c
#Accept EULA
root@photon-829669d9bf1f [ ~ ]# tanzu config eula accept
[ok] Marking agreement as accepted.
Standalone Plugins
  NAME                DESCRIPTION                                                 TARGET      VERSION  STATUS
  isolated-cluster    Prepopulating images/bundle for internet-restricted         global      v0.30.1  installed
                      environments
  pinniped-auth       Pinniped authentication operations (usually not directly    global      v0.30.1  installed
                      invoked)
  management-cluster  Kubernetes management cluster operations                    kubernetes  v0.30.1  installed
  package             Tanzu package management                                    kubernetes  v0.30.1  installed
  secret              Tanzu secret management                                     kubernetes  v0.30.1  installed
  telemetry           configure cluster-wide settings for vmware tanzu telemetry  kubernetes  v0.30.1  installed
##Install ytt
gunzip ytt-linux-amd64-v0.45.0+vmware.2.gz
chmod ugo+x ytt-linux-amd64-v0.45.0+vmware.2 && mv ./ytt-linux-amd64-v0.45.0+vmware.2 /usr/local/bin/ytt
##Install kapp
gunzip kapp-linux-amd64-v0.55.0+vmware.2.gz
chmod ugo+x kapp-linux-amd64-v0.55.0+vmware.2 && mv ./kapp-linux-amd64-v0.55.0+vmware.2 /usr/local/bin/kapp
##Install kbld
gunzip kbld-linux-amd64-v0.37.0+vmware.2.gz
chmod ugo+x kbld-linux-amd64-v0.37.0+vmware.2 && mv ./kbld-linux-amd64-v0.37.0+vmware.2 /usr/local/bin/kbld
##Install imgpkg
gunzip imgpkg-linux-amd64-v0.36.0+vmware.2.gz
chmod ugo+x imgpkg-linux-amd64-v0.36.0+vmware.2 && mv ./imgpkg-linux-amd64-v0.36.0+vmware.2 /usr/local/bin/imgpkg
ytt version
kapp version
kbld version
imgpkg version
4. Install yq. yq is a lightweight and portable command-line YAML processor. yq uses jq-like syntax.
wget https://github.com/mikefarah/yq/releases/download/v4.24.5/yq_linux_amd64.tar.gz
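A minimal sketch to finish the yq installation (archive contents assumed):
tar -xzf yq_linux_amd64.tar.gz
mv yq_linux_amd64 /usr/local/bin/yq
yq --version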
5. Install kind.
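A minimal sketch that installs the upstream kind release binary directly (release version is an example):
curl -Lo ./kind https://github.com/kubernetes-sigs/kind/releases/download/v0.17.0/kind-linux-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind
kind version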
6. Execute the following commands to start the Docker service and enable it to start at boot.
Photon OS has Docker installed by default.
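A minimal sketch of the typical systemd commands:
## Start Docker and enable it to start at boot
systemctl start docker
systemctl enable docker
## Verify that the Docker service is running
systemctl status docker --no-pager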
7. Execute the following commands to ensure that the bootstrap machine uses cgroup v1.
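A minimal sketch of the verification (a cgroup v1 host reports "Cgroup Version: 1"):
docker info | grep -i cgroup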
An SSH key pair is required for Tanzu CLI to connect to vSphere from the bootstrap
machine.
The public key part of the generated key is passed during the Tanzu Kubernetes Grid
management cluster deployment.
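A minimal sketch of the key generation step (key type, size, and comment are examples):
## Create an RSA key pair; press Enter to accept the default location ~/.ssh/id_rsa
ssh-keygen -t rsa -b 4096 -C "[email protected]"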
## Add the private key to the SSH agent running on your machine and enter the password you created in the previous step
ssh-add ~/.ssh/id_rsa
## If the above command fails, execute "eval $(ssh-agent)" and then rerun the command
9. If your bootstrap machine runs Linux or Windows Subsystem for Linux, and it has a Linux
kernel built after the May 2021 Linux security patch, for example Linux 5.11 and 5.12 with
Fedora, run the following command.
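The setting typically adjusted in this case is the netfilter connection-tracking limit; a minimal sketch:
sudo sysctl net/netfilter/nf_conntrack_max=131072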
All required packages are now installed and the required configurations are in place in the bootstrap
virtual machine. The next step is to deploy the Tanzu Kubernetes Grid management cluster.
1. Go to the Tanzu Kubernetes Grid downloads page and download a Tanzu Kubernetes Grid
OVA for the cluster nodes.
2. For the management cluster, this must be either a Photon-based or Ubuntu-based Kubernetes
v1.24.9 OVA.
Note
3. For workload clusters, OVA can have any supported combination of OS and Kubernetes
version, as packaged in a Tanzu Kubernetes release.
Note
Make sure you download the most recent OVA base image templates in the
event of security patch releases. You can find updated base image templates
that include security patches on the Tanzu Kubernetes Grid product
download page.
4. In the vSphere client, right-click an object in the vCenter Server inventory and select Deploy
OVF template.
5. Select Local file, click the button to upload files, and go to the downloaded OVA file on your
local machine.
7. Click Finish to deploy the VM. When the OVA deployment finishes, right-click the VM and
select Template > Convert to Template.
Note
8. If you are using a non-administrator SSO account: in the VMs and Templates view, right-click the new
template, select Add Permission, and assign the tkg-user to the template with the TKG role.
For more information about creating the user and role for Tanzu Kubernetes Grid, see Required Permissions for the vSphere Account.
The management cluster is also where you configure the shared and in-cluster services that the
workload clusters use.
Run the Tanzu Kubernetes Grid installer, a wizard interface that guides you through the
process of deploying a management cluster. This is the recommended method.
Create and edit YAML configuration files, and use them to deploy a management cluster with
the CLI commands.
The following procedure provides the required steps to deploy Tanzu Kubernetes Grid management
cluster using the installer interface.
1. To launch the UI installer wizard, run the following command on the bootstrap machine:
For example:
tanzu management-cluster create --ui --bind 172.16.40.10:8000 --browser none
4. In the IaaS Provider section, enter the IP address/FQDN and credentials of the vCenter
server where the Tanzu Kubernetes Grid management cluster will be deployed. (Optional)
you can skip the vCenter SSL thumbprint verification.
6. Select the data center and provide the SSH public Key generated while configuring the
bootstrap VM.
If you have saved the SSH key in the default location, run the following command in your
bootstrap machine to get the SSH public key.
cat /root/.ssh/id_rsa.pub
7. Click NEXT.
8. On the Management Cluster Settings section, provide the following details and click Next.
Based on the environment requirements, select appropriate deployment type for the
Tanzu Kubernetes Grid management cluster:
It is recommended to set the instance type to Large or above. For the purpose of this
document, we will proceed with deployment type Production and instance type
Medium.
Control Plane Endpoint Provider: Select NSX Advanced Load Balancer for Control
Plane HA.
Control Plane Endpoint: This is an optional field. If left blank, NSX Advanced Load
Balancer will assign an IP address from the pool defined for the network “sfo01-w01-
vds01-tkgclustervip”.
If you need to provide an IP address, pick an IP address from “sfo01-w01-vds01-
tkgclustervip” static IP pools configured in AVI and ensure that the IP address is
unused.
Enable Audit Logging: Enables audit logging for the Kubernetes API server and node
VMs. Choose as per your environment needs. For more information, see Audit
Logging.
9. On the NSX Advanced Load Balancer section, provide the following information and click
Next.
Controller Host: NSX Advanced Load Balancer controller IP/FQDN (use the controller
cluster IP/FQDN if a controller cluster is configured).
Controller certificate: Paste the contents of the Certificate Authority that is used to
generate your controller certificate into the Controller Certificate Authority text box.
10. Once these details are provided, click VERIFY CREDENTIALS and choose the following
parameters.
Cloud Name: Name of the cloud created while configuring NSX Advanced Load
Balancer sfo01w01vc01.
Workload Cluster Service Engine Group Name: Name of the service engine group
created for Tanzu Kubernetes Grid workload clusters created while configuring NSX
Advanced Load Balancer sfo01w01segroup01.
Management Cluster Service Engine Group Name: Name of the service engine
group created for Tanzu Kubernetes Grid management cluster created while
configuring NSX Advanced Load Balancer sfo01m01segroup01.
Cluster Labels: Optional. Leave the cluster labels section empty to apply the above
workload cluster network settings by default. If you specify any label here, you must
specify the same values in the configuration YAML file of the workload cluster. Else,
the system places the endpoint VIP of your workload cluster in TKG Cluster VIP
Network by default.
Note
With the above configuration, all the Tanzu workload clusters use sfo01-w01-
vds01-tkgclustervip for the control plane VIP network and sfo01-w01-vds01-
tkgworkloadvip for the data plane network by default. If you would like to override
this behavior for specific workload clusters, apply cluster labels here and reference
the same labels in a custom AKO Deployment Config.
11. (Optional) On the Metadata page, you can specify location and labels and click Next.
12. On the Resources section, specify the resources to be consumed by the Tanzu Kubernetes
Grid management cluster and click NEXT.
13. On the Kubernetes Network section, select the Tanzu Kubernetes Grid management
network (sfo01-w01-vds01-tkgmanagement) where the control plane and worker nodes will be
placed during management cluster deployment. Ensure that the network has DHCP service
enabled. Optionally, change the pod and service CIDR.
If the Tanzu environment is placed behind a proxy, enable proxy and provide proxy details:
If you set http-proxy, you must also set https-proxy and vice-versa.
Note
For vSphere, you must manually add the CIDR of Tanzu Kubernetes Grid
management network and Cluster VIP networks that includes the IP address of your
control plane endpoints, to TKG_NO_PROXY.
14. (Optional) Specify identity management with OIDC or LDAP. For the purpose of this
document, identity management integration is deactivated.
If you would like to enable identity management, see Enable and Configure Identity
Management During Management Cluster Deployment section in the Tanzu Kubernetes
Grid Integration with Pinniped Deployment Guide.
15. Select the OS image that will be used for the management cluster deployment.
Note
This list appears empty if you don't have a compatible template present in
your environment. Refer to the steps provided in Import Base Image template for
TKG Cluster deployment.
16. Select “Participate in the Customer Experience Improvement Program”, if you so desire.
As of now, it is not possible to deploy a management cluster for the NSX cloud from the Tanzu
Kubernetes Grid installer UI because one of the required fields for the NSX cloud is not exposed in the
UI; it must be manually inserted into the cluster deployment YAML.
19. Edit the file and insert the key AVI_NSXT_T1LR. The value of this key is the tier-1 gateway
where you have connected the sfo01-w01-vds01-tkgmanagement network. In this example,
the value is set to /infra/tier-1s/sfo01w01tier1.
20. Deploy the Management cluster from this config file by running the command:
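For example (the configuration file path is a placeholder; the -v flag controls log verbosity):
tanzu management-cluster create --file /path/to/mgmt-cluster-config.yaml -v 6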
A sample file used for the management cluster deployment is shown below:
AVI_CA_DATA_B64: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUM3ekNDQWRlZ0F3SUJBZ0lVRis5S
3BUSmdydmdFS1paRklabTh1WEFiRVN3d0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZURVRNQkVHQTFVRUF3d0tZV3h
pTFdObGNuUXdNVEFlRncweU16QTRNamt3T1RJeE16UmFGdzB5TkRBNApNamd3T1RJeE16UmFNQlV4RXpBUkJnT
lZCQU1NQ21Gc1lpMWpaWEowTURFd2dnRWlNQTBHQ1NxR1NJYjNEUUVCCkFRVUFBNElCRHdBd2dnRUtBb0lCQVF
DemU5eGxydzhjQlplTDE0TEc3L2RMMkg3WnJaVU5qM09zQXJxU3JxVmIKWEh4VGUrdTYvbjA1b240RGhUdDBEZ
ys0cDErZEZYMUc2N0kxTldJZlEzZGFRRnhyenBJSWdKTHUxYUF6R2hDRgpCR0dOTkxqbEtDMDVBMnZMaE1TeG5
ZR1orbDhWR2VKWDJ4dzY5N1M4L3duUUtVRGdBUUVwcHpZT0tXQnJLY3RXCktTYm1vNlR3d1UvNWFTS0tvS3h5U
DJJYXYrb1plOVNrNG05ejArbkNDWjVieDF1SzlOelkzZFBUdUUwQ3crMTgKUkpzN3Z4MzIxL3ZTSnM3TUpMa05
Ud0lEUlNLVkViWkR4b3VMWXVMOFRHZjdMLys2Sm1UdGc3Y3VsRmVhTlRKVgowTkJwb201ODc2UmMwZjdnODE3a
EFYcllhKzdJK0hxdnBSdlMrdFJkdjhDM0FnTUJBQUdqTnpBMU1ETUdBMVVkCkVRUXNNQ3FDSW5ObWJ6QXhZV3h
pWTNSc2NqQXhZUzV6Wm04d01TNXlZV2x1Y0c5c1pTNTJiWGVIQkt3UUNnc3cKRFFZSktvWklodmNOQVFFTEJRQ
URnZ0VCQUJIK20xUFUxcm1kNGRJenNTNDBJcWV3bUpHbUVBN3ByMkI2c0VIWAo0VzZWakFZTDNsTE4ySHN4VUN
Sa2NGbEVsOUFGUEpkNFZNdldtQkxabTB4SndHVXdXQitOb2NXc0puVjBjYWpVCktqWUxBWWExWm1hS2g3eGVYK
3VRVEVKdGFKNFJxeG9WYXoxdVNjamhqUEhteFkyZDNBM3RENDFrTCs3ZUUybFkKQmV2dnI1QmhMbjhwZVRyUlN
xb2h0bjhWYlZHbng5cVIvU0d4OWpOVC8vT2hBZVZmTngxY1NJZVNlR1dGRHRYQwpXa0ZnQ0NucWYyQWpoNkhVT
TIrQStjNFlsdW13QlV6TUorQU05SVhRYUUyaUlpN0VRUC9ZYW8xME5UeU1SMnJDCkh4TUkvUXdWck9NTThyK1p
VYm10QldIY1JWZS9qMVlVaXFTQjBJbmlraDFmeDZ3PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
AVI_CLOUD_NAME: sfo01w01vc01
AVI_CONTROL_PLANE_HA_PROVIDER: "true"
AVI_CONTROL_PLANE_NETWORK: sfo01-w01-vds01-tkgclustervip
AVI_CONTROL_PLANE_NETWORK_CIDR: 172.16.80.0/24
AVI_CONTROLLER: 172.16.10.11
AVI_DATA_NETWORK: sfo01-w01-vds01-tkgworkloadvip
AVI_DATA_NETWORK_CIDR: 172.16.70.0/24
AVI_ENABLE: "true"
AVI_NSXT_T1LR: /infra/tier-1s/sfo01w01tier1
AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_CIDR: 172.16.80.0/24
AVI_MANAGEMENT_CLUSTER_CONTROL_PLANE_VIP_NETWORK_NAME: sfo01-w01-vds01-tkgclustervip
AVI_MANAGEMENT_CLUSTER_SERVICE_ENGINE_GROUP: sfo01m01segroup01
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.80.0/24
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: sfo01-w01-vds01-tkgclustervip
AVI_PASSWORD: <encoded:Vk13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: sfo01w01segroup01
AVI_USERNAME: admin
CLUSTER_ANNOTATIONS: 'description:,location:'
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: sfo01w01tkgmgmt01
CLUSTER_PLAN: prod
ENABLE_AUDIT_LOGGING: "true"
ENABLE_CEIP_PARTICIPATION: "false"
ENABLE_MHC: "true"
IDENTITY_MANAGEMENT_TYPE: oidc
INFRASTRUCTURE_PROVIDER: vsphere
LDAP_BIND_DN: ""
LDAP_BIND_PASSWORD: ""
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: "3"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: ""
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /sfo01w01dc01
VSPHERE_DATASTORE: /sfo01w01dc01/datastore/vsanDatastore
VSPHERE_FOLDER: /sfo01w01dc01/vm/tkg-management-components
VSPHERE_INSECURE: "false"
VSPHERE_NETWORK: /sfo01w01dc01/network/sfo01-w01-vds01-tkgmanagement
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /sfo01w01dc01/host/sfo01w01cluster01/Resources/tkg-management-c
omponents
VSPHERE_SERVER: 192.168.200.100
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDrPqkVaPpNxHcKxukYro
V6LcCTuRK9NDyygbsAr/P73jEeWIcC+SU4tRpOZks2+BoduUDzdrsfm/Uq/0uj9LuzqIZKAzA1iQ5DtipVzROq
eTuAXJVCMZc6RPgQSZofLBo1Is85M/IrBS20OMALwjukMdwotKKFwL758l51FVsKOT+MUSW/wJLKTv3l0KPObg
SRTMUQdQpoG7ONcMNG2VkBMfgaK44cL7vT0/0Mv/Fmf3Zd59ZaWvX28ZmGEjRx8kOm1j/os61Y+kOvl1MTv8wc
85rYusRuP2Uo5UM4kUTdhSTFasw6TLhbSWicKORPi3FYklvS70jkQFse2WsvmtFG5xyxE/rzDGHloud9g2bQ7T
x0rtWWoRCCC8Sl/vzCjgZfDQXwKXoMP0KbcYHZxSA3zY2lXBlhNtZtyKlynnhr97EaWsm3b9fvhJMmKW5ylkmk
7+4Bql7frJ4bOOR4+hHv57Q8XFOYdLGQPGv03RUFQwFE6a0a6qWAvmVmoh8+BmlGOfx7WYpp8hkyGOdtQz8ZJe
SOyMT6ztLHbY/WqDwEvKpf1dJy93w8fDmz3qXHpkpdnA0t4TiCfizlBk15ZI03TLi4ELoFvso9We13dGClHDDy
v0Dm87uaACC+fyAT5JPbZpAcCw8rm/yTuZ8awtR0LEzJUqNJjX/5OX7Bf45h9w== [email protected]
VSPHERE_TLS_THUMBPRINT: 7C:31:67:1A:F3:26:FA:CE:0E:33:2E:D2:7C:FC:86:EC:1C:51:67:E3
VSPHERE_USERNAME: [email protected]
VSPHERE_WORKER_DISK_GIB: "40"
VSPHERE_WORKER_MEM_MIB: "8192"
VSPHERE_WORKER_NUM_CPUS: "2"
WORKER_ROLLOUT_STRATEGY: ""
Note
For the single VIP network architecture, refer to the Management Cluster YAML file.
While the cluster is being deployed, you will find that a virtual service is created in NSX Advanced
Load Balancer and new service engines are deployed in vCenter by NSX Advanced Load Balancer
and the service engines are mapped to the SE Group sfo01m01segroup01.
The installer automatically sets the context to the Tanzu Kubernetes Grid management cluster on the
bootstrap machine. Now, you can access the Tanzu Kubernetes Grid management cluster from the
bootstrap machine and perform additional tasks such as verifying the management cluster health,
deploying the workload clusters, and so on.
To get the status of Tanzu Kubernetes Grid management cluster, run the following command:
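For example:
tanzu management-cluster get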
Use kubectl get nodes command to get the status of the Tanzu Kubernetes Grid management
cluster nodes.
The Tanzu Kubernetes Grid management cluster is successfully deployed and now you can proceed
with registering it with Tanzu Mission Control and creating shared services and workload clusters.
What to Do Next
Register Management Cluster with Tanzu Mission Control
If you want to register your management cluster with Tanzu Mission Control, see Register Your
Management Cluster with Tanzu Mission Control.
install-ako-for-all: default configuration for all workload clusters. By default, all the
workload clusters reference this file for their virtual IP networks and service engine (SE)
groups. This ADC configuration does not enable NSX L7 Ingress by default.
tanzu-ako-for-shared: Used by the shared services cluster to deploy the virtual services in the TKG
Mgmt SE group and the load balancer applications in the TKG Cluster VIP Network.
tanzu-ako-for-workload-L7-ingress: Use this ADC only if you would like to enable NSX
Advanced Load Balancer L7 ingress on workload cluster. Otherwise, leave the cluster labels
empty to apply the network configuration from default ADC install-ako-for-all.
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  finalizers:
    - ako-operator.networking.tkg.tanzu.vmware.com
  generation: 2
  name: <Unique name of AKODeploymentConfig>
spec:
  adminCredentialRef:
    name: nsx-alb-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: nsx-alb-controller-ca
    namespace: tkg-system-networking
  cloudName: <NAME OF THE CLOUD in ALB>
  clusterSelector:
    matchLabels:
      <KEY>: <VALUE>
  controlPlaneNetwork:
    cidr: <TKG-Cluster-VIP-CIDR>
    name: <TKG-Cluster-VIP-Network>
  controller: <NSX ALB CONTROLLER IP/FQDN>
  dataNetwork:
    cidr: <TKG-Mgmt-Data-VIP-CIDR>
    name: <TKG-Mgmt-Data-VIP-Name>
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
        - networkName: <TKG-Mgmt-Network>
  serviceEngineGroup: <Mgmt-Cluster-SEG>
The sample AKODeploymentConfig with sample values in place is as follows. You should add the
respective NSX ALB label type=shared-services while deploying the shared services cluster to enforce
this network configuration.
cloud: sfo01w01vc01
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  generation: 3
  name: tanzu-ako-for-shared
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      type: shared-services
  controlPlaneNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  controller: 172.16.10.10
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  extraConfigs:
    disableStaticRouteSync: false
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgmanagement
    networksConfig:
      nsxtT1LR: /infra/tier-1s/sfo01w01tier1
  serviceEngineGroup: sfo01m01segroup01
Note
For Single VIP Network Architecture, see Shared Service Cluster ADC file.
After you have the AKO configuration file ready, use the kubectl command to set the context to
Tanzu Kubernetes Grid management cluster and create the ADC:
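For example (the context and file names below are assumptions based on this deployment):
## Switch to the management cluster context
kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01
## Create the AKODeploymentConfig object
kubectl apply -f tanzu-ako-for-shared.yaml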
Use the following command to list all AKODeploymentConfig created under the management
cluster:
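For example (adc is the short resource name for akodeploymentconfig):
kubectl get adc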
As per the defined architecture, the workload cluster control plane endpoint uses the TKG Cluster
VIP Network, application load balancing uses the TKG Workload VIP Network, and the virtual services are
deployed in the sfo01w01segroup01 SE group.
Below are the changes in the ADC ingress section when compared to the default ADC.
nodeNetworkList: Provide the values for the TKG workload network name and CIDR.
The format of the AKODeploymentConfig YAML file for enabling NSX ALB L7 Ingress is as follows:
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  name: <unique-name-for-adc>
spec:
  adminCredentialRef:
    name: NSXALB-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: NSXALB-controller-ca
    namespace: tkg-system-networking
  cloudName: <cloud name configured in nsx alb>
  clusterSelector:
    matchLabels:
      <KEY>: <value>
  controller: <ALB-Controller-IP/FQDN>
  controlPlaneNetwork:
    cidr: <TKG-Cluster-VIP-Network-CIDR>
    name: <TKG-Cluster-VIP-Network-Name>
  dataNetwork:
    cidr: <TKG-Workload-VIP-network-CIDR>
    name: <TKG-Workload-VIP-network-Name>
  extraConfigs:
    cniPlugin: antrea
    disableStaticRouteSync: false # required
    ingress:
      disableIngressClass: false # required
      nodeNetworkList: # required
        - networkName: <TKG-Workload-Network>
          cidrs:
            - <TKG-Workload-Network-CIDR>
      serviceType: NodePortLocal # required
      shardVSSize: MEDIUM # required
  serviceEngineGroup: <Workload-Cluster-SEG>
The AKODeploymentConfig with sample values in place is as follows. You should add the respective
NSX ALB label workload-l7-enabled=true while deploying the workload cluster to enforce this
network configuration.
cloud: sfo01w01vc01
apiVersion: networking.tkg.tanzu.vmware.com/v1alpha1
kind: AKODeploymentConfig
metadata:
  generation: 3
  name: tanzu-ako-for-workload-L7-ingress
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      workload-l7-enabled: "true"
  controlPlaneNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  controller: 172.16.10.11
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 172.16.70.0/24
    name: sfo01-w01-vds01-tkgworkloadvip
  extraConfigs:
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: true
      disableIngressClass: false
      serviceType: NodePortLocal
      shardVSSize: MEDIUM
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgworkload
          cidrs:
            - 172.16.60.0/24
    networksConfig:
      nsxtT1LR: /infra/tier-1s/sfo01w01tier2
  serviceEngineGroup: sfo01w01segroup01
Note
For Single VIP Network Architecture, see Workload Cluster ADC file.
Use the kubectl command to set the context to Tanzu Kubernetes Grid management cluster and
create the ADC:
Use the following command to list all AKODeploymentConfig created under the management
cluster:
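As with the shared services ADC, for example (context and file names assumed):
kubectl config use-context sfo01w01tkgmgmt01-admin@sfo01w01tkgmgmt01
kubectl apply -f tanzu-ako-for-workload-l7-ingress.yaml
kubectl get adc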
Now that you have successfully created the AKO deployment config, you need to apply the cluster
labels while deploying the workload clusters to enable NSX Advanced Load Balancer L7 Ingress with
NodePortLocal mode.
The procedures for deploying a shared services cluster and workload cluster are almost the same. A
key difference is that you add the tanzu-services label to the shared services cluster as its cluster
role. This label identifies the shared services cluster to the management cluster and workload
clusters.
The shared services cluster uses the custom ADC tanzu-ako-for-shared created earlier to apply
network settings similar to the management cluster. This is enforced by applying the NSX ALB label
type:shared-services while deploying the shared services cluster.
After the management cluster is registered with Tanzu Mission Control, the deployment of the Tanzu
Kubernetes clusters can be done in just a few clicks. The procedure for creating Tanzu Kubernetes
clusters is as follows.
Note
The scope of this document doesn’t cover the use of a proxy for Tanzu Kubernetes
Grid deployment. If your environment uses a proxy server to connect to the internet,
ensure that the proxy configuration object includes the CIDRs for the pod, ingress,
and egress from the workload network of the Management Cluster in the No proxy
list, as described in Create a Proxy Configuration Object for a Tanzu Kubernetes Grid
Service Cluster.
2. Under the Create cluster page, select the management cluster which you registered in the
previous step and click Continue to create cluster.
3. Select the provisioner for creating the workload cluster (shared services cluster). Provisioner
reflects the vSphere namespaces that you have created and associated with the
management cluster.
Enter a name for the cluster (Cluster names must be unique within an organization).
Select the cluster group to which you want to attach your cluster.
In the vCenter and tlsThumbprint fields, enter the details for authentication.
From the datacenter, resourcePool, folder, network, and datastore drop down,
select the required information.
From the template drop-down menu, select the Kubernetes version. The latest supported version is selected by default.
In the sshAuthorizedKeys field, enter the SSH key that was created earlier.
Enable aviAPIServerHAProvider.
7. Select the high availability mode for the control plane nodes of the workload cluster. For a
production deployment, it is recommended to deploy a highly available workload cluster.
Select OS Version.
9. Click Create Cluster to start provisioning your workload cluster. Once the cluster is created,
you can check the status from Tanzu Mission Control.
Cluster creation takes approximately 15-20 minutes to complete. After the cluster
deployment completes, ensure that agent and extensions health shows green.
10. Connect to the Tanzu Management Cluster context and verify the cluster labels for the
Shared Service cluster.
## Add the tanzu-services label to the shared services cluster as its cluster role. In the following command, "sfo01w01tkgshared01" is the name of the shared services cluster
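A minimal sketch of the labeling step (the label key follows the standard TKG cluster-role convention):
kubectl label cluster.cluster.x-k8s.io/sfo01w01tkgshared01 cluster-role.tkg.tanzu.vmware.com/tanzu-services="" --overwrite=true
## Confirm the labels on the cluster
kubectl get cluster sfo01w01tkgshared01 --show-labels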
11. Connect to the admin context of the shared services cluster using the following commands and
validate the AKO pod status.
## Use the following command to get the admin context of the shared services cluster.
## Use the following command to use the context of the shared services cluster.
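A minimal sketch of these commands (the context name follows the <cluster>-admin@<cluster> convention):
## Get the admin kubeconfig of the shared services cluster
tanzu cluster kubeconfig get sfo01w01tkgshared01 --admin
## Switch to the shared services cluster context
kubectl config use-context sfo01w01tkgshared01-admin@sfo01w01tkgshared01
## Verify that the AKO pod is running
kubectl get pods -n avi-system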
Now that the shared services cluster is successfully created, you may proceed with deploying the
Harbor package. For more information, see Install Harbor in Deploy User-Managed Packages in
Workload Clusters.
The steps for deploying a workload cluster are the same as for a shared services cluster, except that
you use the NSX ALB labels created for the workload cluster in the AKO Deployment Config in step number 4.
After the workload cluster is created, verify the cluster labels and AKO pod status.
1. Connect to the Tanzu Management Cluster context and verify the cluster labels for the workload cluster.
## Verify the workload cluster creation
## Validate that TMC has applied the AVI_LABEL while deploying the cluster
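A minimal sketch of the verification commands (the workload cluster name is an example):
tanzu cluster list
kubectl get cluster sfo01w01tkgworkload01 --show-labels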
1. Connect to the admin context of the workload cluster using the following commands and validate
the AKO pod status.
## Use the following command to get the admin context of the workload cluster.
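A minimal sketch (the workload cluster name is an example):
tanzu cluster kubeconfig get sfo01w01tkgworkload01 --admin
## Switch to the workload cluster context
kubectl config use-context sfo01w01tkgworkload01-admin@sfo01w01tkgworkload01
## Validate the AKO pod status
kubectl get pods -n avi-system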
You can now configure SaaS components and deploy user-managed packages on the cluster.
LDAP_GROUP_SEARCH_BASE_DN: ""
LDAP_GROUP_SEARCH_FILTER: ""
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: ""
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: ""
LDAP_ROOT_CA_DATA_B64: ""
LDAP_USER_SEARCH_BASE_DN: ""
LDAP_USER_SEARCH_FILTER: ""
LDAP_USER_SEARCH_NAME_ATTRIBUTE: ""
LDAP_USER_SEARCH_USERNAME: userPrincipalName
OIDC_IDENTITY_PROVIDER_CLIENT_ID: ""
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET: ""
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM: ""
OIDC_IDENTITY_PROVIDER_ISSUER_URL: ""
OIDC_IDENTITY_PROVIDER_NAME: ""
OIDC_IDENTITY_PROVIDER_SCOPES: ""
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM: ""
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: "3"
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: "false"
VSPHERE_CONTROL_PLANE_DISK_GIB: "20"
VSPHERE_CONTROL_PLANE_ENDPOINT: ""
VSPHERE_CONTROL_PLANE_MEM_MIB: "4096"
VSPHERE_CONTROL_PLANE_NUM_CPUS: "2"
VSPHERE_DATACENTER: /sfo01w01dc01
VSPHERE_DATASTORE: /sfo01w01dc01/datastore/vsanDatastore
VSPHERE_FOLDER: /sfo01w01dc01/vm/tkg-management-components
VSPHERE_INSECURE: "true"
VSPHERE_NETWORK: /sfo01w01dc01/network/sfo01-w01-vds01-tkgmanagement
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /sfo01w01dc01/host/sfo01w01cluster01/Resources/tkg-management-c
omponents
VSPHERE_SERVER: 192.168.200.100
VSPHERE_SSH_AUTHORIZED_KEY: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDrPqkVaPpNxHcKxukYro
V6LcCTuRK9NDyygbsAr/P73jEeWIcC+SU4tRpOZks2+BoduUDzdrsfm/Uq/0uj9LuzqIZKAzA1iQ5DtipVzROq
eTuAXJVCMZc6RPgQSZofLBo1Is85M/IrBS20OMALwjukMdwotKKFwL758l51FVsKOT+MUSW/wJLKTv3l0KPObg
SRTMUQdQpoG7ONcMNG2VkBMfgaK44cL7vT0/0Mv/Fmf3Zd59ZaWvX28ZmGEjRx8kOm1j/os61Y+kOvl1MTv8wc
85rYusRuP2Uo5UM4kUTdhSTFasw6TLhbSWicKORPi3FYklvS70jkQFse2WsvmtFG5xyxE/rzDGHloud9g2bQ7T
x0rtWWoRCCC8Sl/vzCjgZfDQXwKXoMP0KbcYHZxSA3zY2lXBlhNtZtyKlynnhr97EaWsm3b9fvhJMmKW5ylkmk
7+4Bql7frJ4bOOR4+hHv57Q8XFOYdLGQPGv03RUFQwFE6a0a6qWAvmVmoh8+BmlGOfx7WYpp8hkyGOdtQz8ZJe
SOyMT6ztLHbY/WqDwEvKpf1dJy93w8fDmz3qXHpkpdnA0t4TiCfizlBk15ZI03TLi4ELoFvso9We13dGClHDDy
v0Dm87uaACC+fyAT5JPbZpAcCw8rm/yTuZ8awtR0LEzJUqNJjX/5OX7Bf45h9w== [email protected]
VSPHERE_TLS_THUMBPRINT: ""
VSPHERE_USERNAME: [email protected]
VSPHERE_WORKER_DISK_GIB: "20"
VSPHERE_WORKER_MEM_MIB: "4096"
VSPHERE_WORKER_NUM_CPUS: "2"
WORKER_ROLLOUT_STRATEGY: ""
  generation: 3
  name: tanzu-ako-for-shared
spec:
  adminCredentialRef:
    name: avi-controller-credentials
    namespace: tkg-system-networking
  certificateAuthorityRef:
    name: avi-controller-ca
    namespace: tkg-system-networking
  cloudName: sfo01w01vc01
  clusterSelector:
    matchLabels:
      type: shared-services
  controlPlaneNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  controller: 172.16.10.10
  controllerVersion: 22.1.3
  dataNetwork:
    cidr: 172.16.80.0/24
    name: sfo01-w01-vds01-tkgclustervip
  extraConfigs:
    disableStaticRouteSync: false
    ingress:
      defaultIngressController: false
      disableIngressClass: true
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgmanagement
    networksConfig:
      nsxtT1LR: /infra/tier-1s/sfo01w01tier1
  serviceEngineGroup: sfo01m01segroup01
  extraConfigs:
    disableStaticRouteSync: true
    ingress:
      defaultIngressController: true
      disableIngressClass: false
      serviceType: NodePortLocal
      shardVSSize: MEDIUM
      nodeNetworkList:
        - networkName: sfo01-w01-vds01-tkgworkload
          cidrs:
            - 172.16.60.0/24
    networksConfig:
      nsxtT1LR: /infra/tier-1s/sfo01w01tier1
  serviceEngineGroup: sfo01w01segroup01
The following documentation lays out the reference designs for deploying Tanzu for Kubernetes
Operations (informally known as TKO) on vSphere with Tanzu. A separate reference design is
provided for environments that use NSX-T.
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design
Deploy Tanzu for Kubernetes Operations using vSphere with Tanzu
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu on NSX-T Reference
Design
This document provides a reference design for deploying VMware Tanzu for Kubernetes Operations
(informally known as TKO) on vSphere with Tanzu.
The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.
The Supervisor Cluster runs on top of an SDDC layer that consists of ESXi for compute,
vSphere Distributed Switch for networking, and vSAN or another shared storage solution.
Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create
and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs for the
creation, configuration, and management of the Tanzu Kubernetes Cluster.
Tanzu Kubernetes Grid Service also provides self-service lifecycle management of Tanzu
Kubernetes clusters.
Tanzu Kubernetes Cluster (Workload Cluster): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control, Tanzu Observability, and Tanzu Service Mesh,
which are part of Tanzu for Kubernetes Operations.
VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace and by VMs
hosting a Tanzu Kubernetes cluster.
VM classes in vSphere with Tanzu are broadly categorized into the following groups:
vSphere with Tanzu offers several default VM classes. You can use them as is or you can
create new VM classes. The following screenshot shows the default VM classes that are
available in vSphere with Tanzu.
Storage Classes in vSphere with Tanzu: A StorageClass provides a way for administrators to
describe the classes of storage they offer. Different classes can map to quality-of-service
levels, to backup policies, or to arbitrary policies determined by the cluster administrators.
You can deploy vSphere with Tanzu with an existing default StorageClass or the vSphere
Administrator can define StorageClass objects (Storage policy) that let cluster users
dynamically create PVC and PV objects with different storage types and rules.
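For illustration, a minimal sketch of how a storage policy surfaces as a Kubernetes StorageClass and is consumed by a PersistentVolumeClaim; the storage class name is an assumption:
## List the storage classes surfaced from vSphere storage policies
kubectl get storageclass
## Create a PVC against an assumed storage class named "vsan-default-storage-policy"
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsan-default-storage-policy
  resources:
    requests:
      storage: 5Gi
EOF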
Decision ID: TKO-TKGS-001
Design Decision: Create custom storage classes/profiles/policies.
Design Justification: To provide different levels of QoS and SLA for prod and dev/test K8s workloads, and to isolate Supervisor clusters from workload clusters.
Design Implications: The default storage policy might not be adequate if deployed applications have different performance and availability requirements.

Decision ID: TKO-TKGS-002
Design Decision: Create custom VM classes.
Design Justification: To facilitate deployment of K8s workloads with specific compute/storage requirements.
Design Implications: Default VM classes in vSphere with Tanzu are not adequate to run a wide variety of K8s workloads.
Kubernetes control plane VM: Three Kubernetes control plane VMs in total are created on
the hosts that are part of the Supervisor cluster. The three control plane VMs are load-
balanced as each one of them has its own IP address.
Cluster API and Tanzu Kubernetes Grid Service: These modules run on the Supervisor
cluster and enable the provisioning and management of Tanzu Kubernetes clusters.
The following diagram shows the general architecture of the Supervisor cluster.
After a Supervisor cluster is created, the vSphere administrator creates vSphere namespaces. When
initially created, vSphere namespaces have unlimited resources within the Supervisor cluster. The
vSphere administrator defines the limits for CPU, memory, and storage, as well as the number of
Kubernetes objects, such as deployments, replica sets, and persistent volumes, that can run
within the namespace. These limits are configured for each vSphere namespace.
For more information about the maximum supported number, see the vSphere with Tanzu
Configuration Maximums guide.
To provide tenants access to namespaces, the vSphere administrator assigns permission to users or
groups available within an identity source that is associated with vCenter Single Sign-On.
Once the permissions are assigned, tenants can access the namespace to create Tanzu Kubernetes
Clusters using YAML files and the Cluster API.
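For illustration, a minimal sketch of a Tanzu Kubernetes cluster manifest applied in a vSphere namespace; the names, VM classes, storage class, and Tanzu Kubernetes release version are assumptions:
kubectl apply -f - <<EOF
apiVersion: run.tanzu.vmware.com/v1alpha2
kind: TanzuKubernetesCluster
metadata:
  name: demo-tkc
  namespace: demo-namespace
spec:
  topology:
    controlPlane:
      replicas: 3
      vmClass: best-effort-small
      storageClass: vsan-default-storage-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable
    nodePools:
      - name: workers
        replicas: 3
        vmClass: best-effort-medium
        storageClass: vsan-default-storage-policy
EOF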
Here are some recommendations for using namespaces in a vSphere with Tanzu environment.
Decision ID: TKO-TKGS-003
Design Decision: Create namespaces to logically separate K8s workloads.
Design Justification: Create dedicated namespaces for the type of workloads (prod/dev/test) that you intend to run.
Design Implications: All Kubernetes clusters created under a namespace share the same access policy/quotas/network resources.

Decision ID: TKO-TKGS-004
Design Decision: Enable self-service namespaces.
Design Justification: Enable DevOps/cluster admin users to provision namespaces in a self-service manner.
Design Implications: The vSphere administrator must publish a namespace template to the LDAP users/groups to enable them to create namespaces.

Decision ID: TKO-TKGS-005
Design Decision: Register an external identity source (AD/LDAP) with vCenter.
Design Justification: Limit access to a namespace to authorized users/groups.
Design Implications: A prod namespace can be accessed by a handful of users, whereas a dev/test namespace can be exposed to a wider audience.
vSphere with Tanzu supports the following types of shared datastores:
vSAN
VMFS
NFS
vVols
vSphere with Tanzu uses storage policies to integrate with shared datastores. The policies represent
datastores and manage the storage placement of objects such as control plane VMs, container
images, and persistent storage volumes.
Before you enable vSphere with Tanzu, create storage policies to be used by the Supervisor Cluster
and namespaces. Depending on your vSphere storage environment, you can create several storage
policies to represent different classes of storage.
vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
Antrea
Calico
The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.
When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea
CNI is automatically enabled in the cluster.
To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes clusters
with Calico
Each CNI is suitable for a different use case. The following table lists some common use cases for the
CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most appropriate
CNI for your Tanzu Kubernetes Grid implementation.
CNI: Antrea
Use case: Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for
encapsulation. Optionally encrypt node-to-node communication using IPSec packet encryption.
Antrea supports advanced network use cases like kernel bypass and network service mesh.
Pros:
- Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both Linux and Windows.

CNI: Calico
Use case: Calico is used in environments where factors like network performance, flexibility, and power
are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of
an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in
increased network performance for Kubernetes workloads.
Pros:
- Support for Network Policies
- High network performance
- SCTP Support
Cons:
- No multicast support
vSphere Virtual Distributed Switch (VDS) Networking with NSX Advanced Load Balancer.
Note
The scope of this discussion is limited to vSphere Networking (VDS) with NSX
Advanced Load Balancer.
You can use one or more distributed port groups as Workload Networks. The network that provides
connectivity to the Kubernetes Control Plane VMs is called Primary Workload Network. You can
assign this network to all the namespaces on the Supervisor Cluster, or you can use different
networks for each namespace. The Tanzu Kubernetes clusters connect to the Workload Network
that is assigned to the namespace.
The Supervisor Cluster leverages NSX Advanced Load Balancer (NSX ALB) to provide L4 load
balancing for the Tanzu Kubernetes clusters control-plane HA. Users access the applications by
connecting to the Virtual IP address (VIP) of the applications provisioned by NSX Advanced Load
Balancer.
The following diagram shows a general overview for vSphere with Tanzu on vSphere Networking.
NSX Advanced Load Balancer Controller: NSX Advanced Load Balancer Controller
manages Virtual Service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the service engines (SEs). It is the central repository for the
configurations and policies related to services and management and provides the portal for
viewing the health of VirtualServices and SEs and the associated analytics that NSX
Advanced Load Balancer provides.
NSX Advanced Load Balancer Service Engine: NSX Advanced Load Balancer Service
Engines (SEs) are lightweight VMs that handle all data plane operations by receiving and
executing instructions from the controller. The SEs perform load balancing and all client and
server-facing network interactions.
Avi Kubernetes Operator (AKO): Avi Kubernetes Operator is a Kubernetes operator that
runs as a pod in the Supervisor Cluster. It provides ingress and load balancing functionality.
Avi Kubernetes Operator translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses/routes/services on the
Service Engines (SE) via the NSX Advanced Load Balancer Controller.
Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer Service
Engine settings. Each cloud is configured with one or more VIP networks to provide IP addresses to
L4 load balancing virtual services created under that cloud.
The virtual services can be spanned across multiple Service Engines if the associated Service Engine
Group is configured in Active/Active HA mode. A Service Engine can belong to only one Service
Engine group at a time.
IP address allocation for virtual services can be over DHCP or via NSX Advanced Load Balancer in-
built IPAM functionality. The VIP networks created/configured in NSX Advanced Load Balancer are
associated with the IPAM profile.
Network Architecture
To deploy vSphere with Tanzu, build separate networks for the Tanzu Kubernetes Grid management
(Supervisor) cluster, Tanzu Kubernetes Grid workload clusters, NSX Advanced Load Balancer
components, and the Tanzu Kubernetes Grid control plane HA.
The network reference design can be mapped into this general framework.
Note
The network/port group designated for the workload cluster carries both data and control traffic. Firewalls cannot be used to segregate traffic between workload clusters; instead, the underlying CNI must be employed as the main filtering mechanism. Antrea CNI provides Custom Resource Definitions (CRDs) for firewall rules that can be enforced before Kubernetes network policies are applied (see the example after this note).
Based on your requirements, you can create additional networks for your workload clusters. These networks are also referred to as vSphere with Tanzu workload secondary networks.
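The following is a minimal sketch of such an Antrea-native policy. It assumes an Antrea version that serves the crd.antrea.io/v1alpha1 API, and the namespace label and CIDR are hypothetical placeholders; rules placed in the securityops tier are evaluated before Kubernetes network policies.

```yaml
# Hedged example: drop traffic reaching prod namespaces from another
# workload cluster's node CIDR before Kubernetes NetworkPolicies apply.
apiVersion: crd.antrea.io/v1alpha1   # adjust to the API version bundled with your Antrea release
kind: ClusterNetworkPolicy
metadata:
  name: acnp-drop-cross-cluster
spec:
  priority: 5
  tier: securityops                  # Antrea tiers in this range run before K8s NetworkPolicies
  appliedTo:
    - namespaceSelector:
        matchLabels:
          env: prod                  # hypothetical namespace label
  ingress:
    - name: drop-from-other-cluster
      action: Drop
      enableLogging: true
      from:
        - ipBlock:
            cidr: 10.220.10.0/24     # hypothetical CIDR of another workload cluster
```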
- Isolate and separate SDDC management components (vCenter, ESX) from the vSphere with Tanzu components. This reference design allows only the minimum connectivity between the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the vCenter Server.
- Isolate and separate the NSX Advanced Load Balancer management network from the supervisor cluster network and the Tanzu Kubernetes Grid workload networks.
- Separate vSphere Admin and Tenant access to the supervisor cluster. This prevents tenants from attempting to connect to the supervisor cluster.
- Allow tenants to access only their own workload cluster(s) and restrict access to this cluster from other tenants. This separation can be achieved by assigning permissions to the supervisor namespaces.
- Depending on the workload cluster type and use case, multiple workload clusters may leverage the same workload network, or new networks can be used for each workload cluster.
Network Requirements
| Network Type | DHCP Service | Description |
| --- | --- | --- |
| TKG Management Network | Optional | Supervisor Cluster nodes will be attached to this network. |
Firewall Requirements
To prepare the firewall, you need the following information:
The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN.
| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| Client Machine | NSX Advanced Load Balancer Controller nodes and VIP | TCP:443 | Access NSX Advanced Load Balancer portal for configuration. |
| Client Machine | vCenter Server | TCP:443 | Access and configure WCP in vCenter. |
| TKG Management and Workload Cluster CIDR | NSX Advanced Load Balancer controller nodes | TCP:443 | Allow Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX Advanced Load Balancer Controller. |
| TKG Management and Workload Cluster CIDR | TKG Cluster VIP Range | TCP:6443 | Allow the Supervisor cluster to configure workload clusters. |
| TKG Management and Workload Cluster CIDR | Image Registry (Harbor) (if private) | TCP:443 | Allow components to retrieve container images. |
| TKG Management and Workload Cluster CIDR | wp-content.vmware.com, *.tmc.cloud.vmware.com, projects.registry.vmware.com | TCP:443 | Sync content library, pull TKG binaries, and interact with TMC. |
| TKG Management cluster CIDR | TKG Workload Cluster CIDR | TCP:6443 | VM Operator and TKC VM communication. |
| TKG Workload Cluster CIDR | TKG Management Cluster CIDR | TCP:6443 | Allow the TKG workload cluster to register with the Supervisor cluster. |
| NSX Advanced Load Balancer Management Network | vCenter and ESXi Hosts | TCP:443 | Allow NSX Advanced Load Balancer to discover vCenter objects and deploy SEs as required. |
| TKG Cluster VIP Range | TKG Management Cluster CIDR | TCP:6443 | To interact with the Supervisor cluster. |
| TKG Cluster VIP Range | TKG Workload Cluster CIDR | TCP:6443, TCP:443, TCP:80, TCP:22 (optional) | To interact with the workload cluster and Kubernetes applications. |
Note
For Tanzu Mission Control (TMC), if the firewall does not allow wildcards, allow all IP addresses of [account].tmc.cloud.vmware.com and extensions.aws-usw2.tmc.cloud.vmware.com.
Deployment options
Starting with vSphere 8, when you enable vSphere with Tanzu, you can configure either one-zone
Supervisor mapped to one vSphere cluster or three-zone Supervisor mapped to three vSphere
clusters.
With a three-zone Supervisor, you can:
- Distribute the nodes of Tanzu Kubernetes Grid clusters across all three vSphere zones and provide availability via vSphere HA at cluster level.
- Scale the Supervisor by adding hosts to each of the three vSphere clusters.
Installation Experience
vSphere with Tanzu deployment starts with deploying the Supervisor cluster (enabling Workload Management). The deployment is performed directly from the vCenter user interface (UI). The Get Started page lists the prerequisites for the deployment.
The vCenter UI shows that, in the current version, it is possible to install vSphere with Tanzu with either the VDS networking stack or NSX-T Data Center as the networking solution.
This installation process takes you through the steps of deploying the Supervisor cluster in your vSphere environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission Control or the kubectl utility to deploy the Tanzu Kubernetes Grid shared services and workload clusters.
Design Recommendations
NSX Advanced Load Balancer Recommendations
The following table provides recommendations for configuring NSX Advanced Load Balancer in a
vSphere with Tanzu environment.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-Advanced Load Balancer-001 | Deploy NSX Advanced Load Balancer controller cluster nodes on a network dedicated to NSX Advanced Load Balancer. | To isolate NSX Advanced Load Balancer traffic from infrastructure management traffic and Kubernetes workloads. | Allows for ease of management of controllers. An additional network (VLAN) is required. |
| TKO-Advanced Load Balancer-002 | Deploy 3 NSX Advanced Load Balancer controller nodes. | To achieve high availability for the NSX Advanced Load Balancer platform. In clustered mode, NSX Advanced Load Balancer availability is not impacted by an individual controller node failure. The failed node can be removed from the cluster and redeployed if recovery is not possible. | Clustered mode requires more compute and storage resources. |
| TKO-Advanced Load Balancer-003 | Configure vCenter settings in Default-Cloud. | Using a non-default vCenter cloud is not supported with vSphere with Tanzu. | Using a non-default cloud can lead to deployment failures. |
| TKO-Advanced Load Balancer-004 | Use static IPs for the NSX Advanced Load Balancer controllers if DHCP cannot guarantee a permanent lease. | The NSX Advanced Load Balancer Controller cluster uses management IP addresses to form and maintain quorum for the control plane cluster. Any changes would be disruptive. | The NSX Advanced Load Balancer Controller control plane might go down if the management IPs of the controller nodes change. |
| TKO-Advanced Load Balancer-005 | Use NSX Advanced Load Balancer IPAM for Service Engine data network and virtual services IP assignment. | Guarantees IP address assignment for Service Engine data NICs and virtual services. | Removes the corner case scenario when the DHCP server runs out of the lease or is down. |
| TKO-Advanced Load Balancer-006 | Reserve an IP in the NSX Advanced Load Balancer management subnet to be used as the Cluster IP for the Controller Cluster. | The NSX Advanced Load Balancer portal is always accessible over the Cluster IP regardless of a specific individual controller node failure. | NSX Advanced Load Balancer administration is not affected by an individual controller node failure. |
| TKO-Advanced Load Balancer-007 | Use the default Service Engine Group for load balancing of TKG clusters control plane. | Using a non-default Service Engine Group for hosting the L4 virtual service created for TKG control plane HA is not supported. | Using a non-default Service Engine Group can lead to Service Engine VM deployment failure. Sharing Service Engines can help reduce the licensing cost. |
| TKO-Advanced Load Balancer-009 | Configure anti-affinity rules for the NSX ALB controller cluster. | This is to ensure that no two controllers end up on the same ESXi host and thus avoid a single point of failure. | Anti-affinity rules need to be created manually. |
| TKO-Advanced Load Balancer-0010 | Configure backup for the NSX ALB Controller cluster. | Backups are required if the NSX ALB Controller becomes inoperable or if the environment needs to be restored from a previous state. | To store backups, an SCP capable backup location is needed. SCP is the only supported protocol currently. |
| TKO-Advanced Load Balancer-0011 | Initial setup should be done only on one NSX ALB controller VM out of the three deployed to create an NSX ALB controller cluster. | The NSX ALB controller cluster is created from an initialized NSX ALB controller, which becomes the cluster leader. Follower NSX ALB controller nodes need to be uninitialized to join the cluster. | NSX ALB controller cluster creation fails if more than one NSX ALB controller is initialized. |
| TKO-Advanced Load Balancer-0012 | Configure remote logging for the NSX ALB Controller to send events to syslog. | For operations teams to centrally monitor NSX ALB and escalate alerts, events must be sent from the NSX ALB Controller. | Additional operational overhead. Additional infrastructure resources. |
| TKO-Advanced Load Balancer-0013 | Use LDAP/SAML based authentication for NSX ALB. | Helps to maintain role-based access control. | Additional configuration is required. |
Network Recommendations
The following are the key network recommendations for a production-grade vSphere with Tanzu
deployment:
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-NET-001 | Use separate networks for the Supervisor cluster and workload clusters. | To have flexible firewall and security policies. | Sharing the same network for multiple clusters can complicate creation of firewall rules. |
| TKO-NET-002 | Use distinct port groups for network separation of K8s workloads. | Isolate production Kubernetes clusters from dev/test clusters by placing them on distinct port groups. | Network mapping is done at the namespace level. All Kubernetes clusters created in a namespace connect to the same port group. |
| TKO-NET-003 | Use routable networks for Tanzu Kubernetes clusters. | Allow connectivity between the TKG clusters and infrastructure components. | Networks that are used for Tanzu Kubernetes cluster traffic must be routable between each other and the Supervisor Cluster Management Network. |

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-002 | Deploy Supervisor cluster control plane nodes in large form factor. | Large form factor should suffice to integrate the Supervisor Cluster with TMC and Velero deployment. | Consumes more resources from infrastructure. |
| TKO-TKGS-003 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of Tanzu Kubernetes clusters and manages the life cycle of all clusters centrally. | Needs outbound connectivity to the internet for TMC registration. |
Note
SaaS endpoints here refer to Tanzu Mission Control, Tanzu Service Mesh, and Tanzu Observability.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKC-001 | Deploy Tanzu Kubernetes clusters with prod plan and multiple worker nodes. | The prod plan provides high availability for the control plane. | Consumes more resources from infrastructure. |
| TKO-TKC-002 | Use guaranteed VM class for Tanzu Kubernetes clusters. | Guarantees compute resources are always available for containerized workloads. | Could prevent automatic migration of nodes by DRS. |
| TKO-TKC-003 | Implement RBAC for Tanzu Kubernetes clusters. | To avoid the usage of administrator credentials for managing the clusters. | External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually. |
| TKO-TKC-04 | Deploy Tanzu Kubernetes clusters from Tanzu Mission Control. | Tanzu Mission Control provides life-cycle management for the Tanzu Kubernetes clusters and automatic integration with Tanzu Service Mesh and Tanzu Observability. | Only Antrea CNI is supported on workload clusters created from the TMC portal. |
One example of an ingress controller is Contour, an open-source controller for Kubernetes ingress
routing. Contour is part of a Tanzu package and can be installed on any Tanzu Kubernetes cluster.
Deploying Contour is a prerequisite for deploying Prometheus, Grafana, and Harbor on a workload
cluster.
For more information about Contour, see the Contour site and Implementing Ingress Control with
Contour.
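As a point of reference, the manifest below is a minimal sketch of a Contour HTTPProxy that publishes a service for north-south traffic; the FQDN, service name, and port are hypothetical placeholders.

```yaml
# Hedged example: route http://web.example.com/ to a "web" Service on port 80
# through the Contour ingress controller installed from the Tanzu package.
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: web-proxy
  namespace: default
spec:
  virtualhost:
    fqdn: web.example.com        # hypothetical hostname
  routes:
    - conditions:
        - prefix: /
      services:
        - name: web              # hypothetical Service name
          port: 80
```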
Each ingress controller has pros and cons of its own. The below table provides general
recommendations on when you should use a specific ingress controller for your Kubernetes
environment.
| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic). |
Throughput: 4 Gb/s
Connections/s: 40k
Multiple performance vectors or features may have an impact on performance. For example, to
achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX Advanced Load
Balancer recommends two cores.
NSX Advanced Load Balancer Service Engines may be configured with as little as 1 vCPU core and 2
GB RAM, or up to 64 vCPU cores and 256 GB RAM. It is recommended for a Service Engine to have
at least 4 GB of memory when GeoDB is in use.
Container Registry
VMware Tanzu for Kubernetes Operations using vSphere with Tanzu includes Harbor as a container
registry. Harbor is an open-source, trusted, cloud-native container registry that stores, signs, and
scans content.
The initial configuration and setup of the platform does not require any external registry because the required images are delivered through vCenter. Customers can choose any existing registry, or deploy a Harbor registry for storing images if required.
When vSphere with Tanzu is deployed on VDS networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.
VM-based deployment using OVA - VMware recommends this installation method in cases
where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-restricted deployments. Do
not use this method for hosting application images.
When deploying Harbor with self-signed certificates or certificates signed by internal CAs, it is
necessary for the Tanzu Kubernetes cluster to establish trust with the registry’s certificate. To do so,
follow the procedure in Trust Custom CA Certificates on Cluster Nodes.
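As an illustration only, on releases that expose the TkgServiceConfiguration specification, the additional CA can be supplied as shown in the following sketch; the CA name is hypothetical, the data field carries the base64-encoded PEM certificate, and the linked procedure remains the authoritative method for your vSphere version.

```yaml
# Hedged sketch: adding an internally signed registry CA to the trust bundle
# propagated to Tanzu Kubernetes cluster nodes.
apiVersion: run.tanzu.vmware.com/v1alpha1
kind: TkgServiceConfiguration
metadata:
  name: tkg-service-configuration
spec:
  defaultCNI: antrea
  trust:
    additionalTrustedCAs:
      - name: harbor-internal-ca                              # hypothetical CA name
        data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...          # base64-encoded PEM (truncated placeholder)
```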
To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.
Summary
vSphere with Tanzu on hyper-converged hardware offers high-performance potential and
convenience and addresses the challenges of creating, testing, and updating on-premises
Kubernetes platforms in a consolidated production environment. This validated approach results in a
production installation with all the application services needed to serve combined or uniquely
separated workload types via a combined infrastructure solution.
This plan meets many Day-0 needs for quickly aligning product capabilities to full-stack
infrastructure, including networking, configuring firewall rules, load balancing, workload compute
alignment, and other capabilities.
Deployment Instructions
For instructions on how to deploy this reference design, see Deploy VMware Tanzu for Kubernetes Operations using vSphere with Tanzu.
The scope of the document is limited to providing deployment steps based on the reference design
in VMware Tanzu for Kubernetes Operations using vSphere with Tanzu Reference Design. This
document does not cover any deployment procedures for the underlying SDDC components.
VMware Service Installer for Tanzu automates the deployment of the reference designs for Tanzu for
Kubernetes Operations. It uses best practices for deploying and configuring the required Tanzu for
Kubernetes Operations components.
To use Service Installer to automate this deployment, see Deploying VMware Tanzu for Kubernetes
Operations on vSphere with Tanzu and vSphere Distributed Switch Using Service Installer for
VMware Tanzu.
Alternatively, if you decide to manually deploy each component, follow the steps provided in this
document.
Prerequisites
Before deploying Tanzu Kubernetes Operations using vSphere with Tanzu on vSphere networking,
ensure that your environment is set up as described in the following:
General Requirements
Network Requirements
Firewall Requirements
Resource Pools
General Requirements
Ensure that your environment meets the following general requirements:
- A distributed switch with port groups for TKO components. See Network Requirements for the required port groups.
- All ESXi hosts of the cluster on which vSphere with Tanzu will be enabled should be part of the distributed switch.
- Dedicated resource pools and a VM folder for collecting NSX Advanced Load Balancer VMs.
- A shared datastore with sufficient capacity for the control plane and worker node VM files.
- Network Time Protocol (NTP) service running on all hosts and vCenter.
- NSX Advanced Load Balancer 22.1.2 OVA downloaded from the Customer Connect portal and readily available for deployment.
Note
Tanzu Kubernetes Grid nodes will be unable to resolve hostnames with the “.local” domain suffix. For more information, see the KB article.
For additional information on general prerequisites, see vSphere with Tanzu product documentation.
Network Requirements
The following table provides example entries for the required port groups. Create network entries
with the port group name, VLAN ID, and CIDRs that are specific to your environment.
| Network Type | DHCP Service | Description & Recommendations |
| --- | --- | --- |
| NSX ALB Management Network | Optional | NSX ALB controllers and SEs will be attached to this network. Use static IPs for the NSX ALB controllers. |
| TKG Workload Network | IP Pool | Control plane and worker nodes of TKG Workload Clusters will be attached to this network. |
| TKG Cluster VIP/Data Network | No | Virtual services for control plane HA of all TKG clusters (Supervisor and Workload). Reserve sufficient IPs depending on the number of TKG clusters planned to be deployed in the environment; NSX ALB handles IP address management on this network via IPAM. |
This document uses the following port groups, subnet CIDRs, and VLANs. Replace these with values that are specific to your environment. Plan network subnet sizes according to application needs and future requirements.
| Network Type | Port Group Name | VLAN | Gateway CIDR | DHCP Enabled | IP Pool for SE/VIP in NSX ALB |
| --- | --- | --- | --- | --- | --- |
Firewall Requirements
Ensure that the firewall is set up as described in Firewall Requirements.
Resource Pools
Ensure that resource pools and folders are created in vCenter. The following table shows a sample
entry for the resource pool and folder. Customize the resource pool and folder name for your
environment.
Deployment Overview
Here are the high-level steps for deploying Tanzu Kubernetes operations on vSphere networking
backed by VDS:
Note
Starting with vSphere 8, when you enable vSphere with Tanzu, you can configure
either one-zone Supervisor mapped to one vSphere cluster or three-zone
Supervisor mapped to three vSphere clusters. This document covers One-Zone
supervisor deployment with VDS Networking and NSX Advanced Load Balancer.
Requirements for Cluster Supervisor Deployment with NSX Advanced Load Balancer
NSX Advanced Load Balancer is deployed in write access mode in the vSphere environment. This
mode grants NSX Advanced Load Balancer controllers full write access to the vCenter. Full write
access allows automatically creating, modifying, and removing Service Engines and other resources
as needed to adapt to changing traffic needs.
For a production-grade deployment, VMware recommends deploying three instances of the NSX
Advanced Load Balancer controller for high availability and resiliency.
The following table provides a sample IP address and FQDN set for the NSX Advanced Load
Balancer controllers:
2. Select the cluster where you want to deploy the NSX Advanced Load Balancer controller
node.
3. Right-click on the cluster and invoke the Deploy OVF Template wizard.
Complete the configuration and deploy NSX Advanced Load Balancer controller node.
For more information, see the product documentation Deploy the Controller.
On a browser, go to https://sfo01albctlr01a.sfo01.rainpole.vmw/.
2. Configure System Settings by specifying the backup passphrase and DNS information.
Service Engine Context: Service Engines are managed within the tenant context, not
shared across tenants.
If you did not select the Setup Cloud After option before saving, the initial configuration wizard exits.
The Cloud configuration window does not automatically launch and you are directed to a Dashboard
view on the controller.
Configure Default-Cloud
1. Navigate to Infrastructure > Clouds and edit Default-Cloud.
2. Select VMware vCenter/vSphere ESX as the infrastructure type and click Next.
3. On the vCenter/vSphere tab, click SET CREDENTIALS and configure the following:
vCenter Credentials: Username/password of the vCenter account to use for NSX ALB
integration.
Select the vSphere Data Center where you want to enable Workload Management.
Select the Content Library that holds the Tanzu Kubernetes release OVA templates.
If DHCP is not available, enter the IP Subnet, IP address range (Add Static IP
Address Pool), Default Gateway for the Management Network, then click Save.
Configure Licensing
Tanzu for Kubernetes Operations requires an NSX Advanced Load Balancer Enterprise license.
2. Click the pencil icon on the upper right corner to enter edit mode.
3. On the Update System Settings dialog, edit the settings for the NTP server that you want to use.
To run a three-node controller cluster, you deploy the first node and perform the initial
configuration, and set the Cluster IP. After that, you deploy and power on two more controller VMs.
However, do not run the initial configuration wizard or change the administrator password for the two
additional controllers VMs. The configuration of the first controller VM is assigned to the two new
controller VMs.
3. Specify a name for the controller cluster and set the cluster IP address. This IP address
should be from the NSX Advanced Load Balancer management network.
4. In Cluster Nodes, specify the IP addresses of the two additional controllers that you have
deployed.
5. Click Save.
The controller cluster setup starts. The controller nodes are rebooted in the process. It takes
approximately 10-15 minutes for cluster formation to complete.
You are automatically logged out of the controller node that you are currently logged in to. Enter the cluster IP address in a browser to see the cluster formation task details.
The first controller of the cluster receives the “Leader” role. The second and third controllers work as “Followers”.
After the controller cluster is deployed, use the controller cluster IP address for any additional configuration.
The controller has a default self-signed certificate, but this certificate does not have the correct SAN. You must replace it with a valid or self-signed certificate that has the correct SAN. You can create a self-signed certificate or upload a CA-signed certificate.
Note
1. Navigate to the Templates > Security > SSL/TLS Certificate > and click Create and select
Controller Certificate.
2. The New Certificate (SSL/TLS) window appears. Enter a name for the certificate.
1. For Type select Self Signed and enter the following details:
Subject Alternate Name (SAN): Enter the cluster IP address or FQDN of the
controller cluster and all controller nodes.
Key Size
2. Click Save.
To export the certificate, navigate to the Templates > Security > SSL/TLS Certificate page and
export the certificate by clicking Export.
On the Export Certificate page, click Copy to clipboard against the certificate. Do not copy the key.
Save the copied certificate to use when you enable workload management.
Optionally, you can reconfigure the Default-Group to define the placement and number of Service
Engine VMs settings.
This document uses the Default Service Engine Group without modification.
For more information, see the product documentation Configure a Service Engine Group.
Optionally, if DHCP is unavailable, you can configure a pool of IP addresses to assign to the Service
Engine interface on that network.
1. Navigate to Infrastructure > Cloud Resources > Networks and locate the network that
provides the virtual IP addresses.
5. Click Add Static IP Address Pool to specify the IP address pool for the VIPs and Service
Engine. The range must be a subset of the network CIDR configured in IP Subnet.
For more information, see the product documentation Configure a Virtual IP Network.
2. Click Create.
6. Click Save.
2. Click Create and select IPAM Profile from the dropdown menu.
Select Default-Cloud.
Choose the VIP network that you have created in Configure a Virtual IP Subnet for
the Data Network.
5. Click Save.
- You have created a vSphere cluster with at least three ESXi hosts. If you are using vSAN, you need a minimum of four ESXi hosts.
- The vSphere cluster has HA & DRS enabled, and DRS is configured in fully automated mode.
- The required port groups have been created on the distributed switch to provide networking to the Supervisor and workload clusters.
- You have created a Subscribed Content Library to automatically pull the latest Tanzu Kubernetes releases from the VMware repository.
- You have created a storage policy that will determine the datastore placement of the Kubernetes control plane VMs, containers, and images.
- NSX Advanced Load Balancer is deployed and configured as per the instructions provided earlier.
1. Log in to the vSphere client and navigate to Menu > Workload Management and click Get
Started.
3. Select CLUSTER DEPLOYMENT and a cluster from the list of compatible clusters and click
Next.
4. Select the Control Plane Storage Policy for the nodes from the drop-down menu and click
Next.
5. On the Load Balancer screen, select Load Balancer Type as NSX Advanced Load Balancer
and provide the following details:
Name: A friendly name for the load balancer. Only lowercase letters are supported in the name field.
NSX Advanced Load Balancer Controller IP: If the NSX Advanced Load Balancer
self-signed certificate is configured with the hostname in the SAN field, use the same
hostname here. If the SAN is configured with an IP address, provide the controller
cluster IP address. The default port of NSX Advanced Load Balancer is 443.
NSX Advanced Load Balancer Credentials: Provide the NSX Advanced Load
Balancer administrator credentials.
Server Certificate: Use the content of the controller certificate that you exported
earlier while configuring certificates for the controller.
6. Click Next.
7. On the Management Network screen, select the port group that you created on the
distributed switch and provide the required networking details.
If DHCP is enabled for the port group, set the Network Mode to DHCP.
Ensure that the DHCP server is configured to hand over the DNS server address, DNS
search domain, and NTP server address via DHCP.
8. Click Next.
9. On the Workload Network screen, select the network that will handle the networking traffic
for Kubernetes workloads running on the Supervisor cluster.
Set the Network Mode to DHCP if the port group is configured for DHCP.
11. On the Review and Confirm screen, select the size for the Kubernetes control plane VMs
that are created on each host from the cluster. For production deployments, VMware
recommends a large form factor.
The Workload Management task takes approximately 30 minutes to complete. After the task
completes, three Kubernetes control plane VMs are created on the hosts that are part of the
vSphere cluster.
The Supervisor Cluster gets an IP address from the VIP network that you configured in the NSX
Advanced Load Balancer. This IP address is also called the Control Plane HA IP address.
In the backend, three supervisor Control Plane VMs are deployed in the vSphere namespace. A
Virtual Service is created in the NSX Advanced Load Balancer with three Supervisor Control Plane
nodes that are deployed in the process.
For additional product documentation, see Enable Workload Management with vSphere Networking.
The Kubernetes CLI Tools download package includes two executables: the standard open-source
kubectl and the vSphere Plugin for kubectl. The vSphere Plugin for kubectl extends the commands
available to kubectl so that you connect to the Supervisor Cluster and to Tanzu Kubernetes clusters
using vCenter Single Sign-On credentials.
For additional product documentation, see Download and Install the Kubernetes CLI Tools for
vSphere.
After your connection to the Supervisor Cluster is established you can switch to the Supervisor
context by running the command:
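The command itself is not shown in this copy of the document. A typical sequence with the vSphere Plugin for kubectl looks like the following sketch; the Supervisor control plane VIP and the user name are placeholders for your environment.

```sh
# Log in to the Supervisor Cluster with vCenter Single Sign-On credentials.
kubectl vsphere login --server=192.168.80.2 \
  --vsphere-username administrator@vsphere.local \
  --insecure-skip-tls-verify

# Switch kubectl to the Supervisor context (named after the control plane VIP).
kubectl config use-context 192.168.80.2
```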
Every workload cluster that you deploy runs in a Supervisor namespace. To learn more about
namespaces, see the vSphere with Tanzu documentation
5. Enter a name for the namespace and select the workload network for the namespace.
Note
The Name field accepts only lower case letters and hyphens.
For additional product documentation, see Create and Configure a vSphere Namespace.
Choose the Identity source, search for the User/Group that will have access to the namespace, and
define the Role for the selected User/Group.
To assign a storage policy to the namespace, on the Summary tab, click Add Storage.
From the list of storage policies, select the appropriate storage policy and click OK.
After the storage policy is assigned to a namespace, vSphere with Tanzu creates a matching
Kubernetes storage class in the vSphere Namespace.
To configure resource limitations for the namespace, on the Summary tab, click Edit Limits for
Capacity and Usage.
The storage limit determines the overall amount of storage that is available to the namespace.
vSphere with Tanzu includes several default VM classes and each class has two editions: guaranteed
and best effort. A guaranteed edition fully reserves the resources that a VM specification requests. A
best-effort class edition does not and allows resources to be overcommitted.
More than one VM Class can be associated with a namespace. To learn more about VM classes, see
the vSphere with Tanzu documentation.
2. From the list of the VM Classes, select the classes that you want to include in your
namespace.
3. Click Ok.
The namespace is fully configured now. You are ready to register your supervisor cluster with Tanzu
Mission Control and deploy your first Tanzu Kubernetes Cluster.
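Before moving on, you can optionally confirm the configuration from the Supervisor context; the namespace name below is a placeholder, and the resource names assume the VM Operator CRDs that ship with vSphere with Tanzu.

```sh
# List the VM classes bound to the namespace and the classes available globally.
kubectl get virtualmachineclassbindings -n sfo01w01namespace01
kubectl get virtualmachineclasses

# List the storage classes created from the assigned storage policies.
kubectl get storageclass
```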
By integrating Supervisor Cluster with Tanzu Mission Control (TMC) you are provided a centralized
administrative interface that enables you to manage your global portfolio of Kubernetes clusters. It
also allows you to deploy Tanzu Kubernetes clusters directly from Tanzu Mission Control portal and
install user-managed packages leveraging the TMC Catalog feature.
Note
This section uses the terms Supervisor Cluster and management cluster
interchangeably.
Ensure that the following are configured before you integrate Tanzu Kubernetes Grid with Tanzu
Mission Control:
Policies that are appropriate for your Tanzu Kubernetes Grid deployment.
A provisioner. A provisioner helps you deploy Tanzu Kubernetes Grid clusters across
multiple/different platforms, such as AWS and VMware vSphere.
Do the following to register the Supervisor cluster with Tanzu Mission Control:
2. Go to the Administration > Management clusters > Register Management Cluster tab.
3. On the Register management cluster page, provide a name for the management cluster, and
choose a cluster group.
You can optionally provide a description and labels for the management cluster.
4. If you are using a proxy to connect to the Internet, you can configure the proxy settings by
toggling the Set proxy option to yes.
5. On the Register page, Tanzu Mission Control generates a YAML file that defines how the
management cluster connects to Tanzu Mission Control for registration. The credential
provided in the YAML expires after 48 hours.
Copy the URL provided on the Register page. This URL is needed to install the TMC agent
on your management cluster and complete the registration process.
6. Log in to the vSphere Client, select the cluster that is enabled for Workload Management, navigate to Workload Management > Supervisors > sfo01w01supervisor01 > Configure > Tanzu Mission Control, enter the registration URL in the box provided, and click Register.
When the Supervisor Cluster is registered with Tanzu Mission Control, the TMC agent is
installed in the svc-tmc-cXX namespace, which is included with the Supervisor Cluster by
default.
Once the TMC agent is installed on the Supervisor cluster and all pods are running in the
svc-tmc-cXX namespace, the registration status shows “Installation successful”.
7. Return to the Tanzu Mission Control console and click Verify Connection.
8. Clicking View Management Cluster takes you to the overview page which displays the health
of the cluster and its components.
After installing the agent, you can use the Tanzu Mission Control web interface to provision
and manage Tanzu Kubernetes clusters.
For additional product documentation, see Integrate the Tanzu Kubernetes Grid Service on the
Supervisor Cluster with Tanzu Mission Control.
2. On the Create cluster page, select the Supervisor cluster that you registered in the previous
step and click on Continue to create cluster.
3. Select the provisioner for creating the workload cluster. Provisioner reflects the vSphere
namespaces that you have created and associated with the Supervisor cluster.
4. Enter a name for the cluster. Cluster names must be unique within an organization.
5. Select the cluster group to which you want to attach your cluster and cluster class and click
next. You can optionally enter a description and apply labels.
Note
You cannot change the cluster class after the workload cluster is created.
6. On the next page, you can optionally specify a proxy configuration to use for this cluster.
Note
This document doesn’t cover the use of a proxy for vSphere with Tanzu. If
your environment uses a proxy server to connect to the Internet, ensure the
proxy configuration object includes the CIDRs for the pod, ingress, and
egress from the workload network of the Supervisor Cluster in the No proxy
list, as explained in Create a Proxy Configuration Object for a Tanzu
Kubernetes Grid Service Cluster Running in vSphere with Tanzu.
You can optionally define an alternative CIDR for pod and service. The Pod CIDR and
Service CIDR cannot be changed after the cluster is created.
Select the Kubernetes version to use for the cluster. The latest supported version is
preselected for you. You can choose the appropriate Kubernetes version by clicking
on the down arrow button.
Select OS version.
Select the High Availability mode for the control plane nodes of the workload cluster.
For a production deployment, it is recommended to deploy a highly available
workload cluster.
Select the default storage class for the cluster. The list of storage classes that you can
choose from is taken from your vSphere namespace.
You can optionally select a different instance type for the cluster’s control plane
node. Control plane endpoint and API server port options are not customizable here
as they will be retrieved from the management cluster.
9. Click Next.
Cluster creation approximately takes 15-20 minutes to complete. After the cluster deployment
completes, ensure that Agent and extensions health shows green.
A self-service namespace is a feature that is available with vSphere 7.0 U2 and later versions and allows users with a DevOps persona to create and consume vSphere namespaces in a self-service fashion.
Before a DevOps user can start creating a namespace, the vSphere administrator must enable the Namespace service on the Supervisor cluster; this builds a template that is used each time a namespace is created through self-service.
The workflow for enabling the Namespace service on the Supervisor cluster is as follows:
1. Log in to the vSphere client and select the cluster configured for workload management.
3. Configure the quota for the CPU/Memory/Storage, storage policy for the namespace,
Network, VM Classes and Content Library.
4. On the permissions page, select the identity source (AD, LDAP, etc) where you have created
the users and groups for the Developer/Cluster Administrator. On selecting the identity
source, you can search for the user/groups in that identity source.
5. Review the settings and click Finish to complete the namespace service enable wizard.
This document lays out a reference design for deploying VMware Tanzu for Kubernetes Operations on vSphere with Tanzu enabled. This document does not cover any recommendations or deployment procedures for the underlying SDDC components.
The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.
The Supervisor Cluster runs on top of an SDDC layer that consists of ESXi for compute, NSX
Data Center or vSphere networking, and vSAN or another shared storage solution.
clusters. For each namespace, you configure role-based access control (policies and permissions), the image library, and virtual machine classes.
Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service allows you to create and
manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs to enable the creation, configuration, and management of the Tanzu Kubernetes cluster.
vSphere 8.0 and above supports the ClusterClass API. The ClusterClass API is a collection
of templates that define a cluster topology and configuration.
Tanzu Kubernetes Cluster (Workload Cluster): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control, Tanzu Observability, and Tanzu Service Mesh,
which are part of Tanzu for Kubernetes Operations.
VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace, and by VMs
hosting a Tanzu Kubernetes cluster.
VM classes in vSphere with Tanzu are categorized into two groups: guaranteed and best effort.
vSphere with Tanzu offers several default VM classes. You can either use the default VM
classes, or create customized VM classes based on the requirements of the application. The
following table explains the default VM classes that are available in vSphere with Tanzu:
| Class | CPU | Memory (GB) | Reserved CPU and Memory |
| --- | --- | --- | --- |
| best-effort-xsmall | 2 | 2 | No |
| best-effort-small | 2 | 4 | No |
| best-effort-medium | 2 | 8 | No |
| best-effort-large | 4 | 16 | No |
| best-effort-xlarge | 4 | 32 | No |
| best-effort-2xlarge | 8 | 64 | No |
| best-effort-4xlarge | 16 | 128 | No |
| best-effort-8xlarge | 32 | 128 | No |
| guaranteed-xsmall | 2 | 2 | Yes |
| guaranteed-small | 2 | 4 | Yes |
| guaranteed-medium | 2 | 8 | Yes |
| guaranteed-large | 4 | 16 | Yes |
| guaranteed-xlarge | 4 | 32 | Yes |
| guaranteed-2xlarge | 8 | 64 | Yes |
Storage Classes in vSphere with Tanzu: A StorageClass allows the administrators to describe the classes of storage that they offer. Different storage classes can map to quality-of-service levels, to backup policies, or to arbitrary policies determined by the cluster administrators. The policies representing datastores can manage storage placement of such components and objects as control plane VMs, vSphere Pod ephemeral disks, and container images. You might need policies for storage placement of persistent volumes and VM content libraries.
You can deploy vSphere with Tanzu with an existing default storage class or the vSphere
administrator can define storage class objects (Storage policy) that let cluster users
dynamically create PVC and PV objects with different storage types and rules.
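For example, once a storage policy is surfaced as a Kubernetes storage class, a workload can request a persistent volume through an ordinary PersistentVolumeClaim such as the sketch below; the class name is a placeholder for whatever `kubectl get storageclass` reports in your namespace.

```yaml
# Hedged example: a PVC that dynamically provisions a volume through the
# vSphere CSI driver using a policy-backed storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: vsan-default-storage-policy   # placeholder storage class name
  resources:
    requests:
      storage: 5Gi
```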
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Create custom Storage Classes/Profiles/Policies. | To provide different levels of QoS and SLA for prod and dev/test K8s workloads. | Default Storage Policy might not be adequate if deployed applications have different performance and availability requirements. |
| TKO-TKGS-002 | Create custom VM Classes. | To facilitate deployment of K8s workloads with specific compute/storage requirements. | Default VM Classes in vSphere with Tanzu are not adequate to run a wide variety of K8s workloads. |
vSphere Pods: vSphere with Tanzu introduces a new construct that is called vSphere Pod,
which is the equivalent of a Kubernetes pod. A vSphere Pod is a Kubernetes Pod that runs
directly on an ESXi host without requiring a Kubernetes cluster to be deployed. vSphere
Pods are designed to be used for common services that are shared between workload
clusters, such as a container registry.
A vSphere Pod is a VM with a small footprint that runs one or more Linux containers. Each
vSphere Pod is sized precisely for the workload that it accommodates and has explicit
resource reservations for that workload. It allocates the exact amount of storage, memory,
and CPU resources required for the workload to run. vSphere Pods are only supported with
Supervisor Clusters that are configured with NSX Data Center as the networking stack.
vCenter Single Sign-On: This is the default identity provider that is used to authenticate with
vSphere with Tanzu environment, including the Supervisors and Tanzu Kubernetes Grid
Clusters. vCenter Single Sign-On (SSO) provides authentication for vSphere infrastructure
and can integrate with AD/LDAP systems.
To authenticate using vCenter SSO, use the vSphere Plugin for kubectl. Once authenticated, use kubectl to declaratively provision and manage the lifecycle of TKG clusters and deploy TKG cluster workloads.
External Identity Provider: You can configure a Supervisor with an external identity provider
and support the OpenID Connect protocol. Once connected, the Supervisor functions as an
OAuth 2.0 client, and uses the Pinniped authentication service to connect to Tanzu
Kubernetes Grid clusters by using the Tanzu CLI. Each Supervisor instance can support one
external identity provider. For more information about the list of supported OIDC providers,
see Configure an External IDP.
The Tanzu Kubernetes Grid (informally known as TKG) cluster permissions are set and scoped at the
vSphere Namespace level. When permissions are set for Namespace, including identity source,
users & groups, and roles, all these permissions apply to any TKG cluster deployed within that
vSphere Namespace.
| Permission | Description |
| --- | --- |
| Can view | Read-only access to TKG clusters provisioned in that vSphere Namespace. |
| Can edit | Create, read, update, and delete TKG clusters in that vSphere Namespace. |
| Owner | Can administer TKG clusters in a vSphere Namespace, and can create and delete additional vSphere Namespaces using kubectl. |
Supervisor control plane VMs: Three Supervisor control plane VMs in total are created on the hosts that are part of the Supervisor Cluster. The three control plane VMs are load balanced, and each one of them has its own IP address. Additionally, a floating IP address is assigned to one of the VMs, and a fifth IP address is reserved for patching purposes. vSphere DRS determines the exact placement of the control plane VMs on the ESXi hosts that are part of the cluster and migrates them when needed.
Tanzu Kubernetes Grid and Cluster API: Modules that run on the Supervisor and enable the provisioning and management of Tanzu Kubernetes Grid clusters.
Virtual Machine Service: A module that is responsible for deploying and running standalone VMs and the VMs that make up the Tanzu Kubernetes Grid clusters.
The following diagram shows the general architecture of the Supervisor Cluster.
After a Supervisor Cluster is created, the vSphere administrator creates vSphere namespaces. When
initially created, vSphere namespaces have unlimited resources within the Supervisor Cluster. The
vSphere administrator defines the limits for CPU, memory, and storage, as well as the number of
Kubernetes objects such as deployments, replica sets, persistent volumes that can run within the
namespace. These limits are configured for each vSphere namespace.
For more information about the maximum supported number, see the vSphere with Tanzu
Configuration Maximums guide.
To provide tenants access to namespaces, the vSphere administrator assigns permission to users or
groups available within an identity source that is associated with vCenter SSO.
Once the permissions are assigned, the tenants can access the namespace to create Tanzu
Kubernetes clusters using the YAML file and Cluster API.
- vSAN
- VMFS
- NFS
- vVols
vSphere with Tanzu uses storage policies to integrate with shared datastores. The policies represent
datastores and manage the storage placement of objects such as control plane VMs, container
images, and persistent storage volumes.
Before you enable vSphere with Tanzu, create storage policies to be used by the Supervisor Cluster
and namespaces. Depending on your vSphere storage environment, you can create several storage
policies to represent different classes of storage.
vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
- vSphere backed with vSphere Distributed Switch (VDS) networking and HAProxy to provide load balancing capabilities.
- vSphere backed with vSphere Distributed Switch (VDS) networking and NSX Advanced Load Balancer to provide load balancing capabilities.
Note
The scope of this document is limited to VMware NSX Data Center Networking.
NSX provides network connectivity to the objects inside the Supervisor and external networks.
Connectivity to the ESXi hosts within the cluster is backed by VLAN backed port groups.
The following diagram shows a general overview of vSphere with Tanzu on NSX Networking.
The Supervisor cluster configured with NSX networking uses either a distributed port group (routable to required infrastructure components such as vCenter, NSX Manager, DNS, and NTP; for more information, see Firewall Recommendations) or an NSX segment to provide connectivity to the Kubernetes control plane VMs. Tanzu Kubernetes clusters and vSphere Pods have their networking provided by NSX segments. All hosts from the cluster that are enabled for vSphere with Tanzu are connected to the distributed switch that provides connectivity to Kubernetes workloads and control plane VMs.
The following section explains the networking components and services included in the Supervisor
cluster:
NSX Container Plugin (NCP) provides integration between NSX and Kubernetes. The main
component of NCP runs in a container and communicates with the NSX manager and with
the Kubernetes control plane. NCP monitors changes to containers and other resources and
manages resources such as logical ports, segments, routers, and security groups for the
containers by calling the NSX API.
By default, NCP creates one shared tier-1 gateway for system namespaces, and a tier-1
gateway and load balancer for each namespace. The tier-1 gateway for namespace is
connected to the tier-0 gateway and a default segment.
System namespaces are the namespaces that are used by the core components that are
integral to functioning of the Supervisor and Tanzu Kubernetes Grid clusters. The shared
network resources that include the tier-1 gateway, load balancer, and SNAT IP are grouped
in a system namespace.
NSX Edge provides connectivity from external networks to the Supervisor resources. An
NSX edge cluster normally includes at least two Edge nodes and has a load balancer that
provides a redundancy to the Kube-API servers residing on control plane VMs and any
application that must be published and be accessible from outside the Supervisor cluster. For
more information, see Install and Configure NSX for vSphere with Tanzu.
A tier-0 gateway is associated with the NSX Edge cluster to provide routing to the external
network. The uplink interfaces use either the dynamic routing, BGP, or static routing.
Workloads running in vSphere Pods, regular VMs, or Tanzu Kubernetes clusters that are in the same namespace share the same SNAT IP for north-south connectivity.
Workloads running in vSphere Pods or Tanzu Kubernetes clusters will have the same
isolation rule that is implemented by the default firewall.
A separate SNAT IP is not required for each Kubernetes namespace. East west connectivity
between namespaces does not require SNAT.
The segments for each namespace reside on the vSphere Distributed Switch (VDS)
functioning in Standard mode that is associated with the NSX Edge cluster. The segment
provides an overlay network to the Supervisor Cluster.
Each vSphere namespace has a separate network and set of networking resources shared by
applications inside the namespace, such as tier-1 gateway, load balancer service, and SNAT
IP address.
Workloads running in Tanzu Kubernetes Grid clusters have the same isolation rule that is
implemented by the default firewall.
NSX LB provides:
- L4 load balancer service for Kube-API to the Supervisor cluster and workload clusters.
- L4 load balancer service for all services of type LoadBalancer deployed in workload clusters.
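As an illustration, a standard Kubernetes Service of type LoadBalancer is all that a workload needs in order to receive a VIP from the ingress IP range through the NSX load balancer; the names, labels, and ports below are placeholders.

```yaml
# Hedged example: exposing a workload through the NSX load balancer.
apiVersion: v1
kind: Service
metadata:
  name: web-app-lb                 # placeholder Service name
spec:
  type: LoadBalancer               # NSX allocates a VIP from the ingress IP range
  selector:
    app: web-app                   # placeholder pod label
  ports:
    - port: 80
      targetPort: 8080
```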
Network Requirements
The following table lists the required networks for the reference design:
Note
Based on your business requirements, modify subnet range to fit the projected
growth.
| Network Type | Recommended Size | Description |
| --- | --- | --- |
| Supervisor Management Network | /28 to allow for 5 IPs and future expansion. | Network to host the supervisor VMs. It can be a VLAN backed VDS port group or a pre-created NSX segment. |
| Ingress IP range | /24, 254 addresses | Each service of type LoadBalancer deployed will consume 1 IP address. |
| Egress IP range | /27 | Each vSphere namespace consumes 1 IP address for the SNAT egress. |
| Supervisor Service CIDR | /24 | Network from which IPs for Kubernetes ClusterIP Services will be allocated. |
Firewall Recommendations
To prepare the firewall, you need the following information:
The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet or VLAN.
| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| vCenter | Supervisor Network | TCP:6443 | Allows vCenter to manage the supervisor VMs. |
| Supervisor Network | NSX Manager | TCP:443 | Allows the supervisor to access NSX Manager to orchestrate networking. |
| Supervisor Network | Workload Network | TCP:6443 | GCM and VM Operator need to communicate with the TKC API server. |
Note
For Tanzu Mission Control (TMC), if the firewall does not allow wildcards, allow all IP addresses of [account].tmc.cloud.vmware.com and extensions.aws-usw2.tmc.cloud.vmware.com.
Network Segmentation
By default, when vSphere namespaces are created, distributed firewall rules are added to block all access to VMs from sources outside the namespace, other than the Supervisor cluster. This ensures that the VMs and the vSphere Pods by default are not able to communicate directly with the VMs or the pods in another namespace.
The NSX distributed firewall applies only to ports on switches known to the ESXi host and does not apply to router ports. This distinction is important because NSX load balancer virtual server interfaces are considered router ports, as they only exist as a service within a Tier-1 Gateway, which means that the ports are not known to the ESXi host. The router ports do not include any metadata or tags, which means that the distributed firewall has no way to learn which namespace owns the virtual server.
To isolate traffic between separate namespaces, use one of the following options:
1. When creating the namespace, override the network settings to define dedicated IP blocks. This enables distributed firewall rules to be added to drop/deny traffic from VMs in other namespaces toward the ingress IP pool. This pattern requires that separate IP ranges are used for ingress, egress, and namespace networks. The networks are expandable at any time, but monitoring and managing IP capacity adds operational overhead.
2. Use gateway firewalls to restrict traffic coming in to each load balancer. The benefit is that no additional IP management is required. The downside is that each namespace has its own gateway with its own firewall rule table, which means that automation is significantly more challenging to implement and manual management will be very difficult. Also, the gateway firewall has not had performance testing conducted against groups with dynamic membership. This is an issue at scale, because a workload cluster can only be identified by the tags applied to the segment it is attached to. This means that in large environments there is potential for many firewall rebuilds during activities such as upgrades, which could lead to performance issues.
Deployment options
With vSphere 8 and above, when you enable vSphere with Tanzu, you can configure either one-
zone Supervisor mapped to one vSphere cluster or three-zone Supervisor mapped to three vSphere
clusters. This reference architecture is based on single zone deployment of a Supervisor Cluster.
For more information, see VMware Tanzu for Kubernetes Operations using vSphere with Tanzu
Multi-AZ Reference Architecture on NSX Networking.
Installation Experience
vSphere with Tanzu deployment starts with deploying the Supervisor cluster (Enabling Workload
Management). The deployment is directly executed from the vCenter user interface (UI). The Get
Started page lists the pre-requisites for the deployment.
This installation process takes you through the steps of deploying Supervisor cluster in your vSphere
environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission Control or
Kubectl utility to deploy the Tanzu Kubernetes Grid Clusters.
The following tables list recommendations for deploying the Supervisor Cluster:
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-003 | Create a Subscribed Content Library. | A Subscribed Content Library can automatically pull the latest OVAs used by the Tanzu Kubernetes Grid Service to build cluster nodes. | A Local Content Library would require manual upload of images, and is suitable for air-gapped or Internet-restricted environments. |
| TKO-TKGS-004 | Deploy Supervisor cluster control plane nodes in large form factor. | Large form factor should suffice to integrate the Supervisor cluster with TMC. | Consumes more resources from infrastructure. |
| TKO-TKGS-005 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters, and manages the life cycle of all Tanzu Kubernetes clusters centrally. | Needs outbound connectivity to the internet for TMC registration. |
Note
In this scenario, the SaaS endpoints refer to Tanzu Mission Control, Tanzu Service
Mesh, and Tanzu Observability.
The following tables list recommendations for deploying Tanzu Kubernetes Clusters on the
Supervisor Cluster:
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKC-001 | Deploy Tanzu Kubernetes clusters with prod plan and multiple worker nodes. | The prod plan provides high availability for the control plane. | Requires additional compute resources. |
| TKO-TKC-002 | Use guaranteed VM class for Tanzu Kubernetes clusters. | Guarantees compute resources are always available for containerized workloads. | Could prevent automatic migration of nodes by DRS. |
| TKO-TKC-003 | Implement RBAC for Tanzu Kubernetes clusters. | To avoid the usage of administrator credentials for managing the clusters. | External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually. |
| TKO-TKC-04 | Deploy Tanzu Kubernetes clusters from Tanzu Mission Control. | Tanzu Mission Control provides life-cycle management for the Tanzu Kubernetes clusters and automatic integration with Tanzu Service Mesh and Tanzu Observability. | Only Antrea CNI is supported on workload clusters created from the TMC portal. |
vSphere Namespaces
A vSphere Namespace provides the runtime environment for TKG clusters on Supervisor. To
provision a TKG cluster, you first configure a vSphere namespace with users, roles, permissions,
compute, storage, content library, and assign virtual machine classes. All these configurations are
inherited by TKG clusters deployed in that namespace.
When you create a vSphere namespace, a network segment is created which is derived from the
Namespace Network configured in the Supervisor. While creating a vSphere namespace, you have the
option to override the cluster network settings. Choosing this option lets you customize the vSphere
namespace network by adding Ingress, Egress, and Namespace network CIDR (unique from the
Supervisor and from any other vSphere namespace).
The typical use case for overriding Supervisor network settings is to provision a TKG cluster with
routable pod networking.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-005 | Create environment-specific dedicated namespaces. | Segregate prod/dev/test clusters by assigning them to dedicated namespaces. | Clusters created within the namespace share the same access policies, quotas, and network and storage resources. |
| TKO-TKGS-006 | Register an external IDP with the Supervisor, or AD/LDAP with vCenter SSO. | Limit access to a namespace based on the role of users or groups. | External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually. |
| TKO-TKGS-007 | Enable namespace self-service. | Enables DevOps users to create namespaces in a self-service manner. | The vSphere administrator must publish a namespace template to LDAP users or groups to enable them to create a namespace. |
| TKO-TKGS-008 | Use guaranteed VM Class for production clusters. | CPU and memory limits configured on the vSphere Namespace have an impact on the TKG cluster if it is deployed using the guaranteed VM Class type. | Consumes more infrastructure resources, and contention might occur. |
The v1alpha3 API lets you create conformant Kubernetes clusters of type TanzuKubernetesCluster.
This type of cluster is pre-configured with common defaults for quick provisioning, and can be
customized. The v1beta1 API lets you create conformant Kubernetes clusters based on the default
ClusterClass named tanzukubernetescluster and cluster type of Cluster.
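For reference, the following is a minimal sketch of a v1alpha3 TanzuKubernetesCluster manifest. The namespace, cluster name, TKR version, VM class, and storage class names shown here are placeholders; substitute the values that are available in your vSphere Namespace.
apiVersion: run.tanzu.vmware.com/v1alpha3
kind: TanzuKubernetesCluster
metadata:
  name: tkc-dev-cluster            # example cluster name
  namespace: dev                   # target vSphere Namespace
spec:
  topology:
    controlPlane:
      replicas: 3                  # odd number of control plane nodes, 1 or 3
      vmClass: guaranteed-small
      storageClass: vsan-default-storage-policy
      tkr:
        reference:
          name: v1.23.8---vmware.2-tkg.2-zshippable   # Tanzu Kubernetes release available in the content library
    nodePools:
    - name: worker-pool-1
      replicas: 3
      vmClass: best-effort-medium
      storageClass: vsan-default-storage-policy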
Antrea
Calico
The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.
When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea
CNI is automatically enabled in the cluster.
To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes clusters
with Calico.
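As an illustration, with the v1alpha3 API the CNI can be selected in the cluster specification itself. The fragment below, which would sit under spec in a manifest such as the sketch shown earlier, is a hedged example; when the field is omitted, Antrea is used.
spec:
  settings:
    network:
      cni:
        name: calico     # antrea is used when this field is omitted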
Each CNI is suitable for a different use case. The following table lists some common use cases for the
CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most appropriate
CNI for your Tanzu Kubernetes Grid implementation.
| CNI | Use Case | Pros and Cons |
| --- | --- | --- |
| Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally, encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both Linux and Windows. |
| Calico | Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for Network Policies; High network performance; SCTP support. Cons: No multicast support. |
For more information about Contour, see Contour and Ingress Using Contour.
Each ingress controller has advantages and disadvantages of its own. The following table provides
general recommendations on when you should use a specific ingress controller for your Kubernetes
environment:
| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic), and between the cluster and the outside world (north-south traffic). |
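To illustrate the Contour case, the following is a minimal sketch of a Contour HTTPProxy that publishes a north-south route. The host name and the backing Service (web-app.example.com and web-app) are assumptions for the example and are not part of this reference design.
apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: web-app
  namespace: default
spec:
  virtualhost:
    fqdn: web-app.example.com      # externally resolvable host name (assumed)
  routes:
  - conditions:
    - prefix: /
    services:
    - name: web-app                # ClusterIP Service backing the application (assumed)
      port: 80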
Container Registry
vSphere with Tanzu includes Harbor as a container registry. Harbor provides a location for pushing,
pulling, storing, and scanning container images used in your Kubernetes clusters.
The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. The Harbor registry is used for day-2 operations of the
Tanzu Kubernetes workload clusters. Typical day-2 operations include tasks such as pulling images
from Harbor for application deployment and pushing custom images to Harbor.
When vSphere with Tanzu is deployed on NSX networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.
VM-based deployment using OVA: VMware recommends using this installation method in
cases where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-less deployments. Do not use
this method for hosting application images.
When deploying Harbor with self-signed certificates or certificates signed by internal CAs, the Tanzu
Kubernetes cluster must establish trust with the registry's certificate. To do so, follow the procedure
in Integrate TKG 2 cluster with container registry.
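As a hedged illustration of that trust configuration, a v1alpha3 TanzuKubernetesCluster can carry additional trusted CAs in its settings. The CA name and the base64-encoded PEM value below are placeholders only; refer to the linked procedure for the authoritative steps.
spec:
  settings:
    network:
      trust:
        additionalTrustedCAs:
        - name: harbor-ca                                  # arbitrary label for the certificate (assumed)
          data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t        # base64-encoded PEM root CA (truncated placeholder)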
The following table lists the supported scaling operations for TKG cluster:
Node Horizontal Scale Out Horizontal Scale In Vertical Scale Volume Scale
Note
- The number of control plane nodes must be odd, either 3 or 5.
- You can change the worker node volumes after provisioning. However, you cannot change the
control plane node volumes.
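As a sketch of a horizontal scale-out under these rules, the worker nodes of a cluster built with the v1beta1 API are scaled by editing the replicas count of a node pool in the Cluster manifest and re-applying it with kubectl. The names below follow the example cluster used later in this document and are placeholders.
spec:
  topology:
    workers:
      machineDeployments:
      - class: node-pool
        name: node-pool-1
        replicas: 3          # increased from 1 to scale out this worker node pool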
Tool Comments
Requires the Velero plug-in for vSphere installed and configured on Supervisor.
To back up and restore workloads running on a TKG cluster, create a datastore and install Velero with
Restic on the Kubernetes cluster. For more information, see Install and Configure Standalone Velero and
Restic.
To learn more about Tanzu Kubernetes Grid integration with Tanzu SaaS, see Tanzu SaaS Services.
Summary
vSphere with Tanzu on hyper-converged hardware offers high-performance potential, convenience,
and addresses the challenges of creating, testing, and updating on-premises Kubernetes platforms in
a consolidated production environment. This validated approach will result in a production installation
with all the application services needed to serve combined or uniquely separated workload types
through a combined infrastructure solution.
This plan meets many Day 0 needs for quickly aligning product capabilities to full-stack
infrastructure, including networking, configuring firewall rules, load balancing, workload compute
alignment, and other capabilities.
vSphere with Tanzu creates a Kubernetes control plane directly in the hypervisor layer. You can then run
Kubernetes containers by creating upstream, highly available Kubernetes clusters through the VMware
Tanzu Kubernetes Grid Service (informally known as TKGS), and run your applications inside these clusters.
This document provides a reference design for deploying a zonal supervisor with NSX Advanced
Load Balancer on VDS Networking.
For more information about Non-Zonal deployment, see VMware Tanzu for Kubernetes Operations
using vSphere with Tanzu Reference Design.
For more information about the latest versions, see VMware Product Interoperability Matrix.
The Supervisor cluster runs on top of a software-defined data center (SDDC) layer that
consists of 3 vSphere clusters for compute, NSX for networking, and a shared storage such
as VSAN.
You can deploy a Supervisor on three vSphere Zones to provide cluster-level high-
availability that protects your Kubernetes workloads against cluster-level failure. A vSphere
Zone maps to one vSphere cluster that you can set up as an independent cluster failure
domain. In a three-zone deployment, all three vSphere clusters become one Supervisor.
Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create
and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative, Kubernetes-style APIs for
creating, configuring, and managing the Tanzu Kubernetes cluster. vSphere 8.0 and above
supports the ClusterClass API. ClusterClass is a collection of templates that define a cluster
topology and configuration.
Tanzu Kubernetes Cluster ( Workload Cluster ): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control (TMC), Tanzu Observability, and Tanzu Service
Mesh, which are part of Tanzu for Kubernetes Operations.
VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace, and by VMs
hosting a Tanzu Kubernetes cluster. VM classes in vSphere with Tanzu are broadly
categorized into the following two groups: guaranteed, which fully reserves the configured CPU and
memory, and best effort, which allows resources to be overcommitted.
vSphere with Tanzu offers several default VM classes. You can either use the default VM
classes, or create custom VM classes based on your application's requirements. The
following table lists the default VM classes that are available in vSphere with Tanzu:

| VM Class | CPU | Memory (GB) | Reserved CPU and Memory |
| --- | --- | --- | --- |
| best-effort-xsmall | 2 | 2 | No |
| best-effort-small | 2 | 4 | No |
| best-effort-medium | 2 | 8 | No |
| best-effort-large | 4 | 16 | No |
| best-effort-xlarge | 4 | 32 | No |
| best-effort-2xlarge | 8 | 64 | No |
| best-effort-4xlarge | 16 | 128 | No |
| best-effort-8xlarge | 32 | 128 | No |
| guaranteed-xsmall | 2 | 2 | Yes |
| guaranteed-small | 2 | 4 | Yes |
| guaranteed-medium | 2 | 8 | Yes |
| guaranteed-large | 4 | 16 | Yes |
| guaranteed-xlarge | 4 | 32 | Yes |
| guaranteed-2xlarge | 8 | 64 | Yes |
Note
If the default VM classes do not meet your application's compute and storage
requirements, you can create custom VM classes.
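For illustration only, the sketch below shows roughly what a custom guaranteed VM class looks like when viewed through the VM Operator API; the vmoperator.vmware.com/v1alpha1 schema and the class name are assumptions for this example. In practice, custom VM classes are created and managed from the vSphere Client.
apiVersion: vmoperator.vmware.com/v1alpha1
kind: VirtualMachineClass
metadata:
  name: guaranteed-custom-app      # hypothetical custom class name
spec:
  hardware:
    cpus: 8
    memory: 24Gi
  policies:
    resources:
      requests:
        cpu: 8000m                 # full CPU reservation makes the class "guaranteed"
        memory: 24Gi               # full memory reservation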
Storage Classes in vSphere with Tanzu: A StorageClass allows administrators to describe
the classes of storage that they offer. Different classes can map to quality-of-service
levels, backup policies, or arbitrary policies determined by the cluster administrators.
The policies represent datastores and manage the storage placement of components and
objects such as control plane VMs, vSphere Pod ephemeral disks, and container images. You
might need policies for storage placement of persistent volumes and VM content libraries.
A three-zone Supervisor supports zonal storage, where a datastore is shared across all hosts in a
single zone. Storage policies that you create for a Supervisor or for a namespace in a three-zone
Supervisor must be topology aware and have the consumption domain enabled. For more
information, see Create Storage Policy for a Three-Zone Supervisor.
When you prepare storage resources for three-zone Supervisor, consider the following parameters:
Storage in all three vSphere zones does not need to be of the same type. However, having
uniform storage in all three clusters provides a consistent performance.
Create a storage policy that is compliant with shared storage in each of the clusters. The
storage policy must be topology aware.
A three-zone Supervisor does not support the following:
- Cross-zonal volumes
- vSphere Pods
The following table provides the recommendations for configuring Storage Classes in a
vSphere with Tanzu environment:
| Decision ID | Design Decision | Design Justification | Design Implications |
vCenter Single Sign-On (SSO): This is the default identity provider that is used to
authenticate in the vSphere with Tanzu environment, including Supervisors and Tanzu
Kubernetes Grid Clusters. The vCenter SSO provides authentication for vSphere
infrastructure, and can integrate with AD/LDAP systems.
External Identity Provider: You can configure a Supervisor with an external identity provider
and support the OpenID Connect protocol. Once connected, Supervisor functions as an
OAuth 2.0 client and uses the Pinniped authentication service to connect to Tanzu
Kubernetes Grid clusters by using the Tanzu CLI. Each Supervisor instance can support one
external identity provider. For more information about the list of supported OIDC providers,
see Configure an External IDP.
Supervisor Control Plane VM: A total of three Supervisor control plane VMs are created and
spread evenly across the three vSphere zones. The three Supervisor control plane VMs are
load balanced, as each one of them has its own IP address. Additionally, a floating IP address
is assigned to one of the VMs, and a fifth IP address is reserved for patching purposes.
vSphere DRS determines the exact placement of the control plane VMs on the ESXi hosts
that are part of the zones, and migrates them when needed.
Tanzu Kubernetes Grid Service and ClusterClass: Modules that run on the Supervisor and
enable the provisioning and management of Tanzu Kubernetes Grid clusters. ClusterClass is
a new feature introduced as part of Cluster API that reduces the need for redundant
templating and enables powerful customization of clusters.
Virtual Machine Service: A module that is responsible for deploying and running standalone
VMs, and VMs that make up the Tanzu Kubernetes Grid clusters.
After a Supervisor cluster is created, the vSphere administrator creates vSphere namespaces. When
initially created, a namespace has unlimited resources within the Supervisor cluster. The vSphere
administrator defines the limits for CPU, memory, and storage. The administrator can also limit the
number of Kubernetes objects, such as deployments, replica sets, and persistent volumes, that
can run within the boundary of the namespace. These limits are configured for each vSphere
namespace.
For more information about the maximum supported number, see Configuration Maximum.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-003 | Create environment-specific dedicated namespaces. | Segregate prod/dev/test clusters by assigning them to dedicated namespaces. | Clusters created within the namespace share the same access policies, quotas, and network and storage resources. |
| TKO-TKGS-005 | Enable namespace self-service. | Enables DevOps users to create namespaces in a self-service manner. | The vSphere administrator must publish a namespace template to LDAP users/groups, enabling them to create a namespace. |
vSphere with Tanzu supports the following shared datastore types:
- VMFS
- NFS
- vVols
vSphere with Tanzu uses storage policies to integrate with the backend shared datastores. The policies
represent datastores and manage the storage placement of the control plane VMs.
vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
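As a brief illustration of how a storage policy surfaces inside a cluster, a persistent volume claim simply references the Kubernetes storage class that the assigned vSphere storage policy exposes; the policy name used below is a placeholder.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: vsan-default-storage-policy   # storage class exposed by the assigned vSphere storage policy (assumed name)
  resources:
    requests:
      storage: 5Gi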
vSphere with Tanzu on vSphere networking can be enabled with either of the following load balancing options:
- vSphere backed with vSphere Distributed Switch (VDS) networking and HAProxy to provide load balancing capabilities.
- vSphere backed with vSphere Distributed Switch (VDS) networking and NSX Advanced Load Balancer to provide load balancing capabilities.
Note
The scope of this document is limited to vSphere backed with VDS networking with
NSX Advanced Load Balancer.
The following diagram shows a general overview of vSphere with Tanzu on VDS networking.
NSX Advanced Load Balancer Controller: NSX Advanced Load Balancer Controller
manages Virtual Service objects and interacts with the vCenter Server infrastructure to
manage the lifecycle of the SEs. It is the central repository for the configurations and policies
related to services and management, and provides the portal for viewing the health of
VirtualServices and SEs, and the associated analytics that NSX Advanced Load Balancer
provides. The following table provides recommendations for configuring the NSX Advanced Load Balancer Controller:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-NSXALB-001 | Deploy 3 NSX ALB controller nodes, one in each vSphere Zone, and form a cluster. | To achieve high availability of the NSX ALB control plane when one of the vSphere clusters goes down. | The NSX ALB management network should be L2 stretched across the three vSphere Zones. |
| TKO-NSXALB-002 | Deploy NSX ALB controller cluster nodes on a network dedicated to NSX ALB. | To isolate NSX ALB traffic from infrastructure management traffic and Kubernetes workloads. | An additional VLAN is required. |
| TKO-NSXALB-004 | Use NSX ALB IPAM for the SE data network and virtual services. | Guarantees IP address assignment for Service Engines and Virtual Services. | NA |
| TKO-NSXALB-005 | Reserve an IP in the NSX Advanced Load Balancer management subnet to be used as the Cluster IP for the Controller Cluster. | The NSX Advanced Load Balancer portal is always accessible over the Cluster IP, regardless of a specific individual controller node failure. | |
| TKO-NSXALB-006 | Configure backup for the NSX ALB Controller cluster. | Backups are required if the NSX ALB Controller becomes inoperable or if the environment needs to be restored from a previous state. | To store backups, an SCP-capable backup location is needed. SCP is the only supported protocol currently. |

SAML-based authentication requires an NSX ALB Enterprise license.
NSX Advanced Load Balancer Service Engine: NSX Advanced Load Balancer Service
Engines (SEs) are lightweight VMs that handle all data plane operations by receiving and
executing instructions from the controller. The SEs perform load balancing and all client and
server-facing network interactions. The following table provides recommendations for NSX
Advanced Load Balancer Service Engines deployment:
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-ALB-SE-002 | Enable ALB Service Engine self-election. | Enables SEs to elect a primary among themselves in the absence of connectivity to the NSX ALB controller. | Requires NSX ALB Enterprise licensing. This feature is not supported with the NSX ALB Essentials for Tanzu license. |
| TKO-ALB-SE-005 | Set the SE size to a minimum of 2 vCPUs and 4 GB of memory. | This configuration should meet the most generic use cases. | For services that require higher throughput, these configurations need to be investigated and modified accordingly. |
Avi Kubernetes Operator (AKO): Avi Kubernetes Operator is a Kubernetes operator that
runs as a pod in the Supervisor cluster. It provides ingress and load balancing functionality.
Avi Kubernetes Operator translates the required Kubernetes objects to NSX Advanced Load
Balancer objects and automates the implementation of ingresses/routes/services on the
Service Engines (SE) via the NSX Advanced Load Balancer Controller.
Each environment configured in NSX Advanced Load Balancer is referred to as a cloud. Each cloud
in NSX Advanced Load Balancer maintains networking and NSX Advanced Load Balancer Service
Engine settings. Each cloud is configured with one or more VIP networks to provide IP addresses to
L4 load balancing virtual services created under that cloud.
The virtual services can be spanned across multiple Service Engines if the associated Service Engine
Group is configured in Active/Active HA mode. A Service Engine can belong to only one Service
Engine group at a time.
IP address allocation for virtual services can be over DHCP or via NSX Advanced Load Balancer in-
built IPAM functionality. The VIP networks created/configured in NSX Advanced Load Balancer are
associated with the IPAM profile.
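To make the VIP allocation concrete, the following is a hedged example: when a Service of type LoadBalancer is created in a workload cluster, AKO requests an L4 virtual service on the Service Engines, and the external IP is allocated from the VIP network associated with the IPAM profile. The service name, labels, and ports below are assumptions for illustration.
apiVersion: v1
kind: Service
metadata:
  name: web-app-lb          # hypothetical service name
spec:
  type: LoadBalancer         # triggers an NSX ALB L4 virtual service via AKO
  selector:
    app: web-app
  ports:
  - port: 80
    targetPort: 8080
    protocol: TCP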
Network Architecture
To deploy vSphere with Tanzu, build separate networks for Supervisor clusters, Tanzu Kubernetes
Grid Workload clusters, NSX Advanced Load Balancer, and the Tanzu Kubernetes Grid control plane
HA.
Note
The network/portgroup designated for the workload cluster carries both data and
control traffic. Firewalls cannot be used to segregate traffic between workload
clusters; instead, the underlying CNI must be employed as the main filtering system.
Antrea CNI provides Custom Resource Definitions (CRDs) for firewall rules that can be
enforced before Kubernetes network policy is applied (see the example after this note).
Based on your requirements, you can create additional networks for your workload
cluster. These networks are also referred to as vSphere with Tanzu workload
secondary network.
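The following is a minimal sketch of such an Antrea ClusterNetworkPolicy, assuming the crd.antrea.io/v1alpha1 API and illustrative namespace labels (env: prod, env: dev); adjust the tier, priority, and selectors to your own segmentation model.
apiVersion: crd.antrea.io/v1alpha1
kind: ClusterNetworkPolicy
metadata:
  name: isolate-prod-from-dev        # hypothetical policy name
spec:
  priority: 5
  tier: securityops                   # Antrea tier evaluated before Kubernetes NetworkPolicy
  appliedTo:
  - namespaceSelector:
      matchLabels:
        env: prod
  ingress:
  - action: Drop                      # drop traffic from dev namespaces into prod namespaces
    from:
    - namespaceSelector:
        matchLabels:
          env: dev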
Isolate and separate SDDC management components (vCenter, ESX) from the vSphere with
Tanzu components. This reference design allows only the minimum connectivity between
the Tanzu Kubernetes Grid clusters and NSX Advanced Load Balancer to the vCenter
Server.
Isolate and separate the NSX Advanced Load Balancer management network from the
supervisor cluster network and the Tanzu Kubernetes Grid workload networks.
Separate vSphere Admin and Tenant access to the supervisor cluster. This prevents tenants
from attempting to connect to the supervisor cluster.
Allow tenants to access only their own workload cluster(s) and restrict access to this cluster
from other tenants. This separation can be achieved by assigning permissions to the
supervisor namespaces.
Depending on the workload cluster type and use case, multiple workload clusters may
leverage the same workload network or new networks can be used for each workload
cluster.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-NET-001 | Use separate networks for the Supervisor cluster and workload clusters. | To have flexible firewall and security policies. | Sharing the same network for multiple clusters can complicate the creation of firewall rules. |
| TKO-NET-002 | Use distinct port groups for network separation of Kubernetes workloads. | Isolate production Kubernetes clusters from dev/test clusters by placing them on distinct port groups. | Network mapping is done at the namespace level. All Kubernetes clusters created in a namespace connect to the same port group. |
Networking Prerequisites
All ESXi hosts that are part of the three vSphere clusters share a common VDS with at least one
uplink. We recommend two uplinks. The VDS must be version 8 or above.
Networking Requirements
As per the reference architecture, the following table lists the network requirements:
| Network Type | DHCP Service | Description |
| --- | --- | --- |
| NSX ALB Management Network | Optional | ALB Controllers and SEs will be attached to this network. |
| TKG Supervisor Network | Optional | Supervisor cluster nodes will be attached to this network. |
| TKG Cluster VIP | No | Virtual Services (L4) for control plane HA of the Supervisor and workload clusters. Reserve sufficient IPs depending on the number of TKG clusters planned to be deployed in the environment. |
Note
All the above networks should be L2 stretched across three vSphere clusters.
Firewall Requirements
To prepare the firewall, you need the information about the networks, CIDRs, and VIP ranges described above.
The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet/VLAN.

| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| Client Machine | NSX Advanced Load Balancer Controller nodes and VIP | TCP:443 | Access the NSX Advanced Load Balancer portal for configuration. |
| Client Machine | vCenter Server | TCP:443 | Access and configure WCP in vCenter. |
| TKG Management and Workload Cluster CIDR nodes | NSX Advanced Load Balancer Controller | TCP:443 | Allows Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to the NSX Advanced Load Balancer Controller. |
| TKG Management and Workload Cluster CIDR | TKG Cluster VIP Range | TCP:6443 | Allows the Supervisor cluster to configure workload clusters. |
| TKG Management and Workload Cluster CIDR | Image Registry (Harbor) (if private) | TCP:443 | Allows components to retrieve container images. |
| TKG Management and Workload Cluster CIDR | wp-content.vmware.com, *.tmc.cloud.vmware.com, Projects.registry.vmware.com | TCP:443 | Syncs content library, pulls TKG binaries, and interacts with TMC. |
| TKG Management Cluster CIDR | TKG Workload Cluster CIDR | TCP:6443 | VM Operator and TKC VM communication. |
| TKG Workload Cluster CIDR | TKG Management Cluster CIDR | TCP:6443 | Allows the TKG workload cluster to register with the Supervisor cluster. |
| NSX Advanced Load Balancer Management Network | vCenter and ESXi hosts | TCP:443 | Allows NSX Advanced Load Balancer to discover vCenter objects and deploy SEs as required. |
| TKG Cluster VIP Range | TKG Management Cluster CIDR | TCP:6443 | To interact with the Supervisor cluster. |
| TKG Cluster VIP Range | TKG Workload Cluster CIDR | TCP:6443, TCP:443, TCP:80, TCP:22 (optional) | To interact with workload clusters and Kubernetes applications. |
Note
For TMC, if the firewall does not allow wildcards, you must whitelist all IP addresses of
[account].tmc.cloud.vmware.com and extensions.aws-usw2.tmc.cloud.vmware.com.
Installation Experience
You can configure each vSphere cluster as an independent failure domain and map it to the vSphere
zone. In a Three-Zone deployment, all three vSphere clusters become one Supervisor and can
provide the following options:
Distribute the nodes of Tanzu Kubernetes Grid clusters across all three vSphere zones and
provide availability via vSphere HA at cluster level.
Scale the Supervisor by adding hosts to each of the three vSphere clusters.
vSphere with Tanzu deployment starts with deploying the Supervisor cluster on three vSphere
Zones. The deployment is directly done from the vCenter UI.
1. The Get Started page lists the pre-requisite for the deployment.
1. On the next page, provide a name for the Supervisor cluster, and select previously created
three vSphere Zones.
This installation process takes you through the steps of deploying Supervisor cluster in your vSphere
environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission Control or
Kubectl utility to deploy the Tanzu Kubernetes Grid Clusters.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Deploy Supervisor cluster control plane nodes in large form factor. | Large form factor should suffice to integrate the Supervisor cluster with TMC. | Consumes more resources from the infrastructure. |
| TKO-TKGS-002 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters and manages the life cycle of all Tanzu Kubernetes clusters centrally. | Requires outbound connectivity to the internet for TMC registration. |
Note
The SaaS endpoints here refer to Tanzu Mission Control, Tanzu Service Mesh, and
Tanzu Observability.
The v1alpha3 API lets you create conformant Kubernetes clusters of type TanzuKubernetesCluster.
This type of cluster is pre-configured with common defaults for quick provisioning, and can be
customized. The v1beta1 API lets you create conformant Kubernetes clusters based on the default
ClusterClass named tanzukubernetescluster and cluster type of Cluster.
Antrea
Calico
The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.
When you deploy a Tanzu Kubernetes cluster using the default configuration of Tanzu CLI, Antrea
CNI is automatically enabled in the cluster.
To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes Clusters
with Calico.
Each CNI is suitable for a different use case. The following table lists some common use cases for
the CNI options that Tanzu Kubernetes Grid supports. This table will help you select the most
appropriate CNI for your Tanzu Kubernetes clusters.

| CNI | Use Case | Pros and Cons |
| --- | --- | --- |
| Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally, encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both Linux and Windows. |
| Calico | Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for Network Policies; High network performance; SCTP support. Cons: No multicast support. |
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKC-001 | Deploy Tanzu Kubernetes clusters with prod plan and multiple worker nodes. | The prod plan provides high availability for the control plane. | Consumes more resources from the infrastructure. |
| TKO-TKC-002 | Assign a storage policy as the default policy during cluster deployment. | Package installation and application deployment require a storage policy defined as default. | Package and application installation might fail. |
| TKO-TKC-003 | Use guaranteed VM class for Tanzu Kubernetes clusters. | Guaranteed compute resources are always available for containerized workloads. | Could prevent automatic migration of nodes by DRS. |
| TKO-TKC-004 | Implement RBAC for Tanzu Kubernetes clusters. | Avoids the usage of administrator credentials for managing the clusters. | External AD/LDAP needs to be integrated with vCenter SSO, or external IDP integration with the Supervisor is required. |
For more information about Contour, see the Contour site and Ingress Using Contour.
Each ingress controller has its own advantages and disadvantages. The following table provides general
recommendations on when you should use a specific ingress controller for your Kubernetes
environment:

| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic) and between the cluster and the outside world (north-south traffic). |
Container Registry
Harbor provides a location for pushing, pulling, storing, and scanning container images used in your
Kubernetes clusters.
The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. The Harbor registry is used for day-2 operations of the
Tanzu Kubernetes workload clusters. Typical day-2 operations include tasks such as pulling images
from Harbor for application deployment and pushing custom images to Harbor.
When vSphere with Tanzu is deployed on VDS networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.
Tanzu Kubernetes Grid Package deployment : VMware recommends this installation method
for general use cases. The Tanzu packages, including Harbor, must either be pulled directly
from VMware or be hosted in an internal registry.
VM-based deployment using OVA: VMware recommends using this installation method
when Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-less deployments. Do not use
this method for hosting application images. The Harbor registry is shipped along with the TKG
binaries and can be downloaded from here.
If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. To configure the TKG cluster with private container
registry, see Integrate TKG 2 cluster with container registry.
The following table lists the supported scaling operations for the TKG cluster:
Node Horizontal Scale Out Horizontal Scale In Vertical Scale Volume Scale
Note
- The number of control plane nodes must be odd, either 3 or 5.
- The worker node volumes can be changed after provisioning. However, the control
plane node volumes cannot be changed.
Tool Comments
Requires that the Velero Plugin for vSphere is also installed and configured on
Supervisor.
To back up and restore workloads running on a TKG cluster on a Zonal Supervisor, create a datastore
and install Velero with Restic on the Kubernetes cluster. For more information, see Install and Configure
Standalone Velero and Restic.
Note
Velero plug-in for vSphere runs as a pod which is not supported with Zonal
Supervisor and it requires NSX-T networking. For more information, see the
prerequisites section of Install and Configure the Velero Plugin for vSphere on
Supervisor.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkc-prod-cluster-1
  namespace: prod
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.96.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8+vmware.2-tkg.2-zshippable
    # describe the cluster control plane
    controlPlane:
      # number of control plane nodes; integer 1 or 3
      replicas: 3
    # describe the cluster worker nodes
    workers:
      # specifies parameters for a set of worker nodes in the topology
      machineDeployments:
      - class: node-pool
        name: node-pool-1
        replicas: 1
        failureDomain: zone-a
      - class: node-pool
        name: node-pool-2
        replicas: 1
        failureDomain: zone-b
      - class: node-pool
        name: node-pool-3
        replicas: 1
        failureDomain: zone-c
    variables:
    # virtual machine class type and size for cluster nodes
    - name: vmClass
      value: guaranteed-small
    # persistent storage class for cluster nodes
    - name: storageClass
      value: gold-sp
You can provision the cluster by applying this manifest with the kubectl utility (for example, kubectl apply -f <cluster-manifest>.yaml) and then verify that the nodes are spread across the vSphere Zones, for example with kubectl get machines in the vSphere Namespace. The following StatefulSet example uses zone-aware affinity rules and a zonal storage class to spread its replicas across the three zones:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - zone-1
                - zone-2
                - zone-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
        - name: logs
          mountPath: /logs
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: zonal-ds-policy-105-latebinding
      resources:
        requests:
          storage: 2Gi
  - metadata:
      name: logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: zonal-ds-policy-105-latebinding
      resources:
        requests:
          storage: 1Gi
Indicative per-core Service Engine performance:
Throughput: 4 Gb/s
Connections/s: 40k
Multiple performance vectors or features might impact the performance of your applications. For
instance, to achieve 1 Gb/s of SSL throughput and 2000 TPS of SSL with EC certificates, NSX
Advanced Load Balancer recommends two cores.
NSX Advanced Load Balancer SEs can be configured with as little as 1 vCPU core and 1 GB RAM, or
up to 36 vCPU cores and 128 GB RAM. SEs can be deployed in Active/Active or Active/Standby
mode depending on the license tier used. The NSX Advanced Load Balancer Essentials license does not
support Active/Active HA mode for SEs.
This document lays out a reference design for deploying Supervisor on three vSphere Zones to
provide cluster-level high-availability. Each vSphere Zone maps to one vSphere cluster. A three-
zone Supervisor only supports Tanzu Kubernetes clusters and VMs, it does not support vSphere
Pods. This document does not cover any recommendations or deployment steps for underlying
software-defined data center (SDDC) environments.
For more information about the Non-Zonal Supervisor deployment, see VMware Tanzu for
Kubernetes Operations using vSphere with Tanzu on NSX-T Reference Design.
The following reference design is based on the architecture and components described in VMware
Tanzu for Kubernetes Operations Reference Architecture.
The following table provides the component versions and interoperability matrix supported with
reference design:
The Supervisor cluster runs on top of an SDDC layer that consists of 3 vSphere clusters for
computing, NSX for networking, and a shared storage such as VSAN.
You can deploy a Supervisor on three vSphere Zones to provide cluster-level high-
availability that protects your Kubernetes workloads against cluster-level failure. A vSphere
Zone maps to one vSphere cluster that you can set up as an independent cluster failure
domain. In a three-zone deployment, all three vSphere clusters become one Supervisor
cluster.
Tanzu Kubernetes Grid Service: Tanzu Kubernetes Grid Service (TKGS) allows you to create
and manage ubiquitous Kubernetes clusters on a VMware vSphere infrastructure using the
Kubernetes Cluster API. The Cluster API provides declarative Kubernetes-style APIs for
creating, configuring, and managing the Tanzu Kubernetes Cluster. vSphere 8.0 and above
supports the ClusterClass API. The ClusterClass API is a collection of templates that define a
cluster topology and configuration.
Tanzu Kubernetes Cluster ( Workload Cluster ): Tanzu Kubernetes clusters are Kubernetes
workload clusters in which your application workloads run. These clusters can be attached to
SaaS solutions such as Tanzu Mission Control (TMC), Tanzu Observability, and Tanzu Service
Mesh, which are part of Tanzu for Kubernetes Operations.
VM Class in vSphere with Tanzu: A VM class is a template that defines CPU, memory, and
reservations for VMs. VM classes are used for VM deployment in a Supervisor Namespace.
VM classes can be used by standalone VMs that run in a Supervisor Namespace, and by VMs
hosting a Tanzu Kubernetes cluster. VM classes in vSphere with Tanzu are categorized into the
following two groups: guaranteed and best effort.
vSphere with Tanzu offers several default VM classes. You can either use the default VM
classes, or create customized VM classes based on the requirements of the application. The
following table lists the default VM classes that are available in vSphere with Tanzu:

| VM Class | CPU | Memory (GB) | Reserved CPU and Memory |
| --- | --- | --- | --- |
| best-effort-xsmall | 2 | 2 | No |
| best-effort-small | 2 | 4 | No |
| best-effort-medium | 2 | 8 | No |
| best-effort-large | 4 | 16 | No |
| best-effort-xlarge | 4 | 32 | No |
| best-effort-2xlarge | 8 | 64 | No |
| best-effort-4xlarge | 16 | 128 | No |
| best-effort-8xlarge | 32 | 128 | No |
| guaranteed-xsmall | 2 | 2 | Yes |
| guaranteed-small | 2 | 4 | Yes |
| guaranteed-medium | 2 | 8 | Yes |
| guaranteed-large | 4 | 16 | Yes |
| guaranteed-xlarge | 4 | 32 | Yes |
| guaranteed-2xlarge | 8 | 64 | Yes |
A three-zone Supervisor supports zonal storage, where a datastore is shared across all hosts in a
single zone. Storage policies that you create for a Supervisor or for a namespace in a three-zone
Supervisor must be topology aware and have the consumption domain enabled. For more
information, see Create Storage Policy for a Three-Zone Supervisor.
When you prepare storage resources for three-zone Supervisor, consider the following parameters:
Storage in all three vSphere zones does not need to be of the same type. However, having
uniform storage in all three clusters provides a consistent performance.
Create a storage policy that is compliant with shared storage in each of the clusters. The
storage policy must be topology aware.
A three-zone Supervisor does not support the following:
- Cross-zonal volumes
- vSphere Pods
The Ephemeral Disk Storage Policy and the Image Cache Storage Policy options are disabled
because vSphere Pods are not supported with a Zonal Supervisor deployment.
You cannot create a storage class manually by using kubectl and YAML; instead, you create a storage
class through the vSphere storage policy framework and apply it to the vSphere Namespace. An existing
storage class can, however, be modified by using kubectl.
The following table provides recommendations for configuring Storage Classes in a vSphere with
Tanzu environment:

| Decision ID | Design Decision | Design Justification | Design Implications |
vCenter Single Sign-On: This is the default identity provider that is used to authenticate with
vSphere with Tanzu environment, including the Supervisors and Tanzu Kubernetes Grid
Clusters. vCenter SSO provides authentication for vSphere infrastructure and can integrate
with AD/LDAP systems.
To authenticate using vCenter Single Sign-On, use the vSphere plug-in for kubectl. Once
authenticated, use kubectl to declaratively provision and manage the life cycle of TKG
clusters, and to deploy TKG cluster workloads.
External Identity Provider: You can configure a Supervisor with an external identity provider
and support the OpenID Connect protocol. Once connected, the Supervisor functions as an
OAuth 2.0 client, and uses the Pinniped authentication service to connect to Tanzu
Kubernetes Grid clusters by using the Tanzu CLI. Each Supervisor instance can support one
external identity provider. For more information about the list of supported OIDC providers,
see Configure an External IDP.
The Tanzu Kubernetes Grid (TKG) cluster permissions are set and scoped at the vSphere
Namespace level. When permissions are set for Namespace, including identity source, users &
groups, and roles, all these permissions apply to any TKG cluster deployed within that vSphere
Namespace.
| Permission | Description |
| --- | --- |
| Can view | Read-only access to TKG clusters provisioned in that vSphere Namespace. |
| Can edit | Create, read, update, and delete TKG clusters in that vSphere Namespace. |
| Owner | Can administer TKG clusters in that vSphere Namespace, and can create and delete additional vSphere Namespaces using kubectl. |
Supervisor Control Plane VM: In this environment, three Supervisor control plane VMs are
created and spread evenly across the three vSphere zones. The three Supervisor control
plane VMs are load balanced, as each one of them has its own IP address. Additionally, a
floating IP address is assigned to one of the VMs, and a fifth IP address is reserved for
patching purposes. vSphere DRS determines the exact placement of the control plane VMs
on the ESXi hosts that are part of the zones, and migrates them when needed.
Tanzu Kubernetes Grid and Cluster API: Modules that run on the Supervisor and enable the
provisioning and management of Tanzu Kubernetes Grid clusters.
Virtual Machine Service: A module that is responsible for deploying and running standalone
VMs, and VMs that make up the Tanzu Kubernetes Grid clusters.
vSphere with Tanzu supports the following shared datastore types:
- VMFS
- NFS
- vSAN
- vVols
vSphere with Tanzu is agnostic about which storage option you choose. For Kubernetes stateful
workloads, vSphere with Tanzu installs the vSphere Container Storage Interface (vSphere CSI) to
automatically provision Kubernetes persistent volumes for pods.
Depending on your vSphere storage environment and needs of DevOps, you can create several
storage policies for different classes of storage. When you enable a Supervisor and set up
namespaces, you can assign different storage policies to be used by various objects, components,
and workloads.
Create three vSphere clusters with at least three hosts each. If you use vSAN, the cluster must have
three or four hosts at a minimum.
Configure vSAN or other shared storage for each cluster.
Enable vSphere HA and vSphere DRS in Fully Automated or Partially Automated mode.
vSphere with Tanzu can be enabled in the following environments:
- VMware NSX Data Center networking.
- vSphere backed with vSphere Distributed Switch (VDS) networking and HAProxy to provide load balancing capabilities.
- vSphere backed with vSphere Distributed Switch (VDS) networking and NSX Advanced Load Balancer to provide load balancing capabilities.
Note
The scope of this document is limited to VMware NSX Data Center Networking.
NSX provides network connectivity to the objects inside the Supervisor and external networks.
Connectivity to the ESXi hosts that comprise the three vSphere clusters is provided by VLAN-backed
port groups.
The following diagram shows a general overview of a three-zone deployment of vSphere with Tanzu
on NSX networking.
The Supervisor cluster configured with NSX networking uses either a distributed port group (routable
to required infrastructure components such as vCenter, NSX Manager, DNS, NTP, and so on; for more
information, see Firewall Recommendations) or an NSX segment to provide connectivity to the
Kubernetes control plane VMs. Tanzu Kubernetes clusters have their own networking provided by
NSX segments. All hosts from the cluster that is enabled for vSphere with Tanzu are
connected to the distributed switch that provides connectivity to the Kubernetes workload and control
plane VMs.
The following section explains the networking components and services included in the Supervisor
cluster:
NSX Container Plugin (NCP) provides integration between NSX and Kubernetes. The main
component of NCP runs in a container and communicates with the NSX Manager and with
the Kubernetes control plane. NCP monitors changes to containers and other resources, and
manages networking resources such as logical ports, segments, routers, and security groups
for the containers by calling the NSX API.
By default, NCP creates one shared tier-1 gateway for system namespaces, and a tier-1
gateway and load balancer for each namespace. The tier-1 gateway for a namespace is
connected to the tier-0 gateway and a default segment.
System namespaces are namespaces that are used by the core components that are integral
to functioning of the Supervisor and Tanzu Kubernetes Grid clusters. The shared network
resources that include the tier-1 gateway, load balancer, and SNAT IP are grouped in a
system namespace.
NSX Edge provides connectivity from external networks to the Supervisor resources. An
NSX edge cluster normally includes at least two Edge nodes and has a load balancer that
provides a redundancy to the Kube-API servers residing on control plane VMs and any
application that must be published and be accessible from outside the Supervisor cluster. For
more information, see Install and Configure NSX for vSphere with Tanzu.
A tier-0 gateway is associated with the NSX Edge cluster to provide routing to the external
network. The uplink interfaces use either the dynamic routing, BGP, or static routing.
Each vSphere namespace has a separate network and set of networking resources shared by
applications inside the namespace, such as tier-1 gateway, load balancer service, and SNAT
IP address.
Workloads running in Tanzu Kubernetes Grid clusters have the same isolation rule that is
implemented by the default firewall.
The NSX load balancer provides:
- L4 load balancer service for the Kube-API of the Supervisor cluster and workload clusters.
- L4 load balancer service for all services of type LoadBalancer deployed in workload clusters.
Networking Prerequisites
All ESXi hosts that are part of the three vSphere clusters share a common VDS with at least one uplink.
We recommend that you configure two uplinks. You must use VDS version 8 or above.
Three vSphere clusters are mapped to the same overlay transport zone.
The Supervisor Management network is used to instantiate the zonal Supervisor. This can be either
an L2 stretched network or an NSX segment.
Network Requirements
The following table lists the required networks for the reference design:
Note
Based on your business requirements, modify subnet range to fit the projected
growth.
| Network Type | Recommended Size | Description |
| --- | --- | --- |
| Supervisor Management Network | /28 to allow for 5 IPs and future expansion | Network to host the Supervisor VMs. It can be a VLAN-backed VDS port group or a pre-created NSX segment. |
| Ingress IP range | /24 (254 addresses) | Each service of type LoadBalancer deployed will consume 1 IP address. |
| Egress IP range | /27 | Each vSphere namespace consumes 1 IP address for the SNAT egress. |
| Supervisor Service CIDR | /24 | Network from which IPs for Kubernetes ClusterIP Services will be allocated. |
Firewall Recommendations
To prepare the firewall, you need the information about the networks and CIDRs described above.
The following table provides a list of firewall rules based on the assumption that there is no firewall
within a subnet or VLAN.

| Source | Destination | Protocol:Port | Description |
| --- | --- | --- | --- |
| vCenter | Supervisor Network | TCP:6443 | Allows vCenter to manage the Supervisor VMs. |
| Supervisor Network | NSX Manager | TCP:443 | Allows the Supervisor to access NSX Manager to orchestrate networking. |
| Supervisor Network | Workload Network | TCP:6443 | GCM and VMOperator need to communicate with the TKC API server. |
UDP:53
TCP:443
TCP:6443
TCP:6443
TCP:22
TCP:6443
Note
For Tanzu Mission Control (TMC), if the firewall does not allow wildcards, you must
whitelist all IP addresses of [account].tmc.cloud.vmware.com and extensions.aws-
usw2.tmc.cloud.vmware.com.
Installation Experience
While deploying the Supervisor by using vSphere 8 and above, you can select vSphere Zone
deployment and assign vSphere Zones to provide high availability and fault tolerance. In this
scenario, all three vSphere clusters become one Supervisor. In a three-zone deployment, you can
perform the following operations:
Distribute the nodes of your Tanzu Kubernetes Grid clusters across all three vSphere zones,
thus providing HA for your Kubernetes workloads at a vSphere cluster level.
vSphere with Tanzu deployment starts with deploying the Supervisor cluster on three vSphere
Zones. The deployment is done directly from the vCenter UI. The Get Started page lists the
prerequisites for the deployment:
1. On the next page, provide a name for the Supervisor cluster, and select the previously
created three vSphere Zones.
This installation process takes you through the steps of deploying a Supervisor cluster in your vSphere
environment. Once the Supervisor cluster is deployed, you can use either Tanzu Mission Control or
the kubectl utility to deploy the Tanzu Kubernetes Grid clusters.
The following table lists recommendations for deploying the Supervisor Cluster:

| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-001 | Deploy Supervisor cluster control plane nodes in large form factor. | Large form factor should suffice to integrate the Supervisor cluster with TMC. | Consumes more resources from the infrastructure. |
| TKO-TKGS-002 | Register the Supervisor cluster with Tanzu Mission Control. | Tanzu Mission Control automates the creation of the Tanzu Kubernetes clusters, and manages the life cycle of all Tanzu Kubernetes clusters centrally. | Requires outbound connectivity to the internet for TMC registration. |
Note
In this scenario, the SaaS endpoints refer to Tanzu Mission Control, Tanzu Service
Mesh, and Tanzu Observability.
vSphere Namespaces
A vSphere Namespace provides the runtime environment for TKG Clusters on Zonal Supervisor. To
provision a TKG cluster, you first configure a vSphere namespace with users, roles, permissions,
compute, storage, content library, and assign virtual machine classes. All these configurations are
inherited by TKG clusters deployed in that namespace.
When you create a vSphere Namespace, a network segment is created which is derived from the
Namespace Network configured in Supervisor. While creating vSphere namespace, you have the
option to override cluster network settings. Choosing this option lets you customize the vSphere
Namespace network by adding Ingress, Egress, and Namespace network CIDR (unique from the
Supervisor and from any other vSphere namespace).
The typical use case for overriding Supervisor network settings is to provision a TKG cluster with
routable pod networking.
| Decision ID | Design Decision | Design Justification | Design Implications |
| --- | --- | --- | --- |
| TKO-TKGS-003 | Create environment-specific dedicated namespaces. | Segregate prod/dev/test clusters by assigning them to dedicated namespaces. | Clusters created within the namespace share the same access policies, quotas, and network and storage resources. |
| TKO-TKGS-004 | Register an external IDP with the Supervisor, or AD/LDAP with vCenter SSO. | Limit access to a namespace based on the role of users or groups. | External AD/LDAP needs to be integrated with vCenter, or SSO groups need to be created manually. |
| TKO-TKGS-005 | Enable namespace self-service. | Enables DevOps users to create namespaces in a self-service manner. | The vSphere administrator must publish a namespace template to LDAP users or groups to enable them to create a namespace. |
| TKO-TKGS-006 | Use guaranteed VM Class for production clusters. | CPU and memory limits configured on the vSphere Namespace have an impact on the TKG cluster if it is deployed using the guaranteed VM Class type. | Consumes more infrastructure resources, and contention might occur. |
When you provision a TKG cluster across vSphere Zones on the Supervisor, you must provide the failure
domain for each node pool. Each failure domain maps to a vSphere Zone, which in turn is
associated with one vSphere cluster. Failure domains, also known as vSphere fault domains, are
defined and managed by the vSphere administrator when creating vSphere Zones.
The control plane nodes of Tanzu Kubernetes Grid clusters are automatically placed across the
vSphere Zones. However, you can control how the worker nodes are spread across zones. You can
define a NodePool object for the worker nodes of Tanzu Kubernetes Grid clusters and map each
vSphere Zone to a failure domain for each NodePool. Cluster API spreads the node pools across
zones automatically.
In a zone topology, when you provision a TKG cluster on the Supervisor, the cluster is aware of the
vSphere Zones. The zone topology supports failure domains for highly available workloads. If
needed, you can run workloads in a specific zone by using annotations.
The v1alpha3 API lets you create conformant Kubernetes clusters of type TanzuKubernetesCluster.
This type of cluster is pre-configured with common defaults for quick provisioning, and can be
customized. The v1beta1 API lets you create conformant Kubernetes clusters based on the default
ClusterClass named tanzukubernetescluster and cluster type of Cluster.
Antrea
Calico
The CNI options are open-source software that provide networking for cluster pods, services, and
ingress.
When you deploy a Tanzu Kubernetes cluster using the default configuration, Antrea CNI is
automatically enabled in the cluster.
To provision a Tanzu Kubernetes cluster using Calico CNI, see Deploy Tanzu Kubernetes Clusters
with Calico.
| CNI | Use Case | Pros and Cons |
| --- | --- | --- |
| Antrea | Enable Kubernetes pod networking with IP overlay networks using VXLAN or Geneve for encapsulation. Optionally, encrypt node-to-node communication using IPSec packet encryption. Antrea supports advanced network use cases like kernel bypass and network service mesh. | Pros: Antrea leverages Open vSwitch as the networking data plane. Open vSwitch supports both Linux and Windows. |
| Calico | Calico is used in environments where factors like network performance, flexibility, and power are essential. For routing packets between nodes, Calico leverages the BGP routing protocol instead of an overlay network. This eliminates the need to wrap packets with an encapsulation layer, resulting in increased network performance for Kubernetes workloads. | Pros: Support for Network Policies; High network performance; SCTP support. Cons: No multicast support. |
You can also use NSX Advanced Load Balancer as an L7 ingress controller. However, this requires an
Enterprise license of NSX Advanced Load Balancer.
For more information about Contour, see Contour and Ingress Using Contour.
Each ingress controller has advantages and disadvantages of its own. The following table provides
general recommendations on when you should use a specific ingress controller for your Kubernetes
environment:
| Ingress Controller | Use Cases |
| --- | --- |
| Contour | Use Contour when only north-south traffic is needed in a Kubernetes cluster. You can apply security policies for the north-south traffic by defining the policies in the manifest file for the application. |
| Istio | Use the Istio ingress controller when you need to provide security, traffic direction, and insight within the cluster (east-west traffic), and between the cluster and the outside world (north-south traffic). |
Container Registry
vSphere with Tanzu includes Harbor as a container registry. Harbor provides a location for pushing,
pulling, storing, and scanning container images used in your Kubernetes clusters.
The initial configuration and setup of the platform does not require any external registry because the
required images are delivered through vCenter. The Harbor registry is used for day-2 operations of the
Tanzu Kubernetes workload clusters. Typical day-2 operations include tasks such as pulling images
from Harbor for application deployment and pushing custom images to Harbor.
When vSphere with Tanzu is deployed on NSX networking, you can deploy an external container
registry (Harbor) for Tanzu Kubernetes clusters.
VM-based deployment using OVA: VMware recommends using this installation method in
cases where Tanzu Kubernetes Grid is being installed in an air-gapped or Internet-restricted
environment, and no pre-existing image registry exists to host the Tanzu Kubernetes Grid
system images. VM-based deployments are only supported by VMware Global Support
Services to host the system images for air-gapped or Internet-less deployments. Do not use
this method for hosting application images. The Harbor registry is shipped with the TKG
binaries and can be downloaded from here.
If you are deploying Harbor without a publicly signed certificate, you must include the Harbor root
CA in your Tanzu Kubernetes Grid clusters. For more information, see Trust Custom CA Certificates
on Cluster Nodes.
To configure TKG cluster with private container registry, see Integrate TKG 2 cluster with container
registry.
The following table lists the supported scaling operations for TKG cluster:
Node | Horizontal Scale Out | Horizontal Scale In | Vertical Scale | Volume Scale
Note
- The number of control plane nodes must be odd, either 3 or 5.
- You can change the worker node volumes after provisioning. However, you cannot change the control plane node volumes.
Tool | Comments
Requires the Velero plug-in for vSphere installed and configured on Supervisor.
To back up and restore workloads running on a TKG cluster on a Zonal Supervisor, create a datastore
and install Velero with Restic on the Kubernetes cluster. For more information, see Install and Configure
Standalone Velero and Restic.
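For reference, the following is a minimal sketch of a standalone Velero installation with Restic against an S3-compatible target. The bucket name, credentials file, plug-in version, and MinIO URL are placeholders for your environment, not values from this reference architecture.

# Hypothetical example: install standalone Velero with Restic using an S3-compatible (MinIO) backup target.
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket velero-backups \
  --secret-file ./minio-credentials \
  --use-restic \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.example.com:9000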
Note
Velero plug-in for vSphere runs as a pod which is not supported with Zonal
Supervisor, and it requires NSX-T networking. For more information, see the
prerequisites section of Install and Configure the Velero Plugin for vSphere on
Supervisor.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: tkc-prod-cluster-1
  namespace: prod
spec:
  clusterNetwork:
    services:
      cidrBlocks: ["10.96.0.0/12"]
    pods:
      cidrBlocks: ["192.168.0.0/16"]
    serviceDomain: "cluster.local"
  topology:
    class: tanzukubernetescluster
    version: v1.23.8+vmware.2-tkg.2-zshippable
    #describe the cluster control plane
    controlPlane:
      #number of control plane nodes; integer 1 or 3
      replicas: 3
    #describe the cluster worker nodes
    workers:
      #specifies parameters for a set of worker nodes in the topology
      machineDeployments:
      - class: node-pool
        name: node-pool-1
        replicas: 1
        failureDomain: zone-a
      - class: node-pool
        name: node-pool-2
        replicas: 1
        failureDomain: zone-b
      - class: node-pool
        name: node-pool-3
        replicas: 1
        failureDomain: zone-c
    variables:
      #virtual machine class type and size for cluster nodes
      - name: vmClass
        value: guaranteed-small
      #persistent storage class for cluster nodes
      - name: storageClass
        value: gold-sp
The TKG cluster is provisioned across vSphere Zones, and can be verified by running the following
command:
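One way to confirm the zone placement, shown here as a sketch that may differ from the exact command in the source, is to list the cluster nodes together with their zone labels:

# List nodes with the vSphere zone each node was placed in (run against the TKG cluster context).
kubectl get nodes -L topology.kubernetes.io/zone

The StatefulSet manifest that follows spreads its pods across the three zones using node affinity and pod anti-affinity, and provisions zonal persistent volumes through a late-binding storage class.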
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  serviceName: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - zone-1
                - zone-2
                - zone-3
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - nginx
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: nginx
        image: gcr.io/google_containers/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
        - name: logs
          mountPath: /logs
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: zonal-ds-policy-105-latebinding
      resources:
        requests:
          storage: 2Gi
  - metadata:
      name: logs
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: zonal-ds-policy-105-latebinding
      resources:
        requests:
          storage: 1Gi
To verify the pod scheduling across zones, run the following command:
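A sketch of such a check (the exact command in the source may differ): list the pods with the node each one landed on, and map the nodes to their zone labels.

# Show which node each pod was scheduled on.
kubectl get pods -l app=nginx -o wide
# Map nodes to their vSphere zone labels to confirm the spread across zones.
kubectl get nodes -L topology.kubernetes.io/zone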
VMware Tanzu Kubernetes Grid (informally known as TKG) implements user authentication with
Pinniped, an open-source authentication service for Kubernetes clusters. Pinniped allows you to
plug external OpenID Connect (OIDC) or LDAP identity providers (IdP) into Tanzu Kubernetes
(workload) clusters so that you can control user access to those clusters.
For LDAP authentication, Pinniped uses Dex as the endpoint to connect to your upstream
LDAP IdP.
If you use OIDC, Pinniped provides its own endpoint, so Dex is not required.
Pinniped and Dex run automatically as in-cluster services in your management clusters if you enable
identity management. For instructions on how to enable identity management in Tanzu Kubernetes
Grid, see Configure Identity Management.
Authentication Flow
The authentication flow between the management and workload clusters includes the following:
1. The Tanzu Kubernetes Grid administrator enables and configures identity management on the management cluster, specifying an external LDAP or OIDC IdP.
2. Authentication service components are deployed into the management cluster, using the LDAP or OIDC IdP details specified by the administrator.
3. The administrator creates a Tanzu Kubernetes (workload) cluster. The workload cluster inherits the authentication configuration from the management cluster.
4. The administrator creates a role binding to associate a given user with a given role on the workload cluster.
5. The administrator provides the kubeconfig for the workload cluster to the user.
6. A user uses the kubeconfig to connect to the workload cluster, for example, by running kubectl get pods --kubeconfig.
7. The workload cluster either allows or denies the kubectl get pods request, depending on the permissions of the user's role.
In the following image, the blue arrows represent the authentication flow between the workload
cluster, the management cluster, and the external IdP. The green arrows represent Tanzu CLI and
kubectl traffic between the workload cluster, the management cluster, and the external IdP.
We recommend the following best practices for managing identities in Tanzu Kubernetes Grid
provisioned clusters:
- Limit access to management clusters to the appropriate set of users. For example, provide access only to users who are responsible for managing infrastructure and cloud resources but not to application developers. This is especially important because access to the management cluster inherently provides access to all workload clusters.
- Limit cluster administrator access for workload clusters to the appropriate set of users. For example, provide access to users who are responsible for managing infrastructure and platform resources in your organization, but not to application developers.
- Connect to an identity provider to manage the user identities allowed to access cluster resources instead of relying on administrator-generated kubeconfig files.
You can enable identity management during or after management cluster deployment. Any
workload clusters that you create after enabling identity management are automatically configured to
use the same identity provider as the management cluster.
- Use the obtained details to configure LDAPS or OIDC in Tanzu Kubernetes Grid.
- After the management cluster has been created, confirm that the authentication service is running correctly and complete its configuration.
- If the management cluster manages any workload clusters, generate the Pinniped add-on secret for each workload cluster that was created before you enabled identity management.
To use your company’s internal LDAPS server as the identity provider, obtain LDAPS information
from your LDAP administrator.
To use OIDC as the identity provider, you must have an account with an identity provider that
supports the OpenID Connect standard, for example, Okta.
For more information on using Okta as your OIDC provider, see Register a Tanzu Kubernetes Grid
Application in Okta.
2. If you choose to use OIDC, provide details of your OIDC provider account, for example,
Okta.
Client ID: The client_id value that you obtain from your OIDC provider. For example, if your provider is Okta, log in to Okta, create a Web application, and select the Client Credentials option to obtain a client_id and secret.
Client Secret: The secret value that you obtain from your OIDC provider.
Username Claim: The name of your username claim. This is used to set a user’s
username in the JSON Web Token (JWT) claim. Depending on your provider, enter
claims such as user_name, email, or code.
Groups Claim: The name of your groups claim. This is used to set a user’s group in
the JWT claim. For example, groups.
3. If you choose to use LDAPS, provide details of your company’s LDAPS server. All settings
except for LDAPS Endpoint are optional.
LDAPS Endpoint: The IP or DNS address of your LDAPS server. Provide the
address and port of the LDAP server, in the form host:port.
Bind DN: The DN for an application service account. The connector uses these
credentials to search for users and groups. Not required if the LDAP server provides
access for anonymous authentication.
Bind Password: The password for an application service account, if Bind DN is set.
User Search Attributes:
  Base DN: The point from which to start the LDAP search. For example, OU=Users,OU=domain,DC=io.
  Username: The LDAP attribute that contains the user ID. For example, uid, sAMAccountName.
Group Search Attributes:
  Base DN: The point from which to start the LDAP search. For example, OU=Groups,OU=domain,DC=io.
  Name Attribute: The LDAP attribute that holds the name of the group. For example, cn.
  User Attribute: The attribute of the user record that is used as the value of the membership attribute of the group record. For example, distinguishedName, dn.
  Group Attribute: The attribute of the group record that holds the user/member information. For example, member.
Paste the contents of the LDAPS server CA certificate into the Root CA text box.
Click Start. After verification completes, if you see any failures, examine them
closely.
5. After you deploy the management cluster, proceed with Complete the Configuration of Identity Management in the Management Cluster.
2. Generate a Kubernetes secret for the Pinniped add-on and deploy the Pinniped package.
Before you can enable identity management, you must have an identity provider. Tanzu Kubernetes Grid supports LDAPS and OIDC identity providers.
To use your company’s internal LDAPS server as the identity provider, obtain LDAPS information
from your LDAP administrator.
To use OIDC as the identity provider, you must have an account with an identity provider that
supports the OpenID Connect standard, for example, Okta.
For more information on using Okta as your OIDC provider, see Register a Tanzu Kubernetes Grid
Application in Okta.
For more information on obtaining your identity provider details, see Obtain Your Identity Provider
Details.
1. Set the context of kubectl to your management cluster. For example, with a management
cluster named id-mgmt-test:
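For example, assuming the default admin context naming convention used by Tanzu Kubernetes Grid:

kubectl config use-context id-mgmt-test-admin@id-mgmt-test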
2. Create a cluster configuration file by copying the configuration settings that you defined
when you deployed your management cluster into a new file. In addition to the variables
from the original management cluster configuration, include the following OIDC or LDAP
identity provider details in the file:
Note
IDENTITY_MANAGEMENT_TYPE:
CERT_DURATION: 2160h
CERT_RENEW_BEFORE: 360h
OIDC_IDENTITY_PROVIDER_CLIENT_ID:
OIDC_IDENTITY_PROVIDER_CLIENT_SECRET:
OIDC_IDENTITY_PROVIDER_GROUPS_CLAIM:
OIDC_IDENTITY_PROVIDER_ISSUER_URL:
OIDC_IDENTITY_PROVIDER_SCOPES: "email,profile,groups"
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM:
LDAP_BIND_DN:
LDAP_BIND_PASSWORD:
LDAP_GROUP_SEARCH_BASE_DN:
LDAP_GROUP_SEARCH_FILTER:
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE:
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: cn
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST:
LDAP_ROOT_CA_DATA_B64:
LDAP_USER_SEARCH_BASE_DN:
LDAP_USER_SEARCH_EMAIL_ATTRIBUTE: DN
LDAP_USER_SEARCH_FILTER:
LDAP_USER_SEARCH_ID_ATTRIBUTE: DN
LDAP_USER_SEARCH_NAME_ATTRIBUTE:
LDAP_USER_SEARCH_USERNAME: userPrincipalName
3. The following is a sample management cluster configuration file after the LDAP configuration has been updated:
#! ---------------------------------------------------------------------
#! vSphere non proxy env configs
#! ---------------------------------------------------------------------
AVI_CA_DATA_B64: <base64-encoded-cert>
AVI_CLOUD_NAME: tkgvsphere-cloud01
AVI_CONTROLLER: alb-ctlr01.lab.vmw
AVI_DATA_NETWORK: tkg-mgmt-vip-segment
AVI_DATA_NETWORK_CIDR: 172.16.50.0/24
AVI_ENABLE: 'true'
AVI_LABELS: |
  'type': 'management'
AVI_PASSWORD: <encoded:Vk13YXJlMSE=>
AVI_SERVICE_ENGINE_GROUP: tkgvsphere-tkgmgmt-group01
AVI_USERNAME: admin
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: tkg-mgmt-01
CLUSTER_PLAN: prod
ENABLE_CEIP_PARTICIPATION: 'true'
ENABLE_MHC: 'true'
#----------Providing the ldap config here---------------------
IDENTITY_MANAGEMENT_TYPE: ldap
LDAP_BIND_DN: cn=administrator,cn=Users,dc=lab,dc=vmw
LDAP_BIND_PASSWORD: VMware1!
LDAP_GROUP_SEARCH_BASE_DN: cn=Users,dc=lab,dc=vmw
LDAP_GROUP_SEARCH_FILTER: (objectClass=group)
LDAP_GROUP_SEARCH_GROUP_ATTRIBUTE: member
LDAP_GROUP_SEARCH_NAME_ATTRIBUTE: member
LDAP_GROUP_SEARCH_USER_ATTRIBUTE: DN
LDAP_HOST: dns.lab.vmw
LDAP_ROOT_CA_DATA_B64: <base64-encoded-cert>
LDAP_USER_SEARCH_BASE_DN: cn=Users,dc=lab,dc=vmw
LDAP_USER_SEARCH_EMAIL_ATTRIBUTE: DN
LDAP_USER_SEARCH_FILTER: (objectClass=person)
LDAP_USER_SEARCH_ID_ATTRIBUTE: DN
LDAP_USER_SEARCH_NAME_ATTRIBUTE: userPrincipalName
LDAP_USER_SEARCH_USERNAME: userPrincipalName
#--------------------------
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /tkgm-internet-dc1
VSPHERE_DATASTORE: /tkgm-internet-dc1/datastore/vsanDatastore
VSPHERE_FOLDER: /tkgm-internet-dc1/vm/tkg-vsphere-tkg-mgmt
VSPHERE_NETWORK: pg-tkg_mgmt
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /tkgm-internet-dc1/host/tkgm-internet-c1/Resources/tkg-vsphere-tkg-Mgmt
VSPHERE_SERVER: vcenter.lab.vmw
VSPHERE_SSH_AUTHORIZED_KEY: <vcenter-ssh-key>
VSPHERE_USERNAME: [email protected]
VSPHERE_INSECURE: 'true'
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
ENABLE_AUDIT_LOGGING: 'true'
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: 3
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_NAME: pg-tkg-cluster-vip
AVI_MANAGEMENT_CLUSTER_VIP_NETWORK_CIDR: 172.16.80.0/24
WORKER_SIZE: medium
CONTROLPLANE_SIZE: medium
4. Make sure your local environment has IDENTITY_MANAGEMENT_TYPE set to either oidc
or ldap, and not none:
# export IDENTITY_MANAGEMENT_TYPE=ldap
# echo $IDENTITY_MANAGEMENT_TYPE
ldap
export _TKG_CLUSTER_FORCE_ROLE="management"
export FILTER_BY_ADDON_TYPE="authentication/pinniped"
Example:
# tanzu cluster create tkg-mgmt-01 --dry-run -f tkg-mgmt-01.yaml > tkg-mgmt-01-example-secret.yaml
# ls
tkg-mgmt-01.yaml tkg-mgmt-01-example-secret.yaml
The environment variable settings cause tanzu cluster create --dry-run to generate a Kubernetes secret, not a full cluster manifest.
Note
This command generates the manifest in the default namespace. However, you need to create the secret in the tkg-system namespace for kapp-controller to reconcile the core add-on. Manually edit the file and change the namespace to "tkg-system".
8. Review the secret and then apply it to the management cluster. For example:
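For example, assuming the secret generated earlier was saved as tkg-mgmt-01-example-secret.yaml and its namespace has been changed to tkg-system:

kubectl apply -f tkg-mgmt-01-example-secret.yaml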
9. After applying the secret, check the status of the Pinniped add-on by running the kubectl get
app command:
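A sketch of the check, assuming the Pinniped add-on is reconciled as the pinniped app in the tkg-system namespace:

kubectl get app pinniped -n tkg-system
# Wait until the DESCRIPTION column shows "Reconcile succeeded".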
Note
2. Confirm that the authentication service is running correctly by checking its status:
OIDC: Check the Status of an OIDC Identity Management Service.
3. If you want to use regular, non-administrator kubeconfig files for access to the management
cluster, after completing the configuration of identity management, configure RBAC by
following the instructions in Configure RBAC for a Management Cluster.
1. Get the admin context of the management cluster. The procedures in this topic use a management cluster named id-mgmt-test. The admin context of a cluster gives you full access to the cluster without requiring authentication with your IdP.
Follow the steps below to check the status of an LDAP service and note the EXTERNAL-IP address
at which the service is exposed.
1. Verify that the Pinniped package was installed and reconciled successfully.
2. Get information about the pinniped-supervisor and dexsvc services that are running in the management cluster.
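A sketch of these checks, assuming the default namespaces used by Tanzu Kubernetes Grid (pinniped-supervisor for the Pinniped supervisor and tanzu-system-auth for the Dex service):

# Check that the Pinniped package reconciled successfully.
kubectl get app pinniped -n tkg-system
# Get the pinniped-supervisor service and note its EXTERNAL-IP.
kubectl get svc -n pinniped-supervisor
# Get the dexsvc service used for LDAP authentication and note its EXTERNAL-IP.
kubectl get svc dexsvc -n tanzu-system-auth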
3. Proceed with generating the kubeconfig and creating the role based access control. For
more information, see Configure RBAC.
Tanzu Kubernetes Grid uses Pinniped to integrate clusters with an OIDC identity service. When you
enable OIDC, Tanzu Kubernetes Grid creates the pinniped-supervisor service in the pinniped-
supervisor namespace and pinniped-concierge in the pinniped-concierge namespace.
Follow the steps below to check the status of the Pinniped service and note the EXTERNAL-IP
address at which the service is exposed.
1. Verify that the Pinniped package was installed and reconciled successfully.
2. Get information about the services that are running in the management cluster. The identity
management service runs in the pinniped-supervisor namespace.
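A sketch of these checks, assuming the default pinniped-supervisor and tkg-system namespaces:

# Check that the Pinniped package reconciled successfully.
kubectl get app pinniped -n tkg-system
# List the services in the pinniped-supervisor namespace and note the EXTERNAL-IP of pinniped-supervisor.
kubectl get svc -n pinniped-supervisor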
3. Note the external address of the pinniped-supervisor service, as listed under EXTERNAL-IP.
4. Update the External IP in the login redirect URI in the OIDC identity provider. For more
information, see Provide the Callback URI to the OIDC Provider.
5. Once you update the redirect URI, proceed with generating the kubeconfig and creating the
role based access control. For more information, see Configure RBAC.
3. Select the application that you created for Tanzu Kubernetes Grid.
5. Under Login, update Login redirect URIs to include the address of the node in which the
pinniped-supervisor is running.
6. Add the external IP address of the node at which the pinniped-supervisor service is running,
that you noted in the previous procedure.
Note
If a workload cluster was created before you enabled identity management for your management
cluster, you must enable it manually. To enable identity management for a workload cluster:
Create a cluster configuration file by copying the configuration settings that you
defined when you deployed your workload cluster into a new file. In addition to the
variables from the original cluster configuration, include the following:
# Identity management type used by the management cluster. This must be "oidc" or "ldap".
IDENTITY_MANAGEMENT_TYPE:
namespace: kube-public
resourceVersion: "62756"
uid: 7f399d41-ab1b-41f2-9cd1-1d5fc4ddf9e1
3. Create the cluster configuration file by providing the above details. The following is a sample workload cluster configuration file with LDAP configuration:
CLUSTER_CIDR: 100.96.0.0/11
CLUSTER_NAME: workload-2
CLUSTER_PLAN: dev
ENABLE_CEIP_PARTICIPATION: 'true'
ENABLE_MHC: 'true'
IDENTITY_MANAGEMENT_TYPE: none
INFRASTRUCTURE_PROVIDER: vsphere
SERVICE_CIDR: 100.64.0.0/13
TKG_HTTP_PROXY_ENABLED: false
DEPLOY_TKG_ON_VSPHERE7: 'true'
VSPHERE_DATACENTER: /tkgm-internet-dc1
VSPHERE_DATASTORE: vsanDatastore
VSPHERE_FOLDER: /tkgm-internet-dc1/vm/tkg-vsphere-workload
VSPHERE_NETWORK: /tkgm-internet-dc1/network/pg-tkg_workload
VSPHERE_PASSWORD: <encoded:Vk13YXJlMSE=>
VSPHERE_RESOURCE_POOL: /tkgm-internet-dc1/host/tkgm-internet-c1/Resources/tkg-vsphere-workload
VSPHERE_SERVER: vcenter.lab.vmw
VSPHERE_SSH_AUTHORIZED_KEY: <vsphere-ssh-key>
VSPHERE_USERNAME: [email protected]
WORKER_MACHINE_COUNT: 1
VSPHERE_INSECURE: 'true'
ENABLE_AUDIT_LOGGING: 'true'
ENABLE_DEFAULT_STORAGE_CLASS: 'true'
ENABLE_AUTOSCALER: false
AVI_CONTROL_PLANE_HA_PROVIDER: 'true'
OS_ARCH: amd64
OS_NAME: photon
OS_VERSION: 3
WORKER_SIZE: medium
CONTROLPLANE_SIZE: medium
IDENTITY_MANAGEMENT_TYPE: ldap
SUPERVISOR_ISSUER_URL: https://172.16.80.104
# export IDENTITY_MANAGEMENT_TYPE=ldap
# echo $IDENTITY_MANAGEMENT_TYPE
ldap
export _TKG_CLUSTER_FORCE_ROLE="workload"
export FILTER_BY_ADDON_TYPE="authentication/pinniped"
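The dry-run command that produces the secret mirrors the management cluster example shown earlier; the file names below match the ls output that follows:

# tanzu cluster create workload-2 --dry-run -f workload-2.yaml > workload-2-example-secret.yaml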
# ls
workload-2.yaml workload-2-example-secret.yaml
The environment variable settings cause tanzu cluster create --dry-run to generate a Kubernetes secret, not a full cluster manifest.
8. Review the secret and apply it to the management cluster. The Pinniped add-on secret is
always created or applied to the management cluster, even if you are configuring a workload
cluster.
Set the context of kubectl to the management cluster. For example, with a
management cluster named tkg-mgmt:
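For example, assuming the default admin context naming convention:

kubectl config use-context tkg-mgmt-admin@tkg-mgmt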
9. After applying the secret, check the status of the Pinniped add-on:
Run the kubectl get app command against the workload cluster:
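A sketch of the check, run with the workload cluster's kubeconfig and assuming the add-on reconciles as the pinniped app in the tkg-system namespace:

kubectl get app pinniped -n tkg-system --kubeconfig <workload-cluster-kubeconfig>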
10. If you plan to use regular, non-administrator kubeconfig files for cluster access, proceed with
generating the kubeconfig and creating the role based access control. For more information,
see Configure RBAC.
Configure RBAC
To give users access to a management or a workload cluster, you generate a kubeconfig file and
then share the file with those users. If you provide them with the administrator kubeconfig for the
cluster, they have full access to the cluster and do not need to be authenticated. However, if you
provide users with the regular kubeconfig, they must have a user account in your OIDC or LDAP
identity provider and you must configure RBAC on the cluster to grant access permissions to the
designated user.
For more information on how to configure role-based access control (RBAC) in Tanzu Kubernetes
Grid, see Configure RBAC.
1. Export the regular kubeconfig for the management cluster to a local file, for example, /tmp/id_mgmt_test_kubeconfig. Note that the command does not include the --admin option, so the kubeconfig that is exported is the regular kubeconfig, not the admin version.
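A hedged example of the export command (the exact Tanzu CLI subcommand name may vary slightly between versions):

tanzu management-cluster kubeconfig get --export-file /tmp/id_mgmt_test_kubeconfig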
2. Connect to the management cluster by using the newly created kubeconfig file:
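For example, using the exported file:

kubectl get pods -A --kubeconfig /tmp/id_mgmt_test_kubeconfig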
The authentication process requires a browser to be present on the machine from which
users connect to clusters because running kubectl commands automatically opens the IdP
login page so that users can log in to the cluster. Your browser should open and display the
login page for your OIDC provider or an LDAPS login page.
LDAPS:
OIDC:
Enter the credentials of a user account that exists in your OIDC or LDAP server. After a
successful login, the browser should display the following.
3. Go back to the terminal in which you run tanzu and kubectl commands:
If you already configured a role binding on the cluster for the authenticated user, the
output of kubectl get pods -A appears, displaying the pod information.
If you have not configured a role binding on the cluster, you see a message denying
the user account access to the pods:
This happens because the user has been successfully authenticated, but they are not yet
authorized to access any resources on the cluster. To authorize the user to access the cluster
resources, you must Create a Role Binding on the Management Cluster as described in
Create a Role Binding on the Management Cluster.
To make the kubeconfig work, you must first set up RBAC by creating a role binding on the
management cluster. This role binding assigns role-based permissions to individual authenticated
users or user groups. There are many roles with which you can associate users, but the most useful
roles are the following:
admin: Permission to view most resources but can only modify resources like roles and
bindings. Cannot modify pods or deployments.
edit: The opposite of admin. Can create, update, and delete resources like deployments,
services, and pods. Cannot change roles or permissions.
view: Read-only.
You can assign any of the roles to users. For more information about custom roles and role bindings,
see Using RBAC Authorization in the Kubernetes documentation.
1. Make sure that you are using the admin context of the management cluster:
2. If the context is not the management cluster admin context, set kubectl to use that context.
For example:
5. The following example creates a cluster role binding named id-mgmt-test-rb that binds the
cluster role cluster-admin to the user [email protected]:
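For example, a binding with these names can be created with kubectl:

kubectl create clusterrolebinding id-mgmt-test-rb \
  --clusterrole cluster-admin \
  --user [email protected]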
For --user, specify the OIDC or LDAP username of the user. You configured the username attribute and other identity provider details in the Identity Management section of the Tanzu Kubernetes Grid installer interface or by setting the LDAP_* or OIDC_* variables:
1. OIDC: The username attribute is set in the Username Claim field under OIDC
Identity Management Source in the installer interface or the
OIDC_IDENTITY_PROVIDER_USERNAME_CLAIM configuration variable.
2. LDAPS: The username attribute is set in the Username field under LDAPS Identity Management Source -> User Search Attributes in the installer interface or the LDAP_USER_SEARCH_USERNAME configuration variable.
For example, for OIDC, the username is often the email address of the user. For
LDAPS, it is the LDAP username, not the email address.
3. Attempt to connect to the management cluster again by using the kubeconfig file
that you created in the previous procedure:
This time, because the user is bound to the cluster-admin role on this management cluster, the list of
pods should be displayed. You can share the generated kubeconfig file with any users for whom you
configure role bindings on the management cluster.
For clusters based on older TKrs or created by older versions of Tanzu Kubernetes Grid: Follow the Authenticate Users on a Machine Without a Browser procedure in the Tanzu Kubernetes Grid v1.4 documentation.
For clusters based on TKr v1.22.5 (default for Tanzu Kubernetes Grid v1.5) or later, do the following:
1. From a terminal window on your local machine, run ssh to remotely log in to your bootstrap
machine.
export TANZU_CLI_PINNIPED_AUTH_LOGIN_SKIP_BROWSER=true
3. Export the regular kubeconfig for the cluster to a local file. Note that the command does not include the --admin option, so the kubeconfig that is exported is the regular kubeconfig, not the admin version.
4. Connect to the cluster by using the newly-created kubeconfig file. The CLI outputs a login
link for your identity provider.
5. Copy the link and paste it into a browser on your local machine and log in to your identity
provider.
6. A page appears prompting you to paste an authorization code into the CLI to finish your login:
7. Copy the authorization code and paste it into the CLI, after the Optionally, paste your
authorization code: prompt.
8. Connect to the cluster again by using the same kubeconfig file as you used previously:
9. If you already configured a role binding on the cluster for the authenticated user, the output
shows the pod information.
10. If you have not configured a role binding on the cluster, you will see a message denying the
user account access to the pods: Error from server (Forbidden): pods is forbidden: User
“[email protected]” cannot list resource “pods” in API group "" at the cluster scope. This
happens because the user has been successfully authenticated, but they are not yet
authorized to access any resources on the cluster.
11. To authorize the user to access the cluster resources, you must configure RBAC on the
cluster by creating a cluster role binding. For more information, see Configure RBAC.
You can run backup and restore operations in Tanzu Mission Control to protect your Kubernetes
data.
Prerequisites
Before you enable Data Protection on a workload cluster, ensure the following prerequisites:
- The workload cluster that you want to protect is registered or attached with Tanzu Mission Control.
- You have created a credential for Data Protection as per the instructions provided in the Tanzu Mission Control documentation.
- You have created a Target Location for Data Protection as per the instructions provided in the Tanzu Mission Control documentation.
For more information about protecting the data resources in your Kubernetes clusters, see Data
Protection in VMware Tanzu Mission Control Concepts.
1. Locate the cluster in the Tanzu Mission Control portal and click on the Overview tab.
It takes approximately 5-10 minutes to enable data protection on a Kubernetes cluster. Tanzu Mission
Control creates a namespace named Velero and installs Velero related Kubernetes objects in the
workload cluster.
Configure Backup
After enabling data protection,
1. In the Data protection section, click Create Backup to configure backup for the workload
cluster.
Tanzu Mission Control Data Protection allows you to create backups of the following types:
Backup configuration may take some time depending on the Kubernetes objects that you have provisioned in the workload cluster. When backup is configured for the first time, Tanzu Mission Control takes a backup immediately. After that, backups are taken according to the configured backup schedule.
Restore Backup
To restore the Kubernetes data from the backup,
1. Go to Data Protection.
If you have backed up persistent volumes, the restore process may take some time. The backup is
restored in the same cluster from which it was retrieved.
Kubernetes is a platform that provides development teams with a single API to deploy, manage, and
run applications. However, running, maintaining, and securing Kubernetes is a complex task.
VMware Tanzu for Kubernetes Operations (informally known as TKO) simplifies Kubernetes
operations. It determines what base OS instances to use, which Kubernetes Container Network
Interface (CNI) and Container Storage Interfaces (CSI) to use, how to secure the Kubernetes API,
and so on. It monitors, upgrades, and backs up clusters and helps teams provision, manage, secure,
and maintain Kubernetes clusters on a day-to-day basis.
The following table lists the high-level network port requirements for deploying the components available with Tanzu for Kubernetes Operations as a solution.
Product | Source | Destination | Port | Protocol | Purpose
NSX Advanced Load Balancer | Management Client | Avi Controller | 443 | TCP | NSX ALB UI/REST API.
NSX Advanced Load Balancer | Avi Controller | ESXi Host | 443 | TCP | Management access for Service Engine creation.
NSX Advanced Load Balancer | Avi Controller | vCenter Server | 443 | TCP | APIs for vCenter integration.
NSX Advanced Load Balancer | Avi Controller | NSX Manager | 443 | TCP | For NSX Cloud creation.
NSX Advanced Load Balancer | Avi Service Engine | Avi Controller | 123 | UDP | Time sync.
NSX Advanced Load Balancer | Avi Service Engine | Avi Controller | 8443 | TCP | Secure channel key exchange.
NSX Advanced Load Balancer | Avi Service Engine | Avi Controller | 22 | TCP | Secure channel SSH.
NSX Advanced Load Balancer | Avi Service Engine | Avi Service Engine | 9001 | TCP | Inter-SE distributed object store for vCenter/NSX-T/No Orchestrator/Linux server clouds.
NSX Advanced Load Balancer | Avi Service Engine | Avi Service Engine | 4001 | TCP | Inter-SE distributed object store for AWS/Azure/GCP/OpenStack clouds.
Tanzu Kubernetes Grid | Bootstrap Machine | TKG Cluster Kubernetes API Server | 6443 | TCP | Kubernetes Cluster API access.
Tanzu Kubernetes Grid | Bootstrap Machine | NodePort Services | 30000-32767 | TCP | External access to hosted services via L7 ingress in NodePort mode.
Tanzu Kubernetes Grid | Bootstrap Machine | NodePortLocal Services | 61000-62000 (default) | TCP | External access to hosted services via L7 ingress in NodePortLocal mode.
Tanzu Kubernetes Grid | TKG Workload Cluster CIDR | TKG Management Cluster CIDR | 31234 | TCP | To register Workload Cluster with Management Cluster.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Avi Controller | 443 | TCP | Allows Avi Kubernetes Operator (AKO) and AKO Operator (AKOO) access to Avi Controller.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | vCenter Server | 443 | TCP | Allows components to access vCenter to create VMs and storage volumes.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | DNS Server | 53 | UDP | Allows components to look up machine addresses.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | NTP Server | 123 | UDP | Allows components to sync current time.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | DHCP Server | 67, 68 | TCP | Allows nodes to get DHCP addresses.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Tanzu Mission Control | 443 | TCP | To manage Tanzu Kubernetes Clusters with Tanzu Mission Control (TMC).
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Tanzu Service Mesh | 443 | TCP | To provide Service Mesh services to Tanzu Kubernetes Clusters with Tanzu Service Mesh (TSM).
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | Tanzu Observability | 443 | TCP | To monitor Tanzu Kubernetes Clusters with Tanzu Observability (TO).
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | vRealize Log Insight | 514 | UDP | To configure remote logging with fluentbit.
Tanzu Kubernetes Grid | TKG Management and Workload Cluster CIDR | vRealize Log Insight Cloud | 443 | TCP | To configure remote logging with fluentbit.
ClusterClass Overview
ClusterClass, in the Kubernetes Cluster API project, allows you to define the shape of your clusters. You define the shape of a cluster only once and then reuse it multiple times. A ClusterClass consists of a collection of templates that define the topology and configuration of a Kubernetes cluster. The templates can be used to create new clusters or to update existing clusters. ClusterClass simplifies the process of creating and managing multiple Kubernetes clusters and makes your clusters more consistent and reliable. A minimal skeleton is sketched after the component list below.
- Workers: This includes the reference to the VSphereMachineTemplate used when creating the machines for the cluster's worker machines and the KubeadmConfigTemplate containing the KubeadmConfigSpec for initializing and joining the worker machines to the control plane.
- Infrastructure: This includes the reference to the VSphereClusterTemplate that contains the vCenter details (vCenter Server endpoint, SSL thumbprint, and so on) used when creating the cluster.
- Variables: A list of variable definitions, where each variable is defined using the OpenAPI Schema definition.
- Patches: A list of patches used to change the above-mentioned templates for each specific cluster. Variable definitions defined in the Variables section can also be used in the patches section.
- Consistent clusters: All clusters that are created from the same ClusterClass have the same topology and configuration. This helps ensure that your clusters are reliable and predictable.
- Managed clusters: ClusterClass can be used to manage the lifecycle of your clusters. This helps you automate the process of creating, updating, and deleting clusters.
Cluster
The Cluster CRD is used to create Kubernetes clusters, manage their configuration and state, and delete them. For example, you can use the Cluster object to update the Kubernetes version, the network configuration, or the number of nodes in the cluster.
- Define the attributes that govern the cluster's control plane. These attributes contain parameters such as the replica count, along with provisions for overriding or appending values to the control plane metadata, nodeDrainTimeout, and the control plane's MachineHealthCheck.
- A list of machine deployments slated for creation, with each deployment uniquely characterized by:
  - The reference to the MachineDeployment class, which defines the templates to be used for this specific MachineDeployment.
  - The number of replicas designated for this MachineDeployment, along with other parameters such as the node deployment strategy, machineHealthCheck, and nodeDrainTimeout values.
- Specification of the intended Kubernetes version for the cluster, encompassing both the control plane and worker nodes.
The cluster topology and MachineDeployments can also be customized using a set of variables through patches, as defined in the ClusterClass CRD.