NSX SD-WAN
Agenda
The first three sections cover VMware virtualization with vSphere 6.7 (pre NSX-T).
The remaining sections cover:
NSX-T allows IT and development teams to choose the technologies best suited for their
particular applications. NSX-T is also designed for management, operations, and
consumption by development organizations in addition to IT.
NSX-T Anywhere Architecture
Characteristics of NSX-T architecture include:
● Policy and Consistency
● Networking and Connectivity
● Security and Services
● Visibility
Adapter layer: NCP (the NSX Container Plug-in) is built in a modular manner so that individual adapters can
be added for a variety of CaaS and PaaS platforms.
NSX Infrastructure layer: Implements the logic that creates topologies, attaches
logical ports, etc.
When using the Native Cloud Enforced mode, NSX Policies are translated to Native
Cloud Constructs such as Security Groups (in AWS) or combination of Network
Security Group/Application Security Groups (in Azure).
In NSX Enforced Mode (which was the only mode available in NSX-T 2.4
and prior), the NSX policies are enforced using NSX Tools, which are
deployed in each cloud instance.
The three planes are implemented as sets of processes, modules, and agents
residing on three types of nodes: manager, controller, and transport.
NSX-T Architecture and Components
Management Plane
Serves as a unique entry point for user configuration via RESTful API (CMP, automation) or the NSX-T user interface.
Responsible for storing the desired configuration in its database. The NSX-T Manager
stores the final configuration requested by the user for the system. This configuration
will be pushed by the NSX-T Manager to the control plane to become a realized
configuration (i.e., a configuration effective in the data plane).
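As an illustration of this single entry point, the short sketch below reads the desired-state configuration tree from the NSX-T Manager over its RESTful API; the manager address and credentials are placeholder lab values.

import requests

NSX_MANAGER = "https://nsx-mgr.corp.local"   # placeholder lab address
AUTH = ("admin", "VMware1!VMware1!")         # placeholder credentials

# Read the declarative (Policy) configuration tree stored by the NSX-T Manager.
resp = requests.get(f"{NSX_MANAGER}/policy/api/v1/infra", auth=AUTH, verify=False)
resp.raise_for_status()
print(resp.json().get("resource_type"))      # root of the stored desired state

This same stored desired state is what the Manager later pushes to the control plane to be realized in the data plane.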
Control Plane
NSX-T splits the control plane into two parts:
● Central Control Plane (CCP) – The CCP is implemented as a
cluster of virtual machines called CCP nodes. The cluster form factor
provides both redundancy and scalability of resources. The CCP is
logically separated from all data plane traffic, meaning any failure in
the control plane does not affect existing data plane operations. User
traffic does not pass through the CCP Cluster.
● Local Control Plane (LCP) – The LCP runs on transport nodes. It is
adjacent to the data plane it controls and is connected to the CCP. The
LCP is responsible for programming the forwarding entries and firewall
rules of the data plane.
NSX Manager Appliance
Instances of the NSX Manager and NSX Controller are bundled in a virtual
machine called the NSX Manager Appliance.
Starting with NSX-T 2.4, the NSX Manager, NSX Policy Manager, and NSX Controller
elements co-exist in a common VM. Three unique NSX appliance
VMs are required for cluster availability.
The set of objects that the control plane deals with include VIFs, logical
networks, logical ports, logical routers, IP addresses, and so on.
Data Plane
There are two main types of transport nodes in NSX-T:
Hypervisor Transport Nodes: Hypervisor transport nodes are
hypervisors prepared and configured for NSX-T. The N-VDS provides
network services to the virtual machines running on those hypervisors.
NSX-T currently supports VMware ESXi™ and KVM hypervisors. The
N-VDS implemented for KVM is based on Open vSwitch (OVS) and
is platform independent.
Edge Nodes: VMware NSX-T Edge™ nodes are service appliances
dedicated to running centralized network services that cannot be distributed
to the hypervisors. They can be instantiated as a bare metal appliance or in
virtual machine form factor. They are grouped in one or several clusters,
representing a pool of capacity.
The data plane performs stateless forwarding/transformation of packets based on
tables populated by the control plane, reports topology information to the control
plane, and maintains packet-level statistics.
When to use the Imperative (Manager) API vs. the Declarative (Policy) API:
● Imperative Manager API – CMP integration: plugins will continue to use the Imperative Manager APIs for now and will be updated to use the Declarative APIs in the future. Container: flexible (plugin vendor specific). OpenStack: flexible.
● Declarative Policy API – All deployments, except the CMP use case (until plugins are updated with the Declarative API). Container: flexible (plugin vendor specific). OpenStack: flexible.
Upgrade scenario
● Existing configuration gets ported to the Advanced UI and is only available from there (not from the Simplified UI): DFW settings such as session timer configuration and bridging.
● New features from NSX 2.4 onward: DNS services/zones, VPN, Endpoint Protection (EPP), Network Introspection (E-W service insertion), context profile creation (L7 App or FQDN), and the new DFW/Gateway FW layout with different categories and auto-plumbed rules.
NSX-T Logical Object Naming Changes
With the declarative API/data model, the names of some networking and security logical objects
have changed to build a unified object model. The table below provides the before and after
naming side by side for those NSX-T logical objects. This only changes the name of the
given NSX-T object; conceptually and functionally it is the same as before.
Advanced API/UI Object → Declarative API Object
Networking:
Logical Switch → Segment
T0 Logical Router → Tier-0 Gateway
T1 Logical Router → Tier-1 Gateway
Centralized Service Port → Service Interface
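As a minimal sketch of the declarative model using the new object names, the example below creates a Segment (formerly a Logical Switch) through the Policy API; the transport zone path, subnet, and names are illustrative assumptions.

import requests

NSX_MANAGER = "https://nsx-mgr.corp.local"
AUTH = ("admin", "VMware1!VMware1!")

segment = {
    "display_name": "web-segment",
    # Overlay transport zone, expressed as a Policy path (assumed lab IDs)
    "transport_zone_path": "/infra/sites/default/enforcement-points/default/transport-zones/TZ-Overlay",
    "subnets": [{"gateway_address": "172.16.10.1/24"}],
}

# PATCH expresses the desired state of the object; NSX-T then realizes it.
resp = requests.patch(f"{NSX_MANAGER}/policy/api/v1/infra/segments/web-segment",
                      auth=AUTH, json=segment, verify=False)
print(resp.status_code)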
A transport zone controls which hosts a logical switch can reach. It can
span one or more host clusters. Transport zones dictate which hosts
and, therefore, which VMs can participate in the use of a particular
network.
If VMs are attached to switches that are in different transport zones, the
VMs cannot communicate with each other. Transport zones do not
replace Layer 2/Layer 3 reachability requirements, but they place a
limit on reachability.
A node can serve as a transport node if it contains at least one
hostswitch. When creating a host transport node and adding it to a
transport zone, NSX-T installs a hostswitch on the host. The hostswitch
is used for attaching VMs to NSX-T logical switch segments and for
creating NSX-T gateway router uplinks and downlinks.
Control Plane
Computes runtime state based on configuration from the management plane.
Control plane disseminates topology information reported by the data plane
elements and pushes stateless configuration to forwarding engines.
Data Plane
Performs stateless forwarding or transformation of packets based on tables
populated by the control plane. Data plane reports topology information to the
control plane and maintains packet level statistics.
NSX Manager
Management function that exists as a component of the NSX Manager Cluster.
In prior versions of NSX, the NSX Manager was a dedicated virtual appliance.
As of NSX-T 2.4, the NSX Manager function and Controller Cluster functions are
consolidated into a single cluster called the NSX Manager Cluster.
Transport Zone
Collection of transport nodes that defines the maximum span of logical
switches. A transport zone represents a set of similarly provisioned
hypervisors, and the logical switches that connect VMs on those hypervisors.
Profile
Represents a specific configuration that can be associated with an NSX Edge
cluster. For example, the fabric profile might contain the tunneling properties
for dead peer detection.
Gateway Router
NSX-T routing entity that provides distributed East-West routing. A
gateway router also links a Tier-1 router with a Tier-0 router.
The Tier-1 gateway router is the second-tier router that connects to one Tier-0
gateway router for northbound connectivity, and one or more overlay
networks for southbound connectivity. A Tier-1 gateway router can also be
configured in an active-standby cluster of services when the router is
configured to provide stateful services.
Segment / Logical Switch
Logical network implemented using Layer 2-in-Layer 3 tunneling such that the
topology seen by VMs is decoupled from that of the physical network.
In contrast, the Edge Transport Node in NSX-T contains its own TEP within the
Edge, and no longer requires the hypervisor to perform encapsulation and
decapsulation functions on its behalf. When an encapsulated packet is
destined for an Edge, it is delivered in its encapsulated form directly to the
Edge Node via its TEP address. This allows for greater portability of the Edge
Node, since it no longer has dependencies on underlying kernel services of
the host.
NSX-T Logical Switching
The N-VDS
The primary component involved in the data plane of the transport nodes is
the N-VDS. The N-VDS forwards traffic between components running on the
transport node (e.g., between virtual machines) or between internal components
and the physical network. In the latter case, the N-VDS must own one or more
physical interfaces (pNICs) on the transport node.
The N-VDS is mandatory with NSX-T for both overlay and VLAN
backed networking. On ESXi hypervisors, the N-VDS implementation is
derived from VMware vSphere® Distributed Switch™ (VDS).
In NSX-T, virtual layer 2 domains are called segments. There are two
kinds of segments:
VLAN backed segments
Overlay backed segments
VLAN backed segment
A VLAN backed segment is a layer 2 broadcast domain implemented as a traditional
VLAN in the physical infrastructure. That means that traffic between two VMs on two
different hosts but attached to the same VLAN backed segment will be carried over a
VLAN between the two hosts.
On the other hand, two VMs on different hosts and attached to the
same overlay backed segment will have their layer 2 traffic carried by
tunnel between their hosts.
● Failover Order – An active uplink is specified along with an optional list of standby
uplinks. Should the active uplink fail, the next available uplink in the standby list takes its
place immediately.
Uplink Profiles are assigned to Transport Nodes in the NSX-T environment, and define
the configuration of the physical NICs that will be used.
Name: nsx-default-uplink-hostswitch-profile
Description: [blank]
Transport VLAN: 0
MTU: Using global MTU
Teaming Policy: FAILOVER_ORDER
Active Uplinks: uplink-1
Standby Uplinks: uplink-2
This profile states that two uplinks will be configured in a failover configuration.
Traffic will normally utilize uplink-1, and will traverse uplink-2 in the event of a
failure of uplink-1.
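A rough equivalent of this failover uplink profile, expressed as the JSON body one might send to the Manager API, is sketched below; the field names are assumed from the UplinkHostSwitchProfile schema and should be verified against the API guide for the NSX-T version in use.

import requests

uplink_profile = {
    "resource_type": "UplinkHostSwitchProfile",
    "display_name": "uplink-profile-failover",
    "transport_vlan": 0,
    "teaming": {
        "policy": "FAILOVER_ORDER",
        "active_list": [{"uplink_name": "uplink-1", "uplink_type": "PNIC"}],
        "standby_list": [{"uplink_name": "uplink-2", "uplink_type": "PNIC"}],
    },
    # MTU is omitted so the global default applies, matching "Using global MTU" above.
}

resp = requests.post("https://nsx-mgr.corp.local/api/v1/host-switch-profiles",
                     auth=("admin", "VMware1!VMware1!"), json=uplink_profile, verify=False)
print(resp.json().get("id"))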
There are two types of Transport Zone in NSX-T, Overlay and VLAN:
• Overlay transport zones are used for NSX-T Logical Switch segments. Network
segments created in an Overlay transport zone will utilize TEPs and Geneve
encapsulation, as explored in Module 2: Logical Switching.
• VLAN transport zones are used for traditional VLAN-backed segments. Network
segments created in a VLAN transport zone function similarly to a VLAN port group
in vSphere.
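The lab below inspects transport zones through the UI; the sketch that follows shows roughly how the same two kinds of transport zone could be created through the Manager API (the host switch name and manager address are the assumed lab values).

import requests

session = requests.Session()
session.auth = ("admin", "VMware1!VMware1!")
session.verify = False
BASE = "https://nsx-mgr.corp.local/api/v1"

for name, traffic_type in [("TZ-Overlay", "OVERLAY"), ("TZ-VLAN", "VLAN")]:
    body = {
        "display_name": name,
        "host_switch_name": "N-VDS-1",
        "transport_type": traffic_type,  # OVERLAY segments use TEPs + Geneve; VLAN segments do not
    }
    resp = session.post(f"{BASE}/transport-zones", json=body)
    print(name, resp.status_code)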
1. On the left side of the NSX-T user interface, click Transport Zones
2. Click TZ-Overlay
◦ NOTE: Click on the name of the Transport Zone, not the checkbox to its left
Verify Overlay Transport Zone Configuration
Observe the following configuration of the TZ-Overlay Transport Zone:
Name: TZ-Overlay
Description: [blank]
Traffic Type: Overlay
N-VDS Name: N-VDS-1
Host Membership Criteria: Standard
Uplink Teaming Policy Names: [blank]
Logical Ports: 8
Logical Switches: 3
This information is useful for seeing where a given Transport Zone is being used.
Revisiting Host Transport Node Configuration
Now it's time to review how uplink profiles and transport zones are combined to
configure Host Transport Nodes in NSX-T. There are two ways that this can be done:
• N-VDS-1
N-VDS Name: N-VDS-1
Associated Transport Zones: TZ-Overlay
IP Pool: TEP-ESXi-Pool
Physical NICs: vmnic1 to uplink-1
This profile states that a single Transport Zone, TZ-Overlay, will be associated with hosts
in this profile. Their connectivity to the physical network will use the nsx-default-
uplink-hostswitch-profile. Finally, when a TEP is provisioned on each host, it will
assign an IP address from the TEP-ESXi-Pool range of IP addresses.
1. Click CANCEL to return to the list of Transport Node Profiles
Verify Host Transport Nodes
1. On the left side of the NSX-T user interface, click Nodes
2. Click the dropdown next to Managed by
3. Click vCenter from the list of available options
4. If the list of hosts in the RegionA01-COMP01 (2) cluster are not already visible,
click the arrow to the left of the cluster name to display them
A single, standalone KVM host has been provisioned as part of this lab and has been
configured to participate in the NSX fabric.
1. Click the checkbox to the left of kvm-01a.corp.local to select it
2. Click Edit to review the configuration
Observe the following details in the Host Details tab of the Edit Transport Node dialog:
• Name: kvm-01a.corp.local
• Description: [blank]
• IP Addresses: 192.168.110.61
1. Click NEXT to view the Configure NSX settings
Observe the following details in the Configure NSX tab of the Edit Transport Node
dialog:
This profile states that a single Transport Zone, TZ-Overlay, will be associated with
the KVM Transport Node host. Its connectivity to the physical network will use the
nsx-default-uplink-hostswitch-profile. Finally, when a TEP is provisioned on this
host, it will assign an IP address from the TEP-KVM-Pool range of IP addresses.
In this lab, there are four total Edge Transport Nodes, configured in two fault-
tolerant clusters of two nodes each.
1. Click Edge Transport Nodes
2. Click the checkbox to the left of nsx-edge-01 to select it
3. Click Edit to review the configuration
Observe the following details in the General tab of the Edit Edge Transport Node
Profile dialog:
Name: nsx-edge-01
Description: [blank]
Transport Zones (Selected): TZ-Overlay (Overlay), TZ-VLAN (VLAN)
• N-VDS-1
◦N-VDS Name: N-VDS-1
◦Associated Transport Zones: TZ-Overlay
◦Uplink Profile: nsx-edge-single-nic-uplink-profile-large-mtu
◦IP Assignment: Use IP Pool
◦IP Pool: TEP-ESXi-Pool
◦Physical NICs: Uplink-1 to edge-uplink-A
• N-VDS-2
◦ N-VDS Name: N-VDS-2
◦ Associated Transport Zones: TZ-VLAN
◦ Uplink Profile: nsx-edge-single-nic-uplink-profile-large-mtu
◦ IP Assignment: [Disabled] Use DHCP
◦ Physical NICs: Uplink-1 to edge-uplink-B
This profile states that this Edge Node will host two Transport Zones, TZ-Overlay and TZ-
VLAN. One transport zone will be used for route peering with the physical network (TZ-
VLAN), while the other transport zone will be used for overlay network services.
Their connectivity to the physical network will use the nsx-edge-single-nic-uplink-
profile-large-mtu. Finally, when a TEP is provisioned on the TZ-Overlay transport zone, it
will assign an IP address from the TEP-ESXi-Pool range of IP addresses. No TEP will be
provisioned on the VLAN transport zone, so the option is disabled.
1. Click the PuTTY icon in the taskbar. This will launch the PuTTY terminal client
1. Enter virsh list to view the virtual machine workloads currently running on this KVM host
and confirm that VM web-03a is running
virsh list
1. Enter ifconfig nsx-vtep0.0 into the command-line on kvm-01a to see that the TEP
interface has been created with an IP address of 192.168.130.61 and an MTU of 1600.
ifconfig nsx-vtep0.0
Uplink Profile Lab
The uplink profile is a template that defines how an N-VDS connects to the physical
network. It specifies:
●The format of the uplinks of an N-VDS
●The default teaming policy applied to those uplinks
●The transport VLAN used for overlay traffic
●The MTU of the uplinks
●The Network IO Control profile
Transport Node Creation with Uplink Profile
Leveraging Different Uplink Profiles
Network I/O Control
Network I/O Control, or NIOC, is the implementation in NSX-T of vSphere’s
Network I/O Control v3.
Shares: Shares, from 1 to 100, reflect the relative priority of a system traffic type
against the other system traffic types that are active on the same physical adapter.
Limit: The maximum bandwidth that a system traffic type can consume on a single
physical adapter.
The pre-determined types of ESXi infrastructure traffic are the standard vSphere NIOC v3 system traffic types (such as management, vMotion, vSAN, NFS, iSCSI, Fault Tolerance, vSphere Replication, backup, and virtual machine traffic).
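To make shares and limits concrete, the small illustrative calculation below (values invented for the example) shows how shares translate into a bandwidth split on a fully contended 10 Gbps adapter.

# Shares only matter relative to the other traffic types active on the same pNIC.
pnic_bandwidth_mbps = 10_000                                # a 10 Gbps physical adapter
active_shares = {"virtual machine": 100, "vMotion": 50, "management": 50}

total_shares = sum(active_shares.values())
for traffic_type, shares in active_shares.items():
    guaranteed = pnic_bandwidth_mbps * shares / total_shares
    print(f"{traffic_type}: about {guaranteed:.0f} Mbps when the adapter is fully contended")

# A limit would additionally cap a traffic type at a fixed value regardless of its shares.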
Note:
The benefit of this NSX-T overlay model is that it allows direct connectivity between
transport nodes irrespective of the specific underlay inter-rack connectivity (i.e., L2 or
L3). Segments can also be created dynamically without any configuration of the physical
network infrastructure.
Flooded Traffic
The NSX-T segment behaves like a LAN, providing the capability of flooding traffic to all
the devices attached to this segment; this is a cornerstone capability of layer 2.
NSX-T does not differentiate between the different kinds of frames replicated to multiple
destinations. Broadcast, unknown unicast, or multicast traffic will be flooded in a similar
fashion across a segment.
In the overlay model, the replication of a frame to be flooded on a segment is orchestrated
by the different NSX-T components. NSX-T provides two different methods for flooding
traffic: head-end replication mode and two-tier hierarchical mode.
In the head end replication mode, the transport node at the origin of the
frame to be flooded sends a copy to each other transport node that is
connected to this segment.
Note : The default two-tier hierarchical flooding mode is recommended as a
best practice as it typically performs better in terms of physical uplink
bandwidth utilization.
Head-end Replication Mode
Two-tier Hierarchical Mode
In the two-tier hierarchical mode, transport nodes are grouped according to the subnet of
the IP address of their TEP.
Transport nodes in the same rack typically share the same subnet for their TEP IPs,
though this is not mandatory.
Based on this assumption, Figure shows hypervisor transport nodes classified in three
groups: subnet 10.0.0.0, subnet 20.0.0.0 and subnet 30.0.0.0.
Two-tier Hierarchical Mode
Tables Maintained by the NSX-T Controller
●Global MAC address to TEP table
●Global ARP table, associating MAC addresses to IP addresses
VXLAN has static fields while Geneve offers flexible fields, which can be used to adjust to the needs of a typical workload and the overlay fabric. Because NSX-T tunnels are only set up between NSX-T transport nodes, NSX-T only needs efficient support for the Geneve encapsulation by the NIC hardware; most NIC vendors support the same hardware offload for Geneve as they would for VXLAN.
Network virtualization is all about developing a model of deployment that is applicable to a variety of physical networks and a diversity of compute domains. New networking features are developed in software and implemented without worrying about support on the physical infrastructure. The data plane learning section described how NSX-T relies on metadata inserted in the tunnel header to identify the source TEP of a frame.
The benefit of Geneve over VXLAN is that it allows any vendor to add its own metadata in the
tunnel header with a simple Type-Length-Value (TLV) model. NSX-T defines a single TLV
carrying its own metadata fields.
The distributed router (DR) runs as a kernel module and is distributed in hypervisors across all transport nodes, including
Edge nodes.
The traditional data plane functionality of routing and ARP lookups is performed by the logical
interfaces connecting to the different segments.
Each LIF has a vMAC address and an IP address representing the default IP gateway for its logical
L2 segment.
The IP address is unique per LIF and remains the same anywhere the segment/logical switch exists.
The vMAC associated with each LIF remains constant in each hypervisor, allowing the default
gateway and MAC to remain the same during vMotion.
E-W Routing with Workload on the same Hypervisor
Packet Flow between two VMs on same Hypervisor
1. “Web1” (172.16.10.11) sends a packet to “App1” (172.16.20.11). The packet is sent to
the default gateway interface (172.16.10.1) for “Web1” located on the local DR.
2. The DR on “HV1” performs a routing lookup which determines that the destination
subnet 172.16.20.0/24 is a directly connected subnet on “LIF2”. A lookup is performed
in the “LIF2” ARP table to determine the MAC address associated with the IP address
for “App1”. If the ARP entry does not exist, the controller is queried. If there is no
response from controller, an ARP request is flooded to learn the MAC address of
“App1”.
3. Once the MAC address of “App1” is learned, the L2 lookup is performed in the local
MAC table to determine how to reach “App1” and the packet is sent.
4. The return packet from “App1” follows the same process and routing would happen
again on the local DR.
E-W Packet Flow between two Hypervisors
Routing decisions are taken by the DR on “HV1” and the DR on “HV2”. When “Web1” sends
traffic to “App2”, routing is done by the DR on “HV1”. The reverse traffic from “App2” to
“Web1” is routed by the DR on “HV2”.
Services Router
East-West routing is completely distributed in the hypervisor, with each hypervisor in the
transport zone running a DR in its kernel. However, some services of NSX-T are not
distributed, due to their locality or stateful nature, including:
●Physical infrastructure connectivity
●NAT
●DHCP server
●Load Balancer
●VPN
●Gateway Firewall
●Bridging
●Service Interface
●Metadata Proxy for OpenStack
The appliances where the centralized services or SR instances are hosted are called Edge
nodes. An Edge node is the appliance that provides connectivity to the physical
infrastructure.
Logical Router Components and Interconnection
A Tier-0 Gateway can have following interfaces:
● External Interface – Interface connecting to the physical infrastructure/router. Static routing
and BGP are supported on this interface. This interface was referred to as uplink interface in
previous releases.
● Service Interface: Interface connecting VLAN segments to provide connectivity to VLAN
backed physical or virtual workloads. A service interface can also be connected to overlay
segments for Tier-1 standalone load balancer use cases, as explained in the load balancer section.
● Intra-Tier Transit Link – Internal link between the DR and SR. A transit overlay segment is
auto plumbed between DR and SR and each end gets an IP address assigned in 169.254.0.0/25
subnet by default. This address range is configurable and can be changed if it is used somewhere
else in the network.
● Linked Segments – Interface connecting to an overlay segment. This interface was referred
to as downlink interface in previous releases.
North-South Routing by SR Hosted on Edge Node
Routing Packet Flow
End-to-end Packet Flow – Application “Web1” to External
“Web1” (172.16.10.11) sends a packet to 192.168.100.10. The packet is sent to the “Web1”
default gateway interface (172.16.10.1) located on the local DR.
The packet is received on the local DR. DR doesn’t have a specific connected route for
192.168.100.0/24 prefix. The DR has a default route with the next hop as its corresponding SR,
which is hosted on the Edge node.
The “ESXi” TEP encapsulates the original packet and sends it to the Edge node TEP with a
source IP address of 10.10.10.10 and destination IP address of 30.30.30.30.
The Edge node is also a transport node. It will encapsulate/decapsulate the traffic sent to or
received from compute hypervisors. The Edge node TEP decapsulates the packet, removing
the outer header prior to sending it to the SR.
The SR performs a routing lookup and determines that the route 192.168.100.0/24 is learned
via external interface with a next hop IP address 192.168.240.1.
The packet is sent on the VLAN segment to the physical router and is finally delivered to
192.168.100.10.
End-to-end Packet Flow – External to Application “Web1”
An external device (192.168.100.10) sends a packet to “Web1” (172.16.10.11). The packet
is routed by the physical router and sent to the external interface of Tier-0 Gateway hosted
on Edge node.
A single routing lookup happens on the Tier-0 Gateway SR which determines that
172.16.10.0/24 is a directly connected subnet on “LIF1”. A lookup is performed in the
“LIF1” ARP table to determine the MAC address associated with the IP address for “Web1”.
This destination MAC “MAC1” is learned via the remote TEP (10.10.10.10), which is the
“ESXi” host where “Web1” is located.
The Edge TEP encapsulates the original packet and sends it to the remote TEP with an outer
packet source IP address of 30.30.30.30 and destination IP address of 10.10.10.10. The
destination VNI in this Geneve encapsulated packet is of “Web LS”.
The “ESXi” host decapsulates the packet and removes the outer header upon receiving the
packet. An L2 lookup is performed in the local MAC table associated with “LIF1”.
The packet is delivered to Web1.
Two-Tier Routing
The concept of multi-tenancy is built into the routing model. The top-tier
gateway is referred to as Tier-0 gateway while the bottom-tier gateway is Tier-
1 gateway. This structure gives both provider and tenant administrators
complete control over their services and policies. The provider
administrator controls and configures Tier-0 routing and services, while
the tenant administrators control and configure Tier-1.
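A minimal sketch of this provider/tenant split through the Policy API is shown below: a tenant Tier-1 gateway is linked to the provider Tier-0 gateway and advertises its connected and NAT routes northbound. The gateway IDs are illustrative, and the route advertisement enum names are assumed from the Policy Tier-1 schema.

import requests

tier1 = {
    "display_name": "tenant1-t1",
    "tier0_path": "/infra/tier-0s/provider-t0",   # northbound link to the provider Tier-0
    "route_advertisement_types": [
        "TIER1_CONNECTED",   # advertise connected segment subnets
        "TIER1_NAT",         # advertise NAT IPs configured on this Tier-1
    ],
}

resp = requests.patch("https://nsx-mgr.corp.local/policy/api/v1/infra/tier-1s/tenant1-t1",
                      auth=("admin", "VMware1!VMware1!"), json=tier1, verify=False)
print(resp.status_code)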
Tier-0 Gateway
○ NAT IP – NAT IP addresses owned by the Tier-0 gateway, discovered from NAT rules
configured on the Tier-0 gateway.
○ IPSec Local IP – Local IPSEC endpoint IP address for establishing VPN sessions.
○ DNS Forwarder IP – Listener IP for DNS queries from clients and also used as
source IP used to forward DNS queries to upstream DNS server.
Tier-1 Gateway
○ Connected – Connected routes on Tier-1 include segment subnets connected to Tier-1 and
service interface subnets configured on Tier-1 gateway. 172.16.10.0/24 (Connected segment)
and 192.168.10.0/24 (Service Interface) are connected routes for Tier-1 gateway in Figure
○ NAT IP – NAT IP addresses owned by the Tier-1 gateway discovered from NAT
rules configured on the Tier-1 gateway.
○ LB SNAT – IP address or a range of IP addresses used for Source NAT by load balancer.
○ IPSec Local IP – Local IPSEC endpoint IP address for establishing VPN sessions.
○ DNS Forwarder IP – Listener IP for DNS queries from clients and also used as source IP
used to forward DNS queries to upstream DNS server.
Route Advertisement on the Tier-1 and Tier-0 Logical Router
Logical Routing Lab
Task:
The per-transport-node view shows that the distributed components (DR)
for the Tier-0 and Tier-1 gateways have been instantiated on both
hypervisors.
Logical Routing Instances
Multi-Tier Distributed Routing with Workloads on the same Hypervisor
5. The “HV1” TEP encapsulates the packet and sends it to the “HV2”
TEP.
In addition to static routing and BGP, the Tier-0 gateway also supports an
automatically created inter-SR iBGP session between its service router
components.
Tier-1 gateways support static routes but do not support any dynamic
routing protocols.
Dynamic Routing
BGP is the de facto protocol on the WAN and in most modern data
centers. A typical leaf-spine topology has eBGP running between leaf
switches and spine switches.
Tier-0 gateways support eBGP and iBGP on the external interfaces with
physical routers. BFD can also be enabled per BGP neighbor for faster
failover. BFD timers depend on the Edge node type.
Bare metal Edge supports a minimum of 300ms TX/RX BFD keep alive
timer while the VM form factor Edge supports a minimum of 1000ms
TX/RX BFD keep alive timer.
With NSX-T 2.5 release, the following BGP features are supported:
● Two and four bytes AS numbers in asplain, asdot and asdot+ format.
● eBGP multi-hop support, allowing eBGP peering to be established on
loopback interfaces.
● iBGP
● eBGP multi-hop BFD
● ECMP support with BGP neighbors in same or different AS numbers.
● BGP Allow AS in
● BGP route aggregation support with the flexibility of advertising a
summary route only to the BGP peer or advertise the summary route along
with specific routes. A more specific route must be present in the routing
table to advertise a summary route.
● Route redistribution in BGP to advertise Tier-0 and Tier-1 Gateway
internal routes .
● Inbound/outbound route filtering with BGP peer using prefix-lists or
route-maps.
● Influencing BGP path selection by setting Weight, Local preference, AS
Path Prepend, or MED.
● Standard, Extended and Large BGP community support.
● BGP well-known community names (e.g., no-advertise, no-export, no-
export-subconfed) can also be included in the BGP route updates to the
BGP peer.
● BGP communities can be set in a route-map to facilitate matching of
communities at the upstream router.
● Graceful restart (Full and Helper mode) in BGP.
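As a rough sketch of how BGP might be enabled on a Tier-0 gateway through the Policy API, the example below turns on the BGP process and adds one eBGP neighbor; all IDs, AS numbers, and addresses are illustrative, and the field names are assumed from the BgpRoutingConfig/BgpNeighborConfig schemas.

import requests

AUTH = ("admin", "VMware1!VMware1!")
BASE = ("https://nsx-mgr.corp.local/policy/api/v1/infra"
        "/tier-0s/provider-t0/locale-services/default")

# Enable the BGP process on the Tier-0 SR (assumed field names).
requests.patch(f"{BASE}/bgp", auth=AUTH, verify=False,
               json={"enabled": True, "local_as_num": "65001", "ecmp": True})

# Peer with the physical router reachable over the external interface.
neighbor = {"neighbor_address": "192.168.240.1", "remote_as_num": "65000"}
requests.patch(f"{BASE}/bgp/neighbors/physical-router-1",
               auth=AUTH, verify=False, json=neighbor)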
Services High Availability
These physical routers may or may not have the same routing
information. For instance, a route 192.168.100.0/24 may only be
available on physical router 1 and not on physical router 2.
For such asymmetric topologies, users can enable Inter-SR routing.
This feature is only available on Tier-0 gateway configured in
active/active high availability mode.
Services like NAT are in a constant state of sync between the active and
standby SRs on the Edge nodes. This mode is supported on both
Tier-1 and Tier-0 SRs. Preemptive and non-preemptive modes are
available for both Tier-0 and Tier-1 SRs.
Default mode for gateways configured in active/standby high
availability configuration is non-preemptive. A user needs to select
the preferred member (Edge node) when a gateway is configured in
active/standby preemptive mode.
Both of the Tier-0 SRs (active and standby) receive routing updates
from physical routers and advertise routes to the physical routers;
however, the standby Tier-0 SR prepends its local AS three times in the
BGP updates so that traffic from the physical routers prefer the active
Tier-0 SR.
Southbound IP addresses on active and standby Tier-0 SRs are the
same and the operational state of standby SR southbound interface is
down. Since the operational state of southbound Tier-0 SR interface
is down, the Tier-0 DR does not send any traffic to the standby SR.
Active and Standby Routing Control with eBGP
High Availability (HA)
Edge Node
Edge nodes are service appliances with pools of capacity, dedicated to
running network and security services that cannot be distributed to the
hypervisors.
Edge nodes are available in two form factors – VM and bare metal. Both
leverage the data plane development kit (DPDK) for faster packet
processing and high performance.
Notice that a single N-VDS is used in this topology that carries both
overlay and external traffic.
For instance, if pNIC P1 fails, TEP IP1 along with its MAC address will
be migrated to use Uplink2 that’s mapped to pNIC P2. In case of pNIC
P1 failure, pNIC P2 will carry the traffic for both TEP IP1 and TEP IP2.
VM Edge Node
The NSX-T Edge in VM form factor can be installed using an OVA, OVF,
or ISO file. The NSX-T Edge VM is only supported on ESXi hosts.
This default teaming policy can be overridden for VLAN segments only
using “named teaming policy”.
Each N-VDS instance can have a unique teaming policy, allowing for
flexible design choices.
Edge Cluster
Scale out from the logical networks to the Edge nodes is achieved using
ECMP. NSX-T 2.3 introduced the support for heterogeneous Edge nodes
which facilitates easy migration from Edge node VM to bare metal Edge
node without reconfiguring the logical routers on bare metal Edge nodes.
Edge VMs support BFD with minimum BFD timer of one second with
three retries, providing a three second failure detection time.
Bare metal Edges support BFD with minimum BFD TX/RX timer of
300ms with three retries which implies 900ms failure detection time.
Network Services
The distribution of the firewall for the application of security policy to protect
individual workloads is highly efficient; rules can be applied that are specific to
the requirements of each workload.
The NSX-T DFW architecture management plane, control plane, and data
plane work together to enable a centralized policy configuration model
with distributed firewalling.
This section will examine the role of each plane and its associated
components, detailing how they interact with each other to provide a
scalable, topology agnostic distributed firewall solution.
NSX-T DFW Architecture and Components
Management Plane
Each of the transport nodes, at any given time, connects to only one of the CCP managers,
based on mastership for that node.
On each of the transport nodes, once local control plane (LCP) has received policy
configuration from CCP, it pushes the firewall policy and rules to the data plane filters (in
kernel) for each of the virtual NICs.
With the “Applied To” field in the rule or section which defines scope of enforcement, the
LCP makes sure only relevant DFW rules are programmed on relevant virtual NICs instead
of every rule everywhere, which would be a suboptimal use of hypervisor resources.
NSX-T Data Plane Implementation - ESXi vs. KVM Hosts
Management and control plane components are identical for both ESXi and
KVM hosts. For the data plane, they use a different implementation for
packet handling.
NSX-T uses N-VDS on ESXi hosts, which is derived from vCenter VDS, along
with the VMware Internetworking Service Insertion Platform
(vSIP) kernel module for firewalling.
For KVM, the N-VDS leverages Open vSwitch (OVS) and its utilities.
ESXi Hosts- Data Plane Components
NSX-T uses the N-VDS on ESXi hosts for connecting virtual workloads,
managing it with the NSX-T Manager application.
The NSX-T DFW kernel space implementation for ESXi is the same as the
implementation of NSX for vSphere – it uses the VMware Internetworking
Service Insertion Platform (vSIP) kernel module and kernel IO chains filters.
For KVM, there is an additional component called the NSX agent in addition to LCP, with
both running as user space agents. When LCP receives DFW policy from the CCP, it sends
it to NSX-agent.
NSX-agent will process and convert policy messages received to a format appropriate for
the OVS data path. Then NSX agent programs the policy rules onto the OVS data path
using OpenFlow messages.
For stateful DFW rules, NSX-T uses the Linux conntrack utilities to keep track of the state
of permitted flow connections allowed by a stateful firewall rule. For DFW policy rule
logging, NSX-T uses the ovs-fwd module.
NSX-T DFW Data Plane Components on KVM
NSX-T DFW Policy Lookup and Packet Flow
In the data path, the DFW maintains two tables: a rule table and a
connection tracker table. The LCP populates the rule table with the
configured policy rules, while the connection tracker table is updated
dynamically to cache flows permitted by rule table.
NSX-T DFW can allow for a policy to be stateful or stateless with section-
level granularity in the DFW rule table.
The connection tracker table is populated only for stateful policy rules; it
contains no information on stateless policies. This applies to both ESXi
and KVM environments.
NSX-T DFW rules are enforced as follows:
Each packet is checked against the top rule in the rule table before
moving down the subsequent rules in the table.
The first rule in the table that matches the traffic parameters is
enforced. The search is then terminated, so no subsequent rules will
be examined or enforced.
1. A TCP SYN packet hits the DFW on the vNIC, and a flow table lookup is performed
first to check for a state match against an existing flow. Since this is the first
packet of a new session, the lookup results in no flow state being found.
2. Because of the flow table miss, the DFW performs a rule table lookup in top-down
order for a 5-tuple match.
3. The flow matches firewall rule 2, which is an allow rule, so the packet is sent out
to the destination.
4. In addition, the flow table is updated with the new flow state for the permitted
flow as "Flow 2".
5. Subsequent packets in this TCP session are checked against this entry in the flow
table for a state match. Once the session terminates, the flow information is
removed from the flow table.
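The simplified model below (plain Python, not NSX-T code) restates this two-table behavior: the connection tracker is consulted first, and only a miss triggers the top-down, first-match walk of the rule table.

def dfw_lookup(packet, flow_table, rule_table):
    """Return the action applied to a packet using the flow table + rule table model."""
    five_tuple = (packet["src_ip"], packet["dst_ip"],
                  packet["src_port"], packet["dst_port"], packet["proto"])

    # 1. Flow table lookup: packets of an already permitted stateful flow match here.
    if five_tuple in flow_table:
        return "allow"

    # 2. Flow table miss: walk the rule table top-down and enforce the first match.
    for rule in rule_table:
        if rule["match"](five_tuple):
            if rule["action"] == "allow" and rule["stateful"]:
                flow_table[five_tuple] = rule["id"]   # cache the permitted flow
            return rule["action"]

    # 3. No explicit match: the default rule applies (allow by default, per the text above).
    return "allow"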
NSX-T Security Policy - Plan, Design and Implement
Define Security Policy – Using the firewall rule table, define the security
policy. Have categories and policies to separate and identify emergency,
infrastructure, environment, and application-specific policy rules based on
the rule model.
NSX-T Groups & DFW Rule
NSX-T provides a collection of referenceable objects represented in a construct called
Groups. The selection of a specific policy methodology approach – application,
infrastructure, or network – will help dictate how the grouping construct is used.
A Group is a logical construct that allows grouping into a common container of static
(e.g., IPSet/NSX objects) and dynamic (e.g., VM names/VM tags) elements. This is a
generic construct which can be leveraged across a variety of NSX-T features where
applicable.
Static criteria provide capability to manually include particular objects
into the Group. For dynamic inclusion criteria, Boolean logic can be used
to create groups between various criteria.
● Segment – All VMs/vNICs connected to this segment/logical switch will be selected.
● Group – Nested (sub-group) collection of referenceable objects; all VMs/vNICs defined within the Group will be selected.
● Segment Port – This particular vNIC instance will be selected.
● MAC Address – The selected MAC sets container will be used. MAC sets contain a list of individual MAC addresses.
● AD Groups – Grouping based on Active Directory groups for the Identity Firewall (VDI/RDSH) use case.
VM Properties used for Groups
The use of Groups gives more flexibility as an environment changes
over time. This approach has three major advantages:
Rules stay more constant for a given policy model, even as the data center
environment changes. The addition or deletion of workloads will affect
group membership alone, not the rules.
As NSX-T adds more grouping object criteria, the group criteria can be
edited to better reflect the data center environment.
Using Nesting of Groups
In the example shown in Figure , three Groups have been defined with different
inclusion criteria to demonstrate the flexibility and the power of grouping
construct.
Using dynamic inclusion criteria, all VMs with a name starting with “WEB” are
included in the Group named “SG-WEB”.
Using dynamic inclusion criteria, all VMs containing the name “APP” and
having a tag “Scope=PCI” are included in Group named “SG-PCI-APP”.
Using static inclusion criteria, all VMs that are connected to a segment “SEG-
DB” are included in Group named “SG-DB”.
Group and Nested Group Example
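The “SG-WEB” Group from this example could be expressed through the Policy API roughly as below; the domain “default” and the Condition field values follow the Policy grouping schema, while the manager address and credentials are placeholders.

import requests

group = {
    "display_name": "SG-WEB",
    "expression": [{
        "resource_type": "Condition",
        "member_type": "VirtualMachine",
        "key": "Name",
        "operator": "STARTSWITH",   # any VM whose name starts with "WEB" joins the group
        "value": "WEB",
    }],
}

resp = requests.patch(
    "https://nsx-mgr.corp.local/policy/api/v1/infra/domains/default/groups/SG-WEB",
    auth=("admin", "VMware1!VMware1!"), json=group, verify=False)
print(resp.status_code)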
Define Policy using DFW Rule Table
The NSX-T DFW rule table starts with a default rule to allow (blacklist) any
traffic. An administrator can add multiple policies on top of default rule
under different categories based on the specific policy model.
In the data path, the packet lookup will be performed from top to bottom
order, starting with policies from category Ethernet, Emergency,
Infrastructure, Environment and Application.
Any packet not matching an explicit rule will be enforced by the last rule in
the table (i.e., default rule). This final rule is set to the “allow” action by
default, but it can be changed to “block” (whitelist) if desired.
Source and Destination: Source and destination fields of the packet. This
will be a GROUP which could be static or dynamic groups as mentioned
under Group section.
Applied To: Defines the scope of rule publishing. The policy rule could
be published to all workloads (default value) or restricted to a specific
GROUP. When a GROUP is used in Applied To, it needs to be based on
non-IP members like VM objects, segments, etc.
Action: The available rule actions are Allow, Drop, and Reject.
Log: Enable or disable packet logging. When enabled, each DFW enabled host will send
DFW packet logs in a syslog file called “dfwpktlog.log” to the configured syslog server. This
information can be used to build alerting and reporting based on the information within the
logs, such as dropped or allowed packets.
Direction: This field matches the direction of the packet, default both In-Out. It can be set
to match packet exiting the VM, entering the VM, or both directions.
Tag: You can tag the rule; this will be sent as part of DFW packet log when traffic hits this
rule.
Notes: This field can be used for any free-flowing string and is useful to store comments.
Stats: Provides packets/bytes/sessions statistics associated with that rule entry. Polled every
5 minutes.
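Tying these fields together, the sketch below defines one DFW policy in the Application category with a single rule that uses Groups for source and destination, a predefined service, “Applied To” via the scope field, and logging enabled; the group and service paths are illustrative and the field names follow the Policy SecurityPolicy/Rule schema.

import requests

policy = {
    "display_name": "web-to-app",
    "category": "Application",
    "rules": [{
        "display_name": "allow-web-to-app-https",
        "source_groups": ["/infra/domains/default/groups/SG-WEB"],
        "destination_groups": ["/infra/domains/default/groups/SG-APP"],
        "services": ["/infra/services/HTTPS"],
        "action": "ALLOW",
        "direction": "IN_OUT",                              # default direction
        "scope": ["/infra/domains/default/groups/SG-WEB",   # Applied To: publish the rule only
                  "/infra/domains/default/groups/SG-APP"],  # to the relevant vNICs
        "logged": True,                                     # send rule hits to the DFW packet log
    }],
}

resp = requests.patch(
    "https://nsx-mgr.corp.local/policy/api/v1/infra/domains/default/security-policies/web-to-app",
    auth=("admin", "VMware1!VMware1!"), json=policy, verify=False)
print(resp.status_code)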
Examples of Policy Rules for 3-Tier Application
For instance, a newly installed web server will be seamlessly protected by the first
policy rule with no human intervention, while VM disconnected from a segment will
no longer have a security policy applied to it. This type of construct fully leverages the
dynamic nature of NSX-T.
This is a very common use case for customers who are looking at NSX-T
as a platform only for the micro-segmentation security use case, without
changing the existing network isolation, which is VLAN backed. This is an ideal
use case for brownfield deployments where the customer wants to enhance the
security posture of existing applications without changing the network
design.
NSX-T DFW Logical topology – VLAN Backed Workloads
NSX-T DFW Physical Topology – VLAN Backed Workloads
NSX-T Distributed Firewall for Mix of VLAN and Overlay backed
workloads
This use case mainly applies to customers who want to apply NSX-T
micro-segmentation policies to all of their workloads and are looking at
adopting NSX-T network virtualization (overlay) for their application
networking needs in phases. This scenario may arise when a customer
starts to either deploy new applications with network virtualization or
migrate existing applications in phases from VLAN backed to overlay backed
networking to gain the advantages of NSX-T network virtualization.
NSX-T DFW Logical Topology – Mix of VLAN & Overlay Backed Workloads
NSX-T DFW Physical Topology – Mix of VLAN & Overlay Backed
Workloads
Summary
The NSX-T platform enforces micro-segmentation policies irrespective of
the network isolation model (VLAN, overlay, or a mix), without having to change
policy planning, design, and implementation. A user can define the NSX-T
micro-segmentation policy once for an application, and it will continue
to work as the application is migrated from VLAN based networking to NSX-
T overlay backed networking.
Gateway Firewall
The NSX-T Gateway firewall provides essential perimeter firewall protection which
can be used in addition to a physical perimeter firewall. Gateway firewall service is
part of the NSX-T Edge node for both bare metal and VM form factors. The Gateway
firewall is useful in developing PCI zones, multi-tenant environments, or DevOps
style connectivity without forcing the inter-tenant or inter-zone traffic onto the
physical network. The Gateway firewall data path uses DPDK framework supported
on Edge to provide better throughput.
Optionally, Gateway Firewall service insertion capability can be
leveraged with the partner ecosystem to provide advanced security
services like IPS/IDS and more. This enhances the security posture by
providing next-generation firewall (NGFW) services on top of the native
firewall capability NSX-T provides. This is applicable for designs
where security compliance requirements mandate that a zone or group of
workloads be secured using an NGFW, for example, DMZ or PCI
zones or multi-tenant environments.
Consumption
This section provides two examples for possible deployment and data
path implementation.
Gateway FW as Perimeter FW at Virtual and Physical Boundary
The Tier-0 Gateway firewall is used as a perimeter firewall between physical
and virtual domains. This is mainly used for N-S traffic from the
virtualized environment to the physical world. In this case, the Tier-0 SR
component which resides on the Edge node enforces the firewall policy
before traffic enters or leaves the NSX-T virtual environment. The E-W
traffic continues to leverage the distributed routing and firewalling
capability which NSX-T natively provides in the hypervisor.
Tier-0 Gateway Firewall – Virtual-to-Physical Boundary
Gateway FW as Inter-tenant FW
This deployment scenario extends the Gateway Firewall scenarios depicted above
with the additional capability to insert an NGFW on top of the native firewall capability
the NSX-T Gateway Firewall provides.
This is applicable for designs where security compliance requirements mandate
that a zone or group of workloads be secured using an NGFW, for example, DMZ or
PCI zones or multi-tenant environments.
The service insertion can be enabled per Gateway for both Tier-0 and Tier-1
Gateway depending on the scenario.
As a best practice, the Gateway firewall policy can be leveraged as the first
level of defense to allow traffic based on L3/L4 policy, and the partner service
can be leveraged as the second level of defense by defining policy on the
Gateway firewall to redirect the traffic that needs to be inspected by the
NGFW. This will optimize the NGFW performance and throughput.
The following diagram provides the logical representation of overall
deployment scenario. Please refer to NSX-T interoperability matrix to
check certified partners for the given use case.
Gateway Firewall – Service Insertion
Endpoint Protection with NSX-T
The server pool can include an arbitrary mix of physical servers, VMs, or containers that,
together, allow scaling out the application.
Application high-availability
Modern applications are often built around advanced load balancing capabilities,
which go far beyond the initial benefits of scale and availability. In the example
below, the load balancer selects different target servers based on the URL of the
requests received at the VIP:
Load Balancing offers advanced application load balancing
The NSX-T load balancer is running on a Tier-1 gateway. The arrows in the above
diagram represent a dependency: the two load balancers LB1 and LB2 are
respectively attached to the Tier-1 gateways 1 and 2.
Load balancers can only be attached to Tier-1 gateways (not Tier-0 gateways), and
one Tier-1 gateway can only have one load balancer attached to it.
Virtual Server
On a load balancer, the user can define one or more virtual servers (the
maximum number depends on the load balancer form factor – see the NSX-T
Administrator Guide for load balancer scale information).
A virtual server can have basic or advanced load balancing options, such as
forwarding specific client requests to specific pools, redirecting them to external
sites, or even blocking them.
Pool
A pool is a construct grouping servers hosting the same application.
Grouping can be configured using server IP addresses or for more
flexibility using Groups.
In the above diagram for example, virtual server VS2 could load
balance image requests to Pool2, while directing other requests to
Pool3.
Monitor
A monitor defines how the load balancer tests application availability.
Those tests can range from basic ICMP requests to matching patterns
in complex HTTPS queries.
Monitors are specified by pools: a single pool can use only 1 monitor,
but the same monitor can be used by different Pools.
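The three objects above (load balancer service, pool with monitor, and virtual server) could be wired together through the Policy API roughly as sketched below; every ID, path, and default profile name is an illustrative assumption and should be checked against the LB schemas for the NSX-T version in use.

import requests

AUTH = ("admin", "VMware1!VMware1!")
BASE = "https://nsx-mgr.corp.local/policy/api/v1/infra"

def apply_obj(path, body):
    """PATCH the desired state of one Policy object (illustrative helper)."""
    return requests.patch(f"{BASE}{path}", auth=AUTH, json=body, verify=False)

# 1. Load balancer instance attached to a Tier-1 gateway (one load balancer per Tier-1).
apply_obj("/lb-services/web-lb",
          {"connectivity_path": "/infra/tier-1s/tenant1-t1", "size": "SMALL"})

# 2. Pool of web servers with an HTTP health monitor.
apply_obj("/lb-pools/web-pool", {
    "members": [{"ip_address": "172.16.10.11", "port": "80"},
                {"ip_address": "172.16.10.12", "port": "80"}],
    "active_monitor_paths": ["/infra/lb-monitor-profiles/default-http-lb-monitor"],
})

# 3. Virtual server (VIP) that clients connect to; it dispatches requests to the pool.
apply_obj("/lb-virtual-servers/web-vip", {
    "ip_address": "192.168.100.50",
    "ports": ["80"],
    "lb_service_path": "/infra/lb-services/web-lb",
    "pool_path": "/infra/lb-pools/web-pool",
    "application_profile_path": "/infra/lb-app-profiles/default-http-lb-app-profile",
})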
Lab
NSX-T Load Balancing deployment modes
The NSX-T load balancer is flexible and can be installed in either traditional in-
line or one-arm topologies. This section goes over each of those options and
examines their traffic patterns.
In-line load balancing
In in-line load balancing mode, the clients and the pool servers are on different
sides of the load balancer. In the design below, the clients are on the Tier-1 uplink
side, and the servers are on the Tier-1 downlink side: