BGP EVPN for VXLAN: Deployment Benefits Summary, EVPN Overview and Operations
Introduction
Many data centers today are moving from a legacy layer 2 design to a modern layer 3 web-scale IT architecture. Layer 3 designs using traditional routing protocols like OSPF and BGP allow simplified troubleshooting, clear upgrade strategies, multi-vendor support, small failure domains and less vendor lock-in. However, many applications, storage appliances and tenant considerations still require layer 2 adjacency. Virtual Extensible LAN (VXLAN) is widely deployed in many layer 3 data centers to provide layer 2 connectivity between hosts for specific applications. For example, as seen in Figure 1, the webservers and the load balancer must be on the same layer 2 network. VXLAN provides that layer 2 connectivity over a layer 3 infrastructure.

Ethernet Virtual Private Network (EVPN) is a feature offered by Cumulus Networks that provides a scalable, interoperable end-to-end control-plane solution for VXLAN tunnels using BGP. It supports redundancy, load sharing and multi-tenant segmentation. EVPN also provides the benefit of fast convergence for host and VM mobility over VXLAN tunnels, as well as ARP suppression.

This white paper discusses deployment benefits, how EVPN works, how to operate EVPN, and different deployment scenarios. This paper also includes sample Cumulus Linux configurations to deploy a scalable, controller-free layer 2 virtualization over a layer 3 IP fabric using the standard, well-known routing protocol BGP. For information on integrating VXLAN routing with EVPN, please visit our VXLAN Routing with EVPN white paper.
FIGURE 1 - LOAD BALANCING OVER A LAYER 3 INFRASTRUCTURE WITH VXLAN (the loadbalancer and webservers are connected by a VXLAN tunnel across the L3 IP fabric)
VXLAN provides a scalable solution for layer 2 virtualization over a layer 3 routed infrastructure

Virtual Tunnel Endpoints (VTEPs) are used to originate and terminate the VXLAN tunnel and map end devices such as hosts and VMs to VXLAN segments. The VTEP provides the encapsulation of layer 2 frames into User Datagram Protocol (UDP) segments to traverse across a layer 3 fabric. Likewise, the VTEP also decapsulates the UDP segments from a VXLAN tunnel to send to a local host. A VTEP requires an IP address (often a loopback address) and uses this address as the source/destination tunnel IP address. The VTEP IP address must be advertised into the routed domain so the VXLAN tunnel endpoints can reach each other, as shown in Figure 2. Each switch that hosts a VTEP must have a VXLAN-supported chipset such as Mellanox Spectrum or Broadcom Trident II, Trident II+ or Tomahawk. A list of our compatible hardware can be found in the Hardware Compatibility List. Though it's not depicted in Figure 2, you can have multiple VNIs (VXLANs) using one VTEP IP address.

In traditional VXLAN, as seen below in Figure 2, the control plane and the data plane are integrated together, meaning MAC address learning happens over the data plane, often called flood and learn. This causes limitations, including limited load balancing and slow convergence times, especially for VM and host mobility. Further, broadcast, unknown unicast, and multicast (BUM) traffic, such as ARP, is required to traverse across the tunnel for discovery purposes, thereby increasing overall data center traffic.
FIGURE 2 - TRADITIONAL VXLAN: leaf VTEPs carrying VNI 10100 bridge VLAN 100 hosts across the routed spine; the VXLAN tunnel encompasses Layer 2
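The VTEP building blocks described above can be sketched as a minimal interface configuration in the same ifupdown2 style used by the sample configuration later in this paper (the loopback address and interface name here are illustrative, not taken from the figure):

```
interface lo
  # VTEP source IP, advertised into the routed fabric
  address 10.1.1.1/32
interface vxlan_10
  # the 24-bit VNI carried in the VXLAN header
  vxlan-id 10100
  # layer 2 frames are UDP-encapsulated from/to this address
  vxlan-local-tunnelip 10.1.1.1
```

Multiple such vxlan interfaces, each with its own VNI, can share the single loopback tunnel IP.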
The Cumulus Networks EVPN implementation provides a separate control plane for VXLAN tunnels. EVPN provides exchange of MAC/IP addresses between VTEPs through the use of a separate control plane, similar to pure IP routing. Cumulus EVPN is an open and standards-based solution that implements IETF RFC 7432 "BGP MPLS-Based Ethernet VPN" along with the IETF draft "A Network Virtualization Overlay Solution using EVPN" for a VXLAN tunnel control plane.

EVPN provides remote VTEP discovery, thus it doesn't require an external controller. Learning control plane information independently of the data plane offers greater redundancy, load sharing and multipathing, while also supporting MAC address filtering and traffic engineering, which can provide granular control of traffic flow. EVPN also provides faster convergence for mobility. Greater redundancy and multipathing can be achieved because all the possible paths are exchanged across the control plane, not just the one data plane path.

EVPN introduces a new address family to the MP-BGP protocol family, as depicted in Figure 3. When EVPN is implemented with the VXLAN data plane, the evpn address family can exchange either just the MAC layer control plane information (that is, MAC addresses), or it can exchange both the MAC address and IP address information in its updates between VTEPs. Exchanging IP and MAC information together can allow for ARP suppression at the local switch, thereby reducing the broadcast traffic in a data center.
FIGURE 3 - EVPN ADDS A NEW ADDRESS FAMILY TO THE MP-BGP CONTROL PLANE
EVPN VTEP PEER DISCOVERY

One large advantage of deploying EVPN is the ability to deploy controller-free VXLAN tunnels. EVPN uses type 3 EVPN routes to exchange information about the location of the VTEPs on a per-VNI basis, thereby enabling automatic discovery. It also reduces or eliminates the chance of a rogue VTEP being introduced in the data center.

EVPN offers peer discovery, thus requiring no external controller

For example, in Figure 4, the VTEPs are automatically discovered via eBGP and do not need to be explicitly configured or controlled as peers. The spine switches do not need to be configured for VLAN or VXLAN at all. All the discovered VTEPs within a VXLAN can easily be seen from one participating VTEP with a simple show command. The command in Figure 4 below displays the number of remote VTEPs associated with a specific VNI that are automatically discovered, including any rogue VTEPs.
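On a participating VTEP, the discovered peers can be listed per VNI. A sketch of the command follows; the hostname and VNI are illustrative, and the exact output fields vary by release:

```
cumulus@leaf01:~$ net show evpn vni 10100
cumulus@leaf01:~$ sudo vtysh -c 'show evpn vni 10100'
```

Both forms query the same EVPN state; the second bypasses NCLU and asks the routing daemon directly.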
FIGURE 4 - VTEP PEER DISCOVERY: four leaf VTEPs carrying VNI 10100 bridge VLAN 100 hosts across the routed spine

EVPN MULTI-TENANT SUPPORT

The new EVPN address family also provides multi-tenant separation and allows for overlapping addresses between tenants. To maintain the separation, it uses mature MP-BGP VPN technology: Route Distinguishers (RDs) and Route-Targets (RTs).

The RD makes overlapping routes from different tenants look unique to the data center spine switches to provide proper routing. A per-VXLAN 8-byte RD is prepended to each advertised route before the route is sent to its BGP EVPN peer. In Figure 5, the same route is advertised from 2 hosts in separate tenants, but the spine router can distinguish between the routes since they have different route distinguishers.
(Some entries left off for brevity)
*> [2]:[0]:[0]:[48]:[00:03:00:22:22:02]:[32]:[172.16.2.2]
10.1.1.2 0 65001 i
*> [3]:[0]:[32]:[10.1.1.1]
10.1.1.1 0 65001 i
Route Distinguisher: 10.1.1.2:10200
*> [2]:[0]:[0]:[48]:[00:03:00:22:22:02]:[32]:[172.16.2.2]
10.1.1.2 0 65002 i
*> [3]:[0]:[32]:[10.1.1.2]
10.1.1.2 0 65002 i
EVPN also makes use of the RT extended community for route filtering and separating tenants. The RT is advertised in the BGP update message along with the EVPN routes. The RT community distinguishes which routes should be exported from and imported into a specific VNI route table on a VTEP. If the export RT in the received update matches the import RT of a VNI instance on the VTEP receiving the update, the corresponding routes will be imported into that VNI's EVPN route table. If the RTs do not match, the route will not be imported into that VNI's EVPN route table.

EVPN provides multi-tenant separation with one protocol

Figure 6 below depicts leaf01 sending a BGP EVPN MAC route to leaf02 with the attached route-target community. As seen, four MAC routes are sent in the advertisement, two each originating from different VNIs on leaf01. Since leaf02 only has the one route-target import, 65001:1, it will receive only those routes associated with 65001:1, and the route with route-target 65001:2 will not be installed, as there is no matching import route-target within VNI 10100 located on leaf02.

Cumulus Linux supports either a default RD and/or RT for configuration ease, or configuring an explicit RD and/or RT within BGP for each VNI to allow flexibility. By default, the switch automatically derives the RD and RT from the VNI. In the default case, the RD would be RouterID:n (where n is assigned chronologically from 1), and the export RT is set to AS:VNI. The import RT community is set to <any>:VNI, which allows all routes from that same VXLAN to be imported. If more granular control of importing routes is needed, if compatibility with other vendors is required, or if globally unique VNIs are not configured, the RD and RT community can be manually configured as well. Manually configuring the RT and RD overrides the default RD and RT values.
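The manual override described above can be sketched in FRR syntax; the ASN, RD and RT values here are illustrative:

```
router bgp 65001
 address-family l2vpn evpn
  advertise-all-vni
  vni 10100
   ! override the auto-derived RouterID:n RD and AS:VNI route-targets
   rd 10.1.1.1:1
   route-target import 65001:10100
   route-target export 65001:10100
  exit-vni
```

Omitting the per-VNI stanza leaves the auto-derived values in place, which is sufficient when VNIs are globally unique.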
FIGURE 6 - ROUTE-TARGET FILTERING: leaf01 (VTEP 10.1.1.1, VNI 10100 and VNI 10200) advertises MAC routes to leaf02 (VTEP 10.1.1.2, VNI 10100)
MAC + IP ADDRESS LEARNING/EXCHANGE

Cumulus Networks supports advertising either the MAC routes only, or advertising the MAC+IP routes, in Cumulus Linux 3.3 and later. The MAC+IP address advertisement is necessary to support ARP suppression.

Cumulus EVPN reduces broadcast traffic in a data center via ARP suppression

On the leaf switch, each local VLAN is mapped to a VNI. When a local switch learns a new MAC+IP route on a particular VLAN, either via gratuitous ARP (GARP) or via the first data packet, which is typically an ARP request, the MAC address is placed into the local switch's bridge forwarding table. Additionally, the local leaf's ARP/ND table is populated with the IP to MAC layer mapping. The local MP-BGP process learns every new local MAC address from the local forwarding table and learns its corresponding IP route from the ARP/ND table. MP-BGP then advertises the MAC+IP route to the remote VTEPs via type 2 EVPN routes.

On the remote end, the MAC+IP routes that BGP learns are placed into the BGP table. From there, if the route target community sent with the route matches a local VNI route-target import, the route will be placed into that switch's MAC forwarding table with the appropriate VXLAN tunnel as its destination. The IP address, if included, will be placed in the EVPN ARP cache. This process separates the data and control planes: it allows dynamic learning of MAC addresses, allows overlapping MAC+IP addresses between tenants, and allows granular filtering of MAC+IP addresses, all without requiring a data plane packet to traverse each switch first.

To walk through an example of the MAC+IP address being propagated through the network, consider the example network in Figure 7, where there are two leaf switches, each participating in two independent VXLAN tunnels. This same demo is available here to set this up virtually and follow along.
FIGURE 7 - EXAMPLE NETWORK: leaf02 (VTEP 10.1.1.2) and leaf03 (VTEP 10.1.1.3) each carry VNI 10100 and VNI 10200 and peer via MP-BGP across the L3 routed spine; hosts attach on swp1 and swp2
In this case, we can see the local MAC address 00:03:00:11:11:02 is located in VLAN 20. The remote MAC addresses can also be seen across the tunnels. For example, host04's MAC address 00:03:00:44:44:01 in VLAN 20 is reachable through interface vxlan_20 and is behind VTEP 10.1.1.3.

The 00:00:00:00:00:00 MAC address associated with vxlan_20 and the 00:00:00:00:00:00 MAC address associated with vxlan_10 are entries added by EVPN when the VTEP is discovered. These entries are the head-end replication entries and should never age out as long as a remote VTEP is active.

To propagate the local MAC+IP routes to the remote VTEP, the local MAC and IP addresses will be learned by MP-BGP, as seen in the following. The type 2 routes are advertising the MAC+IP addresses, and the type 3 routes are advertising the location of the VTEPs in the network. For brevity, we will display only VNI 10200, which shows communication between host01 and host04. The addresses associated with host01 are highlighted. To view the entire output, download the demo here and perform the command "net show bgp evpn route".
*> [2]:[0]:[0]:[48]:[00:03:00:44:44:01]:[128]:[fd00::24]
10.1.1.3 0 65000 65003 i
* [2]:[0]:[0]:[48]:[00:03:00:44:44:01]:[128]:[fe80::203:ff:fe44:4401]
10.1.1.3 0 65000 65003 i
*> [2]:[0]:[0]:[48]:[00:03:00:44:44:01]:[128]:[fe80::203:ff:fe44:4401]
10.1.1.3 0 65000 65003 i
*> [3]:[0]:[32]:[10.1.1.2]
10.1.1.2 32768 i
* [3]:[0]:[32]:[10.1.1.3]
10.1.1.3 0 65000 65003 i
*> [3]:[0]:[32]:[10.1.1.3]
10.1.1.3 0 65000 65003 i
Displayed 10 prefixes (15 paths)
The routes are separated per tenant (VNI) and are identified by the route distinguishers in a full output. The local routes naturally have no AS path, whereas the remote ones do show the AS path to the MAC+IP address and VTEP IP addresses.

From here, the local routes are advertised to the remote BGP neighbor (usually a spine in the case of eBGP) and then propagated to the remote leaf. The eBGP EVPN output on the same VNI from the remote leaf looks like the following:
*> [2]:[0]:[0]:[48]:[00:03:00:44:44:01]:[32]:[172.16.20.4]
10.1.1.3 32768 i
*> [2]:[0]:[0]:[48]:[00:03:00:44:44:01]:[128]:[fd00::24]
10.1.1.3 32768 i
*> [2]:[0]:[0]:[48]:[00:03:00:44:44:01]:[128]:[fe80::203:ff:fe44:4401]
10.1.1.3 32768 i
* [3]:[0]:[32]:[10.1.1.2]
10.1.1.2 0 65000 65002 i
*> [3]:[0]:[32]:[10.1.1.2]
10.1.1.2 0 65000 65002 i
*> [3]:[0]:[32]:[10.1.1.3]
10.1.1.3 32768 i
Displayed 10 prefixes (15 paths)
As seen above, 00:03:00:11:11:02/172.16.20.1 and fd::21 are now remote addresses with 2 paths, as expected.

Based upon the configured import route targets, BGP then places certain routes within specific VNIs. For example, in this case, we have an import route target of <any>:10200 to be imported into VNI 10200, and an import route-target of <any>:10100 to be imported into VNI 10100, so all the MAC+IP addresses with the same route target will be imported into the respective VNI.
As clearly seen above, EVPN is able to learn and exchange MAC+IP addresses via the MP-BGP routing protocol while keeping tenant separation.

If ARP suppression is turned on, the local leaf, having the remote MAC address and IP address, is able to respond to a server's ARP request, thus reducing broadcast traffic throughout the data center.
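ARP suppression is enabled per VXLAN interface together with disabling data plane learning, using the same two stanzas that appear in the sample configuration later in this paper (the interface name is illustrative):

```
interface vxlan_10
  # answer ARP/ND requests locally from the EVPN-populated cache
  bridge-arp-nd-suppress on
  # rely on the EVPN control plane instead of flood and learn
  bridge-learning off
```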
FIGURE - VM MOBILITY: VM1 (44:38:39:00:00:1b) moves from rack1 to rack2 across the VXLAN tunnel over the L3 routed fabric; other VMs (44:38:39:00:00:17, 44:38:39:00:00:23) remain in place
When the host or VM moves, the new local switch (in this case leaf03) learns of the change via GARP or the data plane. The local switch then installs the MAC address into its bridge forwarding database, and BGP reads it. BGP then compares the MAC address with its current BGP table. If the MAC address is already there, BGP increments the sequence number in the MAC mobility extended community (assumed to be 0 if the community is not present) before advertising the address to remote peers. The remote peers similarly compare the MAC mobility community's sequence numbers between two identical routes (the new one versus any that are already in the table), and install the route with the highest sequence number into the EVPN table, which then gets installed to the local bridge forwarding database. Use of the MAC mobility community with the sequence numbers ensures all applicable VTEPs converge quickly on the latest route to the MAC address. Below shows the output on the new local leaf (leaf03) after the move. The MAC mobility community (MM) is now shown, and the MAC address has moved once.
cumulus@leaf03:~$ net show bgp evpn route vni 10100 mac 44:38:39:00:00:1b
BGP routing table entry for [2]:[0]:[0]:[48]:[44:38:39:00:00:1b]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[44:38:39:00:00:1b] VNI 10100
Local
10.1.1.3 from 0.0.0.0 (10.1.1.3)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
cumulus@leaf02:~$ net show bgp evpn route vni 10100 mac 44:38:39:00:00:1b
BGP routing table entry for [2]:[0]:[0]:[48]:[44:38:39:00:00:1b]
Paths: (2 available, best #2)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[44:38:39:00:00:1b] VNI 10100
Imported from 10.1.1.3:10100:[2]:[0]:[0]:[48]:[44:38:39:00:00:1b]
65000 65003
10.1.1.3 from spine01(swp51) (10.10.2.1)
Origin IGP, localpref 100, valid, external
Extended Community: RT:65003:10100 ET:8 MM:1
AddPath ID: RX 0, TX 68
Last update: Sun Feb 5 18:35:37 2017
Route [2]:[0]:[0]:[48]:[44:38:39:00:00:1b] VNI 10100
Imported from 10.1.1.3:10100:[2]:[0]:[0]:[48]:[44:38:39:00:00:1b]
65000 65003
10.1.1.3 from spine02(swp52) (10.10.2.2)
Origin IGP, localpref 100, valid, external, bestpath-from-AS 65000, best
Extended Community: RT:65003:10100 ET:8 MM:1
AddPath ID: RX 0, TX 66
Last update: Sun Feb 5 18:35:37 2017
cumulus@leaf02:~$ net show bgp evpn route vni 10100 mac 00:03:00:55:55:02
BGP routing table entry for [2]:[0]:[0]:[48]:[00:03:00:55:55:02]
Paths: (1 available, best #1)
Not advertised to any peer
Route [2]:[0]:[0]:[48]:[00:03:00:55:55:02] VNI 10100
Local
10.1.1.2 from 0.0.0.0 (10.1.1.2)
Origin IGP, localpref 100, weight 32768, valid, sourced, local, bestpath-from-AS Local, best
Extended Community: ET:8 RT:65002:10100 MM:0, sticky MAC
AddPath ID: RX 0, TX 88
Last update: Tue Jun 6 15:22:47 2017
EVPN deployment scenarios and configuration

EVPN is used as the control plane solution for extending layer 2 connectivity across a data center using a layer 3 fabric, or it can be used to provide layer 2 connectivity between data centers.

Naturally, VTEPs must be configured on the leaf switches for the data plane traffic. Below is a snippet of a sample VXLAN configuration in a leaf with VXLAN active-active mode. The MLAG and layer 3 configurations are left off for brevity.
interface lo
  address 10.0.0.11/32
  clagd-vxlan-anycast-ip 10.0.0.20
interface swp2
  alias host facing interface
  bridge-access 10
interface swp51
  alias spine facing interface bgp unnumbered
interface vxlan_10
  bridge-access 10
  bridge-arp-nd-suppress on
  bridge-learning off
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
  vxlan-id 10100
  vxlan-local-tunnelip 10.0.0.11
interface bridge
  bridge-ports vxlan_10 swp2
  bridge-vids 10-20
  bridge-vlan-aware yes
interface vlan_10
  ip-forward off
  ip6-forward off
  vlan-id 10
  vlan-raw-device bridge
As seen above, the active-active mode VXLAN VNI 10100 is configured with the anycast address of 10.0.0.20. There is only one VXLAN tunnel to the remote leaf switch. To prevent data plane learning, bridge-learning is turned off. The locally connected bridge is associated with both the host facing interface (swp2) as well as the VXLAN interface (vxlan_10). A VXLAN interface and bridge interface must be configured on every switch with a desired VTEP. For the active-active scenario, the routing protocol must advertise the anycast VTEP IP address (10.0.0.20) to remote VTEPs. More information about configuring VXLAN in active-active mode with EVPN can be found in the Cumulus Linux user guide.

The MP-BGP EVPN control plane running Cumulus Linux can be deployed in three layer 3 routed environments:

● eBGP between the VTEPs (leafs) and spines
● iBGP between the VTEPs (leafs) with OSPF underlay
● iBGP between the VTEPs (leafs) and route reflectors (spines)
Although Cumulus Linux supports all options mentioned above, Cumulus Networks recommends deploying eBGP for greenfield deployments. eBGP is already the most preferred data center routing protocol for the underlay network, and the same session can carry the overlay EVPN routes as well.

Cumulus recommends deploying EVPN with eBGP for simplicity

EVPN IN AN EBGP ENVIRONMENT

In this scenario, you peer the leafs and the spines together as in a typical eBGP data center, activating the neighbors in the evpn address family. Cumulus Linux also supports eBGP unnumbered to further simplify configuration. See Figure 10.
FIGURE 10 - EVPN EBGP DEPLOYMENT: the spine (AS 65020) peers eBGP with leafs in AS 65011 through AS 65014, each a VTEP carrying VNI 10100 and bridging VLAN 100 hosts
Note the EVPN address family is needed on the spines to forward the EVPN routes, but the command advertise-all-vni is not needed on the spines unless VTEPs are also located on the spines.

More information on configuring EVPN with eBGP can be found in the Cumulus Linux user guide.
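A minimal leaf/spine pairing for this scenario can be sketched in FRR syntax using eBGP unnumbered; the ASNs follow Figure 10, while the interface names are illustrative:

```
! leaf01 (AS 65011): eBGP unnumbered uplink, EVPN activated
router bgp 65011
 neighbor swp51 interface remote-as external
 address-family l2vpn evpn
  neighbor swp51 activate
  advertise-all-vni
!
! spine01 (AS 65020): forwards EVPN routes; no advertise-all-vni needed
router bgp 65020
 neighbor swp1 interface remote-as external
 address-family l2vpn evpn
  neighbor swp1 activate
```

The single eBGP session carries both the IPv4 underlay routes and the EVPN overlay routes.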
EVPN IN AN IBGP ENVIRONMENT WITH OSPF UNDERLAY

EVPN can also be deployed with an OSPF or static route underlay if needed, but this is more complex than the eBGP solution. In this case, iBGP advertises EVPN routes directly between VTEPs, and the spines are unaware of EVPN or BGP. The leaf switches peer with each other in a full mesh within the EVPN address family, and generally peer to the leaf loopback addresses, which are advertised in OSPF. The receiving VTEP imports routes into a specific VNI with a matching route target community.

EVPN IN AN IBGP ENVIRONMENT WITH ROUTE REFLECTORS

With this scenario, the spines are route reflectors (RRs) and reflect EVPN routes between the leafs. This scenario may be necessary for scale, and/or if iBGP is desired with no OSPF underlay. The EVPN address family must be run on the spines (RRs), but the command "advertise-all-vni" is not needed. Although the RRs receive all the MAC address routes associated with the VXLANs, they are not put into hardware on the RRs, allowing for greater scale.

To provide redundancy, two spine switches should be configured as RRs within the EVPN address family. It is recommended to use the same cluster-ID on the redundant route reflectors to reduce the total number of stored routes. More information on configuring RRs can be found in the Cumulus Linux user guide. See Figure 11.

If a three tier Clos network is desired without an OSPF underlay, tiers of route reflectors must be deployed. If more than one pod is needed or the data center expands with all iBGP, the use of additional clusters is recommended. A cluster consists of one or more route reflectors and their clients. Each route reflector in each cluster peers with each other as well as any other cluster's route reflectors. Assigning different cluster IDs (a BGP attribute) to each cluster prevents looping of routes between different clusters.
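A route reflector spine for this scenario can be sketched in FRR syntax; the ASN, cluster-ID and neighbor address are illustrative:

```
! spine01 (AS 65020): iBGP route reflector for the EVPN address family
router bgp 65020
 ! both redundant RRs share the same cluster-id to reduce stored routes
 bgp cluster-id 10.10.10.1
 neighbor 10.1.1.1 remote-as internal
 neighbor 10.1.1.1 update-source lo
 address-family l2vpn evpn
  neighbor 10.1.1.1 activate
  neighbor 10.1.1.1 route-reflector-client
```

One neighbor stanza per leaf VTEP is added in the same fashion; no advertise-all-vni is configured on the RR.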
FIGURE 11 - EVPN IBGP DEPLOYMENT WITH ROUTE REFLECTORS: spine01 and spine02 (AS 65020) act as route reflectors, reflecting the EVPN address family over iBGP between the leaf VTEPs
Conclusion
Data centers are moving towards a layer 3 fabric in order
to scale, provide ease of troubleshooting, and provide
redundancy with multi-vendor interoperability. However,
some applications still require layer 2 connectivity. For
these applications, VXLAN tunnels are being widely
deployed to provide a scalable layer 2 overlay solution over
a layer 3 fabric.
©2017 Cumulus Networks. All rights reserved. CUMULUS, the Cumulus Logo, CUMULUS NETWORKS, and the Rocket Turtle Logo (the “Marks”) are trademarks and
service marks of Cumulus Networks, Inc. in the U.S. and other countries. You are not permitted to use the Marks without the prior written consent of Cumulus Networks.
The registered trademark Linux® is used pursuant to a sublicense from LMI, the exclusive licensee of Linus Torvalds, owner of the mark on a worldwide basis. All other
marks are used under fair use or license from their respective owners.
08072018