DCI Using VXLAN EVPN Multi-Site W/ VPC BGW
Let’s suppose we need to extend Layer 2 and/or Layer 3 connectivity via Data Center
Interconnection (DCI) between two (or more) data centers that are using Classic
Ethernet networks; so no fancy deployment, just classic vPC at most (or VSS, staying
in the Cisco world), or spanning tree or Cisco FabricPath inside each fabric.
What about using VXLAN EVPN Multi-Site with vPC BGWs as the Layer 2 and Layer 3
overlay technology to interconnect them? … in place of vPC, OTV, VPLS or EoMPLS!
The deployment of vPC BGWs is supported starting with Cisco NX-OS 9.2(1); although
it can be introduced for several use cases, it can be considered the main
integration point for legacy networks into an EVPN Multi-Site deployment.
In fact, the vPC BGW provides redundant Layer 2 attachment through vPC and hosts
the first-hop gateway by means of the IP Anycast Gateway function.
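As a minimal sketch of what this means in practice (VLAN 100 matches the BD 100 we
will meet later in the type-2 updates, while the gateway address 10.10.10.1 and the
anycast MAC are purely illustrative assumptions), the distributed first-hop gateway
on each vPC BGW would look like this:

    feature fabric forwarding
    ! the same virtual MAC is shared by every anycast gateway in both sites
    fabric forwarding anycast-gateway-mac 2020.0000.00aa
    !
    interface Vlan100
      no shutdown
      vrf member VLAN_200_300
      ! illustrative gateway address for the 10.10.10.0/24 endpoints
      ip address 10.10.10.1/24
      fabric forwarding mode anycast-gateway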
Even though the vPC BGW is well suited to managing the coexistence of a VXLAN BGP
EVPN fabric with a legacy network, in this paper we’ll talk about the pure DCI
interconnection between two legacy DCs.
Quite often, the scenario just described represents the first step of a migration
procedure aiming to refresh the legacy technologies used inside each site and
replace them with modern VXLAN EVPN fabrics.
Basically, from a data-plane forwarding perspective, the vPC BGW nodes leverage
VXLAN tunnels that extend connectivity between the legacy data centers; in this
way, traffic originating at an endpoint in the local legacy network and destined
for an endpoint in a remote site is VXLAN encapsulated and delivered across the
external network infrastructure via the VXLAN tunnel.
The huge advantage of the VXLAN EVPN Multi-Site vPC BGW solution, compared with
technologies such as VRF-lite and MPLS L3VPN that provide only Layer 3 connectivity,
or others such as VPLS and Cisco OTV that provide only Layer 2 extension, is the
integration of Layer 2 and Layer 3 extension; that, combined with workload mobility
and with the multitenancy between multiple legacy data center networks allowed by
VRF support, makes this technology very attractive and easy to implement.
One more thing that makes this technology very cool is the EVPN Multi-Site
storm-control feature: it can individually tune how much BUM (Broadcast, Unknown
unicast, Multicast) traffic is allowed to propagate between legacy sites.
The vPC BGW nodes do not perform fragmentation, so it’s very important that the
transport network interconnecting the data centers be able to accommodate the extra
50 bytes introduced by the VXLAN encapsulation.
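In practice this usually means enabling jumbo frames end to end on the underlay; a
hypothetical DCI uplink of a BGW could look like this (interface and addressing are
illustrative):

    interface Ethernet1/1
      description DCI uplink towards the inter-site network
      mtu 9216
      ! illustrative point-to-point underlay addressing
      ip address 192.168.100.1/30
      ! mark the interface as DCI-facing for Multi-Site tracking
      evpn multisite dci-tracking
      no shutdown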
… and I’m quite sure that the chance provided by the vPC BGW nodes of using Ingress
Replication (IR) mode (aka Head-End Replication) to handle the BUM traffic between
sites will be appreciated too, since it removes any need for multicast support in
the inter-site transport network.
Ok, now it’s time to start digging into this technology with an example that, as
usual, makes everything much clearer and easier to understand … so, let’s start our
journey!
Let’s base all our discussion on the scheme here below, where we have two sites
with one VLAN extended at Layer 2 between them, plus two more VLANs, one present in
site 1 and one in site 2, in order to test VLAN (and VXLAN) routing using the VRF
VNI identifier with Symmetric Integrated Routing & Bridging (IRB).
Concerning IRB, just as a reference to see how the source and destination MAC
addresses change along the way, here is a picture highlighting this aspect:
OSPF is configured between the two sites to provide reachability for the loopbacks
used for mp-BGP EVPN peering and for the VTEPs, besides the ones used as Multi-Site
VIP addresses.
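A minimal sketch of that underlay on a BGW could look like the following (the OSPF
process name and all the addresses are hypothetical placeholders, since the real
ones appear only in the figures):

    feature ospf
    router ospf UNDERLAY
      router-id 10.1.1.1
    !
    interface loopback0
      description mp-BGP EVPN peering
      ip address 10.1.1.1/32
      ip router ospf UNDERLAY area 0.0.0.0
    !
    interface loopback1
      description VTEP source; the secondary address is the vPC VIP
      ip address 10.1.1.11/32
      ip address 10.1.1.100/32 secondary
      ip router ospf UNDERLAY area 0.0.0.0
    !
    interface loopback100
      description Multi-Site VIP
      ip address 10.1.1.200/32
      ip router ospf UNDERLAY area 0.0.0.0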
mp-BGP EVPN sessions are in place between BGWs for the propagation of L2 and/or
L3 endpoint prefixes.
As a reference, these are the IP addresses that will be referenced along this
document:
Site 1:
o BGW1 (SPINE11):
o BGW2 (SPINE12):
Site 2:
o BGW3 (SPINE21):
o BGW4 (SPINE22):
The secondary IP address configured on Loopback1 for the VTEP vPC VIP, the same on
both the BGWs of each site, is used as source and destination IP address of the
VXLAN tunnel from one site to another; the traffic is nevertheless fairly
distributed among the equal-cost paths of the inter-site network, because the source
UDP port, used as entropy for selecting the outgoing interface on a per-flow basis,
is calculated by hashing the inner header of the original packet. Statistically,
then, we can say that different traffic flows will be distributed over different
paths.
Between the BGWs and the core switches, a standard double-sided vPC is configured;
this means that the L2 domain is extended from the access switches up to the BGW
devices, where the L2/L3 demarcation point is defined (a configuration sketch
follows the two per-site views below).
- On site1:
- On site2:
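For reference, here is a bare-bones sketch of the vPC side on a BGW (domain id,
keepalive addresses and port-channel numbers are invented for illustration):

    feature vpc
    feature lacp
    vpc domain 1
      peer-switch
      peer-gateway
      peer-keepalive destination 192.168.0.2 source 192.168.0.1 vrf management
    !
    interface port-channel10
      description vPC peer-link
      switchport mode trunk
      vpc peer-link
    !
    interface port-channel20
      description double-sided vPC towards the core switches
      switchport mode trunk
      vpc 20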
First of all, let’s start analyzing the features enabled on the BGW devices.
The two I just want to spend a few words about are nv overlay evpn (which enables
the mp-BGP EVPN control plane) and feature nv overlay (which enables the VXLAN
feature and thus the configuration of the VTEP); the other ones should be well
known, like for instance feature vn-segment-vlan-based, used for mapping VLANs to
VXLANs.
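Summing up, the feature set on a BGW would resemble the following sketch (the
Multi-Site site-id is an illustrative value, while the VLAN-to-VNI mapping uses the
identifiers quoted later in this paper):

    nv overlay evpn
    feature ospf
    feature bgp
    feature vpc
    feature interface-vlan
    feature vn-segment-vlan-based
    feature nv overlay
    !
    ! illustrative site-id; must differ between the two sites
    evpn multisite border-gateway 1
    !
    vlan 100
      vn-segment 1000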
Feature bgp is used for the mp-BGP EVPN peering among the BGWs; in fact, looking at
BGW1 of site 1, we find the peerings with both BGW3 and BGW4 of site 2:
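A hypothetical sketch of those peerings on BGW1, assuming eBGP between the sites
(the AS numbers and neighbor loopback addresses are invented for illustration):

    router bgp 65001
      router-id 10.1.1.1
      neighbor 10.2.1.1 remote-as 65002
        description BGW3 (SPINE21)
        update-source loopback0
        ebgp-multihop 5
        peer-type fabric-external
        address-family l2vpn evpn
          send-community
          send-community extended
          rewrite-evpn-rt-asn
      neighbor 10.2.1.2 remote-as 65002
        description BGW4 (SPINE22)
        update-source loopback0
        ebgp-multihop 5
        peer-type fabric-external
        address-family l2vpn evpn
          send-community
          send-community extended
          rewrite-evpn-rt-asn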
The routing context (the VRF named VLAN_200_300) is configured with the definition
of route-distinguisher and route-targets left to the system, in auto mode.
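In NX-OS terms this corresponds to something like the following, where the L3VNI
2030 is the one we will meet again in the type-2 labels:

    vrf context VLAN_200_300
      vni 2030
      rd auto
      address-family ipv4 unicast
        route-target both auto
        route-target both auto evpn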
Now, let’s examine the configuration of the VTEP interface, NVE1, which inherits
the Loopback1 IP address:
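A sketch of that NVE configuration, with the Multi-Site VIP assumed on loopback100
and BUM handled via Ingress Replication, as anticipated earlier:

    interface nve1
      no shutdown
      host-reachability protocol bgp
      ! primary = PIP, secondary = vPC VIP
      source-interface loopback1
      multisite border-gateway interface loopback100
      member vni 1000
        ingress-replication protocol bgp
        multisite ingress-replication
      member vni 2030 associate-vrf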
Finally, we have the configuration concerning the RD and RT definitions for each
VXLAN segment:
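With auto mode, that boils down to a block like this for the L2VNI:

    evpn
      vni 1000 l2
        rd auto
        route-target import auto
        route-target export auto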
After this introduction to the configurations, it’s now time to investigate a
little with the show commands…
The two IP/MAC address prefixes just shown are BGP EVPN type-2 updates (where the
MAC Address Length (/48) and MAC Address fields are always present, while the IP
Address Length (/32, /128) and IP Address fields are optional); the description of
each field can be seen in the following figure:
- Received label 1000 2030: MPLS Label1 (the L2VNI) and MPLS Label2 (the L3VNI)
are the VNI identifiers for BD 100 (relative to the entry 10.10.10.2) and for the
VRF that contains it.
…but mp-BGP EVPN also transports type-5 prefixes, as quoted here below, once
imported in the local VRF with the proper RD: 10.10.10.101:4…:
…the BGWs install in the BGP table the route-type 5 prefixes having the PIP
(Primary IP) addresses as next hop; this behavior is enabled by the advertise-pip
command, and together with it the advertise virtual-rmac command has to be added
under the NVE interface:
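A sketch of the two commands together (the AS number is the illustrative one used
above):

    router bgp 65001
      address-family l2vpn evpn
        advertise-pip
    !
    interface nve1
      advertise virtual-rmac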
Vice versa, hosts’ route-type 2 prefixes (so with a /32 IP address) are always
advertised using the NVE secondary address (the vPC VIP) as next hop.
I hope one more time that you liked this virtual journey across the protocols
underlying the DCI VXLAN framework.