HCIP-Data Center Network V1.0 Training Material

Data Center and Data Center Network Technologies
Foreword

⚫ In the cloud and big data era, data centers face massive construction requirements.
With technological development and rising user requirements, simplicity, efficiency, and reliability have become the guiding ideas for future data center development, and data center design concepts are quietly changing.
⚫ This course introduces the basic concepts of data centers and data center networks.

2 Huawei Confidential
Objectives

⚫ On completion of this course, you will be able to:


▫ Describe the concepts of data centers and data center networks.
▫ Understand common data center network architectures.
▫ Explain key network technologies used in data centers.

Contents

1. Data Center Overview

2. Data Center Network Overview

3. Overview of Key DC Technologies

Why Do We Need a Data Center?
⚫ With the development of enterprises, the amount of data that enterprises need to process every day is increasing.
The processing power of personal computers in offices is no longer enough to meet the needs of enterprises. To
provide more efficient methods for processing information and data, enterprises build or rent data centers to
process massive data in a centralized manner, meeting enterprise development requirements.

Figure: (1) A small business processes data on personal PCs. (2) As the enterprise grows, more and more data needs to be maintained. (3) Data is processed centrally in the data center, and large enterprises use data through the data center.
What is a Data Center?
⚫ A data center, as the name suggests, is a facility where enterprises process and store massive amounts of data.
⚫ A data center is essentially a large-scale equipment room. Using the existing Internet lines and bandwidth resources of communications carriers, enterprises build a standardized equipment room environment that provides all-round computing, storage, and security services for enterprises, governments, and individuals. Data centers feature high processing speed, large storage capacity, and high security.

Figure: Data center equipment room; data center cabinet.
Typical Application Scenarios of Data Centers

⚫ Finance: banking is shifting from traditional banks, where loans, deposits, and transfers go through the bank itself, to open banking, where they are also reached through third-party apps such as car and shopping apps. As online transaction data surges, banks build data centers to quickly process data in multiple scenarios.
⚫ Government: data is isolated between departments at the province, city, and town levels, which keeps efficiency low. Building an integrated data center behind a government portal implements one-click service processing and raises efficiency.
⚫ Large enterprises: rapid business development brings more services, which must be managed with fewer NMSs. Building a data center implements intelligent service management and control.
Overall Data Center Architecture
⚫ For enterprises, the data center is essentially an extended version of the personal computer: it computes, stores, and forwards enterprise data. A modern data center consists of the following parts:
▫ Computing system: consists of a large number of servers and is the heart of the data center. It processes the massive data in the data center.
▫ Storage system: consists of different types of storage devices and stores the massive data of the data center.
▫ Data center network: consists of different types of network devices, such as switches and firewalls. It connects the computing and storage systems in the data center, and all data exchanged between them passes through the data center network.

Figure: The data center network interconnects the computing system and the storage system.

• Key devices in a data center equipment room include servers, network devices, and storage devices. In small- and medium-sized data centers, the key devices are mainly servers. Such data centers have a small physical footprint, modest requirements for network devices, and limited capacity expansion.
Contents

1. Data Center Overview

2. Data Center Network Overview


◼ Introduction to DCN

▫ DCN Common Concepts

3. Overview of Key DC Technologies

Data Center Network

⚫ The data center network (DCN) is the infrastructure that carries services in a data center and is responsible for data forwarding.
⚫ Multiple data center networks can connect branches of an enterprise or organization across regions. Data center networks can also connect to the Internet or a WAN.

Figure: DC1 and DC2 interconnected through the Internet/WAN. Within DC1, spine and leaf nodes form a VXLAN fabric that connects servers, a firewall (FW), and a load balancer (LB).

• The data center network in the figure uses the spine-leaf architecture. VXLAN (Virtual Extensible Local Area Network) provides the connectivity.
▫ Spine: a backbone (core) node on a VXLAN network. It provides high-speed IP forwarding and connects to leaf nodes through high-speed interfaces.
▫ Leaf: an access node that provides VXLAN access for various network devices. Devices of different roles can be co-deployed based on the device type (in the figure, the border leaf node and service leaf node are co-deployed). The specific types and functions are described in detail later.
▫ Value-added service (VAS): a device that provides L4-L7 services, such as a firewall or load balancer.
Advantages of the Spine-Leaf Architecture
⚫ Spine-leaf is a relatively new data center network architecture consisting of spine nodes and leaf nodes. Spine nodes are backbone nodes that provide high-speed IP forwarding. Leaf nodes provide network access.
⚫ Spine nodes and leaf nodes are fully meshed at Layer 3. Equal-cost multi-path (ECMP) routing is used to improve network availability.
⚫ The spine-leaf architecture is highly scalable.

Figure: Spine-leaf networking; multi-node expansion (more spine nodes at the same level); multi-level expansion (additional spine levels above the leaves).
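The ECMP load sharing mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: a leaf hashes each flow's 5-tuple and takes the result modulo the number of equal-cost spine next hops. Real switches use vendor-specific hash functions and field selections. The use of `zlib.crc32`, the hypothetical `Flow` type and the hypothetical `select_next_hop` function are assumptions for illustration only.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """5-tuple identifying one flow (hypothetical type for illustration)."""
    src_ip: str
    dst_ip: str
    protocol: int  # e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int


def select_next_hop(flow: Flow, next_hops: list[str]) -> str:
    """Pick one of several equal-cost next hops (spines) for a flow.

    CRC32 over the 5-tuple stands in for a switch's hash function, so
    every packet of the same flow maps to the same next hop.
    """
    if not next_hops:
        raise ValueError("no equal-cost next hops available")
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.protocol}|{flow.src_port}|{flow.dst_port}"
    return next_hops[zlib.crc32(key.encode()) % len(next_hops)]


if __name__ == "__main__":
    spines = ["Spine1", "Spine2", "Spine3", "Spine4"]
    for port in range(40000, 40008):
        flow = Flow("10.1.1.10", "10.2.2.20", 6, port, 443)
        print(port, "->", select_next_hop(flow, spines))
```

The hash depends only on the flow's fields, so all packets of one flow follow the same spine and are not reordered. Different flows spread across spines. If a spine fails, removing it from `next_hops` re-hashes flows onto the remaining spines.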

• The spine-leaf architecture has the following characteristics:
▫ Each lower-level node (leaf) connects to all the corresponding higher-level nodes (spines) to form a full-mesh topology.
▫ There are no links between nodes at the same level.
▫ In typical applications, the entire spine-leaf architecture behaves like one logical modular switch. Leaf nodes act like the interface line cards of the modular switch and receive external traffic. Spine nodes act like the switch fabric units (SFUs) of the modular switch and exchange traffic between leaf nodes.
• The spine-leaf architecture is highly scalable:
▫ The number of spine nodes can be expanded to four or more. The maximum number of spine nodes depends on the number of uplink ports on the leaf nodes.
▫ A two-level spine-leaf network can be further expanded to three levels, enabling high-speed data exchange among more leaf nodes.
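The scaling rule above says the number of spine nodes is bounded by leaf uplink ports. It can be turned into a back-of-the-envelope calculation for a two-level fabric. This sketch assumes a full mesh with exactly one link between each leaf and each spine, and that every spine port faces a leaf. The function name and the port counts in the example are hypothetical, not figures for any specific CloudEngine model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricLimits:
    max_spines: int        # bounded by leaf uplink ports
    max_leaves: int        # bounded by spine ports
    max_server_ports: int  # leaf downlink (access) ports across all leaves
    fabric_links: int      # leaf-to-spine links in the full mesh


def two_level_limits(leaf_uplinks: int, leaf_downlinks: int, spine_ports: int) -> FabricLimits:
    """Upper bounds for a two-level spine-leaf fabric.

    Assumes a full mesh with exactly one link between each leaf and each
    spine, and that every spine port is used for a leaf connection.
    """
    if min(leaf_uplinks, leaf_downlinks, spine_ports) <= 0:
        raise ValueError("port counts must be positive")
    max_spines = leaf_uplinks   # each leaf needs one uplink per spine
    max_leaves = spine_ports    # each spine needs one port per leaf
    return FabricLimits(
        max_spines=max_spines,
        max_leaves=max_leaves,
        max_server_ports=max_leaves * leaf_downlinks,
        fabric_links=max_spines * max_leaves,
    )


if __name__ == "__main__":
    # Hypothetical port counts, not taken from any specific switch model.
    print(two_level_limits(leaf_uplinks=8, leaf_downlinks=48, spine_ports=64))
```

With these example values the fabric tops out at 8 spines, 64 leaves, 3072 server-facing ports and 512 leaf-spine links. Going beyond this is what the multi-level expansion addresses.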
Basic Concepts of the Spine-Leaf Architecture

⚫ Spine: a core node on a VXLAN fabric network. It provides high-speed IP forwarding and connects to leaf nodes through high-speed interfaces.
⚫ Leaf: an access node of the VXLAN fabric network that allows various network devices to access the VXLAN network.
⚫ Fabric: a group of interconnected spine and leaf nodes that forms the basic physical network topology of the data center.
⚫ Service Leaf: a leaf node that connects Layer 4 to Layer 7 value-added service devices, such as firewalls and load balancers, to the VXLAN fabric network.
⚫ Server Leaf: a leaf node that connects computing resources, such as virtualized and non-virtualized servers, to the VXLAN fabric network.
⚫ Border Leaf: a leaf node that connects the data center's external traffic to the data center's VXLAN fabric network and connects to routers or transmission devices.

Figure: A fabric of spine nodes connected to server leaf, service leaf, and border leaf nodes.
VXLAN-based Data Center Network Layers

Overlay (logical network layer)
• A VPC is a logically isolated network created by tenants based on the VXLAN technology. It can also be called a security domain. A VPC usually represents a department or a service.
• Virtualization technologies (such as VXLAN) build a logical topology on top of any physical network and use logical tunnels to build a large Layer 2 network.

Underlay (physical network layer)
• A physical network built from physical devices.
• Provides interconnection capabilities for all services in the data center.
• Serves as the basic bearer network for service data forwarding.

Figure: The same border leaf, service leaf, spine, server leaf, FW, and server devices shown in both layers. The overlay carries VPC1 and VPC2 over VXLAN, and the underlay runs OSPF.
Underlay and Overlay

Overlay network
• The overlay is a logical network, such as a VXLAN network, established on the underlay network.
• It has an independent forwarding plane and control protocol.
• The underlay physical network is transparent to the devices that are not connected to the VXLAN tunnel endpoints.

Underlay network
• The underlay network consists of various physical network devices and is the bearer network of the overlay network.
• After the overlay technology is implemented on the underlay network, a logical network is formed on top of the underlay network.
• The underlay network provides basic capabilities, such as reachability and reliability, for the upper-layer overlay network.
• The underlay network has its own control plane and forwarding plane protocols. Generally, OSPF or EBGP is used as the control plane protocol, and IPv4 as the forwarding plane protocol.
• The underlay network is logically isolated from the overlay network and is unaware of overlay network routes.

Figure: Hosts attach to NVEs (N). The NVEs carry the data plane and the overlay control plane across the underlay network.
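The sketch below makes the overlay/underlay split concrete. It builds the 8-byte VXLAN header defined in RFC 7348 (a flags byte with the I bit 0x08, a 24-bit VXLAN Network Identifier (VNI), and zeroed reserved fields) and prepends it to an inner Ethernet frame. An NVE would then carry the result in a UDP datagram (IANA-assigned destination port 4789) inside an outer IP packet routed by the underlay. This is why underlay devices never see tenant addresses. Only the VXLAN layer is shown; the outer UDP/IP/Ethernet headers are left out. The function names are illustrative, not part of any product API.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header (RFC 7348) to an inner Ethernet frame."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Byte 0: flags; bytes 1-3: reserved; bytes 4-6: VNI; byte 7: reserved.
    header = struct.pack("!B3xI", VXLAN_FLAG_VNI_VALID, vni << 8)
    return header + inner_frame


def vxlan_decap(payload: bytes) -> tuple[int, bytes]:
    """Split a VXLAN UDP payload into (VNI, inner Ethernet frame)."""
    if len(payload) < 8:
        raise ValueError("payload shorter than the VXLAN header")
    flags, vni_field = struct.unpack("!B3xI", payload[:8])
    if not flags & VXLAN_FLAG_VNI_VALID:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_field >> 8, payload[8:]


if __name__ == "__main__":
    packet = vxlan_encap(5000, b"inner-ethernet-frame")
    print(packet[:8].hex(), vxlan_decap(packet))
```

With IPv4 outer headers, encapsulation adds 50 bytes per frame: 14-byte outer Ethernet, 20-byte outer IPv4, 8-byte UDP and 8-byte VXLAN headers. This is why underlay links are usually configured with a larger MTU.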
Typical Data Center Network Scenarios

⚫ The Agile Controller manages network devices and forms a network resource pool. One network serves multiple purposes, allowing tenants to apply for network resources on demand.
⚫ Tenants create VPCs based on the network resources they have applied for and create logical networks in the VPCs.

Figure: At the network service layer, the border leaf & service leaf (with VAS devices), spine, and server leaf nodes connect the external network domain and the computing access layer (servers). Tenant 1 and tenant 2 own VPCs (VPC1 to VPC3). Each VPC contains the logical network of one service, built from logical routers, logical switches, and logical firewalls.

• iMaster NCE-Fabric is an autonomous driving network management and control system developed by Huawei for data center network scenarios. It integrates management, control, analysis, and AI functions. Its functions, features, and application scenarios are described in later sections.
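The tenant workflow above can be modeled with a small sketch: the controller pools network resources, and tenants create VPCs and logical networks on demand. For illustration, the sketch assumes that each logical switch is backed by one VXLAN Network Identifier (VNI) allocated from a controller-managed pool. Under that assumption, logical switches in different VPCs never share a Layer 2 segment. The `VniPool`, `VPC`, `LogicalSwitch` and `create_logical_switch` names and the VNI range are hypothetical, not iMaster NCE-Fabric APIs.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalSwitch:
    name: str
    vni: int  # VXLAN segment backing this logical switch (assumed 1:1 mapping)


@dataclass
class VPC:
    name: str
    tenant: str
    switches: list[LogicalSwitch] = field(default_factory=list)


class VniPool:
    """Hypothetical controller-side pool of VNIs shared by all tenants."""

    def __init__(self, first: int, last: int) -> None:
        self._free = list(range(first, last + 1))
        self._in_use: set[int] = set()

    def allocate(self) -> int:
        if not self._free:
            raise RuntimeError("VNI pool exhausted")
        vni = self._free.pop(0)
        self._in_use.add(vni)
        return vni

    def release(self, vni: int) -> None:
        self._in_use.remove(vni)  # KeyError if the VNI was never allocated
        self._free.append(vni)


def create_logical_switch(pool: VniPool, vpc: VPC, name: str) -> LogicalSwitch:
    """Create a logical switch in a VPC, backed by a VNI no other switch uses."""
    ls = LogicalSwitch(name=name, vni=pool.allocate())
    vpc.switches.append(ls)
    return ls


if __name__ == "__main__":
    pool = VniPool(10000, 10999)
    vpc1 = VPC("VPC1", tenant="Tenant 1")
    vpc3 = VPC("VPC3", tenant="Tenant 2")
    web = create_logical_switch(pool, vpc1, "web")
    db = create_logical_switch(pool, vpc3, "db")
    print(web.vni, db.vni)  # two distinct VNIs from the shared pool
```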
Contents

1. Data Center Overview

2. Data Center Network Overview


▫ Introduction to DCN
◼ DCN Common Concepts

3. Overview of Key DC Technologies


Integrated Cabling
⚫ The integrated cabling of a DC has three important concepts: Top of Rack (ToR), End of Row (EoR), and Middle of
Row (MoR).

Figure: ToR cabling, where a ToR switch in each 42 U server cabinet uplinks to a modular aggregation switch, compared with EoR cabling, where EoR switches serve a whole row of cabinets and uplink to modular aggregation switches.

• Standard cabinets in a DC are commonly 42 U high, where 1 U (one rack unit) is 4.445 cm.
• Top of Rack (ToR): ToR switches are deployed at the top of a cabinet, and the servers in the cabinet connect to the switch through optical fibers or network cables. ToR switches connect to the upper-layer aggregation switches. This mode suits scenarios with many access devices or high-density cabinets. Distributed server access reduces the number of connections between server cabinets and network cabinets, which simplifies connection management. However, because access switches are spread across many cabinets, centralized maintenance and management are more difficult. This mode is common in cloud data centers.
• End of Row (EoR): access switches are deployed centrally in one or two cabinets of a cabinet group (row). All servers in the row connect to these switches through horizontal cabling. This mode is common in traditional DCs. In EoR mode, many cables converge from multiple server cabinets to the network cabinets. This complicates connection management but makes centralized maintenance and management of the switches convenient.
• Middle of Row (MoR): MoR is similar to EoR. Access switches are deployed centrally in one or two cabinets of a cabinet group, but the network cabinets are placed in the middle of the group. Connections from server cabinets to network cabinets are therefore slightly simpler than in EoR mode, and the switches are still managed centrally. MoR is a compromise between ToR and EoR.
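The cabling trade-off between ToR and EoR can be made concrete with a rough cable count. The sketch below is a simplified model, not a cabling design rule. It assumes ToR switches sit in the server cabinets, so only their uplinks run to the network cabinet. In EoR mode it assumes every server NIC port is cabled horizontally to the network cabinet. The function name and all parameter values are hypothetical.

```python
def inter_cabinet_cables(mode: str, cabinets: int, servers_per_cabinet: int,
                         nic_ports_per_server: int, tor_uplinks: int) -> int:
    """Rough count of cables between server cabinets and network cabinets in one row.

    Simplified model: ToR keeps server cabling inside each cabinet, so only the
    ToR switch uplinks leave it. EoR and MoR run every server NIC port to the
    network cabinet (MoR shortens these runs but does not reduce their number).
    """
    if mode == "ToR":
        return cabinets * tor_uplinks
    if mode in ("EoR", "MoR"):
        return cabinets * servers_per_cabinet * nic_ports_per_server
    raise ValueError(f"unknown cabling mode: {mode}")


if __name__ == "__main__":
    # Hypothetical row: 10 server cabinets, 16 dual-homed servers each, 4 ToR uplinks.
    for mode in ("ToR", "EoR"):
        print(mode, inter_cabinet_cables(mode, 10, 16, 2, 4))
```

With these example numbers, ToR needs 40 cross-cabinet cables and EoR needs 320. This is the connection-management burden described above. The cost of ToR is 10 distributed access switches to maintain instead of a few centralized ones.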
Equipment Room Module

• For example, each floor of each building in a financial DC is divided into multiple equipment room modules.
• By function, equipment room modules can be divided into network modules, storage modules, open server modules, and test modules, each with different power densities.
• As shown in the figure, each floor is divided by power density into three areas: a high-density area, a medium-density area, and a low-density area.
• Network module: responsible for network access and serves as the core of the WAN and LAN. The power consumption varies with the devices deployed.
• Storage module: used for housing storage devices in a centralized mode.
• Open server module: used for housing servers in a centralized mode.
• Test module: used for housing test devices in a centralized mode.

Figure: Floor plans of Building-1 and Building-2 (floors W, X, Y, and Z). Each floor is divided into open server, network, storage, testing, extranet, and branch modules across high-, medium-, and low-density areas.

• Network module: responsible for network access. Network modules differ in both power consumption and device types. They range from large-scale core network devices with high power consumption to low-power devices such as load balancers, firewalls, switches, and routers.
• Storage module: used for deploying storage devices in a centralized mode, including NAS and SAN storage as well as tape libraries and virtual tape libraries.
• Open server module: used for deploying standard servers in a centralized mode, including PC servers, blade servers, and small-sized servers. These servers are highly standardized and densely deployed.
• Testing module: used for deploying test devices in a centralized mode. Testing modules are highly flexible and have lower security requirements than production modules. They can be adjusted at any time based on testing requirements and have low management requirements.
• In the data center of a financial institution, other equipment room modules can be planned for special purposes.
▫ Internet module: responsible for accessing Internet applications such as online banking, websites, and e-commerce. Because the module connects to the Internet and is exposed to various Internet attacks, it has high security requirements.
PoD

Figure: An example data center divided into PoD 1 to PoD 4, each with its own spine and leaf nodes.

⚫ To facilitate resource pool-based operation and management, a DC is divided into one or more physical partitions. Each partition is called a Point of Delivery (PoD). The PoD is a common DC physical design concept: a modular design entity that integrates network, storage, and computing.
⚫ PoDs can be defined based on actual service requirements:
▫ In large DCs, each equipment room module can be defined as a PoD.
▫ In midsize DCs, every two or more rows of cabinets can be defined as a PoD.
▫ In small DCs, one or more cabinets can be defined as a PoD.

• The scope of a PoD varies with each enterprise's conventions. For example, some large enterprises consider an equipment room module to be larger than a PoD. In one enterprise, a PoD consists of 48 ToR devices and 4 spine switches.
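As a quick check on this example, assume the PoD is a two-level spine-leaf fabric in which every ToR (leaf) connects to every spine with one link. This is an assumption; the example gives only the device counts. The port and link counts then follow directly:

```latex
\begin{aligned}
\text{uplinks per ToR} &= N_{\text{spine}} = 4\\
\text{leaf-facing ports per spine} &= N_{\text{ToR}} = 48\\
\text{leaf-spine links in the PoD} &= N_{\text{ToR}} \times N_{\text{spine}} = 48 \times 4 = 192
\end{aligned}
```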
Data Center Switch

⚫ A data center switch usually refers to a hardware switch. DC switches have gone through three phases:
▫ Phase 1 (2000 to 2006): high-density FE/GE access with GE uplinks. DC servers mainly used FE/GE access, and DC switches mainly provided 100M and GE interfaces. This met the requirements of small DCNs made up of a few servers.
▫ Phase 2 (2006 to 2012): high-density GE/10GE access with 10GE/40GE uplinks. DC servers mainly used GE/10GE access, and DC switches mainly provided GE interfaces with some 10GE access capability. This met the requirements of small- and medium-sized DCNs made up of GE servers.
▫ Phase 3 (2012 to 2020): high-density 10GE/40GE access with 100GE uplinks. DC servers mainly use 10GE/40GE access, so DC core switches must provide high-density 10GE/40GE interfaces to meet server access requirements.
DC Switches for the AI Era

The CloudEngine (hereinafter referred to as CE) series refers to Huawei's high-performance switches designed for next-generation data centers, including:
• CE16800 and CE12800 series (modular), mainly used for high-speed data forwarding in data centers
• CE9800/8800/6800/5800 series (fixed), mainly used for high-density access in DCs

Figure: CloudEngine 16800/12800 series (modular) and CloudEngine 9800/8800/6800/5800 series (fixed) switches, working with iMaster NCE-Fabric and FabricInsight.
Contents

1. Data Center Overview

2. Data Center Network Overview

3. Overview of Key DC Technologies


◼ DC Key IT Technologies

▫ DC Key Network Technologies


Introduction to Server Virtualization

⚫ Server virtualization is a technology that runs multiple virtual machines (VMs) on one physical server. Its advantages include higher physical resource utilization, rapid service deployment, and elasticity.
⚫ Users can install and run multiple applications and services on these VMs just as they would on a physical server.

Figure: Without virtualization, one OS and application run on a physical server. With server virtualization, VM1 to VM4, each with its own OS and application (App 1 to App 4), run on one physical server.

• APP: Application.

• OS: Operating System.

• For more information about virtualization, see Technical Principles and Applications of Virtualization.

Server Virtualization: Virtualization Management Platform


⚫ As services grow, the number of virtual servers reaches hundreds or even thousands. A virtualization management platform is therefore required for centralized management.

⚫ The virtualization management platform provides a simple user interface and various functions, such as monitoring and managing virtual resources, simplifying VM creation, configuring resource scheduling policies, and executing rules. Mainstream virtualization management platforms in the industry include Huawei VRM, VMware vCenter, and Microsoft System Center.

Figure: A virtualization management platform centrally manages the VMs running on multiple servers.

Introduction to Cloud Computing and OpenStack


⚫ OpenStack is an open-source cloud operating system that controls large pools of computing, storage, and network resources in the data center. After OpenStack is deployed, users can manage resources through web UIs, CLIs, or APIs.

⚫ OpenStack is not cloud computing itself but a cloud platform, which is a key component of cloud computing. OpenStack aims to provide resource management, including managing the computing, storage, and network resource pools of heterogeneous vendors.

Figure: Layered view. Infrastructure (servers, storage, network devices) at the bottom; a virtual resource pool (computing, storage, and network virtualization) above it; the cloud platform (OpenStack) above that; cloud services (cloud disk, cloud host, ...) on top.

• OpenStack and cloud computing:
▫ OpenStack is a framework for building a cloud OS. The cloud OS integrates and manages various hardware devices and carries various upper-layer applications and services to form a complete cloud computing system. OpenStack is therefore the core software component of a cloud computing system and the basic framework for building one.

• OpenStack and virtualization:
▫ OpenStack is a cloud OS framework. To build a complete cloud OS, OpenStack must be integrated with virtualization software to pool the compute resources of servers; this matters especially for resource access and abstraction. During resource pooling, the virtualization software virtualizes the physical resources.

Introduction to OpenStack Core Components


⚫ OpenStack is decomposed into several service components, each of which supports plug-and-play to meet diversified service requirements.
⚫ OpenStack has many core projects for resource management. Nova manages compute resources, Cinder manages block storage resources, and Neutron manages network resources.
⚫ The upper layer of OpenStack connects to the cloud management platform. Cloud management functions include but are not limited to tenant operations, cloud provisioning services, accounting, and multi-cloud monitoring.

Figure: A cloud management platform sits above OpenStack. Nova (computing), Cinder (block storage), and Neutron (network) manage the compute resource pool (through a virtualization management platform), the storage resource pool, and the network resource pool (through an SDN controller). These pools are built on servers, disk arrays, switches, firewalls, and other devices.

• To build a cloud OS, a large number of software components must be integrated so that they work together to provide the functions and services required by system administrators and tenants. However, OpenStack alone cannot provide all the capabilities of a complete cloud OS.

• For example:
▫ OpenStack cannot independently access and abstract resources. It needs to work with underlying virtualization software, software-defined storage (SDS), and software-defined networking (SDN).
▫ OpenStack cannot independently provide comprehensive application lifecycle management. It needs to integrate various management software platforms at the upper layer.
▫ OpenStack does not have complete system management and maintenance capabilities. When it is put into production, it needs to integrate various management software and maintenance tools.
▫ The human-machine interface provided by OpenStack is not powerful enough.

• For details, see Technical Principles and Applications of the OpenStack Cloud Platform.

Introduction to Containers
⚫ Containers are an OS-level virtualization technology. They are more lightweight and efficient than VMs.
⚫ Taking Linux as an example, the OS is divided into kernel space and user space, and one kernel can support multiple isolated user space instances. A key advantage of containers is that they package applications together with their runtime environment. An application can therefore be moved quickly between environments, which greatly simplifies the development-test-deployment-O&M process.

Figure: The shipping-container analogy. Shipping defines a standard transport method without special attention to what is inside the container. Likewise, a software container packages a running job and its file system.

• The Linux operating system and drivers run in kernel space, and applications run in user space.

• More precisely, a container is the entity that runs a container image.

• Container image:
▫ An application and its dependencies (including all files and directories of the OS) can be packaged into an image.
▫ The image contains all dependencies required to run the application. It only needs to be run in an isolated sandbox, without any modification or configuration.
▫ Images package applications and their runtime environments in a unified format. This ensures high consistency between the local environment and the cloud environment.

Comparison Between Containers and VMs


⚫ Unlike VMs, containers share the host operating system kernel. Their isolation is therefore weaker: they provide only process-level isolation.
⚫ Containers outperform VMs in startup speed, runtime performance, and server resource utilization.

Figure: VM stack compared with container stack. In the VM stack, a hypervisor runs on the server, and each VM has its own guest OS, bins/libs, and app (Web, DB, or App). In the container stack, a host OS and container engine run on the server, and each container has only its own bins/libs and app.

Container Management Platform


⚫ The container engine is only a daemon that manages containers on a single node, while enterprise DCs and public clouds manage very large numbers of nodes. An independent container management platform is therefore needed for large-scale container management.

⚫ A mature container management platform should provide at least two major functions: application orchestration and management, and cluster resource scheduling.

⚫ The three main platforms for cluster resource scheduling and application orchestration in the industry are Kubernetes, Swarm, and Mesos.

Figure: A container management platform manages the containers running on multiple servers.

• Resource management and scheduling of container clusters: the platform collects the resource status of managed nodes to manage the resources of tens of thousands of nodes. It handles users' container resource requests based on specific scheduling policies and algorithms.
• Application orchestration and management: a data center runs different types of applications, such as web services and batch processing tasks. The basic management capabilities common to them are abstracted and exposed to users through APIs. Users can use these APIs to automate application management when developing and deploying their own applications. Application orchestration and management let users customize application models and deploy applications automatically with one click. Users can also perform gray upgrades based on one-click deployment of application templates, which significantly simplifies application management and deployment.
• The Kubernetes ecosystem is a community project launched by Google. It covers container cluster resource management and scheduling. It also includes application management components for different applications, such as replica reliability management, service discovery and load balancing, gray upgrade, and configuration management.
• The Mesos ecosystem is actively promoted by Mesosphere, Twitter, and other companies.
• The Docker ecosystem is promoted by Docker, which aims to move up the container stack. It introduces the Swarm container resource management and scheduling component and the Compose application orchestration component.
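The "resource management and scheduling" function can be illustrated with a minimal two-phase scheduler. First, it filters out nodes that cannot fit a container's CPU and memory request. Then it scores the remaining nodes and binds the request to the best one. This mirrors the filter-then-score idea used by schedulers such as the Kubernetes scheduler. However, the scoring rule here is a simplified assumption: it prefers the node with the largest fraction of resources left after placement, which spreads load. The scoring rule and all names are illustrative, not any platform's actual algorithm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    cpu_capacity: float  # cores
    mem_capacity: float  # GiB
    cpu_used: float = 0.0
    mem_used: float = 0.0

    def fits(self, cpu: float, mem: float) -> bool:
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.mem_used + mem <= self.mem_capacity)


@dataclass(frozen=True)
class Request:
    cpu: float
    mem: float


def score(node: Node, req: Request) -> float:
    """Average fraction of CPU and memory left after placement (higher = emptier node)."""
    cpu_left = (node.cpu_capacity - node.cpu_used - req.cpu) / node.cpu_capacity
    mem_left = (node.mem_capacity - node.mem_used - req.mem) / node.mem_capacity
    return (cpu_left + mem_left) / 2


def schedule(nodes: list[Node], req: Request) -> Optional[Node]:
    """Filter nodes that can fit the request, then bind it to the best-scoring one."""
    feasible = [n for n in nodes if n.fits(req.cpu, req.mem)]  # filter phase
    if not feasible:
        return None  # no node fits; the request stays pending
    best = max(feasible, key=lambda n: score(n, req))  # score phase; ties -> first node
    best.cpu_used += req.cpu  # bind: reserve resources on the chosen node
    best.mem_used += req.mem
    return best


if __name__ == "__main__":
    cluster = [Node("node-a", 8, 32), Node("node-b", 16, 64)]
    chosen = schedule(cluster, Request(cpu=2, mem=4))
    print(chosen.name if chosen else "pending")
```

A bin-packing policy would instead pick the node with the least room left. Swapping the `score` function is enough to change the policy without touching the filter phase.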
Introduction to Storage Types

⚫ Block storage, file storage, and object storage are three common storage types in enterprise DCs. Different data types and service scenarios in enterprises have different storage requirements. The three storage types provide different storage services:
▫ Block storage: typical applications are high-performance, high-I/O database applications. Typical devices are disk arrays and hard disks.
▫ File storage: typical applications are LAN-based file-sharing applications. Typical devices are FTP and NFS servers. Application scenarios are file sharing (common scenario), and video processing, animation rendering, and high-performance computing (high-performance scenarios).
▫ Object storage: typical applications handle large amounts of data whose storage capacity grows rapidly. Typical devices are distributed servers with built-in large-capacity hard disks. Application scenarios are video storage for VOD and video surveillance, image storage, disk storage, static web page storage, and remote backup storage/archiving.
Introduction to the Storage System

⚫ Storage products in a data center fall into two modes: centralized storage and distributed storage.

Figure: Centralized storage, where servers share one central storage system, compared with distributed storage, where data is spread across many servers.

⚫ Centralized storage: one or more primary computers form a central node where data is stored centrally, and all service units and functions are deployed on one storage system.
⚫ In a centralized system, each terminal or client is only responsible for data input and output. Data storage and processing are handled entirely by the host.
⚫ Distributed storage: the storage system stores data on multiple independent devices.
⚫ A distributed storage system adopts a scalable architecture in which multiple storage servers share the storage load. This improves scalability, reliability, availability, and access efficiency.

• The most distinct feature of a centralized system is the simple deployment


architecture. As centralized systems are often based on mainframes with
outstanding performances at the bottom layer, there is no need to consider the
multi-node deployment of services or the distributed collaboration among
multiple nodes.
• Traditional network storage systems store data on centralized storage servers,
which may become the bottleneck of system performance and a vulnerable point
in terms of availability and security, failing to meet the requirements of large-
scale storage applications.
• Features of centralized storage:
▫ Devices come in diverse types and usually connect externally through an IP or FC network.
▫ Scalability is limited: constrained by controller capabilities, storage capacity typically stays at the PB level.
▫ Devices must be replaced at the end of their lifecycle, and all data must then be migrated.
• Features of distributed storage:
▫ High scalability: Built on standard hardware, distributed storage supports multiple storage protocols and models.
▫ High elasticity: The distributed architecture scales to several thousand storage nodes and EB-level data.
▫ Flexible capacity expansion: Capacity is expanded by adding standard hardware.

High-Performance Computing
⚫ High-performance computing (HPC) is a branch of computer science. Through a cluster architecture, parallel algorithms, and parallel/distributed computing software, HPC raises computing speed to the level of trillions of operations per second (TOPS), which a single computer cannot achieve.
⚫ The HPC system supports software and hardware collaboration. A typical architecture of the system includes
infrastructure, compute nodes, storage and file systems, network switching, cluster management, and resource
scheduling.
[Figure: an HPC cluster consisting of a compute cluster and parallel storage]


• HPC refers to aggregating computing power to perform tasks, such as simulation, modeling, and rendering, that exceed the capacity of standard workstations. In recent years, HPC has often been considered equivalent to a computer cluster system, which uses high-speed interconnection technology to connect multiple computer systems and solves large-scale computation problems with the combined computing capabilities of all connected systems. In this sense, HPC is usually called an HPC cluster.
• A cluster is a group of computers that provide network resources to users as a whole. Each computer in a cluster is a node, and nodes can be added or removed. The nodes are presented to users as a single virtual computer: users care only about the computer they are using, not about the individual nodes.
• HPC uses high-end hardware or aggregates the computing power of multiple units to deliver ultra-high floating-point computing capability. It satisfies the requirements of compute-intensive, network-intensive, and data-intensive services, covering scientific research, weather forecasting, computational simulation, military research, CAD/CAE, biopharmaceuticals, gene sequencing, and image processing. This greatly shortens computing time and improves computing accuracy.
• In general, an HPC solution consists of hardware, including servers, storage devices, and switches, and software, including cluster software and application software.
• The main purpose of building an HPC system is to improve computing speed. Reaching trillions of operations per second (TOPS) places high demands on the system processor, memory bandwidth, computing method, system I/O, and storage, each of which directly affects computing speed.

HPC Power Measurement


⚫ HPC power is measured by floating-point operations per second (FLOPS).

Rpeak (theoretical peak performance) is determined by the specification and number of CPUs. For one CPU:

$$R_{peak} = f_{CPU} \times N_{FLOP/cycle} \times N_{cores}$$

where f_CPU is the CPU frequency (standard frequency), N_FLOP/cycle is the number of floating-point operations per CPU clock cycle, and N_cores is the number of cores in the CPU. The Rpeak of a cluster is the sum over all of its CPUs.

Rmax is the maximum performance measured by HPL on the entire cluster.

HPL efficiency measures the computing efficiency of the entire cluster:

$$\text{HPL efficiency} = \frac{R_{max}}{R_{peak}}$$

FLOPS units:
1 KFLOPS = one thousand FLOPS = 10^3
1 MFLOPS = one million FLOPS = 10^6
1 GFLOPS = one billion FLOPS = 10^9
1 TFLOPS = one trillion FLOPS = 10^12
1 PFLOPS = one quadrillion FLOPS = 10^15
1 EFLOPS = one quintillion FLOPS = 10^18
1 ZFLOPS = one sextillion FLOPS = 10^21


• HPL: The High-Performance Linpack Benchmark.
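The sketch below applies the Rpeak and HPL efficiency formulas and converts a FLOPS value to the unit prefixes listed above. The CPU figures (2.6 GHz, 32 floating-point operations per clock cycle, 64 cores, 2 CPUs) and the measured Rmax are hypothetical values chosen for illustration, not data for any real system.

```python
# Unit prefixes from the FLOPS table, largest first.
FLOPS_PREFIXES = [
    ("Z", 10**21), ("E", 10**18), ("P", 10**15),
    ("T", 10**12), ("G", 10**9), ("M", 10**6), ("K", 10**3),
]


def rpeak(freq_hz: float, flops_per_cycle: int, cores_per_cpu: int, num_cpus: int = 1) -> float:
    """Theoretical peak: frequency x FLOPs per clock cycle x cores, over all CPUs."""
    return freq_hz * flops_per_cycle * cores_per_cpu * num_cpus


def hpl_efficiency(rmax: float, rpeak_value: float) -> float:
    """HPL efficiency = Rmax / Rpeak."""
    return rmax / rpeak_value


def format_flops(value: float) -> str:
    """Express a FLOPS value with the largest prefix that keeps the number >= 1."""
    for prefix, scale in FLOPS_PREFIXES:
        if value >= scale:
            return f"{value / scale:.2f} {prefix}FLOPS"
    return f"{value:.2f} FLOPS"


if __name__ == "__main__":
    # Hypothetical node: 2 CPUs, 64 cores each, 2.6 GHz, 32 FLOPs per cycle.
    peak = rpeak(2.6e9, 32, 64, num_cpus=2)
    measured = 8.0e12  # hypothetical HPL result (Rmax)
    print(format_flops(peak))                        # 10.65 TFLOPS
    print(f"{hpl_efficiency(measured, peak):.1%}")   # 75.1%
```

Rmax can only come from actually running HPL on the cluster; the formula gives Rpeak, and the ratio tells you how much of the theoretical peak the real system delivers.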



Introduction to AI
⚫ AI is a technical science that studies and develops theories, methods, and applications for simulating and extending human intelligence.

⚫ Machine learning simulates and implements human learning behaviors to acquire new knowledge. It is one of the core research areas of artificial intelligence.

⚫ Deep learning originates from research on artificial neural networks; a multilayer perceptron is one deep learning structure. Deep learning is a newer research field within machine learning. It simulates the mechanisms of the human brain to interpret data, for example to recognize images, speech, and text.

[Figure: timeline of artificial intelligence (from 1950), machine learning (from 1980), and deep learning (from 2010)]


AI Industry Ecosystem
⚫ The four elements of AI are the data, algorithm, computing power, and scenario. To meet the requirements of the
four elements, AI is integrated into cloud computing, big data, IoT, and other industries.
⚫ In the AI industry, networks are expected to provide high-speed communication between computing nodes.

AI industry ecosystem layers:
▫ AI applications: finance, healthcare, education, retail, and agriculture
▫ Technical directions: computer vision, speech recognition, natural language processing, planning and decision-making systems, and big data analytics
▫ AI conditions: data, algorithms, and computing power
▫ Basic technologies: big data and cloud computing
▫ Infrastructure: Internet, sensors, and IoT; servers and high-performance chips


Network Requirements of AI Computing


⚫ The development of machine learning and deep learning demands enormous computing power, which a single computer can hardly provide. Distributed compute clusters are therefore often established.

⚫ Enterprises such as Facebook, Baidu, and Alibaba are actively building machine learning and deep learning platforms, usually with 100 Gbps or faster network devices. AI performance tests show that the network can significantly affect computing performance. In model parallel computing, each node computes one part of the algorithm; after computing is complete, all data shards need to be transmitted to other nodes.

Network requirements of AI computing:
• High bandwidth, low latency, and no packet loss
• Traffic control in incast scenarios
• Congestion control with quick responses
• Fast and efficient load balancing mechanisms
• Differentiated scheduling of hybrid traffic

[Figure: model parallel computing across Machine 1 to Machine 4]
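To see why incast traffic control appears in the list above, the sketch below counts the flows in one round of shard exchange, under the simplifying assumption that every node sends its shard to every other node at the same time (real training frameworks often use other collective patterns). The function names are illustrative only.

```python
from collections import Counter


def shard_exchange_flows(num_nodes: int) -> list[tuple[int, int]]:
    """(sender, receiver) pairs when every node sends its shard to every other node."""
    return [(src, dst) for src in range(num_nodes)
            for dst in range(num_nodes) if src != dst]


def incast_fan_in(flows: list[tuple[int, int]]) -> Counter:
    """How many senders converge on each receiver during the same exchange round."""
    return Counter(dst for _, dst in flows)


if __name__ == "__main__":
    for nodes in (4, 16, 64):
        flows = shard_exchange_flows(nodes)
        print(nodes, len(flows), incast_fan_in(flows)[0])
    # 4 12 3
    # 16 240 15
    # 64 4032 63
```

With N nodes there are N(N-1) flows per round, and each receiver faces N-1 simultaneous senders, so the many-to-one pressure on a receiver's switch port grows with cluster size.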


• For more information, see Huawei AI certification.


Contents

1. DC Overview

2. Data Center Network Overview

3. Overview of Key DC Technologies


▫ DC Key IT Technologies
◼ DC Key Network Technologies

Overview of Key DCN Technologies
⚫ Multiple network technologies are applied on DCNs. This course describes the following key DCN technologies:
▫ SLB
▫ M-LAG
▫ VXLAN
▫ EVPN
▫ Telemetry
▫ NETCONF
▫ Microsegmentation
▫ SFC
▫ Intelligent and lossless network technologies

[Figure: spine-leaf DCN annotated with NETCONF, telemetry, VXLAN, EVPN, SLB, M-LAG, SFC, microsegmentation (EPG1 and EPG2), and the intelligent and lossless network]


• M-LAG: Multichassis Link Aggregation Group

• EVPN: Ethernet Virtual Private Network

• NETCONF: Network Configuration Protocol



Introduction to Load Balancing


⚫ Load balancing is a technology that distributes load across a computer cluster. A proxy device receives external requests and distributes them among multiple internal servers. This proxy device is called a load balancer.

⚫ Layer 3 load balancing is based on IP addresses. Similarly, Layer 4 load balancing is based on IP addresses and port numbers, and Layer 7 load balancing is based on application layer protocols (such as HTTP).

[Figure: example of Layer 3 load balancing: external requests sent to virtual IP address 192.168.1.100 are distributed by a load balancing algorithm (polling, least connections, hash, random weight, ...) to servers 192.168.1.10, 192.168.1.11, and 192.168.1.12]

Load balancing classifications:
• Based on the technology:
▫ Layer 3 load balancing, such as routers
▫ Layer 4 load balancing, such as LVS and HAProxy
▫ Layer 7 load balancing, such as Nginx and HAProxy
• Based on the deployment mode:
▫ Software-based load balancing
▫ Hardware-based load balancing


• The IP address used for load balancing is called a virtual IP address in this example and a floating IP address in OpenStack.

• Load balancing algorithms determine which healthy back-end server is chosen. Common algorithms include:

▫ Round robin: Requests are assigned to servers in list order. The first request goes to the first server, the next request to the next server, and so on, wrapping back to the start of the list after the last server.

▫ Least connections: The server with the fewest active connections is preferred.

▫ Hash: A hash is calculated from the source IP address of the request, and the request is sent to a particular server based on the hash value.

▫ Random LoadBalance: weight-based random allocation.
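The four algorithms can be sketched in a few lines of Python. This is an illustrative model only: health checks are omitted, SHA-256 is an assumed choice of hash function (real load balancers use their own), and the connection counts, weights, and client address in the usage lines are hypothetical.

```python
import hashlib
import itertools
import random

# Back-end pool from the Layer 3 load balancing example.
SERVERS = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]


class RoundRobin:
    """Hand out servers in list order, wrapping around after the last one."""

    def __init__(self, servers: list[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


def least_connections(active: dict[str, int]) -> str:
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def source_ip_hash(src_ip: str, servers: list[str]) -> str:
    """Map a source IP address to a server; the same client always lands on the same server."""
    digest = hashlib.sha256(src_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def weighted_random(servers: list[str], weights: list[float], rng: random.Random) -> str:
    """Pick a server at random, with probability proportional to its weight."""
    return rng.choices(servers, weights=weights, k=1)[0]


if __name__ == "__main__":
    rr = RoundRobin(SERVERS)
    print([rr.pick() for _ in range(4)])  # .10, .11, .12, then back to .10
    print(least_connections({"192.168.1.10": 12, "192.168.1.11": 3, "192.168.1.12": 7}))
    print(source_ip_hash("203.0.113.5", SERVERS))
    print(weighted_random(SERVERS, [5, 3, 2], random.Random(0)))
```

Source-IP hashing trades evenness for stickiness: a given client keeps reaching the same server, which helps session-based applications but can skew load if a few clients dominate.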



Load Balancing Applications in DCs


⚫ Load balancing applications in DCs include server load balancing (SLB) and global server load balancing (GSLB). SLB balances load among servers within a DC, and GSLB balances load between DCs.

[Figure: a user, a DNS server, and GSLB devices; data center A has the active SLB and data center B the standby SLB, each in front of a server cluster]

1. The user accesses https://www.huawei.com/en/ and requests address resolution from the DNS server.
2. The DNS server forwards the query to the GSLB for resolution.
3. The GSLB selects the optimal result, that is, the virtual IP address provided by the SLB, and sends it to the DNS server.
4. The DNS server returns the result to the user.
5. The user accesses the virtual IP address provided by the SLB.
6. The SLB forwards the request to the specified server.


• GSLB applies when an enterprise establishes multiple DCs in different areas, so that users can access the nearest DC based on their location. There are multiple GSLB solutions; this example describes the DNS-based solution, which is the most widely used.

• In the GSLB solution, domain name service providers delegate the name server (NS) records of domain names to GSLB devices with smart DNS resolution functions, and the GSLB devices resolve those names. If GSLB devices are deployed in multiple places, all of them should be added to the NS records to provide high availability. GSLB devices perform health checks on back-end servers and on the public IP addresses of other DCs, and the results are synchronized among the GSLB devices of different DCs through proprietary protocols. Finally, the GSLB devices return the optimal address resolution to the DNS servers based on the GSLB policy, and the DNS servers send the optimal address to the user.

• Based on the differences in user requests, the SLBs in a DC distribute requests among multiple (hundreds or even thousands of) back-end devices and ensure that the optimal server processes each request according to predefined policies. This improves application availability and scalability.
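The GSLB decision in step 3 can be sketched as a small selection function. The policy here is an assumption for illustration: prefer a healthy DC in the client's region, otherwise fall back to any healthy DC. Real GSLB policies and the proprietary synchronization between GSLB devices are not modeled, and the region labels and VIPs (taken from documentation address ranges) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCenter:
    name: str
    vip: str       # virtual IP address published by this DC's SLB
    region: str    # coarse location label used as a stand-in for proximity
    healthy: bool  # latest result of the GSLB health check


def gslb_resolve(client_region: str, dcs: list[DataCenter]) -> Optional[str]:
    """Return the VIP to place in the DNS answer, or None if no DC is healthy."""
    healthy = [dc for dc in dcs if dc.healthy]
    for dc in healthy:
        if dc.region == client_region:  # nearest healthy DC wins
            return dc.vip
    return healthy[0].vip if healthy else None  # otherwise fall back to any healthy DC


if __name__ == "__main__":
    dcs = [
        DataCenter("Data center A", "203.0.113.10", "europe", healthy=True),
        DataCenter("Data center B", "198.51.100.10", "asia", healthy=True),
    ]
    print(gslb_resolve("asia", dcs))   # 198.51.100.10
    dcs[1].healthy = False             # B fails its health check
    print(gslb_resolve("asia", dcs))   # 203.0.113.10
```

Because the answer is delivered through DNS, clients only switch DCs after cached records expire, which is one reason GSLB answers are usually given short TTLs.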

Introduction to M-LAG
⚫ Multichassis link aggregation group (M-LAG) is an inter-device link aggregation technology. It raises link reliability from the board level to the device level and provides traffic load balancing and backup protection.
⚫ In a DC, two ToR switches negotiate active/standby roles to form an M-LAG, which provides access for other devices (such as servers and firewalls).

[Figure: two ToR switches run dual-active detection (DAD), active/standby negotiation, and Layer 2 traffic forwarding between them, and a server connects to both through an Eth-Trunk (M-LAG). From the logical perspective, the server is considered to be connected to a single device.]


M-LAG Applications in DCs


⚫ Current DCs usually adopt the spine-leaf architecture. To meet high-reliability requirements, M-LAG is recommended for server or firewall access.

[Figure: spine-leaf DC with a firewall and egress devices above the spine, and servers dual-homed to pairs of leaf switches through M-LAG]


Introduction to VXLAN
⚫ VXLAN is a VPN technology that builds a Layer 2 virtual network over any physical network with reachable routes. Because the underlay is a routed network, it is not restricted by the physical network architecture and scales well.

⚫ VXLAN packets carry a VXLAN network identifier (VNI) field, which is similar to a VLAN ID and identifies different networks. Only one VXLAN tunnel is established between two devices. Similar to a trunk link, this tunnel carries the permitted traffic of all VNIs between the devices.

[Figure: two NVEs with VTEP addresses 1.1.1.1/32 and 2.2.2.2/32 connected by a VXLAN tunnel across an IP network; hosts 192.168.10.10 and 192.168.10.20 belong to VNI 5000, and VNI 6000 is carried over the same tunnel]
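To make the VNI field concrete, the sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348. The 24-bit VNI allows about 16 million segments, compared with 4094 usable VLAN IDs. In a real packet this header sits inside a UDP datagram (destination port 4789) that the source VTEP sends to the remote VTEP; the outer Ethernet, IP, and UDP headers are not built here.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned UDP destination port for VXLAN
FLAG_I = 0x08           # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header (RFC 7348) for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits); word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", FLAG_I << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking that the I flag is set."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & FLAG_I:
        raise ValueError("I flag not set: VNI is not valid")
    return vni_word >> 8


if __name__ == "__main__":
    header = build_vxlan_header(5000)   # VNI 5000 from the figure
    print(header.hex())                 # 0800000000138800
    print(parse_vni(header))            # 5000
```

The receiving VTEP uses the parsed VNI to pick the Layer 2 segment (bridge domain) into which it delivers the decapsulated frame, which is how VNI 5000 and VNI 6000 traffic share one tunnel yet stay isolated.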


VXLAN Applications in Cloud DCs


⚫ In cloud DCs, many services are deployed on VMs, which can be live-migrated freely within a server cluster. When a VM is migrated to a server under another leaf node, its IP and MAC addresses must remain unchanged to prevent service interruptions.
⚫ Spine-leaf + VXLAN is the best practice in this scenario.

[Figure: VMs on Open vSwitch hosts under different leaf switches; a VM migrates to a server under another leaf, and the leaf switches are connected through the spine by a VXLAN tunnel for service access]


Introduction to EVPN
⚫ Ethernet VPN (EVPN) was initially defined in RFC 7432 as an MPLS-based VPN that meets the requirements of high bandwidth and complex QoS scheduling.
⚫ Cloud DCs introduce virtualization, so a single host can carry multiple VMs belonging to different tenants. This raises new network requirements, which the network virtualization overlay (NVO) solution addresses.

▫ MPLS-based EVPN: control plane EVPN MP-BGP; data plane label switching (MPLS) and IP/GRE tunnels
▫ NVO: control plane EVPN MP-BGP; data plane VXLAN


• NVO: Network Virtualization Overlay. A logical network is built on the existing IP network to shield differences between the underlying physical networks and virtualize network resources. In this way, multiple logically isolated network partitions and multiple heterogeneous virtual networks can coexist on the same shared network infrastructure.

EVPN Applications in DCs


⚫ The NVO solution is applied in DCs: BGP EVPN works as the control plane to transmit routing information within a DC and between DCs, and VXLAN works as the data plane to forward data packets.
⚫ EVPN supports both Layer 2 and Layer 3 traffic forwarding within a DC and between DCs.

[Figure: two DCs, each a spine-leaf VXLAN/EVPN fabric with a server cluster, a VAS resource pool, and an egress router, interconnected over a DCI IP network that also runs VXLAN/EVPN]
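As a conceptual sketch of what the control plane provides: with BGP EVPN, a VTEP learns remote MAC addresses from MAC/IP Advertisement routes (EVPN route type 2, RFC 7432) instead of learning them by flooding data packets. The sketch models only that idea. The route fields are trimmed to what the example needs, it is not a BGP implementation, and the MAC address is hypothetical (the VNI, host IP, and VTEP address reuse the VXLAN figure).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacIpRoute:
    """Simplified EVPN route type 2 (MAC/IP Advertisement); most BGP attributes omitted."""
    vni: int
    mac: str
    ip: str
    next_hop_vtep: str   # address of the VTEP that advertised the route


class VtepMacTable:
    """MAC table of a VTEP, populated from control-plane routes instead of data-plane flooding."""

    def __init__(self) -> None:
        self._entries: dict[tuple[int, str], str] = {}

    def install(self, route: MacIpRoute) -> None:
        self._entries[(route.vni, route.mac.lower())] = route.next_hop_vtep

    def withdraw(self, route: MacIpRoute) -> None:
        self._entries.pop((route.vni, route.mac.lower()), None)

    def lookup(self, vni: int, mac: str) -> Optional[str]:
        """Return the remote VTEP for (VNI, MAC), or None if nothing was advertised."""
        return self._entries.get((vni, mac.lower()))


if __name__ == "__main__":
    table = VtepMacTable()  # table on VTEP 1.1.1.1
    route = MacIpRoute(5000, "00:1E:10:00:00:20", "192.168.10.20", "2.2.2.2")
    table.install(route)
    print(table.lookup(5000, "00:1e:10:00:00:20"))  # 2.2.2.2
    print(table.lookup(6000, "00:1e:10:00:00:20"))  # None: different VNI
    table.withdraw(route)
    print(table.lookup(5000, "00:1e:10:00:00:20"))  # None
```

Keying the table on (VNI, MAC) mirrors how the same MAC address can exist independently in different tenant networks.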


Introduction to Telemetry
⚫ Telemetry, also known as network telemetry, is a technology that remotely collects data from physical or virtual devices at high speed.
⚫ Compared with SNMP, telemetry supports subsecond collection intervals. A telemetry-enabled device proactively pushes information to the collector, implementing real-time, high-speed, and precise data collection.

▫ Traditional SNMP: "pull" mode, collection interval T > 5 min
▫ Telemetry: "subscription and push" mode, collection interval T < 1 s
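To make the interval difference concrete, the sketch below counts samples per hour for a 5-minute SNMP poll versus a 1-second telemetry stream, and models the push side with a simulated device that calls its subscribers back on its own schedule. The SimulatedDevice class, its methods, and the packets_in counter are hypothetical; real telemetry transport and encoding are not modeled.

```python
from typing import Callable

Callback = Callable[[int, dict], None]


def samples_per_hour(interval_s: float) -> int:
    """Number of samples collected in one hour at a fixed collection interval."""
    return int(3600 // interval_s)


class SimulatedDevice:
    """Hypothetical device that pushes samples to subscribers on its own schedule."""

    def __init__(self) -> None:
        self._subscriptions: list[list] = []   # entries: [interval_ms, next_due_ms, callback]
        self.packets_in = 0                     # stand-in for an interface counter

    def subscribe(self, interval_ms: int, callback: Callback) -> None:
        self._subscriptions.append([interval_ms, interval_ms, callback])

    def run(self, duration_ms: int, tick_ms: int = 100) -> None:
        """Advance a simulated clock; push a sample whenever a subscription is due."""
        for now in range(tick_ms, duration_ms + 1, tick_ms):
            self.packets_in += 1
            for sub in self._subscriptions:
                interval, next_due, callback = sub
                if now >= next_due:
                    callback(now, {"packets_in": self.packets_in})
                    sub[1] = next_due + interval


if __name__ == "__main__":
    print(samples_per_hour(300), samples_per_hour(1))  # 12 3600
    received = []
    device = SimulatedDevice()
    device.subscribe(500, lambda t, sample: received.append((t, sample["packets_in"])))
    device.run(duration_ms=10_000)
    print(len(received))  # 20 pushes in 10 s at a 500 ms interval
```

The collector never asks for data in the push model; once subscribed, it only receives, which is what removes the per-request polling overhead of SNMP.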


Telemetry Applications in DCs


⚫ DCN collects high-precision device data based on the telemetry technology to build an intelligent O&M system.

[Figure: network devices stream measured data through real-time telemetry to a big data intelligent analysis system, which works with the network control/deployment system]

Basic functions of monitored devices:
1. Software and hardware programmability, enabling automated and open configurations
2. Multiple collection tasks, enabling near-real-time, full-process, and complete data monitoring and reporting

Basic functions of the big data intelligent analysis system:
1. Multiple data storage capabilities
2. Machine learning and big data analytics capabilities
3. Network service analysis (performance analysis, capacity analysis, fault analysis, security analysis, and route analysis)
4. Centralized management of visualization tools
5. Openness and integration capabilities


Introduction to NETCONF
⚫ Network Configuration Protocol (NETCONF) provides a mechanism for managing network devices. To be specific,
users can use NETCONF to add, modify, and delete configurations of network devices, as well as obtain
configurations and status of network devices.
⚫ Compared with CLI and SNMP, NETCONF has the following advantages in device configuration:
▫ Interface type: NETCONF provides a machine-machine interface whose definition is complete and standard, and which is easy to control and use. SNMP: machine-machine interface. CLI: man-machine interface.
▫ Operation efficiency: NETCONF is high, because data is modeled on objects, only one interaction is required per operation on an object, and operations such as filtering and batch processing are supported. SNMP: medium. CLI: low.
▫ Extended capability: NETCONF allows proprietary protocol capabilities to be extended. SNMP: low. CLI: medium.
▫ Transaction processing: NETCONF supports transaction mechanisms such as trial running, rollback upon errors, and configuration rollback. SNMP: not supported. CLI: partially supported.
▫ Secure transmission: NETCONF supports multiple security protocols (SSH, TLS, BEEP/TLS, and SOAP/HTTP/TLS). SNMP: available only in SNMPv3. CLI: SSH supported.


NETCONF Applications in DCs


⚫ In a DC, NETCONF is mainly used by a network controller to orchestrate services and deliver configurations to southbound devices.
⚫ NETCONF messages are encoded in XML, and the configuration content is based on standard YANG models. Example <edit-config> message:

<?xml version="1.0" encoding="UTF-8"?>
<rpc xmlns="urn:ietf:params:xml:ns:netconf:base:1.0" message-id="101">
  <edit-config>
    <target>
      <running/>
    </target>
    <config>
      <!-- Configuration content in XML format (based on the YANG model) -->
    </config>
  </edit-config>
</rpc>

[Figure: a NETCONF client exchanges NETCONF messages over the network with the NETCONF server on each device (Device 1, Device 2, and Device 3)]
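The same <edit-config> envelope can be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree; the payload uses the ietf-interfaces YANG module (RFC 8343) purely as an example, and the interface name and description are hypothetical. It only builds the message: sending it requires a NETCONF session (usually over SSH), and the YANG models a given device accepts may differ.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
IETF_IF_NS = "urn:ietf:params:xml:ns:yang:ietf-interfaces"  # example YANG module (RFC 8343)

ET.register_namespace("", NETCONF_NS)   # serialize the NETCONF namespace as the default
ET.register_namespace("if", IETF_IF_NS)


def build_edit_config(message_id: str, payload: ET.Element) -> bytes:
    """Wrap a YANG-modeled payload in an <edit-config> RPC that targets <running/>."""
    nc = f"{{{NETCONF_NS}}}"
    rpc = ET.Element(f"{nc}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{nc}edit-config")
    target = ET.SubElement(edit, f"{nc}target")
    ET.SubElement(target, f"{nc}running")
    ET.SubElement(edit, f"{nc}config").append(payload)
    return ET.tostring(rpc, encoding="UTF-8", xml_declaration=True)


def interface_description(name: str, description: str) -> ET.Element:
    """Hypothetical payload: set an interface description via ietf-interfaces."""
    ns = f"{{{IETF_IF_NS}}}"
    interfaces = ET.Element(f"{ns}interfaces")
    interface = ET.SubElement(interfaces, f"{ns}interface")
    ET.SubElement(interface, f"{ns}name").text = name
    ET.SubElement(interface, f"{ns}description").text = description
    return interfaces


if __name__ == "__main__":
    rpc_bytes = build_edit_config("101", interface_description("10GE1/0/1", "to-server-01"))
    print(rpc_bytes.decode())
```

Building the message from a data model rather than concatenating strings keeps namespaces and escaping correct, which is the same reason controllers generate NETCONF payloads from YANG models.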


Introduction to Microsegmentation
⚫ Microsegmentation is a security isolation technology that groups DC service units based on certain rules and deploys policies between groups to control traffic.
⚫ Traditionally, DCs are divided into subnets at a coarse granularity, such as by VLAN ID or VNI. Microsegmentation supports finer-grained and more flexible grouping, for example, by IP address, MAC address, or VM name. This further narrows security zones, enabling fine-grained service isolation and stronger network security.

[Figure: Group 1 and Group 2, each containing servers and VMs matched by subnet, scattered IP addresses, MAC address, VM name, and so on, with a traffic control policy (action: permit/deny) between the two groups]
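The grouping-plus-policy idea can be sketched as follows. The group definitions, addresses, VM names, and the default-deny fallback are hypothetical choices for illustration, not the behavior of any particular switch or controller.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    ip: str
    mac: str
    vm_name: str


# Hypothetical groups mixing the matching criteria: subnet, scattered IP, MAC, VM name.
GROUP_RULES = {
    "Group 1": {"subnets": ["10.1.1.0/24"], "ips": ["10.9.0.7"], "macs": [], "vm_names": ["web-01"]},
    "Group 2": {"subnets": ["10.2.2.0/24"], "ips": [], "macs": ["00:1e:10:00:00:22"], "vm_names": []},
}

# Inter-group policy; pairs not listed fall back to deny (an assumption for this sketch).
POLICIES = {("Group 1", "Group 2"): "permit"}


def classify(ep: Endpoint) -> Optional[str]:
    """Return the first group whose rules match the endpoint, or None."""
    addr = ipaddress.ip_address(ep.ip)
    for group, rule in GROUP_RULES.items():
        if (any(addr in ipaddress.ip_network(net) for net in rule["subnets"])
                or ep.ip in rule["ips"]
                or ep.mac.lower() in rule["macs"]
                or ep.vm_name in rule["vm_names"]):
            return group
    return None


def decide(src: Endpoint, dst: Endpoint) -> str:
    """Look up the action for traffic from the source group to the destination group."""
    return POLICIES.get((classify(src), classify(dst)), "deny")


if __name__ == "__main__":
    web = Endpoint("10.1.1.5", "00:1e:10:00:00:01", "app-03")
    db = Endpoint("172.16.0.9", "00:1E:10:00:00:22", "db-01")
    print(decide(web, db))  # permit
    print(decide(db, web))  # deny
```

Here web joins Group 1 by subnet while db joins Group 2 by MAC address even though its IP address lies outside any listed subnet, which is the finer-grained grouping that VLAN- or VNI-based segmentation cannot express. Policies are directional, so the reverse flow is denied.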


Microsegmentation Applications in DCs


⚫ In a DC, microsegmentation classifies servers or VMs into groups and defines access control policies between
different groups to implement traffic control between service nodes.
⚫ Microsegmentation can be implemented on a standalone CE switch, or on CE switches working with the iMaster NCE-Fabric controller.

[Figure: a spine-leaf fabric in which VM1 to VM8 under Leaf1 and Leaf2 are grouped into EPG1, EPG2, and EPG3]


• Microsegmentation implements service isolation between different servers on a VXLAN network and ensures secure management and control of the VXLAN network. In addition, configuration and maintenance are simple, significantly reducing costs.

• For more information, see Technical Principles and Applications of Microsegmentation and SFC.

Introduction to SFC
⚫ Service Function Chaining (SFC) provides ordered services for the application layer.
⚫ SFC creates a chain of service functions (SFs), usually value-added service (VAS) devices, through which matched traffic passes in order to obtain VASs. Typical VAS devices are firewalls, load balancers, deep packet inspection (DPI) devices, and intrusion prevention devices.
⚫ iMaster NCE-Fabric can orchestrate SFCs directly, implementing them through policy-based routing (PBR) or the Network Service Header (NSH).

[Figure: VM1 and VM2 attach to ToR switches connected by a VXLAN tunnel; filtering and redirection policies are delivered to each traffic diversion point along the path]
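A toy model of the chaining idea: a classifier matches traffic and maps it to an ordered list of service functions, and the packet visits each one in turn. This is not PBR or NSH. The Python functions stand in for VAS devices, and the match rule and filtering behavior are hypothetical.

```python
from typing import Callable, Optional

Packet = dict
ServiceFunction = Callable[[Packet], Optional[Packet]]


def firewall(pkt: Packet) -> Optional[Packet]:
    """Hypothetical firewall rule: drop Telnet (TCP port 23), pass everything else."""
    if pkt["dst_port"] == 23:
        return None
    return {**pkt, "path": pkt["path"] + ["firewall"]}


def ips(pkt: Packet) -> Optional[Packet]:
    """Hypothetical IPS rule: drop payloads that contain a known-bad signature."""
    if b"attack" in pkt["payload"]:
        return None
    return {**pkt, "path": pkt["path"] + ["ips"]}


def classify(pkt: Packet) -> list[ServiceFunction]:
    """Classifier: only traffic from the external network is steered into the chain."""
    return [firewall, ips] if pkt["from_external"] else []


def forward(pkt: Packet) -> Optional[Packet]:
    """Pass the packet through its chain in order; None means a service function dropped it."""
    for service_function in classify(pkt):
        result = service_function(pkt)
        if result is None:
            return None
        pkt = result
    return pkt


if __name__ == "__main__":
    web = {"dst_port": 80, "payload": b"GET /", "from_external": True, "path": []}
    print(forward(web)["path"])                              # ['firewall', 'ips']
    print(forward({**web, "dst_port": 23}))                  # None (dropped by the firewall)
    print(forward({**web, "from_external": False})["path"])  # [] (not chained)
```

Keeping classification separate from the functions themselves is what lets different traffic receive differentiated VASs: changing a chain means changing the classifier's mapping, not the service nodes.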


SFC Applications in DCs

• When data packets are transmitted on DCNs, they often need to pass through various service nodes. SFC ensures that DCNs flexibly divert traffic to these service nodes as planned, thus providing VASs for users. Typical service nodes are firewalls, intrusion prevention systems (IPS), and load balancers.

• With SFC, differentiated VASs can be provided on a network.

[Figure: traffic from the external network enters through the border leaf and spine and is diverted through the service leaf to service nodes (VAS devices) such as a firewall and an IPS, then reaches servers under the server leaf]


Introduction to the Intelligent and Lossless Network Technologies


⚫ In DCs, lossy networks cannot satisfy the requirements of high-performance systems. An intelligent and lossless network uses an AI-ready hardware architecture and the iLossless algorithm to maximize throughput and minimize latency without packet loss.
⚫ The intelligent and lossless network technology architecture has five layers, which are detailed in later courses:
▫ Application acceleration layer: INC and iNoF
▫ Traffic scheduling layer: LoadBalance and queue scheduling
▫ Congestion control layer: ECN, AI ECN, and NPCC
▫ Flow control layer: PFC, PFC storm control, and PFC deadlock prevention
▫ Hardware layer: switch, NP, FPGA, and AI chip


• Flow control: matches traffic rates between the sender and the receiver to ensure zero packet loss.

• Congestion control: ensures maximum throughput and minimum latency by controlling traffic rates in the case of network congestion.

• Traffic scheduling: implements load balancing for service traffic and network links to ensure the quality of different service traffic.
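The flow control layer's job, matching the sender's rate to what the receiver can absorb, can be illustrated with a simplified XOFF/XON model in the spirit of PFC: the receiver asks the sender to pause when its queue crosses a high threshold and to resume once it drains to a low one. Real PFC works per priority with pause frames and headroom buffers for in-flight data; the class, thresholds, and rates below are hypothetical.

```python
class PausableQueue:
    """Receiver queue that signals pause at or above XOFF and resume at or below XON."""

    def __init__(self, capacity: int, xoff: int, xon: int) -> None:
        if not 0 <= xon < xoff <= capacity:
            raise ValueError("need 0 <= xon < xoff <= capacity")
        self.capacity, self.xoff, self.xon = capacity, xoff, xon
        self.depth = 0
        self.paused = False   # True while the sender is asked to stop
        self.dropped = 0

    def enqueue(self, n: int) -> None:
        for _ in range(n):
            if self.depth >= self.capacity:
                self.dropped += 1      # buffer full: the packet is lost
            else:
                self.depth += 1
        if self.depth >= self.xoff:
            self.paused = True         # "pause" toward the sender

    def dequeue(self, n: int) -> None:
        self.depth = max(0, self.depth - n)
        if self.paused and self.depth <= self.xon:
            self.paused = False        # "resume" toward the sender


def simulate(q: PausableQueue, ticks: int, arrival: int, service: int,
             flow_control: bool = True) -> PausableQueue:
    """Sender offers `arrival` packets per tick; receiver drains `service` packets per tick."""
    for _ in range(ticks):
        if not (flow_control and q.paused):
            q.enqueue(arrival)
        q.dequeue(service)
    return q


if __name__ == "__main__":
    with_fc = simulate(PausableQueue(20, xoff=12, xon=6), 100, arrival=3, service=2)
    without_fc = simulate(PausableQueue(20, xoff=12, xon=6), 100, arrival=3, service=2,
                          flow_control=False)
    print(with_fc.dropped)         # 0
    print(without_fc.dropped > 0)  # True
```

With the sender faster than the receiver, the queue without flow control eventually overflows and drops packets every tick, while the paused sender never pushes the queue past XOFF, so nothing is lost.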

Intelligent and Lossless Network Applications in DCs

⚫ Service networks are often deployed as TCP/IP lossy networks.
⚫ Different industries or enterprises have different network zone division classifications.

In DCs, the intelligent and lossless network solution is applicable to different computing scenarios:
⚫ For example, the HPC network and the distributed AI training network are generally deployed as closed networks.
⚫ A two-layer or three-layer spine-leaf networking architecture is selected based on the access node scale of the cluster, and appropriate leaf switches and spine switches are selected based on the port bandwidth.

[Figure: a lossy service network (core, spine, and leaf layers) alongside an intelligent and lossless network (spine and leaf layers) serving the HPC network and the AI parameter plane network, which connect HPC clusters, AI clusters, and storage nodes]

Quiz
1. (True or false) iMaster NCE-Fabric sends NETCONF messages to deliver configurations to
network devices and NETCONF messages are encoded in XML format.
A. True

B. False

2. (Multiple-answer question) Which of the following devices are included in a DC? ( )


A. Environmental control devices, such as air conditioners

B. Security devices, such as platform screen doors (PSDs)

C. IT devices, such as servers

D. Communication devices, such as switches and routers


1. A

2. ABCD
Summary

⚫ As the area where the network and computing industries converge most closely, DCNs must respond quickly to IT requirements, and they feature complex, integrated structures and rapid technological development.
⚫ This is the first course of the DCN series. It introduced what a DC and a DCN are, as well as their development histories.
⚫ Subsequent courses analyze more DC technical principles in detail to help you understand the hyper-converged DCN.

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright© 2023 Huawei Technologies Co., Ltd.
All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
Technical Principles and Applications of Virtualization
Foreword
⚫ Virtualization is defined in different ways by network engineers and IT engineers. With the convergence
and application of cloud network technologies in data centers (DCs), it is important to explain basic
concepts from different perspectives and clarify their differences.
⚫ Server virtualization is a technology that allows multiple virtual servers to run on one physical server.
From the perspective of IT engineers, server virtualization includes compute virtualization, storage
virtualization, and network virtualization.
⚫ From the perspective of network engineers, network virtualization refers to network device and
network architecture virtualization technologies, such as stacking, Multichassis Link Aggregation Group
(M-LAG), virtual system, and Virtual Extensible LAN (VXLAN), instead of server virtualization.
⚫ This course introduces the background and related technologies of server virtualization, and further
explains "network virtualization" in the eyes of IT and network engineers.

Objectives

⚫ On completion of this course, you will be able to:


 Describe the background of server virtualization.
 Describe the applications of network virtualization in DCs.
 Describe the technical fundamentals of network virtualization in server virtualization.

Contents

1. Server Virtualization
◼ Background

▫ Technical Fundamentals

▫ Deployment

2. Network Virtualization

3. Introduction to FusionCompute

Overview and Objectives

⚫ This section describes what (server) virtualization is from the perspective of IT engineers, what technologies it involves, and how to deploy it.
⚫ You will learn the definition, development history, and key technologies of server virtualization.


• In this course, virtualization from the perspective of IT engineers is referred to as server virtualization.
Server Virtualization Definition (1)
⚫ Server virtualization is a technology that creates multiple virtual machines (VMs) on a physical server.
It brings various benefits, including efficient physical resource utilization, rapid service provisioning, and
elasticity.
⚫ Each VM can run its own applications and services and act as a physical server.

[Figure: a physical server running one OS and app, compared with server virtualization, where VM 1 to VM 4 on one physical server each run their own OS and app (App 1 to App 4)]

Server Virtualization Definition (2)
⚫ With the emergence of clustering technology, server virtualization allows multiple physical servers to operate as a cluster that acts as a virtual resource pool.
⚫ VMs can be migrated between physical servers in a cluster, further enhancing the flexibility, elasticity, and high availability of server virtualization.

[Figure: VM 1 to VM 4 (App 1 to App 4, each on its own OS) running on a pooled physical server cluster in which every server runs a virtualization layer]

Virtualized Server Cluster Management
⚫ As services grow, the number of VMs in a cluster reaches hundreds to thousands. Therefore, a virtualization
management platform is required for centralized management.
⚫ The virtualization management platform provides a simple user interface and various functions, such as monitoring
and managing virtualized resources. It simplifies VM creation and helps users configure and execute resource
scheduling policies.

[Figure: a virtualization management platform centrally managing the VMs on multiple servers]



• Different vendors have their own virtualization management platforms since they
use different virtualization technologies, such as vCenter of VMware,
FusionCompute VRM of Huawei, SystemCenter of Microsoft, and RHEV of Red
Hat.
Server Virtualization Benefits
⚫ Increased resource utilization: Without virtualization, servers in a DC use only 5% to 30% of their
resources during normal operation. After virtualization, the utilization of virtualized server resources is
dramatically improved to more than 60%.
⚫ Reduced costs: Server virtualization provides the time-sharing feature for resources and allows dynamic
adjustment of cluster resources. As such, DCs require fewer servers and less equipment room space and
power.
⚫ Improved flexibility: Clustering allows elastic VM provisioning and can flexibly cope with service
requirements in peaks and off-peaks.
⚫ Less downtime: High availability (HA) for VMs protects VM services from being affected by a faulty physical server.
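As a rough worked example of the utilization figures above, the sketch estimates how many virtualized hosts could carry the load of 100 lightly used physical servers (a hypothetical count) if each host runs at 60%. It assumes identical hardware, a single resource dimension, and no headroom for peaks or HA failover, so real sizing would require more hosts.

```python
import math


def hosts_needed(num_servers: int, avg_util_pct: int, target_util_pct: int) -> int:
    """Estimate virtualized hosts needed to carry the load of num_servers physical servers.

    Total load, in units of fully used servers, is num_servers * avg_util_pct / 100;
    each host is filled up to target_util_pct.
    """
    return math.ceil(num_servers * avg_util_pct / target_util_pct)


if __name__ == "__main__":
    for avg in (5, 15, 30):          # range quoted for non-virtualized servers
        print(avg, hosts_needed(100, avg, 60))
    # prints: 5 9, then 15 25, then 30 50
```

Even at the top of the 5% to 30% range, the estimate halves the server count, which is where the savings in equipment room space and power come from.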

Contents

1. Server Virtualization
▫ Background
◼ Technical Fundamentals

▫ Deployment

2. Network Virtualization

3. Introduction to FusionCompute

Server Virtualization Technologies
⚫ There are three kinds of server virtualization: compute virtualization, storage virtualization, and
network virtualization.
⚫ A hypervisor, also known as a virtual machine monitor (VMM), is introduced to compute virtualization.
It abstracts hardware into virtual resources to allow an OS to run directly on each VM. In this way,
multiple OSs can run on a single physical server at the same time.
⚫ A hypervisor virtualizes the following physical resources: CPU, memory, and input/output (I/O)
resources.
[Figure: VMs run on the VMM/hypervisor, which runs on the hardware.]
• The hypervisor provides VMs with virtual resources abstracted from the hardware.
• The hypervisor manages all hardware resources (CPU, memory, and I/O devices).

• The central processing unit (CPU) is one of the core components of a computer. It interprets computer
instructions and processes data in computer software.

• A hypervisor provides the following basic functions: identifying, trapping, and responding to privileged or
protection CPU instructions sent by VMs (privileged and protection instructions are described in the CPU
virtualization section), scheduling VM queues, and returning physical hardware processing results to the
corresponding VMs.

Compute Virtualization - Basic Concepts

[Figure: a physical server (hardware, host OS, applications) compared with a virtual server (hardware,
hypervisor/VMM, and VMs running guest OSs and applications).]
Key terms: host machine (physical server); host OS (OS running on a physical server); hypervisor (virtualization
software layer); guest machine (VM virtualized on a physical server); guest OS (OS running on a VM).

• A host machine is a physical host that can run multiple VMs, and the OS installed and running on the host
machine is the host OS. VMs running on a host machine are called guest machines, and the OS installed and running
on a VM is called a guest OS. The core of virtualization technologies is the hypervisor, which sits between the
host machine and the guest OSs. It is also called a virtual machine monitor (VMM).

• In the physical architecture, a host uses a two-tier architecture from bottom to top: hardware (host machine)
and host OS. Applications are installed on the host OS. In the virtualization architecture, a host uses a
three-tier architecture from bottom to top: hardware (host machine), hypervisor, and guest machines installed with
guest OSs. Applications are installed on the guest OSs. Multiple guest machines can run on one host machine.

Compute Virtualization Technologies


⚫ Virtualization can be implemented in two modes based on where the hypervisor is deployed: Type 1 (bare metal)
and Type 2 (hosted).
⚫ In Type 1 virtualization, the hypervisor runs directly on the hardware without a host OS. In Type 2
virtualization, the hypervisor runs as a software program on a host OS.

[Figure: In Type 1 bare-metal virtualization, the VMM/hypervisor runs on the hardware and hosts VMs with guest OSs
and apps. In Type 2 hosted virtualization, the host OS runs on the hardware, and the VMM/hypervisor runs on the host
OS alongside other apps, hosting VMs with guest OSs and apps.]

Compute Virtualization: CPU Virtualization


⚫ Three types of instructions are involved: privileged instructions and common instructions, which exist in the
physical scenario, and sensitive instructions, which are specific to the virtualization scenario.

⚫ Hierarchical protection domains, often called protection rings, are defined for CPU instructions. A CPU has four
rings, numbered 0 through 3. Ring 0 is the most privileged level and interacts directly with the hardware. Ring 3,
the least privileged ring, is where most applications reside.

⚫ For example, when Kernel-based Virtual Machine (KVM) is used for CPU virtualization, privileged instructions from
guest OSs are handed to the hypervisor, which then schedules them to the physical CPU for execution. Common
instructions from applications are executed at the non-privileged level.
[Figure: protection rings, with KVM as an example: applications in Ring 3, the guest OS in Ring 1, and the
hypervisor in Ring 0 directly above the hardware. The hypervisor issues the privileged instructions to be executed
by the hardware CPU.]

• There are four CPU hierarchical protection domains, also called protection rings, numbered 0 (most privileged)
to 3 (least privileged). Ring 0 has direct access to the hardware; generally, only the OS kernel and drivers have
this privilege. Ring 3 has the fewest privileges, and any program can run in Ring 3. To protect computers, some
dangerous instructions can be executed only by the OS, which prevents malicious software from arbitrarily calling
hardware resources. For example, if a program needs to enable a camera, it must ask the driver in Ring 0 to do so;
otherwise, the operation is rejected.
• The instructions sent by a host OS are classified into two types: privileged instructions and common
instructions.
▫ Privileged instructions: instructions used to operate and manage key system resources. They can be executed
only at the highest privilege level, that is, Ring 0.
▫ Common instructions: instructions that can be executed at the non-privileged level, that is, Ring 3.
• A virtualization environment introduces another special instruction type: sensitive instructions. A sensitive
instruction changes the operating mode of a VM or the state of the host machine. Because the guest OS is deprived
of Ring 0 privileges, such instructions, which would originally run in Ring 0 on the guest OS, are handled by the
VMM instead.
• CPU virtualization can be further classified into full virtualization, para-
virtualization, and hardware-assisted virtualization. For details, see HCIA-Cloud
Computing.
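The ring and instruction rules above can be illustrated with a toy trap-and-emulate model. This is a simplified sketch, not how KVM is implemented: it follows the figure's layout (hypervisor in Ring 0, guest OS deprivileged to Ring 1, applications in Ring 3), and the instruction set is an illustrative subset of x86 privileged instructions. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field

HYPERVISOR_RING = 0
GUEST_KERNEL_RING = 1   # guest OS deprivileged out of Ring 0, as in the figure
USER_RING = 3

# Illustrative subset of x86 instructions that may only run in Ring 0
PRIVILEGED = {"HLT", "LGDT", "WRMSR"}


@dataclass
class ToyHypervisor:
    """Toy trap-and-emulate model; not a real VMM."""
    trapped: list = field(default_factory=list)

    def execute(self, vm: str, ring: int, instr: str) -> str:
        if instr not in PRIVILEGED or ring == HYPERVISOR_RING:
            # Common instructions (and the hypervisor's own code) run directly on the CPU.
            return "executed-directly"
        if ring == GUEST_KERNEL_RING:
            # The guest kernel expected Ring 0, so the instruction traps to the
            # hypervisor, which emulates its effect on this VM's virtual CPU state.
            self.trapped.append((vm, instr))
            return "trapped-and-emulated"
        # Applications may not issue privileged instructions at all; the fault is
        # reflected back to the guest OS.
        return "fault-injected-to-guest"


hv = ToyHypervisor()
print(hv.execute("vm1", USER_RING, "ADD"))          # executed-directly
print(hv.execute("vm1", GUEST_KERNEL_RING, "HLT"))  # trapped-and-emulated
print(hv.execute("vm1", USER_RING, "WRMSR"))        # fault-injected-to-guest
```

Real x86 CPUs complicate this picture: some sensitive instructions do not trap when executed outside Ring 0, which is one reason why full virtualization relied on software techniques and why hardware-assisted virtualization was introduced.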

Compute Virtualization: Memory Virtualization


⚫ Memory virtualization centrally manages the physical memory of a physical server and divides it into multiple
virtual memory spaces for VMs. As shown in the figure, each VM sees a contiguous memory address space.

[Figure: for VM 1 and VM 2, guest virtual addresses (VA) map to guest physical addresses (PA), which the hypervisor
maps to machine memory addresses (MA).]

• Generally, a physical host uses the memory address space as follows:

▫ The memory address space starts from the physical address 0.

▫ Addresses in the memory address space are assigned contiguously.

• However, virtualization introduces the following problem: only one memory address space can start at physical
address 0, so the memory address spaces of all VMs on a physical host cannot all start at physical address 0.
Although each VM could be assigned a block of contiguous physical addresses, this way of allocating memory is
inefficient and inflexible.

• Memory virtualization involves the translation of three types of memory addresses: the guest virtual address
(VA), the guest physical address (PA), and the machine memory address (MA). To run multiple VMs on a physical host,
addresses are translated along the path VA → PA → MA. The guest OS on a VM controls the mapping from VA to PA.
However, the guest OS cannot directly access the host machine's memory, so the hypervisor maps each PA to an MA.

• For details about memory virtualization techniques, such as the shadow page
table and huge page memory, see Huawei Cloud Computing certification courses.
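The VA → PA → MA path can be sketched with two lookup tables. The example below is a simplified model that assumes 4 KiB pages and single-level page tables stored as dictionaries; the table contents are made up. It shows how two VMs can both use guest physical pages starting at 0 while the hypervisor places those pages at different machine addresses.

```python
PAGE_SIZE = 4096  # assumes 4 KiB pages


def translate(addr: int, table: dict) -> int:
    """Map an address through a single-level table of page numbers."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        raise LookupError(f"no mapping for page {page:#x}")
    return table[page] * PAGE_SIZE + offset


def va_to_ma(va: int, guest_page_table: dict, p2m_table: dict) -> int:
    pa = translate(va, guest_page_table)   # VA -> PA: controlled by the guest OS
    return translate(pa, p2m_table)        # PA -> MA: controlled by the hypervisor


# Both VMs use the same guest mapping and see guest physical pages 0 and 1...
guest_pt = {0x10: 0x0, 0x11: 0x1}
# ...but the hypervisor backs them with different (not necessarily contiguous) machine pages.
vm1_p2m = {0x0: 0x2A0, 0x1: 0x513}
vm2_p2m = {0x0: 0x7F0, 0x1: 0x7F1}

print(hex(va_to_ma(0x10123, guest_pt, vm1_p2m)))  # 0x2a0123
print(hex(va_to_ma(0x10123, guest_pt, vm2_p2m)))  # 0x7f0123
```

Walking two tables on every memory access would be slow; techniques such as the shadow page table mentioned above, and hardware-assisted nested paging, exist to shorten this two-step translation.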

Compute Virtualization: I/O Virtualization


⚫ In a virtualization environment, a hypervisor enables VMs to share I/O devices.
⚫ The hypervisor intercepts requests from VMs to I/O devices, emulates real I/O devices using software, and
responds to the requests. In this way, multiple VMs can share limited I/O resources.
⚫ There are three ways to implement I/O virtualization: full virtualization, para-virtualization, and
hardware-assisted virtualization.
▫ Full virtualization: Software emulates real hardware, such as the keyboard and mouse. The physical server
performs all device monitoring and emulation in software, resulting in poor performance.
▫ Para-virtualization: Domain 0, a privileged VM, is introduced to run hardware drivers. Guest OSs on other VMs
access I/O devices through this privileged VM.
▫ Hardware-assisted virtualization: An I/O device driver is installed directly on the guest OS, without any change
to the OS. In this way, a VM accesses I/O hardware about as quickly as a non-virtualized server does.
Hardware-assisted virtualization requires special hardware support, such as intelligent network interface cards
(NICs).

• I/O virtualization creates a middleware layer between the hypervisor and the various available I/O processing
units, allowing multiple guest OSs to share limited peripheral resources.

• I/O virtualization can be implemented in the following modes: full virtualization, para-virtualization, and
hardware-assisted virtualization. Hardware-assisted virtualization is the mainstream technology. For details, see
Huawei Cloud Computing certification courses.

Mainstream Compute Virtualization Technologies

[Figure: compute virtualization technologies (covering CPU, memory, and I/O virtualization). Open-source: KVM and
Xen. Closed-source: Hyper-V, VMware ESXi, and Huawei FusionSphere.]

• There are many mainstream virtualization technologies, which can be classified into open-source technologies
(such as KVM and Xen) and closed-source technologies (such as Microsoft Hyper-V, VMware vSphere, and Huawei
FusionSphere).

• Among open-source virtualization technologies, KVM is implemented in full virtualization mode and is more widely
used. Xen supports both para-virtualization and full virtualization modes but is less widely used for various
reasons. KVM, a module in the Linux kernel, virtualizes CPU and memory resources, and each KVM VM runs as a process
on the Linux OS. When KVM is used, QEMU is required to virtualize I/O devices (such as NICs and disks). Unlike KVM,
Xen runs directly on hardware, and VMs run on Xen. VMs in Xen are classified into two types: the privileged VM
(Domain 0) and common VMs (Domain U). Domain 0 has permission to directly access hardware and manage common VMs,
and it must be started before other VMs. Domain U VMs are common VMs and cannot directly access hardware. All
operations on Domain U must be forwarded to Domain 0 through frontend and backend drivers. Domain 0 completes the
operations and returns the results to Domain U.

Server Virtualization: Storage Virtualization


⚫ In storage virtualization, the hypervisor intercepts I/O read and write instructions on the storage data plane, shields underlying
storage differences, and provides storage resources for guest OSs in a unified manner.

⚫ The significant difference between storage virtualization and compute virtualization is that storage virtualization aims to aggregate
resources as a pool, instead of dividing resources as compute virtualization does.
[Figure: compute virtualization versus storage virtualization. In compute virtualization, apps and OSs issue x86
instructions to a compute virtualization hypervisor, which maps them onto server hardware (CPU, memory, and I/O
devices). In storage virtualization, apps and OSs issue SCSI/iSCSI/NFS requests to a storage virtualization
hypervisor, which maps them onto a SAN (HDDs and SSDs).]

• The concept of storage virtualization varies in different scenarios. From the perspective of server
virtualization, storage virtualization provides storage technologies for disk mounting and file storage access of
VMs. From the perspective of storage arrays, functions such as heterogeneous device management and storage gateways
are also considered storage virtualization.

Server Virtualization: Network Virtualization


⚫ Network virtualization creates a virtual network connecting compute and storage units. For example, this network
allows communication between VMs on the same physical server and on different physical servers, and allows VMs to
access file systems.
⚫ Network virtualization requires the participation of virtual network elements (NEs). In the following figure, virtual
switches (vSwitches) set up a simple virtual network topology inside a server. In complex network virtualization
scenarios, virtual networks may contain more virtual NEs and a network controller.

[Figure: two servers, each with vSwitches that connect VMs (VM 0 to VM 3) to the server's physical NICs.]

• NIC (network interface card): a circuit board or card installed in a computer and connected to a network.

Server Virtualization: Virtual Network


⚫ The virtual network topology varies according to network requirements of VMs. The following figure shows a simple virtual network
topology. In a personal or small-scale virtualization system, VMs are bound to physical NICs using bridges or NAT. In a large-scale
enterprise virtualization system, VMs are connected to physical networks through vSwitches.

⚫ The virtual network provides VMs with various capabilities, such as Layer 2 communication, isolation, Quality of Service (QoS), and
port mirroring.
[Figure: three guest machines, each running Application A with its bins/libs on a guest OS, connect to the host
machine's network through a bridge, NAT, or a virtual switch.]

• For details about server network virtualization, see chapter 2 Network Virtualization.
Contents

1. Server Virtualization
▫ Background

▫ Technical Fundamentals
◼ Deployment

2. Network Virtualization

3. Introduction to FusionCompute

Virtualization Management Platform
Server cluster resource management and scheduling, VM operation and life cycle management.

[Figure: the virtualization management platform manages VMs running on multiple servers and provides the following
function groups.]
• Compute resource management: VM management, cluster management, host management, GPU management, ...
• Storage resource management: storage resource management, data storage management, disk management, file
management, ...
• Network resource management: distributed switch management, uplink management, port group management, security
group configuration, ...
Server Virtualization Deployment
⚫ Server virtualization requires a hypervisor and a virtualization management platform.
▫ A hypervisor is deployed directly on a physical server to create VMs.
▫ A virtualization management platform can be deployed as a VM on top of the hypervisor to manage all other VMs.

[Figure: (1) a hypervisor is deployed on each server to run service VMs; (2) the virtualization management platform
runs as a VM on one of the hypervisors; the servers connect through a physical switch.]
Virtualization management platforms and hypervisors of mainstream vendors include:
• VMware vCenter and ESXi
• Huawei VRM and CNA
Server Virtualization Topology
[Figure: server virtualization topology. A server's NICs (NIC 1 to NIC 5 and onboard NIC 6) connect through access
switches to the storage device (storage interfaces and management NICs of controllers A and B), with traffic
separated into the management plane, storage plane, service plane, and BMC plane.]

• Baseboard management controller (BMC) plane:

▫ Plane used by the BMC network port on a host. This plane enables remote
access to the BMC system of a server. It is similar to the management port
of a switch.

• Management plane:

▫ Plane used by the management system to manage all nodes in a unified manner and used by internal nodes for
communication.

• Storage plane:

▫ Network plane on which hosts communicate with storage units on storage devices.

• Service plane:

▫ Plane used by service data of user VMs.


Section Summary
⚫ A complete implementation of server virtualization requires multiple virtualization technologies working
together:
▫ Compute virtualization: includes CPU virtualization, memory virtualization, and I/O virtualization. Physical CPU
and memory resources of hosts are integrated into a compute resource pool, from which virtual CPU and memory
resources are allocated to provide computing capabilities for VMs.
▫ Storage virtualization: The virtualization layer is compatible with various storage types. Virtual storage space
provided by different storage types is integrated into storage resource pools and allocated to VMs as virtual
volumes.
▫ Network virtualization: provides VM NICs, virtual switches, and internal server networks to enable communication
between VMs and between VMs and external networks.
Contents

1. Server Virtualization

2. Network Virtualization
◼ Overview

▫ Fundamentals

3. Introduction to FusionCompute

Overview and Objectives
⚫ Network virtualization focuses on virtual network configuration and connections inside servers. Traditional
network engineers are often unaware of these internal networks and therefore cannot see traffic forwarding paths
from an end-to-end perspective.
⚫ In this section, you will learn the applications and fundamentals of network virtualization based on service
traffic forwarding paths from the perspective of network engineers.
Network Virtualization in DCs
⚫ In a DC, network virtualization mostly applies to the network layer and server layer.
⚫ Network virtualization at the network layer is classified into two types:
▫ Device virtualization: such as stacking, M-LAG, and virtual system.
▫ Network architecture virtualization: such as a large Layer 2 network built with VXLAN and BGP EVPN in the
spine-leaf architecture.
⚫ Network virtualization at the server layer: sets up virtual networks connecting virtual NEs inside servers to
implement network connectivity after server virtualization.
[Figure: spine and leaf switches form the network layer; inside a server, VM 1 and VM 2 connect to vSwitch 1,
which uses NIC 1.]
End-to-End Network Virtualization
⚫ Traffic is forwarded along the following path: VM -> vSwitch -> physical NIC (based on mappings) -> physical
switch -> destination device. This process involves three phases: virtual access, network connection, and network
switching.

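[Figure: end-to-end path between VMs on Host 1 and Host 2. Virtual access: VMs connect through vNICs to the vSwitch
on each host. Network connection: each vSwitch connects to the physical network through the host's physical NIC.
Network switching: traffic between the hosts crosses the physical network through a VXLAN tunnel.]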

• vNIC: virtual network interface card (vNIC) of a VM.

• VXLAN separates the overlay network from the underlay network so that the physical network can be reused. In this
case, the physical switches provide the virtualization capability.
From a VM to a vSwitch

⚫ A vSwitch inside a server identifies the source VM of each data packet.
⚫ A vSwitch provides virtual interfaces for VM access.
⚫ A vSwitch uses local VLANs to distinguish traffic of different VMs.
⚫ Local VLANs take effect only between a vSwitch and a VM.
[Figure: on Host 1, three VMs connect through their vNICs to the vSwitch in local VLANs 4, 5, and 6; the vSwitch
connects to the physical NIC.]
From a vSwitch to a Physical NIC

⚫ When forwarding a packet received from a VM to a physical NIC, a vSwitch removes the local VLAN tag from the
packet and adds a new VLAN tag that identifies the VLAN to which the physical interface corresponding to the
physical NIC belongs.
⚫ In most cases, packets from different network segments are encapsulated with different VLAN tags.
⚫ Allowed VLANs, bonding mode, and vSwitch connection mode can be configured for physical NICs on servers.
[Figure: on Host 1, VMs use local VLANs 4, 5, and 6 on the vSwitch; on the physical NIC, traffic is tagged with
VLAN 100 (10.10.10.0/24) or VLAN 200 (10.10.20.0/24).]

• Bond: The Linux NIC bonding function is used to bond host network ports to
improve network reliability.
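The retagging step performed by the vSwitch on the slide above can be sketched at the byte level. This is an illustrative model, not vSwitch source code: it assumes single-tagged IEEE 802.1Q frames, and the mapping of local VLANs 4 and 5 to uplink VLANs 100 and 200 is a hypothetical example, because the figure does not state which local VLAN maps to which uplink VLAN.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def retag(frame: bytes, local_to_uplink: dict) -> bytes:
    """Replace the local VLAN ID of a single-tagged frame with its uplink VLAN ID."""
    tpid, tci = struct.unpack_from("!HH", frame, 12)  # tag follows dst + src MAC (12 bytes)
    if tpid != TPID_8021Q:
        raise ValueError("frame does not carry an 802.1Q tag")
    local_vid = tci & 0x0FFF
    uplink_vid = local_to_uplink[local_vid]            # KeyError: VLAN not allowed on uplink
    new_tci = (tci & 0xF000) | uplink_vid              # keep priority (PCP) and DEI bits
    return frame[:14] + struct.pack("!H", new_tci) + frame[16:]


# Hypothetical mapping: local VLAN -> VLAN used on the physical NIC
mapping = {4: 100, 5: 200}

frame = (bytes.fromhex("ffffffffffff") + bytes.fromhex("020000000001")
         + struct.pack("!HH", TPID_8021Q, (3 << 13) | 4)   # PCP 3, local VLAN 4
         + struct.pack("!H", 0x0800) + b"payload")
out = retag(frame, mapping)
print(struct.unpack_from("!H", out, 14)[0] & 0x0FFF)  # 100
```

Frames arriving from the physical NIC would need the reverse mapping before they are delivered to the VM.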
From a Physical NIC to an Access Switch

[SW1] interface GigabitEthernet 1/0/1.1 mode l2
[SW1-GigabitEthernet1/0/1.1] encapsulation dot1q vid 200
[SW1-GigabitEthernet1/0/1.1] bridge-domain 2000
⚫ An access switch connects to the physical NICs of a server.
⚫ In the VXLAN scenario, traffic from different VLANs is forwarded through different Layer 2 sub-interfaces and
directed to different VXLAN tunnels. In the configuration above, Layer 2 sub-interface GigabitEthernet1/0/1.1
receives frames tagged with VLAN 200 and places them in bridge domain (BD) 2000.
⚫ Access switches can set up a stack or an M-LAG system to improve reliability.
⚫ If link aggregation is required, the LACP mode set on the access switch must be the same as that on the
physical NICs of the server.
[Figure: Host 1 with local VLANs 4, 5, and 6 on the vSwitch and VLANs 100 (10.10.10.0/24) and 200 (10.10.20.0/24)
on the physical NIC, which connects to the access switch.]
From the Local Access Switch to the Remote Switch

⚫ VXLAN and BGP EVPN are used between switches to build a large Layer 2 network.
⚫ On the control plane, BGP EVPN is used to transmit IP and MAC addresses of VMs, establish VXLAN tunnels, and
import external routes.
⚫ On the data plane, VXLAN tunnels are used to forward traffic.
[Figure: the access switches of two hosts exchange BGP EVPN routes and forward traffic through a VXLAN tunnel.]

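On the data plane, a VTEP wraps each Ethernet frame in a VXLAN header and sends it through the tunnel in a UDP datagram. The sketch below builds and parses the 8-byte VXLAN header defined in RFC 7348. The sub-interface VLAN 200 and BD 2000 come from the access switch configuration earlier in this section; the BD-to-VNI mapping (BD 2000 to VNI 5000) is a hypothetical value, and the outer Ethernet, IP, and UDP headers are omitted.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned UDP destination port for VXLAN
FLAG_I = 0x08           # "VNI valid" flag in the first header byte

SUBIF_VLAN_TO_BD = {200: 2000}   # from the sub-interface configuration
BD_TO_VNI = {2000: 5000}         # hypothetical BD-to-VNI mapping


def vxlan_header(vni: int) -> bytes:
    """8-byte header: flags(1) + reserved(3) + VNI(3) + reserved(1)."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", FLAG_I << 24, vni << 8)


def encapsulate(vlan: int, inner_frame: bytes) -> bytes:
    """Return the UDP payload a VTEP sends for a frame received in `vlan`."""
    vni = BD_TO_VNI[SUBIF_VLAN_TO_BD[vlan]]
    return vxlan_header(vni) + inner_frame


def decapsulate(udp_payload: bytes) -> tuple:
    """Return (VNI, inner Ethernet frame) from a received VXLAN UDP payload."""
    flags_word, vni_word = struct.unpack_from("!II", udp_payload, 0)
    if not (flags_word >> 24) & FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8, udp_payload[8:]


payload = encapsulate(200, b"<inner ethernet frame>")
print(payload[:8].hex())         # 0800000000138800
print(decapsulate(payload)[0])   # 5000
```

The 24-bit VNI field is what allows VXLAN to identify about 16 million segments, compared with the 12-bit VLAN ID of 802.1Q.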
New Traffic Model: Host Overlay

⚫ VMs on a host use Open vSwitches (OVSs) to differentiate networks. VXLAN tunnels are established between the OVSs
on different hosts, which act as VTEPs, to set up the large Layer 2 network required for VM communication. Hardware
switches only provide connectivity and therefore require only traditional network configuration.
⚫ This traffic model is typical of host overlay networking.
[Figure: on Host 1 and Host 2, VMs connect through vNICs to the vSwitch, which acts as a VTEP; a VXLAN tunnel
between the vSwitches runs over the traditional network that connects the hosts' physical NICs.]
Contents

1. Server Virtualization

2. Network Virtualization
▫ Overview
◼ Fundamentals

3. Introduction to FusionCompute

Server OS Basics
⚫ Compared with network devices, the server OS network works in a different way but still follows the OSI model.
⚫ Use Linux as an example. Linux consists of the user space and kernel space, also referred to as user mode and
kernel mode, respectively. Simply put, the user space is where application programs run, whereas the kernel space
controls hardware resources to support program running in the user space. The network protocol stack runs in the
Linux kernel space.
[Figure: Linux networking components mapped to the OSI layers.]
• Application, presentation, and session layers (application program, user space): data is encoded, encrypted, and
compressed by applications, layer by layer, without interaction with the kernel space.
• Transport and network layers (network protocol stack, kernel space): data is encapsulated by the network protocol
stack, layer by layer, according to the TCP/IP model, without interaction with applications in the user space.
• Data link layer (NIC driver, kernel space): the NIC driver instructs the NIC to transmit data frames.
• Physical layer (NIC, hardware): bit streams are transmitted.
How Does a Server NIC Send and Receive Data?
⚫ A physical NIC sends and receives data as follows (when the CPU performs data copying):
▫ Sending data: The kernel reads data from the network protocol stack and writes it to the physical NIC. The NIC
then sends the data to the destination external network.
▫ Receiving data: Upon receiving data, the physical NIC triggers an interrupt to the CPU, which then instructs the
kernel to read the data and place it in memory. The network protocol stack then parses the data.
⚫ The NIC driver needs to register the physical NIC in the kernel space so that the NIC can function properly.
After registration, a NIC interface name is available.
⚫ Interface properties, such as the IP address and mask, can be set for a physical NIC. These properties are
configured in the network protocol stack of the kernel space.
⚫ A physical NIC connects to the network protocol stack in the kernel space on one end and to an external network
on the other end.
[Figure: application program (user space); network protocol stack, buffer, and driver (kernel space); and the NIC
(hardware) connected to the external network.]

• Many server NICs now provide the Direct Memory Access (DMA) function, which allows data to be written directly
to memory without passing through the CPU. As such, the NIC itself handles data transfer to and from the memory
used by the network protocol stack. After the DMA transfer is complete, the DMA controller (DMAC) triggers an
interrupt to the CPU to indicate completion. In this process, the CPU does not need to read or write the data.
Linux Virtual Network Devices (TUN/TAP)
⚫ The kernel can create virtual NICs (vNICs), which are similar to physical NICs, and provide NIC drivers for these vNICs to complete
registration.
⚫ TAP and TUN are vNICs defined in the Linux kernel. TUN reads and writes Layer 3 IP packets whereas TAP reads and writes Layer 2
Ethernet frames.
⚫ A vNIC connects to the user space on one end and connects to the network protocol stack on the other end. Therefore, vNICs can
neither directly send data packets to nor directly receive data packets from physical NICs.

[Figure: App 1 and App 2 in the user space exchange Layer 3 packets with the kernel network stack through TUN
devices, and VM 1 exchanges Layer 2 frames through a TAP device. The network stack carries traffic between apps,
app-to-external network traffic, and VM-to-external network traffic, the latter two through the NIC to the
external network.]

• TUN and TAP are two types of vNICs in a Linux system and provide packet
reception and transmission functions. Compared with physical NICs, TUN and
TAP provide almost the same functions, except that they do not provide the
hardware functions of physical NICs. In addition, TUN and TAP are responsible
for transferring data between the user space and the network protocol stack in
the kernel space.

• In Linux, TUN and TAP devices are created by opening the character device /dev/net/tun; the resulting network
interfaces are named tunX and tapX, respectively.

• TAP devices are usually used to connect to network devices, such as vSwitches.
TUN devices are usually used to re-encapsulate data sourced from application
programs in the user space, for example, encapsulating data using IPsec VPN.

• For more information, see https://www.kernel.org/doc/html/latest/networking/tuntap.html.
vNIC vs Physical NIC
⚫ The kernel allows the same configurations on vNICs as on physical NICs.
⚫ For example, a vNIC can be configured with a MAC address, an IP address, and a subnet mask.
⚫ Physical NICs and vNICs transfer data in different ways. Physical NICs transfer data as bit streams
whereas vNICs copy data to and from the memory.
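To see a vNIC from the user-space side, the TUN/TAP procedure in the Linux kernel documentation can be sketched: open /dev/net/tun and issue the TUNSETIFF ioctl with a struct ifreq. The sketch assumes a Linux host and root privileges (or CAP_NET_ADMIN); the constants are taken from <linux/if_tun.h>, and the helper names are hypothetical. Building the request is kept separate from opening the device so the packing logic can be checked without privileges.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h>. TUNSETIFF is _IOW('T', 202, int), which is 0x400454CA
# on x86-64 and arm64; some architectures (for example PowerPC, MIPS) encode it differently.
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001     # Layer 3 device: reads/writes IP packets
IFF_TAP = 0x0002     # Layer 2 device: reads/writes Ethernet frames
IFF_NO_PI = 0x1000   # do not prepend the 4-byte packet-information header


def make_ifreq(name: str, flags: int) -> bytes:
    """Pack the leading fields of struct ifreq: 16-byte name + short flags."""
    return struct.pack("16sH", name.encode(), flags)


def open_tap(name: str = "tap0") -> tuple:
    """Create (or attach to) a TAP device; returns (fd, interface name). Needs root."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    result = fcntl.ioctl(fd, TUNSETIFF, make_ifreq(name, IFF_TAP | IFF_NO_PI))
    return fd, result[:16].rstrip(b"\0").decode()


# Usage (as root): fd, ifname = open_tap("tap0"); after `ip link set tap0 up`,
# each os.read(fd, 2048) returns one Ethernet frame that the kernel sent to tap0,
# and os.write(fd, frame) injects a frame into the kernel network stack.
```

This illustrates the point on the slide: the kernel side of tap0 is configured like any NIC, while the "wire" is a file descriptor whose data is copied to and from user-space memory.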

SR-IOV: Improves I/O Performance
⚫ Single Root I/O Virtualization and Sharing Specification (SR-IOV) is a hardware-based virtualization solution that improves
performance and scalability.

⚫ SR-IOV enables efficient sharing of a physical Peripheral Component Interconnect Express (PCIe) device among VMs.
The physical PCIe device can present itself as multiple virtual devices, each of which is directly attached to a VM
and has its own memory space, queues, interrupts, and command execution capability. As such, the physical PCIe
device can perform direct I/O with the attached VMs, achieving I/O performance comparable to native performance.

[Figure: a regular NIC, where VM traffic passes through the VMM and vSwitch to the physical NIC; a VMDq NIC, where
the physical NIC provides separate queues; and an SR-IOV NIC, where the physical NIC presents vNICs directly to
VMs.]

• SR-IOV requires special hardware.

• VMDq: virtual machine device queue.
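On Linux, the virtual functions (VFs) of an SR-IOV NIC are typically created through sysfs: the PCI device of the NIC exposes sriov_totalvfs (the maximum number of VFs supported) and sriov_numvfs (the number currently enabled). The sketch below assumes a Linux host with an SR-IOV-capable NIC and driver, and root privileges; the helper name is hypothetical, and the sysfs root is a parameter so that the logic can be exercised against a scratch directory.

```python
from pathlib import Path


def set_num_vfs(ifname: str, num_vfs: int,
                sysfs_net: Path = Path("/sys/class/net")) -> int:
    """Enable `num_vfs` SR-IOV virtual functions on the NIC behind `ifname`."""
    dev = sysfs_net / ifname / "device"
    total = int((dev / "sriov_totalvfs").read_text())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"{ifname} supports at most {total} VFs")
    numvfs_file = dev / "sriov_numvfs"
    current = int(numvfs_file.read_text())
    if current not in (0, num_vfs):
        # The kernel refuses to change a non-zero VF count directly; disable VFs first.
        numvfs_file.write_text("0")
    numvfs_file.write_text(str(num_vfs))
    return num_vfs
```

For example, set_num_vfs("eth0", 4) would request four VFs, each of which appears as its own PCIe function that can be attached directly to a VM.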


Smart NIC
⚫ Smart NICs integrate wired networking and compute resources. They offload vSwitch functions from server CPUs,
thus freeing up CPU compute resources to drive application performance. In this way, smart NICs expand NIC
functions and provide higher performance.
⚫ Smart NICs can also offload network virtualization protocols, such as Virtual Extensible LAN (VXLAN) and Network
Virtualization using Generic Routing Encapsulation (NVGRE). They support the SR-IOV function as well.

[Figure: the vSwitch moves from the host CPU onto the smart NIC (FPGA/ASIC/SoC), which uses dedicated chips for
data forwarding and releases the host's CPU resources.]

• SoC: System-on-Chip.

• FPGA: Field Programmable Gate Array.

• NVGRE: Network Virtualization using Generic Routing Encapsulation.


Introduction to Linux Bridges
⚫ A Linux Bridge is a virtual network device that works at Layer 2 in a Linux system. Linux Bridge devices are
typically named with the prefix br, such as br0.
⚫ Ethernet network devices, such as TAP devices and physical NICs, can be added to a Linux Bridge as interfaces.
Interfaces on a Linux Bridge receive only Layer 2 data frames and pass all received frames to the Linux Bridge.
⚫ Similar to a switch, a Linux Bridge supports various functions, such as MAC address learning, STP, and VLAN.

[Figure: VM 0 and VM 1 connect through TAP 0 and TAP 1 to Linux Bridge br0, which connects through eth0 to a
hardware switch.]

• Similar to a physical switch, a Linux Bridge looks up the outbound port for a data frame in the MAC address
table and updates the table. As such, a Linux Bridge can decide whether to forward the data frame to another
interface, discard it, broadcast it, or send it to the upper-layer protocol stack.

• When Linux Bridges are used to set up virtual networks, bridge_netfilter of Linux
Bridges works with iptables to implement the security group function in the cloud
computing scenario.

• For more information, see https://wiki.linuxfoundation.org/networking/bridge.

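The forwarding decision described above (learn the source, then forward, flood, or discard) can be sketched as a minimal learning bridge. This is a simplified model, not the Linux bridge code: it ignores MAC aging, STP, VLAN filtering, and delivery to the local protocol stack. The port names mirror the figure (TAP 0, TAP 1, and eth0 on br0); the MAC addresses are made up.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class LearningBridge:
    ports: set
    mac_table: dict = field(default_factory=dict)   # MAC address -> port

    def receive(self, in_port: str, src_mac: str, dst_mac: str) -> set:
        """Return the set of ports the frame is sent out of."""
        self.mac_table[src_mac] = in_port        # learn where the source lives
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out_port is None:
            return self.ports - {in_port}        # flood broadcast/unknown destinations
        if out_port == in_port:
            return set()                         # discard: destination is on the same port
        return {out_port}                        # forward to the learned port


br0 = LearningBridge({"tap0", "tap1", "eth0"})
print(sorted(br0.receive("tap0", "52:54:00:00:00:01", "52:54:00:00:00:02")))  # ['eth0', 'tap1']
print(br0.receive("tap1", "52:54:00:00:00:02", "52:54:00:00:00:01"))  # {'tap0'}
```

The first frame floods because the destination is still unknown; the reply is forwarded only to tap0 because the bridge learned VM 0's address from the first frame.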

Introduction to OVS
⚫ Open vSwitch (OVS) is an open-source virtual switch used on virtualization platforms. It supports OpenFlow and
tunneling technologies such as GRE, VXLAN, and IPsec, and provides comprehensive functions in network security,
monitoring, management, and QoS.
⚫ OVS can be deployed across multiple physical servers.
⚫ Compared with a Linux Bridge, an OVS gains popularity in virtualization and cloud computing scenarios due to its
rich functions.

[Figure: OVS as the vSwitch on Host 1 in a single-server environment, connecting VMs and providing the following
capabilities.]
• Security: VLAN isolation and traffic filtering
• Monitoring: NetFlow, sFlow, SPAN, and RSPAN
• QoS: traffic queuing and traffic shaping
• Automated control: OpenFlow and the OVSDB management protocol

• Open vSwitch (OVS) is a software-based open-source virtual switch released under the Apache 2.0 license. It
supports multiple standard management interfaces and protocols, such as NetFlow, sFlow, SPAN, Remote Switched Port
Analyzer (RSPAN), command-line interface (CLI), LACP, and 802.1ag. It can be deployed across multiple physical
servers (similar to vSwitch from VMware and Nexus 1000V from Cisco). OVS supports the OpenFlow protocol and can be
integrated with multiple open-source virtualization platforms.

• OVS supports but is not limited to the following features:

▫ Supports traffic monitoring protocols, such as NetFlow, IPFIX, sFlow, and SPAN/RSPAN.

▫ Supports fine-grained ACL and QoS policies.

▫ Supports port bonding, LACP, and tunneling (VXLAN, GRE, and IPsec).

▫ Supports the standard 802.1Q VLAN protocol.

▫ Supports VM interface-based traffic management policies.

• For more information, see https://docs.openvswitch.org/en/latest/intro/what-is-ovs/.
Introduction to DVS
⚫ A distributed virtual switch (DVS) is an abstract representation of the virtual switches on multiple hosts that
share the same name, network policies, and attributes. OVS can be used to implement a DVS.
⚫ A DVS lets VMs maintain consistent network configuration and policy as they migrate across multiple hosts.

[Figure: VMs (apps, OSs, and vNICs) on two servers connect to one distributed virtual switch (OVS) spanning the
hypervisors of both servers.]
• A DVS acts as a virtual switch across hosts.
• A DVS provides VMs with a consistent network experience regardless of their physical host locations.
• On each host, a DVS is implemented as an OVS (if OVS is used).
DVS Fundamentals
⚫ Key concepts in DVS:
▫ Distributed port group: provides VMs with network connections that span hosts. A DVS can have multiple
distributed port groups.
▫ Uplink: At the host level, each uplink is connected to a physical NIC. Uplinks are used to configure physical
connections of hosts.
▫ Uplink port group: can have one or more uplinks. A DVS can have only one uplink port group.

[Figure: a DVS spanning Host 1 and Host 2. On each host, VMs connect to ports of distributed port groups 1 and 2,
and the uplink port group (uplinks 1 and 2) connects to NIC 1 and NIC 2, which connect to the ToR switch.]

• A DVS provides functions similar to those of a physical switch. Each host is connected to the DVS. A DVS connects to VMs through distributed port groups and to the physical Ethernet adapters of the hosts where the VMs reside. Because it connects both hosts and VMs, a DVS enables communication between virtual and physical networks.

• A DVS functions as a single virtual switch across all associated hosts. It allows
VMs to maintain consistent network configuration as they migrate across hosts.
How Do DVSs Allow VMs to Communicate?
⚫ DVS 1 and DVS 2 span Host 1 and Host 2 and connect to the ToR switch through their uplinks. Each DVS has two port groups, one in VLAN 10 and one in VLAN 20. The gateways for VMs in the two VLANs are located on the ToR switch.

⚫ VMs on the same DVS and in the same VLAN communicate with each other directly through the DVS. VMs on the same DVS but in different VLANs, as well as VMs on different DVSs, can communicate with each other only through a physical switch.
[Figure: DVS 1 has port group 1 (VLAN 10) and port group 2 (VLAN 20); DVS 2 has port group 3 (VLAN 10) and port group 4 (VLAN 20). VMs 1 to 7 are distributed across Host 1 and Host 2, whose uplinks connect to the ToR switch. The ToR switch provides VLANIF 10 (10.10.10.0/24) and VLANIF 20 (10.10.20.0/24).]

• As shown in the figure, DVS 1 has two port groups: port group 1 connecting VM
1, VM 2, and VM 5 and port group 2 connecting VM 4 and VM 6. DVS 2 also has
two port groups: port group 3 connecting VM 3 and port group 4 connecting VM
7. The two DVSs have separate uplinks.

• VMs connected to the same port group can directly communicate with each
other. VMs on the same host and DVS can communicate with each other directly
through the DVS. For example, VM 1 and VM 2 can communicate through DVS 1.
VMs on different hosts but on the same DVS, such as VM 1 and VM 5, can
communicate with each other through uplinks of the DVS.

• VMs connected to different port groups (that is, in different VLANs), regardless of whether they are located on the same DVS or host, can communicate with each other only through the physical switch that provides inter-VLAN routing. For example, VM 1 in VLAN 10 can communicate with VM 4, VM 6, and VM 7 in VLAN 20 only through the ToR switch.

• Traffic between VMs that are on different DVSs but connected to port groups in the same VLAN (such as VM 1 and VM 3, connected to port groups 1 and 3, or VM 4 and VM 7, connected to port groups 2 and 4) is sent through the DVS uplinks to the physical switch for forwarding.
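The forwarding rules above can be summarized as a small decision function. This is a minimal Python sketch, assuming a hypothetical `VmPort` record (host, DVS, port group, VLAN) for each VM NIC. It models only the path classes described in the text, not any real DVS implementation or API, and the host placement in the example is assumed because the figure does not fix it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmPort:
    """Hypothetical record of where a VM NIC attaches (illustrative only)."""
    vm: str
    host: str
    dvs: str
    port_group: str
    vlan: int


def forwarding_path(a: VmPort, b: VmPort) -> str:
    """Classify the path between two VMs following the DVS rules in the text."""
    if a.vlan != b.vlan:
        # Different VLANs: inter-VLAN routing through the gateway on the ToR switch.
        return "ToR switch, Layer 3 (VLANIF gateway)"
    if a.dvs != b.dvs:
        # Same VLAN but different DVSs: out through the DVS uplinks to the ToR switch.
        return "DVS uplinks, ToR switch Layer 2"
    if a.host == b.host:
        # Same DVS, same VLAN, same host: switched locally by the DVS.
        return "DVS, local to the host"
    # Same DVS and VLAN on different hosts: carried over the DVS uplinks.
    return "DVS uplinks between hosts"


if __name__ == "__main__":
    # Host placement is assumed for illustration.
    vm1 = VmPort("VM 1", "Host 1", "DVS 1", "port group 1", 10)
    vm2 = VmPort("VM 2", "Host 1", "DVS 1", "port group 1", 10)
    vm5 = VmPort("VM 5", "Host 2", "DVS 1", "port group 1", 10)
    vm4 = VmPort("VM 4", "Host 2", "DVS 1", "port group 2", 20)
    vm3 = VmPort("VM 3", "Host 1", "DVS 2", "port group 3", 10)
    for peer in (vm2, vm5, vm4, vm3):
        print(f"VM 1 to {peer.vm}: {forwarding_path(vm1, peer)}")
```

The VLAN check comes first because a VLAN mismatch always forces the traffic through the ToR gateway, whichever DVS or host the VMs are on.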
OVS Application in Virtualization and Cloud Computing Scenarios

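[Figure: On Host 1 and Host 2, each VM connects through a qbr bridge to br-int. br-int connects to br-phy, which uses the Bond interface toward the physical external network, and to br-tun, which uses the tunnel-bearing interface to carry VXLAN traffic; the host NICs are eth1 and eth2, and both hosts connect to the ToR switch. Traffic types shown: Layer 2 traffic between VMs on the same compute node; Layer 2 traffic between VMs on different compute nodes through VXLAN; traffic from VMs to the physical external network.]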

• qbr: Linux Bridge, which provides security group services for VMs and implements
security isolation.

• br-int: the integration bridge, one of the OVS core bridges. Both Layer 2 and Layer 3 traffic pass through this bridge. It uses local VLANs, which take effect only on the local host, to isolate different virtual networks on the host.

• br-phy: a physical bridge, one of the OVS core bridges. Physical NICs of a node
are mounted on this bridge. It encapsulates traffic of different service VLANs
based on the flow table, and then sends encapsulated packets to physical
external networks through physical NICs.

• br-tun: a tunnel bridge, one of the OVS core bridges, used to forward VXLAN traffic. The tunnel-bearing interface acts as a VTEP that encapsulates and decapsulates VXLAN packets.

• Bond: NIC bonding provided by Linux. It bonds the NIC ports on a host to
improve network reliability.
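A comparable bridge layout can be built with `ovs-vsctl`. The Python sketch below only assembles the command lines and does not run them. The bridge names follow the figure. The patch-port names, the tunnel port name `vxlan-1`, the NIC name, and the VTEP addresses are hypothetical. The qbr Linux bridge and the Bond are omitted.

```python
import shlex


def patch_pair(br_a: str, port_a: str, br_b: str, port_b: str) -> list[list[str]]:
    """Two ovs-vsctl commands creating patch ports that link br_a and br_b."""
    return [
        ["ovs-vsctl", "add-port", br_a, port_a, "--", "set", "interface", port_a,
         "type=patch", f"options:peer={port_b}"],
        ["ovs-vsctl", "add-port", br_b, port_b, "--", "set", "interface", port_b,
         "type=patch", f"options:peer={port_a}"],
    ]


def ovs_layout(local_vtep: str, remote_vtep: str, uplink_nic: str) -> list[list[str]]:
    """Commands for a br-int / br-tun / br-phy layout on one compute node."""
    cmds = [["ovs-vsctl", "--may-exist", "add-br", br]
            for br in ("br-int", "br-tun", "br-phy")]
    # br-int is the hub: it links to br-tun (VXLAN) and br-phy (physical network).
    cmds += patch_pair("br-int", "patch-tun", "br-tun", "patch-int")
    cmds += patch_pair("br-int", "int-br-phy", "br-phy", "phy-br-int")
    # VXLAN port on br-tun; OVS encapsulates and decapsulates VXLAN on this port.
    cmds.append(["ovs-vsctl", "add-port", "br-tun", "vxlan-1", "--", "set",
                 "interface", "vxlan-1", "type=vxlan",
                 f"options:local_ip={local_vtep}", f"options:remote_ip={remote_vtep}"])
    # The physical NIC toward the external network is attached to br-phy.
    cmds.append(["ovs-vsctl", "add-port", "br-phy", uplink_nic])
    return cmds


if __name__ == "__main__":
    for cmd in ovs_layout("192.0.2.1", "192.0.2.2", "eth1"):
        print(shlex.join(cmd))
```

Running the printed commands requires an Open vSwitch installation. In a deployment like the one in the figure, `local_ip` would normally be the address configured on the tunnel-bearing interface.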
Section Summary
⚫ This section uses the Linux OS as an example to describe how traffic is forwarded on the underlying network when server virtualization is deployed.
⚫ This section focuses on the functions and principles of virtual network devices in a Linux system, such as TAP/TUN, Linux Bridge, and OVS.
Contents

1. Server Virtualization

2. Network Virtualization

3. Introduction to FusionCompute

Introduction to FusionCompute Virtualization Suite
⚫ Huawei FusionCompute virtualization suite is an industry-leading virtualization solution.
⚫ The FusionCompute virtualization suite deploys virtualization software on servers so that one physical server can function as multiple servers. By consolidating existing workloads at a high consolidation ratio and using the freed-up servers to deploy new applications and solutions, it greatly improves the efficiency of data center infrastructure.
⚫ The FusionCompute virtualization suite brings the following benefits to customers:
 Improves the resource utilization of data center infrastructure.
 Shortens the service rollout period several-fold.
 Reduces data center energy consumption several-fold.
 Thanks to the high availability and strong recovery capability of the virtualized infrastructure, recovers services from faults quickly and automatically, reducing data center costs and increasing system application uptime.

• Application scenario: Enterprises use FusionCompute as the unified O&M management platform to operate and maintain the entire system, including resource monitoring, resource management, and system management.

• FusionCompute virtualizes hardware resources and centrally manages virtual resources, service resources, and user resources. It uses virtual computing, virtual storage, and virtual network technologies to virtualize computing, storage, and network resources, and then schedules and manages these virtual resources centrally through a unified interface, reducing service operating costs and ensuring system security and reliability.

• This section describes the features of the FusionCompute virtual network. For details about other features and scenarios, see the FusionCompute product documentation.
FusionCompute Architecture
⚫ The following figure shows the logical architecture of the FusionCompute virtualization suite.
[Figure: The FusionCompute virtualization suite consists of FusionCompute (mandatory) and the optional eBackup and UltraVR components. FusionCompute includes host virtualization, a virtualized file system, a virtualized network (OVS), and container management (KRM). The suite runs on hardware infrastructure (servers, storage, and network & security devices) that is not part of the suite.]

• FusionCompute is cloud operating system software that virtualizes hardware resources and centrally manages virtual resources, service resources, and user resources. It uses virtual computing, virtual storage, and virtual network technologies to virtualize computing, storage, and network resources.

• eBackup is virtualization backup software. It works with the snapshot and changed block tracking (CBT) functions of FusionCompute to back up VM data. (eBackup does not support virtualization deployment in Haiguang scenarios.)
• UltraVR is disaster recovery (DR) service management software. It uses the asynchronous remote replication feature provided by the underlying SAN storage system to protect and restore key VM data.

• Container management: manages Kubernetes clusters and nodes, content libraries, projects, and container images.

• Note: FusionCompute is mandatory, and eBackup and UltraVR are optional. This
section describes only mandatory components.
FusionCompute Logical Architecture

VRM functions:
• Manages block storage resources in a cluster.
• Manages network resources (IP addresses and VLANs) in a cluster and allocates IP addresses to VMs.
• Manages the life cycle of VMs in a cluster and the distribution and migration of VMs on compute nodes.
• Manages dynamic resource adjustment in a cluster.
• Manages virtual resources and user data in a unified manner and provides services such as elastic computing, storage, and IP addresses.
• Provides a unified O&M management interface that O&M personnel access remotely through the WebUI to perform O&M on the entire system, including resource management, resource monitoring, and resource reporting.
CNA functions:
• Provides the virtual computing function.
• Manages VMs on compute nodes.
• Manages computing, storage, and network resources on compute nodes.
[Figure: FusionCompute logical architecture showing VRM nodes, a REST interface, and CNA nodes.]

• Virtual Resource Management (VRM): functions as a unified O&M management platform and manages multiple CNA hosts.

• Computing Node Agent (CNA): deployed on a computing node, manages VMs on the computing node and mounts the VMs to the corresponding virtual volumes.
FusionCompute Virtual Network Management
⚫ As shown in the figure, the virtual NIC of a VM is connected to the DVS through a port group, and then connected to the physical
NIC of the host through the uplink of the DVS. In this way, the VM can communicate with the external network environment.

[Figure: VM1 to VM6 on two servers connect to port groups in VLAN 100, VLAN 200, and VLAN 300 of a distributed virtual switch that spans both servers. The DVS uplinks map to the physical NICs of the servers, which connect to a physical switch.]

• This section describes how to create network resources, such as distributed virtual switches (DVSs) and port groups, and how to adjust and configure network resources on FusionCompute.

• A DVS is a virtual switch that functions like a Layer 2 physical switch. It connects to VMs through port groups and connects to physical networks through uplinks.

• A port group is a virtual logical port. Similar to a network attribute template, a port group defines how VM NICs connect to the network through a DVS.

▫ VLAN mode: No IP address is allocated to the VM NICs that use the port group, so you need to allocate IP addresses to these NICs manually. The VMs are connected to the VLAN defined by the port group.

▫ MUX VLAN mode: The MUX VLAN provides a Layer 2 traffic isolation mechanism that allows some users to communicate with each other while isolating other users.

• Uplinks are used by the DVS to connect to the physical NICs of hosts and are
used for VM data uplinks.
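The isolation behavior of MUX VLAN mode can be expressed as a reachability rule. The sketch below assumes the MUX VLAN model used on Huawei switches: a principal VLAN plus subordinate group VLANs and separate VLANs. How FusionCompute port groups map onto these roles is not described here, and the `MuxPort` type and VLAN IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal

Role = Literal["principal", "group", "separate"]


@dataclass(frozen=True)
class MuxPort:
    """Hypothetical port in a MUX VLAN, with its role and VLAN ID."""
    name: str
    role: Role
    vlan: int


def can_communicate(a: MuxPort, b: MuxPort) -> bool:
    """Layer 2 reachability between two ports under MUX VLAN rules."""
    # Ports in the principal VLAN can reach every port in the MUX VLAN.
    if a.role == "principal" or b.role == "principal":
        return True
    # Ports in the same subordinate group VLAN can reach each other.
    if a.role == "group" and b.role == "group":
        return a.vlan == b.vlan
    # Separate-VLAN ports reach only principal ports, and different
    # group VLANs are isolated from each other.
    return False


if __name__ == "__main__":
    gateway = MuxPort("gateway", "principal", 100)
    web1, web2 = MuxPort("web1", "group", 101), MuxPort("web2", "group", 101)
    guest1, guest2 = MuxPort("guest1", "separate", 102), MuxPort("guest2", "separate", 102)
    print(can_communicate(web1, web2), can_communicate(guest1, guest2),
          can_communicate(guest1, gateway))
```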
VM Provisioning (1)
Creating an empty VM
• An empty VM is like a blank physical computer without an operating system installed.
• When creating an empty VM, you can create it on a host or in a cluster and customize the CPU, memory, disk, and NIC specifications.
• After an empty VM is created, you need to install an OS on it. The OS installation procedure is the same as that for a physical machine.
Creating a VM using a template
Use a template to create VMs similar to the template:
• Use an existing template to create VMs, either by converting the template to a VM or by deploying VMs from the template.
• Import a template exported from another site to create VMs at this site.
When a template is converted to a VM, all attributes of the VM are the same as those of the template, and the template no longer exists after the conversion.
When a VM is deployed from a template or imported using a template, the following attributes are inherited from the template, and other attributes can be customized:
• VM OS type and version
• Number, capacity, and bus type of VM disks
• Number of VM NICs
Creating a VM using a VM
Clone a VM similar to an existing VM in the system. During cloning, the following attributes are inherited from the original VM, and other attributes can be customized:
• VM OS type and version
• Number, capacity, and bus type of VM disks
• Number of VM NICs
If you want to clone a VM frequently, you can set it as a template.

• Like a physical computer, a virtual machine (VM) runs an operating system and applications.

• A VM runs on a CNA and obtains the resources it needs, such as CPUs, memory, USB devices, network connections, and storage access, from the CNA. Multiple VMs can run on one CNA at the same time. FusionCompute provides multiple methods for creating VMs.
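The attribute rules for template-based creation and cloning can be captured in a few lines. This Python sketch assumes a hypothetical `VmSpec` record. The CPU and memory fields stand in for the "other attributes" that may be customized, and nothing here reflects the actual FusionCompute API.

```python
from dataclasses import dataclass, replace

# Attributes the text lists as inherited when deploying from a template,
# importing a template, or cloning a VM.
INHERITED = frozenset({"os_type", "os_version", "disks", "nic_count"})


@dataclass(frozen=True)
class VmSpec:
    """Hypothetical VM/template specification (field names are illustrative)."""
    name: str
    os_type: str
    os_version: str
    disks: tuple  # one (capacity_gb, bus_type) pair per disk
    nic_count: int
    vcpus: int
    memory_gb: int


def derive_vm(source: VmSpec, **overrides) -> VmSpec:
    """Deploy from a template or clone a VM: inherited attributes stay fixed."""
    locked = INHERITED.intersection(overrides)
    if locked:
        raise ValueError(f"inherited attributes cannot be customized: {sorted(locked)}")
    return replace(source, **overrides)


def convert_template(templates: dict, name: str) -> VmSpec:
    """Convert a template to a VM: all attributes are kept, the template is removed."""
    return templates.pop(name)
```

For example, `derive_vm(template, name="vm1", vcpus=4)` keeps the template's OS, disks, and NIC count while customizing the name and CPU count, whereas passing `nic_count=2` raises an error.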
VM Provisioning (2)

Creating an empty VM
• Create a VM for the first time during the initial deployment of the system.
• If no suitable template or VM (one with the same OS and hardware configuration) is available in the system, create an empty VM.
• Create an empty VM, install an OS on it, and convert or clone it to a template so that you can use the template to create VMs.
Creating a VM using a template
• A suitable template (with the same OS and hardware configuration) is available in the system. Using the template to create a VM saves time.
• Export the template of another site and import it to create VMs at this site.
Creating a VM using a VM
• When deploying multiple similar VMs, you can create a single VM, configure it, and install the required software on it, and then clone it multiple times instead of creating and configuring each VM separately.
Section Summary

⚫ This section describes the features of FusionCompute, including its logical architecture and the functions of each module, such as CNA and VRM.
Quiz

1. (Essay) What functions does a virtualization management platform provide?

2. (Multiple-answer question) Which of the following virtual network devices work at Layer 2 when server virtualization is deployed? ( )
A. TUN

B. TAP

C. Linux Bridge

D. OVS


1. A virtualization management platform manages virtualized clusters in a unified manner, provides a simple management interface for users, monitors and manages virtual resources, simplifies the VM creation process, and configures and executes resource scheduling policies.

2. BCD
Summary
⚫ With increasingly wide application of virtualization technologies in DCs, servers are being
integrated into physical networks. Network engineers need to have basic knowledge of
server virtualization.
⚫ This course introduces the principles of server virtualization and server network
virtualization from the perspective of IT engineers, and describes end-to-end traffic
forwarding between servers and physical networks deployed with network virtualization.
⚫ For more information about server and network virtualization principles, visit the websites
on the More Information slide.

More Information

⚫ https://www.kernel.org/doc/html/latest/networking/tuntap.html
⚫ https://wiki.linuxfoundation.org/networking/bridge
⚫ https://docs.openvswitch.org/en/latest/intro/what-is-ovs/
Thank you.
Bring digital to every person, home, and
organization for a fully connected,
intelligent world.

Copyright©2023 Huawei Technologies Co., Ltd.
All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
Technical Principles and Applications of VXLAN
Foreword
⚫ Cloud computing has become a new form of enterprise IT construction due to its advantages such as high system utilization, low labor and management costs, and high flexibility and scalability. Virtualization, which is widely deployed, is the foundational technology of cloud computing. The wide deployment of server virtualization greatly increases the computing density of data centers (DCs). In addition, to support flexible service changes, VMs need to be migrated without network restrictions.
⚫ Virtual eXtensible Local Area Network (VXLAN) is an important overlay technology. It solves problems faced by traditional DCs, such as a small VM migration scope, a limited number of VMs, and limited network isolation capability. VXLAN is widely used in DC SDN scenarios, such as cloud-network integration.
⚫ This course describes the background, fundamentals, and application scenarios of VXLAN and EVPN.
Objectives

⚫ On completion of this course, you will be able to:
 Describe the network requirements of DCs and how VXLAN meets these requirements.
 Describe the basic concepts of VXLAN.
 Describe the fundamentals of VXLAN.
 Understand the concepts and fundamentals of EVPN.
 Understand how EVPN and VXLAN are combined.
Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN

3. EVPN VXLAN Fundamentals

4. VXLAN Deployment Cases in Typical Scenarios

Technical Background: Virtualization Is Widely Deployed by Enterprises
⚫ Virtualization technologies reduce IT and O&M costs, and improve service deployment flexibility. More and more enterprises choose to use cloud computing or virtualization technologies in their DC IT facilities.
⚫ After an enterprise chooses the virtualization architecture, services are deployed on VMs in server clusters.
[Figure: Services (Web 1, DB, APP, Web 2, DB, APP) run on guest OSs in VMs; the hypervisors run on servers in a cluster connected through a physical network.]
New Network Requirement - Layer 2 Extension
⚫ VMs in a virtualization or cloud computing cluster can be migrated flexibly. As a result, VMs running the same
service (on the same network segment) may run on different servers, or the same VM (with the same IP address)
may run on different servers (physical locations) at different times.
⚫ Physical servers may be located in equipment rooms that are geographically distant from each other, so they are typically interconnected through a Layer 3 network. VMs running the same service still require Layer 2 communication, which must therefore be provided across the Layer 3 network.
[Figure: VMs of the same service (Web 1, DB, APP, Web 2, DB, APP) run on different servers in a cluster connected through a Layer 3 physical network.]

• After servers are virtualized, services are encapsulated on VMs. VMs can be live
migrated to any host in a cluster. One of the features of live migration is that the
network status does not change. This requires that the IP addresses of VMs in
different physical locations remain unchanged. Therefore, a large Layer 2
network is required to solve this problem.
New Network Requirement - Multi-Tenant Isolation
⚫ In cloud-based scenarios, multi-tenancy is supported, that is, different tenants share physical resources. This poses
two requirements on the network: inter-tenant isolation and intra-tenant communication.
 Inter-tenant isolation: Tenants may be configured with the same MAC addresses and IP addresses, so isolation on the shared physical network must be considered, and a large number of tenants need to be isolated.
 Intra-tenant communication: VMs on the same network segment of a tenant can directly communicate with each other at Layer 2, even if they are located in different equipment rooms.
[Figure: VMs of Tenant 1, Tenant 2, and Tenant 3 run on servers in a cluster connected through a Layer 3 physical network. VMs of the same tenant communicate at Layer 2, while different tenants are isolated.]
Challenges Facing Traditional Networks
VM quantity limited by device entry specifications
• After servers are virtualized, the number of VMs increases greatly compared with the number of original physical machines. However, the MAC address tables of Layer 2 access devices are small and cannot keep up with the rapidly increasing number of VMs.
• Supporting large numbers of VMs would require every device to have a large MAC address table.
Limited network isolation capabilities
• The VLAN ID field in the 802.1Q tag of a tagged frame (Destination MAC | Source MAC | 802.1Q Tag | Length/Type | Payload | FCS) has only 12 bits, so it can represent only 4096 logical units (IDs 0 and 4095 are reserved, leaving 4094 usable VLANs).
• In large virtualization and cloud computing service scenarios, the number of tenants supported by a DC is much greater than 4096.
• VLANs on traditional Layer 2 networks cannot adapt to dynamic network adjustment.
Limited VM migration scope
• VM migration must be performed on a Layer 2 network.
• VM migration on a traditional Layer 2 network is limited to a small scope: VMs can be migrated only within a VLAN, and the number of VLANs is limited.
[Figure: VM migration between DCs over an end-to-end VLAN.]
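The VLAN limit comes straight from the tag layout. This Python sketch (standard library only) extracts the 12-bit VLAN ID from an 802.1Q-tagged frame and compares the size of that ID space with the 24-bit VNI field that VXLAN uses. The example frame is synthetic.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # Tag Protocol Identifier of an 802.1Q tag

VLAN_ID_SPACE = 2 ** 12              # 12-bit VLAN ID field: 4096 values
USABLE_VLAN_IDS = VLAN_ID_SPACE - 2  # IDs 0 and 4095 are reserved
VNI_SPACE = 2 ** 24                  # 24-bit VXLAN VNI field, for comparison


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the VLAN ID of an 802.1Q-tagged Ethernet frame, or None if untagged."""
    # Bytes 0-11 hold the destination and source MACs; the tag follows:
    # 2-byte TPID, then 2-byte TCI (3-bit PCP, 1-bit DEI, 12-bit VLAN ID).
    tpid, tci = struct.unpack_from("!HH", frame, 12)
    if tpid != TPID_8021Q:
        return None
    return tci & 0x0FFF


if __name__ == "__main__":
    tagged = bytes(12) + struct.pack("!HH", TPID_8021Q, (5 << 13) | 100) + b"\x08\x00"
    print(vlan_id(tagged), USABLE_VLAN_IDS, VNI_SPACE // VLAN_ID_SPACE)
```

The last value printed, 4096, is how many times larger the VNI space is than the whole VLAN ID space.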
Overview of VXLAN
⚫ VXLAN is essentially a virtual private network (VPN) technology and can be used to build a Layer 2 virtual network
(overlay network) on any physical network (underlay network) with reachable routes. VXLAN tunnels can be built
between VXLAN gateways to implement communication within a VXLAN network as well as communication
between a VXLAN network and a non-VXLAN network.
⚫ VXLAN uses MAC-in-UDP encapsulation to extend Layer 2 networks. It encapsulates Ethernet frames into UDP/IP packets so that they can be routed across the network without network devices having to learn the MAC addresses of VMs. In addition, Layer 3 networks are not limited by the network architecture and scale well, so VM migration over a routed network is not limited by the physical network architecture either.
[Figure: A spine-leaf underlay running OSPF, with a VXLAN overlay built between the leaf nodes.]

• VXLAN resolves the following problems on traditional networks:
▫ VM quantity limited by network specifications:
▪ VXLAN encapsulates data packets sent from VMs into UDP packets,
and encapsulates IP and MAC addresses used on the physical network
into the outer headers. Devices on the network are aware of only the
encapsulated parameters but not the inner data.
▪ Only VXLAN network edge devices need to identify the MAC
addresses of VMs, thereby reducing the number of MAC addresses
that must be learned and enhancing device performance.
▫ Limited network isolation capabilities:
▪ VXLAN uses a VNI field similar to the VLAN ID field to identify users.
The VNI field has 24 bits and can identify up to 16M VXLAN
segments, effectively isolating and identifying a large number of
tenants.
▫ VM migration scope limited by the network architecture:
▪ VMs with IP addresses on the same network segment are logically
located in the same Layer 2 domain even if they are physically located
on different Layer 2 networks. VXLAN builds a virtual large Layer 2
network over a Layer 3 network.
• Underlay network: It is a physical network that functions as the base layer of the
upper-layer logical network.
• Overlay network: It is a logical network built on a physical network using a
tunneling technology.
Underlay and Overlay
Overlay
• An overlay network is a logical network established on an underlay network through VXLAN.
• It has independent forwarding and control plane protocols.
• The overlay is transparent to underlay devices that are not VXLAN tunnel endpoints (VTEPs); these devices only forward the encapsulated packets.
Underlay
• An underlay network consists of various physical network devices and is the bearer network of an overlay network.
• After an overlay technology is implemented on an underlay network, a logical network is formed based on the underlay network.
• The underlay network provides basic capabilities such as reachability and reliability for the upper-layer overlay network.
• The underlay network has independent control and forwarding plane protocols. Generally, OSPF or EBGP is used as the control plane protocol, and IPv4 is used as the forwarding plane protocol.
• The underlay network is logically isolated from the overlay network and is unaware of overlay network routes.
[Figure: Hosts attach to NVEs that are connected through the underlay network. The overlay control plane runs between NVEs, and payloads are encapsulated to form the overlay network on the data plane.]
VXLAN Overlay Network Types
⚫ VXLAN overlay networks are classified into network overlay, host overlay, and hybrid overlay networks based on the types of devices where VTEPs reside.
 Network overlay: VTEPs at both ends of a VXLAN tunnel are physical switches. Network overlay is classified into centralized network overlay and distributed network overlay.
 Host overlay: VTEPs at both ends of a VXLAN tunnel are virtual switches (vSwitches). Spine and leaf nodes only forward IP packets at a high speed.
 Hybrid overlay: A VTEP of a VXLAN tunnel can be either a vSwitch or a physical switch.
[Figure: Spine-leaf networks with VMs attached through vSwitches. VTEPs reside on leaf switches (network overlay), on vSwitches (host overlay), or on both (hybrid overlay).]
Overlay Protocol Development
⚫ To meet the multi-tenancy and VM migration requirements of cloud DCs, vendors looked for an overlay protocol with optimal performance and the most flexible applications. VXLAN, defined in RFC 7348, meets these requirements.

⚫ In the early stage, VXLAN was deployed in static mode, with VXLAN tunnels created manually, which required a heavy configuration workload. In addition, VXLAN has no control plane of its own: VTEP discovery and host information collection rely on traffic flooding on the data plane, which causes a large amount of flooding traffic on the data center network (DCN). To address these problems, VXLAN works with Ethernet Virtual Private Network (EVPN) to implement automatic VXLAN tunnel establishment, automatic VTEP discovery, and host information advertisement.

⚫ To facilitate control and deployment on a large Layer 2 network, an SDN controller is introduced. The controller uses NETCONF to control devices, automatically creates an overlay network, and collaborates with the cloud platform to implement automatic service and network deployment.
[Figure: Three stages of VXLAN tunnels between leaf VTEPs: tunnels manually configured; tunnels created using a protocol (EVPN); and tunnels created using a protocol (EVPN) with iMaster NCE delivering configurations to the spine and leaf nodes.]
Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN


◼ Basic Concepts

▫ Fundamentals

3. EVPN VXLAN Fundamentals

4. VXLAN Deployment Cases in Typical Scenarios

VXLAN Packet Format

VXLAN encapsulation, followed by the original data frame:
Outer Ethernet header | Outer IP header | UDP header | VXLAN header | Inner Ethernet header | Inner IP header | Payload
• Outer IP header:
▫ Source IP address: IP address of the source VTEP of a VXLAN tunnel
▫ Destination IP address: IP address of the destination VTEP of a VXLAN tunnel
• UDP header: Source UDP port (hash value) | Destination UDP port (4789) | Length | Checksum
• VXLAN header (8 bytes): Flags (8 bits, 00001000) | Reserved (24 bits) | VNI (24 bits) | Reserved (8 bits)
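The VXLAN header layout maps directly onto two 32-bit words. This Python sketch (standard library only) packs and parses the 8-byte VXLAN header and derives a UDP source port from a hash of the inner frame. RFC 7348 recommends hashing inner-packet fields and using the 49152-65535 range. Using CRC32 over the inner Ethernet header is an assumption for illustration only.

```python
import struct
import zlib

VXLAN_UDP_PORT = 4789  # well-known destination UDP port for VXLAN
FLAG_I = 0x08          # flags byte 00001000: the "I" bit marks the VNI as valid


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits as zero.
    # Word 2: VNI in the top 24 bits, 8 reserved bits as zero.
    return struct.pack("!II", FLAG_I << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Return the VNI from a VXLAN header, checking that the I flag is set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & FLAG_I:
        raise ValueError("I flag not set: VNI is not valid")
    return word2 >> 8


def udp_source_port(inner_frame: bytes) -> int:
    """Pick a UDP source port from a hash of the inner frame's headers."""
    # CRC32 over the inner Ethernet header (MACs and EtherType) is only an
    # illustrative hash; devices typically hash inner L2-L4 fields so that
    # ECMP in the underlay spreads flows. The result lies in 49152-65535.
    return 49152 + zlib.crc32(inner_frame[:14]) % (65536 - 49152)


if __name__ == "__main__":
    header = pack_vxlan_header(2000)
    print(header.hex(), unpack_vni(header))
```

Because the hash is computed over the inner headers, all packets of one flow get the same source port and stay on one underlay path, while different flows can be spread across ECMP paths.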

Basic Concepts of VXLAN: NVE


⚫ Network Virtualization Edge (NVE):
 A network entity that implements network virtualization functions. A hardware or software switch can work as
an NVE.
 NVEs run VXLAN and construct a Layer 2 virtual network over a Layer 3 network. SW1 and SW2 in the figure
are NVEs.

[Figure: PC1 (192.168.1.1/24) connects to SW1 (NVE) and PC2 (192.168.1.2/24) connects to SW2 (NVE). SW1 and SW2 are connected by a VXLAN tunnel across an IP network.]

Basic Concepts of VXLAN: VTEP


⚫ VXLAN tunnel endpoint (VTEP):
 A VTEP is located on an NVE and performs VXLAN encapsulation and decapsulation.
 In the outer IP header of VXLAN packets, the source IP address is the IP address of the source VTEP, and the
destination IP address is the IP address of the destination VTEP.
[Figure: SW1 (NVE, VTEP 1.1.1.1/32) connects PC1 (192.168.1.1/24), and SW2 (NVE, VTEP 2.2.2.2/32) connects PC2 (192.168.1.2/24). SW1 wraps the original data frame (Ethernet header | IP header | payload) in an Ethernet header, an IP header (source IP address 1.1.1.1, destination IP address 2.2.2.2), a UDP header, and a VXLAN header; SW2 removes them to restore the original data frame.]

• A pair of VTEP IP addresses identifies a VXLAN tunnel.

• The source VTEP encapsulates packets and sends the encapsulated packets to the
destination VTEP through the VXLAN tunnel. After receiving the encapsulated
packets, the destination VTEP decapsulates the packets.

• Generally, the IP address of a loopback interface on a device is used as the VTEP address.
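To make the VTEP roles concrete, the following Python sketch (standard library only) performs the encapsulation a source VTEP applies and the decapsulation the destination VTEP applies. It uses the VTEP addresses 1.1.1.1 and 2.2.2.2 and VNI 2000 from the examples in this section. It assumes an IPv4 underlay, an untagged outer Ethernet header, a fixed UDP source port, and placeholder MAC addresses. It is a model for study, not switch code.

```python
import ipaddress
import struct

VXLAN_PORT = 4789
ETH_IPV4 = 0x0800


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words (IPv4 header checksum)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def vxlan_encapsulate(inner_frame: bytes, vni: int, src_vtep: str, dst_vtep: str,
                      src_port: int, src_mac: bytes, dst_mac: bytes) -> bytes:
    """Source VTEP: wrap an Ethernet frame in VXLAN, UDP, IPv4, and Ethernet headers."""
    vxlan = struct.pack("!II", 0x08 << 24, vni << 8)
    udp_len = 8 + len(vxlan) + len(inner_frame)
    # RFC 7348: the UDP checksum SHOULD be transmitted as zero over IPv4.
    udp = struct.pack("!HHHH", src_port, VXLAN_PORT, udp_len, 0)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + udp_len, 0, 0, 64, 17, 0,
                     ipaddress.IPv4Address(src_vtep).packed,
                     ipaddress.IPv4Address(dst_vtep).packed)
    ip = ip[:10] + struct.pack("!H", ipv4_checksum(ip)) + ip[12:]
    eth = dst_mac + src_mac + struct.pack("!H", ETH_IPV4)
    return eth + ip + udp + vxlan + inner_frame


def vxlan_decapsulate(packet: bytes) -> tuple[int, str, bytes]:
    """Destination VTEP: return (VNI, source VTEP IP, original frame).

    Assumes an untagged outer Ethernet header and a 20-byte IPv4 header.
    """
    ip = packet[14:34]
    src_vtep = str(ipaddress.IPv4Address(ip[12:16]))
    dst_port = struct.unpack_from("!H", packet, 36)[0]
    if ip[9] != 17 or dst_port != VXLAN_PORT:
        raise ValueError("not a VXLAN packet")
    vni = struct.unpack_from("!I", packet, 46)[0] >> 8
    return vni, src_vtep, packet[50:]


if __name__ == "__main__":
    # Dummy PC1-to-PC2 frame: destination MAC, source MAC, EtherType, padding.
    inner = bytes.fromhex("0000000000020000000000010800") + bytes(46)
    pkt = vxlan_encapsulate(inner, 2000, "1.1.1.1", "2.2.2.2", 50000,
                            bytes(6), bytes(6))
    print(len(pkt) - len(inner), vxlan_decapsulate(pkt)[:2])
```

The sketch adds 14 + 20 + 8 + 8 = 50 bytes per frame. With an untagged inner frame carrying a 1500-byte IP packet, the outer IP packet is 20 + 8 + 8 + 14 + 1500 = 1550 bytes, which is why underlay interfaces usually need a larger MTU than access interfaces.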

Basic Concepts of VXLAN: VNI and BD


⚫ VXLAN Network Identifier (VNI):
 An L2 VNI is similar to a VLAN ID and identifies a Layer 2 broadcast domain. VMs in different broadcast domains cannot communicate with each other at Layer 2.
 An L3 VNI identifies a VPN instance. A Layer 3 VNI is associated with a VPN instance for inter-subnet forwarding of VXLAN packets.
 A tenant can have one or more VNIs. The VNI field has 24 bits, so up to 16M (2^24) VXLAN segments can be identified.
⚫ Bridge domain (BD):
 VLANs are used to divide broadcast domains on a traditional network. Similarly, BDs are used to divide broadcast domains on a VXLAN network. A BD identifies a large Layer 2 broadcast domain on a VXLAN network.
 VNIs are mapped to BDs in 1:1 mode. Terminals in the same BD can communicate with each other at Layer 2.
[Figure: PC1 (192.168.1.1/24) connects to SW1 and PC2 (192.168.1.2/24) connects to SW2. On both switches, BD 20 is mapped to L2 VNI 2000, and the VXLAN header of packets sent through the VXLAN tunnel carries VNI 2000.]

Basic Concepts of VXLAN: VXLAN Access Modes


⚫ Service access points need to be configured on devices for service access to a VXLAN network. The
following two access modes are available:
 Access in Layer 2 sub-interface mode: For example, a Layer 2 sub-interface is created on SW1 and associated
with BD 10. Specific traffic on the sub-interface is then forwarded to BD 10.
 Access in VLAN binding mode: For example, VLAN 10 is configured on SW2 and associated with BD 10. All
traffic from VLAN 10 is then forwarded to BD 10.

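[Figure: PC1 (192.168.1.1/24) connects to SW1, where Layer 2 sub-interface GE 0/0/1.1 is bound to BD 10 (mode 1). PC2 (192.168.1.2/24) connects to SW2, where VLAN 10 is bound to BD 10 (mode 2). SW1 and SW2 are connected by a VXLAN tunnel.]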

• After traffic from a traditional network enters a VXLAN network, the traffic is
bound to a BD through Layer 2 sub-interface or VLAN binding mode. A VXLAN
VNI is specified in the BD to implement mapping from the traditional VLAN
network to the VXLAN network.

• When VLAN binding mode is used for VXLAN access, a BD cannot be configured
with a VBDIF interface. Therefore, this mode applies only to Layer 2 service
access.
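The two access modes and the BD-to-VNI mapping can be modeled as lookup tables. This Python sketch is a hypothetical data model, not Huawei CLI or device behavior. The VNI value 1000, the interface names, and the assumption that sub-interface GE 0/0/1.1 matches VLAN 10 are illustrative. The model also encodes the restriction that a BD used with VLAN binding cannot have a VBDIF interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VtepAccess:
    """Hypothetical model of VXLAN service access on one VTEP."""
    subif_to_bd: dict = field(default_factory=dict)  # (interface, VLAN) -> BD
    vlan_to_bd: dict = field(default_factory=dict)   # VLAN -> BD (VLAN binding)
    bd_to_vni: dict = field(default_factory=dict)    # BD -> L2 VNI
    vbdif_bds: set = field(default_factory=set)      # BDs with a VBDIF interface

    def bind_vni(self, bd: int, vni: int) -> None:
        """Map a BD to an L2 VNI, keeping the mapping 1:1."""
        if vni in self.bd_to_vni.values() and self.bd_to_vni.get(bd) != vni:
            raise ValueError(f"VNI {vni} is already mapped to another BD")
        self.bd_to_vni[bd] = vni

    def bind_vlan(self, vlan: int, bd: int) -> None:
        """VLAN binding mode: all traffic from the VLAN goes to the BD."""
        if bd in self.vbdif_bds:
            raise ValueError(f"BD {bd} has a VBDIF; VLAN binding is Layer 2 only")
        self.vlan_to_bd[vlan] = bd

    def add_vbdif(self, bd: int) -> None:
        """Create a VBDIF for a BD unless the BD uses VLAN binding."""
        if bd in self.vlan_to_bd.values():
            raise ValueError(f"BD {bd} uses VLAN binding; no VBDIF allowed")
        self.vbdif_bds.add(bd)

    def classify(self, interface: str, vlan: Optional[int]) -> Optional[int]:
        """Return the L2 VNI for traffic received on (interface, VLAN), if any."""
        bd = self.subif_to_bd.get((interface, vlan))
        if bd is None and vlan is not None:
            bd = self.vlan_to_bd.get(vlan)
        return None if bd is None else self.bd_to_vni.get(bd)


if __name__ == "__main__":
    sw1, sw2 = VtepAccess(), VtepAccess()
    for sw in (sw1, sw2):
        sw.bind_vni(10, 1000)              # VNI 1000 is a hypothetical value
    sw1.subif_to_bd[("GE0/0/1", 10)] = 10  # sub-interface GE 0/0/1.1 (VLAN 10 assumed)
    sw2.bind_vlan(10, 10)                  # VLAN 10 bound to BD 10
    print(sw1.classify("GE0/0/1", 10), sw2.classify("GE0/0/2", 10))
```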

Basic Concepts of VXLAN: Layer 2 and Layer 3 VXLAN


Gateways
Border
Border Layer 3 gateway

Leaf1 Leaf2 Leaf1 Leaf2


Layer 2 VXLAN tunnel Layer 2 Layer 2 Layer 2
gateway gateway gateway gateway

PC1 PC2 PC1 PC2


192.168.1.1/24 192.168.1.2/24 192.168.1.1/24 192.168.2.2/24

Layer 2 gateway: forwards traffic to a VXLAN network and is used for intra-subnet communication between terminals on the same VXLAN network.
Layer 3 gateway: is used for inter-subnet communication between terminals on a VXLAN network and allows terminals to access external networks (non-VXLAN networks).

Basic Concepts of VXLAN: VBDIF Interface


[Figure: SW3 is the Layer 3 gateway, with VBDIF 10 (192.168.1.254) and VBDIF 20 (192.168.2.254). PC1 (192.168.1.1/24) connects to SW1 and PC2 (192.168.2.2/24) connects to SW2; SW1 and SW2 are Layer 2 gateways.]

⚫ VLANIF interfaces are used for communication between broadcast domains on a traditional network.
Similarly, VBDIF interfaces are used for communication between BDs on a VXLAN network.
⚫ A VBDIF interface is a Layer 3 logical interface created for a BD on a Layer 3 VXLAN gateway.
⚫ VBDIF interfaces allow users on different network segments to communicate through a VXLAN
network, allow communication between VXLAN and non-VXLAN networks, and implement Layer 2
network access to a Layer 3 network.


Basic Concepts of VXLAN: Distributed and Centralized Gateways
[Figure: Centralized gateway: one device acts as the Layer 3 gateway and the access devices act as Layer 2 gateways. Distributed gateway: each access device acts as both a Layer 2 and a Layer 3 gateway. In both panels, PC1 (192.168.1.1/24), PC2 (192.168.2.1/24), and PC3 (192.168.1.3/24) are attached.]

Centralized gateway: The Layer 3 gateway is deployed on one device. All inter-subnet traffic is forwarded by the gateway to implement centralized traffic management.
Advantage: Inter-subnet traffic is managed in a centralized manner, simplifying gateway deployment and management.
Disadvantage: The forwarding path is not optimal, and the number of ARP entries supported is a bottleneck. Because a centralized Layer 3 gateway is deployed, the gateway needs to maintain ARP entries for all terminals connected to the VXLAN network.
Distributed gateway: VTEPs function as both Layer 2 and Layer 3 gateways. Non-gateway nodes are unaware of VXLAN tunnels and only forward VXLAN packets.
Advantage: A VTEP only needs to learn the ARP entries of terminals connected to it. Therefore, the number of ARP entries supported is no longer a bottleneck on distributed VXLAN gateways, and network scalability is improved.
Disadvantage: Compared with centralized gateway deployment, this mode is more complex to configure and implement.
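The ARP-table trade-off can be put in numbers. The Python sketch below is a simplified model that assumes each terminal needs exactly one ARP entry on the gateway that routes for it; the leaf names and host counts are hypothetical.

```python
from typing import Dict, Tuple


def gateway_arp_load(hosts_per_vtep: Dict[str, int]) -> Tuple[int, int]:
    """Return (ARP entries on a centralized gateway, largest load on one distributed gateway).

    Simplified model (an assumption): a centralized gateway holds entries for every
    terminal on the VXLAN network; a distributed gateway only for its local terminals.
    """
    centralized = sum(hosts_per_vtep.values())
    distributed = max(hosts_per_vtep.values(), default=0)
    return centralized, distributed


# Hypothetical host counts per leaf node.
print(gateway_arp_load({"Leaf1": 400, "Leaf2": 350, "Leaf3": 500}))
```

With these numbers the centralized gateway needs 1250 entries, while no distributed gateway needs more than 500, and the gap widens as leaf nodes are added.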

Application of VXLAN in DCs
⚫ VXLAN can be applied to a DCN that uses a two-layer spine-leaf physical architecture.
⚫ It is recommended that a VXLAN network with distributed gateways be deployed in a DC. Spine nodes forward
packets based on routes and are unaware of VXLAN during traffic forwarding. Leaf nodes provide network access
for device resources such as servers, and perform VXLAN encapsulation and decapsulation.
⚫ All services in the DC are carried by the VXLAN network.

[Figure: Physical network: spine-leaf architecture with two spine nodes and three leaf nodes, Layer 3 interconnection links without Layer 2 loops, and dynamic routing protocols for network reachability. Overlay service network: VXLAN tunnels established between leaf nodes on demand.]

Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN


▫ Basic Concepts
◼ Fundamentals

3. EVPN VXLAN Fundamentals

4. VXLAN Deployment Cases in Typical Scenarios


VXLAN Tunnel Establishment


⚫ A VXLAN tunnel is identified by a pair of VTEPs. Packets are encapsulated on VTEPs and then
transmitted in the VXLAN tunnel through routing. A VXLAN tunnel can be successfully
established as long as the VTEPs at both ends of the VXLAN tunnel have reachable routes to
each other at Layer 3.
⚫ VXLAN tunnels are classified into the following types based on the VXLAN tunnel creation
mode:
▫ Static VXLAN tunnels: created by manually configuring the local and remote VNIs, VTEP IP addresses, and ingress replication lists.
▫ Dynamic VXLAN tunnels: dynamically established using BGP EVPN. After a BGP EVPN peer relationship is established between VTEPs, the VTEPs use BGP EVPN routes to transmit VNIs and VTEP IP addresses to dynamically establish a VXLAN tunnel.


Static VXLAN Tunnel


⚫ A static VXLAN tunnel is created through manual configuration. The VXLAN tunnel can be established successfully
as long as the VTEPs at both ends of the tunnel have reachable routes to each other's IP address at Layer 3.
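[Figure: VTEP1 (1.1.1.1/32) connects PC1 (172.16.1.1/24) and PC2 (172.16.2.2/24), VTEP3 (3.3.3.3/32) connects PC3 (172.16.1.3/24), and VTEP2 (2.2.2.2/32) connects PC4 (172.16.2.4/24). VNI 100 is used between VTEP1 and VTEP3, and VNI 200 between VTEP1 and VTEP2.]
In the following configurations, source specifies the source VTEP IP address, vni specifies the L2 VNI, and head-end peer-list specifies the destination VTEP IP address.
VTEP1:
interface nve 1
 source 1.1.1.1
 vni 100 head-end peer-list 3.3.3.3
 vni 200 head-end peer-list 2.2.2.2
VTEP2:
interface nve 1
 source 2.2.2.2
 vni 200 head-end peer-list 1.1.1.1
VTEP3:
interface nve 1
 source 3.3.3.3
 vni 100 head-end peer-list 1.1.1.1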
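Because static tunnels have no control plane, the head-end peer lists on the two ends of each tunnel should agree. The Python sketch below reads the three configurations above as plain data (the dictionary layout and function name are hypothetical) and derives the VNI/VTEP pairs they define. Flagging an entry that appears on only one end as a misconfiguration is an assumption of this sketch, and Layer 3 reachability between VTEP addresses is not modeled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data layout: source VTEP IP -> {VNI: head-end peer list}, values from the slide.
NVE_CONFIG: Dict[str, Dict[int, List[str]]] = {
    "1.1.1.1": {100: ["3.3.3.3"], 200: ["2.2.2.2"]},
    "2.2.2.2": {200: ["1.1.1.1"]},
    "3.3.3.3": {100: ["1.1.1.1"]},
}

Tunnel = Tuple[int, Tuple[str, str]]   # (VNI, sorted VTEP IP pair)


def static_tunnels(config: Dict[str, Dict[int, List[str]]]) -> Tuple[Set[Tunnel], Set[Tunnel]]:
    """Return (matched, one_sided) VNI/VTEP pairs.

    A pair is matched when each VTEP lists the other in its head-end peer list
    for the same VNI; otherwise it is one-sided.
    """
    matched: Set[Tunnel] = set()
    one_sided: Set[Tunnel] = set()
    for local, vnis in config.items():
        for vni, peers in vnis.items():
            for peer in peers:
                pair = (vni, tuple(sorted((local, peer))))
                if local in config.get(peer, {}).get(vni, []):
                    matched.add(pair)
                else:
                    one_sided.add(pair)
    return matched, one_sided


matched, one_sided = static_tunnels(NVE_CONFIG)
print(sorted(matched), one_sided)
```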


• This section describes how to establish VXLAN tunnels in static mode. For details
on how to establish VXLAN tunnels in dynamic mode (BGP EVPN mode), see
"EVPN VXLAN Fundamentals."

VXLAN MAC Address Entries


⚫ VXLAN implements Layer 2 forwarding on the overlay network. Unicast data frames are still forwarded based on MAC address
entries.

⚫ When a VTEP receives a data frame from the local BD, the VTEP adds the source MAC address of the data frame to the MAC address
table of the BD. The outbound interface in the MAC address entry is the interface that receives the data frame.

⚫ This entry is used to guide the forwarding of data frames sent to terminals connected to the VTEP.
[Figure: PC1 (172.16.1.1/24, MAC 0000-0000-000A) and PC2 (172.16.2.2/24, MAC 0000-0000-000B) connect to S1.]
<S1>display mac-address bridge-domain 10
-------------------------------------------------------------------------------
MAC Address    VLAN/VSI/BD   Learned-From   Type
-------------------------------------------------------------------------------
0000-0000-000a -/-/10        GE1/0/1.10     dynamic

<S1>display mac-address bridge-domain 20
-------------------------------------------------------------------------------
MAC Address    VLAN/VSI/BD   Learned-From   Type
-------------------------------------------------------------------------------
0000-0000-000b -/-/20        GE1/0/1.20     dynamic

How can data frames be forwarded to the device connected to the remote VTEP?


Dynamic MAC Address Learning (1)


⚫ To forward data frames to a device connected to a remote VTEP, the local VTEP needs to learn the MAC address of
the remote device first.
⚫ As on a traditional network, MAC address entries are generated from packets exchanged between hosts, generally through ARP packet exchange.
[Figure: SW1 (Layer 2 gateway, VTEP 1.1.1.1/32) and SW2 (Layer 2 gateway, VTEP 2.2.2.2/32) are connected by a VXLAN tunnel (VNI 1000). PC1 (172.16.1.1/24, 0000-0000-000A) connects to Port1 of SW1, and PC2 (172.16.1.2/24, 0000-0000-000B) connects to Port1 of SW2.]
1. PC1 broadcasts an ARP request packet (Ethernet header | ARP frame).
2. SW1 learns the MAC address of PC1: {0000-0000-000A, BD 10, Port1}.
3. SW1 performs VXLAN encapsulation for the ARP packet (Ethernet header | IP header | UDP header | VXLAN header | original data frame), and floods the VXLAN-encapsulated ARP packet to all VTEPs with the same VNI.
4. SW2 learns the MAC address of PC1: {0000-0000-000A, BD 10, 1.1.1.1}.
5. SW2 forwards the ARP packet (Ethernet header | ARP frame) to PC2.


• The communication process between PC1 and PC2 is as follows:

▫ To communicate with PC2, PC1 broadcasts an ARP request frame to obtain the MAC address of PC2.

▫ After receiving the frame, SW1 determines the BD ID, destination VXLAN
tunnel, and VNI of the traffic based on the service access point
configuration. In addition, SW1 learns the MAC address of PC1 and records
the BD ID and the interface that receives the frame in the corresponding
MAC address entry.

▫ SW1 performs VXLAN encapsulation for the ARP request packet and
forwards the encapsulated packet based on the ingress replication list.

▫ After receiving the VXLAN packet, SW2 decapsulates the packet to obtain
the original data frame. In addition, SW2 learns the MAC address of PC1
and binds the MAC address to the VTEP address of SW1.

▫ SW2 floods the ARP packet in the local BD. PC2 then receives the frame
and learns the ARP information of PC1.

Dynamic MAC Address Learning (2)

[Figure: SW1 (Layer 2 gateway, VTEP 1.1.1.1/32) and SW2 (Layer 2 gateway, VTEP 2.2.2.2/32) are connected by a VXLAN tunnel (VNI 1000). PC1 (172.16.1.1/24, 0000-0000-000A) connects to Port1 of SW1, and PC2 (172.16.1.2/24, 0000-0000-000B) connects to Port1 of SW2.]
6. PC2 sends a unicast ARP reply packet (Ethernet header | ARP frame).
7. SW2 learns the MAC address of PC2. The MAC address table of SW2 now contains {0000-0000-000A, BD 10, 1.1.1.1} and {0000-0000-000B, BD 10, Port1}.
8. SW2 searches the MAC address table, encapsulates the ARP data frame according to the MAC address entry {0000-0000-000A, 10, 1.1.1.1} (Ethernet header | IP header | UDP header | VXLAN encapsulation | original data frame), and sends the encapsulated ARP data frame to 1.1.1.1.
9. SW1 learns the MAC address of PC2. The MAC address table of SW1 now contains {0000-0000-000A, BD 10, Port1} and {0000-0000-000B, BD 10, 2.2.2.2}.
10. SW1 forwards the ARP packet (Ethernet header | ARP frame) to PC1.

PC1 and PC2 learn the ARP entries of each other, and SW1 and SW2 learn the MAC addresses of PC1 and PC2. This
process is also called flood and learn.


▫ PC2 sends a unicast ARP reply packet.

▫ SW2 has learned the MAC address of PC1 and forwards the packet in
unicast mode. SW2 learns the source MAC address of PC2 and adds it to
the MAC address table.

▫ SW2 performs VXLAN encapsulation for the ARP reply packet and sends the
encapsulated packet to the remote VTEP with the IP address 1.1.1.1.

▫ After receiving the VXLAN packet, SW1 decapsulates the packet and records
the source MAC address of PC2 in the MAC address table. The outbound
interface of the corresponding MAC address entry is the remote VTEP.

▫ SW1 forwards the data frame to PC1.

• PC1 and PC2 learn ARP entries of each other, and SW1 and SW2 learn
corresponding MAC addresses.
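The ARP request and reply exchange can be replayed with a small Python model. It is a sketch under simplifying assumptions: one access port per VTEP (named Port1, as in the figure), BD 10 used directly without modeling its mapping to VNI 1000, and hypothetical class and method names. After the request and the reply, both MAC address tables contain the same entries as the tables in the figure.

```python
BROADCAST = "FFFF-FFFF-FFFF"


class Vtep:
    """Minimal flood-and-learn model of a Layer 2 VXLAN gateway (hypothetical names).

    Each VTEP has one access port, "Port1"; MAC entries are keyed by (MAC, BD)
    and point either to "Port1" or to a remote VTEP IP address.
    """

    def __init__(self, ip, peers):
        self.ip = ip
        self.peers = list(peers)   # ingress replication list
        self.mac_table = {}

    def receive_local(self, bd, src, dst, fabric):
        self.mac_table[(src, bd)] = "Port1"        # learn the local source MAC
        out = self.mac_table.get((dst, bd))
        if out == "Port1":
            return                                  # local destination: no tunnel needed
        # Broadcast or unknown unicast is flooded to every peer; known unicast goes to one VTEP.
        targets = self.peers if dst == BROADCAST or out is None else [out]
        for remote_ip in targets:
            # VXLAN encapsulation: outer source IP self.ip, outer destination remote_ip.
            fabric[remote_ip].receive_tunnel(self.ip, bd, src, dst)

    def receive_tunnel(self, src_vtep, bd, src, dst):
        self.mac_table[(src, bd)] = src_vtep        # bind the MAC to the sending VTEP
        # Decapsulation and delivery to the local PC are not modeled.


sw1 = Vtep("1.1.1.1", ["2.2.2.2"])
sw2 = Vtep("2.2.2.2", ["1.1.1.1"])
fabric = {sw1.ip: sw1, sw2.ip: sw2}
PC1, PC2 = "0000-0000-000A", "0000-0000-000B"
sw1.receive_local(10, PC1, BROADCAST, fabric)   # steps 1 to 5: flooded ARP request
sw2.receive_local(10, PC2, PC1, fabric)         # steps 6 to 10: unicast ARP reply
print(sw1.mac_table)
print(sw2.mac_table)
```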

Intra-Subnet Forwarding of Unicast Packets with Known Destination Addresses
[Figure: SW1 (Layer 2 gateway, VTEP 1.1.1.1/32) and SW2 (Layer 2 gateway, VTEP 2.2.2.2/32) are connected by a VXLAN tunnel (VNI 1000). PC1 (192.168.1.1/24, AAAA-0000-0001) connects to Port1 of SW1, and PC2 (192.168.1.2/24, AAAA-0000-0002) connects to Port1 of SW2.]
MAC address table of SW1: {AAAA-0000-0001, BD 10, Port1}, {AAAA-0000-0002, BD 10, 2.2.2.2}
MAC address table of SW2: {AAAA-0000-0001, BD 10, 1.1.1.1}, {AAAA-0000-0002, BD 10, Port1}
1. PC1 sends a unicast frame to PC2 (Ethernet header | payload).
2. SW1 searches its MAC address table for the MAC address of PC2 and finds the matching entry.
3. SW1 performs VXLAN encapsulation for the packet and adds a new IP header to it (Ethernet header | IP header | UDP header | VXLAN header | original data frame). The outer source IP address is 1.1.1.1, and the outer destination IP address is 2.2.2.2, which is the IP address of the remote VTEP SW2.
4. SW2 searches its MAC address table for the MAC address of PC2 and finds the matching entry.
5. SW2 forwards the packet to PC2 (Ethernet header | payload).
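Steps 2 to 4 are a table lookup followed by a choice between local delivery and encapsulation. The Python sketch below models that decision. The tagged entry format, the function name, and the BD 10 to VNI 1000 mapping (taken from the tunnel label in the figure) are assumptions for illustration.

```python
from typing import Dict, Tuple

# Hypothetical entry format: (MAC, BD) -> ("port", port name) or ("vtep", remote VTEP IP).
MacTable = Dict[Tuple[str, int], Tuple[str, str]]


def forward_unicast(local_vtep: str, mac_table: MacTable,
                    bd_to_vni: Dict[int, int], bd: int, dst_mac: str) -> tuple:
    """Decide how a Layer 2 VXLAN gateway forwards a unicast frame within a BD."""
    entry = mac_table.get((dst_mac, bd))
    if entry is None:
        return ("flood",)                  # unknown unicast: handled as BUM traffic
    kind, where = entry
    if kind == "port":
        return ("local", where)            # deliver out of a local interface
    # Encapsulate: outer source IP is the local VTEP, outer destination the remote VTEP.
    return ("vxlan", local_vtep, where, bd_to_vni[bd])


SW1_MACS: MacTable = {("AAAA-0000-0001", 10): ("port", "Port1"),
                      ("AAAA-0000-0002", 10): ("vtep", "2.2.2.2")}
SW2_MACS: MacTable = {("AAAA-0000-0001", 10): ("vtep", "1.1.1.1"),
                      ("AAAA-0000-0002", 10): ("port", "Port1")}
print(forward_unicast("1.1.1.1", SW1_MACS, {10: 1000}, 10, "AAAA-0000-0002"))
print(forward_unicast("2.2.2.2", SW2_MACS, {10: 1000}, 10, "AAAA-0000-0002"))
```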

Inter-Subnet Forwarding of Unicast Packets


[Figure: SW3 is the Layer 3 gateway (VTEP 3.3.3.3/32), with VBDIF 10 (192.168.1.254, MAC 00AB-09FF-1111) and VBDIF 20 (192.168.2.254, MAC 00AB-09FF-2222). PC1 (192.168.1.1/24, default gateway 192.168.1.254) connects to SW1 (Layer 2 gateway, VTEP 1.1.1.1/32), and PC2 (labeled Server2, 192.168.2.1/24, default gateway 192.168.2.254) connects to SW2 (Layer 2 gateway, VTEP 2.2.2.2/32).]
Routing table of SW3:
Destination/Mask   Outbound Interface   Next Hop
192.168.1.0/24     VBDIF10              192.168.1.254
192.168.2.0/24     VBDIF20              192.168.2.254
1. PC1 sends a unicast frame to PC2 (Ethernet header | payload).
2. SW1 encapsulates the frame in VXLAN (outer source IP address 1.1.1.1, outer destination IP address 3.3.3.3, VNI 1000) and sends it to SW3.
3. SW3 decapsulates the packet and searches its routing table.
4. SW3 re-encapsulates the packet (outer source IP address 3.3.3.3, outer destination IP address 2.2.2.2, VNI 2000) and sends it to SW2.
5. SW2 decapsulates the packet and forwards the frame to PC2 (Ethernet header | payload).


• PC1 wants to communicate with PC2. By comparing its own IP address and mask with the IP address of PC2, PC1 finds that PC2 is on a different subnet, so PC1 sends the packet to its gateway.

• The destination MAC address of the data frame from PC1 to PC2 is 00AB-09FF-1111 (the gateway MAC address). After receiving the data frame, SW1 searches the Layer 2 forwarding table and finds that the outbound interface is the remote VTEP (Layer 3 gateway). SW1 then adds a VXLAN header (VNI = 1000) to the data frame and sends the packet to SW3.

• After receiving the packet, SW3 performs VXLAN decapsulation for the packet and finds that the destination MAC address of the original data frame is 00AB-09FF-1111, which is the MAC address of VBDIF 10 on SW3. SW3 therefore searches the Layer 3 forwarding table to forward the data frame.

• SW3 searches the routing table and finds that the destination IP address
192.168.2.1 matches the direct route generated by VBDIF 20 on SW3. SW3 then
searches the ARP table for the destination MAC address of the packet and
searches the MAC address table for the outbound interface of the packet. On
SW3, the outbound interface in the MAC address entry corresponding to
192.168.2.1 is the remote VTEP with the IP address 2.2.2.2. SW3 performs VXLAN
encapsulation for the packet and sends the encapsulated packet to SW2.

• After receiving the packet, SW2 performs VXLAN decapsulation for the packet
and finds that the destination MAC address is not the MAC address of any
interface on SW2. SW2 searches the Layer 2 forwarding table and forwards the
packet from a local interface based on the MAC address table.
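The chain of lookups that SW3 performs (route, ARP, MAC, then re-encapsulation) can be sketched in Python as below. The tables restate the slide; the VBDIF-to-BD and BD-to-VNI mappings are inferred from the VNIs in the figure, "PC2-MAC" is a placeholder because the slide does not give PC2's MAC address, and the function name is hypothetical.

```python
import ipaddress

# SW3 state from the slide; "PC2-MAC" is a placeholder (PC2's MAC is not given).
ROUTES = {ipaddress.ip_network("192.168.1.0/24"): "VBDIF10",
          ipaddress.ip_network("192.168.2.0/24"): "VBDIF20"}
VBDIF_TO_BD = {"VBDIF10": 10, "VBDIF20": 20}   # inferred mapping
BD_TO_VNI = {10: 1000, 20: 2000}               # inferred from the figure's VNIs
ARP_TABLE = {ipaddress.ip_address("192.168.2.1"): "PC2-MAC"}
MAC_TABLE = {("PC2-MAC", 20): "2.2.2.2"}       # PC2's MAC learned from remote VTEP SW2


def route_on_gateway(dst_ip: str, local_vtep: str = "3.3.3.3") -> dict:
    """Route a decapsulated packet on the centralized Layer 3 gateway and re-encapsulate it."""
    dst = ipaddress.ip_address(dst_ip)
    matches = [net for net in ROUTES if dst in net]
    if not matches:
        raise LookupError(f"no route to {dst_ip}")
    # Longest-prefix match; here both routes are direct routes of VBDIF interfaces.
    vbdif = ROUTES[max(matches, key=lambda net: net.prefixlen)]
    bd = VBDIF_TO_BD[vbdif]
    dst_mac = ARP_TABLE[dst]                   # direct route: resolve the host MAC via ARP
    remote_vtep = MAC_TABLE[(dst_mac, bd)]     # MAC entry points to the remote VTEP
    return {"outer_src": local_vtep, "outer_dst": remote_vtep,
            "vni": BD_TO_VNI[bd], "inner_dst_mac": dst_mac}


print(route_on_gateway("192.168.2.1"))
```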

BUM Traffic Forwarding


⚫ When transmitting broadcast, unknown unicast, and multicast traffic (BUM traffic), the local
VTEP sends multiple copies of the traffic to remote VTEPs in the ingress replication list,
implementing flood forwarding on the overlay network.
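[Figure: VTEP1 (1.1.1.1/32) receives BUM traffic from PC1 (172.16.2.1/24) and sends one copy to VTEP3 (3.3.3.3/32), which connects PC2 (172.16.2.2/24), and one copy to VTEP2 (2.2.2.2/32), which connects PC3 (172.16.2.3/24). Each copy is encapsulated separately; for example, the copy sent to VTEP2 carries an outer IP header with source IP address 1.1.1.1 and destination IP address 2.2.2.2, followed by the UDP header and VXLAN header.]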
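With ingress replication, the source VTEP builds a separate unicast copy for every peer in its replication list, so the number of copies grows with the number of remote VTEPs. The Python sketch below shows this. The VNI value 100 and the dictionary layout are hypothetical, since the figure does not show the VNI.

```python
from typing import Dict, List


def replicate_bum(local_vtep: str, peer_list: List[str], vni: int, frame: bytes) -> List[Dict]:
    """Ingress replication: build one VXLAN-encapsulated copy per remote VTEP (sketch)."""
    return [
        {"outer_src": local_vtep, "outer_dst": peer, "vni": vni, "inner": frame}
        for peer in peer_list
        if peer != local_vtep              # never replicate back to the local VTEP
    ]


# Hypothetical VNI 100; peer addresses are the VTEPs in the figure.
copies = replicate_bum("1.1.1.1", ["2.2.2.2", "3.3.3.3"], 100, b"broadcast frame")
for c in copies:
    print(c["outer_src"], "->", c["outer_dst"], "VNI", c["vni"])
```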

Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN

3. EVPN VXLAN Fundamentals


◼ Basic Concepts

▫ BGP EVPN Routes

▫ BGP EVPN Feature

4. VXLAN Deployment Cases in Typical Scenarios

Using BGP EVPN as the Control Plane Protocol
[Figure: Without BGP EVPN, VXLAN tunnels and traffic flooding run between all nodes. With BGP EVPN, nodes establish BGP EVPN peer relationships through an RR.]
BGP EVPN not used:
• Problem 1: A total of N x (N-1)/2 tunnels need to be created for N nodes, causing heavy configuration workload.
• Problem 2: The flood and learn mechanism is used to learn MAC addresses, causing a large amount of flooding traffic.
BGP EVPN used as the control plane protocol:
• Enable BGP EVPN on devices and establish BGP EVPN peer relationships between them.
• Devices advertise BGP EVPN routes to each other to complete related operations on the VXLAN control plane.
• VXLAN tunnels are automatically established through BGP EVPN, and forwarding entries are dynamically updated through BGP EVPN.
• In actual deployment, a route reflector (RR) can be used to further reduce the number of established BGP EVPN peer relationships.


• The static VXLAN solution does not have a control plane. VTEP discovery and
learning of host information (including IP addresses, MAC addresses, VNIs, and
gateway VTEP IP addresses) are performed through traffic flooding on the data
plane. As a result, there is a lot of flooding traffic on VXLAN networks. To
address this problem, BGP EVPN is introduced as the control plane of VXLAN.
BGP EVPN allows VTEPs to exchange BGP EVPN routes to implement automatic
VTEP discovery and host information advertisement, preventing unnecessary
traffic flooding.

• Problems in configuring VXLAN in static mode:

▫ If N devices need to establish VXLAN tunnels, you need to manually configure the ingress replication list a maximum of N x (N-1)/2 times.

▫ A static VXLAN tunnel only has the data forwarding plane.

▫ Remote MAC addresses can be learned only through broadcast ARP packets.
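The N x (N-1)/2 figure grows quadratically with the number of VTEPs. The Python sketch below compares it with the number of BGP EVPN sessions when every VTEP peers only with a single RR; assuming one dedicated RR that is not itself a VTEP is a simplification, and redundant RRs would add sessions. The function names are hypothetical.

```python
def full_mesh_pairs(n: int) -> int:
    """VTEP pairs in a full mesh of n VTEPs: n x (n - 1) / 2."""
    return n * (n - 1) // 2


def rr_sessions(n: int) -> int:
    """BGP EVPN sessions if all n VTEPs peer only with one dedicated RR (an assumption)."""
    return n


for n in (4, 16, 64):
    print(f"{n} VTEPs: {full_mesh_pairs(n)} tunnels or full-mesh BGP sessions, "
          f"{rr_sessions(n)} sessions with an RR")
```

For 64 VTEPs this gives 2016 pairs against 64 RR sessions; with BGP EVPN, the tunnels are established automatically instead of being configured by hand.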
Overview of BGP EVPN
⚫ BGP EVPN extends BGP by defining several new types of BGP EVPN routes using Network Layer Reachability
Information (NLRI) in the MP_REACH_NLRI attribute.
⚫ These BGP EVPN routes can be used to transmit VTEP addresses and host information. Therefore, BGP EVPN is
applied to VXLAN networks to transfer VTEP discovery and host information learning from the data plane to the
control plane.

[Figure: SW1 and SW2 establish a BGP EVPN peer relationship.]

• Type 2 routes (MAC/IP routes): are used to advertise host MAC addresses, ARP entries, and IP routes.

• Type 3 routes (inclusive multicast routes): are used to transmit Layer 2 VNI and VTEP IP address information, implement
automatic VTEP discovery, dynamic VXLAN tunnel establishment, and BUM packet forwarding.

• Type 5 routes (IP prefix routes): are used to advertise host IP routes and external network routes.


• In a network virtualization overlay (NVO) scenario, BGP EVPN is used together with VXLAN as the control plane protocol for VXLAN.
EVPN NLRI
⚫ EVPN NLRI is carried in the path attribute MP_REACH_NLRI. The address family identifier (AFI) is 25, indicating L2VPN. The subsequent address family identifier (SAFI) is 70, indicating EVPN.

Path Attribute - MP_REACH_NLRI
  Flags: Optional, Non-transitive
  Type Code: MP_REACH_NLRI (14)
  Length
  Address family identifier (AFI): Layer-2 VPN (25)
  Subsequent address family identifier (SAFI): EVPN (70)
  Next hop network address (4 bytes)
  EVPN NLRI:
    Route Type (1 octet)
    Length (1 octet)
    Route Type specific (variable)

Extended Community
⚫ As in MPLS VPN, BGP EVPN uses EVPN instances to control route sending and receiving. Like traditional IP VPN instances, EVPN instances have RDs and RTs, and the extended community attribute carries the EVPN instance RTs during route transmission.
⚫ In addition to the RT, BGP EVPN adds some new subtypes to the extended community attribute: MAC
Mobility and EVPN Router's MAC Extended Community.

Path Attribute - EXTENDED_COMMUNITIES


Flags: Optional, Transitive
Type Code: EXTENDED_COMMUNITIES (16)
Length
Route Target (RT)
MAC Mobility Extended Community

EVPN Router's MAC Extended Community


• For details about RDs and RTs, see HCIP – Datacom - Advanced Routing &
Switching Technology - 08 MPLS VPN Basics.
EVPN VPN Instance
⚫ After an EVPN instance is bound to a BD, MAC address entries in the BD are transmitted through BGP EVPN routes that carry the export VPN target (ERT) of that EVPN instance. After receiving the EVPN routes, the remote end compares the ERT carried in the routes with the import VPN target (IRT) of its local EVPN instance. If they match, the remote end adds the EVPN routes to the routing table of that EVPN instance, parses the EVPN routing table to obtain MAC address entries, and adds the MAC address entries to the MAC address table of the BD bound to the local EVPN instance.
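[Figure: PC1 connects to SW1 (VTEP 1.1.1.1/32) and PC2 connects to SW2 (VTEP 2.2.2.2/32); SW1 and SW2 are connected by a VXLAN tunnel. EVPN instance on SW1: RD 20:1, ERT 202:1, IRT 200:1. EVPN instance on SW2: RD 20:1, ERT 200:1, IRT 202:1. SW1 sends SW2 a BGP Update message carrying an EVPN route with EVPN RT = 202:1, which matches the IRT of SW2.]
If the ERT and IRT are not specified and only the RT is specified, the ERT and IRT are the same.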
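The import decision reduces to a set intersection between the RTs carried in a route and the local IRTs. The Python sketch below encodes the values from the figure. The class and method names are hypothetical, and treating a match on any one carried RT as sufficient follows the usual BGP VPN import rule.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class EvpnInstance:
    """Hypothetical model of the RD and RTs of an EVPN instance."""
    rd: str
    ert: Set[str] = field(default_factory=set)   # export VPN targets, attached to sent routes
    irt: Set[str] = field(default_factory=set)   # import VPN targets, checked on received routes

    @classmethod
    def with_rt(cls, rd: str, rt: str) -> "EvpnInstance":
        # Only an RT is configured: it is used as both the ERT and the IRT.
        return cls(rd, {rt}, {rt})

    def accepts(self, route_rts: Iterable[str]) -> bool:
        # A received route is imported if any RT it carries matches a local IRT.
        return bool(self.irt & set(route_rts))


sw1 = EvpnInstance("20:1", ert={"202:1"}, irt={"200:1"})
sw2 = EvpnInstance("20:1", ert={"200:1"}, irt={"202:1"})
print(sw2.accepts(sw1.ert), sw1.accepts(sw2.ert), sw1.accepts(sw1.ert))
```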

Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN

3. EVPN VXLAN Fundamentals


▫ Basic Concepts
◼ BGP EVPN Routes

▫ BGP EVPN Feature

4. VXLAN Deployment Cases in Typical Scenarios


MAC/IP Route (1)


⚫ Type 2 routes (MAC/IP routes): are used to advertise MAC addresses, ARP entries, and host IP routes.

Packet format | Field description
Route Distinguisher (8 bytes) | Route distinguisher (RD) configured for an EVPN instance.
Ethernet Segment Identifier (10 bytes) | Unique ID of the connection between local and remote devices.
Ethernet Tag ID (4 bytes) | VLAN ID configured on the local device.
MAC Address Length (1 byte) | Length of the host MAC address carried in the route.
MAC Address (6 bytes) | Host MAC address carried in the route.
IP Address Length (1 byte) | Mask length of the host IP address carried in the route.
IP Address (0, 4, or 16 bytes) | Host IP address carried in the route.
MPLS Label1 (3 bytes) | Layer 2 VNI carried in the route.
MPLS Label2 (0 or 3 bytes) | Layer 3 VNI carried in the route.
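The field layout can be checked by building a Type 2 NLRI byte by byte. The Python sketch below assumes the encodings of RFC 7432 (MAC and IP address lengths expressed in bits, a type 0 RD for values such as 10:1) and of RFC 8365 for VXLAN, where the label fields carry 24-bit VNIs. The function names are hypothetical.

```python
import ipaddress
import struct
from typing import Optional


def encode_rd_type0(rd: str) -> bytes:
    """Encode an 'ASN:number' RD as a type 0 RD: type (2), ASN (2), assigned number (4)."""
    asn, number = (int(part) for part in rd.split(":"))
    return struct.pack("!HHI", 0, asn, number)


def encode_type2(rd: str, mac: str, l2vni: int, ip: Optional[str] = None,
                 l3vni: Optional[int] = None, esi: bytes = bytes(10),
                 eth_tag: int = 0) -> bytes:
    """Build an EVPN NLRI (route type, length, body) for a MAC/IP route (sketch)."""
    body = encode_rd_type0(rd) + esi + struct.pack("!I", eth_tag)
    body += struct.pack("!B", 48) + bytes.fromhex(mac.replace("-", ""))   # 48-bit MAC
    if ip is None:
        body += struct.pack("!B", 0)                                       # no IP address
    else:
        addr = ipaddress.ip_address(ip)
        body += struct.pack("!B", addr.max_prefixlen) + addr.packed        # 32 or 128 bits
    body += l2vni.to_bytes(3, "big")                                       # MPLS Label1: L2 VNI
    if l3vni is not None:
        body += l3vni.to_bytes(3, "big")                                   # MPLS Label2: L3 VNI
    return struct.pack("!BB", 2, len(body)) + body                         # Route Type 2, Length


mac_route = encode_type2("10:1", "0000-0000-0001", 10)
irb_route = encode_type2("10:1", "0000-0000-0001", 10, ip="172.16.1.1", l3vni=1000)
print(len(mac_route), len(irb_route))
```

The MAC-only route body is 33 bytes; adding a 4-byte IPv4 address and a 3-byte Layer 3 VNI gives 40 bytes, consistent with the field sizes in the table.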


MAC/IP Route (2)


⚫ Contents carried in BGP EVPN Type 2 routes vary in different scenarios.

Field | Host MAC address advertisement | Host ARP advertisement | Host IP route advertisement
Route Distinguisher | Carried | Carried | Carried
Ethernet Segment Identifier | Carried | Carried | Carried
Ethernet Tag ID | Carried | Carried | Carried
MAC Address Length | MAC address length | MAC address length | MAC address length
MAC Address | MAC address | MAC address | MAC address
IP Address Length | Not set | IP address length | IP address length
IP Address | Not set | IP address | IP address
MPLS Label1 | VNI (Layer 2) | VNI (Layer 2) | VNI (Layer 2)
MPLS Label2 | Not set | Not set | VNI (Layer 3)

Host MAC address advertisement: When hosts on the same subnet communicate with each other, MAC routes containing host MAC address information and Layer 2 VNIs are advertised.
Host ARP advertisement: In a centralized VXLAN gateway scenario, ARP routes containing host IP address information, MAC address information, and Layer 2 VNIs are advertised.
Host IP route advertisement: When hosts on different subnets communicate with each other in a distributed gateway scenario, IRB routes containing host MAC address information, IP address information, Layer 2 VNIs, and Layer 3 VNIs are advertised.


• The contents of the first three fields (RD, Ethernet Segment Identifier, and
Ethernet Tag ID) of BGP EVPN Type 2 routes are the same in different scenarios,
and the contents of the last six fields vary in different scenarios.

Host MAC Address Advertisement


⚫ This slide shows how BGP EVPN uses Type 2 routes to implement dynamic MAC address learning. This function is
used to implement intra-subnet communication through VXLAN.
[Figure: PC1 (172.16.1.1/24, MAC 0000-0000-0001) connects to SW1 (Layer 2 gateway, VTEP 1.1.1.1/32; BD 10, L2 VNI 10, RD 10:1, ERT 10:1). SW2 (Layer 2 gateway, VTEP 2.2.2.2/32; BD 10, L2 VNI 10, RD 20:1, IRT 10:1) connects PC2. SW1 and SW2 are connected by a VXLAN tunnel.]
1. PC1 sends traffic.
2. SW1 learns the MAC address of PC1: {0000-0000-0001, BD 10, Port1}.
3. SW1 sends SW2 a BGP Update message with EVPN RT = 10:1 that carries a Type 2 route (RD = 10:1, MAC address = 0000-0000-0001, VNI = 10).
4. SW2 generates the MAC address entry {0000-0000-0001, BD 10, 1.1.1.1}.


• Intra-subnet host MAC address advertisement:

▫ PC1 generates data traffic and sends the traffic to SW1.

▫ SW1 obtains the MAC address of PC1 and creates an entry in the MAC
address table to record the MAC address, BD ID, and inbound interface.

▫ SW1 generates a BGP EVPN route based on this entry and sends the route
to SW2. The route carries the RT value (extended community attribute) of
the local EVPN instance and a Type 2 route (MAC route). In the MAC route,
the MAC address of PC1 is stored in the MAC Address field, and the Layer 2
VNI is stored in the MPLS Label1 field.

▫ After receiving the BGP EVPN route from SW1, SW2 checks the RT (similar
to the RT in MPLS VPN) carried in the route. If the RT is the same as the
import RT of the local EVPN instance, SW2 accepts the route. Otherwise,
SW2 discards the route. After accepting the route, SW2 obtains the MAC
address of PC1 and the mapping between the BD ID and the VTEP IP
address (carried in the next hop network address field of MP_REACH_NLRI)
of SW1, and generates the MAC address entry of PC1 in the local MAC
address table. Based on the next hop, the outbound interface of the MAC
address entry recurses to the VXLAN tunnel destined for SW1.
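The advertisement and installation steps can be sketched in Python as below, using the values from the figure. The route is a plain dictionary rather than a BGP message, the function names are hypothetical, and the mapping from the Layer 2 VNI to a local BD is an assumption about how the receiving VTEP locates the BD.

```python
from typing import Dict, Set, Tuple


def advertise_mac(mac: str, rd: str, ert: Set[str], l2vni: int, vtep_ip: str) -> dict:
    """SW1: describe the Type 2 (MAC) route generated from a local MAC entry (sketch)."""
    return {"type": 2, "rd": rd, "mac": mac, "label1": l2vni,
            "rts": set(ert), "next_hop": vtep_ip}   # next hop from MP_REACH_NLRI


def install_mac_route(route: dict, irt: Set[str], vni_to_bd: Dict[int, int],
                      mac_table: Dict[Tuple[str, int], str]) -> bool:
    """SW2: accept the route if an RT matches the local IRT, then add a MAC entry."""
    if not route["rts"] & irt:
        return False                                # RT mismatch: the route is discarded
    bd = vni_to_bd[route["label1"]]                 # Layer 2 VNI -> local BD (assumed mapping)
    # The outbound interface recurses from the next hop to the VXLAN tunnel toward SW1.
    mac_table[(route["mac"], bd)] = route["next_hop"]
    return True


route = advertise_mac("0000-0000-0001", "10:1", {"10:1"}, 10, "1.1.1.1")
sw2_macs: Dict[Tuple[str, int], str] = {}
print(install_mac_route(route, {"10:1"}, {10: 10}, sw2_macs), sw2_macs)
```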

Host ARP Advertisement


⚫ This slide shows how Type 2 routes implement host ARP advertisement when BGP EVPN is used to
construct a DCN in a distributed gateway scenario.
[Figure: PC1 (172.16.1.1/24, MAC 0000-0000-0001) connects to SW1 (Layer 3 gateway, VTEP 1.1.1.1/32; BD 10, L2 VNI 10, RD 10:1, ERT 10:1). SW2 (Layer 3 gateway, VTEP 2.2.2.2/32; BD 10, L2 VNI 10, RD 20:1, IRT 10:1) is connected to SW1 by a VXLAN tunnel.]
1. SW1 learns the ARP entry of PC1.
2. SW1 sends a BGP Update message with EVPN RT = 10:1 that carries a Type 2 route (RD = 10:1, MAC address = 0000-0000-0001, IP address = 172.16.1.1, L2 VNI = 10).
3. SW2, a Layer 3 gateway, obtains the ARP information of PC1.
When BGP EVPN is used in a centralized gateway scenario, the inter-subnet packet forwarding process is similar to that in a static
VXLAN scenario, and is not described here.

• A MAC/IP route can carry both the MAC address and IP address of a host. As
such, this type of route can be used to transmit host ARP entries between VTEPs,
thereby implementing host ARP advertisement. The MAC Address and MAC
Address Length fields identify the MAC address of the host, whereas the IP
Address and IP Address Length fields identify the IP address of the host. In this
case, MAC/IP routes are also called ARP routes. Host ARP advertisement applies
to the following scenarios:

▫ ARP broadcast suppression. After a Layer 3 gateway learns the ARP entry of
a host on its subnet, it generates host information that contains the host IP
and MAC addresses, L2VNI, and gateway's VTEP IP address. The Layer 3
gateway then advertises an ARP route carrying the host information to a
Layer 2 gateway. When the Layer 2 gateway receives an ARP request, it
searches for host information corresponding to the destination IP address in
the request. If the host information exists, the gateway replaces the
broadcast MAC address in the ARP request with the destination unicast
MAC address, and unicasts the packet, thereby implementing ARP
broadcast suppression.
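The suppression step on the Layer 2 gateway can be sketched as a lookup keyed by the ARP target IP address. In the Python sketch below, the frame is a dictionary; the field names, the requester address 172.16.1.2, and the function name are hypothetical, and the host entry reuses the values of PC1 from the figure.

```python
from typing import Dict, Tuple

BROADCAST_MAC = "FFFF-FFFF-FFFF"

# Host information learned from ARP routes: IP -> (MAC, L2 VNI, gateway VTEP IP).
HostTable = Dict[str, Tuple[str, int, str]]


def suppress_arp(request: dict, hosts: HostTable) -> Tuple[dict, str]:
    """Turn a broadcast ARP request into unicast when the target host is known (sketch)."""
    info = hosts.get(request["target_ip"])
    if info is None:
        return request, "flood"                  # unknown target: normal BUM flooding
    mac, _l2vni, _vtep = info
    # Replace the broadcast destination MAC with the host's unicast MAC (copy, not in place).
    return dict(request, eth_dst=mac), "unicast"


hosts: HostTable = {"172.16.1.1": ("0000-0000-0001", 10, "1.1.1.1")}
arp_req = {"eth_dst": BROADCAST_MAC, "sender_ip": "172.16.1.2", "target_ip": "172.16.1.1"}
print(suppress_arp(arp_req, hosts))
```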

Inter-Subnet Communication in a Distributed Gateway Scenario
⚫ In the distributed gateway networking, VTEPs function as both Layer 2 and Layer 3 gateways. In this networking, inter-subnet
communication can be implemented in different modes. According to the processing mode of the ingress VTEP that receives packets,
inter-subnet communication can be classified into asymmetric integrated routing and bridging (IRB) and symmetric IRB.

[Figure: Left, inter-subnet forwarding between VLANs through VLANIF interfaces: PC1 (172.16.10.1/24, MAC1) and PC2 (172.16.20.2/24, MAC3) are connected through a switch with VLANIF 10 and VLANIF 20, and traffic is (1) bridged, (2) routed, and (3) bridged. Right, inter-subnet forwarding between BDs through VBDIF interfaces: VTEP1 and VTEP2, connected by a VXLAN tunnel, each have VBDIF 10 and VBDIF 20; PC1 (172.16.2.1/24) connects to VTEP1, and PC2 (172.16.1.2/24) connects to VTEP2.]
In the distributed gateway networking, VBDIF interfaces 10 and 20 can be created on both VTEP1 and VTEP2. How is routing between VBDIF interfaces completed during inter-subnet communication?


• Details about inter-subnet forwarding between VLANs through VLANIF interfaces:

▫ Based on the local IP address, local mask, and peer IP address, PC1 finds
that PC2 is not on the same network segment as itself. Therefore, PC1
determines that the communication is Layer 3 communication and sends
the traffic destined for PC2 to the gateway. In the data frame sent by PC1,
the source MAC address is MAC1 and the destination MAC is MAC2.

▫ After receiving a packet destined for PC2 from PC1, the switch decapsulates
the packet and finds that the destination MAC address is the MAC address
of VLANIF 10. Therefore, the switch considers that the packet is destined for
itself and sends the packet to the routing module for further processing.

▫ The routing module parses the packet and finds that the destination IP address is 172.16.20.2, which is not an IP address of a local interface. Therefore, the packet needs to be forwarded at Layer 3. After the routing table is searched, a direct route generated by VLANIF 20 is matched.

▫ Because the matched route is a direct route, the packet has reached the last hop. Therefore, the switch searches the ARP table for 172.16.20.2 to obtain the MAC address of the host with the IP address 172.16.20.2, and sends the MAC address to the switching module for re-encapsulation into a data frame.

Asymmetric IRB
⚫ Asymmetric IRB: The ingress VTEP searches both the Layer 3 and Layer 2 forwarding tables to forward traffic, whereas the egress VTEP searches only the Layer 2 forwarding table. This forwarding mode is called asymmetric because the ingress and egress VTEPs perform different operations.
[Figure: VTEP1 (1.1.1.1) and VTEP2 (2.2.2.2) each have VBDIF 10 and VBDIF 20 and are connected by a VXLAN tunnel. On VTEP1, PC1 (172.16.2.1/24, MAC A) belongs to BD 20 (VNI 200), and BD 10 (VNI 100) is also configured. On VTEP2, PC2 (172.16.1.2/24, MAC B) belongs to BD 10 (VNI 100).]
1. PC1 sends a unicast frame to PC2.
2. VBDIF 20 searches the routing table and sends the frame to VBDIF 10.
3. VBDIF 10 searches the MAC address table of BD 10 and finds that the entry for the destination MAC address (learned through a Type 2 route) points to the remote VTEP.
4. VTEP1 sends the data frame to VTEP2 through the VXLAN tunnel. The packet carries an outer IP header, a UDP header, and a VXLAN header (VNI 100); in the inner frame, the source MAC address is the VBDIF 10 MAC address and the destination MAC address is MAC B.
5. VTEP2 searches the Layer 2 forwarding table in BD 10 corresponding to VNI 100.
6. VTEP2 sends the data frame to PC2.


• During asymmetric IRB, host IP routes are not advertised between VTEPs. That is, VTEP1 and VTEP2 do not exchange the 32-bit host routes that they generate from the ARP entries of their locally attached PCs. Therefore, when VTEP1 searches the routing table in step 2, it can match only the direct route generated by VBDIF 10.

• In step 5, VTEP2 decapsulates the VXLAN packet and finds that the destination
MAC address is not the MAC address of the local VBDIF interface corresponding
to the BD. Therefore, VTEP2 searches the Layer 2 forwarding table for the MAC
address entry of the BD based on the VNI carried in the packet, and then
forwards the packet at Layer 2.
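The two halves of asymmetric IRB can be sketched in Python as follows. The VBDIF subnets (172.16.1.0/24 for VBDIF 10, 172.16.2.0/24 for VBDIF 20) are inferred from the PC addresses, the port name Port1 and the MAC labels are hypothetical, and the function names are illustrative. The sketch also makes one property visible: because the ingress VTEP bridges in the destination BD, VTEP1 needs BD 10 and VNI 100 configured, as the figure shows.

```python
import ipaddress
from typing import Dict


def asymmetric_ingress(dst_ip: str, vbdif_subnets: Dict[int, str], arp: Dict[str, str],
                       mac_tables: Dict[int, Dict[str, str]], bd_to_vni: Dict[int, int]) -> dict:
    """Ingress VTEP: route (Layer 3 lookup), then bridge in the destination BD (Layer 2 lookup)."""
    dst = ipaddress.ip_address(dst_ip)
    # Only direct VBDIF routes exist, because host routes are not exchanged.
    bd = next((b for b, net in vbdif_subnets.items()
               if dst in ipaddress.ip_network(net)), None)
    if bd is None:
        raise LookupError(f"no route to {dst_ip}")
    dst_mac = arp[dst_ip]                       # ARP entry, learned through a Type 2 route
    remote_vtep = mac_tables[bd][dst_mac]       # MAC entry in the destination BD
    return {"outer_dst": remote_vtep, "vni": bd_to_vni[bd], "inner_dst_mac": dst_mac}


def asymmetric_egress(vni: int, dst_mac: str, vni_to_bd: Dict[int, int],
                      mac_tables: Dict[int, Dict[str, str]]) -> str:
    """Egress VTEP: Layer 2 lookup only, in the BD mapped to the received Layer 2 VNI."""
    return mac_tables[vni_to_bd[vni]][dst_mac]


# Hypothetical tables for VTEP1 (ingress) and VTEP2 (egress).
pkt = asymmetric_ingress("172.16.1.2", {10: "172.16.1.0/24", 20: "172.16.2.0/24"},
                         {"172.16.1.2": "MAC-B"}, {10: {"MAC-B": "2.2.2.2"}},
                         {10: 100, 20: 200})
print(pkt, asymmetric_egress(pkt["vni"], "MAC-B", {100: 10}, {10: {"MAC-B": "Port1"}}))
```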

Symmetric IRB
⚫ Symmetric IRB: Both the ingress and egress VTEPs search the Layer 3 forwarding table for traffic forwarding.
⚫ Compared with asymmetric IRB, the concepts of an IP VPN instance and its bound Layer 3 VNI are added. (In asymmetric IRB, the
VNI in the VXLAN header of packets transmitted between VTEPs is a Layer 2 VNI.) A VBDIF interface needs to be bound to an IP VPN
instance. In this case, route learning and data forwarding of the VBDIF interface are restricted in the IP VPN instance, which is similar
to the implementation in MPLS VPN.
[Figure: VTEP1 (1.1.1.1) and VTEP2 (2.2.2.2) are connected by a VXLAN tunnel. On VTEP1, VBDIF 20 (BD 20) is bound to IP VPN instance VPN1, which uses VXLAN VNI 1000 as its L3 VNI, RD 203:1, and RT 10:1. On VTEP2, VBDIF 10 (BD 10) is bound to VPN1, which uses L3 VNI 1000, RD 103:1, and RT 10:1. The VTEPs exchange IRB routes carrying an additional Layer 3 VNI. The learning of IRB routes between BD 20 of VTEP1 and BD 10 of VTEP2 is controlled by the RTs carried in the routes, similar to the mechanism for VPNv4 routes in MPLS VPN.]

47 Huawei Confidential

• Huawei devices implement symmetric IRB.



EVPN RT and IP VPN RT (1)


⚫ After an IP VPN instance is added, the RT carried in a Type 2 route advertised by BGP EVPN is still an
EVPN RT. The only difference is that the remote end processes the received route differently.
 If the RT carried in the route is the same as the import RT of the local EVPN instance, the route is accepted.
After the EVPN instance obtains an IRB route, it can extract an ARP route from the IRB route to implement host
ARP advertisement.
 If the RT carried in the route is the same as the import RT (EVPN) of the local IP VPN instance, the route is accepted. The VPN instance then obtains the IRB route carried in the route, extracts the host IP address and Layer 3 VNI from it, saves the host IP route in the routing table, and recursively resolves the outbound interface based on the next hop of the route. The final recursion result is the VXLAN tunnel pointing to the remote VTEP.


• In a BGP EVPN scenario, to use the RTs of an IP VPN instance to control the sending and receiving of EVPN routes, run the vpn-target evpn command to configure RTs for the IP VPN instance. The export RT is then carried in the EVPN routes sent to remote BGP EVPN peers, and the import RT determines which received EVPN routes can be added to the routing table of the local IP VPN instance address family: a route is added only if the RT attribute it carries matches the import RT.

• Note: The RTs configured using the vpn-target evpn command are called RTs
(EVPN).

EVPN RT and IP VPN RT (2)


⚫ A route is discarded only when the RT carried in the route is different from both the EVPN IRT and the IP VPN IRT (EVPN).
Figure: VTEP1 sends a BGP Update message carrying an EVPN route for BD 20.
On VTEP1: BD 20, EVPN ERT 20:1.
On VTEP2: BD 20, EVPN IRT 20:1; VBDIF 20, IP VPN IRT 20:1.
Step 1: The EVPN route carries EVPN RT = 20:1.
Step 2: The RT carried in the route is the same as the IRT of the EVPN instance, so the EVPN route is added to the BGP EVPN routing table.
Step 3: The RT carried in the route is the same as the IRT of the IP VPN instance, so the host IP route is added to the IP VPN routing table.

• VTEP1 sends a Type 2 BGP EVPN route (IRB type). The route carries the ERT
(20:1) of the EVPN instance bound to the BD to which the route belongs.

• After receiving the BGP Update message, VTEP2 checks whether the RT (20:1) carried in the extended community attribute of the message is the same as the IRT of the local EVPN instance and the IRT (EVPN) of the IP VPN instance. Because the RT matches both the IRT of the EVPN instance bound to BD 20 and the IRT (EVPN) of the IP VPN instance bound to VBDIF 20, VTEP2 adds the EVPN route to the EVPN routing table of BD 20 and adds the IP route contained in the EVPN route to the routing table of the IP VPN instance corresponding to VBDIF 20.
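The acceptance rule from these two slides can be written as a small decision function. This is a sketch assuming RTs are compared as plain strings and that a route matches when any RT it carries equals any configured import RT; the function name and the returned labels are hypothetical.

```python
def classify_received_route(route_rts, evpn_irts, ipvpn_irts_evpn):
    """Decide where a received Type 2 (IRB) route is installed.

    route_rts:        RTs carried in the route's extended community attribute
    evpn_irts:        import RTs of the local EVPN instance
    ipvpn_irts_evpn:  import RTs (EVPN) of the local IP VPN instance,
                      i.e. those configured with the vpn-target evpn command
    """
    carried = set(route_rts)
    tables = []
    if carried & set(evpn_irts):
        # The EVPN instance keeps the route (MAC and ARP information).
        tables.append("BGP EVPN routing table")
    if carried & set(ipvpn_irts_evpn):
        # The IP VPN instance extracts the host IP route and Layer 3 VNI.
        tables.append("IP VPN routing table")
    # The route is discarded only when neither import RT matches.
    return tables or ["discarded"]


# The example from the figure: RT 20:1 matches both import RTs.
print(classify_received_route(["20:1"], ["20:1"], ["20:1"]))
# ['BGP EVPN routing table', 'IP VPN routing table']
```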

Symmetric IRB: Host IP Route Advertisement (IRB Route)


Figure: VTEP1 (1.1.1.1) is SW1 (Layer 3 gateway), connected to PC1 (172.16.2.1/24, MAC D). VTEP2 (2.2.2.2, router MAC: MAC B) is SW2 (Layer 3 gateway), connected to PC2 (172.16.1.2/24, MAC A). A VXLAN tunnel connects the two VTEPs.
On SW2: IP VPN instance VPN1 (VXLAN VNI 1000 as the L3 VNI, RD 103:1, RT 10:1) and EVPN instance BD_10 of BD 10 (RD 10:1, L2 VNI 100, RT 10:1).
Step 1: SW2 learns the ARP entry of PC2 and generates an IRB route.
Step 2: SW2 sends a BGP Update message carrying the Type 2 route: EVPN RT 10:1, router MAC = MAC B, RD 10:1, host route = 172.16.1.2/32, MAC address = MAC A, Layer 2 VNI = 100, Layer 3 VNI = 1000.
Step 3: The Layer 3 gateway SW1 obtains the host route to PC2, which contains the router MAC address of VTEP2:
Destination/Mask   L3 VNI   Next Hop   Outbound Interface
172.16.1.2/32      1000     2.2.2.2    VXLAN tunnel
BGP EVPN uses the EVPN Router's MAC Extended Community attribute to transmit the
VTEP's router MAC address, which is the MAC address of the NVE interface.


Symmetric IRB: Communication Process


Figure: PC1 (172.16.2.1/24, MAC D) is in BD 20 on VTEP1 (1.1.1.1); PC2 (172.16.1.2/24, MAC A) is in BD 10 on VTEP2 (2.2.2.2, router MAC: MAC B). On both VTEPs, IP VPN instance VPN1 uses L3 VNI 1000.
VXLAN packet sent by VTEP1: outer IP header, UDP header, VXLAN header (L3 VNI 1000), and an inner Ethernet header with source MAC address = VTEP1's system MAC address and destination MAC address = MAC B.
Step 1: PC1 sends a unicast frame to PC2.
Step 2: VBDIF 20 searches the routing table of IP VPN instance VPN1 for a route and finds that the next hop of the matched route (a 32-bit host route) is the remote VTEP of the VXLAN tunnel.
Step 3: VTEP1 sends the data frame to VTEP2 through the VXLAN tunnel.
Step 4: VTEP2 searches the routing table of the IP VPN instance corresponding to VNI 1000, finds that the route is a direct route of the local VBDIF interface, and searches the Layer 2 forwarding table in the corresponding BD.
Step 5: VTEP2 sends the data frame to PC2.

• During symmetric IRB, VTEPs exchange 32-bit host routes generated from ARP information. Therefore, VTEP1 finds in its routing table the 32-bit host route advertised by VTEP2. Even if VBDIF 10 and its direct route exist on VTEP1, VTEP1 still forwards packets based on the 32-bit host route, following the longest match rule.

• In step 4, VTEP2 decapsulates the VXLAN packet and finds that the destination MAC address of the inner data frame is VTEP2's router MAC address (MAC B), so it performs a Layer 3 lookup. VTEP2 finds the IP VPN instance corresponding to VNI 1000 and searches the routing table of that instance. It matches the direct route of VBDIF 10, searches the local MAC address table, and sends the packet to the local host PC2.
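The symmetric IRB path can be modeled in the same way. This Python sketch assumes a simplified routing-table layout (the `Route` type and function names are hypothetical) and uses the addresses, MAC labels, and L3 VNI 1000 from the figures. It shows the longest match selecting the 32-bit host route on VTEP1 even though a direct route for the same segment exists, and VTEP2 switching to a Layer 3 lookup because the inner destination MAC address is its router MAC address.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    interface: str                 # VBDIF name or "VXLAN tunnel"
    next_hop: str = "direct"       # remote VTEP IP for tunnel routes
    l3_vni: Optional[int] = None
    router_mac: Optional[str] = None


def longest_match(table, dst_ip):
    addr = ipaddress.ip_address(dst_ip)
    candidates = [r for r in table if addr in ipaddress.ip_network(r.prefix)]
    return max(candidates, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen)


# VPN1 routing table on VTEP1: direct routes plus the host route learned from
# VTEP2's IRB route (L3 VNI 1000, router MAC = MAC B).
VTEP1_VPN1 = [
    Route("172.16.2.0/24", "VBDIF20"),
    Route("172.16.1.0/24", "VBDIF10"),  # exists only if VBDIF 10 is on VTEP1
    Route("172.16.1.2/32", "VXLAN tunnel", "2.2.2.2", 1000, "MAC B"),
]

# VTEP2 state.
VTEP2_ROUTER_MAC = "MAC B"
VTEP2_L3VNI_TO_VPN = {1000: "VPN1"}
VTEP2_VPN1 = [Route("172.16.1.0/24", "VBDIF10")]
VTEP2_ARP = {"172.16.1.2": "MAC A"}


def vtep1_encapsulate(dst_ip):
    """Ingress VTEP: a tunnel route yields the outer IP, L3 VNI, and inner DMAC."""
    route = longest_match(VTEP1_VPN1, dst_ip)
    if route.interface != "VXLAN tunnel":
        return {"local_interface": route.interface}
    return {"outer_dst_ip": route.next_hop, "vni": route.l3_vni,
            "inner_dst_mac": route.router_mac}


def vtep2_decapsulate(vni, inner_dst_mac, dst_ip):
    """Egress VTEP: the inner DMAC is its router MAC, so route inside the VPN."""
    if inner_dst_mac != VTEP2_ROUTER_MAC:
        return None  # Layer 2 handling is out of scope for this sketch
    vpn = VTEP2_L3VNI_TO_VPN[vni]
    route = longest_match(VTEP2_VPN1, dst_ip)  # only VPN1 is modeled
    return vpn, route.interface, VTEP2_ARP[dst_ip]


pkt = vtep1_encapsulate("172.16.1.2")
print(pkt)
print(vtep2_decapsulate(pkt["vni"], pkt["inner_dst_mac"], "172.16.1.2"))
```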

Description of Type 3 Routes


⚫ Type 3 route (inclusive multicast route)
 Inclusive multicast routes are used for automatic VTEP discovery and dynamic VXLAN tunnel establishment on
the VXLAN control plane.
 Through these routes, VTEPs that function as BGP EVPN peers transmit Layer 2 VNIs and VTEPs' IP addresses.
 The Originating Router's IP Address and MPLS Label fields carried in the routes indicate the local VTEP's IP
address and Layer 2 VNI, respectively.
Type 3 route format
NLRI:
Route Distinguisher (8 bytes): route distinguisher (RD) configured for an EVPN instance.
Ethernet Tag ID (4 bytes): VLAN ID configured on the local device; all 0s in this type of route.
IP Address Length (1 byte): length of the local VTEP's IP address carried in the route, in bits (32 for IPv4).
Originating Router's IP Address (4 or 16 bytes): local VTEP's IP address carried in the route.
PMSI attribute:
Flags (1 byte): not used in VXLAN scenarios.
Tunnel Type (1 byte): in VXLAN scenarios, the value can be 6 (Ingress Replication).
MPLS Label (3 bytes): Layer 2 VNI carried in the route.
Tunnel Identifier (variable length): local VTEP's IP address in VXLAN scenarios.

• Provider Multicast Service Interface (PMSI): an optional transitive BGP attribute. In VXLAN scenarios, the Tunnel Type field has a fixed value of 6 (Ingress Replication), and the attribute carries the sender's VTEP IP address and Layer 2 VNI.
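To make the Type 3 field layout concrete, the following Python sketch packs the NLRI and the PMSI attribute value with the standard `struct` module. It assumes an IPv4 VTEP address, a type 0 RD written as `ASN:number` with a 2-byte ASN, and the VXLAN encoding of RFC 8365, where the 3-byte MPLS Label field carries the whole 24-bit VNI. The RD value 10:1 is hypothetical, and the BGP attribute headers are omitted.

```python
import ipaddress
import struct

EVPN_ROUTE_TYPE_INCLUSIVE_MULTICAST = 3
PMSI_TUNNEL_TYPE_INGRESS_REPLICATION = 6


def encode_rd_type0(rd):
    """Encode 'ASN:number' as RD type 0: 2-byte type, 2-byte ASN, 4-byte number."""
    asn, number = (int(part) for part in rd.split(":"))
    return struct.pack("!HHI", 0, asn, number)


def encode_type3_nlri(rd, vtep_ip, ethernet_tag=0):
    """Route Type (1 byte) + Length (1 byte) + the Type 3 NLRI fields."""
    ip = ipaddress.IPv4Address(vtep_ip)
    body = (
        encode_rd_type0(rd)                # Route Distinguisher, 8 bytes
        + struct.pack("!I", ethernet_tag)  # Ethernet Tag ID, 4 bytes (all 0s)
        + struct.pack("!B", 32)            # IP Address Length, in bits
        + ip.packed                        # Originating Router's IP Address
    )
    return struct.pack("!BB", EVPN_ROUTE_TYPE_INCLUSIVE_MULTICAST, len(body)) + body


def encode_pmsi_value(l2_vni, vtep_ip):
    """PMSI value: Flags, Tunnel Type, MPLS Label (= L2 VNI), Tunnel Identifier."""
    return (
        struct.pack("!BB", 0, PMSI_TUNNEL_TYPE_INGRESS_REPLICATION)
        + l2_vni.to_bytes(3, "big")        # all 24 bits carry the VNI
        + ipaddress.IPv4Address(vtep_ip).packed
    )


nlri = encode_type3_nlri("10:1", "1.1.1.1")
pmsi = encode_pmsi_value(1000, "1.1.1.1")
print(nlri.hex())
print(pmsi.hex())
```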

VXLAN Tunnel Establishment


⚫ VTEPs exchange Layer 2 VNI and VTEP IP address information through Type 3 routes. If the local and remote VTEP IP addresses are reachable to each other through Layer 3 routes, a VXLAN tunnel is established between the VTEPs. Additionally, if the local and remote VNIs are the same, an ingress replication list is created for BUM packet forwarding.

Figure: SW1 (VTEP 1.1.1.1/32) and SW2 (VTEP 2.2.2.2/32) establish a BGP EVPN peer relationship and exchange BGP Update messages carrying Type 3 routes. SW1 advertises VTEP address = 1.1.1.1 and VNI = 1000; SW2 advertises VTEP address = 2.2.2.2 and VNI = 1000.
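The tunnel-establishment rule can be modeled as follows. This sketch assumes underlay reachability is a simple list of prefixes learned through the IGP; the `Vtep` class and its methods are hypothetical and are not a device API. A reachable remote VTEP address yields a tunnel, and a matching Layer 2 VNI additionally adds the peer to that VNI's ingress replication list.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Vtep:
    ip: str
    local_l2_vnis: set
    underlay_routes: list              # prefixes reachable through the underlay IGP
    tunnels: set = field(default_factory=set)
    ingress_replication: dict = field(default_factory=dict)  # VNI -> peer VTEP IPs

    def reachable(self, remote_ip):
        addr = ipaddress.ip_address(remote_ip)
        return any(addr in ipaddress.ip_network(p) for p in self.underlay_routes)

    def on_type3_route(self, remote_vtep_ip, remote_l2_vni):
        # No Layer 3 route to the remote VTEP address: no tunnel.
        if not self.reachable(remote_vtep_ip):
            return
        self.tunnels.add(remote_vtep_ip)
        # Same VNI on both ends: add the peer to the BUM ingress replication list.
        if remote_l2_vni in self.local_l2_vnis:
            self.ingress_replication.setdefault(remote_l2_vni, set()).add(remote_vtep_ip)

    def flood_targets(self, vni):
        """Remote VTEPs that receive a head-end replicated copy of a BUM frame."""
        return sorted(self.ingress_replication.get(vni, set()))


sw1 = Vtep("1.1.1.1", {1000}, ["2.2.2.2/32"])
sw1.on_type3_route("2.2.2.2", 1000)  # from SW2 in the figure
sw1.on_type3_route("9.9.9.9", 1000)  # hypothetical unreachable VTEP: ignored
print(sw1.tunnels, sw1.flood_targets(1000))
```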

Description of Type 5 Routes


⚫ Type 5 route (IP prefix route)
 The IP Prefix Length and IP Prefix fields in this type of route carry a host IP address or network segment
address.
 If a host IP address is carried, the route is used for IP route advertisement in distributed VXLAN gateway
scenarios. In this case, the route functions the same as an IRB route on the VXLAN control plane.
 If a network segment address is carried, the route can be advertised to allow hosts on the VXLAN network to
access an external network.
Type 5 route format and field description:
Route Distinguisher (8 bytes): route distinguisher (RD) configured for an EVPN instance.
Ethernet Segment Identifier (10 bytes): unique ID of the connection between local and remote devices.
Ethernet Tag ID (4 bytes): VLAN ID configured on the local device.
IP Prefix Length (1 byte): mask length of the IP prefix carried in the route.
IP Prefix (4 or 16 bytes): IP prefix carried in the route.
GW IP Address (4 or 16 bytes): default gateway address, used in specific scenarios.
MPLS Label (3 bytes): Layer 3 VNI carried in the route.
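The Type 5 field list can be checked the same way. This Python sketch packs an IPv4 Type 5 NLRI, assuming a type 0 RD, an all-zero ESI, an Ethernet Tag ID of 0, and a GW IP Address of 0.0.0.0; these defaults and the RD value 10:1 are assumptions for illustration. The prefix 1.2.3.0/24 and Layer 3 VNI 88 come from the application scenario on the next slide.

```python
import ipaddress
import struct

EVPN_ROUTE_TYPE_IP_PREFIX = 5


def encode_rd_type0(rd):
    """Encode 'ASN:number' as RD type 0: 2-byte type, 2-byte ASN, 4-byte number."""
    asn, number = (int(part) for part in rd.split(":"))
    return struct.pack("!HHI", 0, asn, number)


def encode_type5_nlri(rd, prefix, l3_vni, gw_ip="0.0.0.0", ethernet_tag=0):
    net = ipaddress.IPv4Network(prefix)
    body = (
        encode_rd_type0(rd)                    # Route Distinguisher, 8 bytes
        + bytes(10)                            # ESI, 10 bytes (all zeros: assumption)
        + struct.pack("!I", ethernet_tag)      # Ethernet Tag ID, 4 bytes
        + struct.pack("!B", net.prefixlen)     # IP Prefix Length, 1 byte
        + net.network_address.packed           # IP Prefix, 4 bytes
        + ipaddress.IPv4Address(gw_ip).packed  # GW IP Address, 4 bytes
        + l3_vni.to_bytes(3, "big")            # MPLS Label = Layer 3 VNI, 3 bytes
    )
    return struct.pack("!BB", EVPN_ROUTE_TYPE_IP_PREFIX, len(body)) + body


nlri = encode_type5_nlri("10:1", "1.2.3.0/24", 88)
print(len(nlri), nlri.hex())
```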

Application Scenario of Advertising IP Prefix Routes


⚫ To allow hosts on a VXLAN network to access an external network, a VTEP can advertise external routes to the entire VXLAN network through Type 5 routes.

Figure: SW1 (Layer 3 gateway, VTEP 1.1.1.1/32) connects to the external network 1.2.3.0/24; SW2 (Layer 3 gateway, VTEP 2.2.2.2/32) is the remote VTEP; the Layer 3 VNI is 88.
Step 1: SW1 imports the static route to 1.2.3.0/24 into BGP.
Step 2: SW1 sends a BGP Update message carrying a Type 5 route (prefix = 1.2.3.0/24, Layer 3 VNI = 88).
Step 3: SW2 obtains the route to 1.2.3.0/24.

• Similar to Type 2 IRB routes, Type 5 routes carry the VTEP's router MAC address in the EVPN Router's MAC Extended Community attribute. In addition, Type 5 routes carry only the Layer 3 VNI. Therefore, traffic matching a Type 5 route is also forwarded in IRB mode.
Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN

3. EVPN VXLAN Fundamentals


▫ Basic Concepts

▫ BGP EVPN Routes


◼ BGP EVPN Feature

4. VXLAN Deployment Cases in Typical Scenarios

ARP Broadcast Suppression
⚫ BGP EVPN Type 2 routes enable VTEPs to learn MAC addresses without depending on communication between
hosts. However, ARP requests between hosts still need to be flooded on the VXLAN overlay network, which
consumes a large number of network resources.
⚫ ARP broadcast suppression can be implemented based on BGP EVPN routes to reduce broadcast traffic.
Figure: PC1 (172.16.2.1/24, MAC A) is behind VTEP1 (1.1.1.1); PC2 (172.16.2.2/24, MAC B) is behind VTEP2 (2.2.2.2); a VXLAN tunnel connects the VTEPs.
Step 1: PC1 sends an ARP request packet to PC2.
Step 2: VTEP1 searches the ARP broadcast suppression table of BD 20 and finds the entry: IP address 172.16.2.2, MAC B, VTEP 2.2.2.2.
Step 3: VTEP1 changes the destination MAC address of the ARP data frame from all Fs to MAC B, encapsulates the data frame into a VXLAN packet, and sends the VXLAN packet to VTEP2. The VXLAN packet consists of an IP header (source IP address 1.1.1.1, destination IP address 2.2.2.2), a UDP header, a VXLAN header, and the original ARP data frame (source MAC address MAC A, destination MAC address MAC B).
Step 4: VTEP2 unicasts the ARP packet to PC2 (source MAC address MAC A, destination MAC address MAC B).

• ARP broadcast suppression effectively reduces the ARP processing load on the gateway. When the gateway receives an ARP request packet, it searches the ARP broadcast suppression table, which stores the mapping between the IP address and MAC address of each destination device. If a matching entry is found, the gateway replaces the broadcast MAC address in the ARP request packet with the MAC address of the destination device and then sends the ARP request packet through the interface corresponding to that MAC address.
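The suppression step can be sketched as a table lookup that rewrites the destination MAC address. This is a minimal model: the `ArpRequest` type, the table layout, and the behavior on a table miss (falling back to normal flooding in the BD) are assumptions for illustration; the table entry matches the figure.

```python
from dataclasses import dataclass, replace

BROADCAST_MAC = "FFFF-FFFF-FFFF"


@dataclass(frozen=True)
class ArpRequest:
    src_mac: str
    dst_mac: str      # Ethernet destination MAC of the ARP frame
    sender_ip: str
    target_ip: str


# ARP broadcast suppression table per BD, built from BGP EVPN Type 2 routes:
# target IP -> (MAC, VTEP IP).
SUPPRESSION_TABLE = {20: {"172.16.2.2": ("MAC B", "2.2.2.2")}}


def handle_arp_request(bd, request):
    """Return (frame, action) for an ARP request received in a BD."""
    if request.dst_mac != BROADCAST_MAC:
        return request, "forward normally"  # not a broadcast ARP request
    entry = SUPPRESSION_TABLE.get(bd, {}).get(request.target_ip)
    if entry is None:
        return request, "flood in BD"       # assumption: normal BUM handling on a miss
    mac, vtep_ip = entry
    # Replace the broadcast MAC with the destination host's MAC and unicast it.
    return replace(request, dst_mac=mac), f"VXLAN unicast to {vtep_ip}"


req = ArpRequest("MAC A", BROADCAST_MAC, "172.16.2.1", "172.16.2.2")
frame, action = handle_arp_request(20, req)
print(frame.dst_mac, action)  # MAC B VXLAN unicast to 2.2.2.2
```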
Host Information Collection
⚫ The implementation of ARP broadcast suppression depends on the ARP broadcast suppression table. The generation
of ARP broadcast suppression entries depends on Type 2 routes (IRB routes and host ARP advertisement) generated
by BGP EVPN.
⚫ By default, a Layer 3 gateway does not generate BGP EVPN routes based on local ARP information. You need to
manually enable BGP EVPN host information collection. VTEPs then generate IRB routes based on ARP information.

Figure: PC1 (172.16.2.1/24, MAC A) is behind VTEP1 (1.1.1.1); PC2 (172.16.2.2/24, MAC B) is behind VTEP2 (2.2.2.2); a VXLAN tunnel connects the VTEPs.
Step 1: Enable BGP EVPN host information collection to generate IRB routes.
Step 2: Transmit ARP information through BGP EVPN Type 2 IRB routes.
Step 3: VTEP2 uses the IRB routes to generate host information entries, for example, the following ARP entry of VBDIF 20 on the Layer 3 gateway:
IP Address    MAC     VTEP
172.16.2.1    MAC A   1.1.1.1

• An ARP route carries the following valid information: host MAC address, host IP address, and Layer 2 VNI. An IRB route carries the following valid information: host MAC address, host IP address, Layer 2 VNI, and Layer 3 VNI. Because an IRB route contains all the information of an ARP route, it can be used to advertise both host IP routes and host ARP entries.
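The relationship between the two route types can be expressed with two small record types. In this sketch the class and function names are hypothetical, and the Layer 2 VNI 20 and Layer 3 VNI 1000 used in the usage lines are assumed values; the ARP entry itself comes from the figure. One IRB route yields both an ARP route and a host route, and the receiving VTEP can derive suppression entries from it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpRoute:
    mac: str
    ip: str
    l2_vni: int


@dataclass(frozen=True)
class IrbRoute:
    mac: str
    ip: str
    l2_vni: int
    l3_vni: int

    def arp_part(self):
        """An IRB route contains everything an ARP route carries."""
        return ArpRoute(self.mac, self.ip, self.l2_vni)

    def host_route(self):
        """32-bit host route for the IP VPN instance, with its Layer 3 VNI."""
        return f"{self.ip}/32", self.l3_vni


def collect_host_info(arp_entries, l2_vni, l3_vni):
    """With host information collection enabled, one IRB route per local ARP entry."""
    return [IrbRoute(mac, ip, l2_vni, l3_vni) for ip, mac in sorted(arp_entries.items())]


def suppression_entries(irb_routes, next_hop_vtep):
    """Remote side: ARP broadcast suppression entries derived from IRB routes."""
    return {r.ip: (r.mac, next_hop_vtep) for r in irb_routes}


routes = collect_host_info({"172.16.2.1": "MAC A"}, l2_vni=20, l3_vni=1000)
print(routes[0].arp_part(), routes[0].host_route())
print(suppression_entries(routes, "1.1.1.1"))
```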
Local Proxy ARP (1)
⚫ After BGP EVPN host information collection is enabled on the entire network, the Layer 3 gateway learns 32-bit
host routes of all hosts. In this way, the Layer 3 gateway can use host routes to perform Layer 3 symmetric IRB for
traffic in the same BD.
⚫ You can enable local proxy ARP on the VBDIF interface of the Layer 3 gateway. The VBDIF interface responds to
ARP requests from downstream hosts for IP addresses on the same network segment. The Layer 3 gateway then
performs Layer 3 forwarding for access to the IP addresses on the same network segment.
Figure: VBDIF 20 (172.16.2.254, MAC C) on the Layer 3 gateway is configured with arp-proxy local enable. PC1 (172.16.2.1/24, MAC A) is behind VTEP1 (1.1.1.1); PC2 (172.16.2.2/24, MAC B) is behind VTEP2 (2.2.2.2); a VXLAN tunnel connects the VTEPs.
Step 1: PC1 sends an ARP request packet to PC2.
Step 2: VBDIF 20 functions as a proxy and sends an ARP reply packet.

• On a VXLAN network, a BD is a broadcast domain. After receiving BUM packets, a VTEP broadcasts them in the BD. To reduce broadcast traffic, the network administrator usually configures access-side isolation or port isolation to isolate access users in a BD and prevent Layer 2 communication between them. However, as user services grow, users have higher requirements for communication. To meet these requirements, the network administrator can enable local proxy ARP on a VBDIF interface so that isolated access users in a BD can communicate with each other.
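The reply decision of a VBDIF interface with local proxy ARP can be sketched as below. The function name and signature are hypothetical, and the sketch models only whether the gateway replies and with which MAC address; what happens to an unanswered request (flooding or suppression) is outside its scope. The addresses are those of the figure.

```python
import ipaddress


def vbdif_arp_reply(vbdif_ip, vbdif_mac, prefix_len, local_proxy_enabled, target_ip):
    """Return the MAC address the VBDIF answers with, or None if it does not reply."""
    if target_ip == vbdif_ip:
        return vbdif_mac  # ordinary ARP request for the gateway address
    subnet = ipaddress.ip_network(f"{vbdif_ip}/{prefix_len}", strict=False)
    if local_proxy_enabled and ipaddress.ip_address(target_ip) in subnet:
        # Proxy reply: traffic to a host on the same segment goes via the gateway.
        return vbdif_mac
    return None


# PC1 (172.16.2.1) asks for PC2 (172.16.2.2); VBDIF 20 is 172.16.2.254, MAC C.
print(vbdif_arp_reply("172.16.2.254", "MAC C", 24, True, "172.16.2.2"))  # MAC C
```

With the proxy reply, PC1's ARP entry for 172.16.2.2 points to MAC C, so its frames to PC2 are sent to the gateway and routed using the host route.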
Local Proxy ARP (2)
Figure: VBDIF 20 (172.16.2.254, MAC C); PC1 (172.16.2.1/24, MAC A) is behind VTEP1 (1.1.1.1); PC2 (172.16.2.2/24, MAC B) is behind VTEP2 (2.2.2.2). ARP entry of PC1: IP address 172.16.2.2, MAC C.
Step 3: PC1 sends a data frame to PC2 (Ethernet header with source MAC address = MAC A and destination MAC address = MAC C, followed by the payload).
Step 4: VTEP1 finds that the destination MAC address is its own MAC address, searches the routing table for Layer 3 forwarding, finds a host route, and forwards the packet to VTEP2 through VXLAN.
Local proxy ARP restricts ARP packet transmission to the local VTEP and reduces unnecessary traffic exchanged between VTEPs.
Anycast Gateway
⚫ When local proxy ARP is enabled, a VTEP only needs to maintain local ARP entries. ARP information
transmitted by other VTEPs through BGP EVPN routes is not used during packet forwarding. In this
case, the VTEP does not need to maintain ARP entries learned from other VTEPs.
⚫ After the distributed gateway function is enabled, the VTEP processes only ARP packets received from
user-side hosts and deletes learned network-side ARP entries.

Figure: ARP entries of VTEP1:
IP Address    MAC
172.16.2.1    MAC A
172.16.2.2    MAC B
PC1 (172.16.2.1/24, MAC A) is behind VTEP1 (1.1.1.1); PC2 (172.16.2.2/24, MAC B) is behind VTEP2 (2.2.2.2); a VXLAN tunnel connects the VTEPs.

• Generally, VBDIF interfaces with the same ID on different VTEPs are configured with the same MAC address. After the distributed gateway function is enabled, these VBDIF interfaces have the same IP address and MAC address, but no ARP conflict is reported. In addition, when hosts and VMs are migrated to a different VTEP, they do not need to resolve the gateway's ARP entry again, because the gateway IP address and MAC address are the same on every VTEP.
MAC Mobility (1)

Figure: VTEP1 (1.1.1.1), VTEP2 (2.2.2.2), and VTEP3 (3.3.3.3); VM1 (172.16.2.1/24, MAC B) is initially connected to VTEP1.
Step 1: VTEP1 learns the ARP information of VM1 and generates and advertises an IRB route.
Step 2: The BGP EVPN route carries the MAC Mobility extended community attribute with sequence number 0 (MAC Mobility - Seq 0, Prefix = 172.16.2.1/24, MAC B, Next hop: VTEP1 (1.1.1.1)).
Step 3: VM1 is migrated to VTEP2.
Step 4: VTEP2 detects VM1 based on ARP information and generates a new IRB route, which is the same as the route advertised by VTEP1.
MAC Mobility (2)

Figure: VM1 (172.16.2.1/24, MAC B) is now connected to VTEP2 (2.2.2.2); VTEP1 (1.1.1.1) and VTEP3 (3.3.3.3) are also shown.
Step 5: VTEP2 advertises a BGP EVPN route whose MAC Mobility extended community attribute carries sequence number 1 (MAC Mobility - Seq 1, Prefix = 172.16.2.1/24, MAC B, Next hop: VTEP2 (2.2.2.2)).
Step 6: VTEP1 receives the BGP route update and, based on the sequence number carried in MAC Mobility, detects that the connected VM has been migrated. VTEP1 then sends a BGP Update message to withdraw its own route.

• The MAC Mobility extended community attribute is used to announce the location change of a host or VM when the host or VM is migrated from one VTEP to another.
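The sequence-number handling in steps 1 to 6 can be sketched as follows. This is a simplified model of the MAC Mobility procedure described in RFC 7432: the class and method names are hypothetical, and tie-breaking for equal sequence numbers is deliberately omitted. A VTEP that locally learns a MAC previously advertised by another VTEP increments the sequence number, and a VTEP whose own route is superseded withdraws it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacRoute:
    mac: str
    next_hop: str  # VTEP IP
    seq: int       # MAC Mobility sequence number


class Vtep:
    def __init__(self, ip):
        self.ip = ip
        self.best = {}  # MAC -> currently selected MacRoute

    def learn_local(self, mac):
        """Advertise a locally learned MAC; bump the sequence number if it moved here."""
        current = self.best.get(mac)
        if current is None:
            seq = 0
        elif current.next_hop != self.ip:
            seq = current.seq + 1   # MAC was previously reachable through another VTEP
        else:
            seq = current.seq       # already local: re-advertise unchanged
        route = MacRoute(mac, self.ip, seq)
        self.best[mac] = route
        return route

    def receive(self, route):
        """Accept a route with a higher sequence number; withdraw our own if superseded."""
        current = self.best.get(route.mac)
        if current is not None and route.seq <= current.seq:
            return None  # simplified: equal sequence numbers are not resolved here
        self.best[route.mac] = route
        if current is not None and current.next_hop == self.ip:
            return ("withdraw", route.mac)
        return None


vtep1, vtep2 = Vtep("1.1.1.1"), Vtep("2.2.2.2")
r0 = vtep1.learn_local("MAC B")  # step 2: Seq 0
vtep2.receive(r0)
r1 = vtep2.learn_local("MAC B")  # steps 4-5: VM1 moved, Seq 1
print(vtep1.receive(r1))         # step 6: ('withdraw', 'MAC B')
```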
VXLAN QoS (1)
⚫ Certain fields in the packet header record QoS information so that network devices can provide
differentiated services.
⚫ Packets carry different types of precedence fields depending on the network type. For example, packets carry the 802.1p field on a VLAN network, the DSCP field on an IP network, and the EXP field on an MPLS network. If packets traverse different types of networks, the mapping between the precedence fields must be configured on the gateway so that packet priorities are retained regardless of the network type.
⚫ VXLAN QoS provides differentiated quality assurance for VXLAN packets based on their internal
priorities, which are assigned by devices to differentiate the service classes of packets. In VXLAN QoS
implementation, devices map QoS priorities carried in original packets to internal priorities, and map
internal priorities to the priorities of VXLAN packets.

VXLAN QoS (2)
Figure: VXLAN QoS priority mapping along the VXLAN tunnel.
Step 1: The device maps the QoS priority (802.1p/DSCP) of the original packet to the internal priority on a Layer 2 sub-interface and sends the packet to a queue based on the internal priority.
Step 2: The packet is encapsulated into a VXLAN packet (outer Ethernet header, IP header, UDP header, and VXLAN header, followed by the original packet with its inner 802.1p/DSCP), the outer 802.1p or DSCP priority is mapped from the internal priority, and the packet is forwarded to the VXLAN tunnel.
Step 3: When the packet leaves the tunnel, the outer 802.1p or DSCP priority trusted on the tunnel interface is mapped to the internal priority, and then the packet enters a queue (for example, AF1, AF2, or AF3) for transmission.
Step 4: The internal priority of the packet is mapped to the QoS priority on a Layer 2 sub-interface so that the QoS priority of the packet remains unchanged after the packet passes through the VXLAN network. Subsequent packets are transmitted based on the mapped priority.

• In step 2, after the device encapsulates the packet into a VXLAN packet, the QoS priorities of the encapsulated packet are as follows:

▫ By default, the outer 802.1p value of the encapsulated packet is mapped from the internal priority, and the inner 802.1p value remains unchanged. After the qos phb marking 8021p disable command is configured in the Ethernet interface view, the outer 802.1p value is 0, and the inner 802.1p value remains unchanged.

▫ By default, the outer DSCP value of the encapsulated packet is 0, and the inner DSCP value remains unchanged. After the qos phb marking dscp enable command is configured in the Ethernet interface view, the outer DSCP value is mapped from the internal priority, and the inner DSCP value remains unchanged.

• After VXLAN encapsulation is complete, the local VTEP maps the internal priority
based on the DSCP or 802.1p field in the outer packet before the packet arrives
at the remote VTEP.
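The default and configured marking behavior in step 2 can be summarized in a small function. This sketch only restates the rules above; the internal-priority-to-DSCP table and the identity mapping from internal priority to outer 802.1p are hypothetical values, because real devices use configurable mapping tables.

```python
# Hypothetical internal-priority -> DSCP map (an assumption for illustration).
INTERNAL_TO_DSCP = {0: 0, 1: 10, 2: 18, 3: 26, 4: 34, 5: 46, 6: 48, 7: 56}


def vxlan_encap_priorities(internal_priority, inner_8021p, inner_dscp,
                           mark_8021p=True, mark_dscp=False):
    """Outer and inner priorities after VXLAN encapsulation (step 2).

    mark_8021p=False models `qos phb marking 8021p disable`;
    mark_dscp=True models `qos phb marking dscp enable`.
    """
    return {
        # Assumption: identity mapping from internal priority to outer 802.1p.
        "outer_8021p": internal_priority if mark_8021p else 0,
        "outer_dscp": INTERNAL_TO_DSCP[internal_priority] if mark_dscp else 0,
        "inner_8021p": inner_8021p,  # unchanged
        "inner_dscp": inner_dscp,    # unchanged
    }


print(vxlan_encap_priorities(5, 5, 46))                  # default behavior
print(vxlan_encap_priorities(5, 5, 46, mark_dscp=True))  # outer DSCP marked
```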
Contents

1. Background of VXLAN

2. Basic Concepts and Fundamentals of VXLAN

3. EVPN VXLAN Fundamentals

4. VXLAN Deployment Cases in Typical Scenarios

Distributed Gateway (1)
• Networking requirements:
 The entire network uses BGP EVPN to construct a VXLAN network with distributed gateways. Spines function as RRs to reflect EVPN routes to implement Layer 2 and Layer 3 communication between servers.
 M-LAG is configured on all leaf nodes to ensure access link reliability.
 Configure an egress route on Leaf3 (border leaf) to allow Server1 on the intranet to access the Internet.
• Configuration procedure:
 Configure the M-LAG on the leaf nodes. (The configuration is not mentioned here.)
 Configure the interface IP address and OSPF. (The configuration is not mentioned here.)
 Configure BGP and enable BGP EVPN peers.
 Configure a VXLAN tunnel.
 Configure EVPN and VPN instances.
 Configure a VXLAN Layer 3 gateway.
 Configure service access points and egress routes.
Figure: border leaf nodes Leaf3A and Leaf3B (connected to the PE), spine nodes Spine1 and Spine2, server leaf nodes Leaf1A, Leaf1B, Leaf2A, and Leaf2B, and Server1 (192.168.1.1/24), Server2 (192.168.2.1/24), and Server3 (192.168.1.2/24).
Distributed Gateway (2)

• Router ID planning: All devices use the IP address of the Loopback0 interface as the router ID. The IP address planning is 10.X.X.X, where X indicates the device ID.
• VTEP IP address planning: All devices use the IP address of the Loopback1 interface as the VTEP IP address. The IP address planning is 11.Y.Y.Y, where Y indicates the VTEP ID.
Figure: device IDs (X): Leaf1A = 1, Leaf1B = 2, Leaf2A = 3, Leaf2B = 4, Leaf3A = 5, Leaf3B = 6, Spine1 = 7, Spine2 = 8. VTEP IDs (Y): Leaf1A/Leaf1B = 1, Leaf2A/Leaf2B = 2, Leaf3A/Leaf3B = 3. Servers: Server1 (192.168.1.1/24), Server2 (192.168.2.1/24), Server3 (192.168.1.2/24).
Question: Why do two leaf nodes in an M-LAG share the same VTEP IP address?

• Consideration: On a distributed VXLAN network, two leaf switches that form an M-LAG system function as dual-active access gateways. The IP addresses and MAC addresses of the NVE interfaces on the two switches must be the same to ensure normal traffic forwarding on the VXLAN network. On a typical data center network, the spine switches are fully connected to the leaf switches, so traffic destined for the shared VTEP IP address is forwarded to both leaf switches of the M-LAG system through IGP routes, implementing backup and load balancing.
Distributed Gateway (3)
• Configuration procedure:
 Configure the M-LAG on the leaf nodes. (The configuration is not mentioned here.)
 Configure the interface IP address and OSPF. (The configuration is not mentioned here.)
• Configuration notes:
 After an M-LAG system is established for leaf nodes, monitor-links or best-effort routes need to be configured to forward traffic in the M-LAG system when the M-LAG system is faulty. You are advised to deploy best-effort routes on border leaf nodes and monitor-links on server leaf nodes.
 Each leaf device independently establishes OSPF and BGP neighbor relationships with the spine devices. The VTEP IP address must be advertised into the OSPF process so that VTEP IP addresses are reachable when VXLAN tunnels are established.
Distributed Gateway (4)
• Configuration procedure:
 Configure BGP and enable BGP EVPN peers.
• The configuration of Spine1 is as follows:
 The following uses the BGP EVPN peer relationship established with Leaf1A as an example. The configuration of BGP peer relationships established between the spine and other leaf nodes is similar.
[~Spine1] evpn-overlay enable
[*Spine1] bgp 100
[*Spine1-bgp] router-id 10.7.7.7
[*Spine1-bgp] peer 10.1.1.1 as-number 100
[*Spine1-bgp] peer 10.1.1.1 connect-interface LoopBack 0
[*Spine1-bgp] l2vpn-family evpn
[*Spine1-bgp-af-evpn] peer 10.1.1.1 enable
[*Spine1-bgp-af-evpn] peer 10.1.1.1 advertise irb
[*Spine1-bgp-af-evpn] peer 10.1.1.1 reflect-client
[*Spine1-bgp-af-evpn] commit
The configuration of Spine2 is similar to that of Spine1. The configuration details are not mentioned here.
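Because every leaf node peers with the spine in the same way, the spine-side commands can be generated from the 10.X.X.X router ID plan. The following Python sketch only renders text lines containing the commands shown above, one set per leaf; the function names are hypothetical, the indentation merely marks the command view, and nothing is sent to a device.

```python
LEAF_DEVICE_IDS = {"Leaf1A": 1, "Leaf1B": 2, "Leaf2A": 3, "Leaf2B": 4,
                   "Leaf3A": 5, "Leaf3B": 6}


def loopback0(device_id):
    """Router ID / Loopback0 address from the 10.X.X.X plan."""
    return "10.{0}.{0}.{0}".format(device_id)


def spine_evpn_config(spine_id, leaf_ids, asn=100):
    """Render the spine BGP EVPN commands for every leaf (text only)."""
    peers = [loopback0(i) for i in leaf_ids.values()]
    lines = ["evpn-overlay enable", f"bgp {asn}", f" router-id {loopback0(spine_id)}"]
    for ip in peers:
        lines += [f" peer {ip} as-number {asn}",
                  f" peer {ip} connect-interface LoopBack 0"]
    lines.append(" l2vpn-family evpn")
    for ip in peers:
        lines += [f"  peer {ip} enable",
                  f"  peer {ip} advertise irb",
                  f"  peer {ip} reflect-client"]
    lines.append("commit")
    return lines


config = spine_evpn_config(7, LEAF_DEVICE_IDS)  # Spine1 has device ID 7
print("\n".join(config))
```

Each M-LAG member appears as a separate peer because the leaf nodes keep distinct Loopback0 router IDs even though they share a VTEP IP address.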
Distributed Gateway (5)
• Configuration procedure:
 Configure BGP and enable BGP EVPN peers.
• The configuration of Leaf1A is as follows:
[~Leaf1A] evpn-overlay enable
[*Leaf1A] bgp 100
[*Leaf1A-bgp] router-id 10.1.1.1
[*Leaf1A-bgp] peer 10.7.7.7 as-number 100
[*Leaf1A-bgp] peer 10.7.7.7 connect-interface LoopBack 0
[*Leaf1A-bgp] peer 10.8.8.8 as-number 100
[*Leaf1A-bgp] peer 10.8.8.8 connect-interface LoopBack 0
[*Leaf1A-bgp] l2vpn-family evpn
[*Leaf1A-bgp-af-evpn] peer 10.7.7.7 enable
[*Leaf1A-bgp-af-evpn] peer 10.7.7.7 advertise irb
[*Leaf1A-bgp-af-evpn] peer 10.8.8.8 enable
[*Leaf1A-bgp-af-evpn] peer 10.8.8.8 advertise irb
[*Leaf1A-bgp-af-evpn] commit
The configuration of other leaf nodes is similar to that of Leaf1A. The configuration details are not mentioned here.
Distributed Gateway (6)
• Configuration roadmap:
 Configure a VXLAN tunnel.
• Service planning:
 Network segment 192.168.1.0/24, where Server1 and Server3 are located, is allocated to BD 100.
 Network segment 192.168.2.0/24, where Server2 is located, is allocated to BD 200.
• The configuration of Leaf1 is as follows. The following uses Leaf1A as an example.
[~Leaf1A] bridge-domain 100
[*Leaf1A-bd100] vxlan vni 100
[*Leaf1A-bd100] bridge-domain 200
[*Leaf1A-bd200] vxlan vni 200
[*Leaf1A-bd200] quit
[*Leaf1A] interface Nve1
[*Leaf1A-Nve1] source 11.1.1.1
[*Leaf1A-Nve1] mac-address 0000-5e00-0100
[*Leaf1A-Nve1] vni 100 head-end peer-list protocol bgp
[*Leaf1A-Nve1] vni 200 head-end peer-list protocol bgp
[*Leaf1A-Nve1] quit
[*Leaf1A] commit
The configuration of Leaf2 and Leaf3 is similar to that of Leaf1. The configuration details are not mentioned here.
Distributed Gateway (7)
• Configuration roadmap:
 Configure EVPN and VPN instances.
• The configuration of Leaf1 is as follows. The following uses Leaf1A as an example.
EVPN instance configuration:
[~Leaf1A] bridge-domain 100
[*Leaf1A-bd100] evpn
[*Leaf1A-bd100-evpn] route-distinguisher 2:2
[*Leaf1A-bd100-evpn] vpn-target 100:1
[*Leaf1A-bd100-evpn] vpn-target 1000:1 export-extcommunity
[*Leaf1A-bd100-evpn] bridge-domain 200
[*Leaf1A-bd200] evpn
[*Leaf1A-bd200-evpn] route-distinguisher 3:3
[*Leaf1A-bd200-evpn] vpn-target 200:1
[*Leaf1A-bd200-evpn] vpn-target 1000:1 export-extcommunity
[*Leaf1A-bd200-evpn] commit
IP VPN instance configuration:
[~Leaf1A] ip vpn-instance vpn1
[*Leaf1A-vpn-instance-vpn1] vxlan vni 10000
[*Leaf1A-vpn-instance-vpn1] route-distinguisher 22:22
[*Leaf1A-vpn-instance-vpn1] ipv4-family
[*Leaf1A-vpn-instance-vpn1-af-ipv4] vpn-target 1000:1
[*Leaf1A-vpn-instance-vpn1-af-ipv4] vpn-target 1000:1 evpn
[*Leaf1A-vpn-instance-vpn1-af-ipv4] commit
The configuration of Leaf2 and Leaf3 is similar to that of Leaf1. The configuration details are not mentioned here.

• Leaf2 needs to be configured with only the EVPN instance and IP VPN instance of
BD100.
Distributed Gateway (8)
• Configuration roadmap:
 Configure a VXLAN Layer 3 gateway.
• The configuration of Leaf1 is as follows. The following uses Leaf1A as an example.
[~Leaf1A] interface Vbdif 100
[*Leaf1A-Vbdif100] ip binding vpn-instance vpn1
[*Leaf1A-Vbdif100] ip address 192.168.1.1 24
[*Leaf1A-Vbdif100] mac-address 0000-5e00-0102
[*Leaf1A-Vbdif100] arp collect host enable
[*Leaf1A-Vbdif100] arp distribute-gateway enable
[*Leaf1A-Vbdif100] quit
[*Leaf1A] interface Vbdif 200
[*Leaf1A-Vbdif200] ip binding vpn-instance vpn1
[*Leaf1A-Vbdif200] ip address 192.168.2.1 24
[*Leaf1A-Vbdif200] mac-address 0000-5e00-0102
[*Leaf1A-Vbdif200] arp collect host enable
[*Leaf1A-Vbdif200] arp distribute-gateway enable
[*Leaf1A-Vbdif200] commit
The configuration of Leaf2 is similar to that of Leaf1. The configuration details are not mentioned here.
Distributed Gateway (9)
• Configuration roadmap:
 Configure service access points and egress routes.
Configure an egress route on Leaf3 and import it into BGP. The following uses Leaf3A as an example (100.1.1.2 is the IP address of the port connecting Leaf3 to the PE).
[~Leaf3A] ip route-static vpn-instance vpn1 0.0.0.0 0.0.0.0 100.1.1.2
[*Leaf3A] bgp 100
[*Leaf3A-bgp] ipv4-family vpn-instance vpn1
[*Leaf3A-bgp-vpn1] default-route imported
[*Leaf3A-bgp-vpn1] import-route static
[*Leaf3A-bgp-vpn1] commit
Configure all leaf nodes to advertise IP prefix routes to BGP peers. The following uses Leaf1A as an example.
[~Leaf1A] bgp 100
[~Leaf1A-bgp] ipv4-family vpn-instance vpn1
[*Leaf1A-bgp-vpn1] import-route direct
[*Leaf1A-bgp-vpn1] advertise l2vpn evpn
[*Leaf1A-bgp-vpn1] commit
Configure a service access point. The following uses the server access port of Server1 as an example; Server1 data carries VLAN 10.
[~Leaf1] interface Eth-Trunk1.1 mode l2
[*Leaf1-Eth-Trunk1.1] encapsulation dot1q vid 10
[*Leaf1-Eth-Trunk1.1] bridge-domain 100
[*Leaf1-Eth-Trunk1.1] commit
Distributed Gateway (10)
• Result verification:
 Run the display vxlan tunnel command on Leaf1A to check the VXLAN tunnels.
[~Leaf1A] display vxlan tunnel
Number of vxlan tunnel: 1
Tunnel ID    Source     Destination  State  Type     Uptime
----------------------------------------------------------------------------------------
4026531841   11.1.1.1   11.2.2.2     up     dynamic  0032h21m
4026531842   11.1.1.1   11.3.3.3     up     dynamic  0032h25m
 After the configuration is complete, Layer 2 and Layer 3 communication can be implemented between different servers.
 Check the egress routes on Leaf1 and Leaf3, for example, on Leaf1A and Leaf3A.
[~Leaf1A] display ip routing-table vpn-instance vpn1
Destination/Mask  Proto   Pre  Cost  Flags  NextHop     Interface
0.0.0.0/0         IBGP    60   0     D      11.3.3.3    VXLAN
[~Leaf3A] display ip routing-table vpn-instance vpn1
Destination/Mask  Proto   Pre  Cost  Flags  NextHop     Interface
0.0.0.0/0         Static  60   0     D      100.1.1.2   vbdif1000
 After the configuration is complete, Server1 accesses the external network through Leaf1, the spine, and Leaf3 in sequence.
Quiz

1. (True or false) BGP EVPN Type 2 host IP routes can be used to transmit ARP information. ( )
A. True

B. False

2. (Single-answer question) Which of the following statements about BGP EVPN is false? ( )
A. Routes are carried in the MP_REACH_NLRI attribute.

B. The RT is carried in the extended community attribute.

C. The L2 VNI and L3 VNI are carried in MP_REACH_NLRI.

D. The next hop address of a route is carried in the Next_Hop attribute.


1. A

2. D
Summary
⚫ VXLAN uses a Layer 3 routed network as its underlay network and uses tunnels to build an
overlay virtual network, supporting a large number of tenant networks.
⚫ VXLAN does not define a control plane. To limit the flooding of BUM traffic, VXLAN relies on a separate control plane protocol to optimize BUM traffic forwarding.
⚫ BGP EVPN extends BGP by defining several types of BGP EVPN routes. These BGP EVPN
routes can be used to transmit VTEP addresses, host information, and routing information,
effectively helping VXLAN limit the flooding of BUM traffic.

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright©2023 Huawei Technologies Co., Ltd.
All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
Technical Principles and Application of M-LAG
Foreword

⚫ Data centers carry the core computing functions of enterprise production. Their networks must provide high-performance load balancing and high service reliability, and important service systems require uninterrupted services during device upgrades. This places high demands on network availability.
⚫ The CloudFabric solution uses Multichassis Link Aggregation Group (M-LAG) and Virtual
eXtensible Local Area Network (VXLAN) to implement end-to-end reliability, ensuring that
service systems can run properly in device failure and upgrade scenarios.
⚫ This document describes the principles and applications of the M-LAG technology.

Objectives

Upon completion of this course, you will be able to:


 Describe the definition, usage, and features of M-LAG.
 Describe the differences between stacking and M-LAG.
 Describe the technical principles of M-LAG.
 Describe the network deployment mode and typical application networking of M-LAG.

Contents

1. Overview of M-LAG

2. M-LAG Fundamentals

3. M-LAG Failure Protection

4. M-LAG Deployment

5. M-LAG Best Practices

Overview of LAG
⚫ SW1 and SW2 are connected by multiple links, for example, four links, which can be bundled into an Eth-Trunk. This provides:
 Increased bandwidth (the sum of the bandwidth of the four links)

 Improved reliability (if some links go down, the remaining links take over forwarding)

 Load balancing (traffic is distributed across the links based on a hash of the 5-tuple, improving bandwidth utilization)

⚫ However, if SW1 or SW2 fails, the traffic transmitted through that device is interrupted. In this case, link aggregation within a single device cannot meet device-level reliability requirements.

Figure: SW1 and SW2 connected by an Eth-Trunk.

• Huawei devices use Eth-Trunk as the link aggregation technology. You can
configure an Eth-Trunk on a device and add multiple interfaces (for example,
four interfaces) to the Eth-Trunk.
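Eth-Trunk load balancing is per flow: a hash over header fields maps every packet of a flow to the same member link, so packets of one flow are not reordered. The Python sketch below assumes a SHA-256 hash over the 5-tuple purely for illustration; real switches use hardware hash functions with configurable field sets, and the interface names are hypothetical.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    protocol: int
    src_port: int
    dst_port: int


def select_member_link(flow: FiveTuple, members: list[str]) -> str:
    """Pick an Eth-Trunk member link for a flow by hashing its 5-tuple."""
    if not members:
        raise ValueError("Eth-Trunk has no active member links")
    key = f"{flow.src_ip}|{flow.dst_ip}|{flow.protocol}|{flow.src_port}|{flow.dst_port}"
    digest = hashlib.sha256(key.encode()).digest()
    # Reduce the hash to an index into the list of active members.
    index = int.from_bytes(digest[:4], "big") % len(members)
    return members[index]


members = ["GE1/0/1", "GE1/0/2", "GE1/0/3", "GE1/0/4"]
flows = [FiveTuple("192.168.1.1", "192.168.2.1", 6, port, 443) for port in range(49152, 50152)]
used = {select_member_link(f, members) for f in flows}
print(sorted(used))
```

When a member link fails, removing it from `members` redistributes flows across the remaining links, which mirrors the reliability benefit listed above; a device failure, however, removes all links at once.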
Overview of M-LAG
⚫ M-LAG (Multichassis Link Aggregation Group, also called inter-device link aggregation group) is a mechanism that implements link aggregation across devices. It raises the reliability of link aggregation from the link level to the device level. In addition, M-LAG member devices load balance traffic, forming a dual-active system.
⚫ M-LAG is also a virtualization technology: to the peer device connected to an M-LAG port, the M-LAG appears as a single logical switch.
Figure: SW1 and SW2 are connected by a peer-link and configured with M-LAG; SW3 connects to both through a traditional LAG. SW1 and SW2 remain independent devices from a management perspective, but at the link layer SW3 treats them as a single device.

• There are several options to improve network reliability, such as STP+VRRP and stacking. However, these options have obvious problems:

▫ STP+VRRP

▪ The STP blocking mechanism leads to low Layer 2 link utilization.

▪ VRRP works in master/backup mode, so the backup device's resources sit idle and resource utilization is low.

▪ Only master/backup mode is supported for server access.

▫ Stacking technology (compared in detail later)

▪ Fast stack upgrade reduces the service interruption time, but increases the upgrade time and the upgrade risk. The control plane is centralized, and faults may spread across member devices.

▪ The master control plane needs to control the forwarding planes of all stack members, increasing the CPU load.

• Therefore, M-LAG is often used in data center networks to improve network reliability.
Advantages of M-LAG
⚫ To meet the requirements for higher network reliability, M-LAG aggregates links across multiple devices, which increases reliability and improves link utilization.
⚫ Advantages:
 Implements inter-device link aggregation, improving Layer 2 link utilization.
 The active-active gateway technology of M-LAG improves the utilization of device and link resources.
 Servers can use link aggregation to connect to active-active M-LAG access devices and load balance traffic.
Figure: switches SW1 to SW4 deployed with M-LAG, and a server attached through a LAG.
Comparison Between M-LAG and Stack
Figure: management, protocol, and data planes of SW1 and SW2 in a stack versus in M-LAG.
⚫ A stack implements virtualization on the management plane, protocol plane (control plane), and data plane, and member devices are highly coupled.
⚫ M-LAG implements virtualization on part of the data plane and part of the protocol plane (control plane), and member devices are loosely coupled.

⚫ Reliability
 Stack: Protocol planes are centralized, and faults may spread across member devices.
 M-LAG: Excellent. Protocol planes are independent (partially centralized), and fault domains are isolated.
⚫ O&M
 Stack: Excellent. The number of management nodes is reduced and the configuration is simple.
 M-LAG: Two switches need to be managed.
⚫ Fault convergence performance
 Stack: Excellent. The convergence performance is close to that of a single device.
 M-LAG: Failover information needs to be passed through a protocol.
⚫ Upgrade complexity
 Stack: High. Fast stack upgrade shortens the service interruption time, but increases the upgrade time and the upgrade risk.
 M-LAG: Excellent. The two switches are upgraded independently without interrupting service access. The risk is low and applications are unaware of the upgrade.
⚫ Service interruption time during an upgrade
 Stack: Longer. In typical networking, the service interruption time is about 20s to 1 minute and is closely related to the service volume.
 M-LAG: Short. Traffic interruption lasts only seconds.

• Application scenarios of stacking:

▫ There is no requirement on the interruption duration during software upgrades.

▫ Simple maintenance is required.

• Application scenarios of M-LAG:

▫ Service interruption during software upgrades must be kept short.

▫ Higher reliability is required.

▫ Some additional maintenance complexity is acceptable.


Contents

1. Overview of M-LAG

2. M-LAG Fundamentals
◼ Basic Concepts of M-LAG

▫ Basic Features of M-LAG

▫ M-LAG Traffic Forwarding Process

3. M-LAG Failure Protection

4. M-LAG Deployment

5. M-LAG Best Practices



Basic Concepts of M-LAG (1)

• Dynamic Fabric Service (DFS) group: used to pair M-LAG devices. The interface status and entries of the M-LAG dual-homing devices must be synchronized using the DFS group protocol.
• Peer-link: a Layer 2 link used to exchange negotiation packets, synchronize device information, and transmit some traffic. After an interface is configured as a peer-link interface, other services cannot be configured on the interface.
• DFS master device (Master): the device that holds the master role in the M-LAG.
• DFS backup device (Backup): the device that holds the backup role in the M-LAG.
• M-LAG member interface: an Eth-Trunk interface on the M-LAG master or backup device that connects to user-side hosts or switching devices.
Figure: SW1 (Master) and SW2 (Backup) form a DFS group, connected by a peer-link and a dual-active detection (DAD) link; each has an M-LAG member interface toward a server that connects through a LAG.

• A DFS group consists of a master device and a backup device. Under normal
circumstances, both the master and backup devices forward service traffic and
their forwarding behaviors are the same. The master and backup devices have
different forwarding behaviors only when a fault occurs.

▫ When no fault occurs, both the master and backup devices forward traffic.

▫ When two master devices are detected, service interfaces on the backup
device enter the Error-Down state.

• The peer-link is used to achieve the following:

▫ Transmit DFS group protocol packets.

▫ Transmit synchronization packets used for synchronizing MAC address entries and ARP entries between M-LAG master and backup devices.

▫ Forward inter-device traffic sent from non-M-LAG member interfaces, or traffic received from an M-LAG member interface when downstream devices become single-homed to the M-LAG due to a fault.

• By default, the peer-link allows packets from all VLANs to pass through. If you do
not want the peer-link to allow packets from some VLANs to pass through, you
need to configure the VLANs separately.

• To improve the reliability of the peer-link, you are advised to add multiple links to a LAG and configure the aggregated link as the peer-link. Even if there is only one link, you still need to add it to a LAG.



Basic Concepts of M-LAG (2)

• DAD link: also called a heartbeat link, a Layer 3 link used by the master and backup devices in an M-LAG to send DAD packets.
• HeartBeat (HB) DFS master device: the device negotiated as the master through the heartbeat link.
• HB DFS standby device: the device negotiated as the standby through the heartbeat link.
Figure: SW1 (Master) and SW2 (Backup) in a DFS group, connected by a peer-link and a DAD link, with a server attached to both through a LAG.

• Under normal circumstances, the DAD link does not participate in any traffic
forwarding behaviors in the M-LAG. It is only used to detect whether two master
devices exist when a fault occurs. The DAD link can be an external link, for
example, if the M-LAG is connected to an IP network and the two member
devices can communicate through the IP network, the link that enables
communication between the member devices can function as the DAD link. An
independent link that provides Layer 3 reachability can also be configured as the
DAD link, for example, a link between management interfaces of the member
devices can function as the DAD link.

• Under normal circumstances, the HB DFS master/backup status does not affect
traffic forwarding behaviors in the M-LAG. It is used only in secondary fault
recovery scenarios.

▫ If a fault on the original DFS master device is rectified and the peer-link is
still faulty, the corresponding interfaces on the backup device are triggered
to enter the Error-Down state based on the HB DFS master/backup status.
This mechanism prevents abnormal traffic forwarding in the scenario where
two master devices exist.

Base Protocol - LACP

⚫ LACP needs to be configured on M-LAG member interfaces to detect faults such as link layer faults and incorrect link connections, improving link reliability.
⚫ According to the LACP principle, for SW3 to treat the two M-LAG member switches as one LACP node, the LACP packets that SW3 receives from them must meet the following conditions:
• The LACP system ID is the same.
• The LACP system priority is the same.
• The LACP port priority is the same.
• The LACP key (identifying one Eth-Trunk) is the same.
• The LACP port numbers do not conflict.
Figure: physical topology (SW3 connected through a LAG to M-LAG member switches SW1 and SW2, which are linked by a peer-link and a DAD link) and the logical topology as seen by SW3 (a single LAG to one device).

• LACP can be deployed on the peer-link interface, the M-LAG member interface, or the Eth-Trunk interface of the access device.
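The conditions listed above can be checked mechanically against the actor fields of the LACPDUs that SW3 receives. The following Python sketch uses a hypothetical, simplified model of those fields (not a real LACP library), and the MAC address, priority, key, and port number values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LacpActorInfo:
    """Actor fields from an LACPDU, as SW3 would see them (simplified model)."""
    system_id: str        # LACP system MAC address
    system_priority: int
    port_priority: int
    key: int              # identifies one Eth-Trunk
    port_number: int


def problems_for_single_lacp_node(pdus: list[LacpActorInfo]) -> list[str]:
    """Return the conditions the received LACPDUs violate.

    An empty list means SW3 can treat all senders as one LACP node."""
    problems = []
    for field, label in (
        ("system_id", "system ID differs"),
        ("system_priority", "system priority differs"),
        ("port_priority", "port priority differs"),
        ("key", "key differs"),
    ):
        if len({getattr(p, field) for p in pdus}) > 1:
            problems.append(label)
    # Port numbers must be unique across both M-LAG member switches.
    port_numbers = [p.port_number for p in pdus]
    if len(set(port_numbers)) != len(port_numbers):
        problems.append("port numbers conflict")
    return problems


sw1 = LacpActorInfo("00e0-fc12-3456", 32768, 32768, 1, 1)
sw2 = LacpActorInfo("00e0-fc12-3456", 32768, 32768, 1, 2)
print(problems_for_single_lacp_node([sw1, sw2]))
```

Returning every violated condition, rather than stopping at the first, makes it easier to see why a LAG toward an M-LAG pair fails to aggregate.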

Base Protocol - STP

⚫ M-LAG supports dual-homing and builds a logically loop-free network, but this does not mean that STP is unnecessary. STP is still required in the following three scenarios:
 Scenario 1: Preventing loops caused by incorrect cable connections or configurations. STP needs to be used to prevent loops when a port planned for connecting to a server is incorrectly connected to a switch, or when the uplink of the switch is connected to a non-M-LAG member interface.
 Scenario 2: Connecting physical cables before configuring M-LAG interfaces. If physical cables are connected before M-LAG is configured, a loop occurs on the network. In this case, STP needs to be deployed to prevent loops.
 Scenario 3: M-LAG access to a Layer 2 network. If the physical connections to the Layer 2 network are complete before M-LAG is configured, loops exist on the network. In this case, STP needs to be deployed to prevent loops.
Figure: three topologies with a peer-link between the M-LAG member switches, each showing the loop and the port blocked by STP; in scenario 3, the M-LAG connects upstream to an L2 network.

M-LAG Setup Process
⚫ Pairing (over the peer-link)
 Phase 1: Hello packet exchange.
 Phase 2: Pairing. If the DFS group IDs are the same, the pairing succeeds.
⚫ Master/backup device negotiation (over the peer-link)
 Phase 1: Exchange DFS group device information.
 Phase 2: Select the master and backup by comparing priorities and system MAC addresses.
⚫ Dual-active detection (over the DAD link)
⚫ Information synchronization (over the peer-link): synchronize all types of information and entries.

M-LAG Pairing
⚫ After the M-LAG configuration is complete on the two devices:
 Each device first sends a DFS group Hello packet over the peer-link.
 After receiving a Hello packet from the remote end, the device checks whether the DFS group number carried in the packet is the same as the local one.
 If the DFS group numbers of the two devices are the same, the DFS group pairing succeeds.
 The devices then proceed to the master/backup negotiation.
Figure: SW1 and SW2 exchange four packet types over the peer-link: DFS group Hello, DFS group device information, M-LAG device information, and M-LAG synchronization packets. Packet format: Ether Header, then Msg Header (Node, Slot, Version, Serial Num), then Payload. Synchronization packet format: Type, Length, Value.

• Each packet carries a customized message header after the outer Ethernet header. The customized message header contains the following information:

▫ Version: the protocol version, used to identify the M-LAG version of the M-LAG member devices.

▫ Message Type: the type of the packet, which can be Hello or Synchronization.

▫ Node: the device node ID.

▫ Slot: the slot ID of the card that needs to receive the message. For a fixed device, the value is the stack ID.

▫ Serial Number: the protocol serial number, used to improve reliability.

• The payload that follows the customized message header carries the actual packet data, that is, the information that needs to be exchanged or synchronized. For example, the DATA field of a Hello packet contains the DFS group ID, priority, and MAC address of the device, whereas the DATA field of a synchronization packet contains entries and status information.
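The synchronization payload uses a Type-Length-Value layout. The Python sketch below encodes and decodes such a payload; the 2-byte Type and 2-byte Length widths and the type codes are assumptions for illustration, because the slide names the fields but not their sizes, and this is not Huawei's actual wire format.

```python
import struct

# Assumed widths: 2-byte Type and 2-byte Length in network byte order.
TLV_HEADER = struct.Struct("!HH")


def encode_tlvs(items: list[tuple[int, bytes]]) -> bytes:
    """Concatenate (type, value) pairs into a Type-Length-Value byte string."""
    out = bytearray()
    for tlv_type, value in items:
        out += TLV_HEADER.pack(tlv_type, len(value))
        out += value
    return bytes(out)


def decode_tlvs(data: bytes) -> list[tuple[int, bytes]]:
    """Split a TLV byte string back into (type, value) pairs."""
    items = []
    offset = 0
    while offset < len(data):
        if offset + TLV_HEADER.size > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = TLV_HEADER.unpack_from(data, offset)
        offset += TLV_HEADER.size
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        items.append((tlv_type, value))
        offset += length
    return items


# Hypothetical type codes for two kinds of synchronized entries.
MAC_ENTRY, ARP_ENTRY = 1, 2
payload = encode_tlvs([(MAC_ENTRY, b"\x00\xe0\xfc\x12\x34\x56"), (ARP_ENTRY, b"\xc0\xa8\x01\x01")])
print(payload.hex())
print(decode_tlvs(payload))
```

Because each value is prefixed with its length, a receiver can skip types it does not understand, which is one reason TLV encoding suits entries of varying kinds.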

Negotiate Master/Backup
⚫ DFS group master/backup negotiation
 After the pairing succeeds, the two devices send DFS group device information packets to each other over the peer-link. Each device determines the master/backup status of the DFS group based on the DFS group priority and system MAC address carried in the packets: the device with the higher priority becomes the master, and if the priorities are the same, the device with the smaller system MAC address becomes the master.
 In normal cases, the master and backup devices forward traffic in the same way; their forwarding behaviors differ only when a fault occurs.
⚫ M-LAG member interface master/backup negotiation
 In addition to the DFS group negotiation, M-LAG member interfaces also use M-LAG packets to negotiate their master/backup status.

• M-LAG member interface master/backup negotiation:

▫ After the DFS group negotiates the master/backup status, the two M-LAG devices send M-LAG device information packets over the peer-link. The packets carry the configurations of the M-LAG member interfaces. After the information about the M-LAG member interfaces is synchronized, the master/backup status of the M-LAG member interfaces is determined.

▫ When member interface information is synchronized from the peer end, the M-LAG member interface whose status changes from Down to Up first becomes the master M-LAG member interface, and the M-LAG member interface on the peer end becomes the backup.

• The master and backup M-LAG member interfaces forward traffic differently only in the M-LAG multicast access scenario.

▫ In versions earlier than V200R003C00, only the M-LAG member interface in the master state forwards multicast traffic to receivers. In V200R003C00 and later versions, M-LAG member interfaces in both the master and backup states can forward multicast traffic to receivers, implementing load balancing. When the two devices in an M-LAG system run different versions, the multicast traffic forwarding rule of the earlier version prevails.
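The pairing check and the DFS group election rule described above (same DFS group ID to pair; higher priority wins; on a tie, the smaller system MAC address wins) can be sketched in Python as follows. The data model and function names are hypothetical, and the priority and MAC values in the usage lines are illustrative, not device defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DfsInfo:
    """Fields from a DFS group device information packet (simplified model)."""
    name: str
    dfs_group_id: int
    priority: int
    system_mac: str


def mac_to_int(mac: str) -> int:
    """Convert a MAC address written with '-', ':' or '.' separators to an integer."""
    return int(mac.replace("-", "").replace(":", "").replace(".", ""), 16)


def pair_and_elect(local: DfsInfo, remote: DfsInfo) -> tuple[str, str]:
    """Return (master_name, backup_name), or raise ValueError if pairing fails."""
    if local.dfs_group_id != remote.dfs_group_id:
        raise ValueError("DFS group IDs differ: pairing fails")
    if local.priority != remote.priority:
        # Higher DFS group priority becomes the master.
        master = local if local.priority > remote.priority else remote
    else:
        # Tie: the smaller system MAC address becomes the master.
        master = local if mac_to_int(local.system_mac) < mac_to_int(remote.system_mac) else remote
    backup = remote if master is local else local
    return master.name, backup.name


sw1 = DfsInfo("SW1", 1, 150, "00e0-fc00-0002")
sw2 = DfsInfo("SW2", 1, 100, "00e0-fc00-0001")
print(pair_and_elect(sw1, sw2))
```

Both devices evaluate the same rule on the same pair of inputs, so each reaches the same result independently without a further round of messages.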

Dual-Active Detection
⚫ After the M-LAG master/backup roles are negotiated, the two devices send M-LAG DAD packets over the DAD link at an interval of 1s. Once a device detects a peer-link fault, it sends three DAD packets at an interval of 100 ms to accelerate detection. When the two devices can receive each other's packets, the active-active system works properly.
⚫ If the peer-link fails and DAD determines that the other device is still running, the service ports on the backup device are set to the Error-Down state.
⚫ Key deployment points:
 Use an independent link to carry DAD traffic; the peer-link cannot be reused for this purpose.
 You are advised to deploy DAD through the out-of-band management network port, which reduces costs.
 DAD can also be deployed on an independent Layer 3 service interface.

• If the peer-link fails and both member switches continue to run, network services will be affected:

▫ Forwarding entries (ARP and MAC address entries) on SW1 and SW2 are no longer synchronized, which may cause forwarding exceptions.

▫ In the V-STP scenario, messages cannot be synchronized through the peer-link. As a result, STP calculation may be abnormal.

• To improve the reliability of the M-LAG system, you need to configure DAD. Normally, DAD links do not participate in any M-LAG forwarding; DAD is used only when DFS group pairing fails or the peer-link fails. Therefore, a DAD failure alone does not prevent the M-LAG from working properly. The DAD link can be carried over an external network. (For example, if the M-LAG is connected to an IP network, the two dual-homing devices can communicate with each other through the IP network, and that path can be used as the dual-active detection link.) You can also configure a reachable Layer 3 link as the DAD link, for example, through the management interface.

▫ (Recommended) Carry DAD packets between the management network ports. The IP addresses of the management network ports bound to the DFS groups must be able to communicate with each other, and the management network ports must be bound to VPN instances to isolate DAD packets from service traffic.
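The DAD timing and the dual-active decision described above can be summarized as a small Python sketch. It is a simplified model under stated assumptions: the names are hypothetical, the timers are the values given on the slide, and the secondary-fault handling based on the HB DFS master/standby status is deliberately not modeled.

```python
from dataclasses import dataclass
from enum import Enum

NORMAL_INTERVAL_S = 1.0   # DAD packet interval in normal operation
FAST_INTERVAL_S = 0.1     # interval after a peer-link fault is detected
FAST_BURST_COUNT = 3      # number of fast DAD packets after a peer-link fault


class Action(Enum):
    NONE = "no action"
    ERROR_DOWN_SERVICE_PORTS = "set service ports to Error-Down"


@dataclass
class DadState:
    role: str                 # DFS role: "master" or "backup"
    peer_link_up: bool
    peer_alive_via_dad: bool  # DAD packets still arrive from the peer


def dad_send_schedule(peer_link_up: bool) -> list[float]:
    """Offsets in seconds at which the next DAD packets are sent (simplified)."""
    if peer_link_up:
        return [NORMAL_INTERVAL_S]
    return [FAST_INTERVAL_S * (i + 1) for i in range(FAST_BURST_COUNT)]


def dad_decision(state: DadState) -> Action:
    """Only the backup reacts, and only when the peer-link is down while DAD
    shows that the peer is still running (a dual-master condition)."""
    if state.peer_link_up:
        return Action.NONE
    if state.peer_alive_via_dad and state.role == "backup":
        return Action.ERROR_DOWN_SERVICE_PORTS
    return Action.NONE


print(dad_decision(DadState("backup", peer_link_up=False, peer_alive_via_dad=True)))
```

If the peer is not alive when the peer-link goes down, the backup keeps its service ports up and carries the traffic alone, which is why the decision depends on both signals.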
Synchronizing M-LAG Device Data - Synchronizing Device Information
⚫ To ensure that connected devices regard the M-LAG system as one logical device, the two switches in the M-LAG system must share part of their device information and have the same forwarding entries. This ensures that a fault on either device does not affect traffic forwarding and services are not interrupted.
⚫ Therefore, after the M-LAG system works properly, the two devices send M-LAG synchronization packets over the peer-link to synchronize information with the peer end in real time. The device information includes the device name, system MAC address, software version, M-LAG status, STP protocol packets, and VRRP packets.

• The synchronization information includes the device name, system MAC address,
software version, M-LAG status, STP status, VRRP priority, DR priority, ACL, and
LACP information.
M-LAG Device Data Synchronization - Forwarding Entry Synchronization
⚫ Forwarding entries are synchronized between the two members through the peer-link. Common forwarding entries that need to be synchronized include the MAC address table, ARP table, ND table, and IGMP multicast table.
⚫ Synchronization principles:
 Entries learned on an M-LAG interface must be synchronized to the peer device. After receiving the message, the peer device changes the interface of the entry to its local M-LAG interface.
 Entries learned on an isolated (single-homing) port must be synchronized to the peer device. After receiving the message, the peer device changes the interface of the entry to the peer-link.
⚫ Why is learning of related entries and protocols disabled on the peer-link?
 By default, entries are not learned from ordinary (non-M-LAG-synchronization) packets received on the peer-link, because such entries may conflict with the information in M-LAG synchronization packets.
 If learning were enabled on the peer-link interface, the entries it learns could conflict with the forwarding entries synchronized by the M-LAG DFS protocol.
 Because learning is disabled on the peer-link interface, the entries learned on isolated interfaces need to be synchronized to the peer end.
Figure: SW1 and SW2 connected by a peer-link, with SW3 and SW4 below, one of them attached through a single-homing interface.

• The synchronization information includes MAC addresses, ARP entries, ND entries, IGMP entries, and DHCP snooping entries.
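The two synchronization principles above decide which local outbound interface a synchronized entry is installed with. The Python sketch below assumes, for illustration, that each synchronized entry records the ID of the M-LAG it was learned on (or none for an isolated port) and that each device keeps a hypothetical map from M-LAG ID to its own local interface; the interface names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

PEER_LINK = "peer-link"


@dataclass(frozen=True)
class SyncedMacEntry:
    """A MAC entry as carried in a synchronization message (simplified model)."""
    mac: str
    vlan: int
    mlag_id: Optional[int]  # set if learned on an M-LAG interface, None for an isolated port


def local_outbound_interface(entry: SyncedMacEntry,
                             local_mlag_interfaces: dict[int, str]) -> str:
    """Choose the interface the receiving device installs for a synchronized entry."""
    if entry.mlag_id is not None:
        # Learned on an M-LAG interface: use this device's member of the same M-LAG.
        return local_mlag_interfaces[entry.mlag_id]
    # Learned on an isolated port: the host is reachable only via the peer device.
    return PEER_LINK


local = {1: "Eth-Trunk10"}
print(local_outbound_interface(SyncedMacEntry("00e0-fc00-0a01", 10, 1), local))
print(local_outbound_interface(SyncedMacEntry("00e0-fc00-0b02", 10, None), local))
```

Pointing isolated-port entries at the peer-link is what lets a device reach a single-homed host that it never learned directly, since the peer-link itself does not learn entries.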
Contents

1. Overview of M-LAG

2. M-LAG Fundamentals
▫ Basic Concepts of M-LAG
◼ Basic Features of M-LAG

▫ M-LAG Traffic Forwarding Process

3. M-LAG Failure Protection

4. M-LAG Deployment

5. M-LAG Best Practices


Local Preferential Forwarding
⚫ Local preferential forwarding applies only to known unicast traffic (upstream and downstream).
 Layer 2 unicast traffic: If the outbound interfaces in the MAC address table include both the peer-link interface and a local interface, the local interface is preferred. (The peer-link is used only if there is no local outbound interface.)
 Layer 3 unicast traffic: Each active-active gateway looks up its own routing table and forwards traffic based on its local entries.
Figure: SW3 dual-homed to SW1 and SW2 (connected by a peer-link), shown once with an upstream L2 network (with an STP-blocked port) and once with an upstream L3 network.
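The Layer 2 rule above, prefer local outbound interfaces and fall back to the peer-link only when none exists, can be sketched in Python as follows. The function name and interface names are hypothetical, and the sketch covers only the selection step, not the MAC table lookup itself.

```python
PEER_LINK = "peer-link"


def pick_l2_outbound(candidates: list[str]) -> list[str]:
    """Local preferential forwarding for known Layer 2 unicast (simplified).

    candidates: outbound interfaces from the MAC table, possibly including
    the peer-link. Local interfaces win; the peer-link is used only when no
    local outbound interface exists."""
    local = [ifname for ifname in candidates if ifname != PEER_LINK]
    if local:
        return local
    return [PEER_LINK] if PEER_LINK in candidates else []


print(pick_l2_outbound([PEER_LINK, "Eth-Trunk10"]))
print(pick_l2_outbound([PEER_LINK]))
```

Keeping traffic local avoids consuming peer-link bandwidth in normal operation; the peer-link carries known unicast only when the local member interface is unavailable.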
Unidirectional Isolation to Prevent Loops - BUM Packets (1)
⚫ In M-LAG networking, a loop exists from the physical perspective, and loops greatly affect Layer 2 forwarding. How does M-LAG solve this problem?
 For packets forwarded at Layer 2, M-LAG uses unidirectional isolation to prevent loops on the Layer 2 network.
Figure: SW1 and SW2 connected by a peer-link, with SW3 and SW4 attached below. BUM traffic from the peer-link to M-LAG dual-homing ports is unidirectionally isolated; unidirectional isolation is not configured from the peer-link to the single-homing port.
Unidirectional Isolation to Prevent Loops - BUM Packets (2)
⚫ When SW3 is connected to the M-LAG in active-active mode, global ACL rules are delivered in the following sequence by default:
 Rule 1: Permits Layer 3 unicast packets whose source interface is the peer-link interface and whose destination interface is an M-LAG member interface.
 Rule 2: Rejects all packets whose source interface is the peer-link interface and whose destination interface is an M-LAG member interface.
⚫ Unidirectional isolation: M-LAG devices use this ACL rule group to implement unidirectional isolation between peer-link interfaces and M-LAG member interfaces, so flooding traffic such as broadcast traffic from a peer-link interface to an M-LAG member interface is isolated.
Figure: the two rules applied on SW1 and SW2 from the peer-link toward the M-LAG dual-homing ports; no unidirectional isolation toward the single-homing port.

• Prerequisites for the unidirectional isolation mechanism to take effect:

▫ When M-LAG master and backup devices are negotiated, the system checks
whether the access device is dual-homed to the M-LAG using M-LAG
synchronization packets. If the access device is dual-homed to the M-LAG,
the two M-LAG devices deliver the unidirectional isolation configuration of
the corresponding M-LAG member interface to isolate traffic from peer-link
interfaces to M-LAG member interfaces. Unidirectional isolation in the M-
LAG loop prevention mechanism takes effect only for flooding traffic such
as broadcast traffic.

▫ If the access device is single-homed to the M-LAG, the M-LAG does not
deliver the unidirectional isolation configuration of the corresponding M-
LAG member interface.

• Canceling unidirectional isolation: When an M-LAG device detects that the local
M-LAG member interface is in Down state, the device sends M-LAG
synchronization packets through the peer-link to instruct the remote device to
revoke the automatically delivered unidirectional isolation ACL rule group of the
corresponding M-LAG member interface.
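The two default rules, the dual-homing prerequisite, and the revocation step above can be modeled together in Python. This is a simplified sketch, not the device's ACL engine: it assumes first-match evaluation, that corresponding M-LAG member interfaces share the same name on both devices, and hypothetical function and interface names.

```python
from dataclasses import dataclass

PEER_LINK = "peer-link"


@dataclass(frozen=True)
class Packet:
    in_interface: str
    out_interface: str
    is_l3_unicast: bool


def isolation_permits(pkt: Packet, isolated_members: set[str]) -> bool:
    """Evaluate the two default rules in order (first match wins).

    isolated_members: M-LAG member interfaces whose access device is
    dual-homed. Single-homed members are absent, so traffic from the
    peer-link to them is never blocked."""
    from_peer_to_member = pkt.in_interface == PEER_LINK and pkt.out_interface in isolated_members
    if from_peer_to_member and pkt.is_l3_unicast:
        return True   # Rule 1: permit Layer 3 unicast
    if from_peer_to_member:
        return False  # Rule 2: reject everything else, such as flooded BUM copies
    return True       # Neither rule matches: forwarding is unaffected


def revoke_isolation(isolated_members: set[str], interface: str) -> None:
    """Called when the peer reports that its corresponding member interface is Down."""
    isolated_members.discard(interface)


members = {"Eth-Trunk10"}
print(isolation_permits(Packet(PEER_LINK, "Eth-Trunk10", False), members))
revoke_isolation(members, "Eth-Trunk10")
print(isolation_permits(Packet(PEER_LINK, "Eth-Trunk10", False), members))
```

Revoking isolation matters because, once the peer's member interface is Down, the peer-link becomes the only path by which flooded traffic from the other side can reach that access device.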
M-LAG Upgrade in Maintenance Mode
⚫ Traditionally, to upgrade SW3 in the networking shown in the figure, you switch traffic to SW4 by shutting down the interface or modifying the link cost of the routing protocol, upgrade SW3, and then restore the interface status or the link cost to switch traffic back to SW3. As a result, packet loss occurs in north-south traffic due to routing protocol convergence or ECMP path switching, and in south-north and east-west traffic due to Eth-Trunk interface status changes.

⚫ M-LAG upgrade in maintenance mode allows you to run commands in the maintenance mode view to switch traffic from the device to be upgraded to the backup device and then restart the device. This reduces the packet loss rate during the upgrade and improves upgrade reliability.

1. Start.
2. Prepare for the upgrade (including the device status, upgrade files, and upgrade tools).
3. Switch traffic over.
4. Upgrade the device (you can upgrade the master device first).
5. Verify the upgrade.
A. If the upgrade succeeds, switch the traffic back and start the next phase.
B. If the upgrade fails, switch the traffic back, check the device, and perform the upgrade again.
6. After 10 minutes, if the device status and services are normal, upgrade the other device.
Figure: SW3 and SW4, connected by a peer-link and a DAD link, attach upstream to an L3 network; SW1 and SW2 are dual-homed to SW3 and SW4.

• Since V200R020C10, M-LAG can be upgraded in maintenance mode. M-LAG upgrade in maintenance mode allows you to run commands in the maintenance mode view to switch traffic from the device to be upgraded to the backup device and then restart the device. This reduces the packet loss rate during the upgrade and improves upgrade reliability.
• The upgrade in M-LAG maintenance mode is controlled by a license. By default, it is disabled on a newly purchased device. To use this function, apply for and purchase a license.
• Detailed operations:
▫ SW1 and SW2 are dual-homed to the network through M-LAG, and SW3 and SW4 are dual-homed to the network through routing protocols.
▫ In the M-LAG maintenance mode scenario, enter the maintenance mode of SW3 and perform the following configurations before the upgrade.
▫ On SW3, change the OSPF and OSPFv3 cost values, or change the MED and Local_Pref values of BGP and BGP4+, to make the routes advertised by SW3 less preferred and switch the network-side traffic destined for SW3 to SW4.
▫ On SW3, set the Eth-Trunk member interfaces added to the M-LAG to Down so that they send dying packets to SW1 and SW2. After receiving the dying packets, SW1 and SW2 switch the traffic destined for SW3 to SW4.
• After the preceding operations are complete, upgrade SW3. After SW3 is upgraded, restore the preceding configurations to switch service traffic back to SW3.
Contents

1. Overview of M-LAG

2. M-LAG Fundamentals
▫ Basic Concepts of M-LAG

▫ Basic Features of M-LAG


◼ M-LAG Traffic Forwarding Process

3. M-LAG Failure Protection

4. M-LAG Deployment

5. M-LAG Best Practices



Known Unicast Traffic Forwarding: Connecting to Layer 2 Network (1)
⚫ Known unicast traffic (southbound and northbound) from non-M-LAG member interfaces: forwarded according to the normal unicast forwarding process.
⚫ Known unicast traffic (east-west) from non-M-LAG member interfaces: east-west traffic from SW3 to SW4 does not pass through the peer-link.
Figure: SW1 and SW2, connected by a peer-link, attach to an upstream L2 network where STP blocks one port; SW3, SW4, and SW5 are access switches.

• For north-south Layer 2 traffic, M-LAG member devices forward network-side traffic based on the MAC address table. Because of the STP-blocked interface, some traffic is forwarded through the peer-link interface to the other member device.

• For east-west Layer 2 traffic, when M-LAG is configured for all access devices and no isolated ports exist, Layer 2 traffic is preferentially forwarded locally through the M-LAG.

Known Unicast Traffic Forwarding: Connecting to Layer 2


Network (2)
Known unicast traffic from M-LAG member interfaces Known unicast traffic from the network side
(southbound and northbound) (southbound and northbound)

L2 Network L2 Network

STP STP
Blocking Blocking

SW1 SW2 SW1 SW2


Peer-link Peer-link

SW3 SW4 SW5 SW3 SW4 SW5


For unicast traffic sent from the network side to M-LAG member interfaces
SW1 and SW2 work in per-flow load balancing mode and forward or non-member interfaces, the local device preferentially forwards the
traffic together. traffic and does not load balance the traffic.


Known Unicast Traffic Forwarding: Connecting to a Layer 2 Network (3)
Special Scenario
[Figure: an L2 network above SW1 and SW2, with one uplink blocked by STP; SW1 and SW2 are connected by the peer-link; SW3, SW4, and SW5 connect below.]
⚫ Traffic from SW3 to SW5 reaches SW5 through SW1 and SW2.
⚫ Traffic from SW4 to SW5:
 Packets sent to SW1 are forwarded to SW2 through the peer-link.
 Packets sent to SW2 are forwarded by SW2 directly to the destination.


Known Unicast Traffic Forwarding: Connecting to a Layer 3 Network (1)
Left: known unicast traffic (southbound and northbound) from non-M-LAG member interfaces. Right: known unicast (east-west) traffic from non-M-LAG member interfaces.
[Figure: an L3 network above SW1 and SW2, which are connected by the peer-link; SW3, SW4, and SW5 connect below.]
Left: The unicast traffic is forwarded according to the normal unicast forwarding process.
Right: East-west traffic from SW3 to SW4 does not pass through the peer-link.


• For north-south Layer 3 traffic, M-LAG member devices preferentially forward traffic received from the network side locally based on the routing table, which implements load balancing.

• For east-west Layer 3 traffic, M-LAG member devices preferentially forward traffic locally.

Known Unicast Traffic Forwarding: Connecting to a Layer 3 Network (2)
Left: known unicast traffic (southbound and northbound) from M-LAG member interfaces. Right: known unicast traffic (southbound and northbound) from the network side.
[Figure: an L3 network above SW1 and SW2, which are connected by the peer-link; SW3, SW4, and SW5 connect below.]
Left: SW1 and SW2 load balance the traffic and forward it together.
Right: For unicast traffic sent from the network side to M-LAG member interfaces, the M-LAG device group can load balance the traffic to the access devices. Unicast traffic sent from the network side to non-M-LAG member interfaces is not load balanced.


BUM Traffic Forwarding (1)


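⚫ SW1 floods the received traffic. When the traffic reaches SW2 over the peer-link, SW2 does not forward it to SW4, because unidirectional isolation is applied between the peer-link and the M-LAG member interfaces.

BUM packet from the network side
[Figure: an L2 network above SW1 and SW2, with one uplink blocked by STP; SW1 and SW2 are connected by the peer-link, with unidirectional isolation toward the M-LAG member interfaces; SW3, SW4, and SW5 connect below.]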


• BUM packets are broadcast, unknown unicast, and multicast packets. The Layer 2 forwarding process floods these packets.

• Packets received from a common port (a dual-homed or single-homed source port) are flooded to the other local ports and to the peer-link.

• Packets received from the peer-link are flooded only to single-homed interfaces. Unidirectional isolation prevents these packets from being flooded to dual-homed destinations.
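The three flooding rules above can be expressed as a small function that returns the flood set on one M-LAG device. This is a hedged sketch rather than the device logic: `bum_flood_targets()` and the port names are hypothetical, network-side ports are treated like single-homed ports, and all M-LAG member interfaces are assumed to be Up.

```python
def bum_flood_targets(ingress, local_ports, peer_link="peer-link"):
    """Return the set of ports to which one M-LAG device floods a BUM frame.

    ingress: port on which the frame arrived
    local_ports: dict mapping each local port -> "dual" (dual-homed M-LAG member
                 interface) or "single" (single-homed or network-side port)
    """
    if ingress == peer_link:
        # Unidirectional isolation: the peer device has already delivered a copy
        # to every dual-homed access device, so only single-homed ports get one.
        return {port for port, kind in local_ports.items() if kind == "single"}
    # Frames from a common port go to all other local ports and to the peer-link.
    targets = {port for port in local_ports if port != ingress}
    targets.add(peer_link)
    return targets
```

With this rule, a broadcast from a dual-homed server reaches every access device exactly once, because the copy crossing the peer-link is never sent back down to a dual-homed device.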

BUM Traffic Forwarding (2)


[Figure, two panels: an L2 network above SW1 and SW2, with one uplink blocked by STP; SW1 and SW2 are connected by the peer-link, with unidirectional isolation toward the M-LAG member interfaces; SW3, SW4, and SW5 connect below.]
Left: BUM packets from non-M-LAG member interfaces. Right: BUM packets from an M-LAG member interface.
In both cases, SW1 floods the received traffic. When the traffic reaches SW2, SW2 does not forward it to SW4, because unidirectional isolation is applied between the peer-link and the M-LAG member interfaces.


• The figure on the right shows only the packets sent from SW4 to SW1 and the
packets sent from SW4 to SW2. The M-LAG member ports on SW1 are
unidirectionally isolated.

Multicast: M-LAG Connecting to a Layer 2 Network
Left: Server A is the multicast source and Server B is the receiver. Right: Server B is the multicast source and Server A is the receiver.
[Figure: Server B on the L2 network side; SW1 and SW2, with one uplink blocked by STP, connected by the peer-link (with unidirectional isolation); Server A dual-homed to SW1 and SW2.]
Left: Traffic from the multicast source is load balanced to the M-LAG master and backup devices.
Right: If only one copy of traffic is diverted from the network side, the device that receives the traffic directly forwards it to the local M-LAG member interface.


• If an M-LAG is connected to a Layer 2 network, the Layer 2 network must send only one copy of traffic to the M-LAG; otherwise, loops may occur. As shown in the figure, assume that STP blocks the upstream interface of the M-LAG device on the right.

• When Server A functions as the multicast source and Server B functions as the multicast group member, traffic from the multicast source is load balanced to the M-LAG master and backup devices. Because the upstream interface of the M-LAG device on the right is blocked, the multicast outbound interface on that device points to the peer-link.

• When Server B functions as the multicast source and Server A functions as the multicast group member, both the M-LAG master and backup devices can forward multicast traffic. When only one copy of traffic is diverted from the network side, the device that receives the traffic directly forwards the multicast traffic to its local M-LAG member interface.

Multicast: M-LAG Connecting to a Layer 3 Network
Left: Server A is the multicast source and Server B is the receiver. Right: Server B is the multicast source and Server A is the receiver.
[Figure: Server B on the L3 network side; SW1 and SW2 connected by the peer-link; Server A dual-homed to SW1 and SW2.]
Left: Traffic sent by the multicast source is load balanced to the master and backup devices in the M-LAG.
Right: The M-LAG master and backup devices search their local multicast tables and forward traffic to multicast group members based on whether the last digit of the multicast group address is odd or even.


• When Server A functions as a multicast source and Server B functions as a multicast group member, traffic sent by the multicast source is load balanced to the M-LAG master and backup devices. After receiving the traffic, the M-LAG master and backup devices query their local multicast forwarding tables and forward the traffic.

• When Server B functions as a multicast source and Server A functions as a multicast group member, both the M-LAG master and backup devices divert traffic from the multicast source, query their local multicast forwarding tables, and load balance the traffic to the multicast group member based on the following rules:

▫ If the last digit of the multicast group address is an odd number (for example, 225.1.1.1, FF1E::1, or FF1E::B), the M-LAG device where the master M-LAG member interface resides forwards the traffic to the multicast group member.

▫ If the last digit of the multicast group address is an even number (for example, 225.1.1.2, FF1E::2, or FF1E::A), the M-LAG device where the backup M-LAG member interface resides forwards the traffic to the multicast group member.
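The odd/even rule can be checked with Python's standard `ipaddress` module. The sketch below is an illustration under stated assumptions (the function name `multicast_forwarder()` is hypothetical). It relies on the fact that the parity of the last decimal digit of an IPv4 group, or the last hexadecimal digit of an IPv6 group, equals the parity of the lowest bit of the whole address.

```python
import ipaddress


def multicast_forwarder(group):
    """Return which M-LAG device forwards a group's traffic to the access side.

    "master": the device where the master M-LAG member interface resides
    "backup": the device where the backup M-LAG member interface resides
    """
    # Parity of the last decimal (IPv4) or hex (IPv6) digit equals the
    # parity of the address as an integer, i.e. its lowest bit.
    lowest_bit = int(ipaddress.ip_address(group)) & 1
    return "master" if lowest_bit else "backup"
```

Applied to the examples above, 225.1.1.1, FF1E::1, and FF1E::B map to the master side, while 225.1.1.2, FF1E::2, and FF1E::A map to the backup side.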
Contents

1. Overview of M-LAG

2. M-LAG Fundamentals

3. M-LAG Failure Protection

4. M-LAG Deployment

5. M-LAG Best Practices

Introduction to M-LAG Failure Protection
⚫ As an inter-device link aggregation technology, M-LAG improves link reliability from the card level to
the device level. If a fault (link, device, or peer-link fault) occurs, M-LAG uses the fault handling
mechanism to ensure that normal services are not affected.

[Figure: proper running, followed by the fault types: service link or interface fault, device fault, peer-link fault, heartbeat link fault, and comprehensive fault.]

Service Link and Interface Device Peer-link Heartbeat Link Comprehensive

M-LAG Service Link Fault - Uplink


[Figure: normal state (left) and fault state (right); master and backup devices connected by the peer-link.]

• Uplink faults do not affect DAD between the M-LAG master and backup devices or the active-active system.
• When the M-LAG connects to an Ethernet network and the uplink of the M-LAG master device fails, all traffic passing through the M-LAG master device is forwarded through the peer-link, as shown in the right figure.
• When the M-LAG connects to a Layer 3 network and the uplink fails, the route becomes unavailable. In this case, configure best-effort path forwarding, or configure Monitor Link (described later) to shut down the downlink interface when the uplink fails.


• When an M-LAG is connected to a common Ethernet network and the uplink of the M-LAG master device fails, traffic passing through the M-LAG master device is forwarded through the peer-link. (STP reconverges, and the previously blocked interface may be unblocked.)

• If the DAD link runs over the service network and the faulty uplink carries the DAD link, the M-LAG keeps working properly. If the peer-link also fails, DAD cannot be performed and packet loss occurs.

M-LAG Service Link Fault - Downlink


[Figure: uplink traffic (left) and downlink traffic (right) when a fault occurs; master and backup devices connected by the peer-link.]

• If a downstream M-LAG member interface fails, the DFS group master/backup status does not change. If the faulty M-LAG member interface is in the master state, the backup M-LAG member interface becomes the master, and MAC address entries that pointed to the faulty M-LAG member interface are updated to point to the peer-link interface.
• When the master M-LAG member interface fails, unidirectional isolation between the peer-link and the M-LAG member interface on the other device is lifted so that traffic arriving over the peer-link can still be forwarded.
• After the faulty M-LAG member interface recovers, the roles do not switch back: the M-LAG member interface that became the master remains the master.
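The non-revertive role change of the M-LAG member interfaces described above can be illustrated with a small state object. The `MlagMemberRoles` class and the device names are hypothetical; the sketch models only the member interface roles, not the DFS group, whose master/backup status stays unchanged.

```python
class MlagMemberRoles:
    """Master/backup roles of the two member interfaces of one M-LAG (sketch)."""

    def __init__(self, master="SW1", backup="SW2"):
        self.master = master  # device holding the master M-LAG member interface
        self.backup = backup  # device holding the backup M-LAG member interface
        self.up = {master: True, backup: True}

    def interface_down(self, device):
        self.up[device] = False
        if device == self.master and self.up[self.backup]:
            # The backup member interface takes over the master role.
            # The DFS group master/backup status is not modeled and does not change.
            self.master, self.backup = self.backup, self.master

    def interface_up(self, device):
        # Non-revertive: a recovered interface does not reclaim the master role.
        self.up[device] = True
```

A failure of the master member interface on SW1 makes SW2's interface the master, and SW2's interface keeps that role after SW1's interface recovers; a failure of the backup member interface changes nothing.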


M-LAG Service Link Fault - Multicast: M-LAG Connecting to a Layer 2 Network
Left: Server A is the multicast source and Server B is the receiver. Right: Server B is the multicast source and Server A is the receiver.
[Figure: Server B on the L2 network side; SW1 and SW2, with one uplink blocked by STP, connected by the peer-link; Server A dual-homed to SW1 and SW2.]
Left: If an M-LAG member link fails, multicast services are not affected.
Right: When an M-LAG member link fails, multicast traffic is forwarded through the peer-link to the member interface of the other M-LAG device.


• Multicast forwarding is a special case for two reasons: the M-LAG master and backup devices load balance traffic depending on whether the last digit of the multicast group address is odd or even, and an independent Layer 3 link is required between the M-LAG devices to forward Layer 3 packets (described on the next slide).

• As shown in the figure on the right, if the local M-LAG member interface fails, multicast traffic is forwarded through the peer-link to the member interface of the other M-LAG device.

• Assume that a multicast source is on the network side and a multicast group member is on the access side. If the M-LAG member interface on the M-LAG master device fails, the master device instructs the remote device to update its multicast entries through M-LAG synchronization packets. The M-LAG master and backup devices then stop load balancing traffic based on whether the last digit of the multicast group address is odd or even, and all multicast traffic is forwarded by the M-LAG backup device, whose M-LAG member interface is Up. If the M-LAG member interface on the M-LAG backup device fails, multicast traffic is forwarded in a similar manner.
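The forwarding decision after a member interface failure can be sketched by extending the odd/even rule with the member interface states. The function name and boolean inputs are hypothetical. As in the last bullet, the sketch assumes that the multicast source is on the network side and the receiver is on the access side.

```python
import ipaddress


def multicast_forwarder_on_fault(group, master_member_up, backup_member_up):
    """Return "master" or "backup" for the M-LAG device that forwards a group's
    traffic to the access side, or None if neither member interface is Up."""
    if master_member_up and backup_member_up:
        # Normal operation: split groups by the lowest bit of the group address.
        return "master" if int(ipaddress.ip_address(group)) & 1 else "backup"
    if master_member_up:
        # Backup member interface failed: the master forwards every group.
        return "master"
    if backup_member_up:
        # Master member interface failed: the backup forwards every group.
        return "backup"
    return None
```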

M-LAG Service Link Fault - Multicast: M-LAG Connecting to a Layer 3 Network
Server A is the receiver and Server B is the multicast source.
[Figure: Server B (multicast source) on the L3 network side; SW1 and SW2 connected by the peer-link and by an independent Layer 3 link; Server A (receiver) dual-homed to SW1 and SW2. Traffic sent by the multicast source is load balanced to the master and backup devices in the M-LAG.]
⚫ To forward multicast traffic on a Layer 3 network, an independent Layer 3 link must be configured between the two M-LAG devices.
 If a fault occurs, only one uplink may remain on the network side. In this case, the independent Layer 3 link deployed between the M-LAG master and backup devices transmits the multicast packets.
 Multicast packets whose group address ends in an odd number cannot be forwarded to the M-LAG master device (SW1 in this example) through the peer-link. Instead, they can reach the master device only through the independent Layer 3 link.
 Similarly, if the backup device in the M-LAG system fails, multicast packets whose group address ends in an even number may also be forwarded to the master device over the independent Layer 3 link.


M-LAG Device Fault

⚫ If the M-LAG master device fails:
 The backup device becomes the master device and continues to forward traffic.
 The Eth-Trunk on the M-LAG master device goes Down.
⚫ If the M-LAG backup device fails:
 The master/backup status of the M-LAG does not change, and the Eth-Trunk on the M-LAG backup device goes Down.
 The Eth-Trunk on the M-LAG master device remains Up, and its traffic forwarding status remains unchanged.
[Figure: M-LAG master device failure. (1) The original master device fails. (2) The backup device becomes the master device. The two devices are connected by the peer-link and the DAD link.]


• When a faulty M-LAG member device recovers, the peer-link goes Up first, and
the two M-LAG member devices renegotiate their master and backup roles. After
the negotiation succeeds, the M-LAG member interface on the faulty M-LAG
member device goes Up and traffic is load balanced. Both the M-LAG master and
backup devices retain their original roles after recovering from the fault.

M-LAG Peer-Link Fault


⚫ If the peer-link fails but the DAD heartbeat status is normal, all interfaces on the backup M-LAG device, except logical interfaces, management interfaces, and peer-link interfaces, enter the Error-Down state.

[Figure: proper running (left) and failure (right); master and backup devices connected by the peer-link and the DAD link.]


• You can run a command to configure logical interfaces on the M-LAG backup device to enter the Error-Down state if the peer-link fails but the DAD heartbeat status remains normal.

▫ If the peer-link fails but the DAD heartbeat status is normal when M-LAG is used for dual-homing access on a VXLAN or IP network, the VLANIF interfaces, VBDIF interfaces, loopback interfaces, and M-LAG member interfaces on the M-LAG backup device enter the Error-Down state.

• If logical interfaces are configured to enter the Error-Down state in this situation and the faulty peer-link interface later recovers, the devices restore VLANIF, VBDIF, and loopback interfaces to the Up state 6 seconds after DFS group pairing succeeds. This delay ensures that ARP entries on a large number of VLANIF interfaces are synchronized normally. If a delay for the Layer 3 protocol status of the interface to go Up is configured, VLANIF, VBDIF, and loopback interfaces go Up after the configured delay plus 6 seconds.

• When the faulty peer-link recovers, M-LAG member interfaces in the Error-Down state automatically go Up after 240 seconds by default, and the other interfaces in the Error-Down state go Up immediately.
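The recovery timers above can be summarized in a small helper. This is a sketch under stated assumptions: `error_down_recovery_delay()` and its parameters are hypothetical names, the 240-second member interface delay is counted from peer-link recovery, and the 6-second delay for VLANIF, VBDIF, and loopback interfaces is counted from successful DFS group pairing, as described in the notes.

```python
LOGICAL_INTERFACES = {"vlanif", "vbdif", "loopback"}


def error_down_recovery_delay(interface_type, l3_up_delay=0, member_recovery_delay=240):
    """Seconds after which an Error-Down interface on the backup device goes Up again.

    interface_type: "mlag-member", "vlanif", "vbdif", "loopback", or any other type
    l3_up_delay: configured delay (s) for the Layer 3 protocol status to go Up
    member_recovery_delay: M-LAG member interface recovery delay (240 s by default)
    """
    if interface_type == "mlag-member":
        # Counted from peer-link recovery.
        return member_recovery_delay
    if interface_type in LOGICAL_INTERFACES:
        # Counted from successful DFS group pairing: 6 s plus any configured delay.
        return 6 + l3_up_delay
    # All other Error-Down interfaces recover immediately.
    return 0
```

For example, with a configured Layer 3 Up delay of 10 seconds, a VLANIF interface goes Up 16 seconds after DFS group pairing succeeds.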

M-LAG Heartbeat Link Fault

⚫ If the heartbeat link fails:
 Services are not affected.
 However, if the peer-link then fails, the resulting dual-active condition cannot be identified.
[Figure: master and backup devices connected by the peer-link and the DAD link.]
⚫ Therefore, if the heartbeat link fails:
 The failsafe mechanism is not triggered.
 However, an alarm is generated. Handle the alarm promptly to prevent service abnormalities if the peer-link also fails.


• After the heartbeat link fault is rectified, a heartbeat fault clear alarm is
generated.

Peer-Link Fault + M-LAG Device Fault (Problem)

The peer-link fails, and then an M-LAG device fails. Traffic is interrupted.

[Figure: master and backup devices connected by the peer-link and the DAD link, before and after the failures.]
• If the peer-link fails but the DAD heartbeat status is normal when M-LAG is used for dual-homing access, some interfaces on the DFS backup device enter the Error-Down state. In this case, the DFS master device continues to work.
• If the DFS master device then stops working because it is powered off, its MPU is damaged, or it restarts due to a fault, neither the DFS master device nor the DFS backup device can forward traffic.


Peer-Link Fault + M-LAG Device Fault (Solution)

[Figure: master and backup devices connected by the peer-link and the DAD link, with the enhanced secondary fault function enabled.]
⚫ Enable the enhanced secondary fault function: If this function is enabled in the M-LAG, the backup device detects the failure of the DFS master device through the dual-active detection (DAD) mechanism (no M-LAG DAD heartbeat packet is received within a certain period). The backup device then becomes the DFS master device, and the interfaces in the Error-Down state on it go Up and continue to forward traffic.


• Device fault rectification: If the fault on the original DFS master device is rectified
but the peer-link fault persists, the following applies:

▫ If the LACP M-LAG system ID is switched to the LACP system ID of the local
device within a certain period, the access device selects only one of the
uplinks as the active link during LACP negotiation. The actual traffic
forwarding is normal.

▫ If the default LACP M-LAG system ID is used (that is, it remains unchanged), the two M-LAG devices negotiate with the access device using the same system ID, so the links to both devices can be selected as active links. Because the peer-link fault persists, the M-LAG devices cannot synchronize information such as each other's priority and system MAC address. As a result, two M-LAG master devices exist, and multicast traffic forwarding may be abnormal. In this case, the HB DFS master/backup status is negotiated through heartbeat packets that carry the information needed for DFS group master/backup negotiation (such as the DFS group priority and system MAC address). Some interfaces on the HB DFS backup device (for details, see Peer-Link Fault) are then triggered to enter the Error-Down state, and the HB DFS master device continues to work.
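The reactions of the DFS backup device covered in this section (peer-link fault, heartbeat link fault, and peer-link fault followed by a master failure) can be summarized as a decision function. The function and its return strings are hypothetical. The sketch ignores timers, assumes the peer-link fails before the heartbeat is lost (as in the slides), and treats heartbeat loss after a peer-link fault as a failure of the DFS master device.

```python
def backup_device_action(peer_link_up, heartbeat_ok, enhanced_secondary_fault=False):
    """Summarize how the DFS backup device reacts to peer-link and heartbeat states."""
    if peer_link_up and heartbeat_ok:
        return "normal forwarding"
    if peer_link_up:
        # Heartbeat link fault only: services unaffected, failsafe not triggered.
        return "alarm only"
    if heartbeat_ok:
        # Peer-link fault with normal DAD heartbeats: the DFS master keeps forwarding.
        return "error-down"
    # Peer-link down and heartbeats lost: treated as the DFS master failing.
    if enhanced_secondary_fault:
        return "become master"  # Error-Down interfaces go Up again
    return "error-down"         # traffic is interrupted if the master has failed
```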
Contents

1. Overview of M-LAG

2. M-LAG Fundamentals

3. M-LAG Failure Protection

4. M-LAG Deployment
◼ M-LAG Multi-Protocol Deployment

▫ M-LAG Deployment Scenario

5. M-LAG Best Practices

Multi-Protocol Deployment - STP Solution 1: Root Bridge Solution
[Figure: physical view and STP logical view. In the logical view, the M-LAG system is the root bridge and the access devices see a single upstream device.]
1. Manually set the M-LAG system as the root bridge.
2. Configure the M-LAG system to use the same STP bridge ID.
3. STP is disabled on the peer-link.
Advantages of the root bridge solution:
• Simple implementation: The STP implementation does not need to be modified. Only the STP bridge ID parameter needs to be set.
Constraints on the root bridge solution:
• An M-LAG system can serve only as the STP root bridge, not as a non-root bridge.
Figure annotations:
• With the M-LAG: The device considers that it is connected to the same device through an Eth-Trunk, and the two links in the Eth-Trunk do not need to be blocked.
• STP blocking: The device considers that it is connected to the same device through two independent links, and a port needs to be blocked to prevent loops.


• Configuration suggestion: When configuring M-LAG based on the root bridge, configure the same bridge ID on the two devices in the M-LAG and give them the highest root priority (the lowest priority value). This ensures that the two devices in the M-LAG are the STP root bridges.
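Why the root bridge solution works can be shown with the standard STP root election, in which the lowest bridge ID wins (priority compared first, then MAC address). In the sketch below, the bridge ID values and the `BridgeId`/`elect_root()` names are illustrative assumptions; 32768 is the usual default STP priority.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int  # lower value = higher priority (0 is the highest)
    mac: str       # bridge MAC address, e.g. "00e0-fc00-0001"

    def sort_key(self):
        # STP compares the priority first, then the MAC address as a 48-bit number.
        return (self.priority, int(self.mac.replace("-", "").replace(":", ""), 16))


def elect_root(bridge_ids):
    """Return the bridge ID that wins the STP root election (the lowest one)."""
    return min(bridge_ids, key=BridgeId.sort_key)


# Both M-LAG devices are configured with the same bridge ID and the highest priority.
mlag_id = BridgeId(0, "00e0-fc00-0001")
sw1, sw2 = mlag_id, mlag_id
sw3 = BridgeId(32768, "00e0-fc00-0003")  # access switch with the default priority
```

Because SW1 and SW2 advertise an identical bridge ID that wins the election, the access device sees a single root bridge behind its Eth-Trunk and blocks neither link.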
Multi-Protocol Deployment - STP Solution 2: V-STP Solution

[Figure: physical view and STP logical view; the two M-LAG devices, connected by the peer-link, act as one root bridge.]
⚫ After V-STP is enabled on the M-LAG master and backup devices and M-LAG master/backup negotiation succeeds, V-STP virtualizes the two devices into one device for port role calculation and fast convergence.
⚫ After V-STP is enabled, the M-LAG backup device needs to synchronize the bridge MAC address and instance priority information of the M-LAG master device.
⚫ The M-LAG backup device uses the bridge MAC address and instance priority information synchronized from the M-LAG master device to perform STP calculation and to send and receive BPDUs. This ensures that the STP calculation parameters remain consistent after the two M-LAG devices are virtualized into one device.

Multi-Protocol Deployment - STP Solution Comparison
⚫ Both the root bridge mode and the V-STP mode can be used to build a loop-free network. In root bridge mode, the M-LAG devices must be manually specified as the same bridge. In V-STP mode, protocol information must be synchronized between the M-LAG devices so that they appear as one device in STP negotiation.

Root bridge mode
• Configuration method: Manually configure the two M-LAG devices as the root bridge and configure the same bridge ID so that the two devices appear as the same root bridge.
• Application scenario: Single-level M-LAG deployment, or multi-level M-LAG deployment in which the aggregation layer is the root bridge of the Layer 2 network.
• Limitations:
 This mode supports only STP, RSTP, and MSTP.
 M-LAG cascading in all-root-bridge mode is not supported.
 STP must be disabled on the peer-link interface.

V-STP mode
• Configuration method: After V-STP is enabled, the STP protocol status is synchronized between the dual-homing devices so that the two devices use the same status for STP negotiation.
• Application scenario: Interconnection with traditional STP networks, and multi-level M-LAG deployment.
• Limitations:
 Only STP and RSTP are supported in this mode.
 The M-LAG member interface configurations on the M-LAG master and backup devices must be the same.

Multi-Protocol Deployment - Dual-Active Gateway (1)

[Figure: physical topology (the two M-LAG devices, connected by the peer-link, attach to a Layer 3 network and to an access device through an M-LAG/LAG) and Layer 3 logical view (a single gateway with the same IP address and MAC address).]

• Solution 1 (simulate the same gateway): Configure the same gateway IP address and MAC address on the two devices.

Solution 1 is preferred because it is easy to configure and reduces protocol overhead. The Layer 3 gateways on the two M-LAG devices are independent. You can configure the same IP address and MAC address for the Layer 3 gateways on the two devices so that they function as dual-active gateways.

Multi-Protocol Deployment - Dual-Active Gateway (2)

[Figure: physical topology (the two M-LAG devices run VRRP toward a Layer 3 network and connect to an access device through an M-LAG/LAG) and Layer 3 logical view (a VRRP all-active gateway).]

• Solution 2 (Virtual Router Redundancy Protocol): Configure VRRP on the two devices. An M-LAG supports the VRRP dual-active mode.

• Create a VRRP group on VLANIF or VBDIF interfaces and configure the same virtual IP and MAC addresses for them so that the M-LAG master and backup devices in the VRRP group function as dual-active gateways.


• If VRRP is deployed, VRRP information needs to be synchronized between the master and backup devices through the peer-link so that the virtual interfaces (VLANIF or VBDIF interfaces) of the master and backup devices have the same virtual IP address and virtual MAC address.

• M-LAG and VRRP are usually configured together in Data Center Interconnect (DCI) scenarios.
Multi-Protocol Deployment - Dynamic Routing Protocols such as OSPF
⚫ M-LAG devices can function as access devices to connect to servers or as egress devices to connect to egress routers (PEs).
 An M-LAG can be configured with a static route to the network segment where a server resides, or it can use OSPF to dynamically exchange routing information with the server.
 An M-LAG can function as a border leaf node and exchange routing information with PE routers through OSPF to implement internal and external communication of the data center.
[Figure: left, SW1 and SW2 form an M-LAG that runs OSPF with a PE; right, SW1 and SW2 form an M-LAG that runs OSPF with a server attached through a LAG.]


• The server is dual-homed to the M-LAG and has static routes configured so that it can communicate with the M-LAG through Layer 3 routes. However, a network that uses static routes is difficult to configure and maintain and lacks flexible and fast deployment capabilities, so it cannot meet the requirements of rapidly growing services.

• To address this problem, M-LAG member devices need to establish dynamic routing protocol neighbor relationships with the user-side device. Therefore, M-LAG member interfaces need to support dynamic routing protocols.

• Before configuring OSPF over M-LAG, you need to complete the following tasks:

▫ Establish an M-LAG and an OSPF network.

▫ Add M-LAG member interfaces to the corresponding VLAN.

▫ Enable OSPF on the user-side device.


Multi-Protocol Deployment - Monitor Link or Best-effort Path
⚫ When an M-LAG accesses a Layer 3 network and the uplink of a device fails, packets cannot be forwarded through the peer-link. As a result, packets sent to that device are discarded.
⚫ Monitor Link: If the uplink interface goes Down, the downlink interface also goes Down.
[Figure: an L3 network above SW1 and SW2, which are connected by the peer-link; Monitor Link binds the uplink and downlink interfaces.]
⚫ Best-effort path: When the uplink of a device fails, the device forwards packets to the other device through the best-effort link (such as a static route).
[Figure: an L3 network above SW1 and SW2, which are connected by the peer-link and a best-effort link.]


• In a Layer 3 scenario, a bypass link must be configured between the M-LAG master and backup devices. Otherwise, uplink traffic that reaches the master device cannot reach the backup device through the peer-link.

• When Monitor Link is configured, if the downlink or M-LAG member interface of the other device fails, all traffic is discarded. Therefore, Monitor Link is not applicable to scenarios where the M-LAG functions as the egress gateway.
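The two options on this slide can be compared with a pair of small functions: Monitor Link drives the downlink state from the uplinks, while a best-effort path gives northbound traffic another route. Both functions and their return strings are hypothetical illustrations. The sketch also assumes that, when a Monitor Link group has several uplinks, the downlink goes Down only after all of them are Down.

```python
def member_interface_state(uplinks_up, monitor_link_enabled):
    """Monitor Link: the downlink (M-LAG member interface) follows the uplinks."""
    if monitor_link_enabled and not any(uplinks_up):
        # The access device then moves its traffic to the other M-LAG device.
        return "down"
    return "up"


def northbound_path(uplinks_up, best_effort_configured):
    """Path for traffic that reaches this M-LAG device and must go to the L3 network."""
    if any(uplinks_up):
        return "local uplink"
    if best_effort_configured:
        return "best-effort link to the other M-LAG device"
    # The peer-link cannot carry this Layer 3 traffic, so it is dropped.
    return "discard"
```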
Contents

1. Overview of M-LAG

2. M-LAG Fundamentals

3. M-LAG Failure Protection

4. M-LAG Deployment
▫ M-LAG Multi-Protocol Deployment
◼ M-LAG Deployment Scenario

5. M-LAG Best Practices

Overview of the M-LAG Deployment Solution
⚫ M-LAG deployment modes are classified as follows:
• M-LAG access network type: connecting to a Layer 2 network, connecting to a Layer 3 network, or connecting to a tunnel network.
• M-LAG access device type: switch access, server access, or VAS device access.
• M-LAG access mode: single-homing access or dual-homing access.
• M-LAG deployment mode: single-level M-LAG or multi-level M-LAG.


• Single-level M-LAG deployment is the most common deployment. The preceding sections use this deployment as an example, so it is not described again here.
M-LAG Access Network Type - Connecting to a Layer 2 Network
⚫ An M-LAG can connect to a Layer 2 network, such as an Ethernet network. Pay attention to the following point: To prevent loops, a link may be blocked by STP, and packets may need to be forwarded through the peer-link.
[Figure: an L2 network above SW1 and SW2, with one uplink blocked by STP; SW1 and SW2 are connected by the peer-link and a DAD link; SW3, SW4, and SW5 connect below.]

M-LAG Access Network Type - Connecting to a Layer 3 Network
⚫ An M-LAG system can access a Layer 3 network. Note the following points:
 The M-LAG system functions as the gateway of the access-side device. To function as one logical device, the M-LAG system must be deployed with active-active gateways.
 If a ping test is performed between a device in the M-LAG system and a PE, packet loss may occur due to load balancing between the PE and the M-LAG devices. (This is normal and does not affect services.)
[Figure: a PE above SW1 and SW2, which are connected by the peer-link; SW3, SW4, and SW5 connect below.]


• You are not advised to configure M-LAG member interfaces as main interfaces.

• When a Layer 3 device is connected to an M-LAG, connect it to the M-LAG in ECMP mode instead of M-LAG dual-homing mode. (Note: You can configure static routes to allow a Layer 3 device to be dual-homed to an M-LAG.)
M-LAG Access Network Type - Connecting to a Tunnel Network
⚫ A large Layer 2 network needs to be deployed in a data center, and M-LAG can connect to the VXLAN network. Note the following points:
 Configure VXLAN on the two devices in the M-LAG. The two devices function as tunnel endpoints, must be configured with the same tunnel endpoint IP address, and function as dual-active gateways. When an underlay routing protocol (such as OSPF) is configured, the two devices have different router IDs.
 The remote device regards the M-LAG as one logical device (one tunnel endpoint). Traffic sent to the M-LAG is load balanced to the M-LAG member devices.
 Configure Layer 2 sub-interfaces on the M-LAG interfaces of leaf nodes to transmit traffic.
[Figure: a VXLAN network above SW1 and SW2, which are connected by the peer-link and a DAD link; a server is dual-homed to SW1 and SW2.]


• Configuration suggestions:

▫ Configure M-LAG on leaf switches to support dual-active access of server NICs.

▫ Configure EVPN: Configure the same VTEP on the two leaf nodes.

▫ Configure Layer 2 functions: Create a BD and specify a VNI.

▫ Create a Layer 2 sub-interface on the M-LAG interface of a leaf node and associate the Layer 2 sub-interface with the BD.

▫ Configure Layer 3 functions: Create BDIF interfaces and configure IP and MAC addresses for them.

• Note: For distributed VXLAN gateways, BDIF interfaces on the same network segment must be configured with the same IP address and MAC address to support VM migration.
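The consistency requirements in this section lend themselves to a pre-deployment check: the same VTEP IP address on both leaf nodes, different underlay router IDs, and identical BDIF IP and MAC addresses. The `LeafConfig` structure and the `check_mlag_vxlan_pair()` function below are hypothetical, and the addresses used to exercise them are made up.

```python
from dataclasses import dataclass, field


@dataclass
class LeafConfig:
    router_id: str  # underlay router ID (must differ between the two leaf nodes)
    vtep_ip: str    # tunnel endpoint IP address (must be identical)
    bdif: dict = field(default_factory=dict)  # BD ID -> (gateway IP, gateway MAC)


def check_mlag_vxlan_pair(a, b):
    """Return a list of configuration problems for a pair of M-LAG leaf nodes."""
    problems = []
    if a.vtep_ip != b.vtep_ip:
        problems.append("VTEP IP addresses differ")
    if a.router_id == b.router_id:
        problems.append("underlay router IDs must differ")
    for bd in sorted(set(a.bdif) | set(b.bdif)):
        # Distributed gateways need the same BDIF IP and MAC on both leaf nodes.
        if a.bdif.get(bd) != b.bdif.get(bd):
            problems.append(f"BDIF {bd}: IP/MAC mismatch")
    return problems
```

An empty result means the pair meets the three requirements listed above; anything else names the mismatch to fix before bringing the M-LAG online.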
M-LAG Access Device Type - Switch Access
⚫ A switch can function as the access device of the M-LAG system. Generally, the switch is not a data source but a Layer 2 transparent transmission device. In this case, note the following:
 When a switch is dual-homed to the M-LAG system, only link aggregation can be configured to implement load balancing. The hash calculation result determines the device in the M-LAG system to which the switch sends packets.
 If the access switch is not connected to other Layer 2 networks, STP does not need to be configured to prevent loops. The unidirectional isolation mechanism prevents loops between the access switch and the M-LAG system.
[Figure: a network above SW1 and SW2, which are connected by the peer-link and a DAD link; access switches below SW1 and SW2 connect servers.]
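Per-flow load balancing from an access switch can be sketched as hashing a flow's header fields and taking the result modulo the number of links. CRC32 is only a stand-in here: the real hash fields and algorithm are device-specific, and `select_mlag_device()` is a hypothetical name.

```python
import zlib


def select_mlag_device(flow, devices=("SW1", "SW2")):
    """Pick the M-LAG device that receives a flow from the access switch.

    flow: tuple of header fields, e.g. (src_ip, dst_ip, protocol, src_port, dst_port)
    """
    key = "|".join(str(field) for field in flow).encode()
    # CRC32 stands in for the access switch's vendor-specific hash algorithm.
    return devices[zlib.crc32(key) % len(devices)]
```

Because the result depends only on the flow's fields, every packet of a flow goes to the same M-LAG device, which avoids reordering, while different flows spread across SW1 and SW2.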


M-LAG Access Device Type - Server Access
⚫ The M-LAG system supports dual network ports of a server in active-active bond and active-backup bond mode.
⚫ If two network ports on a server are connected in active-active mode:
 LACP is recommended to provide higher reliability and failover performance.
 When some servers are deployed in PXE mode, the access switches must support dynamic LACP. When such a server first goes online, it has no bonding configuration yet, so LACP negotiation fails and the Eth-Trunk goes Down. However, the member interfaces can still forward Layer 2 data independently, so that the server can obtain its configuration file. After obtaining the configuration, the server negotiates aggregation parameters with the access switch through LACP.
Figure: servers dual-homed to an M-LAG pair (connected by a peer-link) in active-active and active/backup modes.
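The PXE behavior above can be sketched as a small state model. This sketch assumes a simplified rule: if no member has completed LACP negotiation, every member forwards on its own; otherwise only negotiated members are bundled. `member_roles` and the role strings are hypothetical and do not correspond to any device command or output.

```python
def member_roles(negotiated):
    """Describe how each member of a dynamic-LACP Eth-Trunk forwards traffic.

    negotiated maps a member port name to True if LACP negotiation with the
    server NIC on that link has succeeded (hypothetical, simplified model).
    """
    if not any(negotiated.values()):
        # PXE stage: the server has no bond configuration and sends no LACPDUs.
        # The Eth-Trunk is Down, but each member forwards Layer 2 traffic on its
        # own so the server can still fetch its configuration file.
        return {port: "individual" for port in negotiated}
    # Once the server runs LACP, negotiated members join the aggregation and
    # members that failed negotiation are left out of it.
    return {port: ("bundled" if ok else "unselected") for port, ok in negotiated.items()}


print(member_roles({"port1": False, "port2": False}))
# {'port1': 'individual', 'port2': 'individual'}
print(member_roles({"port1": True, "port2": True}))
# {'port1': 'bundled', 'port2': 'bundled'}
```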


• Note: The Linux operating system is used as an example. It supports seven bonding modes (0 to 6).

▫ Mode 0 (round robin, balance-rr) and mode 4 (LACP, 802.3ad) support load balancing between two network ports. They are the two common dual-network-port active-active access solutions, and link aggregation must be configured on the peer switch.

▪ Mode 0 (round robin): data packets are sent on each interface in turn to implement load balancing.

▪ Mode 4 (LACP): LACP is used to negotiate the working mode, load balancing, and redundancy of the bonded interfaces.

▫ Mode 1 (active-backup): the common active-standby mode. Link aggregation does not need to be configured on the interfaces of the remote switch.
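The following is a minimal sketch contrasting how mode 0 and mode 1 choose an egress port. The mode-number-to-name table follows the Linux bonding driver. The two functions are simplified models, not kernel code: real active-backup keeps the current active port until it fails rather than re-picking the first healthy one. Mode 4 is not modeled here.

```python
from itertools import cycle

# Mode numbers as used by the Linux bonding driver.
LINUX_BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def round_robin(packets, ports):
    """Mode 0: send each packet on the next port in turn."""
    order = cycle(ports)
    return [(pkt, next(order)) for pkt in packets]


def active_backup(packets, ports, link_up):
    """Mode 1 (simplified): every packet uses the first port whose link is up."""
    active = next((p for p in ports if link_up.get(p, False)), None)
    if active is None:
        raise RuntimeError("no usable port in the bond")
    return [(pkt, active) for pkt in packets]


print(round_robin(["p1", "p2", "p3"], ["eth0", "eth1"]))
print(active_backup(["p1", "p2"], ["eth0", "eth1"], {"eth0": False, "eth1": True}))
```

Round robin spreads even a single flow across both switch ports, which is why the peer switch must treat the two links as one aggregated interface.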
M-LAG Access Device Type - VAS Device Access (Firewall as an Example)
⚫ VAS devices can be connected to the M-LAG system in bypass or direct connection mode.
 In bypass mode, only the traffic that routes steer to the firewall passes through it, avoiding traffic bottlenecks on the firewall. In addition, the bypass mode is more conducive to network expansion.
 In direct connection mode, all traffic needs to pass through the firewall.
Figure: left, an M-LAG system connected to firewalls in bypass mode; right, a direct firewall connection. In both, the firewalls are connected by a heartbeat line and the M-LAG devices by a peer-link.
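The difference between the two modes can be sketched as a next-hop decision. This sketch assumes that, in bypass mode, specific destination prefixes are routed to the firewall and everything else is forwarded by the M-LAG devices directly. `next_hop` and the prefix list are hypothetical.

```python
import ipaddress


def next_hop(dst_ip, mode, diverted_prefixes):
    """Return where the M-LAG system sends traffic for dst_ip.

    mode: "direct" (firewall in line) or "bypass" (firewall hangs off the M-LAG).
    diverted_prefixes: destinations whose routes point at the firewall (hypothetical).
    """
    if mode == "direct":
        return "firewall"  # every packet crosses the in-line firewall
    if mode != "bypass":
        raise ValueError(f"unknown mode: {mode}")
    dst = ipaddress.ip_address(dst_ip)
    if any(dst in ipaddress.ip_network(p) for p in diverted_prefixes):
        return "firewall"  # route-based diversion to the bypass firewall
    return "m-lag"  # forwarded directly, sparing firewall capacity


print(next_hop("10.9.0.5", "bypass", ["10.9.0.0/16"]))  # firewall
print(next_hop("10.1.0.5", "bypass", ["10.9.0.0/16"]))  # m-lag
```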


• HRP must be configured between firewalls to ensure network reliability.


M-LAG Access Mode - Dual-Homing to an M-LAG
⚫ Generally, access devices are dual-homed to an M-LAG. This mode is recommended and most commonly used.
⚫ The networking where access devices are dual-homed to an M-LAG through link aggregation has the following advantages:
 If the peer-link fails, fast convergence can be performed, and traffic forwarding behaviors are consistent in dual-active scenarios.
 Dual-active redundant forwarding paths are provided, improving link reliability.
Figure: SW3 and SW4 each dual-homed to SW1 and SW2, which are connected by a peer-link and a DAD link.

M-LAG Access Mode - Single-Homing to an M-LAG
⚫ If a device cannot be dual-homed to an M-LAG, preferentially connect it to another device that is already dual-homed to the M-LAG.
⚫ If that is not possible either, connect the device to the M-LAG master device so that it is not isolated when the peer-link fails. (If the peer-link fails, all interfaces on the backup device except the stack interface, management interface, and peer-link interface enter Error-Down state.) In addition, you are advised to use a VLAN that is not used by M-LAG member interfaces.
Figure: two topologies. Left: an M-LAG of SW1 and SW2 (peer-link and DAD) serving SW3 and SW4. Right: an M-LAG of a backup device and a master device (peer-link and DAD) serving SW3, SW4, and a single-homed device.
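The parenthetical rule above can be sketched as a function. It assumes each interface is tagged with a kind, and the interface names are made up for illustration. This follows only the rule stated on the slide. `error_down_interfaces` is a hypothetical helper, not a device feature.

```python
# Interface kinds that stay up on the backup device when the peer-link fails.
EXEMPT_KINDS = {"stack", "management", "peer-link"}


def error_down_interfaces(role, peer_link_up, interfaces):
    """Return the interfaces that enter Error-Down state.

    role: "master" or "backup"; interfaces: dict of name -> kind.
    """
    if peer_link_up or role != "backup":
        return set()
    return {name for name, kind in interfaces.items() if kind not in EXEMPT_KINDS}


backup_ports = {
    "eth-trunk1": "m-lag",         # dual-homed access device
    "port5": "single-homed",       # single-homed device attached to the backup
    "meth0": "management",
    "eth-trunk0": "peer-link",
}
print(sorted(error_down_interfaces("backup", False, backup_ports)))
# ['eth-trunk1', 'port5']
```

`port5` goes down together with the M-LAG ports. A device attached only to it loses all connectivity, which is why single-homed devices belong on the master.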

M-LAG Deployment Mode - Multi-Level M-LAG
⚫ Multi-level M-LAG interconnection is mainly used in large-scale data centers to build large Layer 2 networks. It not only simplifies networking, but also increases the number of dual-homed servers while ensuring reliability.
⚫ When configuring a two-level M-LAG, ensure that no loop occurs, both in normal operation and in all fault scenarios.
Figure: a lower-level M-LAG (SW3 and SW4, connected by a peer-link) dual-homed to an upper-level M-LAG (SW1 and SW2, connected by a peer-link).


• In a multi-level M-LAG scenario, you cannot prevent STP loops by manually configuring the root bridge: if both devices in an M-LAG are configured as root bridges, the other devices cannot perform STP calculation correctly. Therefore, V-STP must be deployed in a multi-level M-LAG scenario to synchronize STP status information between M-LAG member devices.
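The following is a minimal sketch of why a single logical bridge ID matters. STP elects the bridge with the numerically lowest (priority, MAC) ID as root. Without V-STP, a lower-level M-LAG receives two different bridge IDs over one aggregated uplink. The sketch assumes V-STP makes the upper pair advertise one shared ID, modeled here simply as SW1's ID. All names and MAC values are hypothetical.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int
    mac: str  # fixed-width lowercase hex, so string order matches numeric order


def elect_root(bridge_ids):
    """STP root election: the numerically lowest (priority, MAC) bridge ID wins."""
    return min(bridge_ids)


def ids_seen_downstream(sw1, sw2, vstp):
    """Bridge IDs a lower-level M-LAG receives on its aggregated uplink."""
    if vstp:
        # Assumption: V-STP makes the pair present one logical bridge ID (SW1's here).
        return {sw1}
    return {sw1, sw2}


sw1 = BridgeId(0, "00e0-fc00-0001")
sw2 = BridgeId(0, "00e0-fc00-0002")
sw3 = BridgeId(32768, "00e0-fc00-0003")
print(len(ids_seen_downstream(sw1, sw2, vstp=False)))  # 2: conflicting IDs on one logical link
print(len(ids_seen_downstream(sw1, sw2, vstp=True)))   # 1
print(elect_root([sw1, sw2, sw3]) == sw1)              # True
```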
Contents

1. Overview of M-LAG

2. M-LAG Fundamentals

3. M-LAG Failure Protection

4. M-LAG Deployment

5. M-LAG Best Practices

M-LAG Deployment Best Practice
⚫ Core layer: responsible for Layer 3 switching between different aggregation layer areas or between aggregation layer areas and other networks. OSPF runs between the core and aggregation layers.
⚫ Aggregation layer:
 Use Layer 3 interfaces to connect to core switches.
 Configure LACP on M-LAG ports.
 Deploy active-active gateways to support load balancing.
 Deploy V-STP, with the aggregation M-LAG functioning as the root bridge, to prevent loops caused by incorrect connections.
 Configure DAD over the out-of-band management network.
⚫ Access layer:
 Deploy V-STP to prevent loops caused by incorrect connections. Configure edge ports and BPDU protection on access-side ports.
 Configure DAD over the out-of-band management network.
Figure: servers dual-homed to an access-layer M-LAG (peer-link), which is dual-homed to an aggregation-layer M-LAG connected to the core layer.


• It is recommended that a dedicated Layer 3 best-effort link be configured between the M-LAG devices to carry traffic in case all upstream ports on a single member device fail.
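The recommendations above can be turned into a simple audit checklist. This sketch assumes each M-LAG pair's enabled features are already known as short labels. The labels, `BEST_PRACTICE`, and `missing_items` are hypothetical shorthand for the items listed, not configuration keywords. Applying the dedicated Layer 3 link to both layers is also an interpretation of the general note.

```python
# Recommended items per layer, taken from the best-practice list above.
BEST_PRACTICE = {
    "aggregation": {
        "l3_uplinks_to_core",
        "lacp_on_mlag_ports",
        "active_active_gateway",
        "vstp_as_root",
        "dad_over_oob",
        "l3_link_between_members",
    },
    "access": {
        "vstp",
        "edge_ports_and_bpdu_protection",
        "dad_over_oob",
        "l3_link_between_members",
    },
}


def missing_items(layer, enabled):
    """Return recommended items not yet enabled on an M-LAG pair at `layer`."""
    return sorted(BEST_PRACTICE[layer] - set(enabled))


print(missing_items("access", ["vstp", "dad_over_oob"]))
# ['edge_ports_and_bpdu_protection', 'l3_link_between_members']
```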
Quiz

1. (Multiple-answer question) In an M-LAG, which of the following entries are synchronized between the two devices? ( )
A. MAC address entry

B. ARP entry

C. Routing entry

D. ACL entry

2. (Short-answer question) What are the functions of DAD links in an M-LAG? What are the
deployment considerations?


1. AB

2. A dual-active detection (DAD) link, also called a heartbeat link, is a Layer 3 interconnection link used to exchange DAD packets between M-LAG master and
backup devices. Under normal circumstances, the DAD link does not participate
in any traffic forwarding behaviors in the M-LAG. It is only used to detect
whether two master devices exist when a fault occurs. The DAD link can be an
external link, for example, if the M-LAG is connected to an IP network and the
two member devices can communicate through the IP network, the link that
enables communication between the member devices can function as the DAD
link. An independent link that provides Layer 3 reachability can also be
configured as the DAD link, for example, a link between management interfaces
of the member devices can function as the DAD link.
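The DAD decision on the backup device can be sketched as follows. This is a simplified model with hypothetical function and return-value names, and it omits timers and role re-election details. If the peer-link is down but heartbeats still arrive over the DAD link, both devices are alive, so the backup isolates itself. If no heartbeat arrives, the master is assumed to have failed, and the backup keeps forwarding.

```python
def backup_reaction(peer_link_up, dad_heartbeat_from_master):
    """Simplified decision on the M-LAG backup device."""
    if peer_link_up:
        return "normal forwarding"
    if dad_heartbeat_from_master:
        # Peer-link down but master alive (dual-active): isolate to avoid conflicts.
        return "set service interfaces Error-Down"
    # Peer-link down and no heartbeat: master is assumed failed, backup keeps forwarding.
    return "take over forwarding"


print(backup_reaction(False, True))   # set service interfaces Error-Down
print(backup_reaction(False, False))  # take over forwarding
```

This is why the DAD link must not share a failure point with the peer-link. If both fail together, the backup would wrongly conclude that the master is down.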
Summary
⚫ M-LAG is a mechanism that implements inter-device link aggregation. Two access switches in the same state in an M-LAG can perform link aggregation negotiation with a connected device. M-LAG allows two devices to establish a dual-active system, improving link reliability from the card level to the device level.
⚫ This course describes the basic concepts, fundamentals, failure protection principles, and
typical applications of M-LAG on data center networks.
⚫ The CloudFabric solution uses M-LAG and VXLAN to implement end-to-end reliability,
ensuring that service systems can run properly in device failure and upgrade scenarios.

Thank you. Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright© 2023 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
Huawei CloudFabric Data Center Network Solution
Foreword
⚫ Nowadays, the cloud-based digital architecture has become the key to digital transformation. ICT
infrastructure has undergone profound cloud transformation, and cloud computing has been widely
used. As a key ICT infrastructure, data center networks (DCNs) also need to undergo technological
transformation based on service requirements in the cloud computing scenario.
⚫ To meet the service requirements and challenges for traditional DCNs in the cloud computing scenario,
Huawei launches the CloudFabric Hyper-Converged DCN Solution, which is also called Huawei
CloudFabric Solution.
⚫ This course describes the overall architecture, functions, and features of the CloudFabric Solution based
on service scenarios and requirements, and further introduces the core components and functions of
the solution.

Objectives

⚫ On completion of this course, you will be able to:


 Describe the development trend and challenges of DCNs.
 Describe the architecture and core components of the CloudFabric Solution.
 Describe the application scenarios of the CloudFabric Solution.
 Describe typical functions and features of the CloudFabric Solution.

Contents

1. Development Trends and Challenges of DCNs

2. Huawei CloudFabric Solution

3. Solution Features

DC Mission: Shift from a Service Center to a Value Center

⚫ Virtualization era (2010): resource pool sharing for higher resource utilization. Example: the large Layer 2 data center (DC) architecture of Energy Company X raised server utilization from 20% to 60%.
⚫ Cloud era (2015): cloud-based services for optimized delivery efficiency. Example: the next-generation DC system of Bank Y scaled from 1000 to 100,000 transactions per second, with elastic scaling in minutes.
⚫ AI era (2020): data value mining, unleashing the potential of data. Example: instant messaging tool Z + AI marketing achieved 38% higher game download speed and 49% higher ad conversion speed.

DCNs Are Evolving to Multi-DC, Multi-cloud Networks
⚫ Mobility-based acceleration: offline -> online, improving service efficiency.
⚫ Ubiquitous services: personalized services, enhancing user loyalty.
⚫ Agile service rollout: quick business monetization, accelerating innovation.
Figure: example online services (online shopping, mobile app, payment by card, entrusted fee deduction, counter transfer, ad placement, supermarket experience shopping, entertainment consumption, multiple quick payment methods, user profile, product recommendation, information push, risk prevention and control, ETC, third-party transfer, and ecosystem integration through third-party app SDKs/APIs) serving hundreds of millions of monthly active users.
⚫ Growth: user scale 10x, usage frequency 10x, data volume 200x, interconnection scenarios 100x, and rollout speed 10x, with 24/7 services.
⚫ Resulting trends:
 Centralized -> distributed: more complex system architecture.
 Single-DC -> multi-DC: DC scale increased by 100 times.
 Private cloud -> hybrid cloud: virtualization scale increased by 100 times.

DCNs Are Evolving Toward All-Ethernet
⚫ IT architecture: centralized (as-is) -> distributed (to-be); 100x scale; server interconnection over the Ethernet.
⚫ Computing unit (CPU/GPU): PCIe, with IB or Ethernet (as-is) -> Ethernet, with PCIe replaced (to-be); 100x performance; interconnection over the Ethernet.
⚫ Storage media: HDD, SCSI, and FC at 32G (as-is) -> SSD, NVMe, and RoCE at 400G (to-be); 1000x capacity; all-flash interconnection over the Ethernet.


• The network connects computing and storage servers to support the IT architecture of the entire DC. This means that the network needs to be adjusted accordingly upon any change of the connected servers or the IT architecture. Such change, however, is commonplace in DCs. Specifically, three major changes of the IT architecture, computing, and storage are driving DCNs to evolve from the original multi-protocol mode to all-Ethernet.

▫ The IT architecture has evolved from centralized to distributed, and large-scale node interconnection has become a new norm on the Ethernet.

▫ PCIe buses are being removed from computing units, no matter whether
they are CPUs or GPUs. This aims to break through the bus speed
bottleneck. Instead, Ethernet ports are used to directly provide higher
computing power.

▫ From the perspective of storage media, HDDs are upgrading toward all-
flash, improving storage performance 100-fold. Traditional FC, however,
provides only 32G bandwidth, which cannot meet the high throughput
requirements of all-flash. In this context, the Ethernet with up to 400G
bandwidth becomes the de facto standard for the next-generation storage
network.

• Note:
▫ PCIe: PCI Express
▫ IB: InfiniBand, an Input/Output (I/O) technology
▫ HDD: Hard Disk Drive
▫ SSD: Solid State Disk
Challenge 1: AI-Powered DCs Pose Challenges to Networks
⚫ Rapid construction of ultra-large DCNs, requiring fast network construction.
⚫ Unified management of compute resource pooling platforms, such as VMs and containers, requiring network linkage and rapid login and logout.
⚫ Frequent service changes bring a large number of network changes, requiring intelligent network evaluation and verification.
⚫ Full DCN lifecycle: automatic requirement sorting -> planning and design -> installation and deployment -> service rollout -> service change -> monitoring and maintenance -> fault handling -> ...

Challenge 2: The Network Changes and O&M Have Exceeded Human Limits
⚫ Long service rollout period (manual network construction and change operations):
 Bank A: the single DC construction volume in 2021 is greater than that in 2020. Deployment and rollout of 30 switches takes 3+ person-weeks, and rolling out a cross-DC service takes > 3 days, with N work orders and conferences.
 Manual operations such as solution design, evaluation, and decision-making account for 80% of the entire process.
⚫ Error-prone configuration change (experience-dependent change, no prevention or detection methods):
 Bank B: 14,000+ changes in a year. The network was interrupted for 40 minutes because a legacy server port was deleted by mistake.
 Nearly 40% of faults are caused by human errors, causing multiple major accidents.
⚫ Slow network recovery (network exceptions or faults cannot be quickly rectified):
 Abnormal alarms are generated due to unexpected situations caused by changes. Network-wide emergency recovery is the top priority, and the network recovery time must be less than 30 minutes.
 Change window example (23:00 to 5:00): network-wide snapshot, change execution, and rollback. Urgent rollback is triggered by abnormal alarms, changes that are not completed within the specified time, customer-defined...
 Average fault locating time: > 76 minutes. Average critical incident recovery time: > 40 minutes.

Challenge 3: Difficulties Faced by Traditional O&M
⚫ Difficult health check: in a fluctuating securities market, service peaks must be handled every day. A routine inspection before the market opens takes three person-hours every day, which makes it difficult to confidently keep up with general market trends.
⚫ Difficult fault locating: hundreds of millions of cross-bank transactions per day require 24/7 uninterrupted services. The complicated architecture makes fault locating difficult: it takes 76 minutes on average to locate a fault.
⚫ Difficult network change: an enormous increase in Internet traffic requires network changes every week. About 70% of network faults are caused by human errors, as changes are compared and verified manually.
Figure: survey on losses caused by fault-triggered interruptions ① (chart values: 0.09, 0.63, 1.1, 1.6, 2.0, 2.8, 6.48).


• Note:

▫ Source ①: Network Computing, the Meta Group and Contingency Planning Research
▫ Source ②: App Annie
Challenge 4: Three Challenges Faced by All-Ethernet Evolution
⚫ Zero packet loss required for HPC: on the traditional Ethernet, the packet loss rate increases exponentially with the number of network nodes (chart values: 0.02%, 0.15%, and 0.2-0.3%).
⚫ Zero packet loss required for active-active storage: as the intra-city long-distance latency increases (DC A and DC B > 70 km apart), it is more difficult to perform flow control across DCNs on the traditional Ethernet.
⚫ More complex O&M of large networks: 1000 nodes require million-level configurations. The traditional Ethernet lacks effective O&M methods, and the network is too complex to be handled manually.


• Network evolution toward all-Ethernet faces three challenges. It is well known that the Ethernet is natively prone to packet loss, a problem that has remained unresolved for more than 40 years since the debut of the Ethernet. As the network scale increases, the packet loss rate increases exponentially. In intra-city active-active storage scenarios, long-distance transmission causes an extra latency of hundreds of microseconds, making it even harder for network flow control to achieve zero packet loss. The Ethernet also lacks effective O&M methods. As services are migrated to clouds, the network scale increases 100-fold, and the number of relationships between network objects such as ports and policies reaches millions. Manual network O&M can no longer meet requirements.
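The following is an illustrative way to see why loss compounds with scale. It assumes each hop drops packets independently with a fixed probability, which is a simplification and not the model behind the figures above. Under this assumption, the probability that a packet, or a whole transfer, gets through untouched shrinks exponentially with the number of hops or packets. The function names and parameter values are hypothetical.

```python
def end_to_end_loss(per_hop_loss, hops):
    """Probability a packet is dropped on at least one of `hops` independent hops."""
    return 1 - (1 - per_hop_loss) ** hops


def loss_free_probability(per_packet_loss, packets):
    """Probability that a transfer of `packets` packets sees no drop at all."""
    return (1 - per_packet_loss) ** packets


for hops in (1, 3, 5):
    print(hops, round(end_to_end_loss(0.0005, hops), 6))
print(round(loss_free_probability(0.001, 1000), 3))  # about 0.368
```

With only 0.1% per-packet loss, roughly two out of three 1000-packet transfers hit at least one drop. For RDMA-style transports that react badly to any loss, this is what makes zero packet loss a hard requirement.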
Contents

1. Development Trends and Challenges of DCNs

2. Huawei CloudFabric Solution


◼ Solution Features

▫ Overall Architecture

▫ Application Scenarios

▫ Core Components and Key Services

3. Huawei CloudFabric Typical Scenarios - Computing Scenario

Huawei CloudFabric Solution

⚫ Full-lifecycle automation: automated network planning, construction, maintenance, and optimization; intent-driven, network as a service (NaaS).
⚫ Lossless Ethernet: local and long-distance lossless data transmission; converged computing and storage networks.
⚫ Network-wide intelligent O&M: predictive maintenance of devices, ports, optical modules, networks, and services, ensuring interruption-free services.
Figure: a hyper-converged, all-Ethernet DCN serving finance, government, and large enterprise customers and connecting a compute cluster (GPU), a service cluster (CPU), and a storage cluster.


• Based on the development trends and challenges of DCs, Huawei launches the CloudFabric Hyper-Converged DCN Solution, which can:

▫ Implement full-lifecycle automation of services and improve service time to market (TTM) by 90%.

▫ Build a lossless Ethernet network to implement lossless HPC and lossless long-distance transmission, so as to build intra-city active-active storage networks over Ethernet.

▫ Implement fast fault detection, intelligent analysis, and fast fault remediation, as well as proactive fault prediction in a large number of fault scenarios.

• Huawei CloudFabric Solution is built on Huawei DC flagship core switches (CloudEngine 16800/12800 series) and high-performance fixed switches (CloudEngine 8800/7800/6800/5800 series). It works with the Huawei DC controller iMaster NCE-Fabric, the intelligent network analysis platform iMaster NCE-FabricInsight, and the security solution HiSec to provide customers with a simplified operation experience throughout the DCN lifecycle, spanning network planning and construction, service rollout, O&M and monitoring, and change optimization. It also implements intelligent remediation of network faults and can detect, analyze, and isolate network faults in real time. In addition, the CloudFabric Solution meets the requirements of DCs evolving to an all-Ethernet architecture: it can integrate computing and storage networks, enable a lossless Ethernet, and improve computing and storage performance.
Feature 1: Full-Lifecycle Automation
Figure: customer service systems/operations platforms (OpenStack, Kubernetes, VMware) over the hyper-converged DCN (management, control, analysis, automation, and intelligence), which supports general-purpose computing, storage, and HPC across the planning, construction, maintenance, and optimization lifecycle.
⚫ Day 0 planning and design:
 Intent-driven, intelligent recommendation of network design solutions.
 Graphical drag-and-drop, with 3x higher deployment efficiency than the industry average.
⚫ Day 1 service rollout:
 Pre-event simulation and verification of service rollout and changes, with 40% fewer configuration errors.
 Level-3 (service/tenant/network) emergency handling, with service recovery in 20 minutes.
⚫ Day 2 O&M and monitoring:
 Service experience-based network health evaluation, implementing predictive maintenance.
 AI-powered network knowledge graph, implementing "1-3-5" intelligent O&M.


• Currently, network configuration automation has been implemented through SDN on many DCNs. However, service design and planning, technical review, and effect verification still need to be manually performed, involving multiple departments and roles. The entire process is time-consuming and inefficient, which has become the bottleneck of service rollout.

• The CloudFabric solution introduces intelligent algorithms in:

▫ Design phase: The factors that affect network design are broken down into three evaluation dimensions: resource, quality, and reliability. In this way, the network solution can be generated and recommended in seconds.

▫ Verification phase: The network topology, device configuration, and traffic information are calculated together to implement second-level verification of massive configurations on the entire network.

• The CloudFabric Solution can implement automated management and control throughout the network lifecycle spanning network planning and construction, service rollout, O&M and monitoring, and change optimization.
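The design-phase idea of scoring candidate solutions on resource, quality, and reliability can be sketched as a weighted ranking. This is only an illustration under stated assumptions. The weights, the 0-to-1 scores, and the names `DesignCandidate`, `score`, and `recommend` are hypothetical, and the source does not say which algorithm the product actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignCandidate:
    """A candidate network design scored on the three evaluation dimensions (0..1)."""
    name: str
    resource: float     # e.g. how efficiently ports and devices are used
    quality: float      # e.g. expected latency and bandwidth headroom
    reliability: float  # e.g. redundancy and failure tolerance


# Hypothetical weights; a real planner would derive them from the user's intent.
WEIGHTS = {"resource": 0.3, "quality": 0.3, "reliability": 0.4}


def score(candidate):
    """Weighted sum of the three dimension scores."""
    return sum(weight * getattr(candidate, dim) for dim, weight in WEIGHTS.items())


def recommend(candidates):
    """Return the highest-scoring candidate design."""
    return max(candidates, key=score)


designs = [
    DesignCandidate("two-tier, 4 spines", resource=0.9, quality=0.7, reliability=0.6),
    DesignCandidate("two-tier, 8 spines", resource=0.6, quality=0.8, reliability=0.9),
]
print(recommend(designs).name)  # two-tier, 8 spines
```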
Feature 2: Lossless Ethernet
Figure: a layered view of the computing centers. Application: HPC, AI, big data, and database applications. Software platform: HPC, AI, and storage platforms over a computing service base platform, with HPC, AI, general-purpose, and storage resource pools sharing resources horizontally. Hardware platform: computing centers 1 and 2, each an intelligent and lossless network connecting HPC computing, AI computing, and storage zones through iNICs, interconnected by DCI for long-haul lossless transmission. Infrastructure: liquid cooling/air cooling + power supply, and integrated equipment room/DC O&M.
⚫ Zero packet loss:
 Lossless Ethernet based on PFC.
 An intelligent plane is introduced to implement network self-optimization.
⚫ Low delay:
 Zero packet loss is the basis for achieving low-latency RDMA.
 E2E congestion control reduces the overall latency.
⚫ High throughput:
 Chip- and port-level high bandwidth.
 Network-level balanced scheduling.
⚫ Intelligent O&M:
 Unified O&M of three networks all built on the Ethernet.
 Simulation, verification, monitoring, and optimization.


• The intelligent lossless algorithm overcomes the Ethernet packet loss problem, which has remained unresolved for 40+ years. This helps achieve zero packet loss at 100% throughput, meeting the ultimate network performance requirements of HPC and high-performance storage services and doubling the computing power and storage I/O performance at the same cluster scale.

▫ The CloudFabric solution provides an all-Ethernet HPC network for HPC scenarios. Based on Huawei's unique iLossless™ algorithm, the solution solves the long-standing Ethernet packet loss problem and achieves zero packet loss at 100% throughput, providing the ultimate network performance required by HPC services and doubling computing power at an unchanged network scale.

▫ The CloudFabric solution provides an active-active all-Ethernet storage network for storage scenarios. Building on the iLossless™ algorithm for short-distance transmission, the iLossless-DCI algorithm solves the packet loss problem in long-distance transmission scenarios. The solution increases network bandwidth more than 10-fold, from 32G (FC) to 400GE, and significantly improves storage input/output operations per second (IOPS) performance.

• Note:

▫ PFC: priority-based flow control

▫ RDMA: remote direct memory access

▫ Three networks: front-end service network, diversified computing network, and storage network
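PFC, the basis of the lossless Ethernet above, can be sketched as a per-priority threshold check. When a lossless queue's buffer reaches an XOFF threshold, the switch asks the upstream device to pause that priority only. When the queue drains to an XON threshold, it asks the upstream device to resume. The thresholds and names below are hypothetical. A real switch also reserves headroom for frames already in flight, and Huawei's intelligent algorithms tune these values dynamically in ways not shown here.

```python
from dataclasses import dataclass


@dataclass
class PfcQueue:
    """Ingress buffer for one lossless priority (PFC works per priority, 0-7)."""
    xoff: int            # bytes: ask upstream to pause at or above this level
    xon: int             # bytes: ask upstream to resume at or below this level
    paused: bool = False

    def update(self, occupancy):
        """Return "XOFF", "XON", or None for the frame to send upstream."""
        if not self.paused and occupancy >= self.xoff:
            self.paused = True
            return "XOFF"   # PFC PAUSE frame for this priority
        if self.paused and occupancy <= self.xon:
            self.paused = False
            return "XON"    # PAUSE with zero pause time, i.e. resume
        return None


queue = PfcQueue(xoff=800, xon=400)  # hypothetical thresholds
print([queue.update(level) for level in [100, 500, 850, 900, 600, 300, 200]])
# [None, None, 'XOFF', None, None, 'XON', None]
```

The gap between XOFF and XON provides hysteresis. It stops the queue from flapping between pause and resume when occupancy hovers around a single threshold.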
Feature 3: Network-Wide Intelligent O&M
⚫ Data collection: the traditional NMS uses SNMP with a 5-minute polling period; iMaster NCE-FabricInsight uses telemetry to collect data in seconds.
 Visualized network data in all scenarios: indicator analysis in eight dimensions and dynamic baseline anomaly detection.
⚫ Health evaluation: the traditional NMS is device-centric, with a 2-hour inspection per day; iMaster NCE-FabricInsight is service-centric and identifies risks in minutes.
 Comprehensive network health evaluation: 5-layer evaluation model + AI algorithm, and capacity/traffic risk prediction.
⚫ Troubleshooting: the traditional NMS relies on passive response and manual fault locating; iMaster NCE-FabricInsight provides proactive O&M and automatic troubleshooting.
 "1-3-5" troubleshooting: AI algorithm + expert experience, and automatic locating of multi-vendor problems.
⚫ Scope: traditional O&M is distributed and independent; iMaster NCE-FabricInsight provides a multi-DC, multi-cloud overall perspective with unified O&M.
 Multi-cloud, multi-DC analysis: unified multi-DC health evaluation and visualized service access across clouds.
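Dynamic baseline anomaly detection can be sketched with a rolling mean and standard deviation. A sample is flagged when it deviates from the recent baseline by more than k standard deviations. This is a generic statistical technique used here for illustration, not the algorithm in iMaster NCE-FabricInsight. The window size, k, and the function name are assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def baseline_anomalies(samples, window=5, k=3.0):
    """Return indices of samples outside mean +/- k*std of the previous `window` samples."""
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(samples):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            # Guard against a perfectly flat baseline (sigma == 0).
            if abs(value - mu) > k * max(sigma, 1e-9):
                flagged.append(i)
        history.append(value)  # the baseline keeps adapting to new samples
    return flagged


# e.g. per-second interface utilization (%) reported through telemetry
print(baseline_anomalies([10, 11, 10, 12, 11, 10, 11, 50, 11, 10]))  # [7]
```

Because the baseline moves with the data, a gradual daily traffic rise is not flagged, but a sudden spike is. A fixed static threshold cannot tell these two cases apart.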


• The CloudFabric Solution uses telemetry technology to collect multi-dimensional data from the network, and uses the intelligent analysis platform to analyze network-wide O&M data. In addition to visualization of various O&M data, the CloudFabric Solution provides multiple key O&M capabilities.
▫ Network health evaluation: A multi-dimensional evaluation system in terms
of the device, network, protocol, overlay, and service is built to integrate
configuration data, entry data, log data, and KPI performance data on the
network with the help of telemetry. The intelligent analysis platform can
detect issues and risks in each dimension of the network in real time. The
detection scope covers the network working status, network capacity,
component sub-health, and service traffic exchange. In this way, O&M
personnel can view the overall experience quality of the entire network.
▫ Rapid root cause locating: Based on a knowledge graph, known DCN faults can be detected within 1 minute, located within 3 minutes, and rectified within 5 minutes. Learning of unknown faults and fault inference are also supported to help O&M personnel deeply explore the root causes of unknown faults.
▫ Automated assurance for service changes: After a configuration change, network data is collected and modeled to check whether the actual network forwarding behavior is consistent with users' service intents. O&M personnel can use the verification result to check whether the change meets expectations and whether it causes issues. If an intent fails verification, they can locate the failure cause, greatly improving the O&M efficiency in network change scenarios. In addition, important services can be periodically and automatically verified to ensure that they run normally and reliably.
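Intent verification after a change can be sketched as reachability checks on a modeled forwarding graph. This sketch assumes the collected forwarding data has already been reduced to a directed graph of next hops, and that each intent states whether one endpoint should reach another. The names `reachable`, `failed_intents`, and the topology are hypothetical.

```python
from collections import deque


def reachable(graph, src, dst):
    """Breadth-first search over a modeled forwarding graph."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def failed_intents(graph, intents):
    """Return the (src, dst, should_reach) intents the modeled network violates."""
    return [(s, d, want) for s, d, want in intents if reachable(graph, s, d) != want]


intents = [("web-vm", "db-vm", True), ("web-vm", "mgmt-vm", False)]
after_change = {
    "web-vm": ["leaf1"],
    "leaf1": ["spine1"],
    "spine1": ["leaf3"],  # the change removed the path to leaf2
    "leaf2": ["db-vm"],
    "leaf3": ["mgmt-vm"],
}
print(failed_intents(after_change, intents))
# [('web-vm', 'db-vm', True), ('web-vm', 'mgmt-vm', False)]
```

Both intents fail. The change broke a required path and also opened a forbidden one, which is the kind of result O&M personnel would use to roll back or locate the faulty change.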
Contents

1. Development Trends and Challenges of DCNs

2. Huawei CloudFabric Solution


▫ Solution Features
◼ Overall Architecture

▫ Application Scenarios

▫ Core Components and Key Services

3. Huawei CloudFabric Typical Scenarios - Computing Scenario

CloudFabric Solution Architecture
⚫ The CloudFabric Solution consists of the application layer, control and analysis layer, and forwarding layer.
 Application layer: Cloud OS, consisting of a cloud platform and a container platform.
 Control and analysis layer: computing management platform (VMM); network controller iMaster NCE-Fabric; network analyzer iMaster NCE-FabricInsight; SecoManager; HiSec Insight; MDC; MDA.
 Forwarding layer (intelligent and lossless network): server pool with vSwitches; fabric of spine and leaf nodes; VAS pool (NGFW/vNGFW, third-party firewalls, ...); multi-DC fabric (DC1 to DC n, each with fabric gateways, interconnected through core devices and the WAN, with public cloud access).


• Application layer:

▫ Cloud OS:

▪ Cloud platform: an OpenStack-based cloud operating platform, including open-source OpenStack and Huawei FusionSphere, which collaboratively manages computing, storage, and network resources.

▪ Container platform: creates and provisions containers.

• Control and analysis layer:

▫ Computing management platform: Virtual Machine Management (VMM) implements computing plane virtualization and resource management. vCenter and System Center are common computing management platforms.

▫ Network controller: iMaster NCE-Fabric centrally manages and controls cloud DCNs. It provides automatic mapping from applications to physical networks and implements resource pool deployment and visualized O&M, helping customers dynamically schedule service-centric network services.

▫ VAS controller: SecoManager implements centralized security policy management and control for firewalls, monitors events in real time, comprehensively analyzes security events such as attacks, and provides statistical reports in different formats. This helps customers keep track of the cyber security status at any time.
Contents

1. Development Trends and Challenges of DCNs

2. Huawei CloudFabric Solution


▫ Solution Features

▫ Overall Architecture
◼ Application Scenarios

▫ Core Components and Key Services

3. Huawei CloudFabric Typical Scenarios - Computing Scenario

Hosting Scenarios Overview
⚫ In this scenario, iMaster NCE and the network are deployed without a cloud platform or VMM. The network administrator uniformly manages networks through the GUI provided by iMaster NCE.
Figure: the network administrator; the computing administrator and VMM; a host running a hypervisor with VMs and a CE1800V vSwitch (uplink port, internal port, agent).


• Virtual Machine Management (VMM) is a system and platform that manages VMs in a unified manner. For example, Huawei FusionCompute and VMware vCenter can be used to create, delete, and migrate VMs.
Computing Scenarios Overview
⚫ The network administrator uniformly manages the physical and virtual networks through the GUI provided by iMaster NCE. The network system collaborates with compute resources, which are managed by the computing administrator.
Figure: the network administrator and the computing administrator (VMM) collaborate; a host running a hypervisor with VMs and a CE1800V vSwitch (uplink port, internal port, agent).


• The computing scenario and hosting scenario are network virtualization scenarios, in which only network devices are virtualized and managed.
Cloud-Network Integration Scenarios Overview

⚫ In this solution, user operations are performed on the cloud platform. The cloud platform interconnects with the controller (iMaster NCE-Fabric) and the computing management platform (VMM) to associate computing and network services.
⚫ The cloud platform delivers network-related service instructions to the controller for processing. The controller translates these service instructions into network configurations and then delivers the configurations to the corresponding network devices.
⚫ The cloud platform delivers computing-related service instructions to the VMM for processing. The VMM then manages the lifecycle of compute resources.
Figure: the DC administrator operates the cloud platform, which connects to the controller and the computing management platform. The network is a VXLAN spine-leaf fabric with a firewall and an LB, serving the computing resources.
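The controller's translation step can be sketched as turning one tenant "create subnet" instruction into identical per-leaf settings. This sketch assumes a distributed gateway, so every leaf gets the same BD, VNI, gateway IP, and gateway MAC. It also assumes the first usable host address is the gateway. `LeafConfig` and `translate` are hypothetical. The output is a data model, not device CLI.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class LeafConfig:
    """Per-leaf settings derived from one tenant subnet (hypothetical model)."""
    leaf: str
    bd_id: int
    vni: int
    gateway: str      # BDIF address with prefix length
    gateway_mac: str  # same on every leaf for a distributed gateway


def translate(subnet, bd_id, vni, leaves, gateway_mac):
    """Turn a 'create subnet' instruction into identical configs for each leaf."""
    net = ipaddress.ip_network(subnet)
    # Assumption: the first usable host address serves as the gateway.
    gateway = f"{next(net.hosts())}/{net.prefixlen}"
    return [LeafConfig(leaf, bd_id, vni, gateway, gateway_mac) for leaf in leaves]


for cfg in translate("10.1.1.0/24", 10, 5010, ["leaf1", "leaf2"], "0000-5e00-0101"):
    print(cfg)
```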


• The cloud-network integration scenario applies to unified provisioning of computing and network resources based on the cloud platform. The cloud platform is the only portal for provisioning services and managing compute and network resources. It uses standard northbound APIs of the SDN controller to implement dynamic provisioning of tenant network resources as well as rapid network provisioning and resource adjustment, shortening service provisioning time. The benefits are as follows:

▫ Unified computing and network service provisioning

▪ This eliminates the isolation between the IT system and network system in the traditional service system. The cloud platform or orchestration platform collaborates with the SDN controller to provision and maintain services, providing a unified GUI for customers.

▫ Information sharing between departments, improving efficiency

▪ This breaks down the barriers between IT and network departments in traditional enterprises and implements unified collaboration, greatly improving their working efficiency.

▫ Built on a de facto standard platform

▪ A system built on OpenStack, the open-source cloud platform, can maximize customers' return on investment and be compatible with other systems and devices.
Container Network Scenarios Overview
⚫ Service presentation layer (Kubernetes master): the Kubernetes API server functions as the Kubernetes cluster management entry.
⚫ Network control/analysis layer:
 The Calico API controller listens to Pod, network, service, and node object events of the Kubernetes API server and associates with the controller to automatically deliver physical network device configurations.
 The controller models and instantiates networks, automatically orchestrates and delivers network configurations, and diverts traffic to VAS devices.
⚫ Network service layer:
 CE switches function as hardware NVE nodes, VXLAN gateways, and distributed routers.
 Leaf nodes support stacking and M-LAG modes, and border leaf nodes support stacking and multi-active modes.
 BGP is used for communication with external routers.
 Hardware firewalls are connected to border leaf nodes in bypass mode.
 Service leaf nodes are connected to LBs in bypass mode.
⚫ Computing access layer (Kubernetes node, bare metal):
 In Layer 3 routing mode, servers and server leaf nodes run EBGP.
 Calico CNI configures the container network and BGP on servers in Layer 3 routing mode.
Figure: the Kubernetes API server, Calico API controller, and NCE API controller above the fabric. On each Kubernetes node, Pods connect through Calico (bird, iptables in the Linux kernel) to ETH0, which peers with the server leaf over EBGP in Layer 3 routing mode.


• Container network deployments fall into two types: independent deployment, and interworking between container networks and physical networks. The preceding figure shows the interworking type. In this scenario, physical networks can be associated with container networks to implement automatic deployment, avoiding manual configuration errors, shortening service provisioning time, and providing stronger O&M features.
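The following Python sketch roughly illustrates the event-driven flow described above. It models how node ADDED/DELETED events could be turned into EBGP neighbor entries on server leaf nodes in Layer 3 routing mode. All class and function names, the one-private-AS-per-node scheme, and the addresses are hypothetical. This is not the Calico or iMaster NCE-Fabric API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class K8sNode:
    """Hypothetical view of the node fields that matter to the underlay."""
    name: str
    ip: str    # address the node uses for its EBGP session
    asn: int   # private AS number assumed for the node (assumption)
    leaf: str  # server leaf the node is attached to

@dataclass(frozen=True)
class LeafPeer:
    """One EBGP neighbor entry to be configured on a server leaf."""
    leaf: str
    peer_ip: str
    peer_asn: int

def on_node_event(event_type: str, node: K8sNode, peers: frozenset) -> frozenset:
    """Return the leaf-side EBGP neighbor set after an ADDED or DELETED node event."""
    peer = LeafPeer(node.leaf, node.ip, node.asn)
    if event_type == "ADDED":
        return peers | {peer}
    if event_type == "DELETED":
        return peers - {peer}
    return peers  # other event types (for example MODIFIED) are ignored in this sketch

def peers_by_leaf(peers: frozenset) -> dict:
    """Group neighbors per leaf, i.e. what would be delivered to each switch."""
    grouped: dict = {}
    for p in sorted(peers, key=lambda p: (p.leaf, p.peer_ip)):
        grouped.setdefault(p.leaf, []).append((p.peer_ip, p.peer_asn))
    return grouped

peers = frozenset()
peers = on_node_event("ADDED", K8sNode("node-1", "10.1.1.11", 65101, "leaf-1"), peers)
peers = on_node_event("ADDED", K8sNode("node-2", "10.1.2.12", 65102, "leaf-2"), peers)
peers = on_node_event("DELETED", K8sNode("node-1", "10.1.1.11", 65101, "leaf-1"), peers)
print(peers_by_leaf(peers))  # {'leaf-2': [('10.1.2.12', 65102)]}
```

Returning a new immutable set on each event keeps the desired state easy to compare with what is already configured. A real controller would also handle MODIFIED events and reconcile against the device.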
Contents

1. Development Trends and Challenges of DCNs

2. Huawei CloudFabric Solution


▫ Solution Features

▫ Overall Architecture

▫ Application Scenarios
◼ Core Components and Key Services

3. Huawei CloudFabric Typical Scenarios - Computing Scenario

Core Component: iMaster NCE-Fabric
⚫ iMaster NCE-Fabric is a core component of the Huawei CloudFabric solution. It implements unified control and dynamic scheduling of network resources and fast deployment of cloud services.
⚫ iMaster NCE-Fabric has the following features:
▫ Automation: Service requirements are translated into logical network models and network configurations, which are automatically delivered to devices in batches, shortening the service rollout period from weeks to minutes.
▫ Reliability: iMaster NCE-Fabric provides highly reliable cluster capabilities. The system processes northbound and southbound services in load balancing mode. In addition, the active and standby clusters can be deployed in different regions to implement remote DR, ensuring high reliability of DC services.
▫ Security: iMaster NCE-Fabric provides security protection at the minimum granularity for DCNs. It implements security isolation through microsegmentation defined in multiple dimensions, such as the IP address, host name, and VM name, minimizing the spread of threats.
[Figure: CloudFabric architecture. The application layer (cloud OS, cloud platform, and container platform) connects over RESTful/RPC to the control and analysis layer (network controller, SecoManager, and HiSec Insight). That layer manages the forwarding layer (fabric spine and leaf nodes, vSwitches, and a VAS pool of NGFW/vNGFW and third-party FWs forming an intelligent and lossless network). The figure shows RESTful, SNMP, NETCONF, OpenFlow, and OVSDB interfaces between the layers.]


• iMaster NCE-Fabric is designed based on open platforms. It connects to cloud platforms through northbound interfaces; to physical switches, vSwitches, and firewalls through southbound interfaces; and to the computing management platform through eastbound and westbound interfaces. These capabilities implement management and control of network resources and collaborative provisioning of compute and storage resources, resulting in an efficient, simple, and open DC. Based on the multi-engine capability of the data base, iMaster NCE-Fabric works with iMaster NCE-FabricInsight to provide L3 autonomous driving capabilities. Combining system-based automated processing with manual assisted processing greatly reduces labor costs and error rates and enables conditional autonomy.

• iMaster NCE-Fabric interface description:

▫ Between the control layer and application layer:

▪ The two layers are interconnected via RESTful or RPC. The control
layer receives service instructions from the application layer and
returns status information to the application layer.

▫ Between the control layer and forwarding layer:

▪ The controller uses SNMP to discover and obtain physical device information, uses NETCONF to deliver configurations to physical devices, and uses OpenFlow to deliver to physical devices the flow tables used for constructing detection packets during O&M.
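To make the NETCONF step more concrete, here is a minimal Python sketch that builds an `<edit-config>` RPC in the structure defined by RFC 6241, using only the standard library. The namespace `urn:example:demo-vlan` and its VLAN elements are hypothetical placeholders. CE switches use their own YANG models, and the SSH transport and capability exchange are omitted. For simplicity the sketch targets the running datastore. A device that works with a candidate datastore would instead edit the candidate and then commit.

```python
import xml.etree.ElementTree as ET

NC_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # NETCONF base namespace (RFC 6241)
DEMO_NS = "urn:example:demo-vlan"                  # hypothetical YANG namespace, illustration only

ET.register_namespace("nc", NC_NS)
ET.register_namespace("demo", DEMO_NS)

def nc(tag: str) -> str:
    """Qualify a tag with the NETCONF base namespace (Clark notation)."""
    return f"{{{NC_NS}}}{tag}"

def build_edit_config(message_id: str, vlan_id: int, name: str) -> str:
    """Build an <edit-config> RPC that writes one VLAN to the running datastore."""
    rpc = ET.Element(nc("rpc"), {"message-id": message_id})
    edit = ET.SubElement(rpc, nc("edit-config"))
    target = ET.SubElement(edit, nc("target"))
    ET.SubElement(target, nc("running"))
    config = ET.SubElement(edit, nc("config"))
    # The payload follows a made-up model; real devices define their own YANG modules.
    vlan = ET.SubElement(config, f"{{{DEMO_NS}}}vlan")
    ET.SubElement(vlan, f"{{{DEMO_NS}}}id").text = str(vlan_id)
    ET.SubElement(vlan, f"{{{DEMO_NS}}}name").text = name
    return ET.tostring(rpc, encoding="unicode")

print(build_edit_config("101", 10, "web"))
```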
Core Component: iMaster NCE-FabricInsight
⚫ iMaster NCE-FabricInsight is the intelligent network analysis platform of CloudFabric. It detects fabric network status and application behavior in real time, breaks down the boundary between networks and applications, and helps customers detect network and application problems in a timely manner from the application perspective, ensuring continuous and stable running of applications.
⚫ Features of iMaster NCE-FabricInsight:
▫ Displays service flows and network-wide KPIs through telemetry in seconds, correlating services, network paths, and network devices and giving intuitive insights into network health.
▫ Trains a knowledge inference engine based on machine learning to analyze the root causes of dozens of faults in minutes, implements edge intelligence based on software and hardware, and performs comprehensive analysis of TCP and UDP flows.
▫ Constructs dynamic baselines, identifies device, queue, and port exceptions, and proactively predicts traffic and optical module faults.
[Figure: iMaster NCE-FabricInsight architecture. The application layer (cloud OS, cloud platform, and container platform) connects over RESTful/RPC to the control and analysis layer (analyzer and network controller). The analyzer collects data from the forwarding layer (fabric spine and leaf nodes and vSwitches in an intelligent and lossless network) through gRPC/ERSPAN and Syslog/SNMP.]


• iMaster NCE-FabricInsight interface description:


▫ Inside the control layer: The control layer uses RESTful or RPC to
synchronize configuration and status information between control units.
▫ Between the control layer and forwarding layer: The analyzer at the
control layer connects to network devices through Google Remote
Procedure Call (gRPC) or ERSPAN to collect and send device data. Syslog or
SNMP is used to collect device status, alarms, and logs.
• The overall architecture of iMaster NCE-FabricInsight consists of three parts:
network device, collector, and analyzer.
▫ Network device: includes Huawei CloudEngine (CE) switches, NetEngine
(NE) routers, and some third-party devices. Devices report performance
metrics such as interface traffic in Telemetry mode based on the gRPC
protocol. Devices are connected to iMaster NCE-FabricInsight as gRPC
clients. Users can run commands to configure the telemetry function on the
devices. The devices then proactively establish a gRPC connection with the
target collector and send data to the collector. The current version supports
the following sampling metrics: CPU and memory usage at the device and
card levels; number of sent and received bytes, number of discarded sent
and received packets, and number of sent and received error packets at the
interface level; number of congested bytes at the queue level; packet loss
behavior data.
▫ Collector: receives data reported by network devices via telemetry,
including performance metric data reported through gRPC. The collector
parses the metrics, combines and compresses the metric data, and reports
the data to the analyzer.
▫ Analyzer: receives performance metrics from switches. In addition, the analyzer establishes dynamic baselines for some performance metrics using AI algorithms, detects exceptions, and displays the analysis results on the GUI.
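To give a feel for what a dynamic baseline does, the Python sketch below uses a deliberately simple statistical stand-in for the analyzer's AI algorithm. It flags a sample when it lies more than `k` standard deviations from the mean of the preceding `window` samples. The window size, `k`, and the sample values are assumptions for illustration. This is not how iMaster NCE-FabricInsight actually computes its baselines.

```python
from statistics import mean, pstdev

def detect_anomalies(samples: list, window: int = 5, k: float = 3.0,
                     min_sigma: float = 1e-6) -> list:
    """Return indexes of samples outside mean +/- k*sigma of the preceding `window` samples."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        # min_sigma avoids a zero-width band when the history is perfectly flat.
        sigma = max(pstdev(history), min_sigma)
        if abs(samples[i] - mu) > k * sigma:
            anomalies.append(i)
    return anomalies

# Hypothetical per-interval interface traffic samples (Mbit/s) with one spike.
traffic = [100, 102, 98, 101, 99, 100, 250, 101]
print(detect_anomalies(traffic))  # [6]
```

One limitation is visible here: after the spike, it stays inside the window and widens the band for the next few samples. A production baseline would typically exclude flagged points and model daily or weekly seasonality.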
Core Component: CloudEngine Series Switches
⚫ Huawei CloudEngine (CE for short) series switches are high-performance cloud switches designed for next-generation DCs. They include the industry's first DC switches designed for the intelligence era (CE16800 series), next-generation high-performance core switches designed for DCs (CE12800 series), and high-performance aggregation/access switches (CE9800/8800/6800/5800 series).
[Figure: CloudEngine switches at the forwarding layer, managed by the analyzer, network controller, and SecoManager at the control and analysis layer. Modular: CloudEngine 16800/12800 series. Fixed: CloudEngine 9800/8800/6800/5800 series.]

Key Service Overview
⚫ Huawei CloudFabric provides mission-critical services at multiple layers of data centers, efficiently building agile data centers.
[Figure: CloudFabric key services in four areas. Data center network: ZTP, device locking and automatic reconciliation, flexible orchestration of multiple services, management of multiple network types, configuration rollback, intent network (intention simulation and service restoration), intelligent lossless network, low-latency network, and multicast service. Data center security: service chain, microsegmentation, and a VAS pool of NGFW/vNGFW and third-party FWs. Data center O&M: management (devices, network, and network health), monitoring, and troubleshooting (fault locating, fault recovery, and fault closure). Multi-DC interconnection: Multi-PoD and Multi-Site between DC 1 and DC 2.]

Initial Network Construction: ZTP

Application scenario
• Zero Touch Provisioning (ZTP) allows newly delivered or unconfigured devices to automatically load version files, deploy the underlay network, and be managed by iMaster NCE-Fabric after they are powered on.
Deployment solution
• Out-of-band deployment: iMaster NCE-Fabric connects to the management interfaces of all devices to be brought online through the out-of-band management switch.
• In-band deployment: The management network and service network share service network ports, and no independent management switch needs to be deployed.
Deployment mode
• Typical configuration mode: A plan file is automatically generated, reducing the workload of filling in the topology template.
• User-defined import mode: Users must complete the topology planning themselves, which allows highly refined planning.
[Figure: typical ZTP out-of-band networking. Devices connect through service cables and management cables to an out-of-band management M-LAG switch, which connects to the DHCP server and SFTP server (a third-party server or built-in iMaster NCE-Fabric).]


• In the traditional deployment mode, administrators need to manually configure


each newly delivered or unconfigured device after hardware installation, which
lowers deployment efficiency and results in high labor costs. This is where the
ZTP-based simplified deployment function of iMaster NCE comes in. The function
enables users to complete network topology planning and fabric resource
planning, automatically bring devices online, execute device configuration scripts,
and deliver underlay network configurations to devices in batches on a visualized
GUI. This reduces labor costs and improves deployment efficiency. ZTP-based
simplified deployment enables rapid rollout and management of DCN devices.
• In the CloudFabric solution, the physical DC network uses the spine-leaf
architecture and supports horizontal on-demand capacity expansion. The roles on
the network include spine nodes, server leaf nodes, border leaf nodes, service leaf
nodes, and DCI gateways. There are often a large number of server leaf nodes,
which require automatic service rollout. Therefore, ZTP mainly focuses on server
leaf nodes.
▫ Server leaf nodes support M-LAG and standalone networking, which are
applicable to different server access scenarios. M-LAG networking is
recommended because high reliability is achieved when servers are dual-
homed to switches in M-LAG mode. In addition, each M-LAG device has its
own control plane, simplifying upgrade and maintenance.
• The CloudFabric solution supports two network architectures: three-layer
networking and two-layer networking, which are both supported by ZTP.
▫ Three-layer networking architecture: Spine nodes, border leaf nodes, and
service leaf nodes are separately deployed, which applies to large network
scenarios.
▫ Two-layer networking architecture: Spine nodes, border leaf nodes, and
service leaf nodes are combined, which applies to small and midsize
network scenarios.
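In typical configuration mode, the plan file is generated automatically from the spine-leaf topology. The Python sketch below shows one small piece of what such a plan could contain: interconnect addressing for a fully meshed spine-leaf underlay. It is only an illustration. The /31 point-to-point links, the address pool, the device names, and the output fields are assumptions and not the actual iMaster NCE-Fabric plan-file format.

```python
import ipaddress
from itertools import product

def plan_underlay(spines: list, leaves: list, pool: str) -> list:
    """Assign one /31 point-to-point subnet to every spine-leaf link of a full mesh."""
    network = ipaddress.IPv4Network(pool)
    links = list(product(spines, leaves))  # full mesh: every spine connects to every leaf
    available = network.num_addresses // 2
    if network.prefixlen > 31 or len(links) > available:
        raise ValueError(f"{pool} cannot hold {len(links)} /31 links")
    plan = []
    for (spine, leaf), subnet in zip(links, network.subnets(new_prefix=31)):
        spine_ip, leaf_ip = list(subnet)  # a /31 holds exactly two addresses (RFC 3021)
        plan.append({"spine": spine, "leaf": leaf,
                     "spine_ip": f"{spine_ip}/31", "leaf_ip": f"{leaf_ip}/31"})
    return plan

plan = plan_underlay(["spine1", "spine2"], ["leaf1", "leaf2", "leaf3"], "10.0.0.0/24")
for row in plan[:2]:
    print(row)
```

Because the link list is derived from the role lists, adding a server leaf node only appends new rows. This matches the horizontal, on-demand expansion that the spine-leaf architecture is meant to support.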

Management of Multiple Network Types


⚫ iMaster NCE-Fabric supports multiple overlay fabric types to meet different user requirements in different
application scenarios, such as network forwarding performance, server access type, and VXLAN tunnel encapsulation
points.

[Figure: network overlay vs. hybrid overlay. In network overlay, the NVEs are on the server leaf nodes below the spine, and the servers and vSwitches below them host VMs. In hybrid overlay, NVEs are deployed both on server leaf nodes and on vSwitches.]

• In a network overlay network, all overlay devices are physical devices, and VXLAN tunnels on the overlay network are encapsulated on physical switches. This networking offers high forwarding performance and reliability, and can connect to various servers, which do not need to support VXLAN tunnel encapsulation. Network overlay is applicable to new data centers that have high requirements on forwarding performance and security, and to scenarios where SDN networks and traditional networks need to communicate with each other.

• On a hybrid overlay network, overlay devices include physical and virtual network devices. Overlay VXLAN tunnel encapsulation can be implemented either on the physical switch or on the virtual switch of the host server. Hybrid overlay can not only use the high-performance forwarding of physical network devices, but also improve performance by reusing existing physical network devices and overlaying physical servers. Therefore, hybrid overlay networking is more flexible and provides customers with more choices. It is applicable to scenarios where network capacity needs to be expanded, hardware costs are a concern, reuse of the existing network is emphasized, VXLAN needs to be decoupled from hardware, or SDN networks and traditional networks need to communicate with each other.
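The two overlay types differ only in where the VXLAN header is added: on a physical switch or on a vSwitch. The Python sketch below builds that header. It follows the 8-byte layout in RFC 7348: an 8-bit flags field with the I flag set, 24 reserved bits, a 24-bit VNI, and 8 more reserved bits. The function names are illustrative. The outer Ethernet, IP, and UDP headers that an NVE also adds are omitted.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned UDP destination port for VXLAN (RFC 7348)
I_FLAG = 0x08          # "VNI valid" flag in the first byte of the header

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags(8) | reserved(24) | VNI(24) | reserved(8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", I_FLAG << 24, vni << 8)

def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Return the UDP payload an NVE sends: VXLAN header + original Ethernet frame."""
    return vxlan_header(vni) + inner_frame

def decapsulate(udp_payload: bytes) -> tuple:
    """Split a VXLAN UDP payload back into (VNI, inner Ethernet frame)."""
    flags_word, vni_word = struct.unpack("!II", udp_payload[:8])
    if not (flags_word >> 24) & I_FLAG:
        raise ValueError("I flag not set; VNI is not valid")
    return vni_word >> 8, udp_payload[8:]

print(vxlan_header(5000).hex())  # 0800000000138800
```

The 24-bit VNI is what allows roughly 16 million isolated segments, compared with 4094 usable VLAN IDs.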

Three-Level Rollback
⚫ Network-wide rollback
▫ Network-wide rollback is used to resolve major faults on the entire network. For example, if network configurations are deleted due to changes, many services are interrupted. In this case, network-wide configurations can be rolled back to those before the changes or interruptions, enabling quick service recovery.
▫ Before changes, you can back up network-wide configurations on iMaster NCE-Fabric. When a problem occurs due to changes, the configurations can be quickly restored to the backup point, resolving major network faults.
▫ You can manually save data in real time or periodically on the GUI. You need to proactively back up data.
⚫ Tenant snapshot
▫ The tenant snapshot function is used to back up and restore network service configurations by tenant, and applies to multi-tenant services. Backup and restoration operations performed by a tenant do not affect the provisioning of other tenants' services, including backup and restoration of network service configurations by other tenants.
▫ The tenant snapshot function allows a tenant to set a backup point and save all its service configurations at the backup point. If needed, service configurations can then be restored to a specific snapshot point. Additionally, iMaster NCE-Fabric can compare the current configurations with the configurations at a snapshot point, or compare the configurations from two given snapshot points, and perform configuration rollback to eliminate differences.
▫ The tenant snapshot function supports manual backup and restoration as well as automatic and periodic backup.
⚫ Service-level rollback
▫ Service-level rollback helps quickly restore original network configurations to recover services when a network exception occurs due to a fine-grained single-point service provisioning failure.
▫ You do not need to manually back up data for service-level rollback, but you need to manually restore data.
▫ iMaster NCE-Fabric automatically backs up each service that is provisioned. When an exception occurs, iMaster NCE-Fabric can quickly restore the service to the status before the service was provisioned.

• iMaster NCE-Fabric provides three-level rollback, meeting the reliability


requirements of different scenarios and ensuring quick service recovery. This
feature covers 70% to 80% of routine change scenarios. For example, the fast
rollback feature is available for single-point service provisioning exceptions and
independent tenant services.
▫ Network-wide rollback features:
▪ iMaster NCE-Fabric saves the snapshots of the entire network,
including those of iMaster NCE-Fabric and its managed devices.
▪ You can manually save the snapshots in real time or periodically.
▪ During restoration, iMaster NCE-Fabric delivers commands to devices
to restore data. The devices restore specific configurations based on
specified snapshot point labels and do not need to be restarted.
▫ Tenant snapshot features:
▪ You can manually save the snapshots in real time or periodically.
▪ iMaster NCE-Fabric divides different tenant spaces for tenant backup
so that operations between tenants do not affect each other.
▪ Differences between rollback points can be previewed for further
examinations.
▫ Service-level rollback features:
▪ Service operations are automatically saved.
▪ Snapshots are automatically stored in mirroring mode.
▪ The linkage technology enables rollback of multiple operations to the
previous state.
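The tenant snapshot behavior described above can be sketched in Python as an in-memory store. It covers setting a backup point per tenant, comparing the current configuration with a snapshot, and restoring one tenant without touching others. The class name, the flat dictionary configuration model, and the sample keys are hypothetical. This is an illustration of the idea, not iMaster NCE-Fabric's implementation.

```python
import copy
from dataclasses import dataclass, field

@dataclass
class TenantSnapshotStore:
    """Hypothetical per-tenant snapshot store; each tenant config is a flat dict."""
    live: dict = field(default_factory=dict)       # tenant -> current config
    snapshots: dict = field(default_factory=dict)  # tenant -> {label: saved config}

    def save(self, tenant: str, label: str) -> None:
        """Record a backup point for one tenant only."""
        self.snapshots.setdefault(tenant, {})[label] = copy.deepcopy(self.live.get(tenant, {}))

    def diff(self, tenant: str, label: str) -> dict:
        """Map each changed key to (value at snapshot, current value); None means absent."""
        old = self.snapshots[tenant][label]
        new = self.live.get(tenant, {})
        return {key: (old.get(key), new.get(key))
                for key in sorted(old.keys() | new.keys())
                if old.get(key) != new.get(key)}

    def restore(self, tenant: str, label: str) -> None:
        """Roll one tenant back to a snapshot; other tenants' configs are untouched."""
        self.live[tenant] = copy.deepcopy(self.snapshots[tenant][label])

store = TenantSnapshotStore()
store.live["tenantA"] = {"vrf": "vpc1", "vni": "5000"}
store.live["tenantB"] = {"vrf": "vpc2"}
store.save("tenantA", "before-change")
store.live["tenantA"]["vni"] = "6000"      # a change that turns out to be wrong
store.live["tenantA"]["acl"] = "deny-any"
store.live["tenantB"]["vni"] = "7000"      # an unrelated change by another tenant
print(store.diff("tenantA", "before-change"))  # {'acl': (None, 'deny-any'), 'vni': ('5000', '6000')}
store.restore("tenantA", "before-change")
```

Deep-copying on both save and restore keeps a snapshot from being changed by later edits to the live configuration.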

Microsegmentation

⚫ Microsegmentation allocates servers to different EPGs and defines GBPs between EPGs to implement traffic control between servers.
⚫ Microsegmentation can be implemented either on CE switches or on iMaster NCE-Fabric. iMaster NCE-Fabric configures EPGs and GBPs and delivers the configurations to CE switches through NETCONF interfaces.
[Figure: a spine and two fabric leaf nodes (Leaf1 and Leaf2) connecting VM1 (EPG1), VM2 (EPG2), VM3 (EPG1), and VM4 (EPG3)]

• Endpoint group (EPG): Endpoints (servers) are grouped based on the IP address, IP network segment, MAC address, VM name, container, and operating system. An EPG can contain multiple servers.

• Group-based policy (GBP): a policy for traffic control within an EPG and between EPGs. A GBP can be configured based on EPGs, protocol numbers, and port numbers, and specifies the policies within an EPG, between EPGs, and between a known EPG and an unknown EPG.
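The Python sketch below shows how EPGs and GBPs fit together. It groups endpoints into EPGs by IP network segment, which is one of the criteria listed above. It then matches GBPs on source EPG, destination EPG, protocol, and port. The EPG segments, the policies, the first-match order, and the default-deny action for unmatched traffic are all assumptions. The source does not specify the default action or how matching is implemented on CE switches.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

# Hypothetical EPG membership by IP network segment.
EPGS = {
    "EPG1": ["10.1.1.0/24"],
    "EPG2": ["10.1.2.0/24"],
    "EPG3": ["10.1.3.0/24"],
}

@dataclass(frozen=True)
class Gbp:
    src_epg: str
    dst_epg: str
    protocol: str        # "tcp" or "udp"
    port: Optional[int]  # destination port; None matches any port
    action: str          # "permit" or "deny"

def classify(ip: str) -> str:
    """Return the EPG an endpoint belongs to, or "unknown" if no segment matches."""
    addr = ipaddress.ip_address(ip)
    for epg, segments in EPGS.items():
        if any(addr in ipaddress.ip_network(seg) for seg in segments):
            return epg
    return "unknown"

def evaluate(src_ip: str, dst_ip: str, protocol: str, port: int,
             policies: list, default: str = "deny") -> str:
    """First matching GBP wins; traffic matching no GBP gets the (assumed) default action."""
    src, dst = classify(src_ip), classify(dst_ip)
    for p in policies:
        if (p.src_epg, p.dst_epg, p.protocol) == (src, dst, protocol) and p.port in (None, port):
            return p.action
    return default

POLICIES = [
    Gbp("EPG1", "EPG2", "tcp", 443, "permit"),   # between EPGs: only HTTPS allowed
    Gbp("EPG1", "EPG1", "tcp", None, "permit"),  # within EPG1: any TCP port
]
print(evaluate("10.1.1.5", "10.1.2.7", "tcp", 443, POLICIES))  # permit
print(evaluate("10.1.1.5", "10.1.2.7", "tcp", 22, POLICIES))   # deny
```

Writing policies against group names rather than IP addresses means a VM that moves or changes address keeps its policy as long as it still classifies into the same EPG.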

Service Chain

⚫ Service function chaining (SFC) is a technology that logically connects services on network devices to provide an ordered service set for the application layer. SFC adds service function path (SFP) information to original packets so that the packets pass through service functions (SFs) along the specified path.
⚫ SFC can be implemented in Policy-Based Routing (PBR) mode or Network Service Header (NSH) mode. When creating a fabric network on the controller, you must specify the PBR or NSH mode. Create logical switches, logical routers, or external gateways on the controller to divide EPGs and define SFCs between EPGs.
[Figure: two service function chains (SFC 1 and SFC 2) steering traffic between VM1, VM2, and the external network through LB, FW, and IPS service functions]

iMaster NCE-Fabric O&M


[Figure: iMaster NCE-Fabric O&M panorama. Physical network O&M: controller installation and deployment, ZTP-based switch installation, physical topology management, physical resource pool (fabric), underlay network connectivity detection, loop fault diagnosis (locating), and management of physical switches, virtual switches, physical and software firewalls, and servers. Logical network O&M: logical resource pool (resource visualization), logical network topology, logical switch, logical router, and logical SF O&M, end port O&M, consistency check and restoration, and VMM and cloud platform (Neutron) interconnection O&M. Application network O&M: visualized mapping of application, logical, and physical networks, EPG application network mapping (web/app/DB), application path (connectivity and path), service provisioning audit, and events, logs, and statistics.]

• iMaster NCE-Fabric centrally manages and controls cloud DCNs and provides
automatic mapping from applications to physical networks, resource pool
deployment, and visualized O&M, helping customers build service-centric
dynamic network service scheduling capabilities.

• In addition to network planning and deployment, iMaster NCE-Fabric provides DCN service O&M, including topology visualization, loop detection, path detection, traffic statistics collection, three-level rollback, and data consistency verification.

Multi-DC Service
⚫ With the development of services, more and more applications are deployed in data centers. The resources of a single data center cannot meet the increasing service requirements. Therefore, multiple data centers are required to deploy services.
• Multi-PoD solution (DCs a close distance apart): The computing and network resources of multiple DCs are unified and managed by one cloud platform and one set of iMaster NCE-Fabric.
• Multi-site solution (MDC, DCs a long distance apart): The computing and network resources of each DC are independent resource pools, managed by the cloud platform and iMaster NCE-Fabric in their respective DCs.
[Figure: Multi-PoD solution (DC 1 and DC 2 at a close distance) and multi-site solution (DC 1 and DC 2 at a long distance)]
Contents

1. Development Trends and Challenges of DCNs

2. Huawei CloudFabric Solution

3. Huawei CloudFabric Typical Scenarios - Computing Scenario


• This course describes only the computing scenario in detail. For details about
other scenarios, see the related sections of HCIE-DCN.

Introduction to Computing

Industries: government, transportation, enterprise, education, healthcare, Internet, and manufacturing
Challenges for enterprise IT
• Challenge 1: Service types are becoming more refined, and an increasing number of devices are deployed, resulting in increasingly high configuration and management costs.
• Challenge 2: IT resources always seem to be insufficient while the resource utilization is low. The resource utilization is unbalanced, and resources cannot be flexibly scheduled.
Cloud computing provides various advantages such as resource pooling, elastic scaling, and on-demand self-service provisioning, helping enterprises cope with the preceding challenges. However, some enterprises cannot fully move their services to the cloud at once.
• Technical factors:
▫ Service systems are complex and have different requirements on the running environment.
▫ The application scale of each service system can be estimated within a certain period of time and will not change greatly.
• Non-technical factors:
▫ Generally, enterprises have an IT department and a network department but no cloud platform department. The IT department and network department are responsible for computing and network requirements, respectively. These two departments cannot be integrated in a short period of time.
Some enterprises that cannot adopt cloud computing at once start with automation reconstruction on networks. That is, they associate network resources with compute resources, and then gradually transform their networks toward the scenario where a unified cloud platform will be deployed, which is the Cloud-Network Integration scenario.

• Currently, enterprises face the following challenges in their IT systems:

▫ Service types are becoming more refined, and an increasing number of


devices are deployed, resulting in increasingly high configuration and
management costs.

▫ IT resources always seem to be insufficient while the resource utilization is


low. The resource utilization is unbalanced, and resources cannot be flexibly
scheduled.

• Cloud computing provides various advantages such as resource pooling, elastic


scaling, and on-demand self-service provisioning, helping enterprises cope with
the preceding challenges.

CloudFabric Solution Overview in the Computing Solution


• In this scenario, no cloud platform is involved. The network administrator configures network services through the controller, and the computing administrator configures compute resources. The network administrator and computing administrator negotiate services through the enterprise's internal service process.
• The controller can interconnect with the VMM through APIs to implement service automation. The controller delivers network configurations to the computing platform through APIs. The computing platform notifies the controller when VMs go online or offline, and the controller delivers the configuration to the corresponding API to complete E2E service configuration.
• The computing solution implements automatic network configuration to the maximum extent, reducing the configuration workload of the network administrator.
[Figure: the computing administrator and network administrator negotiate services; the VMM interconnects with the controller through APIs; a spine-leaf VXLAN fabric with a firewall and LB connects the network and computing parts]

CloudFabric Solution Architecture in the Computing Solution

Service presentation layer
• In the computing scenario, no cloud platform is deployed. iMaster NCE provides an independent UI, which interconnects with iMaster NCE through the RESTful API.
Network analysis/control layer
• iMaster NCE automatically configures network devices and interconnects with the VMM to implement virtual network provisioning and virtualization awareness.
• SecoManager interconnects with iMaster NCE to implement VAS orchestration, policy management, and VAS modeling, instantiation, and configuration delivery.
• FabricInsight detects network anomalies based on real service traffic.
Network service layer
• The spine-leaf architecture is used.
Computing access layer
• Virtualization server: iMaster NCE interconnects with the VMM to automatically deliver and modify configurations on the network side when VMs go online or offline.
• Physical server: Some servers do not support virtualization. Administrators need to orchestrate and deliver network-side configurations through iMaster NCE-Fabric.
[Figure: the NCE orchestration UI, iMaster NCE, and SecoManager above a spine-leaf VXLAN fabric with a firewall and LB; a VMM manages the virtualization servers in the computing access layer]
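For virtualization servers, the controller reacts to VM online and offline notifications from the VMM. The Python sketch below shows one plausible way to turn those notifications into leaf-side actions. It reference-counts the VMs behind each (leaf, port, VLAN, VNI) binding. A binding is created for the first VM and removed after the last VM leaves, so no stale configuration remains. The event fields, the host-to-port cabling table, the port names, and the action strings are hypothetical. This is not the iMaster NCE or VMM interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VmEvent:
    kind: str  # "online" or "offline", as reported by the VMM
    vm: str
    host: str  # compute host running the VM
    vlan: int  # VLAN the vSwitch uses for the VM's port group
    vni: int   # VXLAN segment of the VM's subnet

# Hypothetical cabling table: compute host -> (server leaf, access port).
HOST_TO_LEAF_PORT = {
    "esxi-01": ("leaf-1", "10GE1/0/1"),
    "esxi-02": ("leaf-2", "10GE1/0/1"),
}

def handle(event: VmEvent, bindings: dict) -> list:
    """Update per-port VM reference counts and return the network-side actions to deliver."""
    leaf, port = HOST_TO_LEAF_PORT[event.host]
    key = (leaf, port, event.vlan, event.vni)
    actions = []
    if event.kind == "online":
        vms = bindings.setdefault(key, set())
        if not vms:  # first VM of this segment on this port: create the access binding
            actions.append(f"{leaf}: bind {port} VLAN {event.vlan} to VNI {event.vni}")
        vms.add(event.vm)
    elif event.kind == "offline" and key in bindings:
        vms = bindings[key]
        vms.discard(event.vm)
        if not vms:  # last VM gone: remove the binding to avoid stale configuration
            del bindings[key]
            actions.append(f"{leaf}: unbind {port} VLAN {event.vlan} from VNI {event.vni}")
    return actions

bindings: dict = {}
print(handle(VmEvent("online", "vm1", "esxi-01", 100, 5000), bindings))
print(handle(VmEvent("online", "vm2", "esxi-01", 100, 5000), bindings))  # []
```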


Networking Solution

• The spine-leaf architecture is used. The border leaf node and service leaf node are co-deployed, and the spine node is deployed separately and connects to VAS devices and external networks. The spine and leaf nodes are fully meshed to implement ECMP load balancing.
• The distributed network overlay solution is recommended. iMaster NCE centrally manages the gateways and automatically delivers service configurations.
• OSPF or BGP is used as the routing protocol on the forwarding plane of the underlay network, and BGP EVPN is used as the control plane protocol. A BGP EVPN peer relationship is established between VTEPs.
[Figure: networking with co-deployed border/service leaf nodes, spine nodes, and server leaf nodes]
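Because every leaf connects to every spine, a leaf has several equal-cost next hops toward any remote VTEP. ECMP spreads traffic by hashing each flow to one of them, so all packets of a flow follow the same path and are not reordered. The Python sketch below illustrates the idea with SHA-256 over the 5-tuple. Switches use their own hardware hash functions and configurable hash fields, so the hash choice, names, and addresses here are assumptions.

```python
import hashlib

def pick_spine(flow: tuple, spines: list) -> str:
    """Map a 5-tuple (src IP, dst IP, protocol, src port, dst port) to one equal-cost spine."""
    key = "|".join(str(part) for part in flow).encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return spines[bucket % len(spines)]

SPINES = ["spine-1", "spine-2", "spine-3", "spine-4"]
flow = ("10.1.1.10", "10.1.2.20", 6, 49152, 443)
print(pick_spine(flow, SPINES))
```

Per-flow hashing trades perfect balance for in-order delivery. A few large flows can still land on the same spine, which is one reason DCNs also monitor link utilization.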


Service Model: Tenant Service Model (1)


⚫ Understanding the terminology, background, and implementation principles associated with service provisioning helps you quickly master tenant network interconnection skills in the computing scenario.
• Tenant: the minimum unit for enterprise service management.
• Virtual Private Cloud (VPC): provides secure and reliable information processing, storage, and transmission services to tenants through network virtualization and encryption technologies based on network, storage, and compute resources. Multiple VPCs can be created for a tenant based on service requirements.
• Logical router: a router virtualized on a network device running virtualization software. It connects to VMs on different networks so that the VMs can communicate with each other at Layer 3. One network device can be virtualized into multiple logical routers for different tenants.
• Logical switch: connects to different VMs so that the VMs can communicate with each other at Layer 2. One network device can be virtualized into multiple logical switches for different tenants.
[Figure: tenant service model. A tenant contains VPCs; a VPC contains a logical router, which connects to the external network through an optional firewall and to logical switches; logical switches connect to logical ports.]


• One network device can be virtualized into multiple logical routers for different tenants, so multiple tenants can share one network device. For each tenant, a logical router functions as an independent and real router with independent hardware and software resources and running space. Services on different logical routers do not affect each other. From the user's perspective, a logical router is no different from a real router.

• One network device can be virtualized into multiple logical switches for different tenants, so multiple tenants can share one network device. For each tenant, a logical switch functions as an independent and real switch with independent software and hardware resources and running space. Services on different logical switches do not affect each other. From the user's perspective, a logical switch is no different from a real switch.

Service Model: Tenant Service Model (2)

• Logical port: functions as an access point for VMs to access the network. One physical port on a network device can be virtualized into multiple logical ports for different tenants. For each tenant, a logical port functions as an independent and real port.
• External network: networks outside the tenant's management, such as the Internet or other tenant networks connected through VPNs.
• Firewall: The firewall function is provided by a physical firewall or virtual firewall.
• VM: virtual machine.
[Figure: tenant service model: tenant, VPC, logical router, optional firewall to the external network, logical switch, and logical ports]


• Located at the border of a network, a firewall implements secure access control


between the external network and internal network, which enhances the network
protection capability. It protects service data flows between the Untrust and Trust
zones based on 5-tuple information. It can also be used for access control
between subnets. You can choose whether to deploy firewalls based on whether
the tenant needs to access an external network. For security purposes, deploy a
firewall when a tenant is connected to an external network.

• In the computing scenario, VMs are provisioned by the VMM connected to


iMaster NCE-Fabric. The VMM manages compute resources, and iMaster NCE-
Fabric manages network resources.

Example of a Tenant Service Model

Service model   Service example
Tenant          Department 1 | Department 2 | Department 3
VPC             DMZ | Service system 1 | Service system 2
EPG             Web layer | Application layer | Database layer
Subnet          Subnet 1-web | Subnet 2-application | Subnet 3-database
VM              VM1 | VM2 | VM3

• Tenant: A tenant can apply for independent compute, storage, and network resources, and can be regarded as a service system or department.
• VPC: Each VPC is a security domain and can be regarded as a collection of services that share the same security policy. A VPC is mapped to a VRF.
• EPG: An EPG is a set of service ports. Service ports in an EPG have the same security policy. An EPG can have one or more subnets. Security policies can be easily configured using EPGs.
• Subnet: A subnet indicates a network segment. A VPC can have one or more subnets.
• VM: A VM is connected to only one subnet, and one subnet can have multiple VMs.
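The containment rules above (a tenant holds VPCs, each VPC maps to one VRF, an EPG groups one or more subnets, and a VM connects to exactly one subnet) can be modeled as plain data classes. This is a hypothetical Python sketch; the class and field names are illustrative and do not reflect the iMaster NCE-Fabric API.

```python
from dataclasses import dataclass, field

@dataclass
class Subnet:
    name: str
    cidr: str                                        # network segment, e.g. "10.1.1.0/24"
    vms: set[str] = field(default_factory=set)

@dataclass
class EPG:
    name: str
    subnets: list[Subnet] = field(default_factory=list)   # an EPG can span several subnets

@dataclass
class VPC:
    name: str
    vrf: str                                         # each VPC maps to exactly one VRF
    subnets: dict[str, Subnet] = field(default_factory=dict)
    epgs: dict[str, EPG] = field(default_factory=dict)

@dataclass
class Tenant:
    name: str
    vpcs: dict[str, VPC] = field(default_factory=dict)

    def attach_vm(self, vm: str, vpc: str, subnet: str) -> None:
        """Connect a VM to one subnet, refusing a second attachment anywhere in the tenant."""
        for v in self.vpcs.values():
            for s in v.subnets.values():
                if vm in s.vms:
                    raise ValueError(f"{vm} is already connected to {s.name}")
        self.vpcs[vpc].subnets[subnet].vms.add(vm)
```

For example, after building Department 1 with a VPC "Service system 1" (VRF "vrf-ss1") and a web subnet, calling `attach_vm("VM1", "Service system 1", "subnet1-web")` succeeds, and a second call for VM1 raises `ValueError`, enforcing the one-subnet-per-VM rule.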


Relationship Between a Physical Network and a Logical Network

[Figure: a logical network (external network; L4-L7 services such as NAT, firewall, IPsec, and WAF; L3 logical router/VRF; L2 logical switch/bridge domain; L1 logical ports/sub-interfaces; end ports on VMs and physical servers) mapped onto the physical spine-leaf network (spine, leaf, firewall, VMs, physical servers) and an abstract network model (Internet, IPsec VPN, firewall, WAF, NAT, Layer 3 gateway, Layer 2 gateway, fabric).]


• The physical network uses the spine-leaf architecture. VMs, physical machines, and firewalls access the network through switches at the leaf layer. VMs and physical machines function as compute nodes. Firewalls function as network nodes and provide NAT, IPsec VPN, WAF, and firewall (packet filtering) services through service function chaining (SFC).

• Ordinary (non-VXLAN) packets from VMs, physical machines, and firewalls are encapsulated into VXLAN packets and transmitted on the fabric built from leaf and spine switches. VXLAN Layer 2 gateways perform this encapsulation at the access layer. They provide data transmission within a subnet and are also called internal gateways.

• Communication between the intranet and the Internet, between the intranet and an external private network, and between subnets within the intranet requires VXLAN Layer 3 gateways for route lookup and data forwarding. VXLAN Layer 3 gateways connect to the Internet through PE routers (not shown in the figure). They convert VXLAN packets sent from the intranet to the Internet or an external private network into ordinary packets and forward them. VXLAN Layer 3 gateways are also called external gateways. By default, VMs and physical machines that need to communicate with other subnets or external networks must use the IP address of a VXLAN Layer 3 gateway as their gateway address.
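The encapsulation performed by a VXLAN Layer 2 gateway can be illustrated by building the 8-byte VXLAN header defined in RFC 7348: an 8-bit flags field with the I bit set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. This Python sketch covers only the VXLAN header; a real VTEP also adds the outer Ethernet, IP (VTEP addresses), and UDP (destination port 4789) headers, which are omitted here.

```python
import struct

FLAG_VNI_VALID = 0x08      # "I" flag: the VNI field is valid (RFC 7348)

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header.

    Layout: flags (8 bits) | reserved (24 bits) | VNI (24 bits) | reserved (8 bits)
    """
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: I flag in the top byte, reserved bits zero.
    # Word 2: VNI in the top 24 bits, reserved low byte zero.
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)

def parse_vni(header: bytes) -> int:
    """Extract the VNI from an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & FLAG_VNI_VALID:
        raise ValueError("I flag not set")
    return vni_word >> 8

def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Return the UDP payload a VTEP would send: VXLAN header + original Ethernet frame."""
    return vxlan_header(vni) + inner_frame
```

For VNI 5000 (0x1388), the header bytes are 08 00 00 00 00 13 88 00, and `parse_vni` recovers 5000 from them.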

Service Provisioning Process (1)

[Figure: the network administrator orchestrates and delivers VXLAN network services; the computing administrator configures, creates, and manages VM resources; the two collaborate. Step 0: the topology between TOR switches and hosts is obtained through LLDP.]

The network administrator and computing administrator need to cooperate to deliver services in the computing scenario:
• The network administrator orchestrates and delivers network services on iMaster NCE.
• The computing administrator configures, creates, and manages VM resources on the VMM (vCenter in this example).

The topology, namely the connections between ports on TOR switches and physical servers, has already been discovered through LLDP before service provisioning in the computing scenario.


Service Provisioning Process (2)

[Figure: service provisioning workflow between the network administrator and the computing administrator, steps 0 to 9: (0) obtain the topology between TOR switches and hosts through LLDP; (1) add a host and create a VDS; (2) create a tenant and a VPC; (3) create a network; (4) push network configurations; (5) create a VM; (6) select a host and create the VM; (7) notify VM online information; (8) configure the VXLAN VLAN and VNI of an interface; (9) configure a gateway and security policies.]

1. The computing administrator enables the VMM to manage servers and creates a virtual switch (VDS).
2. The network administrator creates a tenant and a VPC on iMaster NCE.
3. The network administrator orchestrates the logical network in the VPC, including vRouters and subnets.
4. iMaster NCE synchronizes information about the VLANs corresponding to the subnets to the VMM. The VMM then creates a port group for each subnet.
5. The computing administrator creates a VM and configures VM parameters, including the port group to which the VM belongs.


Service Provisioning Process (3)

[Figure: service provisioning workflow, steps 0 to 9, as in Service Provisioning Process (2).]

6. When receiving the instruction from the computing administrator, the VMM selects a host that it manages, creates a VM, and allocates resources to it.
7. After confirming that the VM is online, the VMM notifies iMaster NCE.
8. iMaster NCE obtains information about the VM's host from the VMM, determines the TOR switch port connected to the host based on LLDP information, and then delivers the VLAN-to-VNI mapping to that port.
9. iMaster NCE delivers the VM gateway and security policy configurations based on the configurations made by the network administrator in the VPC. The VM can then access the network.
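Step 8 is essentially a lookup across two tables the controller already holds: the host-to-TOR-port topology learned through LLDP, and the port-group-to-VLAN/VNI mapping created in steps 3 and 4. The Python sketch below models that lookup. All names (`ControllerSketch`, `SwitchPort`, the interface names) are hypothetical, a single uplink per host is assumed (no M-LAG dual-homing), and "delivering" a binding is reduced to recording it in a set.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwitchPort:
    switch: str      # TOR switch name (hypothetical)
    interface: str   # TOR interface facing the host (hypothetical)

@dataclass(frozen=True)
class PortBinding:
    port: SwitchPort
    vlan: int
    vni: int

class ControllerSketch:
    """Hypothetical model of step 8: turn a VM-online event into a TOR port binding."""

    def __init__(self, lldp: dict[str, SwitchPort],
                 port_groups: dict[str, tuple[int, int]]) -> None:
        self.lldp = lldp                # host -> TOR port, learned through LLDP (step 0)
        self.port_groups = port_groups  # port group -> (local VLAN, VNI), from steps 3 and 4
        self.delivered: set[PortBinding] = set()

    def on_vm_online(self, host: str, port_group: str) -> PortBinding:
        port = self.lldp[host]                    # which TOR port serves this host
        vlan, vni = self.port_groups[port_group]  # which VLAN/VNI the VM's port group uses
        binding = PortBinding(port, vlan, vni)
        self.delivered.add(binding)               # stands in for pushing config to the switch
        return binding
```

For example, with `host-01` learned on `tor-1` port `10GE1/0/1` and port group `pg-web` mapped to VLAN 10 and VNI 5010, a VM coming online on `host-01` in `pg-web` yields a binding of VLAN 10 to VNI 5010 on that port.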


Network Resource Provisioning Process

The computing administrator is not aware of the network resources that the network administrator configures and provisions.

1. The network administrator edits the logical networks required by services on the tenant network orchestration page of iMaster NCE.
2. iMaster NCE computes and saves the mapping between VLANs and VNIs based on the VNI range allocated by the administrator. These configurations and mappings are stored on iMaster NCE and are not yet delivered to switches, because no VMs have gone online.
3. iMaster NCE connects to the VMM through the WebService interface and transfers this information. The VMM creates the port group required by the local network on the virtual switch and binds the local VLAN ID to the port group.

[Figure: the network administrator works on iMaster NCE (steps 1 and 2), which passes port group configuration to the VMM (step 3); the host managed by the computing administrator runs a hypervisor with a vSwitch (uplink and internal ports), a host agent, port groups, and VMs.]
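Step 2 can be pictured as a simple allocator that pairs each logical subnet with a free local VLAN and a VNI from the administrator-assigned range, and keeps the result on the controller until a VM goes online. This Python sketch assumes one VLAN pool, one VNI pool, and first-free allocation; the actual allocation policy of iMaster NCE is not described here, and the class name is hypothetical.

```python
from collections import deque

class VlanVniAllocator:
    """Hypothetical sketch of step 2: pair each logical subnet with a local VLAN and a VNI."""

    def __init__(self, vlan_pool: range, vni_pool: range) -> None:
        self._vlans = deque(vlan_pool)    # free local VLAN IDs, lowest first
        self._vnis = deque(vni_pool)      # free VNIs from the administrator's range
        self.mapping: dict[str, tuple[int, int]] = {}   # subnet -> (local VLAN, VNI)

    def allocate(self, subnet: str) -> tuple[int, int]:
        if subnet in self.mapping:               # re-editing a subnet keeps its mapping
            return self.mapping[subnet]
        if not self._vlans or not self._vnis:    # check both pools before taking anything
            raise RuntimeError("VLAN or VNI pool exhausted")
        pair = (self._vlans.popleft(), self._vnis.popleft())
        self.mapping[subnet] = pair              # saved on the controller only; nothing pushed yet
        return pair
```

`VlanVniAllocator(range(10, 12), range(5000, 5100))` hands out (10, 5000) and then (11, 5001), returns the same pair when a subnet is allocated again, and raises `RuntimeError` once both VLANs are used. Because `deque` materializes the whole range, a very large VNI pool would call for a lazier structure.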


VM Online Process

1. The computing administrator configures the compute resource quota on the VMM based on service requirements.
2. The computing administrator creates a VM.
3. The computing administrator checks, on the VMM, the port group information synchronized from the network side and manually binds the VM's NIC to the corresponding port group.
4. The VMM synchronizes the port group configuration to the corresponding host and binds the VM to the port group.
5. iMaster NCE detects the VM going online and the port group binding through the WebService interface, and obtains the location where the VM goes online.
6. iMaster NCE delivers network configurations to a switch.

After the VM goes online, iMaster NCE automatically delivers the VM's Layer 2 access configurations and Layer 3 gateway configurations. The VM can then access the network.

[Figure: VM online workflow (steps 1 to 6) between the computing administrator, the VMM, iMaster NCE, the network administrator, and a host running a hypervisor with a vSwitch (uplink and internal ports), a host agent, port groups, and VMs.]


• The computing administrator configures the compute resource quota (vCPU, RAM, and operating system) on the VMM based on service requirements.

• The computing administrator creates a VM. The VMM dynamically allocates compute resources based on the existing configurations, selects a host, and loads the VM based on the configuration in step 1.

• The computing administrator checks, on the VMM, the port group information synchronized from the network side and manually binds the VM's NIC to the corresponding port group.

• The VMM synchronizes the port group configuration to the corresponding host and binds the VM to the port group.

• iMaster NCE detects the VM going online and the port group binding through the WebService interface, and obtains the location where the VM goes online (including the VM ID and the ID of the host where the VM resides).

VM Offline Process
The VM offline process is also automatic and is transparent to the network administrator.

1. The computing administrator brings a VM offline through the VMM.
2. The VMM queries the database, finds the host to which the specified VM belongs, brings the VM offline, removes the binding between the VM and the port group, and reclaims compute resources.
3. iMaster NCE detects the VM offline event and the unbinding between the VM and the port group through the WebService interface, and obtains the location where the VM goes offline.
4. iMaster NCE obtains the connection between the host and the TOR switch port through LLDP, queries the database using the port group as the index to obtain the mapping between the local VLAN and the VNI, and checks whether any VM still uses the local VLAN on the same port. If no VM uses the local VLAN, iMaster NCE removes the mapping between the local VLAN and the VNI through NETCONF.
5. The VMM checks whether any other VM on the host is bound to the current port group. If no VM is bound to the port group, the VMM reclaims the port group configuration.

[Figure: VM offline workflow (steps 1 to 5) between the computing administrator, the VMM, iMaster NCE, the network administrator, and a host running a hypervisor with a vSwitch (uplink and internal ports), a host agent, port groups, and VMs.]
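The check in step 4 (remove the VLAN-to-VNI mapping only when no VM on that TOR port still uses the local VLAN) amounts to a reference count. The following Python sketch keeps one counter per (TOR port, local VLAN) pair; the class name and key format are hypothetical, and the NETCONF deletion is represented only by the `True` return value.

```python
from collections import Counter

class PortVlanRefs:
    """Hypothetical sketch of step 4: count VMs per (TOR port, local VLAN)."""

    def __init__(self) -> None:
        self.refs: Counter = Counter()                 # (TOR port, VLAN) -> number of VMs
        self.vni_of: dict[tuple[str, int], int] = {}   # (TOR port, VLAN) -> VNI in use

    def vm_online(self, tor_port: str, vlan: int, vni: int) -> None:
        key = (tor_port, vlan)
        self.refs[key] += 1
        self.vni_of[key] = vni     # mapping delivered (or already present) on the port

    def vm_offline(self, tor_port: str, vlan: int) -> bool:
        """Return True if the VLAN-to-VNI mapping should now be removed from the port."""
        key = (tor_port, vlan)
        if self.refs[key] == 0:    # Counter returns 0 for unknown keys without inserting
            raise KeyError(f"no VM recorded on {key}")
        self.refs[key] -= 1
        if self.refs[key] == 0:    # last VM using this VLAN on this port has left
            del self.refs[key]
            del self.vni_of[key]   # stands in for the NETCONF delete
            return True
        return False
```

With two VMs online on the same port and VLAN, the first `vm_offline` returns `False` (the mapping stays) and the second returns `True`.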


Automatic VM Migration Process

If a host or VM is faulty, the system restarts all VMs of the faulty host on other hosts.

1. The VMM detects a fault on the host.
2. The VMM schedules resources and restarts all VMs of the faulty host on other hosts.
3. iMaster NCE subscribes to VMM events, detects the VM migration, and obtains the locations of the hosts before and after the migration.
4. iMaster NCE finds the TOR switches and corresponding ports before and after the migration through LLDP. It deletes the VLAN-to-VNI mappings on the original TOR switches through NETCONF and delivers the VLAN-to-VNI mappings to the new TOR switches.

[Figure: automatic VM migration (steps 1 to 4) between two hosts, each running a hypervisor with a vSwitch (uplink and internal ports), a host agent, and VMs.]


• Both the computing administrator and network administrator are unaware of the
automatic migration process. The compute resources are automatically migrated
and network configurations are automatically adjusted based on the
collaboration between the VMM and iMaster NCE.

Manual VM Migration Process

The computing administrator can manually migrate VMs on the VMM GUI.

1. The computing administrator triggers VM migration through the VMM.
2. The VMM finds the host to which the VM belongs, performs rescheduling in the VMM cluster, selects a new target host, and migrates the VM.
3. iMaster NCE subscribes to VMM events, detects the VM migration, and obtains the locations of the host before and after the migration.
4. iMaster NCE finds the TOR switches and corresponding ports before and after the migration through LLDP. It deletes the VLAN-to-VNI mappings on the original TOR switches through NETCONF (only if no VM in the same port group remains on the original host) and delivers the VLAN-to-VNI mappings to the new TOR switches.

[Figure: manual VM migration (steps 1 to 4) between two hosts, each running a hypervisor with a vSwitch (uplink and internal ports), a host agent, and VMs.]
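Step 4 combines the offline and online checks: release the mapping on the source TOR port only if no VM of the same port group remains there, and deliver it on the target port only if it is not already present. The following is a hypothetical Python sketch, with reference counts keyed by (TOR port, port group) and mappings returned as (port, VLAN, VNI) tuples in place of NETCONF operations.

```python
from collections import Counter

def plan_migration(
    refs: Counter,                         # (TOR port, port group) -> number of VMs using it
    port_group: str,                       # port group of the migrating VM
    old_port: str,                         # TOR port of the source host (found through LLDP)
    new_port: str,                         # TOR port of the target host (found through LLDP)
    vlan_vni: dict[str, tuple[int, int]],  # port group -> (local VLAN, VNI)
) -> tuple[list, list]:
    """Hypothetical sketch of step 4: return (mappings to delete, mappings to deliver)."""
    vlan, vni = vlan_vni[port_group]
    to_delete, to_add = [], []

    if refs[(old_port, port_group)] <= 0:
        raise ValueError("migrating VM is not recorded on the source port")
    refs[(old_port, port_group)] -= 1
    if refs[(old_port, port_group)] == 0:      # no VM of this port group left on the old port
        del refs[(old_port, port_group)]
        to_delete.append((old_port, vlan, vni))

    if refs[(new_port, port_group)] == 0:      # first VM of this port group on the new port
        to_add.append((new_port, vlan, vni))
    refs[(new_port, port_group)] += 1
    return to_delete, to_add
```

If two VMs of `pg-web` sit behind the same source port, migrating the first one only adds the mapping on the target port; migrating the second one also deletes it from the source port.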

Quiz

1. Which of the following components is used to deploy networks in the Huawei CloudFabric solution? ( )
A. iMaster NCE-Fabric

B. SecoManager

C. iMaster NCE-FabricInsight

D. MDA


1. A
Summary

⚫ Huawei CloudFabric Solution redefines the O&M, deployment, and interconnection of DCNs to build intelligent, simplified, ultra-broadband, open, and secure cloud DCNs. Leveraging iMaster NCE-Fabric and iMaster NCE-FabricInsight, the solution implements full-lifecycle automation, lossless Ethernet, and network-wide intelligent O&M.
⚫ Due to limited space, this course only briefly introduces the key features of the solution. The following courses further explain the technical implementation principles and application scenarios of the solution.

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright©2023 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
CloudFabric Data Center Network Planning and Design
Foreword

⚫ Huawei's CloudFabric hyper-converged data center network (DCN) solution (CloudFabric solution for short) provides customers with intelligent, lossless, and ultra-broadband infrastructure networks. The solution supports one-click automatic deployment, AI-powered intelligent O&M, and on-demand self-service customization to quickly complete the planning, design, and deployment of industry DCN solutions.
⚫ This course describes the DCN planning and design process of the CloudFabric solution, including DCN architecture design, underlay and overlay network design, network security design, and network management and O&M design.

Objectives

⚫ On completion of this course, you will be able to:
▫ Be familiar with the architecture of Huawei's CloudFabric solution.
▫ Complete the underlay and overlay network design for a DCN based on actual requirements.
▫ Complete the high reliability, network security, and network management and O&M design for a DCN based on actual requirements.

Contents

1. Data Center Network Overview

2. Network Architecture Design and Data Planning

3. Underlay Network Design

4. Overlay Network Design

5. Network Security Design

6. Network Management and O&M Design

Typical Networking of the Data Center Network
⚫ The following figure shows typical data center network networking.

[Figure: typical DCN networking. The Internet access zone, outside campus access zone, and WAN access zone connect to the DC core; the core connects to PODs 1 to n, each built from spine and leaf switches with attached servers, serving the production environment area, non-production environment area, and test area.]


• Point of delivery (POD): A data center can be divided into one or more physical partitions to facilitate resource pooling and management. Each physical partition is called a POD. A POD is the basic deployment unit of a DC. Each DC can contain multiple PODs, and a physical device can belong to only one POD. A POD can be defined by a standardized equipment room module or based on actual business requirements.

▫ In a large data center, a POD can be an entire equipment room module.

▫ In a medium-sized data center, a POD can consist of two or more rows of cabinets.

▫ In a small data center, multiple cabinets can form a POD.


Data Center Network Architecture

[Figure: DC 1 and DC 2 connected through a DCI network, with an external network (Internet/WAN). In DC 1, border leaf and DCI leaf nodes connect to the spine layer of a VXLAN fabric; server leaf nodes provide computing access for servers, and service leaf nodes provide VAS access for firewalls and load balancers (LBs).]

⚫ A DCN is an infrastructure for carrying DC services.
⚫ Multiple DCNs can connect to branches of enterprises or organizations in different areas. In addition, DCNs can connect to the Internet or local area networks (LANs).
⚫ The spine-leaf architecture is recommended for the underlay network.


• The spine-leaf architecture is a new network architecture for data centers. It consists of spine nodes and leaf nodes. Spine nodes are backbone nodes that provide high-speed IP forwarding, and leaf nodes provide network access. In the standard spine-leaf architecture, leaf nodes are similar to the line cards of a modular switch and are responsible for receiving external traffic, while spine nodes are similar to the switch fabric units (SFUs) of a modular switch and are responsible for exchanging traffic between leaf nodes.

• Data Center Interconnect (DCI): Two data center networks are interconnected to implement service interworking and service migration across data centers.
Physical Network Role

Role: Function description
• Spine: A backbone node, which is the core node of the VXLAN fabric network. It provides high-speed IP forwarding and connects to functional leaf nodes through high-speed interfaces.
• Service leaf: A leaf node that connects Layer 4 to Layer 7 value-added services, such as firewalls and load balancers, to the VXLAN fabric network.
• Server leaf: A leaf node that connects computing resources, such as virtualized and non-virtualized servers, to the VXLAN fabric network.
• Border leaf: A leaf node that connects external traffic of the data center to the VXLAN fabric network of the data center and connects to external routers or transmission equipment.
• DCI leaf: A leaf node that provides cross-DC service interworking and migration functions.


• DCI leaf nodes are also called DCI gateways or fabric gateways.
Network Layer

[Figure: bottom-up layered design. Overlay: VXLAN with BGP EVPN across the RR, border leaf, spine, server leaf, and service leaf nodes, with firewalls, servers, and the external network. Underlay: OSPF among the same border leaf, spine, server leaf, and service leaf nodes. Management network: in-band management through loopback interfaces, or out-of-band management through management switches connected to Meth interfaces.]

• Overlay: a logical network established using the VXLAN protocol on the underlay network. Network resources are pooled through iMaster NCE. When creating a logical network in a VPC, you can invoke the network resources in the resource pool. A VPC usually represents a department or a service.
• Underlay: a physical topology established by physical network devices, such as switches and routers. It provides interconnection capabilities for all services in a data center and is the basic bearer network for service data forwarding in the data center.
• Management network: manages all physical devices on the service network. There are two types of management: in-band management and out-of-band management.


• Management network:

▫ In-band management: does not occupy service interfaces. Generally, loopback interfaces are used as management addresses and communicate with each other through the underlay network.

▫ Out-of-band management: an independent management switch is connected to the management interface (Meth interface) of each device to manage the network devices.
DCN Design Overview

DCN design:
• Network architecture design: networking architecture design; data planning
• Underlay network design: routing design; access design; egress design; high availability design
• Overlay network design: routing design; service orchestration; traffic forwarding
• Network security design: intra-DC security; inter-DC security; firewall deployment; security services
• Network management and O&M design: O&M mode selection; controller deployment design; analyzer deployment design


• Note: This course uses this solution design as an example to describe the DCN
design process. The design and deployment parameters and device quantity
involved in this course are examples. You can design a DCN based on actual
service requirements.

Standard Fabric Network Design

Standard fabric architecture and role separation solution

[Figure: border leaf and fabric gateway nodes (VTEPs) connect to a PE and to DC 2; border leaf, server leaf, and service leaf nodes (VTEPs) are fully meshed to the spine layer; servers attach to server leaf nodes, and firewalls and LBs attach to service leaf nodes.]

• The spine-leaf architecture is used. All roles are deployed independently and can be expanded flexibly. Spine and leaf nodes are fully meshed to form highly reliable redundant links.
• OSPF or EBGP is used to implement underlay network connectivity and VTEP address reachability, establish BGP EVPN peer relationships, and guide VXLAN packet forwarding.
• M-LAG is deployed on server leaf nodes and service leaf nodes to ensure access reliability, and active-active gateways are deployed on border leaf nodes to ensure reliability.
• Evaluate the DC scale and oversubscription ratio based on the number of access servers, interface bandwidth, and interface type, select proper switch models, and configure the numbers of spine and leaf nodes flexibly.
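Oversubscription on a leaf node is commonly computed as total server-facing (downlink) bandwidth divided by total spine-facing (uplink) bandwidth. The Python sketch below shows this arithmetic together with the spine port count implied by a full mesh; the port counts and speeds in the usage example (48 x 25GE downlinks, 6 or 8 x 100GE uplinks, 20 leaf nodes) are illustrative assumptions, not recommendations from this course.

```python
def oversubscription_ratio(downlinks: int, downlink_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Leaf oversubscription = total server-facing bandwidth / total spine-facing bandwidth."""
    down = downlinks * downlink_gbps
    up = uplinks * uplink_gbps
    if up <= 0:
        raise ValueError("uplink bandwidth must be positive")
    return down / up

def spine_ports_needed(leaf_count: int, uplinks_per_leaf: int) -> int:
    """In a full mesh, every leaf uplink terminates on one spine port."""
    return leaf_count * uplinks_per_leaf
```

For example, 48 x 25 Gbit/s = 1200 Gbit/s of downlink bandwidth against 6 x 100 Gbit/s = 600 Gbit/s of uplink bandwidth gives 2:1, and adding two more 100GE uplinks brings it to 1.5:1. With 20 such leaf nodes and 6 uplinks each, the spine layer needs 120 ports in total, for example 20 ports on each of 6 spine nodes.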


• Spine node design:

▫ Generally, two spine nodes are deployed as independent Layer 3 devices and are fully meshed to all leaf nodes through Layer 3 interfaces to form highly reliable redundant links.

▫ Spine nodes can be expanded horizontally based on the fabric scale, while only two route reflectors (RRs) are deployed. This reduces memory usage on leaf nodes and the EVPN route processing load.

• Border leaf node design:

▫ Generally, two border leaf nodes are deployed. They are configured as a DFS group to synchronize entries, forming active-active gateways. Multiple groups of border leaf nodes can be deployed to meet network expansion requirements. Based on bandwidth or reliability requirements, the number of border leaf nodes can be expanded horizontally to four (two border leaf node groups).

▫ In the upstream direction, border leaf nodes are connected to PEs or core nodes of a DC in square looped or dual-homed Layer 3 networking. In the downstream direction, border leaf nodes are fully meshed to spine nodes.

▫ A dynamic routing protocol (OSPF or EBGP) runs between border leaf nodes and core nodes (or PEs).

Converged Fabric Network Design

Combination of border leaf and service leaf nodes

[Figure: a PE and fabric gateway (VTEP) connect to DC 2 and to combined border leaf and service leaf nodes (VTEPs), which attach a firewall and an LB and are fully meshed to the spine layer; server leaf nodes (VTEPs) connect servers.]

• Combination design:
▫ Active-active gateways are deployed, which are configured as a DFS group to synchronize entries and are connected to VAS devices through M-LAG.
▫ In the upstream direction, border leaf nodes are connected to PEs or core nodes of a DC in square looped or dual-homed Layer 3 networking. In the downstream direction, border leaf nodes are fully meshed to spine nodes.
▫ A dynamic routing protocol (OSPF or EBGP) or static routes run between border leaf nodes and core nodes (or PEs).


• In the converged network shown above, two roles (border leaf and service leaf) are combined. Three or even four roles can also be combined on a converged network; the choice depends on the deployment scale and cost.

• Three-role integration: border leaf, service leaf, and spine nodes are combined.

• Four-role integration: border leaf, service leaf, spine, and server leaf nodes are combined.

Comparison Between Different Networking Solutions

• Fabric scale
▫ Standard networking: Large
▫ Converged networking: Large
• Scalability
▫ Standard networking: High. Border leaf nodes, spine nodes, and VAS devices can be expanded independently.
▫ Converged networking: Low. Border leaf nodes are scalable, but VAS resource scalability is poor.
• Initial investment
▫ Standard networking: Relatively high
▫ Converged networking: Relatively low
• Device selection requirements
▫ Standard networking: Medium. Resources such as hardware ACLs and routing tables are distributed across devices of different roles. The requirement on spine nodes is lowered: line-rate forwarding based on IP routes is required.
▫ Converged networking: Relatively high. There are high requirements for resources such as hardware ACLs and routing tables. The requirement on spine nodes is lowered: line-rate forwarding based on IP routes is required.
• Application scenario
▫ Standard networking: Applies to large- and medium-sized fabrics, supporting about 100 server leaf nodes and 2,000 physical servers. The north-south egresses are highly scalable and support four-active border leaf nodes. Scenarios with strong VAS capacity expansion requirements are supported.
▫ Converged networking: Border leaf and service leaf nodes are combined, so the service configurations and forwarding plane resources of different device roles must be deployed on one device. This places high requirements on device models, and function scalability is low. This architecture does not support four-active border leaf nodes. Spine nodes are deployed independently, so their scalability is not limited and four or more spine nodes can be deployed; the fabric supports large-scale server access. A single group of border leaf or service leaf nodes supports a maximum of 6,000 VMs, and multiple groups of border leaf or service leaf nodes are supported.


Case: Logical Zone Design for a DCN

[Figure: logical zones of a DCN. External connections (Internet through ISPs, extranet through private lines, campus through Ethernet, intranet through private lines) enter the production and non-production Internet access zones, production and non-production extranet access zones, campus access zone, and WAN access zone. These connect through the extranet core and intranet core to the production and non-production extranet resource pools and the production and non-production intranet resource pools. The O&M management zone contains network management, system management, data management, and security management.]


• Zone description:

▫ The DCN consists of the production Internet access zone, non-production Internet access zone, production extranet access zone, non-production extranet access zone, campus access zone, WAN access zone, O&M management zone, production extranet zone, non-production extranet zone, production intranet zone, non-production intranet zone, and core switching zone.

▫ The out-of-band management network is deployed in each zone.

▫ Firewalls are connected to the border of each zone in bypass mode for isolation.

• Note:

▫ Out-of-band management: iMaster NCE-Fabric connects to the out-of-band management network ports on network devices through an independently deployed out-of-band management switch, and manages and controls the network devices through an independent out-of-band network.

▫ In-band management: No independent management switch or network is configured. iMaster NCE-Fabric connects directly to the service network through a service switch, and manages and controls network devices through the underlay of the service network.

Case: Physical Architecture Design for a DCN

[Figure: physical architecture. The Internet access zone (ISP1, ISP2, and ISP3 links, routers, anti-DDoS devices, firewalls, and IPSs) and the extranet access zone (third-party financial institutions, firewalls, and IPSs) connect to the production and non-production extranet cores; the WAN access zone and campus access zone connect to the intranet core. The production intranet, non-production intranet, production extranet, and non-production extranet fabrics are each a VXLAN domain with border leaf nodes, bypass firewalls, spine nodes, and leaf nodes (VTEPs). The O&M management zone contains the controller node cluster attached to a management spine and management leaf nodes behind a firewall.]


• In the DCN, VXLAN is deployed on the production intranet, non-production intranet, production extranet, and non-production extranet to build fabric resource pools.

• Switch positioning:

▫ The resource pool zone uses an architecture where border leaf and service
leaf nodes are combined.

▫ Extranet core switches are connected to the Internet access zone, extranet
access zones, extranet resource pool zone, and DC core switches to control
the advertisement of intranet routes.

• Firewall positioning:

▫ Firewalls are deployed at the border of the resource pool zone and are
connected to the border leaf nodes in bypass mode to perform access
control on all traffic entering and leaving the zone. The cloud platform
drives the SDN controller to automatically deliver the traffic diversion policy
of the border firewalls of the resource pool zone.

▫ Firewalls in the Internet access zone and extranet access zone meet the
two-layer heterogeneous deployment requirements and perform access
control on all traffic entering and leaving the access zones.

VLAN Planning
⚫ The following VLANs need to be planned for the underlay network: interconnection VLANs between some devices, VLANs reserved for Layer 3 main interfaces, and default reserved VLANs of the system.
⚫ The following VLANs need to be planned for the overlay network: access VLANs of the tenant network (VLANs through which VMs and external networks access the tenant network) and interconnection VLANs between gateways and VAS devices.

[Figure: spine nodes connected to leaf VTEPs; servers attach through access VLANs, while VAS devices and the egress PE attach through interconnection VLANs.]

VLAN planning example for the underlay network:
• Device interconnection VLANs: 2 to 30. Plan interconnection VLANs in advance based on the actual service design.
• Reserved VLANs for Layer 3 main interfaces: 4000 to 4062.
▫ Leaf node: plan 16 VLANs, for example, VLANs 4047 to 4062.
▫ Spine node: plan 63 VLANs, for example, VLANs 4000 to 4062.
▫ The reserved VLANs can be adjusted dynamically as required.
• Default reserved VLANs: 4064 to 4094. You are advised to retain the default value. The reserved VLAN range can be changed on a CE switch using the CLI so that it does not overlap with planned or existing VLANs.
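A quick consistency check of such a plan can be scripted: every range should lie within the valid 802.1Q VLAN IDs 1 to 4094, and no two ranges should overlap. This is a minimal Python sketch; the range names are illustrative and the check does not query any switch.

```python
def vlan_span(first: int, last: int) -> range:
    """Inclusive VLAN range, as written in planning tables ("4047 to 4062")."""
    return range(first, last + 1)

def check_plan(plan: dict[str, range]) -> list[str]:
    """Return human-readable problems: IDs outside 1-4094 or overlapping ranges."""
    problems = []
    for name, vlans in plan.items():
        if vlans.start < 1 or vlans.stop - 1 > 4094:
            problems.append(f"{name}: outside 1-4094")
    names = list(plan)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            overlap = set(plan[a]) & set(plan[b])
            if overlap:
                problems.append(f"{a} overlaps {b}: {min(overlap)}-{max(overlap)}")
    return problems
```

With the example values (interconnection VLANs 2 to 30, Layer 3 reserved VLANs 4000 to 4062, default reserved VLANs 4064 to 4094), `check_plan` returns no problems, and `vlan_span(4047, 4062)` and `vlan_span(4000, 4062)` contain 16 and 63 VLANs, matching the leaf and spine counts. Adding a hypothetical tenant access range of 4060 to 4070 is reported as overlapping both reserved ranges.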


• VLAN planning for the underlay network:

▫ Device interconnection VLANs: These VLANs provide VLANIF interfaces to establish links between some devices when an underlay network is manually constructed.

▫ Reserved VLANs for Layer 3 main interfaces: For some CE series switches equipped with FD-X series cards, configure a reserved VLAN dedicated to Layer 3 main interfaces before switching the interface mode to Layer 3.

▫ Default reserved VLANs: These VLANs are used as a channel of the internal control plane of a switch or a channel for transmitting user service data of some features.

▫ Note: The number of VLANs required by the underlay network is relatively fixed.

• VLAN planning for the overlay network:

▫ Note: The number of VLANs required by the overlay network is calculated based on the number of compute nodes and VAS devices.

IP Address Planning
⚫ The IP addresses of the DCN are classified into service, management, and interconnection IP addresses.
(Figure: spine and leaf nodes. Plan one loopback address with a 32-bit mask for each spine node, two loopback addresses with a 32-bit mask for each leaf node, and two interconnection addresses with a 30-bit mask for each link.)
• Service addresses: are the IP addresses of servers, hosts, and gateways.
▫ It is recommended that gateway IP addresses use the same last digits, for example, gateways use IP addresses suffixed by .254.
▫ The IP address range of each service must be clearly distinguished, and the IP addresses of each type of service terminal must be contiguous so that they can be summarized.
▫ An IP address segment with a 24-bit mask is recommended.
• Management address: is the IP address configured for a loopback interface created on each Layer 3 network device.
▫ A loopback address uses a 32-bit mask. A core device uses a smaller loopback address than other devices.
• Interconnection address: It is recommended that interconnection IP addresses use a 30-bit mask and that core devices use the smaller host IP address.
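The loopback and interconnection rules above can be expressed as a small allocator. This Python sketch uses the standard `ipaddress` module; the pools 10.255.0.0/24 (loopbacks) and 10.254.0.0/24 (point-to-point links) and the device names are hypothetical. Spine (core) devices are allocated first so they receive the smaller loopback addresses, and on each /30 link the spine takes the smaller of the two host addresses.

```python
import ipaddress

# Hypothetical pools and device names; replace them with the real plan.
LOOPBACK_POOL = ipaddress.ip_network("10.255.0.0/24")
P2P_POOL = ipaddress.ip_network("10.254.0.0/24")
SPINES = ["Spine1", "Spine2"]
LEAVES = ["Leaf1", "Leaf2", "Leaf3", "Leaf4"]


def plan_loopbacks(spines, leaves, pool):
    """One /32 per spine and two /32s per leaf; spines (core) get the smallest."""
    hosts = pool.hosts()
    plan = {spine: [ipaddress.ip_interface(f"{next(hosts)}/32")] for spine in spines}
    for leaf in leaves:
        plan[leaf] = [ipaddress.ip_interface(f"{next(hosts)}/32") for _ in range(2)]
    return plan


def plan_links(spines, leaves, pool):
    """One /30 per spine-leaf link; the spine (core side) takes the smaller host."""
    subnets = pool.subnets(new_prefix=30)
    links = {}
    for spine in spines:
        for leaf in leaves:
            low, high = next(subnets).hosts()  # a /30 has exactly two usable hosts
            links[(spine, leaf)] = (
                ipaddress.ip_interface(f"{low}/30"),
                ipaddress.ip_interface(f"{high}/30"),
            )
    return links


loopbacks = plan_loopbacks(SPINES, LEAVES, LOOPBACK_POOL)
links = plan_links(SPINES, LEAVES, P2P_POOL)
print(loopbacks["Spine1"], loopbacks["Leaf1"])
print(links[("Spine1", "Leaf1")])
```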


• IP address planning principles:
▫ IP addresses must be managed and allocated uniformly across the entire network.
▫ IP address allocation should be simple and easy to manage, reflect the network layers, simplify network management and expansion, and be visualized.
▫ The IP address plan must be extensible. That is, some IP addresses should be reserved at each layer so that addresses remain contiguous and can still be summarized after network expansion.
▫ IP addresses should be contiguous. Routes for contiguous addresses can be summarized easily on a hierarchical network. This reduces the routing table size and speeds up route calculation and route convergence.
▫ IP address allocation must be flexible enough to support various traffic, security, and routing policies and make full use of the address space.

• IP address allocation principles:
▫ Use variable-length subnet masks (VLSM) for IP address allocation, and reserve some IP addresses based on the number of hosts on each network segment. This keeps addresses summarizable and prevents the waste of IP addresses.
▫ To facilitate route summarization, assign IP addresses on the same network segment to devices in the same network area. If this requirement cannot be met due to limitations, ensure that routes can at least be summarized at the aggregation layer.
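Why contiguity matters for summarization can be shown with Python's standard `ipaddress.collapse_addresses`, which merges adjacent, aligned prefixes. The four /24 service segments below are hypothetical; removing one of them from the area shows how a single gap prevents advertising one summary route.

```python
import ipaddress

# Hypothetical /24 service segments allocated contiguously to one network area.
segments = [ipaddress.ip_network(f"10.10.{i}.0/24") for i in range(4)]

# Contiguous, aligned segments collapse into a single summary route.
summary = list(ipaddress.collapse_addresses(segments))
print(summary)  # [IPv4Network('10.10.0.0/22')]

# If 10.10.2.0/24 is allocated elsewhere, one summary route is no longer possible.
gapped = [net for net in segments if net != ipaddress.ip_network("10.10.2.0/24")]
print(list(ipaddress.collapse_addresses(gapped)))
# [IPv4Network('10.10.0.0/23'), IPv4Network('10.10.3.0/24')]
```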
Network Architecture Data Planning

VNI and VPC Planning

VXLAN Network Identifier (VNI) planning:
• A VNI encodes:
▫ The unique ID of a DC
▫ The unique ID of an equipment room module in the DC
▫ The unique ID of an OpenStack instance in the equipment room module
• VNI ranges:
▫ Active DC: VNIs from 1000000 to 1999999, a total of 1 million VNIs
▫ Intra-city DR DC: VNIs from 2000000 to 2999999, a total of 1 million VNIs
▫ Remote DR cloud DC: VNIs from 3000000 to 3999999, a total of 1 million VNIs

Virtual Private Cloud (VPC) planning (Fabric-VPC):
• VPC-Srv-DMZ: production extranet
• VPC-Mgt-DMZ: service assurance extranet
• VPC-Srv-Intranet: production intranet
• VPC-Mgt-Intranet: service assurance intranet
• VPC-Mgt: in-band management

VNI structure (seven decimal digits):
• DC ID (first digit from the left): The value ranges from 1 to 9. It can be 1 (active DC), 2 (intra-city DR DC), 3 (remote DR DC), or 4 to 9 (reserved).
• Equipment room ID (second and third digits from the left, that is, the hundred-thousands and ten-thousands digits): The value ranges from 01 to 99.
• OpenStack ID (fourth digit from the left): The value ranges from 1 to 9.
• Subnet ID (fifth, sixth, and seventh digits from the left): The value ranges from 001 to 999. That is, each OpenStack instance can use 999 subnets.

VNI allocation example: If the active DC equipment room is the first enabled cloud network equipment room module, a possible VNI in the fabric is 1011001 (DC ID 1, equipment room ID 01, OpenStack ID 1, subnet ID 001).
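The digit layout above is a fixed-width decimal encoding, so a VNI can be built from and split back into its fields with integer arithmetic. This Python sketch follows the ranges stated above; the function names are hypothetical helpers, not part of any controller API. Even the largest value, 9999999, fits in the 24-bit VNI field (maximum 16777215).

```python
def make_vni(dc_id: int, room_id: int, openstack_id: int, subnet_id: int) -> int:
    """Compose a 7-digit VNI laid out as D RR O SSS (DC, room, OpenStack, subnet)."""
    if not 1 <= dc_id <= 9:
        raise ValueError("DC ID must be 1-9")
    if not 1 <= room_id <= 99:
        raise ValueError("equipment room ID must be 01-99")
    if not 1 <= openstack_id <= 9:
        raise ValueError("OpenStack ID must be 1-9")
    if not 1 <= subnet_id <= 999:
        raise ValueError("subnet ID must be 001-999")
    return dc_id * 1_000_000 + room_id * 10_000 + openstack_id * 1_000 + subnet_id


def parse_vni(vni: int) -> tuple:
    """Split a 7-digit VNI back into (dc_id, room_id, openstack_id, subnet_id)."""
    return (vni // 1_000_000, vni // 10_000 % 100, vni // 1_000 % 10, vni % 1_000)


print(make_vni(1, 1, 1, 1))  # 1011001, the example VNI above
print(parse_vni(1011001))    # (1, 1, 1, 1)
```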

Contents

1. Data Center Network Overview

2. Network Architecture Design and Data Planning

3. Underlay Network Design

4. Overlay Network Design

5. Network Security Design

6. Network Management and O&M Design


Underlay Routing Design: OSPF (1)

⚫ OSPF is recommended on the underlay network if the total number of switches is less than 200.

(Figure: single-PoD OSPF. Spine1 and Spine2 connect to Leaf1 through Leaf6, and all devices run OSPF 1 in Area 0.)
• Solution design:
▫ Area division: All devices are planned in Area 0.
▫ Configure the Loopback0 address as the VTEP IP address. Plan the same IP address for each group of active-active leaf nodes.
▫ Configure a globally unique Loopback1 address for each device as the router ID.
▫ Directly connect spine and leaf nodes through Layer 3 routed interfaces, and set the OSPF network type to P2P.
• Route optimization configuration:
▫ Configure BFD for OSPF to shorten the route convergence time.
▫ Configure the OSPF route calculation interval and the intervals for updating and receiving LSAs to optimize route convergence in case of faults.
▫ Configure the period during which the maximum cost is retained after an interconnection interface between spine and leaf nodes changes from Down to Up, to optimize switchback convergence performance.
• Application scenario: This design applies to single-PoD or dual-PoD DCs. Generally, a maximum of three PoDs and about 100 switches are deployed, and the design is simple and reliable.
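The loopback rules above (a shared Loopback0 VTEP address per active-active leaf group, a unique Loopback1 router ID per device, everything in Area 0) can be captured in a planning script that also enforces router ID uniqueness. In this Python sketch the address pools are hypothetical, and giving spine nodes only a router ID (no VTEP address) is an assumption, since the slide does not say whether spines act as VTEPs.

```python
import ipaddress

# Hypothetical pools: Loopback0 carries the VTEP IP, Loopback1 the router ID.
VTEP_POOL = ipaddress.ip_network("10.1.0.0/24")
ROUTER_ID_POOL = ipaddress.ip_network("10.2.0.0/24")

SPINES = ["Spine1", "Spine2"]
# Each tuple is one active-active (M-LAG) leaf group, as in the figure.
LEAF_GROUPS = [("Leaf1", "Leaf2"), ("Leaf3", "Leaf4"), ("Leaf5", "Leaf6")]


def plan_ospf_loopbacks(spines, leaf_groups):
    vteps = VTEP_POOL.hosts()
    router_ids = ROUTER_ID_POOL.hosts()
    plan = {}
    for spine in spines:
        # Assumption: spines only need a router ID in this sketch.
        plan[spine] = {"area": 0, "loopback1": next(router_ids)}
    for group in leaf_groups:
        shared_vtep = next(vteps)  # same VTEP IP for both members of the group
        for leaf in group:
            plan[leaf] = {"area": 0, "loopback0": shared_vtep, "loopback1": next(router_ids)}
    return plan


plan = plan_ospf_loopbacks(SPINES, LEAF_GROUPS)
router_ids = [entry["loopback1"] for entry in plan.values()]
assert len(router_ids) == len(set(router_ids)), "router IDs must be globally unique"
print(plan["Leaf1"]["loopback0"], plan["Leaf2"]["loopback0"])  # 10.1.0.1 10.1.0.1
```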


Underlay Routing Design: OSPF (2)

⚫ OSPF is recommended on the underlay network if the total number of switches is less than 200.

(Figure: multi-PoD OSPF. The super spine and the spine nodes of PoD1 and PoD2 are in OSPF 1 Area 0. The spine-leaf links of PoD1 (Leaf1 to Leaf4) are in OSPF 1 Area 1, and those of PoD2 (Leaf1 to Leaf4) are in OSPF 1 Area 2.)
• Solution design:
▫ Deploy the entire DC underlay network in the same OSPF process.
▫ Deploy the interconnections between PoDs and between spine and super spine nodes in Area 0.
▫ Deploy the interconnections between spine and leaf nodes in a PoD in a non-backbone OSPF area.
• Application scenario: This design applies to a DC with more than three PoDs and no more than 200 switches in total.


• Other planning description:

▫ When planning the connections between spine nodes and super spine
nodes, ensure that all nodes in OSPF Area 0 are reachable.
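The area rules for the multi-PoD design reduce to a function of the two link endpoints. In this Python sketch, super spine nodes are given pod 0, and numbering each PoD's non-backbone area after the PoD (PoD1 to Area 1, PoD2 to Area 2, as in the figure) is an assumption about the general scheme rather than a stated rule.

```python
def ospf_area(role_a: str, pod_a: int, role_b: str, pod_b: int) -> int:
    """Return the OSPF area of a link in the multi-PoD design.

    Roles are "super-spine", "spine", or "leaf"; super spines use pod 0.
    Numbering each PoD's non-backbone area after the PoD is an assumption.
    """
    roles = {role_a, role_b}
    if "super-spine" in roles or pod_a != pod_b:
        return 0  # backbone: spine-super spine and inter-PoD links
    if roles == {"spine", "leaf"}:
        return pod_a  # PoD1 -> Area 1, PoD2 -> Area 2, ...
    raise ValueError(f"no OSPF area defined for a {role_a}-{role_b} link")


print(ospf_area("spine", 1, "super-spine", 0))  # 0
print(ospf_area("spine", 2, "leaf", 2))         # 2
```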

Underlay Routing Design: EBGP (1)

⚫ EBGP is recommended on the underlay network if the total number of switches is greater than 200.
⚫ Application scenario: This design applies to a DC with more than three PoDs and more than 200 switches in total.

(Figure: EBGP route planning in a PoD. Spine1 and Spine2 are in AS 65001. The leaf groups Leaf1/Leaf2, Leaf3/Leaf4, and Leaf5/Leaf6 are in AS 65002, AS 65003, and AS 65004 respectively, and EBGP runs between the leaf and spine nodes.)
• Solution design:
▫ AS partitioning: Deploy each group of active-active leaf nodes in its own AS, and deploy the spine nodes at the same layer in one AS.
▫ Peer establishment: Use the IP addresses of Layer 3 routed interfaces to establish EBGP peer relationships.
▫ Route advertisement: Advertise loopback addresses.
▫ Enable BGP load balancing to implement underlay load balancing.
• Route optimization configuration:
▫ It is recommended that BFD for BGP be configured to shorten the route convergence time.
▫ When the BGP peer relationship status changes from Down to Up, set the BGP route priority to the lowest to optimize switchback convergence performance.
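The AS partitioning rule can be sketched as a small planner that reproduces the figure's numbering (spines in AS 65001, each leaf group in the next AS) and lists the spine-leaf EBGP sessions. This Python sketch is illustrative; the device names follow the figure, and numbering leaf groups sequentially from the spine AS is an assumption.

```python
SPINE_AS = 65001
SPINES = ["Spine1", "Spine2"]
LEAF_GROUPS = [("Leaf1", "Leaf2"), ("Leaf3", "Leaf4"), ("Leaf5", "Leaf6")]


def assign_asns(spines, leaf_groups, spine_as=SPINE_AS):
    """Spines share one AS; each active-active leaf group gets its own AS."""
    asn = {spine: spine_as for spine in spines}
    for offset, group in enumerate(leaf_groups, start=1):
        for leaf in group:
            asn[leaf] = spine_as + offset  # assumption: sequential numbering
    return asn


def ebgp_sessions(spines, leaf_groups, asn):
    """Every leaf peers with every spine; all sessions cross an AS boundary."""
    sessions = [(spine, leaf) for spine in spines for group in leaf_groups for leaf in group]
    assert all(asn[a] != asn[b] for a, b in sessions), "spine-leaf sessions must be EBGP"
    return sessions


asn = assign_asns(SPINES, LEAF_GROUPS)
sessions = ebgp_sessions(SPINES, LEAF_GROUPS, asn)
print(asn["Leaf5"], len(sessions))  # 65004 12
```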


Underlay Routing Design: EBGP (2)

⚫ EBGP is recommended on the underlay network if the total number of switches is greater than 200.
⚫ Application scenario: This design applies to a DC with more than three PoDs and more than 200 switches in total.

(Figure: EBGP route planning between PoDs. The super spine nodes are in AS 65001, and the spine nodes of PoD1 and PoD2 are in AS 65002 and AS 65003. The leaf groups Leaf1/Leaf2 and Leaf3/Leaf4 of PoD1 are in AS 65004 and AS 65005, and those of PoD2 are in AS 65006 and AS 65007. EBGP runs between all layers.)
• Solution design:
▫ AS partitioning: Deploy each group of active-active leaf nodes in its own AS, the spine nodes at the same layer of a PoD in one AS, and the super spine node group in one AS.
▫ Peer establishment: Fully mesh spine nodes with super spine nodes, and establish EBGP peer relationships through interconnection interface addresses.
▫ In principle, allocate one network segment per PoD and summarize its routes on the spine nodes.
▫ Use network segment routes for cross-PoD access to reduce the number of cross-PoD routes.
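The multi-PoD AS plan follows a top-down numbering that can be generated rather than hand-assigned, and the full mesh between spine and super spine nodes has a simple session count. This Python sketch reproduces the figure's numbers (65001 to 65007) for two PoDs with two leaf groups each; the numbering order and the helper names are assumptions for illustration. All values stay inside the 2-byte private AS range 64512 to 65534.

```python
from itertools import count


def plan_multi_pod_asns(num_pods, leaf_groups_per_pod, first_asn=65001):
    """Number ASes top-down: super spine group, each PoD's spines, then leaf groups."""
    asns = count(first_asn)
    plan = {"super-spine": next(asns)}
    for pod in range(1, num_pods + 1):
        plan[f"pod{pod}-spine"] = next(asns)
    for pod in range(1, num_pods + 1):
        for group in range(1, leaf_groups_per_pod + 1):
            plan[f"pod{pod}-leaf-group{group}"] = next(asns)
    return plan


def spine_super_spine_sessions(num_pods, spines_per_pod, super_spines):
    """EBGP sessions needed to fully mesh every spine with every super spine."""
    return num_pods * spines_per_pod * super_spines


plan = plan_multi_pod_asns(num_pods=2, leaf_groups_per_pod=2)
assert all(64512 <= asn <= 65534 for asn in plan.values()), "use private ASNs"
print(plan)
print(spine_super_spine_sessions(num_pods=2, spines_per_pod=2, super_spines=2))  # 8
```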


Comparison Between Routing Protocols on the Underlay Network
• Convergence speed:
▫ OSPF: The convergence speed is fast.
▫ EBGP: The convergence speed is faster.
• Protocol deployment:
▫ OSPF: The protocol deployment is simple, but there are few route control methods. Route selection depends on the cost, which needs to be adjusted on the entire network.
▫ EBGP: The configuration is complex, and various route control methods are available.
• Network scale:
▫ OSPF: Applicable to small- and medium-sized networks. OSPF has high calculation consumption and limited scalability.
▫ EBGP: Applicable to medium- and large-sized networks. BGP has low calculation consumption and good scalability.
• Fault domain:
▫ OSPF: The routing domain is independent in each area, and the fault domain is controllable.
▫ EBGP: The fault domain is large.
• Application scenario:
▫ OSPF: Applicable to small- and medium-sized DCs and seldom used in large-sized DCs. A single area is deployed for small- and medium-sized networks, and multiple areas are deployed for large-sized networks with a three-layer architecture. It is recommended that the number of router IDs be less than 200 and the number of OSPF neighbors be less than 100. It is recommended that the number of neighbors in a single PoD be less than 100 to prevent a large routing domain from affecting network performance.
▫ EBGP: Applicable to large- and medium-sized DCNs. Multiple PoDs and multi-layer spine nodes are deployed, and routes are transmitted between PoDs through EBGP. It is recommended that the number of peers be less than 500 and that the number of peers in a single PoD be less than 100 to prevent a large routing domain from affecting network performance.
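The switch-count and PoD-count thresholds from the OSPF and EBGP slides can be folded into a rough decision helper. This Python sketch is only a reading of those thresholds: treating exactly 200 switches as an OSPF-sized network is an assumption, because the slides only say "less than 200" and "greater than 200", and it ignores the neighbor and peer limits listed in the comparison.

```python
def recommend_underlay(total_switches: int, num_pods: int) -> str:
    """Pick an underlay routing design from the thresholds in this section.

    Assumption: exactly 200 switches is treated as an OSPF-sized network.
    """
    if total_switches > 200:
        return "EBGP"
    if num_pods > 3:
        return "OSPF multi-area"   # Area 0 backbone, one non-backbone area per PoD
    return "OSPF single area"      # all devices in Area 0


for switches, pods in [(80, 1), (180, 5), (400, 6)]:
    print(switches, pods, recommend_underlay(switches, pods))
```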


Firewall Access Design (1)

(Figure: three firewall access modes. Each shows the external network, PE, border and service leaf nodes, spine nodes, fabric, server leaf nodes, and servers. In the two right-hand modes, the border leaf and service leaf roles are combined on the same nodes.)
• Firewalls connected to service leaf nodes in bypass mode:
▫ This is a standard architecture and features high scalability. Multiple service leaf node groups are supported to connect to more VAS devices.
▫ This mode is recommended if load balancing needs to be performed among multiple border leaf nodes on the same egress network.
• Firewalls connected to border leaf nodes in bypass mode:
▫ This mode has low physical costs but poor scalability. It is a typical deployment mode for small- and medium-sized DCs.
▫ A physical device plays multiple roles, consuming more resources.
• Firewalls connected to border leaf nodes in inline mode:
▫ External traffic must pass through the firewall, which applies to scenarios with high security requirements.
▫ The scalability of the firewall is poor. The DCN must remain stable for a certain period of time.


• The following factors must be considered for firewall access design:
▫ Select the resource pool type (Huawei firewall, managed third-party firewall, or non-managed firewall) and network mode (inline or bypass) based on the firewall product model and customer service requirements.
▫ Select the hardware device model and card type of the service leaf or border leaf node based on the firewall access bandwidth (10G or 40G).
▫ Physically, both one-armed and two-armed connections are supported. The one-armed connection has advantages in terms of cost (it saves ports) and fault model (a fault affects the uplink and downlink logical links at the same time, so faults are detected faster). As such, the one-armed mode is recommended.
▫ Firewalls can be connected to service leaf nodes in bypass mode, to border leaf nodes in bypass mode, or between border leaf nodes and PEs in inline mode.

Firewall Access Design (2)

(Figure: a core switch connects to the border leaf, which hosts VRF1 and VRF2. The active and standby firewalls (FW), each running vSys1 and vSys2, attach to the border leaf and are linked to each other by a heartbeat line.)
⚫ Networking design:
▫ Border leaf (BL) and service leaf nodes are co-deployed.
▫ The firewall is connected to the BL node in bypass mode and logically connected between VRF1 and VRF2.
▫ Firewalls are deployed in active/standby mirroring mode. Each firewall is connected to two blade servers through two 10GE ports. M-LAG is deployed on the BL to connect to the active and standby firewalls.
▫ The firewall differentiates service traffic through virtual systems.
▫ Two 10GE links are deployed between the active and standby firewalls as heartbeat synchronization links.
▫ Traffic entering and leaving the fabric needs to pass through the firewall for security access control.
⚫ Route design:
▫ OSPF runs between the border leaf switch and core switch, and service VRFs are used to isolate VPC routes.


SLB Access Design (1)

⚫ Load balancing applications in DCs include server load balancing (SLB) and global server load balancing (GSLB). The former implements server load balancing within a DC, and the latter implements load balancing between DCs.

(Figure: left, LBs connected to service leaf nodes in bypass mode, with separate border leaf and service leaf nodes; right, LBs connected in bypass mode to nodes that combine the border leaf and service leaf roles. In both cases the PE and firewall sit at the border, and the leaf nodes connect to the spine nodes of the fabric.)

⚫ LBs can be connected to border leaf nodes or service leaf nodes in bypass mode based on their deployment locations on the network.
⚫ It is recommended that LBs be deployed in the same manner as firewalls.


• The LB is a key component of a high-availability network infrastructure. A cluster of multiple servers replaces a single server to provide services externally, and a large number of service requests are distributed among the servers, solving the high-concurrency and high-availability problems in the network architecture. In this way, resource usage is optimized, throughput is maximized, response time is minimized, and overload is avoided.

• The following factors must be considered for LB access design:
▫ Select a proper LB model, working mode, and scheduling algorithm based on service characteristics and requirements.
▫ Select the hardware device model and card type of the service leaf or border leaf node based on the interface bandwidth of the LB.
▫ One-armed and two-armed connection modes are supported. In actual deployment, LBs are generally connected in one-armed mode to reduce costs and improve reliability.

SLB Access Design (2)

(Figure: a user at 1.2.3.1/24 on the external network reaches the PE and then the firewall (Untrust to Trust zone), which performs NAT from the EIP to the VIP (step 1). Traffic then passes through the combined border leaf and service leaf nodes to the LBs (step 2). Virtual server IP (VIP): 192.168.201.20. LB self IPs: 192.168.201.245 and 192.168.201.246. Floating IP: 192.168.201.254. Real server: 192.168.201.110, with gateway 192.168.201.1 in the fabric.)
⚫ Solution design:
▫ LBs are deployed in active/standby mode, and heartbeat links are deployed between them.
▫ LBs are connected to border leaf nodes in bypass mode, through Eth-Trunks in M-LAG mode.
▫ LBs are connected in one-armed mode and attached to the VXLAN at Layer 2. The floating IP address, VIP, and server IP address are configured on the same network segment.
▫ The real server gateway is not the F5 floating IP address but the Layer 3 VXLAN gateway of the switch.
▫ The network needs to process gratuitous ARP packets and support failover of floating IP addresses and VIPs.
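The same-segment requirement can be verified with the addresses from the figure. This Python sketch assumes a /24 mask for the segment (the figure does not state the server-side mask; /24 matches the IP planning recommendation) and checks that every LB, gateway, and server address falls inside it and that no address is used twice.

```python
import ipaddress

# Addresses from the SLB example figure; the /24 mask is an assumption.
SEGMENT = ipaddress.ip_network("192.168.201.0/24")
ADDRESSES = {
    "VIP": "192.168.201.20",
    "LB1 self IP": "192.168.201.245",
    "LB2 self IP": "192.168.201.246",
    "floating IP": "192.168.201.254",
    "VXLAN gateway": "192.168.201.1",
    "real server": "192.168.201.110",
}

outside = [name for name, ip in ADDRESSES.items()
           if ipaddress.ip_address(ip) not in SEGMENT]
duplicates = len(ADDRESSES) - len(set(ADDRESSES.values()))
print(outside or f"all addresses are in {SEGMENT}", "| duplicates:", duplicates)
```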


• Solution deployment description:
▫ In the cloud-network integration scenario, the LB management capability is provided by each vendor's cloud platform. Check the product capabilities of each vendor.
▫ For details about the LB automation capability in the network virtualization scenario, see the controller product documentation.
▫ The LB can be configured with SNAT or configured as the server gateway. The specific solution is determined by the service orchestration of the LB and servers and is not limited by the CloudFabric solution.
▫ The floating IP addresses and service VIPs of LBs and the server IP addresses can be on the same subnet or on different subnets. You are advised to deploy them on the same subnet. In this case, you do not need to configure a static route destined for a service VIP on a switch.

• NAT-based server load balancing:
▫ The client sends a request to the load balancing device at the front end of the server cluster. The virtual service on the load balancing device receives the request, selects a real server based on the scheduling algorithm, translates the destination address of the request packet to the address of the selected real server, and sends the request to the real server.
▫ The real server sends a response packet to the load balancing device, which changes the source IP address in the response packet to the VIP and then forwards the response packet to the user.
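The two NAT steps above can be modeled in a few lines. The following Python sketch is a toy model, not an LB implementation: it assumes round-robin scheduling (the text leaves the scheduling algorithm open), tracks sessions in a dictionary, and uses the VIP 192.168.201.20 and real server 192.168.201.110 from the SLB example plus a hypothetical second server, 192.168.201.111.

```python
from dataclasses import dataclass, replace
from itertools import cycle


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


class NatLoadBalancer:
    """Toy NAT-mode SLB: DNAT VIP -> real server, then restore the VIP on replies."""

    def __init__(self, vip, real_servers):
        self.vip = vip
        self._next_server = cycle(real_servers)  # round-robin (an assumption)
        self._sessions = {}  # (client, real_server) -> VIP

    def handle_request(self, pkt):
        if pkt.dst != self.vip:
            raise ValueError("request is not addressed to the VIP")
        server = next(self._next_server)
        self._sessions[(pkt.src, server)] = self.vip
        return replace(pkt, dst=server)  # destination NAT to the chosen server

    def handle_response(self, pkt):
        vip = self._sessions[(pkt.dst, pkt.src)]
        return replace(pkt, src=vip)  # source rewritten back to the VIP


lb = NatLoadBalancer("192.168.201.20", ["192.168.201.110", "192.168.201.111"])
req = lb.handle_request(Packet(src="1.2.3.1", dst="192.168.201.20"))
reply = lb.handle_response(Packet(src=req.dst, dst="1.2.3.1"))
print(req)    # Packet(src='1.2.3.1', dst='192.168.201.110')
print(reply)  # Packet(src='192.168.201.20', dst='1.2.3.1')
```

The model assumes the reply actually reaches the LB. In the one-armed design above, where the real server's gateway is the VXLAN gateway rather than the LB, that only happens if SNAT is also applied, which is one of the options listed in the deployment notes.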

Server Access Design (1)

(Figure: three server access modes. Left, a server connected to an M-LAG in active-active mode; middle, a server connected to an M-LAG in active/standby mode, both M-LAGs using a peer-link and a DAD link; right, a server connected to standalone leaf nodes in active/standby mode.)
Comparison between standalone switches and switches in an M-LAG:
• Server access mode:
▫ Standalone switches: Active/standby.
▫ Switches in an M-LAG: Active/standby or active-active.
• Availability:
▫ Standalone switches: High. The two switches are deployed independently, and faults are isolated.
▫ Switches in an M-LAG: High. Control planes are independent, and fault domains are isolated.
• Cost:
▫ Standalone switches: The cost is low. No cable needs to be deployed between switches.
▫ Switches in an M-LAG: The cost is moderate. Peer-links and heartbeat links need to be deployed.
• Version upgrade:
▫ Standalone switches: The two switches are upgraded independently, without interrupting services. Upgrade risks are low.
▫ Switches in an M-LAG: The two switches are upgraded independently, without interrupting services. Upgrade risks are low.
• Application scenario:
▫ Standalone switches: The active and standby NICs of the server are connected to the two switches.
▫ Switches in an M-LAG: Active/standby and load balancing access modes are both deployed on the network, and the access solution is unified across the entire network.


• Consider the following factors when selecting models for and designing server leaf nodes:
▫ Select an access mode. Servers often use M-LAG, stacking, and standalone modes. M-LAG active-active deployment is recommended because it ensures service continuity during the upgrade of access switches.
▫ Select server leaf nodes (hardware devices) based on the server access bandwidth (10GE/25GE access) and the ratio of server leaf nodes' uplink bandwidth to spine nodes' downlink bandwidth.
▫ Determine the number of server leaf nodes based on the number of servers.
▫ Select the model of server leaf nodes depending on whether microsegmentation or IPv6 deployment, or evolution towards them, is required.
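The bandwidth and leaf-count factors above come down to simple arithmetic. This Python sketch assumes the bandwidth ratio is read as the usual leaf oversubscription ratio (server-facing bandwidth divided by spine-facing bandwidth), which is an interpretation of the wording above. The port counts (48 x 25GE down, 8 x 100GE up, 200 dual-homed servers) are hypothetical, and rounding the leaf count up to an even number reflects M-LAG pairs.

```python
def oversubscription_ratio(down_ports, down_gbps, up_ports, up_gbps):
    """Server-facing bandwidth divided by spine-facing bandwidth on one leaf."""
    return (down_ports * down_gbps) / (up_ports * up_gbps)


def server_leaf_count(servers, nics_per_server, server_ports_per_leaf):
    """Leaves needed so that every server NIC lands on a leaf port.

    server_ports_per_leaf should exclude peer-link and uplink ports. The result
    is rounded up to an even number because leaves are deployed as M-LAG pairs.
    """
    leaves = -(-(servers * nics_per_server) // server_ports_per_leaf)  # ceiling
    return leaves + leaves % 2


print(oversubscription_ratio(48, 25, 8, 100))  # 1.5
print(server_leaf_count(200, 2, 48))           # 10
```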

Server Access Design (2)

(Figure: Server1 and Server2 each have six NIC ports, eth0 to eth5. The management plane and service plane (10GE optical ports) connect to a pair of service and management leaf switches, and the storage plane (10GE optical ports) connects to a pair of storage leaf switches. The leaf switches connect upstream to the spine.)


• Design scheme:
▫ Servers in adjacent cabinets share two groups of leaf switches, which connect to the service and management NICs and to the storage NICs of the servers.
▫ The leaf switches connect to the spine switches in the uplink direction.
▫ M-LAG is deployed between leaf nodes to connect to servers. Two interfaces are used as peer-links. Each group of leaf nodes uses four uplink interfaces to connect to two spine switches.
▫ Server SAN storage servers are deployed in storage node cabinets. Traffic between Server SAN nodes is heavy, so it is recommended that storage nodes be connected to independent storage switches.
▫ The management plane and storage plane need to be configured using iMaster NCE (Fabric) in advance. Therefore, the network administrator needs to plan access ports for the service plane and storage plane and connect storage NICs to storage switches.

• Server NIC planes are divided into the management plane (carrying management traffic), service plane (carrying service traffic), and storage plane (carrying storage traffic).

• Normally, the NICs of each plane on a node must be dual-homed to access switches to ensure reliability.

Egress Design Overview

⚫ The egress network of a DC refers to the connection and configuration between border leaf nodes and egress PEs. Border leaf nodes and PEs can be connected in multiple modes. Select a connection mode based on the customer's existing border conditions.
(Figure: three connection modes between border leaf nodes and PEs: four Layer 3 interfaces with a bypass link between the border leaf nodes; two Layer 3 interfaces with a bypass link; and one Layer 3 interface with VRRP running on the PEs.)


• Physical networking design:
▫ It is recommended that PEs be deployed in a two-node cluster to ensure reliability.
▫ Deploy border leaf nodes as active-active gateways in an M-LAG. In some scenarios, bypass links need to be deployed between border leaf nodes.
▫ The interconnection topology between border leaf nodes and PEs can be square-shaped (the two PEs provide at least two physical ports) or dual-homed (the two PEs provide at least four physical ports), depending on the number of ports provided by the PEs. The dual-homed topology is recommended.

• Interconnection interface design:
▫ PEs can be connected to border leaf nodes through one, two, or four Layer 3 interfaces. It is recommended that PEs be connected to border leaf nodes through four Layer 3 interfaces.
▫ Supported Layer 3 interfaces include VBDIF and VLANIF interfaces. (For details, see the device model and interconnection scenario.)

• Egress routing design:
▫ Border leaf nodes and PEs can interwork through dynamic or static routes. It is recommended that external routes be summarized and default routes be advertised within the DC.

• Note: If four Layer 3 interfaces are used, a border leaf node group provides four Layer 3 interfaces (physical or logical interfaces) to connect to PEs.
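The choice among the three connection modes depends mainly on how many ports the PEs can offer and whether the PEs run VRRP. This Python sketch encodes the port thresholds from the notes above (at least four ports for the recommended dual-homed topology, at least two for the square-shaped one); the function name and parameters are hypothetical.

```python
def egress_connection_mode(pe_ports: int, vrrp_on_pes: bool = False) -> str:
    """Suggest a border leaf-to-PE connection mode.

    pe_ports is the total number of physical ports the two PEs can offer.
    """
    if vrrp_on_pes:
        return "one Layer 3 interface"     # PEs share a VRRP virtual IP address
    if pe_ports >= 4:
        return "four Layer 3 interfaces"   # dual-homed topology (recommended)
    if pe_ports >= 2:
        return "two Layer 3 interfaces"    # square-shaped topology, bypass link required
    raise ValueError("a redundant egress needs at least two PE ports")


print(egress_connection_mode(4))
print(egress_connection_mode(2))
print(egress_connection_mode(4, vrrp_on_pes=True))
```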

Connecting Border Leaf Nodes to PEs Through Four Layer 3 Interfaces

(Figure: four Layer 3 interfaces. Two independently deployed PEs are dual-homed to the border leaf nodes, each link terminating on a Layer 3 interface on a border leaf node, with a bypass link between the border leaf nodes.)
⚫ Physical networking design:
▫ Four interconnection links form a dual-homed topology. Four independent Layer 3 interfaces need to be configured on the two PEs. If there are multiple cards, it is recommended that links be deployed across cards.
▫ A Layer 3 bypass link can be deployed.
▫ The M-LAG peer-link has at least two member links across cards to ensure reliability and bandwidth, and a peer-link member link cannot be configured as the bypass link.
• Application scenario: Two PEs are deployed independently and provide four Layer 3 interfaces to connect to active-active border leaf nodes.


• Solution description:
▫ A Layer 3 interface is configured on a border leaf node to connect to a Layer 3 interface on a PE.
▫ Dynamic routing protocols and static routes can be deployed between border leaf nodes and PEs. You are advised to deploy a dynamic routing protocol, for example, BGP.
▫ Fast switchover: Associate static routes with NQA, or dynamic routing protocols with BFD, to detect the peer PE status and accelerate route convergence.
▫ Route convergence: Configure a delay before the interface connecting the border leaf node to the PE goes Up, and a delay for advertising routes after the interface goes from Down to Up, to optimize switchback performance.
▫ When border leaf nodes and spine nodes are deployed independently and there are only a few interconnection interfaces between them, deploy a Monitor Link group to associate the interfaces connecting border leaf nodes to spine nodes with the interfaces connecting the border leaf nodes to firewalls/LBs and PEs, preventing service interruption caused by multi-link faults.
▫ Deploy a dynamic routing protocol over the bypass link so that the two border leaf nodes can advertise egress routing information to each other for egress link protection.
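How quickly BFD reports a peer PE failure follows from the BFD session parameters. For asynchronous mode, RFC 5880 defines the detection time as the remote system's detect multiplier times the greater of the local required minimum receive interval and the remote desired minimum transmit interval. This Python sketch applies that formula; the interval and multiplier values are illustrative, not Huawei defaults.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode BFD detection time as defined in RFC 5880.

    The session is declared Down if no control packet arrives within the
    remote detect multiplier times the agreed receive interval.
    """
    agreed_interval = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval


print(bfd_detection_time_ms(10, 10, 3))    # 30
print(bfd_detection_time_ms(50, 100, 3))   # 300
```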

Connecting Border Leaf Nodes to PEs Through Two Layer 3 Interfaces

(Figure: two Layer 3 interfaces. Two independently deployed PEs each connect to one border leaf node over a Layer 3 interface, forming a square-looped topology with a bypass link between the border leaf nodes.)
⚫ Physical networking design:
▫ A square-looped topology is formed. If there are multiple cards, it is recommended that links be deployed across cards.
▫ A Layer 3 bypass link must be deployed.
▫ The M-LAG peer-link has at least two member links across cards to ensure reliability and bandwidth, and a peer-link member link cannot be configured as the bypass link.
• Application scenario: Two PEs are deployed independently and provide two Layer 3 interfaces to connect to active-active border leaf nodes.


• Solution description:
▫ A Layer 3 interface is configured on a border leaf node to connect to a Layer 3 interface on a PE.
▫ Dynamic routing protocols and static routes can be deployed between border leaf nodes and PEs. You are advised to deploy a dynamic routing protocol, for example, BGP.
▫ Fast switchover: Associate static routes with NQA, or dynamic routing protocols with BFD, to detect the peer PE status and accelerate route convergence.
▫ Route convergence: Configure a delay before the interface connecting the border leaf node to the PE goes Up, and a delay for advertising routes after the interface goes from Down to Up, to optimize switchback performance.
▫ When border leaf nodes and spine nodes are deployed independently and there are only a few interconnection interfaces between them, deploy a Monitor Link group to associate the interfaces connecting border leaf nodes to spine nodes with the interfaces connecting the border leaf nodes to firewalls/LBs and PEs, preventing service interruption caused by multi-link faults.
▫ Deploy a dynamic routing protocol over the bypass link so that the two border leaf nodes can advertise egress routing information to each other for egress link protection. Fast Reroute (FRR) can be configured to improve fault convergence performance.

Connecting Border Leaf Nodes to PEs Through One Layer 3 Interface

(Figure: one Layer 3 interface, shown in dual-homed networking and in square-looped networking. VRRP runs on the two PEs, which connect to the border leaf nodes.)
⚫ Physical networking design:
▫ Use a dual-homed or square-looped topology.
• Application scenario: VRRP is deployed on PEs, or firewalls are connected to border leaf nodes in inline mode through static routes.


• Solution description:
▫ Deploy Virtual Router Redundancy Protocol (VRRP) on the two PEs, and configure the same virtual IP address for them.
▫ Deploy the border leaf nodes as an M-LAG to connect to PEs, and configure the same IP address for the border leaf nodes.
▫ Configure static routes between PEs and border leaf nodes to implement connectivity. The next hop of the static route configured on the border leaf node is the VRRP address of the PEs.
▫ Fast switchover: Associate static routes with NQA to detect the peer PE status, accelerating route convergence.
▫ Convergence optimization: Configure a delay before the interconnection interfaces connecting border leaf nodes to PEs go Up, to optimize switchback performance.
▫ When border leaf nodes and spine nodes are deployed independently and there are only a few interconnection interfaces between them, a Monitor Link group needs to be deployed to associate the interfaces connecting border leaf nodes to spine nodes with the interfaces connecting the border leaf nodes to firewalls/LBs and PEs, preventing service interruption caused by multi-link faults.

HA Design Overview
⚫ As the core of the customer's IT infrastructure, the DC stores various data, runs a variety of services, and provides services to external networks. DC faults cause great losses every year. Therefore, stable and reliable running of DCs is critical.
⚫ High availability (HA) design can be divided into three levels:
• Controller:
▫ The controller and forwarder are loosely coupled. If the controller is faulty, delivered services are not affected.
▫ The controller cluster supports protection against node faults.
▫ The controller supports active and standby clusters and provides protection against cluster faults.
• Network layer:
▫ The underlay IP network is deployed on the spine-leaf physical architecture.
▫ A highly reliable overlay network is built based on M-LAG and VXLAN EVPN.
▫ The underlay protocol is isolated from the overlay protocol. Routing protocols of the underlay and overlay networks do not affect each other.
▫ BFD and Monitor Link are supported for fault detection.
• Device layer:
▫ The control, monitoring, and management planes are independent of each other, ensuring system reliability and service continuity.
▫ Main control boards, monitoring boards, switch fabric units, power modules, and fan modules adopt a redundancy design, and components are hot swappable, eliminating single points of failure (SPOFs).


Spine HA Design

(Figure: Spine1 and Spine2 are fully meshed with Leaf1 through Leaf6, forming an IP ECMP network.)
⚫ Link redundancy: Spine nodes are fully meshed with all leaf nodes to form a full-mesh architecture.
⚫ Device redundancy: Multiple high-density DC switches are deployed to implement device-level reliability. In a typical deployment scenario, two or four spine nodes are deployed. The number of deployed spine nodes depends on the live network scale.
⚫ Network redundancy:
▫ In the spine-leaf networking architecture, multiple spine nodes are deployed to construct an IP ECMP load balancing network, implementing network-level reliability.
▫ Routing protocols (BGP/OSPF) are deployed to achieve a highly reliable architecture.


• Fault detection and tolerance design:
▫ When a link to a spine node fails, leaf nodes quickly switch traffic to the remaining normal links through ECMP routes on the underlay network.
▫ If a spine node fails, leaf nodes quickly switch traffic to the other spine nodes through ECMP routes on the underlay network.
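Per-flow ECMP is what makes this switchover transparent: each flow is hashed onto one spine, and when a spine's routes are withdrawn, the leaf hashes flows over the remaining next hops. This Python sketch uses SHA-256 over the 5-tuple as a stand-in for the switch's hardware hash (real switches use their own hash functions and field selections), and the flows are hypothetical.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Map a flow's 5-tuple to one next hop so all its packets take the same spine."""
    key = "|".join(map(str, flow)).encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return next_hops[bucket % len(next_hops)]


# Hypothetical flows: (src IP, dst IP, protocol, src port, dst port).
flows = [("10.1.1.10", "10.1.2.20", 6, 40000 + i, 443) for i in range(1000)]

before = {flow: pick_next_hop(flow, ["Spine1", "Spine2"]) for flow in flows}
# Spine2 fails: its routes are withdrawn and flows rehash onto the remaining next hop.
after = {flow: pick_next_hop(flow, ["Spine1"]) for flow in flows}

print(sum(hop == "Spine1" for hop in before.values()), "of 1000 flows on Spine1 before the fault")
```

Because the hash is deterministic, packets of one flow always take the same path, which avoids reordering while still spreading different flows across all spine nodes.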

Border Leaf HA Design

(Figure: the border leaf nodes connect to the core switch, with OSPF 100 running between them, and are interconnected by a bypass link.)
⚫ Link redundancy:
▫ It is recommended that border leaf nodes be connected to different core switches and spine nodes, or to the same core switch and spine node through multiple links across cards.
▫ At least two links are deployed between border leaf nodes as an Eth-Trunk (used as a peer-link). It is recommended that the links be deployed across cards.
⚫ Device redundancy: Active-active device groups are deployed on border leaf nodes to implement device-level reliability.
⚫ Network redundancy:
▫ It is recommended that border leaf nodes be dual-homed to core switches and be fully meshed with spine nodes to build an IP ECMP network, eliminating Layer 3 loops.
▫ Link aggregation protocols (LAG or M-LAG) and routing protocols (BGP or OSPF) are used to ensure a highly reliable architecture.


• Fault detection and tolerance design:
▫ A bypass link is deployed between border leaf nodes to prevent traffic interruptions caused by faults of all uplink interfaces. Its bandwidth must be at least equal to the total uplink bandwidth of a single device. (The bypass link is optional, but it is required in a square-looped topology.)
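The bypass-link sizing rule is a direct bandwidth comparison: if all uplinks of one border leaf fail, the bypass link must carry everything those uplinks carried. This Python sketch checks that rule; the link speeds (four 100GE uplinks, bypass links of two or four 100GE members) are hypothetical.

```python
def bypass_link_sufficient(bypass_member_gbps, uplink_gbps):
    """Check that a border leaf's bypass link can absorb all of its uplink traffic."""
    return sum(bypass_member_gbps) >= sum(uplink_gbps)


uplinks = [100, 100, 100, 100]  # hypothetical: four 100GE uplinks on one border leaf
print(bypass_link_sufficient([100, 100], uplinks))            # False
print(bypass_link_sufficient([100, 100, 100, 100], uplinks))  # True
```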

Server Leaf HA Design

(Figure: server leaf nodes act as VTEPs with VBDIF interfaces and are paired by peer-links. They reach the border leaf VTEP through VXLAN over the spine, and servers are dual-homed to each server leaf pair.)
⚫ Link redundancy: At least two links are deployed between server leaf nodes as an Eth-Trunk (used as a peer-link). It is recommended that the links be deployed across cards.
⚫ Device redundancy: Server leaf nodes are deployed as an active-active device group in an M-LAG to implement device-level reliability.
⚫ Network redundancy:
▫ Server leaf nodes are fully meshed with spine nodes to form an IP ECMP network, eliminating Layer 3 loops.
▫ Link aggregation protocols (LAG or M-LAG) and routing protocols (BGP or OSPF) are used to ensure a highly reliable architecture.


• Fault detection and tolerance design:

▫ If possible, configure dual-active detection (DAD) based on out-of-band management network ports. Otherwise, configure DAD based on service network ports.

▫ Deploy a Monitor Link group. If all uplinks fail, the associated downlinks go Down, so that traffic is switched to the other device instead of being interrupted.

▫ Configure storm suppression for broadcast, unknown unicast, and multicast (BUM) packets on the downlink ports of leaf nodes. If VLAN 1 is not used, deny packets from VLAN 1 to prevent loops.
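A Monitor Link group ties the state of downlink ports to that of uplink ports. The Python model below is a simplified sketch of that rule, not device configuration. The interface names are hypothetical, and details such as recovery timers are left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MonitorLinkGroup:
    """Simplified model: downlinks follow the collective state of the uplinks."""

    uplinks: Dict[str, bool]  # uplink interface -> True if Up
    downlinks: List[str]

    def downlink_states(self) -> Dict[str, bool]:
        # Downlinks stay Up while at least one uplink is Up. When every uplink
        # is Down, the downlinks are forced Down so that dual-homed servers
        # move traffic to the other M-LAG member instead of black-holing it.
        keep_up = any(self.uplinks.values())
        return {port: keep_up for port in self.downlinks}


# Hypothetical interface names on one server leaf.
group = MonitorLinkGroup(
    uplinks={"100GE1/0/1": True, "100GE1/0/2": True},
    downlinks=["25GE1/0/1", "25GE1/0/2"],
)
print(group.downlink_states())

group.uplinks = {name: False for name in group.uplinks}  # all uplinks fail
print(group.downlink_states())
```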

Firewall HA Design

⚫ Link redundancy: Each firewall is dual-homed to the active-active service leaf device group in an M-LAG through link bundling.

⚫ Device redundancy: Firewalls are deployed in active/standby mode and support hot standby.

[Figure: active and standby firewalls, each dual-homed to a pair of service leaf nodes connected by a peer-link]


• Fault detection and tolerance design:

▫ A heartbeat link is deployed between the active and standby firewalls to check the status of the peer firewall and the connectivity of the heartbeat link.

▫ Each firewall can be connected to service leaf nodes in one-armed or two-armed mode. In one-armed mode, the uplink and downlink logical ports share the same physical connection, so if the uplink logical port fails, the downlink logical port also fails, and vice versa. Fault detection is therefore faster.

▫ Link fault: If any link between the active firewall and a service leaf node fails, the M-LAG member interface goes Down, and the dual-homing networking changes to single-homing networking, without affecting traffic forwarding on the active firewall.

▫ Firewall fault: After an active/standby firewall switchover, the new active firewall sends gratuitous ARP packets to trigger the update of forwarding entries on the service leaf nodes and divert traffic to the new active firewall.
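In a gratuitous ARP packet, the sender and target IP addresses are the same, so receivers refresh their ARP and MAC entries for that address. The Python sketch below builds such a frame byte by byte to show its layout. The MAC and IP addresses are hypothetical, and the code only constructs bytes (sending them would need a raw socket). The source does not say whether the firewall sends the request or the reply form, so the sketch assumes the request form.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, sender MAC, EtherType 0x0806 (ARP)
    eth_header = b"\xff" * 6 + mac_bytes + struct.pack("!H", 0x0806)

    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode 1 = request (assumed form)
        mac_bytes,    # sender MAC: the new active firewall
        ip_bytes,     # sender IP
        b"\x00" * 6,  # target MAC: unused in a gratuitous request
        ip_bytes,     # target IP == sender IP, which makes the ARP "gratuitous"
    )
    return eth_header + arp_payload


# Hypothetical MAC and gateway IP address taken over by the new active firewall.
frame = build_gratuitous_arp("00:e0:fc:12:34:56", "10.1.1.254")
print(len(frame), frame.hex())
```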

Examples of HA Design Fault Scenarios


[Figure: six fault scenarios in a spine-leaf network with servers attached in active/standby mode; the legend distinguishes normal traffic forwarding from failover traffic]

• Fault scenario 1: The standby NIC or standby link is faulty.
• Fault scenario 2: The active NIC or active link is faulty.
• Fault scenario 3: The uplink of a leaf node is faulty.
• Fault scenario 4: A leaf node is faulty.
• Fault scenario 5: A spine node is faulty.
• Fault scenario 6: A spine node is faulty.

Contents

1. Data Center Network Overview

2. Network Architecture Design and Data Planning

3. Underlay Network Design

4. Overlay Network Design

5. Network Security Design

6. Network Management and O&M Design


Overlay Networking Design (1)


⚫ There are three types of overlay networks: network overlay, host overlay, and hybrid overlay. In network overlay mode, both endpoints of a VXLAN tunnel are physical switches. Network overlay networking is classified into centralized network overlay and distributed network overlay.

[Figure: distributed and centralized network overlay. In each topology, spine nodes connect server leaf, service leaf, and border leaf nodes, which attach physical servers, virtual servers, firewalls, load balancers, and egress PEs. The legend distinguishes VTEPs acting as VXLAN Layer 3 gateways from VTEPs acting as VXLAN Layer 2 gateways.]


• Centralized network overlay: VXLAN Layer 3 gateways are deployed in a centralized manner, and leaf nodes function only as VXLAN Layer 2 bridges.

• Distributed network overlay: Leaf nodes function as VXLAN Layer 3 gateways, and spine nodes are used only for IP forwarding.
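Whichever overlay mode is used, each VTEP encapsulates tenant frames with the VXLAN header defined in RFC 7348. This header is 8 bytes long, carries a flags byte and a 24-bit VNI, and is sent over UDP with destination port 4789. The Python sketch below packs and parses only that header to illustrate the encapsulation that VTEPs perform. The VNI value is hypothetical.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned VXLAN destination UDP port (RFC 7348)
I_FLAG = 0x08000000     # "VNI valid" flag in the first 32-bit word


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 1 << 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + reserved (24 bits); word 2: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", I_FLAG, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header after checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not flags_word & I_FLAG:
        raise ValueError("I flag not set: VNI field is not valid")
    return vni_word >> 8


header = build_vxlan_header(5010)  # hypothetical L2VNI
print(header.hex(), parse_vni(header))
```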

Overlay Networking Design (2)


⚫ In host overlay mode, all VXLAN tunnel endpoints are deployed on software switches (vSwitches) installed on servers. That is, both ends of each VXLAN tunnel are software switches.

[Figure: spine and leaf nodes above servers whose vSwitches act as VTEPs for the attached VMs]


Overlay Networking Design (3)


⚫ In hybrid overlay networking, VXLAN tunnel endpoints are deployed on both hardware and software switches. That is, the endpoints of VXLAN tunnels can be hardware switches or software switches.

[Figure: VTEPs on a leaf switch and on a server vSwitch, with VMs attached to the vSwitches]


Overlay Routing Design: EVPN Deployment

[Figure: single-PoD EVPN in AS 65001; Spine1 and Spine2 establish IBGP EVPN peer relationships with server leaf, service leaf, and border leaf nodes]

⚫ Solution design:
 Distributed VXLAN EVPN gateways are deployed. Each leaf node functions as both a Layer 2 gateway and a Layer 3 gateway, and traffic is forwarded along the shortest path.
 IBGP EVPN is deployed, and loopback addresses are used to establish IBGP peer relationships.
 Configuring IBGP RRs reduces the number of fully meshed connections between IBGP peers, simplifying configurations and reducing device resource consumption. It is recommended that spine nodes be used as RRs.
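The benefit of RRs can be quantified. Assume every leaf node is an EVPN speaker. A full mesh of n leaf nodes then needs n(n-1)/2 IBGP sessions, whereas with r RRs each leaf peers only with the RRs. The sketch also assumes that the RRs are fully meshed with each other. The Python code below compares the two for a hypothetical PoD size.

```python
def full_mesh_sessions(clients: int) -> int:
    """IBGP sessions needed when every EVPN speaker peers with every other one."""
    return clients * (clients - 1) // 2


def rr_sessions(clients: int, reflectors: int) -> int:
    """Sessions when each client peers with each RR and the RRs peer with each other."""
    return clients * reflectors + reflectors * (reflectors - 1) // 2


# Hypothetical single PoD: 16 leaf nodes, 2 spine nodes acting as RRs.
print(full_mesh_sessions(16))  # 120
print(rr_sessions(16, 2))      # 33
```

Adding a 17th leaf adds 16 sessions to the full mesh but only 2 when RRs are used, which is why the configuration and resource savings grow with fabric size.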


• Other planning description:

▫ Run the undo policy vpn-target command on RRs to disable VPN target-
based filtering for VPN routes or label blocks.

▫ Configure a delay for the interface connecting the border leaf node to the
PE to go Up to optimize the traffic switchback performance.

Service Model: Tenant Service Model (1)


⚫ Understanding the terminology, background, and implementation principles of service provisioning helps you quickly master tenant network interconnection in a computing scenario.

[Figure: tenant service model. A tenant contains VPCs; within a VPC, logical ports attach to logical switches, which connect to a logical router; an optional firewall connects the logical router to the external network.]

• Tenant: the minimum unit for enterprise service management.
• Virtual Private Cloud (VPC): provides secure and reliable information processing, storage, and transmission services to tenants through network virtualization and encryption technologies based on network, storage, and compute resources. Multiple VPCs can be created for a tenant based on service requirements.
• Logical router: a router virtualized from a network device running virtualization software. It connects VMs on different networks so that they can communicate with each other at Layer 3. One network device can be virtualized into multiple logical routers for different tenants.
• Logical switch: connects different VMs so that they can communicate with each other at Layer 2. One network device can be virtualized into multiple logical switches for different tenants.


• One network device can be virtualized into multiple logical routers for different
tenants. Multiple tenants can share a network device. For each tenant, a logical
router functions as an independent and real router with independent hardware
and software resources and running space. Services on different logical routers do
not affect each other. In terms of experience, there is no difference between a
logical router and a real router.

• One network device can be virtualized into multiple logical switches for different
tenants. Multiple tenants can share a network device. For each tenant, a logical
switch functions as an independent and real switch with independent software
and hardware resources and running space. Services on different logical switches
do not affect each other. In terms of experience, there is no difference between a
logical switch and a real switch.
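Isolation between logical routers can be pictured as separate routing tables on one device, which is essentially what a VRF provides. In the Python sketch below, two tenants reuse the same 192.168.1.0/24 subnet without interfering with each other. The tenant names and next-hop labels are hypothetical, and only IPv4 is modeled.

```python
import ipaddress
from typing import Dict, Optional


class LogicalRouter:
    """One tenant's routing context, modeled as its own routing table (like a VRF)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.routes: Dict[ipaddress.IPv4Network, str] = {}

    def add_route(self, prefix: str, next_hop: str) -> None:
        self.routes[ipaddress.IPv4Network(prefix)] = next_hop

    def lookup(self, destination: str) -> Optional[str]:
        """Longest-prefix match within this logical router only."""
        address = ipaddress.IPv4Address(destination)
        matches = [net for net in self.routes if address in net]
        if not matches:
            return None
        return self.routes[max(matches, key=lambda net: net.prefixlen)]


class NetworkDevice:
    """A physical device hosting one logical router per tenant."""

    def __init__(self) -> None:
        self.logical_routers: Dict[str, LogicalRouter] = {}

    def logical_router(self, tenant: str) -> LogicalRouter:
        if tenant not in self.logical_routers:
            self.logical_routers[tenant] = LogicalRouter(f"LR-{tenant}")
        return self.logical_routers[tenant]


device = NetworkDevice()
device.logical_router("TenantA").add_route("192.168.1.0/24", "VBDIF10")
device.logical_router("TenantB").add_route("192.168.1.0/24", "VBDIF20")
print(device.logical_router("TenantA").lookup("192.168.1.10"))
print(device.logical_router("TenantB").lookup("192.168.1.10"))
```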

Service Model: Tenant Service Model (2)

[Figure: tenant service model: tenant, VPC, external network, logical router, optional firewall, logical switch, and logical ports]

• Logical port: functions as an access point for VMs to access the network. One physical port on a network device can be virtualized into multiple logical ports for different tenants. For each tenant, a logical port functions as an independent and real port.
• External network: networks outside the tenant's management, such as the Internet or other tenant networks connected through VPNs.
• Firewall: The firewall function is provided by a physical firewall or virtual firewall.
• VM: virtual machine.


• Located at the border of a network, a firewall implements secure access control between the external network and internal network, which enhances network protection. It protects service data flows between the Untrust and Trust zones based on 5-tuple information. It can also be used for access control between subnets. Whether to deploy a firewall depends on whether the tenant needs to access an external network. For security purposes, deploy a firewall when a tenant is connected to an external network.

• In the computing scenario, VMs are provisioned by the virtual machine manager (VMM) connected to iMaster NCE-Fabric. The VMM manages compute resources, and iMaster NCE-Fabric manages network resources.

Relationship Between a Physical Network and a Logical Network

[Figure: logical network, physical network, and abstract network model side by side. L4-L7: NAT, firewall, IPsec, and WAF services toward the external network and Internet. L3: logical router (VRF, Layer 3 gateway). L2: logical switch (bridge domain, Layer 2 gateway) on the spine-leaf fabric. L1: logical ports (sub-interfaces and end ports) connecting VMs, physical servers, and firewalls.]


• A fabric network is a logical network constructed based on the VXLAN technology and provides services such as VM access, physical machine access, VPN access, Internet access, server load balancing, IP address translation, and ACL-based packet filtering. Tenants can focus on services and construct logical networks as required using services provided by a fabric network. You need to consider the following factors when constructing a tenant network:

▫ Number of VPCs planned based on service types or security requirements.

▫ Security policies planned for access authorization between VPCs.

▫ Subnets planned for each VPC.

▫ Routes planned for communication between subnets.

▫ Resources such as VNIs and BDs allocated for each subnet.

• A logical network provides the following services based on the fabric network:

▫ Logical port: Logical ports are located at the bottom of a logical network and provide access to the VXLAN network from VMs, physical machines, NAT devices, IPsec VPNs, firewalls, and WAFs.

▫ Logical switch: Logical switches are located at the second layer of a logical network and provide the switching service between logical ports.

▫ Logical router: Logical routers are located at the third layer of a logical network and provide the routing service between logical ports.

▫ NAT devices, IPsec VPNs, firewalls, and WAFs: They are located at Layer 4 to Layer 7 of a logical network and provide advanced services.
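The planning factors listed above (VPCs, subnets per VPC, and VNIs and BDs per subnet) can be tracked with a small data model. The Python sketch below hands out BD IDs, L2VNIs, and L3VNIs from simple counters. The starting values and the VRF naming scheme are hypothetical, and in practice the controller allocates these from its own resource pools. Overlapping subnets are rejected inside one VPC but allowed across VPCs.

```python
import ipaddress
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List


@dataclass
class Subnet:
    cidr: ipaddress.IPv4Network
    bd: int      # bridge domain ID
    l2vni: int   # Layer 2 VNI bound to the BD


@dataclass
class Vpc:
    name: str
    vrf: str
    l3vni: int   # Layer 3 VNI bound to the VPC's VRF
    subnets: List[Subnet] = field(default_factory=list)


class FabricPlanner:
    """Hands out BD IDs and VNIs from hypothetical starting values."""

    def __init__(self, first_bd: int = 10, first_l2vni: int = 10010, first_l3vni: int = 5010) -> None:
        self._bd = count(first_bd)
        self._l2vni = count(first_l2vni)
        self._l3vni = count(first_l3vni)
        self.vpcs: Dict[str, Vpc] = {}

    def create_vpc(self, name: str) -> Vpc:
        vpc = Vpc(name=name, vrf=f"VRF-{name}", l3vni=next(self._l3vni))
        self.vpcs[name] = vpc
        return vpc

    def add_subnet(self, vpc_name: str, cidr: str) -> Subnet:
        vpc = self.vpcs[vpc_name]
        network = ipaddress.IPv4Network(cidr)
        # Subnets must not overlap inside one VPC; different VPCs may reuse ranges.
        for existing in vpc.subnets:
            if existing.cidr.overlaps(network):
                raise ValueError(f"{cidr} overlaps {existing.cidr} in {vpc_name}")
        subnet = Subnet(cidr=network, bd=next(self._bd), l2vni=next(self._l2vni))
        vpc.subnets.append(subnet)
        return subnet


planner = FabricPlanner()
planner.create_vpc("VPC1")
planner.create_vpc("VPC2")
web = planner.add_subnet("VPC1", "192.168.1.0/24")
app = planner.add_subnet("VPC1", "192.168.2.0/24")
reused = planner.add_subnet("VPC2", "192.168.1.0/24")  # allowed: different VPC
for vpc in planner.vpcs.values():
    print(vpc.name, vpc.vrf, vpc.l3vni, [(str(s.cidr), s.bd, s.l2vni) for s in vpc.subnets])
```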

Manually Orchestrating Logical Networks (1)


[Figure: logical networks 1 and 2, each consisting of an external network, a logical router (R), a logical VAS, and logical switches 1 and 2 (S), are created through logical network orchestration and mapped onto a centrally managed and controlled physical network]

⚫ Huawei iMaster NCE-Fabric uses a logical model to define networks and divides the physical network into multiple independent logical networks to virtualize network functions.

⚫ Implementation principle:
 A physical network is divided into multiple logical networks by configuring VRF and BD features.
 On iMaster NCE-Fabric, logical networks are created based on a logical network model that network engineers can easily understand, and are automatically mapped to network features such as VRFs and BDs on the physical network. In this way, iMaster NCE-Fabric centrally manages and controls the switches on the physical network.


Manually Orchestrating Logical Networks (2)

Solution Advantages
• Drag-and-drop configuration is intuitive and visual. Each operation step is guided, and the operation interface is user-friendly.

Application Scenario
• This mode is applicable to users who are not familiar with configuration operations, or to small-scale scenarios where services are configured manually.


• Create a single service VPC. In the VPC view, drag different logical units to
complete network deployment.

Overlay Forwarding Design Overview


⚫ A DCN carries various types of traffic, such as north-south traffic between the inside and outside of the DC and east-west traffic within the DC. This traffic may or may not pass through a firewall. In all of these scenarios, the controller can orchestrate traffic forwarding in a unified manner.

[Figure: firewall and core (PE) attached to service and border leaf nodes; spine nodes; server leaf nodes (paired by a peer-link) hosting VPC1 subnets 192.168.1.0/24 and 192.168.2.0/24 and VPC2 subnet 172.16.1.0/24]

East-west traffic in the same VPC of a DC:
▫ Communication between VMs on the same subnet in a VPC
▫ Communication between VMs on different subnets in a VPC

East-west traffic between different VPCs in a DC:
▫ Communication between VMs in different VPCs, with traffic passing through a firewall
▫ Communication between VMs in different VPCs, with traffic not passing through a firewall

North-south traffic between inside and outside a DC:
▫ North-south traffic passing through a firewall
▫ North-south traffic not passing through a firewall


Intra-VPC Communication
⚫ East-west traffic in a VPC is classified into intra-subnet Layer 2 traffic and inter-subnet Layer 3 traffic.

[Figure: physical model. A border leaf (VRF1) and server leaf 1 and 2 (VRF1, BD1, BD2) are BGP EVPN peers. VMs/bare-metal servers (BMs) 192.168.1.1/24 and 192.168.2.1/24 attach to server leaf 1; 192.168.1.2/24 and 192.168.2.2/24 attach to server leaf 2.]

• Intra-subnet communication in a VPC:
▫ Create a BD, associate it with an L2VNI, and create an EVPN instance. Then, create an access interface to associate the Layer 2 sub-interface with the BD.
▫ The local leaf node learns the MAC address of the host and generates a MAC route. The route is then advertised to the remote leaf node through BGP EVPN, and the remote leaf node receives the route through route target (RT)-based route leaking.

• Inter-subnet communication in a VPC:
▫ In addition to the preceding operations, create a VBDIF interface as the gateway and associate the VBDIF interface with a VRF and an L3VNI.
▫ After learning the host ARP entry and generating an IRB route, the local leaf node advertises the route to the remote leaf node through the BGP EVPN peer relationship. The remote leaf node receives the route through RT-based route leaking.


• From the perspective of the logical network, intra-VPC communication services are orchestrated as follows:

▫ For communication within a subnet in a VPC, use the SDN controller to create a tenant and a VPC, and then create logical switches, logical ports, and end ports.

▫ For communication across subnets in a VPC, associate different logical switches with the same logical router in addition to the preceding operations.
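Both the intra-subnet and inter-subnet cases rely on the same receive-side rule. A leaf accepts an EVPN route only into the local instances whose import RTs match an RT carried in the route. The Python sketch below models just that matching step. The RT values, instance names, MAC address, and VTEP address are hypothetical. The sketch also assumes that an IRB route carries both the Layer 2 (BD) RT and the Layer 3 (VRF) RT. Real BGP EVPN processing, such as best-path selection, next-hop resolution, and VNI checks, is omitted.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class EvpnRoute:
    route_type: int         # 2 = MAC/IP advertisement route
    mac: str
    ip: Optional[str]       # set for IRB routes (MAC + host IP), None for MAC-only routes
    next_hop_vtep: str      # VTEP address of the advertising leaf
    rts: FrozenSet[str]     # route targets carried by the route


def accepting_instances(route: EvpnRoute, import_rts: Dict[str, FrozenSet[str]]) -> List[str]:
    """Local EVPN instances/VRFs whose import RTs share at least one RT with the route."""
    return sorted(name for name, rts in import_rts.items() if rts & route.rts)


# Hypothetical import RTs on server leaf 2.
leaf2_import_rts = {
    "BD1": frozenset({"100:1"}),    # EVPN instance of BD1 (192.168.1.0/24)
    "BD2": frozenset({"200:1"}),    # EVPN instance of BD2 (192.168.2.0/24)
    "VRF1": frozenset({"1000:1"}),  # tenant VRF bound to the L3VNI
}

mac_route = EvpnRoute(2, "00:00:5e:00:53:01", None, "10.0.0.11", frozenset({"100:1"}))
irb_route = EvpnRoute(2, "00:00:5e:00:53:01", "192.168.1.1", "10.0.0.11",
                      frozenset({"100:1", "1000:1"}))

print(accepting_instances(mac_route, leaf2_import_rts))
print(accepting_instances(irb_route, leaf2_import_rts))
```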

Inter-VPC Communication
⚫ A VPC dynamically divides a physical network into logical network resource domains, including logical networks and logical VASs. Depending on security access control requirements, access between VPCs can be implemented through firewalls at two sides, a firewall at one side, or no firewall, and can be flexibly orchestrated on the SDN controller.

[Figure: physical model. Tenant VPC 1 (server network segment 10.1.1.0/24, VRF1) and tenant VPC 2 (server network segment 10.1.2.0/24, VRF2) attach to server leaf nodes. A combined border leaf and service leaf node holds VRF1 and VRF2, each with a static route to firewall 1 (vSYS1 and vSYS2). Numbers 1 to 4 correspond to the steps below.]

• When traffic passes through the same firewall group at two sides and the border leaf and service leaf nodes are combined:
1. In VRF1 on the border leaf node, configure a static route to 10.1.2.0/24 with the next hop being the IP address of the firewall, import the route to BGP VPN-Instance VRF1, and advertise the route to EVPN.
2. In VRF2 on the border leaf node, configure a static route to 10.1.1.0/24 with the next hop being the IP address of the firewall, import the route to BGP VPN-Instance VRF2, and advertise the route to EVPN.
3. The border leaf node sends the static routes destined for 10.1.2.0/24 and 10.1.1.0/24 to the server leaf nodes through BGP EVPN. Each server leaf node selects routes based on the VPN RT value, which differs for each VPN.
4. The server leaf node sends the host routes destined for 10.1.1.1 and 10.1.2.1 to the border leaf node through BGP EVPN. The border leaf node selects routes based on the VPN RT value, which differs for each VPN.


• If traffic does not pass through a firewall:

1. In VRF1 on the border leaf node, configure a static route to 10.1.2.0/24 with the next hop being the IP address of VRF2, import the route to BGP VPN-Instance VRF1, and advertise the route to EVPN.

2. In VRF2 on the border leaf node, configure a static route to 10.1.1.0/24 with the next hop being the IP address of VRF1, import the route to BGP VPN-Instance VRF2, and advertise the route to EVPN.

3. The border leaf node sends static routes destined for 10.1.2.0/24 and 10.1.1.0/24 to the server leaf nodes through BGP EVPN. Each server leaf node selects a route based on the VPN RT value. The RT value varies depending on the VPN.

4. The server leaf node sends the host routes destined for 10.1.1.1 and 10.1.2.1 to the border leaf node through BGP EVPN. The border leaf node selects a route based on the VPN RT value. The RT value varies depending on the VPN.
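Steps 1 and 2 in both variants follow one pattern. Each tenant VRF on the border leaf gets a static route to every other VPC's server segment, and only the next hop differs: the firewall interconnection address, or the peer VRF. The Python sketch below generates that route list. The firewall interconnection addresses are hypothetical, and importing the routes into BGP and advertising them through EVPN (steps 3 and 4) is not modeled.

```python
import ipaddress
from typing import Dict, List, Optional, Tuple

StaticRoute = Tuple[str, str, str]  # (VRF on the border leaf, destination prefix, next hop)


def inter_vpc_static_routes(vpc_segments: Dict[str, str],
                            firewall_next_hops: Optional[Dict[str, str]] = None) -> List[StaticRoute]:
    """List the static routes each tenant VRF needs to reach every other VPC.

    With a firewall, a VRF's next hop is its firewall interconnection IP address;
    without one, the next hop is the peer VRF itself (shown here by name).
    """
    routes: List[StaticRoute] = []
    for vrf in vpc_segments:
        for peer_vrf, peer_segment in vpc_segments.items():
            if peer_vrf == vrf:
                continue
            prefix = str(ipaddress.IPv4Network(peer_segment))  # validate and normalize
            next_hop = firewall_next_hops[vrf] if firewall_next_hops is not None else peer_vrf
            routes.append((vrf, prefix, next_hop))
    return routes


segments = {"VRF1": "10.1.1.0/24", "VRF2": "10.1.2.0/24"}
# Hypothetical firewall interconnection addresses (vSYS1 and vSYS2 sides).
via_firewall = inter_vpc_static_routes(segments, {"VRF1": "172.16.10.2", "VRF2": "172.16.20.2"})
direct = inter_vpc_static_routes(segments)
print(via_firewall)
print(direct)
```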

Communication Between a VPC and an External Network


⚫ When an SDN network needs to communicate with an external network, create an external network on the controller and associate it with an external gateway. If traffic needs to pass through a firewall, also create a logical VAS and associate the logical router and logical VAS with the external network to enable north-south service traffic.

[Figure: physical model. Tenant VPC 1 (server network segment 10.1.1.0/24; VRF1 and BD1 on the server leaf) connects to a combined border leaf and service leaf node holding VRF1 and Ext_VRF. A firewall with vSYS1 and Ext_vSYS sits between them, and Ext_VRF connects to a PE and the external network 10.0.0.0/8 through static routes or BGP. Numbers 1 to 4 mark the route configuration points.]


• When traffic passes through firewalls at two sides and the firewalls are connected in bypass mode to border leaf nodes (combined with service leaf nodes) or to service leaf nodes:

▫ Deliver a static route destined for the external network with the next hop
being the firewall interconnection IP address to the tenant VRF (VRF1) on
the border leaf node. Import the static route to the tenant VRF and
advertise the route to the server leaf node through BGP EVPN.

▫ On the border leaf node, create an external gateway egress VRF (Ext_VRF),
and configure a static route pointing to the network segment of a VM or
server with the next hop being the firewall interconnection address. If the
border leaf and service leaf nodes are deployed independently, the static
route needs to be imported to the egress VRF and advertised to the border
leaf node through BGP EVPN.

▫ Static routes or BGP routes can be used between the external gateway
egress VRF on the border leaf node and the PE.

▫ In the egress vSYS on the firewall, configure a static route pointing to the
tenant vSYS on the firewall and a static route pointing to the egress VRF on
the border leaf node. In the tenant vSYS on the firewall, configure a static
route pointing to the egress vSYS on the firewall and a static route pointing
to the tenant VRF on the border leaf node.

• If traffic does not pass through a firewall:

▫ Configure static routes between the tenant VRF and egress VRF on the
border leaf node (service leaf node).
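The four groups of routes above form a chain of routing contexts between the tenant VRF and the PE. The Python sketch below traces a destination through per-context tables using longest-prefix match. The context names follow the figure, but the table contents are assumptions for illustration, and "ServerLeaf" and "PE" are treated as exits. Because the tenant segment 10.1.1.0/24 falls inside the external 10.0.0.0/8 range shown in the figure, longest-prefix match is what keeps return traffic pointed at the tenant.

```python
import ipaddress
from typing import Dict, List

# Hypothetical routing tables: routing context -> {destination prefix: next context}.
TABLES: Dict[str, Dict[str, str]] = {
    "VRF1":     {"10.0.0.0/8": "vSYS1",    "10.1.1.0/24": "ServerLeaf"},  # tenant VRF on the border leaf
    "vSYS1":    {"10.0.0.0/8": "Ext_vSYS", "10.1.1.0/24": "VRF1"},        # tenant vSYS on the firewall
    "Ext_vSYS": {"10.0.0.0/8": "Ext_VRF",  "10.1.1.0/24": "vSYS1"},       # egress vSYS on the firewall
    "Ext_VRF":  {"10.0.0.0/8": "PE",       "10.1.1.0/24": "Ext_vSYS"},    # egress VRF on the border leaf
}


def next_context(context: str, destination: str) -> str:
    """Longest-prefix match in one context's table."""
    address = ipaddress.IPv4Address(destination)
    candidates = [ipaddress.IPv4Network(prefix) for prefix in TABLES[context]
                  if address in ipaddress.IPv4Network(prefix)]
    if not candidates:
        raise LookupError(f"no route to {destination} in {context}")
    best = max(candidates, key=lambda net: net.prefixlen)
    return TABLES[context][str(best)]


def trace(start: str, destination: str, max_hops: int = 8) -> List[str]:
    """Follow next contexts until leaving the modeled tables."""
    path = [start]
    while path[-1] in TABLES:
        if len(path) > max_hops:
            raise RuntimeError("routing loop suspected")
        path.append(next_context(path[-1], destination))
    return path


print(trace("VRF1", "10.200.0.1"))   # outbound to the external network
print(trace("Ext_VRF", "10.1.1.1"))  # return traffic to VM 10.1.1.1
```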
Contents

1. Data Center Network Overview

2. Network Architecture Design and Data Planning

3. Underlay Network Design

4. Overlay Network Design

5. Network Security Design

6. Network Management and O&M Design

CloudFabric Security Architecture
⚫ Security at the network layer of a DC is classified as follows:
 Intra-DC security: intra-VPC and inter-VPC security
 Inter-DC security: intra-VPC and inter-VPC security
 DC-external network security: north-south traffic security

[Figure: layered DC security architecture. DC border security toward the Internet/WAN: anti-DDoS, firewall, SVN, IPS, and sandbox. Zone border security: vulnerability scanning. Intra-DC network security: firewalls and flow probes per zone, cloud platform firewall, SecoManager, WAFs, database audit, and CIS. Virtualization security. Zones: office zone, server zone, management zone, and DMZ.]


• Virtualization security:

▫ Security groups of the cloud platform are used to protect VMs. Through orchestration, the cloud platform adds VMs that require security control to different security groups and defines security policies between security groups for access control.

• Intra-DC security:

▫ Office zone:

▪ Access control policies are configured on firewalls to secure access between intranet office users.

▪ Cybersecurity Intelligence System (CIS) flow probes can be deployed to collect traffic in the office zone for in-depth threat detection.

▫ Server zone:

▪ Firewalls or web application firewalls (WAFs) are used to protect intranet servers.

▪ Database audit is used to protect database servers.

▪ CIS flow probes can be deployed to collect traffic in the server zone for in-depth threat detection, preventing intranet threats from spreading.
Intra-DC Security Deployment

[Figure: firewall and core (PE) attached to service and border leaf nodes above the spine; server leaf nodes (paired by a peer-link) host VPC1 subnets 192.168.1.0/24 and 192.168.2.0/24 and VPC2 subnet 172.16.1.0/24]

Traffic in the same VPC can be forwarded directly through VXLAN. Alternatively, service function chains (SFCs) can redirect traffic within a subnet or between subnets to VAS devices for in-depth security access control.

Traffic between VPCs must be isolated, and firewalls control this traffic for security.

When users outside the DC access servers in a VPC inside the DC, VAS devices such as IPSs, firewalls, and LBs implement security access control or load balancing.

Security Deployment Between DCs and External Networks

[Figure: FW and core (PE) attached to service and border leaf nodes; spine; server leaf nodes (paired by a peer-link) hosting VPC1 subnets 192.168.1.0/24 and 192.168.2.0/24 and VPC2 subnet 172.16.1.0/24]

When a VPC in a data center accesses an external network, traffic is first diverted to a VAS device, such as a firewall or an IPS, for SNAT or security access control, and is then forwarded to the external network.
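SNAT lets many private VPC addresses share a public address. The firewall rewrites the source IP address and port and records the mapping so that return traffic can be translated back. The Python sketch below shows the core table for a single hypothetical public address, 203.0.113.10, taken from a documentation range. Real firewalls track full sessions, age them out, and support address pools, none of which is modeled here.

```python
from typing import Dict, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class SourceNat:
    """Minimal port-based SNAT (PAT) table for one public address."""

    def __init__(self, public_ip: str, first_port: int = 1024, last_port: int = 65535) -> None:
        self.public_ip = public_ip
        self._ports = iter(range(first_port, last_port + 1))
        self._outbound: Dict[Endpoint, Endpoint] = {}
        self._inbound: Dict[Endpoint, Endpoint] = {}

    def translate_outbound(self, private: Endpoint) -> Endpoint:
        """Map a private source to the public IP address and a port, reusing existing mappings."""
        if private not in self._outbound:
            try:
                port = next(self._ports)
            except StopIteration:
                raise RuntimeError("SNAT port pool exhausted") from None
            public = (self.public_ip, port)
            self._outbound[private] = public
            self._inbound[public] = private
        return self._outbound[private]

    def translate_inbound(self, public: Endpoint) -> Endpoint:
        """Map return traffic back to the private source; unknown ports have no session."""
        if public not in self._inbound:
            raise KeyError(f"no SNAT session for {public}")
        return self._inbound[public]


nat = SourceNat("203.0.113.10")
first = nat.translate_outbound(("192.168.1.1", 50000))   # VPC1 host
second = nat.translate_outbound(("192.168.2.1", 50000))  # another host, same source port
print(first, second, nat.translate_inbound(first))
```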

Inter-DC Security Deployment
[Figure: multi-PoD scenario: an arbitration node, SecoManager (active/standby), and iMaster NCE-Fabric (active/standby) manage Fabric 1 and Fabric 2, with VAS pools 1 and 2 under a consistent policy, active and standby egresses, and a large Layer 2 network spanning VPC 1. Multi-site scenario: a cloud management platform calls, through APIs, a separate SecoManager and iMaster NCE-Fabric for each fabric; VPC 1 and VPC 2 span Fabric 1 and Fabric 2 through VXLAN Layer 2 and Layer 3 interconnection, with VAS pools 1 and 2.]

Multi-PoD scenario: The controller cluster manages multiple fabrics, and a unified VXLAN domain exists between fabrics. If the centralized egress is used, the situation is the same as that in the single-DC scenario. If the active and standby egresses are used, a group of firewalls in active/standby mirroring mode must be deployed in each of the two fabrics. The controller cluster delivers the security policy to the two groups of firewalls. Firewalls in different DCs do not synchronize sessions.

Multi-site scenario: Two controller clusters independently manage their DCs. Each DC has its own VXLAN domain, and the two DCs implement Layer 2 or Layer 3 communication through segment VXLAN. During Layer 3 communication, traffic can be orchestrated to pass through firewalls in one DC or in both DCs, allowing security policies to be deployed flexibly.


• Note:

▫ Arbitration service: supports site private network monitoring. It periodically checks the network connectivity of the active, standby, and third-party sites and notifies these sites of the monitoring result through the communication links between the arbitration nodes. If the arbitration heartbeat is abnormal due to a network exception or site fault, the arbitration service uses an internal algorithm to select the optimal site on the current network, implementing automatic switchover between the active and standby sites.
Firewall Hot Standby Design
⚫ You are advised to deploy firewalls in hot standby mode to improve reliability.

[Figure: active and standby firewalls connected in bypass mode through link aggregation to border leaf nodes (cluster/stack), with a hot standby link between the firewalls. Left: normal forwarding through the active firewall. Right: after the active firewall fails, service packets are forwarded through the standby firewall.]

⚫ As shown in the figure, firewalls are connected to border leaf nodes in bypass mode. The two firewalls are configured with the hot standby function and interconnected through heartbeat (hot standby) links.
⚫ If the active firewall is faulty, the standby firewall takes over services from the active firewall and forwards service packets.
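The standby firewall decides to take over when heartbeats from the active firewall stop arriving. The Python sketch below models only that decision and uses hypothetical timer values. Actual hot standby also synchronizes sessions and configuration between the firewalls, and a switchover may also be triggered by other events, such as failures of monitored interfaces. These are not modeled. Time is passed in explicitly so that the logic can be tested without a clock.

```python
from dataclasses import dataclass


@dataclass
class HotStandbyMonitor:
    """Standby-side view of the heartbeat link (timer values are hypothetical)."""

    interval_s: float = 1.0       # expected heartbeat interval
    missed_limit: int = 3         # missed heartbeats tolerated before takeover
    last_heartbeat_s: float = 0.0
    role: str = "standby"

    def on_heartbeat(self, now_s: float) -> None:
        """Record a heartbeat received from the active firewall."""
        self.last_heartbeat_s = now_s

    def check(self, now_s: float) -> str:
        """Take over as active once the peer has been silent for too long."""
        silent_for = now_s - self.last_heartbeat_s
        if self.role == "standby" and silent_for > self.interval_s * self.missed_limit:
            self.role = "active"  # take over services and start forwarding packets
        return self.role


monitor = HotStandbyMonitor()
monitor.on_heartbeat(10.0)
print(monitor.check(12.0))  # 2 s of silence
print(monitor.check(13.5))  # 3.5 s of silence
```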
Security Zone Design
⚫ A security zone, also known as a zone, is a collection of networks connected through one or more interfaces, where users have the same security attributes. There are three typical types of security zones: Trust, DMZ, and Untrust.
 The Trust zone has a high security level. It is typically used to define the zone where intranet users are located.
 The DMZ has a medium security level. It is typically used to define the zone where the servers that provide services for external networks are located.
 The Untrust zone has a low security level. It is typically used to define insecure networks such as the Internet.

[Figure: the Internet and WAN in the Untrust zone outside the egress; the DC fabric in the Trust zone]

Security zone planning:
• Generally, the intranet of a DC is considered secure, and security threats mainly come from outside the DC.
• Therefore, the Internet is assigned to the Untrust zone and the DC intranet to the Trust zone, and security devices are deployed at the egress to isolate the intranet from the external network and defend against external threats.


• Most security policies are implemented based on security zones. Each security zone identifies a network, and a firewall connects networks. Firewalls use security zones to divide networks and mark the paths of packets. When packets travel between security zones, a security check is triggered and the corresponding security policies are enforced. Security zones are isolated by default.
Security Policy Design
⚫ After security zones are created on the firewall, these security zones are isolated from each other by default. To enable communication between security zones (for example, for the intranet to access the Internet), configure Layer 3 connectivity and security policies on the firewall.

[Figure: a firewall connecting the Internet (Untrust), the DMZ, and intranet virtual networks VN1 (VN1-Trust) and VN2 (VN2-Trust). Security policy 2 (intrusion detection, antivirus, URL filtering) applies to the path from the Internet to the DMZ; security policy 1 (intrusion detection, antivirus) applies to the path from VN1 to VN2.]

Recommended security policy design for common zones (access source | trustworthiness | access zone | recommended security policies):
• External users | Untrusted | Internet | Intrusion detection, URL filtering, antivirus
• Employees on the go | Medium | Internet | Intrusion detection, URL filtering, antivirus
• Enterprise branch | Medium | WAN | URL filtering, antivirus
• Intranet employees | High | Intranet | URL filtering, antivirus
• Guests | Low | Intranet | URL filtering, antivirus


• As shown in the figure, after security policies are configured, virtual networks
(VNs) on the intranet of the DC can communicate with each other, and the
external networks can access servers in the DMZ. In addition, different security
protection policies can be applied to traffic in different security zones.
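The default-deny behavior and the two policies in the figure can be expressed as a lookup keyed by (source zone, destination zone). The Python sketch below is a conceptual model rather than firewall configuration. It assumes that the Internet sits in the Untrust zone and that each policy simply lists the security profiles to apply. Intra-zone traffic is not modeled.

```python
from typing import Dict, FrozenSet, Optional, Tuple

ZonePair = Tuple[str, str]  # (source zone, destination zone)

# Interzone policies from the figure: security policy 2 (Internet to DMZ)
# and security policy 1 (VN1 to VN2). The profile names are descriptive labels.
POLICIES: Dict[ZonePair, FrozenSet[str]] = {
    ("Untrust", "DMZ"): frozenset({"intrusion detection", "antivirus", "URL filtering"}),
    ("VN1-Trust", "VN2-Trust"): frozenset({"intrusion detection", "antivirus"}),
}


def check_interzone(src_zone: str, dst_zone: str) -> Optional[FrozenSet[str]]:
    """Return the security profiles to apply, or None if the traffic is denied.

    Zones are isolated by default, so any zone pair without a policy is denied.
    Policies are directional: permitting Untrust -> DMZ does not permit new
    connections from DMZ -> Untrust (return traffic of permitted connections
    is normally handled by the firewall's session table, not modeled here).
    """
    return POLICIES.get((src_zone, dst_zone))


print(check_interzone("Untrust", "DMZ"))
print(check_interzone("DMZ", "Untrust"))
print(check_interzone("Untrust", "VN1-Trust"))
```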
Security Service Selection
⚫ Huawei security service architecture:

[Figure: iMaster NCE-Fabric (network controller) orchestrates the Layer 2/Layer 3 fabric; SecoManager orchestrates Huawei VASs; third-party security service orchestration manages third-party VASs]

⚫ iMaster NCE-Fabric: provisions logical network services, orchestrates the bidirectional interconnection network between Huawei VAS devices and the Layer 2 or Layer 3 fabric, manages Huawei CE switches, and delivers network configurations to them.
⚫ SecoManager: orchestrates services for Huawei VASs, manages Huawei VAS devices, and delivers network configurations to them.
⚫ Huawei VAS devices: Huawei firewalls provide service functions such as security policy, Elastic IP (EIP), SNAT, IPsec VPN, and content security detection.

The CloudFabric solution provides the following security services:
• IPsec VPN
• Source Network Address Translation (SNAT)
• EIP
• Bandwidth management based on firewalls
• Anti-DDoS
• Security policy
• Content security detection
• Virtual system


• SecoManager description:

▫ SecoManager is a security controller that provides unified management for Huawei firewalls on a network.

▫ In the CloudFabric solution, SecoManager functions as a security controller to implement application-based independent security service provisioning. SecoManager provides security policies for applications and between applications to implement network visualization and improve network maintainability.

▫ By interworking with iMaster NCE-Fabric (SecoManager is installed on iMaster NCE-Fabric as a security service), SecoManager provides the following security capabilities for the solution: security policy service, SNAT, EIP, and IPsec.
Third-Party VAS Management Solution

⚫ In the CloudFabric solution:
 For third-party VAS devices such as Check Point firewalls, the service manager mode is used. iMaster NCE-Fabric manages third-party VAS devices and is responsible for bidirectional network provisioning and bidirectional traffic diversion configuration. To provision L4-L7 services, administrators can jump from iMaster NCE-Fabric to the third-party VAS management UI.
 For VAS devices of other vendors, the network policy mode is used. That is, iMaster NCE-Fabric does not manage third-party VAS devices. Only network provisioning and traffic diversion configuration from a fabric to third-party VAS devices are implemented on iMaster NCE-Fabric. Services provided by third-party VAS devices depend on the device capabilities.

[Figure: iMaster NCE-Fabric (network controller) and a third-party VAS management platform/device management UI above a spine-leaf fabric with a VAS pool of third-party firewalls and LBs]


• Service manager mode: iMaster NCE-Fabric manages fabrics and orchestrates the
Layer 2 or Layer 3 interconnection network between Huawei VASs and a fabric.
The third-party management platform orchestrates and delivers L4-L7 policies of
third-party VASs.

• Service policy mode: iMaster NCE-Fabric manages fabrics and VASs, and
orchestrates and delivers L2-L7 policies of third-party VASs.

• Network policy mode: iMaster NCE-Fabric does not manage third-party VAS
devices. It is responsible for orchestrating the unidirectional interconnection
network and traffic diversion from a fabric to third-party VAS devices.

• Note:
▫ iMaster NCE-Fabric can manage third-party VAS devices such as Check Point firewalls.
▫ iMaster NCE-Fabric uses SNMP to read device and link information and uses RESTful APIs to deliver service commands to these devices.

Contents

1. Data Center Network Overview

2. Network Architecture Design and Data Planning

3. Underlay Network Design

4. Overlay Network Design

5. Network Security Design

6. Network Management and O&M Design

In-band Management/Out-of-band Management
[Figure: Two spine-leaf fabrics in which border leaf (service leaf) nodes connect to VASs and server leaf nodes connect to servers. In-band: the controller cluster connects to a server leaf. Out-of-band: the controller cluster connects to an out-of-band management switch, which has out-of-band management connections to the fabric devices.]
• In-band management: No independent management switch or network is configured. iMaster NCE-Fabric connects directly to the service network through a service switch, and manages and controls network devices through the underlay of the service network.
• Out-of-band management: iMaster NCE-Fabric connects to the out-of-band management network interfaces on network devices through an independently deployed out-of-band management switch, and manages and controls the network devices through an independent out-of-band network.

Network O&M Mode Selection
• Configuration management (supports both network configuration management and network monitoring and management): CLI (Telnet/SSH), SNMP, NETCONF
• Network monitoring, performance management: NetStream, sFlow, Telemetry
• Network monitoring, fault management: Syslog, LLDP, Mirroring

⚫ You can select different network management modes based on DCN O&M requirements.


• The command-line interface (CLI) supports both network configuration management and network monitoring management.
• The Set operation of SNMP supports network configuration management, and its Get and Trap operations support network monitoring management.
• The edit-config operation of NETCONF supports network configuration management, and its get operation supports network monitoring management.
Network O&M Mode: SNMP

[Figure: An NMS connected to two managed devices over an active link and a standby link]
⚫ Configure the SNMP management program on the network management station (NMS), enable the agent program on the managed device, and configure SNMP on the network.
⚫ Using SNMP:
 The NMS can obtain or change device information through the agent to implement remote monitoring and management.
 The agent can report device status to the NMS in a timely manner.
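The following minimal Python sketch shows concretely what the NMS sends to an agent. It BER-encodes an SNMPv2c GetRequest for sysDescr.0 (OID 1.3.6.1.2.1.1.1.0). The community string "public" and the request ID are example values, not values from this course. A production NMS would use an SNMP library, and preferably SNMPv3, rather than hand-encoding packets.

```python
def encode_length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, value: bytes) -> bytes:
    """Build one BER tag-length-value element."""
    return bytes([tag]) + encode_length(len(value)) + value


def encode_integer(value: int) -> bytes:
    """INTEGER (tag 0x02) for non-negative values; one spare bit keeps the sign positive."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    return tlv(0x02, value.to_bytes(value.bit_length() // 8 + 1, "big"))


def encode_oid(oid: str) -> bytes:
    """OBJECT IDENTIFIER (tag 0x06): first two arcs packed, the rest base-128."""
    arcs = [int(part) for part in oid.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]              # last byte of each arc has the high bit clear
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))
            arc >>= 7
        body.extend(reversed(chunk))
    return tlv(0x06, bytes(body))


def snmpv2c_get_request(community: str, oid: str, request_id: int) -> bytes:
    """SNMPv2c message carrying a GetRequest-PDU for a single OID."""
    varbind = tlv(0x30, encode_oid(oid) + b"\x05\x00")        # value is NULL in a Get
    pdu = tlv(
        0xA0,                                                 # GetRequest-PDU
        encode_integer(request_id)
        + encode_integer(0)                                   # error-status
        + encode_integer(0)                                   # error-index
        + tlv(0x30, varbind),                                 # variable-bindings list
    )
    version = encode_integer(1)                               # 1 means SNMPv2c
    return tlv(0x30, version + tlv(0x04, community.encode("ascii")) + pdu)


if __name__ == "__main__":
    message = snmpv2c_get_request("public", "1.3.6.1.2.1.1.1.0", request_id=1)
    print(message.hex())
```

To complete the request, you would send this payload over UDP to port 161 on the managed device. A SetRequest (tag 0xA3) carries values instead of NULL. The agent's response (tag 0xA2) and SNMPv2 traps (tag 0xA7) reuse the same PDU layout.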

Network O&M Mode: NETCONF

[Figure: A management platform or SDN controller acts as the NETCONF client and exchanges NETCONF messages over the network with devices 1 to 3, each acting as a NETCONF server]
⚫ NETCONF provides a set of mechanisms for managing network devices. With these mechanisms, users can add, modify, delete, back up, restore, lock, and unlock network device configurations. In addition, NETCONF provides transaction and session operation functions to obtain network device configuration and status information.
⚫ NETCONF has three objects:
 NETCONF client
 NETCONF server
 NETCONF message

Network O&M Mode: Telemetry

[Figure: Three roles (collector, analyzer, and controller) in one typical management mode: network devices upload data to the collector through telemetry within subseconds, and the controller delivers configurations to the devices through NETCONF]
⚫ Telemetry, also known as network telemetry, is a technology for network monitoring, including packet check and analysis, intrusion and attack detection, intelligent data collection, and application performance management.
⚫ Advantages of telemetry:
 Supports multiple implementation modes, meeting diversified user requirements.
 Collects a wide variety of data with high precision to fully reflect network status.
 Continuously reports data after only a one-time data subscription.
 Locates faults rapidly and accurately.


• The collector, analyzer, and controller are components of the network management system.
▫ The collector receives and stores monitoring data reported by network devices.
▫ The analyzer analyzes and processes the monitoring data received by the collector, for example, by displaying the data on the GUI.
▫ The controller uses NETCONF to deliver configurations to devices, so as to manage network devices. The controller can deliver configurations to network devices based on the analysis data provided by the analyzer and adjust the forwarding behavior of network devices. It also controls the data that the network devices sample and report.
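A few lines of code can show the contrast between a one-time subscription and repeated polling. This illustrative Python model works as follows (all class names, the sensor path string, and the traffic figures are hypothetical):

- A simulated device pushes interface counter samples at a subsecond interval after a single subscribe call.
- The collector only stores what it receives.
- The analyzer turns consecutive counter samples into throughput.

Real telemetry encodes samples in formats such as GPB and sends them over transports such as gRPC; this sketch does not model either.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class Sample:
    timestamp_ms: int
    interface: str
    out_octets: int          # cumulative counter, as devices usually report


@dataclass
class Subscription:
    sensor_path: str         # hypothetical name of the data to sample
    interval_ms: int         # subsecond sampling interval


class SimulatedDevice:
    """Stands in for a switch; after one subscription it keeps pushing data."""

    def __init__(self, interface: str, octets_per_ms: int):
        self.interface = interface
        self.octets_per_ms = octets_per_ms
        self.subscription: Optional[Subscription] = None

    def subscribe(self, subscription: Subscription) -> None:
        self.subscription = subscription

    def push(self, count: int) -> Iterator[Sample]:
        if self.subscription is None:
            return                      # nothing is sent without a subscription
        for i in range(count):
            t = i * self.subscription.interval_ms
            yield Sample(t, self.interface, t * self.octets_per_ms)


class Collector:
    """Receives and stores monitoring data; performs no analysis."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []

    def receive(self, sample: Sample) -> None:
        self.samples.append(sample)


def analyze_bps(samples: List[Sample]) -> List[float]:
    """Analyzer: turn consecutive counter samples into bits per second."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        seconds = (cur.timestamp_ms - prev.timestamp_ms) / 1000
        rates.append((cur.out_octets - prev.out_octets) * 8 / seconds)
    return rates


if __name__ == "__main__":
    device = SimulatedDevice("10GE1/0/1", octets_per_ms=125_000)   # about 1 Gbit/s
    device.subscribe(Subscription("interface-counters", interval_ms=500))
    collector = Collector()
    for sample in device.push(count=4):
        collector.receive(sample)
    print(analyze_bps(collector.samples))
```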
CloudFabric O&M Overview
[Figure: CloudFabric O&M capabilities, implemented by the controller and analyzer]
• Visualized management and monitoring:
▫ Device management: NE management, alarm management, log management, performance indicator management, configuration file management
▫ Network management: three-level topology visibility, fabric network management, logical resource management, tenant management, link management
▫ Network health: device, network, protocol, overlay, service
• Intelligent fault processing:
▫ Fault locating: path detection, VXLAN detection, connectivity detection, end port locating, network loop detection
▫ Fault remediation: emergency plan template, three-level rollback, data consistency verification
▫ Fault rectification: fault rectification and isolation
▫ Fault detection, fault locating, and root cause analysis


• Intelligent O&M is implemented by the controller and analyzer. This course describes some of the O&M features.
iMaster NCE-Fabric Deployment Design
[Figure: Two networking modes in a spine-leaf VXLAN fabric. Management leaf node managed (recommended): management, service, and storage leaf nodes (VTEPs) share the spine layer; cloud platform service nodes, control nodes, and the controller cluster connect to the management leaf, and VMs, PMs, and storage connect to the service and storage leaf nodes. Management leaf node not managed: separate management and service spine nodes, with the management leaf on an independent management network.]
• Management leaf node managed (recommended): The management leaf nodes managed by the controller are connected to the controller and cloud platform nodes. The management core and service core switches are combined, and the management plane and service plane do not need to be isolated. This networking mode is applicable to a single-region single-core network or a multi-region single-core network.
• Management leaf node not managed: The management leaf nodes not managed by the controller are connected to the controller and cloud platform nodes and are deployed on an independent management network. The management core and service core switches are deployed independently, and the management plane and service plane are physically isolated. This mode is applicable to a single-region dual-core network or a multi-region dual-core network.


• iMaster NCE-Fabric is a next-generation SDN controller for DCNs and is a core component of the CloudFabric solution.
• The controller and SecoManager can be deployed in two modes: management leaf nodes managed by the controller and management leaf nodes not managed by the controller.
▫ Networking where management leaf nodes are managed:
▪ The management leaf node is managed by the DCN controller. The northbound and southbound gateways of the DCN controller are deployed on the management leaf node.
▪ The southbound gateway of the DCN controller is deployed using a VLANIF interface. Direct routes are imported to the routing protocol on the underlay network to achieve communication with in-band management loopback interface addresses.
▪ The DCN controller creates an overlay network for communication between network planes, such as the northbound gateway and cloud platform.
iMaster NCE-Fabric Deployment Design: Single-Node System
⚫ iMaster NCE-Fabric can be commercially deployed in a single-node system or in a cluster, depending on the high availability required for network services.
⚫ In the single-node system deployment solution, iMaster NCE-Fabric is deployed on one node (PM or VM), saving hardware resources to the maximum extent.
[Figure: A single iMaster NCE-Fabric node connected to a server leaf (VTEP) in a spine-leaf VXLAN fabric that also serves VMs and PMs, with a border leaf (VTEP) connected to a PE]


• Note: The single-node system deployment solution is applicable only to network overlay networking.
iMaster NCE-Fabric Deployment Design: Cluster
⚫ iMaster NCE-Fabric can be deployed in a cluster on PMs or VMs. PM cluster deployment is recommended to improve reliability. If a single node in the cluster is faulty, other nodes can still run properly.
⚫ In the single-cluster deployment solution, iMaster NCE-Fabric is deployed in a three-server cluster in a DC to manage all switches in the DC. An iMaster NCE-Fabric cluster can also manage switches in multiple DCs.
[Figure: The administrator reaches the iMaster NCE-Fabric cluster through the northbound access management network; the cluster connects to the southbound service network through server leaf nodes (VTEPs) of the spine-leaf VXLAN fabric, which also serves VMs and PMs and connects to a PE through a border leaf]
• Solution description:
▫ If iMaster NCE-Fabric is deployed in in-band mode, it directly connects to in-band management switches on the underlay network through leaf switches.
▫ The northbound access management network is used to log in to iMaster NCE-Fabric.
• Solution characteristics:
▫ Two IP addresses are planned for each node, and the network configuration complexity is medium.
▫ The southbound service plane is deployed separately; therefore, an increase in service volume on the southbound service plane does not affect cluster communication or northbound communication.
▫ Southbound and northbound networks are isolated to prevent unauthorized access and malicious attacks.


• The iMaster NCE-Fabric cluster uses three network planes:
▫ Internal communication network plane: used for internal communication in a cluster, for example, communication between nodes and communication with the database.
▫ Northbound management network plane: used for northbound communication and Linux management, including cloud platform interconnection, web access, and Linux login.
▫ Southbound service network plane: used for communication with network devices in the southbound direction through NETCONF, SNMP, and OpenFlow.
• In actual deployment, the internal communication plane is integrated with the northbound management plane, and the southbound service plane is deployed independently. In addition, the southbound service plane manages DC switches in in-band mode. The figure shows the deployment solution.
• The cluster deployment solution has the following advantages:
▫ Load balances services across multiple cluster nodes to ensure high reliability and performance.
▫ Ensures that the entire cluster runs normally even if a cluster node fails, improving reliability.
▫ Supports flexible expansion to enhance the performance of the entire cluster, improving scalability.
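The first two advantages, spreading load across nodes and surviving a node failure, can be illustrated with a toy assignment of managed switches to controller nodes. This sketch makes two assumptions for illustration:

- switches are placed on nodes round-robin;
- when a node fails, only that node's switches move, each to the least-loaded surviving node.

The course does not describe how iMaster NCE-Fabric actually distributes work inside a cluster. Node and switch names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def assign(switches: Iterable[str], nodes: List[str]) -> Dict[str, str]:
    """Spread managed switches across healthy cluster nodes round-robin."""
    if not nodes:
        raise RuntimeError("no healthy controller node")
    ordered = sorted(nodes)
    return {sw: ordered[i % len(ordered)] for i, sw in enumerate(sorted(switches))}


def fail_node(assignment: Dict[str, str], nodes: List[str], failed: str) -> Dict[str, str]:
    """Move only the failed node's switches, each to the least-loaded survivor."""
    survivors = sorted(n for n in nodes if n != failed)
    if not survivors:
        raise RuntimeError("cluster has no surviving node")
    load = Counter(node for node in assignment.values() if node != failed)
    result = dict(assignment)
    for sw in sorted(s for s, node in assignment.items() if node == failed):
        target = min(survivors, key=lambda n: (load[n], n))   # ties broken by name
        result[sw] = target
        load[target] += 1
    return result


if __name__ == "__main__":
    cluster = ["node-1", "node-2", "node-3"]
    before = assign([f"leaf-{i:02d}" for i in range(1, 7)], cluster)
    print(fail_node(before, cluster, "node-2"))
```

Moving only the affected switches keeps every other device on its current node, so a failure does not disturb management sessions that were healthy.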
iMaster NCE-Fabric Deployment Design: Active/Standby Clusters
[Figure: Manual switchover for remote DR: an active product at the primary site and a standby product at the secondary site, connected by a heartbeat link and a data replication link. Automatic switchover for remote DR: the same primary and secondary sites plus a third-party site hosting an arbitration product, connected to both sites by data sharing links.]
• Manual switchover for remote DR: Equipment rooms are located at two sites, and the status of the active and standby sites is manually monitored. There is no strict requirement on the recovery time after a site-level fault, and the switchover is performed through manual O&M.
• Automatic switchover for remote DR: Equipment rooms are located at three sites, and the status of the active and standby sites needs to be monitored in real time. If a site-level fault occurs, an active/standby switchover needs to be quickly implemented to restore services. The arbitration service is provided by iMaster NCE-Fabric.


• In the multi-DC active/standby DR scenario, the active/standby cluster solution can be used.
• If the active DC is faulty, the standby DC and standby controller cluster become active and continue to provide services, improving DC DR reliability.
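The practical difference between the two modes is who decides to promote the standby site. The sketch below is a generic witness (arbitration) rule, written as an assumption for illustration. The course states only that a third site provides arbitration in the automatic mode, not the exact algorithm iMaster NCE-Fabric uses.

The key idea is that a standby site that merely loses its heartbeat must not promote itself unless the arbitration site also confirms that the primary is down. This avoids two active sites (split-brain) when only the inter-site link fails.

```python
from enum import Enum


class Action(Enum):
    STAY_STANDBY = "stay standby"
    PROMOTE = "promote to active"
    ALERT_OPERATOR = "alert operator for manual switchover"


def standby_decision(heartbeat_ok: bool, automatic: bool,
                     arbiter_reachable: bool, arbiter_sees_primary: bool) -> Action:
    """Decision taken at the standby site (generic witness logic, assumed)."""
    if heartbeat_ok:
        return Action.STAY_STANDBY          # primary is visibly alive
    if not automatic:
        return Action.ALERT_OPERATOR        # two-site mode: a person decides
    if not arbiter_reachable:
        return Action.STAY_STANDBY          # this site may be the isolated one; avoid split-brain
    if arbiter_sees_primary:
        return Action.STAY_STANDBY          # only the inter-site link failed
    return Action.PROMOTE                   # both this site and the arbiter lost the primary
```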
iMaster NCE-FabricInsight Deployment Design: Single-Node System and Standard Cluster
⚫ In single-node system deployment (in-band management) of iMaster NCE-FabricInsight, the collector and analyzer are combined. Only one server needs to be connected to the leaf node.
⚫ In standard cluster deployment (in-band management) of iMaster NCE-FabricInsight, the analyzer and collector are combined. That is, no independent collector server needs to be deployed.
[Figure: Single-node system deployment: one iMaster NCE-FabricInsight server (analyzer and collector) connects to a server leaf (VTEP) of the fabric service network. Standard cluster deployment: an iMaster NCE-FabricInsight cluster of three nodes (analyzer and collector 1-3) connects to a server leaf (VTEP).]

iMaster NCE-FabricInsight Deployment Design: Advanced Cluster
⚫ In advanced cluster deployment of iMaster NCE-FabricInsight, the collector and analyzer are deployed separately. It is recommended that iMaster NCE-FabricInsight be connected to an independent leaf node, preventing link congestion caused by increased traffic pressure on service links.
[Figure: In-band management scenario: the iMaster NCE-FabricInsight cluster (a collector and analyzers 1 to 3) connects to a server leaf (VTEP) of the fabric service network. Out-of-band management scenario: analyzers 1 to 3 and a collector for each fabric (Fabric-1 to Fabric-n) connect through out-of-band management switches 1 and 2 to the out-of-band management network.]

CloudFabric Software Deployment Mode Selection
Component | Mandatory | Deployment mode | Description
iMaster NCE-Fabric | Yes | Single-node system deployment | The controller is deployed on one node.
iMaster NCE-Fabric | Yes | Cluster deployment | The controller cluster consists of N nodes. The controller can be installed on PMs or VMs.
iMaster NCE-FabricInsight | No | Single-node system deployment | The collector and analyzer are combined. Only one server needs to be connected to the leaf node. A maximum of 100 CloudEngine devices can be managed.
iMaster NCE-FabricInsight | No | Standard cluster deployment | The analyzer and collector are combined. That is, no independent collector server needs to be deployed.
iMaster NCE-FabricInsight | No | Advanced cluster deployment | The collector and analyzer are deployed separately. It is recommended that iMaster NCE-FabricInsight be connected to an independent leaf node, preventing link congestion caused by increased traffic pressure on service links.
SecoManager | No | Independent deployment | SecoManager is deployed on a server or VM as independent software.
SecoManager | No | Combined with iMaster NCE-Fabric | SecoManager and iMaster NCE-Fabric are deployed on the same physical server or VM.

Quiz

1. (True or false) On a CloudFabric data center network with more than 200 switches, OSPF
is recommended on the underlay network. ( )
A. True

B. False

2. (Multiple-answer question) Which of the following deployment modes can be used to ensure high reliability of border leaf nodes? ( )
A. Deploy border leaf nodes in active-active mode.

B. Cross-connect border leaf nodes to uplink core devices.

C. Fully mesh border leaf nodes with downlink spine nodes.

D. Deploy a bypass policy between border leaf nodes.


1. B

2. ABCD
Summary

⚫ This course describes the planning and design of the CloudFabric DCN,
including the network architecture design, underlay and overlay network
design, network security design, and network management and O&M design.
⚫ On completion of this course, you will understand the typical methods of
designing a DCN and be able to plan and design a DCN.

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright©2023 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
CloudFabric Data Center Network Deployment - Computing Scenario
Foreword

⚫ When the computing service management system is complex, or when computing management and network management are not integrated enough to build a unified cloud platform, computing and network can be associated to manage and provision services together.
⚫ This course introduces how to build a data center network for the computing linkage scenario. In this solution, the controller connects to the computing virtualization platform instead of a cloud platform, and the controller and the computing virtualization platform deliver services together to implement collaborative provisioning of computing and network services.

Objectives

⚫ On completion of this course, you will be able to:


 Describe the deployment process in the computing scenario.
 Provision services in the computing scenario.
 Understand easy deployment.

Contents

1. Deployment Process Overview

2. Pre-configuration

3. Service Provisioning

4. Easy Deployment

Architecture of the Computing Scenario

[Figure: Layered architecture of the computing scenario: service presentation layer (NCE orchestration); network analysis/control layer (iMaster NCE, SecoManager, FabricInsight); network service layer (spine-leaf VXLAN fabric with firewall and LB); computing access layer (VMM, virtualization servers, and physical servers)]
• Service presentation layer: In the computing scenario, no cloud platform is deployed. iMaster NCE provides an independent UI, which interconnects with iMaster NCE through the RESTful API.
• Network analysis/control layer:
▫ iMaster NCE automatically configures network devices and interconnects with the VMM to implement virtual network provisioning and virtualization awareness.
▫ SecoManager interconnects with iMaster NCE to implement VAS orchestration, policy management, and VAS modeling, instantiation, and configuration delivery.
▫ FabricInsight detects network anomalies based on real service traffic.
• Network service layer: The spine-leaf architecture is used.
• Computing access layer:
▫ Virtualization server: iMaster NCE interconnects with the VMM to automatically deliver and modify configurations on the network side when VMs go online or offline.
▫ Physical server: Some servers do not support virtualization. Administrators need to orchestrate and deliver network-side configurations through iMaster NCE-Fabric.


• The architecture and deployment process of the rack leasing scenario are similar
to those of the computing linkage scenario. This chapter uses the computing
scenario as an example to describe the architecture and deployment process of
the solution.
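The architecture says that configurations on the network side are automatically delivered and modified when VMs go online or offline. As a rough model of that behavior, the sketch below reference-counts VMs per leaf port and network:

- The first VM of a network on a port triggers a create.
- The last VM leaving triggers a delete.

The event fields, the network-to-VNI table, and the ("create"/"delete", leaf, port, VNI) action tuples are hypothetical stand-ins, not the iMaster NCE or VMM interfaces.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Action = Tuple[str, str, str, int]   # (operation, leaf, port, VNI)


@dataclass(frozen=True)
class VmEvent:
    kind: str        # "online" or "offline", as reported by the VMM
    vm: str
    leaf: str        # leaf switch the host is attached to
    port: str        # leaf port facing the host
    network: str     # VMM network / port group name


class NetworkSide:
    """Derives network-side changes from VM lifecycle events."""

    def __init__(self, network_to_vni: Dict[str, int]):
        self.network_to_vni = network_to_vni
        self.users: Dict[Tuple[str, str, str], Set[str]] = {}

    def handle(self, ev: VmEvent) -> List[Action]:
        key = (ev.leaf, ev.port, ev.network)
        vni = self.network_to_vni[ev.network]
        if ev.kind == "online":
            vms = self.users.setdefault(key, set())
            first = not vms
            vms.add(ev.vm)
            return [("create", ev.leaf, ev.port, vni)] if first else []
        if ev.kind == "offline":
            vms = self.users.get(key)
            if not vms or ev.vm not in vms:
                return []                  # nothing was provisioned for this VM
            vms.discard(ev.vm)
            if vms:
                return []                  # other VMs still use this binding
            del self.users[key]
            return [("delete", ev.leaf, ev.port, vni)]
        raise ValueError(f"unknown event kind: {ev.kind}")
```

Reference counting keeps the leaf configuration in place while any VM on that host still needs the network, which avoids cutting off neighbors when one VM migrates away.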
Deployment Process in the Computing Scenario
[Figure: Deployment process flowchart from start to end; the original figure marks each step as mandatory or optional]
1. Underlay Network Configuration: Manual Configuration, or Pre-configured Server and ZTP (underlay basic configuration, device management, link discovery, and fabric creation); Resource Pool Management; Creating an External Gateway Resource Pool.
2. iMaster NCE Preconfiguration: Load the License File; Configuring Syslog; Interconnecting with the LDAP Authentication Server; Interconnecting with the RADIUS Authentication Server; Configuring DHCP Relay.
3. Interconnecting with iMaster NCE: Interconnecting with FusionCompute; Configuring RESTful.
4. SecoManager Preconfiguration: Load the License File; Discover the Firewall; Creating a Dual-System Hot Backup Group; Data Synchronization; Collecting Device Alarms; Creating a Firewall Security.
5. SecoManager Interconnection: Interconnecting with HiSec Insight.
6. Service Provisioning.


• In the deployment phase, the network administrator needs to perform operations such as basic underlay configuration and device management. These operations can be performed manually or completed using zero touch provisioning (ZTP). This slide uses ZTP as an example.
Contents

1. Deployment Process Overview

2. Pre-configuration
◼ Underlay Network Pre-configuration

▫ iMaster NCE-Fabric Pre-configuration

3. Service Provisioning

4. Easy Deployment

Underlay Network Pre-configuration
⚫ An underlay network is the basic network for constructing a virtual extensible local area network (VXLAN) service network, which is an overlay network.
⚫ The underlay network can be configured in ZTP or manual mode.
• ZTP allows newly delivered or unconfigured devices to automatically load version files, deploy the underlay network, and register with iMaster NCE to be managed after they start.
• This course introduces both the ZTP and manual modes.
[Figure: A VXLAN overlay built on an OSPF underlay between spine and leaf nodes]


• In traditional deployment mode, the administrator needs to manually configure each newly delivered or unconfigured device after hardware installation, which lowers deployment efficiency and results in high labor costs. iMaster NCE-Fabric provides the ZTP-based simplified deployment function, which enables you to plan the networking topology and fabric resources, automatically bring devices online, execute device installation scripts, and deliver underlay configurations to devices in batches on the GUI. This reduces labor costs and improves deployment efficiency. ZTP-based simplified deployment enables quick rollout and management of DCN devices.
• The DC physical network of the CloudFabric solution uses the spine-leaf architecture and supports horizontal on-demand capacity expansion. The roles in the network include spine nodes, server leaf nodes, border leaf nodes, service leaf nodes, and DCI gateways. There are often a large number of server leaf nodes, which require automatic service provisioning. Therefore, ZTP currently focuses on server leaf nodes.
• Server leaf nodes support Multichassis Link Aggregation Group (M-LAG) and standalone networking, which are applicable to different server access scenarios. M-LAG networking is recommended because high reliability is achieved when servers are dual-homed to the M-LAG. In addition, each M-LAG device has its own control plane, simplifying upgrade and maintenance.

Basic Networking of ZTP
[Figure: ZTP in-band networking: an M-LAG pair to be brought online connects through service cables to a root device, behind which are the DHCP server and SFTP server. ZTP out-of-band networking: the M-LAG pair also connects through management cables to an out-of-band management switch, behind which are the DHCP server and SFTP server. In both cases, the DHCP and SFTP servers can be third-party servers or services built in iMaster NCE-Fabric.]
• ZTP in-band networking: Management traffic and subsequent service traffic of devices to be brought online by iMaster NCE-Fabric are transmitted on the same network. The management network and service network share service network interfaces, and no independent management network is available.
• ZTP out-of-band networking: iMaster NCE-Fabric uses an independent management network to carry the management traffic of each device to be brought online. ZTP out-of-band networking is recommended and is used as an example in this course.


• For in-band networking:
▫ iMaster NCE-Fabric: used to execute ZTP tasks and manage the devices to be brought online.
▫ Root device: a device that has been managed by iMaster NCE-Fabric and connects to the devices to be brought online. The root device functions as the DHCP relay agent of the devices to be brought online and applies for temporary IP addresses from the DHCP server for these devices. The root device is involved in in-band networking and needs to be managed manually.
▫ Spine and leaf nodes: CE switches that need to be brought online through ZTP. Currently, spine and server leaf nodes can be brought online through ZTP. To bring a border leaf node online through ZTP, bring it online as a server leaf node and then configure an external gateway for it on iMaster NCE-Fabric.
▫ Device to go online: CE device that is to be brought online through ZTP.
▫ Online device: upper-level device of the devices to be brought online. In in-band networking scenarios, if spine nodes are brought online through ZTP, the online device is the root device; if server leaf nodes are brought online through ZTP, the online device is a spine node.

Standard ZTP Fundamentals
[Figure: The device to be deployed connects through a root device (DHCP relay agent) and a Layer 3 network to a DHCP server, a syslog server, a DNS server, an intermediate file server, and a version file server (which can be built in iMaster NCE-Fabric)]
1. The device obtains its IP address, the intermediate file server address, and the intermediate file name through DHCP. The DHCP server provides the address and ZTP-related parameters, and the syslog server collects device logs.
2. The DNS server resolves the domain name.
3. The device obtains the intermediate file in .ini or Python format, which contains the version file address and version file name. The intermediate file server provides the intermediate file.
4. The device obtains the version files, including the system software (.cc), configuration file (.cfg), and patch file, and loads them. The version file server provides the version files.


• If the intermediate file is in .ini format, the device downloads the version files
based on the version file server address and version file names contained in the
intermediate file. If the intermediate file is a Python script, the device
automatically runs the script to download the version files.

• The intermediate file server or version file server can be a standard SFTP server
or the SFTP service built in iMaster NCE-Fabric.

• The DHCP service can also be built in iMaster NCE-Fabric.

• The DNS server is optional. If domain name resolution is not required, the DNS
server does not need to be deployed.
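The .ini branch of this process works as follows: the device looks up its own entry in the intermediate file and builds the download locations of its version files from it. The sketch below shows this. Everything in it is hypothetical:

- the section and key names (global, file_server, device <ESN>, system_software, configuration, patch);
- the ESN, the server address, and the file names.

The actual intermediate file layout is defined in the CE switch ZTP documentation.

```python
import configparser
from typing import Dict, Optional

SAMPLE_INI = """
[global]
file_server = sftp://192.0.2.10/ztp

[device 2102350000000001]
system_software = example-system.cc
configuration = leaf01.cfg
patch = example-patch.pat
"""

FILE_KEYS = ("system_software", "configuration", "patch")


def version_files_for(ini_text: str, esn: str) -> Optional[Dict[str, str]]:
    """Return download URLs for this device, or None if it has no entry."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    base = parser.get("global", "file_server").rstrip("/")
    section = f"device {esn}"
    if not parser.has_section(section):
        return None          # no version file information for this device in the file
    return {key: f"{base}/{parser.get(section, key)}"
            for key in FILE_KEYS if parser.has_option(section, key)}
```

A Python-format intermediate file would instead be executed by the device, so all of this lookup logic would live inside the script.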

Standard ZTP Process
1. After startup, the device checks whether it has a configuration file. If yes, the process ends and the device starts with the configuration file.
2. If no, the device checks whether ZTP deployment has been terminated. If yes, the process ends and the device starts with empty configuration.
3. If no, the device checks whether a USB flash drive is inserted. If yes, it obtains the intermediate file from the USB flash drive and parses it. If the version file information can be obtained, the device obtains the version files from the USB flash drive and goes to step 5.
4. If no USB flash drive is inserted or the version file information cannot be obtained, the device obtains a temporary IP address and the intermediate file server address through DHCP, obtains the intermediate file from the file server, parses the version file information from the intermediate file, and obtains the version files from the file server.
5. The device loads the version files. The process ends, and the device restarts.


• The process of using the controller to bring devices online through ZTP is slightly
different from this process and will be described later.

• Powering on and starting the device:

▫ After the device is powered on, if the device has a configuration file, the
device properly starts with the configuration file; if the device has no
configuration file, the ZTP process starts.

▫ If you have logged in to the device without a configuration file through the
console port, you can choose whether to terminate the ZTP process as
prompted. If you choose to terminate the ZTP process, the device starts
with empty configuration.

• Obtaining the intermediate file and version files from the USB flash drive:

▫ After the ZTP process starts, the unconfigured device first tries to obtain the
intermediate file from the USB flash drive. If the device obtains the
intermediate file, it parses the file and obtains information about the
version files to be downloaded. After downloading the version files, the
device restarts to complete automatic deployment. The device enters stage
3 if any of the following conditions occur: no USB flash drive is installed; the
USB flash drive does not contain a required intermediate file; the device
fails to obtain the version files.
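The decision points of the standard ZTP flowchart can be written as a small function. This sketch covers only the flowchart's control flow. It assumes the checks happen in the order shown: configuration file, console termination, USB flash drive, then DHCP. The field names and step strings are hypothetical labels, and real failure handling (for example, DHCP timeouts) is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BootState:
    has_config_file: bool
    ztp_terminated: bool       # user chose to stop ZTP from the console
    usb_inserted: bool
    usb_file_info_ok: bool     # version file information found via the USB intermediate file


def ztp_steps(s: BootState) -> List[str]:
    """Return the steps a device takes at startup, following the flowchart."""
    if s.has_config_file:
        return ["start with configuration file"]
    if s.ztp_terminated:
        return ["start with empty configuration"]
    steps: List[str] = []
    if s.usb_inserted:
        steps.append("parse intermediate file from USB")
    if s.usb_inserted and s.usb_file_info_ok:
        steps.append("get version files from USB")
    else:
        steps += [
            "get temporary IP and intermediate file server address via DHCP",
            "get intermediate file from file server",
            "get version files from file server",
        ]
    return steps + ["load version files", "restart"]
```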

ZTP Deployment Using the Controller


ZTP Configuration Provisioning Process
[Figure: The network manager (step 1) starts the ZTP task on iMaster NCE-Fabric, which then interacts with the spine and leaf nodes of the underlay network (steps 2 to 7); a VXLAN overlay runs over the fabric]
• Application scenario: The controller has been physically connected to the device (Huawei CE switch), the device license has been imported to the controller, and the root device has been manually managed.
• Process:
1. The network administrator clicks to start the ZTP task.
2. iMaster NCE-Fabric advertises DHCP packets.
3. The device to go online obtains a temporary IP address and the southbound IP address of iMaster NCE-Fabric from the DHCP packets sent by iMaster NCE-Fabric.
4. The device to go online uses its built-in certificate to initiate authentication to iMaster NCE-Fabric.
5. After the authentication succeeds, iMaster NCE-Fabric determines the device role (spine or leaf) based on the device model.
6. iMaster NCE-Fabric delivers configurations such as the management IP address, SNMP, and NETCONF to the devices to go online. After these devices restart, iMaster NCE-Fabric formally manages them using the management IP address.
7. The controller delivers interconnection, OSPF, and BGP configurations to the newly online devices over the links discovered through LLDP.
8. Devices on the entire network go online successfully and all links are established. The network topology is displayed on iMaster NCE-Fabric.


• When iMaster NCE-Fabric is used as the ZTP server, no intermediate file is required. The device directly obtains the necessary files (such as boot.py and cfg) from iMaster NCE-Fabric and sets the configuration file and version file for the next startup.
• In the spine-leaf architecture, in in-band networking scenarios, after a spine node goes online, the upper-layer device of each leaf node is an online spine node, and configurations such as DHCP relay are delivered to the spine node.
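Two rules from this part can be combined into a small planning sketch:

- step 5: the device role comes from the device model;
- the in-band rule: spines use the manually managed root device as their DHCP relay, while each leaf uses an already-online spine.

The model names, the model-to-role table, and the function below are hypothetical. iMaster NCE-Fabric maintains its own role mapping and bring-up logic.

```python
from typing import Dict, List, Tuple

# Hypothetical model names; the real model-to-role mapping lives in the controller.
MODEL_ROLE = {"MODEL-S": "spine", "MODEL-L": "leaf"}


def inband_plan(devices: Dict[str, str], leaf_uplink: Dict[str, str],
                root: str) -> List[Tuple[str, str, str]]:
    """Return (device, role, DHCP relay) tuples in bring-up order.

    devices maps device name -> model; leaf_uplink maps leaf -> its spine.
    Spines relay through the manually managed root device, so they come first;
    a leaf can only relay through a spine that is already online.
    """
    roles = {name: MODEL_ROLE[model] for name, model in devices.items()}
    online = {root}
    plan: List[Tuple[str, str, str]] = []
    for name in sorted(n for n, role in roles.items() if role == "spine"):
        plan.append((name, "spine", root))
        online.add(name)
    for name in sorted(n for n, role in roles.items() if role == "leaf"):
        relay = leaf_uplink[name]
        if relay not in online:
            raise ValueError(f"{name}: upstream spine {relay} is not online")
        plan.append((name, "leaf", relay))
    return plan
```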

ZTP Process in Out-of-Band Networking (Using the Service Built In iMaster NCE-Fabric)
1. Preparing the server: Enable the SFTP service of iMaster NCE-Fabric.
2. Preparing for network connectivity (configuring the management network): If the management network is a Layer 3 network, configure routes between iMaster NCE-Fabric and devices, and configure a DHCP relay agent on the out-of-band management switch.
3. Planning networking and fabric resources, in either of the following ways:
▫ Typical configuration planning: On iMaster NCE-Fabric, select the typical configuration and plan the networking based on the number of devices.
▫ User-defined configuration planning: Use the template of iMaster NCE-Fabric.
The following are also done on iMaster NCE-Fabric: configuring fabric resources and setting deployment parameters such as the DHCP/SFTP server, and checking the physical topology and device information.
4. Bringing devices online: Connect devices based on the network plan.
5. Bringing devices online: Bring devices online in batches on iMaster NCE-Fabric.


Typical Configuration Planning and User-Defined Configuration Planning
⚫ Scenario
 Typical configuration planning: Applies to the scenario where the customer has no network plan.
 User-defined configuration planning: Applies to the scenario where the customer has a network plan.
⚫ Configuration complexity
 Typical configuration planning: The configuration is simple. You only need to enter the numbers of spine and leaf nodes and the port range of the nodes. iMaster NCE-Fabric automatically generates a topology based on the number of devices and the networking mode.
 User-defined configuration planning: More complex than typical configuration planning. Using the iMaster NCE-Fabric template is more complex than using CloudFabric Designer.
⚫ Impact on follow-up operations
 Typical configuration planning: You can connect devices only after networking and fabric resource planning are complete on iMaster NCE-Fabric and a topology is generated.
 User-defined configuration planning: After completing network planning, you can directly connect devices. Fabric resource planning can be performed during ZTP.
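Typical configuration planning needs only the spine and leaf counts plus a port range, and the topology is generated from them. The sketch below is a minimal illustration of that idea. It assumes a full mesh between the spine and leaf layers, meaning every leaf links to every spine. The function name, device names, and port names are hypothetical. The sketch does not reflect how iMaster NCE-Fabric is implemented.

```python
from itertools import product


def plan_spine_leaf(num_spines, num_leaves, leaf_uplinks, spine_downlinks):
    """Hypothetical sketch: one link per (spine, leaf) pair (full mesh between layers).

    leaf_uplinks[s] is the leaf port used toward spine s;
    spine_downlinks[l] is the spine port used toward leaf l.
    """
    if len(leaf_uplinks) < num_spines:
        raise ValueError("leaf port range too small: need one uplink per spine")
    if len(spine_downlinks) < num_leaves:
        raise ValueError("spine port range too small: need one downlink per leaf")

    links = []
    for s, l in product(range(num_spines), range(num_leaves)):
        links.append((f"spine{s + 1}", spine_downlinks[l], f"leaf{l + 1}", leaf_uplinks[s]))
    return links


if __name__ == "__main__":
    # 2 spines and 4 leaves; the port names are made up for illustration.
    for link in plan_spine_leaf(2, 4, ["10GE1/0/49", "10GE1/0/50"],
                                [f"10GE1/0/{i}" for i in range(1, 5)]):
        print(link)
```

With a full mesh, the plan always contains num_spines × num_leaves links. This is why each leaf's port range must cover at least one uplink per spine.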


Process of Manually Configuring an Underlay Network
Start → Management network → Border leaf → Server leaf → Hardware firewall → Controller interconnection configuration → End
 Management network: management interface configuration, SSH, security hardening
 Border leaf: basic IP address configuration, underlay route, overlay route, M-LAG, external network interconnection
 Server leaf: basic IP address configuration, underlay route, overlay route, M-LAG
 Hardware firewall: interface configuration, security zone, Eth-Trunk configuration, hot standby, security policy, virtual system
 Controller interconnection configuration: LLDP, NETCONF, underlay route, M-LAG


• This section describes the scenario where the border leaf and service leaf nodes
are combined.

Management Network Configuration
(Figure: spine and leaf nodes connect to the management network through management cables on their management interfaces.)
• Management interface configuration: Configure the IP address and route of the management interface (the MEth interface on a CE switch).
• SSH configuration: Enable the SSH function, create an SSH user, and specify the source interface of the SSH server.
• Security hardening: Configure an ACL to allow only the controller to log in to the device using SSH.


• Configure an ACL to match the controller IP address and run the ssh server acl
acl-number command to perform SSH security hardening.

Border Leaf - Basic IP Address Configuration
(Figure: spine and leaf nodes with example addresses Loopback0 11.0.X.X and Loopback1 11.0.0.X on each device.)
• Interconnection interface: Plan a network segment with a 30-bit mask (/30) for each interconnection link, for example, 10.0.0.0/30.
• Loopback0: Configure the same VTEP address for the two spine nodes and the same VTEP address for the server leaf nodes configured in an M-LAG.
• Loopback1: Configure an independent router ID for each device.
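The /30 plan can be generated mechanically. Each point-to-point link gets its own /30, and that subnet's two usable host addresses go to the two ends of the link. Below is a minimal sketch using Python's standard ipaddress module. The address pool (10.0.0.0/24), the device names, and the function name are hypothetical.

```python
import ipaddress


def plan_p2p_links(pool, links):
    """Hypothetical sketch: assign one /30 per (device_a, device_b) link from the pool."""
    subnets = ipaddress.ip_network(pool).subnets(new_prefix=30)
    plan = []
    for (dev_a, dev_b), net in zip(links, subnets):
        ip_a, ip_b = net.hosts()  # a /30 has exactly two usable host addresses
        plan.append((net, dev_a, ip_a, dev_b, ip_b))
    if len(plan) < len(links):
        raise ValueError(f"pool {pool} has too few /30 subnets for {len(links)} links")
    return plan


if __name__ == "__main__":
    links = [("spine1", "leaf1"), ("spine1", "leaf2"), ("spine2", "leaf1"), ("spine2", "leaf2")]
    for net, a, ip_a, b, ip_b in plan_p2p_links("10.0.0.0/24", links):
        print(f"{net}: {a} {ip_a} <-> {b} {ip_b}")
```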


• In this example, the spine, border leaf, and service leaf nodes are combined.

Border Leaf - Underlay Route
(Figure: spine and leaf nodes in OSPF Area 0.)
• Configure OSPF in a single area (Area 0) and enable OSPF on the interconnection interfaces, setting their network type to P2P to speed up OSPF convergence.
• Configuration example:
#
ospf 1 router-id 11.3.3.3
 stub-router on-startup 600 include-stub
 area 0
  network 10.1.1.1 0.0.0.0
  ...
#
interface 10GE1/0/20
 undo portswitch
 ip address 10.1.1.54 255.255.255.252
 ospf network-type p2p
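In the example above, 10GE1/0/20 uses 10.1.1.54 with mask 255.255.255.252. The short standard-library sketch below derives two values from that. The first is the peer address on the same /30. The second is the wildcard mask that an OSPF network statement covering the whole link would use, written in the same wildcard style as network 10.1.1.1 0.0.0.0 above. This can help sanity-check addressing before configuration is delivered. The function name is hypothetical.

```python
import ipaddress


def p2p_link_info(address, netmask):
    """Hypothetical helper: (peer address, network, wildcard mask) for a /30 interface."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    net = iface.network
    if net.prefixlen != 30:
        raise ValueError(f"{iface} is not on a /30 link")
    hosts = list(net.hosts())
    if iface.ip not in hosts:
        raise ValueError(f"{iface.ip} is the network or broadcast address of {net}")
    peer = hosts[1] if iface.ip == hosts[0] else hosts[0]
    return peer, net, net.hostmask  # hostmask is the inverse (wildcard) mask


if __name__ == "__main__":
    peer, net, wildcard = p2p_link_info("10.1.1.54", "255.255.255.252")
    print(peer, net, wildcard)
```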

Border Leaf - Overlay Route
(Figure: border leaf & spine nodes act as BGP EVPN RRs and VTEPs; server leaf nodes act as VTEPs and establish BGP EVPN peer relationships with the RRs.)
• Use the IP address of Loopback1 as the source address to establish BGP EVPN peer relationships. Spine nodes function as RRs, and leaf nodes function as RR clients.
• Configure the IP address of Loopback0 as the VTEP address, and configure the same source MAC address for the NVE interfaces on the active-active gateways.
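Using spine nodes as RRs keeps the number of BGP EVPN sessions linear in the number of leaf nodes instead of quadratic. The reason is that an iBGP speaker does not re-advertise routes learned from one iBGP peer to another, unless it is a route reflector. The arithmetic below is a rough sketch. It assumes every client peers with every RR and that the RRs also peer with each other; the second point is a common design choice but an assumption here. Function names are hypothetical.

```python
def full_mesh_sessions(n):
    """iBGP full mesh: every pair of n speakers needs its own session."""
    return n * (n - 1) // 2


def rr_sessions(num_rrs, num_clients, rrs_meshed=True):
    """Each client peers with every RR; optionally the RRs also peer with each other."""
    sessions = num_rrs * num_clients
    if rrs_meshed:
        sessions += full_mesh_sessions(num_rrs)
    return sessions


if __name__ == "__main__":
    for leaves in (4, 32):
        print(leaves, full_mesh_sessions(2 + leaves), rr_sessions(2, leaves))
```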


Border Leaf - M-LAG
(Figure: two border leaf & spine nodes connected by a peer-link, with Eth-Trunk M-LAG links and Layer 3 links.)
• Configure Eth-Trunk interfaces as the peer-link interfaces and set the spanning tree mode to V-STP.
• Configure M-LAGs to establish links between the border leaf nodes and firewalls and between the border leaf nodes and external routers.


• Traffic from the corresponding interconnection management VLAN must be allowed to pass through the interconnection link between the border leaf nodes and firewalls.

• After creating Eth-Trunks with external routers, you need to configure interconnection between the border leaf nodes and external routers based on the external network connection mode. This configuration is performed on iMaster NCE-Fabric after the border leaf nodes are managed by iMaster NCE-Fabric.

Server Leaf - Active-active Gateways and M-LAG
• Active-active gateways: Deploy active-active gateways on the server leaf nodes, configure a DFS group, and enable the active-active function.
• M-LAG: Configure an M-LAG based on the server access mode, such as LACP mode, active/standby mode, or single-homing mode.
(Figure: spine and server leaf nodes; servers access the server leaf nodes through a bond in LACP mode, in active/standby mode, or through single-homing access.)


• The basic IP address configuration, interconnection interface configuration, underlay route configuration, and overlay route configuration on server leaf nodes are similar to those on border leaf nodes and are not described here.

• An M-LAG does not need to be configured on server leaf nodes when servers are connected in active/standby or single-homing mode. In this case, you can configure an Eth-Trunk with a single member interface on the access device or not configure aggregation at all.
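The rule above can be captured as a small lookup: an M-LAG is needed only when servers are dual-homed with a bond in LACP mode. This is a minimal sketch, and the enum, function names, and returned descriptions are hypothetical.

```python
from enum import Enum


class ServerAccessMode(Enum):
    LACP_BOND = "bond in LACP mode"
    ACTIVE_STANDBY = "active/standby mode"
    SINGLE_HOMING = "single-homing access"


def needs_mlag(mode):
    """Only LACP-bonded (dual-homed, load-sharing) servers require an M-LAG."""
    return mode is ServerAccessMode.LACP_BOND


def leaf_access_plan(mode):
    """Hypothetical summary of the server leaf access configuration for each mode."""
    if needs_mlag(mode):
        return "M-LAG: Eth-Trunk on both server leaf nodes, bound into one M-LAG"
    # Active/standby and single-homing servers do not need an M-LAG.
    return "no M-LAG: single-member Eth-Trunk or no aggregation"
```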

Firewall
⚫ Perform the following pre-configurations on firewalls.
1. Interface IP address: Configure IP addresses for the hot standby heartbeat interfaces and the management interface.
2. Eth-Trunk: Configure Eth-Trunks for connecting to border leaf nodes and allow packets from the corresponding service VLANs to pass through.
3. Security zone and security policy: Add the hot standby heartbeat interfaces to the DMZ, the Eth-Trunk and Virtual-if0 to the Untrust zone, and the management interface to the Trust zone. Set the default action of the security policy to permit.
4. Hot standby: Configure hot standby and enable the quick session backup function. Restart a firewall that has the basic hot standby configuration so that it synchronizes normal service configurations from the other normal firewall, and enable certain functions of the standby firewall.
5. Virtual system and Virtual-if name conversion: Enable the Virtual-if name conversion function and the virtual system function.


Interconnection with the Controller
⚫ To enable the controller to manage devices, deliver configurations, and discover links, configure the following functions on switches and firewalls.
1. SNMP: Configure SNMPv3 and set the corresponding MIB view. The controller obtains LLDP link information from the MIB view specified in SNMP. The MIB view defined in SNMP is iso-view, and the OID MIB subtree of the MIB objects is iso.
2. NETCONF:
 Switch: Configure SSH, enable the NETCONF function, and enable the SNETCONF service.
 Firewall: Configure an API user, enable the NETCONF interface service, and configure the NETCONF port number.
3. LLDP: Enable LLDP globally on CE switches and firewalls so that the controller can discover links using LLDP.

Contents

1. Deployment Process Overview

2. Pre-configuration
▫ Underlay Network Pre-configuration
◼ iMaster NCE-Fabric Pre-configuration

3. Service Provisioning

4. Easy Deployment

Importing a License to iMaster NCE-Fabric
⚫ Obtain the license file of iMaster NCE-Fabric and import it to iMaster NCE-Fabric.


• This slide uses the traditional license management mode as an example.


iMaster NCE-Fabric Pre-configuration (1)
⚫ Pre-configuring servers: To add servers and the links between servers and devices to iMaster NCE-Fabric, configure the link discovery protocol on the servers and check whether any host names are duplicated.
⚫ Device management: Configure a global device management policy on iMaster NCE-Fabric (for example, synchronizing device online data, saving device configurations periodically, and verifying device SSH fingerprints). iMaster NCE-Fabric then discovers and manages network devices using SNMP and NETCONF. After the devices are managed, create a device group to manage all-active devices and collect device alarms so that iMaster NCE-Fabric can successfully manage the target devices and device groups.
⚫ Link management: iMaster NCE-Fabric learns the topology between devices and obtains the network connectivity status based on link status and link details. Links can be discovered automatically, created manually, or imported in batches. Automatic discovery requires that the devices at both ends of a link support a link discovery protocol such as LLDP. To create links manually or import them in batches, enter the port information of the devices at both ends.
⚫ Managing third-party VAS devices (optional): When Check Point, Palo Alto, Fortinet, or F5 load balancing devices are used, iMaster NCE-Fabric can manage them in the computing interworking scenario. In this way, when VAS services are provisioned, iMaster NCE-Fabric can automatically deliver configurations such as service routes to the interconnection ports between the NCE and switches.


• After the license file is loaded, you need to perform the following pre-
configurations to prepare for service provisioning.
iMaster NCE-Fabric Pre-configuration (2)
Resource pool management:
 Creating a fabric: After device management and link discovery succeed, create a fabric resource pool on iMaster NCE-Fabric and specify the egress gateway and DCI gateway (multi-DC scenario) to prepare for service provisioning. Both distributed network overlay (recommended) and centralized network overlay fabrics can be created.
 Configuring the best-effort link for the active-active CE device group: If the border leaf nodes function as active-active CE switches, configure the best-effort link to improve the reliability of the egress network.
 Setting the role of the firewall link: For the service interconnection link between the firewall and switch, set the role to internal link, external link, or internal and external link (recommended).
 Creating third-party L4-L7 resource pools: If third-party VAS devices are used, create third-party L4-L7 resource pools on iMaster NCE-Fabric.
 Associating fabrics with L4-L7 resource pools: Associate the created fabric with L4-L7 resource pools so that L4-L7 services can be associated with L2-L3 services during service provisioning. To associate Huawei L4-L7 resource pools, perform this operation only after the resource pools are created on the SecoManager.
 Configuring interconnection resources: When cross-VPC interconnection services pass through the firewall, specify the value ranges of the interconnection VLANs and IP addresses between the switch and the firewall. iMaster NCE-Fabric automatically selects VLANs and IP addresses from these ranges to deliver configurations. When a best-effort link is configured, iMaster NCE-Fabric automatically selects the interconnection IP addresses of both ends of the link and delivers them to the devices.
 Configuring global resources: During service provisioning, iMaster NCE-Fabric uses a series of variable parameters (such as the BD, global VNI, global VLAN, public IP address, and interworking IP address). Set these parameters globally in advance so that iMaster NCE-Fabric can invoke them.
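The idea of "automatically selecting VLANs and IP addresses from the configured ranges" can be pictured as a small resource pool. This material does not describe the algorithm iMaster NCE-Fabric actually uses. The lowest-free-first policy, the /30 carving, the class name, and the example ranges in this sketch are all assumptions made for illustration.

```python
import ipaddress


class InterconnectResourcePool:
    """Hypothetical sketch: hand out (VLAN, switch IP, firewall IP) from configured ranges."""

    def __init__(self, vlan_first, vlan_last, ip_pool):
        self._free_vlans = list(range(vlan_first, vlan_last + 1))
        self._free_subnets = list(ipaddress.ip_network(ip_pool).subnets(new_prefix=30))
        self._allocated = {}  # service name -> (vlan, subnet, switch_ip, firewall_ip)

    def allocate(self, service):
        if service in self._allocated:  # idempotent: a service keeps its resources
            vlan, _, switch_ip, firewall_ip = self._allocated[service]
            return vlan, switch_ip, firewall_ip
        if not self._free_vlans or not self._free_subnets:
            raise RuntimeError("interconnection resource range exhausted")
        vlan = self._free_vlans.pop(0)  # lowest free VLAN (assumed policy)
        subnet = self._free_subnets.pop(0)  # lowest free /30 (assumed policy)
        switch_ip, firewall_ip = subnet.hosts()
        self._allocated[service] = (vlan, subnet, switch_ip, firewall_ip)
        return vlan, switch_ip, firewall_ip

    def release(self, service):
        """Return a service's VLAN and /30 to the free lists."""
        vlan, subnet, _, _ = self._allocated.pop(service)
        self._free_vlans.append(vlan)
        self._free_vlans.sort()
        self._free_subnets.append(subnet)
        self._free_subnets.sort()
```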

iMaster NCE-Fabric Pre-configuration (3)
⚫ Creating an external gateway: To orchestrate services that access external data centers, create an external gateway on iMaster NCE-Fabric and define the destination, outbound interface, and route delivery mode for accessing external networks.
⚫ Configuring DHCP relay (optional): Create a DHCP relay so that VMs on the service network can automatically obtain IP addresses from the DHCP server.

iMaster NCE-Fabric Interconnection Commissioning
⚫ After iMaster NCE-Fabric is pre-configured, interconnect it with the VMM of the computing virtualization platform to deliver computing and network services.
⚫ Interconnecting with FusionCompute: To enable iMaster NCE-Fabric to detect VMs going online, going offline, and migrating through FusionCompute and to deploy network services automatically, configure interconnection between iMaster NCE-Fabric and FusionCompute.
⚫ Configuring FabricInsight (optional): If iMaster NCE-Fabric and iMaster NCE-FabricInsight need to associate data for intelligent fault handling, interconnect them with each other.
⚫ Interconnecting with northbound services (optional): iMaster NCE-Fabric can interconnect with multiple systems through northbound interfaces to implement the following functions:
 iMaster NCE-Fabric interconnects with eSight through the northbound SNMP protocol. After the interconnection is complete, eSight can synchronize internal alarms of iMaster NCE-Fabric.
 iMaster NCE-Fabric interconnects with the Syslog server to transfer its logs to the Syslog server for centralized storage.
⚫ Interconnecting with southbound services (optional): Interconnect with the LDAP server and RADIUS server.
 To enable users on the LDAP server or AD server to log in to iMaster NCE-Fabric, configure interconnection between iMaster NCE-Fabric and these servers.
 To enable users in a user group on the RADIUS server to be authenticated when logging in to iMaster NCE-Fabric, configure interconnection between iMaster NCE-Fabric and the RADIUS server.


• iMaster NCE-Fabric can also interconnect with the vCenter of VMware.


SecoManager Pre-configuration and Interconnection Commissioning
⚫ In the CloudFabric solution, the SecoManager and iMaster NCE-Fabric are combined. After the SecoManager is installed, it is automatically connected to iMaster NCE-Fabric. After the SecoManager is installed and its license is activated, perform the following operations to prepare for service provisioning.
1. Loading the license of the SecoManager: Log in to iMaster NCE-Fabric and load the license of the SecoManager.
2. Discovering firewalls: Discover Huawei firewalls on the SecoManager.
3. Creating a hot standby group for firewalls (optional): By default, the SecoManager automatically identifies hot standby groups, so this operation is not required. If the SecoManager cannot automatically identify the firewalls in a hot standby group, manually create a hot standby group after the firewalls are discovered.
4. Synchronizing data: Before service provisioning, you are advised to perform difference discovery between the active and standby firewalls to ensure that their configurations are the same.
5. Collecting device alarms: Enable the SNMP trap function on iMaster NCE-Fabric to collect firewall traps.
6. Creating a firewall resource pool: Add the firewalls discovered by the SecoManager to a security resource pool to implement virtualization and provide virtual security resources for tenant services.
7. Associating a fabric with an L4-L7 resource pool: Associate the resource pool created in the previous step with the fabric resource pool created on iMaster NCE-Fabric.
8. Setting the link role for firewalls: Set the roles of the links between firewalls and switches. Different link roles carry different traffic. There are three link roles: internal link, external link, and internal and external link.

Contents

1. Deployment Process Overview

2. Pre-configuration

3. Service Provisioning
◼ Deploying Layer 2 and Layer 3 Basic Services

▫ Deploying the VPC Interconnection Service

▫ Deploying Value-added Services

▫ Deploying Microsegmentation and Service Chain

4. Easy Deployment
Service Provisioning Overview
⚫ Service provisioning refers to allocating appropriate network and computing resources to carry service
applications. A data center administrator needs to allocate certain resources to tenants based on the
service plan. Then the tenant administrator can configure and deploy network and computing services
based on the resources.
⚫ Service provisioning involves two steps: provisioning L2-L3 basic services and invoking advanced services.
 Layer 2 and Layer 3 basic services: Construct the basic VPC network and associate it with the VMM to connect VMs to the network.
 Advanced services: Invoke advanced services such as VPC interworking, value-added services, and service chain based on service requirements.

(Figure: Pre-configuration of the underlay network and iMaster NCE-Fabric (prerequisite) → L2-L3 basic services → Advanced services.)

Overview of L2-L3 Basic Services
⚫ Basic L2-L3 services refer to the basic networks created by the tenant administrator in the VPC,
including logical routers and switches. Logical ports and user ports can be created in the following
scenarios:
 Computing association: VMs are connected to the network through VMM mapping, and logical ports and user
ports are automatically generated.
 Rack leasing: Manually create logical ports and user ports and specify the actual parameters for connecting
servers to server leaf nodes.


• Configuration roadmap:

▫ Create a tenant and allocate resources to the tenant.

▫ Create a VPC in the tenant.

▫ Create a logical router in the VPC of the tenant and create a subnet list.

▫ Create a logical switch in the tenant's VPC and associate it with a logical
router and subnet.

▫ Orchestration server access:

▪ Computing association scenario: Create VMM mappings and associate them with different logical switches. Create a VM on the VMM and select the corresponding port group to access the network.

▪ Rack leasing scenario: Create a logical port and a user port and set their parameters.
Creating a Tenant and VPC
Data center administrator:
• Creating a tenant and allocating resources
• Creating a VPC and authorizing related resource pools
(Figure: a spine-leaf fabric with virtual and physical servers, in which an area is specified for enterprise A.)


• A tenant can create multiple VPCs based on service requirements.


Creating Logical Routers and Switches
⚫ Orchestrate logical networks in a VPC and create logical routers and switches based on service requirements.
 When creating a logical router, you can set related parameters, such as the subnet and gateway.
 When creating a logical switch, you can associate it with a logical router to automatically complete service orchestration.

• Setting parameters related to the logical router

• Associating a Logical Switch with a Logical Router

Orchestration Server Access (1)
⚫ Create VMM mappings for accessing VMs.

• Creating VMM Mappings

• Viewing VMM Port Names


• The name of a port group is in the following format: tenant name|logical switch
name|VDS name|VLAN ID.
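Because the port group name packs four fields into one string, scripts on the VMM side can split it back apart. The sketch below is a minimal parser. It assumes that none of the fields themselves contain "|". The type and function names, and the example name, are hypothetical.

```python
from typing import NamedTuple


class PortGroupName(NamedTuple):
    tenant: str
    logical_switch: str
    vds: str
    vlan_id: int


def parse_port_group(name):
    """Split 'tenant name|logical switch name|VDS name|VLAN ID' into its fields."""
    parts = name.split("|")
    if len(parts) != 4:
        raise ValueError(f"expected 4 '|'-separated fields, got {len(parts)}: {name!r}")
    tenant, logical_switch, vds, vlan = parts
    vlan_id = int(vlan)  # raises ValueError if not numeric
    if not 1 <= vlan_id <= 4094:
        raise ValueError(f"VLAN ID out of range: {vlan_id}")
    return PortGroupName(tenant, logical_switch, vds, vlan_id)


if __name__ == "__main__":
    print(parse_port_group("tenantA|ls-web|vds01|1001"))
```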
Orchestration Server Access (2)
⚫ The computing administrator creates a VM on the VMM and selects a port group to access the network. After the
VM is started, iMaster NCE-Fabric automatically detects the logical port and user port connected to the VM.

Contents

1. Deployment Process Overview

2. Pre-configuration

3. Service Provisioning
▫ Deploying Layer 2 and Layer 3 Basic Services
◼ Deploying the VPC Interconnection Service

▫ Deploying Value-added Services

▫ Deploying Microsegmentation and Service Chain

4. Easy Deployment
VPC Interconnection Service Overview
⚫ By default, subnets under the same logical router in a VPC can communicate with each other at Layer 2 and Layer 3. However, networks under different logical routers, VPCs, or tenants cannot communicate with each other. To enable such communication, configure VPC interconnection.
⚫ Based on the mutual access requirements, the VPC interworking scenarios are as follows:
 Traffic not passing through the firewall.
 Traffic passing through the firewall in only one direction.
 Traffic passing through the firewall in both directions.
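The default reachability rules and the three interworking scenarios can be modeled as a lookup. Subnets on the same logical router reach each other directly. Any other pair needs a VPC interconnection instance, which may steer traffic through zero, one, or two logical firewalls. This is a minimal sketch that ignores security policies. The class, function, tenant, and firewall names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Subnet:
    cidr: str
    tenant: str
    vpc: str
    logical_router: str

    @property
    def router_key(self):
        return (self.tenant, self.vpc, self.logical_router)


def path_between(a: Subnet, b: Subnet, interconnections: dict) -> Optional[tuple]:
    """Return the firewalls traversed, () for a direct path, or None if not reachable.

    interconnections maps frozenset({router_key_1, router_key_2}) to a tuple of the
    logical firewalls that the interconnection instance steers traffic through.
    """
    if a.router_key == b.router_key:
        return ()  # same logical router: L2/L3 communication works by default
    return interconnections.get(frozenset({a.router_key, b.router_key}))


if __name__ == "__main__":
    s1 = Subnet("192.168.1.0/24", "tenantA", "VPC1", "LogicRouter1")
    s2 = Subnet("192.168.2.0/24", "tenantA", "VPC2", "LogicRouter2")
    link = frozenset({s1.router_key, s2.router_key})
    print(path_between(s1, s2, {}))                          # not interconnected
    print(path_between(s1, s2, {link: ("VPC1-FW",)}))         # single firewall
    print(path_between(s1, s2, {link: ("VPC1-FW", "VPC2-FW")}))  # both firewalls
```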

Traffic Model for Communication Between VMs Across VPCs (When Traffic Does Not Pass Through a Firewall)
(Figure: The tenant network administrator creates a VPC interconnection instance between Logic Router1 in VPC1 and Logic Router2 in VPC2. Logic Switch1 carries Subnet1 (192.168.1.0/24), to which VM1 and VM2 connect through logical ports; Logic Switch2 carries Subnet2 (192.168.2.0/24), to which VM3 connects. The VMs attach to the spine-leaf fabric.)

Key Configurations for Communication Between VMs Across VPCs (When Traffic Does Not Pass Through a Firewall)
Key configuration for the network administrator: Configure VPC communication on iMaster NCE.
(Figure: VM1 and VM2 in 192.168.1.0/24 (VPC1) and VM3 in 192.168.2.0/24 (VPC2) connect to the spine-leaf fabric.)

Configuring Communication Between VMs Across VPCs (When Traffic Passes Through a Firewall)
Scenario Description
⚫ A tenant deploys two service systems that belong to different VPC logical networks. The two VPCs need to communicate with each other, and the inter-VPC traffic needs to pass through the firewall in one VPC. Therefore, the cross-VPC access service needs to be deployed.
Configuration roadmap:
1. Create two VPCs and orchestrate the basic L2 and L3 networks in the VPCs (for example, logical routers, logical switches, VMM mappings, and bringing VMs online).
2. Create an external network domain and a logical firewall in VPC1, and configure internal and external links for the logical firewall.
3. Create a security policy on the firewall to allow the subnets to communicate with each other.
4. Create a VPC interworking instance and specify the logical firewall in VPC1 to implement interworking.
(Figure: VM1 and VM2 in 192.168.1.0/24 (VPC1) and VM3 in 192.168.2.0/24 (VPC2) connect to the spine-leaf fabric.)


• If traffic needs to pass through the firewalls of both VPCs, also configure a logical firewall in VPC2. The operations are the same as those in VPC1.
Logical Model for Configuring Communication Between VMs Across VPCs (When Traffic Passes Through a Firewall)
(Figure: VPC1 contains an external network domain and a logical firewall. A VPC interconnection instance (through a single wall) connects Logic Router1 in VPC1 to Logic Router2 in VPC2. Logic Switch1 carries Subnet1 (192.168.1.0/24) with VM1 and VM2; Logic Switch2 carries Subnet2 (192.168.2.0/24) with VM3. The tenant network administrator creates these objects.)

Key Configurations for Configuring Communication Between VMs Across VPCs (When Traffic Passes Through a Firewall)
Key configuration for the network administrator: When configuring VPC communication on iMaster NCE, specify the firewall.
(Figure: VM1 and VM2 in 192.168.1.0/24 (VPC1) and VM3 in 192.168.2.0/24 (VPC2) connect to the spine-leaf fabric.)

Configuring Communication Between VMs Across VPCs (When Traffic Passes Through Firewalls)
Scenario Description
⚫ A tenant deploys two service systems that belong to different VPC logical networks. The two VPCs need to communicate with each other, and the inter-VPC traffic needs to pass through the firewalls in both VPCs. Therefore, the cross-VPC access service needs to be deployed.
Configuration roadmap:
1. Create two VPCs and orchestrate the basic L2 and L3 networks in the VPCs (for example, logical routers, logical switches, VMM mappings, and bringing VMs online).
2. Create an external network domain and a logical firewall in each of VPC1 and VPC2, and configure internal and external links for each logical firewall.
3. Create security policies on the firewalls to allow the subnets to communicate with each other.
4. Create a VPC interworking instance and specify the logical firewalls in both VPCs to implement interworking.
(Figure: VM1 and VM2 in 192.168.1.0/24 (VPC1) and VM3 in 192.168.2.0/24 (VPC2) connect to the spine-leaf fabric.)

Logical Model for Configuring Communication Between VMs Across VPCs (When Traffic Passes Through Firewalls)
(Figure: VPC1 and VPC2 each contain an external network domain and a logical firewall. A VPC interconnection instance (crossing the double wall) connects Logic Router1 in VPC1 to Logic Router2 in VPC2. Logic Switch1 carries Subnet1 (192.168.1.0/24) with VM1 and VM2; Logic Switch2 carries Subnet2 (192.168.2.0/24) with VM3. The tenant network administrator creates these objects.)

Key Configurations for Configuring Communication Between VMs Across VPCs (When Traffic Passes Through Firewalls)
Key configuration for the network administrator: When configuring VPC communication on iMaster NCE, specify the firewalls.
(Figure: VM1 and VM2 in 192.168.1.0/24 (VPC1) and VM3 in 192.168.2.0/24 (VPC2) connect to the spine-leaf fabric.)

Contents

1. Deployment Process Overview

2. Pre-configuration

3. Service Provisioning
▫ Deploying Layer 2 and Layer 3 Basic Services

▫ Deploying the VPC Interconnection Service


◼ Deploying Value-added Services

▫ Deploying Microsegmentation and Service Chain

4. Easy Deployment
Deploying the SNAT Service: Intranet VMs Access the Internet
Scenario Description
⚫ The internal hosts in the data center of company A are on network segment 192.168.1.0/24. The hosts need to access the Internet through SNAT.
(Figure: VM1 in 192.168.1.0/24 connects through the leaf and spine nodes to the Internet; the network administrator configures the service.)


Configuration roadmap:
1. Create a tenant and a tenant VPC on iMaster NCE (Fabric).
2. Create a logical router in the tenant VPC and add an IPv4 subnet.
3. Create a logical switch in the tenant VPC and associate it with the logical router
and subnet.
4. Create VMM mappings in the tenant VPC and associate them with different
logical switches.
5. Create a VM on the VMM and connect the VM to the corresponding network.
6. Create a logical VAS (firewall) in the tenant VPC and configure internal links.
7. Create an external network domain in the tenant VPC, associate the domain with
the created external gateway, and configure external links.
8. Create an SNAT policy in the tenant VPC and specify the SNAT type, source
IP address, destination IP address, and public IP address.
9. Create a security policy in the tenant VPC to allow the subnets or addresses
for which SNAT needs to be performed.

• Prerequisite:

▫ Device discovery, global resource configuration, and interconnection resource configuration have been completed.

▫ Fabric and L4-L7 resource pools have been created and associated with each other, and the roles of inter-device links have been configured.

▫ An external gateway (of the L3 shared egress type) has been created and a public IP address has been configured for VM address translation.
▫ iMaster NCE-Fabric has been interconnected with the VMM.
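SNAT, as used here, rewrites the source address of outbound traffic from 192.168.1.0/24 to a public IP address and translates replies back. The sketch below is a minimal port-based SNAT (PAT) model. It assumes a hypothetical public address of 203.0.113.10, sequential port allocation, and no session aging. Real firewalls choose ports, age out sessions, and match the destination address in the SNAT policy in their own ways. Traffic also still needs the permitting security policy from step 9, which is not modeled. The class name is hypothetical.

```python
import ipaddress


class SnatTable:
    """Minimal PAT sketch: many private sources share one public IP, told apart by port."""

    def __init__(self, inside_subnet, public_ip, first_port=1024, last_port=65535):
        self.inside = ipaddress.ip_network(inside_subnet)
        self.public_ip = ipaddress.ip_address(public_ip)
        self._ports = iter(range(first_port, last_port + 1))  # never reclaimed in this sketch
        self._out = {}  # (private IP, private port) -> public port
        self._in = {}  # public port -> (private IP, private port)

    def translate_outbound(self, src_ip, src_port):
        src = ipaddress.ip_address(src_ip)
        if src not in self.inside:
            return src, src_port  # source not covered by the SNAT policy: unchanged
        key = (src, src_port)
        if key not in self._out:
            port = next(self._ports, None)
            if port is None:
                raise RuntimeError("no free public ports left")
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def translate_inbound(self, public_port):
        """Map a reply arriving at the public port back to the private endpoint (None if unknown)."""
        return self._in.get(public_port)
```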
Deploying the SNAT Service: Configuring SNAT Policies and Security Policies
1. On the VPC1 orchestration page, click the FW Service tab and create an SNAT policy for the logical firewall.
2. For SNAT access to the Internet, configure security policies to permit traffic. In the Tenant View, click the Security tab to go to the Security page.


• For details, see the lab manual.


Deploying an EIP: Accessing an Intranet Server from an External Network
Scenario Description
⚫ The internal host (IP address: 192.168.1.1) in the data center of company A provides services externally. The external network accesses the services provided by the internal host through the floating IP address 10.10.10.12.
(Figure: VM1 with real IP address 192.168.1.1/24 connects through the leaf and spine nodes; the Internet reaches it through floating IP address 10.10.10.12.)


• Configuration roadmap:
1. Create a tenant and tenant VPC on iMaster NCE (Fabric).
2. Create a logical router in the tenant VPC and add a subnet.
3. Create a logical switch in the tenant VPC and associate the logical router with
the corresponding subnet.
4. Create VMM mappings in the tenant VPC and associate them with different
logical switches.
5. Create a VM on the VMM and connect the VM to the corresponding network.
6. Create a logical VAS (firewall) in the tenant VPC and configure internal links.
7. Create an external network domain in the tenant VPC, associate the domain
with the created external gateway, and configure external links.
8. Create an EIP policy in the tenant VPC and specify the working mode,
floating IP address, and fixed IP address of the EIP.
9. Create a security policy in the tenant VPC to allow the subnet or IP address
for which the EIP needs to be executed.
• EIP is also called floating IP address.
• Prerequisite:
▫ Device discovery, global resource configuration, and interconnection
resource configuration have been completed.
▫ Fabric and L4-L7 resource pools have been created and associated with each other, and the roles of links between devices have been configured.
▫ An external gateway (of the L3 shared egress type) has been created, and a
public IP address has been configured for VM address translation.
▫ iMaster NCE-Fabric has been interconnected with VMM.
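An EIP is a one-to-one mapping. Unlike SNAT, it needs no port translation and also works for connections initiated from outside. The sketch below uses the addresses from this example (floating 10.10.10.12, fixed 192.168.1.1). The class name is hypothetical, and the sketch omits the security policy check from step 9.

```python
import ipaddress


class EipTable:
    """Minimal 1:1 NAT sketch: each floating IP maps to exactly one fixed IP, both ways."""

    def __init__(self):
        self._float_to_fixed = {}
        self._fixed_to_float = {}

    def bind(self, floating_ip, fixed_ip):
        f = ipaddress.ip_address(floating_ip)
        x = ipaddress.ip_address(fixed_ip)
        if f in self._float_to_fixed or x in self._fixed_to_float:
            raise ValueError("floating or fixed IP address is already bound")
        self._float_to_fixed[f] = x
        self._fixed_to_float[x] = f

    def inbound_destination(self, dst_ip):
        """Rewrite the destination of inbound traffic (unmapped addresses pass unchanged)."""
        d = ipaddress.ip_address(dst_ip)
        return self._float_to_fixed.get(d, d)

    def outbound_source(self, src_ip):
        """Rewrite the source of outbound traffic (unmapped addresses pass unchanged)."""
        s = ipaddress.ip_address(src_ip)
        return self._fixed_to_float.get(s, s)
```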
Deploying an EIP: Configuring EIP Policies and Security Policies
1. On the VPC1 orchestration page, click the FW Service tab and configure an EIP policy for the logical firewall.
2. In the EIP scenario, configure security policies to permit traffic. In the Tenant View, click the Security tab to go to the Security page.

• For details, see the lab manual.


Contents

1. Deployment Process Overview

2. Pre-configuration

3. Service Provisioning
▫ Deploying Layer 2 and Layer 3 Basic Services

▫ Deploying the VPC Interconnection Service

▫ Deploying Value-added Services


◼ Deploying Microsegmentation and Service Chain

4. Easy Deployment
Microsegmentation Overview
⚫ Microsegmentation is a security isolation technology that groups DC services based on certain rules and deploys policies between
groups to implement traffic control.
⚫ Traditionally, DC networks are segmented at a coarse granularity, for example, by VLAN ID or VNI. Microsegmentation supports more fine-grained and flexible grouping modes, for example, grouping based on IP addresses, MAC addresses, and VM names. This further narrows down security zones to implement more fine-grained service isolation and enhance network security.
⚫ Microsegmentation implements service isolation between different servers of a VXLAN network and ensures secure management and
control for the VXLAN network. In addition, the configuration and maintenance of microsegmentation are simple, significantly
reducing the configuration and maintenance costs.

(Figure: A GBP with the action permit or deny controls traffic from a source EPG to a destination EPG. EPG members can be identified by server, VM, subnet, IP address, MAC address, and so on.)

Basic Concepts of Microsegmentation - EPG
⚫ End point group (EPG): A group of entities that carry services, such as servers and VMs. EPGs can be
defined based on IP addresses, MAC addresses, VM names, and applications.
⚫ After service entities on a network are allocated to EPGs, the VMs are classified based on the EPG:
 Unknown EPG member: VMs that do not belong to any EPG (for example, VM5 and VM6).
 EPG member: VMs that belong to any EPG (for example, VM1, VM2, VM3, and VM4).
 Members in the same EPG: VMs that belong to the same EPG (for example, VM1 and VM2, or VM3 and VM4).
 Members in different EPGs: VMs that belong to different EPGs (for example, VM1 and VM3).

(Figure: EPG1 contains VM1 and VM2, EPG2 contains VM3 and VM4, and VM5 and VM6 are unknown EPG members.)
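EPG membership can be thought of as a set of match rules evaluated for each VM. The sketch below reproduces the figure using hypothetical IP addresses, MAC addresses, and rules. It assumes the first matching EPG wins; the precedence that switches actually apply is not covered in this section. Class and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Epg:
    name: str
    subnets: list = field(default_factory=list)  # e.g. ["192.168.1.0/28"]
    macs: set = field(default_factory=set)
    vm_names: set = field(default_factory=set)

    def matches(self, vm):
        ip = ipaddress.ip_address(vm["ip"])
        return (
            any(ip in ipaddress.ip_network(s) for s in self.subnets)
            or vm["mac"].lower() in {m.lower() for m in self.macs}
            or vm["name"] in self.vm_names
        )


def classify(vm, epgs):
    """Return the first matching EPG name, or None for an unknown EPG member."""
    for epg in epgs:
        if epg.matches(vm):
            return epg.name
    return None


if __name__ == "__main__":
    epgs = [Epg("EPG1", subnets=["192.168.1.0/28"]),
            Epg("EPG2", macs={"00:E0:FC:00:00:03"}, vm_names={"VM4"})]
    vms = [{"name": "VM1", "ip": "192.168.1.1", "mac": "00:e0:fc:00:00:01"},
           {"name": "VM3", "ip": "192.168.1.20", "mac": "00:e0:fc:00:00:03"},
           {"name": "VM5", "ip": "192.168.1.50", "mac": "00:e0:fc:00:00:05"}]
    for vm in vms:
        print(vm["name"], classify(vm, epgs))
```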

Basic Concepts of Microsegmentation - GBP
⚫ Group-based policy (GBP): a policy that controls traffic within an EPG and between EPGs. A GBP can be configured based on EPGs, protocol numbers, and port numbers, and specifies the policies applied within an EPG, between EPGs, and between a known EPG and unknown EPG members.

[Figure: EPG1 (VM1 and VM2), EPG2 (VM3 and VM4), and unknown EPG members (VM5 and VM6), annotated with the three policy types: the access control policy for members in an EPG, for members in different EPGs, and for unknown EPG members.]

Basic Concepts of Microsegmentation - Default GBP Policies

① By default, the access control policy for an unknown EPG member is permit. That is, unknown EPG members can communicate with
each other, and an unknown EPG member and a known EPG member can also communicate with each other.

② By default, the access control policy for an EPG member is deny. That is, members in different EPGs cannot communicate with each
other.

③ The default access control policy for members in an EPG varies according to CE switch models.


• Default access control policy for members in an EPG:
▫ For the CE6857EI, CE6857E, CE6857F, CE6865EI, CE6865E, CE8861EI, and CE8868EI, the default access control policy for members in an EPG is always permit and cannot be modified. That is, members in the same EPG can communicate with each other.
▫ For the CE6881, CE6881K, CE6881E, CE6863, CE6863E, CE6863K, CE6820, and CE5881, the default access control policy for members in an EPG is none, which can be modified. With none, no intra-EPG access control is performed; instead, the devices control access between members of the same EPG according to the default access control policy for EPG members (policy 2 on this slide).
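The three default policies can be modeled as a single decision function. This is a sketch under stated assumptions: configured GBP rules (matched on source EPG, destination EPG, protocol, and port) are assumed to take precedence over the defaults, and the two switch-model families are represented by a hypothetical `intra_epg_default` parameter (`"permit"` or `"none"`). The actual matching order on CE switches may differ.

```python
from dataclasses import dataclass
from typing import List, Optional

UNKNOWN: Optional[str] = None  # endpoints that belong to no EPG


@dataclass(frozen=True)
class GbpRule:
    src_epg: str
    dst_epg: str
    protocol: Optional[int]  # IP protocol number (6 = TCP, 17 = UDP); None = any
    port: Optional[int]      # destination port; None = any
    action: str              # "permit" or "deny"


def decide(src_epg: Optional[str], dst_epg: Optional[str], protocol: int, port: int,
           rules: List[GbpRule], intra_epg_default: str = "permit") -> str:
    # Configured GBPs are checked first (assumed to override the defaults).
    for r in rules:
        if (r.src_epg == src_epg and r.dst_epg == dst_epg
                and r.protocol in (None, protocol) and r.port in (None, port)):
            return r.action
    # Default policy 1: traffic involving an unknown EPG member is permitted.
    if src_epg is UNKNOWN or dst_epg is UNKNOWN:
        return "permit"
    # Default policy 3 on "permit" models: members of the same EPG communicate.
    if src_epg == dst_epg and intra_epg_default == "permit":
        return "permit"
    # Default policy 2: deny for EPG members. With intra_epg_default="none",
    # same-EPG traffic also falls through to this policy.
    return "deny"


rules = [GbpRule("EPG1", "EPG2", protocol=6, port=443, action="permit")]
print(decide("EPG1", "EPG2", 6, 443, rules))                           # permit (explicit GBP)
print(decide("EPG1", "EPG2", 6, 22, rules))                            # deny (default policy 2)
print(decide(UNKNOWN, "EPG1", 6, 22, rules))                           # permit (default policy 1)
print(decide("EPG1", "EPG1", 6, 22, rules, intra_epg_default="none"))  # deny
```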
Microsegmentation Application Scenarios
⚫ You can use microsegmentation to allocate servers or VMs to different EPGs and specify the GBP for members in
different EPGs to implement traffic control between service functions (SFs).
⚫ You can use a CE switch alone or use a CE switch and iMaster NCE-Fabric together to implement
microsegmentation. If iMaster NCE-Fabric is used, it configures EPGs and GBPs and delivers the configurations to
the CE switch through NETCONF.
[Figure: A spine-leaf fabric (Spine; Leaf1 and Leaf2) connecting VM1 to VM8, which are grouped into EPG1, EPG2, and EPG3.]

Configuring Microsegmentation

[Figure: Logical router LR1 connects logical switches LS1 and LS2. VM1 (192.168.10.2, in EPG1) is on LS1, and VM2 (192.168.20.2, in EPG2) is on LS2.]

Scenario description:
⚫ To isolate two VMs in a VPC network (such as VM1 and VM2) without relying on firewalls, you can use iMaster NCE-Fabric to deploy microsegmentation: associate VM1 with EPG1 and VM2 with EPG2.
⚫ If necessary, you can create a service chain policy to allow mutual access over specified protocols and ports.


• Configuration roadmap:

1. Create a tenant and a tenant VPC on iMaster NCE (Fabric).

2. Create a logical router in the tenant VPC and add a subnet.

3. Create a logical switch in the tenant VPC and associate the logical router with
the corresponding subnet.

4. Create VMM mappings in the tenant VPC and associate them with different
logical switches.

5. Create a VM on the VMM and connect the VM to the corresponding network.

6. Create two EPG servers of the host type.

7. Create a service chain template for microsegmentation.

8. (Optional) Configure a service chain policy to allow the required communication protocols and ports.
Creating an EPG

1 Create an EPG.
2 Allocate VM1 to EPG1 based on the device IP address type in microsegmentation.

[Figure: Logical router LR1 connects logical switches LS1 and LS2. VM1 (192.168.10.2) is in EPG1 and VM2 (192.168.20.2) is in EPG2.]


• Repeat the same steps to create another EPG and add VM2 as a member.
Creating a Service Chain
⚫ To allow certain protocols and ports to pass between two VMs, create a service chain and related policies. When a
microsegmentation-based SFC is created, SF nodes cannot be used between the source EPG and the destination
EPG. That is, no value-added service is supported.

Service Chain Overview
⚫ Service function chain (SFC) is a technology that provides ordered services for the application layer.
⚫ After the SFC path is defined, matching traffic passes through the specified value-added service (VAS) devices (such as firewalls, load balancers, deep packet inspection, and intrusion prevention devices) in sequence, receiving the corresponding value-added services in turn.
⚫ Service chains are orchestrated on the Agile Controller-DCN and can be implemented using Policy-Based Routing (PBR) or the Network Service Header (NSH).
[Figure: Traffic from VM1 enters the SC (the traffic diversion point) and travels over VXLAN tunnels through SFF1 and SFF2, which steer it to SF1, SF2, and SF3, before reaching VM2 through the ToR switch. Filtering and redirection policies are delivered to the SC, SFF1, and SFF2.]
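The ordered traversal can be illustrated with a toy model of NSH-based forwarding. The Service Path Identifier (SPI) and Service Index (SI) fields come from the NSH specification (RFC 8300), in which each SF decrements the SI and the SFF selects the next hop from the (SPI, SI) pair. The path table, the SF names, the starting SI of 255, and the collapsing of several SFFs into one lookup loop are illustrative assumptions; PBR-based chaining works differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Packet:
    spi: int  # Service Path Identifier (24 bits in NSH)
    si: int   # Service Index (8 bits in NSH)
    trace: List[str] = field(default_factory=list)


# Hypothetical SFF forwarding table: (SPI, SI) -> next SF on the path.
# Path 10 is SF1 -> SF2 -> SF3, starting at SI 255.
SFF_TABLE = {
    (10, 255): "SF1",
    (10, 254): "SF2",
    (10, 253): "SF3",
}


def service_function(name: str, pkt: Packet) -> Packet:
    """An SF applies its service, then decrements SI before returning the packet to the SFF."""
    pkt.trace.append(name)
    pkt.si -= 1
    return pkt


def sff_forward(pkt: Packet) -> Packet:
    """Keep looking up (SPI, SI); when no entry matches, the chain is complete."""
    while (pkt.spi, pkt.si) in SFF_TABLE:
        pkt = service_function(SFF_TABLE[(pkt.spi, pkt.si)], pkt)
    pkt.trace.append("egress")  # remove NSH and forward toward the destination
    return pkt


result = sff_forward(Packet(spi=10, si=255))
print(result.trace)  # ['SF1', 'SF2', 'SF3', 'egress']
print(result.si)     # 252
```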

Basic Concepts of SFC

• SFC domain: a domain where SFs are deployed.
• EPG: a set of service units with the same features. An SFC defines the service function chain between a pair of EPGs, including an SFP and the SFC policy.
• Service classifier (SC): located at the ingress of an SFC domain. After packets enter the SFC domain, the SC classifies them.
• SF: a device that provides VASs, such as a firewall or load balancer.
• Service function forwarder (SFF): forwards the packets received from the network to its associated SFs.
• SFP: a packet path calculated based on configurations.

[Figure: A spine-leaf fabric in which a border leaf connects to the external network, a server leaf acts as the SC, and a service leaf acts as the SFF with attached SFs (a firewall and an IPS).]


• SFC domain: an area that includes SFC devices.
• EPG: EPGs can be defined based on external network domains, logical routers, and logical switches. Users can specify source EPGs, destination EPGs, and the service nodes between them based on service requirements.
• Classifier: located at the border ingress of the SFC domain. When a packet enters the SFC domain, it is classified first. The classification granularity depends on the capability of the classifier and on the SFC policy, and the classification rules can be coarse-grained or fine-grained. For example:
▫ Coarse-grained: all packets on a port match an SFC rule and are transmitted through SFP A.
▫ Fine-grained: only packets that match specified 5-tuple criteria match an SFC rule and pass through SFP B.
• Service node: SFs include, but are not limited to, firewalls, load balancers, application accelerators, lawful interception, and NAT. One SFC domain may contain multiple SFs.
• Service function forwarder: forwards packets received from the network to the SFs associated with the SFF. In NSH mode, forwarding is based on the NSH encapsulation information; in PBR mode, forwarding is based on traffic information. After SF processing, the packet returns to the same SFF, which determines whether to send the packet back to the network.
• Service function path: a packet path calculated based on the configuration, which precisely specifies the location of each SF.
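The two classification granularities in the Classifier note can be sketched as a rule list in which a fine-grained 5-tuple rule is checked before a coarse-grained port-wide rule. The most-specific-first ordering, the field names, and the SFP labels are assumptions for illustration, not the classifier's documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flow:
    in_port: str
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


@dataclass(frozen=True)
class SfcRule:
    sfp: str
    in_port: Optional[str] = None        # coarse rule: every packet on this port
    five_tuple: Optional[tuple] = None   # fine rule: (src_ip, dst_ip, protocol, src_port, dst_port)

    def specificity(self) -> int:
        return 2 if self.five_tuple is not None else 1

    def matches(self, f: Flow) -> bool:
        if self.five_tuple is not None:
            return (f.src_ip, f.dst_ip, f.protocol, f.src_port, f.dst_port) == self.five_tuple
        return self.in_port is not None and self.in_port == f.in_port


def classify(f: Flow, rules) -> Optional[str]:
    """Return the SFP for a flow, trying the most specific rules first."""
    for rule in sorted(rules, key=SfcRule.specificity, reverse=True):
        if rule.matches(f):
            return rule.sfp
    return None  # not steered into any service chain


rules = [
    SfcRule(sfp="SFP A", in_port="port1"),
    SfcRule(sfp="SFP B", five_tuple=("10.1.1.10", "10.2.2.20", "tcp", 40000, 443)),
]
https_flow = Flow("port1", "10.1.1.10", "10.2.2.20", "tcp", 40000, 443)
ssh_flow = Flow("port1", "10.1.1.10", "10.2.2.20", "tcp", 40001, 22)
other_port = Flow("port2", "10.1.1.30", "10.2.2.20", "udp", 5000, 53)
print(classify(https_flow, rules))  # SFP B (fine-grained 5-tuple match)
print(classify(ssh_flow, rules))    # SFP A (coarse port match)
print(classify(other_port, rules))  # None
```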
SFC Application Scenarios
Security protection between the data center network and external networks is the core of network security. North-south traffic from external networks can be flexibly diverted to different SFs (green line) based on the defined SFC, implementing functions such as address translation and security filtering between internal and external networks.

When service units of different security levels need to communicate with each other, east-west traffic can be flexibly steered through SFs in the resource pool (yellow line) in sequence, based on user requirements for security protection.

[Figure: North-south traffic from the external network and east-west traffic between VM1 and VM2 are steered through an SF resource pool containing an LB, a firewall, and an IPS.]


• You can use a switch alone or use a switch and iMaster NCE-Fabric together to
implement SFC. The controller orchestrates SFs, configures an SFP, and delivers
the SFP configurations to the SC and SFFs (Huawei CE series switches) through
NETCONF interfaces.
Configuring a Service Chain
[Figure: Topology from top to bottom: external network (EPG2), border FW, core FW, WAF, and EPG1 containing VM1.]

Scenario description:
When intranet users access the Internet, their traffic first passes through the core firewall for value-added services such as service isolation and security control, and then through the border firewall for address translation.

Configuration roadmap:
1. On iMaster NCE (Fabric), create tenants and tenant VPCs and orchestrate logical networks.
2. Create a service chain template for intranet users to access the Internet. The template passes through Firewall 1 (the core firewall, which provides security policy filtering) and then through Firewall 2 (the border firewall, which provides SNAT).
3. Create EPGs in the tenant VPC: set logical switches 1 and 2 as EPG1 (source EPG), and set the external network domain as EPG2 (destination EPG).
4. Create an SFC, associate it with the SFC template, and redirect traffic to the logical firewalls.
5. Configure security filtering policies on logical firewall 1 and SNAT on logical firewall 2.


• When orchestrating logical networks, create logical firewall 1 and configure internal links between it and the logical router. For logical firewall 2, create a domain between it and the external network, configure internal links between it and the logical router, and configure external links between it and the external network domain.
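The chain in this scenario, security filtering on the core firewall followed by SNAT on the border firewall, can be sketched as an ordered list of functions applied to a packet. The permitted-port policy, the intranet prefix, and the public address 203.0.113.10 (a documentation address) are hypothetical, and a real firewall would also keep session and port-translation state.

```python
import ipaddress
from dataclasses import dataclass, replace
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int


INTRANET = ipaddress.ip_network("10.0.0.0/8")  # hypothetical intranet range
ALLOWED_PORTS = {80, 443}                      # hypothetical core-FW security policy
PUBLIC_IP = "203.0.113.10"                     # hypothetical SNAT address


def core_firewall(pkt: Packet) -> Optional[Packet]:
    """Firewall 1: security policy filtering (returning None drops the packet)."""
    return pkt if pkt.dst_port in ALLOWED_PORTS else None


def border_firewall(pkt: Packet) -> Optional[Packet]:
    """Firewall 2: SNAT, rewriting intranet source addresses to a public address."""
    if ipaddress.ip_address(pkt.src_ip) in INTRANET:
        return replace(pkt, src_ip=PUBLIC_IP)
    return pkt


SERVICE_CHAIN: List[Callable[[Packet], Optional[Packet]]] = [core_firewall, border_firewall]


def run_chain(pkt: Packet, chain) -> Optional[Packet]:
    """Apply each SF in order; stop as soon as one drops the packet."""
    for sf in chain:
        pkt = sf(pkt)
        if pkt is None:
            return None
    return pkt


print(run_chain(Packet("10.1.1.5", "198.51.100.7", 443), SERVICE_CHAIN))
print(run_chain(Packet("10.1.1.5", "198.51.100.7", 23), SERVICE_CHAIN))  # None (filtered)
```

Ordering the filter before SNAT means the core firewall's policy still sees the original intranet source addresses, which matches the template order described in the roadmap.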
Creating a Service Chain Template
⚫ Creating an SFC template sets the SFC path, that is, the service nodes that traffic from the source EPG server passes through on its way to the destination EPG server, and the order in which it passes through them.
⚫ Create a service chain template and drag the required VAS node icons between the two EPG servers based on service requirements. In this example, drag the firewall node icons between the two EPG servers.

Creating an EPG
⚫ Take EPG1 as an example. Enter EPG1 in Name and select SW1 as the source EPG of the service chain. Create EPG2 in the same way, selecting the external network domain Ext1 in the topology as the destination EPG of the service chain.

Creating a Service Chain
⚫ When creating a service chain, select the created template and set related parameters.

1 Set the source and destination EPG servers.
2 In the Service Function Node Configuration area, click and select a logical VAS created in the VPC as the SF role in the service chain.
3 Configure policy actions.

Similarities and Differences Between the Basic Concepts and Service Models of Microsegmentation and SFC
EPG
1. Granularity:
• In the SFC scenario, EPGs can be configured only based on logical routers, logical switches, and external network domains.
• In the microsegmentation scenario, EPGs can be configured at a finer granularity based on logical switches, external network domains, subnets, and terminals. Terminals support four matching modes: matching based on the prefix and suffix of the host name, matching based on the prefix and suffix of the VM name, matching based on the MAC address, and matching based on the IP address.
2. Default isolation:
• In the microsegmentation scenario, isolation is implemented automatically after EPGs are configured.
• In the SFC scenario, after EPGs are created, you still need to configure an SFC to implement traffic isolation or diversion.
Service Model
1. Significant differences in VAS support:
• In the microsegmentation scenario, SF nodes cannot be deployed between the source EPG and destination EPG. That is, no VAS is supported. In the SFC scenario, multiple VASs are supported.
2. Both SFC and microsegmentation support east-west and north-south traffic:
• In an SFC model for north-south traffic, the source EPG is a logical switch, subnet, or terminal (discrete IP address), and the destination EPG is an external network domain.
• In an SFC model for east-west traffic, the source and destination EPGs must be connected to the same logical router.


• In common service scenarios, a microsegmentation configuration model saves more ACL resources than an SFC configuration model.

• In the microsegmentation scenario, traffic between the source EPG and destination EPG cannot be redirected to SF nodes (except for north-south traffic, which usually passes through firewalls based on route forwarding).

• The configuration models of microsegmentation and SFC differ in that the SFC model adopts 5-tuple-based policies, while the microsegmentation model uses EPG-based policies.
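The two SFC constraints listed under Service Model can be expressed as a validation check on a source/destination EPG pair. This is a sketch under stated assumptions: the EPG records, their `kind` values, and the optional `logical_router` field are hypothetical, and only the two constraints stated in the text are checked (a destination that is an external network domain is treated as north-south; anything else as east-west).

```python
from dataclasses import dataclass
from typing import List, Optional

NS_SOURCE_KINDS = {"logical_switch", "subnet", "terminal"}


@dataclass(frozen=True)
class Epg:
    name: str
    kind: str                            # e.g. "logical_switch", "subnet", "external_network_domain"
    logical_router: Optional[str] = None  # logical router the EPG connects to, if any


def validate_sfc_pair(src: Epg, dst: Epg) -> List[str]:
    """Return constraint violations; an empty list means the pair is valid."""
    errors = []
    if dst.kind == "external_network_domain":  # north-south SFC model
        if src.kind not in NS_SOURCE_KINDS:
            errors.append(f"{src.name}: north-south source must be a logical switch, subnet, or terminal")
    else:                                      # east-west SFC model
        if src.logical_router is None or src.logical_router != dst.logical_router:
            errors.append(f"{src.name} and {dst.name} must connect to the same logical router")
    return errors


epg1 = Epg("EPG1", "logical_switch", "LR1")
ext = Epg("EPG2", "external_network_domain")
epg3 = Epg("EPG3", "logical_switch", "LR2")
print(validate_sfc_pair(epg1, ext))   # []
print(validate_sfc_pair(epg1, epg3))  # ['EPG1 and EPG3 must connect to the same logical router']
```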
Contents

1. Deployment Process Overview

2. Pre-configuration

3. Service Provisioning

4. Easy Deployment

Overview of Easy Mode (1)
⚫ The preceding figure shows the traditional deployment process. To facilitate quick network deployment, the Huawei CloudFabric solution provides the Easy deployment function: on the dedicated Easy page of iMaster NCE-Fabric, you can follow the navigation tree to complete zero touch provisioning (ZTP) of switches, create tenants and VPCs, and provision basic network services in the VPCs.

Traditional manual mode:
For complex networks, the traditional manual mode places high demands on the skills of deployment personnel. You need to preconfigure the underlay network on devices, manually preconfigure resource pools and device groups on iMaster NCE (Fabric), and manually create tenants and VPCs.

[Figure: A network administrator manually deploys the spine and leaf devices.]

Overview of Easy Mode (2)
⚫ When the Easy deployment mode is used, the network planning scheme can be automatically generated based on
the number of devices and cable connections.

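The slide does not describe how iMaster NCE-Fabric generates its plan. As a purely hypothetical illustration of what planning "based on the number of devices and cable connections" can mean, the sketch below allocates a /32 loopback per device and a /31 point-to-point subnet per spine-leaf cable from assumed address pools. The pool prefixes and device names are placeholders, not controller defaults.

```python
import ipaddress


def plan_underlay(spines, leaves, links,
                  loopback_pool="10.255.0.0/24", p2p_pool="10.254.0.0/24"):
    """Hypothetical plan: one /32 loopback per device and one /31 subnet per cable."""
    devices = list(spines) + list(leaves)
    loopbacks = list(ipaddress.ip_network(loopback_pool).hosts())
    p2p_subnets = list(ipaddress.ip_network(p2p_pool).subnets(new_prefix=31))
    if len(devices) > len(loopbacks) or len(links) > len(p2p_subnets):
        raise ValueError("address pool too small for this fabric")

    plan = {"loopbacks": {}, "links": []}
    for device, ip in zip(devices, loopbacks):
        plan["loopbacks"][device] = f"{ip}/32"
    for (a, b), net in zip(links, p2p_subnets):
        # A /31 has exactly two addresses, one per end of the cable (RFC 3021).
        plan["links"].append({"a": a, "a_ip": f"{net[0]}/31", "b": b, "b_ip": f"{net[1]}/31"})
    return plan


spines, leaves = ["Spine1", "Spine2"], ["Leaf1", "Leaf2"]
links = [(spine, leaf) for spine in spines for leaf in leaves]  # discovered cabling (full mesh here)
plan = plan_underlay(spines, leaves, links)
print(plan["loopbacks"])
for link in plan["links"]:
    print(link)
```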

Easy Mode vs Manual Mode
• Networking requirements:
▫ Easy mode: Low
▫ Manual mode: Low
• Service network connection planning:
▫ Easy mode: Manually plan the connections, or have iMaster NCE (Fabric) automatically generate the connection plan.
▫ Manual mode: Manually plan cable connections in the LLD.
• Pre-configuring the underlay network on switches:
▫ Easy mode: iMaster NCE-Fabric automatically generates planning parameters and configuration script files.
▫ Manual mode: Parameters need to be manually planned and configured one by one.
• Firewall and SecoManager pre-configuration (same in both modes):
▫ Underlay network pre-configuration on firewalls must be performed manually.
▫ SecoManager pre-configuration (Security menu) must be performed manually.
• Creating and configuring fabric resource pools, managed devices, device groups, device roles, tenants, and VPCs on iMaster NCE (Fabric):
▫ Easy mode: Created by iMaster NCE-Fabric.
▫ Manual mode: Manually configure the items one by one in the Configuration Wizard menu.
• Layer 2 and Layer 3 basic service orchestration in a VPC:
▫ Easy mode: Orchestrate directly on the Easy page.
▫ Manual mode: Orchestrate on the Service Provisioning page.
• Cross-VPC communication (through firewalls and not through firewalls):
▫ Both modes: Orchestrate on the Service Provisioning page.
• North-south access to external networks through firewalls (including SNAT and DNAT):
▫ Both modes: Orchestrate on the Service Provisioning page.

Deployment Process
⚫ The following figure shows the deployment process in Easy mode.

[Figure: Easy-mode deployment process, from preparing for deployment through Easy network deployment to follow-up operations. Steps shown include entering Easy mode; loading the license file and completing the first synchronization; configuring an SSH fingerprint verification policy; deploying new DCs (online device rollout); access port pre-configuration; tenant network deployment; and expanding the access port capacity.]


• Preparations before deployment:
▫ Entering Easy mode: Enter the dedicated Easy mode on iMaster NCE-Fabric to configure the network.
▫ Deploying a new DC and bringing switches online: On the page for creating a DC, set related parameters and use ZTP to quickly bring the switches online.
▫ Access port pre-configuration: When servers are connected to server leaf nodes, you can use the access port pre-configuration function to configure Eth-Trunk IDs, M-LAG IDs, and the LACP mode for the ports in batches.
▫ Tenant network deployment:
▪ Configure the tenant, VPC, associated external gateway, subnet, and access port based on the actual service plan.
▪ Enable an external gateway of the L3 shared egress type and configure interconnection interfaces to prepare for north-south service provisioning.
▪ During tenant network deployment, you can associate VLANs and port groups with logical ports on logical switches in batches, improving configuration efficiency. Therefore, create a port group before configuring the overlay network.
▫ Access port capacity expansion: Create logical ports in batches as required to expand server ports.
• Follow-up operations: After the Easy network is deployed, the basic VPC network has been provisioned. To deploy other advanced services, such as the firewall service, inter-VPC access service, and microsegmentation service, provision them separately.
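As a hypothetical illustration of batch access-port pre-configuration, the sketch assigns consecutive Eth-Trunk and M-LAG IDs to server-facing ports on an M-LAG leaf pair and applies one LACP mode to all of them. The starting IDs, interface names, mode labels, and the rule that both leaves use the same Eth-Trunk and M-LAG ID for a given server are assumptions, not the controller's actual allocation logic.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class AccessPortConfig:
    leaf: str
    interface: str
    eth_trunk_id: int
    mlag_id: int
    lacp_mode: str


def batch_preconfigure(leaf_pair: Tuple[str, str], interfaces: Sequence[str],
                       first_trunk_id: int = 10, first_mlag_id: int = 10,
                       lacp_mode: str = "static") -> List[AccessPortConfig]:
    """Give each server-facing interface (same name on both leaves) one Eth-Trunk ID and one M-LAG ID."""
    if lacp_mode not in {"static", "dynamic"}:  # hypothetical mode labels
        raise ValueError("lacp_mode must be 'static' or 'dynamic'")
    configs = []
    for offset, intf in enumerate(interfaces):
        for leaf in leaf_pair:  # both M-LAG member leaves get matching IDs
            configs.append(AccessPortConfig(leaf, intf, first_trunk_id + offset,
                                            first_mlag_id + offset, lacp_mode))
    return configs


for cfg in batch_preconfigure(("Leaf1", "Leaf2"), ["10GE1/0/1", "10GE1/0/2"]):
    print(cfg)
```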
Quiz

1. (Multiple-answer question) Which elements are involved in the out-of-band networking of the ZTP function in the CloudFabric solution? ( )
A. Independent management switch

B. Root device

C. iMaster NCE-Fabric

D. Spine and leaf nodes


1. ACD
Summary
⚫ In the CloudFabric computing scenario, the network administrator is only responsible for
network setup and service orchestration, and the VMM platform is managed and
maintained by the computing management personnel.

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright©2023 Huawei Technologies Co., Ltd.
All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
CloudFabric Intelligent Data Center Network O&M Solution
Foreword
⚫ As technologies such as cloud computing, big data, and artificial intelligence continue to develop and gain commercial popularity, the digital transformation of enterprises is deepening. Traditional data centers (DCs) lag behind this trend, making cloud-based transformation a must. However, the rapid growth in DC scale and traffic has brought difficulties and challenges to network management and service operations. The traditional manual O&M model is becoming ineffective in the face of complex application migration strategies, unstable service experience, difficult fault locating, and the large-scale management of massive numbers of security policies.
⚫ This course describes the CloudFabric intelligent data center network (DCN) O&M solution. With iMaster NCE-Fabric and iMaster NCE-FabricInsight, the solution overcomes the challenges of the traditional passive O&M model and eliminates difficulties in fault locating, providing users with ubiquitous network applications and network assurance.

Objectives

⚫ On completion of this course, you will be able to:
▫ Describe the pain points of traditional DCN O&M and explain why intelligent O&M is required.
▫ Describe the O&M functions and features of iMaster NCE-Fabric.
▫ Describe the application scenarios of iMaster NCE-FabricInsight.
▫ Describe the main functions and features of iMaster NCE-FabricInsight.

Contents

1. DCN O&M Challenges and CloudFabric Intelligent DCN O&M Solution

2. iMaster NCE-Fabric

3. iMaster NCE-FabricInsight

DCN Evolution to a Multi-Cloud and Multi-DC Mode
[Figure: Three business trends: mobility-based acceleration (offline -> online, improving service efficiency), ubiquitous services (personalized services, enhancing user loyalty), and agile service rollout (quick business monetization, accelerating innovation), illustrated with financial-service examples such as online shopping, mobile apps, multiple quick payment methods, ETC, user profiles, product recommendation, risk prevention and control, and third-party app integration through SDKs/APIs. Growth metrics: user scale 10x, usage frequency 10x, data volume 200x, interconnection scenarios 100x, and rollout speed 10x, with 24/7 services. Architecture shifts: centralized -> distributed (more complex system architecture), single-DC -> multi-DC (DC scale increased by 100 times), and private cloud -> hybrid cloud (virtualization scale increased by 100 times).]


• Note:

▫ ETC: Electronic Toll Collection

▫ SDK: Software Development Kit

▫ API: Application Programming Interface


Challenges Faced by Traditional Manual O&M
• Difficult health check: A fluctuating securities market means daily service peaks must be handled. Routine inspection before the market opens takes three person-hours every day, making it harder to confidently keep up with general market trends.
• Difficult fault locating: Hundreds of millions of cross-bank transactions per day require 24/7 uninterrupted services. The complicated architecture makes faults difficult to locate: it takes 76 minutes on average to locate a fault.
• Difficult network change: An enormous increase in Internet traffic requires network changes every week. About 70% of network faults are caused by human errors, because changes are compared and verified manually.

[Chart: Survey on losses caused by fault-triggered interruptions ①.]


• As applications are migrated dynamically and traffic increases sharply, DC O&M needs to become intelligent.
▫ The fault domain expands as the DC scale increases.
▫ The boundaries of virtual networks extend to servers (such as vSwitches), blurring the O&M boundary between networks and IT systems.
▫ Traditional O&M methods are increasingly difficult to apply: the network needs to dynamically detect virtual machine (VM) migration and the elastic scaling of applications, network configurations change frequently, traffic surges, and the application policies and mutual access relationships in DCs are increasingly complex.
▫ To improve user experience and ensure high reliability of key services, faults need to be located and rectified in real time.
• Note:
▫ Source ①: Network Computing, the Meta Group, and Contingency Planning Research
▫ Source ②: App Annie
Vision of the CloudFabric Intelligent DCN O&M Solution
• Network perspective (traditional O&M) vs. service perspective (intelligent O&M):
▫ Traditional O&M monitors device KPIs without correlation analysis across metrics. A normal value for a single metric does not mean the network is healthy.
▫ Intelligent O&M correlates applications with networks, proactively detects service changes, and systematically evaluates network health.
• Passive O&M (traditional) vs. proactive perception (intelligent):
▫ Traditional O&M cannot sense network status in real time. For example, exceptions such as microbursts cannot be identified in time.
▫ Intelligent O&M proactively monitors application traffic and network KPIs, and uses telemetry to collect and identify transient exceptions at millisecond granularity.
• Skill dependency (traditional) vs. machine learning (intelligent):
▫ Traditional O&M relies on experts' skills for fault analysis, making root cause locating time-consuming.
▫ Intelligent O&M traces the source through fault aggregation, inferring root causes within minutes.

The vision combines a network controller and a network analyzer to deliver three capabilities:
• Smart sensing (predictability): Network health status can be detected in real time to proactively identify potential network risks before a fault occurs.
• Smart analysis (self-maintenance): Faults can be proactively detected, with intelligent analysis of network fault causes and automatic, closed-loop fault rectification.
• Smart optimization (self-optimization): Automatic network optimization is implemented based on service intents, maximizing service running efficiency.
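To see why millisecond-level telemetry matters for microbursts, compare a coarse average with fine-grained samples of the same traffic. The sketch uses synthetic per-millisecond utilization samples and an assumed 90% burst threshold; real telemetry sampling intervals and thresholds depend on the device and its configuration.

```python
def average(samples):
    return sum(samples) / len(samples)


def find_microbursts(samples_ms, threshold=0.9, min_len_ms=1):
    """Return (start_ms, end_ms) intervals, end exclusive, where utilization stays >= threshold."""
    bursts, start = [], None
    for t, u in enumerate(samples_ms):
        if u >= threshold and start is None:
            start = t
        elif u < threshold and start is not None:
            if t - start >= min_len_ms:
                bursts.append((start, t))
            start = None
    if start is not None and len(samples_ms) - start >= min_len_ms:
        bursts.append((start, len(samples_ms)))  # burst still running at window end
    return bursts


# Synthetic one-second window: 20% load with a 5 ms burst to 100% at t = 400 ms.
window = [0.2] * 1000
window[400:405] = [1.0] * 5

print(f"1 s average: {average(window):.3f}")  # 1 s average: 0.204
print(find_microbursts(window))               # [(400, 405)]
```

The one-second average looks healthy, while the per-millisecond samples expose a 5 ms period at full utilization that could overflow buffers; this is the kind of transient exception a coarse polling interval hides.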

Overall Architecture of CloudFabric Intelligent DCN O&M Solution
[Figure: iMaster NCE consists of iMaster NCE-Fabric (the service O&M entry, covering management and monitoring, NE and network management, and troubleshooting: fault detection, locating, and rectification) and iMaster NCE-FabricInsight (the troubleshooting entry, covering network health evaluation and fault troubleshooting, including fault locating and fault impact analysis). The two exchange full configurations and subscribe to incremental configurations. Underneath are a database service (unified inventory of alarms, configurations, and low-speed-state performance data), a big data analysis platform with an AI engine, and a unified southbound collection service (configuration management, plus low-speed-state performance, high-speed performance, traffic, and log collection). Data is collected from the physical network through standard infrastructure interfaces: SNMP, NETCONF, gRPC (telemetry), ERSPAN, NetStream, and Syslog.]

Note: After a network device is managed by iMaster NCE-Fabric, the network administrator can also log in to the device and run O&M commands, as a supplement to the CloudFabric intelligent DCN O&M solution.


• In the CloudFabric solution, iMaster NCE-Fabric provides some O&M capabilities, including network management, path detection, network reachability verification, and "1-3-5" fault rectification.

• Based on the Huawei big data platform, iMaster NCE-FabricInsight receives data reported by multiple types of network devices, analyzes the data using intelligent algorithms, quickly detects network faults and O&M risks, quickly locates network faults, and displays key network events, providing a basis for network O&M decision-making.
O&M Function Panorama of the CloudFabric Intelligent DCN O&M Solution
[Figure: Functions provided by the controller and the analyzer in two areas. Visualized management and monitoring: device management (NE management, alarm management, log management, performance metrics management, configuration file management) and network management (three-layer network visualization, fabric management, logical resource management, tenant management, link management). Intelligent troubleshooting: fault locating (path detection, VXLAN detection, network connectivity detection, network loop detection, end port locating); fault rectification (emergency plan template, three-level rollback, data consistency verification); and closed-loop troubleshooting (network health evaluation in the device, network, protocol, overlay, and service dimensions; fault detection; fault locating; root cause analysis; fault rectification and isolation).]


• An intelligent O&M system consists of the controller and analyzer. This course
describes some O&M features.
Contents

1. DCN O&M Challenges and CloudFabric Intelligent DCN O&M Solution

2. iMaster NCE-Fabric

3. iMaster NCE-FabricInsight

O&M Panorama | Three-Layer Network Visualization | Connectivity Detection | Service Path Visualization | Consistency Check | Three-Level Rollback | Troubleshooting

iMaster NCE-Fabric O&M Panorama
[Figure: iMaster NCE-Fabric O&M across three layers. Physical network O&M: ZTP-based switch installation, physical topology management, physical resource pool (fabric) management, underlay network connectivity detection, loop fault diagnosis (locating), and management of physical switches, virtual switches, software firewalls, physical firewalls, and servers. Logical network O&M: logical network topology, logical resource pool (resource visualization), logical switch, logical router, and logical SF O&M, end port O&M, and consistency check and restoration. Application network O&M: EPG application network mapping (web/app/DB), service path visualization (connectivity and path), and events, logs, and statistics. The panorama also shows controller installation and deployment, interconnection with the cloud platform (Neutron) and the VMM, visualization of the application, logical, and physical networks, and status, detail, mapping, fault, change, and audit views.]


• iMaster NCE-Fabric centrally manages and controls cloud DCNs and provides automatic mapping from applications to physical networks, resource pool deployment, and visualized O&M, helping customers build service-centric, dynamic network service scheduling capabilities.

• In addition to network planning and deployment, iMaster NCE-Fabric provides DCN service O&M, including topology visualization, loop detection, path detection, traffic statistics collection, three-level rollback, and data consistency verification.
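Data consistency verification comes down to comparing the configuration the controller intends a device to have with what the device actually reports. The following is a minimal sketch. It assumes both sides have already been normalized into key-value entries keyed by an object path. The keys shown are hypothetical and do not reflect iMaster NCE-Fabric's data model.

```python
def consistency_check(intended: dict, actual: dict) -> dict:
    """Compare controller-side (intended) and device-side (actual) configuration."""
    missing = {k: v for k, v in intended.items() if k not in actual}
    unexpected = {k: v for k, v in actual.items() if k not in intended}
    mismatched = {
        k: (intended[k], actual[k])
        for k in intended.keys() & actual.keys()
        if intended[k] != actual[k]
    }
    return {"missing_on_device": missing,
            "unexpected_on_device": unexpected,
            "mismatched": mismatched}


# Hypothetical normalized entries keyed by "object/instance/attribute".
intended = {"bd/10/vni": 5010, "vbdif/10/ip": "192.168.10.1/24", "bd/20/vni": 5020}
actual = {"bd/10/vni": 5010, "vbdif/10/ip": "192.168.10.254/24", "bd/30/vni": 5030}

report = consistency_check(intended, actual)
for category, entries in report.items():
    print(category, entries)
```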

iMaster NCE-Fabric O&M Panorama


• Physical network O&M:
▫ Service provisioning phase:
▪ Physical resource pool management: supports the management of firewalls, load balancers, and DHCP servers.
▪ Physical network topology
▫ O&M phase (monitoring and troubleshooting):
▪ Loop detection
▪ Log management: While the system is running, iMaster NCE-Fabric generates logs about system management operations (management logs) and system running (run logs), facilitating auditing and fault locating.
▪ Device management: supports the management of hardware switches and firewalls, including device replacement and deletion.
• Logical network O&M:
▫ Service provisioning phase:
▪ Logical network topology
▫ O&M phase (monitoring and troubleshooting):
▪ Consistency check
▫ O&M phase (network quality):
▪ Logical interface performance monitoring: displays the performance data of logical switches, provides current performance data for users to query, and displays the tenant name, logical switch information, logical interface, physical interface, device IP address, number of sent/received packets, and number of sent/received bytes.
• Application network O&M:
▫ Service provisioning phase:
▪ Application network topology
▪ Service connectivity detection
▫ O&M phase (monitoring and troubleshooting):
▪ Network path detection


Three-Layer Network Visualization (1)


[Figure: An application network (App 1 and App 2, with Web 1, App 1, and DB 1 tiers connected by Policies 1 to 4) mapped onto a logical network (logical networks 1 and 2) and a physical network.]

Three-layer network visualization
• iMaster NCE-Fabric visualizes physical, logical, and application networks, resolving the blurred O&M boundary between networks and IT devices and making the topology layers clear.
▫ Displays the logical and physical network resources used by the application network, that is, top-down mapping, enabling users to efficiently add or reduce resources.
▫ Displays the tenant and application information of a specific physical resource (device, link, or interface), that is, bottom-up mapping. When a physical network changes, iMaster NCE-Fabric rapidly identifies the affected tenants and applications and the impact scope, and then notifies these tenants or applications.
▫ Displays the physical network resources used by, and the applications carried by, the logical network, presenting an overall view of service and network conditions.
▫ When a physical network resource changes (for example, a device restarts or a link goes down), iMaster NCE-Fabric automatically synchronizes the change to the logical and application networks.


• Physical network topology: obtained by scanning the actual physical addresses of network devices. Once an alarm is generated on the network, the physical topology can display the faulty network device to the network administrator in a timely and detailed manner.

• Logical network topology: allows administrators to query relationships between devices or fabric networks.

• Application network topology: When group-based policies (GBPs) are used to deliver services, iMaster NCE-Fabric constructs an application network topology based on application models such as endpoint groups (EPGs) and service function chains (SFCs). The application network topology can be displayed in tenant and application dimensions. iMaster NCE-Fabric supports mutual visibility between application and logical networks.
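The top-down and bottom-up mappings of the three-layer view can be pictured as two lookups over layered dictionaries. This is a minimal sketch, not the iMaster NCE-Fabric data model: the tenant, application, logical network, and device names are hypothetical, and each layer is reduced to a plain Python dictionary.

```python
# Hypothetical three-layer model: (tenant, application) -> logical networks -> devices.
# All names are illustrative; they are not iMaster NCE-Fabric objects or APIs.
APP_TO_LOGICAL = {
    ("tenant-a", "App 1"): {"Logical network 1"},
    ("tenant-b", "App 2"): {"Logical network 2"},
}
LOGICAL_TO_PHYSICAL = {
    "Logical network 1": {"leaf1", "spine1", "leaf2"},
    "Logical network 2": {"leaf3", "spine1"},
}


def top_down(app_key):
    """Application -> logical -> physical: resources used by one application."""
    logical = set(APP_TO_LOGICAL.get(app_key, ()))
    physical = set()
    for net in logical:
        physical |= LOGICAL_TO_PHYSICAL.get(net, set())
    return logical, physical


def bottom_up(device):
    """Physical -> logical -> application: what is affected if `device` fails."""
    logical = {net for net, devs in LOGICAL_TO_PHYSICAL.items() if device in devs}
    apps = {app for app, nets in APP_TO_LOGICAL.items() if nets & logical}
    return logical, apps
```

For example, `bottom_up("spine1")` returns both logical networks and both tenants' applications, which is the "identify the affected tenants and applications" step when a shared spine restarts.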

Three-Layer Network Visualization (2)


[Figure: three stacked layers (application network, logical network, physical network)]

100% service visualization: mapping of the physical network topology to the logical topology, and of the logical topology to the application network topology.

O&M tips (application network -> logical network -> physical network):
▫ Query the logical and physical network resources used by tenants, and add or reduce resources in a timely manner.
▫ Query the applications running on physical networks, and reversely identify the impact scope of physical network faults.
▫ Query the physical resources used by logical networks and the applications running on logical networks, presenting intuitive insights into service and network conditions.


Service Connectivity Detection


⚫ Connectivity detection checks the network connectivity between two VMs. By simulating Address Resolution Protocol (ARP) requests and ping processes, ARP request or Internet Control Message Protocol (ICMP) request packets are constructed and sent from the source VM to the destination VM. Connectivity between the two VMs is determined by checking whether the source VM receives response packets from the destination VM.

When the Execution Result is Packet Loss Ratio 0%, the source and destination nodes are reachable from each other and network connectivity between them is normal.

When the Execution Result is Fail Go to Single-path Detection, the source and destination nodes are unreachable from each other and the path between them may be disconnected. Click Single-path Detection. The system then automatically switches to the single-path detection page to perform path detection.


• Service connectivity detection:

▫ MAC ping uses ARP request packets to check whether an ARP probe system is normal.

▫ IP ping uses ICMP request packets to check whether the network reachability between a VM, bare metal server (BMS), container, or end port or device of dynamic type and the destination IP address is normal.

• Connectivity checks can be performed periodically.

• Prerequisites:

▫ iMaster NCE-Fabric is running properly.

▫ The source and destination IP addresses must be on the same subnet when MAC ping is used to check whether the ARP probe system is normal.

▫ When the source and destination VMs are both connected to physical CE devices, the source and destination VMs cannot be on the same subnet of the same host if MAC ping or IP ping is used to check connectivity.

▫ When devices are used as source NEs, VMs must be connected to them.

▫ During a network connectivity check, an NE router cannot serve as the destination node, and traffic destined for the destination IP address cannot traverse an NE router.
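The MAC ping subnet prerequisite and the way an execution result is read can be expressed as two small checks. This is a minimal sketch under assumptions: the function names are hypothetical, and reporting a partial loss ratio (for example 25%) is an assumption, because the slide only describes the 0% and Fail outcomes.

```python
import ipaddress


def mac_ping_allowed(src_ip: str, dst_ip: str, prefix_len: int) -> bool:
    """MAC ping prerequisite: source and destination must be on the same subnet."""
    subnet = ipaddress.ip_interface(f"{src_ip}/{prefix_len}").network
    return ipaddress.ip_address(dst_ip) in subnet


def execution_result(sent: int, replies: int) -> str:
    """Turn probe counters into a result string (partial-loss wording is assumed)."""
    if sent <= 0:
        raise ValueError("at least one probe must be sent")
    if not 0 <= replies <= sent:
        raise ValueError("replies must be between 0 and sent")
    if replies == 0:
        # No response at all: the GUI points the user to single-path detection.
        return "Fail Go to Single-path Detection"
    loss = 100 * (sent - replies) / sent
    return f"Packet Loss Ratio {loss:.0f}%"
```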

Service Path Visualization


[Figure: single-path, multi-path, and network detection showing the actual physical path and 5-tuple-based packet filtering. Caption: 100% path visualization, from physical links to logical links and from a single path to multiple paths]

Service path visualization
• iMaster NCE-Fabric can display the real physical paths of services based on application and logical networks. When the physical network is decoupled from the logical network, iMaster NCE-Fabric can quickly locate network faults, and detect and rectify unexpected service interruptions.
• Service path visualization provides the following functions:
▫ Single-path detection
▫ Multi-path detection
▫ Network loop detection


• The service path visualization feature supports the filtering of path information
based on 5-tuple information and the display of hop-by-hop path information,
enabling users to view service path information as needed.
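Filtering hop-by-hop path records by 5-tuple can be sketched as a wildcard match on flow fields. The record layout below (`FiveTuple`, `Hop`, and the interface names) is a hypothetical illustration, not the format iMaster NCE-Fabric uses; any 5-tuple field left out of the filter acts as a wildcard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiveTuple:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str  # e.g. "TCP", "UDP"


@dataclass(frozen=True)
class Hop:
    device: str
    in_intf: str
    out_intf: str
    flow: FiveTuple


FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")


def filter_hops(hops, **criteria):
    """Keep hops whose flow matches every given 5-tuple field, preserving hop order."""
    unknown = set(criteria) - set(FIELDS)
    if unknown:
        raise ValueError(f"not a 5-tuple field: {sorted(unknown)}")
    return [h for h in hops
            if all(getattr(h.flow, k) == v for k, v in criteria.items())]
```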

Network Path Detection


⚫ Path detection simulates the actual forwarding path of packets.

Single-path detection
• Single-path detection traces the actual physical paths between VMs, BMSs, containers, or network devices, and checks whether service flows are interrupted.
• Detection principle: iMaster NCE-Fabric sends a Packet-Out message to the source CE switch through an OpenFlow channel. This message simulates a service flow: the 5-tuple information (source IP address, destination IP address, source port, destination port, and protocol) and MAC address of the flow are encapsulated into the message. The source CE switch forwards the simulated packet along the service forwarding path, and every device on the path that receives it reports a Packet-In message to iMaster NCE-Fabric. iMaster NCE-Fabric then parses the Packet-In messages and calculates the detection path based on the actual links.

Multi-path detection
• Multi-path detection traces multiple physical paths between NVE devices to check whether service flows are interrupted.
• Detection principle: Multi-path detection is implemented in the same way as single-path detection, except that single-path detection sends a single detection packet, whereas the number of packets sent during multi-path detection is configurable. iMaster NCE-Fabric can also filter out duplicate paths.
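The last step of the detection principle, calculating the path from Packet-In reports and the actual links, can be sketched as a walk over a link table. This is an illustration under assumptions: the `reports` and `links` dictionaries are hypothetical stand-ins for parsed Packet-In messages and the controller's topology, and OpenFlow message handling itself is not shown.

```python
def build_path(source, reports, links):
    """Rebuild a probe's hop-by-hop path from Packet-In reports (hypothetical data model).

    source:  CE switch that received the Packet-Out message
    reports: {device: (in_port, out_port)} parsed from Packet-In messages
    links:   {(device, out_port): (peer_device, peer_in_port)} actual fabric links
    Returns (devices in forwarding order, True if the path ended cleanly).
    """
    path, device = [], source
    while device in reports and device not in path:
        path.append(device)
        _, out_port = reports[device]
        peer = links.get((device, out_port))
        if peer is None:
            # Egress port has no fabric link (for example, it faces the destination VM).
            return path, True
        device = peer[0]
    # Either the next hop never sent a Packet-In (incomplete or interrupted path),
    # or a device repeated, which would indicate the probe was looping.
    return path, False
```

A `False` result corresponds roughly to the Timeout status described for single-path detection: the task ran, but the probe did not reach the end of the path.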


Network Path Detection: Single-Path Detection


⚫ Task configuration: Select the types of source and destination
end ports or network devices for path detection from the drop-
down list box and set task parameters. Filter packets based on
5-tuple information.

⚫ Task result:
▫ If Status is displayed as Finished, the detection is successful and the paths are normal.
▫ If Status is displayed as Failed, the detection task cannot be executed because the source node cannot be found or for other reasons.
▫ If Status is displayed as Timeout, the detection task is executed, but packet forwarding fails because the path is incomplete or interrupted.


• Customer requirements:

▫ Actual physical paths of service flows can be displayed, or service flow interruptions can be detected.

▫ IPv4 and IPv6 overlay network paths can be detected.

▫ Single-path detection can be performed across fabric networks.

▫ Single-path detection of container networks is supported.

▫ IPv4 NSH-based SFC can be detected.

▫ VM access paths can be detected.

▫ Traffic statistics on a specified interface can be collected.

▫ Single-path detection is supported in scenarios where IPv4 or IPv6 VMs are connected to the CE1800V.

▫ Single-path detection is available for traffic that traverses a firewall.

• Prerequisites:

▫ iMaster NCE-Fabric is running properly.

▫ Management IP addresses of switches have been configured and links between them have been set up.

▫ To detect paths that pass through a firewall, SecoManager has been deployed on iMaster NCE-Fabric.

Network Path Detection: Multi-Path Detection


⚫ Task configuration: Set multi-path detection task parameters.

⚫ Task result:
▫ If Status is displayed as Finished, the detection is successful and the paths are normal.
▫ If Status is displayed as Failed, the detection task cannot be executed because the source node cannot be found or for other reasons.
▫ If Status is displayed as Timeout, the detection task is executed, but packet forwarding fails because paths between the source device and destination device are incomplete or interrupted.


• Customer requirements:

▫ Actual physical paths of service flows can be displayed, or service flow interruptions can be detected.

▫ IPv4 and IPv6 overlay network paths can be detected.

▫ Multi-path detection can be performed across fabric networks.

• Prerequisites:

▫ iMaster NCE-Fabric is running properly.

▫ Management IP addresses of switches have been configured and links between them have been set up.
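Multi-path detection differs from single-path detection mainly in sending a configurable number of probes and filtering out duplicate paths. The sketch below assumes, and the source does not state, that the probes differ only in source port so that ECMP hashing spreads them across equal-cost paths. The fabric, `ECMP_NEXT_HOPS`, and the CRC32-based hash are hypothetical stand-ins for real switch behavior.

```python
import zlib

# Hypothetical fabric: equal-cost next hops per device (leaf1 -> spine1/spine2 -> leaf3).
ECMP_NEXT_HOPS = {"leaf1": ["spine1", "spine2"], "spine1": ["leaf3"], "spine2": ["leaf3"]}


def next_hop(device, flow):
    """Stand-in for a switch's ECMP hash: the same flow always picks the same member."""
    choices = ECMP_NEXT_HOPS.get(device)
    if not choices:
        return None
    return choices[zlib.crc32(f"{device}|{flow}".encode()) % len(choices)]


def trace(source, flow):
    """Follow the hashed next hops until a device has no further next hop."""
    path, device = [source], source
    while (device := next_hop(device, flow)) is not None:
        path.append(device)
    return tuple(path)


def multi_path_detect(source, base_flow, probes):
    """Send `probes` packets that differ only in source port and keep unique paths."""
    src_ip, dst_ip, src_port, dst_port, proto = base_flow
    unique = []
    for i in range(probes):
        path = trace(source, (src_ip, dst_ip, src_port + i, dst_port, proto))
        if path not in unique:  # filter out duplicate paths
            unique.append(path)
    return unique
```

With `probes=1` this degenerates to single-path detection; raising the count increases the chance that every equal-cost path is exercised at least once.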

Network Loop Detection and Elimination (1)


⚫ iMaster NCE-Fabric can automatically detect whether virtual extensible local area network (VXLAN) and virtual local area network (VLAN) loops occur on a fabric network through mechanisms such as traffic collection and event association. It can then locate and eliminate the loops, preventing improper networking or network attacks from affecting services.
⚫ The network loop detection and elimination process of iMaster NCE-Fabric is as follows:
Loop detection
1. Monitor the network.
2. Check whether an alarm related to a network loop is generated. If no, continue monitoring the network.
3. If yes, check whether a loop occurs on the network. If no, continue monitoring the network.
Loop elimination (manual operations on iMaster NCE-Fabric)
4. If a loop occurs, check the loop.
5. Eliminate the loop.


• When detecting loops, CE switches generate alarms. The alarms can be classified
into different types, including the traffic threshold-crossing alarm, VLAN MAC
address flapping alarm, and VXLAN MAC address flapping alarm. iMaster NCE-
Fabric samples ARP packets based on the alarms reported by interfaces or sub-
interfaces of CE switches and displays all suspected loops in a list.
▫ When it collects multiple identical packets within a specific period of time, iMaster NCE-Fabric determines that a loop has occurred, displays the loop information on the loop detection page, and provides elimination suggestions. Only the local interface where the loop occurs is displayed in the loop detection result.
▫ If a device interface or sub-interface sends a large number of normal packets, iMaster NCE-Fabric may fail to collect multiple identical packets and therefore cannot determine whether a loop exists. In this case, you can log in to the device and manually confirm whether a loop exists based on the suspected loop information.
• Customer requirements:
▫ On a fabric network, traffic service exceptions may occur due to improper
networking or network attacks. Customers require a traffic monitoring
technology that samples packets on device interfaces to monitor the traffic
status in real time and promptly find abnormal traffic as well as the source
of attack traffic.
• Prerequisites:
▫ iMaster NCE-Fabric is running properly.
▫ The device to be monitored has been added to the fabric network and has
available ACL resources.
▫ Loop alarm reporting has been enabled on the device.
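The rule "multiple identical packets within a specific period of time indicate a loop" can be sketched as a sliding-window counter per interface. This is a minimal illustration: the window length and threshold are hypothetical values, not iMaster NCE-Fabric defaults, and samples are assumed to arrive sorted by time.

```python
from collections import Counter, deque


def find_suspected_loops(samples, window_s=1.0, threshold=3):
    """Flag interfaces that see the same sampled packet repeatedly in a short window.

    samples: iterable of (timestamp_s, interface, packet_bytes), sorted by time,
             for example ARP packets sampled after a MAC address flapping alarm.
    window_s and threshold are illustrative values, not product defaults.
    """
    window = deque()   # (timestamp, (interface, packet)) currently inside the window
    counts = Counter()  # occurrences of each (interface, packet) inside the window
    loops = set()
    for ts, intf, pkt in samples:
        key = (intf, pkt)
        window.append((ts, key))
        counts[key] += 1
        # Drop samples that have fallen out of the time window.
        while window and ts - window[0][0] > window_s:
            _, old = window.popleft()
            counts[old] -= 1
        if counts[key] >= threshold:
            loops.add(intf)
    return loops
```

An interface flooded with distinct normal packets never accumulates repeats, which mirrors the note that heavy normal traffic can prevent iMaster NCE-Fabric from confirming a loop.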

Network Loop Detection and Elimination (2)


⚫ iMaster NCE-Fabric displays suspected loop information in the current record based on the received loop alarm. Users can delete or
manually confirm the current record.

⚫ Information about the confirmed loops is listed on the Loop Device tab page. Users can view the details and perform port isolation.


• Loop elimination:

▫ Current records:

▪ After confirming that a suspected loop on a CE switch does not exist, you can delete the record of the suspected loop.

▪ After confirming that a suspected loop on a CE switch does exist, click the related button in the record of the suspected loop to confirm the loop. Click Refresh to view information about the confirmed loop and perform port isolation.

▫ Historical records: display the status of suspected loops that have been processed so that loop interfaces that have gone offline can be reconnected. On the Historical Record page, the status of a suspected loop can be one of the following:

▪ Timeout: After the suspected loop is confirmed and eliminated, the interface on the loop still reports a MAC address flapping alarm within the MAC address aging time of the CE switch. Information about the loop is displayed again in the current record of the suspected loop with Status New, and the status changes to Timeout 2 minutes later.

▪ Manually deleted: indicates that users manually deleted the record of a suspected loop. After the record is deleted, the status of this record is Manually deleted on the Historical Record page.

Consistency Check
⚫ iMaster NCE-Fabric sends a configuration query request to a forwarder and detects configuration inconsistencies between iMaster NCE-Fabric and the forwarder so that they can be eliminated. You can then perform either data reconciliation (synchronizing configurations from iMaster NCE-Fabric to the forwarder) or data synchronization (synchronizing configurations from the forwarder to iMaster NCE-Fabric).

Inconsistency discovery mode (applies to both scenarios below): Inconsistency discovery is performed on services delivered by iMaster NCE-Fabric to the forwarder.
▫ Automatic discovery: 1. Full inconsistency discovery; 2. Incremental inconsistency discovery
▫ Manual discovery: 1. Full inconsistency discovery; 2. Incremental inconsistency discovery

Inconsistency elimination mode, by scenario:
▫ Data restoration based on iMaster NCE-Fabric: Manually overwrite inconsistent data on the forwarder with data on iMaster NCE-Fabric after inconsistency discovery is manually or automatically performed.
▫ Data synchronization based on the forwarder:
▪ Automatic recovery: This mode enables the configuration of policies as required and allows the forwarder to automatically synchronize all service data from itself to iMaster NCE-Fabric when the forwarder goes online for the first time.
▪ Manual recovery: This mode manually synchronizes inconsistent data from the forwarder to iMaster NCE-Fabric after inconsistency discovery is manually or automatically performed.


• Synchronization policies include data reconciliation based on iMaster NCE-Fabric and data synchronization based on the forwarder.

▫ Controller-based reconciliation overwrites inconsistent data on the forwarder with data on iMaster NCE-Fabric. If inconsistencies are caused by service data delivered by iMaster NCE-Fabric, synchronize the data from iMaster NCE-Fabric to the forwarder.

▫ Forwarder-based synchronization synchronizes forwarder data to iMaster NCE-Fabric. If inconsistencies are caused by manually configured data, synchronize the data from the forwarder to iMaster NCE-Fabric.

• Inconsistency discovery modes include full inconsistency discovery and incremental inconsistency discovery.

▫ During full inconsistency discovery, iMaster NCE-Fabric collects all the data on the forwarder. During incremental inconsistency discovery, iMaster NCE-Fabric collects only the forwarder data that differs from the data collected last time.

▫ Incremental inconsistency discovery consumes fewer performance resources than full inconsistency discovery, but requires at least one full inconsistency discovery to be performed in advance.
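Why incremental discovery needs a prior full discovery can be shown with a small baseline-and-delta model. This is a sketch under assumptions: configurations are flattened into hypothetical `{key: value}` dictionaries, and the `Discovery` class and its methods are illustrative names, not an iMaster NCE-Fabric interface.

```python
def diff(controller, forwarder):
    """Classify inconsistencies between two {config_key: value} views."""
    return {
        "only_on_controller": {k for k in controller if k not in forwarder},
        "only_on_forwarder": {k for k in forwarder if k not in controller},
        "different": {k for k in controller.keys() & forwarder.keys()
                      if controller[k] != forwarder[k]},
    }


class Discovery:
    def __init__(self):
        self.baseline = None  # forwarder data as of the last collection

    def full(self, controller, forwarder_all):
        # Full discovery: collect everything and replace the baseline.
        self.baseline = dict(forwarder_all)
        return diff(controller, self.baseline)

    def incremental(self, controller, forwarder_delta, removed=()):
        # Incremental discovery: only changes since the last collection are sent,
        # so a baseline from an earlier full discovery must already exist.
        if self.baseline is None:
            raise RuntimeError("run a full inconsistency discovery first")
        self.baseline.update(forwarder_delta)
        for key in removed:
            self.baseline.pop(key, None)
        return diff(controller, self.baseline)
```

The delta is much smaller than the full data set, which is the source of the lower resource cost; without the stored baseline, the delta alone cannot be compared against the controller.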

Data Reconciliation: Data Inconsistency Discovery


⚫ Offline configurations on a forwarder may cause data inconsistencies between the forwarder and iMaster NCE-
Fabric. iMaster NCE-Fabric supports manual and automatic data inconsistency discovery. After data inconsistency
discovery, you can proactively initiate inconsistency elimination (reconciliation) based on iMaster NCE-Fabric.

⚫ Manually perform data inconsistency discovery.
▫ Select All or New.

⚫ View the data inconsistency discovery result.


• Data inconsistency discovery results are as follows:

▫ If Status is displayed as Discovered, data inconsistency discovery is completed.

▫ If Status is displayed as Invalid inconsistent data and Failure Reason is displayed as Device re-attachment is performed, the link of the active node has been switched and the inconsistent service data is invalid. In this case, perform inconsistency discovery again.

▫ If Status is displayed as Discovery_Error and Failure Reason is displayed as The device is isolated, cancel isolation of the device in the Advanced Settings area of the Device Management page.

▫ If Status is displayed as Discovery_Error and Failure Reason is displayed as SFTP users are not configured, configure SFTP first.

▫ If Status is displayed as Discovery_Error and Failure Reason is displayed as Failed to send netconf package, perform the following operations on the forwarder:

▪ If the forwarder is running properly, run the ssh client first-time enable command on the forwarder to enable first-time authentication for the SSH client.

▪ If the forwarder fails to connect to the southbound service IP address of iMaster NCE-Fabric through the SFTP client, run the sftp client-source -a X.X.X.X command on the forwarder, where X.X.X.X indicates the source IP address used when the forwarder acts as an SFTP client.

Data Reconciliation: Data Inconsistency Elimination (1)


⚫ Eliminate data inconsistencies for devices.
▫ Select the desired device and click Sync To. Data that exists on iMaster NCE-Fabric but not on the forwarder is synchronized to the forwarder.

⚫ Eliminate data inconsistencies for instances. (Data that exists on iMaster NCE-Fabric but not on the forwarder is synchronized to the forwarder.)
▫ Click the arrow in front of the desired device to check inconsistent features and data types.
▫ Click . On the page that is displayed, click Expand All to view inconsistent data.
▫ In the Data from the controller area, select the data to be overwritten and click Sync To. Data that exists on iMaster NCE-Fabric but not on the forwarder is delivered to the forwarder.


• Eliminate data inconsistencies for instances.

▫ If multiple features are inconsistent, eliminate feature inconsistencies in the following sequence: system, gre, evc, ifm, ethernet, nvo3, dfs, syslog, vxlan, rtp, l3vpn, ifmtrunk, mlag, mstp, trafficanalysis, vrrp, acl, bfd, directrt, bgp, evpn, sflow, staticrt, smartlink, vlan, dhcp, dhcpv6, nd, arp, qos, ospfv2, ospfv3, mac, sshs, feiarpstatus, sfc, l2mc, dgmp, mcastbase, mvpn, pim, and msdp.

Data Reconciliation: Data Inconsistency Elimination (2)


⚫ Eliminate data inconsistencies for instances by deleting the data that exists on the forwarder but not on iMaster NCE-Fabric.
▫ Enable Delete Via Reconciliation.
▫ Click the arrow in front of the desired device to check inconsistent features and data types.
▫ Click . On the page that is displayed, click Expand All to view inconsistent data.
▫ In the Data from the forwarder area, select the data to be overwritten and click Sync To. Data that exists on the forwarder but not on iMaster NCE-Fabric is deleted.


• Eliminate data inconsistencies for instances.

▫ If Delete Via Reconciliation is enabled, click Sync To to delete configurations that exist on the forwarder but not on the controller.

▫ If multiple features are inconsistent, eliminate feature inconsistencies in the following sequence: msdp, pim, mvpn, mcastbase, dgmp, l2mc, sfc, feiarpstatus, sshs, mac, ospfv3, ospfv2, qos, arp, nd, dhcpv6, dhcp, vlan, smartlink, staticrt, sflow, evpn, bgp, directrt, bfd, acl, vrrp, trafficanalysis, mstp, mlag, ifmtrunk, l3vpn, rtp, vxlan, syslog, dfs, nvo3, ethernet, ifm, evc, gre, and system.
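The two feature sequences on these slides are exact reverses of each other: one is used when delivering controller data to the forwarder, the other when deleting forwarder-only data. A plausible reading, not stated in the source, is that configurations are created in dependency order and removed in reverse dependency order. The sketch below orders a set of inconsistent features accordingly; the `plan` function is a hypothetical helper, and only the sequence itself comes from the slides.

```python
# Feature sequence from the slides for delivering (Sync To) inconsistent data.
SYNC_ORDER = [
    "system", "gre", "evc", "ifm", "ethernet", "nvo3", "dfs", "syslog", "vxlan",
    "rtp", "l3vpn", "ifmtrunk", "mlag", "mstp", "trafficanalysis", "vrrp", "acl",
    "bfd", "directrt", "bgp", "evpn", "sflow", "staticrt", "smartlink", "vlan",
    "dhcp", "dhcpv6", "nd", "arp", "qos", "ospfv2", "ospfv3", "mac", "sshs",
    "feiarpstatus", "sfc", "l2mc", "dgmp", "mcastbase", "mvpn", "pim", "msdp",
]
# The deletion sequence on the slides is exactly the reverse.
DELETE_ORDER = list(reversed(SYNC_ORDER))


def plan(inconsistent, deleting=False):
    """Return the inconsistent features in the order they should be handled."""
    order = DELETE_ORDER if deleting else SYNC_ORDER
    rank = {feature: i for i, feature in enumerate(order)}
    unknown = set(inconsistent) - set(rank)
    if unknown:
        raise ValueError(f"no ordering defined for: {sorted(unknown)}")
    return sorted(inconsistent, key=rank.__getitem__)
```

For example, `plan(["bgp", "vlan", "vxlan"])` handles vxlan first, while the deletion plan for the same features starts with vlan.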

Data Reconciliation: Data Inconsistency Elimination (3)


⚫ Initiate data restoration.
▫ If Status is displayed as Reconciled, data restoration is completed.

▫ If Status is displayed as Invalid inconsistent data and Failure Reason is displayed as Device re-attachment is performed, the active node link of the device has been switched and the inconsistent service data is invalid. In this case, perform inconsistency discovery and reconciliation again to resolve the issue.


• If the above issue cannot be resolved, contact Huawei technical support.



Three-Level Rollback
Network-wide rollback
• Network-wide rollback is used to resolve major faults on the entire network. For example, if network configurations are deleted due to changes, many services are interrupted. In this case, network-wide configurations can be rolled back to those before the changes or interruptions, enabling quick service recovery.
• Before changes, you can back up network-wide configurations on iMaster NCE-Fabric. When a problem occurs due to changes, the configurations can be quickly restored to the backup point, resolving major network faults.
• You can manually save data in real time or periodically on the GUI. You need to proactively back up data.

Tenant snapshot
• The tenant snapshot function is used to back up and restore network service configurations by tenant, and applies to multi-tenant services. Backup and restoration operations performed by a tenant do not affect the provisioning of other tenants' services, including backup and restoration of network service configurations by other tenants.
• The tenant snapshot function allows a tenant to set a backup point and save all its service configurations at the backup point. If needed, service configurations can then be restored to a specific snapshot point. Additionally, iMaster NCE-Fabric can compare the current configurations with the configurations at the snapshot point, or compare the configurations from two given snapshot points, and perform configuration rollback to eliminate differences.
• The tenant snapshot function supports manual backup and restoration as well as automatic and periodic backup.

Service-level rollback
• Service-level rollback helps quickly restore original network configurations to recover services when a network exception occurs due to a fine-grained single-point service provisioning failure.
• You do not need to manually back up data for service-level rollback, but you need to manually restore data.
• iMaster NCE-Fabric automatically backs up each service that is provisioned. When an exception occurs, iMaster NCE-Fabric can quickly restore the service to the status before the service was provisioned.


• iMaster NCE-Fabric provides three-level rollback, meeting the reliability requirements of different scenarios and ensuring quick service recovery. This feature covers 70% to 80% of routine change scenarios. For example, the fast rollback feature is available for single-point service provisioning exceptions and independent tenant services.
▫ Network-wide rollback features:
▪ iMaster NCE-Fabric saves the snapshots of the entire network,
including those of iMaster NCE-Fabric and its managed devices.
▪ You can manually save the snapshots in real time or periodically.
▪ During restoration, iMaster NCE-Fabric delivers commands to devices
to restore data. The devices restore specific configurations based on
specified snapshot point labels and do not need to be restarted.
▫ Tenant snapshot features:
▪ You can manually save the snapshots in real time or periodically.
▪ iMaster NCE-Fabric allocates a separate space to each tenant for backup so that operations of different tenants do not affect each other.
▪ Differences between rollback points can be previewed for further
examinations.
▫ Service-level rollback features:
▪ Service operations are automatically saved.
▪ Snapshots are automatically stored in mirroring mode.
▪ The linkage technology enables rollback of multiple operations to the
previous state.
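The tenant snapshot behavior (per-tenant isolation, comparing current configurations with a snapshot, comparing two snapshots, and restoring) can be sketched with a small in-memory store. This is a hypothetical illustration: `TenantSnapshots` and its methods are made-up names, and configurations are flattened into `{key: value}` dictionaries.

```python
import copy


class TenantSnapshots:
    """Hypothetical per-tenant snapshot store; tenants never share snapshot data."""

    def __init__(self):
        self._snaps = {}  # tenant -> {snapshot label: configuration dict}

    def save(self, tenant, label, config):
        # Deep copy so later edits to the live configuration do not alter the snapshot.
        self._snaps.setdefault(tenant, {})[label] = copy.deepcopy(config)

    def restore(self, tenant, label):
        return copy.deepcopy(self._snaps[tenant][label])

    @staticmethod
    def _diff(old, new):
        keys = sorted(old.keys() | new.keys())
        return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}

    def compare_with_current(self, tenant, label, current):
        """Differences between a snapshot and the current configuration."""
        return self._diff(self._snaps[tenant][label], current)

    def compare_snapshots(self, tenant, label_a, label_b):
        """Differences between two snapshot points of the same tenant."""
        snaps = self._snaps[tenant]
        return self._diff(snaps[label_a], snaps[label_b])
```

Keying every operation by tenant is what keeps one tenant's backup or restoration from touching another tenant's data in this sketch.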

Network-Wide Rollback (1)


⚫ When a network configuration error occurs in a single-user scenario, iMaster NCE-Fabric can quickly restore the network configuration to a time point at which the configuration was backed up. In this way, network services can be quickly restored in full mode, avoiding the heavy losses caused by time-consuming network configuration restoration. iMaster NCE-Fabric can back up and restore its network service configurations and the configurations of all managed CE devices.

⚫ Network-wide data backup
▫ Network-wide data backup includes the backup of the iMaster NCE-Fabric database and configuration backup of CE devices.

When Task Progress is 100%, the backup succeeds.


• If the task fails, click > to view details about the backup task and failure cause.
Then locate and rectify the fault based on the failure cause to back up the data
again.

Network-Wide Rollback (2)


⚫ Network-wide data restoration
▫ Network-wide data can be restored based on the existing backup points.
▫ When Task Progress is displayed as 100%, the restoration succeeds.
▫ If Status is displayed as Restore Failed, you can view details about the restoration task and failure cause.
◼ Database restoration failure: Locate and rectify the fault and perform the restoration task again.
◼ Device restoration failure: Select a device in the Device Name column and click Retry to restore the device that failed to be restored.


Tenant Snapshot Management (1)


⚫ If there are a large number of tenants on iMaster NCE-Fabric, tenant snapshot management can be used to restore the network service configurations of a single tenant to a certain time point. Tenant snapshot management supports the backup and restoration of tenant network configurations without affecting the delivery of network service configurations of other tenants on iMaster NCE-Fabric. For a single tenant, tenant snapshot management offers the following capabilities:
▫ Manual or automatic creation of tenant snapshots.
▫ Saving of snapshots to a remote server and importing of snapshots from a remote server.
▫ Display of differences between current tenant configurations and snapshots.
▫ Rollback based on snapshot files.
▫ Display of the rollback task status and historical records.

⚫ Automatic backup or snapshot creation:


Tenant Snapshot Management (2)


⚫ Restoring a snapshot rolls back the current tenant configuration to the backup configuration.


Service-Level Rollback (1)


⚫ When VPC design-state data is submitted, iMaster NCE-Fabric automatically generates a configuration change record. Based on the selected configuration change history, the service-level rollback function can quickly roll back service configurations that have been successfully delivered to network devices to the configurations that existed before the design-state data was submitted. This function is useful when an exception occurs after configurations are delivered to network devices and emergency rollback is required. The implementation of service-level rollback is as follows.

[Figure: GUI (intent change record, difference details, rollback execution, design-state configuration delivery) above the application layer, service-level rollback, the design state, and snapshots]

1. Design-state configuration delivery: When design-state configurations are submitted and delivered to network devices, the service-level rollback function records the configuration change data for subsequent rollback.
2. Intent change record: The service-level rollback function provides the change records and displays them on the UI of the intent management center.
3. Difference details: The service-level rollback function provides configuration intent differences at the logical layer and application layer and displays them on the UI of the intent management center.
4. Rollback execution: The service-level rollback function performs atomic rollback of configurations at the logical layer and application layer. Differing configurations at the application layer can be rolled back only after differing configurations at the logical layer are successfully rolled back. If differing configurations at the logical layer fail to be rolled back, service-level rollback fails.


• Currently, the following service-level rollback functions are provided:

▫ Check the intent change records.

▫ Check difference details in the intent change records.

▫ Select the intent change history and perform rollback.

▫ Check the rollback details after a successful rollback.

• Design state:

▫ Indicates the process of VPC service orchestration, simulation, and verification in the Service Simulation app. Services orchestrated in the design state are delivered to the design-state database but not to real devices. The data is submitted to the production-state database and delivered to the devices only after you click Submit.
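The split between design state and production state works like a two-stage commit. The Python sketch below is a hypothetical model and does not represent the product's data model. `IntentStore`, `orchestrate`, `submit`, and the `deliver` callback are illustrative names. Edits land only in the design-state store. Submitting copies the changed entries to the production-state store and passes them to the devices.

```python
from typing import Callable, Dict


class IntentStore:
    """Hypothetical two-stage store: design state vs. production state."""

    def __init__(self, deliver: Callable[[Dict[str, str]], None]) -> None:
        self.design: Dict[str, str] = {}
        self.production: Dict[str, str] = {}
        self._deliver = deliver  # stand-in for pushing config to real devices

    def orchestrate(self, key: str, value: str) -> None:
        # Design-state edits are stored only; nothing reaches the devices.
        self.design[key] = value

    def submit(self) -> Dict[str, str]:
        # On Submit, entries that differ from production are promoted and delivered.
        changes = {k: v for k, v in self.design.items()
                   if self.production.get(k) != v}
        self.production.update(changes)
        if changes:
            self._deliver(changes)
        return changes
```

Because simulation and verification read only `design`, a wrong orchestration can be corrected before any device configuration changes.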

Service-Level Rollback (2)


⚫ You can search for the corresponding change records, click in the Search column, and click Confirm to perform the rollback task.

[Figure: One-click rollback]


Troubleshooting
⚫ iMaster NCE-Fabric provides the intent-driven intelligent event function. This function supports device monitoring and application and service fault monitoring, and displays fault details together with rectification suggestions or plans. You can perform closed-loop fault management based on user intents. The intelligent event function helps users quickly locate and rectify faults, which shortens fault locating and troubleshooting time and enhances service continuity.

⚫ Functional architecture:
[Figure: Functional architecture of the intelligent event function. iMaster NCE-FabricInsight (fault detection, root cause analysis, event identification) reports issues to iMaster NCE-Fabric. iMaster NCE-Fabric components shown: autonomous driving web UI, FaultService, key assurance flow service, intent engine, link management, event management (management of events and contingency plan status), event handling, Runbook engine (matching, execution, and rollback of contingency plans), contingency plan management (impact analysis of contingency plans), NAE, OAM, and emergency plans. Feedback is provided on contingency plan actions or on monitoring results of key assurance flows.]

• Currently, the intelligent event function supports the following types of fault
events: device fault, application fault, and service fault. iMaster NCE-Fabric needs
to collaborate with iMaster NCE-FabricInsight to solve application and device
faults, while iMaster NCE-Fabric can solve service faults independently.

▫ Device fault: When a CE1800V, physical CE switch, or Huawei firewall is faulty, iMaster NCE-FabricInsight reports a fault event to iMaster NCE-Fabric. An example is a faulty fan module on a switch.

▫ Application fault: When a key assurance flow monitoring exception is detected, iMaster NCE-FabricInsight reports a fault event to iMaster NCE-Fabric.

▫ Service fault: A new host access link is set up due to incorrect interface
connection or server migration. In this case, the status of the existing host
access link becomes unknown. When detecting the unknown host access
link, iMaster NCE-Fabric generates a fault event.

• Note:

▫ Web UI: Web user interface

▫ FaultService: fault service

▫ Runbook engine: text workflow engine

▫ NAE: network automation engine

▫ OAM: operation, administration, and maintenance



Handling Process of Various Fault Events


Device fault
1. iMaster NCE-FabricInsight collects syslogs, device configurations, and device flow information to automatically detect faults and analyze the faults and their impacts.
2. iMaster NCE-FabricInsight sends fault details, root causes, and fault impacts to iMaster NCE-Fabric.
3. iMaster NCE-Fabric analyzes the fault information and provides suggestions as well as a fault rectification plan and its impacts.
4. iMaster NCE-Fabric delivers the rectification plan. After detecting that the fault is rectified, iMaster NCE-FabricInsight updates the event status.
• Fault rectification method:
▫ Notification: Notify the user of the fault.
▫ Suggestion: Provide rectification suggestions.
▫ Rectification plan delivery: Provide a rectification plan, which can be delivered and rolled back in one-click mode.

Application fault
1. iMaster NCE-Fabric sends the created key assurance flow task information to iMaster NCE-FabricInsight.
2. iMaster NCE-FabricInsight monitors the traffic status of a specified task. When a flow exception is detected, it sends the exception information to iMaster NCE-Fabric.
• Fault rectification method:
▫ Display monitoring details of key assurance flow exceptions.

Service fault
1. When detecting an unknown host access link, iMaster NCE-Fabric sends the fault information to the fault remediation module for closed-loop troubleshooting.
2. The fault remediation module analyzes the network configuration associated with the unknown link.
3. The logical switch configuration associated with the unknown link is fixed. After the rectification succeeds, the network configuration associated with the existing link is migrated to a new port.
4. After the unknown link is cleared, the status of the fault event is updated to Solved.
• Fault rectification method:
▫ Fix the network configuration associated with an unknown link.
▫ Clear the unknown link.

Case: Switch Routing Hardware Table Loss (Rectification Plan Delivery) (1)
• Case description: When the routing hardware table of a switch is lost, iMaster NCE-Fabric detects and rectifies the fault.
⚫ Click the number on the upper right of the intelligent twins on the iMaster NCE-Fabric GUI.

⚫ In the dialog box that is displayed, click Handle on the Switch Routing Hardware Table Loss tab page to view the
fault details.


• Prerequisites: iMaster NCE-Fabric has been connected to iMaster NCE-FabricInsight.

Case: Switch Routing Hardware Table Loss (Rectification Plan Delivery) (2)
• Case description: When the routing hardware table of a switch is lost, iMaster NCE-Fabric detects and rectifies the fault.
⚫ Check the solution and its impact analysis on the iMaster NCE-Fabric GUI shown below.

Solution 1: Based on the software table entries, deliver the entries that are inconsistent between the software and hardware tables to the hardware table to rectify the forwarding fault.

Click Solution 1, and check the impact analysis of the solution and the configuration to be delivered. If you want to use the rectification plan, click Deliver.


Case: Switch Routing Hardware Table Loss (Rectification Plan Delivery) (3)
• Case description: When the routing hardware table of a switch is lost, iMaster NCE-Fabric detects and rectifies the fault.
⚫ Click OK in the Are you sure you want to deliver the solution? dialog box.

⚫ After the fault is rectified, the event status changes to Solved. Click Close Event to close the event.


• If you need to roll back the rectification plan after it is successfully delivered, click
Roll Back on the Solution 1 tab page.
Contents

1. DCN O&M Challenges and CloudFabric Intelligent DCN O&M Solution

2. iMaster NCE-Fabric

3. iMaster NCE-FabricInsight
◼ Overview

▫ Network Visualization and Health Evaluation

▫ Fault Locating

▫ Change Assurance

iMaster NCE-FabricInsight
⚫ Based on the Huawei-developed big data analytics platform, iMaster NCE-FabricInsight receives data from network devices in telemetry mode and uses AI algorithms to analyze network data.
⚫ Main functions of iMaster NCE-FabricInsight include network health evaluation, telemetry, IP address visualization, intent verification, change comparison, resource management, and system management.
[Figure: iMaster NCE-FabricInsight architecture. O&M service apps: network health evaluation (multi-DC and multi-cloud network health evaluation, capacity/traffic risk prediction, unified health management of multi-vendor DCNs); minute-level troubleshooting ("1-3-5" troubleshooting for 75 types of typical faults, automatic root cause inference, one-click flow troubleshooting); key service assurance (service intent verification, data plane modeling, service intent management); IP address visualization, network change comparison, and network search. The apps sit on APIs of the intelligent analysis system, which consists of a big data analytics platform (Spark, Druid, Kafka, HDFS; fabric state database and flow state database) and an AI engine (machine learning algorithm library and machine learning framework). Network telemetry data is collected from the fabric.]

• The overall architecture of iMaster NCE-FabricInsight consists of three parts: network devices, the iMaster NCE-FabricInsight collector, and the iMaster NCE-FabricInsight analyzer.

▫ Network devices:

▪ Huawei CE switches (For details about the models and supported specifications, see the specification list of the corresponding version.)

▪ Devices report performance metrics such as interface traffic in telemetry mode based on Google Remote Procedure Call (gRPC). Devices connect to iMaster NCE-FabricInsight as gRPC clients. Users run commands to configure the telemetry function on the devices. The devices then proactively establish gRPC connections with the specified collector and send data to it. The current version supports the following sampling metrics: CPU and memory usage at the device and card levels; number of sent and received bytes, number of discarded sent and received packets, and number of sent and received error packets at the interface level; number of congested bytes at the queue level; and packet loss behavior data. For details about metrics and device models, see the product specification list.
[Figure: Data processing flow from devices to the analyzer, in stages: subscription, collection, distribution/buffering, data processing, analysis/AI computing, and storage. Devices send syslogs (user logs), telemetry data (device/user performance metrics), and SNMP data (device management) to the collection service (data acceptation). Kafka handles data distribution/buffering. Spark performs real-time data processing (Streaming) and offline data processing, and AI algorithms analyze the data. Druid/HDFS store raw data, aggregated data, and analytics data.]

After data subscription, the collection service module collects data within seconds. A high-throughput distributed message system buffers and distributes the collected data. Service modules analyze and compute the data based on AI algorithms and expert experience. They then save the processed data to a fast, column-oriented distributed data storage system. You can view the data and use the functions on the web pages.

• Note:

▫ Kafka: the messaging middleware that stores and distributes data reported by devices.

▫ Spark: a general-purpose parallel computing framework.

▫ Streaming: Spark Streaming, used for stream processing.

▫ Druid: a real-time analytics database, used as the backend for high-concurrency APIs that require fast aggregation.

▫ HDFS: Hadoop Distributed File System
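The collect, buffer, process, and store stages can be illustrated with a toy in-process pipeline. In this Python sketch, `queue.Queue` stands in for Kafka, a plain aggregation loop stands in for the Spark jobs, and a dictionary stands in for Druid/HDFS. The sample format and the `collect`/`process` names are assumptions made only for illustration.

```python
import queue
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# (device, interface, bandwidth usage in %) as pushed by a device
Sample = Tuple[str, str, float]


def collect(buffer: "queue.Queue[Sample]", samples: List[Sample]) -> None:
    """Collection stage: accept pushed samples and hand them to the buffer."""
    for sample in samples:
        buffer.put(sample)


def process(buffer: "queue.Queue[Sample]") -> Dict[Tuple[str, str], float]:
    """Processing stage: drain the buffer and average usage per interface."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    while True:
        try:
            device, interface, usage = buffer.get_nowait()
        except queue.Empty:
            break
        groups[(device, interface)].append(usage)
    return {key: mean(values) for key, values in groups.items()}


# Storage stage: a dict stands in for the column-oriented store.
storage: Dict[Tuple[str, str], float] = {}
```

The buffer decouples the collectors from the analysis jobs, so a burst of incoming samples does not block collection while processing catches up.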


Contents

1. DCN O&M Challenges and CloudFabric Intelligent DCN O&M Solution

2. iMaster NCE-Fabric

3. iMaster NCE-FabricInsight
▫ Overview
◼ Network Visualization and Health Evaluation

▫ Fault Locating

▫ Change Assurance


Visualized Analysis on Multi-Dimensional Data and Systematic Evaluation of Network-wide Health Risks
[Figure: Three stages. Data collection: big data analytics and network telemetry over traditional networks, SDN (network overlay), SDN (host overlay), multi-cloud networks, and multi-vendor devices (Vendor A, Vendor B), covering configuration data, network topology, network metrics, forwarding entries, service flows, network resources, logs & alarms, and network flow sampling. Network visualization: network KPI trend, VM migration, exception detection, network log, network traffic, resource map, resource status, and service forwarding path. Health evaluation: fault evaluation and risk assessment based on knowledge graph + AI-based learnware, AI-based network performance, service SLA profile, and health report.]
⚫ Data collection: Heterogeneous network data is subscribed to in real time based on telemetry, covering service flows, configurations, entries, and multi-dimensional metrics.
⚫ Network visualization: The real-time status and trends of networks are visualized, providing intuitive insights into network exceptions and changes in real time.
⚫ Health evaluation: Network-wide potential risks are evaluated based on AI, and warnings are generated to reduce the network fault rate.

• iMaster NCE-FabricInsight provides network O&M services featuring visualization, automation, and intelligence:

▫ Visualization: visible and clear

▪ The concept of "visible" consists of two aspects: observed objects and real-time observation. Observed objects include physical objects such as devices, interfaces, and links. Real-time observation supports perception of millisecond-level symptoms, for example, identifying microburst traffic congestion on the network.

▪ The concept of "clear" refers to observation accuracy. A large volume of data needs to be collected, and the data must be analyzed in real time.

▫ Automation: proactive analysis

▪ To detect network issues proactively, intelligently, and in a timely manner, the O&M system must be able to analyze massive data and identify abnormal events on the network. In addition, the O&M system needs to use machine learning algorithms to determine whether to generate issue models and recommend them to users.

Meeting Real-time Analysis Requirements with Telemetry Technology
• Data analysis: SNMP provides simple statistics collection with analysis and manual decision-making. Telemetry provides intelligent data analysis with automatic troubleshooting.
• Transport: SNMP carries unstructured data with low encoding and decoding efficiency. Telemetry uses GPB binary encoding and decoding with high transmission efficiency.
• Data collection: SNMP works in pull mode, a request-response mode with a large sampling interval. Telemetry works in push mode over gRPC: after a one-time data subscription, data is pushed continuously.
• Data generation: SNMP relies on a minute-level polling cycle (5/15 min), which fails to meet the service requirements of real-time management. Telemetry provides near-real-time (quasi-real-time) data acquisition.
The quasi-real-time data acquisition capability is the key to data analysis in intelligent network O&M.
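The following Python sketch shows why a 5-minute polling cycle cannot support real-time management. It averages a hypothetical per-second bandwidth-usage series over fixed windows, the way a poller would see it. The series, the 3-second burst, and the `window_averages` helper are assumptions for illustration. They do not model the SNMP or gRPC protocols themselves.

```python
from typing import List


def window_averages(series: List[float], window: int) -> List[float]:
    """Average a per-second series over consecutive fixed windows."""
    return [sum(series[i:i + window]) / window
            for i in range(0, len(series) - window + 1, window)]


# Five minutes at 10% usage with a 3-second microburst at 100%.
series = [10.0] * 300
series[120:123] = [100.0] * 3

five_min = window_averages(series, 300)  # one value: about 10.9%
one_min = window_averages(series, 60)    # the burst minute averages 14.5%
```

With this series, a single 5-minute sample reports about 10.9% usage, and even 1-minute windows only show 14.5%. The per-second data shows the link at 100%, which is the kind of microburst that only fine-grained sampling can reveal.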

Telemetry-Powered Proactive Monitoring and Real-Time Network Visualization (1)
⚫ Efficient data collection: Proactive quasi-real-time subscription based on gRPC, delivering high performance and efficiency.
⚫ Extensive data types: Data collection from eight dimensions + proactive management of common metrics.
⚫ Intuitive status: Intelligent exception detection based on dynamic baselines, intuitively displaying historical trends and facilitating network optimization.
[Figure: SNMP vs. telemetry, with GUI examples: "Identify devices with abnormal metrics." and "Set up a benchmark, compare baseline metric trends, and identify abnormal metrics."]

Telemetry-Powered Proactive Monitoring and Real-Time Network Visualization (2)
⚫ Real-time monitoring of key metrics from eight dimensions, gaining deep insights into network status. Measurement objects, metrics, and default intervals:
• Device: CPU usage/memory usage (1 min)
• Board: CPU usage/memory usage, and FIB/MAC address entry usage (1 min)
• Chip: rules usage, meters usage, counters usage, and slice/banks usage (1 min)
• Interface: number of received/sent packets, number of bytes, number of lost packets, number of error packets, number of broadcast packets, number of multicast packets, number of unicast packets, bandwidth usage, and number of ECN packets (1 min); ARP & ND attack source tracing
• Queue: buffer size (100 ms); number of sent/received PFC frames, number of PFC deadlock detection times, number of PFC deadlock recovery times, headroom buffer in use, and guaranteed buffer in use (1 min)
• Optical link: transmitted/received optical power, current, voltage, and temperature (1 min)
• Packet loss behavior: forwarding packet discarding and congestion-triggered packet loss (1 min)
• Entry: details of FIB/ARP/ND entries (dynamic subscription)

Dynamic Baseline and Exception Detection (1)
Processing flow: dataset & preprocessing → dynamic baseline construction → exception detection.
Exception scenarios:
• Value stability metric scenario: If values at sampling points are out of the valid range, they are called outliers.
• Differentiated stability metric scenario: If salient differences exist between the time points before and after a sampling point, this is called a link (sequential) comparison exception.
• Period stability metric scenario: If salient differences exist between the sampling interval series and the overall trend, this is called a parallel (same-period) comparison exception.
Dataset & preprocessing:
• Input: time series data of metrics (value, time)
• Functions: automatic identification of collection frequencies; automatic filling of missing data; noise reduction of abnormal data; special adaptation: extra data processing during holidays
• Output: data features (value stability or period stability); metric collection interval
Dynamic baseline construction:
• Functions: period stability metric algorithm: time series decomposition; value stability metric algorithm: Gaussian regression; baseline boundary construction based on algorithms; baseline sensitivity adjustment; prediction of the upper and lower baseline boundaries for the next collection interval
Exception detection:
• Functions: number of exceptions; suppression and combination of problems; problem notification
• Output: exceptions
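A periodic baseline can be sketched by grouping history by its position within the period, for example hour of day, and placing the boundaries at mean ± k·σ for each slot. This Python sketch is a deliberately simplified substitute for the time series decomposition and Gaussian regression named above. `build_baseline`, the slot encoding, and the default sensitivity of 3 are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple


def build_baseline(history: Iterable[Tuple[int, float]],
                   sensitivity: float = 3.0) -> Dict[int, Tuple[float, float]]:
    """Per-slot baseline boundaries: mean -/+ sensitivity * population std dev.

    history holds (slot, value) pairs, where slot is the position within the
    period (for a daily period, hour of day 0-23). A larger sensitivity widens
    the band and produces fewer exceptions.
    """
    by_slot: Dict[int, List[float]] = defaultdict(list)
    for slot, value in history:
        by_slot[slot].append(value)
    baseline: Dict[int, Tuple[float, float]] = {}
    for slot, values in by_slot.items():
        mu, sigma = mean(values), pstdev(values)
        baseline[slot] = (mu - sensitivity * sigma, mu + sensitivity * sigma)
    return baseline
```

Looking up `baseline[next_slot]` gives the predicted lower and upper boundaries for the next collection interval. The `sensitivity` parameter plays the role of baseline sensitivity adjustment.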

Dynamic Baseline and Exception Detection (2)


⚫ When a baseline exception occurs on a device, you can view associated flow information.
▫ For static threshold detection, you can adjust the static threshold and number of repetitions.
▫ For dynamic baseline exception detection, you can adjust the exception dynamic baseline, baseline offset, number of repetitions, and detection direction.

• Dynamic baseline exception detection:

▫ Exception Watermark: applies to ratio-type metrics (such as CPU usage and memory usage). Dynamic baseline exception detection is performed only when the metric value exceeds the exception watermark.

▫ Baseline Offset: applies to the dynamic baseline and is used to adjust the dynamic baseline detection range.

▫ Repetitive Times: indicates the number of consecutive times that the dynamic baseline or static threshold is exceeded.

▫ Detection Direction: indicates the direction of dynamic detection. The metric value can be checked against the upper threshold only, against the lower threshold only, or against both the upper and lower thresholds. The trend chart is displayed in green when the threshold is not exceeded and in red when the static threshold or the dynamic baseline is exceeded.

▫ Issue Generation upon Threshold Exceeding: If this toggle is switched on and the static threshold or dynamic baseline is exceeded, a pending issue is generated on the Health page.

• When a baseline exception occurs on an interface, you can view associated flow
information.

▫ You can adjust the static threshold and number of repetitions. You can also
adjust the sensitivity of dynamic baseline detection based on detected
exceptions.
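The detection parameters above can be combined as in the following Python sketch. It is a hypothetical interpretation, not the product's algorithm. The function and parameter names are illustrative. An exception is raised once a value falls outside the offset-widened baseline for `repetitions` consecutive samples in the configured direction. When a watermark is set, values at or below it are never flagged.

```python
from enum import Enum
from typing import List, Optional, Sequence, Tuple


class Direction(Enum):
    UPPER = "upper"
    LOWER = "lower"
    BOTH = "both"


def detect(values: Sequence[float], bounds: Sequence[Tuple[float, float]],
           repetitions: int = 3, offset: float = 0.0,
           direction: Direction = Direction.BOTH,
           watermark: Optional[float] = None) -> List[int]:
    """Return the sample indices at which an exception is raised."""
    alarms: List[int] = []
    streak = 0
    for i, (value, (lower, upper)) in enumerate(zip(values, bounds)):
        # Baseline offset widens (or, if negative, narrows) the detection range.
        lower, upper = lower - offset, upper + offset
        above = value > upper and direction in (Direction.UPPER, Direction.BOTH)
        below = value < lower and direction in (Direction.LOWER, Direction.BOTH)
        if watermark is not None and value <= watermark:
            above = below = False  # ratio metric still below the watermark
        streak = streak + 1 if (above or below) else 0
        if streak == repetitions:  # report once per run of violations
            alarms.append(i)
    return alarms
```

Requiring consecutive violations suppresses single-sample spikes, and the watermark keeps low-utilization fluctuations, such as CPU moving between 5% and 15%, from producing issues.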

Interface Traffic Prediction for Early Warning of Health Risks
Scenario: How can you determine whether link traffic will exceed the threshold, to provide a basis for capacity expansion?
⚫ How can you evaluate the traffic during peak hours this year and make plans in advance?
⚫ If the DCI link bandwidth usage increased from 20% to 65% within two weeks, when is capacity expansion needed?
Challenges to traditional O&M:
⚫ The rules of service traffic growth cannot be identified manually, which makes it hard to choose the capacity expansion time that minimizes cost.
⚫ Capacity bottlenecks exist. Faults are alarmed passively and cannot be predicted.
Solution: The interface traffic trend in the next three months is predicted, with an algorithm accuracy of 90%.
⚫ iMaster NCE-FabricInsight analyzes over 20 key factors from three dimensions, namely, historical time, space topology, and service attributes.
⚫ Through the learning and inference of a deep neural network algorithm, iMaster NCE-FabricInsight predicts whether the interface traffic will exceed the threshold in the next three months.
[Figure: Servers 1 to 4 connect through leaf, spine, and border devices over 100G links, with link usage rising from 45% to 60%. Flow: key factors → model training → traffic prediction. Chart legend: historical data, predicted data, warning threshold.]
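As a rough first answer to the DCI question, a worked calculation can extrapolate linearly. This assumes the growth stays linear and uses a hypothetical warning threshold of 80%, which is not given in the text. Real traffic growth is rarely linear, which is why the product relies on a model trained on many factors instead.

```latex
r = \frac{65\% - 20\%}{14\ \text{days}} = \frac{45}{14}\ \%/\text{day} \approx 3.2\ \%/\text{day},
\qquad
t_{80\%} = \frac{80\% - 65\%}{r} = \frac{15 \times 14}{45}\ \text{days} \approx 4.7\ \text{days}
```

Under these assumptions, the link would cross the 80% threshold in under five days, so expansion planning would have to start immediately.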

Interface Traffic Prediction


⚫ You can create prediction tasks to predict the inbound and outbound bandwidth usage of interfaces. You can also view the
prediction trend, threshold-crossing statistics (based on the static threshold configured on the Telemetry page), and deviation rate
statistics.

⚫ Every Sunday, AI predicts the inbound and outbound bandwidth usage trends of interfaces for the next 12 weeks based on the historical data of the last 66 days. If the data of the last 66 days is incomplete, prediction reliability decreases. If the number of days for which historical data is stored is less than the threshold, no prediction result is generated.
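The weekly prediction job can be approximated with an ordinary least-squares line over the 66-day window, extrapolated 12 weeks ahead. This Python sketch swaps the product's AI model for plain linear regression to show the windowing and data-completeness rules. `MIN_DAYS = 30` is a hypothetical stand-in for the unspecified threshold, and `forecast` is an illustrative name.

```python
from typing import List, Optional, Sequence

HISTORY_DAYS = 66        # history window stated in the text
HORIZON_DAYS = 12 * 7    # predict the next 12 weeks
MIN_DAYS = 30            # hypothetical threshold; the real value is not given


def forecast(daily_usage: Sequence[Optional[float]]) -> Optional[List[float]]:
    """Fit a straight line to the last 66 days and extrapolate 12 weeks.

    daily_usage holds one bandwidth-usage value per day, oldest first; None
    marks a missing day. Returns None when too few days have data.
    """
    window = list(daily_usage)[-HISTORY_DAYS:]
    points = [(x, y) for x, y in enumerate(window) if y is not None]
    if len(points) < MIN_DAYS:
        return None  # not enough stored history: no prediction result
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    start = len(window)  # first day after the history window
    return [intercept + slope * (start + d) for d in range(HORIZON_DAYS)]
```

Missing days simply drop out of the fit. This shows why incomplete history lowers reliability: fewer points constrain the line, while the returned forecast looks just as precise.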

IP Address Visualization: Overview


⚫ The Overview tab page collects the IP address statistics and ranking of the IP address statistics on a network from multiple
dimensions, and clearly displays the statistics.

⚫ The Overview tab page displays statistics on devices and hosts in the current system. These include the host access mode, top 10 switches to which hosts are connected, top 10 gateways to which hosts are connected, top 10 fabrics by IP address usage, top 10 subnets by IP address usage, online IP address statistics and change trend, and invalid IP address statistics.
⚫ Top 10 subnets by IP address usage: These subnets are displayed to facilitate the allocation and planning of IP address resources.
⚫ Top 10 devices connected to online hosts: View the top 10 switches accessed by VMs. Also view the top 10 gateways accessed by VMs to analyze whether VMs are evenly distributed.

• Use the analyzer of V100R021 as an example. Choose Toolbox > IP 360 to access
the IP address visualization page.
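The "top 10 subnets by IP address usage" ranking can be reproduced with Python's standard `ipaddress` module. The sketch assumes that usage means online addresses divided by usable host addresses (excluding the network and broadcast addresses) and that subnets do not overlap. The product's exact formula is not stated, and `top_subnets_by_usage` is a hypothetical name.

```python
import ipaddress
from typing import Iterable, List, Tuple


def top_subnets_by_usage(subnets: Iterable[str], online_ips: Iterable[str],
                         top_n: int = 10) -> List[Tuple[str, float]]:
    """Rank subnets by the share of usable addresses that are online."""
    nets = [ipaddress.ip_network(s) for s in subnets]
    used: dict = {net: 0 for net in nets}
    for ip in map(ipaddress.ip_address, online_ips):
        for net in nets:
            if ip in net:  # assumes non-overlapping subnets
                used[net] += 1
                break
    ranking = []
    for net in nets:
        # Exclude network and broadcast addresses except for /31 and /32.
        usable = net.num_addresses - 2 if net.num_addresses > 2 else net.num_addresses
        ranking.append((str(net), used[net] / usable))
    ranking.sort(key=lambda item: item[1], reverse=True)
    return ranking[:top_n]
```

A subnet near 100% usage is a candidate for expansion, while one near 0% may be reclaimed, which is the planning decision this view supports.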

IP Address Visualization: IPv4 Distribution


⚫ The IPv4 Distribution tab page displays IP address usage in the network-wide or subnet view. The IP address status can be: Online,
Transient, Offline, Exclude, Unknown, Selected, and Invalid IP Address ID. The page also displays top 10 and bottom 10 subnets
ranked by IP address usage in the current view.

When you select an IP address, the IP address overview is displayed on the right. It includes the address space, number of online IP addresses, number of offline IP addresses, number of unknown IP addresses, home fabric, as well as number of online, offline, and migrated IP addresses.

IP Address Visualization: IP Address


⚫ The IP Address tab page displays IP address information about devices and VMs, including the IP address, name, MAC address,
fabric, VLAN, access device, access IP address, gateway interface, first discovery time, latest discovery time, active status, and
discovery mode. IP addresses that frequently migrate are displayed on the top of the list.

⚫ In addition, you can filter items by the IP address, MAC address, fabric, access device, access interface, virtual routing and forwarding
(VRF), VLAN ID, active status, and access type of a VM.


IP Address Visualization: Full-Lifecycle VM Management (1)

⚫ Scenario:
▫ During dynamic VM migration, the network team cannot determine whether the location of the switch to which the VM is connected has changed. As a result, when the VM fails to access the network after dynamic migration, the fault is difficult to locate.
⚫ Solution:
▫ iMaster NCE-FabricInsight uses telemetry to collect ARP update information (including added, deleted, and modified entries) from network-wide devices. It supports full-lifecycle visualization of VM login, logout, and migration records based on fabric information.
[Figure: A VM migrates between leaf switches (Leaf-1 to Leaf-4) under the virtual machine manager (VMM), and the switches report ARP update messages through telemetry.]

IP Address Visualization: Full-Lifecycle VM Management (2)


⚫ On the IP Address tab page, click View Historical IP Address Access to view the historical information about IP addresses. This enables full-lifecycle VM management, with details about VM login, logout, and migration records. Example records (newest first):

• XX-06-28 13:20:12, 169.254.2.1, Offline (VM offline): Switch: Serverleaf_193.162; Access interface: 10GE1/0/20; Gateway: 169.254.2.100
• XX-06-28 01:11:40, 169.254.2.1, Migration (VM migration): Switch: Serverleaf_193.162; Access interface: 10GE1/0/20; Gateway: 169.254.2.100
• XX-06-27 06:09:52, 169.254.2.1, Online (VM online): Switch: Serverleaf_193.162; Access interface: 10GE1/0/23; Gateway: 169.254.2.100
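The online, migration, and offline records above can be derived from a stream of ARP updates. In this Python sketch, the first sighting of an IP address is treated as Online, a change of access switch or interface as Migration, and a deleted entry as Offline. The `ArpUpdate` fields and the classification rules are assumptions, not the product's telemetry schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ArpUpdate:
    time: str
    op: str         # "add", "modify", or "delete" (hypothetical encoding)
    ip: str
    switch: str
    interface: str


def lifecycle(updates: List[ArpUpdate]) -> List[Tuple[str, str, str]]:
    """Turn ARP updates into (time, ip, event) records."""
    location: Dict[str, Tuple[str, str]] = {}
    records: List[Tuple[str, str, str]] = []
    for u in updates:
        here = (u.switch, u.interface)
        if u.op == "delete":
            if location.pop(u.ip, None) is not None:
                records.append((u.time, u.ip, "Offline"))
        elif u.ip not in location:
            location[u.ip] = here
            records.append((u.time, u.ip, "Online"))
        elif location[u.ip] != here:
            # Same IP seen at a new access point: the VM has migrated.
            location[u.ip] = here
            records.append((u.time, u.ip, "Migration"))
    return records
```

Fed with updates that mirror the example records (online at 10GE1/0/23, then seen at 10GE1/0/20, then deleted), the sketch yields Online, Migration, and Offline in that order.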

Full Log Analysis Principles


⚫ The log module of the system software logs events that occur during system operation. Logs provide reference information for system diagnosis and maintenance, and help you check the device running status, analyze network conditions, and locate faults. Logs are classified into eight severity levels, each identified by a number. A smaller number indicates a higher severity. The log levels are defined as follows.

• Level 0, Emergencies: A fault that makes the device unable to run normally unless it is restarted. For example, the device restarts due to a program exception or an error in memory usage.
• Level 1, Alert: A major device fault that requires an immediate solution. For example, the device memory usage reaches the upper limit.
• Level 2, Critical: A fault that needs to be analyzed and processed. For example, the memory usage of the device falls below the lower limit, or BFD detects that a device is unreachable.
• Level 3, Error: An incorrect operation or service processing exception that does not affect services but needs to be analyzed. For example, users enter incorrect commands or passwords, or error protocol packets are received.
• Level 4, Warning: An exception that occurs while a device is operating and requires attention because it may cause service processing faults. For example, a routing process is disabled, BFD detects packet loss, or error protocol packets are detected.
• Level 5, Notification: A key operation that is performed to ensure normal operations of the device, such as an interface shutdown, neighbor discovery (ND), or a status change of the protocol state machine.
• Level 6, Informational: A common operation that is performed to ensure normal operations of the device. For example, a display command is run.
• Level 7, Debugging: Common information that is generated during normal operations of the device and requires no attention.

• iMaster NCE-FabricInsight monitors all logs from level 0 to level 4, with statistics
collected in different dimensions, such as the device name, IP address, module,
severity level, and type, so as to quickly master the distribution of abnormal logs
on the network.

• The system analyzes and displays fault logs on devices and allows you to filter
logs by the device name, device IP address, module, severity, type, and details.
Log severities include Emergencies, Alert, Critical, Error, and Warning.
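The filtering and multi-dimensional statistics described above can be sketched in a few lines of Python. The sketch keeps logs at levels 0 to 4 and counts them by device, module, and severity name. `LogEntry` and `exception_log_stats` are hypothetical names, and real syslog parsing is out of scope.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

# Severity names indexed by level (0 = most severe).
SEVERITY = ["Emergencies", "Alert", "Critical", "Error", "Warning",
            "Notification", "Informational", "Debugging"]


@dataclass
class LogEntry:
    device: str
    module: str
    level: int  # 0 (most severe) to 7


def exception_log_stats(logs: List[LogEntry],
                        max_level: int = 4) -> Dict[str, Counter]:
    """Keep levels 0..max_level and count them per device, module, and severity."""
    kept = [entry for entry in logs if entry.level <= max_level]
    return {
        "device": Counter(entry.device for entry in kept),
        "module": Counter(entry.module for entry in kept),
        "severity": Counter(SEVERITY[entry.level] for entry in kept),
    }
```

Because a smaller number means higher severity, the filter uses `<=`. Levels 5 to 7 (Notification, Informational, Debugging) describe normal operation and are dropped.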

Full Network Log Visualization Enables Intelligent Analysis on Abrupt and Occasional Exceptions
⚫ Intelligent identification of abrupt log changes and exceptions with proactive warning:
▫ Intelligent exception identification: Detect abrupt changes of network-wide logs based on machine learning and provide warnings in time.
▫ Occasional/New log analysis: Count the abrupt changes of logs and the type, module, level, and quantity of new logs to quickly identify key check points.
⚫ Network-wide log event visualization: Display the trend, distribution statistics, and details of logs from level 0 to level 4 in multiple dimensions to present intuitive insights into network-wide log events.
[Figure: Abnormal increase of exception logs. View the trend chart of the number of exception logs and the distribution of logs by severity in each time period.]

• Application scenario:
▫ iMaster NCE-FabricInsight identifies exception logs that increase sharply on
a network. By performing dynamic baseline exception detection,
compressing logs, and comparing logs generated before and after
exceptions, iMaster NCE-FabricInsight helps O&M personnel to quickly
identify root causes of exceptions.
• Exception identification principles:
▫ iMaster NCE-FabricInsight checks whether the number of exception logs on
the entire network increases sharply based on the dynamic baseline.
▫ It then analyzes the logs that increase sharply by log type and frequency to
identify log distribution and check whether there are occasionally generated
logs.
▫ iMaster NCE-FabricInsight performs multi-dimensional clustering analysis
on the analysis result and automatically generates an issue, prompting
users to solve the issue in a timely manner.
• View the trend of the number of exception logs and details about exception logs:
The trend chart displays the trend of exception logs in the current time window,
top 10 devices and features by the number of exception logs, and log distribution
by severity.
▫ Move the pointer to a time period in the trend chart and view data in the
time period.
▫ Move the pointer to a device, feature, or log severity to view the
corresponding statistics.
▫ Click Top 10 Devices, Top 10 Features, or Logs by Severity to display the
corresponding exception log statistics and log list.
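The dynamic-baseline check described above can be illustrated with a minimal sketch. It assumes a simple rolling baseline: the mean plus k standard deviations over the previous few time buckets. These notes do not describe the machine learning model that iMaster NCE-FabricInsight actually uses. The function name `detect_log_surges` and the `window`, `k`, and `min_std` parameters are hypothetical.

```python
from statistics import mean, pstdev


def detect_log_surges(counts, window=6, k=3.0, min_std=1.0):
    """Return indexes of time buckets whose exception-log count exceeds a
    rolling baseline of mean + k * std over the previous `window` buckets.

    Hypothetical model for illustration only; window, k, and min_std are
    not FabricInsight parameters.
    """
    surges = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        # Floor the deviation so a very flat history does not flag tiny changes.
        std = max(pstdev(history), min_std)
        upper_bound = mean(history) + k * std
        if counts[i] > upper_bound:
            surges.append(i)
    return surges


# Exception logs per time bucket; bucket 8 shows an abrupt increase.
counts = [12, 10, 11, 13, 12, 11, 12, 10, 95, 14]
print(detect_log_surges(counts))  # [8]
```

The principles above also call for clustering the flagged logs by type and frequency. This sketch omits that step.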
Network KPI Visualization | Network IP Address Visualization | Network Log Visualization | Network Traffic Visualization | Health Risk Assessment

Network Traffic Analysis Overview

NetStream flow analysis: network traffic composition analysis based on traffic sampling
Analyze network traffic, traffic trend statistics, and traffic characteristics from multiple dimensions based on NetStream traffic sampling to identify abnormal network traffic and allocate resources properly.

ERSPAN (TCP): connectivity fault analysis based on the association between service flows and the network
Implement one-click troubleshooting of connectivity issues based on correlation analysis between TCP services and networks, through flow path visualization, hop-by-hop latency awareness (feature packets), and abnormal traffic analysis.

Edge intelligence: SLA detection based on TCP, UDP, and multicast flows
Monitor major services by analyzing the connectivity and packet loss/latency of specified services, and quickly locate fault points when poor-QoE issues such as packet loss occur.


• Note:

▫ ERSPAN: Encapsulated Remote Switched Port Analyzer



NetStream Packet Statistics Collection Principle


⚫ A service flow is a unidirectional sequence of data packets transmitted from a source IP address to a destination IP address. All packets in a service flow share the same attributes: source IP address, source port, destination IP address, destination port, IP protocol, and inbound and outbound interfaces. When a device receives the first packet of a flow, it initializes a flow entry. Every subsequent packet that matches the flow's attributes is added to the flow's byte count and packet count, and the flow information is exported over UDP for analysis.

Flow Packet-1 Packet-2 Packet-3 Packet-4 Packet-5 Packet-6 …… Packet-N

Source IP address: identifies the traffic source.


Destination IP address: identifies the traffic direction.
Source port: identifies the traffic source.
Destination port: identifies applications used by traffic.
Layer 3 protocol: identifies protocols used by traffic.
Input interface: identifies the traffic distribution of network devices.
Other information about IP packets: ...
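The flow definition above can be sketched as a dictionary keyed by the flow attributes. This is a minimal illustration, not the device implementation. The packet dictionaries and the names `FlowKey` and `aggregate_flows` are hypothetical, and the sketch does not show UDP export to the analyzer.

```python
from collections import defaultdict, namedtuple

# Flow key built from the attributes listed above (hypothetical field names).
FlowKey = namedtuple(
    "FlowKey", "src_ip src_port dst_ip dst_port proto in_if out_if"
)


def aggregate_flows(packets):
    """Group packets into unidirectional flows and count packets and bytes.

    Each packet is a dict holding the FlowKey fields plus 'length' in bytes.
    """
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        key = FlowKey(*(pkt[field] for field in FlowKey._fields))
        # The first matching packet creates the entry; later ones update it.
        entry = flows[key]
        entry["packets"] += 1
        entry["bytes"] += pkt["length"]
    return dict(flows)


pkt = {"src_ip": "10.1.1.1", "src_port": 40000, "dst_ip": "10.2.2.2",
       "dst_port": 443, "proto": 6, "in_if": "10GE1/0/1",
       "out_if": "10GE1/0/2", "length": 1500}
flows = aggregate_flows([pkt, dict(pkt, length=500),
                         dict(pkt, src_port=40001, length=60)])
for key, counters in flows.items():
    print(key.src_port, counters)
```

Changing any key attribute (here, the source port) starts a separate flow.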


• Packet sampling is enabled on an interface by running the netstream sampler random-packets packet-interval { inbound | outbound } command.

• With this command, one packet is sampled at random from each group of packet-interval packets (value range: 1–65535). For example, if the interval is 100 packets, one random packet is sampled from every 100 packets.
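The 1-in-N random sampling behavior described above can be sketched as follows. This is an assumption-based model of the statistics only, because on a real device sampling happens in the forwarding path. The helpers `sample_random_packets` and `estimate_total_bytes` are hypothetical, and the sketch ignores any partial block at the end of the input.

```python
import random


def sample_random_packets(packets, interval, rng=None):
    """Pick one packet at random from each complete block of `interval`
    packets (1-in-N random sampling; interval range 1-65535)."""
    if not 1 <= interval <= 65535:
        raise ValueError("packet interval must be 1-65535")
    rng = rng or random.Random()
    sampled = []
    # Step through complete blocks only; a trailing partial block is skipped.
    for start in range(0, len(packets) - interval + 1, interval):
        block = packets[start:start + interval]
        sampled.append(rng.choice(block))
    return sampled


def estimate_total_bytes(sampled_lengths, interval):
    """Scale sampled byte counts by the interval to estimate the total."""
    return sum(sampled_lengths) * interval


picked = sample_random_packets(list(range(1000)), 100, random.Random(7))
print(len(picked))  # 10 samples, one from each block of 100 packets
```

Multiplying sampled counts by the interval is the usual way to estimate totals from 1-in-N samples. The estimate becomes more accurate as the number of samples grows.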

Refined Network Traffic Analysis Based on NetStream, Gaining Insights into Network Traffic Composition

Figure: NetStream V9 data is sent to the analyzer for traffic analysis: traffic cleaning, aggregation, and storage; traffic analysis; result display.

⚫ User-defined application visualization, supporting traffic analysis for 500 user-defined applications.
⚫ Multi-dimensional traffic analysis and drill-down correlation analysis, revealing the detailed traffic distribution and trend so that resources can be allocated properly and network capacity expansion points can be identified.
⚫ Customization of IP groups and port groups, and overall analysis of subnet, domain name, and private line traffic, identifying traffic proportions and analyzing abnormal traffic and interface usage to quickly locate network exceptions.


Network Traffic Analysis: Device Traffic


⚫ iMaster NCE-FabricInsight provides a traffic statistics list by device. You can click a row to view the traffic trend of a
device and multi-dimensional analysis results of traffic characteristics.

View the traffic trend of a device.

View traffic characteristics of the device in multiple dimensions.


Network Traffic Analysis: Interface Traffic


⚫ iMaster NCE-FabricInsight provides a traffic statistics list by interface. By default, interfaces are sorted in descending
order of traffic volume. You can click a row to view the traffic trend of the current interface and the main traffic
components of the interface from multiple dimensions.

View the traffic trend of an interface.

View traffic characteristics of the interface in multiple dimensions.


Network Traffic Analysis: Application Traffic (1)


⚫ iMaster NCE-FabricInsight provides a traffic statistics list by application.


Network Traffic Analysis: Application Traffic (2)


⚫ You can click a row to view the traffic trend of an application and multi-dimensional traffic characteristic analysis
results.
View the traffic trend of an application.

View traffic characteristics of the application in multiple dimensions.


Network Traffic Analysis: Host Traffic


⚫ iMaster NCE-FabricInsight provides a traffic statistics list by host. You can click a row to view the traffic trend of a
host and multi-dimensional traffic characteristic analysis results.

View the traffic trend of a host.

View traffic characteristics of the host in multiple dimensions.


Network Traffic Analysis: Session Traffic


⚫ iMaster NCE-FabricInsight provides a traffic statistics list by session. You can click a row to view the traffic trend of
a session and multi-dimensional traffic characteristic analysis results.

View the traffic trend of a session.

View traffic characteristics of the session in multiple dimensions.
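The per-device, per-interface, per-application, per-host, and per-session lists above all follow the same pattern: aggregate traffic by one dimension, then sort in descending order of volume. The following is a minimal sketch. It assumes flow records are plain dictionaries with a `bytes` field; both the record format and the helper name `top_n` are hypothetical.

```python
from collections import Counter


def top_n(records, dimension, n=10):
    """Sum bytes per value of `dimension` (for example 'device',
    'interface', 'application', 'host', or 'session') and return the n
    largest entries in descending order of traffic volume."""
    totals = Counter()
    for record in records:
        totals[record[dimension]] += record["bytes"]
    return totals.most_common(n)


records = [
    {"interface": "10GE1/0/1", "bytes": 500},
    {"interface": "10GE1/0/2", "bytes": 900},
    {"interface": "10GE1/0/1", "bytes": 600},
]
print(top_n(records, "interface"))  # [('10GE1/0/1', 1100), ('10GE1/0/2', 900)]
```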


ERSPAN Flow Analysis: TCP Control Packet Collection Principle

Figure: A TCP client and server perform a three-way handshake for connection establishment (SYN; SYN, ACK; ACK) and a four-way handshake for disconnection (FIN, ACK; ACK; FIN, ACK; ACK). The SYN packet is mirrored through ERSPAN at the leaf and spine switches along its path (hops 1, 2, and 3).

TCP flows on the network can be collected through ERSPAN mirroring of TCP feature packets (SYN, FIN, and RST packets).


• iMaster NCE-FabricInsight can obtain the following information about a TCP flow:

▫ Packet forwarding path information.

▫ TCP start and end time.

▫ Transmitted bytes (FIN sequence number minus SYN sequence number).

▫ SYN path latency and FIN path latency.

▫ Exceptions: latency > 1 ms, TCP flag exception (RST), TCP retransmission, TTL < 3, and so on.
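The following minimal sketch shows two of the calculations listed above. It assumes the analyzer has the SYN and FIN sequence numbers for one direction of a flow. The SYN itself consumes one sequence number, so the raw FIN-minus-SYN difference is one larger than the payload byte count. Sequence numbers also wrap at 2^32. The helper names and the flag and retransmission inputs are hypothetical. The thresholds (1 ms and TTL 3) come from the list above.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def payload_bytes(syn_seq, fin_seq):
    """Estimate payload bytes sent in one direction from SYN/FIN sequence
    numbers. The SYN consumes one sequence number, so it is subtracted."""
    return (fin_seq - syn_seq - 1) % SEQ_MOD


def flow_exceptions(latency_ms, flags, retransmitted, ttl):
    """Return the exception conditions from the list above that apply."""
    issues = []
    if latency_ms > 1:
        issues.append("latency > 1 ms")
    if "RST" in flags:
        issues.append("TCP flag exception (RST)")
    if retransmitted:
        issues.append("TCP retransmission")
    if ttl < 3:
        issues.append("TTL < 3")
    return issues


print(payload_bytes(1000, 1501))        # 500
print(payload_bytes(2**32 - 10, 490))   # 499 (sequence number wrapped)
print(flow_exceptions(2.5, {"SYN", "RST"}, False, 64))
```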

ERSPAN Flow Visualization: TCP Flow Analysis


⚫ The Dashboard page provides top N session statistics from multiple dimensions. The session visualization and statistics analysis pages display the trends of network-wide TCP connection setups and connection setup failures, as well as network-wide TCP connection setup statistics.
Statistics on network-wide TCP connection setup and connection setup failure

Network-wide TCP connection setup statistics


• Statistics on network-wide TCP connection setup and connection setup failure: This portlet displays the trend chart of the number of connection setups and the number of connection setup failures. By default, the system filters data by the "session status = connection setup failure" condition.

• Network-wide connection setup statistics: This portlet displays network-wide connection setup statistics. You can switch the statistics between the 2-tuple (source IP address and destination IP address) and the 3-tuple (source IP address, destination IP address, and destination port). You can click a row to view details about connection setup failures. These details include the ratio of SYN and SYN-ACK connection setup failures, the number of connection setup failures, a statistics chart of the number of connection setup failures, and a trend chart of the connection setup failure rate.
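The switch between 2-tuple and 3-tuple statistics described above amounts to changing the aggregation key. The following is a minimal sketch. It assumes each session record carries `src_ip`, `dst_ip`, `dst_port`, and a boolean `established` field. These field names and the `setup_stats` helper are hypothetical.

```python
from collections import defaultdict


def setup_stats(sessions, three_tuple=False):
    """Aggregate TCP connection setup attempts by 2-tuple (src, dst) or
    3-tuple (src, dst, dst_port) and compute the setup failure rate."""
    stats = defaultdict(lambda: {"attempts": 0, "failures": 0})
    for s in sessions:
        key = (s["src_ip"], s["dst_ip"])
        if three_tuple:
            key += (s["dst_port"],)
        stats[key]["attempts"] += 1
        if not s["established"]:
            stats[key]["failures"] += 1
    for entry in stats.values():
        entry["failure_rate"] = entry["failures"] / entry["attempts"]
    return dict(stats)


sessions = [
    {"src_ip": "10.1.1.1", "dst_ip": "10.2.2.2", "dst_port": 80, "established": True},
    {"src_ip": "10.1.1.1", "dst_ip": "10.2.2.2", "dst_port": 80, "established": False},
    {"src_ip": "10.1.1.1", "dst_ip": "10.2.2.2", "dst_port": 443, "established": True},
]
print(setup_stats(sessions))
print(setup_stats(sessions, three_tuple=True))
```

The 3-tuple view shows that all failures in this example are on port 80. The 2-tuple view hides that detail.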

Edge Intelligence Solution

Figure: On a CE switch, the forwarding chip uses an ACL to match a specific TCP/UDP flow. The specified incoming and outgoing traffic is copied and sent to the coprocessor, while the original packets are forwarded as usual. The coprocessor performs TCP/UDP traffic analysis (traffic statistics: 4-tuple, start and end time of flows, and traffic volume; flow exceptions: packet loss and RTT). It exports the results through NetStream V9, either periodically every 10 seconds or immediately after a TCP flow ages.

⚫ User scenario:
▫ 1:1 data packet analysis of specified flows (configurable), including TCP/UDP (unicast) traffic.
⚫ Output traffic information:
▫ Traffic visualization (5-tuple and port)
▫ Traffic statistics (packet/byte)
▫ Flow quality (packet loss and latency)
⚫ This function is implemented on a dedicated coprocessor and does not affect the CPU usage of the device.


• Note:

▫ RTT: Round-Trip Time, indicating the total latency from the time when the
transmit end sends data to the time when the transmit end receives an
acknowledgment from the receive end (the receive end sends an
acknowledgment immediately after receiving the data).
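The RTT definition above can be sketched by matching each data segment with the first cumulative ACK that covers it. This sketch assumes timestamps in microseconds. It ignores sequence number wraparound and retransmitted segments, and it uses hypothetical helper names. It is not the coprocessor's actual algorithm.

```python
def rtt_samples(sent, acks):
    """Return one RTT sample per data segment.

    sent: list of (timestamp_us, seq, length) for data segments
    acks: list of (timestamp_us, ack_number) from the receive end
    """
    samples = []
    for t_sent, seq, length in sent:
        end = seq + length  # cumulative ACK number that covers this segment
        for t_ack, ack_no in acks:
            if t_ack >= t_sent and ack_no >= end:
                samples.append(t_ack - t_sent)
                break
    return samples


def over_threshold(samples, rtt_threshold):
    """Samples exceeding a configured RTT threshold."""
    return [r for r in samples if r > rtt_threshold]


sent = [(0, 1, 100), (10, 101, 100)]
acks = [(4, 101), (17, 201)]
print(rtt_samples(sent, acks))  # [4, 7]
```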

Edge Intelligence: Setting a Specified Flow Analysis Task


⚫ Edge intelligence performs full-packet analysis on specific flows and proactively identifies information such as packet loss, latency,
and zero windows.

⚫ Set a flow analysis task:
▫ You can set the source IP address, destination IP address, destination port, and protocol of a flow analysis task to measure and analyze specified flows. The configuration can be automatically delivered to the selected devices.
▫ When the protocol is TCP, IPv4 and IPv6 (IPv6 overlay) are supported. The device configuration items include the ACL number, whether to match VXLAN packets, whether to match packets carrying a single VLAN tag, whether to age flows upon TCP termination packets, the aging time of active entries, the aging time of inactive entries, the unidirectional flow matching sequence number, and the unidirectional flow matching mask. Latency settings include the RTT threshold, and zero window settings include the zero window threshold.


• When the protocol is UDP, only IPv4 is supported. The device configuration items include the ACL number, whether to match VXLAN packets, whether to match packets carrying a single VLAN tag, and the aging time of inactive entries. Latency settings include the latency threshold.
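The export and aging behavior described above can be modeled with a small flow cache. This covers periodic export every 10 seconds, immediate export after a TCP flow ages, and active/inactive aging times. This is a hypothetical sketch. The class name, the default 30-second inactive timeout, and the rule that an active-timeout export removes the entry are all assumptions, not device defaults.

```python
class FlowCache:
    """Tiny flow cache with active and inactive aging (times in seconds).

    Hypothetical model: an entry is exported when it has been idle for
    inactive_timeout, has existed for active_timeout, or (for TCP) when a
    FIN or RST is seen and TCP termination aging is enabled.
    """

    def __init__(self, active_timeout=10, inactive_timeout=30, tcp_fin_aging=True):
        self.active_timeout = active_timeout
        self.inactive_timeout = inactive_timeout
        self.tcp_fin_aging = tcp_fin_aging
        self.entries = {}   # key -> {"first", "last", "packets", "bytes"}
        self.exported = []  # (key, record, reason)

    def update(self, key, now, length, tcp_flags=()):
        entry = self.entries.setdefault(
            key, {"first": now, "last": now, "packets": 0, "bytes": 0})
        entry["last"] = now
        entry["packets"] += 1
        entry["bytes"] += length
        # Export immediately when the TCP connection terminates.
        if self.tcp_fin_aging and {"FIN", "RST"} & set(tcp_flags):
            self._export(key, "tcp-termination")

    def expire(self, now):
        """Called periodically; exports idle or long-lived entries."""
        for key in list(self.entries):
            entry = self.entries[key]
            if now - entry["last"] >= self.inactive_timeout:
                self._export(key, "inactive")
            elif now - entry["first"] >= self.active_timeout:
                self._export(key, "active")

    def _export(self, key, reason):
        self.exported.append((key, self.entries.pop(key), reason))


cache = FlowCache()
cache.update("f1", 0, 100)
cache.update("f2", 0, 60)
cache.update("f2", 1, 60, tcp_flags=("FIN", "ACK"))
cache.update("f1", 5, 100)
cache.expire(10)
print([reason for _, _, reason in cache.exported])  # ['tcp-termination', 'active']
```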

Edge Intelligence: Specified TCP Flow Analysis (1)


⚫ Quality analysis of specified TCP flows:
• Overview: This portlet displays the task name, protocol, total session count, number of sessions with packet loss, enabled measurement points, RTT threshold, zero window threshold, device configuration, and duration of the measurement task. Hover over Device Configuration to view the detailed configuration. Click the delete icon in the upper right corner to delete the measurement task and the related device configuration.

Figure panels: Overview, Basic information, Analysis conclusion, Event list


• Quality analysis of specified TCP flows:

▫ Basic information: displays the number of sessions with packet loss and the total number of sessions. It also shows the number of lost packets and the total number of packets in each of the request and response directions, the total traffic in the request and response directions, and the number of zero windows in the request and response directions.

▫ Analysis conclusion: displays the packet loss rate analysis results, packet loss node analysis results, average RTT analysis results in the request and response directions, and analysis results for the maximum number of zero windows in the request and response directions.

▫ Event list: displays the source IP address, destination IP address, and destination port number for the measurement task. It also allows you to view the topology and metrics of the session with the specified source IP address, destination IP address, and destination port number.
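The Basic information and Analysis conclusion figures above reduce to per-direction ratios and threshold checks. The following minimal sketch assumes per-session counters with hypothetical field names (`req_lost`, `req_total`, `rsp_zero_win`, and so on) and a hypothetical `summarize_task` helper.

```python
def summarize_task(sessions, zero_window_threshold):
    """Summarize a measurement task from per-session, per-direction counters.

    Each session is a dict such as
    {"req_lost": 2, "req_total": 400, "rsp_lost": 0, "rsp_total": 380,
     "req_zero_win": 0, "rsp_zero_win": 4}
    """
    summary = {"sessions": len(sessions)}
    for d in ("req", "rsp"):  # request and response directions
        lost = sum(s[f"{d}_lost"] for s in sessions)
        total = sum(s[f"{d}_total"] for s in sessions)
        summary[f"{d}_loss_rate_pct"] = round(100 * lost / total, 3) if total else 0.0
        summary[f"{d}_zero_window_alarm"] = any(
            s[f"{d}_zero_win"] > zero_window_threshold for s in sessions)
    summary["sessions_with_loss"] = sum(
        1 for s in sessions if s["req_lost"] or s["rsp_lost"])
    return summary


sessions = [
    {"req_lost": 2, "req_total": 400, "rsp_lost": 0, "rsp_total": 380,
     "req_zero_win": 0, "rsp_zero_win": 4},
    {"req_lost": 0, "req_total": 600, "rsp_lost": 0, "rsp_total": 620,
     "req_zero_win": 0, "rsp_zero_win": 0},
]
print(summarize_task(sessions, zero_window_threshold=3))
```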

Edge Intelligence: Specified TCP Flow Analysis (2)


Topology


• Quality analysis of specified TCP flows:

▫ Topology: displays the fabric and networking information and the devices on which edge measurement has been enabled or can be enabled. It also displays the topology between the source host and the destination host, and packet loss information (number of lost packets and packet loss rate) in the request and response directions.

Edge Intelligence: Specified TCP Flow Analysis (3)


Measurement point-based flow quality analysis


• Quality analysis of specified TCP flows:

▫ Measurement point-based flow quality analysis: Click a measurement point (or an extended group at the same layer) to view its analysis data, and click More to view more statistics records. The list displays statistics on sessions in the request and response directions for each time period. These statistics include the total traffic, number of lost packets/total number of packets, packet loss location relative to the measurement point, RTT, number of zero windows, and the times when the first zero window started and the last zero window ended.
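This sketch uses a common way to locate loss along a path, assumed here rather than taken from the product documentation: compare the packet counters of the same flow direction at consecutive measurement points over the same time window. It is a minimal sketch with a hypothetical `locate_loss` helper and hypothetical device names.

```python
def locate_loss(points):
    """points: ordered (name, packet_count) pairs along the forwarding path
    for one direction of a flow, counted over the same time window.
    Returns the segments where packets disappeared and how many."""
    segments = []
    for (up_name, up_count), (down_name, down_count) in zip(points, points[1:]):
        if down_count < up_count:
            segments.append((f"{up_name} -> {down_name}", up_count - down_count))
    return segments


path = [("Leaf1", 1000), ("Spine2", 1000), ("Leaf3", 990)]
print(locate_loss(path))  # [('Spine2 -> Leaf3', 10)]
```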

Overall Network Health Check, Systematically Evaluating the DCN Quality
Key assurance &
network SLAs
You can view key metrics such as
the network-wide key assurance
object, intent verification result,
transmission latency, and packet
loss rate.

Network-wide
resource status check
You can view network-wide
underlay/overlay network resources
and collected KPI metric data, and
compare the data collected
yesterday with that collected today.

Five-layer health
evaluation system
You can view detailed analysis from
dimensions such as device, network,
protocol, service, and overlay to
check whether the network health
status is normal.


• Health evaluation refers to the evaluation on the overall health status of the
current network based on identified network issues, helping you quickly and
accurately identify and rectify faults.

▫ This portlet displays the overall health status of the network based on
multiple metrics such as the number of service assurance objects, network
connectivity intent verification, average transmission latency, and packet
loss rate. It also displays the distribution and growth of abnormal data
from dimensions such as device and telemetry.

▫ This portlet displays the number of pending issues, events, and resources from dimensions such as device, network, protocol, overlay, and service. You can click each layer to view the total number of events, the resources, and the events of network entities that are not associated with any issue.
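The notes above do not specify how the overall health status is scored. Purely as a hypothetical illustration, a five-layer score could start from 100 and deduct points for each pending issue in each layer (device, network, protocol, overlay, and service). The weights, the 2-point penalty, and the `health_score` helper are all invented for this example.

```python
LAYERS = ("device", "network", "protocol", "overlay", "service")


def health_score(pending_issues, weights=None, penalty=2):
    """Hypothetical scoring: start at 100, deduct `penalty` points per
    pending issue (scaled by a per-layer weight), and clamp at 0.
    Keys not in LAYERS are ignored."""
    weights = weights or {}
    deduction = sum(
        weights.get(layer, 1.0) * penalty * pending_issues.get(layer, 0)
        for layer in LAYERS
    )
    return max(0, round(100 - deduction))


print(health_score({"device": 3, "network": 2, "service": 3}))  # 84
```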

Issues That Can Be Identified Based on Health Evaluation (1)

Category | Issue
Performance | Switch CPU threshold exceeded, switch memory threshold exceeded, service affected by switch interface congestion, firewall CPU or IPv4 session threshold exceeded, abnormal switch CPU usage increase, abnormal switch memory usage increase, abnormal firewall CPU usage increase, abnormal firewall memory usage increase, abnormal drop packet increase, abnormal error packet increase, abnormal unicast packet increase, abnormal multicast packet increase, abnormal broadcast packet increase, abnormal bandwidth usage change, abnormal huge page memory usage increase, and abnormal forwarding core usage increase
Capacity | Switch ARP entry threshold exceeded, switch ND entry threshold exceeded, switch MAC entry threshold exceeded, switch storage space threshold exceeded, switch ACL resource threshold exceeded, switch SFU forwarding performance insufficiency, switch FIB4 entry threshold exceeded, switch FIB6 entry threshold exceeded, number of routes received from a BGP peer exceeding the limit, abnormal switch ARP entry increase, abnormal switch ND entry increase, abnormal switch FIB4 entry increase, abnormal switch FIB6 entry increase, abnormal switch MAC entry increase, predicted traffic threshold exceeding, switch BD threshold exceeded, switch VRF entry threshold exceeded, switch Layer 2 sub-interface threshold exceeded, abnormal TCAM rule usage increase, predicted forwarding core usage threshold exceeding, abnormal EMC entry usage increase, abnormal ND-suppress entry usage increase, abnormal ARP-suppress entry usage increase, and abnormal virtual port usage increase


Issues That Can Be Identified Based on Health Evaluation (2)

Category | Issue
Status | Switch LPU exception, repeated switch LPU exception, switch MPU exception, switch SFU exception, repeated switch SFU exception, switch fan exception, switch power exception, link port status flapping, unidirectional link connectivity fault on the network side of a switch, routing loop, switch port Error-Down, suspected subhealthy optical link, suspected switch entry change, switch ARP entry loss, switch routing table loss, BGP peer status flapping, access-side IP address conflict on the VXLAN network, suspected Layer 2 loop, optical module type mismatch, repeated switch MPU exception, repeated switch restart, switch fault, switch disconnection, switch M-LAG dual-active state, switch chip soft failure, VXLAN tunnel interruption, license file expiration, OSPF router ID conflict, physical switch port suspension, OSPF DR IP address conflict, OSPF neighbor status change, BGP peer status change, stack fault, host IP address conflict, IP address conflict on the network side, access-side port blocked by STP, license file about to expire, and abnormal increase of exception logs
Policy | TCP SYN flood attack, ARP attack, ND attack, and invalid ARP packet received by a switch
Connection | Single IP address fault on the access side, server access fault, TCP service port not enabled, TCP service port fault, and service interruption caused by BD deletion, sub-interface shutdown, or sub-interface deletion
Intent | Inconsistent link and port metrics, routing loop on the entire network, routing blackhole on the entire network, service reachability intent verification failure, and service isolation intent verification failure


Real-Time or Periodic Push of Health Reports, Providing References for Optimization

Figure panels: Network overview, KPI details, Report details


• Network overview:

▫ Display key information, including the resource overview, client overview, and quality overview.

• KPI details:

▫ Identify network quality issues based on the five dimensions of the network health evaluation system.

• Report details:

▫ Display the health status from five dimensions in detail and identify exceptions in a timely manner to provide optimization suggestions.

Multi-Cloud Multi-DC Analysis, with Unified Cross-Domain Health Evaluation

Figure: DC site 1 and DC site 2 are interconnected over an IP network (forwarding plane: VXLAN; control plane: BGP EVPN). FabricInsight 1 (single-node/cluster deployment) and FabricInsight 2 (standalone/cluster deployment) are deployed for the two sites. The multi-data center analyzer (MDA, which requires three extra physical machines or VMs) provides:
Unified O&M portal | Multi-domain application mutual access traffic visualization | Multi-domain interconnection traffic visualization | Multi-domain network health evaluation


• Unified O&M portal:
▫ Scenario example: a data center has 47 PoDs and previously had more than 10 O&M portals, each with its own login password.
▫ Solution: single sign-on (SSO), so that network-wide O&M requires only one login.
• Multi-domain application mutual access traffic visualization:
▫ Scenario example: bandwidth costs need to be allocated, so applications that consume large amounts of bandwidth must be identified.
▫ Solution: visualization of cross-DC/cross-fabric application interaction traffic and its trend, enabling fast identification of abnormal burst traffic.
• Multi-domain interconnection traffic visualization:
▫ Scenario example: private line bandwidth needs to be scaled in or out as services change, which requires evaluating the inter-domain interconnection traffic.
▫ Solution: visualization of Internet, VPN, and private line traffic at the fabric egress.
• Multi-domain network health evaluation:
▫ Scenario example: the overall health status of the network needs to be evaluated in routine inspections to check whether network traffic increases or decreases sharply.
▫ Solution: interpretation of network-wide health from the dimensions of north-south DC traffic, east-west DC traffic, and intra-DC traffic.

Health Evaluation by MDA


Cross-DC/-fabric network
access traffic
Cross-DC interconnection network
traffic is identified based on egress
links to analyze the traffic of
Internet, VPNs, and private lines at
the fabric egress.

Cross-DC/-fabric application
access traffic
Cross-fabric application interaction
traffic and trend are displayed to
quickly identify abnormal traffic
changes, facilitating fault locating
and capacity expansion.

Cross-DC/-fabric network
evaluation
The composition of north-south
DC, east-west DC, and intra-DC
traffic at peak hours is analyzed to
identify the applications with high
traffic at peak hours, and evaluate
the overall network health status.


• The health evaluation function evaluates the overall health status of the current network based on identified network issues, helping customers quickly and accurately identify and rectify faults.

▫ Network evaluation: This portlet analyzes the composition of north-south and east-west traffic at peak hours, identifies the applications with high traffic at peak hours, and evaluates the overall network health status.

▫ Traffic distribution: Click View Details to view details about the composition of north-south and east-west traffic on the entire network, including traffic statistics, bandwidth usage, and health evaluation statistics.
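The north-south, east-west, and intra-DC breakdown above can be illustrated by classifying each flow according to the DC its endpoints belong to. This sketch makes two assumptions. First, "east-west DC traffic" means traffic between different DCs. Second, each DC is identified by a list of IP prefixes. Both assumptions and the `classify_flow` helper are hypothetical.

```python
from ipaddress import ip_address, ip_network


def classify_flow(src, dst, dc_prefixes):
    """Classify a flow as intra-DC, east-west (cross-DC), or north-south.

    dc_prefixes maps a DC name to a list of prefixes, e.g.
    {"DC1": ["10.1.0.0/16"], "DC2": ["10.2.0.0/16"]}.
    """
    def site_of(addr):
        ip = ip_address(addr)
        for dc, prefixes in dc_prefixes.items():
            if any(ip in ip_network(p) for p in prefixes):
                return dc
        return None  # outside every DC, e.g. Internet or branch users

    src_dc, dst_dc = site_of(src), site_of(dst)
    if src_dc is None or dst_dc is None:
        return "north-south"
    return "intra-DC" if src_dc == dst_dc else "east-west"


dcs = {"DC1": ["10.1.0.0/16"], "DC2": ["10.2.0.0/16"]}
print(classify_flow("10.1.1.1", "10.2.0.5", dcs))  # east-west
```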

Issues That Can Be Identified Based on Health Evaluation by MDA

Category | Issue
Capacity | Cross-fabric routes received from BGP peers exceeding the threshold
Status | Change of BGP peer status between fabric gateways, BGP peer relationship flapping between fabric gateways, cross-fabric host IP address conflict, and VXLAN tunnel interruption
Intent | Link port metrics inconsistency, routing loop on the entire network, routing blackhole on the entire network, service reachability intent verification failure, and service isolation intent verification failure

Contents

1. DCN O&M Challenges and CloudFabric Intelligent DCN O&M Solution

2. iMaster NCE-Fabric

3. iMaster NCE-FabricInsight
▫ Overview

▫ Network Visualization and Health Evaluation


◼ Fault Locating

▫ Change Assurance

Proactive Fault Identification and Analysis | Service Assurance and Fault Locating

"1-3-5" Troubleshooting (1)


Three challenges of network O&M:
• Major loss caused by service interruptions: US$100,000 in losses per hour for 98% of enterprises.
• Far-reaching impacts of network faults: cloud-based deployment of 70% of applications.
• Strict industry supervision requirements: service interruptions longer than 30 minutes must be reported to the China Banking Regulatory Commission (CBRC).

Traditional manual O&M problems:
• Manual monitoring: cloud-based services make the network invisible, so network faults go unnoticed.
• Manual locating: point-by-point checks (path > device > packet capture); the average fault locating time is 76 minutes.
• Manual rectification: experience-based with no pre-verification, resulting in uncontrollable risks.


• Passive troubleshooting based on personal experience evolves into AI-based, closed-loop automatic fault locating.

"1-3-5" Troubleshooting (2)

1-minute fault detection: Collect full information in real time based on telemetry, and quickly detect faults through the network health module (figure: health score 84).
3-minute fault locating: Construct the network knowledge graph using AI algorithms, and quickly locate the root causes of faults (figure: network knowledge graph, root cause).
5-minute fault rectification: Intelligently recommend and evaluate rectification solutions, and automatically rectify faults (figure: rectification plan recommendation, change impact analysis, service traffic simulation, automatic delivery).


AI-based Knowledge Inference, Achieving Fast Fault Locating

Figure: Collection, analysis, and decision-making pipeline across iMaster NCE-FabricInsight and iMaster NCE-Fabric. Labels: multi-dimensional DC data (service flow data/telemetry data), data cleansing, AI-based exception identification, and network object modeling; an intelligent analysis engine whose knowledge inference engine is built on Huawei's 30+ years of O&M expertise (Knowledge 1 to 4; BGP, OSPF, IS-IS, STP, BFD, ...) and on continuous learning and training based on real site faults; model application (anomaly detection, root cause analysis, risk prediction); intent-driven troubleshooting in a closed-loop manner, or manual recovery.


"1-3-5" Troubleshooting Scope (1)


Dimension | Issue | Number of Issues | SDN Network | Non-SDN Network
Device | Switch faults, repeated switch restarts, switch disconnections, switch MPU exceptions, repeated switch MPU exceptions, switch LPU exceptions, repeated switch LPU exceptions, switch SFU exceptions, repeated switch SFU exceptions, switch fan exceptions, switch power exceptions, switch CPU threshold crossing/abnormal increase of the switch CPU usage, switch memory usage threshold crossing/abnormal increase of the switch memory usage, switch ACL resource threshold crossing, switch FIB4 entry threshold crossing/abnormal increase of switch FIB4 entries, switch FIB6 entry threshold crossing/abnormal increase of switch FIB6 entries, switch ND entry threshold crossing/abnormal increase of switch ND entries, switch ARP entry threshold crossing/abnormal increase of switch ARP entries, switch MAC entry threshold crossing/abnormal increase of switch MAC entries, switch SFU forwarding performance insufficiency, switch storage space threshold crossing, stack faults, suspicious Layer 2 loops, abnormal increase of exception logs, firewall CPU usage or IPv4 session resource usage threshold crossing, abnormal increase of the firewall CPU usage, abnormal increase of the firewall memory usage, license file expiration, license about to expire, switch BD/VRF/L2 sub-interface threshold crossing, traffic exceptions caused by lost switch routing hardware tables, flow exceptions caused by CE switch chip soft failures, traffic exceptions caused by switch entry inconsistencies between the software table and hardware table, and traffic exceptions caused by lost switch ARP entries | 43 | ✔ | ✔


"1-3-5" Troubleshooting Scope (2)


Dimension | Issue | Number of Issues | SDN Network | Non-SDN Network
Network | Access-side interfaces blocked by STP, suspected sub-healthy optical links, services affected by switch interface congestion, link interface status flapping, switch interface error-down, switch physical interface suspended, unidirectional link connectivity faults on the network side of a switch, host IP address conflicts, IP address conflicts on the network side, predicted traffic threshold crossing, link interface metric inconsistency, routing loops on the entire network, routing blackholes on the entire network, abnormal increase of drop packets, abnormal increase of error packets, invalid ARP packets received by switches, optical module type mismatch, ARP attack, and ND attack | 19 | ✔ | ✔ (the following issues are excluded: link interface metric inconsistency, routing loops on the entire network, and routing blackholes on the entire network)
Protocol | OSPF router ID conflicts, changes of the OSPF neighbor status, OSPF DR IP address conflicts, changes of the BGP neighbor status, BGP peer relationship flapping, two master switches in M-LAG, and the number of routes received from a BGP peer exceeding the limit | 7 | ✔ | ✔
Overlay | VXLAN tunnel interruptions, IP address conflicts on the VXLAN network access side, service interruptions due to BD deletion, and service interruptions due to sub-interface deletion | 5 | ✔ | ✘
Service | Access-side single IP address exceptions, server access exceptions, TCP service interface exceptions, TCP service interface disabled, TCP SYN flood attacks, service reachability intent verification failures, and service isolation intent verification failures | 7 | ✔ | ✘


Use Case: Millisecond-Level Queue Detection and Proactive


Identification of Service Packet Loss
Service interruptions caused by packet loss due to microbursts make faults difficult to locate.
Big data services require a large number of servers to form a cluster and work together. When the traffic of multiple nodes is sent to the same compute node, transient congestion may cause packet loss on the network and interrupt services.
➢ The traditional NMS collects data every 5 minutes, so it cannot identify microbursts.
➢ Issues occur irregularly and are difficult to trace and reproduce.

Service packet loss is displayed promptly based on millisecond-level detection, enabling fast fault locating.
• Interface buffer usage is detected through telemetry at 100 ms intervals.
• For example, when packets are discarded due to queue congestion, the 5-tuple details (port, queue, and discarded packets) are proactively detected.
• Faults are discovered based on interfaces.
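The following minimal sketch shows why sampling granularity matters. It assumes a hypothetical 5-minute series of queue buffer usage sampled every 100 ms, containing one 300 ms burst, and an assumed 90% congestion threshold. All values are illustrative, not product defaults. The 5-minute average that a traditional NMS would see stays near the idle level, while the 100 ms samples expose the burst.

```python
"""Sketch: 5-minute averaging versus 100 ms sampling of queue buffer usage.

All numbers are hypothetical and chosen only for illustration.
"""

SAMPLE_MS = 100                                 # telemetry sampling interval
WINDOW_SAMPLES = 5 * 60 * 1000 // SAMPLE_MS     # samples in one 5-minute NMS poll
THRESHOLD_PCT = 90.0                            # assumed congestion threshold


def build_series() -> list[float]:
    """Five minutes of 10% buffer usage with a single 300 ms burst at 100%."""
    series = [10.0] * WINDOW_SAMPLES
    for i in range(1000, 1003):                 # 3 samples x 100 ms = 300 ms burst
        series[i] = 100.0
    return series


def nms_view(series: list[float]) -> float:
    """A 5-minute poller effectively sees only the average over the window."""
    return sum(series) / len(series)


def telemetry_bursts(series: list[float]) -> list[int]:
    """Start times (ms) of 100 ms samples at or above the threshold."""
    return [i * SAMPLE_MS for i, v in enumerate(series) if v >= THRESHOLD_PCT]


if __name__ == "__main__":
    s = build_series()
    print(f"5-minute average: {nms_view(s):.2f}%")
    print("100 ms samples over threshold at (ms):", telemetry_bursts(s))
```

Because the burst occupies only 3 of 3000 samples, it barely moves the average, which is the microburst blind spot described above.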


• Note:

▫ NMS: Network Management System



Case: Root Cause Analysis of Service Packet Loss

Root cause: 10GE1/0/17 of Spine 2 is continuously congested, causing packet loss.


Case: Handling Suggestions for Service Packet Loss


Repair advice


• Handling suggestions:

▫ Step 1: Run the display qos queue statistics interface port-type port-
number command on the switch to check whether packet loss occurs on
each interface queue.

▫ Step 2: Choose Telemetry > Interface on iMaster NCE-FabricInsight and check whether the traffic trend of the interface is consistent with the historical trend.

▫ Step 3: If the traffic trend deviates from the historical trend and traffic increases sharply, continue monitoring the interface for 30 minutes and check whether any application fault is reported. If no fault is reported, close this issue.

▫ Step 4: If an application fault is reported, notify the corresponding service team. If packet loss due to congestion persists, migrate some heavy-traffic VMs or use switches with higher bandwidth.
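The four handling steps form a small decision procedure. The sketch below encodes them so the order of checks is explicit. The `Observation` fields are hypothetical inputs that an engineer would fill in from the queue statistics, the interface traffic trend, and application fault reports. The code illustrates the steps and is not a product API. Cases that the steps do not address are reported as such rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical facts an engineer records while following Steps 1 to 4."""
    queue_drops: bool            # Step 1: queue statistics show packet loss
    matches_history: bool        # Step 2: interface traffic follows its historical trend
    sharp_increase: bool         # Step 3: traffic increased sharply
    app_fault_reported: bool     # Step 3: application fault reported within 30 minutes
    congestion_persists: bool    # Step 4: congestion-induced packet loss continues


def handling_actions(obs: Observation) -> list[str]:
    """Map the observations to the actions described in the handling suggestions."""
    if not obs.queue_drops or obs.matches_history or not obs.sharp_increase:
        return ["Not covered by the documented steps; continue analysis."]
    if not obs.app_fault_reported:
        return ["Close the issue."]
    actions = ["Notify the corresponding service team."]
    if obs.congestion_persists:
        actions.append("Migrate some heavy-traffic VMs or use switches with higher bandwidth.")
    return actions
```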

Unknown Fault Inference: Network Fault Inference and Source Tracing Based on Knowledge Graphs
[Figure: fault propagation graphs "A model" and "B model" with numbered network objects, matched through AI-based similarity computing]

Network modeling
• Construction of a knowledge base for 40+ objects, 300+ attributes, and other relationships.
• Compatible with third-party access devices.

Event association
• Automatically associate logs, metrics, and flow events with specified network objects.
• Dynamic graph update and automatic locating triggering.

Fault inference
• Fault inference and root cause aggregation.
• Mining and display of fault propagation chains.
• Automatic generation of fault descriptions based on inference results.

Knowledge management
• Manual marking of inference results.
• Algorithm-based graph similarity matching.
• Dynamic maintenance of case libraries based on customer concerns.
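The knowledge management stage mentions algorithm-based graph similarity matching against a case library. This material does not name the similarity measure. The sketch below uses Jaccard similarity over directed edge sets as a simple stand-in, matching a new fault propagation graph against stored cases. The case names and edges are hypothetical.

```python
"""Sketch: match a new fault propagation graph against a case library."""

Edge = tuple[str, str]  # (cause, effect)

# Hypothetical case library: each case is a set of directed propagation edges.
CASE_LIBRARY: dict[str, set[Edge]] = {
    "router_id_conflict": {
        ("router_id_conflict", "ospf_flap"),
        ("ospf_flap", "route_loss"),
        ("route_loss", "service_down"),
    },
    "optical_degradation": {
        ("optical_low_power", "crc_errors"),
        ("crc_errors", "packet_loss"),
        ("packet_loss", "service_down"),
    },
}


def jaccard(a: set[Edge], b: set[Edge]) -> float:
    """Size of the edge intersection divided by size of the edge union (1.0 = identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(graph: set[Edge], cases: dict[str, set[Edge]]) -> tuple[str, float]:
    """Return the most similar stored case and its similarity score."""
    name = max(cases, key=lambda k: jaccard(graph, cases[k]))
    return name, jaccard(graph, cases[name])
```

A production system would likely weight nodes and edge types. An edge-set overlap is used here only because it is easy to verify by hand.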


Unknown Fault Inference: Exception Analysis


Display root causes centrally based on fault inference.
Event list
View fault propagation paths by alarm.
Display fault source tracing by network object in a graph.


• On the Health page, you can click a specific network entity to view its details. Click the blue icon in the Operation column to go to the knowledge graph page and view the exception analysis of the associated event.

• This portlet displays possible root causes and fault propagation paths. You can
click an NE to view associated events.

Use Case: Minute-Level Root Cause Locating Based on Knowledge Graphs
AS-IS: Manual locating based on alarms and experience
Scenario: Services in a bank are interrupted and 300+ alarms are reported within a short period. After one hour of manual troubleshooting, the interruption is found to be mainly caused by a router ID conflict.
Manual workflow: alarm monitoring and viewing, then locating based on expert experience after alarm classification.

vs.

TO-BE: Automatic exploration of root causes based on knowledge graphs
Root causes of fault issues are proactively reported, enabling minute-level automatic locating of fault points.


• Challenges to traditional O&M:

▫ Massive alarms: a single fault triggers hundreds of alarms, making the fault difficult to locate.

▫ Manual fault locating: locating relies on expert experience and takes a lot of time.

▫ Source tracing failure: the fault impact scope cannot be identified, making root cause tracing difficult.

• Intelligent O&M:

▫ Fault aggregation: only root causes of fault issues are reported.

▫ Automatic locating: AI-based intelligent inference that does not rely on personnel skills.

▫ Path visualization: fault propagation path display and fault impact scope identification.
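Fault aggregation means reporting only root causes instead of every alarm. The sketch below illustrates this under a stated assumption: the knowledge graph is reduced to a hypothetical "can cause" map between events. Among the active events, those that no other active event can cause are treated as root causes. A breadth-first walk from a root cause gives its propagation path. The event names loosely echo the bank router ID conflict scenario. The product's actual inference algorithm is not described in this material.

```python
from collections import deque

# Hypothetical causal knowledge: event -> events it can trigger.
CAUSES: dict[str, list[str]] = {
    "ospf_router_id_conflict@leaf1": ["ospf_neighbor_flap@leaf1", "route_flap@spine1"],
    "ospf_neighbor_flap@leaf1": ["bgp_peer_down@leaf1"],
    "route_flap@spine1": ["vxlan_tunnel_down@leaf2"],
    "bgp_peer_down@leaf1": ["vxlan_tunnel_down@leaf2"],
}


def root_causes(active: set[str]) -> list[str]:
    """Active events that no other active event can explain."""
    explained = {dst for src in active for dst in CAUSES.get(src, []) if dst in active}
    return sorted(active - explained)


def propagation_path(root: str, active: set[str]) -> list[str]:
    """Breadth-first walk from a root cause over the active downstream events."""
    seen, order, queue = {root}, [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in CAUSES.get(node, []):
            if nxt in active and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order
```

With all five events active, only the router ID conflict is reported. The other four appear in its propagation path instead of as separate issues.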

Unified O&M of Multiple Vendors


Unified O&M of third parties
• KPI visualization
▫ Device-, board-, and interface-level KPI visualization
▫ 24/7 health evaluation on devices/networks/protocols
• Troubleshooting
▫ Minute-level root cause locating for 15+ typical issues
▫ Network traffic analysis
• Exception prediction
▫ Intelligent exception detection based on dynamic baselines
▫ Traffic prediction

Function category: Troubleshooting
Detailed list:
• Switch fault
• Repeated switch restart
• Abnormal increase of exception logs
• Switch LPU fault
• Suspicious Layer 2 loop
• IP address conflict on the network side
• EtherChannel interface down
• Switch interface error-down
• Link interface status flapping
• OSPF router ID conflict
• Predicted traffic threshold exceeding
• Abnormal increase of the switch CPU usage
• Abnormal increase of the switch memory usage
• Abnormal increase of discarded packets on an interface
• Abnormal increase of error packets on an interface

Function category: KPI visualization
Detailed list:
• CPU usage
• Memory usage
• Number of received/sent packets, number of received/sent broadcast packets, number of received/sent multicast packets, number of received/sent unicast packets, number of received/sent bytes, number of received/sent discarded packets, and number of received/sent error packets
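Exception prediction relies on "intelligent exception detection based on dynamic baselines". The baseline algorithm is not described here. A common, simple approach is a rolling mean plus or minus k standard deviations over the preceding samples, and the sketch below uses it as an assumption. The window size, k, the minimum band width, and the CPU usage series are all hypothetical.

```python
from statistics import mean, pstdev


def baseline_anomalies(
    samples: list[float], window: int = 5, k: float = 3.0, min_band: float = 1.0
) -> list[int]:
    """Indices of samples outside mean +/- max(k * std, min_band) of the preceding window."""
    anomalies = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        band = max(k * sigma, min_band)   # min_band avoids a zero-width band on flat data
        if abs(samples[i] - mu) > band:
            anomalies.append(i)
    return anomalies


cpu_usage = [30, 31, 29, 30, 31, 30, 29, 85, 30, 31]   # hypothetical CPU usage (%)
```

The baseline moves with the data, so a value that is normal at one time of day is not flagged merely because it would exceed a fixed threshold. One weakness of this simple version is that the spike inflates the next windows' baseline. A refinement would be to exclude flagged samples from the history.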


Minute-Level Proactive Detection and Locating of Inter-DC or Inter-Fabric Faults
⚫ MDA identifies inter-DC or inter-fabric network issues within minutes through knowledge graph modeling and analysis of cross-domain networks, quickly locating root causes.
⚫ The following inter-DC or inter-fabric network issues can currently be identified:
Issue List
• BGP peer status changing between fabric gateways
• BGP peer status flapping between fabric gateways
• Cross-fabric routes received from BGP peers exceeding the threshold
• VXLAN tunnel interruption
• Cross-fabric host IP address conflict
• Link interface metric inconsistency
• Routing loop on the entire network
• Routing blackhole on the entire network
• Service reachability intent verification failure
• Service isolation intent verification failure

[Figure: inter-DC/inter-fabric networking. In each fabric, server leaf and service leaf nodes connect through spines to a border leaf acting as the fabric gateway. The fabric gateways interconnect through a DCI core or DWDM/dark fiber.
Option 1: BGP EVPN VXLAN within each fabric and between fabrics.
Option 2: BGP EVPN VXLAN within each fabric, with Layer 3 interconnection (IGP/BGP) on the underlay network between fabrics.
Option 3: BGP EVPN VXLAN within each fabric and between fabrics, with VLAN handoff at each fabric boundary.]


Cross-DC or Cross-Fabric Pending Issues and Historical Issues
⚫ Issues are identified using certain rules based on original exception information such as exception logs reported by
devices and detected KPI exceptions.


• The Pending Issues list displays all issues that have not been cleared or acknowledged. The Historical Issues list displays all cleared and acknowledged issues. You can view issue details, including basic issue information and the issue impact scope.

• Issues of the MDA health evaluation function and issues of the iMaster NCE-FabricInsight health evaluation function are independent of each other. The two kinds of issues cannot be cleared together, and their acknowledgment status cannot be changed together.

Flow Analysis: Path Comparison


⚫ Implementation of path comparison:
▫ Network forwarding paths between VMs are learned automatically based on the VXLAN overlay forwarding model. For instance, four paths are available for IP1 to access IP2:
A -> B -> C1 -> D1
A -> B -> C2 -> D1
A -> B -> C1 -> D2
A -> B -> C2 -> D2
▫ When an exception occurs, the forwarding path of an abnormal packet is compared with those of normal packets to quickly detect the differences.
◼ Scenario 1: The path is incomplete (figure path: A B C2 D2).
◼ Scenario 2: The path changes (figure path: A B C1 D4).
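The following sketch shows one way to implement the path comparison. It assumes the learned forwarding model is a simple, loop-free map from each hop to its ECMP next hops (A to B, B to C1 or C2, and C1/C2 to D1 or D2), which reproduces the four example paths. An observed path is classified as incomplete if it stops early along a learned path, and as changed if it deviates from every learned path. The data structures are hypothetical.

```python
# Hypothetical learned model: each hop and its ECMP next hops (assumed loop-free).
NEXT_HOPS: dict[str, list[str]] = {"A": ["B"], "B": ["C1", "C2"], "C1": ["D1", "D2"], "C2": ["D1", "D2"]}


def learned_paths(src: str, model: dict[str, list[str]]) -> list[tuple[str, ...]]:
    """Enumerate every path from src until a hop with no further next hops."""
    if src not in model:
        return [(src,)]
    return [(src,) + rest for nh in model[src] for rest in learned_paths(nh, model)]


def classify(observed: tuple[str, ...], normal: list[tuple[str, ...]]) -> str:
    """Compare an observed path with the learned normal paths."""
    if observed in normal:
        return "normal"
    if any(path[: len(observed)] == observed for path in normal):
        return "incomplete"   # scenario 1: the path stops at a breakpoint device
    return "changed"          # scenario 2: the path deviates from the learned paths
```

The last hop of an incomplete path points to the breakpoint device, which is where the rule-engine troubleshooting starts.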


Flow Analysis: Troubleshooting Based on the Rule Engine


① The AI engine performs clustering analysis on mirrored flow data on the network to proactively detect abnormal connectivity.
② Flow paths with abnormal connectivity are modeled to identify the logical topology of the event flow. Current abnormal flow paths are compared with previous normal flow paths to identify the breakpoint devices on the flow paths.
③ Fault inference is performed based on the fault inference rule library and the context of the breakpoint devices to identify possible faults.
④ The detailed troubleshooting process is displayed.

[Figure: troubleshooting pipeline. 1 Proactive exception identification: the AI engine analyzes mirrored flow data/telemetry data (ERSPAN, gRPC). 2 Flow path modeling: device configuration snapshots (SNMP + FTP), device entry snapshots (Telnet), device exception logs (Syslog), and so on. 3 Knowledge inference engine (input, output). 4 Root cause, for example: connectivity fault troubleshooting for distributed VXLANs -> connectivity issues -> Layer 3 interconnection -> the next hop is not a tunnel -> the next hop is a VM -> the breakpoint device does not have the ARP entry of the destination IP address -> check whether the access sub-interface is down. *See the next slide.]


Knowledge Inference Engine Example


Connectivity fault troubleshooting for distributed VXLANs
• Layer 2 interconnection
• Layer 3 interconnection
  ▫ The next hop of the breakpoint device is a tunnel.
  ▫ The next hop of the breakpoint device is not a tunnel.
    ▪ …
    ▪ The next hop is a PE.
    ▪ The next hop is a VAS.
    ▪ The next hop is a VM.
      - The breakpoint device has the ARP entry of the destination IP address.
      - The breakpoint device does not have the ARP entry of the destination IP address.
        - …
        - Check whether the number of ARP entries exceeds the threshold.
        - Check whether MAC address flapping occurs on the access sub-interface.
        - Check whether the access sub-interface is down.
        - …
      - …
    ▪ …
  ▫ …
• …
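The example tree can be represented as data and walked with facts gathered from the breakpoint device. The sketch below encodes only the branch shown in full (Layer 3, next hop not a tunnel, next hop is a VM, no ARP entry). The fact names are hypothetical, and branches marked "…" are omitted. This illustrates a rule-tree walk and is not the product's inference engine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Each child is ((fact, expected value), subtree).
    children: list[tuple[tuple[str, object], "Node"]] = field(default_factory=list)
    checks: list[str] = field(default_factory=list)   # troubleshooting actions at a leaf


TREE = Node("Connectivity fault troubleshooting for distributed VXLANs", [
    (("interconnection", "L3"), Node("Layer 3 interconnection", [
        (("next_hop_is_tunnel", False), Node("Next hop is not a tunnel", [
            (("next_hop_type", "VM"), Node("Next hop is a VM", [
                (("has_arp_for_dst", False), Node("No ARP entry for destination", checks=[
                    "Check whether the number of ARP entries exceeds the threshold.",
                    "Check whether MAC address flapping occurs on the access sub-interface.",
                    "Check whether the access sub-interface is down.",
                ])),
            ])),
        ])),
    ])),
])


def infer(node: Node, facts: dict[str, object]) -> tuple[list[str], list[str]]:
    """Follow the first matching branch at each level; return (labels visited, leaf checks)."""
    path = [node.label]
    while True:
        for (fact, expected), child in node.children:
            if facts.get(fact) == expected:
                node = child
                path.append(node.label)
                break
        else:                      # no branch matched: stop at the current node
            return path, node.checks
```

Keeping the rules as data rather than nested `if` statements lets new branches be added to the rule library without changing the walker.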


ERSPAN TCP Flow Analysis: Flow Troubleshooting (1)


⚫ The flow troubleshooting function displays ERSPAN mirrored packets that have undergone packet combination and request direction
identification and allows you to query the ERSPAN mirrored packets by multiple dimensions (including the source IP address, source
port, destination IP address, and destination port).

⚫ For SYN/SYN-ACK packets whose Status is TCP Retransmission, and for flow events whose packet status is Abnormal TTL, you can switch to the fault inference diagram and perform automatic, intelligent troubleshooting.


• By default, only abnormal flow events (TCP retransmission, abnormal TTL, TCP
RST, and abnormal TCP flag) and long flows (TCP flows that are not terminated
within 10 seconds) are displayed.
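The default display rule and the multi-dimension query can be expressed as simple filters. The `Flow` record below is a hypothetical simplification of a combined ERSPAN TCP flow. The four abnormal event types and the 10-second long-flow threshold come from the note. The query dimensions (source/destination IP address and port) come from the slide text.

```python
from dataclasses import dataclass, field
from typing import Optional

# From the note: abnormal flow event types and the long-flow threshold.
ABNORMAL_EVENTS = {"TCP retransmission", "abnormal TTL", "TCP RST", "abnormal TCP flag"}
LONG_FLOW_SECONDS = 10.0


@dataclass
class Flow:
    """Hypothetical, simplified record of a combined ERSPAN TCP flow."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start: float                    # time of the first packet, in seconds
    end: Optional[float] = None     # termination time; None while the flow is still open
    events: set[str] = field(default_factory=set)


def shown_by_default(flow: Flow, now: float) -> bool:
    """Abnormal flow events, or TCP flows not terminated within 10 seconds."""
    if flow.events & ABNORMAL_EVENTS:
        return True
    finished = flow.end if flow.end is not None else now
    return finished - flow.start > LONG_FLOW_SECONDS


def query(flows: list[Flow], **criteria: object) -> list[Flow]:
    """Filter by any combination of src_ip, src_port, dst_ip, and dst_port."""
    return [f for f in flows if all(getattr(f, key) == value for key, value in criteria.items())]
```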

ERSPAN TCP Flow Analysis: Flow Troubleshooting (2)

⚫ Based on the expert experience library and troubleshooting process, iMaster NCE-FabricInsight summarizes a unified troubleshooting model and provides an automatic troubleshooting framework that can be orchestrated and requires no manual intervention.

⚫ Troubleshooting actions involve network checks. Users can perform one-click troubleshooting, improving troubleshooting efficiency.

[Figure: Troubleshooting based on the rule engine]


• Click Fault Reasoning to view the logical topology traversed by the event and compare the current abnormal flow paths with the previous normal flow paths.

• After the paths are calculated for normal flows and abnormal flows, you can
click No Troubleshooting or Timely Troubleshooting to select a
troubleshooting mode. Troubleshooting can be performed only when abnormal
flow paths exist. By default, No Troubleshooting is used, indicating that
troubleshooting is not performed. If normal flow paths exist and Timely
Troubleshooting is selected, the system performs troubleshooting for log issues
based on the timestamps of normal flow paths and abnormal flow paths, and
performs troubleshooting for entry, configuration change, and firewall policy
blocking issues based on the timestamps of normal flow paths and current
timestamps. If no normal flow path exists and Timely Troubleshooting is
selected, the system performs troubleshooting based on the timestamps of
abnormal flow paths and current timestamps. Before performing troubleshooting
for configuration change and firewall policy blocking issues, the system
automatically synchronizes the latest configurations.
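The time-window rules for Timely Troubleshooting can be summarized in a small function. This sketch covers only the described behavior. The function name, argument names, and issue-type keys are hypothetical, and timestamps are plain numbers. For the case with no normal flow path, the note does not separate issue types. The sketch therefore assumes the abnormal-to-current window applies to all of them.

```python
from typing import Optional

ISSUE_TYPES = ("log", "entry", "configuration_change", "firewall_policy_blocking")


def troubleshooting_windows(
    abnormal_ts: Optional[float],
    normal_ts: Optional[float],
    now: float,
    mode: str = "No Troubleshooting",
) -> dict[str, tuple[float, float]]:
    """Return the (start, end) time window examined for each issue type."""
    if mode != "Timely Troubleshooting" or abnormal_ts is None:
        # Default mode, or no abnormal flow path: no troubleshooting is performed.
        return {}
    if normal_ts is not None:
        windows = {t: (normal_ts, now) for t in ISSUE_TYPES}   # entries, config, firewall policy
        windows["log"] = (normal_ts, abnormal_ts)              # logs: normal to abnormal path time
        return windows
    # No normal flow path: assumed abnormal-to-current window for every issue type.
    return {t: (abnormal_ts, now) for t in ISSUE_TYPES}
```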

Overview and Principle of Intelligent Network Search


⚫ The intelligent network search engine is an application built by iMaster NCE-FabricInsight on the knowledge graph to quickly search real-time network information.

⚫ In the traditional O&M system, information is isolated and configurations, entries, KPIs, and logs are not associated with one another, so multiple searches and manual matching are required. Intelligent network search improves the efficiency of searching network data, including network resource data, device configuration files, device forwarding entries, "1-3-5" troubleshooting issues, device exception logs, KPIs, and intent verification data.

[Figure: search architecture. 1 Data replication service: full and incremental data synchronization from Resource and Issue data (GaussDB), Configuration file and NetDiff data (HDFS), Telemetry data (Druid), and so on. 2 ES (index). 3 Knowledge graph service (knowledge graph). 4 Search service.]


• Technical principles:

▫ Data replication service:

▪ Access mechanisms for multiple data sources (GaussDB, Druid, and HDFS).

▪ Incremental change awareness and incremental synchronization for multi-source, multi-mode data.

▪ Full and incremental data synchronization efficiency, and data compression algorithm.

▫ Search service and Elasticsearch:

▪ Multi-source, multi-mode data source mapping.

▪ Incremental index changes and remote data search efficiency.

▫ Knowledge graph service:

▪ Multi-source, multi-mode data access and TDB query performance.

▪ Search performance and optimization in big data scenarios.

• Note:

▫ HDFS: Hadoop distributed file system

▫ TDB: trivial database
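This material does not detail how the search service queries the Elasticsearch index. As a toy stand-in for the idea of one search across many sources, the sketch below builds an in-memory inverted index over records from several hypothetical sources (resources, issues, configuration files, and entries). A single query such as a VM IP address then returns related objects from all of them. The record contents are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical records replicated from different sources.
RECORDS = [
    {"id": "res-1", "source": "resource", "text": "VM 192.168.1.20 on leaf1 interface 10GE1/0/3"},
    {"id": "iss-7", "source": "issue", "text": "Interface congestion on leaf1 10GE1/0/3"},
    {"id": "cfg-2", "source": "configuration", "text": "leaf1 interface 10GE1/0/3 port link-type trunk"},
    {"id": "arp-9", "source": "entry", "text": "ARP 192.168.1.20 leaf1 10GE1/0/3"},
]

TOKEN = re.compile(r"[\w./:-]+")   # keeps IP addresses and interface names as single tokens


def tokenize(text: str) -> list[str]:
    return TOKEN.findall(text.lower())


def build_index(records: list[dict[str, str]]) -> dict[str, set[str]]:
    """Map each token to the ids of the records that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for record in records:
        for tok in tokenize(record["text"]):
            index[tok].add(record["id"])
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Ids of records containing every query token (AND semantics)."""
    hits = [index.get(tok, set()) for tok in tokenize(query)]
    return set.intersection(*hits) if hits else set()
```

Searching for the VM address returns the resource and the ARP entry. Searching for the interface name returns records from all four sources, which illustrates how a single search replaces several separate lookups.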



Minute-Level Fault Demarcation Based on Intelligent Network Search
The network topology and hierarchical topology fully display the position of the object and information about other associated objects.
Key service data associated with the object is comprehensively displayed.
Potential risks of identified objects are highlighted for attention.


• The network search function searches for resources, entries, configuration files,
issues, and other objects on the network in a unified manner and displays
information such as metrics, associated applications, and entries of target objects,
as well as the recommended correlation analysis result. The search function can
efficiently search for target objects and their associated data, improving O&M
efficiency.
▫ Searches for objects such as devices, boards, interfaces, power modules, fan
modules, optical modules, ARP entries, routing table entries, configuration
files, and issues.
▫ Displays the physical topology and hierarchical topology of target objects.
▫ Displays the recommended correlation analysis result of target objects.
▫ Searches for issues and displays issue details.
▫ Searches for entries and configuration files.
• Scenario:
▫ In an enterprise, the network department receives a fault report from the
service department that an IP service is interrupted, requiring joint
troubleshooting.
• Solution: Search the VM IP address and obtain the comprehensive information
about the VM to quickly locate the failure point. The information includes:
▫ VM access location.
▫ VM access interface status.
▫ Whether congestion occurs on the interface connected to the VM.
▫ Whether changes occur in the configurations of gateways connected to the
VM.
▫ Whether the incoming and outgoing traffic of the VM changes sharply.
▫ Whether the VM frequently goes online and offline.
Contents

1. DCN O&M Challenges and CloudFabric Intelligent DCN O&M Solution

2. iMaster NCE-Fabric

3. iMaster NCE-FabricInsight
▫ Overview

▫ Network Visualization and Health Evaluation

▫ Fault Locating
◼ Change Assurance




Network Snapshot Analysis Principles


[Figure: network snapshot 1 (timestamp 1) is compared with network snapshot 2 (timestamp 2)]

Object | Analysis Data | Characteristics | Analysis Methods
Resource | Device, interface, and link | Structured | First, structure the collected snapshots into <Key/Value> pairs. For an object, Key is its unique ID and Value is its attributes. Identify added and deleted items based on changes of the key, and modifications based on changes of the value.
Protocol | BGP, OSPF | Structured | Same as for resources
Entry | ARP/ND/IPv4 & IPv6 routing table entries | Structured | Same as for resources
Configuration | Configurations during device running | Text | Identify differences between configuration files using the longest common substring (LCS) algorithm.
KPI | Board CPU, memory, error packets, number of lost packets, bandwidth, and traffic trend | Time series data | Identify whether changes are within the expected range based on the KPI change rate.

⚫ A network snapshot is a backup of the data running on a device at a specified point in time. The first snapshot synchronizes the full device data, and each subsequent snapshot is created from incremental changes.

⚫ Changes of resources, protocols, configurations, entries, and KPI trends are managed in real time based on telemetry, and network changes are rapidly detected by comparing snapshots taken at different points in time.


• Note:

▫ LCS: Longest Common Substring
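The snapshot analysis methods described above can be sketched in a few lines. The structured diff compares <Key/Value> snapshots: keys present only in the new snapshot are added, keys present only in the old one are deleted, and keys whose value changed are modified. For configuration text, the slide names an LCS algorithm. The sketch uses the classic longest common subsequence over lines, which is the usual basis of line diffs, so treat that choice as an assumption. Snapshot contents are hypothetical.

```python
def diff_structured(old: dict[str, dict], new: dict[str, dict]) -> dict[str, list[str]]:
    """Classify keys as added, deleted, or modified between two <Key/Value> snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "deleted": sorted(old.keys() - new.keys()),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


def diff_lines(old: list[str], new: list[str]) -> list[str]:
    """Line diff driven by a longest-common-subsequence table ('-' removed, '+' added)."""
    n, m = len(old), len(new)
    # lcs[i][j] = LCS length of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    out, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:                     # common line: keep, emit nothing
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:     # dropping old[i] keeps the LCS longer
            out.append("- " + old[i])
            i += 1
        else:
            out.append("+ " + new[j])
            j += 1
    out += ["- " + line for line in old[i:]]
    out += ["+ " + line for line in new[j:]]
    return out
```

The structured diff runs in time linear in the number of objects. The text diff needs an O(n*m) table, which is acceptable for device configuration files.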



Automatic Check on 16 Changes in Five Dimensions Based on Snapshot Analysis (1)
Four steps of the DCN cutover tool

Step 1: information collection before a change

Create snapshot collection tasks before a change to capture snapshots of multiple items, such as device configurations, ARP entries, ND entries, RIB entries, CPU usage, memory usage, and interface bandwidth.


• Five dimensions: Configurations, Entries, Topologies, Capacities, and Performances

Automatic Check on 16 Changes in Five Dimensions Based on Snapshot Analysis (2)
Four steps of the DCN cutover tool

Step 2: information collection after a change

Automatically synchronize and analyze device configuration and entry snapshots after a change. Manual snapshot synchronization is also supported.


Automatic Check on 16 Changes in Five Dimensions Based on Snapshot Analysis (3)
Four steps of the DCN cutover tool

Step 3: automatic analysis on change results

Compare and analyze data snapshots before and after a change, visualizing the differences of each device.


Automatic Check on 16 Changes in Five Dimensions Based on Snapshot Analysis (4)
Four steps of the DCN cutover tool

Step 4: comparison details about change differences

Display detailed comparisons of configured entries and other dimensions of snapshots before and after a change, identifying configuration changes.


Difficulties in Providing Service-Oriented Assurance Capabilities
How can we monitor the connectivity between subnet 192.168.1.0/24 and the external network?
DCN services change frequently and are verified manually, leading to low efficiency.
⚫ Traditional O&M and verification methods:
▫ Ping/Traceroute:
◼ Unpredictable result: In SDN networking, it is hard to predict which leaf node's gateway responds to the ping.
◼ Incomplete path coverage: Not all ECMP paths can be covered, so a ping test may pass while service packets still cannot be forwarded. In addition, it is difficult to traverse all network-wide services within a limited change window.
▫ Packet mirroring:
◼ High deployment cost: Mirroring must be enabled on every device on the network to implement full-flow mirroring analysis.

[Figure: example network. Internet/WAN, egress router, and firewall; service leaf and border leaf (NVE, loopback); spines; server leaf nodes (DVR, NVE) with vSwitches; a physical server in VPC 1 (192.168.1.10/24), virtual servers in VPC 1 (192.168.1.20/24 and 192.168.2.30/24), and a virtual server in VPC 2 (192.168.10.10/24).]
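Why can a ping succeed while service traffic fails? With ECMP, each flow is pinned to one member path by a hash of its header fields, so a single ping flow exercises only one of the equal-cost paths. The sketch uses CRC32 over the 5-tuple as a stand-in for the switch hash. Real devices use their own configurable hash algorithms and field selections. The addresses reuse the example subnets, and the spine names are hypothetical.

```python
import zlib

# Hypothetical equal-cost next hops from a server leaf.
SPINES = ["spine1", "spine2", "spine3", "spine4"]


def ecmp_pick(src: str, dst: str, proto: int, sport: int, dport: int, paths: list[str]) -> str:
    """Pin a flow to one ECMP member by hashing its 5-tuple (CRC32 is only a stand-in)."""
    key = f"{src}|{dst}|{proto}|{sport}|{dport}".encode()
    return paths[zlib.crc32(key) % len(paths)]


def paths_covered(flows: list[tuple[str, str, int, int, int]], paths: list[str]) -> set[str]:
    """Set of ECMP members exercised by a list of flows."""
    return {ecmp_pick(*flow, paths) for flow in flows}


# One ping flow (ICMP has no ports, modeled here as 0/0) versus many TCP service flows.
ping_flows = [("192.168.1.10", "192.168.10.10", 1, 0, 0)]
service_flows = [("192.168.1.10", "192.168.10.10", 6, 30000 + i, 443) for i in range(200)]
```

The ping always lands on the same member, while service flows with varying source ports spread across several. A fault on any member that the ping does not use stays invisible to the ping test.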


Intent Verification Overview

[Figure: evolution from passive and responsive maintenance to data-based, AI-driven maintenance. Labels: business logic/strategy, intent, playback and prediction over a 00:00 to 23:59 timeline, digital twin (consolidation and integration), real-time awareness and predictive, intent-driven automation, fragmented maintenance, distributed, simplified ultra-broadband infrastructure.]

• The concept of Intent-Based Networking (IBN) emphasizes the service-driven perspective. Intent is the user's input to the system, and the goal is to convert service intents into network configurations. iMaster NCE-FabricInsight therefore obtains the network status through data collection and analysis throughout the running process, and performs closed-loop dynamic adjustment to ensure that the actual behavior of the system is consistent with the service intents.
• Data plane verification (DPV) is used to verify IBN changes. By collecting network data after configuration changes, iMaster NCE-FabricInsight creates a model to check whether the actual network forwarding behavior is consistent with service intents. Based on the verification result, you can check whether the changes meet expectations and whether they cause issues. If an intent verification fails, you can locate the root cause, greatly improving O&M efficiency in network change scenarios.


• DPV builds a model based on the data plane information on the DCN. The data
plane information includes forwarding entries of network devices, such as routing
forwarding entries, ARP entries, VXLAN tunnel connection relationships and
status, VXLAN peer connection relationships and status, as well as physical link
relationships and status on the underlay network. This information reflects the
actual forwarding behavior on the DCN.

• When service configurations change on a network, data on the forwarding plane changes accordingly, which affects service forwarding behavior.

• This is where post-event verification comes in. This function is implemented based on the collection, modeling, and analysis of network data plane information, as well as service intents input by users.

Data Plane Simulation and Verification

[Figure: DPV workflow. Inputs: network data plane information (route, ARP, and VXLAN), network configuration plane information (ACL, VLAN, VNI, BD, and VRF), and network topology information (underlay and overlay). Modeling: overlay network modeling (BD, VRF) and underlay network modeling, followed by algorithm calculation and formal verification. Outputs: network-wide connectivity, secure access/violation isolation between tenants, routing blackholes/loops, configuration consistency, and external access to the internal network without passing through a firewall.]

Data collection: iMaster NCE-FabricInsight quickly collects information about the topologies, configurations, and forwarding entries of the current data center network.
Network modeling: iMaster NCE-FabricInsight models the underlay and overlay networks based on the collected live-network information. It also performs intent calculations by transforming network models into transfer functions.
Intent verification: Solution models returned by the algorithm are displayed as the verification result and root cause of issues in terms of reachability, consistency, isolation, and existence. They are also integrated with network health evaluations to notify users of the intent verification status in a timely manner.
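The following minimal sketch illustrates the data plane verification idea. Forwarding entries collected from each device are modeled as longest-prefix-match tables, and a destination address is traced hop by hop from a source device. The trace ends in one of three states: reachable (a local route), blackhole (no matching entry), or loop (a device is revisited). Device names and tables are hypothetical. Real DPV also models VXLAN tunnels, ACLs, and ECMP, which this sketch omits. It uses only the standard `ipaddress` module.

```python
import ipaddress
from typing import Optional

# Hypothetical FIBs: device -> (prefix, next hop); "local" means delivered on this device.
FIBS: dict[str, list[tuple[str, str]]] = {
    "leaf1": [("192.168.1.0/24", "local"), ("0.0.0.0/0", "spine1")],
    "spine1": [("192.168.1.0/24", "leaf1"), ("192.168.10.0/24", "leaf2"), ("192.168.20.0/24", "leaf3")],
    "leaf2": [("192.168.10.0/24", "local")],
    "leaf3": [("192.168.20.0/24", "spine1")],   # misconfigured: points back to spine1
}


def lookup(device: str, dst: ipaddress.IPv4Address) -> Optional[str]:
    """Longest-prefix match in one device's table; None if nothing matches."""
    best: Optional[tuple[int, str]] = None
    for prefix, next_hop in FIBS.get(device, []):
        net = ipaddress.ip_network(prefix)
        if dst in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, next_hop)
    return best[1] if best else None


def verify(src_device: str, dst: str) -> tuple[str, list[str]]:
    """Trace dst from src_device; return (verdict, devices visited)."""
    addr = ipaddress.ip_address(dst)
    hops: list[str] = []
    device = src_device
    while True:
        if device in hops:
            return "loop", hops + [device]
        hops.append(device)
        next_hop = lookup(device, addr)
        if next_hop is None:
            return "blackhole", hops
        if next_hop == "local":
            return "reachable", hops
        device = next_hop
```

Under the same model, an isolation intent holds when the verdict is not "reachable". Loop and blackhole existence checks amount to running such traces for every relevant prefix.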


Service Functions Supported by Intent Verification (1)


Intent Category | Intent Subcategory | Source
Reachability | [Overlay] East-west reachability verification within a PoD or across PoDs on the same subnet | Customized
Reachability | [Overlay] East-west reachability verification within a PoD or across PoDs within a VPC on different subnets | Customized
Reachability | [Overlay] East-west reachability verification within a PoD or across PoDs between different VPCs, without passing through a firewall | Customized
Reachability | [Overlay] East-west reachability verification within a PoD or across PoDs between different VPCs, passing through a firewall | Customized
Reachability | [Overlay] North-south reachability verification within a PoD or across PoDs: communication between IP addresses of hosts on a fabric and external IP addresses of a fabric | Customized
Reachability | [Underlay] Communication between IP addresses within a fabric or across fabrics | Customized
Reachability | [Underlay] Traffic forwarding according to underlay routes within a fabric, such as communication between BGP peers and between VTEPs of a VXLAN tunnel | Customized
Reachability | Constraint-based forwarding path passing through one node to N nodes | Customized
Reachability | Verification and display of ECMP reachability | Customized
Reachability | Verification of route reachability between BGP peers on the entire network | Preset
Reachability | Verification of route reachability between VTEPs of VXLAN tunnels on the entire network | Preset


Service Functions Supported by Intent Verification (2)


Intent Category | Intent Subcategory | Source
Isolation | Verification of whether two subnets (or IP addresses) are isolated from each other | Customized
Existence | Verification of whether routing loops occur on the network | Preset
Existence | Verification of whether routing blackholes exist on the network | Preset
Consistency | Verification of whether interface configurations on both sides of a link are the same, including the maximum transmission unit (MTU), rate, duplex mode, auto-negotiation mode, working mode, VLAN ID, and IP subnet | Preset
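The consistency intent compares interface settings on both ends of a link. The sketch assumes each end's settings have already been collected into a dictionary. The attribute names mirror the list in the table, and the sample values are hypothetical. The IP subnet check uses the standard `ipaddress` module to compare the networks derived from each interface address.

```python
import ipaddress

# Attributes compared directly (names are hypothetical keys for the table's items).
ATTRS = ["mtu", "speed", "duplex", "autoneg", "work_mode", "vlan_id"]


def link_inconsistencies(a: dict, b: dict) -> list[str]:
    """Return the attributes that differ between the two ends of a link."""
    diffs = [attr for attr in ATTRS if a.get(attr) != b.get(attr)]
    net_a = ipaddress.ip_interface(a["ip"]).network if a.get("ip") else None
    net_b = ipaddress.ip_interface(b["ip"]).network if b.get("ip") else None
    if net_a != net_b:
        diffs.append("ip_subnet")
    return diffs


end_a = {"mtu": 9000, "speed": "10G", "duplex": "full", "autoneg": False,
         "work_mode": "L3", "vlan_id": None, "ip": "10.0.0.1/30"}
```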


Intent Verification: Service Assurance Intent Input


⚫ Reachability and existence intents are preset in iMaster NCE-FabricInsight and user-defined intent rules are also
supported.

Service intent definition
• Source service network segment definition
• Destination service network segment definition
• L4 port range of a service

Network path node definition (for example, inter-subnet traffic needs to pass through a firewall)
• Select the range of nodes covered by the multiple paths taken by traffic (either all specified nodes must be passed through, or only one of them).


• Reachability intent: checks the network connectivity between source and destination IP addresses. On the rule creation page, you can specify verification rules such as the source IP network segment, destination IP network segment, protocol type, port number, and devices through which traffic passes. In terms of access direction, DPV can verify east-west and north-south traffic in a single fabric and across fabrics. In terms of network plane, DPV can verify underlay route reachability and overlay service reachability in a single fabric and across fabrics.

• Isolation intent: checks whether the source and destination IP addresses are
isolated. Isolation intents are generally used for verifying the network policy
compliance. For example, they can be used to check whether the security policies
of firewalls are as expected. The page for creating an isolation intent rule is
similar to that for creating a reachability intent rule, except that you do not need
to set the transit node on the former page.

Intent Verification: Status Query


• Intent verification result overview: The overview area on the Intent Verification
Overview tab page displays the intent pass rate, distribution, and trend. The
intent pass rate distribution is displayed in terms of reachability, isolation,
existence, and consistency. You can switch the time range to view the intent pass
rate trend in a specified time range.

• User-defined reachability intent: In the Intents list, you can click a reachability
intent verification result link to view the detailed verification result.
Quiz

1. (Multiple-answer question) Which of the following are covered by "1-3-5" troubleshooting? ( )
A. Service

B. Device

C. Network

D. Interface

E. Protocol


1. ABCE
Summary
⚫ With the rapid growth of services and traffic, effective, flexible, and fast O&M has become a must. The CloudFabric intelligent DCN O&M solution enables O&M engineers to perform O&M intelligently rather than manually.
⚫ This course describes the multi-dimensional, refined, and visualized O&M capabilities
provided by iMaster NCE-Fabric, helping to solve the problems of mixed physical and virtual
devices, blurred O&M boundaries of network and IT devices, and decoupling of physical and
logical networks. Various intelligent O&M functions provided by iMaster NCE-FabricInsight
are also introduced, including network visualization, network health evaluation, "1-3-5"
troubleshooting, service flow analysis and troubleshooting, and intent verification, with an
aim to solve problems such as traditional passive O&M and difficult fault locating, and
provide ubiquitous application and network assurance.
Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright©2023 Huawei Technologies Co., Ltd.


All Rights Reserved.

The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
