
Intelligent Server Full-Lifecycle Management Software

The document discusses intelligent full-lifecycle management software for servers. It describes challenges with traditional server O&M including incomplete fault management, high costs, and difficulties with edge deployments. The proposed solution, FusionServer, uses technologies like FusionDirector and iBMC for centralized management, monitoring, and intelligent functions to improve efficiency and stability of server O&M.

Uploaded by

ops sks

Intelligent Server Full-Lifecycle Management Software
1 Trends and Challenges of Enterprise ICT O&M

2 Intelligent Full-Lifecycle Management for Servers

3 FusionServer Server Full-Lifecycle Security System

DC Server O&M Evolution

Stability & Security → Services & Productivity

Human labor & scripting
• Standalone script operations
• Batch-operation script
• Manual configuration
• Challenges
  - Low O&M efficiency, high labor costs, and high error rate
  - Fragmented O&M tasks

Tool-based
• Partially tool-based
• Challenges
  - Success rate
  - Stability
  - Performance
  - Standardization

Automated
• Transition from using tools with manual intervention to unattended maintenance
  - Streamlined process and standardization
  - Requires support at architecture level
• Challenges
  - Success rate
  - Architecture support capability

Intelligent
• Automatic processing based on system behavior
  - Massive data collection
  - Massive experience collection
• Challenges
  - Data accuracy and formatted collection of experience
  - Feature extraction based on machine learning

Challenge 1: Incomplete, Inefficient, and Reactive Fault Management

Complex fault types and incomplete coverage
New media and component types as well as product complexity are increasing rapidly, while only about 70% of server hardware alarms (out-of-band) can be managed.
• 70% are out-of-band alarms, covered (CPU core, DDR, SSD, fan, hard disk, power, sensor)
• 30% are in-band alarms, not covered (SSD, hard disk, DDR, NIC)

Difficult to locate
O&M difficulties: personnel disparity, slow response, difficult locating, difficult monitoring, various and complex devices, varied vendors, spare parts

Source: XX data center survey

Challenge 2: O&M Is Costly and Difficult

Long deployment cycle
• 45 person-months: infrastructure planning & design
• 12 person-months: deployment acceptance
• Overlapped subsystems, complicated engineering interfaces, and long construction cycle
• Impossible to simply replicate customization, hardware installation, and software commissioning operations

High power consumption
• 50%: OPEX for electricity expenditure
• ¥9 million: annual electricity fee for a 1000 kW data center
• Traditional data centers have a PUE of about 1.5
• Difficult to access mains supply
• High electricity fees

Low efficiency
• 24/7 manual maintenance
• Siloed data center with separated management and reactive maintenance
• Equipment room depends on 24/7 manual maintenance

Difficult edge management
• Network scale: 10,000+ devices
• Edge locations: extreme cold, seaside, roadside, mountains
• Difficult and costly to manually deploy, maintain, and patrol massive devices
• Frequent updates and iterations of edge services
• Complex deployment environments

Source: XX data center survey. The preceding data is based on the scale of over 2000 servers in XX data center equipment rooms.
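
The electricity figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes that the 1000 kW figure is a constant total facility load running all year, and that the PUE of about 1.5 applies to that same facility; neither assumption is stated in the source, so the results are only indicative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def implied_price_per_kwh(annual_fee: float, facility_kw: float) -> float:
    """Average electricity price implied by a constant facility load."""
    return annual_fee / (facility_kw * HOURS_PER_YEAR)


def overhead_cost(annual_fee: float, pue: float) -> float:
    """Part of the bill spent on non-IT load (cooling, power conversion)."""
    it_share = 1.0 / pue  # PUE = total facility energy / IT equipment energy
    return annual_fee * (1.0 - it_share)


# Figures from the slide: 9 million yuan per year, 1000 kW, PUE of about 1.5
price = implied_price_per_kwh(9_000_000, 1000)
overhead = overhead_cost(9_000_000, 1.5)
print(f"implied price: {price:.3f} yuan/kWh")
print(f"non-IT overhead: {overhead:,.0f} yuan/year")
```

Under these assumptions the implied tariff is roughly 1 yuan per kWh, and about one third of the bill pays for overhead rather than IT load.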

1 Trends and Challenges of Enterprise ICT O&M

2 Intelligent Full-Lifecycle Management for Servers

3 FusionServer Server Full-Lifecycle Security System

4 Application Scenarios and Success Cases

FusionServer Intelligent Computing Server O&M Solution

Software: FusionDirector
Type: Centralized management
Positioning: O&M brain. Centralized server O&M in batches, reduced O&M OPEX, and improved overall stability of servers in data centers.
Features: Intelligent maintenance, intelligent update, intelligent deployment, intelligent discovery, intelligent power saving

Software: iBMC
Type: Out-of-band management
Positioning: Baseboard management system. Independently manages the running environment and hardware status of servers, visualizes server management, and improves server ease-of-use and reliability.
Features: Environment monitoring, fault diagnosis, integration interface, firmware management, configuration management, energy saving management

Software: iBMA
Type: In-band management
Positioning: Assistant management agent. Monitors server hardware from the OS, reduces unexpected OS breakdown, and implements comprehensive hardware management.
Features: Slow disk detection, memory isolation, performance monitoring, hardware monitoring on OS

Management Software Networking

[Figure: management software networking]
• O&M personnel access FusionDirector (centralized management) through the Web portal; an upper-layer OSS connects over Redfish/SNMP.
• FusionDirector connects to the iBMC of each server (Server 1, Server 2, Server 3) over Redfish/SNMP.
• iBMC: onboard out-of-band management on each server.
• iBMA: in-band hardware management in the OS of each server, connected to the iBMC over the Redfish protocol.

iBMA: installed after OS installation. iBMC: shipped onboard with server hardware. FusionDirector: installed on a standalone server or virtual machine (VM).
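
Because every link in this topology speaks Redfish, a client usually starts by listing a resource collection. The sketch below parses the standard Redfish collection layout (`Members` entries carrying `@odata.id`); the sample payload and the `<ibmc-address>` host are hypothetical, and no network call is made, since the source does not describe the concrete resource tree that FusionDirector or iBMC exposes.

```python
import json


def member_uris(collection: dict) -> list[str]:
    """Return the @odata.id of every member in a Redfish resource collection."""
    return [member["@odata.id"] for member in collection.get("Members", [])]


# Hypothetical response body for GET https://<ibmc-address>/redfish/v1/Systems
sample = json.loads("""
{
  "@odata.id": "/redfish/v1/Systems",
  "Members@odata.count": 1,
  "Members": [{"@odata.id": "/redfish/v1/Systems/1"}]
}
""")

uris = member_uris(sample)
print(uris)
```

Each returned URI can then be fetched in turn to read the health and inventory data of one managed system.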

Five Intelligent Technologies Provide Top-level O&M Efficiency

Overall: 30% higher O&M efficiency

• Intelligent maintenance
  - AI-based intelligent fault diagnosis for CPU, memory, drives, and power supplies
  - Fault diagnosis accuracy: 93%
• Intelligent deployment
  - Streamlined deployment of hardware, OSs, and databases
  - On-demand working mode switching
  - 10x higher deployment efficiency
• Intelligent update
  - Automatic version updating
  - Automatic version completeness
  - 20x higher update efficiency
• Intelligent power saving
  - AI-powered, self-tuning energy-saving technology
  - Dynamic Energy Management Technology (DEMT)
  - 18% lower server power consumption
• Intelligent discovery
  - Automatic stocktaking for assets within seconds
  - Real-time track tracing
  - 100% accurate

Intelligent Maintenance: Transforming from Reactive to Proactive O&M

Monitoring
• Fully out-of-band monitoring of hardware faults

Diagnosis
• FDM intelligent fault management engine, 93% diagnosis accuracy for CATERR faults
• Provides online and offline diagnosis tests for CPUs, memory, and hard drives, and automatic software and hardware fault demarcation

Prediction
• Uses AI algorithms to predict risky hard drives 7–30 days in advance
• Among the server hardware faults in data centers, hard drive faults account for 48%+

Isolation
• Isolates DIMM and PCIe faults, minimizing service downtime
• Isolates faulty DIMMs and hard drives to keep services always-on

Intelligent Maintenance: Intelligent Fault Diagnosis with 93% Accuracy

Complete component monitoring
1. Monitors the health status of all hardware components and sensors in an out-of-band manner
2. Supports both in-band and out-of-band data links for fault data collection, whether or not the OS is running normally

Expert diagnosis and prewarning libraries
1. Automatic fault data collection by BMC
2. FusionServer's over 20 years of x86 server fault data accumulation
3. Fault diagnosis by intelligent management chip

[Figure: Information collection diagram. The iBMC receives in-band detection data and out-of-band detection data for the CPU, storage, PCIe, memory, and other components (mainboard, board power supply, links, PSUs, and fan modules).]

[Figure: Fault locating diagram. When a fault occurs, in-band fault collection is used if the system is normal and out-of-band fault collection is used if the system is down. The management chip receives the raw data, parses and analyzes it, and outputs diagnosis and prewarning results using the expert libraries.]

Intelligent Maintenance: AI-Powered Disk Fault Prediction, Ensuring Data Security

Risky disks can be predicted 7 to 30 days in advance.

• Real-time: drive data inference
• Exact match: forecast accuracy ≥ 90%
• Reliable: reduced concurrent disk failure rates
• Efficient: 90% less manpower for fault locating

[Figure: cloud-to-device prediction loop]
① Latest inference model: the FusionServer data lake on the cloud trains fault feature models on million-level drive data, with continuous update of massive samples (self-evolving), and supplies the latest inference model.
② Real-time data collection: the iBMC on live-network servers collects HDD/SSD running data and real-time drive status data.
③ FusionDirector local inference: the inference models and the drive status data produce inference results for real-time fault diagnosis.

Intelligent Maintenance: AI-Powered Memory Self-healing Minimizes Server Breakdowns

Conventional problems
[Chart: causes of system breakdown: memory 61%, others 39%]
• Memory faults have severe impacts and cannot be avoided.
• The system breakdown rate caused by memory faults is up to 61%.
• Memory faults can be rectified only by manual replacement.

AI-powered memory self-healing
① Big data training platform (cloud training): a cloud-based big data training platform builds fault models from millions of training samples.
   Training data: manufacturing data 5 million+, O&M devices 1 million+, O&M memory modules 10 million+
② Fault inference module (local real-time inference): an AI-based fault inference module applies the fault models to memory status data, with continuous self-tuning.
③ Fault self-healing module (imperceptible self-healing): fault pre-warning, soft isolation, and hard isolation provide software and hardware isolation and repair.
④ FusionDirector: displays the inference results.

System breakdown rate: 50%

Intelligent Update: Cloud-Device Collaboration Ensures Latest Software Versions for Devices

Before: frequent manual intervention and high network security risks
• The versions on the live network are incomplete, causing difficulties in troubleshooting and posing high security risks.
• The upgrade plan is manually developed and downloaded, which requires a heavy workload.
• Insufficient concurrent upgrade capability makes upgrades time-consuming and labor-intensive.

Process: R&D engineers → version release → HOUP system on the cloud → FusionDirector management software → servers
1. Automatically push version information
2. Automatically download the version packages
3. Automatically check matching versions
4. One-click automatic batch upgrade

Now: full automation enables 20x higher update efficiency
• Auto detection, auto completeness, auto download, auto upgrade, and end-to-end auto version management.
• With update schedule configuration, O&M engineers do not need to prepare update schemes, download update packages, or execute updates.
• Enhanced batch upgrade capability increases the number of devices that can be upgraded in a batch from 20 to 100, greatly shortening the upgrade time.
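
The effect of the larger batch size alone can be illustrated with a short calculation. The fleet size of 2,000 servers below is a hypothetical example, and the sketch counts only upgrade rounds; the 20x figure quoted on the slide also includes the automation of planning and downloading, which is not modeled here.

```python
import math


def batches_needed(devices: int, batch_size: int) -> int:
    """Number of upgrade rounds required to cover every device."""
    return math.ceil(devices / batch_size)


fleet = 2000  # hypothetical fleet size, not taken from the source
before = batches_needed(fleet, 20)   # batch limit before the enhancement
after = batches_needed(fleet, 100)   # batch limit after the enhancement
print(f"rounds before: {before}, rounds after: {after}")
```

For this fleet the number of rounds drops from 100 to 20, a fivefold reduction from batching alone.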

Intelligent Energy Saving: DEMT 2.0 Reduces Server Energy Consumption by 18%

• DEMT 2.0 is a set of technologies that dynamically and intelligently adjust the power consumption of each component in real time.
• According to statistics, the cost of the electricity consumed by a single server in three to five years is approximately equal to the price of the server. For a customer with 10,000 servers, DEMT 2.0 can save the equivalent CAPEX of 400 servers each year.

Intelligent DEMT 2.0 energy-saving technologies

• Frequency modulation: The CPU working frequency is adjusted based on the actual service load.
• Hibernation: DIMMs, drives, and PSUs with light loads are hibernated based on the actual service load.
• DTS: The fan speed is controlled within the margin range based on the relationship between the CPU performance and temperature margin to reduce the power consumption of servers.
• MPC-PID (enhanced technology): Smart AI algorithms adjust the power consumption and find the lowest power consumption point of servers based on the load, temperature, performance, and fan speed.
• Loss reduction: Three patented technologies
  - Bridgeless PFC
  - Polyphase integration
  - Interleaving

18% higher energy efficiency for servers
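
The claim that 10,000 servers save the equivalent of about 400 servers per year follows from the two figures on this slide: an 18% saving, and electricity over three to five years costing about as much as the server. The sketch below works in units of "one server price"; the specific payback periods tried (3, 4.5, and 5 years) are illustrative choices, not values given in the source.

```python
def servers_saved_per_year(fleet: int, saving: float, payback_years: float) -> float:
    """Annual electricity saving, expressed in server purchase prices."""
    # If electricity over `payback_years` costs one server price, then one
    # year of electricity costs 1 / payback_years of a server price.
    annual_cost_in_server_prices = 1.0 / payback_years
    return fleet * saving * annual_cost_in_server_prices


for years in (3, 4.5, 5):
    saved = servers_saved_per_year(10_000, 0.18, years)
    print(f"payback {years} years: about {round(saved)} servers per year")
```

The quoted 400 servers corresponds to a payback period of about 4.5 years, inside the three-to-five-year range stated on the slide.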

Intelligent Energy Saving: Machine Learning-Based Algorithm Enables the Minimum Device Power Consumption

Technical principles of MPC-PID
Based on machine learning algorithms, AI calculates in real time the target CPU temperature at which a server has the lowest power consumption. By adjusting the fan speed and CPU parameters, the power consumption of the server can be minimized.

[Figure: the iBMC runs MPC-PID, FusionServer's machine learning-based energy saving algorithm. It determines the CPU target temperature, and the PID command is used to adjust the fan speed. Result: minimum server power consumption with no performance loss.]

MPC-PID: FusionServer's machine learning-based energy saving algorithm
• Model Predictive Control (MPC) is a predictive control model that controls processes under specific restrictions.
• Proportional-Integral-Derivative (PID) control adjusts control parameters, such as the proportional gain, the integral gain and time, and the derivative gain and time, to achieve the optimal control effect of the system.

Maximum power saving anytime
[Figure: power consumption comparison over 24 hours of running in the virtualization scenario. Left axis: power consumption (W), 0 to 800. Right axis: load percentage, 0 to 100. Horizontal axis: time, 0 to 24 hours. Series: Industry, FusionServer, Load.]
Note: 24-hour running effect in the virtualization scenario
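
The PID half of MPC-PID can be illustrated with a textbook discrete PID loop that drives fan speed toward a target CPU temperature. This is a minimal sketch, not FusionServer's algorithm: the gains, the 40% base speed, and the 20% to 100% clamp are hypothetical, and the target temperature is simply passed in, whereas in the source it comes from the machine learning model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    kp: float  # proportional gain
    ki: float  # integral gain
    kd: float  # derivative gain
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, error: float, dt: float) -> float:
        """Advance the controller by dt seconds and return the control output."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no history on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def fan_speed_percent(pid: PID, cpu_temp_c: float, target_temp_c: float,
                      dt: float, base: float = 40.0) -> float:
    """Map the PID output to a fan duty cycle clamped to 20..100 percent."""
    error = cpu_temp_c - target_temp_c  # positive when the CPU is too hot
    raw = base + pid.step(error, dt)
    return max(20.0, min(100.0, raw))


controller = PID(kp=2.0, ki=0.1, kd=0.0)  # hypothetical gains
speed = fan_speed_percent(controller, cpu_temp_c=80.0, target_temp_c=70.0, dt=1.0)
print(f"fan speed: {speed:.1f}%")
```

A CPU running above the target raises the fan speed, and one running below it lets the fan slow to the lower clamp, which is where the power saving comes from.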

Intelligent Discovery: Automatic Stocktaking in Seconds and Real-time Track Tracing
Traditional stocktaking of 1,000 cabinets twice a year requires 200 person-days. FusionDirector provides an automatic stocktaking function that completes in seconds, saving US$100,000 each year.

Traditional IT asset management (CMDB/others): major challenges
• Asset stocktaking is difficult and error-prone.
• Low manual maintenance efficiency and high costs
• Asset change and loss risks
• Cabinet space and U space management is difficult, and the usage is low.

FusionDirector management system
Device information + location information
• Device/component information
• Cabinet/UID/SN information from the rack management board and UID sensors
• UID sensor anti-interference technology, leading in the industry

Automated inventory
• Output the stocktaking report in seconds.
• Part stocktaking: CPUs, DIMMs, hard drives, PSUs, RAID controller cards, and fan modules
• Location tracking: The location is updated in real time, and the change track is dynamically displayed.
• 3D display
• Resource display: Zombie server discovery, display of cabinet space and power supply usage, improving usage

Intelligent Deployment: Faster, More Efficient Service Rollout

75% work done by tools; 10x higher rollout efficiency

Streamlined deployment: from hardware planning to upper-layer software (hardware commissioning, server commissioning, software commissioning, acceptance)

Deployment challenges:
• Long device/HANA rollout period
• Complex configuration
• Time-consuming acceptance

FusionServer deployment solution:
• Planning in advance by physical location, SN, or MAC address
• Fixed basic configuration and batch configuration
• OS image, out-of-band deployment
• One-click HANA installation and HA configuration

[Figure: Server deployment and HANA system deployment workflows. Step labels: project planning, asset storage, demand/transfer, asset check, installation and power-on, BMC IP address configuration, interconnection with ManageOne Maintenance Portal, OS installation, automatic OS installation, configuration plan, configuration & check, software and configuration verification, password change in batches, automatic HANA database installation, automatic HANA HA configuration, asset acceptance, onsite acceptance, acceptance report. Steps are marked as manual operations, tool-based (FusionDirector), or tool-based automatic HANA deployment (HANA deployment tool).]

Customer case:
More than 100,000 servers are deployed. IP address allocation, OS deployment, and device acceptance are automated. The rollout efficiency is over 10 times higher.

[Chart: Rollout efficiency (servers/person-day): manual 12, tool-based 150. HANA system deployment time (hours): manual 24, tool-based 3.]

FusionDirector Installation Environment Requirements

• FusionDirector deploys microservices based on the microservice container architecture and deploys "OS+container+microservice" as a large package.
• FusionDirector can be embedded in the MM920 management module of an E9000 server, installation free and plug-and-play.
• FusionDirector supports VM installation and multiple hypervisor VM formats in enterprise scenarios.
• FusionDirector supports deployment with a minimum VM environment of "2C/8 GHz/500GB, single network port" and supports deployment on a laptop. The recommended typical configuration is "4C/8 GHz/500GB, single network port".

VM Type | ISO Format | Version
VMware vSphere ESXi | .ova | 6.0 update3hl, 6.5a, 6.5hl, 6.5u1hl, 6.5u2, 6.7
Microsoft Hyper-V | vhdx.zip | Windows Server 2012 R2, Windows Server 2016
KVM | .qcow2 | CentOS 7.6, RHEL 7.3, RHEL 7.5

[Figure: FusionDirector.iso software stack, bottom to top: Hardware, Host OS, Docker Engine, Bin/Libs, APP, Microservice]

FusionDirector for Device Management

Advantages:
• Supports a wide range of devices, including mainstream FusionServer servers and third-party servers.
• Supports flexible management that allows manual addition, batch import, DHCP, and SSDP automatic discovery.
• Supports display of information about 60 components.
• Supports periodic monitoring over server performance, facilitating performance analysis.
• Supports managing a maximum of 12,000 devices.

Supported devices:
FusionServer: V3 rack servers (RH1288, RH2288, RH2288H, 5288, RH5885, RH5885H, and RH8100); V5 rack servers (1288H, 2288, 2288H, 2298, 2488, 2488H, 5288, 5288X, 5885H, 8100, 1288X, 2288X, and 2288X LBG-1); V6 rack servers (1288H, 2288H, and 2488H); and E9000 chassis with MM910, MM920, or MM921 management modules
Atlas: heterogeneous servers (G560, G530 V5, G560 V5, and G2500); intelligent edge hardware (Atlas 500 AI edge station and Atlas 200 SoC); Atlas 500 AI servers; Atlas 800 AI servers (Atlas 800 3000, Atlas 800 3010, Atlas 800 9000, and Atlas 800 9010); and Atlas 900 AI clusters (Atlas 900 compute node)
TaiShan: TaiShan 100 servers (2280, 5280, and 2280K); TaiShan 200 servers (1280, 2180, 2280, 2280E, 5280, X6000 XA320, 2180K, 2280K, 2480, and 5290)
Third-party servers (advanced features supported): Foxconn servers (NW8220 and NW8220X); H3C servers (S920X00, H3C UniServer R5360 G3, and S920X05); Ramaxel servers (RS221G1, RS221G1H, R521G1X, and R421G1Q); Digital China servers (KunTai A924); QuantaGrid (D52BM-2U); HUAQIN (H8001D and P6220)
Third-party servers (alarm monitoring supported only): HPE ProLiant DL380 G10, HPE ProLiant DL380 G9, HPE ProLiant DL380 G8, Dell PowerEdge R740, Great Wall R5215_G11, Inspur NF5280M5, NVIDIA DGX-2, H3C R4900 G3, MiTAC NV680, and NV681

Function | Operation Duration (Minutes/Server) | Operation Duration (Minutes/20 Servers) | Operation Duration (Minutes/100 Servers) | Minimum Configuration (PCS) | Typical Configuration (PCS) | Three-Node Configuration (PCS)
Add rack servers | 0.5 | 1 | 5 | 2000 | 5000 | 12000

Northbound Interface (NBI)

FusionDirector provides open NBIs, uses the Redfish protocol, and supports SSL to implement seamless interconnection with upper-layer network management systems.

[Figure: an upper-layer NMS connects to FusionDirector (with its DB) through a Redfish query/configuration management interface and an SNMP alarm interface. FusionDirector connects over Redfish and SNMP to the iBMCs of KunLun, FusionPoD, and FusionServer Pro devices.]

Interface Type | Description
Information query | Queries server health/online status and basic information (SNs, configuration, RAID, etc.); obtains the infrastructure list.
Setting/Configuration | Configures server power-on and power-off, IP addresses, virtual media URIs, and device connectivity testing.
OS deployment | Deploys FusionServer server OSs.
Firmware upgrade | Upgrades FusionServer server firmware such as BMC, BIOS, and PCIe.
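
A power-on or power-off request of the kind listed under Setting/Configuration can be sketched with the standard Redfish `ComputerSystem.Reset` action. The path and `ResetType` values below come from the generic DMTF Redfish schema; the source does not document the exact resource tree of the FusionDirector NBI, so the path may differ there, and `build_reset_request` is a hypothetical helper that only constructs the request without sending it.

```python
import json

# A subset of the ResetType values defined by the Redfish schema.
ALLOWED_RESET_TYPES = {"On", "ForceOff", "GracefulShutdown", "ForceRestart"}


def build_reset_request(system_id: str, reset_type: str) -> tuple[str, str]:
    """Return the (path, JSON body) of a Redfish reset action request."""
    if reset_type not in ALLOWED_RESET_TYPES:
        raise ValueError(f"unsupported ResetType: {reset_type}")
    path = f"/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    body = json.dumps({"ResetType": reset_type})
    return path, body


path, body = build_reset_request("1", "On")
print("POST", path)
print(body)
```

A real client would send this as an authenticated HTTPS POST and check the returned status code or task resource.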

iBMC Overview

Intelligent Baseboard Management Controller (iBMC), a server lifecycle management system
Functions: O&M, monitoring, deployment, upgrade management

BMC hardware: Intelligent management chip
• CPU: 1710 single-core 800 MHz; 1711 quad-core 1 GHz
• Other ports: 1 dedicated GE port for BMC management, 1 VGA port, and 1 RMII

BMC software: FusionServer's proprietary embedded server management system
• Industry standards: IPMI 1.5/2.0, SNMP, Redfish
• Key features: Provides out-of-band management functions, such as monitoring, diagnostics, configuration, agentless, and remote control.

BMC Chip: Secure, Intelligent, Mobile, and Highly Integrated Platform Management

Application system architecture of the BMC chip
[Figure: management chip with AI Core x2, Computing Core x2, Secure Module, and Secure Boot. Other labels: x86/ARM, southbridge, eSPI/LPC, PCIe, USB, remote KVM/VNC, VGA, DIMM, DDR4, DDR, eMMC, flash, SFC boot, SPI, BIOS, NC-SI, NIC, ADC, I2C/GPIO, fan module/PSU, and FRUs such as RAID controller card/SSD card/NIC.]

Key competitive edges

Intelligent O&M
• Built-in machine learning engine, implementing intelligent O&M on a single server
• Integrates various O&M interfaces and provides massive real-time machine data to implement system-level intelligent O&M

Security features
• Independent security core + security engine supporting trusted boot
• Built on FusionServer's controllable trusted root inside the chip
• Independent security core + security engine provides the public cloud tenant identity authentication function. The key cannot be stored or used outside the chip.

Rich RAS functions
• Integrated black box, ensuring no loss of key system information
• Integrates the intelligent running monitoring engine to implement system recording and blue screen capture

iBMC V2 Key Feature Overview

iBMC intelligent server management system: iBMC V2 key features

1. Monitoring and intelligent diagnostics
Provides comprehensive monitoring of hardware health and intelligent, accurate, OS-independent fault diagnostics.

2. Zero-touch O&M
Multiple remote control tools, such as KVM, SOL, virtual media, and web access

3. Standardization and ecosystem
Complies with standard management protocols and mainstream management software, and implements automatic, intelligent server management.

4. System security management
Provides comprehensive protection from the hardware layer, system layer, application layer, and access layer to ensure system security and reliability.

Monitoring and Intelligent Diagnostics

Implements 360° monitoring and diagnostics to ensure stable service system running.

Status monitoring
• Monitors CPUs, memory, drives, PCIe cards, RAID controller cards, fans, PSUs, temperature, and voltage.

Diagnostics
• CPU CATERR events
• PSU faults
• Overtemperature alarms
• Fans with potential risks
• Storage medium faults

Diagnosis assistance
• Power-on and power-off video recording
• Serial port audio recording
• Breakdown screenshot
• Breakdown video recording

Running record
• Blackbox - Kbox
• Analytical tool - hwkbox

Zero-Touch O&M

Implements full-lifecycle server management over network anytime, anywhere.

Deployment
• OS installation without CD-ROM drive
• Out-of-band network deployment
• Tool for batch deployment

Configuration
• Out-of-band RAID configuration
• Configuration import/export
• Support for stateless computing

Upgrade
• Package-free upgrade
• Upgrade taking effect immediately
• Driver update
• Upgrade rollback

Maintenance
• Power control
• Location positioning (UID)
• Offline diagnosis
• Query of startup self-check code

Various remote access and management tools: Virtual KVM, virtual media, SOL, SSO, Web UI, NCSI, agentless expansion, programmable management API

Various E2E Remote Maintenance Tools, Making Devices Accessible

Mobile app (SmartServer)
• Local or remote access at any time
• Initial IP address configuration
• Alarm monitoring for routine O&M
• UID indicator control

KVM
• HTML5
• Standard VNC tool
• JAVA-independent, JRE-dependent
• Compatible with mainstream browsers

Virtual media
• Virtual DVD-ROM drive, virtual folders, and USB key
• VMM path configuration interface
• JNLP start mode, without JRE

Web access
• HTTPS-based visual management interface
• Mainstream browsers (Internet Explorer, Chrome, Firefox)
• Alarms via e-mail notifications
• Graphical user interface (GUI)

SOL
• SSH to SOL changeover
• Serial port audio recording

Convenient Out-of-Band RAID Configuration: Configuring RAID on the BMC GUI

Out-of-band RAID monitoring configuration effectively resolves the automation environment preparation problem
during the OS installation, shortening RAID configuration time by 50%.

HTML5-based KVM Eases Operations

VNC KVM
• Compatible with mainstream VNC desktop tools
• Supports raw device multi-tenant scenarios

HTML5 KVM
• JAVA-free, ActiveX-dependent
• Compatible with mainstream web browsers
• Supports automatic mouse synchronization

Chip-based System Security Protection: iBMC implements data and access security protection for the service and management systems.

Hardware layer (secure engine): Ensure system security from the source to prevent data tampering
• 100% BIOS source code
• Provides TCM/TPM interface and supports TPM2.0/SMX algorithms
• BMC image storage prevents malicious tampering on firmware

Data layer: Ensure data storage and transmission security
• Supports HTTPS, SSH, SFTP, and LDAPS
• Uses encryption algorithms AES128, AES256, and RSA2048

Access layer: Ensure access security by multiple authentication mechanisms
• Access validity verification
• Principle of least privilege
• Two-factor authentication

iBMA Assists iBMC to Implement Comprehensive Hardware Monitoring

iBMA is the OS agent of FusionServer servers. It monitors and manages the hardware on the OS and assists the iBMC to monitor the hardware status in a 360-degree manner.
Key functions include memory fault isolation, hard drive health monitoring, performance data collection, and hardware log reporting.

System compatibility (OS)
• Red Hat
• SUSE
• CentOS
• Ubuntu
• Citrix
• Windows
• VMware ESXi
• EulerOS
• OpenStack

Information collection
• RAID controller card & hard drive
• NIC and optical module
• HBA & CAN
• FPGA
• GPU
• NVMe
• IB card

iBMC service
• iBMC service to in-band (SSH/HTTPS)
• iBMC event in-band reporting (BoB SNMP Trap)
• iBMC event to in-band storage

Performance collection
• CPU usage
• Memory usage
• Hard drive usage
• Network port usage

In-band management
• Driver update
• iBMA automatic upgrade
• Software management

Fault check
• RAID, hard drive, and BBU health
• Link bit error
• NVMe health
• PCIe card connection
• Memory fault isolation

iBMA In-Band Fault Management Improves Fault Management Capabilities

Memory fault management
• Obtains and reports memory alarm information
• DDR4 memory fault isolation
• 15% less unexpected system breakdown

Drive fault management
• Reads SSD hard drive wear value information
• Hard drive SMART information reading and alarm reporting
• SAS and SATA hard drive fault prediction
• Hard drive health check
• Predicts hard drive faults one week in advance to reduce loss

OS fault management
• Hardware fault information collection
• The black box function saves breakdown information when the OS breaks down, helping locate software and hardware faults.
• Automatic restart when the operating system is suspended

1 Trends and Challenges of Enterprise ICT O&M

2 Intelligent Full-Lifecycle Management for Servers

3 FusionServer Server Full-Lifecycle Security System

4 Application Scenarios and Success Cases

FusionServer Server Full-Lifecycle Security System

IT infrastructure security across five stages:

Secure startup: ultimate firmware protection
• Secure boot based on chip-level root of trust (RoT) and integrated with RoT security core that cannot be tampered with.
• Signature authentication for software package building and release, preventing information leakage and tampering.

Secure runtime: runtime detection and protection
• iBMC and BIOS platform firmware resilience.
• Management for keys, certificates, accounts and passwords; security isolation and least privilege; stack protection; vulnerability exploitation prevention.

Secure data flow: internal server data protection
• Industry-recognized secure communication protocols.
• Industry's latest secure encryption algorithms.

Security compliance: compliant with laws and regulations
• CC EAL 2+ security certification.
• ISO 27001 management system certification.

Secure decommissioning: secure data processing
• Secure offline drive erasure; nine rounds of full-drive deep erasure.

Secure Boot Based on Chip-Level RoT
Integrated RoT security core is safe and cannot be tampered with.

Proprietary BMC chips: integrated RoT security core
[Figure: chip RoT containing a security core, a security engine, and EFUSE.]

Chip-level trusted root secure boot

Secure boot process:
[Figure: a chain of "Verify" steps that starts at the chip RoT. Stage labels: BootROM, Daemon FW, BMC uboot, BMC OS, Southbridge FW, HOST BIOS, OS BOOT, HOST OS.]

Secure boot architecture:
[Figure: chip RoT (security core, security engine, EFUSE), x86 bootROM, PCH, SSD, MUX, BMC Flash, BIOS FLASH.]
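
The idea of a verification chain rooted in the chip can be sketched as follows. This is a simplified illustration under stated assumptions, not the FusionServer implementation: it compares SHA-256 digests against a trusted table, whereas a real secure boot verifies digital signatures with keys anchored in hardware, and the stage names and image contents are hypothetical.

```python
import hashlib
from typing import Optional


def digest(image: bytes) -> str:
    """SHA-256 digest of a firmware image, as a hex string."""
    return hashlib.sha256(image).hexdigest()


def first_untrusted_stage(stages: list[tuple[str, bytes]],
                          trusted: dict[str, str]) -> Optional[str]:
    """Walk the boot chain in order and stop at the first stage whose
    digest does not match the trusted value. Returns None if all pass."""
    for name, image in stages:
        if trusted.get(name) != digest(image):
            return name  # boot must halt here; later stages never run
    return None


# Hypothetical boot chain and trusted digest table
stages = [("bmc_uboot", b"uboot-v1"), ("bmc_os", b"os-v1"), ("bios", b"bios-v1")]
trusted = {name: digest(image) for name, image in stages}

print(first_untrusted_stage(stages, trusted))
```

Because each stage is checked before it runs, tampering with any one image stops the boot at that point instead of passing control to modified code.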

Secure Boot: Signatures for Software Package Building and Release, No Information Leakage and Tampering

[Figure: Trusted upgrade. Version development applies an encrypted signature for the upgrade package; the software version is published on the Support website with an automatic signature for the release package; customers download it over untrusted networks (Internet); the BMC performs verification and upgrade. Access protection: account authentication, authentication and encrypted communication with the BMC, and a runtime lock on the BIOS flash.]

Protection for software version data security
• The software compilation process is encrypted and signed to ensure that data packets are not tampered with or disassembled by malicious users.
• The software integrity is verified with a digital signature when it is released on the Support website.
• Customers can verify the signature after downloading the software package.
• The signature of the upgrade package is verified during the upgrade.

Protection for access data security
• Local access: BIOS accounts are authenticated locally, and configuration changes are logged. The BIOS flash is locked when the BIOS is running.
• Remote access: All external access protocols of the BMC use enhanced encryption for communication, requiring authentication for access and login.

36
Secure Running: Server Runtime Detection and Protection

BMC and BIOS platform firmware resilience
• The BMC and BIOS firmware integrity is protected. Digital signatures of upgrade packages are verified based on the hardware RoT, preventing the firmware code from being maliciously tampered with.
• The BMC and BIOS firmware integrity is checked based on the hardware RoT to detect unauthorized malicious modification to the firmware.
• The BMC and BIOS can be automatically recovered. The firmware has backup images. If malicious modification to the firmware is detected, it can be immediately recovered to the backup image version.

[Figure: Firmware resilience cycle of protection, detection, and restoration. Firmware images: BMC/BIOS Image Active, BMC/BIOS Image Gold, BMC/BIOS Image Temp.]

Runtime detection and protection
• Key management: Provides hierarchical full-lifecycle management for keys to reduce the risks of password cracking or data leaks.
• Certificate management: Supports issuing device certificates on the customer's live network. Device certificates are managed in a unified manner.
• Account and password management: Manages man-machine system accounts and changes account passwords in batches.
• Security isolation and least privilege: Minimizes process and OS account permissions; sensitive processes are strongly protected and isolated; external uncontrollable input is isolated using the sandbox.
• Stack protection and vulnerability exploitation prevention: Uses the GCC and VS security options to prevent buffer overflow attacks.
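
The detect-and-restore behavior described for the active and gold firmware images can be sketched as follows. This is a minimal illustration only: the slide states that integrity is verified with digital signatures anchored in the hardware RoT, whereas this sketch substitutes a plain SHA-256 digest comparison, and the names `check_and_recover`, `images`, and `trusted_digest` are hypothetical.

```python
import hashlib


def digest(image: bytes) -> str:
    """Return the SHA-256 digest of a firmware image as a hex string."""
    return hashlib.sha256(image).hexdigest()


def check_and_recover(images: dict, trusted_digest: str) -> str:
    """Detect tampering in the active image and restore it from the gold image.

    Returns "ok", "recovered", or "failed".
    Assumption: a digest comparison stands in for RoT-based signature checks.
    """
    if digest(images["active"]) == trusted_digest:
        return "ok"
    # Only restore from a gold image that itself matches the trusted digest.
    if digest(images["gold"]) == trusted_digest:
        images["active"] = images["gold"]
        return "recovered"
    return "failed"


gold = b"firmware v1.0"
images = {"active": b"firmware v1.0 + implant", "gold": gold}
status = check_and_recover(images, digest(gold))
print(status)
```

In this sketch the modified active image is detected and replaced by the gold copy; a real BMC would perform the equivalent check in hardware-protected code before the firmware is allowed to run.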

37
Data Flow Security: Secure Protocols Mitigate Information Leakage Threats

• FusionServer Networking Security Competence Center keeps up with the latest security trends in the industry and continuously improves protocol security hardening standards.
• Insecure communication protocols are unacceptable, and secure protocols and security configurations are supported.

Security hardening
• Basic security mandatory requirements
• Web security development standards
• Apache hardening standards
• OS hardening standards

Industry-recognized secure communication protocols
Unacceptable | Acceptable
Telnet (plaintext transfer) | SSH (encrypted transfer)
FTP (plaintext transfer) | SFTP (encrypted transfer)
IPMI 1.5 (plaintext transfer) | IPMI 2.0 (encrypted transfer)
SNMPv1/v2c (plaintext transfer and weak identity authentication) | SNMPv3 (encrypted transfer and strong identity authentication)
HTTP (plaintext transfer, no identity authentication on the server side) | HTTPS (encrypted transfer and server certificate verification)
LDAP (plaintext transfer, no identity authentication on the server side) | LDAPS (encrypted transfer and server certificate verification)
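
The unacceptable/acceptable protocol pairs listed above lend themselves to a simple audit check. The following is a hedged sketch, not FusionServer tooling: the mapping is copied from the slide, while the function name `audit_protocols` and the input format (a list of enabled protocol names) are assumptions made for illustration.

```python
# Insecure protocol -> secure replacement, taken from the protocol pairs above.
SECURE_REPLACEMENT = {
    "telnet": "ssh",
    "ftp": "sftp",
    "ipmi1.5": "ipmi2.0",
    "snmpv1": "snmpv3",
    "snmpv2c": "snmpv3",
    "http": "https",
    "ldap": "ldaps",
}


def audit_protocols(enabled):
    """Return {insecure_protocol: suggested_replacement} for enabled protocols."""
    findings = {}
    for name in enabled:
        # Normalize case and spacing so "IPMI 1.5" matches "ipmi1.5".
        key = name.lower().replace(" ", "")
        if key in SECURE_REPLACEMENT:
            findings[name] = SECURE_REPLACEMENT[key]
    return findings


findings = audit_protocols(["SSH", "Telnet", "SNMPv2c", "HTTPS", "IPMI 1.5"])
print(findings)
```

Secure protocols pass through without a finding; each insecure one is reported together with the replacement named on the slide.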

38
Data Flow Security: Secure Algorithms Prevent Information Leakage

FusionServer tracks the latest industry security developments and maintains cryptographic algorithm application specifications.
Insecure encryption algorithms are unacceptable, and only industry-recognized secure algorithms are acceptable.

[Figure: KVM over IP management channel encryption. USB data, keyboard and mouse data, and displayed data are carried as encrypted Ethernet data.]

Encryption algorithm specifications
• Basic security requirements
• Cryptographic algorithm application specifications
• Key management specifications

Industry-recognized secure encryption algorithms
Unacceptable | Acceptable
DES/3DES (can be cracked within one day by existing brute-force cracking devices) | AES128/AES192/AES256
RC2/RC4 (have vulnerabilities that allow plaintext to be recovered from encrypted information) | AES128/AES192/AES256
MD2/4/5 (two pieces of data with the same hash value can be constructed) | SHA256
RSA1024 (claims exist that the algorithm has been cracked) | RSA2048
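
The brute-force argument against DES can be made concrete with a short calculation. This sketch relies on one well-known fact that is not on the slide, namely that single DES has a 56-bit effective key; it computes the key-search rate needed to exhaust that keyspace in one day and then applies the same rate to a 128-bit AES key. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def keys_per_second_needed(key_bits: int, days: float = 1.0) -> float:
    """Rate required to try every key of the given length within `days`."""
    return 2 ** key_bits / (days * SECONDS_PER_DAY)


def days_to_exhaust(key_bits: int, keys_per_second: float) -> float:
    """Days needed to try every key of the given length at a fixed rate."""
    return 2 ** key_bits / keys_per_second / SECONDS_PER_DAY


des_rate = keys_per_second_needed(56)      # single DES: 56-bit effective key
aes_days = days_to_exhaust(128, des_rate)  # same hardware against AES-128
print(f"{des_rate:.3e} keys/s, {aes_days:.3e} days")
```

A device that exhausts the DES keyspace in one day tests roughly 8.3 × 10^11 keys per second. At that rate a 128-bit keyspace takes 2^72 days, which is why key length alone moves AES out of brute-force range.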

39
Secure Decommissioning: Secure Data and Infrastructure Processing and Secure Offline
Drive Erasure

Supports data erasure for drives and NVMe drives managed by RAID controller cards, software RAID, and southbridge
pass-through (AHCI).

Fast mode: erasure of a logical drive partition
Safe mode: deep erasure (nine rounds) of a physical drive
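
The idea behind multi-round erasure can be illustrated at file level. This is only a sketch under stated assumptions: the slide specifies nine rounds but not the data patterns, so the choice of random data followed by a final all-zero pass is hypothetical, and the function `overwrite_file` operates on an ordinary file rather than on a physical drive behind a RAID controller. On SSDs, overwriting a file does not guarantee that the old blocks are gone, because of wear leveling.

```python
import os
import tempfile


def overwrite_file(path: str, rounds: int = 9, chunk: int = 1 << 20) -> None:
    """Overwrite a file in place `rounds` times, ending with an all-zero pass."""
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for i in range(rounds):
            last = i == rounds - 1
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                # Random data on intermediate passes, zeros on the final pass.
                f.write(bytes(n) if last else os.urandom(n))
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push each pass to storage


# Demonstration on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sensitive data" * 100)
    path = tmp.name
overwrite_file(path)
with open(path, "rb") as f:
    erased = f.read()
os.remove(path)
```

After the call, the file has the same length as before but contains only zero bytes.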

40
Thank you.
Fusion X, Digital Infinity

Copyright©2021 xFusion Digital Technologies Co., Ltd.


All Rights Reserved.

The information in this document may contain predictive


statements including, without limitation, statements regarding
the future financial and operating results, future product
portfolio, new technology, etc. There are a number of factors that
could cause actual results and developments to differ materially
from those expressed or implied in the predictive statements.
Therefore, such information is provided for reference purpose
only and constitutes neither an offer nor an acceptance. xFusion
may change the information at any time without notice.
Intelligent Deployment:
Flexible Working Mode Switching, Enabling Smooth Application Running

Present situations and challenges
• Diversified scenarios
• More than 80% of servers do not work in optimal mode.
• There are strict requirements on server optimization and on O&M engineers' capabilities.
• Services change dynamically, and the working mode cannot be adjusted as and when needed.

One-click switchover, easy to use
[Figure: Scenarios (HPC, Big Data, Web server, Distributed DBs, Virtualization) are mapped through out-of-band control (iBMC and BIOS, backed by a best practice expert library) to hardware settings (CPU, Memory, PCIe, Other).]
• High performance, reliability, and storage throughput, and low power consumption and latency
• Easy to use: one-click switchover to the optimal configuration combination
• One-click configuration
• 10 best practice scenarios
• 20-year expert library
• 10,000+ POC tests

42
Intelligent Care: Connect to FusionServer Service Center for Proactive Customer Care

[Figure: On the customer side, FusionServer servers report to FusionDirector and the eService front-end, with a customer mailbox and a customer engineer (supervision). Data crosses the Internet by encrypted transmission through an e-mail server to the FusionServer side, which comprises the eService back-end, GTAC, and the FusionServer technical support center. Work orders are dispatched to FusionServer frontline engineers and FusionServer service engineers, who use a dedicated VPN channel for remote processing.]

FusionServer's Call Home function reduces information transfer costs and improves customer experience.
• Create service orders automatically
• Transfer alarms to FusionServer
• Customer authorization and supervision
• Handle problems quickly
43
Edge Device Management:
Unified Management of over 10k Edge Devices and Central Devices

[Figure: Atlas management network. FusionDirector at the center connects over a private line or public network to edge nodes (Edge Node 1 to Edge Node N). Maximum management capability of 10k edge devices + central devices.]

Features

Comprehensive
• Monitoring and management of edge devices
• Hardware information, system status, and application information

Secure
• Fine-grained management of user rights and customized role rights
• Rights- and domain-based management for 1000 groups
• TLS 1.2 encrypted transmission
• Edge + center two-way authentication for management

Simple
• Automatic device management
• Batch configuration import and export (configuration restoration after spare parts replacement)
• Automatic firmware upgrade

44
Edge Service Management:
Center-Edge Collaboration for Unified Application Provisioning and Management

Background and Challenges: A large number of devices; frequent service model iteration; devices deployed in remote areas

• Distributes central management containers and images based on the service type of the edge device, implementing fine-grained and targeted distribution of images.
• Provides web management interfaces for users and REST management interfaces for third-party platforms.
• Pre-installs the agent of the edge device in factory.

[Figure: A third-party ModelArts/service model management platform uses the northbound interface for secondary development to manage container applications and upload image files to FusionDirector (edge application management and image repository). FusionDirector pushes edge application management commands to the edge node; the edge node reports container application status information and pulls image files. The edge node runs the cloud-edge collaborative agent software, Docker Engine, and the APPs on a Linux OS.]

45
1 Trends and Challenges of Enterprise ICT O&M

2 Intelligent Full-Lifecycle Management for Servers

3 FusionServer Server Full-Lifecycle Security System

4 Application Scenarios and Success Cases

46
Usage Instruction

• Internal cases: Applicable to point-to-point communication with customers.
• External cases: Applicable to point-to-point communication with customers and bidding cases.
• Case publicity: For use by HQ marketing only.
• Cases used in marketing activities: The case materials must be provided or confirmed by the HQ marketing team.

47
Finance Internal

HengFeng Bank's Digital Transformation with FusionDirector


Challenges
• The goal of HFBank's digital transformation is to transform from an AI platform to a digital ecosystem of finance. The focus of financial services has shifted from B2B to B2C, which places higher requirements on the fast, stable, and efficient running of servers. HFBank's national data center is located in a three-floor equipment room. To meet service requirements, more servers were added. However, the huge O&M workload was handled by a team of only about a dozen engineers. It took about 15 days to manually count assets.
• With limited manpower, responding quickly to O&M requirements such as status monitoring and fault alarms, and performing real-time O&M management efficiently on a large server cluster, became a major pain point.

Solution
• The project used FusionDirector, the intelligent server management software, to seamlessly integrate with the customer's management software platform and perform full-lifecycle O&M for XXXX FusionServer intelligent servers.
• FusionDirector enables intelligent management of deployment, faults, energy efficiency, versions, and assets, resolving the O&M pain points described above.

Customer Benefits
• Intelligent fault management: Provides fault warning 7 to 30 days in advance, automatically diagnoses faults (with an accuracy of 93%), and automatically isolates faults, helping prevent or quickly rectify faults online and reducing unexpected downtime by about 46 hours per year.
• Intelligent asset management: Completes networkwide inventorying in seconds, and saves about 88.8 person-days each year compared with manual asset management.
• Intelligent deployment: Implements pipeline-style deployment. In 2020, XXX servers were deployed using FusionDirector, saving about 33 person-days.
• Intelligent energy efficiency management: Reduces electricity fees by about USD485,000 for the customer each year.
• FusionDirector reduces OPEX by about 15% (USD127,000/year).

HengFeng Bank (HFBank) is one of the 12 national joint-stock commercial banks in China. It runs 18 level-1 branches and a total of 306 branches in China. In addition, 5 rural banks have been set up. HFBank is also expanding its presence inside and outside China. It was ranked No. 143 by The Banker magazine in the 2016 Global Top 1000 Banks. HFBank ranks 7th among national commercial banks in China and No. 3 among national joint-stock commercial banks. In terms of profitability, HFBank ranks 5th in the Asian bank competitiveness study released by the Chinese University of Hong Kong.

[Data description] According to the ITIC 2020 Global Server Hardware and Server OS Reliability Report, the average availability of FusionServer intelligent servers is 99.9997%. The figures above assume the labor cost is USD38 per hour. FusionDirector can reduce the unexpected downtime of servers and edge devices by 90%.
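
To put the quoted 99.9997% availability in perspective, it can be converted into expected downtime per year. The sketch below assumes a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_minutes(availability_percent: float) -> float:
    """Expected downtime per year, in minutes, for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


minutes = annual_downtime_minutes(99.9997)
print(f"{minutes:.2f} minutes per year")  # prints "1.58 minutes per year"
```

An availability of 99.9997% therefore corresponds to about a minute and a half of downtime per server per year.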

48
Carrier Internal
FusionDirector Facilitates VEON's Digital Transformation
Challenges
• The goal of VEON's digital transformation is to transform from an NSP to an ISP. Therefore, VEON pays special attention to the reliability and availability of devices. VEON continues to strengthen its big data analysis technology and is developing a big data analysis platform. VEON also has high expectations for quick deployment, update, and real-time monitoring of applications, and requires CAPEX optimization.
• From 2020 to 2024, VEON will continue to purchase a large number of servers to meet the transformation requirements. As the number of servers keeps increasing and O&M personnel are insufficient, continuing to meet customers' high expectations for device reliability and availability while reducing the project OPEX is a great challenge.

Solution
• FusionServer series intelligent servers won VEON's 5-year framework contract (2020 to 2024) with excellent performance test results. The total number of servers is XXXX. FusionServer intelligent servers fully meet the customer's requirements for internal IT and B2B services, mission-critical service systems, and software-defined storage solutions.
• The project uses the FusionDirector intelligent server management software to implement full-lifecycle O&M for FusionServer, including intelligent management of faults, deployment, energy efficiency, and versions, continuously ensuring device efficiency and stability and reducing the project OPEX.

Customer Benefits
• Intelligent fault management can detect faults 7 to 30 days in advance, automatically diagnose and isolate faults, and help avoid or quickly rectify faults online. It is estimated that the unexpected downtime will be reduced by about 188 hours from 2020 to 2024.
• Compared with manual operations, intelligent asset management and deployment are expected to save about 1,119 person-days (USD340,000) from 2020 to 2024.
• The intelligent version management function implements automatic management and upgrade, and one-click quick update. The procedure is reduced from 20 steps to 3 steps.
• It is estimated that FusionDirector will help reduce the OPEX by about USD1.987 million during the use period from 2020 to 2024.

Founded in 1992, VEON is headquartered in Amsterdam, the Netherlands. With 41,994 full-time employees, VEON is the world's seventh largest telecom operator and the world's sixth largest mobile network operator. It provides a series of voice and data services, including traditional, mobile broadband, and fixed network services. VEON aims to transform from an NSP to an ISP. Since the end of 2017, VEON APP has gone live on all its 12 networks and become a B2B2C platform. In addition, VEON develops APP partners and a big data analysis platform.
[Data description] According to the ITIC 2020 Global Server Hardware and Server OS Reliability Report, the average availability of
FusionServer intelligent servers is 99.9997%. The figures above assume the labor cost is USD38 per hour. FusionDirector can reduce
the unexpected downtime of servers and edge devices by 90%.

49
Finance Internal

Sberbank's Digital Transformation with FusionDirector


Challenges
• The strategic focus of Sberbank's digital transformation is to launch OCP online banking to digitize sales and services, and to apply the FRMP financial and risk management platform to improve the efficiency of information and data flows. Sberbank provides services for 1.3 million enterprise users and 250 million accounts. The data center scale is large, with leased equipment rooms. Tight manpower poses great challenges to the stable and efficient O&M of the project.
• The customer has specific requirements on intelligent asset and fault management.

Solution
• The customer uses a FusionServer service team (6 persons) to support the full-lifecycle O&M of 12,000 FusionServer servers and 300 storage devices through the functional modules of FusionDirector.
• The powerful intelligent fault and asset management functions enable stable and efficient O&M, helping customers easily cope with challenges and reduce project OPEX.

Customer Benefits
• Intelligent fault management generates warnings 7 to 30 days in advance, and automatically diagnoses and isolates faults to help prevent or quickly rectify faults online. The intelligent fault management function of FusionDirector reduced the unexpected downtime by about 985 hours from 2016 to 2020.
• Intelligent asset management completes networkwide inventorying in seconds, saving about 1,882 person-days (about USD572,000) from 2016 to 2020 compared with manual asset management.
• Intelligent deployment achieves quick device installation, saving 1,320 person-days (about USD401,000) from 2016 to 2020 compared with manual deployment.
• Customers do not need to build their own O&M teams, which reduces training costs, improves O&M efficiency, and reduces the fault rate caused by manual misoperations.

Founded in 1841, Sberbank (Savings Bank of the Russian Federation) is the largest state-owned commercial bank in Russia. It owns more than a quarter of domestic bank assets and is closely related to Russia's economic and social development. Today, Sberbank has become a global commercial bank that provides the most extensive banking services, has a stable customer base, and operates smoothly in all links of the financial market. With more than 20,000 branches, Sberbank provides banking services for 1.3 million enterprise users in 11 time zones across the country and has 250 million retail accounts.
[Data description] According to the ITIC 2020 Global Server Hardware and Server OS Reliability Report, the average availability of
FusionServer intelligent servers is 99.9997%. The figures above assume the labor cost is USD38 per hour. FusionDirector can reduce
the unexpected downtime of servers and edge devices by 90%.
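
The dollar figures attached to the person-day savings can be reproduced from the stated labor cost of USD38 per hour. The sketch below assumes an 8-hour person-day, which the source does not state explicitly but which matches the quoted amounts; the function name is illustrative.

```python
HOURLY_RATE_USD = 38      # stated in the data description
HOURS_PER_PERSON_DAY = 8  # assumption: not stated in the source


def labor_saving_usd(person_days: float) -> float:
    """Convert saved person-days into a labor cost saving in USD."""
    return person_days * HOURS_PER_PERSON_DAY * HOURLY_RATE_USD


asset = labor_saving_usd(1882)   # intelligent asset management, 2016 to 2020
deploy = labor_saving_usd(1320)  # intelligent deployment, 2016 to 2020
print(asset, deploy)  # prints "572128 401280"
```

These results round to the quoted savings of about USD572,000 and USD401,000. The same conversion applies to the person-day figures in the other cases that use this data description.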

50
Transportation and Logistics Internal

FusionDirector Helps Qatar Airways Achieve Digital Transformation


Challenges
• Qatar Airways is building a Digital Factory that integrates cloud computing, big data, and AI to change the connection, exploration, and transaction modes of millions of users on the Qatar digital platform, enhance customer experience, improve enterprise operation efficiency, and increase business efficiency and value. The project has high requirements on server asset management, rapid deployment, update, and real-time monitoring of applications.
• The Digital Factory of Qatar Airways covers a wide range of content, but its O&M personnel are limited. Implementing efficient, flexible, and real-time intelligent O&M is a great challenge.

Solution
• FusionDirector is used to manage XXXX FusionServer servers throughout their lifecycles.
• FusionDirector is seamlessly integrated into the customer's management software platform to enable intelligent management of deployment, faults, energy efficiency, versions, and assets, addressing the project's difficulties and requirements.

Customer Benefits
• Intelligent fault management provides fault warning 7 to 30 days in advance, and supports automatic diagnosis and isolation to help prevent or quickly rectify faults online, reducing the unexpected downtime by 30 hours per year.
• The intelligent asset management function of FusionDirector saves about 56.6 person-days per year. The intelligent deployment function saves about 119 person-days per year. The total labor cost is reduced by about USD53,400 per year.
• FusionDirector reduces the overall project OPEX by about 15%, that is, saves about USD180,000 per year.
• Intelligent version management provides one-click update, reducing the steps from 20 to 3.

Qatar Airways is a leader in digital transformation among global airlines. Founded in 1997, the company is one of the youngest global airlines serving six continents. Its team consists of more than 46,000 professionals. By 2020, the company will expand its routes to 100 destinations, expand its operations to more than 700 flights per week, and increase its annual passenger traffic to more than 25 million. Since 2007, Qatar Airways has been rated a five-star airline by SKYTRAX. In 2011, 2012, 2015, 2017, and most recently 2019, Qatar Airways was rated the best airline of the year by SKYTRAX. In 2015, Qatar Airways was ranked third among the world's best airlines by Travel+Leisure magazine.

[Data description] According to the ITIC 2020 Global Server Hardware and Server OS Reliability Report, the average availability of FusionServer intelligent servers is 99.9997%. The figures above assume the labor cost is USD38 per hour. FusionDirector can reduce the unexpected downtime of servers and edge devices by 90%.

51
