Dell PowerEdge Concepts and Features - Participant Guide
CONCEPTS AND FEATURES
PARTICIPANT GUIDE
Dell PowerEdge Concepts and Features
Portfolio Overview
    Introduction to Server Portfolio
    Dell PowerEdge Naming Convention
    Service Tag and Asset Tag
    PowerEdge 13G Servers
    PowerEdge 13G Specifications
    PowerEdge 13G Control Panel
    PowerEdge 14G Servers
    PowerEdge 14G Specifications
    PowerEdge 14G Control Panel
    PowerEdge 15G Servers
    PowerEdge 15G Specifications
Server Components
    Introduction
    Processors
    Memory
    Power
    Cooling
    Networking
    Graphic Processing Units (GPUs)
    Expansion Card
    Storage
Portfolio Overview
Movie:
Click the link below to watch an introduction to the Dell server portfolio.
https://edutube.dell.com/Player.aspx?autoplay=true&vno=ZICcxeiVyyvuY3T8JG72ng
Dell PowerEdge servers with common design components are identified by the
server model name.
The server naming convention provides insight into the form factor, class of
system, generation, and the CPU socket count.
Important:
• The PowerEdge XE family of servers is purpose-built for complex, emerging workloads that require high performance and large storage. For example, the PowerEdge XE8545.
• The PowerEdge XR family of servers is ruggedized,
industrial-grade servers intended for extreme
environments. For example, PowerEdge XR11/XR12.
The Dell service tag is a seven-character identifier that is unique to the product.
• The service tag of a PowerEdge server is printed on a pull-out tab, also known as an Enterprise Service Tag (EST). ESTs are typically located on the front or rear of the chassis.
• Information about the service tag can also be found on a sticker typically on the
side of the chassis, and in the server BIOS.
All Dell PowerEdge servers have a Service Tag and can have an Asset Tag
added.
The Asset Tag is an empty field within BIOS where you can input your own
identifying information such as the system’s security number or location ID.
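On a running Linux system, the same service tag can typically be read from SMBIOS. Below is a minimal sketch, assuming the dmidecode utility is installed and the script runs with root privileges; the racadm getsvctag command reports the same value through the iDRAC.

```python
import subprocess

def get_service_tag() -> str:
    """Return the Dell service tag as reported by the SMBIOS system serial number."""
    # Requires root privileges and the dmidecode utility to be installed.
    result = subprocess.run(
        ["dmidecode", "-s", "system-serial-number"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    print("Service tag:", get_service_tag())
```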
Movie:
Click the link below to watch a demo video on locating the service tag in a
PowerEdge server.
https://edutube.dell.com/Player.aspx?autoplay=true&vno=DnD1V|@$@|UUKQ2lRtWtNqDYMw
PowerEdge R630
PowerEdge R730xd
PowerEdge R930
Embedded NIC: 1 GbE x4 (this port is for management only). The NDC is available in different versions: 4x 1 GbE, 2x 1 GbE + 2x 10 GbE, or 4x 10 GbE.
Storage Controllers: Listed below are the PERC9 cards that are supported in the
PowerEdge 13G systems:
• Internal controllers: PERC S130 (Software RAID), PERC H330, PERC H730,
PERC H730P, HBA330 (no RAID internal HBA)
• External HBAs (RAID): PERC H830
• External HBAs (non-RAID): 12 Gbps SAS HBA
• 8 GB vFlash media (optional): all systems ship with a vFlash card, but only systems with an Enterprise license can use it.
• 16 GB vFlash media (optional): all systems ship with the 8 GB card, but customers can request the 16 GB vFlash card as an upgrade.
Embedded NIC: 1 GbE x4 (this is the dedicated port for the iDRAC). The NDC is available in different versions: 4x 1 GbE, 2x 1 GbE + 2x 10 GbE, or 4x 10 GbE.
The table below explains components on the front panel of the PowerEdge server.
1 Power Button
2 NMI Button
5 LCD Panel
6 VGA Connector
8 Service Tag
10 USB Connector
PowerEdge R640
PowerEdge R740xd
PowerEdge R940
PowerEdge R940xa
All systems ship with a bezel, but the customer can choose to purchase the bezel with or without an LCD.
iDRAC9 supports the Group Manager feature, which provides a multiple-console experience and simplified basic iDRAC management.
2: Intel C620 chipset: PowerEdge 14G systems include the Intel Lewisburg as the Platform Controller Hub (PCH) chip. The integrated Intel® Ethernet with scalable iWARP RDMA in the Intel® C620 series chipset provides up to four 10 Gbps high-speed Ethernet ports for high data throughput and low latency, ideal for storage, data-intensive, and connected IoT solutions.
3: The Intel® Xeon® scalable processor family supports 2933 MT/s memory. As
an example, the PowerEdge R740 and R740xd support two DIMMs per channel at
2933 MT/s with these processors.
The image below highlights the control panels that are located at the front of a
PowerEdge 14G system.
PowerEdge R750
2: Root of Trust (RoT) is a concept that starts a chain of trust, ensuring that systems boot with legitimate code at every step of the boot process. RoT is controlled by the iDRAC9.
PowerEdge XE8545
PowerEdge XR11
PowerEdge R6515
1: Direct Liquid Cooling (DLC): DLC is introduced in the PowerEdge 15G systems
and features a leak-sensing technology to identify and resolve issues faster. The
DLC technology is supported only in the PowerEdge R650, PowerEdge R750,
PowerEdge R750xa and PowerEdge C6520 servers.
2: PowerEdge RAID Controllers: Support for PERC 10 and PERC 11 cards for
enhanced RAID performance.
To view the list of PERC types for Dell systems, see the List of PowerEdge RAID Controller (PERC) types for Dell systems document on dell.com/support.
4: Memory: The 15G PowerEdge servers support Intel Optane persistent memory, with support for up to 16 DIMMs per CPU.
Intel Optane persistent memory is also known as Barlow Pass. Click here for more information on Barlow Pass and the different configurations.
Server Components
Introduction
Hot-swap components enable zero system downtime during failures and servicing.
Examples of hot-swap components are fans, disks, and Power Supply Units (PSUs).
FRUs are replaced by a user or technician without having to send the entire
product or system to a repair facility.
FRUs are marked in blue. Blue indicates that the system must be shut down before the component is replaced.
Some component parts are designed for easy customer removal and replacement;
such parts are designated as Customer Self-Replaceable (CSR) or Customer
Replaceable Unit (CRU).
When, during troubleshooting, a Dell technician determines that the repair can be accomplished with a CSR/CRU-designated part, Dell ships the designated part directly to the customer, which allows customers to replace parts at their own convenience.
Processors
The Intel processor uses a metal naming convention to designate the different
levels of available features. The levels are Platinum, Gold, Silver, and Bronze.
1: Intel® Xeon® Platinum processors offer the industry's best performance for mission-critical and hybrid cloud workloads, real-time analytics, machine learning, and artificial intelligence. The Platinum processors offer monumental leaps in I/O, memory, storage, and network technologies.
2: Intel® Xeon® Gold processors offer high performance, advanced reliability, and hardware-enhanced security. The Gold processors are optimized for demanding data centers, hybrid-cloud compute, network, and storage workloads.
The Gold processors support:
− Up to 32 CPU cores.
− Up to four socket configurations.
− Up to 6 TB memory.
• Silver: Intel® Xeon® Silver processors offer the hardware-enhanced
performance and security that is required for data center compute, network,
and storage. The silver processors are optimal for midsized and growing IT
organizations.
The silver processors support:
− Up to 20 CPU cores.
− Up to two socket configurations.
− Up to 1.5 TB memory.
• Bronze: Intel® Xeon® Bronze processors provide optimized performance for
small businesses and basic storage servers.
The bronze processors support:
AMD Processors
The Dell PowerEdge server portfolio is powered by the third-generation AMD EPYC™ processors. The AMD system on a chip (SoC) is a next-generation data center processor that maintains compatibility with the existing socket infrastructure. The AMD Milan processor is based on an enhanced Zen 3 CPU core with integrated I/O controllers.
The AMD Milan processor:
• Offers significant performance improvements over the previous generation of products.
• Has 128 PCIe lanes, eight-channel memory, and dual-socket configurations.
• Lowers cost through an optimal balance of compute, memory, I/O, and security.
• Offers a single I/O and memory die, which removes internal bottlenecks for lower latency.
• Has up to 64 CPU cores per processor.
• Has an inter-chip global memory interconnect (xGMI2) with up to 64 lanes.
• Has Secure Encrypted Virtualization (SEV), which provides 509 unique hypervisor keys.
• Has two restrictions:
− The RTC/CMOS is built into the CPU, similar to previous AMD-based PowerEdge servers. RTC/CMOS settings are lost when CPU1 is removed or reinstalled.
− AMD does not support early boot. No error message is displayed when no memory is populated in the system.
The PowerEdge 15G servers also support third-generation Intel Xeon scalable
processors.
The Intel® Xeon® Processor has increased performance and incremental memory
options.
The Xeon scalable processor supports usages from entry designs based on Intel
Xeon Silver processors to advanced capabilities offered in the new Intel Xeon
Platinum processor.
The third-generation Intel Xeon scalable processor supports:
• Up to 40 CPU cores.
The 15G server with Intel processors and heatsinks has an additional anti-tilt
feature to prevent tilting of the heatsink assembly. The plastic nuts secure the
heatsinks to the system board.
Processor Settings
The Processor Settings option is used to view and configure various processor settings. Processor Settings can be accessed through the System Setup utility.
Go to System Setup Main Menu > System BIOS > Processor Settings.
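The same BIOS attributes can also be read out of band. Below is a minimal sketch, assuming an iDRAC9 with the standard Redfish BIOS resource; the address, credentials, system ID (System.Embedded.1), and the attribute-name filter are assumptions that may differ on your platform.

```python
import requests
from requests.auth import HTTPBasicAuth

# Minimal sketch: list processor-related BIOS attributes over Redfish.
IDRAC_IP = "192.0.2.10"          # hypothetical iDRAC address
AUTH = HTTPBasicAuth("root", "calvin")

url = f"https://{IDRAC_IP}/redfish/v1/Systems/System.Embedded.1/Bios"
response = requests.get(url, auth=AUTH, verify=False)  # lab use only: no TLS verification
response.raise_for_status()

attributes = response.json().get("Attributes", {})
# Print any attribute whose name suggests a processor setting (naming is platform-specific).
for name, value in sorted(attributes.items()):
    if name.lower().startswith(("proc", "logical")):
        print(f"{name} = {value}")
```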
Memory
Dell PowerEdge servers run on Error-Correcting Code (ECC) memory. The ECC
memory can test and correct any memory errors without the processor or the user
being aware of these operations. ECC corrects the errors without interrupting
other operations on the server.
Memory Comparison
The table below highlights the differences in memory features across the three
generations of Dell PowerEdge servers.
RAM size: 1 x 4 GB (13G), 1 x 8 GB (14G), 1 x 8 GB (15G)
5 Memory channels are the physical layer on which the data travels between the CPU and memory modules.
Memory Layout
Memory Settings
The Dell server memory settings can be accessed through the Lifecycle Controller
(LCC) System Setup option.
8 If socket A1 is populated for processor 1, then populate socket B1 for processor 2 with an identical DIMM.
Step 1
Step 2
Press the F2 key on the keyboard. The system enters the System Setup page.
Step 3
Step 4
Memory Settings
If the field is set to Disabled, the system supports NUMA (asymmetric) memory
configurations. This option is set to Disabled by default.
Memory Modes
The Dell PowerEdge chipset allows different operating modes for the memory to
be set in the BIOS.
The various memory modes available on the Dell PowerEdge servers are:
• Optimizer Mode: The memory controllers run independently of each other.
• Mirror Mode: The system supports memory mirroring if identical memory
modules are installed in two channels.
• Advanced ECC Mode: The two memory channels closest to the processor
(channel 0 & 1) are combined to form a single 128-bit channel.
• Spare Mode: One rank11 per memory channel is reserved as a spare.
• Dell Fault Resilient Mode: An area of the memory that is established as fault
resilient and is used by a VMware vSphere hypervisor or other services to
maximize the availability.
Optimizer Mode:
In Optimizer mode, all three channels are populated with memory modules. This
mode permits a larger total memory capacity but does not support SDDC with x8-
based memory modules.
It is recommended to populate all three channels with identical memory but each
channel can have a different size DIMM. The larger DIMM has to be installed in
the first slot and the configuration has to be the same across all three channels. In
a dual-processor configuration, the memory configuration for each processor must
be identical. Optimizer mode is the only mode to support mixed memory sizes.
11 A memory rank is a block or area of data that is created using some, or all, of the memory chips on a module.
Any configurations not following the above rules may generate error messages or
not POST at all. For more detail read the initial release notes: Installing and
configuring DDR3 Memory
It is recommended to populate all three DIMM slots on servers with three DIMM slots per channel to take advantage of memory interleaving and get maximum performance. While a single UDIMM per channel gives slightly better performance than an RDIMM, RDIMMs give better performance when multiple DIMMs per channel are installed.
Optimizer mode is used if just one DIMM per processor is configured. A minimal single-channel configuration of 1 GB memory modules per processor is also supported in this mode. The minimum configuration to POST is one DIMM in the first slot with only CPU 1 installed.
Advanced ECC (Lockstep) Mode:
In Advanced ECC (Lockstep) mode, the two channels closest to the processor (CH 0 and 1) are combined to form one 128-bit channel. This mode supports Single Device Data Correction (SDDC) for both x4- and x8-based memory modules. Memory modules must be identical in size, speed, and technology in the slots on channels 0 and 1. Channel 2 must be empty, or the option is not available in the System Setup program.
With the Intel 5500 and 5520 chipsets and Intel 55xx and 56xx processors, channels 0 and 1 are combined, which enables 8-bit error correction instead of the 4-bit correction of normal Advanced ECC (not lockstep). SDDC gives the ability to recover from more types of single-bit and multi-bit memory errors. The third channel and the corresponding memory slots cannot be used, but the full amount of installed physical memory is accessible to the operating system.
Note: 14G and 15G servers do not support advanced ECC mode.
Dell Fault Resilient Mode (FRM) is a memory operating mode available in the BIOS settings of high-end yx2x and later Dell PowerEdge servers. This mode establishes an area of memory that is fault resilient, protects the hypervisor against uncorrectable memory errors, and safeguards the system from becoming unresponsive. Systems with ESXi that support the FRM feature can load the operating system kernel into this region to maximize the availability of the system and of critical applications or services.
• Single-rank sparing mode: It allocates one rank per channel as a spare. This
mode requires a population of two ranks or more per channel.
• Multi-rank sparing mode: It allocates two ranks per channel as a spare. This
mode requires a population of three ranks or more per channel.
When single rank memory sparing is enabled, the system memory available to the
operating system is reduced by one rank per channel. See the example below.
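As a rough arithmetic illustration of that reduction, the sketch below assumes a hypothetical population of two CPUs, eight channels per CPU, and two dual-rank 16 GB RDIMMs per channel; the numbers are not from a specific platform.

```python
# Illustrative arithmetic only: single-rank sparing reserves one rank per channel.
channels = 2 * 8              # 2 CPUs x 8 channels (hypothetical)
dimms_per_channel = 2
ranks_per_dimm = 2
gb_per_dimm = 16

installed_gb = channels * dimms_per_channel * gb_per_dimm
gb_per_rank = gb_per_dimm / ranks_per_dimm
spared_gb = channels * gb_per_rank          # one rank per channel held in reserve

print(f"Installed memory : {installed_gb} GB")
print(f"Reserved as spare: {spared_gb:.0f} GB")
print(f"Available to OS  : {installed_gb - spared_gb:.0f} GB")
```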
The Intel Optane Persistent Memory (Barlow Pass) solution retains data during a
power loss, system shutdown, or system errors. Barlow Pass (BPS) uses
persistent memory as storage, rather than traditional memory.
• Creates a unique new memory tier to reduce latencies and optimize workloads.
• Provides disruptive storage class memory cell technology (3DxPoint) that
resides on the DDR memory interface.
• Provides large memory footprints of 128 GB, 256 GB, and 512 GB.
• Enables in-memory data to survive a soft reset or a hard reboot (power loss).
• Provides minimal latency and faster storage for large amounts of memory.
The Barlow Pass architecture consists of two-tier memory and storage hierarchy
to address the data performance and storage challenges. The advantages of the
hierarchical approach are:
The memory modes can be used only if the DIMMs are RDIMMs with a capacity of 32 GB or less.
BPS1 4+4 1 or 2 4 4 Y Y
BPS2 6+1 1 or 2 6 1 Y
BPS3 8+1 1 or 2 8 1 Y
BPS4 8+4 1 or 2 8 4 Y Y
BPS5 8+8 1 or 2 8 8 Y Y
BPS6 12+2 1 or 2 12 2 Y
Each row on the below chart represents a different valid memory configuration for
mixing Barlow Pass (B) and RDIMMs (R).
Operational Modes
Users can configure the memory modes and update it in the BIOS.
12 The mode can be changed through the BIOS settings: F2 > System BIOS > Memory Settings > Persistent Memory > Intel Persistent Memory > Region Configuration.
Memory Mode
13 They access the DIMMs as system memory and do not have control of, or direct access to, the DDR4 DIMMs that are used for caching.
AppDirect Mode
App Direct mode is the default memory mode in the BIOS. AppDirect mode uses the BPS DIMMs as storage.
Features of App Direct mode:
• Provides larger storage capacity, higher endurance, low latency and traditional
read/write.
• Works with existing file systems to access the files. Two major methods to
access the files are Block method14 and PMEM method15.
• Cache lines are accessed using load or store instructions.
• Application is responsible for flushing data out of CPU cache into persistence
guaranteed memory buffers.
• Persistent memory aware file systems, operating system and applications can
access BPS DIMMs in AppDirect mode.
• BPS DIMMs are used as persistent memory when configured in AppDirect
Mode.
In Memory mode, the BIOS and operating system list the capacity of the Optane
memory and not the RDIMMs. The RDIMMS are used as cache for the Optane
DIMMs when running in memory mode.
14 The block method is slower and is similar to traditional storage access. The
block size is configurable at the operating system level.
15 PMEM method uses the full technology potential, but requires the application to
be optimized.
In AppDirect mode, 632 GB is the full amount of memory available but only 128
GB of it is volatile. The system uses the rest of the memory as persistent storage.
The image shows the difference between the memory capacity that is available when running in
Memory mode and AppDirect mode.
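As a hedged illustration of how the reported capacity differs between the two modes, the sketch below assumes a hypothetical population of four 128 GB BPS modules and four 16 GB RDIMMs per CPU; actual supported populations follow the configuration chart above.

```python
# Illustrative only: how reported capacity differs between Memory mode and AppDirect mode.
cpus = 2
bps_gb = 4 * 128      # hypothetical BPS capacity per CPU
rdimm_gb = 4 * 16     # hypothetical RDIMM capacity per CPU

# Memory mode: the OS sees only the Optane (BPS) capacity; the RDIMMs become cache.
memory_mode_visible = cpus * bps_gb

# AppDirect mode: RDIMMs are the volatile memory, BPS capacity is persistent storage.
appdirect_volatile = cpus * rdimm_gb
appdirect_persistent = cpus * bps_gb

print("Memory mode   :", memory_mode_visible, "GB volatile (RDIMMs used as cache)")
print("AppDirect mode:", appdirect_volatile, "GB volatile +", appdirect_persistent, "GB persistent")
```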
Power
In most cases16, the power supply unit (PSU) is a hot-swappable component that
provides power redundancy support on PowerEdge servers.
16 The PSUs shipped in the PowerEdge 200-500 server series are not hot-swappable. Not all Dell PowerEdge servers support a minimum of two PSUs; some low-end PowerEdge servers have a single PSU.
Grid Redundant
In grid redundant mode, the hot spare18 feature is disabled and the power output is distributed equally across both power supplies.
Power supplies are divided into Grid A and Grid B. If a grid or a PSU on one grid fails, but the PSUs on the second grid are functional, the system does not shut down. Grid redundancy depends on the system configuration.
17 Not all Dell PowerEdge servers support a minimum of two PSUs. Low-end PowerEdge servers have a single PSU. For example, the PowerEdge R230 does not support the multiple PSU feature and redundancy.
18 When the hot spare feature is enabled, one of the redundant PSUs is switched to the sleep state. The active PSU supports 100 percent of the system load, thus operating at higher efficiency. The PSU in the sleep state monitors the output voltage of the active PSU. If the output voltage of the active PSU drops, the PSU in the sleep state returns to an active output state.
No Redundancy
The PSU redundancy mode depends on the server type and the number of PSUs
in the system.
Power Capping
The Power capping option is used to limit the amount of power consumed by a
server.
When the power cap policy is enabled, it enforces a user-defined power limit on the system.
policy is used. This power-protection policy is independent of the user-defined
policy. The system performance is dynamically adjusted to maintain power
consumption close to the specified threshold.
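Power capping can also be applied out of band. Below is a minimal sketch using the standard Redfish Power resource; the iDRAC address, credentials, chassis ID (System.Embedded.1), and the 450 W value are placeholders and may differ on your platform.

```python
import requests
from requests.auth import HTTPBasicAuth

# Minimal sketch: read and set the power cap through the Redfish Power resource.
IDRAC_IP = "192.0.2.10"          # hypothetical iDRAC address
AUTH = HTTPBasicAuth("root", "calvin")
url = f"https://{IDRAC_IP}/redfish/v1/Chassis/System.Embedded.1/Power"

# Read the current power control object.
power = requests.get(url, auth=AUTH, verify=False).json()
print("Current limit:", power["PowerControl"][0].get("PowerLimit"))

# Apply a 450 W cap (the user-defined threshold from the power cap policy).
payload = {"PowerControl": [{"PowerLimit": {"LimitInWatts": 450}}]}
resp = requests.patch(url, json=payload, auth=AUTH, verify=False)
resp.raise_for_status()
```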
When a PSU mismatch condition occurs, the lower wattage PSU is disabled and a warning condition is triggered.
OFF: No power
The PSU firmware can be updated through the Lifecycle Controller (LCC). Click
here to review how to update a server Power Supply Unit firmware, including a
video walk-through of the procedure.
PSU Blanks
To maintain an efficient airflow for system cooling, all servers with an empty PSU
slot require PSU blank plates. PSU blanks avoid the loss of cooling airflow. If the
PSU blanks are missing, the system temperature might increase and result in
component failures.
In 15G servers, the PSUs are located in the rear of the system. The PSUs are on
the opposite side of each other for better airflow within the chassis.
Cooling
A server consists of multiple fans. When a fan fails, the remaining fans take up the
load.
The cooling fans dissipate the heat generated by the functioning of the server21.
These fans cool the processors, expansion cards, and memory modules.
21 Some servers may not have hot-swappable fans. (Hot swapping is the replacement or addition of components to a system without stopping, shutting down, or rebooting it.) If no hot-swappable fans are available and a fan fails, the iDRAC ramps up the existing fans, similar to systems with hot-swap fans. However, the failed fan cannot be replaced until the system has been powered off, because the fan cables must be disconnected from the system board.
Types of Fans
The Dell PowerEdge servers use Standard fans, High-Performance fans, and Very
High-Performance fans based on the server configurations22.
Dell PowerEdge servers come with different chassis dimensions and they can be
1U, 2U, and so on. Based on the chassis dimension and design the fan dimension
may vary as well.
If a system has six fans and one of the fans fails, the iDRAC ramps up the
remaining fans. It keeps the temperature within the chassis at a set level. (It
should be noted that if the temperature is already well below the required level, the
iDRAC may not ramp up the remaining fans.)
Once the failed fan has been replaced, the iDRAC tests the new fan. It slowly
decreases the speed of the existing fans while increasing the speed of the new fan
until they are all operating at the correct speed.
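Fan health and speed can be checked out of band as well. A minimal sketch follows, assuming the standard Redfish Thermal resource on an iDRAC9; the address, credentials, and chassis ID are placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

# Minimal sketch: list each fan's health and reading from the Redfish Thermal resource.
IDRAC_IP = "192.0.2.10"          # hypothetical iDRAC address
AUTH = HTTPBasicAuth("root", "calvin")
url = f"https://{IDRAC_IP}/redfish/v1/Chassis/System.Embedded.1/Thermal"

thermal = requests.get(url, auth=AUTH, verify=False).json()
for fan in thermal.get("Fans", []):
    name = fan.get("Name") or fan.get("FanName")
    health = fan.get("Status", {}).get("Health")
    reading = fan.get("Reading")
    print(f"{name}: health={health}, reading={reading} {fan.get('ReadingUnits', '')}")
```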
Removal of the chassis cover often results in the fans ramping up, because the cover is used to deflect airflow throughout the system. If the cover is removed, a certain amount of airflow is lost, so the iDRAC, upon detecting that the cover has been removed, ramps up the fans in an effort to increase the airflow across the components and maintain the required temperatures.
Also, when the system is first powered on, the temperatures take a few seconds to be recorded. As a fail-safe procedure, the iDRAC ramps the fans up and then brings them back down as the temperature status is analyzed.
Should a fan fail, errors are posted and the remaining fans pick up the additional
workload. However, based on the temperature within the chassis, the remaining
fans may or may not increase their speed.
22 A user can refer to the thermal restrictions matrix or the technical guide of the server for more information.
Some servers, especially the lower-end ranges, may not have hot-swap fans. In that case, if a fan fails, the iDRAC ramps up the existing fans as it does on systems with hot-swap fans, but the failed fan cannot be replaced until the system has been powered off, because the fan cables must be disconnected from the system board.
The HPR fans provide a higher airflow rate. HPR fans are required in 12 x 3.5”,
rear-storage configurations and most GPU configurations.
VHP fans are required for configurations with 16x NVMe front drives, or 8x NVMe and 16x SAS drives with GPU configurations.
Some of the 15th-generation Dell Intel servers use standard, high-performance silver grade, or high-performance gold grade fans, depending on the configuration.
Dell PowerEdge servers such as the R250, R350, T350, T150, XR11, and XR12 use non-hot-pluggable single-rotor fans.
Heatsinks
The type of heatsink that is used is based on the CPU TDP23 and GPU
configurations.
Some PowerEdge servers have unique fan positioning in the chassis. For
example, the Dell PowerEdge XR11 has two fans that are located towards the
middle of the chassis. It has an extended heatsink design for optimum cooling.
In certain single-CPU configurations (non-GPU or non-rear-drive), only four fans are required to be installed in the fan bay.24
23 Thermal Design Power (TDP) is measured in watts and is the maximum amount
of heat that is generated by a GPU or CPU. There are multiple types of CPU
heatsinks available including standard (STD), T-type, and full height heatsinks.
Unique fan position and extended heatsink that is used in the PowerEdge XR11.
To remove the heatsink: The heatsink and processor are too hot to touch for some
time after the system has been powered off. Allow the heatsink and processor to
cool down before handling them.
1. Ensure that all four Anti-Tilt wires are in the locked position (outward position), and then, using a Torx T30 screwdriver, loosen the screws in the following order:
• Loosen the first screw three turns.
• Loosen the screw diagonally opposite to the screw you loosened first.
• Repeat the procedure for the remaining two screws.
• Return to the first screw and loosen it completely.
24 In such configurations, only four fans are required to cool the system. For the other two fan sockets, two fan blanks are required to be installed in fan bays 1 and 2. The number of fans that are required depends on the server model and configuration.
DIMM blanks on empty DIMM slots help regulate air flow through the CPU and
DIMM area.
For some servers, the air shroud looks like piano keys25 that drop down into empty
DIMM slots to stop airflow from being wasted. However, systems with midrange
storage require DIMM blanks for empty DIMM slots since the trays do not contain
piano keys.
Some of the PowerEdge servers like the XE8545 have separate GPU air shrouds
for better airflow and heat dissipation.
Dell specialized servers, also known as XE servers, can have advanced GPU configurations. These systems generate a lot of heat and require custom solutions.
25 This means that they do not require single DIMM blanks on empty DIMM slots.
The Dell PowerEdge XE8545 supports up to 4x NVIDIA A100s GPUs and NVLink
in an air-cooled chassis.
The GPUs are cooled with the help of hot-pluggable GPU fans and specialized
heatsinks for each of the GPUs.
Design Innovation
The advanced thermal design streamlines the airflow pathway in the chassis and
directs the appropriate volume of air to components that require a constant air
supply.
The design minimizes the fan and system power consumption while maintaining
the system temperature.
iDRAC Cooling Configuration settings page where a user can change the exhaust temperature
limit.
Users can apply custom fan speeds when using interfaces such as: iDRAC UI,
BIOS setup (F2), and RACADM.
26 This minimization of system fan speeds and airflow can result in high exhaust temperatures, which may be of concern to some users.
As computing demands grow, so do data centers, and with this growth comes
huge amounts of heat27 that must be managed efficiently. Many data centers start
out as a few racks in a server room, adding more equipment over time. Without
taking cooling factors into account, data center HVAC management can become
difficult.
27 When data centers are exposed to heat, servers start to slow down or malfunction altogether. The same thing happens when the server rooms are too cold. The ideal temperature for the data center depends on the size and amount of heat that is emitted, but operating within this ideal temperature range is crucial for overall performance.
Hot aisle containment (HAC) guides the hot air (red arrows) into a computer room air handler (CRAH), which then recirculates the flow as cool air (blue arrows).
Multiple PowerEdge servers with new Intel and AMD processors support the Dell Technologies
DLC.
Direct Liquid Cooling (DLC) solution manages the growing thermal challenges.
Dell DLC solutions28 cool the CPU with warm liquid, which has the capacity to
transfer heat up to 4X more than the capacity of air cooling.
Because DLC solutions are more efficient at extracting heat, they reduce the burden on server system fans and the data center's cooling infrastructure.
The PowerEdge servers below offer DLC cooling on the newest Intel and AMD processors:
• C6520
• C6420
• C6525
• R6525
• R7525
• R650
• R750
• R750xa
28 The DLC solution is more efficient at extracting heat, reducing the burden on server system fans and the data center's cooling infrastructure.
Discussion Block
Discussion Notes:
DLC example of a cold plate and coolant loop. Monolithic is used in the 15G rack servers and
modular is used in the Dell PowerEdge C6420 and C6520 servers.
DLC uses the exceptional thermal capacity of liquid to absorb and remove the
heat that is created by new high-power processors. Cold plates are attached
directly to the processors. The coolant captures and removes the heat from the
system to a heat exchanger in the rack or row.
This heat load is removed from the data center using a warm water loop,
potentially bypassing the expensive chiller system. By replacing (or
supplementing) conventional air cooling with higher-efficiency liquid cooling, the overall operational efficiency of the data center is improved.
Leak Sense technology provides customers with the knowledge that potential
issues are found and reported quickly.
If a coolant leak occurs, the system’s leak sensor logs an alert29 in the iDRAC
system.
29 Three errors can be reported on the iDRAC: small leak (warning), large leak (critical), and leak sensor error (warning, which indicates an issue with the leak detection board). These error detections can be configured to take meaningful actions using tools like OpenManage Enterprise.
POD Solution
POD solution containing two outer racks with node-level DLC and one middle In-Row Cooler.
The Dell rack-level POD solution30 concept is designed for total heat capture.
The POD solution contains front and back containment for racks of DLC servers,
plus an In-Row Cooler that is integrated between the IT racks to capture any
remaining heat.
Monolithic Architecture
The Dell PowerEdge R650, R750, and R750xa follow the monolithic architecture.
30 A pod or a cluster is a set of computers that are linked by high-speed networks into a single unit.
In the monolithic architecture, the Liquid Leak Sensor board connects to the
Complex Programmable Logic Device (CPLD) using the Liquid Cooling Rear I/O
board.
Modular Architecture
In the modular architecture, the Liquid Leak Sensor board connects directly to the
Complex Programmable Logic Device (CPLD).
Discussion Notes:
• The Liquid Cooling Rear I/O (LC RIO) board is a component specific to the
monolithic architecture only (PowerEdge R650, R750, and R750xa).
• A high-level overview of the liquid cooling process:
− Depending on the platform, the Liquid Leak Sensor (LLS) board, connects
to the immediate upstream entity (LC RIO board for the monolithic
architecture or the CPLD for the modular architecture) using an alert cable.
− The message is then forwarded to the iDRAC using the SPI-X registers,
and the error is logged.
• If a leak develops in a particular cold plate and the detection cable is not
engaged, then the alarm signal will not be received. A disengaged alert cable
is reported as an error in the iDRAC logs.
• Here, the context of a 'modular architecture' does not include blade servers.
The Dell PowerEdge MX750c system does not support a liquid cooling
configuration.
Liquid cooling is supported internally and externally in the 14G C6420 and some
15G PowerEdge servers.
External support for liquid cooling is common for both the monolithic and modular
architectures.
− CDUs connect to the rack manifold to pump coolant to the racks and
exchange heat from the servers with facility water.
DLC Ecosystem
The image below shows the high-level overview of the DLC ecosystem.
DLC ecosystem
Discussion Block
Discussion Notes:
• Heat Exchanger:
Also called Coolant Distribution Units (CDUs), the liquid-to-liquid heat exchangers can support either one rack or a group of racks for a cooling solution.
The CoolIT Systems Rack DLC product line offers various Heat Exchange Modules depending on load requirements and availability of facility water, including CHx (Liquid-to-Liquid), AHx (Liquid-to-Air), and custom options.
• Liquid Leak Sensor (LLS) Board:
A Liquid Leak Sensor (LLS) is a mechanism that detects leaks within the system's liquid cooling loop.
The LLS mechanism can determine if a leak is small (0.02 ml) or large (0.2 ml).
• Liquid Cooling Rear I/O board:
The Liquid Cooling Rear I/O board is a new design, specific to the monolithic
architecture. The Liquid Leak Sensor (LLS) board connects to the Liquid
Cooling rear I/O board using an alert cable.
• Liquid Cooling Module:
The Dell Chiller-Less Fresh Air solution brings air into the data center from the outside to support the cooling systems.
Dell Fresh Air 2.0 hardware includes specific configurations that can operate at higher temperature and humidity levels and use clean outside air for air intake instead of tightly controlled air conditioning (AC) from a cold aisle.
The general configuration and device restrictions for deployment in a fresh air
environment are listed below.
1:
• High-power PCIe cards (>75 W, using an AUX cable, such as GPUs). Lower-power cards could also be excluded based on system limitations.
• Third-party PCIe cards (any power level).
3:
Networking
NDC used in the Dell PowerEdge 12G, 13G, and 14G servers.
Older generation servers used a network interface card (NIC) built into the system
board. When upgrading or changing the NIC technology, users would install a
PCIe network interface controller in one of the PCIe slots in the server.
With the Dell PowerEdge 12G, 13G, and 14G servers, the NICs are based on a daughter card.31 Users can easily change the NIC as network requirements evolve.
31 The Network Daughter Card (NDC) is a custom form factor mezzanine card that contains a complete NIC subsystem.
A Dell Network Daughter Card (NDC)32 enables the user to choose the right
network fabric without using a valuable PCI slot. It presents an easy upgrade path
from 1 GbE to 25 GbE LAN speeds.
OCP Card
The Open Compute Project (OCP)33 cards are network cards that connect to the
PCI bus. They are physically smaller than the Industry Standard Architecture (ISA)
expansion card and often connect to a dedicated connector on the system board.
The OCP card was introduced with the Dell PowerEdge 15G servers.
32 The NDC typically includes the features and behavior of a traditional LOM (LAN on Motherboard) subsystem. It has the added benefit of flexibility, giving customers the choice of their preferred network type, speed, and vendor.
33 The Open Compute Project (OCP) is an organization that shares designs of
data center products and best practices among companies. The designs and
projects include server designs, data storage, rack designs, open networking
switches, and so on.
Important: The OCP and the NDC cards are not a hot-swappable
component.
SNAP I/O
A Storage Network Architecture and Parallel I/O or SNAP I/O network interface
controller consists of a PCIe card.
The SNAP I/O adapters enable both CPUs within a dual-socket server to connect
directly to the network through its own dedicated PCIe interface.
SNAP I/O results in lower latency, lower CPU utilization, and higher network throughput.
The image below is of a SNAP I/O ConnectX-5 dual-port 100 GbE-only adapter. It supports PCIe Gen3/Gen4 x16. It is supported by the iDRAC and the Lifecycle Controller.
The bottom-left image is of a primary SNAP I/O card, and the bottom right is an
auxiliary card. The left image is the SNAP I/O ConnectX-6 single port VPI HDR
adapter. It supports PCIe Gen4 x16 and PCIe Gen3 x32 (with auxiliary card). The
iDRAC and Lifecycle Controller do not support this card.
On the left, is a SNAP I/O ConnectX-6 single port VPI HDR adapter and on the right is an auxiliary
card.
The image below shows the SNAP I/O ConnectX-6 card along with the auxiliary
card which is installed in a Dell PowerEdge server.
SNAPI
The above image shows a SNAPI-capable NIC directly connected to both CPUs, bypassing QPI and UPI34. This frees up bandwidth for applications and improves latency.
Both SNAP I/O and SNAPI35 (also known as Socket Direct cards) are similar in how they function. However, they connect to the CPU differently.
The Rear I/O and the LAN on Motherboard (LOM) cards are available with
PowerEdge 15G servers.
35 System for NUMA Aligned Partitioned I/O (SNAPI) allows an I/O device to connect directly with multiple upstream CPU sockets. It bypasses inter-CPU socket link usage and associated overheads such as the NUMA latency penalty.
PowerEdge 15G servers have the option of using the LOM, OCP, or both.
36 RIO has an iDRAC port, but the iDRAC chipset is on the system board.
Accelerator Cards
GPUs
37 A GPU typically has thousands of cores that are designed for efficient execution
of mathematical functions.
CPUs consist of minimal cores optimized for serial processing, while GPUs consist of thousands of
smaller, more efficient cores designed for parallel performance.
FPGAs
In the image example, the Intel FPGA Accelerated Network Function moves load balancing, QoS, and classification tasks away from the CPU.
Intel FPGA card programmed to augment the capabilities of Virtual Network Functions running on
a carrier cloud.
ASICs
Application-Specific Integrated Circuits (ASICs) are cards with silicon devices built
for specific purpose such as graph computing with massively parallel, low-
precision floating-point computing.
A Graphcore card with embedded IPUs, specifically designed for artificial intelligence.
Tip: The Graphcore IPU is supported on the PowerEdge R6525 and DSS 8440.
• Financial modeling: Accelerate the HPC and artificial intelligence (AI) industry
to leverage massive datasets to better understand risk and return.
• Modeling and simulation: Provide modeling and simulation for early evaluation,
fast testing of design modifications enabling more iterations.
• Signal processing: Enable providers to model and analyze signal data streams
coming in from computers, radios, videos, and cell phones in real time.
• Visualization: Enhance performance for 3D visualization applications such as
computer-aided design, enabling software to draw models in real time as the
user moves them.
• Seismic processing: Oil and Gas - accelerate extraction information from
massive seismic data stores, speeding time to results and lowering costs.
CUDA divides work into small, independent pieces that are solved independently by the CUDA blocks.
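As a rough illustration of that decomposition (launch arithmetic only, not CUDA code itself), the sketch below assumes a hypothetical element count and block size.

```python
import math

# Conceptual sketch of how CUDA work is divided: a problem of N elements is split
# into independent blocks of a fixed thread count, and the blocks are scheduled
# independently across the GPU's streaming multiprocessors.
N = 1_000_000                 # hypothetical number of data elements
threads_per_block = 256       # a common block size choice

blocks = math.ceil(N / threads_per_block)
print(f"{N} elements -> {blocks} blocks of {threads_per_block} threads")
print(f"Last block handles {N - (blocks - 1) * threads_per_block} elements")
```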
CUDA requires a supported version of Linux with a GCC compiler and toolchain, or Microsoft Windows with Microsoft Visual Studio, depending on the operating system used.
The links that are provided below detail CUDA installation instructions by
operating system.
The system board inlet temperature should be between the minimum and maximum warning thresholds38.
If a system board inlet temperature warning message is logged, the GPUs lower
the power consumption39 to avoid thermal damage.
The GPU full-length kit, the half-length kit, and the GPU power cable kit are available for customers. Depending on the kit ordered, the respective components are included.
38 The range is optimal for GPU performance. The iDRAC sets the thermal
warning threshold when the GPU is installed.
39 Lowering the power consumption results in lower GPU performance.
Use SolVe to generate the upgrade procedure for the GPU kit.
Expansion Card
PCIe Overview
An x1, x4, x8, or x16 card can be used in an x16 slot. A system board can have multiple slot types and support different PCIe versions.
PowerEdge R750 CPU and PCIe lanes. The PowerEdge R750 supports many riser and PCIe lane
configurations.
The system board processors control the PCIe slots. Also, the system board
chipset may support PCIe slots.
Dell PowerEdge 14G servers support PCIe 3. PowerEdge 15G servers support
PCIe 3 and PCIe 4.
For example: the PowerEdge R750, with the use of expansion card risers, can
support up to 48 x 4.0 PCIe lanes per CPU.
The PowerEdge server supports many different PCIe card form factors. The graphic shows the standard full-length, half-length, and low-profile dimensions. Also, PowerEdge servers may support other form factors such as half-length, half-height (HLHH).
Risers
Riser cards enable users to install additional expansion cards for the server.
Storage
Advantages
14G servers use PCIe Gen 3 technology with four lanes to each drive. 15G servers use PCIe Gen 4 with two lanes to each drive.
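As a rough comparison, assuming approximate usable throughput of about 1 GB/s per Gen 3 lane and about 2 GB/s per Gen 4 lane after encoding overhead:

```python
# Rough arithmetic only: approximate usable throughput per PCIe lane.
GEN3_GBPS_PER_LANE = 1.0
GEN4_GBPS_PER_LANE = 2.0

gen3_x4 = 4 * GEN3_GBPS_PER_LANE   # 14G NVMe drive link: Gen 3 x4
gen4_x2 = 2 * GEN4_GBPS_PER_LANE   # 15G NVMe drive link: Gen 4 x2

print(f"14G drive link (Gen3 x4): ~{gen3_x4:.0f} GB/s")
print(f"15G drive link (Gen4 x2): ~{gen4_x2:.0f} GB/s")
# The narrower Gen 4 link offers comparable bandwidth while using half the lanes,
# which lets a CPU with a fixed lane budget support more drives.
```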
Cabling
The rear drive slot numbering differs based on the cable configuration of the rear
backplane.
NVMe cabling.
Form Factor
NVMe PCIe SSDs use a 2.5 inch (U.2) and an add-in controller (AIC) form factor.
The NVMe PCIe SSD U.2 installs into a carrier. The NVMe PCIe SSD AIC form
factor installs into the appropriate system board slot.
Paddle cards connect an NVMe backplane to the system board using cables. The
paddle card interfaces with the system board chipset.
When paddle cards are used, the onboard S150 controller manages the NVMe disks.
Paddle cards are similar to risers, but they do not have the riser cage. Paddle
cards provide efficient data management on systems with many storage devices.
The paddle cards are only available with certain riser configurations. Not all systems come with paddle cards; it depends on the user's configuration.
For example, in the PowerEdge R750, the configuration that supports 24 X 2.5"
hard drives with the backplane has paddle cards for efficient management.
1:
The Internal Dual SD Module (IDSDM) provides a redundant SD-card module for
embedded hypervisors. Users configure the IDSDM for storage or as the operating
system boot partition.
Important:
When the redundancy option is set to Mirror Mode, the information
is replicated from one SD card to another.
ESXi 7.0 must be at 7.0 U2c to avoid potential issues with writing to the SD cards. Find more information about the issue where the partition intermittently breaks in the VMware knowledge base.
The IDSDM also supports a vFlash card with an iDRAC Enterprise
license.
vFlash only ships with the IDSDM on 14G servers. The PowerEdge
15G servers no longer support vFlash with the iDRAC.
Dell offers two types of Boot Optimized Storage Solution (BOSS) cards.
BOSS-S1
44 The data is written on both cards, but the data is read from the first card. If the
first card fails or is removed, the second card automatically becomes active.
Click here to learn more about the Dell BOSS-S1 card through the Dell Boot
Optimized Server Storage-S1 User's Guide.
BOSS-S2
Click here to learn more about the Dell BOSS-S2 through the Dell Technologies
Boot Optimized Storage Solution-S2 User's Guide.
RAID
Data is distributed across the drives in several ways known as RAID levels. Based
on the customer requirements, the RAID levels can be configured for optimal
performance. Click here to learn more about available RAID levels and
specifications.
45 Parity data is redundant data that is generated to provide fault tolerance within certain RAID levels.
RAID 0
An example of RAID 0.
RAID 0 uses the concept of striping that allows data to be written across multiple
hard drives instead of one physical disk. RAID 0 involves the partitioning of each
physical disk storage space into 64 KB stripes.
The minimum number of disks required to configure RAID 0 is two. RAID 0 can
have a maximum of 32 drives.
• Performance boost for read and write operations due to the striping of data
across multiple disks.
• Increases the total size of available space that is presented to the operating
system.
RAID 1
An example of RAID 1.
RAID 1 uses the concept of data mirroring. Data is mirrored or cloned to other
disks so that if one of the disks fails, the other one can be used.
The minimum number of disks required to configure RAID 1 is two. RAID 1 can
have a maximum of 32 drives.
• Write performance is reduced since all the drives must be updated whenever
new data is written.
• Disk space is wasted to duplicate the data thereby increasing the cost to
storage ratio.
RAID 5
An example of RAID 5.
RAID 5 uses the concept of distributed parity with block-level disk striping. RAID 5
stripes data blocks across multiple disks like RAID 0 while storing parity
information. The disk capacity is calculated by n-1. If there are three disks, then
the virtual disk capacity is the total size of two disks.
The minimum number of disks required to configure RAID 5 is three. RAID 5 can
have a maximum of 32 drives.
RAID 6
An example of RAID 6.
RAID 6 uses the concept of dual parity with block-level disk striping. RAID 6 allows
two disk failures without duplicating the contents of entire physical disks. The disk
capacity is calculated by n-2. If there are four disks, then the virtual disk capacity
is the total size of two disks.
The minimum number of disks required to configure RAID 6 is four. RAID 6 can
have a maximum of 32 drives.
RAID 10
RAID 10 combines RAID 0 and RAID 1 with a minimum of four disks. In RAID 10,
two disks are striped and mirrored onto two other disks, creating a single array of
disk drives.
The minimum number of disks required to configure RAID 10 is four. RAID 10 can have a maximum of 240 drives.
RAID 50
RAID 50 (RAID 5+0), a type of nested RAID level, combines the block-level
striping of RAID 0 with the distributed parity of RAID 5.
RAID 60
RAID 60 (6+0), a type of nested RAID level, combines the block-level striping of
RAID 0 with the dual distributed parity of RAID 6.
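The capacity rules above can be summarized in a few lines. The sketch below assumes identical drives and ignores controller overhead.

```python
# Sketch of the usable-capacity rules described above, assuming identical drives.
def usable_capacity(level: int, drives: int, drive_tb: float) -> float:
    """Return approximate usable capacity in TB for common RAID levels."""
    if level == 0:
        return drives * drive_tb                  # striping only, no redundancy
    if level == 1:
        return drive_tb                           # mirrored pair (2 drives)
    if level == 5:
        return (drives - 1) * drive_tb            # n-1: one drive of parity
    if level == 6:
        return (drives - 2) * drive_tb            # n-2: two drives of parity
    if level == 10:
        return (drives // 2) * drive_tb           # half the drives hold mirrors
    raise ValueError("unsupported RAID level in this sketch")

for level, drives in [(0, 4), (5, 4), (6, 4), (10, 4)]:
    print(f"RAID {level} with {drives} x 2 TB drives -> {usable_capacity(level, drives, 2.0)} TB usable")
```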
The table highlights the major differences between each RAID level.
Hot Spare
Hot spares46 are dedicated standby disks. When a hard drive that is used in a
virtual disk fails, the assigned hot spare47 is activated to replace the failed hard
drive without interrupting the system or requiring any intervention. When a hot
spare is activated, it rebuilds the data for all redundant virtual disks that were
using the failed hard drive.
46 A hot spare must be at least as large as the drive it is to replace, and a hot
spare must be the same drive type (SAS/SATA) as the drive it is to replace.
47 A hot spare cannot be assigned to 7,200 RPM disks to replace 10K RPM drives. It also
48 The PERC 10 series can be configured so that the system backplane or storage enclosure disk slots are dedicated as hot spare slots. This feature can be enabled using the Dell OpenManage storage management application.
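The sizing and type rules in the footnotes above can be expressed as a small check. The sketch below is illustrative only; the drive values are hypothetical.

```python
# Sketch of the hot spare rules above: the spare must be at least as large as the
# failed drive and must be the same drive type (SAS or SATA).
def is_valid_hot_spare(spare_gb: int, spare_type: str, failed_gb: int, failed_type: str) -> bool:
    return spare_gb >= failed_gb and spare_type.upper() == failed_type.upper()

print(is_valid_hot_spare(1200, "SAS", 1200, "SAS"))    # True
print(is_valid_hot_spare(900, "SAS", 1200, "SAS"))     # False: spare is smaller
print(is_valid_hot_spare(1200, "SATA", 1200, "SAS"))   # False: different drive type
```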
PERC Overview
The Dell PowerEdge RAID Controller (PERC) is a series of RAID disk storage
controllers which support SAS, SATA hard drives, and Solid-State Drives (SSDs).
NVMe hardware RAID support is available with the PERC 11 (H755N front,
H755MX and H755 adapter).
H745 Adapter
1 Heat sink
2 Battery
6 PCIe Connector
H745 Front
2 Battery
3 Heat sink
7 Power connector
H345 Adapter
1 Heat sink
4 PCIe connector
H345 Front
2 Heat sink
5 Power connector
H755 Adapter
1 Heat sink
2 PCIe connector
3 Battery
4 Backplane connector A
5 Backplane connector B
H755 Front
1 Battery
4 Heat sink
5 Backplane connector A
6 Backplane connector B
H755 NVMe
1 Battery
4 Heat sink
5 Backplane connector A
6 Backplane connector B
H755MX
The PERC11 controller introduces new features that boost performance. PERC11
supports the PCIe Gen4 host interface and the upgraded DDR4 8GB 2666MT/s
cache memory. However, the greatest addition to this generation of technology is
the inclusion of NVMe hardware RAID support. NVMe hardware RAID support is
available on the H755N front, H755MX and H755 adapter form factors.
The H755MX card is used and ships in modular systems, such as with the MX7000 chassis.
• PERC 10 controllers have three form factors: mini monolithic, MX, and adapter.
• The mini-monolithic controller plugs in to the system board and the backplane
using a cStack cable.
• The MX version connects to the system board through the mini mezzanine slot.
• The PCIe adapter plugs into the server riser card. The backplane is then
connected using an internal SAS cable.
• PERC 1150 and PERC 1051 controllers replace the mini monolithic form factor
with the front PERC. This redesign has shorter cabling between the PERC,
backplane, and system board.
• PERC cabling is used for all data communication including Storage Enclosure
Services (SES), SAS, PCIe, and I2C data.
50 PERC 11 cards are an upgraded version of PERC 10 cards and are used with
Intel 15G PowerEdge servers.
51 PERC 10 cards were released with PowerEdge 14G and 15G servers.
Below are different types of PERC 10.6 and 11.1 cards supported by PowerEdge
15G servers.
1 Heatsink
2 PCIe connector
3 Battery
4 Backplane connector A
5 Backplane connector B
1 Heatsink
2 Battery
6 PCIe connector
2 Battery
3 Heatsink
7 Power connector
1 Heatsink
4 PCIe connector
2 Heatsink
5 Power connector
1 Battery
4 Heatsink
5 Backplane connector A
6 Backplane connector B
1 Battery
4 Heatsink
5 Backplane connector A
6 Backplane connector B
8: PERC H755 MX
The PERC H755MX does not support the MX5016s storage sled. Customers who want to use the MX5016s should use the HBA330 MMZ (manages internal disks only) or the jumbo PERC (manages both internal and storage sled disks).
1. Heatsink
2. PCIe connector
3. Battery
4. Backplane connector A
5. Backplane connector B
6. Battery cable connector
1. Battery
2. PCIe input connector
3. Power card edge connector
4. Heatsink
5. Backplane connector A
6. Backplane connector B
7. Battery cable connector
1. Battery
2. PCIe cable connector
3. Power card edge connector
4. Heatsink
5. Backplane connector A
6. Backplane connector B
7. Battery cable connector
1. Heatsink
2. Battery
3. Battery cable connector
4. Backplane connector A
5. PCIe connector
1. Heat sink
2. Backplane connector
3. Backplane connector A
4. PCIe connector
4. Backplane connector A
5. Power card edge connector
1. Heatsink
2. Backplane connector A
3. PCIe connector
• A PERC translates the SCSI instructions and passes the instructions to the
NVMe drives.
• Windows Device Manager lists all the NVMe drives.
The image shows how the Windows Device Manager lists the NVMe drives.
In the Dell 15G servers, PERC has two options for enclosure configuration mode:
Unified Mode and Split Mode.
The split mode is indicated as <X:Y>. By default, the split mode is a <12:12> split.
X slots are assigned to one controller and Y slots are assigned to a different
controller.
Apply Changes
Once added to pending operations, click Apply Now to initiate the configuration
operation.
After the job is completed, a cold reboot is required to apply the changes.
PERC 10 Specifications
PERC 11 Specification
Ask: List a few PERC cards from the PERC 10.6 series that support both 14G and 15G.
Accept answers and discuss. Ensure that the following points are covered:
Backplanes
PowerEdge backplane.
The backplane is the board where the HDDs/SSDs connect. Logically splitting it into two separate backplanes to mirror channels creates a single point of failure, so if multiple HDDs lose connection, the backplane should be investigated. If none of the drives power on, the backplane may have lost power. The drives may also power on while the system reports that it cannot connect to some or all of the drives; a failing data connection between the backplane and the system board or disk controller may cause this issue.
If the system does not have a backplane, users can cable drives directly to the
system board.
52 14G and 15G SAS backplanes meet 12 Gbps SAS and 6 Gbps SATA requirements. 15G NVMe backplanes meet 16 GT/s PCIe Gen4 requirements with two lanes per drive. 14G NVMe backplanes meet 8 GT/s PCIe Gen3 requirements with four lanes per drive.
Universal Backplane
Server security focuses on protecting the data and resources that are
stored in servers.
55 The chip includes a unique endorsement key that is baked into the
module during manufacturing, like a digital fingerprint to establish the
trustworthiness of data and applications. This cross-platform solution
engages at the lowest level of system operation, protecting against
unauthorized firmware and software modifications that can undermine
system integrity.
The TPM cannot be removed from one system board and installed on
another system board.
TPM 2.0 is not fully supported in legacy BIOS mode because there is
no pointer to TPM logs in legacy BIOS mode.
Each domain or hierarchy of TPM has its own resources and controls.
The iDRAC is responsible for RoT and verifies the BIOS SPI code before allowing the host chipset and CPU to run any code.
RoT Purpose
The silicon-based RoT starts a chain of trust to ensure systems boot with legitimate BIOS code. Once the BIOS code has been verified as legitimate, those credentials are trusted by each subsequent piece of code that is executed.
RoT Operation
1. On a server, the silicon chip acts to validate that the BIOS is legitimate
by checking its encrypted signature.
2. This encrypted signature (a Dell encryption key) is burned into silicon
during the manufacturing process and cannot be changed.
The only way to make the Root of Trust robust is to do it in hardware. The
read-only encryption keys are burned into PowerEdge servers at the
factory. These keys cannot be changed or erased. When the server
powers on, the hardware chip verifies that the BIOS code is legitimate
from Dell using the key that is burned into silicon in the factory.
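As a conceptual illustration only (not Dell's actual implementation, which uses signed images and keys fused into silicon), the sketch below shows the general pattern of comparing a computed digest against an immutable reference value before allowing code to run.

```python
import hashlib

# Conceptual illustration only: a root of trust holds an immutable reference value
# and refuses to run an image that does not match it.
KNOWN_GOOD_SHA256 = hashlib.sha256(b"legitimate BIOS image").hexdigest()  # burned-in reference

def verify_image(image: bytes) -> bool:
    """Return True only if the image hashes to the trusted reference value."""
    return hashlib.sha256(image).hexdigest() == KNOWN_GOOD_SHA256

print(verify_image(b"legitimate BIOS image"))   # True: boot continues
print(verify_image(b"tampered BIOS image"))     # False: recovery is triggered
```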
• The BIOS Live Scanning feature enables users to scan the system
BIOS once POST is completed. This task can be run once or can be
set up on a schedule.
• The scan period can be once a week, once a month, or once a year (adjustable by the end user).
• BIOS Live Scanning is a licensed feature, available only with the iDRAC Datacenter license on 15G systems.
Image of the iDRAC UI with the BIOS Live Scanning option highlighted.
The 14G and the 15G PowerEdge servers support the Intel Boot Guard
verified boot feature. Boot Guard protects the server BIOS.
• The Basic Input Output System (BIOS) is implicitly a critical element of any solution stack, and updating it involves risks59.
• The BIOS persists between power cycles, becoming a potentially
attractive target for malicious attacks.
• Attacks against the BIOS are typically hard to detect. Attacks run
before the operating system and other security software loads. This
mechanism leaves a platform or organization exposed to further threat
or performance issues.
Boot Guard is a processor feature that prevents the system from running
firmware images that are not released by the manufacturer. It also
allows the BIOS or UEFI to verify that the BIOS is not compromised before
booting.
In the Boot Guard verification method, the CPU compares the current
BIOS or UEFI firmware image with an official hash-generated version of
the image that is stored on PowerEdge servers.
Discussion
Due to the critical nature of the BIOS and the perceived risks of updating,
some customers hesitate to perform scheduled updates during a server
lifecycle. This can leave a platform or organization exposed to even further
threats or performance issues. For this reason, we have implemented
multiple new features, including Boot Guard.
1. The Boot Guard extends the platform RoT to the Platform Controller
Hub (PCH).
2. The PCH contains One-Time Programmable (OTP) fuses that are
burned at the Dell factory during the manufacturing process.
3. The OTP contains the selected Boot Guard policy and the hash of the
master public key.
4. The key manifest on the BIOS SPI flash is signed by the Dell master
Original Equipment Manufacturer (OEM) key and delegates authority
to the boot policy manifest key.
5. Each BIOS module contains a hash value of the next module in the
chain and uses that hash value to validate the next module (see the
sketch after this list).
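The hash chain in step 5 can be illustrated with the minimal Python sketch below.
The module names and contents are hypothetical, and the root of the chain is
assumed to have already been validated against the OTP hash and key manifest.

import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical BIOS modules, listed in boot order.
modules = [b"module A: early init", b"module B: memory init", b"module C: OS hand-off"]

# Build the chain: each entry stores the hash of the *next* module.
chain = [{"code": m, "next_hash": sha256(modules[i + 1]) if i + 1 < len(modules) else None}
         for i, m in enumerate(modules)]

def validate_chain(chain) -> bool:
    for i, entry in enumerate(chain):
        if entry["next_hash"] is not None:
            # Recompute the next module's hash and compare with the stored value.
            if sha256(chain[i + 1]["code"]) != entry["next_hash"]:
                return False  # validation failed; recovery would be triggered
    return True

print(validate_chain(chain))  # True when no module has been tampered with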
If the Boot Guard event detects any issue in the BIOS image before
booting, it immediately activates the BIOS or UEFI recovery feature and
attempts to recover a backup BIOS or UEFI.
The Boot Guard event and the subsequent events that perform the BIOS
or UEFI recovery are captured in the Lifecycle Controller log as
highlighted in the image.
There are two BIOS ROMs in the system: one 32 MB ROM for the normal
full-sized BIOS and one 16 MB recovery ROM.
Secure Boot
The UEFI Secure Boot is a technology that secures the boot process by
verifying if the drivers and operating system loaders are signed by the key
that is authorized by the firmware.
The BIOS authenticates each module that is run during the boot process
using the certificates in the Secure Boot policy. Before the system BIOS loads
a module into memory, Secure Boot checks whether the module is authorized
to run on the system. This check applies to various code modules, such as
device firmware, diagnostics, and operating system loaders.
The Secure Boot policy allows a user to specify the policy or digital
signature that BIOS uses to authenticate. The policy can be classified as:
• Standard: BIOS uses the default set of certificates to validate the
drivers and operating system loaders during the boot process. By
default, the Secure Boot Policy is set to Standard.
• Custom: BIOS uses a specific set of certificates, which can be
imported into or deleted from the standard certificates, to validate the
drivers and operating system loaders during the boot process.
The Secure Boot policies in the latest technology of Dell servers are
described in terms of various modes.
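For reference, the Secure Boot state can also be read programmatically. The
Python sketch below assumes the iDRAC exposes the standard DMTF Redfish
SecureBoot resource for the host system (System.Embedded.1) and uses a
placeholder IP address and credentials.

import requests

# Hypothetical iDRAC address and credentials.
IDRAC = "https://192.168.0.120"
AUTH = ("root", "calvin")

# Standard Redfish SecureBoot resource for the host system.
url = f"{IDRAC}/redfish/v1/Systems/System.Embedded.1/SecureBoot"
resp = requests.get(url, auth=AUTH, verify=False)  # verify=False: lab use only
resp.raise_for_status()
data = resp.json()

print("Secure Boot enabled:", data.get("SecureBootEnable"))
print("Current boot state :", data.get("SecureBootCurrentBoot"))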
Configuration Validation
Overview
Configuration Validation:
• Error Message - HWC8010: Occurs when there are one or two issues
in the configuration.
• Error Message - HWC8011: Occurs when there are multiple issues in
the configuration.
The product specific user manuals provide additional details about these
errors.
Error Messages
The following table highlights the HWC8010 and HWC8011 error messages
along with the interpretation of each error.
The Dell PowerEdge R750xa supports two GPU configurations in the chassis front
end.
• Four GPUs configuration
− In this configuration, each riser has two GPUs.
• Two GPUs configuration
− In this configuration, each riser has one GPU. Dummy GPU modules must be
installed in the empty slots.
− For example, if GPUs are installed in slot 31 and slot 33, then dummy GPU
modules must be installed in slot 32 and slot 34. Similarly, if two GPUs are
installed in slot 33 and slot 34, then dummy GPU modules must be installed
in empty slots 31 and slot 32.
RAID in OMSA
Dell recommended guidelines for installing memory modules on a 15G server are listed below (a simple population check is sketched after this list):
• All DIMMs must be DDR4.
• RDIMMs and LRDIMMs must not be mixed.
• DRAMs with x4 and x8 based memory modules can be mixed.
• In Optimizer Mode, the DRAM controllers operate independently in the 64-bit
mode and provide optimized memory
performance.
• If memory modules with different speeds are installed, they will operate at the
speed of the slowest installed memory module(s) or slower, depending on the
system DIMM configuration.
• Populate all the sockets with white release tabs first and then the black release
tabs.
• Memory modules of different capacities can be mixed, provided other memory
population rules are followed.
• When mixing memory modules with different capacities, populate the sockets
with memory modules with the highest capacity first.
− For example, if the customer wants to mix 8 GB and 16 GB memory
modules, populate 16 GB memory modules in the sockets with white
release tabs and 8 GB memory modules in the sockets with black release
tabs.
• In a dual-processor configuration, the memory configuration for each processor
should be identical.
• Mixing of more than two memory module capacities in a system is not
supported.
• Unbalanced or odd memory configurations result in a performance loss, and the
system may not identify the installed memory modules, so always populate
memory channels identically with identical DIMMs for best performance.
• Supported RDIMM / LRDIMM configurations are 1, 2, 4, 6, 8, 12, or 16 DIMMs
per processor.
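A simple population check, such as the hypothetical Python helper below, can make
a few of these rules concrete. The DIMM record format and field names are
illustrative only, not an actual Dell tool.

# Minimal sketch: check a proposed DIMM population against a few of the rules above.
def check_population(dimms):
    errors = []
    if any(d["type"] != "DDR4" for d in dimms):
        errors.append("All DIMMs must be DDR4.")
    kinds = {d["kind"] for d in dimms}          # e.g., "RDIMM" or "LRDIMM"
    if {"RDIMM", "LRDIMM"} <= kinds:
        errors.append("RDIMMs and LRDIMMs must not be mixed.")
    if len({d["capacity_gb"] for d in dimms}) > 2:
        errors.append("Mixing more than two capacities is not supported.")
    return errors

dimms = [{"type": "DDR4", "kind": "RDIMM", "capacity_gb": 16},
         {"type": "DDR4", "kind": "RDIMM", "capacity_gb": 8}]
print(check_population(dimms))  # [] -> this mix is allowed by the rules checked here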
Dell recommended guidelines for installing memory modules on a 14G server are:
• All DIMMs must be DDR4.
• RDIMMs and LRDIMMs must not be mixed, and NVDIMMs and LRDIMMs must not be mixed.
• NVDIMMs and RDIMMs can be mixed.
• 64 GB LRDIMMs should not be mixed with 128 GB LRDIMMs.
• DRAMs with x4 and x8 based memory modules can be mixed.
• Up to two RDIMMs and LRDIMMs can be populated per channel regardless of
rank count.
• A maximum of two different ranked DIMMs can be populated in a channel
regardless of rank count.
• If memory modules with different speeds are installed, they will operate at the
speed of the slowest installed memory module(s) or slower, depending on the
system DIMM configuration.
• Populate all the sockets with white release tabs first, followed by the black
release tabs.
• Memory modules of different capacities can be mixed, provided other memory
population rules are followed.
• When mixing memory modules with different capacities, populate the sockets
with memory modules with the highest capacity first.
− For example, if the customer wants to mix 8 GB and 16 GB memory
modules, populate 16 GB memory modules in the sockets with white
release tabs and 8 GB memory modules in the sockets with black release
tabs.
• In a dual-processor configuration, the memory configuration for each processor
should be identical.
• Mixing of more than two memory module capacities in a system is not
supported.
• Unbalanced memory configurations will result in a performance loss so always
populate memory channels identically with identical DIMMs for best
performance.
Dell recommended guidelines for installing memory modules on a 13G server are:
• RDIMMs and LRDIMMs must not be mixed.
• DRAMs with x4 and x8 based memory modules can be mixed.
• Up to three dual-rank or single-rank RDIMMs can be populated per channel.
• Up to three LRDIMMs can be populated per channel regardless of rank count.
• If memory modules with different speeds are installed, they will operate at the
speed of the slowest installed memory module(s) or slower, depending on
system DIMM configuration.
• Populate memory module sockets only if a processor is installed.
• Populate all the sockets with white release tabs first, followed by the black
release tabs, and then the green release tabs.
• Memory modules of different capacities can be mixed, provided other memory
population rules are followed.
• When mixing memory modules with different capacities, populate the sockets
with memory modules with highest capacity first.
− For example, if the customer wants to mix 4 GB and 8 GB memory
modules, populate 8 GB memory modules in the sockets with white release
tabs and 4 GB memory modules in the sockets with black release tabs.
• In a dual-processor configuration, the memory configuration for each processor
should be identical.
• Mixing of more than two memory module capacities in a system is not
supported.
• Populate four memory modules per processor (one DIMM per channel) at a
time to maximize performance.
Fans may run at high or full speed for various reasons. The workload
running on the server can result in high CPU utilization and thus an increased
cooling requirement. If the system is idle and the fans are still at full speed, then
either a hardware option present in the server (such as a high-power card or a
third-party PCIe adapter) requires full fan speed, or there is a failure of sensor
communication, a fan failure, or the server is operating without the chassis cover
and/or air-regulating shroud. Some systems require blanks for unpopulated hard
drive slots, DIMM slots, and/or CPU sockets. Cooling for certain components may
be compromised if these blanks are missing, resulting in higher fan speeds.
Thermal algorithms define the minimum system fan speeds based on ambient
temperature, system configuration and system utilization. Allowing the user to
reduce fan speed could put system cooling at risk, potentially causing system
thermal-related failures. The only instance in which the user can reduce system
fan speed is when a third-party PCIe adapter card is part of the configuration for
which a thermal algorithm provides cooling based on limited information from the
card. This response may result in overcooling of the card. In this case, the user
can turn off the fan response that is associated with this card, or define a custom
airflow value for the card (iDRAC web interface or RACADM). Turning off the
third-party card fan response reduces the fan speed only if no other component in
the system is requesting a higher fan speed than the third-party PCIe adapter
card. Turning off this response is not recommended unless the user is aware of
the cooling requirements of the adapter card.
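The arbitration behavior described above can be summarized with the minimal
Python sketch below: the delivered fan speed follows the highest request, so
disabling the third-party card response lowers fan speed only when nothing else
asks for more. The function and PWM values are illustrative, not iDRAC internals.

# Minimal sketch of fan-speed arbitration (requests expressed as PWM in %).
def required_pwm(component_requests, third_party_request, third_party_response_enabled=True):
    requests = list(component_requests)
    if third_party_response_enabled:
        requests.append(third_party_request)
    # The fans run at the highest speed any component is asking for.
    return max(requests)

# CPU/memory/drive requests vs. the third-party adapter's default response.
print(required_pwm([35, 40], 70, True))   # 70 -> the adapter drives the fans
print(required_pwm([35, 40], 70, False))  # 40 -> speed drops only because nothing else asks for more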
I hear a fan spinning but my server is not powered ON. Is that expected?
Some server platforms are designed to allow one particular fan in the system to
power on when the system is in the standby (AUX) state (AC plugged in, but the
power button not pressed). This fan may run under some system inlet ambient
conditions to ensure cooling for onboard network devices that may be active in
the system AUX state.
Fan speeds are expressed in Revolutions Per Minute (RPM), but the input signal
that drives a fan to run at different speeds is expressed as Pulse Width
Modulation (PWM). PWM can be any number between 0% and 100%. Note that a
PWM of 0% generally does not mean that a fan is OFF; 0% is typically defined as
the fan's lowest operational speed. Conversely, at 100% PWM, fans run at the
maximum RPM. The relationship between fan PWM and RPM is linear.
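A minimal Python sketch of that linear mapping is shown below; the minimum and
maximum RPM values are illustrative placeholders, not specifications for any
particular fan.

# Linear PWM-to-RPM relationship: 0% maps to the lowest operational speed (not OFF),
# 100% maps to the maximum RPM. Example values only.
def pwm_to_rpm(pwm_percent, min_rpm=2000, max_rpm=16000):
    return min_rpm + (max_rpm - min_rpm) * pwm_percent / 100

print(pwm_to_rpm(0))    # 2000.0  -> lowest operational speed
print(pwm_to_rpm(50))   # 9000.0
print(pwm_to_rpm(100))  # 16000.0 -> maximum RPM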
Various custom thermal settings are available and accessible using iDRAC
interfaces like RACADM, iDRAC UI, and BIOS HII browser. These thermal options
include Custom Thermal Profiles (Maximum Performance, Maximum
Performance per Watt, Sound Cap); custom fan speed options (minimum fan
speed, fan speed offsets); and reduced Exhaust Temperature settings. In addition,
custom airflow settings can be applied to third-party PCIe adapter cards through
RACADM and iDRAC UI interfaces. The easiest way to access these options is to
connect to the iDRAC Web UI of the server and go to Cooling -> Configure Fans -
> Fan Configuration.
Sound Cap is a new feature of PowerEdge 14G servers. Sound Cap was
developed in response to customer requests and is intended for specialized
environments in which minimizing acoustical output is a higher priority than peak
raw performance. Sound Cap limits, or "caps", CPU power consumption and thus
fan speed, resulting in a lower acoustical ceiling. Its application is unique to
acoustical deployments and may result in reduced system performance.
Why are PCIe adapter cards installed based on a slot priority requirement in
the server?
There are various reasons that slot restrictions exist for certain cards. Some
common ones are:
• Certain slots are limited by PCIe lane width (like x4, x8, x16).
• Mechanically, a card may fit only in certain locations. This can depend on
factors such as whether the card is single-wide or double-wide, and whether it
is a standard-length or full-length card.
• Cabling that is connected to the card may require the card to be in a certain
location for optimum cable routing.
• Cooling limitations in certain slots, such as airflow limitations, may impose a
cooling or thermal priority.
Where can I find more information about PCIe adapter card cooling on the
server?
The best place to look for this information is within the iDRAC Web UI. From the
iDRAC home screen, select Cooling -> Fan Overview -> Configure Fans. Then
scroll down to see the “PCIe Airflow Settings”. This section displays all the PCIe
adapter slots present in the system and the maximum airflow in LFM (Linear Feet
per Minute) available at each slot (when all fans are at full speed). This section
also indicates if a particular PCIe adapter card is considered a third-party Card,
and if so, what LFM is being provided. The user has the option to customize the
airflow based on the card specifications. This feature is new with PowerEdge 14G
servers and is an industry first.
Why is the top cover of my system hot, and is that an indication of a
potential cooling problem? OR Why are CPU temperatures high? OR Why is
the air coming out of the server so hot?
The system top cover may get hot in localized regions above the CPU heatsinks
or near the back of the system. This occurs most commonly in dense systems and
in 1U servers. The localized heating of the top cover is due to the close proximity
of the cover to the CPU heatsink or to the heated exhaust air at the rear of the
system. The surface and exhaust temperatures should not exceed the 70°C safety
limit. Components such as CPUs, GPUs, and general board components are
designed to run at higher temperatures without impacting component or system
reliability. Users wanting to review or adjust system temperatures or exhaust air
temperature can use the Custom Thermal Settings available through the various
iDRAC interfaces to increase fan speed (and thus system cooling) by applying any
one of the Fan Speed Offset, Minimum Fan Speed, and/or Custom Exhaust
Temperature options.
Many high-power compute GPU devices that are passively cooled require
platform-specific configuration restrictions and are supported only on a limited
number of platforms. Lower-power (such as less than 75 W) PCIe adapters are
supported on all platforms. See the platform-specific limitations to ensure
compliance. Some platforms require different CPU heat sinks based on the
installed CPU TDP or other specific hardware options. For example, shorter heat
sinks and a different air shroud are required in the R740 and R740xd to allow for
proper GPU cooling. See the individual platform details for specific information.
NVMe Support
• Intel Gen 4: P5500/P5600
• Kioxia Gen 4: CD6/CM6
• Samsung Gen 4: 1733/1735
Parity
Parity is a small amount of redundant data that is calculated from larger data
blocks and is used to recover data when a disk failure occurs.
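The idea can be illustrated with simple XOR parity, as used by RAID levels that
stripe data with parity. The Python sketch below uses placeholder data blocks.

# XOR parity: the parity block is the XOR of the data blocks, so any single
# missing block can be rebuilt from the surviving blocks plus the parity.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"DATA-ONE", b"DATA-TWO", b"DATA-SIX"]
parity = xor_blocks(data)

# Simulate losing the second block and rebuilding it from parity + survivors.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])  # True -> the lost block is recovered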
vFlash card
vFlash cards provide a shared storage space between the server system
and its iDRAC.