HPC Cluster Tuning Guide On 3rd Generation Intel® Xeon® Scalable Processors
Optimizing performance of servers used for high-performance computing (HPC) applications may require different
configuration options than for servers used with other enterprise applications. For HPC clusters, the goal is to reduce
workload runtimes for applications that use MPI libraries and high-performance fabrics. This guide covers many system
and software configuration options that have been demonstrated to improve application performance in internal
controlled tests.
All configuration settings are intended for multi-node HPC clusters running 2-socket 3rd Generation Intel® Xeon
processor-based servers. No other hardware has been evaluated for this guide.
The objective of this guide is to provide an environment optimized for typical multi-user production clusters.
Configuration settings should be beneficial to a broad list of multi-node applications using MPI libraries. However, HPC
applications may be affected differently by the settings used in this guide; therefore, a performance improvement for
any single application cannot be guaranteed.
3rd Generation Intel® Xeon® Scalable processors (former codename “Ice Lake”) deliver industry-leading,
workload-optimized platforms with built-in AI acceleration, providing a seamless performance foundation to help
speed data’s transformative impact, from the multi-cloud to the intelligent edge and back. Here are some of the
features in these new processors:
• Enhanced performance
• Enhanced Intel® Deep Learning Boost with VNNI
• More Intel® Ultra Path Interconnect (UPI) links
• Increased DDR4 memory speed and capacity (2 integrated memory controllers; 4 channels per controller)
• Intel® Advanced Vector Extensions (Intel® AVX)
• Intel® Security Essentials supporting Intel® Security Libraries for Data Center (Intel® SecL-DC)
• Intel® Speed Select Technology (Intel® SST)
• Support for Intel® Optane™ Persistent Memory 200 series
Populate all memory channels with the fastest DIMM speed supported by the platform.
Intel 3rd Generation Xeon Scalable processors support 8 memory channels per processor. Every memory channel
should be occupied by at least one DIMM. Use identical dual-rank, registered DIMMs for all memory slots; dual-rank
DIMMs will perform better than single-rank DIMMs. DIMM speed should be the fastest speed supported by the platform.
At the same memory speed, 2 DIMMs per channel may perform slightly better than 1 DIMM per channel, provided that
memory speed is not reduced by using more than 1 DIMM.
Most HPC applications and benchmarks will benefit from larger physical memory size. Specific requirements will be
determined by the application. As an example, each cluster node running the GROMACS or LAMMPS molecular
dynamics codes should have a minimum of 96 GB of RAM installed. Following the memory population guidelines and using
one 16 GB DDR4 DIMM in each memory channel would provide a total of 256 GB of RAM for a 2-socket system.
A repeating, corrected memory error will reduce performance. If memory correction is enabled, check dmesg or the
system event log to confirm there are no corrected memory events, and replace any DIMMs that show repeated memory
errors.
Enable Sub-NUMA Clusters, enable One-way IMC Interleave, and set power profiles to “Performance”
CPU Power and Performance Policies and Fan Profiles should always be set to “Performance”.
Enable Turbo Mode. It is unlikely to improve HPC application results due to high CPU utilization, but it will not reduce
performance. You may disable it if performance metrics for a benchmark run must be consistent with previous runs.
The recommended setting for hyper-threading (SMT, or Simultaneous Multi-Threading) is Enabled; however, performance
benefits will vary for each application. For some applications, a small decrease in performance may be observed. It is
recommended to evaluate hyper-threading performance for the applications that you will use. For the STREAM
benchmark specifically, the recommended setting for hyper-threading is Disabled.
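To confirm the resulting setting from the operating system, a quick check such as the following can be used (a generic Linux sketch; the /sys interface requires a reasonably recent kernel):
lscpu | grep "Thread(s) per core"        # 2 = Hyper-Threading enabled, 1 = disabled
cat /sys/devices/system/cpu/smt/active   # 1 = SMT active, 0 = inactive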
Use optimized configuration settings and recommended values. Default values are noted by an asterisk (*).
Configuration Item                          Recommended Value
Hyper-Threading (SMT)                       Enabled* (see text)
Core Prefetchers                            Enabled*
Turbo Boost Technology                      Enabled*
Intel® SpeedStep® (P-States)                Disabled
SNC (Sub-NUMA Clusters)                     Enabled
IMC Interleave                              One-way
UPI Prefetch                                Enabled*
XPT Prefetch                                Enabled*
Total Memory Encryption (TME)               Disabled
Memory controller page policy               Static closed
Autonomous Core C-State                     Disabled*
CPU C6 Report                               Disabled*
Enhanced Halt State (C1E)                   Disabled*
Package C State                             C0/C1 State*
Relax Ordering                              Disabled*
Intel VT for Directed I/O (Intel VT-D)      Disabled*
CPU Power Policy                            Performance
Local/Remote Threshold                      Auto*
LLC Prefetch                                Disabled*
SNC is a feature that provides similar localization benefits as Cluster-On-Die (COD), a feature found in previous
processor families, without some of COD’s downsides. SNC breaks up the last level cache (LLC) into disjoint clusters
based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves
average latency to the LLC and is a replacement for the COD feature found in previous processor families.
For all HPC applications, both SNC and XPT/UPI prefetch should be enabled. This configures two clusters per socket,
utilizes LLC capacity more efficiently, and reduces latency due to core/IMC proximity.
The IMC Interleave setting controls the interleaving between the Integrated Memory Controllers (IMCs). If SNC is
enabled, IMC Interleaving is set to one-way, and there will be no interleaving.
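As a quick check (a generic Linux sketch; assumes the numactl package is installed), a 2-socket system with SNC enabled should expose four NUMA nodes to the operating system:
numactl --hardware | grep available   # expect "available: 4 nodes (0-3)" with SNC enabled on 2 sockets
lscpu | grep "NUMA node(s)"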
Extended prediction table (XPT) Prefetch is a new capability that is designed to reduce local memory access latency.
XPT Prefetch is an “LLC miss predictor” in each core that will issue a speculative DRAM read request in parallel to an LLC
lookup, but only when XPT predicts a “miss” from the LLC lookup.
UPI Prefetch is another new capability that is designed to reduce remote memory access latency. The UPI controller
issues a UPI Prefetch, also in parallel to an LLC lookup, to the memory controller when a remote read arrives to the
home socket.
Direct-to-UPI (D2U) is a latency-saving feature for remote read transactions. With D2U enabled, the IMC will send the
data directly to the UPI instead of going through the Caching and Home Agent (CHA), reducing latency. Keep this
feature enabled, although workloads that are highly NUMA-optimized or that use high levels of memory bandwidth are
less likely to be affected by disabling D2U.
DBP-for-F
DBP-for-F is a new feature that can benefit multi-threaded workloads, but workloads that are single-threaded could
experience lower performance.
Running applications inside a virtual machine will reduce performance, although the reduction may be small. The
performance impact depends on the hypervisor and the configuration used. Do not execute HPC benchmarks in a
virtual machine.
Many HPC applications are fine-grained, requiring more frequent inter-process communication with smaller payloads.
As a result, performance is dependent more on communication latency than on bandwidth. High-performance fabrics
are used to minimize message latency.
For maximum performance, use one fabric host controller per CPU. In a server, each PCIe expansion slot is associated
with one of the CPUs. Install the fabric host controllers so that each CPU is associated with its own fabric host
controller. You will need to consult the system board technical specification to determine PCIe lane assignment.
Fabric performance may be further enhanced by using multiple links (dual-rail or multi-rail). For many applications,
bandwidth needs are limited, and using multi-rail does not improve performance.
Disk configuration has little to no impact on benchmark performance, including for HPL, HPCG, and STREAM.
Before running benchmarks, set all CPUs to the performance frequency governor, for example by using the cpupower
utility.
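A typical invocation is shown below (run with root privileges on each compute node; assumes the cpupower utility is installed):
# Set the CPU frequency governor to "performance" on all cores
cpupower frequency-set -g performance
# Verify the active governor
cpupower frequency-info | grep governor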
Build and execute applications using the latest Intel® oneAPI HPC Toolkit.
Intel® oneAPI Toolkits enable development with a unified toolset, allowing developers to deliver applications and
solutions across CPU, GPU, and FPGA architectures. The Intel® oneAPI HPC Toolkit delivers what’s needed to build,
analyze, optimize, and scale HPC applications with the latest techniques in vectorization, multithreading, multi-node
parallelization, and memory optimization. The HPC toolkit is an add-on to the Intel® oneAPI Base Toolkit, which is
required.
The oneAPI Toolkits are available for installation using a local installer or through online APT and YUM repositories.
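As an illustration only, a YUM-based installation might look like the following; the repository URL, GPG key location, and package names (intel-basekit, intel-hpckit) reflect Intel's documented setup at the time of writing and should be verified against the current oneAPI installation guide.
# Add the Intel oneAPI YUM repository
sudo tee /etc/yum.repos.d/oneAPI.repo <<'EOF'
[oneAPI]
name=Intel oneAPI repository
baseurl=https://yum.repos.intel.com/oneapi
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB
EOF
# Install the Base Toolkit (required) and the HPC Toolkit add-on
sudo yum install intel-basekit intel-hpckit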
The Intel® oneAPI Math Kernel Library (oneMKL) provides enhanced math routines and libraries, such as BLAS, LAPACK, sparse solvers, fast Fourier
transforms (FFT), random number generator functions (RNG), summary statistics, data fitting, and vector math. Use of
oneMKL is recommended for optimal performance of HPC applications and benchmarks.
The library is included in the Intel® oneAPI Base Toolkit, but oneMKL support for Intel® MPI library or Intel® Fortran
Compilers requires the Intel® oneAPI HPC Toolkit.
Use Intel MPI Library minimum version 2021.2.0. MPI applications should be compiled with this version using compilers
and libraries from Intel oneAPI toolkit 2021.2.0 or later.
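A minimal compile-and-run sketch using the Intel MPI compiler wrappers is shown below; the source file, host file, and rank counts are placeholders, and the oneAPI environment script is assumed to be at its default installation path.
# Load the oneAPI compiler, MPI, and library environment
source /opt/intel/oneapi/setvars.sh
# Compile an MPI application with the Intel compilers through the MPI wrappers
mpiicc -O3 -xHost mpi_app.c -o mpi_app        # C source
mpiifort -O3 -xHost mpi_app.f90 -o mpi_app    # Fortran source
# Launch 8 ranks, 4 per node, on the nodes listed in the host file
mpirun -np 8 -ppn 4 -f hostfile ./mpi_app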
For OpenMP-based applications that do not benefit from simultaneous multi-threading, make sure that the
number of OpenMP threads does not exceed the number of physical cores available on the system. The number of
OpenMP threads is controlled by setting the OMP_NUM_THREADS environment variable.
Also set the appropriate thread to core affinity, based on how Hyper-Threading is enabled on the server.
If Hyper-Threading is enabled:
export KMP_AFFINITY=granularity=fine,compact,1,0
If Hyper-Threading is disabled:
export KMP_AFFINITY=compact
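For example, one way to limit the thread count to physical cores while keeping the affinity setting above (a generic Linux sketch; the core-count query assumes lscpu is available):
# Count physical cores (unique core/socket pairs), ignoring SMT siblings
PHYS_CORES=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)
export OMP_NUM_THREADS=${PHYS_CORES}
export KMP_AFFINITY=granularity=fine,compact,1,0   # with Hyper-Threading enabled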
If your system is configured with Omni-Path fabric and multiple links, enable multi-rail communication. Set the variable
PSM2_MULTIRAIL equal to the number of cable links on each host controller. By default, it is set to 1.
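Following that guidance, on a hypothetical node with two links per host controller the setting would be:
export PSM2_MULTIRAIL=2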
Optimum tuning of the HP LINPACK benchmark uses a custom configuration that may impact the performance of other
benchmarks and applications. For that reason, it is not included here. For information on tuning the HP LINPACK
benchmark, contact your Intel representative.
Use the Intel® Optimized HPCG benchmark included with the oneAPI Math Kernel Library
The High Performance Conjugate Gradients (HPCG) Benchmark project (http://hpcg-benchmark.org) is designed to
complement the HP LINPACK (HPL) benchmark by providing metrics that more closely match a different and broad set
of important applications. It measures the performance of basic operations common to these applications, such as
sparse matrix-vector multiplication, sparse triangular solves, vector updates, and global dot products.
The Intel® Optimized HPCG benchmark provides an implementation of the HPCG benchmark optimized for Intel® Xeon®
processors with support for the latest processor technologies, including Intel® Advanced Vector Extensions (Intel® AVX),
Intel® Advanced Vector Extensions 2 (Intel® AVX2), and Intel® Advanced Vector Extensions 512 (Intel® AVX-512).
The benchmark can be found in the “/benchmarks/hpcg” subdirectory under the oneAPI MKL installation. To prepare
the benchmark, follow the instructions at
https://software.intel.com/content/www/us/en/develop/documentation/onemkl-linux-developer-guide/top/intel-math-kernel-library-benchmarks/intel-optimized-high-performance-conjugate-gradient-benchmark/getting-started-with-intel-optimized-hpcg.html
Use the Intel® Xeon® Scalable processor-optimized binary xhpcg_skx. The Intel Optimized HPCG package also includes
the source code necessary to build versions of the benchmark for other MPI implementations. Also note that:
• Small problem sizes will produce very good results. However, the problem size must be large enough that it does
not fit into cache; otherwise the run is considered invalid. The problem should occupy a minimum of 25% of physical
memory. The optimum local grid dimensions will need to be determined through practical evaluation.
• Runtimes longer than the required 3600 s do not appear to impact performance.
• Best results are obtained when using from 1 to 1.25 MPI processes per total core count and 12 to 16 OpenMP threads
per MPI process. The optimal configuration will need to be determined through experimentation. Skip SMT cores
when assigning threads. A run sketch is shown after this list.
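The following is a hypothetical run sketch only; the node count, ranks per node, thread count, and the local grid dimensions and runtime in hpcg.dat are illustrative and must be tuned following the notes above.
# hpcg.dat: line 3 sets the local grid (nx ny nz), line 4 the runtime in seconds
cat > hpcg.dat <<'EOF'
HPCG benchmark input file
Intel Optimized HPCG
192 192 192
3600
EOF
export OMP_NUM_THREADS=16
export KMP_AFFINITY=granularity=fine,compact,1,0
# 2 nodes x 4 MPI ranks per node, using the Scalable processor-optimized binary
mpirun -np 8 -ppn 4 -f hostfile ./xhpcg_skx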
The STREAM benchmark is a simple, synthetic benchmark designed to measure sustainable memory bandwidth (in
MB/s) and a corresponding computation rate for four simple vector kernels (Copy, Scale, Add, and Triad). It is also part
of the HPCC benchmark suite. Its source code is freely available from http://www.cs.virginia.edu/stream/.
The general rule for STREAM is that each array must be at least four times (4×) the sum of all last-level caches used in
the run. STREAM may be run in its standard form, or it may be optimized. When optimized, results must be identified as
such (see the STREAM FAQ at http://www.cs.virginia.edu/stream/ref.html).
For instructions on how to obtain the best performance of the standard STREAM benchmark on Intel processors, see
https://software.intel.com/content/www/us/en/develop/articles/optimizing-memory-bandwidth-on-stream-triad.html
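As a worked example of the 4x sizing rule (the cache size, compiler, and flags below are illustrative assumptions): two processors with a 60 MB last-level cache each give 120 MB of total LLC, so each array must be at least 4 × 120 MB = 480 MB, or at least 60 million 8-byte elements.
# Build the standard stream.c with an array size comfortably above the minimum
icc -O3 -qopenmp -DSTREAM_ARRAY_SIZE=80000000 -DNTIMES=20 stream.c -o stream
# Run with one thread per physical core (Hyper-Threading disabled is recommended for STREAM)
export OMP_NUM_THREADS=$(lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l)
./stream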
The numactl tool can be used to view the NUMA configuration and status of the current server, for example, the CPU
cores and memory size of each node and the distances between nodes. The tool can also bind a process to specified
CPU cores or NUMA nodes so that the process runs only on those resources.
You can view the status of the current NUMA nodes using numastat, including local and remote memory accesses by
CPU cores.
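Illustrative usage (standard numactl and numastat invocations; the application name and node numbers are placeholders):
numactl --hardware                                 # list NUMA nodes, their CPUs, memory sizes, and distances
numastat                                           # show numa_hit/numa_miss and local vs. remote access counters
numactl --cpunodebind=0 --membind=0 ./my_hpc_app   # bind a process and its memory to NUMA node 0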
The Processor Counter Monitor (PCM) can be used to monitor performance indicators of Intel CPU cores. PCM is often
used to monitor the bandwidth of persistent memory. The tool can be downloaded from https://github.com/opcm/pcm.
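Illustrative usage after building PCM from the repository above (binary names and options can vary between PCM versions):
sudo ./pcm 1           # per-core and socket-level counters at a 1-second interval
sudo ./pcm-memory 1    # memory bandwidth per channel and socket, including persistent memory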
Code names are used by Intel to identify products, technologies, or services that are in development and not
publicly available. These are not "commercial" names and are not intended to function as trademarks.
The products described may contain design defects or errors known as errata which may cause the product to
deviate from published specifications. Current characterized errata are available on request.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its
subsidiaries. Other names and brands may be claimed as the property of others.