
White Paper

Optimizing System Memory Bandwidth with Micron CXL™ Memory Expansion Modules on Intel® Xeon® 6 Processors

Rohit Sehgal, Vishal Tanna        Vinicius Petrucci        Anil Godbole
Micron Technology                 Micron Technology        Intel Corporation
San Jose, CA                      Austin, TX               Santa Clara, CA

Abstract— High-Performance Computing (HPC) and Artificial Intelligence (AI) workloads typically demand substantial memory bandwidth and, to a degree, memory capacity. CXL™ memory expansion modules, also known as CXL "type-3" devices, enhance both memory capacity and bandwidth for server systems by utilizing the CXL protocol, which runs over the PCIe interfaces of the processor. This paper discusses experimental findings on achieving increased memory bandwidth for HPC and AI workloads using Micron's CXL modules. This is the first study to present real-data experiments utilizing eight CXL E3.S (x8) Micron CZ122 devices on the Intel® Xeon® 6 processor 6900P (previously codenamed Granite Rapids AP) featuring 128 cores, alongside Micron DDR5 memory operating at 6400 MT/s on each of the CPU's 12 DRAM channels. The eight CXL memories were set up as a unified NUMA configuration, employing the software-based page-level interleaving mechanism available in Linux kernel v6.9+ between the DDR5 and CXL memory nodes to improve overall system bandwidth. Memory expansion via CXL boosts read-only bandwidth by 24% and mixed read/write bandwidth by up to 39%. Across HPC and AI workloads, the geometric mean of performance speedups is 24%.

Keywords—DDR5, CXL, HPC, software-interleaving, bandwidth, LLM inferencing, AI vector search

I. INTRODUCTION

High-performance and AI workloads encompass important computational tasks that demand substantial processing and memory resources. These workloads are frequently utilized in scientific research, simulations, and data-intensive applications, including computational fluid dynamics, weather forecasting, and DNA sequencing.

Alongside HPC, AI plays a crucial role in analyzing large datasets and driving innovations across various fields. For example, LLM inference and vector search in Retrieval-Augmented Generation (RAG) are crucial workloads, as they enable efficient access to relevant information and enhance the quality of generated responses, making AI interactions more accurate and contextually aware.

This paper presents experimental work conducted by Micron and Intel, which examines the performance of AI and HPC workloads on the Intel® Xeon® 6 processor 6900P series, now in full production, paired with Micron CZ122 CXL devices. The study quantifies the performance benefits of utilizing Micron CZ122 devices in HPC/AI workloads, noting improvements in performance from expanding system memory bandwidth with CXL memory beyond the local DRAM modules. The memory bandwidth expansion enabled by CXL is essential for enhancing the performance of HPC and AI workloads.

While CXL has primarily aimed at expanding memory capacity, its advantages for bandwidth-intensive workloads still need to be thoroughly explored and quantified in real CXL-capable systems, utilizing as many supported PCIe lanes as possible. In particular, the bandwidth characteristics of local DRAM and CXL memory can differ depending on the read/write ratio of a workload, creating challenges in optimizing the capabilities of each memory tier in terms of memory bandwidth. For this purpose, a software-based weighted interleaving method, available in the mainstream Linux kernel distribution, is employed for optimization.

II. PLATFORM CONFIGURATION

A. Intel Xeon 6 CPU System (Avenue City platform)

The 6900P CPU supports six x16 links (96 lanes) of PCIe 5.0. The lanes support CXL 2.0 Type-3 devices, allowing for memory expansion. Any four of the x16 links can be used as CXL links.

Figure 1. System architecture of the Intel Xeon 6 processor 6900P with 128 cores and 12x Micron DDR5 6400 MT/s. All 12 local DRAM channels are designated as NUMA node 0 (HEX mode), while the Micron CXL modules (8 in total) are brought up as a separate NUMA node 1.
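For reference, the NUMA layout described above can be verified from userspace before any interleaving policy is applied. The short Python sketch below is an illustrative addition (not part of the original study); it assumes a Linux system exposing the standard /sys/devices/system/node interface:

    # List NUMA nodes and their memory sizes via sysfs (Linux).
    # On the platform described above, node 0 should report the local
    # DDR5 capacity and node 1 the CXL-attached capacity.
    import glob
    import re

    for node_dir in sorted(glob.glob("/sys/devices/system/node/node[0-9]*")):
        node = node_dir.rsplit("node", 1)[-1]
        with open(f"{node_dir}/meminfo") as f:
            meminfo = f.read()
        # Parse the "Node N MemTotal: ... kB" line.
        total_kb = int(re.search(r"MemTotal:\s+(\d+) kB", meminfo).group(1))
        print(f"node {node}: {total_kb / 2**20:.1f} GiB")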

As the focus of this paper is on demonstrating the effectiveness of increasing bandwidth rather than capacity, smaller memory modules were intentionally chosen for both native DRAM (64 GB) and CXL (128 GB) modules.

The system configuration employed (Figure 1) facilitates the management of the various memory tiers by efficiently organizing and distinguishing between the locally attached DRAM and the CXL memory modules.

Traditionally, the Linux kernel has managed memory allocation across multiple NUMA (Non-Uniform Memory Access) nodes. Each of the memory types (either DRAM or CXL) is represented as a single NUMA node, allowing the system to use existing abstractions to manage and allocate memory across these two different pools.

Recently, NUMA nodes have also been used to categorize memory into performance tiers, while existing allocation policies can place memory on specific NUMA nodes. For example, when brought up as system memory, CXL memory is treated as a separate NUMA node.

To showcase the advantages of using CXL memories, the system configuration is designed so that the local RDIMM slots are filled with the fastest available Micron RDIMMs, delivering 6400 MT/s per slot. All 12 available slots are populated, totaling 768 GB of memory capacity. As shown in Figure 2, eight Micron CZ122 128 GB CXL devices are utilized, occupying 64 PCIe lanes and providing a total additional memory capacity of 1 TB.

Figure 2. Configuration of Micron CZ122 CXL modules with an Intel Xeon 6 CPU on an Avenue City platform: four cards connect directly to the backplane, while the other four cards are attached using riser cards in two CME slots.

Additional details on the platform are shown in the table below.

Platform      Intel Avenue City
CPU family    Intel® Xeon® 6 6900P series with 128 cores
Native DRAM   Micron DDR5 64 GB (6400 MT/s), 12 modules ~ 768 GB – HEX mode
CXL Memory    Micron CZ122 128 GB x 8, E3.S form factor ~ 1 TB
OS            Red Hat Enterprise Linux 9.4
Kernel        6.11.6 (with support for weighted memory interleaving)

B. Memory Expansion with Micron CZ122 CXL modules

Micron's CZ122 CXL modules are currently in production and have demonstrated reliable performance across various workloads, effectively showcasing memory expansion over the CXL interface. The addition of these CXL modules enhances both the memory bandwidth and the capacity of the server, building on what is already provided by the RDIMM slots; that is, delivering memory bandwidth expansion.

Optimally placing newly allocated pages is a complex issue. NUMA interleaving, a traditional approach under Linux, evenly distributes pages across memory nodes for consistent performance. However, it lacks the ability to account for performance differences between memory tiers.

A recent series of patches has added weighted NUMA interleaving capabilities to the Linux kernel, allowing for more strategic memory allocation based on the performance characteristics of the different memory nodes in the system. This strategy optimizes system memory bandwidth by effectively utilizing the bandwidth of both the local DRAM and CXL memory nodes. The weighted-interleaving feature, introduced in Linux kernel version 6.9+ and influenced significantly by Micron's contributions, enables adjusting the ratio of pages placed on the various memory types, thereby enhancing overall memory bandwidth (as illustrated in Figure 3).

Figure 3. Software-based weighted interleaving (M:N), placing M pages on local DRAM and N pages on CXL memory for optimized system memory bandwidth.
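As a concrete illustration of this mechanism, the sketch below sets global interleave weights through the sysfs interface that accompanied the kernel's weighted-interleave support. This is a minimal sketch, assuming kernel 6.9+ with /sys/kernel/mm/mempolicy/weighted_interleave exposed, root privileges, and the two-node layout of Figure 1; the 3:1 weights are the read-only optimum reported later in this paper:

    # Set weighted-interleave weights: 3 pages on DRAM (node 0) for every
    # 1 page on CXL (node 1). Requires root; sysfs path per kernel 6.9+.
    WEIGHTS = {0: 3, 1: 1}  # node id -> interleave weight

    for node, weight in WEIGHTS.items():
        path = f"/sys/kernel/mm/mempolicy/weighted_interleave/node{node}"
        with open(path, "w") as f:
            f.write(str(weight))

    # Processes then opt in to the policy, e.g. via set_mempolicy(2) with
    # MPOL_WEIGHTED_INTERLEAVE, or via a numactl build that supports it.

The weights are global in this interface; individual applications still choose whether their allocations follow the weighted-interleave policy.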

III. NATIVE DRAM VS. CXL ATTACHED MEMORY PERFORMANCE CHARACTERISTICS

Before the performance analysis of the actual workloads is introduced, the performance characteristics of local DRAM and CXL memory regarding bandwidth at various read-to-write ratios of memory traffic will be presented and discussed.[1]

Workload                 Memory Tier   Bandwidth (GB/s)   Bandwidth (Normalized)   CXL over DRAM (theoretical gains with CXL)
Read only                DRAM          556                1.00                     -
3R,1W                    DRAM          486                0.87                     -
2R,1W                    DRAM          474                0.85                     -
2R,1W (non-temporal W)   DRAM          466                0.84                     -
1R,1W                    DRAM          446                0.80                     -
Read only                CXL           205                1.00                     37%
3R,1W                    CXL           214                1.04                     44%
2R,1W                    CXL           208                1.01                     44%
2R,1W (non-temporal W)   CXL           189                0.92                     41%
1R,1W                    CXL           214                1.04                     48%

The performance data in the table above indicates that DRAM performs optimally in read-only workloads, but its performance diminishes when the number of writes approaches or exceeds the number of reads. For instance, in a workload with a 1:1 read-to-write ratio, DRAM's bandwidth drops by 20% compared to the read-only scenario.

Conversely, CXL memory demonstrates the opposite trend due to the bidirectional nature of the PCIe interface, resulting in better relative performance for mixed read/write workloads. Another noteworthy observation is that CXL memory shows an 8% decrease in bandwidth for a non-temporal-write workload. Therefore, it is crucial to analyze the read-to-write ratio of a workload to identify the optimal interleaving strategy for utilizing the DRAM and CXL memory tiers effectively.

As shown in Figure 4, it is also important to note that memory latency is reduced when using CXL. Workloads that rely solely on local DRAM can be bandwidth-limited, leading to significantly higher memory access latency (loaded latency) under heavy load. In contrast, combining DRAM with CXL memory through optimized weighted interleaving results in lower latency, despite CXL memory having a higher unloaded latency.

Figure 4. Bandwidth vs. latency curves using DRAM only vs. DRAM + CXL. The interleaving weights are represented as pairs (DRAM, CXL). At low bandwidth, more pages (9) are allocated to DRAM than to CXL (1), as indicated by the weights (9,1). Conversely, under high load, the optimal interleaving weights shift to (3,1).

At each data point on the "DRAM + CXL" curve, the interleave ratio of DRAM to CXL is displayed. Under low-bandwidth conditions, it is advantageous to utilize more DRAM due to its lower latency compared to CXL memory (9:1 ratio). However, as the load increases, the reliance on DRAM decreases while the emphasis shifts towards CXL memory. Ultimately, a 3:1 ratio was identified as optimal under maximum load for read-only traffic.

When comparing the use of CXL memory alongside local DRAM, various performance improvements can be observed. For instance, in a read-only scenario (where DRAM excels), the addition of CXL memory bandwidth results in a 24% performance boost. The upcoming experiments will demonstrate that for mixed read/write workloads, the performance improvements with CXL, attributed to balanced memory interleaving, can reach as high as 39%. The following sections will also show that for different workload mixes, the interleaving weights may need to be adjusted based on the read-to-write ratio of the workload.

[1] Performance results are derived from testing in the specified configuration (Section II.A). Results may vary, so it is recommended to reconfirm them in your setting.
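To see why small integer ratios such as 3:1 emerge as optimal, a back-of-envelope saturation model helps: with weights M:N, a fraction M/(M+N) of the traffic goes to DRAM and N/(M+N) to CXL, and the combined stream is capped by whichever tier saturates first. The sketch below is an illustrative calculation added here (not a measurement), using the read-only figures from the table above:

    # Idealized ceiling: with weights (m, n), DRAM serves m/(m+n) of the
    # traffic and CXL n/(m+n); total bandwidth is capped by whichever
    # tier saturates first. Read-only figures from the table above.
    BW_DRAM, BW_CXL = 556.0, 205.0  # GB/s

    def predicted_ceiling(m: int, n: int) -> float:
        total = m + n
        return min(BW_DRAM * total / m, BW_CXL * total / n)

    for m, n in [(9, 1), (4, 1), (3, 1), (5, 2), (2, 1)]:
        print(f"{m}:{n} -> {predicted_ceiling(m, n):.0f} GB/s ceiling")

The ceiling peaks near M/N = 556/205 ~ 2.7, i.e. close to 3:1, matching the measured optimum; the measured 690 GB/s sits below the ~761 GB/s ideal sum because the traffic split is never perfectly balanced in practice.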

IV. WORKLOAD ANALYSIS

A. Intel MLC (Microbenchmark)

Intel MLC (Memory Latency Checker) is a microbenchmark tool designed to assess memory latencies and bandwidth in computer systems. It helps analyze how these metrics change under varying loads, providing insights into the performance of the memory subsystem.

Utilizing the software-based interleaving kernel feature, memory allocation between DRAM and CXL is determined by a user-defined ratio. Bandwidth measurements are obtained by running the MLC workloads with different read:write ratios.

The weights for each memory tier are given in terms of the number of pages allocated on DRAM versus CXL memory. For example, a weight of 3 (DRAM) and a weight of 1 (CXL) means 75% of the pages (and, eventually, the associated memory traffic) are allocated on DRAM, while 25% are allocated on CXL memory. The following tables present the MLC results for various read:write ratios.

Workload: R (read-only)

Weight (DRAM)   Weight (CXL)   BW (GB/s)   BW (Normalized)
1               0              556         1.00
1               1              394         0.71
2               1              590         1.06
5               2              669         1.20
3               1              690         1.24
4               1              677         1.22
0               1              205         0.37

As shown above, the MLC results for the R (read-only) workload indicate a 24% increase in bandwidth with a 3:1 interleave ratio of DRAM to CXL.

Workload: W2 (2R, 1W)

Weight (DRAM)   Weight (CXL)   BW (GB/s)   BW (Normalized)
1               0              474         1.00
1               1              422         0.89
2               1              624         1.32
5               2              636         1.34
3               1              617         1.30
4               1              586         1.24
0               1              208         0.44

As shown above, the MLC results for the W2 (2R, 1W) workload indicate a 34% increase in bandwidth with a 5:2 interleave ratio of DRAM to CXL.

Workload: W5 (1R, 1W)

Weight (DRAM)   Weight (CXL)   BW (GB/s)   BW (Normalized)
1               0              446         1.00
1               1              409         0.92
2               1              621         1.39
5               2              614         1.37
3               1              585         1.31
4               1              551         1.24
0               1              214         0.48

As shown above, the MLC results for the W5 (1R, 1W) workload indicate a 39% increase in bandwidth with a 2:1 interleave ratio of DRAM to CXL.

Workload: W10 (2R, 1W non-temporal)

Weight (DRAM)   Weight (CXL)   BW (GB/s)   BW (Normalized)
1               0              466         1.00
1               1              390         0.84
2               1              533         1.14
5               2              607         1.30
3               1              601         1.29
4               1              572         1.23
0               1              189         0.41

As shown above, the MLC results for the W10 (2R, 1W non-temporal) workload indicate a 30% increase in bandwidth with a 5:2 interleave ratio of DRAM to CXL.

In summary, as seen in the tables above, for the 100% read workload, splitting the pages between DRAM and CXL in a 3:1 ratio (3 pages on DRAM, 1 on CXL) results in a 24% bandwidth gain compared to using DRAM only.

For the W2, W3, W5, and W10 MLC workloads, the optimal performance occurs with DRAM-to-CXL ratios of around 5:2 (2:1 in the case of W5). These configurations yield bandwidth increases of roughly 30-39% over DRAM alone. The MLC data shows that adding CXL memory significantly boosts bandwidth.

It is worth noting that the MLC data provides an upper bound on the performance gains when a workload is memory-bandwidth bound at a particular read:write ratio.

For instance, LLM inference predominantly involves read-only traffic, with bottlenecks generally arising at the token-generation stage, which necessitates repeated reading of the model weights for each token. Consequently, the optimal interleave ratio should be 3:1 for DRAM to CXL memory.
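Putting these observations into practice, a launcher could pick weights from a workload's measured read:write mix. The helper below is a hypothetical convenience function (not from the paper) that simply encodes the ratios found above:

    # Map a workload's read fraction to the DRAM:CXL weights that performed
    # best in the MLC experiments above (3:1 for read-only, 5:2 for mixed).
    def recommended_weights(read_fraction: float) -> tuple[int, int]:
        if read_fraction >= 0.95:   # effectively read-only (e.g. LLM decode)
            return (3, 1)
        return (5, 2)               # mixed read/write traffic

    print(recommended_weights(1.0))   # -> (3, 1)
    print(recommended_weights(0.66))  # -> (5, 2)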

B. AI Workloads

The Intel Xeon 6 processor with P-cores family is optimized for HPC and AI workloads, enhancing performance in deep learning and machine learning applications. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs.

With 128 physical cores, the CPU architecture provides specialized acceleration for AI operations, improving throughput and reducing latency in LLM inferencing and vector-search workloads. The architecture supports matrix multiplication and efficiently handles models with billions of parameters.

LLM Inference – To run LLM inferencing on the Intel hardware, the open-source Intel Extension for PyTorch (IPEX) [1] was used. IPEX carries up-to-date optimizations for an extra performance boost on Intel hardware. The LLM model used was Meta-Llama3-8B-Instruct, with the weights stored as 'bfloat16' and a batch size of one. Using IPEX for inferencing, Llama3-8B-Instruct gave a speedup of 17% with a 3:1 DRAM-to-CXL ratio versus using DRAM-only memory.

Weight (DRAM)   Weight (CXL)   Output Token Latency (ms)   Speedup
1               0              42.91                       1.00
2               1              40.43                       1.06
5               2              37.54                       1.14
3               1              36.83                       1.17
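For context, the inference setup can be sketched roughly as below. This is an illustrative reconstruction, not the benchmark script; it assumes the transformers library, an IPEX 2.x release providing ipex.llm.optimize, and access to the Meta-Llama3-8B-Instruct checkpoint:

    # Minimal bfloat16 LLM inference with Intel Extension for PyTorch (IPEX).
    import torch
    import intel_extension_for_pytorch as ipex
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # Apply IPEX's LLM-specific optimizations (AMX/AVX-512 kernels).
    model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

    inputs = tokenizer("What is CXL memory expansion?", return_tensors="pt")
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=64)  # batch size of one
    print(tokenizer.decode(output[0], skip_special_tokens=True))

Under a weighted-interleave policy, the repeated streaming of model weights during token generation is precisely the traffic that benefits from the extra CXL read bandwidth.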
FAISS (Vector Search) – FAISS [7] is a library developed by Facebook AI for efficient similarity search and clustering of dense vectors. The dataset used was the Microsoft Turing-ANNS dataset, consisting of a raw vector space of one billion points with 100 dimensions, using L2 distance and the k-NN method. As recommended by Meta [8], the index used was OPQ128_256-IVF65536_HNSW32-PQ128x4fsr. This is an optimized FAISS index configuration that specifies a series of transformations and indexing methods for efficient similarity search. Here is a breakdown of what each part means:

• OPQ128_256: Optimized Product Quantization rotates vectors for efficient encoding (128 sub-quantizers, 256 output dimensions).
• IVF65536: an Inverted File Index with 65,536 clusters speeds up the search by dividing the vector space into clusters.
• HNSW32: a Hierarchical Navigable Small World graph with 32 neighbors, a graph-based method for approximate nearest-neighbor search.
• PQ128x4fsr: Product Quantization with 128 sub-quantizers of 4 bits each, in the fast-scan residual variant, for further compression.

The configuration combines several advanced techniques to create an efficient and scalable index for similarity search in large datasets.

To report the final performance data, these parameters were configured: nprobe=4096 and efSearch=512. Both are crucial for balancing speed and accuracy in FAISS searches. A higher nprobe (number of clusters probed) increases accuracy but also search time. Similarly, efSearch (number of candidate nodes explored) enhances accuracy at the cost of search time. These values were optimized to achieve a high recall rate with minimal search time. The configuration resulted in a recall rate of 77% @ 10, meaning 77% of the true nearest neighbors are included in the top 10 results returned by the search algorithm.

Weight (DRAM)   Weight (CXL)   Time (ms/query)   Speedup
1               0              0.545             1.00
2               1              0.442             1.23
5               2              0.454             1.20

The FAISS workload demonstrated a 23% improvement with a DRAM-to-CXL ratio of 2:1.
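As a rough sketch of how such an index is built and queried with FAISS (illustrative only; it uses random stand-in vectors rather than the billion-point Turing-ANNS dataset, and a much smaller factory string of the same structure so it trains in seconds):

    # Build and query a FAISS index via index_factory (illustrative scale).
    import faiss
    import numpy as np

    d = 100                                            # dimensionality, as in Turing-ANNS
    xb = np.random.rand(100_000, d).astype("float32")  # stand-in database vectors
    xq = np.random.rand(10, d).astype("float32")       # stand-in queries

    # Same structure as the paper's OPQ128_256-IVF65536_HNSW32-PQ128x4fsr
    # index, scaled down (8 sub-quantizers, 1024 IVF clusters).
    index = faiss.index_factory(d, "OPQ8_64,IVF1024_HNSW32,PQ8x4fsr", faiss.METRIC_L2)
    index.train(xb)
    index.add(xb)

    # Probe more clusters for higher recall at the cost of search time.
    faiss.ParameterSpace().set_index_parameter(index, "nprobe", 64)
    dist, ids = index.search(xq, 10)                   # top-10 nearest neighbors
    print(ids[0])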
C. HPC Workloads

The HPC workloads evaluated include OpenFOAM, HPCG, Xcompact3d, and POT3D. These workloads typically require high memory bandwidth in addition to increased capacity.

OpenFOAM – OpenFOAM workload benchmarks are standardized test cases designed to evaluate the performance and scalability of hardware and software configurations when running OpenFOAM, an open-source computational fluid dynamics (CFD) package. These benchmarks simulate various fluid dynamics scenarios to assess how efficiently different systems handle complex CFD computations. The OpenFOAM drivaerFastback case was used with an input of approximately 200 million cells. The results from the benchmark for different DRAM/CXL ratios are shown below:

Weight (DRAM)   Weight (CXL)   Execution time (s)   Speedup
1               0              254                  1.00
2               1              212                  1.20
5               2              209                  1.22
3               1              210                  1.21

The OpenFOAM workload exhibited a 22% improvement with a DRAM-to-CXL ratio of 5:2.

HPCG – The High Performance Conjugate Gradients (HPCG) benchmark is a workload designed to assess supercomputing systems by solving a large, sparse linear system using a multigrid-preconditioned conjugate gradient algorithm. Unlike the High Performance Linpack (HPL) benchmark, which focuses on dense matrix computations, HPCG emphasizes memory access patterns and data movement, reflecting the behavior of real-world scientific and engineering applications. By doing so, HPCG provides a more comprehensive measure of a system's capability to handle complex, memory-intensive workloads. The input used was x=192, y=192, z=192. Results are shown in the table below.

Weight (DRAM)   Weight (CXL)   Performance (GFlop/s)   Speedup
1               0              92                      1.00
2               1              111                     1.20
5               2              113                     1.23
3               1              117                     1.27

The HPCG benchmark has shown a 27% improvement with a DRAM-to-CXL ratio of 3:1.

Xcompact3D – The Xcompact3D benchmark is a performance evaluation tool designed to assess computational efficiency when solving the incompressible Navier-Stokes equations using the Xcompact3D solver. It focuses on simulating fluid dynamics scenarios, such as the 3D Taylor-Green vortex, to measure how effectively a system manages high-order finite-difference computations. Researchers and engineers utilize this benchmark to evaluate and compare the performance of different hardware configurations and computational setups in fluid dynamics simulations. Results are shown in the table below.

Weight (DRAM)   Weight (CXL)   Execution time (s)   Speedup
1               0              196                  1.00
2               1              221                  0.89
5               2              157                  1.25
3               1              159                  1.24

The benchmark has seen a 25% improvement with a DRAM-to-CXL ratio of 5:2.

POT3D – The POT3D benchmark is a computational performance benchmark that solves the 3D Poisson equation, often used to measure the performance of processors and systems in handling scientific and engineering workloads. This benchmark calculates electrostatic potentials within a 3D space, which is important in fields like molecular dynamics and computational physics. Results are shown in the table below.

Weight (DRAM)   Weight (CXL)   Execution time (s)   Speedup
1               0              687                  1.00
2               1              562                  1.22
5               2              539                  1.27
3               1              552                  1.24

The POT3D workload has demonstrated a 27% improvement with a DRAM-to-CXL ratio of 5:2.

D. Putting it All Together

Figure 5 presents a comprehensive summary of the performance improvements observed across the various HPC and AI workloads. These gains range from 1.17x to 1.27x, illustrating the effectiveness of integrating DDR5-6400 memory with CXL technology. By carefully calibrating the balance between DRAM and CXL memory allocations, an optimized execution configuration can be found for demanding computational tasks. For the HPC and AI workloads studied, the geometric mean of performance speedups is 24%.

Figure 5. Summary of performance gains for the HPC and AI workloads running on DDR5-6400 (baseline) vs. DDR5-6400 + CXL.

A notable example of these performance gains is the POT3D workload, a high-performance computing (HPC) application. The improvements in memory bandwidth and the reduction in loaded latency translate into faster execution of complex simulations, highlighting the impact of CXL memory expansion in HPC environments.

On the artificial intelligence (AI) front, the FAISS benchmark serves as a prime example. FAISS, an AI workload focused on similarity search, showed a 23% improvement with the optimized DRAM:CXL ratio of 2:1. This gain reflects the enhanced memory bandwidth and performance scalability that CXL technology brings to AI applications. By leveraging the combined capabilities of DDR5-6400 and CXL-based memory expansion modules, FAISS can manage larger datasets and perform more efficient searches, thereby accelerating the overall AI processing pipeline.

V. CONCLUSION

The experimental results presented in this paper demonstrate that Micron's CZ122 CXL memory modules, used in a software-level, ratio-based weighted-interleave configuration, significantly enhance memory bandwidth for HPC and AI workloads on systems with Intel Xeon 6 processors.

Key takeaways from this study include:
• Significant improvements in system performance from combining CXL-based memory expansion with native DDR5-6400 memory, owing to the added bandwidth.
• The optimization of the DRAM:CXL ratio as a critical factor in achieving these performance gains.
• The potential for CXL technology to drastically elevate the capabilities of high-performance computing and artificial intelligence applications.

The findings in this paper underscore the potential of CXL to significantly improve system efficiency and performance in demanding applications. Future research and development efforts should continue to explore and refine this integration, paving the way for even greater innovations in hybrid memory systems to meet the increasing computing demands of HPC and AI workloads.
ACKNOWLEDGEMENTS
Thanks to the team members at Micron Technology: Eishan
Mirakhur and Venkata Ravi Shankar Jonnalagadda for their
contributions to the early discussions on weighted
interleaving work and experiments.
REFERENCES
[1] Intel Extension for PyTorch – https://github.com/intel/intel-extension-for-pytorch
[2] HPCG Benchmark – https://www.hpcg-benchmark.org/
[3] OpenFOAM – https://en.wikipedia.org/wiki/OpenFOAM
[4] OpenFOAM benchmark – https://openbenchmarking.org/test/pts/openfoam
[5] Xcompact3d benchmark – https://openbenchmarking.org/test/pts/incompact3d
[6] POT3D – https://github.com/predsci/POT3D
[7] FAISS – https://faiss.ai/
[8] Guidelines to choose a FAISS index – https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index

micron.com
©2024 Micron Technology, Inc. All rights reserved. All information herein is provided on an "AS IS" basis without warranties of any kind, including any implied warranties, warranties of merchantability or warranties of fitness for a particular purpose. Micron, the Micron logo, and all other Micron trademarks are the property of Micron Technology, Inc. Intel and Xeon are trademarks of Intel Corporation. All other trademarks are the property of their respective owners. No hardware, software or system can provide absolute security and protection of data under all conditions. Micron assumes no liability for lost, stolen or corrupted data arising from the use of any Micron product, including those products that incorporate any of the mentioned security features. Products are warranted only to meet Micron's production data sheet specifications. Products, programs and specifications are subject to change without notice. Rev. A 12/2024 CCM004-676576390-11778
