MONTRES-NVM: An External Sorting Algorithm For Hybrid Memory

This document summarizes a research paper presented at the 2018 7th IEEE Non-Volatile Memory Systems and Applications Symposium about a new hybrid memory-aware sorting algorithm called MONTRES-NVM. MONTRES-NVM is designed for a hybrid main memory containing both DRAM and PCM. It improves on previous external sorting algorithms by taking advantage of DRAM's low latency to accelerate sorting while minimizing writes to the higher latency PCM. The algorithm works in three phases: detecting already sorted sequences, sorting unsorted blocks in DRAM and writing parts to PCM, and finally merging all sorted data using a dichotomous technique. Experimental results show MONTRES-NVM provides up to 80% faster


2018 7th IEEE Non-Volatile Memory Systems and Applications Symposium

MONTRES-NVM: an External Sorting Algorithm for Hybrid Memory
Mohammed Bey Ahmed Khernache
Univ. Bretagne Sud, UMR 6285, Lab-STICC, F-56100, Lorient, France
[email protected]

Arezki Laga, Jalil Boukhobza
Univ. Bretagne Occidentale, UMR 6285, Lab-STICC, F-29200, Brest, France
{firstname.lastname}@univ-brest.fr

Abstract—DRAM technology is approaching its scaling limit, and the use of emerging NVM is seen as one possible solution to this issue. As NVM technologies are not mature enough and do not outperform DRAM, several studies expect the use of hybrid main memories containing both DRAM and PCM NVM. Redesigning applications for such systems is mandatory, as PCM does not have the same performance model as DRAM. In this context, we designed a hybrid memory-aware sorting algorithm called MONTRES-NVM. Since an NVM-based hybrid memory presents a performance gap between DRAM and PCM, we believe that the sorting algorithm falls in the external sorting category. Accordingly, we extended our previously designed flash-based external sorting algorithm MONTRES for a hybrid memory by taking advantage of byte addressability and of the performance asymmetry between reads and writes. MONTRES-NVM enhances the performance of the merge sort algorithm on PCM by more than 60%, the merge sort on DRAM by 3-40% and MONTRES (on a hybrid memory) by 3-33%, according to the proportion of already sorted data in the dataset.

Index Terms—Sorting algorithm, Hybrid memory, Non Volatile Memory, Phase Change Memory.

2575-257X/18/$31.00 ©2018 IEEE 49
DOI 10.1109/NVMSA.2018.00013
Authorized licensed use limited to: Ingrid Nurtanio. Downloaded on October 01,2020 at 02:36:32 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Nowadays, the scaling of DRAM memory is approaching its limit [3] and increasing its density imposes an exponential cost penalty. Emerging memory technologies, such as Phase Change Memory (PCM), may be part of the solution thanks to the high density they can provide [2]. PCM is a byte-addressable memory. It has small-sized cells and a good endurance compared to NAND flash memory. PCM may change our view on the memory hierarchy. It can be integrated either horizontally, where it is considered as an extension of an existing memory level, or vertically, where it is interleaved between two existing memory levels [2]. However, as compared to DRAM, PCM has a higher access latency, especially for write operations, and thus a higher energy cost [4].

The volume of data is growing exponentially and is expected to reach 185 zettabytes in 2025 [5]. To take advantage of this huge amount of data, for instance in real-time analytics applications, fast processing becomes a necessity. Sorting data is one of the most important computational problems for which algorithms have been developed [9]. For instance, the CPU spends 60% of its time sorting data [11], and most operations in a Data Base Management System (DBMS) use these algorithms.

In the literature, many algorithms have been proposed to sort data in the main memory (DRAM), such as quick sort, radix sort or merge sort. When dealing with large data volumes (i.e., whose size is larger than the allocated main memory size), external sorting algorithms have to be used. They are composed of two phases: a run generation phase and a run merge phase [8]. The run generation phase splits data into chunks that fit into the main memory, sorts them and writes the sorted chunks into intermediate files (called runs). Then, runs are merged and written into the final sorted file. The performance of these algorithms highly depends on the way they manage I/O operations. External sorting algorithms were designed to optimize I/O requests on traditional magnetic drives [8]. They were then optimized to benefit from flash memory performance (SSD) [10], [7], [6].

In our work, we considered a hybrid main memory with a large proportion of PCM as compared to DRAM, as in several state-of-the-art works [2]. Since PCM has higher latencies than DRAM and asymmetric read/write performance [4][2], we believe that sorting in a PCM/DRAM main memory has some similarity with external sorting.

In this paper, we present a new hybrid memory-aware sorting algorithm named MONTRES-NVM. This sorting algorithm is based on a previously developed flash memory-based external sorting algorithm named MONTRES (Merge ON-The-Run External Sorting) [10]. The main idea of MONTRES-NVM is to take advantage of the small-sized DRAM to accelerate the sorting process while minimizing the number of write operations performed on the PCM. To do so, MONTRES-NVM is composed of three main phases: (1) a first read operation is performed on the data to detect already sorted sub-sequences, which are then indexed; (2) unsorted sub-sequences are divided into blocks that fit into the DRAM workspace and are sorted in DRAM, using MONTRES' merge-on-the-fly mechanism to store parts of the sorted data in PCM; (3) finally, all sorted data sub-parts are merged using a dichotomy technique and a heap data structure.

MONTRES-NVM has been compared with both in-memory and external sorting algorithms. On random data, when compared with the merge sort (in-memory) algorithm, MONTRES-NVM decreases the sorting time on PCM by more than 60%, and on DRAM by about 14%. It outperformed MONTRES
by about 6%. With partially sorted data, MONTRES-NVM improves on MONTRES by up to 33%, while it enhances the merge sort on PCM by up to 80% and the merge sort fully on DRAM by up to 40%.

The remainder of the paper is organized as follows: Section 2 gives some background about PCM and traditional external sorting algorithms; Section 3 describes MONTRES-NVM; Section 4 gives the experimental evaluation; and finally, Section 5 concludes the paper.

II. BACKGROUND

A. Phase Change Memory

Phase Change Memory (PCM) is a non-volatile memory based on a chalcogenide alloy. It is a resistive NVM that utilizes different resistance levels to represent binary information. A PCM cell usually uses two electrodes that wrap the chalcogenide material, in addition to a heater. This chalcogenide material is subject to a rapid, electrically initiated amorphous-to-crystalline phase-change process. One can distinguish two states of resistance according to the voltage pulse applied to the PCM cell. A short but high voltage pulse results in an amorphous state (binary 0), while a long pulse with a low voltage results in a crystalline state (binary 1).

PCM gives good performance figures (that may be as good as DRAM for read operations) and good static energy properties. However, it presents two main drawbacks: write operations are slow as compared to reads, and the endurance of PCM cells is much lower than that of DRAM cells.

Fig. 1: Illustration of the run generation phase [10].

B. External sorting algorithm

External sorting algorithms are tailored for data volumes that are larger than the main memory. Data are initially stored in a storage device and the sorting algorithm uses an allocated amount of DRAM to work. As the performance gap between DRAM and storage devices is significant, external sorting algorithms try to minimize the number of I/O operations performed. External sorting algorithms were first designed for Hard Disk Drives (HDD). In order to reduce the cost of I/O operations, they massively relied on sequential I/O operations. Then, they were optimized for Solid State Drives (SSD), mainly by relying on random reads to minimize the number of I/O operations and at the same time by reducing the amount of write operations to preserve SSD lifetime.

An external sorting algorithm is composed of two phases: (1) a run generation phase and (2) a run merge phase. In the first phase, a chunk of data is loaded from the storage device into the DRAM. Then it is sorted with an in-memory sorting algorithm. The sorted chunk is written back to the storage device in an intermediate file called a run, see Fig. 1. In the second phase, the sorting algorithm merges the runs by loading sub-parts into the DRAM iteratively until the runs are entirely merged, see Fig. 2.

Fig. 2: Illustration of the run merge phase [10].

III. MONTRES-NVM DESIGN

In this section, we first discuss the motivation behind this work, then we describe MONTRES-NVM, the hybrid memory-aware sorting algorithm we have designed.

A. Motivation: sorting in PCM vs sorting in DRAM

The objective of this section is to show the impact of the PCM performance properties on the execution of some traditional sorting algorithms compared to their execution on DRAM. We will highlight the need for revising state-of-the-art sorting algorithms for PCM-based memories.

We have evaluated the execution of the following popular in-memory sorting algorithms: merge sort, quick sort, heap sort, counting sort and radix sort. For each algorithm, we measured the execution time for two memory configurations: the first one is a full DRAM main memory, and the second one is a full PCM main memory, emulated using PCMSim [1].

Fig. 3 presents the execution times of in-memory sorting algorithms on the two memory configurations. Several observations can be drawn from this figure: (1) using PCM slows down the sorting algorithms, which is quite intuitive as PCM presents higher access latencies than DRAM. (2) The performance degradation due to PCM is highly variable from one algorithm to another. Indeed, the performance degradation for counting sort, radix sort, merge sort, quick sort and heap sort is 45%, 194%, 27%, 39% and 61% respectively. This is mainly due to the proportion of write

operations performed by these algorithms. (3) Finally, for a given data set to sort, the ranking of the sorting algorithms changes from one memory to the other. For instance, the radix sort is better than the merge sort on DRAM but on PCM the radix sort is more than two times slower.

Those results motivated us to revise sorting algorithms for NVM, especially hybrid memories that consist of PCM and DRAM. In such a configuration, we assume a large PCM volume (as PCM presents a higher density) and a small DRAM. Since the performance gap between PCM and DRAM is high for write operations, we believe that external sorting algorithms are better suited to such a hybrid memory configuration. In the case of external sorting, the data to sort are stored in PCM and chunks are successively brought to the DRAM for sorting. The final sorted data are written back to PCM.

Fig. 3: Execution times of five sorting algorithms on a random data set

B. The design of MONTRES-NVM

MONTRES-NVM includes three different phases: (1) sorted data detection, (2) run generation and (3) run merge. These phases are described in the following subsections.

1) Sorted data detection: The idea behind this phase is to detect already sorted data in PCM. This is done to avoid loading them in DRAM for sorting and writing them back to PCM once sorted, as write operations are much more costly than reads in PCM memory. Since our objective is to use as little DRAM as possible, we bound the number of already sorted sequences we can detect to L sequences.

So, during this first phase, MONTRES-NVM searches for the L longest already sorted sequences of values in the input data set stored in PCM. The starting and ending positions of the L sorted sequences are saved in an index stored in DRAM. We call it the primary index. This is possible thanks to the byte-addressability of PCM memory, as we do not have block boundary constraints like in SSDs.

The more the input data contains already sorted values, the longer the sorted sequences are, and the lower is the amount of data to sort in the next phases. The remaining unsorted data are processed during the run generation phase. This first phase did not exist in MONTRES.

Example 1. In Fig. 4, we give an example of the sorted data detection phase. In this example, the input data contain 16 values stored in PCM and the number of already sorted sequences to extract is set to L = 2. So, MONTRES-NVM finds the L = 2 longest already sorted sequences, each containing 4 sorted values. The first already sorted sequence is located between positions 4 and 7 in the input data. The second one is located between positions 10 and 13. Finally, these sequences are inserted into the primary index.

Fig. 4: already sorted data detection

Fig. 5: unsorted data indexation

2) Run generation phase: The objective of this phase is to generate sorted runs from the unsorted chunks of data and store them into the PCM before merging them in the next step. During this phase, MONTRES-NVM adapts the merge-on-the-fly optimization of MONTRES. It consists in sorting the unsorted data beginning from the chunks containing the lowest values. We rely on value locality. That is, if a chunk contains the lowest value, there is a high probability that it contains other low values. Doing so makes it possible to evict several values directly into the final file without saving them into intermediate runs, thus reducing the write traffic on PCM.

During the first phase, in addition to the primary index, MONTRES-NVM organizes the remaining (unsorted) input

data into chunks of m values, m being the number of values able to fit in the available DRAM space dedicated to the sorting process. These chunks are indexed using a secondary index. Each entry of the index includes the minimum value of the chunk and the positions of the remaining data belonging to that chunk (starting and ending positions).

The run generation phase starts by retrieving from the secondary index the chunk containing the lowest value. This chunk is loaded into the DRAM space and sorted using the in-memory merge sort algorithm. Merge sort was used since it gives a good trade-off between performance and memory footprint. Once the chunk is sorted in DRAM, sorted values greater than the next minimum value in the secondary index are written into an intermediate run in the PCM memory. The lower values are merged on-the-fly with previously sorted data (written into a previously generated run, if any) and already sorted data (see [10] for more details). This phase generates at most n/m runs stored in PCM.

While MONTRES loads data block by block during this phase, MONTRES-NVM uses the byte addressability property to load chunks of data of different sizes according to the secondary index.

Example 2. In Fig. 6, MONTRES-NVM gathers unsorted data to create chunks containing m = 4 values. These chunks are then inserted into the secondary index and sorted according to their minimum value. Chunks are processed successively, starting from the one having the lowest value. In Fig. 6, the first chunk (chunk 0) containing the lowest value in the secondary index is loaded from PCM memory into DRAM, then sorted. Values in the sorted chunk greater than the next minimum value in the secondary index, that is 14, are written into the intermediate run in the PCM memory. The remaining values are merged on-the-fly with already sorted data (see Fig. 7). The merge on-the-fly, presented in Fig. 7, considers three inputs: the remaining values of the sorted chunk in DRAM (6 and 9) and the two already sorted sequences of the input data stored in PCM. In this case, only one intermediate run has been created, and its values are all greater than the next minimum value 14, which is why the intermediate run does not take part in the merge on-the-fly process. Values merged on-the-fly are written directly into the final sorted data space in PCM.

Fig. 6: Sorting chunks in the DRAM

Fig. 7: Merge on-the-fly of the second iteration

3) Run merge phase: The run merge phase of MONTRES-NVM was first adapted to exploit the byte-addressable feature of the PCM memory to merge the runs in one pass. In fact, instead of loading blocks of data from each run into in-memory buffers as in traditional external sorting algorithms (like MONTRES), MONTRES-NVM loads only one value from each run into DRAM for the merging process.

In order to merge the n/m generated runs, MONTRES-NVM uses the k-way merge algorithm [11] with k = n/m. This algorithm makes use of a min-heap containing the n/m minimum values from the n/m generated runs.

MONTRES-NVM iterates on the min-heap, and extracts the run containing the lowest value on each iteration. The algorithm retrieves from the obtained run all minimum values lower than the next minimum value in the min-heap. These minimum values are written directly into the final sorted data. Therefore, this mechanism allows MONTRES-NVM to write several values to the output array in every iteration, whereas MONTRES writes only one.

Example 3. Fig. 8 illustrates the merge of the two generated runs with L = 2 already sorted data sequences. The merge mechanism builds a min-heap with the minimum values from the generated runs and the already sorted data. Once the min-heap is created, MONTRES-NVM starts the merge process by retrieving the first minimum value from the min-heap (min = 14, located in intermediate run 1). Then, all the values in intermediate run 1 lower than the next minimum value in the min-heap (17), that is 14 and 15 in this case, are retrieved and written into the final sorted data structure. Then, MONTRES-NVM updates the heap by inserting the new minimum value of intermediate run 1, 40 in this case.

IV. PERFORMANCE EVALUATION

This section describes the methodology used to evaluate the relevance of MONTRES-NVM. Then, results are discussed.
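
Before turning to the measurements, the first two phases of Section III-B can be made concrete with a minimal Python sketch. It is not the implementation evaluated in this paper: the function names `find_sorted_runs` and `split_sorted_chunk`, the `min_len` threshold and most sample values are assumptions chosen for illustration, ordinary Python lists stand in for PCM and DRAM, and Python's built-in `sorted` stands in for the in-memory merge sort.

```python
from bisect import bisect_right


def find_sorted_runs(data, max_runs, min_len=2):
    """Return the max_runs longest non-decreasing runs of data.

    Each entry mimics a primary-index record:
    (begin position, end position, minimum value).
    """
    runs = []
    begin = 0
    for i in range(1, len(data) + 1):
        # A run ends at the end of the input or where the order breaks.
        if i == len(data) or data[i] < data[i - 1]:
            if i - begin >= min_len:
                runs.append((begin, i - 1, data[begin]))
            begin = i
    # Keep only the longest runs, then list them in input order.
    longest = sorted(runs, key=lambda r: r[1] - r[0], reverse=True)[:max_runs]
    return sorted(longest)


def split_sorted_chunk(chunk, next_min):
    """Sort one chunk and split it around the next minimum of the secondary index.

    Values greater than next_min would go to an intermediate run; the others
    are candidates for the merge on-the-fly.
    """
    ordered = sorted(chunk)  # stand-in for the in-memory merge sort
    cut = bisect_right(ordered, next_min)
    return ordered[:cut], ordered[cut:]


# Hypothetical 16-value input with sorted sequences at positions 4-7 and 10-13.
data = [9, 3, 8, 50, 10, 20, 30, 40, 5, 60, 11, 21, 31, 41, 7, 6]
print(find_sorted_runs(data, max_runs=2))  # [(4, 7, 10), (10, 13, 11)]

# Chunk of m = 4 values; 6, 9 and the next minimum 14 echo Example 2,
# while 20 and 33 are hypothetical.
print(split_sorted_chunk([33, 9, 20, 6], next_min=14))  # ([6, 9], [20, 33])
```

With L = 2, the first call returns the two four-value sequences, as in Example 1. The second call separates the values merged on-the-fly (6 and 9) from those that would be written to an intermediate run.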

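
The run merge phase of Section III-B, in which several values are written per heap iteration, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the evaluated implementation: the function name `merge_runs` and most sample values are hypothetical, and Python lists stand in for the runs stored in PCM. The comparison uses `<=` rather than the strict "lower than" of the text, so that equal values still make progress.

```python
import heapq


def merge_runs(runs):
    """Merge sorted runs with a min-heap holding one head value per run."""
    # Heap entries: (head value, run id, position of that value in the run).
    heap = [(run[0], rid, 0) for rid, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        _, rid, pos = heapq.heappop(heap)
        run = runs[rid]
        # Smallest head among the other runs (None if no other run is left).
        limit = heap[0][0] if heap else None
        # Drain every value of this run that does not exceed that head.
        while pos < len(run) and (limit is None or run[pos] <= limit):
            out.append(run[pos])
            pos += 1
        if pos < len(run):
            # Re-insert the run with its new minimum value.
            heapq.heappush(heap, (run[pos], rid, pos))
    return out


# The first run echoes Example 3 (14 and 15 are drained before 17, then 40
# becomes its new minimum); the other values are hypothetical.
runs = [[14, 15, 40], [17, 25], [18, 30, 35]]
print(merge_runs(runs))  # [14, 15, 17, 18, 25, 30, 35, 40]
```

On this input the first heap iteration writes both 14 and 15 before the heap is touched again, which is the behaviour that distinguishes this merge from one that writes a single value per iteration.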
Fig. 8: Run merge phase illustration

A. Evaluation methodology & setup

This section describes the experimental platform setup, followed by the methodology to evaluate MONTRES-NVM.

1) Experimental platform setup: For our experiments, we used PCMSim [1], a Linux kernel block device driver that emulates a PCM in one of the DIMM slots of the motherboard. We ran PCMSim with a Linux kernel 2.6.38. We used datasets extracted from the TPC-H benchmark. We experimented with two different datasets: random/unsorted and partially sorted.

2) Methodology: In our experimental methodology, we compared our results with state-of-the-art work. The basic idea behind the design of MONTRES-NVM is the use of a hybrid memory with a small-sized DRAM. Since MONTRES-NVM is an extension (of MONTRES, which is an extension) of the merge sort algorithm, we compared it with the merge sort fully executed on DRAM. We also compared with the merge sort on PCM to see if it is worth designing in-memory sorting algorithms on a hybrid memory. Finally, we compared with the MONTRES algorithm on a hybrid memory, since we reused some of its optimizations.

In this evaluation part, we varied (1) the DRAM size: we used three configurations, 1%, 5% and 10% of the overall PCM size (see Table I). This proportion is comparable to main memory buffers used for external sorting algorithms. As an example, PostgreSQL uses a comparable proportion to do sorting. (2) We varied the data distribution: we evaluated the algorithms on both unsorted and partially sorted data. In the latter case, sorted parts represent 20%, 40%, and 60% of the overall data set. Table I contains the configuration summary.

Parameter                             Value
Dataset size                          512 MB
PCM size                              2 GB
DRAM size                             1%, 5%, and 10% of PCM size
Index size (K)                        32
Smallest sorted subsequence length    5 (unsorted), 64 (sorted)
Partially sorted data                 20%, 40% and 60%

TABLE I: Experimental setup

B. Results and discussion

Fig. 9: Execution time speedup of MONTRES-NVM on random data as compared to merge sort on DRAM, on PCM and MONTRES

Fig. 9 shows the execution time speedup given by MONTRES-NVM as compared to merge sort on PCM, merge sort on DRAM and MONTRES. MONTRES-NVM improves the execution time over the merge sort on PCM by about 65% for the three DRAM configurations. It also improves on the merge sort on DRAM by 15%, 14% and 12% when using only 1%, 5% and 10% of DRAM memory respectively.

MONTRES-NVM improves the execution time of MONTRES by 9%, 5% and 3% for 1%, 5% and 10% of DRAM space respectively. One may observe that adding more memory does not help in sorting the data faster. In the case of random data, using more DRAM does not accelerate the sorting process, while it incurs more CPU processing. In fact, on a random distribution, the first phase creates a high fragmentation of the memory with very small sorted parts. This fragmentation generates a lot of CPU overhead for a large DRAM space.

Overall, MONTRES-NVM performs better than the in-memory merge sort algorithm whether it is executed in PCM or in DRAM. These results justify the use of external sorting algorithms for a hybrid memory.

Fig. 10 shows the execution time speedup of MONTRES-NVM as compared to the merge sort on DRAM and on PCM, and MONTRES for 20%, 40% and 60% partially sorted data.

MONTRES-NVM enhances the performance of the merge sort on PCM by more than 60% for all the DRAM configurations with 20% of partially sorted data. MONTRES-NVM also enhances the performance of the merge sort on DRAM by 3%, 13% and 14%, for 1%, 5% and 10% of DRAM space respectively. Finally, it improves the performance of MONTRES by up to 8%. One may observe from these experiments that for DRAM space higher than 5%, the performance improvement is

too low to justify the use of more DRAM. In effect, the added CPU overhead is not compensated by the memory read/write savings induced by the use of MONTRES-NVM.

In the case of 60% partially sorted data, MONTRES-NVM improves the merge sort on PCM by up to 80%, the merge sort on DRAM by up to 40% and MONTRES by up to 33%. In fact, when there are more partially sorted data, the primary index does a good job of avoiding several block sorting operations (thus avoiding several read and write operations on PCM). In addition, the merge phase is also accelerated.

Fig. 10: Execution time speedup of MONTRES-NVM on partially sorted data as compared to merge sort on DRAM, on PCM and MONTRES. (a) Partially sorted data 20% (b) Partially sorted data 40% (c) Partially sorted data 60%

V. CONCLUSION

This paper presents an external sorting algorithm named MONTRES-NVM for a hybrid main memory. This algorithm is an adaptation of MONTRES, a flash-based external sorting algorithm. We believe that in a hybrid memory, traditional in-memory sorting algorithms are not well suited, as the performance behaviors of DRAM and PCM are different. MONTRES-NVM uses a small part of DRAM to sort a data set on PCM. MONTRES-NVM tries to reduce the number of write operations performed on the PCM while maintaining a set of structures in DRAM to accelerate the sorting process.

Less effort has been made in state-of-the-art work to optimize the CPU overhead of external sorting algorithms as compared to in-memory algorithms. Traditionally, as the I/O operations are very time consuming, CPU overhead is hidden. When performing external sorting on hybrid memory, one should pay particular attention to the CPU overhead. We will investigate different ways to reduce the CPU overhead to better take advantage of the DRAM space during the sorting process in MONTRES-NVM. We will also work toward reducing the energy consumption overhead of sorting algorithms on hybrid memories for embedded systems.

REFERENCES

[1] PCMSim, https://github.com/huwan/pcmsim
[2] J. Boukhobza, S. Rubini, R. Chen, and Z. Shao, “Emerging NVM,” In: ACM Transactions on Design Automation of Electronic Systems 23.2 (Nov. 2017), pp. 1–32.
[3] O. Mutlu, “Main Memory Scaling: Challenges and Solution Directions,” In: More than Moore Technologies for Next Generation Computer Design. New York, NY: Springer New York, 2015, pp. 127–153.
[4] B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger, “Phase-change technology and the future of main memory,” In: IEEE Micro 30.1 (2010).
[5] J. Boukhobza, and P. Olivier, “Flash Memory Integration: Performance and Energy Issues,” 1st ed. UK: ISTE Press - Elsevier, 2017.
[6] H. Park, and K. Shim, “FAST: Flash-aware external sorting for mobile database systems,” In: Journal of Systems and Software 82.8 (2009), pp. 1298–1312. 2017.
[7] J. Lee, H. Roh, and S. Park, “External Mergesort for Flash-Based Solid State Drives,” In: IEEE Transactions on Computers 65.5 (May 2016), pp. 1518–1527.
[8] G. Graefe, “Implementing Sorting in Database Systems,” In: ACM Comput. Surv. 38.3 (Sept. 2006).
[9] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, “Introduction to Algorithms, Third Edition,” 3rd ed. The MIT Press, 2009.
[10] A. Laga, J. Boukhobza, F. Singhoff, and M. Koskas, “MONTRES: Merge ON-the-Run External Sorting Algorithm for Large Data Volumes on SSD Based Storage Systems,” In: IEEE Transactions on Computers 66.10 (Oct. 2017), pp. 1689–1702.
[11] D. E. Knuth, “The art of computer programming: sorting and searching,” Vol. 3. Pearson Education, 1998.

No ratings yet
(Survey) Memory Devices and Applications For In-Memory Computing
16 pages
A Survey of Software Techniques For Using No-Volatile Memories For Storage and Main Memory Systems
No ratings yet
A Survey of Software Techniques For Using No-Volatile Memories For Storage and Main Memory Systems
14 pages
Bengal College of Engineering and Technology: Report On Storage Strategies
No ratings yet
Bengal College of Engineering and Technology: Report On Storage Strategies
15 pages
PhaseChangeMemory PDF
No ratings yet
PhaseChangeMemory PDF
136 pages
Entity Authentication
No ratings yet
Entity Authentication
30 pages
PTS Syllabus
100% (1)
PTS Syllabus
6 pages
How To Install Openshift On A Laptop or Desktop
100% (1)
How To Install Openshift On A Laptop or Desktop
7 pages
C. Router
No ratings yet
C. Router
10 pages
WORD Shortcut Keys
50% (2)
WORD Shortcut Keys
2 pages
HPC Tuning Guide PDF
No ratings yet
HPC Tuning Guide PDF
22 pages
2-Review of Discrete-Time Signals and Systems-13-12-2024
No ratings yet
2-Review of Discrete-Time Signals and Systems-13-12-2024
68 pages
Storage (S3, Cloudfront)
No ratings yet
Storage (S3, Cloudfront)
21 pages
Summative Test Grade 9 Math
No ratings yet
Summative Test Grade 9 Math
3 pages
The Ribbons - MS Word Review 12345
No ratings yet
The Ribbons - MS Word Review 12345
14 pages
Oct 2020 Spotlight PDF
No ratings yet
Oct 2020 Spotlight PDF
163 pages
2024 09 Exam SRM Syllabus
No ratings yet
2024 09 Exam SRM Syllabus
6 pages
Cand's Pack
No ratings yet
Cand's Pack
8 pages
What's A Satellite Assembly ?
No ratings yet
What's A Satellite Assembly ?
5 pages
New XXX Hot XNXX Sex Xvideo Hot Sex 0008
No ratings yet
New XXX Hot XNXX Sex Xvideo Hot Sex 0008
3 pages
E3220 p5k3 Deluxe
No ratings yet
E3220 p5k3 Deluxe
172 pages
تفعيل الانترنت 4جي - خدماتل
No ratings yet
تفعيل الانترنت 4جي - خدماتل
2 pages
Yunita 2021 J. Phys. Conf. Ser. 1898 012044
No ratings yet
Yunita 2021 J. Phys. Conf. Ser. 1898 012044
15 pages
Installation & Basic Operations: Medcaptain Service Dept
No ratings yet
Installation & Basic Operations: Medcaptain Service Dept
24 pages
New Sorting-Based Partial Distortion Elimination Algorithm For Fast Optimal Motion Estimation
No ratings yet
New Sorting-Based Partial Distortion Elimination Algorithm For Fast Optimal Motion Estimation
6 pages
29.11.2024 FN Seating
No ratings yet
29.11.2024 FN Seating
4 pages
Understanding The Priority Queue With Custom
No ratings yet
Understanding The Priority Queue With Custom
3 pages
Sorting Unsigned Permutations by Reversals Using Multi-Objective Evolutionary Algorithms With Variable Size Individuals
No ratings yet
Sorting Unsigned Permutations by Reversals Using Multi-Objective Evolutionary Algorithms With Variable Size Individuals
4 pages
ST Secure Solutions Authentication and Iot
No ratings yet
ST Secure Solutions Authentication and Iot
14 pages
Research Paper Topics On Computer Engineering
100% (1)
Research Paper Topics On Computer Engineering
7 pages
Load Balancing in Cloud Computing: Violetta N. Volkova, Liudmila V. Chernenkaya Elena N. Desyatirikova
No ratings yet
Load Balancing in Cloud Computing: Violetta N. Volkova, Liudmila V. Chernenkaya Elena N. Desyatirikova
4 pages
Learning Complexity vs. Communication Complexity
No ratings yet
Learning Complexity vs. Communication Complexity
11 pages
Influence of Information Overload On IT Security Behavior: A Theoretical Framework
No ratings yet
Influence of Information Overload On IT Security Behavior: A Theoretical Framework
10 pages
IT English Test Unit 5
No ratings yet
IT English Test Unit 5
6 pages
Experimental Study On The Five Sort Algorithms: You Yang, Ping Yu, Yan Gan
No ratings yet
Experimental Study On The Five Sort Algorithms: You Yang, Ping Yu, Yan Gan
4 pages
20 - Access Control Lists - FortiGate
No ratings yet
20 - Access Control Lists - FortiGate
3 pages
System Center 2022 v23.10
No ratings yet
System Center 2022 v23.10
1 page
$evwudfw 7Klv Duwlfoh Pdlqo/ DLPV DW WKH Vruwlqj Dojrulwkp
No ratings yet
$evwudfw 7Klv Duwlfoh Pdlqo/ DLPV DW WKH Vruwlqj Dojrulwkp
4 pages
A Detailed Experimental Analysis of Library Sort Algorithm: Neetu Faujdar
No ratings yet
A Detailed Experimental Analysis of Library Sort Algorithm: Neetu Faujdar
6 pages
Implementation and Performance Comparison of Some Heuristic Algorithms For Block Sorting
No ratings yet
Implementation and Performance Comparison of Some Heuristic Algorithms For Block Sorting
6 pages
Thread-Level Parallel Algorithm For Sorting Integer Sequence On Multi-Core Computers
No ratings yet
Thread-Level Parallel Algorithm For Sorting Integer Sequence On Multi-Core Computers
5 pages
A Reduced-Complexity Sphere Decoding Algorithm For MIMO Systems
No ratings yet
A Reduced-Complexity Sphere Decoding Algorithm For MIMO Systems
5 pages
Sum-of-Products With Default Values: Algorithms and Complexity Results
No ratings yet
Sum-of-Products With Default Values: Algorithms and Complexity Results
5 pages
K-K Sorting On The Multi Mesh of Trees: Nitin Rakesh, Member, IEEE Nitin Chanderwal, Member, IEEE, SIAM, ACM
No ratings yet
K-K Sorting On The Multi Mesh of Trees: Nitin Rakesh, Member, IEEE Nitin Chanderwal, Member, IEEE, SIAM, ACM
4 pages
Reversible Watermarking Based On Sorting Prediction Scheme
No ratings yet
Reversible Watermarking Based On Sorting Prediction Scheme
4 pages
Clump Sort: A Stable Alternative To Heap Sort For Sorting Medical Data
No ratings yet
Clump Sort: A Stable Alternative To Heap Sort For Sorting Medical Data
4 pages
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-N
No ratings yet
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-N
3 pages
Assignment 10-13 PDF
No ratings yet
Assignment 10-13 PDF
12 pages
A Sort Implementation Comparing With Bubble Sort and Selection Sort
No ratings yet
A Sort Implementation Comparing With Bubble Sort and Selection Sort
2 pages
Mms Exam Form
No ratings yet
Mms Exam Form
1 page
Accelerated Computing with HIP
From Everand
Accelerated Computing with HIP
Yifan Sun
4.5/5 (2)
Computer Science: Research in Memory Management
From Everand
Computer Science: Research in Memory Management
Iris Li
No ratings yet
Kernel Methods: Fundamentals and Applications
From Everand
Kernel Methods: Fundamentals and Applications
Fouad Sabry
No ratings yet


2575-257X/18/$31.00 ©2018 IEEE

improves MONTRES by up to 33%, while it enhances the

A. Phase Change Memory

chalcogenide material is subject to rapid amorphous-to-crystalline

Fig. 4: Already sorted data detection
[Figure labels: input data; already sorted data; begin position within input data; end position within input data; minimum value]

Fig. 3: Execution times of five sorting algorithms on a random data set

[Figure labels: input data; already sorted data; PCM; secondary index; unsorted data index; chunk id; begin and end positions within input data; minimum value]

MONTRES-NVM includes three different phases: (1) sorted

minimum value in secondary-index, that is 14, are written    

NVM was first adapted to exploit the byte-addressable feature

(like MONTRES), MONTRES-NVM loads only one value

Fig. 8: Run merge phase illustration

A. Evaluation methodology & setup
