MONTRES-NVM: An External Sorting Algorithm For Hybrid Memory

This document summarizes a research paper presented at the 2018 7th IEEE Non-Volatile Memory Systems and Applications Symposium about a new hybrid memory-aware sorting algorithm called MONTRES-NVM. MONTRES-NVM is designed for a hybrid main memory containing both DRAM and PCM. It improves on previous external sorting algorithms by taking advantage of DRAM's low latency to accelerate sorting while minimizing writes to the higher latency PCM. The algorithm works in three phases: detecting already sorted sequences, sorting unsorted blocks in DRAM and writing parts to PCM, and finally merging all sorted data using a dichotomous technique. Experimental results show MONTRES-NVM provides up to 80% faster


2018 7th IEEE Non-Volatile Memory Systems and Applications Symposium

MONTRES-NVM: an External Sorting Algorithm for Hybrid Memory

Mohammed Bey Ahmed Khernache
Univ. Bretagne Sud, UMR 6285, Lab-STICC, F-56100, Lorient, France
[email protected]

Arezki Laga, Jalil Boukhobza
Univ. Bretagne Occidentale, UMR 6285, Lab-STICC, F-29200, Brest, France
{firstname.lastname}@univ-brest.fr

Abstract: DRAM technology is approaching its scaling limit and the use of emerging NVM is seen as one possible solution to such an issue. As NVM technologies are not mature enough and do not outperform DRAMs, several studies expect the use of hybrid main memories containing both DRAM and PCM NVM. Redesigning applications for such systems is mandatory as PCM does not have the same performance model as DRAM. In this context, we designed a hybrid memory-aware sorting algorithm called MONTRES-NVM. Since an NVM-based hybrid memory presents a performance gap between DRAM and PCM, we believe that the sorting algorithm falls in the external sorting category. As a matter of fact, we extended our previously designed flash-based external sorting algorithm MONTRES for a hybrid memory by taking advantage of byte addressability and of the performance asymmetry between reads and writes. MONTRES-NVM enhances the performance of the merge sort algorithm on PCM by more than 60%, the merge sort on DRAM by 3-40% and MONTRES (on a hybrid memory) by 3-33% according to the proportion of already sorted data in the dataset.
Index Terms: Sorting algorithm, Hybrid memory, Non Volatile Memory, Phase Change Memory.

I. INTRODUCTION
Nowadays, the scaling of DRAM memory is approaching its limit [3] and increasing its density imposes an exponential cost penalty. Emerging memory technologies, such as Phase Change Memory (PCM), may be part of the solution thanks to the high density they can provide [2]. PCM is a byte-addressable memory. It has small-sized cells and a good endurance compared to NAND flash memory. PCM may change our view on the memory hierarchy. It can be integrated either horizontally, where it is considered as an extension of an existing memory level, or vertically, where it is interleaved between two existing memory levels [2]. However, as compared to DRAM, PCM has a higher access latency, especially for write operations, and thus a higher energy cost [4].
The volume of data is growing exponentially and it is supposed to attain 185 zettabytes in 2025 [5]. To take advantage of this huge amount of data, for instance for real-time analytics applications, fast processing becomes a necessity. Sorting data is one of the most important computational problems for which algorithms have been developed [9]. For instance, the CPU spends 60% of its time sorting data [11] and most operations in a Database Management System (DBMS) use these algorithms.
In the literature, many algorithms have been proposed to sort data in the main memory (DRAM) such as quick sort, radix sort or merge sort. When dealing with large data volumes (i.e., whose size is larger than the allocated main memory size), external sorting algorithms have to be used. They are composed of two phases: a run generation phase and a run merge phase [8]. The run generation phase splits data into chunks that fit into the main memory, sorts them and writes the sorted chunks into intermediate files (called runs). Then, runs are merged and written into the final sorted file. The performance of these algorithms highly depends on the way they manage I/O operations. External sorting algorithms were designed to optimize I/O requests on traditional magnetic drives [8]. They were then optimized to benefit from flash memory performance (SSD) [10], [7], [6].
In our work, we considered a hybrid main memory with a large proportion of PCM as compared to DRAM, as in several state-of-the-art works [2]. Since PCM has higher latencies than DRAM and asymmetric read/write performance [4][2], we believe that sorting in a PCM/DRAM main memory has some similarity with external sorting.
In this paper, we present a new hybrid memory-aware sorting algorithm named MONTRES-NVM. This sorting algorithm is based on a previously developed flash memory-based external sorting algorithm named MONTRES (Merge ON-The-Run External Sorting) [10]. The main idea of MONTRES-NVM is to take advantage of the small DRAM to accelerate the sorting process while minimizing the number of write operations performed on the PCM. To do so, MONTRES-NVM is composed of three main phases: (1) a first read operation is performed on the data to detect already sorted sub-sequences, which are then indexed; (2) unsorted sub-sequences are divided into blocks that fit into the DRAM workspace and are sorted in DRAM, and MONTRES' merge-on-the-fly mechanism is used to store parts of the sorted data in PCM; (3) finally, all sorted data sub-parts are merged using a dichotomy technique and a heap data structure.
MONTRES-NVM has been compared with both in-memory and external sorting algorithms. On random data, when compared with the merge sort (in-memory) algorithm, MONTRES-NVM decreases the sorting time on PCM by more than 60%, and on DRAM by about 14%. It outperformed MONTRES

2575-257X/18/$31.00 ©2018 IEEE 49


DOI 10.1109/NVMSA.2018.00013

Authorized licensed use limited to: Ingrid Nurtanio. Downloaded on October 01,2020 at 02:36:32 UTC from IEEE Xplore. Restrictions apply.
by about 6%. With partially sorted data, MONTRES-NVM improves MONTRES by up to 33%, while it enhances the merge sort on PCM by up to 80% and the merge sort fully on DRAM by up to 40%.
The remainder of the paper is organized as follows: Section 2 gives some background about PCM and traditional external sorting algorithms; Section 3 describes MONTRES-NVM; Section 4 gives the experimental evaluation; and finally Section 5 concludes the paper.

Fig. 1: Illustration of the run generation phase [10].

II. BACKGROUND

A. Phase Change Memory
Phase Change Memory (PCM) is a non-volatile memory based on a chalcogenide alloy. It is a resistive NVM that utilizes different resistance levels to represent binary information. A PCM cell usually uses two electrodes that wrap the chalcogenide material, in addition to a heater. This chalcogenide material is subject to a rapid amorphous-to-crystalline phase-change process that is electrically initiated. One can distinguish two states of resistance according to the voltage pulse applied to the PCM cell. A short but high voltage pulse results in an amorphous state (binary 0), while a long pulse with a low voltage results in a crystalline state (binary 1).
PCM gives good performance figures (that may be as good as DRAM for read operations) and good static energy properties. However, it presents two main drawbacks: write operations are slow compared to reads, and the endurance of PCM cells is much lower than that of DRAM cells.
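The write penalty described above can be made concrete with a small back-of-the-envelope model. The sketch below is only an illustration: the function names and the unit costs (a write costing ten times a read on the penalized memory) are placeholder assumptions chosen for the arithmetic, not figures taken from the paper or from any device datasheet.

```python
def access_cost(reads, writes, read_cost, write_cost):
    """Total memory access cost for a given number of reads and writes."""
    return reads * read_cost + writes * write_cost


def relative_slowdown(reads, writes, baseline, candidate):
    """Extra cost of `candidate` over `baseline`, as a fraction of baseline.

    Each memory is described by a (read_cost, write_cost) pair.
    """
    base = access_cost(reads, writes, *baseline)
    return access_cost(reads, writes, *candidate) / base - 1.0


# Placeholder unit costs (assumptions, not measurements):
# a symmetric memory vs. a memory whose writes are 10x more expensive.
SYMMETRIC = (1, 1)
WRITE_PENALIZED = (1, 10)

# Two hypothetical workloads with the same total number of accesses.
read_heavy = relative_slowdown(900, 100, SYMMETRIC, WRITE_PENALIZED)
write_heavy = relative_slowdown(500, 500, SYMMETRIC, WRITE_PENALIZED)
print(f"read-heavy slowdown:  {read_heavy:.0%}")
print(f"write-heavy slowdown: {write_heavy:.0%}")
```

Under this simple model, the workload that writes more pays more for the same write penalty, which is why reducing the number of writes is the main lever for a sort that runs on PCM.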
B. External sorting algorithm
External sorting algorithms are tailored for data volumes that are larger than the main memory. Data are initially stored in a storage device and the sorting algorithm uses an allocated amount of DRAM to work. As the performance gap between DRAM and storage devices is significant, external sorting algorithms try to minimize the number of I/O operations performed. External sorting algorithms were first designed for Hard Disk Drives (HDD). In order to reduce the cost of I/O operations, they massively relied on sequential I/O operations. Then, they were optimized for Solid State Drives (SSD), mainly by relying on random reads to minimize the number of I/O operations and at the same time by reducing the amount of write operations to preserve SSD lifetime.
An external sorting algorithm is composed of two phases: (1) a run generation phase and (2) a run merge phase. In the first phase, a chunk of data is loaded from the storage device into the DRAM. Then it is sorted with an in-memory sorting algorithm. The sorted chunk is written back to the storage device in an intermediate file called a run, see Fig. 1. In the second phase, the sorting algorithm merges the runs by loading sub-parts into the DRAM iteratively until the runs are entirely merged, see Fig. 2.

Fig. 2: Illustration of the run merge phase [10].

III. MONTRES-NVM DESIGN
In this section, we first discuss the motivation behind this work, then we describe MONTRES-NVM, the hybrid memory-aware sorting algorithm we have designed.

A. Motivation: sorting in PCM vs sorting in DRAM
The objective of this section is to show the impact of the PCM performance properties on the execution of some traditional sorting algorithms compared to their execution on DRAM. We will highlight the need for revising state-of-the-art sorting algorithms for PCM-based memories.
We have evaluated the execution of the following popular in-memory sorting algorithms: merge sort, quick sort, heap sort, counting sort and radix sort. For each algorithm, we measured the execution time for two memory configurations: the first one is a full DRAM main memory, and the second one is a full PCM main memory, emulated using PCMSim [1].
Fig. 3 presents the execution times of in-memory sorting algorithms on the two memory configurations. Many observations can be drawn from this figure: (1) using PCM slows down the execution of the sorting algorithms, which is quite intuitive as the PCM presents higher access latencies than DRAM. (2) The performance degradation due to PCM is highly variable from one algorithm to another. Indeed, the performance degradation for counting sort, radix sort, merge sort, quick sort and heap sort is 45%, 194%, 27%, 39% and 61% respectively. This is mainly due to the proportion of write
operations performed by these algorithms. (3) Finally, for a given data set to sort, the ranking of the sorting algorithms changes from one memory to the other. For instance, the radix sort is better than the merge sort on DRAM but on PCM the radix sort is more than two times slower.

Fig. 3: Execution times of five sorting algorithms on a random data set

Those results motivated us to revise sorting algorithms for NVM, especially hybrid memories that consist of PCM and DRAM. In such a configuration, we assume a large PCM volume (as PCM presents a higher density) and a small DRAM. Since the performance gap between PCM and DRAM is high for the write operations, we believe that external sorting algorithms are more suited for such a hybrid memory configuration. In the case of external sorting, data to sort are stored in PCM and chunks are successively brought to the DRAM for sorting. The final sorted data are written back to PCM.

B. The design of MONTRES-NVM
MONTRES-NVM includes three different phases: (1) sorted data detection, (2) run generation and (3) run merge. These phases are described in the following subsections.
1) Sorted data detection: The idea behind this phase is to detect already sorted data in PCM. This is done to avoid loading them in DRAM for sorting and writing them back to PCM once sorted, as write operations are much more costly than reads in PCM. Since our objective is to use as little DRAM as possible, we bound the number of already sorted sequences we can detect to L sequences.
So, during this first phase, MONTRES-NVM searches for the L longest already sorted sequences of values in the input data set stored in PCM. The starting and ending positions of the L sorted sequences are saved in an index stored in DRAM. We call it the primary index. This is possible thanks to the byte-addressability of PCM, as we do not have block boundary constraints like in SSDs. The more the input data contains already sorted values, the longer the sorted sequences are, and the lower is the amount of data to sort in the next phases. The remaining unsorted data are processed during the run generation phase. This first phase did not exist in MONTRES.
Example 1. In Fig. 4, we give an example for the sorted data detection phase. In this example, the input data contain 16 values stored in PCM and the number of already sorted sequences to extract is set to L = 2. So, MONTRES-NVM finds the L = 2 longest already sorted sequences, each containing 4 sorted values. The first already sorted sequence is located between positions 4 and 7 in the input data. The second one is located between positions 10 and 13. Finally, these sequences are inserted into the primary index.

Fig. 4: Already sorted data detection. [Figure labels: input data with two already sorted regions in PCM; primary index (already sorted data index) in DRAM with columns "Begin position within input data", "End position within input data" and "Minimum value".]

Fig. 5: Unsorted data indexation. [Figure labels: input data with two already sorted regions in PCM; secondary index (unsorted data index) in DRAM with columns "Chunk id", "Begin and end positions within input data" and "Minimum value".]

2) Run generation phase: The objective of this phase is to generate sorted runs from the unsorted chunks of data and store them into the PCM before merging them in the next step. During this phase MONTRES-NVM adapts the merge-on-the-fly optimization of MONTRES. It consists in sorting the unsorted data beginning from the chunks containing the lowest values. We rely on value locality. That is, if a chunk contains the lowest value, there is a high probability that it contains other low values. Doing so makes it possible to evict several values directly into the final file without saving them into intermediate runs, thus reducing the write traffic on PCM.
During the first phase, in addition to the primary index, MONTRES-NVM organizes the remaining (unsorted) input
data into chunks of m values, m being the number of values able to fit in the available DRAM space dedicated to the sorting process. These chunks are indexed using a secondary index. Each entry of the index includes the minimum value of the chunk and the positions of the remaining data belonging to the chunk (starting and ending position).
The run generation phase starts by retrieving from the secondary index the chunk containing the lowest value. This chunk is loaded into the DRAM space and sorted using the in-memory merge sort algorithm. Merge sort was used since it gives a good trade-off between performance and memory footprint. Once the chunk is sorted in DRAM, sorted values greater than the next minimum value in the secondary index are written into an intermediate run in the PCM memory. The lower values are merged on-the-fly with previously sorted data (written into a previously generated run, if any) and already sorted data (see [10] for more details). This phase generates at most n/m runs stored in PCM.
While MONTRES loads data block by block during this phase, MONTRES-NVM uses the byte addressability property to load chunks of data of different sizes according to the secondary index.
Example 2. In Fig. 6, MONTRES-NVM gathers unsorted data to create chunks containing m = 4 values. These chunks are then inserted into the secondary index and sorted according to their minimum value. Chunks are processed successively, starting from the one having the lowest value. In Fig. 6, the first chunk (chunk 0) containing the lowest value in the secondary index is loaded from PCM memory into DRAM, then sorted. Values in the sorted chunk that are greater than the next minimum value in the secondary index, that is 14, are written into the intermediate run in the PCM memory. The remaining values are merged on-the-fly with already sorted data (see Fig. 7). The merge on-the-fly, presented in Fig. 7, considers three inputs: the remaining values of the sorted chunk in DRAM (6 and 9) and the two already sorted sequences of the input data stored in PCM. In this case, only one intermediate run has been created, and its values are all greater than the next minimum value 14, which is why the intermediate run does not take part in the merge on-the-fly process. Values merged on-the-fly are written directly into the final sorted data space in PCM.

Fig. 6: Sorting chunks in the DRAM

Fig. 7: Merge on-the-fly of the second iteration

3) Run merge phase: The run merge phase of MONTRES-NVM was first adapted to exploit the byte addressable feature of the PCM memory to merge the runs in one pass. In fact, instead of loading blocks of data from each run into in-memory buffers as in traditional external sorting algorithms (like MONTRES), MONTRES-NVM loads only one value from each run into DRAM for the merging process.
In order to merge the n/m generated runs, MONTRES-NVM uses the k-way merge algorithm [11] with k = n/m. This algorithm makes use of a min-heap containing the n/m minimum values from the n/m generated runs.
MONTRES-NVM iterates on the min-heap, and extracts the run containing the lowest value on each iteration. The algorithm retrieves from the obtained run all minimum values lower than the next minimum value in the min-heap. These minimum values are written directly into the final sorted data. Therefore, this mechanism allows MONTRES-NVM to write several values to the output array in every iteration, whereas MONTRES writes only one.
Example 3. Fig. 8 illustrates the merge of the two generated runs with L = 2 already sorted data sequences. The merge mechanism builds a min-heap with the minimum values from the generated runs and the already sorted data. Once the min-heap is created, MONTRES-NVM starts the merge process by retrieving the first minimum value from the min-heap (min = 14, located in intermediate run 1). Then, all the values in intermediate run 1 lower than the next minimum value in the min-heap (17), that is 14 and 15 in this case, are retrieved and written into the final sorted data structure. Then, MONTRES-NVM updates the heap by inserting the new minimum value of intermediate run 1, 40 in this case.

IV. PERFORMANCE EVALUATION
This section describes the methodology used to evaluate the relevance of MONTRES-NVM. Then, results are discussed.
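Before turning to the measurements, the detection and merge mechanisms of Section III-B can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the function names, the `min_len` threshold and the sample values are hypothetical, the merge-on-the-fly optimization and the ordering of chunks by the secondary index are left out, and the drain step uses "lower than or equal to" the next heap minimum so that duplicate keys still make progress.

```python
import heapq


def find_sorted_runs(data, max_runs, min_len=2):
    """Phase 1 sketch: up to max_runs longest non-decreasing stretches,
    returned as (begin, end, minimum) entries with end inclusive."""
    runs, begin = [], 0
    for i in range(1, len(data) + 1):
        # A stretch ends at the end of the input or where the order breaks.
        if i == len(data) or data[i] < data[i - 1]:
            if i - begin >= min_len:
                runs.append((begin, i - 1, data[begin]))
            begin = i
    # Keep only the longest stretches, then restore positional order.
    runs.sort(key=lambda r: r[1] - r[0], reverse=True)
    return sorted(runs[:max_runs])


def drain_merge(runs):
    """Phase 3 sketch: k-way merge where each heap pop drains every value
    of the selected run that does not exceed the next heap minimum."""
    # Heap entries are (current value, run id, position in that run).
    heap = [(run[0], rid, 0) for rid, run in enumerate(runs) if run]
    heapq.heapify(heap)
    out = []
    while heap:
        _, rid, pos = heapq.heappop(heap)
        run = runs[rid]
        # Next minimum among the other runs, or no bound if none is left.
        bound = heap[0][0] if heap else None
        while pos < len(run) and (bound is None or run[pos] <= bound):
            out.append(run[pos])
            pos += 1
        if pos < len(run):
            heapq.heappush(heap, (run[pos], rid, pos))
    return out


def sketch_sort(data, max_runs, chunk_len):
    """Simplified pipeline: detect sorted stretches, sort the rest in
    chunks of chunk_len values, then merge everything in one pass."""
    index = find_sorted_runs(data, max_runs)
    covered = set()
    for begin, end, _ in index:
        covered.update(range(begin, end + 1))
    rest = [v for i, v in enumerate(data) if i not in covered]
    runs = [sorted(rest[i:i + chunk_len])
            for i in range(0, len(rest), chunk_len)]
    runs += [data[begin:end + 1] for begin, end, _ in index]
    return drain_merge(runs)


# Made-up input: 16 values with sorted stretches at positions 4-7 and 10-13.
sample = [90, 80, 70, 60, 10, 20, 30, 40, 35, 33, 11, 21, 31, 41, 9, 5]
print(find_sorted_runs(sample, max_runs=2))
print(sketch_sort(sample, max_runs=2, chunk_len=4))
```

The drain loop is where the sketch differs from a textbook k-way merge: a single heap pop can emit several output values, so the heap is touched once per group of values rather than once per value.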

Fig. 8: Run merge phase illustration

A. Evaluation methodology & setup
This section describes the experimental platform setup, followed by the methodology to evaluate MONTRES-NVM.
1) Experimental platform setup: For our experiments, we used PCMSim [1], a Linux kernel block device driver that emulates a PCM in one of the DIMM slots of the motherboard. We ran PCMSim with a Linux kernel 2.6.38. We used datasets extracted from the TPC-H benchmark. We experimented with two different datasets: random/unsorted and partially sorted.
2) Methodology: In our experimental methodology, we compared our results with state-of-the-art work. The basic idea behind the design of MONTRES-NVM is the use of a hybrid memory with a small-sized DRAM. Since MONTRES-NVM is an extension (of MONTRES, which is an extension) of the merge sort algorithm, we compared it with the merge sort fully executed on DRAM. We also compared with the merge sort on PCM to see if it is worth designing in-memory sorting algorithms on a hybrid memory. Finally we compared with the MONTRES algorithm on a hybrid memory, since we reused some of its optimizations.
In this evaluation part, we varied (1) the DRAM size: we used three configurations, 1%, 5% and 10% of the overall PCM size (see Table I). This proportion is comparable to main memory buffers used for external sorting algorithms. As an example, PostgreSQL uses a comparable proportion to do sorting. (2) We varied the data distribution: we evaluated the algorithms on both unsorted and partially sorted data. In the latter case, sorted parts represent 20%, 40%, and 60% of the overall data set. Table I contains the configuration summary.

TABLE I: Experimental setup
Parameter                            Value
Dataset size                         512 MB
PCM size                             2 GB
DRAM size                            1%, 5%, and 10% of PCM size
Index size (K)                       32
Smallest sorted subsequence length   5 (unsorted), 64 (sorted)
Partially sorted data                20%, 40% and 60%

B. Results and discussion

Fig. 9: Execution time speedup of MONTRES-NVM on random data as compared to merge sort on DRAM, on PCM and MONTRES

Fig. 9 shows the execution time speedup given by MONTRES-NVM as compared to merge sort on PCM, merge sort on DRAM and MONTRES. MONTRES-NVM enhances the execution time over the merge sort on PCM by about 65% for the three DRAM configurations. It also enhances the merge sort on DRAM by 15%, 14% and 12% when using only 1%, 5% and 10% of DRAM memory respectively.
MONTRES-NVM enhances the execution time of MONTRES by 9%, 5% and 3% for 1%, 5% and 10% of DRAM space respectively. One may observe that adding more memory does not help in sorting the data faster. In the case of random data, the use of more DRAM does not accelerate the sorting process while it incurs more CPU processing. In fact, on a random distribution, the first phase creates a high fragmentation of the memory with very small sorted parts. This fragmentation generates a lot of CPU overhead for a large DRAM space.
Overall, MONTRES-NVM performs better than the in-memory merge sort algorithm whether it is executed in PCM or in DRAM. These results justify the use of external sorting algorithms for a hybrid memory.
Fig. 10 shows the execution time speedup of MONTRES-NVM as compared to the merge sort on DRAM and on PCM, and MONTRES for 20%, 40% and 60% partially sorted data.
MONTRES-NVM enhances the performance of the merge sort on PCM by more than 60% for all the configurations of DRAM for 20% of partially sorted data. MONTRES-NVM also enhances the performance of the merge sort on DRAM by 3%, 13% and 14%, for 1%, 5% and 10% of DRAM space respectively. Finally, it improves the performance of MONTRES by up to 8%. One may observe from these experiments that for DRAM space higher than 5%, the performance improvement is
too low to justify the use of more DRAM. In effect, the added CPU overhead is not compensated by the memory read/write savings induced by the use of MONTRES-NVM.
In the case of 60% partially sorted data, MONTRES-NVM improves the merge sort on PCM by up to 80%, the merge sort on DRAM by up to 40% and MONTRES by up to 33%. In fact, when there are more partially sorted data, the primary index does a good job of avoiding several block sorting operations (thus avoiding several read and write operations on PCM). In addition, the merge phase is also accelerated.

Fig. 10: Execution time speedup of MONTRES-NVM on partially sorted data as compared to merge sort on DRAM, on PCM and MONTRES. Panels: (a) Partially sorted data 20%, (b) Partially sorted data 40%, (c) Partially sorted data 60%.

V. CONCLUSION
This paper presents an external sorting algorithm named MONTRES-NVM for a hybrid main memory. This algorithm is an adaptation of MONTRES, a flash-based external sorting algorithm. We believe that in a hybrid memory, traditional in-memory sorting algorithms are not well suited as the performance behaviors of DRAM and PCM are different. MONTRES-NVM uses a small part of DRAM to sort a data set on PCM. MONTRES-NVM tries to reduce the number of write operations performed on the PCM while maintaining a set of structures in DRAM to accelerate the sorting process.
Less effort has been made in state-of-the-art work to optimize the CPU overhead of external sorting algorithms as compared to in-memory algorithms. Traditionally, as the I/O operations are very time consuming, CPU overhead is hidden. When performing external sorting on hybrid memory, one should pay particular attention to the CPU overhead. We will investigate different ways to reduce the CPU overhead to make better use of the DRAM space during the sorting process in MONTRES-NVM. We will also work toward reducing the energy consumption overhead of sorting algorithms on hybrid memories for embedded systems.

REFERENCES
[1] PCMSim, https://github.com/huwan/pcmsim
[2] J. Boukhobza, S. Rubini, R. Chen, and Z. Shao, “Emerging NVM.,” In: ACM Transactions on Design Automation of Electronic Systems 23.2 (Nov. 2017), pp. 1-32.
[3] O. Mutlu, “Main Memory Scaling: Challenges and Solution Directions.,” In: More than Moore Technologies for Next Generation Computer Design. New York, NY: Springer New York, 2015, pp. 127-153.
[4] Benjamin C Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger, “Phase-change technology and the future of main memory.,” In: IEEE micro 30.1 (2010).
[5] J. Boukhobza, and P. Olivier, “Flash Memory Integration: Performance and Energy Issues.,” 1st. UK: ISTE Press - Elsevier, 2017.
[6] H. Park, and K. Shim, “FAST: Flash-aware external sorting for mobile database systems.,” In: Journal of Systems and Software 82.8 (2009), pp. 1298-1312. 2017.
[7] J. Lee, H. Roh, and S. Park, “External Mergesort for Flash-Based Solid State Drives.,” In: IEEE Transactions on Computers 65.5 (May 2016), pp. 1518-1527.
[8] G. Graefe, “Implementing Sorting in Database Systems.,” In: ACM Comput. Surv. 38.3 (Sept. 2006).
[9] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, “Introduction to Algorithms, Third Edition.,” 3rd. The MIT Press, 2009.
[10] A. Laga, J. Boukhobza, F. Singhoff, and M. Koskas, “MONTRES : Merge ON-the-Run External Sorting Algorithm for Large Data Volumes on SSD Based Storage Systems.,” In: IEEE Transactions on Computers 66.10 (Oct. 2017), pp. 1689-1702.
[11] D.E. Knuth, “The art of computer programming: sorting and searching.,” Vol. 3. Pearson Education, 1998.