Hybrid Storage Systems: A Survey of Architectures and Algorithms
Nanyang Technological University, Singapore.
2018
Niu, J., Xu, J., & Xie, L. (2018). Hybrid storage systems: A survey of architectures and algorithms. IEEE Access, 6, 13385-13406.
https://hdl.handle.net/10356/87746
https://doi.org/10.1109/ACCESS.2018.2803302
© 2018 IEEE. Translations and content mining are permitted for academic research only. Personal use is also permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
ABSTRACT Data center storage architectures today face rapidly increasing demands on data volume and quality of service. Hybrid storage systems have become one of the most popular choices for fulfilling these demands. A mixture of various types of storage devices and structures enables architects to address the performance and capacity concerns of users within one storage infrastructure. In this paper, we present an extensive literature review of the state-of-the-art research on hybrid storage systems. First, different types of hybrid storage architectures are explored and categorized. Second, the corresponding algorithms and policies, such as caching, scheduling, and resource allocation, are discussed in depth. Finally, the advantages and disadvantages of these hybrid storage architectures are compared and analyzed in terms of system performance, solid state drive lifespan, energy consumption, and so on, in order to motivate future research directions.
INDEX TERMS Caching algorithms, data migration, hot data identification, hybrid storage system.
I. INTRODUCTION
In the past decades, data has been growing at an exponential rate. The IDC report [1] estimates that the digital universe doubles in size every two years; by this estimate, the data volume may reach 40 zettabytes by 2020. Such tremendous and assorted data requirements create challenging issues for the design of storage infrastructures. Besides storage capacity, data access speed and reliability are also key factors influencing the user experience.

Due to the mechanical limitations of the Hard Disk Drive (HDD), the traditional storage architecture has become the major performance bottleneck in data-intensive systems. With the rapid evolution of Flash memory technologies [2], the Solid State Drive (SSD) is replacing the HDD in some data centers (DCs), as the IOPS of an SSD can be hundreds or even thousands of times higher than that of an HDD, and its throughput can also be several times higher. In addition, the SSD lowers energy consumption and provides stable operation under vibration. However, due to the high cost per GB, limited write cycles, and reliability issues of the SSD [3], the HDD still plays an important role in modern storage systems.

In order to provide both large storage capacity and fast access speed at minimal cost, it is vital to fully utilize the high capacity of low-tier devices with lower access speed and lower cost per GB (e.g., HDD) together with the high performance of high-tier devices with lower capacity and higher cost per GB (e.g., SSD). A hybrid storage architecture is one of the solutions that can provide system performance comparable or close to that of an all-fast-device configuration, while offering nearly the same storage capacity as an all-low-tier configuration. It usually has the ability to automatically promote or demote different types of data across different tiers of storage devices. The data movement is managed either by the host or by the drive itself, according to the performance and capacity requirements. The hybrid storage system is the mainstream in modern data centers, and it will continue to play an important role in the storage infrastructure as long as the performance and cost differences between storage devices exist.

In the early days, high-end HDDs (such as 15k RPM and 10k RPM drives) acted as the performance tier, and low-end HDDs (such as 7200 RPM and 5400 RPM drives) acted as the capacity tier [4]. Nowadays, NAND Flash [2] SSDs are employed to replace high-end HDDs. The next generation of hybrid storage systems has incorporated emerging Non-Volatile Memory (NVM) technologies, such as Phase Change Memory (PCM) [5], Spin-Transfer Torque Magnetic Random-Access Memory (STT-MRAM) [6], Resistive RAM (RRAM) [7], etc. Table 1 lists the performance and price of
TABLE 1. The performance, power consumption, price and endurance of some non-volatile memories and HDD.
some emerging NVMs and the HDD. As the performance-cost ratio of NVMs, i.e., the combination of random access speed (IOPS) and sequential access speed (throughput) for various system requirements over the price per GB, keeps increasing, these fast NVMs can be used as the performance tier [8], [9] or cache [10]–[13] in modern hybrid storage systems.

Nowadays, although there are hybrid drive products that combine an HDD and NAND flash, data centers prefer to design their own hybrid structures at the system level, where the SSD is still the first choice for the performance tier, and the HDD and high-capacity Shingled Magnetic Recording (SMR) drives are often utilized as the backup tier [14]–[17]. Many industry players [18]–[25] integrate SSD devices into the traditional storage system as the faster tier or data cache to improve the system performance with slightly increased cost. Meanwhile, many researchers, as listed in Table 2, have also proposed various structures and algorithms to improve one or a few aspects of hybrid storage systems. In this paper, we concentrate on hybrid storage structures that combine HDDs and SSDs, although the general ideas can be applied to a wide range of hybrid storage systems.

A general hybrid storage architecture is shown in Fig. 1, where different algorithms and policies are used to manage the external and internal data for the tier and cache storage architectures. There are mainly two types of hybrid storage architectures: the tiered storage architecture (tiering method) and the cache storage architecture (caching method). The fundamental distinctions between the two architectures are as follows: 1) The tiering method moves data to the fast storage area, as opposed to copying the data in the caching method. 2) The time duration that hot data stay in the faster device is normally longer with the tiering method than with the caching method. However, their performances are chiefly influenced by the following four aspects.
• Firstly, data allocation policies are essential to control the data flows among the various devices. The data allocation in the tiering method allocates the data based on the hotness of the data, while the data allocation in the caching method distributes the data to different devices according to the caching policy utilized, such as read-only, write-back, etc.
• Secondly, an address translation mechanism is needed in a hybrid storage system to track the different locations of the same data in different devices. The design of the address translation is important to the speed of data retrieval and the amount of memory used to store the translation information.
• Thirdly, due to the size limitation of the SSD cache/tier compared with the main storage HDDs, the SSD cache/tier is
TABLE 3. Comparison of the hybrid storage systems with SSD caching method.
the SSD capacity is nearly full. Although the wear-leveling techniques inside the SSD can help to reduce the number of GC processes, it is better to reduce unnecessary write operations to the SSD.

To reduce unnecessary write operations, [30] proposes a method that checks the data hotness based on a demotion counter and a proposed control metric, and migrates the hot data blocks to the SSD. In [54], a heuristic file-placement algorithm is designed to improve the cache performance by considering the IO patterns of the incoming workload. Meanwhile, a distributed caching middleware is implemented at the user level to detect and manipulate the frequently accessed data. In [17], a hybrid structure that combines SSD and SMR devices is proposed, which utilizes the SSD as a read-only cache to improve the random read performance. In [30], the number of demotions from DRAM to SSD is recorded and utilized to predict future accesses to the SSD and to decide whether the data need to be recorded in the SSD or not.

2) SSD AS A WRITE-BACK CACHE
There are two types of write cache: the write-through cache and the write-back cache. In a system with the write-through cache policy, the write data are buffered in the cache and also written to the disk, and a write is only considered complete when the data are successfully written to the disk. This policy is mainly utilized in the DRAM cache to avoid data loss during power-off, as DRAM is volatile memory, and to improve the performance of reads on write data in the cache. Meanwhile, a system with the write-back policy can write the data to the cache first and flush the data to the disk at a later time, which can improve the write performance significantly in some cases. In a hybrid storage system with the SSD as a cache, the write-through policy is seldom applied, as the SSD is non-volatile memory, which can keep the data safe even when the drive power is lost.
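The practical difference between the two policies is when a write is acknowledged and when the backing HDD is updated. The following minimal sketch illustrates this contrast; it is not code from any surveyed system, and the `Device`, `HybridWriteCache`, and `flush_dirty` names are illustrative assumptions.

```python
class Device:
    """Toy block device backed by a dict (stands in for an SSD or HDD)."""
    def __init__(self):
        self.blocks = {}
    def write(self, block, data):
        self.blocks[block] = data
    def read(self, block):
        return self.blocks[block]
    def contains(self, block):
        return block in self.blocks


class HybridWriteCache:
    """Contrast of write-through and write-back policies for an SSD cache in front of an HDD."""
    def __init__(self, policy="write-back"):
        self.ssd, self.hdd, self.policy = Device(), Device(), policy
        self.dirty = set()                      # blocks on SSD not yet persisted to HDD

    def write(self, block, data):
        self.ssd.write(block, data)             # the cache always receives the data
        if self.policy == "write-through":
            self.hdd.write(block, data)         # complete only after the HDD write
        else:
            self.dirty.add(block)               # write-back: flush to HDD later

    def read(self, block):
        if self.ssd.contains(block):
            return self.ssd.read(block)         # cache hit
        data = self.hdd.read(block)             # miss: fetch from HDD and populate cache
        self.ssd.write(block, data)
        return data

    def flush_dirty(self, limit=64):
        """Background flush during idle periods; bounds the data lost if the SSD fails."""
        for block in list(self.dirty)[:limit]:
            self.hdd.write(block, self.ssd.read(block))
            self.dirty.discard(block)
```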
To enhance the write performance of the hybrid storage system, some research works [51], [55]–[58] utilize the SSD as the cache for both read and write requests. The data operations in storage architectures with the write-back policy are shown in Fig. 4. In this architecture, if the write data are new, the data will be recorded to the SSD. With the write-back policy, these write data might be flushed to the HDD at a later time. However, because of the later flush, the write data in the HDD cannot be updated at the same moment they are written to the SSD, which may cause a data synchronization problem. Periodic flushing of write data to the HDD might be applied to improve the data synchronization.

Normally, utilizing the SSD as a write-back cache can improve the performance of a hybrid storage system. However, when the SSD is nearly full, a GC process is required to clean the invalid data before new write requests come in, which lowers the performance for write requests, and the overall hybrid system performance might degrade. If the workload is write-intensive with high IOPS, the number of requests to the HDD might be large if the re-access ratio is small, the disk may have a limited number of idle periods, and the durations of the idle periods will be short. The write data might not be flushed completely to the HDD during these short idle periods. Then, when the space of the SSD is nearly full, the flushing of write data to the HDD may further degrade the system performance. Meanwhile, due to the non-volatility of the SSD device, data can be kept in the SSD for a long time without demoting before the space is full, and the system performance can be improved by applying the write-back cache policy. However, if the SSD fails, the write data inside might be lost. To overcome the SSD failure problem, the write data inside the SSD are flushed to the HDD as soon as possible, at the cost of reducing the system performance. Furthermore, the SSD lifespan is another problem, as the number of write operations to the SSD is much larger compared with a storage architecture that uses the SSD as a read-only cache.

There are some research works [14]–[16] that integrate the SSD and the SMR drive as a hybrid storage system to further increase the storage capacity while maintaining the access speed. The SSD is utilized as the write cache to keep the random write data, and leaves the sequential write data to be handled by the SMR drive. At a later time, the random write requests are combined and written to the SMR drive in sequence. With this, the number of random write requests to the SMR drive is reduced, as well as the number of read-modify-write operations. The overall performance can be significantly improved.

Industry players apply RAID techniques to the SSD cache to improve the reliability of written data. For example, RAID-1 is employed in EnhanceIO [59], and RAID-4/5 is employed in LSI ElasticCache [60] and HP SmartCache [61]. In the research area, RAID techniques are also applied to SSDs to improve the data reliability and save the total cost. In [28], a new hybrid storage architecture is designed that employs low-end SSDs to construct a RAID architecture, which is utilized as the cache to improve the data reliability. The new architecture combines the data reliability of a RAID system with the performance of a log-structured approach to achieve better performance per dollar and lifetime per dollar compared with high-end SSDs.

Some researchers [34], [35] divide the SSD space into read and write regions to improve the system performance for workloads with nearly balanced read and write requests. The read region is only used to store the incoming read data, while the write region is only used to record the incoming write data. The data flows for these kinds of storage architectures are shown in Fig. 5. As shown in the figure, there are mainly 6 special operations in this architecture: 1) The write data are kept in the write cache region. 2) When a write request needs to access data located in the read region, there are two options to handle this situation: if the cache is DRAM, the write request will directly update the data located in the read region and then move the data from the read region to the write region; if the cache is an SSD cache, the write request will keep its new data in the write region and mark the original data in the SSD as invalid. The invalid data can be cleaned at a later time. 3) If the data in the write region are flushed to the disk, and the
data are frequently accessed by read requests, then the clean data can be moved to the read region. 4) The data accessed by read requests can be allocated to the read region. 5) The read data located in the HDD can be moved to the read region if they are identified as hot. 6) The data in the write region can be flushed to the disk through a background process. With this kind of cache architecture, the cache replacement algorithm can be improved, and the GC process can be more efficient. Thus, the overall system performance is improved. However, if the incoming workload is read-intensive or write-intensive, the corresponding write cache region or read cache region will be wasted, and the overall system performance might degrade.

Write updates to the SSD invalidate the data in the original block. A GC process is needed to clean these invalid data blocks, which requires extra SSD space. In [35], a new over-provisioned region is split out and used for the GC process. The split ratios of the read, write, and over-provisioned regions are analyzed through modeling to provide optimal system performance.

The system performance, especially the write performance, can be enhanced by utilizing the SSD as the write-back cache. However, because of the huge number of write requests to the SSD, the SSD lifespan is reduced significantly. Meanwhile, the number of GC processes required in the SSD may also increase. In order to utilize the advantage of the SSD write-back cache without reducing the SSD lifespan or increasing the number of GC processes excessively, numerous algorithms [27], [32], [41], [62] and architectures [36] have been proposed to reduce the number of write requests to the SSD. In [27], a duplication-aware write handling strategy is used to filter out duplicated write requests, which saves write operations to the SSD device. In [32], the data replacement is done as late as possible, which saves some of the write operations caused by data promotion and demotion. Although it might miss some newly generated hot data, the SSD lifespan is saved and hot data with longer re-access intervals may be kept.
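A duplication-aware write path can be illustrated with a content-hash filter: before a block is written to the SSD cache, its fingerprint is compared against the fingerprints of blocks already cached, and exact duplicates are dropped or remapped. The sketch below is a generic illustration under that assumption rather than the concrete mechanism of [27]; the fingerprint maps and the SHA-256 choice are ours.

```python
import hashlib


class DedupWriteFilter:
    """Drop writes whose content already exists in the SSD cache (generic sketch)."""

    def __init__(self):
        self.fingerprint_to_block = {}   # content hash -> block holding that content
        self.block_to_fingerprint = {}   # reverse map so overwrites can be tracked

    def _fingerprint(self, data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def should_write(self, block: int, data: bytes) -> bool:
        """Return True only if this write must actually reach the SSD."""
        fp = self._fingerprint(data)
        if self.block_to_fingerprint.get(block) == fp:
            return False                 # rewriting identical content: skip the SSD write
        owner = self.fingerprint_to_block.get(fp)
        if owner is not None and owner != block:
            # identical content already cached under another block; a real system
            # would add a mapping entry instead of writing the data again
            self.block_to_fingerprint[block] = fp
            return False
        self.fingerprint_to_block[fp] = block
        self.block_to_fingerprint[block] = fp
        return True


# Example: the second, identical write is filtered out.
f = DedupWriteFilter()
print(f.should_write(1, b"hello"))   # True  -> write to SSD
print(f.should_write(2, b"hello"))   # False -> duplicate content, no SSD write
```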
To make the data allocation and migration more precise, the host-side information might be utilized to check the hotness of the newly arriving data, as it might contain more valuable information. For instance, in [63], a multilevel cache

TABLE 4. Comparison of the hybrid storage systems with SSD tiering method.

B. SSD TIERING METHOD
Tiering techniques divide the data into hot and cold categories and store them in the SSD and HDD, respectively. The data stored in the SSD tier are permanent rather than temporary as in SSD caching, and there is no duplicate copy in the HDD. A portion of the research works on SSD tiering strategies are listed in Table 4. From the table, we can observe that the tiered storage architectures can be divided into two classes: the host-controlled tiered structure [67], [69], [70], [73], [76], [81], [82] and the device-controlled tiered structure [66], [71], [72], [74], [75], [77]. In the host-controlled tiered structure, the data movements are controlled by the system host using host-side information, whereas the device-controlled tiered structure controls the data movements by analyzing the properties of the current incoming data and the historical data.

One of the tiered structures containing an SSD tier and an HDD tier is shown in Fig. 6. The data movements inside the tiered structure include the data allocation, which is actively performed to designate the data to the suitable tier, shown as operations 1 and 2, and the data migration, which is performed in the background to relocate the data between the two tiers through steps 3 and 4 in Fig. 6. The data allocation and migration are typically controlled by the host in the host-controlled tiered structure, or by the device itself in the device-controlled tiered structure.

FIGURE 6. The data flows in the system with SSD as a tier.

In the host-controlled tiered structure, the choice of which tier stores the approaching data is made by the host through analyzing the host-side information. A linearly weighted formulation [70] is applied at the operating-system level to characterize the workload access pattern to enhance the overall performance. GreenDM [67] combines the SSD
devices and HDD devices in a storage system, and a block-level interface is provided to allow data migration between the SSD and HDD. GreenDM is adaptable to different kinds of incoming workloads by providing some valuable tunable parameters.

In the device-controlled tiered structure, the decision on data movements is made by the device itself, i.e., by the tier module implemented in the firmware, through analyzing the historical access information and the properties of the incoming workload. Hystor [66] analyzes the data access pattern, especially the access locality, to find the critical data. It saves the critical data to the SSD tier to enhance the system performance. Meanwhile, write data are also allocated to the SSD tier to improve the write performance. Although the write performance is improved, the SSD lifespan is reduced and the number of GC processes may increase. HybridStore [68] utilizes the regional access frequency information, obtained by analyzing the historical access information, to distinguish the hot data and move them to the SSD tier to improve the overall performance.

Data allocation and migration in a tiered storage system are usually determined by the data hotness. However, the device status and the data properties are not considered in these sorts of systems. In [83], a measurement-driven mechanism is applied for the data relocation between different tiers. In the system, the cold data are kept in the slower tier. The hot data might be stored in different tiers depending on the data properties and the status of the devices in the various tiers. For instance, the hot random data might be kept in the SSD, as the SSD has a faster random access speed. The hot sequentially accessed data are kept in the HDD to save SSD bandwidth, as the HDD has sequential performance similar to that of the SSD.
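As a concrete illustration of that placement rule, the sketch below routes incoming extents by hotness and access pattern: hot random data to the SSD tier, and hot sequential data as well as cold data to the HDD tier. It is a simplified interpretation of the idea in [83]; the `Extent` type, thresholds, and the sequentiality test are our own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    lba: int            # starting logical block address
    length: int         # number of blocks in the request
    access_count: int   # how often this extent has been accessed recently


def is_sequential(extent: Extent, prev_end_lba: int, min_len: int = 256) -> bool:
    """Treat long runs or requests contiguous with the previous one as sequential."""
    return extent.length >= min_len or extent.lba == prev_end_lba


def choose_tier(extent: Extent, prev_end_lba: int, hot_threshold: int = 4) -> str:
    """Hot random data -> SSD; hot sequential data and cold data -> HDD."""
    hot = extent.access_count >= hot_threshold
    if hot and not is_sequential(extent, prev_end_lba):
        return "SSD"     # the SSD excels at random access
    return "HDD"         # HDD sequential bandwidth is comparable, and cold data stay low


# Example: a small, frequently re-read extent goes to the SSD tier.
print(choose_tier(Extent(lba=10_000, length=8, access_count=9), prev_end_lba=0))   # SSD
print(choose_tier(Extent(lba=0, length=4096, access_count=9), prev_end_lba=0))     # HDD
```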
In order to save the SSD lifespan and reduce the number of GC processes in the SSD, the hot data identification and data migration algorithms need to be carefully designed. The trade-off between the system performance and the size of the data written to the SSD needs to be carefully considered.

To save the power consumption of the hybrid storage system, [71] develops a novel architecture, called E-HASH, that consists of one SSD and multiple HDDs. The architecture fully utilizes the advantages of the random access of the SSD and the sequential access of the HDDs to improve the system performance while minimizing the overall cost. Meanwhile, the lifespan of the SSD is also prolonged.

Tier warm-up is also deployed to improve the tiered storage system performance. In [72], the user access patterns over long time-scales are automatically learned, and the intensity of future user work can be predicted. Based on the knowledge of the tier capacity and the performance differences, a schedule can be made to decide when to start and stop the background system work. The data required by the background system work or by the user workloads can be pre-filled into the faster tier to improve the overall system performance during the transition period.

C. SSD HYBRID METHOD
Both the SSD caching method and the SSD tiering method have their own advantages and drawbacks. In the tiering method, the frequency of data location optimization is extremely low, usually less than once per hour, or even once per day. When the hot data region changes, it takes a considerable amount of time for the system to move the hot data to the SSD to enhance future system performance. In the caching method, the data location is optimized within a short time if the hotness of the data changes, and the system performance in subsequent accesses can be improved. However, the frequent data movements between HDDs and SSDs increase the load of the devices and might influence the overall system performance. To eliminate the restrictions of the caching method and the tiering method, the storage architectures in [89] and [92]–[94]
FIGURE 7. The data flows in the system with SSD tiering and caching methods.
FIGURE 8. The data flows in the system with HDD as the write cache of SSD.
combine the tiering method and the caching method together into the SSD hybrid method, which utilizes the SSD as both a tier and a cache to improve the overall system performance. A simplified data workflow for these kinds of storage architectures is shown in Fig. 7: 1) The write data are directly recorded to the HDD if the data are not inside the SSD. 2) The write request updates the write data in the SSD tier region if the data are hot and located in the SSD tier. 3) The read request reads data from the SSD tier region if the data are located inside the SSD. 4) The write request buffers the data to the SSD cache region if the write-back policy is applied. 5) The read request reads data from the HDD and buffers the data to the SSD cache. 6) The hot read data are migrated to the SSD tier region in the background. 7) The data movements between the cache and tier regions are optional operations.
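The request routing implied by operations 1)–5) can be summarized as a small dispatch function. The sketch below is a schematic rendering of that workflow, not code from the cited systems; the `tier`, `cache`, and `hdd` sets and the hotness test are placeholder assumptions.

```python
def handle_request(op, block, hot_blocks, tier, cache, hdd, write_back=True):
    """Route one request in an SSD hybrid (tier + cache) system; returns the device used."""
    if op == "write":
        if block in tier and block in hot_blocks:
            return "ssd-tier"            # op 2: update hot data in place in the tier
        if write_back:
            cache.add(block)             # op 4: buffer the write in the SSD cache region
            return "ssd-cache"
        hdd.add(block)                   # op 1: cold/new data go straight to the HDD
        return "hdd"
    # read path
    if block in tier:
        return "ssd-tier"                # op 3: serve hot data from the tier region
    if block in cache:
        return "ssd-cache"
    cache.add(block)                     # op 5: read from HDD and populate the cache
    return "hdd"


# Example: a hot tiered block is updated in place; a cold write is buffered in the cache.
tier, cache, hdd, hot = {10}, set(), {20, 30}, {10}
print(handle_request("write", 10, hot, tier, cache, hdd))   # ssd-tier
print(handle_request("write", 30, hot, tier, cache, hdd))   # ssd-cache
print(handle_request("read", 20, hot, tier, cache, hdd))    # hdd (then cached)
```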
Reference [89] consolidates the tiering and caching methods together in a hybrid storage array containing SSDs and HDDs. An allocation proportion is defined to adjust the number of SSDs in the tiering and the caching structures. The system performance, especially for write-intensive workloads, is enhanced compared with the one without the combination. Reference [90] utilizes the data hotness and the load status of the SSD and HDD devices to perform the data allocation and migration. The parallel execution of all the devices is fully utilized to improve the overall performance. In [91], a hybrid storage aware flash translation layer (FDTL) is introduced to allocate the data to different levels of devices. The data allocation is based on the policies of utility maximization and energy consumption minimization. It combines the advantages of the sequential access speed of the HDD and the random access speed of the SSD, which improves the IO performance and reduces the energy consumption.

D. OTHER STRUCTURES WITH SPECIAL PURPOSES
Besides storage architectures with the SSD as a tier and/or cache, there are some other types of storage architectures. For instance, some of the hybrid storage architectures employ the HDD as the write cache for the SSD [95], [96]. The operations in this kind of storage architecture are shown in Fig. 8. Compared with other types of storage architectures, there are 3 special operations: 1) The original data, which are the first copy of the data, are located in the SSD, whereas the update data will be located in the HDD. 2) The write data might need to be flushed to the HDD in the background. With this type of architecture, the SSD lifespan is prolonged and the number of GC processes required is significantly reduced, as the write updates are handled mainly by the HDD devices. The only drawback is that some read requests might need to access both the HDD and the SSD to get the complete required data.

I-CASH [95] is a new type of storage architecture that intelligently couples an array of SSDs and HDDs. In the architecture, the SSD keeps the write data, called reference blocks, and the HDD records the difference between the newly arriving data and the reference blocks, which is called the delta. A read request is handled through the integration of the reference blocks and the deltas, and a write request only needs to update the log of deltas; thus, the write updates to the SSD are minimized. With this architecture, the write performance is improved, and the lifespan of the SSD is saved. However, the data might be lost if the delta is not saved to the HDD properly. Thus, the delta data in the DRAM should be flushed to the disk frequently, and the flush interval is tuned based on the number of write updates that happened in the history.

A hybrid storage design called Griffin [96] uses the HDD to cache write data for the SSD. The write requests to the SSD are logged into the HDD in sequence, and migrated back to the SSD when the system is not busy. With such a design, the number of writes to the SSD is reduced and thus the SSD lifespan is prolonged, without significantly reducing the read performance.

In [97], the SSD space is split into a data zone and a delta zone; the data zone is used to store the newly arriving data, and the delta zone is used to store the difference between the historical data and the newly updated data. With the proposed mechanism, the small write penalty is reduced and the SSD lifespan is prolonged significantly.
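The delta idea in [95] and [97] can be illustrated with a toy write path that stores a full reference block once and afterwards logs only the differences. The sketch below is a simplified illustration of that concept; the byte-position diff format and the in-memory delta log are our own simplifications, not the papers' layouts.

```python
def make_delta(reference: bytes, new: bytes) -> list:
    """Record only the byte positions where the new version differs from the reference."""
    return [(i, b) for i, (a, b) in enumerate(zip(reference, new)) if a != b]


def apply_delta(reference: bytes, delta: list) -> bytes:
    """Rebuild the current version of a block from its reference block plus the delta."""
    out = bytearray(reference)
    for i, b in delta:
        out[i] = b
    return bytes(out)


class DeltaStore:
    """Reference blocks live on the SSD; small deltas are logged instead of rewriting blocks."""
    def __init__(self):
        self.ssd_reference = {}      # block id -> full reference block (written once)
        self.delta_log = {}          # block id -> latest delta (kept small, flushed later)

    def write(self, block: int, data: bytes):
        if block not in self.ssd_reference:
            self.ssd_reference[block] = data        # first copy becomes the reference block
        else:
            self.delta_log[block] = make_delta(self.ssd_reference[block], data)

    def read(self, block: int) -> bytes:
        ref = self.ssd_reference[block]
        return apply_delta(ref, self.delta_log.get(block, []))


store = DeltaStore()
store.write(7, b"hybrid storage")
store.write(7, b"hybrid-storage")          # only one changed byte is logged, not the block
print(store.read(7))                        # b'hybrid-storage'
```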
To improve the data reliability and synchronization, one storage architecture [98] builds an SSD RAID-0 storage system with multiple SSD devices, and deploys an HDD with identical space as the mirror to the SSD RAID. The proposed storage system can improve the data synchronization at the cost of slightly lowering the IO performance. Besides that, a multi-tier file system (TierFS) [99] is proposed to protect the stored data continuously, which keeps an extra copy of the updates
when demoting to the lower tier, until the update data backup is successful.

E. HYBRID STRUCTURES USING VARIOUS NVMs
As mentioned in Section I, besides NAND Flash, there are other types of NVMs. Some of them have better performance, longer endurance, and lower power consumption compared with NAND. However, their price per GB is much higher than that of NAND Flash. These kinds of NVMs are often utilized as a storage layer or a cache layer in modern hybrid storage systems to improve the performance-cost ratio. In the early stage, NVM was deployed as a cache for the HDD devices [27], [104], [105]. In [105] and [106], the NVM and NAND flash memory are both employed as the cache for the HDD to improve the energy efficiency and overall system performance. In [27], NVM is added to improve the power management of the disk through four aspects: 1) extending the idle periods, 2) storing the read-miss content in the cache, 3) spinning up the media only when the NVM cannot serve the requests, and 4) write throttling. In [107], the potential of PCM in the storage hierarchy is evaluated. The tiering method and caching method are explored by modeling a hypothetical storage system consisting of PCM, HDD, and Flash, to offer the best performance within cost constraints by identifying the combinations of different types of devices.

The higher-performance NVM is also implemented as a cache for NAND flash [10]–[13]. In [10], the data generated by the flash translation layer are stored in NVM, which prolongs the Flash memory lifespan. In [12], the combination of DRAM and PCM is deployed as a cache for the SSD. A data migration policy is proposed to save the power consumption and enhance the SSD endurance. Meanwhile, the log information of NAND is stored in the PCM [13] to support in-place updating and minimize the number of write updates to the SSD, which reduces the Flash erase operations and prolongs the SSD lifespan.

A tiered memory system that utilizes NVM and DRAM is another type of tiered storage architecture. In [8] and [9], storage class memory (SCM) is deployed together with DRAM to form a tiered memory system, which leverages the capacity and power-failure data-loss avoidance of SCM and the access speed of DRAM. In [108], a new hybrid hardware/software solution that integrates PRAM and DRAM in the storage system is proposed. The write data reliability for PRAM is ensured by applying wear leveling uniformly across all the PRAM pages, considering the information of write frequencies.

III. ALGORITHMS UTILIZED IN HYBRID STORAGE SYSTEM
The algorithms and policies applied in hybrid storage systems largely determine the performance of the overall storage system. In this section, the most important algorithms, such as data allocation, hot data identification, data migration, and scheduling algorithms, are surveyed. For ease of presentation, we list most of the factors considered in these algorithms in Table 5, where the access frequency and interval are the two most important factors in hot data identification and data migration algorithms. Meanwhile, some of the algorithms also take the factors in Table 6 into consideration, which will not be discussed in detail.

A. DATA ALLOCATION
A hybrid storage system needs to decide which storage devices handle the newly arriving data. Especially for the write data, the system needs to allocate the data to the proper storage devices to fully utilize the storage resources. We call this problem the resource (data) allocation problem [76], [79], [82], [84], [105], [113]. In [79], the data allocation mechanism is formulated as a multiple-choice knapsack problem. The optimal allocation solution can be found through multiple-stage dynamic programming. To maximize the system utilization, bottleneck-aware allocation [76] is designed and developed to automatically choose the allocation ratio between clients with different bottlenecks. To exploit the parallelism between the HDD and SSD devices and provide lower response time, a space allocation policy [84] is designed to find the Wardrop equilibrium by utilizing initial block allocation and migration. Based on the allocation policy, the system can achieve load balancing adaptively across different levels of devices. A smart controller for the disk cache [105] is employed to distribute the IO requests between NAND and PCM, where fine-granularity accesses are directed to the PCM, and coarse-granularity accesses are directed to the flash memory, to fully utilize their advantages.
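The multiple-choice knapsack view can be made concrete with a small dynamic program: each data group is placed on exactly one device (one "choice"), each placement has a capacity cost and a benefit, and the SSD capacity is the knapsack budget. The sketch below is a generic textbook formulation, not the multi-stage algorithm of [79]; the sizes and benefit numbers are made up.

```python
def allocate(groups, ssd_capacity):
    """Multiple-choice knapsack: pick one placement per data group to maximize total benefit.

    groups: one list per data group, each containing (device, ssd_size_used, benefit) options.
    Returns (best_total_benefit, chosen placement per group).
    """
    NEG = float("-inf")
    best = [NEG] * (ssd_capacity + 1)
    best[0] = 0.0
    choice = [[None] * (ssd_capacity + 1) for _ in groups]
    for g, options in enumerate(groups):
        new = [NEG] * (ssd_capacity + 1)
        for cap in range(ssd_capacity + 1):
            if best[cap] == NEG:
                continue
            for device, size, benefit in options:
                if cap + size <= ssd_capacity and best[cap] + benefit > new[cap + size]:
                    new[cap + size] = best[cap] + benefit
                    choice[g][cap + size] = (device, cap)   # remember how we got here
        best = new
    cap = max(range(ssd_capacity + 1), key=lambda c: best[c])
    picks, g = [], len(groups) - 1
    while g >= 0:                                            # walk the choices backwards
        device, prev = choice[g][cap]
        picks.append(device)
        cap, g = prev, g - 1
    return max(best), list(reversed(picks))


# Each group: place on the SSD (uses capacity, high benefit) or on the HDD (free, low benefit).
groups = [[("ssd", 4, 10.0), ("hdd", 0, 2.0)],
          [("ssd", 3, 4.0), ("hdd", 0, 1.0)],
          [("ssd", 5, 9.0), ("hdd", 0, 3.0)]]
print(allocate(groups, ssd_capacity=9))   # (20.0, ['ssd', 'hdd', 'ssd'])
```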
B. ADDRESS MAPPING ALGORITHM
The address mapping between the SSD cache/tier and the HDD is necessary for data access in a hybrid storage system. Unified address mapping between the logical and physical block groups [91] can be applied to the physical/logical address translation. In that system, the address translation map is important and cannot be lost, so the translation map is stored in the HDD.

In [114], the addresses of the physical data blocks in the secondary storage and the addresses of the logical blocks corresponding to the physical blocks are stored in the mapping table. Each translation entry contains a value called the current level, which is a function of the access frequency and the logarithmic system time. A top-K tracking table with a B-tree structure is maintained to keep the K elements with the highest values of the current level. When a new mapping entry is created, its current level is compared with the lowest one in the top-K tracking table; if it is higher, the lowest one is replaced by the new entry. The value of K is selected based on the memory size allocated to the mapping table.
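The current-level bookkeeping described above can be sketched with a small heap standing in for the B-tree: each mapping entry gets a level derived from its access frequency and the logarithm of the system time, and only the K highest-level entries are retained. The weighting formula below is illustrative, not the exact function used in [114].

```python
import heapq
import math


def current_level(access_count: int, system_time: float) -> float:
    """Illustrative level: more accesses and more recent activity raise the level."""
    return access_count + math.log2(max(system_time, 1.0))


class TopKTracker:
    """Keep only the K mapping entries with the highest current level (min-heap stand-in for the B-tree)."""

    def __init__(self, k: int):
        self.k = k
        self.heap = []                       # (level, block), lowest level on top

    def insert(self, block: int, level: float):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (level, block))
        elif level > self.heap[0][0]:        # new entry beats the lowest tracked one
            heapq.heapreplace(self.heap, (level, block))

    def tracked(self):
        return {block for _, block in self.heap}


tracker = TopKTracker(k=2)
for blk, (freq, t) in {1: (5, 100), 2: (1, 10), 3: (9, 200)}.items():
    tracker.insert(blk, current_level(freq, t))
print(tracker.tracked())                     # {1, 3}: the two hottest mappings survive
```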
In [115], the performance difference is considered for different kinds of mapping tables under different run-time or load conditions. An adaptive address mapping with a dynamic run-time memory mapping selection method is applied to optimize the mapping performance. A memory controller is deployed to monitor the online performance of the
TABLE 5. Two aspects need to be considered during designing of hot data identification algorithm.
TABLE 6. The other workload properties need to be considered during storage system design.
current memory mapping and dynamically adjust the memory mappings at run-time based on the observed performance.

C. HOT DATA IDENTIFICATION AND DATA MIGRATION PROCESS
The hybrid system performance is heavily influenced by the selected caching policies, especially the data identification and migration algorithms. Numerous research works have focused on improving the hot data identification accuracy and migration efficiency, some of which are shown in Table 7. These hot data identification algorithms are mainly based on the factors shown in Tables 5 and 6.

Spatial locality and temporal locality are applied to check the hotness of data. HotDataTrap [117] attempts a statistical approach to avoid some cold data being inserted into the cache, which gives more chance for the hot data to be stored in the cache space. An improved adaptive cache replacement algorithm named D-ARC [27] is proposed to identify the hot data through both the recency and frequency of the incoming workload. The D-ARC algorithm can improve the cache hit ratio, average I/O latency, and the SSD lifespan significantly for certain types of workloads. Multiple independent hash functions [118] are applied to increase the correctness of hot data identification.
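Hot data identification with multiple independent hash functions is typically realized as a small counting filter: every address hashes to several counters, and a block is declared hot only if all of its counters exceed the threshold, which keeps false positives low with little memory. The sketch below is a generic counting-filter version, not the exact scheme of [118]; the table size, hash count, and threshold are arbitrary.

```python
import hashlib


class MultiHashHotFilter:
    """Counting-filter style hot data identification with several independent hash functions."""

    def __init__(self, num_counters=1024, num_hashes=4, hot_threshold=4):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes
        self.hot_threshold = hot_threshold

    def _slots(self, lba: int):
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{lba}".encode()).hexdigest()
            yield int(digest, 16) % len(self.counters)

    def record_access(self, lba: int):
        for slot in self._slots(lba):
            self.counters[slot] += 1

    def is_hot(self, lba: int) -> bool:
        # Hot only if every counter agrees, so a single colliding slot cannot fake hotness.
        return all(self.counters[slot] >= self.hot_threshold for slot in self._slots(lba))


f = MultiHashHotFilter()
for _ in range(5):
    f.record_access(42)
f.record_access(7)
print(f.is_hot(42), f.is_hot(7))   # True False
```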
which is used to identify and select data for migration. In a storage system, the service level agreement (SLA) is quite important for the QoS required by the user. How to efficiently explore the available storage resources to provide high QoS to customers is a quite challenging problem. In [122], a new approach is presented to meet different kinds of SLA requirements through data migration processes in varied environments. The data allocation and migration are decided by the SLA requirements and the temperature of the data, i.e., the access frequency of the data.

4) DATA MIGRATION
Together with the data hotness, the data migration algorithms [39], [88], [121], [124], [125] also need to consider other factors, such as the incoming workload properties and the status of the devices. To improve the storage and retrieval performance, in [88], a metric is created to categorize all the levels of storage pools, which is applied to prioritize the data segments, and the data migration is performed based on the priority of the segments. In [124], the data migration policy is designed based on the data value computed through data intrinsic attributes and the prospective values. In [74], the migration deadline is also considered, and an adaptive data migration model is proposed to attempt to keep the hot data in the SSD. The adaptive data migration is realized through IO profiling and look-ahead migration, where the IO profiling predicts and exploits the hot accessed area, and the look-ahead migration moves the data that will become hot in the near future to the SSD. In [126], offline algorithms are explored for the Flash cache by considering both the hit ratio and the Flash lifespan. A multi-stage heuristic is designed to approximate the offline optimal algorithm and provide the same optimal read hit ratio while reducing the number of Flash erasures significantly. In [125], a bi-directional migration policy is applied to balance the storage QoS and the migration cost. The policy has two thresholds to determine the data promotion and demotion. In [78], a fully automatic migration process is proposed and a double-threshold method is introduced to migrate only the most suitable data blocks. The data migration can be done in both directions: low-end to high-end and high-end to low-end. An adaptive multi-level cache (AMC) replacement algorithm [39] is applied to adaptively adjust the cache blocks between different cache levels. When new request data are coming, AMC can dynamically determine the cache level to locate the incoming data by combining the selective promotion and demotion operations. With AMC, the cache resource management becomes more efficient, and multi-level cache exclusiveness is achieved.
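A double-threshold promotion/demotion rule of the kind described above can be sketched as follows; the counter decay and the specific threshold values are illustrative assumptions rather than the settings used in [78] or [125].

```python
def migration_decision(access_count, in_ssd, promote_at=8, demote_at=2):
    """Two thresholds separate promotion and demotion so blocks do not ping-pong."""
    if not in_ssd and access_count >= promote_at:
        return "promote"        # hot enough: move the block from HDD to SSD
    if in_ssd and access_count <= demote_at:
        return "demote"         # gone cold: move the block back to HDD
    return "stay"               # in the hysteresis band between the two thresholds


def decay(counters, factor=0.5):
    """Periodically age the access counters so only recently hot data stay hot."""
    return {blk: int(cnt * factor) for blk, cnt in counters.items()}


counters = {"A": 12, "B": 1, "C": 5}
print(migration_decision(counters["A"], in_ssd=False))   # promote
print(migration_decision(counters["B"], in_ssd=True))    # demote
print(migration_decision(counters["C"], in_ssd=True))    # stay
```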
D. CACHE SCHEDULING ALGORITHMS
1) CACHE SCHEDULING ALGORITHMS IN HDD
There are numerous types of scheduling algorithms utilized in the HDD device to optimize the disk service time, including the seek time, rotational latency, and data transfer time, such as Rotation Position Optimization (RPO) [127], Shortest Access Time First (SATF) [128], SCAN [129], etc. However, due to the fundamental difference between the SSD and the mechanically driven HDD, these conventional scheduling algorithms implemented in the HDD are not suitable for the SSD drive and hybrid systems.

2) CACHE SCHEDULING ALGORITHMS IN SSD
To fully utilize the SSD properties, an efficient scheduling scheme [130] for NCQ in SSDs is proposed to improve the system performance. In the scheme, the I/O service time for each command is estimated based on the status of the buffer as well as the physical characteristics of SSDs. The scheduler selects the next command to be executed based on the estimated service time. Some scheduling algorithms also consider the workload properties. In [131], a workload-aware budget compensation scheduling algorithm is utilized for the device-level request scheduling. The scheduler estimates the cost of the GC contribution of each of the virtual machines in the SSD device, and compensates the budget for each virtual machine based on the estimated cost, for performance isolation.
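A scheduler of the kind in [130] can be approximated by estimating a service time for each queued command and issuing the cheapest one first. The cost model below (fixed per-page latencies and a buffer-hit discount) is a stand-in for the device-specific model in the paper, and all numbers are made up.

```python
import heapq


def estimate_service_time(cmd, buffered_pages, read_us=90, write_us=250, buffer_us=5):
    """Rough per-command service time: buffered pages are nearly free, writes cost more."""
    per_page = buffer_us if cmd["page"] in buffered_pages else (
        write_us if cmd["op"] == "write" else read_us)
    return per_page * cmd["pages"]


def schedule(queue, buffered_pages):
    """Issue queued commands in order of estimated service time (shortest first)."""
    heap = [(estimate_service_time(c, buffered_pages), i, c) for i, c in enumerate(queue)]
    heapq.heapify(heap)
    while heap:
        _, _, cmd = heapq.heappop(heap)
        yield cmd


queue = [{"op": "write", "page": 7, "pages": 4},
         {"op": "read", "page": 3, "pages": 1},
         {"op": "read", "page": 9, "pages": 8}]
for cmd in schedule(queue, buffered_pages={7}):
    print(cmd)     # buffered write first, then the short read, then the long read
```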
3) CACHE SCHEDULING ALGORITHMS IN HYBRID STORAGE SYSTEM
Given the differences between the scheduling algorithms for HDD and SSD, new scheduling algorithms need to consider the properties of both HDDs and SSDs; a combination of HDD scheduling and SSD scheduling algorithms can be one of the solutions for a hybrid storage system. In [43], a hybrid disk-aware CFQ is proposed to optimize the I/O reordering in new hybrid storage systems. In [132], a Parallel Issue Queuing (PIQ) scheduling method is applied to increase the parallel execution of incoming requests. It attempts to combine the non-conflicting requests into the same batch to avoid unnecessary access conflicts.

Scheduling algorithms may also consider the SLA from users. To guarantee the SLA, [133] proposes a scheduling algorithm to provide guaranteed service level objectives in a dynamic environment. The proposed algorithm can dynamically adjust the number of active hosts, which makes the system suitable for various kinds of workloads with different IOPS and throughput, while minimizing the overall system power consumption.

IV. PERFORMANCE COMPARISON
In this section, the performances of different kinds of hybrid storage architectures are compared and analyzed. A hybrid storage system simulator is designed and developed to conduct the quantitative comparison for different kinds of storage architectures, e.g., SSD as a read-only cache, SSD as a write-back cache, and SSD as a high tier. The comparisons are conducted from the following perspectives: 1) the system performance, such as the overall response time; 2) the SSD lifespan, which is simulated as the amount of data written and erased in the SSD device; 3) the energy consumption, which includes the energy consumption caused by the HDD and
TABLE 8. The hardware specification of HDD in the system.

Other than that, the hot data identification threshold is set differently for different traces, e.g., 2 for the SPC1C trace, 3 for the web searching trace, and 5 for the financial trace. All these parameters can be adjusted easily through the system settings.
TABLE 11. Performance comparison under different caching policies for financial trace.
TABLE 12. Performance comparison under different caching policies for web searching trace.
FIGURE 10. System performance under different caching policies for SPC1C traces. (a) Enough SSD free space. (b) Limited SSD free space.
with high IOPS. The increase for the write-back policy is significantly larger compared with the read-only policy, as the write requests to the SSD might cause write data flushing to the HDD when the free space in the SSD is not enough to store the newly arriving data, as well as the GC process inside the SSD device. The performance of the system with the read-only policy also decreases, which is caused by the reduced number of hot data kept in the SSD cache, which might reduce the chance of a read cache hit. Meanwhile, we also notice that there is a turning point around BSU 40, where 1 BSU equals 5 IOPS. When the BSU is small, the number of requests to the HDD is small, there are enough idle periods for the cache to flush the write data to the HDD, and the DRAM cache has clean space for storing the newly arriving write requests. Thus, the read-only cache has better performance compared with the write-back cache. However, when the BSU is large, the number of idle periods may not be enough to let the DRAM cache flush the write data, and the process of write data flushing will increase the waiting time for the requests. In this circumstance, the write-back policy can keep the write data for a while, and it has better performance compared with the read-only cache policy.

In Table 11, the performances of systems with both caching policies for the financial trace are compared. We can notice that the performance with the read-only policy is somewhat superior to the one with the write-back approach, because of the very high re-access ratio of the workload. Since the majority of the requests can be handled by the cache in the SSD and DRAM, the disk will have enough idle periods to handle the flushed write data, which cleans the dirty data inside the DRAM cache. The cleaned space can be utilized to handle future write requests. Thus, the performance of the hybrid storage system with the SSD as the read-only cache is slightly better compared with the write-back policy, on the grounds that the write access speed of DRAM is faster than that of the SSD. In Table 12, the performances of the systems with both cache policies for the web searching trace are compared, and we can see that the performance is almost the same. This is because the web searching trace is a read-intensive workload, which has very few write requests.

To reduce the overhead of data flushing and data copying between the HDD and SSD, [26] identifies the hot data at the host side by analyzing the user-level information, which provides more hints for the identification process. With the accurate hot data identification process, the system can minimize the cold data copying to the SSD and increase the chance of a read cache hit. In [32], the lazy adaptive cache replacement algorithm can also reduce the chance of copying cold data to the SSD, which also improves the read cache hit ratio and the overall system performance.

2) SSD LIFESPAN
In Fig. 11, the sizes of the write data to the SSD, which affect the SSD lifespan most, under different caching policies for the SPC1C traces are presented. We notice that the system with the write-back cache policy writes more data to the SSD; this is the result of the write requests being taken care of by the SSD devices. Meanwhile, the read data are likewise duplicated to the SSD cache from the HDD for future read access. From the figure, we also notice that the size of write data to the SSD is larger when the SSD free space is not enough to cover the incoming write data; this is caused by the GC process in the SSD, which consists of the processes of reading data, erasing the block, and writing data.

In Table 11, the number of SSD writes in the system with the read-only policy is significantly smaller compared to the one with the write-back policy. This is because the write requests are mostly handled by the HDD device. The numbers of SSD writes for systems with the read-only and write-back policies are similar for the web searching trace, as shown in Table 12, which can be explained by the high read ratio of the workload.
FIGURE 11. SSD lifespan under different caching policies for SPC1C traces. (a) Enough SSD free space. (b) Limited SSD free space.
FIGURE 12. System energy consumption under different caching policies for SPC1C traces. (a) Enough SSD free space. (b) Limited SSD free space.
Minimizing the number of writes to the SSD [36], [40] can prolong the SSD lifespan at the cost of decreasing the overall system performance. In [26], the SSD device only caches the random write data, and the sequential write data are located to the HDD, which prolongs the SSD lifespan. In [27], before a write request reaches the storage system, a deduplication process is conducted to remove the duplicated data, which can reduce the number of updates significantly when the workload contains lots of update requests; then, the SSD lifespan can be prolonged. In [32], due to the lazy adaptive cache replacement algorithm, the insertion and removal of cache entries from the SSD are more strict, which reduces the number of write operations to the SSD and saves the lifespan of the SSD.

3) ENERGY CONSUMPTION
The energy consumptions of the system with various caching policies for the SPC1C traces are shown in Fig. 12. When the SSD has enough free space, the system with the read-only cache policy consumes more energy compared with the one applying the write-back policy. This is because the write requests are mostly handled by the HDD device for the system with the read-only cache policy, which increases the HDD busy time and energy consumption. When the SSD only has limited free space, the energy consumption is higher for the system with the read-only cache policy when the IOPS is low, but lower when the IOPS is high, compared with the system with the write-back policy. The principal explanation is that when the free space in the SSD is not enough and the IOPS of the trace is high, the busy period of the HDD device is similar for both the read-only cache policy and the write-back policy, as the write data need to be flushed to the HDD under the write-back policy. Meanwhile, the SSD device consumes more energy under the write-back policy, as it needs to handle more write requests. In Tables 11 and 12, we can observe lower energy consumption for the system with the read-only policy when the SSD free space is limited. This is because the write requests need to be handled by the SSD device first, and then
TABLE 13. Performance comparison under different system architectures for financial trace.
TABLE 14. Performance comparison under different system architectures for web searching trace.
FIGURE 13. System performance under different storage architectures for SPC1C traces. (a) Enough SSD free space. (b) Limited SSD free space.
FIGURE 15. SSD lifespan under different storage architectures for SPC1C traces. (a) Enough SSD free space. (b) Limited SSD free space.
see that the system with the hybrid method has slightly better performance than the tiering method when the SSD free space is not enough. This is because the combination of the tiering and caching methods can fully utilize the benefit of the caching method when the re-access ratio of the workload is smaller. Similar behavior is observed for the web searching trace, which also does not have a high re-access ratio. However, when the SSD free space is limited, if the re-access ratio of the workload is high, as for the financial trace, the performance of the tiering method is better compared with the caching method, which can be found in Table 13. This is because the tiering method minimizes the data migration between the SSD and HDD, and only the hot data are migrated to the SSD. To further improve the system performance, [90] considers the load status of the SSD and HDD devices during the data allocation and migration, which improves the system performance by the parallel usage of the devices.

In a tiered storage system, the system performance is also highly affected by the hot data identification and data migration policies, which can be observed in Fig. 14. From the figure, it is easy to notice that the system performance decreases when the data hotness threshold increases. This can be explained by the fact that the amount of data migrated to the SSD is reduced when the hotness threshold is increased. The benefit of the faster access speed of the SSD cannot be fully utilized compared with the one with a lower hotness threshold. More read requests need to access data from the HDD, which lowers the overall system performance.

2) SSD LIFESPAN
The SSD lifespan usages under the various storage architectures for the SPC1C traces are shown in Fig. 15. From the figure, we notice that the SSD lifespan usage under the tiering method is vastly improved compared with the caching method and the hybrid method. This is because the tiering method only moves the data to the SSD when the data are identified as hot, whereas the caching method moves the data to the SSD when they are read from the HDD. In Tables 13 and 14, the SSD lifespan usages with the financial trace and the web searching trace are shown. We notice that the SSD lifespan usage under the tiering method is always smaller compared with the caching method and the hybrid method for both of the traces, no matter whether the SSD free space is enough or not. In order to improve the overall system performance, the load balance is also considered. In [90], more of the SSD lifespan is used, as the algorithm needs to maximize the parallelism of the devices, which requires more write operations to the SSD when the load of the HDD is very high.

FIGURE 16. The SSD lifespan under different hot data identification policies for SPC1C traces.

In a tiered storage system, the SSD lifespan is also affected by the hot data identification policies, which can be seen in Fig. 16. From the figure, we notice that when the hotness threshold is larger, more of the SSD lifespan is saved. This can be explained by the fact that the amount of hot data moved to the SSD is reduced when the hotness threshold is large compared with the one having a smaller hotness threshold.
FIGURE 17. System energy consumption under different storage architectures for SPC1C traces. (a) Enough SSD free space. (b) Limited SSD free space.
their review, and Grant Mackey for his corrections on some grammar issues of this manuscript.
[26] L. Lin, Y. Zhu, J. Yue, Z. Cai, and B. Segee, ‘‘Hot random off-loading: A hybrid storage system with dynamic data migration,’’ in Proc. IEEE 19th
Annu. Int. Symp. Model., Anal., Simulation Comput. Telecommun. Syst.
(MASCOTS), Jul. 2011, pp. 318–325.
REFERENCES [27] X. Chen, W. Chen, Z. Lu, P. Long, S. Yang, and Z. Wang, ‘‘A duplication-
[1] V. Turner, J. F. Gantz, D. Reinsel, and S. Minton, ‘‘The digital universe aware SSD-based cache architecture for primary storage in virtualization
of opportunities: Rich data and the increasing value of the Internet of environment,’’ IEEE Syst. J., vol. 11, no. 4, pp. 2578–2589, Dec. 2017.
Things,’’ IDC Anal. Future, Apr. 2014. [28] Y. Oh, E. Lee, C. Hyun, J. Choi, D. Lee, and S. H. Noh, ‘‘Enabling cost-
[2] (2017). Flash Memory—From Wikipedia, the Free Encyclopedia. effective f lash based caching with an array of commodity SSDs,’’ in Proc.
[Online]. Available: https://en.wikipedia.org/wiki/Flash_memory 16th Annu. Middleware Conf., 2015, pp. 63–74.
[3] (2017). Solid State Drive—From Wikipedia, the Free Encyclopedia. [29] X. Meng et al., ‘‘HFA: A hint frequency-based approach to enhance the
[Online]. Available: https://en.wikipedia.org/wiki/Solid_state_drive I/O performance of multi-level cache storage systems,’’ in Proc. 20th
[4] N. Muppalaneni and K. Gopinath, ‘‘A multi-tier RAID storage system IEEE Int. Conf. Parallel Distrib. Syst. (ICPADS), 2014, pp. 376–383.
with RAID1 and RAID5,’’ in Proc. 14th Int. Conf. Parallel Distrib. [30] Y. Liu, X. Ge, X. Huang, and D. H. C. Du, ‘‘MOLAR: A cost-efficient,
Process. Symp. (IPDPS), 2000, pp. 663–671. high-performance hybrid storage cache,’’ in Proc. Int. Conf. Cluster
JUNPENG NIU received the B.E. and M.S. degrees from the School of Computer Engineering, Nanyang Technological University, Singapore, in 2007 and 2009, respectively, where he is currently pursuing the Ph.D. degree with the School of Electrical and Electronic Engineering.
JUN XU (SM'12) received the B.S. degree in applied mathematics from Southeast University, China, in 2001, and the Ph.D. degree in control and automation from Nanyang Technological University, Singapore, in 2006. He was with the Data Storage Institute, Nanyang Technological University, and the National University of Singapore. He is currently a Principal Engineer with HGST Western Digital Corporation. He has published around 60 international papers and patents, and one monograph. He has multidisciplinary knowledge and solid experience in complex system design, management, modeling and simulation, data analytics, data centers, cloud storage, and IoT. He is a certified FRM. He was a committee member of several international conferences on control and automation. He is an Editor of the journal Unmanned Systems.

LIHUA XIE (F'07) received the B.E. and M.E. degrees in electrical engineering from the Nanjing University of Science and Technology, Nanjing, China, in 1983 and 1986, respectively, and the Ph.D. degree in electrical engineering from the University of Newcastle, Callaghan, NSW, Australia, in 1992.

Since 1992, he has been with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, where he is currently a Professor and the Director of the Delta-NTU Corporate Laboratory for Cyber-Physical Systems. He served as the Head of the Division of Control and Instrumentation from 2011 to 2014. His research interests include robust control and estimation, networked control systems, multiagent networks, localization, and unmanned systems. He is an elected member of the Board of Governors of the IEEE Control Systems Society from 2016 to 2018. He is a fellow of IFAC. He is an Editor-in-Chief of Unmanned Systems and an Associate Editor of the IEEE TRANSACTIONS ON NETWORK CONTROL SYSTEMS. He has served as an Editor of the IET book series in Control and an Associate Editor of a number of journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, Automatica, the IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II.