Design and Implementation of SSD-assisted Backup and Recovery For Database Systems
1 INTRODUCTION
by using the characteristic of flash memory, previous studies [29], [31], [32] exploit the out-of-place update nature of flash-based SSDs for backup and recovery. They extend the flash translation layer (FTL) in the SSD to support the backup/recovery functionality. This article is in line with these previous studies [29], [31], [32] in that we leverage the characteristics of SSDs. In contrast, we focus on the study of the backup/recovery process in the context of database systems and apply multiple SSDs to database systems in replication and redundant array of independent disks (RAID) environments.

In this article, we present an SSD-assisted backup and recovery scheme for backup management of database systems. Our scheme provides fast backup and recovery by adopting the full backup strategy. By utilizing the characteristic of flash (i.e., out-of-place update) and resources inside the SSD, we design and implement the full backup/recovery functionality in the FTL on the Samsung SM843Tn SSD [33]1, called BR-SSD. BR-SSD provides four operations over the SATA protocol: create, delete, restore, and backup-info operations. For the backup/recovery operations, without data copies, BR-SSD stores, loads, and deletes only the metadata (e.g., FTL mapping information), and reports the backup status. BR-SSD is fully file system-independent but requires small modifications in the OS to provide an OS-level interface via the ioctl system call. Through this system call, backup/recovery commands are transferred from database systems to storage devices over the SATA interface.

1. SM843Tn SSD is an enterprise-class flash-based SSD that implements the SATA storage protocol. Inside the SSD, the FTL manages the DRAM-based cache with supercapacitors for high performance, low latency, and high reliability. With such preferable features, the SM843Tn SSD is widely adopted in data centers.

Furthermore, we integrate BR-SSDs into database systems in replication and RAID environments as well as a database system based on a single BR-SSD. To this end, we define a set of SQL-like commands and modify MySQL 5.6.21 for backup management so that the backup can be managed on a transaction-by-transaction basis of database systems. Also, this allows the backup operations to be performed in a simple and straightforward fashion; administrators can easily manage backups in all of the SSDs in the replication or RAID environments using the defined commands for database systems. The experimental results demonstrate that our scheme provides fast backup and recovery without database performance overhead in normal operations. In our previous work [30], we focused on the study of single and replicated database systems. This article extends our scheme to RAID environments.

2 BACKGROUND

2.1 Replication for database systems
Database replication is widely used to improve availability, reliability, fault-tolerance, and so on. For example, popular SNS companies, such as Twitter and Facebook, use MySQL Replication [34], which is one of the popular replicated database systems [35]. MySQL Replication enables data from one MySQL database server (e.g., master) to be replicated to one or more MySQL database servers (e.g., slaves). In MySQL Replication, the default replication mode is asynchronous, and different types of synchronization are supported [34], [36]. MySQL Replication keeps track of all changes to its databases in its binlog [37]. The binlog is a record of all events that modify database structures or records since the start of the database server. Each slave that connects to the master requests a copy of the binlog. Then, the slave pulls the data from the master and executes the events from the binlog. Thus, all database nodes can process client queries with the same set of records. The transactions on a master node are propagated to and executed on slave nodes in order, ensuring identical results across all nodes. To perform a recovery in MySQL Replication, all the nodes should be restored to the same database state. During the recovery, database service is discontinued in all nodes. If the backup data is available on all nodes, the restoration process can be started immediately. Otherwise, the master needs to transfer the backup data to the slaves. After the backup data are transferred, the restoration process begins. As an alternative, the restore operation is performed in a master node, and then the restored data is transferred to the slave nodes.

2.2 RAID
Redundant arrays of independent disks (RAID) deliver higher performance and availability than a single large storage device by using multiple storage devices [38]. RAID also offers reliability that prevents data loss by data redundancy even in the case of device failure. Through these advantages, RAID is used for database systems to provide a better quality of service [39]. The RAID technology is an efficient way to solve the bottleneck problem between CPU and I/O processing speed. The growth of RAID technology has been driven by three factors. First, the performance growth of the processor has outstripped that of the storage device. Such imbalanced growth moves the performance bottleneck from the CPU to the storage device. By using multiple storage devices, RAID can increase I/O performance. Second, arrays of storage devices often have substantial cost, power, and performance advantages over a single large storage device. Third, RAID can make a system highly reliable by storing a small amount of redundant information in the array. Without this redundancy, large disk arrays can have unacceptably low data reliability.

Among the various RAID configurations, RAID-5 is generally used since it provides parallel access, fault tolerance, and little wasted space for redundancy [40]. Therefore, in this paper, we target a RAID-5 system. RAID-5 consists of block-level striping with distributed parity across three or more storage devices. If a storage device fails, the data that was on the failed storage device can be re-created from the remaining data and parity. In terms of backup and recovery, similar to the case of replication, all SSDs on RAID become more underutilized as the recovery time increases.

3 DESIGN AND IMPLEMENTATION OF BR-SSD

3.1 Overview of flash-based SSD
The FTL is one of the core engines in flash-based SSDs. In flash memory, any update of the data in a page must be written to a free page due to the out-of-place update nature of the flash memory [6]. To hide this unique characteristic of flash memory from the host, the FTL maps the logical page number (LPN) from the host to the physical page number (PPN) in flash memory. The old page that has the original copy of the data becomes unreachable and obsolete.
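To make the mapping concrete, the following minimal C sketch (illustrative names only, not the vendor's FTL code) shows how an out-of-place update redirects an LPN to a new PPN and leaves the old physical page obsolete for later garbage collection.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_LPNS    8u            /* tiny logical address space, for illustration only */
    #define INVALID_PPN UINT32_MAX

    static uint32_t l2p[NUM_LPNS];    /* active FTL mapping table: LPN -> PPN */
    static uint32_t next_free_ppn;    /* simplistic free-page allocator       */

    /* Out-of-place update: a logical overwrite never touches the old physical
     * page; it programs a new page, redirects the mapping, and leaves the old
     * page obsolete until garbage collection erases its block. */
    static void ftl_write(uint32_t lpn)
    {
        uint32_t old_ppn = l2p[lpn];
        uint32_t new_ppn = next_free_ppn++;   /* data is programmed here */

        l2p[lpn] = new_ppn;                   /* host still addresses the same LPN */
        if (old_ppn != INVALID_PPN)
            printf("LPN %u: PPN %u is now obsolete, remapped to PPN %u\n",
                   lpn, old_ppn, new_ppn);
    }

    int main(void)
    {
        for (uint32_t i = 0; i < NUM_LPNS; i++)
            l2p[i] = INVALID_PPN;
        ftl_write(1);   /* first write of LPN 1                 */
        ftl_write(1);   /* update: the old PPN becomes obsolete */
        return 0;
    }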
FTL erases dirty blocks which have obsolete pages and recycles these pages (garbage collection). To offer high performance and reliability, enterprise SSDs are equipped with supercapacitors which protect data on the DRAM buffer from power outage. This guarantees that any writes sent to the DRAM buffer are successfully written to the flash memory even in the event of a power loss [41]. Such supercapacitors also minimize the overhead caused by a flushing command for ordering and durability [11].

Fig. 1: Overall architecture

Fig. 2: Creating a backup in BR-SSD

• restore(ID) The restore command with a given backup ID restores the backup by changing the current state to the previous state of the given ID.
• backup-info The backup-info command returns the backup information, such as the number of existing backups and free space; this operation can be used as a backup guide.

The create and delete commands are performed asynchronously, returning to the host as soon as the SSD receives the commands. Then, the SSD processes the create or delete operations in the background while the host continues its computation. While these operations are being performed, incoming I/O requests can be blocked until the processing is finished, which can make the commands behave synchronously. In contrast, the restore and backup-info commands are performed synchronously. In terms of the restore operation, database systems should complete or stop their ongoing operations (i.e., shut down) before the restore command is issued and restart after the restore is finished.

Fig. 3: Deleting a backup in BR-SSD

Fig. 4: Restoring a backup in BR-SSD
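As a reference for the four operations and their completion semantics described above, here is a small C sketch; the operation encoding and the backup-info layout are assumptions for illustration, not the actual SM843Tn firmware interface.

    #include <stdbool.h>
    #include <stdint.h>

    /* The four BR-SSD operations carried over SATA (encoding is illustrative). */
    enum br_op { BR_CREATE = 0, BR_DELETE = 1, BR_RESTORE = 2, BR_BACKUP_INFO = 3 };

    /* Shape of the information returned by backup-info (assumed layout). */
    struct br_backup_info {
        uint32_t nr_backups;   /* number of existing backups            */
        uint64_t free_space;   /* space still available for new backups */
    };

    /* create/delete are acknowledged immediately and completed in the background;
     * restore and backup-info only return once the operation has finished. */
    static bool br_op_is_async(enum br_op op)
    {
        return op == BR_CREATE || op == BR_DELETE;
    }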
Create operation: We describe the create operation for backups in BR-SSD as shown in Figure 2. When a create command is issued from the host, a create operation is performed by BR-SSD in the following steps:
• BR-SSD flushes the current dirty pages from DRAM to persistent flash memory. This operation preserves the pages at a given point.
• BR-SSD scans all entries in the FTL mapping table while increasing the refcount of the blocks corresponding to the entries.
• BR-SSD flushes the FTL mapping table to flash memory in order to save the mapping information for the preserved pages.
• BR-SSD adds the address of the stored FTL mapping table to the backup table.

Figure 2 illustrates an example of a create operation inside BR-SSD. In this example, there are two pages, page 1 (LPN 1) and page 2 (LPN 2), in the data cache, and each page is mapped to PPN 10 and 11, respectively. When the host issues a create command to BR-SSD, the SSD flushes the current data (the two pages mapped to PPN 10 and PPN 11) from the data cache (DRAM space) to flash memory. After flushing the data, BR-SSD traverses the two entries in the FTL mapping table while increasing the refcount to two so that these pages do not get garbage collected. And then, the SSD flushes the FTL mapping table to flash memory. Then, an entry that includes the address of the stored FTL mapping table is added to a backup table, which is written into the flash memory asynchronously. The FTL mapping table is the space overhead of the create operation; in our storage device, the size of the FTL mapping table is about 70 MiB. A more detailed explanation of the create operation can be found in our previous work [30].

Delete operation: Figure 3 illustrates an example of a delete operation inside BR-SSD. When the host issues a delete command with a given backup ID to BR-SSD, the SSD searches for the entry in the backup table according to the backup ID. If BR-SSD finds the entry in the backup table, the SSD obtains the address of the FTL mapping table to be deleted. Otherwise, BR-SSD returns an error message for this delete command to the host. After BR-SSD obtains the address, the SSD loads the backed-up FTL mapping table from flash memory to DRAM. Then, BR-SSD scans all entries in the backed-up FTL mapping table while decreasing the refcount of the blocks that include the preserved pages in order to invalidate the pages. In this example, the refcount of the block including the two pages is decreased, and the pages will be garbage collected since the refcount is zero; if the pages are associated with another backup, the refcount is not zero. Then, BR-SSD deletes the backed-up FTL mapping table and the entry including the address of the stored FTL mapping table in the backup table. During the delete operation, BR-SSD does not touch the active FTL mapping table. Consequently, this delete operation allows the backup to be deleted independently from other backups due to the full backup strategy.

Restore operation: Figure 4 illustrates an example of a restore operation in BR-SSD. When the host issues a restore command with a given backup ID to BR-SSD, the SSD searches for the entry in the backup table according to the backup ID. If BR-SSD finds the entry, the SSD obtains the address of the backed-up FTL mapping table from the backup table. Otherwise, the SSD returns an error message for this restore command to the host. After BR-SSD obtains the address, the SSD frees the current pages in the data cache. And then, BR-SSD loads the backed-up FTL mapping table from flash memory to DRAM. Then, BR-SSD overwrites all entries in the active FTL mapping table with those in the backed-up FTL mapping table. In this example, the active FTL mapping table contains page 1 and page 2 mapped to PPN 210 and PPN 211, respectively. During the restore operation, each PPN field of page 1 and page 2 in the active FTL mapping table is overwritten to 10 and 11, respectively. The two pages mapped to PPN 210 and PPN 211 will be garbage collected if the pages are not referenced by any backup. Consequently, this restore operation provides full backup restoration by replacing all current entries with backed-up entries without data copies.

3.4 Modification to the OS
We modify the OS to implement BR-commands and BR-system calls. For the BR-commands, we employ an existing SATA command (i.e., the flush command2) instead of adding new commands. We use reserved fields in the flush command: a count field (6 bits) for backup IDs (1-63) and an NCQ tag field (5 bits)3 for the type of backup/recovery operation (i.e., 0: create, 1: delete, 2: restore, 3: backup-info). When a flush command is issued with a backup ID and a type of backup/recovery operation, the modified FTL inside BR-SSD processes the command according to each operation. We export BR-system calls to applications by extending the ioctl system call. They carry the backup ID and operation to the block I/O subsystem, bypassing the file system. We modify the block I/O subsystem, including the SATA device driver, to transfer the backup ID and operation from the upper layer to the lower layer of the block I/O subsystem. For example, we add the fields for the backup ID and operations to four descriptors (struct bio, struct request, struct scsi_cmd, and struct ata_queued_cmd) in the block I/O subsystem.

2. The flush command is generally used to flush all the dirty pages to flash memory.
3. The NCQ tag field is not used in the flush command.
The modified lines of code for the OS are about 50. This demonstrates that our scheme requires only small modifications to the OS.

Fig. 5: An example of our query-based create operation in MySQL with BR-SSDs

4 SSD-ASSISTED BACKUP AND RECOVERY FOR DATABASE SYSTEMS

We define backup management queries (BM-queries) (CREATE_BACKUP, DELETE_BACKUP, and BACKUP_INFO), which are SQL-like commands, and combine them in the existing MySQL. This has the following advantages:
• BM-query allows the creation of a backup based on transactions or a given point.
• BM-query with the existing replication or RAID mechanisms guarantees the backup management operations with the same state across all nodes or all devices.
• BM-query allows administrators or users to manage the backups easily.

BM-query can be used in the client interface similar to the common SQL commands and queries, and each BM-query is executed from the client interface; the query processor handles the BM-query. In a BM-query, each backup management operation is performed by issuing the corresponding BR-system call to the OS. In the case of replicated database systems, a BM-query is written to the binlog and then replicated to other nodes. The replicated BM-query will be executed on the other nodes in the same order as it was executed on the originating node; this guarantees that the backups are managed with the same transaction state across all nodes. This makes the backup management simple and straightforward in replicated database systems without the need to transfer the backup data to other nodes [18], [35]. To safely create a backup by preserving the transaction state, a CREATE_BACKUP query executes several operations in the following order:
• The transaction acquires a global read lock on all tables and holds it until the create operation is finished.
• The log is flushed and the dirty pages are flushed from the buffer pool, and then the backup is created in BR-SSD (steps 1-3 in Figure 5).
• The query is written to the binlog so that the query is replicated to other nodes.

In the example of Figure 5, the replicated database system is configured as a master-slave relationship. As shown in the figure, there are two nodes: a master and a slave. When the CREATE_BACKUP query (tx3) is executed and replicated through the binlog, backup 1 is created in BR-SSDs on all nodes. If the database system is not replicated, the operations are not propagated.

If other concurrent transactions exist, they (e.g., tx2 by c2 in the figure) can be aborted after the database system is restored by backup 1. Since MySQL/InnoDB adopts the steal policy for buffer pool management, it allows the dirty pages of running transactions to be written to storage. Thus, the backup can include the dirty pages of the running transactions (e.g., tx2) when the create operation is performed. After the backup is restored and the database system is restarted, the uncommitted transactions will be aborted by applying undo logs. As an alternative option, for a failed node, the backups in BR-SSD on an available node can be used to restore the database in the failed node.
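The ordering above can be summarized in C-style pseudocode; all helper names below are hypothetical placeholders rather than actual MySQL/InnoDB symbols, and the sketch only conveys the sequencing of the lock, the flushes, the BR-system call, and the binlog write.

    /* Hypothetical helpers; real MySQL/InnoDB code paths differ. */
    extern void acquire_global_read_lock(void);    /* block concurrent writes        */
    extern void flush_redo_log(void);              /* 1. log flush (WAL)             */
    extern void flush_dirty_pages(void);           /* 2. page flush from buffer pool */
    extern int  br_create(int backup_id);          /* 3. BR-system call (ioctl)      */
    extern void write_binlog_event(const char *q); /* replicate the query to slaves  */
    extern void release_global_read_lock(void);

    static int execute_create_backup(int backup_id)
    {
        acquire_global_read_lock();          /* held until the create is finished */
        flush_redo_log();
        flush_dirty_pages();
        int ret = br_create(backup_id);      /* processed asynchronously in BR-SSD */
        if (ret == 0)
            write_binlog_event("CREATE_BACKUP");  /* slaves repeat the same steps */
        release_global_read_lock();
        return ret;
    }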
The recovery process can be performed by restoring the desired backup in BR-SSD on the available node and transferring the restored database to the failed node. The detailed description of the modification to the MySQL database system can be found in the previous study [30].

5 EXPLOITING BR-SSDS FOR RAID
In this section, we explain how to use BR-SSDs in the RAID environment. As mentioned in Section 2, we focus on RAID-5 among diverse RAID configurations since it is widely used due to parallel access to multiple devices and parity bits that protect data in case of a device failure. In the RAID-5 configuration, the data for a backup is distributed across the devices. For example, in RAID-5 with three BR-SSDs, when a backup with two blocks is created, RAID-5 stores one block in the first BR-SSD, another block in the second BR-SSD, and a parity block in the third BR-SSD. Thus, the blocks composing a backup are distributed when a backup is created. Meanwhile, the blocks are aggregated when a restore operation for the backup is performed.

However, a partial backup, which denotes an incomplete backup, can be generated in the case of failures. For example, while a create operation for a backup is in progress, if system or device failures occur, the backup can be created only partially in the device(s). Such failures make an incomplete backup, and the restore operation for the backup cannot be completed from the partial backup. Also, if the backup cannot be reconstructed by RAID-5, the partial backup can leave useless data (i.e., the FTL mapping table and its corresponding entries), causing unnecessary space overhead.

The backup operations, such as create, delete, and restore, have to be atomic to solve the partial backup problem. It is non-trivial for RAID-5 to completely re-create the backups in a new BR-SSD after a single device failure. The reason is that the RAID-5 subsystem cannot access the backed-up FTL mapping tables and the backup table which are required to reconstruct the backups. Also, the current FTL mapping table may not include the LPNs of the backup data since our scheme does not generate a new copy of the backup data and its LPN, unlike the existing schemes. Instead, as mentioned in Section 3.3, we store only the metadata such as the FTL mapping table and preserve the backup data. For example, if an application writes data, creates a backup of the data, and deletes the data, the LPN for the backup data cannot be found; the backup information can be found only when the restore operation is performed. Thus, a new rebuild mechanism is necessary to perform successful RAID reconstruction. With the new rebuild mechanism, we synchronize the new BR-SSD with the active BR-SSDs by using the previous backups in the active BR-SSDs and RAID-5 reconstruction.

5.1 Overall backup management for RAID
For backup management in the RAID environment, we perform the backup operations in a similar fashion to the two-phase commit protocol [42] to guarantee the atomicity of the operations. We first perform a per-device backup operation (create, delete, or restore) in each device on RAID. Then, we complete the backup operation if every device completes its operation. For example, if more than one device fails to create a backup during a create operation, we delete the created backup in the active device(s). This is because RAID-5 cannot reconstruct the backup by using the partially created backup if more than one device fails. Meanwhile, if only one device fails to create a backup, we keep the created backup in the active devices since RAID-5 can reconstruct the data of the failed device. This guarantees the atomicity of the backup operations.

Fig. 6: RAID-5 rebuild with BR-SSDs (Sharded backups (Bxy) in each device compose a complete backup (Bx). The first digit (x) of Bxy denotes the backup number, and the second digit (y) of Bxy denotes the device number in which the sharded backup is created. For example, B11, B12, and B13 compose the first backup (backup1 (B1)).)

Figure 6 shows an example of the RAID-5 rebuild with BR-SSDs when system and device failures occur. As shown in the figure, there are three BR-SSDs on RAID-5. In the initial state, the RAID-5 system with BR-SSDs has two backups (B1 and B2). The backup B1 consists of sharded backups (B11, B12, and B13). The restore operation for B1 can be performed by using all the sharded backups (B11, B12, and B13). As shown in the top left of the figure, during the creation of a backup (B3), if a sharded backup (B31) in BR-SSD1 is created while the sharded backups (B32 and B33) in BR-SSD2 and BR-SSD3 are not created due to system and device failures, the system needs to be recovered.

After replacing the inoperable device (BR-SSD3) with a new device (BR-SSD4), our system deletes B31 since B3 is an incomplete backup and cannot be reconstructed without another sharded backup (i.e., B32 or B33).
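The rebuild relies on the standard RAID-5 property that the block of a single failed device can be recomputed as the XOR of the surviving data and parity blocks in the same stripe; a minimal, implementation-independent illustration:

    #include <stddef.h>
    #include <stdint.h>

    /* Recompute the block of a failed device as the XOR of the surviving blocks
     * (data and parity) in the same stripe. nr_surviving = total devices - 1. */
    static void raid5_reconstruct_block(uint8_t *out, size_t block_size,
                                        const uint8_t *surviving[], size_t nr_surviving)
    {
        for (size_t i = 0; i < block_size; i++) {
            uint8_t x = 0;
            for (size_t d = 0; d < nr_surviving; d++)
                x ^= surviving[d][i];
            out[i] = x;
        }
    }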
Algorithm 1: Create backup on RAID-5 with BR-SSDs

Shared Variable:
 1  global_success ← 0;
 2  global_result[nr_dev] ← {0, 0, ..., 0};
end
Function raid5_create_backup(devs[], nr_dev, backup_id):
 3  for i ← 0 to nr_dev − 1 do
 4      thread_create(create_backup, devs[i], backup_id, i);
    end
 5  <Wait for all threads to finish>
 6  if global_success < nr_dev then
 7      j ← 0;
 8      success_devs[nr_dev] ← {0, 0, ..., 0};
 9      for i ← 0 to nr_dev − 1 do
10          if global_result[i] = SUCCESS then
11              success_devs[j++] ← devs[i];
            end
        end
12      if global_success < nr_dev − 1 then
13          raid5_delete_backup(success_devs, backup_id);
        end
14      return FAIL;
    end
15  return SUCCESS;
end
Function create_backup(dev_id, backup_id, i):
16  global_result[i] ← create(dev_id, backup_id);
17  if global_result[i] = SUCCESS then
18      atomic_add(global_success);
    end
end

Algorithm 2: Backup rebuild on RAID-5 with BR-SSDs

/* backup_info structure */
Struct {
 1  free_space;
 2  is_exist[backup_id_avail];
 3  nr_backup;
} backup_info;
Function raid5_rebuild_backup(active_devs[]):
 4  temp_id ← a new available backup ID for a temporary backup;
 5  active_dev_id ← an active device ID;
 6  nr_active_dev ← the number of active devices;
 7  nr_new_dev ← 1;
    /* Get backup information from an active device */
 8  struct backup_info binfo ← backup-info(active_dev_id);
    /* Backup the current state */
 9  raid5_create_backup(active_devs, nr_active_dev, temp_id);
    /* RAID-5 reconstruction with active BR-SSDs */
10  for i ← 1 to binfo.nr_backup do
11      if i = temp_id then
12          continue;
        end
13      if binfo.is_exist[i] = TRUE then
14          raid5_restore_backup(active_devs, nr_active_dev, i);
15          <RAID-5 reconstruction>
16          raid5_create_backup(new_dev, nr_new_dev, i);
        end
    end
    /* Restore the current state */
17  raid5_restore_backup(active_devs, nr_active_dev, temp_id);
18  <RAID-5 reconstruction>
    /* Delete the current state */
19  raid5_delete_backup(active_devs, nr_active_dev, temp_id);
end
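For concreteness, the following is a C sketch of Algorithm 1 using POSIX threads; create() and raid5_delete_backup() are assumed wrappers around the per-device BR-system call and the RAID-level delete described in the text, not an existing API.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdlib.h>

    #define SUCCESS 1
    #define FAIL    0

    extern int  create(int dev_id, int backup_id);                      /* per-device create (assumed) */
    extern void raid5_delete_backup(int devs[], int nr, int backup_id); /* RAID-level delete (assumed) */

    static atomic_int global_success;
    static int *global_result;

    struct worker_arg { int dev_id, backup_id, idx; };

    static void *create_backup(void *p)              /* lines 16-18 of Algorithm 1 */
    {
        struct worker_arg *a = p;
        global_result[a->idx] = create(a->dev_id, a->backup_id);
        if (global_result[a->idx] == SUCCESS)
            atomic_fetch_add(&global_success, 1);
        return NULL;
    }

    static int raid5_create_backup(int devs[], int nr_dev, int backup_id)
    {
        pthread_t *tid = malloc(sizeof(*tid) * nr_dev);
        struct worker_arg *args = malloc(sizeof(*args) * nr_dev);
        global_result = calloc(nr_dev, sizeof(int));
        atomic_store(&global_success, 0);

        for (int i = 0; i < nr_dev; i++) {           /* lines 3-4: one worker per device */
            args[i] = (struct worker_arg){ devs[i], backup_id, i };
            pthread_create(&tid[i], NULL, create_backup, &args[i]);
        }
        for (int i = 0; i < nr_dev; i++)             /* line 5: wait for all workers */
            pthread_join(tid[i], NULL);

        int ret = SUCCESS;
        if (atomic_load(&global_success) < nr_dev) {          /* line 6 */
            int *success_devs = calloc(nr_dev, sizeof(int));
            int j = 0;
            for (int i = 0; i < nr_dev; i++)                  /* lines 9-11 */
                if (global_result[i] == SUCCESS)
                    success_devs[j++] = devs[i];
            if (atomic_load(&global_success) < nr_dev - 1)    /* line 12: two or more failures */
                raid5_delete_backup(success_devs, j, backup_id);  /* line 13 */
            free(success_devs);
            ret = FAIL;                                       /* line 14 */
        }
        free(tid); free(args); free(global_result);
        return ret;                                           /* line 15 */
    }

The check corresponding to line 12 covers the case where two or more per-device creates failed, in which RAID-5 can no longer reconstruct the missing shard, so the partially created backup is deleted.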
To reconstruct the backups (B1 and B2), our system restores B1 and B2 from BR-SSD1 and BR-SSD2, respectively. For B1, the system restores B11 and B12 and performs the RAID-5 reconstruction. As a result, the data of B13 is reconstructed, and our system creates the sharded backup (B13) in BR-SSD4 by storing the FTL mapping table. This procedure is repeated for the other backups (i.e., B2). By doing so, we delete the non-reconstructible sharded backups and rebuild the reconstructible sharded backups after a failure. Consequently, the backup operation can be atomic. We explain the procedures with our algorithms below.

5.2 Backup and rebuild operations for RAID
This section explains the backup and rebuild operations for RAID in more detail. Algorithm 1 shows our create operation (raid5_create_backup()) for RAID-5. We use a global variable (global_success) to check whether all the create operations succeed or fail (line 1) and a global array (global_result[]) to get the result of each create operation (line 2). The nr_dev variable denotes the number of active BR-SSDs. When a main thread creates a backup on RAID-5 with BR-SSDs, the thread spawns worker threads dedicated to the devices, equal in number to the active BR-SSDs (lines 3 and 4). The worker threads call the create system call for each device with a given backup ID in parallel (line 16). And then, the system call returns success or failure for the create operation. If the operation is successful, the thread increments the global_success variable atomically (lines 17 and 18). After the create operations are finished, the worker threads are terminated. The main thread waits for the worker threads to terminate (line 5). And then, the main thread checks whether the create operation is successful or not by comparing the number of successful devices (global_success) and the number of devices (nr_dev).

If global_success is smaller than nr_dev, the create operation has failed (line 6), and we identify the successful devices (success_devs) by using the results of the create operations (lines 8 and 11). If two or more devices have failed, the created backup(s) in the successful devices is deleted by calling the delete function for RAID-5 (raid5_delete_backup()). This is because RAID-5 cannot reconstruct the data if two or more devices fail. This prevents the useless backups from remaining in the devices. Through this create operation, we can create a backup in the RAID environment by creating the backup for each device in parallel and detecting the failures, which guarantees the atomicity of the backups. In terms of the atomicity of the operations, the delete and restore operations are performed similarly to the create operation. These functions, such as raid5_create_backup(), which manage the backups on RAID-5, can be called by BM-queries from the database systems.

Algorithm 2 shows the rebuild procedure for RAID-5 when a new BR-SSD is attached. We use a structure for backup information which consists of a variable indicating the free space (free_space), an array (is_exist[]) to check whether the backups are available or not, and a variable indicating the number of backups (nr_backup) (lines 1-3). As shown in lines 4-7, a thread gets a new available backup ID (temp_id) for creating a temporary backup, an active device ID (active_dev_id), the number of active devices (nr_active_dev), and the number of new devices (nr_new_dev). The thread then calls the backup-info system call to get the current backup information (i.e., binfo) from
an active device (line 8). In the case of getting the backup information, we call the system call for an active device since all the active devices have the same backup information. Before starting the rebuild process, the thread stores the current state by calling the create function for RAID (i.e., raid5_create_backup()), which creates a temporary backup in the active devices in parallel (line 9).

As shown in lines 10-16, the thread performs RAID-5 reconstruction with the active BR-SSDs. The backups are created in the new device as many times as the number of existing backups by incrementing the backup ID, except for the temporary backup (lines 10-12). If the active device has a backup, which is checked using the backup information (binfo.is_exist[]), the thread restores the backup from the active devices by calling a restore function for RAID-5 (i.e., raid5_restore_backup()), which restores the backups in the active devices in parallel. Then, the thread performs RAID-5 reconstruction to rebuild the data for the restored backup. Finally, the thread creates the backup for the rebuilt data in the new device. This procedure is repeated as many times as the number of existing backups.

After rebuilding the backups, to return to the state before the rebuild process was started, the thread restores the temporary backups from the active devices by calling the restore function for RAID (line 17). Then, RAID-5 reconstruction is performed to make the state of the new device the state before the rebuild process (line 18). And then, the temporary backups are deleted by the delete function for RAID (i.e., raid5_delete_backup()), which deletes the backups in the active devices in parallel (line 19). Consequently, we re-create the backups of the active BR-SSDs in the new BR-SSD and synchronize the backups among the BR-SSDs to solve the partial backup problem. Note that this procedure can be applied to other RAID configurations.

6 PERFORMANCE EVALUATION
In this section, we empirically evaluate the performance of the existing and proposed schemes. For the evaluation of the replication environment, we run experiments on a cluster system consisting of four identical machines connected by a network: three nodes as a replicated database system (one master and two slaves) and one node as a database client. We configured the replication as asynchronous. Each machine is equipped with an Intel Core CPU i7-4790 (3.60 GHz) with four physical cores, which total up to eight cores with hyperthreading, 32 GiB DRAM, a SATA 3 interface, and a 10 GbE network card. We use a 123.6 GiB Samsung SM843Tn [33] on all database servers. We run all types of transactions in the benchmark on the master node while running read-only transactions on the slave nodes. We report the transactions per second (TPS) measured at the master node4. For the evaluation of the RAID environment, we run experiments on a machine equipped with an Intel Core CPU i7-3770 (3.40 GHz) with four physical cores, which total up to eight cores with hyperthreading, 16 GiB DRAM, and a SATA 3 interface. We configure RAID-5 with three Samsung SM843Tn SSDs in a single node. All servers run the Ubuntu 14.04 LTS distribution with a Linux kernel 3.14.3. We use MySQL 5.6.21 with the InnoDB storage engine and replication [34], running sysbench OLTP benchmarks [43]. The experimental parameters are presented in Table 2. We fix the number of threads per database server at 32, where the database performance is the highest on our machines. For the MySQL/InnoDB configuration, we set the buffer pool to 1 GiB and enable the direct I/O mode (O_DIRECT) to avoid the effect of file system page caching. We set the database page size to 4 KiB to reduce the write amplification in flash-based SSDs [11]. We set the rest of the configurations as default.

Parameters Workload A Workload B
record count 35,000,000 100,000,000
database size 9.5 GiB 27 GiB
the number of clients per database server 32
measured time (seconds (s)) 600
TABLE 2: Experimental parameters

4. The TPS from the slave nodes is almost the same in all the schemes, and the read-only transactions on the slave nodes do not affect the TPS of the master node in our evaluation.
5. We note that the internal bandwidth inside the SSD between DRAM and flash memory is higher than that of the specification of the SSD (Samsung SM843Tn SSD) that we used. Thus, in the case of the restore operation, the time of read operations for the FTL mapping table inside the SSD is shorter than that of read operations between the host and the SSD.

Fig. 7: Average latency for backup and recovery operations of BR-SSD on different data sizes

6.1 Baseline performance of BR-SSD operations
In this section, we evaluate the baseline performance of the backup/recovery operations of a BR-SSD, such as create, delete, and restore, in a single node using an external tool which simply performs BR-system calls. Figure 7 shows the average latency of each operation for different data sizes. Note that the latency unit for the create/delete operations is microseconds (us) and that for the restore operation is milliseconds (ms). As shown in the figure, the create and delete operations take approximately 132 us and 134 us, respectively. The time does not vary across the data sizes. The result demonstrates that the create and delete operations have almost uniform latency due to their asynchronous nature; the latency includes the time taken by issuing and receiving the system call and the commands themselves.

We also measure the time for synchronous create and delete operations. The time is about 5 seconds in both cases. In the create/delete operations, the scan operations for all entries in the FTL mapping table in BR-SSD occupy the majority of the time; the DRAM with supercapacitors allows the create/delete operations to be performed asynchronously, which hides the latency. Meanwhile, the restore operation, which is synchronous, takes approximately 63 ms in all sizes5. The time for the restore operation is longer than that of the asynchronous create/delete operations but shorter than that of the synchronous create/delete operations. The reason is that the restore operation overwrites all entries in the active FTL mapping table with those in the backed-up FTL mapping table using memory copy operations rather than the scan operation. Consequently, the latencies of the create/delete/restore operations are not affected by the data
size, since each operation scans or overwrites all entries of the FTL mapping table regardless of the data size.

6.2 Impact on run-time performance
In this section, we show the impact of BTRFS, LVM, and BR-SSD on the run-time performance during normal operations. We perform the OLTP benchmark in the replicated database system without running backup/recovery operations. We vary the file systems and configurations and use the unmodified SSD except for EXT4 with BR-SSD. Figure 8 shows the TPS with the two different workloads. BTRFS has the lowest performance among all the cases since BTRFS generates redundant writes by performing garbage collection for the old versions of the data at run-time. The run-time performance of LVM is lower by 1.63x and 2.32x compared to that of EXT4 in the case of workloads A and B, respectively, since it induces write amplification by copying the data using the CoW mechanism. Meanwhile, the performance of EXT4 with BR-SSD is the same as that of EXT4 because BR-SSD does not perform any operations for backup and recovery during normal operation.

6.3 Backup and recovery performance in the replication environment
Backup scenario: We use a simple backup/recovery scenario6 for evaluating the create and restore operations, as shown in Figure 9. As shown in Figure 9a, we create a backup of the initial database state before starting the benchmark at B0. The measured time for creating the backup is T0. Then, we run the benchmark for 600 seconds; while the benchmark is running, we create two backups at B1 (200 seconds) and B2 (400 seconds), respectively. The measured time for each backup creation is T1 and T2, respectively. Then, we restore the backups (B0/1/2) at each restore point (R0/1/2), as shown in Figure 9b. We measure the restore time (T3, T4, and T5) of each restoration.

6. In our scenario, we empirically choose the total running time and the backup period while referencing the number of chained incremental backups used in Facebook [23].

Fig. 9: A simple backup scenario for evaluating backup and recovery operations (The backup number (e.g., backup0 (B0)) denotes the point where creating a backup is initiated. The restore number (e.g., restore0 (R0)) denotes the point where restoring a backup is initiated. The time with a digit (e.g., T0) denotes the time taken by each operation.)

MB and XB create a full backup at B0 and incremental backups at B1 and B2. For recovery, XB restores the full backup at R0 and the incremental backups at R1 and R2, respectively. In BS, LS, and the proposed scheme, the backups are created at B0/1/2 and restored at R0/1/2, respectively. For the same condition of evaluation, we use a modified CREATE_BACKUP query in the case of BS, LS, and XB. The modified query is identical to the CREATE_BACKUP query except that it replaces our create system call with the create commands in BS, LS, and XB. This query creates the identical backup in all nodes. Thus, we perform the restore operations in each node without transferring the backup data to other nodes. Using this configuration with the simple backup scenario, we empirically analyze the TPS and the backup/recovery performance in two different workloads, as shown in Table 4. We report the TPS measured at the master node as shown in the table. In each slave node that runs read-only transactions, the TPS is approximately 3400 in all schemes. We perform the create operations at the master node and the restore operations at each node. We report the create time measured at the master node and the longest restore time among all nodes. In the restore operation, we assume that all backups to be restored already exist in each node in all cases for the same condition of evaluation. We do not report the single-device results for the backup and recovery operations since the TPS and the backup/restore time in a single device are similar to those in the replication.

Workload A: As shown in Table 4a, T0 of XB is 1.68x longer than that of MB. The reason is that the data copy operations (read/write) in the full backup operation in XB take longer than the SELECT query operations in mysqldump. BS and LS show fast snapshot creation. T0 of BS and LS is much lower than that of MB and XB due to snapshot creation based on the CoW mechanism. T0 of the proposed scheme is the shortest since the create operation is performed by storing only the metadata of the SSD asynchronously. T1 and T2 of MB are the shortest since the binary log only flushes the cached queries to the binlog file. T1 and T2 of XB take the longest compared to those of the others since XB makes the delta files during the incremental backup process by comparing the full backup data and the data of the current database. Also, while making the delta file, the changed data is written to the xtrabackup log file.

In BS and LS, T1 and T2 include the time taken by the CREATE_BACKUP query performing the snapshot creation by BTRFS and LVM; the time taken by the snapshot commands is 442 ms and 369 ms in the case of BS and LS, respectively.
Methods Description File system SSD
mysqldump + binlog (MB) mysqldump for full backup with binlog for incremental backup EXT4 unmodified SSD
xtrabackup (XB) xtrabackup for full backup, and incremental backup using CREATE_BACKUP query EXT4 unmodified SSD
BTRFS snapshot (BS) BTRFS snapshot using CREATE_BACKUP query BTRFS unmodified SSD
LVM snapshot (LS) LVM snapshot using CREATE_BACKUP query EXT4 (over LVM) unmodified SSD
proposed scheme SSD-assisted backup/recovery scheme EXT4 modified SSD (BR-SSD)
TABLE 3: Experimental configurations with varying methods
Methods TPS (Master) T0 (B0) T1 (B1) T2 (B2) T3 (R0) T4 (R1) T5 (R2)
mysqldump + binlog (MB) 1684.37 62 s 0.13 s 0.17 s 911.82 s 610 s 638.26 s
xtrabackup (XB) 1425.28 104.54 s 42.5 s 42.5 s 45.5 s 18.4 s 16.9 s
BTRFS snapshot (BS) 782.12 30 ms 4.82 s 5.09 s 438 ms 331 ms 372 ms
LVM snapshot (LS) 784.41 443 ms 2.7 s 4.1 s 12 ms 11 ms 16 ms
proposed scheme 1670.32 134 us 2.2 s 2.1 s 63.6 ms 63.4 ms 63.2 ms
(a) Workload A
Methods TPS (Master) T0 (B0) T1 (B1) T2 (B2) T3 (R0) T4 (R1) T5 (R2)
mysqldump + binlog (MB) 1425.09 186.78 s 0.12 s 0.19 s 4504 s 584.95 s 606.42 s
xtrabackup (XB) 1278.84 298.1 s 162.04 s 139 s 138.5 s 54.3 s 27.8 s
BTRFS snapshot (BS) 608.72 166.9 ms 4.3 s 5.3 s 447 ms 428 ms 401 ms
LVM snapshot (LS) 483.93 1s 3.7 s 4.2 s 112 ms 101 ms 114 ms
proposed scheme 1411.38 136 us 2.2 s 2.3 s 63.5 ms 63.1 ms 63.3 ms
(b) Workload B
TABLE 4: Experimental results for various create and restore operations while running OLTP workloads
mands is 442 ms and 369 ms in the case of BS and LS, respectively. However, the TPS of BS and LS is lower than that of the other schemes due to redundant writes. In the proposed scheme, T1 and T2 take 2.2 seconds and 2.1 seconds, respectively. This result shows the time taken by the CREATE_BACKUP query; the BR-system call for creating the backups takes only 131 us and 120 us, respectively. Also, the TPS of the proposed scheme is about 0.8% lower than the highest TPS among the schemes (i.e., MB). Although the create operations running in the background can affect the run-time performance, the impact is minimal.

Regarding the restore operation, T3, T4, and T5 in the case of MB are the longest among all the schemes. This result shows that re-executing the backed-up queries is time-consuming. If the desired point is B2, the total time for restoring all backups (T3+T4+T5) is 2160.08 seconds. During this time period, the replicated database system discontinues its service, underutilizing all nodes. While XB largely reduces the restore time compared to MB in all cases, its restore times (T3/T4/T5) are longer than those of BS, LS, and the proposed scheme. In BS and LS, T3, T4, and T5 are less than a second, which shows that their restore times are much shorter than those of MB and XB. The CoW mechanism of BS and LS enables fast restoration as well as fast creation. The restore time of LS is the shortest among all schemes in this workload since it simply replaces volumes for restoration.

Workload B: As shown in Figure 4b, the TPS in all schemes decreases as the database size increases. In MB, T1 and T2 are similar to those in the case of workload A. However, T0 and T3, taken by the create and restore operations for the full backup, are increased by 3x and 4.93x, respectively, compared to those in the case of workload A. T4 and T5 are reduced since the number of queries executed during the backup intervals is reduced due to the lower TPS. To restore the backups created at B0/1/2, MB takes 5695.37 seconds (T3+T4+T5). In XB, the times for all the create and restore operations (T0-T5) are increased compared to those in the case of workload A. As the database size increases, the times for making the full backup and the delta file increase; as the time for making the delta file increases, the changed data to be stored in the log file of XB increases as well. In the case of BS and LS, the times for the create and restore operations are still short. However, the TPS of BS and LS is 2.3x and 2.94x lower than the highest TPS among the schemes (i.e., MB), respectively. In the proposed scheme, the TPS is 0.9% lower than the highest TPS among the schemes. The times for the create and restore operations are almost the same as those in workload A because the proposed scheme performs the backup and recovery operations in the same way regardless of the data size under the full backup strategy.

6.4 Run-time and backup/recovery performance in the RAID environment

This section shows the run-time and backup/recovery performance of the existing and proposed schemes in the RAID-5 configuration. Figure 10 shows the impact of BTRFS, LVM, and BR-SSDs on the run-time performance during normal operations on RAID-5. Overall, similar to the result on a single device, BTRFS and LVM negatively affect the run-time performance, especially as the workload size increases, because they can incur a large overhead to support the copy-on-write mechanism under random writes even when RAID 5 is used [44]–[46]. In workload B, the performance of BTRFS and of EXT4 with LVM is decreased by 1.95x and 1.54x compared to that of EXT4. Meanwhile, the performance of EXT4 with BR-SSDs is the same as that of EXT4, which shows that multiple BR-SSDs on RAID-5 do not affect the performance during normal operation. To measure the backup and recovery performance, we use the same backup scenario and experimental configurations as those described in Section 6.3. Table 5 shows the TPS and the backup/recovery performance for the two workloads. The create and restore times are measured when all operations are completed in all BR-SSDs.
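To make "completed in all BR-SSDs" concrete, the sketch below issues a backup-create request to every member device of the RAID-5 set and returns only after each device has acknowledged it. This is a minimal illustration under our own assumptions: the device list, the BRSSD_IOC_CREATE_BACKUP ioctl number, and struct brssd_backup_req are hypothetical stand-ins for the BR-SSD host interface, which is not specified at this level of detail in the article.

```c
/* Minimal sketch: trigger a backup create on every BR-SSD in a RAID-5 set.
 * struct brssd_backup_req and BRSSD_IOC_CREATE_BACKUP are hypothetical;
 * the real create command is delivered to BR-SSD over SATA by the modified
 * OS layer. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

struct brssd_backup_req {
    unsigned int backup_id;   /* identifier of the new backup (B0, B1, ...) */
};

#define BRSSD_IOC_CREATE_BACKUP _IOW('B', 1, struct brssd_backup_req) /* hypothetical */

/* Returns 0 only after every member device has created the backup. */
int create_backup_on_array(const char *const devs[], int ndevs, unsigned int backup_id)
{
    struct brssd_backup_req req = { .backup_id = backup_id };

    for (int i = 0; i < ndevs; i++) {
        int fd = open(devs[i], O_RDWR);
        if (fd < 0) {
            perror(devs[i]);
            return -1;
        }
        /* The ioctl returns once this device has persisted the backup
         * metadata (its FTL mapping snapshot). */
        if (ioctl(fd, BRSSD_IOC_CREATE_BACKUP, &req) < 0) {
            perror("BRSSD_IOC_CREATE_BACKUP");
            close(fd);
            return -1;
        }
        close(fd);
    }
    return 0;
}
```

Since the create command only records metadata inside each device, issuing the requests sequentially (for example, over a hypothetical device list such as /dev/sdb, /dev/sdc, /dev/sdd) is inexpensive; sending them concurrently, one thread per device, would be a straightforward refinement.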
Fig. 10: Impact of various file systems and configurations (BTRFS, EXT4 with LVM, EXT4, and EXT4 with BR-SSDs) on run-time performance on RAID-5 for Workload A and Workload B.

Fig. 11: Average latency (us) for supporting online/remote backup, broken down into T0, T1, and T2 at the sender and receiver sites.

Workload A: As shown in Table 5a, T0 of XB is 6.09x longer than that of MB. Similar to the replication configuration, T0 of BS and LS is shorter than that of MB and XB, but the TPS of BS and LS is lower than that of the other schemes. T0 of the proposed scheme is the shortest. T1 and T2 of MB are the shortest, while those of XB are the longest. T1 and T2 of BS and LS take approximately 20 seconds, which is longer than those of MB and shorter than those of XB. The proposed scheme reduces T1 and T2 compared to BS and LS while its TPS is almost similar to that of MB. In the case of the restore operations, T3, T4, and T5 of MB are the longest among all the schemes, similar to the replication configuration. The restore times of XB are much shorter than those of MB. BS and LS largely reduce the restore times compared to MB and XB. The proposed scheme still provides fast and predictable restore times, which are similar to those in the other configurations.

Workload B: Table 5b shows the experimental results in the case of workload B. Due to the increase in database size, the TPS in all schemes is decreased. In particular, the TPS of XB, BS, and LS is reduced by 97.9%, 64.8%, and 40.9%, respectively. Compared to workload A, the time for the create and restore operations of MB is increased; in particular, T0 and T3 are increased by 2.88x and 2.87x, respectively. Similarly, in XB, the times for all the create and restore operations are increased compared to those in the case of workload A; in particular, T0-T3 of XB are increased by 2.84x, 2.45x, 2.39x, and 3.15x, respectively. BS and LS reduce the TPS while they still provide faster create times than XB and faster restore times than both MB and XB. In the proposed scheme, the create time is longer than that in the case of workload A, while the restore times are almost the same as those in the case of workload A since the restore time is not affected by the database size.

6.5 Measuring the free space in BR-SSD

We measure the free space inside the SSD while running OLTP workloads, as shown in Table 6. The table shows the free space at the initial state without any data and the free space measured after a series of backups. We obtain the results via the BACKUP_INFO query, and we execute the same number of transactions between backups. The free space is reduced as the number of backups and the database size increase. Our scheme uses space efficiently compared with full physical backups even though it performs each backup operation using the full backup strategy. When the database size is 27 GiB and the number of backups is six, the free space is 55.3 GiB. This is because BR-SSD only stores the FTL mapping information without copying the data. Consequently, the free space of our scheme remains reasonable, and we allow database administrators and users to manage backups easily via the BACKUP_INFO query.

6.6 Supporting online and remote backup

This section explains and evaluates online and remote backup. When a device failure occurs, the backed-up data inside the SSD cannot be recovered. To supplement this drawback, we devise a procedure for transferring the backup data to remote storage over a TCP connection. This procedure reads all backed-up pages associated with a backup ID when a user requests a remote backup with that backup ID. The procedure is performed as follows (a host-side sketch is given at the end of this subsection): (1) The host issues a load command to BR-SSD, and BR-SSD loads the FTL mapping table into a memory space inside the SSD. (2) The host issues a read command to BR-SSD, and BR-SSD transfers the backed-up page associated with the backup ID to the host by referencing the loaded FTL mapping table. The host then receives the page and sends it to the remote storage system. This step is repeated until all the pages are transferred. (3) After completing the transfer, the host issues a release command to BR-SSD, and BR-SSD releases the loaded FTL mapping table from the memory space.

The total transfer time, which includes the whole procedure, is 28.2 and 67.5 seconds for workload A and B, respectively. To show the time breakdown, Figure 11 depicts the time to transfer a page at the sender and receiver sites. At the sender site, T0 denotes the time in which the sender issues the load command and reads a page, T1 denotes the time in which the sender transfers the page to the receiver and receives the response, and T2 denotes the time in which the receiver writes and flushes the page to the remote SSD and the release command is issued and completed. At the receiver site, T0 denotes the time in which the receiver connects with the sender and receives the page, T1 denotes the time in which the receiver writes the page, and T2 denotes the time in which the receiver flushes the page to the remote SSD. As shown in the result, the TCP connection and flush times occupy the majority of the time for the remote backup. Consequently, this procedure supports remote backup so that users can preserve their desired backups against a device failure.
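The following host-side sketch makes steps (1)-(3) concrete. It is a simplified illustration under our own assumptions: brssd_load(), brssd_read_page(), and brssd_release() are hypothetical wrappers for the load, read, and release commands described above, the 4 KiB page size is assumed, and error and partial-send handling are deliberately minimal.

```c
/* Host-side sketch of the online/remote backup procedure (steps 1-3).
 * brssd_load(), brssd_read_page(), and brssd_release() are hypothetical
 * wrappers around the BR-SSD load/read/release commands in the text. */
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

#define FLASH_PAGE_SIZE 4096  /* assumed flash page size */

int     brssd_load(int dev_fd, uint32_t backup_id);          /* (1) load mapping table */
ssize_t brssd_read_page(int dev_fd, uint32_t backup_id,
                        uint64_t page_idx, void *buf);       /* (2) read one backed-up page */
int     brssd_release(int dev_fd, uint32_t backup_id);       /* (3) release mapping table */

int remote_backup(int dev_fd, int sock_fd, uint32_t backup_id, uint64_t npages)
{
    uint8_t page[FLASH_PAGE_SIZE];

    /* (1) BR-SSD loads the backed-up FTL mapping table into SSD memory. */
    if (brssd_load(dev_fd, backup_id) < 0)
        return -1;

    /* (2) Read each backed-up page and forward it over the TCP socket. */
    for (uint64_t i = 0; i < npages; i++) {
        if (brssd_read_page(dev_fd, backup_id, i, page) != FLASH_PAGE_SIZE ||
            send(sock_fd, page, FLASH_PAGE_SIZE, 0) != FLASH_PAGE_SIZE) {
            brssd_release(dev_fd, backup_id);
            return -1;
        }
    }

    /* (3) Release the loaded mapping table inside the SSD. */
    return brssd_release(dev_fd, backup_id);
}
```

The receiver side simply accepts the TCP connection, writes each incoming page, and flushes it to its own SSD; as Figure 11 shows, the connection and flush costs dominate the transfer time.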
7 Related Work

OS-based backup and recovery: Peterson et al. [47] suggest file versioning and snapshot using the CoW strategy, ext3cow, on top of the existing EXT3 file system. Peabody [48] is a network block storage device that can recover any previous state of sectors. Peabody provides undo history and easy-to-manage virtualized storage for any file system or raw block device applications. TRAP [49] presents a new disk array architecture that provides timely recovery to any point-in-time by leveraging exclusive-OR operations. This paper is in line with such schemes [47]–[49] in terms of providing the backup/recovery functionality with underlying systems. In contrast, we exploit the characteristics of flash-based SSDs for database systems.

SSD technologies for database systems: Do et al. [50] explore the opportunities and challenges of a Smart SSD, which executes user programs by exploiting the computational power and DRAM storage of the SSD. Willow [51]
Methods TPS T0 (B0) T1 (B1) T2 (B2) T3 (R0) T4 (R1) T5 (R2)
mysqldump + binlog (MB) 1308.1 69.3 s 95.12 ms 97.21 ms 618.32 s 264.21 s 267.3 s
xtrabackup (XB) 1031.49 422.42 s 75.02 s 75.49 s 32.98 s 6.61 s 7.35 s
BTRFS snapshot (BS) 1057.87 17.71 ms 18.29 s 18.52 s 11.16 ms 10.65 ms 10.06 ms
LVM snapshot (LS) 749.85 81.43 ms 19.82 s 20.08 s 28 ms 29.72 ms 29.67 ms
proposed scheme 1307.67 1.82 ms 10.37 s 10.48 s 64.14 ms 64.19 ms 64.14 ms
(a) Workload A
Methods TPS T0 (B0) T1 (B1) T2 (B2) T3 (R0) T4 (R1) T5 (R2)
mysqldump + binlog (MB) 1094.53 200.85 s 123.87 ms 127.87 ms 1776.52 s 276.91 s 311.09 s
xtrabackup (XB) 521.23 1198.47 s 183.43 s 180.48 s 103.98 s 8.72 s 6.52 s
BTRFS snapshot (BS) 641.75 29.58 ms 23.52 s 23.66 s 34.16 ms 20.08 ms 17.78 ms
LVM snapshot (LS) 532.24 71.86 ms 21.39 s 24.81 s 31.76 ms 31.66 ms 34.99 ms
proposed scheme 1053 1.63 ms 13.6 s 13.58 s 64.14 ms 64.19 ms 64.17 ms
(b) Workload B
TABLE 5: Experimental results for various create and restore operations with RAID-5 while running OLTP workloads
the data structure for managing the blocks can be added into the data structure in other enterprise SSDs. Thus, an enterprise SSD can preserve or release the pages by increasing or decreasing the reference count. In addition, our scheme can be combined with and implemented in recent SSDs with a host-level FTL, such as open-channel SSDs. For example, to implement our scheme, we can use the host-based FTL mapping table and garbage collection scheme in the OS layer. In addition, we can add more diverse functionalities by using host resources such as the host CPU and main memory. However, one issue that we should handle carefully is storing the metadata for the backup (e.g., the FTL mapping table and the backup table) in the flash memory atomically and persistently. For example, a create command for a backup should be performed atomically even if a sudden power failure occurs. Note that we take advantage of the supercapacitor inside the SSD to perform backup operations atomically. To handle this issue in a host-based FTL, we can store the metadata to the flash memory transactionally by using a variant of write-ahead logging. When the create operation is performed in the host-based FTL, the FTL writes the FTL mapping table and the backup table to a new location in the flash memory and then writes a commit block (i.e., a commit mark) after all the tables are transferred from the host to the SSD. If a sudden power-off occurs, we check whether the commit block exists after restarting the system. If the block exists, the tables are valid. Otherwise, the tables are not valid and we discard them.
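A minimal sketch of this commit-mark protocol for a host-based FTL follows. The on-flash layout and the flash_append()/flash_read_last_checkpoint() helpers are assumptions made for illustration; the essential point is the ordering: write both tables first, append the commit block last, and accept the tables after a crash only if the commit block is present.

```c
/* Sketch: persisting backup metadata (FTL mapping table + backup table)
 * atomically with a trailing commit block, as in a write-ahead-logging
 * variant.  flash_append() and flash_read_last_checkpoint() are
 * hypothetical helpers of a host-based FTL. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define COMMIT_MAGIC 0xB0C0FFEEu   /* arbitrary marker for the commit block */

struct commit_block {
    uint32_t magic;      /* COMMIT_MAGIC only if the preceding tables are complete */
    uint32_t backup_id;
    uint64_t map_bytes;  /* length of the FTL mapping table just written */
    uint64_t bak_bytes;  /* length of the backup table just written */
};

int flash_append(const void *buf, size_t len);            /* out-of-place append */
int flash_read_last_checkpoint(struct commit_block *cb);  /* scan for newest commit */

/* Create operation: tables first, commit mark last. */
int persist_backup_metadata(uint32_t backup_id,
                            const void *map_tbl, uint64_t map_bytes,
                            const void *bak_tbl, uint64_t bak_bytes)
{
    if (flash_append(map_tbl, map_bytes) < 0 ||
        flash_append(bak_tbl, bak_bytes) < 0)
        return -1;

    struct commit_block cb = {
        .magic = COMMIT_MAGIC, .backup_id = backup_id,
        .map_bytes = map_bytes, .bak_bytes = bak_bytes,
    };
    /* Only after this append succeeds is the backup considered created. */
    return flash_append(&cb, sizeof cb);
}

/* Recovery after a sudden power-off: the tables written by the last create
 * are valid only if their commit block was written; otherwise discard them. */
bool last_backup_metadata_is_valid(void)
{
    struct commit_block cb;
    return flash_read_last_checkpoint(&cb) == 0 && cb.magic == COMMIT_MAGIC;
}
```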
Garbage Collection (GC): When backups and valid data occupy a large portion of the SSD, the existing GC operations are performed. However, in our scheme, blocks that include pages associated with the backups cannot be garbage-collected, which can create a potential GC problem; we plan to handle this problem in future work. To solve this problem, we can maintain additional information for backed-up pages in a flash memory block. For example, we can allow the existing GC procedure to run without refcount in our existing scheme. Instead of refcount, we can add information to a page to indicate whether the page is associated with a backup. Thus, when the GC procedure is performed and such pages are selected as victims, we can handle them as valid pages. Then, the PPNs of the pages associated with a backup will change, so the GC procedure records the newly allocated PPNs for the moved pages as the new mapping information for the corresponding backed-up FTL mapping tables. When the restore operation is performed, BR-SSD loads the backed-up FTL mapping table and applies the changed mapping information for the moved pages to the loaded table. This scheme can minimize the overhead of the GC procedure since the actual updates to the backed-up FTL mapping tables can be delayed until the restore operation.
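The sketch below illustrates one way to realize this: each page carries an in_backup flag instead of a reference count, the GC copies flagged victim pages as if they were valid, and it only logs the old-to-new PPN mapping so that the backed-up FTL mapping table can be patched lazily at restore time. The structures and helpers are our own illustration rather than the BR-SSD firmware.

```c
/* Sketch: GC handling for pages that belong to a backup.  Each page carries
 * an in_backup flag instead of a reference count; the GC copies such victim
 * pages as if they were valid and only logs old_ppn -> new_ppn so that the
 * backed-up FTL mapping table can be patched lazily at restore time.
 * All structures and helpers are illustrative, not the BR-SSD firmware. */
#include <stdbool.h>
#include <stdint.h>

struct page_meta {
    uint32_t lpn;        /* logical page number recorded for the page */
    bool     valid;      /* referenced by the current (live) mapping table */
    bool     in_backup;  /* referenced by at least one backup */
};

struct remap_entry { uint32_t old_ppn, new_ppn; };

uint32_t copy_page_to_free_block(uint32_t ppn);               /* returns new PPN */
void     update_live_mapping(uint32_t lpn, uint32_t new_ppn); /* live FTL table */
void     log_backup_remap(struct remap_entry e);              /* kept until restore */

/* Called by the GC for each page of a victim block. */
void gc_handle_page(uint32_t ppn, const struct page_meta *m)
{
    if (!m->valid && !m->in_backup)
        return;                               /* truly dead: reclaim as usual */

    uint32_t new_ppn = copy_page_to_free_block(ppn);

    if (m->valid)                             /* live data: fix the live mapping now */
        update_live_mapping(m->lpn, new_ppn);

    if (m->in_backup)                         /* backed-up data: defer the fix */
        log_backup_remap((struct remap_entry){ .old_ppn = ppn, .new_ppn = new_ppn });
}
```

Deferring the patch keeps the GC critical path cheap; the remap log is replayed once, against the loaded mapping table, when a restore is requested.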
Supporting efficient online backup: To support more efficient online backup, we can devise a new backup technique for hybrid transactional/analytical processing (HTAP) systems. By adopting a previous study, L-store [55], our backup technique can be improved. L-store introduces an update-friendly, lineage-based storage architecture that enables a contention-free update mechanism over a native multi-version, columnar storage model to independently stage table data from a write-optimized columnar layout (i.e., OLTP) into a read-optimized columnar layout (i.e., OLAP). L-store also provides contention-free merging, in which read-only base data is merged with recently committed updates without blocking ongoing or new transactions by relying on the lineage. To support online backup with L-store, we can back up the merged read-only base data, minimizing the blocking of transactions and achieving better space utilization. In addition, this mechanism can decrease the network overhead thanks to the reduced data size. Consequently, with L-store, our backup technique can operate in a more efficient manner.

9 Conclusion

In this paper, we present an SSD-assisted backup/recovery scheme for database systems. We design and implement the backup/recovery functionality using an enterprise-class SATA-based SSD (Samsung SM843Tn) in a realistic and standard fashion. We also provide query-based operations for backup management in database systems. The experimental results demonstrate that our scheme provides fast and predictable performance. In future work, we will exploit BR-SSDs for other database systems, such as PostgreSQL and MongoDB, and evaluate them with different workloads, backup scenarios, and environments.

References
[1] "Purestorage, all-flash cloud," http://www.purestorage.com/products/flash-array-m.html, 2016.
[2] "Ibm flash storage," http://www-03.ibm.com/systems/storage/flash/, 2013.
[3] "Netapp, all-flash fas storage arrays," http://www.netapp.com/us/products/storage-systems/, 2016.
[4] M. Canim et al., "Ssd bufferpool extensions for database systems," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 1435–1446, 2010.
[5] W.-H. Kang et al., "Flash-based extended cache for higher throughput and faster recovery," Proc. VLDB Endow., vol. 5, no. 11, pp. 1615–1626, Jul. 2012.
[6] S.-W. Lee et al., "A case for flash memory ssd in enterprise database applications," in SIGMOD. ACM, 2008, pp. 1075–1086.
[7] S.-W. Lee, B. Moon, and C. Park, "Advances in flash memory ssd technology for enterprise database applications," in SIGMOD. ACM, 2009, pp. 863–870.
[8] G. Oh et al., "Share interface in flash storage for relational and nosql databases," in SIGMOD. New York, NY, USA: ACM, 2016, pp. 343–354.
[9] W.-H. Kang et al., "X-ftl: Transactional ftl for sqlite databases," in SIGMOD. New York, NY, USA: ACM, 2013, pp. 97–108.
[10] X. Ouyang et al., "Beyond block i/o: Rethinking traditional storage primitives," in HPCA. IEEE, 2011, pp. 301–311.
[11] W.-H. Kang et al., "Durable write cache in flash memory ssd for relational and nosql databases," in SIGMOD. New York, NY, USA: ACM, 2014, pp. 529–540.
[12] S. Bhattacharya et al., "Coordinating backup/recovery and data consistency between database and file systems," in SIGMOD. New York, NY, USA: ACM, 2002, pp. 500–511.
[13] D. Lomet, Z. Vagena, and R. Barga, "Recovery from "bad" user transactions," in SIGMOD. New York, NY, USA: ACM, 2006, pp. 337–346.
[14] MySQL Database Server, https://dev.mysql.com/.
[15] PostgreSQL, http://www.postgresql.org.
[16] MongoDB, https://www.mongodb.com.
[17] B. Schwartz, P. Zaitsev, and V. Tkachenko, High performance MySQL: Optimization, backups, and replication. O'Reilly Media, Inc., 2012.
[18] xtrabackup, https://www.percona.com/software/mysql-database/percona-xtrabackup.
[19] C. Qian, Y. Huang, X. Zhao, and T. Nakagawa, "Optimal backup interval for a database system with full and periodic incremental backup," Journal of Computers, vol. 5, no. 4, pp. 557–564, 2010.
[20] G. Amvrosiadis and M. Bhadkamkar, "Identifying trends in enterprise data protection systems," in ATC. Santa Clara, CA: USENIX Association, Jul. 2015, pp. 151–164.
[21] A Quick Start Guide to Backup Technologies, https://mariadb.com/sites/default/files/A Quick Start Guide to Backup Technologies - MariaDB White Paper - 08 26 13 001.pdf, 2008.
[22] mysqldump, http://dev.mysql.com/doc/refman/5.6/en/mysqldump.html.
[23] Hybrid Incremental MySQL Backups, https://www.facebook.com/notes/facebook-engineering/hybrid-incremental-mysql-backups/10150098033318920/.
[24] Y. Tan et al., "Cabdedupe: A causality-based deduplication performance booster for cloud backup services," in IPDPS. IEEE, 2011, pp. 1266–1277.
[25] O. Rodeh et al., "Btrfs: The linux b-tree filesystem," ACM Transactions on Storage (TOS), vol. 9, no. 3, p. 9, 2013.
[26] J. Bonwick and B. Moore, "Zfs: The last word in file systems," 2007.
[27] M. Hasenstein, "The logical volume manager (lvm)," White paper, 2001.
[28] "Mysql point-in-time backups," http://www.databasejournal.com/features/mysql/article.php/3915236/MySQL-Point-in-Time-Backups.htm, 2010.
[29] P. Huang et al., "Bvssd: Build built-in versioning flash-based solid state drives," in SYSTOR. New York, NY, USA: ACM, 2012, pp. 11:1–11:12.
[30] Y. Son, J. Choi, J. Jeon, C. Min, S. Kim, H. Y. Yeom, and H. Han, "Ssd-assisted backup and recovery for database systems," in Data Engineering (ICDE), 2017 IEEE 33rd International Conference on. IEEE, 2017, pp. 285–296.
[31] S. Subramanian et al., "Snapshots in a flash with iosnap," in EuroSys. New York, NY, USA: ACM, 2014, pp. 23:1–23:14.
[32] K. Sun et al., "Ltftl: Lightweight time-shift flash translation layer for flash memory based embedded storage," in EMSOFT. New York, NY, USA: ACM, 2008, pp. 51–58.
[33] SAMSUNG 843Tn Data Center Series, http://www.samsung.com/semiconductor/global/file/insight/2015/08/PSG2014 2H FINAL-1.pdf.
[34] MySQL Replication, http://dev.mysql.com/doc/refman/5.6/en/replication.html.
[35] MySQL Replication for High Availability - Tutorial, http://severalnines.com/tutorials/mysql-replication-high-availability-tutorial.
[36] Codership Oy, http://galeracluster.com/products/.
[37] Binary log, http://dev.mysql.com/doc/refman/5.6/en/binary-log.html.
[38] D. A. Patterson, G. Gibson, and R. H. Katz, "A case for redundant arrays of inexpensive disks (raid)," in SIGMOD. New York, NY, USA: ACM, 1988, pp. 109–116.
[39] G. Powell, Beginning database design. John Wiley & Sons, 2006.
[40] J. L. Gonzalez and T. Cortes, "Increasing the capacity of raid5 by online gradual assimilation," in Proceedings of the International Workshop on Storage Network Architecture and Parallel I/O, 2004, p. 17.
[41] T. Pott, http://www.theregister.co.uk/2014/09/24/storage supercapacitors/, Sep 2014.
[42] B. Lampson and D. B. Lomet, "A new presumed commit optimization for two phase commit," in VLDB, vol. 93, 1993, pp. 630–640.
[43] A. Kopytov, "Sysbench: a system performance benchmark," 2004.
[44] J. Kára, "Ext4, btrfs, and the others," in Proceeding of Linux-Kongress and OpenSolaris Developer Conference, 2009, pp. 99–111.
[45] M. Xie and L. Zefan, "Performance improvement of btrfs," LinuxCon Japan, 2011.
[46] BTRFS/EXT4 RAID Tests, https://www.phoronix.com/scan.php?page=article&item=4ssd-linux415-raid&num=2.
[47] Z. Peterson and R. Burns, "Ext3cow: a time-shifting file system for regulatory compliance," ACM Transactions on Storage (TOS), vol. 1, no. 2, pp. 190–212, 2005.
[48] C. B. Morrey and D. Grunwald, "Peabody: the time travelling disk," in MSST, April 2003, pp. 241–253.
[49] Q. Yang et al., "Trap-array: A disk array architecture providing timely recovery to any point-in-time," in ISCA. Washington, DC, USA: IEEE Computer Society, 2006, pp. 289–301.
[50] J. Do et al., "Query processing on smart ssds: opportunities and challenges," in SIGMOD. ACM, 2013, pp. 1221–1230.
[51] S. Seshadri et al., "Willow: A user-programmable ssd," in OSDI. Broomfield, CO: USENIX Association, 2014, pp. 67–80.
[52] V. Prabhakaran et al., "Transactional flash," in OSDI. Berkeley, CA, USA: USENIX Association, 2008, pp. 147–160.
[53] S. Hardock, I. Petrov, R. Gottstein, and A. Buchmann, "From in-place updates to in-place appends: Revisiting out-of-place updates on flash," in Proceedings of the 2017 ACM International Conference on Management of Data. ACM, 2017, pp. 1571–1586.
[54] E. Lee, J. E. Jang, T. Kim, and H. Bahn, "On-demand snapshot: An efficient versioning file system for phase-change memory," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 12, pp. 2841–2853, Dec 2013.
[55] M. Sadoghi, S. Bhattacherjee, B. Bhattacharjee, and M. Canim, "L-store: A real-time oltp and olap system," arXiv preprint arXiv:1601.04084, 2016.

Yongseok Son received his B.S. degree in Information and Computer Engineering from Ajou University in 2010, and his M.S. and Ph.D. degrees in the Department of Intelligent Convergence Systems and the Department of Electrical Engineering and Computer Science at Seoul National University in 2012 and 2018, respectively. He was a postdoctoral research associate in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign. Currently, he is an assistant professor in the School of Software, Chung-Ang University. His research interests are operating, distributed, and database systems.

Moonsub Kim received his B.S. degree in Computer Science from the University of Seoul in 2016. Currently, he is an M.S. student in Computer Science and Engineering at Seoul National University. His research interests are operating systems, distributed systems, and database systems.

Sunggon Kim received his B.S. degree in Computer Science from the University of Wisconsin, Madison in 2015. Currently, he is an M.S. student in Computer Science and Engineering at Seoul National University. His research interests are operating systems, distributed systems, and database systems.

Heon Young Yeom received his B.S. degree in Computer Science from Seoul National University in 1984 and his M.S. and Ph.D. degrees in Computer Science from Texas A&M University in 1986 and 1992, respectively. Currently, he is a professor with the Department of Computer Science and Engineering, Seoul National University. His research interests are database systems and distributed systems.

Nam Sung Kim received his B.S. and M.S. degrees in Electrical Engineering from the Korea Advanced Institute of Science and Technology and his Ph.D. degree in Computer Engineering and Science from the University of Michigan, Ann Arbor. Currently, he is an associate professor in the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign. His research interests are computer architecture and systems.

Hyuck Han received his B.S., M.S., and Ph.D. degrees in Computer Science and Engineering from Seoul National University, Seoul, Korea, in 2003, 2006, and 2011, respectively. Currently, he is an assistant professor with the Department of Computer Science, Dongduk Women's University. His research interests are distributed computing systems and algorithms.