
Egyptian Informatics Journal (2015) 16, 329–337

Cairo University
Egyptian Informatics Journal
www.elsevier.com/locate/eij
www.sciencedirect.com

FULL-LENGTH ARTICLE

A hybrid filtering approach for storage optimization in main-memory cloud database

Ghada M. Afify a,*, Ali El Bastawissy b, Osman M. Hegazy a

a Department of Information System, Faculty of Computers and Information, Cairo University, Egypt
b Faculty of Computer Science, MSA University, Cairo, Egypt

Received 24 December 2014; revised 16 June 2015; accepted 30 June 2015
Available online 21 August 2015

KEYWORDS: Cloud computing; Cloud storage; Main-memory database; Hot/cold data; Cold data management

Abstract: Enterprises and cloud service providers face a dramatic increase in the amount of data stored in private and public clouds. Data storage costs are growing rapidly because a single high-performance storage tier is used for storing all cloud data. There is considerable potential to reduce cloud costs by classifying data into active (hot) and inactive (cold). In main-memory database research, recent works focus on approaches to identify hot/cold data; most of these approaches track tuple accesses to identify hot/cold tuples. In contrast, we introduce a novel Hybrid Filtering Approach (HFA) that tracks both tuple and column accesses in main-memory databases. Our objective is to enhance performance in terms of three dimensions: storage space, query elapsed time and CPU time. To validate the effectiveness of our approach, we realized a concrete implementation on Hekaton, SQL Server's memory-optimized engine, using the well-known TPC-H benchmark. Experimental results show that the proposed HFA outperforms the Hekaton approach in all performance dimensions. Specifically, HFA reduces the storage space by an average of 44–96%, the query elapsed time by an average of 25–93% and the CPU time by an average of 31–97% compared to the traditional database approach.

© 2015 Production and hosting by Elsevier B.V. on behalf of Faculty of Computers and Information, Cairo University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

* Corresponding author.
Peer review under responsibility of Faculty of Computers and Information, Cairo University.
http://dx.doi.org/10.1016/j.eij.2015.06.007

1. Introduction

A main-memory database (MMDB) is a database management system that primarily relies on main-memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism. Main-memory databases are faster than disk-based databases since the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek
time when querying the data, which provides faster and more predictable performance than disk [1].

Recent evolution in main-memory sizes has prompted huge increases in the prevalence of database systems that keep the entire database in memory. Nonetheless, main-memory is still a scarce resource and expensive compared to disk [2]. A major goal of recent research works is to improve main-memory storage optimization: the more memory that is freed, the larger the databases that can be kept in memory, which improves both performance and cost efficiency. The objective is to separate the data into active (hot) and inactive (cold) data. The hot data remains in main-memory and the cold data is moved to a cheaper cold store [3]. The main difference among the existing techniques is the level of granularity at which the data is accessed and classified as hot or cold, which in some databases is at the tuple-level and in others at the page-level.

In the same context, cloud storage becomes more expensive because charges for "GB transferred" over the network vary with the amount of data transferred each month, conceivably with large and unpredictable variations. Moreover, extra hidden fees, such as connection fees, maintenance charges, and data access charges, can add up quickly [4]. Therefore, the concept of multi-temperature cloud storage (hot, cold) was developed to improve the economics of storing enormous amounts of data. Frequently accessed (hot) data is available on fast, high-performance storage, while inactive (cold) data is archived onto lower-cost storage [5].

To the best of the authors' knowledge, this is the first initiative to propose a Hybrid Filtering Approach (HFA) that horizontally filters the database by hot tuples and then vertically filters it by hot attributes, in the context of storage optimization (reducing storage space) in main-memory cloud databases. Moreover, we prove its efficiency compared to the traditional approach using a standard benchmark.

The contributions of this paper can be summarized as follows:

1. A comprehensive analysis of existing main-memory databases that focus on hot/cold data management.
2. An introduction of the proposed approach, explained through a detailed case study.
3. An evaluation of the effectiveness of the proposed approach using a standard benchmark.

The remainder of this paper is organized as follows. Section 2 surveys the recent related work. Section 3 introduces the proposed hybrid filtering approach. Section 4 presents a detailed case study to illustrate the workflow of the proposed approach. Section 5 reports the experimental evaluation of the proposed approach. Finally, Section 6 concludes the paper.

2. Related work

Recent development in hardware has led to rapidly dropping market prices of main-memory in the past years. This development made it economically feasible to use main-memory as the primary data store of a DBMS, which is the main characteristic of a main-memory DBMS. Recent research works focus on main-memory DBMS storage.

Commercial systems include Oracle's TimesTen [6], IBM's solidDB [7], and VoltDB [8]. On the research side there are HYRISE [9], H-Store [10], HyPer [11] and MonetDB [12]. These systems are suitable for databases that are smaller than the amount of physically available memory; if memory is exceeded, performance problems follow. This capacity limitation of main-memory DBMSs has been addressed by a number of recent works.

SAP HANA [13] is a columnar in-memory DBMS suitable for both OLTP and Online Analytical Processing (OLAP) workloads. It offers an approach to handle data aging [14]: hot data refers to columns that are loaded into main-memory and can be accessed by the DBMS, while cold data is not loaded into main memory but is stored in the disk-based persistence layer. HANA uses the Least Recently Used (LRU) technique to distinguish between hot and cold data.

Oracle Database 12c In-Memory Option [15] is based on a dual-format data store, suitable for response-time-critical OLTP applications as well as analytical applications for real-time decision-making. The Oracle in-memory column store uses the LRU technique to identify hot/cold data.

HyPer is a main-memory hybrid OLTP and OLAP system [11]. It has a compacting-based approach to handle hot and cold data [16]. In this approach, the authors use the capabilities of modern server systems to track data accesses. The data, stored in a columnar layout, is partitioned horizontally, and each partition is categorized by its access frequency. Data in the (rarely accessed) frozen category is still kept in memory, but compressed and stored in huge pages to better utilize main memory. HyPer performs hot/cold data classification at the Virtual Machine (VM) page level.

In [17], the authors proposed a simple and low-overhead technique that enables a main-memory database to efficiently migrate cold data to secondary storage by relying on the Operating System (OS)'s virtual memory paging mechanism. Hot pages are pinned in memory, while cold pages are moved out by the OS to cold storage.

In [18], the authors implemented hot and cold separation in the main-memory database H-Store. The authors call this approach "Anti-Caching" to underline that hot data is no longer cached in main-memory; instead, cold data is evicted to secondary storage. To trace accesses to tuples, tuples are stored in an LRU chain per table.

A comparable approach is presented in Hekaton [19], SQL Server's memory-optimized OLTP engine, which manages hot and cold tuples. In Hekaton, the primary copy of the database is entirely stored in main-memory. Hot tuples remain in main-memory while cold ones are moved to cold secondary storage [20].

Table 1 summarizes the comparison between hot/cold data management approaches in main-memory databases. We observe that SAP HANA [14] vertically filters the data in a columnar layout, which is a different context than the row layout employed in our HFA approach. Oracle 12c dual-format [15] stores the primary copy of the data on disk and uses the concept of hybrid filtering; however, it applies HF and VF on disk and then moves the hot data into the in-memory columnar store, whereas we apply hybrid filtering to data resident in main-memory. HyPer [16,17] performs cold/hot data classification at the VM page level, which is outside our scope. It is shown in [18] that it is best to make the classification at the same level of granularity at which the data is accessed, which is at the tuple-level. Compared to Anti-Caching [18], which uses the LRU technique to horizontally filter the database, our approach uses the "datetime" key filtering method. Finally, Hekaton [19] is the work closest to our approach, as it uses the same horizontal filtering methodology by hot tuples, namely the application pattern "datetime" key to split the data [21]. Therefore, we chose to build on their work and extend their architecture in order to implement our HFA.

Our novel Hybrid Filtering Approach (HFA) is based on a row-store main-memory database. The primary copy of data is entirely stored in main-memory. First, HFA horizontally filters the data by hot tuples; then, it vertically filters the data by hot columns.

Table 1 Hot/cold data management approaches in main-memory databases.

Main-memory database approach | Main-memory physical layout | Hot/cold data classification by | Horizontal Filtering (HF) by hot tuple | Vertical Filtering (VF) by hot column | Hybrid filtering
SAP HANA [14] | Columnar | Hot columns | NO | YES | NO
Oracle 12c dual-format [15] | Both | Hot tuples & hot columns | YES | YES | YES
HyPer [16] | Both | Hot pages | YES | YES | YES
Stoica et al. [17] | Row | Hot pages | YES | NO | NO
Anti-caching [18] | Row | Hot tuples | YES | NO | NO
Hekaton [19] | Row | Hot tuples | YES | NO | NO
Proposed HFA | Row | Hot tuples & hot columns | YES | YES | YES

3. Proposed hybrid filtering approach

Our proposed approach is composed of two phases, as shown in Fig. 1. In the first phase, the offline analysis, we classify the hot and cold attributes: hot attributes remain in main memory and cold ones are moved to cheaper secondary storage. In the second phase, the online analysis, the system interacts with users: the user enters a query and receives a response. In this paper, we focus on the offline analysis phase; comprehensive details on the online analysis phase and the query profiling process will be addressed in a separate publication.

Figure 1 Proposed hybrid filtering approach.


3.1. Phase 1: offline analysis

The offline phase is composed of three modules. Periodically, we run the offline analysis to define the hot and cold attributes from the log files and to update the hot/cold attributes lists. The period is predefined by the system administrator according to one of two factors: either time (i.e., a number of months) or database workload (i.e., a number of queries).

3.1.1. Horizontal filtering
Similar to recent research work, the primary copy of the database resides in main-memory and is horizontally filtered at tuple-level granularity into hot and cold tuples. The hot tuples remain in main-memory and the cold ones are migrated to cold secondary storage. In HFA, we use a horizontal filtering approach that depends mainly on the application business logic: we use the "datetime" key filtering pattern to split the data into hot/cold tuples [21].

3.1.2. Frequent attributes identification
In this module, we developed a novel technique to identify the hot/cold attributes. We analyze the queries stored in the log files to compute the frequency of occurrence of each attribute. The hot (most frequent) attributes are those that appear at least as often as a pre-specified threshold; they are stored in the hot-attributes list. An attribute is cold if its frequency is below the pre-specified threshold; cold attributes are stored in the cold-attributes list.

3.1.3. Vertical filtering
Vertical filtering of a table T splits it into two or more tables (sub-tables), each of which contains a subset of the attributes in T. Since many queries access only a small subset of the attributes in a table, vertical filtering can reduce the amount of data that needs to be scanned to answer the query. According to the hot-attributes list, the database in main-memory is vertically filtered at attribute-level granularity into hot and cold attributes. The hot attributes remain in main-memory, while the cold ones are migrated to cold secondary storage.

Table 4 Query log file.

Q_ID | Table name | Attributes
101 | Items | Item_ID, Brand, Description, Price
101 | Customers | Name, Phone
102 | Customers | Name, Phone
102 | Items | Item_ID, Brand, Description, Price
102 | Employee | Name, Phone
103 | Items | Item_ID, Brand, Description
103 | Customers | Name
104 | Items | Item_ID, Brand, Price, Cost
104 | Customers | Name
Table 2 Items table in main-memory.

Item_ID | Brand | Description | Price | Cost | Size | UPC | Weight | Taxable
11 | Nabisco | Cookies | 2.25 | 1 | 20 × 2 × 18 | 124,576 | 23.5 | 1
12 | Morries | Cigarettes | 5 | 3 | 7 × 7 × 7 | 235,467 | 78 | 0
13 | Kraft | Cheese | 6 | 4 | 6 × 10 × 2 | 365,421 | 0.11 | 0
14 | Kellog | Cereal | 1 | 0.5 | 9 × 9 × 9 | 875,465 | 15 | 1
15 | Quaker | Oatmeal | 2.5 | 1 | 3 × 3 × 3 | 654,123 | 1.3 | 0
16 | Nabisco | Crackers | 4 | 2 | 4 × 4 × 4 | 412,678 | 2.4 | 0
17 | Brand | Spagetti | 0.99 | 0.5 | 2 × 2 × 2 | 127,896 | 3.4 | 0
18 | Monte | Candy | 0.5 | 0.1 | 2 × 2 × 2 | 345,346 | 6.3 | 1
19 | Hershy | Candy | 3.99 | 2 | 4 × 13 × 5 | 112,367 | 50.2 | 0
20 | Kleenex | Tissues | 2.99 | 1 | 2 × 16 × 3 | 224,643 | 32 | 0

Table 3 (a) Hot tuples in main-memory (b) Cold tuples on disk.

(a)
Item_ID | Brand | Description | Price | Cost | Size | UPC | Weight | Taxable
11 | Nabisco | Cookies | 2.25 | 1 | 20 × 2 × 18 | 124,576 | 23.5 | 1
12 | Morries | Cigarettes | 5 | 3 | 7 × 7 × 7 | 235,467 | 78 | 0
13 | Kraft | Cheese | 6 | 4 | 6 × 10 × 2 | 365,421 | 0.11 | 0
14 | Kellog | Cereal | 1 | 0.5 | 9 × 9 × 9 | 875,465 | 15 | 1
15 | Quaker | Oatmeal | 2.5 | 1 | 3 × 3 × 3 | 654,123 | 1.3 | 0

(b)
16 | Nabisco | Crackers | 4 | 2 | 4 × 4 × 4 | 412,678 | 2.4 | 0
17 | Brand | Spagetti | 0.99 | 0.5 | 2 × 2 × 2 | 127,896 | 3.4 | 0
18 | Monte | Candy | 0.5 | 0.1 | 2 × 2 × 2 | 345,346 | 6.3 | 1
19 | Hershy | Candy | 3.99 | 2 | 4 × 13 × 5 | 112,367 | 50.2 | 0
20 | Kleenex | Tissues | 2.99 | 1 | 2 × 16 × 3 | 224,643 | 32 | 0
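The two filtering steps behind Tables 2, 3 and 6 can be sketched as follows. This is a minimal Python sketch under case-study assumptions: the hot-tuple predicate is a stand-in for the paper's "datetime"-key split [21], and only a subset of the Items rows and columns is carried.

```python
# HFA offline filtering over the Items data: a horizontal filter keeps the
# hot tuples (Table 3a), then a vertical filter projects the hot attributes
# (Table 6). Rows and the hot-attributes list follow the case study.

ITEMS = [
    {"Item_ID": 11, "Brand": "Nabisco", "Description": "Cookies",    "Price": 2.25, "Cost": 1.0},
    {"Item_ID": 12, "Brand": "Morries", "Description": "Cigarettes", "Price": 5.00, "Cost": 3.0},
    {"Item_ID": 15, "Brand": "Quaker",  "Description": "Oatmeal",    "Price": 2.50, "Cost": 1.0},
    {"Item_ID": 16, "Brand": "Nabisco", "Description": "Crackers",   "Price": 4.00, "Cost": 2.0},
    {"Item_ID": 20, "Brand": "Kleenex", "Description": "Tissues",    "Price": 2.99, "Cost": 1.0},
]

HOT_ATTRIBUTES = ["Item_ID", "Brand", "Description", "Price"]

def is_hot_tuple(row):
    # Stand-in for the "datetime"-key split [21]: in the case study,
    # tuples 11-15 are hot (Table 3a) and 16-20 are cold (Table 3b).
    return row["Item_ID"] <= 15

def hybrid_filter(rows, hot_attrs, hot_pred):
    hot_rows = [r for r in rows if hot_pred(r)]                   # horizontal filter
    cold_rows = [r for r in rows if not hot_pred(r)]              # moved to cold storage
    in_memory = [{a: r[a] for a in hot_attrs} for r in hot_rows]  # vertical filter
    return in_memory, cold_rows

in_memory, cold_rows = hybrid_filter(ITEMS, HOT_ATTRIBUTES, is_hot_tuple)
print([r["Item_ID"] for r in in_memory])  # [11, 12, 15]
print(sorted(in_memory[0]))  # ['Brand', 'Description', 'Item_ID', 'Price']
```

Only the hot attributes of the hot tuples remain in the main-memory data set, which is exactly the shape of Table 6.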
3.2. Phase 2: online analysis

The online phase is composed of three modules.

1. Query parsing: This module receives the user query and parses it to identify the requested tables and attributes.
2. Query storage: This module stores the user query into the log files.
3. Query execution: This module executes the query and returns the results to the user. The algorithm of the query execution is demonstrated using pseudo code in Algorithm 1.

Algorithm 1: Query Execution
Input: User Query Q
Output: Query Result R
1. Begin
2.   if all attributes in Q are cold attributes then
3.     Return a view of R from cold storage
4.     Increment attributes counters
5.   else if all attributes in Q are hot attributes then
6.     Return a view of R from hot storage
7.   else
8.     Return a view of R from both hot and cold storage
9.   end if
10.  for each attribute in Q
11.    if counter >= threshold then
12.      Change cold attribute into hot attribute
13.      Update hot and cold attributes lists
14.    else
15.      Change hot attribute into cold attribute
16.      Update hot and cold attributes lists
17.    end if
18.  end for
19.  Return R to User
20. End

4. Case study

In this section, a detailed case study is presented in order to demonstrate the proposed HFA workflow. Table 2 shows the Items table, which consists of 9 attributes.

4.1. Phase 1: offline analysis

1. Horizontal filtering: The Items table is horizontally filtered into hot tuples (which remain in main-memory) and cold tuples (which are moved to disk), as shown in Table 3.
2. Frequent attributes identification: Identify the hot/cold attributes in the database, which involves two main steps.

Step 1: Scan the query log file shown in Table 4 in order to find the most frequent attributes in the Items table.
Step 2: Employ a pre-specified attribute frequency threshold (θ) = 3 on the attribute-frequency table shown in Table 5. Thus, the hot-attributes list = [Item_ID, Brand, Description, Price] and the cold-attributes list = [Cost, Weight, Shape, Taxable, Size, UPC].

Table 5 Attributes-frequency for Items table.

Attribute | Frequency
Item_ID | 4
Brand | 4
Description | 3
Cost | 2
Price | 3
Weight | 0
Shape | 0
Taxable | 0
Size | 0
UPC | 0

3. Vertical filtering: Hot tuples from Table 3(a) are vertically filtered by the hot-attributes list. Consequently, the data set for the Items table in main-memory contains only the hot attributes of the hot tuples, as shown in Table 6.

Table 6 Vertical filtering of hot tuples in main-memory.

Item_ID | Brand | Description | Price
11 | Nabisco | Cookies | 2.25
12 | Morries | Cigarettes | 5
13 | Kraft | Cheese | 6
14 | Kellog | Cereal | 1
15 | Quaker | Oatmeal | 2.5

4.2. Phase 2: online analysis

Receive the user query and parse it to identify the requested tables and attributes.

Query 105:

Select Cost, Weight, Taxable
From Items;

After parsing the query, it is noticed that it fits the first case, as all query attributes are cold attributes (Cost, Weight, and Taxable) (lines 2–4 in the pseudo code). We increment these attributes' counters and then return a view of the query attributes from cold storage (disk) using the following T-SQL code sample.

First, we create a view called V1:

CREATE VIEW V1
AS SELECT Cost, Weight, Taxable
FROM dbo.Cold_Table;
GO

Second, we run the view to verify its contents:

SELECT * FROM V1;
GO

These attributes' frequencies are incremented (Cost = 3, Weight = 1, Taxable = 1). The Cost attribute frequency is now equal to the threshold (lines 10–13), so it is added to the hot-attributes list. Thus, the updated hot-attributes list = [Item_ID, Brand, Description, Price, Cost] and the cold-attributes list = [Weight, Shape, Taxable, Size, UPC]. Finally, the query is stored to the log files.

5. Experimental evaluation of HFA approach

In order to systematically validate the effectiveness of our HFA approach, we have implemented both it and the Hekaton approach. In this section, we present details of the experiment setup, the workload, the experiment scenario, and finally the performance study.
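Before turning to the experiments, the online-phase routing of Algorithm 1 and the promotion step of Section 4.2 can be simulated in a few lines. This is a sketch with illustrative names, not the authors' engine code; the demotion branch of Algorithm 1 is omitted for brevity.

```python
# Online-phase query execution (Algorithm 1): route a query to hot and/or
# cold storage by its attributes, count accesses of all-cold queries, and
# promote a cold attribute to hot once its counter reaches the threshold.

THRESHOLD = 3  # the case study's attribute frequency threshold

class HFACatalog:
    def __init__(self, hot, cold, counters=None):
        self.hot = set(hot)
        self.cold = set(cold)
        self.counters = dict(counters or {})

    def execute(self, query_attrs):
        """Return which store answers the query, then re-classify attributes."""
        attrs = set(query_attrs)
        if attrs <= self.cold:            # all attributes cold (lines 2-4)
            source = "cold"
            for a in attrs:
                self.counters[a] = self.counters.get(a, 0) + 1
        elif attrs <= self.hot:           # all attributes hot (lines 5-6)
            source = "hot"
        else:                             # mixed hot/cold query (lines 7-8)
            source = "hot+cold"
        for a in attrs & self.cold:       # promotion check (lines 10-13)
            if self.counters.get(a, 0) >= THRESHOLD:
                self.cold.discard(a)
                self.hot.add(a)
        return source

# Query 105 from the case study: Cost starts at counter 2 (Table 5) and is
# promoted to the hot-attributes list after this access hits the threshold.
cat = HFACatalog(
    hot=["Item_ID", "Brand", "Description", "Price"],
    cold=["Cost", "Weight", "Shape", "Taxable", "Size", "UPC"],
    counters={"Cost": 2},
)
print(cat.execute(["Cost", "Weight", "Taxable"]))  # cold
print("Cost" in cat.hot)                           # True
```

As in Section 4.2, the all-cold query is answered from cold storage, the counters become Cost = 3, Weight = 1, Taxable = 1, and Cost is moved to the hot-attributes list.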

5.1. Experiment setup

Experiments were run using the following resources:

Hardware platform: Intel® Core™ i7 CPU @ 2.60 GHz with 12 GB of RAM, running 64-bit Windows 8.1.
Software tools: Microsoft SQL Server 2014 Enterprise Edition to build the in-memory database and tables and to run the queries. In addition, we used the client statistics tool to monitor and compare the performance of our queries.

5.2. Workload

For all experiments, we use the well-known TPC-H [22] benchmark, which has been used in reputable research works [11,13,15]. The workload consists of two tables, LINEITEM and ORDERS. We populated the tables with data generated by the official TPC-H toolkit, with scale factor SF = 1. The LINEITEM table has 6,001,215 rows and 16 columns; the ORDERS table has 1,500,000 rows and 9 columns. Fig. 2 shows the tables schema.

Figure 2 Tables schema.

5.3. Experiment scenario

We base all experiments on a variant of the following query:

Select hotcol1, hotcol2, ...
From table
Where hotcol operator x;

The value of x is any valid value according to the data type of hotcol. The objective of the predicate in the where clause is to identify and retrieve the hot rows (e.g., by OrderDate). The select clause vertically filters the table by the hot columns. The HFA workflow can be summarized as follows:

- We create two memory-optimized tables (ORDERS and LINEITEM) and store them entirely in main-memory.
- We horizontally filter both tables by hot rows, using three cases: 25%, 50% and 75% of the original table.
- We vertically filter each horizontally filtered table by hot columns, using four cases of hot columns. For the ORDERS table: 2, 4, 6 and 8 hot columns. For the LINEITEM table: 3, 7, 11 and 15 hot columns.

5.4. Performance study

In this section, we experimentally evaluate the effectiveness of our HFA compared to Hekaton in terms of three performance dimensions: storage space, query elapsed time and CPU time. We used different cases of hot rows (25–75%, in steps of 25%) combined with different cases of hot columns: HFA-2, HFA-4, HFA-6 and HFA-8 for the ORDERS table, and HFA-3, HFA-7, HFA-11 and HFA-15 for the LINEITEM table.

5.4.1. Storage space dimension

In this experiment, we investigate the storage space requirements of the proposed HFA compared to Hekaton in a main-memory database. As shown in Fig. 3, the storage space of all approaches increases with an increasing number of hot rows. Our HFA outperforms Hekaton in all cases of vertical filtering.

It can be noted in Fig. 3(a) that the best storage requirement for Hekaton is worse than the worst storage requirement for the proposed HFA using HFA-2 and HFA-4 hot columns. In Fig. 3(b), the best storage value for Hekaton is worse than the worst storage requirement for the proposed HFA using HFA-3 and HFA-7 hot columns.

As shown in Fig. 4(a), our HFA outperforms Hekaton: the HFA approach has a storage improvement of 44–96% on average and the Hekaton approach of 25–76% compared to the original ORDERS table.
In Fig. 4(b), the HFA approach has a storage improvement of 47–94% on average and the Hekaton approach of 25–75% compared to the original LINEITEM table.

Figure 3 Storage space requirements (a) for ORDERS table (b) for LINEITEM table.
Figure 4 Storage space improvements (a) for ORDERS table (b) for LINEITEM table.

5.4.2. Query elapsed time dimension

In this experiment, we investigate the query elapsed time of the proposed HFA compared to Hekaton in a main-memory database. As shown in Fig. 5, the query elapsed time of all approaches increases with an increasing number of hot rows. Our HFA outperforms Hekaton in all cases of vertical filtering except for HFA-8 hot columns when hot rows are below 50%.

From Fig. 5(a), it can be noted that the best elapsed time value for Hekaton is worse than the best value for our proposed HFA using HFA-2, HFA-4 and HFA-6. In Fig. 5(b), the best elapsed time value for Hekaton is worse than the best value for our proposed HFA using HFA-3, HFA-7 and HFA-11 hot columns.

As shown in Fig. 6(a), our HFA outperforms Hekaton: the HFA approach has an elapsed time improvement of 25–90% on average and the Hekaton approach of 12–74% compared to the original ORDERS table. In Fig. 6(b), the HFA approach has an elapsed time improvement of 45–93% on average and the Hekaton approach of 40–81% compared to the original LINEITEM table.

Figure 5 Query elapsed time (a) for ORDERS table (b) for LINEITEM table.
Figure 6 Elapsed time improvements (a) for ORDERS table (b) for LINEITEM table.

5.4.3. CPU time dimension

In this experiment, we investigate the CPU time of the proposed HFA compared to Hekaton in a main-memory database. As shown in Fig. 7, the CPU time of all approaches increases with an increasing number of hot rows. Our HFA outperforms Hekaton in all cases of vertical filtering except for HFA-15 hot columns when hot rows are below 50%.

From Fig. 7(a), it can be noted that the best CPU time value for Hekaton is worse than the best value for our proposed HFA using HFA-2, HFA-4 and HFA-6. In Fig. 7(b), the best CPU time value for Hekaton is worse than the best value for our proposed HFA using HFA-3, HFA-7 and HFA-11 hot columns.

As shown in Fig. 8(a), our HFA outperforms Hekaton: the HFA approach has a CPU time improvement of 31–97% on average and the Hekaton approach of 12–62% compared to the original ORDERS table. In Fig. 8(b), the HFA approach has a CPU time improvement of 60–96% on average and the Hekaton approach of 41–83% compared to the original LINEITEM table.

Figure 7 CPU time (a) for ORDERS table (b) for LINEITEM table.
Figure 8 CPU time improvements (a) for ORDERS table (b) for LINEITEM table.
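The improvement percentages reported in Figs. 4, 6 and 8 are relative reductions against the original (unfiltered) table. A small helper makes this metric explicit; the formula is our reading of the paper's reporting, and the sample numbers are made-up placeholders, not measurements from the paper.

```python
# Improvement metric assumed for Figs. 4, 6 and 8:
# improvement = (baseline - value) / baseline, with the traditional
# (unfiltered) table as the baseline.

def improvement(baseline, value):
    """Relative reduction of `value` versus `baseline`, as a percentage."""
    return 100.0 * (baseline - value) / baseline

# Hypothetical storage sizes in MB for one configuration
original_mb = 200.0   # traditional (unfiltered) table
hekaton_mb = 100.0    # horizontal filtering only
hfa_mb = 40.0         # horizontal + vertical filtering

print(round(improvement(original_mb, hekaton_mb)))  # 50
print(round(improvement(original_mb, hfa_mb)))      # 80
```

Under this metric, HFA's extra vertical filtering step is what pushes its improvement above Hekaton's for every dimension reported.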
6. Conclusion

Due to the budgetary challenges of storing vast amounts of data in the cloud, identifying hot/cold storage is emerging as a significant trend. To contribute to this research, we investigated the optimization of storage space requirements with the aim of reducing cost in main-memory cloud databases. We conducted a comprehensive analysis of existing main-memory databases that focus on hot/cold data management. We proposed a novel Hybrid Filtering Approach (HFA) that filters the tables in main-memory first horizontally by rows and then vertically by columns, and we demonstrated its workflow through a detailed case study. We evaluated the effectiveness of HFA using the standard TPC-H benchmark. Experimental evaluation showed that the proposed HFA approach is superior to Hekaton in terms of all performance metrics in a main-memory row-store database: HFA reduces the storage space by an average of 44–96%, the query elapsed time by an average of 25–93% and the CPU time by an average of 31–97% compared to the traditional database approach.

References

[1] Gupta M, Verma V, Verma M. In-memory database systems – a paradigm shift. arXiv preprint, vol. 6, no. 6; December 2013.
[2] Arora I, Gupta A. Improving performance of cloud based transactional applications using in-memory data grid. Int J Comput Appl 2014;107(13).
[3] Boissier M. Optimizing main memory utilization of columnar in-memory databases using data eviction. VLDB PhD Workshop; 2014.
[4] Mark P, Buffington J, Keane M. Cloud storage: the next frontier for tape. White paper, Enterprise Strategy Group; April 2013.
[5] Song Y. Storing big data – the rise of the storage cloud. Advanced Micro Devices, Inc. (AMD); December 2012.
[6] Lahiri T, Neimat M, Folkman S. Oracle TimesTen: an in-memory database for enterprise applications. IEEE Data Eng Bull 2013;36(2):6–13.
[7] Lindstroem J, Raatikka V, Ruuth J, Soini P, Vakkila K. IBM solidDB: in-memory database optimized for extreme speed and availability. IEEE Data Eng Bull 2013;36(2):14–20.
[8] Stonebraker M, Weisberg A. The VoltDB main memory DBMS. IEEE Data Eng Bull 2013;36(2):21–7.
[9] Grund M, Kruger J, Plattner H, Zeier A, Cudre-Mauroux P, Madden S. HYRISE – a main memory hybrid storage engine. Proc VLDB Endow 2012;4(2):105–16.
[10] Kallman R, Kimura H, Natkins J, Pavlo A, Rasin A, Zdonik S, et al. H-Store: a high-performance, distributed main memory transaction processing system. Proc VLDB Endow 2008;1(2):1496–9.
[11] Kemper A, Neumann T. HyPer: a hybrid OLTP&OLAP main memory database system based on virtual memory snapshots. In: 27th International Conference on Data Engineering (ICDE), IEEE; 2011. p. 195–206.
[12] Boncz P, Zukowski M, Nes N. MonetDB/X100: hyper-pipelining query execution. CIDR 2005;5:225–37.
[13] Färber F, Cha SK, Primsch J, Bornhövd C, Sigg S, Lehner W. SAP HANA database – data management for modern business applications. ACM SIGMOD Record 2011;40(4):45–51.
[14] Archer S. Data-aging strategies for SAP NetWeaver BW focusing on BW's new NLS offering for Sybase IQ. SAP BW Product Management; 2013.
[15] Colgan M, Kamp J, Lee S. Oracle database in-memory. Oracle White Paper; 2014.
[16] Funke F, Kemper A, Neumann T. Compacting transactional data in hybrid OLTP & OLAP databases. Proc VLDB Endow 2012;5(11):1424–35.
[17] Stoica R, Ailamaki A. Enabling efficient OS paging for main-memory OLTP databases. In: Proceedings of the ninth international workshop on data management on new hardware, ACM; 2013.
[18] DeBrabant J, Pavlo A, Tu S, Stonebraker M, Zdonik S. Anti-caching: a new approach to database management system architecture. Proc VLDB Endow 2013;6(14):1942–53.
[19] Diaconu C, Freedman C, Ismert E, Larson P-A, Mittal P, Stonecipher R, et al. Hekaton: SQL Server's memory-optimized OLTP engine. In: Proceedings of the international conference on management of data, SIGMOD, ACM; 2013. p. 1243–54.
[20] Delaney K. SQL Server in-memory OLTP internals overview for CTP2. SQL Server Technical Article; 2013.
[21] Weiner M, Levin A. In-memory OLTP – common workload patterns and migration considerations. SQL Server Technical Article; 2014.
[22] TPC Benchmark H Standard Specification Revision 2.17.1. Transaction Processing Performance Council; 2014. Available at <http://www.tpc.org/tpch/>.