Arxiv
propose a language that can express a wide range of physical database layouts, going well beyond the row- and column-based methods that are widely used in database management
competitive with a state-of-the-art in-memory compiled database system.
[Figure: system overview diagram. Labels as extracted: Expert User; Chooses; Layout; Equivalent Transforms; Layouts; Best Layout; Query; Query Parameters; Data; Results; Executable Binary.]
[Layout diagram: List and Tuple nodes, each with Count (~2B) and Length (~4B) headers, over the scalar fields lc.id and lc.enter.]
(b) A hash-based layout.
Figure 3. Sample layouts (Relational data is highlighted).

2.4 Hash-index Layout
Now we optimize for lookup performance by fully materializing the join and creating a hash index. This layout will be larger than the nested layout but lookups into the hash index will be quick, which will make evaluating the equality predicates on id fast.

Figure 3b shows the structure of the resulting layout (Figure 10 in the Appendix shows the program). We should be able to fit this layout in memory, but likely not in cache. Since the access pattern through the hash table is random, there could be a lot of cache thrashing as parts of the layout are loaded in and out over successive queries.

2.5 Hash- and Ordered-index Layout
Finally, we investigate a layout which avoids the full join materialization, but still has enough indexing to be fast. We can see that the join condition is a range predicate, so we use an ordered index to make that predicate efficient (Sec. 4.5). Then we can push the filters and introduce a hash table to select idp. The resulting layout is shown in Figure 4 (the program is shown in the appendix in Figure 11). This layout will be larger than the original relation, but not by as much as the other two layouts, and it allows for much faster computation of the join and one of the filters.

2.6 Discussion
Each of these layouts represents a trade-off between the size of the layout and the amount of work that must be done when running the query. At one end of this spectrum is a layout that precomputes almost everything but has to store the results of the computation. At the other end is the original query, which performs a lot of work at runtime to compute the join but only needs to store the original dataset. In between are several choices that represent varying degrees of precomputation. The correct choice will depend on the size of the dataset and the size of memory and cache.

When we evaluated the three implementations of the DemoMatch queries, we found that the nested layout (Sec. 2.3) had the worst performance (11.5ms) and was not the smallest of the three (50Mb). The hash-index based query was the fastest (0.4ms) but produced the largest layout (60Mb). The query which combined the hash and ordered index was slightly slower (0.6ms) but had a significantly smaller layout (9.8Mb). We can conclude that for this dataset it is not worth precomputing the join as long as the right indexes are used.

Although we did not discuss the specific transformations in this section, in Sec. 4 we describe how layout optimization can be performed by applying a sequence of transformations to the original query. The transformations and query can be saved and re-compiled if the data changes. We need to save the transformations instead of saving the final query, because the data could change in ways that invalidate some transformations. Re-running the transformations when the data changes means that we can catch invalid transformations at compile time.

3 Language
In this section we describe the layout algebra. The layout algebra starts with the relational algebra and extends it with layout operators. These layout operators have relational semantics, but they also have layout semantics which describes how to serialize the layout operators to data structures. The combination of relational and layout operators allows the layout algebra to express both a query and the data store that supports the execution of the query.

3.1 Basics
Programs in the layout algebra have three semantic interpretations.
1. The relational semantics describes the behavior of a layout algebra program at a high level. We define this
semantics using a theory of ordered finite relations [7]. This is a strict generalization of SQL semantics. Layout algebra programs can be evaluated according to this semantics in a context containing relations and query parameters to produce an output relation.
2. The layout semantics describes how the compiler evaluates the layout operators to produce a data file containing the data needed by the query (Sec. 5.2). The layout semantics operates in a context which contains relations, but not query parameters.
3. The runtime semantics describes how the compiled query executes, reading the layout file and using the query parameters to produce the query output (Sec. 5.3). The runtime semantics operates in a context which contains query parameters but not relations.

These three semantics are connected: the layout semantics and the runtime semantics combine to implement the relational semantics. The relational semantics serves as a specification. An interpreter written according to the relational semantics should execute layout algebra programs in the same way as our compiler.

n ::= identifiers
v ::= integers | strings | Booleans | floats | dates | null
e ::= v | n | n.n | e + e′ | e − e′ | e × e′ | e/e′ | e mod e′
    | e < e′ | e ≤ e′ | e > e′ | e ≥ e′ | e = e′ | e as n
    | if e then el else er | exists(l) | (l)
    | count() | sum(e) | min(e) | max(e) | avg(e)
k ::= concat | cross
o ::= asc | desc
l ::= scan(n) | l as n | select([e1, ..., en], l) | filter(e, l)
    | join(e, l, l′) | group-by([e1, ..., en], [f1, ..., fm], l)
    | order-byo([e1, ..., en], l) | dedup(l)
    | ∅ | scalar(e) | list(lk, lv) | tuplek([l1, ..., ln])
    | hash-idx(lk, lv, ekey) | ordered-idx(lk, lv, elow, ehigh)
Figure 5. Syntax of the layout algebra.

3.2 Syntax
Figure 5 shows the syntax of the layout algebra. Note that the layout algebra can be divided into relational operators (select, filter, join, etc.) and layout operators (list, hash-idx, etc.). The layout algebra is a strict superset of the relational algebra. In fact, the layout operators have relational semantics in addition to byte-level data layout semantics (see Sec. 3.3.2).

3.3 Semantics
Figure 6 shows the relational semantics of the layout algebra. The semantic rules for the relational operators are on the left, and the layout operators are on the right.

The semantics operates on three kinds of values: scalars, tuples and relations. Scalars are values like integers, Booleans, and strings. Tuples are finite mappings from fields to scalar values. Fields can be single names (n) or they can have an optional relation name (n.n).

Relations are represented as finite, ordered sequences of tuples. [ ] stands for the empty relation, : is the relation constructor, and ++ denotes the concatenation of relations.

The decision to use sequences to represent the output of relational operators instead of sets has two consequences. First, treating the output of a relational operator as a sequence is more like bag semantics than the set semantics of the original relational algebra. This choice brings the layout algebra more in line with the semantics of SQL, which is convenient for our implementation. Second, sequences allow us to represent query outputs which have an ordering.

In the semantic rules, σ is an evaluation context; it maps names to scalar values. δ is a relational context; it maps names to relations. We separate the two contexts because the relational context δ is global and immutable; it consists of a universe of relations that exist when the query is executed (or compiled) which are contained in some other database system. The evaluation context σ initially contains the query parameters, but some operators introduce new bindings in σ. ∪ denotes the binding of a tuple into an evaluation context. Read σ ∪ t as a new evaluation context that contains the fields in t in addition to the names already in σ.

In the rules, ⊢ separates contexts and expressions and ⇓ separates expressions and results. Read σ, δ ⊢ l ⇓ s as “the layout l evaluates to the relation s in the context σ, δ.”

We borrow the syntax of list comprehensions to describe the semantics of the layout algebra operators. For example, consider the list comprehension in the filter rule:

[t | t ← s′, (σ ∪ t, δ ⊢ e ⇓ true)].

This list comprehension evaluates to a sequence of tuples t from the relation s′ where the predicate e is true. The predicate is evaluated as σ ∪ t, δ ⊢ e, in a context extended with t, because e may refer to the fields in t. When the comprehension contains multiple ← as in the join rule, this should be read as a cross product of the relations.

3.3.1 Relational Operators
First, we describe the semantics of the relational operators: scan, filter, join, select, group-by, order-by, dedup, and as. These operators are modeled after their equivalent SQL constructs. scan accesses a relation in the relational context δ. filter uses a predicate e to remove rows from the input relation. join joins two relations using a predicate e. select is used to add and remove fields from relations as well as for aggregation. It takes a list of expressions E and a relation r. If E contains no aggregation operators, then a new tuple will be constructed according to E for each tuple in r. If E contains an aggregation operator (count, sum, min, max,
Tuple = {Field ↦ Value}, Relation = [Tuple]
σ, t : Tuple,  s : Relation,  v : Value,  δ : Id ↦ Relation

Left: relational operators

E-Scan
  δ[n] = s
  ----------------------------------------
  σ, δ ⊢ scan(n) ⇓ s

E-Filter
  σ, δ ⊢ l ⇓ s′    s = [t | t ← s′, (σ ∪ t, δ ⊢ e ⇓ true)]
  ----------------------------------------
  σ, δ ⊢ filter(e, l) ⇓ s

E-Join
  σ, δ ⊢ ll ⇓ sl    σ, δ ⊢ lr ⇓ sr
  s = [tl ∪ tr | tl ← sl, tr ← sr, (σ ∪ tl ∪ tr, δ ⊢ e ⇓ true)]
  ----------------------------------------
  σ, δ ⊢ join(e, ll, lr) ⇓ s

E-Select
  ei contains no aggregates    σ, δ ⊢ l ⇓ s′
  s = [{name(ei) ↦ vi} | t ← s′, (σ ∪ t, δ ⊢ ei ⇓ vi)]
  ----------------------------------------
  σ, δ ⊢ select([e1, ..., en], l) ⇓ s

E-Dedup
  σ, δ ⊢ l ⇓ s′
  ∀t ∈ s′. ∃i. 1 ≤ i ≤ |s| ∧ s[i] = t ∧ ∀j. j = i ∨ t ≠ s[j]
  ----------------------------------------
  σ, δ ⊢ dedup(l) ⇓ s

E-As
  σ, δ ⊢ l ⇓ s′    s = [[((n, nf), v) | (nr.nf, v) ← t] | t ← s′]
  ----------------------------------------
  σ, δ ⊢ l as n ⇓ s

Right: layout operators

E-Empty
  ----------------------------------------
  σ, δ ⊢ ∅ ⇓ [ ]

E-Scalar
  σ, δ ⊢ e ⇓ v
  ----------------------------------------
  σ, δ ⊢ scalar(e) ⇓ [v]

E-List1
  σ, δ ⊢ lk ⇓ [ ]
  ----------------------------------------
  σ, δ ⊢ list(lk, lv) ⇓ [ ]

E-List2
  σ, δ ⊢ lk ⇓ t : ts    σ ∪ t, δ ⊢ lv ⇓ s    σ, δ ⊢ list(ts, lv) ⇓ s′
  ----------------------------------------
  σ, δ ⊢ list(lk, lv) ⇓ s ++ s′

E-Tuple1
  k = cross ∨ k = concat
  ----------------------------------------
  σ, δ ⊢ tuplek([ ]) ⇓ [ ]

E-Tuple2
  σ, δ ⊢ l ⇓ sl
  s = [t ∪ ts | t ← sl, ts ← sls, (σ ∪ t, δ ⊢ tuplecross(ls) ⇓ sls)]
  ----------------------------------------
  σ, δ ⊢ tuplecross(l : ls) ⇓ s

E-Tuple3
  σ, δ ⊢ l ⇓ sl    σ, δ ⊢ tupleconcat(ls) ⇓ sls
  ----------------------------------------
  σ, δ ⊢ tupleconcat(l : ls) ⇓ sl ++ sls

E-HashIdx
  σ, δ ⊢ dedup(lk) ⇓ sk    σ, δ ⊢ e ⇓ k    σ, δ ∪ k ⊢ lv ⇓ sv
  (k ∈ sk ∧ s = sv) ∨ (k ∉ sk ∧ s = [ ])
  ----------------------------------------
  σ, δ ⊢ hash-idx(lk, lv, e) ⇓ s

E-OrderedIdx
  σ, δ ⊢ dedup(lk) ⇓ sk    σ, δ ⊢ el ⇓ lb    σ, δ ⊢ eh ⇓ ub
  [s1, ..., sn] = [s | t ∈ sk, lb ≤ t ≤ ub, σ, δ ∪ t ⊢ lv ⇓ s]
  s = s1 ++ ... ++ sn
  ----------------------------------------
  σ, δ ⊢ ordered-idx(lk, lv, el, eh) ⇓ s

Figure 6. Execution semantics of the layout algebra (left: relational operators, right: layout operators).
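As an informal reading aid, a few of the rules in Figure 6 can be transcribed into a small interpreter. The Python sketch below is an illustration only and is not Castor's implementation. The encoding is assumed for this example: tuples are dicts, relations are lists of dicts, layouts are nested Python tuples, scalar expressions are Python callables over the evaluation context, and the scalar operator is given an explicit field name. Only scan, filter, join, scalar and list are covered.

```python
# Illustrative interpreter for a few rules of the relational semantics.
# Assumed encoding (not from the paper): tuples are dicts, relations are
# lists of dicts, layouts are nested Python tuples, and scalar expressions
# are callables that receive the evaluation context sigma.

def evaluate(layout, sigma, delta):
    """Return the relation (a list of dicts) that `layout` evaluates to."""
    op = layout[0]
    if op == "empty":                       # E-Empty
        return []
    if op == "scan":                        # E-Scan: look the relation up in delta
        return list(delta[layout[1]])
    if op == "scalar":                      # E-Scalar: one tuple with one named field
        _, name, expr = layout
        return [{name: expr(sigma)}]
    if op == "filter":                      # E-Filter: keep tuples where e is true
        _, pred, inner = layout
        return [t for t in evaluate(inner, sigma, delta)
                if pred({**sigma, **t})]
    if op == "join":                        # E-Join: filtered cross product
        _, pred, left, right = layout
        return [{**tl, **tr}
                for tl in evaluate(left, sigma, delta)
                for tr in evaluate(right, sigma, delta)
                if pred({**sigma, **tl, **tr})]
    if op == "list":                        # E-List1 and E-List2
        _, keys, values = layout
        out = []
        for t in evaluate(keys, sigma, delta):
            # Bind the fields of t, evaluate the value layout, and concatenate.
            out += evaluate(values, {**sigma, **t}, delta)
        return out
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical data and query parameters, chosen only for this example.
delta = {"log": [{"id": 1, "enter": 5},
                 {"id": 2, "enter": 12},
                 {"id": 3, "enter": 7}]}
program = ("list",
           ("filter", lambda s: s["lo"] < s["enter"] < s["hi"], ("scan", "log")),
           ("scalar", "id", lambda s: s["id"]))
result = evaluate(program, {"lo": 4, "hi": 10}, delta)
```

The loop in the list case plays the role of the recursion in E-List2: each tuple of the key relation is bound into the context before the value layout is evaluated, and the per-tuple results are concatenated in order.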
avg), then select will aggregate the rows in r. If E contains both aggregation and non-aggregation operators, then the non-aggregation operators will be evaluated on the last tuple in r. For brevity, we omit the rules for selection with aggregates from Figure 6. The semantics of group-by and order-by are standard, so we omit them from Figure 6. The group-by operator takes a list of expressions, a list of fields, and a relation. It groups the tuples in the relation by the values of the fields, then computes the aggregates in the expression list. The order-by operator takes a list of expressions, an order, and a relation. It sorts the tuples in the relation in the given order, using the expressions to compute a key. dedup removes duplicate records from its input. as renames a relation.

A layout algebra program written entirely using relational operators can be translated directly to a SQL query. We use this property later when implementing the layout serializer.

3.3.2 Layout Operators
We extend the relational algebra with layout operators that specify the layout of data in memory at a byte level. The nesting and ordering of the layout operators correspond to the nesting and ordering of the data structures that they represent. Nesting allows data that is accessed together to be stored together, increasing spatial locality. Note that layout operators can capture the results of executing common relational algebra operations such as joins or selections, allowing query processing to be replaced with data layouts. In addition, layout primitives can express common relational data storage patterns, such as row stores and clustered indexes.

Castor supports the following data structures:
Scalars: Scalars can be machine integers (up to 64 bits), strings, Booleans, and decimal fixed-point numbers.
Tuples: Tuples are layouts that can contain layouts with different types. If a collection contains tuples, all the tuples must have the same number of elements and their elements must have compatible types. Tuples can be read either by taking the cross product or concatenating their sub-layouts.
Lists: Lists are variable-length layouts. Their contents must be of the same type.
Hash indexes: Hash indexes are mappings between scalar keys and layouts, stored as hash tables. Like lists, their keys must have the same type.
Ordered indexes: Ordered indexes are mappings between scalar keys and layouts, stored as ordered mappings.

Each data structure has a corresponding layout operator. The layout operators are the novel part of the layout
algebra and their semantics are therefore non-standard. The relational semantics of the layout operators are in Figure 6. Although the layout operators can be used to construct complex, nested layouts, they evaluate to flat relations of tuples of scalars, just like the relational operators.

We discuss the list operator in detail; the hash-idx and ordered-idx operators behave similarly. The E-List rules (Figure 6) specify the behavior of list.

The list operator takes two arguments, lk and lv. lk describes the data in the list and lv describes the format of that data. Specifically, evaluating lk produces a relation, and the two list rules in Figure 6 recursively decompose this relation. The first list rule is straightforward: if lk evaluates to the empty relation, then the list is empty. If lk evaluates to a non-empty relation t : ts then the second rule applies. In this rule, σ ∪ t, δ ⊢ lv ⇓ s says that the first layout in the list will evaluate to a relation s in a context that contains the contents of the tuple t. The process of evaluating lv continues recursively for all of the tuples in ts, producing a relation s′. The final result is the concatenation of s and s′.

The remaining layout operators (scalar and tuple) are simpler because they do not introduce any bindings. The tuple operator contains other layout operators and the scalar operator contains scalar values represented as expressions. Note that evaluating a tuple operator produces a relation, not a tuple. Even evaluating a scalar operator produces a relation containing a single tuple. Although these semantics are slightly surprising, there are two reasons why we chose this behavior. First, it is consistent with the other layout operators, all of which evaluate to relations. Second, tuples which contain other layouts (lists for example) must evaluate to relations because in our semantics, tuples can only contain values, so no nested relations are possible.

Returning to the query in Sec. 2.3, the inner list operator

list(filter(lp.enter < enter ∧ enter < lp.exit, log) as lc,
     tuplecross([scalar(lc.id), scalar(lc.enter)]))

selects the tuples in log where enter is between lp.enter and lp.exit, and creates a list of these tuples. The first argument describes the contents of the list and the second describes their layout.

time. Second, references to the relations in δ can only appear in compile-time expressions. This requirement ensures that the query will depend only on the data that is stored in a layout at runtime. Finally, relational operators (Sec. 3.3.1) in runtime positions cannot refer to variables bound by static expressions. This ensures that relational operators that are interleaved with layout operators do not require any representation in the layout. The compiler uses a simple type system to check for serializability.

4 Transformations
In this section, we define semantics-preserving transformations that optimize query and layout performance. These transformations change the behavior of the program with respect to the layout and runtime semantics while preserving it with respect to the relational semantics. These transformations subsume standard query optimizations because in addition to changing the structure of the query, they can also change layout fragments in an expression.

4.1 Notation
Transformations are written as inference rules. When writing inference rules, e will refer to scalar expressions and l will refer to layout algebra expressions. E and L will refer to lists of expressions and layouts. In general, the names we use correspond to those used in the syntax description (Figure 5). If we need to refer to a piece of concrete syntax, it will be formatted as e.g., concat or x.

Some of the transformation rules make a distinction for parameter-free expressions. An expression is parameter-free if it does not refer to any query parameters, which are special variables that are bound when the query is executed. The compiler automatically determines which expressions are parameter-free. In the transformation rules, parameter-free expressions are denoted as e.

To avoid writing many trivial inductive transformation rules, we define transformation contexts, which describe when transformations are allowed. A transformation context is an expression with a single hole. The expression in the hole can be transformed. The grammar of contexts is:
join(e = e′, l, l′) → tuplecross([l, hash-idx(lk, lv, e)])

This is similar to how a traditional database would implement a hash join, but in our case the hash table is precomputed. Using a hash table adds some overhead from the indirection

  llt = filter(|x| < 127, l)    lgt = filter(|x| ≥ 127, l)
  ----------------------------------------
  list(l, scalar(x)) → tupleconcat([list(llt, scalar(x)), list(lgt, scalar(x))])
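The effect of the range-splitting rule above can be illustrated with a short Python sketch. It partitions a list of integers into values whose magnitude is below 127 and the rest, then serializes each partition with its own fixed width. The function names, the 8-byte width chosen for the large partition, and the use of the standard `struct` module are assumptions made for this example; they do not describe Castor's actual serialization format.

```python
import struct

def range_split(values, bound=127):
    """Split values into (small, large) by magnitude, keeping order within each part."""
    small = [x for x in values if abs(x) < bound]
    large = [x for x in values if abs(x) >= bound]
    return small, large

def serialize_split(values):
    """Serialize the two partitions back to back: 1 byte per small value, 8 per large."""
    small, large = range_split(values)
    small_bytes = struct.pack(f"<{len(small)}b", *small)   # signed 8-bit integers
    large_bytes = struct.pack(f"<{len(large)}q", *large)   # signed 64-bit integers
    return small_bytes + large_bytes

# Hypothetical column of integers, mostly small with a few outliers.
values = [3, -20, 100000, 126, -4000000000]
small, large = range_split(values)
split_size = len(serialize_split(values))                  # 3 * 1 + 2 * 8 bytes
flat_size = len(struct.pack(f"<{len(values)}q", *values))  # 5 * 8 bytes
```

Every value lands in exactly one partition, so reading the two lists back in sequence returns the same multiset of values, with the small values first.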
n ::= ℤ
r ::= [n, n]
t ::= intT(r) | boolT | fixedT(r, nscale) | stringT(rchars)
    | tupleT([t1, ..., tk]) | listT(t, nelems) | hash-idxT(tk, tv)
    | ordered-idxT(tk, tv) | emptyT
Figure 7. Syntax of the layout types.

4.9 Range Compression
We can make range splitting more effective by recognizing cases where values fall into a small range:

  min = min_l x    l′ = select([(x′ + min) as x], scalar((x − min) as x′))
  ----------------------------------------
  list(l, scalar(x)) → list(l, l′)
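A minimal Python sketch of the idea behind this rule, assuming a non-empty list of integers: the stored value is x′ = x − min, and the reader recovers x = x′ + min. The function names and the byte-width calculation are assumptions for illustration and are not taken from Castor.

```python
def compress_range(values):
    """Return (minimum, offsets, width): offsets are x - min, width is bytes per offset."""
    lo = min(values)
    offsets = [x - lo for x in values]
    # Smallest whole number of bytes that can hold the largest offset (at least 1).
    width = max(1, (max(offsets).bit_length() + 7) // 8)
    return lo, offsets, width

def decompress_range(lo, offsets):
    """Recover the original values: x = x' + min."""
    return [o + lo for o in offsets]

# Hypothetical values clustered in a narrow range far from zero.
values = [1_000_003, 1_000_120, 1_000_000, 1_000_255]
lo, offsets, width = compress_range(values)
restored = decompress_range(lo, offsets)
```

In this example the raw values need three bytes each as unsigned integers, while the offsets fit in a single byte.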
Rewriting the values could allow us to use a smaller integer representation or to apply the previous transformation. Note that this transformation depends on the particular values stored in the layout. Castor can efficiently access the data for a layout expression by generating a SQL query and using an existing database system to execute it. We use the same mechanism when serializing a layout.

4.10 Predicate Precomputation
In some queries, it is known in advance that a parameter will come from a restricted domain. If this parameter is used as part of a filter or join predicate, precomputing the result of running the predicate for the known parameter space can be profitable, particularly when the predicate is expensive to compute. Let p be a query parameter and Dp be the domain of values that p can assume.

  relations(l) = {r}    params(e) = {p}
  wi = e[p ↦ Dp[i]]    e′ = ⋁i (wi ∧ p = Dp[i]) ∨ e
  r′ = select([w1, ..., w|Dp|, ...], r)
  ----------------------------------------
  filter(e, l) → filter(e′, l[r ↦ r′])

This transformation generates an expression wi for each instantiation of the predicate with a value from Dp. The wi are selected along with the original relation r. When we later create a layout for r, the wi will be stored alongside it. When the filter is executed, if the parameter p is in Dp, the or will short-circuit and the original predicate will not run. However, this transformation is semantics preserving even if Dp is underapproximate. If the query receives an unexpected parameter, then it executes the original predicate e. Note that in the revised predicate e′, p = Dp[i] can be computed once for each i, rather than once per invocation of the filter predicate.

We use this transformation on TPC-H queries 2 and 9 to eliminate expensive string comparisons.

5 Compilation
The result of applying the transformation rules is a program in the layout algebra. This program is still quite declarative, so there is a significant abstraction gap to cross before the program can be executed efficiently. Compilation of layout algebra programs proceeds in three passes:
1. A type inference pass computes a layout type, which contains information about the ranges of values in the layout (Sec. 5.1).
2. A serialization pass generates a binary representation of the layout, using information from the layout type to specialize the layout to the data (Sec. 5.2).
3. A syntax-directed lowering pass transforms each query and layout operator into an imperative intermediate representation, using the information in the layout type to generate the appropriate layout reading code. The Castor IR is lowered to LLVM IR which is then optimized, compiled to native code, and linked with C code that provides a command line interface to the query (Sec. 5.3).

5.1 Layout Types
To determine the appropriate sizes of the various layouts, a type inference pass computes the ranges of values in the layouts. The syntax of the layout types is shown in Figure 7. Integers are abstracted using an interval, as are the numerators of fixed point numbers. Note that every element in collections like lists and indexes must be of the same type. Tuples can contain elements of different types. The operation of the type inference pass is shown in Figure 8.

After computing the layout type, the layout is serialized to a file and code is generated for executing the query.

  σ ⊢ scalar(e) ⇓ x    x is an integer
  ----------------------------------------
  σ ⊢ scalar(e) : intT([x, x])

  σ ⊢ l1 : t1, ..., σ ⊢ lk : tk
  ----------------------------------------
  σ ⊢ tuple([l1, ..., lk]) : tupleT([t1, ..., tk])

  σ ⊢ lk ⇓ s    t = ⨆ { t′ | σ′ ∈ s, σ′ ⊢ lv : t′ }
  ----------------------------------------
  list(lk, lv) : listT(t, [|s|, |s|])

  t1 = intT([l1, h1])    t2 = intT([l2, h2])
  ----------------------------------------
  t1 ⊔ t2 = intT([min(l1, l2), max(h1, h2)])

Figure 8. Selected semantics of the type inference pass.

5.2 Layout Serialization
Each of the layout operators has a binary serialization format. The format of the layouts is intended to minimize the space needed to store them and to minimize the use of pointers to preserve data locality.
• Integers are stored using the minimum number of bytes, from 1 to 8 bytes.
• Booleans are stored as single bytes.
• Fixed point numbers are normalized to a fixed scale, and stored as integers.
• Tuples are stored as the concatenation of the layouts they contain, prefixed by a length.
• Lists are stored as a length followed by the concatenation of their elements. They can be efficiently scanned through, but not accessed randomly by index.
• Hash indexes are implemented using minimal perfect hashes [4, 13]. The hash values are stored as in a list, but during serialization a lookup table is generated using the CMPH library and stored before the values. Using perfect hashing allows the hash indexes to have load factors up to 99%.
• Ordered indexes are similar to hash indexes in that they store a lookup table in addition to storing the values. In the case of the ordered index, keys are stored sorted and the correct range is found by binary search.

b : Byte string,  σ, t : Tuple,  δ : Id ↦ Relation

L-Empty
  ----------------------------------------
  σ, δ ⊢ ∅ ↓ ""

L-Scalar
  σ, δ ⊢ e ⇓ v    b is the binary format of v
  ----------------------------------------
  σ, δ ⊢ scalar(e) ↓ b

L-List
  σ, δ ⊢ rk ⇓ [t1, ..., tn]    ∀1 ≤ i ≤ n. σ ∪ ti, δ ⊢ rv ↓ bi
  σ, δ ⊢ scalar(|b1 ... bn|) ↓ blen    σ, δ ⊢ scalar(n) ↓ bct
  b = bct blen b1 ... bn
  ----------------------------------------
  σ, δ ⊢ list(rk, rv) ↓ b

L-Tuple
  ∀1 ≤ i ≤ n. σ, δ ⊢ ri ↓ bi
  σ, δ ⊢ scalar(|b1 ... bn|) ↓ blen    b = blen b1 ... bn
  ----------------------------------------
  σ, δ ⊢ tuplek([r1, ..., rn]) ↓ b

Figure 9. Selected semantics of the layout serialization pass.

5.3 Code Generation
Query code is generated according to the compilation strategy described in [35]. This is referred to as push-based, or data-centric query evaluation. For each query operator, the code generator contains a function that emits the code that implements the operator. These functions take a callback as a parameter. They pass the variable containing the result tuple to the callback, which emits code for the query operator that consumes the result. We found that using this strategy instead of a traditional iterator model approach is critical for getting good performance from the generated code.

Queries are compiled first to an internal IR, then lowered to LLVM IR, and then compiled to native code.

6 Evaluation
In Sec. 2, we did a case study on a query from DemoMatch. In this section we perform a systematic evaluation of Castor.

6.1 TPC-H Analytics Benchmark
TPC-H is a standard database benchmark, focusing on analytics queries. It consists of a data generator, 22 query templates, and a query generator which instantiates the templates. The queries in TPC-H are inherently parametric, and their parameters come from the domains defined by the query generator. To build our benchmark, we took the query templates from TPC-H and encoded them as Castor programs. It is important that the queries be parametric, because specializing non-parametric queries is boring; a non-parametric query can be evaluated and the result stored.

TPC-H is a general purpose benchmark, so it exercises a variety of SQL primitives. We chose not to implement all of these primitives in Castor, not because they would be prohibitively difficult, but because they are not directly related to the layout specialization problem. In particular, Castor does not support executing order-by, group-by, join, or dedup operators at runtime, and it does not support limit clauses at all. Some of these operators can be replaced by layout specialization, but others cannot. We implemented the first 17 queries in TPC-H. Of these queries, we dropped query 13 because it contains an outer join and removed runtime ordering and limit clauses from four other queries (noted in Table 1).

6.2 Baselines
We compare Castor with PostgreSQL and Hyper. We compare against PostgreSQL because it is commonly used and provides context, not because it is a comparable system. Hyper is an in-memory column-store which has a state-of-the-art query compiler. It implements compilation techniques (e.g. vectorization) that are well outside the scope of this paper. We compare against Hyper in two modes: with the original TPC-H data and with custom views and indexes that mimic the layout used by Castor. We compare against vanilla Hyper to show that layout specialization is a powerful optimization that can compensate for the many low-level compiler optimizations in Hyper. We compare against Hyper with specialized views to show that the specialization techniques that Castor uses are also beneficial in other systems.

6.3 Results
When evaluating the TPC-H queries, we used the 1Gb scale factor. We ran our benchmarks on an Intel® Xeon® E5-2470 with 100Gb of memory.

Runtime: The query runtime numbers in Table 1 show that the layouts and query code generated by Castor are faster or significantly faster than Hyper for 10 out of 16 queries. In the cases where Castor is slower than Hyper, only one query is more than 3x slower.

If Hyper is given specialized views and indexes, then its performance is on par with Castor. However, constructing and maintaining these views takes effort, and Hyper cannot assist the user in creating a collection of views which maintains the semantics of the original query.
Memory Use: We also measured the peak memory use of the query process for Hyper and for Castor. Hyper consistently used 4Gb of memory, regardless of the query. The results for Castor show that its peak memory use is generally low: less than 10Mb for 12 out of 16 queries.

Layout Size: Finally, we recorded the size of the layouts that Castor produced. The layouts were generally small: less than 10Mb for 9 out of 16 queries. The original data set, the output of the TPC-H data generator, is 1.1Gb. The size difference between Castor’s layouts and the original data supports the hypothesis that queries, even parameterized queries, rely on fairly small subsets of the whole database, making layout specialization a profitable optimization.

6.4 Summary
We showed that Castor produces artifacts that are competitive in performance and in size with a state-of-the-art in-memory database. These results show that database compilation is a compelling technique for improving query performance on static and slowly changing datasets.

7 Related Work
Deductive Synthesis. There is a long line of work that uses deductive synthesis and program transformation rules to optimize programs [2, 27], to generate data structure implementations [14], and to build performance DSLs [28, 34]. Castor is a part of this line of work: it is a performance DSL which uses deduction rules to generate and optimize layouts. However, its focus on particular data sets and on using deduction rules to optimize data in addition to programs separates it from previous work.

Data Representation Synthesis. The layout optimization problem is similar to the problem of synthesizing a data structure that corresponds to a relational specification [19, 20, 23, 24, 34]. Castor considers a restricted version of the data structure synthesis problem where the query and the dataset are known to the compiler, which allows Castor to use optimizations which would not be safe if the data was not known. It also allows Castor to generate code and layouts that are specialized to the dataset. This focus on the data in addition to the query separates Castor from the existing work on data representation synthesis.

Database Storage. Traditional databases are mostly row-based. Column-based database systems (e.g., MonetDB [3] and C-Store [33]) are popular for OLAP applications, outperforming row-based approaches by orders of magnitude. However, the existing work on database storage generally considers specific storage optimizations (e.g., [1]), rather than languages for expressing diverse storage options. In this vein is RodentStore [12], which proposed a language to express rich types of storage layouts and showed that different layouts could benefit different applications. However, a compiler was never developed to create the layouts from this language; the paper demonstrated its point by implementing each layout by hand.

There have also been studies of physical layouts for other types of data, such as scientific data [32] and geo-spatial data [17]. Although these are not directly comparable, we hope that Castor can be extended to support those data types.

Materialized View and Index Selection. The layouts that Castor generates are similar to materialized views, in that they store query results. Castor also generates layouts which contain indexes. Several problems related to the use of materialized views and indexes have been studied (see [18] for a survey): (1) the view storage problem that decides which views need to be materialized [8], (2) the view selection problem that selects view(s) that can answer a given query, (3) the query rewriting problem that rewrites the given query based on the selected view(s) [26], (4) the index selection problem that selects an appropriate set of indexes for a query [5, 16, 31, 36]. However, materialized views are restricted to being flat relations. The layout space that Castor supports is much richer than that supported by materialized views and indexes. In addition, the view selection literature has not previously considered the problem of generating execution plans for chosen views and indexes.

Query Compilation. Castor uses techniques from the query compilation literature [22, 30, 35]. It uses information about the layout to further specialize its compiled queries.

8 Conclusion and Future Work
We have presented Castor, a domain specific language for expressing a wide variety of physical database designs, and a compiler for this language. We have evaluated it empirically and shown that it is competitive with state-of-the-art in-memory database systems.

One area of future work is to build a cost-based optimizer for the layout algebra that will choose an appropriate physical layout. Another is to study the problem of sharing layouts between multiple queries. Last but not least, we also plan to expand the set of layouts. For example, bitvectors could be added to store lists of Booleans efficiently or run-length encoding could be used to store lists of scalars. Layouts which store tiles of data together and allow indexing by 2D regions could be used to store spatial data.

References
[1] Anastassia Ailamaki, David J. DeWitt, Mark D. Hill, and Marios Skounakis. 2001. Weaving Relations for Cache Performance. In VLDB. 169–180.
[2] Lee Blaine, Limei Gilham, Junbo Liu, Douglas R. Smith, and Stephen Westfold. 1998. Planware-Domain-Specific Synthesis of High-Performance Schedulers. In Automated Software Engineering, 1998. Proceedings. 13th IEEE International Conference On. IEEE, 270–279.
[3] Peter A. Boncz and Martin L. Kersten. 1999. MIL Primitives for Querying a Fragmented World. The VLDB Journal 8, 2 (Oct. 1999), 101–119.
          PSQL     Hyper                              Castor
Q#        Time³    Time³   Time²   Mem.     Size      Time²   Mem.    Size
1         16244    19      <1      15.2     17.8      <1      0.9     0.1
2⁰        999      5       11      238.0    206.6     1       8.7     42.1
3⁰,¹      1942     17      22      877.1    966.8     7       16.8    81.0
4         943      8       <1      17.0     17.8      <1      0.8     0.3
5         935      11      4       26.0     24.1      <1      0.9     1.5
6         1996     12      12      900.5    858.8     6       3.3     31.6
7         1666     12      <1      16.3     17.8      <1      0.8     0.1
8         4238     7       17      529.5    375.4     9       13.8    67.1
9         5253     31      41      1580.2   1550.8    103     456.4   466.6
10⁰,¹     2686     17      <1      116.8    112.2     2       23.4    776.7
11        370      8       16      89.5     68.2      <1      1.0     5.3
12        2671     8       <1      15.4     17.8      <1      0.9     0.3
14        637      4       <1      16.0     17.8      <1      0.7     0.1
15        42       13      <1      14.7     17.8      <1      0.7     0.3
16¹       2139     38      3       32.7     35.7      33      2.0     1.2
17        273      11      7       1238.2   1224.7    <1      2.1     51.8
18⁰       10299    42      12      388.7    295.7     76      101.8   103.5
19        4240     28      <1      19.5     17.8      <1      1.0     0.2

⁰ Limit clause removed. ¹ Run time ordering removed. ² Specialized. ³ Unspecialized.

Table 1. Runtime of queries derived from TPC-H (ms). Memory use is the peak resident set size during a query (Mb). Size is the layout size (Mb).
https://doi.org/10.1007/s007780050076
[4] Fabiano C. Botelho, Rasmus Pagh, and Nivio Ziviani. 2007. Simple and Space-Efficient Minimal Perfect Hash Functions. In Algorithms and Data Structures: 10th International Workshop, WADS 2007 (Theoretical Computer Science and General Issues), Vol. 4619. Springer, Halifax, Canada, 139–150.
[5] Nicolas Bruno and Surajit Chaudhuri. 2005. Automatic Physical Database Tuning: A Relaxation-Based Approach. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD ’05). ACM, New York, NY, USA, 227–238. https://doi.org/10.1145/1066157.1066184
[6] Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS ’98). ACM, New York, NY, USA, 34–43. https://doi.org/10.1145/275487.275492
[7] Alvin Cheung, Armando Solar-Lezama, and Samuel Madden. 2013. Optimizing Database-Backed Applications with Query Synthesis. ACM SIGPLAN Notices 48, 6 (2013), 3–14. http://dl.acm.org/citation.cfm?id=2462180
[8] Rada Chirkova and Michael R. Genesereth. 2000. Linearly Bounded Reformulations of Conjunctive Databases. In Computational Logic - CL 2000, First International Conference, London, UK, 24-28 July, 2000, Proceedings. 987–1001. https://doi.org/10.1007/3-540-44957-4_66
[9] E. F. Codd. 1970. A Relational Model of Data for Large Shared Data Banks. Commun. ACM 13, 6 (June 1970), 377–387. https://doi.org/10.1145/362384.362685
[10] Edgar F. Codd. 1971. A Data Base Sublanguage Founded on the Relational Calculus. In Proceedings of the 1971 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control. ACM, 35–68.
[11] Transaction Processing Performance Council. 2008. TPC-H Benchmark Specification. 21 (2008), 592–603.
[12] Philippe Cudre-Mauroux, Eugene Wu, and Sam Madden. 2009. The Case for RodentStore, an Adaptive, Declarative Storage System. In CIDR. arXiv:0909.1779 http://arxiv.org/abs/0909.1779
[13] Davi de Castro Reis, Djamel Belazzougui, Fabiano Cupertino Botelho, and Nivio Ziviani. 2011. CMPH: C Minimal Perfect Hashing Library. http://cmph.sourceforge.net
[14] Benjamin Delaware, Clément Pit-Claudel, Jason Gross, and Adam Chlipala. 2015. Fiat: Deductive Synthesis of Abstract Data Types in a Proof Assistant. In ACM SIGPLAN Notices, Vol. 50. ACM, 689–700.
[15] Goetz Graefe. 1994. Volcano - An Extensible and Parallel Query Evaluation System. IEEE Transactions on Knowledge and Data Engineering 6, 1 (1994), 120–135.
[16] H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman. 1997. Index Selection for OLAP. In Proceedings 13th International Conference on Data Engineering. IEEE Computer Society, Junglee Corp., Palo Alto, CA., 208–219. https://doi.org/10.1109/ICDE.1997.581755
[17] Angélica García Gutiérrez and Peter Baumann. 2007. Modeling Fundamental Geo-Raster Operations with Array Algebra. In Workshops Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), October 28-31, 2007, Omaha, Nebraska, USA. 607–612. https://doi.org/10.1109/ICDMW.2007.53
[18] Alon Y. Halevy. 2001. Answering queries using views: A survey. VLDB J. 10, 4 (2001), 270–294. https://doi.org/10.1007/s007780100054
[19] Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv. 2010. Data Structure Fusion. In Programming Languages and Systems (Lecture Notes in Computer Science). Springer, Berlin, Heidelberg, 204–221. https://doi.org/10.1007/978-3-642-17164-2_15
[20] Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv. 2011. Data Representation Synthesis. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11). ACM, New York, NY, USA, 38–49. https://doi.org/10.1145/1993498.1993504
[21] Matthias Jarke and Jurgen Koch. 1984. Query Optimization in Database Systems. Comput. Surveys 16, 2 (June 1984), 111–152. https://doi.org/10.1145/356924.356928
[22] Yannis Klonatos, Christoph Koch, Tiark Rompf, and Hassan Chafi. 2014. Building Efficient Query Engines in a High-Level Language. Proceedings of the VLDB Endowment 7, 10 (2014), 853–864. http://dl.acm.org/citation.cfm?id=2732959
[23] Calvin Loncaric, Michael D. Ernst, and Emina Torlak. 2018. Generalized Data Structure Synthesis. In Proceedings of the 40th International Conference on Software Engineering. ACM, 958–968.
[24] Calvin Loncaric, Emina Torlak, and Michael D. Ernst. 2016. Fast Synthesis of Fast Collections. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). ACM, New York, NY, USA, 355–368. https://doi.org/10.1145/2908080.2908122
[25] Thomas Neumann. 2011. Efficiently Compiling Efficient Query Plans for Modern Hardware. Proc. VLDB Endow. 4, 9 (June 2011), 539–550. https://doi.org/10.14778/2002938.2002940
[26] Rachel Pottinger and Alon Y. Levy. 2000. A Scalable Algorithm for Answering Queries Using Views. In VLDB 2000, Proceedings of 26th International Conference on Very Large Data Bases, September 10-14, 2000, Cairo, Egypt. 484–495. http://www.vldb.org/conf/2000/P484.pdf
[27] M. Puschel, J. M. F. Moura, J. R. Johnson, D. Padua, M. M. Veloso, B. W. Singer, Jianxin Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo. 2005. SPIRAL: Code Generation for DSP Transforms. Proc. IEEE 93, 2 (Feb. 2005), 232–275. https://doi.org/10.1109/JPROC.2004.840306
[28] Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman Amarasinghe. 2013. Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. ACM SIGPLAN Notices 48, 6 (2013), 519–530.
[29] Tiark Rompf and Nada Amin. 2015. Functional Pearl: A SQL to C Compiler in 500 Lines of Code. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). ACM, New York, NY, USA, 2–9. https://doi.org/10.1145/2784731.2784760
[30] Amir Shaikhha, Yannis Klonatos, Lionel Parreaux, Lewis Brown, Mohammad Dashti, and Christoph Koch. 2016. How to Architect a Query Compiler. In Proceedings of the 2016 International Conference on Management of Data (SIGMOD ’16). ACM, New York, NY, USA, 1907–1922. https://doi.org/10.1145/2882903.2915244
[31] Michael Stonebraker. 1974. The choice of partial inversions and combined indices. International Journal of Parallel Programming 3, 2 (1974), 167–188. https://doi.org/10.1007/BF00976642
[32] Michael Stonebraker. 2012. SciDB: An Open-Source DBMS for Scientific Data. ERCIM News 2012, 89 (2012). http://ercim-news.ercim.eu/en89/special/scidb-an-open-source-dbms-for-scientific-data
[33] Mike Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O’Neil, and others. 2005. C-Store: A Column-Oriented DBMS. In Proceedings of the 31st International Conference on Very Large Data Bases. VLDB Endowment, 553–564. http://dl.acm.org/citation.cfm?id=1083658
[34] Arvind K. Sujeeth, Kevin J. Brown, Hyoukjoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2014. Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages. ACM Trans. Embed. Comput. Syst. 13, 4s (April 2014), 134:1–134:25. https://doi.org/10.1145/2584665
[35] Ruby Y. Tahboub, Grégory M. Essertel, and Tiark Rompf. 2018. How to Architect a Query Compiler, Revisited. In Proceedings of the 2018 International Conference on Management of Data. ACM, 307–322.
[36] Zohreh Asgharzadeh Talebi, Rada Chirkova, Yahya Fathi, and Matthias Stallmann. 2008. Exact and Inexact Methods for Selecting Views and Indexes for OLAP Performance Improvement. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology. ACM, 311–322. http://dl.acm.org/citation.cfm?id=1353383
[37] Kuat Yessenov, Ivan Kuraj, and Armando Solar-Lezama. 2017. DemoMatch: API Discovery from Demonstrations. In PLDI. ACM, Barcelona, Spain, 15. https://doi.org/10.1145/3062341.3062386
A Appendix
select([lp.enter, lc.enter],
  tuplecross([hash-idx(select([id as k], log),
                list(select([enter, exit],
                       filter(k = id ∧ enter > exit, log)),
                     tuplecross([scalar(enter), scalar(exit)])),
                idp) as lp,
              filter(lc.id = idc,
                ordered-idx(select([enter as k], log),
                  list(filter(log.enter = k, log),
                       tuplecross([scalar(id), scalar(enter)])),
                  lp.enter, lp.exit) as lc)]))
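
The layout above can be read as two precomputed indexes that are combined at query time: a hash index from id to a list of (enter, exit) tuples, and an ordered index on enter whose entries hold (id, enter) tuples. The following Python sketch mimics that structure only; it is not Castor's generated code or its byte-level representation. The toy LOG rows, the function names build_layout and query, and the parameter names id_p and id_c are illustrative assumptions. The sketch also assumes that the ordered-index bounds lp.enter and lp.exit form a closed range, and it keeps only the key-matching part of the inner filter.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict

# Toy `log` relation as (id, enter, exit) rows. Made-up data for illustration.
LOG = [
    (1, 0, 10),
    (2, 2, 4),
    (2, 6, 8),
    (3, 12, 14),
    (1, 20, 30),
    (2, 25, 27),
]


def build_layout(log):
    """Precompute the two indexes once, before any query runs."""
    # Hash index: id -> list of (enter, exit) tuples.
    by_id = defaultdict(list)
    for id_, enter, exit_ in log:
        by_id[id_].append((enter, exit_))
    # Ordered index: (enter, id) pairs sorted by enter, searched by range.
    by_enter = sorted((enter, id_) for id_, enter, _ in log)
    return dict(by_id), by_enter


def query(layout, id_p, id_c):
    """Return (lp.enter, lc.enter) pairs for the given parameters."""
    by_id, by_enter = layout
    keys = [enter for enter, _ in by_enter]
    result = []
    # Hash lookup on id_p yields the candidate lp rows.
    for lp_enter, lp_exit in by_id.get(id_p, []):
        # Range scan of the ordered index over [lp_enter, lp_exit] (assumed closed).
        lo = bisect_left(keys, lp_enter)
        hi = bisect_right(keys, lp_exit)
        for lc_enter, lc_id in by_enter[lo:hi]:
            # Residual filter lc.id = id_c, applied after the range scan.
            if lc_id == id_c:
                result.append((lp_enter, lc_enter))
    return result


layout = build_layout(LOG)
print(query(layout, id_p=1, id_c=2))
```

The equality predicate on the first id is answered by a single hash lookup, while the range predicate on enter is answered by two binary searches, so neither predicate requires scanning the whole relation. Only the second id predicate is checked row by row.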