Breaking the Memory Wall in MonetDB Presented ByM.
Sheshasai Reddy (07D91A1225)
CSE 704 1
Memory Wall
Growing disparity of speed between CPU and memory outside CPU chip
20-40 % of instructions in a program reference memory It is a performance bottleneck to access main memory
CSE 704
Want to break the Memory Wall?
Average time to access memory can be written as: tavg=p x tc + (1-p) x tm tc = time to access cache tm = DRAM access time
where p = probability of a cache hit
To reduce tavg, increase p (or reduce tc)
Exploit cache
CSE 704 3
Exploit the power of cache by :
Vertical Storage Bulk query Algebra Cache conscious algorithms Memory Access cost modeling
CSE 704
MonetDB
1994 - Now Open source Column Store Has been successfully applied in highperformance applications for data mining, OLAP, GIS, XML Query, text and multimedia retrieval.
CSE 704
MonetDB Architecture
CSE 704
Exploit the power of cache by :
Vertical Storage
Bulk query Algebra Cache conscious algorithms Memory Access cost modeling
CSE 704
MonetDB- Vertical storage
ID 10 11 12 13 Day Discount 4/4/98 0.195 9/4/98 0.065 1/2/98 0.175 7/2/98 0
OID
100 101 102 103 104
ID 10 11 12 13 14
OID
Day 100 4/4/98 101 9/4/98 102 1/2/98 103 7/2/98 104 1/2/99
OID
Discount 100 0.195 101 0.065 102 0.175 103 0 104 0.065
CSE 704
MonetDB-Vertical Storage
Uses Decomposed Storage Model (DSM) Each column stored in a separate Binary Association Table (BAT) BAT <surrogate,value> <head, tail>
Two simple memory arrays
Internally, columns are stored using memory-mapped files Provides O(1) database lookup mechanism
CSE 704
Exploit the power of cache by :
Vertical Storage
Bulk query Algebra Cache conscious algorithms
Memory Access cost modeling
CSE 704
10
Bulk Query Algebra
Low-level algebra called BAT Algebra Breaks complex expressions into a sequence of BAT algebra operators Implementing these operations are simple array operations No need of Expression Interpreting engine
CSE 704
11
Example
BAT algebra expression R:bat[:oid, :oid]:=select(B:bat[:oid,:int], V:int) C code implementing the above expression for (i = j = 0; i <n; i++)
if (B.tail[i] == V)
R.tail[j++] = i;
CSE 704
12
MonetDB Vertical Storage
Advantage:
For loop creates high instruction locality and thus eliminates the cache miss problem Materialize intermediate results
Disadvantage:
CSE 704
13
Exploit the power of cache by :
Vertical Storage
Bulk query Algebra Cache conscious algorithms Memory Access cost modeling
CSE 704
14
MonetDB- Cache concious algorithms
Partitioned Hash Join
Relations partitioned into H separate clusters Each cluster fits in L2 memory cache
Problem
Number of clusters (H) exceeds the number of - TLB entries TLB thrashing - Cache lines Cache thrashing
Solution
-Multi-pass radix cluster
CSE 704 15
Multipass radix cluster
Multiple clustering passes
Limit the number of clusters per pass
Avoids TLB/Cache thrashing
CSE 704
16
Exploit the power of cache by :
Vertical Storage Bulk query Algebra Cache conscious algorithms Memory Access cost modeling
CSE 704
17
Memory Access cost modeling
Cost models are created that take the cost of memory access into account Main goal is to predict the number and kind of cache misses for all cache levels. Complex data access patterns of database algorithms are studied.
CSE 704
18
Basic Access Patterns
Single sequential traversal s_trav(R) Single random traversal r_trav(R) Repetitive sequential traversal rs_trav(r,d,R) Repetitive random traversal rr_trav(r,d,R) R= region representing a database table
CSE 704
19
Compound Access Patterns
Database operations accessing more than one data region
Can be created by combining basic access patterns Basic access pattern can be combined either
Sequentially Concurrently Example: s_trav(U) s_trav(W) for W<- select(U)
CSE 704
20
Sequential Execution
Total number of cache misses is at most the sum of the cache misses of all patterns
However if the data region for patterns is same, cache miss can be saved
Concurrent Execution
Patterns are competing for the same cache
Considering this, total number of cache misses are calculated
CSE 704 21
Conclusion
MonetDB has redefined database architecture in the face of an ever-changing computer architecture
Principle of Vertical storage is currently being adopted by many commercial systems HBase, C-Store, Apache Cassandra are some of the other column stores
CSE 704
22