NewSQL: Big Data Management, Phil Bartie
Database Landscape
RDF: Virtuoso, Jena, Stardog, RDF4J, GraphDB, Blazegraph
Object: Caché, Db4o, Versant, ObjectStore
XML: MarkLogic, Sedna, Tamino, BaseX, eXist-db
Relational: Oracle, MySQL, MS SQL Server, PostgreSQL, DB2, SQLite, MS Access, Teradata, SAP Adaptive Server, Hive, FileMaker, MariaDB, Informix, Vertica
NoSQL
Key-Value: Redis, Memcached, Riak KV, Aerospike, SimpleDB
Document: MongoDB, DynamoDB, CouchBase, Elasticsearch
https://mattturck.com/data2021/
Database Universe – value of data over time
VoltDB Co-Founder and Chief Strategy Officer Scott Jarr on the value of data in real-time.
https://www.youtube.com/watch?v=w2hCJIZz3n8&feature=youtu.be&t=186
Big Data Management
History in No-tation
http://strataconf.com/london/public/schedule/detail/32351
Security and
cataloguing
Magnetic disk
https://www.cs.oberlin.edu/~jdonalds/311/fig02-01.png
Work: 4%
http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/
Logging: 20%
Buffer Management: 29%
Locking: 18%
Latching: 10%
https://downloads.voltdb.com/datasheets_collateral/technical_overview.pdf
http://hadoop4japan.files.wordpress.com/2012/04/kb_scale_outup.png
Column-oriented DBMS
001:Wednesday,Addams,01412318743,[email protected];002:Bart,
Simpson,0131444777,[email protected];003:Lisa,Simpson,NULL,
[email protected];
004:Pugsley,Addams,01911211538,[email protected];
How many disk blocks need to be accessed to count the number of ‘Simpson’ records?
Wednesday:001,Bart:002,Lisa:003,Pugsley:004;Addams:001,004,
Simpson:002,003;01412318743:001,0131444777:002,
NULL:003,01911211538:004;
[email protected]:001,[email protected]:002,[email protected]:003,
[email protected]:004;
How many disk blocks need to be accessed to count the number of
‘Simpson’?
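The two layouts above can be compared with a minimal sketch. It assumes three whole records fit in one disk block in the row layout and that each column fits in a single block in the column layout; both are assumptions made for illustration, not figures from the slides. The e-mail values are placeholders, and the column layout is simplified to plain value lists rather than the value:id lists shown above.

```python
# Hypothetical records modelled on the slide; the e-mail values are placeholders.
ROWS = [
    ("001", "Wednesday", "Addams", "01412318743", "email-1"),
    ("002", "Bart", "Simpson", "0131444777", "email-2"),
    ("003", "Lisa", "Simpson", None, "email-3"),
    ("004", "Pugsley", "Addams", "01911211538", "email-4"),
]
RECORDS_PER_BLOCK = 3  # assumption: three whole records fit in one disk block


def row_store_blocks(rows, per_block):
    """Pack whole records into fixed-size blocks, as a row store would."""
    return [rows[i:i + per_block] for i in range(0, len(rows), per_block)]


def column_store_blocks(rows):
    """Store each attribute contiguously; assume one block per column."""
    names = ("id", "first", "surname", "phone", "email")
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def count_in_row_store(blocks, surname):
    """Every block is read, because surnames are spread across all of them."""
    blocks_read, matches = 0, 0
    for block in blocks:
        blocks_read += 1
        matches += sum(1 for record in block if record[2] == surname)
    return matches, blocks_read


def count_in_column_store(columns, surname):
    """Only the block holding the surname column is read."""
    matches = sum(1 for value in columns["surname"] if value == surname)
    return matches, 1


row_result = count_in_row_store(row_store_blocks(ROWS, RECORDS_PER_BLOCK), "Simpson")
col_result = count_in_column_store(column_store_blocks(ROWS), "Simpson")
print(row_result)  # (2, 2): two matches, two blocks read
print(col_result)  # (2, 1): two matches, one block read
```

With only four records the saving is one block, but the gap grows with table width and row count, since the row store always reads every attribute of every record.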
Hardware Effects
Traditional HW: slow CPU
Modern HW: fast CPUs, multiple caches
https://www.youtube.com/watch?v=8KGVFB3kVHQ
Information Retrieval
Efficient use of indexes
Scientific databases
e.g. Sloan Digital Sky Survey
Wide tuples, queries focus on a small number of attributes at a time
RDF
Matches query patterns based on graphs
C-Store
New SQL
Source: https://451research.com/report-short?entityId=66963
Stonebraker’s Definition
SQL as primary interface
ACID support for transactions
Non-locking concurrency control
High per-node performance
Parallel, shared-nothing architecture
M. Stonebraker. New SQL: An Alternative to NoSQL and Old SQL for New OLTP Apps. Blog@Communications of the ACM. 16 June 2011. http://cacm.acm.org/blogs/blog-cacm/109710-new-sql-an-alternative-to-nosql-and-old-sql-for-new-oltp-apps/fulltext
http://hadoop4japan.files.wordpress.com/2012/04/kb_scale_outup.png
Application Areas
OLTP: Online Transaction Processing
Financial
ATMs
Order systems
Retail sales
…
Day-to-day operations
Requires ACID guarantees
What about CAP?
http://imagens.canaltech.com.br/50452.69272-OLTP-OLAP.png
Key requirements
Strong transactional guarantees – ACID
Not offered by most NoSQL
Consistency requirements
Scale-out not scale-up
Support distribution through
Concurrency control
Flow control
Query processing
http://hadoop4japan.files.wordpress.com/2012/04/kb_scale_outup.png
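Scale-out on a shared-nothing architecture usually means that each node owns a disjoint slice of the data. The sketch below shows one common way to do that, hash partitioning on a key. The `Cluster` class, the node count and the account records are all hypothetical names chosen for illustration; real NewSQL systems add replication, rebalancing and distributed query planning on top.

```python
import hashlib


class Cluster:
    """Toy shared-nothing cluster: each node is a private dict, nothing is shared."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def node_for(self, key):
        """Map a key to a node with a stable hash, so every client agrees on the owner."""
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)


cluster = Cluster(node_count=3)
for account_id in range(1000):
    cluster.put(account_id, {"balance": 100})

sizes = [len(node) for node in cluster.nodes]
print(sizes)            # roughly a third of the keys on each node
print(cluster.get(42))  # served by exactly one node
```

Plain modulo hashing is the simplest choice, but adding a node moves most keys; consistent hashing or range partitioning are the usual alternatives when the cluster is expected to grow.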
Transaction Bottlenecks
Disk reads/writes
Persist data
Undo/redo logs
Network Communications
Intra-node
Client-server
Concurrency control
Locking: restricts concurrent data access
Latching: restricts concurrent index access
[Figure labels: T1, T2, T3, Processing]
Solutions Strategy
Transaction properties:
Short-lived
No user interaction
Time limit (5-10 seconds)
Access (small) subset of data
Data must be indexed
No full table scans
Repetitive interactions
Same queries with different inputs
Strategy:
Precompile transactions
Parameterized
Serialize transactions
No delays for disk or user
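The strategy above (precompiled, parameterized transactions executed one after another) can be sketched as a single-threaded partition that runs registered procedures from a queue. This is a minimal illustration under stated assumptions, not the code of any particular product: `Partition`, `deposit` and `transfer` are hypothetical names, and the undo copy is a toy stand-in for a real rollback mechanism.

```python
from collections import deque


class Partition:
    """One single-threaded partition that owns its data and runs transactions serially."""

    def __init__(self):
        self.accounts = {}
        self.procedures = {}  # name -> pre-registered ("precompiled") procedure
        self.queue = deque()

    def register(self, name, procedure):
        self.procedures[name] = procedure

    def submit(self, name, *params):
        """Clients send only a procedure name and parameters, never ad hoc logic."""
        self.queue.append((name, params))

    def run(self):
        """Execute queued transactions one at a time: no locks or latches are needed."""
        results = []
        while self.queue:
            name, params = self.queue.popleft()
            snapshot = dict(self.accounts)  # toy undo copy, fine for small int values
            try:
                results.append(self.procedures[name](self.accounts, *params))
            except ValueError as error:
                self.accounts.clear()
                self.accounts.update(snapshot)  # roll back the failed transaction
                results.append(f"aborted: {error}")
        return results


def deposit(accounts, account, amount):
    accounts[account] = accounts.get(account, 0) + amount
    return accounts[account]


def transfer(accounts, source, target, amount):
    accounts[source] = accounts.get(source, 0) - amount
    if accounts[source] < 0:
        raise ValueError("insufficient funds")
    accounts[target] = accounts.get(target, 0) + amount
    return amount


partition = Partition()
partition.register("deposit", deposit)
partition.register("transfer", transfer)
partition.submit("deposit", "alice", 100)
partition.submit("transfer", "alice", "bob", 30)
partition.submit("transfer", "alice", "bob", 500)
results = partition.run()
print(results)             # [100, 30, 'aborted: insufficient funds']
print(partition.accounts)  # {'alice': 70, 'bob': 30}
```

Because each transaction runs to completion before the next starts, the outcome is trivially serializable; the cost is that a single slow transaction stalls the whole partition, which is why the slide insists on short-lived work with no user interaction.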
VoltDB Solution
Work: 12%
Index Management: 11%
Logging: 20%
Buffer Management: 29%
Locking: 18%
Latching: 10%
• Replication
• Failover
• In-memory
• Single threaded transactions
• Timestamp CC
• MVCC: multi-version concurrency control
https://downloads.voltdb.com/datasheets_collateral/technical_overview.pdf
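Since the slide names MVCC (multi-version concurrency control) without explaining it, a minimal sketch of the idea may help: writers append new versions stamped with a commit timestamp, and readers see the newest version that is no later than their snapshot, so reads never block on writes. The `MVCCStore` class and its methods are hypothetical and only illustrate the term; this is not a description of how VoltDB or any other product implements it.

```python
import itertools


class MVCCStore:
    """Toy multi-version store with a logical clock."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), ascending by timestamp
        self._clock = itertools.count(1)

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        commit_ts = next(self._clock)
        self._versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts

    def snapshot(self):
        """Return a timestamp; reads at this timestamp ignore later writes."""
        return next(self._clock)

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts, else None."""
        visible = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts > snapshot_ts:
                break
            visible = value
        return visible


store = MVCCStore()
store.write("price", 10)
old_snapshot = store.snapshot()
store.write("price", 12)  # a later writer does not disturb the earlier reader

old_view = store.read("price", old_snapshot)
new_view = store.read("price", store.snapshot())
print(old_view)  # 10
print(new_view)  # 12
```

A real implementation also needs garbage collection of old versions and a rule for conflicting writers, both omitted here.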
VoltDB Video
Ryan Betts, VoltDB's Chief Technology Officer, speaks in depth about VoltDB's architecture.
https://www.youtube.com/watch?v=XPA7zNRshRU
System Architecture
Redesigned for latest hardware
Multiple copies
No buffer pools
Avoid locking (serialised transactions)
Minimise logging: DDL statements only
Limitations
Latency: requests still have to wait for data to arrive
Limited by speed of light
Limited by bandwidth
SQL subset
No negated sub-queries
Subset of aggregates
A P
NewSQL Systems
Variety of architectures
NuoDB: MVCC + append only key-value storage
VoltDB: In-memory, multi-master replication, with disk snapshots
ScaleDB: Cluster manager for MySQL with locks
FoundationDB: Key-value storage with SQL and ACID layer
Clustrix: self-managed MySQL replacement for a cluster
MemSQL: column-oriented disk persistence, in-memory row-oriented storage, lock-free, compiled plans, replicas
https://medium.com/@dilekamadushan/introduction-to-mysql-cluster-84393a12e2f9
Postgres-XL is a horizontally scalable open source SQL database cluster, flexible enough to
handle varying database workloads:
OLTP write-intensive workloads
Business Intelligence requiring MPP parallelism
Operational data store
Key-value store
GIS Geospatial
Mixed-workload environments
Multi-tenant provider hosted environments
Citus
https://www.citusdata.com/product
SQL
https://www.holistics.io/blog/the-rise-of-sql-based-data-modeling-and-dataops/
https://query-tutorial.couchbase.com/tutorial/#1
Comparison
Summary
RDBMS evolution due to out-of-date hardware assumptions
First: relational column-stores (different from wide-column stores)
Second: NewSQL
NewSQL: Reaction to NoSQL
Scale-out not up
Replication
Benefits:
OLAP
Data warehouses
Column-based operations: retrieve multiple values, aggregate queries, adjust price of values
Reduced storage space
Disadvantages:
OLTP
Row-based operations: new record
S. Harizopoulos, D. Abadi and P. Boncz, "Column-Oriented Database Systems", VLDB 2009 Tutorial, p. 5.
http://www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf
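One reason column stores can reduce storage space is that values within a column are of one type and often repeat, which makes simple schemes such as run-length encoding effective. The sketch below is an illustration under the assumption that the column is sorted; the function names and the sample surname column are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse each run of equal adjacent values into a (value, run_length) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def count_value(pairs, target):
    """Answer an aggregate query directly on the compressed form."""
    return sum(count for value, count in pairs if value == target)


# Hypothetical surname column, sorted so that equal values are adjacent.
surnames = ["Addams", "Addams", "Simpson", "Simpson", "Simpson"]
encoded = rle_encode(surnames)
print(encoded)                          # [('Addams', 2), ('Simpson', 3)]
print(count_value(encoded, "Simpson"))  # 3
```

The count is computed without decompressing, which is part of why aggregate queries suit this layout; inserting a single new record, by contrast, touches every column, matching the OLTP disadvantage listed above.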
• https://www.youtube.com/watch?v=wTPGW1PNy_Y&app=desktop
• https://www.couchbase.com/products/n1ql
• https://query-tutorial.couchbase.com/tutorial/#1