
NewSQL

Big Data Management


Phil Bartie
[email protected]
EM G.29

Database Landscape
[Figure: database products grouped by data model]

RDF: Virtuoso, Jena, Stardog, RDF4J, GraphDB, Blazegraph
Object: Caché, Db4o, Versant, ObjectStore
XML: MarkLogic, Sedna, Tamino, BaseX, eXist-db
Relational: Oracle, MySQL, MS SQL Server, PostgreSQL, DB2, SQLite, MS Access, Teradata, SAP Adaptive Server, Hive, FileMaker, MariaDB, Informix, Vertica, SAP HANA

NoSQL
Key-Value: Redis, Memcached, DynamoDB, Riak KV, Aerospike, SimpleDB
Document: MongoDB, CouchBase, Elasticsearch
Wide-Column: Cassandra, HBase, Accumulo, HyperTable
Graph: Neo4J, Titan, Giraph, InfiniteGraph

NewSQL: Google Spanner, Clustrix, VoltDB, MemSQL, NuoDB
2021

https://mattturck.com/data2021/
Database Universe – value of data over time

VoltDB Co-Founder and Chief Strategy Officer Scott Jarr on the value of data in real-time.

https://www.youtube.com/watch?v=w2hCJIZz3n8&feature=youtu.be&t=186

History in No-tation

http://strataconf.com/london/public/schedule/detail/32351

Traditional DBMS Architecture


[Figure: components of a traditional DBMS]
• Transactions
• Schema management
• Query compilation
• Query execution
• Logging
• Security and cataloguing
• Concurrency control (locks)
• Request data transfer to working memory
• Optimise data distribution on disk and access
• Magnetic disk

https://www.cs.oberlin.edu/~jdonalds/311/fig02-01.png

Work performed by RDBMS


• Locking 24%
• Latching 24%
• Buffer Pool 24%
• Recovery 24%
• Work 4%

Latching: protects memory data structures from concurrent threads

Do you spot any issues with this data? Do you trust it?

http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong/

Work performed by RDBMS


• Work 12%
• Index Management 11%
• Logging 20%
• Buffer Management 29%
• Locking 18%
• Latching 10%

https://downloads.voltdb.com/datasheets_collateral/technical_overview.pdf
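
As a quick check on the VoltDB figures above, the shares can be added up to see how little of the effort is useful work. This is only a sketch: the name `breakdown` is illustrative, and treating everything except "Work" as overhead is an assumption.

```python
# Shares of RDBMS effort from the VoltDB breakdown above (percent).
breakdown = {
    "Work": 12,
    "Index Management": 11,
    "Logging": 20,
    "Buffer Management": 29,
    "Locking": 18,
    "Latching": 10,
}

total = sum(breakdown.values())

# Assumption: everything except "Work" is counted as overhead here.
overhead = total - breakdown["Work"]

print(f"total = {total}%, useful work = {breakdown['Work']}%, overhead = {overhead}%")
```

Under that assumption, roughly seven eighths of the effort goes on management rather than on the query itself.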

Traditional RDBMS Assumptions


Operating on single machine
Data on disk – disk access slow
Limited memory
Single copy of each record
Many subsystems for management!

http://hadoop4japan.files.wordpress.com/2012/04/kb_scale_outup.png

Column-oriented DBMS


Row Based Storage


ID   Firstname  Lastname  Number       Email
001  Wednesday  Addams    01412318743  [email protected]
004  Pugsley    Addams    01911211538  [email protected]
002  Bart       Simpson   0131444777   [email protected]
003  Lisa       Simpson   NULL         [email protected]

001:Wednesday,Addams,01412318743,[email protected];002:Bart,Simpson,0131444777,[email protected];003:Lisa,Simpson,NULL,[email protected];

004:Pugsley,Addams,01911211538,[email protected];

How many disk blocks need to be accessed to count the number of Simpsons?

Column Based Storage


ID   Firstname  Lastname  Number       Email
001  Wednesday  Addams    01412318743  [email protected]
004  Pugsley    Addams    01911211538  [email protected]
002  Bart       Simpson   0131444777   [email protected]
003  Lisa       Simpson   NULL         [email protected]

Wednesday:001,Bart:002,Lisa:003,Pugsley:004;Addams:001,004,Simpson:002,003;01412318743:001,0131444777:002,NULL:003,01911211538:004;

[email protected]:001,[email protected]:002,[email protected]:003,[email protected]:004;

How many disk blocks need to be accessed to count the number of ‘Simpson’?
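
The block-counting question on the two storage slides can be tried out with a toy model. This is a sketch, not how any real engine lays out pages: blocks are plain Python lists and dicts, the Number and Email values are left out for brevity, and it assumes the system already knows which block holds the Lastname column.

```python
# Row layout: each block holds whole records (ID, Firstname, Lastname).
row_blocks = [
    [("001", "Wednesday", "Addams"), ("002", "Bart", "Simpson"), ("003", "Lisa", "Simpson")],
    [("004", "Pugsley", "Addams")],
]

# Column layout: each block holds whole columns, mapping value -> record IDs.
column_blocks = [
    {
        "Firstname": {"Wednesday": ["001"], "Bart": ["002"], "Lisa": ["003"], "Pugsley": ["004"]},
        "Lastname": {"Addams": ["001", "004"], "Simpson": ["002", "003"]},
    },
    {"Email": {}},  # values omitted; this block is never needed for the query
]


def count_rows(blocks, lastname):
    """Scan every block, because any record might match."""
    blocks_read, matches = 0, 0
    for block in blocks:
        blocks_read += 1
        matches += sum(1 for _id, _first, last in block if last == lastname)
    return matches, blocks_read


def count_columns(blocks, lastname):
    """Read only the block(s) that hold the Lastname column."""
    blocks_read, matches = 0, 0
    for block in blocks:
        if "Lastname" in block:  # assumed to be known from a catalogue
            blocks_read += 1
            matches += len(block["Lastname"].get(lastname, []))
    return matches, blocks_read


print(count_rows(row_blocks, "Simpson"))        # (2, 2)
print(count_columns(column_blocks, "Simpson"))  # (2, 1)
```

Both layouts find two Simpsons, but the row layout touches both blocks while the column layout touches one.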

Hardware Effects

Traditional HW              Modern HW
• Slow CPU                  • Fast CPUs
                            • Multiple caches
• Limited memory            • Large memory
• Large spinning disks      • Flash disk
  • Slow access               • Fast access
• Favours row storage!      • Favours column storage!


Column-Orientated DBMS

https://www.youtube.com/watch?v=8KGVFB3kVHQ

Column Oriented Applications


• Data warehousing
  • Mainly read operations
• Information Retrieval
  • Efficient use of indexes
• Scientific databases, e.g. Sloan Digital Sky Survey
  • Wide tuples, focus on small number at a time
• RDF
  • Matches query patterns based on graphs

Column Orientated Relational Database Systems

C-Store


New SQL


451 Group’s Definition of New SQL

A DBMS that delivers the scalability and flexibility promised by NoSQL while retaining the support for SQL queries and/or ACID, or to improve performance for appropriate workloads.

Source: https://451research.com/report-short?entityId=66963

Stonebraker’s Definition
• SQL as primary interface
• ACID support for transactions
• Non-locking concurrency control
• High per-node performance
• Parallel, shared-nothing architecture

M. Stonebraker. New SQL: An Alternative to NoSQL and Old SQL for New OLTP Apps. Blog@Communications of the ACM. 16 June 2011. http://cacm.acm.org/blogs/blog-cacm/109710-new-sql-an-alternative-to-nosql-and-old-sql-for-new-oltp-apps/fulltext

http://hadoop4japan.files.wordpress.com/2012/04/kb_scale_outup.png

Application Areas
• OLTP: Online Transaction Processing
  • Financial
  • ATMs
  • Order systems
  • Retail sales
  • …
• Day-to-day operations
• Requires ACID guarantees
• What about CAP?

http://imagens.canaltech.com.br/50452.69272-OLTP-OLAP.png

Key requirements
• Strong transactional guarantees – ACID
  • Not offered by most NoSQL
• Consistency requirements
• Scale-out not scale-up
• Support distribution through
  • Concurrency control
  • Flow control
  • Query processing

http://hadoop4japan.files.wordpress.com/2012/04/kb_scale_outup.png

Transaction Bottlenecks
• Disk reads/writes
  • Persist data
  • Undo/redo logs
• Network Communications
  • Intra-node
  • Client-server
• Concurrency control
  • Locking: restricts concurrent data access
  • Latching: restricts concurrent index access

[Figure: transactions T1, T2 and T3; Processing]

Transactions Revisited

• Short-lived:
  • No user interaction
  • Time limit (5-10 seconds)
• Access (small) subset of data
  • Data must be indexed
  • No full table scans
• Repetitive interactions
  • Same queries with different inputs

Solutions Strategy

• Precompile transactions
  • Parameterized
• Serialize transactions
  • No delays for disk or user
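
The strategy of precompiled, parameterized, serialized transactions can be sketched in a few lines. This is a toy model only: the `accounts` data, the `transfer` procedure and the `procedures` registry are hypothetical names, and a real system would add durability and error handling.

```python
from collections import deque

# In-memory "table": account id -> balance. Illustrative data only.
accounts = {"alice": 100, "bob": 50}


def transfer(src, dst, amount):
    """A short transaction: touches two keys, needs no user interaction."""
    if accounts[src] < amount:
        return "aborted"
    accounts[src] -= amount
    accounts[dst] += amount
    return "committed"


# Procedures are registered up front ("precompiled"); clients send only
# a procedure name plus parameters, never ad hoc code.
procedures = {"transfer": transfer}

# Requests queue up and are run one at a time by a single thread,
# so no locks or latches are needed on `accounts`.
queue = deque([
    ("transfer", ("alice", "bob", 30)),
    ("transfer", ("bob", "alice", 200)),  # aborts: insufficient funds
    ("transfer", ("bob", "alice", 80)),
])

results = []
while queue:
    name, params = queue.popleft()
    results.append(procedures[name](*params))

print(results)   # ['committed', 'aborted', 'committed']
print(accounts)  # {'alice': 150, 'bob': 0}
```

Because each transaction is short and never waits for disk or a user, running them strictly in order costs little and removes the need for concurrency control.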

VoltDB Solution
[Figure: RDBMS work breakdown, annotated]
• Work 12%
• Index Management 11%
• Logging 20%
• Buffer Management 29%
• Locking 18%
• Latching 10%

Annotations on the chart:
• In-memory
• Replication
• Failover
• Single threaded transactions
• Timestamp CC
• MVCC - multi version concurrency control

https://downloads.voltdb.com/datasheets_collateral/technical_overview.pdf
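
The idea behind multi version concurrency control (MVCC) with timestamps can be illustrated with a tiny in-memory store. This is a general sketch of the technique, not VoltDB's implementation: `MVCCStore` and its methods are hypothetical, a logical counter stands in for real timestamps, and conflicts between writers are not checked.

```python
class MVCCStore:
    """Keeps every written version of a key, tagged with a commit timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical timestamp source

    def write(self, key, value):
        """Append a new version instead of overwriting the old one."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot(self):
        """A reader remembers the current timestamp and reads as of that time."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("price", 10)
snap = store.snapshot()      # a reader starts here
store.write("price", 12)     # a later writer does not disturb that reader

print(store.read("price", snap))              # 10
print(store.read("price", store.snapshot()))  # 12
```

Readers never block writers, because each reader sees a consistent older version rather than waiting on a lock.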

VoltDB Video

Ryan Betts, VoltDB's Chief Technology Officer, speaks in depth about VoltDB's architecture.

https://www.youtube.com/watch?v=XPA7zNRshRU

System Architecture
Redesigned for latest hardware


New Architecture Assumptions


• Distributed shared-nothing cluster
  • Move queries to the data
  • Aim for stateless query execution
• Data resides in memory
  • Backed up to disk
• Multiple copies
• No buffer pools
• Avoid locking (serialised transactions)
• Minimise logging: DDL statements only
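
"Move queries to the data" in a shared-nothing cluster can be sketched with hash partitioning. This is a minimal model under stated assumptions: the `Node` class, the `route` function and the three-node cluster are hypothetical, and CRC32 is just one convenient stable hash.

```python
import zlib


class Node:
    """One shared-nothing node: owns its own slice of the data, in memory."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def execute(self, op, key, value=None):
        # The query runs where the data lives; only the result travels back.
        if op == "put":
            self.rows[key] = value
            return "ok"
        return self.rows.get(key)


nodes = [Node("node0"), Node("node1"), Node("node2")]


def route(key):
    """Pick the owning node from a stable hash of the partition key."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


for first, last in [("wednesday", "Addams"), ("bart", "Simpson"), ("lisa", "Simpson")]:
    route(first).execute("put", first, last)

print(route("bart").execute("get", "bart"))  # Simpson
print(sum(len(n.rows) for n in nodes))       # 3
```

Each key has exactly one owner, so a single-key transaction involves one node and needs no coordination with the others.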

Limitations
• Latency: still going to wait to get data
  • Limited by speed of light
  • Limited by bandwidth
• SQL subset
  • No negated sub-queries
  • Subset of aggregates
• Ad hoc schema changes
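
The speed-of-light limit can be turned into a rough lower bound on latency. This is back-of-the-envelope arithmetic: the distances are hypothetical, and the two-thirds factor for signal speed in optical fibre is a common approximation, not a measured value.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # in a vacuum
FIBRE_FACTOR = 2 / 3               # rough assumption for light in optical fibre


def min_round_trip_ms(distance_km, factor=FIBRE_FACTOR):
    """Lower bound on round-trip time: there and back at the signal speed."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * factor)
    return 2 * one_way_s * 1000


for km in (1, 500, 5_000):  # hypothetical distances between nodes
    print(f"{km:>5} km: at least {min_round_trip_ms(km):.3f} ms per round trip")
```

No amount of extra memory or CPU removes this delay, which is why keeping related data on the same node matters.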

Where are NewSQL?


[Figure: CAP triangle with corners C, A and P]

CA: Guarantees to give a correct response but only while network works fine (Centralised / Traditional)

CP: Guarantees responses are correct even if there are network failures, but response may fail (Weak availability)

AP: Always provides a “best-effort” response even in presence of network failures (Eventual consistency)

But (like Dynamo), tables are tunable towards CP

NewSQL Systems

Variety of architectures
• NuoDB: MVCC + append only key-value storage
• VoltDB: In-memory, multi-master replication, with disk snapshots
• ScaleDB: Cluster manager for MySQL with locks
• FoundationDB: Key-value storage with SQL and ACID layer
• Clustrix: self-managed MySQL replacement for a cluster
• MemSQL: Column-oriented disk persistence, in memory row oriented, lock-free, compiled plans. Replicas

Distributed DBMS: MySQL Cluster

• MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL ("NDB" stands for Network Database)
• MySQL Cluster is a fault-tolerant, in-memory clustered database designed for high availability (99.9%) and fast automatic failover, all running on cost-effective commodity hardware.

https://medium.com/@dilekamadushan/introduction-to-mysql-cluster-84393a12e2f9
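
An availability percentage is easier to judge when converted into downtime. The sketch below uses the 99.9% figure quoted on the slide; the function name is illustrative and leap years are ignored.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_hours_per_year(availability_percent):
    """Expected unavailable hours per year at a given availability."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


print(f"{downtime_hours_per_year(99.9):.2f} hours/year at 99.9%")
```

At 99.9% the system may be unavailable for almost nine hours a year, so each extra "nine" cuts the permitted downtime by a factor of ten.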

Distributed DBMS: Postgres-XL

• Postgres-XL is a horizontally scalable open source SQL database cluster, flexible enough to handle varying database workloads:
  • OLTP write-intensive workloads
  • Business Intelligence requiring MPP parallelism
  • Operational data store
  • Key-value store
  • GIS Geospatial
  • Mixed-workload environments
  • Multi-tenant provider hosted environments

Citus
• https://www.citusdata.com/product

HEAVY.AI (was OmniSci)


• GPU databases: https://www.omnisci.com/demos/tweetmap

SQL

• https://www.holistics.io/blog/the-rise-of-sql-based-data-modeling-and-dataops/

Couchbase SQL++ / N1QL (“nickel”)

• SQL++, formerly called N1QL (pronounced “nickel”), is Couchbase’s next-generation query language. SQL++ aims to meet the query needs of distributed document-oriented databases. This document specifies the syntax and semantics of the SELECT statement in SQL++.
• The N1QL data model derives its name from the non-first normal form, which is a superset and generalization of the relational first normal form (1NF).

https://query-tutorial.couchbase.com/tutorial/#1

Couchbase SQL++ example

Comparison


Source: Almassabi, IJDMS:10(2), 2018



Summary
RDBMS evolution due to
• Out-of-date hardware assumptions
  • First: relational column-stores (different from wide-column stores)
  • Second: NewSQL
• NewSQL: Reaction to NoSQL
  • Scale-out not up
  • Replication

Relational Column Orientated Databases

Benefits
• OLAP
• Data warehouses
• Column-based operations:
  • Retrieve multiple values
  • Aggregate queries
  • Adjust price of values
• Reduced storage space

Disadvantages
• OLTP
• Row-based operations
  • New record

S. Harizopoulos, D. Abadi and P. Boncz, "Column-Oriented Database Systems", VLDB 2009 Tutorial, p. 5.
http://www.cs.yale.edu/homes/dna/talks/Column_Store_Tutorial_VLDB09.pdf

New SQL (response to NoSQL)


• Distributed shared-nothing cluster
  • Scale-out
• Data resides in memory
• Fast transactions
  • No user interaction
  • Tight upper time limits
• ACID guarantees
• Target OLTP

Links to check out:

Why SQL is awesome (video)

• https://www.youtube.com/watch?v=wTPGW1PNy_Y&app=desktop

SQL++ / N1QL – CouchBase (web page and interactive shell)

• https://www.couchbase.com/products/n1ql

• https://query-tutorial.couchbase.com/tutorial/#1