11-NoSQL Nhom8
Systems
Dr. Phan Thị Hà
◼ Trends
❑ Big data
◼ Unstructured data
❑ Data interconnection
◼ Hyperlinks, tags, blogs, etc.
❑ Very high scalability
◼ Data size, numbers of users
◼ Limits of relational DBMSs (SQL)
❑ Need for skilled DBA and well-defined schemas
❑ SQL and complex tuning
❑ Hard to make updates scalable
◼ Parallel RDBMSs use a shared-disk architecture for OLTP
◼ The CAP theorem
◼ Polemical topic
❑ “A database can't provide consistency AND availability during a
network partition”
❑ Argument used by NoSQL to justify their lack of ACID properties
❑ But has nothing to do with scalability
◼ Two different points of view
❑ Relational databases
◼ Consistency is essential
❑ ACID transactions
❑ Distributed systems
◼ Service availability is essential
❑ Inconsistency tolerated by the user, e.g. web cache
[Figure: two clients, each accessing one of two replicas (DB1, DB2) during a network partition: AP OK, C not OK]
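The trade-off in the figure can be sketched as a toy simulation. This is a minimal illustration, not a real replication protocol: the `Replica` and `Cluster` classes and the `mode` switch are hypothetical names, and the "CP" branch is simplified to refusing every request during a partition.

```python
class Replica:
    """One copy of a single-value database."""
    def __init__(self, name):
        self.name = name
        self.value = None


class Cluster:
    """Two replicas; `mode` is a hypothetical switch: 'AP' or 'CP'."""
    def __init__(self, mode):
        self.mode = mode
        self.db1 = Replica("DB1")
        self.db2 = Replica("DB2")
        self.partitioned = False

    def write(self, replica, value):
        other = self.db2 if replica is self.db1 else self.db1
        if self.partitioned:
            if self.mode == "CP":
                # Consistency kept, availability lost.
                raise RuntimeError("unavailable during partition")
            # AP: accept the write locally; the other replica goes stale.
            replica.value = value
            return
        # No partition: replicate synchronously to both copies.
        replica.value = value
        other.value = value

    def read(self, replica):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return replica.value


ap = Cluster("AP")
ap.write(ap.db1, "v1")
ap.partitioned = True          # the network splits DB1 from DB2
ap.write(ap.db1, "v2")         # still accepted: availability preserved
print(ap.read(ap.db1), ap.read(ap.db2))  # v2 v1 (the two clients disagree)
```

In "AP" mode both clients keep getting answers but see different values; in "CP" mode the same write would raise an error instead.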
❑ NoSQL systems
◼ Main applications
❑ Document systems
❑ Content Management Systems
❑ Catalogs
❑ Personalization
❑ Analysis of messages (tweets, etc.) in real time
❑ Etc.
◼ Documents
❑ Hierarchical structure, with nesting of elements
❑ Weak structuring, with "similar" elements
❑ Base types: text, but also integer, real, date, etc.
◼ Two main data models
❑ XML (eXtensible Markup Language): W3C standard (1998) for
exchanging data on the Web
◼ Complex and heavy
❑ JSON (JavaScript Object Notation) by Douglas Crockford (2005)
for exchanging data in JavaScript
◼ Simple and light
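To make the comparison concrete, here is a small sketch that serializes the same nested document in both models using only the Python standard library. The document content and the element names (`document`, `tag`, etc.) are made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical document: nested, weakly structured, with typed base values.
doc = {"title": "NoSQL", "year": 2020, "tags": ["bigdata", "cloud"]}

# JSON: one call, and base types (here the integer) survive the round trip.
json_text = json.dumps(doc)
json_back = json.loads(json_text)

# XML: the hierarchy is built element by element; all values become text.
root = ET.Element("document")
ET.SubElement(root, "title").text = doc["title"]
ET.SubElement(root, "year").text = str(doc["year"])
tags = ET.SubElement(root, "tags")
for t in doc["tags"]:
    ET.SubElement(tags, "tag").text = t
xml_text = ET.tostring(root, encoding="unicode")
xml_back = ET.fromstring(xml_text)

print(json_text)
print(xml_text)
print(type(json_back["year"]).__name__, type(xml_back.find("year").text).__name__)
```

For this document the XML form is longer than the JSON form, and the year comes back as a string unless a schema or application code converts it.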
❑ NewSQL systems
◼ SQL/JSON DBMS
❑ Access from a JDBC driver
◼ Key-value store (KiVi)
❑ Dual SQL/KV interface over relational data with efficiency,
elasticity, high availability, indexing, …
❑ Fast, parallel data ingestion
❑ Polystore access: HDFS, NoSQL, …
◼ OLAP parallel processing
❑ Based on the Apache Calcite optimizer
❑ Extensive push down of operators to KiVi
◼ Ultra-scalable transaction processing
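The idea of a dual SQL/KV interface over the same relational data can be sketched as follows. This is only an illustration using SQLite from the Python standard library: the `DualStore` class, its methods, and the `items` table are hypothetical and say nothing about how KiVi is actually implemented (here the key-value calls are layered on SQL, which is an assumption made for brevity).

```python
import sqlite3


class DualStore:
    """Hypothetical store exposing key-value calls and SQL over one table."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE items (k TEXT PRIMARY KEY, price REAL)")

    def put(self, key, price):
        # Key-value write: insert or overwrite the row with this key.
        self.conn.execute(
            "INSERT OR REPLACE INTO items (k, price) VALUES (?, ?)",
            (key, price))

    def get(self, key):
        # Key-value read: point lookup on the primary key.
        row = self.conn.execute(
            "SELECT price FROM items WHERE k = ?", (key,)).fetchone()
        return None if row is None else row[0]

    def sql(self, query, params=()):
        # SQL interface over the very same rows.
        return self.conn.execute(query, params).fetchall()


store = DualStore()
store.put("apple", 1.5)
store.put("pear", 2.0)
store.put("apple", 1.75)       # overwrite through the KV interface
total = store.sql("SELECT COUNT(*), SUM(price) FROM items")
print(store.get("apple"), total)
```

Point reads and writes go through `get`/`put`, while analytical queries use `sql` on the same data without any export step.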
[Figure: architecture. A query engine on top of the KV store (KiVi), made of a KV master server and several KV data servers, with independent scale out]
[Figure: single-node bottleneck vs. processing and committing transactions in parallel while providing a consistent view; both panels are drawn over time]
VoltDB Inc.: VoltDB. Analytics and transactional; in-memory; open source and proprietary versions
❑ Polystores
◼ Multidatabase systems
❑ A few databases (e.g. fewer than 10)
◼ Corporate DBs
❑ Powerful queries (with updates and transactions)
◼ Web data integration systems
❑ Many data sources (e.g. thousands)
◼ DBs or files behind a web server
❑ Simple queries (read-only)
◼ Mediator/wrapper architecture
◼ In the cloud, more opportunities for an efficient multistore
architecture
❑ No restriction on where mediator and wrapper components must
be installed
◼ Examples
❑ BigIntegrator (Uppsala University)
❑ Forward (UC San Diego)
❑ QoX (HP Labs)
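The mediator/wrapper architecture mentioned above can be sketched in a few lines. This is a toy under stated assumptions, not the design of any of the systems listed: the class names, the `scan` method, and the sample data are all hypothetical. Each wrapper translates its source into a common model (lists of dicts), and the mediator combines the results.

```python
import csv
import io


class ListWrapper:
    """Wrapper over in-memory rows (stand-in for a corporate database)."""
    def __init__(self, rows):
        self.rows = rows

    def scan(self):
        return [dict(r) for r in self.rows]


class CsvWrapper:
    """Wrapper over CSV text (stand-in for a file behind a web server)."""
    def __init__(self, text, types):
        self.text = text
        self.types = types  # column name -> conversion function

    def scan(self):
        reader = csv.DictReader(io.StringIO(self.text))
        # Convert raw strings into the common, typed data model.
        return [{c: self.types[c](v) for c, v in row.items()}
                for row in reader]


class Mediator:
    """Decomposes a join into per-source scans and combines the results."""
    def __init__(self, **wrappers):
        self.wrappers = wrappers

    def join(self, left, right, key):
        index = {}
        for row in self.wrappers[right].scan():
            index.setdefault(row[key], []).append(row)
        return [{**lrow, **rrow}
                for lrow in self.wrappers[left].scan()
                for rrow in index.get(lrow[key], [])]


customers = ListWrapper([{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}])
orders = CsvWrapper("id,amount\n1,30\n1,12\n3,99\n",
                    {"id": int, "amount": int})
mediator = Mediator(customers=customers, orders=orders)
rows = mediator.join("customers", "orders", "id")
print(rows)
```

Because the mediator only talks to the `scan` interface, adding a new data store means writing one more wrapper and leaving the mediator unchanged.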
◼ Supports SQL++
❑ SQL-like language
❑ Semi-structured data model
◼ Extends JSON and relational data models
◼ Rich web development frameworks
❑ Integrate visualization components (e.g., Google Maps)
◼ Global As View approach for the global schema
❑ Each data source (SQL or NoSQL) appears to the user as an
SQL++ virtual view, defined over SQL++ collections
◼ Architecture
❑ Query processor
◼ Performs SQL++ query decomposition
◼ Cost based optimization
❑ One wrapper per data store
◼ Self-tuning polystore
❑ Automatic data distribution and partitioning across the different
data stores
❑ Each data partition is internally described as a materialized view
over one or several data collections
◼ Query processor
❑ Deals with single model queries only, each expressed in the
query language of the corresponding data source
◼ To integrate various data sources, one would need a common data
model and language on top of Estocada
❑ Query processing involves view-based query rewriting and cost-
based optimization
/* Integration query */
SELECT T1.x, T2.z
FROM T1 JOIN T2
ON T1.x = T2.x
/* SQL sub-query */
T1(x int, y int)@DB1 =
( SELECT x, y FROM A )
/* Native sub-query */
T2(x int, z string)@DB2 =
{*
db.B.find( {x: {$lt: 10}}, {x:1, z:1, _id:0} )
*}
[Figure: query plan. The join on x runs at @CloudMdsQL; T1@DB1 returns x, y from A (MonetDB); the native sub-query T2@DB2 returns x, z from B (MongoDB)]
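What the integration query above computes can be traced by hand in plain Python. The contents of `A` and `B` below are invented sample data, and the list comprehensions only mimic the semantics of the two sub-queries and the join; they are not how CloudMdsQL executes them.

```python
# Hypothetical contents of table A (MonetDB) and collection B (MongoDB).
A = [{"x": 1, "y": 10, "w": 0},
     {"x": 12, "y": 20, "w": 0},
     {"x": 3, "y": 30, "w": 0}]
B = [{"_id": "a", "x": 1, "z": "one"},
     {"_id": "b", "x": 12, "z": "twelve"},
     {"_id": "c", "x": 5, "z": "five"}]

# SQL sub-query at DB1: SELECT x, y FROM A
T1 = [{"x": r["x"], "y": r["y"]} for r in A]

# Native sub-query at DB2: documents with x < 10, keeping x and z only
T2 = [{"x": d["x"], "z": d["z"]} for d in B if d["x"] < 10]

# Integration query: SELECT T1.x, T2.z FROM T1 JOIN T2 ON T1.x = T2.x
result = [(t1["x"], t2["z"]) for t1 in T1 for t2 in T2 if t1["x"] == t2["x"]]
print(result)  # [(1, 'one')]
```

The row with x = 12 exists in both sources but is dropped because the native sub-query filters on x < 10 before the join.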
/* SQL subquery */
T1(title string, kw string)@rdbms = ( SELECT title, kw FROM tbl )
/* MFR subquery */
T2(word string, count int)@hdfs = {*
SCAN(TEXT,'words.txt')
.MAP(KEY,1)
.REDUCE(SUM)
.PROJECT(KEY,VALUE) *}
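The MFR (map/filter/reduce) subquery above is a word count. A rough Python equivalent of each step is sketched below; the function names are hypothetical, the input is an inline string rather than `words.txt`, and it is assumed that the scan yields one record per whitespace-separated word.

```python
from collections import defaultdict


def scan(text):
    # SCAN(TEXT, ...): assumed to yield one record per word.
    return text.split()


def map_step(words):
    # MAP(KEY, 1): emit a (word, 1) pair for every record.
    return [(w, 1) for w in words]


def reduce_sum(pairs):
    # REDUCE(SUM): add up the values of each key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


text = "to be or not to be"
# PROJECT(KEY, VALUE): expose (word, count) rows, i.e. T2(word, count).
T2 = sorted(reduce_sum(map_step(scan(text))).items())
print(T2)  # [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

The resulting rows match the declared signature `T2(word string, count int)`, so they can be joined with the SQL subquery `T1` like any other table.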
[Table: comparison of polystores (including BigDAWG), grouped as loosely-coupled, tightly-coupled, and hybrid]