BDS Session 5 - NoSQL DB

Big data system notes-5

DSECL ZG 522: Big Data Systems

Session 5: NoSQL Databases

John Benito JP
Topics for today

• NoSQL Introduction
• Classification
• Examples
• Cassandra
• Mongo
• GraphDBs: Neo4J and Tinkerpop
Why NoSQL (1)

• RDBMS meant for OLTP systems / Systems of Record


• Strict consistency and durability guarantees (ACID) over multiple data items involved in a
transaction
• But they have scale and cost issues with large volumes of data, distributed geo-scale applications,
very high transaction volumes
• Typical web scale systems do not need strict consistency and durability for every use case
• Social networking
• Real-time applications
• Log analysis
• Browsing retail catalogs
• Reviews and blogs
• …
Why NoSQL (2)

• RDBMS ensure uniform structure and modelling of relationships between entities


• A class of emerging applications need granular and extreme connectivity
information modelled between individual semi-structured data items. This
information needs to be also queried at scale without large expensive joins.
• Connectivity between users in a social media application: How many friends
do you have within 2 hops?
• Connectivity between companies in terms of domain, technology, people
skills, hiring : Useful for skills acquisition, M&A etc.
• Connectivity between IT network devices: Useful for troubleshooting incidents
What is NoSQL ?
• Coined by Carlo Strozzi in 1998
✓ Lightweight, open source database without standard SQL interface
• Reintroduced by Johan Oskarsson in 2009
✓ Non-relational databases

• Characteristics
✓ Not Only SQL
✓ Non-relational
✓ Schema-less
✓ Loosen consistency to address scalability and availability requirements in large scale
applications
✓ Open source movement born out of web-scale applications
✓ Distributed for scale
✓ Cluster Friendly
Data model

• Supports a rich variety of data: structured, semi-structured and unstructured
• No fixed schema, i.e. each record could have different attributes
• Non-relational - no join operations are typically supported
• Transaction semantics for multiple data items are typically not
supported
• Relaxed consistency semantics - no support for ACID as in RDBMS
• In some cases can model data as graphs and queries as graph
traversals
Classification of NoSQL DBs

• Key – value
✓ Maintains a big hash table of keys and values
✓ Example : Dynamo, Redis, Riak etc
• Document
✓ Maintains data in collections of documents
✓ Example : MongoDB, CouchDB etc
• Column
✓ Each storage block has data from only one column
✓ Example : Cassandra, HBase
• Graph
✓ Network databases
✓ Stores data as nodes (vertices) connected by edges
✓ Example : Neo4j, HyperGraphDB, Apache Tinkerpop
Characteristics
• Scale out architecture instead of monolithic architecture of relational databases
• Cluster scale - distribution across 100+ nodes across DCs
• Performance scale - 100K+ DB reads and writes per sec
• Data scale - 1B+ docs in DB
• House large amount of structured, semi-structured and unstructured data
• Dynamic schemas
✓ allows insertion of data without pre-defined schema

• Auto sharding
✓ automatically spreads data across the number of servers
✓ applications are not aware about it
✓ helps in data balancing and recovery from failure

• Replication
✓ Good support for replication of data which offers high availability, fault tolerance
Pros and Cons

Pros
• Cost effective for large data sets
• Easy to implement
• Easy to distribute, especially across DCs
• Easier to scale up/down
• Relaxes data consistency when required
• No pre-defined schema
• Easier to model semi-structured data or connectivity data
• Easy to support data replication

Cons
• Limited or no support for joins between data sets / tables
• Limited or no support for group by operations
• Limited or no ACID properties for transactions
• No standard SQL interface
• Lack of standardisation in this space
• Makes it difficult to port from SQL and across NoSQL stores
• Fewer skilled people compared to SQL
• Fewer BI tools compared to the mature SQL BI space
SQL vs NoSQL

SQL | NoSQL
Relational database | Non-relational, distributed databases
Pre-defined schema | Schema-less
Table based databases | Multiple options: Key-Value, Document, Column, Graph
Vertically scalable | Horizontally scalable
Supports ACID properties | Trades off consistency and availability under the CAP theorem
Supports complex querying | Relatively simpler querying
Excellent support from vendors | Relies heavily on community support
Vendors

• Amazon
• Facebook
• Google
• Oracle
Classification: Document-based

• Store data in form of documents using well known formats like JSON
• Documents accessible via their id, but can be accessed through other index as well
• Maintains data in collections of documents
• Example,
• MongoDB, CouchDB, CouchBase

• Book document :
{
"Book Title" : "Database Fundamentals",
"Publisher" : "My Publisher",
"Year of Publication" : "2020"
}
Classification: Key-Value store

• Simple data model based on fast access by the key to the value associated with the key
• Value can be a record or object or document or even complex data structure
• Maintains a big hash table of keys and values
• For example,
✓ Dynamo, Redis, Riak

Key Value
2014HW112220 { Santosh,Sharma,Pilani}
2018HW123123 {Eshwar,Pillai,Hyd}
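
The key-value model above can be sketched in a few lines. This is only an illustration that uses a Python dict as the "big hash table"; the class name `KeyValueStore`, its methods and the field names inside the values are hypothetical and do not correspond to the API of Dynamo, Redis or Riak.

```python
from typing import Any, Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: one hash table, values are opaque to the store."""

    def __init__(self) -> None:
        self._table: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # The store never inspects the value; it can be a record, object or document.
        self._table[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Lookup is by key only; there is no query on the value's contents.
        return self._table.get(key)

    def delete(self, key: str) -> bool:
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("2014HW112220", {"first": "Santosh", "last": "Sharma", "city": "Pilani"})
store.put("2018HW123123", {"first": "Eshwar", "last": "Pillai", "city": "Hyd"})
print(store.get("2014HW112220"))
```

The point of the sketch is the access pattern: everything goes through the key, which is what makes the model easy to partition across nodes.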
Classification: Column-based

• Partition a table by column into column families
• A form of vertical partitioning where each column family is stored in its own files
• Allows versioning of data values
• Each storage block has data from only one column
• Example,
✓ Cassandra, HBase
Classification: Graph based

• Data is represented as graphs and related nodes can be found by traversing the edges using the
path expression
• aka network database
• Graph query languages, e.g. Gremlin, Cypher
• Example
✓ Neo4J
✓ HyperGraphDB
✓ Apache TinkerPop
Cassandra

• Born at Facebook and built on Amazon Dynamo and Google Bigtable concepts
• AP design in CAP context
• High performance, high availability applications that can sacrifice consistency
✓ Hence built for peer-to-peer symmetric nodes instead of primary-secondary
architecture (as in MongoDB)
• Column oriented DB
✓ Create keyspace (like a DB)
✓ Within keyspace create column family (like a table)
✓ Within CF create attributes / columns with their types

Read / Write

• Writes
✓ Written to commit log sequentially and deemed successful
✓ Data is indexed and put into in-memory Memtable (one or more per Column Family)
✓ Memtable is flushed to disk SSTable file
✓ SSTable is immutable and append only
✓ Partitioning and replication happens automatically
• Reads
✓ Client connects to any node to read data
✓ Consistency level decides when a read is returned, i.e. how many replicas must respond
with the data
✓ Read repair: when a read finds that replicas disagree, the coordinator pushes the most recent
version to the stale replicas while meeting the required consistency level (the Gossip protocol
itself only spreads cluster membership and node state)
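
The write and read path above can be sketched as a toy single-node model. This is a simplification under stated assumptions: the class `ToyColumnFamily`, the `memtable_limit` flush trigger and the dict-based "SSTables" are hypothetical stand-ins, and the sketch ignores replication, compaction and on-disk formats.

```python
from typing import Dict, List, Optional, Tuple


class ToyColumnFamily:
    """Commit log + memtable + immutable SSTables for one column family."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: List[Tuple[str, str]] = []  # sequential, append-only
        self.memtable: Dict[str, str] = {}           # in-memory, mutable
        self.sstables: List[Dict[str, str]] = []     # oldest first, never modified
        self.memtable_limit = memtable_limit         # assumed flush threshold

    def write(self, key: str, value: str) -> None:
        # 1. Append to the commit log; the write is deemed successful at this point.
        self.commit_log.append((key, value))
        # 2. Apply the write to the memtable.
        self.memtable[key] = value
        # 3. Flush to a new SSTable once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        if self.memtable:
            # An SSTable is written once, sorted by key, and never updated in place.
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        # Newest data wins: check the memtable first, then SSTables newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None


cf = ToyColumnFamily(memtable_limit=2)
cf.write("a", "1")
cf.write("b", "2")  # memtable reaches the limit and is flushed
cf.write("a", "3")  # newer value lives in the memtable; the old SSTable is untouched
print(cf.read("a"), cf.sstables)
```

Because SSTables are never rewritten in place, an update is simply a newer entry that shadows the older one at read time.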

Consistency semantics (1)

• No primary replica - high partition tolerance and availability, with tunable levels of consistency
• Support for lightweight transactions with “linearizable consistency”
• A Read or Write operation can pick a consistency level
• ONE, TWO, THREE, ALL - 1,2,3 or all replicas respectively have to ack
• ANY - Write to any node even if replicas are down (ref Hinted Handoff)
• QUORUM - majority have to ack
• LOCAL_QUORUM - majority within same datacenter have to ack
•…
https://cassandra.apache.org/doc/latest/architecture/dynamo.html
https://cassandra.apache.org/doc/latest/architecture/guarantees.html#
Consistency semantics (2)

• For “causal consistency” pick Read consistency level = Write consistency level = QUORUM
• Why ? At least one node will be common between write and read set so a read will get the
last write of a data item
• What happens if read and write use LOCAL_QUORUM ?
• If no overlap between read and write sets then “Eventual consistency”

Figure: read latest written value from common node (R level = W level = QUORUM)

https://cassandra.apache.org/doc/latest/architecture/dynamo.html
https://cassandra.apache.org/doc/latest/architecture/guarantees.html#
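
The overlap argument on this slide reduces to simple arithmetic: with N replicas, a write acknowledged by W nodes and a read answered by R nodes must share at least one node whenever R + W > N. The sketch below checks that rule against a brute-force enumeration; the function names are hypothetical and nothing here calls Cassandra.

```python
from itertools import combinations


def quorum(replication_factor: int) -> int:
    """Majority of replicas, as used by the QUORUM consistency level."""
    return replication_factor // 2 + 1


def sets_must_overlap(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """Arithmetic rule: read and write sets share a node when R + W > N."""
    return read_acks + write_acks > replication_factor


def sets_overlap_brute_force(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """Enumerate every possible write set and read set and check they intersect."""
    nodes = range(replication_factor)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(nodes, write_acks)
        for read_set in combinations(nodes, read_acks)
    )


n = 3
q = quorum(n)
print(q, sets_must_overlap(n, q, q))   # QUORUM writes + QUORUM reads
print(sets_must_overlap(n, 1, 1))      # ONE + ONE: no guaranteed overlap
```

With a replication factor of 3, QUORUM is 2, so R + W = 4 > 3 and every read set contains at least one node that saw the latest write. ONE/ONE gives 2, which is not greater than 3, hence only eventual consistency.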
Lightweight transactions

• INSERT and UPDATE with an IF clause support lightweight tx semantics at data item
level
✓ Aka Compare and set
✓ Increases overheads by 4x due to coordination

cqlsh> INSERT INTO cycling.cyclist_name (id, lastname, firstname)
       VALUES (4647f6d3-7bd2-4085-8d6c-1229351b5498, 'KNETEMANN', 'Roxxane')
       IF NOT EXISTS;

cqlsh> UPDATE cycling.cyclist_name
       SET firstname = 'Roxane'
       WHERE id = 4647f6d3-7bd2-4085-8d6c-1229351b5498
       IF firstname = 'Roxxane';

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertLWT.html
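
The compare-and-set semantics of the two statements above can be sketched on a single process. This is an illustration only: the class `ToyLwtTable` and its methods are hypothetical, and a local lock stands in for the Paxos-based coordination between replicas that Cassandra uses for lightweight transactions.

```python
import threading
from typing import Dict, Optional


class ToyLwtTable:
    """Single-process stand-in for INSERT ... IF NOT EXISTS and UPDATE ... IF col = value."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {}
        self._lock = threading.Lock()  # makes check-then-write atomic

    def insert_if_not_exists(self, key: str, value: str) -> bool:
        with self._lock:
            if key in self._rows:
                return False           # condition failed, nothing written
            self._rows[key] = value
            return True

    def update_if(self, key: str, expected: str, new_value: str) -> bool:
        with self._lock:
            if key not in self._rows or self._rows[key] != expected:
                return False           # someone else changed the row first
            self._rows[key] = new_value
            return True

    def get(self, key: str) -> Optional[str]:
        return self._rows.get(key)


names = ToyLwtTable()
first = names.insert_if_not_exists("4647f6d3", "Roxxane")
second = names.insert_if_not_exists("4647f6d3", "Someone else")
fixed = names.update_if("4647f6d3", "Roxxane", "Roxane")
stale = names.update_if("4647f6d3", "Roxxane", "Roxanne")
print(first, second, fixed, stale, names.get("4647f6d3"))
```

The check and the write happen as one step, which is why a second writer holding a stale expected value is rejected instead of silently overwriting.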

Replication strategy for user data

• Simple
✓ Specify replication factor = N and data is stored in N nodes
of cluster
• NetworkTopology
✓ Specify replication factor per DC where we want reliability
from DC failures
✓ e.g. CREATE KEYSPACE cluster1 WITH replication = {'class':
'NetworkTopologyStrategy', 'eastDC' : 2, 'westDC' : 3};

Partitioners

• Partitions data by hashing the partition key, so that rows are distributed among nodes*
• Random
✓ Crypto hash (MD5) - more expensive
• Murmur
✓ Non-crypto consistent hash (MU - Multiply / R - Rotate operations, but easier to reverse
compared to a crypto hash)
✓ 3-5x faster and overall 10% performance improvement
• ByteOrdered
✓ Lexical order

* Will study this later in context of Dynamo paper
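
As a rough sketch of how hash partitioning places data, the example below builds a token ring with one token per node and uses MD5 in the spirit of the Random partitioner. The names `token` and `TokenRing`, the node labels and the "walk clockwise for the next replicas" rule are assumptions made for illustration; virtual nodes and datacenter awareness are left out.

```python
import bisect
import hashlib
from typing import List, Tuple


def token(key: str) -> int:
    """Map a key to a position on the ring using an MD5 hash."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Consistent-hash ring with a single token per node."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring: List[Tuple[int, str]] = sorted((token(n), n) for n in nodes)
        self._tokens: List[int] = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int = 1) -> List[str]:
        # The first node whose token is >= the key's token owns the key (wrapping around).
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        # Further replicas are the next distinct nodes clockwise on the ring.
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = TokenRing(["node-a", "node-b", "node-c"])
print(ring.replicas("rollno:4", replication_factor=2))
```

Adding or removing a node only moves the keys in the neighbouring token range, which is the property that makes this scheme attractive for scale-out clusters.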


Sample queries

> create keyspace demo with replication={'class':'SimpleStrategy', 'replication_factor':1};
> describe keyspaces;
> use demo;
> create table student_info (rollno int primary key, name text, doj timestamp, lastexampercent double);  -- or: create columnfamily
> describe table student_info;
> consistency quorum
> insert into student_info (rollno,name,doj,lastexampercent) values (4,'Roxanne', dateof(now()), 90) using ttl 30;
> select rollno from student_info where name='Roxanne' ALLOW FILTERING;
> update student_info set lastexampercent=98 where rollno=2 IF name='Sam';

Case study - eBay

• Marketplace has 100 million active buyers with 200+ million items
• 2B page views, 80B DB calls, multi-PB storage capacity
• No transactions, joins, referential integrity
• Multi-DC deployment
• 400M+ writes and 200M+ reads
• 3 Use cases
✓ Social signal on product pages (read latency is not important but write
performance is key)
✓ Connecting users and items via buy, sell, bid, watch events
✓ Many time series analysis cases, e.g. fraud detection
https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376/2-eBay_Marketplaces_97_million_active
Case study - AdStage (from AWS use cases)

• Sector AdTech
• Online advertising platform to manage multi-channel ad campaigns on
Google, FB, Twitter, Bing, LinkedIn
• 3 clusters with 80+ nodes on AWS
• Vast amount of real-time data from 5 channels
• Constantly monitor trends and optimise campaigns for advertisers
• High performance and availability - consistency is not critical as it is read
mainly
• Cassandra cluster can scale as more clients are added with no SPOF

MongoDB

• Database is a set of collections


• A collection is like a table in RDBMS
• A collection stores documents
✓ BSON or Binary JSON with hierarchical key-value pairs
✓ Similar to rows in a table
✓ Max 16MB documents stored in WiredTiger storage engine*
• For larger than 16MB documents uses GridFS
✓ Support for binary data
✓ Large objects can be stored in ‘chunks’ of 255KB
✓ Stores Meta-data in a separate collection
✓ Does not support multi-document transactions

* https://docs.mongodb.com/manual/core/wiredtiger/
MongoDB

• Data is partitioned in shards


✓ For horizontal scaling
✓ Reduces amount of data each shard handles as the cluster grows
✓ Reduces number of operations on each shard
• Data is replicated
✓ Writes go to the primary and are recorded in its oplog. “write-concern” setting used to tweak write consistency.
✓ Secondaries use oplog to get local copies updated
✓ Clients usually read from primary but “read-preference” setting can tweak read
consistency
• Data updates happen in place and not versioned / timestamped

cloud.mongodb.com

Get me top 10 beach front homes



MongoDB: MapReduce
> db.collection.mapReduce(
    function() { emit(key, value); },                  // map function
    function(key, values) { return reduceFunction; },  // reduce function
    {
      out: collection,
      query: document,
      sort: document,
      limit: number
    }
  )

create a collection reviews with each document as : {"name": "abc", "review": "…", "publish": "true"}

now count the number of published reviews per user name

> db.reviews.mapReduce(
    function() { emit(this.name, 1); },
    function(key, values) { return Array.sum(values); },
    {
      query: { publish: "true" },
      out: "total_reviews"
    }
  ).find()
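
To make the map, group and reduce phases of the counting example explicit, here is a plain Python sketch that runs without a database. The sample documents and the function names `map_phase`, `shuffle` and `reduce_phase` are made up for illustration; MongoDB performs the grouping step internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample of the "reviews" collection described above.
reviews = [
    {"name": "abc", "review": "...", "publish": "true"},
    {"name": "abc", "review": "...", "publish": "false"},
    {"name": "xyz", "review": "...", "publish": "true"},
    {"name": "abc", "review": "...", "publish": "true"},
]


def map_phase(docs: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Apply the query filter, then emit (name, 1) for each matching document."""
    for doc in docs:
        if doc["publish"] == "true":
            yield doc["name"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values for each key, mirroring Array.sum(values)."""
    return {key: sum(values) for key, values in grouped.items()}


total_reviews = reduce_phase(shuffle(map_phase(reviews)))
print(total_reviews)
```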

MongoDB: Indexing
• Can create index on any field of a collection or a sub-document fields
• e.g. document in a collection
{
"address": {
"city": "New Delhi",
"state": "Delhi",
"pincode": "110001"
},
"tags": [
"football",
"cricket",
"badminton"
],
"name": "Ravi"
}

• index a field in ascending order and find

> db.users.createIndex({"tags":1})
> db.users.find({tags:"cricket"}).pretty()

• index sub-document fields in ascending order and find

> db.users.createIndex({"address.city":1,"address.state":1,"address.pincode":1})
> db.users.find({"address.city":"New Delhi"}).pretty()
MongoDB: Joins
• From Mongo 3.2 onwards, it is possible to join data from 2 collections using aggregate with $lookup
• Collections books (isbn, title, author) and books_selling_data (isbn, copies_sold)
db.books.aggregate([{ $lookup: {
    from: "books_selling_data",
    localField: "isbn",
    foreignField: "isbn",
    as: "copies_sold"
} }])

• Sample joined document:


{
"isbn": "978-3-16-148410-0",
"title": "Some cool book",
"author": "John Doe",
"copies_sold": [
{
"isbn": "978-3-16-148410-0",
"copies_sold": 12500
}
]
}
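
The shape of the joined document follows from what `$lookup` does: a left outer join that attaches all matching foreign documents as an array. The sketch below emulates that behaviour on plain Python lists; the function `lookup` and the second book are hypothetical, and no MongoDB driver is involved.

```python
from collections import defaultdict
from typing import Any, Dict, List


def lookup(
    local_docs: List[Dict[str, Any]],
    foreign_docs: List[Dict[str, Any]],
    local_field: str,
    foreign_field: str,
    as_field: str,
) -> List[Dict[str, Any]]:
    """Left outer join: each local doc gets an array of matching foreign docs."""
    # Index the foreign collection by the join field so the join is a single pass.
    index: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for doc in foreign_docs:
        index[doc.get(foreign_field)].append(doc)
    # Unmatched local docs are kept, with an empty array.
    return [
        {**doc, as_field: list(index.get(doc.get(local_field), []))}
        for doc in local_docs
    ]


books = [
    {"isbn": "978-3-16-148410-0", "title": "Some cool book", "author": "John Doe"},
    {"isbn": "000-0-00-000000-0", "title": "Unsold book", "author": "Jane Roe"},
]
books_selling_data = [{"isbn": "978-3-16-148410-0", "copies_sold": 12500}]

joined = lookup(books, books_selling_data, "isbn", "isbn", "copies_sold")
print(joined[0])
```

Note that the book without sales data is not dropped; it simply carries an empty `copies_sold` array.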
MongoDB
• Document oriented DB
• Various read and write choices for flexible consistency tradeoff with scale / performance and durability
• Automatic primary re-election on primary failure and/or network partition

What is Causal Consistency (recap)

Example in MongoDB

• Case 1 : No causal consistency

• Case 2: Causal consistency by making read to secondary wait

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
MongoDB “read concerns”

• local :
• Client reads primary replica
• Client reads from secondary in causally consistent sessions
• available:
• Read on secondary but causal consistency not required
• majority :
• If client wants to read what majority of nodes have. Best option for fault tolerance and
durability.
• linearizable :
• If client wants to read what has been written to majority of nodes before the read started.
• Has to be read on primary
• Only single document can be read

https://docs.mongodb.com/v3.4/core/read-preference-mechanics/
MongoDB “write concerns”

• how many replicas should ack


• 1 - primary only
• 0 - none
• n - how many including primary
• majority - a majority of nodes (preferred for durability)
• journaling - If True then nodes need to write to disk journal before ack
else ack after writing to memory (less durable)
• timeout for write operation

https://docs.mongodb.com/manual/reference/write-concern/
Consistency scenarios - causally consistent and durable

Figure: read latest written value from common node (R = W = majority)
• read=majority, write=majority
• W1 and R1 for P1 will fail and will succeed in P2
• So causally consistent, durable even with network partition sacrificing performance
• Example: Used in critical transaction oriented applications, e.g. stock trading

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Consistency scenarios - causally consistent but not durable

• read=majority, write=1
• W1 may succeed on P1 and P2. R1 will succeed only on P2. W1 on P1 may roll back.
• So causally consistent but not durable with network partition. Fast writes, slower reads.
• Example: Twitter - a post may disappear but if on refresh you see it then it should be
durable, else repost.

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Consistency scenarios - eventual consistency with durable writes

• read=local, write=majority
• W1 will succeed only for P2 and will not be accepted on P1 after failure. Reads may not succeed
to see the last write on P1. Slow durable writes and fast non-causal reads.
• Example: Review site where write should be durable if committed but reads don’t need
causal guarantee as long as it appears some time (eventual consistency).

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Consistency scenarios - eventual consistency but no durability

• read=local, write=1
• Same as previous scenario, but now writes are also not durable and may be rolled back.
• Example: Real-time sensor data feed that needs fast writes to keep up with the rate and
reads should get as much recent real-time data as possible. Data may be dropped on
failures.

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
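
The four consistency scenarios can be summarised as a lookup from the (read concern, write concern) pair to the guarantee named on each slide. This is only a recap aid: `SCENARIOS` and `describe` are hypothetical names, the table covers just the four combinations discussed here, and it does not query or configure MongoDB.

```python
from typing import Dict, Tuple

# (read concern, write concern) -> guarantee, exactly as named in the four scenarios above.
SCENARIOS: Dict[Tuple[str, str], str] = {
    ("majority", "majority"): "causally consistent and durable",
    ("majority", "1"): "causally consistent but not durable",
    ("local", "majority"): "eventual consistency with durable writes",
    ("local", "1"): "eventual consistency but no durability",
}


def describe(read_concern: str, write_concern: str) -> str:
    """Return the scenario name for a read/write concern pair covered in these slides."""
    return SCENARIOS.get((read_concern, write_concern), "not covered in these slides")


for (read_concern, write_concern), guarantee in SCENARIOS.items():
    print(f"read={read_concern}, write={write_concern}: {guarantee}")
```

Reading the table row by row shows the pattern: the read concern decides whether reads are causal, and the write concern decides whether acknowledged writes survive a failover.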
Graph computing

• Property graphs
• Data is represented as vertices and edges
with properties
• Properties are key value pairs
• Edges are relationships between vertices
• When to use a graph DB ?
• A relationship-heavy data set with large set of
data items
• Queries are like graph traversals but need to
keep query performance almost constant as
database grows
• A variety of queries may be asked from the
data and static indices on data will not work
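
A query such as "friends within 2 hops" is a bounded graph traversal. The sketch below shows the idea with a breadth-first search over an adjacency map; the function `within_hops` and the sample friendship graph are made up for illustration and are not how a graph database stores or indexes data.

```python
from collections import deque
from typing import Deque, Dict, Set, Tuple


def within_hops(graph: Dict[str, Set[str]], start: str, max_hops: int) -> Set[str]:
    """Return every vertex reachable from start in at most max_hops edges."""
    seen: Set[str] = {start}
    frontier: Deque[Tuple[str, int]] = deque([(start, 0)])
    while frontier:
        node, distance = frontier.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, distance + 1))
    seen.discard(start)
    return seen


# Hypothetical undirected friendship graph.
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
}

print(sorted(within_hops(friends, "ann", 2)))
```

The cost depends on how many vertices sit within the hop limit, not on the total size of the graph, which is the behaviour the bullet about near-constant query performance refers to.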
Native vs Non-Native Graph storage

• Non-native graph computing platforms can use external DBs for data storage
• e.g. TinkerPop is an in-memory DB + computing framework that can store in
ElasticSearch, Cassandra etc.
• Native platforms support built-in storage
• e.g. Neo4j
• Native approach is much faster because adjacent nodes and edges are stored
closer for faster traversal
• In a non-native approach, extensive indexing has to be used
• Native approach scales as nodes get added

One-hop index

https://neo4j.com/blog/native-vs-non-native-graph-technology/
Neo4j / Cypher

• Cypher is a declarative language for graph queries
• Example: MATCH (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie)
WHERE m.released > 2000 RETURN m LIMIT 5

Launch a free sandbox with dataset on neo4j website


Neo4j / Cypher: More queries

• Find movies that Tom Hanks acted in and directed by Ron Howard
released after 2000
• Match (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie),
(:Person {name: 'Ron Howard'})-[:DIRECTED]->(m) where
m.released > 2000 RETURN m limit 5
• Who were the other actors in the movie where Tom Hanks acted in
and directed by Ron Howard released after 2000
• Match (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie),
(:Person {name: 'Ron Howard'})-[:DIRECTED]->(m), (p:Person)-
[:ACTED_IN]->(m) where m.released > 2000 RETURN p limit 5
Apache Tinkerpop / Gremlin

• TinkerPop is a computing platform that connects to GraphDBs that actually
store the nodes and edges. Built-in TinkerGraph stores in-memory data only.
• Gremlin is the query language (with traversal machine) that supports
Declarative and Imperative flavours
• Sample queries
• movies where Tom Hanks has acted
• g.V().hasLabel('person').has('name','Tom Hanks').out('ACTED_IN').hasLabel('movies').values('name')
• movies where Tom Hanks has acted and directed by Ron Howard
• g.V().hasLabel('person').has('name','Tom Hanks').out('ACTED_IN').where(__.in('DIRECTED').has('name','Ron Howard')).values('name')

Figure: Person and Movie vertices linked by ACTED_IN and DIRECTED edges (Person: Tom Hanks -ACTED_IN-> Movie <-DIRECTED- Person: Ron Howard)


Summary

• NoSQL databases are useful when
✓ you have to deal with large data sets
✓ you may need geographical distribution
✓ you do not need ACID transactions and want flexible consistency
• Choices between key-value, column based, document based, graph based
data stores
• Graph DBs and computing models are very suitable when data sets are
relationship heavy - can be modelled as large number of nodes and edges
and queries are similar to graph traversal
✓ Complex relation centric queries are possible
✓ Graph traversal costs can be kept stable with data growth

Next Session:
Hadoop Architecture
