BDS Session 5 - NoSQL DB

Big data system notes-5

Uploaded by

Santhosh Arunarthy


DSECL ZG 522: Big Data Systems

Session 5: NoSQL Databases

John Benito JP
Topics for today

• NoSQL Introduction
• Classification
• Examples
• Cassandra
• Mongo
• GraphDBs: Neo4J and Tinkerpop
Why NoSQL (1)

• RDBMS meant for OLTP systems / Systems of Record

• Strict consistency and durability guarantees (ACID) over multiple data items involved in a
transaction
• But they have scale and cost issues with large volumes of data, distributed geo-scale applications,
very high transaction volumes
• Typical web scale systems do not need strict consistency and durability for every use case
• Social networking
• Real-time applications
• Log analysis
• Browsing retail catalogs
• Reviews and blogs
• …
Why NoSQL (2)

• RDBMS ensure uniform structure and modelling of relationships between entities

• A class of emerging applications need granular and extreme connectivity
information modelled between individual semi-structured data items. This
information needs to be also queried at scale without large expensive joins.
• Connectivity between users in a social media application: How many friends
do you have within 2 hops ?
• Connectivity between companies in terms of domain, technology, people
skills, hiring : Useful for skills acquisition, M&A etc.
• Connectivity between IT network devices: Useful for troubleshooting incidents
What is NoSQL ?
• Coined by Carlo Strozzi in 1998
✓ Lightweight, open source database without standard SQL interface
• Reintroduced by Johan Oskarsson in 2009
✓ Non-relational databases

• Characteristics
✓ Not Only SQL
✓ Non-relational
✓ Schema-less
✓ Loosen consistency to address scalability and availability requirements in large scale
applications
✓ Open source movement born out of web-scale applications
✓ Distributed for scale
✓ Cluster Friendly
Data model

• Supports rich variety of data : structured, semi-structured and unstructured
• No fixed schema, i.e. each record could have different attributes
• Non-relational - no join operations are typically supported
• Transaction semantics for multiple data items are typically not
supported
• Relaxed consistency semantics - no support for ACID as in RDBMS
• In some cases can model data as graphs and queries as graph
traversals
Classification of NoSQL DBs

• Key – value
✓ Maintains a big hash table of keys and values
✓ Example : Dynamo, Redis, Riak etc
• Document
✓ Maintains data in collections of documents
✓ Example : MongoDB, CouchDB etc
• Column
✓ Each storage block has data from only one column
✓ Example : Cassandra, HBase
• Graph
✓ Network databases
✓ Stores data as nodes connected by edges
✓ Example : Neo4j, HyperGraphDB, Apache Tinkerpop
Characteristics
• Scale out architecture instead of monolithic architecture of relational databases
• Cluster scale - distribution across 100+ nodes across DCs
• Performance scale - 100K+ DB reads and writes per sec
• Data scale - 1B+ docs in DB
• House large amount of structured, semi-structured and unstructured data
• Dynamic schemas
✓ allows insertion of data without pre-defined schema

• Auto sharding
✓ automatically spreads data across the number of servers
✓ applications are not aware about it
✓ helps in data balancing and recovery from failure

• Replication
✓ Good support for replication of data which offers high availability, fault tolerance
Pros and Cons

• Pros
✓ Cost effective for large data sets
✓ Easy to implement
✓ Easy to distribute, especially across DCs
✓ Easier to scale up/down
✓ Relaxes data consistency when required
✓ No pre-defined schema
✓ Easier to model semi-structured data or connectivity data
✓ Easy to support data replication
• Cons
✓ Typically no joins between data sets / tables
✓ Typically no group by operations
✓ No ACID properties for transactions
✓ No SQL interface
✓ Lack of standardisation in this space
✓ Makes it difficult to port from SQL and across NoSQL stores
✓ Fewer skilled people compared to SQL
✓ Fewer BI tools compared to the mature SQL BI space
SQL vs NoSQL

SQL | NoSQL
Relational database | Non-relational, distributed databases
Pre-defined schema | Schema-less
Table based databases | Multiple options: Key-Value, Document, Column, Graph
Vertically scalable | Horizontally scalable
Supports ACID properties | Makes CAP theorem trade-offs (often relaxed consistency)
Supports complex querying | Relatively simpler querying
Excellent support from vendors | Relies heavily on community support
Vendors

• Amazon
• Facebook
• Google
• Oracle
Classification: Document-based

• Store data in form of documents using well known formats like JSON
• Documents are accessible via their id, but can be accessed through other indexes as well
• Maintains data in collections of documents
• Example,
• MongoDB, CouchDB, CouchBase

• Book document :
{
"Book Title" : "Database Fundamentals",
"Publisher" : "My Publisher",
"Year of Publication" : "2020"
}
Classification: Key-Value store

• Simple data model based on fast access by the key to the value associated with the key
• Value can be a record or object or document or even complex data structure
• Maintains a big hash table of keys and values
• For example,
✓ Dynamo, Redis, Riak

Key Value
2014HW112220 { Santosh,Sharma,Pilani}
2018HW123123 {Eshwar,Pillai,Hyd}
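The key-value table above can be mimicked with a plain dictionary. This is only a minimal in-memory sketch: the `put` and `get` helper names and the field names inside each value are assumptions for illustration, and real stores such as Dynamo, Redis or Riak add persistence, distribution and replication on top of this idea.

```python
# Hypothetical in-memory key-value store keyed by student ID.
store = {}

def put(key, value):
    """Insert or overwrite the value stored under key."""
    store[key] = value

def get(key, default=None):
    """Look up a value by key; the store never inspects the value itself."""
    return store.get(key, default)

# Values can be records, documents or any other structure.
put("2014HW112220", {"first": "Santosh", "last": "Sharma", "city": "Pilani"})
put("2018HW123123", {"first": "Eshwar", "last": "Pillai", "city": "Hyd"})

print(get("2014HW112220")["city"])  # Pilani
print(get("missing-key"))           # None
```

Access is always by key, which is why lookups stay fast but queries on the value fields need extra indexing.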
Classification: Column-based

• Partition a table by column into column families

• A form of vertical partitioning where each column family is stored in its own files
• Allows versioning of data values
• Each storage block has data from only one column
• Example,
✓ Cassandra, HBase
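A rough sketch of the column-family idea, assuming a made-up student table and two hypothetical families (`profile` and `marks`). It only illustrates vertical partitioning; it does not reproduce the on-disk format of Cassandra or HBase.

```python
# Hypothetical rows of a student table.
rows = [
    {"rollno": 1, "name": "Asha", "city": "Pilani", "percent": 81.5},
    {"rollno": 2, "name": "Sam", "city": "Hyd", "percent": 74.0},
]

# Hypothetical column families: which columns are stored together.
column_families = {"profile": ["name", "city"], "marks": ["percent"]}

def partition_by_family(rows, families, key="rollno"):
    """Return {family: {row_key: {column: value}}}, one 'file' per family."""
    out = {family: {} for family in families}
    for row in rows:
        for family, columns in families.items():
            out[family][row[key]] = {c: row[c] for c in columns}
    return out

files = partition_by_family(rows, column_families)

# A query over marks reads only the "marks" family, not names or cities.
average = sum(r["percent"] for r in files["marks"].values()) / len(rows)
print(average)  # 77.75
```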
Classification: Graph based

• Data is represented as graphs and related nodes can be found by traversing the edges using the
path expression
• aka network database
• Graph query languages, e.g. Gremlin, Cypher
• Example
✓ Neo4J
✓ HyperGraphDB
✓ Apache TinkerPop
Cassandra

• Born at Facebook and built on Amazon Dynamo and Google Bigtable concepts
• AP design in CAP context
• High performance, high availability applications that can sacrifice consistency
✓ Hence built for peer-to-peer symmetric nodes instead of primary-secondary
architecture (as in MongoDB)
• Column oriented DB
✓ Create keyspace (like a DB)
✓ Within keyspace create column family (like a table)
✓ Within CF create attributes / columns with their types

Read / Write

• Writes
✓ Written to commit log sequentially and deemed successful
✓ Data is indexed and put into in-memory Memtable (one or more per Column Family)
✓ Memtable is flushed to disk SSTable file
✓ SSTable is immutable and append only
✓ Partitioning and replication happens automatically
• Reads
✓ Client connects to any node to read data
✓ Consistency level decides when a read is returned, i.e. how many replicas should contain the
same copy
✓ Read repair: if the replicas contacted for a read hold different versions, the latest version is
returned and written back to the stale replicas, while still meeting the required consistency level
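The write path above (commit log, Memtable, flush to SSTable) can be sketched as a toy single-node model. The class name `ColumnFamilyStore`, the `memtable_limit` parameter and the flush trigger are assumptions for illustration; partitioning, replication and compaction are left out.

```python
class ColumnFamilyStore:
    """Toy model of one column family on one node."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []          # sequential, append-only log
        self.memtable = {}            # in-memory, latest value per key
        self.sstables = []            # immutable flushed tables, oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write sequentially
        self.memtable[key] = value             # 2. update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. spill to disk when full

    def flush(self):
        # Freeze the memtable into an immutable, key-sorted table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable wins
            for k, v in table:
                if k == key:
                    return v
        return None

cf = ColumnFamilyStore()
cf.write("a", 1)
cf.write("b", 2)   # memtable is full, so it is flushed to an SSTable
cf.write("a", 3)   # the newer value lives in the fresh memtable
print(cf.read("a"), cf.read("b"))  # 3 2
```

Because SSTables are never modified, an update is simply a newer entry that shadows the older one at read time.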

Consistency semantics (1)

• No primary replica - high partition tolerance and availability, with tunable levels of consistency
• Support for lightweight transactions with “linearizable consistency”
• A Read or Write operation can pick a consistency level
• ONE, TWO, THREE, ALL - 1,2,3 or all replicas respectively have to ack
• ANY - Write to any node even if replicas are down (ref Hinted Handoff)
• QUORUM - majority have to ack
• LOCAL_QUORUM - majority within same datacenter have to ack
•…
https://cassandra.apache.org/doc/latest/architecture/dynamo.html
https://cassandra.apache.org/doc/latest/architecture/guarantees.html#
Consistency semantics (2)

• For “causal consistency” pick Read consistency level = Write consistency level = QUORUM
• Why ? At least one node will be common between write and read set so a read will get the
last write of a data item
• What happens if read and write use LOCAL_QUORUM ?
• If no overlap between read and write sets then “Eventual consistency”
(Figure: R level = W level = QUORUM; read latest written value from common node)
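The overlap argument can be checked with simple arithmetic: read and write sets share at least one replica whenever R + W exceeds the replication factor. The sketch below assumes a single datacenter and models only ONE, TWO, THREE, QUORUM and ALL; the function names are illustrative and ANY and LOCAL_QUORUM are deliberately not modelled.

```python
def acks_required(level, replication_factor):
    """Number of replicas that must acknowledge at a given consistency level."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "ALL":
        return replication_factor
    if level == "QUORUM":
        return replication_factor // 2 + 1   # strict majority
    raise ValueError(f"level not modelled in this sketch: {level}")

def read_sees_latest_write(read_level, write_level, replication_factor):
    """True when read and write replica sets must share at least one node."""
    r = acks_required(read_level, replication_factor)
    w = acks_required(write_level, replication_factor)
    return r + w > replication_factor

print(read_sees_latest_write("QUORUM", "QUORUM", 3))  # True  (2 + 2 > 3)
print(read_sees_latest_write("ONE", "ONE", 3))        # False (1 + 1 <= 3)
print(read_sees_latest_write("ONE", "ALL", 3))        # True  (1 + 3 > 3)
```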

https://cassandra.apache.org/doc/latest/architecture/dynamo.html
https://cassandra.apache.org/doc/latest/architecture/guarantees.html#
Lightweight transactions

• INSERT and UPDATE with an IF clause support lightweight tx semantics at data item
level
✓ Aka Compare and set
✓ Increases overheads by 4x due to coordination

cqlsh> INSERT INTO cycling.cyclist_name (id, lastname, firstname)
VALUES (4647f6d3-7bd2-4085-8d6c-1229351b5498, 'KNETEMANN', 'Roxxane')
IF NOT EXISTS;

cqlsh> UPDATE cycling.cyclist_name
SET firstname = 'Roxane'
WHERE id = 4647f6d3-7bd2-4085-8d6c-1229351b5498
IF firstname = 'Roxxane';
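The compare-and-set semantics of the two statements can be sketched on a single process. In this illustration a lock stands in for the coordination Cassandra performs between replicas, and the `CyclistTable` class and its method names are hypothetical.

```python
import threading

class CyclistTable:
    """Toy table with conditional insert and update (compare and set)."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()   # stand-in for replica coordination

    def insert_if_not_exists(self, row_id, row):
        """Like INSERT ... IF NOT EXISTS: returns True only if applied."""
        with self._lock:
            if row_id in self._rows:
                return False
            self._rows[row_id] = dict(row)
            return True

    def update_if(self, row_id, column, expected, new_value):
        """Like UPDATE ... IF column = expected: returns True only if applied."""
        with self._lock:
            row = self._rows.get(row_id)
            if row is None or row.get(column) != expected:
                return False
            row[column] = new_value
            return True

    def get(self, row_id):
        return self._rows.get(row_id)

table = CyclistTable()
rid = "4647f6d3-7bd2-4085-8d6c-1229351b5498"
print(table.insert_if_not_exists(rid, {"lastname": "KNETEMANN", "firstname": "Roxxane"}))  # True
print(table.insert_if_not_exists(rid, {"lastname": "OTHER", "firstname": "X"}))            # False
print(table.update_if(rid, "firstname", "Roxxane", "Roxane"))                              # True
print(table.update_if(rid, "firstname", "Roxxane", "Roxy"))                                # False
```

The second update is rejected because the condition is evaluated against the current value, which is the point of using an IF clause.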

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useInsertLWT.html
Replication strategy for user data

• Simple
✓ Specify replication factor = N and data is stored in N nodes
of cluster
• NetworkTopology
✓ Specify replication factor per DC where we want reliability
from DC failures
✓ e.g. CREATE KEYSPACE cluster1 WITH replication = {'class':
'NetworkTopologyStrategy', 'eastDC' : 2, 'westDC' : 3};

Partitioners

• Partitions data based on hashing to distribute data blocks from a column among nodes*
• Random
✓ Crypto hash (MD5)- more expensive
• Murmur
✓ Non-crypto consistent hash (MU - Multiply / R - Rotate
operations but easier to reverse compared to Crypto
hash)
✓ 3-5x faster and overall 10% performance improvement
• Byteorder
✓ Lexical order

* Will study this later in context of Dynamo paper
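A minimal sketch of hash-based placement on a token ring, in the spirit of the Random (MD5) partitioner combined with a simple replication factor. Node names are hypothetical, and node positions are derived here by hashing the node name, which is an assumption made to keep the example short; a real cluster assigns tokens to nodes.

```python
import bisect
import hashlib

def token(key):
    """MD5-based token for a key, as a large integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")

class Ring:
    """Toy consistent-hash ring with one token per node."""

    def __init__(self, nodes):
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key, replication_factor=1):
        """First node clockwise from the key's token, then the next ones."""
        start = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]

ring = Ring(["node-a", "node-b", "node-c"])
print(ring.replicas("2014HW112220", replication_factor=2))
```

The same key always maps to the same nodes, and adding a node only moves the keys that fall in its slice of the ring.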

Sample queries

> create keyspace demo with replication={'class':'SimpleStrategy', 'replication_factor':1};
> describe keyspaces;
> use demo;
> create table student_info (rollno int primary key, name text, doj timestamp, lastexampercent double);  -- "table" or "columnfamily"
> describe table student_info;
> consistency quorum
> insert into student_info (rollno,name,doj,lastexampercent) values (4,'Roxanne', dateof(now()), 90) using ttl 30;
> select rollno from student_info where name='Roxanne' ALLOW FILTERING;
> update student_info set lastexampercent=98 where rollno=2 IF name='Sam';

Case study - eBay

• Marketplace has 100 million active buyers with 200+ million items
• 2B page views, 80B DB calls, multi-PB storage capacity
• No transactions, joins, referential integrity
• Multi-DC deployment
• 400M+ writes and 200M+ reads
• 3 Use cases
✓ Social signal on product pages (read latency is not important but write
performance is key)
✓ Connecting users and items via buy, sell, bid, watch events
✓ Many time series analysis cases, e.g. fraud detection
https://www.slideshare.net/jaykumarpatel/cassandra-at-ebay-13920376/2-eBay_Marketplaces_97_million_active
Case study - AdStage (from AWS use cases)

• Sector AdTech
• Online advertising platform to manage multi-channel ad campaigns on
Google, FB, Twitter, Bing, LinkedIn
• 3 clusters with 80+ nodes on AWS
• Vast amount of real-time data from 5 channels
• Constantly monitor trends and optimise campaigns for advertisers
• High performance and availability - consistency is not critical as the workload is mainly reads
• Cassandra cluster can scale as more clients are added with no SPOF

MongoDB

• Database is a set of collections

• A collection is like a table in RDBMS
• A collection stores documents
✓ BSON or Binary JSON with hierarchical key-value pairs
✓ Similar to rows in a table
✓ Max 16MB documents stored in WiredTiger storage engine
• For larger than 16MB documents uses GridFS
✓ Support for binary data
✓ Large objects can be stored in ‘chunks’ of 255KB
✓ Stores Meta-data in a separate collection
✓ Does not support multi-document transactions
✓ WiredTiger storage engine*

* https://docs.mongodb.com/manual/core/wiredtiger/
MongoDB

• Data is partitioned in shards

✓ For horizontal scaling
✓ Reduces amount of data each shard handles as the cluster grows
✓ Reduces number of operations on each shard
• Data is replicated
✓ Writes to primary in oplog. “write-concern” setting used to tweak write consistency.
✓ Secondaries use oplog to get local copies updated
✓ Clients usually read from primary but “read-preference” setting can tweak read
consistency
• Data updates happen in place and not versioned / timestamped

cloud.mongodb.com

Get me top 10 beach front homes


MongoDB: MapReduce
> db.collection.mapReduce(
function() { emit(key, value); }, // map function
function(key, values) { return reduceFunction; }, // reduce function
{
out: collection,
query: document,
sort: document,
limit: number
}
)

create a collection reviews with each document as : {"name": "abc", "review": "…", "publish": "true"}

now count the number of published comments per user name

> db.reviews.mapReduce(
function() { emit(this.name, 1); },
function(key, values) { return Array.sum(values); },
{
query: { publish: "true" },
out: "total_reviews"
}
).find()
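To see what the map and reduce steps compute, the same count can be reproduced over an in-memory list. This is a plain Python sketch with made-up sample documents and a hypothetical `map_reduce` helper; it does not call MongoDB.

```python
from collections import defaultdict

# Made-up sample documents shaped like the reviews collection.
reviews = [
    {"name": "abc", "review": "...", "publish": "true"},
    {"name": "abc", "review": "...", "publish": "true"},
    {"name": "xyz", "review": "...", "publish": "false"},
    {"name": "xyz", "review": "...", "publish": "true"},
]

def map_reduce(documents, query, mapper, reducer):
    """Filter by query, emit (key, value) pairs, then reduce values per key."""
    grouped = defaultdict(list)
    for doc in documents:
        if all(doc.get(field) == value for field, value in query.items()):
            for key, value in mapper(doc):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}

total_reviews = map_reduce(
    reviews,
    query={"publish": "true"},
    mapper=lambda doc: [(doc["name"], 1)],      # emit(this.name, 1)
    reducer=lambda key, values: sum(values),    # Array.sum(values)
)
print(total_reviews)  # {'abc': 2, 'xyz': 1}
```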

MongoDB: Indexing
• Can create index on any field of a collection or a sub-document fields
• e.g. document in a collection
{
"address": {
"city": "New Delhi",
"state": "Delhi",
"pincode": "110001"
},
"tags": [
"football",
"cricket",
"badminton"
],
"name": "Ravi"
}

• indexing a field in ascending order and find

> db.users.createIndex({"tags":1})
> db.users.find({tags:"cricket"}).pretty()

• indexing a sub-document field in ascending order and find

> db.users.createIndex({"address.city":1,"address.state":1,"address.pincode":1})
> db.users.find({"address.city":"New Delhi"}).pretty()
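The effect of indexing an array field and a sub-document field can be sketched with a dictionary from value to document ids. The sample documents (the second user in particular), the `_id` values and the `build_index` helper are assumptions for illustration; a real index is an ordered structure on disk rather than a hash map.

```python
from collections import defaultdict

# Made-up users collection; the first document follows the slide.
users = [
    {"_id": 1, "name": "Ravi", "tags": ["football", "cricket", "badminton"],
     "address": {"city": "New Delhi", "state": "Delhi", "pincode": "110001"}},
    {"_id": 2, "name": "Meera", "tags": ["cricket"],
     "address": {"city": "Pune", "state": "Maharashtra", "pincode": "411001"}},
]

def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city' into nested documents."""
    value = doc
    for part in dotted_path.split("."):
        value = value[part]
    return value

def build_index(docs, dotted_path):
    """Map each indexed value to the ids of documents containing it."""
    index = defaultdict(list)
    for doc in docs:
        value = get_path(doc, dotted_path)
        # An array field yields one index entry per element.
        for item in (value if isinstance(value, list) else [value]):
            index[item].append(doc["_id"])
    return index

tags_index = build_index(users, "tags")
city_index = build_index(users, "address.city")
print(tags_index["cricket"])    # [1, 2]
print(city_index["New Delhi"])  # [1]
```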

MongoDB: Joins
• From Mongo 3.2 onwards it is possible to join data from 2 collections using aggregate with $lookup
• Collection books (isbn, title, author) and books_selling_data (isbn, copies_sold)
db.books.aggregate([{ $lookup: {
from: "books_selling_data",
localField: "isbn",
foreignField: "isbn",
as: "copies_sold"
} }])

• Sample joined document:

{
"isbn": "978-3-16-148410-0",
"title": "Some cool book",
"author": "John Doe",
"copies_sold": [
{
"isbn": "978-3-16-148410-0",
"copies_sold": 12500
}
]
}
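The shape of the joined document follows from left outer join behaviour: every book is kept and the matching sales documents are attached as an array. A small Python sketch of that behaviour, with a hypothetical `lookup` helper and one extra made-up book that has no sales data:

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Attach to each local document the list of matching foreign documents."""
    joined = []
    for doc in local_docs:
        matches = [f for f in foreign_docs
                   if f.get(foreign_field) == doc.get(local_field)]
        joined.append({**doc, as_field: matches})   # unmatched docs get []
    return joined

books = [
    {"isbn": "978-3-16-148410-0", "title": "Some cool book", "author": "John Doe"},
    {"isbn": "000-0-00-000000-0", "title": "Unsold book", "author": "Jane Roe"},
]
books_selling_data = [
    {"isbn": "978-3-16-148410-0", "copies_sold": 12500},
]

result = lookup(books, books_selling_data, "isbn", "isbn", "copies_sold")
print(result[0]["copies_sold"])  # [{'isbn': '978-3-16-148410-0', 'copies_sold': 12500}]
print(result[1]["copies_sold"])  # []
```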
MongoDB
• Document oriented DB
• Various read and write choices for flexible consistency tradeoff with scale / performance and durability
• Automatic primary re-election on primary failure and/or network partition

What is Causal Consistency (recap)

Example in MongoDB

• Case 1 : No causal consistency

• Case 2: Causal consistency by making read to secondary wait

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
MongoDB “read concerns”

• local :
• Client reads primary replica
• Client reads from secondary in causally consistent sessions
• available:
• Read on secondary but causal consistency not required
• majority :
• If client wants to read what majority of nodes have. Best option for fault tolerance and
durability.
• linearizable :
• If client wants to read what has been written to majority of nodes before the read started.
• Has to be read on primary
• Only single document can be read

https://docs.mongodb.com/v3.4/core/read-preference-mechanics/
MongoDB “write concerns”

• how many replicas should ack

• 1 - primary only
• 0 - none
• n - how many including primary
• majority - a majority of nodes (preferred for durability)
• journaling - If True then nodes need to write to disk journal before ack
else ack after writing to memory (less durable)
• timeout for write operation
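How the "w" setting translates into a number of acknowledgements can be sketched as below. The function names are hypothetical, and "majority" is computed over the whole replica set size, which simplifies how a real deployment counts its members; journaling and timeouts are not modelled.

```python
def acks_needed(w, replica_set_size):
    """Acknowledgements required for a write concern 'w' value."""
    if w == "majority":
        return replica_set_size // 2 + 1
    if isinstance(w, int) and 0 <= w <= replica_set_size:
        return w          # 0 = none, 1 = primary only, n = primary + (n - 1)
    raise ValueError(f"unsupported write concern in this sketch: {w!r}")

def write_acknowledged(w, acks_received, replica_set_size):
    """True once enough replicas have acknowledged the write."""
    return acks_received >= acks_needed(w, replica_set_size)

print(acks_needed("majority", 3))              # 2
print(write_acknowledged(1, 1, 3))             # True: primary has acked
print(write_acknowledged("majority", 1, 3))    # False: still waiting for one more
print(write_acknowledged(0, 0, 3))             # True: fire and forget
```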

https://docs.mongodb.com/manual/reference/write-concern/
Consistency scenarios - causally consistent and durable

(Figure: R = W = majority; read latest written value from common node)
• read=majority, write=majority
• W1 and R1 for P1 will fail and will succeed in P2
• So causally consistent, durable even with network partition sacrificing performance
• Example: Used in critical transaction oriented applications, e.g. stock trading

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Consistency scenarios - causally consistent but not durable

• read=majority, write=1
• W1 may succeed on P1 and P2. R1 will succeed only on P2. W1 on P1 may roll back.
• So causally consistent but not durable with network partition. Fast writes, slower reads.
• Example: Twitter - a post may disappear but if on refresh you see it then it should be
durable, else repost.

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Consistency scenarios - eventual consistency with durable writes

• read=local, write=majority
• W1 will succeed only for P2 and will not be accepted on P1 after failure. Reads may not succeed
to see the last write on P1. Slow durable writes and fast non-causal reads.
• Example: Review site where write should be durable if committed but reads don’t need
causal guarantee as long as it appears some time (eventual consistency).

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Consistency scenarios - eventual consistency but no durability

• read=local, write=1
• Same as previous scenario, but now writes are also not durable and may be rolled back.
• Example: Real-time sensor data feed that needs fast writes to keep up with the rate and
reads should get as much recent real-time data as possible. Data may be dropped on
failures.
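The four consistency scenarios can be summarised as a small lookup table. This only restates the slides in code form; the `SCENARIOS` table and the `guarantees` and `settings_for` helpers are illustrative names, not a MongoDB API.

```python
# (read concern, write concern) -> guarantees, as described in the four scenarios.
SCENARIOS = {
    ("majority", "majority"): {"causal": True, "durable": True},
    ("majority", 1): {"causal": True, "durable": False},
    ("local", "majority"): {"causal": False, "durable": True},
    ("local", 1): {"causal": False, "durable": False},
}

def guarantees(read_concern, write_concern):
    """Guarantees for a given pair of settings."""
    return SCENARIOS[(read_concern, write_concern)]

def settings_for(causal, durable):
    """Reverse lookup: which settings give the wanted guarantees."""
    for settings, props in SCENARIOS.items():
        if props == {"causal": causal, "durable": durable}:
            return settings
    return None

print(guarantees("majority", 1))                 # {'causal': True, 'durable': False}
print(settings_for(causal=False, durable=True))  # ('local', 'majority')
```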

https://engineering.mongodb.com/post/ryp0ohr2w9pvv0fks88kq6qkz9k9p3
Graph computing

• Property graphs
• Data is represented as vertices and edges
with properties
• Properties are key value pairs
• Edges are relationships between vertices
• When to use a graph DB ?
• A relationship-heavy data set with large set of
data items
• Queries are like graph traversals but need to
keep query performance almost constant as
database grows
• A variety of queries may be asked from the
data and static indices on data will not work
Native vs Non-Native Graph storage

• Non-native graph computing platforms can use external DBs for data storage
• e.g. TinkerPop is an in-memory DB + computing framework that can store in
ElasticSearch, Cassandra etc.
• Native platforms have built-in storage
• e.g. Neo4j
• Native approach is much faster because adjacent nodes and edges are stored
closer for faster traversal
• In a non-native approach, extensive indexing has to be used
• Native approach scales as nodes get added

One-hop index

https://neo4j.com/blog/native-vs-non-native-graph-technology/
Neo4j / Cypher

• Cypher is a declarative language for graph queries
• Example: MATCH (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie) WHERE m.released > 2000 RETURN m LIMIT 5

Launch a free sandbox with dataset on neo4j website

Neo4j / Cypher: More queries

• Find movies that Tom Hanks acted in and that were directed by Ron Howard, released after 2000
• MATCH (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie), (:Person {name: 'Ron Howard'})-[:DIRECTED]->(m) WHERE m.released > 2000 RETURN m LIMIT 5
• Who were the other actors in the movies that Tom Hanks acted in and Ron Howard directed, released after 2000
• MATCH (:Person {name: 'Tom Hanks'})-[:ACTED_IN]->(m:Movie), (:Person {name: 'Ron Howard'})-[:DIRECTED]->(m), (p:Person)-[:ACTED_IN]->(m) WHERE m.released > 2000 RETURN p LIMIT 5
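The two queries above amount to following edges out of and into nodes. The sketch below runs the same traversals over a tiny hand-made edge list in plain Python; the sample graph and the `out` and `incoming` helper names are assumptions for illustration, and it does not use Neo4j.

```python
# Tiny hand-made sample graph: (source, relationship, target) edges.
released = {"Apollo 13": 1995, "The Da Vinci Code": 2006, "Cast Away": 2000}
edges = [
    ("Tom Hanks", "ACTED_IN", "Apollo 13"),
    ("Tom Hanks", "ACTED_IN", "The Da Vinci Code"),
    ("Tom Hanks", "ACTED_IN", "Cast Away"),
    ("Kevin Bacon", "ACTED_IN", "Apollo 13"),
    ("Audrey Tautou", "ACTED_IN", "The Da Vinci Code"),
    ("Ron Howard", "DIRECTED", "Apollo 13"),
    ("Ron Howard", "DIRECTED", "The Da Vinci Code"),
    ("Robert Zemeckis", "DIRECTED", "Cast Away"),
]

def out(person, relation):
    """Targets reached from person over edges of the given type."""
    return {dst for src, rel, dst in edges if src == person and rel == relation}

def incoming(movie, relation):
    """Sources that point at movie over edges of the given type."""
    return {src for src, rel, dst in edges if dst == movie and rel == relation}

# Movies Tom Hanks acted in, directed by Ron Howard, released after 2000.
movies = {m for m in out("Tom Hanks", "ACTED_IN") & out("Ron Howard", "DIRECTED")
          if released[m] > 2000}

# Other actors in those movies (Tom Hanks himself excluded).
co_actors = set().union(*(incoming(m, "ACTED_IN") for m in movies)) - {"Tom Hanks"}

print(movies)     # {'The Da Vinci Code'}
print(co_actors)  # {'Audrey Tautou'}
```

Each step touches only the edges around the nodes already reached, which is why traversal cost depends on local connectivity rather than on total database size.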
Apache Tinkerpop / Gremlin

• TinkerPop is a computing platform that connects to GraphDBs that actually
store the nodes and edges. Built-in TinkerGraph stores in-memory data only.
• Gremlin is the query language (with traversal machine) that supports
Declarative and Imperative flavours
• Sample queries
• movies where Tom Hanks has acted
• g.V().hasLabel('person').has('name','Tom Hanks').out('ACTED_IN').hasLabel('movies').values('name')
• movies where Tom Hanks has acted and directed by Ron Howard
• g.V().hasLabel('person').has('name','Tom Hanks').out('ACTED_IN').where(__.in('DIRECTED').has('name','Ron Howard')).values('name')
(Figure: Person -[ACTED_IN]-> Movie and Person -[DIRECTED]-> Movie, e.g. Person: Tom Hanks -[ACTED_IN]-> Movie <-[DIRECTED]- Person: Ron Howard)

Summary

• NoSQL databases are useful when

✓ you have to deal with large data sets
✓ may need geographical distribution
✓ No need for ACID transactions and need flexible consistency
• Choices between key-value, column based, document based, graph based
data stores
• Graph DBs and computing models are very suitable when data sets are
relationship heavy - can be modelled as large number of nodes and edges
and queries are similar to graph traversal
✓ Complex relation centric queries are possible
✓ Graph traversal costs can be kept stable with data growth

Next Session:
Hadoop Architecture
