Unit5_Notes_Short_DB

The lecture covers NOSQL databases and their role in managing big data, highlighting their characteristics such as scalability, availability, and lack of required schemas. It discusses various types of NOSQL systems, including document-based, key-value stores, and graph databases, along with examples like MongoDB and DynamoDB. The document also explains replication models, the CAP theorem, and the importance of sharding for load balancing in distributed systems.

Uploaded by

divya.anantharajan05


Lecture: NOSQL Databases and Big Data

Storage Systems
Readings: Chapter 24, Fundamentals of database systems, Seventh Edition (R. Elmasri, S.
Navathe).

NOSQL Databases and Big Data Storage Systems

NOSQL: not only SQL
most NOSQL systems are distributed databases or distributed storage systems which
focus on semi-structured data storage, high performance, availability, data replication, and
scalability
why NOSQL?
SQL systems offer more services (powerful query language, concurrency control,
etc.) than some applications need
a structured data model may be too restrictive
relational systems require schema, NOSQL systems don’t
NOSQL systems focus on storage of ‘big data’
typical applications that use NOSQL
social media, web links, user profiles, marketing and sales, posts and tweets, road
maps and spatial data, email, etc
Examples
DynamoDB (Amazon): key-value data store
BigTable: Google’s proprietary NOSQL system. Column-based or wide column store
Cassandra (Facebook): uses concepts from both key-value store and column-based
systems
MongoDB and CouchDB: document stores
Neo4j and GraphBase: graph-based NOSQL systems
OrientDB: combines several concepts
NOSQL characteristics
scalability
horizontal scalability (by adding more nodes) is employed while the system is
operational, so techniques for distributing the existing data among new nodes
without interrupting system operation are necessary
availability, replication, and eventual consistency
requirement for continuous system availability (data is replicated over many
nodes in transparent manner)
if one node fails, the data is still available on other nodes
replication improves data availability and read performance (however, writes
become more expensive, because every copy of a replicated data item must be
updated)
this can slow down writes considerably if serializable consistency is required, so
more relaxed forms of consistency known as eventual consistency are used
sharding of files
NOSQL applications can have millions of records, and these records can be
accessed concurrently by thousands of users (it is not practical to store the
whole file in one node)
sharding (also known as horizontal partitioning) of the file records is employed
this serves to distribute the load of accessing the file records to multiple nodes
the combination of sharding and replicating the shards works towards
improving load balancing as well as data availability
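How sharding and replication combine can be illustrated with a minimal sketch. The node names, shard count, replication factor, and the "consecutive nodes" placement rule are all assumptions made for illustration; real systems use their own placement strategies.

```python
import hashlib

# Hypothetical cluster layout, chosen only for this illustration.
NODES = ["node0", "node1", "node2", "node3"]
NUM_SHARDS = 4
REPLICATION_FACTOR = 2


def shard_of(key: str) -> int:
    """Map a record key to a shard number using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def nodes_for_shard(shard: int) -> list:
    """Assumed placement rule: store each shard on consecutive nodes."""
    return [NODES[(shard + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def nodes_for_key(key: str) -> list:
    """All nodes that hold a copy of the record with this key."""
    return nodes_for_shard(shard_of(key))


def readable_after_failure(key: str, failed_node: str) -> bool:
    """A record stays readable if at least one copy is on a live node."""
    return any(node != failed_node for node in nodes_for_key(key))
```

Sharding spreads the keys over the nodes (load balancing), while the second copy of each shard keeps every record readable when a single node fails (availability).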
schema not required
semi-structured, self-describing data facilitates this flexibility of no schema
consequence of the lack of schema and constraints:
any constraints on the data have to be programmed in the application
languages for describing semi-structured data are
JSON (JavaScript Object Notation)
XML (Extensible Markup Language)
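A small sketch of what "self-describing" and "constraints have to be programmed" mean in practice. The documents, field names, and the validation rules are hypothetical examples, not taken from the lecture.

```python
import json

# Two documents in the same hypothetical collection with different fields.
docs = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Lin", "tags": ["admin"], "age": 31},
]


def validate(doc: dict) -> list:
    """Application-level checks standing in for schema constraints."""
    errors = []
    if not isinstance(doc.get("_id"), int):
        errors.append("_id must be an integer")
    if not doc.get("name"):
        errors.append("name is required")
    if "age" in doc and not (0 <= doc["age"] <= 150):
        errors.append("age out of range")
    return errors


# Field names travel with the data, so no external schema is needed to read it.
serialized = [json.dumps(doc) for doc in docs]
restored = [json.loads(text) for text in serialized]
```

Both documents are accepted even though their fields differ; only the checks written in `validate` are enforced.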
less powerful query languages
we may not require a powerful query language such as SQL because search
(read) queries often locate single objects in a single file based on their object
keys
in many cases, the operations are called CRUD operations (create, read, update,
delete)
only a subset of SQL querying capabilities is provided (many NOSQL systems
do not provide join operations)
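The CRUD style of access by object key can be sketched with an in-memory stand-in. The class `ObjectStore` and its method names are hypothetical and do not correspond to the API of any particular NOSQL system.

```python
class ObjectStore:
    """In-memory stand-in for a store whose objects are located by key."""

    def __init__(self):
        self._objects = {}

    def create(self, key, obj):
        if key in self._objects:
            raise KeyError(f"{key!r} already exists")
        self._objects[key] = dict(obj)

    def read(self, key):
        # Returns a copy of the object, or None if the key is unknown.
        obj = self._objects.get(key)
        return dict(obj) if obj is not None else None

    def update(self, key, changes):
        # Raises KeyError if the key is missing.
        self._objects[key].update(changes)

    def delete(self, key):
        # Returns True if an object was removed, False otherwise.
        return self._objects.pop(key, None) is not None
```

Every operation locates a single object through its key; there is nothing comparable to a join across objects.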
replication models
master-slave
requires one copy to be the master copy
all write operations must be applied to the master copy and then propagated to
the slave copies
usually using eventual consistency (the slave copies will eventually be the same
as the master copy)
master-master replication
allows reads and writes at any of the replicas
may not guarantee that reads at nodes that store different copies see the same
values
different users may write the same data item concurrently at different nodes of
the system (so the values of the item will be temporarily inconsistent)
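The master-slave model described above can be sketched as follows. This is a simplified single-process simulation; the classes `Master` and `Slave` and the explicit `propagate` step are assumptions used to make the propagation delay visible.

```python
from collections import deque


class Slave:
    """A read-only copy that receives writes from the master."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Master:
    """The master copy: accepts all writes and forwards them later."""

    def __init__(self, slaves):
        self.data = {}
        self.slaves = slaves
        self.pending = deque()  # writes not yet propagated to the slaves

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        """Apply queued writes to every slave, in the order they were made."""
        while self.pending:
            key, value = self.pending.popleft()
            for slave in self.slaves:
                slave.data[key] = value


slave = Slave()
master = Master([slave])
master.write("x", 1)
stale = slave.read("x")   # read before propagation: slave lags behind
master.propagate()
fresh = slave.read("x")   # read after propagation: slave has caught up
```

Between `write` and `propagate` a read at the slave can return an old value; once propagation finishes, the copies agree, which is the eventual consistency behaviour described above.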
categories of NOSQL systems
document-based NOSQL systems: documents are accessible via their document id,
but can also be accessed rapidly using other indexes
NOSQL key-value stores: simple data model based on fast access by the key to the
value associated with the key (hashing)
graph-based NOSQL systems: data is represented as graphs, and related nodes can
be found by traversing the edges
column-based or wide column NOSQL systems
hybrid NOSQL systems: these systems have characteristics from two or more of the
above four categories
consistency
various levels of consistency among replicated data items (enforcing serializability is
the strongest form of consistency)
ACID properties
atomicity: transaction performed in its entirety or not at all
consistency preservation: takes database from one consistent state to another
isolation: not interfered with by other transactions
durability or permanency: changes must persist in the database
high overhead: enforcing these properties can reduce operation performance
(especially on replicated NOSQL systems)
the CAP theorem
CAP theorem refers to three desirable properties of distributed systems with
replicated data
consistency: among replicated copies (consider a variable X1 replicated 4 times
and updated concurrently by 6 users)
availability: we receive a non-error response (without guarantee that it is the
most recent write)
partition tolerance: continue to operate despite loss of messages by the
network between nodes
not possible to guarantee all three simultaneously in a distributed system with data
replication
weaker consistency level is often acceptable in NOSQL distributed data store
(eventual consistency often adopted)
guaranteeing availability and partition tolerance is often more important
eventual consistency: if no new updates are made, eventually all accesses to an
item will return the last updated value
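The trade-off can be made concrete with a toy simulation of one item replicated on two nodes. The class names, the "CP"/"AP" mode labels, and the version-counter reconciliation in `heal` are assumptions for illustration; real systems use more elaborate conflict resolution.

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0  # counts the writes this replica has seen


class Pair:
    """Two replicas of one item, joined by a link that may be partitioned."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False

    def write(self, replica, value):
        if self.partitioned and self.mode == "CP":
            # Prefer consistency: refuse the write, giving up availability.
            raise RuntimeError("unavailable: other replica unreachable")
        replica.value = value
        replica.version += 1
        if not self.partitioned:
            other = self.b if replica is self.a else self.a
            other.value, other.version = replica.value, replica.version
        # In AP mode during a partition the replicas are left to diverge.

    def heal(self):
        """Assumed reconciliation: the replica with the higher version wins."""
        self.partitioned = False
        newest = max((self.a, self.b), key=lambda r: r.version)
        for replica in (self.a, self.b):
            replica.value, replica.version = newest.value, newest.version
```

In "AP" mode writes keep succeeding during a partition and the copies converge only after `heal`; in "CP" mode the copies never disagree, but writes are rejected while the partition lasts.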

MongoDB
collections of similar documents
individual documents resemble complex objects or XML documents
documents are self-describing
can have different data elements
documents can be specified in various formats: XML, JSON
MongoDB supports CRUD operations
documents stored in binary JSON (BSON) format
individual documents stored in a collection
each document in a collection has a unique field called _id (an ObjectId by default)
a collection does not have a schema
structure of the data fields in documents chosen based on how documents will be
accessed
user can choose normalized or denormalized design
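The two design choices can be compared with plain Python dictionaries standing in for documents. The project and worker data, field names, and helper functions are hypothetical.

```python
# Denormalized design: workers are embedded inside the project document.
project_embedded = {
    "_id": "P1",
    "name": "ProductX",
    "workers": [
        {"name": "Smith", "hours": 32.5},
        {"name": "Joyce", "hours": 20.0},
    ],
}

# Normalized design: workers are separate documents that reference the
# project through its _id.
projects = {"P1": {"_id": "P1", "name": "ProductX"}}
workers = [
    {"_id": "W1", "name": "Smith", "project_id": "P1", "hours": 32.5},
    {"_id": "W2", "name": "Joyce", "project_id": "P1", "hours": 20.0},
]


def total_hours_embedded(project):
    """One document holds everything needed for the answer."""
    return sum(w["hours"] for w in project["workers"])


def total_hours_normalized(project_id):
    """Needs a second lookup, the application-side analogue of a join."""
    return sum(w["hours"] for w in workers if w["project_id"] == project_id)
```

The embedded form suits access patterns that read a project together with its workers; the referenced form avoids duplicating worker data when workers are accessed on their own.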
replication
concept of replica set to create multiple copies on different nodes
variation of master-slave approach
a replica set will have one primary copy of a collection C stored in one node
N1 , and at least one secondary copy (replica) of C stored at another node N2
primary copy, secondary copy, and arbiter
arbiter participates in elections to select new primary if needed
all write operations applied to the primary copy and propagated to the secondaries
user can choose read preference
by default reads are processed at the primary copy; other preferences allow
read requests to be processed at any replica
sharding
horizontal partitioning divides the documents into disjoint partitions (shards)
allows adding more nodes as needed
shards stored on different nodes to achieve load balancing
the partitioning field (shard key) must exist in every document in the collection
and must have an index
range partitioning
creates chunks by specifying a range of key values
works best with range queries
hash partitioning
partitioning based on the hash value of each shard key
a hash function h(K) is applied to each shard key K, and the result determines the shard
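The two partitioning methods can be contrasted in a short sketch. The split points in `RANGE_BOUNDS`, the shard count, and the choice of SHA-256 as h(K) are assumptions for illustration and are not MongoDB's actual implementation.

```python
import bisect
import hashlib

# Hypothetical split points: they define 4 shards over string keys.
RANGE_BOUNDS = ["g", "n", "t"]


def range_shard(key: str) -> int:
    """Range partitioning: the shard is the key range that contains the key."""
    return bisect.bisect_right(RANGE_BOUNDS, key)


def shards_for_range(low: str, high: str) -> list:
    """Shards that a range query low <= key <= high has to visit."""
    return list(range(range_shard(low), range_shard(high) + 1))


def hash_shard(key: str, num_shards: int = 4) -> int:
    """Hash partitioning: h(K) mod num_shards, using a stable hash."""
    h = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)
    return h % num_shards
```

With range partitioning a range query touches only the shards whose ranges overlap it, which is why it works best for range queries. Hash partitioning scatters neighbouring keys, so a range query generally has to contact every shard.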

NOSQL Key-Value Stores

key-value stores focus on high performance, availability, and scalability
can store structured, unstructured, or semistructured data
key: unique identifier associated with a data item (used for fast retrieval)
value: the data item itself (can be string or array of bytes)
in many of these systems there is no query language, only a set of operations on keys
DynamoDB
DynamoDB part of Amazon’s Web Services/SDK platforms (proprietary)
table holds a collection of self-describing items
item consists of attribute-value pairs (comparable to a record or tuple)
attribute values can be single or multi-valued
primary key used to locate items within a table
can be single attribute or pair of attributes
when the primary key is a pair of attributes (A, B):
attribute A is used for hashing, and because there can be multiple items
with the same value of A,
the B values are used for ordering the items with the same A value
a table with this type of key can have additional secondary indexes defined
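The role of the two key attributes can be sketched with an in-memory table. The class `Table` and its methods are hypothetical and are not the DynamoDB API; they only mirror the idea that A selects a group of items and B orders the items within it.

```python
import bisect
from collections import defaultdict


class Table:
    """Items keyed by (A, B): A is hashed, B orders items sharing the same A."""

    def __init__(self):
        # A value -> list of (B value, item), kept sorted by B.
        self._partitions = defaultdict(list)

    def put(self, a, b, item):
        rows = self._partitions[a]
        keys = [k for k, _ in rows]
        i = bisect.bisect_left(keys, b)
        if i < len(rows) and rows[i][0] == b:
            rows[i] = (b, item)        # same primary key: overwrite the item
        else:
            rows.insert(i, (b, item))  # keep the list ordered by B

    def get(self, a, b):
        for k, item in self._partitions.get(a, []):
            if k == b:
                return item
        return None

    def query(self, a, b_low, b_high):
        """Items with hash attribute a and b_low <= B <= b_high, in B order."""
        rows = self._partitions.get(a, [])
        return [item for k, item in rows if b_low <= k <= b_high]
```

Locating an item needs both A and B, while a query on a single A value can return a B-ordered range of items without touching other A values.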
examples of other key-value stores
Oracle key-value store: Oracle NoSQL Database
Redis key-value cache and store
caches data in main memory to improve performance
offers master-slave replication and high availability
offers persistence by backing up cache to disk
Apache Cassandra (used by Facebook and others)
offers features from several NOSQL categories

NOSQL Graph Databases and Neo4j

graph databases
data represented as a graph
collection of vertices (nodes) and edges
possible to store data associated with both individual nodes and individual edges
Neo4j
open source system
uses concepts of nodes and relationships
nodes can have labels
zero, one, or several
both nodes and relationships can have properties
each relationship has a start node, end node, and a relationship type
properties specified using a map pattern
creating nodes
CREATE command
part of high-level declarative query language Cypher
node label can be specified when node is created
properties are enclosed in curly brackets
path
traversal of part of the graph
typically used as part of a query to specify a pattern
schema optional in Neo4j
indexing and node identifiers
users can create indexes for the collection of nodes that have a particular label
one or more properties can be indexed
Cypher query made up of clauses
result from one clause can be the input to the next clause in the query
Neo4j has a graph visualization interface, so that a subset of the nodes and edges in
a database graph can be displayed as a graph
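The node and relationship model described above can be sketched in Python. This is not Cypher and not the Neo4j API; the classes, the method names, and the EMPLOYEE/DEPARTMENT/WorksFor example data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                      # zero, one, or several labels
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int                       # id of the start node
    end: int                         # id of the end node
    rel_type: str
    props: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}              # node id -> Node
        self.rels = []

    def create_node(self, labels=(), **props):
        node_id = len(self.nodes)
        self.nodes[node_id] = Node(set(labels), props)
        return node_id

    def relate(self, start, rel_type, end, **props):
        self.rels.append(Relationship(start, end, rel_type, props))

    def neighbors(self, start, rel_type):
        """Follow outgoing relationships of one type from a start node."""
        return [r.end for r in self.rels
                if r.start == start and r.rel_type == rel_type]


g = Graph()
emp = g.create_node(["EMPLOYEE"], name="John")
dept = g.create_node(["DEPARTMENT"], name="Research")
g.relate(emp, "WorksFor", dept, hours=40)
```

Calling `g.neighbors(emp, "WorksFor")` walks one edge of the graph, which is the simplest case of the path traversal that a query pattern specifies.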

knowledge is maintained by diegocasmo.

