2- NoSQL

The document discusses Big Data and NoSQL databases, explaining their definitions, sources, and challenges. It details the differences between SQL and NoSQL, including the CAP theorem, BASE model, and various types of NoSQL databases such as key-value, column, document, and graph databases. Additionally, it highlights the advantages and disadvantages of NoSQL databases compared to traditional SQL databases.

Uploaded by

bhoothu8


Assignment No.

• What is Big Data? Explain the various sources of Big Data.
• What are the three V’s of Big Data? Explain.
• List and explain the uses of Big Data.
• Explain the various challenges in Big Data.
• What challenges do organizations face when managing Big Data using legacy systems?
Chapter 2:
NoSQL
UNIT I
SQL (Structured Query Language)

• Structured Query Language (SQL) is the standard means of manipulating and querying data in relational databases.
• SQL can be used to share and manage data, particularly data held in relational database management systems, where data is organized into tables.
• Multiple files, each containing tables of data, may also be related to one another through a common field.
• Using SQL, you can query, update, and reorganize data, as well as create and modify the schema (structure) of a database system and control access to its data.
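The operations listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the department and employee tables and their columns are hypothetical, and SQLite has no user accounts, so access control (GRANT/REVOKE in server databases) is not shown.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create the schema.
cur.execute("CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " dept_id INTEGER REFERENCES department(id))"
)

# Insert and update rows.
cur.execute("INSERT INTO department (id, name) VALUES (1, 'Sales'), (2, 'Research')")
cur.executemany(
    "INSERT INTO employee (id, name, dept_id) VALUES (?, ?, ?)",
    [(1, "Asha", 1), (2, "Ravi", 2), (3, "Meera", 2)],
)
cur.execute("UPDATE employee SET dept_id = 1 WHERE name = 'Meera'")

# Modify the schema after the tables already hold data.
cur.execute("ALTER TABLE employee ADD COLUMN email TEXT")

# Query two tables related by the common field dept_id.
rows = cur.execute(
    "SELECT e.name, d.name FROM employee AS e"
    " JOIN department AS d ON e.dept_id = d.id"
    " ORDER BY e.id"
).fetchall()
print(rows)
conn.close()
```

The join on dept_id is the "common field" relationship described above: each employee row is matched to the department row it refers to.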
Roles of SQL

• Interactive query language
• Administrative language
• Client/server language
• Database programming language
• Internet data access language
• Distributed database language
• Language for online transaction processing (OLTP)
NoSQL (Not Only SQL)

• NoSQL databases (aka "not only SQL") are non-tabular and store data differently than relational tables.
• A NoSQL database provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases.
• Such databases have existed since the late 1960s, but the name "NoSQL" only came into wide use in the early 21st century.
History of NoSQL

• The name NoSQL was first used in 1998 by Carlo Strozzi for his lightweight, open-source “relational” database that did not use SQL.
• The name came up again in 2009 when Eric Evans and Johan Oskarsson used it to describe non-relational databases. Relational databases are often referred to as SQL systems.
• The term NoSQL can mean either “No SQL systems” or, more commonly, “Not only SQL,” to emphasize the fact that some systems might support SQL-like query languages.
Cont…

• The NoSQL model uses a distributed database system, meaning a system that runs across multiple computers. Non-relational systems are often faster for simple lookups, organize data in an ad hoc way rather than against a fixed schema, and can process large amounts of varied data.
• For large, unstructured data sets, NoSQL databases are often a better choice than relational databases because of their speed and flexibility.
• NoSQL systems can handle both structured and unstructured data, and they can process unstructured Big Data quickly.
• This led organizations such as Facebook, Twitter, LinkedIn, and Google to adopt NoSQL systems. These organizations process tremendous amounts of unstructured data, correlating it to find patterns and gain business insights.
Need of NoSQL

• A huge amount of data needs to be stored and retrieved.
• The relationships between the data items you store are not that important.
• The data changes over time and is not structured.
• Support for constraints and joins is not required at the database level.
• The data is growing continuously, and you need to scale the database regularly to handle it.
• A NoSQL database provides more flexibility in handling data: there is no requirement to specify a schema before you start working with the application.
Why Use NoSQL?
• The concept of NoSQL databases became popular with Internet giants like Google, Facebook, and Amazon, which deal with huge volumes of data. System response time can become slow when you use an RDBMS for massive volumes of data.
• To resolve this problem, we could “scale up” our systems by upgrading our existing hardware. This process is expensive.
• The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as “scaling out.”
• NoSQL databases are non-relational and are designed with web applications in mind, so they generally scale out more easily than relational databases.
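One common way to scale out is to split the keys across hosts by hashing. The sketch below is a minimal illustration under assumed names: the host names and the shard_for helper are hypothetical and do not come from any particular database.

```python
import hashlib

def shard_for(key: str, hosts: list[str]) -> str:
    """Pick a host for a key by hashing the key (simple modulo sharding)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

# Hypothetical host names; adding a host spreads the same keys over more machines.
hosts = ["db-0", "db-1", "db-2"]
keys = ["user:1", "user:2", "user:3", "user:4"]
placement = {key: shard_for(key, hosts) for key in keys}
print(placement)
```

With plain modulo sharding, changing the number of hosts moves many keys to a different host, which is why real systems often use schemes such as consistent hashing instead.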
CAP Theorem (Brewer’s Theorem)
• The CAP theorem, also known as Brewer’s theorem (after its originator, Eric Brewer), is an important part of non-relational databases. It states that a distributed data store cannot simultaneously provide more than two of three established guarantees. Brewer, at the University of California, Berkeley, presented the idea in the fall of 1998, and it was published in 1999 as the CAP Principle. The three guarantees that cannot all be met simultaneously are:

• Consistency: The data within the database remains consistent, even after an operation has been executed. For instance, after an update, all clients see the same data, whichever node they contact.

• Availability: Any client making a request for data gets a response, even if one or more nodes are down. Put another way, all working nodes in the distributed system return a valid response for any request, without exception.

• Partition Tolerance: The system continues to function even when communication among the servers is no longer reliable, that is, when the servers are split into groups that cannot communicate with each other.
Cont…
• NoSQL (non-relational) databases are well suited to distributed network applications. Unlike their vertically scalable SQL (relational) counterparts, NoSQL databases are horizontally scalable and distributed by design: they can scale rapidly across a growing network of interconnected nodes.
• Today, NoSQL databases are classified based on the
two CAP characteristics they support:
Cont…
• CP database: A CP database delivers consistency and partition
tolerance at the expense of availability. When a partition occurs
between any two nodes, the system has to shut down the non-
consistent node (i.e., make it unavailable) until the partition is
resolved.
• AP database: An AP database delivers availability and partition
tolerance at the expense of consistency. When a partition
occurs, all nodes remain available but those at the wrong end
of a partition might return an older version of data than others.
(When the partition is resolved, the AP databases typically
resync the nodes to repair all inconsistencies in the system.)
• CA database: A CA database delivers consistency and
availability across all nodes. It can’t do this if there is a partition
between any two nodes in the system, however, and therefore
can’t deliver fault tolerance.
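The difference between CP and AP behaviour during a partition can be sketched with a toy two-node cluster. Everything here is an assumption made for illustration: the Cluster, Node, and Unavailable names are hypothetical, and a real database detects partitions and missed writes through its replication protocol rather than through a flag.

```python
class Unavailable(Exception):
    """Raised when a CP node refuses to answer during a partition."""

class Node:
    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.stale = False  # True while the node may have missed writes

class Cluster:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.nodes = {"a": Node("a", "v1"), "b": Node("b", "v1")}
        self.cut_off = set()  # names of nodes isolated by the partition

    def partition(self, name):
        self.cut_off.add(name)

    def write(self, value):
        # The write only reaches nodes on the client's side of the partition.
        for name, node in self.nodes.items():
            if name in self.cut_off:
                node.stale = True
            else:
                node.value = value

    def read(self, name):
        node = self.nodes[name]
        if self.mode == "CP" and node.stale:
            # CP: give up availability rather than return old data.
            raise Unavailable(f"node {name} is out of sync")
        return node.value  # AP: answer anyway, possibly with old data

    def heal(self):
        # Resync: copy the latest value to every node that missed writes.
        latest = next(n.value for n in self.nodes.values() if not n.stale)
        for node in self.nodes.values():
            node.value, node.stale = latest, False
        self.cut_off.clear()

ap = Cluster("AP")
ap.partition("b")
ap.write("v2")
print(ap.read("a"), ap.read("b"))  # node b still answers, with the older value
ap.heal()
print(ap.read("b"))
```

Running the same steps with Cluster("CP") makes the read from node b raise Unavailable until heal() is called, which is the trade-off the CP bullet describes.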
The BASE Model

• The BASE acronym is used to describe the properties of certain databases, usually NoSQL databases. It's often referred to as the opposite of ACID.
• The definition: Basically Available, Soft state, Eventual consistency
Cont…

• A BASE system gives up strong (immediate) consistency in exchange for availability.
• Basically Available indicates that the system guarantees availability, in the sense of the CAP theorem.
• Soft state indicates that the state of the system may change over time, even without input, because replicas are still converging under the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time, provided that it doesn't receive new input during that time.
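Soft state and eventual consistency can be illustrated with two replicas that exchange their contents and keep the newer entry for each key. This is a minimal sketch that assumes last-write-wins merging with integer timestamps. The merge function and the sample data are hypothetical, and real systems may use other reconciliation schemes.

```python
def merge(local, remote):
    """Last-write-wins merge of two replicas mapping key -> (timestamp, value)."""
    merged = dict(local)
    for key, (ts, value) in remote.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

# The replicas have diverged: each has accepted writes the other has not seen.
replica_a = {"cart": (1, ["book"])}
replica_b = {"cart": (2, ["book", "pen"]), "theme": (1, "dark")}

# One synchronization round with no new input arriving in between.
replica_a = merge(replica_a, replica_b)
replica_b = merge(replica_b, replica_a)
print(replica_a == replica_b)
```

The state of replica_a changed without any client writing to it, which is the "soft state" property; once the exchange finishes and no new writes arrive, both replicas agree.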
ACID

• One hallmark of relational database systems is something known as ACID compliance.
• ACID is an acronym whose letters each describe a characteristic of an individual database transaction:
• Atomicity: The database transaction must completely succeed or completely fail. Partial success is not allowed.
• Consistency: A database transaction takes the RDBMS from one valid state to another. The state is never invalid.
• Isolation: Each client’s database transaction must occur in isolation from the transactions of other clients of the RDBMS.
• Durability: Completed transactions persist, even when servers restart. Transaction failures cannot leave the data in a partially committed state.
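Atomicity and consistency can be demonstrated with Python's built-in sqlite3 module. The sketch below assumes a hypothetical account table with a CHECK constraint. The transfer helper is illustrative only and relies on the connection's context manager, which commits on success and rolls back when an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # consistency rule: no overdrafts
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK; credit is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to bob runs first and is then rolled back when the debit fails, so the database never exposes a state in which money was created.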
State the difference between ACID and BASE
Eventual Consistency Model

• Many NoSQL databases are eventually consistent, but the implementation of eventual consistency varies from one NoSQL database to another.
• NRW is the notation used to describe how the eventual consistency model is configured in a NoSQL database, where:
• N is the number of data copies that the database maintains.
• R is the number of copies that must be read before a read request returns its result.
• W is the number of data copies that must be written before a write operation is marked as completed successfully.
• By choosing values for N, R, and W, a database tunes its model of eventual consistency.
Cont…

• Consistency can be implemented at both the read and the write operation level.
Write Operations
• N=W implies that the write operation will update all data copies before returning control to the client and marking the write operation as successful.
• This is similar to how traditional RDBMS databases work when implementing synchronous replication. This setting will slow down write performance.
Cont…

• If write performance is a concern, meaning you want writes to complete quickly, you can set W=1 and R=N. The write then updates any one copy and is marked as successful, but whenever the user issues a read request, all the copies are read to return the result. If any copy is out of date, it is updated first, and only then does the read succeed. This setting slows down read performance.
• Hence most NoSQL implementations use N>W>1. This means that more than one copy must be updated successfully, but not all copies need to be updated at the same time.
Cont…

Read Operations
• If R is set to 1, the read operation reads any one data copy, which may be outdated.
• If R>1, more than one copy is read and the most recent value among them is returned. However, this can slow down the read operation.
• Using N<W+R ensures that a read operation always retrieves the latest value. Because the number of copies written plus the number of copies read exceeds the total number of copies, the two sets must overlap, so at least one copy that is read holds the latest version. This is known as quorum assembly.
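The quorum rule N<W+R can be checked by brute force for small N. The sketch below uses hypothetical helper names and assumes that each copy carries a version, so that a reader can tell which of the copies it read is the newest.

```python
from itertools import combinations

def is_quorum(n: int, r: int, w: int) -> bool:
    """True when N < R + W, so every read set overlaps every write set."""
    return r + w > n

def every_read_sees_latest(n: int, r: int, w: int) -> bool:
    """Try every choice of W written copies and R read copies out of N."""
    copies = range(n)
    return all(
        set(written) & set(read)  # non-empty overlap: a read copy is up to date
        for written in combinations(copies, w)
        for read in combinations(copies, r)
    )

for n, r, w in [(3, 1, 3), (3, 3, 1), (3, 2, 2), (3, 1, 1), (3, 1, 2)]:
    print(n, r, w, is_quorum(n, r, w), every_read_sees_latest(n, r, w))
```

The settings discussed above appear as special cases: W=N with R=1 and W=1 with R=N both satisfy the rule, while R=1 with W=1 on three copies does not, so a read may return outdated data.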
Categories of NoSQL Database

• Key-value store database
• Column store database
• Document database
• Graph database
Key-value store database

• Key-value stores are the most basic type of NoSQL database.
• They are designed to handle huge amounts of data.
• Key-value stores allow developers to store schema-less data.
• In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string, JSON, a BLOB (Binary Large OBject), etc.
• In some stores, such as Redis, the value stored against a key may itself be a structured type: a string, hash, list, set, or sorted set.
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
• Key-value stores typically favour the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
• Key-value stores work well for shopping cart contents, or for individual values like color schemes, a landing page URI, or a default account number.
Cont…

• Data is stored in key/value pairs. The store is designed to handle lots of data and heavy load.
• Key-value databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
• The value in a key-value store can be anything: a string, a number, but also an entirely new set of key-value pairs encapsulated in an object. Figure 6 shows a slightly more complex key-value structure.
• Examples of key-value stores are Redis, Voldemort, Riak, and Amazon’s DynamoDB.
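The key-value model can be sketched with an ordinary Python dict, which is itself a hash table. The KeyValueStore class, its method names, and the sample keys are hypothetical; they do not reflect the API of Redis, Riak, or any other product named above.

```python
import json

class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # each key is unique; a second put overwrites

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("theme:42", "dark")                                           # plain string
store.put("cart:42", json.dumps({"items": [{"sku": "A1", "qty": 2}]}))  # JSON text
store.put("avatar:42", b"\x89PNG")                                      # binary blob

cart = json.loads(store.get("cart:42"))
print(cart["items"][0]["qty"])
```

The store never inspects the value, so a lookup is only possible by key; that simplicity is what makes this model easy to distribute.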
Column store database

• Column-oriented databases work primarily on columns, and every column is treated individually.
• Values of a single column are stored contiguously.
• A column store keeps the data for each column in its own column-specific file.
• All data within each column datafile has the same type, which makes it ideal for compression.
• Column stores can improve query performance because a query can access only the columns it needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Well suited to data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.
• Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
• A column family consists of multiple rows.
• Each row can contain a different number of columns from the other rows, and the columns don’t have to match the columns in the other rows (i.e. they can have different column names, data types, etc).
• Each column is confined to its row. It doesn’t span all rows as in a relational database. Each column contains a name/value pair, along with a timestamp.
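The column family structure described above can be sketched with nested dictionaries. This is an illustration only: the put and get helpers, the users family, and the sample rows are hypothetical and are not the API of BigTable or Cassandra.

```python
import time

# A column family maps row key -> {column name: (value, timestamp)}.
users = {}

def put(family, row_key, column, value, ts=None):
    """Set one column of one row, stamping it with a write time."""
    ts = time.time() if ts is None else ts
    family.setdefault(row_key, {})[column] = (value, ts)

def get(family, row_key, column):
    """Return the stored value, or None if the row lacks that column."""
    cell = family.get(row_key, {}).get(column)
    return None if cell is None else cell[0]

# Rows in the same family need not share the same columns.
put(users, "u1", "name", "Asha", ts=1)
put(users, "u1", "email", "asha@example.com", ts=1)
put(users, "u2", "name", "Ravi", ts=2)
put(users, "u2", "city", "Pune", ts=2)
put(users, "u2", "age", 31, ts=3)

print(sorted(users["u1"]), sorted(users["u2"]))
print(get(users, "u1", "city"))
```

Row u1 has two columns and row u2 has three, with different names and value types, and every cell carries its own timestamp.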
Document database

• A document database is a type of nonrelational database that is designed to store and query
data as JSON-like documents.
• Document databases make it easier for developers to store and query data in a database by
using the same document-model format they use in their application code.
• The flexible, semistructured, and hierarchical nature of documents and document databases
allows them to evolve with applications’ needs. The document model works well with use
cases such as catalogs, user profiles, and content management systems where each
document is unique and evolves over time.
• Document databases enable flexible indexing, powerful ad hoc queries, and analytics over
collections of documents.
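The document model can be sketched with plain Python dictionaries standing in for JSON-like documents. The catalog collection and the find helper are hypothetical and show only equality matching on top-level fields; actual document databases offer much richer query languages and indexes.

```python
import json

# A collection of documents; each document may have a different shape.
catalog = [
    {"_id": 1, "type": "book", "title": "NoSQL Basics", "tags": ["db"], "price": 30},
    {"_id": 2, "type": "laptop", "title": "UltraBook", "specs": {"ram_gb": 16}, "price": 900},
    {"_id": 3, "type": "book", "title": "Graph Thinking", "tags": ["db", "graph"], "price": 45},
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]

books = find(catalog, type="book")
print([doc["title"] for doc in books])

# Documents serialize directly to JSON, the same shape the application uses.
print(json.dumps(catalog[1]))
```

The laptop document has a nested specs object and no tags, while the books have tags and no specs; no schema change is needed to store both in one collection.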
Graph Based Database

• In computing, a graph database (GDB) is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship).
• The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation.
• Graph databases treat the relationships between data as a priority. Querying relationships is fast because they are stored persistently in the database rather than computed at query time. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.
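Nodes, edges, and properties can be sketched with an adjacency list. The sample people, the relationship names, and the helper functions below are hypothetical; graph databases expose this kind of traversal through their own query languages.

```python
from collections import deque

# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"label": "Person"},
    "bob": {"label": "Person"},
    "carol": {"label": "Person"},
    "acme": {"label": "Company"},
}
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
]

# Store relationships with their source node so lookups need no join.
adjacency = {}
for src, rel, dst in edges:
    adjacency.setdefault(src, []).append((rel, dst))

def neighbours(node, rel):
    """Nodes directly linked from `node` by relationship `rel`."""
    return [dst for r, dst in adjacency.get(node, []) if r == rel]

def reachable(start, rel):
    """Breadth-first traversal along edges of one relationship type."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours(current, rel):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

print(reachable("alice", "FOLLOWS"))
```

Because each node keeps its outgoing edges, following a relationship is a direct lookup rather than a join across tables.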
Comparison of NoSQL vs SQL Database
NoSQL Advantage

• High scalability: The scale-up approach used with relational databases breaks down as transaction rates and fast-response requirements increase. In contrast, the new generation of NoSQL databases is designed to scale out (i.e. to expand horizontally using low-end commodity servers).
• Manageability and administration: NoSQL databases are designed to work mostly with automated repairs, distributed data, and simpler data models, leading to lower management and administration requirements.
• Low cost: NoSQL databases are typically designed to work with a cluster of cheap commodity servers, enabling users to store and process more data at a low cost.
• Flexible data models: NoSQL databases have a very flexible data model, enabling them to work with any type of data; they don’t have to comply with the rigid RDBMS data models. As a result, application changes that involve updating the database schema can be implemented easily.
NoSQL Disadvantage

• Maturity: Many NoSQL databases are less mature than established relational products, and some still have key features left to implement. Thus, when deciding on a NoSQL database, we should analyse the product properly to ensure the features are fully implemented and not still on the to-do list.
• Support: Support is one limitation that we need to consider. Many NoSQL databases come from start-ups and were later open sourced. As a result, support can be minimal compared with that of enterprise software companies, and the vendor may not have global reach or support resources.
• Limited Query Capabilities: Since NoSQL databases are generally developed to meet the scaling requirements of web-scale applications, they provide limited querying capabilities.
• Expertise: Since NoSQL is an evolving area, expertise in the technology is limited in the developer and administrator community.
