2- NoSQL

The document discusses Big Data and NoSQL databases, explaining their definitions, sources, and challenges. It details the differences between SQL and NoSQL, including the CAP theorem, BASE model, and various types of NoSQL databases such as key-value, column, document, and graph databases. Additionally, it highlights the advantages and disadvantages of NoSQL databases compared to traditional SQL databases.

Uploaded by

bhoothu8
Copyright © All Rights Reserved
Assignment No. 01

• What is Big Data? Explain the various sources of Big Data.


• What are the three V’s of Big Data? Explain each of them.
• List and explain usage of Big Data.
• Explain various challenges in Big Data.
• What challenges do organizations face when managing big data using legacy systems?
Chapter 2:
NoSQL
UNIT I
SQL (Structured Query Language)

• Structured Query Language is the standard means of manipulating and querying data in relational databases.
• SQL can be used to share and manage data, particularly data that is found in
relational database management systems, which include data organized into tables.
• Multiple tables of data, possibly stored in separate files, may also be related to one another through a common field.
• Using SQL, you can query, update, and reorganize data, as well as create and modify
the schema (structure) of a database system and control access to its data.
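To make these operations concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The tables (dept, emp), their columns, and the sample rows are invented for illustration. It defines a schema, inserts and updates data, alters the schema, and queries two tables joined on a common field. SQLite has no GRANT/REVOKE statements, so access control is not shown.

```python
import sqlite3

# In-memory database; tables, columns and rows are illustrative only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Define the schema: two tables related by a common field, dept_id.
cur.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute(
    "CREATE TABLE emp (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
    "dept_id INTEGER REFERENCES dept(dept_id))"
)

# Insert data.
cur.executemany("INSERT INTO dept VALUES (?, ?)", [(1, "Sales"), (2, "IT")])
cur.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(10, "Asha", 1), (11, "Ravi", 2), (12, "Meera", 2)],
)

# Update data: move Meera to Sales.
cur.execute("UPDATE emp SET dept_id = 1 WHERE name = 'Meera'")

# Modify the schema: add a new column to an existing table.
cur.execute("ALTER TABLE emp ADD COLUMN salary INTEGER")
conn.commit()

# Query: join the two tables on the common field and count employees per department.
rows = cur.execute(
    "SELECT d.name, COUNT(*) FROM emp AS e JOIN dept AS d ON e.dept_id = d.dept_id "
    "GROUP BY d.name ORDER BY d.name"
).fetchall()
print(rows)  # [('IT', 1), ('Sales', 2)]
```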
Roles of SQL

• Interactive Query Language
• Administrative Language
• Client / Server Language
• Database Programming Language
• Internet Data Access Language
• Distributed Database Language
• Online Transaction Processing (OLTP) Language
NoSQL (Not Only SQL)

• NoSQL databases (aka "not only SQL") are non-tabular and store data differently from relational tables.
• A NoSQL database provides a mechanism for storing and retrieving data that is modeled in ways other than the tabular relations used in relational databases.
• Such databases have existed since the late 1960s, but the name "NoSQL" only became widespread in the early 21st century.
History of NoSQL

• The acronym NoSQL was first used in 1998 by Carlo Strozzi while naming his
lightweight, open-source “relational” database that did not use SQL.
• The name came up again in 2009 when Eric Evans and Johan Oskarsson used it to
describe non-relational databases. Relational databases are often referred to as SQL
systems.
• The term NoSQL can mean either “No SQL systems” or, more commonly, “Not only SQL,” to emphasize the fact that some systems might support SQL-like query languages.
Cont…

• The NoSQL model uses a distributed database system, meaning a system spread across multiple computers. Non-relational systems are often faster for large-scale workloads, take an ad hoc (schema-flexible) approach to organizing data, and can process large amounts of differing kinds of data.
• For large, unstructured data sets, NoSQL databases are often a better choice than relational databases because of their speed and flexibility.
• Not only can NoSQL systems handle both structured and unstructured data, but they can also process
unstructured Big Data quickly.
• This led to organizations such as Facebook, Twitter, LinkedIn, and Google adopting NoSQL systems. These
organizations process tremendous amounts of unstructured data, coordinating it to find patterns and gain
business insights.
Need of NoSQL

• When huge amounts of data need to be stored and retrieved.

• When the relationships between the data you store are not that important.

• When the data changes over time and is not structured.

• When support for constraints and joins is not required at the database level.

• When the data grows continuously and you need to scale the database regularly to handle it.

• NoSQL databases provide more flexibility in handling data: there is no requirement to define a schema before you start working with the application.
Why Use NoSQL?
• The concept of NoSQL databases became popular with Internet giants like Google, Facebook, and Amazon, which deal with huge volumes of data. System response times slow down when an RDBMS is used for such massive volumes of data.
• To resolve this problem, we could “scale up” our systems by upgrading the existing hardware. This process is expensive.
• The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as “scaling out.”
• NoSQL databases are non-relational and were designed with web applications in mind, so they scale out more easily than relational databases.
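Scaling out usually means partitioning (sharding) the data so that each host stores only part of it. The sketch below assumes simple hash-modulo sharding and hypothetical host names (db-node-0 and so on); plain Python dicts stand in for the separate servers. A stable hash (SHA-256) is used because Python's built-in hash() of a string changes between runs.

```python
import hashlib

# Hypothetical host names; the dicts below stand in for separate database servers.
HOSTS = ["db-node-0", "db-node-1", "db-node-2"]
storage = {host: {} for host in HOSTS}


def shard_for(key: str) -> str:
    """Choose the host that owns a key: hash the key, then take it modulo the host count."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return HOSTS[int(digest, 16) % len(HOSTS)]


def put(key: str, value) -> None:
    storage[shard_for(key)][key] = value


def get(key: str):
    return storage[shard_for(key)].get(key)


# Load some records; each host ends up holding only its share of them.
for i in range(9):
    put(f"user:{i}", {"id": i})

print({host: len(data) for host, data in storage.items()})
print(get("user:4"))  # {'id': 4}
```

Modulo sharding is easy to follow but moves most keys to new hosts when a host is added or removed; production systems often use consistent hashing instead.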
CAP Theorem (Brewers Theorem)
• The CAP Theorem, also known as Brewer’s theorem (after its developer, Eric Brewer), is central to understanding non-relational, distributed databases. It states that a distributed data store cannot simultaneously provide more than two of the following three guarantees. Brewer, at the University of California, presented the theory in the fall of 1998, and it was published in 1999 as the CAP Principle. The three guarantees that cannot all be met simultaneously are:

• Consistency: Every read returns the most recent write (or an error). For instance, after an update, all clients see the same data.

• Availability: The system is always on, with no downtime. Any client making a request for data gets a response, even if one or more nodes are down. Put another way, all working nodes in the distributed system return a valid response for any request, without exception.

• Partition Tolerance: The system continues to function even when communication among the servers is unreliable, i.e., when the network splits (partitions) the servers into groups that cannot communicate with each other.
Cont…
• NoSQL (non-relational) databases are ideal for
distributed network applications. Unlike their
vertically scalable SQL (relational) counterparts,
NoSQL databases are horizontally scalable and
distributed by design—they can rapidly scale across
a growing network consisting of multiple
interconnected nodes.
• Today, NoSQL databases are classified based on the
two CAP characteristics they support:
Cont…
• CP database: A CP database delivers consistency and partition tolerance at the expense of availability. When a partition occurs between any two nodes, the system has to shut down the non-consistent node (i.e., make it unavailable) until the partition is resolved.
• AP database: An AP database delivers availability and partition tolerance at the expense of consistency. When a partition occurs, all nodes remain available, but those on the wrong side of the partition might return an older version of the data than others. (When the partition is resolved, AP databases typically resync the nodes to repair all inconsistencies in the system.)
• CA database: A CA database delivers consistency and availability across all nodes. It cannot do this if there is a partition between any two nodes in the system, however, and therefore cannot deliver partition tolerance.
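The CP/AP difference can be seen in a toy model. The sketch below is a teaching simulation, not a real database: three replicas (A, B, C) hold one value and a network partition isolates C. As an assumption, CP mode lets only the side holding a majority of the nodes serve requests, and when the partition heals the replica with the highest version number wins (an assumed last-writer-wins rule). The class and method names are hypothetical.

```python
class Unavailable(Exception):
    """Raised when a CP node refuses a request rather than risk returning stale data."""


class ToyCluster:
    """Three replicas (A, B, C) of a single value. A teaching model, not a real database."""

    def __init__(self, mode: str):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.value = {"A": "v1", "B": "v1", "C": "v1"}
        self.version = {"A": 0, "B": 0, "C": 0}
        self.groups = [{"A", "B", "C"}]  # sets of nodes that can reach each other

    def partition(self, *groups):
        self.groups = [set(g) for g in groups]

    def _group_of(self, node):
        return next(g for g in self.groups if node in g)

    def _check(self, node):
        # Assumed CP rule: only a side holding a majority of the nodes may serve requests.
        if self.mode == "CP" and len(self._group_of(node)) <= len(self.value) // 2:
            raise Unavailable(f"{node} is cut off from the majority")

    def write(self, node, new_value):
        self._check(node)
        reachable = self._group_of(node)
        new_version = max(self.version[n] for n in reachable) + 1
        for n in reachable:  # replicate only to the nodes this side can reach
            self.value[n], self.version[n] = new_value, new_version

    def read(self, node):
        self._check(node)
        return self.value[node]

    def heal(self):
        # Partition resolved: every node adopts the highest-versioned value.
        self.groups = [set(self.value)]
        newest = max(self.value, key=lambda n: self.version[n])
        for n in self.value:
            self.value[n], self.version[n] = self.value[newest], self.version[newest]


results = {}
for mode in ("CP", "AP"):
    cluster = ToyCluster(mode)
    cluster.partition({"A", "B"}, {"C"})  # node C is isolated
    cluster.write("A", "v2")              # the A/B side accepts the write
    try:
        results[mode] = cluster.read("C")
    except Unavailable:
        results[mode] = "unavailable"
    cluster.heal()
    results[mode + " after heal"] = cluster.read("C")

print(results)
# {'CP': 'unavailable', 'CP after heal': 'v2', 'AP': 'v1', 'AP after heal': 'v2'}
```

The CP cluster refuses to answer on the isolated node, while the AP cluster answers with the stale value v1; both agree again once the partition is resolved.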
The BASE Model

• The BASE acronym is used to describe the properties of certain databases, usually NoSQL databases. It is often referred to as the opposite of ACID.
• The definition:
Basically Available, Soft state, Eventual consistency
Cont…

• A BASE system gives up on strong consistency in favour of availability.
• Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
• Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
• Eventual consistency indicates that the system will become consistent over time, provided that it does not receive new input during that time.
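Eventual consistency can be illustrated with replicas that exchange state in the background. This sketch assumes a last-writer-wins merge on unique timestamps and a random pull-style gossip exchange (one common technique, not the only one); the class and function names are hypothetical. The replicas start out disagreeing (soft state) and converge once no new writes arrive.

```python
import random


class Replica:
    """One copy of the data: key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def local_write(self, key, value, ts):
        self.data[key] = (ts, value)

    def merge_from(self, other):
        # Assumed policy: last writer wins, and timestamps are unique.
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)


def gossip_round(replicas, rng):
    """Anti-entropy step: every replica pulls state from one randomly chosen peer."""
    for replica in replicas:
        peer = rng.choice([p for p in replicas if p is not replica])
        replica.merge_from(peer)


rng = random.Random(0)  # fixed seed so the run is repeatable
replicas = [Replica(name) for name in "ABCD"]

# Two writes land on different replicas; for now the copies disagree (soft state).
replicas[0].local_write("cart:42", ["book"], ts=1)
replicas[3].local_write("cart:42", ["book", "pen"], ts=2)

# No new input arrives, so background gossip eventually makes all copies equal.
rounds = 0
while len({repr(r.data) for r in replicas}) > 1:
    gossip_round(replicas, rng)
    rounds += 1

print(rounds, replicas[1].data)
```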
ACID

• One hallmark of relational database systems is something known as ACID compliance.
• ACID is an acronym whose individual letters describe characteristics of individual database transactions, as listed below:
• Atomicity: The database transaction must completely succeed or completely fail. Partial success is not allowed.
• Consistency: The transaction takes the RDBMS from one valid state to another. The state is never left invalid.
• Isolation: The client’s database transaction must occur in isolation from other clients attempting to transact with the RDBMS.
• Durability: Completed transactions persist, even when servers restart. Transaction failures cannot leave the data in a partially committed state.
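Atomicity is easy to observe with Python's built-in sqlite3 module. In this sketch (table, accounts, and amounts are invented for illustration), a transfer credits one account and debits another inside one transaction; a CHECK constraint stands in for a business rule. When the debit violates the constraint, the connection's context manager rolls the whole transaction back, so the credit that already ran is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint stands in for a business rule: balances may never go negative.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src in one transaction: both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit violates CHECK, so the credit is undone
print(ok, failed, balances(conn))  # True False {'alice': 70, 'bob': 80}
```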
State the difference between ACID and BASE
Eventual Consistency Model

• NoSQL databases are eventually consistent, but the implementation of eventual consistency may vary across different NoSQL databases.
• NRW is the notation used to describe how the eventual consistency model is implemented across NoSQL databases, where:
• N is the number of data copies (replicas) that the database maintains.
• R is the number of copies that an application needs to read before returning a read request’s output.
• W is the number of data copies that need to be written before a write operation is marked as successfully completed.
• Using these settings, databases implement the eventual consistency model.
Cont…

• Consistency can be implemented at both the read and the write operation levels.
Write Operations
• N=W implies that the write operation updates all data copies before returning control to the client and marking the write operation as successful.
• This is similar to how traditional RDBMS databases work when implementing synchronous replication. This setting slows down write performance.
Cont…

• If write performance is a concern, meaning you want writes to happen fast, you can set W=1, R=N. The write then updates just one copy and is marked as successful, but whenever the user issues a read request, all copies are read to return the result. If any copy is not up to date, it is updated first, and only then does the read succeed. This implementation slows down read performance.
• Hence many NoSQL implementations use N>W>1. This means that more than one node must be updated successfully; however, not all nodes need to be updated at the same time.
Cont…

Read Operations
• If R is set to 1, the read operation reads any single data copy, which may be outdated.
• If R>1, more than one copy is read, which increases the chance of returning the most recent value. However, this can slow down the read operation.
• Using N < W+R always ensures that a read operation retrieves the latest value. Because the number of copies written plus the number of copies read exceeds the total number of copies, the read set and the write set must share at least one copy, and that copy holds the latest version. This is known as a quorum.
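The quorum condition can be checked mechanically. The sketch below compares the rule W + R > N with a brute-force check that every possible set of R copies shares at least one copy with every possible set of W copies. The function names are hypothetical; the configurations mirror the settings discussed above (W=N, W=1 with R=N, and so on).

```python
from itertools import combinations


def quorum_rule(n: int, r: int, w: int) -> bool:
    """Strong-consistency condition from the text: W + R > N."""
    return w + r > n


def read_always_overlaps_write(n: int, r: int, w: int) -> bool:
    """Brute force: does every possible set of R replicas share a replica
    with every possible set of W replicas?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Configurations with N = 3, plus one (N=5, R=2, W=3) that just misses the rule.
for n, r, w in [(3, 1, 3), (3, 3, 1), (3, 2, 2), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: W+R>N is {quorum_rule(n, r, w)}, "
          f"overlap guaranteed is {read_always_overlaps_write(n, r, w)}")
```

The two columns always agree: when W + R > N, the pigeonhole principle forces the read and write sets to overlap, and when W + R ≤ N there is always some read set that misses every replica the write touched.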
Categories of NoSQL Databases

• Key-value store database
• Column store database
• Document database
• Graph database
Key-value store database

• Key-value stores are the most basic type of NoSQL database.
• They are designed to handle huge amounts of data.
• Key-value stores allow developers to store schema-less data.
• In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string, JSON, a BLOB (Binary Large OBject), etc.
• The values stored against these keys may be strings, hashes, lists, sets, or sorted sets (as in Redis).
• Key-value stores can be used as collections, dictionaries, associative arrays, etc.
• Key-value stores typically favour the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
• Key-value stores work well for shopping cart contents, or individual values such as color schemes, a landing page URI, or a default account number.
Cont…

• Data is stored in key/value pairs. Key-value stores are designed to handle lots of data and heavy load.
• Key-value pair storage databases store data as a hash table where each key is unique, and the value can be JSON, a BLOB (Binary Large Object), a string, etc.
• The value in a key-value store can be anything: a string, a
number, but also an entirely new set of key-value pairs
encapsulated in an object. Figure 6 shows a slightly more
complex key-value structure.
• Examples of key-value stores are Redis, Voldemort, Riak, and
Amazon’s DynamoDB.
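The core interface of a key-value store is small: put, get, and delete by a unique key. The sketch below is a hypothetical in-memory store in which a Python dict plays the role of the hash table; the key names are invented for illustration. Real stores such as Redis or DynamoDB add persistence, replication, and richer value types on top of this basic model.

```python
import json
from typing import Any, Optional


class KeyValueStore:
    """A minimal in-memory key-value store; a Python dict provides the hash table."""

    def __init__(self) -> None:
        self._table: dict = {}

    def put(self, key: str, value: Any) -> None:
        self._table[key] = value  # each key is unique; putting again overwrites

    def get(self, key: str, default: Optional[Any] = None) -> Any:
        return self._table.get(key, default)

    def delete(self, key: str) -> bool:
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("cart:user42", ["book", "pen"])                   # shopping cart contents
store.put("pref:user42:color_scheme", "dark")               # a single string value
store.put("profile:user42", json.dumps({"name": "Asha", "plan": "free"}))  # JSON text
store.put("avatar:user42", b"\x89PNG\r\n\x1a\n")            # a small binary blob
store.put("session:abc", {"user": "user42", "cart": {"book": 1}})  # nested key-value pairs

print(store.get("cart:user42"))                         # ['book', 'pen']
print(json.loads(store.get("profile:user42"))["plan"])  # free
```

The store never looks inside a value; only the application interprets it, which is why values can be strings, JSON, blobs, or nested key-value objects.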
Column store database

• Column-oriented databases primarily work on columns, and every column is treated individually.
• Values of a single column are stored contiguously.
• Column stores keep data in column-specific files.
• All data within each column data file has the same type, which makes it ideal for compression.
• Column stores can improve query performance because they can read only the columns a query needs.
• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
• Well suited to data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.
• Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
• A column family consists of multiple rows.
• Each row can contain a different number of columns from the other rows, and the columns don’t have to match the columns in other rows (i.e., they can have different column names, data types, etc.).
• Each column is confined to its row; it doesn’t span all rows as in a relational database. Each column contains a name/value pair, along with a timestamp.
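Both ideas above can be sketched with plain Python structures. The first part contrasts a row-oriented list of records with a column-oriented layout, where an aggregate such as SUM reads only one column. The second part models a column family as row key → {column name: (value, timestamp)}, where rows hold different columns. The data, the timestamps, and the put helper are hypothetical; real systems such as Cassandra store these structures on disk with far more machinery.

```python
from collections import defaultdict

# Row-oriented layout: each record's fields are stored together.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]

# Column-oriented layout: all values of one column are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation such as SUM(amount) only has to scan the "amount" column.
total = sum(columns["amount"])
print(total)  # 400

# Column family (wide-column style): row key -> {column name: (value, timestamp)}.
# Rows need not share the same columns.
column_family = defaultdict(dict)


def put(row_key, column, value, ts):
    column_family[row_key][column] = (value, ts)


put("user:1", "name", "Asha", ts=1)
put("user:1", "email", "asha@example.com", ts=1)
put("user:2", "name", "Ravi", ts=2)
put("user:2", "city", "Pune", ts=2)

print(sorted(column_family["user:1"]))  # ['email', 'name']
print(sorted(column_family["user:2"]))  # ['city', 'name']
```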
Document database

• A document database is a type of nonrelational database that is designed to store and query
data as JSON-like documents.
• Document databases make it easier for developers to store and query data in a database by
using the same document-model format they use in their application code.
• The flexible, semistructured, and hierarchical nature of documents and document databases
allows them to evolve with applications’ needs. The document model works well with use
cases such as catalogs, user profiles, and content management systems where each
document is unique and evolves over time.
• Document databases enable flexible indexing, powerful ad hoc queries, and analytics over
collections of documents.
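The sketch below imitates a document collection with Python dicts to show schema flexibility, an ad hoc query over nested fields, and a simple secondary index. The helpers find, get_path, and build_index are hypothetical and are not any database's API. In MongoDB, for comparison, the first query would be expressed as a filter document such as {"price": {"$lt": 20}}.

```python
# A "collection" of JSON-like documents (Python dicts); the fields vary per document.
products = [
    {"_id": 1, "name": "Laptop", "price": 900, "specs": {"ram_gb": 16, "cpu": "x86"}},
    {"_id": 2, "name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "E-book", "price": 9, "format": "PDF"},
]


def get_path(doc, path, default=None):
    """Follow a dotted path such as 'specs.ram_gb' into nested documents."""
    value = doc
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return default
        value = value[part]
    return value


def find(collection, predicate):
    """Ad hoc query: return every document for which predicate(doc) is true."""
    return [doc for doc in collection if predicate(doc)]


def build_index(collection, path):
    """Secondary index on a field path: field value -> list of document ids.
    Assumes the indexed field holds hashable values such as strings or numbers."""
    index = {}
    for doc in collection:
        key = get_path(doc, path)
        if key is not None:
            index.setdefault(key, []).append(doc["_id"])
    return index


# Documents can evolve: add a field to one document without changing the others.
products[2]["drm_free"] = True

cheap = find(products, lambda d: d["price"] < 20)
big_ram = find(products, lambda d: get_path(d, "specs.ram_gb", 0) >= 16)
print([d["name"] for d in cheap])       # ['T-shirt', 'E-book']
print([d["name"] for d in big_ram])     # ['Laptop']
print(build_index(products, "format"))  # {'PDF': [3]}
```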
Graph Based Database

• In computing, a graph database (GDB) is a database that uses graph structures for semantic queries, with nodes, edges, and properties to represent and store data. A key concept of the system is the graph (or edge or relationship).
• The graph relates the data items in the store to a collection of nodes and edges,
the edges representing the relationships between the nodes. The relationships
allow data in the store to be linked together directly and, in many cases,
retrieved with one operation.
• Graph databases treat the relationships between data as a priority. Querying relationships is fast because the relationships are stored persistently in the database rather than computed at query time. Relationships can be intuitively visualized using graph databases, making them useful for heavily interconnected data.
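The sketch below stores a tiny property graph as Python dicts: nodes with properties, and edges kept as adjacency lists of (relationship type, target, edge properties). A breadth-first traversal then answers a "friends within two hops" question by following stored edges directly instead of joining tables. The data and the within_hops helper are hypothetical. In Neo4j's Cypher language, a comparable query would look roughly like MATCH (a:Person {name: 'asha'})-[:FRIEND_OF*1..2]->(b) RETURN b.

```python
from collections import deque

# Nodes with properties.
nodes = {
    "asha": {"label": "Person", "city": "Pune"},
    "ravi": {"label": "Person", "city": "Delhi"},
    "meera": {"label": "Person", "city": "Pune"},
    "karan": {"label": "Person", "city": "Mumbai"},
}

# Edges as adjacency lists: (relationship type, target node, edge properties).
# Each node keeps direct references to its neighbours, so following a
# relationship is a lookup rather than a join.
edges = {
    "asha": [("FRIEND_OF", "ravi", {"since": 2019})],
    "ravi": [("FRIEND_OF", "meera", {"since": 2021})],
    "meera": [("FRIEND_OF", "karan", {"since": 2022})],
    "karan": [],
}


def within_hops(start, max_hops, rel="FRIEND_OF"):
    """Breadth-first traversal: nodes reachable from start via at most max_hops rel edges."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for rel_type, neighbour, _props in edges[node]:
            if rel_type == rel and neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {n: d for n, d in distance.items() if n != start}


reach = within_hops("asha", 2)
print(reach)  # {'ravi': 1, 'meera': 2}
print([n for n in reach if nodes[n]["city"] == "Pune"])  # ['meera']
```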
Comparison of NoSQL vs SQL Database
NoSQL Advantage

• High scalability: Relational databases traditionally scale up by moving to larger, more powerful servers, and this scaling-up approach fails when transaction rates and fast-response requirements increase. In contrast, the new generation of NoSQL databases is designed to scale out (i.e. to expand horizontally using low-end commodity servers).
• Manageability and administration: NoSQL databases are designed to work mostly with automated repairs, distributed data, and simpler data models, leading to lower management and administration overhead.
• Low cost: NoSQL databases are typically designed to work with a cluster of cheap
commodity servers, enabling the users to store and process more data at a low cost.
• Flexible data models: NoSQL databases have a very flexible data model, enabling them to
work with any type of data; they don’t comply with the rigid RDBMS data models. As a result,
any application changes that involve updating the database schema can be easily
implemented.
NoSQL Disadvantage

• Maturity: Many NoSQL databases are less mature than established relational systems, and some key features may still be unimplemented. Thus, when deciding on a NoSQL database, we should analyse the product properly to ensure that the features we need are fully implemented and not still on the to-do list.
• Support: Support is one limitation that we need to consider. Many NoSQL databases come from start-ups and open-source projects. As a result, support can be minimal compared with that offered by enterprise software companies, and the vendors may not have global reach or support resources.
• Limited Query Capabilities: Since NoSQL databases are generally developed to meet
the scaling requirement of the web-scale applications, they provide limited querying
capabilities.
• Expertise: Since NoSQL is an evolving area, expertise on the technology is limited in the
developer and administrator community.
