NoSQL

The document discusses different types of NoSQL databases and why they emerged as an alternative to relational databases. It covers key reasons like handling large datasets, scaling to clusters of machines, and impedance mismatch between object models and relational models. It also discusses some example NoSQL databases like those created by Google and Amazon.

Uploaded by

HELLO World

Types of NoSQL Databases

Introduction
• NoSQL was born out of a need to handle larger data
volumes, which forced a fundamental shift to
building large hardware platforms from
clusters of commodity servers.
• Advocates of NoSQL databases claim that they
can build systems that are more performant,
scale much better, and are easier to program
with.
Why Are NoSQL Databases Interesting?
• Application development productivity. A lot
of application development effort is spent on
mapping data between in-memory data
structures and a relational database.
• A NoSQL database may provide a data model
that better fits the application’s needs, thus
simplifying that interaction and resulting in
less code to write, debug, and evolve.
Cont’d
• Large-scale data. Organizations are finding it valuable
to capture more data and process it more quickly.
• They are finding it expensive, if even possible, to do so
with relational databases.
• The primary reason is that a relational database is
designed to run on a single machine, but it is usually
more economical to run large data and computing loads
on clusters of many smaller and cheaper machines.
• Many NoSQL databases are designed explicitly to run
on clusters, so they make a better fit for big data
scenarios.
The Value of Relational Databases
• Getting at persistent data: databases provide a “backing” store
for volatile memory
– Two areas of memory:
• Fast, small, volatile main memory
• Larger, slower, non-volatile backing store
• Since main memory is volatile, to keep data around we
write it to a backing store, commonly a disk, which
acts as persistent memory.
• The backing store can be:
– File system
– Database
• The database allows more flexibility than a file system in
storing large amounts of data in a way that allows an
application program to get information quickly and easily.
Concurrency
• Multiple applications accessing shared data
– Transactions
• Enterprise applications tend to have many people using
the same data at once, possibly modifying that data.
• We have to worry about coordinating interactions
between them to avoid things like double booking of
hotel rooms.
• Since enterprise applications can have lots of users and
other systems all working concurrently, there’s a lot of
room for bad things to happen.
• Relational databases help to handle this by controlling
all access to their data through transactions.
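The double-booking case can be sketched with Python's standard `sqlite3` module. This is a minimal illustration only: the `bookings` table, its columns, and the `book_room` helper are assumptions made for the example, not something defined in the slides. A primary key on (room, night) plus a transaction lets the database reject the second booking.

```python
import sqlite3

def book_room(conn, room, night, guest):
    """Try to book a room for one night; return True if the booking succeeded."""
    try:
        # "with conn" wraps the statement in a transaction:
        # it commits on success and rolls back if an exception is raised.
        with conn:
            conn.execute(
                "INSERT INTO bookings (room, night, guest) VALUES (?, ?, ?)",
                (room, night, guest),
            )
        return True
    except sqlite3.IntegrityError:
        # The (room, night) pair is already taken, so the database refused the insert.
        return False

# Hypothetical schema: one row per room per night.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings ("
    " room INTEGER, night TEXT, guest TEXT,"
    " PRIMARY KEY (room, night))"
)

first = book_room(conn, 101, "2024-05-01", "alice")
second = book_room(conn, 101, "2024-05-01", "bob")  # same room, same night
```

Here the application does not coordinate the two guests itself; the database enforces the rule for every client that goes through it.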
Integration
• Enterprises require multiple applications, written by
different teams, to collaborate in order to get things done.
• Applications often need to use the same data, and updates
made through one application have to be visible to others.
• A common way to do this is shared database integration,
where multiple applications store their data in a single
database.
• Using a single database allows all the applications to use
each other’s data easily, while the database’s concurrency
control handles multiple applications in the same way as it
handles multiple users in a single application.
Impedance Mismatch
• Impedance mismatch is a term used in computer science to
describe the problem that arises when two systems or
components that are supposed to work together have
different data models, structures, or interfaces that make
communication difficult or inefficient.
• In the context of databases, impedance mismatch refers to the
discrepancy between the object-oriented programming (OOP)
model used in application code and the relational model used
in database management systems (DBMS).
• While OOP models are designed to represent data as objects
with properties and methods, relational models represent data
as tables with columns and rows.
• This impedance mismatch can create challenges when it comes
to mapping objects in code to tables in a database or vice
versa.
Impedance Mismatch
• The difference between the relational model
and the in-memory data structures.
• The relational data model organizes data into
a structure of tables.
– A tuple is a set of name-value pairs and a
relation is a set of tuples.
• Structure and relationships have to be
mapped
– Rich, in-memory structures have to be translated
to a relational representation to be stored on disk
– This translation is the impedance mismatch
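The translation step can be made concrete with a small sketch. The `Order` and `LineItem` classes and the two row layouts below are hypothetical, chosen only to show how one nested in-memory object has to be split into flat tuples for two tables and then reassembled.

```python
from dataclasses import dataclass, field

@dataclass
class LineItem:
    product: str
    quantity: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list[LineItem] = field(default_factory=list)

def to_rows(order):
    """Flatten one nested Order into rows for an 'orders' and an 'order_items' table."""
    order_row = (order.order_id, order.customer)
    # Each item row repeats the order id so the relationship survives as a foreign key.
    item_rows = [(order.order_id, item.product, item.quantity) for item in order.items]
    return order_row, item_rows

def from_rows(order_row, item_rows):
    """Rebuild the nested Order from its flat rows."""
    order_id, customer = order_row
    items = [LineItem(product, quantity)
             for (oid, product, quantity) in item_rows if oid == order_id]
    return Order(order_id, customer, items)

order = Order(7, "alice", [LineItem("book", 2), LineItem("pen", 5)])
order_row, item_rows = to_rows(order)
rebuilt = from_rows(order_row, item_rows)
```

The `to_rows` and `from_rows` functions are the kind of code that mapping frameworks generate or hide; the work still has to happen somewhere.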
Cont’d
• Impedance mismatch has been made much
easier to deal with by the wide availability of
object-relational mapping frameworks, such as
Hibernate and iBATIS, which implement well-known
mapping patterns, but the mapping
problem is still an issue.
Application and Integration Databases
• Data integration is the process of taking data from
different sources and formats and combining it into a
single data set.
• In an integration database, multiple applications, usually
developed by separate teams, store their data in a
common database.
• This improves communication because all the applications
are operating on a consistent set of persistent data.
• Put another way, an integration database is a database that
acts as the data store for multiple applications, and thus
integrates data across these applications.
Cont’d
• Integrating many applications makes the database (dramatically)
more complex than any single application needs
– Changes to the data model must be
coordinated
– Different applications have different structural and
performance needs
– Database integrity becomes an issue
• Instead, treat the database as an application
database
– Single application, single development team
– Provide alternate integration mechanisms
Cont’d
• Data integration platforms are an efficient
approach to data utilization and storage.
• Rather than replicating data across locations
or environments, the integration database
serves as a single source of truth.
Alternate Integration Mechanism: Services
• During the 2000s we saw a distinct shift to web
services, where applications integrate by communicating
over HTTP
– XML-RPC, SOAP, REST
• This gives more flexibility in the structure of the data exchanged
– XML, JSON, etc.
– Text-based protocols
• It also lets application developers choose their own database
– Application databases
– Relational databases are often still an appropriate
choice
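As a rough sketch of what a text-based exchange buys, the snippet below serializes a nested record to JSON and parses it back with Python's standard `json` module. The `order` structure and the function names are hypothetical; the HTTP transport itself is left out.

```python
import json

def encode_order(order: dict) -> str:
    """Turn a nested in-memory structure into a JSON text payload."""
    return json.dumps(order, sort_keys=True)

def decode_order(payload: str) -> dict:
    """Parse a JSON payload received from another application."""
    return json.loads(payload)

# A nested record travels as one document; no shared table layout is needed.
order = {
    "order_id": 7,
    "customer": "alice",
    "items": [{"product": "book", "quantity": 2}],
}
payload = encode_order(order)     # what would be sent in an HTTP body
received = decode_order(payload)  # what the consuming application sees
```

Because the two sides agree only on the payload, each application is free to store the data in whatever database suits it.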
Application Database
• An application database is a database that is
controlled and accessed by a single application.
• With an application database, only the team using
the application needs to know about the database
structure, which makes it much easier to maintain
and evolve the schema.
• Since the application team controls both the
database and the application code, the
responsibility for database integrity can be put in
the application code.
The Attack of the Clusters
• The 2000s saw the web grow enormously
– Web use tracking data, social networks, activity logs,
mapping data, etc.
– Huge websites serving huge numbers of visitors
• Handling the increase in data and traffic required more
computing resources
• Instead of building bigger machines with more
processors, storage, and memory, use clusters of small,
commodity machines
– Cheaper, more resilient
• But relational databases are not designed to be run on
clusters
Cont’d
• Coping with the increase in data and traffic required
more computing resources.
• To handle this kind of increase, you have two choices:
• 1. Scaling up implies:
– bigger machines
– more processors
– more disk storage
– more memory
• Scaling up disadvantages:
– But bigger machines get more and more expensive.
– There are real limits as size increases.
Cont’d
• 2. Scaling out: use lots of small machines in a cluster:
– A cluster of small machines can use commodity
hardware and ends up being cheaper at these
kinds of scales.
– It can also be more resilient—while individual
machine failures are common, the overall cluster
can be built to keep going despite such failures,
providing high reliability.
Clustered Relational Databases
• Relational databases are not designed to be run on
clusters.
• Clustered relational databases, such as Oracle RAC
or Microsoft SQL Server, work on the concept of a
shared disk subsystem, so the cluster still has the disk
subsystem as a single point of failure.
• Relational databases could also be run as separate
servers for different sets of data, effectively sharding
the database.
• Even though this separates the load, all the sharding
has to be controlled by the application which has to
keep track of which database server to talk to for each
bit of data.
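The bookkeeping that application-controlled sharding pushes onto the application can be sketched as follows. The server names are hypothetical and plain dictionaries stand in for the separate database servers; hashing the key is one common routing rule, assumed here for illustration.

```python
import hashlib

# Hypothetical server names; each dict stands in for one database server.
SHARD_NAMES = ["db-server-0", "db-server-1", "db-server-2"]
shards = {name: {} for name in SHARD_NAMES}

def shard_for(key: str) -> str:
    """Pick the server responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARD_NAMES)
    return SHARD_NAMES[index]

def put(key, value):
    # The application, not the database, decides where the record lives.
    shards[shard_for(key)][key] = value

def get(key):
    # Reads must use the same routing rule as writes to find the record.
    return shards[shard_for(key)].get(key)

put("customer:42", {"name": "alice"})
found = get("customer:42")
```

Anything that spans two keys on different servers, such as a join or a transaction, falls outside what any single server can check.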
Cont’d
• We lose any querying, referential integrity, transactions,
or consistency controls that cross shards.
• Commercial (licensed) relational databases are usually
priced on a single-server assumption, so running on a
cluster raises prices.
• This mismatch between relational databases and
clusters led some organizations to consider an alternative
route to data storage. Two companies in particular stand out:
– 1. Google
– 2. Amazon
• Both were running large clusters
• They were capturing huge amounts of data
The Emergence of NoSQL
• Historical note: ‘NoSQL’ was first used to name an open-
source relational database whose development was led by
Carlo Strozzi.
• That name came from the fact that Strozzi’s
database doesn’t use SQL as a query language.
• Instead, the database is manipulated through shell
scripts that can be combined into the usual UNIX
pipelines.
• Current use of the phrase came from a conference
meetup discussing “open-source, distributed, nonrelational
databases.”
Cont’d
• Most NoSQL databases are driven by the need to run on
clusters.
• Relational databases use ACID transactions to handle
consistency across the whole database.
• This inherently clashes with a cluster environment, so
NoSQL databases offer a range of options for consistency
and distribution.
• Not all NoSQL databases are strongly oriented towards
running on clusters.
• Graph databases are one style of NoSQL databases that
uses a distribution model similar to relational databases
but offers a different data model that makes it better at
handling data with complex relationships.
Cont’d
• NoSQL databases operate without a schema,
allowing you to freely add fields to database
records without having to define any changes
in structure first.
• Two primary reasons for considering NoSQL:
– 1) To handle data access with sizes and
performance that demand a cluster
– 2) To improve the productivity of application
development by using a more convenient data
interaction style.
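The schemaless point can be sketched with plain Python dictionaries standing in for database records. The `save` and `field_values` helpers and the field names are hypothetical; the point is only that a new field is added without any structure change being declared first, and that readers must then cope with records that lack it.

```python
records = {}  # record id -> dict of fields; stands in for a schemaless store

def save(record_id, **fields):
    """Create or update a record with whatever fields are supplied."""
    records.setdefault(record_id, {}).update(fields)

def field_values(field_name):
    """Collect a field from the records that have it, skipping those that do not."""
    return {rid: rec[field_name] for rid, rec in records.items() if field_name in rec}

save(1, name="Ada")
save(2, name="Grace", email="grace@example.com")  # extra field, no schema change
save(1, nickname="countess")                      # new field on an existing record

emails = field_values("email")
```

The flexibility moves the schema knowledge into the application code, which has to handle non-uniform records on every read.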
Cont’d
• A NoSQL database provides a
mechanism for storage and retrieval of data.
NoSQL databases are used in real-time web applications
and big data, and their use is increasing over
time.
• Many NoSQL stores compromise consistency
in favor of availability, speed and partition
tolerance.
Advantages of NoSQL
• 1. High Scalability
– Many NoSQL databases use sharding for horizontal
scaling.
– They can handle huge amounts of data because, as
the data grows, the database scales out to
handle it efficiently.
• 2. High Availability
– Automatic replication in NoSQL databases
makes them highly available.
Disadvantages of NoSQL
1. Narrow focus: NoSQL databases are mainly designed for storage and
provide very little other functionality.
2. Open source: most NoSQL databases are open source and follow no
common standard, so two database systems are likely to differ.
3. Management challenge: big data management in NoSQL
is much more complex than in a relational database.
4. Limited GUI support: GUI tools for accessing the
database are not widely available in the market.
5. Backup: this is a significant weak point for some NoSQL
databases like MongoDB.
6. Large document size: storing data in JSON format increases the
document size.
When should NoSQL be used
• When a huge amount of data needs to be stored and
retrieved.
• The relationships between the data you store are not
that important.
• The data changes over time and is not structured.
• Support for constraints and joins is not required at
the database level.
• The data is growing continuously and you need to
scale the database regularly to handle it.
Characteristics of NoSQL Databases
• They do not use SQL and the relational model
– Some do have query languages that are similar to SQL, to
be easy to learn and use.
• Mostly open-source projects
• Designed to be distributed (clustered)
– No expectation of ACID properties
– Range of options for consistency and distribution
• Schema free
– Freely add fields to records without having to define any
changes in structure first
– Non-uniform data and custom fields
• No precise definition of NoSQL: an ill-defined set of mostly open-
source databases, mostly developed in the early 21st century, and
mostly not using SQL
Polyglot Persistence
• Polyglot persistence is a conceptual term that refers to the use of
different data storage approaches and technologies to support the
unique storage requirements of various data types that live within
enterprise applications.
• Polyglot persistence refers to using different data storage technologies
to handle varying data storage needs.
• Polyglot persistence is a fancy term to mean that when storing data, it is
best to use multiple data storage technologies, chosen based upon the
way data is being used by individual applications or components of a
single application.
• Different kinds of data are best handled by different data stores. In
short, it means picking the right tool for the right use case.
Example
• As a polyglot persistence example, an e-
commerce platform will deal with many types
of data (e.g. shopping cart, inventory, completed
orders, etc.). Instead of trying to store all this
data in one database, which would require a lot
of data conversion to make the format of the
data all the same, store the data in the
database best suited for that type of data. So
the e-commerce platform might look like this:
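A minimal sketch of such a split, using in-memory stand-ins. The assignment of data types to store kinds here (key-value for the shopping cart, relational for inventory, documents for completed orders) is an illustrative assumption, as are all function, table, and field names.

```python
import json
import sqlite3

cart_store = {}       # key-value stand-in: session id -> cart contents
order_documents = []  # document-store stand-in: one JSON document per completed order

# Relational stand-in for inventory, where integrity constraints matter.
inventory_db = sqlite3.connect(":memory:")
inventory_db.execute(
    "CREATE TABLE inventory (sku TEXT PRIMARY KEY, stock INTEGER NOT NULL)"
)
inventory_db.execute("INSERT INTO inventory VALUES ('book', 5)")

def add_to_cart(session_id, sku, quantity):
    """Carts are looked up by session id only, so a key-value store is enough."""
    cart = cart_store.setdefault(session_id, {})
    cart[sku] = cart.get(sku, 0) + quantity

def checkout(session_id):
    """Move a cart into a completed-order document and adjust inventory."""
    cart = cart_store.pop(session_id)
    with inventory_db:  # one transaction for all stock updates
        for sku, quantity in cart.items():
            inventory_db.execute(
                "UPDATE inventory SET stock = stock - ? WHERE sku = ?",
                (quantity, sku),
            )
    order_documents.append(json.dumps({"session": session_id, "items": cart}))

add_to_cart("s1", "book", 2)
checkout("s1")
stock = inventory_db.execute(
    "SELECT stock FROM inventory WHERE sku = 'book'"
).fetchone()[0]
```

Each store is used only in the way it handles well, at the cost of the application having to coordinate several stores.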