NOSQL

Uploaded by

slothy332
Copyright
© © All Rights Reserved

Introduction to NoSQL

NoSQL refers to non-relational databases, which differ from traditional
relational database systems. NoSQL databases are designed for distributed
data stores that need to hold data at very large scale. Most NoSQL
databases do not guarantee the full set of ACID properties.

NoSQL is the ideal choice for big data, real-time web apps, online
shopping, online gaming, Internet of things, social networks, and online
advertising applications.

For example, Google and Facebook collect terabytes of data daily about
their users. Databases built for this kind of workload do not require a
fixed schema, avoid join operations, and scale horizontally.

The most popular NoSQL databases are MongoDB, Apache Cassandra, Redis,
HBase, Splunk, and Neo4j.

Why NOSQL?

Modern NoSQL solutions can tackle applications that require a high
degree of scalability, data distribution, and continuous availability.

They store data in simple, straightforward forms that can be easier to
understand than the data models used in SQL databases. In addition,
NoSQL databases often allow developers to change the structure of the
data directly.

Key Features of NoSQL Database

Some of the main features of the NoSQL Database are listed below:
 Horizontal Scaling: NoSQL databases scale horizontally by adding
nodes to share the load. As the data grows, more hardware can be
added while the database keeps scaling.
 Performance: Users can increase the performance of a NoSQL
database by adding more servers.
 Flexible Schema: NoSQL databases do not require a fixed schema the
way SQL databases do. Documents in the same collection do not
need to have the same set of fields or data types.
 High Availability: Unlike relational databases, which typically use
primary and secondary nodes for fetching data, many NoSQL
databases use a masterless architecture.

Aggregate Data Models:


Aggregate means a collection of objects that are treated as a unit. In
NoSQL Databases, an aggregate is a collection of data that interact as a
unit. Moreover, these units of data or aggregates of data form the
boundaries for the ACID operations.

Aggregate data models make it easier for a database to manage data
storage over a cluster, because each aggregate is a self-contained unit
that can reside on any of the machines. Whenever data is retrieved from
the database, the whole aggregate is returned together.

Aggregate-oriented databases generally do not support ACID transactions
that span multiple aggregates; atomic updates are typically guaranteed
only within a single aggregate. With aggregate data models, we can also
perform OLAP operations on the database.

Aggregate data models in a NoSQL database are most efficient when the
data transactions and interactions take place within the same aggregate.

Types of Aggregate Data Models in NoSQL Databases
NoSQL databases can broadly be categorized in four types.

1. Key-Value Model
Key-value databases are the simplest type of database: each item
consists of a key and a value. The key, or ID, is used to access or
fetch the aggregate stored under it.

In this model the aggregate is opaque to the database: the store treats
the value as an uninterpreted blob and can retrieve it only by its key.

Use Cases:

 Key-value data models are used for storing user session data.
 They are used for maintaining schema-less user profiles.
 They are used for storing user preferences and shopping cart data.

In a sense, a key-value store is like a relational database with only
two columns: the key or attribute name (such as "state") and the
value (such as "West Bengal").

Some of the popular key-value databases are Riak, Redis (often
referred to as a data structure server), Memcached and its
flavors, Berkeley DB, upscaledb (especially suited for embedded
use), and Amazon DynamoDB (not open-source).
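
The key-value idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KeyValueStore` class, its method names, and the `cart:u42` key are hypothetical and do not correspond to the API of Riak, Redis, or any other product. The point it shows is that the store never looks inside the value; the application serializes the aggregate itself.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The value is stored as-is; the store never inspects it.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()

# The application, not the store, decides how the aggregate is serialized.
cart = {"user": "u42", "items": ["pen", "notebook"]}
store.put("cart:u42", json.dumps(cart).encode("utf-8"))

# The only access path is the key; the whole aggregate comes back at once.
raw = store.get("cart:u42")
restored = json.loads(raw.decode("utf-8"))
```

Because lookups go through the key alone, a query such as "all carts containing a pen" would have to be answered by the application, not by the store.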

2. Document Model
The document data model allows access to parts of an aggregate. In
this model, the data can be accessed in a flexible way, because the
database can query on fields inside a document. The database stores and
retrieves documents, which can be XML, JSON, BSON, etc.

These documents are self-describing, hierarchical tree data structures
which can consist of maps, collections, and scalar values. The
documents stored are similar to each other but do not have to be
exactly the same. Document databases store documents in the value
part of the key-value store. Document databases such as MongoDB
provide a rich query language and constructs such as databases and
indexes, allowing for an easier transition from relational databases.
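
As a rough sketch of what "access to parts of an aggregate" means, the Python below keeps documents as nested dictionaries and filters them by a field inside the document. The helper names `get_path` and `find`, the dotted-path notation, and the sample orders are assumptions made for this example; they are not MongoDB's query language.

```python
# Two documents in the same collection; they share a shape but not every field.
orders = [
    {"_id": 1,
     "customer": {"name": "Asha", "city": "Kolkata"},
     "items": [{"sku": "pen", "qty": 2}]},
    {"_id": 2,
     "customer": {"name": "Ravi"},
     "items": [{"sku": "ink", "qty": 1}],
     "coupon": "NEW10"},
]


def get_path(doc, path):
    """Follow a dotted path such as 'customer.city'; return None if absent."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, path, value):
    """Return the documents whose field at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


kolkata_orders = find(orders, "customer.city", "Kolkata")
coupon_orders = find(orders, "coupon", "NEW10")
```

The second document has no `customer.city` field and the first has no `coupon`, yet both live in the same collection and both queries still work.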

Use Cases:

 Document data models are widely used in e-commerce platforms.
 They are used for storing data from content management systems.
 They are well suited for blogging and analytics platforms.

Some of the popular document databases are MongoDB, CouchDB,
Terrastore, OrientDB, and RavenDB.

3. Column – Oriented Data Model


Column-oriented databases are based on columns and every column is
considered individually. The values of a single column are stored
contiguously. Some examples of column-oriented data models are
Cassandra, BigTable, SimpleDB, and HBase.

The data for each column is kept in a column-specific file. In the
column-oriented data model, aggregation queries such as COUNT, SUM,
AVG, MIN, and MAX perform well, because they read only the columns
they need.
The pictorial representation of the column-oriented data model is
shown below:

In this model, the first level of the column family contains the keys
that act as row identifiers and are used to select the aggregate data.
The second-level values are referred to as columns.
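
The two ideas above can be illustrated with plain Python data structures. This is a toy sketch, not how Cassandra or HBase lay out data on disk; the sample rows, the `profiles` map, and all variable names are made up for the example. The first part stores each column contiguously so an aggregate touches one list only; the second part shows the two-level map of a column family.

```python
# Row-oriented input: one record per row.
rows = [
    {"id": 1, "region": "east", "amount": 120},
    {"id": 2, "region": "west", "amount": 80},
    {"id": 3, "region": "east", "amount": 50},
]

# Column-oriented layout: the values of each column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation reads a single column and ignores the others.
total = sum(columns["amount"])
count = len(columns["amount"])
average = total / count
smallest, largest = min(columns["amount"]), max(columns["amount"])

# Column family as a two-level map: row key first, then column name.
profiles = {
    "user:1": {"name": "Asha", "city": "Kolkata"},
    "user:2": {"name": "Ravi"},  # rows need not have the same columns
}
city_of_user_1 = profiles["user:1"]["city"]
```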

4. Graph-Based Model
A graph database is a type of database that uses graph structures,
made up of nodes, edges, and properties, to represent and store data
and to answer semantic queries.

Graph databases are mainly used for storing entities and relationships
between entities. An entity is a node with its own properties. A node
can be perceived as an instance of an object in an application. Relations
are also known as edges, and they can have properties as well. Edges
have directional significance; nodes are organized by relationships
which allow you to find interesting patterns between the nodes.

Usually, when we store a graph-like structure in RDBMS, it's for a single
type of relationship ("who is my manager" is a common example).
Adding another relationship to the mix usually means a lot of schema
changes and data movement, which is not the case when we are using
graph databases. Similarly, in relational databases we model the graph
beforehand based on the Traversal we want; if the Traversal changes,
the data will have to change.

In graph databases, traversing the joins or relationships is very fast. The
relationship between nodes is not calculated at query time but is
actually persisted as a relationship. Traversing persisted relationships is
faster than calculating them for every query.
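
A minimal sketch of traversing persisted relationships, assuming a tiny in-memory graph: each node stores its outgoing edges directly, so a traversal follows stored links instead of computing joins. The node names, the `MANAGES` and `FRIEND_OF` edge types, and the `reachable` function are hypothetical and are not the query language of any graph database.

```python
from collections import deque

# Each node keeps its outgoing edges as (edge_type, target) pairs.
edges = {
    "alice": [("MANAGES", "bob"), ("FRIEND_OF", "carol")],
    "bob": [("FRIEND_OF", "dave")],
    "carol": [("FRIEND_OF", "dave")],
    "dave": [],
}


def reachable(start, relation, max_hops):
    """Breadth-first traversal that follows only edges of one type."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for edge_type, target in edges.get(node, []):
            if edge_type == relation and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, hops + 1))
    return found


friends_within_two = reachable("alice", "FRIEND_OF", 2)
reports = reachable("alice", "MANAGES", 2)
```

Adding a new relationship type here means adding edges with a new label; the existing nodes and edges are left as they are.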

Graph data models are widely used for storing huge volumes of complex,
multidimensional data that has many interconnections.

Use Cases:

 Graph-based data models are used in social networking sites to
store interconnections.
 They are used in fraud detection systems.
 They are also widely used in network and IT operations.

Figure below shows an example of a graph database:


The graph is organized, and the data in it are stored once but
interpreted in different aspects based on the relationships.

There are many graph databases available, such as Neo4j, InfiniteGraph,
and OrientDB.

Schema-less Databases
A well-defined schema of a database describes the tables, columns,
and data types of the values in the columns of the database.

A database without any schema is known as schema-less database.

Because no schema has to be defined first, storing data in NoSQL is
easier than storing data in SQL. This holds for document databases,
where there is no restriction on the type of document we want to store.

In the case of column-family databases, we can store any type of data
under any column according to our requirement. Also in graph
databases, there are no restrictions in adding edges or properties to
nodes. We can add edges and simultaneously define node properties as
per our convenience or requirement.

What are the benefits of using a schemaless database?


 Greater flexibility over data types
By operating without a schema, schemaless databases can store,
retrieve, and query any data type — perfect for big data analytics
and similar operations that are powered by unstructured data.
 No pre-defined database schemas
The lack of schema means that your NoSQL database can accept
any data type — including those that you do not yet use.
 No data truncation
A schemaless database makes almost no changes to your data;
each item is saved in its own document with a partial schema,
leaving the raw information untouched. This means that every
detail is always available and nothing is stripped to match the
current schema. This is particularly valuable if your analytics
needs change at some point in the future.
 Suitable for real-time analytics functions
With the ability to process unstructured data, applications built on
NoSQL databases are better able to process real-time data, such
as readings and measurements from IoT sensors. Schemaless
databases are also ideal for use with machine learning and
artificial intelligence operations, helping to accelerate automated
actions in your business.
 Enhanced scalability and flexibility
With NoSQL, you can use whichever data model is best suited to
the job. Graph databases allow you to view relationships between
data points, or you can use traditional wide table views with an
exceptionally large number of columns. You can query, report,
and model information however you choose. And as your
requirements grow, you can keep adding nodes to increase
capacity and power.

By contrast, when a record is saved to a relational database, anything
(particularly metadata) that does not match the schema is
truncated or removed. Deleted at write, these details cannot be
recovered at a later point in time.
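
The contrast described above can be modelled in a few lines. This is a toy model of the behaviour the text describes, not the behaviour of any particular database: `SCHEMA`, `save_with_schema`, `save_schemaless`, and the sensor reading are all hypothetical names chosen for the illustration.

```python
# Columns declared up front, as in a fixed-schema table (assumed for the example).
SCHEMA = ("id", "name")


def save_with_schema(record):
    """Keep only the declared columns; anything else is dropped at write."""
    return {column: record.get(column) for column in SCHEMA}


def save_schemaless(record):
    """Store the record exactly as it arrived."""
    return dict(record)


reading = {"id": 7, "name": "sensor-a", "firmware": "1.4", "battery": 0.83}

fixed = save_with_schema(reading)
flexible = save_schemaless(reading)
```

In this model the `firmware` and `battery` fields survive only in the schemaless copy, which is the "no data truncation" benefit in miniature.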

Materialized Views

A materialized view is a pre-computed data set derived from a query
specification (the SELECT in the view definition) and stored for later
use. Because the data is pre-computed, querying a materialized view is
faster than executing a query against the base table of the view.

When storing data, the priority for developers and data administrators
is often focused on how the data is stored, as opposed to how it's read.
The chosen storage format is usually closely related to the format of
the data, requirements for managing data size and data integrity, and
the kind of store in use. For example, when using a NoSQL document
store, the data is often represented as a series of aggregates, each
containing all of the information for that entity.

However, this can have a negative effect on queries. When a query only
needs a subset of the data from some entities, such as a summary of
orders for several customers without all of the order details, it must
extract all of the data for the relevant entities in order to obtain the
required information.

To support efficient querying, a common solution is to generate, in
advance, a view that materializes the data in a format suited to the
required results set. The Materialized View pattern describes
generating prepopulated views of data in environments where the
source data isn't in a suitable format for querying, where generating a
suitable query is difficult, or where query performance is poor due to
the nature of the data or the data store.

These materialized views, which only contain data required by a query,
allow applications to quickly obtain the information they need. In
addition to joining tables or combining data entities, materialized views
can include the current values of calculated columns or data items, the
results of combining values or executing transformations on the data
items, and values specified as part of the query. A materialized view can
even be optimized for just a single query.

An important point is that a materialized view and the data it contains
are completely disposable, because they can be entirely rebuilt from the
source data stores. A materialized view is never updated directly by an
application, and so it's a specialized cache.

When the source data for the view changes, the view must be updated to
include the new information. This may occur automatically on an
appropriate schedule, or when the system detects a change in the
original data. In other cases, it may be necessary to regenerate the
view manually.
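
The pattern can be sketched with an in-memory example. It assumes customer aggregates that embed their orders, as in the order-summary scenario above; the `customers` data, the `build_order_summary` function, and the field names are hypothetical. The view holds only what the summary query needs and is rebuilt from the source, never edited directly.

```python
# Source data: one aggregate per customer, with full order details embedded.
customers = {
    "c1": {"name": "Asha",
           "orders": [{"id": "o1", "total": 120.0}, {"id": "o2", "total": 30.0}]},
    "c2": {"name": "Ravi",
           "orders": [{"id": "o3", "total": 75.5}]},
}


def build_order_summary(source):
    """Rebuild the whole view from the source data."""
    return {
        customer_id: {
            "name": customer["name"],
            "order_count": len(customer["orders"]),
            "total_spent": sum(order["total"] for order in customer["orders"]),
        }
        for customer_id, customer in source.items()
    }


# Materialize once; later reads use the view instead of scanning every order.
order_summary = build_order_summary(customers)

# The source changes, so the existing view is now stale.
customers["c2"]["orders"].append({"id": "o4", "total": 24.5})
stale_total = order_summary["c2"]["total_spent"]

# The view is disposable: discard it and regenerate from the source.
order_summary = build_order_summary(customers)
```

A full rebuild is the simplest refresh strategy; a real system might instead refresh on a schedule or update only the affected entries when it detects a change.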

Distribution Models
NoSQL databases are fundamentally different from relational databases.
This type of database does not require tables with a fixed set of
columns. It also generally avoids joins and supports horizontal
scaling.

Aggregate-oriented databases make distribution of data easier, since
the distribution mechanism only has to move the aggregate and does not
have to worry about related data, as all the related data is contained
in the aggregate. There are two styles of distributing data:

 Through sharding – Sharding is one of the major techniques of
data distribution. It places different subsets of the data on
different servers, so that each server acts as the single source
for its subset of data.

 Through replication – Replication is one of the major techniques for
fault tolerance. The idea is to copy data across multiple servers so that
each bit of data can be found in multiple places. Replication comes in
two forms:

 Master-slave replication makes one node the authoritative copy
that handles writes while slaves synchronize with the master and
may handle reads.

 Peer-to-peer replication allows writes to any node; the nodes
coordinate to synchronize their copies of the data.

Master-slave replication reduces the chance of update conflicts, but
peer-to-peer replication avoids loading all writes onto a single server,
which would create a single point of failure. A system may use either or
both techniques. For example, the Riak database shards the data and also
replicates it based on the replication factor.
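
Master-slave replication can be sketched as follows. This is a simplified model under stated assumptions: the `Node` and `MasterSlaveCluster` classes are hypothetical, and the copy to the slaves happens synchronously inside `write`, whereas real systems often replicate asynchronously and must deal with lag and failover.

```python
class Node:
    """One server holding a full copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next_read = 0

    def write(self, key, value):
        # All writes go to the master, the authoritative copy.
        self.master.data[key] = value
        # Simplification: copy to every slave immediately.
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        # Reads are spread over the slaves in round-robin order.
        slave = self.slaves[self._next_read % len(self.slaves)]
        self._next_read += 1
        return slave.data.get(key)


master = Node("master")
replicas = [Node("replica-1"), Node("replica-2")]
cluster = MasterSlaveCluster(master, replicas)

cluster.write("user:1", "Asha")
first_read = cluster.read("user:1")   # served by replica-1
second_read = cluster.read("user:1")  # served by replica-2
```

Every write still passes through one node, which is why this form of replication helps read capacity but not write capacity.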

Sharding
Sharding involves splitting and distributing one logical data set across
multiple databases that share nothing and can be deployed across
multiple servers. To achieve sharding, the rows or columns of a larger
database table are split into multiple smaller tables.

Database sharding allows for horizontal scaling, which means more
servers can be added to the system as needed. This enables the system
to handle more data and improves performance.

There are a few things to consider when sharding a database:

1. The data must be divided up in a way that makes sense.
2. The system must handle queries that span multiple servers.
3. The system must be able to recover from failures.
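
The first two considerations can be sketched with hash-based sharding, which is one of several possible strategies and is swapped in here only for illustration. The `ShardedStore` class and its methods are hypothetical; each shard is modelled as a dictionary standing in for a separate server, and failure recovery is not covered.

```python
import hashlib


class ShardedStore:
    """Toy hash-sharded store; each dict stands in for one server."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash keeps a key on the same shard across runs.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # A lookup by key touches exactly one shard.
        return self.shards[self._shard_for(key)].get(key)

    def scan(self, predicate):
        # A query that spans shards must ask every shard and merge the results.
        return [value
                for shard in self.shards
                for value in shard.values()
                if predicate(value)]


store = ShardedStore(shard_count=4)
for n in range(100):
    store.put(f"user:{n}", {"id": n, "active": n % 2 == 0})

active_users = store.scan(lambda user: user["active"])
```

With modulo placement, changing the number of shards moves most keys to a different shard, which is one reason real systems often use schemes such as consistent hashing instead.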

Database Replication
Database replication is the process of copying data from a database on
one server to a database on another server. We can do this for several
reasons, such as to keep a backup of the data in case the original is lost
or corrupted or to allow multiple people to access the same data from
different locations.

There are a few different ways to replicate databases; a common one is
to use the replication feature built into the database system, such as
MySQL replication. This allows the data on the original server to be
copied to the new server without manual intervention.

There are several benefits to replicating databases, but the most
important is that it helps keep data safe. If the original database is lost
or corrupted, the replica can be used to restore it. This is especially
important for businesses that rely on their data to function.

Difference:

Sharding splits the data so that each server holds a different subset,
whereas replication copies the same data across different sites.
Sharding improves both read and write performance, while replication
improves read performance but not write performance.
