NoSql Module 2 Part 1

Module : 2 NoSQL Database 12-07-22

Distribution Models

The primary driver of interest in NoSQL has been its ability to run databases on a
large cluster.

As data volumes increase, it becomes more difficult and expensive to scale up—buy a
bigger server to run the database on.

A more appealing option is to scale out—run the database on a cluster of servers.

Aggregate orientation fits well with scaling out because the aggregate is a natural unit to
use for distribution.

Depending on your distribution model, you can get a data store that will give you the
ability to handle larger quantities of data, the ability to process a greater read or write
traffic, or more availability in the face of network slowdowns or breakages. These are often
important benefits, but they come at a cost.

Running over a cluster introduces complexity—so it’s not something to do unless the
benefits are compelling.

Broadly, there are two paths to data distribution: replication and sharding.

Replication takes the same data and copies it over multiple nodes.

Sharding puts different data on different nodes.

Replication and sharding are orthogonal techniques: you can use either or both of
them. Replication comes in two forms: master-slave and peer-to-peer.

We will now discuss these techniques starting at the simplest and working up to the more
complex: first single-server, then sharding, then master-slave replication, and finally
peer-to-peer replication.

SINGLE-SERVER

What Is A Server Vs Database?


In general, a Server is a high-end network computer that manages connected devices
(“clients”), and it is used to access multiple applications as a central resource, whereas
a Database is a repository for back-end data processing applications.

Is Database A Server?

As defined by the client-server model, a database server is a server that provides database
services to other programs, whether they run on the same computer or on other computers.
Relational databases are generally queried with a common query language, SQL (Structured
Query Language).

Is Sql A Database Or Server?


By:Yojana KiranKumar,Asst. Professor,Dept of BVoc.

SQL itself is a query language, not a database or a server. Microsoft SQL Server, on the
other hand, is a relational database management system designed for managing databases.

Single Server

The first and the simplest distribution option is the one we would most often recommend:
no distribution at all.

Run the database on a single machine that handles all the reads and writes to the data store.

We prefer this option because it eliminates all the complexities that the other options
introduce; it’s easy for operations people to manage and easy for application developers
to reason about.

Although a lot of NoSQL databases are designed around the idea of running on a cluster, it
can make sense to use NoSQL with a single-server distribution model if the data model of
the NoSQL store is more suited to the application.

Graph databases are the obvious category here; these work best in a single-server
configuration.

If your data usage is mostly about processing aggregates, then a single-server document
or key-value store may well be worthwhile because it’s easier on application developers.

A single server database often has a fixed amount of ingest throughput since it runs on
a single machine.

The limits could be I/O or memory, storage capacity, processing power or a combination
of these.

These constraints can become a liability for applications intended to scale.

Sharding

Often, a busy data store is busy because different people are accessing different parts of the
dataset. In these circumstances we can support horizontal scalability by putting different
parts of the data onto different servers, a technique that's called sharding.

(Horizontal scaling refers to adding more nodes, whereas vertical scaling means adding more
power to your current machines. For instance, if your server requires more processing power,
vertical scaling would mean upgrading the CPUs. You can also vertically scale the memory,
storage, or network speed.)

Sharding is a type of database partitioning that separates large databases into smaller, faster,
more easily managed parts. These smaller parts are called data shards. The word shard
means "a small part of a whole."
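
As a rough illustration of how a request might find its shard, the sketch below routes each
key to one of three shards by hashing the key. The shard names, the put/get helpers and the
choice of SHA-256 are assumptions made for this example; they are not taken from any
particular NoSQL product.

```python
import hashlib

# Hypothetical shard names; a real cluster would hold server addresses.
SHARDS = ["shard-0", "shard-1", "shard-2"]

def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard by hashing the key (stable across runs, unlike hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

# Each shard is modelled as its own dictionary.
stores = {name: {} for name in SHARDS}

def put(key: str, value) -> None:
    stores[shard_for(key)][key] = value

def get(key: str):
    return stores[shard_for(key)].get(key)

put("customer:42", {"name": "Amy"})
put("customer:77", {"name": "Jack"})
```

Plain modulo hashing moves most keys to a different shard whenever the number of shards
changes, which is one reason schemes such as consistent hashing are often preferred in
practice.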

Horizontal and vertical sharding

Sharding involves splitting and distributing one logical data set across multiple databases
that share nothing and can be deployed across multiple servers. To achieve sharding, the
rows or columns of a larger database table are split into multiple smaller tables.

Once a logical shard is stored on another node, it is known as a physical shard. One physical
shard can hold multiple logical shards. The shards are autonomous and don't share the same
data or computing resources. That's why they exemplify a shared-nothing architecture. At
the same time, the data in all the shards represents a logical data set.

Sharding can either be horizontal or vertical:

∙ Horizontal sharding. When each new table has the same schema but unique rows,
it is known as horizontal sharding.

In this type of sharding, more machines are added to an existing stack to spread
out the load, increase processing speed and support more traffic. This method
is most effective when queries return a subset of rows that are often grouped
together.

∙ Vertical sharding. When each new table has a schema that is a proper subset of
the original table's schema, it is known as vertical sharding.

It is effective when queries usually return only a subset of columns of the data.
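
The two definitions above can be sketched in a few lines of Python. The rows, the function
names and the id boundary are hypothetical and chosen only for illustration: a horizontal
split keeps every column but divides the rows, while a vertical split keeps every row but
divides the columns, repeating the key in each shard.

```python
# Hypothetical rows, modelled on a small student table.
students = [
    {"id": 1, "name": "Amy", "age": 21, "major": "Economics"},
    {"id": 2, "name": "Jack", "age": 20, "major": "History"},
    {"id": 3, "name": "Priya", "age": 19, "major": "Biology"},
]

def horizontal_shards(rows, boundary):
    """Same schema in every shard; rows are divided by id."""
    low = [r for r in rows if r["id"] <= boundary]
    high = [r for r in rows if r["id"] > boundary]
    return low, high

def vertical_shards(rows, column_groups, key="id"):
    """Each shard keeps the key plus a subset of the columns."""
    return [
        [{key: r[key], **{c: r[c] for c in cols}} for r in rows]
        for cols in column_groups
    ]

h1, h2 = horizontal_shards(students, boundary=2)
v1, v2 = vertical_shards(students, [["name", "age"], ["major"]])
```

Here h1 and h2 hold different students with all of their columns, while v1 and v2 hold all
students with different columns.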

The following illustrates how new tables look when both horizontal and vertical sharding
are performed on the same original data set.

Original data set

Student ID   Name      Age   Major               Hometown
1            Amy       21    Economics           Austin
2            Jack      20    History             San Francisco
3            Matthew   22    Political Science   New York City
4            Priya     19    Biology             Gary
5            Ahmed     19    Philosophy          Boston

Horizontal shards

Shard 1
Student ID   Name      Age   Major               Hometown
1            Amy       21    Economics           Austin
2            Jack      20    History             San Francisco

Shard 2
Student ID   Name      Age   Major               Hometown
3            Matthew   22    Political Science   New York City
4            Priya     19    Biology             Gary
5            Ahmed     19    Philosophy          Boston

Vertical shards

Shard 1
Student ID   Name      Age
1            Amy       21
2            Jack      20

Shard 2
Student ID   Major
1            Economics
2            History

Shard 3
Student ID   Hometown
1            Austin
2            San Francisco

Benefits of sharding

Sharding is common in scalable database architectures.

Since shards are smaller, faster and easier to manage, they help boost database scalability,
performance and administration.

Sharding also reduces the transaction cost of the database.


Horizontal scaling, which is also known as scaling out, helps create a more flexible
database design, which is especially useful for parallel processing. Because capacity can
keep growing as nodes are added, it suits intense workloads and big data requirements.

(Parallel processing is a method of running two or more processors (CPUs) to handle separate
parts of an overall task. Dividing a task among multiple processors helps reduce the time
needed to run a program.)

With horizontal sharding, users can optimally use all the compute resources
across a cluster for every query.

This sharding method also speeds up query resolution, since each machine has to scan
fewer rows when responding to a query.

Vertical scaling, by contrast, increases RAM or storage capacity and improves central
processing unit (CPU) capacity. It thus increases the power of a single machine or server
rather than spreading the data across several machines.

Sharded databases also offer higher availability and mitigate the impact of outages
because, during an outage, only those portions of an application that rely on the missing
chunks of data become unusable.


A sharded database also replicates backup shards to additional nodes to further
minimize damage due to an outage.

In contrast, an application running without sharded databases may be completely
unavailable following an outage.

Another advantage of sharding is that it increases the read/write throughput when
such operations are confined to a single shard.

Master-Slave Replication
With master-slave distribution, you replicate data across multiple
nodes. One node is designated as the master, or primary.

This master is the authoritative source for the data and is usually responsible for
processing any updates to that data.

The other nodes are slaves, or secondaries.

A replication process synchronizes the slaves with the master.

Master-slave replication is most helpful for scaling when you have a read-intensive
dataset.


You can scale horizontally to handle more read requests by adding more slave nodes and
ensuring that all read requests are routed to the slaves.
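
A minimal sketch of this routing, assuming in-memory nodes and a round-robin choice of slave
(the Node and MasterSlaveCluster classes are invented for this example, and replication is
done synchronously only to keep it short):

```python
import itertools

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}

class MasterSlaveCluster:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves
        self._next_slave = itertools.cycle(slaves)  # round-robin over the slaves

    def write(self, key, value):
        # All updates go through the master ...
        self.master.data[key] = value
        # ... and are then copied to every slave (synchronously here, for simplicity).
        for slave in self.slaves:
            slave.data[key] = value

    def read(self, key):
        # Reads are spread over the slaves and never touch the master.
        slave = next(self._next_slave)
        return slave.name, slave.data.get(key)

cluster = MasterSlaveCluster(Node("master"), [Node("slave-1"), Node("slave-2")])
cluster.write("page:home", "<h1>Hello</h1>")
first = cluster.read("page:home")
second = cluster.read("page:home")
```

Adding another slave to the list increases read capacity, but every write still passes
through the single master.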

You are still, however, limited by the ability of the master to process updates and its ability
to pass those updates on.

Consequently it isn’t such a good scheme for datasets with heavy write traffic,
although offloading the read traffic will help a bit with handling the write load.

A second advantage of master-slave replication is read resilience: should the master fail, the
slaves can still handle read requests. Again, this is useful if most of your data access is
reads.

The failure of the master does eliminate the ability to handle writes until either the master
is restored or a new master is appointed.

However, having slaves as replicas of the master does speed up recovery after a failure
of the master, since a slave can be appointed as the new master very quickly.

The ability to appoint a slave to replace a failed master means that master-slave replication
is useful even if you don't need to scale out.

All read and write traffic can go to the master while the slave acts as a hot backup. In
this case it’s easiest to think of the system as a single-server store with a hot backup.

You get the convenience of the single-server configuration but with greater resilience,
which is particularly handy if you want to be able to handle server failures gracefully.

Masters can be appointed manually or automatically.

Manual appointing typically means that when you configure your cluster, you configure
one node as the master. With automatic appointment, you create a cluster of nodes and
they elect one of themselves to be the master.
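
To give a feel for automatic appointment, the sketch below picks a master from the live
nodes using a deliberately simple rule: prefer the node with the newest data and break ties
by name. The Replica class and this rule are assumptions for illustration only; production
systems generally rely on a consensus protocol such as Raft or Paxos rather than a single
function call.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    last_applied: int   # sequence number of the newest update this node holds
    alive: bool = True

def elect_master(replicas):
    """Return the live replica with the newest data, or None if none is alive."""
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        return None
    # Prefer the most up-to-date node; break ties by name so the result is deterministic.
    return max(candidates, key=lambda r: (r.last_applied, r.name))

nodes = [Replica("a", 120), Replica("b", 118), Replica("c", 120)]
master = elect_master(nodes)      # "c": tied with "a" on data, wins on name
master.alive = False              # simulate a master failure
new_master = elect_master(nodes)  # "a" takes over
```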

Apart from simpler configuration, automatic appointment means that the cluster can
automatically appoint a new master when a master fails, reducing downtime. In order to
get read resilience, you need to ensure that the read and write paths into your application
are different, so that you can handle a failure in the write path and still read.

This includes such things as putting the reads and writes through separate database
connections—a facility that is not often supported by database interaction libraries. As with
any feature, you cannot be sure you have read resilience without good tests that disable
the writes and check that reads still occur.
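
Such a test might look like the sketch below. The Store class is a hypothetical stand-in for
an application's data access layer, with the write path going to the master and the read
path going to a slave; the check disables the write path and confirms that reads still work.

```python
class WritePathDown(Exception):
    """Raised when the connection to the master is unavailable."""

class Store:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.master_up = True

    def write(self, key, value):
        # Write path: talks to the master only.
        if not self.master_up:
            raise WritePathDown("master unavailable")
        self.master[key] = value
        self.slave[key] = value  # replication, simplified to an immediate copy

    def read(self, key):
        # Read path: talks to the slave only, so it does not depend on the master.
        return self.slave.get(key)

def check_read_resilience():
    store = Store()
    store.write("k", "v1")
    store.master_up = False          # disable the write path
    try:
        store.write("k", "v2")
        writes_blocked = False
    except WritePathDown:
        writes_blocked = True
    return writes_blocked, store.read("k")
```

If the read path shared the master's connection, the final read would fail as well, and the
test would reveal that the system has no read resilience.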

Replication comes with some alluring benefits, but it also comes with an inevitable
dark side: inconsistency.

You have the danger that different clients, reading different slaves, will see different values
because the changes haven’t all propagated to the slaves. In the worst case, that can
mean that a client cannot read a write it just made.

Even if you use master-slave replication just for hot backup this can be a concern, because
if the master fails, any updates not passed on to the backup are lost.
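
Both problems can be reproduced with a toy model in which the master queues updates and the
slave applies them later. The LaggingReplication class below is an illustrative assumption,
not a description of how any specific database replicates.

```python
from collections import deque

class LaggingReplication:
    def __init__(self):
        self.master = {}
        self.slave = {}
        self.pending = deque()   # updates accepted by the master but not yet on the slave

    def write(self, key, value):
        self.master[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all queued updates to the slave."""
        while self.pending:
            key, value = self.pending.popleft()
            self.slave[key] = value

    def read_from_slave(self, key):
        return self.slave.get(key)

    def fail_over(self):
        """Master dies: the slave becomes the master; queued updates are gone."""
        lost = list(self.pending)
        self.pending.clear()
        self.master = self.slave
        return lost

db = LaggingReplication()
db.write("balance", 100)
db.replicate()
db.write("balance", 80)
stale = db.read_from_slave("balance")   # still the old value
lost = db.fail_over()                   # the update to 80 never reached the slave
```

The stale read shows a client failing to see a write that has just been made, and the
fail-over shows an update being lost because it had not yet been passed on.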


Peer-to-Peer Replication

Peer-to-peer replication occurs when two or more servers or nodes, each of which can be
a standalone server, replicate data changes between each other.

Data can be modified on any of the nodes so in that sense all nodes are equals or peers.

The most common alternative is a master-slave replication topology where all
transactions are processed on the master node and propagated to the slave nodes, which
are typically read-only.

Another alternative is replication with writable secondaries which are not peers.
Transactions can be submitted to any node in the cluster but are actually processed by the
primary or master node and the results propagated to the secondary nodes.

With a peer-to-peer replication cluster, you can ride over node failures without losing
access to data. Furthermore, you can easily add nodes to improve your performance.

The biggest complication is, again, consistency. When you can write to two different places,
you run the risk that two people will attempt to update the same record at the same
time—a write-write conflict.

Inconsistencies on read lead to problems but at least they are relatively
transient. Inconsistent writes are forever.
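
A write-write conflict can be sketched with two peers that each keep a version counter per
record. The Peer class, the sync function and the room-booking data are hypothetical. A
single counter like this only catches the simplest case, where both peers have written the
same number of times; real systems often use version vectors or vector clocks to detect
concurrent updates reliably.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.records = {}   # key -> (value, version, last_writer)

    def write(self, key, value):
        _, version, _ = self.records.get(key, (None, 0, None))
        self.records[key] = (value, version + 1, self.name)

def sync(a, b, key):
    """Exchange one record between two peers; report a conflict if both changed it."""
    ra, rb = a.records.get(key), b.records.get(key)
    if ra == rb:
        return "in sync"
    if ra is None or (rb is not None and rb[1] > ra[1]):
        a.records[key] = rb
        return "copied from " + b.name
    if rb is None or ra[1] > rb[1]:
        b.records[key] = ra
        return "copied from " + a.name
    # Same version number but different contents: two independent writes.
    return "write-write conflict"

london, bangalore = Peer("london"), Peer("bangalore")
london.write("room:101", "booked by Ann")
sync(london, bangalore, "room:101")
london.write("room:101", "booked by Bob")
bangalore.write("room:101", "booked by Cy")
result = sync(london, bangalore, "room:101")
```

After the final sync each peer still holds its own booking, so something outside this
function has to decide which write survives.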


Combining Sharding and Replication

Replication and sharding are strategies that can be combined.

If we use both master-slave replication and sharding, this means that we have
multiple masters, but each data item only has a single master.

Depending on your configuration, you may choose a node to be a master for some data
and a slave for others, or you may dedicate nodes for master or slave duties.

Using peer-to-peer replication and sharding is a common strategy for
column-family databases.

In a scenario like this you might have tens or hundreds of nodes in a cluster with
data sharded over them.

A good starting point for peer-to-peer replication is to have a replication factor of 3, so each
shard is present on three nodes.

Should a node fail, then the shards on that node will be built on the other nodes.
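
The placement described above can be sketched as follows. The node and shard names, the
round-robin placement rule and the choice of replacement node are all assumptions for
illustration, and the sketch presumes that at least three nodes survive a failure.

```python
REPLICATION_FACTOR = 3

def place_shards(shard_ids, nodes, rf=REPLICATION_FACTOR):
    """Assign each shard to `rf` consecutive nodes, walking round the node list."""
    placement = {}
    for i, shard in enumerate(shard_ids):
        placement[shard] = [nodes[(i + k) % len(nodes)] for k in range(rf)]
    return placement

def handle_failure(placement, failed, nodes):
    """Re-create the failed node's shard copies on surviving nodes."""
    survivors = [n for n in nodes if n != failed]
    for shard, holders in placement.items():
        if failed in holders:
            holders.remove(failed)
            # Choose the first surviving node that does not already hold the shard.
            replacement = next(n for n in survivors if n not in holders)
            holders.append(replacement)
    return placement

nodes = ["n1", "n2", "n3", "n4", "n5"]
placement = place_shards(["s0", "s1", "s2", "s3"], nodes)
handle_failure(placement, failed="n2", nodes=nodes)
```

After the failure of n2, every shard is again present on three distinct nodes, none of which
is the failed one.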

Summary:

There are two styles of distributing data:

• Sharding distributes different data across multiple servers, so each server acts as the
single source for a subset of data.

• Replication copies data across multiple servers, so each bit of data can be found in
multiple places. A system may use either or both techniques.

• Replication comes in two forms:

• Master-slave replication makes one node the authoritative copy that handles writes
while slaves synchronize with the master and may handle reads.

• Peer-to-peer replication allows writes to any node; the nodes coordinate to
synchronize their copies of the data.

Master-slave replication reduces the chance of update conflicts, but peer-to-peer
replication avoids loading all writes onto a single point of failure.

********
