NoSQL Module 2 Part 1
Distribution Models
The primary driver of interest in NoSQL has been its ability to run databases on a
large cluster.
As data volumes increase, it becomes more difficult and expensive to scale up—buy a
bigger server to run the database on.
Aggregate orientation fits well with scaling out because the aggregate is a natural unit to
use for distribution.
Depending on your distribution model, you can get a data store that gives you the
ability to handle larger quantities of data, to process greater read or write
traffic, or to remain available in the face of network slowdowns or breakages. These are often
important benefits, but they come at a cost.
Running over a cluster introduces complexity—so it’s not something to do unless the
benefits are compelling.
Replication and sharding are orthogonal techniques: You can use either or both of
them. Replication comes in two forms: master-slave and peer-to-peer.
We will now discuss these techniques starting at the simplest and working up to the more
complex: first single-server, then master-slave replication, then sharding, and finally
peer-to-peer replication.
Single-Server
Is a Database a Server?
As defined by the client-server model, a database server is a server that provides database
services to other programs or computers. Querying relational databases is handled by a
common query language, SQL (Structured Query Language).
The first and simplest distribution option is the one we would most often
recommend—no distribution at all.
Run the database on a single machine that handles all the reads and writes to the data store.
We prefer this option because it eliminates all the complexities that the other options
introduce; it’s easy for operations people to manage and easy for application developers
to reason about.
Although a lot of NoSQL databases are designed around the idea of running on a cluster, it
can make sense to use NoSQL with a single-server distribution model if the data model of
the NoSQL store is more suited to the application.
If your data usage is mostly about processing aggregates, then a single-server document
or key-value store may well be worthwhile because it’s easier on application developers.
A single-server database often has a fixed limit on ingest throughput, since it runs on
a single machine.
The limits could be I/O, memory, storage capacity, processing power, or a combination
of these.
Sharding
Often, a busy data store is busy because different people are accessing different parts of the
dataset. In these circumstances we can support horizontal scalability by putting different
parts of the data onto different servers—a technique called sharding. (While horizontal
scaling refers to adding additional nodes, vertical scaling describes adding more power to
your current machines. For instance, if your server requires more processing power, vertical
scaling would mean upgrading the CPUs; you can also vertically scale the memory, storage,
or network speed.)
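To make the idea concrete, here is a minimal sketch of hash-based sharding. The shard count, the use of a user name as the shard key, and the stable MD5-based routing rule are all assumptions chosen for illustration, not a prescription from any particular database.

```python
# Illustrative sketch only: routing records to shards by hashing a key.
import hashlib

NUM_SHARDS = 4

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record's key to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Different users' data lands on (usually) different servers, so their
# traffic is spread across the cluster.
shards = {i: {} for i in range(NUM_SHARDS)}
for user in ("alice", "bob", "carol"):
    shards[shard_for(user)][user] = {"orders": []}
```

Because the hash is stable, every request for the same user is routed to the same shard, which is what makes the aggregate a natural unit of distribution.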
By: Yojana KiranKumar, Asst. Professor, Dept. of BVoc.
Module 2: NoSQL Database, 12-07-22
Sharding is a type of database partitioning that separates large databases into smaller, faster,
more easily managed parts. These smaller parts are called data shards. The word shard
means "a small part of a whole."
Sharding involves splitting and distributing one logical data set across multiple databases
that share nothing and can be deployed across multiple servers. To achieve sharding, the
rows or columns of a larger database table are split into multiple smaller tables.
Once a logical shard is stored on another node, it is known as a physical shard. One physical
shard can hold multiple logical shards. The shards are autonomous and don't share the same
data or computing resources. That's why they exemplify a shared-nothing architecture. At
the same time, the data in all the shards represents a logical data set.
∙ Horizontal sharding. When each new table has the same schema but unique rows, it is
known as horizontal sharding. In this type of sharding, more machines are added to an
existing stack to spread out the load, increase processing speed, and support more
traffic. This method is most effective when queries return a subset of rows that are
often grouped together.
∙ Vertical sharding. When each new table has a schema that is a faithful subset of
the original table's schema, it is known as vertical sharding.
It is effective when queries usually return only a subset of columns of the data.
The following illustrates how new tables look when both horizontal and vertical sharding
are performed on the same original data set.
Horizontal shards

Shard 1
Student ID  Name  Age  Major      Hometown
1           Amy   21   Economics  San Francisco

Shard 2
Student ID  Name  Age  Major    Hometown
2           Jack  20   History  Austin

Vertical shards

Shard 1
Student ID  Name  Age
1           Amy   21
2           Jack  20

Shard 2
Student ID  Major
1           Economics
2           History

Shard 3
Student ID  Hometown
1           San Francisco
2           Austin

Benefits of sharding
Since shards are smaller, faster and easier to manage, they help boost database scalability,
performance and administration.
Horizontal scaling, also known as scaling out, helps create a more flexible
database design, which is especially useful for parallel processing. (Parallel processing is a
method in computing of running two or more processors (CPUs) to handle separate parts
of an overall task; breaking a task up among multiple processors helps reduce the
time needed to run a program.) It provides near-limitless scalability for
intense workloads and big-data requirements.
With horizontal sharding, users can optimally use all the compute resources
across a cluster for every query.
This sharding method also speeds up query resolution, since each machine has to scan
fewer rows when responding to a query.
Vertical sharding increases RAM or storage capacity and improves central processing
unit (CPU) capacity.
Sharded databases also offer higher availability and mitigate the impact of outages
because, during an outage, only those portions of an application that rely on the missing
chunks of data become unusable.
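The row and column splits described above can be sketched in a few lines. This is an illustrative toy only: the student rows are sample data, and the split rules (id parity for the horizontal split, the particular column subsets for the vertical one) are arbitrary choices made for clarity.

```python
# Sample data standing in for the original students table.
students = [
    {"id": 1, "name": "Amy", "age": 21, "major": "Economics", "hometown": "San Francisco"},
    {"id": 2, "name": "Jack", "age": 20, "major": "History", "hometown": "Austin"},
]

# Horizontal sharding: every shard keeps the full schema but only some rows.
horizontal = {
    "shard1": [r for r in students if r["id"] % 2 == 1],
    "shard2": [r for r in students if r["id"] % 2 == 0],
}

# Vertical sharding: every shard keeps all rows but only some columns,
# carrying the id so rows can be re-joined later.
def project(rows, cols):
    return [{c: r[c] for c in cols} for r in rows]

vertical = {
    "shard1": project(students, ["id", "name", "age"]),
    "shard2": project(students, ["id", "major"]),
    "shard3": project(students, ["id", "hometown"]),
}
```

Note that every vertical shard retains the id column; without it, the original rows could not be reassembled.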
Master-Slave Replication
With master-slave distribution, you replicate data across multiple
nodes. One node is designated as the master, or primary.
This master is the authoritative source for the data and is usually responsible for
processing any updates to that data.
Master-slave replication is most helpful for scaling when you have a read-intensive
dataset.
You can scale horizontally to handle more read requests by adding more slave nodes and
ensuring that all read requests are routed to the slaves.
You are still, however, limited by the ability of the master to process updates and its ability
to pass those updates on.
Consequently it isn’t such a good scheme for datasets with heavy write traffic,
although offloading the read traffic will help a bit with handling the write load.
A second advantage of master-slave replication is read resilience: Should the master fail, the
slaves can still handle read requests. Again, this is useful if most of your data access is
reads.
The failure of the master does eliminate the ability to handle writes until either the master
is restored or a new master is appointed.
However, having slaves as replicas of the master does speed up recovery after a failure
of the master, since a slave can be appointed as the new master very quickly.
The ability to appoint a slave to replace a failed master means that master-slave replication
is useful even if you don’t need to scale out.
All read and write traffic can go to the master while the slave acts as a hot backup. In
this case it’s easiest to think of the system as a single-server store with a hot backup.
You get the convenience of the single-server configuration but with greater resilience—
which is particularly handy if you want to be able to handle server failures gracefully.
Masters can be appointed manually or automatically.
Manual appointing typically means that when you configure your cluster, you configure
one node as the master. With automatic appointment, you create a cluster of nodes and
they elect one of themselves to be the master.
Apart from simpler configuration, automatic appointment means that the cluster can
automatically appoint a new master when a master fails, reducing downtime. In order to
get read resilience, you need to ensure that the read and write paths into your application
are different, so that you can handle a failure in the write path and still read.
This includes such things as putting the reads and writes through separate database
connections—a facility that is not often supported by database interaction libraries. As with
any feature, you cannot be sure you have read resilience without good tests that disable
the writes and check that reads still occur.
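The separate read and write paths can be sketched as a small router. This is a minimal sketch in which plain in-memory dicts stand in for database nodes; the class and method names are invented for illustration, and replication is done synchronously only to keep the example short.

```python
import random

class MasterSlaveRouter:
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = slaves

    def write(self, key, value):
        # All updates go through the single authoritative master.
        self.master[key] = value
        # A real master would propagate changes asynchronously; here we
        # copy to each slave synchronously to keep the sketch simple.
        for slave in self.slaves:
            slave[key] = value

    def read(self, key):
        # Reads are spread across the slaves and keep working even if
        # the master is down, which is the read resilience being tested.
        return random.choice(self.slaves).get(key)

router = MasterSlaveRouter(master={}, slaves=[{}, {}])
router.write("stock:42", 7)
```

A read-resilience test in this spirit would disable writes (take the master away) and assert that `router.read` still returns data from the slaves.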
Replication comes with some alluring benefits, but it also comes with an inevitable
dark side— inconsistency.
You have the danger that different clients, reading different slaves, will see different values
because the changes haven’t all propagated to the slaves. In the worst case, that can
mean that a client cannot read a write it just made.
Even if you use master-slave replication just for hot backup this can be a concern, because
if the master fails, any updates not passed on to the backup are lost.
Peer-to-Peer Replication
Peer-to-peer replication occurs when two or more servers or nodes, each of which can be
a standalone server, replicate data changes between each other.
Data can be modified on any of the nodes so in that sense all nodes are equals or peers.
With a peer-to-peer replication cluster, you can ride over node failures without losing
access to data. Furthermore, you can easily add nodes to improve your performance.
The biggest complication is, again, consistency. When you can write to two different places,
you run the risk that two people will attempt to update the same record at the same
time—a write-write conflict.
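One common way to detect such a conflict is to stamp each record with a version number and reject a write whose version is stale. This is a hypothetical sketch: the `Record` class and the conflict rule are invented for illustration, not the mechanism of any particular peer-to-peer database.

```python
class Record:
    def __init__(self, value):
        self.value = value
        self.version = 0

def update(record, new_value, read_version):
    """Apply a write only if nobody has written since we read."""
    if read_version != record.version:
        raise RuntimeError("write-write conflict")
    record.value = new_value
    record.version += 1

rec = Record("room available")
v = rec.version                  # both peers read the record at version 0
update(rec, "booked by A", v)    # the first write succeeds
# update(rec, "booked by B", v) would now raise: B must re-read and retry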
If we use both master-slave replication and sharding, this means that we have
multiple masters, but each data item only has a single master.
Depending on your configuration, you may choose a node to be a master for some data
and slaves for others, or you may dedicate nodes for master or slave duties.
Using peer-to-peer replication and sharding is a common strategy for
column-family databases.
In a scenario like this you might have tens or hundreds of nodes in a cluster with
data sharded over them.
A good starting point for peer-to-peer replication is to have a replication factor of 3, so each
shard is present on three nodes.
Should a node fail, then the shards on that node will be built on the other nodes.
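A replication-factor-3 layout can be sketched with a simple placement rule. The parameters here are assumptions (5 nodes, replication factor 3) and the "consecutive nodes on a ring" rule is chosen purely for clarity; real systems use more sophisticated placement.

```python
NUM_NODES = 5
REPLICATION_FACTOR = 3

def placement(shard_id, num_nodes=NUM_NODES, rf=REPLICATION_FACTOR):
    """Return the rf nodes that hold copies of this shard."""
    start = shard_id % num_nodes
    return [(start + i) % num_nodes for i in range(rf)]

# Shard 0 lives on nodes 0, 1 and 2; if node 1 fails, its copy can be
# rebuilt from the replicas still on nodes 0 and 2.
```

With every shard on three nodes, any single node failure still leaves two live copies of each shard from which the lost one can be rebuilt.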
• Sharding distributes different data across multiple servers, so each server acts as the
single source for a subset of data.
• Replication copies data across multiple servers, so each bit of data can be found in
multiple places. A system may use either or both techniques.
• Master-slave replication makes one node the authoritative copy that handles writes
while slaves synchronize with the master and may handle reads.
********