
An Approach to Improve Load Balancing in Distributed Storage Systems
for NoSQL Databases: MongoDB

Sudhakar and Shivendra Kumar Pandey

Abstract The ongoing generation of heterogeneous data calls for better NoSQL database systems to accommodate it. NoSQL databases store data in a distributed manner across their globally deployed shards. The data stored in these databases should be highly available, and the system should not compromise on scalability or partition tolerance. The main challenge for distributed storage systems is to address skewness in the data, which arises from the way data items are distributed over the nodes of the system. To address this problem, we propose a different approach to balancing load in the distributed environment: partitioning the data into small chunks that can be relocated independently.

Keywords NoSQL ⋅ Data load balancing ⋅ MongoDB ⋅ Chunk migration ⋅ Big data

1 Introduction

Recent growth in the size of data has heightened the need for storage: the world's digital data now exceeds a zettabyte (i.e., 10²¹ bytes), so it is both a challenge and a necessity to develop a powerful and efficient system with the capacity to accommodate it. Consider, for example, a system using a very dense storage medium such as deoxyribonucleic acid (DNA). DNA can encode two bits per nucleotide (NT), or 455 exabytes per gram of single-stranded DNA [1].

Sudhakar (✉)
Indian Computer Emergency Response Team, Ministry of Electronics
& Information Technology, New Delhi, India
e-mail: [email protected]
Sudhakar ⋅ S. K. Pandey
School of Computer & Systems Sciences, Jawaharlal Nehru University,
New Delhi, India
e-mail: [email protected]


Taking into account the fact that a ton of genetic material would be needed to store zettabytes of information in DNA, the argument can simply be made that data storage needs to be distributed for quantitative reasons alone.
For this, distributed storage systems need to be available and scalable when needed. Storage at this scale will only be possible if it is distributed geographically and, beyond its raw storage capacity, remains accessible to millions of users.
Web giants such as Google, Amazon, Facebook, and LinkedIn rely on distributed storage systems. To fulfill their requirements, they have deployed thousands of data centers globally so that data remains available at all times and can be scaled to any level. Failures that occur during the scaling process are failures of either software or hardware components; therefore, these failures need to be handled during the planning and implementation phases.
In addition, traditional large-scale distributed storage systems are inefficient at storing enormous amounts of heterogeneous data (structured, semi-structured, or unstructured) [2, 3], as they do not satisfy the 7 Vs (volume, variety, velocity, veracity, validity, volatility, and value) [4]. The way such data is accessed and stored in traditional databases poses many limitations. These new horizons of data bring big data [5] into reality; therefore, we require more powerful and efficient solutions to process big data.
Big data requires a new kind of database system to handle heterogeneous data sets. One of the most popular families of databases for big data is NoSQL (Not Only SQL), which has the capability to process big data and mine valuable information from it. Accordingly, many companies have developed NoSQL databases tailored to their requirements, such as Facebook’s Cassandra [3], Amazon’s Dynamo [6], Yahoo!’s PNUTS [7], Google’s BigTable [8], Riak, and MongoDB [9]. These systems scale very well by trading off consistency, availability, and partition tolerance as described by the CAP theorem [10]. Each of them uses shards to store its data, and deploying shards globally remains a central challenge because the resulting systems suffer from load balancing problems.
The load balancing techniques used in NoSQL systems do not consider chunk migration as a performance indicator; instead, they treat high availability and partition tolerance as their key indicators. In this work, we aim to bridge this gap by proposing an improvement over existing load balancing techniques that takes into account shard utilization and chunk migration to increase the efficiency of NoSQL databases, using MongoDB as the concrete case. We propose a new approach to improve load balancing for MongoDB and compare our results with MongoDB’s automatic load balancing algorithm; the comparison clearly shows the better performance of our approach without affecting the memory utilization of the shards. The main contributions of this article are:
• A naive and holistic load balancing approach is proposed for MongoDB-based NoSQL systems.

• Our proposed method is computationally cost-effective, as the number of chunk migrations among shards is lower than with the traditional load balancing algorithm of MongoDB.
• The proposed method achieves consistent performance irrespective of the number and size of the shards.
The rest of the article is organized as follows: Sect. 2 surveys related NoSQL systems. Section 3 explains how the data stored in shards is structured and maintained in MongoDB. In Sect. 4, we present our proposed approach for load balancing in MongoDB, and Sect. 4.3 presents experiments and evaluation. Finally, Sect. 5 concludes the paper with some future research directions.

2 Related Work

Facebook’s Cassandra [3, 11] is a distributed database designed to store structured data as key-value pairs indexed by a key. Cassandra is highly scalable from two perspectives, storage and request throughput, while avoiding a single point of failure. Cassandra stores data in tables that closely resemble distributed multi-dimensional maps, also indexed by a key, and it belongs to the column-family model like BigTable [8]. It provides atomic per-replica operations on a single row key. In Cassandra, consistent hashing [12] is used for data partitioning, mapping keys to nodes in a manner similar to the Chord distributed hash table (DHT). Partitioned data is stored in a Cassandra cluster whose nodes may move around the ring, and the DHT over the keys is used to facilitate load balancing.
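To make the placement idea concrete, below is a minimal Python sketch of consistent hashing on a ring (not Cassandra's actual implementation); the node names, the MD5-based ring positions, and the ring size are illustrative assumptions.

import hashlib
from bisect import bisect

RING_SIZE = 2 ** 32

def ring_position(identifier: str) -> int:
    # Hash a key or node identifier onto a fixed-size ring (illustrative choice of MD5).
    digest = hashlib.md5(identifier.encode()).hexdigest()
    return int(digest, 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, nodes):
        # Each node owns the arc between its predecessor's position and its own.
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def node_for_key(self, key: str) -> str:
        # Walk clockwise from the key's position to the first node at or after it.
        pos = ring_position(key)
        idx = bisect(self.ring, (pos,)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
print(ring.node_for_key("user:42"))  # the single node responsible for this key

Adding or removing a node only remaps the keys on the neighbouring arc of the ring, which is why DHT-style systems such as Cassandra and Chord rely on this scheme.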
Amazon’s Dynamo [6] is a distributed key-value store. It mainly focuses on scalability and availability rather than consistency. To address the non-uniform distribution of nodes on the ring, it uses the concept of virtual nodes (vnodes) and follows a different strategy for partition-to-vnode assignment, which results in a better distribution of load across the vnodes and therefore over the physical nodes.
Scatter [13], unlike Amazon’s Dynamo, is a distributed, consistent key-value store that is highly decentralized. For data storage, it uses the uniform key distribution obtained through consistent hashing, as is typical for DHTs. Scatter uses two policies for load balancing. In the first policy, a node newly joining the system randomly samples k groups and then joins the one handling the largest number of operations. In the second policy, neighboring groups can trade their responsibility ranges based on the load distribution.
MongoDB [9, 14] is a schema-free, document-oriented database written in C++. MongoDB uses replication to provide data availability and sharding to provide partition tolerance while managing data across the distributed environment. It stores data in the form of chunks. To keep chunks evenly distributed across all the servers in the cluster, the system uses a balancer. Whenever the balancer detects an uneven chunk count (i.e., the difference in chunk counts between the minimally loaded and maximally loaded shards is greater than or equal to 8), it redistributes the chunks among the shards until the load difference between any two shards is less than or equal to two [15].

3 Maintenance of Load in MongoDB

The basic concept of automatic load balancing in MongoDB is to break large collections into smaller chunks and distribute them evenly over all available shards, so that each subset of the data set belongs to exactly one shard. A collection in a MongoDB database is partitioned by specifying a shard key pattern for the chunks together with two further parameters, minkey and maxkey, which bound the range of shard key values covered by a chunk [14]. We can therefore say that a chunk carries three attributes of the collection (i.e., minkey, maxkey, and shardkey). When a chunk reaches the maximum size (i.e., 200 MB as configured for automatic balancing), the splitter splits it into two new chunks of equal size.
As already mentioned, the basic idea of MongoDB’s automatic load balancing is that if the balancer detects that the difference between the chunk counts of any two shards is greater than or equal to eight, it treats the shards as imbalanced and starts migrating chunks to other shards until the difference decreases to two [15].
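As a rough illustration of the split-and-balance behaviour described above, here is a small Python sketch; the 200 MB, 8, and 2 thresholds are taken from the text, while the chunk and shard representations are simplified assumptions rather than MongoDB's internal structures.

MAX_CHUNK_MB = 200      # split threshold mentioned in the text
IMBALANCE_TRIGGER = 8   # start balancing when the chunk-count gap reaches 8
IMBALANCE_TARGET = 2    # stop migrating once the gap is at most 2

def split_if_needed(chunk_sizes_mb):
    # The splitter breaks any chunk that reached the maximum size into two equal halves.
    result = []
    for size in chunk_sizes_mb:
        if size >= MAX_CHUNK_MB:
            result.extend([size / 2, size / 2])
        else:
            result.append(size)
    return result

def default_balancer(chunk_counts):
    # Move one chunk at a time from the fullest to the emptiest shard.
    counts = dict(chunk_counts)
    migrations = 0
    while True:
        most = max(counts, key=counts.get)
        least = min(counts, key=counts.get)
        gap = counts[most] - counts[least]
        if migrations == 0 and gap < IMBALANCE_TRIGGER:
            break   # never became imbalanced enough to trigger the balancer
        if gap <= IMBALANCE_TARGET:
            break   # balanced again
        counts[most] -= 1
        counts[least] += 1
        migrations += 1
    return counts, migrations

print(split_if_needed([250, 80]))                                # -> [125.0, 125.0, 80]
print(default_balancer({"shard0": 20, "shard1": 4, "shard2": 6}))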
As discussed in the context of the CAP theorem in Sect. 1, MongoDB embraces the properties of availability and partition tolerance. To ensure availability, it uses replicated servers with automatic failover. In case of a failure in the system, partition tolerance ensures smooth functioning by allowing the system to continue working as a whole. Under an imbalance condition, chunk migrations among the shards are managed by MongoDB’s automatic load balancing algorithm. Replication and sharding, which are commonly used to implement availability and partition tolerance in distributed systems, are briefly discussed below:

3.1 Replication

Replication is the process of synchronizing data across multiple servers connected in a distributed manner. It provides redundancy and increases data availability by keeping multiple copies of the data on different database servers. In this way, the data is protected from a single point of failure, and the loss of one server does not affect the availability of the data, because it can be recovered from another copy in the replica set. A replica is p_j^r ∈ S_i, where r denotes a replica of a particular node, S_i = {p_1^r, p_2^r, p_3^r} with S_i ∈ S, S ∈ ℤ⁺, and 1 ≤ j ≤ 3. It should be noted that a node has only three replicas, of which one must be the primary replica and the others are secondary replicas.
Every node has a replication group G of size n, and a quorum is a subset of the nodes in a replication group G. A view v is a tuple over G that holds important information about G, and the view id is denoted by i_v ∈ ℕ [16].
In some cases, replication can also be used to serve more read operations on the database. To increase the availability of data for distributed applications, the database can additionally be stored geographically across different data centers. Replica sets within a replica group hold the same data; in this group, one replica is the primary and the rest are secondaries [14]. The primary accepts all read/write operations from the clients. If the primary becomes unavailable, one of the secondary replicas is elected as the new primary; the Paxos algorithm is used for this election [17].
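The failover behaviour can be pictured with a deliberately simplified Python sketch of a three-member replica set; it only models the idea that a new primary needs votes from a majority of reachable members, not the actual Paxos protocol [17], and the member names are made up.

from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    is_primary: bool = False
    reachable: bool = True

class ReplicaSet:
    # One node S_i holding a primary and two secondary replicas.
    def __init__(self, names):
        self.members = [Replica(n) for n in names]
        self.members[0].is_primary = True

    def elect_primary(self):
        # Promote a reachable replica only if a majority of members can vote.
        voters = [m for m in self.members if m.reachable]
        if len(voters) <= len(self.members) // 2:
            return None            # no quorum: the set cannot elect a primary
        for m in self.members:
            m.is_primary = False
        voters[0].is_primary = True
        return voters[0]

rs = ReplicaSet(["p1", "p2", "p3"])
rs.members[0].reachable = False        # the current primary fails
new_primary = rs.elect_primary()
print(new_primary.name if new_primary else "no quorum")   # -> p2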

3.2 Sharding

MongoDB scales the system through horizontal scaling when it needs to store more data than the capacity of a single server (or shard). The principle of horizontal scaling is to partition data by rows rather than splitting data into columns (as normalization and vertical partitioning do in relational databases) [14]. In MongoDB, horizontal scaling is carried out by the automatic sharding architecture, which distributes data across thousands of nodes. Moreover, sharding occurs on a per-collection basis; it does not take the whole database into consideration. MongoDB automatically detects which collection is growing faster than the others; that collection becomes the subject of sharding, while the others may still reside on a single server. The components needed to understand MongoDB’s sharding architecture, shown in Fig. 1, are described below.
• Shards are the servers that store the data (each runs a mongod process) and ensure availability and automatic failover; each shard comprises a replica set.
• Config Servers store the cluster’s metadata, which includes the basic information about chunks and shards. Chunks are contiguous ranges of data from collections, ordered by the shard key.
• Routing Services run mongos processes that perform read and write requests on behalf of client applications.
Auto-sharding in MongoDB provides some necessary functionality without requiring large or powerful machines [15]:
1. Automatic balancing when changes occur in load and data distribution.
2. Ease of adding new machines without downtime.
3. No solitary point of failure.
4. Ability to recover from failure automatically.
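To illustrate how the chunk metadata held by the config servers can be used for routing, the following Python sketch maps a shard key value to the shard owning the chunk whose [minkey, maxkey) range contains it; the chunk table, key ranges, and shard names are hypothetical, and this is only an illustration of the idea, not MongoDB's routing code.

from bisect import bisect_right

# Hypothetical cluster metadata: chunks are contiguous shard-key ranges,
# ordered by minkey, each assigned to one shard.
CHUNKS = [
    {"minkey": 0,    "maxkey": 1000,  "shard": "shard0"},
    {"minkey": 1000, "maxkey": 5000,  "shard": "shard1"},
    {"minkey": 5000, "maxkey": 10000, "shard": "shard2"},
]
MIN_KEYS = [c["minkey"] for c in CHUNKS]

def route(shard_key_value):
    # Find the last chunk whose minkey is <= the value, then check its maxkey.
    idx = bisect_right(MIN_KEYS, shard_key_value) - 1
    if idx < 0 or shard_key_value >= CHUNKS[idx]["maxkey"]:
        raise KeyError("shard key value outside all chunk ranges")
    return CHUNKS[idx]["shard"]

print(route(42))    # -> shard0
print(route(7500))  # -> shard2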

Fig. 1 Modified MongoDB distributed architecture [15]

In the system, S is the set of shards and each shard S_i consists of replica sets; we can state this as S = {S_i : S_i ∈ S}, where p_j^{r_i} ∈ S_i, i ≥ 2, and 1 ≤ j ≤ 3. Here p_j^{r_i} represents the j-th replica available in the i-th shard.

4 Proposed Method for Load Balancing

4.1 Basic Idea About Load Balancing and Preliminaries

In this section, we introduce the terms used in the rest of this document. Let D be the set of all data items and let Γ denote the set of load types. For each t ∈ Γ, the load l_t : D → ℝ is a function that assigns to every data item d ∈ D its load value of type t. Every unit of replication U ⊆ D has an associated load value l_t^U = ∑_{v ∈ U} l_t(v). Any node H in the distributed system holds a set of replication units U_H, and the associated value is l_t^H = ∑_{U ∈ U_H} l_t^U. Every node has a capacity c_t^H ∈ ℝ for each load type t. Thus, the inequality l_t^H < c_t^H must be maintained as an invariant, as its violation would result in the failure of H. We also calculate the utilization [18] of a node, u_t^H = l_t^H / c_t^H, for t ∈ Γ. The system utilization μ_t^S, where S is the set of all nodes in the system, and the average utilization μ̂_t^S for t ∈ Γ [18] are given as

μ_t^S = (∑_{H ∈ S} l_t^H) / (∑_{H ∈ S} c_t^H),   μ̂_t^S = (1/|S|) ∑_{H ∈ S} u_t^H   (i)

In addition to the above, we need to consider a third parameter that represents the cost of moving a replication unit (data item) U from one host to another in S, i.e., ρ_U : S × S → ℝ; this parameter depends linearly on l_size^U. If the system is considered uniform, then for H, H′, H″ ∈ S and U ∈ H we have ρ_U(H, H′) = ρ_U(H, H″), so ρ_U ∈ ℝ can be treated as a constant.
Let

L_H^S = { l_t^U | t ∈ Γ, U ∈ H }   (ii)

C_H = { c_t^H | t ∈ Γ }   (iii)

L_H = { l_t^H | t ∈ Γ }   (iv)

L_S = { (C_H, L_H, L_H^S) | H ∈ S }   (v)

where L_S is referred to as the system statistics for S. Migration is carried out by the balancer, i.e., from MIGRATOR(U, H) to MIGRATOR(U, H′), where U is a unit of replication and H, H′ are the hosts.
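The definitions above translate directly into code. The Python sketch below computes l_t^H, u_t^H, μ_t^S, and μ̂_t^S for a toy system; the node names, load types, and per-unit load values are made-up examples, and the notation in the comments follows this section.

# Per-node replication units: for each load type t, the list holds l_t^U
# for every replication unit U stored on that node.
nodes = {
    "H1": {"size": [40.0, 25.0, 10.0], "ops": [300.0, 120.0, 80.0]},
    "H2": {"size": [5.0, 10.0],        "ops": [50.0, 40.0]},
}
capacity = {  # c_t^H for every node H and load type t
    "H1": {"size": 100.0, "ops": 1000.0},
    "H2": {"size": 100.0, "ops": 1000.0},
}

def node_load(host, t):
    # l_t^H: the sum of the loads of all replication units on the node.
    return sum(nodes[host][t])

def node_utilization(host, t):
    # u_t^H = l_t^H / c_t^H; the invariant l_t^H < c_t^H keeps this below 1.
    return node_load(host, t) / capacity[host][t]

def system_utilization(t):
    # mu_t^S: total load over total capacity across all nodes.
    return sum(node_load(h, t) for h in nodes) / sum(capacity[h][t] for h in nodes)

def average_utilization(t):
    # mu-hat_t^S: the mean of the per-node utilizations.
    return sum(node_utilization(h, t) for h in nodes) / len(nodes)

print(system_utilization("size"), average_utilization("size"))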

4.2 Modified Load Balancing Algorithm

The algorithm consists of two methods. The first is IsBalance(), which returns a Boolean value indicating whether a particular shard (host) H is balanced. The balance condition is as follows: if a shard holds more data than its threshold value, it becomes imbalanced. The threshold value is Const · c_t^H (e.g., Const = 0.7, 0.8, or 0.9), where c_t^H is the capacity of the shard H. If a shard turns out to be imbalanced, chunks need to be migrated away from it. The second method is MIGRATOR(), which migrates data from an imbalanced shard to a balanced shard until about l̂_t^S data remains in the shard.
We calculate the total data occupied in all shards, l_t^S, and then the average data occupied per shard, l̂_t^S. Furthermore, we check whether each shard is balanced and, for every imbalanced shard, migrate chunks from the imbalanced shard to a balanced one until the condition (l_t^{H_i} ≥ l̂_t^S OR l_t^{H_j} ≥ l_t^{H_max}) is met; this process is repeated until all shards become balanced.
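The following Python sketch shows one way to read the algorithm described above: IsBalance() flags a shard whose load exceeds Const · c_t^H, and MIGRATOR() moves chunks out of imbalanced shards into balanced ones until roughly the average load remains on the source. The value of Const, the chunk sizes, and the shard layout are illustrative assumptions, not the paper's actual experimental configuration.

CONST = 0.8  # threshold fraction of capacity (the text suggests 0.7, 0.8, or 0.9)

def is_balanced(chunks, cap):
    # IsBalance(): a shard stays balanced while its load is at most Const * c_t^H.
    return sum(chunks) <= CONST * cap

def migrator(shards, capacities):
    # MIGRATOR(): move chunks from imbalanced shards to balanced ones until the
    # source shard is brought down to roughly the average load over all shards.
    total = sum(sum(chunks) for chunks in shards.values())
    average = total / len(shards)          # the average load per shard
    migrations = 0
    for src, chunks in shards.items():
        if is_balanced(chunks, capacities[src]):
            continue
        for dst, dst_chunks in shards.items():
            while (dst != src and chunks and sum(chunks) > average
                   and is_balanced(dst_chunks, capacities[dst])):
                dst_chunks.append(chunks.pop())   # relocate one chunk
                migrations += 1
    return migrations

shards = {"s1": [30, 30, 30, 5], "s2": [10], "s3": [15, 5]}
capacities = {"s1": 100, "s2": 100, "s3": 100}
print(migrator(shards, capacities), shards)   # few migrations, s1 brought near the average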

4.3 Experimental Results

To show the effectiveness of the modified MongoDB, we have compared our approach with traditional MongoDB on the basis of two metrics: chunk migration rate and space utilization.
To substantiate our claim, we performed the experiment four times with different numbers of chunks and recorded the results. In the first experiment, we considered shards with a capacity of 100 chunks. Whenever an imbalance event occurs, the balancer algorithm executes automatically and redistributes chunks among the shards in order to restore balance in the system. Our approach performs fewer chunk migrations, which reduces the migration overhead compared with the MongoDB load balancing algorithm and results in more efficient utilization of the shards. For analysis, the same experiment was repeated with shards of 1000, 10000, and 100000 chunk capacity, as is clear from Fig. 2a.
As for the memory utilization of the shards during load balancing, the automatic load balancing algorithm and the modified MongoDB perform exactly the same, as shown in Fig. 2b. Hence, our algorithm performs better by minimizing the overhead cost of chunk migrations, which improves the computational cost of load balancing without compromising space utilization. Thus, we improve the computational cost of chunk migrations while leaving the memory utilization of the shards unchanged.

[Figure 2 comprises two bar charts comparing Modified MongoDB with MongoDB for 100, 1000, 10000, and 100000 chunks: (a) chunk migrations vs. maximum number of chunks, and (b) space utilization vs. maximum number of chunks.]

Fig. 2 Experimental evaluation of modified MongoDB and MongoDB based on migration rate
and space utilization

5 Conclusion and Future Work

Large-scale applications and data processing require handling issues such as the scalability, reliability, and performance of distributed storage systems. One of the prominent issues among them is handling skewness in the distribution and access of data items. We have presented an improved load balancing algorithm for the NoSQL database MongoDB, which handles the aforementioned issues by providing automatic load balancing. We have analyzed our algorithm and shown that our approach is better than similar mechanisms employed in the MongoDB database [14] in terms of chunk migration and memory utilization of the individual shards.
Our proposed method targets MongoDB-based NoSQL database systems. In future work, we will try to incorporate this approach into other existing flavors of NoSQL.

References

1. Church, George M., Yuan Gao, and Sriram Kosuri, 2012, “Next-generation digital
information storage in DNA.” Science 337.6102: 1628–1628.
2. Dean, Jeffrey, and Sanjay Ghemawat, 2008, “MapReduce: simplified data processing on large
clusters.” Communications of the ACM 51.1: 107–113.
3. Lakshman, Avinash, and Prashant Malik, 2010, “Cassandra: a decentralized structured storage
system.” ACM SIGOPS Operating Systems Review 44.2: 35–40.
4. M. Ali-ud-din, et al., 2014, “Seven V’s of Big Data understanding Big Data to extract value,”
American Society for Engineering Education (ASEE Zone 1), 2014 Zone 1 Conference of the,
Bridgeport, CT, USA.
5. E. Dumbill, 2012, “What is big data?,” O’Reilly Media, Inc., Available: https://beta.oreilly.com/ideas/what-is-big-data.
6. DeCandia, Giuseppe, et al., 2007, “Dynamo: amazon’s highly available key-value
store.” ACM SIGOPS operating systems review 41.6: 205–220.
7. Cooper, Brian F., et al., 2008, “PNUTS: Yahoo!’s hosted data serving platform.” Proc. of the
VLDB Endowment 1: 1277–1288.
8. Chang, Fay, et al., 2008, “Bigtable: A distributed storage system for structured data.” ACM
Trans. on Computer Systems (TOCS) 26.2: 4.
9. “MongoDB,” MongoDB Inc., 2015, Available: https://en.wikipedia.org/wiki/MongoDB.
10. E. A. Brewer, Towards robust distributed systems. (Invited Talk), Oregon, 2000.
11. Featherston, Dietrich, 2010, “Cassandra: Principles and Application.” Department of Computer Science, University of Illinois at Urbana-Champaign.
12. Thusoo, Ashish, et al., 2010, “Data warehousing and analytics infrastructure at face-
book.” Proc. of the 2010 ACM SIGMOD Inter. Conf. on Management of data.
13. Glendenning, Lisa, et al., 2011, “Scalable consistency in Scatter.” Proc. of the Twenty-Third ACM Symposium on Operating Systems Principles.
14. “MongoDB Documentation,” 25 June 2015. [Online].
15. Liu, Yimeng, Yizhi Wang, and Yi Jin., 2012, “Research on the improvement of MongoDB
Auto-Sharding in cloud environment.” Computer Science & Education (ICCSE), 2012 7th
Inter. Conf. on. IEEE.
16. Gifford, David K, 1979, “Weighted voting for replicated data.” Proc. of the seventh ACM
symposium on Operating systems principles.

17. Lamport, Leslie, 1998, “The part-time parliament.” ACM Transactions on Computer Systems
(TOCS) 16.2: 133–169.
18. Godfrey, Brighten, et al., 2004, “Load balancing in dynamic structured P2P sys-
tems.” INFOCOM 2004. Twenty-third Annual Joint Conf. of the IEEE Computer and
Communications Societies. Vol. 4.
