Unit-5 Notes
NoSQL DATABASE:
Over time, four major types of NoSQL databases emerged: document databases, key-value
databases, wide-column stores, and graph databases.
Advantages of NoSQL:
The CAP theorem states that a distributed system can deliver only two of three desired
characteristics: consistency, availability, and partition tolerance (the 'C', 'A' and 'P' in
CAP).
A distributed system is a network that stores data on more than one node (physical or virtual
machines) at the same time. Because all cloud applications are distributed systems, it's
essential to understand the CAP theorem when designing a cloud app so that you can choose
a data management system that delivers the characteristics your application needs most.
The CAP theorem is also called Brewer's Theorem, because it was first advanced by
Professor Eric A. Brewer during a talk he gave on distributed computing in 2000. Two years
later, MIT professors Seth Gilbert and Nancy Lynch published a proof of "Brewer's
Conjecture."
The three distributed system characteristics to which the CAP theorem refers are:
Consistency:
Consistency means that all clients see the same data at the same time, no matter which node
they connect to. For this to happen, whenever data is written to one node, it must be instantly
forwarded or replicated to all the other nodes in the system before the write is deemed
"successful."
Availability:
Availability means that any client making a request for data gets a response, even if one or
more nodes are down. Another way to state this—all working nodes in the distributed system
return a valid response for any request, without exception.
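Partition tolerance:
A partition is a communications break within a distributed system: a lost or temporarily
delayed connection between two nodes. Partition tolerance means that the cluster continues
to work despite any number of communication breakdowns between nodes in the system.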
NoSQL databases are ideal for distributed network applications. Unlike their vertically
scalable SQL (relational) counterparts, NoSQL databases are horizontally scalable and
distributed by design—they can rapidly scale across a growing network consisting of multiple
interconnected nodes.
NoSQL databases are classified based on the two CAP characteristics they support:
MongoDB is a popular NoSQL database management system that stores data as BSON
(binary JSON) documents. It's frequently used for big data and real-time applications running
at multiple different locations. Relative to the CAP theorem, MongoDB is a CP data store: it
resolves network partitions by maintaining consistency, while compromising on availability.
MongoDB is a single-master system: each replica set can have only one primary node that
receives all the write operations. All other nodes in the same replica set are secondary nodes
that replicate the primary node's operation log and apply it to their own data set. By default,
clients also read from the primary node, but they can specify a read preference that allows
them to read from secondary nodes.
When the primary node becomes unavailable, the secondary node with the most recent
operation log will be elected as the new primary node. Once all the other secondary nodes
catch up with the new master, the cluster becomes available again. As clients can't make any
write requests during this interval, the data remains consistent across the entire network.
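The failover rule described above (promote the secondary with the most recent operation log) can be sketched in a few lines. This is only an illustration: the node names and the lastOplogTime field are hypothetical, and a real MongoDB election also involves voting, priorities, and timeouts that are not modelled here.

```javascript
// Hypothetical snapshot of the secondaries in one replica set.
// lastOplogTime stands in for "how recent this node's operation log is".
const secondaries = [
  { name: "node-b", lastOplogTime: 1041 },
  { name: "node-c", lastOplogTime: 1057 },
  { name: "node-d", lastOplogTime: 1049 },
];

// Pick the secondary whose operation log is the most recent.
function electPrimary(nodes) {
  return nodes.reduce((best, node) =>
    node.lastOplogTime > best.lastOplogTime ? node : best
  );
}

const newPrimary = electPrimary(secondaries);
console.log(newPrimary.name); // node-c
```

Until such a node has been chosen, the sketch has no primary at all, which mirrors the window in which clients cannot write.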
Apache Cassandra is an open source NoSQL database maintained by the Apache Software
Foundation. It's a wide-column database that lets you store data on a distributed network.
However, unlike MongoDB, Cassandra has a masterless architecture, and as a result, it has
no single point of failure.
Relative to the CAP theorem, Cassandra is an AP data store: it stays available during a
network partition and accepts that replicas may be temporarily inconsistent. Data only
becomes inconsistent in the case of a network partition, and Cassandra offers "repair"
functionality that helps nodes catch up with their peers so that inconsistencies are resolved
quickly. The constant availability results in a highly performant system that might be worth
the trade-off in many cases.
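The difference between a CP store (as MongoDB is described above) and an AP store (how Cassandra is usually classified) can be made concrete with a toy model of two replicas. Everything here is a simplified assumption for illustration: the Replica class and the write and repair functions are hypothetical and do not correspond to any real database API.

```javascript
// Toy model: two replicas that each hold a single value.
class Replica {
  constructor() {
    this.value = "v0";
  }
}

// mode "CP": refuse the write while partitioned, so replicas never diverge.
// mode "AP": accept the write locally and reconcile after the partition heals.
function write(mode, local, remote, partitioned, value) {
  if (!partitioned) {
    local.value = value;
    remote.value = value; // replicate before acknowledging the write
    return "ok";
  }
  if (mode === "CP") return "rejected"; // unavailable, but still consistent
  local.value = value; // available, but temporarily inconsistent
  return "ok";
}

// Simplified "repair": copy the value from the replica that took the write.
function repair(source, target) {
  target.value = source.value;
}

const cpA = new Replica();
const cpB = new Replica();
const cpResult = write("CP", cpA, cpB, true, "v1");

const apA = new Replica();
const apB = new Replica();
const apResult = write("AP", apA, apB, true, "v1");
const divergedDuringPartition = apA.value !== apB.value;
repair(apA, apB);

console.log(cpResult, cpA.value, cpB.value); // rejected v0 v0
console.log(apResult, divergedDuringPartition, apB.value); // ok true v1
```

The CP pair gives up availability to stay identical, while the AP pair answers the client and relies on a later repair step to converge.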
DATA MODELS IN NoSQL:
Document Stores
Key-Value Stores
Graph Databases
Column Stores
Each type has its own unique strengths and weaknesses and is best suited to certain
types of applications or use cases. Here's a brief overview of each type. The picture below
represents these four different kinds of NoSQL data model.
DOCUMENT DATA MODEL:
The Document Data Model differs from other data models because it stores data in JSON,
BSON, or XML documents. In this data model, documents can be nested inside other
documents, and any particular element can be indexed to make queries run faster. Documents
are often stored and retrieved in a form that is close to the data objects used in applications,
which means very little translation is required to use the data in an application. JSON is a
native format that is often used both to store and to query data.
In the document data model, each document consists of key-value pairs. Below is an
example:
{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "[email protected]",
"Contact" : "12345"
}
Working of Document Data Model:
The document data model is semi-structured: a record and the data associated with it are
stored together in a single document, so the data is not completely unstructured.
Features:
Document Type Model: Data is stored in documents rather than in tables or graphs, so it
is easy to map to the objects used in many programming languages.
Flexible Schema: The schema is flexible, because not all documents in a collection
need to have the same fields.
Distributed and Resilient: Document databases are distributed, which is what allows
horizontal scaling and distribution of data.
Manageable Query Language: The query language allows developers to perform CRUD
(Create, Read, Update, Delete) operations on the data.
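The flexible schema and the CRUD operations listed above can be illustrated with a small in-memory stand-in for a collection. This is a sketch under stated assumptions, not how any document database is implemented: the helper names (createDoc, readDoc, updateDoc, removeDoc) and the sample field values are hypothetical.

```javascript
// In-memory stand-in for a collection: documents are plain objects,
// and they do not have to share the same fields.
const collection = new Map(); // _id -> document
let nextId = 1;

function createDoc(doc) {
  const _id = nextId++;
  collection.set(_id, { _id, ...doc });
  return _id;
}

function readDoc(id) {
  return collection.get(id);
}

function updateDoc(id, changes) {
  const doc = collection.get(id);
  if (doc) Object.assign(doc, changes); // new fields can be added at any time
}

function removeDoc(id) {
  return collection.delete(id);
}

// Two documents with different shapes; the second nests a sub-document.
const first = createDoc({ Name: "Yashodhra", Contact: "12345" });
const second = createDoc({ Name: "Tanmay", Address: { City: "Pune" } });

updateDoc(first, { Branch: "IT" });
removeDoc(second);

console.log(readDoc(first)); // { _id: 1, Name: 'Yashodhra', Contact: '12345', Branch: 'IT' }
console.log(collection.size); // 1
```

Because each document carries its own fields, adding Branch to one document needs no change to the others.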
Advantages:
Schema-less: There are no restrictions on the format and structure of the stored data,
which makes these databases good at retaining existing data at massive volumes.
Faster creation and maintenance of documents: Creating a document is very simple, and
documents require almost no maintenance.
Open formats: Documents are built with simple, open formats such as XML and JSON
and their variants.
Built-in versioning: As documents grow in size they can also grow in complexity;
built-in versioning decreases conflicts.
Disadvantages:
Weak Atomicity: Document databases have traditionally lacked support for
multi-document ACID transactions (MongoDB, for example, added them only in version
4.0). Without them, a change that involves two collections requires two separate queries,
one for each collection, which breaks atomicity requirements.
Consistency Check Limitations: Nothing stops a collection from containing documents
that are not connected to another collection (for example, to an author collection).
Searching for such documents is possible, but it can hurt database performance.
Security: Many web applications lack adequate security, which can result in the leakage
of sensitive data, so one must pay attention to web application vulnerabilities.
Applications of Document Data Model:
Content Management: This data model is widely used for video streaming platforms,
blogs, and similar services, because each item is stored as a single document and the
database is easier to maintain as the service evolves over time.
Book Database: This data model is useful for book databases because it lets us nest
data.
Catalog: This data model is widely used for storing and reading catalog files because
reads stay fast even when catalogs have thousands of attributes stored.
Analytics Platform: This data model is also widely used in analytics platforms.
KEY-VALUE DATA MODEL:
A key-value database associates a value, which can be anything from a simple string to a
complex entity, with a key that is used to track that entity. As in many programming
paradigms, a key-value database resembles a map, array, or dictionary object, except that
it is stored persistently and managed by a DBMS.
The key-value store uses an efficient and compact index structure so that it can quickly
and reliably find a value by its key. For example, Redis is a key-value store used to track
lists, maps, sets, and primitive types (which are simple data structures) in a persistent
database. By supporting only a predetermined number of value types, Redis can expose a
very simple interface for querying and manipulating them, and when configured
appropriately it is capable of high throughput.
Here are a few situations in which you can use a key-value database:
User session attributes in an online application such as finance or gaming, which
require real-time random data access.
A caching mechanism for frequently accessed data, or any key-based design.
An application whose queries are based on keys.
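The session and caching use cases above come down to two operations: store a value under a key, and look it up again by that key, optionally with an expiry time. The following is a minimal in-memory sketch; the KeyValueStore class, its ttlMs parameter, and the session data are hypothetical, and unlike a real key-value database nothing here is persisted.

```javascript
// Minimal key-value store with optional expiry, as used for sessions or caches.
class KeyValueStore {
  // The clock is injectable so that expiry can be demonstrated deterministically.
  constructor(now = () => Date.now()) {
    this.data = new Map();
    this.now = now;
  }

  set(key, value, ttlMs = Infinity) {
    this.data.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get(key) {
    const entry = this.data.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.data.delete(key); // drop expired entries lazily, on access
      return undefined;
    }
    return entry.value;
  }
}

let clock = 0; // fake time in milliseconds
const store = new KeyValueStore(() => clock);

store.set("session:42", { user: "tanmay", cartItems: 3 }, 1000);
const fresh = store.get("session:42"); // found: still within its lifetime

clock = 1500; // advance past the 1000 ms lifetime
const expired = store.get("session:42"); // undefined: the session has expired

console.log(fresh, expired);
```

Every lookup goes through the key, which is why this model suits applications whose queries are all key-based.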
Features:
Advantages:
It is very easy to use. Because the database is so simple, a value can be of any type,
or even of different types when required.
Its response time is fast due to its simplicity, provided that the surrounding
environment is well built and optimized.
Key-value store databases are scalable vertically as well as horizontally.
Built-in redundancy makes this database more reliable.
Disadvantages:
Here are some popular key-value databases which are widely used:
Couchbase: It permits SQL-style querying and text search.
Amazon DynamoDB: One of the most widely used key-value databases. It can handle a
large number of requests every day and provides various security options.
Riak: A distributed key-value database used to develop applications.
Aerospike: An open-source, real-time database that works with billions of
transactions.
Berkeley DB: A high-performance, open-source database that provides scalability.
GRAPH DATA MODEL:
The Graph-Based Data Model in NoSQL focuses on the relationships between data
elements. Each element is stored as a node, and the associations between elements are
known as links (edges). Associations are stored directly because they are first-class elements
of the data model. These data models give us a conceptual view of the data.
These data models are based on a topological network structure. Graph theory uses the
terms nodes, edges, and properties; here is what they mean in the graph-based data model.
Nodes: These are the instances of data that represent the objects to be tracked.
Edges: These represent the relationships between nodes.
Properties: These represent information associated with nodes.
The below image represents nodes with properties and the relationships between them,
represented by edges.
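The three terms above can be shown with plain data structures: nodes carry properties, edges connect nodes, and a query follows edges. The sketch below is an assumption-laden illustration; the sample people, the FOLLOWS relationship type, and the helper functions are hypothetical and are not taken from any particular graph database.

```javascript
// Nodes: each key identifies a node, each value holds that node's properties.
const nodes = new Map([
  ["alice", { label: "Person", age: 30 }],
  ["bob", { label: "Person", age: 25 }],
  ["carol", { label: "Person", age: 41 }],
]);

// Edges: stored directly as first-class records with a relationship type.
const edges = [
  { from: "alice", to: "bob", type: "FOLLOWS" },
  { from: "bob", to: "carol", type: "FOLLOWS" },
];

// Direct neighbours of a node over one relationship type.
function neighbors(id, type) {
  return edges.filter((e) => e.from === id && e.type === type).map((e) => e.to);
}

// Breadth-first traversal: every node reachable by following edges of that type.
function reachable(start, type) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const next of neighbors(current, type)) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  seen.delete(start); // the start node itself is not part of the result
  return [...seen];
}

console.log(neighbors("alice", "FOLLOWS")); // [ 'bob' ]
console.log(reachable("alice", "FOLLOWS")); // [ 'bob', 'carol' ]
console.log(nodes.get("carol").age); // 41
```

Because relationships are stored explicitly, a multi-hop question such as "who can alice reach" is a traversal rather than a series of joins.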
JanusGraph: It is a scalable, open-source graph database system that is very helpful
in big data analytics. JanusGraph has features like:
Storage: Many options are available for storing graph data, such as Cassandra.
Support for transactions: It supports ACID (Atomicity, Consistency, Isolation, and
Durability) transactions and can handle thousands of concurrent users.
Searching options: Complex searching options are available as optional
support.
Neo4j: As the "4j" in its name suggests, this graph database is written in Java, with
native graph storage and processing. Neo4j has features like:
Scalable: Scalable through data partitioning into pieces known as shards.
Higher Availability: Availability is high due to continuous backups and rolling
upgrades.
Query Language: Uses Cypher, a programmer-friendly graph query language.
DGraph: It is an open-source distributed graph database system designed for
scalability. DGraph's main features are:
Query Language: It supports GraphQL, a query language designed for APIs.
Open-source system: Support for many open standards.
Advantages of Graph Data Model:
Structure: The structures are very agile and workable.
Explicit Representation: The portrayal of relationships between entities is explicit.
Real-time Output Results: Queries give real-time output results.
COLUMNAR DATA MODEL:
A relational database stores data in rows and also reads the data row by row, whereas a
column store is organized as a set of columns. So if someone wants to run analytics on a
small number of columns, those columns can be read directly without consuming memory
on unwanted data. The values in a column are of the same type and therefore compress
more efficiently, which makes reads faster. Examples of the Columnar Data Model:
Cassandra and Apache HBase.
The Columnar Data Model organizes information into columns instead of rows. These
columns function in much the same way as tables in relational databases, but this type of
data model is more flexible. The example below will help in understanding the Columnar
data model:
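Row-Oriented Table:
S.No. Name      Course Branch      ID
01.   Tanmay    B-Tech Computer    2
02.   Abhishek  B-Tech Electronics 5
03.   Samriddha B-Tech IT          7
04.   Aditi     B-Tech E & TC      8
Column-Oriented Tables:
S.No. Name      ID
01.   Tanmay    2
02.   Abhishek  5
03.   Samriddha 7
04.   Aditi     8
S.No. Course ID
01.   B-Tech 2
02.   B-Tech 5
03.   B-Tech 7
04.   B-Tech 8
S.No. Branch      ID
01.   Computer    2
02.   Electronics 5
03.   IT          7
04.   E & TC      8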
Columnar Data Model uses the concept of keyspace, which is like a schema in relational
models.
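The difference between the two layouts, and why same-typed columns compress well, can be sketched with the student data from the tables above. The toColumns and runLengthEncode helpers are hypothetical names used only for this illustration; run-length encoding is one simple compression scheme chosen here as an example, not a claim about what any specific column store uses.

```javascript
// Row-oriented layout: one object per student.
const rows = [
  { name: "Tanmay", course: "B-Tech", branch: "Computer", id: 2 },
  { name: "Abhishek", course: "B-Tech", branch: "Electronics", id: 5 },
  { name: "Samriddha", course: "B-Tech", branch: "IT", id: 7 },
  { name: "Aditi", course: "B-Tech", branch: "E & TC", id: 8 },
];

// Column-oriented layout: one array per field.
function toColumns(rowList) {
  const columns = {};
  for (const row of rowList) {
    for (const [field, value] of Object.entries(row)) {
      if (!(field in columns)) columns[field] = [];
      columns[field].push(value);
    }
  }
  return columns;
}

// Run-length encoding: repeated neighbouring values collapse into one run.
function runLengthEncode(values) {
  const runs = [];
  for (const value of values) {
    const last = runs[runs.length - 1];
    if (last && last.value === value) last.count += 1;
    else runs.push({ value, count: 1 });
  }
  return runs;
}

const columns = toColumns(rows);

// An aggregation reads a single column and never touches the others.
const highestId = Math.max(...columns.id);

console.log(columns.branch); // [ 'Computer', 'Electronics', 'IT', 'E & TC' ]
console.log(highestId); // 8
console.log(runLengthEncode(columns.course)); // [ { value: 'B-Tech', count: 4 } ]
```

The course column shrinks to a single run because every value in it is identical, which is the kind of saving a row layout cannot exploit as easily.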
Advantages of Columnar Data Model:
Well structured: Because these data models compress well, they are well organized in
terms of storage.
Flexibility: Columns do not have to look like each other, so one can add new and
different columns without disrupting the whole database.
Aggregation queries are fast: Aggregation queries are quite fast because the
information they need is stored together in a column. An example would be adding up the
total number of students enrolled in one year.
Scalability: It can be spread across large clusters of machines, even numbering in
thousands.
Load Times: Load times are excellent, since a row table can be loaded in a few
seconds.
MONGODB:
o Scalability
o Performance
o High Availability
o Scaling from single server deployments to large, complex multi-site architectures.
Key points of MongoDB
o Develop Faster
o Deploy Easier
o Scale Bigger
Features of MongoDB
1. Support for ad hoc queries
In MongoDB, you can search by field or by range query, and it also supports regular
expression searches.
2. Indexing
3. Replication
A master can perform reads and writes, and a slave copies data from the master and can
only be used for reads or backup (not writes).
4. Duplication of data
MongoDB can run over multiple servers. The data is duplicated to keep the system up and
running in case of hardware failure.
5. Load balancing
10. Stores files of any size easily without complicating your stack.
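The field, range, and regular-expression searches mentioned in the feature list above can be illustrated by emulating them over an in-memory array. The filter objects use MongoDB's operator names ($gte, $lt, $regex), but the matches function is a hypothetical helper written for this sketch; it is not MongoDB's implementation and supports only these three operators.

```javascript
// Sample documents (illustrative data only).
const people = [
  { name: "Tanmay", id: 2 },
  { name: "Abhishek", id: 5 },
  { name: "Samriddha", id: 7 },
  { name: "Aditi", id: 8 },
];

// Returns true when the document satisfies every condition in the filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, condition]) => {
    const value = doc[field];
    // A plain value means "field equals this value".
    if (condition === null || typeof condition !== "object") {
      return value === condition;
    }
    // An object holds one or more operators that must all hold.
    return Object.entries(condition).every(([operator, argument]) => {
      switch (operator) {
        case "$gte":
          return value >= argument;
        case "$lt":
          return value < argument;
        case "$regex":
          return new RegExp(argument).test(String(value));
        default:
          throw new Error(`unsupported operator: ${operator}`);
      }
    });
  });
}

const find = (filter) => people.filter((doc) => matches(doc, filter));

const byField = find({ name: "Aditi" }); // search by field
const byRange = find({ id: { $gte: 5, $lt: 8 } }); // range query
const byRegex = find({ name: { $regex: "^A" } }); // regular expression search

console.log(byField.map((d) => d.name)); // [ 'Aditi' ]
console.log(byRange.map((d) => d.name)); // [ 'Abhishek', 'Samriddha' ]
console.log(byRegex.map((d) => d.name)); // [ 'Abhishek', 'Aditi' ]
```

In the mongo shell the same filter objects would be passed to a collection's find method; the emulation only shows which documents each kind of filter selects.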
The MongoDB database commands are used to create, modify, and update the database.
1. db.adminCommand(cmd)
The adminCommand method is a helper that runs the specified database command against
the admin database.
Command: The argument is specified either in document form or in string form. If the
command is given as a string, it cannot include any arguments.
Example:
Creating a user named JavaTpoint with the dbOwner role on the admin database.
db.adminCommand(
{
createUser: "JavaTpoint",
pwd: passwordPrompt(),
roles: [
{ role: "dbOwner", db: "admin" }
]
})
2. db.aggregate()
The aggregate method runs a specified diagnostic or admin pipeline, which does not
require any underlying collection.
Syntax:
db.aggregate( [ <pipeline> ], { <options> } )
Example:
The following example runs a pipeline with two stages. The first is the $currentOp operation
and the second filters the results.
use admin
db.aggregate( [ {
$currentOp : { allUsers: true, idleConnections: true } },
{
$match : { shard: "shardDemo" }
}
])
3. db.cloneDatabase("hostname")
The cloneDatabase method copies a database from a remote host to the current database and
assumes that the database at the remote location has the same name as the current database.
The hostname parameter contains the hostname of the MongoDB instance that holds the
database we want to copy. Note that this method was deprecated in MongoDB 4.0 and
removed in MongoDB 4.2.
Example:
db.cloneDatabase("customers")
4. db.commandHelp(command)
The commandHelp method displays the help text for the specified database command. The
command parameter contains the name of a database command.
5. db.createCollection(name, options)
This method creates a new collection or view. Because MongoDB creates a collection
implicitly when it is first referenced in a command, the createCollection method is used
primarily for creating new collections that use specific options.
For example - we will create a student collection with a JSON Schema validator:
db.createCollection( "student", {
validator: { $jsonSchema: {
bsonType: "object",
required: [ "phone" ],
properties: {
phone: {
bsonType: "string",
description: "must be a string and is required"
},
email: {
bsonType : "string",
pattern: "@mongodb\\.com$",
description: "must be a string and match the regular expression pattern"
},
status: {
enum: [ "Unknown", "Incomplete" ],
description: "can only be one of the enum values"
}
}
}}
})
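To make the effect of such a validator concrete, the sketch below checks documents against the same three rules (a required phone string, an email that must match a pattern, and a status restricted to an enum). It is a simplified stand-in written for illustration: the validate function is hypothetical, it uses a plain type field instead of bsonType, and it covers only the keywords shown here rather than the full JSON Schema behaviour of MongoDB.

```javascript
// Simplified copy of the rules from the validator above.
const schema = {
  required: ["phone"],
  properties: {
    phone: { type: "string" },
    email: { type: "string", pattern: "@mongodb\\.com$" },
    status: { enum: ["Unknown", "Incomplete"] },
  },
};

// Returns a list of rule violations; an empty list means the document is valid.
function validate(doc, rules) {
  const errors = [];
  for (const field of rules.required) {
    if (!(field in doc)) errors.push(`${field} is required`);
  }
  for (const [field, rule] of Object.entries(rules.properties)) {
    if (!(field in doc)) continue; // optional fields may be absent
    const value = doc[field];
    if (rule.type && typeof value !== rule.type) {
      errors.push(`${field} must be a ${rule.type}`);
      continue;
    }
    if (rule.pattern && !new RegExp(rule.pattern).test(value)) {
      errors.push(`${field} must match ${rule.pattern}`);
    }
    if (rule.enum && !rule.enum.includes(value)) {
      errors.push(`${field} must be one of ${rule.enum.join(", ")}`);
    }
  }
  return errors;
}

const accepted = validate(
  { phone: "12345", email: "student@mongodb.com", status: "Unknown" },
  schema
);
const rejected = validate({ email: "student@example.com", status: "Done" }, schema);

console.log(accepted); // []
console.log(rejected.length); // 3
```

The second document fails all three rules: it has no phone, its email does not end in @mongodb.com, and its status is not one of the allowed values.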
6. db.dropDatabase(<writeConcern>)
The dropDatabase method removes the current database and the associated data files.
For example -
We use the use <database> operation to switch the current database to the temp database,
and then use the db.dropDatabase() method to drop it.
use temp
db.dropDatabase()
CASSANDRA:
Apache Cassandra is a highly scalable, high-performance, distributed NoSQL database.
Cassandra is designed to handle huge amounts of data across many commodity servers,
providing high availability without a single point of failure.
Cassandra has a distributed architecture that is capable of handling a huge amount of data.
Data is placed on different machines with a replication factor greater than one to attain high
availability without a single point of failure.
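The role of the replication factor can be sketched with a toy ring of nodes: each key is hashed to a starting node and copied to the next nodes on the ring, so the data stays readable while at least one replica is up. This is a simplified assumption-based model; the node names, the simpleHash function, and the helpers are hypothetical, and real Cassandra uses its own partitioner and token ranges rather than this scheme.

```javascript
// A toy cluster arranged as a ring.
const ring = ["node-1", "node-2", "node-3", "node-4"];

// Small deterministic string hash, used only for this illustration.
function simpleHash(key) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.codePointAt(0)) % 1000003;
  }
  return hash;
}

// The key's replicas: the node it hashes to, plus the next nodes on the ring.
function replicasFor(key, replicationFactor) {
  const start = simpleHash(key) % ring.length;
  const replicas = [];
  for (let i = 0; i < replicationFactor; i++) {
    replicas.push(ring[(start + i) % ring.length]);
  }
  return replicas;
}

// The key is readable as long as at least one of its replicas is up.
function isReadable(key, replicationFactor, downNodes) {
  return replicasFor(key, replicationFactor).some((node) => !downNodes.has(node));
}

const key = "student:7";
const replicas = replicasFor(key, 3);
const failed = new Set([replicas[0]]); // the first replica goes down

console.log(replicas.length); // 3
console.log(isReadable(key, 3, failed)); // true: two other copies remain
console.log(isReadable(key, 1, failed)); // false: the only copy is down
```

With a replication factor of one, the loss of a single machine makes the key unavailable, which is exactly the single point of failure that a higher replication factor avoids.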
Cassandra vs MongoDB
o Development simplicity
o Operational simplicity
No. | Cassandra | MongoDB
3) | Cassandra stores data in tabular form, like SQL. | MongoDB stores data in JSON format.
4) | Cassandra is licensed by Apache. | MongoDB was licensed under the AGPL, with drivers under the Apache license (versions released since October 2018 use the SSPL).
5) | Cassandra is mainly designed to handle large amounts of data across many commodity servers. | MongoDB is designed to deal with JSON-like documents and to make access from applications easier and faster.
6) | Cassandra provides high availability with no single point of failure. | MongoDB is easy to administer in the case of failure.
Key Points of MongoDB:
o MongoDB is well suited for big data and mobile & social infrastructure.
o MongoDB provides replication, high availability, and auto-sharding.
o MongoDB is used by companies like Foursquare, Intuit, Shutterfly, SourceForge, The
New York Times, LexisNexis, Orange Digital, etc.