Unit-5 Notes

The document discusses NoSQL databases, including key features and types. It describes document databases, key-value stores, wide column stores, and graph databases. It also covers the CAP theorem and how MongoDB and Cassandra relate to consistency and availability.

Uploaded by

Shyam

UNIT-V- NoSQL DATABASE

Introduction to NoSQL – CAP Theorem – Data Models – Key-Value Databases – Document
Databases – Column Family Stores – Graph Databases – Working of NoSQL Using
MongoDB/Cassandra.

NoSQL DATABASE:

o The term NoSQL refers to a non-SQL or non-relational database.

o It provides a mechanism for storing and retrieving data that is modeled in ways other
than the tabular relations used in relational databases. Most NoSQL databases do not store
data in fixed relational tables. NoSQL is generally used for big data and real-time web
applications.

Types of NoSQL databases:

Over time, four major types of NoSQL databases emerged: document databases, key-value
databases, wide-column stores, and graph databases.

 Document databases store data in documents similar to JSON (JavaScript Object
Notation) objects. Each document contains pairs of fields and values. The values can
typically be of a variety of types, including strings, numbers, booleans, arrays,
and objects.
 Key-value databases are a simpler type of database where each item consists of a key
and a value.
 Wide-column stores store data in tables, rows, and dynamic columns.
 Graph databases store data in nodes and edges. Nodes typically store information
about people, places, and things, while edges store information about the relationships
between the nodes.

Advantages of NoSQL:

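o Many NoSQL databases provide their own query language or API (for example, MongoDB's
query language and Cassandra's CQL).
o It can provide fast reads and writes for simple access patterns.
o It provides horizontal scalability.

CAP THEOREM: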

The CAP theorem states that a distributed system can deliver only two of three desired
characteristics: consistency, availability, and partition tolerance (the 'C', 'A' and 'P' in
CAP).

A distributed system is a network that stores data on more than one node (physical or virtual
machines) at the same time. Because most cloud applications are distributed systems, it's
essential to understand the CAP theorem when designing a cloud app so that you can choose
a data management system that delivers the characteristics your application needs most.

The CAP theorem is also called Brewer's Theorem, because it was first advanced by
Professor Eric A. Brewer during a talk he gave on distributed computing in 2000. Two years
later, Seth Gilbert and Nancy Lynch of MIT published a proof of "Brewer's Conjecture."

The three distributed system characteristics to which the CAP theorem refers are:

Consistency:

Consistency means that all clients see the same data at the same time, no matter which node
they connect to. For this to happen, whenever data is written to one node, it must be instantly
forwarded or replicated to all the other nodes in the system before the write is deemed
'successful.'

Availability:

Availability means that any client making a request for data gets a response, even if one or
more nodes are down. Another way to state this: all working nodes in the distributed system
return a valid response for any request, without exception.

Partition tolerance:

A partition is a communications break within a distributed system: a lost or temporarily
delayed connection between two nodes. Partition tolerance means that the cluster must
continue to work despite any number of communication breakdowns between nodes in the
system.

CAP theorem NoSQL database types:

NoSQL databases are ideal for distributed network applications. Unlike their vertically
scalable SQL (relational) counterparts, NoSQL databases are horizontally scalable and
distributed by design: they can rapidly scale across a growing network consisting of multiple
interconnected nodes.

NoSQL databases are classified based on the two CAP characteristics they support:

 CP database: A CP database delivers consistency and partition tolerance at the
expense of availability. When a partition occurs between any two nodes, the system
has to shut down the non-consistent node (i.e., make it unavailable) until the partition
is resolved.

 AP database: An AP database delivers availability and partition tolerance at the
expense of consistency. When a partition occurs, all nodes remain available but those
at the wrong end of a partition might return an older version of data than others.
(When the partition is resolved, AP databases typically resync the nodes to repair
all inconsistencies in the system.)

 CA database: A CA database delivers consistency and availability across all nodes. It
can't do this if there is a partition between any two nodes in the system, however, and
therefore can't deliver fault tolerance.
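
The difference between a CP and an AP database during a partition can be sketched with a toy model. This is only an illustration under simplifying assumptions: two replicas of a single value, and hypothetical names (Replica, write, the "CP"/"AP" mode strings) that do not come from any real database.

```javascript
// Toy model of two replicas of one value; all names here are illustrative.
class Replica {
  constructor(name) {
    this.name = name;
    this.value = "v1"; // both replicas start in sync
  }
}

// Handle a write arriving at `local` while `peer` may be unreachable.
// mode "CP": refuse the write if it cannot be replicated (consistent, less available).
// mode "AP": accept the write locally and let the replicas diverge (available, less consistent).
function write(local, peer, value, partitioned, mode) {
  if (!partitioned) {
    local.value = value;
    peer.value = value;
    return "ok";
  }
  if (mode === "CP") {
    return "rejected";
  }
  local.value = value;
  return "ok (peer stale)";
}

const cpA = new Replica("A"), cpB = new Replica("B");
const cpResult = write(cpA, cpB, "v2", true, "CP");

const apA = new Replica("A"), apB = new Replica("B");
const apResult = write(apA, apB, "v2", true, "AP");

console.log(cpResult, cpA.value, cpB.value); // rejected v1 v1
console.log(apResult, apA.value, apB.value); // ok (peer stale) v2 v1
```

In CP mode both replicas still agree but the client was turned away; in AP mode the client was served but a read from replica B would return the older value until the nodes resync.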

MongoDB and the CAP theorem

MongoDB is a popular NoSQL database management system that stores data as BSON
(binary JSON) documents. It's frequently used for big data and real-time applications running
at multiple different locations. Relative to the CAP theorem, MongoDB is a CP data store: it
resolves network partitions by maintaining consistency, while compromising on availability.

MongoDB is a single-master system: each replica set can have only one primary node that
receives all the write operations. All other nodes in the same replica set are secondary
nodes that replicate the primary node's operation log and apply it to their own data set.
By default, clients also read from the primary node, but they can specify a read preference
that allows them to read from secondary nodes.

When the primary node becomes unavailable, the secondary node with the most recent
operation log will be elected as the new primary node. Once all the other secondary nodes
catch up with the new master, the cluster becomes available again. As clients can't make any
write requests during this interval, the data remains consistent across the entire network.
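
The rule "the secondary with the most recent operation log becomes the new primary" can be sketched as below. This is a deliberately simplified model, not MongoDB's actual election protocol (real elections also involve majority voting and member priorities); the member objects and the electPrimary function are hypothetical.

```javascript
// Simplified model: each member records the timestamp of its latest oplog entry.
// Host names and timestamps are made up for illustration.
const members = [
  { host: "node1", role: "primary", healthy: false, lastOplogTs: 120 },
  { host: "node2", role: "secondary", healthy: true, lastOplogTs: 118 },
  { host: "node3", role: "secondary", healthy: true, lastOplogTs: 120 },
];

// Pick the healthy secondary whose oplog is most up to date.
function electPrimary(replicaSet) {
  const candidates = replicaSet.filter(m => m.healthy && m.role === "secondary");
  if (candidates.length === 0) return null; // nobody can take over
  return candidates.reduce((best, m) => (m.lastOplogTs > best.lastOplogTs ? m : best));
}

const newPrimary = electPrimary(members);
console.log(newPrimary.host); // node3
```

Until such a choice has been made, no node accepts writes, which is the loss of availability that makes this design CP.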

Cassandra and the CAP theorem (AP)

Apache Cassandra is an open source NoSQL database maintained by the Apache Software
Foundation. It's a wide-column database that lets you store data on a distributed network.
However, unlike MongoDB, Cassandra has a masterless architecture, and as a result it has
no single point of failure.

Relative to the CAP theorem, Cassandra is an AP database: it delivers availability and
partition tolerance but can't deliver consistency all the time. Because Cassandra doesn't have
a master node, every node can accept requests, and the cluster stays available as long as
some nodes are reachable. Cassandra provides eventual consistency by allowing clients to
write to any node at any time and reconciling inconsistencies as quickly as possible.

Data becomes inconsistent mainly in the case of a network partition, and Cassandra offers
"repair" functionality to help nodes catch up with their peers afterwards. The constant
availability results in a highly performant system that might be worth the trade-off in
many cases.
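
Reconciling two replicas that diverged during a partition can be sketched with a "last write wins" rule, where the value with the newer timestamp is kept. This is a minimal sketch under that assumption; the replica contents and the reconcile function are hypothetical, and a real repair process is considerably more involved.

```javascript
// Each replica stores, per key, a value plus the timestamp of the write.
// The data is made up for illustration.
const replicaA = { cart: { value: ["book"], ts: 100 } };
const replicaB = {
  cart: { value: ["book", "pen"], ts: 105 },
  theme: { value: "dark", ts: 90 },
};

// Merge two replicas: for every key keep the entry with the newer timestamp.
function reconcile(a, b) {
  const merged = {};
  for (const key of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const left = a[key];
    const right = b[key];
    if (!left) merged[key] = right;
    else if (!right) merged[key] = left;
    else merged[key] = right.ts > left.ts ? right : left;
  }
  return merged;
}

const repaired = reconcile(replicaA, replicaB);
console.log(repaired.cart.ts, repaired.theme.value); // 105 dark
```

After the merge both replicas can adopt `repaired`, so they converge on the same data even though they accepted different writes in the meantime.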
DATA MODELS IN NoSQL:

NoSQL data models can be divided into four main types:

 Document Stores
 Key-Value Stores
 Graph Databases
 Column Stores

Each type has its own unique strengths and weaknesses and is best suited to certain
types of applications or use cases. Here's a brief overview of each type. The picture below
represents these four different kinds of NoSQL data model.

DOCUMENT DATA MODEL:

A Document Data Model differs from other data models because it stores data in JSON,
BSON, or XML documents. In this data model, documents can be nested inside other
documents, and any particular element can be indexed to run queries faster. Documents
are often stored and retrieved in a form that is close to the data objects used in
applications, which means very little translation is required to use the data in
applications. JSON is a native format that is often used both to store and to query data.
In the document data model, each document consists of key-value pairs. Below is an
example:
{
    "Name" : "Yashodhra",
    "Address" : "Near Patel Nagar",
    "Email" : "[email protected]",
    "Contact" : "12345"
}
Working of Document Data Model:
This is a semi-structured data model in which a record and the data associated with it
are stored together in a single document, which means this data model is not completely
unstructured. The main point is that the data is stored in a document.
Features:

 Document Type Model: Data is stored in documents rather than tables or graphs, so it
is easy to map to objects in many programming languages.
 Flexible Schema: The schema is very flexible; not all documents in a collection need
to have the same fields.
 Distributed and Resilient: Documents are self-contained units, which makes it easy to
distribute data across servers and scale horizontally.
 Manageable Query Language: The query language allows developers to perform CRUD
(Create, Read, Update, Delete) operations on the data model.
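
The flexible schema and the CRUD operations listed above can be sketched with plain objects. This is only an in-memory illustration, not a real document database; the collection name, the field values, and the example e-mail address are hypothetical.

```javascript
// A "collection" is modeled as an array of plain objects (documents).
const students = [];

// Create: documents in the same collection may have different fields,
// and a document may nest another document (address).
students.push({ _id: 1, name: "Yashodhra", contact: "12345" });
students.push({ _id: 2, name: "Tanmay", address: { city: "Pune", pin: "411001" } });

// Read: query on a nested field.
const found = students.filter(d => d.address && d.address.city === "Pune");

// Update: add a new field to one document only.
const target = students.find(d => d._id === 1);
target.email = "yashodhra@example.com";

// Delete: remove a document by its _id.
const index = students.findIndex(d => d._id === 2);
students.splice(index, 1);

console.log(found.length, students.length); // 1 1
```

No schema change was needed to give one document an address and another an e-mail field, which is the practical meaning of "flexible schema".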

Examples of Document Data Models :

 Amazon DocumentDB
 MongoDB
 Cosmos DB
 ArangoDB
 Couchbase Server
 CouchDB

Advantages:
 Schema-less: Document stores are good at retaining existing data at massive volumes
because they place few restrictions on the format and the structure of stored data.
 Faster creation of document and maintenance: It is very simple to create a document,
and it requires very little maintenance afterwards.
 Open formats: Documents are built with simple, open formats such as XML, JSON, and
their variants.
 Built-in versioning: Some document stores (for example CouchDB, with its document
revisions) have built-in versioning. As documents grow in size they can also grow in
complexity, and versioning decreases conflicts.

Disadvantages:
 Weak Atomicity: Many document stores lack, or only recently added, support for
multi-document ACID transactions (MongoDB added them in version 4.0). Without them, a
change involving two collections requires two separate queries, one for each
collection, which breaks atomicity requirements.
 Consistency Check Limitations: References between collections are not enforced. For
example, one can find book documents that are not connected to any document in an
author collection, but running such checks might hurt database performance.
 Security: Nowadays many web applications lack security, which in turn results in the
leakage of sensitive data. This is a point of concern, and one must pay attention to
web app vulnerabilities.

Applications of Document Data Model :
 Content Management: These data models are widely used for video streaming
platforms, blogs, and similar services, because each item is stored as a single
document and the database is easier to maintain as the service evolves over time.
 Book Database: These are very useful for book databases because this data model
lets us nest data, for example chapters or authors inside a book document.
 Catalog: These data models are widely used for storing and reading catalog files
because reads remain fast even when catalogs have thousands of attributes stored.
 Analytics Platform: These data models are also widely used in analytics platforms.

KEY-VALUE (KV) DATA MODEL:

A key-value data model or database is also referred to as a key-value store. It is a
non-relational type of database. An associative array is used as the basic data structure,
in which each individual key is linked with just one value in a collection. Keys are unique
identifiers for the values, and the value can be any kind of entity. A collection of
key-value pairs stored as separate records is called a key-value database, and it has no
predefined structure.

How do key-value databases work?

A key-value database associates a value, which can be anything from a simple string to a
complicated entity, with a key that is used to keep track of the entity. A key-value
database resembles the map, associative array, or dictionary found in many programming
languages, except that it is stored persistently and managed by a DBMS.
The key-value store uses an efficient and compact index structure so that it can find a
value quickly and reliably using its key. For example, Redis is a key-value store used to
track lists, maps (hashes), sets, and primitive types (which are simple data structures)
in a persistent database. By supporting only a predetermined number of value types, Redis
can expose a very simple interface for querying and manipulating values and, when
configured appropriately, is capable of high throughput.
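
The "map or dictionary" view of a key-value database can be sketched as a thin wrapper around a JavaScript Map. This is an in-memory illustration only (nothing is persisted), and the KeyValueStore class and the key names are hypothetical.

```javascript
// Minimal in-memory key-value store: put, get and delete by key only.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    // Return null when the key is absent.
    return this.data.has(key) ? this.data.get(key) : null;
  }
  delete(key) {
    return this.data.delete(key);
  }
}

const store = new KeyValueStore();
// Values can be of any type: an object for a session, a number for a counter.
store.put("session:42", { user: "alice", cart: ["book"] });
store.put("counter:visits", 7);

const session = store.get("session:42");
store.delete("counter:visits");
const missing = store.get("counter:visits");

console.log(session.user, missing); // alice null
```

Every operation goes through the key; there is no way to ask "which sessions contain a book" without reading every value, which is the main limitation of this model.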

When to use a key-value database:

Here are a few situations in which you can use a key-value database:-
 Storing user session attributes in an online app such as a finance or gaming app,
which requires real-time random data access.
 As a caching mechanism for frequently accessed data, or for any key-based design.
 When the application is built around queries that are based on keys.

Features:

 One of the simplest kinds of NoSQL data models.
 For storing, getting, and removing data, key-value databases use simple functions.
 Key-value databases have no query language.
 Built-in redundancy makes this database more reliable.

Advantages:

 It is very easy to use. Due to the simplicity of the database, values can be of any
type, or even of different types when required.
 Its response time is fast due to its simplicity, provided that the surrounding
environment is well built and optimized.
 Key-value store databases are scalable vertically as well as horizontally.
 Built-in redundancy makes this database more reliable.

Disadvantages:

 As key-value databases have no query language, queries cannot be ported from one
database to a different database.
 The key-value store database is not refined. You cannot query the database without a
key.

Some examples of key-value databases:

Here are some popular key-value databases which are widely used:
 Couchbase: It permits SQL-style querying and text search.
 Amazon DynamoDB: One of the most widely used key-value databases. It can handle a
large number of requests every day and it also provides various security options.
 Riak: A distributed key-value database used to develop applications.
 Aerospike: It is an open-source, real-time database that works with billions of
transactions.
 Berkeley DB: It is a high-performance and open-source database providing scalability.

GRAPH DATA MODEL:

The Graph-Based Data Model in NoSQL is a type of data model that focuses on the
relationships between data elements. Each element is stored as a node, and the
associations between these elements are known as links (edges). Associations are stored
directly because they are first-class elements of the data model. These data models give
us a conceptual view of the data.
These data models are based on a network (graph) structure. Graph theory has terms like
nodes, edges, and properties; let's see what they mean in the Graph-Based data model.
 Nodes: These are the instances of data that represent the objects to be tracked.
 Edges: Edges represent relationships between nodes.
 Properties: They represent information associated with nodes.
The image below represents nodes with properties, connected by relationships that are
represented by edges.

Working of Graph Data Model :

In these data models, connected nodes are linked directly in storage, and the connection
itself is also stored as a piece of data. Connecting data in this way makes it easy to
query a relationship: the database reads the relationship directly from storage instead
of computing it at query time, for example through joins. Like many other NoSQL
databases, these data models usually have no fixed schema, which keeps the model flexible
and easy to edit.
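
Reading relationships directly from stored edges can be sketched with an adjacency list and a breadth-first traversal. This is a toy in-memory model, not the storage format of any real graph database; the node names, the FRIEND relationship, and the reachable function are hypothetical.

```javascript
// Nodes with properties, and edges stored per node as first-class data.
const nodes = {
  alice: { type: "person", city: "Delhi" },
  bob: { type: "person", city: "Pune" },
  carol: { type: "person", city: "Mumbai" },
};
const edges = {
  alice: [{ to: "bob", rel: "FRIEND" }],
  bob: [{ to: "carol", rel: "FRIEND" }],
  carol: [],
};

// Follow stored edges outward from `start` for at most `maxHops` steps.
function reachable(start, maxHops) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next = [];
    for (const id of frontier) {
      for (const edge of edges[id] || []) {
        if (!seen.has(edge.to)) {
          seen.add(edge.to);
          next.push(edge.to);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return [...seen];
}

const friends = reachable("alice", 1);
const friendsOfFriends = reachable("alice", 2);
console.log(friends, friendsOfFriends);
```

A "friends of friends" question is answered by walking the edges of each visited node, with no table scan or join.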
Examples of Graph Data Models :

 JanusGraph: An open-source, scalable graph database system that is very helpful in
big data analytics. JanusGraph has features like:
     Storage: Several back ends are available for storing graph data, such as
    Cassandra.
     Support for transactions: It supports ACID (Atomicity, Consistency, Isolation,
    and Durability) transactions and can handle thousands of concurrent users.
     Searching options: Complex search options are available, with optional support
    for external search indexes.
 Neo4j: The name is sometimes expanded as "Network Exploration and Optimization 4
Java". This graph database is written in Java with native graph storage and processing.
Neo4j has features like:
     Scalable: Scalable through partitioning data into pieces known as shards.
     Higher Availability: Availability is high due to continuous backups and rolling
    upgrades.
     Query Language: Uses the programmer-friendly Cypher graph query language.
 DGraph: An open-source distributed graph database system designed for scalability.
Its main features are:
     Query Language: It uses a query language based on GraphQL, which was originally
    designed for APIs.
     Open-source system: Support for many open standards.

Advantages of Graph Data Model :
 Structure: The structures are very agile and workable too.
 Explicit Representation: The portrayal of relationships between entities is explicit.
 Real-time O/P Results: Query gives us real-time output results.

Disadvantages of Graph Data Model :

 No standard query language: The query language depends on the platform that is used,
so there is no single standard query language.
 Poor fit for transactional systems: Graph databases are often not well suited to
transaction-based systems.
 Small User Base: The user base is small, which makes it difficult to get support
when running into a problem.

Applications of Graph Data Model:

 Graph data models are widely used in fraud detection, where relationships between
accounts and transactions matter.
 It is used in Digital asset management which provides a scalable database model to keep
track of digital assets.
 It is used in Network management which alerts a network administrator about problems
in a network.
 It is used in Context-aware services by giving traffic updates and many more.
 It is used in Real-Time Recommendation Engines which provide a better user
experience.
COLUMNAR OR COLUMN-FAMILY DATA MODEL:

A relational database stores data in rows and reads it row by row, whereas a column
store is organized as a set of columns. So if someone wants to run analytics on a small
number of columns, those columns can be read directly without loading unwanted data into
memory. Values in a column are usually of the same type and therefore compress more
efficiently, which makes reads faster. Examples of the Columnar Data Model: Cassandra and
Apache HBase.

Working of Columnar Data Model:

In the Columnar Data Model, information is organized into columns instead of rows. The
columns still function much like the columns of tables in relational databases. This type
of data model is more flexible because it is a type of NoSQL database. The example below
will help in understanding the Columnar data model:
Row-Oriented Table:

S.No.  Name       Course  Branch       ID
01.    Tanmay     B-Tech  Computer     2
02.    Abhishek   B-Tech  Electronics  5
03.    Samriddha  B-Tech  IT           7
04.    Aditi      B-Tech  E & TC       8

Column-Oriented Table:

S.No.  Name       ID
01.    Tanmay     2
02.    Abhishek   5
03.    Samriddha  7
04.    Aditi      8

S.No.  Course  ID
01.    B-Tech  2
02.    B-Tech  5
03.    B-Tech  7
04.    B-Tech  8

S.No.  Branch       ID
01.    Computer     2
02.    Electronics  5
03.    IT           7
04.    E & TC       8

Columnar Data Model uses the concept of keyspace, which is like a schema in relational
models.
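
The row-oriented and column-oriented layouts of the student table above can be sketched in code. This is an in-memory illustration of the idea only; the toColumns function is hypothetical and does not reflect how any particular column store lays out data on disk.

```javascript
// Row-oriented layout: one object per row, all fields together.
const rows = [
  { sno: 1, name: "Tanmay", course: "B-Tech", branch: "Computer", id: 2 },
  { sno: 2, name: "Abhishek", course: "B-Tech", branch: "Electronics", id: 5 },
  { sno: 3, name: "Samriddha", course: "B-Tech", branch: "IT", id: 7 },
  { sno: 4, name: "Aditi", course: "B-Tech", branch: "E & TC", id: 8 },
];

// Column-oriented layout: one array per column.
function toColumns(table) {
  const columns = {};
  for (const row of table) {
    for (const [field, value] of Object.entries(row)) {
      (columns[field] = columns[field] || []).push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregate over one column touches only that column's array.
const idSum = columns.id.reduce((sum, v) => sum + v, 0);
console.log(columns.branch, idSum); // branch values, then 22
```

Summing the ID column reads four numbers and nothing else, whereas the row layout would have to walk through every field of every row to reach the same values.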
Advantages of Columnar Data Model :
 Well structured: These data models compress well, so data is stored in a compact and
well-organized way.
 Flexibility: There is a large amount of flexibility because columns do not have to
look like each other, which means one can add new and different columns without
disrupting the whole database.
 Aggregation queries are fast: Aggregation queries are quite fast because the values
they need are stored together in a column. An example would be adding up the total
number of students enrolled in one year.
 Scalability: It can be spread across large clusters of machines, even numbering in
thousands.
 Load Times: Bulk loading is fast; a table can often be loaded in a few seconds.

Disadvantages of Columnar Data Model:

 Designing an indexing schema: Designing an effective and working schema is difficult
and very time-consuming.
 Suboptimal data loading: Incremental data loading is suboptimal and should be
avoided, although this might not be an issue for some users.
 Security vulnerabilities: If security is one of the priorities, it must be known that
the Columnar data model may lack built-in security features; in this case, one should
look into relational databases.
 Online Transaction Processing (OLTP): OLTP applications are not a good fit for
columnar data models because of the way data is stored.

Applications of Columnar Data Model:

 Columnar Data Model is very much used in various Blogging Platforms.
 It is used in Content management systems like WordPress, Joomla, etc.
 It is used in Systems that maintain counters.
 It is used in Systems that require heavy write requests.
 It is used in Services that have expiring usage.

MONGODB:

 MongoDB is a NoSQL database. It is a cross-platform, document-oriented database
written in C++.
 MongoDB is a document database that provides high performance, high availability,
and automatic scaling.
 In simple words, MongoDB is a document-oriented database, developed and supported by
a company named 10gen (renamed MongoDB Inc. in 2013).
 MongoDB was originally available for free under the GNU AGPL; since October 2018 the
server is distributed under the Server Side Public License (SSPL). It is also available
under a commercial license from the vendor.
 10gen has defined MongoDB as:
 "MongoDB is a scalable, open source, high performance, document-oriented
database." - 10gen
 MongoDB was designed to work with commodity servers. It is now used by companies of
all sizes, across all industries.

Purpose of Building MongoDB:

The primary purposes of building MongoDB are:

o Scalability
o Performance
o High Availability
o Scaling from single server deployments to large, complex multi-site architectures.

Key points of MongoDB:

o Develop Faster
o Deploy Easier
o Scale Bigger

Features of MongoDB

These are some important features of MongoDB:

1. Support ad hoc queries

In MongoDB, you can search by field and by range, and it also supports regular expression
searches.
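
The three kinds of ad hoc query mentioned above (by field, by range, by regular expression) can be sketched over an in-memory array. The filters below run in plain JavaScript without a server; the comments show what the corresponding mongo shell queries would look like for a hypothetical students collection.

```javascript
// Sample documents; in MongoDB these would live in a collection.
const students = [
  { name: "Tanmay", id: 2 },
  { name: "Abhishek", id: 5 },
  { name: "Samriddha", id: 7 },
  { name: "Aditi", id: 8 },
];

// Search by field.   Shell form: db.students.find({ name: "Aditi" })
const byField = students.filter(s => s.name === "Aditi");

// Range query.       Shell form: db.students.find({ id: { $gte: 5, $lt: 8 } })
const byRange = students.filter(s => s.id >= 5 && s.id < 8);

// Regular expression. Shell form: db.students.find({ name: /^A/ })
const byRegex = students.filter(s => /^A/.test(s.name));

console.log(byField.length, byRange.length, byRegex.length); // 1 2 2
```

The queries are "ad hoc" in the sense that none of them had to be declared in advance; any field can be filtered on at query time.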

2. Indexing

You can index any field in a document.

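3. Replication

MongoDB supports primary-secondary replication through replica sets (older releases also
offered a master-slave mode, which was removed in version 4.0).

The primary can perform reads and writes; a secondary copies data from the primary and can
only be used for reads or backup (not writes).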

4. Duplication of data

MongoDB can run over multiple servers. The data is duplicated across them so that the
system stays up and keeps running in case of hardware failure.

5. Load balancing

MongoDB balances load automatically because data is split into shards that are
distributed across servers.
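
How placing data in shards spreads load can be sketched by hashing each key to a shard number. This is a minimal sketch of hash-based partitioning in general; the shardFor function and its simple string hash are assumptions for illustration and are not the hash function or balancing logic that MongoDB uses.

```javascript
// Map a key to one of `shardCount` shards using a simple string hash.
function shardFor(key, shardCount) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // keep as unsigned 32-bit
  }
  return hash % shardCount;
}

// Distribute nine hypothetical user keys over three shards.
const shards = [[], [], []];
for (let i = 1; i <= 9; i++) {
  const key = "user" + i;
  shards[shardFor(key, shards.length)].push(key);
}

console.log(shards.map(s => s.length));
```

Because the same key always hashes to the same shard, a read for a given key can be routed straight to the server that holds it, while different keys are served by different servers.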

6. Supports map reduce and aggregation tools.

7. Uses JavaScript instead of stored procedures.

8. It is a schema-less database written in C++.

9. Provides high performance.

10. Stores files of any size easily without complicating your stack.

11. Easy to administer in the case of failures.

12. It also supports:

o JSON data model with dynamic schemas
o Auto-sharding for horizontal scalability
o Built-in replication for high availability

Nowadays many companies use MongoDB to create new types of applications and to improve
performance and availability.

MongoDB Database commands

MongoDB database commands are used to create, modify, and administer databases.

1. db.adminCommand(cmd)

The adminCommand method is a helper that runs the specified database command against the
admin database.

cmd: The command is specified either in document form or as a string. If the command is
given as a string, it cannot include any arguments.

Example:

Creating a user named JavaTpoint with the dbOwner role on the admin database.

db.adminCommand(
  {
    createUser: "JavaTpoint",
    pwd: passwordPrompt(),   // prompt for the password instead of hard-coding it
    roles: [
      { role: "dbOwner", db: "admin" }
    ]
  }
)

2. db.aggregate()

The aggregate method runs a specific diagnostic or admin pipeline, which does not
require any underlying collection.

Syntax:

db.aggregate( [ <pipeline> ], { <options> } )

The pipeline parameter does not require any underlying collection and always starts with a
compatible stage, such as $currentOp or $listLocalSessions. It is an array of stages that will
be executed.

Example:

The following example runs a pipeline with two stages. The first is the $currentOp stage
and the second filters the results.

use admin
db.aggregate( [
  { $currentOp : { allUsers: true, idleConnections: true } },   // stage 1: list current operations
  { $match : { shard: "shardDemo" } }                           // stage 2: keep results from shard "shardDemo"
] )

3. db.cloneDatabase("hostname")

The cloneDatabase method copies a database from a remote host to the current database and
assumes that the database at the remote location has the same name as the current
database. This method was deprecated in MongoDB 4.0 and removed in version 4.2, so it is
available only in older releases.

The hostname parameter contains the hostname of the server that holds the database we
want to copy.

Example:

db.cloneDatabase("customers")

4. db.commandHelp(command)

We have the help option for the specified database command using the commandHelp
method. The command parameter contains the name of a database command.

5. db.createCollection(name, options)

A new collection or view will be created using this method. The createCollection method is
used primarily for creating new collections that use specific options; otherwise a
collection is created automatically when it is first referenced in a command.

For example - we will create a student collection with a JSON Schema validator:

db.createCollection( "student", {
  validator: { $jsonSchema: {
    bsonType: "object",
    required: [ "phone" ],
    properties: {
      phone: {
        bsonType: "string",
        description: "must be a string and is required"
      },
      email: {
        bsonType : "string",
        // the backslash is doubled so the regex receives a literal "\."
        pattern: "@mongodb\\.com$",
        description: "must be a string and match the regular expression pattern"
      },
      status: {
        enum: [ "Unknown", "Incomplete" ],
        description: "can only be one of the enum values"
      }
    }
  }}
})

6. db.dropDatabase(<writeConcern>)

The dropDatabase method removes the current database and the associated data files.

For example -

We use the use <database> operation to switch the current database to the temp database.
We then use the db.dropDatabase() method to drop the temp database.

use temp
db.dropDatabase()

CASSANDRA:

Apache Cassandra is a highly scalable, high-performance, distributed NoSQL database.
Cassandra is designed to handle huge amounts of data across many commodity servers,
providing high availability without a single point of failure.
Its distributed architecture places data on different machines with a replication factor
greater than one, so that every piece of data has copies on several nodes and stays
available without a single point of failure.
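
Placing each piece of data on several machines according to a replication factor can be sketched with nodes arranged in a ring. This is a simplified sketch: the node names, the replicasFor function, and the string hash are hypothetical, and real Cassandra clusters use token ranges and a dedicated partitioner rather than this toy hash.

```javascript
// Nodes arranged in a ring; a key's replicas are the owner node and its successors.
const ring = ["node1", "node2", "node3", "node4"];

function replicasFor(key, replicationFactor) {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.codePointAt(0)) >>> 0; // simple illustrative hash
  }
  const start = hash % ring.length;
  const owners = [];
  for (let i = 0; i < replicationFactor; i++) {
    owners.push(ring[(start + i) % ring.length]);
  }
  return owners;
}

// With a replication factor of 3, the key lives on three different nodes.
const owners = replicasFor("student:7", 3);

// If the first replica fails, the others can still serve the data.
const down = owners[0];
const available = owners.filter(n => n !== down);
console.log(owners, available.length); // three node names, then 2
```

As long as the replication factor is greater than one, losing a single machine leaves at least one other copy of every key reachable.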

Important Points of Cassandra:

o Cassandra is a column-oriented (wide-column) database.
o Cassandra is scalable, fault-tolerant, and offers tunable consistency.
o Cassandra's distribution design is based on Amazon's Dynamo and its data model on
Google's Bigtable.
o Cassandra was created at Facebook. It is very different from relational database
management systems.
o Cassandra follows a Dynamo-style replication model with no single point of failure,
but adds a more powerful "column family" data model.
o Cassandra is being used by some of the biggest companies like Facebook, Twitter,
Cisco, Rackspace, eBay, Netflix, and more.

Cassandra vs MongoDB

Cassandra and MongoDB are both NoSQL databases. Cassandra is a distributed database
system designed to handle large amounts of data and known for its high scalability and
high performance. MongoDB is a document-oriented database which also provides high
scalability, high performance and automatic scaling.

In terms of simplicity, two aspects can be distinguished:

o Development simplicity
o Operational simplicity

While MongoDB is known for an easy out-of-the-box experience, Cassandra is known for
being easy to manage at scale.

Following is a list of important differences between them:

Index | Cassandra | MongoDB

1) | Cassandra is a high-performance distributed database system. | MongoDB is a cross-platform document-oriented database system.

2) | Cassandra is written in Java. | MongoDB is written in C++.

3) | Cassandra stores data in a tabular form similar to SQL tables. | MongoDB stores data in a JSON-like (BSON) format.

4) | Cassandra is licensed under the Apache License. | MongoDB was licensed under the AGPL (the server moved to the SSPL in 2018), with drivers under the Apache License.

5) | Cassandra is mainly designed to handle large amounts of data across many commodity servers. | MongoDB is designed to deal with JSON-like documents and to make data access from applications easier and faster.

6) | Cassandra provides high availability with no single point of failure. | MongoDB is easy to administer in the case of failure.

Key Points of Apache Cassandra:

o Cassandra is a highly scalable, high-performance, fault-tolerant database system
with tunable consistency. Cassandra is a column-oriented database.
o Cassandra provides easy data distribution.
o Cassandra does not offer full ACID (Atomicity, Consistency, Isolation, and
Durability) transactions like a relational database; it provides atomic, isolated, and
durable writes at the row (partition) level with eventual or tunable consistency.
o Cassandra follows the distribution design of Amazon's Dynamo and its data model
design is based on Google's Bigtable.
o Cassandra was initially created at Facebook for inbox search and now it is being used
by some of the biggest companies like Facebook, Twitter, eBay, Netflix, Cisco,
Rackspace etc.

Key Points of MongoDB:

o MongoDB is well suited for big data and mobile & social infrastructure.
o MongoDB provides replication, high availability and auto-sharding.
o MongoDB is used by companies like Foursquare, Intuit, Shutterfly, SourceForge, The
New York Times, LexisNexis, Orange Digital etc.

