What Is NoSQL

Uploaded by

Sayli Gawde

What is NoSQL?

NoSQL stands for 'Not Only SQL'. These databases are non-relational (non-tabular). Most NoSQL databases do not
require a fixed, pre-defined schema and are therefore described as schema-less. Document stores keep each entry as a
JSON-like document, while other NoSQL systems use key-value pairs, wide columns, or graphs.
Examples − MongoDB, DynamoDB, Redis, etc.

When to Use NoSQL?

NoSQL databases are flexible and developer-friendly, and they can deliver higher performance for workloads that
suit their data model. They are usually the better option when −

• You have Big Data applications that handle large volumes of unstructured data.

• You need to scale the database as requirements keep changing.

• You need the flexibility to store different types of data.

Differences between SQL and NoSQL

The following table highlights the major differences between SQL and NoSQL −

Type
  SQL: SQL databases are classified as relational databases, i.e., RDBMS.
  NoSQL: NoSQL databases are known as non-relational or distributed databases.

Language
  SQL: SQL databases use the standard Structured Query Language, as the name suggests. SQL is an industry
  standard and a very powerful language for executing complex queries.
  NoSQL: NoSQL databases have a dynamic schema for unstructured data. Data can be stored as documents, columns,
  graphs, or key-value pairs, and the query syntax varies from database to database.

Scalability
  SQL: SQL databases are typically scaled vertically: capacity is extended on a single server by adding RAM,
  CPU, or SSD storage.
  NoSQL: NoSQL databases are horizontally scalable: capacity is increased by adding new servers alongside the
  existing ones, so they can handle more traffic. This makes them a good choice for large and constantly
  changing databases.

Schema
  SQL: SQL databases have a fixed, pre-defined schema, which makes data storage more rigid, static, and
  restrictive.
  NoSQL: NoSQL databases don't have a pre-defined schema, which makes them schema-less and more flexible.

Internal implementation
  SQL: SQL follows the ACID (Atomicity, Consistency, Isolation, and Durability) properties for its operations.
  NoSQL: NoSQL design is usually discussed in terms of the CAP theorem (Consistency, Availability, and Partition
  tolerance).

Data Storage
  SQL: SQL databases traditionally run on a single system, so they are not built around data distribution and
  are a weaker fit for hierarchical storage of data.
  NoSQL: NoSQL databases can run on multiple systems and hence support data distribution features such as
  replication and partitioning, which makes them a good option for hierarchical storage of data.

Type of Data
  SQL: SQL databases are table-based, which makes them better for multi-row transaction applications.
  NoSQL: NoSQL databases are document-based, key-value, or graph databases, which makes them better when the
  data changes a lot.

Performance and suitability
  SQL: SQL databases are best suited for complex queries but are not preferred for large hierarchical data
  storage.
  NoSQL: NoSQL databases are weaker at complex queries because their query facilities are not as powerful as
  SQL, but they are well suited for large hierarchical data storage.

Examples
  SQL: SQL databases come in both freely available and commercial implementations: Postgres, MySQL, and SQLite
  are freely available, while Oracle is commercial.
  NoSQL: NoSQL databases also come in both open-source and proprietary forms. Some famous implementations are
  MongoDB, BigTable, Redis, RavenDB, Cassandra, Hbase, Neo4j, and Cou

Characteristics of NoSQL Database

Different NoSQL databases work in different ways, but a few common features define a basic NoSQL database.

1. Complex-free working

NoSQL databases are often simpler to start with than SQL databases. They store data in an unstructured or
semi-structured form that requires no relational or tabular arrangement, which makes them approachable for most
developers.

2. Independent of Schema

Secondly, NoSQL databases are independent of schemas, which means they can run without any predetermined schema.

This makes them convenient for new programmers and for organizations handling large amounts of heterogeneous data
that does not fit a fixed schema.

3. Better Scalability

One of the most prominent features of such a database is its high scalability, which makes it suitable for large
amounts of data.

Data scientists often prefer NoSQL databases for this reason, since they can accommodate very large data sets
without a loss of efficiency.

4. Flexible to accommodate

Since such databases can accommodate heterogeneous data that requires no structuring, they are flexible in how
they can be used.

For beginners, NoSQL databases are easy to handle yet very useful.

5. Durable

NoSQL databases are described as durable here in the sense that they can accommodate data ranging from
heterogeneous to homogeneous.

They can hold structured data as well as unstructured data, and simple lookups need no SQL-style query language.
In the strict database sense, durability means that committed writes survive a crash; how strongly that is
guaranteed depends on the product and its configuration.

Types of NoSQL Databases

Having looked at the features of NoSQL databases, let us now go through the various NoSQL database types.

NoSQL databases can be divided into 4 types. They are as follows -

1. Document Database

As the name indicates, a document database stores data in the form of documents. Related data is grouped into a
single document, which makes it easy to locate when building application software.

One of the major benefits of a document database is that developers can store data in the same document format
they use in their application code.

It is a semi-structured and hierarchical NoSQL database that allows efficient storage of data. It works
especially well for user profiles or catalogs. A typical example is MongoDB.

2. Key-Value Database

The key-value database is the simplest type of NoSQL database. It stores data in a schema-less manner as
key-value pairs.

Each entry consists of a key and the value assigned to it. For instance, the key can be 'age' and the value '45'.

This associative pairing keeps the data organized and makes lookup by key simple. A typical example of this type
is Amazon's DynamoDB.
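
The key-value pairing described above can be sketched with a plain Python dictionary. This is only an illustration of associative pairing: the class name KeyValueStore and its methods are hypothetical and are not DynamoDB's API.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        self._data = {}  # each key maps to exactly one value

    def put(self, key, value):
        # Writing to an existing key overwrites the old value.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup is by key only; there is no query over values.
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("age", "45")  # the example pair from the text
```

Because every operation goes through the key, reads and writes stay simple, which is also why queries that need to look inside values are awkward in this model.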

"Hundreds of thousands of AWS customers have chosen DynamoDB as their key-value and document database for
mobile, web, gaming, ad tech, IoT, and other applications that need low-latency data access at any scale."- Amazon's
Dynamo

3. Column-oriented Database

Another type of NoSQL database is the column-oriented database. It stores data by column, so each column groups
values of the same kind.

This allows the user to read only the columns that are needed without retrieving unnecessary information.

For data analytics on social networking sites, for example, a column-oriented database works efficiently because
a query touches only the columns relevant to the results.

Since such databases hold large amounts of data, being able to skip unneeded information matters, and that is
exactly what the column-oriented layout provides. A typical example of a column-oriented NoSQL database
is Apache HBase.

4. Graph Database

The fourth type of NoSQL database is the graph database. Here, data is stored as a graph made up of nodes and the
edges that connect them.

Nodes represent data points and edges represent the relationships between them, so a network of connections is
established between the data points.

This way, one data point leads to the next by following edges, without the user having to look up each data point
separately. This type of database works well for applications whose data is highly connected.

An example of a graph NoSQL database is Amazon Neptune.

"Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run
applications that work with highly connected datasets."

Advantages of NoSQL: There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.

1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means partitioning the data
and placing the partitions on multiple machines in such a way that the order of the data is preserved. Vertical
scaling means adding more resources to the existing machine, whereas horizontal scaling means adding more
machines to handle the data. Vertical scaling eventually runs into the limits of a single machine, while
horizontal scaling can continue by adding machines. Examples of horizontally scaling databases are MongoDB,
Cassandra, etc. Because of this scalability NoSQL can handle a huge amount of data: as the data grows, the
database scales out to handle it efficiently.

2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data, which means
that they can accommodate dynamic changes to the data model. This makes NoSQL databases a good fit
for applications that need to handle changing data requirements.

3. High availability: The auto-replication feature in NoSQL databases makes them highly available, because after
a failure the data can be restored from a replica to the last consistent state.

4. Scalability: NoSQL databases are highly scalable, which means that they can handle large amounts of data
and traffic with ease. This makes them a good fit for applications that need to handle large amounts of data
or traffic.

5. Performance: NoSQL databases are designed to handle large amounts of data and traffic, which means that
they can offer improved performance compared to traditional relational databases.

6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational databases, as
they are typically less complex and do not require expensive hardware or software.

7. Agility: Ideal for agile development.
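
The sharding idea in point 1 can be sketched as range partitioning, where each machine owns a contiguous range of keys so that key order is preserved within and across shards. This is a minimal sketch: the split points, the shard names, and the helpers shard_for and place_records are hypothetical and do not correspond to any particular database's API.

```python
import bisect

# Hypothetical split points: keys < "g" go to shard-0,
# "g" <= key < "n" to shard-1, "n" <= key < "t" to shard-2, the rest to shard-3.
SPLIT_POINTS = ["g", "n", "t"]
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str) -> str:
    """Return the shard responsible for `key` under range partitioning."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


def place_records(keys):
    """Group keys by shard; keys on each shard stay in sorted order."""
    placement = {name: [] for name in SHARDS}
    for key in sorted(keys):
        placement[shard_for(key)].append(key)
    return placement


placement = place_records(["zed", "bob", "nina", "alice"])
```

Scaling out in this picture means adding another split point and another machine, which is the "adding more machines" half of the vertical versus horizontal comparison.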

Disadvantages of NoSQL: NoSQL has the following disadvantages.

1. Lack of standardization: There are many different types of NoSQL databases, each with its own unique
strengths and weaknesses. This lack of standardization can make it difficult to choose the right database for
a specific application.

2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means that they do not
guarantee the consistency, integrity, and durability of data to the same degree as a relational database. This
can be a drawback for applications that require strong data consistency guarantees.

3. Narrow focus: NoSQL databases have a narrow focus: they are designed mainly for storage and provide relatively
little other functionality. Relational databases are a better choice than NoSQL for transaction management.

4. Open-source: Many NoSQL databases are open source, and there is no reliable standard for NoSQL yet. In
other words, two database systems are likely to be unequal.

5. Lack of support for complex queries: Most NoSQL databases are not designed to handle complex queries, which
means that they are a weaker fit for applications that require complex data analysis or reporting.

6. Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational
databases. This can make them less reliable and less secure than traditional databases.

7. Management challenge: The purpose of big data tools is to make the management of a large amount of data
as simple as possible, but in practice it is not so easy. Data management in NoSQL is much more complex than in a
relational database. NoSQL, in particular, has a reputation for being challenging to install and even more
demanding to manage on a daily basis.

8. GUI is not available: GUI tools for accessing the database are not widely available in the market.

9. Backup: Backup is a weak point for some NoSQL databases. Taking a consistent backup of a large distributed
cluster needs more planning than backing up a single relational server. MongoDB, for example, ships tools such
as mongodump, but a consistent snapshot of a sharded cluster requires extra coordination.

10. Large document size: Some database systems like MongoDB and CouchDB store data in JSON format.
Documents can therefore be quite large, which costs storage, network bandwidth, and speed, and descriptive
key names make this worse because every document repeats them.
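
The effect of key names on document size can be measured directly. The sketch below serializes the same values twice, once with descriptive keys and once with short keys, and compares the byte counts. The records and the helper encoded_size are hypothetical, and plain JSON is used as a stand-in for whatever on-disk format a given database actually uses.

```python
import json

# The same hypothetical record, written with descriptive keys and with short keys.
descriptive = {
    "customer_first_name": "Asha",
    "customer_last_name": "Rao",
    "customer_age_in_years": 31,
}
short = {"fn": "Asha", "ln": "Rao", "age": 31}


def encoded_size(doc):
    """Size in bytes of the document serialized as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


descriptive_size = encoded_size(descriptive)
short_size = encoded_size(short)

# The values are identical, so the whole difference comes from the key names.
overhead_per_document = descriptive_size - short_size
```

The overhead is paid once per stored document, so it grows linearly with the size of the collection. Shorter keys trade readability for space, which is a design choice rather than a rule.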

What is MongoDB – Working and Features

MongoDB is an open-source, document-oriented database designed to store large volumes of data and to let you
work with that data efficiently. It is categorized as a NoSQL (Not only SQL) database because data is not
stored and retrieved in the form of tables.

MongoDB is developed and managed by MongoDB Inc. under the SSPL (Server Side Public License) and was
initially released in February 2009. It provides official driver support for popular languages such as
C, C++, C#/.NET, Go, Java, Node.js, Perl, PHP, Python, Ruby, Scala, and Swift, along with libraries such as
Motor (Python) and Mongoid (Ruby), so you can create an application in any of these languages. Many
companies, such as Facebook, Nokia, eBay, Adobe, and Google, have used MongoDB to store large amounts of
data.

How it works?

Now let us see what happens behind the scenes. MongoDB is a database server, and the data is stored in
databases on that server. In other words, the MongoDB environment gives you a server that you can start and
then create multiple databases on.
Because MongoDB is a NoSQL database, the data is stored in collections and documents. The database,
collections, and documents are related to each other as follows:
• A MongoDB database contains collections, just as a MySQL database contains tables. You can create
multiple databases and multiple collections.
• Inside a collection we have documents. These documents contain the data we want to store in the
MongoDB database. A single collection can contain multiple documents, and because collections are
schema-less, one document need not have the same structure as another.
• Documents are made up of fields. Fields are key-value pairs in the document, comparable to columns in a
relational database. The value of a field can be of any BSON data type, such as double, string,
boolean, etc.
• Data in MongoDB is stored as BSON documents. BSON stands for a binary representation of JSON
documents. In other words, on the backend the MongoDB server converts JSON data into a binary form
known as BSON, which can be stored and queried more efficiently.
• MongoDB documents can store nested data. Nesting lets you keep related data in the same document,
which can make reading that data more efficient than in SQL, where you would need joins to combine
the data from table 1 and table 2. The maximum size of a BSON document is 16 MB.
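
The difference between joining two tables and reading one nested document can be sketched in plain Python. The data and the two helper functions below are hypothetical; they only model the shape of the data, not MongoDB's or any SQL engine's actual behaviour.

```python
# Relational style: two "tables" linked by a key and combined with a manual join.
authors = [{"author_id": 1, "name": "Asha"}]
posts = [
    {"post_id": 10, "author_id": 1, "title": "Intro to NoSQL"},
    {"post_id": 11, "author_id": 1, "title": "BSON basics"},
]


def posts_for_author_joined(author_id):
    """Emulate a join: scan the posts table for rows matching the author."""
    return [p["title"] for p in posts if p["author_id"] == author_id]


# Document style: the posts are nested inside the author document.
author_doc = {
    "_id": 1,
    "name": "Asha",
    "posts": [
        {"title": "Intro to NoSQL"},
        {"title": "BSON basics"},
    ],
}


def posts_for_author_embedded(doc):
    """One document read returns the author together with the nested posts."""
    return [p["title"] for p in doc["posts"]]
```

Both functions return the same titles. Nesting pays off when the data is usually read together; data that is shared by many parents is often better kept separate.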

NOTE: In MongoDB server, you are allowed to run multiple databases.

For example, we have a database named GeeksforGeeks. Inside this database, we have two collections and
in these collections we have two documents. And in these documents we store our data in the form of
fields. As shown in the below image:

How is MongoDB different from an RDBMS?
Some major differences between MongoDB and an RDBMS are as follows:

MongoDB: It is a non-relational, document-oriented database.
RDBMS: It is a relational database.

MongoDB: It is suitable for hierarchical data storage.
RDBMS: It is not suitable for hierarchical data storage.

MongoDB: It has a dynamic schema.
RDBMS: It has a predefined schema.

MongoDB: It centers around the CAP theorem (Consistency, Availability, and Partition tolerance).
RDBMS: It centers around ACID properties (Atomicity, Consistency, Isolation, and Durability).

MongoDB: In terms of performance, it can be much faster than an RDBMS for workloads that fit the document model.
RDBMS: In terms of performance, it can be slower than MongoDB for such workloads.

Features of MongoDB –

• Schema-less Database: A schema-less database means one collection can hold different types of
documents. In other words, a single MongoDB collection can hold multiple documents, and these
documents may differ in their number of fields, content, and size. One document need not be similar to
another, unlike rows in a relational database. This gives MongoDB great flexibility.
• Document Oriented: In MongoDB, all data is stored in documents instead of tables as in an RDBMS.
Within a document, data is stored in fields (key-value pairs) instead of rows and columns, which makes the
data much more flexible than in an RDBMS. Each document contains a unique object id.
• Indexing: Every MongoDB collection has a primary index on the _id field, and secondary indexes can be
created on other fields. Indexes make it easier and faster to search the data. If a queried field is not
indexed, the database must check every document against the query, which takes a lot of time and is
inefficient.
• Scalability: MongoDB provides horizontal scalability with the help of sharding. Sharding means
distributing data across multiple servers: a large amount of data is partitioned into chunks using the
shard key, and these chunks are distributed evenly across shards that reside on many physical
servers. New machines can also be added to a running database.
• Replication: MongoDB provides high availability and redundancy with the help of replication. It keeps
multiple copies of the data on different servers so that if one server fails, the data can be retrieved
from another server.
• Aggregation: It allows you to perform operations on grouped data and get a single or computed
result, similar to the SQL GROUP BY clause. MongoDB provides three kinds of aggregation: the aggregation
pipeline, the map-reduce function, and single-purpose aggregation methods.
• High Performance: MongoDB offers high performance and data persistence compared to other
databases, owing to features such as scalability, indexing, and replication.
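
The benefit of indexing can be sketched by comparing a full scan with a lookup through a secondary index. This is a simplified model: the collection, the field names, and the helper functions are hypothetical, and a Python dictionary stands in for the index structure a real database would maintain.

```python
from collections import defaultdict

# Hypothetical collection; documents need not share the same fields.
collection = [
    {"_id": 1, "name": "Tanmay", "branch": "Computer"},
    {"_id": 2, "name": "Aditi", "branch": "E & TC"},
    {"_id": 3, "name": "Ravi"},  # no "branch" field at all
    {"_id": 4, "name": "Meera", "branch": "Computer"},
]


def find_by_scan(field, value):
    """Without an index: test every document (a full collection scan)."""
    return [d["_id"] for d in collection if d.get(field) == value]


def build_index(field):
    """Secondary index: field value -> list of _ids of matching documents."""
    index = defaultdict(list)
    for doc in collection:
        if field in doc:
            index[doc[field]].append(doc["_id"])
    return index


branch_index = build_index("branch")


def find_by_index(value):
    """With an index: one dictionary lookup instead of a scan."""
    return branch_index.get(value, [])
```

Both lookups return the same ids, but the scan touches every document while the indexed lookup touches only the matching entries. The cost is that the index must be kept up to date on every write.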

Advantages of MongoDB :

• It is a schema-less NoSQL database. You need not design the schema of the database before you start
working with MongoDB.
• It does not depend on join operations, since related data can be stored together in one document.
• It provides great flexibility to the fields in the documents.
• It can hold heterogeneous data.
• It provides high performance, availability, and scalability.
• It supports geospatial data efficiently.
• It is a document-oriented database and the data is stored in BSON documents.
• It also supports multi-document ACID transactions (starting from MongoDB 4.0).
• It is not exposed to classic SQL injection, because queries are not built as SQL strings.
• It is easily integrated with Big Data Hadoop.

Disadvantages of MongoDB :

• It uses a lot of memory for data storage.
• You are not allowed to store more than 16 MB of data in a single document.
• The nesting of data in BSON is also limited: you are not allowed to nest data more than 100 levels deep.
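
The two limits above can be checked before a document is written. The sketch below is a rough pre-check under stated assumptions: it measures JSON size rather than the real BSON size, the way it counts nesting levels may not match the server's exact rule, and the helper names are hypothetical.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # 16 MB limit quoted in the text
MAX_NESTING = 100                 # nesting limit quoted in the text


def nesting_depth(value):
    """Depth of nested dicts and lists; a scalar has depth 0."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def within_limits(doc):
    """Rough pre-check that uses JSON size as a stand-in for BSON size."""
    size = len(json.dumps(doc).encode("utf-8"))
    return size <= MAX_DOC_BYTES and nesting_depth(doc) <= MAX_NESTING
```

A document that fails this check is usually a sign that some nested data should move into its own collection.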

NoSQL Data Architecture Patterns

An architecture pattern is a logical way of categorizing the data that will be stored in the database. NoSQL is a
type of database that helps perform operations on big data and store it in a suitable format. It is widely used
because of its flexibility and the wide variety of services it supports.

Architecture Patterns of NoSQL:

Data in NoSQL is stored in one of the following four data architecture patterns.

1. Key-Value Store Database

2. Column Store Database

3. Document Database

4. Graph Database

These are explained below.

1. Key-Value Store Database:

This model is one of the most basic models of NoSQL databases. As the name suggests, the data is stored in the
form of key-value pairs. The key is usually a string, an integer, or a sequence of characters, but it can also be
a more advanced data type. The value is the data associated with the key. Key-value databases generally store
data as a hash table in which each key is unique. The value can be of any type (JSON, BLOB (Binary Large Object),
strings, etc.). This pattern is commonly used in shopping websites and e-commerce applications.

Advantages:

• Can handle large amounts of data and heavy load.
• Easy retrieval of data by key.

Limitations:

• Complex queries may involve multiple key-value pairs, which can slow performance.
• Many-to-many relationships are hard to represent, and the pairs involved can conflict with one another.

Examples:

• DynamoDB
• Berkeley DB

2. Column Store Database:

Rather than storing data in relational tuples, the data is stored in individual cells, which are further grouped
into columns. Column-oriented databases operate on columns and store large amounts of data together by column.
The format and names of the columns can differ from one row to another. Every column is treated separately, but
a column can still contain multiple other columns, as in a column family.
In short, columns are the unit of storage in this type.

Advantages:

• Data is readily available.
• Queries like SUM, AVERAGE, COUNT can be easily performed on columns.

Examples:

• HBase
• Bigtable by Google
• Cassandra

3. Document Database:
The document database stores and retrieves data in the form of key-value pairs, but here the values are called
documents. A document is a complex data structure; it can take the form of text, arrays, strings, JSON, XML, or
any similar format. The use of nested documents is also very common. The model is effective because much of the
data created today is already in JSON form and is semi-structured.

Advantages:

• This format is very useful and apt for semi-structured data.
• Storage, retrieval, and management of documents is easy.

Limitations:

• Handling multiple documents is challenging.
• Aggregation operations across many documents can be harder to get right.

Examples:

• MongoDB
• CouchDB

Figure – Document Store Model in form of JSON documents

4. Graph Databases:

This architecture pattern deals with the storage and management of data in graphs. Graphs are structures that
depict connections between two or more objects in some data. The objects or entities are called nodes and are
joined together by relationships called edges. Each edge has a unique identifier, and each node serves as a
point of contact in the graph. This pattern is very commonly used in social networks, where there are a large
number of entities and each entity has one or many characteristics that are connected by edges. In the
relational pattern, tables are only loosely connected (through keys and joins), whereas in a graph the
connections are stored explicitly as part of the data.

Advantages:

• Fast traversal, because connections are stored directly.
• Spatial data can be easily handled.

Limitations:

• Wrong connections may lead to infinite loops.
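
The infinite-loop limitation arises when a traversal keeps following a cycle of edges. A common remedy is to remember which nodes have already been visited. The sketch below shows this with a breadth-first traversal over a small adjacency list; the graph and the function name reachable are hypothetical and are not taken from any graph database's API.

```python
from collections import deque

# Hypothetical graph as an adjacency list; note the cycle a -> b -> c -> a.
graph = {
    "a": ["b"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": [],
}


def reachable(start):
    """Breadth-first traversal; the visited set stops cycles from looping forever."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order
```

Without the visited set, the edge from c back to a would put a on the queue again and the loop would never finish.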

Examples:

• Neo4J
• FlockDB (used by Twitter)

Figure – Graph model format of NoSQL Databases

Columnar Data Model of NoSQL

The columnar data model is an important NoSQL data model. NoSQL databases differ from SQL
databases because they use a data model with a different structure from the row-and-column
table model used by relational database management systems (RDBMS). NoSQL databases have a
flexible schema model that is designed to scale horizontally across many servers and is used
for large volumes of data.

Columnar Data Model of NoSQL :
A relational database stores data in rows and also reads the data row by row, whereas a
column store is organized as a set of columns. So if someone wants to run analytics on a
small number of columns, those columns can be read directly without loading unwanted data
into memory. Values within a column are usually of the same type and therefore compress more
efficiently, which makes reads faster. Examples of the columnar data model: Cassandra and
Apache HBase.

Working of Columnar Data Model:
In the columnar data model, information is organized into columns instead of rows. Logically
the data still looks like a table in a relational database; only the physical layout differs.
Being a NoSQL model, it is also more flexible about which columns each row has. The example
below shows the same data in both layouts:

Row-Oriented Table:

S.No.  Name       Course  Branch       ID
01.    Tanmay     B-Tech  Computer     2
02.    Abhishek   B-Tech  Electronics  5
03.    Samriddha  B-Tech  IT           7
04.    Aditi      B-Tech  E & TC       8

Column-Oriented Table:

S.No.  Name       ID
01.    Tanmay     2
02.    Abhishek   5
03.    Samriddha  7
04.    Aditi      8

S.No.  Course  ID
01.    B-Tech  2
02.    B-Tech  5
03.    B-Tech  7
04.    B-Tech  8

S.No.  Branch       ID
01.    Computer     2
02.    Electronics  5
03.    IT           7
04.    E & TC       8

Columnar Data Model uses the concept of keyspace, which is like a schema in relational
models.
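
The two layouts in the example can be sketched in a few lines of Python, using the same four students. This is only an in-memory illustration of the layout; the function name to_columns is hypothetical, and real column stores add compression and on-disk structures that are not shown.

```python
# The row-oriented table from the example, one dict per row.
rows = [
    {"name": "Tanmay", "course": "B-Tech", "branch": "Computer", "id": 2},
    {"name": "Abhishek", "course": "B-Tech", "branch": "Electronics", "id": 5},
    {"name": "Samriddha", "course": "B-Tech", "branch": "IT", "id": 7},
    {"name": "Aditi", "course": "B-Tech", "branch": "E & TC", "id": 8},
]


def to_columns(table):
    """Pivot rows into one list per column (a column-oriented layout)."""
    columns = {}
    for row in table:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column reads only that column's list.
id_total = sum(columns["id"])
```

Note how the course column holds the same value four times. Runs of repeated values like this are what make columnar data compress well.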

Advantages of Columnar Data Model :
• Well structured: These data models compress well, so the stored data is compact and
well organized.
• Flexibility: Columns need not look alike, which means one can add new and different
columns without disrupting the whole database.
• Aggregation queries are fast: Aggregation queries are quite fast because the values
they need are stored together in a column. An example would be adding up the total
number of students enrolled in one year.
• Scalability: It can be spread across large clusters of machines, even numbering in
thousands.
• Load Times: A table can be loaded in a few seconds, so load times are very good.

Disadvantages of Columnar Data Model:
• Designing indexing schema: Designing an effective and working schema is difficult and
very time-consuming.
• Suboptimal data loading: Incremental data loading is suboptimal and should be avoided
where possible, though this might not be an issue for some users.
• Security vulnerabilities: If security is a priority, note that the columnar data model
lacks built-in security features; in that case one should look at relational databases.
• Online Transaction Processing (OLTP): OLTP applications are not a good match for
columnar data models because of the way data is stored.

Applications of Columnar Data Model:
• The columnar data model is widely used in blogging platforms.
• It is used in content management systems like WordPress, Joomla, etc.
• It is used in systems that maintain counters.
• It is used in systems that require heavy write requests.
• It is used in services that have expiring usage.

Architecture of HBase

Prerequisites –
Introduction to Hadoop, Apache HBase
HBase architecture has 3 main components: HMaster, Region Server, and Zookeeper.
Figure – Architecture of HBase
All 3 components are described below:

1. HMaster –
HMaster is the implementation of the Master Server in HBase. It is the process that assigns
regions to region servers and handles DDL operations (create, delete table). It monitors
all Region Server instances present in the cluster. In a distributed environment, the
Master runs several background threads. HMaster also handles tasks such as load
balancing and failover.

2. Region Server –
HBase tables are divided horizontally by row key range into Regions. Regions are the
basic building blocks of an HBase cluster: they hold the distributed portions of tables and
are composed of column families. A Region Server runs on an HDFS DataNode in the
Hadoop cluster. It is responsible for handling, managing, and executing read and write
operations on its set of regions. The default size of a region is 256 MB, although this
is configurable and differs between HBase versions.

3. Zookeeper –
It acts as a coordinator in HBase. It provides services such as maintaining configuration
information, naming, distributed synchronization, and server failure notification.
Clients use Zookeeper to locate region servers and then communicate with them.
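
The statement that tables are divided by row key range into regions can be sketched as a lookup from a row key to the region that contains it. The region boundaries, server names, and the function locate_region below are all hypothetical; this models only the idea of range lookup, not how HBase itself stores or resolves region locations.

```python
import bisect

# Hypothetical regions as (start_row_key, region_server). A region covers keys from
# its start key up to, but not including, the next region's start key.
REGIONS = [
    ("", "regionserver-1"),   # the first region starts at the empty key
    ("h", "regionserver-2"),
    ("p", "regionserver-3"),
]
START_KEYS = [start for start, _ in REGIONS]


def locate_region(row_key: str) -> str:
    """Return the region server whose key range contains `row_key`."""
    idx = bisect.bisect_right(START_KEYS, row_key) - 1
    return REGIONS[idx][1]
```

Because the start keys are kept sorted, the lookup is a binary search, and rows with neighbouring keys land in the same region.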

Advantages of HBase –

1. Can store large data sets

2. Database can be shared

3. Cost-effective from gigabytes to petabytes

4. High availability through failover and replication

Disadvantages of HBase –

1. No support for SQL structure

2. No transaction support across multiple rows

3. Sorted only on key

4. Memory issues on the cluster

Comparison between HBase and HDFS:

• HBase provides low latency access while HDFS provides high latency operations.

• HBase supports random reads and writes while HDFS supports write once, read many
times.

• HBase is accessed through shell commands, Java API, REST, Avro or Thrift API while HDFS
is accessed through MapReduce jobs.

Features of HBase architecture :

Distributed and Scalable: HBase is designed to be distributed and scalable, which means
it can handle large datasets and can scale out horizontally by adding more nodes to the
cluster.
Column-oriented Storage: HBase stores data in a column-oriented manner, which means
data is organized by columns rather than rows. This allows for efficient data retrieval and
aggregation.
Hadoop Integration: HBase is built on top of Hadoop, which means it can leverage
Hadoop’s distributed file system (HDFS) for storage and MapReduce for data processing.
Consistency and Replication: HBase provides strong consistency guarantees for read and
write operations, and supports replication of data across multiple nodes for fault tolerance.
Built-in Caching: HBase has a built-in caching mechanism that can cache frequently
accessed data in memory, which can improve query performance.
Compression: HBase supports compression of data, which can reduce storage
requirements and improve query performance.
Flexible Schema: HBase supports flexible schemas, which means the schema can be
updated on the fly without requiring a database schema migration.
Note – HBase is extensively used for online analytical operations. For example, in banking
applications it can be used for real-time data updates in ATM machines.

Document Databases in NoSQL

In this article, we will look at the Document Data Model of NoSQL, along with its examples,
advantages, disadvantages, and applications.

Document Data Model:

A document data model differs from other data models because it stores data in JSON, BSON, or
XML documents. In this data model, documents can be nested inside other documents, and any
particular element can be indexed to make queries run faster. Documents are often stored and retrieved in a
form that is close to the data objects used in applications, which means very little
translation is required to use the data in an application. JSON is a native format that is often used both
to store and to query data.

So in the document data model, each document consists of key-value pairs. Below is an example:

{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "[email protected]",
"Contact" : "12345"
}

Working of Document Data Model:

The document data model works as a semi-structured data model: each record and the data associated
with it are stored together in a single document, so the data is not completely unstructured. The
key point is that the data is stored in documents.
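As a rough sketch of these ideas in Python, a document can be modeled as a dictionary that nests other documents, and a chosen field can be indexed for faster lookup. The nested `Courses` field and the `index_by` helper are hypothetical additions made for illustration, and the email value is a placeholder rather than data from the example above.

```python
import json

# A document is a set of key-value pairs; values may themselves be documents.
student = {
    "Name": "Yashodhra",
    "Address": "Near Patel Nagar",
    "Email": "user@example.com",   # placeholder value, not from the source
    "Contact": "12345",
    # Hypothetical nested documents, added only to illustrate nesting.
    "Courses": [
        {"Title": "Databases", "Year": 2},
        {"Title": "Networks", "Year": 3},
    ],
}

def index_by(documents, field):
    """Build a simple index: field value -> list of matching documents."""
    index = {}
    for doc in documents:
        if field in doc:
            index.setdefault(doc[field], []).append(doc)
    return index

# The document round-trips through JSON with no translation layer.
restored = json.loads(json.dumps(student))

# Indexing "Name" lets a lookup skip scanning every document.
by_name = index_by([student], "Name")
print(by_name["Yashodhra"][0]["Courses"][0]["Title"])
```

The dictionary in the program and the stored document have the same shape, which is why so little translation is needed between the two.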

Features:
 Document Type Model: Data is stored in documents rather than tables or graphs, so it
maps easily onto the data structures of many programming languages.
 Flexible Schema: The schema is flexible; not all documents in a collection need to have
the same fields.
 Distributed and Resilient: Document data models are designed to be distributed, which is what makes
horizontal scaling and distribution of data possible.
 Manageable Query Language: The query language lets
developers perform CRUD (Create, Read, Update, Delete) operations on the data model.

Examples of Document Data Models :

 Amazon DocumentDB
 MongoDB
 Cosmos DB
 ArangoDB
 Couchbase Server
 CouchDB

Advantages:

 Schema-less: These databases are very good at retaining existing data at massive volumes because there are
no restrictions on the format or structure of the stored data.
 Faster creation and maintenance of documents: It is very simple to create a document, and it
requires almost no maintenance afterward.
 Open formats: Documents are built with a simple process that uses open formats such as XML and JSON and their variants.
 Built-in versioning: As documents grow in size, they can also grow in complexity;
built-in versioning reduces conflicts between changes.

Disadvantages:

 Weak Atomicity: Support for multi-document ACID transactions is weak or absent in many document stores. A change
that involves two collections requires two separate queries, one for each collection,
and this is where atomicity requirements can break.
 Consistency Check Limitations: It is possible to search collections and documents that are not connected to
an author collection, but doing so can degrade database performance.
 Security: Many web applications lack security, which can result in the leakage of sensitive
data. This is a point of concern, so one must pay attention to web application vulnerabilities.

Applications of Document Data Model :

 Content Management: These data models are widely used to build video streaming
platforms, blogs, and similar services, because each piece of content is stored as a single document and the database is
much easier to maintain as the service evolves over time.
 Book Database: These data models are useful for building book databases because they
let us nest related data inside a document.
 Catalog: These data models are widely used for storing and reading catalog files because
they read quickly even when catalogs have thousands of attributes stored.
 Analytics Platform: These data models are widely used in analytics platforms.

Difference between Redis and Memcached


1. Redis :
Redis is an open-source, key-value NoSQL database. It is an in-memory data structure store that serves all
data from memory and uses disk for persistence. It offers a unique data model and high performance,
supports various data structures such as strings, lists, sets, and hashes, and can be used as a database, cache, or message
broker. It is also called a data structure server. It does not use an RDBMS-style schema or SQL, and it does not
offer full ACID transactions.

2. Memcached :
Memcached is a simple, open-source, in-memory caching system that can be used as temporary in-memory
data storage. Because the data is held in memory, reads and writes are fast, and the data can be distributed
across multiple servers. It is a key-value store of string objects held in memory, and client APIs are available for
most popular languages. Memcached is very efficient for websites.

Difference between Redis and Memcached –

Parameter | Redis | Memcached
Initial Release | It was released in 2009. | It was released in 2003.
Developer | It was developed by Salvatore Sanfilippo. | It was developed by Danga Interactive.
Cores Used | It uses a single core. | It uses multiple cores.
Length of a key | In Redis, the maximum key length is 512 MB. | In Memcached, the maximum key length is 250 bytes.
Installation | It is simpler and easier to install than Memcached. | It may be difficult to install.
Data Structure | It uses lists, strings, hashes, sorted sets, and bitmaps as data structures. | It uses only strings and integers as data structures.
Speed | Its read and write speed is slower than that of Memcached. | Its read and write speed is higher than that of Redis.
Replication | It supports Master-Slave and Multi-Master replication methods. | It does not support any replication method.
Durability | It is more durable than Memcached. | It is less durable than Redis.
Secondary database model | It has Document Store, Graph DBMS, Search Engine, and Time Series DBMS as secondary database models. | It has no secondary database models.
Persistence | It uses persistent data. | It does not use persistent data.
Partitioning method | It supports sharding. | It does not support any partitioning method.

About NoSQL

The label NoSQL itself has a rather fuzzy definition. “NoSQL” was coined in 1998 by Carlo Strozzi as the
name for his then-new NoSQL Database, chosen simply because it doesn’t use SQL for managing data.

The term took on a new meaning after 2009 when Johan Oskarsson organized a meetup for developers to
discuss the spread of “open source, distributed, and non relational databases”
like Cassandra and Voldemort. Oskarsson named the meetup “NOSQL” and since then the term has been
used as a catch-all for any database that doesn’t employ the relational model. Interestingly, Strozzi’s
NoSQL database does in fact employ the relational model, meaning that the original NoSQL database
doesn’t fit the contemporary definition of NoSQL.

Because “NoSQL” generally refers to any DBMS that doesn’t employ the relational model, there are
several operational data models associated with the NoSQL concept. The following table includes several
such data models, but please note that this is not a comprehensive list:

Operational Database Model | Example DBMSs
Key-value store | Redis, MemcacheDB
Columnar database | Cassandra, Apache HBase
Document store | MongoDB, Couchbase
Graph database | OrientDB, Neo4j

Despite these different underlying data models, most NoSQL databases share several characteristics. For
one, NoSQL databases are typically designed to maximize availability at the expense of consistency. In this
sense, consistency refers to the idea that any read operation will return the most recent data written to the
database. In a distributed database designed for strong consistency, any data written to one node will be
immediately available on all other nodes; otherwise, an error will occur.

Conversely, NoSQL databases oftentimes aim for eventual consistency. This means that newly written data
is made available on other nodes in the database eventually (usually in a matter of a few milliseconds),
though not necessarily immediately. This has the benefit of improving the availability of one’s data: even
though you may not see the very latest data written, you can still view an earlier version of it instead of
receiving an error.
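
To make the distinction concrete, here is a toy sketch in Python. The `Replica` and `Cluster` classes and their methods are hypothetical names invented for this illustration and do not belong to any real database; replication is simulated by an explicit `sync()` call rather than by a network.

```python
class Replica:
    """One node holding its own copy of the data."""
    def __init__(self):
        self.data = {}

class Cluster:
    """Toy two-node cluster with deferred (asynchronous) replication."""
    def __init__(self):
        self.primary = Replica()
        self.secondary = Replica()
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_secondary(self, key):
        # May return an older value (or None) instead of raising an error.
        return self.secondary.data.get(key)

    def sync(self):
        # Replication catches up: apply queued writes in order.
        for key, value in self.pending:
            self.secondary.data[key] = value
        self.pending.clear()

cluster = Cluster()
cluster.write("balance", 100)
cluster.sync()
cluster.write("balance", 80)

stale = cluster.read_from_secondary("balance")   # older version, still readable
cluster.sync()
fresh = cluster.read_from_secondary("balance")   # replicas have converged
print(stale, fresh)
```

A strongly consistent design would instead refuse the read, or block it, until the second write had reached the secondary.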

Relational databases are designed to deal with normalized data that fits neatly into a predefined schema. In
the context of a DBMS, normalized data is data that’s been organized in a way to eliminate redundancies
— meaning that the database takes up as little storage space as possible — while a schema is an outline of
how the data in the database is structured.

While NoSQL databases are equipped to handle normalized data and they are able to sort data within a
predefined schema, their respective data models usually allow for far greater flexibility than the rigid
structure imposed by relational databases. Because of this, NoSQL databases have a reputation for being a
better choice for storing semi-structured and unstructured data. With that in mind, though, because NoSQL
databases don’t come with a predefined schema that often means it’s up to the database administrator to
define how the data should be organized and accessed in whatever way makes the most sense for their
application.

Now that you have some context around what NoSQL databases are and what makes them different from
relational databases, let’s take a closer look at some of the more widely-implemented NoSQL database
models.

Key-value Databases

Key-value databases, also known as key-value stores, work by storing and managing associative arrays. An
associative array, also known as a dictionary or hash table, consists of a collection of key-value pairs in
which a key serves as a unique identifier to retrieve an associated value. Values can be anything from
simple objects, like integers or strings, to more complex objects, like JSON structures.

In contrast to relational databases, which define a data structure made up of tables of rows and columns
with predefined data types, key-value databases store data as a single collection without any structure or
relation. After connecting to the database server, an application can define a key (for
example, the_meaning_of_life) and provide a matching value (for example, 42) which can later be
retrieved the same way by supplying the key. A key-value database treats any data held within it as an
opaque blob; it’s up to the application to understand how it’s structured.

Key-value databases are often described as highly performant, efficient, and scalable. Common use cases
for key-value databases are caching, message queuing, and session management.
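
The set-and-get pattern described above can be sketched with a small in-memory class. `KeyValueStore` is a hypothetical stand-in written for this example, not the client API of Redis or any other product, and the session data is made up.

```python
import json

class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque bytes."""
    def __init__(self):
        self._items = {}

    def set(self, key, value: bytes):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        # True if the key existed and was removed.
        return self._items.pop(key, None) is not None

store = KeyValueStore()

# The store never inspects the value; it only maps key -> bytes.
store.set("the_meaning_of_life", b"42")

# Structure is the application's business: encode on write, decode on read.
session = {"user": "sammy", "cart": ["book", "pen"]}   # hypothetical session data
store.set("session:1001", json.dumps(session).encode("utf-8"))

answer = int(store.get("the_meaning_of_life"))
restored = json.loads(store.get("session:1001").decode("utf-8"))
print(answer, restored["cart"])
```

Because the store only sees bytes, it cannot answer a question such as "which sessions contain a book"; that limitation is what document stores address.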

Some popular open-source key-value data stores are:

Database | Description
Redis | An in-memory data store used as a database, cache, or message broker, Redis supports a variety of data structures, ranging from strings to bitmaps, streams, and spatial indexes.
Memcached | A general-purpose memory object caching system frequently used to speed up data-driven websites and applications by caching data and objects in memory.
Riak | A distributed key-value database with advanced local and multi-cluster replication.

Columnar Databases

Columnar databases, sometimes called column-oriented databases, are database systems that store data in
columns. This may seem similar to traditional relational databases, but rather than grouping columns
together into tables, each column is stored in a separate file or region in the system’s storage.

The data stored in a columnar database appears in record order, meaning that the first entry in one column
is related to the first entry in other columns. This design allows queries to only read the columns they need,
rather than having to read every row in a table and discard unneeded data after it’s been stored in memory.
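
A small Python sketch can show the difference between the two layouts. The records and the `to_columns` helper are hypothetical, and in-memory lists stand in for the separate files or storage regions a real columnar system would use.

```python
# The same three records, first stored row by row.
rows = [
    {"id": 1, "name": "Ada", "city": "London", "amount": 120.0},
    {"id": 2, "name": "Lin", "city": "Taipei", "amount": 75.5},
    {"id": 3, "name": "Sam", "city": "Austin", "amount": 42.0},
]

def to_columns(records):
    """Pivot row-oriented records into one list per column (record order kept)."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns

columns = to_columns(rows)

# Row layout: every whole record is visited to total one field.
total_from_rows = sum(record["amount"] for record in rows)

# Column layout: only the "amount" column is read.
total_from_columns = sum(columns["amount"])

# Position i in every column belongs to the same record.
second_record = {field: values[1] for field, values in columns.items()}
print(total_from_columns, second_record["name"])
```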

Because the data in each column is of the same type, it allows for various storage and read optimization
strategies. In particular, many columnar database administrators implement a compression strategy such
as run-length encoding to minimize the amount of space taken up by a single column. This can have the
benefit of speeding up reads since queries need to go over fewer rows. One drawback with columnar
databases, though, is that load performance tends to be slow since each column must be written separately
and data is often kept compressed. Incremental loads in particular, as well as reads of individual records,
can be costly in terms of performance.
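
Run-length encoding itself is simple enough to sketch in a few lines of Python. The `country` column below is made-up data, and the functions are an illustration of the technique rather than the implementation used by any particular database.

```python
from itertools import groupby

def rle_encode(column):
    """Collapse runs of equal adjacent values into [value, run_length] pairs."""
    return [[value, len(list(run))] for value, run in groupby(column)]

def rle_decode(pairs):
    """Expand [value, run_length] pairs back into the original column."""
    column = []
    for value, count in pairs:
        column.extend([value] * count)
    return column

# Hypothetical "country" column, sorted so equal values sit next to each other.
country = ["DE", "DE", "DE", "FR", "FR", "US", "US", "US", "US"]

encoded = rle_encode(country)
print(encoded)   # [['DE', 3], ['FR', 2], ['US', 4]]

# Counting rows for one value touches 3 pairs instead of 9 entries.
us_rows = sum(count for value, count in encoded if value == "US")
```

The scheme pays off only when a column contains long runs of repeated values, which is why it suits sorted or low-cardinality columns; it also shows why updating a single record is expensive, since a change in the middle of a run forces the run to be split.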

Column-oriented databases have been around since the 1960s. Since the mid-2000s, though, columnar
databases have become more widely used for data analytics since the columnar data model lends itself well
to fast query processing. They’re also seen as advantageous in cases where an application needs to
frequently perform aggregate functions, such as finding the average or sum total of data in a column. Some
columnar database management systems are even capable of using SQL queries.

Some popular open-source columnar databases are:


Database | Description
Apache Cassandra | A column store designed to maximize scalability, availability, and performance.
Apache HBase | A distributed database that supports structured storage for large amounts of data and is designed to work with the Hadoop software library.
ClickHouse | A fault tolerant DBMS that supports real time generation of analytical data and SQL queries.

Document-oriented Databases

Document-oriented databases, or document stores, are NoSQL databases that store data in the form of
documents. Document stores are a type of key-value store: each document has a unique identifier — its key
— and the document itself serves as the value.

The difference between these two models is that, in a key-value database, the data is treated as opaque and
the database doesn’t know or care about the data held within it; it’s up to the application to understand what
data is stored. In a document store, however, each document contains some kind of metadata that provides
a degree of structure to the data. Document stores often come with an API or query language that allows
users to retrieve documents based on the metadata they contain. They also allow for complex data
structures, as you can nest documents within other documents.
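
The contrast with a key-value store can be sketched in Python as follows. The `find` and `get_path` helpers and the customer data are hypothetical; they imitate the general idea of querying on document content and are not the query language of any specific document store.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.city' into nested documents."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current

def find(collection, criteria):
    """Return documents whose fields equal every value in criteria."""
    return [
        doc for doc in collection.values()
        if all(get_path(doc, path) == wanted for path, wanted in criteria.items())
    ]

# Key -> document, as in a key-value store, but the values are not opaque.
customers = {
    "c1": {"name": "Ada", "address": {"city": "London", "zip": "N1"}},
    "c2": {"name": "Lin", "address": {"city": "Taipei"}, "vip": True},
    "c3": {"name": "Sam", "address": {"city": "London"}},
}

by_key = customers["c2"]                                  # key-value style lookup
in_london = find(customers, {"address.city": "London"})   # query on content
print([doc["name"] for doc in in_london])
```

Note that the three documents do not share the same fields, yet the same query runs against all of them.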

Unlike relational databases, in which the information of a given object may be spread across multiple tables
or databases, a document-oriented database can store all the data of a given object in a single document.
Document stores typically store data as JSON, BSON, XML, or YAML documents, and some can store
binary formats like PDF documents. Some use a variant of SQL, full-text search, or their own native query
language for data retrieval, and others feature more than one query method.

Document-oriented databases have seen an enormous growth in popularity in recent years. Thanks to their
flexible schema, they’ve found regular use in e-commerce, blogging, and analytics platforms, as well as
content management systems. Document stores are considered highly scalable, with sharding being a
common horizontal scaling strategy. They are also excellent for keeping large amounts of unrelated,
complex information that varies in structure.

Some popular open-source document based data stores are:

Database | Description
MongoDB | A general purpose, distributed document store, MongoDB is the world’s most widely used document-oriented database at the time of this writing.
Couchbase | Originally known as Membase, a JSON-based, Memcached-compatible document-based data store. A multi-model database, Couchbase can also function as a key-value store.
Apache CouchDB | A project of the Apache Software Foundation, CouchDB stores data as JSON documents and uses JavaScript as its query language.

Graph Databases

Graph databases can be thought of as a subcategory of the document store model, in that they store data in
documents and don’t insist that data adhere to a predefined schema. The difference, though, is that graph
databases add an extra layer to the document model by highlighting the relationships between individual
documents.

To better grasp the concept of graph databases, it’s important to understand the following terms:

 Node: A node is a representation of an individual entity tracked by a graph database. It is more or less
equivalent to the concept of a record or row in a relational database or a document in a document store. For
example, in a graph database of music recording artists, a node might represent a single performer or band.
 Property: A property is relevant information related to individual nodes. Building on our recording artist
example, some properties might be “vocalist,” “jazz,” or “platinum-selling artist,” depending on what
information is relevant to the database.
 Edge: Also known as a relationship, an edge is the representation of how two nodes are related,
and is a key concept of graph databases that differentiates them from RDBMSs and document stores. Edges
can be directed or undirected.
 Undirected: In an undirected graph, the edges between nodes exist just to show a connection
between them. In this case, edges can be thought of as “two-way” relationships — there’s no
implied difference between how one node relates to the other.
 Directed: In a directed graph, edges can have different meanings based on which direction the
relationship originates from. In this case, edges are “one-way” relationships. For example, a
directed graph database might specify a relationship from Sammy to the Seaweeds showing that
Sammy produced an album for the group, but might not show an equivalent relationship from The
Seaweeds to Sammy.
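
These terms can be tied together in a short Python sketch. The `Graph` class, the edge labels, and the extra node `Jamie` are hypothetical and exist only for illustration; real graph databases expose their own query languages instead.

```python
from collections import defaultdict

class Graph:
    """Tiny property graph: nodes carry properties, edges carry a label."""
    def __init__(self):
        self.properties = {}            # node -> dict of properties
        self.edges = defaultdict(set)   # node -> {(label, neighbor), ...}

    def add_node(self, name, **props):
        self.properties[name] = props

    def add_edge(self, source, label, target, directed=True):
        self.edges[source].add((label, target))
        if not directed:
            # An undirected edge is stored in both directions.
            self.edges[target].add((label, source))

    def related(self, source, label):
        """Nodes reachable from source over edges with the given label."""
        return sorted(t for lbl, t in self.edges[source] if lbl == label)

g = Graph()
g.add_node("Sammy", role="vocalist", genre="jazz")
g.add_node("The Seaweeds", kind="band")
g.add_node("Jamie", role="drummer")   # hypothetical extra node

# Directed: Sammy produced an album for the group, not the reverse.
g.add_edge("Sammy", "PRODUCED_FOR", "The Seaweeds")
# Undirected: a two-way relationship (hypothetical).
g.add_edge("Sammy", "PLAYED_WITH", "Jamie", directed=False)

print(g.related("Sammy", "PRODUCED_FOR"))          # ['The Seaweeds']
print(g.related("The Seaweeds", "PRODUCED_FOR"))   # []
print(g.related("Jamie", "PLAYED_WITH"))           # ['Sammy']
```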

Certain operations are much simpler to perform using graph databases because of how they link and group
related pieces of information. These databases are commonly used in cases where it’s important to be able
to gain insights from the relationships between data points or in applications where the information
available to end users is determined by their connections to others, as in a social network. They’ve found
regular use in fraud detection, recommendation engines, and identity and access management applications.

Some popular open-source graph databases are:

Database | Description
Neo4j | An ACID-compliant DBMS with native graph storage and processing. As of this writing, Neo4j is the most popular graph database in the world.
ArangoDB | Not exclusively a graph database, ArangoDB is a multi-model database that unites the graph, document, and key-value data models in one DBMS. It features AQL (a native SQL-like query language), full-text search, and a ranking engine.
OrientDB | Another multi-model database, OrientDB supports the graph, document, key-value, and object models. It supports SQL queries and ACID transactions.

Databases Performing CRUD operations:

MongoDB CRUD operations


MongoDB can be used for many purposes, such as building
an application (including web and mobile), analyzing data, or
administering a MongoDB database. In all these cases we need to
interact with the MongoDB server to perform certain operations, such as
entering new data into the application, updating data in the
application, deleting data from the application, and reading the data
of the application. MongoDB provides a set of basic but
essential operations that help you interact with the
MongoDB server easily; these operations are known as CRUD
operations.

Create Operations –

The create or insert operations are used to insert or add new
documents to a collection. If the collection does not exist, it is
created in the database. You can perform create
operations using the following methods provided by MongoDB:

Method | Description
db.collection.insertOne() | It is used to insert a single document in the collection.
db.collection.insertMany() | It is used to insert multiple documents in the collection.
db.createCollection() | It is used to create an empty collection.

Example 1: In this example, we are inserting the details of a single student
in the form of a document into the student collection using the
db.collection.insertOne() method.

Example 2: In this example, we are inserting the details of multiple
students in the form of documents into the student collection using the
db.collection.insertMany() method.
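
The behaviour of these two methods can be imitated with plain Python. This is only an in-memory sketch: `insert_one` and `insert_many` are local helper functions written for this example, not calls into MongoDB or one of its drivers, the student details are made up, and a simple counter stands in for the `_id` values MongoDB generates automatically.

```python
from itertools import count

_next_id = count(1)   # stand-in for MongoDB's automatically generated _id values

def insert_one(collection, document):
    """Add one document, assigning an _id if the caller did not supply one."""
    stored = dict(document)
    stored.setdefault("_id", next(_next_id))
    collection.append(stored)
    return stored["_id"]

def insert_many(collection, documents):
    """Add several documents and return their _id values in order."""
    return [insert_one(collection, doc) for doc in documents]

student = []   # a collection modeled as a list of documents

# Hypothetical student details, loosely following Example 1 and Example 2.
first_id = insert_one(student, {"name": "Sumit", "age": 20, "branch": "CSE"})
more_ids = insert_many(student, [
    {"name": "Akshay", "age": 21, "branch": "ECE"},
    {"name": "Bablu", "age": 19, "branch": "CSE", "year": 1},
])
print(first_id, more_ids, len(student))
```

The third document carries a `year` field that the others lack, which the collection accepts without any schema change.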
Read Operations –

The read operations are used to retrieve documents from a
collection; in other words, they query a
collection for documents. You can perform read operations using the
following method provided by MongoDB:

Method | Description
db.collection.find() | It is used to retrieve documents from the collection.

.pretty() : this method formats the result so that it is
easy to read.

Example : In this example, we are retrieving the details of students
from the student collection using the db.collection.find() method.
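
A rough Python imitation of this read pattern is shown below. The `find` and `pretty` functions are local helpers named after the methods above, not MongoDB's own implementation; the sketch supports only exact-match criteria, and the student documents are hypothetical.

```python
import json

def find(collection, criteria=None):
    """Return documents whose fields equal every value in criteria.

    An empty or missing criteria matches every document.
    """
    criteria = criteria or {}
    return [
        doc for doc in collection
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]

def pretty(documents):
    """Format documents as indented JSON so they are easier to read."""
    return json.dumps(documents, indent=2)

# Hypothetical contents of the student collection.
student = [
    {"_id": 1, "name": "Sumit", "age": 20, "branch": "CSE"},
    {"_id": 2, "name": "Akshay", "age": 21, "branch": "ECE"},
    {"_id": 3, "name": "Bablu", "age": 19, "branch": "CSE"},
]

everyone = find(student)
cse_students = find(student, {"branch": "CSE"})
print(pretty(cse_students))
```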
Update Operations –

The update operations are used to update or modify existing
documents in a collection. You can perform update operations using
the following methods provided by MongoDB:

Method | Description
db.collection.updateOne() | It is used to update a single document in the collection that satisfies the given criteria.
db.collection.updateMany() | It is used to update multiple documents in the collection that satisfy the given criteria.
db.collection.replaceOne() | It is used to replace a single document in the collection that satisfies the given criteria.

Example 1: In this example, we are updating the age of Sumit in the
student collection using the db.collection.updateOne() method.

Example 2: In this example, we are updating the year of course in all
the documents in the student collection using the
db.collection.updateMany() method.
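
The difference between the three update methods can be sketched in Python. The functions below are local helpers written for this example, not MongoDB's API; the `changes` argument plays the role that the `$set` update operator plays in MongoDB, and the field values are hypothetical.

```python
def matches(doc, criteria):
    """True if the document equals the criteria on every listed field."""
    return all(doc.get(field) == wanted for field, wanted in criteria.items())

def update_one(collection, criteria, changes):
    """Apply changes to the first matching document; return how many changed."""
    for doc in collection:
        if matches(doc, criteria):
            doc.update(changes)
            return 1
    return 0

def update_many(collection, criteria, changes):
    """Apply changes to every matching document; return how many changed."""
    matched = [doc for doc in collection if matches(doc, criteria)]
    for doc in matched:
        doc.update(changes)
    return len(matched)

def replace_one(collection, criteria, replacement):
    """Swap the first matching document for a new one, keeping its _id."""
    for position, doc in enumerate(collection):
        if matches(doc, criteria):
            collection[position] = {"_id": doc["_id"], **replacement}
            return 1
    return 0

student = [
    {"_id": 1, "name": "Sumit", "age": 20, "year": 1},
    {"_id": 2, "name": "Akshay", "age": 21, "year": 1},
]

update_one(student, {"name": "Sumit"}, {"age": 24})   # as in Example 1
changed = update_many(student, {}, {"year": 2})       # as in Example 2
replace_one(student, {"name": "Akshay"}, {"name": "Akshay", "branch": "ECE"})
print(changed, student)
```

An update keeps the fields it does not mention, while a replacement discards everything except the `_id`.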
Delete Operations –

The delete operations are used to delete or remove documents
from a collection. You can perform delete operations using the
following methods provided by MongoDB:

Method | Description
db.collection.deleteOne() | It is used to delete a single document from the collection that satisfies the given criteria.
db.collection.deleteMany() | It is used to delete multiple documents from the collection that satisfy the given criteria.

Example 1: In this example, we are deleting a document from the
student collection using the db.collection.deleteOne() method.

Example 2: In this example, we are deleting all the documents from
the student collection using the db.collection.deleteMany() method.
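
The same in-memory approach illustrates the two delete methods. As before, `delete_one` and `delete_many` are local Python helpers written for this sketch rather than MongoDB calls, and the documents are hypothetical.

```python
def matches(doc, criteria):
    """True if the document equals the criteria on every listed field."""
    return all(doc.get(field) == wanted for field, wanted in criteria.items())

def delete_one(collection, criteria):
    """Remove the first matching document; return how many were removed."""
    for position, doc in enumerate(collection):
        if matches(doc, criteria):
            del collection[position]
            return 1
    return 0

def delete_many(collection, criteria):
    """Remove every matching document; return how many were removed."""
    kept = [doc for doc in collection if not matches(doc, criteria)]
    removed = len(collection) - len(kept)
    collection[:] = kept   # mutate in place so callers see the change
    return removed

student = [
    {"_id": 1, "name": "Sumit", "branch": "CSE"},
    {"_id": 2, "name": "Akshay", "branch": "ECE"},
    {"_id": 3, "name": "Bablu", "branch": "CSE"},
]

removed_one = delete_one(student, {"name": "Bablu"})   # as in Example 1
remaining = len(student)
removed_all = delete_many(student, {})                 # as in Example 2: empty criteria match everything
print(removed_one, remaining, removed_all, student)
```

Empty criteria match every document, so the second call empties the collection; this is worth keeping in mind before passing an empty filter to a delete operation.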