What Is NoSQL
NoSQL is commonly read as ‘Not Only SQL’. These types of databases are non-relational (non-tabular). A NoSQL
database does not require a fixed schema and is therefore described as schema-less; in document stores, the
entries are typically JSON-like documents. Examples − MongoDB, DynamoDB, Redis, etc.
NoSQL databases are flexible and developer-friendly, and can provide higher performance for some workloads. These
types of databases are often a better option when −
You have Big Data Applications that handle large volumes of unstructured data.
The following table highlights the major differences between SQL and NoSQL −
Key: Language
  SQL: SQL databases use standard Structured Query Languages, as the name suggests. SQL is an industry-standard
  and very powerful language to execute complex queries.
  NoSQL: A NoSQL database has a dynamic schema for unstructured data. The data stored in a NoSQL database is not
  structured. Data could be stored as document-oriented, column-oriented, graph-based, or organized as a
  Key-Value store. The syntax can vary from database to database.
Key: Scalability
  SQL: SQL databases can extend their capacity on a single server by increasing its RAM, CPU, or SSD. SQL
  databases are scalable vertically, as their storage can be increased on the same server by enhancing its
  storage components.
  NoSQL: In order to increase the capacity of a NoSQL database, you install new servers parallel to the parent
  server. NoSQL databases are horizontally scalable, which means they can handle more traffic by adding new
  servers to the database, which makes them a great choice for large and constantly changing databases.
Key: Schema
  SQL: SQL databases have a fixed, pre-defined schema, which makes the data storage more rigid, static, and
  restrictive.
  NoSQL: NoSQL databases don’t have a pre-defined schema, which makes them schema-less and more flexible.
Key: Data Storage
  SQL: SQL databases are traditionally run on a single system, so they do not distribute data by default and are
  less suited to hierarchical storage of data.
  NoSQL: NoSQL databases can run on multiple systems, and hence they support data distribution features like
  data replication, partitioning, etc., making them a good option for hierarchical storage of data.
Key: Performance and suitability
  SQL: SQL databases are best suited for complex queries but are not preferred for hierarchical large data
  storage.
  NoSQL: NoSQL databases are not so good for complex queries, because their query languages are not as powerful
  as SQL, but they are best suited for hierarchical large data storage.
Key: Examples
  SQL: SQL databases are implemented in both open source and commercial databases, such as Postgres and MySQL as
  open source, SQLite as public domain, and Oracle as commercial.
  NoSQL: Many NoSQL databases are open source, though not all. Some well-known implementations are MongoDB,
  BigTable, Redis, RavenDB, Cassandra, Hbase, Neo4j, and Cou
Characteristics of NoSQL Database
NoSQL databases work in different ways, but a few common features define a basic NoSQL database. We will now look
at the most common ones.
1. Complex-free working
Unlike SQL databases, NoSQL databases store data in an unstructured or semi-structured form that requires no
relational or tabular arrangement. This often makes them easier to get started with.
2. Independent of Schema
Secondly, NoSQL databases are independent of schemas, which means they can run without any predetermined
schema.
This makes them convenient to work with, particularly for new programmers and for organizations handling large
amounts of heterogeneous data that does not need a schema to structure it.
3. Better Scalability
One of the most prominent features of such a database is its high scalability, which makes it suitable for large
amounts of data.
Data scientists often prefer to work with NoSQL databases for this reason, since it lets them accommodate very
large datasets without degrading performance.
4. Flexible to accommodate
Since such databases can accommodate heterogeneous data that requires no structuring, they are claimed to be
flexible in terms of their usage and reliability.
For beginners intending to try their hands in the field, NoSQL databases are easy to handle yet very useful.
5. Durable
NoSQL databases hold up well as the shape of the data changes, because they can accommodate data ranging from
homogeneous to heterogeneous.
Not only can they accommodate structured data, but they can also incorporate unstructured data that does not fit
a fixed query model.
Having looked at the features of NoSQL databases and how they work, let us now turn to the various NoSQL
database types to understand the concept better.
NoSQL databases can be divided into 4 types. They are as follows -
1. Document Database
As the name indicates, the document database stores data in the form of documents. Data is grouped into
documents, which makes it easier to locate when it is needed for building application software.
One of the major benefits of a document database is that developers can store data in the same document format
that they use in their application code.
It is a semi-structured and hierarchical NoSQL database that allows efficient storage of data. This type of NoSQL
database works especially well for user profiles or catalogs. A typical example of a document database is
MongoDB.
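The idea of storing application data as documents can be illustrated with a minimal Python sketch. It is only an
in-memory model, not the API of MongoDB or any other product; the collection name, the field names, and the
`find` helper are assumptions made up for this illustration.

```python
import json

# A "collection" modelled as a plain list of documents (Python dicts).
# Note that the two documents do not share the same set of fields.
user_profiles = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "address": {"city": "Pune", "zip": "411001"}},
]


def find(collection, **criteria):
    """Return the documents whose top-level fields match every criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


ravi = find(user_profiles, name="Ravi")[0]
print(ravi["address"]["city"])   # nested data is read directly from the document
print(json.dumps(ravi))          # the document serializes naturally to JSON
```

Because each document carries its own structure, a new field can be added to one profile without touching the
others.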
2. Key-Value Database
The key-value database is the simplest type of NoSQL database. It stores data in a schema-less manner, in
key-value format.
Here, each data point is a key to which a value (another data point) is assigned. For instance, the key can be
'age' while the value is '45'.
This way, data is stored in an organized manner as associative pairs. A typical example of this type is Amazon's
DynamoDB.
"Hundreds of thousands of AWS customers have chosen DynamoDB as their key-value and document database for
mobile, web, gaming, ad tech, IoT, and other applications that need low-latency data access at any scale."- Amazon's
Dynamo
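The key-value pairing described above can be sketched in a few lines of Python. This is a toy in-memory store,
not DynamoDB's interface; the class and method names are hypothetical.

```python
class KeyValueStore:
    """A toy in-memory key-value store: every value is reached only by its key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        """Remove a key; return True if it existed, False otherwise."""
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("age", "45")
print(store.get("age"))
```

The store knows nothing about what a value contains, which is why lookups by key are simple and fast while
queries on the contents of values are not supported.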
3. Column-oriented Database
Another type of NoSQL database is the column-oriented database. This type of database stores data in columns,
which segregates information into homogeneous categories.
This allows the user to access only the desired data without having to retrieve unnecessary information.
For data analytics on social networking sites, for example, the column-oriented database works efficiently
because a query can read only the columns it needs.
Since such databases hold large amounts of data, being able to filter out unneeded information matters. This is
exactly what the column-oriented database does. A typical example of a column-oriented NoSQL database
is Apache HBase.
4. Graph Database
The 4th type of NoSQL database is the graph database. Here, data is stored as a graph made up of nodes and the
edges that connect them.
Data points (nodes) are linked to one another by edges, and thus a network of connections is established between
several data points.
This way, one data point leads to another, and related data can be reached by following connections instead of
looking up each data point separately. In software development, this type of database works well when the data is
naturally connected.
This, in turn, keeps the software effective and organized. An example of the graph NoSQL
database is Amazon Neptune.
“Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run
applications that work with highly connected datasets.”
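How one data point leads to another through edges can be shown with a small Python sketch. It models the graph as
an adjacency map and is not the query interface of Neptune or any graph database; the names in `edges` and the
helper functions are made up for illustration.

```python
from collections import deque

# Undirected "knows" relationships between people (hypothetical data).
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "alice"), ("carol", "dave")]


def build_adjacency(edge_list):
    """Turn a list of edges into a node -> set-of-neighbours map."""
    graph = {}
    for a, b in edge_list:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable_within(graph, start, max_hops):
    """Breadth-first search: map each node reachable from start to its hop count."""
    seen = {start}              # guards against looping forever on cycles
    queue = deque([(start, 0)])
    found = {}
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                found[neighbour] = hops + 1
                queue.append((neighbour, hops + 1))
    return found


graph = build_adjacency(edges)
print(reachable_within(graph, "alice", 2))
```

The `seen` set matters because the sample data contains a cycle (alice, bob, carol); without it the traversal
would revisit the same nodes indefinitely.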
Advantages of NoSQL: There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.
1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means partitioning the data
and placing the partitions on multiple machines (in some schemes, in such a way that the order of the data
is preserved). Vertical scaling means adding more resources to the existing machine, whereas horizontal
scaling means adding more machines to handle the data. Vertical scaling is limited by the capacity of a
single machine, while horizontal scaling can continue by adding machines. Examples of horizontally scaling
databases are MongoDB, Cassandra, etc. NoSQL can handle a huge amount of data because of this scalability:
as the data grows, NoSQL scales itself automatically to handle that data in an efficient manner.
2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data, which means
that they can accommodate dynamic changes to the data model. This makes NoSQL databases a good fit
for applications that need to handle changing data requirements.
3. High availability: The auto-replication feature in NoSQL databases makes them highly available, because in
case of any failure the data can be restored from a replica to the last consistent state.
4. Scalability: NoSQL databases are highly scalable, which means that they can handle large amounts of data
and traffic with ease. This makes them a good fit for applications that need to handle large amounts of data
or traffic
5. Performance: NoSQL databases are designed to handle large amounts of data and traffic, which means that
they can offer improved performance compared to traditional relational databases.
6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational databases, as
they are typically less complex and do not require expensive hardware or software.
7. Agility: Ideal for agile development.
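The sharding mentioned under high scalability can be sketched as follows. This is a minimal hash-based scheme
chosen for illustration, not the algorithm used by MongoDB or Cassandra; the function names, the key format, and
the choice of three shards are all assumptions.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key, so the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Three "machines", each modelled as its own dictionary.
shards = [dict() for _ in range(3)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


records = {"user:1": "Asha", "user:2": "Ravi", "user:3": "Meera", "user:4": "Kiran"}
for key, value in records.items():
    put(key, value)

print([len(shard) for shard in shards])   # how the keys spread over the shards
print(get("user:3"))
```

A plain modulo scheme like this remaps most keys when the number of shards changes, which is one reason real
systems use more elaborate schemes such as range-based chunks or consistent hashing.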
Disadvantages of NoSQL: NoSQL has the following disadvantages.
1. Lack of standardization: There are many different types of NoSQL databases, each with its own unique
strengths and weaknesses. This lack of standardization can make it difficult to choose the right database for
a specific application.
2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means that they do not
guarantee the consistency, integrity, and durability of data. This can be a drawback for applications that
require strong data consistency guarantees.
3. Narrow focus: NoSQL databases have a very narrow focus: they are mainly designed for storage and provide
little additional functionality. Relational databases are a better choice than NoSQL in the field of
transaction management.
4. Open-source: Many NoSQL databases are open-source projects. There is no reliable standard for NoSQL yet. In
other words, two database systems are likely to be unequal.
5. Lack of support for complex queries: NoSQL databases are not designed to handle complex queries, which
means that they are not a good fit for applications that require complex data analysis or reporting.
6. Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational
databases. This can make them less reliable and less secure than traditional databases.
7. Management challenge: The purpose of big data tools is to make the management of a large amount of data
as simple as possible. But it is not so easy. Data management in NoSQL is much more complex than in a
relational database. NoSQL, in particular, has a reputation for being challenging to install and even more
hectic to manage on a daily basis.
8. GUI is not available: GUI mode tools to access the database are not flexibly available in the market.
9. Backup: Backup is a weak point for some NoSQL databases. Taking a consistent backup of a distributed
deployment, such as a sharded MongoDB cluster, requires extra tooling and care.
10. Large document size: Some database systems like MongoDB and CouchDB store data in JSON format.
This means that documents are quite large (BigData, network bandwidth, speed), and having descriptive
key names actually hurts since they increase the document size.
MongoDB is an open-source document-oriented database that is designed to store large volumes of data and to
let you work with that data efficiently. It is categorized as a NoSQL (Not only SQL) database because data in
MongoDB is not stored and retrieved in the form of tables.
MongoDB is developed and managed by MongoDB Inc. under the SSPL (Server Side Public License) and was
initially released in February 2009. It also provides official driver support for popular languages such as
C, C++, C#/.NET, Go, Java, Node.js, Perl, PHP, Python, Ruby, Scala, and Swift, as well as libraries such as
Motor and Mongoid, so you can create an application using any of these languages. Many companies, such as
Facebook, Nokia, eBay, Adobe, and Google, use MongoDB to store large amounts of data.
How it works?
Now, let us see what happens behind the scenes. MongoDB is a database server, and the data is stored in
databases on that server. In other words, the MongoDB environment gives you a server that you can start and
then create multiple databases on.
Because it is a NoSQL database, the data is stored in collections and documents. The database,
collections, and documents are related to each other as follows:
The MongoDB database contains collections just like the MySQL database contains tables. You are
allowed to create multiple databases and multiple collections.
Inside a collection we have documents. These documents contain the data we want to store in the
MongoDB database. A single collection can contain multiple documents, and it is schema-less, which means
it is not necessary that one document is similar to another.
The documents are created using fields. Fields are key-value pairs in the documents, much like
columns in a relational database. The value of a field can be of any BSON data type, such as double, string,
boolean, etc.
The data stored in MongoDB is in the format of BSON documents. Here, BSON stands for Binary JSON, a binary
representation of JSON documents. In other words, in the backend, the MongoDB server converts the
JSON data into a binary form known as BSON, which can be stored and queried more
efficiently.
In MongoDB documents, you are allowed to store nested data. This nesting lets you keep related data in the
same document, which can make fetching that data more efficient than in SQL, where you would need to write
joins to combine the data from table 1 and table 2. The maximum size of a BSON document is 16MB.
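The difference between joining two tables and reading one nested document can be sketched in plain Python. This
is an analogy using lists and dicts, not MongoDB or SQL code; the author and post data and the helper names are
hypothetical.

```python
# Relational style: two "tables" linked by author_id, combined at read time.
authors = [{"author_id": 1, "name": "Meera"}]
posts = [
    {"post_id": 10, "author_id": 1, "title": "Intro"},
    {"post_id": 11, "author_id": 1, "title": "Sharding"},
]


def titles_by_author_join(name):
    """Emulate a join: find matching author ids, then scan the posts table."""
    ids = {a["author_id"] for a in authors if a["name"] == name}
    return [p["title"] for p in posts if p["author_id"] in ids]


# Document style: the posts are nested inside the author document.
author_doc = {
    "name": "Meera",
    "posts": [{"title": "Intro"}, {"title": "Sharding"}],
}


def titles_from_document(doc):
    """Everything needed is already in the one document."""
    return [p["title"] for p in doc["posts"]]


print(titles_by_author_join("Meera"))
print(titles_from_document(author_doc))
```

The trade-off is that a nested list which keeps growing makes the document larger, so the document size limit
mentioned above has to be kept in mind when deciding what to embed.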
For example, we have a database named GeeksforGeeks. Inside this database, we have two collections and
in these collections we have two documents. And in these documents we store our data in the form of
fields. As shown in the below image:
How MongoDB is different from RDBMS?
Some major differences in between MongoDB and the RDBMS are as follows:
MongoDB RDBMS
Features of MongoDB –
Schema-less Database: This is a great feature provided by MongoDB. A schema-less database means
one collection can hold different types of documents. In other words, in a MongoDB database, a
single collection can hold multiple documents, and these documents may differ in their number of
fields, content, and size. Unlike rows in a relational database, one document need not be similar to
another. This feature gives MongoDB great flexibility.
Document Oriented: In MongoDB, all data is stored in documents instead of tables as in an RDBMS.
In these documents, the data is stored in fields (key-value pairs) instead of rows and columns, which makes
the data much more flexible in comparison to an RDBMS. Each document contains its own unique object id.
Indexing: In a MongoDB database, every document is indexed on its unique id by default, and secondary
indexes can be created on other fields. Indexes make it easier and faster to find data. If the data is not
indexed, the database must scan each document against the specified query, which takes a lot of time and is
not efficient.
Scalability: MongoDB provides horizontal scalability with the help of sharding. Sharding means
distributing data across multiple servers: a large amount of data is partitioned into chunks using the
shard key, and these chunks are evenly distributed across shards that reside on many physical
servers. New machines can also be added to a running database.
Replication: MongoDB provides high availability and redundancy with the help of replication. It creates
multiple copies of the data and sends these copies to different servers so that if one server fails, the
data can be retrieved from another server.
Aggregation: It allows operations to be performed on grouped data to get a single or computed
result. It is similar to the SQL GROUP BY clause. It provides three different kinds of aggregation, i.e. the
aggregation pipeline, the map-reduce function, and single-purpose aggregation methods.
High Performance: MongoDB offers high performance and data persistence compared to
other databases, due to features like scalability, indexing, replication, etc.
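The aggregation feature above is compared to SQL's GROUP BY; the idea of filtering documents and then grouping
them can be sketched in plain Python. This is an analogy only and does not use MongoDB's aggregation syntax; the
`orders` data and the `match` and `group_sum` helpers are made up for illustration.

```python
from collections import defaultdict

orders = [
    {"customer": "A", "status": "paid", "amount": 50},
    {"customer": "B", "status": "paid", "amount": 20},
    {"customer": "A", "status": "paid", "amount": 30},
    {"customer": "A", "status": "cancelled", "amount": 99},
]


def match(docs, **criteria):
    """Stage 1: keep only the documents whose fields equal the given values."""
    for doc in docs:
        if all(doc.get(field) == value for field, value in criteria.items()):
            yield doc


def group_sum(docs, key, field):
    """Stage 2: group documents by `key` and add up `field` within each group."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[key]] += doc[field]
    return dict(totals)


totals = group_sum(match(orders, status="paid"), "customer", "amount")
print(totals)
```

Chaining small stages, where each stage consumes the output of the previous one, is the same shape of
computation that a pipeline-style aggregation expresses.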
Advantages of MongoDB :
It is a schema-less NoSQL database. You need not design the schema of the database when you are
working with MongoDB.
It does not rely on join operations, because related data can be embedded in a single document.
It provides great flexibility to the fields in the documents.
It can contain heterogeneous data.
It provides high performance, availability, and scalability.
It supports geospatial data efficiently.
It is a document-oriented database and the data is stored in BSON documents.
It also supports multi-document ACID transactions (starting from MongoDB 4.0).
It is not exposed to SQL injection, because it does not use SQL.
It is easily integrated with Big Data Hadoop.
Disadvantages of MongoDB :
Architecture Pattern is a logical way of categorizing data that will be stored on the Database. NoSQL is a type of
database which helps to perform operations on big data and store it in a valid format. It is widely used because of its
flexibility and a wide variety of services.
3. Document Database
4. Graph Database
These are explained below.
Advantages:
Limitations:
Complex queries may attempt to involve multiple key-value pairs which may delay performance.
Data can be involving many-to-many relationships which may collide.
Examples:
DynamoDB
Berkeley DB
Examples:
HBase
Bigtable by Google
Cassandra
3. Document Database:
The document database stores and retrieves data in the form of key-value pairs, but here the values are called
documents. A document can be described as a complex data structure. It can take the form of text, arrays,
strings, JSON, XML, or any similar format. The use of nested documents is also very common. It is very effective
because much of the data created is in the form of JSON and is unstructured.
Advantages:
This type of format is very useful and apt for semi-structured data.
Storage, retrieval, and management of documents is easy.
Limitations:
Examples:
MongoDB
CouchDB
Figure – Document Store Model in form of JSON documents
4. Graph Databases:
This architecture pattern deals with the storage and management of data in graphs. Graphs are
structures that depict connections between two or more objects in some data. The objects or entities are called
nodes and are joined together by relationships called edges. Each edge has a unique identifier. Each node serves
as a point of contact for the graph. This pattern is very commonly used in social networks, where there are a
large number of entities and each entity has one or many characteristics which are connected by edges. The
relational database pattern has tables that are loosely connected, whereas graphs are often very strong and rigid
in nature.
Advantages:
Limitations:
Wrong connections may lead to infinite loops.
Examples:
Neo4J
FlockDB (used by Twitter)
Figure – Graph model format of NoSQL Databases
The Columnar Data Model is an important NoSQL data model. NoSQL databases differ from SQL
databases because they use a data model with a different structure from the
row-and-column table model used by relational database management
systems (RDBMS). NoSQL databases use a flexible schema model that is designed to scale
horizontally across many servers and is used for large volumes of data.
Columnar Data Model of NoSQL :
A relational database stores data in rows and also reads the data row by row, whereas a
column store is organized as a set of columns. So if someone wants to run analytics on a
small number of columns, they can read those columns directly without consuming memory
on unwanted data. The values in a column are usually of the same type and therefore compress
more efficiently, which makes reads faster. Examples of the Columnar Data
Model: Cassandra and Apache HBase.
Working of Columnar Data Model:
In the Columnar Data Model, information is organized into columns instead of rows. The
columns still describe the same records that a table in a relational database would hold;
only the physical grouping differs. This type of data model is also more flexible, as it is a
type of NoSQL database. The example below will help in understanding the Columnar data model:
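Row-Oriented Table:
S.No.  Name       Course  Branch       ID
01.    Tanmay     B-Tech  Computer     2
02.    Abhishek   B-Tech  Electronics  5
03.    Samriddha  B-Tech  IT           7
04.    Aditi      B-Tech  E & TC       8
Column-Oriented Table:
S.No.  Name       ID
01.    Tanmay     2
02.    Abhishek   5
03.    Samriddha  7
04.    Aditi      8
S.No.  Course  ID
01.    B-Tech  2
02.    B-Tech  5
03.    B-Tech  7
04.    B-Tech  8
S.No.  Branch       ID
01.    Computer     2
02.    Electronics  5
03.    IT           7
04.    E & TC       8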
Columnar Data Model uses the concept of keyspace, which is like a schema in relational
models.
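The student example above can be expressed as a short Python sketch that converts a row layout into a column
layout. It only illustrates the storage idea and is not how Cassandra or HBase lay out data on disk; the
lower-case field names and the `to_columns` helper are assumptions.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"name": "Tanmay", "course": "B-Tech", "branch": "Computer", "id": 2},
    {"name": "Abhishek", "course": "B-Tech", "branch": "Electronics", "id": 5},
    {"name": "Samriddha", "course": "B-Tech", "branch": "IT", "id": 7},
    {"name": "Aditi", "course": "B-Tech", "branch": "E & TC", "id": 8},
]


def to_columns(records):
    """Column-oriented layout: one list per field, values kept in record order."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# An analytic query on one field touches only that column.
print(columns["branch"])
print(sum(columns["id"]))
```

Reading `columns["branch"]` never touches names or courses, and a column such as `course`, which repeats the same
value, is the kind of data that compresses well.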
Advantages of Columnar Data Model :
Well structured: Since these data models compress well, they are very
structured and well organized in terms of storage.
Flexibility: There is a large amount of flexibility, as the columns do not need to look
like each other, which means one can add new and different columns without disrupting
the whole database.
Aggregation queries are fast: Aggregation queries are quite fast because the values they
need are stored together in a column. An example would
be adding up the total number of students enrolled in one year.
Scalability: It can be spread across large clusters of machines, even numbering in
thousands.
Load Times: A row table can be loaded in a few seconds, so load times are
excellent.
Disadvantages of Columnar Data Model:
Designing indexing Schema: Designing an effective and working schema is
difficult and very time-consuming.
Suboptimal data loading: Incremental data loading is suboptimal and should be
avoided where possible, but this might not be an issue for some users.
Security vulnerabilities: If security is one of the priorities, note that
the Columnar data model lacks inbuilt security features; in that case, one should look into
relational databases.
Online Transaction Processing (OLTP): Online Transaction Processing (OLTP)
applications are also not compatible with columnar data models because of the way data
is stored.
Applications of Columnar Data Model:
Columnar Data Model is very much used in various Blogging Platforms.
It is used in Content management systems like WordPress, Joomla, etc.
It is used in Systems that maintain counters.
It is used in Systems that require heavy write requests.
It is used in Services that have expiring usage.
Architecture of HBase
Prerequisites –
Introduction to Hadoop, Apache HBase
HBase architecture has 3 main components: HMaster, Region Server, Zookeeper.
Figure – Architecture of HBase
All the 3 components are described below:
1. HMaster –
HMaster is the implementation of the Master Server in HBase. It is the process that
assigns regions to region servers and handles DDL (create, delete table) operations. It
monitors all Region Server instances present in the cluster. In a distributed environment,
the Master runs several background threads. HMaster has many features, such as controlling load
balancing, failover, etc.
2. Region Server –
HBase tables are divided horizontally by row key range into Regions. Regions are the
basic building elements of an HBase cluster; they hold the distributed portions of tables and
are made up of column families. A Region Server runs on an HDFS DataNode which is
present in the Hadoop cluster. A Region Server is responsible for handling, managing, and
executing read and write operations on its set of regions. The default size of a region was
256 MB in early HBase releases and is configurable.
3. Zookeeper –
It acts as a coordinator in HBase. It provides services like maintaining configuration
information, naming, providing distributed synchronization, server failure notification, etc.
Clients use Zookeeper to locate region servers and then communicate with them.
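Since tables are split into regions by row key range, finding the region that holds a given row is a search over
sorted start keys. The following Python sketch shows that idea only; it is a simplification and not HBase's
client code, and the start keys and the server names `rs1` to `rs3` are hypothetical.

```python
from bisect import bisect_right

# Each region covers the row keys from its start key up to the next start key.
# The empty string marks the beginning of the table.
regions = [("", "rs1"), ("g", "rs2"), ("p", "rs3")]
start_keys = [start for start, _ in regions]


def locate(row_key):
    """Return the server whose region range contains row_key."""
    index = bisect_right(start_keys, row_key) - 1
    return regions[index][1]


for key in ["apple", "grape", "zebra"]:
    print(key, "->", locate(key))
```

Because the start keys are kept sorted, the lookup is a binary search, and splitting a region only means
inserting one more start key into the list.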
Advantages of HBase –
Disadvantages of HBase –
2. No transaction support
HBase provides low latency access while HDFS provides high latency operations.
HBase supports random read and writes while HDFS supports Write once Read Many
times.
HBase is accessed through shell commands, Java API, REST, Avro or Thrift API while HDFS
is accessed through MapReduce jobs.
Distributed and Scalable: HBase is designed to be distributed and scalable, which means
it can handle large datasets and can scale out horizontally by adding more nodes to the
cluster.
Column-oriented Storage: HBase stores data in a column-oriented manner, which means
data is organized by columns rather than rows. This allows for efficient data retrieval and
aggregation.
Hadoop Integration: HBase is built on top of Hadoop, which means it can leverage
Hadoop’s distributed file system (HDFS) for storage and MapReduce for data processing.
Consistency and Replication: HBase provides strong consistency guarantees for read and
write operations, and supports replication of data across multiple nodes for fault tolerance.
Built-in Caching: HBase has a built-in caching mechanism that can cache frequently
accessed data in memory, which can improve query performance.
Compression: HBase supports compression of data, which can reduce storage
requirements and improve query performance.
Flexible Schema: HBase supports flexible schemas, which means the schema can be
updated on the fly without requiring a database schema migration.
Note – HBase is extensively used for online analytical operations; for example, in banking
applications such as real-time data updates in ATM machines.
In this article, we will look at the Document Data Model of NoSQL, along with its Examples,
Advantages, Disadvantages, and Applications.
A Document Data Model is quite different from other data models because it stores data in JSON, BSON, or
XML documents. In this data model, documents can be nested inside one another, and
particular elements can be indexed to run queries faster. Documents are often stored and retrieved in a
form that is close to the data objects used in applications, which means very little
translation is required to use the data in applications. JSON is a native format that is often used to store
and query data too.
So in the document data model, each document holds key-value pairs. Below is an example:
{
"Name" : "Yashodhra",
"Email" : "[email protected]",
"Contact" : "12345"
}
This is a data model which works as a semi-structured data model in which the records and data associated
with them are stored in a single document which means this data model is not completely unstructured. The
main thing is that data here is stored in a document.
Features:
Document Type Model: Data is stored in documents rather than tables or graphs, so it
becomes easy to map to objects in many programming languages.
Flexible Schema: The overall schema is very flexible: not
all documents in a collection need to have the same fields.
Distributed and Resilient: Document data models are easily distributed, which enables
horizontal scaling and distribution of data.
Manageable Query Language: The query language allows
developers to perform CRUD (Create, Read, Update, Destroy) operations on the data model.
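The flexible schema and the indexing of particular elements described above can be sketched together in Python.
This is an in-memory illustration, not the indexing mechanism of any specific document database; the sample
documents and the `build_index` helper are made up.

```python
from collections import defaultdict

# Documents in one collection do not need to share the same fields.
documents = [
    {"Name": "Yashodhra", "Contact": "12345"},
    {"Name": "Kiran", "Email": "kiran@example.com", "City": "Delhi"},
    {"Name": "Lata", "City": "Delhi"},
]


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents that contain it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:            # documents without the field are simply skipped
            index[doc[field]].append(position)
    return index


city_index = build_index(documents, "City")

# Query through the index instead of scanning every document.
in_delhi = [documents[i]["Name"] for i in city_index.get("Delhi", [])]
print(in_delhi)
```

The index answers the query by jumping straight to the matching positions, and the first document, which has no
`City` field, never has to be examined.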
Examples:
Amazon DocumentDB
MongoDB
Cosmos DB
ArangoDB
Couchbase Server
CouchDB
Advantages:
Schema-less: These are very good at retaining existing data at massive volumes because there are
no restrictions on the format and structure of data storage.
Faster creation of document and maintenance: It is very simple to create a document, and
it requires almost no maintenance.
Open formats: It has a very simple build process that uses XML, JSON, and other such formats.
Built-in versioning: As documents grow in size they can also grow in complexity;
built-in versioning decreases conflicts.
Disadvantages:
Weak Atomicity: Support for multi-document ACID transactions is limited or missing in many document
databases. A change involving two collections will then require two separate queries, i.e. one for each
collection. This is where it breaks atomicity requirements.
Consistency Check Limitations: One can search for collections and documents that are not connected to
an author collection, but doing so can hurt database performance.
Security: Nowadays many web applications lack security which in turn results in the leakage of sensitive
data. So it becomes a point of concern, one must pay attention to web app vulnerabilities.
Applications:
Content Management: These data models are widely used for video streaming
platforms, blogs, and similar services, because each item is stored as a single document and the database is
much easier to maintain as the service evolves over time.
Book Database: These are very useful for book databases because this data
model lets us nest data.
Catalog: These data models are widely used for storing and reading catalog files because
they read quickly even when catalogs have thousands of attributes stored.
Analytics Platform: These data models are widely used in analytics platforms.
Difference between Redis and Memcached
1. Redis :
Redis is an open-source, key-value, NoSQL database. It is an in-memory data structure store: all
data is served from memory, and disk is used for persistence. It offers a unique data model and high
performance, and supports various data structures like strings, lists, sets, and hashes. It is used as a
database, cache, or message broker. It is also called a Data Structure Server. It does not support
RDBMS-style schemas, SQL, or ACID transactions.
2. Memcached :
Memcached is a simple, open-source, in-memory caching system that can be used as temporary in-memory
data storage. Data stored in memory has high read and write performance, and it can be distributed
across multiple servers. It is a key-value store of string objects held in memory, and client APIs are
available for most popular languages. Memcached is very efficient for websites.
Length of a key: In Redis, the maximum key length is 512 MB. In Memcached, the maximum key length is
250 bytes.
About NoSQL
The label NoSQL itself has a rather fuzzy definition. “NoSQL” was coined in 1998 by Carlo Strozzi as the
name for his then-new NoSQL Database, chosen simply because it doesn’t use SQL for managing data.
The term took on a new meaning after 2009 when Johan Oskarsson organized a meetup for developers to
discuss the spread of “open source, distributed, and non relational databases”
like Cassandra and Voldemort. Oskarsson named the meetup “NOSQL” and since then the term has been
used as a catch-all for any database that doesn’t employ the relational model. Interestingly, Strozzi’s
NoSQL database does in fact employ the relational model, meaning that the original NoSQL database
doesn’t fit the contemporary definition of NoSQL.
Because “NoSQL” generally refers to any DBMS that doesn’t employ the relational model, there are
several operational data models associated with the NoSQL concept. The sections that follow cover four of
them (key-value, columnar, document-oriented, and graph), but please note that this is not a comprehensive list.
Despite these different underlying data models, most NoSQL databases share several characteristics. For
one, NoSQL databases are typically designed to maximize availability at the expense of consistency. In this
sense, consistency refers to the idea that any read operation will return the most recent data written to the
database. In a distributed database designed for strong consistency, any data written to one node will be
immediately available on all other nodes; otherwise, an error will occur.
Conversely, NoSQL databases oftentimes aim for eventual consistency. This means that newly written data
is made available on other nodes in the database eventually (usually in a matter of a few milliseconds),
though not necessarily immediately. This has the benefit of improving the availability of one’s data: even
though you may not see the very latest data written, you can still view an earlier version of it instead of
receiving an error.
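The trade-off described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not how any real database replicates data: the EventuallyConsistentStore class and its replicate() method are hypothetical names, and replication is triggered by hand rather than running in the background.

```python
class EventuallyConsistentStore:
    """Toy model: a write lands on one node and reaches the others later."""

    def __init__(self, node_count):
        # Each node holds its own copy of the data.
        self.nodes = [{} for _ in range(node_count)]
        # Writes that have not yet been copied to the other nodes.
        self.pending = []

    def write(self, node_id, key, value):
        # The write is accepted as soon as a single node has stored it.
        self.nodes[node_id][key] = value
        self.pending.append((node_id, key, value))

    def read(self, node_id, key):
        # A read never raises an error; it may return an older value or None.
        return self.nodes[node_id].get(key)

    def replicate(self):
        # Stand-in for background replication: copy pending writes everywhere.
        for origin, key, value in self.pending:
            for node_id, node in enumerate(self.nodes):
                if node_id != origin:
                    node[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(node_count=3)
store.write(0, "greeting", "hello")
store.replicate()
store.write(0, "greeting", "hello again")

stale = store.read(1, "greeting")  # node 1 still serves the earlier version
store.replicate()
fresh = store.read(1, "greeting")  # node 1 has now caught up
```

A strongly consistent design would instead have to block or fail the first read on node 1 until the new value had arrived there.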
Relational databases are designed to deal with normalized data that fits neatly into a predefined schema. In
the context of a DBMS, normalized data is data that’s been organized in a way to eliminate redundancies
— meaning that the database takes up as little storage space as possible — while a schema is an outline of
how the data in the database is structured.
While NoSQL databases are equipped to handle normalized data and they are able to sort data within a
predefined schema, their respective data models usually allow for far greater flexibility than the rigid
structure imposed by relational databases. Because of this, NoSQL databases have a reputation for being a
better choice for storing semi-structured and unstructured data. With that in mind, though, because NoSQL
databases don’t come with a predefined schema that often means it’s up to the database administrator to
define how the data should be organized and accessed in whatever way makes the most sense for their
application.
Now that you have some context around what NoSQL databases are and what makes them different from
relational databases, let’s take a closer look at some of the more widely-implemented NoSQL database
models.
Key-value Databases
Key-value databases, also known as key-value stores, work by storing and managing associative arrays. An
associative array, also known as a dictionary or hash table, consists of a collection of key-value pairs in
which a key serves as a unique identifier to retrieve an associated value. Values can be anything from
simple objects, like integers or strings, to more complex objects, like JSON structures.
In contrast to relational databases, which define a data structure made up of tables of rows and columns
with predefined data types, key-value databases store data as a single collection without any structure or
relation. After connecting to the database server, an application can define a key (for
example, the_meaning_of_life) and provide a matching value (for example, 42) which can later be
retrieved the same way by supplying the key. A key-value database treats any data held within it as an
opaque blob; it’s up to the application to understand how it’s structured.
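As a rough illustration of this model, the following Python sketch wraps a dictionary in a hypothetical KeyValueStore class. The class and the session:1001 key are made up for the example and do not correspond to the API of any particular key-value database.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque bytes."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # The store does not inspect the value; it only files it under the key.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed and was removed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()

# Store and retrieve a simple value by its key.
store.set("the_meaning_of_life", b"42")
answer = int(store.get("the_meaning_of_life").decode("utf-8"))

# The application, not the store, decides that this value is JSON.
session = {"user": "sammy", "cart": [3, 7]}
store.set("session:1001", json.dumps(session).encode("utf-8"))
restored = json.loads(store.get("session:1001").decode("utf-8"))
```

Because the store only sees bytes, it cannot answer a question such as "which sessions belong to sammy" without the application reading and decoding every value itself.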
Key-value databases are often described as highly performant, efficient, and scalable. Common use cases
for key-value databases are caching, message queuing, and session management.
Database     Description
Redis        An in-memory data store used as a database, cache, or message broker, Redis supports a variety of data structures, ranging from strings to bitmaps, streams, and spatial indexes.
Memcached    A general-purpose memory object caching system frequently used to speed up data-driven websites and applications by caching data and objects in memory.
Riak         A distributed key-value database with advanced local and multi-cluster replication.
Columnar Databases
Columnar databases, sometimes called column-oriented databases, are database systems that store data in
columns. This may seem similar to traditional relational databases, but rather than grouping columns
together into tables, each column is stored in a separate file or region in the system’s storage.
The data stored in a columnar database appears in record order, meaning that the first entry in one column
is related to the first entry in other columns. This design allows queries to only read the columns they need,
rather than having to read every row in a table and discard unneeded data after it’s been stored in memory.
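The record-order layout can be sketched in a few lines of Python. This is a simplified in-memory illustration, assuming every record has the same fields; the sample data and the to_columns helper are invented for the example, and a real columnar database would keep each column in its own file or storage region.

```python
# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Oslo", "amount": 10},
    {"id": 2, "city": "Oslo", "amount": 25},
    {"id": 3, "city": "Lima", "amount": 5},
]


def to_columns(records):
    """Regroup records into one list per column, preserving record order."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column touches only that column's values.
total = sum(columns["amount"])

# Rebuilding a single record means visiting every column at the same
# position, which is why reads of individual records tend to cost more.
record_2 = {name: values[1] for name, values in columns.items()}
```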
Because the data in each column is of the same type, it allows for various storage and read optimization
strategies. In particular, many columnar database administrators implement a compression strategy such
as run-length encoding to minimize the amount of space taken up by a single column. This can have the
benefit of speeding up reads since queries need to go over fewer rows. One drawback with columnar
databases, though, is that load performance tends to be slow since each column must be written separately
and data is often kept compressed. Incremental loads in particular, as well as reads of individual records,
can be costly in terms of performance.
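Run-length encoding itself is simple enough to sketch. The snippet below is a minimal illustration using Python's itertools.groupby; the function names and sample column are hypothetical, and production systems typically combine this with other encodings.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


def count_matches(runs, target):
    """Count rows equal to target without decompressing the column."""
    return sum(count for value, count in runs if value == target)


city = ["Oslo", "Oslo", "Oslo", "Oslo", "Lima", "Lima", "Oslo"]
encoded = rle_encode(city)
oslo_rows = count_matches(encoded, "Oslo")
```

Here seven values are stored as three runs, and the count is computed by visiting three pairs instead of seven rows. The scheme pays off most when a column is sorted or has long stretches of repeated values.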
Column-oriented databases have been around since the 1960s. Since the mid-2000s, though, columnar
databases have become more widely used for data analytics since the columnar data model lends itself well
to fast query processing. They’re also seen as advantageous in cases where an application needs to
frequently perform aggregate functions, such as finding the average or sum total of data in a column. Some
columnar database management systems are even capable of using SQL queries.
Database            Description
Apache Cassandra    A column store designed to maximize scalability, availability, and performance.
Apache HBase        A distributed database that supports structured storage for large amounts of data and is designed to work with the Hadoop software library.
ClickHouse          A fault tolerant DBMS that supports real time generation of analytical data and SQL queries.
Document-oriented Databases
Document-oriented databases, or document stores, are NoSQL databases that store data in the form of
documents. Document stores are a type of key-value store: each document has a unique identifier — its key
— and the document itself serves as the value.
The difference between these two models is that, in a key-value database, the data is treated as opaque and
the database doesn’t know or care about the data held within it; it’s up to the application to understand what
data is stored. In a document store, however, each document contains some kind of metadata that provides
a degree of structure to the data. Document stores often come with an API or query language that allows
users to retrieve documents based on the metadata they contain. They also allow for complex data
structures, as you can nest documents within other documents.
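To make the contrast with a key-value store concrete, here is a small Python sketch of a document store that can look inside its values. The DocumentStore class, the lookup helper, and the dotted-path query syntax are assumptions made for this example and are not taken from any specific product.

```python
import uuid

_MISSING = object()  # sentinel meaning "this field is absent"


def lookup(document, path):
    """Follow a dotted path such as "author.name" into nested documents."""
    value = document
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


class DocumentStore:
    def __init__(self):
        self._docs = {}  # document id (the key) -> document (the value)

    def insert(self, document):
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = document
        return doc_id

    def get(self, doc_id):
        return self._docs.get(doc_id)

    def find(self, path, expected):
        """Return every document whose field at path equals expected."""
        return [doc for doc in self._docs.values() if lookup(doc, path) == expected]


store = DocumentStore()
first_id = store.insert({
    "title": "Intro to NoSQL",
    "author": {"name": "Sammy", "country": "NO"},  # nested document
    "tags": ["databases"],
})
store.insert({"title": "Graph Basics", "author": {"name": "Alex"}})

by_sammy = store.find("author.name", "Sammy")
```

The two documents have different shapes, yet both can be stored and queried, which is the flexibility the text describes.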
Unlike relational databases, in which the information of a given object may be spread across multiple tables
or databases, a document-oriented database can store all the data of a given object in a single document.
Document stores typically store data as JSON, BSON, XML, or YAML documents, and some can store
binary formats like PDF documents. Some use a variant of SQL, full-text search, or their own native query
language for data retrieval, and others feature more than one query method.
Document-oriented databases have seen an enormous growth in popularity in recent years. Thanks to their
flexible schema, they’ve found regular use in e-commerce, blogging, and analytics platforms, as well as
content management systems. Document stores are considered highly scalable, with sharding being a
common horizontal scaling strategy. They are also excellent for keeping large amounts of unrelated,
complex information that varies in structure.
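Sharding can be sketched as routing each document to one of several stores based on its key. The following Python example uses simple hash-modulo routing, which is only one possible scheme and is assumed here for illustration; the shard_for, put, and get names are hypothetical.

```python
import hashlib


def shard_for(key, shard_count):
    """Pick a shard deterministically from a hash of the document key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Each dict stands in for a separate server holding part of the data.
shards = [{} for _ in range(4)]


def put(doc_id, document):
    shards[shard_for(doc_id, len(shards))][doc_id] = document


def get(doc_id):
    # Only the one shard that owns the key has to be consulted.
    return shards[shard_for(doc_id, len(shards))].get(doc_id)


for n in range(100):
    put(f"user:{n}", {"n": n})

found = get("user:42")
```

One design caveat: with plain modulo routing, changing the number of shards moves most keys to a different shard, so systems that need to add servers smoothly often use range-based or consistent-hashing schemes instead.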
Database          Description
MongoDB           A general purpose, distributed document store, MongoDB is the world’s most widely used document-oriented database.
Apache CouchDB    A project of the Apache Software Foundation, CouchDB stores data as JSON documents and uses JavaScript as its query language.
Graph Databases
Graph databases can be thought of as a subcategory of the document store model, in that they store data in
documents and don’t insist that data adhere to a predefined schema. The difference, though, is that graph
databases add an extra layer to the document model by highlighting the relationships between individual
documents.
To better grasp the concept of graph databases, it’s important to understand the following terms:
Node: A node is a representation of an individual entity tracked by a graph database. It is more or less
equivalent to the concept of a record or row in a relational database or a document in a document store. For
example, in a graph database of music recording artists, a node might represent a single performer or band.
Property: A property is relevant information related to individual nodes. Building on our recording artist
example, some properties might be “vocalist,” “jazz,” or “platinum-selling artist,” depending on what
information is relevant to the database.
Edge: Also known as a relationship, an edge is the representation of how two nodes are related,
and is a key concept of graph databases that differentiates them from RDBMSs and document stores. Edges
can be directed or undirected.
Undirected: In an undirected graph, the edges between nodes exist just to show a connection
between them. In this case, edges can be thought of as “two-way” relationships — there’s no
implied difference between how one node relates to the other.
Directed: In a directed graph, edges can have different meanings based on which direction the
relationship originates from. In this case, edges are “one-way” relationships. For example, a
directed graph database might specify a relationship from Sammy to the Seaweeds showing that
Sammy produced an album for the group, but might not show an equivalent relationship from The
Seaweeds to Sammy.
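The terms above can be tied together in a short Python sketch. It is a toy in-memory structure, not the storage model of any real graph database; the Graph class, the edge labels, and the extra performer "Alex" are invented for the example.

```python
class Graph:
    def __init__(self):
        self.nodes = {}  # node name -> dict of properties
        self.edges = []  # (source, label, target, directed) tuples

    def add_node(self, name, **properties):
        self.nodes[name] = properties

    def add_edge(self, source, label, target, directed=True):
        self.edges.append((source, label, target, directed))

    def related(self, name, label):
        """Nodes reachable from name over edges carrying the given label."""
        found = []
        for source, edge_label, target, directed in self.edges:
            if edge_label != label:
                continue
            if source == name:
                found.append(target)
            elif target == name and not directed:
                # Undirected edges can be followed from either end.
                found.append(source)
        return found


g = Graph()
g.add_node("Sammy", role="vocalist", genre="jazz")
g.add_node("The Seaweeds", kind="band")
g.add_node("Alex", role="drummer")

# Directed: Sammy produced an album for the band, not the other way round.
g.add_edge("Sammy", "PRODUCED_ALBUM_FOR", "The Seaweeds", directed=True)
# Undirected: playing together is a two-way relationship.
g.add_edge("Sammy", "PLAYED_WITH", "Alex", directed=False)

produced_for = g.related("Sammy", "PRODUCED_ALBUM_FOR")
produced_by_band = g.related("The Seaweeds", "PRODUCED_ALBUM_FOR")
alex_played_with = g.related("Alex", "PLAYED_WITH")
```

Querying from The Seaweeds finds nothing over the directed edge, while the undirected edge is visible from both Sammy and Alex.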
Database    Description
Neo4j       An ACID-compliant DBMS with native graph storage and processing. As of this writing, Neo4j is the most popular graph database in the world.
ArangoDB    Not exclusively a graph database, ArangoDB is a multi-model database that unites the graph, document, and key-value data models in one DBMS. It features AQL (a native SQL-like query language), full-text search, and a ranking engine.
OrientDB    Another multi-model database, OrientDB supports the graph, document, key-value, and object models. It supports SQL queries and ACID transactions.
Certain operations are much simpler to perform using graph databases because of how they link and group
related pieces of information. These databases are commonly used in cases where it’s important to be able
to gain insights from the relationships between data points or in applications where the information
available to end users is determined by their connections to others, as in a social network. They’ve found
regular use in fraud detection, recommendation engines, and identity and access management applications.
MongoDB can be used in many roles: building an application
(including web and mobile), analyzing data, or administering a
MongoDB database. In all these cases we need to interact with the
MongoDB server to perform certain operations, such as entering
new data into the application, updating existing data, deleting
data, and reading data. MongoDB provides a set of basic but
essential operations that help you interact with the MongoDB
server easily. These operations are known as CRUD (create, read,
update, delete) operations.
Create Operations –
Method Description