
NoSQL Databases

NoSQL, which stands for Not Only SQL, refers to non-relational databases that can store structured, unstructured, and semi-structured data, offering flexibility, scalability, and global availability. They are categorized into four main types: Document, Key-value, Wide-column, and Graph databases, each with its own advantages and use cases. NoSQL databases are particularly useful in scenarios involving constantly changing data, large data volumes, and cases where strict data consistency is not a priority.

Uploaded by

pichedekho3
Copyright
© All Rights Reserved

NoSQL databases

NoSQL stands for Not Only SQL. NoSQL databases are non-relational, and they
can store data in an unstructured format.
[Figure: the five main features of NoSQL databases]

Why are NoSQL databases important?


NoSQL databases have become popular in the industry because of the
following benefits:
• Multi-mode data: NoSQL databases offer more flexibility than traditional
SQL databases because they can store structured (e.g. data captured
from sensors), unstructured (images, videos, etc.), and semi-structured
(XML, JSON, etc.) data.
• Easy scalability: many NoSQL databases use distributed, peer-to-peer
architectures, so capacity can be increased by adding more machines to
the cluster.
• Global availability: the database is shared globally, so the same data
can be accessed simultaneously from different machines in different
geographical zones.
• Flexibility: NoSQL databases can rapidly adapt to changing requirements
with frequent updates and new features.
NoSQL Databases vs. SQL Databases

Language
SQL databases: SQL databases use the Structured Query Language (SQL) to
perform operations, which requires a predefined schema to interact with the
data.
NoSQL databases: NoSQL databases, on the other hand, query data against a
dynamic schema. Some NoSQL databases also use SQL-like syntax for document
manipulation.

Data Schema
SQL databases: SQL databases have a predefined and fixed format. New data
must fit that format unless the schema itself is altered first.
NoSQL databases: NoSQL databases are more flexible. Records can be created
without a predefined structure, and each record can have its own structure.

Scalability
SQL databases: SQL databases are typically scaled vertically, meaning that
a single machine needs more CPU, RAM, or SSD capacity to meet the demand.
NoSQL databases: NoSQL databases are horizontally scalable, meaning that
additional machines are added to the existing infrastructure to satisfy the
storage demand.

Big Data Support
SQL databases: Vertical scaling makes it difficult for SQL databases to
store very big data (petabytes).
NoSQL databases: Horizontal scaling and a dynamic data schema make NoSQL
suitable for big data. NoSQL databases were also developed by top internet
companies (Amazon, Google, Yahoo, etc.) to face the challenges of the
rapidly increasing amount of data.

Properties
SQL databases: SQL databases follow the ACID (Atomicity, Consistency,
Isolation, Durability) properties.
NoSQL databases: NoSQL databases, on the other hand, are usually described
in terms of the CAP (Consistency, Availability, Partition Tolerance)
theorem.
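The Data Schema difference can be made concrete with a short sketch. It uses Python's built-in sqlite3 module for the SQL side and a plain list of dictionaries as a stand-in for a schemaless store. The table, column names, and records are hypothetical.

```python
import sqlite3

# SQL side: the table's columns are fixed when it is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT NOT NULL, age INTEGER)")
conn.execute("INSERT INTO users (name, age) VALUES (?, ?)", ("Ada", 36))

try:
    # "nickname" is not a column of the table, so this insert is rejected
    # until the schema itself is changed.
    conn.execute(
        "INSERT INTO users (name, nickname) VALUES (?, ?)", ("Bob", "bobby")
    )
    sql_accepted = True
except sqlite3.OperationalError as err:
    sql_accepted = False
    print("SQL insert rejected:", err)

# Schemaless stand-in: each record carries its own structure.
users = [{"name": "Ada", "age": 36}]
users.append({"name": "Bob", "nickname": "bobby"})
print(users)
```

The flexibility has a cost: nothing in the second half stops a record from omitting a field that the application expects, so validation moves into application code.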

When should NoSQL databases be used?


In this fast-growing and competitive environment, industries need to collect as
much data as possible to satisfy their business goals. Collecting data is one
thing, but storing it in the right infrastructure is another challenge. The
difficulty arises because data can be of different types, such as images,
videos, text, and sounds. Using relational databases to store these different
data types is not always a smart move. However, the question remains:
When to use NoSQL instead of SQL?
You should consider using NoSQL in any of the following scenarios:
• Constantly changing data: when you do not know how your system or
applications will grow in the future, meaning that you might want to add
new data types, new functions, etc.
• A lot of data: when your business is dealing with huge volumes of data
that might grow over time.
• No strict consistency: when data consistency and 100% integrity are not
your priority. For example, if you develop a social media platform for
your business, it might not be an issue if not all employees see a new
post at the same moment.
• Scalability and cost: NoSQL databases allow greater flexibility and can
help control costs as your data needs change.
4 Main Types of NoSQL Databases
NoSQL databases are divided into four main categories. Each one has its own
characteristics, so you should choose the one that best fits your use case.
This section covers each category by describing its role and giving a
non-exhaustive list of its advantages, limitations, and use cases.

1. Document Databases
This type of database is designed to store and query documents in formats such
as JSON, XML, and BSON. Each document is a row or a record in the database and
is in the key-value format. A document stores information about one object and
its related data. For instance, consider a database that contains three
records, each giving information about a student. In the first document,
firstname is a key, and Franck is its value.
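As a rough illustration, the student records could look like the sketch below, where a list of Python dictionaries stands in for a document collection. Only the pair firstname / Franck comes from the text; every other field, name, and value is made up, and the find helper is a hypothetical stand-in for a real query API.

```python
import json

# A document collection modeled as a list of dicts. Only
# firstname -> "Franck" comes from the text; the rest is illustrative.
students = [
    {"firstname": "Franck", "lastname": "Doe", "age": 22},
    {"firstname": "Aisha", "lastname": "Khan", "major": "Physics"},
    {"firstname": "Luis", "age": 24, "clubs": ["chess", "running"]},
]

def find(collection, **criteria):
    """Return the documents whose fields match every key=value given."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

# Each document is self-describing: keys can differ between records.
first = students[0]
print(first["firstname"])

# Adding a field to one document leaves the other documents untouched.
first["email"] = "franck@example.com"
print(json.dumps(find(students, age=24), indent=2))
```

Note that the three documents do not share the same set of keys, which is exactly what a fixed relational schema would not allow without a schema change.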
Document Database Advantages
• Schemaless: there are no restrictions on the format and structure of the
stored data. This is especially beneficial when the data model changes
continuously.
• Easy to update: a field can be added or deleted without changing the
rest of the existing fields of that specific document.
• Improved performance: all the information about an object can be found
in a single document. There is no need to refer to external information,
which might not be the case for a relational database where the user
might have to query other tables.
Document Database Limitations
• Consistency check issues: documents do not necessarily need to have a
relationship with one another, and two documents can have different
fields, which makes consistency harder to check.
• Atomicity issues: if a change spans two collections of documents, a
separate query must be run for each collection.
When to Use Document Databases
• Recommended when your data schema is subject to constant changes in
the future.
Document Database Applications
• Because of their flexibility, document databases can be practical for
online user profiles, where different users can have different types of
information. In this case, each user’s profile is stored only by using
attributes that are specific to them.
• They can be used for content management, which requires effective
storage of data from a variety of sources. That information can then be
used to create and incorporate new types of content.
2. Key-value Databases
These are the simplest type of NoSQL database. Every item is stored in the
database as a key-value pair. We can think of it as a table with exactly two
columns. The first column contains a unique key. The second column is the
value for each key. The values can be of simple data types, such as integer,
string, and float, or of more complex data types, such as image and document.
For example, a key-value database of customers could use each customer's phone
number as the key and their monthly purchase as the value.
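A minimal sketch of the customer example, using a Python dictionary as the key-value store. The phone numbers and amounts are invented, and the kv_get and kv_put helpers are hypothetical names that mirror the two basic operations such a store offers.

```python
# Key: customer phone number. Value: monthly purchase amount.
# All numbers below are made up for illustration.
monthly_purchase = {
    "+1-202-555-0101": 120.50,
    "+1-202-555-0145": 89.99,
    "+1-202-555-0178": 310.00,
}

def kv_get(store, key, default=None):
    """Look up a value by its exact key."""
    return store.get(key, default)

def kv_put(store, key, value):
    """Insert a new pair or overwrite the value of an existing key."""
    store[key] = value

kv_put(monthly_purchase, "+1-202-555-0199", 42.00)
print(kv_get(monthly_purchase, "+1-202-555-0101"))
print(kv_get(monthly_purchase, "+1-202-555-0000"))  # unknown key
```

Every access goes through the key. There is no operation here for asking "which customers spent more than 100?" without reading every entry.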
Key-value Database Advantages
• Simplicity: the key-value structure is straightforward. Values are not
tied to a declared data type, which makes the database simple to use.
• Speed: the simple data format makes read and write operations faster.
Key-value Database Limitations
• They cannot filter on the value column, because a lookup returns
everything stored in the value field as a single unit.
• They are optimized for a single key and a single value. Storing multiple
values under one key would require a parser.
• The value is updated only as a whole, which requires getting the
complete data, performing the required processing on that data, and
finally storing back the whole data. This might create a performance
issue when the processing requires a lot of time.
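The limitations above can be seen in a small sketch where values are stored as opaque bytes, as many key-value stores treat them. The key format, the record contents, and the put_json / get_json helpers are assumptions made for this example.

```python
import json

store = {}  # key -> opaque bytes; the store does not look inside values

def put_json(key, obj):
    """Serialize the whole object and store it under the key."""
    store[key] = json.dumps(obj).encode("utf-8")

def get_json(key):
    """Fetch the whole value and parse it back into an object."""
    return json.loads(store[key].decode("utf-8"))

put_json("customer:1", {"name": "Ada", "purchases": [20.0, 35.5]})

# Changing one field means: read everything, modify, write everything back.
record = get_json("customer:1")
record["purchases"].append(12.0)
put_json("customer:1", record)

# Filtering on values means scanning and decoding every entry ourselves.
big_spenders = [k for k in store if sum(get_json(k)["purchases"]) > 50]
print(big_spenders)
```

The JSON encoding plays the role of the parser mentioned above: the store itself only sees one key and one blob of bytes.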
When to Use Key-value Databases
• Suited to applications based on simple key-based queries.
• Useful for simple applications that need to temporarily store simple
objects, such as a cache.
• Also useful when there is a need for real-time data access.
Applications
• They are well suited to simple applications that need to temporarily
store simple objects, such as caches.
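To make the cache use case concrete, here is a minimal in-memory sketch of a key-value cache whose entries expire after a fixed time. The TTLCache class, its method names, and the injectable clock are hypothetical and written for this example; they do not reproduce the API of any real cache server.

```python
import time

class TTLCache:
    """Tiny key-value cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be simulated
        self._items = {}             # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._items[key]
            return default
        return value

# Simulated clock so the example does not need to sleep.
now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
cache.set("session:42", {"user": "ada"})
print(cache.get("session:42"))   # still fresh
now[0] = 31.0
print(cache.get("session:42"))   # expired, so the default is returned
```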
3. Wide-column Databases
As the name suggests, wide-column (column-oriented) databases store data as a
collection of columns, where each column is treated separately. The
implementation logic is based on Google's Bigtable paper. They are mostly used
for analytical workloads such as business intelligence, data warehouse
management, and customer relationship management.
For instance, we can quickly get the average age of customers and the average
price of products by applying the aggregation function AVG to the
corresponding column.
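The column-at-a-time idea can be sketched in plain Python by storing each column as its own list. The tables, names, and values below are invented, and the avg helper only imitates what an AVG aggregation would do; it is not the query interface of any wide-column product.

```python
from statistics import mean

# Each column is stored as its own list (hypothetical data).
customers_by_column = {
    "name": ["Ana", "Ben", "Chloe"],
    "age": [31, 45, 26],
}
products_by_column = {
    "name": ["keyboard", "monitor", "mouse"],
    "price": [40.0, 180.0, 20.0],
}

def avg(table, column):
    """Average one column without touching the table's other columns."""
    return mean(table[column])

print("average age:", avg(customers_by_column, "age"))
print("average price:", avg(products_by_column, "price"))
```

Because the aggregation reads only the one list it needs, the cost does not grow with the number of other columns in the table, which is the property that makes this layout attractive for analytical queries.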

4. Graph/node Databases
Graph databases are used to store, map, and search relationships between
nodes through edges. A node represents a data element, also called an object
or entity. Each node can have incoming and outgoing edges. An edge represents
the relationship between two nodes, and edges can carry properties that
describe how the nodes they connect are related.
For example, the following statement can be represented as a graph:
“Zoumana studies at Texas Tech University. He likes to run at the Park inside
the University”
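One possible way to model that statement is sketched below with plain Python structures. The node labels and the relationship names (STUDIES_AT, LIKES_TO_RUN_AT, LOCATED_INSIDE) are assumptions chosen for this example; a real graph database would have its own data model and query language.

```python
# Nodes: one entry per entity mentioned in the sentence.
nodes = {
    "Zoumana": {"label": "Person"},
    "Texas Tech University": {"label": "University"},
    "Park": {"label": "Place"},
}

# Edges: (source node, relationship, destination node).
edges = [
    ("Zoumana", "STUDIES_AT", "Texas Tech University"),
    ("Zoumana", "LIKES_TO_RUN_AT", "Park"),
    ("Park", "LOCATED_INSIDE", "Texas Tech University"),
]

def outgoing(node):
    """Relationships that start at the given node."""
    return [(rel, dst) for src, rel, dst in edges if src == node]

def incoming(node):
    """Relationships that end at the given node."""
    return [(src, rel) for src, rel, dst in edges if dst == node]

print(outgoing("Zoumana"))
print(incoming("Texas Tech University"))
```

Here each lookup scans the whole edge list. Graph databases are built to avoid that scan, which is why they can retrieve relationships quickly even in large graphs.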
Graph/node Database Advantages
• They have an agile and flexible structure.
• The relationships between nodes in the database are explicit and
human-readable, and thus easy to understand.
Graph/node Database Limitations
• Query languages vary across platforms, so queries written for one graph
database often do not carry over to another.
• This fragmentation can make it difficult to find support online when
facing an issue.
When to Use Graph/node Databases
• They can be used when you need to create relationships between data
elements and be able to quickly retrieve those relationships.
Applications
• They can be used to perform sophisticated fraud detection in real-time
financial transactions.
• They can be used for mining data from social media. For instance,
LinkedIn uses a graph database to identify which users follow each other,
and the relationship between those users and their expertise (for
example, ML Engineer).
• Network mapping can be a great fit for representation as a graph, since
those networks map relationships between hardware and the services
they support.
7 Best NoSQL Databases for Data Science
Now that you have a better knowledge of NoSQL databases, let’s look at a list
of NoSQL databases that are popular for data science projects. This analysis
focuses only on open-source NoSQL databases.

1. MongoDB
MongoDB is an open-source document-oriented database that stores data in a
JSON-like format (BSON). It is one of the most commonly used NoSQL databases
and was designed for high availability and scalability, providing
auto-sharding and built-in replication.
Our Introduction to MongoDB course covers the use of MongoDB with Python.
It helps you acquire the skills to manipulate and analyze flexibly structured
data with MongoDB. Uber, LaunchDarkly, Delivery Hero, and about 4,300 other
companies reportedly use MongoDB in their tech stack.
2. Cassandra
Cassandra is an open-source wide-column database. It can distribute your
data across multiple machines and automatically repartition it as you add new
machines to your infrastructure. Uber, Facebook, Netflix, and 506 other
companies reportedly use it in their tech stack.
3. Elasticsearch
Similar to MongoDB, Elasticsearch is an open-source document-oriented
database. It is a widely used search and analytics engine focusing on
scalability and speed. Uber, Shopify, Udemy, and about 3,760 other companies
reportedly use it in their stack.
4. Neo4j
Neo4j is an open-source graph-oriented database. It is mainly used to deal
with growing volumes of highly connected data. Around 220 companies
reportedly use it in their tech stack.
5. HBase
HBase is a distributed, column-oriented database. It provides capabilities
similar to Google’s Bigtable on top of Apache Hadoop. Reportedly, 81
companies use HBase in their tech stack.
6. CouchDB
CouchDB is also an open-source document-oriented database that collects and
stores data in a JSON format. Around 84 companies reportedly use it in their
tech stack.
7. OrientDB
Also an open-source database, OrientDB is a multi-model database supporting
graph, document, key-value, and object models. Only 13 companies reportedly
use it in their tech stack.
