NoSQL Database

NoSQL databases are non-relational databases that are designed to scale horizontally and handle large volumes of data across clusters of commodity servers. They provide high performance for read/write operations and are used by companies that need to store large amounts of semi-structured or unstructured data like social networks and e-commerce sites.

Uploaded by

josephowino13101
Copyright
© All Rights Reserved

NoSQL DATABASES

NoSQL stands for "Not Only SQL" or "Not SQL".

A NoSQL database is a non-relational data management system that does not require a fixed
schema. It avoids joins and is easy to scale. NoSQL databases are mainly used for distributed
data stores with very large storage needs, such as big data and real-time web applications. For
example, companies like Twitter, Facebook and Google collect terabytes of user data every
single day.

While a traditional RDBMS uses SQL syntax to store and retrieve structured data, NoSQL
covers a wide range of database technologies that can store structured, semi-structured,
unstructured and polymorphic data.

The concept of NoSQL databases became popular with Internet giants like Google, Facebook,
Amazon, etc. who deal with huge volumes of data. The system response time becomes slow
when RDBMS is used for massive volumes of data.

Characteristics of NoSQL databases

a. Non-relational- NoSQL databases do not follow the relational model. There are no tables
with flat, fixed-column records, and no need for object-relational mapping or data
normalization. Complex relational features such as a general query language, query
planners, referential integrity, joins and ACID transactions are typically absent or limited.
b. Schema-free- Unlike relational databases, NoSQL databases either are schema-free or
have relaxed schemas. They do not require the schema of the data to be defined up front
and allow heterogeneous structures of data in the same domain.
c. Simple API- NoSQL databases offer easy-to-use interfaces for storing and querying
data. The APIs allow low-level data manipulation and selection methods, often over
text-based protocols such as HTTP REST with JSON. The APIs are web-enabled so that
the databases can run as internet-facing services.
d. Distributed- Many NoSQL databases can run in a distributed fashion with provision for
auto-scaling and fail-over. The relational ACID properties are often sacrificed for
scalability and throughput. Replication between distributed nodes is usually asynchronous
rather than synchronous (multi-master replication, peer-to-peer replication, HDFS
replication), so the most important aspect is eventual consistency. A shared-nothing
architecture enables less coordination and higher distribution.

Types of NoSQL databases


NoSQL databases are mainly categorized into four types. Every category has its own
attributes and limitations. No single type is best for every problem, so users should
select the database based on their product needs.
They include:

a. Key-value pair based
b. Column-oriented
c. Graph-based
d. Document-oriented

Key Value Pair Based

This is one of the most basic types of NoSQL database. Data is stored in key/value pairs, and
the design is meant to handle lots of data and heavy load. Key-value stores keep data as a
hash table where each key is unique and the value can be JSON, a BLOB (Binary Large
Object), a string, etc.

Key-value stores help the developer to store schema-less data. They work well for data such as
shopping cart contents. Redis, Dynamo and Riak are some examples of key-value stores.
Dynamo is described in Amazon's Dynamo paper, and Riak is modelled on that design.
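
The hash-table idea described above can be sketched in a few lines of Python. This is only a toy in-memory model, not how Redis, Dynamo or Riak are implemented; the class name KeyValueStore and the key "cart:user42" are made up for illustration.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The store treats the value as an opaque blob; here it is kept as JSON text.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        # Returns True if the key existed.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.put("cart:user42", {"items": [{"sku": "A1", "qty": 2}]})

# Updating a cart means reading the whole value and writing it back under the same key.
cart = store.get("cart:user42")
cart["items"].append({"sku": "B7", "qty": 1})
store.put("cart:user42", cart)
print(store.get("cart:user42"))
```

Note that the store can only look data up by key. It has no way to ask "which carts contain sku B7" without reading every value, which is the usual trade-off of this model.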

Column-based
Column-oriented databases work on columns and are based on the Bigtable paper by Google.
Every column is treated separately, and the values of a single column are stored contiguously.

They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as
the data is readily available in a column.

Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, library card catalogs among others.

HBase, Cassandra and Hypertable are examples of column-based databases.
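
To see why aggregation is cheap when a column is stored contiguously, the following Python sketch holds the same records in a row layout and in a column layout. The sample data and function names are hypothetical, and real column stores add compression and on-disk formats that are not modelled here.

```python
# The same three records in two layouts (hypothetical sample data).
rows = [
    {"id": 1, "city": "Nairobi", "amount": 120},
    {"id": 2, "city": "Mombasa", "amount": 80},
    {"id": 3, "city": "Nairobi", "amount": 200},
]

# Column-oriented layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_row_layout(records, field):
    # Every whole record is visited even though only one field is needed.
    return sum(record[field] for record in records)


def sum_column_layout(cols, field):
    # Only the requested column is read.
    return sum(cols[field])


total_rows = sum_row_layout(rows, "amount")
total_cols = sum_column_layout(columns, "amount")
print(total_rows, total_cols)
```

Both functions return the same total. The difference is how much data has to be touched to compute it, which matters once a table has many columns and millions of rows.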

Document-Oriented

Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the value part is
stored as a document. The document is stored in JSON or XML formats. The value is understood
by the DB and can be queried.

In a relational database you have to know which columns a table has. In a document
database, data is stored like a JSON object and the columns do not have to be defined in
advance, which makes it flexible.
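
A minimal Python sketch of this idea follows. The DocumentCollection class and its find method are invented for illustration and do not correspond to the API of MongoDB, CouchDB or any other product. Documents in the same collection carry different fields, yet the store can still filter on fields inside the value.

```python
class DocumentCollection:
    """Toy document collection: key -> dict, with no declared columns."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = dict(document)

    def find(self, **criteria):
        # The store understands the value, so it can filter on fields inside it.
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


posts = DocumentCollection()
posts.insert("p1", {"title": "Hello", "author": "amina", "tags": ["intro"]})
posts.insert("p2", {"title": "Sharding", "author": "amina", "views": 10})
posts.insert("p3", {"title": "CAP", "author": "joe"})

by_amina = posts.find(author="amina")
print(by_amina)
```

Only "p2" has a views field and only "p1" has tags, and no schema change was needed to store either.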

The document type is mostly used for CMS systems, blogging platforms, real-time analytics & e-
commerce applications. It should not be used for complex transactions which require multiple
operations or queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, Lotus Notes are popular document-oriented
DBMS systems.

Graph-Based

A graph type database stores entities as well as the relations amongst those entities. The entity is
stored as a node with the relationship as edges. An edge gives a relationship between nodes.
Every node and edge has a unique identifier.

Unlike a relational database, where tables are only loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast because they are already captured
in the database and there is no need to calculate them.

Graph-based databases are mostly used for social networks, logistics, spatial data among others.
Neo4j, InfiniteGraph, OrientDB and FlockDB are some popular graph-based databases.
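
The point about stored relationships can be illustrated with a small Python sketch. Here the graph is a plain adjacency list and the traversal is a breadth-first search. The node names and the FOLLOWS relationship are hypothetical, and a real graph database would use its own storage and query language.

```python
from collections import deque

# Edges are stored explicitly: node -> list of (relationship, neighbour).
graph = {
    "alice": [("FOLLOWS", "bob"), ("FOLLOWS", "carol")],
    "bob": [("FOLLOWS", "dave")],
    "carol": [("FOLLOWS", "dave"), ("FOLLOWS", "erin")],
    "dave": [],
    "erin": [],
}


def within_hops(start, max_hops):
    """Breadth-first traversal returning nodes reachable in 1..max_hops hops."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for _relationship, neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, depth + 1))
    return reached


print(within_hops("alice", 2))
```

Each hop is a direct lookup of a node's edges, so no join between tables has to be computed to follow a relationship.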

The CAP Theorem


The CAP theorem is also called Brewer's theorem. It states that it is impossible for a
distributed data store to offer more than two out of three guarantees, i.e. Consistency,
Availability and Partition Tolerance.

Consistency- The data should remain consistent even after the execution of an operation. This
means once data is written, any future read request should return that data. For example, after
updating the order status, all the clients should be able to see the same data.

Availability- The database should always be available and responsive. It should not have any
downtime.

Partition Tolerance- Partition Tolerance means that the system should continue to function even
if the communication among the servers is not stable. For example, the servers can be partitioned
into multiple groups which may not communicate with each other. Here, if part of the database is
unavailable, the other parts should remain unaffected.

At any given time, you must give something up, i.e. consistency, availability or partition
tolerance. Because network partitions cannot be ruled out in a distributed system, the practical
choice during a partition is between consistency and availability.
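
The trade-off can be made concrete with a toy Python model of two replicas. It is a deliberately simplified sketch, not a real replication protocol. The class names and the "CP"/"AP" mode labels are assumptions made for the example.

```python
class Replica:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store; mode is "CP" or "AP" (labels chosen for this sketch)."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Replica(), Replica()]
        self.partitioned = False

    def write(self, node_index, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency preferred: refuse the write rather than let replicas diverge.
                raise RuntimeError("unavailable during partition")
            # Availability preferred: accept locally, replicas may now disagree.
            self.nodes[node_index].data[key] = value
            return
        for node in self.nodes:
            node.data[key] = value

    def read(self, node_index, key):
        return self.nodes[node_index].data.get(key)


ap = TwoNodeStore("AP")
ap.write(0, "order:1", "placed")
ap.partitioned = True
ap.write(0, "order:1", "shipped")
ap_reads = (ap.read(0, "order:1"), ap.read(1, "order:1"))

cp = TwoNodeStore("CP")
cp.write(0, "order:1", "placed")
cp.partitioned = True
try:
    cp.write(0, "order:1", "shipped")
    cp_write_accepted = True
except RuntimeError:
    cp_write_accepted = False

print(ap_reads, cp_write_accepted)
```

During the partition the AP store keeps accepting writes but its two nodes return different values, while the CP store keeps its nodes in agreement by rejecting the write.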

Database Sharding

Database sharding is the process of storing a large database across multiple machines. A single
machine, or database server, can store and process only a limited amount of data. Database
sharding overcomes this limitation by splitting data into smaller chunks, called shards, and
storing them across several database servers. All database servers usually have the same
underlying technologies, and they work together to store and process large volumes of data.

By distributing the data across multiple machines, the storage capacity of the system is increased.
In addition, a sharded database can handle more requests than a single machine can.

Sharding is a form of scaling known as horizontal scaling or scale-out since additional nodes are
brought on to share the load. Horizontal scaling allows for near-limitless scalability to handle big
data and intense workloads. In contrast, vertical scaling refers to increasing the power of a single
machine or single server through a more powerful CPU, increased RAM, or increased storage
capacity.
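
One common way to decide which shard holds a record is to hash its key. The Python sketch below assumes four shards and simple modulo placement. The function names, the key format and the shard count are illustrative and are not taken from any particular database.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for the sketch


def shard_for(key, num_shards=NUM_SHARDS):
    # hashlib gives a hash that is stable across processes, unlike the built-in hash() for str.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for a separate database server.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    return shards[shard_for(key)].get(key)


for user_id in range(1000):
    put(f"user:{user_id}", {"id": user_id})

print([len(s) for s in shards])
```

With plain modulo placement, changing the number of shards moves most keys to a different shard, which is why production systems often use schemes such as consistent hashing or range-based partitioning instead.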

Advantages of Sharding

• Increased read/write throughput: by distributing the dataset across multiple shards,
both read and write capacity is increased as long as read and write operations
are confined to a single shard.
• Increased storage capacity: by increasing the number of shards, you can also increase
overall storage capacity, allowing near-infinite scalability.
• High availability: shards provide high availability in two ways. First, where each shard
is deployed as a replica set, every piece of data is replicated. Second, even if an entire
shard becomes unavailable, the database as a whole remains partially functional because
the rest of the data is held on other shards.

Disadvantages of sharding

• Query overhead: each sharded database must have a separate machine or service which
understands how to route a querying operation to the appropriate shard. This introduces
additional latency on every operation. Furthermore, if the data required for the query is
horizontally partitioned across multiple shards, the router must then query each shard and
merge the results together. This can make an otherwise simple operation quite expensive
and slow down response times.
• Complexity of administration: with a single unsharded database, only the database
server itself requires upkeep and maintenance. With every sharded database, on top of
managing the shards themselves, there are additional service nodes to maintain. Plus, in
cases where replication is being used, any data updates must be mirrored across each
replicated node. Overall, a sharded database is a more complex system which requires
more administration.
• Increased infrastructure costs: sharding by its nature requires additional machines and
compute power over a single database server. While this allows your database to grow
beyond the limits of a single machine, each additional shard comes with higher costs. The
cost of a distributed database system, especially if it is missing the proper optimization,
can be significant.
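
The query overhead described in the first point can be sketched in Python. The example assumes orders are partitioned by customer id across three shards. The data, the placement rule in shard_of and the function names are all hypothetical.

```python
# Hypothetical orders partitioned by customer id across three shards.
shards = [
    [{"customer": "c1", "total": 30}, {"customer": "c4", "total": 75}],
    [{"customer": "c2", "total": 120}],
    [{"customer": "c3", "total": 55}, {"customer": "c6", "total": 90}],
]


def shard_of(customer):
    # Assumed placement rule that matches the sample data above.
    return (int(customer[1:]) - 1) % len(shards)


def orders_for(customer):
    # Single-shard query: the router touches one shard only.
    rows = [o for o in shards[shard_of(customer)] if o["customer"] == customer]
    return rows, 1


def orders_over(minimum):
    # Cross-shard query: every shard is asked, then the partial results are merged.
    partials = [[o for o in shard if o["total"] >= minimum] for shard in shards]
    merged = sorted(
        (o for part in partials for o in part),
        key=lambda o: o["total"],
        reverse=True,
    )
    return merged, len(shards)


single, single_touched = orders_for("c2")
big, big_touched = orders_over(70)
print(single_touched, big_touched, [o["customer"] for o in big])
```

A lookup by the partitioning key touches one shard, while a query on any other field has to visit all of them and merge the answers in the router.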

Benefits of NoSQL databases

1. Horizontally scalable and cheap

While SQL data is stored as rows grouped inside tables, NoSQL data is stored as individual,
self-contained units inside various structures like documents, dictionaries, columns, and graphs.
This allows NoSQL data to reside on multiple machines. As data quantities increase, NoSQL
databases can grow by adding more, smaller machines (horizontal scaling), unlike their SQL
counterparts, which typically require expensive hardware upgrades (vertical scaling). This makes
the NoSQL database well suited for big-data scenarios like storing continuous data from IoT
sensors.

2. High performance, i.e. millions of transactions per second
NoSQL databases aren’t saddled with the responsibility of ensuring data integrity, which means
they don’t have to validate the data's integrity when it enters the system. This is especially useful
in scenarios like logging, where write speed is a key metric, and data integrity checks are not
required.

3. Flexibility of data types


NoSQL databases can ingest unstructured and semi-structured data at scale, unlike their SQL
counterparts, which define data types for every column before data can enter the system using
constraints. Not having these checks in NoSQL means developers can save a lot of time not
having to define data types and relationships for every single column. It also means that if the
structure of incoming data changes, it can go into the same database without developers getting
involved.

4. Availability and redundancy: minimal downtime

NoSQL databases replicate data in near real time across their nodes. Each node can
have multiple replicas, which can take over if the primary node fails and start fielding
read requests from client applications. Furthermore, nodes and replicas can be spread across
diverse geographic locations, which helps keep the system available in the event of a server
crash, power loss to one or more nodes, or a natural disaster.

5. Ease of use via APIs


NoSQL provides access to stored data objects via object-based APIs, as opposed to SQL, which
is a mathematical, set-based language. Many enterprise developers lack the experience to
develop complex logic in SQL — they prefer to call an API and let it figure out how to fetch the
data. This makes it easier and faster to scale.

6. No set schema, provides support for semi-structured and unstructured data


NoSQL is schemaless, which means databases can store data of any type. This even means
support for types of data not invented yet, providing a level of future-proofing that SQL-based
databases just can't offer. This also means no data truncation (SQL-based systems require that
you specify the length of every data point).

7. Automatic data expiration


NoSQL databases allow enterprises to define the age of any data entering the system. The ability
to define an age limit for data is usually attributed to the BASE properties of NoSQL databases
and is commonly exposed as a time-to-live (TTL) setting, whereas implementing the same feature
in a SQL-based system would typically require coding. This feature may come in handy when
collecting log data in databases, where we may not be interested in something that happened
more than a specific amount of time ago.
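
A rough Python sketch of data expiration is shown below. It models a per-entry time-to-live with lazy deletion on read. The ExpiringStore class, its ttl_seconds parameter and the fake clock are assumptions for the example, and real databases implement expiry in their own ways.

```python
import time


class ExpiringStore:
    """Toy key-value store where each entry carries an expiry time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behaviour can be demonstrated deterministically
        self._data = {}

    def put(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired data on access
            return None
        return value


now = [0.0]  # fake clock so the demo does not have to sleep
logs = ExpiringStore(clock=lambda: now[0])
logs.put("log:1", "user signed in", ttl_seconds=60)
fresh = logs.get("log:1")

now[0] = 61.0  # pretend 61 seconds have passed
expired = logs.get("log:1")
print(fresh, expired)
```

The application never issues a delete. The entry simply stops being returned once its age limit has passed.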

8. Specialized data type support: key value, document, graph and wide column
NoSQL databases can store data in various ways. Document databases, for example, store data
in XML, JSON, or BSON format, whichever is closer to the data objects used by your
application. Typical use cases for document databases include trading platforms, mobile app
development, and e-commerce platforms.

Drawbacks of NoSQL Databases

1. Purpose specific
SQL databases can do much more than just storing data. They can ensure data integrity and can
be used to calculate analytics, pull data using complex joins, and quickly and efficiently retrieve
large quantities of data. They are more portable and come with well-defined and understood
standards. NoSQL databases are not general in this way. They are purpose-built to do one thing
well, meaning you will have to dedicate your NoSQL database for one use case only and would
have to create a second database for a different purpose. If you are planning to use your database
for various use cases, NoSQL is less likely to be the right candidate.

2. Small community
SQL databases have been around for about half a century, which means that there is a decent
talent pool: People with deep SQL experience are relatively easy to find. That is not the case
with NoSQL databases, which have only been around for about a decade. Furthermore, NoSQL
databases generally have specific APIs, and fewer people have deep experience with any specific
API. Finding people that can run your NoSQL setup can be a challenge.

3. Relatively large
In terms of storage utilization, NoSQL databases are fast but relatively inefficient compared to
SQL-based systems. NoSQL databases are redundant by design because they can create multiple
copies of data internally. Schema metadata gets stored in every document which also increases
space usage.

4. Newer and Less Mature


SQL databases have had more than half a century of adoption and improvement, and the
technology is quite mature. NoSQL databases have only been around for a fraction of the time of
SQL-based systems, which means NoSQL database offerings aren’t mature enough yet and
stability is sometimes an issue.

5. No universal Query Language and No Joins


Relational database systems rely on SQL, which has been around for half a century. The
language is mature and well understood by a large cadre of software professionals. Joins give us
the ability to pull information from multiple related tables. Because SQL allows joins,
queries are flexible and can fetch data in various ways to extract detailed meaning from raw data.

NoSQL APIs generally do not offer the flexibility that joins provide. They are not as
prevalent, and the talent pool of professionals comfortable enough to work with these APIs
is not deep. While NoSQL databases provide a fast way to store and retrieve data, a lot of the
work of number crunching is left mostly to developer tools. Developers can make API calls
to fetch the data into memory first, then use tools like Python to do the number crunching
in memory and calculate results. Relating information inside the NoSQL database is hard and
generally discouraged, which is a big disadvantage.
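
As an illustration of this pattern, the Python sketch below stands in for two separate fetches (users and orders) and relates them in application code. The lists are hard-coded placeholders for API results, and all names are hypothetical.

```python
from collections import defaultdict

# Stand-ins for the results of two separate API calls (hypothetical data).
users = [
    {"id": "u1", "name": "Amina"},
    {"id": "u2", "name": "Joseph"},
]
orders = [
    {"user_id": "u1", "total": 40},
    {"user_id": "u2", "total": 15},
    {"user_id": "u1", "total": 60},
]

# Step 1: aggregate order totals per user in memory.
spend = defaultdict(int)
for order in orders:
    spend[order["user_id"]] += order["total"]

# Step 2: the "join" is a dictionary lookup done by the application.
report = {user["name"]: spend[user["id"]] for user in users}
print(report)
```

In SQL the same result would be a single query with a JOIN and GROUP BY. Here both datasets must fit in the application's memory and the join logic has to be written and maintained by the developer.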

6. Lack of Data Integrity safeguards


Relational database systems offer better data integrity at the cost of speed. NoSQL databases
generally have few built-in mechanisms to prevent bad data from entering the system. If users
wish to build in this functionality, they must write custom logic, which can get tedious.
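
The kind of custom logic meant here might look like the following Python sketch, which checks a document before it is written. The field names, the allowed status values and the insert_order helper are assumptions for the example, and the dict named collection stands in for the database.

```python
def validate_order(doc):
    """Return a list of problems; an empty list means the document is accepted."""
    problems = []
    order_id = doc.get("order_id")
    if not isinstance(order_id, str) or not order_id:
        problems.append("order_id must be a non-empty string")
    quantity = doc.get("quantity")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        problems.append("quantity must be a positive integer")
    if doc.get("status") not in {"placed", "shipped", "delivered"}:
        problems.append("status is not a known value")
    return problems


collection = {}  # stand-in for a schemaless collection


def insert_order(doc):
    problems = validate_order(doc)
    if problems:
        raise ValueError("; ".join(problems))
    collection[doc["order_id"]] = doc


insert_order({"order_id": "o1", "quantity": 2, "status": "placed"})
bad = validate_order({"order_id": "", "quantity": "two", "status": "lost"})
print(len(collection), bad)
```

In a relational database, column types and constraints would reject the second document automatically. Here every rule has to be written and kept up to date by the application.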

7. Each NoSQL implementation has its own syntax


Relational and NoSQL databases both have a variety of vendors. However, the learning curve for
switching relational database vendors is not all that steep because all vendors offer a flavor of
SQL, a language that is widely understood and adopted. On the other hand, NoSQL databases
rely on APIs developed and customized by the vendors. These APIs can vary greatly from
vendor to vendor, making it difficult to switch vendors once users are locked in.

8. Potential data retrieval inconsistencies


SQL and NoSQL models differ widely when it comes to reading rapidly changing data. SQL
databases use isolation levels, i.e. users can define whether they are OK reading stale data or
must have the latest data while accepting that the queries will be delayed. NoSQL systems built
on eventual consistency generally offer fewer such controls. Users may see data in a stale state
until it gets updated on all nodes.
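
A stale read of this kind can be reproduced with a toy Python model in which a write is acknowledged before it reaches the replica. The LaggingReplicaSet class and its replicate method are invented for the sketch and do not describe a specific product.

```python
from collections import deque


class LaggingReplicaSet:
    """Toy primary plus one replica; replication happens only when replicate() is called."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))  # acknowledged before replication

    def read_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        # Apply queued writes to the replica in the order they were made.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


db = LaggingReplicaSet()
db.write("balance", 100)
db.replicate()

db.write("balance", 70)
stale = db.read_replica("balance")  # the replica has not caught up yet
db.replicate()
converged = db.read_replica("balance")
print(stale, converged)
```

The replica eventually returns the new value, but a client that reads from it in the gap sees the old one.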

Conclusion

With databases forming the core of most applications, it’s essential to choose the model that fits
your requirements. Selecting the right model depends not just on speed and accuracy
requirements but on your development team’s skillset as well.

NoSQL databases are a great fit when you’re dealing with unstructured or semi-structured data,
such as IoT sensor data, where logging data quickly is more important than waiting for
confirmation that the data is good. It’s also great when dealing with huge volumes of data in
applications where speed is more important than accuracy. NoSQL is a good fit when you have
developers comfortable working with NoSQL APIs, and you are trying to minimize downtime
when servicing clients.

SQL databases are a good fit when speed is not the main requirement but accuracy, consistency
and integrity are important. Think about a banking application, where we'd rather wait a bit
than show an incorrect account balance! They are also well suited to situations where developers
are comfortable working with SQL and when you are trying to use the same database to perform a
variety of tasks, including complex joins to extract meaning from your data.
