Unit 4-1


UNIT 4

Introduction to NoSQL
NoSQL is a type of database management system (DBMS) that is designed to
handle and store large volumes of unstructured and semi-structured data.
Unlike traditional relational databases that use tables with
pre-defined schemas to store data, NoSQL databases use flexible data
models that can adapt to changes in data structures and are capable of
scaling horizontally to handle growing amounts of data.
The term NoSQL
originally referred to “non-SQL” or “non-relational” databases, but the
term has since evolved to mean “not only SQL,” as NoSQL databases have
expanded to include a wide range of different database architectures and
data models.

NoSQL databases are generally classified into four main categories:
1. Document databases: These
databases store data as semi-structured documents, such as JSON or XML,
and can be queried using document-oriented query languages.

2. Key-value stores: These databases store data as key-value pairs, and are
optimized for simple and fast read/write operations.

3. Column-family stores: These databases store data as column families, which are sets of columns that are treated as a single entity. They are optimized for fast and efficient querying of large amounts of data.

4. Graph databases: These databases store data as nodes and edges, and are
designed to handle complex relationships between data.
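
To make the four categories concrete, the sketch below stores one illustrative fact ("Asha works in Engineering") in each model using plain Python data structures. The names and layouts are hypothetical and only mimic the shape of each model; no real database is involved.

```python
import json

# 1. Document: a self-describing, possibly nested record.
document = {"_id": "emp1", "name": "Asha", "dept": {"name": "Engineering"}}

# 2. Key-value: the store sees an opaque value, looked up only by its key.
key_value = {"emp1": json.dumps({"name": "Asha", "dept": "Engineering"})}

# 3. Column family: row key -> column family -> column -> value.
column_family = {
    "emp1": {"profile": {"name": "Asha"}, "work": {"dept": "Engineering"}},
}

# 4. Graph: nodes plus labelled edges (relationships) between them.
nodes = {"emp1": {"name": "Asha"}, "dept1": {"name": "Engineering"}}
edges = [("emp1", "WORKS_IN", "dept1")]


def department_of(person_id):
    """Graph query: follow WORKS_IN edges out of a person node."""
    return [nodes[dst]["name"] for src, rel, dst in edges
            if src == person_id and rel == "WORKS_IN"]


print(document["dept"]["name"])                  # read a nested field
print(json.loads(key_value["emp1"])["dept"])     # decode the value, then read
print(column_family["emp1"]["work"]["dept"])     # column family, then column
print(department_of("emp1"))                     # traverse an edge
```

The same question is answered in four different ways: by field path, by decoding a value fetched by key, by column family and column, and by following a relationship.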

NoSQL databases are often used in applications where there is a high volume of data that needs to be processed and analyzed in real-time, such as social media analytics, e-commerce, and gaming. They can also be used for other applications, such as content management systems, document management, and customer relationship management.
However, NoSQL databases may not be suitable for all applications, as they may
not provide the
same level of data consistency and transactional guarantees as
traditional relational databases. It is important to carefully evaluate
the specific needs of an application when choosing a database management
system.

NoSQL, originally referring to “non-SQL” or “non-relational”, describes a database that provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Such databases came into existence in the late 1960s, but did not obtain the NoSQL moniker until a surge of popularity in the early twenty-first century. NoSQL databases are used in real-time web applications and big data, and their use is increasing over time.

NoSQL systems are also sometimes called “Not only SQL” to emphasize that they may support SQL-like query languages. The motivations for this approach include simplicity of design, simpler horizontal scaling to clusters of machines, and finer control over availability. The data structures used by NoSQL databases differ from those used by default in relational databases, which makes some operations faster in NoSQL. The suitability of a given NoSQL database depends on the problem it must solve.

NoSQL databases, also known as “not only SQL” databases, are a new type of
database management system that has gained popularity in recent years.
Unlike traditional relational
databases, NoSQL databases are designed to handle large amounts of
unstructured or semi-structured data, and they can accommodate dynamic
changes to the data model. This makes NoSQL databases a good fit for
modern web applications, real-time analytics, and big data processing.

Data structures used by NoSQL databases are sometimes also viewed as more flexible than relational database tables. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability, speed, and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, lack of standardized interfaces, and huge previous investments in existing relational databases.

Most NoSQL stores lack true ACID (Atomicity, Consistency, Isolation, Durability) transactions, but a few databases, such as MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them central to their designs.

Most NoSQL databases offer a concept of eventual consistency, in which database changes are propagated to all nodes over time, so queries for data might not return updated data immediately or might return data that is not accurate, a problem known as stale reads. Also, some NoSQL systems may exhibit lost writes and other forms of data loss. Some NoSQL systems provide concepts such as write-ahead logging to avoid data loss.
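
The stale-read problem can be illustrated with a toy model. This is a deliberately simplified sketch under stated assumptions, not how any particular product replicates: writes land on a primary immediately and reach the secondaries only when a hypothetical `propagate()` step runs, so a read from a secondary in between returns the old value.

```python
from collections import deque


class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy model (hypothetical): writes hit the primary at once and are
    shipped to the secondaries later through a replication queue."""

    def __init__(self, n_secondaries=2):
        self.primary = Replica("primary")
        self.secondaries = [Replica(f"secondary-{i}") for i in range(n_secondaries)]
        self.pending = deque()          # changes not yet applied to secondaries

    def write(self, key, value):
        self.primary.data[key] = value
        self.pending.append((key, value))   # replicated later, not now

    def read(self, key, node):
        return node.data.get(key)

    def propagate(self):
        """Apply every pending change to every secondary ("eventually")."""
        while self.pending:
            key, value = self.pending.popleft()
            for node in self.secondaries:
                node.data[key] = value


store = EventuallyConsistentStore()
store.write("balance", 100)
store.propagate()
store.write("balance", 50)                              # not yet replicated
stale = store.read("balance", store.secondaries[0])     # old value: stale read
store.propagate()
fresh = store.read("balance", store.secondaries[0])     # replicas have converged
print(stale, fresh)
```

Real systems propagate continuously in the background and often let clients request stronger read guarantees (for example, reading from the primary or from a majority of replicas) at the cost of extra latency.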

One simple example of a NoSQL database is a document database. In a document database, data is stored in documents rather than tables. Each document can contain a different set of fields, making it easy to accommodate changing data requirements.

For example, consider a database that holds data about employees. In a relational database, this information might be stored in tables, with one table for employee information and another table for department information. In a document database, each employee would be stored as a separate document, with all of their information contained within the document.
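
The employee example can be sketched with plain Python dictionaries. The table, field, and function names below are hypothetical: the relational layout needs a lookup (a join) to combine employee and department rows, while each document already carries its department and can add fields such as `skills` without any schema change.

```python
# Relational layout: two "tables" linked by dept_id (a foreign key).
departments = {10: {"dept_id": 10, "name": "Engineering"}}
employees = [
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ravi", "dept_id": 10},
]


def employee_with_department(emp):
    """Emulate a join: look up the department row for one employee row."""
    dept = departments[emp["dept_id"]]
    return {**emp, "dept_name": dept["name"]}


# Document layout: each employee is one self-contained document.
employee_docs = [
    {"_id": 1, "name": "Asha", "department": {"name": "Engineering"}},
    {"_id": 2, "name": "Ravi", "department": {"name": "Engineering"},
     "skills": ["Python", "Go"]},        # extra field, no migration needed
]

joined = employee_with_department(employees[0])
print(joined["dept_name"])                      # needed a lookup
print(employee_docs[0]["department"]["name"])   # already embedded
```

The trade-off is duplication: if the department is renamed, every embedded copy must be updated, whereas the relational layout changes one row.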

NoSQL databases are a relatively new type of database management system that has gained popularity in recent years due to their scalability and flexibility. They are designed to handle large amounts of unstructured or semi-structured data and can handle dynamic changes to the data model. This makes NoSQL databases a good fit for modern web applications, real-time analytics, and big data processing.

Key Features of NoSQL:

1. Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate changing data structures without the need for migrations or schema alterations.

2. Horizontal scalability:
NoSQL databases are designed to scale out by adding more nodes to a
database cluster, making them well-suited for handling large amounts of
data and high levels of traffic.

3. Document-based: Some NoSQL databases, such as MongoDB, use a document-based data model, where data is stored in a schema-less semi-structured format, such as JSON or BSON.

4. Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model, where data is stored as a collection of key-value pairs.

5. Column-based: Some NoSQL databases, such as Cassandra, use a column-based (wide-column) data model, where data is organized into columns grouped into column families, and each row can hold a different set of columns.

6. Distributed and high availability: NoSQL databases are often designed to be highly available and to automatically handle node failures and data replication across multiple nodes in a database cluster.

7. Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and dynamic manner, with support for multiple data types and changing data structures.

8. Performance: NoSQL databases are optimized for high performance and can
handle a high volume of
reads and writes, making them suitable for big data and real-time
applications.

Advantages of NoSQL:
There are many advantages of working with NoSQL databases such as MongoDB and Cassandra. The main advantages are high scalability and high availability.

1. High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means partitioning the data and placing the partitions on multiple machines. Vertical scaling means adding more resources (CPU, RAM, storage) to the existing machine, whereas horizontal scaling means adding more machines to handle the data. Vertical scaling is limited by how large a single machine can be, whereas horizontal scaling can keep growing by adding machines. Examples of horizontally scaling databases are MongoDB, Cassandra, etc. NoSQL can handle a huge amount of data because of this scalability: as the data grows, NoSQL scales itself to handle that data efficiently.
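
A minimal sketch of hash-based sharding, assuming a hypothetical cluster of four shards and a string shard key. Real systems differ in detail (MongoDB splits a collection into chunks by ranged or hashed shard-key values; Cassandra places partitions on a token ring), so this only illustrates the idea of routing each record to a machine by its key.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard by hashing the shard key (hash-based sharding).

    md5 is used only because it is stable across runs; Python's built-in
    hash() of a str is randomized per process.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard (machine) that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


user_ids = [f"user{i}" for i in range(1000)]
layout = distribute(user_ids, num_shards=4)
for shard, keys in layout.items():
    print(shard, len(keys))        # roughly even spread across shards
```

Because the shard is `hash % num_shards`, changing `num_shards` remaps most keys; that is one reason production systems move ranges (chunks) between shards or use consistent hashing instead of plain modulo.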

2. Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data, which means that they can accommodate dynamic changes to the data model. This makes NoSQL databases a good fit for applications that need to handle changing data requirements.

3. High availability: The auto-replication feature in NoSQL databases makes them highly available because, in case of a failure, data can be served from replicas that hold a copy of the last consistent state.

4. Scalability: NoSQL databases are highly scalable, which means that they can handle large amounts of data and traffic with ease. This makes them a good fit for applications that need to handle large amounts of data or traffic.

5. Performance: NoSQL databases are designed to handle large amounts of data and traffic, which means that they can offer improved performance compared to traditional relational databases.

6. Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational databases, as they are typically less complex and do not require expensive hardware or software.

7. Agility: Ideal for agile development.

Disadvantages of NoSQL: NoSQL has the following disadvantages.

1. Lack of standardization: There are many different types of NoSQL databases, each with its own unique strengths and weaknesses. This lack of standardization can make it difficult to choose the right database for a specific application.

2. Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means that they do not guarantee the atomicity, consistency, isolation, and durability of every operation. This can be a drawback for applications that require strong data consistency guarantees.

3. Narrow focus: NoSQL databases have a very narrow focus, as they are mainly designed for storage and provide very little other functionality. Relational databases are a better choice than NoSQL in the field of transaction management.

4. Open-source: Many NoSQL databases are open-source, and there is no reliable standard for NoSQL yet. In other words, two NoSQL database systems are likely to differ considerably.

5. Lack of support for complex queries: NoSQL databases are not designed to
handle complex queries, which means that
they are not a good fit for applications that require complex data
analysis or reporting.

6. Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional relational databases. This can make them less reliable and less secure than traditional databases.

7. Management challenge: The purpose of big data tools is to make the management of a large amount of data as simple as possible. But it is not so easy. Data management in NoSQL is much more complex than in a relational database. NoSQL, in particular, has a reputation for being challenging to install and even more hectic to manage on a daily basis.

8. GUI is not available: GUI tools for accessing NoSQL databases are not widely available on the market.

9. Backup: Backup is a weak point for some NoSQL databases like MongoDB, which has historically lacked a simple approach for backing up data in a consistent manner.

10. Large document size: Some database systems like MongoDB and CouchDB store data in JSON or JSON-like formats (MongoDB uses BSON, a binary form of JSON). This means that documents are quite large (BigData, network bandwidth, speed), and having descriptive key names actually hurts, since every document repeats its own key names and they increase the document size.
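
The cost of descriptive key names can be measured directly. The sketch below uses plain JSON from the standard library as a stand-in (MongoDB actually stores BSON, so its exact byte counts differ) and compares one illustrative record with long and with short key names.

```python
import json

# The same record with long, descriptive key names and with short ones.
verbose = {"customer_first_name": "Ana", "customer_last_name": "Lee",
           "customer_email_address": "ana@example.com"}
compact = {"fn": "Ana", "ln": "Lee", "em": "ana@example.com"}


def encoded_size(doc) -> int:
    """Bytes needed to store the document as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


one_doc_saving = encoded_size(verbose) - encoded_size(compact)
# In a schema-less store each document carries its own key names, so this
# per-document overhead is multiplied by the number of documents stored.
print(one_doc_saving, one_doc_saving * 1_000_000)
```

Here the difference is 53 bytes per document, which across a million documents is about 53 MB spent on key names alone. The trade-off is readability: short keys are harder to understand in queries and application code.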

Types of NoSQL database: Types of NoSQL databases and the name of the
database system that falls in that category are:

1. Graph Databases: Examples – Amazon Neptune, Neo4j

2. Key value store: Examples – Memcached, Redis, Coherence

3. Column: Examples – HBase, Bigtable, Accumulo

4. Document-based: Examples – MongoDB, CouchDB, Cloudant

When should NoSQL be used:

1. When a huge amount of data needs to be stored and retrieved.

2. The relationship between the data you store is not that important

3. The data changes over time and is not structured.

4. Support of Constraints and Joins is not required at the database level

5. The data is growing continuously and you need to scale the database regularly
to handle the data.

In conclusion, NoSQL databases offer several benefits over traditional relational databases, such as scalability, flexibility, and cost-effectiveness. However, they also have several drawbacks, such as a lack of standardization, lack of ACID compliance, and lack of support for complex queries. When choosing a database for a specific application, it is important to weigh the benefits and drawbacks carefully to determine the best fit.

Business Drivers in NoSQL


The business drivers behind the adoption of NoSQL databases are diverse and
often specific to the unique needs and challenges faced by modern organizations.
Here's a detailed exploration of the key business drivers for adopting NoSQL
databases:

1. Scalability:

Horizontal Scalability: NoSQL databases are designed to scale out horizontally, allowing organizations to add more servers to their infrastructure to handle increasing data volumes and user loads.

Distributed Architecture: NoSQL databases distribute data across multiple nodes in a cluster, enabling seamless scaling without the need for expensive and disruptive hardware upgrades.

2. Performance:

High Throughput and Low Latency: NoSQL databases are optimized for
fast read and write operations, making them suitable for real-time
applications and high-speed data processing tasks.

Efficient Storage and Retrieval: NoSQL databases use optimized storage formats and indexing techniques to efficiently store and retrieve data, resulting in faster query performance compared to traditional relational databases.

3. Flexibility and Schema Agnosticism:

Dynamic Schema: NoSQL databases support flexible schemas, allowing organizations to store heterogeneous data types and adapt to changing data requirements without requiring a predefined schema.

Schema Evolution: NoSQL databases facilitate schema evolution by accommodating changes to data structures over time, minimizing downtime and disruption to applications.

4. Big Data and Unstructured Data:

Handling Variety: NoSQL databases excel at handling diverse data types,
including unstructured and semi-structured data such as documents,
JSON, XML, and multimedia files.

Support for Big Data Workloads: NoSQL databases are well-suited for big
data analytics and processing tasks, enabling organizations to analyze
large volumes of data in real-time and derive valuable insights.

5. High Availability and Fault Tolerance:

Distributed Replication: NoSQL databases employ replication strategies to ensure data availability and fault tolerance. Data is replicated across multiple nodes in the cluster, reducing the risk of data loss in the event of node failures.

Automatic Failover: NoSQL databases support automatic failover mechanisms, where if one node fails, another node automatically takes over its responsibilities to ensure continuous service availability.
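
Replication and automatic failover can be sketched together. This is a hypothetical, single-process model: replication is synchronous and the new primary is simply the first healthy node, whereas real databases typically hold an election among the replicas (MongoDB replica sets, for example, need a majority of voting members to elect a new primary).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    data: dict = field(default_factory=dict)


class ReplicatedCluster:
    """Toy cluster (hypothetical): one primary takes writes and copies them
    to every healthy node; if the primary fails, a healthy node takes over."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, key, value):
        self._ensure_primary()
        for node in self.nodes:
            if node.healthy:
                node.data[key] = value      # replicate to every live node

    def read(self, key):
        self._ensure_primary()
        return self.primary.data.get(key)

    def _ensure_primary(self):
        """Automatic failover: promote a healthy node if the primary is down."""
        if self.primary.healthy:
            return
        for node in self.nodes:
            if node.healthy:
                self.primary = node
                return
        raise RuntimeError("no healthy nodes left")


cluster = ReplicatedCluster(["node-a", "node-b", "node-c"])
cluster.write("order:42", "shipped")
cluster.nodes[0].healthy = False            # simulate the primary crashing
print(cluster.read("order:42"), cluster.primary.name)
```

The read still succeeds after the crash because the data had already been copied to the other nodes; without replication, failover would promote a node that has nothing to serve.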

6. Cost Efficiency:

Lower Total Cost of Ownership (TCO): NoSQL databases can often provide lower TCO compared to traditional relational databases, especially for large-scale deployments. They require less hardware and infrastructure due to their distributed nature and efficient scaling capabilities.

Open Source Solutions: Many NoSQL databases are open-source and free to use, eliminating licensing costs and reducing upfront investment.

7. Agility and Rapid Development:

Agile Development: NoSQL databases support agile development methodologies by enabling rapid prototyping, iterative development, and faster time-to-market for new applications and features.

DevOps and Continuous Integration/Deployment (CI/CD): NoSQL databases integrate seamlessly with DevOps practices and CI/CD pipelines, allowing organizations to automate deployment, testing, and scaling of database infrastructure.

8. Specific Use Cases:

Real-Time Analytics: NoSQL databases are used for real-time analytics
applications such as fraud detection, recommendation engines, and
personalized content delivery.

IoT and Sensor Data: NoSQL databases are ideal for handling large
volumes of time-series data generated by IoT devices, sensors, and
machine logs.

Content Management and Personalization: NoSQL databases power content management systems, e-commerce platforms, and personalized recommendation systems, where flexibility, scalability, and performance are critical.

Overall, the adoption of NoSQL databases is driven by the need for scalable,
flexible, and high-performance data management solutions that can meet the
demands of modern businesses in the digital age. By embracing NoSQL
technology, organizations can gain a competitive edge, unlock new opportunities,
and deliver innovative products and services to their customers.

NoSQL Data Architecture Patterns


An architecture pattern is a logical way of categorizing the data that will be stored in the database. NoSQL is a type of database that helps to perform operations on big data and store it in a valid format. It is widely used because of its flexibility and the wide variety of services it offers.

Architecture Patterns of NoSQL:


NoSQL databases store data using one of the following four main data architecture patterns:

1. Key-Value Store Database
2. Column Store Database
3. Document Database
4. Graph Database

These are explained below.


1. Key-Value Store Database:

This model is one of the most basic models of NoSQL databases. As the name suggests, the data is stored in the form of key-value pairs. The key is usually a sequence of strings, integers or characters but can also be a more advanced data type. The value is typically linked or correlated to the key. Key-value storage databases generally store data as a hash table where each key is unique. The value can be of any type (JSON, BLOB (Binary Large Object), strings, etc.). This type of pattern is usually used in shopping websites or e-commerce applications.
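
The hash-table idea behind key-value stores can be sketched in a few lines. This is an illustrative in-memory structure with separate chaining, not the API of DynamoDB, Berkeley DB, or any other product; the class and method names are hypothetical.

```python
class KeyValueStore:
    """Minimal hash-table key-value store (illustrative sketch).

    Each key is hashed to one of a fixed number of buckets; a bucket holds
    (key, value) pairs, and put() overwrites, so every key stays unique.
    """

    def __init__(self, num_buckets: int = 16):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]


# Shopping-cart style usage: the whole cart is the value for one session key.
store = KeyValueStore()
store.put("cart:session-81", {"sku-1": 2, "sku-7": 1})
store.put("cart:session-81", {"sku-1": 3})       # same key, value replaced
print(store.get("cart:session-81"))
store.delete("cart:session-81")
print(store.get("cart:session-81", "empty"))
```

Lookups by key touch a single bucket, which is why key-value stores are fast for "get by key" and poor at queries over the contents of values.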

Advantages:

Can handle large amounts of data and heavy load.

Easy retrieval of data by keys.

Limitations:

Complex queries that involve multiple key-value pairs may degrade performance.

Data involving many-to-many relationships may lead to conflicts.

Examples:

DynamoDB

Berkeley DB

2. Column Store Database:

Rather than storing data in relational tuples, the data is stored in individual cells, which are further grouped into columns. Column-oriented databases operate on columns: they store large amounts of data from the same column together. The format and titles of the columns can differ from one row to another. Every column is treated separately, but each individual column may still contain multiple other columns, as in traditional databases. In short, columns are the unit of storage in this type.

Advantages:

Data is readily available

Queries like SUM, AVERAGE, COUNT can be easily performed on columns.
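
Why SUM, AVERAGE and COUNT suit a column layout can be shown with a small pivot. The sketch is illustrative (hypothetical data and helper name) and assumes every row has the same fields; it stores the values of each column together so an aggregate reads one list instead of walking every row.

```python
# The same three rows stored two ways (illustrative data).
rows = [
    {"id": 1, "region": "north", "sales": 120.0},
    {"id": 2, "region": "south", "sales": 80.0},
    {"id": 3, "region": "north", "sales": 100.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: an aggregate must visit every row and pick out one field.
row_total = sum(r["sales"] for r in rows)

# Column layout: the "sales" values are already stored together, so SUM,
# AVERAGE and COUNT read a single list and ignore the other columns.
sales = columns["sales"]
col_sum, col_count = sum(sales), len(sales)
col_avg = col_sum / col_count
print(col_sum, col_avg, col_count)
```

On disk, keeping a column's values contiguous also compresses well, because neighbouring values tend to be similar.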

Examples:

HBase

Bigtable by Google

Cassandra

3. Document Database:

The document database stores and retrieves data in the form of key-value pairs, but here the values are called documents. A document can be described as a complex data structure. A document can take the form of text, arrays, strings, JSON, XML or any such format. The use of nested documents is also very common. This model is very effective because much of the data created today is in the form of JSON and is semi-structured.
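
MongoDB lets a query reach into nested documents with dot notation, e.g. `{"author.country": "IN"}`. The sketch below imitates only that equality-matching rule over plain Python dictionaries; `find` and `get_path` are hypothetical helpers, and real query engines support many more operators (and treat arrays specially), which this ignores.

```python
from typing import Any

# A tiny in-memory "collection" of nested documents (illustrative data).
articles = [
    {"_id": 1, "title": "NoSQL Basics",
     "author": {"name": "A. Rao", "country": "IN"}, "tags": ["nosql", "intro"]},
    {"_id": 2, "title": "Graph Models",
     "author": {"name": "M. Tan", "country": "SG"}, "tags": ["graph"]},
    {"_id": 3, "title": "Untitled draft"},        # no author field at all
]


def get_path(doc: dict, dotted: str) -> Any:
    """Resolve a dotted path like "author.country" through nested dicts."""
    value: Any = doc
    for part in dotted.split("."):
        if not isinstance(value, dict) or part not in value:
            return None                            # missing field
        value = value[part]
    return value


def find(docs, criteria: dict):
    """Return documents whose (possibly nested) fields equal every criterion."""
    return [d for d in docs
            if all(get_path(d, path) == wanted for path, wanted in criteria.items())]


print([d["title"] for d in find(articles, {"author.country": "IN"})])
```

Documents in the same collection need not share fields: the draft without an `author` simply does not match.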

Advantages:

This type of format is very useful and apt for semi-structured data.

Storing, retrieving, and managing documents is easy.

Limitations:

Handling multiple documents is challenging

Aggregation operations may not work accurately.

Examples:

MongoDB

CouchDB

Figure – Document Store Model in form of JSON documents

4. Graph Databases:
This architecture pattern deals with the storage and management of data in graphs. Graphs are structures that depict connections between two or more objects in some data. The objects or entities are called nodes and are joined together by relationships called edges. Each edge has a unique identifier. Each node serves as a point of contact for the graph. This pattern is very commonly used in social networks, where there are a large number of entities and each entity has one or many characteristics that are connected by edges. The relational database pattern has tables that are loosely connected, whereas graphs are often very strong and rigid in nature.

Advantages:

Fastest traversal because of connections.

Spatial data can be easily handled.

Limitations:
Wrong connections may lead to infinite loops.
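
Traversal is the core graph operation, and cycles are where the infinite-loop risk comes from. The sketch below is a plain breadth-first search over an adjacency list (illustrative data, not any graph database's query language); a `seen` set guarantees that each node is expanded at most once, so cycles cannot make it loop forever.

```python
from collections import deque

# Nodes are people; each edge is a "FOLLOWS" relationship (illustrative).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "alice"],   # cycle: alice -> carol -> alice
    "dave": ["bob"],              # cycle: bob -> dave -> bob
    "erin": [],
}


def reachable(graph, start, max_hops):
    """Breadth-first traversal up to max_hops edges away from start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue                      # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:     # skip already-visited nodes
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found


print(reachable(follows, "alice", max_hops=2))
```

Queries like "friends of friends" map directly onto this kind of bounded traversal.
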
Examples:

Neo4j

FlockDB (used by Twitter)

Figure – Graph model format of NoSQL Databases

5. Wide Column Stores:

Description: Wide column stores store data in columns rather than rows,
enabling efficient retrieval of columns across multiple rows.

Use Case: Suitable for analytics, data warehousing, and applications requiring high scalability and fault tolerance.

Example: Apache Cassandra, Apache HBase, ScyllaDB.

6. Time-Series Databases:

Description: Time-series databases specialize in storing and querying time-stamped data points, making them suitable for monitoring, IoT, and telemetry data.

Use Case: Ideal for storing and analyzing time-series data generated by
sensors, devices, and applications.

Example: InfluxDB, Prometheus, TimescaleDB.
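
A typical time-series query groups time-stamped points into fixed windows and aggregates each window. The stdlib-only sketch below (hypothetical data and function name) averages sensor readings per minute; time-series databases are built to run this kind of windowed aggregation efficiently over very large volumes of points.

```python
from collections import defaultdict
from datetime import datetime, timezone

# (timestamp, value) readings from one temperature sensor (illustrative).
readings = [
    (datetime(2024, 1, 1, 10, 0, 5, tzinfo=timezone.utc), 21.0),
    (datetime(2024, 1, 1, 10, 0, 40, tzinfo=timezone.utc), 21.4),
    (datetime(2024, 1, 1, 10, 1, 10, tzinfo=timezone.utc), 22.0),
    (datetime(2024, 1, 1, 10, 1, 55, tzinfo=timezone.utc), 22.6),
]


def downsample(points, bucket_seconds=60):
    """Average the values that fall in each fixed-width time bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        epoch = int(ts.timestamp())
        bucket_start = epoch - epoch % bucket_seconds   # start of the window
        buckets[bucket_start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): sum(vals) / len(vals)
        for start, vals in sorted(buckets.items())
    }


for minute, avg in downsample(readings).items():
    print(minute.isoformat(), round(avg, 2))
```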

7. Multimodel Databases:

Description: Multimodel databases support multiple data models within a single database system, providing flexibility to handle diverse data types and use cases.

Use Case: Useful for applications requiring a combination of key-value,
document, graph, and relational data models.

Example: ArangoDB, Couchbase, OrientDB.

8. NewSQL Databases:

Description: NewSQL databases combine the scalability and flexibility of NoSQL with the ACID transactions and SQL query capabilities of traditional relational databases.

Use Case: Suitable for applications requiring strong consistency, transactional integrity, and real-time analytics.

Example: Google Spanner, CockroachDB, TiDB.

Each of these NoSQL architectural patterns offers distinct advantages and trade-
offs, allowing organizations to choose the most suitable approach based on their
specific requirements, such as scalability, performance, flexibility, and
consistency. By leveraging these patterns, organizations can effectively manage
and derive insights from big data while optimizing resource utilization and
ensuring high availability and fault tolerance.

MongoDB: An introduction
MongoDB, the most popular NoSQL database, is an open-source document-oriented database. Here the term ‘NoSQL’ is used in the sense of ‘non-relational’: MongoDB isn’t based on the table-like relational database structure but provides an altogether different mechanism for storage and retrieval of data. This format of storage is called BSON (Binary JSON, similar to the JSON format).

A simple MongoDB document Structure:

{
title: 'Geeksforgeeks',
by: 'Harshit Gupta',
url: 'https://www.geeksforgeeks.org',
type: 'NoSQL'
}

SQL databases store data in tabular format. This data is stored in a predefined data model, which is not very flexible for today’s rapidly growing real-world applications. Modern applications are more networked, social and interactive than ever. Applications are storing more and more data and are accessing it at higher rates. A Relational Database Management System (RDBMS) is not the best choice when it comes to handling big data, by virtue of its design: it is not easily horizontally scalable. If the database runs on a single server, then it will reach a scaling limit. NoSQL databases are more scalable and provide superior performance. MongoDB is such a NoSQL database: it scales by adding more and more servers and increases productivity with its flexible document model.

RDBMS vs MongoDB:
RDBMS has a typical schema design that shows the number of tables and the relationships between these tables, whereas MongoDB is document-oriented. There is no concept of schema or relationship.

Complex transactions have traditionally been limited in MongoDB because complex join operations are not available. (Multi-document ACID transactions were added in MongoDB 4.0.)

MongoDB allows a highly flexible and scalable document structure. For example, one data document of a collection in MongoDB can have two fields whereas the other document in the same collection can have four.

MongoDB can be faster than an RDBMS for many workloads due to efficient indexing and storage techniques.

There are a few terms that correspond in both databases. What’s called a Table in RDBMS is called a Collection in MongoDB. Similarly, a Row is called a Document and a Column is called a Field. MongoDB provides a default ‘_id’ field (if not provided explicitly), a 12-byte ObjectId that is usually displayed as a 24-character hexadecimal string and ensures the uniqueness of every document. It is similar to the Primary key in RDBMS.
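
The 12-byte `_id` layout documented by MongoDB is a 4-byte timestamp (seconds since the Unix epoch), a 5-byte random value generated once per process, and a 3-byte counter that starts at a random value. The stdlib-only sketch below builds values with that layout to show why they are unique and roughly time-ordered; it is an illustration with hypothetical function names, not the `ObjectId` class that ships in PyMongo's `bson` package.

```python
import os
import time

# Per-process state, mirroring the documented ObjectId layout.
_PROCESS_RANDOM = os.urandom(5)                   # 5 random bytes, fixed per process
_counter = int.from_bytes(os.urandom(3), "big")   # 3-byte counter, random start


def make_object_id(timestamp=None):
    """Return a 24-character hex string with ObjectId's 4+5+3 byte layout."""
    global _counter
    ts = int(time.time()) if timestamp is None else timestamp
    _counter = (_counter + 1) % (1 << 24)         # wrap within 3 bytes
    raw = ts.to_bytes(4, "big") + _PROCESS_RANDOM + _counter.to_bytes(3, "big")
    return raw.hex()


def creation_time(object_id_hex):
    """Decode the embedded Unix timestamp from the first 4 bytes."""
    return int.from_bytes(bytes.fromhex(object_id_hex)[:4], "big")


first = make_object_id(1_700_000_000)
second = make_object_id(1_700_000_000)
print(first, second)
print(creation_time(first))
```

Two ids created in the same second still differ because the counter advances, and sorting by `_id` roughly sorts by creation time because the timestamp comes first.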

Features of MongoDB:
Document Oriented: MongoDB stores the main subject
in the minimal number of documents and not by breaking it up into
multiple relational structures like RDBMS. For example, it stores all
the information of a computer in a single document called Computer and
not in distinct relational structures like CPU, RAM, Hard disk, etc.

Indexing: Without indexing, a database would have to scan every document of a collection to select those that match the query, which would be inefficient. So, for efficient searching, indexing is a must, and MongoDB uses it to process huge volumes of data in much less time.
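
The difference between a collection scan and an index lookup can be sketched with the standard library. MongoDB's indexes are B-trees; this illustration (hypothetical helper names and data) stands in for one with a sorted list of `(value, position)` pairs searched by binary search, which captures the same idea of finding matches without reading every document.

```python
import bisect

# A collection of documents (illustrative data).
docs = [{"_id": i, "email": f"user{i}@example.com", "age": 18 + i % 50}
        for i in range(10_000)]


def collection_scan(collection, field, value):
    """No index: inspect every document (cost grows with collection size)."""
    return [d for d in collection if d.get(field) == value]


def build_index(collection, field):
    """Single-field index: sorted (value, position) pairs."""
    return sorted((d[field], pos) for pos, d in enumerate(collection))


def index_lookup(collection, index, value):
    """Binary-search the sorted index, then fetch only the matching documents."""
    lo = bisect.bisect_left(index, (value,))
    hi = bisect.bisect_right(index, (value, float("inf")))
    return [collection[pos] for _, pos in index[lo:hi]]


email_index = build_index(docs, "email")
hit = index_lookup(docs, email_index, "user4242@example.com")
assert hit == collection_scan(docs, "email", "user4242@example.com")
print(hit[0]["_id"])
```

The index costs extra memory and must be updated on every write, which is why indexes are created only for fields that queries actually filter or sort on.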

Scalability: MongoDB scales horizontally using sharding (partitioning data across various servers). Data is partitioned into data chunks using the shard key, and these data chunks are evenly distributed across shards that reside across many physical servers. Also, new machines can be added to a running database.

Replication and High Availability: MongoDB increases the data availability with multiple copies of data on different servers. By providing redundancy, it protects the database from hardware failures. If one server goes down, the data can be retrieved easily from other active servers which also had the data stored on them.

Aggregation: Aggregation operations process data records and return the computed results. This is similar to the GROUP BY clause in SQL. A few aggregation accumulators are $sum, $avg, $min, and $max.
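
As a rough analogue, a MongoDB stage such as `{"$group": {"_id": "$dept", "total": {"$sum": "$salary"}, "avg": {"$avg": "$salary"}}}` plays the role of `SELECT dept, SUM(salary), AVG(salary) ... GROUP BY dept`. The plain-Python sketch below (hypothetical data and helper name) shows what such a grouping computes; it does not talk to a database.

```python
from collections import defaultdict

employees = [
    {"name": "Asha", "dept": "eng", "salary": 120},
    {"name": "Ravi", "dept": "eng", "salary": 100},
    {"name": "Meena", "dept": "ops", "salary": 90},
]


def group_by(docs, key, field):
    """Emulate a $group stage: one output document per distinct key,
    with sum/avg/min/max accumulators over `field`."""
    groups = defaultdict(list)
    for d in docs:
        groups[d[key]].append(d[field])
    return [
        {"_id": k, "total": sum(v), "avg": sum(v) / len(v),
         "min": min(v), "max": max(v)}
        for k, v in sorted(groups.items())
    ]


for row in group_by(employees, "dept", "salary"):
    print(row)
```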

Where do we use MongoDB?


MongoDB is preferred over RDBMS in the following scenarios:

Big Data: If you have a huge amount of data to store, consider MongoDB before RDBMS databases. MongoDB has a built-in solution for partitioning and sharding your database.

Unstable Schema: Adding a new column in RDBMS is hard, whereas MongoDB is schema-less. Adding a new field does not affect old documents and is very easy.

Distributed data: Since multiple copies of data are stored across different servers, recovery of data is fast and safe even if there is a hardware failure.

Language Support by MongoDB:


MongoDB currently provides official driver support for all popular
programming languages like C, C++, Rust, C#, Java, Node.js, Perl, PHP,
Python, Ruby, Scala, Go, and Erlang.

Who’s using MongoDB?


MongoDB has been adopted as backend software by a number of major
websites and services including EA, Cisco, Shutterfly, Adobe, Ericsson,
Craigslist, eBay, and Foursquare.

Advantages of MongoDB
MongoDB offers several potential benefits:

Schema-less. Like other NoSQL databases, MongoDB doesn't require predefined schemas. It stores any type of data. This gives users the flexibility to create any number of fields in a document, making it easier to scale MongoDB databases compared to relational databases.

Document-oriented. One of the advantages of using documents is that these objects map to native data types in several programming languages. Having embedded documents also reduces the need for database joins, which can lower costs.

Scalability. A core function of MongoDB is its horizontal scalability, which makes it a useful database for companies running big data applications. In addition, sharding lets the database distribute data across a cluster of machines. MongoDB also supports the creation of zones of data based on a shard key.

Third-party support. MongoDB supports several storage engines and provides pluggable storage engine APIs that let third parties develop their own storage engines for MongoDB.

Aggregation. The DBMS also has built-in aggregation capabilities, which let users run MapReduce code directly on the database rather than running MapReduce on Hadoop (map-reduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline). MongoDB also includes its own file system, called GridFS, akin to the Hadoop Distributed File System. GridFS is primarily used for storing files larger than BSON's size limit of 16 MB per document. These similarities let MongoDB be used instead of Hadoop, though the database software does integrate with Hadoop, Spark and other data processing frameworks.
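
GridFS stores a large file as a series of chunk documents in the `fs.chunks` collection (fields `files_id`, `n` and `data`) plus one metadata document in `fs.files`; the default chunk size is 255 KiB. The sketch below imitates only the chunking step with in-memory dictionaries, so the function names are hypothetical and nothing is written to a database.

```python
import math
import os

CHUNK_SIZE = 255 * 1024   # GridFS's default chunk size (255 KiB)


def split_into_chunks(file_id, data: bytes, chunk_size=CHUNK_SIZE):
    """Split a file into chunk documents shaped like GridFS's fs.chunks."""
    return [
        {"files_id": file_id, "n": i, "data": data[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Concatenate chunk payloads in order of their sequence number n."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = os.urandom(20 * 1024 * 1024)   # 20 MiB: larger than the 16 MB BSON limit
chunks = split_into_chunks("file-1", payload)
assert len(chunks) == math.ceil(len(payload) / CHUNK_SIZE)
assert reassemble(chunks) == payload
print(len(chunks))
```

Each chunk is well under the document size limit, and because chunks are numbered, a reader can also fetch just the range it needs instead of the whole file.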

Disadvantages of MongoDB
Though there are some valuable benefits to MongoDB, there are some downsides
to it as well.

Continuity. With its automatic failover strategy, a user sets up just one master
node in a MongoDB cluster. If the master fails,
another node will automatically convert to the new master. This switch
promises continuity, but it isn't instantaneous -- it can take up to a
minute. By comparison, the
Cassandra NoSQL database supports multiple master nodes. If one master
goes down, another is
standing by, creating a highly available database infrastructure.

Write limits. MongoDB's single master node also limits how fast data can be written to the database. Data writes must be recorded on the master, and writing new information to the database is limited by the capacity of that master node.

Data consistency. MongoDB doesn't provide full referential integrity through the use of foreign-key constraints, which could affect data consistency.

Security. In addition, user authentication isn't enabled by default in MongoDB databases. Malicious hackers have targeted large numbers of unsecured MongoDB systems in attacks, which led to the addition of a default setting that blocks networked connections to databases if they haven't been configured by a database administrator.
